Syllabus for Pattern Recognition and Machine Learning
(COMP393, Special Topic in Computer Science)
Fall 2007
Dickinson College
Instructor: John MacCormick
Goals
- Understand the theory behind several techniques for using computers to automatically extract, or "learn", structure from data
- Gain practical experience in applying these machine learning techniques to real data
- Have fun by (i) writing code to implement the techniques and (ii) conducting creative investigations into how to make them work well
- Understand the importance of machine learning in state-of-the-art research by reading published research papers
Teaching methods
- lectures for understanding theoretical material
- self-directed tutorial sessions for working on implementations
- class discussions of relevant research papers
- some informal student presentations of the interesting results that we will find
When and where
- Classes: Monday and Thursday 3pm-4:15pm, Tome 231
- Office hours: Thursday 9-11am, Friday 2-3pm
Books
- Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition). Morgan Kaufmann, 2005.
- Ethem Alpaydin. Introduction to Machine Learning. MIT Press, 2004.
Assessment and grading
- Final grade will comprise:
  - homework assignments 50%
  - final project 35%
  - quizzes 10%
  - in-class presentations and participation 5%
- Final grades may be altered to fit a curve if necessary
- No final exam; instead we will schedule student presentations of the final project in the exam slot (3pm Saturday, December 15)
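As an illustration of how the component weights above combine, here is a minimal Python sketch. The function name `final_grade` and the dictionary layout are hypothetical, not part of the course materials. It assumes each component has already been reduced to a single score out of 100 (for homework, using the per-assignment weightings announced during the semester), and it does not model any curve.

```python
# Component weights taken from the syllabus (percent of the final grade).
WEIGHTS = {
    "homework": 50,
    "project": 35,
    "quizzes": 10,
    "participation": 5,
}


def final_grade(scores):
    """Return the weighted final grade on a 0-100 scale.

    `scores` maps each component name in WEIGHTS to a score out of 100.
    Assumption: no curve is applied; the syllabus only says one may be.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example = {"homework": 80, "project": 90, "quizzes": 100, "participation": 60}
print(final_grade(example))  # 84.5
```

In this example the homework score contributes 40 points, the project 31.5, the quizzes 10, and participation 3.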
Homework assignments
- There will be 5 homework assignments, due at the start of class on the following dates:
Assignment 1 | Monday, September 3
Assignment 2 | Thursday, September 13
Assignment 3 | Thursday, September 27
Assignment 4 | Thursday, October 18
Assignment 5 | Thursday, November 15
- The assignments will generally consist of a computer program implementing some recently covered material, and a reasonably brief (1-4 pages) report describing the output of the program on some data sets.
- Every homework assignment will be marked out of 100, broken down as follows:
  - 50 for the code (is it correct and well-written?),
  - 40 for the written report (are the results summarized clearly and concisely, and any conclusions drawn correctly?),
  - 10 for originality (did you do anything over and above what the assignment requested, or discover anything interesting or unusual in your investigations?).
- If a homework assignment does not have a coding component, the breakdown will be 90 for the written report and 10 for originality.
- Assignments will not be given equal weight in the final grade -- the weighting of each assignment will be announced with the content of the assignment.
- The code and report should be submitted as a single file (e.g., a zip file) via Blackboard. I may also request that you bring printouts of the report to class, and/or demonstrate your code in class, but the Blackboard submission will be the official one for the purposes of grading and determining lateness.
- Each student is permitted a total of two no-penalty days of lateness over the entire semester; every subsequent day of lateness incurs a 25% penalty for the late assignment. "Days of lateness" are computed by rounding up to the nearest whole number of 24-hour periods, so an assignment submitted one minute after the due time is regarded as being one day late, and an assignment submitted 27 hours after the due time is regarded as being two days late.
- Some assignments call for "stand-alone programs"; these can be done in any programming language of the student's choice. Other assignments will require students to use, modify, or augment the Weka machine learning toolkit using Java.
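The lateness rule above (round up to whole 24-hour periods, two free days per semester, 25% per subsequent day) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the code assumes that each penalized day deducts 25% of the assignment's raw score, with the result floored at zero. The syllabus does not spell out exactly how the percentage is applied.

```python
import math

FREE_LATE_DAYS = 2      # no-penalty days per student per semester (from the syllabus)
PENALTY_PER_DAY = 0.25  # penalty per subsequent day of lateness (from the syllabus)


def days_late(hours_after_deadline):
    """Round lateness up to a whole number of 24-hour periods."""
    if hours_after_deadline <= 0:
        return 0
    return math.ceil(hours_after_deadline / 24)


def apply_late_policy(raw_score, hours_after_deadline, free_days_left=FREE_LATE_DAYS):
    """Return (adjusted_score, free_days_remaining) for one assignment.

    Assumption: each penalized day removes 25% of the raw score,
    and the adjusted score never drops below zero.
    """
    late = days_late(hours_after_deadline)
    free_used = min(late, free_days_left)   # spend free days first
    penalized = late - free_used            # remaining days incur the penalty
    factor = max(0.0, 1.0 - PENALTY_PER_DAY * penalized)
    return raw_score * factor, free_days_left - free_used


print(days_late(1 / 60))          # one minute late counts as 1 day
print(days_late(27))              # 27 hours late counts as 2 days
print(apply_late_policy(80, 49))  # 3 days late, 2 free, 1 penalized: (60.0, 0)
```

Because the free days are shared across the whole semester, the second return value would be passed back in as `free_days_left` for the student's next late submission.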
Final project
- Each student will complete a self-directed project applying machine learning to some real data. Students can choose their own topic or work on a topic suggested by me.
- Projects can optionally be done in teams; the size of the team will be taken into account when grading (for example, a team of two is expected to produce more high-quality output than a student working alone). All members of a team will be awarded the same grade.
- We'll begin working on these projects in early November, and devote the last three weeks of classes to them.
- The final output of the project will be a computer program implementing the student's proposal, a report (probably 10-15 pages, but this might change) describing the findings, and a presentation of these findings.
- The project will be marked out of 100, broken down as follows:
  - 40 for the code,
  - 40 for the written report, and
  - 20 for the presentation.
Quizzes
- There will be 4 short, easy quizzes at the start of class on the following dates:
Quiz 1 | Thursday, September 6
Quiz 2 | Thursday, September 20
Quiz 3 | Thursday, October 11
Quiz 4 | Thursday, November 1
- Each quiz will be 15-20 minutes long.
- The quizzes will be very easy because you will be given the questions (with a few of the actual numbers changed) in advance.
- The whole point of the quizzes is to give me feedback on whether you understand the essential material without forcing you to memorize a huge amount of information.
Amount of work
College policy recommends approximately 3 hours of
independent work for every hour of class time.
Our class meets for 2.5 hours per week.
Therefore, you should expect to spend 7-8 hours per week (3 x 2.5 = 7.5 hours, outside of
class time) on reading, homework assignments, and projects.
Plagiarism, copying, and collaborating
The College's standard policy on plagiarism applies and you
should be familiar with it, but there are some special considerations for this
course because of the large amount of independent coding you will be
doing. Therefore, in addition to the
standard college policy:
- some assignments or parts of assignments will state clearly that every piece of code you write must be your own -- you may not copy and paste even one line or one word from existing code (except your own code from earlier assignments in this course)
- some assignments or parts of assignments will state that you may copy and paste code from existing sources (especially, say, example code from the Weka toolkit), but in all cases the comments in your code must clearly state which lines have been copied from where, which lines have been modified by you, and which lines have been written completely by you
- written reports must, in all cases, be completely original work done by you and no one else
- all assignments (in contrast to the final project) must be done individually.
  - I strongly encourage students to help each other in understanding the course material and in practical matters such as debugging, but all design of algorithms, writing of code, and writing of reports must be done individually.
  - A good way to achieve this is to discuss problems verbally with other students, but never give another student anything in written or electronic form (even rough working on scratch paper)
- the above points also apply to the final project, except that collaboration within teams is obviously permitted; code may be copied from existing sources if it is attributed clearly, but the project should contain a substantial portion of code written by the student(s).
Guidelines for grading
Grading of code, reports, and presentations will not be
performed using a rigorous numerical rubric, but significant weight will be
attached to each of the following features and a piece of work must rate highly
on each feature in order to achieve a high grade:
- code: correctness, efficiency of algorithms, formatting, clarity (including comments)
- written reports: clear scientific writing style (including correct spelling and grammar), appropriate presentation of data including tables and graphs where necessary, correctness of the conclusions, completeness of analysis
- presentations: logical order of presentation, clear description of key points, effective use of visual media (simple, clear slides and/or use of whiteboard, with appropriate use of graphs), fluent presentation style
Accommodations
I will follow college policy on accommodations for students
who need them.