Syllabus for Pattern Recognition and Machine Learning

(COMP393, Special Topic in Computer Science)

 

 

Fall 2007

Dickinson College

Instructor: John MacCormick

 

 

Goals

 

  • Understand the theory behind several techniques for using computers to automatically extract, or "learn", structure from data
  • Gain practical experience in applying these machine learning techniques to real data
  • Have fun by (i) writing code to implement the techniques and (ii) conducting creative investigations into how to make them work well
  • Understand the importance of machine learning in state-of-the-art research by reading published research papers

Teaching methods

 

  • lectures for understanding theoretical material
  • self-directed tutorial sessions for working on implementations
  • class discussions of relevant research papers
  • some informal student presentations of the interesting results that we will find

When and where

 

  • Classes: Monday and Thursday 3pm-4:15pm, Tome 231
  • Office hours: Thursday 9-11am, Friday 2-3pm

Books

 

  • Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition). Morgan Kaufmann, 2005.
  • Ethem Alpaydin. Introduction to Machine Learning. MIT Press, 2004.

 

Assessment and grading

 

  • Final grade will comprise:
    • homework assignments 50%
    • final project 35%
    • quizzes 10%
    • in-class presentations and participation 5%
  • Final grades may be altered to fit a curve if necessary
  • No final exam; instead we will schedule student presentations of the final project in the exam slot (3pm Saturday, December 15)
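The weighting above amounts to a simple weighted sum. The following is a minimal sketch, assuming each component has already been reduced to a percentage between 0 and 100; the names `WEIGHTS` and `final_percentage` are illustrative and not part of any official grading tool, and the possible curve adjustment is not modeled.

```python
# Component weights as listed in this syllabus (they sum to 1.0).
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "homework": 0.50,
    "project": 0.35,
    "quizzes": 0.10,
    "participation": 0.05,
}


def final_percentage(scores):
    """Combine component scores (each 0-100) into one weighted percentage.

    Assumption: any curve applied to final grades happens afterwards
    and is outside the scope of this function.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"homework": 80, "project": 90, "quizzes": 100, "participation": 100}
    # Rounded for display because floating-point sums are not exact.
    print(round(final_percentage(example), 2))
```

Because homework carries half of the total weight, a 10-point change in the homework percentage moves the final percentage by 5 points, whereas the same change in participation moves it by only 0.5.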

 

Homework assignments

 

  • There will be 5 homework assignments due at the start of class on the following dates:

 

    • Assignment 1: Monday, September 3
    • Assignment 2: Thursday, September 13
    • Assignment 3: Thursday, September 27
    • Assignment 4: Thursday, October 18
    • Assignment 5: Thursday, November 15

 

  • Each assignment will generally consist of a computer program implementing recently covered material, together with a reasonably brief (1-4 page) report describing the program's output on some data sets.
  • Every homework assignment will be marked out of 100, broken down as follows:
    • 50 for the code (is it correct and well-written?),
    • 40 for the written report (are the results summarized clearly and concisely, and are any conclusions drawn correctly?),
    • 10 for originality (did you do anything over and above what the assignment requested, or discover anything interesting or unusual in your investigations?). 
    • If a homework assignment does not have a coding component, the breakdown will be 90 for the written report and 10 for originality. 
  • Assignments will not be given equal weight in the final grade -- the weighting of each assignment will be announced with the content of the assignment.
  • The code and report should be submitted as a single file (e.g., a zip file) via Blackboard.  I may also request that you bring printouts of the report to class, and/or demonstrate your code in class, but the Blackboard submission will be the official one for the purposes of grading and determining lateness.
  • Each student is permitted a total of two no-penalty days of lateness over the entire semester; every subsequent day of lateness incurs a 25% penalty for the late assignment.  "Days of lateness" are computed by rounding up to the next whole number of 24-hour periods, so an assignment submitted one minute after the due time is regarded as being one day late, and an assignment submitted 27 hours after the due time is regarded as being two days late.
  • Some assignments call for "stand-alone programs"; these can be done in any programming language of the student's choice.  Other assignments will require students to use, modify, or augment the Weka machine learning toolkit using Java.
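The lateness rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, no-penalty days are assumed to be used up automatically before any penalty applies, and the 25% penalty per day is assumed to accumulate up to a cap of 100%. The syllabus itself does not spell out these last two details.

```python
import math


def days_late(hours_after_deadline):
    """Round lateness up to a whole number of 24-hour periods.

    One minute late counts as 1 day; 27 hours late counts as 2 days.
    """
    if hours_after_deadline <= 0:
        return 0
    return math.ceil(hours_after_deadline / 24)


def apply_lateness(days, free_days_left, penalty_per_day=0.25):
    """Return (free days used, penalty fraction) for one assignment.

    Assumptions (not stated in the syllabus): free days are consumed
    first, and the penalty accumulates per day up to a cap of 100%.
    """
    free_used = min(days, free_days_left)
    penalized_days = days - free_used
    penalty = min(1.0, penalized_days * penalty_per_day)
    return free_used, penalty


if __name__ == "__main__":
    late = days_late(27)                 # 27 hours after the due time
    print(late, apply_lateness(late, free_days_left=2))
```

Under these assumptions, a student with both no-penalty days still available who submits 27 hours late uses both of them and incurs no penalty; the same submission with no free days remaining would be penalized for two days.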

Final project

 

  • Each student will complete a self-directed project applying machine learning to some real data.  Students can choose their own topic or work on a topic suggested by me. 
  • Projects can optionally be done in teams; the size of the team will be taken into account when grading (for example, a team of two is expected to produce a larger amount of high-quality output than a student working alone).  All members of a team will be awarded the same grade.
  • We'll begin working on these projects in early November, and devote the last three weeks of classes to them. 
  • The final output of the project will be a computer program implementing the student's proposal, a report (probably 10-15 pages, but this might change) describing the findings, and a presentation of these findings. 
  • The project will be marked out of 100, broken down as follows:
    • 40 for the code,
    • 40 for the written report, and
    • 20 for the presentation.

 

Quizzes

 

  • There will be 4 short, easy quizzes at the start of class on the following dates:

 

    • Quiz 1: Thursday, September 6
    • Quiz 2: Thursday, September 20
    • Quiz 3: Thursday, October 11
    • Quiz 4: Thursday, November 1

 

  • Each quiz will be 15-20 minutes long.
  • The quizzes will be very easy because you will be given the questions (with a few of the actual numbers changed) in advance. 
  • The whole point of the quizzes is to give me feedback on whether you understand the essential material without forcing you to memorize a huge amount of information.

Amount of work

 

College policy recommends approximately 3 hours of independent work for every hour of class time.  Our class meets for 2.5 hours per week, which works out to 3 × 2.5 = 7.5 hours.  You should therefore expect to spend 7-8 hours per week (outside of class time) on reading, homework assignments, and projects.

Plagiarism, copying, and collaborating

 

The College's standard policy on plagiarism applies and you should be familiar with it, but there are some special considerations for this course because of the large amount of independent coding you will be doing.  Therefore, in addition to the standard college policy:

  • some assignments or parts of assignments will state clearly that every piece of code you write must be your own -- you may not copy and paste even one line or one word from existing code (except your own code from earlier assignments in this course)
  • some assignments or parts of assignments will state that you may copy and paste code from existing sources (especially, say, example code from the Weka toolkit), but in all cases the comments in your code must clearly state which lines have been copied from where, which lines have been modified by you, and which lines have been written completely by you
  • written reports must, in all cases, be completely original work done by you and no one else
  • all assignments (in contrast to the final project) must be done individually. 
    • I strongly encourage students to help each other in understanding the course material and in practical matters such as debugging, but all design of algorithms, writing of code, and writing of reports must be done individually. 
    • A good way to achieve this is to discuss problems verbally with other students, but never give another student anything in written or electronic form (even rough working on scratch paper)
  • the above points also apply to the final project, except that collaboration within teams is obviously permitted; code may be copied from existing sources if it is attributed clearly, but the project should contain a substantial portion of code written by the student(s).

 

Guidelines for grading

 

Code, reports, and presentations will not be graded using a rigorous numerical rubric.  However, significant weight will be attached to each of the following features, and a piece of work must rate highly on every one of them to achieve a high grade:

  • code: correctness, efficiency of algorithms, formatting, clarity (including comments)
  • written reports: clear scientific writing style (including correct spelling and grammar), appropriate presentation of data including tables and graphs where necessary, correctness of the conclusions, completeness of analysis
  • presentations: logical order of presentation, clear description of key points, effective use of visual media (simple, clear slides and/or use of whiteboard, with appropriate use of graphs), fluent presentation style

 

Accommodations

 

I will follow college policy on accommodations for students who need them.