This is a dual undergraduate and graduate course. Undergraduates take the lecture only; graduate students additionally take the Lab + Project combo.

**Class Sessions**

*Lecture:* Thursday and Friday, 15:45 - 17:00, West Hall 3

*Lab and Project:* to be agreed with participants

**Course topics.** Machine learning (ML) is all about algorithms which are fed with (large quantities of) real-world data and which return a compressed "model" of the data. An example is the "world model" of a robot: the input data are sensor data streams, from which the robot learns a model of its environment -- needed, for instance, for navigation. Another example is a spoken language model: the input data are speech recordings, from which ML methods build a model of spoken English -- useful, for instance, in automated speech recognition systems. There is a large number of formalisms in which such models can be cast, and an equally large diversity of learning algorithms. However, a relatively small number of fundamental challenges are common to all of these formalisms and algorithms: most notably, the "curse of dimensionality" and the almost deadly problem of under- vs. overfitting.

The **lecture** introduces these fundamental concepts and illustrates them with a selection of elementary model formalisms (linear classifiers and regressors, radial basis function networks, clustering, mixtures of Gaussians, Parzen windows). The lecture also provides a refresher of the requisite concepts from probability theory, statistics, and linear algebra.

The basic format of the **lab** (for graduate students) is two miniprojects, each taking 4-5 weeks. Students receive a challenging dataset and a modelling task (for instance: "learn a model to classify digits from blurry images"). The modelling task can be solved with the elementary methods presented in the lecture -- but only poorly. Students are expected to explore more advanced and powerful methods on their own initiative (helpful hints will be given). Since this is the first time I am offering this separate graduate lab, we may change its makeup as we go along if we find that useful. For instance, we might replace one miniproject with a serious probability theory crash course (super useful if you do *real* machine learning), or we might squeeze in seminar-style paper reading sessions.

The **project** is a highly self-steered piece of work, comparable to a BSc thesis. Topics come from ongoing research in my group and are agreed on a case-by-case basis.
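To make the under- vs. overfitting problem concrete, here is a minimal illustrative sketch (my example, not part of the course materials; the lab itself uses Matlab, while this sketch is plain Python/NumPy with made-up data). It fits polynomials of increasing degree to a few noisy samples of a sine wave and compares training and test errors:

```python
# Illustrative sketch of under- vs. overfitting (not course code).
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)                    # 10 noisy training samples
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)                    # dense test grid
y_test = np.sin(2 * np.pi * x_test)                # noise-free ground truth

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Degree 1 is too rigid and misses the structure (underfitting); degree 9 can pass through every training point, so its training error is near zero, yet it typically generalizes worst (overfitting). Cross-validation and regularization, covered in the lecture, are the standard remedies.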

**Lecture notes.** I will start by using the existing ML lecture notes which I wrote for a graduate lecture (this year, ML is taught as an undergraduate lecture for the first time). As the semester unfolds, I will probably distil a new set of lecture notes. You can download the lecture notes -- as far as I have adapted them -- here (latest update: Sept 25, 22:30). Section 3 of the lecture notes is here (latest update: Oct 1). Section 4 is here. Section 10 on hidden Markov models is here (new version from Nov 8: inserted new subsection 10.5 on the ML estimation principle). If you want to inspect the full-scope (graduate) lecture notes, you can find them here.

**Grading and exams.** For the **lecture**, the course grade is computed from classroom participation (10%), homeworks (30%), the midterm (25%), and the final exam (35%). The **lab** grade is planned to be based on two miniproject reports (50% each); if we change the lab makeup, this will be adapted. The **project** grade is based on the project proposal (30%) and the final report (70%).
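For illustration (hypothetical scores, not taken from any actual grading): lecture scores of 80 (participation), 90 (homeworks), 70 (midterm) and 85 (final) would combine to 0.10 · 80 + 0.30 · 90 + 0.25 · 70 + 0.35 · 85 = 82.25.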

**Helpful materials:**

Slides of an Introduction to Dynamical Systems

Slides of a Machine Learning Course given at the "Interdisciplinary College" 2006

Slides of a Neural Network Course (23 MB) given at the "Interdisciplinary College" 2008

A condensed primer on measure theory and probability theory, by Manjunath Gandhi

An online textbook on probability theory (by Rick Durrett)

For exam preparation: Final exam 2003 and fragments of solutions; Midterm 2004 with selected solutions; Final exam 2004 with solutions

Hints for writing good miniproject reports, or rather, for avoiding typical blunders

**Schedule (for the lecture; will be filled with substance as the semester evolves)**

| Date | Topic |
| --- | --- |
| Sep 4 | Introduction; course planning. |
| Sep 5 | Introducing the digits example. Curse of dimensionality. Exercise sheet 1. Download the basic digits example Matlab routines. Return date for exercise 1: Sunday, Sept 14, midnight (disregard the Friday deadline stated on the exercise sheet). |
| Sep 11 | Regression, time series prediction, Takens theorem. |
| Sep 12 | Blackbox vs. analytical modeling, bias-variance dilemma, general ML terminology, and abstract formulation of the modeling task. |
| Sep 18 | (Double class) Cross-validation and regularization. Bayes theorem and optimal decision boundaries. Exercise sheet 2. |
| Sep 19 | No class. |
| Sep 25 | The concept of a random variable. Optimal decision boundaries 2. |
| Sep 26 | Still optimal decision boundaries... Exercise sheet 3. Return date: Oct 9 (disregard the return time stated on the older version of the sheet; note that exercise sheet 4 will be posted on Oct 2). |
| Oct 2 | Linear discriminants and linear regression. Exercise sheet 4; new bonus scheme in the new version of sheet 4 (updated Oct 14). The reference paper by Duin and Tax. |
| Oct 9 | Generalized linear discriminants. Perceptrons. Miniproject 1 (Lab only). |
| Oct 10 | K-means clustering (Section 5 in LN). |
| Oct 16 | Rehearsal, midterm exam preparation. |
| Oct 17 | Midterm exam. |
| Oct 23 | Post-exam rehearsal. Introducing feedforward neural networks. Exercise sheet 5 and its data file xypoints.txt. |
| Oct 24 | The principle of model optimization by gradient descent of the loss function (see the sketch below the schedule). |
| Oct 30 | The backpropagation algorithm. |
| Oct 31 | A closer look at the bias-variance dilemma. New chapter 7 of LN. |
| Nov 6 | Introduction to HMMs: Markov chains. Miniproject 2 (Lab only). Exercise sheet 6 (Lecture); return date: Nov 20 (disregard the return time stated on the sheet). |
| Nov 7 | Definition and basic properties of HMMs. |
| Nov 13 | Parametric statistics, ML estimation principle, EM principle. Exercise sheet 7 (Lecture). |
| Nov 14 | Basic HMM inference algorithms and the Baum-Welch learning algorithm. Bonus EM programming exercise package (optional; new submission deadline: Nov 7, midnight). |
| Nov 20 | HMM: Baum-Welch learning algorithm. |
| Nov 21 | Bayesian networks: introduction. Lecture notes chapter on Bayesian networks. |
| Nov 27 | No class. |
| Nov 28 | Bayesian networks: brute-force inference (cancelled). |
| Dec 4 | |
| Dec 5 | |
| Dec 9 | Final exam (10:00, IRC conference hall). |
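As a pointer for the Oct 24 session in the schedule above, the following is a minimal sketch of model optimization by gradient descent of a loss function (illustrative Python/NumPy only, not course code; the data, learning rate, and step count are made-up values). It fits the weights of a toy linear model by repeatedly stepping against the gradient of the mean squared loss:

```python
# Illustrative sketch of gradient descent on a squared loss (not course code).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))               # 100 samples, 3 input features
w_true = np.array([2.0, -1.0, 0.5])         # "unknown" weights to recover
y = X @ w_true + rng.normal(0, 0.1, 100)    # noisy targets

w = np.zeros(3)                             # initial model parameters
eta = 0.1                                   # learning rate (step size)
for step in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared loss
    w -= eta * grad                         # move against the gradient
print("learned weights:", np.round(w, 3))   # close to w_true
```

The same loop structure, with the gradient supplied by the backpropagation algorithm, underlies the feedforward neural network training discussed in the surrounding sessions.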

**Information for the Probability Theory Tutorial**

Highly recommended (though not mandatory) homework for our meeting on October 9th: here, exercises 3, 5, and 7.

**References**

*The online lecture notes are self-contained, and no further literature is necessary for this course. However, if you want to study some topics in more depth, the following are recommended references.*

Bishop, Christopher M.: *Neural Networks for Pattern Recognition* (Oxford Univ. Press, 1995). IRC: QA76.87 .B574 1995. *The main course reference (beyond the online lecture notes).*

Bishop, Christopher M.: *Pattern Recognition and Machine Learning* (Springer Verlag, 2006). *Much more up-to-date and comprehensive than the previously mentioned Bishop book, but I dare say too thick and advanced for an undergraduate course (730 pages) -- more like a handbook for practitioners. To find your way into ML, the older, slimmer Bishop book will work better.*

Michie, D., Spiegelhalter, D.J., Taylor, C.C.: *Machine Learning, Neural and Statistical Classification* (1994). Freely available online at http://www.amsta.leeds.ac.uk/~charles/statlog/ and at the course resource repository. *A transparently written book, concentrating on classification. Good backup reading. Thanks to Mantas for pointing this out!*

Duda, R.O., Hart, P.E., Stork, D.G.: *Pattern Classification*, 2nd edition (John Wiley, 2001). IRC: Q327 .D83 2001. *Covers more than the Bishop book; more detailed and more mathematically oriented. A backup reference for the deep probers.*

Hastie, T., Tibshirani, R., Friedman, J.: *The Elements of Statistical Learning: Data Mining, Inference, and Prediction* (Springer Verlag, 2001). IRC: Q325.75 .H37 2001. *I found this book only recently and haven't studied it in detail -- it looks extremely well written, combining (statistical) maths with applications and principal methods of machine learning, full of illuminating color graphics. May become my favourite.*

Farhang-Boroujeny, B.: *Adaptive Filters, Theory and Applications* (John Wiley, 1999). IRC: TK7872.F5 F37 1998. *Some initial portions of this book describe online linear filtering with the LMS algorithm, which will possibly be covered in the course.*

Mitchell, Tom M.: *Machine Learning* (McGraw-Hill, 1997). IRC: Q325.5 .M58 1997. *More general and more comprehensive than the course; covers many branches of ML that are not treated in the course. Gives a good overview of the larger picture of ML.*

Nabney, Ian T.: *NETLAB: Algorithms for Pattern Recognition* (Springer Verlag, 2001). IRC: TA1637 .N33 2002. *A companion book to the Bishop book, concentrating on Matlab implementations of the main techniques described there. The Matlab code is public and can be downloaded from http://www.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/*