This is a dual undergraduate and graduate course. Undergraduates take the lecture only; graduate students additionally take the Lab + Project combo.
Lecture: Thursday and Friday 15:45 - 17:00, West Hall 3
Lab and Project: to be agreed with participants
Course topics. Machine learning (ML) is all about algorithms which are fed with (large quantities of) real-world data and which return a compressed "model" of the data. An example is the "world model" of a robot: the input data are sensor data streams, from which the robot learns a model of its environment -- needed, for instance, for navigation. Another example is a spoken language model: the input data are speech recordings, from which ML methods build a model of spoken English -- useful, for instance, in automated speech recognition systems. There exist a large number of formalisms in which such models can be cast, and an equally large diversity of learning algorithms. However, there is a relatively small number of fundamental challenges common to all of these formalisms and algorithms: most notably, the "curse of dimensionality" and the almost deadly problem of under- vs. overfitting. The lecture introduces these fundamental concepts and illustrates them with a selection of elementary model formalisms (linear classifiers and regressors, radial basis function networks, clustering, mixtures of Gaussians, Parzen windows). Furthermore, the lecture provides a refresher of the requisite concepts from probability theory, statistics, and linear algebra.

The basic format of the lab (for graduate students) is two miniprojects, each taking 4-5 weeks. Students will get a challenging dataset and a modelling task (for instance: "learn a model to classify digits from blurry images"). The modelling task can be solved by the elementary methods provided in the lecture -- but only poorly. Students are expected to explore more advanced and powerful methods on their own initiative (helpful hints will be given). Since this is the first time that I offer this separate graduate lab, we may change its makeup as we go along if we find that useful.
For instance, we might replace one miniproject with a serious probability theory crash course (super useful if you do real machine learning), or we might squeeze in seminar-style paper reading sessions. The project is a highly self-steered project comparable to a BSc thesis. Topics come from ongoing research in my group and are agreed on a case-by-case basis.
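To make the under- vs. overfitting problem mentioned above concrete: fitting polynomials of growing degree to noisy data shows how training error keeps dropping while test error does not. This is not part of the course materials (which use Matlab); it is a minimal Python/NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Underlying function the noisy training data is sampled from.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 20)
y_train = true_fn(x_train) + 0.2 * rng.standard_normal(x_train.size)
x_test = np.linspace(0, 1, 200)  # dense, noise-free grid for the test error

def fit_and_errors(degree):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - true_fn(x_test)) ** 2)
    return train_mse, test_mse

# Training error always drops as the degree grows; test error does not.
for d in (1, 3, 9):
    tr, te = fit_and_errors(d)
    print(f"degree {d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

Degree 1 underfits (it cannot represent the sine), while a high degree chases the training noise; cross-validation, covered in the lecture, is the standard tool for picking the model complexity in between.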
Lecture notes. I will start from the existing ML lecture notes which I wrote for a graduate lecture (this year ML is taught as an undergraduate lecture for the first time). As the semester unfolds I will probably distil a new set of lecture notes. You can download the lecture notes -- as far as I have adapted them -- here (latest update: Sept 25, 22:30). Section 3 of the lecture notes is here (latest update: Oct 1). Section 4 is here. Section 10 on hidden Markov models is here (new version from Nov 8: inserted new subsection 10.5 on the ML estimation principle). If you want to inspect the full-scope (graduate) lecture notes, you can find them here.
Grading and exams. For the lecture, the course grade is computed from classroom participation (10%), homework (30%), the midterm (25%), and the final exam (35%). The lab grade will be based on two miniproject reports (50% each); if we change the lab makeup, this will be adapted. The project grade is based on the project proposal (30%) and the final report (70%).
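For concreteness, the lecture grade formula above amounts to a weighted average of the four components. A minimal sketch (the example scores are hypothetical, on a 0-100 scale):

```python
# Lecture grade weights as stated above.
weights = {"participation": 0.10, "homework": 0.30, "midterm": 0.25, "final": 0.35}

def lecture_grade(scores):
    """Weighted average of the four component scores (each on a 0-100 scale)."""
    assert set(scores) == set(weights), "one score per graded component"
    return sum(weights[k] * scores[k] for k in weights)

# Hypothetical example scores:
print(lecture_grade({"participation": 90, "homework": 80, "midterm": 75, "final": 85}))
```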
Slides of a Machine Learning Course given at the "Interdisciplinary College" 2006
Slides of a Neural Network Course (23 MB) given at the "Interdisciplinary College" 2008
A condensed primer on measure theory and probability theory, by Manjunath Gandhi
An online textbook on probability theory (by Rick Durrett)
Hints for writing good miniproject reports, or rather, for avoiding typical blunders
Schedule (for the lecture; will be filled with substance as the semester evolves)
|Sep 4||Introduction; course planning.|
|Sep 5||Introducing the digits example. Curse of dimensionality. Exercise sheet 1. Download the basic digits example Matlab routines. Return date for exercise 1: Sunday Sept 14, midnight... (disregard the Friday deadline stated on the exercise sheet)|
|Sep 11||Regression, time series prediction, Takens theorem.|
|Sep 12||Blackbox vs. analytical modeling, bias-variance dilemma, general ML terminology and abstract formulation of modeling task.|
|Sep 18||(double class) Cross-validation and regularization. Bayes theorem and optimal decision boundaries. Exercise sheet 2|
|Sep 19||no class|
|Sep 25||The concept of a random variable. Optimal decision boundaries 2.|
|Sep 26||Still optimal decision boundaries... Exercise sheet 3. Return date: Oct 9 (disregard return time stated on older version of sheet. Note that Exercise sheet 4 will be posted on Oct 2).|
|Oct 2||Linear discriminants and linear regression. Exercise sheet 4. New bonus scheme in new version of sheet 4 (updated Oct 14). The reference paper from Duin and Tax.|
|Oct 9||Generalized linear discriminants. Perceptrons. Miniproject 1 (for Lab only)|
|Oct 10||K-means clustering (Section 5 in LN)|
|Oct 16||Rehearsal, midterm exam preparation.|
|Oct 17||Midterm exam|
|Oct 23||Post-exam rehearsal. Introducing feedforward neural networks. Exercise sheet 5 and its data file xypoints.txt.|
|Oct 24||The principle of model optimization by gradient descent of the loss function.|
|Oct 30||The backpropagation algorithm.|
|Oct 31||A closer look at the bias-variance dilemma. New chapter 7 of LN|
|Nov 6||Introduction to HMMs: Markov chains. Miniproject 2 (Lab only). Exercise sheet 6 (Lecture); return date: Nov 20 (disregard return time stated on sheet).|
|Nov 7||Definition and basic properties of HMMs.|
|Nov 13||Parametric statistics, ML estimation principle, EM principle. Exercise sheet 7 (Lecture)|
|Nov 14||Basic HMM inference algorithms and Baum-Welch learning algorithm. Bonus EM programming exercise package (optional, new submission deadline: Nov 7 midnight)|
|Nov 20||HMM: Baum-Welch learning algorithm|
|Nov 21||Bayesian Networks: Introduction. Lecture Note Chapter on Bayesian Networks|
|Nov 27||No class|
|Nov 28||Bayesian Networks: brute-force inference (cancelled)|
|Dec 9||Final exam (10:00, IRC conference hall)|
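One of the scheduled topics, k-means clustering (Oct 10, Section 5 in the lecture notes), can be sketched in a few lines. This is not the course's Matlab code; it is a minimal Python/NumPy illustration on synthetic data, with a deliberately deterministic initialization:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two well-separated synthetic 2-D clusters (50 points each).
data = np.vstack([
    rng.standard_normal((50, 2)),             # cluster around (0, 0)
    rng.standard_normal((50, 2)) + [10, 10],  # cluster around (10, 10)
])

def kmeans(X, init_centers, n_iter=20):
    """Plain k-means: alternate nearest-center assignment and mean update."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Deterministic init with one seed point per region, to keep the sketch
# reproducible; in practice one uses random restarts or k-means++.
centers, labels = kmeans(data, init_centers=[data[0], data[50]])
print(centers)
```

Note that k-means only finds a local optimum of the within-cluster squared distance: with an unlucky initialization (e.g. both seed points in the same cluster) it can converge to a poor solution, which is why restarts matter.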
Information for the Probability Theory Tutorial
Highly recommended (though not mandatory) homework for our meeting on October 9th: here, exercises 3, 5, 7.
The online lecture notes are self-contained, and no further literature is necessary for this course. However, if you want to study some topics in more depth, the following are recommended references.
Bishop, Christopher M.: Neural Networks for Pattern Recognition (Oxford Univ. Press, 1995). IRC: QA76.87 .B574 1995. The main course reference (beyond the online lecture notes).
Bishop, Christopher M.: Pattern Recognition and Machine Learning. Springer Verlag, 2006. Much more up-to-date and comprehensive than the previously mentioned Bishop book, but I dare say too thick and advanced for an undergraduate course (730 pages) -- more like a handbook for practitioners. To find your way into ML, the older, slimmer Bishop book will work better.
Michie, D., Spiegelhalter, D.J., Taylor, C.C.: Machine Learning, Neural and Statistical Classification (1994) Free and online at http://www.amsta.leeds.ac.uk/~charles/statlog/ and at the course resource repository. A transparently written book, concentrating on classification. Good backup reading. Thanks to Mantas for pointing this out!
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edition (John Wiley, 2001). IRC: Q327 .D83 2001. Covers more than the Bishop book, more detailed and more mathematically oriented. Backup reference for the deep probers.
T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Verlag, 2001. IRC: Q325.75 .H37 2001. I found this book only recently and haven't studied it in detail -- it looks extremely well written, combining (statistical) maths with applications and principal methods of machine learning, full of illuminating color graphics. May become my favourite.
Farhang-Boroujeny, B.: Adaptive Filters, Theory and Applications (John Wiley, 1999). IRC: TK7872.F5 F37 1998. Some initial portions of this book describe online linear filtering with the LMS algorithm, which will possibly be covered in the course.
Mitchell, Tom M.: Machine Learning (McGraw-Hill, 1997). IRC: Q325.5 .M58 1997. More general and more comprehensive than the course, covers many branches of ML that are not treated in the course. Gives a good overview of the larger picture of ML.
Nabney, Ian T.: NETLAB: Algorithms for Pattern Recognition (Springer Verlag, 2001). IRC: TA1637 .N33 2002. A companion book to the Bishop book, concentrating on Matlab implementations of the main techniques described in the Bishop book. Matlab code is public and can be downloaded from http://www.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/