Jacobs University Bremen, Spring 2018, Herbert Jaeger
Class sessions: Wednesdays 8:15 – 9:30 Thursdays 9:45-11:00, lecture hall in Res. 1
Tutorial sessions: Tuesdays 19:00 – 20:15, lecture hall in Res. 1
TAs: Tayyab Mateen (t.mateen at jacobs-university.de) and Tianlin Liu (t.liu at jacobs-university.de)
Course description. Machine learning (ML) is all about algorithms which are fed with (large quantities of) real-world data, and which return a compressed model' of the data. An example is the world model' of a robot: the input data are sensor data streams, from which the robot learns a model of its environment -- needed, for instance, for navigation. Another example is a spoken language model: the input data are speech recordings, from which ML methods build a model of spoken English -- useful, for instance, in automated speech recognition systems. There is a large number of formalisms in which such models can be cast, and an equally large diversity of learning algorithms. However, there is a relatively small number of fundamental challenges which are common to all of these formalisms and algorithms: most notably, the "curse of dimensionality'' and the almost deadly-dangerous problem of under- vs.\ overfitting. This lecture introduces such fundamental concepts and illustrates them with a choice of elementary model formalisms (linear classifiers and regressors, radial basis function networks, clustering, mixtures of Gaussians, Parzen windows). Furthermore, the course also provides a refresher of the requisite concepts from probability theory, statistics, and linear algebra.
Homework. There will be two kinds of homeworks, which are treated quite differently. A. Paper-and-pencil problems. These homeworks give an opportunity to exercise the theoretical concepts introduced in the lecture. These homeworks will not be checked or graded, and doing them is not mandatory. Instead, the problems will be discussed and show-solved in weekly tutorial sessions held by the TAs. Model solutions will be put online a week after issuing the problem sheets. B. Programming miniprojects. The other type of homework comes in the form of small-sized machine learning programming projects. Students work in teams of two, each team submitting a single solution, by email to the TAs, consisting of the code and a documentation (typeset pdf document, preferably generated in Latex, other word processing software allowed). These miniproject homeworks will be graded. Programming can be done in Matlab or Python.
Grading and exams: Grading and exams: The final course grade will be composed from programming homeworks (20%), presence sheets (10%), and quizzes/exams. There will be three miniquizzes (written in class, 30 minutes), the best two of which will each account to 20% of the final grade (worst will be dropped), and one final exam, counting 30%. All quizzes and the final exam are open book.
Fully self-contained lecture notes are here (Version 1.5, April 17, 2018. Change w.r.t. previous version: corrected a number of typos, also in formulas. Now checked up to (and including) Section 7).
Schedule (this will be filled in synchrony with reality as we go along)
|Introduction: What this course aims to achieve|
|Feb 7||Introducing the TICS example. Continuous -> discrete data transformations: binning and clustering. Reading: Sections 2.1, 2.2 in the lecture notes|
|Feb 8||Discrete -> continuous data transformations. A quick recap of basic concepts from probability theory. Reading: Appendix A in the lecture notes Exercise sheet 1|
|Feb 14||The curse of dimensionality and the concept of manifolds in high-dimensional vector spaces. Reading: Section 2.3|
|Feb 15||Data manifolds, continued. The field of ML: overview and navigation guide. Reading: Section 3 Exercise sheet 2|
|Feb 21||Basics of pattern classification. A look in passing at decision trees. Reading: Section 4 of LNs.|
|Feb 22||Optimal decision boundaries. Reading: Section 4 Exercise sheet 3|
|Feb 28||Optimal decision boundaries, finished. Dimension reduction by feature extraction. Reading: Section 5 up to (excluding) 5.1|
|Mar 1||K-means clustering. Reading: Section 5.1. Exercise sheet 4|
|Mar 6||19:00 CNLH: first miniquiz|
|Mar 7||Principal Component Analysis - principle Reading: LN Section 5.2|
|Mar 8||PCA - mathematical properties, algorithm. Reading: LN Section 5.3, 5.4. Exercise sheet 5|
|Mar 14||no class|
|Mar 15||PCA - mathematical properties. Eigendigits. Exercise sheet 6|
|Mar 21||Linear regresssion, part 1. reading: LN Section 6.1, 6.2 up to (including) Equation (19).|
|Mar 22||Linear regresssion, part 2. reading: LN Section 6, complete. Exercise sheet 7|
|Apr 4||Discussion of HW 5. Discussion of scientific writing ethics. The overfitting problem. Reading: LN Section 7.1|
|Apr 5|| |
Bias-variance dilemma: examples. A quick refresher on the concepts of random variables and expectation. Reading: LN Sections 7.2 -- 7.4. Exercise sheet 8 (Submission deadline corrected: Sunday April 15, 23:59 hrs)
|Apr 11||Bias-variance dilemma: formalization. One remedy: model size adaptation. Reading: LN Sections 7.3, 7.4|
|Apr 12||Cross-validation. Exercise sheet 9 Reading: LN Section 7.5|
|Apr 12||19:00 CNLH: second miniquiz|
|Apr 18||Regularization, ridge regression, and why it is called bias-variance dilemma. Reading: LN Sections 7.6, 7.7, 7.8|
|Apr 19||The multilayer perceptron. Architecture. (Slides from a 2008 Spring School course on neural networks, check out slides 44-84). Reading: LN Section 8.1 Exercise sheet 10|
|Apr 25||Universal approximation properties. Training: task definition. Reading: LN Section 8.2, 8.3|
|Apr 26||Training MLPs: general set-up. Optimizing empirical risk by gradient descent: general principle. Reading: LN Section 8.3, 8.4 Exercise sheet 11 (a substantial programming task, more than 2 weeks processing time - yet, start early!)|
|May 2||Remaining sessions undocumented since webpage was hacked|
|May 31||Final exam, 9:00-11:00, East Wing, IRC (pre-announcement, subject to confirmation)|
The online lecture notes are self-contained, and no further literature is necessary for this course. However, if you want to study some topics in more depth, the following are recommended references.
Bishop, Christopher M.: Neural Networks for Pattern Recognition (Oxford Univ. Press, 1995.) IRC: QA76.87 .B574 1995 A recommendable basic reference (beyond the online lecture notes)
Bishop, Christopher M.: Pattern Recognition and Machine Learning. Springer Verlag, 2006 Much more up-to-date and comprehensive than the previously mentioned Bishop book, but I dare say too thick and advanced for an undergraduate course (730 pages) -- more like a handbook for practicians. To find your way into ML, the older, slimmer Bishop book will work better.
Michie, D., Spiegelhalter, D.J., Taylor, C.C.: Machine Learning, Neural and Statistical Classification (1994) Free and online at http://www.amsta.leeds.ac.uk/~charles/statlog/ and at the course resource repository. A transparently written book, concentrating on classification. Good backup reading. Thanks to Mantas for pointing this out!
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edition (John Wiley, 2001) IRC: Q327 .D83 2001 Covers more than the Bishop book, more detailed and more mathematically oriented. Backup reference for the deep probers
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Verlag 2001. IRC: Q325.75 .H37 2001 I have found this book only recently and haven't studied it in detail – looks extremely well written, combining (statistical) maths with applications and principal methods of machine learning, full of illuminating color graphics. May become my favourite.
Farhang-Boroujeny, B.: Adaptive Filters, Theory and Applications (John Wiley, 1999). IRC: TK7872.F5 F37 1998 Some initial portions of this book describe online linear filtering with the LMS algorithm, which will possibly be covered in the course
Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT Press, 2016. Legal online version available. The "bible" of deep learning.
Mitchell, Tom M.: Machine Learning (McGraw-Hill, 1997) IRC: Q325.5 .M58 1997. More general and more comprehensive than the course, covers many branches of ML that are not treated in the course. Gives a good overview of the larger picture of ML
Nabney, Ian T.: NETLAB: Algorithms for Pattern Recognition (Springer Verlag, 2001). IRC: TA1637 .N33 2002. A companion book to the Bishop book, concentrating on Matlab implementations of the main techniques described in the Bishop book. Matlab code is public and can be downloaded from http://www.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/