Jacobs University Bremen, Spring 2019, Herbert Jaeger
Class sessions: Mondays 8:15-9:45 (Lecture Hall Res. III) and Wednesdays 8:15-9:45 (Lecture Hall Res. III)
Tutorial sessions: Tuesdays 17:15-18:30, West Hall 4.
TAs: Steven Abreu (s.abreu at jacobs-university.de) and Tianlin Liu (t.liu at jacobs-university.de)
Course description. Machine learning (ML) is all about algorithms which are fed with (large quantities of) real-world data and which return a compressed "model" of the data. An example is the "world model" of a robot: the input data are sensor data streams, from which the robot learns a model of its environment -- needed, for instance, for navigation. Another example is a spoken language model: the input data are speech recordings, from which ML methods build a model of spoken English -- useful, for instance, in automated speech recognition systems. There is a large number of formalisms in which such models can be cast, and an equally large diversity of learning algorithms. However, there is a relatively small number of fundamental challenges that are common to all of these formalisms and algorithms: most notably, the "curse of dimensionality" and the almost deadly dangerous problem of under- vs. overfitting. This lecture introduces these fundamental concepts and illustrates them with a selection of elementary model formalisms (linear classifiers and regressors, radial basis function networks, clustering, mixtures of Gaussians, Parzen windows). Furthermore, the course provides a refresher of the requisite concepts from probability theory, statistics, and linear algebra.
Homework. There will be two kinds of homework, which are treated quite differently.
A. Paper-and-pencil problems. These homeworks give an opportunity to exercise the theoretical concepts introduced in the lecture. They will not be checked or graded, and doing them is not mandatory. Instead, the problems will be discussed and worked through in the weekly tutorial sessions held by the TAs. Model solutions will be put online a week after the problem sheets are issued.
B. Programming miniprojects. The other type of homework comes in the form of small machine learning programming projects. Students work in teams of two, and each team submits a single solution by email to the TAs. A solution consists of the code and its documentation (a typeset PDF document, preferably generated with LaTeX; other word-processing software is allowed). These miniproject homeworks will be graded. Programming can be done in Matlab or Python.
Grading and exams: The final course grade will be composed of the programming homeworks (20%), quizzes (50%), and a final exam (30%). There will be three quizzes (written in class, 30 minutes each), of which the best two will each account for 25% of the final grade (the worst one is dropped). All quizzes and the final exam are open book.
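As a worked illustration of the weighting above, here is a small Python sketch. The page does not state a scoring scale, so it assumes every component is scored on a 0-100 scale. The function name `final_grade` is hypothetical and not part of any course material.

```python
def final_grade(homework, quizzes, exam):
    """Combine component scores (assumed 0-100 scale) into a final course grade.

    Weights as stated above: programming homework 20%, the best two of the
    three quizzes 25% each (the worst quiz is dropped), final exam 30%.
    """
    if len(quizzes) != 3:
        raise ValueError("expected exactly three quiz scores")
    # Sort quiz scores from best to worst and keep only the best two.
    best, second = sorted(quizzes, reverse=True)[:2]
    return 0.20 * homework + 0.25 * best + 0.25 * second + 0.30 * exam


print(final_grade(80, [70, 90, 40], 60))
```

For example, homework 80, quizzes 70, 90 and 40, and exam 60 give 0.20*80 + 0.25*90 + 0.25*70 + 0.30*60 = 16 + 22.5 + 17.5 + 18 = 74. The quiz with 40 points is the one dropped. A quiz missed without an excuse enters with 0 points and is therefore usually the dropped one.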
Quiz makeup rules: if a quiz is missed without an excuse, it will be graded with 0 points. For quizzes missed with a medical excuse according to the Jacobs rules, one makeup will be offered soon after the quiz (in particular, the medical excuse must be announced to me before the quiz). Non-medical excuses may be accepted and makeups arranged on a case-by-case basis. If the first makeup is likewise missed for medical reasons, similar rules apply for admission to a second makeup (the medical excuse must be announced to me before the makeup). This second makeup consists of sitting the quiz in next year's edition of this course; alternatively, the student may opt to have the final exam grade also counted as the grade for the quiz.
Fully self-contained lecture notes are available online (version 1.11, last updated Mar 5; change: typo in Appendix A (eqn. 74) corrected).
Schedule (this will be filled in synchrony with reality as we go along)
|Feb 11||Introducing the TICS example. Continuous <-> discrete data transformations. Reading: Sections 2.1, 2.2 in the lecture notes|
|Feb 13||A quick recap of basic concepts from probability theory. Reading: Appendix A in the lecture notes. Exercise sheet 1 / Solutions|
|Feb 18||The curse of dimensionality and the concept of manifolds in high-dimensional vector spaces. Reading: LN Section 2.3|
|Feb 20||The field of ML: overview and navigation guide. Reading: LN Section 3. Exercise sheet 2 / Solutions|
|Feb 25||Basics of pattern classification. A look in passing at decision trees. Optimal decision boundaries. Reading: LN Section 4|
|Feb 27||Dimension reduction through vector quantization: K-means clustering. Reading: LN Section 5.1. Exercise sheet 3 / Solutions. Miniproject 1: the first programming miniproject (this will be graded!)|
|Mar 4||Principal Component Analysis (PCA): the principle. Reading: LN Section 5.2|
|Mar 6||PCA: mathematical properties, algorithm. Eigendigits etc. Reading: LN Sections 5.3, 5.4, 5.5. Exercise sheet 4 (paper-and-pencil exercises, not to be handed in, not graded) / Solutions. Also, at noontime: first miniquiz, 12:45-13:15, CNLH|
|Mar 11||Linear regression, part 1. Reading: LN Section 6.1, and Section 6.2 up to and including Equation 19|
|Mar 13||Linear regression, part 2. Reading: LN Section 6, complete|
|Mar 18||A probability refresher: expectation, variance, covariance. Training and testing errors. Reading: LN Appendix D and LN Section 7.1. Exercise sheet 5 / Solutions|
|Mar 20||The problem of overfitting. Reading: LN Section 7.2|
|Mar 25||Supervised learning: formal theory. Risk minimization through adapting model size. Reading: LN Sections 7.3, 7.4|
|Mar 27||Cross-validation: the key to everybody's success in ML. Reading: LN Section 7.5. Exercise sheet 6 / Solutions. Miniproject 2: the second programming miniproject (this will be graded!). Also, at noontime: second miniquiz, 12:45-13:15, CNLH|
|Apr 1||Using regularization to fight overfitting. Ridge regression. Reading: LN Sections 7.6, 7.7|
|Apr 3||Why it's called the Bias-Variance Dilemma. Reading: LN Section 7.8. Exercise sheet 7 / Solutions|
|Apr 8||Neural networks: introduction. Historical forefather: the perceptron. No reading. Slides|
|Apr 10||A zoo of neural networks -- "connectionist", associative memory networks, Boltzmann machines, spiking. No reading.|
|Apr 24||The multilayer perceptron: architecture. Universal approximation property. General training schema. Reading: LN Sections 8.2, 8.3|
|May 23||Final exam, 9:00-11:00, SSC Halls 3 and 4|
The online lecture notes are self-contained, and no further literature is necessary for this course. However, if you want to study some topics in more depth, the following are recommended references.
Bishop, Christopher M.: Neural Networks for Pattern Recognition (Oxford Univ. Press, 1995). IRC: QA76.87 .B574 1995. A recommended basic reference (beyond the online lecture notes).
Bishop, Christopher M.: Pattern Recognition and Machine Learning (Springer Verlag, 2006). Much more up-to-date and comprehensive than the previously mentioned Bishop book, but I dare say too thick and advanced for an undergraduate course (730 pages); more like a handbook for practitioners. To find your way into ML, the older, slimmer Bishop book will work better.
Michie, D., Spiegelhalter, D.J., Taylor, C.C.: Machine Learning, Neural and Statistical Classification (1994). Free and online at http://www.amsta.leeds.ac.uk/~charles/statlog/ and at the course resource repository. A transparently written book, concentrating on classification. Good backup reading. Thanks to Mantas for pointing this out!
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edition (John Wiley, 2001). IRC: Q327 .D83 2001. Covers more than the Bishop book and is more detailed and more mathematically oriented. Backup reference for the deep probers.
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Verlag, 2001). IRC: Q325.75 .H37 2001. I have found this book only recently and haven't studied it in detail; it looks extremely well written, combining (statistical) maths with applications and principal methods of machine learning, and is full of illuminating color graphics. May become my favourite.
Farhang-Boroujeny, B.: Adaptive Filters: Theory and Applications (John Wiley, 1999). IRC: TK7872.F5 F37 1998. Some initial portions of this book describe online linear filtering with the LMS algorithm, which will possibly be covered in the course.
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning (MIT Press, 2016). A legal online version is available. The "bible" of deep learning.
Mitchell, Tom M.: Machine Learning (McGraw-Hill, 1997). IRC: Q325.5 .M58 1997. More general and more comprehensive than the course; covers many branches of ML that are not treated in the course. Gives a good overview of the larger picture of ML.
Nabney, Ian T.: NETLAB: Algorithms for Pattern Recognition (Springer Verlag, 2001). IRC: TA1637 .N33 2002. A companion book to the Bishop book, concentrating on Matlab implementations of the main techniques described there. The Matlab code is public and can be downloaded from http://www.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/
Brownlee, J.: (author's own publication, online at the author's ML service portal). A decidedly user-friendly, hands-on introduction to linear algebra, targeting ML usage, with Python exercises.