# Machine Learning, Spring 2019

#### Jacobs University Bremen, Spring 2019, Herbert Jaeger

**Class
sessions:** Mondays 8:15-9:45 (Lecture Hall Res. III) and
Wednesdays 8:15-9:45 (Lecture Hall Res. III)

**Tutorial sessions**: Tuesdays 17:15-18:30, West Hall 4.

**TAs**: Steven Abreu (s.abreu at
jacobs-university.de) and Tianlin Liu (t.liu at jacobs-university.de)

**Course description.** Machine learning (ML) is all about algorithms which are fed with (large quantities of) real-world data, and which return a compressed "model" of the data. An example is the "world model" of a robot: the input data are sensor data streams, from which the robot learns a model of its environment -- needed, for instance, for navigation. Another example is a spoken language model: the input data are speech recordings, from which ML methods build a model of spoken English -- useful, for instance, in automated speech recognition systems. There is a large number of formalisms in which such models can be cast, and an equally large diversity of learning algorithms. However, there is a relatively small number of fundamental challenges which are common to all of these formalisms and algorithms: most notably, the "curse of dimensionality" and the almost deadly dangerous problem of under- vs. overfitting. This course introduces such fundamental concepts and illustrates them with a selection of elementary model formalisms (linear classifiers and regressors, radial basis function networks, clustering, mixtures of Gaussians, Parzen windows). Furthermore, the course provides a refresher of the requisite concepts from probability theory, statistics, and linear algebra.

**Homework.** There will be two kinds of homework, which are treated quite differently.

**A. Paper-and-pencil problems.** These homeworks give an opportunity to exercise the theoretical concepts introduced in the lecture. They will not be checked or graded, and doing them is not mandatory. Instead, the problems will be discussed and worked through in the weekly tutorial sessions held by the TA. Model solutions will be put online a week after the problem sheets are issued.

**B. Programming miniprojects.** The other type of homework comes in the form of small machine learning programming projects. Students work in teams of two, each team submitting a single solution by email to the TA. A solution consists of the code and its documentation (a typeset PDF document, preferably generated with LaTeX; other word-processing software is allowed). These miniproject homeworks will be graded. Programming can be done in Matlab or Python.

**Grading and exams:** The final course grade will be composed of the programming homework (20%), quizzes (50%), and a final exam (30%). There will be three quizzes (written in class, 30 minutes); the best two will each account for 25% of the final grade, and the worst will be dropped. All quizzes and the final exam are open book.
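To make the weighting concrete, here is a minimal sketch of how the final grade could be computed from the stated percentages. It assumes that all component scores are on a common 0-100 scale and that the programming homework is already summarized as a single score; the page does not say how individual miniproject grades are combined. The function name `final_grade` is hypothetical, not an official grading tool.

```python
from typing import Sequence


def final_grade(homework: float, quizzes: Sequence[float], final_exam: float) -> float:
    """Weighted final grade on a 0-100 scale (hypothetical helper).

    homework:   combined score of the programming miniprojects, 0-100 (assumed)
    quizzes:    the three quiz scores, each 0-100
    final_exam: final exam score, 0-100
    """
    if len(quizzes) != 3:
        raise ValueError("expected exactly three quiz scores")
    # Drop the worst quiz; each of the two best quizzes counts 25%.
    best_two = sorted(quizzes, reverse=True)[:2]
    quiz_part = 0.25 * best_two[0] + 0.25 * best_two[1]
    # Homework 20% + quizzes 50% + final exam 30% = 100%.
    return 0.20 * homework + quiz_part + 0.30 * final_exam


# Homework 90, quizzes 60/80/70 (the 60 is dropped), final exam 75:
# 0.20*90 + 0.25*(80 + 70) + 0.30*75 = 18 + 37.5 + 22.5
print(final_grade(90, [60, 80, 70], 75))  # → 78.0
```

Under the makeup rules below, a student who opts to have the final exam grade count for a missed quiz would, in this sketch, simply enter the exam score as that quiz's value.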

**Quiz makeup rules:** If a quiz is missed without excuse, it will be graded with 0 points. For quizzes missed with a medical excuse according to the Jacobs rules, one makeup will be offered soon after the quiz (in particular, the medical excuse must be announced to me *before* the quiz). Non-medical excuses can be accepted and makeups arranged on a case-by-case basis. If the first makeup is likewise missed for medical reasons, similar rules apply for admission to a second makeup (the medical excuse must be announced to me before the makeup). The second makeup then consists of sitting the quiz in next year's edition of this course; alternatively, the student may opt to have the final exam grade also counted as the grade for this quiz.

**The 2018 final exam** for your private study and preparation is here, and the solutions are here.

**Fully self-contained lecture notes** are here (version 1.11, last update Mar 5, change: typo in Appendix A (eqn. 74) corrected).

**Schedule** (this will be filled in synchrony with reality as we go along)

| Date | Topic and reading | Materials and notes |
|---|---|---|
| Feb 6 | Introduction | |
| Feb 11 | Introducing the TICS example. Continuous <-> discrete data transformations. Reading: Sections 2.1, 2.2 in the lecture notes | |
| Feb 13 | A quick recap of basic concepts from probability theory. Reading: Appendix A in the lecture notes | Exercise sheet 1, Solutions |
| Feb 18 | The curse of dimensionality and the concept of manifolds in high-dimensional vector spaces. Reading: Section 2.3 | |
| Feb 20 | The field of ML: overview and navigation guide. Reading: Section 3 | Exercise sheet 2, Solutions |
| Feb 25 | Basics of pattern classification. A look in passing at decision trees. Optimal decision boundaries. Reading: Section 4 of LNs | |
| Feb 27 | Dimension reduction through vector quantization: K-means clustering. Reading: LN Section 5.1 | Exercise sheet 3, Solutions. Miniproject 1: the first programming miniproject (this will be graded!) |
| Mar 4 | Principal Component Analysis: principle. Reading: LN Section 5.2 | |
| Mar 6 | PCA: mathematical properties, algorithm. Eigendigits etc. Reading: LN Sections 5.3, 5.4, 5.5 | Exercise sheet 4 (paper-and-pencil exercise, not to be returned, not graded), Solutions. At noontime: first miniquiz, 12:45-13:15, CNLH |
| Mar 11 | Linear regression, part 1. Reading: LN Sections 6.1 and 6.2 (up to and including Equation 19) | |
| Mar 13 | Linear regression, part 2. Reading: LN Section 6, complete | |
| Mar 18 | A probability refresher: expectation, variance, covariance. Training and testing errors. Reading: LN Appendix D and LN Section 7.1 | Exercise sheet 5, Solutions |
| Mar 20 | The problem of overfitting. Reading: LN Section 7.2 | |
| Mar 25 | Supervised learning: formal theory. Risk minimization through adapting model size. Reading: LN Sections 7.3, 7.4 | |
| Mar 27 | Cross-validation: the key to everybody's success in ML. Reading: LN Section 7.5 | Exercise sheet 6, Solutions. Miniproject 2: the second programming miniproject (this will be graded!). At noontime: second miniquiz, 12:45-13:15, CNLH |
| Apr 1 | Using regularization to fight overfitting. Ridge regression. Reading: LN Sections 7.6, 7.7 | |
| Apr 3 | Why it's called the Bias-Variance Dilemma. Reading: LN Section 7.8 | Exercise sheet 7, Solutions |
| Apr 8 | Neural networks: introduction. Historical forefather: the perceptron. No reading. | Slides |
| Apr 10 | A zoo of neural networks: "connectionist", associative memory networks, Boltzmann machines, spiking. No reading. | |
| Apr 24 | The multilayer perceptron: architecture. Universal approximation property. General training schema. Reading: LN Sections 8.2, 8.3 | |
| Apr 29 | | |
| May 6 | | |
| May 8 | | |
| May 13 | | |
| May 15 | | |
| May 23 | Final exam, 9:00-11:00, SSC Hall 3 and 4 | |

**References**

*The online lecture notes are
self-contained, and no further literature is necessary for this
course. However, if you want to study some topics in more depth, the
following are recommended references.*

Bishop, Christopher M.: *Neural Networks for Pattern Recognition* (Oxford Univ. Press, 1995). IRC: QA76.87 .B574 1995. *A recommendable basic reference (beyond the online lecture notes).*

Bishop, Christopher M.: *Pattern Recognition and Machine Learning* (Springer Verlag, 2006). *Much more up-to-date and comprehensive than the previously mentioned Bishop book, but I dare say too thick and advanced for an undergraduate course (730 pages) -- more like a handbook for practitioners. To find your way into ML, the older, slimmer Bishop book will work better.*

Michie, D., Spiegelhalter, D.J., Taylor, C.C.: *Machine Learning, Neural and Statistical Classification* (1994). Free and online at http://www.amsta.leeds.ac.uk/~charles/statlog/ and at the course resource repository. *A transparently written book, concentrating on classification. Good backup reading. Thanks to Mantas for pointing this out!*

Duda, R.O., Hart, P.E., Stork, D.G.: *Pattern Classification*, 2nd edition (John Wiley, 2001). IRC: Q327 .D83 2001. *Covers more than the Bishop book; more detailed and more mathematically oriented. Backup reference for the deep probers.*

Hastie, T., Tibshirani, R., Friedman, J.: *The Elements of Statistical Learning: Data Mining, Inference, and Prediction* (Springer Verlag, 2001). IRC: Q325.75 .H37 2001. *I have found this book only recently and haven't studied it in detail, but it looks extremely well written, combining (statistical) maths with applications and principal methods of machine learning, full of illuminating color graphics. May become my favourite.*

Farhang-Boroujeny, B.: *Adaptive Filters, Theory and Applications* (John Wiley, 1999). IRC: TK7872.F5 F37 1998. *Some initial portions of this book describe online linear filtering with the LMS algorithm, which will possibly be covered in the course.*

Goodfellow, I., Bengio, Y., Courville, A.: *Deep Learning* (MIT Press, 2016). Legal online version available. *The "bible" of deep learning.*

Mitchell, Tom M.: *Machine Learning* (McGraw-Hill, 1997). IRC: Q325.5 .M58 1997. *More general and more comprehensive than the course; covers many branches of ML that are not treated in the course. Gives a good overview of the larger picture of ML.*

Nabney, Ian T.: *NETLAB: Algorithms for Pattern Recognition* (Springer Verlag, 2001). IRC: TA1637 .N33 2002. *A companion book to the Bishop book, concentrating on Matlab implementations of the main techniques described in the Bishop book.* Matlab code is public and can be downloaded from http://www.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/

Brownlee, J.: (author's own publication, online at author's ML service portal). *A decidedly user-friendly, hands-on intro to linear algebra, targeting ML usage, with Python exercises.*