**Times and locations.** Lecture: Tue 17:15 - 18:30 and Thu 17:15 - 18:30. Tutorial: Thu 14:15 - 15:30 and Fri 14:15 - 15:30. Location: West Hall 8 (at all times).

**Theme.** The lecture (and its lab) offers an introduction to a number of methods from modern machine learning – more specifically, to *unsupervised* learning methods. This part of machine learning has deep and fascinating relationships with modern statistical physics, from which some beautiful and powerful computational methods derive. This complements the material covered in the "Advanced Machine Learning" lecture, which I offer in alternating years and which concerns *supervised* learning methods. Taking both lectures in two successive years will give you fairly substantial coverage of the important branches of machine learning.

The whole thing is organized as a lecture+lab twin, where technically each is a Jacobs course and each is a corequisite to the other – that is, you take both or none. We will be flexible in allocating time to either. The main idea is:

- The lecture covers material in the traditional style of a taught lecture, with paper homeworks and written exams.
- The lab is itself split into two equally weighted parts:
  - A suite of two (not-so-)mini projects, where you will design and implement machine learning solutions to some challenging problems. While the lecture contents provide you with the requisite know-how, this still leaves you much room for your own experimentation (if you really want to have super solutions). Grading of this component derives from your project documentation.
  - A mandatory, one-session-per-week math primer, where you will get a refresher (or starter) on probability theory and moderately advanced linear algebra – the two workhorses of machine learning. Grading of this component comes from written exams; for organisational simplicity, the exams written in the lecture will contain a math component for this lab part.

**Grading and exams.** There will be a single joint grade for the lecture + lab twin, computed from homeworks (15%), miniprojects (45%), the midterm (15%), and the final exam (25%). The miniprojects can earn you extra bonus points, as explained in the miniproject spec sheets; these bonus points are added to the final course percentage score. The Jacobs cheating rules apply: if cheating is detected, the homework / exam will be nulled (both for the donor and the recipient of the shared information) and the incident reported to the registrar (this has never happened so far in my graduate courses – about which I am very happy). Late submission of miniproject and homework deliverables will be penalized with a 10-point deduction per day.
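As a worked illustration of the weighting scheme above, here is a minimal sketch of how the joint grade combines the component scores. The function name and the example scores are made up for illustration; only the weights, the bonus-point rule, and the 0-100 scale come from the description above.

```python
def course_grade(homework, miniprojects, midterm, final, bonus=0.0):
    """Combine component scores (each on a 0-100 scale) using the
    course weights: homeworks 15%, miniprojects 45%, midterm 15%,
    final exam 25%. Bonus points from the miniprojects are added
    to the resulting percentage score."""
    weighted = (0.15 * homework + 0.45 * miniprojects
                + 0.15 * midterm + 0.25 * final)
    return weighted + bonus

# Example: the heavy miniproject weight means strong project work
# can compensate for a weaker midterm.
print(course_grade(homework=80, miniprojects=90, midterm=60, final=75))
```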

**Lecture notes and auxiliary materials.** A fully self-contained set of lecture notes is here (16 MB; new version with corrections in the Boltzmann machine part posted Nov 17).

Further materials:

- A condensed probability primer (**typo-corrected version 02.12.2012**) by Manjunath Gandhi
- Hints for writing good miniproject reports
- For your exam preparation: solved AlgStatMod final exam 2010, final exam 2007, final exam 2005, midterm 2010, midterm 2010 probability theory part, midterm 2007

**Schedule**

The schedule will be filled in sync with reality as we go along.

| Date | Topic / Materials |
| --- | --- |
| Sep 5 | Introduction; course planning |
| Sep 6 (lecture) | Multivariate Gaussians. Lecture notes: Section 3.1 |
| Sep 11 (lecture) | Multivariate Gaussians. Lecture notes: Section 3.1. Reading: lecture notes Section 3.1 up to (excluding) "Further properties of multivariate Gaussians" |
| Sep 13 (lecture) | Mixtures of Gaussians. Reading: lecture notes Section 3.2 up to (excluding) "The Expectation-Maximization algorithm". Exercise sheet 1; gradedata.txt; dataPreparation.m |
| Sep 13 (tutorial) | Introductory remarks; sigma-algebras; measurable spaces |
| Sep 14 (tutorial) | Sigma-algebras; measurable functions |
| Sep 18 (lecture) | Learning MoGs with the EM algorithm. Introduction to miniproject 1. MP1 task sheet |
| Sep 20 (tutorial) | Measurable functions (contd.); the need for the notion of the Lebesgue integral. Tutorial homework 1 |
| Sep 20 (lecture) | Parzen windows |
| Sep 21 (tutorial) | Approximation of real functions by simple functions; definition of the Lebesgue integral of non-negative functions |
| Sep 25 (tutorial) | Lebesgue integral of real-valued functions; definition of the Lebesgue measure |
| Sep 25 (lecture) | Sampling: general principles; sampling by transformation; rejection sampling. Exercise sheet 2 (lecture) |
| Sep 26 (tutorial) | Lebesgue measure (continued); distribution of a random variable; cumulative distribution function |
| Oct 2 (lecture) | Basics of Markov processes |
| Oct 4 (tutorial) | Probability density function: definition and existence |
| Oct 4 (lecture) | Basics of MCMC; Gibbs sampler. Exercise sheet 3 (lecture) |
| Oct 5 (tutorial) | -------- |
| Oct 9 (lecture) | The Metropolis sampler. Reading for Oct 11: Section 4.8 from the lecture notes. The original paper is here |
| Oct 11 (tutorial) | Probability density function (contd.); random vectors and the need for joint distributions |
| Oct 11 (lecture) | Sampling application example: reconstructing phylogenetic trees. Exercise sheet 4 (lecture); exercise4Sampling.zip |
| Oct 12 (tutorial) | Joint distribution and CDF of random vectors. Tutorial homework 2 |
| Oct 16 (lecture) | Simulated annealing I |
| Oct 18 (tutorial) | Joint densities; mean, variance, covariance of two random variables |
| Oct 18 (lecture) | Simulated annealing II: free energy |
| Oct 25 (lecture) | Covariance and linear dependence between random variables; independent random variables |
| Nov 1 (tutorial) | Cancelled due to midterm |
| Nov 1 (lecture) | Midterm exam (tutorial portion: Chapter 1 to Chapter 6, inclusive, in the primer). Miniproject 2 task sheet; miniproject 2 materials zip-file |
| Nov 6 (lecture) | Ising models, energy modeling; Boltzmann machine intro |
| Nov 8 (tutorial) | Independence |
| Nov 8 (lecture) | Boltzmann machine |
| Nov 13 (lecture) | Restricted Boltzmann machine; deep belief networks. Exercise sheet 5 (lecture) |
| Nov 15 (tutorial) | Stochastic processes; convergence in probability; weak law of large numbers |
| Nov 22 (tutorial) | Stochastic convergence |
| Nov 22 (lecture) | Undirected graphical models and Markov random fields. Exercise sheet 6 (lecture) |
| Nov 27 (lecture) | MRFs: image processing applications. Join tree algorithm: constructing the join tree |
| Nov 29 (tutorial) | Central limit theorem; stationary processes. (Tutorial primer version with corrected typos updated) |
| Nov 29 (lecture) | Computing inferences in join trees |
| Dec 11 | Final exam. Tentative time: 17:15. Tutorial exam solutions |

**References (additional background reading, not required)**

Bishop, Christopher M.: *Pattern Recognition and Machine Learning.* Springer Verlag, New York, 2006. Online copy available through IRC. *A thick book on what its title says, with a particularly strong and long chapter on Bayesian networks of various shades.*

Neal, Radford M. Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, Dpt. of CS, Univ. Toronto, 1993. Local online version

Mitchell, Tom M.: *Machine Learning* (McGraw-Hill, 1997) IRC: Q325.5 .M58. *Chapter 6 gives a concise intro to Bayesian inference and elementary Bayesian networks.*

Duda, R.O., Hart, P.E., Stork, D.G.: *Pattern Classification*, 2nd edition (John Wiley, 2001). IRC: Q327.D83. *Chapter 3 presents foundations of Bayesian vs. maximum likelihood estimation, including an intro to hidden Markov models and the EM algorithm – just like chapter 6 of the Mitchell book.*

Durbin, R. et al.: *Biological sequence analysis: probabilistic models of proteins and nucleic acids.* Cambridge University Press, 1998. QP620 .B576 1998. *Good description of some sampling techniques, with interesting applications in biosequence modelling.*

S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi (1983): *Optimization by Simulated Annealing.* Science, Vol. 220, Nr. 4598, 671-680. *A classical (by now), very readable paper introducing simulated annealing. Includes a beautiful demonstration of simulated annealing in a computer layout optimization task.* Local online copy

Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: A Learning Algorithm for Boltzmann Machines. Cognitive Science 9 (1985), 147-169. *Another classic, introducing a neural network model of an associative error-correcting memory, based on the Boltzmann distribution.* Local online copy

Huang, C., Darwiche, A.: Inference in Belief Networks: A Procedural Guide. Int. J. of Approximate Reasoning 11 (1994), p. 158ff. *This is the tutorial text on which much of the algorithmic outline of the Bayesian network chapter in my lecture notes is based.* Local online copy