Important Admin Note: This course is offered both as a third-year undergraduate specialization course and as a graduate course for the first semester of the Data Engineering Master program. Formally it is a graduate course. Third-year undergraduates can choose it as a 5-credit specialization area course and are explicitly invited to do so.
Classes: Monday and Tuesday 14:15 - 15:30, East Hall 8.
Contents. This course gives an introduction to the basic concepts of statistical modeling. We bring together the two views of statistics and of machine learning. While both traditions have developed advanced statistical tools to "analyse data", the fundamental questions that are asked (and answered) differ. Stated briefly, statisticians try to answer specific, decision-relevant questions on the basis of data, whereas machine learners aim at modeling complex pieces of the world as accurately and comprehensively as possible, given data. Both views are important in the current rapid developments in "Big Data" and "Data Analytics". The course proceeds in four main parts: (i) the fundamental concepts of statistical modeling: probability spaces, observation spaces, random variables; (ii) a crash refresher on basic mathematical formulas and laws; (iii) introduction to statistical methods (using the R programming language); (iv) introduction to methods of machine learning (using Matlab or Python). The course will be jointly taught by a statistician (A. Wilhelm) and a machine learner (H. Jaeger), and will be richly illustrated with examples, exercises and miniprojects.
The course grade will be computed from the following components: 1. four miniquizzes will be written (one at the end of each of our four theme blocks), of which the best three will be taken, each counting 15% toward the course grade; 2. classroom presence: 10%; 3. homework: 20%; 4. final exam: 25%.
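As a quick sanity check, the weights above sum to 100% (3 × 15 + 10 + 20 + 25). The scheme can be sketched in Python as follows; this is a minimal illustration only, and the function name and the example scores are made up:

```python
def course_grade(quizzes, presence, homework, final):
    """Weighted course grade on a 0-100 scale, per the scheme above:
    best three of four miniquizzes at 15% each, presence 10%,
    homework 20%, final exam 25%. All inputs are 0-100 scores."""
    best_three = sorted(quizzes, reverse=True)[:3]  # drop the worst quiz
    return (sum(0.15 * q for q in best_three)
            + 0.10 * presence
            + 0.20 * homework
            + 0.25 * final)

# Hypothetical example: quiz scores 80, 90, 60, 70 -> best three are 90, 80, 70
print(course_grade([80, 90, 60, 70], presence=100, homework=85, final=75))
```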
| Date | Topic |
|------|-------|
| Sep 7 | Data generating environments, data recording procedures, data value spaces -- examples. |
| Sep 8 | Universe, elementary events, RVs, products, stochastic processes. Exercise sheet 1 |
| Sep 14 | Events generated by RVs. Sigma-fields. |
| Sep 15 | Probability spaces. Notations for probabilities. Conditional probability. Exercise sheet 2 |
| Sep 21 | Miniquiz 1 (Room: CS lecture hall, Research 1). Samples. Distributions. |
| Sep 22 | Probability distributions. Probability mass functions, density functions, CDFs. Exercise sheet 3 |
| Sep 28, 13:00 - 14:15 | Characteristics of distributions, expected values, variances. |
| Sep 29 | Functions of random variables, more distributions and joint probability. Exercise sheet 4 |
| Oct 5, 13:00 - 14:15 | No class |
| Oct 6 | The statistical model; parametrisation and identifiability. Exercise sheet 5 |
| Oct 12 | The statistical problem |
| Oct 13 | Miniquiz 2 (Room: CS lecture hall, Research 1). Exercise sheet 6 |
| Oct 26 | Criteria for selecting statistical procedures: Bayes and minimax |
| Oct 27 | Criteria for selecting statistical procedures: unbiasedness and maximum likelihood. Exercise sheet 7 |
| Nov 2 | Hypothesis testing, the Neyman-Pearson lemma |
| Nov 3 | Best linear unbiased estimation, the general linear model. Exercise sheet 8; data for Exercise 8 |
| Nov 9 | Human vs. machine learning. An ML case study: generating captions for images. |
| Nov 10 | Case study continued. |
| Nov 16 | The themes and areas of ML. Classification tasks. |
| Nov 17 | Miniquiz 3 (Room: CS lecture hall, Research 1). Dimension reduction by features. Exercise 9 package |
| Nov 23 | PCA. The ugly face of overfitting. |
| Nov 24 | Cross-validation, regularization. And why it's called the bias-variance dilemma. Exercise 10 |
| Nov 30 | Multilayer perceptron: architecture and the backprop training algorithm. |
| Dec 1 | Miniquiz 4 (Room: CS lecture hall). A glimpse of "deep learning". Online course evaluation: please bring your computers. |
| Dec 7 | Wrap-up session. |
| Dec 14 | Final exam, 16:00 - 18:00, Conrad Naber Lecture Hall |