Principles of Statistical Modeling, Spring 2019

Jacobs University Bremen, Spring 2019, Herbert Jaeger

Classes: Wed 9:45-11:00 and Fri 11:15-12:30, Lecture Hall Research III

Tutorial sessions: t.b.a.

TA: Tianlin Liu (t.liu at

Contents. This course gives an introduction to the basic concepts of statistical modeling. We bring together the two views of statistics and of machine learning. While both traditions have developed advanced statistical tools to analyse data, the fundamental questions that are asked (and answered) differ. Stated briefly, statisticians try to answer specific, decision-relevant questions on the basis of data, whereas machine learners aim at modeling complex pieces of the world as accurately and comprehensively as possible, given data. Both views are important for the current rapid developments in “Big Data” or “Data Analytics”. The course proceeds in four main parts: (i) the fundamental concepts of statistical modeling: probability spaces, observation spaces, random variables; (ii) a crash refresher on basic mathematical formulas and laws; (iii) introduction to statistical methods; (iv) introduction to methods of machine learning. The course was developed jointly by a statistician (A. Wilhelm) and a machine learner (H. Jaeger), and will be enriched by many examples, exercises and miniprojects.

Lecture notes are here.

Homework. There will be two kinds of homework, which are treated quite differently.

A. Standard paper-and-pencil problems. These give an opportunity to exercise the theoretical concepts introduced in the lecture. They will not be checked or graded, and doing them is not mandatory. Instead, the problems will be discussed and solved step by step in weekly tutorial sessions held by the TA. Model solutions will be put online a week after the problem sheets are issued.

B. Modeling miniprojects. The other type of homework will come as one (or two, if time permits) miniproject(s), issued in the last month of the course (since only then will you have mastered the necessary techniques). These miniprojects are graded. In each miniproject you will be given a real-world dataset, and your task will be to carry out an elementary statistical analysis of basic characteristics of the data. The challenge: you will be asked to write a short report, which will be graded based on correct use of concepts and terminology, clean formalism, and clarity (instructive graphics are desirable). Typesetting preferably in LaTeX.

Grading. The course grade will be computed from the following components:
1. three miniquizzes written in class (30 min), of which the best two count, each contributing 20% toward the course grade;
2. classroom presence 10%;
3. miniproject homeworks 20%;
4. final exam 30%.
All quizzes and exams are open-book.
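
The weighting above can be sketched as a small calculation. This is only an illustration, not the official grading procedure: it assumes every component is scored on a 0-100 percentage scale (the scale is not stated here), and the function name `course_grade` and its parameters are hypothetical.

```python
def course_grade(quiz_scores, presence, miniprojects, final_exam):
    """Combine component scores into a course percentage.

    Assumption: every score is a percentage between 0 and 100.
    Weights follow the grading scheme: best two of three quizzes
    at 20% each, presence 10%, miniprojects 20%, final exam 30%.
    """
    if len(quiz_scores) != 3:
        raise ValueError("expected exactly three miniquiz scores")

    # Keep only the best two quizzes; the weakest one is dropped.
    best_two = sorted(quiz_scores, reverse=True)[:2]

    # Integer weights in percent, divided once at the end.
    weighted_sum = (
        20 * sum(best_two)
        + 10 * presence
        + 20 * miniprojects
        + 30 * final_exam
    )
    return weighted_sum / 100


if __name__ == "__main__":
    # Hypothetical student: the quiz scored 50 is dropped.
    print(course_grade([50, 80, 90], presence=100, miniprojects=70, final_exam=60))
```

Because the weights add up to 100%, a student with full marks in every component reaches 100, and the weakest quiz has no influence on the result.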

Schedule (to be filled in agreement with the unfolding of reality)

Feb 6 Introduction
Feb 8 Lots of examples for probability measurement scenarios. Reading: Lecture Notes Section 1. Exercise sheet 1 Solutions
Feb 13 Elementary events and random variables. Reading: LN Section 3
Feb 15 Operations on RVs 1: products and projections. Modeling time series data by RVs. Reading: LN Section 3.1 and Appendix A. Note: the originally published LN version had typographic errors in Appendix A. These have been corrected; if you download the new version you will find a more readable Appendix A. Exercise sheet 2
Feb 20 Operations on RVs 2: transformations of RVs. Reading: LN Sections 3.2 and 4.
Feb 22 Events and sigma-fields. Reading: LN Section 6.1, 6.2 up to (excluding) Theorem 6.2.1.
Feb 27
Mar 1
Mar 6
Mar 8
Mar 13
Mar 15
Mar 20
Mar 22
Mar 27
Mar 29
Apr 3
Apr 5
Apr 10
Apr 12
Apr 24
Apr 26
May 3
May 8
May 10
May 15
Apr 17