# Principles of Statistical Modeling, Spring 2018

#### Jacobs University Bremen, Spring 2018, Herbert Jaeger

**Classes**: Wed 9:45-11:00 (East Hall 4) and Fri 11:15-12:30 (East Hall 8)

**Tutorial session:** Tue 17:15-18:30, West Hall 4

**TAs**: Xu He (x.he at jacobs-university.de) and Tianlin Liu (t.liu at jacobs-university.de)

**Contents.** This course gives an introduction to the
basic concepts of statistical modeling. We bring together the two views
of statistics and of machine learning. While both traditions have
developed advanced statistical tools to analyse data, the fundamental
questions that are asked (and answered) differ. Stated briefly,
statisticians try to *answer specific, decision-relevant questions* on the basis of data, whereas machine learners aim at *modeling complex pieces of the world as accurately and comprehensively as possible*,
given data. Both views are important in the current rapid developments
in “Big Data” or “Data Analytics”. The course proceeds in four main
parts: (i) the fundamental concepts of statistical modeling: probability
spaces, observation spaces, random variables; (ii) a crash refresher on
basic mathematical formulas and laws; (iii) introduction to statistical
methods; (iv) introduction to methods of machine learning. The course
was developed jointly by a statistician (A. Wilhelm) and a machine
learner (H. Jaeger), and will be enriched by many examples, exercises
and miniprojects.

**Lecture notes** are here.
This is a new version (0.1, April 17) that combines in one
manuscript the parts that were previously distributed as separate documents.

**Homework.** There will be two kinds of homework, which are treated quite differently.

**A. Paper-and-pencil problems.**
These homeworks give an opportunity to exercise the theoretical
concepts introduced in the lecture. They will not be checked
or graded, and doing them is not mandatory. Instead, the problems will
be discussed and solved in the weekly tutorial sessions held by the
TAs. Model solutions will be put online a week after the problem
sheets are issued.

**B. Programming miniprojects.** The other type of
homework comes in the form of small programming projects.
Students work in teams of two or three. Each team submits a single
solution by email to the TAs, consisting of the code and
documentation (a typeset PDF document, preferably generated with LaTeX;
other word processing software is allowed). These miniproject homeworks
will be graded. Programming can be done in Matlab or Python.

**Grading.** The course grade will be computed from the
following components: 1. three miniquizzes written in class (30 min), of
which the best two count, each contributing 20% toward the
course grade; 2. classroom presence, 10%; 3. programming homeworks, 20%;
4. final exam, 30%. All quizzes and exams are open-book.
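
The weighting above can be illustrated with a small Python sketch. It is only an illustration and not an official grading tool: the function name `course_grade` is hypothetical, and it assumes that every component is scored on a common 0-100 percentage scale, which the course page does not specify.

```python
def course_grade(quiz_scores, presence, programming, final_exam):
    """Combine component scores (each assumed to be on a 0-100 scale).

    Weights follow the grading rule stated above:
    best two of three miniquizzes at 20% each, classroom presence 10%,
    programming homeworks 20%, final exam 30%.
    """
    if len(quiz_scores) != 3:
        raise ValueError("expected exactly three miniquiz scores")

    # Only the best two miniquizzes count; the weakest one is dropped.
    best_two = sorted(quiz_scores, reverse=True)[:2]

    # Weights are kept as integer percentages and divided once at the end.
    weighted_sum = (
        20 * sum(best_two)
        + 10 * presence
        + 20 * programming
        + 30 * final_exam
    )
    return weighted_sum / 100


# Example with made-up scores: quizzes 50, 80, 90 (the 50 is dropped).
print(course_grade([50, 80, 90], presence=100, programming=70, final_exam=60))
```

The four weights sum to 100% (2 × 20% + 10% + 20% + 30%), so a student with full marks in every counted component reaches 100 under this scale assumption.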

**Schedule** (to be filled in agreement with the unfolding of reality)

| Date | Topics, reading, and handouts |
|---|---|
| Feb 2 | Introduction. |
| Feb 7 | Lots of examples for probability measurement scenarios. Reading: Lecture Notes (LN) Part 1, Section 2. Exercise sheet 1. |
| Feb 9 | Elementary events and random variables. Reading: LN Section 3. |
| Feb 14 | Operations on RVs 1: products and projections. Reading: LN Section 4.1 and Appendix A. |
| Feb 16 | Operations on RVs 2: transformations of RVs. Modeling time series data by RVs. Reading: LN Sections 4.2 and 5. Exercise sheet 2. |
| Feb 21 | Events and sigma-fields. Reading: LN Section 7.1, 7.2 up to (excluding) Theorem 3. |
| Feb 23 | More on sigma-fields. The Borel sigma-field. Generating sigma-fields. Reading: LN 7.2, to its end. Exercise sheet 3. |
| Feb 28 | Measurable functions. Observing structure through the structure of observations. Reading: LN 7, complete. |
| Mar 2 | The full picture: probability spaces. Notation: how to correctly write down probability statements. Reading: LN Section 8. Exercise sheet 4. |
| Mar 7 | Conditional probability. Exercise sheet 5. |
| Mar 9 | No class. |
| Mar 14 | No class. |
| Mar 15 | Miniquiz 1. 19:00, CNLH. |
| Mar 16 | Bayes' formula. Samples. Reading: LN Section 9. Exercise sheet 6. |
| Mar 21 | Estimators. Distributions. Representing distributions. Marginals. Reading: LN Section 9 to end, LN Section 10. |
| Mar 23 | Expectation, variance, covariance, moments. Reading: LN Section 11. Exercise sheet 7. |
| Apr 4 | Independence. Markov Chains I. Reading: LN Sections 12 and 13. Exercise sheet 8. |
| Apr 6 | Markov Chains II. A glimpse of hidden Markov models. |
| Apr 11 | A glimpse of Bayesian model estimation. Reading: LN Section 16 (we skip Sections 14 and 15). |
| Apr 13 | Some widely used distributions. Reading: LN Section 17. Solutions to sheet 8. Exercise sheet 9. An old final exam. |
| Apr 18 | Uses of probability theory in the natural sciences and in signal processing & control. |
| Apr 19 | Miniquiz 2. 19:00, CNLH. |
| Apr 20 | Quiz outcome and discussion. Exercise sheet 10. |
| Apr 25 | Part II: statistics. Introduction. Statistics: formalization of the statistical problem. |
| Apr 27 | 9:45: extra tutorial, likely in West Hall 4. |
| Apr 27 | Statistical procedures. |
| May 2 | Remaining sessions not documented since the website was hacked. |
| May 4 | 9:45: extra tutorial. |
| May 4 | |
| May 9 | |
| May 11 | 9:45: extra tutorial. |
| May 11 | |
| May 16 | |
| May 22 | 12:30-14:30: final exam, CNLH (pre-announcement, needs confirmation). |