# Principles of Statistical Modeling, Spring 2019

#### Jacobs University Bremen, Spring 2019, Herbert Jaeger

**Classes**: Mondays 9:45-11:00 and Wednesdays 9:45-11:00, both in Lecture Hall Research III.

**Tutorial sessions**: Mondays 15:45-17:00, LH Res. III

**TA**: Tianlin Liu (t.liu at jacobs-university.de)

**Contents.** This course gives an introduction to the
basic concepts of statistical modeling. We bring together the two views
of statistics and of machine learning. While both traditions have
developed advanced statistical tools to analyse data, the fundamental
questions that are asked (and answered) differ. Stated briefly,
statisticians try to *answer specific, decision-relevant questions* on the basis of data, whereas machine learners aim at *modeling complex pieces of the world as accurately and comprehensively as possible*,
given data. Both views are important in the current fast developments
in “Big Data” or “Data Analytics”. The course proceeds in four main
parts: (i) the fundamental concepts of statistical modeling: probability
spaces, observation spaces, random variables; (ii) a crash refresher on
basic mathematical formulas and laws; (iii) introduction to statistical
methods; (iv) introduction to methods of machine learning. The course
was developed jointly by a statistician (A. Wilhelm) and a machine
learner (H. Jaeger), and will be highly enriched by examples, exercises
and miniprojects.

**Lecture notes** are here (latest version from April 4, with new Section 18 added).

**Homework.** There will be two kinds of homework,
which are treated quite differently. **A. Standard
paper-and-pencil problems.** These homework sheets give an
opportunity to exercise the theoretical concepts introduced in the
lecture. They will not be checked or graded, and doing
them is not mandatory. Instead, the problems will be discussed and
worked through in the weekly tutorial sessions held by the TA. Model
solutions will be put online a week after each problem
sheet is issued. **B. Modeling miniprojects.** The other type of
homework will come as one miniproject (or two, if time permits),
issued in the last month of the course (since only then will you have
mastered the necessary techniques). These miniprojects are graded. In
these miniprojects you will be given a real-world dataset, and your
task will be to carry out an elementary statistical analysis of the basic
characteristics of these data. The challenge: you will be asked to
write a short report, which will be graded on correct use of
concepts and terminology, clean formalism, and clarity (instructive graphics are desirable). Typesetting preferably in LaTeX.

**Grading.** The course grade will be computed from the
following components: 1. three miniquizzes written in class (30 min each), of
which the best two will be taken, each counting 20% toward the
course grade; 2. classroom presence, 10%; 3. miniproject homework, 20%;
4. final exam, 30%. All quizzes and exams are open-book.
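The grading rule above can be sketched as a small calculation. This is only an illustration of the weighting scheme, not an official grading script; the function name and the example scores are made up.

```python
def course_grade(quiz_scores, presence, miniproject, final_exam):
    """Combine component scores (each in percent, 0-100) into a course grade.

    The best two of the three miniquiz scores count 20% each;
    classroom presence counts 10%, the miniproject homework 20%,
    and the final exam 30%.
    """
    best_two = sorted(quiz_scores, reverse=True)[:2]
    return (0.20 * best_two[0] + 0.20 * best_two[1]
            + 0.10 * presence + 0.20 * miniproject + 0.30 * final_exam)

# Hypothetical example: quiz scores 70, 85, 90 (the 70 is dropped).
print(course_grade([70, 85, 90], presence=100, miniproject=80, final_exam=75))
# → 83.5
```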

**The 2018 final exam** is here for your private study and preparation. The solutions are here.

**Schedule (to be filled in agreement with the unfolding of reality)**

Feb 6 | Introduction |

Feb 8 | Lots of examples of probability measurement scenarios. Reading: Lecture Notes Section 1. Exercise sheet 1 | Solutions |

Feb 13 | Elementary events and random variables. Reading: LN Section 3 |

Feb 15 | Operations on RVs 1: products and projections. Modeling time series data by RVs. Reading: LN Section 3.1 and Appendix A. Note: the originally published LN version contained typographic errors in Appendix A; these have been corrected, so the new version has a more readable Appendix A. Exercise sheet 2 | Solutions |

Feb 20 | Operations on RVs 2: transformations of RVs. Reading: LN Sections 3.2 and 4. |

Feb 22 | Events and sigma-fields. Reading: LN Section 6.1, 6.2 up to (including) Definition 6.2.1. |

Feb 25 | More on sigma-fields. The Borel sigma-field. Generating sigma-fields. Reading: LN 6.2, to its end. |

Feb 27 | Measurable functions. Observing world structure through the structure of observations. Reading: LN Section 6, complete. Exercise sheet 3 | Solutions |

Mar 4 | The full picture: probability spaces. Correct notation. Conditional probability. Reading: LN Sections 7.1-7.3. Exercise sheet 4 | Solutions |

Mar 6 | Bayes’ formula. The frequentist conception of probability. Reading: LN Section 7, complete. |

Mar 11 | Samples and estimators. Reading: LN Sections 8.1, 8.2, 8.3 |

Mar 13 | Miniquiz 1 (in class). Distributions. Reading: LN Sections 9.1, 9.2 |

Mar 18 | Probability mass functions and pdfs. Parametrized distributions. Expectation, variance, moments. Reading: LN Sections 9, 10. |

Mar 20 | Independence. Comments on causality vs. correlation vs. (in)dependence; and on the rare commodity of factorization of distributions. Reading: LN Section 11. |

Mar 25 | Markov chains. Reading: LN Section 12. |

Mar 27 | A glimpse of Bayesian model estimation. Reading: LN Section 15 (we skip Sections 13 and 14). Exercise sheet 5 | Solutions |

Apr 1 | Some classical, widely useful distributions. Reading: LN Section 16, up to (excluding) 16.2.3 |

Apr 3 | Miniquiz 2 (in class). The normal distribution. Reading: LN Section 16.2.3 |

Apr 8 | Painting the grand picture: probability everywhere, but different at different places. Reading: LN Section 17 |

Apr 10 | Miniquiz 2: discussion of solutions. Reading: LN Section 18. Note: Section 18 is newly added to the lecture notes; check out the latest release. |

Apr 24 | The Statistics way of thinking: Introduction. Reading: LN Section 19, up to (including) 19.1. PSM miniproject: task sheet | Essentials of Technical Writing |

Apr 29 | |

May 6 | |

May 8 | |

May 13 | |

May 15 | |

May 22 | 16:00-18:00, CNLH: Final Exam |