Model Selection and Prediction by Minimum Description Length
The Minimum Description Length (MDL) Principle is a method for inductive inference that has its origins in J. Rissanen's groundbreaking publications of the late 1970s and 1980s. Although MDL can be applied to general problems in statistics and machine learning, most of the specialized development and application has been in the areas of (statistical) model selection and sequential prediction. MDL starts by equating "learning from data" with "being able to compress the data". In a model selection and testing context, this leads to procedures that, in their simplest guise, behave very similarly to Bayesian model selection with "default" or "reference" priors; yet the interpretation of what happens is completely different.
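To make the compression view concrete, here is the standard correspondence from the MDL literature (a sketch in my own notation, not taken from the talk): by the Kraft inequality, every prefix code with length function $L$ corresponds to a (sub)probability distribution $P$ with $L(D) = -\log P(D)$, so minimizing code length and maximizing probability are two sides of the same coin. In its simplest, two-part form, MDL then selects the hypothesis
\[
H_{\mathrm{mdl}} \;=\; \arg\min_{H \in \mathcal{H}} \bigl[\, L(H) + L(D \mid H) \,\bigr],
\]
where $L(H)$ is the length of a code for the hypothesis itself and $L(D \mid H)$ is the length of a code for the data with the help of $H$.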
In the first hour of this talk I will outline, from scratch, the theory of 'universal data compression' and discuss the four most prevalent universal codes: normalized maximum likelihood (NML), Bayes, two-part, and 'prequential plug-in', showing how each of these gives rise to its own version of MDL model selection.
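As a hedged sketch of these four codes (standard definitions for discrete data; the notation $P_\theta$, maximum likelihood estimator $\hat\theta$, prior $w$, and data $x^n = x_1,\dots,x_n$ is my own, not from the talk):
\[
\begin{aligned}
\bar L_{\mathrm{nml}}(x^n) &= -\log \frac{P_{\hat\theta(x^n)}(x^n)}{\sum_{y^n} P_{\hat\theta(y^n)}(y^n)}, &
\bar L_{\mathrm{Bayes}}(x^n) &= -\log \int P_\theta(x^n)\, w(\theta)\, \mathrm{d}\theta, \\
\bar L_{\text{2-part}}(x^n) &= \min_{\theta}\, \bigl[\, L(\theta) - \log P_\theta(x^n) \,\bigr], &
\bar L_{\mathrm{plug\text{-}in}}(x^n) &= \sum_{i=1}^{n} -\log P_{\hat\theta(x^{i-1})}(x_i),
\end{aligned}
\]
where the plug-in code uses a suitable default estimate for the first few outcomes. In each case, MDL model selection picks the model whose universal code length for the observed data is smallest.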
In the second hour, I will discuss more advanced MDL methods such as the 'switch code', which achieves 'almost' the best properties of AIC and BIC at the same time; I will indicate new developments on 'safe' hypothesis tests; and I will outline the link between MDL and the theory of nonstochastic individual-sequence prediction, a popular topic in machine learning theory.
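For context on the AIC-BIC tension that the switch code addresses (standard definitions, not part of the talk itself): for a model with $k$ free parameters, maximized likelihood $\hat L$, and sample size $n$,
\[
\mathrm{AIC} = -2 \log \hat L + 2k, \qquad \mathrm{BIC} = -2 \log \hat L + k \log n .
\]
BIC-type selection is consistent (it eventually identifies the smallest correct model) but predicts at a suboptimal rate, while AIC-type selection achieves the optimal prediction rate but is inconsistent; the switch code is designed to get 'almost' both at once.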