Online EM algorithm for robust clustering: Application to Parkinson disease

Updated: 13 days ago
Location: Grenoble, Rhône-Alpes
Deadline: 15 Jan 2022

A popular way to approach clustering tasks is via a parametric mixture model. The vast majority of the work on such mixtures has been based on Gaussian mixture models. However, in some applications the tails of Gaussian distributions are shorter than appropriate, or parameter estimates are affected by atypical observations (outliers). To address this issue, mixtures of so-called multiple scale Student distributions have been proposed and used for clustering (Forbes and Wraith 2014). In contrast to the Gaussian case, no closed-form solution exists for such mixtures, but tractability is maintained via the expectation-maximisation (EM) algorithm. However, such mixtures require more parameters than standard Student or Gaussian mixtures, and the EM algorithm used to estimate these parameters involves more complex numerical optimizations. Consequently, when the number of samples to be clustered becomes large, applying EM to the whole data set (batch EM) may become costly in terms of both time and memory. A natural way to bypass this issue is to consider an online version of the algorithm that can incorporate the samples incrementally or in mini-batches (e.g. Cappé and Moulines 2009).
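To illustrate the difference between batch and online EM, here is a minimal sketch, assuming a one-dimensional Gaussian mixture rather than the multiple scale Student mixtures targeted by this project. It follows the general idea of updating running averages of the sufficient statistics one sample at a time. The function names, the step-size schedule (n + 1)^(-alpha), the burn-in length and the variance floor are illustrative assumptions, not choices taken from the cited papers.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def online_em_gmm(stream, weights, means, variances, alpha=0.6, burn_in=20):
    """Single pass of online EM over `stream` for a 1-D Gaussian mixture.

    Only the current parameters and three running statistics per component
    are kept in memory; the samples themselves are never stored.
    """
    k = len(weights)
    weights, means, variances = list(weights), list(means), list(variances)

    # Running averages of the sufficient statistics
    # E[z_j], E[z_j * x], E[z_j * x^2], initialised from the starting parameters.
    s0 = list(weights)
    s1 = [w * m for w, m in zip(weights, means)]
    s2 = [w * (v + m * m) for w, m, v in zip(weights, means, variances)]

    for n, x in enumerate(stream, start=1):
        # E-step on the single new observation: posterior responsibilities.
        dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)
        if total > 0.0:
            resp = [d / total for d in dens]
        else:
            resp = [1.0 / k] * k  # fallback if every density underflows

        # Stochastic-approximation update of the statistics (assumed schedule).
        gamma = (n + 1) ** (-alpha)
        for j in range(k):
            s0[j] += gamma * (resp[j] - s0[j])
            s1[j] += gamma * (resp[j] * x - s1[j])
            s2[j] += gamma * (resp[j] * x * x - s2[j])

        # M-step: closed-form map from statistics to parameters,
        # skipped during a short burn-in so early updates stay stable.
        if n > burn_in:
            for j in range(k):
                weights[j] = s0[j]
                means[j] = s1[j] / s0[j]
                variances[j] = max(s2[j] / s0[j] - means[j] ** 2, 1e-6)

    return weights, means, variances
```

In the Gaussian case the M-step is a closed-form function of the running statistics, which is what keeps each update cheap. For multiple scale Student mixtures that step involves numerical optimization, which is the difficulty this project addresses.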

In this work, we propose to design a tractable online EM algorithm for mixtures of multiple scale Student distributions, and then to use it to detect subtle brain anomalies in MR brain scans of patients suffering from Parkinson disease. The application to Parkinson disease will be carried out jointly with M. Dojat from the Grenoble Institute of Neuroscience.


Cappé and Moulines 2009. On-line expectation-maximization algorithm for latent data models. Journal of the Royal Statistical Society B, 71:593–613, 2009.

Forbes and Wraith 2014. A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: Application to robust clustering. Statistics and Computing, 24(6):971–984, 2014.
