R J. 2016 Aug;8(1):289-317.

mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models

Luca Scrucca et al. R J. 2016 Aug.

Abstract

Finite mixture models are increasingly used to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular R package that models data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of analytical purposes. Version 5 of the package has recently been made available on CRAN. This updated version adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.
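
For readers unfamiliar with the model class, the quantities the abstract refers to can be written out in standard notation. This is a sketch, not text taken from the paper: the symbols $G$, $\pi_k$, $\mu_k$, $\Sigma_k$, $\lambda_k$, $D_k$, $A_k$, $\nu$ and $n$ are labels chosen here for illustration. The first line is the Gaussian mixture density, the second is the eigen-decomposition of the component covariances that gives rise to the different covariance structures, and the third is the BIC in the sign convention commonly used with mclust, where larger values are preferred.

```latex
\begin{align}
  f(x) &= \sum_{k=1}^{G} \pi_k \, \phi(x \mid \mu_k, \Sigma_k),
        \qquad \pi_k > 0, \quad \sum_{k=1}^{G} \pi_k = 1, \\
  \Sigma_k &= \lambda_k \, D_k A_k D_k^{\top}, \\
  \mathrm{BIC} &= 2 \log \hat{L} - \nu \log n .
\end{align}
```

Here $\phi$ is the multivariate Gaussian density, $\lambda_k$ is a scalar controlling the volume of component $k$, $A_k$ is a diagonal matrix with unit determinant controlling its shape, and $D_k$ is an orthogonal matrix controlling its orientation. Constraining each of these to be equal or variable across components yields the family of covariance structures. In the BIC, $\hat{L}$ is the maximised likelihood, $\nu$ the number of free parameters, and $n$ the sample size.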

Figures

Figure 1
Number of weekly downloads from the RStudio CRAN mirror over time for some R packages dealing with Gaussian finite mixture modelling.
Figure 2
Ellipses of isodensity for each of the 14 Gaussian models obtained by eigen-decomposition in the case of three groups in two dimensions.
Figure 3
BIC plot for models fitted to the wine data.
Figure 4
Contour plot of estimated mixture densities (a) and uncertainty boundaries (b) on the projection subspace estimated with MclustDR for the wine dataset.
Figure 5
Pairwise scatterplots for the diabetes data with points marked according to classification.
Figure 6
Plots of BIC and ICL model selection criteria for the diabetes data.
Figure 7
Histograms of LRTS bootstrap distributions for testing the number of mixture components in the diabetes data. The dotted vertical lines refer to the sample values of LRTS.
Figure 8
True class membership (a) and estimated classification using GMM (b) for the hemophilia dataset.
Figure 9
Bootstrap distribution for the mixture proportions. The vertical dotted lines refer to the MLEs for the GMM fitted to the hemophilia data.
Figure 10
Bootstrap distribution for the mixture component means. The vertical dotted lines refer to the MLEs for the GMM fitted to the hemophilia data.
Figure 11
Bootstrap percentile intervals for the means of the GMM fitted to the hemophilia dataset. Solid lines refer to nonparametric bootstrap, dashed lines to the weighted likelihood bootstrap.
Figure 12
Scatterplot matrix for the Flea beetles data with points marked according to the true classes.
Figure 13
(a) Histogram with mixture-based density estimate curve, and (b) histograms by group-year with estimated mixture-component densities, for the Hidalgo1872 stamps dataset.
Figure 14
Plot of the Old Faithful data (a), mixture-based density estimate contours (b), image plot of density estimate (c) and perspective plot of the bivariate density estimate (d).
Figure 15
Pairwise scatterplots between variables for the Wisconsin breast cancer data (panels a–c). Points are marked by cancer diagnosis (benign or malignant), whereas ellipses correspond to covariances of mixture components estimated with MclustDA. Panel (d) shows the data projected along the first two estimated directions obtained with MclustDR, with uncertainty classification boundaries.
