Gaussian Mixture Models (GMM) in Scikit-Learn
Gaussian Mixture Model (GMM) is a flexible clustering technique that models data as a mixture of multiple Gaussian distributions. Unlike k-means, which assumes spherical clusters, GMM allows clusters to take various shapes, making it more effective for complex datasets.
If you're new to GMM, you can refer to the article Gaussian Mixture Model for a theoretical understanding.
Covariance Types in Gaussian Mixture Models
In GMM, the covariance matrix plays an important role in shaping the individual Gaussian components of the mixture. Selecting the right covariance type is essential for effectively modeling the structure and relationships within the data. Scikit-Learn offers four types of covariance matrices:
- Full: Each component has its own full covariance matrix, allowing it a unique shape, orientation and size in all dimensions. This provides the most flexibility but also increases computational cost.
- Tied: All components share a single common full covariance matrix, forcing all clusters to have the same shape and orientation. This is more restrictive, but useful when components are expected to be similar in spread.
- Diagonal: Each component has its own diagonal covariance matrix, allowing different variances along each dimension but assuming no correlation between dimensions. Computationally efficient and useful for high-dimensional data.
- Spherical: Each component has a single variance value shared across all dimensions, so every cluster is assumed to be spherical and identical in all directions. The simplest model, but often too restrictive for real-world data.
Each covariance type offers a different balance of flexibility and constraints that affects how GMM models the data, so choosing the right covariance_type parameter is important. The short sketch below shows how the fitted covariances_ attribute differs for each option.
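As a quick, self-contained illustration (on synthetic data, so only the array shapes matter, not the values), here is a minimal sketch that fits each covariance type and prints the shape of the resulting covariances_ attribute:
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))  # 300 synthetic samples, 4 features

for cov_type in ['full', 'tied', 'diag', 'spherical']:
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type,
                          random_state=0).fit(X_demo)
    print(f'{cov_type:>9}: covariances_ shape = {np.shape(gmm.covariances_)}')

# Expected shapes with 3 components and 4 features:
#      full: (3, 4, 4)  one full matrix per component
#      tied: (4, 4)     a single shared matrix
#      diag: (3, 4)     per-component variances, one per dimension
# spherical: (3,)       one scalar variance per component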
Implementation of GMM Covariances
To work with GMM covariances in Scikit-Learn we will use the built-in wine dataset.
Step 1: Importing Required Libraries
Before using Gaussian Mixture Models (GMM) in Scikit-Learn, we need to import the necessary libraries.
- Scikit-Learn: The main library, providing the GaussianMixture class for GMM.
- NumPy: Used for handling and manipulating numerical data efficiently.
- Matplotlib: Used for visualizing the clustering results.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.mixture import GaussianMixture
Step 2: Data Preparation
Data preparation is an important step in our program: we must ensure the data is in the correct format for GMM. To begin, let's load the wine dataset, keeping only the first two features so the results are easy to visualize in 2D.
wine = datasets.load_wine()  # load the built-in wine dataset
X = wine.data[:, :2]  # keep only the first two features for 2D plotting
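One optional preprocessing step (not part of the original walkthrough): because the two wine features live on different scales, fitting the Gaussians on standardized data can give more balanced clusters. A minimal sketch, kept non-destructive so the rest of the tutorial still uses X; fit on X_scaled instead if you adopt it:
from sklearn.preprocessing import StandardScaler

# Optional: GMM is sensitive to feature scale, so standardizing can help
X_scaled = StandardScaler().fit_transform(X)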
Step 3: Initializing Gaussian Mixture Model
In this step we will initialize a Gaussian Mixture Model. To do this we specify two key parameters:
- n_components: Determines the number of components in the model, i.e. the number of clusters or distributions the data will be divided into.
- covariance_type: Defines the structure of the covariance matrix for each component. It can be set to one of four options: full, tied, diag or spherical.
n_components = 2 # Number of clusters
covariance_types = ['full', 'tied', 'diag', 'spherical']
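A practical aside (an optional tweak, not in the original steps): EM is sensitive to its random initialization, so fixing random_state, and optionally running several restarts with n_init, makes results reproducible. Both are standard GaussianMixture parameters:
# Optional: seed the EM initialization (and try 5 restarts) for reproducibility
gmm = GaussianMixture(n_components=n_components, covariance_type='full',
                      n_init=5, random_state=42)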
Step 4: Fitting the GMM Model
Fitting the GMM model is an important step that helps us understand the underlying structure of the data and how it relates to the chosen covariance type.
gmm_models = {cov_type: GaussianMixture(n_components=n_components, covariance_type=cov_type)
              for cov_type in covariance_types}

for cov_type, gmm_model in gmm_models.items():
    gmm_model.fit(X)
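With one fitted model per covariance type, an optional but common way to compare them is an information criterion such as BIC, which GaussianMixture exposes through its bic method (lower is better):
# Compare the four fitted models with the Bayesian Information Criterion
for cov_type, gmm_model in gmm_models.items():
    print(f'{cov_type:>9}: BIC = {gmm_model.bic(X):.1f}')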
Step 5: Accessing Covariances
You can access the covariance matrices of the components through the covariances_ attribute of a fitted GMM model. The shape of this attribute depends on the specified covariance_type (see the shape summary above).
covariances = {cov_type: gmm_model.covariances_
               for cov_type, gmm_model in gmm_models.items()}
Step 6: Using the GMM Model for Clustering or Predictions
With our GMM models fully prepared, the final step is to use them for clustering or for making predictions, depending on the specific task at hand.
predictions = {cov_type: gmm_model.predict(X)
               for cov_type, gmm_model in gmm_models.items()}
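predict gives hard cluster labels, but since GMM is a probabilistic model it can also return soft assignments via predict_proba; here we peek at the full-covariance model's per-sample component probabilities:
# Soft clustering: probability of each sample belonging to each component
proba = gmm_models['full'].predict_proba(X)
print(proba[:5].round(3))  # first five samples, one column per component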
Step 7: Visualization
Finally, we plot the cluster assignments produced by each covariance type side by side and print the first component's covariance for each model.
plt.figure(figsize=(12, 8))
for i, (cov_type, gmm_model) in enumerate(gmm_models.items(), 1):
    plt.subplot(2, 2, i)
    plt.scatter(X[:, 0], X[:, 1], c=predictions[cov_type], cmap='viridis', edgecolors='k', s=40)
    plt.title(f'GMM Clustering with {cov_type} Covariance')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.colorbar()
    # 'tied' stores one shared matrix; the other types store one entry per component
    first_cov = covariances[cov_type] if cov_type == 'tied' else covariances[cov_type][0]
    print(f'Covariance Matrix ({cov_type}, first component):\n{first_cov}')
plt.tight_layout()
plt.show()
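To make the effect of each covariance type visible, a helpful optional extension is to draw each component's 2-sigma ellipse on top of the data. This is a sketch rather than part of the original tutorial: the helpers plot_cov_ellipse and to_full are our own, and to_full simply expands whatever covariances_ stores into a full 2x2 matrix so all four types can be drawn the same way.
from matplotlib.patches import Ellipse

def plot_cov_ellipse(mean, cov, ax, n_std=2.0, **kwargs):
    # Eigen-decomposition of the 2x2 covariance gives the ellipse axes and tilt
    vals, vecs = np.linalg.eigh(cov)
    angle = np.degrees(np.arctan2(vecs[1, 0], vecs[0, 0]))
    width, height = 2 * n_std * np.sqrt(vals)
    ax.add_patch(Ellipse(xy=mean, width=width, height=height, angle=angle, **kwargs))

def to_full(cov, cov_type, k):
    # Expand the stored covariance of component k into a full 2x2 matrix
    if cov_type == 'full':
        return cov[k]
    if cov_type == 'tied':
        return cov  # one matrix shared by all components
    if cov_type == 'diag':
        return np.diag(cov[k])
    return cov[k] * np.eye(2)  # spherical: a single scalar variance

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, (cov_type, gmm_model) in zip(axes.ravel(), gmm_models.items()):
    ax.scatter(X[:, 0], X[:, 1], c=predictions[cov_type], cmap='viridis', s=20)
    for k, mean in enumerate(gmm_model.means_):
        full_cov = to_full(gmm_model.covariances_, cov_type, k)
        plot_cov_ellipse(mean, full_cov, ax, fill=False, edgecolor='red', linewidth=2)
    ax.set_title(f'{cov_type} covariance (2-sigma ellipses)')
plt.tight_layout()
plt.show()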
Output:
[Figure: a 2x2 grid of scatter plots, one panel per covariance type (full, tied, diag, spherical), colored by predicted cluster]
The plot shows how different covariance types affect the clustering results in Gaussian Mixture Models (GMM).
- Full covariance offers the most flexibility, letting each cluster take its own shape, orientation and size.
- Tied forces all clusters to share the same shape and orientation.
- Diagonal assumes no correlation between features.
- Spherical treats all clusters as equal in size and shape.
The right choice depends on the dataset's structure and the balance between model flexibility and computational efficiency.