Calculate and Plot a Cumulative Distribution Function with Matplotlib in Python

Last Updated : 19 May, 2025

A Cumulative Distribution Function (CDF) gives the probability that a variable takes a value less than or equal to a given value, which makes it a compact summary of how data is distributed. For example, a CDF of test scores shows the percentage of students scoring at or below a certain mark. Let's explore simple and efficient ways to calculate and plot CDFs with Matplotlib in Python.
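Formally, for a sample x1, x2, ..., xn the empirical CDF is Fn(x) = (number of observations xi <= x) / n. The minimal sketch below applies that definition directly to a small set of made-up test scores; the scores and the helper name empirical_cdf are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical test scores, made up purely for illustration
scores = np.array([55, 62, 70, 70, 78, 85, 91, 96])

def empirical_cdf(data, x):
    """Fraction of observations less than or equal to x."""
    return np.mean(np.asarray(data) <= x)

# Share of students who scored 70 or lower: 4 of the 8 scores
print(empirical_cdf(scores, 70))  # 0.5
```

Every method in this article computes this same quantity; they differ only in how the values are organized and plotted.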

Using np.arange

This is the simplest way to compute an empirical CDF: sort the data, then use np.arange to give each sorted value a cumulative probability of i/n, where i is its rank (1 to n) and n is the sample size. It is fast, easy to follow and needs nothing beyond NumPy and Matplotlib.

Python

import numpy as np
import matplotlib.pyplot as plt

d = np.sort(np.random.randn(500))       # sample 500 standard normal values and sort them
c = np.arange(1, len(d) + 1) / len(d)   # cumulative probabilities 1/n, 2/n, ..., 1

plt.plot(d, c, '.', color='blue')
plt.xlabel('Data Values')
plt.ylabel('CDF')
plt.title('CDF via Sorting')
plt.grid()
plt.show()

Output

Explanation:

np.random.randn(500) generates 500 random data points from a standard normal distribution (mean = 0, std = 1).
np.sort() sorts the generated data in ascending order, so each value's position tells how many observations are at or below it.
np.arange(1, len(d)+1) / len(d) creates evenly spaced cumulative probabilities ranging from 1/500 to 500/500.
plt.plot(d, c, '.', color='blue') plots the sorted data values against their corresponding CDF values as blue dots on the graph.
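Because the sorted array and the np.arange values together define a step function, the CDF can also be evaluated at values that are not in the sample. One way to do that is np.searchsorted with side='right', which counts how many sorted values are less than or equal to the query. This is a sketch: cdf_at is a hypothetical helper name, and a seeded NumPy Generator stands in for np.random.randn so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded so the run is reproducible
d = np.sort(rng.standard_normal(500))
c = np.arange(1, len(d) + 1) / len(d)

def cdf_at(sorted_data, x):
    """Empirical CDF at x: count of values <= x divided by the sample size."""
    # side='right' also counts values equal to x, matching "<=" in the definition
    return np.searchsorted(sorted_data, x, side='right') / len(sorted_data)

# At each sorted data point the lookup reproduces the arange-based values
assert np.allclose(cdf_at(d, d), c)

# Fraction of the sample at or below 0
print(cdf_at(d, 0.0))
```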

Using statsmodels ECDF Method

The statsmodels library provides a built-in ECDF class for empirical CDFs. It is convenient because the fitted object can be called like a function to evaluate the CDF at any value. If you don't mind the extra dependency, this is one of the cleanest methods; the values it produces match those of the sorting approach above.

Python

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.distributions.empirical_distribution import ECDF

d = np.random.randn(500)
e = ECDF(d)

plt.step(e.x, e.y, where='post', color='green')  # 'post': the CDF jumps up at each data point
plt.xlabel('Data Values')
plt.ylabel('CDF')
plt.title('CDF via ECDF')
plt.grid()
plt.show()

Output

Explanation:

np.random.randn(500) generates 500 random data points from a standard normal distribution (mean = 0, std = 1).
ECDF(d) computes the empirical cumulative distribution function of the data d.
plt.step() plots the ECDF as a green step function: the cumulative probability stays constant between data points and jumps by 1/n at each one.
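Because an ECDF object is callable, it can directly answer questions such as "what fraction of the sample lies at or below 0?". The sketch below checks this against a plain NumPy computation. It assumes a reasonably recent statsmodels release, in which ECDF stores a leading -inf in x and a leading 0 in y; the seeded data is only a stand-in for np.random.randn.

```python
import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF

rng = np.random.default_rng(1)      # seeded stand-in for np.random.randn(500)
d = rng.standard_normal(500)
e = ECDF(d)

# The ECDF object is callable: it returns the fraction of values <= each query point
points = np.array([-1.0, 0.0, 1.0])
print(e(points))

# Same numbers computed directly with NumPy
manual = np.array([np.mean(d <= p) for p in points])
assert np.allclose(e(points), manual)

# e.x and e.y start with -inf and 0, so a step plot begins at zero
print(e.x[0], e.y[0])  # -inf 0.0
```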

Using scipy.stats.cumfreq Method

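scipy.stats.cumfreq computes a cumulative frequency table: it groups the data into bins and counts how many values fall in each bin or any bin before it. Because the result is binned, the CDF is an approximation whose resolution depends on numbins, but you get direct control over the binning. It's a good option if you already use SciPy in your workflow.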

Python

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import cumfreq

d = np.random.randn(500)
r = cumfreq(d, numbins=25)
x = r.lowerlimit + r.binsize * np.arange(1, r.cumcount.size + 1)  # upper edge of each bin
c = r.cumcount / r.cumcount[-1]

plt.plot(x, c, color='purple')
plt.xlabel('Data Values')
plt.ylabel('CDF')
plt.title('CDF via cumfreq')
plt.grid()
plt.show()

Output

Explanation:

np.random.randn(500) generates 500 random data points from a standard normal distribution (mean = 0, std = 1).
cumfreq(d, numbins=25) computes the cumulative frequency distribution of the data d using 25 bins. By default, the bin range is extended slightly beyond the minimum and maximum of the data.
r.lowerlimit + r.binsize * np.arange(1, r.cumcount.size + 1) computes the upper edge of each bin, since r.cumcount[i] counts every value in bins 0 through i.
r.cumcount / r.cumcount[-1] normalizes the cumulative counts to get cumulative probabilities (CDF values).
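Since cumfreq only counts values per bin, its CDF is exact only at the bin edges. A quick way to see this is to compare the normalized cumulative counts with the exact empirical CDF evaluated at each bin's upper edge. This is a sketch that assumes seeded sample data in place of np.random.randn.

```python
import numpy as np
from scipy.stats import cumfreq

rng = np.random.default_rng(2)      # seeded stand-in for np.random.randn(500)
d = rng.standard_normal(500)
r = cumfreq(d, numbins=25)

# Upper edge of each bin: cumcount[i] counts every value in bins 0..i
upper = r.lowerlimit + r.binsize * np.arange(1, r.cumcount.size + 1)
c = r.cumcount / r.cumcount[-1]

# Exact empirical CDF evaluated at the same edges, for comparison
exact = np.array([np.mean(d <= u) for u in upper])

# The two agree at the edges (apart from a value sitting exactly on an edge)
print(np.abs(c - exact).max())
```

Between the edges, plt.plot simply joins the points with straight lines, so increasing numbins brings the curve closer to the exact ECDF.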

Using a Histogram

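This is the most visual and basic method: bin the data with a histogram, convert the counts into per-bin probabilities and take their cumulative sum to estimate the CDF. The resolution is limited by the number of bins, so the result is only an approximation, but plotting the histogram and the CDF together helps build intuition for how the two are related.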

Python

import numpy as np
import matplotlib.pyplot as plt

d = np.random.randn(500)
cnt, b = np.histogram(d, bins=10)
p = cnt / cnt.sum()
c = np.cumsum(p)

fig, a1 = plt.subplots(figsize=(8, 6))
a1.bar(b[:-1], p, width=np.diff(b), align='edge', color='red', alpha=0.6)  # each bar starts at its left bin edge
a1.set_ylabel('PDF', color='red')

a2 = a1.twinx()
a2.plot(b[1:], c, color='blue')
a2.set_ylabel('CDF', color='blue')

plt.title("PDF & CDF via Histogram")
plt.show()

Output

Explanation:

np.random.randn(500) generates 500 random data points from a standard normal distribution.
np.histogram(d, bins=10) bins the data into 10 intervals and counts occurrences.
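cnt / cnt.sum() converts the counts into the probability of each bin. This follows the shape of the PDF, but it is not a true density unless it is also divided by the bin width.
np.cumsum(p) accumulates the bin probabilities to obtain the CDF at each bin's right edge.
a1.bar() plots the bin probabilities, and a1.twinx() creates a second y-axis on which a2.plot() draws the CDF.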
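If you want the bars to represent a true probability density rather than per-bin probabilities, np.histogram accepts density=True. The minimal sketch below, which assumes seeded sample data, shows that multiplying the densities by the bin widths gives the same CDF as normalizing raw counts, and how prepending a 0 makes the curve start at the left edge of the first bin.

```python
import numpy as np

rng = np.random.default_rng(3)      # seeded stand-in for np.random.randn(500)
d = rng.standard_normal(500)

# density=True rescales counts so the bars integrate to 1 (a density estimate)
dens, b = np.histogram(d, bins=10, density=True)
widths = np.diff(b)

# Multiplying density by bin width recovers the per-bin probabilities
p = dens * widths
c = np.cumsum(p)

# Same result as normalizing the raw counts
cnt, _ = np.histogram(d, bins=10)
assert np.allclose(c, np.cumsum(cnt / cnt.sum()))

# Prepending 0 at the left edge lets the CDF curve start from zero
cdf_x = b
cdf_y = np.concatenate(([0.0], c))
print(cdf_y[0], cdf_y[-1])
```

Plotting cdf_x against cdf_y (for example with a2.plot) then draws the estimated CDF from 0 at the first bin edge up to 1 at the last.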