Interquartile Range to Detect Outliers in Data

Last Updated : 12 Feb, 2025

Outliers are observations that deviate significantly from the overall pattern of a dataset, and they can distort summary statistics such as the mean as well as the results of later analysis. The Interquartile Range (IQR) measures the spread of the middle 50% of the data, and the IQR rule uses it to flag outliers. In this article we will learn how the method works and apply it in Python.

Detecting Outliers with IQR

The IQR measures variability by dividing a dataset into quartiles. The data is sorted in ascending order and then split into four equal parts. The values Q1 (25th percentile), Q2 (50th percentile, or median) and Q3 (75th percentile) separate the dataset into these four parts.

If a dataset has 2n or 2n+1 data points, then

Q2 = median of the dataset.
Q1 = median of n smallest data points.
Q3 = median of n highest data points.
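The median-of-halves rule above can be written directly with Python's standard statistics module. This is a minimal sketch; the function name quartiles_by_halves is hypothetical, and for 2n+1 points the middle value is left out of both halves, as the definition implies.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule (hypothetical helper)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")
    n = len(data) // 2        # size of each half; for 2n+1 points the middle one is left out
    q1 = median(data[:n])     # median of the n smallest points
    q2 = median(data)         # median of the whole dataset
    q3 = median(data[-n:])    # median of the n largest points
    return q1, q2, q3

print(quartiles_by_halves([6, 2, 1, 5, 4, 3, 50]))  # (2, 4, 6)
```

For the example data used below this gives Q1 = 2, Q2 = 4 and Q3 = 6, so IQR = 4.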

The IQR is calculated as: IQR = Q3 − Q1

Data points that fall below Q1−1.5×IQR or above Q3+1.5×IQR are considered outliers.
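The fence rule can be sketched end to end with the standard library alone. This version assumes the quartiles come from statistics.quantiles with method="exclusive" (its default); other quartile conventions shift the fences slightly. The names iqr_fences and iqr_outliers and the parameter k are illustrative, with k=1.5 matching the multiplier above.

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")  # three cut points: Q1, Q2, Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall strictly outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [x for x in values if x < lower or x > upper]

data = [6, 2, 1, 5, 4, 3, 50]
print(iqr_fences(data))    # (-4.0, 12.0)
print(iqr_outliers(data))  # [50]
```

For the example data, statistics.quantiles returns Q1 = 2.0 and Q3 = 6.0, so the fences are -4.0 and 12.0 and only 50 is flagged. A larger k such as 3 is a common choice when only extreme values should be flagged.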

Example:

Consider the following data:

6, 2, 1, 5, 4, 3, 50.

If these values represent the number of chapatis eaten at lunch, then 50 is clearly an outlier. Let’s use Python to detect it.

Step 1: Import the necessary libraries.

Python

import numpy as np 
import seaborn as sns

Step 2: Sort the data in ascending order.

Python

data = [6, 2, 3, 4, 5, 1, 50]
sort_data = np.sort(data)
sort_data

Output:

array([ 1, 2, 3, 4, 5, 6, 50])

Step 3: Calculate Q1, Q2, Q3 and the IQR.

Python

# method='midpoint' replaces the deprecated interpolation= keyword (NumPy 1.22+)
Q1 = np.percentile(data, 25, method='midpoint')
Q2 = np.percentile(data, 50, method='midpoint')
Q3 = np.percentile(data, 75, method='midpoint')

print('Q1 (25th percentile) of the given data is', Q1)
print('Q2 (50th percentile) of the given data is', Q2)
print('Q3 (75th percentile) of the given data is', Q3)

IQR = Q3 - Q1
print('Interquartile range is', IQR)

Output:

Q1 (25th percentile) of the given data is 2.5
Q2 (50th percentile) of the given data is 4.0
Q3 (75th percentile) of the given data is 5.5
Interquartile range is 3.0
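The quartiles depend on the interpolation rule passed to np.percentile, which is why midpoint gives Q1 = 2.5 and Q3 = 5.5 here while the median-of-halves definition gives 2 and 6. A short comparison, assuming NumPy 1.22 or later (where the method keyword exists); the helper name quartiles is illustrative.

```python
import numpy as np

def quartiles(values, method):
    """Return (Q1, Q2, Q3) from np.percentile using the given interpolation method."""
    q1, q2, q3 = np.percentile(values, [25, 50, 75], method=method)
    return float(q1), float(q2), float(q3)

data = [6, 2, 3, 4, 5, 1, 50]

# Print the quartiles and IQR produced by a few of NumPy's methods
for m in ("linear", "midpoint", "lower", "higher"):
    q1, q2, q3 = quartiles(data, m)
    print(f"{m:>8}: Q1={q1}, Q2={q2}, Q3={q3}, IQR={q3 - q1}")
```

For this dataset every one of these methods still flags 50 as the only outlier, but on small or skewed samples the choice can move a borderline point across a fence.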

Step 4: Find the lower and upper limits.

Python

low_lim = Q1 - 1.5 * IQR
up_lim = Q3 + 1.5 * IQR
print('low_limit is', low_lim)
print('up_limit is', up_lim)

Output:

low_limit is -2.0
up_limit is 10.0

Step 5: Identify the outliers.

Python

outliers = []
for x in data:
    # a point is an outlier if it lies outside either fence
    if x > up_lim or x < low_lim:
        outliers.append(x)
print('Outliers in the dataset:', outliers)

Output:

Outliers in the dataset: [50]
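The loop in Step 5 can also be written as a NumPy boolean mask, which avoids the explicit Python loop. This is a minimal sketch; it recomputes the quartiles with method="midpoint" (NumPy 1.22+) so that it runs on its own, and the last line shows how the same mask keeps only the inliers.

```python
import numpy as np

data = np.array([6, 2, 3, 4, 5, 1, 50])

# Quartiles and fences, as in Steps 3 and 4
q1, q3 = np.percentile(data, [25, 75], method="midpoint")
iqr = q3 - q1
low_lim, up_lim = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Boolean mask: True where a value lies outside either fence
mask = (data < low_lim) | (data > up_lim)
outliers = data[mask]
print("Outliers:", outliers)          # Outliers: [50]
print("Inliers:", data[~mask])        # values inside the fences
```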

Step 6: Plot the box plot to highlight outliers.

Python

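import matplotlib.pyplot as plt

# Horizontal box plot; points beyond the whiskers (1.5 × IQR by default) are drawn individually
sns.boxplot(x=data)
plt.show()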

Combining the IQR rule with a box plot gives both a numeric and a visual check for outliers, which helps when deciding how to treat them during data preprocessing.


ektamaini
