Find the Difference Between Two Rows in Pandas
Finding the difference between two rows in Pandas means calculating how the values in a column change from one row to the next, e.g. the change in sales from the previous month to the current month. This is especially important in time-series data or in datasets where temporal progression, rankings, or consecutive comparisons matter. The pandas.DataFrame.diff() method does this by calculating the difference between each element and the element a given number of rows (or columns) before it.
Method 1: Using the diff() Method
Syntax:
pandas.DataFrame.diff(periods=1, axis=0)
- periods: Number of periods to shift when computing the difference. Integer; negative values are accepted. Default is 1.
- axis: Whether the difference is taken over rows or columns. Accepts {0 or 'index': rows, 1 or 'columns': columns}. Default is 0.
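As a quick illustration of the two parameters, here is a minimal sketch on a small made-up DataFrame (the column names `a` and `b` and their values are hypothetical, chosen only to keep the arithmetic easy to follow):

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the two parameters.
df = pd.DataFrame({"a": [1, 4, 9], "b": [2, 6, 12]})

# Default (periods=1, axis=0): each row minus the row above it.
row_diff = df.diff()

# periods=2: each row minus the row two positions above it.
row_diff_2 = df.diff(periods=2)

# Negative periods compare each row with a later row instead.
next_diff = df.diff(periods=-1)

# axis=1: each column minus the column to its left.
col_diff = df.diff(axis=1)

print(row_diff)
print(col_diff)
```

Positions that have no counterpart to subtract (the first row for `periods=1`, the first column for `axis=1`) come back as NaN.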
Example 1: Finding Difference Between Each Previous Row
import pandas as pd
data = {'Date': ['2024-11-25', '2024-11-26', '2024-11-27'], 'Sales': [200, 250, 300]}
df = pd.DataFrame(data)
# Calculate difference between rows
df['Sales_Diff'] = df['Sales'].diff()
print(df)
Output:
         Date  Sales  Sales_Diff
0  2024-11-25    200         NaN
1  2024-11-26    250        50.0
2  2024-11-27    300        50.0
This shows how diff() calculates the difference between each row and the one preceding it. The first row has no predecessor, so its difference is NaN. The ability to calculate differences between rows in a dataset is essential for identifying trends and patterns.
Example 2: Calculating Difference Over Multiple Periods
You can also calculate the difference over multiple periods by adjusting the periods parameter. This is useful when you need to compare data points that are not immediately consecutive, such as comparing sales figures over quarterly or yearly intervals.
import pandas as pd
data = {
    'Date': ['2024-11-23', '2024-11-24', '2024-11-25', '2024-11-26', '2024-11-27', '2024-11-28', '2024-11-29'],
    'Sales': [180, 190, 200, 250, 300, 350, 400]
}
df = pd.DataFrame(data)
# Calculate the difference with a period of 3
df['sales_diff_3_periods'] = df['Sales'].diff(periods=3)
print(df)
Output:
         Date  Sales  sales_diff_3_periods
0  2024-11-23    180                   NaN
1  2024-11-24    190                   NaN
2  2024-11-25    200                   NaN
3  2024-11-26    250 ...
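With periods=3, the first three rows have no row three steps back, which is why they are NaN. The following is a minimal sketch, using the same sales figures, of counting those missing values and optionally dropping them; the variable names `diff_3`, `n_missing`, and `diff_3_clean` are illustrative only:

```python
import pandas as pd

sales = pd.Series([180, 190, 200, 250, 300, 350, 400], name="Sales")

# Difference between each value and the value three rows earlier.
diff_3 = sales.diff(periods=3)

# The leading rows without a counterpart three steps back are NaN.
n_missing = int(diff_3.isna().sum())

# Optionally drop those rows; the remaining values can be cast back to int.
diff_3_clean = diff_3.dropna().astype(int)

print(n_missing)
print(diff_3_clean)
```

Whether to drop, keep, or fill the leading NaN values depends on the analysis; dropping them is just one option.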
Example 3: Calculating Difference Along Columns in Pandas
The diff() method can also calculate differences along columns when the axis parameter is set to 1. This is useful in datasets whose columns hold related measurements. When each column represents a specific time period (e.g., days, months, years), calculating differences along columns tracks changes over time within the same entity, for instance comparing sales, revenue, or costs across days, quarters, or years to identify trends.
import pandas as pd
data = {
    'Region': ['North', 'South', 'East', 'West'],
    'Q1_Sales': [20000, 15000, 18000, 22000],
    'Q2_Sales': [25000, 18000, 20000, 24000],
    'Q3_Sales': [30000, 22000, 25000, 26000],
    'Q4_Sales': [35000, 25000, 30000, 30000]
}
df = pd.DataFrame(data)
# Calculate the difference in sales across quarters for each region
sales_diff = df.loc[:, 'Q1_Sales':].diff(axis=1)
df = pd.concat([df, sales_diff.add_prefix('Diff_')], axis=1)
print(df)
Output:
  Region  Q1_Sales  Q2_Sales  ...  Diff_Q2_Sales  Diff_Q3_Sales  Diff_Q4_Sales
0  North     20000     25000  ...         5000.0         5000.0         5000.0
1  South     15000     18000  ... ...
- "North" shows consistent quarterly growth (+5000 per quarter).
- "West" grows by a smaller amount from Q1 through Q3 (+2000 per quarter) and then makes a larger jump in Q4 (+4000), suggesting an improvement.
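An alternative way to get at the same numbers, sketched here as one possible approach rather than the only one, is to move the Region column into the index so that diff(axis=1) can run on the whole frame, and then drop the first column, which is all NaN because it has no left neighbour. The names `quarterly` and `growth` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"],
    "Q1_Sales": [20000, 15000, 18000, 22000],
    "Q2_Sales": [25000, 18000, 20000, 24000],
    "Q3_Sales": [30000, 22000, 25000, 26000],
    "Q4_Sales": [35000, 25000, 30000, 30000],
})

# Move the label column into the index so only numeric columns remain.
quarterly = df.set_index("Region")

# Column-wise differences; drop the first column, which is entirely NaN.
growth = quarterly.diff(axis=1).dropna(axis=1, how="all")

# Quarter-over-quarter growth for a single region.
print(growth.loc["West"])
```

Indexing by region makes it easy to read off one region's growth directly, at the cost of no longer having Region as a regular column.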
Method 2: Using the shift() Method
The shift() method shifts data by a specified number of periods, which can then be used to calculate differences manually. Unlike diff(), shift() provides more control over how data is aligned before calculating differences. This is useful for custom calculations like percentage changes.
import pandas as pd
data = {'Date': ['2024-11-25', '2024-11-26', '2024-11-27'], 'Sales': [200, 250, 300]}
df = pd.DataFrame(data)
# Use shift() to calculate differences manually
df['Previous_Sales'] = df['Sales'].shift(1) # Shift sales data by 1 period
df['Sales_Diff'] = df['Sales'] - df['Previous_Sales'] # Calculate difference
print(df)
Output:
         Date  Sales  Previous_Sales  Sales_Diff
0  2024-11-25    200             NaN         NaN
1  2024-11-26    250           200.0        50.0
2  2024-11-27    300           250.0        50.0
shift() lets you align data before performing calculations, which allows custom operations such as percentage changes or comparisons across non-adjacent rows. It is a good fit for scenarios where custom conditions are applied during data manipulation.
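To make the percentage-change case concrete, here is a minimal sketch that reuses the same three-row sales data. The column names `Pct_Change` and `Pct_Change_Builtin` are illustrative; the second column uses pandas' built-in Series.pct_change() for comparison:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2024-11-25", "2024-11-26", "2024-11-27"],
    "Sales": [200, 250, 300],
})

# Align each row with the previous row's sales.
previous = df["Sales"].shift(1)

# Percentage change relative to the previous row, computed manually.
df["Pct_Change"] = (df["Sales"] - previous) / previous * 100

# Built-in equivalent; pct_change() returns a fraction, so scale by 100.
df["Pct_Change_Builtin"] = df["Sales"].pct_change() * 100

print(df)
```

The two columns agree up to floating-point rounding. The manual version is the one to adapt when the calculation needs a custom baseline or an extra condition.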