Joining two Pandas DataFrames using merge()
The merge() function combines two DataFrames based on one or more columns with matching values. The basic idea is to identify columns that contain common data between the DataFrames and use them to align rows.
This article walks through joining two pandas DataFrames using merge(), covering the key concepts, parameters, and practical examples.

If the key column has the same name in both DataFrames, pass that name to the on parameter. For example:
import pandas as pd
# DataFrames to merge
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [24, 27, 22]})
# Merge DataFrames on the 'ID' column using an inner join
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
Merged df:
ID Name Age
0 1 Alice 24
1 2 Bob 27
This example performs an inner join, so the result includes only the rows whose ID values appear in both DataFrames (here, 1 and 2).
How Does the merge() Function Work in Pandas?
The core idea behind merge() is simple: you specify how the rows from two DataFrames should be aligned based on one or more keys (columns or indexes). The result is a new DataFrame that contains data from both original DataFrames.
Basic syntax of merge():
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None)
Where:
- left: The first DataFrame.
- right: The second DataFrame.
- how: Specifies the type of join (default is 'inner').
- on: Column(s) to join on. If not specified, Pandas will attempt to merge on columns with the same name in both DataFrames.
- left_on and right_on: Specify different columns from each DataFrame to join on if they don't share the same column names.
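The left_on and right_on parameters are easiest to see with a small example. The sketch below uses made-up data (the employees and salaries frames and their column names are assumptions for illustration, not part of the examples in this article) in which the key column has a different name in each DataFrame.

```python
import pandas as pd

# Hypothetical data: the key column is named differently in each frame.
employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
salaries = pd.DataFrame({"staff_id": [1, 2, 4], "salary": [50000, 60000, 55000]})

# left_on names the key in the left frame, right_on the key in the right frame.
merged = pd.merge(employees, salaries,
                  left_on="emp_id", right_on="staff_id", how="inner")

# Both key columns are kept in the result; drop the redundant one.
merged = merged.drop(columns="staff_id")
print(merged)
```

Only the keys 1 and 2 appear in both frames, so the inner join keeps two rows.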
The join type (the how parameter) determines which rows to keep based on matches between the two DataFrames. The following examples cover four join types: inner, outer, left, and right.
Combining Two Pandas DataFrames with merge(): Examples
1. Inner Join: Keeping Only Matching Rows
An inner join keeps rows from both DataFrames where there is a match in the specified column(s).
import pandas as pd
from IPython.display import display  # already available inside Jupyter notebooks
df1 = pd.DataFrame({"fruit": ["apple", "banana", "avocado"],
                    "market_price": [21, 14, 35]})
display("The first DataFrame")
display(df1)
df2 = pd.DataFrame({"fruit": ["banana", "apple", "avocado"],
                    "wholesaler_price": [65, 68, 75]})
display("The second DataFrame")
display(df2)
# joining the DataFrames on the shared "fruit" column
display("The merged DataFrame")
display(pd.merge(df1, df2, on="fruit", how="inner"))
Output:
fruit market_price wholesaler_price
0 apple 21 68
1 banana 14 65
2 avocado 35 75

2. Outer Join: Including All Rows from Both DataFrames
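An outer join includes all rows from both DataFrames. With how = "outer" (lowercase; the value is case-sensitive), the result contains every key found in df1 or df2, and any value that has no match in the other DataFrame is filled with NaN. In this example both DataFrames contain the same fruits, so no NaN values appear.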
pd.merge(df1, df2, on = "fruit", how = "outer")
Output:
fruit market_price wholesaler_price
0 apple 21 68
1 avocado 35 75
2 banana 14 65
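To see the NaN filling in action, the keys need to differ between the two DataFrames. The following sketch reuses the ID/Name/Age data from the first example (renamed left and right here so the fruit DataFrames are not overwritten) and adds indicator=True, an optional merge() parameter that records where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
right = pd.DataFrame({"ID": [1, 2, 4], "Age": [24, 27, 22]})

# Outer join: keep every ID found in either frame.
# indicator=True adds a "_merge" column with the values
# "both", "left_only", or "right_only".
outer = pd.merge(left, right, on="ID", how="outer", indicator=True)
print(outer)
```

ID 3 exists only in the left frame, so its Age is NaN. ID 4 exists only in the right frame, so its Name is NaN.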

3. Left Join: Keeping All Rows from the Left DataFrame
A left join keeps all rows from the left DataFrame, adding only matching rows from the right.
pd.merge(df1, df2, on = "fruit", how = "left")
Output:
fruit market_price wholesaler_price
0 apple 21 68
1 banana 14 65
2 avocado 35 75
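A left join only differs from an inner join when some left keys have no partner on the right. As a rough illustration, the sketch below uses the ID/Name/Age data from the first example (the names left and right are chosen here for clarity).

```python
import pandas as pd

left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
right = pd.DataFrame({"ID": [1, 2, 4], "Age": [24, 27, 22]})

# Left join: every row of `left` is kept, in its original order.
# ID 3 has no match in `right`, so its Age becomes NaN.
# ID 4 exists only in `right`, so it is dropped.
left_joined = pd.merge(left, right, on="ID", how="left")
print(left_joined)
```

Note that the Age column is converted to floating point, since NaN cannot be stored in a plain integer column.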

4. Right Join: Keeping All Rows from the Right DataFrame
A right join keeps all rows from the right DataFrame, adding only matching rows from the left.
pd.merge(df1, df2, on = "fruit", how = "right")
Output:
fruit market_price wholesaler_price
0 banana 14 65
1 apple 21 68
2 avocado 35 75
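A right join is the mirror image of a left join. The sketch below, again using the ID/Name/Age data from the first example, shows that swapping the two arguments and using how="left" keeps the same rows; only the column order differs.

```python
import pandas as pd

left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
right = pd.DataFrame({"ID": [1, 2, 4], "Age": [24, 27, 22]})

# Right join: every row of `right` is kept; ID 4 has no Name, so it is NaN.
right_joined = pd.merge(left, right, on="ID", how="right")
print(right_joined)

# Same rows via a left join with the arguments swapped.
# The columns now come out as ID, Age, Name instead of ID, Name, Age.
swapped = pd.merge(right, left, on="ID", how="left")
print(swapped)
```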

Key Takeaways
Here are the main points to remember when joining two DataFrames using merge():
- Common Columns: Ensure that the columns you are joining on are correctly identified and named.
- Join Types: Choose the appropriate join type (inner, left, right, outer) based on your data and analysis needs.
- Handling Duplicates: Use suffixes to manage duplicate column names that arise from the merge.
- Index vs Columns: Decide whether to join on columns or indexes using the on, left_on, right_on, left_index, and right_index parameters.
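The last two points can be sketched briefly. The store_a and store_b frames below are hypothetical data invented for this illustration: both contain a price column in addition to the fruit key, so merge() has to rename the overlapping column.

```python
import pandas as pd

# Hypothetical data: both frames have a non-key column called "price".
store_a = pd.DataFrame({"fruit": ["apple", "banana"], "price": [21, 14]})
store_b = pd.DataFrame({"fruit": ["apple", "banana"], "price": [68, 65]})

# suffixes renames overlapping non-key columns (the default is ("_x", "_y")).
by_column = pd.merge(store_a, store_b, on="fruit", suffixes=("_a", "_b"))
print(by_column)

# Joining on the index instead of a column:
# left_index/right_index tell merge() to use each frame's index as the key.
by_index = pd.merge(
    store_a.set_index("fruit"),
    store_b.set_index("fruit"),
    left_index=True,
    right_index=True,
    suffixes=("_a", "_b"),
)
print(by_index)
```

Both results hold the same prices; the index-based version keeps fruit as the index rather than as a regular column.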