Append Pandas DataFrames Using for Loop

Last Updated : 20 Dec, 2024

When dealing with large datasets, we often need to combine several DataFrames into a single DataFrame. A common approach is to build or collect the DataFrames with a for loop and then pass them to pd.concat(). Let us consider an example:

Python

import pandas as pd
import numpy as np

# Create some example DataFrames
dataframes = [pd.DataFrame(np.random.rand(10, 5)) for _ in range(100)]

# Efficient way: collect in a list and concatenate once
combined_df = pd.concat(dataframes, ignore_index=True)

# Display the result
print(combined_df)

Output: [screenshot omitted: the printed 1000-row, 5-column DataFrame]

Here we generate 100 DataFrames, each with 10 rows and 5 columns, using a list comprehension (a compact for loop) and collect them in a list. We then call concat() once on the whole list. This is much faster and more memory efficient than calling concat() inside the loop, because each such call copies all the rows accumulated so far into a new DataFrame.
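The efficiency claim can be checked with a small experiment. The sketch below builds the same combined DataFrame two ways, once by calling pd.concat() inside the loop and once by concatenating the whole list at the end, and then confirms that the results match. The frame count, the sizes and the seeded generator are arbitrary choices for illustration, and the printed timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Arbitrary sizes and seed, chosen only for illustration
rng = np.random.default_rng(0)
dataframes = [pd.DataFrame(rng.random((10, 5))) for _ in range(500)]

# Approach 1: concat inside the loop; every call copies the growing result
start = time.perf_counter()
grown = dataframes[0]
for df in dataframes[1:]:
    grown = pd.concat([grown, df], ignore_index=True)
loop_seconds = time.perf_counter() - start

# Approach 2: the pieces are already in a list, so concatenate them once
start = time.perf_counter()
combined = pd.concat(dataframes, ignore_index=True)
once_seconds = time.perf_counter() - start

# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(grown, combined)

print(combined.shape)
print(f"concat inside loop: {loop_seconds:.4f} s")
print(f"concat once:        {once_seconds:.4f} s")
```

The single call should be noticeably faster, and the gap widens as the number of pieces grows, since the in-loop version re-copies an ever larger result on every iteration.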

Let us consider another example. Here, 10 DataFrames are created with a list comprehension and stored in a list. Then concat() concatenates all of them.

Python

import pandas as pd

# Example DataFrames (Creating 10 DataFrames with simple values)
dfs = [pd.DataFrame({'A': [i, i+1], 'B': [i+2, i+3]}) for i in range(0, 10)]

# Concatenate all DataFrames in the list
result = pd.concat(dfs, ignore_index=False)

print(result)

Output: [screenshot omitted: the printed 20-row DataFrame with columns A and B]

From the output we can see that the DataFrames have been stacked one on top of another. Because concat() is called only once rather than in every iteration, no intermediate combined DataFrames are created, which makes this technique much more memory efficient and well suited to large datasets.
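The ignore_index flag decides which row labels the stacked result gets. In this minimal sketch, which reuses the same ten two-row DataFrames, the default ignore_index=False preserves each block's own labels 0 and 1, so the labels repeat and .loc[0] returns one row from every block. Setting ignore_index=True replaces them with a fresh 0 to 19 index.

```python
import pandas as pd

# The same ten two-row DataFrames as in the example above
dfs = [pd.DataFrame({'A': [i, i + 1], 'B': [i + 2, i + 3]}) for i in range(10)]

kept = pd.concat(dfs, ignore_index=False)  # each block keeps its labels 0 and 1
fresh = pd.concat(dfs, ignore_index=True)  # labels are replaced by 0..19

print(kept.index.tolist())
print(kept.index.is_unique)  # False, because every label appears ten times
print(kept.loc[0])           # one row from each of the ten blocks
print(fresh.index.tolist())
```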

Appending DataFrames with different columns

In some scenarios, the DataFrames we need to append have different column names. One way to handle this is to preprocess the columns so that every DataFrame has the same set, and then append them using a for loop and the concat() method.

Let us consider a scenario with three DataFrames, each with different column names. We first collect all the column names, then use reindex() inside the for loop so that each DataFrame has every column, and append the reindexed DataFrames to a list. Finally, we use concat() to concatenate all the DataFrames.

Python

import pandas as pd

# Creating 3 DataFrames with different columns
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

# List of DataFrames
dfs = [df1, df2, df3]

# List to store DataFrames for concatenation
df_list = []

# Get all columns across the DataFrames (sorted, because a set has no guaranteed order)
all_columns = sorted(set(df1.columns).union(df2.columns, df3.columns))

# For loop to append DataFrames, reindexing them to the same column set
for df in dfs:
    df = df.reindex(columns=all_columns)  # Reindex with all columns
    df_list.append(df)

# Concatenate all DataFrames
result = pd.concat(df_list, ignore_index=True)

print(result)

Output: [screenshot omitted: the printed 6-row DataFrame with columns A, B, C and D]

From the output we can see that wherever a DataFrame does not have a particular column, the corresponding cells are filled with NaN.
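Note that pd.concat() also performs this column alignment by itself. With the default join='outer' it takes the union of the column names and fills the gaps with NaN, so the explicit reindex() step is mainly useful when you want to control the column order. Passing join='inner' instead keeps only the columns that every DataFrame shares. A minimal sketch with the same three DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})
dfs = [df1, df2, df3]

# Outer join (default): union of the columns, missing cells become NaN
outer = pd.concat(dfs, ignore_index=True)

# Inner join: only the columns present in every DataFrame are kept
inner = pd.concat(dfs, ignore_index=True, join='inner')

print(outer)
print(inner)
```

Here the inner join keeps only column A, the one column all three DataFrames have in common.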

Append Pandas DataFrames Using for Loop - Examples

Example 1: Suppose we have a list of DataFrames. We iterate over the list and, in each iteration, use the concat() method to append the current DataFrame to the result, one at a time.

Python

import pandas as pd

# Create sample DataFrames with different columns
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# List of DataFrames to concatenate
dfs = [df1, df2]

# Initialize an empty DataFrame to concatenate into
result = pd.DataFrame()

# For loop to concatenate DataFrames
for df in dfs:
    result = pd.concat([result, df], ignore_index=True, sort=False)

print(result)

Output: [screenshot omitted: the printed 4-row DataFrame with columns A, B and C]

From the output we can see that all the columns are present in the final DataFrame, and cells with no value in a particular column are filled with NaN. However, this method is only suitable for small datasets: concat() creates a new DataFrame in every iteration, copying all the rows accumulated so far, which costs much more time and memory. For larger inputs, we can instead preprocess the DataFrames with reindex() and concatenate them in one go.
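The one-shot alternative mentioned above can be sketched as follows for the two DataFrames of Example 1. The column list is built in order of first appearance with dict.fromkeys(), each DataFrame is reindexed to it, and concat() is called a single time; the result is then compared with the loop version. The variable names looped, once and all_columns are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})
dfs = [df1, df2]

# Loop version from Example 1: a new DataFrame is built in every iteration
looped = pd.DataFrame()
for df in dfs:
    looped = pd.concat([looped, df], ignore_index=True, sort=False)

# One-shot version: union of the column names, in order of first appearance
all_columns = list(dict.fromkeys(col for df in dfs for col in df.columns))
aligned = [df.reindex(columns=all_columns) for df in dfs]
once = pd.concat(aligned, ignore_index=True)

# Raises AssertionError if the values differ; dtypes are skipped as a precaution,
# since concat's dtype handling for empty or all-NA pieces has changed between versions
pd.testing.assert_frame_equal(looped, once, check_dtype=False)
print(once)
```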

Example 2: Here we have three DataFrames. We iterate over them, append each one to a list, and finally use concat() to combine all the DataFrames in the list.

Python

import pandas as pd

# Create sample DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

# Append DataFrames to a list by iterating over them directly
df_list = []
for df in (df1, df2, df3):
    df_list.append(df)

# Concatenate all DataFrames in the list
result = pd.concat(df_list, ignore_index=True)

print(result)
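In practice, the DataFrames are often produced inside the loop itself, for example by reading one file per iteration. A minimal sketch of that pattern, assuming three small CSV inputs: the hypothetical csv_sources strings stand in for files on disk, and with real files you would pass each path to pd.read_csv() instead of wrapping text in io.StringIO.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for three files on disk
csv_sources = [
    "A,B\n1,3\n2,4\n",
    "A,B\n5,7\n6,8\n",
    "A,B\n9,11\n10,12\n",
]

# Read each source inside the loop and append the resulting DataFrame to a list
df_list = []
for text in csv_sources:
    df = pd.read_csv(io.StringIO(text))
    df_list.append(df)

# Concatenate once, after the loop has finished
result = pd.concat(df_list, ignore_index=True)
print(result)
```

This produces the same values as Example 2 while keeping the single concat() call outside the loop.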