Different ways to iterate over rows in a Pandas DataFrame
Iterating over rows in a Pandas DataFrame lets you access row-wise data for operations like filtering or transformation. The most common methods are iterrows(), itertuples() and apply(). However, row-by-row iteration can be slow on large datasets, so vectorized operations, which work on whole columns at once, are usually preferred.
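As a point of reference, here is a minimal sketch of the vectorized approach on the same sample sales data used in the examples below. It multiplies the Quantity and Price columns in one step instead of looping. The Total column name is our own illustrative choice, not part of the original examples.

```python
import pandas as pd

# Same sample sales data used throughout this article
data = {'Item': ['Apple', 'Banana', 'Orange'],
        'Quantity': [10, 20, 30],
        'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Vectorized: multiply whole columns at once, with no Python-level loop.
# 'Total' is an illustrative column name.
df['Total'] = df['Quantity'] * df['Price']
print(df)
```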
iterrows() is a good choice for smaller datasets, or when you need row-specific operations that can't easily be vectorized. For larger datasets or performance-critical tasks, consider faster options such as itertuples(), or vectorized column operations where the calculation allows it.
Method 1: Using iterrows - For smaller datasets
For example, suppose you have a DataFrame of sales data and want to calculate the total sales for each item by multiplying its quantity by its price. With iterrows(), you loop over the rows one at a time. Here's a quick look at how it works:
```python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],
        'Quantity': [10, 20, 30],
        'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Iterating over rows: iterrows() yields (index, row) pairs, each row a Series
for index, row in df.iterrows():
    total_sales = row['Quantity'] * row['Price']
    print(f"{row['Item']} Total Sales: ${total_sales}")
```
Output
```
Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0
```
In this example, iterrows() yields each row as an (index, Series) pair, and we calculate the total sales for each item. The following sections cover other ways to loop through rows, why the different methods exist, and when to use each.
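Because iterrows() turns each row into a Series, two behaviors are worth knowing. The sketch below assumes a small all-numeric DataFrame of our own rather than the sales data above. It shows that an integer column is upcast to float inside each row, and that assigning to the row object does not change the DataFrame.

```python
import pandas as pd

# An all-numeric frame (illustrative): one int64 column and one float64 column
df = pd.DataFrame({'Quantity': [10, 20], 'Price': [0.5, 0.3]})

# iterrows() yields (index, Series) pairs. A Series holds a single dtype,
# so here the integer Quantity is upcast to float64 inside each row.
_, first_row = next(df.iterrows())
print(first_row['Quantity'])   # 10.0, not 10
print(df['Quantity'].dtype)    # int64: the column itself is unchanged

# Each row here is a copy, so assigning to it does not update df
for _, row in df.iterrows():
    row['Price'] = 0.0
print(df['Price'].tolist())    # [0.5, 0.3]
```

To change values while iterating, write back through the DataFrame itself (for example with df.loc), or better, use a vectorized expression.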
Method 2: Using itertuples() - For larger datasets
itertuples() is a faster way to iterate over rows. Unlike iterrows(), it returns each row as a named tuple instead of a Series. Named tuples are much lighter to create than Series, so this is usually considerably faster, especially on larger datasets. It also preserves each column's data type, which makes it a good fit when you need quick, structured access to row values (for example, row.Quantity).
Example:
```python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],
        'Quantity': [10, 20, 30],
        'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Iterating using itertuples(): each row is a named tuple
for row in df.itertuples():
    total_sales = row.Quantity * row.Price
    print(f"{row.Item} Total Sales: ${total_sales}")
```
Output
```
Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0
```
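The named tuples returned by itertuples() also carry the row index. This minimal sketch, reusing the sales data, shows the default Index field and the index and name parameters. The variable names first and plain are only illustrative.

```python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],
        'Quantity': [10, 20, 30],
        'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# By default the row index is the first field, exposed as .Index
first = next(df.itertuples())
print(first.Index, first.Item)   # 0 Apple
print(first._fields)             # ('Index', 'Item', 'Quantity', 'Price')

# index=False drops the index field; name=None yields plain tuples,
# which can be unpacked directly in the for statement
plain = list(df.itertuples(index=False, name=None))
for item, qty, price in plain:
    print(item, qty * price)
```

One caveat: column names that are not valid Python identifiers (for example, names containing spaces) are renamed to positional names in the tuples, so attribute access like row.Quantity only works for identifier-friendly column names.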
Method 3: Using apply() - For complex row-wise transformations
The apply() method calls a function on each row (with axis=1) or on each column (the default, axis=0) of a DataFrame. With axis=1 this is not a vectorized operation: pandas still calls your Python function once per row and passes it a Series, so it runs at roughly loop speed, well behind true vectorized column operations.

apply() is valuable for readability rather than raw speed. It keeps complex row-wise or column-wise logic in one named function without an explicit loop. When the calculation can be written as column arithmetic, a vectorized expression will be much faster on large datasets.
```python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],
        'Quantity': [10, 20, 30],
        'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Defining a function to calculate total sales for each row
def calculate_total_sales(row):
    return f"{row['Item']} Total Sales: ${row['Quantity'] * row['Price']}"

# Applying the function row-wise (axis=1 passes each row as a Series)
result = df.apply(calculate_total_sales, axis=1)
for res in result:
    print(res)
```
Output
```
Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0
```
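To make the speed trade-off concrete, the sketch below computes the same per-row totals twice on the sample data: once with apply(axis=1), which calls a lambda for every row, and once with plain column arithmetic, which pandas evaluates in a single vectorized operation. The Total column name is illustrative.

```python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],
        'Quantity': [10, 20, 30],
        'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# apply(axis=1) calls the lambda once per row (a Python-level loop)
via_apply = df.apply(lambda row: row['Quantity'] * row['Price'], axis=1)

# Column arithmetic computes the same values in one vectorized step
via_columns = df['Quantity'] * df['Price']

# Both are Series aligned on df's index, so either can become a new column
df['Total'] = via_columns
print(via_apply.tolist() == via_columns.tolist())  # True
```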
Method 4: Index-based Iteration (iloc[] or loc[]) - For specific rows
Index-based iteration uses iloc[] (integer position-based) or loc[] (label-based) to access specific rows or slices of a DataFrame. This approach is straightforward and useful when you need precise control over which rows to access or modify. However, every lookup carries overhead, so it should be avoided for large-scale iteration, where it is much slower than vectorized operations.
```python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],
        'Quantity': [10, 20, 30],
        'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Iterate over rows using .iloc[] (integer position-based)
print("Using iloc[] for index-based iteration:")
for i in range(len(df)):
    print(f"Row {i}: {df.iloc[i]}")

# Iterate over rows using .loc[] (label-based)
print("\nUsing loc[] for label-based iteration:")
for i in df.index:  # Here, the index labels default to 0, 1, 2
    print(f"Row {i}: {df.loc[i]}")
```
Output
```
Using iloc[] for index-based iteration:
Row 0: Item Apple
Quantity 10
Price 0.5
Name: 0, dtype: object
Row 1: Item Banana
Quantity 20
Price 0.3
Name: 1, dty...
```
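When the goal is to modify particular rows, you often don't need a loop at all. This minimal sketch assumes the sales data from above; the new values (a Price of 0.35, a Quantity of 25) and the Quantity > 15 condition are arbitrary choices for illustration.

```python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],
        'Quantity': [10, 20, 30],
        'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# loc takes labels: set the 'Price' of the row labelled 1
df.loc[1, 'Price'] = 0.35

# iloc takes integer positions: last row, position of the 'Quantity' column
df.iloc[-1, df.columns.get_loc('Quantity')] = 25

# For condition-based updates, a boolean mask replaces an explicit loop;
# the right-hand Series is aligned on the index of the selected rows
df.loc[df['Quantity'] > 15, 'Item'] = df['Item'].str.upper()
print(df)
```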
Which Method to Use and When
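For small datasets or debugging, any of these methods works well. For larger datasets:
- Prefer vectorized column operations whenever the calculation allows it.
- Use itertuples() when you genuinely need a row-by-row loop, since it is faster than iterrows().
- Use apply() for readable row-wise or column-wise transformations that are hard to vectorize, keeping in mind that with axis=1 it still loops over rows internally.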
Method | Notes |
---|---|
iterrows() | Yields each row as a Series, which is slow for large datasets and can upcast mixed numeric types. |
itertuples() | Preserves data types and is faster than iterrows(). |
apply() | Concise for row-wise or column-wise transformations, but with axis=1 it still calls a Python function per row. |
Index-based | Useful for precise control but inefficient for large datasets. |
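To check these trade-offs on your own machine, a small timing harness like the one below can help. It is a hypothetical benchmark: the 10,000-row random dataset, the seed and the function names are assumptions, and no timings are shown because they depend on hardware and library versions. Each function computes the same Quantity * Price products, so the results can be compared as well as timed.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 10,000 rows of random quantities and prices
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({'Quantity': rng.integers(1, 100, n),
                   'Price': rng.random(n)})

def with_iterrows():
    return [row['Quantity'] * row['Price'] for _, row in df.iterrows()]

def with_itertuples():
    return [row.Quantity * row.Price for row in df.itertuples(index=False)]

def with_apply():
    return df.apply(lambda row: row['Quantity'] * row['Price'], axis=1).tolist()

def vectorized():
    return (df['Quantity'] * df['Price']).tolist()

# Time each approach; absolute numbers depend on hardware and pandas version
for fn in (with_iterrows, with_itertuples, with_apply, vectorized):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:>16}: {seconds:.4f} s")
```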