Combining Multiple Columns in Pandas groupby with Dictionary
Passing a dictionary to the .agg() method of a Pandas groupby lets you apply a different aggregation function to each column in a single call. The dictionary maps column names to aggregation functions, which is useful when the columns of a dataset need to be summarized in different ways.
Let's take an example of a sales dataset, where we group the data by the Store column and apply different aggregation functions to Sales and Quantity. The code below works in three steps:
- df.groupby('Store'): Groups the rows by the values in the Store column.
- .agg(agg_dict): Applies the aggregation functions specified in the dictionary. For Sales, it sums the values, and for Quantity, it calculates the mean.
- .reset_index(): Moves the group keys (Store) from the index back into a regular column, giving a flat DataFrame after aggregation.
import pandas as pd

# Sample DataFrame
data = {
    'Store': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple', 'Banana'],
    'Sales': [100, 150, 200, 100, 120, 180],
    'Quantity': [10, 20, 30, 40, 15, 35]
}
df = pd.DataFrame(data)
print(df)  # print() works in plain scripts; display() exists only in notebooks

# Grouping by 'Store' and applying a different aggregation function to each column
agg_dict = {
    'Sales': 'sum',      # Sum the 'Sales' column
    'Quantity': 'mean'   # Find the mean of the 'Quantity' column
}
result = df.groupby('Store').agg(agg_dict).reset_index()
print(result)
Output (aggregated result):
  Store  Sales  Quantity
0     A    370      15.0
1     B    480      35.0
Multiple Aggregations using GroupBy in Pandas DataFrame
You can also apply several aggregation functions to the same column by mapping that column to a list of functions. For example:
agg_dict = {
    'Sales': ['sum', 'mean'],      # Sum and mean for 'Sales'
    'Quantity': ['max', 'min']     # Max and min for 'Quantity'
}
result = df.groupby('Store').agg(agg_dict).reset_index()
print(result)
Output:
  Store Sales             Quantity
          sum        mean      max min
0     A   370  123.333333       20  10
1     B   480  160.000000       40  30
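The aggregation dictionary specifies multiple functions for both Sales and Quantity. The result includes the sum and mean for Sales, and the maximum and minimum for Quantity. Its columns form a two-level header, with the original column name on top and the function name below.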
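The two-level header can be awkward to work with downstream. The following is a minimal sketch of one way to select from it and flatten it, not the only approach. The names flat and total_a and the underscore-joined column names are illustrative choices, and reset_index() is deliberately applied after flattening so that every column label has two parts to join.

```python
import pandas as pd

# Same sample data as above, without the unused 'Product' column
df = pd.DataFrame({
    'Store': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 100, 120, 180],
    'Quantity': [10, 20, 30, 40, 15, 35],
})

result = df.groupby('Store').agg({
    'Sales': ['sum', 'mean'],
    'Quantity': ['max', 'min'],
})

# Each column label is a (column name, function name) tuple
print(result.columns.tolist())

# A single value is selected with the group key and a label tuple
total_a = result.loc['A', ('Sales', 'sum')]

# Flatten to single-level names such as 'Sales_sum' (illustrative naming)
flat = result.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
flat = flat.reset_index()
print(flat)
```

After flattening, each aggregated value can be read with an ordinary column name such as flat['Sales_sum'].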
Custom Aggregation Functions using GroupBy Function
If you want to use a custom aggregation function, you can pass a named function or a lambda as the dictionary value:
agg_dict = {
    'Sales': lambda x: x.max() - x.min(),  # Custom function: range of 'Sales'
    'Quantity': 'sum'                      # Sum for 'Quantity'
}
result = df.groupby('Store').agg(agg_dict).reset_index()
print(result)
Output:
  Store  Sales  Quantity
0     A     50        45
1     B    100       105
The custom function lambda x: x.max() - x.min() computes the range of Sales, while Quantity is summed.
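The same aggregation can be written with a named function instead of a lambda, which may be easier to reuse and document. This is a small sketch under that assumption; sales_range is a hypothetical helper name introduced here for illustration.

```python
import pandas as pd

# Same sample data as above, without the unused 'Product' column
df = pd.DataFrame({
    'Store': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 100, 120, 180],
    'Quantity': [10, 20, 30, 40, 15, 35],
})

def sales_range(values):
    """Hypothetical helper: difference between the largest and smallest value of a group."""
    return values.max() - values.min()

agg_dict = {
    'Sales': sales_range,   # Pass the function object itself, not a string
    'Quantity': 'sum',      # Built-in aggregations can still be named by string
}
result = df.groupby('Store').agg(agg_dict).reset_index()
print(result)
```

The function receives the values of one group at a time as a Series and should return a single value per group.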
Using a dictionary with groupby in Pandas makes it easy to perform different aggregations on different columns in a single call. It keeps the aggregation logic in one readable place and accepts built-in function names, lists of functions, and custom functions alike.