Binary operations on Pandas DataFrame and Series
Binary operations involve applying mathematical or logical operations on two objects, typically DataFrames or Series, to produce a new result. Let's learn how binary operations work in Pandas, focusing on their usage with DataFrames and Series.
The most common binary operations include:
- Arithmetic operations: Addition, subtraction, multiplication, division, etc.
- Comparison operations: Equal to, not equal to, greater than, less than, etc.
- Logical operations: And, or, etc.
Pandas performs these operations element-wise, pairing each value in one object with the value that has the same row (and column) label in the other. This is particularly useful when working with large datasets.
Binary Operations on Pandas Series
1. Arithmetic Operations on Series
Arithmetic operations between two Series are applied element-wise. Pandas first aligns the two Series on their index labels, then applies the operation to each matching pair. Labels that appear in only one of the Series still show up in the result, but their value is NaN.
Example: Adding Two Series
import pandas as pd
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
# Adding the two Series
result = s1 + s2
print(result)
Output
a    11
b    22
c    33
dtype: int64
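The alignment rule is easier to see when the indexes only partly overlap. The following is a minimal sketch with made-up data in which the two Series share only the labels 'b' and 'c'. It also shows the add() method with fill_value, one way to avoid NaN for labels present on just one side.

```python
import pandas as pd

# Hypothetical data: only the labels 'b' and 'c' appear in both Series
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# The + operator aligns on labels; 'a' and 'd' have no partner, so they become NaN
plain = s1 + s2

# add() with fill_value=0 substitutes 0 for a value missing on one side
filled = s1.add(s2, fill_value=0)

print(plain)   # a NaN, b 21.0, c 32.0, d NaN
print(filled)  # a 10.0, b 21.0, c 32.0, d 3.0
```

The result is float64 in both cases, because NaN cannot be stored in an integer Series.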
2. Comparison Operations on Series
Comparison operations return a Series of boolean values, indicating whether the comparison is True or False for each corresponding element.
Example: Checking Equality
import pandas as pd
s1 = pd.Series([10, 20, 30])
s2 = pd.Series([10, 25, 30])
# Comparing the two Series
result = s1 == s2
print(result)
Output
0     True
1    False
2     True
dtype: bool
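Comparisons are stricter about labels than arithmetic. As a rough sketch with made-up data: a comparison against a scalar is broadcast to every element, the == operator raises a ValueError when the two Series are not identically labeled, and the eq() method aligns on labels first.

```python
import pandas as pd

# Hypothetical data: the indexes overlap only on 'b' and 'c'
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([20, 30, 40], index=['b', 'c', 'd'])

# A scalar is compared against every element of the Series
above_15 = s1 > 15

# The == operator refuses to compare Series whose labels differ
try:
    s1 == s2
    operator_failed = False
except ValueError:
    operator_failed = True

# eq() aligns on labels; a label present on only one side compares as False
aligned_equal = s1.eq(s2)

print(above_15.tolist())       # [False, True, True]
print(operator_failed)         # True
print(aligned_equal.tolist())  # [False, True, True, False]
```

The related methods ne(), lt(), le(), gt(), and ge() follow the same pattern.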
Binary Operations on Pandas DataFrame
1. Arithmetic Operations on DataFrames
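As with Series, arithmetic operations between two DataFrames are applied element-wise.
Note: Pandas aligns the two DataFrames on both their index and their column labels. The shapes do not have to match, but any row or column label present in only one DataFrame produces NaN in the result.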
Example: Subtracting DataFrames
import pandas as pd
df1 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Subtracting the DataFrames
result = df1 - df2
print(result)
Output
    A   B
0   9  36
1  18  45
2  27  54
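To illustrate what happens when the labels do not fully match, here is a small sketch using a hypothetical second DataFrame that shares only column 'B' and rows 0 and 1 with the first. The sub() call with fill_value is one possible way to treat one-sided gaps as 0.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
# Hypothetical frame: only column 'B' and rows 0-1 overlap with df1
df2 = pd.DataFrame({'B': [4, 5], 'C': [7, 8]})

# The result covers the union of labels; cells without a partner are NaN
diff = df1 - df2

# fill_value=0 replaces a value missing on one side only;
# a cell missing in both frames (row 2, column 'C') stays NaN
filled = df1.sub(df2, fill_value=0)

print(diff)
print(filled)
```

In diff, only the cells (0, 'B') and (1, 'B') hold numbers (36 and 45), since those are the only positions present in both frames.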
2. Comparison Operations on DataFrames
As with Series, comparison operations on DataFrames return a DataFrame of boolean values. Each boolean indicates whether the corresponding pair of elements satisfies the comparison (equal to, greater than, and so on).
Example: Checking Greater Than
import pandas as pd
df1 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
df2 = pd.DataFrame({'A': [5, 15, 35], 'B': [30, 60, 55]})
# Checking if elements of df1 are greater than df2
result = df1 > df2
print(result)
Output
       A      B
0   True   True
1   True  False
2  False   True
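A boolean DataFrame is often summarized rather than read cell by cell. The sketch below reuses the same data and shows a few options one might reach for: the gt() method (equivalent to > here), reductions with all() and sum(), and a comparison against a scalar.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
df2 = pd.DataFrame({'A': [5, 15, 35], 'B': [30, 60, 55]})

# gt() gives the same result as df1 > df2 for identically labeled frames
mask = df1.gt(df2)

# Is every value in a column greater than its counterpart?
all_greater = mask.all()

# How many columns pass the test in each row? (True counts as 1)
passes_per_row = mask.sum(axis=1)

# A scalar is compared against every cell
over_25 = df1 > 25

print(all_greater.tolist())     # [False, False]
print(passes_per_row.tolist())  # [2, 1, 1]
print(over_25)
```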
Logical Operations on DataFrame and Series
Pandas also supports element-wise logical operations on DataFrames and Series through the operators & (AND), | (OR), and ^ (XOR). These are commonly used for filtering and applying conditions.
Example: Logical AND on Series
import pandas as pd
s1 = pd.Series([True, False, True])
s2 = pd.Series([False, False, True])
# Applying logical AND
result = s1 & s2
print(result)
Output
0    False
1    False
2     True
dtype: bool
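Since filtering is the most common use of these operators, a short sketch may help. The DataFrame and thresholds below are made up for illustration. Note that each comparison is wrapped in parentheses, because & and | bind more tightly than > and < in Python, and that the keywords and / or cannot be used on Series.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# AND: rows where both conditions hold
both = df[(df['A'] > 10) & (df['B'] < 60)]

# OR: rows where at least one condition holds
either = df[(df['A'] > 25) | (df['B'] < 45)]

# NOT: ~ inverts a boolean Series
negated = df[~(df['A'] > 10)]

print(both.index.tolist())     # [1]
print(either.index.tolist())   # [0, 2]
print(negated.index.tolist())  # [0]
```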
Handling Missing Data in Binary Operations
When performing binary operations on DataFrames or Series, missing data (NaN) can affect the results. Pandas handles missing data based on the operation:
- Arithmetic operations involving NaN generally return NaN (e.g., NaN + 1 = NaN).
- Logical operations involving NaN might return False or True, depending on the operation.
Example: Arithmetic with NaN
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
df2 = pd.DataFrame({'A': [1, None, 3], 'B': [None, 5, 6]})
# Adding the DataFrames
result = df1 + df2
print(result)
Output
     A     B
0  2.0   NaN
1  NaN   NaN
2  NaN  12.0
As seen above, where there is missing data (None or NaN), the result becomes NaN.
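If NaN results are not wanted, one option is the method form of the operator with fill_value. The sketch below applies it to the same two DataFrames, assuming that a missing value should count as 0 (whether that is appropriate depends on the data). It also shows that NaN never compares equal to itself, so isna() is the reliable way to detect it.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
df2 = pd.DataFrame({'A': [1, None, 3], 'B': [None, 5, 6]})

# Assumption: a value missing on one side should be treated as 0
filled = df1.add(df2, fill_value=0)
print(filled)  # A: 2.0, 2.0, 3.0   B: 4.0, 5.0, 12.0

# NaN is not equal to anything, including itself
s = pd.Series([1.0, None])
same = s == s
missing = s.isna()

print(same.tolist())     # [True, False]
print(missing.tolist())  # [False, True]
```

A cell that is missing in both DataFrames would still be NaN after add() with fill_value. This example has no such cell.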
By leveraging these operations, you can perform complex calculations, comparisons, and transformations on your data, making Pandas a powerful tool for data analysis.