How to Count Occurrences of a Specific Value in a Pandas Column?
Let's learn how to count occurrences of a specific value in a column of a Pandas DataFrame using the .value_counts() method and conditional filtering.
Count Occurrences of Specific Values using value_counts()
To count occurrences of values in a Pandas DataFrame, use the value_counts() method. Called on a column, it returns the frequency of each distinct value; called on the whole DataFrame, it returns the frequency of each distinct row.
Let's start by setting up a sample DataFrame to demonstrate these methods. The DataFrame contains four columns: 'name', 'subjects', 'marks' and 'age'.
# Example: Count Values in a DataFrame Column
import pandas as pd
# Create a DataFrame with 8 rows and 4 columns
data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})
count_sravan = data['name'].value_counts().get('sravan', 0)
print("Occurrences of 'sravan':", count_sravan)
count_ojaswi = data['name'].value_counts().get('ojaswi', 0)
print("Occurrences of 'ojaswi':", count_ojaswi)
Output:
Occurrences of 'sravan': 3
Occurrences of 'ojaswi': 1
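Each call selects a column, runs value_counts() on it, and looks up the value of interest. The examples use .get(value, 0) rather than the bracket form in the syntax below, because brackets raise a KeyError when the value does not occur in the column.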
Syntax: data['column_name'].value_counts()[value]
where
- data: the input DataFrame.
- column_name: the target column in the DataFrame.
- value: the specific string or integer value to be counted within the column.
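As a minimal sketch of how the two lookup styles differ, the snippet below uses the same names as the sample DataFrame plus a value, 'kiran', that is assumed for illustration and does not appear in the column.

```python
import pandas as pd

names = pd.Series(['sravan', 'ojsawi', 'bobby', 'rohith',
                   'gnanesh', 'sravan', 'sravan', 'ojaswi'], name='name')
counts = names.value_counts()

# Bracket lookup works when the value exists in the column.
present = counts['sravan']

# .get() returns the default instead of raising when the value is absent.
# 'kiran' is a hypothetical value used only for this illustration.
absent = counts.get('kiran', 0)

# Bracket lookup on an absent value raises KeyError.
try:
    counts['kiran']
    raised = False
except KeyError:
    raised = True

print(present, absent, raised)
```

If the value is guaranteed to be present, the bracket form is fine; otherwise .get() avoids wrapping the lookup in a try/except.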
We can also count the occurrences of a specific value in a Pandas column using the following methods.
Using Conditional Filtering with sum()
This method compares values in the column with the specified value and then sums up True values, representing the count.
count_sravan = (data['name'] == 'sravan').sum()
print("Occurrences of 'sravan':", count_sravan)
count_gnanesh = (data['name'] == 'gnanesh').sum()
print("Occurrences of 'gnanesh':", count_gnanesh)
Output:
Occurrences of 'sravan': 3
Occurrences of 'gnanesh': 1
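To make the two steps visible, here is a small sketch that keeps the intermediate boolean mask in its own variable before summing it. The variable names are chosen for illustration only.

```python
import pandas as pd

names = pd.Series(['sravan', 'ojsawi', 'bobby', 'rohith',
                   'gnanesh', 'sravan', 'sravan', 'ojaswi'], name='name')

# Step 1: the comparison yields one True/False per row.
mask = names == 'sravan'
print(mask.tolist())

# Step 2: True counts as 1 and False as 0, so the sum is the match count.
count = int(mask.sum())
print(count)

# The mean of the same mask is the fraction of rows that match.
fraction = float(mask.mean())
print(fraction)
```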
Using count() after Conditional Filtering
This approach filters the DataFrame for rows matching the condition, then uses count() to get the number of non-null values in the 'name' column of the result.
count_sravan = data[data['name'] == 'sravan'].count()['name']
print("Occurrences of 'sravan':", count_sravan)
count_bobby = data[data['name'] == 'bobby'].count()['name']
print("Occurrences of 'bobby':", count_bobby)
Output:
Occurrences of 'sravan': 3
Occurrences of 'bobby': 1
Using len() after Conditional Filtering
Similar to the previous method, this one uses len() to get the length of the filtered DataFrame directly.
count_sravan = len(data[data['name'] == 'sravan'])
print("Occurrences of 'sravan':", count_sravan)
count_rohith = len(data[data['name'] == 'rohith'])
print("Occurrences of 'rohith':", count_rohith)
Output:
Occurrences of 'sravan': 3
Occurrences of 'rohith': 1
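The count() and len() approaches give the same answer on the sample data, but they can differ once missing values are involved. The sketch below uses a small hypothetical DataFrame (not the one above) with one missing mark to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one of the 'sravan' rows has a missing mark.
df = pd.DataFrame({
    'name': ['sravan', 'sravan', 'sravan', 'bobby'],
    'marks': [98, np.nan, 89, 78],
})

matched = df[df['name'] == 'sravan']

# len() counts rows, regardless of missing values.
rows = len(matched)

# count() returns the number of non-null values per column.
non_null_name = int(matched.count()['name'])
non_null_marks = int(matched.count()['marks'])

print(rows, non_null_name, non_null_marks)
```

Indexing count() with the column used in the filter always matches len(), since a row that passed the equality test cannot be null in that column. The results diverge only when you index a different column.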
Using apply() with a Lambda Function
You can use apply() to run a custom function on each value in the column. This is useful when the match condition is more involved than a simple equality check, although it is generally slower than a vectorized comparison.
count_sravan = data['name'].apply(lambda x: x == 'sravan').sum()
print("Occurrences of 'sravan':", count_sravan)
count_ojaswi = data['name'].apply(lambda x: x == 'ojaswi').sum()
print("Occurrences of 'ojaswi':", count_ojaswi)
Output:
Occurrences of 'sravan': 3
Occurrences of 'ojaswi': 1
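As an illustration of the kind of customization apply() allows, the sketch below counts matches while ignoring case, surrounding whitespace, and missing values. The data and the helper is_sravan are hypothetical and not part of the sample DataFrame.

```python
import pandas as pd

# Hypothetical messy input: mixed case, stray spaces, and a missing value.
names = pd.Series(['Sravan', ' sravan ', 'SRAVAN', 'bobby', None])

def is_sravan(value):
    # Treat non-strings (such as missing values) as non-matches.
    return isinstance(value, str) and value.strip().lower() == 'sravan'

count = int(names.apply(is_sravan).sum())
print(count)

# The same rule written with vectorized string methods, for comparison.
vectorized = int(names.str.strip().str.lower().eq('sravan').sum())
print(vectorized)
```

When the rule can be expressed with the .str accessor, as here, the vectorized form is usually the better choice; apply() is the fallback for logic that cannot.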
Using np.sum() for Conditional Counting
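NumPy's np.sum() can also be applied directly to the boolean mask and gives the same count. Whether it is any faster than the Series' own .sum() depends on the pandas and NumPy versions and on the data, so benchmark on your own DataFrame before relying on it for performance.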
import numpy as np
count_sravan = np.sum(data['name'] == 'sravan')
print("Occurrences of 'sravan':", count_sravan)
count_java = np.sum(data['subjects'] == 'java')
print("Occurrences of 'java':", count_java)
Output:
Occurrences of 'sravan': 3
Occurrences of 'java': 3
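The following sketch shows three ways of reducing the same mask and checks that they agree. It makes no claim about which is fastest; the timeit module from the standard library can be used to measure that on real data.

```python
import numpy as np
import pandas as pd

names = pd.Series(['sravan', 'ojsawi', 'bobby', 'rohith',
                   'gnanesh', 'sravan', 'sravan', 'ojaswi'], name='name')
mask = names == 'sravan'

# pandas reduction on the boolean Series.
via_series = int(mask.sum())

# NumPy reduction called on the Series.
via_np_sum = int(np.sum(mask))

# NumPy reduction on the underlying array, bypassing the Series wrapper.
via_count_nonzero = int(np.count_nonzero(mask.to_numpy()))

print(via_series, via_np_sum, via_count_nonzero)
```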
Using Grouping and Aggregation
This approach is useful if you need counts for multiple values simultaneously. Group by the column and use .size() to count occurrences.
counts = data.groupby('name').size()
count_sravan = counts.get('sravan', 0)
print("Occurrences of 'sravan':", count_sravan)
subject_counts = data.groupby('subjects').size()
count_php = subject_counts.get('php', 0)
print("Occurrences of 'php':", count_php)
Output:
Occurrences of 'sravan': 3
Occurrences of 'php': 2
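To look up several values in one go, one option is to reindex the grouped counts with the values of interest. This is a sketch, and 'kiran' is a hypothetical value that does not appear in the column, included to show how absent values are handled.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
})

# One count per distinct name.
counts = df.groupby('name').size()

# Select several names at once; names not in the column get a count of 0.
wanted = ['sravan', 'bobby', 'kiran']
selected = counts.reindex(wanted, fill_value=0)

print(selected.to_dict())
```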
When to use each Pandas Method
The table below gives a combined overview of each method for counting occurrences in a Pandas column, along with when to use it and its code syntax:
Method | When to Use | Code Example |
---|---|---|
value_counts() | When you need to count all unique values in the column or just a specific value. | df['column_name'].value_counts().get('value', 0) |
Conditional Filtering with sum() | When you're focused on one value or need clear and simple conditional checks. | (df['column_name'] == 'value').sum() |
count() after Conditional Filtering | When you need the number of non-null values in a column among the rows matching a condition. | df[df['column_name'] == 'value'].count()['column_name'] |
len() after Conditional Filtering | When you need a simple count of rows matching a condition without concern for null values. | len(df[df['column_name'] == 'value']) |
apply() with Lambda Function | When you have complex element-wise logic or custom conditions. | df['column_name'].apply(lambda x: x == 'value').sum() |
np.sum() for Conditional Counting | When you are already working with NumPy functions; benchmark before assuming a speedup on large datasets. | np.sum(df['column_name'] == 'value') |
Grouping and Aggregation | When you need counts for multiple categories or groups in the same column. | df.groupby('column_name').size().get('value', 0) |
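As a quick sanity check, the sketch below runs every expression from the table on the sample 'name' column and confirms they return the same count. The helper count_all_ways is a hypothetical name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
})

def count_all_ways(frame, column, value):
    """Return the count of `value` in `column` from each method in the table."""
    col = frame[column]
    results = {
        'value_counts': col.value_counts().get(value, 0),
        'sum': (col == value).sum(),
        'count': frame[col == value].count()[column],
        'len': len(frame[col == value]),
        'apply': col.apply(lambda x: x == value).sum(),
        'np.sum': np.sum(col == value),
        'groupby': frame.groupby(column).size().get(value, 0),
    }
    # Convert NumPy integers to plain ints for easy comparison.
    return {method: int(n) for method, n in results.items()}

results = count_all_ways(df, 'name', 'sravan')
print(results)
```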