Get First and Second Largest Values in Pandas DataFrame
When analyzing data in Python using the pandas library, you may encounter situations where you need to find the highest and second-highest values in a DataFrame's columns. This task can be crucial in various contexts, such as ranking, filtering top performers, or performing threshold-based analysis. This article will guide you through different methods to extract the first and second highest values in Pandas columns.
Introduction
In data analysis, it's often necessary to identify the top values in a dataset for ranking, filtering, or summarizing purposes. Pandas, the popular Python data manipulation library, provides multiple ways to extract these top values from columns, either directly or by sorting. We will explore these methods in detail, starting with the first and second highest values and generalizing to the nth largest values.
Getting the First (Highest/Largest) Value
The simplest task is to get the highest value in a column. This can be done using the max() method, or by sorting the column in descending order and selecting the top value.
Method 1 - Using max()
We can call the max() method on a pandas Series (a single DataFrame column) to get its highest value.
import pandas as pd
# Sample DataFrame
data = {'Scores': [85, 92, 75, 91, 88]}
df = pd.DataFrame(data)
# Get the highest value
highest_value = df['Scores'].max()
print("Highest Value:", highest_value)
Output:
Highest Value: 92
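If you also need to know which row holds the highest value, Series.idxmax() returns the index label of that row. The sketch below reuses the same scores; the student names used as the index are an illustrative assumption.

```python
import pandas as pd

# Same scores as above, indexed by illustrative student names (an assumption)
scores = pd.Series([85, 92, 75, 91, 88],
                   index=['Arun', 'Vikas', 'Varun', 'Yogesh', 'Satyam'],
                   name='Scores')

top_label = scores.idxmax()  # index label of the first occurrence of the maximum
top_value = scores.max()     # the maximum value itself
print(f"Highest Value: {top_value} ({top_label})")
# → Highest Value: 92 (Vikas)
```

If the maximum occurs more than once, idxmax() reports only the first matching label.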
Method 2 - Finding the First Highest Value in Each Column
We can also call max() on the whole DataFrame to get the maximum of every column at once. For a text column such as Name, the "maximum" is the string that sorts last alphabetically.
import pandas as pd
# Sample DataFrame
data = {
'Scores': [85, 92, 75, 91, 88],
'Name': ['Arun', 'Vikas', 'Varun', 'Yogesh', 'Satyam'],
'Profit': [50.44, 69.33, 34.14, 56.13, -20.02]
}
df = pd.DataFrame(data)
# Get the highest value in each column
highest_value = df.max()
print("Highest Value:")
print(highest_value)
Output:
Highest Value:
Scores        92
Name      Yogesh
Profit     69.33
dtype: object
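Because the text column forces the result to dtype object, you may prefer to restrict the reduction to numeric columns. A minimal sketch using the numeric_only parameter of DataFrame.max(), with the same sample data:

```python
import pandas as pd

data = {
    'Scores': [85, 92, 75, 91, 88],
    'Name': ['Arun', 'Vikas', 'Varun', 'Yogesh', 'Satyam'],
    'Profit': [50.44, 69.33, 34.14, 56.13, -20.02],
}
df = pd.DataFrame(data)

# numeric_only=True skips non-numeric columns such as Name,
# so the result keeps a numeric dtype instead of object
numeric_max = df.max(numeric_only=True)
print(numeric_max)
```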
Method 3 - Sorting
Alternatively, we can sort the column in descending order and select the first value:
highest_value = df['Scores'].sort_values(ascending=False).iloc[0]
print("Highest Value:", highest_value)
Output:
Highest Value: 92
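Sorting has the advantage that one sorted Series can be reused for several ranks. A short sketch that reads both the highest and the second-highest score from a single descending sort (it assumes the Series has at least two values, otherwise iloc[1] raises IndexError):

```python
import pandas as pd

scores = pd.Series([85, 92, 75, 91, 88], name='Scores')

# Sort once in descending order, then read off positions 0 and 1
ranked = scores.sort_values(ascending=False)
highest = ranked.iloc[0]
second_highest = ranked.iloc[1]
print("Highest Value:", highest)                # Highest Value: 92
print("Second Highest Value:", second_highest)  # Second Highest Value: 91
```

For a large column where only a few top values are needed, the pandas documentation notes that nlargest() is faster than sort_values(ascending=False).head(n), since it does not have to sort everything.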
Getting the Second (Highest/Largest) Value
To find the second-highest value, we can use the nlargest() method, which returns the n largest values of a column in descending order. The call nlargest(2) returns the two highest values, and selecting the last of them gives us the second-highest value.
# Get the second highest value
second_highest_value = df['Scores'].nlargest(2).iloc[-1]
print("Second Highest Value:", second_highest_value)
Output:
Second Highest Value: 91
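When you need the whole row (for example, the name of the second-highest scorer) rather than just the value, DataFrame.nlargest(n, columns) returns the complete rows. A minimal sketch; the two-column DataFrame here is an assumption built from the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Scores': [85, 92, 75, 91, 88],
    'Name': ['Arun', 'Vikas', 'Varun', 'Yogesh', 'Satyam'],
})

# DataFrame.nlargest(n, columns) returns the rows with the n largest values
top_two = df.nlargest(2, 'Scores')
print(top_two)

# The last row of that result is the second-highest scorer
second_row = top_two.iloc[-1]
print(second_row['Name'], second_row['Scores'])  # Yogesh 91
```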
Getting the nth Largest Value
The concept of finding the second-highest value can be extended to find any nth largest value. The nlargest() function is particularly useful for this purpose.
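n = 3 # For the third-highest value
# Get the nth largest value: the last of the n largest values
nth_largest_value = df['Scores'].nlargest(n).iloc[-1]
# Note: the "rd" suffix in the label is only correct for n = 3
print(f"{n}rd Largest Value:", nth_largest_value)
Output: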
3rd Largest Value: 88
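One pitfall: if n is larger than the number of values, nlargest(n) simply returns every value, and iloc[-1] then silently gives the smallest one. The sketch below wraps the pattern in a hypothetical helper, nth_largest(), that drops missing values and raises an error for an out-of-range n instead:

```python
import pandas as pd

def nth_largest(series: pd.Series, n: int):
    """Return the nth largest value of `series` (duplicates counted separately).

    Hypothetical helper: raises ValueError instead of silently returning
    a wrong value when n is out of range.
    """
    values = series.dropna()
    if n < 1 or n > len(values):
        raise ValueError(f"n must be between 1 and {len(values)}, got {n}")
    return values.nlargest(n).iloc[-1]

scores = pd.Series([85, 92, 75, 91, 88], name='Scores')
for n in (1, 2, 3):
    print(n, nth_largest(scores, n))
# 1 92
# 2 91
# 3 88
```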
Handling Ties
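In cases where the highest value is tied (i.e., it occurs more than once in a column), nlargest() counts each occurrence separately. The expression nlargest(2).iloc[-1] therefore returns the tied value again, not the second distinct value:
data_with_ties = {
'A': [5, 9, 9, 1, 7],
'B': [10, 10, 6, 10, 4],
'C': [7, 9, 2, 9, 5]
}
df_with_ties = pd.DataFrame(data_with_ties)
# Take the 2nd of the two largest entries in each column (duplicates count separately)
second_highest_with_ties = df_with_ties.apply(lambda x: x.nlargest(2).iloc[-1])
print("Second highest values with ties:")
print(second_highest_with_ties)
Output:
Second highest values with ties:
A     9
B    10
C     9
dtype: int64
In every column the top value is tied, so the result equals the column maximum. That is the right answer when you want the second-ranked entry (for example, the score of the student ranked second, which equals the top score when two students share first place). If you need the second-highest distinct value instead (7, 6 and 7 for columns A, B and C), remove duplicates before ranking, for example with x.drop_duplicates().nlargest(2).iloc[-1].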
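A minimal sketch of the distinct-value variant, applied to the same tied data. It drops repeated values in each column before calling nlargest(), so every distinct value is ranked only once (it assumes each column has at least two distinct values):

```python
import pandas as pd

df_with_ties = pd.DataFrame({
    'A': [5, 9, 9, 1, 7],
    'B': [10, 10, 6, 10, 4],
    'C': [7, 9, 2, 9, 5],
})

# Drop repeated values first so each distinct value is ranked only once
second_distinct = df_with_ties.apply(
    lambda x: x.drop_duplicates().nlargest(2).iloc[-1]
)
print(second_distinct)
# A    7
# B    6
# C    7
# dtype: int64
```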
Handling Missing Data
If our DataFrame contains missing values (NaN), we need to handle them appropriately. nlargest() ranks NaN values after all non-missing values, so they do not affect the result as long as a column has at least n non-missing values:
data_with_nan = {
'A': [5, 9, None, 1, 7],
'B': [3, None, 6, 10, 4],
'C': [None, 3, 2, 9, 5]
}
df_with_nan = pd.DataFrame(data_with_nan)
second_highest_with_nan = df_with_nan.apply(lambda x: x.nlargest(2).iloc[-1])
print("Second highest values with NaN:")
print(second_highest_with_nan)
Output:
Second highest values with NaN:
A    7.0
B    6.0
C    5.0
dtype: float64
This will correctly calculate the second-highest values, ignoring the NaN entries.
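The NaN handling only helps while a column still has enough non-missing values. Rather than relying on where nlargest() places missing values, the sketch below uses a hypothetical helper, second_largest_or_nan(), that drops them explicitly and returns NaN when fewer than two values remain. The sparse column B is made-up data for illustration.

```python
import numpy as np
import pandas as pd

def second_largest_or_nan(x: pd.Series) -> float:
    """Hypothetical helper: second-largest non-missing value, or NaN if fewer than two."""
    values = x.dropna()
    if len(values) < 2:
        return np.nan
    return values.nlargest(2).iloc[-1]

df_sparse = pd.DataFrame({
    'A': [5, 9, None, 1, 7],
    'B': [None, None, None, 10, None],  # only one non-missing value
})
result = df_sparse.apply(second_largest_or_nan)
print(result)
```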
Conclusion
Finding the first, second, or nth largest values in a Pandas DataFrame is a common task that can be handled efficiently using functions like max(), nlargest(), and sort_values(). Whether you're ranking students by their scores or identifying top performers in a dataset, these techniques let you extract the information you need quickly. By handling missing data and ties deliberately and choosing the appropriate function, you can ensure accurate and meaningful results in your data analysis tasks.