Get First and Second Largest Values in Pandas DataFrame

Last Updated : 05 Sep, 2024

When analyzing data with the pandas library, you often need the highest and second-highest values in a DataFrame's columns, for example when ranking, filtering top performers, or applying thresholds. This article walks through several ways to extract those values.

Introduction

Pandas provides several ways to extract the top values from a column, either directly or by sorting. We start with the first and second highest values and then generalize to the nth largest value.

Getting the First (Highest/Largest) Value

The simplest task is to get the highest value in a column. This can be done with the max() function or by sorting the column in descending order and selecting the first value.

Method 1 - Using max()

Calling max() on a pandas Series returns its highest value.

Python

import pandas as pd

# Sample DataFrame
data = {'Scores': [85, 92, 75, 91, 88]}
df = pd.DataFrame(data)

# Get the highest value
highest_value = df['Scores'].max()
print("Highest Value:", highest_value)

Output:

Highest Value: 92

Method 2 - Finding the First Highest Value in Each Column

We can also call df.max() on the whole DataFrame to get the maximum of every column at once. For a text column, the maximum is the string that sorts last alphabetically.

Python

import pandas as pd

# Sample DataFrame
data = {
    'Scores': [85, 92, 75, 91, 88],
    'Name': ['Arun', 'Vikas', 'Varun', 'Yogesh', 'Satyam'],
    'Profit': [50.44, 69.33, 34.14, 56.13, -20.02]
}
df = pd.DataFrame(data)

# Get the highest value in every column
highest_values = df.max()
print("Highest Values:")
print(highest_values)

Output:

Highest Values:
Scores        92
Name      Yogesh
Profit     69.33
dtype: object
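
When only the numeric columns matter, the text column can be left out of the reduction. The sketch below uses the numeric_only parameter of DataFrame.max() on a small frame like the one above; the variable name numeric_max is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Scores': [85, 92, 75, 91, 88],
    'Name': ['Arun', 'Vikas', 'Varun', 'Yogesh', 'Satyam'],
    'Profit': [50.44, 69.33, 34.14, 56.13, -20.02],
})

# Restrict the reduction to numeric columns; the text column is skipped
numeric_max = df.max(numeric_only=True)
print(numeric_max)
```

The result is a Series indexed by the numeric column names only, which avoids mixing strings and numbers in one result.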

Method 3 - Sorting

Alternatively, we can sort the column in descending order and select the first value:

Python

highest_value = df['Scores'].sort_values(ascending=False).iloc[0]
print("Highest Value:", highest_value)

Output:

Highest Value: 92

Getting the Second (Highest/Largest) Value

To find the second-highest value, we can use the nlargest() function, which returns the n largest values of a column in descending order. Calling nlargest(2) gives the two highest values, and selecting the last of them with .iloc[-1] gives the second highest.

Python

# Get the second highest value
second_highest_value = df['Scores'].nlargest(2).iloc[-1]
print("Second Highest Value:", second_highest_value)

Output:

Second Highest Value: 91
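
Often the whole row is wanted, not just the value. A minimal sketch, assuming a frame with a score column and a name column: DataFrame.nlargest() takes the number of rows and the column to order by, and returns complete rows. The names top_two and second_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Scores': [85, 92, 75, 91, 88],
    'Name': ['Arun', 'Vikas', 'Varun', 'Yogesh', 'Satyam'],
})

# DataFrame.nlargest keeps whole rows, ordered by the given column
top_two = df.nlargest(2, 'Scores')

# The last of the two rows is the runner-up
second_row = top_two.iloc[-1]
print(second_row['Name'], second_row['Scores'])
```

This keeps the value and its accompanying fields together, so no second lookup is needed to find out who holds the second-highest score.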

Getting the nth Largest Value

The same idea extends to any nth largest value: call nlargest(n) and take the last element of the result.

Python

n = 3  # For third highest value

# Get the nth largest value
nth_largest_value = df['Scores'].nlargest(n).iloc[-1]
print(f"{n}rd Largest Value:", nth_largest_value)

Output

3rd Largest Value: 88
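
The pattern can be wrapped in a small reusable function. The sketch below is one possible shape, not a pandas API: nth_largest is a hypothetical helper, and the choice to raise an error when n is larger than the number of available values is an assumption about the desired behavior.

```python
import pandas as pd

def nth_largest(series: pd.Series, n: int):
    """Return the n-th largest non-missing value (n=1 is the maximum)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Drop missing values explicitly so the length check is reliable
    valid = series.dropna()
    if n > len(valid):
        raise ValueError(
            f"only {len(valid)} non-missing values, cannot take rank {n}"
        )
    return valid.nlargest(n).iloc[-1]

scores = pd.Series([85, 92, 75, 91, 88])
for n in (1, 2, 3):
    print(n, nth_largest(scores, n))
```

Validating n up front means a request for, say, the 6th largest of five values fails with a clear message instead of quietly returning a lower-ranked value.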

Handling Ties

nlargest() counts every occurrence of a value separately, so ties affect the result. When the highest value appears more than once, nlargest(2) returns that value twice, and the "second highest" is then equal to the maximum rather than the second distinct value:

Python

data_with_ties = {
    'A': [5, 9, 9, 1, 7],
    'B': [10, 10, 6, 10, 4],
    'C': [7, 9, 2, 9, 5]
}

df_with_ties = pd.DataFrame(data_with_ties)
second_highest_with_ties = df_with_ties.apply(lambda x: x.nlargest(2).iloc[-1])
print("Second highest values with ties:\n", second_highest_with_ties)

Output:

Second highest values with ties:
 A     9
B    10
C     9
dtype: int64

Every column here has a tied maximum, so each result is simply that maximum. To rank distinct values instead, remove duplicates (for example with drop_duplicates()) before calling nlargest().
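
If the second distinct value is what you want, one option is to remove repeated values before ranking, since nlargest() counts every occurrence separately. A minimal sketch reusing the sample data from this section (second_distinct is an illustrative name):

```python
import pandas as pd

df_with_ties = pd.DataFrame({
    'A': [5, 9, 9, 1, 7],
    'B': [10, 10, 6, 10, 4],
    'C': [7, 9, 2, 9, 5],
})

# Drop repeated values first so each value is ranked only once
second_distinct = df_with_ties.apply(
    lambda col: col.drop_duplicates().nlargest(2).iloc[-1]
)
print(second_distinct)
```

With duplicates removed, column A yields 7, column B yields 6, and column C yields 7.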

Handling Missing Data

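If our DataFrame contains missing values (NaN), we need to make sure they do not distort the ranking. The nlargest() function ranks valid numbers ahead of NaN, so missing entries do not affect the result as long as a column has enough non-missing values: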

Python

data_with_nan = {
    'A': [5, 9, None, 1, 7],
    'B': [3, None, 6, 10, 4],
    'C': [None, 3, 2, 9, 5]
}

df_with_nan = pd.DataFrame(data_with_nan)
second_highest_with_nan = df_with_nan.apply(lambda x: x.nlargest(2).iloc[-1])
print("Second highest values with NaN:\n", second_highest_with_nan)

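Output:

Second highest values with NaN:
 A    7.0
B    6.0
C    5.0
dtype: float64

Each column has at least two valid numbers, so the NaN entries do not affect the second-highest values. The results are floats because a numeric column containing NaN is stored as float64.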
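
The example above has at least two valid numbers in every column. When a column may have fewer, it is safer to handle that case explicitly. The sketch below is one possible approach: second_largest_or_nan is a hypothetical helper, not a pandas function, and returning NaN for sparse columns is an assumption about the desired behavior.

```python
import numpy as np
import pandas as pd

def second_largest_or_nan(col: pd.Series) -> float:
    """Second largest valid value, or NaN if fewer than two exist."""
    valid = col.dropna()  # remove missing entries explicitly
    if len(valid) < 2:
        return np.nan
    return valid.nlargest(2).iloc[-1]

sparse = pd.DataFrame({
    'A': [5, 9, None, 1, 7],
    'B': [None, 6, None, None, None],  # only one usable value
})
result = sparse.apply(second_largest_or_nan)
print(result)
```

Column A still reports 7.0, while column B, which has a single valid value, reports NaN instead of presenting its maximum as a second-highest value.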

Conclusion

Finding the first, second, or nth largest values in a Pandas DataFrame is a common task that can be handled efficiently using functions like max(), nlargest(), and sort_values(). Whether you're ranking students by their scores or identifying top performers in a dataset, these techniques will enable you to extract the necessary information quickly and effectively. By handling missing data and using the appropriate functions, you can ensure accurate and meaningful results in your data analysis tasks.

monkserndp4
