How to select a subset of a DataFrame?

Last Updated : 12 Jun, 2025

We often work with subsets of a dataset, whether extracting specific columns, filtering rows based on conditions, or both. In this guide, we’ll explore various ways to select subsets of data using the pandas library in Python. All examples use the nba.csv dataset.

Python
import pandas as pd

df = pd.read_csv("nba.csv")
df.head()

 Output

Selecting columns

To select a subset of a DataFrame, one common approach is isolating specific columns. This helps focus on only the relevant data you need for analysis or visualization.

Example 1: In this example, we are extracting just one column, "Age", from the dataset using square bracket notation.

Python
import pandas as pd
df = pd.read_csv("nba.csv")

a = df["Age"]
print(a.head())

Output: Single Column

Explanation: df["Age"] returns the column named "Age" as a pandas.Series and head() shows the first 5 rows by default.

Example 2: In this example, we are selecting multiple columns, "Name" and "Age", by passing a list of column names.

Python
import pandas as pd
df = pd.read_csv("nba.csv")

res = df[["Name", "Age"]]
print(res.head())

Output: Multiple Columns

Explanation: df[["Name", "Age"]] uses a list to select multiple columns and the result is still a DataFrame containing just the specified columns.
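The difference between single and double brackets can be checked directly. The sketch below uses a small made-up DataFrame as a stand-in for nba.csv (the player and team names are hypothetical), so it runs without the file.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv; the values are made up for illustration
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Team": ["Team X", "Team Y", "Team X"],
    "Age": [25, 31, 28],
})

single = df["Age"]            # single brackets with one name: a Series
single_df = df[["Age"]]       # a list with one name: a one-column DataFrame
subset = df[["Name", "Age"]]  # a list of names: a DataFrame in the listed order

print(type(single).__name__)     # Series
print(type(single_df).__name__)  # DataFrame
print(subset.shape)              # (3, 2)
```

Passing a one-element list is a handy way to keep a single column as a DataFrame when later code expects one.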

Selecting rows

To select a subset of a DataFrame, another key approach is filtering rows based on conditions. This allows us to focus only on records that meet specific criteria, like age or salary thresholds.

Example 1: In this example, we filter rows where the value in the "Age" column is greater than 30.

Python
import pandas as pd
df = pd.read_csv("nba.csv")

res = df[df["Age"] > 30]
print(res.head())

Output: Filtered Rows

Explanation: df["Age"] > 30 returns a Boolean Series with one True/False value per row. Passing that Series to df[...] keeps only the rows where the condition is True.

Example 2: In this example, we select players who are older than 30 and have a salary greater than 10 million.

Python
import pandas as pd
df = pd.read_csv("nba.csv")

res = df[(df["Age"] > 30) & (df["Salary"] > 10000000)]
print(res.head())

Output: Complex Filter

Explanation: We use & for combining conditions (logical AND). Each condition is enclosed in parentheses and the result is a filtered DataFrame with both criteria met.
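Conditions can be combined in other ways too: | for logical OR, ~ for NOT, and isin() to match any of several values. The following is a minimal sketch on a made-up DataFrame (hypothetical names and salaries, not the real nba.csv).

```python
import pandas as pd

# Hypothetical stand-in for nba.csv; the values are made up for illustration
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["Team X", "Team Y", "Team X", "Team Z"],
    "Age": [25, 31, 34, 22],
    "Salary": [5_000_000, 12_000_000, 8_000_000, 15_000_000],
})

# OR: older than 30, or earning more than 10 million
either = df[(df["Age"] > 30) | (df["Salary"] > 10_000_000)]

# NOT: invert a condition with ~
not_older = df[~(df["Age"] > 30)]

# isin: rows whose Team matches any value in the list
in_teams = df[df["Team"].isin(["Team X", "Team Z"])]

print(either["Name"].tolist())
print(not_older["Name"].tolist())
print(in_teams["Name"].tolist())
```

As with &, each condition needs its own parentheses, because |, & and ~ bind more tightly than comparison operators such as >.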

Selecting rows and columns together

To select a subset of a DataFrame based on both rows and columns, we use .loc[] (label-based, and it also accepts Boolean conditions) or .iloc[] (integer position-based). Both take a row selector followed by a column selector, so we can filter rows and extract only the relevant columns in a single step.

Example 1: In this example, we first select names of players older than 25, then select both name and age of players from the "Lakers" team.

Python
import pandas as pd

df = pd.read_csv("nba.csv")
a = df.loc[df["Age"] > 25, "Name"]
print(a.head())

b = df.loc[df["Team"] == "Lakers", ["Name", "Age"]]
print(b.head())

Output: Label-Based Filter

Explanation:

  • df.loc[condition, column] allows us to select rows that match a condition and specific columns.
  • First query returns a Series of names for players older than 25.
  • Second query returns a DataFrame with name and age for players on the Lakers team.
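Note that == is an exact string comparison, so the value must match what is stored in the Team column character for character. If the column holds full team names, a partial match with str.contains() may be what you want. The sketch below assumes made-up rows with full team names; check the actual values in your file (for example with df["Team"].unique()) before choosing either form.

```python
import pandas as pd

# Hypothetical rows; the assumption here is that Team stores full team names
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["Los Angeles Lakers", "Boston Celtics", "Los Angeles Lakers", None],
    "Age": [24, 31, 29, 27],
})

# Exact match: no row has the value "Lakers" on its own, so nothing is returned
exact = df.loc[df["Team"] == "Lakers", ["Name", "Age"]]

# Substring match; na=False treats missing team names as "no match"
partial = df.loc[df["Team"].str.contains("Lakers", na=False), ["Name", "Age"]]

print(len(exact))
print(partial)
```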

Example 2: In this example, we select the first 5 rows and first 3 columns using row and column index positions.

Python
import pandas as pd
df = pd.read_csv("nba.csv")

res = df.iloc[:5, :3]
print(res)

Output: Index-Based Slice

Explanation:

  • df.iloc[row_position, column_position] selects by integer position, regardless of the index labels.
  • :5 selects the rows at positions 0 to 4 (the end position is excluded).
  • :3 selects the first three columns (positions 0 to 2).
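The distinction between labels (.loc) and positions (.iloc) becomes visible once rows have been filtered, because the remaining rows keep their original index labels. This is a small sketch on made-up data (hypothetical names) rather than nba.csv.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv; the values are made up for illustration
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [25, 31, 34, 22],
})

older = df[df["Age"] > 30]   # the surviving rows keep their labels 1 and 2

first_by_position = older.iloc[0]  # first row of the filtered frame
first_by_label = older.loc[1]      # row whose index label is 1
# older.loc[0] would raise a KeyError, because label 0 was filtered out

# .loc slices include the end label; .iloc slices exclude the end position
by_label = df.loc[0:2, "Name"]     # labels 0, 1 and 2
by_position = df.iloc[0:2, 0]      # positions 0 and 1

print(first_by_position["Name"], first_by_label["Name"])
print(len(by_label), len(by_position))
```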

Selecting with .head(), .tail() and .sample()

To quickly view a subset of rows from the beginning, end, or randomly from a DataFrame, pandas provides convenient methods like .head(), .tail() and .sample().

Example 1: In this example, we fetch the first 10 rows with head(10) and the last 5 rows with tail(), which defaults to 5.

Python
import pandas as pd
df = pd.read_csv("nba.csv")

print(df.head(10))
print(df.tail())

Output: Top Rows, Bottom Rows

Explanation:

  • df.head(10) returns the first 10 rows from the top of the DataFrame.
  • df.tail() without arguments returns the last 5 rows by default.
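Both methods also accept a negative n, which means "everything except" that many rows from the opposite end. A minimal sketch on a made-up eight-row DataFrame (hypothetical names):

```python
import pandas as pd

# Hypothetical eight-row frame with the default index 0..7
df = pd.DataFrame({"Name": [f"Player {i}" for i in range(8)]})

top = df.head(3)                 # first 3 rows
bottom = df.tail()               # last 5 rows (default n=5)
all_but_last_two = df.head(-2)   # every row except the last 2
all_but_first_two = df.tail(-2)  # every row except the first 2

print(len(top), len(bottom))
print(list(all_but_last_two.index))
print(list(all_but_first_two.index))
```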

Example 2: In this example, we select 5 random rows.

Python
import pandas as pd
df = pd.read_csv("nba.csv")

res = df.sample(5)
print(res)

Output: Random Sample

Explanation: df.sample(5) randomly selects 5 rows from the DataFrame. The number inside sample() determines how many random rows you want.
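Because sample() draws rows at random, repeated runs usually return different rows. Passing random_state makes the draw reproducible, and frac selects a fraction of the rows instead of a fixed count. The sketch below uses a made-up ten-row DataFrame (hypothetical names); the seed values are arbitrary.

```python
import pandas as pd

# Hypothetical ten-row frame; the values are made up for illustration
df = pd.DataFrame({
    "Name": [f"Player {i}" for i in range(10)],
    "Age": list(range(20, 30)),
})

# The same random_state gives the same rows on every call
s1 = df.sample(n=3, random_state=42)
s2 = df.sample(n=3, random_state=42)
print(s1.equals(s2))  # True

# frac=0.5 returns half of the rows (5 of 10 here)
half = df.sample(frac=0.5, random_state=0)
print(len(half))  # 5
```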

