Pandas DataFrame.sample() | Python
The Pandas DataFrame.sample() function selects rows or columns at random from a DataFrame. It is particularly helpful with large datasets, where we want to test or analyze a small representative subset. We can set the number or proportion of items to sample with n or frac, and control randomness with random_state.
Example: Sampling a Single Random Row
In this example, we load a dataset and generate a single random row using the sample() method by setting n=1.
import pandas as pd
# Load dataset
d = pd.read_csv("employees.csv")
# Sample one random row
r_row = d.sample(n=1)
# Display the result
r_row
Output

The sample(n=1) function selects one random row from the DataFrame.
Syntax
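DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
Parameters:
- n: int value, number of items (rows by default) to return. Defaults to 1 if frac is also None. Cannot be used together with frac.
- frac: Float value, fraction of the items to return (frac * number of rows, rounded to a whole number). Cannot be used together with n.
- replace: Boolean value, samples with replacement if True, so the same row can be selected more than once.
- weights: str or array-like, optional. Per-item sampling weights; the default None gives every item equal probability. When sampling rows, a string names the column to use as weights.
- random_state: int value or numpy.random.RandomState, optional. If set to a particular integer, the same rows are returned on every run.
- axis: 0 or 'index' for rows and 1 or 'columns' for columns. The default None samples rows for a DataFrame.
- ignore_index: Boolean value, default False. If True, the result is relabeled 0, 1, ..., n - 1 (available since pandas 1.3).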
Return Type: New object of same type as caller.
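The weights and axis parameters are not used in the examples of this article, so here is a minimal sketch of both. The small DataFrame and its column names (name, salary, weight) are hypothetical stand-ins, not the contents of employees.csv.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "salary": [50, 60, 70, 80],
    "weight": [0, 0, 1, 1],
})

# axis=1 samples columns instead of rows: all 4 rows, 2 random columns
two_cols = df.sample(n=2, axis=1, random_state=0)
print(two_cols.shape)

# weights="weight" uses that column as sampling weights;
# rows with weight 0 can never be drawn
weighted = df.sample(n=2, weights="weight", random_state=0)
print(sorted(weighted["name"]))
```

Because only the last two rows have a non-zero weight, the weighted sample of two rows can only contain Cid and Dee.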
Examples of Pandas DataFrame.sample()
Example 1: Sample 25% of the DataFrame
In this example, we generate a random sample consisting of 25% of the entire DataFrame by using the frac parameter.
import pandas as pd
d = pd.read_csv("employees.csv")
# Sample 25% of the data
sr = d.sample(frac=0.25)
# Verify the number of rows
print(f"Original rows: {len(d)}")
print(f"Sampled rows (25%): {len(sr)}")
# Display the result
sr
Output

The printed row counts confirm that the sample contains 25% of the rows of the DataFrame (rounded to a whole number of rows). The rows are chosen at random, so they differ from run to run unless random_state is set.
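The same check can be run without the CSV file. The sketch below uses a hypothetical 8-row DataFrame, so frac=0.25 should yield 2 rows, and it also shows that passing n and frac together is rejected.

```python
import pandas as pd

# Hypothetical toy frame with 8 rows (stand-in for employees.csv)
df = pd.DataFrame({"id": range(8)})

# 25% of 8 rows is 2 rows
quarter = df.sample(frac=0.25)
print(f"Original rows: {len(df)}")
print(f"Sampled rows (25%): {len(quarter)}")

# n and frac are mutually exclusive: passing both raises ValueError
try:
    df.sample(n=2, frac=0.25)
    both_allowed = True
except ValueError:
    both_allowed = False
print(f"n and frac together allowed: {both_allowed}")
```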
Example 2: Sampling with Replacement and a Fixed Random State
This example samples multiple rows with replacement (i.e., the same row may be selected more than once) and fixes the random seed so that the result is reproducible.
import pandas as pd
d = pd.read_csv("employees.csv")
# Sample 3 rows with replacement and fixed seed
sd = d.sample(n=3, replace=True, random_state=42)
sd
Output

The replace=True parameter allows the same row to be sampled more than once, which is what bootstrapping requires. random_state=42 makes the result reproducible across multiple runs, which is very useful during testing and debugging.
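Both claims can be checked directly. This is a minimal sketch on a hypothetical 5-row DataFrame rather than employees.csv: two calls with the same seed are compared, and a sample larger than the DataFrame is drawn with and without replacement.

```python
import pandas as pd

# Hypothetical toy frame with 5 rows
df = pd.DataFrame({"id": range(5)})

# Same seed, same arguments: the two samples are identical
first = df.sample(n=3, replace=True, random_state=42)
second = df.sample(n=3, replace=True, random_state=42)
same = first.equals(second)
print(f"Identical samples: {same}")

# With replacement, the sample may be larger than the DataFrame;
# 10 draws from 5 rows must contain repeated rows
boot = df.sample(n=10, replace=True, random_state=42)
has_repeats = bool(boot.index.duplicated().any())
print(f"Bootstrap rows: {len(boot)}, repeats: {has_repeats}")

# Without replacement, asking for more rows than exist raises ValueError
try:
    df.sample(n=10)
    oversample_allowed = True
except ValueError:
    oversample_allowed = False
print(f"Oversampling without replacement allowed: {oversample_allowed}")
```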