Pandas DataFrame.dropna() Method
The DataFrame.dropna() method removes missing values (NaN or None) from a DataFrame. Depending on the axis, how, thresh and subset arguments you pass, it drops entire rows or columns. It is commonly used during data cleaning to discard incomplete records before analysis.
For Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None]})
print(df.dropna())
Output
     A    B
0  1.0  4.0
Explanation: By default, dropna() removes rows with any missing values. Row 0 has no missing data, so it's kept. Rows 1 and 2 contain NaN or None, so they're dropped. Only row 0 remains.
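Unless inplace=True is passed, dropna() leaves the DataFrame it is called on unchanged and returns a new one. A minimal sketch, reusing the DataFrame from the example above, that checks the original still has all three rows after the call:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None]})

cleaned = df.dropna()  # returns a new DataFrame; df is not modified

print(len(df))                 # 3: the original keeps every row
print(len(cleaned))            # 1: only row 0 has no missing values
print(cleaned.index.tolist())  # [0]
```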
Syntax
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters:
Parameter | Description |
---|---|
axis | 0 or 'index' to drop rows (default), 1 or 'columns' to drop columns |
how | 'any' (default): drop the row/column if any value is missing; 'all': drop it only if all values are missing |
thresh | Minimum number of non-NA values required to keep the row/column |
subset | Labels along the other axis to check for missing values (column names when dropping rows, row labels when dropping columns) |
inplace | If True, modifies the original DataFrame and returns None; if False (default), returns a new DataFrame |
Returns: A new DataFrame with the specified rows or columns removed, or None if inplace=True.
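Besides 0 and 1, pandas also accepts the string aliases 'index' and 'columns' for axis, which can make the intent easier to read. A short sketch with made-up data comparing the two spellings:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, np.nan, 6]})

by_number = df.dropna(axis=1)        # drop columns that contain NaN
by_name = df.dropna(axis='columns')  # same operation, string alias

print(by_number.columns.tolist())  # ['A']
print(by_number.equals(by_name))   # True

# axis='index' is the same as the default axis=0 (drop rows)
print(df.dropna(axis='index').index.tolist())  # [0, 2]
```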
Examples
Example 1: We drop rows only if all values are missing.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [np.nan, np.nan, 3], 'B': [None, np.nan, 4]})
print(df.dropna(how='all'))
Output
     A    B
2  3.0  4.0
Explanation: Rows 0 and 1 contain only missing values, so they are dropped. Row 2 is kept because it has valid values.
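how='all' also combines with axis=1, which is a common way to discard columns that are completely empty while keeping partly filled ones. A small sketch with made-up data contrasting it with the default how='any':

```python
import pandas as pd
import numpy as np

# Column 'B' is entirely empty; column 'C' is only partly empty
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [np.nan, np.nan, np.nan],
                   'C': [7, np.nan, 9]})

print(df.dropna(axis=1, how='all').columns.tolist())  # ['A', 'C']
print(df.dropna(axis=1, how='any').columns.tolist())  # ['A']
```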
Example 2: We drop columns that contain any missing values by setting axis=1.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, None, 6]})
print(df.dropna(axis=1))
Output
Empty DataFrame
Columns: []
Index: [0, 1, 2]
Explanation: Since both columns 'A' and 'B' have at least one missing value (NaN or None), using dropna(axis=1) drops them. This leaves an empty DataFrame with only row indices and no columns.
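When dropping columns, subset refers to row labels instead of column names, so only the listed rows are checked. The sketch below reuses the DataFrame from this example:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, None, 6]})

# With axis=1, subset lists row labels: only row 1 is checked,
# and 'B' is missing there, so column 'B' is dropped
print(df.dropna(axis=1, subset=[1]).columns.tolist())  # ['A']

# Row 0 has no missing values, so no column is dropped
print(df.dropna(axis=1, subset=[0]).columns.tolist())  # ['A', 'B']
```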
Example 3: We use thresh to keep rows that have at least 2 non-null values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [None, 5, None]})
print(df.dropna(thresh=2))
Output
Empty DataFrame
Columns: [A, B]
Index: []
Explanation: thresh=2 keeps only rows that have at least 2 non-null values. Each row in this DataFrame has only 1 non-null value, so all rows are dropped.
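One way to reason about thresh is to count the non-null values in each row and keep the rows whose count reaches the threshold. A sketch with made-up data whose rows have 2, 1 and 0 non-null values, comparing dropna(thresh=n) with that manual boolean mask:

```python
import pandas as pd
import numpy as np

# Rows with 2, 1 and 0 non-null values respectively
df = pd.DataFrame({'A': [1, np.nan, np.nan], 'B': [4, 5, np.nan]})

counts = df.notna().sum(axis=1)  # non-null values per row
print(counts.tolist())           # [2, 1, 0]

for n in (1, 2):
    kept = df.dropna(thresh=n).index.tolist()
    manual = df[counts >= n].index.tolist()  # keep rows with >= n non-null values
    print(n, kept, manual)
# 1 [0, 1] [0, 1]
# 2 [0] [0]
```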
Example 4: In this example, we use subset to drop rows based only on missing values in column 'A'.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None]})
print(df.dropna(subset=['A']))
Output
     A    B
0  1.0  4.0
2  3.0  NaN
Explanation: Only row 1, where column 'A' is NaN, is dropped. The missing value in column 'B' of row 2 is ignored because 'B' is not listed in subset.
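subset can also list several columns and be combined with how. A small sketch with made-up data where a row is dropped only if every column in subset is missing, compared with the default how='any':

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, np.nan],
                   'B': [np.nan, np.nan, 2],
                   'C': [np.nan, 3, 4]})

# Drop a row only when both 'A' and 'B' are missing; 'C' is not checked
print(df.dropna(subset=['A', 'B'], how='all').index.tolist())  # [0, 2]

# Default how='any': drop a row when either 'A' or 'B' is missing
print(df.dropna(subset=['A', 'B']).index.tolist())  # []
```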
Example 5: In this example, we use inplace=True to modify the DataFrame directly.
import pandas as pd
import numpy as np
df = pd.DataFrame({'X': [1, np.nan, 3], 'Y': [np.nan, 5, 6]})
df.dropna(inplace=True)
print(df)
Output
     X    Y
2  3.0  6.0
Explanation: Only the last row has no missing values, so it is the only row kept. With inplace=True, dropna() modifies df directly and returns None instead of a new DataFrame.
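Because dropna(inplace=True) returns None, assigning its result back to df is a common mistake. A sketch of the equivalent reassignment style, plus reset_index(drop=True) to renumber the remaining rows (an extra step not covered above, useful when the gaps left in the index matter):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'X': [1, np.nan, 3], 'Y': [np.nan, 5, 6]})

# With inplace=True the method returns None, so never assign its result
result = df.copy().dropna(inplace=True)
print(result)  # None

# Non-inplace style: reassign the returned DataFrame
df = df.dropna()
print(df.index.tolist())  # [2]

# Dropped rows leave gaps in the index; renumber from 0 if needed
df = df.reset_index(drop=True)
print(df.index.tolist())  # [0]
```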