Split a column in Pandas dataframe and get part of it
Last Updated: 21 Jan, 2019
When only part of a DataFrame column is needed, we can split the column on a delimiter and extract just that part.
We can use the Pandas .str accessor, which provides vectorized string operations on a Series (or Index) and returns a new Series, Index, or DataFrame rather than a single string. The accessor has a number of useful methods, and one of them is str.split, which can be combined with element-wise indexing to get the desired part of each string. To get the nth part, first split the column by the delimiter and then apply .str[n-1] to the returned object, i.e. DataFrame.columnName.str.split(" ").str[n-1].
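As a rough sketch of what the chain does at each step, the snippet below splits a small Series and then indexes into the resulting lists. The sample values and the variable names (s, parts, nth) are made up for illustration and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the two steps.
s = pd.Series(["a_b_c", "d_e_f"])

# Step 1: split each string on the delimiter.
# Every element of the result is a Python list of parts.
parts = s.str.split("_")

# Step 2: .str[i] indexes into each list element-wise (0-based),
# so the nth part sits at position n - 1.
n = 2
nth = parts.str[n - 1]

print(parts.tolist())  # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(nth.tolist())    # ['b', 'e']
```

The intermediate parts object is itself a Series, which is why .str can be applied to it a second time.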
Let's make it clear by examples.
Code #1: Print the first part of each split value as a Series.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})
# df looks like this (Geek_R holds random values):
#     Geek_ID  Geek_A  Geek_B  Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number
# Split Geek_ID on '_' and keep the first part of each value
print(df.Geek_ID.str.split('_').str[0])
Output:
0    Geek1
1    Geek2
2    Geek3
3    Geek4
4    Geek5
Name: Geek_ID, dtype: object
Code #2: Print the first part of each value as a list.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})
# df looks like this (Geek_R holds random values):
#     Geek_ID  Geek_A  Geek_B  Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number
# Same split as before, converted to a plain Python list
print(df.Geek_ID.str.split('_').str[0].tolist())
Output:
['Geek1', 'Geek2', 'Geek3', 'Geek4', 'Geek5']
Code #3: Print the second part of each value as a list.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})
# df looks like this (Geek_R holds random values):
#     Geek_ID  Geek_A  Geek_B  Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number
# Index 1 selects the second part, i.e. the text after '_'
print(df.Geek_ID.str.split('_').str[1].tolist())
Output:
['id', 'id', 'id', 'id', 'id']
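In practice the extracted part is often stored back in the DataFrame rather than printed. The following is a minimal sketch of that, under a few assumptions: the column names Geek_name and Geek_suffix are hypothetical, and the third row deliberately has no '_' to show what happens when the requested part does not exist. It also uses the expand=True option of str.split, which returns the parts as separate columns.

```python
import pandas as pd

# Hypothetical data: the last value has no '_' delimiter.
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3']})

# Store the first part in a new column (hypothetical name).
df['Geek_name'] = df['Geek_ID'].str.split('_').str[0]

# Requesting a part that is missing gives NaN instead of raising an error.
df['Geek_suffix'] = df['Geek_ID'].str.split('_').str[1]

# expand=True returns a DataFrame with one column per part (labelled 0, 1, ...).
both = df['Geek_ID'].str.split('_', expand=True)

print(df['Geek_name'].tolist())           # ['Geek1', 'Geek2', 'Geek3']
print(df['Geek_suffix'].isna().tolist())  # [False, False, True]
print(both.shape)                         # (3, 2)
```

Checking for missing values after the split, for example with isna(), is a simple way to spot rows where the delimiter was absent.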