PySpark - Converting JSON to DataFrame
Last Updated :
29 Jun, 2021
In this article, we are going to convert JSON data stored in a file into a PySpark DataFrame.
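Method 1: Using pandas.read_json()
pandas.read_json() reads a JSON file into a pandas DataFrame, which spark.createDataFrame() then converts into a PySpark DataFrame. Because pandas loads the whole file into memory on the driver, this method is best suited to small files.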
Syntax: pandas.read_json("file_name.json")
Here we are going to use this JSON file for demonstration:

Code:
# import pandas to read json file
import pandas as pd
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# creating a dataframe from the json file named student
dataframe = spark.createDataFrame(pd.read_json('student.json'))
# display the dataframe (Pyspark dataframe)
dataframe.show()
Output:
Method 2: Using spark.read.json()
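spark.read.json() reads JSON data from a file directly into a PySpark DataFrame, without going through pandas. By default it expects JSON Lines input, that is, one complete JSON object per line. For a file that holds a single JSON document or array spread over several lines, set the multiLine option to true.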
Syntax: spark.read.json('file_name.json')
JSON file for demonstration:

Code:
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# read json file
data = spark.read.json('college.json')
# display json data
data.show()
Output: