Pandas - Parsing JSON Dataset
JSON (JavaScript Object Notation) is a popular format for storing and exchanging data, used especially in web APIs and configuration files. Pandas provides tools to parse JSON data and convert it into structured DataFrames for analysis. In this guide we will explore ways to read, manipulate and normalize JSON datasets in Pandas.
Before working with JSON data we need to import pandas. If you're fetching JSON from a web URL or API you'll also need requests.
import pandas as pd
import requests
Reading JSON Files
To read JSON from a file or URL in pandas we use the read_json function. In the code below, path_or_buf is a file path, a URL or a file-like object containing the JSON data.
pd.read_json(path_or_buf)
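As a minimal sketch, read_json can be tried without a file on disk by wrapping JSON text in a file-like object. The sample records and the name df_people below are made up for illustration; orient='records' tells pandas the text is a list with one object per row.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: a JSON array of records (one object per row).
json_text = '[{"name": "Ada", "age": 36}, {"name": "Linus", "age": 28}]'

# StringIO turns the string into a file-like object, standing in for
# a file path or URL passed as path_or_buf.
df_people = pd.read_json(StringIO(json_text), orient='records')
print(df_people)
```

The same call works with a path such as a local .json file in place of the StringIO object.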
Create a DataFrame and Convert It to JSON
If you don't have a JSON file, create a small DataFrame and convert it to JSON using different orientations.
- orient='split': separates columns, index and data clearly.
- orient='index': maps each index label to a dictionary of that row's column values.
df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])
print(df.to_json(orient='split'))
print(df.to_json(orient='index'))
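Output:
{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}
{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}
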
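Other orientations follow the same pattern. The sketch below, using the same small DataFrame, shows orient='records' and then reads the 'split' output back with read_json. The variable names records_json, split_json and restored are chosen here for illustration.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# orient='records' emits one JSON object per row and drops the index.
records_json = df.to_json(orient='records')
print(records_json)

# Round trip: read the 'split' output back with the same orient
# so the index and column labels are restored.
split_json = df.to_json(orient='split')
restored = pd.read_json(StringIO(split_json), orient='split')
print(restored)
```

Passing the same orient to read_json that was used in to_json is what lets pandas rebuild the row and column labels.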
Read the JSON File directly from Web Data
You can fetch JSON data from online sources using the requests library and then convert it to a DataFrame. The example below requests JSON from an API endpoint, converts it to a DataFrame and displays the first rows.
- requests.get(url) fetches data from the URL.
- response.json() parses the response body into a Python dictionary or list.
- json_normalize() converts nested JSON into a flat table.
import pandas as pd
import requests
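url = 'https://jsonplaceholder.typicode.com/posts'
# Fetch the data; the timeout avoids hanging on an unresponsive server
response = requests.get(url, timeout=10)
# Raise an error for 4xx/5xx responses instead of parsing an error body
response.raise_for_status()
# Parse the JSON body and flatten it into a DataFrame
data = pd.json_normalize(response.json())
data.head()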
Output:

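To try the same conversion without network access, the sketch below applies json_normalize to a hand-written list shaped like what response.json() is assumed to return for this endpoint. The two records are made-up placeholders, and the field names (userId, id, title, body) are an assumption about the endpoint's schema.

```python
import pandas as pd

# Hypothetical stand-in for response.json(): a list of flat dictionaries.
# Field names and values are assumptions made for illustration.
sample_posts = [
    {"userId": 1, "id": 1, "title": "first post", "body": "hello"},
    {"userId": 1, "id": 2, "title": "second post", "body": "world"},
]

# Each dictionary becomes a row and each key becomes a column.
posts = pd.json_normalize(sample_posts)
print(posts.columns.tolist())
print(posts.shape)
```

For flat records like these, json_normalize gives the same table that pd.DataFrame(sample_posts) would; it becomes more useful once the records contain nested fields.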
Handling Nested JSON in Pandas
Sometimes JSON data contains layers, such as lists or dictionaries inside other dictionaries. This is called nested JSON. To turn deeply nested JSON into a table, use json_normalize() from pandas, which flattens it into columns that are easier to analyze and manipulate.
- json.load(f): Loads the raw JSON into a Python dictionary.
- json_normalize(d['programs']): Extracts the list under the programs key and flattens it into columns.
import json
import pandas as pd
from pandas import json_normalize
with open('/content/raw_nyc_phil.json') as f:
    d = json.load(f)
nycphil = json_normalize(d['programs'])
nycphil.head(3)
Output:

As the output above shows, this gives a readable table with columns such as id, orchestra and season. Working with JSON can seem confusing at first, especially when it is deeply nested. But with pandas and a little practice using json_normalize(), you can turn messy JSON into clean, tabular data.
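For practice without the dataset file, here is a small sketch of how json_normalize treats nested dictionaries and nested lists. The programs list is hypothetical: it only loosely imitates a list of programs, and every field name and value in it (including venue and works) is made up for illustration.

```python
import pandas as pd

# Hypothetical data: each program holds scalar fields, a nested
# dictionary (venue) and a nested list (works). All values are made up.
programs = [
    {
        "id": "p1",
        "orchestra": "Example Philharmonic",
        "season": "2020-21",
        "venue": {"name": "Main Hall", "city": "Springfield"},
        "works": [
            {"title": "Symphony No. 1", "composer": "Composer A"},
            {"title": "Overture", "composer": "Composer B"},
        ],
    },
    {
        "id": "p2",
        "orchestra": "Example Philharmonic",
        "season": "2021-22",
        "venue": {"name": "Side Hall", "city": "Springfield"},
        "works": [{"title": "Concerto", "composer": "Composer C"}],
    },
]

# Nested dictionaries are flattened into dotted column names,
# while nested lists stay as list objects in a single column.
flat = pd.json_normalize(programs)
print(flat.columns.tolist())

# record_path expands the nested list into one row per work;
# meta copies the chosen parent fields onto each of those rows.
works = pd.json_normalize(programs, record_path="works",
                          meta=["id", "orchestra", "season"])
print(works)
```

Using record_path with meta is a common choice when the rows you care about live inside a nested list but you still need identifying fields from the parent record.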