I am learning to process what i consider a complex json structure and am trying to load this into a dataframe. I want a single record for each outcome id. Here is the sample json structure
{
"_id": 12345,
"reports": [
{
"body": "\n***\nGeneral report text.",
"outcome": {
"comments": [],
"id": "1",
"status": {
"failed": {
"both": 0,
"human": 0,
"auto": 0,
"total": 0
},
"open": {
"both": 0,
"human": 0,
"auto": 0,
"total": 0
},
"passed": {
"both": 0,
"human": 0,
"auto": 1,
"total": 1
},
"code": {
"_input": 0,
"_output": 0,
"canceled": 0
},
"total": 1
}
},
"type": "outcome"
},
{
"body": "\n***\nGeneral report text.",
"outcome": {
"comments": [],
"id": "2",
"status": {
"failed": {
"both": 0,
"human": 0,
"auto": 0,
"total": 0
},
"open": {
"both": 0,
"human": 0,
"auto": 0,
"total": 0
},
"passed": {
"both": 0,
"human": 0,
"auto": 1,
"total": 1
},
"code": {
"_input": 0,
"_output": 0,
"canceled": 0
},
"total": 1
}
},
"type": "outcome"
}
]
}
The desired format in the dataframe is
report_id | outcome.id | body | outcome.comments | status.failed.both | status.failed.human | status.failed.auto so on and so forth
I would like all of the statuses in a single record (by outcome.id) and not sure how to normalize it all into a single record.
I've tried this but it does not give me the desired dataframe structure
df_reports = pd.json_normalize(data,record_path=['reports', 'outcome'],
meta=[
['reports','body'],
['outcome','comment'],
['outcome','id'],
['outcome','status']
]
idvalues, yet the JSON you show only has a single reportidvalue. Is your parsed JSON actually a list of dictionaries?df_reportsafterjson_normalize. You could also show dataframe with expected result. Maybe it will need to reformat it after usingjson_normalize"\n..."in your string rendering it invalid JSON. I have to assume that what you are showing is not JSON but rather JSON that has already been parsed into a dictionary. Then it is no longer a "json structure".