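Deeply nested JSON response to pandas dataframe
I am new to Python/pandas and I am having some trouble converting a nested JSON into a pandas dataframe. I am sending a query to a database and getting a JSON string back.
It is a deeply nested JSON string that contains several arrays. The response from the database contains thousands of rows. Here is the general structure of one row in the JSON string: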
{
"ID": "123456",
"profile": {
"criteria": [
{
"type": "type1",
"name": "name1",
"value": "7",
"properties": []
},
{
"type": "type2",
"name": "name2",
"value": "6",
"properties": [
{
"type": "MAX",
"name": "",
"value": "100"
},
{
"type": "MIN",
"name": "",
"value": "5"
}
]
},
{
"type": "type3",
"name": "name3",
"value": "5",
"properties": []
}
]
}
}
{
"ID": "456789",
"profile": {
"criteria": [
{
"type": "type4",
"name": "name4",
"value": "6",
"properties": []
}
]
}
}
I would like to flatten this JSON string using Python pandas.
from cassandra.cluster import Cluster
import pandas as pd
from pandas.io.json import json_normalize
def pandas_factory(colnames, rows):
    return pd.DataFrame(rows, columns=colnames)
cluster = Cluster(['xxx.xx.x.xx'], port=yyyy)
session = cluster.connect('nnnn')
session.row_factory = pandas_factory
json_string = session.execute('select json ......')
df = json_string._current_rows
df_normalized = json_normalize(df)
print(df_normalized)
When I run this code, I get a KeyError:
KeyError: 0
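I need help converting this JSON string into a dataframe. I am having trouble with json_normalize because the JSON string is deeply nested. I only need a few selected columns (the rest of the data can be skipped), so the result should look like this: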
ID     | criteria | type  | name  | value
123456 | 1        | type1 | name1 | 7
123456 | 2        | type2 | name2 | 6
123456 | 3        | type3 | name3 | 5
456789 | 1        | type4 | name4 | 6
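For reference, the target table above can be sketched with json_normalize once each row is already a Python dict. This is only a sketch: it assumes a pandas version that exposes pd.json_normalize (1.0 or later), the `records` list is hand-copied from the sample rows above, and every record is assumed to have a non-null profile with a criteria list.

```python
import pandas as pd

# Sample rows copied from the structure shown above, already parsed into dicts.
records = [
    {"ID": "123456", "profile": {"criteria": [
        {"type": "type1", "name": "name1", "value": "7", "properties": []},
        {"type": "type2", "name": "name2", "value": "6", "properties": [
            {"type": "MAX", "name": "", "value": "100"},
            {"type": "MIN", "name": "", "value": "5"},
        ]},
        {"type": "type3", "name": "name3", "value": "5", "properties": []},
    ]}},
    {"ID": "456789", "profile": {"criteria": [
        {"type": "type4", "name": "name4", "value": "6", "properties": []},
    ]}},
]

# One output row per entry of profile -> criteria, with the parent ID attached.
flat = pd.json_normalize(records, record_path=["profile", "criteria"], meta=["ID"])

# Number the criteria within each ID, starting at 1.
flat["criteria"] = flat.groupby("ID").cumcount() + 1

# Keep only the wanted columns, in the wanted order (drops "properties").
flat = flat[["ID", "criteria", "type", "name", "value"]]
print(flat)
```

The record_path argument walks down to the criteria array, and meta copies the top-level ID onto every resulting row.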
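I have tried to find similar questions here, but I can't seem to apply them to my JSON string.
Any help is appreciated! :)
EDIT:
What the query returns is not a JSON string but a query response object: a ResultSet. I think that is why I am having problems when using: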
json_string= session.execute('select json profile from visning')
temp = json.loads(json_string)
and getting the error:
TypeError: the JSON object must be str, not 'ResultSet'
EDIT #2:
Just to see what I am working with, I printed the query result using:
for line in session.execute('select json.....'):
    print(line)
and got something like this:
Row(json='{"ID": null, "profile": null}')
Row(json='{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}')
Row(json='{"ID": "456", "profile": {"criteria": []}}')
Row(json='{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}')
Row(json='{"ID": "987", "profile": {"criteria": [{"type": "type5", "name": "name5", "value": "70", "properties": []}, {"type": "type6", "name": "name6", "value": "60", "properties": []}, {"type": "type7", "name": "name7", "value": "2", "properties": []}, {"type": "type8", "name": "name8", "value": "7", "properties": []}]}}')
I am having trouble converting this structure into a JSON string that can be used in json.loads():
json_string= session.execute('select json profile from visning')
json_list = list(json_string)
string= ''.join(list(map(str, json_list)))
temp = json.loads(string)  # raises json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
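A small sketch of what seems to be going wrong in the join above, and one way around it. The `Row` namedtuple here is a hypothetical stand-in for the driver's row objects, assumed to behave like the `Row(json='...')` values printed earlier; no database is involved.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for the rows returned by session.execute(...).
Row = namedtuple("Row", ["json"])
rows = [
    Row(json='{"ID": "456", "profile": {"criteria": []}}'),
    Row(json='{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}'),
]

# str(row) gives the repr "Row(json='...')", which is not JSON,
# so the joined text cannot be decoded.
joined = "".join(map(str, rows))
try:
    json.loads(joined)
    join_failed = False
except json.JSONDecodeError:
    join_failed = True

# Parsing the json attribute of each row separately works,
# because each attribute holds one complete JSON document.
parsed = [json.loads(row.json) for row in rows]
print(join_failed, parsed[0]["ID"], parsed[1]["ID"])
```

Even if the row text were stripped down to the bare JSON, concatenating several documents back to back would still not form a single valid JSON value, so parsing row by row is the simpler route.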
EDIT #3:
As requested in the comments below, printing
for line in session.execute('select json.....'):
    print((line.json))
gives the output:
{"ID": null, "profile": null}
{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}
{"ID": "456", "profile": {"criteria": []}}
{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}
{"ID": "987", "profile": {"criteria": [{"type": "type5", "name": "name5", "value": "70", "properties": []}, {"type": "type6", "name": "name6", "value": "60", "properties": []}, {"type": "type7", "name": "name7", "value": "2", "properties": []}, {"type": "type8", "name": "name8", "value": "7", "properties": []}]}}
SOLVED
I was able to convert the JSON string into a dataframe by doing this (with @flevinkelming's solution):
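import json
import pandas as pd

# Parse each row's JSON text into a Python dict.
new_string = []
for line in session.execute('select json ....'):
    new_string.append(json.loads(line.json))
cols = ['ID', 'criteria', 'type', 'name', 'value']
rows = []
for data in new_string:
    data_id = data['ID']
    criteria = data['profile']['criteria']
    # Number the criteria from 1; enumerate avoids wrong positions
    # when two criteria entries are identical (list.index returns the first match).
    for position, d in enumerate(criteria, start=1):
        # Take every value except the last one ("properties").
        rows.append([data_id, position, *list(d.values())[:-1]])
df = pd.DataFrame(rows, columns=cols)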
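A more defensive variation on the loop above, as a sketch only. It assumes the input is a list of JSON strings like the print((line.json)) output in EDIT #3; `raw_lines` and `flatten` are names made up for this example. It skips rows whose profile is null, tolerates an empty criteria list, and picks fields by key instead of relying on the order of the dict values.

```python
import json
import pandas as pd

# Sample lines taken from the output shown in EDIT #3.
raw_lines = [
    '{"ID": null, "profile": null}',
    '{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}',
    '{"ID": "456", "profile": {"criteria": []}}',
    '{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}',
]

COLS = ["ID", "criteria", "type", "name", "value"]


def flatten(lines):
    """Return one dataframe row per criteria entry, numbered from 1 within each ID."""
    rows = []
    for line in lines:
        data = json.loads(line)
        profile = data.get("profile") or {}       # a null profile becomes {}
        criteria = profile.get("criteria") or []  # missing or null becomes []
        for position, item in enumerate(criteria, start=1):
            rows.append({
                "ID": data.get("ID"),
                "criteria": position,
                "type": item.get("type"),
                "name": item.get("name"),
                "value": item.get("value"),
            })
    return pd.DataFrame(rows, columns=COLS)


df = flatten(raw_lines)
print(df)
```

With the real query, the list could be built as [line.json for line in session.execute(...)], assuming each row exposes the json attribute as in EDIT #3.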
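Thanks to everyone who contributed! This was a great learning experience.
Can you provide a JSON with at least two rows? – MaxU
I updated the question and added another row – Shushu
@stovfl I added the output of print((line.json)), see EDIT #3 – Shushu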