2016-11-14 136 views

Reading nested JSON with pandas

I am curious how to use pandas to read nested JSON with the following structure:

{ 
    "number": "", 
    "date": "01.10.2016", 
    "name": "R 3932", 
    "locations": [ 
     { 
      "depTimeDiffMin": "0", 
      "name": "Spital am Pyhrn Bahnhof", 
      "arrTime": "", 
      "depTime": "06:32", 
      "platform": "2", 
      "stationIdx": "0", 
      "arrTimeDiffMin": "", 
      "track": "R 3932" 
     }, 
     { 
      "depTimeDiffMin": "0", 
      "name": "Windischgarsten Bahnhof", 
      "arrTime": "06:37", 
      "depTime": "06:40", 
      "platform": "2", 
      "stationIdx": "1", 
      "arrTimeDiffMin": "1", 
      "track": "" 
     }, 
     { 
      "depTimeDiffMin": "", 
      "name": "Linz/Donau Hbf", 
      "arrTime": "08:24", 
      "depTime": "", 
      "platform": "1A-B", 
      "stationIdx": "22", 
      "arrTimeDiffMin": "1", 
      "track": "" 
     } 
    ] 
} 

The following call keeps the array as JSON. I would rather have it expanded into columns.

pd.read_json("/myJson.json", orient='records') 

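Edit

Thanks for the first answers. I should refine my question: flattening the nested attributes in the array is not mandatory. Simply concatenating the names in df.locations['name'], as in [A, B, C], would be enough.

My file contains multiple JSON objects (one per line). I would like to keep the number, date, name and locations columns. However, I need to join the locations.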

allLocations = ""
isFirst = True
for location in result.locations:
    if isFirst:
        isFirst = False
        allLocations = location['name']
    else:
        allLocations += "; " + location['name']
allLocations

My approach here does not seem efficient or pandas style.


Upvoted for ÖBB –

Answers


You can use json_normalize:

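import json
import pandas as pd  # pandas >= 1.0 exposes json_normalize at the top level

# Load the file into a plain dict first; json_normalize works on parsed objects
with open('myJson.json') as data_file:
    data = json.load(data_file)

# One row per entry in 'locations'; date, number and name are repeated as metadata
df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'],
                       record_prefix='locations_')
print(df)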
  locations_arrTime locations_arrTimeDiffMin locations_depTime  \
0                                                        06:32
1             06:37                        1             06:40
2             08:24                        1

  locations_depTimeDiffMin           locations_name locations_platform  \
0                        0  Spital am Pyhrn Bahnhof                  2
1                        0  Windischgarsten Bahnhof                  2
2                                    Linz/Donau Hbf               1A-B

  locations_stationIdx locations_track number    name        date
0                    0          R 3932         R 3932  01.10.2016
1                    1                         R 3932  01.10.2016
2                   22                         R 3932  01.10.2016
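
Since the question mentions a file with one JSON object per line, here is a minimal sketch of the same json_normalize call applied to a list of parsed objects instead of a single dict. The two records are trimmed, and the second train ("R 3933") is hypothetical data made up for illustration. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize.

```python
import json
import pandas as pd

# Two records, one JSON object per line. Only a few fields are kept,
# and the second record ("R 3933") is hypothetical illustration data.
raw = (
    '{"number": "", "date": "01.10.2016", "name": "R 3932", "locations": '
    '[{"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32"}, '
    '{"name": "Linz/Donau Hbf", "depTime": ""}]}\n'
    '{"number": "", "date": "01.10.2016", "name": "R 3933", "locations": '
    '[{"name": "Linz/Donau Hbf", "depTime": "09:00"}]}'
)

# Parse each non-empty line into a dict
records = [json.loads(line) for line in raw.splitlines() if line.strip()]

# json_normalize accepts a list of dicts as well as a single dict
flat = pd.json_normalize(
    records,
    record_path="locations",
    meta=["date", "number", "name"],
    record_prefix="locations_",
)
print(flat)
```

Each location becomes one row, with the train-level fields repeated next to it.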

Edit:

You can use read_json, extract name by passing the locations to the DataFrame constructor, and finally use groupby with apply and join:

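import pandas as pd

# read_json repeats the scalar fields on one row per entry in 'locations'
df = pd.read_json("myJson.json")
# Build a frame from the location dicts and keep only the station name
df.locations = pd.DataFrame(df.locations.values.tolist())['name']
# Collapse back to one row per train, joining the station names
df = df.groupby(['date','name','number'])['locations'].apply(','.join).reset_index()
print(df)
        date    name number                                          locations
0 2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...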
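
For a file with one JSON object per line, a possible alternative is to let read_json parse the lines directly and join the names row by row, which avoids the groupby. This is only a sketch under assumptions: the inline sample stands in for the real file, the second record ("R 3933") is hypothetical, and convert_dates=False is passed so that "01.10.2016" stays a string instead of being parsed month first as in the output above.

```python
import io
import pandas as pd

# Inline stand-in for a line-delimited JSON file (one object per line).
# The second record ("R 3933") is hypothetical illustration data.
raw = (
    '{"number": "", "date": "01.10.2016", "name": "R 3932", "locations": '
    '[{"name": "Spital am Pyhrn Bahnhof"}, {"name": "Linz/Donau Hbf"}]}\n'
    '{"number": "", "date": "01.10.2016", "name": "R 3933", "locations": '
    '[{"name": "Linz/Donau Hbf"}]}\n'
)

# lines=True treats every line as a separate record;
# convert_dates=False keeps the date column as the original string
df = pd.read_json(io.StringIO(raw), lines=True, convert_dates=False)

# Each cell in 'locations' is a list of dicts; join the station names per row
df["locations"] = df["locations"].apply(
    lambda locs: "; ".join(loc["name"] for loc in locs)
)
print(df)
```

With a real file, the io.StringIO(raw) argument would be replaced by the file path.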

And would json be the raw file? Or the file path? –


In the docs it is 'Unserialized JSON objects', but I tested it with a dict. – jezrael


I added reading the file, please check it. – jezrael