2017-04-23

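Write JSON values from an array and a nested array into a single CSV

I have a JSON output from which I want to create a CSV file with two columns. The first column should contain the userId, and the second should contain the value of videoSeries. The output looks like this: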

{
    "start": 1490383076,
    "stop": 1492975076,
    "events": [
        {
            "time": 1491294219,
            "customParameters": [
                {
                    "group": "channelId",
                    "item": "dr3"
                },
                {
                    "group": "videoGenre",
                    "item": "unknown"
                },
                {
                    "group": "videoSeries",
                    "item": "min-mor-er-pink"
                },
                {
                    "group": "videoSlug",
                    "item": "min-mor-er-pink"
                }
            ],
            "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16"
        }
    ]
}

My CSV should look like this:

--------------------------------------------------------------
User ID                                      videoSeries
--------------------------------------------------------------
cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16   min-mor-er-pink
--------------------------------------------------------------

I have tried using ijson and pandas to get the desired output, but I cannot combine the values from the two different arrays into a single CSV:

import ijson 
import pandas as pd 

with open('MY JSON FILE', 'r') as f: 
    objects = ijson.items(f, 'events.item') 
    pandaReadable = list(objects) 

df = pd.DataFrame(pandaReadable, columns=['userId', 'customParameters']) 
df.to_csv('C:/Users/.../Desktop/output.csv', columns=['userId', 'customParameters'], index=False) 

Answers

Try this approach.

d is a dictionary built from your JSON:

In [150]: d 
Out[150]: 
{'events': [{'customParameters': [{'group': 'channelId', 'item': 'dr3'}, 
    {'group': 'videoGenre', 'item': 'unknown'}, 
    {'group': 'videoSeries', 'item': 'min-mor-er-pink'}, 
    {'group': 'videoSlug', 'item': 'min-mor-er-pink'}], 
    'time': 1491294219, 
    'userId': 'cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16'}], 
'start': 1490383076, 
'stop': 1492975076} 
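
For completeness, here is a minimal sketch of how a dictionary like d could be built with the standard-library json module. The inline string raw is a stand-in, assumed here, for the contents of your file. Note that json only accepts strictly valid JSON, so a trailing comma after the last element of an array or object would raise an error.

```python
import json

# Stand-in for the file contents (assumption: same structure as in the question).
# With a real file you could instead do:
#     with open(path) as f:
#         d = json.load(f)
raw = """
{
    "start": 1490383076,
    "stop": 1492975076,
    "events": [
        {
            "time": 1491294219,
            "customParameters": [
                {"group": "channelId", "item": "dr3"},
                {"group": "videoGenre", "item": "unknown"},
                {"group": "videoSeries", "item": "min-mor-er-pink"},
                {"group": "videoSlug", "item": "min-mor-er-pink"}
            ],
            "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16"
        }
    ]
}
"""

d = json.loads(raw)

# Each event carries its own userId plus a list of group/item pairs.
first_event = d["events"][0]
groups = [param["group"] for param in first_event["customParameters"]]
print(first_event["userId"])
print(groups)
```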

Solution:

In [153]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
     ...:     .query("group in ['videoSeries']")[['userId','item']]
     ...:
Out[153]:
                                       userId             item
2  cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16  min-mor-er-pink

If you need videoSeries as the column name:

In [154]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
     ...:     .query("group in ['videoSeries']")[['userId','item']] \
     ...:     .rename(columns={'item':'videoSeries'})
     ...:
Out[154]:
                                       userId      videoSeries
2  cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16  min-mor-er-pink
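
To get from that DataFrame to the CSV asked for in the question, the steps above could be wrapped up roughly as follows. This is only a sketch under a few assumptions: it uses pd.json_normalize, which is the top-level name in current pandas releases (the pd.io.json.json_normalize spelling was deprecated in pandas 1.0); it swaps query() for a plain boolean mask, which should select the same rows; and the helper name events_to_frame and the in-memory buffer are made up for illustration.

```python
import io

import pandas as pd

# Sample dictionary mirroring the structure shown in the question.
d = {
    "start": 1490383076,
    "stop": 1492975076,
    "events": [
        {
            "time": 1491294219,
            "customParameters": [
                {"group": "channelId", "item": "dr3"},
                {"group": "videoGenre", "item": "unknown"},
                {"group": "videoSeries", "item": "min-mor-er-pink"},
                {"group": "videoSlug", "item": "min-mor-er-pink"},
            ],
            "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16",
        }
    ],
}


def events_to_frame(data):
    """Hypothetical helper: one row per (userId, videoSeries) pair in data['events']."""
    # One row per customParameters entry, with userId carried along as metadata.
    flat = pd.json_normalize(
        data["events"], record_path="customParameters", meta=["userId"]
    )
    # Keep only the videoSeries entries and name the value column accordingly.
    series = flat.loc[flat["group"] == "videoSeries", ["userId", "item"]]
    return series.rename(columns={"item": "videoSeries"}).reset_index(drop=True)


df = events_to_frame(d)

# Written to an in-memory buffer here; pass a file path to to_csv() instead
# to produce an actual file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False keeps the DataFrame's row index out of the file, so the CSV contains only the userId and videoSeries columns.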