2015-02-06

I have a very long JSON file like this: http://pastebin.com/gzhHEYGy

How can I read this dictionary-type JSON file with pandas?

I want to load it into a pandas DataFrame so I can work with it, so, following the documentation, I did the following:

import pandas as pd

df = pd.read_json('/user/file.json')
print(df)

I get this traceback:

  File "/Users/user/PycharmProjects/PAN-pruebas/json_2_dataframe.py", line 6, in <module>
    df = pd.read_json('/Users/user/Downloads/54db3923f033e1dd6a82222aa2604ab9.json')
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 198, in read_json
    date_unit).parse()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 266, in parse
    self._parse_no_numpy()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 483, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 203, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 327, in _init_dict
    dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4620, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4668, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length

From earlier questions, I found that I need to do something like this

and then

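But I don't understand how to get the content as a numpy array. How can I get arrays of the same length out of a big file like this? Thanks in advance.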
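The traceback can be reproduced with a much smaller input. This is a sketch under an assumption: judging from the output shown in the answer, the file seems to have three top-level keys, "columns", "index" and "data", and the tiny values below are hypothetical stand-ins. With read_json's default orient='columns', each top-level key becomes a column, and since those lists have different lengths the DataFrame constructor raises a ValueError.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: three column names, but only two rows.
raw = ('{"columns": ["age", "gender", "text"], "index": [1, 2], '
       '"data": [["25-34", "FEMALE", "a"], ["25-34", "FEMALE", "b"]]}')

error = None
try:
    # The default orient for a DataFrame is 'columns', so every top-level key
    # becomes a column: "columns" has 3 values, "index" and "data" have 2 each.
    pd.read_json(io.StringIO(raw))
except ValueError as err:
    error = err

print(error)  # the exact wording of the message depends on the pandas version
```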


It seems to be structured like a dictionary. – skwoi 2015-02-06 19:42:43

Answer


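The read_json method doesn't work because the JSON file is not in the format it expects: by default it turns each top-level key into a column, and the lists under those keys have different lengths. Since a JSON file can easily be loaded as a dict, let's try this: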

import pandas as pd
import json
import os

os.chdir('/Users/nicolas/Downloads')

# Read the JSON file into a plain dict
with open('json_example.json') as json_data:
    data = json.load(json_data)

# Use DataFrame.from_dict. Note that the 'orient' parameter is not left at its
# default value (otherwise you get the same error as you had).
# Then transpose the resulting DataFrame and use the 'index' column as its index.
df = pd.DataFrame.from_dict(data, orient='index').T.set_index('index')
print(df)

Output:

                data columns 
index                   
311210177061863424 [25-34\n, FEMALE, @bikewa absolutely the best....  age 
310912785183813632 [25-34\n, FEMALE, Photo: I love the Burke-Gilm... gender 
311290293871849472 [25-34\n, FEMALE, Photo: Inhaled! #fitfoodie h... text 
309386414548717569 [25-34\n, FEMALE, Facebook Is Making The Most ... None 
312327801187495936 [25-34\n, FEMALE, Still upset about this &gt;&... None 
312249421079400449 [25-34\n, FEMALE, @JoeM_PM_UK @JonAntoine I've... None 
308692673194246145 [25-34\n, FEMALE, @Social_Freedom_ actually, t... None 
308995226633129984 [25-34\n, FEMALE, @seattleweekly that's more t... None 
308660851219501056 [25-34\n, FEMALE, @adamholdenbache I noticed 1... None 
308658690528014337 [25-34\n, FEMALE, @CEM_Social I am waiting pat... None 
309719798001070080 [25-34\n, FEMALE, Going to be watching Faceboo... None 
312349448049152002 [25-34\n, FEMALE, @anikamarketer I applied for... None 
312325152698404864 [25-34\n, FEMALE, @_chrisrojas_ wow, that's so... None 
310546490844135425 [25-34\n, FEMALE, Photo: Feeling like a bit of... None 
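If the goal is one column per field (age, gender, text) rather than a single column holding lists, the three parts of the dict can be passed straight to the DataFrame constructor. A minimal sketch, assuming the file has exactly the top-level keys "columns", "index" and "data", as the output above suggests; the sample rows are hypothetical, and the trailing "\n" in the age values mirrors what the output shows.

```python
import json

import pandas as pd

# Hypothetical miniature of the file; the layout (top-level keys "columns",
# "index" and "data") is an assumption inferred from the output above.
raw = r'''{"columns": ["age", "gender", "text"],
          "index": [311210177061863424, 310912785183813632],
          "data": [["25-34\n", "FEMALE", "first sample text"],
                   ["25-34\n", "FEMALE", "second sample text"]]}'''

# With a real file this would be json.load(f), as in the answer.
data = json.loads(raw)

# Row values come from "data", row labels from "index" and column names
# from "columns", so every field ends up in its own column.
df = pd.DataFrame(data['data'], index=data['index'], columns=data['columns'])

# The age values carry a trailing newline, so strip surrounding whitespace.
df['age'] = df['age'].str.strip()

print(df)
```

If the file really has only these three keys, it is also in the layout pandas calls the "split" orientation, so pd.read_json(path, orient='split') may load it in one step; because read_json by default tries to convert axis labels to dates, passing convert_axes=False may be needed to keep the large integer IDs as integers.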

Many thanks for the help, @knightofni – skwoi 2015-02-08 23:53:58