2016-11-07 191 views
-1

pandas - empty DataFrame

I want to load the following data into my pandas DataFrame:

jsons_data = pd.DataFrame(columns=['playlist', 'user', 'track', 'count'])

for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)
        #my json layout
        user = json_text.keys()
        playlist = 'all_playlists'
        track = [p for p in json_text.values()[0]]
        count = [p.values() for p in json_text.values()]
    print jsons_data

but I get an empty DataFrame:

[u'user1'] 
all_playlists 
[{u'Make You Feel My Love': 1.0, u'I See Fire': 1.0, u'High And Dry': 1.0, u'Fake Plastic Trees': 1.0, u'One': 1.0, u'Goodbye My Lover': 1.0, u'No Surprises': 1.0}] 
[[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]] 
[u'user2'] 
all_playlists 
[{u'Codex': 1.0, u'No Surprises': 1.0, u'O': 1.0, u'Go It Alone': 1.0}] 
[[1.0, 1.0, 1.0, 1.0]] 
[u'user3'] 
all_playlists 
[{u'Fake Plastic Trees': 1.0, u'High And Dry': 1.0, u'No Surprises': 1.0}] 
[[1.0, 1.0, 1.0]] 
[u'user4'] 
all_playlists 
[{u'No Distance Left To Run': 1.0, u'Running Up That Hill': 1.0, u'Fake Plastic Trees': 1.0, u'The Numbers': 1.0, u'No Surprises': 1.0}] 
[[1.0, 1.0, 1.0, 1.0, 1.0]] 
[u'user5'] 
all_playlists 
[{u'Wild Wood': 1.0, u'You Do Something To Me': 1.0, u'Reprise': 1.0}] 
[[1.0, 1.0, 1.0]] 
Empty DataFrame 
Columns: [playlist, user, track, count] 
Index: [] 

What is wrong with the code?

Edit:

The JSON files are all structured like this:

{ 
'user1':{ 
'Karma Police':1.0, 
'Roxanne':1.0, 
'Sonnet':1.0, 
'We Will Rock You':1.0, 
}} 
+1

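You initialize the DataFrame with no values and only some column names: `['playlist', 'user', 'track', 'count']`... what else did you expect? You never touch the `DataFrame` inside the loop, so how could the loop possibly affect it? –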

+0

I don't know. I'm learning. Maybe you can teach me. –

+0

This is not a tutorial service. However, I suggest you read the `pandas` [tutorial](http://pandas.pydata.org/pandas-docs/stable/dsintro.html). It should get you up and running in no time. –

Answers

1

OK, first let's make some dummy data to play with, which will make the problem easier to understand:

import pandas as pd

# Dummy data to play with
data1 = {
    'user1': {
        'Karma Police': 1.0,
        'Roxanne': 1.0,
        'Sonnet': 1.0,
        'We Will Rock You': 1.0,
    }
}

data2 = {
    'user2': {
        'Karma Police': 1.0,
        'Creep': 1.0,
    }
}

Let me illustrate the building block we will use below:

In : pd.DataFrame(data1).unstack() 

Out: 
user1 Karma Police  1.0 
     Roxanne    1.0 
     Sonnet    1.0 
     We Will Rock You 1.0 
dtype: float64 

# This is where you would normally iterate on the files 
mylist = [] 
for data in [data1, data2]: 
    # Make a dataframe then unstack, 
    # producing a series with a 2-multiindex as above 
    # and append it to the list
    mylist.append(pd.DataFrame(data).unstack()) 
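
A minimal sketch of the same loop reading real files, assuming Python 3 and that each file holds one `{user: {track: count}}` object as in the question. The temporary directory and the two sample files are hypothetical, created only so the sketch runs on its own; `path_to_json` and `json_files` reuse the names from the question.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical setup: write two small JSON files so the sketch is runnable.
path_to_json = tempfile.mkdtemp()
samples = {
    "user1.json": {"user1": {"Karma Police": 1.0, "Roxanne": 1.0}},
    "user2.json": {"user2": {"Karma Police": 1.0, "Creep": 1.0}},
}
for name, content in samples.items():
    with open(os.path.join(path_to_json, name), "w") as fh:
        json.dump(content, fh)

json_files = sorted(f for f in os.listdir(path_to_json) if f.endswith(".json"))

mylist = []
for js in json_files:
    with open(os.path.join(path_to_json, js)) as json_file:
        data = json.load(json_file)
    # One column per user, one row per track; unstack() turns that into
    # a Series indexed by (user, track)
    mylist.append(pd.DataFrame(data).unstack())

print(len(mylist))
```

Each element of `mylist` is a Series with a two-level index, so the list can be passed to `pd.concat` exactly like the dummy-data version.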

Now we concat that list and do a little cleanup:

merged = pd.concat(mylist) 
# Renaming to get the right column names 
merged.index.names = ['User', 'Track'] 
merged.name = 'Count' 
# Convert the Series to a DataFrame
merged = merged.to_frame() 
# Adding a new column with the same value throughout 
merged['Playlist'] = 'all_playlists' 


merged 

Out: [image: Output]

You can then call reset_index if you don't like it this way.
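
For illustration, a small sketch of what `reset_index` does here, assuming Python 3 and a reduced version of the dummy data from above. The name `flat` is only an illustrative choice.

```python
import pandas as pd

# Rebuild a small `merged` frame like the one above (dummy data)
data1 = {"user1": {"Karma Police": 1.0, "Roxanne": 1.0}}
data2 = {"user2": {"Karma Police": 1.0, "Creep": 1.0}}

merged = pd.concat([pd.DataFrame(d).unstack() for d in (data1, data2)])
merged.index.names = ["User", "Track"]
merged.name = "Count"
merged = merged.to_frame()
merged["Playlist"] = "all_playlists"

# Move the (User, Track) index levels into ordinary columns
flat = merged.reset_index()
print(flat.columns.tolist())  # ['User', 'Track', 'Count', 'Playlist']
```

After the call, `User` and `Track` are regular columns and the rows are numbered with a default integer index.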

+0

Great, thanks –

0

At the end of the loop, just add:

jsons_data.loc[index] = [playlist, user, track, count] 

It prints:

playlist     user \ 
0 decaf   [user1] 
1 decaf   [user2] 
2 decaf   [user3] 
3 decaf   [user4] 
4 decaf   [user5] 

               track \ 
0 [Make You Feel My Love, I See Fire, High And D... 
1    [Codex, No Surprises, O, Go It Alone] 
2 [Fake Plastic Trees, High And Dry, No Surprises] 
3 [No Distance Left To Run, Running Up That Hill... 
4  [Wild Wood, You Do Something To Me, Reprise] 

            count 
0 [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]] 
1     [[1.0, 1.0, 1.0, 1.0]] 
2      [[1.0, 1.0, 1.0]] 
3   [[1.0, 1.0, 1.0, 1.0, 1.0]] 
4      [[1.0, 1.0, 1.0]] 
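
The rows above hold whole lists in single cells. As a hedged alternative (not this answer's code), here is a sketch that stores one row per user and track instead, assuming Python 3. `parsed_files` is a hypothetical stand-in for the result of calling `json.load` on each file.

```python
import pandas as pd

# Dummy stand-ins for the parsed JSON files (same layout as in the question)
parsed_files = [
    {"user1": {"Karma Police": 1.0, "Roxanne": 1.0}},
    {"user2": {"Karma Police": 1.0, "Creep": 1.0}},
]

rows = []
for json_text in parsed_files:
    for user, tracks in json_text.items():
        for track, count in tracks.items():
            # One flat record per (user, track) pair
            rows.append({
                "playlist": "all_playlists",
                "user": user,
                "track": track,
                "count": count,
            })

# Build the DataFrame once, after the loop
jsons_data = pd.DataFrame(rows, columns=["playlist", "user", "track", "count"])
print(jsons_data)
```

Collecting plain dicts and constructing the DataFrame a single time avoids growing it row by row, and every cell ends up holding a scalar.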
+2

That's pretty hard to work with, no? It completely defeats the point of using pandas –

+0

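@JulienMarrec Data analysis isn't the friendliest environment. But it looks like once the frame has done the heavy lifting, plotting and exporting the data (`SQLite`, etc.) is very simple. –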