2017-06-19 112 views

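Pandas: how to count the occurrences of each word in every row of a DataFrame

Suppose I have a pandas DataFrame that looks something like this: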

sentences 
['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'] 
['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence', 'looks', 'like', 'other', 'sentences'] 

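I am trying to count how many times each word occurs in each row, and to store the counts in a way that I can easily use later when needed. So far I have failed, and I would appreciate some help.

Thanks!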


Have you tried using df.column_name.[value_counts()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)? – Tbaki
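
A note on the suggestion above: calling value_counts() directly on a column whose cells are lists would likely fail, because lists are unhashable. A minimal sketch of one way to apply it per row instead is shown below. The sample data is copied from the question; the name `per_row` is only illustrative. This is probably slower than a Counter-based approach and is included just to show how the suggestion could be made to work.

```python
import pandas as pd

# Sample data copied from the question: one list of words per row.
df = pd.DataFrame({
    "sentences": [
        ["this", "is", "a", "sentence", "and", "this", "one", "as", "well"],
        ["this", "is", "another", "sentence", "and", "this", "sentence",
         "looks", "like", "other", "sentences"],
    ]
})

# Turn each list into a Series, count its values, and convert the result
# to a plain dict so the DataFrame constructor builds one row per sentence.
per_row = pd.DataFrame(
    [pd.Series(words).value_counts().to_dict() for words in df["sentences"]],
    index=df.index,
)

# Words missing from a row come out as NaN; replace them with 0.
per_row = per_row.fillna(0).astype(int)
```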

Answers


You can use Counter together with the DataFrame constructor, but you get NaNs for words that are missing from a row:

from collections import Counter
import pandas as pd

# each cell of 'sentences' is expected to hold a list of words
print(type(df.loc[0, 'sentences']))
<class 'list'>

# one Counter per row; each distinct word becomes a column
df1 = pd.DataFrame([Counter(x) for x in df['sentences']])
print(df1)
     a  and  another   as  is  like  looks  one  other  sentence  sentences  \
0  1.0    1      NaN  1.0   1   NaN    NaN  1.0    NaN         1        NaN
1  NaN    1      1.0  NaN   1   1.0    1.0  NaN    1.0         2        1.0

   this  well
0     2   1.0
1     2   NaN

If you need to replace the NaNs with 0, add DataFrame.fillna:

df1 = pd.DataFrame([Counter(x) for x in df['sentences']]).fillna(0).astype(int) 
print(df1)
   a  and  another  as  is  like  looks  one  other  sentence  sentences  \
0  1    1        0   1   1     0      0    1      0         1          0
1  0    1        1   0   1     1      1    0      1         2          1

   this  well
0     2     1
1     2     0
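
For readers who want to run this end to end, here is a self-contained sketch that rebuilds the sample data from the question and then reads the counts back out, since the question asks for storage that is easy to use later. The variable names (`counts`, `totals`, and so on) are illustrative and not part of the original answer.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sentences": [
        ["this", "is", "a", "sentence", "and", "this", "one", "as", "well"],
        ["this", "is", "another", "sentence", "and", "this", "sentence",
         "looks", "like", "other", "sentences"],
    ]
})

# One Counter per row; words absent from a row become NaN, then 0.
counts = (
    pd.DataFrame([Counter(words) for words in df["sentences"]], index=df.index)
    .fillna(0)
    .astype(int)
)

# Look up a single count by row label and word.
this_in_row0 = counts.loc[0, "this"]
sentence_in_row1 = counts.loc[1, "sentence"]

# Total occurrences of each word across all rows.
totals = counts.sum()
```

Because the result keeps the index of the original DataFrame, a row of counts can be matched back to its sentence with the same row label.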

Thanks for the quick response! Is it possible to do this without the columns being reordered alphabetically? – emreorta


Unfortunately not, because the 'DataFrame' constructor sorts them :( – jezrael


Well, it looks like we can't have everything :D Thanks again! – emreorta
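
On the column-order question raised in the comments: whether the constructor sorts the columns has varied between pandas versions, so one possible workaround is to set the order explicitly with reindex. The sketch below orders columns by the first appearance of each word, scanning rows from top to bottom. The name `first_seen` is illustrative, and the approach assumes Python 3.7+, where dict keys keep insertion order.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sentences": [
        ["this", "is", "a", "sentence", "and", "this", "one", "as", "well"],
        ["this", "is", "another", "sentence", "and", "this", "sentence",
         "looks", "like", "other", "sentences"],
    ]
})

# Record each word the first time it is seen; dict.fromkeys drops
# duplicates while keeping insertion order.
first_seen = list(
    dict.fromkeys(word for words in df["sentences"] for word in words)
)

# Build the counts, then force the columns into first-appearance order.
counts = (
    pd.DataFrame([Counter(words) for words in df["sentences"]])
    .reindex(columns=first_seen)
    .fillna(0)
    .astype(int)
)
```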