2016-03-03

Row values to column array in a pandas DataFrame

I have a pandas DataFrame that looks like this:

+---+--------+-------------+------------------+
|   | ItemID | Description | Feedback         |
+---+--------+-------------+------------------+
| 0 | 8988   | Tall Chair  | I hated it       |
+---+--------+-------------+------------------+
| 1 | 8988   | Tall Chair  | Best chair ever  |
+---+--------+-------------+------------------+
| 2 | 6547   | Big Pillow  | Soft and amazing |
+---+--------+-------------+------------------+
| 3 | 6547   | Big Pillow  | Horrific color   |
+---+--------+-------------+------------------+

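I want to concatenate the values from the "Feedback" column into a new column, separated by commas, for rows where the ItemID matches. Like this: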

+---+--------+-------------+----------------------------------+
|   | ItemID | Description | NewColumn                        |
+---+--------+-------------+----------------------------------+
| 0 | 8988   | Tall Chair  | I hated it, Best chair ever      |
+---+--------+-------------+----------------------------------+
| 1 | 6547   | Big Pillow  | Soft and amazing, Horrific color |
+---+--------+-------------+----------------------------------+

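I have tried several variations of pivot, merge, stack, etc., and I'm stuck.
I think NewColumn will end up being an array, but I'm fairly new to Python, so I'm not sure.
Also, eventually I want to use this for text classification (predicting some "Feedback" labels for a new "Description" [a multi-class problem]).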
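The question guesses that NewColumn may end up as an array. If a Python list per item is preferred over a single comma-joined string (for example, to keep the individual feedback texts separate for a later classification step), a minimal sketch, assuming the sample data from the first table, groups on ItemID and Description and collects each group's Feedback values with list:

```python
import pandas as pd

# Sample data copied from the first table in the question
df = pd.DataFrame({
    'ItemID': [8988, 8988, 6547, 6547],
    'Description': ['Tall Chair', 'Tall Chair', 'Big Pillow', 'Big Pillow'],
    'Feedback': ['I hated it', 'Best chair ever', 'Soft and amazing', 'Horrific color'],
})

# Collect each group's Feedback values into a Python list instead of a string,
# then turn the group keys back into ordinary columns.
result = (
    df.groupby(['ItemID', 'Description'])['Feedback']
      .apply(list)
      .reset_index(name='NewColumn')
)
print(result)
```

Each NewColumn cell then holds a list such as ['I hated it', 'Best chair ever']. Note that groupby sorts the keys by default, so item 6547 comes before 8988.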

Answer

I think you can use groupby on the columns ItemID and Description, apply ', '.join to the Feedback column, and finally call reset_index:

print(df.groupby(['ItemID', 'Description'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID Description                         NewColumn
0    6547  Big Pillow  Soft and amazing, Horrific color
1    8988  Tall Chair       I hated it, Best chair ever
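For a fully self-contained version of the one-liner above, the sketch below rebuilds the sample data from the question (the DataFrame construction is an assumption; only the groupby chain comes from the answer) and splits the chain across lines with comments:

```python
import pandas as pd

# Sample data copied from the first table in the question
df = pd.DataFrame({
    'ItemID': [8988, 8988, 6547, 6547],
    'Description': ['Tall Chair', 'Tall Chair', 'Big Pillow', 'Big Pillow'],
    'Feedback': ['I hated it', 'Best chair ever', 'Soft and amazing', 'Horrific color'],
})

# 1. Group rows that share both ItemID and Description.
# 2. Join each group's Feedback strings with ', ' (one string per group).
# 3. Move the group keys out of the index and name the joined column NewColumn.
result = (
    df.groupby(['ItemID', 'Description'])['Feedback']
      .apply(', '.join)
      .reset_index(name='NewColumn')
)
print(result)
```

Within each group the rows keep their original order, which is why "I hated it" precedes "Best chair ever".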

If you don't need the Description column:

print(df.groupby(['ItemID'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID                         NewColumn
0    6547  Soft and amazing, Horrific color
1    8988       I hated it, Best chair ever
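If you want to group on ItemID alone but still keep Description in the output, one alternative (not part of the original answer) is agg with a per-column mapping. This sketch assumes every row with the same ItemID has the same Description, so taking the 'first' one loses nothing:

```python
import pandas as pd

# Sample data copied from the first table in the question
df = pd.DataFrame({
    'ItemID': [8988, 8988, 6547, 6547],
    'Description': ['Tall Chair', 'Tall Chair', 'Big Pillow', 'Big Pillow'],
    'Feedback': ['I hated it', 'Best chair ever', 'Soft and amazing', 'Horrific color'],
})

# Group on ItemID only; as_index=False keeps ItemID as a regular column.
# Assumption: Description is constant per ItemID, so 'first' is safe.
result = (
    df.groupby('ItemID', as_index=False)
      .agg({'Description': 'first', 'Feedback': ', '.join})
      .rename(columns={'Feedback': 'NewColumn'})
)
print(result)
```

If an ItemID could have several different descriptions, grouping on both columns as in the answer is the safer choice, because 'first' would silently drop the others.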