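I currently have a pandas DataFrame with many answers to a single question, and I want to turn it into a list so that I can compute cosine similarity. How can I convert a pandas DataFrame into an ordered list with a many-to-one relationship?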

Currently, my DataFrame has the questions joined to their answers via parent_id = q_id, as shown below:

print(df)
   q_id      q_body  parent_id    a_body
0     1  question 1          1  answer 1
1     1  question 1          1  answer 2
2     1  question 1          1  answer 3
3     2  question 2          2  answer 1
4     2  question 2          2  answer 2

The output I am expecting is:

('question 1', 'answer 1', 'answer 2', 'answer 3')

('question 2', 'answer 1', 'answer 2')

Any help would be greatly appreciated! Thank you very much.
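The answers below can be made reproducible by rebuilding the frame from the printout above. This is a sketch. Building it from a dict of lists is just one choice, and in the asker's real data the frame came from joining answers to questions on parent_id = q_id.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the printout).
df = pd.DataFrame({
    'q_id':      [1, 1, 1, 2, 2],
    'q_body':    ['question 1'] * 3 + ['question 2'] * 2,
    'parent_id': [1, 1, 1, 2, 2],
    'a_body':    ['answer 1', 'answer 2', 'answer 3', 'answer 1', 'answer 2'],
})
print(df)
```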


2017-03-08 Ming Ting

I think you need groupby with apply:

# output: tuple that starts with the question text
# (results go into s, not df, so the later snippets can still use the original frame)
s = df.groupby('q_body')['a_body'].apply(lambda x: tuple([x.name] + list(x)))
print(s)
q_body
question 1    (question 1, answer 1, answer 2, answer 3)
question 2              (question 2, answer 1, answer 2)
Name: a_body, dtype: object

# output: list that starts with the question text
s = df.groupby('q_body')['a_body'].apply(lambda x: [x.name] + list(x))
print(s)
q_body
question 1    [question 1, answer 1, answer 2, answer 3]
question 2              [question 2, answer 1, answer 2]
Name: a_body, dtype: object

# output: list of answers only, without the question text
s = df.groupby('q_body')['a_body'].apply(list)
print(s)
q_body
question 1    [answer 1, answer 2, answer 3]
question 2              [answer 1, answer 2]
Name: a_body, dtype: object

# grouping by parent_id instead, without the question text
s = df.groupby('parent_id')['a_body'].apply(list)
print(s)
parent_id
1    [answer 1, answer 2, answer 3]
2              [answer 1, answer 2]
Name: a_body, dtype: object

# output: one string per group, values concatenated with ', '
s = df.groupby('parent_id')['a_body'].apply(', '.join)
print(s)
parent_id
1    answer 1, answer 2, answer 3
2              answer 1, answer 2
Name: a_body, dtype: object

If you need the output as a list, add tolist():

L = df.groupby('q_body')['a_body'].apply(lambda x: tuple([x.name] + list(x))).tolist() 
print (L) 
[('question 1', 'answer 1', 'answer 2', 'answer 3'), ('question 2', 'answer 1', 'answer 2')]
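The question says these tuples are meant for cosine similarity, a step that none of the answers cover. As a rough illustration only, here is one way to score each answer against the question at the head of its tuple. It assumes a plain bag-of-words representation with whitespace tokenization, and `cosine` is a hypothetical helper. Real text would usually call for proper tokenization and TF-IDF weighting instead.

```python
import math
from collections import Counter

# The list of tuples produced by the tolist() answer above.
L = [('question 1', 'answer 1', 'answer 2', 'answer 3'),
     ('question 2', 'answer 1', 'answer 2')]


def cosine(a, b):
    """Hypothetical helper: cosine similarity of two strings as word-count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    # Counter returns 0 for missing words, so only shared words add to the dot product.
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# The first element of each tuple is the question; the rest are its answers.
scores = {q: [cosine(q, a) for a in answers] for q, *answers in L}
print(scores)
```

With these placeholder strings, the only shared token is the trailing number. Each question therefore scores 0.5 against the answer with the same number and 0.0 against the others.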


2017-03-08 07:33:17 jezrael

Thanks jezrael, I will use lambda more from now on.

Glad I could help. Have a nice day. – jezrael
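Grouping by q_body has one caveat: two different questions with identical text would be merged into one group. A sketch that keys on the id instead, as the parent_id example does, still puts the question text first in each tuple and needs no lambda. It uses a hypothetical frame in which q_id 1 and q_id 3 share the same text.

```python
import pandas as pd

# Hypothetical frame: two different questions (q_id 1 and 3) have the same text.
df = pd.DataFrame({
    'q_id':   [1, 1, 3, 2],
    'q_body': ['question 1', 'question 1', 'question 1', 'question 2'],
    'a_body': ['answer 1', 'answer 2', 'answer 9', 'answer 1'],
})

# Group by the id; take the question text from the first row of each group,
# then unpack that group's answers after it. sort=False keeps first-appearance order.
rows = [
    (group['q_body'].iloc[0], *group['a_body'])
    for _, group in df.groupby('q_id', sort=False)
]
print(rows)
```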

import pandas as pd

df = pd.DataFrame([
    ['question 1', 'answer 1'],
    ['question 1', 'answer 2'],
    ['question 1', 'answer 3'],
    ['question 2', 'answer 1'],
    ['question 2', 'answer 2'],
], columns=['q_body', 'a_body'])

print(df)

       q_body    a_body
0  question 1  answer 1
1  question 1  answer 2
2  question 1  answer 3
3  question 2  answer 1
4  question 2  answer 2

`apply(list)`

df.groupby('q_body').a_body.apply(list) 

q_body
question 1    [answer 1, answer 2, answer 3]
question 2              [answer 1, answer 2]
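The question asks for an *ordered* list, and by default groupby sorts the group keys. The lists therefore come out in alphabetical order of q_body, not in the order the questions appear in the frame. Passing `sort=False` keeps the order of first appearance, and the rows inside each group keep their original order either way. Here is a small sketch with hypothetical question texts, chosen so that the two orders differ:

```python
import pandas as pd

# Hypothetical data whose question texts are not in alphabetical order.
df = pd.DataFrame({
    'q_body': ['why b', 'why b', 'about a'],
    'a_body': ['b1', 'b2', 'a1'],
})

# Default: group keys are sorted, so 'about a' comes first.
sorted_lists = df.groupby('q_body')['a_body'].apply(list).tolist()

# sort=False: groups appear in the order they first occur in the frame.
original_order = df.groupby('q_body', sort=False)['a_body'].apply(list).tolist()

print(sorted_lists)    # [['a1'], ['b1', 'b2']]
print(original_order)  # [['b1', 'b2'], ['a1']]
```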


2017-03-08 07:35:28 piRSquared

See if this helps:

result = df.groupby('q_id').agg({'q_body': lambda x: x.iloc[0], 'a_body': lambda x: ', '.join(x)}) 
result['output'] = result.q_body + ', ' + result.a_body

This creates a new column named output that holds the desired result.
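The same result can also be written with named aggregation (available since pandas 0.25), which names the output columns directly instead of going through a dict of lambdas. The sketch below uses `'first'` in place of `x.iloc[0]`. Note that `'first'` skips missing values, so the two differ only when a q_body is NaN. The frame is rebuilt from the question, with parent_id omitted.

```python
import pandas as pd

df = pd.DataFrame({
    'q_id':   [1, 1, 1, 2, 2],
    'q_body': ['question 1'] * 3 + ['question 2'] * 2,
    'a_body': ['answer 1', 'answer 2', 'answer 3', 'answer 1', 'answer 2'],
})

# Named aggregation: new_column=(source_column, aggregation_function).
result = df.groupby('q_id').agg(
    q_body=('q_body', 'first'),
    a_body=('a_body', ', '.join),
)
result['output'] = result['q_body'] + ', ' + result['a_body']
print(result['output'].tolist())
```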


2017-03-08 07:54:13 Pintu
