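How do I create a sorted list of values from multiple columns in pandas?

I have a dataframe with columns col1 and col2, in which different rows can contain the same pair of values once each pair is sorted. I want to deduplicate these rows, because the order of the two values does not matter in my application. How can I create a sorted list of values from multiple columns in pandas?

Here is an example dataframe: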

import pandas as pd 
df = pd.DataFrame({'col1':[1, 2, 3], 'col2':[2, 1, 4]}) 
print(df)

This is what the dataframe looks like:

   col1  col2
0     1     2
1     2     1
2     3     4

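What I want to achieve is a new column that holds, for each row, the sorted list of the values from the first two columns, so that I can deduplicate the dataframe based on this column.

The key_column should look like this:

0    [1, 2]
1    [1, 2]
2    [3, 4]

I would then use df.drop_duplicates(col3).

My idea is that I should use either .apply or .map, perhaps with a lambda function, but nothing I have tried has worked so far: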

df.apply(lambda row: sorted([row[0], row[1]]), axis=1) # this sorts the column values in place but doesn't create a new column with a list 
sorted([df['col1'], df['col2']]) # returns error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). 
df.map(sorted) # dataframe object has no attribute map 
df[['col1', 'col2']].apply(lambda x: 
    sorted([','.join(x.astype(int).astype(str))]), axis=1) # creates a list but is not sorted

Thank you for your help. I would appreciate a solution that also explains why it works.

Source

2017-08-25 StefanK

Option 1

Use df.apply and pass sorted:

In [1234]: df['col3'] = df.apply(tuple, 1).apply(sorted).apply(tuple) 

In [1235]: df.drop_duplicates('col3') 
Out[1235]: 
   col1  col2    col3
0     1     2  (1, 2)
2     3     4  (3, 4)
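
To see why the chain in Option 1 works, it can help to look at each intermediate result. The following is a minimal sketch, assuming a reasonably recent pandas in which `df.apply(tuple, axis=1)` returns a Series of tuples; the names `rows`, `as_lists`, `keys`, and `keys_lambda` are illustrative only and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 1, 4]})

# Step 1: collapse each row into one tuple, giving a Series of row tuples.
rows = df.apply(tuple, axis=1)

# Step 2: sorted() orders the values within each row, but it returns a list.
as_lists = rows.apply(sorted)

# Step 3: convert each list back to a tuple, because tuples are hashable
# and drop_duplicates needs hashable values.
keys = as_lists.apply(tuple)

# The same key built in a single pass with a lambda.
keys_lambda = df.apply(lambda row: tuple(sorted(row)), axis=1)

df['col3'] = keys
result = df.drop_duplicates('col3')
print(result)
```

The lambda version does the same work in one call; the three-call chain simply avoids writing a lambda.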

Option 2

Call np.sort on df.values, then assign the result to a new column.

In [1208]: df['col3'] = pd.Series([tuple(x) for x in np.sort(df.values, 1)]); df 
Out[1208]: 
   col1  col2    col3
0     1     2  (1, 2)
1     2     1  (1, 2)
2     3     4  (3, 4)

In [1210]: df.drop_duplicates('col3') 
Out[1210]: 
   col1  col2    col3
0     1     2  (1, 2)
2     3     4  (3, 4)
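
If the helper column is not needed afterwards, the row-wise sorted values can be used only to build a boolean mask. This is a sketch rather than part of the answer, assuming the key columns hold values of a compatible numeric dtype; `key_cols`, `sorted_vals`, `mask`, and `deduped` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 1, 4]})
key_cols = ['col1', 'col2']

# np.sort with axis=1 sorts within each row, so (2, 1) becomes (1, 2).
sorted_vals = np.sort(df[key_cols].to_numpy(), axis=1)

# duplicated() flags every row whose sorted values have already been seen.
mask = pd.DataFrame(sorted_vals, index=df.index).duplicated()

# Keep the first occurrence of each unordered pair; the original columns stay untouched.
deduped = df[~mask]
print(deduped)
```

Because the sorting happens on the whole array at once, this avoids a Python-level call per row, which may matter on larger frames.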

Source

2017-08-25 11:36:16

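Could you clarify (or post a relevant link) how Option 1 works, and why you need to apply tuple twice? For example, when I want to convert the rows to numpy arrays, I do 'df.apply(tuple, 1).map(np.array)' and it works, but when I do 'df.apply(np.array, 1)' it does not work – StefanK

@StefanK The only reason I use two 'apply' calls is that I did not want to use a lambda! But you can also do it with one. –

@StefanK The result of the sorted call is a list (pandas converts the tuples implicitly), so another apply call is needed to turn the lists back into tuples. –

Three steps: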
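
The need for the second tuple conversion comes from plain Python: lists are mutable and therefore unhashable, while tuples can be hashed, and duplicate detection relies on hashing the values. A small sketch of that difference (the names `pair`, `list_is_hashable`, and `keys` are illustrative only):

```python
import pandas as pd

# sorted() always returns a list, whatever iterable it receives.
pair = sorted((2, 1))
print(type(pair).__name__)

# Lists cannot be hashed, so they cannot serve as deduplication keys.
try:
    hash(pair)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)

# Tuples with equal contents hash equally, so they are detected as duplicates.
keys = pd.Series([(1, 2), (2, 1), (3, 4)]).apply(lambda t: tuple(sorted(t)))
print(keys.duplicated().tolist())
```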

df['x'] = df.apply(lambda x: tuple(sorted(x)), axis=1) 
df = df.drop_duplicates('x') 
del df['x']
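
The three steps can also be wrapped in a function so that the caller's dataframe is never modified. This is a minimal sketch: the function name `drop_unordered_duplicates` and its `cols` parameter are hypothetical, and the temporary column name `_key` is assumed not to clash with an existing column.

```python
import pandas as pd

def drop_unordered_duplicates(df, cols):
    """Drop rows whose values in `cols` match an earlier row, ignoring order."""
    # Step 1: build a hashable, order-independent key for each row.
    key = df[cols].apply(lambda row: tuple(sorted(row)), axis=1)
    # Step 2: deduplicate on the temporary key column.
    # Step 3: drop the key so that only the original columns remain.
    return df.assign(_key=key).drop_duplicates('_key').drop(columns='_key')

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 1, 4]})
print(drop_unordered_duplicates(df, ['col1', 'col2']))
```

Since `assign` returns a new dataframe, the original `df` keeps its two columns and all three rows.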

Source

2017-08-25 11:50:23
