Replace row value with the most frequent value in a pandas DataFrame

I have the following data frame:

  | types           | freq | TypeList
0 | Q11424 (item)   |   29 | Q11424 (item),Q571 (item)
1 | Q571 (item)     |    9 | Q11424 (item),Q571 (item)
0 | Q11012 (item)   |    6 | Q11012 (item)
0 | Q4830453 (item) |   39 | Q4830453 (item)
0 | Q7725634 (item) |    2 | Q7725634 (item),Q571 (item)
1 | Q571 (item)     |    9 | Q7725634 (item),Q571 (item)
0 | Q785479 (item)  |    1 | Q785479 (item),Q1344 (item)
1 | Q1344 (item)    |    1 | Q785479 (item),Q1344 (item)
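To experiment with the problem, the sample rows above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the cell values carry no stray leading or trailing whitespace. It keeps the repeated index labels (0, 1, 0, 0, ...) exactly as shown, because those duplicates affect how new columns can be assigned.

```python
import pandas as pd

# Sample rows copied from the question. The index repeats because `types`
# was flattened out of `TypeList`; `freq` holds counts taken from the full
# data set, not from these eight rows.
df = pd.DataFrame(
    {
        "types": [
            "Q11424 (item)", "Q571 (item)", "Q11012 (item)", "Q4830453 (item)",
            "Q7725634 (item)", "Q571 (item)", "Q785479 (item)", "Q1344 (item)",
        ],
        "freq": [29, 9, 6, 39, 2, 9, 1, 1],
        "TypeList": [
            "Q11424 (item),Q571 (item)", "Q11424 (item),Q571 (item)",
            "Q11012 (item)", "Q4830453 (item)",
            "Q7725634 (item),Q571 (item)", "Q7725634 (item),Q571 (item)",
            "Q785479 (item),Q1344 (item)", "Q785479 (item),Q1344 (item)",
        ],
    },
    index=[0, 1, 0, 0, 0, 1, 0, 1],
)

print(df)
print(df.index.is_unique)  # False: the index contains duplicate labels
```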

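The column types is actually the flattened TypeList column. The freq column gives the frequency of each value in types; these frequencies are computed over the whole data frame, and I have only included a few of its rows here. For example, Q571 occurs 9 times in the types column, so freq = 9. The TypeList column is the list of types for each record. If TypeList contains more than one type, I want to add a new column, SuperType, holding the most frequent of those types (a row with a single type simply keeps that type). For example, I want the following result: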

  | types           | freq | TypeList                    | SuperType
0 | Q11424 (item)   |   29 | Q11424 (item),Q571 (item)   | Q11424
1 | Q571 (item)     |    9 | Q11424 (item),Q571 (item)   | Q11424
0 | Q11012 (item)   |    6 | Q11012 (item)               | Q11012
0 | Q4830453 (item) |   39 | Q4830453 (item)             | Q4830453
0 | Q7725634 (item) |    2 | Q7725634 (item),Q571 (item) | Q571
1 | Q571 (item)     |    9 | Q7725634 (item),Q571 (item) | Q571
0 | Q785479 (item)  |    1 | Q785479 (item),Q1344 (item) | Q785479
1 | Q1344 (item)    |    1 | Q785479 (item),Q1344 (item) | Q785479

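In the first row, the TypeList column has the value "Q11424 (item),Q571 (item)". So I want to look up the frequencies of these two types, i.e. 29 and 9, and assign the more frequent one, Q11424 in this case, to that row's SuperType column.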
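The rule described above (look up every type in a row's TypeList and keep the one with the highest freq) can be sketched directly with a dictionary lookup. This is an illustration, not the method used in the answer below. It assumes every type listed in TypeList also appears somewhere in types, and freq_of and super_type are hypothetical names. Python's max returns the first of several equal maxima, so on a tie the type listed first wins, which matches the Q785479 rows in the expected output.

```python
import pandas as pd

# A subset of the question's rows, keeping the repeated index labels.
df = pd.DataFrame(
    {
        "types": ["Q11424 (item)", "Q571 (item)", "Q7725634 (item)",
                  "Q571 (item)", "Q785479 (item)", "Q1344 (item)"],
        "freq": [29, 9, 2, 9, 1, 1],
        "TypeList": ["Q11424 (item),Q571 (item)"] * 2
                    + ["Q7725634 (item),Q571 (item)"] * 2
                    + ["Q785479 (item),Q1344 (item)"] * 2,
    },
    index=[0, 1, 0, 1, 0, 1],
)

# Lookup table type -> frequency (each type has one global frequency).
freq_of = dict(zip(df["types"], df["freq"]))


def super_type(type_list: str) -> str:
    """Return the most frequent type in a comma-separated TypeList, minus ' (item)'."""
    candidates = [t.strip() for t in type_list.split(",")]
    # max() keeps the first candidate on ties, i.e. the type listed first.
    best = max(candidates, key=lambda t: freq_of.get(t, 0))
    return best.replace(" (item)", "")


# Series.map works element by element and keeps the index unchanged,
# so the duplicate labels cause no alignment problem on assignment.
df["SuperType"] = df["TypeList"].map(super_type)
print(df["SuperType"].tolist())
# ['Q11424', 'Q11424', 'Q571', 'Q571', 'Q785479', 'Q785479']
```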


2017-10-12 Nilakshi Naphade

Using transform:

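# sort ascending by freq, so the most frequent type is the last row of each TypeList group
df['SuperType']=df.sort_values('freq').groupby('TypeList')['types'].transform('last')
# drop the " (item)" suffix (slicing with str[:-6] would leave a trailing space)
df['SuperType']=df.SuperType.str.replace(' (item)', '', regex=False)
df.sort_index()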
Out[1124]:
  types           freq  TypeList                     SuperType
0 Q11424 (item)     29  Q11424 (item),Q571 (item)    Q11424
1 Q571 (item)        9  Q11424 (item),Q571 (item)    Q11424
2 Q11012 (item)      6  Q11012 (item)                Q11012
3 Q4830453 (item)   39  Q4830453 (item)              Q4830453
4 Q7725634 (item)    2  Q7725634 (item),Q571 (item)  Q571
5 Q571 (item)        9  Q7725634 (item),Q571 (item)  Q571
6 Q785479 (item)     1  Q785479 (item),Q1344 (item)  Q1344
7 Q1344 (item)       1  Q785479 (item),Q1344 (item)  Q1344
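The answer relies on two steps. After sorting by freq in ascending order, the most frequent type is the last row of each TypeList group, and transform('last') copies that value back onto every row of the group. The sketch below replays this on a subset of the rows with a unique default index (0..5). The kind="stable" argument is an addition not in the answer: it keeps equal frequencies in their original row order, so for the Q785479/Q1344 tie (both freq 1) the later row, Q1344, is the one picked, as in the output above.

```python
import pandas as pd

# A subset of the question's rows, with a unique default RangeIndex.
df = pd.DataFrame(
    {
        "types": ["Q11424 (item)", "Q571 (item)", "Q7725634 (item)",
                  "Q571 (item)", "Q785479 (item)", "Q1344 (item)"],
        "freq": [29, 9, 2, 9, 1, 1],
        "TypeList": ["Q11424 (item),Q571 (item)"] * 2
                    + ["Q7725634 (item),Q571 (item)"] * 2
                    + ["Q785479 (item),Q1344 (item)"] * 2,
    }
)

# Ascending sort: within each TypeList group the highest freq comes last.
# A stable sort keeps tied rows in their original order.
ordered = df.sort_values("freq", kind="stable")

# transform("last") returns one value per row of `ordered` (its group's last
# `types`), indexed like `ordered`; assignment realigns it to df's unique index.
df["SuperType"] = ordered.groupby("TypeList")["types"].transform("last")

# Strip the suffix by content rather than by a fixed character count.
df["SuperType"] = df["SuperType"].str.replace(" (item)", "", regex=False)
print(df["SuperType"].tolist())
# ['Q11424', 'Q11424', 'Q571', 'Q571', 'Q1344', 'Q1344']
```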

Edit:

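# sort df itself, so the transform result comes back in the same row order as df
df=df.sort_values('freq')
# .values assigns by position, skipping index alignment (which fails on duplicate labels)
df['SuperType']=df.groupby('TypeList')['types'].transform('last').values
df['SuperType']=df.SuperType.str.replace(' (item)', '', regex=False)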


2017-10-12 14:43:54 Wen

@ScottBoston Edited. – Wen

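I get a "ValueError: cannot reindex from a duplicate axis" exception after executing the first line. The exception occurs only when I try to assign the values to df['superType']; otherwise it works. –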

@NilakshiNaphade Try the edit. – Wen
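The error comes from index alignment: assigning a Series to a column makes pandas reindex it to the frame's index, and the question's frame repeats labels (0, 1, 0, 1, ...). The sketch below reproduces this on a subset of the rows and shows two ways around it: the edited answer's .values (positional assignment after sorting the frame itself) and, as an assumed alternative not taken from the thread, reset_index(drop=True) to make the labels unique first. make_frame is a hypothetical helper, kind="stable" is added only to make ties deterministic, and the exact wording of the error message varies between pandas versions.

```python
import pandas as pd


def make_frame() -> pd.DataFrame:
    """Rows from the question with repeated index labels, as after flattening."""
    return pd.DataFrame(
        {
            "types": ["Q11424 (item)", "Q571 (item)", "Q7725634 (item)",
                      "Q571 (item)", "Q785479 (item)", "Q1344 (item)"],
            "freq": [29, 9, 2, 9, 1, 1],
            "TypeList": ["Q11424 (item),Q571 (item)"] * 2
                        + ["Q7725634 (item),Q571 (item)"] * 2
                        + ["Q785479 (item),Q1344 (item)"] * 2,
        },
        index=[0, 1, 0, 1, 0, 1],
    )


# 1) Original one-liner: the right-hand side is indexed like the *sorted*
#    frame, so pandas must reindex it to df.index, which fails on duplicates.
df = make_frame()
error = None
try:
    df["SuperType"] = (
        df.sort_values("freq").groupby("TypeList")["types"].transform("last")
    )
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError

# 2) Edited answer: sort df itself, then assign a plain array, which is
#    placed row by row with no index alignment.
df = make_frame().sort_values("freq", kind="stable")
df["SuperType"] = df.groupby("TypeList")["types"].transform("last").values

# 3) Alternative: drop the duplicate labels first so label alignment works.
df2 = make_frame().reset_index(drop=True)
df2["SuperType"] = (
    df2.sort_values("freq", kind="stable")
    .groupby("TypeList")["types"]
    .transform("last")
)
```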
