Find the columns whose values are greater than the column mean

How can I print the column headers for which a row's value is greater than that column's mean (or median)?

For example, given df =

    a   b   c   d
0  12  11  13  45
1   6  13  12  23
2   5  12   6  35

the output should be 0: a, c, d. 1: a, b, c. 2: b.
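To try the answers below, the sample frame has to exist first. A minimal sketch that rebuilds it, assuming a default integer index as shown in the question:

```python
import pandas as pd

# Sample data copied from the question; the default RangeIndex gives rows 0, 1, 2.
df = pd.DataFrame(
    {"a": [12, 6, 5], "b": [11, 13, 12], "c": [13, 12, 6], "d": [45, 23, 35]}
)

# Per-column means that the answers compare each value against.
print(df.mean())
```

The mean of column b is exactly 12, so the value 12 in row 2 does not count as "greater than" under a strict comparison.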

Source

2017-08-29 Ram

Could you clarify what you _actually_ want? Do you need a DataFrame of values, or just a simple list of tuples? –

A simple list of tuples with the column headers. Thanks for asking for clarification. – Ram

In that case, the answer you accepted gives you something different. Could you take another look? –

In [22]: df.gt(df.mean()).T.agg(lambda x: df.columns[x].tolist())
Out[22]:
0    [a, c, d]
1       [b, c]
2          [d]
dtype: object

Or:

In [23]: df.gt(df.mean()).T.agg(lambda x: ', '.join(df.columns[x]))
Out[23]:
0    a, c, d
1       b, c
2          d
dtype: object
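To see why this works, it can help to look at the intermediate boolean frame that df.gt produces. A minimal sketch, assuming the sample data from the question (the names mask and result are only illustrative and are not part of the answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"a": [12, 6, 5], "b": [11, 13, 12], "c": [13, 12, 6], "d": [45, 23, 35]}
)

# df.mean() is a Series indexed by column label; gt aligns it with the
# columns, so each cell is compared against its own column's mean.
mask = df.gt(df.mean())
print(mask)

# For each row, keep the column labels where the mask is True.
result = {idx: df.columns[row.to_numpy()].tolist() for idx, row in mask.iterrows()}
print(result)
```

For the sample data this selects a, c, d for row 0, b, c for row 1, and d for row 2, the same lists as in the output above.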

Source

2017-08-29 21:25:05 MaxU

I need to go back over the pandas manual... I didn't even know about 'gt' :) Thanks~ – Wen

@Wen, yes, the pandas API is __huge__. I keep discovering methods I didn't know about before... ;-) – MaxU

This is great. I was stuck at the DataFrame of boolean values. – Vaishali

Use df.apply to generate a mask, which you can then iterate over to index into df.columns:

mask = df.apply(lambda x: x > x.mean()) 
out = [(i, ', '.join(df.columns[x])) for i, x in mask.iterrows()] 
print(out) 
[(0, 'a, c, d'), (1, 'b, c'), (2, 'd')]
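The question also mentions the median. A hedged sketch of the same mask approach with col.median() swapped in for the mean, assuming the sample data from the question (this variant is not part of the original answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"a": [12, 6, 5], "b": [11, 13, 12], "c": [13, 12, 6], "d": [45, 23, 35]}
)

# Compare every value against its own column's median instead of the mean.
mask = df.apply(lambda col: col > col.median())

# Each row of the mask is a boolean Series; use it to pick column labels.
out = [(i, ', '.join(df.columns[row.to_numpy()])) for i, row in mask.iterrows()]
print(out)
# [(0, 'a, c, d'), (1, 'b'), (2, '')]
```

Row 2 comes back empty because none of its values is strictly above the column median (35 in column d equals the median).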

Source

2017-08-29 21:14:34

You can try this with pandas; I've broken it down into steps:

df=df.reset_index().melt('index') 
df['MEAN']=df.groupby('variable')['value'].transform('mean') 
df[df.value>df.MEAN].groupby('index').variable.apply(list) 

Out[1016]:
index
0    [a, c, d]
1       [b, c]
2          [d]
Name: variable, dtype: object

Source

2017-08-29 21:15:24 Wen

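from collections import defaultdict
import numpy as np

d = defaultdict(list)
v = df.values
# np.where returns the (row, column) positions where a value exceeds its column mean
for r, c in zip(*np.where(v > v.mean(0))):
    d[df.index[r]].append(df.columns[c])
dict(d)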

{0: ['a', 'c', 'd'], 1: ['b', 'c'], 2: ['d']}
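The comparison v > v.mean(0) relies on NumPy broadcasting. A small sketch of just that step, assuming the sample values from the question (col_means, rows and cols are illustrative names):

```python
import numpy as np

# Sample values copied from the question, one inner list per row.
v = np.array([[12, 11, 13, 45], [6, 13, 12, 23], [5, 12, 6, 35]])

# Mean along axis 0: one value per column, shape (4,).
col_means = v.mean(0)

# The (4,) vector is broadcast across the three rows, giving a (3, 4) mask.
mask = v > col_means

# np.where returns the row and column positions of the True cells.
rows, cols = np.where(mask)
print(list(zip(rows.tolist(), cols.tolist())))
# [(0, 0), (0, 2), (0, 3), (1, 1), (1, 2), (2, 3)]
```

Each (row, column) pair is then mapped back to labels through df.index and df.columns, which is what the loop in the answer does.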

Source

2017-08-29 21:35:27 piRSquared
