Pandas DataFrame and collecting results

Given the following DataFrame:

import pandas as pd 
p1 = {'name': 'willy', 'age': 11, 'interest': "Lego"} 
p2 = {'name': 'willy', 'age': 11, 'interest': "games"} 
p3 = {'name': 'zoe', 'age': 9, 'interest': "cars"} 
df = pd.DataFrame([p1, p2, p3]) 
df 

   age interest   name
0   11     Lego  willy
1   11    games  willy
2    9     cars    zoe

I want to know the total number of interests for each person, with each person appearing only once in the list. I did the following:

Interests = df[['age', 'name', 'interest']].groupby(['age', 'name']).count()
Interests.reset_index(inplace=True)
# DataFrame.sort was removed in later pandas versions; sort_values replaces it
Interests.sort_values('interest', ascending=False, inplace=True)
Interests

   age   name  interest
1   11  willy         2
0    9    zoe         1

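This works, but I feel I am doing it wrong. Right now I am using the 'interest' column to hold my count, which is okay, but as I said, I expect there is a better way to do this.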

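I have seen many questions about counting and summing in pandas, but for me the key part is collapsing the "duplicate" rows so each person is listed once.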


2015-11-03 Lam

You can use size (the length of each group) rather than count, which counts the non-NaN values in each column of the group.

In [11]: df[['age', 'name', 'interest']].groupby(['age', 'name']).size()
Out[11]:
age  name
9    zoe      1
11   willy    2
dtype: int64

In [12]: df[['age', 'name', 'interest']].groupby(['age', 'name']).size().reset_index(name='count')
Out[12]:
   age   name  count
0    9    zoe      1
1   11  willy      2
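
To make the difference between size and count concrete, here is a minimal sketch. It assumes one of willy's interests is missing; that NaN row and the name df_nan are invented for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: willy's second interest is NaN.
df_nan = pd.DataFrame([
    {'name': 'willy', 'age': 11, 'interest': 'Lego'},
    {'name': 'willy', 'age': 11, 'interest': np.nan},
    {'name': 'zoe', 'age': 9, 'interest': 'cars'},
])

grouped = df_nan.groupby(['age', 'name'])

# size() gives the number of rows in each group, whether or not values are NaN.
sizes = grouped.size()

# count() gives the number of non-NaN values in the selected column per group.
counts = grouped['interest'].count()

print(sizes)
print(counts)
```

With this data, size reports two rows for willy while count reports only one non-missing interest, so which one is "better" depends on whether missing values should be counted.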


2015-11-03 16:44:23

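This looks elegant. Could you explain what happens in [12] with .reset_index(name='count')? I understand that it creates the 'count' column, but what does that have to do with resetting the index? Isn't the index the unnamed column with 0 and 1? **Edit** Could you expand on why size() would be better than count() here? – Lam

Before the reset_index it is a Series whose index is a MultiIndex (age and name). reset_index turns those index levels into columns, and name sets the name of the column that holds the Series values (by default it is 0). See http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.reset_index.html –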

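Thanks, I still need to get used to the idea that performing an operation on an object can completely change the object's type. I guess I should get used to that quickly when working with pandas :) – Lam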
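
The type changes discussed in these comments can be inspected step by step. The following is a rough sketch using the question's data; the variable names (sizes, result, and so on) are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'willy', 'age': 11, 'interest': 'Lego'},
    {'name': 'willy', 'age': 11, 'interest': 'games'},
    {'name': 'zoe', 'age': 9, 'interest': 'cars'},
])

# groupby(...).size() returns a Series, not a DataFrame.
sizes = df.groupby(['age', 'name']).size()
is_series = isinstance(sizes, pd.Series)

# The group keys live in the index, as a MultiIndex with two named levels.
index_names = list(sizes.index.names)

# reset_index moves both index levels into ordinary columns and returns a
# DataFrame; name= labels the column that holds the Series values.
result = sizes.reset_index(name='count')
columns = list(result.columns)

print(type(sizes).__name__, index_names)
print(type(result).__name__, columns)
```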

In [2]: df
Out[2]:
   age interest   name
0   11     Lego  willy
1   11    games  willy
2    9     cars    zoe

In [3]: for name, group in df.groupby('name'):
   ...:     print(name)
   ...:     print(group.interest.count())
   ...:
willy
2
zoe
1
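
The same per-name numbers can probably be obtained without an explicit loop. This is a small sketch of that idea, not part of the original answer; per_name and rows_per_name are names made up for this example.

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'willy', 'age': 11, 'interest': 'Lego'},
    {'name': 'willy', 'age': 11, 'interest': 'games'},
    {'name': 'zoe', 'age': 9, 'interest': 'cars'},
])

# Count the non-NaN interests per name in one call instead of looping.
per_name = df.groupby('name')['interest'].count()

# value_counts on the name column counts rows per name, which gives the same
# numbers here because no interest is missing.
rows_per_name = df['name'].value_counts()

print(per_name)
print(rows_per_name)
```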


2015-11-03 16:46:26 Angelo
