2016-10-04 42 views

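Pandas groupby: grouping users and counting each user's membership types

I am trying to use pandas to group members, count how many subscriptions of each type a member has purchased, and get the total amount each member has spent. Once loaded, the data looks like this: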

df = 

Member Nbr Member Name-First Member Name-Last  Date-Joined    Member Type   Amount Addr-Formatted Date-Birth    Gender  Status  
1   Aboud    Tordon     2010-03-31 00:00:00  1 Year Membership 331.00 ADDRESS_1  1972-08-01 00:00:00  Male  Active 
1   Aboud    Tordon     2011-04-16 00:00:00  1 Year Membership 334.70 ADDRESS_1  1972-08-01 00:00:00  Male  Active 
1   Aboud    Tordon     2012-08-06 00:00:00  1 Year Membership 344.34 ADDRESS_1  1972-08-01 00:00:00  Male  Active 
1   Aboud    Tordon     2013-08-21 00:00:00  1 Year Membership 362.53 ADDRESS_1  1972-08-01 00:00:00  Male  Active 
1   Aboud    Tordon     2015-08-31 00:00:00  1 Year Membership 289.47 ADDRESS_1  1972-08-01 00:00:00  Male  Active 

2   Jean     Manuel     2012-12-10 00:00:00  4 Month Membership 148.79 ADDRESS_2  1984-08-01 00:00:00  Male  In-Active 
2   Jean     Manuel     2013-03-13 00:00:00  1 Year Membership 348.46 ADDRESS_2  1984-08-01 00:00:00  Male  In-Active 
2   Jean     Manuel     2014-03-15 00:00:00  1 Year Membership 316.86 ADDRESS_2  1984-08-01 00:00:00  Male  In-Active 

3   Val     Adams     2010-02-09 00:00:00  1 Year Membership 333.25 ADDRESS_3  1934-10-26 00:00:00  Female  Active 
3   Val     Adams     2011-03-09 00:00:00  1 Year Membership 333.88 ADDRESS_3  1934-10-26 00:00:00  Female  Active 
3   Val     Adams     2012-04-03 00:00:00  1 Year Membership 318.34 ADDRESS_3  1934-10-26 00:00:00  Female  Active 
3   Val     Adams     2013-04-15 00:00:00  1 Year Membership 350.73 ADDRESS_3  1934-10-26 00:00:00  Female  Active 
3   Val     Adams     2014-04-19 00:00:00  1 Year Membership 291.63 ADDRESS_3  1934-10-26 00:00:00  Female  Active 
3   Val     Adams     2015-04-19 00:00:00  1 Year Membership 247.35 ADDRESS_3  1934-10-26 00:00:00  Female  Active 

5   Michele    Younes     2010-02-14 00:00:00  1 Year Membership 333.25 ADDRESS_4  1933-06-23 00:00:00  Female  In-Active 
5   Michele    Younes     2011-05-23 00:00:00  1 Year Membership 317.77 ADDRESS_4  1933-06-23 00:00:00  Female  In-Active 
5   Michele    Younes     2012-05-28 00:00:00  1 Year Membership 328.16 ADDRESS_4  1933-06-23 00:00:00  Female  In-Active 
5   Michele    Younes     2013-05-31 00:00:00  1 Year Membership 360.02 ADDRESS_4  1933-06-23 00:00:00  Female  In-Active 

7   Adam     Herzburg    2010-07-11 00:00:00  1 Year Membership 335.48 ADDRESS_5  1987-08-30 00:00:00  Male  In-Active 
... 

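The most popular Member Type values are 1 Month, 3 Month, 4 Month, 6 Month, and 1 Year, so I would like one column per type that counts how many memberships of that type a given member has purchased.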

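There are also 2 Month, 5 Month, 7 Month, 8 Month, and Pool-Only Member Type values, which occur very rarely. If a member has one of these contracts, I would like to count it as "Misc".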

I am also trying to get a 'Total' column that sums the total amount a given member has spent.

Basically, I want to transform my previous DataFrame into something like:

df1= 
Member Nbr Member Name-First Member Name-Last 1_Month 3_Month 4_Month 6_Month 1_Year Misc Total Addr-Formatted Date-Birth   Gender  Status 
1   Aboud    Tordon    0  0  0  0  5  0  1662.04 ADDRESS_1  1972-08-01 00:00:00 Male  Active 
2   Jean    Manuel    0  0  1  0  2  0  814.11 ADDRESS_2  1984-08-01 00:00:00 Male  In-Active 
3   Val     Adams    0  0  0  0  6  0  1875.18 ADDRESS_3  1934-10-26 00:00:00 Female  Active 
5   Michele    Younes    0  0  0  0  4  0  1339.20 ADDRESS_4  1933-06-23 00:00:00 Female  In-Active 
7   Adam    Herzburg   0  0  0  0  1  0  335.48 ADDRESS_5  1987-08-30 00:00:00 Male  In-Active 

...

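The problem I am running into is that whenever I use groupby, I can either sum the amounts or count one specific contract type on its own, but I cannot get a result that looks like df1.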

Answer


You can first map the Member Type column with the dictionary d, then use fillna to replace the unmapped values with Misc:

import pandas as pd
# short column names for the common membership types
d = {'1 Year Membership':'1_Year','1 Month Membership':'1_Month', '3 Month Membership':'3_Month', '4 Month Membership':'4_Month', '6 Month Membership':'6_Month'} 
df['Type'] = df['Member Type'].map(d).fillna('Misc') 
#print (df) 
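To see what this mapping step does in isolation, here is a minimal sketch on a small hand-made Series. The sample values are illustrative only, and the exact spelling of the rare labels in the real data (for example 'Pool-Only') is an assumption. Values with no key in d become NaN after map and are then filled with 'Misc'.

```python
import pandas as pd

d = {'1 Year Membership': '1_Year', '1 Month Membership': '1_Month',
     '3 Month Membership': '3_Month', '4 Month Membership': '4_Month',
     '6 Month Membership': '6_Month'}

# illustrative labels; the last two are not keys of d
member_type = pd.Series(['1 Year Membership', '4 Month Membership',
                         'Pool-Only', '2 Month Membership'])

# map() returns NaN for labels missing from d, fillna() turns those into 'Misc'
mapped = member_type.map(d).fillna('Misc')
print(mapped.tolist())  # ['1_Year', '4_Month', 'Misc', 'Misc']
```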

Then groupby and aggregate with sum:

df0 = df.groupby(['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status'])['Amount'].sum() 
#print (df0) 

Add the Type column to the list of grouping columns and aggregate with size, then reshape with unstack:

df1 = df.groupby(['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status', 'Type']).size().unstack(fill_value=0) 
#print (df1) 
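The size/unstack step can be sketched on a toy frame with a single grouping key. The data below is made up for illustration and `toy` is a hypothetical name. size() counts the rows per (member, type) pair, and unstack moves the Type level into columns, filling absent combinations with 0.

```python
import pandas as pd

# made-up rows: member 1 has two 1_Year contracts, member 2 has one 4_Month and two 1_Year
toy = pd.DataFrame({
    'Member Nbr': [1, 1, 2, 2, 2],
    'Type': ['1_Year', '1_Year', '4_Month', '1_Year', '1_Year'],
})

# count rows per (member, type), then pivot the Type level into columns
counts = toy.groupby(['Member Nbr', 'Type']).size().unstack(fill_value=0)
print(counts)
```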

Finally, concat the two DataFrames:

print (pd.concat([df0, df1], axis=1).reset_index()) 
    Member Nbr Member Name-First Member Name-Last Addr-Formatted \ 
0   1    Aboud   Tordon  ADDRESS_1 
1   2    Jean   Manuel  ADDRESS_2 
2   3    Val   Adams  ADDRESS_3 
3   5   Michele   Younes  ADDRESS_4 
4   7    Adam   Herzburg  ADDRESS_5 

      Date-Birth Gender  Status Amount 1_Year 4_Month 
0 1972-08-01 00:00:00 Male  Active 1662.04  5  0 
1 1984-08-01 00:00:00 Male In-Active 814.11  2  1 
2 1934-10-26 00:00:00 Female  Active 1875.18  6  0 
3 1933-06-23 00:00:00 Female In-Active 1339.20  4  0 
4 1987-08-30 00:00:00 Male In-Active 335.48  1  0 
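The question asks for the sum under the name Total rather than Amount. One possible way to get that is to rename the summed Series before the concat. This is a minimal sketch on made-up data with a single grouping key for brevity, and `toy`, `total`, `counts` and `out` are hypothetical names.

```python
import pandas as pd

# made-up purchases for two members
toy = pd.DataFrame({
    'Member Nbr': [1, 1, 2],
    'Type': ['1_Year', '1_Year', '4_Month'],
    'Amount': [331.00, 334.70, 148.79],
})

# per-member sum, renamed so the final column is called Total
total = toy.groupby('Member Nbr')['Amount'].sum().rename('Total')
# per-member counts of each type
counts = toy.groupby(['Member Nbr', 'Type']).size().unstack(fill_value=0)

# counts first, Total last, as in the layout requested in the question
out = pd.concat([counts, total], axis=1).reset_index()
print(out)
```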

EDIT:

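If some of the values are missing from the Member Type column, you need to add reindex so that every expected column still appears: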

df1 = df.groupby(['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status', 'Type']).size().unstack(fill_value=0).reindex(columns=d.values(), fill_value=0) 
#print (df1) 

print (pd.concat([df0, df1], axis=1).reset_index()) 
    Member Nbr Member Name-First Member Name-Last Addr-Formatted \ 
0   1    Aboud   Tordon  ADDRESS_1 
1   2    Jean   Manuel  ADDRESS_2 
2   3    Val   Adams  ADDRESS_3 
3   5   Michele   Younes  ADDRESS_4 
4   7    Adam   Herzburg  ADDRESS_5 

      Date-Birth Gender  Status Amount 6_Month 3_Month 4_Month \ 
0 1972-08-01 00:00:00 Male  Active 1662.04  0  0  0 
1 1984-08-01 00:00:00 Male In-Active 814.11  0  0  1 
2 1934-10-26 00:00:00 Female  Active 1875.18  0  0  0 
3 1933-06-23 00:00:00 Female In-Active 1339.20  0  0  0 
4 1987-08-30 00:00:00 Male In-Active 335.48  0  0  0 

    1_Year 1_Month 
0  5  0 
1  2  0 
2  6  0 
3  4  0 
4  1  0 
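Note that reindexing with d.values() keeps only the five mapped types, so a Misc column would not survive that step. If Misc should be retained, one option is to reindex with an explicit list. The sketch below uses made-up data, and the column order in `cols` is an assumption taken from the layout requested in the question.

```python
import pandas as pd

# assumed column order, following the desired df1 layout in the question
cols = ['1_Month', '3_Month', '4_Month', '6_Month', '1_Year', 'Misc']

# made-up rows, including one Misc contract
toy = pd.DataFrame({
    'Member Nbr': [1, 1, 2],
    'Type': ['1_Year', 'Misc', '4_Month'],
})

# reindex adds the missing type columns (filled with 0) and fixes the order
counts = (toy.groupby(['Member Nbr', 'Type']).size()
             .unstack(fill_value=0)
             .reindex(columns=cols, fill_value=0))
print(counts)
```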

Instead of the second groupby (which is the fastest option), you can also use pivot_table:

df2 = df.pivot_table(index=['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status'], columns='Type', values='Amount', aggfunc=len, fill_value=0).reindex(columns=d.values(), fill_value=0) 
print (pd.concat([df0, df2], axis=1).reset_index()) 
    Member Nbr Member Name-First Member Name-Last Addr-Formatted \ 
0   1    Aboud   Tordon  ADDRESS_1 
1   2    Jean   Manuel  ADDRESS_2 
2   3    Val   Adams  ADDRESS_3 
3   5   Michele   Younes  ADDRESS_4 
4   7    Adam   Herzburg  ADDRESS_5 

      Date-Birth Gender  Status Amount 6_Month 3_Month 4_Month \ 
0 1972-08-01 00:00:00 Male  Active 1662.04  0  0  0 
1 1984-08-01 00:00:00 Male In-Active 814.11  0  0  1 
2 1934-10-26 00:00:00 Female  Active 1875.18  0  0  0 
3 1933-06-23 00:00:00 Female In-Active 1339.20  0  0  0 
4 1987-08-30 00:00:00 Male In-Active 335.48  0  0  0 

    1_Year 1_Month 
0  5  0 
1  2  0 
2  6  0 
3  4  0 
4  1  0
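Another option, not part of the answer above, is pd.crosstab, which builds the same kind of count table directly. A minimal sketch on made-up data with a single grouping key, compared against the groupby/size/unstack result:

```python
import pandas as pd

# made-up rows for two members
toy = pd.DataFrame({
    'Member Nbr': [1, 1, 2, 2, 2],
    'Type': ['1_Year', '1_Year', '4_Month', '1_Year', '1_Year'],
})

# crosstab counts occurrences of each (Member Nbr, Type) pair
ct = pd.crosstab(toy['Member Nbr'], toy['Type'])

# same table built with groupby + size + unstack, for comparison
counts = toy.groupby(['Member Nbr', 'Type']).size().unstack(fill_value=0)
print(ct)
```

With several grouping keys, crosstab accepts a list of Series as its index argument. Whether it is faster than groupby on the real data would need to be measured.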