2017-08-15 51 views
1

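Pandas: find the average occurrence of a string

I am working with a DataFrame and trying to compute averages, and I am stuck on converting the value counts of my grouped df into averages. The code is as follows: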

df2 = df.groupby(['school', 'Race/Ethnicity']).size() 

school   Race/Ethnicity       
school1   African American/Black      15 
       American Indian/Alaska Native    1 
       Bi-racial/Multi-racial      4 
       Latino/a         53 
       Other - Write In (Required)     1 
       White          2 
school2   African American/Black      1 
       American Indian/Alaska Native    5 
       Asian          1 
       Bi-Racial/Multi-Racial      1 
       Latino/a         26 

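I have many different schools, and instead of the size I want the mean (the share of the school's total) for each race within each school. How can I iterate over the groups to find each group's sum, and then divide every row by the sum of its group?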

+2

It would help to see sample data, but it sounds like you could divide df2 by df.groupby('school').size(). –

+0

@AndrewL Thank you, that is exactly what I needed! I knew I was making things harder than they needed to be. – Cameron
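The division suggested in the comment can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the asker's data, and the names counts, totals, and shares are placeholders. Because the grouped counts carry a two-level index while the per-school totals are indexed by school only, the sketch assumes Series.div with level='school' is used so that each total is broadcast across that school's rows.

```python
import pandas as pd

# Hypothetical toy data, not the asker's real dataset.
df = pd.DataFrame(
    [['school1', 'Latino/a']] * 3 +
    [['school1', 'White']] * 1 +
    [['school2', 'Asian']] * 1 +
    [['school2', 'Latino/a']] * 1,
    columns=['school', 'Race/Ethnicity']
)

# Counts per (school, Race/Ethnicity) pair, as in the question.
counts = df.groupby(['school', 'Race/Ethnicity']).size()

# Total number of rows per school.
totals = df.groupby('school').size()

# Divide each count by its school's total, aligning on the 'school' index level.
shares = counts.div(totals, level='school')
print(shares)
```

With this toy data, school1 has four rows, so its Latino/a share is 3 / 4, and the shares within each school add up to 1.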

Answers

1

Use the normalize parameter of value_counts

df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True) 

school Race/Ethnicity    
school1 Latino/a       0.697368 
     African American/Black   0.197368 
     Bi-racial/Multi-racial   0.052632 
     White       0.026316 
     American Indian/Alaska Native 0.013158 
     Other - Write In (Required)  0.013158 
school2 Latino/a       0.764706 
     American Indian/Alaska Native 0.147059 
     African American/Black   0.029412 
     Asian       0.029412 
     Bi-Racial/Multi-Racial   0.029412 
Name: Race/Ethnicity, dtype: float64 

You can also skip the sorting

df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True, sort=False) 

school Race/Ethnicity    
school1 African American/Black   0.197368 
     American Indian/Alaska Native 0.013158 
     Bi-racial/Multi-racial   0.052632 
     Latino/a       0.697368 
     Other - Write In (Required)  0.013158 
     White       0.026316 
school2 African American/Black   0.029412 
     American Indian/Alaska Native 0.147059 
     Asian       0.029412 
     Bi-Racial/Multi-Racial   0.029412 
     Latino/a       0.764706 
Name: Race/Ethnicity, dtype: float64 

Setup

import pandas as pd

df = pd.DataFrame(
    [['school1', 'African American/Black']] * 15 + 
    [['school1', 'American Indian/Alaska Native']] * 1 + 
    [['school1', 'Bi-racial/Multi-racial']] * 4 + 
    [['school1', 'Latino/a']] * 53 + 
    [['school1', 'Other - Write In (Required)']] * 1 + 
    [['school1', 'White']] * 2 + 
    [['school2', 'African American/Black']] * 1 + 
    [['school2', 'American Indian/Alaska Native']] * 5 + 
    [['school2', 'Asian']] * 1 + 
    [['school2', 'Bi-Racial/Multi-Racial']] * 1 + 
    [['school2', 'Latino/a']] * 26, 
    columns=['school', 'Race/Ethnicity'] 
)
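
As a quick sanity check on the figures above, one cell can be recomputed by hand: school1 has 15 + 1 + 4 + 53 + 1 + 2 = 76 rows, so Latino/a should come out at 53 / 76, roughly 0.697368. The sketch below rebuilds the same counts more compactly (the rows list and the props name are illustrative only, not part of the original answer) and compares value_counts(normalize=True) with that manual calculation.

```python
import pandas as pd

# Same counts as the setup above, written as (school, group, count) triples.
rows = [
    ('school1', 'African American/Black', 15),
    ('school1', 'American Indian/Alaska Native', 1),
    ('school1', 'Bi-racial/Multi-racial', 4),
    ('school1', 'Latino/a', 53),
    ('school1', 'Other - Write In (Required)', 1),
    ('school1', 'White', 2),
    ('school2', 'African American/Black', 1),
    ('school2', 'American Indian/Alaska Native', 5),
    ('school2', 'Asian', 1),
    ('school2', 'Bi-Racial/Multi-Racial', 1),
    ('school2', 'Latino/a', 26),
]

# Expand each triple into `count` identical rows.
df = pd.DataFrame(
    [[school, group] for school, group, n in rows for _ in range(n)],
    columns=['school', 'Race/Ethnicity']
)

# Proportion of each group within its school.
props = df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True)

# Manual check for one cell: 53 Latino/a rows out of 76 school1 rows.
print(round(float(props.loc[('school1', 'Latino/a')]), 6))  # 0.697368
print(round(53 / 76, 6))                                    # 0.697368
```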