2016-10-28 50 views

Data analysis using Python pandas

I am new to the pandas library and need some help. I have two columns like this:

Test Result  Risk Rating 
    Fail    Low     
    Pass    Medium 
    Skip    High 
    Pass    Low     
    Fail    Medium 
    Pass    High 
    Skip    Low     
    Fail    Medium 
    Fail    High 

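Now I need to find the total number of Fail, Pass, and Skip values in the "Test Result" column, which I am able to do. But I also need to count how many "Fail" rows in the "Test Result" column have "Low" in the "Risk Rating" column, and likewise how many Fail rows have Medium, and so on. My final result should look like this: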

Fail (Low Risk Rating) = 1 
Fail (Medium Risk Rating) = 2 
Fail (High Risk Rating) = 1 
Pass (Low Risk Rating) = 1 
Pass (Medium Risk Rating) = 1 
Pass (High Risk Rating) = 1 
Skip (Low Risk Rating) = 1 
Skip (Medium Risk Rating) = 0 
Skip (High Risk Rating) = 1 

How do I do this? Any help would be appreciated.

Answer


I think you need groupby on both columns, aggregated with size:

import pandas as pd

# Count rows for each (Test Result, Risk Rating) pair; keep df itself unchanged
df1 = df.groupby(['Test Result', 'Risk Rating']).size().reset_index(name='counts')
print(df1)
  Test Result Risk Rating  counts
0        Fail        High       1
1        Fail         Low       1
2        Fail      Medium       2
3        Pass        High       1
4        Pass         Low       1
5        Pass      Medium       1
6        Skip        High       1
7        Skip         Low       1
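To reproduce this output, here is a minimal self-contained sketch. The DataFrame is typed in by hand from the two columns in the question, and the name `pair_counts` is only illustrative.

```python
import pandas as pd

# Sample data copied from the two columns shown in the question.
df = pd.DataFrame({
    "Test Result": ["Fail", "Pass", "Skip", "Pass", "Fail",
                    "Pass", "Skip", "Fail", "Fail"],
    "Risk Rating": ["Low", "Medium", "High", "Low", "Medium",
                    "High", "Low", "Medium", "High"],
})

# Count rows per (Test Result, Risk Rating) pair.
# Pairs that never occur (here Skip/Medium) do not appear in the result.
pair_counts = (
    df.groupby(["Test Result", "Risk Rating"])
      .size()
      .reset_index(name="counts")
)
print(pair_counts)
```

Note that the Skip/Medium pair is missing here rather than shown as 0, because groupby only reports pairs that actually occur in the data.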

Perhaps a pivot-style table with unstack is better:

# Pivot Risk Rating into columns; missing pairs become 0
df2 = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0)
print(df2)
Risk Rating  High  Low  Medium
Test Result
Fail            1    1       2
Pass            1    1       1
Skip            1    1       0

Or a slower solution with crosstab:

# Cross-tabulate the two columns directly
df3 = pd.crosstab(df['Test Result'], df['Risk Rating'])
print(df3)
Risk Rating  High  Low  Medium
Test Result
Fail            1    1       2
Pass            1    1       1
Skip            1    1       0
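Since the question also asks for the overall Fail, Pass, and Skip totals, a possible variation is to pass `margins=True` to `pd.crosstab`, which appends an `All` row and column of totals. This is a sketch using hand-typed sample data from the question; the name `with_totals` is only illustrative.

```python
import pandas as pd

# Sample data copied from the two columns shown in the question.
df = pd.DataFrame({
    "Test Result": ["Fail", "Pass", "Skip", "Pass", "Fail",
                    "Pass", "Skip", "Fail", "Fail"],
    "Risk Rating": ["Low", "Medium", "High", "Low", "Medium",
                    "High", "Low", "Medium", "High"],
})

# margins=True adds an 'All' row and an 'All' column holding the totals.
with_totals = pd.crosstab(df["Test Result"], df["Risk Rating"], margins=True)
print(with_totals)

# Total number of Fail rows across every risk rating.
print(with_totals.loc["Fail", "All"])
```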

If you need the missing combinations filled with 0, add stack:

# Parentheses allow the method chain to span several lines
df4 = (df.groupby(['Test Result', 'Risk Rating'])
         .size()
         .unstack(fill_value=0)
         .stack()
         .reset_index(name='counts'))
print(df4)
  Test Result Risk Rating  counts
0        Fail        High       1
1        Fail         Low       1
2        Fail      Medium       2
3        Pass        High       1
4        Pass         Low       1
5        Pass      Medium       1
6        Skip        High       1
7        Skip         Low       1
8        Skip      Medium       0
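To get from these counts to the exact lines requested in the question, one option is to fix the Low/Medium/High order explicitly and format each cell. This is a sketch, not part of the original answer: the sample data is typed in from the question, and `results`, `ratings`, `table`, and `lines` are illustrative names.

```python
import pandas as pd

# Sample data copied from the two columns shown in the question.
df = pd.DataFrame({
    "Test Result": ["Fail", "Pass", "Skip", "Pass", "Fail",
                    "Pass", "Skip", "Fail", "Fail"],
    "Risk Rating": ["Low", "Medium", "High", "Low", "Medium",
                    "High", "Low", "Medium", "High"],
})

# The order the question uses, rather than pandas' alphabetical order.
results = ["Fail", "Pass", "Skip"]
ratings = ["Low", "Medium", "High"]

# Wide count table, reindexed so every combination exists (0 if absent).
table = (
    pd.crosstab(df["Test Result"], df["Risk Rating"])
      .reindex(index=results, columns=ratings, fill_value=0)
)

# One formatted line per (result, rating) combination.
lines = [
    f"{result} ({rating} Risk Rating) = {table.loc[result, rating]}"
    for result in results
    for rating in ratings
]
print("\n".join(lines))
```

With the sample data this prints the nine lines listed in the question, including `Skip (Medium Risk Rating) = 0`.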

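Thanks. I am using df = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0), but I am not able to get specific values out of the resulting df. For example, I only need the 'High', 'Low', and 'Medium' values for 'Fail'.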


I think you need [boolean indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing) – jezrael
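As a sketch of what this could look like for the 'Fail' case: label-based `.loc` works on the unstacked table, while boolean indexing works on the long table. The sample data is typed in from the question, and the names `wide`, `long_counts`, `fail_row`, `fail_medium`, and `fail_long` are illustrative, not from the thread.

```python
import pandas as pd

# Sample data copied from the two columns shown in the question.
df = pd.DataFrame({
    "Test Result": ["Fail", "Pass", "Skip", "Pass", "Fail",
                    "Pass", "Skip", "Fail", "Fail"],
    "Risk Rating": ["Low", "Medium", "High", "Low", "Medium",
                    "High", "Low", "Medium", "High"],
})

# Wide table: Test Result is the index, Risk Rating values are the columns.
wide = df.groupby(["Test Result", "Risk Rating"]).size().unstack(fill_value=0)

# Label-based selection: the whole 'Fail' row, or a single cell.
fail_row = wide.loc["Fail"]               # Series indexed by Risk Rating
fail_medium = wide.loc["Fail", "Medium"]  # single count

# Long table: one row per observed (Test Result, Risk Rating) pair.
long_counts = (
    df.groupby(["Test Result", "Risk Rating"])
      .size()
      .reset_index(name="counts")
)

# Boolean indexing: keep only the rows whose Test Result is 'Fail'.
fail_long = long_counts[long_counts["Test Result"] == "Fail"]

print(fail_row)
print(fail_medium)
print(fail_long)
```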