2016-12-05 24 views
-2

I have a DataFrame that is the result of a groupby call:

test=uniqueStudents.groupby(['index1','index2']).count() 

test.head(10) 

I want an overall average: for each value of index1, the mean of the count output across all of its rows.

The current result and the desired output are shown below.

[Image: Current/Desired Output]

Can someone help me with Python code to achieve this? Or is there another way to get it from the dataset?

Answers

1

Use the level parameter of the groupby method; it accepts the name of an index level.

test.groupby(level='index1').mean() 
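
As a minimal sketch of what grouping by an index level does, the toy frame below (its name `toy`, the `count` column, and the values are invented for illustration) has the same two index level names as the question. The level can be given by name or by position.

```python
import pandas as pd

# Hypothetical toy data: two index levels and a single count column.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["index1", "index2"]
)
toy = pd.DataFrame({"count": [10, 20, 6]}, index=idx)

# Group on the outer index level by name ...
by_name = toy.groupby(level="index1").mean()
# ... or by position (level 0 is the outermost level).
by_position = toy.groupby(level=0).mean()

print(by_name)
```

Both calls collapse index2 and leave one row per index1 value.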

Alternatively, you can reset the index and do a normal groupby with the by parameter.

test.reset_index().groupby('index1').mean() 
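
One thing worth checking with the reset_index route: index2 becomes an ordinary column, so it is averaged along with the data columns. The sketch below uses an invented toy frame (`toy`, `count` are illustrative names) to compare both routes; dropping index2 first is one possible way to keep it out of the result.

```python
import pandas as pd

# Hypothetical toy data with the same index level names as the question.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["index1", "index2"]
)
toy = pd.DataFrame({"count": [10, 20, 6]}, index=idx)

via_level = toy.groupby(level="index1").mean()
via_reset = toy.reset_index().groupby("index1").mean()

# After reset_index, index2 is a regular column and gets averaged too.
print(via_reset.columns.tolist())

# The data column itself agrees between the two routes.
same_counts = via_level["count"].equals(via_reset["count"])

# Dropping index2 before grouping keeps it out of the output.
cleaned = toy.reset_index().drop(columns="index2").groupby("index1").mean()
```

If index2 held non-numeric values, averaging it would not be meaningful, so dropping it (or grouping by level) is the safer choice.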
0

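You need to groupby on the index1 level and aggregate with GroupBy.mean, then take the mean across columns with DataFrame.mean:

import numpy as np
import pandas as pd
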
test = pd.DataFrame({'column4': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column10': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}, 'column3': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column8': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}, 'column11': {('01-06-15', 278658): 22.0, ('01-06-15', 206905): 101.0, ('02-06-15', 225800): 308.0, ('02-06-15', 225596): 19.0, ('01-06-15', 152551): 64.0, ('01-06-15', 124337): 54.0, ('02-06-15', 235369): 7.0, ('01-06-15', 31883): 124.0, ('03-06-15', 124337): np.nan}, 'column5': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column7': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 3, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column2': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 
235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column1': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column6': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column9': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}}) 
test.index.names = ['index1','index2'] 
test = test[['column'+str(col) for col in range(1,12)]] 
print (test) 
                 column1  column2  column3  column4  column5  column6  \
index1   index2
01-06-15 31883       124      124      124      124      124      124
         124337       54       54       54       54       54       54
         152551       64       64       64       64       64       64
         206905      101      101      101      101      101      101
         278658       22       22       22       22       22       22
02-06-15 225596       19       19       19       19       19       19
         225800      308      308      308      308      308      308
         235369        7        7        7        7        7        7
03-06-15 124337       17       17       17       17       17       17

                 column7  column8  column9  column10  column11
index1   index2
01-06-15 31883       124     62.0     62.0      62.0     124.0
         124337       54     21.0     21.0      21.0      54.0
         152551       64     55.0     55.0      55.0      64.0
         206905      101     60.0     60.0      60.0     101.0
         278658       22     17.0     17.0      17.0      22.0
02-06-15 225596       19     15.0     15.0      15.0      19.0
         225800      308    280.0    280.0     280.0     308.0
         235369        3      3.0      3.0       3.0       7.0
03-06-15 124337       17      NaN      NaN       NaN       NaN
df = test.groupby(level='index1').mean().mean(axis=1).reset_index(name='val') 
print (df) 
     index1         val
0  01-06-15   64.818182
1  02-06-15  107.939394
2  03-06-15   17.000000

Another solution is to take the mean across columns first and then groupby:

df = test.mean(axis=1).groupby(level='index1').mean().reset_index(name='val') 
print (df) 
     index1         val
0  01-06-15   64.818182
1  02-06-15  107.939394
2  03-06-15   17.000000
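
The two orderings agree on the sample data because its only NaN values sit in a group with a single row. In general they can differ when NaN values are spread unevenly inside a group, since mean skips NaN by default. A minimal sketch with invented values (the frame `toy` and columns `x`, `y` are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one group, two rows, one missing value.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2)], names=["index1", "index2"]
)
toy = pd.DataFrame({"x": [1.0, 3.0], "y": [10.0, np.nan]}, index=idx)

# Group first: column means are x=2.0 and y=10.0, averaged to 6.0.
group_then_columns = toy.groupby(level="index1").mean().mean(axis=1)

# Rows first: row means are 5.5 and 3.0, averaged to 4.25.
columns_then_group = toy.mean(axis=1).groupby(level="index1").mean()

print(group_then_columns.loc["a"], columns_then_group.loc["a"])
```

Which ordering is appropriate depends on whether each column or each row should carry equal weight in the final average.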