2017-03-02

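How to get the mean of the smallest values from a Pandas DataFrame using DateTime

I want to get the smallest values, and the mean of those smallest values, from a DataFrame by selecting on dates.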

From this DataFrame:

                   2       chk2       chk3    val
0
2016-08-01 31.340000 2016-05-09 2016-08-08 18.605 
2016-08-02 32.359999 2016-05-09 2016-08-08 18.605 
2016-08-03 32.089001 2016-05-09 2016-08-08 18.605 
2016-08-04 31.194001 2016-05-09 2016-08-08 18.605 
2016-08-05 30.585000 2016-05-09 2016-08-08 18.605 
2016-08-08 20.490000 2016-05-09 2016-08-08 18.605 
2016-08-09 20.135000 2016-08-08 2016-11-21 18.605 
2016-08-10 19.103000 2016-08-08 2016-11-21 18.605 
2016-08-11 19.452000 2016-08-08 2016-11-21 18.605 
2016-08-12 19.241001 2016-08-08 2016-11-21 18.605 
2016-08-15 19.645000 2016-08-08 2016-11-21 18.605 
2016-08-16 20.124000 2016-08-08 2016-11-21 18.605 
2016-08-17 19.863001 2016-08-08 2016-11-21 18.605 
2016-08-18 19.667999 2016-08-08 2016-11-21 18.605 
2016-08-19 19.083001 2016-08-08 2016-11-21 18.605 
2016-08-22 18.163000 2016-08-08 2016-11-21 18.605 
2016-08-23 18.948001 2016-08-08 2016-11-21 18.605 
2016-08-24 19.329999 2016-08-08 2016-11-21 18.605 
2016-08-25 19.735999 2016-08-08 2016-11-21 18.605 
2016-08-26 19.769999 2016-08-08 2016-11-21 18.605 
2016-08-29 18.704000 2016-08-08 2016-11-21 18.605 
2016-08-30 19.756000 2016-08-08 2016-11-21 18.605 
2016-08-31 19.931000 2016-08-08 2016-11-21 18.605 

This gives me the nsmallest of the whole DataFrame, and it seems to ignore the first week, where the chk2/chk3 dates are different:

df.query('chk2 <= index <= chk3')[2].nsmallest(3) 

0 
2016-08-22 18.163000 
2016-08-29 18.704000 
2016-08-23 18.948001 
Name: 2, dtype: float64 

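The val column is what I get after applying the function below. It is the same on every row, even though the chk2/chk3 dates change after the first week.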

def _test(row): 
#  df.query('chk2 <= index <= chk3')[2].nsmallest(3).mean() 
    return df.query('chk2 <= index <= chk3')[2].nsmallest(3).mean() 

    #return df.query('row[1] <= index <= row[2]')[2].nsmallest(3).mean() 
    #UndefinedVariableError: ("name 'row' is not defined", u'occurred at index 2016-08-01 00:00:00') 


df.info() 
<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 23 entries, 2016-08-01 to 2016-08-31 
Data columns (total 3 columns): 
2  23 non-null float64 
chk2 23 non-null datetime64[ns] 
chk3 23 non-null datetime64[ns] 
dtypes: datetime64[ns](2), float64(1) 
memory usage: 736.0 bytes 
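
The UndefinedVariableError in the commented-out line arises because `DataFrame.query` resolves bare names against the frame's columns and index, so it does not know `row`; local Python variables have to be prefixed with `@`. Below is a minimal sketch of the row-wise version under that rule. It assumes the value column is named `value` instead of `2`, and the helper name `mean_of_3_smallest` is hypothetical; neither comes from the question.

```python
import pandas as pd

# Data copied from the question; "value" stands in for the column named 2
# (an assumption made for readability).
values = [
    31.340000, 32.359999, 32.089001, 31.194001, 30.585000, 20.490000,
    20.135000, 19.103000, 19.452000, 19.241001, 19.645000, 20.124000,
    19.863001, 19.667999, 19.083001, 18.163000, 18.948001, 19.329999,
    19.735999, 19.769999, 18.704000, 19.756000, 19.931000,
]
df = pd.DataFrame(
    {
        "value": values,
        "chk2": pd.to_datetime(["2016-05-09"] * 6 + ["2016-08-08"] * 17),
        "chk3": pd.to_datetime(["2016-08-08"] * 6 + ["2016-11-21"] * 17),
    },
    index=pd.bdate_range("2016-08-01", "2016-08-31"),  # 23 business days
)


def mean_of_3_smallest(row):
    """Mean of the 3 smallest values whose date lies in this row's window."""
    lo, hi = row["chk2"], row["chk3"]
    # "@" tells query to look the name up among local Python variables.
    window = df.query("@lo <= index <= @hi")
    return window["value"].nsmallest(3).mean()


# Compute first, assign afterwards, so the frame is not modified mid-apply.
val = df.apply(mean_of_3_smallest, axis=1)
df["val"] = val
print(df["val"].round(3))
```

This reruns the query once per row, so it can be slow on a large frame; rows that share the same chk2/chk3 pair repeat exactly the same work.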

Answer


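If I understand correctly, you can use groupby to split the rows wherever the chk2/chk3 dates change, and then use transform to run your operation on each of those groups.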

(df.query('chk2 <= index <= chk3').groupby(['chk2', 'chk3'])
   .transform(lambda x: x.nsmallest(3).mean()))

Demo

>>> df 
                   2       chk2       chk3
2016-08-01 31.340000 2016-05-09 2016-08-08 
2016-08-02 32.359999 2016-05-09 2016-08-08 
... 
2016-08-30 19.756000 2016-08-08 2016-11-21 
2016-08-31 19.931000 2016-08-08 2016-11-21 

>>> (df.query('chk2 <= index <= chk3').groupby(['chk2', 'chk3'])
...    .transform(lambda x: x.nsmallest(3).mean()))
                2
2016-08-01 27.423 
2016-08-02 27.423 
2016-08-03 27.423 
2016-08-04 27.423 
2016-08-05 27.423 
2016-08-08 27.423 
2016-08-09 18.605 
2016-08-10 18.605 
2016-08-11 18.605 
2016-08-12 18.605 
2016-08-15 18.605 
2016-08-16 18.605 
2016-08-17 18.605 
2016-08-18 18.605 
2016-08-19 18.605 
2016-08-22 18.605 
2016-08-23 18.605 
2016-08-24 18.605 
2016-08-25 18.605 
2016-08-26 18.605 
2016-08-29 18.605 
2016-08-30 18.605 
2016-08-31 18.605 
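
For anyone who wants to rerun the demo, here is a self-contained sketch that rebuilds the question's frame and applies the same groupby/transform idea. It assumes the value column is called `value` instead of `2` and selects that column before transforming, so the result is a Series instead of a one-column DataFrame; both choices are mine, not the original poster's.

```python
import pandas as pd

# Data copied from the question; "value" stands in for the column named 2
# (an assumption made for readability).
values = [
    31.340000, 32.359999, 32.089001, 31.194001, 30.585000, 20.490000,
    20.135000, 19.103000, 19.452000, 19.241001, 19.645000, 20.124000,
    19.863001, 19.667999, 19.083001, 18.163000, 18.948001, 19.329999,
    19.735999, 19.769999, 18.704000, 19.756000, 19.931000,
]
df = pd.DataFrame(
    {
        "value": values,
        "chk2": pd.to_datetime(["2016-05-09"] * 6 + ["2016-08-08"] * 17),
        "chk3": pd.to_datetime(["2016-08-08"] * 6 + ["2016-11-21"] * 17),
    },
    index=pd.bdate_range("2016-08-01", "2016-08-31"),  # 23 business days
)

# Keep only rows whose own date lies inside their [chk2, chk3] window.
in_window = df.query("chk2 <= index <= chk3")

# One group per distinct (chk2, chk3) pair; transform broadcasts the
# group's scalar (mean of its 3 smallest values) back to every row.
result = in_window.groupby(["chk2", "chk3"])["value"].transform(
    lambda s: s.nsmallest(3).mean()
)
print(result.round(3))
```

The first six rows come out at about 27.423 (the mean of 20.490, 30.585 and 31.194001) and the remaining rows at about 18.605 (the mean of 18.163, 18.704 and 18.948001), matching the output shown in the demo.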