2013-03-14 61 views
2

How to split a DataFrame into two smaller ones

I have a large DataFrame like this:

  count mean median min max std 
datet            
2001-05-16  17 NaN  NaN NaN NaN NaN 
2001-05-17  24 8.28 8.27 8.15 8.46 0.09 
2001-05-18  24 8.41 8.31 8.18 8.85 0.19 
2001-05-19  24 10.44 10.64 9.03 10.98 0.60 
2001-05-20  24 10.53 10.56 9.98 10.92 0.28 
2001-05-21  24 10.28 10.31 9.90 10.66 0.23 
2001-05-22  24 10.40 10.42 10.17 10.67 0.17 
2001-05-23  24 10.04 10.03 9.87 10.17 0.08 
2001-05-24  24 9.63 9.66 9.41 9.88 0.15 
2001-05-25  24 9.21 9.22 9.01 9.41 0.11 

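How can I split this DataFrame into two smaller ones, one holding the rows up to and including the date '2001-05-20' and one holding the rows after it? Like this: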

df1: 
     count mean median min max std 
datet            
2001-05-16  17 NaN  NaN NaN NaN NaN 
2001-05-17  24 8.28 8.27 8.15 8.46 0.09 
2001-05-18  24 8.41 8.31 8.18 8.85 0.19 
2001-05-19  24 10.44 10.64 9.03 10.98 0.60 
2001-05-20  24 10.53 10.56 9.98 10.92 0.28 

df2: 
    count mean median min max std 
datet            
2001-05-21  24 10.28 10.31 9.90 10.66 0.23 
2001-05-22  24 10.40 10.42 10.17 10.67 0.17 
2001-05-23  24 10.04 10.03 9.87 10.17 0.08 
2001-05-24  24 9.63 9.66 9.41 9.88 0.15 
2001-05-25  24 9.21 9.22 9.01 9.41 0.11 

Answers

3

For a single before/after split, I think grouping by a boolean criterion is the most direct way.

In [1]: df = pd.DataFrame(np.random.randn(10), 
         index=pd.date_range('2001-05-16', '2001-05-25')) 

In [2]: grouper = df.groupby(df.index < pd.Timestamp('2001-05-21')) 

In [3]: before, after = grouper.get_group(True), grouper.get_group(False) 

In [4]: before 
Out[4]: 
       0 
2001-05-16 2.560516 
2001-05-17 -2.207314 
2001-05-18 0.646882 
2001-05-19 0.660611 
2001-05-20 0.437303 

`after` comes out the same way. Can anyone improve on my In [3]?
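One possible simplification of In [3], offered as a sketch rather than a definitive answer: the same boolean criterion can index the frame directly, so no groupby object is needed. The column name `value` and the `np.arange` data are placeholders chosen for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the real data: ten daily rows (assumed values).
idx = pd.date_range("2001-05-16", "2001-05-25")
df = pd.DataFrame({"value": np.arange(10.0)}, index=idx)

# Same criterion as In [2]: True for rows strictly before 2001-05-21.
cutoff = pd.Timestamp("2001-05-21")
mask = df.index < cutoff

# Boolean indexing keeps the rows where the mask is True;
# negating the mask gives the remaining rows.
before = df[mask]
after = df[~mask]
```

Because the comparison is strict, 2001-05-20 lands in `before` and 2001-05-21 is the first row of `after`, which matches the df1/df2 layout asked for in the question.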

+0

Thank you, I think this is perfect! :) – wuwucat 2013-03-14 16:53:33

3

In 0.11-dev (.ix will work equivalently):

In [16]: df.loc[:'20010520'] 
Out[16]: 
        0 
2001-05-16 0.105445 
2001-05-17 1.660771 
2001-05-18 0.485668 
2001-05-19 -0.102616 
2001-05-20 -0.228228 

In [17]: df.loc['20010521':] 
Out[17]: 
        0 
2001-05-21 -0.024324 
2001-05-22 -1.004362 
2001-05-23 2.342225 
2001-05-24 1.124695 
2001-05-25 -0.291302 
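The two `.loc` slices above can be reproduced as a self-contained script. This is a minimal sketch that assumes a pandas version where `.loc` is available (`.ix` no longer exists in current releases); the column name `value` and its contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the real data: ten daily rows (assumed values).
idx = pd.date_range("2001-05-16", "2001-05-25")
df = pd.DataFrame({"value": np.arange(10.0)}, index=idx)

# Label-based slices on a sorted DatetimeIndex include both endpoints,
# so the first slice ends with 2001-05-20 itself.
df1 = df.loc[:"2001-05-20"]

# The second slice has to start at the next day to avoid an overlap.
df2 = df.loc["2001-05-21":]
```

Unlike positional slicing, label slicing is inclusive at the stop label, which is why the two slices use different dates.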

Or (.ix will work here as well, this is just more explicit):

In [27]: i = df.index.get_loc('20010520') 

In [28]: df.iloc[:i+1] 
Out[28]: 
        0 
2001-05-16 0.105445 
2001-05-17 1.660771 
2001-05-18 0.485668 
2001-05-19 -0.102616 
2001-05-20 -0.228228 

In [29]: df.iloc[i+1:] 
Out[29]: 
        0 
2001-05-21 -0.024324 
2001-05-22 -1.004362 
2001-05-23 2.342225 
2001-05-24 1.124695 
2001-05-25 -0.291302
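The positional variant above can also be written out in full. This is a sketch under the assumption that the split date is actually present in a unique index, because `get_loc` raises a `KeyError` for a missing label; the column name `value` and its contents are placeholders.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the real data: ten daily rows (assumed values).
idx = pd.date_range("2001-05-16", "2001-05-25")
df = pd.DataFrame({"value": np.arange(10.0)}, index=idx)

# Integer position of the last row that belongs to the first part.
# On a unique index, get_loc returns a plain int.
i = df.index.get_loc(pd.Timestamp("2001-05-20"))

# Positional slices exclude the stop position, hence the + 1.
df1 = df.iloc[: i + 1]
df2 = df.iloc[i + 1 :]
```

Here a single split position drives both halves, so the two parts cannot overlap or leave a gap.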