2015-10-06

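Filling missing data rows with the pandas reindex function

I want to use the pandas reindex function to fill in the missing rows in my time series data. My data looks like this: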

100,2007,239,4,29.588,-30.851,-999.0,-999.0,-999.0,-999.00,13.125,-999.00 
100,2007,239,5,29.573,-30.843,-999.0,-999.0,-999.0,-999.00,13.126,-999.00 
100,2007,239,14,29.389,-30.880,-999.0,-999.0,-999.0,-999.00,13.131,-999.00 
100,2007,239,15,29.367,-30.901,-999.0,-999.0,-999.0,-999.00,13.131,-999.00 
100,2007,239,24,29.374,-30.920,-999.0,-999.0,-999.0,-999.00,13.135,-999.00 
             . 
             . 

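The fourth column of this data holds the time index of a minute-by-minute time series. Unlike a normal time series index, the time index here runs 0 to 59, 100 to 159, ..., 2300 to 2359, because a day has 24 hours and an hour has 60 minutes. So, to fill the gaps with 'nan' values, I wrote the code below: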

S = [] 
for i in range(0,24): 

    s = np.arange(i*100,i*100+60) 
    s = list(s) 
S = S + s 

pd.set_option('max_rows',10) 
for INPUT in FileList: 
    output = INPUT + "result" # set the output files 
    data=pd.read_csv(INPUT,sep=',',index_col=[3],parse_dates=[3]) 
    index = 'S'#make the reference index to fill 
    df = data 
    sk_f = df.reindex(index)  
    sk_f.to_csv(output,na_rep='nan') 

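With this code, I intend to fill the gaps with 'nan' rows, using the fourth column as the index and S as the reference index. But the result consists only of 'nan' rows instead of the original rows with the gaps filled, as shown below: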

,100,2007,241,22.471,-31.002,-999.0,-999.0.1,-999.0.2,-999.00,13.294,-999.00.1
0,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
1,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
2,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
3,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
4,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
5,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
6,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
7,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
8,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
9,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
10,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
11,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 

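What I expect is to fill in the rows that are missing from the original data. For example, the original data has no rows for index 0 to 3, so I want to add those rows in the same format as the original data. I may be missing something. If you can offer any ideas or help, I would really appreciate it.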

Thanks, Isaac

Answers


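First, I found an indentation problem where the list is built with S = S + s. That line has to be indented so it runs inside the loop; otherwise the list S keeps only the last s. Change this: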

S = [] 
for i in range(0,24): 

    s = np.arange(i*100,i*100+60) 
    s = list(s) 
S = S + s #keep only last s 

to:

S = [] 
for i in range(0,24): 
    s = np.arange(i*100,i*100+60) 
    s = list(s) 
    S = S + s 

Or shorter:

S = [] 
for i in range(0,24): 
    S = S + list(np.arange(i*100,i*100+60)) 
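As a quick check of what the reference index should contain, here is a minimal sketch that builds the same minute codes with a list comprehension instead of the loop. The names hour and minute are only illustrative; the result is meant to match the S built above.

```python
# Reference index of HHMM minute codes: 0..59, 100..159, ..., 2300..2359.
S = [hour * 100 + minute for hour in range(24) for minute in range(60)]

print(len(S))    # one entry per minute of the day
print(S[58:62])  # shows the jump from 59 to 100 at the hour boundary
print(S[-2:])    # last two minutes of the day
```

If len(S) is 60 instead of 1440, the concatenation ran only once, which is the indentation problem described above.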

Next, there is a problem with index = 'S'. I think this is a typo and it should be index = S. You can also add the bfill() function to back-fill the gaps.

sk_f = df.reindex(index).bfill() 
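To see what bfill() changes, here is a small sketch on a made-up three-row frame (the column name value and the short reference index are assumptions for illustration, not part of the original data). It compares reindex alone with reindex followed by bfill().

```python
import pandas as pd

# Hypothetical miniature of the data: values observed at minutes 4, 5 and 14.
df = pd.DataFrame({"value": [29.588, 29.573, 29.389]}, index=[4, 5, 14])

reference = list(range(0, 16))  # short stand-in for the full index S

gaps = df.reindex(reference)            # missing minutes become NaN rows
filled = df.reindex(reference).bfill()  # NaN rows take the next observed value

print(gaps["value"].isna().sum())  # number of inserted NaN rows
print(filled.loc[0, "value"], filled.loc[6, "value"])
print(filled.loc[15, "value"])     # stays NaN: no later row to fill from
```

If the goal is only to insert 'nan' placeholder rows, as the question describes, reindex on its own already does that; bfill() is needed only when the gaps should be filled with neighbouring values.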

Code:

import pandas as pd 
import numpy as np 
import io 

S = [] 
for i in range(0,24): 
    S = S + list(np.arange(i*100,i*100+60)) 

#original data 
temp=u"""100,2007,239,4,29.588,-30.851,-999.0,-999.0,-999.0,-999.00,13.125,-999.00 
100,2007,239,5,29.573,-30.843,-999.0,-999.0,-999.0,-999.00,13.126,-999.00 
100,2007,239,14,29.389,-30.880,-999.0,-999.0,-999.0,-999.00,13.131,-999.00 
100,2007,239,15,29.367,-30.901,-999.0,-999.0,-999.0,-999.00,13.131,-999.00 
100,2007,239,24,29.374,-30.920,-999.0,-999.0,-999.0,-999.00,13.135,-999.00""" 

#pd.set_option('max_rows',10)

# header=None keeps the first row as data; column 3 (the minute code) becomes the index.
# parse_dates is left out because the minute codes are plain integers, not dates.
data = pd.read_csv(io.StringIO(temp), sep=',', header=None, index_col=[3])
data.index.name = None
print(data)

#      0     1    2       4        5     6     7     8     9      10    11
# 4  100  2007  239  29.588  -30.851  -999  -999  -999  -999  13.125  -999
# 5  100  2007  239  29.573  -30.843  -999  -999  -999  -999  13.126  -999
#14  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
#15  100  2007  239  29.367  -30.901  -999  -999  -999  -999  13.131  -999
#24  100  2007  239  29.374  -30.920  -999  -999  -999  -999  13.135  -999

index = S #make the reference index to fill 
df = data 
sk_f = df.reindex(index).bfill() 

print(sk_f.head(20))
#      0     1    2       4        5     6     7     8     9      10    11
# 0  100  2007  239  29.588  -30.851  -999  -999  -999  -999  13.125  -999
# 1  100  2007  239  29.588  -30.851  -999  -999  -999  -999  13.125  -999
# 2  100  2007  239  29.588  -30.851  -999  -999  -999  -999  13.125  -999
# 3  100  2007  239  29.588  -30.851  -999  -999  -999  -999  13.125  -999
# 4  100  2007  239  29.588  -30.851  -999  -999  -999  -999  13.125  -999
# 5  100  2007  239  29.573  -30.843  -999  -999  -999  -999  13.126  -999
# 6  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
# 7  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
# 8  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
# 9  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
#10  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
#11  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
#12  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
#13  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
#14  100  2007  239  29.389  -30.880  -999  -999  -999  -999  13.131  -999
#15  100  2007  239  29.367  -30.901  -999  -999  -999  -999  13.131  -999
#16  100  2007  239  29.374  -30.920  -999  -999  -999  -999  13.135  -999
#17  100  2007  239  29.374  -30.920  -999  -999  -999  -999  13.135  -999
#18  100  2007  239  29.374  -30.920  -999  -999  -999  -999  13.135  -999
#19  100  2007  239  29.374  -30.920  -999  -999  -999  -999  13.135  -999
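
One more detail worth checking: the output in the question has a header row made of data values (for example -999.0.1), which suggests the first data row was read as column names. The code above avoids this by passing header=None. The sketch below illustrates the difference on a shortened, made-up version of the rows (only five columns are kept for brevity).

```python
import io
import pandas as pd

# Shortened stand-in for the original rows (first five columns only).
raw = u"""100,2007,239,4,29.588
100,2007,239,5,29.573
100,2007,239,14,29.389"""

# Default header handling: the first data row is consumed as column names.
with_header = pd.read_csv(io.StringIO(raw), index_col=[3])
# header=None: every row is kept as data and columns are numbered 0, 1, 2, ...
no_header = pd.read_csv(io.StringIO(raw), header=None, index_col=[3])

print(len(with_header), len(no_header))
print(list(no_header.index))
```

With the default settings the row for minute 4 disappears into the header, so it can never be matched by reindex.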