Reading repeating blocks of data with pandas and Python

I have a file with the following data:

 2008 1 1  ATMOS CO2 = 382. ppm 
                SOIL LAYER NO 
         1   1   2   3   4   TOT 
     DEPTH(m)  0.01  0.10  0.33  0.64  0.81 
BD 33kpa(t/m3)  1.48  1.48  1.48  1.50  1.53 
     SAND(%)  82.2  82.2  82.2  66.9  67.4 
     SILT(%)   5.3   5.3   9.8  23.1  19.6 
     CLAY(%)  12.5  12.5   8.0  10.0  13.0 
    WHSC(kg/ha)  525.  4729.  4480.  6119.  1114.  16968. 
    WHPC(kg/ha)  1123.  10104.  9572.  13076.  2381.  36256. 
    WOC(kg/ha)  1717.  15455.  14638.  19995.  3641.   55. 



     2008 12 31  ATMOS CO2 = 382. ppm 
                SOIL LAYER NO 
         1   1   2   3   4   TOT 
     DEPTH(m)  0.01  0.10  0.33  0.64  0.81 
BD 33kpa(t/m3)  1.48  1.48  1.48  1.50  1.53 
     SAND(%)  81.4  81.4  81.4  67.7  67.4 
     SILT(%)   6.5   6.5  10.3  22.3  19.6 
     CLAY(%)  12.1  12.1   8.2  10.0  13.0 
    WHSC(kg/ha)  499.  4559.  4291.  6017.  1117.  16483. 
    WHPC(kg/ha)  1123.  10109.  9576.  13081.  2382.  36271. 
    WOC(kg/ha)  1633.  14757.  13993.  19316.  3601.   53.

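Each block starts with a date: for example, 2008 1 1 means January 1, 2008, and 2008 12 31 means December 31, 2008.

Each block contains values for several parameters, e.g. DEPTH, SAND(%), WOC, and so on. I want to extract the WOC value for a user-specified year, month and day, e.g. 2008 12 31, and for a specific column, e.g. TOT. I can read the file into a DataFrame with the call below, but I'm not sure what to do after that. The best approach is to process the file line by line and then pass a StringIO buffer to read_csv.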

df = pandas.read_csv('data.txt')

Source

2015-11-19 user308827

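I don't think you'll be able to read all of this in a single step with pandas. You'll probably need to open the file and go through it line by line. –

Do you need year, month and day as 3 columns, or as a single datetime? – jezrael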
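The two layouts mentioned in this comment are easy to convert between. As a minimal sketch (the `dates` frame and its column names are assumptions made for illustration, not part of the original thread), `pd.to_datetime` can assemble a single datetime column from separate year, month and day columns:

```python
import pandas as pd

# The two block dates from the question, stored as three integer columns.
# The frame and its column names are illustrative assumptions.
dates = pd.DataFrame({'year': [2008, 2008], 'month': [1, 12], 'day': [1, 31]})

# pd.to_datetime assembles datetimes from columns named year, month, day.
dates['date'] = pd.to_datetime(dates[['year', 'month', 'day']])
print(dates)
```

Keeping the three integer columns alongside the combined column costs little and lets either form be used for lookups.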


import pandas as pd
import numpy as np
from io import StringIO  # Python 3; the Python 2 version was: from StringIO import StringIO

pathToFile = 'test/file.txt'
s = StringIO()
cur_atm = np.nan

with open(pathToFile) as f:
    for ln in f:
        # collapse runs of whitespace into a single ';' separator
        ln = ';'.join(ln.split())
        if 'ppm' in ln:
            # block header row, e.g. 2008;1;1;ATMOS;CO2;=;382.;ppm
            cur_atm = ln.split(';')
            # items of list cur_atm
            print(cur_atm)
            # take the second item from the end of the list: the CO2 value
            cur_atm = cur_atm[-2]
            continue
        # skip any other row that starts with a year
        # (header rows are already caught by the 'ppm' check above)
        if ln.startswith(('20', '19')):
            continue
        # skip rows that start with SOIL or 1;1, and empty rows
        if ln == '' or ln.startswith(('SOIL', '1;1')):
            continue
        # skip the BD row: its label splits into two fields
        # ('BD' and '33kpa(t/m3)'), which would shift the columns
        if ln.startswith('BD;'):
            continue

        # write the row to StringIO s, prefixed with the current CO2 value
        s.write(str(cur_atm) + ';' + ln + '\n')
s.seek(0)

# create new dataframe with desired column names
df = pd.read_csv(s, sep=";", index_col=[1], names=['ATM','','1','2','3','4','5', 'TOT'])

print(df)
#             ATM        1        2         3         4        5    TOT
#
#DEPTH(m)     382     0.01      0.1      0.33      0.64     0.81    NaN
#SAND(%)      382    82.20     82.2     82.20     66.90    67.40    NaN
#SILT(%)      382     5.30      5.3      9.80     23.10    19.60    NaN
#CLAY(%)      382    12.50     12.5      8.00     10.00    13.00    NaN
#WHSC(kg/ha)  382   525.00   4729.0   4480.00   6119.00  1114.00  16968
#WHPC(kg/ha)  382  1123.00  10104.0   9572.00  13076.00  2381.00  36256
#WOC(kg/ha)   382  1717.00  15455.0  14638.00  19995.00  3641.00     55
#DEPTH(m)     382     0.01      0.1      0.33      0.64     0.81    NaN
#SAND(%)      382    81.40     81.4     81.40     67.70    67.40    NaN
#SILT(%)      382     6.50      6.5     10.30     22.30    19.60    NaN
#CLAY(%)      382    12.10     12.1      8.20     10.00    13.00    NaN
#WHSC(kg/ha)  382   499.00   4559.0   4291.00   6017.00  1117.00  16483
#WHPC(kg/ha)  382  1123.00  10109.0   9576.00  13081.00  2382.00  36271
#WOC(kg/ha)   382  1633.00  14757.0  13993.00  19316.00  3601.00     53

# keep only the WOC rows and the ATM and TOT columns
df = df.loc['WOC(kg/ha)', ['ATM', 'TOT']]
print(df)
#             ATM  TOT
#
#WOC(kg/ha)   382   55
#WOC(kg/ha)   382   53

Source

2015-11-19 18:49:34 jezrael
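The answer above tags each row with the CO2 value but not with the block date, so the two WOC rows cannot be told apart by date. The following is a minimal sketch of one way to carry the date along, not the answerer's method: the names `read_blocks`, `is_number`, `HEADER`, `COLUMNS` and `SAMPLE` are hypothetical, and the sketch assumes every block header contains the text `ATMOS CO2` and that no data row has more than six values.

```python
import re
from io import StringIO

import numpy as np
import pandas as pd

# Shortened copy of the two blocks from the question (illustration only).
SAMPLE = """\
 2008 1 1  ATMOS CO2 = 382. ppm
                SOIL LAYER NO
         1   1   2   3   4   TOT
     DEPTH(m)  0.01  0.10  0.33  0.64  0.81
BD 33kpa(t/m3)  1.48  1.48  1.48  1.50  1.53
    WOC(kg/ha)  1717.  15455.  14638.  19995.  3641.   55.

     2008 12 31  ATMOS CO2 = 382. ppm
                SOIL LAYER NO
         1   1   2   3   4   TOT
     DEPTH(m)  0.01  0.10  0.33  0.64  0.81
BD 33kpa(t/m3)  1.48  1.48  1.48  1.50  1.53
    WOC(kg/ha)  1633.  14757.  13993.  19316.  3601.   53.
"""

# Assumption: a header is "<year> <month> <day> ATMOS CO2 ...".
HEADER = re.compile(r'\s*(\d{4})\s+(\d{1,2})\s+(\d{1,2})\s+ATMOS CO2')
COLUMNS = ['1', '2', '3', '4', '5', 'TOT']  # same column names as the answer


def is_number(token):
    """True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def read_blocks(handle):
    """Return a DataFrame indexed by (date, parameter), one row per data line."""
    index, data, date = [], [], None
    for line in handle:
        header = HEADER.match(line)
        if header:
            date = pd.Timestamp(*(int(g) for g in header.groups()))
            continue
        # Split the row into a text label and its trailing numeric values.
        tokens, values = line.split(), []
        while tokens and is_number(tokens[-1]):
            values.insert(0, float(tokens.pop()))
        if date is None or not tokens or not values:
            continue  # blank lines, 'SOIL LAYER NO', the layer-number row
        # Rows without a TOT value are padded with NaN.
        values += [np.nan] * (len(COLUMNS) - len(values))
        index.append((date, ' '.join(tokens)))
        data.append(values)
    names = ['date', 'parameter']
    return pd.DataFrame(data, columns=COLUMNS,
                        index=pd.MultiIndex.from_tuples(index, names=names))


df = read_blocks(StringIO(SAMPLE))
# WOC total for the block dated 2008 12 31
woc_tot = df.loc[(pd.Timestamp('2008-12-31'), 'WOC(kg/ha)'), 'TOT']
print(woc_tot)
```

For a real file, `read_blocks(open(path))` would take the place of the `StringIO` wrapper. Because the label is everything before the trailing numbers, the two-word `BD 33kpa(t/m3)` row is kept rather than skipped.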

Thanks @jezrael, one question: the years might start in the 1990s, so your `ln.startswith('20')` line would fail. Is there a solution that checks for 'ATMOS CO2' instead? – user308827

I've edited my answer. – jezrael
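One way to make header detection independent of the century is to match the whole header row with a regular expression instead of testing how the line starts. This is only a sketch under the assumption that every header looks like the two in the question; `HEADER` and `parse_header` are hypothetical names, and the 1995 line is invented to illustrate the 1990s case raised in the comment.

```python
import re

# Four-digit year, month, day, then the literal 'ATMOS CO2 = <value> ppm'.
HEADER = re.compile(
    r'^\s*(?P<year>\d{4})\s+(?P<month>\d{1,2})\s+(?P<day>\d{1,2})'
    r'\s+ATMOS CO2\s*=\s*(?P<co2>[\d.]+)\s*ppm'
)


def parse_header(line):
    """Return (year, month, day, co2) for a block header row, else None."""
    m = HEADER.match(line)
    if m is None:
        return None
    return int(m['year']), int(m['month']), int(m['day']), float(m['co2'])


# A made-up header from the 1990s, and a data row that must not match.
print(parse_header(' 1995 7 4  ATMOS CO2 = 360. ppm'))
print(parse_header('     SAND(%)  82.2  82.2  82.2  66.9  67.4'))
```

A data row returns `None`, so the same function can both detect a new block and extract its date and CO2 value.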
