Reading specific date lines from files with pandas (Python)

I am trying to read in many files. Each file holds one day of data at 10-minute intervals. The data in each file is "chunked" like this:

2015-11-08 00:10:00 00:10:00 
# z speed dir  W sigW  bck error 
30 3.32 111.9 0.15 0.12 1.50E+05  0 
40 3.85 108.2 0.07 0.14 7.75E+04  0 
50 4.20 107.9 0.06 0.15 4.73E+04  0 
60 4.16 108.5 0.03 0.19 2.73E+04  0 
70 4.06 93.6 0.03 0.23 9.07E+04  0 
80 4.06 93.8 0.07 0.28 1.36E+05  0 

2015-11-08 00:20:00 00:10:00 
# z speed dir  W sigW  bck error 
30 3.79 120.9 0.15 0.11 7.79E+05  0 
40 4.36 115.6 0.04 0.13 2.42E+05  0 
50 4.71 113.6 0.07 0.14 6.84E+04  0 
60 5.00 113.3 0.13 0.17 1.16E+04  0 
70 4.29 94.2 0.22 0.20 1.38E+05  0 
80 4.54 94.1 0.11 0.25 1.76E+05  0 

2015-11-08 00:30:00 00:10:00 
# z speed dir  W sigW  bck error 
30 3.86 113.6 0.13 0.10 2.68E+05  0 
40 4.34 116.1 0.09 0.11 1.41E+05  0 
50 5.02 112.8 0.04 0.12 7.28E+04  0 
60 5.36 110.5 0.01 0.14 5.81E+04  0 
70 4.67 95.4 0.14 0.16 7.69E+04  0 
80 4.56 95.0 0.15 0.21 9.84E+04  0 

...

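The file continues like this, every 10 minutes for the whole day. This file is named 151108.mnd. I want my code to read all of the November files, so 1511??.mnd, and to read every date-time line in each day's file for the whole month. For the partial example above, I want the code to grab 2015-11-08 00:10:00, 2015-11-08 00:20:00, 2015-11-08 00:30:00, etc. and store them in a variable, then move on to the next day's file (151109.mnd), grab all of its date-time lines, and append them to the dates already stored. And so on for the whole month. Here is my code so far: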

import pandas as pd 
import glob 
import datetime 

filename = glob.glob('1511??.mnd') 
data_nov15_hereford = pd.DataFrame() 
frames = [] 
dates = [] 
counter = 1 
for i in filename: 
    f_nov15_hereford = pd.read_csv(i, skiprows = 32) 
    for line in f_nov15_hereford: 
     if line.startswith("20"): 
      print line 
      date_object = datetime.datetime.strptime(line[:-6], '%Y-%m-%d %H:%M:%S %f') 
      dates.append(date_object) 
      counter = 0 
     else: 
      counter += 1 
    frames.append(f_nov15_hereford) 
data_nov15_hereford = pd.concat(frames,ignore_index=True) 
data_nov15_hereford = data_nov15_hereford.convert_objects(convert_numeric=True) 


print dates

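This code has some problems. When I print dates, it prints two copies of every date, and it only contains the first date of each file: 2015-11-08 00:10:00, 2015-11-09 00:10:00, and so on. It does not go through each file line by line and move on to the next file once all the dates in the current file are stored, which is what I want. Instead, it just grabs the first date in each file. Any help with this code? Is there an easier way to do what I want? Thanks!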

Source

2016-03-01 HM14

A few comments:

First: why you only get the first date in each file:

f_nov15_hereford = pd.read_csv(i, skiprows = 32) 
for line in f_nov15_hereford: 
    if line.startswith("20"):

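The first line reads the file into a pandas DataFrame. The second line iterates over the DataFrame's columns, not its rows. The last line therefore checks whether a column name starts with "20", and that happens only once per file.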
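To see this behaviour in isolation, here is a minimal sketch. The in-memory sample is a made-up stand-in for a file after the skipped rows, not the real .mnd content; `read_csv` takes the date line as the header, so it becomes the only column label.

```python
import io
import pandas as pd

# Hypothetical sample standing in for a file after the skipped rows.
sample = io.StringIO(
    "2015-11-08 00:10:00 00:10:00\n"
    "30 3.32 111.9 0.15 0.12 1.50E+05 0\n"
    "40 3.85 108.2 0.07 0.14 7.75E+04 0\n"
)

# With the default comma separator, the first line becomes the single
# column label and each remaining line becomes a one-field row.
df = pd.read_csv(sample)

# Iterating a DataFrame yields its column labels, not its rows.
labels = [label for label in df]
print(labels)
```

The loop body runs once, for the one column label, which is why only the first date of each file was collected.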

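Second: counter is initialized and its value is changed, but it is never used. I assume it was meant for skipping lines in the file.

Third: it may be simpler to collect all the dates in a Python list and convert that to a pandas DataFrame when needed.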

import pandas as pd
import glob
import datetime as dt

# number of lines to skip before the first date
offset = 32

# number of lines from one date to the next
recordlength = 9

pattern = '1511??.mnd'

dates = []

for filename in glob.iglob(pattern):

    with open(filename) as datafile:

        # start negative so that count reaches 0 on the first date line
        count = -offset
        for line in datafile:
            if count == 0:
                # the first line of each record holds the date
                fmt = '%Y-%m-%d %H:%M:%S %f'
                date_object = dt.datetime.strptime(line[:-6], fmt)
                dates.append(date_object)

            count += 1

            # wrap around at the start of the next record
            if count == recordlength:
                count = 0

data_nov15_hereford = pd.DataFrame(dates, columns=['Dates'])

print(dates)
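If the header length or the record length ever varies between files, counting lines becomes fragile. A possible alternative, sketched here under the assumption that every date line begins with a 19-character 'YYYY-MM-DD HH:MM:SS' timestamp and that no other line does, is to try parsing the start of each line. The helper name extract_dates and the sample lines are illustrative only.

```python
import datetime as dt

def extract_dates(lines):
    """Return a datetime for every line that begins with a timestamp."""
    fmt = '%Y-%m-%d %H:%M:%S'
    dates = []
    for line in lines:
        try:
            # The first 19 characters hold 'YYYY-MM-DD HH:MM:SS'.
            dates.append(dt.datetime.strptime(line[:19], fmt))
        except ValueError:
            # Header, data, and blank lines do not parse as a timestamp.
            continue
    return dates

# Hypothetical lines in the layout shown in the question.
sample = [
    "2015-11-08 00:10:00 00:10:00\n",
    "# z speed dir  W sigW  bck error\n",
    "30 3.32 111.9 0.15 0.12 1.50E+05  0\n",
    "\n",
    "2015-11-08 00:20:00 00:10:00\n",
]
dates = extract_dates(sample)
print(dates)
```

An open file object can be passed in place of the list, since it also yields lines. glob returns matches in arbitrary order, so sort the file names first if the dates need to be chronological.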

Source

2016-03-02 08:19:04 RootTwo

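This seems to work well! My only complaint is that when I print dates it still gives me 2 sets. Or if I print np.shape(dates) I get two shapes: (2046L,) (2046L,) – HM14

Never mind, I think this is a problem with my notebook, not the code! Thanks so much! – HM14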

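Consider modifying the csv data line by line before reading it into a DataFrame. The script below opens each original file in the glob list and writes a temp file in which the date is moved to the last column and the repeated headers and blank lines are removed.

CSV data (assuming the text view of the csv file looks like the following; if the actual file differs, adjust the Python code)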

2015-11-0800:10:0000:10:00,,,,,, 
z,speed,dir,W,sigW,bck,error 
30,3.32,111.9,0.15,0.12,1.50E+05,0 
40,3.85,108.2,0.07,0.14,7.75E+04,0 
50,4.2,107.9,0.06,0.15,4.73E+04,0 
60,4.16,108.5,0.03,0.19,2.73E+04,0 
70,4.06,93.6,0.03,0.23,9.07E+04,0 
80,4.06,93.8,0.07,0.28,1.36E+05,0 
,,,,,, 
2015-11-0800:10:0000:20:00,,,,,, 
z,speed,dir,W,sigW,bck,error 
30,3.79,120.9,0.15,0.11,7.79E+05,0 
40,4.36,115.6,0.04,0.13,2.42E+05,0 
50,4.71,113.6,0.07,0.14,6.84E+04,0 
60,5,113.3,0.13,0.17,1.16E+04,0 
70,4.29,94.2,0.22,0.2,1.38E+05,0 
80,4.54,94.1,0.11,0.25,1.76E+05,0 
,,,,,, 
2015-11-0800:10:0000:30:00,,,,,, 
z,speed,dir,W,sigW,bck,error 
30,3.86,113.6,0.13,0.1,2.68E+05,0 
40,4.34,116.1,0.09,0.11,1.41E+05,0 
50,5.02,112.8,0.04,0.12,7.28E+04,0 
60,5.36,110.5,0.01,0.14,5.81E+04,0 
70,4.67,95.4,0.14,0.16,7.69E+04,0 
80,4.56,95,0.15,0.21,9.84E+04,0

Python script

import glob, os
import pandas as pd

filenames = glob.glob('1511??.mnd')
temp = 'temp.csv'

# COLLECT ONE DATAFRAME PER FILE
# (DataFrame.append WAS REMOVED IN PANDAS 2.0, SO CONCAT AT THE END)
frames = []

# ITERATE THROUGH EACH FILE IN GLOB LIST
for file in filenames:
    # DELETE PRIOR TEMP VERSION
    if os.path.exists(temp):
        os.remove(temp)

    header = 0
    # READ IN ORIGINAL CSV AND WRITE CLEANED LINES TO TEMP CSV
    with open(file, 'r') as txt1, open(temp, 'a') as txt2:
        for rline in txt1:
            # SAVE DATE VALUE THEN SKIP ROW
            if "2015-11" in rline:
                date = rline.replace(',', '')
                continue

            # SKIP BLANK LINES (CHANGE IF NO COMMAS)
            if rline == ',,,,,,\n':
                continue

            # ADD NEW 'DATE' COLUMN AND SKIP OTHER HEADER LINES
            if 'z,speed,dir,W,sigW,bck,error' in rline:
                if header == 1:
                    continue
                txt2.write(rline.replace('\n', ',date\n'))
                continue
            header = 1

            # APPEND LINE TO TEMP CSV WITH DATE VALUE
            # (date STILL ENDS WITH '\n', WHICH TERMINATES THE ROW)
            txt2.write(rline.replace('\n', ',' + date))

    # READ TEMP FILE INTO A DATAFRAME
    frames.append(pd.read_csv(temp))

# COMBINE ALL FILES (pd.concat RAISES ValueError IF NO FILES MATCHED)
data_nov15_hereford = pd.concat(frames)
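The same reshaping can also be done in memory, without a temp file. The following is only a sketch under the same assumption about the comma-separated layout shown above; the helper name rows_with_dates and the sample lines are illustrative, and all values are left as strings.

```python
import csv

def rows_with_dates(lines):
    """Attach the preceding date line to every data row (no temp file)."""
    columns = ['z', 'speed', 'dir', 'W', 'sigW', 'bck', 'error']
    date = None
    rows = []
    for fields in csv.reader(lines):
        if not any(fields):
            continue  # blank separator line such as ',,,,,,'
        if fields[0].startswith('2015-11'):
            date = fields[0]  # remember the date for the rows that follow
        elif fields == columns:
            continue  # repeated header line
        else:
            rows.append(dict(zip(columns, fields), date=date))
    return rows

# Hypothetical lines in the assumed csv layout.
sample = [
    "2015-11-0800:10:0000:10:00,,,,,,\n",
    "z,speed,dir,W,sigW,bck,error\n",
    "30,3.32,111.9,0.15,0.12,1.50E+05,0\n",
    ",,,,,,\n",
    "2015-11-0800:10:0000:20:00,,,,,,\n",
    "z,speed,dir,W,sigW,bck,error\n",
    "30,3.79,120.9,0.15,0.11,7.79E+05,0\n",
]
rows = rows_with_dates(sample)
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame(rows); the numeric columns would then still need converting, for example with pd.to_numeric.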

Output

     z  speed    dir     W  sigW     bck  error  date
0   30   3.32  111.9  0.15  0.12  150000      0  2015-11-0800:10:0000:10:00
1   40   3.85  108.2  0.07  0.14   77500      0  2015-11-0800:10:0000:10:00
2   50   4.20  107.9  0.06  0.15   47300      0  2015-11-0800:10:0000:10:00
3   60   4.16  108.5  0.03  0.19   27300      0  2015-11-0800:10:0000:10:00
4   70   4.06   93.6  0.03  0.23   90700      0  2015-11-0800:10:0000:10:00
5   80   4.06   93.8  0.07  0.28  136000      0  2015-11-0800:10:0000:10:00
6   30   3.79  120.9  0.15  0.11  779000      0  2015-11-0800:10:0000:20:00
7   40   4.36  115.6  0.04  0.13  242000      0  2015-11-0800:10:0000:20:00
8   50   4.71  113.6  0.07  0.14   68400      0  2015-11-0800:10:0000:20:00
9   60   5.00  113.3  0.13  0.17   11600      0  2015-11-0800:10:0000:20:00
10  70   4.29   94.2  0.22  0.20  138000      0  2015-11-0800:10:0000:20:00
11  80   4.54   94.1  0.11  0.25  176000      0  2015-11-0800:10:0000:20:00
12  30   3.86  113.6  0.13  0.10  268000      0  2015-11-0800:10:0000:30:00
13  40   4.34  116.1  0.09  0.11  141000      0  2015-11-0800:10:0000:30:00
14  50   5.02  112.8  0.04  0.12   72800      0  2015-11-0800:10:0000:30:00
15  60   5.36  110.5  0.01  0.14   58100      0  2015-11-0800:10:0000:30:00
16  70   4.67   95.4  0.14  0.16   76900      0  2015-11-0800:10:0000:30:00
17  80   4.56   95.0  0.15  0.21   98400      0  2015-11-0800:10:0000:30:00

Source

2016-03-01 22:38:36 Parfait

This is very useful! Thank you! – HM14
