2017-04-18

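Python 3: using csv files inside a tar file

I am trying to work with csv files contained in a tar.gz file, and I am having trouble passing the correct data/object to the csv module.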

Say I have a tar.gz file containing a number of csv files formatted as follows.

1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30 
1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26 
1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31 
1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38 

I want to be able to access each csv file in memory, without extracting each file from the tar file and writing it to disk. For example:

import tarfile 
import csv 

tar = tarfile.open("tar-file.tar.gz") 

for member in tar.getmembers(): 
    f = tar.extractfile(member).read() 
    content = csv.reader(f) 
    for row in content:
        print(row)
tar.close() 

This produces the following error.

for row in content: 
_csv.Error: iterator should return strings, not int (did you open the file in text mode?) 

I also tried parsing f as a string, as described in the csv module documentation.

content = csv.reader([f]) 

The above produces the same error.

I tried decoding the file contents f as ASCII.

f = tar.extractfile(member).read().decode('ascii') 

But this iterates over every character of the csv data, instead of iterating over rows that each contain a list of elements.

['1'] 
['0'] 
['7'] 
['9'] 
['', ''] 
['S'] 
['A'] 
['M'] 
['P'] 
['L'] 
['E'] 
['_'] 
['A'] 
['', ''] 
['G'] 
['R'] 

snip...

['2'] 
['0'] 
['1'] 
['7'] 
['/'] 
['0'] 
['2'] 
['/'] 
['1'] 
['5'] 
[' '] 
['2'] 
['2'] 
[':'] 
['5'] 
['7'] 
[':'] 
['3'] 
['8'] 
[] 
[] 

Trying both to decode f as ASCII and to read it as a string

f = tar.extractfile(member).read().decode('ascii') 
content = csv.reader([f]) 

produces the following output

for row in content: 
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? 
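
For context, the three results above are consistent with how csv.reader consumes its argument: it iterates over whatever it is given and treats each item as one line. The sketch below is only an illustration, using two sample lines copied from the data above; the variable names are made up for this example. It shows what each kind of object yields when iterated, and that splitting the decoded text into lines gives the reader one string per row.

```python
import csv

# Two sample lines copied from the data above (illustrative only).
raw = (b"1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n"
       b"1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n")

# Iterating over bytes yields integers, which csv.reader rejects.
first_item_of_bytes = next(iter(raw))   # 49, the byte value of "1"

text = raw.decode("ascii")

# Iterating over a str yields single characters, so every character
# is treated as its own line.
first_item_of_str = next(iter(text))    # "1"

# splitlines() yields one string per line, which csv.reader can parse.
rows = list(csv.reader(text.splitlines()))
for row in rows:
    print(row)
```

Wrapping the whole text in a one-element list, as in csv.reader([f]), hands the reader a single "line" that still contains newline characters, which matches the last error shown. Splitting on lines is a simplification: data with quoted fields that span several lines is probably better served by a file-like object.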

To demonstrate the different results, I used the following code.

import tarfile 
import csv 

tar = tarfile.open("tar-file.tar.gz") 

for member in tar.getmembers(): 
    f = tar.extractfile(member).read() 
    print(member.name) 
    print('Raw :', type(f)) 
    print(f) 
    print() 
    f = f.decode('ascii') 
    print('ASCII:', type(f)) 
    print(f) 
tar.close() 

This produces the following output. (Each csv contains the same data in this example.)

./raw_data/csv-file1.csv 
Raw : <class 'bytes'> 
b'1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31\n1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38\n\n' 

ASCII: <class 'str'> 
1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30 
1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26 
1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31 
1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38 


./raw_data/csv-file2.csv 
Raw : <class 'bytes'> 
b'1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31\n1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38\n\n' 

ASCII: <class 'str'> 
1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30 
1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26 
1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31 
1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38 


./raw_data/csv-file3.csv 
Raw : <class 'bytes'> 
b'1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31\n1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38\n\n' 

ASCII: <class 'str'> 
1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30 
1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26 
1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31 
1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38 

How can I get the csv module to correctly read the in-memory files provided by the tar module? Thanks.

Answers

You just need to use io.StringIO() to produce a file-like object for the csv library to use. For example:

import tarfile
import csv
import io

# tarfile.open() detects the compression automatically
with tarfile.open('tar-file.tar.gz') as tar:
    for member in tar:
        if member.isreg():  # Is it a regular file?
            print("{} - {} bytes".format(member.name, member.size))
            # Decode the bytes and wrap the text in a file-like object for csv
            csv_file = io.StringIO(tar.extractfile(member).read().decode('ascii'))

            for row in csv.reader(csv_file):
                print(row)
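
As a possible variation, not part of the answer above, the binary file object returned by tar.extractfile() can be wrapped in io.TextIOWrapper, so the text is decoded as it is read instead of being loaded with read() first. This is a minimal sketch: the helper name iter_csv_rows and its encoding parameter are made up for illustration, and the ASCII default is taken from the question.

```python
import csv
import io
import tarfile


def iter_csv_rows(tar_path, encoding="ascii"):
    """Yield (member_name, row) for every regular file in the archive.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isreg():  # skip directories, links, etc.
                continue
            binary_file = tar.extractfile(member)
            # newline="" leaves line-ending handling to the csv module.
            with io.TextIOWrapper(binary_file, encoding=encoding,
                                  newline="") as text_file:
                for row in csv.reader(text_file):
                    yield member.name, row
```

It could be used as `for name, row in iter_csv_rows("tar-file.tar.gz"): print(name, row)`. Whether this is preferable to io.StringIO probably depends on file size; for small files the two should behave the same.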

Thanks Martin, this does the trick nicely. – Pobbel