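Handling single-file extraction from a corrupted GZ (TAR)
This is my first post on Stack Overflow. I have a question about extracting a single file from a TAR archive that uses GZ compression. I'm not the best at Python, so I may be going about this incorrectly; any help would be greatly appreciated.
Scenario:
A corrupted *.tar.gz file comes in. The first file in the GZ contains important information for obtaining the system's serial number (SN). This can be used to identify the machine so that we can notify its administrator that the file was corrupted.
Problem:
Using the regular UNIX tar binary, I can extract just the README file from the archive even though the archive is incomplete; a full extraction would return an error. In Python, however, I cannot extract just one file: even when I specify only that single file, it raises an exception.
Current workaround:
I use `os.popen` to call the UNIX tar binary to get just the README file.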
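For illustration, the workaround might look roughly like the sketch below. It assumes Python 3.5+ and swaps `subprocess.run` in for the `os.popen` call mentioned above; the function name `read_member_with_tar` is hypothetical, and the sketch relies on a `tar` binary on the PATH that supports `-O` (write the member to stdout).

```python
import subprocess


def read_member_with_tar(archive_path, member_name):
    """Return the bytes tar wrote to stdout for member_name, or None if it wrote nothing."""
    result = subprocess.run(
        # -x extract, -z gunzip, -O write the member to stdout, -f archive path
        ["tar", "-xzOf", archive_path, member_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # a damaged archive is expected to end with a nonzero exit status
    )
    # Keep whatever tar managed to extract before it hit the damaged part.
    return result.stdout or None
```

The exit status is deliberately ignored, because tar reports the truncated archive as an error even after README has been written out.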
Desired solution:
Use the Python tarfile package to extract just the single file.
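One way this might be approached, sketched under the assumption of Python 3 and a README that sits near the front of the archive: iterate over the members lazily and stop at the first match, passing the `TarInfo` object to `extractfile()` instead of the name. The name-based call goes through `getmember()`, which loads the complete member list and therefore reads into the damaged tail. The function name `read_first_match` is hypothetical.

```python
import tarfile
import zlib


def read_first_match(archive_path, member_name):
    """Return the bytes of member_name, or None if it cannot be recovered."""
    try:
        with tarfile.open(archive_path, "r:gz") as tar:
            for member in tar:  # lazy: one header is read per iteration
                if member.name == member_name and member.isfile():
                    # Passing the TarInfo object skips the getmember() lookup
                    # that would scan every member of the archive.
                    return tar.extractfile(member).read()
    except (EOFError, OSError, tarfile.TarError, zlib.error):
        # The damaged part was reached before the member could be read.
        return None
    return None
```

Because the loop returns as soon as the member is found, the truncated end of the gzip stream is never read when README comes first.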
Example errors:
UNIX (works):
[[email protected] tmp]# tar -xvzf bundle.tar.gz README
README
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
[[email protected] tmp]#
[[email protected] tmp]# ls
bundle.tar.gz README
Python:
>>> import tarfile
>>> tar = tarfile.open("bundle.tar.gz")
>>> data = tar.extractfile("README").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib64/python2.4/tarfile.py", line 1364, in extractfile
    tarinfo = self.getmember(member)
  File "/usr/lib64/python2.4/tarfile.py", line 1048, in getmember
    tarinfo = self._getmember(name)
  File "/usr/lib64/python2.4/tarfile.py", line 1762, in _getmember
    members = self.getmembers()
  File "/usr/lib64/python2.4/tarfile.py", line 1059, in getmembers
    self._load() # all members, we first have to
  File "/usr/lib64/python2.4/tarfile.py", line 1778, in _load
    tarinfo = self.next()
  File "/usr/lib64/python2.4/tarfile.py", line 1588, in next
    self.fileobj.seek(self.offset)
  File "/usr/lib64/python2.4/gzip.py", line 377, in seek
    self.read(1024)
  File "/usr/lib64/python2.4/gzip.py", line 225, in read
    self._read(readsize)
  File "/usr/lib64/python2.4/gzip.py", line 273, in _read
    self._read_eof()
  File "/usr/lib64/python2.4/gzip.py", line 309, in _read_eof
    raise IOError, "CRC check failed"
IOError: CRC check failed
>>> print data
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'data' is not defined
Python (handling the exception):
>>> tar = tarfile.open("bundle.tar.gz")
>>> try:
...     data = tar.extractfile("README").read()
... except:
...     pass
...
>>> print(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'data' is not defined
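Looking at the tarfile.py code, the extractfile call goes through getmember, which eventually calls getmembers. getmembers scans the entire tar file, and when it hits the EOF/corrupted part, gzip squawks. Try supplying an already-decompressed stream so that the CRC exception is not raised. – kevpie 2010-12-04 04:32:32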
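A minimal sketch of that suggestion, assuming Python 3: decompress whatever prefix of the gzip stream is still readable with `zlib`, then hand the resulting bytes to `tarfile` as a plain, uncompressed tar. The helper names `salvage_gzip_prefix` and `read_member_from_prefix`, and the `chunk_size` and `max_bytes` defaults, are hypothetical choices for illustration.

```python
import io
import tarfile
import zlib


def salvage_gzip_prefix(path, chunk_size=64 * 1024, max_bytes=16 * 1024 * 1024):
    """Decompress as much of a damaged gzip file as can be read, capped at max_bytes."""
    # Adding 16 to the window bits tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
    out = bytearray()
    with open(path, "rb") as f:
        while len(out) < max_bytes:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of the (possibly truncated) file
            try:
                out += decomp.decompress(chunk)
            except zlib.error:
                break  # corrupted data: keep what was decoded from earlier chunks
    return bytes(out[:max_bytes])


def read_member_from_prefix(path, member_name):
    """Return member_name's bytes from the salvaged prefix, or None if incomplete."""
    raw_tar = salvage_gzip_prefix(path)
    try:
        # "r:" means an uncompressed tar, so gzip's CRC check never runs.
        with tarfile.open(fileobj=io.BytesIO(raw_tar), mode="r:") as tar:
            for member in tar:
                if member.name == member_name and member.isfile():
                    data = tar.extractfile(member).read()
                    # A short read means the member itself was cut off.
                    return data if len(data) == member.size else None
    except tarfile.TarError:
        return None
    return None
```

This holds the salvaged prefix in memory, so the `max_bytes` cap matters for large bundles; for a README at the very start of the archive a much smaller cap would probably do.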