Python - Handling nth-line skips with readlines()

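I'm fixing a broken lib from GitHub that I want to use.

I've already "fixed" the problem locally, but I don't think it's a very clean approach...

I'm poking at the Internet Archive's WARC library, specifically the arc.py part (https://github.com/internetarchive/warc/blob/master/warc/arc.py).

Since the lib was written, the tools that produce ARC files have changed somewhat, so the built-in parser fails because it doesn't expect to see some metadata in the file.

My local fix is as follows: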

if header.startswith("<arcmetadata"):
    while not header.endswith("</arcmetadata>\n"):
        header = self.fileobj.readline()
    header = self.fileobj.readline()
    header = self.fileobj.readline()

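I'm not sure that calling readline() twice to strip off the next two blank lines (each containing just "\n") is the cleanest way to advance through the file object.

Is this good Python, or is there a better way?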

Source

2013-11-25 Jay Gattuso

The code looks like a copy/paste error. There is nothing wrong with using .readline(); just document what you're doing:

# skip metadata
if header.startswith("<arcmetadata"):
    while not header.endswith("</arcmetadata>\n"):
        header = self.fileobj.readline()
    # NOTE: header ends with `"</arc..."` here, i.e., it is not blank

# skip blank lines
while not header.strip():
    header = self.fileobj.readline()
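For experimenting outside the library, the same idea can be sketched as a standalone function. This is only a sketch under assumptions: `skip_arc_metadata` is a hypothetical helper (not part of the warc library), the sample text is made up, and two details differ from the snippet above. It reads one more line after the closing tag so that the blank-line loop has something to skip, and it stops at end of file, where `readline()` returns an empty string.

```python
import io


def skip_arc_metadata(fileobj, header):
    """Return the first non-blank line after an optional <arcmetadata> block.

    Hypothetical helper for illustration; not part of the warc library.
    """
    if header.startswith("<arcmetadata"):
        # Read until the line that closes the metadata block (or EOF).
        while header and not header.endswith("</arcmetadata>\n"):
            header = fileobj.readline()
        # header is now the closing line, so move on to the line after it.
        header = fileobj.readline()
    # Skip blank lines; readline() returns "" at EOF, which ends the loop.
    while header and not header.strip():
        header = fileobj.readline()
    return header


# Made-up sample input; real ARC files will look different.
sample = io.StringIO(
    "<arcmetadata>\n"
    "<software>hypothetical-crawler</software>\n"
    "</arcmetadata>\n"
    "\n"
    "\n"
    "http://example.com/ 127.0.0.1 20131125000000 text/html 10\n"
)
record_header = skip_arc_metadata(sample, sample.readline())
```

Because the blank-line loop is not tied to a count of two, the function keeps working if the writer emits one blank line, or three.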

By the way, if the file contains XML, then use an XML parser to parse it. Don't do it by hand.
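As a rough illustration of the XML-parser suggestion, the metadata block could be collected line by line and handed to the standard library's `xml.etree.ElementTree`. The helper name `read_arc_metadata` and the `<software>` element are assumptions made for this sketch; real ARC metadata may carry an XML declaration, namespaces, or other elements that would need extra handling.

```python
import io
import xml.etree.ElementTree as ET


def read_arc_metadata(fileobj, header):
    """Collect the <arcmetadata> block and parse it as XML.

    Hypothetical helper; assumes `header` is the opening line of the block.
    """
    lines = [header]
    # Keep reading until the closing tag (or EOF, where readline() gives "").
    while header and not header.endswith("</arcmetadata>\n"):
        header = fileobj.readline()
        lines.append(header)
    # Raises xml.etree.ElementTree.ParseError if the block is incomplete.
    return ET.fromstring("".join(lines))


# Made-up sample; the element name is an assumption for illustration.
sample = io.StringIO(
    "<arcmetadata>\n"
    "<software>hypothetical-crawler</software>\n"
    "</arcmetadata>\n"
    "\n"
)
root = read_arc_metadata(sample, sample.readline())
software = root.findtext("software")
```

After the call, the file object is positioned just past the closing tag, so skipping the blank lines that follow remains a separate step.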

Source

2013-11-26 00:39:21 jfs

While there's nothing inherently wrong with what you're doing, it might be more semantic to write:

next(self.fileobj, None)

There is no variable assignment, which signals that you are throwing the next line away.
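A small sketch of how `next()` with a default behaves, using an in-memory `io.StringIO` on Python 3 as a stand-in for the real file object (an assumption; on Python 2 file objects, mixing `next()` with `readline()` can misbehave because iteration uses a read-ahead buffer):

```python
import io

# Stand-in for the real file object; the contents are made up.
fileobj = io.StringIO("first\nsecond\n")

next(fileobj, None)                # discard "first\n" without binding it
kept = next(fileobj, None)         # the following line is still available
at_eof = next(fileobj, None)       # default is returned instead of StopIteration
via_readline = fileobj.readline()  # readline() signals EOF with an empty string
```

The second argument matters: without it, `next()` raises `StopIteration` when the file is exhausted.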

Source

2013-11-25 22:36:54

[Don't mix file methods such as `.readline()` (used in `arc.py`) with accessing the file as an iterator (`next()`)](http://stackoverflow.com/q/4762262/4279). – jfs

Ahhhh... ugh. –

itertools could be of use here:

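from itertools import islice, dropwhile
if header.startswith("<arcmetadata"):
    # drop lines until the one that closes the metadata block
    fileobj = dropwhile(lambda x: not x.endswith("</arcmetadata>\n"), fileobj)
    # skip the next two lines (the closing line and the one after it)
    fileobj = islice(fileobj, 2, None)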
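To see which lines the `dropwhile`/`islice` combination leaves behind, here is a self-contained run on made-up input (the sample text and the name `lines` are assumptions for illustration):

```python
import io
from itertools import dropwhile, islice

# Made-up sample: a metadata block, two blank lines, then a record line.
fileobj = io.StringIO(
    "<arcmetadata>\n"
    "<software>hypothetical-crawler</software>\n"
    "</arcmetadata>\n"
    "\n"
    "\n"
    "record line\n"
)
header = fileobj.readline()
if header.startswith("<arcmetadata"):
    # dropwhile stops dropping at the closing line and yields it first.
    lines = dropwhile(lambda x: not x.endswith("</arcmetadata>\n"), fileobj)
    # Skipping two items therefore consumes the closing line and one blank line.
    lines = islice(lines, 2, None)
remaining = list(lines)
```

On this input one blank line is still left in `remaining`, because `dropwhile` yields the closing tag itself as its first item. If both blank lines should be skipped, the `islice` start would likely need to be 3.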

Source

2013-11-26 00:19:02 iruvar
