For loop to list comprehension

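Hey all, I have some code that reads certain lines from a file, and I was wondering whether it would run faster as a list comprehension or as a generator expression/function. If it would run faster, what would the code look like? Still learning Python. Thanks for your help.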

input = open('C:/.../list.txt', 'r')
output = open('C:/.../output.txt', 'w')

x = 0  # 1-based line counter

for line in input:
    x = x + 1
    if x > 2 and x < 5:  # keep only lines 3 and 4
        output.write(line)

The lines from the list file are written to the new file.

Output

3 
4

2011-03-17 John_U262D

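Why is performance the concern here? Shouldn't you first learn how to write understandable, maintainable code, and worry about performance only if and when it becomes a problem? – 2011-03-17 19:41:28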

+10 to @David. Also, file I/O is slow no matter how you process the data in memory. – delnan 2011-03-17 19:43:43

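No need for a list comprehension.

import itertools
output.write(''.join(itertools.islice(inputfile, 2, 4)))  # inputfile is the open input file; slice 2:4 is lines 3 and 4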

2011-03-17 19:39:35

Neat. The only potential drawback is that this holds the whole slice in memory, so if `stop - start` is large... – delnan 2011-03-17 19:45:21
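
One way around the memory concern raised in that comment, as a minimal sketch: `writelines` accepts any iterable of strings, so the `islice` object can be passed to it directly without joining first. The helper name `copy_line_range` and the in-memory `io.StringIO` objects standing in for the real files are assumptions made for illustration.

```python
import io
import itertools

def copy_line_range(src, dst, start, stop):
    """Copy the lines with 0-based index in [start, stop) from src to dst."""
    # islice yields one line at a time, so nothing is joined in memory.
    dst.writelines(itertools.islice(src, start, stop))

src = io.StringIO("1\n2\n3\n4\n5\n")  # stand-in for the input file
dst = io.StringIO()                   # stand-in for the output file
copy_line_range(src, dst, 2, 4)
print(repr(dst.getvalue()))  # '3\n4\n'
```

With real files, `src` and `dst` would be the objects returned by `open`.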

If you want to do it with a generator:

output.writelines(line for line in input if 2 < int(line) < 5)

2011-03-17 19:46:17 Narcolei

I don't know about faster, but this will take up less memory, because it only holds one item at a time. – theheadofabroom 2011-03-17 19:52:17

@Jochen: Thanks, I fixed it. – Narcolei 2011-03-17 19:55:13

This creates a dependency on the contents of the input file that doesn't exist in the OP's code. – martineau 2011-03-17 20:55:06

Not faster, but if you want to use a list comprehension:

output.writelines([line for (x, line) in enumerate(input) if 1 < x < 4])

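This assumes you are going by the actual line position in the file rather than the values read from it (judging by your assignment to x, that is the case).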
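
To make the difference between the two interpretations concrete, here is a small sketch that filters once by position and once by value, both with generator expressions. The sample data and the `io.StringIO` stand-ins for the files are made up for illustration; `enumerate(..., start=1)` is used so the counter matches the 1-based `x` in the question.

```python
import io

# Hypothetical sample: the line contents do not match their line numbers.
text = "10\n4\n7\n3\n9\n"

# Filter by position: keep the 3rd and 4th lines, whatever they contain.
by_position = io.StringIO()
by_position.writelines(
    line for x, line in enumerate(io.StringIO(text), start=1) if 2 < x < 5
)

# Filter by value: keep the lines whose integer value is 3 or 4.
by_value = io.StringIO()
by_value.writelines(line for line in io.StringIO(text) if 2 < int(line) < 5)

print(repr(by_position.getvalue()))  # '7\n3\n'
print(repr(by_value.getvalue()))     # '4\n3\n'
```

The two only agree when each line happens to contain its own line number.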

2011-03-17 20:04:19

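You asked specifically about generators versus list comprehensions, but in general there are several ways to solve the problem.

Generator version: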

input = open('input.txt', 'r')
output = open('output.txt', 'w')

def gen():
    for line in input:
        yield "FOO " + line

for l in gen():
    output.write(l)

List comprehension:

output.writelines(["FOO " + line for line in input])

Iterator style:

class GenClass(object):
    def __init__(self, _in):
        self.input = _in

    def __iter__(self):
        return self

    def next(self):
        line = self.input.readline()
        if len(line) == 0:  # an empty string means end of file
            raise StopIteration
        return "FOO " + line

    __next__ = next  # Python 3 looks for __next__ instead of next

output.writelines(GenClass(input))

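Thoughts:

A list comprehension holds everything in memory.
A list comprehension limits how much code you can write (it has to be a functional one-liner).
Generators are common coding practice.
The iterator style is more flexible, probably the most flexible of the three,
at a slightly higher initialization cost (objects).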
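
The memory point in the first item can be seen directly in a small sketch. The sizes reported by `sys.getsizeof` vary between Python versions and count only the container itself, not the strings it refers to, so treat the numbers as a rough indication. The sample data is made up.

```python
import sys

lines = ["%d\n" % n for n in range(100000)]  # stand-in for a large file

as_list = ["FOO " + line for line in lines]  # builds every result up front
as_gen = ("FOO " + line for line in lines)   # builds results one at a time

print(sys.getsizeof(as_list))  # grows with the number of lines
print(sys.getsizeof(as_gen))   # small, independent of the number of lines

print(repr(next(as_gen)))  # 'FOO 0\n', produced only when asked for
```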

2011-03-17 20:21:13 koblas

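The way to find out which is fastest is to test it!

In this code I'm assuming you care about the value on each line, not the line number.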

from timeit import Timer

def test_comprehension():
    with open('list.txt') as input, open('output.txt', 'w') as output:
        # the list is built only for the side effect of writing, to compare with the loop
        [output.write(x) for x in input if int(x) > 2 and int(x) < 5]

def test_forloop():
    with open('list.txt') as input, open('output.txt', 'w') as output:
        for x in input:
            if int(x) > 2 and int(x) < 5:
                output.write(x)

if __name__ == '__main__':
    times = 10000

    t = Timer("test_comprehension()", "from __main__ import test_comprehension")
    print("Comprehension: %s" % t.timeit(times))

    t = Timer("test_forloop()", "from __main__ import test_forloop")
    print("For Loop: %s" % t.timeit(times))

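Here I've just set up a couple of functions: one does the job with a list comprehension, and the other does it as a for loop. The timeit module runs a small piece of code as many times as you specify, times it, and returns how long the runs took. So if you run the code above, you'll get output along the lines of:

Comprehension: 0.957081079483
For Loop: 0.956691980362

Disappointingly, it's roughly the same either way.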

2011-03-17 20:32:54

Probably because it's I/O bound... – martineau 2011-03-17 20:56:13
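
One way to check the I/O-bound suspicion, sketched under the assumption that the filtering step can be timed on lines already held in memory: the functions below apply the same value filter but read from a list and collect into a list, so no file access happens inside the timed code. The function names and sample data are hypothetical, and no timings are claimed here since they depend on the machine.

```python
from timeit import Timer

# Hypothetical in-memory stand-in for the contents of list.txt.
LINES = ["%d\n" % n for n in range(1, 11)]

def filter_comprehension():
    return [x for x in LINES if 2 < int(x) < 5]

def filter_forloop():
    kept = []
    for x in LINES:
        if 2 < int(x) < 5:
            kept.append(x)
    return kept

if __name__ == "__main__":
    times = 10000
    for func in (filter_comprehension, filter_forloop):
        # Timer accepts a callable, so no setup string is needed.
        print("%s: %s" % (func.__name__, Timer(func).timeit(times)))
```

If the two numbers differ here but not in the file-based test, that would suggest the file access dominates the original measurement.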

def copyLines(infname, outfname, lines):
    lines = sorted(set(lines), reverse=True)  # remove duplicates; smallest number pops first
    with open(infname, 'r') as inf, open(outfname, 'w') as outf:
        try:
            i = 1  # 1-based number of the next line to be read
            while lines:
                seek = lines.pop()
                while i < seek:
                    next(inf)  # skip lines before the wanted one
                    i += 1
                outf.write(next(inf))
                i += 1
        except StopIteration:  # hit end of file
            pass

def main():
    copyLines('C:/.../list.txt', 'C:/.../output.txt', range(3, 5))

if __name__ == "__main__":
    main()

Note that it exits as soon as it runs out of wanted lines.
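
The same early-exit idea can also be written with `enumerate` and a set. This is an alternative sketch, not the code from the answer; `copy_wanted_lines` is a hypothetical name, and `io.StringIO` objects stand in for the real files.

```python
import io

def copy_wanted_lines(src, dst, wanted):
    """Copy the lines whose 1-based numbers are in `wanted`, then stop reading."""
    remaining = set(wanted)  # duplicates collapse automatically
    if not remaining:
        return
    for lineno, line in enumerate(src, start=1):
        if lineno in remaining:
            dst.write(line)
            remaining.discard(lineno)
            if not remaining:  # nothing left to find, so stop early
                break

src = io.StringIO("1\n2\n3\n4\n5\n6\n")
dst = io.StringIO()
copy_wanted_lines(src, dst, range(3, 5))
print(repr(dst.getvalue()))  # '3\n4\n'
print(repr(src.readline()))  # '5\n', showing that reading stopped after line 4
```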

2011-03-17 22:10:48
