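Speeding up processing time while creating smaller files from a larger file

I have a small script that runs through a word list of about 300,000 words and creates files of exactly 999KB each from it. It works perfectly, but it is extremely slow because I open the file on every iteration. How can I change this script so that it has exactly the same behavior but a shorter processing time?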

import os 
import hashlib 


data = [] 
count = 1 


with open("dicts/included_dicts/dictionaries/000webhost.txt") as a:
    for line in a.readlines():
        h = hashlib.md5()
        h.update(line.strip())
        data.append(h.hexdigest() + ": " + line.strip() + "\n")

for item in data:
    with open("dicts/included_dicts/rainbowtables/md5_{}.rtc".format(count), "a+") as b:
        if os.stat("dicts/included_dicts/rainbowtables/md5_{}.rtc".format(count)).st_size <= 1022162L:
            b.write(item)
        else:
            count += 1


2016-12-03 papasmurf

Try keeping the file handle in a variable outside the loop:

b = open("dicts/included_dicts/rainbowtables/md5_{}.rtc".format(count), "a+")
for item in data:
    if os.stat("dicts/included_dicts/rainbowtables/md5_{}.rtc".format(count)).st_size <= 1022162L:
        b.write(item)
    else:
        count += 1
        b.close()
        b = open("dicts/included_dicts/rainbowtables/md5_{}.rtc".format(count), "a+")


2016-12-03 22:53:37

This might work, lol. – papasmurf

This works, but it does not produce the same behavior: the files are 1003KB instead of 999KB. – papasmurf

@papasmurf Try reducing the size comparison value. –
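
The 1003KB result is probably caused by buffering: with the file kept open, `os.stat` only sees what has already been flushed to disk, so the check lags behind the real size by up to one buffer. A minimal sketch of one way around that, assuming Python 3, is to count the bytes in memory instead of calling `os.stat` at all. The function name `write_chunks` and its parameters are made up for illustration. Note that this version rolls over before the limit is passed and does not drop the line that triggers the rollover, so `max_bytes` may need tuning to match the original file sizes.

```python
import os


def write_chunks(lines, out_dir, max_bytes=1022162, prefix="md5_", suffix=".rtc"):
    """Write lines into numbered files, opening a new file when max_bytes would be passed.

    Hypothetical helper: returns the number of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 1
    written = 0  # bytes written to the current file, tracked in memory
    out = open(os.path.join(out_dir, f"{prefix}{count}{suffix}"), "wb")
    try:
        for line in lines:
            payload = line.encode("utf-8")
            # Roll over before the write that would push the file past the limit.
            if written and written + len(payload) > max_bytes:
                out.close()
                count += 1
                written = 0
                out = open(os.path.join(out_dir, f"{prefix}{count}{suffix}"), "wb")
            out.write(payload)
            written += len(payload)
    finally:
        out.close()
    return count
```

Opening the files in binary mode keeps the counter exact, since no newline translation happens on write. Each output file is opened once, so the cost of reopening and stat-ing on every iteration goes away.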

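Just build up a string and write it out in one go.

Here is another approach. I suspect this answer is platform-dependent, since I do not know the size of an empty file on systems other than Linux.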

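import sys

cache = ""
count = 1
for item in data:
    # Flush the cache to a new file before adding this item would pass the limit.
    if sys.getsizeof(cache + item) > 999999 - 4:
        with open("dicts/included_dicts/rainbowtables/md5_{}.rtc".format(count), "w") as b:
            b.write(cache)
        count += 1
        cache = ""
    cache += item  # each item already ends with "\n"

# Write whatever is left after the loop.
if cache:
    with open("dicts/included_dicts/rainbowtables/md5_{}.rtc".format(count), "w") as b:
        b.write(cache)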


2016-12-03 23:25:36 Simon
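
One caveat with the string-cache idea: `sys.getsizeof` reports the in-memory size of the Python object, interpreter overhead included, not the number of bytes that end up in the file. A rough sketch of the same batching idea that measures the encoded length instead, assuming Python 3 and UTF-8 output, might look like this. The name `chunk_by_bytes` is hypothetical.

```python
import sys


def chunk_by_bytes(lines, max_bytes):
    """Group lines into strings whose UTF-8 encoding is at most max_bytes long."""
    chunks, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        # Start a new chunk before this line would push the current one past the limit.
        if current and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:  # do not lose the final, partially filled chunk
        chunks.append("".join(current))
    return chunks


text = "abc\n"
# getsizeof includes the object header, so it is larger than the
# 4 bytes this string would occupy in a file.
print(sys.getsizeof(text), len(text.encode("utf-8")))
```

Each returned chunk can then be written with a single `write` call per file. Collecting lines in a list and joining once per chunk also avoids rebuilding a growing string on every iteration.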
