Making the process of writing to a file more efficient

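I'm new to programming. I'm running this script to clean a large text file (more than 12,000 lines) and write the result to another .txt file. The problem is that with a smaller file (around 500 lines) it runs quickly, so I've concluded that the time taken is due to the size of the file. If someone could guide me in making this code more efficient, it would be much appreciated.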

input_file = open('bNEG.txt', 'rt', encoding='utf-8')
l_p = LanguageProcessing()
sentences = []
for lines in input_file.readlines():
    tokeniz = l_p.tokeniz(lines)
    cleaned_url = l_p.clean_URL(tokeniz)
    remove_words = l_p.remove_non_englishwords(cleaned_url)
    stopwords_removed = l_p.remove_stopwords(remove_words)
    cleaned_sentence = ' '.join(str(s) for s in stopwords_removed) + "\n"
    output_file = open('cNEG.txt', 'w', encoding='utf-8')
    sentences.append(cleaned_sentence)
    output_file.writelines(sentences)
input_file.close()
output_file.close()

Edit: Below is the code as suggested in the answer, with some other changes to suit my requirements.

input_file = open('chromehistory_log.txt', 'rt', encoding='utf-8')
output_file = open('dNEG.txt', 'w', encoding='utf-8')
l_p = LanguageProcessing()
#sentences=[]
for lines in input_file.readlines():
    #print(lines)
    tokeniz = l_p.tokeniz(lines)
    cleaned_url = l_p.clean_URL(tokeniz)
    remove_words = l_p.remove_non_englishwords(cleaned_url)
    stopwords_removed = l_p.remove_stopwords(remove_words)
    #print(stopwords_removed)
    if stopwords_removed == []:
        continue
    else:
        cleaned_sentence = ' '.join(str(s) for s in stopwords_removed) + "\n"

    #sentences.append(cleaned_sentence)
    output_file.writelines(cleaned_sentence)
input_file.close()
output_file.close()

Source

2017-10-11 Steve harvey

You open output_file on every line. Try moving "output_file = open('cNEG.txt', 'w', encoding='utf-8')" above the loop.

Thanks for the reply, @RalphErdt. I tried your solution, but there was no significant change in the time taken.

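Oh, I overlooked something: you collect all the strings in "sentences" and write the whole list on every pass through the loop. -> a) write only cleaned_sentence inside the loop (and don't collect in "sentences"), or b) collect everything and write "sentences" after the loop. I prefer a) because it is less memory-intensive, even though it is slower.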

To turn the discussion into an answer:

There are two problems here:

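You open/create the output file and write the data inside the loop, once for every line of the input file. In addition, you collect all the data in a list (sentences).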
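A rough count may help show why this scales badly. Because the file is reopened in 'w' mode and the whole list is rewritten on each pass, pass k writes k lines, so n input lines cause 1 + 2 + ... + n = n(n+1)/2 line writes in total. The function name below is made up for illustration:

```python
def total_lines_written(n_lines: int) -> int:
    """Lines written by a loop that rewrites the whole list on every pass."""
    # Pass k writes the k sentences collected so far: 1 + 2 + ... + n.
    return n_lines * (n_lines + 1) // 2


for n in (500, 12000):
    # Prints "500 125250" and then "12000 72006000".
    print(n, total_lines_written(n))
```

So going from 500 to 12,000 input lines (24 times as many) means roughly 575 times as many line writes, which fits the observation that the small file finishes quickly.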

You have two possibilities:

a) Create the file before the loop and write only "cleaned_sentence" inside the loop (and stop collecting into "sentences").

b) Collect everything in "sentences" and write "sentences" once, after the loop.

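The disadvantage of a) is that it is a bit slower than b) (as long as the OS doesn't have to swap memory for b). The advantage is that it uses far less memory and works no matter how large the file is or how much memory the computer has installed.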
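The two options could look roughly like the sketch below. The LanguageProcessing class from the question isn't shown, so clean_line is a hypothetical stand-in for its tokenize/clean steps, and both function names are assumptions. Skipping lines that end up empty follows the edited code in the question.

```python
from typing import Callable, List


def clean_line(line: str) -> List[str]:
    """Hypothetical stand-in for the LanguageProcessing steps in the question."""
    return line.split()


def write_streaming(in_path: str, out_path: str,
                    clean: Callable[[str], List[str]] = clean_line) -> None:
    """Option a): open the output once and write each cleaned line as it is produced."""
    with open(in_path, 'rt', encoding='utf-8') as src, \
         open(out_path, 'w', encoding='utf-8') as dst:
        for line in src:  # iterate lazily instead of calling readlines()
            tokens = clean(line)
            if tokens:  # skip lines that end up empty
                dst.write(' '.join(str(t) for t in tokens) + '\n')


def write_collected(in_path: str, out_path: str,
                    clean: Callable[[str], List[str]] = clean_line) -> None:
    """Option b): collect every cleaned line, then write them all once after the loop."""
    sentences = []
    with open(in_path, 'rt', encoding='utf-8') as src:
        for line in src:
            tokens = clean(line)
            if tokens:
                sentences.append(' '.join(str(t) for t in tokens) + '\n')
    with open(out_path, 'w', encoding='utf-8') as dst:
        dst.writelines(sentences)
```

Both functions produce the same output file. The with blocks also close the files even if a processing step raises an exception.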

Source

2017-10-11 08:51:36

As you recommended, I tried both approaches and settled on method (a). It still takes a long time.

Please post the corrected code. Also post the times and line counts for different files.
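One way to gather such timings is to total the seconds spent in each processing step; if the file handling is already fixed, the remaining cost may well sit in one of those steps. This is only a sketch: time_steps is a hypothetical helper, and the two steps shown are stand-ins, not the question's LanguageProcessing methods.

```python
import time
from collections import defaultdict


def time_steps(lines, steps):
    """Run each (name, function) step on every line and total the seconds per step."""
    totals = defaultdict(float)
    for line in lines:
        value = line
        for name, func in steps:
            start = time.perf_counter()
            value = func(value)  # output of one step feeds the next
            totals[name] += time.perf_counter() - start
    return dict(totals)


# Hypothetical stand-ins; with the question's code these would be
# l_p.tokeniz, l_p.clean_URL, l_p.remove_non_englishwords, l_p.remove_stopwords.
steps = [
    ('tokenize', str.split),
    ('drop_short', lambda tokens: [t for t in tokens if len(t) > 2]),
]
sample = ['the quick brown fox'] * 1000
for name, seconds in time_steps(sample, steps).items():
    print(f'{name}: {seconds:.4f} s')
```

Running this on inputs of different sizes shows which step grows fastest with the line count.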

I have added the edited code above.
