Counting different strings in multiple files

I want to count a list of smileys across a set of .txt files in my /test/ directory.

Here is my approach for counting one smiley across all files.

import os

def count_string_occurrence():
    directory = "C:/users/M/Desktop/test"
    string = ":)"  # search term
    total = 0
    for file in os.listdir(directory):
        if file.endswith(".txt"):
            # join with the directory so the file opens from any working directory
            with open(os.path.join(directory, file), encoding="utf8") as f:
                contents = f.read()
            total += contents.count(string)  # add this file's occurrences
    print("Number of " + string + " in all files equals " + str(total))

count_string_occurrence()

How do I now loop over different smileys and print the result for each smiley separately? Since I am already looping over the files, it gets complicated.

2017-04-18 M. H.

What do you mean? Do you want to count emoticons like ':D', ';)', ':)' and so on? – blacksite

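I mean that I want the script to count about 20 different smileys and, for each one, print "Number of X in all files equals ___" (X = the smiley). The smileys include :), :-), :] and some variations of positive and negative smileys.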

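Regarding your question: you can keep a dictionary with a count for each string and return it. But if you keep your existing structure, keeping track of the counts won't be pleasant.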

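Which leads to my suggestions:

You keep the whole file in memory for no obvious reason; you can go through it line by line and check each string against the current line.
Looping over the smileys with your current structure would also read the same file several times, while you could read it just once and check every string as you go.
You are checking the file extension yourself, which sounds like a job for glob.
You can use a defaultdict, so you don't need to care whether a count starts at 0.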
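
As a small illustration of the last point, here is a minimal sketch comparing a plain dict with a defaultdict(int). The variable names and the sample list are made up for the example.

```python
from collections import defaultdict

plain = {}
auto = defaultdict(int)

for smiley in [':)', ':P', ':)']:
    # A plain dict needs an explicit starting value for keys it has not seen yet.
    plain[smiley] = plain.get(smiley, 0) + 1
    # defaultdict(int) creates a missing key with the value 0 on first access.
    auto[smiley] += 1

print(plain)       # {':)': 2, ':P': 1}
print(dict(auto))  # {':)': 2, ':P': 1}
```

Both end up with the same counts; the defaultdict version just avoids the initialisation step.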

Modified code:

from collections import defaultdict
import glob

SMILIES = [':)', ':P', '=]']

def count_in_files(string_list):
    results = defaultdict(int)  # missing keys start at 0
    for file_name in glob.iglob('*.txt'):  # .txt files in the current directory
        print(file_name)
        with open(file_name, encoding="utf8") as input_file:
            for line in input_file:  # one pass per file, line by line
                for s in string_list:
                    # count every occurrence in the line, not just its presence
                    results[s] += line.count(s)
    return results

print(count_in_files(SMILIES))

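Finally, with this approach, if you are using Python >= 3.5 you can change the glob call to for file_name in glob.iglob('**/*.txt', recursive=True) so that it searches subdirectories recursively, in case you need that.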

This will print something like this:

defaultdict(<class 'int'>, {':P': 2, ':)': 1, '=]': 1})
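
The recursive variant mentioned above could be sketched roughly like this. The function name count_in_tree and the root parameter are assumptions made for illustration; the sketch also assumes UTF-8 files and counts every occurrence with str.count.

```python
import glob
import os
from collections import defaultdict

def count_in_tree(root, string_list):
    """Count each string in every .txt file under root, including subfolders."""
    results = defaultdict(int)
    # With recursive=True, '**' matches zero or more directory levels,
    # so files directly inside root are included as well.
    pattern = os.path.join(root, '**', '*.txt')
    for file_name in glob.iglob(pattern, recursive=True):
        with open(file_name, encoding="utf8") as input_file:
            for line in input_file:
                for s in string_list:
                    results[s] += line.count(s)
    return results
```

Calling count_in_tree('.', [':)', ':P', '=]']) would then cover the current directory and everything below it.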

2017-04-18 15:36:01 ChatterOne

Thanks, this approach worked! :-) It really is much faster than the old one.

You can make the search string a function parameter and then call the function several times with different search terms.

import os

def count_string_occurrence(string):
    directory = "C:/users/M/Desktop/test"
    total = 0
    for file in os.listdir(directory):
        if file.endswith(".txt"):
            # join with the directory so the file opens from any working directory
            with open(os.path.join(directory, file), encoding="utf8") as f:
                contents = f.read()
            total += contents.count(string)  # add this file's occurrences
    return total

smilies = [':)', ':P', '=]']
for s in smilies:
    total = count_string_occurrence(s)
    print("Number of {} in all files equals {}".format(s, total))

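A different approach would be to pass the list of smileys to your function and do the iteration inside the if block, perhaps storing the results in a dictionary such as { ':)': 5, ':P': 4, ... }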
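
A minimal sketch of that dictionary-based variant might look as follows. The names count_strings_in_dir and report and the directory and pattern parameters are hypothetical, chosen only for this example. Each file is read once and every smiley is counted against its contents.

```python
import glob
import os

def count_strings_in_dir(directory, strings, pattern="*.txt"):
    """Return {string: total occurrences} over all matching files in directory."""
    totals = {s: 0 for s in strings}
    for path in glob.glob(os.path.join(directory, pattern)):
        with open(path, encoding="utf8") as f:
            contents = f.read()
        # Inner loop over the search strings, so each file is read only once.
        for s in strings:
            totals[s] += contents.count(s)
    return totals

def report(totals):
    """Build one output line per search string."""
    return ["Number of {} in all files equals {}".format(s, n)
            for s, n in totals.items()]

if __name__ == "__main__":
    # The directory is taken from the question; adjust it to your own path.
    for line in report(count_strings_in_dir("C:/users/M/Desktop/test",
                                            [':)', ':P', '=]'])):
        print(line)
```

If the directory has no matching files, every count simply stays at 0.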

2017-04-18 14:48:22
