Search a directory for files containing one or more words

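I want a program, search(file, list), that searches USB stick D for text containing one or more words. If a file contains a word, it should put that word in a list and then move on to the next word. For each document in which it finds words, I want it to display a statement like "word[0], word[1], word[2] in this file directory". Here's what I've tried so far: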

import os

def search(file, list):
    if list == []:
        return
    else:
        if os.path.isfile(file):
            try:
                infile = open(file, 'r')
                doc = infile.read()
            except:
                return
            infile.close()
            print('Searching {}'.format(file))
            if list[0] in doc:
                print('{} in {}'.format(list[0], file))
        elif os.path.isdir(file):
            for item in os.listdir(file):
                itempath = os.path.join(file, item)
                search(itempath, list)
    return search(file, list[1:])


2017-06-01 calculator2compiler

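For starters, you forgot to return in the recursive call: 'return search(itempath, list)' – karthikr

Thanks, I've got it running through the list now, but I forgot an extra step from the prompt; I've updated the question. – calculator2compiler

If you want to go through the words one at a time, wouldn't it make sense to just iterate over the list instead of doing 'return search(file, list[1:])'? –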

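You're not iterating over your list to check the condition (by the way, don't use file and list as variable names; you're shadowing built-in types). You'd have to do: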

found_words = []
for word in list:
    if word in doc:
        found_words.append(word)
if found_words:
    print('{} in {}'.format(", ".join(found_words), file))

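instead, if you want to check all the terms. However, you're making this far more complicated than it needs to be. For starters, you should use os.walk() to recurse through all subdirectories. Second, reading the whole file into memory isn't a good idea: not only will searching be slower on average, but you may also start running into memory problems once you hit large files...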
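To illustrate the memory point on its own, here is a minimal sketch for a single file, assuming the files are UTF-8 text. The helper name find_terms_in_file is hypothetical. It reads one line at a time and stops reading as soon as every term has been seen, and it records each term only once.

```python
def find_terms_in_file(path, terms, encoding="utf-8"):
    """Return the terms that occur in the file at `path`, in first-seen order.

    The file is read one line at a time, so only a single line is held in
    memory, and reading stops as soon as every term has been found.
    """
    wanted = list(dict.fromkeys(terms))  # drop duplicate terms, keep their order
    found = []
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            for term in wanted:
                # record each term only the first time it shows up
                if term not in found and term in line:
                    found.append(term)
            if len(found) == len(wanted):  # nothing left to look for
                break
    return found
```

Note that a single enormous line (for example a minified file) is still loaded in one piece, since iteration is per line.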

I'd do it something like this:

import os

def search(path, terms):
    result = {}  # store our result in the form "file_path": [found terms]
    start_path = os.path.abspath(os.path.realpath(path))  # full path, resolving a symlink
    for root, dirs, files in os.walk(start_path):  # recurse our selected dir
        for source in files:  # loop through each file
            source_path = os.path.join(root, source)  # full path to our file
            try:
                with open(source_path, "r") as f:  # open our current file
                    found_terms = []  # store for our potentially found terms
                    for line in f:  # loop through it line by line
                        for term in terms:  # go through all our terms and check for a match
                            if term in line and term not in found_terms:  # a term we haven't recorded yet
                                found_terms.append(term)  # add the found term to our store
                    if found_terms:  # if we found any of the terms...
                        result[source_path] = found_terms  # store it in our result
            except IOError:
                pass  # ignore I/O errors, we may optionally store a list of failed files...
    return result

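It returns a dictionary whose keys are your file paths and whose values are the terms found. So, for example, if you wanted to search the files in the current folder (the folder you run the script from) for the words "import" and "export", you could do: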

search_results = search("./", ["import", "export"])
for key in search_results:
    print("{} in {}".format(", ".join(search_results[key]), key))

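That should print the result you want. It could also be extended to check file extensions/types so you don't waste time trying to read through unreadable/binary files. Also, you should handle codecs according to your files: reading their lines may raise Unicode errors (decoding uses the default encoding). Bottom line, there's plenty of room for improvement...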
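One way to act on the extension and codec suggestions, as a sketch under assumptions: the names TEXT_EXTENSIONS, is_probably_text and read_lines are hypothetical, the extension whitelist is arbitrary, the NUL-byte test is only a heuristic, and decoding as UTF-8 with errors="replace" is just one choice (you could instead skip files that fail to decode).

```python
import os

# Hypothetical whitelist: adjust it to the kinds of files on your drive.
TEXT_EXTENSIONS = {".txt", ".py", ".md", ".csv", ".html"}


def is_probably_text(path):
    """Cheap filter: check the extension, then look for NUL bytes in the first block."""
    if os.path.splitext(path)[1].lower() not in TEXT_EXTENSIONS:
        return False
    try:
        with open(path, "rb") as f:
            return b"\x00" not in f.read(1024)  # NUL bytes usually mean binary data
    except OSError:
        return False


def read_lines(path, encoding="utf-8"):
    """Yield lines, replacing undecodable bytes instead of raising UnicodeDecodeError."""
    with open(path, "r", encoding=encoding, errors="replace") as f:
        for line in f:
            yield line
```

The NUL-byte check will also reject UTF-16 text files, so treat it as a starting point rather than a reliable detector.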

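Also, note that you're not really searching for a word, only for the presence of the given character sequence. For example, if you search for cat, it will also return files containing caterpillar. There are also dedicated tools that can do this in a fraction of the time.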
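If you do want whole words, a regular expression with \b word boundaries is one option. This is a minimal sketch assuming "word" means a run of word characters as Python's re module defines them; the helper names make_word_matcher and words_in_line are hypothetical. re.escape keeps terms containing regex metacharacters from being misinterpreted.

```python
import re


def make_word_matcher(terms):
    """Compile one pattern per term that only matches the term as a whole word."""
    return {term: re.compile(r"\b" + re.escape(term) + r"\b") for term in terms}


def words_in_line(line, matchers):
    """Return the terms that occur in `line` as whole words."""
    return [term for term, pattern in matchers.items() if pattern.search(line)]


matchers = make_word_matcher(["cat", "import"])
print(words_in_line("a caterpillar", matchers))       # → []
print(words_in_line("the cat; import os", matchers))  # → ['cat', 'import']
```

Because \b only sits between a word character and a non-word character, terms that begin or end with punctuation (such as C++) need a different pattern.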


2017-06-01 02:39:09 zwer
