How do I look up a file name in another text file and extract information from the matching line?

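In the code below, I open fileList and check every file in fileList.

If the name of the file matches the first 4 characters of a line in another text file, I extract the number written on that line with line.split()[1] and assign the int of that string to d. After that, I use this d to divide the counter.

This is part of my function: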

fp = open('yearTerm.txt', 'r')  # open the text file

def parsing():
    fileList = pathFilesList()
    for f in fileList:
        date_stamp = f[15:-4]
        # The problem is here: this for loop finds d for the first file and uses it for all files
        for line in fp:
            if date_stamp.startswith(line[:4]):
                d = int(line.split()[1])
        print d
        print "Processing file: " + str(f)
        fileWordList = []
        fileWordSet = set()
        # One word per line, strip space. No empty lines.
        fw = open(f, 'r')

        fileWords = Counter(w for w in fw.read().split())
        # For each unique word, count occurrences and store in dict.
        for stemWord, stemFreq in fileWords.items():
            Freq = stemFreq/d
            if stemWord not in wordDict:
                wordDict[stemWord] = [(date_stamp, Freq)]
            else:
                wordDict[stemWord].append((date_stamp, Freq))

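This runs, but it gives me the wrong output: the loop that looks for d only does its work once, whereas I want it to run for every file, because each file has a different d. I don't know how to change this for loop to get the correct d for each file, or what else I should use instead.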

I would appreciate any suggestions.

Source

2014-09-04 Singu

You should post an [MCVE](http://stackoverflow.com/help/mcve). – 2014-09-04 09:12:20

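The reason the "for line in fp" loop only does its work once is that you iterate over the same file object on every pass: the first pass consumes all the lines in the file, and the file iterator is never reset or recreated. Use "fp = open('yearTerm.txt').readlines()" to fix this. – 2014-09-04 09:28:37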

@Rawing Thank you very much. That worked; I had not noticed this mistake. – Singu 2014-09-04 09:35:38
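The behaviour described in the comments above can be reproduced in isolation. The following is only a minimal sketch: the sample contents stand in for yearTerm.txt and are made up, and a temporary file is used so that the snippet does not depend on any real file. It shows that a second loop over the same file object yields nothing, and that either rewinding with seek(0) or reading the lines into a list avoids the problem.

```python
import os
import tempfile

# Made-up sample data standing in for yearTerm.txt ("<year> <number>" per line).
sample = "2012 10\n2013 20\n2014 40\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    with open(path) as fp:
        first_pass = [line[:4] for line in fp]    # consumes every line
        second_pass = [line[:4] for line in fp]   # iterator is exhausted: no lines left
        fp.seek(0)                                # rewind to the start of the file
        after_seek = [line[:4] for line in fp]    # lines are available again

    # Alternative from the comment: keep the lines in a list,
    # which can be looped over as many times as needed.
    with open(path) as fp:
        lines = fp.readlines()
    list_pass_1 = [line[:4] for line in lines]
    list_pass_2 = [line[:4] for line in lines]
finally:
    os.remove(path)

print(first_pass, second_pass, after_seek)
```

Reading the lines into a list once is usually the simpler choice when the lookup file is small, since no rewinding is needed inside the outer loop.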

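I don't quite understand what you are trying to do, but if you want to do some processing for each "good" line in fp, you should move the corresponding code under that if: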

def parsing():
    fileList = pathFilesList()
    for f in fileList:
        date_stamp = f[15:-4]
        #problem is here that this for , finds d for first file and use it for all
        for line in fp:
            if date_stamp.startswith(line[:4]):
                d = int(line.split()[1])
                print d
                print "Processing file: " + str(f)
                fileWordList = []
                fileWordSet = set()
                ...

Source

2014-09-04 09:17:55
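As a further option that is not part of the question or the answer above, the lookup file can be read once into a dictionary so that each file gets its own d without rescanning the lines. This is only a sketch under stated assumptions: load_divisors and add_file_counts are hypothetical helper names, the sample lines and date stamps are made up, and it assumes that the first four characters of a date stamp are what should match the first four characters of a line in yearTerm.txt.

```python
from collections import Counter, defaultdict


def load_divisors(lines):
    """Map the first four characters of each line to the integer in its second column."""
    divisors = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:               # skip blank or malformed lines
            divisors[line[:4]] = int(fields[1])
    return divisors


def add_file_counts(word_dict, date_stamp, text, divisors):
    """Append (date_stamp, count / d) for each word in text, using this file's own d."""
    d = divisors[date_stamp[:4]]           # raises KeyError if the prefix is not in the table
    for word, count in Counter(text.split()).items():
        # float() keeps the division fractional under Python 2 as well
        word_dict[word].append((date_stamp, count / float(d)))


# Made-up sample data: lines as they might appear in yearTerm.txt,
# and file contents keyed by a hypothetical date stamp.
year_term_lines = ["2012 10\n", "2013 20\n"]
documents = {"2012-03": "apple pear apple", "2013-07": "pear pear plum pear"}

divisors = load_divisors(year_term_lines)
word_dict = defaultdict(list)
for date_stamp in sorted(documents):
    add_file_counts(word_dict, date_stamp, documents[date_stamp], divisors)

print(dict(word_dict))
```

Note that in Python 2, dividing two integers as in stemFreq/d performs floor division, so small counts divided by a larger d become 0. The sketch divides by float(d) on the assumption that fractional frequencies are wanted.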
