How to loop correctly when comparing strings across two files

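I am stuck doing a sentiment analysis of tweets (file 1, standard Twitter JSON responses) against a word list (file 2, tab-delimited, two columns) that assigns each word a sentiment score (positive or negative).

The problem: the outer loop only runs once and then the script ends. I loop through file 1 and, nested inside that loop, I loop through file 2, comparing words and trying to keep a running sentiment total for each tweet.

So I have: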

import json

def get_sentiments(tweet_file, sentiment_file):
    sent_score = 0
    for line in tweet_file:
        document = json.loads(line)
        tweets = document.get('text')

        if tweets != None:
            tweet = str(tweets.encode('utf-8'))
            #print tweet

            # Nested loop over the sentiment word list (file 2)
            for z in sentiment_file:
                line = z.split('\t')
                word = line[0].strip()
                score = int(line[1].rstrip('\n').strip())
                #print score

                if word in tweet:
                    print "+++++++++++++++++++++++++++++++++++++++"
                    print word, tweet
                    sent_score += score

            print "====", sent_score, "====="

    #PROBLEM, IT'S ONLY DOING THIS FOR THE FIRST TWEET

file1 = open('tweetsfile.txt')
file2 = open('sentimentfile.txt')

get_sentiments(file1, file2)

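I have spent the better part of a day trying to figure out why it prints all the tweets when the nested for loop over file2 is absent, yet with that loop it only processes the first tweet and then exits.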

2013-05-06 roy

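The reason it only does this once is that the for loop over the sentiment file has already reached the end of the file, so it stops, because there are no more lines to read.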

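In other words, the first time the inner loop runs it steps through the entire file. Since there are then no more lines to read (it has reached the end of the file), it does not loop again, with the result that only one line, the first tweet, is actually processed.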
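
The behaviour described above can be reproduced in a few lines. This is only a sketch: `io.StringIO` stands in for the opened sentiment file, and the two sample words are made up for illustration.

```python
import io

# io.StringIO stands in for an open text file (sample data is invented);
# a file object returned by open() is consumed by iteration the same way.
sentiment_file = io.StringIO("good\t3\nbad\t-3\n")

first_pass = [line for line in sentiment_file]   # reads both lines
second_pass = [line for line in sentiment_file]  # already at end of file

print(len(first_pass), len(second_pass))
```

The second pass finds nothing to read, which is what happens to the inner loop for every tweet after the first.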

So one way to fix this is to "rewind" the file, which you can do with the file object's seek method.
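
A minimal sketch of the rewind approach, assuming Python 3. The function name `score_with_rewind` and the sample data are hypothetical, and `io.StringIO` again stands in for the real file; the only call that matters is `seek(0)`.

```python
import io

def score_with_rewind(tweets, sentiment_file):
    """Return one sentiment total per tweet, rewinding the word list each time."""
    totals = []
    for tweet in tweets:
        sentiment_file.seek(0)  # jump back to the start before each inner pass
        words = tweet.split()
        total = 0
        for line in sentiment_file:
            word, score = line.rstrip("\n").split("\t")
            if word in words:
                total += int(score)
        totals.append(total)
    return totals

# Invented sample data standing in for the two files.
sentiment_file = io.StringIO("good\t3\nbad\t-3\n")
totals = score_with_rewind(["good day", "bad day", "good but bad"], sentiment_file)
print(totals)
```

This works, but it re-reads and re-parses the whole word list for every tweet.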

If your files are not very large, another way is to read them entirely into a list or a similar structure and then loop over that.
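
The list approach might look roughly like this (Python 3, with invented sample data and `io.StringIO` in place of the real file). Unlike a file object, a list can be iterated as many times as needed.

```python
import io

# Invented sample data standing in for the sentiment file.
sentiment_file = io.StringIO("good\t3\nbad\t-3\n")

# Read and parse the word list once, up front.
entries = []
for line in sentiment_file:
    word, score = line.rstrip("\n").split("\t")
    entries.append((word, int(score)))

tweets = ["good day", "bad day"]
totals = []
for tweet in tweets:
    words = tweet.split()
    # The list is not consumed by looping, so this works for every tweet.
    totals.append(sum(score for word, score in entries if word in words))

print(totals)
```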

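However, since your sentiment scoring is a simple lookup, the best approach is to build a dictionary of sentiment scores and then look up each word of the tweet in that dictionary to compute the tweet's overall sentiment: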

import csv
import json

scores = {} # empty dictionary to store scores for each word

with open('sentimentfile.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        scores[row[0].strip()] = int(row[1].strip())


with open('tweetsfile.txt') as f:
    for line in f:
        tweet = json.loads(line)
        text = tweet.get('text','').encode('utf-8')
        if text:
            total_sentiment = sum(scores.get(word,0) for word in text.split())
            print("{}: {}".format(text,total_sentiment))

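The with statement automatically closes the file handle. I am using the csv module to read the file (it works for tab-delimited files as well).

This line does the calculation: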

total_sentiment = sum(scores.get(word,0) for word in text.split())

It is a shorter way of writing this loop:

tweet_score = []
for word in text.split():
    if word in scores:
        tweet_score.append(scores[word])

total_sentiment = sum(tweet_score)

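The get method of dictionaries takes an optional second argument, a custom value to return when the key cannot be found; if you omit the second argument, it returns None. In my loop I use it to return 0 when the word has no score.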
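
For illustration, here is how `get` behaves with and without the default, using a small made-up `scores` dictionary (the words and values are assumptions, not taken from the real word list):

```python
# Made-up scores, for illustration only.
scores = {"good": 3, "bad": -3}

found = scores.get("good")             # key present: returns its value
missing = scores.get("unknown")        # key absent, no default: returns None
defaulted = scores.get("unknown", 0)   # key absent, default given: returns 0

# Naive whitespace split: "service," keeps its comma and so scores 0.
text = "good food but bad service, still good"
total_sentiment = sum(scores.get(word, 0) for word in text.split())

print(found, missing, defaulted, total_sentiment)
```

Without the default of 0, `sum` would fail as soon as it hit a `None` for an unscored word.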

2013-05-06 04:53:23

I don't think there could be a better answer than this. Thanks. – roy 2013-05-06 13:42:31
