2016-04-25

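Iterating through a .txt file in an odd way

What I'm trying to do is write a program that opens a .txt file of movie reviews, where each line is a rating from 0 to 4 followed by a short review of the movie. The program then prompts the user to open a second text file containing words that will be matched against the reviews and given a number value based on the reviews they appear in.

For example, here is how two sample reviews would appear in the .txt file: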

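4 A comedy-drama of nearly epic proportions rooted in a sincere performance by the title character undergoing midlife crisis .
2 Massoud 's story is an epic , but also a tragedy , the record of a tenacious , humane fighter who was also the prisoner -LRB- and ultimately the victim -RRB- of history .

So if I were looking for the word "epic", it would increase the count for that word by 2 (which I have already figured out), because it appears twice, and then append the values 4 and 2 to a list of ratings for that word.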

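How do I append these integers to a list or dictionary associated with that word? Keep in mind that I need to create a new list or dictionary key for every word in the word list.

Please and thank you. Sorry if this is poorly worded, programming isn't my forte.

All of my code: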

def menu_validate(prompt, min_val, max_val):
    """Prompt until the user enters an integer from min_val to max_val; 'q' or 'quit' exits."""
    while True:
        raw = input(prompt).strip()
        # Check for quit before converting, since int("q") would raise ValueError.
        if raw.lower() in ("quit", "q"):
            quit()
        try:
            menu = int(raw)
        except ValueError:
            menu = None
        if menu is not None and min_val <= menu <= max_val:
            return menu
        print("You must enter a number value from {} to {}.".format(min_val, max_val))

def open_file(prompt):
    """Prompt for a file name (".txt" is appended if missing) until the file opens."""
    while True:
        file_name = input(prompt).strip()
        if not file_name.endswith(".txt"):
            file_name += ".txt"
        try:
            return open(file_name, 'r')
        except FileNotFoundError:
            print("You must enter a valid file name. Make sure the file you would like to open is in this program's root folder.")

def make_list(file):
    lst = []
    for line in file:
        lst2 = line.split(' ')
        del lst2[-1]
        lst.append(lst2)
    return lst

def rating_list(lst):
    '''iterates through a list of lists and appends the first value in each list to a second list'''
    ratings = []
    for row in lst:  # "row" avoids shadowing the built-in name "list"
        ratings.append(row[0])
    return ratings

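def word_cnt(lst, word: str):
    """Count how many times word appears across all the inner lists."""
    cnt = 0
    for row in lst:
        for token in row:
            if token == word:  # count only matches, not every token
                cnt += 1
    return cnt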

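def words_list(file):
    """Return the non-empty lines of file, stripped of newlines (one word per line is assumed)."""
    lst = []
    for line in file:
        word = line.strip()
        if word:
            lst.append(word)
    return lst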

##def sort(words, occurrences, avg_scores, std_dev): 
## '''sorts and prints the output''' 
## menu = menu_validate("You must choose one of the valid choices of 1, 2, 3, 4 \n  Sort Options\n 1. Sort by Avg Ascending\n 2. Sort by Avg Descending\n 3. Sort by Std Deviation Ascending\n 4. Sort by Std Deviation Descending", 1, 4) 
## print ("{}{}{}{}\n{}".format("Word", "Occurence", "Avg. Score", "Std. Dev.", "="*51)) 
## if menu == 1: 
##  for i in range (len(word_list)): 
##   print ("{}{}{}{}".format(cnt_list.sorted[i],) 

from collections import OrderedDict

def make_odict(lst1, lst2):
    '''makes an ordered dictionary of keys/values from 2 lists of equal length'''
    dic = OrderedDict()
    for i in range(len(lst1)):
        dic[lst1[i]] = lst2[i]  # keys come from lst1, values from lst2
    return dic


cnt_list = []
while True:
    menu = menu_validate("1. Get sentiment for all words in a file? \nQ. Quit \n", 1, 1)
    if menu == 1:
        ratings_file = open("sample.txt")
        ratings_list = make_list(ratings_file)

        word_file = open_file("Enter the name of the file with words to score \n")
        word_list = words_list(word_file)
        for word in word_list:
            cnt_list.append(word_cnt(ratings_list, word))

Sorry, I know this is messy and very incomplete.

Answers

I think you mean:

import collections 

counts = collections.defaultdict(int) 

word = 'epic' 

counts[word] += 1 

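Obviously, you can do more with word than I have here, but you aren't showing us any code, so...

Edit

OK, looking at your code, I'd suggest you make a clear distinction between the ratings and the text. Take this: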
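If what you want is a list of ratings per word rather than a plain count, the same idea carries over to collections.defaultdict(list). This is only a minimal sketch: it assumes each review has already been split into a rating and a list of words, and the reviews data and the name ratings_by_word are made up for illustration.

```python
import collections

# Hypothetical sample data: (rating, words) pairs, not read from the real file.
reviews = [
    (4, ["a", "comedy", "of", "nearly", "epic", "proportions"]),
    (2, ["the", "story", "is", "an", "epic", "but", "also", "a", "tragedy"]),
    (1, ["a", "dull", "tragedy"]),
]

# Missing keys start out as an empty list, so no "if word in dict" check is needed.
ratings_by_word = collections.defaultdict(list)

for rating, words in reviews:
    # set() so that a review contributes its rating once per distinct word
    for word in set(words):
        ratings_by_word[word].append(rating)

print(ratings_by_word["epic"])     # [4, 2]
print(ratings_by_word["tragedy"])  # [2, 1]
```

Every word gets its own key automatically the first time it is seen, which covers the "new list for every word" requirement.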

def make_list(file):
    lst = []
    for line in file:
        lst2 = line.split(' ')
        del lst2[-1]
        lst.append(lst2)
    return lst

and turn it into something like this:

def parse_ratings(file):
    """
    Given a file of lines, each with a numeric rating at the start,
    parse the lines into score/text tuples, one per line. Return the
    list of parsed tuples.
    """
    ratings = []
    for line in file:
        text = line.strip().split()
        if text:
            score = text[0]
            ratings.append((score, text[1:]))
    return ratings
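To try this without a file on disk, io.StringIO can stand in for the open file. The sketch below is a variant under a hypothetical name, parse_ratings_int, that also converts the score to an int. It assumes every non-blank line starts with a whole number, and the two sample lines are shortened versions of the reviews in the question.

```python
import io

def parse_ratings_int(file):
    """Variant of parse_ratings that stores the score as an int.

    Assumption: every non-blank line starts with a whole number;
    int() raises ValueError otherwise.
    """
    ratings = []
    for line in file:
        text = line.strip().split()
        if text:
            ratings.append((int(text[0]), text[1:]))
    return ratings

# io.StringIO behaves like an open text file, so no real file is needed here.
sample = io.StringIO(
    "4 A comedy-drama of nearly epic proportions .\n"
    "\n"
    "2 Massoud 's story is an epic , but also a tragedy .\n"
)

ratings = parse_ratings_int(sample)
for score, words in ratings:
    print(score, words[:3])
```

The blank line in the sample is skipped by the "if text" check, so only two tuples come back.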

Then you can compute both values together:

def match_reviews(word, ratings):
    cnt = 0
    scores = []

    for score, text in ratings:
        n = text.count(word)
        if n:
            cnt += n
            scores.append(score)

    return (cnt, scores)
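To get one entry per word in your word list, you could call this once per word and keep the results in a dictionary keyed by the word. A rough sketch, assuming the scores have already been converted to int. The ratings and word_list values are invented stand-ins for your two files, and results is a hypothetical name.

```python
def match_reviews(word, ratings):
    # Same logic as the function above, repeated so this block runs on its own.
    cnt = 0
    scores = []
    for score, text in ratings:
        n = text.count(word)
        if n:
            cnt += n
            scores.append(score)
    return (cnt, scores)

# Hypothetical parsed data in the (score, words) shape that parse_ratings
# returns, with the scores already converted to int.
ratings = [
    (4, ["nearly", "epic", "proportions"]),
    (2, ["an", "epic", "but", "also", "a", "tragedy"]),
]
word_list = ["epic", "tragedy", "boring"]  # stand-in for the words file

results = {}
for word in word_list:
    cnt, scores = match_reviews(word, ratings)
    # Guard against words that never occur, to avoid dividing by zero.
    avg = sum(scores) / len(scores) if scores else None
    results[word] = (cnt, scores, avg)

print(results["epic"])    # (2, [4, 2], 3.0)
print(results["boring"])  # (0, [], None)
```

Note that a review contributes its score once even if the word occurs in it several times, while cnt counts every occurrence.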

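I've already got the counting part figured out. I need to be able to iterate through the .txt file, and every time the word the program is looking for shows up, it should append the integer in front of that word to a list.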

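OK, I've added some code that might help. I think you need to handle your data more formally. Separate the scores from the text, then keep each in one designated place. That way you will always know what is what.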