2016-10-14

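Dictionary list and comparing lists, Python

I've written a function that counts how many times each word is used in a file, i.e. the word frequency. The function can now compute the total number of words and show me the seven most common words along with how many times each one was used. Next, I want to compare the first file, whose word frequencies I analyzed, with another file that contains the most common words in English, and check whether any of those common words match the words I found in the first file.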

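My idea was to make a list from each of the two files and then compare them with each other. But the code I wrote for this doesn't give any output. Any ideas on how to fix this?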
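A minimal sketch of that comparison, assuming each file has already been turned into a list of lowercase words: converting both lists to sets and intersecting them gives the words the two files share. The function name `shared_words` and the sample lists are illustrative only, not part of the original code.

```python
def shared_words(text_words, common_words):
    """Return the words that appear in both lists, sorted alphabetically."""
    # Set intersection keeps each shared word once, no matter how often it repeats.
    return sorted(set(text_words) & set(common_words))


# Hypothetical sample data standing in for the two files.
alice_words = ["alice", "was", "beginning", "to", "get", "very", "tired", "of", "sitting"]
common = ["the", "of", "and", "to", "a", "was"]
print(shared_words(alice_words, common))  # -> ['of', 'to', 'was']
```

Sets are used because membership tests and intersections on them do not require scanning the whole list for every word.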

def CountWords():
    filename = input('What is the name of the textfile you want to open?: ')
    # `filename == "alice" or "alice-ch1.txt" or " "` would always be true
    # (a non-empty string is truthy), so test membership in a tuple instead.
    if filename in ("alice", "alice-ch1.txt", " "):
        print('You want to open alice-ch1.txt')
        wordcount = {}
        with open("alice-ch1.txt", "r") as file:
            for word in file.read().split():
                # Lowercase while counting; lowercasing the keys afterwards
                # lets "The" overwrite the count for "the" instead of adding to it.
                word = word.lower()
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1
        print(wordcount)

        total = sum(wordcount.values())  # avoid shadowing the built-in sum()
        print('The total amount of words in Alice adventures in wonderland: ' + str(total))

        # Sort (word, count) pairs by count; looking a word up by its count
        # with .index() prints the same word twice when two counts are equal.
        most_freq_7 = sorted(wordcount.items(), key=lambda kv: kv[1], reverse=True)[:7]
        print('Totoro says: The 7 most common words in Alice Adventures in Wonderland:')
        for word, count in most_freq_7:
            print(word + " " + str(count))

        # One common word per line; skip blank lines.
        with open("common-words.txt", "r") as file_common:
            commonwords = [line.strip().lower() for line in file_common if line.strip()]
        print(commonwords)

        # From here is the code where I need to find out how to compare the lists:
        alice_keys = set(wordcount.keys())
        # Words that occur both in the common-words list and in Alice.
        result = alice_keys.intersection(commonwords)
        # Print the shared words themselves; collecting the Alice words that are
        # NOT in `result` would give the opposite of the similar words.
        print('Here are the similar words: ' + str(sorted(result)))
    else:
        print('I am sorry, that filename does not exist. Please try again.')
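One possible reason fewer words match than expected: `split()` only splits on whitespace, so punctuation stays attached ("tired.", "'What", "book,'") and such tokens never equal the bare entries in a common-words list. Below is a sketch that lowercases the text and keeps only runs of letters (optionally joined by apostrophes) before counting. The `tokenize` name and the regular expression are assumptions for illustration; the pattern only covers the letters a-z, so accented characters would be dropped.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase `text` and return its words without surrounding punctuation.

    A "word" here is a run of letters a-z, optionally joined by apostrophes,
    so "Alice's" stays one token (an assumed definition of a word).
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


sample = "Alice was beginning to get very tired. 'What is the use of a book,' thought Alice."

print(sample.split()[6])      # -> tired.   (full stop still attached)
print(tokenize(sample)[6])    # -> tired
counts = Counter(tokenize(sample))
print(counts.most_common(1))  # -> [('alice', 2)]
```

Applying the same `tokenize` step to both files means the two word lists are normalized the same way before they are compared.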

Answer


I'm not in front of an interpreter, so my code may be slightly off. But try something more like this:

from collections import Counter

with open("some_file_with_words") as f_file:
    # split() so that Counter counts words rather than single characters
    counter = Counter(f_file.read().split())
top_seven = counter.most_common(7)

with open("commonwords") as f_common:
    common_words = f_common.read().split()

for word, count in top_seven:
    if word in common_words:
        print("your word " + word + " is in the most common words! It appeared " + str(count) + " times!")

Thanks @bravosierra99! – Allizon


It comes out as characters, though: "your word e is in the most common words...." instead of words... – Allizon


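How is your common words file set up? I used .split(), which means the words need to be separated by whitespace. You'll have to adjust this code depending on how your common words file is set up. – bravosierra99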
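The character-by-character output described in these comments is consistent with passing the whole file contents to `Counter`: iterating over a `str` yields single characters, so `Counter(f_file.read())` counts letters, not words. A small demonstration, with made-up in-memory strings standing in for the two files, showing the difference and showing that `.split()` reads a one-word-per-line common-words file just as well as a space-separated one:

```python
from collections import Counter

text = "the cat and the hat"
common_file_contents = "the\nand\nof\n"  # stands in for a one-word-per-line file

by_char = Counter(text)          # a str iterates character by character
by_word = Counter(text.split())  # split() yields whitespace-separated words

# split() with no argument treats newlines like spaces, so both
# one-word-per-line and space-separated files become a plain word list.
common_words = common_file_contents.split()

matches = [word for word, count in by_word.most_common() if word in common_words]
print(by_char["t"], by_char["the"])  # -> 4 0
print(by_word.most_common(1))        # -> [('the', 2)]
print(matches)                       # -> ['the', 'and']
```

If the common-words file used another separator (commas, for example), `split()` would need that separator passed explicitly, as the comment above suggests.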