2010-08-29 76 views
1

Hi, this follows on from an earlier post: finding the common elements of a list

Consider the following list:

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']

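I want to count how many times each word that starts with a capital letter occurs, and display the top 3.

I am not interested in words that do not start with a capital letter.

If a word appears several times, sometimes starting with a capital letter and sometimes not, count only the occurrences that start with a capital letter.

This is what my code looks like right now: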

words = "" 
for word in open('novel.txt', 'rU'): 
     words += word 
words = words.split(' ') 
words= list(words) 
words = ('\n'.join(words)).split('\n') 

word_counter = {} 

for word in words: 

     if word in word_counter: 
      word_counter[word] += 1 
     else: 
      word_counter[word] = 1  
popular_words = sorted(word_counter, key = word_counter.get, reverse = True) 
top_3 = popular_words[:3] 

matches = [] 

for i in range(3): 

     print word_counter[top_3[i]], top_3[i] 
+0

Why aren't you using a Counter? (By the way, please accept an answer if it is the one that helped you most.) – kennytm 2010-08-29 12:41:42

+1

Is this homework? – Johnsyweb 2010-08-29 21:44:37

+0

If you are reading the words from a file, the Python list at the top of this question is irrelevant. – Johnsyweb 2010-08-29 21:45:48

Answers

1

In general, word[0].isupper() will tell you whether a word starts with a capital letter. Combine this into a list comprehension (or your loop):

[x for x in my_list if x[0].isupper()] 

(assuming there are no empty strings)

and you will get all the words that start with a capital letter.
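The sample list in the question ends with an empty string, and indexing an empty string with [0] raises IndexError. A minimal sketch of one way around that, assuming the list is called my_list as above (the data here is a shortened, made-up stand-in) and that slicing with [:1] is acceptable:

```python
# Hypothetical, shortened stand-in for the list in the question.
my_list = ['Jellicle', 'Cats', 'are', 'black', 'And', '']

# x[:1] is '' for an empty string, and ''.isupper() is False,
# so empty strings are skipped instead of raising IndexError.
capitalised = [x for x in my_list if x[:1].isupper()]

print(capitalised)
```

The same condition can be dropped into an ordinary for loop if a comprehension feels too dense.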

+0

I'm not sure how to add this to my program to make it work – user434180 2010-08-29 13:08:51

+0

@user434180: What have you tried? – Johnsyweb 2010-08-29 23:37:16

7
#uncomment to produce the word file 
##words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
##open('novel.txt','w').write('\n'.join(words)) 

import string 
cap_words = [word.strip(string.punctuation) for word in open('novel.txt').read().split() if word.istitle()] 
##print(cap_words) # debug 
try: 
    from collections import Counter # Python >= 2.7 
    print('Counter') 
    print(Counter(cap_words).most_common(3)) 
except ImportError: 
    print('Normal dict') 
    wordcount = dict() 
    for word in cap_words: 
        # start at 0 for unseen words, then add one 
        wordcount[word] = wordcount.get(word, 0) + 1 
    print(sorted(wordcount.items(), key = lambda x: x[1], reverse = True)[:3]) 

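I don't see why you would want to use 'rU' mode to keep the different kinds of line endings. I would normally just open the file the way I did in the edited code above. Edit: your words have punctuation attached, so clean those up with strip().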
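To illustrate the punctuation point: if the first-character test is used instead of istitle(), a leading quote or bracket would hide the capital letter, so it may help to strip before testing. A small sketch, with made-up tokens standing in for the output of split():

```python
import string

# Hypothetical tokens, as str.split() might produce them from prose.
tokens = ['"Jellicle', 'Cats,', 'are', 'white,', 'Moon.', '']

# Strip leading/trailing punctuation first, then test the first character.
cleaned = [t.strip(string.punctuation) for t in tokens]
cap_words = [w for w in cleaned if w[:1].isupper()]

print(cap_words)
```

Note that strip() only removes characters at the ends of a word, so an apostrophe inside a word is left alone.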

+0

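When I try this I get the error: Traceback (most recent call last): File "C:/Users/Adam/Desktop/Adam's work/2010/IST/python compt/f.py", line 1, in <module> from collections import Counter ImportError: cannot import name Counter – user434180 2010-08-29 12:59:32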

+2

As mentioned, you need Python 2.7 for collections.Counter to work – 2010-08-29 13:29:34

2

Here are some additional comments:

This:

words = "" 
for word in open('novel.txt', 'rU'): 
     words += word 
words = words.split(' ') 
words= list(words) 
words = ('\n'.join(words)).split('\n') 

can be replaced with:

text = open('novel.txt', 'rU').read() # read everything 
wordlist = text.split() # split on all whitespace 

But you never apply your "must start with a capital letter" requirement. Time to add it:

capwordlist = (word for word in wordlist if word.istitle()) 

For a single word, istitle() roughly means word[0].isupper() and word[1:].islower(). This means 'SO'.istitle() -> False

That may suit you, but perhaps you just want word[0].isupper() instead.
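A quick way to see where the two tests disagree is to run both over a few sample words. The words below are made up for illustration and are not from novel.txt:

```python
# Hypothetical sample words chosen to show where the two tests differ.
samples = ['Cats', 'SO', 'McCavity', 'cats', 'A']

for word in samples:
    print('%-10s istitle=%-5s first-upper=%s'
          % (word, word.istitle(), word[0].isupper()))

# Words accepted by each filter.
title_only = [w for w in samples if w.istitle()]
first_upper = [w for w in samples if w[0].isupper()]
```

Here istitle() rejects 'SO' and 'McCavity' because of the extra capitals, while the first-character test accepts both.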


This part is fine if you can't use collections.Counter (new in 2.7):

word_counter = {} 

for word in capwordlist: 

     if word in word_counter: 
      word_counter[word] += 1 
     else: 
      word_counter[word] = 1  
popular_words = sorted(word_counter, key = word_counter.get, reverse = True) 
top_3 = popular_words[:3] 

Otherwise this simply becomes:

from collections import Counter 

word_counter = Counter(capwordlist) 
top_3 = word_counter.most_common(3) # gives `word, count` pairs! 

This:

for i in range(3): 
     print word_counter[top_3[i]], top_3[i] 

can become:

for word in top_3: 
    print word_counter[word], word 
+0

'istitle()' is nice, but 'isupper()' seems to match the OP's requirements. Judging from the previous question, it seems Python 2.6 is all that is available (hence no Counter). – Johnsyweb 2010-08-29 21:43:53

0

Since you are not using Python 2.7 and don't have Counter:

from collections import defaultdict 
counter = defaultdict(int) 
words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
for word in (word for word in words if word[:1].isupper()): # [:1] is safe for the empty string 
    counter[word]+=1 
print counter 
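The snippet above stops at the raw counts. One possible way to get the top three from a defaultdict without Counter is heapq.nlargest, which is available in Python 2.6. This is only a sketch on a shortened, made-up word list:

```python
from collections import defaultdict
import heapq

# Hypothetical, shortened stand-in for the list in the question.
words = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'And', 'Jellicle', '']

counter = defaultdict(int)
for word in words:
    if word[:1].isupper():  # skips '' and lowercase words
        counter[word] += 1

# Pick the three (word, count) pairs with the largest counts.
top_3 = heapq.nlargest(3, counter.items(), key=lambda kv: kv[1])
print(top_3)
```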
3
print "\n".join(sorted(["%d %s" % (lst.count(i), i) \ 
      for i in set(lst) if i.istitle()])[-3:]) 
2 And 
5 Cats 
6 Jellicle 
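One caveat: the snippet above sorts the formatted strings, so counts are compared as text and "10 Cats" would sort before "9 Moon". A sketch that sorts (count, word) tuples instead, assuming lst is the word list (the list here is a made-up stand-in with counts above 9):

```python
# Hypothetical stand-in for `lst`; 'Cats' appears 10 times, 'Moon' 9 times.
lst = ['Cats'] * 10 + ['Moon'] * 9 + ['And'] * 2 + ['they', '']

# Sort (count, word) tuples so that counts compare numerically.
pairs = sorted((lst.count(w), w) for w in set(lst) if w[:1].isupper())
top_3 = pairs[-3:]

print("\n".join("%d %s" % pair for pair in top_3))
```

Calling lst.count() once per distinct word rescans the whole list each time, which is fine for a short list but slow for a whole novel.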
2

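One thing I would avoid is reading all the words in before processing them. It will work, but IMHO it's better not to do that if you don't need to, and you don't. Here is my solution (with elements generously stolen from the previous ones!), done with 2.6.2: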

import sys 

# a generator function which iterates over the words in a file 
def words(f): 
    for line in f: 
     for word in line.split(): 
      yield word 

# returns a generator expression filtering an iterator down to titlecase words 
def titles(s): 
    return (word for word in s if word.istitle()) 

# count the titlecase words in the file 
count = {} 
for word in titles(words(file(sys.argv[1]))): 
    count[word] = count.get(word, 0) + 1 

# build a list of tuples with the count for each word 
countsAndWords = [(kv[1], kv[0]) for kv in count.iteritems()] 

# put them in decreasing order 
countsAndWords.sort() 
countsAndWords.reverse() 

# print the top three 
for count, word in countsAndWords[:3]: 
    print word, count 

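I sorted on the counts with a decorate-sort-undecorate, rather than doing a sort with a comparison that looks things up in the count dictionary; it's less elegant, but I believe it will be faster. This may well be a sin.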
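For comparison, here is a sketch of both approaches side by side on a small, made-up count dictionary. It only shows that they agree on the result; which one is faster is not measured here:

```python
# Hypothetical counts, shaped like the `count` dictionary above.
count = {'Jellicle': 6, 'Cats': 5, 'And': 2, 'They': 1, 'Moon': 1}

# Decorate-sort-undecorate: sort (count, word) tuples directly.
decorated = sorted(((n, w) for w, n in count.items()), reverse=True)
dsu_top = [(w, n) for n, w in decorated[:3]]

# Key-function sort: sort the words, looking each count up in the dictionary.
key_top = [(w, count[w])
           for w in sorted(count, key=count.get, reverse=True)[:3]]

print(dsu_top)
print(key_top)
```

The two can order tied counts differently, since the tuple sort falls back to comparing the words while the key sort keeps the original order.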

0

You could use itertools:

import itertools 

words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
capwords = (word for word in words if len(word) > 1 and word[0].isupper()) 
capwordssorted = sorted(capwords) 
wordswithcounts = ((k,len(list(g))) for (k,g) in itertools.groupby(capwordssorted)) 
print sorted(wordswithcounts,key=lambda x:x[1],reverse=True)[:3]
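The sorted() call before groupby matters, because itertools.groupby only groups adjacent equal items. A small sketch with a made-up list shows the difference:

```python
import itertools

# Hypothetical word list with repeated, non-adjacent entries.
words = ['Cats', 'And', 'Cats', 'Moon', 'Cats']

# Without sorting, each run of adjacent equal words becomes its own group.
unsorted_counts = [(k, len(list(g))) for k, g in itertools.groupby(words)]

# After sorting, equal words are adjacent, so each word yields one group.
sorted_counts = [(k, len(list(g)))
                 for k, g in itertools.groupby(sorted(words))]

print(unsorted_counts)
print(sorted_counts)
```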