Finding the common elements of a list

Consider the following list:

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']

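I want to count how many times each word that starts with a capital letter appears, and display the top 3.

I am not interested in words that do not start with a capital.

If a word appears several times, sometimes starting with a capital and sometimes not, only count the times it starts with a capital.

This is what my code looks like at the moment: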

words = "" 
for word in open('novel.txt', 'rU'): 
     words += word 
words = words.split(' ') 
words= list(words) 
words = ('\n'.join(words)).split('\n') 

word_counter = {} 

for word in words:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1
popular_words = sorted(word_counter, key = word_counter.get, reverse = True) 
top_3 = popular_words[:3] 

matches = [] 

for i in range(3):
    print word_counter[top_3[i]], top_3[i]

Source

2010-08-29 user434180

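Why aren't you using Counter? (By the way, please accept an answer if it was the most helpful to you.) – kennytm 2010-08-29 12:41:42

Is this homework? – Johnsyweb 2010-08-29 21:44:37

If you are reading the words from a file, the Python list at the top of this question is irrelevant. – Johnsyweb 2010-08-29 21:45:48

In general, word[0].isupper() will tell you whether a word starts with a capital letter. Combine this into a list comprehension (or your loop)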

[x for x in my_list if x[0].isupper()]

(assuming there are no empty strings)

and you will get all the words that start with a capital letter.
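Note that the word list in the question ends with an empty string, for which `x[0]` raises an IndexError. A minimal sketch of one way to guard against that; `my_list` here is a short made-up sample, not the asker's file data:

```python
# Sample data, including an empty string like the one at the end of the question's list.
my_list = ['Jellicle', 'Cats', 'are', 'black', 'And', '']

# `x and ...` short-circuits on empty strings, so x[0] is never evaluated for ''.
cap_words = [x for x in my_list if x and x[0].isupper()]

print(cap_words)
```

An alternative with the same effect is `x[:1].isupper()`, since slicing an empty string yields another empty string instead of raising.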

Source

2010-08-29 12:38:26

I'm not sure how to add this to my program to make it work – user434180 2010-08-29 13:08:51

@user434180: What have you tried? – Johnsyweb 2010-08-29 23:37:16

#uncomment to produce the word file 
##words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
##open('novel.txt','w').write('\n'.join(words)) 

import string 
cap_words = [word.strip(string.punctuation) for word in open('novel.txt').read().split() if word.istitle()] 
##print(cap_words) # debug 
try: 
    from collections import Counter # Python >= 2.7 
    print('Counter') 
    print(Counter(cap_words).most_common(3)) 
except ImportError: 
    print('Normal dict') 
    wordcount= dict() 
    for word in cap_words:
        wordcount[word] = (wordcount[word] + 1
                           if word in wordcount
                           else 1)
    print(sorted(wordcount.items(), key = lambda x: x[1], reverse = True)[:3])

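I don't understand why you want to use 'rU' mode to deal with different kinds of line endings. I would just open the file normally, as I wrote in the edited code above. Edit: you have punctuation attached to the words, so clean that off with strip().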
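As a small illustration of the strip() suggestion: `str.strip(string.punctuation)` removes punctuation from both ends of a token only. A sketch with made-up sample tokens:

```python
import string

tokens = ['Cats', 'white,', 'small;', 'Moon.', "don't"]

# strip() with string.punctuation trims leading and trailing punctuation only;
# the apostrophe inside "don't" is left alone.
cleaned = [t.strip(string.punctuation) for t in tokens]

print(cleaned)
```

Without this step, 'Moon' and 'Moon.' would be counted as two different words.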

Source

2010-08-29 12:53:25

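When I try this I get the error: Traceback (most recent call last): File "C:/Users/Adam/Desktop/Adam's work/2010/IST/python compt/f.py", line 1, in <module> from collections import Counter ImportError: cannot import name Counter – user434180 2010-08-29 12:59:32

As mentioned, you need Python 2.7 for collections.Counter to work – 2010-08-29 13:29:34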

Here are some additional comments.

This:

words = ""
for word in open('novel.txt', 'rU'):
    words += word
words = words.split(' ')
words = list(words)
words = ('\n'.join(words)).split('\n')

can be replaced with:

text = open('novel.txt', 'rU').read() # read everything
wordlist = text.split() # split on all whitespace

But you never apply your "must start with a capital letter" requirement. Time to add it:

capwordlist = (word for word in wordlist if word.istitle())

For a single word, istitle() means roughly word[0].isupper() and word[1:].islower(). This means 'SO'.istitle() -> False.

That may suit you, but maybe you just want word[0].isupper() instead.

This part is fine if you can't use collections.Counter (new in 2.7):
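To make the difference concrete, here is a small sketch comparing the two tests on a few made-up sample words:

```python
samples = ['Jellicle', 'SO', 'McDonald', 'cats', 'A']

for word in samples:
    # word[:1] is used instead of word[0] so an empty string would not raise.
    print("%-10s istitle=%-5s first_upper=%s"
          % (word, word.istitle(), word[:1].isupper()))

title_words = [w for w in samples if w.istitle()]
upper_first = [w for w in samples if w[:1].isupper()]
```

Here `title_words` keeps only 'Jellicle' and 'A', while `upper_first` also keeps 'SO' and 'McDonald', which is closer to "starts with a capital letter".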

word_counter = {} 

for word in capwordlist:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1
popular_words = sorted(word_counter, key = word_counter.get, reverse = True) 
top_3 = popular_words[:3]

Otherwise this simply becomes:

from collections import Counter 

word_counter = Counter(capwordlist)
top_3 = word_counter.most_common(3) # gives `word, count` pairs!

This:

for i in range(3):
    print word_counter[top_3[i]], top_3[i]

can be written like this:

for word in top_3: 
    print word_counter[word], word
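Note that the loop above expects `top_3` to be a list of words, as in the plain dictionary version. With `Counter.most_common`, `top_3` holds (word, count) pairs, so the loop would unpack them instead. A sketch, assuming Python 2.7 or later and a small made-up sample list:

```python
from collections import Counter

capwordlist = ['Jellicle', 'Cats', 'Jellicle', 'And', 'Cats', 'Jellicle']

word_counter = Counter(capwordlist)
top_3 = word_counter.most_common(3)  # list of (word, count) pairs

for word, count in top_3:
    print("%d %s" % (count, word))
```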

Source

2010-08-29 13:09:21

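'istitle()' is fine, but 'isupper()' seems to match the OP's requirement. Judging by the previous question, it seems Python 2.6 is all that is available (hence no Counter). – Johnsyweb 2010-08-29 21:43:53

Since you are not using Python 2.7 and do not have Counter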

from collections import defaultdict 
counter = defaultdict(int) 
words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
for word in (word for word in words if word[:1].isupper()):  # word[:1] avoids an IndexError on the empty string ''
    counter[word]+=1 
print counter
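The snippet above prints the whole dictionary. To get just the three most frequent words, the counts can be sorted. A sketch that works without Counter, using a shortened made-up sample list:

```python
from collections import defaultdict

words = ['Jellicle', 'Cats', 'are', 'Jellicle', 'And', 'Cats', 'Jellicle', '']

counter = defaultdict(int)
for word in words:
    # word[:1] is '' for an empty string, so isupper() is simply False.
    if word[:1].isupper():
        counter[word] += 1

# Sort (word, count) pairs by count, largest first, and keep three.
top_3 = sorted(counter.items(), key=lambda item: item[1], reverse=True)[:3]
print(top_3)
```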

Source

2010-08-29 13:15:34

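# lst is the list of words from the question
# note: this sorts the formatted strings, so it only orders correctly while counts are single digits
print "\n".join(sorted(["%d %s" % (lst.count(i), i)
      for i in set(lst) if i.istitle()])[-3:])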
2 And 
5 Cats 
6 Jellicle

Source

2010-08-29 19:02:06 killown

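One thing I would avoid is reading in all the words before processing them. It will work, but IMHO it is better not to do it if you don't need to, and you don't. Here is my solution (with elements generously stolen from the previous ones!), done with 2.6.2: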

import sys 

# a generator function which iterates over the words in a file 
def words(f): 
    for line in f:
        for word in line.split():
            yield word

# returns a generator expression filtering an iterator down to titlecase words 
def titles(s): 
    return (word for word in s if word.istitle()) 

# count the titlecase words in the file 
count = {} 
for word in titles(words(file(sys.argv[1]))): 
    count[word] = count.get(word, 0) + 1 

# build a list of tuples with the count for each word 
countsAndWords = [(kv[1], kv[0]) for kv in count.iteritems()] 

# put them in decreasing order 
countsAndWords.sort() 
countsAndWords.reverse() 

# print the top three 
for count, word in countsAndWords[:3]: 
    print word, count

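I sorted on the counts using decorate-sort-undecorate, minus the undecorate, rather than doing a sort with a comparison that looks each word up in the count dictionary; it is less elegant, but I believe it will be faster. That is probably an evil thing to do.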
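For comparison, here is a sketch of both approaches side by side. The `count` dictionary below is a hand-written sample, and which approach is faster would have to be measured rather than assumed:

```python
count = {'Jellicle': 6, 'Cats': 5, 'And': 2, 'They': 1, 'Moon': 1}

# Decorate: build (count, word) tuples, then sort them in decreasing order.
decorated = sorted(((n, w) for w, n in count.items()), reverse=True)
top_decorated = [w for n, w in decorated[:3]]

# Key function: sort the words, looking each count up in the dictionary.
top_keyed = sorted(count, key=count.get, reverse=True)[:3]

print(top_decorated)
print(top_keyed)
```

The two can order words with equal counts differently, because the tuple sort breaks ties by comparing the words themselves.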

Source

2010-08-29 22:16:20

You could use itertools

import itertools 

words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
capwords = (word for word in words if len(word) > 1 and word[0].isupper()) 
capwordssorted = sorted(capwords) 
wordswithcounts = ((k,len(list(g))) for (k,g) in itertools.groupby(capwordssorted)) 
print sorted(wordswithcounts,key=lambda x:x[1],reverse=True)[:3]
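The sorted() call before groupby matters: `itertools.groupby` only groups consecutive equal items. A short sketch with made-up sample data showing the difference:

```python
import itertools

words = ['Cats', 'Jellicle', 'Cats', 'Jellicle', 'Jellicle']

# Without sorting, equal words that are not adjacent land in separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(words)]

# After sorting, each distinct word forms exactly one group.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(words))]

print(unsorted_groups)
print(sorted_groups)
```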

Source

2010-08-30 11:39:59 user196636
