Python: Print the counts of all terms in alphabetical order, even if zero

I'm running a loop over 360+ txt files that counts the occurrences of certain words in each file. The code is below:

>>> cnt = Counter()
>>> def process(filename):
...     words = re.findall('\w+', open(filename).read().lower())
...     for word in words:
...         if word in words_fra:
...             cnt[word] += 1
...         if word in words_1:
...             cnt[word] += 1
...     print cnt
...     cnt.clear()
...
>>> for filename in os.listdir("C:\Users\Cameron\Desktop\Project"):
...     process(filename)

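I have two lists, words_fra and words_1, with about 10-15 words in each. This outputs the matched words with their counts, but it doesn't print the words with zero counts, and it lists the words in order of frequency. Example output: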

Counter({'prices': 140, 'inflation': 107, 'labor': 46, 'price': 34, 'wage': 27,  'productivity': 26, 'capital': 21, 'workers': 20, 'wages': 19, 'employment': 18, 'investment': 14, 'unemployment': 13, 'construction': 13, 'production': 11, 'inflationary': 10, 'housing': 8, 'credit': 8, 'job': 7, 'industry': 7, 'jobs': 6, 'worker': 4, 'tax': 2, 'income': 2, 'aggregates': 1, 'payments': 1}) 
Counter({'inflation': 193, 'prices': 118, 'price': 97, 'labor': 58, 'unemployment': 42, 'wage': 32, 'productivity': 32, 'construction': 22, 'employment': 18, 'wages': 17, 'industry': 17, 'investment': 16, 'income': 16, 'housing': 15, 'production': 13, 'job': 13, 'inflationary': 12, 'workers': 9, 'aggregates': 9, 'capital': 5, 'jobs': 5, 'tax': 4, 'credit': 3, 'worker': 2})

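I'm fine with the format, but I need a count for every word, even if it's zero, and I need the counts returned in alphabetical order rather than by frequency.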

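How can I fix my code to accomplish this? Ideally I could also turn the output into a nice CSV format, with the words as column headers and the counts as row values.

Thanks!

Edit: The top line is what the current output looks like. The bottom line is what I'd like it to look like.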

Wordlist="a b c d" 
Counter({'c': 4, 'a': 3, 'b':1}) 
Counter({'a': 3, 'b': 1, 'c': 4, 'd': 0})

2013-02-18 CoS

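Hmm... 'counts for all the words'? Does that mean words that appear in at least one of the files found (but not in the file you're currently looking at)? – SingleNegationElimination 2013-02-18 03:59:13

Sorry, I need each Counter output (one per file) to list every word in order, even if zero – CoS 2013-02-18 04:21:52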

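To print all the words in your word list, loop over the word list before you start looking for words in the file, and add each word to the result dictionary with a count of 0.

To print them in the right order, use the built-in sorted().

Something like this: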

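import re

# words_fra and words_1 are the two word lists from the question
wordlist = words_fra + words_1

# Start every word at 0 so that words never seen still show up
cnt = {}
for word in wordlist:
    cnt[word] = 0

words = re.findall(r'\w+', open('foo.html').read().lower())
for word in words:
    if word in wordlist:
        cnt[word] += 1

# Sorting the (word, count) pairs orders them alphabetically by word
for result in sorted(cnt.items()):
    print("{0} appeared {1} times".format(*result))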

If you want to sort so that the most common words come first, do this:

# Sort by count; reverse=True puts the highest counts first
for result in sorted(cnt.items(), key=lambda x: x[1], reverse=True):
    print("{0} appeared {1} times".format(*result))
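
As a self-contained check of the approach above, here is a minimal sketch that uses the `a b c d` word list from the question's edit. The function name `count_terms` and the sample text are made up for illustration and are not part of the original answer.

```python
import re

def count_terms(text, wordlist):
    """Count each word from wordlist in text, keeping zero counts."""
    # Seed every term with 0 so that terms never seen still appear
    counts = dict.fromkeys(wordlist, 0)
    for word in re.findall(r'\w+', text.lower()):
        if word in counts:
            counts[word] += 1
    return counts

# Hypothetical sample text, chosen to match the counts in the question's edit
sample_text = "a c b c a c a c"
counts = count_terms(sample_text, "a b c d".split())

# Alphabetical order: sort the (word, count) pairs by word
alphabetical = sorted(counts.items())

# Most frequent first, ties broken alphabetically
by_frequency = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

for word, count in alphabetical:
    print("{0} appeared {1} times".format(word, count))
```

Sorting on the tuple `(-count, word)` is one way to get a stable, predictable order when several words share the same count.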

2013-02-18 04:00:17

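To add the words with 0 values, you could do 'Counter.fromkeys(all_words, 0) + result_counter' – mgilson 2013-02-18 04:03:54

Yes, using Counter here is a good idea, but this is clearly a beginner's question, and doing it without help from the stdlib may be a good idea too. :-) – 2013-02-18 04:05:09

But it looks like the OP is already using Counter... – mgilson 2013-02-18 04:08:20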

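Adding two Counter objects with + drops entries whose count is 0. If you want the resulting Counter to keep them, you have to override the __add__ method. For example: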

In [8]: from collections import Counter 

In [9]: Counter({'red': 4, 'blue': 2,'white':0})+Counter({'red': 4, 'blue': 2,'white':0}) 
Out[9]: Counter({'red': 8, 'blue': 4}) 

In [10]: 
    ...: class Counter(Counter): 
    ...:     def __add__(self, other): 
    ...:         if not isinstance(other, Counter): 
    ...:             return NotImplemented 
    ...:         result = Counter() 
    ...:         for elem, count in self.items(): 
    ...:             newcount = count + other[elem] 
    ...:             result[elem] = newcount 
    ...:         for elem, count in other.items(): 
    ...:             if elem not in self: 
    ...:                 result[elem] = count 
    ...:         return result 
    ...: 

In [11]: Counter({'red': 4, 'blue': 2,'white':0})+Counter({'red': 4, 'blue': 2,'white':0}) 
Out[11]: Counter({'red': 8, 'blue': 4, 'white': 0}) #<-- now you see that `0` has been added to the resultant Counter
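
Subclassing is not the only way to keep zero counts. As a minimal sketch (the variable names and sample words are illustrative assumptions, not from the original answer), a `Counter` can be seeded with explicit zeros and then filled with `update()`, which adds to existing counts in place and does not discard entries that are zero:

```python
from collections import Counter

# Hypothetical term list and words found in one file
terms = ['red', 'blue', 'white']
found = ['red', 'blue', 'red', 'green']

# Seed every term with an explicit 0
cnt = Counter(dict.fromkeys(terms, 0))

# update() adds counts in place; unlike "+", it does not drop zero entries
cnt.update(word for word in found if word in cnt)

# Print in alphabetical order, zeros included
for term in sorted(cnt):
    print(term, cnt[term])
```

This leaves the standard `Counter` class untouched, at the cost of having to filter out words that are not in the term list before counting.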

2013-02-18 04:12:45 namit

for word in sorted(words_fra + words_1): 
    print word, cnt[word]
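
The question also asked for CSV output with the words as column headers and one row of counts per file. A minimal sketch of that, assuming Python 3 and the standard `csv` module; the function name `write_counts_csv`, the `file` column, and the sample data are hypothetical:

```python
import csv
import io

def write_counts_csv(out, terms, per_file_counts):
    """Write one row per file, with the terms as alphabetical column headers."""
    columns = sorted(terms)
    writer = csv.writer(out)
    writer.writerow(['file'] + columns)
    for filename, counts in per_file_counts:
        # counts.get(term, 0) supplies 0 for terms that never appeared
        writer.writerow([filename] + [counts.get(term, 0) for term in columns])

# Hypothetical sample data standing in for the real per-file counts
terms = ['wage', 'inflation', 'labor']
per_file_counts = [
    ('a.txt', {'inflation': 107, 'labor': 46}),
    ('b.txt', {'wage': 32, 'inflation': 193, 'labor': 58}),
]

buffer = io.StringIO()
write_counts_csv(buffer, terms, per_file_counts)
print(buffer.getvalue())
```

To write to a real file instead of an in-memory buffer, open it with `newline=''`, as the `csv` module documentation recommends, and pass the file object as `out`.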

2013-02-18 04:15:24

I should put this code in place of 'print cnt', right? – CoS 2013-02-18 04:20:42

@CoS, that's right – 2013-02-18 04:28:08
