2017-01-08

Count the occurrences of all letters in a text file

I am trying to open a file and count how many times each letter occurs.

This is what I have so far:

def frequencies(filename): 
    infile=open(filename, 'r') 
    wordcount={} 
    content = infile.read() 
    infile.close() 
    counter = {} 
    invalid = "‘'`,.?!:;-_\n—' '" 

    for word in content: 
     word = content.lower() 
     for letter in word: 
      if letter not in invalid: 
       if letter not in counter: 
        counter[letter] = content.count(letter) 
        print('{:8} appears {} times.'.format(letter, counter[letter])) 

Any help would be greatly appreciated.

Answers

Use the numpy package.

A good way to do it would be something like this:

import numpy 
text = "xvasdavawdazczxfawaczxcaweac" 
text = list(text) 
a,b = numpy.unique(text, return_counts=True) 
x = sorted(zip(b,a), reverse=True) 
print(x) 
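In your case, you can join all the words into a single string and then convert that string into a list of characters. If you want to remove everything except letters, you can clean it with a regular expression first: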

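import re
import numpy

# content is the text read from the file, as in the question
# remove everything except the ASCII letters a-z and A-Z
content = re.sub(r'[^a-zA-Z]', r'', content)
# convert to a list of single characters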
content = list(content) 
a,b = numpy.unique(content, return_counts=True) 
x = sorted(zip(b,a), reverse=True) 
print(x) 
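To make the cleaning step concrete, here is a small sketch using only the standard library on a made-up sample string. The `.lower()` call is my own addition, assuming you want upper- and lowercase letters counted together as in the question; it is not part of the snippet above.

```python
import re

# hypothetical sample text standing in for the file contents
sample = "Hello, World! 123"

# drop every character that is not an ASCII letter, then fold case
cleaned = re.sub(r'[^a-zA-Z]', '', sample).lower()
print(cleaned)  # helloworld

# a list of single characters, ready to pass to numpy.unique
chars = list(cleaned)
print(len(chars))  # 10
```

Note that the pattern `[^a-zA-Z]` also strips accented and other non-ASCII letters, which may or may not be what you want.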

If you are looking for a solution that doesn't use numpy:

invalid = set([ch for ch in "‘'`,.?!:;-_\n—' '"]) 

def frequencies(filename): 
    counter = {} 
    with open(filename, 'r') as f: 
     for ch in (char.lower() for char in f.read()): 
      if ch not in invalid: 
       if ch not in counter: 
        counter[ch] = 0 
       counter[ch] += 1 

     results = [(counter[ch], ch) for ch in counter] 
     return sorted(results) 

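# filename must be set to the path of your text file before this loop runs
for result in reversed(frequencies(filename)):
    print(result)  # print() as a function, so this also runs on Python 3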

I'd suggest using collections.Counter instead.

A compact solution:

from collections import Counter 
from string import ascii_lowercase # a-z string 

VALID = set(ascii_lowercase) 

with open('in.txt', 'r') as fin: 
    counter = Counter(char.lower() for line in fin for char in line if char.lower() in VALID) 
    print(counter.most_common()) # print values in order of most common to least. 
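If you want to try the same idea without a file on disk, something like the following should work. The helper name `count_letters` and the sample string are just for illustration; they are not part of the solution above.

```python
from collections import Counter
from string import ascii_lowercase

VALID = set(ascii_lowercase)

def count_letters(text):
    """Count the ASCII letters in text, ignoring case (hypothetical helper)."""
    return Counter(ch for ch in text.lower() if ch in VALID)

# made-up sample input instead of reading 'in.txt'
counts = count_letters("Hello, World!")
print(counts.most_common(2))  # [('l', 3), ('o', 2)]
```

`most_common(n)` returns the `n` most frequent items as `(letter, count)` pairs; called with no argument it returns all of them, most frequent first.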

A more readable solution:

from collections import Counter 
from string import ascii_lowercase # a-z string 

VALID = set(ascii_lowercase) 

with open('in.txt', 'r') as fin: 
    counter = Counter() 
    for char in (char.lower() for line in fin for char in line): 
     if char in VALID: 
      counter[char] += 1 
    print(counter) 

If you don't want to use Counter, you can just use a dict:

from string import ascii_lowercase # a-z string 

VALID = set(ascii_lowercase) 

with open('test.txt', 'r') as fin: 
    counter = {} 
    for char in (char.lower() for line in fin for char in line): 
     if char in VALID: 
      # add the letter to dict 
      # dict.get used to either get the current count value 
      # or default to 0. Saves checking if it is in the dict already 
      counter[char] = counter.get(char, 0) + 1 
    # sort the values by occurrence in descending order 
    data = sorted(counter.items(), key = lambda t: t[1], reverse = True) 
    print(data) 
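If you'd rather get output in the format from the question instead of a list of tuples, a rough sketch could look like this. The function name `format_report` and the example counts are made up; ties are broken alphabetically here, which is my own choice.

```python
def format_report(counts):
    """Return one line per letter, most frequent first (hypothetical helper)."""
    # sort by descending count, then alphabetically for equal counts
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ['{:8} appears {} times.'.format(letter, n) for letter, n in rows]

# made-up counts; in practice pass the dict (or Counter) built above
example_counts = {'a': 3, 'b': 1, 'c': 3}
for line in format_report(example_counts):
    print(line)
```

This works with either a plain dict or a Counter, since both provide `.items()`.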