Letter frequency in Python

I need to make a program that prints the frequency of each letter in a text file and compares that frequency with another one, in Python.

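So far I can print the number of times each letter occurs, but the percentage frequencies I get are wrong. I think this is because my program needs to count only the letters in the file, leaving out all spaces and other characters.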

def addLetter (x): 
    result = ord(x) - ord('a') 
    return result 


#start of the main program 
#prompt user for a file 

while True: 
    speech = raw_input("Enter file name:") 

    wholeFile = open(speech, 'r+').read() 
    lowlet = wholeFile.lower() 
    letters= list(lowlet) 
    alpha = list('abcdefghijklmnopqrstuvwxyz') 
    n = len(letters) 
    f = float(n) 
    occurrences = {} 
    d = {} 


    #number of letters 
    for x in alpha: 
        occurrences[x] = letters.count(x) 
        d[x] = (occurrences[x])/f 
    for x in occurrences: 
        print x, occurrences[x], d[x]

This is the output:

Enter file name:dems.txt 
a 993 0.0687863674148 
c 350 0.0242449431976 
b 174 0.0120532003325 
e 1406 0.0973954003879 
d 430 0.0297866444999 
g 219 0.015170407315 
f 212 0.0146855084511 
i 754 0.0522305347742 
h 594 0.0411471321696 
k 81 0.00561097256858 
j 12 0.000831255195345 
m 273 0.0189110556941 
l 442 0.0306178996952 
o 885 0.0613050706567 
n 810 0.0561097256858 
q 9 0.000623441396509 
p 215 0.0148933222499 
s 672 0.0465502909393 
r 637 0.0441257966196 
u 305 0.021127736215 
t 1175 0.0813937378775 
w 334 0.0231366029371 
v 104 0.00720421169299 
y 212 0.0146855084511 
x 13 0.000900526461624 
z 6 0.000415627597672 
Enter file name:

The program prints the results in columns, but I don't really know how to show that here.

The frequency of "a" should be 0.0878.
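As a quick arithmetic check, here is a sketch in Python 3 that uses only the counts printed above. The ratio printed for a reveals the denominator the program actually used (every character in the file), while summing the 26 counts gives the number of letters only, which is the denominator the question is after.

```python
# Letter counts copied from the program output above.
counts = {
    'a': 993, 'b': 174, 'c': 350, 'd': 430, 'e': 1406, 'f': 212,
    'g': 219, 'h': 594, 'i': 754, 'j': 12, 'k': 81, 'l': 442,
    'm': 273, 'n': 810, 'o': 885, 'p': 215, 'q': 9, 'r': 637,
    's': 672, 't': 1175, 'u': 305, 'v': 104, 'w': 334, 'x': 13,
    'y': 212, 'z': 6,
}

# Denominator the program used, recovered from the ratio printed for 'a'.
all_chars = round(993 / 0.0687863674148)

# Denominator it should use: the number of letters only.
letters_only = sum(counts.values())

print(all_chars, letters_only, round(counts['a'] / letters_only, 4))
# 14436 11317 0.0877
```

The 3119-character gap is everything that is not a-z after lowercasing (spaces, punctuation, line breaks and so on). Dividing by the letter total gives about 0.0877 for a, close to the expected 0.0878.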

2011-02-28 SimplyZ

Is this homework? – 2011-02-28 23:42:22

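Your "f" variable holds the total length of the list, not the number of alphabetic characters in it. Also, don't use SO to cheat on your homework. If you don't work it out yourself, you'll never learn it. – 2011-02-28 23:43:44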
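A minimal sketch of the fix this comment points at, in Python 3 and keeping the question's variable names: the counting loop stays the same and only the denominator changes, from the length of the whole text to the sum of the letter counts. The function name letter_frequencies is hypothetical, and it takes the file contents as a string so it can be tried without a file.

```python
def letter_frequencies(text):
    """Count a-z in text and divide by the number of letters only."""
    lowlet = text.lower()
    alpha = 'abcdefghijklmnopqrstuvwxyz'
    occurrences = {}
    d = {}
    for x in alpha:
        occurrences[x] = lowlet.count(x)
    # Denominator: alphabetic characters only, not len(lowlet),
    # which also counts spaces, punctuation and newlines.
    f = float(sum(occurrences.values()))
    for x in alpha:
        d[x] = occurrences[x] / f if f else 0.0
    return occurrences, d


occurrences, d = letter_frequencies("Hello, World!")
for x in 'lo':
    print(x, occurrences[x], d[x])
# l 3 0.3
# o 2 0.2
```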

Yes, it is an assignment. I wasn't trying to cheat; I had just hit a dead end and needed some guidance. Thanks for your help. – SimplyZ 2011-03-01 02:19:55

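You could use the translator recipe to remove every character that is not in alpha. Since that leaves letters containing only characters from alpha, n is now the correct denominator.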
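The recipe relies on string.maketrans and the two-argument str.translate(table, deletechars), which exist only in Python 2. A sketch of the same keep-only-a-z filter in Python 3, assuming the text has already been lowercased; keep_alpha here is a hypothetical stand-in for the recipe's closure, not the recipe itself.

```python
ALPHA = set('abcdefghijklmnopqrstuvwxyz')


def keep_alpha(text):
    """Return text with every character outside a-z removed."""
    return ''.join(ch for ch in text if ch in ALPHA)


letters = keep_alpha("It's 9 o'clock!".lower())
n = len(letters)  # now counts letters only
print(letters, n)  # itsoclock 9
```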

Then you can count the occurrences of each letter with collections.defaultdict(int):

import collections 
import string 

def translator(frm='', to='', delete='', keep=None): 
    # Python Cookbook Recipe 1.9 
    # Chris Perkins, Raymond Hettinger 
    if len(to) == 1: to = to * len(frm) 
    trans = string.maketrans(frm, to) 
    if keep is not None: 
        allchars = string.maketrans('', '') 
        # delete is expanded to delete everything except 
        # what is mentioned in set(keep)-set(delete) 
        delete = allchars.translate(allchars, keep.translate(allchars, delete)) 
    def translate(s): 
        return s.translate(trans, delete) 
    return translate 

alpha = 'abcdefghijklmnopqrstuvwxyz' 
keep_alpha=translator(keep=alpha) 

while True: 
    speech = raw_input("Enter file name:") 
    wholeFile = open(speech, 'r+').read() 
    lowlet = wholeFile.lower() 
    letters = keep_alpha(lowlet) 
    n = len(letters) 
    occurrences = collections.defaultdict(int)  
    for x in letters: 
        occurrences[x] += 1 
    for x in occurrences: 
        print x, occurrences[x], occurrences[x]/float(n)
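A Python 3 sketch of the same defaultdict(int) counting, with two assumptions not in the answer: the work is split into hypothetical helpers count_letters and print_frequencies that take a string instead of prompting for a file, and the output is sorted. The question's output comes out in a scrambled order (a, c, b, e, d, ...) because Python 2 dictionaries have no defined iteration order; sorted() prints the letters alphabetically.

```python
import collections

ALPHA = 'abcdefghijklmnopqrstuvwxyz'


def count_letters(text):
    """Count each a-z letter in text, ignoring case and non-letters."""
    occurrences = collections.defaultdict(int)
    for ch in text.lower():
        if ch in ALPHA:
            occurrences[ch] += 1  # missing keys start at int() == 0
    return occurrences


def print_frequencies(occurrences):
    """Print letter, count and frequency in alphabetical order."""
    n = sum(occurrences.values())
    for letter in sorted(occurrences):
        print(letter, occurrences[letter], occurrences[letter] / n)
```

For example, print_frequencies(count_letters("Abba!")) prints "a 2 0.5" and then "b 2 0.5". An input with no letters produces an empty dict, so the loop never runs and nothing is divided by zero.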

2011-02-28 23:51:43 unutbu

If you are using Python 2.7/3.1, there is also the 'collections.Counter' class, intended mainly for using positive integers to represent running counts (http://docs.python.org/library/collections.html#collections.Counter). – 2011-03-01 00:42:15
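A short sketch of the Counter approach in Python 3 (Counter also exists in 2.7): count only a-z, then divide by the letter total. The sample string is illustrative. Unlike a plain dict, a Counter returns 0 for a letter that never appears, so every letter can be looked up safely.

```python
from collections import Counter

text = "Hello, World!".lower()

# Count only the characters a-z.
counts = Counter(ch for ch in text if 'a' <= ch <= 'z')
total = sum(counts.values())

# Turn counts into frequencies.
freqs = {letter: n / total for letter, n in counts.items()}

print(counts.most_common(2))  # [('l', 3), ('o', 2)]
print(counts['z'])            # 0, missing letters count as zero
```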

Hey, thanks for the help. I didn't know how to do that. Now I can get on with the project. – SimplyZ 2011-03-01 02:09:58

I think this is a very simple way to do it:

while True: 
    speech = raw_input("Enter file name:") 

    wholeFile = open(speech, 'r+').read() 
    lowlet = wholeFile.lower() 

    alphas = 'abcdefghijklmnopqrstuvwxyz' 

    # let's set default values first 
    occurrences = {letter : 0 for letter in alphas } 
    # occurrences = dict(zip(alphas, [0]*len(alphas))) # for python<=2.6 

    # total number of valid letters 
    total = 0 

    # iterate over everything in the text 
    for letter in lowlet: 
        # if it is a valid letter then it is in occurrences 
        if letter in occurrences: 
            # update counts 
            total += 1 
            occurrences[letter] += 1 

    # now print the results: 
    for letter, count in occurrences.iteritems(): 
        print letter, (1.0*count/total)

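As you noticed, you need the total number of valid letters in the text before you can compute the frequencies. Either filter the text before processing it, or combine the filtering with the processing, which is what I've done here.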
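To illustrate the two options this answer describes, here is a Python 3 sketch with hypothetical function names: one version filters out the letters first and then counts them, the other filters and counts in a single pass as the answer does. Both return the same frequencies. The empty-total check is an addition that guards against dividing by zero for a file with no letters, a case the answer's code does not handle.

```python
ALPHA = "abcdefghijklmnopqrstuvwxyz"


def freqs_filter_first(text):
    """Filter out non-letters first, then count each letter."""
    letters = [ch for ch in text.lower() if ch in ALPHA]
    total = len(letters)
    if total == 0:
        return {}
    return {c: letters.count(c) / total for c in ALPHA}


def freqs_single_pass(text):
    """Filter and count in one loop, like the answer's code."""
    occurrences = dict.fromkeys(ALPHA, 0)
    total = 0
    for ch in text.lower():
        if ch in occurrences:
            occurrences[ch] += 1
            total += 1
    if total == 0:
        return {}
    return {c: n / total for c, n in occurrences.items()}


sample = "The quick brown fox, 1 time!"
print(freqs_filter_first(sample) == freqs_single_pass(sample))  # True
```

Filtering first is easier to read; the single pass avoids building a second copy of the text, which only matters for very large files.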

2011-03-01 00:11:57

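Thanks for your help. I didn't know how to filter out the things I don't need. Your method uses dictionaries, which we aren't allowed to use. I was able to use unutbu's method. – SimplyZ 2011-03-01 02:31:27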

@SimplyZ: Are you sure? You used two dictionaries. – 2011-03-01 02:35:16

Yes, thanks for pointing that out. I hadn't noticed it before. – SimplyZ 2011-03-01 02:52:47

from __future__ import division 
import collections 
import re 

file1 = re.subn(r"\W", "", open("file1.txt", "r").read())[0].lower() 
counter1 = collections.Counter(file1) 
for k, v in counter1.iteritems(): 
    counter1[k] = v/len(file1) 

file2 = re.subn(r"\W", "", open("file2.txt", "r").read())[0].lower() 
counter2 = collections.Counter(file2) 
for k, v in counter2.iteritems(): 
    counter2[k] = v/len(file2)

Note: requires Python 2.7.
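Two things this answer leaves open. First, \W removes only non-word characters, so digits and underscores stay in the text and inflate len(file1). Second, the two counters are built but never compared, although the question asks for a comparison. A Python 3 sketch that addresses both, assuming that "comparing" means taking the per-letter difference; letter_freqs and compare are hypothetical names and the sample strings are illustrative.

```python
import re
from collections import Counter

ALPHA = "abcdefghijklmnopqrstuvwxyz"


def letter_freqs(text):
    """Frequency of each a-z letter, ignoring everything else."""
    # [^a-z] also drops digits and underscores, which \W would keep.
    letters = re.sub(r"[^a-z]", "", text.lower())
    counts = Counter(letters)
    total = len(letters)
    return {c: (counts[c] / total if total else 0.0) for c in ALPHA}


def compare(freqs1, freqs2):
    """Per-letter difference freqs1 - freqs2 (hypothetical helper)."""
    return {c: freqs1[c] - freqs2[c] for c in ALPHA}


f1 = letter_freqs("aab_1")   # 'a': 2/3, 'b': 1/3
f2 = letter_freqs("ABB")     # 'a': 1/3, 'b': 2/3
diff = compare(f1, f2)
print(round(diff['a'], 3), round(diff['b'], 3))  # 0.333 -0.333
```

In practice f1 and f2 would come from the contents of file1.txt and file2.txt, read as in the answer.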

2011-03-01 01:03:25 robert
