2012-01-07

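Creating a Counter for triples of successive letters

I'm trying to get a Counter to look through a text and return how often each letter follows a given pair of letters. For example, part of the output would be: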

'th': Counter({'e': 119, 'a': 145, ...})

I want it to cover every possible pair of lowercase characters.
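A minimal sketch of the mapping described above, assuming the same character set as the question's `valid` set (`trigram_counts` and `VALID` are hypothetical names). Zipping the lowercased text against itself shifted by one and two positions yields every run of three consecutive characters; the first two become the key and the third is counted. Any triple containing a character outside the set is skipped.

```python
from collections import Counter, defaultdict

# Hypothetical constant: lowercase letters plus space, like the question's `valid`.
VALID = set('abcdefghijklmnopqrstuvwxyz ')

def trigram_counts(text):
    """Map each two-character context to a Counter of the characters that follow it."""
    text = text.lower()
    counts = defaultdict(Counter)
    # zip stops at its shortest argument, so this yields every consecutive triple.
    for a, b, c in zip(text, text[1:], text[2:]):
        # Skip any triple that contains a character outside VALID.
        if a in VALID and b in VALID and c in VALID:
            counts[a + b][c] += 1
    return counts

counts = trigram_counts("The theme then thinned")
print(counts['th'])  # Counter({'e': 3, 'i': 1})
```

Because the slices copy the string, this version needs the whole text in memory; for a short text that is usually the simplest option.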

Until now I've been using the following code, which only takes the single previous letter into account:

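from collections import Counter, defaultdict

def pairwise(iterable):
    # Yield each character together with the character before it.
    it = iter(iterable)
    try:
        last = next(it)
    except StopIteration:
        return  # empty input: no pairs
    for curr in it:
        yield last, curr
        last = curr

valid = set('abcdefghijklmnopqrstuvwxyz ')

def valid_pair(pair):
    last, curr = pair
    return last in valid and curr in valid

def make_markov(text):
    # markov[p][q] counts how often character q follows character p.
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in filter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov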

Could you fix the indentation in that code? – DaedalusFall 2012-01-07 18:53:54


@DaedalusFall Sorry – Julia 2012-01-07 18:59:13

Answer


Untested:

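def pairwise(iterable):
    # Yield each two-letter context together with the letter that follows it.
    it = iter(iterable)
    try:
        last = next(it) + next(it)
    except StopIteration:
        return  # fewer than two characters: nothing to yield
    for curr in it:
        yield last, curr
        last = last[1] + curr  # slide the two-letter window forward by one


def valid_pair(pair):
    # `valid` is the character set defined in the question; make_markov is unchanged.
    last, curr = pair
    return last[0] in valid and last[1] in valid and curr in valid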
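A sketch of how the same idea might generalise to a context of any length, assuming the same character set as the question's `valid`; `make_markov_n`, its `order` parameter and `VALID` are hypothetical names. Instead of joining characters by hand, a `deque` with `maxlen=order` holds the most recent `order` characters, and any character outside the set resets it.

```python
from collections import Counter, defaultdict, deque

# Hypothetical constant: same characters as the question's `valid` set.
VALID = set('abcdefghijklmnopqrstuvwxyz ')

def make_markov_n(text, order=2):
    """Count which character follows each run of `order` valid characters.

    order=1 corresponds to the question's version, order=2 to the answer's.
    """
    markov = defaultdict(Counter)
    window = deque(maxlen=order)  # the most recent `order` characters
    for ch in (c.lower() for c in text):
        if ch not in VALID:
            # An invalid character breaks the context, so start over.
            window.clear()
            continue
        if len(window) == order:
            markov[''.join(window)][ch] += 1
        window.append(ch)  # maxlen drops the oldest character automatically
    return markov

markov = make_markov_n("The theme then thinned", order=2)
print(markov['th'])  # Counter({'e': 3, 'i': 1})
```

Clearing the window on an invalid character has the same effect as the `valid_pair` filter: no context containing a character outside the set is ever counted. The input is consumed one character at a time, so it also works on a file or other stream without building the whole string.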