Simultaneously summing the dictionaries inside a Dict{Tuple,Dict{String,Int64}} in a single loop

Given a countmap object holding the word counts of a text:

using StatsBase  # provides countmap
vocab_counter = countmap(split("the lazy fox jumps over the brown dog"))

[out]:

Dict{SubString{String},Int64} with 7 entries: 
    "brown" => 1 
    "lazy" => 1 
    "jumps" => 1 
    "the" => 2 
    "fox" => 1 
    "over" => 1 
    "dog" => 1

And turning it into a character bigram counter, per word:

ngram_word_counter = Dict{Tuple,Dict}()
for (word, count) in vocab_counter
    for ng in ngrams(word, n) # bigrams.
        if !haskey(ngram_word_counter, ng) || !haskey(ngram_word_counter[ng], word)
            ngram_word_counter[ng] = Dict{String,Int64}()
            ngram_word_counter[ng][word] = 0
        end
        ngram_word_counter[ng][word] += count
    end
end

[ngram_word_counter]:

Dict{Tuple,Dict} with 20 entries: 
    ('b','r') => Dict("brown"=>1) 
    ('t','h') => Dict("the"=>2) 
    ('o','w') => Dict("brown"=>1) 
    ('z','y') => Dict("lazy"=>1) 
    ('o','g') => Dict("dog"=>1) 
    ('u','m') => Dict("jumps"=>1) 
    ('o','x') => Dict("fox"=>1) 
    ('e','r') => Dict("over"=>1) 
    ('a','z') => Dict("lazy"=>1) 
    ('p','s') => Dict("jumps"=>1) 
    ('h','e') => Dict("the"=>2) 
    ('d','o') => Dict("dog"=>1) 
    ('w','n') => Dict("brown"=>1) 
    ('m','p') => Dict("jumps"=>1) 
    ('l','a') => Dict("lazy"=>1) 
    ('o','v') => Dict("over"=>1) 
    ('v','e') => Dict("over"=>1) 
    ('r','o') => Dict("brown"=>1) 
    ('f','o') => Dict("fox"=>1) 
    ('j','u') => Dict("jumps"=>1)
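
One thing worth noting about the loop above: because both conditions are joined with `||`, the inner dictionary is replaced whenever the bigram exists but the word is new under it. The example sentence does not trigger this, since no bigram is shared between two words. The sketch below is only an illustration: `char_ngrams` is a hypothetical stand-in for the `ngrams` function, which the post does not show, and the two function names are made up for the comparison.

```julia
# Hypothetical stand-in for the `ngrams` function used in the post:
# character n-grams of `word`, each as a tuple of Char.
function char_ngrams(word, n)
    chars = collect(word)
    return [Tuple(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

# Same combined condition as in the loop above.
function count_with_reset(vocab_counter, n)
    counter = Dict{Tuple,Dict}()
    for (word, count) in vocab_counter, ng in char_ngrams(word, n)
        if !haskey(counter, ng) || !haskey(counter[ng], word)
            counter[ng] = Dict{String,Int64}()   # replaces any existing inner dict
            counter[ng][word] = 0
        end
        counter[ng][word] += count
    end
    return counter
end

# Variant that checks the outer key and the inner key separately.
function count_with_separate_checks(vocab_counter, n)
    counter = Dict{Tuple,Dict}()
    for (word, count) in vocab_counter, ng in char_ngrams(word, n)
        if !haskey(counter, ng)
            counter[ng] = Dict{String,Int64}()
        end
        if !haskey(counter[ng], word)
            counter[ng][word] = 0
        end
        counter[ng][word] += count
    end
    return counter
end

vocab = Dict("the" => 2, "then" => 1)            # both words contain ('t','h')
count_with_reset(vocab, 2)[('t','h')]            # holds only one of the two words
count_with_separate_checks(vocab, 2)[('t','h')]  # holds both "the" and "then"
```

With the separate checks, a bigram shared by several words keeps one entry per word.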

With the Dict{Tuple,Dict{String,Int64}} object, I need to loop over ngram_word_counter again to get ngram_counter without the words, i.e. a Dict{Tuple,Int64}:

ngram_counter = Dict{Tuple,Int64}() 
for ng in keys(ngram_word_counter) 
    ngram_counter[ng] = sum(values(ngram_word_counter[ng])) 
end

[ngram_counter]:

Dict{Tuple,Int64} with 20 entries: 
    ('b','r') => 1 
    ('t','h') => 2 
    ('o','w') => 1 
    ('z','y') => 1 
    ('o','g') => 1 
    ('u','m') => 1 
    ('o','x') => 1 
    ('e','r') => 1 
    ('a','z') => 1 
    ('p','s') => 1 
    ('h','e') => 2 
    ('d','o') => 1 
    ('w','n') => 1 
    ('m','p') => 1 
    ('l','a') => 1 
    ('o','v') => 1 
    ('v','e') => 1 
    ('r','o') => 1 
    ('f','o') => 1 
    ('j','u') => 1

Currently, to get both objects, I can do an ad hoc second pass:

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if !haskey(ngram_word_counter, ng) || !haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            ngram_word_counter[ng][word] += count
        end
    end
    ngram_counter = Dict{Tuple,Int64}()
    for ng in keys(ngram_word_counter)
        ngram_counter[ng] = sum(values(ngram_word_counter[ng]))
    end
    return ngram_word_counter, ngram_counter
end

Or update both ngram_word_counter and ngram_counter at the same time in the first loop:

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    ngram_counter = Dict{Tuple,Int64}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if !haskey(ngram_word_counter, ng) || !haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            ngram_word_counter[ng][word] += count
            ngram_counter[ng] += 1
        end
    end
    return ngram_word_counter, ngram_counter
end

ngram_word_counter, ngram_counter

But I get a KeyError when updating ngram_counter:

KeyError: key ('b','r') not found
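
For context, `d[k] += 1` is shorthand for `d[k] = d[k] + 1`, so the read on the right-hand side is what fails when the key has not been inserted yet. A minimal sketch of that behaviour, using a fresh dictionary with the same type as in the post (the `threw` variable is introduced here only for illustration):

```julia
ngram_counter = Dict{Tuple,Int64}()

# The read of ngram_counter[('b','r')] happens before the write,
# and the key is not present yet, so a KeyError is raised.
threw = try
    ngram_counter[('b','r')] += 1
    false
catch err
    err isa KeyError
end

# Supplying a default for the read avoids the error.
ngram_counter[('b','r')] = get(ngram_counter, ('b','r'), 0) + 1
```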

I added an extra check and it works:

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    ngram_counter = Dict{Tuple,Int64}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if !haskey(ngram_word_counter, ng) || !haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            if !haskey(ngram_counter, ng)
                ngram_counter[ng] = 0
            end
            ngram_word_counter[ng][word] += count
            ngram_counter[ng] += 1
        end
    end
    return ngram_word_counter, ngram_counter
end

ngram_word_counter, ngram_counter

[out]:

(Dict{Tuple,Dict}(Pair{Tuple,Dict}(('b','r'),Dict("brown"=>1)),Pair{Tuple,Dict}(('t','h'),Dict("the"=>2)),Pair{Tuple,Dict}(('o','w'),Dict("brown"=>1)),Pair{Tuple,Dict}(('z','y'),Dict("lazy"=>1)),Pair{Tuple,Dict}(('o','g'),Dict("dog"=>1)),Pair{Tuple,Dict}(('u','m'),Dict("jumps"=>1)),Pair{Tuple,Dict}(('o','x'),Dict("fox"=>1)),Pair{Tuple,Dict}(('e','r'),Dict("over"=>1)),Pair{Tuple,Dict}(('a','z'),Dict("lazy"=>1)),Pair{Tuple,Dict}(('p','s'),Dict("jumps"=>1))…),Dict{Tuple,Int64}(Pair{Tuple,Int64}(('b','r'),1),Pair{Tuple,Int64}(('t','h'),1),Pair{Tuple,Int64}(('o','w'),1),Pair{Tuple,Int64}(('z','y'),1),Pair{Tuple,Int64}(('o','g'),1),Pair{Tuple,Int64}(('u','m'),1),Pair{Tuple,Int64}(('o','x'),1),Pair{Tuple,Int64}(('e','r'),1),Pair{Tuple,Int64}(('a','z'),1),Pair{Tuple,Int64}(('p','s'),1)…))

Is there a way to simultaneously sum the dictionaries inside a Dict{Tuple,Dict{String,Int64}} in a single loop?

Source

2017-04-03 Nat Gillin

I'm not sure this fully answers the question, but you can make compute_statistics cleaner as follows:

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict{String,Int}}()
    ngram_counter = Dict{Tuple,Int}()
    for (word, count) in vocab_counter, ng in ngrams(word, n)
        ngram_word_counter[ng] = get(ngram_word_counter, ng, Dict{String,Int}())
        ngram_word_counter[ng][word] = get(ngram_word_counter[ng], word, 0) + count
        ngram_counter[ng] = get(ngram_counter, ng, 0) + count
    end
    return ngram_word_counter, ngram_counter
end

(This uses get to avoid haskey, together with the shorter for syntax.)
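
A possible variation on the same idea, sketched under assumptions: `get!` with a function argument inserts the inner dictionary only when the bigram is missing, so no throwaway `Dict` is built on every iteration. The names `char_ngrams` and `compute_statistics_getbang` are hypothetical; `char_ngrams` stands in for the `ngrams` function, which is not shown in the post, and the vocabulary is written out as a literal so the snippet runs without StatsBase.

```julia
# Hypothetical stand-in for `ngrams`: character n-grams as tuples of Char.
function char_ngrams(word, n)
    chars = collect(word)
    return [Tuple(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

function compute_statistics_getbang(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict{String,Int}}()
    ngram_counter = Dict{Tuple,Int}()
    for (word, count) in vocab_counter, ng in char_ngrams(word, n)
        # Fetch the inner dict, creating and storing it only if `ng` is new.
        inner = get!(() -> Dict{String,Int}(), ngram_word_counter, ng)
        inner[word] = get(inner, word, 0) + count
        # Per-bigram total, accumulated in the same pass.
        ngram_counter[ng] = get(ngram_counter, ng, 0) + count
    end
    return ngram_word_counter, ngram_counter
end

# Word counts from the question, written as a literal.
vocab_counter = Dict("the" => 2, "lazy" => 1, "fox" => 1, "jumps" => 1,
                     "over" => 1, "brown" => 1, "dog" => 1)
ngram_word_counter, ngram_counter = compute_statistics_getbang(vocab_counter, 2)
```

Both dictionaries are filled in one pass; for this input `ngram_counter[('t','h')]` is 2, matching the two-pass result in the question.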

Another way to get ngram_counter is to compute it from ngram_word_counter:

ngram_counter = map(x->x[1]=>sum(values(x[2])),ngram_word_counter)

or

ngram_counter = Dict(k=>sum(values(d)) for (k,d) in ngram_word_counter)
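
A small self-contained check of the generator form, using made-up data rather than the output from the question. As far as I know, `map` over a `Dict` raises an error on Julia 1.x, so the generator form is the one sketched here; the explicit `Dict{Tuple,Int}` constructor is an optional choice to keep the key type as loose as in the post.

```julia
# Made-up nested counter: ('t','h') is shared by two words.
ngram_word_counter = Dict{Tuple,Dict{String,Int}}(
    ('t','h') => Dict("the" => 2, "then" => 1),
    ('e','n') => Dict("then" => 1),
)

# Collapse each inner dict to the sum of its counts.
ngram_counter = Dict{Tuple,Int}(k => sum(values(d)) for (k, d) in ngram_word_counter)
```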

Source

2017-04-03 07:25:57

Oops. I confused 'getkey' with 'get', but it is fixed now.
