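Summing the inner dictionaries of a Dict{Tuple,Dict{String,Int64}} simultaneously in a single loop

Given a countmap object that holds the word counts of a text: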
using StatsBase  # provides countmap
vocab_counter = countmap(split("the lazy fox jumps over the brown dog"))
[OUT]:
Dict{SubString{String},Int64} with 7 entries:
"brown" => 1
"lazy" => 1
"jumps" => 1
"the" => 2
"fox" => 1
"over" => 1
"dog" => 1
I turn it into a character bigram counter, keyed by bigram and then by word:
ngram_word_counter = Dict{Tuple,Dict}()
for (word, count) in vocab_counter
    for ng in ngrams(word, n) # bigrams.
        if !haskey(ngram_word_counter, ng) || !haskey(ngram_word_counter[ng], word)
            ngram_word_counter[ng] = Dict{String,Int64}()
            ngram_word_counter[ng][word] = 0
        end
        ngram_word_counter[ng][word] += count
    end
end
[ngram_word_counter]:
Dict{Tuple,Dict} with 20 entries:
('b','r') => Dict("brown"=>1)
('t','h') => Dict("the"=>2)
('o','w') => Dict("brown"=>1)
('z','y') => Dict("lazy"=>1)
('o','g') => Dict("dog"=>1)
('u','m') => Dict("jumps"=>1)
('o','x') => Dict("fox"=>1)
('e','r') => Dict("over"=>1)
('a','z') => Dict("lazy"=>1)
('p','s') => Dict("jumps"=>1)
('h','e') => Dict("the"=>2)
('d','o') => Dict("dog"=>1)
('w','n') => Dict("brown"=>1)
('m','p') => Dict("jumps"=>1)
('l','a') => Dict("lazy"=>1)
('o','v') => Dict("over"=>1)
('v','e') => Dict("over"=>1)
('r','o') => Dict("brown"=>1)
('f','o') => Dict("fox"=>1)
('j','u') => Dict("jumps"=>1)
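The post does not show where ngrams comes from. Judging by the keys in the output above, it yields tuples of consecutive characters. A minimal stand-in under that assumption could look like the sketch below; the name char_ngrams is hypothetical and not part of the original code.

```julia
# Hypothetical stand-in for the `ngrams` helper used above (not shown in the post).
# Returns every run of `n` consecutive characters of `word` as a tuple.
function char_ngrams(word::AbstractString, n::Integer)
    chars = collect(word)  # Vector{Char}, so indexing is safe for non-ASCII text
    return [Tuple(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

pairs_of_the = char_ngrams("the", 2)  # the pairs ('t','h') and ('h','e')
```

A word shorter than n produces an empty list, so it contributes nothing to the counters.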
With this Dict{Tuple,Dict{String,Int64}} object, I need to loop over ngram_word_counter again to get ngram_counter without the words, i.e. a Dict{Tuple,Int64}:
ngram_counter = Dict{Tuple,Int64}()
for ng in keys(ngram_word_counter)
    ngram_counter[ng] = sum(values(ngram_word_counter[ng]))
end
[ngram_counter]:
Dict{Tuple,Int64} with 20 entries:
('b','r') => 1
('t','h') => 2
('o','w') => 1
('z','y') => 1
('o','g') => 1
('u','m') => 1
('o','x') => 1
('e','r') => 1
('a','z') => 1
('p','s') => 1
('h','e') => 2
('d','o') => 1
('w','n') => 1
('m','p') => 1
('l','a') => 1
('o','v') => 1
('v','e') => 1
('r','o') => 1
('f','o') => 1
('j','u') => 1
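The same collapse can also be written as a single Dict comprehension. This is still a second pass over the nested counter, only shorter. The sketch uses a three-entry slice of the example data so that it runs on its own.

```julia
# A small slice of the nested counter from the example above.
ngram_word_counter = Dict(
    ('t', 'h') => Dict("the" => 2),
    ('h', 'e') => Dict("the" => 2),
    ('o', 'x') => Dict("fox" => 1),
)

# Sum each inner dictionary's values to drop the per-word breakdown.
ngram_counter = Dict(ng => sum(values(words)) for (ng, words) in ngram_word_counter)
```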
Currently, to get both objects, I can do the second count in a separate pass:
function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if !haskey(ngram_word_counter, ng) || !haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            ngram_word_counter[ng][word] += count
        end
    end
    ngram_counter = Dict{Tuple,Int64}()
    for ng in keys(ngram_word_counter)
        ngram_counter[ng] = sum(values(ngram_word_counter[ng]))
    end
    return ngram_word_counter, ngram_counter
end
Or update both ngram_word_counter and ngram_counter in the first loop:
function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    ngram_counter = Dict{Tuple,Int64}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if !haskey(ngram_word_counter, ng) || !haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            ngram_word_counter[ng][word] += count
            ngram_counter[ng] += 1
        end
    end
    return ngram_word_counter, ngram_counter
end
ngram_word_counter, ngram_counter
But I get a KeyError when updating ngram_counter:
KeyError: key ('b','r') not found
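The error follows from how += works on a Dict entry: the update reads the current value before writing the new one, and the read fails for a key that has never been stored. A minimal reproduction, together with one way around it using get with a default value:

```julia
counts = Dict{Tuple,Int64}()
key = ('b', 'r')

# `counts[key] += 1` is shorthand for `counts[key] = counts[key] + 1`,
# so the right-hand side looks up a key that does not exist yet.
threw = try
    counts[key] += 1
    false
catch err
    err isa KeyError
end

# Supplying a default for the missing key avoids the failed lookup.
counts[key] = get(counts, key, 0) + 1
```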
I added an extra check and now it works:
function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    ngram_counter = Dict{Tuple,Int64}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if !haskey(ngram_word_counter, ng) || !haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            if !haskey(ngram_counter, ng)
                ngram_counter[ng] = 0
            end
            ngram_word_counter[ng][word] += count
            ngram_counter[ng] += 1
        end
    end
    return ngram_word_counter, ngram_counter
end
ngram_word_counter, ngram_counter
[OUT]:
(Dict{Tuple,Dict}(Pair{Tuple,Dict}(('b','r'),Dict("brown"=>1)),Pair{Tuple,Dict}(('t','h'),Dict("the"=>2)),Pair{Tuple,Dict}(('o','w'),Dict("brown"=>1)),Pair{Tuple,Dict}(('z','y'),Dict("lazy"=>1)),Pair{Tuple,Dict}(('o','g'),Dict("dog"=>1)),Pair{Tuple,Dict}(('u','m'),Dict("jumps"=>1)),Pair{Tuple,Dict}(('o','x'),Dict("fox"=>1)),Pair{Tuple,Dict}(('e','r'),Dict("over"=>1)),Pair{Tuple,Dict}(('a','z'),Dict("lazy"=>1)),Pair{Tuple,Dict}(('p','s'),Dict("jumps"=>1))…),Dict{Tuple,Int64}(Pair{Tuple,Int64}(('b','r'),1),Pair{Tuple,Int64}(('t','h'),1),Pair{Tuple,Int64}(('o','w'),1),Pair{Tuple,Int64}(('z','y'),1),Pair{Tuple,Int64}(('o','g'),1),Pair{Tuple,Int64}(('u','m'),1),Pair{Tuple,Int64}(('o','x'),1),Pair{Tuple,Int64}(('e','r'),1),Pair{Tuple,Int64}(('a','z'),1),Pair{Tuple,Int64}(('p','s'),1)…))
Is there a way to sum the inner dictionaries of a Dict{Tuple,Dict{String,Int64}} simultaneously, in a single loop?
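One possible single-loop sketch uses get! and get from Base so that no haskey checks are needed. It differs from the code above in two places:

- The one-loop version above adds 1 per bigram, which is why ('t','h') shows 1 in its output but 2 in the two-pass result. The sketch adds count instead.
- The combined haskey condition above replaces the whole inner dictionary when a bigram is already present but the word is new. The sketch keeps the existing entries.

The names char_ngrams and compute_statistics_once are hypothetical, and char_ngrams only stands in for the ngrams helper that the post does not define.

```julia
# Hypothetical stand-in for `ngrams`: tuples of `n` consecutive characters.
function char_ngrams(word::AbstractString, n::Integer)
    chars = collect(word)
    return [Tuple(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

function compute_statistics_once(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict{String,Int64}}()
    ngram_counter = Dict{Tuple,Int64}()
    for (word, count) in vocab_counter
        for ng in char_ngrams(word, n)
            # get! returns the inner dictionary, creating an empty one the
            # first time this n-gram is seen.
            inner = get!(ngram_word_counter, ng) do
                Dict{String,Int64}()
            end
            inner[word] = get(inner, word, 0) + count
            # Add `count`, not 1, so the totals match the two-pass version.
            ngram_counter[ng] = get(ngram_counter, ng, 0) + count
        end
    end
    return ngram_word_counter, ngram_counter
end

# "the" and "then" share bigrams, which exercises the shared-key case.
vocab_counter = Dict("the" => 2, "then" => 1, "fox" => 1)
ngram_word_counter, ngram_counter = compute_statistics_once(vocab_counter, 2)
```

With this input, ('t','h') maps to both "the" and "then" in ngram_word_counter, and its total in ngram_counter is the sum of the two word counts.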
Oops, I confused 'getkey' with 'get', but it is fixed now.