構造一個散列,其中的鍵由給定字符串的所有唯一子字符串組成,該字符串在字符串中至少出現兩次(不重疊),對於每個鍵,該值都是字符串中所有偏移量的數組,密鑰(子字符串)的值開始。
代碼
def recurring_substrings(str)
arr = str.chars
(1..str.size/2).each_with_object({}) do |n,h|
arr.each_cons(n).map { |b| b.join }.uniq.each do |s|
str.scan(Regexp.new(s)) { (h[s] ||= []) << Regexp.last_match.begin(0) }
end
end.reject { |_,v| v.size == 1 }
end
實例
recurring_substrings 'abjkabrjkab'
#=> {"a"=>[0, 4, 9], "b"=>[1, 5, 10], "j"=>[2, 7], "k"=>[3, 8], "ab"=>[0, 4, 9],
# "jk"=>[2, 7], "ka"=>[3, 8], "jka"=>[2, 7], "kab"=>[3, 8], "jkab"=>[2, 7]}
recurring_substrings "abcabcabcabcabcdkkabclilabcoabcdieabcdowabcdppabzabx"
#=> {"a"=>[0, 3, 6, 9, 12, 18, 24, 28, 34, 40, 46, 49],
# "b"=>[1, 4, 7, 10, 13, 19, 25, 29, 35, 41, 47, 50],
# "c"=>[2, 5, 8, 11, 14, 20, 26, 30, 36, 42], "d"=>[15, 31, 37, 43],
# "k"=>[16, 17], "l"=>[21, 23], "i"=>[22, 32], "o"=>[27, 38], "p"=>[44, 45],
# "ab"=>[0, 3, 6, 9, 12, 18, 24, 28, 34, 40, 46, 49],
# "bc"=>[1, 4, 7, 10, 13, 19, 25, 29, 35, 41], "ca"=>[2, 5, 8, 11],
# "cd"=>[14, 30, 36, 42],
# "abc"=>[0, 3, 6, 9, 12, 18, 24, 28, 34, 40], "bca"=>[1, 4, 7, 10],
# "cab"=>[2, 5, 8, 11], "bcd"=>[13, 29, 35, 41],
# "abca"=>[0, 6], "bcab"=>[1, 7], "cabc"=>[2, 8], "abcd"=>[12, 28, 34, 40],
# "abcab"=>[0, 6], "bcabc"=>[1, 7], "cabca"=>[2, 8],
# "abcabc"=>[0, 6], "bcabca"=>[1, 7], "cabcab"=>[2, 8]}
說明
對於第一個例子上文中,步驟如下。
str = 'abjkabrjkab'
arr = str.chars
#=> ["a", "b", "j", "k", "a", "b", "r", "j", "k", "a", "b"]
q = str.size/2 # max size for string to repeat at least once
#=> 5
b = (1..q).each_with_object({})
#=> #<Enumerator: 1..5:each_with_object({})>
我們可以看到該枚舉器將通過將其轉換爲數組來生成哪些元素。 (我會在下面再做幾次。)
b.to_a
#=> [[1, {}], [2, {}], [3, {}], [4, {}], [5, {}]]
隨着計算的進行,將建立空的哈希值。
接着第一元件傳遞到塊並設置塊變量使用平行分配它(有時稱爲多個分配)。
n,h = b.next
#=> [1, {}]
n #=> 1
h #=> {}
c = arr.each_cons(n)
#=> #<Enumerator: ["a", "b", "j", "k", "a", "b", "r", "j", "k", "a", "b"]:each_cons(1)>
c
是長度爲1的所有子串的陣列。在下一迭代中它將是長度爲2的所有子串等的陣列。見Emumerable#each_cons。
c.to_a # Let's see which elements will be generated.
#=> [["a"], ["b"], ["j"], ["k"], ["a"], ["b"], ["r"], ["j"], ["k"], ["a"], ["b"]]
d = c.map { |b| b.join }
#=> ["a", "b", "j", "k", "a", "b", "r", "j", "k", "a", "b"]
e = d.uniq
#=> ["a", "b", "j", "k", "r"]
下一輪迭代,這將是
r = arr.each_cons(2)
#=> #<Enumerator: ["a", "b", "j", "k", "a", "b", "r", "j", "k", "a", "b"]:
# each_cons(2)>
r.to_a
#=> [["a", "b"], ["b", "j"], ["j", "k"], ["k", "a"], ["a", "b"],
# ["b", "r"], ["r", "j"], ["j", "k"], ["k", "a"], ["a", "b"]]
s = r.map { |b| b.join }
#=> ["ab", "bj", "jk", "ka", "ab", "br", "rj", "jk", "ka", "ab"]
s.uniq
#=> ["ab", "bj", "jk", "ka", "br", "rj"]
繼續,
f = e.each
#=> #<Enumerator: ["a", "b", "j", "k", "r"]:each>
f.to_a # Let's see which elements will be generated.
#=> ["a", "b", "j", "k", "r"]
s = f.next
#=> "a"
r = (Regexp.new(s))
#=> /a/
str.scan(r) { (h[s] ||= []) << Regexp.last_match.begin(0) }
如果h
還沒有一個關鍵s
,h[s] #=> nil
。 h[s] ||= []
擴展爲h[s] = h[s] || []
,在執行h[s] << Regexp.last_match.begin(0)
之前將h[s]
轉換爲空數組。那就是,h[s] = h[s] || [] #=> nil || [] #=> []
。
在塊內MatchData對象用類方法Regexp::last_match檢索。 (或者,可以用全局變量$~
代替Regexp.last_match
,詳細信息請參閱Regexp上的「特殊全局變量」。)MatchData#begin返回當前匹配開始時的str
的索引。
現在
h #=> {"a"=>[0, 4, 9]}
其餘的計算是相似的,增加鍵 - 值對h
直到已經在給定的例子已經構造。
你的問題是什麼? – Stefan
@Stefan請看看,如果它現在聽起來不錯? –
「全部」或「發生」是什麼意思?什麼是「aaaaaaa」的正確答案?我可以爭辯說,這是兩個週期的「aaa」剩下的事情。我認爲這個問題根本沒有明確的定義。 – matt