2016-12-01 51 views
1

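Check whether a string has sequential characters in Ruby on Rails

I'm trying to validate a string to determine whether it contains any run of 3 or more sequential characters.

Example:

"11abcd$4567" => ['abcd', '4567'] 

I tried to do this with a regular expression, but it ends up being a lot of code:

(?!abc|bcd|cde|.....) 

Is there a simple way to check for sequential characters, either with a regular expression or in plain Ruby?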

+0

Use the include? method: "11abcd$4567".include? 'abc'. https://ruby-doc.org/core-2.2.0/String.html#method-i-include-3F –

+0

The sequence can be any string, not just 'abc'; other runs are possible as well. – Arif

+0

Umm, the select should be something like: a = 'abc'.chars; a.select { |b| str.include?(b) } –

Answers

7

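Regular expressions are not a good fit here. They are not flexible enough to express the general case, and Unicode is huge: a regex that responds to any ascending character sequence would have to list every one of tens or hundreds of thousands of cases. Such a regex could be generated programmatically, but that would take time and be very expensive memory-wise.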

def find_streaks(string, min_length=3)
  string                                  # "xabcy"
    .each_char                            # ['x', 'a', 'b', 'c', 'y']
    .chunk_while { |a, b| a.succ == b }   # [['x'], ['a', 'b', 'c'], ['y']]
    .select { |c| c.size >= min_length }  # [['a', 'b', 'c']]
    .map(&:join)                          # ['abc']
end
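For reference, here is a small usage sketch of the method above against the input from the question. It assumes Ruby 2.3 or later, where `Enumerable#chunk_while` is available; the method is repeated only so that the snippet runs on its own.

```ruby
# Same method as above, repeated so the snippet is self-contained (Ruby 2.3+).
def find_streaks(string, min_length = 3)
  string
    .each_char
    .chunk_while { |a, b| a.succ == b }
    .select { |c| c.size >= min_length }
    .map(&:join)
end

p find_streaks("11abcd$4567")     #=> ["abcd", "4567"]
p find_streaks("11abcd$4567", 5)  #=> []
p find_streaks("ab12")            #=> []
```

The first call reproduces the expected output given in the question.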

I guess this might work as a polyfill... give it a try?

# skip this thing on Ruby 2.3+, unneeded
unless Enumerable.instance_methods.include?(:chunk_while)
  module Enumerable
    def chunk_while                        # let's polyfill!
      streak = nil                         # twofold purpose: init `streak` outside
                                           # the block, and `nil` as flag to spot
                                           # the first element.

      Enumerator.new do |y|                # `chunk_while` returns an `Enumerator`.
        each do |element|                  # go through all the elements.
          if streak                        # except on first element:
            if yield streak[-1], element   # give the previous element and current
                                           # one to the comparator block.
                                           # `streak` will always have an element.
              streak << element            # if the two elements are "similar",
                                           # add this one to the streak;
            else                           # otherwise
              y.yield streak               # output the current streak and
              streak = [element]           # start a new one with the current element.
            end
          else                             # for the first element, nothing to compare
            streak = [element]             # so just start the streak.
          end
        end
        y.yield streak if streak           # output the last streak;
                                           # but if `streak` is `nil`, there were
                                           # no elements, so no output.
      end
    end
  end
end

Well, derp. Here I go writing all of this by hand... when it could have been as easy as this:

unless Enumerable.instance_methods.include?(:chunk_while)
  module Enumerable
    def chunk_while
      slice_when { |a, b| !yield a, b }
    end
  end
end

Yeah, chunk_while is just the opposite of slice_when. You could even swap it into the original code as .slice_when { |a, b| a.succ != b }. Sometimes I'm slow.
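Spelled out, the `slice_when` substitution might look like the sketch below. The method name `find_streaks_with_slice_when` is made up for this example; the only assumption is Ruby 2.2 or later, where `Enumerable#slice_when` exists, so no polyfill is needed.

```ruby
# Hypothetical variant of find_streaks for Ruby 2.2: slice_when cuts the
# sequence wherever the next character is NOT the successor of the previous.
def find_streaks_with_slice_when(string, min_length = 3)
  string
    .each_char
    .slice_when { |a, b| a.succ != b }
    .select { |c| c.size >= min_length }
    .map(&:join)
end

p find_streaks_with_slice_when("11abcd$4567")  #=> ["abcd", "4567"]
p find_streaks_with_slice_when("xabcy")        #=> ["abc"]
```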

+0

NoMethodError: undefined method `chunk_while' for #<Enumerator: "xabcy":each_char> – Arif

+0

Your Ruby is old. `chunk_while` was added in 2.3. – Amadan

+0

I'm using Ruby 2.2.3 – Arif

1

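Here is a bit of a solution I came up with. I tried to take advantage of Ruby's object references in a somewhat odd way. Pseudocode:

  1. Loop through the string.
  2. Convert each character to its ASCII code via the ord method.
  3. If the mem variable is empty, or the ASCII code of mem's last character equals the ASCII code of the current character minus 1, append the current character to mem.
  4. If the mem variable holds 3 consecutive characters, store it in the arr variable.

Note: this is where it gets tricky. Your requirement explicitly says "3 or more consecutive characters", which might look like a problem, because mem is pushed as soon as it reaches 3 characters. However, `<<` mutates the string object that we pushed into arr. As long as mem is not rebound with the assignment operator, =, it keeps mutating the string object that is already in the array.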

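str = "11abcdefgh$4567"

arr = []
mem = ""

str.each_char do |s|
  if mem.empty? || (mem[-1].ord == s.ord - 1)
    mem << s   # extend the current run in place
  else
    mem = s    # start a new run with the current character, so that the
               # character that broke the run is not lost (e.g. "abc456")
  end

  # Push only once, when the run first reaches 3 characters; later `<<`
  # calls keep mutating the same String object that is already in arr.
  arr << mem if mem.size == 3
end
puts arr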
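The trick this answer relies on can be seen in isolation. The following is only an illustrative sketch (the variable names mirror the answer, the values are made up): `<<` mutates the String that the array already references, while `=` rebinds the variable to a different object.

```ruby
arr = []
mem = "ab".dup         # dup so the example does not mutate a string literal
mem << "c"
arr << mem             # arr now holds a reference to the same String object
mem << "d"             # in-place mutation, visible through arr as well
p arr                  #=> ["abcd"]
p arr[0].equal?(mem)   #=> true

mem = "x".dup          # plain assignment rebinds mem to a new object
mem << "y"
p arr                  #=> ["abcd"]
p arr[0].equal?(mem)   #=> false
```

This is why a run that was pushed at length 3 still ends up in the result at its full length.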
0
string.each_char.with_object([]) do |e, acc|
  if acc.last && acc.last[-1] && e == acc.last[-1].succ
    acc.last << e
  else
    acc << e
  end
end.reject { |e| e.length < 3 }

This version can easily be adapted to work with any alphabet:

"11абвгнabcd$4567".codepoints.each_with_object([]) do |e, acc| 
  e = e.chr(Encoding::UTF_8)
  acc.last && acc.last[-1] && e == acc.last[-1].succ ? \
    acc.last << e : acc << e
end.reject { |e| e.length < 3 }
#⇒ [ 
# [0] "абвг", 
# [1] "abcd", 
# [2] "4567" 
# ] 
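One detail worth keeping in mind when choosing between the answers in this thread: comparing with `String#succ` and comparing code points with `ord` do not always agree, because `succ` carries over on alphanumerics ("9".succ is "10", "z".succ is "aa"). The sketch below contrasts the two; both helper names are hypothetical and exist only for this comparison.

```ruby
# Hypothetical helpers: the same run detection with two notions of "next".
def streaks_by_succ(str, min = 3)
  str.each_char.slice_when { |a, b| a.succ != b }
     .select { |run| run.size >= min }.map(&:join)
end

def streaks_by_ord(str, min = 3)
  str.each_char.slice_when { |a, b| a.ord + 1 != b.ord }
     .select { |run| run.size >= min }.map(&:join)
end

p "9".succ                     #=> "10"
p "z".succ                     #=> "aa"
p streaks_by_succ("89:;xyz{")  #=> ["xyz"]
p streaks_by_ord("89:;xyz{")   #=> ["89:;", "xyz{"]
```

Which behaviour is wanted depends on whether "9:" and "z{" should count as sequential; the question's example does not settle that either way.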
+0

@sagarpandya82 Indeed, updated, thx. – mudasobwa

+0

@sagarpandya82 `codepoints` returns an array, not an enumerator. – mudasobwa

1

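The title and first sentence of the question state that the task is to determine whether a given string contains at least three consecutive characters (which I assume to be ASCII) in sequence (e.g., "def" or "!\"#"), even though that seems to conflict with the example. A quick way to answer that question (possibly less efficient than other approaches) is as follows.

Code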

def at_least_so_many_consecutive(str, min_run_size) 
    (32.chr..126.chr).each_cons(min_run_size). 
        map(&:join). 
        any? { |s| str.include?(s) } 
end 

str = "xabc$fghrtuvwx3!" 
at_least_so_many_consecutive(str, 3) 
    #=> true 
at_least_so_many_consecutive(str, 5) 
    #=> true 
at_least_so_many_consecutive(str, 6) 
    #=> false 

Note:

(32.chr..126.chr).each_cons(min_run_size).map(&:join) 
    #=> [" !\"", "!\"#", "\"\#$", "\#$%", "$%&", "%&'", "&'(", "'()", "()*", 
    # ")*+", "*+,", "+,-", ",-.", "-./", "./0", "/01", "012", "123", "234", 
    # ... 
    # "QRS", "RST", "STU", "TUV", "UVW", "VWX", "WXY", "XYZ", "YZ[", "Z[\\", 
    # "[\\]", "\\]^", "]^_", "^_`", "_`a", "`ab", "abc", "bcd", "cde", "def", 
    # ... 
    # "opq", "pqr", "qrs", "rst", "stu", "tuv", "uvw", "vwx", "wxy", "xyz", 
    # "yz{", "z{|", "{|}", "|}~"] 
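The windows listed above are essentially the alternation the question started to write by hand. As a rough sketch, and limited to printable ASCII as assumed in this answer, they can be turned into a single regex with `Regexp.union`. The helper name `consecutive_run_regex` is made up for this example.

```ruby
# Hypothetical helper, not part of the answer: one regex built from every
# window of min_run_size consecutive printable ASCII characters.
def consecutive_run_regex(min_run_size)
  windows = (32.chr..126.chr).each_cons(min_run_size).map(&:join)
  Regexp.union(windows)   # escapes metacharacters such as "$" and "("
end

re = consecutive_run_regex(3)
p !!(re =~ "xabc$fghrtuvwx3!")   #=> true
p !!(re =~ "acegik")             #=> false
p "11abcd$4567".scan(re)         #=> ["abc", "456"]
```

Note that `scan` only returns fixed-size windows here, not the full runs, so this answers the yes/no question rather than reproducing the output in the question's example.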

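Another interpretation

My initial interpretation was that all substrings of three or more characters that meet the criterion should be returned (even though that is inconsistent with the example). I will nevertheless leave my solution to that problem below.

Code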

def runs_of_min_size_or_more(str, min_run_size)
  arr = str.chars
  (arr.size-min_run_size+1).times.with_object([]) do |_,a|
    run = arr.lazy.slice_when { |x,y| y != x.next }.first
    a << run.join if run.size >= min_run_size
    arr.shift
  end
end

str = "xabc$fghrtuvwx3!" 

runs_of_min_size_or_more str, 1 
    #=> ["x", "abc", "bc", "c", "$", "fgh", "gh", "h", "r", "tuvwx", 
    # "uvwx", "vwx", "wx", "x", "3", "!"] 
runs_of_min_size_or_more str, 2 
    #=> ["abc", "bc", "fgh", "gh", "tuvwx", "uvwx", "vwx", "wx"] 
runs_of_min_size_or_more str, 3 
    #=> ["abc", "fgh", "tuvwx", "uvwx", "vwx"] 
runs_of_min_size_or_more str, 4 
    #=> ["tuvwx", "uvwx"] 
runs_of_min_size_or_more str, 5 
    #=> ["tuvwx"] 
runs_of_min_size_or_more str, 6 
    #=> [] 
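As a possible alternative under the same interpretation, the maximal runs could be computed once and each run's suffixes emitted afterwards, instead of re-slicing after every `shift`. This is only a sketch; the name `runs_via_maximal_runs` is made up, and `slice_when` again requires Ruby 2.2 or later.

```ruby
# Hypothetical alternative: split into maximal runs once, then list every
# suffix of each run that is still at least min_run_size long.
def runs_via_maximal_runs(str, min_run_size)
  str.each_char
     .slice_when { |x, y| y != x.next }
     .flat_map do |run|
       # The range is empty when the run is shorter than min_run_size.
       (0..run.size - min_run_size).map { |i| run[i..-1].join }
     end
end

str = "xabc$fghrtuvwx3!"
p runs_via_maximal_runs(str, 3)  #=> ["abc", "fgh", "tuvwx", "uvwx", "vwx"]
p runs_via_maximal_runs(str, 5)  #=> ["tuvwx"]
p runs_via_maximal_runs(str, 6)  #=> []
```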

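Explanation

See Enumerable#slice_when, which first made its appearance in Ruby v2.2. I made slice_when a lazy enumerator; by appending .first after its block, slicing terminates as soon as the first slice has been obtained.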
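To see the early termination, one can count how often the block is called. The snippet below is an illustrative sketch with a made-up input string; the counter `calls` is not part of the answer's code.

```ruby
arr = "abc$defghijk".chars
calls = 0

run = arr.lazy.slice_when { |x, y|
  calls += 1          # count how many neighbouring pairs get compared
  y != x.next
}.first

p run     #=> ["a", "b", "c"]
p calls   #=> 3, the pairs a/b, b/c and c/$

# For comparison, forcing every slice walks the whole array.
p arr.slice_when { |x, y| y != x.next }.to_a.size  #=> 3
```

Only the pairs up to the end of the first slice are examined, even though the array has twelve elements.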

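The easiest way to see what is going on is to insert some puts statements into the code and then execute it. I have also broken

run = arr.lazy.slice_when { |x,y| y != x.next }.first 

into

slice = arr.lazy.slice_when { |x,y| y != x.next } 
run = slice.first 

but since slice is an enumerator, I print slice.to_a, which is the array of elements the enumerator will generate.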

def runs_of_min_size_or_more(str, min_run_size)
  arr = str.chars
  rv = (arr.size-min_run_size+1).times.with_object([]) do |_,a|
    puts "arr=#{arr}"
    puts " a=#{a}"
    slice = arr.lazy.slice_when { |x,y| y != x.next }
    puts " slice.to_a=#{slice.to_a}"
    run = slice.first
    puts " run=#{run}"
    puts " reject '#{run.join}' because run.size=#{run.size} < #{min_run_size}" \
      if run.size < min_run_size
    a << run.join if run.size >= min_run_size
    puts " run.join=#{run.join}" if run.size >= min_run_size
    arr.shift
  end
  puts "arr=#{arr}"
  rv
end

runs_of_min_size_or_more "xabc$rtuv3!", 2 

arr=["x", "a", "b", "c", "$", "r", "t", "u", "v", "3", "!"] 
    a=[] 
    slice.to_a=[["x"], ["a", "b", "c"], ["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]] 
    run=["x"] 
    reject 'x' because run.size=1 < 2 
arr=["a", "b", "c", "$", "r", "t", "u", "v", "3", "!"] 
    a=[] 
    slice.to_a=[["a", "b", "c"], ["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]] 
    run=["a", "b", "c"] 
    run.join=abc 
arr=["b", "c", "$", "r", "t", "u", "v", "3", "!"] 
    a=["abc"] 
    slice.to_a=[["b", "c"], ["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]] 
    run=["b", "c"] 
    run.join=bc 
arr=["c", "$", "r", "t", "u", "v", "3", "!"] 
    a=["abc", "bc"] 
    slice.to_a=[["c"], ["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]] 
    run=["c"] 
    reject 'c' because run.size=1 < 2 

arr=["$", "r", "t", "u", "v", "3", "!"] 
    a=["abc", "bc"] 
    slice.to_a=[["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]] 
    run=["$"] 
    reject '$' because run.size=1 < 2 
arr=["r", "t", "u", "v", "3", "!"] 
    a=["abc", "bc"] 
    slice.to_a=[["r"], ["t", "u", "v"], ["3"], ["!"]] 
    run=["r"] 
    reject 'r' because run.size=1 < 2 
arr=["t", "u", "v", "3", "!"] 
    a=["abc", "bc"] 
    slice.to_a=[["t", "u", "v"], ["3"], ["!"]] 
    run=["t", "u", "v"] 
    run.join=tuv 
arr=["u", "v", "3", "!"] 
    a=["abc", "bc", "tuv"] 
    slice.to_a=[["u", "v"], ["3"], ["!"]] 
    run=["u", "v"] 
    run.join=uv 
arr=["v", "3", "!"] 
    a=["abc", "bc", "tuv", "uv"] 
    slice.to_a=[["v"], ["3"], ["!"]] 
    run=["v"] 
    reject 'v' because run.size=1 < 2 
arr=["3", "!"] 
    a=["abc", "bc", "tuv", "uv"] 
    slice.to_a=[["3"], ["!"]] 
    run=["3"] 
    reject '3' because run.size=1 < 2 
arr=["!"] 
    #=> ["abc", "bc", "tuv", "uv"] 
1

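I have been wondering whether this can be done with a regular expression. I found a way, but it does require some pre-processing. (It also only works for short strings, as pointed out in the comments. Oh well, some readers may still find the approach of interest.)

Code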

def runs_of_min_size_or_more(str, min_run_size)
  arr = []
  str.each_char.with_index.map { |c,i| (c.ord-i).chr }.
    join.
    scan(/(.)(?=(\1{#{min_run_size-1},}))/) do |m|
      offset = Regexp.last_match.begin(0)-1
      arr << m.join.gsub(/./) do |c|
        offset += 1
        (c.ord + offset).chr
      end
    end
  arr
end

str = "xabc$fghrtuvwx3!" 
min_run_size = 3 
runs_of_min_size_or_more(str, min_run_size) 
    #=> ["abc", "fgh", "tuvwx", "uvwx", "vwx"] 

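Explanation

Note that the return value of String#scan is not used; the only function of scan here is to build the array arr.

For the values of str and min_run_size given in the example, the steps are as follows. First comes the pre-processing step.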

arr = [] 
a = str.each_char.with_index.map { |c,i| (c.ord-i).chr } 
    #=> ["x", "`", "`", "`", " ", "a", "a", "a", "j", "k", "k", "k", "k", "k", 
    # "%", "\x12"] 
b = a.join 
    #=> "x``` aaajkkkkk%\x12" 
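The reason the pre-processing helps is that subtracting each character's index turns a run of consecutive characters into a run of identical ones, which a backreference can then match. Below is a minimal sketch of just that step; the helper name `shift_by_index` is made up. It also shows the limitation raised in the comments: once the index exceeds the character code, `Integer#chr` raises a RangeError.

```ruby
# Hypothetical helper isolating the pre-processing step of the answer.
def shift_by_index(str)
  str.each_char.with_index.map { |c, i| (c.ord - i).chr }.join
end

p shift_by_index("abc")      #=> "aaa"
p shift_by_index("xabc")     #=> "x```"
p shift_by_index("123456")   #=> "111111"

# "0".ord is 48, so the character at index 49 would map to -1.
begin
  shift_by_index("0" * 50)
rescue RangeError => e
  puts "RangeError: #{e.message}"
end
```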

The regular expression passed to scan, written here in free-spacing mode, does the following:

r = /
    (.)     # match any character in capture group 1 
    (?=      # begin a positive lookahead 
     (      # begin capture group 2 
     \1     # match the content of capture group 1... 
     {#{min_run_size-1},} # at least min_run_size-1 times 
    )      # end capture group 2 
    )      # end positive lookahead 
    /x      # free-spacing regex definition mode 
    #=>/
     (.)     # match any character in capture group 1 
     (?=     # begin a positive lookahead 
     (     # begin capture group 2 
      \1     # match the content of capture group 1... 
      {2,}    # at least min_run_size-1 times 
     )     # end capture group 2 
    )     # end positive lookahead 
    /x 

b.scan(r) do |m| 
    offset = Regexp.last_match.begin(0)-1 
    arr << m.join.gsub(/./) do |c| 
      offset += 1 
      (c.ord + offset).chr 
     end 
end 
    #=> "x``` aaajkkkkk%\x12" 
arr 
    #=> ["abc", "fgh", "tuvwx", "uvwx", "vwx"] 
+0

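`runs_of_min_size_or_more("0" * 50, 3)` will kill it. `runs_of_min_size_or_more("123456", 3)` finds four runs. But those things are minor and fixable. My main issue is that the amount of pre-processing and post-processing is equal to or greater than the solution itself, so you are using an unsuitable tool. It reminds me of the joke, "How do you measure the height of a building with a voltmeter and a stopwatch? Drop the voltmeter from the roof and time how long it takes to hit the ground." :P Sure, you can waste a voltmeter that way, or you can just take a piece of string. – Amadan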

+0

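@Amadan, it gets killed by strings that are not even that long (by raising on a negative argument to `chr`). That should have occurred to me. As you say, it is fixable (possibly in an ugly way). I intended the second example to return `["123456", "23456", "3456", "456"]`. We may be interpreting the question differently. As I hope you guessed, I did this only to see how close I could get to a regex solution. I thought it was a ham sandwich and a stopwatch. –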

+0

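There is no need to interpret: the OP has a test case, with input and expected output. I have never heard the ham sandwich one, but I think the voltmeter version provides a better garden path... – Amadan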
