Ruby string search: which is faster, split or regex?

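This is a two-part question. Say you have a set of strings that can be split at a certain character (e.g. email addresses at '@' or file names at '.'). Which is the most performant way of getting the characters before the split character?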

my_string.split(char)[0]

or

my_string[/regex/]

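The second part of the question: how do you write a regex that gets everything before the first instance of a character? The regex below finds certain characters before the '.' (because '.' is not in the pattern), but that was just my hacky way of getting to a solution.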

my_string[/[A-Za-z0-9\_-]+/]

Thanks!


2011-09-23 kreek

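I doubt there will be a noticeable difference when processing email addresses (unless you're processing millions per second...). But why don't you measure it yourself and find out?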

The easiest way to answer the first part is, as always, to benchmark it with your real data. For example:

require 'benchmark' 
Benchmark.bm do |x| 
    x.report { 50000.times { a = '[email protected]'.split('@')[0] } } 
    x.report { 50000.times { a = '[email protected]'[/[^@]+/] } } 
end

which prints (on my setup):

       user     system      total        real
   0.130000   0.010000   0.140000 (  0.130946)
   0.090000   0.000000   0.090000 (  0.096260)

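So the regex solution looks a bit faster, but the difference is barely noticeable even with 50,000 iterations. OTOH, the regex solution says exactly what you mean ("give me everything before the first @"), whereas the split solution gets you the result in a slightly roundabout way.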

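The split approach is probably slower because it has to scan the whole string to split it into pieces, build an array of the pieces, then extract the first element of that array and throw the rest away. I don't know whether the VM is clever enough to recognize that it doesn't need to build the array, so this is just quick guesswork.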
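If the work of cutting the whole string is the concern, `String#split` takes an optional limit as its second argument and stops splitting once that many fields have been produced. A minimal sketch, using a made-up address purely for illustration; whether this is measurably faster is something to benchmark on your own data:

```ruby
# Hypothetical sample input, not taken from the question.
email = 'user@example.com@extra'

# Without a limit, split cuts at every '@' and builds a 3-element array.
all_parts = email.split('@')     # => ["user", "example.com", "extra"]

# With a limit of 2, split stops after the first '@'; the remainder is
# returned untouched as the second element.
two_parts = email.split('@', 2)  # => ["user", "example.com@extra"]

local_part = two_parts[0]        # => "user"
```

An array is still allocated either way, so this only trims the scanning and the number of pieces, not the allocation itself.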

As for your second question, say what you mean:

my_string[/[^.]+/]

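If you want everything before the first period, then say "everything until a period" rather than "the first chunk made up of these characters (which happen not to include a period)".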
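To make the difference between the two patterns concrete, here is a small sketch with hypothetical file names (not from the question). It also shows one caveat that seems worth knowing: neither pattern is anchored, so a string that starts with a period gives the text after that period. The `\A`-anchored variant is an addition of this sketch, not part of the answer above.

```ruby
# Hypothetical file names, chosen to show where the two patterns differ.
name = 'my report.txt'

allow_list = name[/[A-Za-z0-9_-]+/]  # => "my"        (stops at the space)
until_dot  = name[/[^.]+/]           # => "my report" (stops only at the period)

# Caveat: an unanchored pattern skips a leading period instead of
# reporting "nothing before the first period".
dotfile    = '.bashrc'
unanchored = dotfile[/[^.]+/]        # => "bashrc"
anchored   = dotfile[/\A[^.]*/]      # => ""  (anchored at the start, may be empty)
```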


2011-09-23 19:09:13

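Thanks, I didn't know Ruby had a built-in benchmarking tool. In my own tests I also found the regex to be faster, as long as the first substring is shorter than 50 characters; beyond that, split is faster. Of course, as mentioned, with small sets you'll hardly see a difference. – kreek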

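@Kreek: That's why you've got to love this place, learning new things :) I think it's best to start by expressing your intent as clearly as possible in code, and then worry about performance only if there really is a problem.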

@muistooshort Couldn't there be caching effects or page faults? – Benjamin

partition will be faster than split because it doesn't keep checking after the first match.

A regular slice with index will be faster than a regex slice.

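The regex slice also slows down significantly as the part of the string before the match gets longer. It becomes slower than the original split after about 10 characters, and then gets worse from there. If you have a regex without a + or * match, I think it fares somewhat better.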

require 'benchmark' 
n=1000000 

def bench n,email 
    printf "\n%s %s times\n", email, n 
    Benchmark.bm do |x| 
     x.report('split ') do n.times{ email.split('@')[0] } end 
     x.report('partition') do n.times{ email.partition('@').first } end 
     x.report('slice reg') do n.times{ email[/[^@]+/] } end 
     x.report('slice ind') do n.times{ email[0,email.index('@')] } end 
    end 
end 


bench n, '[email protected]' 
bench n, '[email protected]' 
bench n, '[email protected]' 
bench n, '[email protected]' 
bench n, '[email protected]omain.com' 
bench n, 'a'*254 + '@' + 'b'*253 # rfc limits 
bench n, 'a'*1000 + '@' + 'b'*1000 # for other string processing

Results on 1.9.3p484:

[email protected] 1000000 times
                user     system      total        real
split       0.405000   0.000000   0.405000 (  0.410023)
partition   0.375000   0.000000   0.375000 (  0.368021)
slice reg   0.359000   0.000000   0.359000 (  0.357020)
slice ind   0.312000   0.000000   0.312000 (  0.309018)

[email protected] 1000000 times
                user     system      total        real
split       0.421000   0.000000   0.421000 (  0.432025)
partition   0.374000   0.000000   0.374000 (  0.379021)
slice reg   0.421000   0.000000   0.421000 (  0.411024)
slice ind   0.312000   0.000000   0.312000 (  0.315018)

[email protected] 1000000 times
                user     system      total        real
split       0.593000   0.000000   0.593000 (  0.589034)
partition   0.531000   0.000000   0.531000 (  0.529030)
slice reg   0.764000   0.000000   0.764000 (  0.771044)
slice ind   0.484000   0.000000   0.484000 (  0.478027)

[email protected]ously-extra-long-silly-domain.com 1000000 times
                user     system      total        real
split       0.483000   0.000000   0.483000 (  0.481028)
partition   0.390000   0.016000   0.406000 (  0.404023)
slice reg   0.406000   0.000000   0.406000 (  0.411024)
slice ind   0.312000   0.000000   0.312000 (  0.344020)

[email protected]omain.com 1000000 times
                user     system      total        real
split       0.639000   0.000000   0.639000 (  0.646037)
partition   0.609000   0.000000   0.609000 (  0.596034)
slice reg   0.764000   0.000000   0.764000 (  0.773044)
slice ind   0.499000   0.000000   0.499000 (  0.491028)

a<254>@b<253> 1000000 times
                user     system      total        real
split       0.952000   0.000000   0.952000 (  0.960055)
partition   0.733000   0.000000   0.733000 (  0.731042)
slice reg   3.432000   0.000000   3.432000 (  3.429196)
slice ind   0.624000   0.000000   0.624000 (  0.625036)

a<1000>@b<1000> 1000000 times
                user     system      total        real
split       1.888000   0.000000   1.888000 (  1.892108)
partition   1.170000   0.016000   1.186000 (  1.188068)
slice reg  12.885000   0.000000  12.885000 ( 12.914739)
slice ind   1.108000   0.000000   1.108000 (  1.097063)

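2.1.3p242 shows about the same percentage differences, but is around 10-30% faster at everything, except for the regex variant, which slows down even more.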
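Speed aside, the four approaches benchmarked above are not interchangeable on edge cases, which may matter more than a few hundred milliseconds per million calls. The sketch below compares them on two hypothetical inputs: one without the delimiter and one that starts with it. Note that the raw `email[0, email.index('@')]` form raises a `TypeError` when there is no '@', because `index` returns `nil`; the `before_first` helper is a made-up name for a guarded version, not something from the benchmark.

```ruby
# Returns everything before the first occurrence of delim, or nil when
# delim is absent. The method name is made up for this sketch.
def before_first(str, delim)
  i = str.index(delim)
  i && str[0, i]
end

no_at = 'no-at-sign'  # hypothetical input without the delimiter
lead  = '@leading'    # hypothetical input starting with the delimiter

results = {
  split:     [no_at.split('@')[0],        lead.split('@')[0]],
  partition: [no_at.partition('@').first, lead.partition('@').first],
  regex:     [no_at[/[^@]+/],             lead[/[^@]+/]],
  index:     [before_first(no_at, '@'),   before_first(lead, '@')]
}
# results[:split]     => ["no-at-sign", ""]
# results[:partition] => ["no-at-sign", ""]
# results[:regex]     => ["no-at-sign", "leading"]
# results[:index]     => [nil, ""]
```

Which of these behaviours is right depends on what a missing or leading delimiter should mean for your data, so it is worth deciding that before picking the fastest variant.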


2014-10-13 21:43:04 Matt
