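Trying to figure out the fastest way to filter a big array of strings in Perl by searching for a substring in its members. I'm using the following methods to find the matching list members:
$str =~ /\.xml/ - find ".xml" anywhere in the string
$str =~ /\.xml$/ - find ".xml" at the end of the string
substr($str,-4) eq ".xml" - are the last 4 characters ".xml"?
rindex($str, ".xml") >= 0 - any occurrence of ".xml"?
(length($str) - rindex($str,".xml")) == 4 - are the last 4 characters ".xml"?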
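These five tests are not interchangeable, which matters when comparing their speed. A quick way to see this is to run them on a few hand-picked strings. The sample strings and hash keys below are made up for illustration. The "anywhere" tests also accept a name like a3.xml.txt; /\.xml$/ also accepts a name with a trailing newline; and the length/rindex test accepts any 3-character string without ".xml", because rindex returns -1 when nothing is found (3 - (-1) == 4). The last case cannot occur in the generated benchmark list, where every name is longer than 3 characters, but it can occur in general.

```perl
use strict;
use warnings;

# The five tests from the list above (hash keys are illustrative names).
my %test = (
    match_any  => sub { $_[0] =~ /\.xml/ },
    match_end  => sub { $_[0] =~ /\.xml$/ },
    substr_end => sub { substr($_[0], -4) eq '.xml' },
    rindex_any => sub { rindex($_[0], '.xml') >= 0 },    # rindex returns -1 if not found
    lenrindex  => sub { length($_[0]) - rindex($_[0], '.xml') == 4 },
);

# Made-up samples, including a 3-char string and a name with a trailing newline.
my @samples = ('a1.xml', 'a2.txt', 'a3.xml.txt', 'abc', "a4.xml\n");

my %hits;
for my $name (sort keys %test) {
    $hits{$name} = [ grep { $test{$name}->($_) } @samples ];
    # Show "\n" literally so the output stays on one line per test.
    printf "%-10s %s\n", $name, join ' ', map { s/\n/\\n/r } @{ $hits{$name} };
}
```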
I tried all of the above with while/if/push and inside grep, using the following code (revised with ideas from the comments):
use 5.016;
use warnings;
use Benchmark qw(:all);
my $nmax = 5_000_000;
# $nmax random names like "a12345678.txt" / "a12345678.xml" (roughly half of each)
my @list = map { sprintf "a%s.%s", int(rand(100000000)), (int(rand(2))%2?"txt":"xml") } 1..$nmax;
cmpthese(10, {
# while/each + push variants
'whl_match' => sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if($x =~ /\.xml/ )}; },
'whl_matchend' => sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if($x =~ /\.xml$/)}; },
'whl_matchendz'=> sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if($x =~ /\.xml\z/)}; },
'whl_substr' => sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if(substr($x,-4) eq ".xml")}; },
'whl_rindex' => sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if(rindex($x,".xml") >= 0)}; },
'whl_lenrindex'=> sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if((length($x)-rindex($x,".xml"))==4)};},
# foreach + push variants
'for_match' => sub { my @xml; for my $x (@list) { push(@xml, $x) if($x =~ /\.xml/ )}; },
'for_matchend' => sub { my @xml; for my $x (@list) { push(@xml, $x) if($x =~ /\.xml$/)}; },
'for_matchendz'=> sub { my @xml; for my $x (@list) { push(@xml, $x) if($x =~ /\.xml\z/)}; },
'for_substr' => sub { my @xml; for my $x (@list) { push(@xml, $x) if(substr($x,-4) eq ".xml")}; },
'for_rindex' => sub { my @xml; for my $x (@list) { push(@xml, $x) if(rindex($x,".xml") >= 0)}; },
'for_lenrindex'=> sub { my @xml; for my $x (@list) { push(@xml, $x) if((length($x)-rindex($x,".xml"))==4)};},
# grep variants
'grp_match' => sub { my @xml = grep { /\.xml/ } @list; },
'grp_matchend' => sub { my @xml = grep { /\.xml$/ } @list; },
'grp_matchendz'=> sub { my @xml = grep { /\.xml\z/ } @list; },
'grp_substr' => sub { my @xml = grep { substr($_,-4) eq ".xml" } @list; },
'grp_rindex' => sub { my @xml = grep { rindex($_,".xml") >= 0 } @list; },
'grp_lenrindex'=> sub { my @xml = grep { (length($_) - rindex($_,".xml")) == 4 } @list; },
});
Results on my (not very fast) notebook:
s/iter whl_matchend whl_matchendz grp_matchendz grp_matchend whl_lenrindex whl_match whl_substr grp_match whl_rindex for_matchend for_matchendz for_lenrindex for_match grp_lenrindex for_substr for_rindex grp_substr grp_rindex
whl_matchend 4.48 -- -0% -10% -12% -17% -21% -24% -25% -32% -47% -47% -67% -70% -70% -73% -73% -77% -78%
whl_matchendz 4.46 0% -- -9% -11% -17% -21% -23% -25% -32% -46% -46% -67% -69% -70% -73% -73% -76% -78%
grp_matchendz 4.05 11% 10% -- -2% -9% -13% -15% -17% -25% -41% -41% -63% -66% -67% -70% -70% -74% -76%
grp_matchend 3.96 13% 13% 2% -- -6% -11% -14% -15% -24% -40% -40% -62% -66% -66% -70% -70% -73% -75%
whl_lenrindex 3.70 21% 21% 9% 7% -- -5% -8% -9% -18% -35% -35% -60% -63% -64% -67% -67% -72% -73%
whl_match 3.53 27% 27% 15% 12% 5% -- -3% -5% -14% -32% -32% -58% -61% -62% -66% -66% -70% -72%
whl_substr 3.42 31% 30% 18% 16% 8% 3% -- -2% -12% -30% -30% -57% -60% -61% -65% -65% -69% -71%
grp_match 3.36 33% 33% 20% 18% 10% 5% 2% -- -10% -29% -29% -56% -59% -60% -64% -64% -69% -71%
whl_rindex 3.02 48% 48% 34% 31% 22% 17% 13% 11% -- -21% -21% -51% -55% -56% -60% -60% -65% -67%
for_matchend 2.40 87% 86% 69% 65% 55% 47% 43% 40% 26% -- -0% -38% -43% -44% -50% -50% -56% -59%
for_matchendz 2.39 87% 87% 69% 65% 55% 47% 43% 40% 26% 0% -- -38% -43% -44% -50% -50% -56% -59%
for_lenrindex 1.49 201% 200% 172% 166% 149% 137% 130% 126% 103% 61% 61% -- -8% -10% -19% -19% -29% -33%
for_match 1.36 229% 227% 197% 191% 172% 159% 151% 146% 122% 76% 76% 9% -- -2% -11% -12% -23% -27%
grp_lenrindex 1.33 237% 236% 204% 198% 178% 165% 157% 153% 127% 80% 80% 12% 2% -- -9% -9% -21% -26%
for_substr 1.21 271% 270% 235% 228% 207% 192% 184% 178% 150% 98% 98% 23% 13% 10% -- -0% -13% -18%
for_rindex 1.20 272% 271% 236% 229% 208% 193% 184% 179% 151% 99% 99% 23% 13% 10% 0% -- -13% -18%
grp_substr 1.05 326% 325% 285% 277% 252% 235% 226% 220% 188% 128% 128% 41% 30% 27% 15% 15% -- -6%
grp_rindex 0.990 352% 351% 309% 300% 274% 256% 246% 239% 205% 142% 142% 50% 38% 34% 22% 22% 6% --
I repeated the test many times and always got the same order as above.
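Question 1:
As I expected, grep is faster than the equivalent while/if/push, but the following surprised me.
Compare:
s/iter
whl_matchend 4.54
grp_matchend 3.98
Here grep is only slightly faster than the equivalent while/if/push.
But why, for example, in this case:
whl_substr 3.23
grp_substr 1.05
is grep 3 times as fast as while/if/push? So why is grep nearly 3 times faster than while/if/push when doing substr, but barely faster (only 14%) when using /regex-match/? The same can be seen with the other "string operations".
In other words, grep {/regex/} gives only a slight speedup over while/if/push, but grep {substr} gives a huge one. Why?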
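Not an answer, but one way to narrow Question 1 down is to separate the cost of walking the list from the cost of the test itself. The sketch below is an assumption-laden probe, not the benchmark above. It uses hypothetical sub names, a smaller 200_000-element list, and cmpthese(-1, ...) so each variant runs for at least one CPU second. It adds "baseline" subs with the same loop shape but a near-free test that selects nothing, next to the substr and end-anchored regex variants. Comparing each variant with its baseline gives a rough idea of how much the test and the pushes add on top of the bare loop. Numbers will differ by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative input: even numbers get ".xml", odd numbers ".txt".
# Much smaller than the question's 5_000_000 so the sketch runs quickly.
my @list = map { "a$_." . ($_ % 2 ? 'txt' : 'xml') } 1 .. 200_000;

my %filters = (
    # Baselines: same loop shapes, but a near-free test that selects nothing,
    # so they mostly time the iteration itself.
    for_baseline => sub {
        my @o;
        for my $x (@list) { push @o, $x if !defined $x }
        return \@o;
    },
    grep_baseline => sub { my @o = grep { !defined $_ } @list; return \@o },

    # The two comparisons from Question 1.
    for_substr => sub {
        my @o;
        for my $x (@list) { push @o, $x if substr($x, -4) eq '.xml' }
        return \@o;
    },
    grep_substr => sub { my @o = grep { substr($_, -4) eq '.xml' } @list; return \@o },
    for_match => sub {
        my @o;
        for my $x (@list) { push @o, $x if $x =~ /\.xml$/ }
        return \@o;
    },
    grep_match => sub { my @o = grep { /\.xml$/ } @list; return \@o },
);

# Sanity check: all non-baseline filters must select the same elements.
my $want = join "\0", @{ $filters{grep_substr}->() };
for my $name (grep { !/baseline/ } sort keys %filters) {
    die "$name selects different elements\n"
        unless join("\0", @{ $filters{$name}->() }) eq $want;
}

cmpthese(-1, \%filters);
```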
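Question 2
Another surprise (at least for me) is this: why is $str =~ /\.xml/ faster than $str =~ /\.xml$/? I expected that specifying $ would speed up the regex, because it would not need to search the whole string, but that assumption turned out to be wrong, as the next test also shows: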
use 5.016;
use warnings;
use Benchmark qw(:all);
my $str = "a38877283.xml";
cmpthese(10, {
'match' => sub { $str =~ /\.xml/ for (1..5_000_000) },
'matchend' => sub { $str =~ /\.xml$/ for (1..5_000_000) },
'matchendz' => sub { $str =~ /\.xml\z/ for (1..5_000_000) }, #updated the \z
});
With perl 5, version 20, subversion 0 (v5.20.0) built for darwin-2level
(perlbrew):
s/iter matchend matchendz match
matchend 2.32 -- -1% -64%
matchendz 2.30 1% -- -63%
match 0.844 175% 173% --
With perl 5, version 16, subversion 2 (v5.16.2) built for darwin-thread-multi-2level
(the OS X default):
Rate matchendz match matchend
matchendz 0.405/s -- -69% -70%
match 1.29/s 218% -- -5%
matchend 1.36/s 235% 5% --
The older perl is faster. ;)
OS:
Darwin jabko.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun 3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
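This does not explain the speed difference, but /\.xml$/ and /\.xml\z/ are not the same test. Without the /m modifier, $ matches at the end of the string or just before a final newline, while \z matches only at the very end. The sketch below runs the three patterns from the benchmark on a few made-up names, one of them with a trailing newline.

```perl
use strict;
use warnings;

# $ matches at the very end OR just before a final "\n"; \z only at the very end.
my %result;
for my $s ('a1.xml', "a2.xml\n", 'a3.xml.bak') {
    $result{$s} = [
        ($s =~ /\.xml/   ? 1 : 0),    # anywhere
        ($s =~ /\.xml$/  ? 1 : 0),    # end, or before a trailing newline
        ($s =~ /\.xml\z/ ? 1 : 0),    # absolute end only
    ];
    printf "%-14s any=%d dollar=%d z=%d\n", ($s =~ s/\n/\\n/r), @{ $result{$s} };
}
```

To see what the regex optimizer decides for each pattern, compiling a single match under the re debug pragma may help, e.g. perl -Mre=debug -e '"a38877283.xml" =~ /\.xml$/' (the debug output goes to STDERR).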
Last question: I still haven't tested the effect of precompiled regexes (qr). Any other ideas about what could be the fastest filter?
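A possible starting point for the qr test, sketched under assumptions. As far as I know, a regex literal with no interpolated variables is already compiled once when the program is compiled, so qr// is not expected to save much here; it mainly helps when the pattern is built at run time. The sketch lets you measure that rather than assume it. The sub names and the 200_000-element list are illustrative. The index variant is an extra idea not in the question: starting the search 4 characters from the end means index can only succeed if the string ends in ".xml". A sanity check confirms that all variants select the same elements before timing them.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative input: every third name ends in ".xml".
my @list = map { sprintf 'a%d.%s', $_, ($_ % 3 ? 'txt' : 'xml') } 1 .. 200_000;

# Precompiled once, outside the timed subs.
my $re = qr/\.xml\z/;

my %filters = (
    grep_literal => sub { my @o = grep { /\.xml\z/ } @list; \@o },
    grep_qr      => sub { my @o = grep { $_ =~ $re } @list; \@o },
    grep_substr  => sub { my @o = grep { substr($_, -4) eq '.xml' } @list; \@o },
    # Start searching 4 chars before the end: a 4-char needle can then
    # only be found if it sits exactly at the end of the string.
    grep_index   => sub { my @o = grep { index($_, '.xml', length($_) - 4) >= 0 } @list; \@o },
);

# All variants must agree before timing them is meaningful.
my $want = join "\0", @{ $filters{grep_substr}->() };
for my $name (sort keys %filters) {
    die "$name selects different elements\n"
        unless join("\0", @{ $filters{$name}->() }) eq $want;
}

cmpthese(-1, \%filters);
```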
Your 'rindex' test should be '>= 0'. Worth trying 'sub { my @xml; for my $x (@list) { push @xml, $x if $x =~ /\.xml/ } }' etc. – Borodin 2014-08-28 10:52:29
Thanks, I'll add it and update the question with the results. – jm666 2014-08-28 10:54:33
@Borodin Yes, 'for' is definitely much faster than 'while'. Updated the results. – jm666 2014-08-28 11:26:46