2017-09-13
1

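How does regex substitution work in Perl?

I tried to remove the duplicates from the string "a","b","b","a","c"; after the removal the result is "a","b","c",. I have achieved this, but I have a doubt about how the regex substitution works.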

use warnings; 
use strict; 
my $s = q+"a","b","b","a","c"+; 

$s=~s/ ("\w"),?/($s=~s|($1)||g)?"$1,":"" /xge; 
#^     ^
#|     Consider this as s2 
#Consider this as s1 

print "\n$s\n\n"; 

s1 holds the string "a","b","b","a","c".

Step 1

After the substitution:

Which data does s1 hold at this point: "a","b","b","c", or "a","b","b","a","c", or ,"b","b",,"c"?

I ran it with a regex code block, (?{ ... }):

$s=~s/ ("\w"),? (?{print "$s\n"})/ ($s=~s|($1)||g)?"$1,":"" /xge; 

The result is:

"a","b","b","a","c" 
,"b","b",,"c" #This is from after substitution 
,,,,"c" 
,,,,"c" 
,,,,"c" 

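Now my doubt is this: s2 also operates on $s, so why is its result not combined with s1? In other words, after the second step I expected the result to be "a","b","b","c" (every "a" replaced with the empty string, and one "a" added back to $s).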


Edit

The result from the code block (?{print $s}) is:

"a","b","b","a","c" 
,"b","b",,"c" 
,,,,"c" 
,,,,"c" 
,,,,"c" 

When I print the $s variable after the substitution line, it gives "a","b","c". How does that output come about?
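
The printed trace is consistent with one reading: the outer substitution scans the string as it was when matching started, the inner substitution edits the live $s in the meantime, and the outer result overwrites $s at the end. The sketch below simulates that reading with explicit variables. The names `$snapshot`, `$live` and `$result` are hypothetical, and this is a model of the observed behaviour, not a description of Perl's internals.

```perl
use strict;
use warnings;

# Hypothetical model of the nested substitution, not Perl's actual internals.
my $snapshot = q+"a","b","b","a","c"+;   # what the outer s///g scans
my $live     = $snapshot;                # what the inner s|||g edits (the role of $s)
my $result   = '';

while ($snapshot =~ /("\w"),?/g) {
    my $element = $1;   # save $1 before the inner substitution resets it

    # True only the first time an element is met, because this deletes
    # every copy of that element from $live.
    my $removed = ($live =~ s/\Q$element\E//g);

    $result .= "$element," if $removed;
}

print "$result\n";   # "a","b","c",
```

Under this model the first "a" and the first "b" are kept because the inner substitution still finds them in `$live`; the later copies are dropped because they have already been deleted there.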

+0

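It is rather hard to examine a regex and infer what it is supposed to do. Perhaps split it over multiple lines and add a comment to each part; then others may be willing to invest some time in your problem. –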

+0

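From my point of view, your problem comes from the spaces in your regex. If you try `$s =~ s/ ("\w"),? /_/g; print "\nString after replacing \$1 with _: $s";`, you will notice that your string is unchanged, but if you remove the spaces, giving `$s =~ s/("\w"),?/_/g;`, then `$1` will be replaced by `_`. –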
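
Note that the substitution in the question carries the /x flag, under which unescaped whitespace in the pattern is ignored, so the effect described in this comment only appears when /x is absent. A minimal sketch of the difference (the variable names `$with_x` and `$without_x` are made up for illustration):

```perl
use strict;
use warnings;

my $with_x    = q+"a","b","b","a","c"+;
my $without_x = $with_x;

# With /x the spaces in the pattern are ignored, so every element matches.
$with_x =~ s/ ("\w"),? /_/xg;

# Without /x the spaces are literal; the string has none, so nothing matches.
$without_x =~ s/ ("\w"),? /_/g;

print "$with_x\n";      # _____
print "$without_x\n";   # "a","b","b","a","c"
```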

+0

I think there are [some options](https://perldoc.perl.org/re.html#%27Debug%27-mode) in the `re` module for debugging. –

Answers

6

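Regex is (in my opinion) the wrong tool to use here. I would

  • split the string on commas
  • remove duplicates from the list returned by split
  • join the list back into a string

Like this: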

#!/usr/bin/perl 

use strict; 
use warnings; 
use feature 'say'; 

my $str = q["a","b","b","a","c"]; 

my %seen; 

$str = join ',', 
     grep { ! $seen{$_}++ } 
     split /,/, $str; 

say $str; 
+0

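Thanks for your answer. Before using regex I tried using a hash, but the data came out shuffled, so I moved to regex. Your answer solves my hash shuffling problem. But I am still curious to know how the regex substitution works. – mkHun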
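
The earlier hash attempt is not shown, so the following is only a sketch under the assumption that it read the unique values back with `keys`. `keys` returns hash keys in no guaranteed order, whereas the `grep` with `%seen` walks the list in input order and keeps the first occurrence of each element. The names `%unique`, `$from_keys` and `$from_grep` are made up for illustration.

```perl
use strict;
use warnings;

my @elements = split /,/, q["a","b","b","a","c"];

# keys() returns hash keys in no guaranteed order, so this can look shuffled.
my %unique    = map { $_ => 1 } @elements;
my $from_keys = join ',', keys %unique;

# grep keeps the elements in input order and drops later repeats.
my %seen;
my $from_grep = join ',', grep { !$seen{$_}++ } @elements;

print "keys: $from_keys\n";   # order may vary between runs
print "grep: $from_grep\n";   # "a","b","c"
```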

2

The correct solution is to split, filter and rejoin, as @Dave Cross has already shown.

...

However, the following regex solution does work, and hopefully illustrates why Dave's solution is superior:

#!/usr/bin/env perl 

use v5.10; 
use strict; 
use warnings; 

my $str = q{"a","b","b","a","c"}; 

1 while $str =~ s{ 
    \A 
    (?: (?&element) ,)* 
    ((?&element))   # Capture in \1 
    (?: , (?&element))* 
    \K 
    , 
    \1      # Remove the duplicate along with preceding comma 
    (?= \z | ,) 

    (?(DEFINE) 
     (?<element> 
      " 
      \w 
      " 
     ) 
    ) 
}{}xg; 

say $str; 

Output:

"a","b","c"
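
The `\K` in the pattern above is what lets the substitution delete only the duplicate and its preceding comma, even though the match has to start further left to capture the first occurrence. A minimal sketch of that one feature on a shorter, made-up string (requires Perl 5.10 or later):

```perl
use strict;
use warnings;

my $str = q{"a","b","a"};

# Everything matched before \K must be present but is left untouched;
# only the text matched after \K (the comma and the repeat) is replaced.
$str =~ s{ ("\w") , "\w" \K , \1 }{}x;

print "$str\n";   # "a","b"
```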