2009-12-21
2

Python regular expression inconsistency

I get different results depending on whether I precompile the regular expression:

>>> re.compile('mr', re.IGNORECASE).sub('', 'Mr Bean') 
' Bean' 
>>> re.sub('mr', '', 'Mr Bean', re.IGNORECASE) 
'Mr Bean' 

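The Python documentation says that some of the functions are simplified versions of the full-featured methods for compiled regular expressions. But it also claims that RegexObject.sub() is identical to the sub() function.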

So what is going on here?

Answers

12

It seems that re.sub() cannot accept re.IGNORECASE.

The documentation states:

sub(pattern, repl, string, count=0)

Return the string obtained by replacing the leftmost 
non-overlapping occurrences of the pattern in string by the 
replacement repl. repl can be either a string or a callable; 
if a string, backslash escapes in it are processed. If it is 
a callable, it's passed the match object and must return 
a replacement string to be used.

Using an inline flag in its place works, though:

re.sub("(?i)mr", "", "Mr Bean") 
5

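The module-level sub() call does not accept modifiers as its last argument. That position is the "count" parameter: the maximum number of pattern occurrences to be replaced.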
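To see why the flag is silently swallowed rather than rejected, here is a minimal sketch. It assumes a current Python 3 interpreter (the thread itself used Python 2.6), and the sample string and variable names are made up for illustration. It checks that re.IGNORECASE has the integer value 2, and that passing it in the fourth position behaves like count=2.

```python
import re
import warnings

# re.IGNORECASE is an integer flag; its numeric value is 2.
flag_value = int(re.IGNORECASE)

with warnings.catch_warnings():
    # Newer Python versions may warn when count is passed positionally.
    warnings.simplefilter("ignore")
    # The 4th positional argument is count, so this call means count=2.
    # The match itself stays case-sensitive.
    flag_as_count = re.sub('a', '-', 'aAaAaA', re.IGNORECASE)

# The same call with count spelled out explicitly.
explicit_count = re.sub('a', '-', 'aAaAaA', count=2)

print(flag_value)       # 2
print(flag_as_count)    # -A-AaA
print(explicit_count)   # -A-AaA
```

In the original question, 'Mr Bean' contains no lowercase 'mr', so nothing is replaced no matter what count is.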

4
>>> help(re.sub) 
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

Unlike re.compile, re.sub has no parameter for regex flags (IGNORECASE, MULTILINE, DOTALL).

Alternatives:

>>> re.sub("[Mm]r", "", "Mr Bean") 
' Bean' 

>>> re.sub("(?i)mr", "", "Mr Bean") 
' Bean' 
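The two alternatives are not fully equivalent. As a rough illustration (the input list and variable names are invented for this sketch), a character class only covers the spellings listed in it, while the inline (?i) flag makes the whole pattern case-insensitive:

```python
import re

# Four capitalisations of the same prefix.
inputs = ["Mr Bean", "mr Bean", "MR Bean", "mR Bean"]

# A character class on the first letter still requires a lowercase 'r'.
by_class = [re.sub("[Mm]r", "", s) for s in inputs]

# The inline (?i) flag applies case-insensitivity to the whole pattern.
by_inline_flag = [re.sub("(?i)mr", "", s) for s in inputs]

print(by_class)        # [' Bean', ' Bean', 'MR Bean', 'mR Bean']
print(by_inline_flag)  # [' Bean', ' Bean', ' Bean', ' Bean']
```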

Edit: Python 3.1 added a flags argument for regular expressions, see http://docs.python.org/3.1/whatsnew/3.1.html. As of 3.1, the signature of re.sub looks like this:

re.sub(pattern, repl, string[, count, flags]) 
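Assuming an interpreter recent enough to have this flags parameter, a short sketch of the call looks as follows. Passing flags by keyword keeps it from landing in the count position. The second sample string is made up for illustration.

```python
import re

# flags is passed by keyword, so it cannot be mistaken for count.
result = re.sub('mr', '', 'Mr Bean', flags=re.IGNORECASE)

# count and flags can be combined; only the first match is replaced here.
first_only = re.sub('mr', '', 'Mr Bean, MR Bean', count=1, flags=re.IGNORECASE)

print(repr(result))      # ' Bean'
print(repr(first_only))  # ' Bean, MR Bean'
```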
2

From the Python 2.6.4 documentation:

re.sub(pattern, repl, string[, count]) 

re.sub() takes no flags argument for setting options on the regex pattern. If you want re.IGNORECASE, you have to use re.compile().sub().
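One way to package that advice is a small wrapper. The helper name sub_with_flags below is hypothetical and not part of the re module. It is only a sketch of the compile-then-substitute pattern for interpreters whose re.sub lacks a flags parameter.

```python
import re

def sub_with_flags(pattern, repl, string, count=0, flags=0):
    """Hypothetical helper: behaves like re.sub, but compiles the
    pattern first so that flags can be supplied."""
    return re.compile(pattern, flags).sub(repl, string, count)

result = sub_with_flags('mr', '', 'Mr Bean', flags=re.IGNORECASE)
print(repr(result))  # ' Bean'
```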