First, to get the spacing you want, replace \s* with \s*? so that it is non-greedy.
First fix:
>>> re.compile(r'(((iphone|games|mac)\s*?)+)', re.I).sub(r'<em>\1</em>', sentence)
'I love downloading <em>iPhone</em> <em>games</em> from my <em>mac</em>.'
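Unfortunately, once \s* is non-greedy, it splits up the phrase, as you can see. Without the ?, the output looks like this, with the two words grouped together (but with the trailing space inside the tag):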
>>> re.compile(r'(((iPhone|games|mac)\s*)+)').sub(r'<em>\1</em>', sentence)
'I love downloading <em>iPhone games </em>from my <em>mac</em>.'
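I can't yet think of how to fix this.
Note that in these patterns I've put an extra pair of parentheses around the whole + expression, so that group 1 captures the entire match rather than just its last repetition - that's the difference.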
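To make the point about the extra parentheses concrete, here is a small sketch. The variable names are mine, and re.I is used in both patterns just to keep them comparable. When a capturing group sits under a +, the group only keeps the text of its last repetition, so \1 loses the earlier words unless the whole thing is wrapped in another group.

```python
import re

sentence = 'I love downloading iPhone games from my mac.'

# Group 1 is the repeated group itself, so it keeps only the last repetition.
inner_only = re.compile(r'((iphone|games|mac)\s*)+', re.I)
# Group 1 wraps the whole repetition, so it keeps everything that matched.
wrapped = re.compile(r'(((iphone|games|mac)\s*)+)', re.I)

first_inner = inner_only.search(sentence).group(1)
first_wrapped = wrapped.search(sentence).group(1)

print(repr(first_inner))    # 'games '
print(repr(first_wrapped))  # 'iPhone games '

# The same effect shows up in the substitution: "iPhone" is silently dropped.
lossy = inner_only.sub(r'<em>\1</em>', sentence)
print(lossy)  # I love downloading <em>games </em>from my <em>mac</em>.
```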
Further update: actually, I can think of a way around it. You can decide whether you want it.
>>> regex = re.compile(r'((iphone|games|mac)(\s*(iphone|games|mac))*)', re.I)
>>> regex.sub(r'<em>\1</em>', sentence)
'I love downloading <em>iPhone games</em> from my <em>mac</em>.'
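If the keyword list changes often, the same "keyword, then any number of (whitespace + keyword)" shape can be built from a list. This is only a sketch: build_phrase_regex and emphasize are names I made up, the list is assumed to be non-empty, and I've used non-capturing (?:...) groups so that group 1 is the only capture.

```python
import re

def build_phrase_regex(keywords):
    """Compile a regex matching runs of keywords separated only by whitespace."""
    # re.escape guards against keywords that contain regex metacharacters.
    alternatives = '|'.join(re.escape(k) for k in keywords)
    # One keyword, then zero or more (optional whitespace + keyword) pairs.
    # Whitespace is only consumed when another keyword follows it, so the
    # match never ends with a trailing space.
    pattern = r'((?:{0})(?:\s*(?:{0}))*)'.format(alternatives)
    return re.compile(pattern, re.I)

def emphasize(sentence, keywords):
    """Wrap each run of adjacent keywords in a single <em> tag."""
    return build_phrase_regex(keywords).sub(r'<em>\1</em>', sentence)

result = emphasize('I love downloading iPhone games from my mac.',
                   ['iphone', 'games', 'mac'])
print(result)  # I love downloading <em>iPhone games</em> from my <em>mac</em>.
```

Putting the whitespace in front of the following keyword, rather than behind the preceding one, is what keeps the space out of the closing tag.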
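Update: taking your point about word boundaries into account, we just need to add \b in a few places so that the keywords only match at word boundaries.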
>>> regex = re.compile(r'(\b(iphone|games|mac)\b(\s*(iphone|games|mac)\b)*)', re.I)
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone games from my mac')
'I love downloading <em>iPhone games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone gameses from my mac')
'I love downloading <em>iPhone</em> gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhoney games from my mac')
'I love downloading iPhoney <em>games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhoney gameses from my mac')
'I love downloading iPhoney gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading miPhone gameses from my mac')
'I love downloading miPhone gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading miPhone games from my mac')
'I love downloading miPhone <em>games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone igames from my mac')
'I love downloading <em>iPhone</em> igames from my <em>mac</em>'
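As a smaller illustration of what \b is doing in the session above, here is a sketch with a single keyword and a made-up test string. Without \b the keyword matches inside longer words too. With \b on both sides it only matches as a whole word.

```python
import re

text = 'mac macro imac mac.'

# Plain substring-style matching: also hits "macro" and "imac".
loose = re.findall(r'mac', text, re.I)

# \b asserts a boundary between a word character and a non-word character
# (or the start/end of the string), so only the standalone "mac" matches.
strict = re.findall(r'\bmac\b', text, re.I)

print(len(loose))   # 4
print(strict)       # ['mac', 'mac']
```

Note that in the longer pattern there is no \b in front of the second keyword group. It isn't needed there, because that keyword can only start right after whitespace, which is already a boundary.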
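He has already covered case sensitivity with 're.I'. – snapshoe 2010-11-19 02:29:54
Quite right, I missed that. I suppose that's why he uses re.compile rather than re.sub - it seems 'flags' was only added to re.sub later on. – 2010-11-19 02:31:09
Thanks! The last one is perfect. – Sean 2010-11-19 03:19:01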