2014-08-30 94 views
1

樣品字符串:找到所有分隔符旁邊串子和蟒蛇代替

s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>" 

我需要這個轉換爲:

s = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)" 

這將需要在開始和結束兩個工作標記和所有分隔符,如「。」,「,」,「 - 」,「(」,「)」等。

我可以做一個搜索並替換爲「)」等等,但顯然我想要一些更性感的東西。

所以基本上移動標籤外的所有分隔符。

謝謝!

回答

4

下面的正則表達式可以幫助您將打開和關閉標記內的分隔符移動到結束標記的下一個。

(<sec>)([^.,()-]*)([.,()-])(<\/sec>) 

替換字符串:

\1\2\4\3 

DEMO

>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>" 
>>> re.sub(r'(<sec>)([^.,()-]*)([.,()-])(<\/sec>)', r'\1\2\4\3', s) 
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)' 

OR

這將適用於任何標籤,

>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>" 
>>> re.sub(r'(<(\S+?\b)[^>]*>)([^.,()-]*)([.,()-])(<\/\2>)', r'\1\3\5\4', s) 
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)' 
+0

非常感謝,這正是我所需要的! – SupsH 2014-08-30 18:51:30

2

的其他正則表達式的變化:

>>> s = "Nicely<sec>, John</sec> said hi to a woman (named <sec>Mary)</sec>" 
>>> re.sub(r'((?:<[^>]+>)?)(*[-.(),]+ *)((?:</[^>]+>)?)',r'\3\2\1',s) 
#       ^^  ^^ 
#     move spaces with the punctuation 
#      remove that if not needed 

'Nicely, <sec>John</sec> said hi to a woman (named <sec>Mary</sec>)' 

的想法是交換開放標籤↔標點符號標點符號↔關閉標籤。