Negating a previously matched group

I'm trying to extract content from strings that look like this:

A.content content 
    content 
B.content C. content content 
content D.content

Here is my regular expression in Python:

reg = re.compile(r''' 
    (?xi) 
    (\w\.\t*\s*)+ (?# e.g. A. or b.) 
    (.+)   (?# the alphanumeric content with common symbols) 
    ^(?:\1)  (?# e.g. 'not A.' or 'not b.') 
    ''') 

m = reg.findall(s)

Let me give you an example. Say I have the following string:

s = ''' 
a. $1000 abcde!? 
b. (December 31, 1993.) 
c. 8/1/2013 
d. $690 * 10% = 69 Blah blah 
'''

The following regular expression works and returns the content of the regex group:

reg = re.compile(r''' 
      (?xi) 
      \w\.\t* 
      ([^\n]+) (?# anything not newline char) 
''') 

for c in reg.findall(s): print "line:", c 
>>>line: $1000 abcde!? 
>>>line: (December 31, 1993.) 
>>>line: 8/1/2013 
>>>line: $690 * 10% = 69 Blah blah

But if the content bleeds onto another line, the regex does not work.

s = ''' 
    a. $1000 abcde!? B.  December 
    31, 1993 c. 8/1/2013 D. $690 * 10% = 
    69 Blah blah 
''' 
reg = re.compile(r''' 
    (?xi) 
    (\w\.\t*\s*)+ (?# e.g. A. or b.) 
    (.+)   (?# the alphanumeric content with common symbols) 
    ^(?:\1)  (?# e.g. 'not A.' or 'not b.') 
    ''') 
for c in reg.findall(s): print "line:", c  # no matches :(
>>> blank :(

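I want to get the same matches whether or not line breaks separate the content.

That is why I tried to negate the previously matched group. So, any ideas on how to solve this, with a regex or with some other workaround?

Thanks.

Paul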

Source

2013-03-04 Paul

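Do you have some sample matches/non-matches? It's a bit hard to figure out what you want to do. – iamnotmaynard 2013-03-04 18:27:34

I have updated my question to provide an example and the output. – Paul 2013-03-04 20:45:18

This is still quite mysterious. Could you post what matches you want to get, and how they differ from what you actually get? – 2013-03-04 20:53:55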

I think I know what you want. You want

a. $1000 abcde!? B.  December 
31, 1993 c. 8/1/2013 D. $690 * 10% = 
69 Blah blah

to be split into

a. $1000 abcde!?
B. December \n31, 1993
c. 8/1/2013
D. $690 * 10% = \n69 Blah blah

right? Then a negative lookahead assertion is what you want:

reg = re.compile(r'''(?xs)  # inline flags must come first; no need for i, but s makes dot match newlines
    (\b\w\.\s*)   # e.g. A. or b. (word boundary to restrict to 1 letter)
    ((?:(?!\b\w\.).)+) # everything until the next A. or b.
    ''')

Use it with findall():

>>> reg.findall(s) 
[('a. ', '$1000 abcde!? '), ('B.  ', 'December \n 31, 1993 '), 
('c. ', '8/1/2013 '), ('D. ', '$690 * 10% = \n 69 Blah blah\n')]
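For anyone who wants to run this on a current Python 3, here is a minimal sketch of the same lookahead pattern. It assumes the wrapped sample string from the question, and it passes the flags as arguments instead of inline so that the pattern may begin with whitespace; the variable names are only for illustration.

```python
import re

# Sample input from the question: the items wrap across lines.
s = '''
    a. $1000 abcde!? B.  December
    31, 1993 c. 8/1/2013 D. $690 * 10% =
    69 Blah blah
'''

# Flags are passed as arguments (an assumption made here so the pattern
# can start with a newline on newer Python versions).
reg = re.compile(r'''
    (\b\w\.\s*)          # label such as "a. " or "B.  "
    ((?:(?!\b\w\.).)+)   # one character at a time, as long as no label starts here
    ''', re.VERBOSE | re.DOTALL)

matches = reg.findall(s)
for label, content in matches:
    print(repr(label), repr(content))
```

The inner `(?!\b\w\.).` checks at every position that no new label begins there before consuming one more character, which is what lets the content run across newlines and still stop at the next label.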

If you don't want the a. part, use

reg = re.compile(r'''(?xs)  # inline flags must come first; no need for i, but s makes dot match newlines
    (?:\b\w\.\s*)  # e.g. A. or b. (word boundary to restrict to 1 letter)
    ((?:(?!\b\w\.).)+) # everything until the next A. or b.
    ''')
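The question asks for the same matches whether or not the content wraps. The captured text still contains the line breaks and indentation, so one possible follow-up is to collapse whitespace after matching. This is only a sketch: `extract_items` and `ITEM` are hypothetical names, and the two sample strings are adapted from the question.

```python
import re

# Hypothetical helper, not part of the original answer.
ITEM = re.compile(r'''
    \b\w\.\s*            # label such as "a." (not captured)
    ((?:(?!\b\w\.).)+)   # content up to the next label
    ''', re.VERBOSE | re.DOTALL)

def extract_items(text):
    """Return each item's content with whitespace runs collapsed to one space."""
    return [' '.join(content.split()) for content in ITEM.findall(text)]

wrapped = '''
    a. $1000 abcde!? B.  December
    31, 1993 c. 8/1/2013 D. $690 * 10% =
    69 Blah blah
'''
one_per_line = '''
a. $1000 abcde!?
B. December 31, 1993
c. 8/1/2013
D. $690 * 10% = 69 Blah blah
'''

print(extract_items(wrapped))
print(extract_items(one_per_line) == extract_items(wrapped))
```

One caveat: `\w` also matches digits and underscores, so a standalone single digit followed by a period (for example "5.") would be treated as a label too. If the labels are always letters, `[A-Za-z]` in place of `\w` would be a stricter choice.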

Source

2013-03-04 20:58:44

This is awesome. Regular expressions are really powerful if you know how to use them. – Paul 2013-03-04 21:15:10
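A note on why the `^(?:\1)` in the question probably did not behave as a negation, as far as I can tell: outside a character class `^` is an anchor, and `\1` matches a repeat of whatever group 1 captured. Neither means "not this group". The short sketch below uses made-up strings to show each behavior next to the negative lookahead.

```python
import re

# Outside a character class, ^ is an anchor, so it cannot match after "a".
anchor_mid = re.search(r'a^b', 'ab')

# Inside a character class, ^ negates the listed characters.
not_dots = re.findall(r'[^.]+', 'a.b')

# \1 matches a repeat of the text captured by group 1; it does not negate it.
repeat = re.search(r'(\w\.)x\1', 'a.xa.')
no_repeat = re.search(r'(\w\.)x\1', 'a.xb.')

# A negative lookahead is the construct that says "not followed by ...".
chunks = re.findall(r'(?:(?!b\.).)+', 'xxb.yy')

print(anchor_mid, not_dots, repeat is not None, no_repeat, chunks)
```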
