Matching multiple lines in full with a regular expression in Python

I am trying to extract content that spans multiple lines. The content looks like this:

some content here 
[1/1/2015 - SSR] something 
[1/2/2015 - SSR] another: 
*something here 
*another something here 
not relevant, should not be returned 
[1/3/2015 - SSR] another one

There is always a space before each *.

The code I am using is:

re.search(r'.*- SSR](.*)',line,re.DOTALL)

The expected result is:

[1/1/2015 - SSR] something 
[1/2/2015 - SSR] another: 
*something here 
*another something here 
[1/3/2015 - SSR] another one

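But it only retrieves the first and third records, not the second, because the second one spans multiple lines. Can anyone help? I would really appreciate it.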
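To make the symptom concrete, here is a minimal sketch of what the pattern above does. It assumes the input is scanned one line at a time (the variable name `line` suggests this, but the question does not show the loop); the names `text`, `per_line` and `whole` are made up for illustration.

```python
import re

text = (
    "some content here\n"
    "[1/1/2015 - SSR] something\n"
    "[1/2/2015 - SSR] another:\n"
    "*something here\n"
    "*another something here\n"
    "not relevant, should not be returned\n"
    "[1/3/2015 - SSR] another one"
)

# Assumption: the original code loops over the text line by line.
# Only lines that contain "- SSR]" can match, so the "*" lines are lost.
per_line = []
for line in text.splitlines():
    m = re.search(r'.*- SSR](.*)', line, re.DOTALL)
    if m:
        per_line.append(m.group(1))
print(per_line)

# Searching the whole text at once does not help either: the greedy .*
# (with DOTALL) runs to the LAST "- SSR]", so only the final record is left.
whole = re.search(r'.*- SSR](.*)', text, re.DOTALL).group(1)
print(repr(whole))
```

Either way, a single `re.search` cannot return one result per record; that needs a pattern that stops at the next header, applied with `re.findall` or `re.finditer`.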

Source

2015-03-13 user3238319

http://stackoverflow.com/questions/587345/python-regular-expression-matching-a-multiline-block-of-text – rockerBOO 2015-03-13 18:55:26

Please include the expected matches in the question. – 2015-03-13 18:58:48

You can use a regular expression like this:

^.*?- SSR]([^[]*)

Match information:

MATCH 1 
1. [34-45] ` something 
` 
MATCH 2 
1. [61-111] ` another: 
*something here 
*another something here 
` 
MATCH 3 
1. [127-139] ` another one`

You can use something like this:

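import re
# Plain r'' and '' literals: the ur'' and u'' prefixes only exist in Python 2.
p = re.compile(r'^\[.*?- SSR]([^[]*)', re.DOTALL | re.MULTILINE)
test_str = "some content here\n[1/1/2015 - SSR] something\n[1/2/2015 - SSR] another:\n*something here\n*another something here\n[1/3/2015 - SSR] another one"

# Each result is the text after one "- SSR]" header, up to the next "[".
print(p.findall(test_str))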

On the other hand, if you also want the group to capture the start of each record (the bracketed header), you can use this expression:

^(\[.*?- SSR][^[]*)

Match information:

MATCH 1 
1. [18-45] `[1/1/2015 - SSR] something 
` 
MATCH 2 
1. [45-111] `[1/2/2015 - SSR] another: 
*something here 
*another something here 
` 
MATCH 3 
1. [111-139] `[1/3/2015 - SSR] another one`
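As a rough sketch, the same expression can be run from Python as follows. The variable names (`test_str`, `pattern`, `records`) are illustrative only, and the sample text is the one from the question without the unrelated line.

```python
import re

test_str = (
    "some content here\n"
    "[1/1/2015 - SSR] something\n"
    "[1/2/2015 - SSR] another:\n"
    "*something here\n"
    "*another something here\n"
    "[1/3/2015 - SSR] another one"
)

# re.MULTILINE makes ^ match at the start of every line.
# [^[]* consumes everything, newlines included, up to the next "[".
pattern = re.compile(r'^(\[.*?- SSR][^[]*)', re.MULTILINE)

# rstrip() drops the trailing newline that [^[]* picks up before the next header.
records = [m.group(1).rstrip() for m in pattern.finditer(test_str)]
for record in records:
    print(record)
    print('---')
```

Note that this approach treats any `[` as the start of the next record, so it only works while the record text itself contains no `[`.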

Source

2015-03-13 19:32:05

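Thank you very much! One thing: there is always a space before the *. So it looks like the lines starting with * are not returned. How should I modify the regex? – user3238319 2015-03-13 19:53:18

@user3238319 It also captures the whitespace before the '*'. See https://regex101.com/r/mG9qG1/3 – 2015-03-13 19:58:08

Following the solution above, I tried adding another line that should not match, but it is returned as well. – user3238319 2015-03-13 20:43:56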

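Assuming the text itself can contain square brackets, you can match the entire record header and use a lookahead (which does not consume anything) for the next header to delimit the content. The last record has no header after it, so it needs the \Z alternative to end the match at the end of the string.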

import re

s = """[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one"""

print('string to process')
print(s)
print()
print('matches')
# Each match runs from one "[date - SSR]" header up to the next header
# (checked by the lookahead) or to the end of the string (\Z).
matches = re.findall(
    r'\[\d+/\d+/\d+ - SSR\].*?(?:(?=\[\d+/\d+/\d+ - SSR\])|\Z)',
    s, re.MULTILINE|re.DOTALL)
for i, match in enumerate(matches, 1):
    print("%d: %s" % (i, match.strip()))

Output

string to process 
[1/1/2015 - SSR] something 
[1/2/2015 - SSR] another: 
*something here 
*another something here 
[1/3/2015 - SSR] another one 

matches 
1: [1/1/2015 - SSR] something 
2: [1/2/2015 - SSR] another: 
*something here 
*another something here 
3: [1/3/2015 - SSR] another one

Source

2015-03-13 20:00:40 tdelaney

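I tried adding another line that should not match, but it is returned as well. – user3238319 2015-03-13 20:43:34

@user3238319 - That is a different requirement. Can you show me the line that should not match? If you have a complex set of rules, a regular expression may not be the best choice. For example, Python scripts are too complex to parse with regular expressions, even though regular expressions are used to tokenize them. – tdelaney 2015-03-13 20:47:22

@user3238319 - oh wait, I see. It's the "not relevant" line. Do the matching lines really start with "*"? What distinguishes the parts that should match from the parts that should not? – tdelaney 2015-03-13 20:49:27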
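The thread ends without an answer to that follow-up. One possible sketch, under the assumption that a record is a header line followed only by continuation lines that start with optional spaces or tabs and then `*`, is shown below. This rule is inferred from the sample data and the asker's comments, not confirmed by them; `record_re` and `records` are illustrative names.

```python
import re

text = (
    "some content here\n"
    "[1/1/2015 - SSR] something\n"
    "[1/2/2015 - SSR] another:\n"
    " *something here\n"
    " *another something here\n"
    "not relevant, should not be returned\n"
    "[1/3/2015 - SSR] another one"
)

# Assumption: a record is one "[m/d/yyyy - SSR] ..." header line plus zero or
# more following lines that begin with optional spaces/tabs and then "*".
record_re = re.compile(
    r'^\[\d+/\d+/\d+ - SSR\][^\n]*'  # header line, without its newline
    r'(?:\n[ \t]*\*[^\n]*)*',        # any number of "*" continuation lines
    re.MULTILINE,
)

# The pattern has no capturing groups, so findall returns whole matches.
records = record_re.findall(text)
for i, record in enumerate(records, 1):
    print("%d: %s" % (i, record))
```

Because the continuation lines are matched explicitly rather than by "everything up to the next header", the unrelated line after the second record is never consumed. If continuation lines can take other forms, the second part of the pattern would need to change accordingly.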
