My regex in Python isn't recursing properly

I want to capture everything inside a tag plus the lines that follow it, but it is supposed to stop the next time it meets a bracket. What am I doing wrong?

import re  # regex

regex = re.compile(r"""
    ^                    # Must start in a newline first
    \[\b(.*)\b\]         # Get what's enclosed in brackets
    \n                   # only capture bracket if a newline is next
    (\b(?:.|\s)*(?!\[))  # should read: anyword that doesn't precede a bracket
    """, re.MULTILINE | re.VERBOSE)

haystack = """
[tab1]
this is captured
but this is suppose to be captured too!
@[this should be taken though as this is in the content]

[tab2]
help me
write a better RE
"""
m = regex.findall(haystack)
print(m)

What I'm trying to get is:
[('tab1', 'this is captured\nbut this is suppose to be captured too!\n@[this should be taken though as this is in the content]\n', '[tab2]', 'help me\nwrite a better RE\n')]

Edit:

regex = re.compile(r""" 
      ^   # Must start in a newline first 
      \[(.*?)\] # Get what's enclosed in brackets 
      \n   # only capture bracket if a newline is next 
      ([^\[]*) # stop reading at opening bracket 
     """, re.MULTILINE | re.VERBOSE)

This seems to work, but it also cuts the content off at any bracket inside it.

2009-06-05 cybervaldez

Python's regular expressions don't support recursion, AFAIK.
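A quick way to check that claim is to try compiling a pattern that uses the PCRE-style recursion token `(?R)` with the standard-library `re` module. This is only a minimal sketch: the helper name `supports_recursion` and the balanced-parentheses pattern are made up for illustration, and the check says nothing about third-party regex engines.

```python
import re

def supports_recursion(pattern=r"\((?:[^()]|(?R))*\)"):
    """Return True if the stdlib re module accepts the (?R) recursion token.

    Hypothetical helper for illustration; the default pattern is the usual
    PCRE idiom for balanced parentheses.
    """
    try:
        re.compile(pattern)
    except re.error:
        # The stdlib engine rejects (?R) as an unknown extension.
        return False
    return True

print(supports_recursion())
```

With the standard `re` module this is expected to print `False`, which is why the patterns in this thread rely on character classes and lookaheads instead.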

Edit: but in your case, this would work:

regex = re.compile(r""" 
     ^   # Must start in a newline first 
     \[(.*?)\] # Get what's enclosed in brackets 
     \n   # only capture bracket if a newline is next 
     ([^\[]*) # stop reading at opening bracket 
    """, re.MULTILINE | re.VERBOSE)

Edit 2: Yes, that doesn't work properly.

import re 

regex = re.compile(r""" 
    (?:^|\n)\[    # tag's opening bracket 
     ([^\]\n]*)   # 1. text between brackets 
    \]\n     # tag's closing bracket 
    (.*?)     # 2. text between the tags 
    (?=\n\[[^\]\n]*\]\n|$) # until tag or end of string but don't consume it 
    """, re.DOTALL | re.VERBOSE) 

haystack = """[tag1]
this is captured [not a tag[
but this is suppose to be captured too!
[another non-tag

[tag2]
help me
write a better RE[[[]
"""

print(regex.findall(haystack))

I do agree with viraptor, though. Regexes are cool, but you can't use them to check your file for errors. A hybrid, maybe? :P

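tag_re = re.compile(r'^\[([^\]\n]*)\]$', re.MULTILINE)
tags = list(tag_re.finditer(haystack))

result = {}
# The text between two consecutive tags belongs to the first tag of the pair
for mo1, mo2 in zip(tags[:-1], tags[1:]):
    result[mo1.group(1)] = haystack[mo1.end(1)+1:mo2.start(1)-1].strip()
# The last tag owns everything up to the end of the string
if tags:
    last = tags[-1]
    result[last.group(1)] = haystack[last.end(1)+1:].strip()

print(result)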

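Edit 3: That's because the ^ character means negation only inside [^square brackets]. Everywhere else it means start of string (or start of line with re.MULTILINE). A regex has no good way to do negative string matching, only negative character matching.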
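The two meanings of ^ can be seen side by side in a small sketch. The sample string `text` is invented for the demonstration and is not the haystack from the question.

```python
import re

# Made-up sample: one '[' in the middle of a line, one at the start of a line.
text = "a[b\n[c]\n"

# Inside a character class, ^ negates the class: runs of characters other than '['.
not_bracket_runs = re.findall(r"[^\[]+", text)

# Outside a class, ^ is an anchor; with re.MULTILINE it matches at every line start.
line_start_brackets = re.findall(r"^\[", text, re.MULTILINE)

print(not_bracket_runs)      # ['a', 'b\n', 'c]\n']
print(line_start_brackets)   # ['[']
```

The negated class breaks at every `[`, wherever it occurs, while the anchored pattern only sees the bracket that opens a line. That difference is why `[^\[]*` truncates content containing brackets.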

2009-06-05 09:24:39

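Thanks for the reply. I see. I did try recursion with (?R), but you're right that it doesn't really work in Python. Do you know of a way I can achieve what I'm trying to do? – cybervaldez 2009-06-05 09:29:40

I have a problem: it seems to stop when there is also a bracket inside the content. How do I make it stop only when it finds a [bracket] at the start of a line? [tab1] – cybervaldez 2009-06-06 11:40:19

Thanks, this question has been very informative, since a lot of details and options have come up. I'm really surprised at how different things are from your first solution. I wonder why my solution doesn't work: (^[\n\[]*). If there is a [bracket] after a newline, why doesn't it work? This is just food for thought; your answer is already perfect. – cybervaldez 2009-06-07 00:41:35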

Does this do what you want?

regex = re.compile(r""" 
     ^     # Must start in a newline first 
     \[\b(.*)\b\]   # Get what's enclosed in brackets 
     \n      # only capture bracket if a newline is next 
     ([^[]*) 
     """, re.MULTILINE | re.VERBOSE)

This gives a list of tuples (one 2-tuple per match). If you want a flat tuple, you can write:

m = sum(regex.findall(haystack),())
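To see what the `sum(..., ())` trick does, here is a small sketch on stand-in data. The list `pairs` is invented sample data shaped like the output of `regex.findall(haystack)`, and `itertools.chain.from_iterable` is shown only as an alternative, not as part of the original answer.

```python
from itertools import chain

# Stand-in for regex.findall(haystack): one (tag, body) tuple per match.
pairs = [("tab1", "first body\n"), ("tab2", "second body\n")]

# sum() starts from the empty tuple and concatenates each 2-tuple onto it.
flat_sum = sum(pairs, ())

# chain.from_iterable walks the tuples once, without building intermediates.
flat_chain = tuple(chain.from_iterable(pairs))

print(flat_sum)
print(flat_sum == flat_chain)
```

Both forms yield the same flat tuple. The `sum` version copies the growing tuple at every step, so for a large number of matches the `chain` version is probably the safer choice.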

2009-06-05 09:32:38

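First of all, why use a regex if you're trying to parse? As you can see, you can't find the source of the problem yourself, because a regex gives you no feedback. You also don't have any recursion in that RE.

Make your life simple: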

def ini_parse(src):
    in_block = None
    contents = {}
    for line in src.split("\n"):
        if line.startswith('[') and line.endswith(']'):
            in_block = line[1:len(line)-1]
            contents[in_block] = ""
        elif in_block is not None:
            contents[in_block] += line + "\n"
        elif line.strip() != "":
            raise Exception("content out of block")
    return contents

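You get exceptions on errors and, as a bonus, the ability to step through the processing in a debugger. You also get a dictionary as the result, and you can deal with duplicate sections while processing. My result: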

{'tab2': 'help me\nwrite a better RE\n\n',
 'tab1': 'this is captured\nbut this is suppose to be captured too!\n@[this should be taken though as this is in the content]\n\n'}
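As one possible way to handle duplicate sections, the line parser above could reject a repeated section name instead of silently overwriting it. The function name `ini_parse_strict`, the choice of `ValueError`, and the line numbers in the messages are assumptions made for this sketch, not part of the original answer.

```python
def ini_parse_strict(src):
    """Hypothetical variant of ini_parse that rejects repeated section names."""
    in_block = None
    contents = {}
    for lineno, line in enumerate(src.split("\n"), start=1):
        if line.startswith("[") and line.endswith("]"):
            in_block = line[1:-1]
            if in_block in contents:
                # Report where the repeated section header was found.
                raise ValueError("duplicate section %r on line %d" % (in_block, lineno))
            contents[in_block] = ""
        elif in_block is not None:
            contents[in_block] += line + "\n"
        elif line.strip():
            raise ValueError("content outside any section on line %d" % lineno)
    return contents

sample = "[tab1]\nfirst\n[tab2]\nsecond\n"
print(ini_parse_strict(sample))
```

Merging the bodies of repeated sections would be an equally reasonable policy; the point is only that a line-by-line parser gives an obvious place to make that decision.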

REs are badly overused these days...

2009-06-06 12:15:02 viraptor
