Python 3.2: How to split a multiline string into sections using groups of lines

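I have a file containing groups of data that span multiple lines. Each section of data lines is preceded by two lines that begin with a hash mark (#). Each section is followed by a newline ('\n'), a line of dashes ('-'), and then two more newlines.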

In other words, the file looks like this:

# Comment 
# Comment 
data for section 1 
data for section 1 
... 
last line of data for section 1 

-------------------------------------------------- 

# Comment 
# Comment 
data for section 2 
data for section 2 
... 
last line of data for section 2 

-------------------------------------------------- 

...

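I want to break this file up into the groups delimited this way, so that I can process each one separately. The simplest language I have at hand for file processing is Python 3.2, so I tried to build a regular expression to perform this split. Unfortunately, I can't get the split to work.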

For example, I have successfully used the following regular expression to find the lines I want to ignore:

import re

with open('original.out') as temp:
    original = temp.read()
print(re.findall(r'^$|^[#-].*$', original, re.MULTILINE))

But when I pass this same regular expression to re.split(), it just returns the whole file.
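A likely explanation, assuming the split call was written as `re.split(pattern, original, re.MULTILINE)` (the exact call isn't shown): in `re.findall(pattern, string, flags)` the third positional parameter is `flags`, but in `re.split(pattern, string, maxsplit, flags)` it is `maxsplit`. The flag value is then silently used as a split count, MULTILINE is never turned on, `^` only matches at the very start of the string, and nothing gets split. The sketch below uses a simplified, hypothetical separator pattern and sample text to show the difference.

```python
import re

# Hypothetical sample text: two sections separated by a dashed line.
text = "# c\n# c\ndata 1\n\n-----\n\n# c\n# c\ndata 2\n"
separator = r'^-+$'  # a line made only of dashes

# This is what re.split(separator, text, re.MULTILINE) does: the flag value
# ends up in maxsplit, no MULTILINE flag is set, so ^ can only match at
# position 0 (which is '#') and the string comes back unsplit.
wrong = re.split(separator, text, maxsplit=re.MULTILINE)

# Passing the flag by keyword makes ^ and $ match at every line boundary.
right = re.split(separator, text, flags=re.MULTILINE)

print(wrong)  # a one-element list holding the entire text
print(right)
```

Passing `flags=` by keyword avoids the mix-up regardless of which other arguments are given.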

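How can I build this list of sections the way I need, and what am I missing in my understanding of regular expressions (or of how Python handles them) that would help me see the solution?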
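A minimal regex-based sketch, assuming (as in the sample above) that every separator is a line made only of dashes and that comment lines start with `#`: split on the dashed lines with `re.MULTILINE` passed by keyword, then drop comment and blank lines from each chunk. `split_sections` and the sample text are hypothetical, not taken from the original post.

```python
import re

def split_sections(text):
    """Return a list of sections, each a list of its data lines."""
    # A separator is a line made only of dashes, possibly with trailing spaces.
    chunks = re.split(r'^-+[ \t]*$', text, flags=re.MULTILINE)
    sections = []
    for chunk in chunks:
        # Keep non-blank lines that are not '#' comments.
        data = [line.rstrip() for line in chunk.splitlines()
                if line.strip() and not line.startswith('#')]
        if data:  # skip empty chunks, e.g. the one after a trailing separator
            sections.append(data)
    return sections

sample = """# Comment
# Comment
data for section 1
last line of data for section 1

--------------------

# Comment
# Comment
data for section 2
last line of data for section 2

--------------------

"""

for section in split_sections(sample):
    print(section)
```

Because the separator pattern always needs at least one dash, it never matches an empty string, so the result does not depend on how a given Python version treats empty matches in `re.split()`. Reading the whole file into one string suits files that fit comfortably in memory.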


2011-11-06 sadakatsu

A quick-and-dirty generator solution:

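from collections import deque

# yield each section
def gen_sections(lines):
    # sliding window holding the last three lines seen
    breaker = deque(maxlen=3)
    section = []
    # a section ends with: blank line, dashed line, blank line
    check = [
        lambda line: not line.strip(),        # blank
        lambda line: line.startswith('---'),  # dashed line
        lambda line: not line.strip(),        # blank
    ]
    for line in lines:
        line = line.strip()
        breaker.append(line)
        section.append(line)
        if len(breaker) == 3 and all(f(x) for f, x in zip(check, breaker)):
            # drop the three separator lines from the end of the section
            yield '\n'.join(section[:-len(breaker)])
            section = []
    # the last section has no separator after it, so yield whatever is left
    if any(section):
        yield '\n'.join(section)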

# wrap file in this to remove comments
def no_comments(lines):
    for line in lines:
        line = line.strip()
        if not line.startswith('#'):
            yield line

with open('file.txt') as f:
    # strip comment lines first, then split what remains into sections
    for section in gen_sections(no_comments(f)):
        print(section, '\n')
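A line-by-line alternative to the deque window, sketched with `itertools.groupby`: consecutive lines are grouped by whether they are a dashed separator, and the separator groups are skipped. This assumes, as the sample suggests, that a data line never consists only of dashes; `is_separator` and `iter_sections` are hypothetical names.

```python
from itertools import groupby

def is_separator(line):
    """True for a line made only of dashes (surrounding whitespace ignored)."""
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {'-'}

def iter_sections(lines):
    """Yield each section as a list of data lines, skipping '#' comments and blanks."""
    for is_sep, group in groupby(lines, key=is_separator):
        if is_sep:
            continue  # drop the dashed separator line itself
        data = [line.strip() for line in group
                if line.strip() and not line.lstrip().startswith('#')]
        if data:
            yield data

# Usage with a file: each section is produced as soon as its lines are read.
# with open('file.txt') as f:
#     for section in iter_sections(f):
#         print('\n'.join(section), '\n')
```

Unlike the blank/dashed/blank check, this does not depend on the exact number of blank lines around the separator, and it still yields the final section when no separator follows it.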


2011-11-06 07:46:53 Triptych


