2016-03-02

I need to extract values from the following text file:

fdsjhgjhg 
fdshkjhk 
Start 
Good Morning 
Hello World 
End 
dashjkhjk 
dsfjkhk 

The values I need to extract are the ones from Start to End.

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        elif line.strip() == "End":
            copy = False
        elif copy:
            outfile.write(line)
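To make the behaviour easy to check without real files, here is a minimal sketch of the same loop that reads from an in-memory io.StringIO and collects lines in a list instead of writing them. The function name extract_between is hypothetical; the branching is copied from the code above, so the "Start" and "End" lines are skipped.

```python
import io


def extract_between(lines):
    """Same branching as the question's loop, but collects lines in a list."""
    out = []
    copy = False
    for line in lines:
        if line.strip() == "Start":
            copy = True        # start copying, but this line itself is not kept
        elif line.strip() == "End":
            copy = False       # stop copying; the "End" line is not kept either
        elif copy:
            out.append(line)   # only lines strictly between the markers
    return out


sample = io.StringIO(
    "fdsjhgjhg\nfdshkjhk\nStart\nGood Morning\nHello World\nEnd\ndashjkhjk\ndsfjkhk\n"
)
print(extract_between(sample))  # ['Good Morning\n', 'Hello World\n']
```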

I took the code above from this question: Extract Values between two strings in a text file using python

This code does not include the strings "Start" and "End", only what is between them. How would you include the surrounding strings?


I would use a multiline regex for this; the code would also look simpler. – MaxU

Answers


@en_Knight's answer is almost right. Here is a fix that meets the OP's requirement that the delimiters be included in the output:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy:
            outfile.write(line)
        # move this AFTER the "if copy"
        if line.strip() == "End":
            copy = False
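As a quick check of why moving the "End" test after the write keeps both markers, here is a sketch of the same three independent if statements run over an in-memory io.StringIO. The name extract_inclusive is hypothetical and the code returns a list rather than writing a file.

```python
import io


def extract_inclusive(lines):
    """Three independent ifs, in the order used in the answer above."""
    out = []
    copy = False
    for line in lines:
        if line.strip() == "Start":
            copy = True          # turned on before the write, so "Start" is kept
        if copy:
            out.append(line)
        if line.strip() == "End":
            copy = False         # turned off after the write, so "End" is kept
    return out


sample = io.StringIO("fdshkjhk\nStart\nGood Morning\nHello World\nEnd\ndsfjkhk\n")
print("".join(extract_inclusive(sample)), end="")
```

A stray "End" that appears while copy is already False is simply ignored, because the write only happens inside the "if copy" branch.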

Or simply add a write() call in the branches where it applies:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            outfile.write(line) # add this
            copy = True
        elif line.strip() == "End":
            outfile.write(line) # add this
            copy = False
        elif copy:
            outfile.write(line)

Update: to answer the question in the comments ("only use the first occurrence of 'End' after 'Start'"), change the last elif line.strip() == "End" to:

        elif line.strip() == "End" and copy:
            outfile.write(line) # add this
            copy = False

This works if there is only one "Start" but several "End"s... which sounds odd, but it is what the asker asked for.
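To see what the "and copy" condition changes, here is a sketch of the second variant with that update applied, run on a made-up input containing an extra "End" after the block. The function name extract_first_end and the sample text are hypothetical; without "and copy", the stray "End" line would also be written.

```python
import io


def extract_first_end(lines):
    """Second variant from the answer, with the `and copy` update applied."""
    out = []
    copy = False
    for line in lines:
        if line.strip() == "Start":
            out.append(line)
            copy = True
        elif line.strip() == "End" and copy:
            out.append(line)   # only an "End" that closes an open block is kept
            copy = False
        elif copy:
            out.append(line)
        # any other line, including a stray "End", is dropped
    return out


sample = io.StringIO("junk\nStart\nGood Morning\nEnd\nmore junk\nEnd\n")
print("".join(extract_first_end(sample)), end="")
```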


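That makes a lot of sense. Is it possible to end the copying selectively, using only the first occurrence of 'End' after 'Start'? My file contains the string 'End' several times. – johnnydrama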


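elif means "only do this if the previous conditions failed". It is the equivalent of "else if", if you're coming from a different C-like language. Without it, falling through to the next if should take care of including "Start" and "End":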

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy: # flipped to include end, as Dan H pointed out
            outfile.write(line)
        if line.strip() == "End":
            copy = False
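The elif point can be shown on a single "Start" line. Below is a small sketch with two hypothetical helper functions, process_with_elif and process_with_if, that process one line and report the new copy flag and whether the line would be written. With elif, the copy branch is skipped because the "Start" branch already matched; with a separate if, the freshly set flag is seen immediately.

```python
def process_with_elif(line, copy):
    """Return (new_copy, written) using an if/elif chain."""
    written = False
    if line == "Start":
        copy = True
    elif copy:            # skipped whenever the "Start" branch ran
        written = True
    return copy, written


def process_with_if(line, copy):
    """Return (new_copy, written) using two independent ifs."""
    written = False
    if line == "Start":
        copy = True
    if copy:              # always evaluated, sees copy == True on "Start"
        written = True
    return copy, written


print(process_with_elif("Start", False))  # (True, False)
print(process_with_if("Start", False))    # (True, True)
```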

A regex approach:

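import re

with open('input.txt') as f:
    data = f.read()

# ^ and $ match at line boundaries with re.M, so this also works when
# "Start" is the first line or "End" is the last line of the file;
# re.S lets .*? run across newlines, and the lazy ? stops at the first End
match = re.search(r'^Start$.*?^End$', data, re.M | re.S)
if match:
    with open('output.txt', 'w') as f:
        f.write(match.group(0))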
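For a self-contained check on an in-memory string, here is a sketch that anchors on whole lines with ^ and $ under re.MULTILINE rather than on literal \n characters, so a "Start" on the very first line or an "End" on the last line without a trailing newline still matches. The function name extract_block and the sample strings are hypothetical.

```python
import re


def extract_block(text):
    """Return the first Start...End block (markers included), or None."""
    # re.MULTILINE: ^ and $ match at the start/end of every line.
    # re.DOTALL: . also matches newlines, so the block can span lines.
    # .*? is lazy, so the match ends at the first "End" line after "Start".
    match = re.search(r"^Start$.*?^End$", text, re.MULTILINE | re.DOTALL)
    return match.group(0) if match else None


sample = "fdsjhgjhg\nStart\nGood Morning\nHello World\nEnd\ndsfjkhk\n"
print(extract_block(sample))
```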

This may be the more robust solution, but for people who are unclear on elif vs. if, perhaps include some explanation in words? –


This is better: '(^Start[\s\S]+^End)' [Demo](https://regex101.com/r/gT0eR6/1) (or '(^Start[\s\S]+?^End)' if there is more than one 'End'...) – dawg
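The difference between the two patterns in that comment is greedy + versus lazy +?. Here is a sketch in Python, assuming re.MULTILINE is passed so that ^ matches after each newline (the demo's flags aren't shown here); [\s\S] already matches newlines, so re.DOTALL is not needed. The sample text with two "End" lines is made up.

```python
import re

text = "Start\nGood Morning\nEnd\nmore text\nEnd\n"

# Greedy: [\s\S]+ grabs as much as possible, then backtracks to the LAST "End".
greedy = re.search(r"(^Start[\s\S]+^End)", text, re.MULTILINE)
# Lazy: [\s\S]+? grabs as little as possible, so it stops at the FIRST "End".
lazy = re.search(r"(^Start[\s\S]+?^End)", text, re.MULTILINE)

print(repr(greedy.group(1)))
print(repr(lazy.group(1)))
```

Note that ^End without a trailing $ would also match a line that merely begins with "End", such as "Ending".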