2017-06-03 69 views
0

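I'm new to Python, and I'm trying to use the code below, which currently doesn't work, to extract the information between two headers in a text file. How do I extract the information between two headers?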

with open('toysystem.txt','r') as f: 
    start = '<Keywords>' 
    end = '</Keywords>' 
    i = 0 
    lines = f.readlines() 
    for line in lines:
        if line == start:
            keywords = lines[i+1]
        i += 1

For reference, the text file looks like this:

<Keywords> 
GTO 
</Keywords> 

Any ideas about what might be wrong with the code? Or perhaps another way to approach the problem?

Thanks!

Answers

1
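  • Lines read from a file include the newline character at the end, so the comparison line == start never succeeds; we should strip the lines first,

  • the f object is itself an iterator over lines, so we don't need to call f.readlines here.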
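Both points can be checked with a small sketch. It uses io.StringIO as a stand-in for the real file, and the sample contents are assumed to match the file shown in the question:

```python
import io

# Stand-in for toysystem.txt (contents assumed from the question).
f = io.StringIO('<Keywords>\nGTO\n</Keywords>\n')

# A file object is an iterator over its lines, so next() works directly.
first_line = next(f)
print(repr(first_line))

# The raw line still carries its newline, so only the stripped
# version compares equal to the marker.
raw_match = first_line == '<Keywords>'
stripped_match = first_line.rstrip() == '<Keywords>'
print(raw_match, stripped_match)
```

The raw comparison is what the code in the question does, which is why its if branch is never taken.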

So we can write something like

with open('toysystem.txt', 'r') as f: 
    start = '<Keywords>' 
    end = '</Keywords>' 
    keywords = [] 
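    # skip everything up to and including the start marker
    for line in f:
        if line.rstrip() == start:
            break
    # collect lines until the end marker is reached
    for line in f:
        if line.rstrip() == end:
            break
        keywords.append(line)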

which gives us

>>> keywords 
['GTO\n'] 

If you don't want the trailing newlines in the keywords either, strip those too:

with open('toysystem.txt', 'r') as f: 
    start = '<Keywords>' 
    end = '</Keywords>' 
    keywords = [] 
    for line in f:
        if line.rstrip() == start:
            break
    for line in f:
        if line.rstrip() == end:
            break
        keywords.append(line.rstrip())

>>> keywords 
['GTO'] 

But in this case it is better to create a generator of stripped lines, like

with open('toysystem.txt', 'r') as f: 
    start = '<Keywords>' 
    end = '</Keywords>' 
    keywords = [] 
    stripped_lines = (line.rstrip() for line in f) 
    for line in stripped_lines:
        if line == start:
            break
    for line in stripped_lines:
        if line == end:
            break
        keywords.append(line)

which does the same thing.
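The two loops work as a pair because they share one iterator: the second loop resumes where the first one stopped instead of starting over. A minimal sketch of that behaviour, using an in-memory list with made-up extra lines ('header', 'HEO', 'footer') in place of the file:

```python
# Made-up sample data; only '<Keywords>', 'GTO' and '</Keywords>'
# come from the question.
raw = ['header\n', '<Keywords>\n', 'GTO\n', 'HEO\n', '</Keywords>\n', 'footer\n']
stripped_lines = (line.rstrip() for line in raw)

# First loop: consume lines up to and including the start marker.
for line in stripped_lines:
    if line == '<Keywords>':
        break

# Second loop: continues from the line after the start marker.
keywords = []
for line in stripped_lines:
    if line == '</Keywords>':
        break
    keywords.append(line)

# Whatever was not consumed is still available afterwards.
remaining = list(stripped_lines)
print(keywords)
print(remaining)
```

If stripped_lines were a list rather than a generator, the second loop would start again from the first element, so the pattern relies on the iterator being consumed.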


Finally, if you need the lines in a later part of the script, we can use f.readlines together with a generator of stripped lines:

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    lines = f.readlines()  # keep the raw lines for later use
    stripped_lines = (line.rstrip() for line in lines)
    for line in stripped_lines:
        if line == start:
            break
    for line in stripped_lines:
        if line == end:
            break
        keywords.append(line)

which gives us

>>> lines 
['<Keywords>\n', 'GTO\n', '</Keywords>\n'] 
>>> keywords 
['GTO'] 
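If the same extraction is needed for more than one pair of markers, the loops can be wrapped in a function. The sketch below is one possible arrangement; the name extract_between is hypothetical, not something from the standard library:

```python
def extract_between(lines, start, end):
    """Return the stripped lines found between the start and end markers.

    `lines` can be any iterable of strings, such as an open file
    or the result of f.readlines().
    """
    stripped_lines = (line.rstrip() for line in lines)
    # Skip everything up to and including the start marker.
    for line in stripped_lines:
        if line == start:
            break
    # Collect lines until the end marker (or the end of the input).
    result = []
    for line in stripped_lines:
        if line == end:
            break
        result.append(line)
    return result


# Sample input taken from the `lines` value shown above.
sample = ['<Keywords>\n', 'GTO\n', '</Keywords>\n']
keywords = extract_between(sample, '<Keywords>', '</Keywords>')
print(keywords)
```

When the start marker never appears, the first loop exhausts the input and the function returns an empty list.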


0

Why not use Python's re module instead and parse the file with a regular expression?

import re
with open('toysystem.txt', 'r') as f:
    contents = f.read()
# findall returns a list of the values captured by the group in (); extend the pattern as needed.
keywords = re.findall(r'<Keywords>\s*(.*?)\s*</Keywords>', contents)
print(keywords)

For your file, it will print

['GTO'] 
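Since re.findall returns every match, the same approach also covers files with several <Keywords> blocks. A small sketch, assuming a made-up second block ('LEO') and adding re.DOTALL so that a block spanning several lines would still match:

```python
import re

# Made-up contents with two blocks; only the first mirrors the question's file.
contents = '<Keywords>\nGTO\n</Keywords>\n<Keywords>\nLEO\n</Keywords>\n'

# The non-greedy (.*?) stops at the nearest closing tag, and the \s*
# on either side drops surrounding whitespace and newlines.
pattern = re.compile(r'<Keywords>\s*(.*?)\s*</Keywords>', re.DOTALL)
keywords = pattern.findall(contents)
print(keywords)
```

Note that the pattern is case-sensitive, so the tag must be spelled exactly as it appears in the file.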

For more about regular expressions in Python, see Tutorialspoint, or the references for Python 3 and Python 2.