2015-07-21 79 views

Answers

3

lxml should already raise an exception when it parses XML that is not well-formed, for example:

from lxml import etree 

xml = """ 
<multipleroot> 
    <noclosingtag> 
</multipleroot> 
<multipleroot></multipleroot>""" 
doc = etree.fromstring(xml) 

This raises an exception:

Traceback (most recent call last): 
    File "D:\StackOverflow\Python\Q50.py", line 8, in <module> 
    doc = etree.fromstring(xml) 
    ...... 
    ...... 
XMLSyntaxError: Opening and ending tag mismatch: noclosingtag line 3 and multipleroot, line 4, column 16 

However, if you explicitly tell XMLParser to recover from malformed XML, or if you use HTMLParser, lxml can still parse the input:

from lxml import etree 

xml = """ 
<multipleroot> 
    <noclosingtag> 
</multipleroot> 
<multipleroot></multipleroot>""" 
parser = etree.XMLParser(recover=True) 
#parser = etree.HTMLParser() 
doc = etree.fromstring(xml, parser=parser) 
print(etree.tostring(doc)) 

This successfully prints the parsed XML:

<multipleroot> 
    <noclosingtag> 
</noclosingtag> 
<multipleroot/></multipleroot> 
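
If it matters what the parser had to repair, one option is to look at the parser's `error_log` after a recovering parse. The snippet below is only a sketch using the same input as above; the exact messages depend on the underlying libxml2 version, so none are shown here.

```python
from lxml import etree

xml = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""

# recover=True asks the parser to repair what it can instead of raising
parser = etree.XMLParser(recover=True)
doc = etree.fromstring(xml, parser=parser)

# The problems met during the last parse are still recorded on the parser
for entry in parser.error_log:
    print(entry.line, entry.column, entry.message)
```

An empty `error_log` after parsing would suggest the input needed no repair.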
+0

What is the most correct way to catch the exception if the whole point of the program is to check whether an XML file is well-formed? – Celeritas

+0

Catch `XMLSyntaxError`? Like this: `try: ... except etree.XMLSyntaxError: ...` – har07

+0

Riiiiight... I didn't know it was etree.XMLSyntaxError being raised. I find it unclear from the Python documentation how to tell which kind of exception gets raised... – Celeritas
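
The try/except approach suggested in the comments could be sketched roughly as follows. The helper name `is_well_formed` and its return shape are made up for illustration; only `etree.fromstring` and `etree.XMLSyntaxError` come from the thread.

```python
from lxml import etree


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message).

    Hypothetical helper, not part of lxml.
    """
    try:
        # Default parser: no recovery, so malformed input raises
        etree.fromstring(xml_text)
    except etree.XMLSyntaxError as exc:
        return False, str(exc)
    return True, None


bad = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""

print(is_well_formed("<root><child/></root>"))
print(is_well_formed(bad))
```

Returning the message alongside the flag keeps the line and column information that lxml puts in the exception text, which is usually what a validation script wants to report.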
