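Converting XML into a list of tags and values with Python
I am learning Python and am trying to extract a list of all tags and their corresponding values from any XML file. Here is my code so far: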
from lxml import etree

def ParseXml(XmlFile):
    try:
        # Drop whitespace-only text between tags so it doesn't show up as values
        parser = etree.XMLParser(remove_blank_text=True, compact=True)
        tree = etree.parse(XmlFile, parser)
        root = tree.getroot()
        ListOfTags, ListOfValues, ListOfAttribs = [], [], []
        # iter('*') walks every element in document order
        for elem in root.iter('*'):
            Tag = elem.tag
            ListOfTags.append(Tag)
            # Empty elements such as <File></File> have text None
            value = elem.text
            if value is not None:
                ListOfValues.append(value)
            else:
                ListOfValues.append('')
            attrib = elem.attrib
            if attrib:
                ListOfAttribs.append([attrib])
            else:
                ListOfAttribs.append([])
        print('%s File parsed successfully' % XmlFile)
        return (ListOfTags, ListOfValues, ListOfAttribs)
    except Exception as e:
        print('Error while parsing XMLs : %s : %s' % (type(e), e))
        return ([], [], [])
For an XML input like this:
<?xml version="1.0" encoding="UTF-8"?>
<Application Version="2.01">
  <UserAuthRequest>
    <VendorApp>
      <AppName>SING</AppName>
    </VendorApp>
  </UserAuthRequest>
  <ApplicationRequest ID="12-123-AH">
    <GUID>ABD45129-PD1212-121DFL</GUID>
    <Type tc="200">Streaming</Type>
    <File></File>
    <FileExtension VendorCode="200">
      <Result>
        <ResultCode tc="1">Success</ResultCode>
      </Result>
    </FileExtension>
  </ApplicationRequest>
</Application>
this outputs separate lists of tags, values, and attributes, which works fine:
['Application', 'UserAuthRequest', 'VendorApp', 'AppName', 'ApplicationRequest', 'GUID', 'Type', 'File', 'FileExtension', 'Result', 'ResultCode']
['', '', '', 'SING', '', 'ABD45129-PD1212-121DFL', 'Streaming', '', '', '', 'Success']
[[{'Version': '2.01'}], [], [], [], [{'ID': '12-123-AH'}], [], [{'tc': '200'}], [], [{'VendorCode': '200'}], [], [{'tc': '1'}]]
But my problem is that I need each tag to include its parent and child tags. This is the actual output I am aiming for:
['Application', 'UserAuthRequest', 'UserAuthRequest.VendorApp', 'UserAuthRequest.VendorApp.AppName', 'ApplicationRequest', 'ApplicationRequest.GUID', 'ApplicationRequest.Type', 'ApplicationRequest.File', 'ApplicationRequest.File.FileExtension', 'ApplicationRequest.File.FileExtension.Result', 'ApplicationRequest.File.FileExtension.Result.ResultCode']
How can I do this with Python? Or is there an alternative way to do it?
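A minimal sketch of one way to build the dotted paths, assuming the standard-library `xml.etree.ElementTree` module in place of lxml; `parse_xml_paths` is a hypothetical helper name. It recurses from the root and passes each element's path down to its children. Following the style of the target output, children of the root are not prefixed with `Application.`. The paths follow the actual nesting of the sample XML, so `FileExtension` comes out as `ApplicationRequest.FileExtension`, not under `File`.

```python
import xml.etree.ElementTree as ET


def parse_xml_paths(source):
    """Return (paths, values, attribs) for every element in document order.

    `source` is a filename or binary file object, as accepted by ET.parse.
    """
    root = ET.parse(source).getroot()
    paths, values, attribs = [], [], []

    def walk(elem, prefix):
        # Join the parent's path and this element's tag with a dot.
        path = f"{prefix}.{elem.tag}" if prefix else elem.tag
        paths.append(path)
        # Missing or whitespace-only text becomes ''.
        values.append((elem.text or '').strip())
        # Copy attributes into a plain dict ({} when there are none).
        attribs.append(dict(elem.attrib))
        # Children of the root start a fresh prefix, matching the target style.
        child_prefix = '' if elem is root else path
        for child in elem:
            walk(child, child_prefix)

    walk(root, '')
    return paths, values, attribs
```

Values are stripped, so whitespace between tags becomes `''` (this also trims leading and trailing spaces from real content), and attributes are returned as plain dicts rather than one-element lists. Calling `parse_xml_paths('input.xml')` returns the three lists in the same order as the original function.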
Have you tried using BeautifulSoup? – snapcrack
I read somewhere that it is similar to lxml. Is it possible to get the desired output with BeautifulSoup? If so, how? – Naveen
The target output seems inconsistent, at least for the root node's children: they should be 'Application.UserAuthRequest' and 'Application.ApplicationRequest'. Also, there is no 'ApplicationRequest.File.*' in the XML. – CristiFati
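Following that comment, here is a sketch that keeps the root in every path, using the standard library's `xml.etree.ElementTree.iterparse` with a stack of open tags instead of recursion. The function name `dotted_paths` and the `include_root` flag are illustrative, not from any library. The path and attributes are recorded at the `start` event so results stay in document order, while text is filled in at the `end` event because it is only guaranteed to be complete there.

```python
import xml.etree.ElementTree as ET


def dotted_paths(source, include_root=True):
    """Return a list of (dotted_path, text, attrib) tuples in document order."""
    stack = []    # tags of the currently open elements, root first
    results = []  # one [path, text, attrib] entry per element
    pending = []  # indices into results for the currently open elements
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            stack.append(elem.tag)
            # Optionally drop the root tag from every path except the root's own.
            parts = stack if include_root or len(stack) == 1 else stack[1:]
            pending.append(len(results))
            results.append(['.'.join(parts), '', dict(elem.attrib)])
        else:
            # Text is complete once the element's 'end' event arrives.
            results[pending.pop()][1] = (elem.text or '').strip()
            stack.pop()
    return [tuple(r) for r in results]
```

With the default `include_root=True`, the first entries are `Application`, `Application.UserAuthRequest`, and so on; passing `include_root=False` drops the `Application.` prefix from the root's descendants, as in the question's target output.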