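Problem extracting values associated with XML tags using Python

I have Python code that parses an XML file and extracts all of its tags. Now I want to extract specific values associated with a tag, but I have run into some problems doing so. A sample of my XML file is shown below: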
<Cell ss:StyleID="s65"><Data ss:Type="String">Variable Name</Data></Cell>
<Cell ss:StyleID="s65"><Data ss:Type="String">Variable Label</Data></Cell>
<Cell ss:StyleID="s79"><Data ss:Type="String">Minimum Value</Data></Cell>
<Cell ss:StyleID="s79"><Data ss:Type="String">Maximum Value</Data></Cell>
<Cell ss:StyleID="s80"><Data ss:Type="String">Mean Value</Data></Cell>
<Row ss:AutoFitHeight="0" ss:Height="15">
<Cell ss:StyleID="s73"><Data ss:Type="String">Marks</Data></Cell>
<Cell ss:StyleID="s73"><Data ss:Type="String">Marks of Students</Data></Cell>
<Cell ss:StyleID="s82"><Data ss:Type="Number">0</Data></Cell>
<Cell ss:StyleID="s82"><Data ss:Type="Number">96</Data></Cell>
<Cell ss:StyleID="s83"><Data ss:Type="Number">65.71</Data></Cell>
</Row>
The above is only a part of the full XML file that I want to extract from. I wrote this code to print all the tags in the XML file:
    import xml.etree.ElementTree

    xmlTree = xml.etree.ElementTree.parse('sample_xml.xml').getroot()

    # Collect the tag of every element in document order
    elemList = []
    for elem in xmlTree.iter():
        elemList.append(elem.tag)

    # Just printing out the result
    for element in elemList:
        print(element)
Now, when I run this code, I see a long, repetitive run of output like the sample below:
{urn:schemas-microsoft-com:office:spreadsheet}Interior
{urn:schemas-microsoft-com:office:spreadsheet}NumberFormat
{urn:schemas-microsoft-com:office:spreadsheet}Protection
{urn:schemas-microsoft-com:office:spreadsheet}Worksheet
{urn:schemas-microsoft-com:office:spreadsheet}Table
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
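I don't know which Cell, Data, and Row elements to target in order to extract the values I need (Marks, Marks of Students, minimum value, maximum value), as shown in the sample XML at the start. How can I do this?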
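The printed tags are in `{namespace}localname` form, so Row, Cell, and Data have to be addressed together with that namespace. Below is a minimal sketch of one way to group the Data values by Row. It assumes the namespace URI seen in the output above; the inline `SAMPLE` document (including its `Workbook` wrapper and header row), the prefix name `ss` in the mapping, and the helper `rows_as_lists` are hypothetical stand-ins for the real file.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the printed tags; "ss" is just the prefix chosen here.
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

# Hypothetical, cut-down stand-in for sample_xml.xml.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1"><Table>
  <Row>
   <Cell ss:StyleID="s65"><Data ss:Type="String">Variable Name</Data></Cell>
   <Cell ss:StyleID="s65"><Data ss:Type="String">Variable Label</Data></Cell>
   <Cell ss:StyleID="s79"><Data ss:Type="String">Minimum Value</Data></Cell>
   <Cell ss:StyleID="s79"><Data ss:Type="String">Maximum Value</Data></Cell>
   <Cell ss:StyleID="s80"><Data ss:Type="String">Mean Value</Data></Cell>
  </Row>
  <Row ss:AutoFitHeight="0" ss:Height="15">
   <Cell ss:StyleID="s73"><Data ss:Type="String">Marks</Data></Cell>
   <Cell ss:StyleID="s73"><Data ss:Type="String">Marks of Students</Data></Cell>
   <Cell ss:StyleID="s82"><Data ss:Type="Number">0</Data></Cell>
   <Cell ss:StyleID="s82"><Data ss:Type="Number">96</Data></Cell>
   <Cell ss:StyleID="s83"><Data ss:Type="Number">65.71</Data></Cell>
  </Row>
 </Table></Worksheet>
</Workbook>"""


def rows_as_lists(root):
    """Return one list of Data texts per Row element, in document order."""
    rows = []
    for row in root.iter("{%s}Row" % NS["ss"]):
        rows.append([d.text for d in row.findall("ss:Cell/ss:Data", NS)])
    return rows


root = ET.fromstring(SAMPLE)
rows = rows_as_lists(root)

header = rows[0]                                           # assumed header row
marks = next(r for r in rows if r and r[0] == "Marks")     # row whose first cell is "Marks"
record = dict(zip(header, marks))
print(record["Minimum Value"], record["Maximum Value"])
```

For the real file, `ET.parse('sample_xml.xml').getroot()` would take the place of `ET.fromstring(SAMPLE)`. Whether the first Row really is the header row depends on the actual document.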
UPDATE: Following the suggestion, I was able to extract the relevant text using the code below:
    for elem in xmlTree.iter():
        # Skip elements that have no text at all
        if elem.text is not None:
            print(elem.text)
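One caveat with this loop: in an indented file, container elements such as Row and Cell usually have whitespace-only text rather than `None`, so blank lines get printed too. A small sketch that restricts the loop to Data elements and drops empty text is shown here; the helper name `data_texts` and the inline sample are hypothetical, and the namespace URI is the one from the printed tags.

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name for Data, using the namespace from the printed output.
DATA_TAG = "{urn:schemas-microsoft-com:office:spreadsheet}Data"

# Hypothetical miniature document, indented so that Row and Cell carry whitespace text.
SAMPLE = """<Table xmlns="urn:schemas-microsoft-com:office:spreadsheet">
  <Row>
    <Cell><Data>Marks</Data></Cell>
    <Cell><Data>96</Data></Cell>
    <Cell><Data>   </Data></Cell>
  </Row>
</Table>"""


def data_texts(root):
    """Collect the stripped, non-empty text of every Data element."""
    texts = []
    for elem in root.iter(DATA_TAG):
        text = (elem.text or "").strip()
        if text:
            texts.append(text)
    return texts


result = data_texts(ET.fromstring(SAMPLE))
print(result)
```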
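Now the problem is that my XML file contains many different texts, but I want to extract the 4 texts that appear after these 4 label texts: Marks, Marks of Students, Minimum Marks, Maximum Marks. I tried to use next() so that, when the current tag matches Marks, the iterator moves on to the next tag and keeps matching the following 3 tags in that order, but it does not produce the desired result. Here is what I wrote: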
    for elem in xmlTree.iter():
        if elem.text == 'Marks':
            if next(xmlTree.iter()) == 'Marks of Students':
                if next(xmlTree.iter()) == 'Minimum Value':
                    if next(xmlTree.iter()) == 'Maximum Value':
                        print(next(elem.text))
                        print(next(elem.text))
                        print(next(elem.text))
                        print(next(elem.text))
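A few things in the attempt above would stop it from working: every call to `xmlTree.iter()` creates a fresh iterator that starts again at the root, so `next(xmlTree.iter())` always returns the root element; that element is then compared with a string, which is never equal; and `next(elem.text)` raises a `TypeError` because a string is not an iterator. One possible alternative, sketched below, swaps `next()` for a plain list: collect the Data texts once, find where the label texts occur consecutively, and slice out the items that follow. The helper `texts_after`, the inline sample, and the choice of two labels followed by three values are assumptions to adjust to the real file.

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name for Data, using the namespace from the printed output.
DATA_TAG = "{urn:schemas-microsoft-com:office:spreadsheet}Data"

# Hypothetical miniature document modelled on the sample row in the question.
SAMPLE = """<Table xmlns="urn:schemas-microsoft-com:office:spreadsheet">
<Row>
<Cell><Data>Marks</Data></Cell>
<Cell><Data>Marks of Students</Data></Cell>
<Cell><Data>0</Data></Cell>
<Cell><Data>96</Data></Cell>
<Cell><Data>65.71</Data></Cell>
</Row>
</Table>"""


def texts_after(texts, labels, count):
    """Return the `count` items that follow the first consecutive run of `labels`."""
    n = len(labels)
    for i in range(len(texts) - n + 1):
        if texts[i:i + n] == labels:
            return texts[i + n:i + n + count]
    return []  # labels not found in that order


texts = [d.text for d in ET.fromstring(SAMPLE).iter(DATA_TAG)]
values = texts_after(texts, ["Marks", "Marks of Students"], 3)
print(values)
```

Working on a list keeps the matching logic independent of the tree traversal, at the cost of holding all Data texts in memory at once.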
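I can't reproduce the problem using your XML, modified to make it well-formed. Please post a *minimal but complete* sample XML, along with the corresponding output, that demonstrates the problem... – har07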