Problem extracting the values associated with XML tags using Python

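I have Python code that parses an XML file and extracts all of its tags. Now I want to extract the specific values associated with certain tags, but I have run into problems doing so. A sample of my XML file looks like this: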

<Cell ss:StyleID="s65"><Data ss:Type="String">Variable Name</Data></Cell> 
    <Cell ss:StyleID="s65"><Data ss:Type="String">Variable Label</Data></Cell> 
    <Cell ss:StyleID="s79"><Data ss:Type="String">Minimum&#10;Value</Data></Cell> 
    <Cell ss:StyleID="s79"><Data ss:Type="String">Maximum&#10;Value</Data></Cell> 
    <Cell ss:StyleID="s80"><Data ss:Type="String">Mean&#10;Value</Data></Cell> 

    <Row ss:AutoFitHeight="0" ss:Height="15"> 
    <Cell ss:StyleID="s73"><Data ss:Type="String">Marks</Data></Cell> 
    <Cell ss:StyleID="s73"><Data ss:Type="String">Marks of Students</Data></Cell> 
    <Cell ss:StyleID="s82"><Data ss:Type="Number">0</Data></Cell> 
    <Cell ss:StyleID="s82"><Data ss:Type="Number">96</Data></Cell> 
    <Cell ss:StyleID="s83"><Data ss:Type="Number">65.71</Data></Cell> 
    </Row>

The above is only the part of the full XML file that I want to extract. I wrote this code to print all of the tags in the XML file:

import xml.etree.ElementTree 
xmlTree = xml.etree.ElementTree.parse('sample_xml.xml').getroot() 

elemList = [] 

for elem in xmlTree.iter(): 
    elemList.append(elem.tag)  # collect every tag name in document order

# Just printing out the result 

for element in elemList: 
    print(element)

Now, when I run this code, I see output like the sample below, repeated many times:

{urn:schemas-microsoft-com:office:spreadsheet}Interior 
{urn:schemas-microsoft-com:office:spreadsheet}NumberFormat 
{urn:schemas-microsoft-com:office:spreadsheet}Protection 
{urn:schemas-microsoft-com:office:spreadsheet}Worksheet 
{urn:schemas-microsoft-com:office:spreadsheet}Table 
{urn:schemas-microsoft-com:office:spreadsheet}Column 
{urn:schemas-microsoft-com:office:spreadsheet}Column 
{urn:schemas-microsoft-com:office:spreadsheet}Column 
{urn:schemas-microsoft-com:office:spreadsheet}Column 
{urn:schemas-microsoft-com:office:spreadsheet}Column 
{urn:schemas-microsoft-com:office:spreadsheet}Row 
{urn:schemas-microsoft-com:office:spreadsheet}Cell 
{urn:schemas-microsoft-com:office:spreadsheet}Data 
{urn:schemas-microsoft-com:office:spreadsheet}Row 
{urn:schemas-microsoft-com:office:spreadsheet}Cell 
{urn:schemas-microsoft-com:office:spreadsheet}Data 
{urn:schemas-microsoft-com:office:spreadsheet}Row 
{urn:schemas-microsoft-com:office:spreadsheet}Cell 
{urn:schemas-microsoft-com:office:spreadsheet}Data 
{urn:schemas-microsoft-com:office:spreadsheet}Row 
{urn:schemas-microsoft-com:office:spreadsheet}Cell 
{urn:schemas-microsoft-com:office:spreadsheet}Data 
{urn:schemas-microsoft-com:office:spreadsheet}Row 
{urn:schemas-microsoft-com:office:spreadsheet}Cell 
{urn:schemas-microsoft-com:office:spreadsheet}Data 
{urn:schemas-microsoft-com:office:spreadsheet}Row 
{urn:schemas-microsoft-com:office:spreadsheet}Cell 
{urn:schemas-microsoft-com:office:spreadsheet}Data

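I don't know which Cell, Data, and Row elements to target in order to extract the values I need (Marks, Marks of Students, minimum value, maximum value), as shown in the sample XML at the beginning. How can I do this?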

UPDATE: Following the suggestions, I was able to extract the relevant text using the code below:

for elem in xmlTree.iter(): 
    if elem.text is not None:  # skip elements that carry no text
        print(elem.text)

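The problem now is that my XML file contains many different texts, but I only want to extract the 4 texts that appear after these 4 tag texts: Marks, Marks of Students, Minimum Marks, Maximum Marks. When the current tag matches Marks, I want the iterator to move on to the next tag and keep matching the next 3 tags in that order. I tried to use next() for this, but it does not produce the desired result. Here is what I wrote: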

for elem in xmlTree.iter(): 
    if elem.text == 'Marks': 
     if next(xmlTree.iter()) == 'Marks of Students': 
      if next(xmlTree.iter()) == 'Minimum Value': 
       if next(xmlTree.iter()) == 'Maximum Value': 
        print(next(elem.text)) 
        print(next(elem.text)) 
        print(next(elem.text)) 
        print(next(elem.text))

Source

2017-04-21 user2966197

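I can't reproduce the problem using your XML, modified to make it well-formed. Please post a *minimal but complete* sample XML, together with the corresponding output, that demonstrates the problem... – har07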

I cannot reproduce the problem with the XML you posted here. But I suspect your XML file may be in this format.

<?xml version="1.0"?> 
<?mso-application progid="Excel.Sheet"?> 
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" 
xmlns:o="urn:schemas-microsoft-com:office:office" 
xmlns:x="urn:schemas-microsoft-com:office:excel" 
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 
xmlns:html="http://www.w3.org/TR/REC-html40"> 
<Interior/> 
<NumberFormat/> 
<Protection/> 
<Worksheet ss:Name="Sheet1"> 
<Table ss:ExpandedColumnCount="6" ss:ExpandedRowCount="2685" x:FullColumns="1" 
x:FullRows="1"> 
<Column ss:AutoFitWidth="0" ss:Width="26.25"/> 
<Column ss:AutoFitWidth="0" ss:Width="117" ss:Span="3"/> 
<Column ss:Index="6" ss:AutoFitWidth="0" ss:Width="29.25"/> 
<Row ss:AutoFitHeight="0" ss:Height="60"> 
<Cell ss:StyleID="s22"/> 
<Cell ss:StyleID="s23"><Data ss:Type="String">Name</Data></Cell> 
<Cell ss:StyleID="s23"><Data ss:Type="String">UserName</Data></Cell> 
<Cell ss:StyleID="s23"><Data ss:Type="String">Address</Data></Cell> 
<Cell ss:StyleID="s23"><Data ss:Type="String">Telephone Number</Data></Cell> 
<Cell ss:StyleID="s22"/> 
</Row> 
<Row ss:AutoFitHeight="0" ss:Height="30"> 
<Cell ss:StyleID="s22"/> 
<Cell ss:StyleID="s24"><Data ss:Type="String">John Smith</Data></Cell> 
<Cell ss:StyleID="s24"><Data ss:Type="String">JSmith</Data></Cell> 
<Cell ss:StyleID="s24"><Data ss:Type="String">ABC</Data></Cell> 
<Cell ss:StyleID="s24"><Data ss:Type="String">(999) 999-9999</Data></Cell> 
<Cell ss:StyleID="s22"/> 
</Row> 
</Table> 
</Worksheet> 
</Workbook>

If your file has the same format, you can use the code below.

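import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9 

# iterparse yields (event, element) pairs; by default only "end" events 
with open('sample.xml', 'rb') as xml_file: 
    for event, elem in etree.iterparse(xml_file): 
        if elem.text is not None: 
            print(elem.text)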

I used the following reference to understand and adapt the code: Reading Excel xml to dictionary
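
The loop above prints every text node in one flat stream, which loses the row structure. As a rough sketch only, assuming the file really uses the SpreadsheetML namespace shown in the tag listing, the cells could be grouped per row with a namespace map. `SAMPLE`, `NS`, and `rows_as_lists` are illustrative names, not part of the original post, and the inline document is a trimmed-down stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down SpreadsheetML document used only for illustration.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell ss:StyleID="s22"/>
        <Cell><Data ss:Type="String">Name</Data></Cell>
        <Cell><Data ss:Type="String">UserName</Data></Cell>
      </Row>
      <Row>
        <Cell ss:StyleID="s22"/>
        <Cell><Data ss:Type="String">John Smith</Data></Cell>
        <Cell><Data ss:Type="String">JSmith</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""

# Prefix-to-URI map passed to findall(); the prefix name is our own choice.
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}


def rows_as_lists(root):
    """Return one list of cell texts per <Row>, skipping cells without <Data>."""
    rows = []
    for row in root.findall(".//ss:Row", NS):
        rows.append([data.text for data in row.findall("ss:Cell/ss:Data", NS)])
    return rows


root = ET.fromstring(SAMPLE)
rows = rows_as_lists(root)
for row in rows:
    print(row)
```

Empty `<Cell/>` elements have no `<Data>` child, so they simply drop out of each row list; if column positions matter, the cells would have to be walked one by one instead.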

Source

2017-04-21 11:44:57 PAR

When I run 'for elem in xmlTree.iter(): if elem[1].text != None: print(elem[1].text)' I get 'IndexError: child index out of range' – user2966197

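I was able to fix the error above, but I have a question. My XML file contains a bunch of different tag texts. What I want to do now is check whether a tag's text is "Marks", then check the next 3 tags to see whether they are "Marks of Students, Minimum Marks, Maximum Marks". If they are, extract the next 4 tag values; otherwise continue. How can I do this? – user2966197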
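
One possible way to do what this comment describes, offered as a sketch rather than a tested solution for the real file: flatten all `Data` texts into a list, look for the run of header texts, and slice out the values that follow. The helper names (`normalize`, `values_after`), the sample document, and its data values are hypothetical. Whitespace is collapsed because the posted sample contains `&#10;` line breaks inside header cells.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the data values in the second row are made up.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">Marks</Data></Cell>
        <Cell><Data ss:Type="String">Marks of Students</Data></Cell>
        <Cell><Data ss:Type="String">Minimum&#10;Marks</Data></Cell>
        <Cell><Data ss:Type="String">Maximum&#10;Marks</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">Physics</Data></Cell>
        <Cell><Data ss:Type="String">Physics marks</Data></Cell>
        <Cell><Data ss:Type="Number">0</Data></Cell>
        <Cell><Data ss:Type="Number">96</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""

DATA_TAG = "{urn:schemas-microsoft-com:office:spreadsheet}Data"


def normalize(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def values_after(texts, headers, count):
    """Return the `count` texts that follow the first run equal to `headers`.

    Returns None when the run is not found or too few texts follow it.
    """
    n = len(headers)
    for i in range(len(texts) - n + 1):
        if texts[i:i + n] == headers:
            following = texts[i + n:i + n + count]
            return following if len(following) == count else None
    return None


root = ET.fromstring(SAMPLE)
texts = [normalize(d.text) for d in root.iter(DATA_TAG) if d.text]
headers = ["Marks", "Marks of Students", "Minimum Marks", "Maximum Marks"]
result = values_after(texts, headers, 4)
print(result)
```

Working on a plain list keeps the matching logic independent of the XML iterator, so no element is consumed or skipped while the headers are being checked.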

I have updated my post to reflect the current problem – user2966197
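
A note on the next() attempt in the updated question, as far as can be judged from the posted snippet: each call to `xmlTree.iter()` creates a fresh iterator, so `next(xmlTree.iter())` returns the root element every time, and an element never compares equal to a string. In addition, `next(elem.text)` raises a TypeError because a string is not an iterator. The small sketch below uses a made-up three-element document to show the difference between creating a new iterator and reusing one.

```python
import xml.etree.ElementTree as ET

# Tiny made-up document, just to observe the iterator behaviour.
root = ET.fromstring("<root><a>one</a><b>two</b><c>three</c></root>")

# A new iterator is built on every call, so next() always yields the
# first element in document order, which is the root itself.
first_again = [next(root.iter()).tag for _ in range(3)]

# Reusing a single iterator is what actually advances through the tree.
it = root.iter()
tags_in_order = [next(it).tag for _ in range(3)]

# To compare against strings, read .text from each element.
it = root.iter()
next(it)  # skip the root element
texts = [elem.text for elem in it]

print(first_again)
print(tags_in_order)
print(texts)
```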
