Get a parent attribute from Excel XML using Python lxml

I have an Excel XML file, and I need to get the style ID of the element that defines a given (interior) cell color.

I have this Excel XML as an example.

This is the header of the file:

<?xml version="1.0"?> 
<?mso-application progid="Excel.Sheet"?> 
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" 
    xmlns:o="urn:schemas-microsoft-com:office:office" 
    xmlns:x="urn:schemas-microsoft-com:office:excel" 
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 
    xmlns:html="http://www.w3.org/TR/REC-html40"> 
    <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">

In this file, I need to access:

<Style ss:ID="s64"> 
    <Interior ss:Color="#00CC00" ss:Pattern="Solid"/> 
</Style>

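I need to write a function that is passed a color (for example #00CC00), finds this Interior element, and then accesses its parent node to get the ID.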
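As an illustration only, here is a minimal sketch of such a function, assuming lxml and the SpreadsheetML namespace from the header above. The function name `style_id_for_color`, the `NS` dictionary and the trimmed `SAMPLE` workbook are hypothetical. The color is passed as an XPath variable (`$color`), which lxml binds from a keyword argument to `xpath()`, so it never has to be pasted into the expression string.

```python
from io import BytesIO

from lxml import etree

# Hypothetical prefix mapping; "ss" is only a local alias for the namespace URI.
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

# A trimmed, hypothetical copy of the workbook from the question.
SAMPLE = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Styles>
  <Style ss:ID="s64">
   <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
  </Style>
  <Style ss:ID="s65">
   <Interior ss:Color="#44CF00" ss:Pattern="Solid"/>
  </Style>
 </Styles>
</Workbook>"""


def style_id_for_color(tree, color):
    """Return the ss:ID of the first Style whose Interior has ss:Color == color, else None."""
    # $color is an XPath variable; lxml fills it from the color= keyword argument.
    matches = tree.xpath("//ss:Style/ss:Interior[@ss:Color = $color]",
                         namespaces=NS, color=color)
    if not matches:
        return None
    # The Interior's parent is the Style element that carries the ID attribute.
    return matches[0].getparent().get("{%s}ID" % NS["ss"])


tree = etree.parse(BytesIO(SAMPLE))
print(style_id_for_color(tree, "#00CC00"))  # s64
```

Returning `None` for an unknown color (rather than raising) is a design choice of this sketch; callers can check the result before using it.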

I have tried the following code, but it doesn't work. I think I should be using namespaces.

parser = et.parse(str(file)) 
color = parser.xpath("//interior[@ss:Color='#FFCC00'") 
par = color.getparent() 
print(par)

I need the code to return "s64".

But the code doesn't work. What am I missing?
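A short sketch of why namespaces matter here, using a trimmed, hypothetical `SAMPLE` copy of the workbook: because of the default `xmlns` declaration, every unprefixed element (Workbook, Style, Interior and so on) belongs to the spreadsheet namespace, while an unprefixed name in an XPath 1.0 expression means "no namespace". The prefix name `sp` below is arbitrary.

```python
from lxml import etree

# A trimmed, hypothetical copy of the workbook from the question.
SAMPLE = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Styles>
  <Style ss:ID="s64">
   <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
  </Style>
 </Styles>
</Workbook>"""

root = etree.fromstring(SAMPLE)

# lxml reports namespaced tags in Clark notation: "{namespace-uri}local-name".
print(root.tag)  # {urn:schemas-microsoft-com:office:spreadsheet}Workbook

# In XPath 1.0 an unprefixed name selects elements in no namespace,
# so this finds nothing even though the case is correct.
print(root.xpath("//Interior"))  # []

# Binding any prefix to the same URI makes the step match.
ns = {"sp": "urn:schemas-microsoft-com:office:spreadsheet"}
print(len(root.xpath("//sp:Interior", namespaces=ns)))  # 1
```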

EDIT: I'd like to add some extra information to my question. I wrote this code:

import pathlib
from lxml import etree as et

def _find_color(self):
    """
    Find the color in the XML file and return the attribute.
    """
    print('The folder is: ', self.path)
    nsd = {'Default': 'urn:schemas-microsoft-com:office:spreadsheet',
           'o': 'urn:schemas-microsoft-com:office:office',
           'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}
    if pathlib.Path(self.path).exists():
        for file in self.folder.glob('**/*.xml'):
            print('The file is ', file)
            parser = et.parse(str(file))
            color = parser.xpath("//style/interior[@ss:Color='#00CC00']", namespaces=nsd)
            print(color)
            #par = color.getparent()
            #print(par)

But it returns an empty list, so it isn't finding anything.
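The empty list has two separate causes, which this sketch pulls apart (using a trimmed, hypothetical `SAMPLE` workbook and the prefixes from the question's `nsd`): the element steps `style` and `interior` have no prefix, so they look for elements in no namespace, and they are lowercase, while XML names are case-sensitive.

```python
from lxml import etree

# Prefixes as in the question's nsd dictionary (only the ones used here).
nsd = {"Default": "urn:schemas-microsoft-com:office:spreadsheet",
       "ss": "urn:schemas-microsoft-com:office:spreadsheet"}

# A trimmed, hypothetical copy of the workbook from the question.
SAMPLE = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Styles>
  <Style ss:ID="s64">
   <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
  </Style>
 </Styles>
</Workbook>"""

root = etree.fromstring(SAMPLE)

# The question's expression: no prefix on the element steps, and wrong case.
original = root.xpath("//style/interior[@ss:Color='#00CC00']", namespaces=nsd)
# Prefixed, but still lowercase: Style != style, so still no match.
prefixed_only = root.xpath("//ss:style/ss:interior[@ss:Color='#00CC00']", namespaces=nsd)
# Prefixed and with the exact case used in the file.
fixed = root.xpath("//Default:Style/Default:Interior[@ss:Color='#00CC00']", namespaces=nsd)

print(len(original), len(prefixed_only), len(fixed))  # 0 0 1
```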

This is the part of the file I am interested in working with:

<?xml version="1.0"?> 
<?mso-application progid="Excel.Sheet"?> 
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" 
    xmlns:o="urn:schemas-microsoft-com:office:office" 
    xmlns:x="urn:schemas-microsoft-com:office:excel" 
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 
    xmlns:html="http://www.w3.org/TR/REC-html40"> 
    <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"> 
    <Author>Somebody</Author> 
    <LastAuthor>Somebody</LastAuthor> 
    <Created>2016-05-16T10:44:52Z</Created> 
    <Company>SomeCompany</Company> 
    <Version>12.00</Version> 
    </DocumentProperties> 
    <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel"> 
    <WindowHeight>9495</WindowHeight> 
    <WindowWidth>20835</WindowWidth> 
    <WindowTopX>240</WindowTopX> 
    <WindowTopY>420</WindowTopY> 
    <ProtectStructure>False</ProtectStructure> 
    <ProtectWindows>False</ProtectWindows> 
    </ExcelWorkbook> 
    <Styles> 
    <Style ss:ID="Default" ss:Name="Normal"> 
     <Alignment ss:Vertical="Bottom"/> 
     <Borders/> 
     <Font ss:FontName="Arial" x:Family="Swiss"/> 
     <Interior/> 
     <NumberFormat/> 
     <Protection/> 
    </Style> 
    <Style ss:ID="s63"> 
     <Font ss:FontName="Arial" x:Family="Swiss" ss:Color="#FF0000" ss:Bold="1"/> 
    </Style> 
    <Style ss:ID="s64"> 
     <Interior ss:Color="#00CC00" ss:Pattern="Solid"/> 
    </Style> 
    <Style ss:ID="s65"> 
     <Font ss:FontName="Arial" x:Family="Swiss" ss:Color="#FF0000" ss:Bold="1"/> 
    <Interior ss:Color="#44CF00" ss:Pattern="Solid"/> 
     </Style> 
    </Styles>

Using XPath, I am not able to find the element based on that attribute.
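If many colors have to be looked up, one option is to walk the styles once and build a color-to-ID dictionary. This is only a sketch: the helper `interior_colors` and the `SAMPLE` string (a trimmed copy of the `<Styles>` part shown above, wrapped in a minimal Workbook) are hypothetical. Only `Interior` colors are collected, so the red `Font` color of s63 is ignored, and if two styles share a color the later one wins.

```python
from lxml import etree

NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}
SS = "{%s}" % NS["ss"]  # Clark-notation prefix for namespaced attribute names

# A trimmed, hypothetical copy of the <Styles> section from the question.
SAMPLE = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:x="urn:schemas-microsoft-com:office:excel"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Font ss:FontName="Arial" x:Family="Swiss"/>
   <Interior/>
  </Style>
  <Style ss:ID="s63">
   <Font ss:FontName="Arial" x:Family="Swiss" ss:Color="#FF0000" ss:Bold="1"/>
  </Style>
  <Style ss:ID="s64">
   <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
  </Style>
  <Style ss:ID="s65">
   <Font ss:FontName="Arial" x:Family="Swiss" ss:Color="#FF0000" ss:Bold="1"/>
   <Interior ss:Color="#44CF00" ss:Pattern="Solid"/>
  </Style>
 </Styles>
</Workbook>"""


def interior_colors(root):
    """Map each Interior ss:Color to the ss:ID of the Style that contains it."""
    mapping = {}
    for interior in root.iterfind(".//ss:Style/ss:Interior", NS):
        color = interior.get(SS + "Color")
        if color is not None:  # skips the empty <Interior/> of the "Default" style
            mapping[color] = interior.getparent().get(SS + "ID")
    return mapping


colors = interior_colors(etree.fromstring(SAMPLE))
print(colors)  # {'#00CC00': 's64', '#44CF00': 's65'}
```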


2017-06-16 TMikonos

This is how it can be done:

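from lxml import etree as ET

NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

tree = ET.parse("workbook.xml")
# ".//" searches from the root; a leading "//" gives a FutureWarning on an
# ElementTree and a SyntaxError on an Element.
interior = tree.find(".//ss:Style/ss:Interior[@ss:Color='#00CC00']", namespaces=NS)
# Namespaced attributes are read with "{URI}name", not with the "ss:" prefix.
print(interior.getparent().get("{urn:schemas-microsoft-com:office:spreadsheet}ID"))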

Output:

s64

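Notes:

The ss prefix (or any prefix bound to urn:schemas-microsoft-com:office:spreadsheet) must be used on every element in the path, because the default namespace puts all of these elements in that namespace.
XML is case-sensitive (Style != style).
When getting the value of a namespaced attribute such as ss:ID, you must use the namespace URI in {URI}name form, not the prefix.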
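The last point can be checked directly. In this sketch (the one-element `SAMPLE` string is hypothetical), lxml stores the namespaced attribute under its `{URI}name` key, so the prefixed spelling finds nothing; `etree.QName(uri, localname).text` is one way to build the `{URI}name` string instead of writing it by hand.

```python
from lxml import etree

URI = "urn:schemas-microsoft-com:office:spreadsheet"

# A single, hypothetical Style element with a namespaced ID attribute.
SAMPLE = b'<Style xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" ss:ID="s64"/>'
style = etree.fromstring(SAMPLE)

# The attribute key is "{URI}ID"; the "ss" prefix is not part of it.
print(dict(style.attrib))  # {'{urn:schemas-microsoft-com:office:spreadsheet}ID': 's64'}

# "ss:ID" is treated as an un-namespaced attribute literally named "ss:ID".
print(style.get("ss:ID"))  # None
print(style.get("{%s}ID" % URI))  # s64

# etree.QName(uri, localname).text yields the same "{URI}ID" string.
print(style.get(etree.QName(URI, "ID").text))  # s64
```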


2017-06-21 16:24:44 mzjn

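This doesn't work; I get a None object. I think it's because I'm not actually creating the tree. If I add getroot() and call find on the resulting object, I get an error saying that I can't use an absolute path on an element. Also, can I create a dictionary containing only the namespaces I'm going to use, in this case "ss"? – TMikonos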

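I'm not sure what I can add. The code works for me, and there is no need to add getroot(). In my answer, "workbook.xml" is the XML from your question with the missing end tag added to make it well-formed. And yes, the NS dictionary only needs to contain the namespaces actually used in the code. – mzjn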
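A small sketch of the points in this exchange, on a trimmed, hypothetical `SAMPLE` workbook: a dictionary containing only the `ss` prefix is enough, and `getroot()` is not needed because both `find()` and `xpath()` are available on the parsed tree. The "absolute path" error comes from calling `find()` with a leading `//` on an Element; a relative `.//` path works on the tree and on the root alike.

```python
from io import BytesIO

from lxml import etree

# Only the prefix that the paths below actually use.
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}
ID = "{%s}ID" % NS["ss"]

# A trimmed, hypothetical copy of the workbook from the question.
SAMPLE = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Styles>
  <Style ss:ID="s64">
   <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
  </Style>
 </Styles>
</Workbook>"""

tree = etree.parse(BytesIO(SAMPLE))  # an ElementTree, like et.parse(...) returns
root = tree.getroot()                # the Workbook Element

# A relative ".//" path behaves the same on the tree and on its root element.
path = ".//ss:Style/ss:Interior[@ss:Color='#00CC00']"
from_tree = tree.find(path, namespaces=NS)
from_root = root.find(path, namespaces=NS)
print(from_tree.getparent().get(ID), from_root.getparent().get(ID))  # s64 s64

# A leading "//" is rejected by Element.find().
try:
    root.find("//ss:Interior", namespaces=NS)
except SyntaxError as exc:
    print("SyntaxError:", exc)

# xpath() also works directly on the tree, without getroot().
print(len(tree.xpath("//ss:Interior[@ss:Color='#00CC00']", namespaces=NS)))  # 1
```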

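After more searching I finally found the solution. It looks like my mistake was that I wasn't producing the tree (I solved this with getroot()), so my solution is: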

import pathlib
from lxml import etree as et

def _find_color(self):
    """
    Find the color in the XML file and return the parent's ID attribute.
    """
    print('The folder is: ', self.path)
    nsd = {'Default': 'urn:schemas-microsoft-com:office:spreadsheet',
           'o': 'urn:schemas-microsoft-com:office:office',
           'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}
    if pathlib.Path(self.path).exists():
        for file in self.folder.glob('**/*.xml'):
            print('The file is ', file)
            parser = et.parse(str(file))
            root = parser.getroot()
            # Element names need a prefix bound to the spreadsheet namespace
            # and must match the case used in the file (Interior, not interior).
            color = root.xpath("//Default:Interior[@ss:Color='#00CC00']", namespaces=nsd)
            print(color)
            for element in color:
                print('Tag: ', element.tag, 'Attribute: ', element.attrib)
                # Namespaced attributes are read with "{URI}name", not "ss:ID".
                par_id = element.getparent().get("{urn:schemas-microsoft-com:office:spreadsheet}ID")
                print(par_id)

It returns s64.

For the part that gets the parent's ID, I used the solution provided by mzjn. As I learned, I have to use the namespace URI instead of the prefix.
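For comparison, a sketch of the same folder scan written as a plain function rather than a method. The name `find_color_ids`, its return shape, and the `SAMPLE` file written to a temporary folder are hypothetical. Putting the color test in a predicate on `Style` lets one XPath expression return the `ss:ID` values directly, so no `getparent()` call is needed, and the color is passed as an XPath variable instead of being hard-coded.

```python
import pathlib
import tempfile

from lxml import etree

NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

# A trimmed, hypothetical copy of the workbook from the question, used as test data.
SAMPLE = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Styles>
  <Style ss:ID="s64">
   <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
  </Style>
 </Styles>
</Workbook>"""


def find_color_ids(folder, color):
    """Return {file path: [style IDs]} for *.xml files under folder that use color as an Interior color."""
    results = {}
    for path in pathlib.Path(folder).glob("**/*.xml"):
        tree = etree.parse(str(path))
        # The predicate is on Style, so the expression ends at its ss:ID
        # attribute and returns the ID strings without a getparent() step.
        ids = tree.xpath("//ss:Style[ss:Interior/@ss:Color = $color]/@ss:ID",
                         namespaces=NS, color=color)
        if ids:
            results[str(path)] = [str(style_id) for style_id in ids]
    return results


with tempfile.TemporaryDirectory() as tmp:
    (pathlib.Path(tmp) / "workbook.xml").write_bytes(SAMPLE)
    found = find_color_ids(tmp, "#00CC00")
    missing = find_color_ids(tmp, "#FFCC00")

print(list(found.values()))  # [['s64']]
print(missing)  # {}
```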


2017-06-22 08:13:02 TMikonos
