2011-10-07

Tricks for finding prefixed tags in Python lxml?

I want to use lxml's ElementTree API (etree) to find specific tags in my XML document. The tags look like this:

<text:ageInformation> 
    <text:statedAge>12</text:statedAge> 
</text:ageInformation> 

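I was hoping to use etree.find("text:statedAge"), but that method doesn't like the "text" prefix. The error says I should add "text" to the prefix map, but I'm not sure how to do that. Any tips?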

Edit: I want to be able to access the hr4e-prefixed tags. Here is the relevant part of the XML document:

<?xml version="1.0" encoding="utf-8"?> 
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd"> 
    <header> 
    <documentID root="18c41e51-5f4d-4d15-993e-2a932fed720a" /> 
    <title>Health Records for Everyone Continuity of Care Document</title> 
    <version> 
    <number>1</number> 
</version> 
<confidentiality codeSystem="2.16.840.1.113883.5.25" code="N" /> 
<documentTimestamp value="201105300211+0800" /> 
<personalInformation> 
    <patientInformation> 
    <personID root="2.16.840.1.113883.3.881.PI13023911" /> 
    <personAddress> 
     <streetAddressLine nullFlavor="NI" /> 
     <city>Santa Cruz</city> 
     <state nullFlavor="NI" /> 
     <postalCode nullFlavor="NI" /> 
    </personAddress> 
    <personPhone nullFlavor="NI" /> 
    <personInformation> 
     <personName> 
     <given>Benjamin</given> 
     <family>Keidan</family> 
     </personName> 
     <gender codeSystem="2.16.840.1.113883.5.1" code="M" /> 
     <personDateOfBirth value="NI" /> 
     <hr4e:ageInformation> 
     <hr4e:statedAge>9424</hr4e:statedAge> 
     <hr4e:estimatedAge>0912</hr4e:estimatedAge> 
     <hr4e:yearInSchool>1</hr4e:yearInSchool> 
     <hr4e:statusInSchool>attending</hr4e:statusInSchool> 
     </hr4e:ageInformation> 
    </personInformation> 
    <hr4e:livingSituation> 
     <hr4e:homeVillage>Putney</hr4e:homeVillage> 
     <hr4e:tribe>Oromo</hr4e:tribe> 
    </hr4e:livingSituation> 
    </patientInformation> 
</personalInformation> 

Answers


Namespace prefixes must be declared (mapped to a URI). Then you can use the {URI}localname notation to find text:statedAge and the other elements, like this:

from lxml import etree 

XML = """ 
<root xmlns:text="http://example.com"> 
<text:ageInformation> 
    <text:statedAge>12</text:statedAge> 
</text:ageInformation> 
</root>""" 

root = etree.fromstring(XML) 

ageinfo = root.find("{http://example.com}ageInformation") 
age = ageinfo.find("{http://example.com}statedAge") 
print(age.text)

This prints "12".
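Writing the {URI}localname strings by hand is easy to get wrong (a stray space inside the braces is enough to make find() return None). As a small sketch, lxml's etree.QName can build the same Clark-notation names from a namespace URI and a local name; the URI http://example.com is the same placeholder used in the example above, and the variable names are illustrative.

```python
from lxml import etree

XML = """<root xmlns:text="http://example.com">
<text:ageInformation>
    <text:statedAge>12</text:statedAge>
</text:ageInformation>
</root>"""

root = etree.fromstring(XML)

# Placeholder namespace URI, same as in the answer's example.
NS = "http://example.com"

# QName(uri, localname).text yields "{uri}localname" (Clark notation).
age_info_tag = etree.QName(NS, "ageInformation").text
stated_age_tag = etree.QName(NS, "statedAge").text

age = root.find(age_info_tag).find(stated_age_tag)
print(age.text)

# QName also works the other way: split an element's tag into its parts.
print(etree.QName(age).localname, etree.QName(age).namespace)
```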

Another way to do it is to pass a prefix map:

# Map the prefix used in the search path to the namespace URI.
ns = {"text": "http://example.com"}
ageinfo = root.find("text:ageInformation", namespaces=ns)
age = ageinfo.find("text:statedAge", namespaces=ns)
print(age.text)
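If the prefixes you need are declared on the root element, one option (a sketch, assuming that is where they are declared) is to build the prefix map from the document itself via the element's nsmap property. A default namespace shows up there under the key None, which XPath does not accept as a prefix, so this version drops it.

```python
from lxml import etree

XML = """<root xmlns:text="http://example.com">
<text:ageInformation>
    <text:statedAge>12</text:statedAge>
</text:ageInformation>
</root>"""

root = etree.fromstring(XML)

# root.nsmap maps prefix -> URI for the declarations in scope on <root>.
# Keep only named prefixes; a default namespace would be stored under None.
ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}

# The path can walk several prefixed levels at once.
age = root.find("text:ageInformation/text:statedAge", namespaces=ns)
print(age.text)
```

Prefixes declared only on deeper elements will not appear in root.nsmap, so for such documents an explicit dictionary like the one above is the safer choice.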

You can also use XPath:

age = root.xpath("//text:statedAge",
       namespaces={"text": "http://example.com"})[0]
print(age.text)
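A different XPath technique, not used in the answer above, is to match on local-name() so that no prefix map is needed at all. The sketch below shows it on the same sample document; the trade-off is that it would also match a statedAge element from an unrelated namespace.

```python
from lxml import etree

XML = """<root xmlns:text="http://example.com">
<text:ageInformation>
    <text:statedAge>12</text:statedAge>
</text:ageInformation>
</root>"""

root = etree.fromstring(XML)

# local-name() compares only the part after the prefix, so this finds
# statedAge elements in any namespace anywhere in the document.
ages = root.xpath("//*[local-name()='statedAge']")
print(ages[0].text)
```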

I keep getting NoneType results. .. is my root element. I tried ageInfo = root.find("{hr4e::patientdata}ageInformation") – super


@super: It would help if you provided a complete sample XML document (update the question). – mzjn


OK, I've included it. – super


I ended up having to use nested prefixes, spelling out the namespace of each level in the path:

from lxml import etree 

XML = """ 
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd"> 
<personInformation> 
<hr4e:ageInformation> 
    <hr4e:statedAge>12</hr4e:statedAge> 
</hr4e:ageInformation> 
</personInformation> 
</greenCCD>""" 

root = etree.fromstring(XML) 
#root = etree.parse("hr4e_patient.xml") 

ageinfo = root.find("{AlschulerAssociates::GreenCDA}personInformation/{hr4e::patientdata}ageInformation") 
age = ageinfo.find("{hr4e::patientdata}statedAge") 
print(age.text)
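The NoneType results in the comments come from two things: the unprefixed elements such as personInformation are in the document's default namespace (AlschulerAssociates::GreenCDA), and find() with a bare child name only looks at direct children of the element it is called on. A sketch of an alternative to spelling out every level is a ".//" descendant search with a prefix map. The prefix "g" for the default namespace is a hypothetical name chosen here; any prefix works as long as it maps to the same URI the document declares.

```python
from lxml import etree

XML = """<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd">
<personInformation>
<hr4e:ageInformation>
    <hr4e:statedAge>12</hr4e:statedAge>
</hr4e:ageInformation>
</personInformation>
</greenCCD>"""

root = etree.fromstring(XML)

# Hypothetical prefix "g" for the default namespace; "hr4e" reuses the
# document's own prefix. Only the URIs have to match the document.
ns = {"g": "AlschulerAssociates::GreenCDA", "hr4e": "hr4e::patientdata"}

# ageInformation is not a direct child of <greenCCD>, so this finds nothing.
missing = root.find("{hr4e::patientdata}ageInformation")
print(missing)

# ".//" searches all descendants, so intermediate levels can be skipped.
age = root.find(".//hr4e:statedAge", namespaces=ns)
print(age.text)

# The explicit nested path from the answer, written with the prefix map.
age_explicit = root.find(
    "g:personInformation/hr4e:ageInformation/hr4e:statedAge", namespaces=ns
)
print(age_explicit.text)
```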

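Great, glad it worked for you (I think I gave a good answer to the original question, considering that the important information about the actual namespaces was left out). – mzjn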


I wouldn't have found my solution without your help. Thank you very much, kind sir. – super