2017-02-28 127 views

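XML parsing help: Python lxml, etree or dom

I have been struggling to parse an XML response using the library documentation, but I cannot find a simple way to look up the values I want. I am happy to use any of the common libraries.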

Sample XML response (this is held as a string):

<entry 
     xmlns="http://www.w3.org/2005/Atom" 
     xmlns:s="http://dev.splunk.com/ns/rest" 
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"> 
    <title>search index</title> 
    <id>https://localhost:8089/services/search/jobs/mysearch_02151949</id> 
    <updated>2011-07-07T20:49:58.000-07:00</updated> 
    <link href="/services/search/jobs/mysearch_02151949" rel="alternate"/> 
    <published>2011-07-07T20:49:57.000-07:00</published> 
    <link href="/services/search/jobs/mysearch_02151949/search.log" rel="search.log"/> 
    <link href="/services/search/jobs/mysearch_02151949/events" rel="events"/> 
    <link href="/services/search/jobs/mysearch_02151949/results" rel="results"/> 
    <link href="/services/search/jobs/mysearch_02151949/results_preview" rel="results_preview"/> 
    <link href="/services/search/jobs/mysearch_02151949/timeline" rel="timeline"/> 
    <link href="/services/search/jobs/mysearch_02151949/summary" rel="summary"/> 
    <link href="/services/search/jobs/mysearch_02151949/control" rel="control"/> 
    <author> 
    <name>admin</name> 
    </author> 
    <content type="text/xml"> 
    <s:dict> 
     <s:key name="cursorTime">1969-12-31T16:00:00.000-08:00</s:key> 
     <s:key name="delegate"></s:key> 
     <s:key name="diskUsage">2174976</s:key> 
     <s:key name="dispatchState">DONE</s:key> 
     <s:key name="doneProgress">1.00000</s:key> 
     <s:key name="dropCount">0</s:key> 
     <s:key name="earliestTime">2011-07-07T11:18:08.000-07:00</s:key> 
     <s:key name="eventAvailableCount">287</s:key> 
     <s:key name="eventCount">287</s:key> 
     <s:key name="eventFieldCount">6</s:key> 
     <s:key name="eventIsStreaming">1</s:key> 
     <s:key name="eventIsTruncated">0</s:key> 
     <s:key name="eventSearch">search index</s:key> 
     <s:key name="eventSorting">desc</s:key> 
     <s:key name="isDone">1</s:key> 

I have truncated the output. I want the text values of these keys:

  • name="isDone" (1)
  • name="doneProgress" (1.00000)
  • name="eventCount" (287)

How do I find these values?

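Have you looked at beautifulsoup4? I have had good luck with it. For example: http://stackoverflow.com/questions/4071696/python-beautifulsoup-xml-parsing#4093940

I am a big fan of BS4. I just wanted a real XML library to do the job, since this is for a Splunk integration that uses native XML. – DJimmy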

Answer

You can use lxml with an XPath query:

ns = {'s': "http://dev.splunk.com/ns/rest"}
print(xml.xpath("//s:key[@name='isDone']/text()", namespaces=ns))

This prints ['1']. Full example:

xml = ''' 
<entry 
     xmlns="http://www.w3.org/2005/Atom" 
     xmlns:s="http://dev.splunk.com/ns/rest" 
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"> 
    <title>search index</title> 
    <id>https://localhost:8089/services/search/jobs/mysearch_02151949</id> 
    <updated>2011-07-07T20:49:58.000-07:00</updated> 
    <link href="/services/search/jobs/mysearch_02151949" rel="alternate"/> 
    <published>2011-07-07T20:49:57.000-07:00</published> 
    <link href="/services/search/jobs/mysearch_02151949/search.log" rel="search.log"/> 
    <link href="/services/search/jobs/mysearch_02151949/events" rel="events"/> 
    <link href="/services/search/jobs/mysearch_02151949/results" rel="results"/> 
    <link href="/services/search/jobs/mysearch_02151949/results_preview" rel="results_preview"/> 
    <link href="/services/search/jobs/mysearch_02151949/timeline" rel="timeline"/> 
    <link href="/services/search/jobs/mysearch_02151949/summary" rel="summary"/> 
    <link href="/services/search/jobs/mysearch_02151949/control" rel="control"/> 
    <author> 
    <name>admin</name> 
    </author> 
    <content type="text/xml"> 
    <s:dict> 
     <s:key name="cursorTime">1969-12-31T16:00:00.000-08:00</s:key> 
     <s:key name="delegate"></s:key> 
     <s:key name="diskUsage">2174976</s:key> 
     <s:key name="dispatchState">DONE</s:key> 
     <s:key name="doneProgress">1.00000</s:key> 
     <s:key name="dropCount">0</s:key> 
     <s:key name="earliestTime">2011-07-07T11:18:08.000-07:00</s:key> 
     <s:key name="eventAvailableCount">287</s:key> 
     <s:key name="eventCount">287</s:key> 
     <s:key name="eventFieldCount">6</s:key> 
     <s:key name="eventIsStreaming">1</s:key> 
     <s:key name="eventIsTruncated">0</s:key> 
     <s:key name="eventSearch">search index</s:key> 
     <s:key name="eventSorting">desc</s:key> 
     <s:key name="isDone">1</s:key> 
    </s:dict> 
    </content> 
</entry> 
''' 

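from io import StringIO  # Python 3 replacement for cStringIO

from lxml import etree

# Wrap the string in a file-like object so that etree.parse() can read it
xml = StringIO(xml)
xml = etree.parse(xml)
ns = {'s': "http://dev.splunk.com/ns/rest"}
# xpath() returns a list of the matching text nodes
print(xml.xpath("//s:key[@name='isDone']/text()", namespaces=ns))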
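
The same query extends to all three values from the question. The sketch below parses a shortened copy of the response with etree.fromstring (no StringIO wrapper is needed when the XML is already a string) and looks up each key by name. The helper name get_key and the trimmed SAMPLE document are assumptions made for illustration, not part of the original answer.

```python
from lxml import etree

# Shortened copy of the response from the question
# (assumption: only a few of the keys are kept, for brevity)
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:s="http://dev.splunk.com/ns/rest">
  <content type="text/xml">
    <s:dict>
      <s:key name="doneProgress">1.00000</s:key>
      <s:key name="eventCount">287</s:key>
      <s:key name="isDone">1</s:key>
    </s:dict>
  </content>
</entry>"""

NS = {"s": "http://dev.splunk.com/ns/rest"}


def get_key(root, name):
    """Return the text of the <s:key> with this name, or None (hypothetical helper)."""
    # The name is passed as an XPath variable rather than formatted into the query
    hits = root.xpath("//s:key[@name=$name]/text()", namespaces=NS, name=name)
    return hits[0] if hits else None


root = etree.fromstring(SAMPLE)
values = {name: get_key(root, name)
          for name in ("isDone", "doneProgress", "eventCount")}
print(values)
```

All values come back as strings, so convert them with int() or float() if numbers are needed.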

My source is already a string, so I was able to omit the StringIO import and `xml = StringIO(xml)`. I tried this instead: `xml = etree.fromstring(xml)`, `ns = {'s': "http://dev.splunk.com/ns/rest"}`, `print xml.xpath("//s:key[@name='isDone']/text()", namespaces=ns)`. But now I get AttributeError: 'Element' object has no attribute xpath – DJimmy

Actually, it works. My AttributeError problem was unrelated. – DJimmy
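
For readers who would rather stay within the standard library, which the question also allows ("etree or dom"), here is a minimal sketch using xml.etree.ElementTree. It assumes the same Splunk namespace URI as the answer; the trimmed SAMPLE string and the variable names are illustrative. Standard-library elements offer find(), findall() and iter() rather than an xpath() method, so this version collects every s:key element into a dictionary.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response from the question (illustrative subset of keys)
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:s="http://dev.splunk.com/ns/rest">
  <content type="text/xml">
    <s:dict>
      <s:key name="delegate"></s:key>
      <s:key name="doneProgress">1.00000</s:key>
      <s:key name="eventCount">287</s:key>
      <s:key name="isDone">1</s:key>
    </s:dict>
  </content>
</entry>"""

# ElementTree addresses namespaced tags as "{namespace-uri}localname"
S_KEY = "{http://dev.splunk.com/ns/rest}key"

root = ET.fromstring(SAMPLE)

# Map each key's name attribute to its text; empty elements yield None
all_keys = {key.get("name"): key.text for key in root.iter(S_KEY)}

for name in ("isDone", "doneProgress", "eventCount"):
    print(name, all_keys.get(name))
```

Building the whole dictionary once avoids running a separate query per key, which can be convenient when several values are needed from the same response.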