Parsing XML to get a node's value

import xml.dom.minidom 

content = """ 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.90"> 
    <url> 
    <loc>http://www.domain.com/</loc> 
    <lastmod>2011-01-27T23:55:42+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page1.html</loc> 
    <lastmod>2011-01-26T17:24:27+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page2.html</loc> 
    <lastmod>2011-01-26T15:35:07+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
</urlset> 
""" 

xml = xml.dom.minidom.parseString(content) 
urlset = xml.getElementsByTagName("urlset")[0] 
url = urlset.getElementsByTagName("url") 

for i in range(0, url.length): 
    loc = url[i].getElementsByTagName("loc")[0].childNodes[0].nodeValue 
    lastmod = url[i].getElementsByTagName("lastmod")[0].childNodes[0].nodeValue 
    changefreq = url[i].getElementsByTagName("changefreq")[0].childNodes[0].nodeValue 
    priority = url[i].getElementsByTagName("priority")[0].childNodes[0].nodeValue 
    print("%s, %s, %s, %s" % (loc, lastmod, changefreq, priority))

Isn't there a simpler way to get a node's value than this?

loc = url[i].getElementsByTagName("loc")[0].childNodes[0].nodeValue


2012-08-03 anjanesh

There may well be a better way to get a node's value, but this is at least a cleaner alternative that keeps you from repeating yourself:

import xml.dom.minidom 

content = """ 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.90"> 
    <url> 
    <loc>http://www.domain.com/</loc> 
    <lastmod>2011-01-27T23:55:42+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page1.html</loc> 
    <lastmod>2011-01-26T17:24:27+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page2.html</loc> 
    <lastmod>2011-01-26T15:35:07+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
</urlset> 
""" 

def get_first_node_val(obj, tag): 
    return obj.getElementsByTagName(tag)[0].childNodes[0].nodeValue 

xml = xml.dom.minidom.parseString(content) 
urlset = xml.getElementsByTagName("urlset")[0] 
urls = urlset.getElementsByTagName("url") 

for url in urls: 
    loc = get_first_node_val(url, "loc") 
    lastmod = get_first_node_val(url, "lastmod") 
    changefreq = get_first_node_val(url, "changefreq") 
    priority = get_first_node_val(url, "priority") 
    print("%s, %s, %s, %s" % (loc, lastmod, changefreq, priority))


2012-08-03 07:25:42 Jack
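
As a possible simpler route than the helper above, the standard library's xml.etree.ElementTree can return a child's text in one call. This is a swapped-in technique, not what the answer uses, and it is only a sketch: the prefix "sm", the FIELDS tuple and the read_urls function are names made up for illustration, and the sitemap is trimmed to one entry. Note that ElementTree needs the namespace spelled out, which minidom's getElementsByTagName does not.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sitemap from the question (one <url> entry kept for brevity).
content = """<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
    <url>
        <loc>http://www.domain.com/</loc>
        <lastmod>2011-01-27T23:55:42+01:00</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.5</priority>
    </url>
</urlset>"""

# The prefix "sm" is an arbitrary label chosen for this sketch; only the URI matters.
NS = {"sm": "http://www.google.com/schemas/sitemap/0.90"}
FIELDS = ("loc", "lastmod", "changefreq", "priority")

def read_urls(xml_text):
    """Return one tuple of field values per <url> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for url in root.findall("sm:url", NS):
        # findtext returns the text of the first matching child, or None if absent.
        rows.append(tuple(url.findtext("sm:" + name, namespaces=NS) for name in FIELDS))
    return rows

rows = read_urls(content)
for row in rows:
    print(", ".join(row))
```

If a field can be missing, findtext returns None for it, so the join in the last line would need a default value first.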

Does this work: loc = getElementsByTagName("loc")[i].innerHTML?


2012-08-03 07:16:33

That's not Python. – anjanesh 2012-08-03 07:19:25
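
For what it's worth, minidom nodes have no innerHTML attribute, but something roughly similar can be assembled from toxml(). The sketch below assumes that "serialize all children" is what was wanted; inner_xml is a hypothetical helper name, not part of minidom.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    "<url><loc>http://www.domain.com/</loc><priority>0.5</priority></url>"
)

def inner_xml(node):
    """Hypothetical helper: serialize a node's children, roughly like innerHTML."""
    return "".join(child.toxml() for child in node.childNodes)

url = doc.getElementsByTagName("url")[0]
loc = doc.getElementsByTagName("loc")[0]

print(inner_xml(loc))  # text content only, since <loc> has a single text child
print(inner_xml(url))  # markup of the nested <loc> and <priority> elements
```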

Why not use firstChild?

loc = url[i].getElementsByTagName("loc").firstChild.nodeValue


2012-08-03 07:26:56

Traceback (most recent call last): File "script.py", line 31, in <module> loc = url[i].getElementsByTagName("loc").firstChild.nodeValue AttributeError: 'NodeList' object has no attribute 'firstChild' – anjanesh 2012-08-03 07:58:35

from xml.dom.minidom import Node ... did you import Node? – 2012-08-03 08:23:35
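
The error in the comment above comes from calling firstChild on the NodeList that getElementsByTagName returns, so importing Node would not change it. A minimal sketch of the difference, using a one-element document made up for illustration:

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    "<url><loc>http://www.domain.com/</loc></url>"
)

matches = doc.getElementsByTagName("loc")
# getElementsByTagName returns a NodeList, which has no firstChild attribute.
has_first_child = hasattr(matches, "firstChild")

# Index into the list first; the Element it holds does have firstChild.
loc = matches[0].firstChild.nodeValue
print(has_first_child, loc)
```

So firstChild can stand in for childNodes[0], but the [0] on the NodeList is still required.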

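This extends get_first_node_val so that it also handles XML containing several elements with the same tag name. For example, the following contains two loc elements.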

<url> 
<loc>http://domain.com/</loc> 
<loc>http://sub.domain.com</loc> 
<lastmod>2011-01-27T23:55:42+01:00</lastmod> 
<changefreq>daily</changefreq> 
<priority>0.5</priority> 
</url> 


def get_first_node_val(obj, tag): 
    # Collect one {tag: text} dict for every matching element, not just the first. 
    element = [] 
    for node in obj.getElementsByTagName(tag): 
        element.append({tag: node.childNodes[0].nodeValue}) 
    return element

Output

[{'loc': u'http://domain.com/'}, {'loc': u'http://sub.domain.com'}], [{'lastmod': u'2011-01-27T23:55:42+01:00'}], [{'changefreq': u'daily'}], [{'priority': u'0.5'}]


2014-11-29 18:55:34
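
One caveat with indexing childNodes[0]: an empty element such as <priority></priority> has no child nodes, so the lookup raises IndexError. A hedged variant that tolerates this might look like the following; get_node_vals is a hypothetical name, it returns plain strings rather than dicts, and the sample document is made up for illustration.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    "<url>"
    "<loc>http://domain.com/</loc>"
    "<loc>http://sub.domain.com</loc>"
    "<priority></priority>"
    "</url>"
)

def get_node_vals(obj, tag):
    """Hypothetical helper: text of every <tag> element under obj, '' if empty."""
    values = []
    for node in obj.getElementsByTagName(tag):
        # Join all text children so empty or split text nodes do not raise IndexError.
        text = "".join(
            child.data for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        values.append(text)
    return values

url = doc.documentElement
locs = get_node_vals(url, "loc")
priorities = get_node_vals(url, "priority")
missing = get_node_vals(url, "lastmod")
print(locs, priorities, missing)
```

A tag that does not occur at all simply yields an empty list, which callers can check before indexing.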
