2012-08-03 100 views
0
import xml.dom.minidom 

content = """ 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.90"> 
    <url> 
    <loc>http://www.domain.com/</loc> 
    <lastmod>2011-01-27T23:55:42+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page1.html</loc> 
    <lastmod>2011-01-26T17:24:27+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page2.html</loc> 
    <lastmod>2011-01-26T15:35:07+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
</urlset> 
""" 

xml = xml.dom.minidom.parseString(content) 
urlset = xml.getElementsByTagName("urlset")[0] 
url = urlset.getElementsByTagName("url") 

for i in range(0, url.length): 
    loc = url[i].getElementsByTagName("loc")[0].childNodes[0].nodeValue 
    lastmod = url[i].getElementsByTagName("lastmod")[0].childNodes[0].nodeValue 
    changefreq = url[i].getElementsByTagName("changefreq")[0].childNodes[0].nodeValue 
    priority = url[i].getElementsByTagName("priority")[0].childNodes[0].nodeValue 
    print("%s, %s, %s, %s" % (loc, lastmod, changefreq, priority))

Parsing XML to get a node's value: is there no simpler way to get the value of a node than this?

loc = url[i].getElementsByTagName("loc")[0].childNodes[0].nodeValue 

Answers

0

There may be a better way to get a node's value, but this is at least a cleaner alternative that avoids repeating yourself:

import xml.dom.minidom 

content = """ 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.90"> 
    <url> 
    <loc>http://www.domain.com/</loc> 
    <lastmod>2011-01-27T23:55:42+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page1.html</loc> 
    <lastmod>2011-01-26T17:24:27+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page2.html</loc> 
    <lastmod>2011-01-26T15:35:07+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
</urlset> 
""" 

def get_first_node_val(obj, tag):
    # Text of the first <tag> element found under obj.
    return obj.getElementsByTagName(tag)[0].childNodes[0].nodeValue

# Use a separate name so the xml module is not shadowed.
doc = xml.dom.minidom.parseString(content)
urlset = doc.getElementsByTagName("urlset")[0]
urls = urlset.getElementsByTagName("url")

for url in urls:
    loc = get_first_node_val(url, "loc")
    lastmod = get_first_node_val(url, "lastmod")
    changefreq = get_first_node_val(url, "changefreq")
    priority = get_first_node_val(url, "priority")
    print("%s, %s, %s, %s" % (loc, lastmod, changefreq, priority))
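
Note that get_first_node_val raises IndexError when the tag is absent or the element has no text. A minimal sketch of a more tolerant variant follows; the name get_text and its default parameter are illustrative assumptions, not part of the answer above.

```python
import xml.dom.minidom


def get_text(parent, tag, default=None):
    """Return the text of the first <tag> under parent, or default.

    Hypothetical helper: falls back to default when the tag is
    missing or the element is empty.
    """
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.nodeValue


# Small stand-in document: <priority/> is empty and <lastmod> is missing.
doc = xml.dom.minidom.parseString(
    "<url><loc>http://www.domain.com/</loc><priority/></url>"
)
url = doc.documentElement

print(get_text(url, "loc"))              # http://www.domain.com/
print(get_text(url, "priority", "n/a"))  # n/a
print(get_text(url, "lastmod"))          # None
```

Whether a silent default is preferable to an exception depends on how strictly the sitemap should be validated.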
0

Does this work? loc = getElementsByTagName("loc")[i].innerHTML

+0

That's not Python. – anjanesh 2012-08-03 07:19:25

0

Why not use firstChild?

loc = url[i].getElementsByTagName("loc").firstChild.nodeValue 
+0

Traceback (most recent call last): File "script.py", line 31, in <module> loc = url[i].getElementsByTagName("loc").firstChild.nodeValue AttributeError: 'NodeList' object has no attribute 'firstChild' – anjanesh 2012-08-03 07:58:35

+0

from xml.dom.minidom import Node .. did you import Node? – 2012-08-03 08:23:35
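
The AttributeError in the comment above comes from calling firstChild on the NodeList that getElementsByTagName returns, rather than on a single element; importing Node does not change that. A minimal sketch of the working form, using a small stand-in document (an assumption, not the asker's full sitemap):

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    "<url><loc>http://www.domain.com/</loc></url>"
)
url = doc.documentElement

# getElementsByTagName returns a NodeList (a list of elements),
# which has no firstChild attribute of its own.
locs = url.getElementsByTagName("loc")
has_first_child = hasattr(locs, "firstChild")
print(has_first_child)  # False

# Index into the list first, then ask that element for its first child.
loc = locs[0].firstChild.nodeValue
print(loc)  # http://www.domain.com/
```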

0

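This adds extra functionality to get_first_node_val so that it handles XML in which several elements share the same tag name. For example, the following contains two loc elements.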

<url> 
<loc>http://domain.com/</loc> 
<loc>http://sub.domain.com</loc> 
<lastmod>2011-01-27T23:55:42+01:00</lastmod> 
<changefreq>daily</changefreq> 
<priority>0.5</priority> 
</url> 


def get_first_node_val(obj, tag):
    # Collect the text of every <tag> element under obj as {tag: text} dicts.
    elements = []
    for node in obj.getElementsByTagName(tag):
        elements.append({tag: node.childNodes[0].nodeValue})
    return elements

Output

[{'loc': u'http://domain.com/'}, {'loc': u'http://sub.domain.com'}], [{'lastmod': u'2011-01-27T23:55:42+01:00'}], [{'changefreq': u'daily'}], [{'priority': u'0.5'}]
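
As an alternative that nobody in this thread proposed, the standard library's xml.etree.ElementTree provides findtext, which returns an element's text in one call. The sketch below assumes a shortened copy of the sitemap from the question; the "sm" prefix and the rows list are illustrative names only. Because the document declares a default namespace, the lookups have to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sitemap from the question (one <url> entry).
content = """<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
    <url>
        <loc>http://www.domain.com/</loc>
        <lastmod>2011-01-27T23:55:42+01:00</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.5</priority>
    </url>
</urlset>"""

# "sm" is an arbitrary prefix mapped to the document's default namespace.
ns = {"sm": "http://www.google.com/schemas/sitemap/0.90"}
root = ET.fromstring(content)

rows = []
for url in root.findall("sm:url", ns):
    # findtext returns the text of the first match, or the default if absent.
    rows.append(tuple(
        url.findtext("sm:" + tag, default="", namespaces=ns)
        for tag in ("loc", "lastmod", "changefreq", "priority")
    ))

for row in rows:
    print(", ".join(row))
```

This trades the DOM API for a smaller one; if the rest of the script already relies on minidom nodes, the helper-function approach above may be the less disruptive change.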