Cannot access child nodes when parsing XML in Python

I am very new to the Python scripting language and have recently been working on a parser for web-based XML files.

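Using minidom in Python I was able to retrieve every element I need without problems, except for one node that I'm having trouble with. The last node I need from the XML file is the "url" inside the "image" tag, as shown in the sample XML below: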

<events>
  <event id="abcde01">
    <title> Name of event </title>
    <url> The URL of the Event </url> <!-- the url tag I do not need -->
    <image>
      <url> THE URL I DO NEED </url>
    </image>
  </event>
</events>

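Below I have copied the short part of my code that I think is relevant. I would really appreciate any help retrieving this last image URL node. I have also included what I tried and the error I get when running this code on GAE. I'm using Python 2.7, and I should point out that I am storing the values in lists (to be inserted into a database later).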

# Imports assumed from the names used below; the original excerpt omits them.
import urllib

import webapp2
from xml.dom import minidom as mdom


class XMLParser(webapp2.RequestHandler):
    def get(self):
        base_url = 'http://api.eventful.com/rest/events/search?location=Dublin&date=Today'
        # downloads data from xml file:
        response = urllib.urlopen(base_url)
        # converts data to string
        data = response.read()
        unicode_data = data.decode('utf-8')
        data = unicode_data.encode('ascii', 'ignore')
        # closes file
        response.close()
        # parses xml downloaded
        dom = mdom.parseString(data)
        node = dom.documentElement  # needed for declaration of variable
        # all <event> elements found in the eventful xml
        event_main = dom.getElementsByTagName('event')

        # URLs list parsing - MY ATTEMPT -
        urls_list = []
        for im in event_main:
            image_url = im.getElementsByTagName("image")[0].childNodes[0]
            urls_list.append(image_url)

The error I receive is below. Any help is much appreciated, Karen

image_url = im.getElementsByTagName("image")[0].childNodes[0] 
IndexError: list index out of range


2013-04-20 Karen
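A likely explanation for the IndexError, offered as a sketch rather than a confirmed diagnosis: `getElementsByTagName("image")` returns an empty list for any `<event>` that has no `<image>` child, so `[0]` fails, and even when `<image>` exists, `childNodes[0]` is usually the whitespace text node before `<url>`, not the `<url>` element itself. The minidom version below guards against both cases. The inline sample XML (including a second event without an image) is hypothetical test data, and `image_urls` is an illustrative helper name, not part of minidom.

```python
from xml.dom import minidom

# Hypothetical sample data: the second event has no <image> element.
SAMPLE = """<events>
  <event id="abcde01">
    <title> Name of event </title>
    <url> The URL of the Event </url>
    <image>
      <url> THE URL I DO NEED </url>
    </image>
  </event>
  <event id="abcde02">
    <title> Event without an image </title>
    <url> Another event URL </url>
  </event>
</events>"""


def image_urls(dom):
    """Collect the text of <image><url> for every <event> that has one."""
    urls = []
    for event in dom.getElementsByTagName('event'):
        images = event.getElementsByTagName('image')
        if not images:
            continue  # no <image> element: skip instead of raising IndexError
        # Search inside <image> only, so the event's own <url> is never picked.
        url_nodes = images[0].getElementsByTagName('url')
        if not url_nodes:
            continue  # <image> present but without a <url> child
        # Join the text nodes under <url> and strip surrounding whitespace.
        text = ''.join(node.data for node in url_nodes[0].childNodes
                       if node.nodeType == node.TEXT_NODE)
        urls.append(text.strip())
    return urls


dom = minidom.parseString(SAMPLE)
print(image_urls(dom))  # Python 3 prints ['THE URL I DO NEED']
```

Searching inside the `<image>` element is what separates the wanted URL from the event's own `<url>`, since `getElementsByTagName` called on the `<event>` would return both.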

Don't decode and re-encode the data! Leave the decoding to the XML parser. Is there any reason you can't use the ElementTree API (http://docs.python.org/2/library/xml.etree.elementtree.html) instead of minidom? – 2013-04-20 08:08:30

That URL returns an error response for me; I get an 'Authentication Error' message. Perhaps you do too? – 2013-04-20 08:11:07

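Hi @MartijnPieters, I left the API key out of this example because I thought it would make it simpler. I can add the API key if you think that would be more useful, but that part works fine for me; my problem is accessing the elements inside the image tag. I had to decode and re-encode the XML data because of an encoding problem with a black star character found in the data. http://stackoverflow.com/questions/16026594/unicode-encoding-errors-python-parsing-xml-cant-encode-a-character-star/16073981?noredirect=1#16073981 – Karen 2013-04-20 09:25:18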

First, do not re-encode the content. There is no need to; the XML parser is perfectly capable of handling encoded content.
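A minimal sketch of that advice, assuming the feed is UTF-8 and contains a non-ASCII character such as the black star (U+2605): hand the raw bytes straight to the parser and let it decode them using the XML declaration. The byte string is made-up sample data standing in for `response.read()`. For comparison it also shows that the `encode('ascii', 'ignore')` round trip silently drops the star.

```python
from xml.etree import ElementTree as ET

# Hypothetical raw bytes as they might come back from response.read():
# UTF-8 encoded, with a black star (U+2605) in the title.
raw = (u'<?xml version="1.0" encoding="utf-8"?>'
       u'<events><event><title>Star \u2605 event</title></event></events>'
       ).encode('utf-8')

# No decode/encode round trip: the parser reads the encoding declaration
# and decodes the bytes itself, so the star survives.
root = ET.fromstring(raw)
title = root.find('event/title').text

# For comparison: stripping to ASCII with 'ignore' loses the character.
lossy = raw.decode('utf-8').encode('ascii', 'ignore')
lossy_title = ET.fromstring(lossy).find('event/title').text

print(repr(title))
print(repr(lossy_title))
```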

Next, I would use the ElementTree API for a task like this:

import urllib
from xml.etree import ElementTree as ET

# base_url as defined in the question's handler
response = urllib.urlopen(base_url)
tree = ET.parse(response)  # the parser takes care of decoding

urls_list = []
for event in tree.findall('.//event[image]'):
    # find the text content of the first <image><url> tag combination:
    image_url = event.find('.//image/url')
    if image_url is not None:
        urls_list.append(image_url.text)

This only considers event elements that have a direct image child element.
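To try the answer's loop without calling the live API, here is a sketch that runs the same `findall`/`find` logic on an inline string, using `ET.fromstring` in place of `ET.parse(response)`. The sample XML is hypothetical; it includes one event without an `<image>` to show that the `[image]` predicate skips it. The `.strip()` call and the empty-text check are additions the original answer does not make.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample feed; the second event has no <image> child.
SAMPLE = """<events>
  <event id="abcde01">
    <title> Name of event </title>
    <url> The URL of the Event </url>
    <image>
      <url> THE URL I DO NEED </url>
    </image>
  </event>
  <event id="abcde02">
    <title> Event without an image </title>
    <url> Another event URL </url>
  </event>
</events>"""

root = ET.fromstring(SAMPLE)

urls_list = []
# './/event[image]' keeps only <event> elements with a direct <image> child.
for event in root.findall('.//event[image]'):
    image_url = event.find('.//image/url')
    if image_url is not None and image_url.text:
        urls_list.append(image_url.text.strip())

print(urls_list)
```

Because the path `.//image/url` names the parent explicitly, the event's own `<url>` is never matched.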


2013-04-20 08:16:00
