2012-07-11 37 views

Can't match a tag with XPath under lxml in Python

Below is my code:

def extractContent(self, html):
    parser = etree.XMLParser(ns_clean=True, recover=True)
    print html.find('id="detail"')
    tree = etree.fromstring(html, parser)
    if tree != None:
        for c in self.contents:
            m = tree.xpath(c['xpath'])
            print m, c['xpath']
            if len(m) >= 1:
                print c['name'] + ' : ' + m[0].text

I want to match //*[@id="i-detail"]/li[1] in the HTML source, but it matches nothing.

Here is the output of the code above:

25803 
[] //*[@id="i-detail"]/li[1] 

This is the HTML code:

<div class="mc fore tabcon"> 
        <ul id="i-detail"> 
         <li title="XXXXXXXXX">**AAAAAAAAAAA**(what i want to match)</li> 
         <li>BBBBBBBBB</li> 
....... 

I also tried the XPath on the command line:

>>> root.xpath('//*[@id="i-detail"]/li')
[]
>>> root.xpath('//*[@id="i-detail"]/*')
[<Element {http://www.w3.org/1999/xhtml}li at 0x1007b7910>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b79b0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7a50>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7aa0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7af0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7b40>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7b90>]
>>> root.xpath('//*[@id="i-detail"]/*')[0]  # <----- this line gets the target!

Use 'tree is not None'; 'None' is a singleton. – 2012-07-11 08:16:33


Please format your code. – 2012-07-11 08:17:20

Answer


This seems to work for me:

>>> from lxml import etree
>>> s = """<div class="mc fore tabcon">
        <ul id="i-detail">
         <li title="XXXXXXXXX">**AAAAAAAAAAA**(what i want to match)</li>
         <li>BBBBBBBBB</li>
        </ul>
</div>"""
>>> parser = etree.XMLParser(ns_clean=True, recover=True)
>>> root = etree.fromstring(s, parser)
>>> for node in root.xpath('//*[@id="i-detail"]/li[1]'):
...     print node, node.text
...
<Element li at 0x12534b8> **AAAAAAAAAAA**(what i want to match)
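
One difference from the question is worth noting: the elements in the question's command-line output print as {http://www.w3.org/1999/xhtml}li, which suggests the real page declares the XHTML default namespace, while the snippet above has no namespace at all. In XPath 1.0 an unprefixed name such as li only matches elements in no namespace, which would explain the empty result. The sketch below is a guess at that situation, not a confirmed diagnosis: the <html xmlns=...> wrapper is an assumption (the full page source is not shown), and the prefix x is an arbitrary name picked for illustration.

```python
from lxml import etree

# Assumption: the real page declares the XHTML default namespace on its root.
XHTML_NS = "http://www.w3.org/1999/xhtml"
html = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="mc fore tabcon">
  <ul id="i-detail">
    <li title="XXXXXXXXX">AAAAAAAAAAA</li>
    <li>BBBBBBBBB</li>
  </ul>
</div>
</body></html>"""

parser = etree.XMLParser(ns_clean=True, recover=True)
root = etree.fromstring(html, parser)

# Unprefixed "li" only matches elements in no namespace, so this finds nothing.
plain = root.xpath('//*[@id="i-detail"]/li[1]')

# Option 1: bind a prefix (the arbitrary "x") to the namespace URI.
prefixed = root.xpath('//*[@id="i-detail"]/x:li[1]',
                      namespaces={"x": XHTML_NS})

# Option 2: compare the local name and ignore the namespace entirely.
by_local_name = root.xpath('//*[@id="i-detail"]/*[local-name()="li"][1]')

print(len(plain))
print(prefixed[0].text)
print(by_local_name[0].text)
```

If the page really is namespaced XHTML, either option should return the first li. The prefix mapping is the more explicit of the two; the local-name() form is handy when the namespace URI is not known in advance.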