2013-08-21 38 views
2

僅使用Twitter,例如,此代碼從Twitter頁面中刪除第5條推文。該頁面包含一個鏈接,除了當我嘗試使用lxml和xpath將其拉出時,它只顯示切斷鏈接的文本。Python - lxml /獲取xpath的完整內容

腳本:

import urllib2 
from lxml import etree 

xpathselector = "/html/body/div/div[2]/div/div[5]/div[2]/div/ol/li[5]/div/div/p" 
url = "https://twitter.com/memphismayfire" 
response = urllib2.urlopen(url) 
htmlparser = etree.HTMLParser() 
tree = etree.parse(response, htmlparser) 
result = tree.xpath(xpathselector) 

print result[0].text 

打印:

'英里遠' 飛聲可以在iTunes!誰下載了單曲?!讓我們把它放在單曲榜上!鏈接:

HTML從XPath位置:

<p class="js-tweet-text tweet-text">'Miles Away' Acoustic is available on iTunes! Who's downloaded the single?! Let's get it up the Singles Chart!! Link: <a title="http://smarturl.it/mmf-miles-away" target="_blank" class="twitter-timeline-link" data-expanded-url="http://smarturl.it/mmf-miles-away" dir="ltr" rel="nofollow" href="http://t.co/fU2hVqAiSq" f52ae163cfcf0237f="true"><span class="tco-ellipsis"></span><span class="invisible">http://</span><span class="js-display-url">smarturl.it/mmf-miles-away</span><span class="invisible"></span><span class="tco-ellipsis"><span class="invisible">&nbsp;</span></span></a><div ida2bb72480="_p_mzkte2cwofawsu3r.t.co" style="cursor: pointer; width: 16px; height: 16px;display: inline-block;">&nbsp;</div></p> 

什麼是打印的XPath,而不僅僅是文本的全部內容的最佳方法是什麼?謝謝您的幫助!

+0

對於那些誰去使用lxml.etree,請參閱本:https://stackoverflow.com/questions/43280486/pylint-error-message-e1101-module-lxml -etree-具有-NO-帶狀標籤構件 – illegaldisease

回答