2015-12-21

Python XPath returns an empty list from URL

I am trying to access an element on the following URL using XPath: http://www.booking.com/searchresults.html?dest_id=2400&dest_type=region&offset=288

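The specific element I am looking for is the div with class "sr_item_link_to_villas". I have been trying to reach it with the following XPath (this example targets the second listing, but the full script loops over every listing), but it returns an empty list: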

//*[@id="hotellist_inner"]/*[contains(@class,"sr_item")][2]//*[contains(@class,"sr_item_link_to_villas ")] 
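One detail worth ruling out in the XPath above: contains() is a plain substring test, so "sr_item_link_to_villas " (with the trailing space) only matches when the class attribute literally has a space after that name. Whether the live page's attribute ends with a space cannot be confirmed from the question. A common way to match a class name as a whole token is the concat/normalize-space idiom, sketched here with lxml against made-up markup (the HTML snippet is hypothetical, not copied from booking.com):

```python
from lxml import html

# Hypothetical markup standing in for two search-result listings.
SAMPLE = """
<div id="hotellist_inner">
  <div class="sr_item">
    <div class="sr_item_link_to_villas"><a href="#">Villa A</a></div>
  </div>
  <div class="sr_item sr_item_new">
    <div class="extra sr_item_link_to_villas"><a href="#">Villa B</a></div>
  </div>
</div>
"""

# Substring test: needs the literal text "sr_item_link_to_villas " (trailing space).
SUBSTRING_XPATH = '//*[contains(@class, "sr_item_link_to_villas ")]'

# Token test: pads the normalized class list with spaces, then looks for
# " sr_item_link_to_villas " so the name only matches as a whole class.
TOKEN_PREDICATE = (
    '[contains(concat(" ", normalize-space(@class), " "), '
    '" sr_item_link_to_villas ")]'
)
TOKEN_XPATH = "//*" + TOKEN_PREDICATE

# The question's "second listing" path, rebuilt with the token test.
SECOND_LISTING = (
    '//*[@id="hotellist_inner"]/*[contains(@class, "sr_item")][2]//*'
    + TOKEN_PREDICATE
)

tree = html.fromstring(SAMPLE)
print(len(tree.xpath(SUBSTRING_XPATH)))  # 0
print(len(tree.xpath(TOKEN_XPATH)))      # 2
print(tree.xpath(SECOND_LISTING + "/a/text()"))
```

If the token version still returns nothing against the downloaded page, that suggests the element is not in the HTML the script received at all, which is the situation the answer addresses.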

The complete code is:

from lxml.html import parse

url = 'http://www.booking.com/searchresults.html?dest_id=2400&dest_type=region&offset=288'
page = parse(url).getroot()
pathstr = '//*[@id="hotellist_inner"]/*[contains(@class,"sr_item")][2]//*[contains(@class,"sr_item_link_to_villas ")]'
content = page.xpath(pathstr)

Answer

The following code may solve your problem. You need to send request headers (such as a browser User-Agent) to get the data.

import urllib2
from lxml import etree

def get_HTML(url):
    # Browser-like request headers; without them the page data is not returned
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Connection": "keep-alive",
    }
    req = urllib2.Request(url, None, header)
    return urllib2.urlopen(req).read()

url = "http://www.booking.com/searchresults.html?dest_id=2400&dest_type=region&offset=288"

read = get_HTML(url)
tree = etree.HTML(read)
# @class='...' is an exact comparison, trailing space included
data = tree.xpath("//div[@class='sr_item_link_to_villas ']/a/text()")
print data
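The answer's code is Python 2 (urllib2 and the print statement). A rough Python 3 equivalent, where urllib2's functionality lives in urllib.request, might look like the sketch below. It keeps the answer's User-Agent and Accept headers, and it swaps the exact @class='sr_item_link_to_villas ' comparison for a class-token match so it does not depend on a trailing space. The function names build_request, get_html and extract_villa_links are illustrative, not from the answer, and there is no guarantee the site still serves the listings to this request.

```python
import urllib.request

from lxml import etree

# User-Agent and Accept values taken from the answer's header dict.
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) "
                   "Gecko/20100101 Firefox/41.0"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

# Class-token match instead of the exact @class='sr_item_link_to_villas ' test.
VILLA_LINK_TEXT = (
    '//div[contains(concat(" ", normalize-space(@class), " "), '
    '" sr_item_link_to_villas ")]/a/text()'
)


def build_request(url):
    """Wrap the URL in a Request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def get_html(url, timeout=30):
    """Download the page and return the raw response bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


def extract_villa_links(html_bytes):
    """Return the link texts inside every villa-link div."""
    tree = etree.HTML(html_bytes)
    if tree is None:  # guard against a missing document root
        return []
    return [str(text) for text in tree.xpath(VILLA_LINK_TEXT)]
```

Usage would be extract_villa_links(get_html(url)) with the URL from the question. Keeping the download and the parsing in separate functions lets the XPath be checked against saved HTML without hitting the site.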