Python lxml xpath返回無輸出

我嘗試使用Python中的lxml在網站上刮取特定元素。下面你可以找到我的代碼，但沒有輸出。Python lxml xpath返回無輸出

from lxml import html 

    webpage = 'http://www.funda.nl/koop/heel-nederland/' 
    page = requests.get(webpage) 
    tree = html.fromstring(page.content) 

    content = '//*[@id="content"]/form/div[2]/div[5]/div/a[8]/text()' 
    content = str(tree.xpath(content)) 
    print content

來源

2017-04-27 Nweijden

看起來你正試圖報廢的網站並不喜歡被廢棄。他們利用各種技術來檢測請求是來自合法用戶還是來自bot，並且如果他們認爲它來自bot，則阻止訪問。這就是爲什麼你的xpath找不到任何東西，這就是爲什麼你應該重新考慮你正在做的事情。

如果您決定要繼續，那麼欺騙這個特定網站的最簡單方法似乎是將cookies添加到您的請求中。

首先，使用你真正的瀏覽器獲得cookie字符串：

打開新的標籤頁
開放開發工具
轉到開發工具
如果網絡選項卡是空的「網絡」選項卡，刷新頁面
查找請求到heel-nederland/並點擊它
在請求標題中，你會發現cookie字符串 - 它很長，並且包含許多看似隨機的字符。它複製

然後，修改程序中使用這些cookie：

import requests 
from lxml import html 

webpage = 'http://www.funda.nl/koop/heel-nederland/' 
headers = { 
     'Cookie': '<string copied from browser>' 
     } 
page = requests.get(webpage, headers=headers) 
tree = html.fromstring(page.content) 

selector = '//*[@id="content"]/form/div[2]/div[5]/div/a[8]/text()' 
content = str(tree.xpath(selector)) 
print content

來源

2017-04-29 21:05:19

Python lxml xpath返回無輸出

回答

相關問題