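I have been using Selenium to scrape the following website: https://www.eex-transparency.com/homepage/power/czech-republic/production/availability/non-usability/non-usability . I am scraping all of the table data. It works fine, but the script takes quite a long time to run. So I started looking for alternatives and found several topics here on Stack Overflow about sending requests directly to the server through the site's API (for example, "Scraping AJAX-loaded websites in Python"). After hours of trying and searching I gave up, because there are a few things I do not understand:
- How do I reverse engineer the API so that I send the correct request?
- Which URL should I use?
Here is what I came up with: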
import json
import requests

url = "https://www.eex-transparency.com/ajax/en/navigation/ajaxGetNavi/12"
data = {
    "id": "16",
    "title": "Czech Republic",
    "url": "https:\\/\\/www.eex-transparency.com\\/homepage\\/power\\/czech-republic",
    "class": "country",
    "description": "",
    "children": [
        {
            "id": "649",
            "title": "Production",
            "url": False,
            "class": "",
            "description": "",
            "children": [
                {
                    "id": "650",
                    "title": "Capacity",
                    "url": False,
                    "class": "",
                    "description": "",
                    "children": [
                        {
                            "id": "651",
                            "title": "Installed Capacity",
                            "url": "https:\\/\\/www.eex-transparency.com\\/homepage\\/power\\/czech-republic\\/production\\/capacity\\/installed-capacity",
                            "class": "",
                            "description": ""
                        }
                    ]
                }
            ]
        }
    ]
}

response = requests.get(url, data=data)
file = response.json()
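More generally, could someone explain which steps I should take to scrape a page like this? I am particularly interested in how to find the right information in Chrome (Inspect -> Network -> XHR), and how to build the data variable (i.e. what I pass to requests) from that information.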
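To make the question concrete, here is a minimal sketch of how I understand a request seen in the XHR tab might be rebuilt with requests. It only constructs the request and prints it, so the difference between query parameters and a request body is visible without contacting the server. The `prepare_get` helper, the `X-Requested-With` header, and the `example` parameter are my own assumptions for illustration; none of them is confirmed for this site.

```python
import requests

# Endpoint copied from my attempt above; whether it accepts parameters is unknown.
URL = "https://www.eex-transparency.com/ajax/en/navigation/ajaxGetNavi/12"


def prepare_get(url, query=None):
    """Build (but do not send) a GET request shaped like a browser XHR."""
    headers = {
        "Accept": "application/json",
        # Often sent by AJAX calls; assumed here, not verified for this site.
        "X-Requested-With": "XMLHttpRequest",
    }
    request = requests.Request("GET", url, params=query, headers=headers)
    return request.prepare()


# Hypothetical query parameter, for illustration only.
prepared = prepare_get(URL, query={"example": "1"})
print(prepared.method, prepared.url)
print(prepared.body)  # a GET built with params= has no body

# To actually send it (not executed here):
# with requests.Session() as session:
#     response = session.send(prepared, timeout=30)
#     response.raise_for_status()
#     payload = response.json()
```

If this understanding is right, the dictionary in my attempt looks more like something the server returns than something I should send, and whatever I do send would be whatever the Network tab lists as the request's query string or payload.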
What is the idea here? You have not provided any details. – Aertonas