Python - BeautifulSoup find_all() returns "Invalid date"

My code:

import requests 
import re 
from bs4 import BeautifulSoup 

r = requests.get(
    "https://www.traveloka.com/hotel/detail?spec=22-9-2016.24-9-2016.2.1.HOTEL.3000010016588.&nc=1474427752464") 

data = r.content 
soup = BeautifulSoup(data, "html.parser") 
ratingdates = soup.find_all("div", {"class": "reviewDate"}) 

for i in range(0,10): 
    print(ratingdates[i].get_text())

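This code prints "Invalid date". How can I get the actual dates?

Additional note:

It seems the solution is to use Selenium or spynner, but I don't know how to use either. Also, I can't install spynner; it always insists on installing lxml.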

Source

2016-09-21 Eternity Neet

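The dates are generated by an AJAX request; the data is posted to https://api.traveloka.com/v1/hotel/hotelReviewAggregate. It can be replicated, but doing so is not trivial. –

@PadraicCunningham would you mind checking my [new question](http://stackoverflow.com/questions/39703021/python-requests-fetch-data-from-api-based-website)? –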

If you use Selenium, this is quite simple. Here is a basic example with some explanation:

To install Selenium, run pip install selenium

import time

from bs4 import BeautifulSoup
from selenium import webdriver

# set webdriver's browser to Firefox
driver = webdriver.Firefox()

# load page in browser
driver.get(
    "https://www.traveloka.com/hotel/detail?spec=22-9-2016.24-9-2016.2.1.HOTEL.3000010016588.&nc=1474427752464")

# pause 5 seconds after page load so the AJAX-loaded dates have time to arrive
# (implicitly_wait() only sets a timeout for element lookups; it does not pause)
time.sleep(5)

# get page's source
data = driver.page_source

# close the browser once the source has been captured
driver.quit()

# rest is pretty much the same
soup = BeautifulSoup(data, "html.parser")
ratingdates = soup.find_all("div", {"class": "reviewDate"})

# I changed this bit to always print all dates without range issues
for i in ratingdates:
    print(i.get_text())

For more on using Selenium, take a look at the documentation here - http://selenium-python.readthedocs.io/
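A fixed pause can be shorter or longer than the page actually needs. As a rough sketch, Selenium's WebDriverWait can poll the page until the dates have been filled in. The helper names dates_ready and fetch_review_dates, the 15-second timeout, and the assumption that unloaded elements contain exactly the text "Invalid date" are illustrative and not part of the example above.

```python
# Assumption: this is the placeholder text shown before the AJAX call finishes.
PLACEHOLDER = "Invalid date"


def dates_ready(texts):
    """Return True once there is at least one date and none is blank or the placeholder."""
    return bool(texts) and all(
        t.strip() and t.strip() != PLACEHOLDER for t in texts
    )


def fetch_review_dates(url, timeout=15):
    """Open url in Firefox and return the review date strings once they have loaded."""
    # Imported lazily so dates_ready() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)

        def read_dates(drv):
            elements = drv.find_elements(By.CSS_SELECTOR, "div.reviewDate")
            texts = [el.text for el in elements]
            # A falsy return value tells WebDriverWait to keep polling.
            return texts if dates_ready(texts) else False

        # Polls read_dates until it returns a truthy value; raises
        # TimeoutException if that has not happened after `timeout` seconds.
        wait = WebDriverWait(
            driver, timeout, ignored_exceptions=(StaleElementReferenceException,)
        )
        return wait.until(read_dates)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling fetch_review_dates with the URL from the question should return the list of date strings, or raise Selenium's TimeoutException if the dates never load.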

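If you don't want Firefox to pop up every time you run the script, you can use PhantomJS, a lightweight headless browser. After downloading and setting it up, you can change driver = webdriver.Firefox() to driver = webdriver.PhantomJS() in the example above.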
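Selenium 4 no longer ships a PhantomJS driver, so on a current install a similar effect can be had by starting Firefox itself in headless mode. This is only a sketch under that assumption; the helper names headless_firefox_options and make_headless_driver are made up for illustration, and it needs a Firefox build that supports the -headless flag.

```python
def headless_firefox_options():
    """Build Firefox options that start the browser without a visible window."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # Firefox's command-line flag for headless mode
    return options


def make_headless_driver():
    """Return a Firefox WebDriver that runs with no window popping up."""
    from selenium import webdriver

    return webdriver.Firefox(options=headless_firefox_options())
```

Using driver = make_headless_driver() in place of driver = webdriver.Firefox() leaves the rest of the script unchanged.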

Source

2016-09-21 10:26:01 4140tm

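I see that it needs a browser. Can I do this without a browser? –

You need the page's 'js' to execute and load the content you want, so you need a browser. However, there is an option that avoids opening a new window every time. It's called 'PhantomJS'; I added a bit about it at the end of my answer. – 4140tm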
