2017-05-24 148 views
2

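python - Scraping an AJAX website using BeautifulSoup

I want to scrape an e-commerce website that uses AJAX calls to load its next pages.

I can scrape the data on page 1, but when I scroll page 1 to the bottom, page 2 is loaded automatically through an AJAX call.

My code: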

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as ureq

# Page 1: the regular category page
my_url = 'http://www.shopclues.com/mobiles-smartphones.html'
page = ureq(my_url).read()
page_soup = soup(page, "html.parser")
containers = page_soup.findAll("div", {"class": "column col3"})
for container in containers:
    name = container.h3.text
    price = container.find("span", {'class': 'p_price'}).text
    print("Name : " + name.replace(",", " "))
    print("Price : " + price)

# Pages 2 to 6: the AJAX endpoint that the site calls on scroll
for i in range(2, 7):
    my_url = "http://www.shopclues.com/ajaxCall/moreProducts?catId=1431&filters=&pageType=c&brandName=&start=" + str(36 * (i - 1)) + "&columns=4&fl_cal=1&page=" + str(i)
    page = ureq(my_url).read()
    print(page)
    page_soup = soup(page, "html.parser")
    containers = page_soup.findAll("div", {"class": "column col3"})
    for container in containers:
        name = container.h3.text
        price = container.find("span", {'class': 'p_price'}).text
        print("Name : " + name.replace(",", " "))
        print("Price : " + price)

I printed the AJAX page read by ureq to check whether I was able to open it. The output of print(page) is b'', an empty bytes object.

Please provide me with a solution to scrape the remaining data.
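
The paginated URL in the question follows a fixed pattern: `start` advances by 36 items per page and `page` is the page number. As a minimal sketch, the same URL can be assembled with `urllib.parse.urlencode` instead of string concatenation. The helper name `build_ajax_url` and its `per_page` and `cat_id` parameters are made up for illustration; the endpoint and query parameters are copied from the question, and nothing here guarantees that the server returns a non-empty body.

```python
from urllib.parse import urlencode

# Endpoint taken from the question's code.
AJAX_ENDPOINT = "http://www.shopclues.com/ajaxCall/moreProducts"


def build_ajax_url(page, per_page=36, cat_id=1431):
    """Build the AJAX URL for a given page number (hypothetical helper)."""
    # A list of pairs keeps the parameters in the same order as the question.
    params = [
        ("catId", cat_id),
        ("filters", ""),
        ("pageType", "c"),
        ("brandName", ""),
        ("start", per_page * (page - 1)),  # offset of the first item on this page
        ("columns", 4),
        ("fl_cal", 1),
        ("page", page),
    ]
    return AJAX_ENDPOINT + "?" + urlencode(params)


# Print the URLs for pages 2 to 6, as in the question's loop.
for page in range(2, 7):
    print(build_ajax_url(page))
```

Checking `if not page:` right after `ureq(my_url).read()` would make the empty-response case explicit before the result is handed to BeautifulSoup.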

+1

Try using `selenium`.

+1

I am new to web scraping. It would be kind of you if you could provide me with the code.

+2

I suggest using their API: http://developer.shopclues.com/index.php/API_Basics#link

Answers

1
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as soup
import random
import time

# Disable browser notification pop-ups
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs", prefs)

# A randomized delay between 5 and 10 seconds
seconds = 5 + (random.random() * 5)
# Create a new Chrome session
driver = webdriver.Chrome(options=chrome_options)
driver.implicitly_wait(30)
# driver.maximize_window()

# Navigate to the category page and give it time to load
driver.get("http://www.shopclues.com/mobiles-smartphones.html")
time.sleep(2 * seconds)
# Add more to range for more phones
for i in range(1):
    # Locate the "load more" button by its id and click it through JavaScript
    element = driver.find_element(By.ID, "moreProduct")
    driver.execute_script("arguments[0].click();", element)
    time.sleep(2 * seconds)
# Parse the fully loaded page with BeautifulSoup
html = driver.page_source
page_soup = soup(html, "html.parser")
containers = page_soup.findAll("div", {"class": "column col3"})
for container in containers:
    # Skip containers that lack a title or a price
    try:
        name = container.h3.text
        price = container.find("span", {'class': 'p_price'}).text
        print("Name : " + name.replace(",", " "))
        print("Price : " + price)
    except AttributeError:
        continue
driver.quit()

I used Selenium to load the site and click the button that loads more results. Then I took the generated HTML and fed it into your code.
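
The parsing step can also be pulled out into a function that works on any HTML string, whether it comes from `driver.page_source` or from a plain HTTP response. This is only a sketch: the function name `parse_products` and the sample markup are made up for illustration, and the selectors (`div` with class `column col3`, `h3`, `span` with class `p_price`) are the ones used in the thread, so they depend on the site's markup staying the same.

```python
from bs4 import BeautifulSoup


def parse_products(html):
    """Return a list of (name, price) tuples found in the given HTML."""
    page_soup = BeautifulSoup(html, "html.parser")
    products = []
    for container in page_soup.find_all("div", {"class": "column col3"}):
        title = container.find("h3")
        price = container.find("span", {"class": "p_price"})
        # Skip containers that are missing a title or a price.
        if title is None or price is None:
            continue
        name = title.get_text(strip=True).replace(",", " ")
        products.append((name, price.get_text(strip=True)))
    return products


# Made-up markup that mimics the structure assumed above.
SAMPLE_HTML = """
<div class="column col3"><h3>Phone A,64GB</h3><span class="p_price">Rs.9999</span></div>
<div class="column col3"><h3>Banner without a price</h3></div>
"""

for name, price in parse_products(SAMPLE_HTML):
    print("Name : " + name)
    print("Price : " + price)
```

Testing for `None` explicitly does the same job as the `try`/`except AttributeError` in the answer, but it will not hide an unrelated `AttributeError` raised elsewhere in the loop body.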

+1

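Welcome to StackOverflow! Please provide an explanation or documentation in your answer to further help the original poster and anyone else who may search for this answer.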

+1

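Sorry, to be honest I did not understand what you are actually doing here. Also, I could not find any button to load more products, because the page loads more by itself when I scroll down.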
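
If the page loads more products on scroll rather than through a button, one option is to scroll to the bottom repeatedly until the page height stops growing. The sketch below assumes a Selenium-style driver whose `execute_script` method runs JavaScript in the page. The function name `scroll_until_stable` and its `pause` and `max_rounds` parameters are made up for illustration, and this has not been verified against the site in question.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom until the page height stops changing.

    `driver` is assumed to expose a Selenium-style execute_script(script).
    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    for _ in range(max_rounds):
        # Jump to the bottom so the page can trigger its next AJAX load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the AJAX response time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        rounds += 1
        if new_height == last_height:
            break  # nothing new was appended, so stop scrolling
        last_height = new_height
    return rounds
```

After the function returns, `driver.page_source` can be passed to BeautifulSoup in the same way as in the answer.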

+1

I tried it once and got the load-more button, but when I clicked it the browser took me to this URL: http://b.codeonclick.com/script/wait.php?stamat=m%7C% 2C%2Cg3F-9jZXoGU3B_9GH0dEdHP3xP.f10%2CqqtKzScrXaD6J-TdEPg201mBMiNRUBdz6CXReBfSkvUVRInI1LXqZThgGFzCEHMpF1lleptOU_QsrpOi6T7Hby7nsDmByZIpPmfQ9jTUqKnDJMkuuIUs2gNUMD-4q8sddxXk9SJ9DV0v5jXqlTWUZdtJQpypd5folRnCfkojHyAp_deich7xrxO_f1wrkstlYSw7fGuN7n6aoTbh6DiYEF0Ypi2LPx8j3rcuOvcI8SqWq0Nn017hDlPJJxhoMjvHa67t4aRUI7sl9iV308NqAjdhpD5WQ7sYXYpfMxy-KpDzCUiL5Ndf-N_giWqeVZ-5 TTC = t4xr44rc