2017-05-25 68 views
1

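Scrolling a modal window using Selenium in Python

I am trying to scrape the links to the song pages of certain artists on genius.com, but I have run into a problem: the links to the individual song pages are displayed in a pop-up modal window.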

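The modal window does not load all the links at once. Instead, it loads more content via AJAX whenever you scroll down to the bottom of the modal.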

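I tried using code that scrolls to the bottom of the page, but unfortunately it only scrolls the window behind the modal, not the modal itself: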

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") 

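So I tried selecting the last element in the modal and scrolling to it (with the idea of repeating this until all the song links had loaded), but it does not scroll far enough for the site to load more content: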

last_element = driver.find_elements_by_xpath('//div[@class="mini_card-metadata"]')[-1] 
last_element.location_once_scrolled_into_view 

Here is my code so far:

import os 
from bs4 import BeautifulSoup 
from selenium import webdriver 

chrome_driver = "/Applications/chromedriver" 
os.environ["webdriver.chrome.driver"] = chrome_driver 
driver = webdriver.Chrome(chrome_driver) 

base_url = 'https://genius.com/artists/Stormzy' 
driver.get(base_url) 

xpath_str = '//div[contains(text(),"Show all songs by Stormzy")]' 
driver.find_element_by_xpath(xpath_str).click() 

Is there a way to extract all of an artist's song page links?

+0

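See: [How do I do X?](https://meta.stackoverflow.com/questions/253069/whats-the-appropriate-new-current-close -SO-do-x-do-x) The expectation on SO is that the user asking a question not only does research to answer their own question, but also shares that research, code attempts, and results. This demonstrates that you have taken the time to try to help yourself, it saves us from reiterating obvious answers, and most of all it helps you get a more specific and relevant answer! See also: [ask] – JeffC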

Answers

0

When you scroll to the bottom of the modal dialog, the page calls:

$scrollable_data_ctrl.load_next(); 

As an option, you can try executing it yourself, repeatedly, until no new results appear in the modal:

driver.execute_script("$scrollable_data_ctrl.load_next();") 
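A minimal sketch of what "executing it repeatedly" could look like. It assumes that `$scrollable_data_ctrl` really is reachable from the page's global scope (not verified here) and borrows the `a.mini_card.mini_card--small` selector from the other answer. The function name `load_all_songs` and the `max_rounds`, `pause`, and `patience` parameters are made up for illustration.

```python
import time

# Selector borrowed from the other answer; treat it as an assumption about the page.
SONG_SELECTOR = "a.mini_card.mini_card--small"


def load_all_songs(driver, max_rounds=200, pause=1.0, patience=3):
    """Call the page's loader until the number of song links stops growing.

    Returns the final number of song links found in the modal.
    """
    def count():
        # "css selector" is the string value behind By.CSS_SELECTOR
        return len(driver.find_elements("css selector", SONG_SELECTOR))

    seen = count()
    idle = 0  # consecutive rounds that produced no new links
    for _ in range(max_rounds):
        driver.execute_script("$scrollable_data_ctrl.load_next();")
        time.sleep(pause)  # crude wait for the AJAX response
        now = count()
        if now > seen:
            seen, idle = now, 0
        else:
            idle += 1
            if idle >= patience:
                break
    return seen
```

Calling `load_all_songs(driver)` after the modal is open should leave every loaded link in the DOM, ready to be collected with `driver.find_elements`. The fixed `pause` is a simplification; an explicit wait would be more robust on a slow connection.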
0

Try the code below to get the required output:

from selenium import webdriver as web 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.support.ui import WebDriverWait as wait 
from selenium.webdriver.common.keys import Keys 
from selenium.common.exceptions import TimeoutException 

driver = web.Chrome() 
base_url = 'https://genius.com/artists/Stormzy' 
driver.get(base_url) 

# Open modal 
driver.find_element(By.XPATH, '//div[normalize-space()="Show all songs by Stormzy"]').click()
song_locator = By.CSS_SELECTOR, 'a.mini_card.mini_card--small' 
# Wait for first XHR complete 
wait(driver, 10).until(EC.visibility_of_element_located(song_locator)) 
# Get current length of songs list 
current_len = len(driver.find_elements(*song_locator)) 

while True:
    # Press END on a song link to scroll the modal and trigger the next XHR
    driver.find_element(*song_locator).send_keys(Keys.END)
    try:
        # Wait up to 3 seconds for more songs to appear
        wait(driver, 3).until(lambda d: len(d.find_elements(*song_locator)) > current_len)
        current_len = len(driver.find_elements(*song_locator))
    except TimeoutException:
        # No new songs arrived: collect the full list of links and stop
        songs_list = [song.get_attribute('href') for song in driver.find_elements(*song_locator)]
        break

print(songs_list) 

This should let you keep requesting new XHRs until the length of the song list stops changing, and finally return the list of links.
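If sending `Keys.END` does not trigger loading, another option is to scroll the modal's own scrollable container with JavaScript rather than the window, which is the problem described in the question. The following is only a sketch: `container` must be the WebElement that actually scrolls inside the modal, and its locator is not given anywhere in this thread, so you would have to find it in the browser's inspector. The function name and parameters are hypothetical.

```python
import time

# Sets the element's scroll offset to its full content height (i.e. the bottom).
SCROLL_JS = "arguments[0].scrollTop = arguments[0].scrollHeight;"


def scroll_modal_until_stable(driver, container, item_selector,
                              pause=1.0, max_rounds=200):
    """Scroll `container` to its bottom until no new items are added.

    Returns the href of every item matching `item_selector` inside it.
    """
    items = container.find_elements("css selector", item_selector)
    for _ in range(max_rounds):
        driver.execute_script(SCROLL_JS, container)
        time.sleep(pause)  # crude wait for the AJAX response
        refreshed = container.find_elements("css selector", item_selector)
        if len(refreshed) == len(items):
            break  # nothing new arrived, assume the list is complete
        items = refreshed
    return [item.get_attribute("href") for item in items]
```

With the selector from the answer above, a call might look like `scroll_modal_until_stable(driver, container, "a.mini_card.mini_card--small")`. As with any fixed sleep, a slow response can end the loop early, so raise `pause` if the list comes back incomplete.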