2016-11-17

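Page refresh in a Selenium loop in Python

I have a problem with a Selenium loop in Python. I want to iterate over a list of links found with 'driver.find_elements_by_id' and click them one by one. The problem is that each time I click a link ('linklist' in the code), the page refreshes, so I get the error 'Message: The element reference is stale. Either the element is no longer attached to the DOM or the page has been refreshed.'

I know the cause is that the link list disappears after the click. But how do I generally iterate over a list in Selenium when the page it came from no longer exists? I used 'driver.back()', and apparently it doesn't work.

The error message pops up after this line of code: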

link.click() 

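The link list is at this URL (I want to click the Documents button, then download the first file shown after the page refreshes): 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001467373&type=10-K&dateb=201&owner=exclude&count=40'

Could someone take a look at this problem? Thanks!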

from selenium import webdriver 
from selenium.webdriver.support.ui import WebDriverWait 
import unittest 
import os 
import time 
from bs4 import BeautifulSoup 
from selenium.webdriver.common.keys import Keys 
import requests 
import html2text 



class LoginTest(unittest.TestCase):
    def setUp(self):
        self.driver=webdriver.Firefox()
        self.driver.get("https://www.sec.gov/edgar/searchedgar/companysearch.html")

    def test_Login(self):
        driver=self.driver

        cikID="cik"
        searchButtonID="cik_find"
        typeID="//*[@id='type']"
        priorID="prior_to"
        cik="00001467373"
        Type="10-K"
        prior="201"
        search2button="//*[@id='contentDiv']/div[2]/form/table/tbody/tr/td[6]/input[1]"

        documentsbuttonid="documentsbutton"
        formbuttonxpath='//a[text()="d10k.htm"]'

        cikElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_id(cikID))

        cikElement.clear()
        cikElement.send_keys(cik)

        searchButtonElement=WebDriverWait(driver,20).until(lambda driver:driver.find_element_by_id(searchButtonID))
        searchButtonElement.click()

        typeElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(typeID))
        typeElement.clear()
        typeElement.send_keys(Type)
        priorElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_id(priorID))
        priorElement.clear()
        priorElement.send_keys(prior)
        search2Element=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(search2button))
        search2Element.send_keys(Keys.SPACE)
        time.sleep(1)

        documentsButtonElement=WebDriverWait(driver,20).until(lambda driver:driver.find_element_by_id(documentsbuttonid))
        a=driver.current_url

        window_be1 = driver.window_handles[0]
        linklist=driver.find_elements_by_id(documentsbuttonid)

        with open("D:/doc2/"+"a"+".txt", mode="w",errors="ignore") as newfile:
            for link in linklist:
                link.click()

                formElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(formbuttonxpath))
                formElement.click()
                time.sleep(1)

                t=driver.current_url

                r = requests.get(t)
                data = r.text

                newfile.write(html2text.html2text(data))

                drive.back()
                drive.back()

    def terdown(self):
        self.driver.quit()

if __name__=='__main__':
    unittest.main()

Not sure if this is the problem, but you are using 'drive.back()' instead of 'driver.back()' in the 'for' loop.

Answer


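You shouldn't iterate over a list of web elements; use a list of link URLs instead, since plain strings stay valid after the page changes. Try something like this: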

linklist = [] 
for link in driver.find_elements_by_xpath('//h4[@class="title"]/a'): 
    linklist.append(link.get_attribute('href')) 

Then you can iterate over the list of links:

for link in linklist: 
    driver.get(link) 
    # do some actions on page 
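
The two steps above can be folded into one helper. This is only a minimal sketch: `visit_each` and `handle` are hypothetical names that do not appear in the thread, and it calls the generic `find_elements("xpath", ...)` form rather than `find_elements_by_xpath`, on the assumption that your Selenium version accepts the locator strategy as a plain string.

```python
def visit_each(driver, xpath, handle):
    """Collect href strings first, then load each URL in turn.

    `handle` is a hypothetical callback that receives the driver once the
    target page has been requested; its return values are collected.
    """
    # Plain strings stay valid after navigation; WebElement handles do not.
    hrefs = [el.get_attribute("href")
             for el in driver.find_elements("xpath", xpath)]
    results = []
    for href in hrefs:
        if not href:  # skip anchors that have no href attribute
            continue
        driver.get(href)
        results.append(handle(driver))
    return results
```

For the page in the question, a call might look like `visit_each(driver, '//a[@id="documentsbutton"]', lambda d: d.current_url)`, assuming the Documents buttons are anchors that carry an `href`.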

If you want to physically click each link, you may need to use

for link in linklist: 
    driver.find_element_by_xpath('//h4[@class="title"]/a[@href="%s"]' % link).click() 
    # do some actions on page 
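
Another way to keep real clicks is to look the elements up again on every iteration and pick them by index, so no reference is held across a page load. The sketch below is an assumption-laden illustration, not the answerer's method: `click_each` and `handle` are hypothetical names, and it assumes `driver.back()` returns to a listing page that shows the same links in the same order. On a real site an explicit wait after `back()` may also be needed.

```python
def click_each(driver, xpath, handle):
    """Click every element matching xpath, re-locating before each click."""
    count = len(driver.find_elements("xpath", xpath))
    results = []
    for index in range(count):
        # Look the elements up again: references obtained before the last
        # navigation are stale once the page has reloaded.
        links = driver.find_elements("xpath", xpath)
        if index >= len(links):  # the page now has fewer matches; stop
            break
        links[index].click()
        results.append(handle(driver))
        driver.back()  # return to the listing page for the next iteration
    return results
```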

Hey, thanks again! It works like a charm now! – SXC88


Maybe you'd like to take a look at this post, @Andersson ... http://stackoverflow.com/questions/40748555/python-threading-timer-set-time-limit-when-program-runs-out-of-time?noredirect=1#comment68724872_40748555 – SXC88