2017-03-07 102 views

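Generate a list of URLs with Selenium

I am trying to generate a list of URLs with Selenium. I would like the user to browse in the instrumented browser and end up with a list of the URLs they visited.

I found that the `current_url` property could help with this, but I have not found a way to know when the user has clicked a link.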

In [117]: from selenium import webdriver 

In [118]: browser = webdriver.Chrome() 

In [119]: browser.get("http://stackoverflow.com") 

--> here, I click on the "Questions" link. 

In [120]: browser.current_url 

Out[120]: 'http://stackoverflow.com/questions' 

--> here, I click on the "Jobs" link. 

In [121]: browser.current_url 

Out[121]: 'http://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab' 

Any hints appreciated!

Thanks

Answers


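There isn't really an official way to monitor what the user does in Selenium. The only thing you can really do is start the driver and then run a loop that constantly checks driver.current_url. However, I don't know what the best way to exit this loop is, since I don't know your use case. Maybe you can try: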

import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    if driver.current_url != current:
        current = driver.current_url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead of the line above, if you only want unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause briefly so the loop does not busy-wait
    time.sleep(0.5)
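
The polling idea above can also be factored into a small helper that does not depend on Selenium, which makes it easy to try out without a browser. This is only a sketch: `record_visits`, `get_url`, `should_stop`, `poll_interval` and `unique` are names made up for illustration, not part of any library.

```python
import time


def record_visits(get_url, should_stop, poll_interval=0.5, unique=False):
    """Poll get_url() until should_stop() returns True.

    Returns the URLs in the order they were first seen after each change.
    With unique=True, a URL that was already recorded is not added again.
    """
    urls = []
    last = None  # nothing seen yet, so the first URL is always recorded
    while not should_stop():
        current = get_url()
        if current != last:
            last = current
            if not unique or current not in urls:
                urls.append(current)
        # sleep between polls so the loop does not busy-wait
        time.sleep(poll_interval)
    return urls
```

With a real driver this would presumably be called as `record_visits(lambda: driver.current_url, some_stop_condition)`, where `some_stop_condition` is whatever exit test fits your use case. Note that polling can miss pages the user leaves faster than the poll interval.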

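If you have no idea how to end this loop, I would suggest having the user navigate to a URL that breaks the loop, such as http://www.endseleniumcheck.com, and adding it to the code like this: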

import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # stop as soon as the user navigates to the agreed "stop" URL
    if driver.current_url == 'http://www.endseleniumcheck.com':
        break

    if driver.current_url != current:
        current = driver.current_url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead of the line above, if you only want unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause briefly so the loop does not busy-wait
    time.sleep(0.5)
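
One caveat with an exact string comparison: the browser may report the address in a normalized form (for example with a trailing slash or a different scheme), in which case the equality check might never match. A more forgiving check, sketched below, compares only the host name. `is_stop_url` and `stop_host` are hypothetical names, and the default host is simply the placeholder from the example above.

```python
from urllib.parse import urlsplit


def is_stop_url(url, stop_host="www.endseleniumcheck.com"):
    """Return True when url points at stop_host, ignoring scheme, path and case."""
    # hostname is None for URLs without a network location (e.g. about:blank)
    host = urlsplit(url).hostname or ""
    return host.lower() == stop_host.lower()
```

In the loop, the condition would then read `if is_stop_url(driver.current_url): break`. Whether this works for a domain that does not resolve depends on how the browser reports its error page, so it is worth testing with the browser you actually use.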

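Or, if you want to get crafty, you can terminate the loop when the user exits the browser. You can do that by monitoring the browser's process ID with the psutil library (pip install psutil):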

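import time

import psutil
from selenium import webdriver


urls = []

driver = webdriver.Firefox()
# PID of the Firefox process; driver.binary may not be available in newer Selenium releases
pid = driver.binary.process.pid

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # stop once the browser process no longer exists
    if not psutil.pid_exists(pid):
        break

    if driver.current_url != current:
        current = driver.current_url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead of the line above, if you only want unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause briefly so the loop does not busy-wait
    time.sleep(0.5)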

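Thank you very much! That will do. Personally, I ended up using a try/except structure to handle the browser exit (which throws an exception). It's not clean, but it is enough for what I have to do. – reike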
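
The try/except approach mentioned in this comment was not posted, but it could look roughly like the sketch below. The helper is kept independent of Selenium so it can be tried without a browser; `record_until_error`, `get_url`, `stop_on` and `poll_interval` are names invented for illustration.

```python
import time


def record_until_error(get_url, stop_on=(Exception,), poll_interval=0.5):
    """Record URL changes until get_url() raises one of the stop_on exceptions."""
    urls = []
    last = None
    while True:
        try:
            current = get_url()
        except stop_on:
            # the browser is presumably gone; stop polling and keep what we have
            break
        if current != last:
            last = current
            urls.append(current)
        # sleep between polls so the loop does not busy-wait
        time.sleep(poll_interval)
    return urls
```

With Selenium, one would probably call it as `record_until_error(lambda: driver.current_url, stop_on=(WebDriverException,))`, with `WebDriverException` imported from `selenium.common.exceptions`. Which exception is actually raised when the browser closes can vary by driver and version, so treat that choice as an assumption to verify.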