2016-03-01 171 views
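Python - Selenium: Finding AngularJS elements with a loop over find_elements_by()

I am scraping real estate data. On sites generated with JavaScript, Selenium has done a great job. With

driver.find_elements_by... 

I find all the relevant information once it has loaded and loop over the tags. On this site, however, the listings are generated by AngularJS. I tried the same approach: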

for article in driver.find_elements_by_css_selector("div.property.ng-scope"): 
    ...  # process each listing here

I figured out that I have to make my webdriver (PhantomJS) click the link that leads to the individual listing page:

linkbase = article.find_element_by_css_selector("div.info.clear.ng-scope") 
link = linkbase.find_element_by_tag_name('a') 
link.click() 

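The webdriver is then simply pointed at that page, where I can get all the information I want for one listing.

As soon as the first pass through the loop is over, I get the following error: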

> Message: {"errorMessage":"Element does not exist in cache","request":{"headers": 
{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close"," 
Content-Length":"142","Content-Type":"application/json;charset=UTF-8","Host":"12 
7.0.0.1:56577","User-Agent":"Python-urllib/3.4"},"httpVersion":"1.1","method":"P 
OST","post":"{\"sessionId\": \"f9ec2c10-dfd9-11e5-9d4c-3bbe8f5bf7c0\", \"using\" 
: \"css selector\", \"id\": \":wdc:1456856343349\", \"value\": \"div.info.clear. 
ng-scope\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"elemen 
t","directory":"/","path":"/element","relative":"/element","port":"","host":""," 
password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/ele 
ment","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/f9ec2c10-dfd9- 
11e5-9d4c-3bbe8f5bf7c0/element/:wdc:1456856343349/element"}} 

The element that contains the link on the page is:

<a ng-href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532" ng-click="beforeOpen(i.iterator, i.regionTip)" class="title" href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532"> 
<span class="name ng-binding"> ... </a> 

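This is simply the title text of each listing. I did set the user agent following this answer, even though it does not show up in the error. I also wait for the surrounding element to load beforehand: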

wait = WebDriverWait(driver, getSearchResults_CZ.waiting) 
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.content"))) 

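What I want is to parse all of these property elements, save the links to the listings in a list, and then loop over that list, opening each link with driver.get().

I know that clicking a link changes the driver's URL, but I assumed that once the list of articles had been built with find_elements_by, it would serve as a stable reference point. Accessing the link by searching for the 'a' tag and calling get_attribute('href') did not work in this case with the AngularJS framework. What am I missing?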

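Edit: As answered, get_attribute without .click() is the right way to go. My original mistake was in the CSS selector: I had been using "div[class^='property']" and got completely different links. It must have matched another element that I had not noticed before.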
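To illustrate why the two selectors can return different elements, here is a minimal sketch that emulates their matching rules in plain Python, without a browser. `.property` matches a whole class token, while `[class^='property']` matches any element whose raw class attribute string merely starts with "property". The helper names and the sample value "property-list" are hypothetical and only serve the illustration; the other class strings are taken from the selectors in the question.

```python
def matches_class_token(class_attr, token):
    """Emulate the CSS class selector `.token` (whitespace-separated token match)."""
    return token in class_attr.split()


def matches_attr_prefix(class_attr, prefix):
    """Emulate `[class^='prefix']` (prefix match on the raw attribute string)."""
    return class_attr.startswith(prefix)


# Class attribute values; "property-list" is a made-up example of a sibling element.
samples = [
    "property ng-scope",
    "property-list",
    "ng-scope property",
    "info clear ng-scope",
]

by_token = [s for s in samples if matches_class_token(s, "property")]
by_prefix = [s for s in samples if matches_attr_prefix(s, "property")]

print("div.property            ->", by_token)
print("div[class^='property']  ->", by_prefix)
```

The prefix form both picks up unrelated elements whose class happens to begin with the same letters and misses elements where "property" is not the first class listed.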

Answer

Wait until at least one "property" element is visible, then grab the links:

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 

driver = webdriver.Firefox() 
driver.get("http://www.sreality.cz/hledani/prodej/domy?region=jemnice") 
driver.maximize_window() 

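# Wait up to 10 seconds for the first listing to be rendered
wait = WebDriverWait(driver, 10) 
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "property"))) 

# Read the href values right away instead of clicking, so no element reference is reused after navigation
links = [link.get_attribute("href") for link in driver.find_elements_by_css_selector("div.property div.info a")] 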
print(links) 

driver.close() 

Works for me.
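The approach the question describes (store the links first, then open each one with driver.get()) could be sketched roughly as below. This is an assumption-laden outline, not the answerer's code: `collect_hrefs`, `visit_each` and the `scrape` callback are hypothetical names, and the functions only rely on the driver offering `find_elements(by, value)`, `get(url)` and elements offering `get_attribute(name)`, as Selenium's WebDriver does. The string "css selector" is the value of `By.CSS_SELECTOR`.

```python
def collect_hrefs(driver, selector="div.property div.info a"):
    """Read every href up front, while the element references are still valid."""
    anchors = driver.find_elements("css selector", selector)
    seen = set()
    hrefs = []
    for anchor in anchors:
        href = anchor.get_attribute("href")
        # Skip anchors without an href and repeated links, keeping page order
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


def visit_each(driver, hrefs, scrape):
    """Open each stored URL and collect whatever the (hypothetical) scrape callback returns."""
    results = []
    for href in hrefs:
        # Only plain strings are used here, so navigating cannot invalidate anything
        driver.get(href)
        results.append(scrape(driver))
    return results
```

With a real driver this might be called as `visit_each(driver, collect_hrefs(driver), lambda d: d.title)`. Because the loop iterates over strings rather than element objects, leaving the results page no longer breaks it.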

As it does for me... not clicking is the right way to go. Otherwise Selenium loses the web objects it is supposed to loop over. – Thanados