2017-04-05 127 views
0

AttributeError: 'str' object has no attribute 'descendants'

I am trying to scrape a specific snippet of a website. I am hoping to get:

<div class="inhoudsindicatie"><p><span class="hl0 highlightColor0">HR</span>: art. 81RO.</p></div> 

In particular, the "art. 81RO" part.

from selenium import webdriver 
from bs4 import BeautifulSoup as soup 
driver.get('http://uitspraken.rechtspraak.nl/inziendocument?id=ECLI:NL:HR:2014:3004&showbutton=true&keyword=HR%3a') 
page=soup(driver.page_source, "html.parser") 
details=soup.findAll("span",{"class":"hl0 highlightColor0"}) 

It returns:

AttributeError: 'str' object has no attribute 'descendants' 

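What does this tell me about my code? I have read general information about descendants, and I am fairly sure I don't understand it.

(My main interest is in understanding the problem; solving it is secondary, though of course highly appreciated.)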

+1

You forgot to define your driver, e.g. 'driver = webdriver.Firefox()'

+0

Why don't you try it with urllib2 or requests?

Answers

1

This worked for me:

import time 
from selenium import webdriver 
from bs4 import BeautifulSoup as soup 
driver = webdriver.Chrome("/path/to/chromedriver") 
driver.get('http://uitspraken.rechtspraak.nl/inziendocument?id=ECLI:NL:HR:2014:3004&showbutton=true&keyword=HR%3a') 
time.sleep(5) 
page = soup(driver.page_source, "html.parser") 
details = page.select_one("span.hl0.highlightColor0").find_parent().get_text() 
print(details) 
driver.quit() 

# output: HR: art. 81RO. 
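
As for what the error means: the question's code calls findAll on soup, which is the BeautifulSoup class itself (imported under that alias), rather than on page, the parsed document. Python then binds the first argument, the string "span", as self, and the method fails as soon as it reads self.descendants. The sketch below reproduces that mechanism with a toy class. ToyTree and its tag list are hypothetical stand-ins for illustration, not BeautifulSoup's actual implementation.

```python
class ToyTree:
    """Hypothetical stand-in for a parsed document (not BeautifulSoup's real code)."""

    def __init__(self, tags):
        self.tags = tags

    @property
    def descendants(self):
        # Iterate over everything contained in this tree.
        return iter(self.tags)

    def find_all(self, name, attrs=None):
        # Like any method, this reads attributes from `self`.
        return [tag for tag in self.descendants if tag == name]


page = ToyTree(["div", "p", "span"])

# Correct: called on the instance, so `self` is `page`.
found = page.find_all("span")
print(found)

# The question's mistake: called on the class, so the string "span"
# is bound to `self` and has no `descendants` attribute.
try:
    ToyTree.find_all("span", {"class": "hl0 highlightColor0"})
except AttributeError as exc:
    message = str(exc)
    print(message)
```

In the question's code the equivalent change would be page.findAll(...) instead of soup.findAll(...), in addition to defining driver as the comment points out.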

But since you are using Selenium anyway, why not stick with it?

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.support.ui import WebDriverWait 

driver = webdriver.Chrome("/path/to/chromedriver") 
driver.get('http://uitspraken.rechtspraak.nl/inziendocument?id=ECLI:NL:HR:2014:3004&showbutton=true&keyword=HR%3a') 
wait = WebDriverWait(driver, 10) 
xpath = "//p/span[contains(@class, 'highlightColor0') and contains(@class, 'hl0')]/.." 
details = wait.until(EC.visibility_of_element_located((By.XPATH, xpath))) 
print(details.text) 
driver.quit() 

# output: HR: art. 81RO. 
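
The "find the span, step up to its parent, read the parent's text" logic of that XPath can be tried offline on the HTML fragment from the question. This is only a sketch using the standard library's xml.etree.ElementTree, which assumes the fragment is well-formed XML and supports just a subset of XPath: contains() is not available, so an exact match on the class attribute is used instead.

```python
import xml.etree.ElementTree as ET

# The fragment quoted in the question (well-formed, so it parses as XML).
fragment = (
    '<div class="inhoudsindicatie"><p>'
    '<span class="hl0 highlightColor0">HR</span>: art. 81RO.'
    '</p></div>'
)

root = ET.fromstring(fragment)

# Select the highlighted span, then step up to its parent with "..".
parent = root.find(".//span[@class='hl0 highlightColor0']/..")

# Join all text inside the parent, including the span's own text.
details = "".join(parent.itertext())
print(details)  # HR: art. 81RO.
```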

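If you don't want the 'HR: ' part, you can remove it from the extracted string (details in the first snippet, details.text in the second):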

details.split('HR: ')[1] 

# output: art. 81RO.
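
Note that indexing the result of split with [1] raises an IndexError when the text does not contain 'HR: '. A slightly more defensive variant could look like the sketch below; strip_prefix is a hypothetical helper name, and it expects a plain string.

```python
def strip_prefix(text, prefix="HR: "):
    """Return text without the leading prefix, or unchanged if the prefix is absent."""
    text = text.strip()
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


print(strip_prefix("HR: art. 81RO."))  # art. 81RO.
print(strip_prefix("art. 81RO."))      # art. 81RO.
```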