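I am using this code to scrape some data from the link https://website.grader.com/results/www.dubizzle.com. The content of the tag I want to extract is filled in by a script roughly 15 seconds after the page loads, so someone suggested that I use Selenium to introduce a delay in the code. With this code, which uses Python, Beautiful Soup and Selenium for the extraction, I get the error "'Service' object has no attribute 'process'".
The code is as follows: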
#!/usr/bin/python
import urllib
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
from dateutil.parser import parse
from datetime import timedelta
import MySQLdb
import re
import pdb
import sys
import string

# Open the page in Firefox and give its scripts time to render the results
driver = webdriver.Firefox()
driver.get('https://website.grader.com/results/dubizzle.com')
time.sleep(25)

# Parse the rendered HTML, naming the parser explicitly
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
# print soup

Sizeofweb = ""
try:
    # .text already returns the tag's text, so get_text() must not be called on it again
    Sizeofweb = soup.find('span', {'data-reactid': ".0.0.3.0.0.3.$0.1.1.0"}).text
    print Sizeofweb.encode("utf-8")
except StandardError as e:
    converted_date = "Error was {0}".format(e)
    print converted_date
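A fixed `time.sleep(25)` always waits the full 25 seconds, and still fails if the page happens to take longer. As a rough illustration of the polling idea behind an explicit wait, here is a minimal sketch; `wait_until` and `fake_page_ready` are hypothetical names made up for this example and are not part of Selenium. Selenium's own `WebDriverWait(driver, timeout).until(...)`, which the code above imports but never uses, works along similar lines.

```python
import time


def wait_until(condition, timeout=25.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Stand-in for "has the page finished rendering?": becomes true on the third check.
calls = {"n": 0}


def fake_page_ready():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until(fake_page_ready, timeout=2.0, poll_interval=0.01)
```

With a real driver the condition could be something like `lambda: 'result-value' in driver.page_source`, assuming that class name only shows up once the results have been rendered.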
The part of the HTML that I am trying to extract is shown below.
Screenshot: https://www.dropbox.com/s/7dwbaiyizwa36m6/5.PNG?dl=0
<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>
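The `data-reactid` values are generated by the page and may change between loads, so matching on the `result-value` class is probably more robust. The sketch below shows that lookup on the snippet above. It uses only the Python 3 standard library so that it runs without extra packages; `ResultValueParser` and `SNIPPET` are names invented for this example, and it assumes the `result-value` div contains no nested divs. With Beautiful Soup the same idea would be `soup.find('div', class_='result-value')` followed by a search for the `span` tags inside it.

```python
from html.parser import HTMLParser

# The fragment quoted in the question, copied verbatim.
SNIPPET = """
<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>
"""


class ResultValueParser(HTMLParser):
    """Collect the text of every <span> inside <div class="result-value">."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_result_div = False
        self.in_span = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "result-value" in classes:
            self.in_result_div = True
        elif tag == "span" and self.in_result_div:
            # Start a new text buffer for this span
            self.in_span = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False
        elif tag == "div":
            self.in_result_div = False

    def handle_data(self, data):
        if self.in_span:
            self.values[-1] += data


parser = ResultValueParser()
parser.feed(SNIPPET)
size, unit = [value.strip() for value in parser.values]
print(size, unit)
```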
The error I get is:
Traceback (most recent call last):
  File "ahmed.py", line 20, in <module>
    driver = webdriver.Firefox()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 140, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 81, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x7f65a1ccbe10>> ignored
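The `WebDriverException` in the traceback says that the `geckodriver` executable could not be found on `PATH`. The `AttributeError` about `Service.process` appears to be a follow-on from that failed start, raised while the half-initialised `Service` object is cleaned up. A quick way to check the first point is to search `PATH` by hand, as in the sketch below; `find_on_path` is a hypothetical helper written for this example, and it assumes a Unix-like system such as the one in the traceback.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of an executable called `name`, or None if absent."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


location = find_on_path("geckodriver")
if location is None:
    print("geckodriver was not found on PATH")
else:
    print("geckodriver found at " + location)
```

If the helper reports that nothing was found, placing the geckodriver binary in a directory that is already listed in `PATH`, or adding its directory to `PATH`, should address the first exception.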
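Hi _info_; I edited your question to improve readability, for example by fixing some typos. Please remember that on this site we encourage you to [edit] and re-edit your question so that it is as clear and useful as possible; this helps you get answers and helps others who run into similar problems. –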