2017-10-12

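I am using BeautifulSoup and Python for web scraping for the first time. The page I am trying to scrape is: http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172 Why am I getting the field itself instead of the field's value?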

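# Assumed imports: the question does not show how `request` and `soup` were aliased
from urllib.request import urlopen as request
from bs4 import BeautifulSoup as soup

client = request('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
page_html = client.read()
client.close()
page_soup = soup(page_html, 'html.parser')  # name the parser explicitly to avoid a bs4 warning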

identification = page_soup.find('div', {'data-bind':'text: name'}) 
print(identification.text) 

When I do this, I just get an empty string. If I print the identification variable itself, I get:

<div class="col-xs-7" data-bind="text: name"></div> 

This is the line of HTML whose value I am trying to get. When I look at the page in the browser, this tag contains the value A LEBLANC, but in the output above it is empty.

This is an Ajax-driven website; all of the data is loaded by JavaScript.
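To make the comment above concrete, here is a minimal sketch of why the parsed tag has no text. It uses only the standard library's html.parser rather than BeautifulSoup, so that it runs without extra packages. The `raw` string is the tag printed in the question; the `rendered` string is an assumption about what the same tag looks like after the page's JavaScript has filled it in. `DataBindText` and `bound_text` are hypothetical helper names.

```python
from html.parser import HTMLParser


class DataBindText(HTMLParser):
    """Collect the text inside every element that has a data-bind attribute.

    Sketch only: assumes simple markup without void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.texts = {}   # data-bind value -> text found inside that element
        self._open = []   # one entry per open tag: its data-bind value or None

    def handle_starttag(self, tag, attrs):
        binding = dict(attrs).get("data-bind")
        self._open.append(binding)
        if binding is not None:
            self.texts.setdefault(binding, "")

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Credit the text to every enclosing element that has a binding.
        for binding in self._open:
            if binding is not None:
                self.texts[binding] += data


def bound_text(html):
    """Return {data-bind value: text} for the given HTML string."""
    parser = DataBindText()
    parser.feed(html)
    parser.close()
    return parser.texts


# What a plain HTTP request receives (the tag printed in the question).
raw = '<div class="col-xs-7" data-bind="text: name"></div>'
# Assumed state of the same tag after the browser has run the page's scripts.
rendered = '<div class="col-xs-7" data-bind="text: name">A LEBLANC</div>'

print(bound_text(raw))       # {'text: name': ''}
print(bound_text(rendered))  # {'text: name': 'A LEBLANC'}
```

A plain HTTP request only ever sees the first form, which is why a tool that runs the page's JavaScript, such as a browser driven by Selenium, is needed to read the value.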

Answers

You can try this code:

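from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')

# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...) instead
find = driver.find_element(By.XPATH, '//*[@id="identificationCollapse"]/div/div/div/div[1]/div[1]/div[2]')

print(find.text)
driver.quit()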

Output:

A LEBLANC 
Here is how you find it :) https://pasteboard.co/GOCOeBP.png
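Because the value is filled in by JavaScript after the page loads, reading the element immediately can occasionally return an empty string. A hedged sketch of one way around that is an explicit wait, shown below with Selenium 4's `WebDriverWait`. The helper names `vessel_url` and `fetch_vessel_name`, the 15-second timeout, and the CSS selector are my own assumptions, not part of the answer above.

```python
BASE_URL = "http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html"


def vessel_url(vessel_id):
    """Build the details-page URL for a vessel id (hypothetical helper)."""
    return f"{BASE_URL}?vesselid={vessel_id}"


def fetch_vessel_name(vessel_id, timeout=15):
    """Open the page in Chrome and wait until the name field has text."""
    # Imported inside the function so that vessel_url() stays usable
    # on machines where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumed selector: the first div bound to "text: name", as in the question.
    locator = (By.CSS_SELECTOR, "div[data-bind='text: name']")

    driver = webdriver.Chrome()
    try:
        driver.get(vessel_url(vessel_id))
        # until() polls until the callable returns a truthy value and then
        # returns that value; a missing element is retried, not raised.
        return WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(*locator).text.strip()
        )
    finally:
        driver.quit()  # always close the browser, even on timeout


# Usage (needs Chrome and a matching driver available):
# print(fetch_vessel_name(34172))
```

If the field never receives text within the timeout, `until()` raises `TimeoutException`, which is usually easier to diagnose than a silently empty result.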

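There are several ways to achieve the same goal. In my script I used a CSS selector, which is easy to understand and is unlikely to break unless the site's HTML structure changes significantly. Try this.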

from selenium import webdriver 
from bs4 import BeautifulSoup 

driver = webdriver.Chrome() 
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172') 
soup = BeautifulSoup(driver.page_source,"lxml") 
driver.quit() 
item_name = soup.select("[data-bind$='name']")[0].text 
print(item_name) 

Result:

A LEBLANC 

By the way, the approach you started with will also work:

from selenium import webdriver 
from bs4 import BeautifulSoup 

driver = webdriver.Chrome() 
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172') 
soup = BeautifulSoup(driver.page_source,"lxml") 
driver.quit() 
item_name = soup.find('div', {'data-bind':'text: name'}).text 
print(item_name)