BeautifulSoup not working for web scraping?

I want to scrape some data from a website. The page is HTML, and I want to extract the text "No description for 632930413867".

HTML code:

<div class="col-xs-6 col-sm-6 col-md-6 col-lg-6"> 
    <table class="table product_info_table"> 
    <tbody> 
     <tr> 
     <td>GS1 Address</td> 
     <td>R.R. 1, Box 2, Malmo, NE 68040</td> 
     </tr> 
     <tr> 
     <td>Description</td> 
     <td> 
      <div id="read_desc"> 
      No description for 632930413867 
      </div> 
     </td> 
     </tr> 
    </tbody> 
    </table> 
</div>

I also want the image src from the same site:

<div class="centered_image header_image"> 
<img src="https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg" title="UPC 632930413867" alt="UPC 632930413867">

So I used this code:

Baseurl = "https://www.buycott.com/upc/632930413867" 
uClient = '' 
while uClient == '': 
    try: 
     uClient = requests.get(Baseurl) 
     print("Relax we are getting the data...") 

    except: 
     print("Connection refused by the server..") 
     print("Let me sleep for 7 seconds") 
     time.sleep(7) 
     print("Was a nice sleep, now let me continue...") 
     continue 


page_html = uClient.content 

uClient.close() 
page_soup = soup(page_html, "html.parser") 

Productcontainer = page_soup.find_all("div", {"class": "row"}) 
link = page_soup.find(itemprop="image") 

print(Productcontainer) 

for item in Productcontainer: 
    print(link) 
    productdescription = Productcontainer.find("div", {"class": "product_info_table"}) 
    print(productdescription)

When I run this code, no data is displayed. How can I get the description and the img src?

2017-11-25 learner101

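There is only one instance of each item (the image and the product description) on the page, so you can go to them directly with find(); there is no need to use find_all() in this case: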

import time

import requests
from bs4 import BeautifulSoup as soup

Baseurl = "https://www.buycott.com/upc/632930413867"
uClient = ''
while uClient == '':
    try:
        uClient = requests.get(Baseurl)
        print("Relax we are getting the data...")
    except requests.exceptions.ConnectionError:
        # The connection failed: wait a little, then try again.
        print("Connection refused by the server..")
        print("Let me sleep for 7 seconds")
        time.sleep(7)
        print("Was a nice sleep, now let me continue...")
        continue

page_html = uClient.content
uClient.close()

page_soup = soup(page_html, "html.parser")
# Each element occurs once on the page, so find() is enough.
productdescription = page_soup.find("div", {"id": "read_desc"}).text
link = page_soup.find("div", {"class": "centered_image header_image"}).find("img")['src']
print(productdescription)
print(link)

Output:

Relax we are getting the data... 

No description for 632930413867 

https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg
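
To illustrate why find() fits better than find_all() here, the following is a minimal offline sketch. It assumes a trimmed copy of the markup from the question instead of the live page, and the variable names are only illustrative. It shows two things that may have tripped up the original loop: find_all() returns a list-like ResultSet that has no find() of its own, and product_info_table is a class on the table element, not on a div.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the live page
# has the same structure); no network access is needed for this sketch.
html = """
<div class="centered_image header_image">
  <img src="https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg" alt="UPC 632930413867">
</div>
<table class="table product_info_table">
  <tr><td>Description</td>
      <td><div id="read_desc"> No description for 632930413867 </div></td></tr>
</table>
"""

page_soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of tags), so .find() on it fails.
containers = page_soup.find_all("div", {"class": "centered_image"})
try:
    containers.find("img")
    error = None
except AttributeError as exc:
    error = exc

# The class sits on <table>, so searching for a <div> finds nothing.
wrong_tag = page_soup.find("div", {"class": "product_info_table"})
right_tag = page_soup.find("table", {"class": "product_info_table"})

# find() returns a single Tag, which can be queried further.
description = page_soup.find("div", {"id": "read_desc"}).text.strip()
src = page_soup.find("div", {"class": "centered_image"}).find("img")["src"]

print(type(error).__name__)
print(wrong_tag)
print(description)
print(src)
```

If you do need find_all(), call find() on each item inside the loop, not on the ResultSet itself.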

2017-11-25 15:35:06

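If you could not find anything about using BeautifulSoup by searching Google, there is a good tutorial here:
https://www.dataquest.io/blog/web-scraping-tutorial-python/
From your question, I get the impression that you are a beginner at this. If that is not the case, please edit the question and state what exactly you do not understand.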

2017-11-25 15:29:37 Ehsan

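You just need to inspect the HTML and identify the tags that hold the data you want to scrape.
In this case, the image is at div.centered_image.header_image img, and the description is at div#read_desc.
An example with bs4 CSS selectors: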

import requests 
from bs4 import BeautifulSoup 

baseurl = "https://www.buycott.com/upc/632930413867" 
page_html = requests.get(baseurl).content 
soup = BeautifulSoup(page_html, "html.parser") 
image = soup.select_one('div.centered_image.header_image img')['src'] 
description = soup.select_one('div#read_desc').text.strip() 

print(image) 
print(description)

https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg
No description for 632930413867
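
One thing to watch with select_one(): it returns None when nothing matches, so indexing or calling .text on the result fails if the page layout differs from what you expect. A minimal sketch of a guard, assuming a trimmed copy of the question's markup; the helper name first_text and the selector div#no_such_id are hypothetical and only used for illustration:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; no network access needed.
html = """
<div class="centered_image header_image">
  <img src="https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg">
</div>
<div id="read_desc"> No description for 632930413867 </div>
"""
soup = BeautifulSoup(html, "html.parser")


def first_text(soup, selector, default=None):
    """Return the stripped text of the first match, or `default` if none."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


description = first_text(soup, "div#read_desc")
missing = first_text(soup, "div#no_such_id", default="")

# Same idea for an attribute: check for None before reading "src".
img = soup.select_one("div.centered_image.header_image img")
src = img.get("src") if img is not None else None

print(description)
print(repr(missing))
print(src)
```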

2017-11-25 15:38:23

This can be done like this as well:

import requests 
from bs4 import BeautifulSoup 

soup = BeautifulSoup(requests.get("https://www.buycott.com/upc/632930413867").text, "lxml") 
desc = soup.select("#read_desc")[0].text.strip() 
link = soup.select(".centered_image img")[0]['src'].strip() 
print("{}\n{}".format(desc,link))

Output:

No description for 632930413867 
https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg
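
Note that the "lxml" parser used above is a separate package that has to be installed. A minimal sketch of falling back to the built-in "html.parser" when lxml is not available; the helper name make_soup is hypothetical, and the markup is a trimmed copy of the question's HTML instead of the live page:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml if it is installed, otherwise use html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser is missing.
        return BeautifulSoup(markup, "html.parser")


html = """
<div class="centered_image header_image">
  <img src="https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg">
</div>
<div id="read_desc"> No description for 632930413867 </div>
"""

soup = make_soup(html)
desc = soup.select("#read_desc")[0].text.strip()
link = soup.select(".centered_image img")[0]["src"].strip()
print("{}\n{}".format(desc, link))
```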

2017-11-26 19:09:12 SIM
