Getting the value of a tag attribute with BS4

I'm using BS4 with Python 3 to scrape product details from Amazon search results. Here is my code:

from bs4 import BeautifulSoup as BS 
import requests 

html = requests.get('http://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=hp+monitors') 

soup = BS(html.text , 'lxml') 
#print(soup.prettify()) 

for i in soup.find_all('li') : 
    print(i.get('id')) 
    h2_tag = i.h2 
    print(h2_tag.get('data-attribute')) 
    print("_____")

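With this code I don't get the value of the data-attribute attribute of the h2 tag, although the value of the id attribute of the li tag does come out. Can anyone tell me what mistake I'm making?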
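Two things in that loop can hide the value. Many <li> elements on a page contain no <h2> at all, so i.h2 is None and calling .get on it raises AttributeError; and Tag.get returns None (rather than failing) when the tag simply lacks that attribute. The sketch below reproduces both cases on small hypothetical markup; the HTML, ids and titles are invented for illustration and are not Amazon's real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the search page:
# the second <li> has no <h2>, the third has an <h2> without data-attribute.
SAMPLE = """
<ul>
  <li id="result_0"><h2 data-attribute="HP 22es Monitor">HP 22es Monitor</h2></li>
  <li id="nav-item"><a href="#">Not a product</a></li>
  <li id="result_1"><h2>Dell S2216H</h2></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

results = []
for li in soup.find_all("li"):
    h2 = li.h2  # first <h2> inside this <li>, or None if there is none
    if h2 is None:
        # Calling h2.get(...) here would raise AttributeError
        results.append((li.get("id"), None))
        continue
    # Tag.get returns None when the attribute is absent
    results.append((li.get("id"), h2.get("data-attribute")))

for row in results:
    print(row)
```

Checking for None before reading the attribute lets the loop skip non-product <li> elements instead of crashing on them.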


2017-06-14 BoRRis

A few things to note here:

Use html.content instead of html.text, as recommended here.
You don't need lxml here; the built-in html.parser should be fine.
You don't need the data-attribute attribute: h2.text gives you the text inside the h2.
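The points above can be checked offline on a single hypothetical <h2> (invented markup, not Amazon's). The input is passed as bytes, the way html.content arrives from requests, and parsed with html.parser; the sketch then compares reading the attribute with reading the tag's text. get_text(strip=True) is a standard BeautifulSoup method that also trims surrounding whitespace, which h2.text keeps.

```python
from bs4 import BeautifulSoup

# Bytes, like requests' Response.content; BeautifulSoup decodes them itself.
raw = b'<h2 class="s-inline" data-attribute="HP 22es Monitor">  HP 22es Monitor  </h2>'

soup = BeautifulSoup(raw, "html.parser")
h2 = soup.h2

by_attr = h2.get("data-attribute")   # attribute value, or None if missing
by_text = h2.text                    # text content, whitespace preserved
stripped = h2.get_text(strip=True)   # text content with whitespace trimmed

print(repr(by_attr))
print(repr(by_text))
print(repr(stripped))
```

When the title is also the tag's visible text, reading the text does not depend on the attribute being present at all.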

A simpler way to collect the product titles is to iterate directly over all <h2> tags with the s-inline class (the product titles):

from bs4 import BeautifulSoup 
import requests 

html = requests.get('http://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=hp+monitors') 
# Pass the raw bytes so BeautifulSoup can detect the encoding itself
soup = BeautifulSoup(html.content, 'html.parser') 

# Only the product-title <h2> tags carry the s-inline class
for h2 in soup.find_all('h2', class_='s-inline'): 
    print(h2.text)
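The same filter can also be written as a CSS selector with soup.select, which BeautifulSoup 4.7+ supports through its soupsieve dependency. The sketch below wraps it in a small helper; the function name product_titles and the sample markup are hypothetical, and it runs against inline bytes so it works without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for html.content from the search page (bytes).
SAMPLE = b"""
<h2>Sponsored</h2>
<h2 class="a-size-medium s-inline">HP 19KA 18.5-inch LED Backlit Monitor (Black)</h2>
<h2 class="a-size-medium s-inline">Dell S2216H 21.5-Inch Full HD LED Monitor</h2>
"""


def product_titles(markup):
    """Return the stripped text of every <h2> whose classes include s-inline."""
    soup = BeautifulSoup(markup, "html.parser")
    # CSS-selector equivalent of soup.find_all('h2', class_='s-inline')
    return [h2.get_text(strip=True) for h2 in soup.select("h2.s-inline")]


titles = product_titles(SAMPLE)
for title in titles:
    print(title)
```

Against the live page you would call product_titles(html.content); the unclassed "Sponsored" heading is skipped just as it is with find_all.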

Output

HP 24ES 23.8-HP 24ES 23.8-inch THINNEST LED Monitor (Black)LED Monitor (Black) 
HP 22es Display 54.6 cm, 21.5 Inch THINNEST IPS LED Backlit Monitor 
HP 22KD 21.5-inch FULL HD LED Backlit Monitor (Black 
HP 19KA 18.5-inch LED Backlit Monitor (Black) 
HP 27es 27 Inches Display IPS LED Backlit Monitor (Full HD) 
HP 21KD 20.7-inch FULL HD LED Backlit Monitor (Black) 
LG 24MP88HV-S 24"IPS Slim LCD Monitor 
Dell S Series S2415H 24-Inch Screen Full HD HDMI LED Monitor 
Dell E1916HV 18.5-inch LED Monitor (Black) 
HP 20KD 19.5-inch LED Backlit Monitor (Black) 
Dell S2216H 21.5-Inch Full HD LED Monitor 
HP V222 21.5" LED Widescreen Monitor (M1T37AA Black) 
AlexVyan®-Genuine Accessory with 1 year warranty:= (38.1CM) 15 Inch LCD Monitor for HP, Dell, Lenovo, Pc Desktop Computer Only (Black) 
Compaq B191 18.5-inch LED Backlit Monitor (Black) 
HP 20WD 19.45-Inch LED Backlit Monitor 
HP Compaq F191 G9F92AT 18.5-inch Monitor

Also, instead of formatting inline code in bold, use backticks like this:

`code` will render as code

Edit:

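Here, soup.find_all('h2') would return every h2 tag on the page, but Amazon's page also contains h2 elements that are not products. I noticed that all the products have the s-inline class, so soup.find_all('h2', class_='s-inline') returns only the products' h2 tags.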
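The keyword is spelled class_ because class is a reserved word in Python. Because class is a multi-valued HTML attribute, BeautifulSoup stores it as a list of tokens, and class_='s-inline' matches any tag where one of those tokens equals s-inline, even if the tag has other classes too. A minimal demonstration on hypothetical markup (the headings are invented):

```python
from bs4 import BeautifulSoup

SAMPLE = """
<h2 class="a-size-base">Related searches</h2>
<h2 class="a-size-medium s-inline">HP 20KD 19.5-inch LED Backlit Monitor (Black)</h2>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
all_h2 = soup.find_all("h2")
product_h2 = all_h2[1]

# "class" is multi-valued, so it is stored as a list of class names
print(product_h2["class"])

# class_="s-inline" matches when ANY of the tag's classes is "s-inline"
matches = soup.find_all("h2", class_="s-inline")
print(len(all_h2), len(matches))
print(matches[0].get_text())
```

The non-product heading is dropped only because none of its classes is s-inline; if Amazon's markup changes, the class name in the filter has to change with it.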


2017-06-23 12:26:24 TrakJohnson

Thank you. How does soup.find_all('h2', class_='s-inline') work? – BoRRis

@BoRRis I edited my answer to explain it – TrakJohnson
