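A few things to say here:
- Instead of using `html.text`, use `html.content`, as recommended here.
- Why use `lxml` here? `html.parser` should be fine.
- There is no need to use the `data-attribute` tag: you can use `h2.text` to get the text from the h2.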
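To illustrate the first point without hitting the network, here is a minimal sketch. The `raw` bytes are a made-up stand-in for what `html.content` would hold, and decoding them as Latin-1 only simulates a wrong encoding guess of the kind `html.text` can make when the server does not declare a charset. Given bytes, BeautifulSoup can use the `<meta charset>` declaration itself.

```python
from bs4 import BeautifulSoup

# Hypothetical page body, as raw UTF-8 bytes (what html.content would hold).
raw = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><h2>AlexVyan® Monitor</h2></body></html>'
).encode("utf-8")

# Passing bytes lets BeautifulSoup detect the encoding from the document.
from_bytes = BeautifulSoup(raw, "html.parser").h2.text

# Simulated wrong guess: the same bytes decoded as Latin-1 before parsing.
from_bad_text = BeautifulSoup(raw.decode("latin-1"), "html.parser").h2.text

print(from_bytes)
print(from_bad_text)
```

The second line printed contains a stray character before the ® sign, which is the kind of mojibake that passing bytes avoids.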
A simpler way to collect the product titles is to iterate directly over all the `<h2>` tags that have the `s-inline` class (the product titles):
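from bs4 import BeautifulSoup
import requests
# Fetch the search results page; html.content holds the raw response bytes.
html = requests.get('http://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=hp+monitors')
soup = BeautifulSoup(html.content, 'html.parser')
# Only the <h2> tags carrying the s-inline class are product titles.
for h2 in soup.find_all('h2', class_='s-inline'):
    print(h2.text)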
Output:
HP 24ES 23.8-inch THINNEST LED Monitor (Black)
HP 22es Display 54.6 cm, 21.5 Inch THINNEST IPS LED Backlit Monitor
HP 22KD 21.5-inch FULL HD LED Backlit Monitor (Black)
HP 19KA 18.5-inch LED Backlit Monitor (Black)
HP 27es 27 Inches Display IPS LED Backlit Monitor (Full HD)
HP 21KD 20.7-inch FULL HD LED Backlit Monitor (Black)
LG 24MP88HV-S 24"IPS Slim LCD Monitor
Dell S Series S2415H 24-Inch Screen Full HD HDMI LED Monitor
Dell E1916HV 18.5-inch LED Monitor (Black)
HP 20KD 19.5-inch LED Backlit Monitor (Black)
Dell S2216H 21.5-Inch Full HD LED Monitor
HP V222 21.5" LED Widescreen Monitor (M1T37AA Black)
AlexVyan®-Genuine Accessory with 1 year warranty:= (38.1CM) 15 Inch LCD Monitor for HP, Dell, Lenovo, Pc Desktop Computer Only (Black)
Compaq B191 18.5-inch LED Backlit Monitor (Black)
HP 20WD 19.45-Inch LED Backlit Monitor
HP Compaq F191 G9F92AT 18.5-inch Monitor
Also, instead of using bold for inline code, use backticks, like this:
`codecode` will render as codecode
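Edit:
Here, `soup.find_all('h2')` would get every h2 tag on the page, but Amazon's page also has h2 elements for things other than products. I noticed that all the products have the `s-inline` class, so `soup.find_all('h2', class_='s-inline')` only returns the h2 tags that belong to products.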
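A small offline sketch of that difference follows. The markup in `SAMPLE_HTML` is invented for illustration (it is not Amazon's real page structure, and the extra class names are assumptions). It shows that `class_='s-inline'` keeps only the h2 tags whose class list contains `s-inline`, even when they carry other classes as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a search results page.
SAMPLE_HTML = """
<div>
  <h2>Sponsored results</h2>
  <h2 class="s-inline s-access-title">HP 24ES 23.8-inch Monitor</h2>
  <h2 class="s-inline">Dell E1916HV 18.5-inch Monitor</h2>
  <h2 class="a-size-base">Related searches</h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No filter: every <h2> on the page, product or not.
all_h2 = [h2.text for h2 in soup.find_all("h2")]

# class_ filter: only <h2> tags that have s-inline among their classes.
product_h2 = [h2.text for h2 in soup.find_all("h2", class_="s-inline")]

print(len(all_h2))
print(product_h2)
```

The keyword is spelled `class_` with a trailing underscore because `class` is a reserved word in Python.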
Thank you. How does `soup.find_all('h2', class_='s-inline')` work? – BoRRis
@BoRRis I edited my answer to explain it – TrakJohnson