2016-11-08 48 views
0

I'm using beautifulsoup4 to parse a web page with the code below — how do I fix it so it finds each link once instead of twice? (BeautifulSoup, Python)

import requests
from bs4 import BeautifulSoup

#Collect links from 'new' page 
pageRequest = requests.get('http://www.supremenewyork.com/shop/all/shirts') 
soup = BeautifulSoup(pageRequest.content, "html.parser") 
links = soup.select("div.turbolink_scroller a") 

allProductInfo = soup.find_all("a", class_="name-link") 
print(allProductInfo) 

linksList1 = [] 
for href in allProductInfo: 
    linksList1.append(href.get('href')) 

print(linksList1) 

linksList1 prints every collected href value twice. I believe this is because it picks up the link from the title as well as from the item colour. I've tried a few things, but I can't get BS to parse only the title links so that the list has one entry per link instead of two. I imagine it's really simple, but I'm missing it. Thanks in advance.

+1

make linksList1 a set() instead of a list() –
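The suggestion above can be sketched in plain Python with dummy hrefs: a set stores each value at most once, so the duplicates collapse (note that a set does not preserve the page order):

```python
# Hypothetical duplicated hrefs, standing in for the scraped values.
linksList1 = ["/shop/shirts/abc", "/shop/shirts/abc", "/shop/shirts/def"]

# A set keeps only one copy of each value.
unique_links = set(linksList1)
print(len(unique_links))  # 2
```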

+0

Thank you very much – Harvey

Answers

0
alldiv = soup.find_all("div", {"class": "inner-article"}) 
linksList1 = [] 
for div in alldiv: 
    linksList1.append(div.h1.a['href'])  # only the title link inside each product's <h1> 
0

This code will give you the results without duplicates (also, using set() is probably a good idea, as @Tarum Gupta suggested), but I changed the way you crawl:

import requests 
from bs4 import BeautifulSoup 

#Collect links from 'new' page 
pageRequest = requests.get('http://www.supremenewyork.com/shop/all/shirts') 
soup = BeautifulSoup(pageRequest.content, "html.parser") 
links = soup.select("div.turbolink_scroller a") 

# Gets all divs with class of inner-article, then searches for an <a> with 
# the name-link class that is inside an h1 tag 
allProductInfo = soup.select("div.inner-article h1 a.name-link") 
# print (allProductInfo) 

linksList1 = [] 
for href in allProductInfo: 
    linksList1.append(href.get('href')) 

print(linksList1) 
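The selector above avoids duplicates because it only matches anchors nested inside an `<h1>`. The same filtering idea can be sketched with just the standard library's `html.parser`, using made-up sample markup that roughly mimics the product grid (the class names and paths are assumptions for illustration):

```python
from html.parser import HTMLParser

# Hypothetical markup: each product has a title link in <h1> and a
# duplicate colour link elsewhere in the same div.
SAMPLE = """
<div class="inner-article">
  <h1><a class="name-link" href="/shop/shirts/abc">Shirt</a></h1>
  <p><a class="name-link" href="/shop/shirts/abc">Red</a></p>
</div>
<div class="inner-article">
  <h1><a class="name-link" href="/shop/shirts/def">Tee</a></h1>
  <p><a class="name-link" href="/shop/shirts/def">Blue</a></p>
</div>
"""

class TitleLinkParser(HTMLParser):
    """Collect href values only from <a> tags nested inside an <h1>."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
        elif tag == "a" and self.in_h1:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

parser = TitleLinkParser()
parser.feed(SAMPLE)
print(parser.links)  # ['/shop/shirts/abc', '/shop/shirts/def']
```

Each product link appears exactly once because the colour links sit outside the `<h1>` and are skipped.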
0
uniqueLinks = set(linksList1)   # use set() to remove duplicate links 
linksList1 = list(uniqueLinks)  # use list() to convert back to a list if you need one 
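One caveat with `set()` is that it does not preserve the order the links appeared on the page. If order matters, `dict.fromkeys()` deduplicates while keeping insertion order (dict keys keep insertion order in Python 3.7+); the hrefs below are placeholders:

```python
# Placeholder hrefs with a duplicate, standing in for the scraped list.
linksList1 = ["/shop/b", "/shop/a", "/shop/b", "/shop/c"]

# Dict keys are unique, and insertion order is preserved.
deduped = list(dict.fromkeys(linksList1))
print(deduped)  # ['/shop/b', '/shop/a', '/shop/c']
```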