2016-09-19

Nested for loop with unequal entities

I want to scrape the content of websites that share a structure similar to this one:

https://www.wellstar.org/locations/pages/default.aspx

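Using the site above as a template, I want to extract the name of each location and the heading that location is listed under. I would like to be able to produce the following:

WellStar Hospitals

WELLSTAR ATLANTA MEDICAL CENTER

WellStar Hospitals

WELLSTAR ATLANTA MEDICAL CENTER SOUTH

...

WellStar Health Parks

ACWORTH HEALTH PARK

...

So far I have tried a nested for loop: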

for type in soup.find_all("h3", class_="WebFont SpotBodyGreen"):
    for name in soup.find_all("div", class_="PurpleBackgroundHeading"):
        print(type.text, name.text)

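The for loop above returns duplicates, because every name is paired with every type regardless of how they are grouped on the website. Any help, whether in the form of code and/or recommended resources for tackling this task, would be greatly appreciated.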

Answer

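You need a way to group the locations by heading. To do that, iterate over each block separately and collect the headings and locations into a dictionary: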

from pprint import pprint

import requests
from bs4 import BeautifulSoup

url = "https://www.wellstar.org/locations/pages/default.aspx"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Map each section heading to the list of location names in that section
d = {}
for row in soup.select(".WS_Content > .WS_LeftContent > table > tr"):
    # Each table row holds one heading (h3) followed by its locations
    title = row.h3.get_text(strip=True)

    # Search inside this row only, so names stay paired with their own heading
    d[title] = [item.get_text(strip=True) for item in row.select(".PurpleBackgroundHeading a")]

pprint(d)

Prints (pretty-printed with pprint()):

{'WellStar Community Hospice': ['Tranquility at Cobb Hospital',
                                'Tranquility at Kennesaw Mountain'],
 'WellStar Health Parks': ['Acworth Health Park', 'East Cobb Health Park'],
 'WellStar Hospitals': ['WellStar Atlanta Medical Center',
                        'WellStar Atlanta Medical Center South',
                        'WellStar Cobb Hospital',
                        'WellStar Douglas Hospital',
                        'WellStar Kennestone Hospital',
                        'WellStar North Fulton Hospital',
                        'WellStar Paulding Hospital',
                        'WellStar Spalding Regional Hospital',
                        'WellStar Sylvan Grove Hospital',
                        'WellStar West Georgia Medical Center',
                        'WellStar Windy Hill Hospital'],
 'WellStar Urgent Care Centers': ['WellStar Urgent Care in Acworth',
                                  'WellStar Urgent Care in Kennesaw',
                                  'WellStar Urgent Care in Marietta - Delk '
                                  'Road',
                                  'WellStar Urgent Care in Marietta - East '
                                  'Cobb',
                                  'WellStar Urgent Care in Marietta - '
                                  'Kennestone',
                                  'WellStar Urgent Care in Marietta - Sandy '
                                  'Plains Road',
                                  'WellStar Urgent Care in Smyrna',
                                  'WellStar Urgent Care in Woodstock']}
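
To get back to the flat layout asked for in the question (heading, then location, repeated for every location), the grouped dictionary can be walked with a nested loop whose inner loop only visits that heading's own list. This is a minimal sketch, assuming `d` has the shape printed above; the sample is cut down to two groups, and `iter_pairs` is a hypothetical helper name, not part of any library.

```python
# Sample data copied from the printed dictionary (only two groups kept).
d = {
    "WellStar Health Parks": ["Acworth Health Park", "East Cobb Health Park"],
    "WellStar Community Hospice": [
        "Tranquility at Cobb Hospital",
        "Tranquility at Kennesaw Mountain",
    ],
}


def iter_pairs(grouped):
    """Yield one (heading, location) pair per location.

    Hypothetical helper: the inner loop iterates over the names stored
    under the current heading only, so no cross-pairing can occur.
    """
    for title, names in grouped.items():
        for name in names:
            yield title, name


pairs = list(iter_pairs(d))

# Print the heading followed by the location, once per location.
for title, name in pairs:
    print(title)
    print(name)
```

The difference from the loop in the question is that the inner iteration is scoped to one heading's locations instead of every location on the page.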

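Can you explain what is happening in the line d[title] = [item.get_text(strip=True) for item in row.select(".PurpleBackgroundHeading a")]? I suspect this is where you add the values for the title keys of the dictionary? If so, how would I go about adding another value for each key? For example, how would I add the address of each location to the dictionary? – Daniel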

@Daniel Okay, if you need further help, please post it as a separate question! Thanks. – alecxe
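
On the follow-up in the comments: the line being asked about is a list comprehension, so it builds the complete list of names for one heading and stores that list under `d[title]`. One way to keep more than one value per location is to store a small dictionary per location instead of a bare string. The sketch below is only an illustration under assumptions: it runs against an inline `SAMPLE_HTML` snippet rather than the live page, and the `Address` class and the placeholder address text are hypothetical, because the thread does not show how addresses are marked up on the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: the "Address" class and the
# placeholder address text are assumptions, not taken from the real page.
SAMPLE_HTML = """
<table>
  <tr>
    <td>
      <h3>WellStar Health Parks</h3>
      <div class="PurpleBackgroundHeading"><a href="#">Acworth Health Park</a></div>
      <div class="Address">Example address 1</div>
      <div class="PurpleBackgroundHeading"><a href="#">East Cobb Health Park</a></div>
      <div class="Address">Example address 2</div>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

locations = {}
for row in soup.select("table > tr"):
    heading = row.h3
    if heading is None:
        continue  # skip rows that carry no section heading
    title = heading.get_text(strip=True)

    # Explicit-loop form of the list comprehension from the answer,
    # extended to record a second field for every location.
    entries = []
    for block in row.select(".PurpleBackgroundHeading"):
        link = block.find("a")
        if link is None:
            continue
        # Assumption: the address is the next sibling <div class="Address">.
        address_tag = block.find_next_sibling("div", class_="Address")
        entries.append({
            "name": link.get_text(strip=True),
            "address": address_tag.get_text(strip=True) if address_tag else None,
        })
    locations[title] = entries

print(locations)
```

Note that `find_next_sibling` returns the nearest following match, so if a location had no address of its own it would pick up the next location's address; the selector would need adjusting to the real markup.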