Get all links from a page with Beautiful Soup

I am using BeautifulSoup to get all the links on a page. My code is:

import requests 
from bs4 import BeautifulSoup 


url = 'http://www.acontecaeventos.com.br/marketing-promocional-sao-paulo' 
r = requests.get(url) 
html_content = r.text 
soup = BeautifulSoup(html_content, 'lxml') 

soup.find_all('href')

All I get is:

[]

How can I get a list of all the href links on that page?

2017-09-29 user1922364

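With find_all('href') you are searching for tags named href, not for the href attribute. Since the page contains no such tags, the result is an empty list.

You need to find the <a> tags, which are the elements used to represent links.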

links = soup.find_all('a')

Afterwards, you can access their href attribute like this:

link = links[0]   # get the first link in the entire page 
url = link['href']  # get value of the href attribute 
url = link.get('href') # or like this
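
As a rough illustration of the difference between tag names and attributes, here is a minimal sketch that runs against a small inline HTML snippet instead of the real page. The sample markup and the variable names are made up for this example, and it assumes the standard-library 'html.parser' rather than 'lxml' so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used instead of fetching the real page.
html = """
<p>
  <a href="/home">Home</a>
  <a name="top">Anchor without href</a>
  <a href="http://example.com/about">About</a>
</p>
"""

# 'html.parser' ships with Python; 'lxml' should work as well if installed.
soup = BeautifulSoup(html, "html.parser")

no_matches = soup.find_all("href")  # searches for <href> tags, so nothing matches
links = soup.find_all("a")          # finds every <a> tag, with or without href

first_href = links[0]["href"]       # value of the href attribute of the first link
missing = links[1].get("href")      # .get() returns None when the attribute is absent
# links[1]["href"] would raise KeyError instead of returning None
```

The two access styles differ only when the attribute is missing, so .get('href') is probably the safer choice when some <a> tags may have no href.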

2017-09-29 14:11:41 Anonta

But when I do this, I only get the first link: http://www.acontecaeventos.com.br/ Should I write a for loop to get them all? – user1922364

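links = soup.find_all('a') gives you a list of all the links. I only used the first link as an example in the code at the bottom of the answer. And yes, loop over the list of links to access every link that was found. – Anonta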
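
The loop suggested in the comment above could look roughly like this. It is only a sketch: the inline HTML is a made-up stand-in for the real page, the hrefs list is a name chosen for illustration, and 'html.parser' is assumed in place of 'lxml'.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
html = '<a href="/one">One</a><a>No href</a><a href="/two">Two</a>'
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for link in soup.find_all("a"):
    href = link.get("href")   # None when the <a> tag has no href attribute
    if href is not None:
        hrefs.append(href)
        print(href)
```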

Replace your last line:

links = soup.find_all('a')

with this line:

links = [a.get('href') for a in soup.find_all('a', href=True)]

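This collects every a tag that has an href attribute (that is what href=True filters for) and puts the value of each href into the links list.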

If you want to know more about the for loop between the [], read about list comprehensions.
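
Pages often contain relative links such as /some/path. If absolute URLs are wanted, one possible extension of the list comprehension is to pass each href through urljoin from the standard library. This is a sketch under assumptions: the inline HTML and the /contato path are invented for the example, and 'html.parser' is used in place of 'lxml'.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "http://www.acontecaeventos.com.br/marketing-promocional-sao-paulo"

# Hypothetical sample markup; the real page content is not reproduced here.
html = """
<a href="/contato">Contato</a>
<a href="http://www.acontecaeventos.com.br/">Home</a>
<a name="no-href">skipped</a>
"""
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that have an href attribute,
# so a["href"] cannot raise KeyError here.
links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
```

Links that are already absolute pass through urljoin unchanged, so the same comprehension handles both kinds.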

2017-10-03 14:27:43 wbwlkr
