Extracting the image title and image URL with BeautifulSoup

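I want to extract the image URL and image title from an article using BeautifulSoup. I can isolate the block containing the article's image URL and image title from the HTML before and after it, but I cannot figure out how to pull those two values out of the HTML tag. Here is my code: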

from bs4 import BeautifulSoup
import requests
url = 'http://www.prnewswire.com/news-releases/dutch-philosopher-koert-van-mensvoort-founder-of-the-next-nature-network-writes-a-letter-to-humanity-619925063.html'
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, 'lxml')
links = soup.find_all('div', {'class': 'image'})

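The two pieces I am trying to extract are the src= and title= attributes. Any ideas on how to parse out these two values would be greatly appreciated.

Source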

2017-04-20 Bill Orton

from bs4 import BeautifulSoup
import requests
url = 'http://www.prnewswire.com/news-releases/dutch-philosopher-koert-van-mensvoort-founder-of-the-next-nature-network-writes-a-letter-to-humanity-619925063.html'
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, 'lxml')
# Each matching <div class="image"> wraps the <img> tag we are after
links = soup.find_all('div', {'class': 'image'})
# Tag attributes are read like dictionary keys; print() is called as a function so this also runs on Python 3
print([i.find('img')['src'] for i in links])
print([i.find('img')['title'] for i in links])
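
A possible variation on this answer, in case some containers have no <img> or an image has no title attribute. This is only a sketch: the function name `extract_images` and the sample markup are hypothetical, and the built-in 'html.parser' is used so the example runs without the network or extra parser libraries.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real article page.
SAMPLE_HTML = """
<div class="image"><img src="http://example.com/a.jpg" title="First caption"></div>
<div class="image"><p>no image here</p></div>
<div class="image"><img src="http://example.com/b.jpg"></div>
"""

def extract_images(html, parser="html.parser"):
    """Return a list of (src, title) pairs, one per <img> found in a div.image."""
    soup = BeautifulSoup(html, parser)
    results = []
    for div in soup.find_all("div", {"class": "image"}):
        img = div.find("img")
        if img is None:
            continue  # this container holds no image
        # .get() returns None instead of raising KeyError for a missing attribute
        results.append((img.get("src"), img.get("title")))
    return results

print(extract_images(SAMPLE_HTML))
```

Keeping src and title together in one tuple avoids the two lists drifting out of step when an attribute is missing.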

Source

2017-04-20 18:03:20

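@Bill If this solved your problem, please accept the answer.

This works perfectly. Thank you very much.

The correct parser is 'html5lib' rather than 'lxml', which is for 'xml'.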
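
As a point of reference for the parser comment: the Beautiful Soup documentation lists 'lxml' and 'html5lib' as HTML parsers alongside the built-in 'html.parser', while 'xml' selects lxml's XML mode. The sketch below tries whichever of these HTML parsers are installed on one snippet. The helper name `src_per_parser` and the snippet are hypothetical, and the result depends on which parser libraries are present.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical snippet standing in for the article markup.
SNIPPET = '<div class="image"><img src="/photo.jpg" title="Caption"></div>'

def src_per_parser(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Map each installed parser name to the first image src it finds."""
    found = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            continue  # that parser library is not installed; skip it
        img = soup.find("img")
        found[name] = img.get("src") if img else None
    return found

print(src_per_parser(SNIPPET))
```

For simple, well-formed markup like this, the parsers should agree; they mainly differ in speed and in how they repair broken HTML.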

Try the following to extract all image tags:

imgs = soup.find_all('img')
# find_all returns a list of tags, so loop over it to read each tag's attributes
for img in imgs:
    src = img.get('src')      # None if the attribute is missing
    title = img.get('title')
    print(src, title)
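
Searching for every img on the page will also pick up logos and other unrelated images. One way to narrow this, sketched here under the assumption that the article images sit inside <div class="image"> as in the question, is a CSS selector passed to `select`. The markup below is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one unrelated image and one article image.
PAGE = """
<img src="/logo.png" title="Site logo">
<div class="image"><img src="/photo.jpg" title="Photo caption"></div>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# Every image on the page, as (src, title) pairs
all_images = [(img.get("src"), img.get("title")) for img in soup.find_all("img")]

# Only images nested inside <div class="image"> that actually carry a src attribute
article_images = [
    (img["src"], img.get("title"))
    for img in soup.select("div.image img[src]")
]

print(all_images)
print(article_images)
```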

Source

2017-04-20 18:02:05

A late answer, but you can use:

from bs4 import BeautifulSoup 
import requests 
url = 'http://www.prnewswire.com/news-releases/dutch-philosopher-koert-van-mensvoort-founder-of-the-next-nature-network-writes-a-letter-to-humanity-619925063.html' 
r = requests.get(url) 
html = r.text 
soup = BeautifulSoup(html, "html5lib") 
links = soup.find_all('div', {'class': 'image'}) 
if links: 
    print(links[0].find('img')['src']) 
    print(links[0].find('img')['title'])

Output:

http://mma.prnewswire.com/media/491859/Koert_van_Mensvoort.jpg?w=950

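Dutch philosopher Koert van Mensvoort - founder of the Next Nature Network and fellow of 'Next Nature' at the Technical University of Eindhoven - writes a 'Letter to Humanity' in support of international Earth Day. (PRNewsfoto/Next Nature Network)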

Source

2017-04-20 18:12:31
