2017-06-17 200 views

BeautifulSoup: find all image links on Imgur

I can't find all the links in an Imgur album.

Here is the HTML from Imgur:

<div class="post-image">... 
<a href="//i.imgur.com/P1VMco8.png" class="zoom"><img src="//i.imgur.com/P1VMco8.png" alt="" itemprop="contentURL" /> 

How can I extract only the href from the page? I'm using the code below, which returns everything.

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('https://imgur.com/a/OmD1E') as f:
    r = f.read()
    soup = BeautifulSoup(r, 'lxml')
    result = soup.select(".post-image a")  # list of whole <a> tags, not href strings

Answer


The following code prints all the image links:

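import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('https://imgur.com/a/OmD1E') as f:
    soup = BeautifulSoup(f.read(), 'lxml')

# Each .post-image div wraps an <a> whose href is the image link
for image in soup.select(".post-image"):
    print(image.a["href"])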
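
If you want a list of links rather than printed output, something like the following may work. It is only a sketch: the helper name `collect_image_links`, the constant `PAGE_URL`, and the inline `SAMPLE_HTML` (modeled on the snippet in the question, with closing tags added) are my own, and it uses the built-in `html.parser` so it runs without lxml. Because the hrefs are protocol-relative (`//i.imgur.com/...`), `urljoin` is used to turn them into full URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "https://imgur.com/a/OmD1E"

# Inline sample modeled on the markup in the question (closing tags added),
# so the sketch runs without a network request.
SAMPLE_HTML = """
<div class="post-image">
  <a href="//i.imgur.com/P1VMco8.png" class="zoom">
    <img src="//i.imgur.com/P1VMco8.png" alt="" itemprop="contentURL" />
  </a>
</div>
"""


def collect_image_links(html, base_url=PAGE_URL):
    """Return absolute URLs for every link inside a .post-image element."""
    soup = BeautifulSoup(html, "html.parser")
    # a[href] skips anchors that have no href attribute
    return [urljoin(base_url, a["href"]) for a in soup.select(".post-image a[href]")]


links = collect_image_links(SAMPLE_HTML)
print(links)
```

On the real page you would pass the bytes returned by `f.read()` instead of `SAMPLE_HTML`, assuming the album markup is present in the HTML the server returns.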

If you are only looking for the first .post-image, then do:

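import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('https://imgur.com/a/OmD1E') as f:
    soup = BeautifulSoup(f.read(), 'lxml')

# [0] takes the first match; raises IndexError if the page has no .post-image
print(soup.select(".post-image")[0].a["href"])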
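
A possible variation for the first-match case, in case the page has no matching element: `select_one` returns `None` instead of raising, so you can check before indexing. This is a sketch only; the helper name `first_image_href` and the inline HTML strings are made up for illustration, and `html.parser` stands in for lxml.

```python
from bs4 import BeautifulSoup


def first_image_href(html):
    """Return the href of the first link inside .post-image, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one(".post-image a[href]")  # None when nothing matches
    return link["href"] if link is not None else None


found = first_image_href(
    '<div class="post-image"><a href="//i.imgur.com/P1VMco8.png" class="zoom"></a></div>'
)
missing = first_image_href("<p>no images here</p>")
print(found)
print(missing)
```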

Thanks. With a small modification, I now have a list of image URLs. – DatCra