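Can I download lazy-loaded images?

I want to use urllib to download some images from TripAdvisor, but the only URL I get from the src attribute in the HTML is this one.

I did some research and found out that these are lazy-loaded images... Is there any way to download them?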

2016-06-07 WisdomPill

The link you gave doesn't work. – BradTheBrutalitist

Sorry, try this one: https://www.tripadvisor.it/Restaurant_Review-g3174493-d3164947-Reviews-Le_Ciaspole-Tret_Fondo_Province_of_Trento_Trentino_Alto_Adige.html – WisdomPill

You can use the hyperlink, or you can right-click the image, choose Inspect, and then find the picture in the Elements panel. – BradTheBrutalitist

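You can extract the list of images from the Javascript using Beautiful Soup and the json module, then iterate over the list and retrieve the images you are interested in.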
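As a rough illustration of that idea, the sketch below pulls a "var lazyImgs = [...]" array out of a script body and parses it with json. It swaps Beautiful Soup for a regular expression and runs on an inline sample instead of the live page. The function name, the sample entries, and the example.com URLs are made up; the real page is only assumed to look similar.

```python
import json
import re

# Made-up sample of an inline script body; the real page is assumed to look similar.
SAMPLE_SCRIPT = """
var lazyImgs = [
  {"data": "https://example.com/media/photo-s/01/aa/first.jpg", "id": "lazyload_1"},
  {"data": "https://example.com/img/icon.png", "id": "lazyload_2"},
  {"data": "https://example.com/media/photo-s/02/bb/second.jpg", "id": "lazyload_3"}
];
var lazyHtml = [];
"""


def extract_lazy_image_urls(script_text):
    """Return the "data" URLs listed in a `var lazyImgs = [...]` assignment."""
    # Capture the array literal between the assignment and the closing semicolon.
    match = re.search(r"var lazyImgs\s*=\s*(\[.*?\])\s*;", script_text, flags=re.DOTALL)
    if match is None:
        return []
    entries = json.loads(match.group(1))
    return [entry["data"] for entry in entries if "data" in entry]


urls = extract_lazy_image_urls(SAMPLE_SCRIPT)
# Keep only the entries that look like photos, as the answer's filter does.
photo_urls = [url for url in urls if "media/photo-s" in url]
print(photo_urls)
```

The non-greedy pattern stops at the first "]" that is followed by a semicolon, which is enough for a sample like this one but would need hardening if a URL could contain that sequence.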

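Edit:

The problem was that the images have the same name, so they were being overwritten. Getting the first three images is simple, but references to the other images in the carousel aren't loaded until the carousel is opened, so that part is trickier. For some images you can find a higher-resolution version by replacing "photo-s" with "photo-w" in the path, but figuring out which ones requires digging deeper into the Javascript logic.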

import json
import re
# Python 3 imports; the original answer used Python 2's urllib.urlopen / urllib.urlretrieve.
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup as bs


def img_data_filter(tag):
    # Match the <script> element whose body starts with the lazyImgs array.
    return tag.name == "script" and tag.text.strip().startswith("var lazyImgs")


response = urlopen("https://www.tripadvisor.it/Restaurant_Review-g3174493-d3164947-Reviews-Le_Ciaspole-Tret_Fondo_Province_of_Trento_Trentino_Alto_Adige.html")
soup = bs(response.read(), 'html.parser')
img_data = soup.find(img_data_filter)

# Strip the Javascript around the array literal so that only JSON remains.
js = img_data.text
js = js.replace("var lazyImgs = ", '')
js = re.sub(r";\s+var lazyHtml.+", '', js, flags=re.DOTALL)

imgs = json.loads(js)
suffix = 1

for img in imgs:
    img_url = img["data"]

    # Skip entries that are not photos.
    if "media/photo-s" not in img_url:
        continue

    # Add a running counter so images with the same name don't overwrite each other.
    img_name = img_url[img_url.rfind('/') + 1:-4]
    img_name = "%s-%03d.jpg" % (img_name, suffix)
    suffix += 1

    urlretrieve(img_url, img_name)
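The two points from the edit, avoiding overwritten files and trying the "photo-w" variant, could be sketched as a pair of small helpers. The function names and the example.com URLs are hypothetical, and a "photo-w" version is not guaranteed to exist for every image, so a real downloader would have to fall back to the original URL when that request fails.

```python
import posixpath
from urllib.parse import urlsplit


def high_res_candidate(img_url):
    """Return the URL with the first "photo-s" path segment swapped for "photo-w"."""
    return img_url.replace("/photo-s/", "/photo-w/", 1)


def unique_name(img_url, index):
    """Build a local file name that stays unique even when base names repeat."""
    base = posixpath.basename(urlsplit(img_url).path)
    stem, ext = posixpath.splitext(base)
    # Fall back to .jpg when the URL path has no extension (an assumption).
    return "%s-%03d%s" % (stem, index, ext or ".jpg")


# Hypothetical URLs that share the same base name.
urls = [
    "https://example.com/media/photo-s/01/aa/le-ciaspole.jpg",
    "https://example.com/media/photo-s/02/bb/le-ciaspole.jpg",
]
for index, url in enumerate(urls, start=1):
    print(high_res_candidate(url), unique_name(url, index))
```

Neither helper touches the network, so they can be checked on their own before being wired into the download loop.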


2016-06-08 07:37:41 flesk

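Thanks, but I want to download several images for the restaurant. – WisdomPill

Your algorithm only gets one of them... the one with the link "Tutte le foto dei visitatori"... Can you explain how to get the first 3 or 4? Why doesn't your algorithm download them? Aren't they images too? – WisdomPill

Thank you very much... I had actually already adjusted it myself, but I think your edit works better. – WisdomPill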
