2013-03-24 63 views
4

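Downloading an image using Python Mechanize

I am trying to write a Python script that downloads an image and sets it as my wallpaper. Unfortunately, the mechanize documentation is quite poor. My script follows the link correctly, but I am having trouble saving the image to my computer. From what I have read, the .retrieve() method should do the job, but how do I specify the path the file should be downloaded to? Here is what I have so far: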

def followLink(browser, fixedLink):
    browser.open(fixedLink)

    # Prefer the largest resolution that the page links to
    if browser.find_link(url_regex = r'1600x1200'):
        browser.follow_link(url_regex = r'1600x1200')
    elif browser.find_link(url_regex = r'1400x1050'):
        browser.follow_link(url_regex = r'1400x1050')
    elif browser.find_link(url_regex = r'1280x960'):
        browser.follow_link(url_regex = r'1280x960')

    return

Answers

3

You can download the image by opening the URL in the img tag's src attribute.

image_response = browser.open_novisit(img['src']) 

Then, to save the file, just use open:

with open('image_out.png', 'wb') as f: 
    f.write(image_response.read()) 
+0

It doesn't work. I get the error: "NameError: global name 'img' is not defined". And where is the image supposed to be saved? – XVirtusX 2013-03-24 01:58:35

+1

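The 'img' in the first line stands for an "<img>" tag. Point it at the tag whose 'src' attribute contains the URL of the image you want to save. Also, because open() is given a bare file name, the image is saved in the current working directory, which is usually the folder the script is run from. – 2013-03-26 19:00:18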
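To make the save location explicit instead of relying on the working directory, the two steps above could be wrapped up roughly as follows. This is only a sketch: `save_image`, `dest_dir` and `name` are illustrative names that do not appear in the answer, and `browser` is assumed to be a `mechanize.Browser` (any object whose `open_novisit()` returns something with a `read()` method would do).

```python
import os


def save_image(browser, img_url, dest_dir, name):
    """Fetch img_url and write it to dest_dir/name, returning the path.

    `browser` is assumed to behave like mechanize.Browser, i.e. to offer
    open_novisit(url) returning a response object with a read() method.
    """
    # Create the target folder on first use.
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    path = os.path.join(dest_dir, name)
    response = browser.open_novisit(img_url)
    # Binary mode matters: image data must not be newline-translated.
    with open(path, 'wb') as f:
        f.write(response.read())
    return path
```

The returned path is what a wallpaper-setting step would then be handed.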

9
import mechanize, os 
from BeautifulSoup import BeautifulSoup 

browser = mechanize.Browser() 
html = browser.open(url) 
soup = BeautifulSoup(html) 
image_tags = soup.findAll('img') 
for image in image_tags: 
    # Drop the scheme; str.lstrip('http://') strips characters, not a prefix
    filename = image['src'].split('://', 1)[-1]
    filename = os.path.join(dir, filename.replace('/', '_')) 
    data = browser.open(image['src']).read() 
    browser.back() 
    # The with block closes the file even if the write fails
    with open(filename, 'wb') as save:
        save.write(data)

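This downloads all the images on a web page. For parsing the HTML, you are better off using BeautifulSoup or lxml. Downloading is just a matter of reading the data and writing it to a local file. You need to assign your own values to url and dir; dir is the folder where your images will be stored.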
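One detail worth spelling out is how an image URL becomes a local file name. `str.lstrip` removes any leading characters that belong to the given set rather than a literal prefix, so it is not a reliable way to drop `http://`. A minimal sketch of an alternative follows, written against Python 3's standard library (on Python 2, `urlsplit` lives in the `urlparse` module). `local_filename` and `dest_dir` are hypothetical names, and the example URLs are made up.

```python
import os
from urllib.parse import urlsplit


def local_filename(img_url, dest_dir):
    """Map an image URL to a flat file name inside dest_dir."""
    parts = urlsplit(img_url)
    # Host plus path, with slashes flattened so no subfolders are needed.
    flat = (parts.netloc + parts.path).replace('/', '_')
    return os.path.join(dest_dir, flat)


# lstrip treats its argument as a character set, so it also eats the
# leading "p" of "pics": this prints ics.example.com/a.png
print('http://pics.example.com/a.png'.lstrip('http://'))

print(local_filename('http://pics.example.com/a/b.png', 'wallpapers'))
```

Dropping the query string, as `urlsplit(...).path` does here, also keeps characters such as `?` out of the file name.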

5

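Not sure why this solution hasn't come up yet, but you can also use the mechanize.Browser.retrieve function. Perhaps it only works in newer versions of mechanize and so hasn't been mentioned?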

In any case, if you want to shorten the answer by zhangyangyu, you can do this:

import mechanize, os 
from BeautifulSoup import BeautifulSoup 

browser = mechanize.Browser() 
html = browser.open(url) 
soup = BeautifulSoup(html) 
image_tags = soup.findAll('img') 
for image in image_tags: 
    # Drop the scheme; str.lstrip('http://') strips characters, not a prefix
    filename = image['src'].split('://', 1)[-1]
    filename = os.path.join(dir, filename.replace('/', '_')) 
    browser.retrieve(image['src'], filename) 
    browser.back() 

Also keep in mind that you will probably want to wrap all of this in a try/except block, like this:

import mechanize, os 
from BeautifulSoup import BeautifulSoup 

browser = mechanize.Browser() 
html = browser.open(url) 
soup = BeautifulSoup(html) 
image_tags = soup.findAll('img') 
for image in image_tags: 
    # Drop the scheme; str.lstrip('http://') strips characters, not a prefix
    filename = image['src'].split('://', 1)[-1]
    filename = os.path.join(dir, filename.replace('/', '_'))
    try:
        browser.retrieve(image['src'], filename)
        browser.back()
    except (mechanize.HTTPError, mechanize.URLError) as e:
        # Use e.code and e.read() with HTTPError
        # Use e.reason.args with URLError
        pass

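Of course, you will need to adapt this to your own needs. Maybe you want it to bomb out when it hits a problem. It all depends on what you are trying to achieve.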
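As a middle ground between silently passing and bombing out, the failures can be collected and reported once the loop has finished. The sketch below is one possible shape, not part of the answer: `download_all`, its parameters and the numbered naming scheme are all illustrative, `browser` is assumed to offer `retrieve(url, filename)` the way `mechanize.Browser` does, and the URLs are assumed to be absolute.

```python
import os


def download_all(browser, image_urls, dest_dir, errors=(IOError,)):
    """Download each URL into dest_dir with browser.retrieve().

    Returns (saved_paths, failures), where failures is a list of
    (url, exception) pairs. `errors` is the tuple of exception types
    that count as a failed download; anything else propagates.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    saved, failures = [], []
    for index, img_url in enumerate(image_urls):
        # Illustrative naming: running number plus the last path segment,
        # with any query string cut off.
        last_segment = img_url.split('?')[0].rsplit('/', 1)[-1]
        path = os.path.join(dest_dir, '%03d_%s' % (index, last_segment))
        try:
            browser.retrieve(img_url, path)
        except errors as exc:
            failures.append((img_url, exc))
        else:
            saved.append(path)
    return saved, failures
```

With mechanize you would presumably pass `errors=(mechanize.HTTPError, mechanize.URLError)`, matching the exceptions caught above, and then decide from the returned `failures` list whether to log, retry or abort.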

0

This is pretty ugly, but it "works" for me. It builds on the earlier answer by 0xc0000022l.

import mechanize, os
from BeautifulSoup import BeautifulSoup
import urllib2

def DownloadIMGs(url): # IMPORTANT URL WITH HTTP OR HTTPS 
    print "From", url 
    dir = r'F:\Downloadss' #Dir for Downloads (raw string keeps the backslash literal)
    basicImgFileTypes = ['png','bmp','cur','ico','gif','jpg','jpeg','psd','raw','tif'] 

    browser = mechanize.Browser() 
    html = browser.open(url) 
    soup = BeautifulSoup(html) 
    image_tags = soup.findAll('img') 
    print "N Images:", len(image_tags) 
    print 
    #---------SAVE PATH 
    #check if available 
    if not os.path.exists(dir): 
     os.makedirs(dir) 
    #---------SAVE PATH 
    for image in image_tags: 

     #---------SAVE PATH + FILENAME (Where It is downloading) 
     filename = image['src'] 
     fileExt = filename.split('.')[-1] 
     fileExt = fileExt[0:3] 

     if (fileExt in basicImgFileTypes): 
      print 'File Extension:', fileExt 
      filename = filename.replace('?', '_') 
      filename = os.path.join(dir, filename.split('/')[-1]) 
      num = filename.find(fileExt) + len(fileExt) 
      filename = filename[:num] 
     else: 
      filename = filename.replace('?', '_') 
      filename = os.path.join(dir, filename.split('/')[-1]) + '.' + basicImgFileTypes[0] 
     print 'File Saving:', filename 
     #---------SAVE PATH + FILENAME (Where It is downloading) 

     #--------- FULL URL PATH OF THE IMG 
     imageUrl = image['src'] 
     print 'IMAGE SRC:', imageUrl 

     if (imageUrl.find('http://') > -1 or imageUrl.find('https://') > -1): 
      pass 
     else: 
      if (url.find('http://') > -1):
       # Cut the scheme off the page URL, then keep only the host part
       imageUrl = url[len('http://'):]
       imageUrl = 'http://' + imageUrl.split('/')[0] + image['src']
      elif(url.find('https://') > -1):
       imageUrl = url[len('https://'):]
       imageUrl = 'https://' + imageUrl.split('/')[0] + image['src']
      else: 
       imageUrl = image['src'] 

     print 'IMAGE URL:', imageUrl 
     #--------- FULL URL PATH OF THE IMG 

     #--------- TRY DOWNLOAD 
     try: 
      browser.retrieve(imageUrl, filename) 
      print "Downloaded:", image['src'].split('/')[-1] 
      print 
     except (mechanize.HTTPError,mechanize.URLError) as e: 
      print "Can't Download:", image['src'].split('/')[-1] 
      print 
      pass 
     #--------- TRY DOWNLOAD 
    browser.close() 

DownloadIMGs('https://stackoverflow.com/questions/15593925/downloading-a-image-using-python-mechanize')
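Much of the string handling above (rebuilding a full URL from the page address and working out the file extension) can likely be delegated to the standard library. The following is a rough sketch for Python 3 (on Python 2 the same functions live in the `urlparse` module); `resolve_image_url` and `image_extension` are made-up helper names, and `/img/logo.png` and the example.com address are hypothetical values used only for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def resolve_image_url(page_url, src):
    """Turn an <img> src (absolute, root-relative or relative) into a full URL."""
    return urljoin(page_url, src)


def image_extension(img_url, default='png'):
    """Return the lower-cased extension of the URL path, ignoring any query string."""
    ext = posixpath.splitext(urlsplit(img_url).path)[1]
    # ext still carries its leading dot, e.g. ".JPEG"; fall back if empty.
    return ext[1:].lower() or default


page = 'https://stackoverflow.com/questions/15593925/downloading-a-image-using-python-mechanize'
print(resolve_image_url(page, '/img/logo.png'))  # https://stackoverflow.com/img/logo.png
print(image_extension('https://example.com/pics/photo.JPEG?s=32'))  # jpeg
```

Unlike slicing the extension to three characters, this keeps `jpeg` intact, and `urljoin` also copes with `src` values that do not start with a slash.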