2015-09-27 90 views

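Image download in Python

Hi, I have a file named image.txt that contains about 500,000 image URLs. I want to read the URLs, download the images, and save them in a directory. If an image cannot be downloaded, I want to print the exception and carry on downloading the remaining files. How can I do this in an optimized way?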

import os
import sys
import urllib.request


def isValidFile(path):
    # Abort unless path is an existing regular file.
    if not os.path.isfile(path):
        print("Path " + path + " doesn't exist! Aborting...")
        sys.exit(1)


def isValidDir(path):
    # Abort unless path is an existing directory.
    if not os.path.isdir(path):
        print("Path " + path + " doesn't exist! Aborting...")
        sys.exit(1)


def normalize(url):
    # Use the last path segment of the URL, minus the trailing newline, as the file name.
    url = url.split("/")[-1]
    return url.split("\n")[0]

# Execution Starts Here
urls = sys.argv[1]
isValidFile(urls)

out_dir = sys.argv[2]
isValidDir(out_dir)

with open(urls) as url_array:
    for url in url_array:
        # Strip the newline that each line read from the file still carries.
        urllib.request.urlretrieve(url.strip(), os.path.join(out_dir, normalize(url)))

print("Images Downloaded")

What is the problem with your existing code? Are you getting an error? –


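I'm very confused by your code and by what you are trying to do. Could you post it with proper indentation? (The indentation is wrong, especially in the second half.) – Zizouz212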

Answer


If you want a pure Python solution, you can try this:

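import os
from urllib.request import urlopen

def getImage(url, dest):
    # Fetch first so a failed download does not leave an empty file behind.
    data = urlopen(url).read()
    with open(dest, 'wb') as fh:
        fh.write(data)

# urlArray is any iterable of URL strings, e.g. a list.
for url in urlArray:
    try:
        getImage(url, os.path.basename(url))
    except Exception as e:
        # Report the failure and move on to the next URL.
        print("Error downloading {}: {}".format(url, e))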
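The loop above handles one URL at a time and writes into the current directory. Since the question asks for something faster for roughly 500,000 URLs, here is a minimal sketch of one possible approach, assuming Python 3 and a thread pool from the standard library. The names `fetch` and `download_all`, the `workers` count, and the 30-second timeout are illustrative assumptions, not part of the original answer.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch(url, out_dir, timeout=30):
    """Download one URL into out_dir. Return (url, error); error is None on success."""
    try:
        # Derive the file name from the URL path, ignoring any query string.
        name = os.path.basename(urlsplit(url).path) or "unnamed"
        with urlopen(url, timeout=timeout) as resp:
            data = resp.read()
        with open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(data)
        return url, None
    except Exception as exc:
        # Any failure is reported to the caller instead of stopping the run.
        return url, exc


def download_all(urls, out_dir, workers=16):
    """Download all URLs using a thread pool; print failures and return the failed URLs."""
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url, err in pool.map(lambda u: fetch(u, out_dir), urls):
            if err is not None:
                print("Error downloading {}: {}".format(url, err))
                failed.append(url)
    return failed
```

Something like `download_all(urlArray, "images")` would then save everything into an existing `images` directory. Note that two URLs ending in the same file name (for example `image1.jpg` on two different domains) would overwrite each other in this sketch, so a real run would probably need a naming scheme that keeps them apart.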

What is urlArray? – rhya


It's basically an iterable (e.g. a 'list') containing the URLs to download. For example: urlArray = ['http://www.somedomain.com/image1.jpg', 'http://www.anotherdomain.com/image1.jpg'], and so on. – jorgeh
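For the file described in the question, the iterable does not have to be a list held in memory. A minimal sketch, assuming image.txt holds one URL per line (the helper name `iter_urls` is hypothetical):

```python
def iter_urls(path):
    """Yield one URL per non-empty line of the text file at path."""
    with open(path) as fh:
        for line in fh:
            url = line.strip()  # drop the trailing newline and stray spaces
            if url:
                yield url
```

With this, `urlArray = iter_urls("image.txt")` could be passed to the loop in the answer, and the lines are read lazily as the loop advances.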