2011-04-12

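Problem fetching and saving files with eventlet

I can use Eventlet to fetch images from a website, but I fail to save them to a local directory. The code is below. Is anyone familiar with I/O operations in the tasklet model?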

import pyquery 
import eventlet 
from eventlet.green import urllib2 

#fetch img urls............ works fine 

print "loading page..." 
html=urllib2.urlopen("http://www.meinv86.com/meinv/yuanchuangmeinvzipai/").read() 
print "Parsing urls..." 
d=pyquery.PyQuery(html) 
count=0 
urls=[] 
url='' 
for i in d('img'): 
    count=count+1 
    print i.attrib["src"] 
    urls.append(i.attrib["src"]) 


def fetch(url): 
    try: 
        print "start fetching %s" % (url) 
        urlfile = urllib2.urlopen(url) 
        size = int(urlfile.headers['content-length']) 
        print 'downloading %s, total file size: %d' % (url, size) 
        data = urlfile.read() 
        print 'download complete - %s' % (url) 

        ########################################## 
        # file save just won't work 

        f = open("/head2/" + url + ".jpg", "wb") 
        f.write(data) 
        f.close() 
        print "file saved" 
        ########################################## 

        return data 

    except: 
        print "fail to download..." 

pool = eventlet.GreenPool() 

for body in pool.imap(fetch, urls): 
    print "done" 
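To see why the save step fails, here is a small Python 3 check (an illustration added for readers, not part of the original question) of the path that fetch() builds. Concatenating the full image URL onto a base directory puts the URL's own slashes into the path, so open() looks inside a directory named "http:" that does not exist; the bare except: in fetch() then hides that error behind "fail to download...". The URL, the temporary directory standing in for /head2, and the helper names build_save_path and try_save are all hypothetical. In Python 2 the same failure shows up as an IOError.

```python
import errno
import os
import tempfile


def build_save_path(base_dir, url):
    # Same string concatenation as the question: base_dir + url + ".jpg"
    return base_dir + url + ".jpg"


def try_save(path, data):
    """Write data to path; return the exception instead of swallowing it."""
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError as exc:  # in Python 3, IOError is an alias of OSError
        return exc
    return None


# A temporary directory stands in for /head2 so the base directory exists.
base = tempfile.mkdtemp() + "/"
url = "http://example.com/images/a.jpg"  # hypothetical image URL
path = build_save_path(base, url)
print(path)                   # the URL's slashes become path separators
print(os.path.dirname(path))  # parent dir open() needs: <base>http://example.com/images
err = try_save(path, b"not really a jpeg")
if err is not None:
    print(errno.errorcode[err.errno], err)
```

Returning the exception instead of printing a fixed message is what makes the real cause (a missing directory) visible.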

Answer


Make sure the url is suitable as a filename, for example:

import hashlib 
import os 

def url2filename(url, ext=''): 
    return hashlib.md5(url).hexdigest() + ext # anything that removes '\/' will do 

# ... inside fetch(), where data holds the downloaded bytes 
with open(os.path.join("/head2", url2filename(url, '.jpg')), 'wb') as f: 
    f.write(data) 
print "file saved" 
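The snippet above is Python 2. A minimal Python 3 sketch of the same idea follows, assuming the downloaded bytes are already in hand. In Python 3, hashlib.md5() accepts bytes rather than str, so the URL is encoded first. The helper save_image and the choice to create the target directory with os.makedirs(..., exist_ok=True) are illustrative assumptions, not part of the answer.

```python
import hashlib
import os


def url2filename(url, ext=""):
    # md5 of the URL gives a fixed-length hex name with no '/' or '\'.
    # Python 3's hashlib needs bytes, so encode the str first.
    return hashlib.md5(url.encode("utf-8")).hexdigest() + ext


def save_image(data, url, directory):
    """Hypothetical helper: write data under a slash-free name in directory."""
    # Create the directory if it is missing (exist_ok needs Python 3.2+).
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, url2filename(url, ".jpg"))
    with open(path, "wb") as f:
        f.write(data)
    return path
```

For example, save_image(data, url, "images") writes into a relative images/ directory. The trade-off of hashing is that the original image name is lost; if that matters, the URL-to-name mapping has to be recorded separately.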

Note: you probably don't want to write files into a top-level directory such as '/head2'. You could also consider urllib.urlretrieve().
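A rough Python 3 sketch of the urlretrieve suggestion: in Python 3 the function lives at urllib.request.urlretrieve(url, filename), which the docs list under the legacy interface, and it returns a (filename, headers) pair. The helper name download_to_dir and the md5-based naming (borrowed from url2filename above) are assumptions for illustration. This is plain blocking code; using it inside an eventlet GreenPool would need eventlet's green or monkey-patched modules (for example eventlet.monkey_patch()), which this sketch does not cover.

```python
import hashlib
import os
import urllib.request


def download_to_dir(url, directory, ext=".jpg"):
    """Hypothetical helper: fetch url into directory under a slash-free name."""
    os.makedirs(directory, exist_ok=True)
    name = hashlib.md5(url.encode("utf-8")).hexdigest() + ext
    target = os.path.join(directory, name)
    # urlretrieve copies the response body into target and returns its path.
    filename, _headers = urllib.request.urlretrieve(url, target)
    return filename
```

A relative directory such as "images" keeps the downloads out of the filesystem root, in line with the note above.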


Great, thanks a lot. It was exactly the URL problem. – user703661 2011-04-12 12:36:49