Most Pythonic way to retrieve images within a module (HTML)

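I'm trying to write a program that sends a request to a website, which then produces an animation of weather radar images. I then scrape that page to get the image URLs (they are stored in a Java module) and download them to a local folder. I do this iteratively over many radar stations and two radar products. So far I have written the code that sends the request, parses the HTML, and lists the image URLs. What I can't seem to do is rename and save the images locally. Beyond that, I'd like this to be as streamlined as possible, which is probably not what I have so far. Any help with 1) getting the images downloaded to a local folder and 2) pointing me to a more Pythonic way of doing this would be great.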

# import modules
import os
import urllib2
import re
from bs4 import BeautifulSoup


##test variables##
stationName = "KBYX"
prod = ("bref1","vel1")       # a tuple of both ref and vel
bkgr = "black"
duration = "1"
home_dir = "/path/to/home/directory/folderForImages"  # placeholder, point this at the real image folder

##program## 

# This program needs to do the following:
# read the folder structure from home directory to get radar names
# left off here
# Each folder that starts with a "k" represents a radar station, and within
# each folder are two other folders, bref1 and vel1, the two products. The
# program reads the folders to decide which radars to retrieve data for, so
# adding a radar only requires adding its folders to the directory tree.
list_of_folders = os.listdir(home_dir)
for each_folder in list_of_folders:
    if each_folder.lower().startswith('k'):  # matches "K" as well as "k"
        print each_folder
# first request will be for prod[0] - base reflectivity
# second request will be for prod[1] - base velocity

# sample path: 
# http://weather.rap.ucar.edu/radar/displayRad.php?icao=KMPX&prod=bref1&bkgr=black&duration=1 

#base part of the path 
base = "http://weather.rap.ucar.edu/radar/displayRad.php?" 


#additional parameters 
call = base+"icao="+stationName+"&prod="+prod[0]+"&bkgr="+bkgr+"&duration="+duration 

#read in the webpage (a single request is enough)
urlContent = urllib2.urlopen(call).read()
#parse the webpage with BeautifulSoup 
soup = BeautifulSoup(urlContent) 
#print (soup.prettify())       # if you want to take a look at the parsed structure 


# find the PARAM tag with name="filename"; its "value" attribute holds all the filenames
tag = soup.find('param', attrs={'name': 'filename'})
files_in = str(tag['value'])

files = files_in.split(',')       # they're in a single element, so split them by comma

# save into the folder of the product requested above (prod[0])
directory = os.path.join(home_dir, stationName, prod[0])
counter = 0
for file in files:           # now we should THEORETICALLY be able to iterate over them to download them... here I just print them
    print file
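
The request and parsing steps above can also be sketched with the standard library alone. This is a minimal Python 3 sketch, not the asker's code: the names `build_request_url`, `FilenameParamParser`, and `extract_filenames` are hypothetical, and it assumes the page exposes the image list as a comma-separated `value` on a `<param name="filename">` tag, as the comments in the question describe.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Base path taken from the sample URL in the question
BASE_URL = "http://weather.rap.ucar.edu/radar/displayRad.php"


def build_request_url(station, product, background="black", duration="1"):
    """Build the displayRad.php URL for one station and one product."""
    query = urlencode(
        {"icao": station, "prod": product, "bkgr": background, "duration": duration}
    )
    return BASE_URL + "?" + query


class FilenameParamParser(HTMLParser):
    """Collect the comma-separated value of <param name="filename">."""

    def __init__(self):
        super().__init__()
        self.filenames = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser
        attributes = dict(attrs)
        if tag == "param" and attributes.get("name") == "filename":
            value = attributes.get("value") or ""
            self.filenames = [part.strip() for part in value.split(",") if part.strip()]


def extract_filenames(html):
    """Return the list of image URLs found in the page source."""
    parser = FilenameParamParser()
    parser.feed(html)
    return parser.filenames
```

Looking the tag up by its `name` attribute avoids depending on how many `param` tags come before it, which is what makes a long chain of `.param` lookups fragile.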

2013-03-26 user2209220

I use these three functions to download from the internet:

from os import path, makedirs
from urllib import urlretrieve  # Python 2; moved to urllib.request in Python 3

def checkPath(destPath):
    '''Returns destPath with a trailing slash, creating the folder if needed'''
    # Add final forward slash if missing
    if destPath and destPath[-1] != '/':
        destPath += '/'

    # makedirs also creates any missing parent folders
    if destPath and not path.exists(destPath):
        makedirs(destPath)
    return destPath or ''

def saveResource(data, fileName, destPath=''):
    '''Saves data to file in binary write mode'''
    destPath = checkPath(destPath)
    with open(destPath + fileName, 'wb') as fOut:
        fOut.write(data)

def downloadResource(url, fileName=None, destPath=''):
    '''Saves the content at url in folder destPath as fileName'''
    # Default filename
    if fileName is None:
        fileName = path.basename(url)

    destPath = checkPath(destPath)

    try:
        urlretrieve(url, destPath + fileName)
    except Exception as inst:
        print 'Error retrieving', url
        print type(inst)  # the exception type
        print inst.args  # arguments stored in .args
        print inst

There are also a bunch of examples here from various websites.
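
For readers on Python 3, where `urllib.urlretrieve` no longer exists under that name, the same idea might look like the sketch below. It is an assumption-laden rewrite rather than this answer's code: `download_resource` is a hypothetical name, and it streams the response with `urlopen` and `shutil.copyfileobj` instead of calling `urlretrieve`.

```python
import os
import shutil
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def download_resource(url, dest_dir=".", file_name=None):
    """Save the content at url into dest_dir and return the local path."""
    if file_name is None:
        # Default to the last path component of the URL, ignoring any query string
        file_name = os.path.basename(unquote(urlsplit(url).path))
    os.makedirs(dest_dir, exist_ok=True)  # no error if the folder already exists
    target = os.path.join(dest_dir, file_name)
    # Stream the response to disk in binary mode instead of reading it all at once
    with urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return target
```

Unlike the functions above, this version lets exceptions propagate, so the caller decides whether a failed frame should stop the whole run or just be logged.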

2013-03-26 16:12:59 niroyb

Thanks very much for this as well. I'll check these out - they look promising, I just need a bit of time to understand them thoroughly. – user2209220 2013-03-26 18:53:43

To download the images and save them locally, you can do something like:

import os
import urllib2

IMAGES_OUTDIR = '/path/to/image/output/directory'

# files is the list of image URLs built in the question's code
for file_url in files:
    image_content = urllib2.urlopen(file_url).read()
    image_outfile = os.path.join(IMAGES_OUTDIR, os.path.basename(file_url))
    with open(image_outfile, 'wb') as wfh:
        wfh.write(image_content)

If you want to rename them, use the name you want instead of os.path.basename(file_url).
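
As one possible naming scheme, the sketch below (Python 3) builds names from the station, product, and frame index while keeping the extension from the URL. The helper names `local_image_name` and `plan_downloads` and the `STATION_product_NNN` pattern are assumptions chosen for illustration, not something the question specifies.

```python
import os
from urllib.parse import urlsplit


def local_image_name(station, product, index, file_url):
    """Build a name such as KBYX_bref1_003.png, keeping the URL's extension."""
    extension = os.path.splitext(urlsplit(file_url).path)[1]
    return "{}_{}_{:03d}{}".format(station, product, index, extension)


def plan_downloads(file_urls, out_dir, station, product):
    """Pair every URL with the local path it should be saved under."""
    return [
        (url, os.path.join(out_dir, local_image_name(station, product, i, url)))
        for i, url in enumerate(file_urls)
    ]
```

Each resulting path can then be used in place of `image_outfile` in the loop above. Zero-padding the index keeps the frames in order when the folder is sorted by name.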

2013-03-26 16:13:56 setrofim

Awesome! Thanks, this worked perfectly. – user2209220 2013-03-26 18:54:06
