2012-07-05 78 views
2

Extracting image links from a web page using Python

I want to get all of the images (the NBA team logos) on this page: http://www.cbssports.com/nba/draft/mock-draft

However, my code gives me a bit more than that. It gives me the whole anchor tag:

<a href="/nba/teams/page/ORL"><img src="http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png" alt="Orlando Magic" width="30" height="30" border="0" /></a> 

How can I shorten this so that it gives me only http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png?

My code:

import urllib2
from BeautifulSoup import BeautifulSoup
# or, if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())

rows = soup.findAll("table", attrs={'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]

for row in rows:
    fields = row.findAll("td")
    if len(fields) >= 3:
        anchor = fields[1].find("a")
        if anchor:
            print anchor

Answers

1

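I know this may be "traumatic", but for auto-generated pages where you just want to grab the damn images and never come back, a quick-and-dirty regular expression that matches the required pattern is usually my choice (not depending on Beautiful Soup is a big advantage):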

import urllib, re

source = urllib.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read()

## the loop below just prints each link;
## if you want to actually download, set the flag below to True
actually_download = False

## every image name is an abbreviation composed of capital letters, so...
## (the dots are escaped so that they match a literal '.')
for link in re.findall(r'http://sports\.cbsimg\.net/images/nba/logos/30x30/[A-Z]+\.png', source):
    print link
    if actually_download:
        filename = link.split('/')[-1]
        urllib.urlretrieve(link, filename)

Hope this helps!
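As a rough illustration of the regular-expression approach, here is a minimal Python 3 sketch that runs against a string of HTML instead of the live page, so it can be tried offline. The names `LOGO_PATTERN` and `find_logo_urls` are made up for this example, and the sample markup is the anchor tag quoted in the question.

```python
import re

# Dots are escaped so they match a literal "." rather than any character.
LOGO_PATTERN = re.compile(
    r"http://sports\.cbsimg\.net/images/nba/logos/30x30/[A-Z]+\.png"
)


def find_logo_urls(html):
    """Return the logo URLs found in html, de-duplicated, in first-seen order."""
    seen = []
    for url in LOGO_PATTERN.findall(html):
        if url not in seen:
            seen.append(url)
    return seen


# Sample markup taken from the question (split only to keep lines short).
sample = (
    '<a href="/nba/teams/page/ORL"><img src="http://sports.cbsimg.net/'
    'images/nba/logos/30x30/ORL.png" alt="Orlando Magic" width="30" '
    'height="30" border="0" /></a>'
)

if __name__ == "__main__":
    for link in find_logo_urls(sample):
        print(link)
```

The pattern is tied to this one URL layout, so it would need adjusting if the site changed its image paths.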

1

To save all of the images from http://www.cbssports.com/nba/draft/mock-draft:

import os
import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup

URL = "http://www.cbssports.com/nba/draft/mock-draft"
# Target directory; it must already exist.
default_dir = os.path.join(os.path.expanduser("~"), "Pictures")

soup = BeautifulSoup(urllib2.urlopen(URL).read())
# Only <img> tags that carry both an alt and a src attribute.
imgs = soup.findAll("img", {"alt": True, "src": True})
for img in imgs:
    # Resolve relative src values against the page URL.
    img_url = urlparse.urljoin(URL, img["src"])
    filename = os.path.join(default_dir, img_url.split("/")[-1])
    img_data = urllib2.urlopen(img_url)
    with open(filename, "wb") as f:
        f.write(img_data.read())
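For readers without Beautiful Soup, the link-extraction step can also be sketched with the Python 3 standard library alone. This is only an illustrative sketch, not the code from the answer above: `ImgSrcCollector` and `extract_image_urls` are hypothetical names, and the example parses an HTML string rather than fetching the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the absolute URLs of all images found in html."""
    collector = ImgSrcCollector(base_url)
    collector.feed(html)
    return collector.urls


if __name__ == "__main__":
    page_url = "http://www.cbssports.com/nba/draft/mock-draft"
    snippet = (
        '<a href="/nba/teams/page/ORL"><img src="http://sports.cbsimg.net/'
        'images/nba/logos/30x30/ORL.png" alt="Orlando Magic" /></a>'
    )
    for url in extract_image_urls(snippet, page_url):
        print(url)
```

Each returned URL could then be downloaded with `urllib.request.urlretrieve`, in the same way the loop above writes each image to disk.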

To save any specific image from http://www.cbssports.com/nba/draft/mock-draft, use

+0

So the first one didn't work, but the second one did. – user1497050 2012-07-05 18:57:59