2016-08-01 143 views

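Scrapy only scrapes 1 image

I want to scrape the images from this page: "http://vnexpress.net/photo/cuoc-song-do-day/nguoi-trung-quoc-ra-be-boi-danh-mat-chuoc-tranh-nong-3445592.html", but my code downloads only one image on my computer, while on my friend's computer it downloads all of them. Please help me.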

import scrapy 

from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor
from imgur.items import ImgurItem 

class ImgurSpider(CrawlSpider):
    name = 'imgur'
    allowed_domains = ['vnexpress.net']
    start_urls = ['http://vnexpress.net/photo/cuoc-song-do-day/nguoi-trung-quoc-ra-be-boi-danh-mat-chuoc-tranh-nong-3445592.html']
    # rules = [Rule(LinkExtractor(allow=['/*']), 'parse123')]

    def parse(self, response):
        image = ImgurItem()
        # image['title'] = response.xpath(\
        # "//img[data-notes-url=""]").extract()
        rel = response.xpath("//div[@id='article_content']//img/@src").extract()
        image['image_urls'] = [rel[0]]
        return image

Answer

rel = response.xpath("//div[@id='article_content']//img/@src").extract() 
image['image_urls'] = [rel[0]] 

You are passing only a single link because you select index [0]. Try:

image['image_urls'] = rel 
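
To see why the index matters, here is a minimal sketch in plain Python. The sample `rel` list is made up and stands in for whatever `.extract()` returns on the real page, and a plain dict stands in for `ImgurItem`.

```python
# Hypothetical stand-in for the list returned by
# response.xpath("//div[@id='article_content']//img/@src").extract()
rel = [
    "http://example.com/photo-1.jpg",
    "http://example.com/photo-2.jpg",
    "http://example.com/photo-3.jpg",
]

# Original code: wraps only the first URL in a new list,
# so the item carries a single image URL.
first_only = {"image_urls": [rel[0]]}

# Suggested fix: pass the whole list, so the item carries every URL.
all_images = {"image_urls": rel}

print(len(first_only["image_urls"]))
print(len(all_images["image_urls"]))
```

Note that `[rel[0]]` would also raise an `IndexError` if the XPath matched nothing, whereas assigning `rel` simply yields an empty list.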

You could also split your code into a function that parses the URLs and a callback that downloads the images.
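
One possible reading of that suggestion, sketched with the standard library only so it runs without Scrapy: one helper turns the extracted `src` values into absolute URLs, and a second builds the item. The names `absolute_image_urls` and `build_item` are hypothetical, the dict stands in for `ImgurItem`, and the sample URLs are made up.

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve raw @src values against the page URL, skipping blanks and repeats."""
    seen = set()
    urls = []
    for src in srcs:
        src = src.strip()
        if not src:
            continue
        url = urljoin(page_url, src)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def build_item(page_url, srcs):
    """Build the item whose 'image_urls' field an images pipeline would read."""
    return {"image_urls": absolute_image_urls(page_url, srcs)}


# Assumed wiring inside the spider (not part of the original answer):
#     def parse(self, response):
#         rel = response.xpath("//div[@id='article_content']//img/@src").extract()
#         return build_item(response.url, rel)

item = build_item(
    "http://example.com/photo/page.html",  # made-up page URL
    ["/img/a.jpg", "http://example.com/img/b.jpg", "", "/img/a.jpg"],
)
print(item["image_urls"])
```

Keeping the URL handling in a plain function makes it easy to check on its own, separately from the crawl.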


Oh yes, thank you –


Great! Could you accept the answer? – Huxwell