Scrapy - How to store the local path of downloaded images?

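My scraper works fine: it downloads the images and registers the items in the database. I would also like the local paths of the images to be saved to my MySQL database, but I don't know how to proceed.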

The documentation reads:

When the images are downloaded another field (images) will be populated with the results.

With the code below, the path is not saved and I get this error:

return self._values[key] 
    exceptions.KeyError: 'images'

Here is an excerpt of my code:

items.py:

image_urls = Field() 
images = Field()

my_spider.py:

from scrapy.spider import BaseSpider 
from scrapy.selector import HtmlXPathSelector 

from project.items import ArtistItem 

class MySpider(BaseSpider): 

    name = 'XXX' 
    allowed_domains = ['XXX'] 
    start_urls = [ 
     "XXX", 
     "XXX" 
    ] 

    def parse(self, response): 
     x = HtmlXPathSelector(response) 

     artist = ArtistItem() 
     artist['url'] = response.url 
     artist['name'] = x.select("//h1/text()").extract() 
     artist['city'] = x.select("//span[@class='profile_location']/text()").extract() 
     artist['style'] = x.select("//span[@class='profile_genre']/text()").extract() 
     image_urls = x.select('/html/body/div[4]/div/div/div[2]/div[2]/div/a/img/@src').extract() 
     artist['image_urls'] = ["http:" + x for x in image_urls] 

     return artist

pipelines.py:

from scrapy.http import Request 
from scrapy.contrib.pipeline.images import ImagesPipeline 
from scrapy.exceptions import DropItem 
import MySQLdb 
import MySQLdb.cursors 
import sys 


class ProjectPipeline(object): 
    def __init__(self): 
     db = MySQLdb.connect(host='localhost', user='XXX', passwd='XXX', db='XXX', charset='utf8', 
          use_unicode=True) 

     self.c = db.cursor() 
     self.c.connection.autocommit(True) 


    def process_item(self, item, spider): 
     try: 
      self.c.execute("""INSERT INTO artist (name, city, style, image_url) 
         VALUES (%s, %s, %s, %s)""", 
          (item['name'][0], 
          item['city'][0], 
          item['style'][0], 
          item['images'][0]['path'], 
          )) 

     except MySQLdb.Error, e: 
      print "Error %d: %s" % (e.args[0], e.args[1]) 
      sys.exit(1) 

     return item

What am I missing in the parse() function?
Thanks in advance.

Source

2013-05-03 bsfoo116

You don't seem to be handling the images field (the one that holds the results) anywhere. – 2013-05-03 22:28:57

You're right, but I don't know how to handle it: artist['images'] = ??? – bsfoo116 2013-05-03 23:11:33

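Aha. I read the Scrapy documentation on downloading images and the source file for images.py.

In theory what you are doing should work, but it may be easier to create a custom images pipeline that explicitly attaches the saved image paths to each item. Conveniently, the example given in the documentation does just that. :)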
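The documentation example comes down to overriding item_completed on an ImagesPipeline subclass and copying the saved paths onto the item. Below is a minimal sketch of that path-collecting step, written as a plain function so it runs without Scrapy installed. The function name collect_image_paths and the sample data are illustrative and not part of Scrapy. It assumes results has the shape the images pipeline passes to item_completed: a list of (success, info) tuples, where info is a dict with url, path and checksum keys for each successful download.

```python
def collect_image_paths(results):
    """Return the stored paths of the images that downloaded successfully.

    `results` is assumed to be a list of (success, info) tuples, as passed
    to ImagesPipeline.item_completed(); `info` is a dict when success is True.
    """
    return [info['path'] for ok, info in results if ok]


# Inside a custom pipeline this would be used roughly like so
# (sketch only, modelled on the example in the Scrapy docs):
#
#   class MyImagesPipeline(ImagesPipeline):
#       def item_completed(self, results, item, info):
#           item['image_paths'] = collect_image_paths(results)
#           return item

# Illustrative data; the URL, path and checksum are made up.
sample_results = [
    (True, {'url': 'http://example.com/a.jpg',
            'path': 'full/aaa.jpg',
            'checksum': 'abc'}),
    (False, Exception('download failed')),
]
print(collect_image_paths(sample_results))
```

For the assignment in the commented sketch to work, the item class would also need an image_paths = Field() declaration, since a Scrapy Item raises KeyError when an undeclared field is set.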

Once you have implemented that, modify process_item in ProjectPipeline as follows:

def process_item(self, item, spider):
    try:
        # image_paths is a list of saved paths; the table has a single
        # image_url column, so store the first path.
        self.c.execute("""INSERT INTO artist (name, city, style, image_url)
                          VALUES (%s, %s, %s, %s)""",
                       (item['name'][0],
                        item['city'][0],
                        item['style'][0],
                        item['image_paths'][0],
                        ))

    except MySQLdb.Error as e:
        print("Error %d: %s" % (e.args[0], e.args[1]))
        sys.exit(1)

    return item

Just remember to update your settings.py file to point to your custom images pipeline, and you should be good to go.

Source

2013-05-03 23:47:23 Talvalin

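Thanks, but I get the same error: 'return self._values[key] exceptions.KeyError: 'images_urls''. Also, I want to store the local path of the downloaded file (I have edited my question to clarify this). – bsfoo116 2013-05-04 09:12:38

Does the pipeline work if you pass a fixed string to the insert statement? It would be easier to check this if you posted all of your spider and pipeline code. – Talvalin 2013-05-04 12:46:28

If I pass a fixed string, the pipeline works fine. I have updated my question to include the whole code. – bsfoo116 2013-05-04 13:52:12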

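To save the image information to the database, the priority of the components in the ITEM_PIPELINES setting matters.

For example, suppose you use MongoDB to store items. Here is how you should set the priorities of your pipeline components in your settings.py: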
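A minimal sketch of such a settings.py follows. The module name myproject, the MongoDBPipeline class path and the IMAGES_STORE directory are placeholders chosen for illustration. The images pipeline path matches the scrapy.contrib import used in the question; newer Scrapy releases expose it as scrapy.pipelines.images.ImagesPipeline.

```python
# settings.py (sketch). 'myproject' and MongoDBPipeline are placeholder names.

# Components with lower numbers run first, so the images pipeline fills
# item['images'] before the storage pipeline receives the item.
ITEM_PIPELINES = {
    'scrapy.contrib.pipeline.images.ImagesPipeline': 1,
    'myproject.pipelines.MongoDBPipeline': 100,
}

# Directory where downloaded images are written; this path is an assumption.
IMAGES_STORE = '/path/to/images'
```

For the MySQL pipeline in the question, the same ordering would presumably apply with an entry such as project.pipelines.ProjectPipeline in place of the MongoDB one.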

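The above settings will ensure that the images are processed and item['images'] is populated with the image information before control moves to MongoDBPipeline for storage.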
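Once item['images'] is populated, a storage pipeline can read the local path from it. Below is a minimal sketch that assumes each entry of item['images'] is a dict with a path key relative to IMAGES_STORE, as the images pipeline documentation describes. The helper name first_image_path and its images_store parameter are hypothetical.

```python
import os


def first_image_path(item, images_store=None):
    """Return the path of the first downloaded image, or None if there is none.

    When `images_store` is given, the stored relative path is joined to it.
    """
    images = item.get('images') or []
    if not images:
        return None  # images pipeline did not run, or every download failed
    path = images[0]['path']
    return os.path.join(images_store, path) if images_store else path


# Illustrative item; a plain dict stands in for a Scrapy Item here.
example_item = {'images': [{'url': 'http://example.com/a.jpg',
                            'path': 'full/aaa.jpg',
                            'checksum': 'abc'}]}
print(first_image_path(example_item))
```

Using item.get() rather than item['images'] avoids the KeyError from the question when the field was never populated, so the storage pipeline can decide whether to skip or drop the item instead.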

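You can read more about ITEM_PIPELINES priorities in the last section of this documentation: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

It took me hours to figure this out, so I am noting it here in the hope that it helps others facing the same problem.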

Source

2014-08-04 09:09:44
