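Scrapy - How to store the local path of downloaded images?
My scraper works fine: it downloads the images and registers the items in the database. I would also like the images' local paths to be saved to my MySQL database, and I don't know how to proceed.
I read in the docs: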
When the images are downloaded another field (images) will be populated with the results.
With the code below, the path is not saved and I get this error:
return self._values[key]
exceptions.KeyError: 'images'
Here is an excerpt of my code:
items.py:
image_urls = Field()
images = Field()
my_spider.py:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from project.items import ArtistItem

class MySpider(BaseSpider):
    name = 'XXX'
    allowed_domains = ['XXX']
    start_urls = [
        "XXX",
        "XXX"
    ]

    def parse(self, response):
        x = HtmlXPathSelector(response)
        artist = ArtistItem()
        artist['url'] = response.url
        artist['name'] = x.select("//h1/text()").extract()
        artist['city'] = x.select("//span[@class='profile_location']/text()").extract()
        artist['style'] = x.select("//span[@class='profile_genre']/text()").extract()
        image_urls = x.select('/html/body/div[4]/div/div/div[2]/div[2]/div/a/img/@src').extract()
        # The src values are protocol-relative, so prepend the scheme.
        # (Loop variable renamed so it does not shadow the selector x.)
        artist['image_urls'] = ["http:" + src for src in image_urls]
        return artist
pipelines.py:
from scrapy.http import Request
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
import MySQLdb
import MySQLdb.cursors
import sys
class ProjectPipeline(object):
    def __init__(self):
        db = MySQLdb.connect(host='localhost', user='XXX', passwd='XXX', db='XXX', charset='utf8',
                             use_unicode=True)
        self.c = db.cursor()
        self.c.connection.autocommit(True)

    def process_item(self, item, spider):
        try:
            self.c.execute("""INSERT INTO artist (name, city, style, image_url)
                              VALUES (%s, %s, %s, %s)""",
                           (item['name'][0],
                            item['city'][0],
                            item['style'][0],
                            item['images'][0]['path'],
                            ))
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
            sys.exit(1)
        return item
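What am I missing in the parse() function?
Thanks in advance.
You don't seem to be handling the images field (the one that holds the results) anywhere. – 2013-05-03 22:28:57
You're right, but I don't know how to handle it: artist['images'] = ??? – bsfoo116 2013-05-03 23:11:33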
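A possible way to look at it, offered as a sketch rather than a confirmed fix: parse() probably does not need to set artist['images'] at all, because ImagesPipeline fills that field itself once the downloads finish. The KeyError would then mean that ProjectPipeline saw the item before ImagesPipeline did (or that ImagesPipeline is not enabled in ITEM_PIPELINES, or IMAGES_STORE is not set). That is an assumption, since settings.py is not shown in the question. The snippet below uses a plain dict as a stand-in for the Scrapy item so it runs without Scrapy; first_image_path is a hypothetical helper name, and the path and checksum values are made up.

```python
# Sketch only: a plain dict stands in for the Scrapy item.
# first_image_path is a hypothetical helper, not part of Scrapy.

def first_image_path(item):
    """Return the local path of the first downloaded image, or None.

    ImagesPipeline fills item['images'] with one dict per downloaded image,
    carrying 'url', 'path' and 'checksum'. Until that pipeline has run, the
    key is not set, and item['images'] raises KeyError: 'images'.
    """
    images = item.get('images') or []
    return images[0]['path'] if images else None


# Before ImagesPipeline has processed the item: no 'images' key yet.
pending = {'name': ['Some Artist'], 'image_urls': ['http://example.com/a.jpg']}

# After ImagesPipeline: result dicts are present.
done = dict(pending, images=[{
    'url': 'http://example.com/a.jpg',
    'path': 'full/abc123.jpg',       # relative to IMAGES_STORE (made-up value)
    'checksum': 'made-up-checksum',  # made-up value
}])

print(first_image_path(pending))
print(first_image_path(done))
```

In process_item, the same lookup could replace item['images'][0]['path'], so an item without downloaded images stores NULL instead of raising. If pipeline order is the cause, listing ImagesPipeline ahead of ProjectPipeline in ITEM_PIPELINES should make the field available by the time the INSERT runs.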