How to populate a scrapy.Field as a dictionary

I am building a scraper for www.apkmirror.com using Scrapy (with the SitemapSpider spider). So far, the following works:
DEBUG = True

from scrapy.spiders import SitemapSpider
from apkmirror_scraper.items import ApkmirrorScraperItem


class ApkmirrorSitemapSpider(SitemapSpider):
    name = 'apkmirror-spider'
    sitemap_urls = ['http://www.apkmirror.com/sitemap_index.xml']
    sitemap_rules = [(r'.*-android-apk-download/$', 'parse')]

    if DEBUG:
        custom_settings = {'CLOSESPIDER_PAGECOUNT': 20}

    def parse(self, response):
        item = ApkmirrorScraperItem()
        item['url'] = response.url
        item['title'] = response.xpath('//h1[@title]/text()').extract_first()
        item['developer'] = response.xpath('//h3[@title]/a/text()').extract_first()
        return item
where ApkmirrorScraperItem is defined in items.py as follows:
import scrapy


class ApkmirrorScraperItem(scrapy.Item):
    url = scrapy.Field()
    title = scrapy.Field()
    developer = scrapy.Field()
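If I run it from the project directory with the command
scrapy crawl apkmirror-spider -o data.json
the resulting JSON output is an array of JSON dictionaries with the keys url, title, and developer, and the corresponding strings as values. However, I would like to change this so that the developer value is itself a dictionary with a name field, so that I can populate it like this: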
item['developer']['name'] = response.xpath('//h3[@title]/a/text()').extract_first()
However, if I try this, I get KeyErrors, even if I initialize developer's Field (which is a dict according to https://doc.scrapy.org/en/latest/topics/items.html#item-fields) as developer = scrapy.Field(name=None). How do I go about solving this?