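Scrapy CSV output duplicate fields
I have a spider (below) that I'd like to run via a cron job roughly every 10 days. However, on every run after the first, it writes the field headers again instead of only appending the items under the existing columns in the CSV. How can I make it so that, no matter how many times I run it, there is only one set of field headers at the top and all the data sits below it?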
```python
import scrapy


class Wotd(scrapy.Item):
    word = scrapy.Field()
    definition = scrapy.Field()
    sentence = scrapy.Field()
    translation = scrapy.Field()


class WotdSpider(scrapy.Spider):
    name = 'wotd'
    # allowed_domains takes bare domain names, not URLs with paths
    allowed_domains = ['www.spanishdict.com']
    start_urls = ['http://www.spanishdict.com/wordoftheday/']
    custom_settings = {
        # specifies exported fields and their order
        'FEED_EXPORT_FIELDS': ['word', 'definition', 'sentence', 'translation'],
    }

    def parse(self, response):
        jobs = response.xpath('//div[@class="sd-wotd-text"]')
        for job in jobs:
            item = Wotd()
            item['word'] = job.xpath('.//a[@class="sd-wotd-headword-link"]/text()').extract_first()
            item['definition'] = job.xpath('.//div[@class="sd-wotd-translation"]/text()').extract_first()
            item['sentence'] = job.xpath('.//div[@class="sd-wotd-example-source"]/text()').extract_first()
            item['translation'] = job.xpath('.//div[@class="sd-wotd-example-translation"]/text()').extract_first()
            yield item
```
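From what I've read in the Scrapy docs, it looks like I can hook into the CsvItemExporter class and set include_headers_line=False, but I'm not sure where in the project structure to add the class.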
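In a Scrapy project, one common arrangement is to subclass CsvItemExporter in a module of your own (for example a hypothetical myproject/exporters.py) that passes include_headers_line=False to the parent constructor, and then point the FEED_EXPORTERS setting's 'csv' key at that class path. Turning the header off unconditionally also drops it on the very first run, though. The sketch below uses Python's standard csv module rather than Scrapy's exporter, and shows a variant that decides on each run: it writes the header only when the target file is missing or empty. The function name append_items and the sample row are hypothetical.

```python
import csv
import os

FIELDS = ["word", "definition", "sentence", "translation"]


def append_items(path, items, fieldnames=FIELDS):
    """Append dict rows to a CSV file, writing the header row only once.

    Hypothetical helper: the header is written only if the file does not
    exist yet or is empty, so repeated (e.g. cron) runs never duplicate it.
    """
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(items)


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "wotd.csv")
    # Hypothetical sample row standing in for a scraped item
    row = {"word": "w", "definition": "d", "sentence": "s", "translation": "t"}
    for _ in range(3):  # simulate three separate runs
        append_items(path, [row])
    with open(path, newline="", encoding="utf-8") as f:
        print(f.read().splitlines())
```

Running it prints one header line followed by three data rows. The same emptiness check could be moved into a CsvItemExporter subclass, but that exporter receives an open file object rather than a path, so the check would have to inspect the stream instead.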
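Thanks, this is exactly what I was looking for. I ran it once without the change so the headers would be written, then made the change, and it worked like a charm. Thanks for your help! – GainesvilleJesus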