2017-11-18 199 views
0

I am using CSVFeedSpider to scrape a local CSV file (foods.csv), and I get an error when I run the spider.

Here is the file:

calories  name                         price
650       Belgian Waffles              $5.95
900       Strawberry Belgian Waffles   $7.95
900       Berry-Berry Belgian Waffles  $8.95
600       French Toast                 $4.50
950       Homestyle Breakfast          $6.95

Here is the code in my foods.py file:

from scrapy.spiders import CSVFeedSpider 
from foods_csv.items import FoodsCsvItem 

class FoodsSpider(CSVFeedSpider): 
    name = 'foods' 
    start_urls = ['file:///users/Mina/Desktop/foods.csv'] 
    delimiter = ';' 
    quotechar = "'" 
    headers = ['name', 'price', 'calories'] 

    def parse_row(self, response, row):
        self.logger.info('Hi, this is a row!: %r', row)
        item = FoodsCsvItem()
        item['name'] = row['name']
        item['price'] = row['price']
        item['calories'] = row['calories']
        return item

items.py

import scrapy 

class FoodsCsvItem(scrapy.Item): 
    name = scrapy.Field() 
    price = scrapy.Field() 
    calories = scrapy.Field() 

But it gives me this error:

2017-11-18 13:04:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET file:///users/Mina/Desktop/foods.csv> (referer: None) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 1 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 2 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 3 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 4 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 5 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 6 (length: 1, should be: 3) 

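At first I was only scraping 'name' and 'price', but it gave me the same error, so I tried adding 'calories' following this solution, "Scrapy: Scraping CSV File - not getting any output", but nothing changed!

I only need to scrape 'name' and 'price'. How can I do that?

Answers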

1

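It looks like the exact formatting of your CSV file may have been lost when you posted it. If the format is exactly as posted here, then it actually looks like a TSV (tab-separated values) file, and you could try changing delimiter = ';' to delimiter = '\t'.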
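To illustrate why a wrong delimiter would produce the "length: 1, should be: 3" warnings, here is a minimal sketch using only Python's standard csv module rather than Scrapy. The sample text is an assumption: it reuses rows from the question and supposes the columns are separated by tabs, which the post does not confirm. The helper name field_counts is made up for this example.

```python
import csv
import io

# Assumed sample: rows from the question, with tabs between columns.
# The real file's separator is not known from the post.
SAMPLE = (
    "calories\tname\tprice\n"
    "650\tBelgian Waffles\t$5.95\n"
    "900\tStrawberry Belgian Waffles\t$7.95\n"
)


def field_counts(text, delimiter):
    """Return how many fields csv.reader finds on each line of text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


# With ';' nothing is split, so every line is a single field.
semicolon_counts = field_counts(SAMPLE, ";")
# With a tab, every line splits into the three expected fields.
tab_counts = field_counts(SAMPLE, "\t")

print(semicolon_counts)
print(tab_counts)
```

If the real file behaves like this sample, each line parsed with ';' has one field while the spider's headers list has three entries, which matches the lengths reported in the log.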

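However, since you have specified ' as the quote character, I assume that is correct? I would try running a search/replace on the CSV file, replacing ' with ", and see if that helps. I have had some odd problems with single quotes before.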
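As a rough check on the quote character, the sketch below parses the same kind of data with both ' and " as quotechar and keeps only 'name' and 'price', which is what the question asks for. It uses the standard csv module, not Scrapy, and the sample again assumes tab-separated columns. The function name name_and_price is hypothetical.

```python
import csv
import io

# Assumed sample: rows from the question, with tabs between columns.
SAMPLE = (
    "calories\tname\tprice\n"
    "650\tBelgian Waffles\t$5.95\n"
    "600\tFrench Toast\t$4.50\n"
)


def name_and_price(text, delimiter="\t", quotechar='"'):
    """Parse text using its first line as headers; keep name and price only."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter=delimiter, quotechar=quotechar
    )
    return [{"name": row["name"], "price": row["price"]} for row in reader]


double_quoted = name_and_price(SAMPLE, quotechar='"')
single_quoted = name_and_price(SAMPLE, quotechar="'")

for row in double_quoted:
    print(row["name"], row["price"])
```

Because the data as posted contains no quote characters at all, both settings give the same rows here, so the quote character only matters if the real file actually quotes its fields. Note also that the sketch takes the column names from the file's first line. In the spider, the headers list is in a different order (name, price, calories) than the columns shown in the file (calories, name, price), so that may be worth checking as well.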

-1

Try this:

def parse_row(self, response, row):
    self.logger.info('Hi, this is a row!: %r', row)
    item = FoodsCsvItem()
    item['name'] = row['name']
    item['price'] = row['price']
    item['calories'] = row['calories']
    return item
+0

OK. I edited it, but it gives me the same error. – MAGS94