
I am trying to scrape a site to get a very rough demographic picture of its users (no personally identifying information or photos), but the spider I adapted from the official documentation's tutorial repeats the same line of output four times in a row. Why is my Scrapy spider duplicating its output?

A copy of the code I am using is below:

Note that the example profile included in the code is a fake/spam account. In case it has been deleted, you can substitute any other profile URL from the site and the spider will run just the same.

import scrapy


class DateSpider(scrapy.Spider):
    name = "date"
    start_urls = [
        'http://www.pof.com/viewprofile.aspx?profile_id=141659067',
    ]

    def parse(self, response):
        for container in response.xpath('//div[@class="user-details-wide"]'):
            yield {
                'Gender': response.xpath("//span[@id='gender']/text()").extract_first(),
                'Age': response.xpath("//span[@id='age']/text()").extract_first(),
                'State': response.xpath("//span[@id='state_id']/text()").extract_first(),
                'Marital status': response.xpath("//span[@id='maritalstatus']/text()").extract_first(),
                'Body': response.xpath("//span[@id='body']/text()").extract_first(),
                'Height': response.xpath("//span[@id='height']/text()").extract_first(),
                'Ethnicity': response.xpath("//span[@id='ethnicity']/text()").extract_first(),
                'Does drugs?': response.xpath("//span[@id='drugs']/text()").extract_first(),
                'Smokes?': response.xpath("//span[@id='smoke']/text()").extract_first(),
                'Drinks?': response.xpath("//span[@id='drink']/text()").extract_first(),
                'Has children?': response.xpath("//span[@id='haschildren']/text()").extract_first(),
                'Wants children?': response.xpath("//span[@id='wantchildren']/text()").extract_first(),
                'Star sign': response.xpath("//span[@id='zodiac']/text()").extract_first(),
                'Education': response.xpath("//span[@id='college_id']/text()").extract_first(),
                'Personality': response.xpath("//span[@id='fishtype']/text()").extract_first(),
            }

Run with:

scrapy crawl date -o date.csv

What I am looking for is a header row followed by one row of results per item, straight down, rather than the blank lines and duplicates my output currently contains.

Answer


You don't need to use the for loop: just select each span element once and extract the data from it directly. (The duplication happens because every XPath inside your loop starts with //, so it searches the whole page rather than the container being iterated over; each of the matching div[@class="user-details-wide"] containers therefore yields an identical item.)

I would also recommend using Scrapy Items; they are more convenient. One way to clean the whitespace out of the extracted values is the XPath function normalize-space().

import scrapy
from items import DateSpiderItem


class DateSpider(scrapy.Spider):
    name = "date"
    start_urls = [
        'http://www.pof.com/viewprofile.aspx?profile_id=141659067',
    ]

    def parse(self, response):
        item = DateSpiderItem()
        item['Gender'] = response.xpath(
            "//span[@id='gender']/text()").extract_first()
        item['Age'] = response.xpath(
            "//span[@id='age']/text()").extract_first()
        item['State'] = response.xpath(
            "//span[@id='state_id']/text()").extract_first()
        item['Marital_status'] = response.xpath(
            "normalize-space(//span[@id='maritalstatus']/text())").extract_first()
        item['Body'] = response.xpath(
            "//span[@id='body']/text()").extract_first()
        item['Height'] = response.xpath(
            "//span[@id='height']/text()").extract_first()
        item['Ethnicity'] = response.xpath(
            "//span[@id='ethnicity']/text()").extract_first()
        item['Does_drugs'] = response.xpath(
            "normalize-space(//span[@id='drugs']/text())").extract_first()
        item['Smokes'] = response.xpath(
            "//span[@id='smoke']/text()").extract_first()
        item['Drinks'] = response.xpath(
            "normalize-space(//span[@id='drink']/text())").extract_first()
        item['Has_children'] = response.xpath(
            "normalize-space(//span[@id='haschildren']/text())").extract_first()
        item['Wants_children'] = response.xpath(
            "normalize-space(//span[@id='wantchildren']/text())").extract_first()
        item['Star_sign'] = response.xpath(
            "//span[@id='zodiac']/text()").extract_first()
        yield item

The items file:

import scrapy


class DateSpiderItem(scrapy.Item):
    Gender = scrapy.Field()
    Age = scrapy.Field()
    State = scrapy.Field()
    Marital_status = scrapy.Field()
    Body = scrapy.Field()
    Height = scrapy.Field()
    Ethnicity = scrapy.Field()
    Does_drugs = scrapy.Field()
    Smokes = scrapy.Field()
    Drinks = scrapy.Field()
    Has_children = scrapy.Field()
    Wants_children = scrapy.Field()
    Star_sign = scrapy.Field()
    Education = scrapy.Field()
    Personality = scrapy.Field()

Output:

[screenshot of the scraped item output]
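
As noted above, the duplication comes from the absolute // XPaths inside the loop. If a page really did contain several user-details-wide containers and you wanted one item per container, the loop could be kept by making each XPath relative to the container instead. A minimal sketch under that assumption (only a few of the fields are shown, with the span ids taken from the question):

import scrapy


class DateSpider(scrapy.Spider):
    name = "date"
    start_urls = [
        'http://www.pof.com/viewprofile.aspx?profile_id=141659067',
    ]

    def parse(self, response):
        # One item per container; the leading "." keeps each XPath
        # relative to the current container rather than the whole page.
        for container in response.xpath('//div[@class="user-details-wide"]'):
            yield {
                'Gender': container.xpath(".//span[@id='gender']/text()").extract_first(),
                'Age': container.xpath(".//span[@id='age']/text()").extract_first(),
                'State': container.xpath(".//span[@id='state_id']/text()").extract_first(),
            }

For a page that describes a single profile, though, the Item-based version above is the cleaner fix.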


I had to move and rename a few files, but I got your code working. It looks a lot tidier, too! Thank you very much for your help, I really appreciate it. –


No problem, I'm glad I could help. – vold