2016-07-14 83 views

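Custom parse callback request not working in Scrapy

I am trying to collect the URLs of the listing entries in the parse_start_url method, which yields a Request for each one with parse_link as its callback, but the callback never seems to run. What am I doing wrong?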

Code:

from scrapy import Request 
from scrapy.selector import Selector 
from scrapy.linkextractors import LinkExtractor 
from scrapy.spiders import Rule, CrawlSpider 
from property.items import PropertyItem 
import sys 

reload(sys) 
sys.setdefaultencoding('utf8') #To prevent UnicodeDecodeError, UnicodeEncodeError. 

class VivastreetSpider(CrawlSpider):
    name = 'viva'
    allowed_domains = ['chennai.vivastreet.co.in']
    start_urls = ['http://chennai.vivastreet.co.in/rent+chennai/']
    rules = [
        Rule(LinkExtractor(restrict_xpaths = '//*[text()[contains(., "Next")]]'), callback = 'parse_start_url', follow = True)
    ]

    def parse_start_url(self, response):
        urls = Selector(response).xpath('//a[contains(@id, "vs-detail-link")]/@href').extract()

        for url in urls:
            print('test ' + url)
            yield Request(url = url, callback = self.parse_link)

    def parse_link(self, response):
        #item = PropertyItem()
        print('parseitemcalled')
        a = Selector(response).xpath('//*h1[@class = "kiwii-font-xlarge kiwii-margin-none"').extract()
        print('test ' + str(a))

Answer


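You need to widen your allowed_domains so that the extracted URLs can be followed. Scrapy filters out requests to hosts outside the allowed domains as offsite, so the callback is never called for them: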

allowed_domains = ['vivastreet.co.in'] 
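
To illustrate why the narrower value blocks the requests, here is a minimal sketch of the kind of host check that offsite filtering performs: a host is accepted if it equals an allowed domain or is a subdomain of it. The is_allowed helper and the detail URL are hypothetical and only for illustration; this is not Scrapy's own code, and the real detail URLs are not shown in the question.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host equals an allowed domain or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical detail-page URL on a different subdomain (an assumption, not taken from the site).
detail_url = "http://www.vivastreet.co.in/some-listing/123"

print(is_allowed(detail_url, ["chennai.vivastreet.co.in"]))  # False: would be dropped as offsite
print(is_allowed(detail_url, ["vivastreet.co.in"]))          # True: any subdomain is accepted
```

Listing the parent domain covers chennai.vivastreet.co.in as well, so the start URL stays in scope.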

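After that, you will run into an invalid expression error, because //*h1[@class = "kiwii-font-xlarge kiwii-margin-none" is not valid XPath and needs to be fixed: *h1 is not a valid node test, and the predicate is missing its closing ].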
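
As a rough sketch of the corrected expression, the snippet below runs it against a small stand-in document. It uses the standard library's ElementTree rather than Scrapy's Selector so that it runs without extra dependencies. The markup and the "Sample title" text are hypothetical, since the real page is not shown in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a listing page (an assumption for illustration).
page = ET.fromstring(
    '<html><body>'
    '<h1 class="kiwii-font-xlarge kiwii-margin-none">Sample title</h1>'
    '</body></html>'
)

# Broken: //*h1[@class = "kiwii-font-xlarge kiwii-margin-none"
# Fixed:  //h1[@class = "kiwii-font-xlarge kiwii-margin-none"]
# ElementTree only accepts the relative form (.//) of the fixed expression.
fixed_xpath = './/h1[@class="kiwii-font-xlarge kiwii-margin-none"]'

titles = [h1.text for h1 in page.findall(fixed_xpath)]
print(titles)  # ['Sample title']
```

In the spider itself, the equivalent fix would presumably be to pass //h1[@class = "kiwii-font-xlarge kiwii-margin-none"] to response.xpath(...) in parse_link.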