2015-11-13 84 views

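I know this is a newbie question, and a basic Python one at that, but it comes up in the context of Scrapy and I can't find an answer anywhere. In a Scrapy spider, how do I call one method from inside another?

When I run this spider code: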

import scrapy 

from tutorial.items import DmozItem 

class DmozSpider(scrapy.Spider): 
    name = "dmoz" 
    allowed_domains = ["lib-web.org"] 
    start_urls = [ 
     "http://www.lib-web.org/united-states/public-libraries/michigan/" 
    ] 

    count = 0 

    def increment(self): 
     global count 
     count += 1 

    def getCount(self): 
     global count 
     return count 

    def parse(self, response): 
     increment() 
     for sel in response.xpath('//div/div/div/ul/li'): 
      item = DmozItem() 
      item['title'] = sel.xpath('a/text()').extract() 
      item['link'] = sel.xpath('a/@href').extract() 
      item['desc'] = sel.xpath('p/text()').extract() 
      x = getCount() 
      print x 
      yield item 

DmozItem:

import scrapy 

class DmozItem(scrapy.Item): 
    title = scrapy.Field() 
    link = scrapy.Field() 
    desc = scrapy.Field() 

I get this error:

File "/Users/Admin/scpy_projs/tutorial/tutorial/spiders/dmoz_spider.py", line 23, in parse 
    increment() 
NameError: global name 'increment' is not defined 

Why can't I call increment() from inside parse(self, response)? How can I make this work?

Thanks for any help.

Answer

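increment() is an instance method of your spider, so call it as self.increment(). A bare increment() is looked up as a module-level name, which is why Python raises the NameError.

Also, there is no need for a global variable here: define count as an instance variable.

Fixed version: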
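To see the difference in isolation, here is a minimal sketch that does not need Scrapy. The `Spider` class, `parse_bare` and `parse_self` are hypothetical stand-ins invented for illustration, not part of the Scrapy API.

```python
class Spider(object):  # hypothetical stand-in for scrapy.Spider
    def __init__(self):
        self.count = 0  # per-instance state instead of a global

    def increment(self):
        self.count += 1

    def parse_bare(self):
        # A bare name is searched in local, enclosing-function, global and
        # builtin scopes, never in the class body, so this raises NameError.
        increment()

    def parse_self(self):
        # Looking the method up on the instance finds it.
        self.increment()
        return self.count


spider = Spider()

try:
    spider.parse_bare()
    bare_error = None
except NameError as exc:
    bare_error = exc

result = spider.parse_self()
```

Here `bare_error` ends up holding the NameError, while `parse_self()` succeeds and returns the updated counter.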

import scrapy 

from tutorial.items import DmozItem 

class DmozSpider(scrapy.Spider): 
    name = "dmoz" 
    allowed_domains = ["lib-web.org"] 
    start_urls = [ 
     "http://www.lib-web.org/united-states/public-libraries/michigan/" 
    ] 

    def __init__(self, *args, **kwargs): 
     super(DmozSpider, self).__init__(*args, **kwargs) 

     self.count = 0 

    def increment(self): 
     self.count += 1 

    def getCount(self): 
     return self.count 

    def parse(self, response): 
     self.increment() 

     for sel in response.xpath('//div/div/div/ul/li'): 
      item = DmozItem() 
      item['title'] = sel.xpath('a/text()').extract() 
      item['link'] = sel.xpath('a/@href').extract() 
      item['desc'] = sel.xpath('p/text()').extract() 
      x = self.getCount() 
      print(x)

      yield item 

You could also define count as a property.
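A rough sketch of the property approach, using a plain class rather than a real spider. `PageCounter` and the `_count` backing attribute are names made up for this example.

```python
class PageCounter(object):  # hypothetical stand-in for the spider class
    def __init__(self):
        self._count = 0  # backing attribute; the underscore marks it internal

    @property
    def count(self):
        """Read-only view of the counter."""
        return self._count

    def increment(self):
        self._count += 1


counter = PageCounter()
counter.increment()
counter.increment()
current = counter.count  # attribute-style access, no parentheses

try:
    counter.count = 5  # no setter is defined, so assignment is rejected
    assign_error = None
except AttributeError as exc:
    assign_error = exc
```

With this shape, getCount() becomes unnecessary: callers read `counter.count` directly, and only increment() can change the value.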

I'll have to read up on __init__ and self. Thanks for the guidance. This is what I needed. – ryan71
