
The spider_closed() function is not executing. If I put only print statements in it, they get printed, but if I make any function calls and return values, it does not work. How can I execute a function after all the crawling is done in Scrapy?

import scrapy 
import re 
from pydispatch import dispatcher 
from scrapy import signals 

from SouthShore.items import Product 
from SouthShore.internalData import internalApi 
from scrapy.http import Request 

class bestbuycaspider(scrapy.Spider): 
    name = "bestbuy_dca" 

    allowed_domains = ["bestbuy.ca"] 

    start_urls = ["http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+beds", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+night+stand", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+headboard", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+desk", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+bookcase", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+dresser", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+tv+stand", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+armoire", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+kids", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+changing+table", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+baby"] 

    def __init__(self, jsondetails="", serverdetails="", *args, **kwargs):
        super(bestbuycaspider, self).__init__(*args, **kwargs)
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
        self.jsondetails = jsondetails
        self.serverdetails = serverdetails
        self.data = []

    def parse(self, response):
        # my stuff here
        pass

    def spider_closed(self, spider):
        print "returning values"
        self.results = internalApi(self.jsondetails, self.serverdetails)
        self.results['extractedData'] = self.data
        print self.results
        yield self.results
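A likely reason the handler's body never runs: the yield turns spider_closed into a generator function, so when the signal fires, calling it merely creates a generator object and none of the statements execute. A minimal sketch of the handler without yield (assuming internalApi returns a dict):

def spider_closed(self, spider):
    # Without yield this is a plain method again, so the body actually
    # executes when the spider_closed signal fires.
    self.results = internalApi(self.jsondetails, self.serverdetails)
    self.results['extractedData'] = self.data
    # Scrapy ignores the return value of a signal handler, so yielding
    # or returning items from here has no effect.
    print(self.results)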

1) I want to call some functions and return the scraped values.


So you want to keep crawling inside 'spider_closed'? Yield items or requests? – eLRuLL


No, I want to return the crawled items after the spider closes and call another function in a different .py file so that it performs some operations and gives back some values. I need to append my crawled values and the called function's output together and return them. –


Scrapy items are not stored in memory; they are output the moment you call 'yield item'. If you want to process each item as it is output, you will have to use a pipeline, but using pipelines only once the spider has finished is quite bad practice (because you have to store the items yourself). – eLRuLL
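To make that comment concrete, here is a minimal sketch (class and attribute names are illustrative, not from the original project) of a pipeline that stores items itself as they are emitted:

class CollectPipeline(object):
    def open_spider(self, spider):
        # called once when the spider starts
        self.items = []

    def process_item(self, item, spider):
        # called for every item the spider yields
        self.items.append(item)
        return item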

Answer


You can create an Item Pipeline with a close_spider() method:

class MyPipeline(object):
    def close_spider(self, spider):
        do_something_here()

Just don't forget to activate it in settings.py, as described in the Item Pipeline documentation linked above.
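For example (assuming the SouthShore project layout implied by the imports above, with the pipeline defined in SouthShore/pipelines.py):

# settings.py
ITEM_PIPELINES = {
    'SouthShore.pipelines.MyPipeline': 300,  # lower number runs earlier in the chain
}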


Apologies, I'm new to Scrapy. Do I need to create the pipeline class and the close_spider() function in the pipelines.py file, or can I just rename the class in my spider file itself? –


If I do need to create the class and function in the pipelines.py file, my doubts are: 1) How do I import that pipeline class into my spider file, or is it picked up automatically? 2) How do I pass the crawled values to the close_spider() function in the pipelines.py file? –
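For what it's worth, a hedged sketch covering both doubts: the pipeline class is never imported by the spider; once it is listed in ITEM_PIPELINES, Scrapy instantiates it and routes every yielded item through process_item(), and spider attributes set in __init__ stay reachable through the spider argument:

# SouthShore/pipelines.py (sketch; the internalApi call is assumed from the question)
from SouthShore.internalData import internalApi

class MyPipeline(object):
    def open_spider(self, spider):
        self.items = []

    def process_item(self, item, spider):
        self.items.append(item)  # crawled values arrive here, one by one
        return item

    def close_spider(self, spider):
        # runs after the crawl finishes; spider attributes are available
        results = internalApi(spider.jsondetails, spider.serverdetails)
        results['extractedData'] = self.items
        print(results)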