2017-06-19 122 views
2

Python Scrapy function call

I am trying to call the getNext() function from the main parse function, the one Scrapy calls, but it never gets called.

class BlogSpider(scrapy.Spider):
    # User agent.
    name = 'Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19'
    start_urls = ['http://www.tricksforums.org/best-free-movie-streaming-sites-to/']

    def getNext(self):
        print("Getting next ... ")
        # Check if next link in DB is valid and crawl.
        try:
            nextUrl = myDb.getNextUrl()
            urllib.urlopen(nextUrl).getcode()
            yield scrapy.Request(nextUrl['link'])
        except IOError as e:
            print("Server can't be reached", e.code)
            yield self.getNext()

    def parse(self, response):
        print("Parsing link: ", response.url)
        # Get all urls for further crawling.
        all_links = hxs.xpath('*//a/@href').extract()
        for link in all_links:
            if validators.url(link) and not myDb.existUrl(link) and not myDb.visited(link):
                myDb.addUrl(link)
        print("Getting next?")
        yield self.getNext()

I tried with and without yield before it.. what is the problem? And what is this yield supposed to be? :)

+0

What do you see printed on the console? – alecxe

+0

'('Parsing link: ', 'http://www.tricksforums.org/best-free-movie-streaming-sites-to/') Getting next?' That's what I get :) – Alessandro

+0

So, you do see "Getting next" printed... which means getNext() is executed, right? Thanks. – alecxe

Answers

1

You are trying to yield a generator, but you meant to yield from a generator.

If you are on Python 3.3+, you can use yield from:

yield from self.getNext() 

Or, simply do return self.getNext().
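
The difference can be seen outside Scrapy with a minimal sketch in plain Python. The names get_next, parse_wrong, parse_yield_from and parse_return are hypothetical stand-ins for the spider's methods, and the string stands in for a scrapy.Request object.

```python
def get_next():
    """Hypothetical stand-in for getNext(): a generator yielding one 'request'."""
    yield "request for next URL"


def parse_wrong():
    # Yields the generator object itself. The body of get_next()
    # does not run unless someone iterates over that object.
    yield get_next()


def parse_yield_from():
    # Delegates to the inner generator and re-yields each of its items.
    yield from get_next()


def parse_return():
    # Not a generator function: it hands back the inner generator,
    # which the caller can iterate directly.
    return get_next()


wrong = list(parse_wrong())
print(type(wrong[0]).__name__)   # generator
print(list(parse_yield_from()))  # ['request for next URL']
print(list(parse_return()))      # ['request for next URL']
```

Calling a generator function only creates a generator object; its body runs when the object is iterated. That is why "Getting next ... " was never printed: parse() yielded the un-iterated generator instead of its items. The yield self.getNext() inside the except branch of getNext() has the same problem.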

+0

Yes, that worked :). But I still haven't got a good handle on it.. – Alessandro

+1

@Alessandro you should have also noticed a message on the console: '2017-06-19 15:42:49 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'generator' in' - please check [this SO topic](https://stackoverflow.com/q/1756096/771848) to learn about generators. Thanks! – alecxe

+1

I had the "--nolog" flag on.. yeah – Alessandro