Scrapy: Why should I use yield for multiple requests?

1) Log in
2) Multiple requests
3) Synchronous requests (sequential, as in C)

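I realized that 'yield' should be used for multiple requests.
But I think 'yield' works differently from C and is not sequential.
So I want to issue requests without 'yield', as shown below.
But then the crawl method is not called.
How can I call the crawl method sequentially, as in C?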

import scrapy
from scrapy.http import FormRequest, Request


class HotdaySpider(scrapy.Spider):

    name = "hotday"
    allowed_domains = ["test.com"]
    login_page = "http://www.test.com"
    start_urls = ["http://www.test.com"]

    maxnum = 27982
    runcnt = 10

    def parse(self, response):
        # Submit the login form, then continue in after_login.
        return [FormRequest.from_response(response, formname='login_form', formdata={'id': 'id', 'password': 'password'}, callback=self.after_login)]

    def after_login(self, response):
        i = 0

        while i < self.runcnt:
            # The request in question: created without 'yield' or 'return'
            Request(url="http://www.test.com/view.php?idx=" + str(self.maxnum) + "/", callback=self.crawl)
            i = i + 1

    def crawl(self, response):
        filename = 'hotday.html'

        # Save the page body re-encoded as UTF-8.
        with open(filename, 'wb') as f:
            f.write(response.body.decode(response.encoding).encode('utf-8'))
        self.maxnum = self.maxnum + 1

Source

2015-07-21 kevink

Related (but not a duplicate): http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do-in-python?rq=1 – NightShadeQueen

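When you return a list of requests (which is effectively what you do when you yield many of them), Scrapy schedules them all, and you cannot control the order in which the responses arrive.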
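To illustrate why a request created without yield or return never reaches the crawl callback, here is a minimal sketch in plain Python. It does not use Scrapy: FakeRequest and collect are hypothetical stand-ins, assuming only that the framework iterates over whatever the callback hands back.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request; it only records a URL."""
    url: str


def collect(callback: Callable[[], Optional[Iterable[FakeRequest]]]) -> List[FakeRequest]:
    """Mimic the framework side: iterate over whatever the callback hands back."""
    result = callback()
    return [] if result is None else list(result)


BASE = "http://www.test.com/view.php?idx="


def without_yield():
    # The request objects are created and immediately discarded,
    # so the function returns None and nothing can be scheduled.
    for i in range(3):
        FakeRequest(BASE + str(i))


def with_yield():
    # Each request is handed to the caller as the generator is consumed.
    for i in range(3):
        yield FakeRequest(BASE + str(i))


def with_list():
    # Returning a list gives the caller the same requests in one go.
    return [FakeRequest(BASE + str(i)) for i in range(3)]


print(len(collect(without_yield)))                 # 0
print(len(collect(with_yield)))                    # 3
print(collect(with_yield) == collect(with_list))   # True
```

In this sketch, yielding and returning a list are equivalent from the caller's point of view. Neither says anything about the order in which a real downloader would finish the requests.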

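If you want to process one response at a time, in order, return only one request from the after_login method and construct the next request in the crawl method.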

import re

from scrapy.http import Request

# These methods replace after_login and crawl inside HotdaySpider.

def after_login(self, response):
    # Return a single request; crawl builds each follow-up itself.
    return Request(url="http://www.test.com/view.php?idx=0/", callback=self.crawl)

def crawl(self, response):
    filename = 'hotday.html'

    with open(filename, 'wb') as f:
        f.write(response.body.decode(response.encoding).encode('utf-8'))
    self.maxnum = self.maxnum + 1
    # Read the current index from the URL and request the next one.
    next_page = int(re.search(r'\?idx=(\d+)', response.request.url).group(1)) + 1
    if next_page < self.runcnt:
        return Request(url="http://www.test.com/view.php?idx=" + str(next_page) + "/", callback=self.crawl)
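The chaining idea can be checked without Scrapy. The following is a rough sketch under stated assumptions: FakeRequest, run, and RUNCNT are hypothetical names, the toy engine passes the URL to the callback instead of a response, and the URLs drop the trailing slash to keep the parsing short.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional

RUNCNT = 3  # assumed small count for the demo (the question uses 10)


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: a URL plus a callback."""
    url: str
    callback: Callable[[str], Optional["FakeRequest"]]


visited: List[str] = []


def crawl(url: str) -> Optional[FakeRequest]:
    """Record the page, then hand back the request for the next index, if any."""
    visited.append(url)
    next_idx = int(url.rsplit("=", 1)[1]) + 1
    if next_idx < RUNCNT:
        return FakeRequest("http://www.test.com/view.php?idx=%d" % next_idx, crawl)
    return None  # the chain ends here


def run(first: FakeRequest) -> None:
    """Toy engine: only one request is ever pending, so the order is forced."""
    pending = deque([first])
    while pending:
        request = pending.popleft()
        follow_up = request.callback(request.url)
        if follow_up is not None:
            pending.append(follow_up)


run(FakeRequest("http://www.test.com/view.php?idx=0", crawl))
print(visited)
```

Because each callback produces at most one follow-up request, the pages are visited strictly in index order. The trade-off is that nothing is fetched concurrently.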

Source

2015-07-21 20:03:07 lufte
