Scrapy with a fixed list of URLs

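Trying to wrap my head around this... I have a fixed list of 100,000 URLs that I want to scrape, which is fine; I know how to handle that. But first I need to get a cookie from an initial form POST and use it for the subsequent requests. Would this be something like a nested spider? I'm trying to understand the architecture for this use case.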

Thanks!

2014-12-04 Honus Wagner

Scrapy handles cookies automatically.
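If you want to confirm that the login cookie is actually being sent with later requests, a settings fragment along these lines may help. It is only a sketch: `COOKIES_ENABLED` and `COOKIES_DEBUG` are Scrapy settings, but the dictionary name `COOKIE_SETTINGS` is made up for illustration.

```python
# Hypothetical settings fragment: the name COOKIE_SETTINGS is an assumption.
# The keys are Scrapy settings; merge them into settings.py or a spider's
# custom_settings class attribute.
COOKIE_SETTINGS = {
    # Cookie handling is on by default; listed here only to make it explicit.
    "COOKIES_ENABLED": True,
    # Log the cookies sent in requests and received in responses (off by default).
    "COOKIES_DEBUG": True,
}
```

One way to apply it would be `custom_settings = COOKIE_SETTINGS` on the spider class; the debug flag is probably worth turning off again before running all 100,000 requests, since it is verbose.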

You just need to submit the form POST first, then issue the requests for the 100,000 URLs.

import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = (
        'https://example.com/login',  # login page
    )

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.url_list = []  # your list of URLs

    def parse(self, response):
        data = {}  # form fields to submit with the login form

        return scrapy.FormRequest.from_response(
            response,
            formdata=data,
            callback=self.my_start_requests
        )

    def my_start_requests(self, response):
        # ignore the login callback response
        for url in self.url_list:
            # Scrapy takes care of the cookies
            yield scrapy.Request(url, callback=self.parse_item, dont_filter=True)

    def parse_item(self, response):
        # your code here
        pass
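The spider above leaves `url_list` empty. One possible way to fill it, assuming the 100,000 URLs live in a plain text file with one URL per line, is a small loader like the sketch below. The function name `load_urls` and the file name `urls.txt` are hypothetical and not part of the original answer.

```python
from pathlib import Path
from typing import Iterator


def load_urls(path: str) -> Iterator[str]:
    """Yield unique, non-empty URLs from a text file with one URL per line.

    Hypothetical helper: the one-URL-per-line file format is an assumption.
    """
    seen = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            # Skip blank lines, lines starting with '#', and repeated URLs.
            if not url or url.startswith("#") or url in seen:
                continue
            seen.add(url)
            yield url
```

In `__init__` this could be used as `self.url_list = list(load_urls("urls.txt"))`. Deduplicating here matters because the requests are sent with `dont_filter=True`, which bypasses Scrapy's own duplicate filter.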

2014-12-05 02:40:57 soooooot

Absolutely brilliant. Thank you so much! I needed to edit the syntax of your code slightly, but it's spot on. – 2014-12-05 21:49:10
