Scrapy DEFAULT_REQUEST_HEADERS does not work

I changed the default request headers in settings.py as follows:

DEFAULT_REQUEST_HEADERS = { 
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36', 
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 
    'Accept-Encoding': 'gzip, deflate, sdch', 
    'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4', 
}

However, this does not work in my HotSpider. I can see that scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware is enabled, but the connection is closed outright, as if the headers had never been set.

Here is HotSpider:

# -*- coding: utf-8 -*-
import scrapy


class HotSpider(scrapy.Spider):
    name = "hot"
    allowed_domains = ["qiushibaike.com"]
    start_urls = (
        'http://www.qiushibaike.com/hot',
    )

    def parse(self, response):
        # Log the HTTP status code of the response
        self.logger.info('Response status: %s', response.status)

If I change the code to override make_requests_from_url and set the headers there, everything works fine.

# -*- coding: utf-8 -*-
import scrapy


class HotSpider(scrapy.Spider):
    name = "hot"
    allowed_domains = ["qiushibaike.com"]
    start_urls = (
        'http://www.qiushibaike.com/hot',
    )
    # Headers attached explicitly to every start request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
    }

    def make_requests_from_url(self, url):
        # Build each start request with the headers defined above
        return scrapy.http.Request(url, headers=self.headers)

    def parse(self, response):
        # Log the HTTP status code of the response
        self.logger.info('Response status: %s', response.status)

This issue will be fixed in Scrapy 1.2, according to "prioritize default headers over user agent middlewares" #2091.


2016-07-04 Leonard Peng

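I can see that the User-Agent header is indeed not set correctly when you rely on the default headers middleware, and this particular website refuses connections that do not carry a User-Agent header it expects.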
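To make the likely mechanism concrete, here is a minimal sketch in plain Python, not Scrapy's actual code. It assumes that both middlewares only fill in headers that are still missing, and that in the affected versions the user agent middleware ran before the default headers middleware, which is what the title of #2091 suggests. The function names and the `Scrapy/1.1` string are illustrative placeholders.

```python
# Simplified model, NOT Scrapy's real implementation: each "middleware"
# only fills in headers that are still missing, like dict.setdefault.

SCRAPY_DEFAULT_UA = "Scrapy/1.1 (+http://scrapy.org)"  # illustrative value


def user_agent_middleware(headers, user_agent=SCRAPY_DEFAULT_UA):
    """Set User-Agent only if the request does not carry one yet."""
    headers.setdefault("User-Agent", user_agent)
    return headers


def default_headers_middleware(headers, defaults):
    """Fill in each default header only if it is still missing."""
    for name, value in defaults.items():
        headers.setdefault(name, value)
    return headers


def build_headers(defaults, default_headers_first):
    """Run the two hypothetical middlewares in the requested order."""
    headers = {}
    if default_headers_first:
        default_headers_middleware(headers, defaults)
        user_agent_middleware(headers)
    else:
        user_agent_middleware(headers)
        default_headers_middleware(headers, defaults)
    return headers


DEFAULTS = {
    "User-Agent": "Mozilla/5.0 (example browser)",
    "Accept": "text/html",
}

old_order = build_headers(DEFAULTS, default_headers_first=False)
new_order = build_headers(DEFAULTS, default_headers_first=True)

print(old_order["User-Agent"])  # Scrapy/1.1 (+http://scrapy.org)
print(new_order["User-Agent"])  # Mozilla/5.0 (example browser)
```

Under these assumptions, whichever middleware runs first wins the User-Agent slot, while headers such as Accept still come from the defaults. That would explain why only the user agent appears to be ignored.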

The recommended way to set the user agent for your crawl is the USER_AGENT setting key.

For example:

# settings.py 
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36"

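There may be a bug in Scrapy when the user agent is set through the default headers, or perhaps it is intended, and documented somewhere, that the user agent is not picked up from there. You would need to research this further; if it really is a bug, it is worth filing a bug report in the Scrapy GitHub repository.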
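Putting this advice together with the settings from the question, one possible settings.py keeps the user agent in USER_AGENT and leaves only the remaining headers in DEFAULT_REQUEST_HEADERS. This is a sketch that reuses the values from the question, not a verified fix for every Scrapy version.

```python
# settings.py (sketch): the user agent lives in USER_AGENT,
# the other headers stay in DEFAULT_REQUEST_HEADERS.

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/51.0.2704.84 Safari/537.36"
)

DEFAULT_REQUEST_HEADERS = {
    # No 'User-Agent' entry here: it is supplied by USER_AGENT above.
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4",
}
```

Keeping the user agent out of the dictionary avoids having two settings compete for the same header.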


2016-07-04 14:07:37

Thanks for your answer! Setting the user agent the way you suggested works fine. In the docs I found [User-Agent](http://doc.scrapy.org/en/latest/topics/settings.html#user-agent) and [DefaultHeadersMiddleware](http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware). Based on the docs, I think this is a bug.
