Scrapy DEFAULT_REQUEST_HEADERS not working

I changed the default request headers in settings.py as follows:
```python
# settings.py
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
}
```
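However, it does not work in my HotSpider as I expected. I can see that `scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware` is enabled, but the connection is closed outright, just as if the headers had not been set.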
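To see why the headers can appear to be ignored even though the middleware is enabled: both built-in header middlewares only fill in headers that the request does not already have (they use `setdefault`), so whichever runs first wins. The sketch below models this in plain Python, assuming the pre-1.2 default order in which `UserAgentMiddleware` (order 400) runs before `DefaultHeadersMiddleware` (order 550). The dict-based "request", the function names and the Scrapy user-agent string are hypothetical stand-ins, not Scrapy's API.

```python
# Plain-Python model of how Scrapy's header middlewares fill in headers.
# Hypothetical stand-ins: a request's headers are a dict, middlewares are functions.

SCRAPY_DEFAULT_UA = "Scrapy/1.1 (+http://scrapy.org)"  # illustrative default USER_AGENT value

DEFAULT_REQUEST_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4",
}


def user_agent_mw(headers):
    # Like UserAgentMiddleware: sets User-Agent only if it is still missing.
    headers.setdefault("User-Agent", SCRAPY_DEFAULT_UA)


def default_headers_mw(headers):
    # Like DefaultHeadersMiddleware: fills in each default header only if missing.
    for name, value in DEFAULT_REQUEST_HEADERS.items():
        headers.setdefault(name, value)


def run(middlewares):
    headers = {}  # a request created without explicit headers
    for mw in middlewares:
        mw(headers)
    return headers


# Assumed pre-1.2 order: UserAgentMiddleware (400) before DefaultHeadersMiddleware (550).
old = run([user_agent_mw, default_headers_mw])
# Swapped order: DefaultHeadersMiddleware runs first.
new = run([default_headers_mw, user_agent_mw])

print(old["User-Agent"])                        # Scrapy/1.1 (+http://scrapy.org)
print(new["User-Agent"].startswith("Mozilla"))  # True
```

In the first order the custom `User-Agent` is silently dropped while other defaults such as `Accept-Language` are still applied. A site that rejects non-browser user agents would then refuse the request, which would be consistent with the symptom described.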
Here is HotSpider:
```python
# -*- coding: utf-8 -*-
import scrapy


class HotSpider(scrapy.Spider):
    name = "hot"
    allowed_domains = ["qiushibaike.com"]
    start_urls = (
        'http://www.qiushibaike.com/hot',
    )

    def parse(self, response):
        # Print the HTTP status to check whether the request got through
        print('\n', response.status, '\n')
```
If I change the code to override `make_requests_from_url` and set the headers there, everything works fine:
```python
# -*- coding: utf-8 -*-
import scrapy


class HotSpider(scrapy.Spider):
    name = "hot"
    allowed_domains = ["qiushibaike.com"]
    start_urls = (
        'http://www.qiushibaike.com/hot',
    )
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
    }

    def make_requests_from_url(self, url):
        # Attach the headers to each start request directly.
        # (Later Scrapy releases deprecated this hook; overriding
        # start_requests() and yielding scrapy.Request(url, headers=...)
        # is the usual replacement.)
        return scrapy.http.Request(url, headers=self.headers)

    def parse(self, response):
        # Print the HTTP status to check whether the request got through
        print('\n', response.status, '\n')
```
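This workaround succeeds because the headers are attached to the `Request` itself, and setdefault-style middlewares never overwrite a header that is already present, whatever order they run in. A minimal plain-Python model of that, again using hypothetical dict-based stand-ins rather than Scrapy's classes:

```python
from itertools import permutations

BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36")
SCRAPY_DEFAULT_UA = "Scrapy/1.1 (+http://scrapy.org)"  # illustrative default USER_AGENT value


def user_agent_mw(headers):
    # Like UserAgentMiddleware: fills User-Agent only if it is missing.
    headers.setdefault("User-Agent", SCRAPY_DEFAULT_UA)


def default_headers_mw(headers):
    # Like DefaultHeadersMiddleware: fills defaults only if missing
    # (only Accept-Language is modelled here).
    headers.setdefault("Accept-Language", "en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4")


def run(initial_headers, middlewares):
    # Start from the headers passed explicitly, as in Request(url, headers=...).
    headers = dict(initial_headers)
    for mw in middlewares:
        mw(headers)
    return headers


explicit = {"User-Agent": BROWSER_UA}
# Try every middleware order: the explicit User-Agent survives each time.
results = [run(explicit, order)
           for order in permutations([user_agent_mw, default_headers_mw])]
print(all(h["User-Agent"] == BROWSER_UA for h in results))  # True
```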
This issue will be fixed in Scrapy 1.2, according to "prioritize default headers over user agent middlewares" #2091.
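On versions before 1.2, a possible workaround (not taken from the linked issue) is to change the built-in order in settings.py so that `DefaultHeadersMiddleware` runs before `UserAgentMiddleware`. Entries in `DOWNLOADER_MIDDLEWARES` are merged with `DOWNLOADER_MIDDLEWARES_BASE`, and a lower order value means `process_request()` is called earlier. This sketch assumes `UserAgentMiddleware` sits at 400 in those versions; the value 390 is arbitrary and only needs to be below that.

```python
# settings.py (sketch for Scrapy versions before 1.2)
DOWNLOADER_MIDDLEWARES = {
    # Assumption: UserAgentMiddleware has order 400 in the base settings.
    # A lower value makes DefaultHeadersMiddleware run first, so the
    # User-Agent from DEFAULT_REQUEST_HEADERS is set before the default one.
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": 390,
}
```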
Thanks for your answer! The way you suggested for setting the user agent works well. In the documentation I found [USER_AGENT](http://doc.scrapy.org/en/latest/topics/settings.html#user-agent) and [DefaultHeadersMiddleware](http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware). Going by the documentation, I think this is a bug.
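For reference, a minimal settings.py sketch of setting the user agent through the dedicated `USER_AGENT` setting, which `UserAgentMiddleware` reads. It assumes this is roughly what the suggested approach looks like; because `User-Agent` is no longer in `DEFAULT_REQUEST_HEADERS`, the middleware order no longer matters for it.

```python
# settings.py
# UserAgentMiddleware reads USER_AGENT, so the browser string is used
# no matter where DefaultHeadersMiddleware sits in the middleware order.
USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36")

# The remaining headers can stay in DEFAULT_REQUEST_HEADERS.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4",
}
```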