2017-09-02

How to install a middleware with pip on Scrapinghub

I have a Scrapy project that uses a middleware installed via pip, more specifically scrapy-random-useragent.

Settings file:

# -*- coding: utf-8 -*-

# Scrapy settings for batdongsan project 
# 
# For simplicity, this file contains only settings considered important or 
# commonly used. You can find more settings consulting the documentation: 
# 
#  http://doc.scrapy.org/en/latest/topics/settings.html 
#  http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html 
#  http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html 

BOT_NAME = 'batdongsan' 

SPIDER_MODULES = ['batdongsan.spiders'] 
NEWSPIDER_MODULE = 'batdongsan.spiders' 
FEED_EXPORT_ENCODING = 'utf-8' # write JSON output as human-readable UTF-8
CLOSESPIDER_PAGECOUNT = 10 # limit the number of pages crawled
LOG_LEVEL = 'INFO' # log less

# Obey robots.txt rules 
ROBOTSTXT_OBEY = True 

# Enable or disable downloader middlewares 
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html 
DOWNLOADER_MIDDLEWARES = { 
    #'batdongsan.middlewares.MyCustomDownloaderMiddleware': 543, 
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'random_useragent.RandomUserAgentMiddleware': 400 
} 
USER_AGENT_LIST = "agents.txt" 
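For context, a middleware configured like this roughly reads user agents from the file named in USER_AGENT_LIST and sets a random one on each request. The class below is a hypothetical, self-contained sketch (RandomUserAgentFromFileMiddleware is a made-up name, not the scrapy-random-useragent source) that assumes the file holds one user agent per line. It only uses Scrapy's duck-typed downloader middleware hooks (from_crawler and process_request), so it needs no third-party import.

```python
import random


class RandomUserAgentFromFileMiddleware(object):
    """Hypothetical stand-in for a random user-agent downloader middleware.

    Assumes USER_AGENT_LIST names a text file with one user agent per line.
    """

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("USER_AGENT_LIST file contains no user agents")
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls from_crawler() when it builds the middleware chain.
        path = crawler.settings.get("USER_AGENT_LIST")
        with open(path) as f:
            # Keep non-empty lines, stripped of surrounding whitespace.
            agents = [line.strip() for line in f if line.strip()]
        return cls(agents)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header on every outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # None lets Scrapy continue processing the request
```

If something like this were placed in the project's own middlewares.py (assumed here to be batdongsan/middlewares.py), it would be registered as 'batdongsan.middlewares.RandomUserAgentFromFileMiddleware': 400. A relative path such as "agents.txt" is opened relative to the process's working directory, so it may not resolve the same way on Scrapinghub as it does locally.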

The Scrapy project runs fine on my machine.
I deployed it to Scrapinghub by linking the GitHub repository.
In the Scrapinghub log I got the following error:

  File "/usr/local/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 168, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 172, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1445, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 77, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/local/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
exceptions.ImportError: No module named random_useragent

Clearly the problem is "No module named random_useragent".

But I don't know how to install that module with pip on Scrapinghub.
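The traceback shows Scrapy's load_object importing each class path listed in DOWNLOADER_MIDDLEWARES. A rough way to reproduce that check in a given environment before deploying is to import the module part of every enabled entry. unimportable_middlewares is a hypothetical helper; like Scrapy, it skips entries set to None, since disabled middlewares are never loaded.

```python
import importlib


def unimportable_middlewares(middlewares):
    """Return the enabled class paths whose module cannot be imported here."""
    missing = []
    for clspath, order in middlewares.items():
        if order is None:  # disabled entries are never loaded
            continue
        # "package.module.ClassName" -> "package.module"
        module_name = clspath.rsplit(".", 1)[0]
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(clspath)
    return sorted(missing)
```

Run against this project's DOWNLOADER_MIDDLEWARES in an environment where scrapy-random-useragent is not installed, it should report 'random_useragent.RandomUserAgentMiddleware', matching the ImportError above.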

Have you read https://shub.readthedocs.io/en/stable/deploying.html? –

See my answer at https://stackoverflow.com/a/43427263/4094231 – Umair

Answer

When you link a GitHub repository that has Python dependencies to Scrapinghub, you need two files in the repository root (i.e. at the same level as your scrapy.cfg file):

  • scrapinghub.yml
  • requirements.txt
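The layout requirement above can be checked locally before pushing to GitHub. This is a minimal sketch: missing_deploy_files is a hypothetical helper, and repo_root is assumed to be the directory that contains scrapy.cfg.

```python
import os


def missing_deploy_files(repo_root):
    """Return which deploy files are absent from the repository root."""
    # scrapy.cfg marks the root; the other two must sit next to it.
    required = ["scrapy.cfg", "scrapinghub.yml", "requirements.txt"]
    return [name for name in required
            if not os.path.isfile(os.path.join(repo_root, name))]
```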

Their contents should follow the shub deploy section of the Scrapinghub docs:

scrapinghub.yml:

requirements: 
    file: requirements.txt 

requirements.txt:

scrapy-random-useragent
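Note that requirements.txt lists the pip distribution name (scrapy-random-useragent), while the settings import the module name (random_useragent); the two differ. The sketch below is a hypothetical helper, requirement_names, that collects distribution names from a requirements.txt using PEP 503 name normalization, so you can confirm scrapy-random-useragent is listed. It assumes simple "name" or "name==version" lines and ignores comments and option lines such as -r.

```python
import re


def _normalize(name):
    # PEP 503: lowercase, and collapse runs of "-", "_" and "." into "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(text):
    """Return the normalized distribution names listed in requirements text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(_normalize(match.group(0)))
    return names
```

For example, "scrapy-random-useragent" in requirement_names(open("requirements.txt").read()) should be True for the file shown above.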