2012-08-12

How to get the web-crawling gem Cobweb working in Rails

I found a gem I'd like to use, but I can't get it to work. The gem is called Cobweb, and it looks really nice.

I already have Redis and Resque working, and I created a new queue intended to run Cobweb.

class Crawler
  @queue = :crawler_queue

  def self.perform(site_id)
    site = Site.find_by_id(site_id)
    # The gem's class is spelled Cobweb, not CobWeb
    crawler = Cobweb.new(follow_redirects: false, internal_urls: true)
    crawler.start(site.homepage)

    # I'm ultimately interested in getting a list of URLs, but at this stage
    # I just want to see what data I get back from the crawler.
    puts crawler
  end
end
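Resque's contract for a job class is small: the class names its queue in `@queue`, `Resque.enqueue(Crawler, site.id)` stores the class name and arguments as JSON in Redis, and a worker later calls `Crawler.perform(*args)`. The sketch below imitates that round trip in memory so it runs without Redis. `InMemoryQueue` is a hypothetical stand-in, not part of Resque, and the crawl itself is stubbed out.

```ruby
require "json"

# Hypothetical in-memory stand-in for a Resque queue stored in Redis.
class InMemoryQueue
  def initialize
    @jobs = Hash.new { |hash, key| hash[key] = [] }
  end

  # Serialise the job the way Resque does: {"class" => ..., "args" => [...]}
  def enqueue(klass, *args)
    queue = klass.instance_variable_get(:@queue)
    @jobs[queue] << JSON.generate("class" => klass.name, "args" => args)
  end

  # Pop one job, look up its class by name and call perform(*args).
  # Returns nil when the queue is empty.
  def work_one(queue)
    payload = @jobs[queue].shift or return nil
    job = JSON.parse(payload)
    Object.const_get(job["class"]).perform(*job["args"])
  end
end

# Same shape as the Crawler job above, with the crawl replaced by a stub.
class Crawler
  @queue = :crawler_queue

  def self.perform(site_id)
    "would crawl site #{site_id}"
  end
end

queue = InMemoryQueue.new
queue.enqueue(Crawler, 42)
puts queue.work_one(:crawler_queue) # prints "would crawl site 42"
```

Because arguments go through JSON, passing `site_id` rather than the `Site` record itself, as the job above does, is the right choice.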

The problem is that when I run the rake task for the queue, I get the error below. I'm not sure how to fix it. Any suggestions?

rake resque:work QUEUE='*' --trace 
** Invoke resque:work (first_time) 
** Invoke resque:preload (first_time) 
** Invoke resque:setup (first_time) 
** Invoke environment (first_time) 
** Execute environment 
** Execute resque:setup 
** Execute resque:preload 
** Invoke resque:setup 
** Execute resque:work 
rake aborted! 
tried to create Proc object without a block 
/Users/bendowney/.rvm/gems/[email protected]/gems/sinatra-1.3.2/lib/sinatra/base.rb:1197:in `define_method' 
/Users/bendowney/.rvm/gems/[email protected]/gems/sinatra-1.3.2/lib/sinatra/base.rb:1197:in `generate_method' 
/Users/bendowney/.rvm/gems/[email protected]/gems/sinatra-1.3.2/lib/sinatra/base.rb:1206:in `compile!' 
/Users/bendowney/.rvm/gems/[email protected]/gems/sinatra-1.3.2/lib/sinatra/base.rb:1186:in `route' 
/Users/bendowney/.rvm/gems/[email protected]/gems/sinatra-1.3.2/lib/sinatra/base.rb:1168:in `get' 
/Users/bendowney/.rvm/gems/[email protected]/gems/sinatra-1.3.2/lib/sinatra/base.rb:1602:in `block (2 levels) in delegate' 
/Users/bendowney/.rvm/gems/[email protected]/gems/redis-namespace-1.2.1/lib/redis/namespace.rb:257:in `method_missing' 
/Users/bendowney/.rvm/gems/[email protected]/gems/resque-1.21.0/lib/resque/worker.rb:444:in `job' 
/Users/bendowney/.rvm/gems/[email protected]/gems/resque-1.21.0/lib/resque/worker.rb:377:in `unregister_worker' 
/Users/bendowney/.rvm/gems/[email protected]/gems/resque-1.21.0/lib/resque/worker.rb:159:in `ensure in work' 
/Users/bendowney/.rvm/gems/[email protected]/gems/resque-1.21.0/lib/resque/worker.rb:159:in `work' 
/Users/bendowney/.rvm/gems/[email protected]/gems/resque-1.21.0/lib/resque/tasks.rb:34:in `block (2 levels) in <top (required)>' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `call' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `block in execute' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `each' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `execute' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain' 
/Users/bendowney/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/task.rb:151:in `invoke_with_call_chain' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/task.rb:144:in `invoke' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:116:in `invoke_task' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block (2 levels) in top_level' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `each' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block in top_level' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:88:in `top_level' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:66:in `block in run' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/lib/rake/application.rb:63:in `run' 
/Users/bendowney/.rvm/gems/[email protected]/gems/rake-0.9.2.2/bin/rake:33:in `<top (required)>' 
/Users/bendowney/.rvm/gems/[email protected]/bin/rake:19:in `load' 
/Users/bendowney/.rvm/gems/[email protected]/bin/rake:19:in `<main>' 
/Users/bendowney/.rvm/gems/[email protected]/bin/ruby_noexec_wrapper:14:in `eval' 
/Users/bendowney/.rvm/gems/[email protected]/bin/ruby_noexec_wrapper:14:in `<main>' 
Tasks: TOP => resque:work 
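Reading the trace from the bottom up, Resque's `Worker#job` appears to ask Redis for a key with `get` (through redis-namespace's `method_missing`), but the call lands in Sinatra's delegated top-level `get` route helper, which then calls `define_method` with no block. The self-contained sketch below reproduces that kind of name clash with hypothetical stand-ins (`RouteDsl`, `NamespaceProxy`, `MemoryStore`). It is not Sinatra's or redis-namespace's actual code, only an illustration of how a private `get` mixed into every object, combined with a `send`-based `method_missing`, can raise "tried to create Proc object without a block".

```ruby
# Hypothetical route DSL: like a web framework's top-level `get`,
# it compiles the given block into a method with define_method.
module RouteDsl
  def get(_path, &block)
    route_class = Class.new
    # With no block (and no Method object), define_method raises
    # ArgumentError: tried to create Proc object without a block
    route_class.send(:define_method, :call, &block)
    route_class
  end
  private :get
end

# A top-level include puts the DSL into Object, so every object
# now has a private `get`.
Object.send(:include, RouteDsl)

# Hypothetical proxy that forwards calls it does not handle itself.
class NamespaceProxy
  def initialize(target)
    @target = target
  end

  # Calling the private `get` with an explicit receiver lands here;
  # `send` can reach private methods, so it finds the DSL's `get`.
  def method_missing(name, *args, &block)
    @target.send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

# A store that does not define its own `get`.
class MemoryStore
  def initialize
    @data = {}
  end
end

def demo
  proxy = NamespaceProxy.new(MemoryStore.new)
  proxy.get("worker:1") # meant as a key lookup; no block is given
rescue ArgumentError => e
  e.message
end

puts demo
```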

Answer


I can't seem to pin it down exactly, but I've just released a new version of Cobweb, v0.0.64, and created a new Rails application with sample code that runs it.

You can find it at http://github.com/stewartmckee/cobweb_sample

Compared with the sample application, the only thing I can see above that is clearly wrong is that :internal_urls should be an array, for example:

internal_urls: ["http://www.google.com/folder1/", "http://www.google.com/folder2/", "http://www.otherdomain.com/*"]

Only URLs matching these patterns will be processed.
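Cobweb's own matching code isn't shown here, so the following is only a hypothetical approximation of how a list like the one above could be applied: each entry is treated as a URL prefix, and `*` as a wildcard for any run of characters. `internal_url?` is an illustrative helper, not part of Cobweb's API.

```ruby
# Hypothetical helper: true if url starts with one of the patterns,
# where "*" in a pattern matches any sequence of characters.
def internal_url?(url, patterns)
  patterns.any? do |pattern|
    # Escape regex metacharacters, then turn the escaped "\*" into ".*"
    regex = Regexp.new("\\A" + Regexp.escape(pattern).gsub("\\*", ".*"))
    url.match?(regex)
  end
end

patterns = [
  "http://www.google.com/folder1/",
  "http://www.google.com/folder2/",
  "http://www.otherdomain.com/*"
]

puts internal_url?("http://www.google.com/folder1/index.html", patterns) # prints true
puts internal_url?("http://www.google.com/search", patterns)             # prints false
```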

Take a look at the sample site and try running it in your environment, to rule out an environment problem.
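One quick environment check, run for example with `rails runner` in the app that fails, is to ask Ruby whether some library has added a `get` method to `Object` itself; the trace in the question shows a Sinatra route helper named `get` being reached from inside Resque's Redis calls. The helper name `top_level_get_owner` is hypothetical; `Module#method_defined?`, `Module#private_method_defined?` and `Module#instance_method` are standard Ruby.

```ruby
# Hypothetical helper: returns the module that gave every object a `get`
# method (public or private), or nil if Object has no such method.
def top_level_get_owner
  has_get = Object.method_defined?(:get) || Object.private_method_defined?(:get)
  has_get ? Object.instance_method(:get).owner : nil
end

owner = top_level_get_owner
puts(owner ? "Object#get comes from #{owner}" : "no top-level get defined")
```

A `nil` result means nothing has mixed a `get` into every object; a module name, such as a Sinatra delegator, would be consistent with the clash in the trace.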

Stewart