Parallel HTTP requests in Ruby

I have an array of URLs, and I want to open each one and fetch a specific tag.
But I want to do this in parallel.

Here is pseudocode for what I want to do:

 
urls = [...] 
tags = [] 
urls.each do |url|
    fetch_tag_asynchronously(url) do |tag|
        tags << tag
    end
end
wait_for_all_requests_to_finish()

It would be awesome if this could be done in a nice and safe way.
I could use threads, but it doesn't look like arrays are thread-safe in Ruby.

Source

2012-01-08 Nicklas A.

You can achieve thread safety with a Mutex:

require 'thread' # for Mutex 

urls = %w(
    http://test1.example.org/ 
    http://test2.example.org/ 
    ... 
) 

threads = [] 
tags = [] 
tags_mutex = Mutex.new 

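urls.each do |url|
    # One thread per URL; url and tags are passed in as block arguments.
    threads << Thread.new(url, tags) do |url, tags|
        tag = fetch_tag(url)
        # Only one thread at a time may append to the shared array.
        tags_mutex.synchronize { tags << tag }
    end
end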

threads.each(&:join)
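
As a possible alternative to sharing an array at all, each thread can simply return its result, and Thread#value (which waits for the thread and returns the value of its block) collects the results. This is only a minimal sketch and not part of the original answer: `parallel_map` is a hypothetical helper name, and `fetch_tag` is still assumed to be defined elsewhere.

```ruby
# Hypothetical helper: runs the block for every item in its own thread
# and returns the block results in the same order as the input.
def parallel_map(items, &block)
  threads = items.map do |item|
    Thread.new(item) { |i| block.call(i) }
  end
  # Thread#value joins the thread and returns its block's result.
  threads.map(&:value)
end

# Usage with the (assumed) fetch_tag helper:
#   tags = parallel_map(urls) { |url| fetch_tag(url) }
```

Since no thread writes to shared state here, no Mutex is needed. Note that an exception raised inside a thread is re-raised when its value is read.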

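However, spawning a new thread for every single URL may be counterproductive, so limiting the number of threads like this will probably perform better: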

THREAD_COUNT = 8 # tweak this number for maximum performance. 

tags = [] 
mutex = Mutex.new 

THREAD_COUNT.times.map {
    Thread.new(urls, tags) do |urls, tags|
        # Each worker takes URLs off the shared list until it is empty.
        while (url = mutex.synchronize { urls.pop })
            tag = fetch_tag(url)
            mutex.synchronize { tags << tag }
        end
    end
}.each(&:join)
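
The same bounded-worker idea could also be sketched with Ruby's built-in thread-safe Queue, which removes the explicit Mutex and leaves the input array untouched. This is a sketch under assumptions rather than the answer's own method: `pooled_map` is a hypothetical helper name, `fetch_tag` is assumed to exist, and Queue#close requires Ruby 2.3 or later.

```ruby
# Hypothetical helper: processes items with a fixed number of worker threads
# and returns the block results in input order.
def pooled_map(items, worker_count: 8, &block)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [index, item] }
  jobs.close # once closed and drained, Queue#pop returns nil

  workers = Array.new(worker_count) do
    Thread.new do
      local = [] # private to this thread, so no locking is needed
      while (job = jobs.pop)
        index, item = job
        local << [index, block.call(item)]
      end
      local
    end
  end

  # Merge the per-thread results and restore the original order.
  workers.flat_map(&:value).sort_by(&:first).map(&:last)
end

# Usage with the (assumed) fetch_tag helper:
#   tags = pooled_map(urls, worker_count: 8) { |url| fetch_tag(url) }
```

Each worker keeps its own result list and hands it back through Thread#value, so the Queue is the only object shared between threads.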

Source

2012-01-08 15:47:09

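Haha, this is exactly the same solution I wrote! :) – 2012-01-08 15:52:34

If most of the work is IO, the number of cores shouldn't matter, should it? – ben 2012-01-08 15:59:21

@ben: True. However, having too many threads and open connections at the same time will probably be counterproductive. For HTTP pipelining, Firefox uses a default of 8 connections, so I now use that as the suggested value (instead of 5). – 2012-01-08 16:03:10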

Thanks to Ruby's GIL this should be safe, according to my reading of http://merbist.com/2011/02/22/concurrency-in-ruby-explained/ and other links.

Source

2012-01-08 15:58:24 ben

The Typhoeus/Hydra gem combo is designed to do this very easily. It is very convenient and powerful.

Source

2012-01-09 01:26:13

Not thread-safe – Imnl 2017-08-07 14:40:28
