2010-10-06

How can I import a large amount of data into Rails?

To load small amounts of data from CSV into Rails, I have been using a rake task:

desc "Import users."
task :import_users => :environment do
    # File.foreach streams the file line by line and closes it when done.
    File.foreach("users.txt") do |line|
        name, age, profession = line.strip.split("\t")
        User.create(:name => name, :age => age, :profession => profession)
    end
end

For large files (about 50,000 records), though, this is incredibly slow. Is there a faster way to import the data?

Answers

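Without any extra library, you can add a little concurrency and take advantage of a multi-core machine. (I agree that a bulk import with AR Extensions should be faster, although AR Extensions skips model validations.)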

# Returns the number of processors on Linux, OS X or Windows.
def number_of_processors
    if RUBY_PLATFORM =~ /linux/
        # Count only the lines that start with "processor".
        return `grep -c '^processor' /proc/cpuinfo`.to_i
    elsif RUBY_PLATFORM =~ /darwin/
        return `sysctl -n hw.logicalcpu`.to_i
    elsif RUBY_PLATFORM =~ /mswin|mingw/
        # This works for Windows 2000 or greater.
        require 'win32ole'
        wmi = WIN32OLE.connect("winmgmts://")
        wmi.ExecQuery("select * from Win32_ComputerSystem").each do |system|
            begin
                processors = system.NumberOfLogicalProcessors
            rescue
                processors = 0
            end
            return [system.NumberOfProcessors, processors].max
        end
    end
    raise "can't determine 'number_of_processors' for '#{RUBY_PLATFORM}'"
end

desc "Import users."
task :fork_import_users => :environment do
    procs = number_of_processors
    lines = IO.readlines("users.txt")
    # Ceiling division, so every line lands in exactly one slice.
    slice_size = [(lines.size.to_f / procs).ceil, 1].max
    lines.each_slice(slice_size) do |subset|
        fork do
            # Each child process should open its own database connection.
            ActiveRecord::Base.establish_connection
            subset.each do |line|
                name, age, profession = line.strip.split("\t")
                User.create(:name => name, :age => age, :profession => profession)
            end
        end
    end
    # Wait for all child processes to finish.
    Process.waitall
end

On my machine, which has 2 cores, the fork version gives me:

real 1m41.974s 
user 1m32.629s 
sys  0m7.318s 

while the single-process version gives:

real 2m56.401s 
user 1m21.953s 
sys  0m7.529s 
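
The way the lines are divided among the forked processes can be checked without Rails or a database. The following is only a sketch: the helper name `split_for_workers` and the sample data are made up for illustration and are not part of the rake task above.

```ruby
# Split +items+ into at most +workers+ contiguous slices of near-equal size.
# (Hypothetical helper, for illustration only.)
def split_for_workers(items, workers)
  return [] if items.empty?
  # Ceiling division, so no item is left over after the last slice.
  slice_size = (items.size.to_f / workers).ceil
  items.each_slice(slice_size).to_a
end

# Ten fake tab-separated user lines, split for three workers.
lines = (1..10).map { |i| "user#{i}\t#{20 + i}\tengineer\n" }
slices = split_for_workers(lines, 3)

puts slices.map(&:size).inspect   # prints [4, 4, 2]
```

Each slice would then be handed to one forked process. The slices are contiguous and cover every line exactly once, so no record is imported twice.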
ar-extensions (and its Rails 3 replacement, activerecord-import) does not have to skip model validations. That is optional, depending on your needs and speed preference. – 2011-04-29 04:24:40
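
As a rough sketch of what a bulk import could look like: the helpers below only prepare the rows in plain Ruby, and the actual `User.import` call is shown in a comment because it assumes the activerecord-import gem is installed and a `User` model with these three columns exists. The helper names and the batch size of 1000 are arbitrary choices for illustration.

```ruby
COLUMNS = [:name, :age, :profession]

# Turn tab-separated lines into arrays of column values.
def rows_from(lines)
  lines.map { |line| line.strip.split("\t") }
end

# Yield the rows in batches, so that each batch can be imported in one call.
def each_batch(rows, batch_size = 1000)
  rows.each_slice(batch_size) { |batch| yield batch }
end

# Inside a rake task, assuming activerecord-import is loaded:
#
#   each_batch(rows_from(IO.readlines("users.txt"))) do |batch|
#     # :validate => false would skip model validations for more speed.
#     User.import(COLUMNS, batch, :validate => true)
#   end
```

Whether to keep validations on is the trade-off the comment above describes: `:validate => true` is safer, `:validate => false` is faster.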

You should try FasterCSV. It has been very fast and easy to use for me.
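
A minimal sketch of parsing the tab-separated data this way. FasterCSV became the standard `csv` library in Ruby 1.9, so this example uses `CSV`; on Ruby 1.8 with the fastercsv gem, the equivalent call would be on `FasterCSV` instead. The sample data is made up.

```ruby
require 'csv'

# Two fake records in the same tab-separated layout as users.txt.
data = "alice\t30\tengineer\nbob\t25\tdesigner\n"

# col_sep tells the parser that fields are separated by tabs, not commas.
rows = CSV.parse(data, col_sep: "\t")

rows.each do |name, age, profession|
  puts "#{name} (#{age}): #{profession}"
end
```

For a file on disk, `CSV.foreach("users.txt", col_sep: "\t")` reads one row at a time instead of loading everything into memory.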