Ruby nested hash with a composite unique key

Given a comma-separated CSV file in the following format:
Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
"Jul 25, 2012","abc123",3,0,0,13855,3287,10568
"Jul 25, 2012","abc230",1,0,0,1192,331,861
"Jul 25, 2012",,7,0,0,10990,2288,8702
"Jul 24, 2012","123456",3,0,0,3530,770,2760
"Jul 24, 2012","abc123",19,1,30,85879,67791,18088
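I want to dump the entire data set (1000 users over 30 days = 30,000 records) into a hash such that key 1 may be duplicated and key 2 may be duplicated, but key 1 and key 2 together will be unique.
Example using row 1 above:
report_hash = "Jul 25, 2012" => "abc123" => {"PageRequest" => 3, "PageViews" => 0, "BrowseTime" => 0, "TotalBytes" => 13855, "BytesReceived" => 3287, "BytesSent" => 10568}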
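To make the target layout concrete, here is a minimal sketch of that same first row written as a valid Ruby literal, with the day as the outer key and the user as the inner key. The lookups use `Hash#dig`, which assumes Ruby 2.3 or later; the user name "nobody" is made up for illustration.

```ruby
# Target layout for the first data row: day => user => stats.
report_hash = {
  "Jul 25, 2012" => {
    "abc123" => {
      "PageRequest"   => 3,
      "PageViews"     => 0,
      "BrowseTime"    => 0,
      "TotalBytes"    => 13855,
      "BytesReceived" => 3287,
      "BytesSent"     => 10568
    }
  }
}

# Day and user together address exactly one record.
record = report_hash.dig("Jul 25, 2012", "abc123")
puts record["TotalBytes"]

# A day or user that is not present gives nil instead of raising.
p report_hash.dig("Jul 25, 2012", "nobody")
```

Neither key has to be unique on its own: the same day appears once as an outer key with many users under it, and the same user can appear under many days.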
require 'csv'

def hashing(file)
  # Read the CSV file into an array and drop the header row.
  # Array#drop returns a new array, so its result has to be kept.
  report_arr = CSV.read(file).drop(1)
  # Create an empty hash to save the data to
  report_hash = {}
  # For each row in the array,
  # if the first element in the row is not a key in the hash, make one
  report_arr.each do |row|
    if report_hash[row[0]].nil?
      report_hash[row[0]] = Hash.new
    # If the key exists, does the 2nd key exist? If not, make one
    elsif report_hash[row[0]][row[1]].nil?
      report_hash[row[0]][row[1]] = Hash.new
    end
    # Throw all the other data into the 2-key hash
    report_hash[row[0]][row[1]] = {"PageRequest" => row[2].to_i, "PageViews" => row[3].to_i, "BrowseTime" => row[4].to_i, "TotalBytes" => row[5].to_i, "BytesReceived" => row[6].to_i, "BytesSent" => row[7].to_i}
  end
  return report_hash
end
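I spent several hours learning about hashes and related topics to get this far, but it feels like there is a more efficient way to do this. Any suggestions on the proper or more efficient way to create a nested hash in which the first two keys are the first two elements of each row, so that together they form a "composite" unique key?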
This would work. The CSV file is a log dump, so the date format will stay consistent. – Neobane 2012-07-31 21:05:48
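On the question of a more idiomatic way to build the nested hash, one possible sketch is below. It is not the only approach, and it rests on a few assumptions: the name `build_report` and the `FIELDS` mapping are made up for illustration, the header names are taken from the sample file, and the CSV contents are passed in as a string so the example is self-contained. A hash with a default block creates the per-day hash on first use, and `headers: true` lets the CSV library consume the header row.

```ruby
require 'csv'

# Maps each CSV header to the key name used in the question's example hash.
FIELDS = {
  "Requests"       => "PageRequest",
  "Page Views"     => "PageViews",
  "Browse Time"    => "BrowseTime",
  "Total Bytes"    => "TotalBytes",
  "Bytes Received" => "BytesReceived",
  "Bytes Sent"     => "BytesSent"
}.freeze

# build_report is a hypothetical name; it takes the CSV contents as a string.
def build_report(csv_text)
  # The default block creates the per-day hash the first time a day is seen.
  report = Hash.new { |hash, day| hash[day] = {} }

  # headers: true consumes the header row and yields CSV::Row objects.
  CSV.parse(csv_text, headers: true) do |row|
    stats = FIELDS.map { |header, key| [key, row[header].to_i] }.to_h
    report[row["Day"]][row["User"]] = stats
  end

  report
end

sample = <<~CSV_TEXT
  Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
  "Jul 25, 2012","abc123",3,0,0,13855,3287,10568
  "Jul 25, 2012",,7,0,0,10990,2288,8702
  "Jul 24, 2012","abc123",19,1,30,85879,67791,18088
CSV_TEXT

report = build_report(sample)
puts report["Jul 25, 2012"]["abc123"]["TotalBytes"]
```

For a file on disk, `CSV.foreach(path, headers: true)` yields rows the same way without loading everything into an array first. Two things to be aware of: the row with an empty User field ends up under a `nil` inner key, and because of the default block, reading a day that is not in the data creates an empty entry for it.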