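MapReduce on a MongoDB collection comes out empty
I've been trying to combine a number of large datasets into a single collection, but I'm having trouble writing the MapReduce function.
Here is what my data looks like (17 rows shown here; in reality I have 4+ million):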
{"user": 1, "day": 1, "type": "a", "sum": 10}
{"user": 1, "day": 2, "type": "a", "sum": 32}
{"user": 1, "day": 1, "type": "b", "sum": 11}
{"user": 2, "day": 4, "type": "b", "sum": 2}
{"user": 1, "day": 2, "type": "b", "sum": 1}
{"user": 1, "day": 3, "type": "b", "sum": 9}
{"user": 1, "day": 4, "type": "b", "sum": 12}
{"user": 2, "day": 2, "type": "a", "sum": 3}
{"user": 3, "day": 2, "type": "b", "sum": 81}
{"user": 1, "day": 4, "type": "a", "sum": 22}
{"user": 1, "day": 5, "type": "a", "sum": 39}
{"user": 2, "day": 5, "type": "a", "sum": 8}
{"user": 2, "day": 3, "type": "b", "sum": 1}
{"user": 3, "day": 3, "type": "b", "sum": 99}
{"user": 2, "day": 3, "type": "a", "sum": 5}
{"user": 1, "day": 3, "type": "a", "sum": 41}
{"user": 3, "day": 4, "type": "b", "sum": 106}
...
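I'm trying to get it to look like this in the end (one array per type, where each position is determined by the day and holds that day's sum; if there is no entry of that type for a given day, the value is just 0):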
{"user": 1, "type_a_sums": [10, 32, 41, 22, 39], "type_b_sums": [11, 1, 9, 12, 0]}
{"user": 2, "type_a_sums": [0, 3, 5, 0, 8], "type_b_sums": [0, 0, 1, 2, 0]}
{"user": 3, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 81, 99, 106, 0]}
...
Here is the MapReduce I've been trying:
var mapsum = function(){
    var output = {user: this.user, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0], tempType: this.type, tempSum: this.sum, tempDay: this.day};
    if(this.type == "a") {
        output.type_a_sums[this.day-1] = this.sum;
    }
    if(this.type == "b") {
        output.type_b_sums[this.day-1] = this.sum;
    }
    emit(this.user, output);
};
var r = function(key, values) {
    var outs = {user: 0, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0], tempType: -1, tempSum: -1, tempDay: -1};
    values.forEach(function(v){
        outs.user = v.user;
        if(v.tempType == "a") {
            outs.type_a_sums[v.tempDay-1] = v.tempSum;
        }
        if(v.tempType == "b") {
            outs.type_b_sums[v.tempDay-1] = v.tempSum;
        }
    });
    return outs;
};
res = db.sums.mapReduce(mapsum, r, {out: 'joined_sums'})
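This gives me the output I'm after on a small sample, but when I run it over all 4 million documents I get a ton of output that looks like this: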
{"user": 1, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 0, 0, 0, 0]}
{"user": 2, "type_a_sums": [0, 3, 5, 0, 8], "type_b_sums": [0, 0, 1, 2, 0]}
{"user": 3, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 0, 0, 0, 0]}
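A large portion of the users who should have sums in their arrays instead end up with nothing but the 0s from the dummy arrays in the reduce function's outs object, as it looks before I fill it with the actual values.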
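What's really strange is that if I run the exact same functions on the same collection but restrict them to a single user:
res = db.sums.mapReduce(mapsum, r, {query: {user: 1}, out: 'joined_sums'})
a user who I know should have sums in their arrays but who previously came out as all 0s, I get exactly the output I wanted for that user. When I run it over all 4 million again, I'm back to 0s everywhere. It's as if it simply overwrote all the work it had done with the dummy filler arrays.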
Do I have too much data? Shouldn't it be able to get through it, given enough time? Or am I hitting some limitation I don't know about?
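One thing that can be checked without MongoDB at all: MongoDB may call reduce more than once for the same key and feed the output of an earlier reduce call back in as one of the values. The following is a minimal sketch that simulates this in plain JavaScript (for example under Node.js) using a copy of the reduce function above. The helper `mapped` and the variable names `singlePass` and `reReduced` are made up for illustration, and the four documents are taken from the sample data.

```javascript
// Copy of the reduce function from the question (same behavior).
var r = function (key, values) {
    var outs = {user: 0, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0],
                tempType: -1, tempSum: -1, tempDay: -1};
    values.forEach(function (v) {
        outs.user = v.user;
        if (v.tempType == "a") { outs.type_a_sums[v.tempDay - 1] = v.tempSum; }
        if (v.tempType == "b") { outs.type_b_sums[v.tempDay - 1] = v.tempSum; }
    });
    return outs;
};

// Hypothetical helper: builds the same value the question's map function emits.
function mapped(doc) {
    var output = {user: doc.user, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0],
                  tempType: doc.type, tempSum: doc.sum, tempDay: doc.day};
    if (doc.type == "a") { output.type_a_sums[doc.day - 1] = doc.sum; }
    if (doc.type == "b") { output.type_b_sums[doc.day - 1] = doc.sum; }
    return output;
}

// Four of the sample documents for user 1.
var docs = [
    {user: 1, day: 1, type: "a", sum: 10},
    {user: 1, day: 2, type: "a", sum: 32},
    {user: 1, day: 1, type: "b", sum: 11},
    {user: 1, day: 2, type: "b", sum: 1}
].map(mapped);

// Single pass: reduce sees every mapped value at once.
var singlePass = r(1, docs);

// Re-reduce: reduce runs on two batches, then again on its own two outputs.
var reReduced = r(1, [r(1, docs.slice(0, 2)), r(1, docs.slice(2))]);

console.log(JSON.stringify(singlePass.type_a_sums)); // [10,32,0,0,0]
console.log(JSON.stringify(reReduced.type_a_sums));  // [0,0,0,0,0]
```

In the second case the inner reduce calls return objects with tempType set to -1, so the outer call matches neither "a" nor "b" and returns the untouched dummy arrays. That would be consistent with the small sample and the single-user query working (few enough values per key for one reduce call) while the full run comes back as 0s, though this sketch alone does not prove that is what happens on the server.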
Any chance you're running into a bug in an outdated version of MongoDB? – maerics 2012-03-26 04:32:39
I think it has to do with 'reduce()' being called more than once per key. I'm trying to use 'finalize', but I'm confused about how it works. – TFX 2012-03-26 05:15:53
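If reduce being called more than once per key is indeed the cause, one possible fix is to make the value returned by reduce have the same shape as the value emitted by map, and to have reduce merge the arrays already present in each value instead of reading the temp* fields. Below is a sketch of that idea, not a tested drop-in for the real collection. The names `mapMerged` and `reduceMerged` are made up, the `emit` stand-in exists only so the sketch runs outside MongoDB (inside mapReduce the server supplies the real `emit`), and the element-wise addition assumes each user/day/type combination appears at most once (or that duplicates should be added together).

```javascript
// Stand-in for MongoDB's emit(), only so this sketch runs outside the shell.
var emitted = [];
function emit(key, value) { emitted.push({key: key, value: value}); }

// Map: emit a value that already has the final shape (no temp* fields).
var mapMerged = function () {
    var output = {user: this.user, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0]};
    if (this.type == "a") { output.type_a_sums[this.day - 1] = this.sum; }
    if (this.type == "b") { output.type_b_sums[this.day - 1] = this.sum; }
    emit(this.user, output);
};

// Reduce: add the arrays element-wise. Each value may be a mapped document
// or the result of an earlier reduce call; both have the same shape.
var reduceMerged = function (key, values) {
    var outs = {user: key, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0]};
    values.forEach(function (v) {
        for (var i = 0; i < 5; i++) {
            outs.type_a_sums[i] += v.type_a_sums[i];
            outs.type_b_sums[i] += v.type_b_sums[i];
        }
    });
    return outs;
};

// Local simulation with four of the sample documents for user 1.
[
    {user: 1, day: 1, type: "a", sum: 10},
    {user: 1, day: 2, type: "a", sum: 32},
    {user: 1, day: 1, type: "b", sum: 11},
    {user: 1, day: 2, type: "b", sum: 1}
].forEach(function (doc) { mapMerged.call(doc); });

var values = emitted.map(function (e) { return e.value; });
var onePass = reduceMerged(1, values);
var twoPass = reduceMerged(1, [reduceMerged(1, values.slice(0, 2)),
                               reduceMerged(1, values.slice(2))]);

console.log(JSON.stringify(twoPass.type_a_sums));                   // [10,32,0,0,0]
console.log(JSON.stringify(onePass) === JSON.stringify(twoPass));   // true
```

Against the real collection the same two functions would be passed the way the question already does it, e.g. db.sums.mapReduce(mapMerged, reduceMerged, {out: 'joined_sums'}), without the emit stand-in. Because the reduced value is already in the desired shape, this approach should not need 'finalize'.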