2016-08-19 286 views
0

我有一個libsvm格式的數據集,其中包含重要性分數和特徵的標籤。 qid是查詢。數據如下所示。xgboost(Python版本)中lambdaMART的數據格式是什麼?

我想使用xgboost進行搜索排名,但我不知道xgb.train函數的輸入數據格式是什麼。我已經看到了用於分類和迴歸的數據格式。但是我的數據集具有代表組信息的查詢並且lambda尚未計算。那麼,如何使用xgboost api來訓練我的等級模型,以及需要什麼樣的數據格式呢?

此外,我想用ndcg來評估我的模型。

非常感謝,我期待着解決方案。

0 qid:1830 1:0.002736 2:0.000000 3:0.000000 4:0.000000 5:0.002736 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
0 qid:1830 1:0.025992 2:0.125000 3:0.000000 4:0.000000 5:0.027360 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
0 qid:1830 1:0.001368 2:0.000000 3:0.000000 4:0.000000 5:0.001368 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
1 qid:1830 1:0.188782 2:0.375000 3:0.333333 4:1.000000 5:0.195622 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
1 qid:1830 1:0.077975 2:0.500000 3:0.666667 4:0.000000 5:0.086183 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
0 qid:1830 1:0.075239 2:0.125000 3:0.333333 4:0.000000 5:0.077975 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
1 qid:1830 1:0.079343 2:0.250000 3:0.666667 4:0.000000 5:0.084815 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
1 qid:1830 1:0.147743 2:0.000000 3:0.000000 4:0.000000 5:0.147743 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
0 qid:1830 1:0.058824 2:0.000000 3:0.000000 4:0.000000 5:0.058824 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
0 qid:1830 1:0.071135 2:0.125000 3:0.333333 4:0.000000 5:0.073871 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
1 qid:1840 1:0.007364 2:0.200000 3:1.000000 4:0.500000 5:0.013158 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
1 qid:1840 1:0.097202 2:0.000000 3:0.000000 4:0.000000 5:0.096491 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
2 qid:1840 1:0.169367 2:0.000000 3:0.500000 4:0.000000 5:0.169591 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 
...... 

回答

1

Format in XGBoost docs

的一般形式:

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <docid> <inc> <prob> 
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid" 
<value> .=. <float> 
<info> .=. <string> 
<inc> .=. <integer> 
<prob> .=. <float> 
<docid> .=. <string> 

dump_svmlight_file如果您想在格式寫入數據。