Getting uncalibrated probability outputs with Vowpal Wabbit for ad conversion prediction

I am trying to use Vowpal Wabbit to predict conversion rates for ad impressions, and I am getting non-intuitive probability outputs that concentrate around 36%, even though the global frequency of the positive class is below 1%.

The positive-to-negative imbalance in my dataset is 1/100 (I have already undersampled the negatives), so I use an importance weight of 100 on the positive examples.

Negative examples have label -1 and positive ones 1. I used shuf to shuffle the positive and negative examples together so that online learning works properly.
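
I shuffled with shuf; an equivalent minimal sketch in Python (the file names here are illustrative):

import random

with open("impressions.vw") as f:       # unshuffled examples, name illustrative
    lines = f.readlines()
random.shuffle(lines)                   # mix positives and negatives for online learning
with open("impressions_rand.aa", "w") as f:
    f.writelines(lines)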

Sample lines from the VW file:

1 100 'c4ac3440|i search_delay_log:3.58351893846 click_count_log:3.58351893846 banner_impression_count_log:3.98898404656 |c es i_type_2 xvertical_1_61 vertical_1 creat_size_728x90 retargeting 
-1 1 'a4d25cf1|i search_delay_log:11.2825684591 click_count_log:11.2825684591 banner_impression_count_log:4.48863636973 |c br i_type_1 xvertical_1_960 vertical_1 creat_size_300x600 retargeting 
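
For reference, each line is: the label, an optional importance weight, an optional tag terminated by '|', and then one block per feature namespace (|i holds my numeric features, |c the categorical ones). A minimal sketch of how such lines can be generated; the helper name and example values are illustrative, not my actual pipeline:

def vw_line(label, weight, tag, numeric, categorical):
    # |i holds numeric features as name:value pairs, |c holds categorical flags
    num = " ".join(f"{k}:{v}" for k, v in numeric.items())
    cat = " ".join(categorical)
    return f"{label} {weight} '{tag}|i {num} |c {cat}"

print(vw_line(1, 100, "c4ac3440",
              {"search_delay_log": 3.58, "click_count_log": 3.58},
              ["es", "i_type_2", "vertical_1"]))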

Now I build a model on the training set with:

vw -d impressions_rand.aa --loss_function logistic -c -k --passes 12 -f model.vw 

Output:

final_regressor = model.vw 
Num weight bits = 18 
learning rate = 0.5 
initial_t = 0 
power_t = 0.5 
decay_learning_rate = 1 
creating cache_file = impressions_rand.aa.cache 
Reading datafile = impressions_rand.aa 
num sources = 1 
average since   example  example current current current 
loss  last   counter   weight label predict features 
0.693147 0.693147   1   1.0 -1.0000 0.0000  11 
0.510760 0.328374   2   2.0 -1.0000 -0.9449  11 
0.387521 0.264282   4   4.0 -1.0000 -1.1825  11 
1.765374 1.818883   8   107.0 1.0000 -1.7020  11 
2.152669 2.444504   51   249.0 1.0000 -3.2953  11 
1.289870 0.427071   201   498.0 -1.0000 -3.5498  11 
0.878843 0.528943   588   1083.0 1.0000 -1.3394  9 
0.852358 0.825872   1176   2166.0 -1.0000 -6.7918  11 
0.871977 0.891597   2451   4332.0 -1.0000 -2.7031  11 
0.689428 0.506878   4110   8664.0 -1.0000 -2.7525  11 
0.638008 0.586589   8517  17328.0 -1.0000 -5.8017  11 
0.580220 0.522713  17515  34741.0 1.0000 2.1519  11 
0.526281 0.472343  35525  69482.0 -1.0000 -6.2931  9 
0.497601 0.468921  71050  138964.0 -1.0000 -7.6245  9 
0.479305 0.461008  143585  277928.0 -1.0000 -0.8296  11 
0.443734 0.443734  288655  555856.0 -1.0000 -2.5795  11 h 
0.438806 0.433925  578181  1111791.0 1.0000 0.8503  11 h 

finished run 
number of examples per pass = 216000 
passes used = 5 
weighted example sum = 2072475.000000 
weighted label sum = -67475.000000 
average loss = 0.432676 h 
best constant = -0.065138 
best constant's loss = 0.692617 
total feature number = 11548690 

Now we predict on the test set. --link logistic should map vw's outputs into probabilities in the [0, 1] range.

vw -d impressions_rand.ab --link logistic -i model.vw -p preds_ab.txt 

Output:

predictions = preds_ab.txt 
Num weight bits = 18 
learning rate = 0.5 
initial_t = 0 
power_t = 0.5 
using no cache 
Reading datafile = impressions_rand.ab 
num sources = 1 
average since   example  example current current current 
loss  last   counter   weight label predict features 
68.282379 68.282379   1   1.0 -1.0000 0.0001  9 
38.748867 9.215355   2   2.0 -1.0000 0.0174  11 
21.256140 3.763414   4   4.0 -1.0000 0.8345  11 
11.685329 2.114518   8   8.0 -1.0000 0.3508  11 
9.457854 7.230378   16   16.0 -1.0000 0.0069  11 
7.371087 5.284320   32   32.0 -1.0000 0.3561  11 
7.061980 6.752873   64   64.0 -1.0000 0.6549  11 
5.423309 3.784638   128   128.0 -1.0000 0.2597  11 
3.252394 1.725597   211   310.0 1.0000 0.7686  11 
2.140099 1.052366   330   627.0 1.0000 0.7143  11 
1.671550 1.203000   660   1254.0 -1.0000 0.8054  11 
1.788466 1.905383   1320   2508.0 -1.0000 0.0676  9 
1.508163 1.234410   2502   5076.0 1.0000 0.3921  11 
1.282862 1.060063   5061  10209.0 1.0000 0.4258  9 
1.119420 0.955977  11013  20418.0 -1.0000 0.6892  11 
1.017911 0.916403  22323  40836.0 -1.0000 0.5301  9 
0.888435 0.758960  42171  81672.0 -1.0000 0.3500  11 
0.787709 0.686983  84243  163344.0 -1.0000 0.2360  9 
0.703270 0.618831  170268  326688.0 -1.0000 0.5707  11 

finished run 
number of examples per pass = 207361 
passes used = 1 
weighted example sum = 397936.000000 
weighted label sum = -12936.000000 
average loss = 0.684043 
best constant = -0.032508 
best constant's loss = 0.998943 
total feature number = 2216941 

It wrote my predictions to preds_ab.txt, with lines like:

0.000095 7c14ae23 
0.017367 3e9558bd 
0.139393 6a1cd72f 
0.834518 dfe76f6e 
0.089810 2b88b547 
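
These values are just the logistic (sigmoid) transform of vw's raw scores, sigma(x) = 1 / (1 + exp(-x)). A quick illustrative check, using a raw prediction seen in the training log above rather than one of these exact rows:

import math

def sigmoid(x):
    # --link logistic maps vw's raw margin into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-7.6245))  # ~0.0005, a confidently negative example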

If I compute the ROC-AUC score of these predictions I get 0.85, which is close to what I get with scikit-learn (0.90). However, the probability outputs are not calibrated at all: they are much higher than I expected (I would expect something close to 1%). Here is the histogram:

[Probability histogram]

And here is the reliability curve:

[Reliability curve]

And here is a plot of mean predicted probability vs. observed positive frequency, with the examples binned by predicted probability:

[Probabilities vs. positive frequencies]

Clearly the output probabilities are far higher than what a well-calibrated classifier should produce.
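
For reference, these diagnostics can be computed with scikit-learn. A minimal sketch, where labels_ab.txt is a hypothetical file holding the true 0/1 labels, aligned row-by-row with the predictions:

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

# vw's -p output: first column is the probability, second is the example tag.
y_prob = np.loadtxt("preds_ab.txt", usecols=0)
y_true = np.loadtxt("labels_ab.txt")  # hypothetical aligned 0/1 labels

print("ROC-AUC:", roc_auc_score(y_true, y_prob))

# Reliability curve: observed positive rate vs. mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for f, p in zip(frac_pos, mean_pred):
    print(f"mean predicted {p:.3f} -> observed positive rate {f:.3f}")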

What am I doing wrong here? What should I investigate?

UPDATE

If I don't use the weight of 100 on the positive class examples, I still get non-intuitive results. The mean probability output is 0.27 (still very far from 1%), the reliability curve looks even worse, and the ROC-AUC drops to 0.76.

I can confirm that I have 237805 negative examples and 2195 positive ones (roughly 108 negatives per positive).

Training output:

Num weight bits = 18 
learning rate = 0.5 
initial_t = 0 
power_t = 0.5 
decay_learning_rate = 1 
creating cache_file = impressions_rand.aa.cache 
Reading datafile = impressions_rand.aa 
num sources = 1 
average since   example  example current current current 
loss  last   counter   weight label predict features 
0.693147 0.693147   1   1.0 -1.0000 0.0000  11 
0.546724 0.400300   2   2.0 -1.0000 -0.7087  11 
0.398553 0.250382   4   4.0 -1.0000 -1.3963  11 
0.284506 0.170460   8   8.0 -1.0000 -2.2595  11 
0.181406 0.078306   16   16.0 -1.0000 -2.8225  11 
0.108136 0.034865   32   32.0 -1.0000 -4.2696  11 
0.063156 0.018176   64   64.0 -1.0000 -4.7412  11 
0.036415 0.009675   128   128.0 -1.0000 -4.2940  11 
0.020325 0.004235   256   256.0 -1.0000 -5.9903  11 
0.043248 0.066171   512   512.0 -1.0000 -5.5540  11 
0.045276 0.047304   1024   1024.0 -1.0000 -4.7065  11 
0.044606 0.043935   2048   2048.0 -1.0000 -6.6253  11 
0.048938 0.053270   4096   4096.0 -1.0000 -5.9119  11 
0.048711 0.048485   8192   8192.0 -1.0000 -2.3949  11 
0.048157 0.047603  16384  16384.0 -1.0000 -9.6219  11 
0.044306 0.040454  32768  32768.0 -1.0000 -8.8800  11 
0.044029 0.043752  65536  65536.0 -1.0000 -5.9218  9 
0.042739 0.041450  131072  131072.0 -1.0000 -3.8306  11 
0.042986 0.042986  262144  262144.0 -1.0000 -6.0941  11 h 
0.042321 0.041655  524288  524288.0 -1.0000 -4.0276  11 h 
0.042654 0.042988  1048576  1048576.0 -1.0000 -9.9169  11 h 

finished run 
number of examples per pass = 216000 
passes used = 7 
weighted example sum = 1512000.000000 
weighted label sum = -1484504.000000 
average loss = 0.042763 h 
best constant = -4.691161 
best constant's loss = 0.051789 
total feature number = 16166472 

The test output is below. I have read that an average loss larger than the best constant's loss indicates that something went wrong with the model's learning.

Num weight bits = 18 
learning rate = 0.5 
initial_t = 0 
power_t = 0.5 
using no cache 
Reading datafile = impressions_rand.ab 
num sources = 1 
average since   example  example current current current 
loss  last   counter   weight label predict features 
78.141266 78.141266   1   1.0 -1.0000 0.0001  11 
54.228148 30.315029   2   2.0 -1.0000 0.0015  11 
33.279501 12.330854   4   4.0 1.0000 0.0472  11 
20.358767 7.438034   8   8.0 -1.0000 0.0527  11 
15.780043 11.201319   16   16.0 -1.0000 0.1657  11 
13.783271 11.786498   32   32.0 -1.0000 0.0012  9 
9.318714 4.854158   64   64.0 -1.0000 0.7268  11 
6.797651 4.276587   128   128.0 -1.0000 0.1404  9 
4.674237 2.550824   256   256.0 -1.0000 0.0516  11 
3.269198 1.864159   512   512.0 -1.0000 0.4092  11 
2.153033 1.036868   1024   1024.0 -1.0000 0.0425  11 
1.481920 0.810807   2048   2048.0 -1.0000 0.2792  11 
1.005869 0.529817   4096   4096.0 -1.0000 0.2422  11 
0.676574 0.347279   8192   8192.0 -1.0000 0.3003  11 
0.452924 0.229274  16384  16384.0 -1.0000 0.2579  11 
0.295262 0.137600  32768  32768.0 -1.0000 0.2833  11 
0.191513 0.087763  65536  65536.0 -1.0000 0.2616  9 
0.126758 0.062003  131072  131072.0 -1.0000 0.2670  11 

finished run 
number of examples per pass = 207361 
passes used = 1 
weighted example sum = 207361.000000 
weighted label sum = -203423.000000 
average loss = 0.099565 
best constant = -0.981009 
best constant's loss = 0.037621 
total feature number = 2217159 

The first thing I would do to improve this result is to avoid hash collisions: you have over 200k examples and roughly ten times as many features (~10 features per example). Leaving the default '-b 18' (about 262k distinct hash slots) doesn't seem to be enough. Try '-b 24' to start with. Does it improve the results? – arielf


Also: unless there is some severe irregularity that makes the positive labels appear clustered together, there's no need to shuffle examples that come in a natural time order. – arielf


When testing, you should also use '-t' so that you don't keep training on the test data. –

Answers


Thanks to Martin Popel's and arielf's comments, I solved the problem. :)

  1. I forgot to use -t when generating the predictions.
  2. I didn't specify --loss_function logistic when generating the predictions.

As a result, during testing the model kept being updated, using the default loss function instead of the logistic one, which wrecked the model and produced the wrong results.

Takeaways:

  1. Use --loss_function logistic during testing too, so that the reported loss is correct.
  2. Remember to use -t if you don't want to update the model while predicting.

This is how the output looks now when testing (without example weighting):

$ vw -d impressions_rand.ab --link logistic --loss_function logistic -i model.vw -t -p preds.txt 
only testing 
predictions = preds.txt 
Num weight bits = 18 
learning rate = 0.5 
initial_t = 0 
power_t = 0.5 
using no cache 
Reading datafile = impressions_rand.ab 
num sources = 1 
average since   example  example current current current 
loss  last   counter   weight label predict features 
0.000053 0.000053   1   1.0 -1.0000 0.0001  11 
0.000370 0.000687   2   2.0 -1.0000 0.0007  11 
1.252868 2.505366   4   4.0 1.0000 0.0067  11 
0.638249 0.023630   8   8.0 -1.0000 0.0036  11 
0.322060 0.005872   16   16.0 -1.0000 0.0031  11 
0.164750 0.007439   32   32.0 -1.0000 0.0000  9 
0.084911 0.005072   64   64.0 -1.0000 0.0081  11 
0.076905 0.068899   128   128.0 -1.0000 0.0004  9 
0.055126 0.033347   256   256.0 -1.0000 0.0000  11 
0.052986 0.050847   512   512.0 -1.0000 0.0133  11 
0.038351 0.023715   1024   1024.0 -1.0000 0.0000  11 
0.037059 0.035767   2048   2048.0 -1.0000 0.0167  11 
0.038848 0.040637   4096   4096.0 -1.0000 0.0112  11 
0.038903 0.038957   8192   8192.0 -1.0000 0.0281  11 
0.041625 0.044348  16384  16384.0 -1.0000 0.0001  11 
0.042526 0.043426  32768  32768.0 -1.0000 0.0218  11 
0.042538 0.042551  65536  65536.0 -1.0000 0.0000  9 
0.042150 0.041763  131072  131072.0 -1.0000 0.0019  11 

finished run 
number of examples per pass = 207361 
passes used = 1 
weighted example sum = 207361.000000 
weighted label sum = -203423.000000 
average loss = 0.042438 
best constant = -4.647395 
best constant's loss = 0.053670 
total feature number = 2217159 

You can see that the reported average loss is now smaller than the best constant's loss, and the per-iteration average losses also lie in the expected range.
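
As a sanity check: with --loss_function logistic, the best constant is the log-odds of the positive rate implied by the weighted sums, and the best constant's loss is the entropy of the labels. A short sketch reproducing the numbers reported above:

import math

# Sums reported at the end of the test run above (labels are +1/-1).
example_sum = 207361.0
label_sum = -203423.0

# Positive fraction implied by the sums.
p = (example_sum + label_sum) / (2.0 * example_sum)   # ~0.0095

best_constant = math.log(p / (1.0 - p))               # ~ -4.647
best_constant_loss = -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))  # ~0.0537

print(best_constant, best_constant_loss)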

Also, the output probabilities make much more sense now:

[Mean probabilities vs. frequencies]


You say that in your training set you have, on average, one positive example per 100 negatives. However, you give the positive examples a weight of 100, which is almost equivalent to repeating each positive example 100 times in the training set. The mean predicted probability should therefore be around 50%, so you should not be surprised that it is not around 1%.

According to the vw output you provided, it seems there are more than 100 negative examples per positive one in the training set impressions_rand.aa, which is why the "weighted label sum" is negative (otherwise it should be approximately 0). That is why the mean predicted probability is about 36% rather than 50%.
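
If you do want to keep the importance weight w = 100 during training, you can in principle map the inflated probabilities back to the original prior by dividing the predicted odds by w. A minimal sketch of this standard prior-correction step (vw does not apply it for you):

def correct_for_weight(p, w=100.0):
    # Training with importance weight w on positives multiplies the odds by w;
    # dividing the odds by w maps the probability back to the original prior.
    return p / (p + w * (1.0 - p))

print(correct_for_weight(0.5))   # ~0.0099: 50% weighted corresponds to ~1% unweighted
print(correct_for_weight(0.36))  # ~0.0056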


Thanks for your answer, but unfortunately not weighting the positive examples still yields unreliable probabilities. I have updated the question with the train/test output without weighting. – dukebody