
I'm new to Vowpal Wabbit, so I may be missing something very obvious: I'm getting a large number of NaN predictions from Vowpal Wabbit when using it for classification.

I have training data in a CSV, which I've split into 80% training and 20% test. It contains 62 features (x0-x61) and 7 classes in total (0-6).

x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,y 
190436e528,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,0.4376549941058605,5624b8f759,152af2cb2f,91bb549494,e33c63cf35,1178.0,cc69cbe29a,617a4ad3f9,e8a040423a,c82c3dbd33,ee3501282b,199ce7c484,5f17dedd5c,5c5025bd0a,9aba4d7f51,24.94393348850157,-0.8146595838365664,-0.7083080633874904,1.5,-0.5124221809900756,-0.7339666422629345,0.3333333333333333,14.837727864583336,11.0,0.0,24.0,0.0,0.0,1.0,29.0,0.0,3.0,11.0,4.42,0.15,0.161,0.2,1.0,1.0,1.0,1.0,1.0,0.52,0.5329999999999999,0.835,-0.5865396521883026,0.6724356815192951,0.0,0.6060606060606061,0.12121212121212124,0.21212121212121213,0.060606060606060615,0.0,33.0,3 
a4c3095b75,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,0.4809765809625592,7e5c97705a,e071d01df5,91bb549494,e33c63cf35,5777.0,6e40247e69,617a4ad3f9,4b9480aa42,e84655292c,527b6ca8cc,dd9c9e0da2,17c99905b6,0fc56ea1f0,9aba4d7f51,31.08028213771883,-0.3717867728837517,-0.3676156090868885,1.6666666666666663,0.2713072335472944,0.013112469951855535,17.333333333333325,1713.439127604167,33.0,0.0,6.0,1.0,0.6666666666666665,8.0,108.0,1.0,4.0,86.0,1.58,0.05,2.032,2.4,0.348,0.762,0.55,0.392,0.489,0.517,1.0,0.642,0.9609909328437232,0.7909897767145201,0.020161290322580645,0.6451612903225806,0.25806451612903225,0.03629032258064517,0.04032258064516129,0.0,248.0,3 
aa2f3cd34a,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,-2.9847503675733384,f67f142e40,c28b638881,91bb549494,e33c63cf35,-1.0,fe8fb80553,617a4ad3f9,718c61545b,c26d08129a,cac4fc8eaf,199ce7c484,60299bc448,76ba8f7080,9aba4d7f51,41.40215922501433,-0.043850620710912905,-0.043227755140810106,3.5,0.19464028583619075,-0.2926973864217809,11.333333333333332,732.8046875,106.0,0.0,14.0,0.0,0.0,1.0,21.0,2.0,3.0,14.0,7.17,0.24,0.645,0.6,0.25,0.5,0.5,0.0,0.773,0.899,0.0,0.0,-0.0818699491854678,0.6639345368601952,0.0,0.0,0.0,0.0,1.0,0.0,1.0,4 
bfff7d2d9e,16a14a2d17,06330986ed,ca63304de0,a62168d626,1746600cb0,1,1,0.6542629283893542,7b1f0ca4c1,1d42d0c490,669ea3d319,b38690945d,1602.0,6e40247e69,617a4ad3f9,718c61545b,d3dc404c37,7263b01813,dd9c9e0da2,17c99905b6,2cc3e04172,9aba4d7f51,32.11392568242685,0.2843684594325347,0.23249501198439226,5.0,-0.19979368911718315,0.3743375351985674,1101.0,0.44580078125,16.0,0.16666666666666666,6.0,1.0,0.5,5.0,209.0,3.0,2.0,43.0,12.08,0.4,2.613,2.8,0.5,0.556,0.875,0.612,0.064,0.0,0.435,0.785,0.5158309700290646,-0.1150902907744278,0.05945945945945946,0.8,0.06486486486486487,0.045045045045045036,0.014414414414414416,0.016216216216216217,555.0,2 

I've been converting the CSV into VW format with phraug2/csv2vw.py. The converted data looks like this:

3.0 |n c1_b4d8a653ea c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_a62168d626 c6_1746600cb0 7:1 8:1 9:-0.6887062641683063 c10_7e5c97705a c11_e5df3eff9b c12_91bb549494 c13_e33c63cf35 14:3694.0 c15_6e40247e69 c16_617a4ad3f9 c17_718c61545b c18_c26d08129a c19_634e3cf3ac c20_dd9c9e0da2 c21_17c99905b6 c22_513a3e3f36 c23_9aba4d7f51 24:40.57961189718329 25:-0.11269265451935975 26:-0.17219069579806134 27:1.1666666666666663 28:1.6745384722167482 29:0.6308894281294708 30:37.0 31:1.294921875 32:55.0 33:0.16666666666666666 34:10.0 37:1.0 38:9.0 40:1.0 41:23.0 42:3.67 43:0.12 44:1.935 45:2.2 46:0.625 47:0.25 48:0.125 50:0.813 51:0.07400000000000001 52:0.634 53:0.5479999999999999 54:0.2353332208066929 55:0.2649521447821752 57:0.3333333333333333 58:0.3333333333333333 59:0.3333333333333333 62:9.0 
5.0 |n c1_467f9617a3 c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_b7584c2d52 c6_1746600cb0 7:1 8:1 9:0.8708708626728477 c10_5624b8f759 c11_fa0b797a92 c12_669ea3d319 c13_f178803074 14:18156.0 c15_01ede04b4b c16_617a4ad3f9 c17_718c61545b c18_d342e2765f c19_bb20e1ca06 c20_8a6c8cef83 c21_1b02793146 c22_992153ed65 c23_9aba4d7f51 24:28.76550293196428 25:2.6122849082704658 26:2.1590908057403015 27:4.0 28:1.7107137612171608 29:1.7135384162978815 30:0.16666666666666666 31:0.027669270833333325 32:109.0 34:31.0 37:1.0 38:244.0 39:1.0 40:1.0 41:68.0 42:17.25 43:0.57 44:3.452 45:4.0 46:0.409 47:0.619 48:0.579 49:0.248 50:0.34600000000000003 51:0.541 52:0.522 54:1.782346041542782 55:1.3224094711633876 56:0.011647254575707157 57:0.39767054908485855 58:0.2396006655574044 59:0.2495840266222961 60:0.06821963394342763 61:0.033277870216306155 62:601.0 
4.0 |n 1:190436e528 c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_b7584c2d52 c6_1746600cb0 7:1 8:1 9:0.4376549941058605 c10_5624b8f759 c11_152af2cb2f c12_91bb549494 c13_e33c63cf35 14:1178.0 c15_cc69cbe29a c16_617a4ad3f9 c17_e8a040423a c18_c82c3dbd33 c19_ee3501282b c20_199ce7c484 c21_5f17dedd5c c22_5c5025bd0a c23_9aba4d7f51 24:24.94393348850157 25:-0.8146595838365664 26:-0.7083080633874904 27:1.5 28:-0.5124221809900756 29:-0.7339666422629345 30:0.3333333333333333 31:14.837727864583336 32:11.0 34:24.0 37:1.0 38:29.0 40:3.0 41:11.0 42:4.42 43:0.15 44:0.161 45:0.2 46:1.0 47:1.0 48:1.0 49:1.0 50:1.0 51:0.52 52:0.5329999999999999 53:0.835 54:-0.5865396521883026 55:0.6724356815192951 57:0.6060606060606061 58:0.12121212121212124 59:0.21212121212121213 60:0.060606060606060615 62:33.0 
5.0 |n c1_43859085bc c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_a62168d626 c6_1746600cb0 7:1 8:1 9:0.004439125538873309 c10_f67f142e40 c11_c4dd2197c3 c12_91bb549494 c13_e33c63cf35 14:14559.0 c15_6e40247e69 c16_617a4ad3f9 c17_718c61545b c18_c26d08129a c19_9e166b965d c20_466f8951b0 c21_fde72a6d5c c22_acfadc5c01 c23_9aba4d7f51 24:41.57685954242976 25:-0.9078334231173404 26:-0.7617355673740658 27:0.5 28:-0.6275253732641191 29:-0.8058011722835874 30:1.1666666666666663 31:0.00439453125 33:0.5 37:7.0 38:7.0 40:3.0 41:15.0 42:8.92 43:0.29 44:0.226 45:0.8 51:1.0 54:-1.6003257882042399 55:-1.8386800640762528 57:1.0 62:1.0 
3.0 |n c1_bfff7d2d9e c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_a62168d626 c6_1746600cb0 7:1 8:1 9:0.6542629283893542 c10_7b1f0ca4c1 c11_1d42d0c490 c12_669ea3d319 c13_b38690945d 14:1602.0 c15_6e40247e69 c16_617a4ad3f9 c17_718c61545b c18_d3dc404c37 c19_7263b01813 c20_dd9c9e0da2 c21_17c99905b6 c22_2cc3e04172 c23_9aba4d7f51 24:32.11392568242685 25:0.2843684594325347 26:0.23249501198439226 27:5.0 28:-0.19979368911718315 29:0.3743375351985674 30:1101.0 31:0.44580078125 32:16.0 33:0.16666666666666666 34:6.0 35:1.0 36:0.5 37:5.0 38:209.0 39:3.0 40:2.0 41:43.0 42:12.08 43:0.4 44:2.613 45:2.8 46:0.5 47:0.556 48:0.875 49:0.612 50:0.064 52:0.435 53:0.785 54:0.5158309700290646 55:-0.1150902907744278 56:0.05945945945945946 57:0.8 58:0.06486486486486487 59:0.045045045045045036 60:0.014414414414414416 61:0.016216216216216217 62:555.0 
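For reference, the conversion logic can be sketched as below. This is a hypothetical minimal re-implementation for illustration, not the actual phraug2/csv2vw.py script; the heuristic (an assumption) is that any field which doesn't parse as a float is treated as categorical and emitted as a name-only feature with a cN_ prefix, while numeric fields become index:value pairs:

```python
def csv_row_to_vw(values):
    """Convert one parsed CSV row (last column = 0-based class label)
    into a VW one-against-all line."""
    label = values[-1]
    feats = []
    for i, v in enumerate(values[:-1]):
        try:
            x = float(v)
            if x != 0.0:                   # VW treats absent features as 0
                feats.append(f"{i + 1}:{x}")
        except ValueError:
            feats.append(f"c{i + 1}_{v}")  # categorical: name-only feature
    # VW multiclass labels must be 1..k, so shift the 0-based classes by one
    return f"{int(label) + 1} |n " + " ".join(feats)
```

Note that in the converted sample above, one line contains 1:190436e528 rather than c1_190436e528 — a categorical hash emitted as a numeric feature.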

Then I tried to do multiclass classification, building a one-against-all model:

vw ./train_my.text -f predictor.vw --oaa 7 --passes 5 --cache_file cache 

However, I get a lot of NaN predictions:

NAN prediction in example 21643, forcing 0.000000 
NAN prediction in example 21643, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 

and the average loss suggests the model can't really predict anything:

number of examples per pass = 36063 
passes used = 4 
weighted example sum = 144252.000000 
weighted label sum = 0.000000 
average loss = 0.801797 h 
total feature number = 7598612 

What am I doing wrong?


Hi! Could you provide at least a few examples for each class? I'm getting the same error as you. –


The error is in example #3. By convention, the feature 1:190436e528 should be c1_190436e528. –


@xeon Hmm.. do you think the csv2vw script could have messed up the data? Here is a link to the CSV file and the same file converted to VW format. – intellion

Answer


This is caused by training with feature values of very large or very small magnitude in Vowpal Wabbit, e.g. x1:12334234 or x1:1e-30. If you take the values off your features, or scale them, the problem goes away. You may also want to scale the values if you use logistic regression.


Check the responses to this bug: https://github.com/JohnLangford/vowpal_wabbit/issues/756 –
