graphlab linear regression terminated due to numerical overflow error

I am trying to create a linear regression model with graphlab. I have 200 samples and 1 predictor. However, I get a "numerical overflow error". The output is below:

model_all = graphlab.linear_regression.create(data2.tail(200), target='output', features=['input'],validation_set=None,l2_penalty=0.0002,solver = 'auto') 
Linear regression: 
-------------------------------------------------------- 
Number of examples   : 200 
Number of features   : 1 
Number of unpacked features : 1 
Number of coefficients : 2 
Starting Newton Method 
-------------------------------------------------------- 
+-----------+----------+--------------+--------------------+---------------+ 
| Iteration | Passes | Elapsed Time | Training-max_error | Training-rmse | 
+-----------+----------+--------------+--------------------+---------------+ 
+-----------+----------+--------------+--------------------+---------------+ 
TERMINATED: Terminated due to numerical overflow error. 
This model may not be ideal. To improve it, consider doing one of the following: 
(a) Increasing the regularization. 
(b) Standardizing the input data. 
(c) Removing highly correlated features. 
(d) Removing `inf` and `NaN` values in the training data

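Hints (b), (c), and (d) do not apply to my case, because there is only 1 feature and there are no inf or NaN values. I tried various l2_penalty values, but none of them helped. If I limit the number of samples to a smaller number, such as 180, it works.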

model_all = graphlab.linear_regression.create(data2.tail(180), target='output', features=['input'],validation_set=None,l2_penalty=0.0002,solver = 'auto') 
model_all.get("coefficients").print_rows(num_rows=100) 
Linear regression: 
-------------------------------------------------------- 
Number of examples   : 180 
Number of features   : 1 
Number of unpacked features : 1 
Number of coefficients : 2 
Starting Newton Method 
-------------------------------------------------------- 
+-----------+----------+--------------+--------------------+---------------+ 
| Iteration | Passes | Elapsed Time | Training-max_error | Training-rmse | 
+-----------+----------+--------------+--------------------+---------------+ 
| 1   | 2  | 0.000866  | 9.873043   | 4.272624  | 
+-----------+----------+--------------+--------------------+---------------+ 
SUCCESS: Optimal solution found. 
+----------------+-------+------------------+-------------------+ 
|  name  | index |  value  |  stderr  | 
+----------------+-------+------------------+-------------------+ 
| (intercept) | None | 9.3412783539 | 3.80166353756 | 
| DOEDDIST.Index | None | 0.00226165438702 | 0.000975084975224 | 
+----------------+-------+------------------+-------------------+ 
[2 rows x 4 columns]

I don't understand what is causing the numerical overflow error. Can someone help explain?

Thanks.

Source

2017-10-11 Pollyanna

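If solving this task is all you need, you can always choose a different solver. For debugging, you should probably show the data, although your observations are indeed strange. – sascha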

Thanks for your reply – Pollyanna

I double-checked my data, and there were indeed NaN entries. My mistake. data.dropna(axis = 'index',how = 'any',inplace=True) solved it.
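For anyone hitting the same message, here is a minimal sketch of how the check could be done before fitting. It assumes the data is in a pandas DataFrame (as the dropna call above suggests). The helper names non_finite_report and drop_non_finite and the sample values are made up for illustration and are not part of graphlab or pandas. Note that dropna only removes NaN, so the sketch also filters out inf, which the solver's hint (d) mentions as well.

```python
import numpy as np
import pandas as pd


def non_finite_report(df, columns):
    """Count NaN and +/-inf entries in each of the given columns."""
    values = df[columns].astype(float)
    return pd.DataFrame({
        "nan": values.isna().sum(),
        "inf": np.isinf(values).sum(),
    })


def drop_non_finite(df, columns):
    """Return a copy of df without rows that hold NaN or inf in `columns`."""
    values = df[columns].astype(float)
    keep = np.isfinite(values).all(axis=1)  # True only for fully finite rows
    return df.loc[keep].reset_index(drop=True)


# Hypothetical data standing in for the real 'input'/'output' columns.
data = pd.DataFrame({
    "input": [1.0, 2.0, np.nan, 4.0, np.inf],
    "output": [2.0, 4.1, 6.2, np.nan, 9.9],
})

report = non_finite_report(data, ["input", "output"])
clean = drop_non_finite(data, ["input", "output"])

print(report)
print(clean)
```

Running the report on each window (for example the last 200 rows versus the last 180) would show whether the bad entries sit only in the rows that the smaller window leaves out, which would be consistent with the 180-sample fit succeeding.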

Source

2017-10-11 12:23:42 Pollyanna
