
When the alpha parameter is close to zero, the Tikhonov (ridge) cost becomes equal to the least-squares cost. Everything in the scikit-learn docs on the subject says the same. So I expected

sklearn.linear_model.Ridge(alpha=1e-100).fit(data, target) 

to be equivalent to

sklearn.linear_model.LinearRegression().fit(data, target) 

But it is not. Why?
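For reference, this is my paraphrase of the ridge objective as stated in the scikit-learn user guide:

```latex
\min_{w}\; \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_2^2
```

As \(\alpha \to 0\) the penalty term vanishes and the objective reduces to the ordinary least-squares cost \(\lVert X w - y \rVert_2^2\), which is what motivates the expectation above.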

Update, with code:

import pandas as pd 
from sklearn.linear_model import Ridge, LinearRegression 
from sklearn.preprocessing import PolynomialFeatures 
import matplotlib.pyplot as plt 
%matplotlib inline 

dataset = pd.read_csv('house_price_data.csv') 

X = dataset['sqft_living'].values.reshape(-1, 1) 
Y = dataset['price'].values.reshape(-1, 1) 

polyX = PolynomialFeatures(degree=15).fit_transform(X) 

model1 = LinearRegression().fit(polyX, Y) 
model2 = Ridge(alpha=1e-100).fit(polyX, Y) 

plt.plot(X, Y,'.', 
     X, model1.predict(polyX),'g-', 
     X, model2.predict(polyX),'r-') 

Note: the plot looks the same for alpha=1e-8 and alpha=1e-100.

[Plot: the data points with the LinearRegression fit (green) and the Ridge fit (red)]

Answer


According to the documentation, alpha must be a positive float. Your example has alpha=0 as an integer. Using a small positive alpha, the results of Ridge and LinearRegression appear to converge.

from sklearn.linear_model import Ridge, LinearRegression 
data = [[0, 0], [1, 1], [2, 2]] 
target = [0, 1, 2] 

ridge_model = Ridge(alpha=1e-8).fit(data, target) 
print("RIDGE COEFS: " + str(ridge_model.coef_)) 
ols = LinearRegression().fit(data,target) 
print("OLS COEFS: " + str(ols.coef_)) 

# RIDGE COEFS: [ 0.49999999 0.50000001] 
# OLS COEFS: [ 0.5 0.5] 
# 
# VS. with alpha=0: 
# RIDGE COEFS: [ 1.57009246e-16 1.00000000e+00] 
# OLS COEFS: [ 0.5 0.5] 
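One plausible reading of why alpha=0 misbehaves on this particular toy data (a sketch of my own, not something the answer states): the two feature columns are identical, so the least-squares problem is rank-deficient and has infinitely many exact solutions. LinearRegression's least-squares solver returns the minimum-norm solution, while solving the singular ridge system at alpha=0 can land on a different, equally valid one.

```python
import numpy as np

# The two feature columns in the toy data are identical, so X'X is
# singular and the least-squares solution is not unique.
X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0.0, 1.0, 2.0])

# Rank 1 < 2 confirms the collinearity.
print(np.linalg.matrix_rank(X))

# np.linalg.lstsq returns the minimum-norm solution, which splits the
# weight evenly between the duplicated columns: [0.5, 0.5].
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

Any w with w1 + w2 = 1 fits this data exactly, so the [0, 1] returned by Ridge(alpha=0) is also a valid fit; it just is not the minimum-norm one that OLS reports.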

UPDATE: The alpha=0-as-an-int issue above only seems to be a problem with a few toy problems like the example above.

For the housing data, the problem is one of scaling. The degree-15 polynomial you invoke causes numerical overflow. To produce identical results from LinearRegression and Ridge, try scaling your data first:

import pandas as pd 
from sklearn.linear_model import Ridge, LinearRegression 
from sklearn.preprocessing import PolynomialFeatures, scale 

dataset = pd.read_csv('house_price_data.csv') 

# scale the X data to prevent numerical errors. 
X = scale(dataset['sqft_living'].values.reshape(-1, 1)) 
Y = dataset['price'].values.reshape(-1, 1) 

polyX = PolynomialFeatures(degree=15).fit_transform(X) 

model1 = LinearRegression().fit(polyX, Y) 
model2 = Ridge(alpha=0).fit(polyX, Y) 

print("OLS Coefs: " + str(model1.coef_[0])) 
print("Ridge Coefs: " + str(model2.coef_[0])) 

#OLS Coefs: [ 0.00000000e+00 2.69625315e+04 3.20058010e+04 -8.23455994e+04 
# -7.67529485e+04 1.27831360e+05 9.61619464e+04 -8.47728622e+04 
# -5.67810971e+04 2.94638384e+04 1.60272961e+04 -5.71555266e+03 
# -2.10880344e+03 5.92090729e+02 1.03986456e+02 -2.55313741e+01] 
#Ridge Coefs: [ 0.00000000e+00 2.69625315e+04 3.20058010e+04 -8.23455994e+04 
# -7.67529485e+04 1.27831360e+05 9.61619464e+04 -8.47728622e+04 
# -5.67810971e+04 2.94638384e+04 1.60272961e+04 -5.71555266e+03 
# -2.10880344e+03 5.92090729e+02 1.03986456e+02 -2.55313741e+01] 

Thanks, but setting alpha to a very small positive float did not fix it. See the added code and the plot it generates. – spacegoliath


Scaling fixed it, thanks! Interesting how it overflows for ridge but not for ordinary least squares. That probably happens when taking the norm of the coefficients, which are huge when computed on the unscaled data. – spacegoliath
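The overflow intuition in this comment can be illustrated by comparing the conditioning of the degree-15 design matrix before and after scaling. This is a sketch with made-up square-footage values; the magnitudes mimic the question's data, not the actual CSV:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, scale

# Hypothetical stand-in for the sqft_living column (values in the thousands).
X_raw = np.array([[800.0], [1500.0], [2400.0], [3600.0], [5000.0]])

# Degree-15 features of unscaled square footage reach roughly 5000**15
# (about 3e55), so the design matrix is catastrophically ill-conditioned.
poly = PolynomialFeatures(degree=15)
cond_raw = np.linalg.cond(poly.fit_transform(X_raw))

# After standardizing, the features stay O(1) and the conditioning
# improves by dozens of orders of magnitude.
cond_scaled = np.linalg.cond(poly.fit_transform(scale(X_raw)))

print(cond_raw, cond_scaled)
```

With the raw data, any solver that forms or factors this matrix loses essentially all floating-point precision; after scale() the features stay moderate and the two estimators can agree.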
