
When the alpha parameter is close to zero, the Tikhonov (ridge) cost becomes equal to the least-squares cost. Everything in the scikit-learn docs on the subject says the same. So I expected

sklearn.linear_model.Ridge(alpha=1e-100).fit(data, target) 

to be equivalent to

sklearn.linear_model.LinearRegression().fit(data, target) 

But it is not. Why?
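For reference, this is my paraphrase of the ridge objective as stated in the scikit-learn user guide:

```latex
\min_{w}\; \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_2^2
```

As \(\alpha \to 0\) the penalty term vanishes and the objective reduces to the ordinary least-squares cost \(\lVert X w - y \rVert_2^2\), which is what motivates the expectation above.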

Update, with code:

import pandas as pd 
from sklearn.linear_model import Ridge, LinearRegression 
from sklearn.preprocessing import PolynomialFeatures 
import matplotlib.pyplot as plt 
%matplotlib inline 

dataset = pd.read_csv('house_price_data.csv') 

X = dataset['sqft_living'].values.reshape(-1, 1) 
Y = dataset['price'].values.reshape(-1, 1) 

polyX = PolynomialFeatures(degree=15).fit_transform(X) 

model1 = LinearRegression().fit(polyX, Y) 
model2 = Ridge(alpha=1e-100).fit(polyX, Y) 

plt.plot(X, Y,'.', 
     X, model1.predict(polyX),'g-', 
     X, model2.predict(polyX),'r-') 

Note: the plot looks the same for alpha=1e-8 and alpha=1e-100.

[Plot: the data points with the LinearRegression fit (green) and the Ridge fit (red)]

Answer


According to the documentation, alpha must be a positive float. Your example has alpha=0 as an integer. Using a small positive alpha, the results of Ridge and LinearRegression appear to converge.

from sklearn.linear_model import Ridge, LinearRegression 
data = [[0, 0], [1, 1], [2, 2]] 
target = [0, 1, 2] 

ridge_model = Ridge(alpha=1e-8).fit(data, target) 
print("RIDGE COEFS: " + str(ridge_model.coef_)) 
ols = LinearRegression().fit(data,target) 
print("OLS COEFS: " + str(ols.coef_)) 

# RIDGE COEFS: [ 0.49999999 0.50000001] 
# OLS COEFS: [ 0.5 0.5] 
# 
# VS. with alpha=0: 
# RIDGE COEFS: [ 1.57009246e-16 1.00000000e+00] 
# OLS COEFS: [ 0.5 0.5] 
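One plausible reading of why alpha=0 misbehaves on this particular toy data (a sketch of my own, not something the answer states): the two feature columns are identical, so the least-squares problem is rank-deficient and has infinitely many exact solutions. LinearRegression's least-squares solver returns the minimum-norm solution, while solving the singular ridge system at alpha=0 can land on a different, equally valid one.

```python
import numpy as np

# The two feature columns in the toy data are identical, so X'X is
# singular and the least-squares solution is not unique.
X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0.0, 1.0, 2.0])

# Rank 1 < 2 confirms the collinearity.
print(np.linalg.matrix_rank(X))

# np.linalg.lstsq returns the minimum-norm solution, which splits the
# weight evenly between the duplicated columns: [0.5, 0.5].
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

Any w with w1 + w2 = 1 fits this data exactly, so the [0, 1] returned by Ridge(alpha=0) is also a valid fit; it just is not the minimum-norm one that OLS reports.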

UPDATE: The alpha=0-as-an-int issue above only seems to be a problem with a few toy problems like the example above.

For the housing data, the problem is one of scaling. The degree-15 polynomial you invoke causes numerical overflow. To produce identical results from LinearRegression and Ridge, try scaling your data first:

import pandas as pd 
from sklearn.linear_model import Ridge, LinearRegression 
from sklearn.preprocessing import PolynomialFeatures, scale 

dataset = pd.read_csv('house_price_data.csv') 

# scale the X data to prevent numerical errors. 
X = scale(dataset['sqft_living'].values.reshape(-1, 1)) 
Y = dataset['price'].values.reshape(-1, 1) 

polyX = PolynomialFeatures(degree=15).fit_transform(X) 

model1 = LinearRegression().fit(polyX, Y) 
model2 = Ridge(alpha=0).fit(polyX, Y) 

print("OLS Coefs: " + str(model1.coef_[0])) 
print("Ridge Coefs: " + str(model2.coef_[0])) 

#OLS Coefs: [ 0.00000000e+00 2.69625315e+04 3.20058010e+04 -8.23455994e+04 
# -7.67529485e+04 1.27831360e+05 9.61619464e+04 -8.47728622e+04 
# -5.67810971e+04 2.94638384e+04 1.60272961e+04 -5.71555266e+03 
# -2.10880344e+03 5.92090729e+02 1.03986456e+02 -2.55313741e+01] 
#Ridge Coefs: [ 0.00000000e+00 2.69625315e+04 3.20058010e+04 -8.23455994e+04 
# -7.67529485e+04 1.27831360e+05 9.61619464e+04 -8.47728622e+04 
# -5.67810971e+04 2.94638384e+04 1.60272961e+04 -5.71555266e+03 
# -2.10880344e+03 5.92090729e+02 1.03986456e+02 -2.55313741e+01] 

Thanks, but setting alpha to a very small positive float did not fix it. See the added code and the plot it generates. – spacegoliath


Scaling fixed it, thanks! Interesting how it overflows for ridge but not for ordinary least squares. That probably happens when taking the norm of the coefficients, which are huge when computed on the unscaled data. – spacegoliath
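The overflow intuition in this comment can be illustrated by comparing the conditioning of the degree-15 design matrix before and after scaling. This is a sketch with made-up square-footage values; the magnitudes mimic the question's data, not the actual CSV:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, scale

# Hypothetical stand-in for the sqft_living column (values in the thousands).
X_raw = np.array([[800.0], [1500.0], [2400.0], [3600.0], [5000.0]])

# Degree-15 features of unscaled square footage reach roughly 5000**15
# (about 3e55), so the design matrix is catastrophically ill-conditioned.
poly = PolynomialFeatures(degree=15)
cond_raw = np.linalg.cond(poly.fit_transform(X_raw))

# After standardizing, the features stay O(1) and the conditioning
# improves by dozens of orders of magnitude.
cond_scaled = np.linalg.cond(poly.fit_transform(scale(X_raw)))

print(cond_raw, cond_scaled)
```

With the raw data, any solver that forms or factors this matrix loses essentially all floating-point precision; after scale() the features stay moderate and the two estimators can agree.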
