How to force a zero intercept in linear regression?

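Apologies if this question has already been answered; I had a look and couldn't find anything specific. How can I force a zero intercept in a linear regression?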

I have some more or less linear data of the form

x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 40.0, 60.0, 80.0] 
y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001, 3.1746209003398689, 4.2905482471260044, 6.2816226678076958, 11.073788414382639, 23.248479770546009, 32.120462301367183, 44.036117671229206, 54.009003143831116, 102.7077685684846, 185.72880217806673, 256.12183145545811, 301.97120103079675]

I am using scipy.optimize.leastsq to fit a linear regression to this:

import numpy
import scipy.optimize

def lin_fit(x, y):
    '''Fits a linear fit of the form mx+b to the data'''
    fitfunc = lambda params, x: params[0] * x + params[1]  # fitting function of the form mx+b
    errfunc = lambda p, x, y: fitfunc(p, x) - y  # error function for the least-squares fit

    init_a = 0.5     # initial value for a (gradient)
    init_b = min(y)  # initial value for b (y-axis intersection)
    init_p = numpy.array((init_a, init_b))  # bundle the initial values into the initial parameters

    # calculate the best-fitting parameters (i.e. m and b) using the error function
    p1, success = scipy.optimize.leastsq(errfunc, init_p.copy(), args=(x, y))
    f = fitfunc(p1, x)  # create a fit with those parameters
    return p1, f

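It works beautifully (although I'm not sure whether scipy.optimize is the right thing to use here; it might be a bit over the top?).

However, because of the way the data points lie, it does not give me a y-axis intercept of 0. I know that it has to be zero in this case: if x = 0 then y = 0.

Is there any way I can force this?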


2012-04-03 Kyra Tafar

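If you know your intercept is 0, why do you keep it as a free parameter in the function you fit? Could you just remove 'b' as a free parameter? – Jdog 2012-04-03 09:51:09

Ah. Yes. Of course! I'm sorry, that was a really obvious answer. Sometimes I can't see the wood for the trees :-/ This works fine. Thank you very much for pointing it out to me! – 2012-04-03 10:51:27

I just saw the plot of the data in the answer. Unrelated to the question, you should try fitting a second-order polynomial. Usually one can say that the intercept is null if it is of the same order as its error, and I think that with a parabolic fit you will get that. – chuse 2014-02-14 15:11:15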
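One way to act on the suggestion in the comment above, sketched under assumptions: fit a second-order polynomial with numpy.polyfit and compare the fitted intercept with its standard error, taken from the covariance matrix that polyfit returns when cov=True. The "within about two standard errors" threshold is an illustrative rule of thumb and is not something stated in the thread.

```python
import numpy as np

x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0])
y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])

# Fit y = c2*x**2 + c1*x + c0. With cov=True, polyfit also returns the
# covariance matrix of the coefficient estimates (highest power first).
coeffs, cov = np.polyfit(x, y, 2, cov=True)
c2, c1, c0 = coeffs
c0_err = np.sqrt(cov[2, 2])  # standard error of the intercept c0

# Assumed rule of thumb: treat the intercept as compatible with zero
# if it lies within about two standard errors of zero.
intercept_is_negligible = abs(c0) < 2 * c0_err
print("intercept = %.3f +/- %.3f" % (c0, c0_err))
print("compatible with zero:", intercept_is_negligible)
```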

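I am not adept at these modules, but I have some experience in statistics, so here is what I see. You need to change your fit function from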

fitfunc = lambda params, x: params[0] * x + params[1]

to:

fitfunc = lambda params, x: params[0] * x

Also remove the line:

init_b = min(y)

and change the next line to:

init_p = numpy.array((init_a))

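This should get rid of the second parameter, which produces the y-intercept, and force the fitted line through the origin. You may need to make a few more minor changes in the rest of your code.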

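However, I am not sure whether the module will still work if you simply pluck out the second parameter like this. It depends on whether the internal workings of the module can accept this modification. For example, I don't know where the parameter list params is being initialized, so I don't know whether this change alone will alter its length.

As an aside, since you mentioned it, I actually think this is an over-the-top way to optimize just a slope. You could read up a little on linear regression and write a small piece of code to do it yourself after some back-of-the-envelope calculus. It is pretty simple and straightforward, really. In fact, I just did some calculations, and I think the optimal slope is simply <xy>/<x^2>, i.e. the mean of the x*y products divided by the mean of the x^2 values.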
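To check both points in this answer, here is a minimal sketch that assumes the question's data. The helper name lin_fit_origin is hypothetical, and the starting value is passed as a one-element list so that the parameter vector clearly has length 1. The closed form follows from setting the derivative of sum((a*x_i - y_i)^2) with respect to a to zero, which gives a = sum(x_i*y_i) / sum(x_i^2). Dividing numerator and denominator by the number of points gives the <xy>/<x^2> form.

```python
import numpy as np
from scipy.optimize import leastsq

x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0])
y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])

def lin_fit_origin(x, y, init_a=0.5):
    """Fit y = a*x (no intercept) with leastsq. Hypothetical helper name."""
    fitfunc = lambda params, x: params[0] * x    # model through the origin
    errfunc = lambda p, x, y: fitfunc(p, x) - y  # residuals for least squares
    init_p = np.array([init_a])                  # one-element parameter vector
    p1, ier = leastsq(errfunc, init_p, args=(x, y))
    return p1[0]

a_leastsq = lin_fit_origin(x, y)

# Closed-form least-squares slope for y = a*x.
a_closed = np.dot(x, y) / np.dot(x, x)
# Same quantity written as <xy>/<x^2> (the factors of 1/N cancel).
a_means = np.mean(x * y) / np.mean(x ** 2)

print(a_leastsq, a_closed, a_means)
```

The three values should agree to within the solver's tolerance, which suggests the iterative solver is not needed for a model this simple.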


2012-04-03 09:49:19

Thanks, this is exactly what I needed to do. :) – 2012-04-03 10:52:30

Indeed, as Abhranil wrote at the end, the exact solution of the least-squares fit of 'y = a*x' is simply 'a = x.dot(y)/x.dot(x)'. – divenex 2014-11-16 23:14:54

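As @AbhranilDas mentioned, just use a linear method. There is no need for a non-linear solver like scipy.optimize.leastsq.

Normally you would use numpy.polyfit to fit a line to your data, but in this case you need to use numpy.linalg.lstsq directly, because you want to fix the intercept at zero.

As a quick example: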

import numpy as np 
import matplotlib.pyplot as plt 

x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 
       20.0, 40.0, 60.0, 80.0]) 

y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001, 
       3.1746209003398689, 4.2905482471260044, 6.2816226678076958, 
       11.073788414382639, 23.248479770546009, 32.120462301367183, 
       44.036117671229206, 54.009003143831116, 102.7077685684846, 
       185.72880217806673, 256.12183145545811, 301.97120103079675]) 

# Our model is y = a * x, so things are quite simple, in this case... 
# x needs to be a column vector instead of a 1D vector for this, however. 
x = x[:,np.newaxis] 
a, _, _, _ = np.linalg.lstsq(x, y, rcond=None)  # rcond=None avoids a FutureWarning on some NumPy versions

plt.plot(x, y, 'bo') 
plt.plot(x, a*x, 'r-') 
plt.show()



2012-04-03 13:38:49

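Thanks. This is the answer I was looking for. I found another example showing how to use 'linalg.lstsq' with an intercept, which rounded out my understanding. To do that, replace 'x = x[:,np.newaxis]' with 'x = np.vstack([x, np.ones(len(x))]).T'. – Snorfalorpagus 2015-08-24 14:25:16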
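For comparison, the variant with an intercept described in the comment above might look like the following sketch. It assumes the question's data and uses the names A, m and b purely for illustration. The column of ones lets lstsq estimate the intercept, and numpy.polyfit with degree 1 is used only as a cross-check.

```python
import numpy as np

x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0])
y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])

# Design matrix with columns [x, 1], so the model is y = m*x + b.
A = np.vstack([x, np.ones(len(x))]).T  # shape (15, 2)
(m, b), _, _, _ = np.linalg.lstsq(A, y, rcond=None)

# Cross-check against polyfit, which returns the slope first, then the intercept.
m_pf, b_pf = np.polyfit(x, y, 1)

print("lstsq:   m = %.4f, b = %.4f" % (m, b))
print("polyfit: m = %.4f, b = %.4f" % (m_pf, b_pf))
```

Dropping the column of ones from A recovers the zero-intercept model used in the answer.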
