
I have adapted the gradient descent algorithm below to regress the y variable stored in data[,4] on the x variable stored in data[,1]. However, the gradient descent appears to be diverging. I would appreciate help identifying where I have gone wrong. Why is my gradient descent regression in R failing?

# define the sum of squared residuals (mean squared error over the 200 rows)
ssquares <- function(x) {
    t <- 0
    for (i in 1:200) {
        t <- t + (data[i,4] - x[1] - x[2]*data[i,1])^2
    }
    t/200
}

# define the partial derivatives with respect to the intercept and slope
derivative <- function(x) {
    t1 <- 0
    for (i in 1:200) {
        t1 <- t1 - 2*(data[i,4] - x[1] - x[2]*data[i,1])
    }
    t2 <- 0
    for (i in 1:200) {
        t2 <- t2 - 2*data[i,1]*(data[i,4] - x[1] - x[2]*data[i,1])
    }
    c(t1/200, t2/200)
}

# definition of the gradient descent method in 2D
gradient_descent <- function(func, derv, start, step=0.05, tol=1e-8) {
    pt1 <- start
    grdnt <- derv(pt1)
    pt2 <- c(pt1[1] - step*grdnt[1], pt1[2] - step*grdnt[2])
    while (abs(func(pt1) - func(pt2)) > tol) {
        pt1 <- pt2
        grdnt <- derv(pt1)
        pt2 <- c(pt1[1] - step*grdnt[1], pt1[2] - step*grdnt[2])
        print(func(pt2)) # print progress
    }
    pt2 # return the last point
}

# locate the minimum of the function using the Gradient Descent method 
result <- gradient_descent(
    ssquares, # the function to optimize 
    derivative, # the gradient of the function 
    c(1,1), # start point of the search 
    0.05, # step size (alpha) 
    1e-8) # relative tolerance for one step 

# display a summary of the results 
print(result) # coordinates of function minimum 
print(ssquares(result)) # response of function minimum 

Can you share your data? –


Here is the dataset I am using: http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv, loaded with data <- read.csv("Advertising.csv")[,-1] – esperanto


Divergence in gradient descent usually indicates that the learning rate is too high, so reduce the learning rate (alpha) to converge. –
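
(A minimal sketch of that suggestion, using the asker's own functions; the 0.00001 step size is the value the answer below ends up using for this dataset, quoted here as a starting point rather than a tuned constant:)

data <- read.csv("Advertising.csv")[,-1] # drop the row-names column
# same call as before, with a much smaller alpha so each update shrinks
result <- gradient_descent(ssquares, derivative, c(1,1),
                           0.00001, # reduced step size (alpha)
                           1e-8)    # tolerance unchanged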

Answer


You can vectorize the objective/gradient functions for a much faster implementation (the matrix product cbind(1, data[,1]) %*% x replaces the explicit loop over rows). As you can see, it actually converges on randomly generated data, and the coefficients it finds are quite close to those obtained with lm() in R:

# vectorized sum of squared residuals (mean squared error)
ssquares <- function(x) {
    n <- nrow(data) # 200
    sum((data[,4] - cbind(1, data[,1]) %*% x)^2)/n
}

# define the derivatives, vectorized via the design matrix cbind(1, x)
derivative <- function(x) {
    n <- nrow(data) # 200
    c(sum(-2*(data[,4] - cbind(1, data[,1]) %*% x)),
      sum(-2*(data[,1])*(data[,4] - cbind(1, data[,1]) %*% x)))/n
}

set.seed(1) 
data <- matrix(rnorm(800), nrow=200) # random data used for the first run below

# locate the minimum of the function using the Gradient Descent method 
result <- gradient_descent(
    ssquares, # the function to optimize 
    derivative, # the gradient of the function 
    c(1,1), # start point of the search 
    0.05, # step size (alpha) 
    1e-8) # relative tolerance for one step 

# [1] 2.511904 
# [1] 2.263448 
# [1] 2.061456 
# [1] 1.89721 
# [1] 1.763634 
# [1] 1.654984 
# [1] 1.566592 
# [1] 1.494668 
# ... 

# display a summary of the results 
print(result) # coefficients obtained with gradient descent 
#[1] -0.10248356 0.08068382 

lm(data[,4]~data[,1])$coef # coefficients from R lm() 
# (Intercept) data[, 1] 
# -0.10252181 0.08045722 

# use the new dataset; this time it takes quite some time to converge, but 
# the values GD converges to are pretty accurate, as you can see below. 
data <- read.csv('Advertising.csv')[,-1] # advertising data, dropping the first row-names column 

# locate the minimum of the function using the Gradient Descent method 
result <- gradient_descent(
    ssquares, # the function to optimize 
    derivative, # the gradient of the function 
    c(1,1), # start point of the search 
    0.00001, # step size (alpha), decreasing the learning rate 
    1e-8) # relative tolerance for one step 

# ... 
# [1] 10.51364 
# [1] 10.51364 
# [1] 10.51364 

print(result) # coordinates of function minimum 
# [1] 6.97016852 0.04785365 

lm(data[,4]~data[,1])$coef # coefficients from R lm() 
# (Intercept)   data[, 1] 
#  7.03259355  0.04753664 

Thanks for the tip. I did that, but it still diverges on the particular dataset I am using, which I attached above. I tried scaling the data, and it seems to work for some scaling parameters. But is there a general rule I should follow? Is there a "more convergent" optimization procedure? – esperanto
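
(For reference, the scaling experiment described above might look like the minimal sketch below; the as.numeric(scale(...)) call and the 0.05 step are illustrative assumptions, not code from the original exchange:)

# illustrative: standardize the predictor so its gradient is on a scale
# comparable to the intercept's, which lets a larger step size converge;
# ssquares/derivative read the global data, so overwrite column 1 in place
data[,1] <- as.numeric(scale(data[,1])) # zero mean, unit variance
result_scaled <- gradient_descent(ssquares, derivative, c(1,1), 0.05, 1e-8)
# note: the slope is now per standard deviation of x, not per original unit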


The learning rate (alpha, or step size) you are using is very high for this training dataset, so it does not converge. With a learning rate of 0.00001 it takes a while, but it does converge eventually. (You may also want to loosen the convergence tolerance a little, rather than keeping it as low as 1e-8, to make convergence faster.) –
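
(A minimal sketch of that tolerance suggestion; the 1e-5 value is an illustrative choice, not a tuned one:)

# same call as in the answer, but with a looser stopping tolerance so the
# while loop exits sooner; convergence is coarser but considerably faster
result <- gradient_descent(ssquares, derivative, c(1,1),
                           0.00001, # small alpha for the advertising data
                           1e-5)    # looser tolerance than 1e-8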


@esperanto You do not need to scale your data. I have updated the code with the advertising data; it converges with a much smaller alpha. Let me know if you face any more problems. –