
I have adapted the gradient descent algorithm below to regress the y variable stored in data[,4] on the x variable stored in data[,1]. However, the gradient descent appears to be diverging. I would appreciate help identifying where I have gone wrong. Why is my gradient descent regression in R failing?

# define the sum of squared residuals (mean squared error over the 200 rows)
ssquares <- function(x) {
    t <- 0
    for (i in 1:200) {
        t <- t + (data[i,4] - x[1] - x[2]*data[i,1])^2
    }
    t/200
}

# define the partial derivatives with respect to the intercept and slope
derivative <- function(x) {
    t1 <- 0
    for (i in 1:200) {
        t1 <- t1 - 2*(data[i,4] - x[1] - x[2]*data[i,1])
    }
    t2 <- 0
    for (i in 1:200) {
        t2 <- t2 - 2*data[i,1]*(data[i,4] - x[1] - x[2]*data[i,1])
    }
    c(t1/200, t2/200)
}

# definition of the gradient descent method in 2D
gradient_descent <- function(func, derv, start, step=0.05, tol=1e-8) {
    pt1 <- start
    grdnt <- derv(pt1)
    pt2 <- c(pt1[1] - step*grdnt[1], pt1[2] - step*grdnt[2])
    while (abs(func(pt1) - func(pt2)) > tol) {
        pt1 <- pt2
        grdnt <- derv(pt1)
        pt2 <- c(pt1[1] - step*grdnt[1], pt1[2] - step*grdnt[2])
        print(func(pt2)) # print progress
    }
    pt2 # return the last point
}

# locate the minimum of the function using the Gradient Descent method 
result <- gradient_descent(
    ssquares, # the function to optimize 
    derivative, # the gradient of the function 
    c(1,1), # start point of the search 
    0.05, # step size (alpha) 
    1e-8) # relative tolerance for one step 

# display a summary of the results 
print(result) # coordinates of function minimum 
print(ssquares(result)) # response of function minimum 

Can you share your data? –


Here is the dataset I am using: http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv, loaded with data <- read.csv("Advertising.csv")[,-1] – esperanto


Divergence in gradient descent usually indicates that the learning rate is too high, so reduce the learning rate (alpha) to converge. –
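
(A minimal sketch of that suggestion, using the asker's own functions; the 0.00001 step size is the value the answer below ends up using for this dataset, quoted here as a starting point rather than a tuned constant:)

data <- read.csv("Advertising.csv")[,-1] # drop the row-names column
# same call as before, with a much smaller alpha so each update shrinks
result <- gradient_descent(ssquares, derivative, c(1,1),
                           0.00001, # reduced step size (alpha)
                           1e-8)    # tolerance unchanged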

Answer


You can vectorize the objective/gradient functions for a much faster implementation (the matrix product cbind(1, data[,1]) %*% x replaces the explicit loop over rows). As you can see, it actually converges on randomly generated data, and the coefficients it finds are quite close to those obtained with lm() in R:

# vectorized sum of squared residuals (mean squared error)
ssquares <- function(x) {
    n <- nrow(data) # 200
    sum((data[,4] - cbind(1, data[,1]) %*% x)^2)/n
}

# define the derivatives, vectorized via the design matrix cbind(1, x)
derivative <- function(x) {
    n <- nrow(data) # 200
    c(sum(-2*(data[,4] - cbind(1, data[,1]) %*% x)),
      sum(-2*(data[,1])*(data[,4] - cbind(1, data[,1]) %*% x)))/n
}

set.seed(1) 
data <- matrix(rnorm(800), nrow=200) # random data used for the first run below

# locate the minimum of the function using the Gradient Descent method 
result <- gradient_descent(
    ssquares, # the function to optimize 
    derivative, # the gradient of the function 
    c(1,1), # start point of the search 
    0.05, # step size (alpha) 
    1e-8) # relative tolerance for one step 

# [1] 2.511904 
# [1] 2.263448 
# [1] 2.061456 
# [1] 1.89721 
# [1] 1.763634 
# [1] 1.654984 
# [1] 1.566592 
# [1] 1.494668 
# ... 

# display a summary of the results 
print(result) # coefficients obtained with gradient descent 
#[1] -0.10248356 0.08068382 

lm(data[,4]~data[,1])$coef # coefficients from R lm() 
# (Intercept) data[, 1] 
# -0.10252181 0.08045722 

# use the new dataset; this time it takes quite some time to converge, but 
# the values GD converges to are pretty accurate, as you can see below. 
data <- read.csv('Advertising.csv')[,-1] # advertising data, dropping the first row-names column 

# locate the minimum of the function using the Gradient Descent method 
result <- gradient_descent(
    ssquares, # the function to optimize 
    derivative, # the gradient of the function 
    c(1,1), # start point of the search 
    0.00001, # step size (alpha), decreasing the learning rate 
    1e-8) # relative tolerance for one step 

# ... 
# [1] 10.51364 
# [1] 10.51364 
# [1] 10.51364 

print(result) # coordinates of function minimum 
# [1] 6.97016852 0.04785365 

lm(data[,4]~data[,1])$coef # coefficients from R lm() 
# (Intercept)   data[, 1] 
#  7.03259355  0.04753664 

Thanks for the tip. I did that, but it still diverges on the particular dataset I am using, which I attached above. I tried scaling the data, and it seems to work for some scaling parameters. But is there a general rule I should follow? Is there a "more convergent" optimization procedure? – esperanto
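
(For reference, the scaling experiment described above might look like the minimal sketch below; the as.numeric(scale(...)) call and the 0.05 step are illustrative assumptions, not code from the original exchange:)

# illustrative: standardize the predictor so its gradient is on a scale
# comparable to the intercept's, which lets a larger step size converge;
# ssquares/derivative read the global data, so overwrite column 1 in place
data[,1] <- as.numeric(scale(data[,1])) # zero mean, unit variance
result_scaled <- gradient_descent(ssquares, derivative, c(1,1), 0.05, 1e-8)
# note: the slope is now per standard deviation of x, not per original unit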


The learning rate (alpha, or step size) you are using is very high for this training dataset, so it does not converge. With a learning rate of 0.00001 it takes a while, but it does converge eventually. (You may also want to loosen the convergence tolerance a little, rather than keeping it as low as 1e-8, to make convergence faster.) –
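
(A minimal sketch of that tolerance suggestion; the 1e-5 value is an illustrative choice, not a tuned one:)

# same call as in the answer, but with a looser stopping tolerance so the
# while loop exits sooner; convergence is coarser but considerably faster
result <- gradient_descent(ssquares, derivative, c(1,1),
                           0.00001, # small alpha for the advertising data
                           1e-5)    # looser tolerance than 1e-8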


@esperanto You do not need to scale your data. I have updated the code with the advertising data; it converges with a much smaller alpha. Let me know if you face any more problems. –