Variable selection in mgcv

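Is there a way to automate variable selection for a GAM in R, similar to step? I have read the documentation for step.gam and selection.gam, but I have yet to see an answer with code that works. I have also tried method = "REML" and select = TRUE, but neither removes insignificant variables from the model.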

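I have theorized that I could fit a stepwise model first and then use the variables it retains to build the GAM, but that does not seem computationally efficient.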
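One way to picture the two-stage idea described above is the sketch below. It is only an illustration under stated assumptions: the data frame `toy` and its columns are made up, and `step()` searches over linear terms only, so a predictor whose effect is purely nonlinear could be dropped in stage 1.

```r
library(mgcv)

set.seed(0)
# Hypothetical toy data (an assumption for illustration): only x1 affects y
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
toy$y <- 2 * toy$x1 + rnorm(200)

# Stage 1: AIC-based stepwise selection on a purely linear model
lin_full <- lm(y ~ x1 + x2 + x3, data = toy)
lin_step <- step(lin_full, trace = 0)
kept <- attr(terms(lin_step), "term.labels")

# Stage 2: fit a GAM with smooths of the retained predictors only
gam_formula <- reformulate(sprintf("s(%s)", kept), response = "y")
gam_fit <- gam(gam_formula, data = toy, method = "REML")
summary(gam_fit)
```

This needs two fits instead of one, and the linear screening step can disagree with what the smooths would have picked up, which is part of why it feels unsatisfying.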

Example:

library(mgcv) 

set.seed(0) 
dat <- data.frame(rsp = rnorm(100, 0, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1),
                  pred4 = rnorm(100, 0, 1))

model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
             data = dat, method = "REML", select = TRUE)

summary(model) 

#Family: gaussian 
#Link function: identity 

#Formula: 
#rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4) 

#Parametric coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.02267    0.08426   0.269    0.788

#Approximate significance of smooth terms:
#            edf Ref.df     F p-value
#s(pred1) 0.8770      9 0.212  0.1174
#s(pred2) 1.8613      9 0.638  0.0374 *
#s(pred3) 0.5439      9 0.133  0.1406
#s(pred4) 0.4504      9 0.091  0.1775
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#R-sq.(adj) = 0.0887 Deviance explained = 12.3% 
#-REML = 129.06 Scale est. = 0.70996 n = 100


2016-07-25 IJH

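In some of the other datasets I work with, I have upwards of 10 variables (which I realize is not that many, statistically speaking), and I would like to drop some of them without much loss in predictive power. – IJH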

I'm voting to close this question as off-topic because it is not about programming but about statistics (model selection). –

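Marra and Wood (2011, Computational Statistics and Data Analysis 55: 2372-2387) compare various approaches to feature selection in GAMs. They concluded that an additional penalty term in the smoothness selection procedure gave the best results. This can be activated in mgcv::gam() with the select = TRUE argument, or with any of the following variations: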

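# "ts": thin plate regression splines with shrinkage
model <- gam(rsp ~ s(pred1, bs = "ts") + s(pred2, bs = "ts") + s(pred3, bs = "ts") + s(pred4, bs = "ts"), data = dat, method = "REML")
# "cr": cubic regression splines, with the extra penalty switched on via select = TRUE
model <- gam(rsp ~ s(pred1, bs = "cr") + s(pred2, bs = "cr") + s(pred3, bs = "cr") + s(pred4, bs = "cr"),
             data = dat, method = "REML", select = TRUE)
# "cc": cyclic cubic regression splines (mgcv documents "cs", not "cc", as the shrinkage form of "cr")
model <- gam(rsp ~ s(pred1, bs = "cc") + s(pred2, bs = "cc") + s(pred3, bs = "cc") + s(pred4, bs = "cc"),
             data = dat, method = "REML")
# "tp": thin plate regression splines, the default basis (its shrinkage form is "ts")
model <- gam(rsp ~ s(pred1, bs = "tp") + s(pred2, bs = "tp") + s(pred3, bs = "tp") + s(pred4, bs = "tp"), data = dat, method = "REML")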
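Note that the extra penalty never deletes a term from the formula. It lets a smooth be shrunk towards zero, which shows up as an effective degrees of freedom (edf) value close to 0 in the summary. A rough sketch of turning that into an explicit selection step follows; the simulated data `sim`, the variable names and the cutoff `edf_cutoff` are all assumptions made for illustration, not recommended values.

```r
library(mgcv)

set.seed(1)
n <- 200
# Hypothetical data (an assumption): only x1 is related to y
sim <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
sim$y <- sin(2 * pi * sim$x1) + rnorm(n, sd = 0.3)

# Fit with the additional selection penalty on every smooth
fit <- gam(y ~ s(x1) + s(x2) + s(x3), data = sim,
           method = "REML", select = TRUE)

# Effective degrees of freedom per smooth term, named "s(x1)", "s(x2)", ...
edf <- summary(fit)$s.table[, "edf"]
print(round(edf, 3))

# Arbitrary illustrative cutoff: treat terms below it as shrunk out
edf_cutoff <- 0.5
keep <- names(edf)[edf > edf_cutoff]

# Refit using only the retained smooths
refit <- gam(reformulate(keep, response = "y"), data = sim, method = "REML")
summary(refit)
```

Whether a noise term actually falls below the cutoff depends on the data: in the question's own output s(pred2) keeps an edf of about 1.86, so the edf values are worth reading alongside the p-values rather than treated as a hard rule.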


2016-07-25 18:38:05

What does adding 'bs =' to the model do? – IJH
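For context, bs is the argument of s() that chooses the basis used for the smooth: "tp" (thin plate regression spline) is the default, and "ts" is its shrinkage version, in which the otherwise unpenalized part of the smooth is penalized too. A small sketch of the difference, using made-up data in which y is unrelated to x (the data frame `noise` is an assumption for illustration):

```r
library(mgcv)

set.seed(2)
# Hypothetical data (an assumption): y is pure noise with respect to x
noise <- data.frame(x = rnorm(150), y = rnorm(150))

# Same model, two bases: the default and its shrinkage counterpart
fit_tp <- gam(y ~ s(x, bs = "tp"), data = noise, method = "REML")
fit_ts <- gam(y ~ s(x, bs = "ts"), data = noise, method = "REML")

# Effective degrees of freedom of s(x); the first coefficient is the intercept
edf_tp <- sum(fit_tp$edf[-1])
edf_ts <- sum(fit_ts$edf[-1])
print(c(tp = edf_tp, ts = edf_ts))
```

With "tp" the linear part of the smooth is not penalized, so the term cannot drop below about 1 edf; with "ts" it can be shrunk further towards 0.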
