2015-06-20 112 views
1

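Logistic regression with caret and glmnet in R

I am trying to fit a logistic regression model to my data, using glmnet (for the lasso) and caret (for k-fold cross-validation). I have tried two different syntaxes, but both throw an error: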

fitControl <- trainControl(method = "repeatedcv", 
         number = 10, 
         repeats = 3, 
         verboseIter = TRUE) 

# with response as an integer (0/1) 
fit_logistic <- train(response ~., 
        data = df_without, 
        method = "glmnet", 
        trControl = fitControl, 
        family = "binomial") 

Error in cut.default(y, breaks, include.lowest = TRUE) : 
invalid number of intervals 

df_without$response <- as.factor(df_without$response) 
# with response as a factor 
fit_logistic <- train(as.matrix(df_without[1:47]), df_without$response, 
       method = "glmnet", 
       trControl = fitControl, 
       family = "binomial") 

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : 
    NA/NaN/Inf in foreign function call (arg 5) 
In addition: Warning message: 
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : 
    NAs introduced by coercion 

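Do I need to convert my data frame to a matrix or not?

Does my response variable need to be a factor, or just a 0/1 integer?

The .Rdata file with the df_without data frame is here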

sessionInfo()

R version 3.2.0 (2015-04-16) 
Platform: x86_64-apple-darwin13.4.0 (64-bit) 
Running under: OS X 10.10.1 (Yosemite) 

locale: 
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 

attached base packages: 
[1] parallel splines stats  graphics grDevices utils   datasets methods base  

other attached packages: 
[1] e1071_1.6-4  plyr_1.8.2  gbm_2.1.1  survival_2.38-1  glmnet_2.0-2 foreach_1.4.2 
[7] Matrix_1.2-0 caret_6.0-47 ggplot2_1.0.1 lattice_0.20-31  lubridate_1.3.3 RJDBC_0.2-5  
[13] rJava_0.9-6  DBI_0.3.1  

loaded via a namespace (and not attached): 
[1] Rcpp_0.11.6   compiler_3.2.0  nloptr_1.0.4   class_7.3-12  iterators_1.0.7  
[6] tools_3.2.0   digest_0.6.8  lme4_1.1-7    memoise_0.2.1  nlme_3.1-120  
[11] gtable_0.1.2  mgcv_1.8-6   brglm_0.5-9    SparseM_1.6   proto_0.3-10  
[16] BradleyTerry2_1.0-6 stringr_1.0.0  gtools_3.5.0   grid_3.2.0   nnet_7.3-9   
[21] minqa_1.2.4   reshape2_1.4.1  car_2.0-25    magrittr_1.5  scales_0.2.4  
[26] codetools_0.2-11 MASS_7.3-40   pbkrtest_0.4-2   colorspace_1.2-6 quantreg_5.11  
[31] stringi_0.4-1  munsell_0.4.2 

Answers

0

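The problem is that you have continuous variables in your data set. glmnet needs binary variables to be coded as factors.

If you run the first block of code with only some non-continuous variables selected, you will see that it runs as expected.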

+0

Surely glmnet, like any other regression method, works fine with continuous variables.
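A minimal sketch of the point made in this comment, on simulated data (the names `x`, `y`, and `fit` are made up for illustration and have nothing to do with `df_without`): glmnet fits a binomial lasso on a matrix of purely continuous predictors.

```r
library(glmnet)

set.seed(1)
n <- 200

# Five continuous predictors in a numeric matrix (simulated, illustrative only)
x <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(NULL, paste0("x", 1:5)))

# Binary response as a two-level factor, generated from x1 and x2
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))

# alpha = 1 requests the lasso penalty
fit <- glmnet(x, y, family = "binomial", alpha = 1)
```

If this runs on your installation, continuous predictors by themselves are unlikely to be what triggers the errors in the question.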

1

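I had the same problem, and I used the function model.matrix to fix the encoding of my categorical variables.

Try this as the x argument for glmnet:

as.matrix(model.matrix(response ~ ., data = df_without)[, -1]) 

I remove the intercept column because glmnet includes an intercept by default.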
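The approach in this answer can be sketched end to end on simulated data. Everything named `df_demo`, `bad`, `x`, `y`, and `ctrl` below is hypothetical and only stands in for the asker's `df_without`. The sketch assumes, without having seen the real data, that the "NAs introduced by coercion" warning comes from non-numeric columns: `as.matrix()` on a data frame containing a factor returns a character matrix, whereas `model.matrix()` expands factors into numeric dummy columns.

```r
library(caret)  # the glmnet package must also be installed

set.seed(42)
n <- 300

# Simulated stand-in for df_without: one continuous and one categorical predictor
df_demo <- data.frame(
  response = rbinom(n, 1, 0.4),
  age      = rnorm(n, mean = 50, sd = 10),
  group    = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# as.matrix() on mixed columns yields a character matrix, which glmnet cannot use
bad <- as.matrix(df_demo[, c("age", "group")])
is.character(bad)

# model.matrix() dummy-codes the factor; drop the intercept column
x <- model.matrix(response ~ ., data = df_demo)[, -1]

# Two-level factor response, so caret treats the task as classification
y <- factor(df_demo$response, levels = c(0, 1), labels = c("no", "yes"))

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

fit <- train(x = x, y = y,
             method = "glmnet",
             family = "binomial",
             trControl = ctrl)
```

Passing a factor response is what makes caret run classification instead of regression. The `"no"`/`"yes"` labels are a choice made for this sketch so that the level names are valid R names.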