2016-11-18 49 views
How can I use dummy variables in caret without destroying the target variable? The output of caret's dummyVars does not include the target.

library(caret)
set.seed(5)
data <- ISLR::OJ 
data<-na.omit(data) 

dummies <- dummyVars(Purchase ~ ., data = data) 
data2 <- predict(dummies, newdata = data) 
split_factor = 0.5 
n_samples = nrow(data2) 
train_idx <- sample(seq_len(n_samples), size = floor(split_factor * n_samples)) 
train <- data2[train_idx, ] 
test <- data2[-train_idx, ] 
modelFit<- train(Purchase~ ., method='lda',preProcess=c('scale', 'center'), data=train) 

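This fails because the Purchase variable is missing. If I recode it beforehand with data$Purchase <- ifelse(data$Purchase == "CH", 1, 0), caret complains that this is no longer a classification problem but a regression problem.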

Can't you just do `data2$Purchase <- data$Purchase` afterwards? – mtoto

I tried that, but it seems to distort the resulting matrix. Is it possible to pass the caret dummy variables directly into train? As a pipeline? –

Answer

The example code has a few issues, which are pointed out in the comments of the code below. To answer your questions:

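  • The result of ifelse is a numeric vector, not a factor, so the train function defaults to regression.
  • To pass the dummyVars output directly, call train(x = , y = , ...) instead of using the formula interface.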
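The first point can be checked with base R alone. The following is a minimal sketch on a hypothetical toy vector (not the ISLR::OJ data) showing that recoding with ifelse drops the factor class, and that wrapping the result in factor() keeps the outcome categorical:

```r
# Toy stand-in for the Purchase column (hypothetical data, not ISLR::OJ)
purchase <- factor(c("CH", "MM", "CH", "MM", "CH"))

# ifelse() with numeric branches returns a plain numeric vector
as_number <- ifelse(purchase == "CH", 1, 0)
class(as_number)      # "numeric"
is.factor(as_number)  # FALSE

# Wrapping the recoded values in factor() keeps the outcome categorical
as_factor <- factor(ifelse(purchase == "CH", "CH", "other"))
is.factor(as_factor)  # TRUE
levels(as_factor)     # "CH" "other"
```

An outcome like as_factor is what a classification method such as lda expects; an outcome like as_number is treated as a regression target.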

To avoid these problems, check the class of your objects carefully:

library(caret)

set.seed(5)
data <- ISLR::OJ
data <- na.omit(data)

# Make sure that all variables that should be factors are defined as such
newFactorIndex <- c("StoreID", "SpecialCH", "SpecialMM", "STORE")
data[, newFactorIndex] <- lapply(data[, newFactorIndex], factor)

# See help for dummyVars. The function does not take a dependent variable and predict will give an error,
# so use a one-sided formula and drop the first column (Purchase)
dummies <- dummyVars(~ ., data = data[, -1])
# The output of predict is a matrix, change it to a data frame
data2 <- data.frame(predict(dummies, newdata = data[, -1]))

split_factor <- 0.5
n_samples <- nrow(data2)
train_idx <- sample(seq_len(n_samples), size = floor(split_factor * n_samples))

train <- data2[train_idx, ]
test <- data2[-train_idx, ]

# Option 1 (as asked): Specify independent and dependent variables separately
modelFit <- train(x = data2[train_idx, ], y = data[train_idx, "Purchase"],
                  method = "lda", preProcess = c("scale", "center"))

# Option 2: Append the dependent variable to the training predictors
# (needs to be a data frame to allow factor and numeric columns).
# Assign to the training subset so the row counts match; assigning the
# half-length vector to the full data2 would silently recycle it.
train$Purchase <- data$Purchase[train_idx]
modelFit <- train(Purchase ~ ., data = train,
                  method = "lda", preProcess = c("scale", "center"))
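
The code comments above note that the predict output is a matrix and has to become a data frame before the factor outcome is attached. A small base R sketch of why this matters, using hypothetical toy objects (the column names and values are made up for illustration):

```r
# Hypothetical toy objects standing in for the encoded predictors and the outcome
predictors <- matrix(c(1, 0, 0, 1, 2.5, 3.1), nrow = 2,
                     dimnames = list(NULL, c("StoreID.1", "StoreID.2", "PriceCH")))
outcome <- factor(c("CH", "MM"))

# A matrix holds a single type, so cbind() reduces the factor to its integer codes
as_matrix <- cbind(predictors, Purchase = outcome)
class(as_matrix[, "Purchase"])  # "numeric"; the labels CH and MM are gone

# A data frame can hold numeric and factor columns side by side
as_frame <- data.frame(predictors)
as_frame$Purchase <- outcome
sapply(as_frame, class)         # three numeric columns and one factor
```

This is likely what the comment under the question ran into: attaching the outcome to the matrix turns it into numbers, which train then treats as a regression target.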