2017-11-11 130 views

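User-defined summary function in caret

When using the caret package, I cannot get the following user-defined summary function to work. It is supposed to compute the log loss, but I keep getting a message that logloss is not found. A reproducible example follows: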

library(caret)  # provides train() and trainControl()

data <- data.frame('target' = sample(c('Y','N'),100,replace = T), 'X1' = runif(100), 'X2' = runif(100))

log.loss2 <- function(data, lev = NULL, model = NULL) { 
    logloss = -sum(data$obs*log(data$Y) + (1-data$obs)*log(1-data$Y))/length(data$obs) 
    names(logloss) <- c('LL') 
    logloss 
} 

fitControl <- trainControl(method="cv",number=1, classProbs = T, summaryFunction = log.loss2) 

my.grid <- expand.grid(.decay = c(0.05), .size = c(2)) 

fit.nnet2 <- train(target ~., data = data, 
        method = "nnet", maxit = 500, metric = 'LL', 
        tuneGrid = my.grid, verbose = T) 

Answer


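The error occurs because you did not include trControl = fitControl in the call to train. Once that is fixed, you will hit a second error, caused by data$obs and data$pred being factors. They need to be converted to numeric, which gives 1 and 2, and subtracting 1 then gives the desired 0 and 1: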

log.loss2 <- function(data, lev = NULL, model = NULL) { 
    data$pred <- as.numeric(data$pred)-1 
    data$obs <- as.numeric(data$obs)-1 
    logloss = -sum(data$obs*log(data$Y) + (1-data$obs)*log(1-data$Y))/length(data$obs) 
    names(logloss) <- c('LL') 
    logloss 
} 

fitControl <- trainControl(method="cv",number=1, classProbs = T, summaryFunction = log.loss2) 

fit.nnet2 <- train(target ~., data = data, 
        method = "nnet", maxit = 500, metric = "LL" , 
        tuneGrid = my.grid, verbose = T, trControl = fitControl) 
#output 
Neural Network 

100 samples 
    2 predictor 
    2 classes: 'N', 'Y' 

No pre-processing 
Resampling: Cross-Validated (1 fold) 
Summary of sample sizes: 0 
Resampling results: 

    LL  
    0.6931472 

Tuning parameter 'size' was held constant at a value of 2 
Tuning parameter 'decay' was held constant at a value of 0.05 
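
The factor conversion used in log.loss2 can be checked in isolation with base R. This is a minimal sketch; the obs vector is made up for illustration and is not taken from the question's data.

```r
# Hypothetical observed classes; factor levels are sorted alphabetically: "N", "Y"
obs <- factor(c("Y", "N", "N", "Y"))

codes <- as.numeric(obs)   # integer codes of the levels: "N" -> 1, "Y" -> 2
real  <- codes - 1         # shifted to 0/1, as the log loss formula expects

# Arithmetic applied directly to a factor is not meaningful:
# R returns NA for every element and raises a warning.
direct <- suppressWarnings(obs * 2)

print(levels(obs))
print(real)
print(direct)
```

This is why the unconverted function fails: multiplying data$obs by the log probabilities yields only NA values.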

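A few things to note:

This loss function only works for data whose classes are named N and Y, because the probability is read from data$Y. A better approach is to look up the class names and use those. It is also good practice to clip the probability values, since taking log(0) is not a good idea: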

LogLoss <- function (data, lev = NULL, model = NULL) 
    { 
    obs <- data[, "obs"] 
    cls <- levels(obs) #find class names 
    probs <- data[, cls[2]] #use second class name 
    probs <- pmax(pmin(as.numeric(probs), 1 - 1e-15), 1e-15) #bound probability 
    logPreds <- log(probs)   
    log1Preds <- log(1 - probs) 
    real <- (as.numeric(data$obs) - 1) 
    out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1 
    names(out) <- c("LogLoss") 
    out 
    } 
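
To see why the clipping step matters, here is a small base R sketch. The toy data frame is hypothetical; it only mimics the columns (obs, pred, and one probability column per class) that the summary function reads, and eps is an illustrative name for the 1e-15 bound.

```r
# Hypothetical resample output with one predicted probability of exactly 0
toy <- data.frame(
  obs  = factor(c("Y", "N", "Y", "N"), levels = c("N", "Y")),
  pred = factor(c("Y", "N", "N", "N"), levels = c("N", "Y")),
  N    = c(0.1, 1.0, 0.7, 0.6),
  Y    = c(0.9, 0.0, 0.3, 0.4)
)

real <- as.numeric(toy$obs) - 1   # 0 for "N", 1 for "Y"

# Without clipping: row 2 has real = 0 and Y = 0, so 0 * log(0) = 0 * -Inf = NaN,
# and the NaN propagates through mean().
unclipped <- -mean(real * log(toy$Y) + (1 - real) * log(1 - toy$Y))

# With clipping: probabilities are bounded away from 0 and 1 before taking logs.
eps <- 1e-15
p <- pmax(pmin(toy$Y, 1 - eps), eps)
clipped <- -mean(real * log(p) + (1 - real) * log(1 - p))

print(unclipped)
print(clipped)
```

The unclipped value is NaN even though the prediction in row 2 is correct, whereas the clipped version stays finite.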

This is perfect! Thank you very much. I was running into both errors, so thanks for catching the follow-up problem as well. – dleal


You're welcome. Check the edit for some additional notes. – missuse