How to choose nrounds using `catboost`?

If I understand catboost correctly, we need to tune nrounds using CV, just as in xgboost. I see the following code in In [8] of the official tutorial:

params_with_od <- list(iterations = 500, 
         loss_function = 'Logloss', 
         train_dir = 'train_dir', 
         od_type = 'Iter', 
         od_wait = 30) 
model_with_od <- catboost.train(train_pool, test_pool, params_with_od)

This results in a best iterations = 211.

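My questions are:

Is it correct that this command uses test_pool to choose the best iterations, instead of using cross-validation?
If so, does catboost provide a command to choose the best iterations from CV, or do I need to do it manually?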

2017-09-12 Metariat

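Catboost is doing cross-validation to determine the optimal number of iterations. Both train_pool and test_pool are datasets that include the target variable. Earlier in the tutorial they write: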

train_path = '../R-package/inst/extdata/adult_train.1000' 
test_path = '../R-package/inst/extdata/adult_test.1000' 

column_description_vector = rep('numeric', 15) 
cat_features <- c(3, 5, 7, 8, 9, 10, 11, 15) 
for (i in cat_features) 
    column_description_vector[i] <- 'factor' 

train <- read.table(train_path, head=F, sep="\t", colClasses=column_description_vector) 
test <- read.table(test_path, head=F, sep="\t", colClasses=column_description_vector) 
target <- c(1) 
train_pool <- catboost.from_data_frame(data=train[,-target], target=train[,target]) 
test_pool <- catboost.from_data_frame(data=test[,-target], target=test[,target])

When you execute catboost.train(train_pool, test_pool, params_with_od), train_pool is used for training and test_pool is used to determine the optimal number of iterations via cross-validation.

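Now, you are right to be confused, because later in the tutorial they use test_pool again, this time with the fitted model to make a prediction (model_best is similar to model_with_od, but uses a different overfitting detector, IncToDec):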

prediction_best <- catboost.predict(model_best, test_pool, type = 'Probability')

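This might be bad practice. They might get away with it with the IncToDec overfitting detector (I am not familiar with the mathematics behind it), but for the Iter type overfitting detector you would need separate train, validation and test data sets. If you want to be on the safe side, do the same for the IncToDec overfitting detector. However, it is only a tutorial showing off the functionality, so I wouldn't be too pedantic about which data they have used where.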
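A minimal sketch of such a three-way split in base R, in case it helps. The toy data frame, the 60/20/20 proportions and the seed are illustrative assumptions, not something taken from the tutorial:

```r
# Sketch: split one data frame into train / validation / test subsets.
# The toy data, the 60/20/20 split and the seed are assumptions for illustration.
set.seed(42)
df <- data.frame(y = rbinom(100, 1, 0.5), x = rnorm(100))

n <- nrow(df)
idx <- sample(n)                 # random permutation of the row indices
n_train <- floor(0.6 * n)
n_valid <- floor(0.2 * n)

train_idx <- idx[seq_len(n_train)]
valid_idx <- idx[n_train + seq_len(n_valid)]
test_idx  <- idx[(n_train + n_valid + 1):n]

train <- df[train_idx, ]
valid <- df[valid_idx, ]
test  <- df[test_idx, ]

# Intended use with catboost (not run here): build one pool per subset,
# pass the train and validation pools to catboost.train() so the
# overfitting detector stops on the validation pool, and call
# catboost.predict() on the test pool only once at the very end.
```

The point of the extra subset is that the data used to pick the number of iterations is never the data used to report the final score.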

Here is a link to the overfitting detector documentation, which goes into a bit more detail: https://tech.yandex.com/catboost/doc/dg/concepts/overfitting-detector-docpage/

2017-09-13 02:36:59 ftiaronsem

So it is actually 1-fold cross-validation? – Metariat

Yes, that is correct – ftiaronsem

Use caret for cross-validation. See In [12] of the tutorial.

2017-10-18 07:34:07 nikitxskv

Please elaborate a bit, rather than only pointing to an external link. –
