R - findCorrelation() (caret package) with exact = TRUE

Following the findCorrelation() documentation, I ran the official example, and I am confused by the details of the output shown below.

Code:

library(caret) 

R1 <- structure(c(1, 0.86, 0.56, 0.32, 0.85, 0.86, 1, 0.01, 0.74, 0.32, 
        0.56, 0.01, 1, 0.65, 0.91, 0.32, 0.74, 0.65, 1, 0.36, 
        0.85, 0.32, 0.91, 0.36, 1), 
       .Dim = c(5L, 5L)) 


colnames(R1) <- rownames(R1) <- paste0("x", 1:ncol(R1)) 

findCorrelation(R1, cutoff = .6, exact = TRUE, names = TRUE 
       ,verbose = TRUE)

Result:

> findCorrelation(R1, cutoff = .6, exact = TRUE, names = TRUE, verbose = TRUE) 
## Compare row 1 and column 5 with corr 0.85 
## Means: 0.648 vs 0.545 so flagging column 1 
## Compare row 5 and column 3 with corr 0.91 
## Means: 0.53 vs 0.49 so flagging column 5 
## Compare row 3 and column 4 with corr 0.65 
## Means: 0.33 vs 0.352 so flagging column 4 
## All correlations <= 0.6 
## [1] "x1" "x5" "x4"

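Even after reading the source file, I do not understand how the computation works, i.e. why row 1 and column 5 are compared first, and how the means are calculated.

I hope someone can explain the algorithm using my example.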


2017-11-17 Jack

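First, it determines the mean absolute correlation of each variable. Columns x1 and x5 have the highest means (mean(c(0.85, 0.56, 0.32, 0.86)) and mean(c(0.85, 0.91, 0.36, 0.32)) respectively), so it looks to remove one of them in the first step. It finds x1 to be the most globally offensive, so it removes it.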
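To make that first step concrete, here is a small base-R sketch (it does not call caret) that computes the mean absolute correlation of each column of R1 with the diagonal left out. The names A and avg are only illustrative. On this matrix x1 comes out at 0.6475 and x5 at 0.61, the two largest values, which is consistent with row 1 and column 5 being compared first.

```r
# Same correlation matrix as in the question (symmetric, so fill order does not matter).
nm <- paste0("x", 1:5)
R1 <- matrix(c(1, 0.86, 0.56, 0.32, 0.85,
               0.86, 1, 0.01, 0.74, 0.32,
               0.56, 0.01, 1, 0.65, 0.91,
               0.32, 0.74, 0.65, 1, 0.36,
               0.85, 0.32, 0.91, 0.36, 1),
             nrow = 5, dimnames = list(nm, nm))

# Absolute correlations with the diagonal masked out.
A <- abs(R1)
diag(A) <- NA

# Mean absolute correlation of each variable with the other four.
avg <- colMeans(A, na.rm = TRUE)
print(avg)

# Variables ordered from highest to lowest mean: x1, x5, x3, x4, x2.
visit_order <- names(sort(avg, decreasing = TRUE))
print(visit_order)
```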

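After that, it recomputes and compares x5 and x3 using the same process.

It stops after removing three columns, since all remaining pairwise correlations are below your threshold.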
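The whole search can be sketched in base R as below. This is a rough reconstruction and not caret's own code: exact_sketch is a hypothetical name, and the second mean (taken over every remaining row except row j, rather than over row j alone) is an assumption, chosen because it reproduces the means printed in the verbose output above (0.545, 0.49 and 0.352).

```r
# Same correlation matrix as in the question.
nm <- paste0("x", 1:5)
R1 <- matrix(c(1, 0.86, 0.56, 0.32, 0.85,
               0.86, 1, 0.01, 0.74, 0.32,
               0.56, 0.01, 1, 0.65, 0.91,
               0.32, 0.74, 0.65, 1, 0.36,
               0.85, 0.32, 0.91, 0.36, 1),
             nrow = 5, dimnames = list(nm, nm))

# Hypothetical helper, not part of caret: a sketch of the exact = TRUE search.
exact_sketch <- function(x, cutoff) {
  x <- abs(x)
  a <- x
  diag(a) <- NA
  # Visit variables from highest to lowest mean absolute correlation.
  ord <- order(colMeans(a, na.rm = TRUE), decreasing = TRUE)
  x <- x[ord, ord]
  a <- a[ord, ord]
  n <- ncol(x)
  dropped <- rep(FALSE, n)
  for (i in seq_len(n - 1)) {
    # Stop once no remaining pair exceeds the cutoff.
    if (!any(a > cutoff, na.rm = TRUE)) break
    if (dropped[i]) next
    for (j in (i + 1):n) {
      if (dropped[i] || dropped[j] || x[i, j] <= cutoff) next
      mn1 <- mean(a[i, ], na.rm = TRUE)   # row i over the remaining variables
      mn2 <- mean(a[-j, ], na.rm = TRUE)  # assumption: all remaining rows except row j
      cat("Compare", colnames(x)[i], "and", colnames(x)[j],
          "means:", round(mn1, 3), "vs", round(mn2, 3), "\n")
      # Flag i if its mean is larger, otherwise flag j, and mask the flagged variable.
      k <- if (mn1 > mn2) i else j
      dropped[k] <- TRUE
      a[k, ] <- NA
      a[, k] <- NA
    }
  }
  colnames(x)[dropped]
}

flagged <- exact_sketch(R1, cutoff = 0.6)
print(flagged)
```

On R1 with a cutoff of 0.6 this returns x1, x5 and x4, the same columns that findCorrelation() reported. Because each flagged variable is masked before the next comparison, the means are recomputed over the remaining variables only, which is why they shrink from one step to the next.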


2017-11-20 14:36:48 topepo
