Applying a single function in R to find the correlation between each numeric column and the target variable, one column at a time

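This is the code I have so far. It determines the correlation for a single column, and I would like to apply it to all of the relevant columns. I am trying to restrict the output to correlations above 0.4: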

> if(abs(cor(train$YearBuilt, train$SalePrice)) > .4) { 
+  print(abs(cor(train$YearBuilt, train$SalePrice))) 
+  } 
[1] 0.5228973

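I would like to be able to print a column name followed by its correlation, then the next column name and its correlation, and so on.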

Source

2017-04-24 David Ghan

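Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). –

Why not use 'cor(train)' to get the correlation matrix? Then you can filter for values greater than 0.4. Writing your own function doesn't make sense to me. – Masoud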
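A minimal sketch of the correlation-matrix approach suggested in this comment. The question's `train` data is not shown, so the numeric columns of `iris` stand in for it and `Sepal.Length` stands in for the target (`SalePrice` in the question); both are assumptions made only for illustration.

```r
# Stand-in data: the question's `train` is not available, so use iris.
train <- iris[sapply(iris, is.numeric)]
target <- "Sepal.Length"  # hypothetical target; SalePrice in the question

# Full correlation matrix, then keep only the target's column
# (a named numeric vector, one entry per variable).
cors <- cor(train)[, target]

# Drop the target itself and keep correlations with |r| > 0.4.
cors <- cors[names(cors) != target]
strong <- cors[abs(cors) > 0.4]
print(strong)
```

If the real data contains missing values, passing something like `use = "pairwise.complete.obs"` to `cor()` may be needed; whether that is appropriate depends on the data.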

One possibility is to use dplyr. It is a little indulgent with the pipes, but it filters at 0.4 and keeps the variable names.

> library(dplyr) 
> library(tibble)  # provides rownames_to_column() 
> train = select(iris, -Species) 
> head(train) 
  Sepal.Length Sepal.Width Petal.Length Petal.Width 
1          5.1         3.5          1.4         0.2 
2          4.9         3.0          1.4         0.2 
3          4.7         3.2          1.3         0.2 
4          4.6         3.1          1.5         0.2 
5          5.0         3.6          1.4         0.2 
6          5.4         3.9          1.7         0.4 
> train %>% 
+   summarize_all(funs(cor(., iris$Sepal.Length))) %>% 
+   t() %>% 
+   as.data.frame() %>% 
+   rownames_to_column("var") %>% 
+   rename(cors = V1) %>% 
+   filter(cors > 0.4) 
           var      cors 
1 Sepal.Length 1.0000000 
2 Petal.Length 0.8717538 
3  Petal.Width 0.8179411
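The pipeline above relies on `funs()`, which has been deprecated since dplyr 0.8.0. A rough equivalent for newer versions might look like the sketch below. It assumes dplyr 1.0.0 or later (for `across()`) plus tidyr (for `pivot_longer()`), and it filters on the absolute value so that strong negative correlations would also be kept, which is a change from the original filter. The name `res` is introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

res <- iris %>%
  select(-Species) %>%
  # One correlation per column, each against Sepal.Length.
  summarise(across(everything(), ~ cor(.x, iris$Sepal.Length))) %>%
  # Reshape the one-row result into (variable, correlation) pairs.
  pivot_longer(everything(), names_to = "var", values_to = "cors") %>%
  # Keep |r| > 0.4, so negative correlations are not silently dropped.
  filter(abs(cors) > 0.4)

print(res)
```

On `iris` this keeps the same three variables as the original pipeline, because the only negative correlation (Sepal.Width) is weaker than 0.4 in absolute value.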

Source

2017-04-24 03:07:50 ericgtaylor

Here is an example of finding the correlation of iris$Petal.Length with the other numeric variables:

vars <- c("Sepal.Length", "Sepal.Width", "Petal.Width") 
all <- lapply(vars, function(i) list(x= iris[,i], y=iris[,"Petal.Length"])) 
lapply(all, function(x) do.call(cor, x)) 

[[1]] 
[1] 0.8717538 

[[2]] 
[1] -0.4284401 

[[3]] 
[1] 0.9628654
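The list above does not carry the variable names, which the question asks for. A minimal base R sketch that keeps the names and applies the 0.4 threshold could look like this; `target`, `vars`, and `cors` are names chosen here for illustration, and `iris` again stands in for the question's data.

```r
target <- "Petal.Length"

# All numeric columns except the target.
vars <- setdiff(names(iris)[sapply(iris, is.numeric)], target)

# sapply() over a character vector returns a result named by its elements,
# so each correlation stays attached to its column name.
cors <- sapply(vars, function(v) cor(iris[[v]], iris[[target]]))

# Print "name: correlation" for each column with |r| > 0.4.
for (v in names(cors)) {
  if (abs(cors[[v]]) > 0.4) {
    cat(v, ": ", round(cors[[v]], 7), "\n", sep = "")
  }
}
```

Swapping `iris` for `train` and `"Petal.Length"` for `"SalePrice"` should give the column-by-column printout described in the question, assuming the numeric columns contain no missing values.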

Source

2017-04-24 02:32:44

This is not what the OP wanted. – Masoud
