Selecting unique values with the 'select' function in the 'dplyr' library

Is it possible to select all unique values from a column of a data.frame using the select function from the dplyr library? Something like "SELECT DISTINCT field1 FROM table1" in SQL notation.

Thanks!

Source

2014-08-29 Yenici

In dplyr 0.3 this can be done easily with the distinct() function.

Here is an example:

distinct_df = df %>% distinct(field1)

You can then get the distinct values as a vector:

distinct_vector = distinct_df$field1

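You can also select a subset of columns at the same time as you perform the distinct() call, which can be cleaner to look at if you inspect the data frame with head/tail/glimpse:

distinct_df = df %>% distinct(field1) %>% select(field1)
distinct_vector = distinct_df$field1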
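To try this out, here is a minimal sketch with a made-up data frame. The contents of df and the second column field2 are assumptions for illustration only; the thread never shows the real data.

```r
library(dplyr)

# Hypothetical stand-in for the df used in the answer
df <- data.frame(field1 = c("a", "b", "a", "c"),
                 field2 = 1:4,
                 stringsAsFactors = FALSE)

# One row per distinct value of field1, keeping only that column
distinct_df <- df %>% distinct(field1) %>% select(field1)

# The same values as a plain vector
distinct_vector <- distinct_df$field1
```

The explicit select(field1) matters mostly on dplyr versions before 0.5, where distinct() kept the other columns as well.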

Source

2014-10-22 22:54:39

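This works if the data frame is already in R, but it does not work if you try to run the query directly on the database through a database connection (i.e. 'src_postgres()'). It reports: "Error: Can't calculate distinct only on specified columns with SQL" – djhocking 2015-01-15 16:28:28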

See this question for how to connect src_postgres() and dplyr: http://stackoverflow.com/questions/21592266/i-cannot-connect-postgresql-schema-table-with-dplyr-package – 2015-03-08 14:43:58


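Note that the way 'distinct()' works changed in dplyr 0.5. By default 'distinct()' now returns only the columns used as arguments to 'distinct()'. If you want to keep the other columns, you now have to pass '.keep_all = TRUE' as an additional argument to 'distinct()'. – RoyalTS 2016-07-30 15:37:48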
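A small sketch of the difference described in this comment, assuming dplyr 0.5 or later and using the built-in mtcars data set (my choice for illustration, not taken from the comment):

```r
library(dplyr)

# Default behaviour: only the column passed to distinct() is returned
cyl_only <- mtcars %>% distinct(cyl)
names(cyl_only)

# .keep_all = TRUE keeps all columns, taking the first row found for each cyl value
cyl_all <- mtcars %>% distinct(cyl, .keep_all = TRUE)
ncol(cyl_all)
```

Both results have one row per distinct cyl value; they differ only in how many columns come back.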

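The dplyr select function selects specific columns from a data frame. To return the unique values in a particular column of data, you can use the group_by function. For example: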

library(dplyr) 

# Fake data 
set.seed(5) 
dat = data.frame(x=sample(1:10,100, replace=TRUE)) 

# Return the distinct values of x 
dat %>% 
    group_by(x) %>% 
    summarise() 

    x
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10

If you want to change the column name, you can add the following:

dat %>% 
    group_by(x) %>% 
    summarise() %>% 
    select(unique.x=x)

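This both selects column x from among all the columns in the data frame that dplyr returns (of course there is only one column in this case) and changes its name to unique.x.

You can also get the unique values directly in base R with unique(dat$x).

If you have multiple variables and want all the unique combinations that appear in the data, you can generalize the code above as follows: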

set.seed(5) 
dat = data.frame(x=sample(1:10,100, replace=TRUE), 
       y=sample(letters[1:5], 100, replace=TRUE)) 

dat %>% 
    group_by(x,y) %>% 
    summarise() %>% 
    select(unique.x=x, unique.y=y)
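For comparison, here is a sketch that gets the same combinations with distinct() instead of group_by() plus summarise(), assuming dplyr 0.3 or later. The names combos_grouped and combos_distinct are only illustrative.

```r
library(dplyr)

set.seed(5)
dat <- data.frame(x = sample(1:10, 100, replace = TRUE),
                  y = sample(letters[1:5], 100, replace = TRUE))

# Unique (x, y) combinations via grouping, as in the answer
combos_grouped <- dat %>%
  group_by(x, y) %>%
  summarise() %>%
  ungroup()

# Unique (x, y) combinations via distinct(), sorted so the rows line up
combos_distinct <- dat %>%
  distinct(x, y) %>%
  arrange(x, y)

# Both approaches should find the same number of combinations
nrow(combos_grouped) == nrow(combos_distinct)
```

The distinct() version avoids leaving the result grouped, which is why the grouped version needs the extra ungroup() here.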

Source

2014-08-29 15:47:33 eipi10

Or use the new 'distinct()' function in dplyr 0.3 – hadley 2014-09-01 15:04:45

Just to add to the other answers: if you would prefer to return a vector rather than a data frame, you have the following options:

dplyr < 0.7.0

Enclose the dplyr functions in parentheses and combine them with the $ syntax:

(mtcars %>% distinct(cyl))$cyl

dplyr >= 0.7.0

Use the pull verb:

mtcars %>% distinct(cyl) %>% pull()
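Called with no argument, pull() takes the last column of the data frame, which is cyl here because it is the only one. A slightly more explicit sketch, assuming dplyr 0.7.0 or later, names the column so the code still works if more columns are kept (cyl_values is just an illustrative name):

```r
library(dplyr)

# Name the column explicitly instead of relying on pull()'s default
cyl_values <- mtcars %>% distinct(cyl) %>% pull(cyl)

is.numeric(cyl_values)  # a plain vector, not a data frame
length(cyl_values)      # one element per distinct cyl value
```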

Source

2016-10-20 10:57:19
