2017-09-13

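Writing a function in R to group by a variable column in a data frame

I am trying to write a function that produces descriptive statistics by grouping across several factors in a data frame. I have spent far too much time trying to get the function to recognize the variable I pass in.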

Here is some fake data:

grouping1 <- c("red", "blue", "blue", "green", "red", "blue", "red", "green")     
grouping2 <- c("high", "high", "low", "medium", "low", "high", "medium", "high")     
value <- c(22,40,72,41,36,16,88,99) 

fake_df <- data.frame(grouping1, grouping2, value) 

Fake code example:

library(dplyr) 

by_group_fun <- function(fun.data.in, fun.grouping.factor){ 
    fake_df2 <- fun.data.in %>% 
    group_by(fun.grouping.factor) %>% 
    summarize(mean = mean(value), median = median(value)) 
    fake_df2 
} 
by_group_fun(fake_df, grouping1) 
by_group_fun(fake_df, grouping2) 

This gives me:

Error in grouped_df_impl(data, unname(vars), drop) : 
    Column `fun.grouping.factor` is unknown 

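Second attempt

I tried assigning the variable chosen in the function call to a new column and grouping by that instead.

Fake code example (second attempt):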

by_group_fun2 <- function(fun.data.in, fun.grouping.factor){ 
    fun.data.in$by_var <- fun.data.in$fun.grouping.factor 

    fake_df2 <- fun.data.in %>% 
    group_by(by_var) %>% 
    summarize(mean = mean(value), median = median(value)) 
    fake_df2 
} 

by_group_fun2(fake_df, grouping1) 
by_group_fun2(fake_df, grouping2) 

This second attempt gives me:

Error in grouped_df_impl(data, unname(vars), drop) : 
    Column `by_var` is unknown 

See this to learn how to program with 'dplyr': https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html – www

Answers

Use this example as a guide:

myfun <- function(df, thesecols) { 
       require(dplyr) 
       thesecols <- enquo(thesecols) # need to quote 
       df %>% 
       group_by_at(vars(!!thesecols)) # !! unquotes 
     } 

myfun(fake_df, grouping1) 

Output:

# A tibble: 8 x 3
# Groups: grouping1 [3]
  grouping1 grouping2 value
     <fctr>    <fctr> <dbl>
1       red      high    22
2      blue      high    40
3      blue       low    72
4     green    medium    41
5       red       low    36
6      blue      high    16
7       red    medium    88
8     green      high    99
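
The example above only groups the data. The original attempts failed because group_by() looked for a column literally named `fun.grouping.factor`, and `$` likewise matches the literal name rather than the value of the argument. As a minimal sketch of the full task, the same enquo()/!! pattern can be combined with summarise(). This assumes dplyr 0.7 or later; `by_group_summary` is a hypothetical name chosen for illustration, and plain group_by(!!x) is used here in place of group_by_at(vars(...)).

```r
library(dplyr)

# Same fake data as in the question
fake_df <- data.frame(
  grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
  value = c(22, 40, 72, 41, 36, 16, 88, 99)
)

# by_group_summary is a hypothetical helper, not part of dplyr
by_group_summary <- function(df, group_col) {
  group_col <- enquo(group_col)   # capture the bare column name as a quosure
  df %>%
    group_by(!!group_col) %>%     # unquote it so group_by() sees the real column
    summarise(mean = mean(value), median = median(value))
}

res1 <- by_group_summary(fake_df, grouping1)
res2 <- by_group_summary(fake_df, grouping2)
```

Each call returns one row per level of the chosen grouping column, with `mean` and `median` computed on `value`.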
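
A really simple way to get the same output without resorting to programming with dplyr is to gather the grouping columns into long form. Grouping by both the resulting key and value columns gives you everything you asked for, though combined in a single data.frame: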

library(tidyverse) 

fake_df <- data_frame(grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"), 
         grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"), 
         value = c(22,40,72,41,36,16,88,99)) 

fake_df %>% 
    gather(group_var, group_val, -value) %>% 
    group_by(group_var, group_val) %>% 
    summarise(mean = mean(value), 
       median = median(value)) 
#> # A tibble: 6 x 4
#> # Groups: group_var [?]
#>   group_var group_val     mean median
#>       <chr>     <chr>    <dbl>  <dbl>
#> 1 grouping1      blue 42.66667   40.0
#> 2 grouping1     green 70.00000   70.0
#> 3 grouping1       red 48.66667   36.0
#> 4 grouping2      high 44.25000   31.0
#> 5 grouping2       low 54.00000   54.0
#> 6 grouping2    medium 64.50000   64.5
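
On newer versions of the tidyverse the same long-form idea can be written with pivot_longer(), which tidyr now recommends over gather(). The following is only a sketch under the assumption that tidyr 1.0 or later and dplyr 1.0 or later are installed; `long_summary` is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Same fake data as above, built with tibble() so the grouping columns stay character
fake_df <- tibble(
  grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
  value = c(22, 40, 72, 41, 36, 16, 88, 99)
)

long_summary <- fake_df %>%
  # stack every column except `value` into name/value pairs
  pivot_longer(-value, names_to = "group_var", values_to = "group_val") %>%
  group_by(group_var, group_val) %>%
  # .groups = "drop" returns an ungrouped tibble (requires dplyr >= 1.0)
  summarise(mean = mean(value), median = median(value), .groups = "drop")
```

The result has one row per combination of grouping column and level, which is the same six-row layout as the gather() version.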