2017-02-19 115 views

dplyr: mutate new column based on multiple columns selected by variable string

Given this data:

df=data.frame(
    x1=c(2,0,0,NA,0,1,1,NA,0,1), 
    x2=c(3,2,NA,5,3,2,NA,NA,4,5), 
    x3=c(0,1,0,1,3,0,NA,NA,0,1), 
    x4=c(1,0,NA,3,0,0,NA,0,0,1), 
    x5=c(1,1,NA,1,3,4,NA,3,3,1)) 

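I want to use dplyr to create an additional column, min, holding the row-wise minimum of selected columns. This is easy when using the column names directly: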

df <- df %>% rowwise() %>% mutate(min = min(x2,x5)) 

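But I have a large df with varying column names, so I need to match the columns against a vector of strings, mycols. Other threads tell me to use the select helper functions, but I must be missing something. Here is matches: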

mycols <- c("x2","x5") 
df <- df %>% rowwise() %>% 
    mutate(min = min(select(matches(mycols)))) 
Error: is.string(match) is not TRUE 

And one_of:

mycols <- c("x2","x5") 
df <- df %>% 
rowwise() %>% 
mutate(min = min(select(one_of(mycols)))) 
Error: no applicable method for 'select' applied to an object of class "c('integer', 'numeric')" 
In addition: Warning message: 
In one_of(c("x2", "x5")) : Unknown variables: `x2`, `x5` 

What am I overlooking? Should select_ work? It does not in the following:

df <- df %>% 
    rowwise() %>% 
    mutate(min = min(select_(mycols))) 
Error: no applicable method for 'select_' applied to an object of class "character" 

And likewise:

df <- df %>% 
    rowwise() %>% 
    mutate(min = min(select_(matches(mycols)))) 
Error: is.string(match) is not TRUE 

You need to use the SE (standard evaluation) versions of the dplyr verbs when working with strings. In this case, use 'select_()' –

That does not work the way I expected: 'df <- df %>% rowwise() %>% mutate(min = min(select_(mycols)))' yields "Error: no applicable method for 'select_' applied to an object of class "character"" – strangeloop

The 'matches' error occurs because it takes a single string (a regular expression) as its argument, not a vector of strings. – cderv

Answers

1

This is a bit tricky. With SE (standard evaluation), you need to pass the operation itself as a string.

mycols <- '(x2,x5)' 
f <- paste0('min',mycols) 
df <- df %>% rowwise() %>% mutate_(min = f) 
df 
# A tibble: 10 × 6 
#  x1 x2 x3 x4 x5 min 
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 
#1  2  3  0  1  1  1 
#2  0  2  1  0  1  1 
#3  0 NA  0 NA NA NA 
#4  NA  5  1  3  1  1 
#5  0  3  3  0  3  3 
#6  1  2  0  0  4  2 
#7  1 NA NA NA NA NA 
#8  NA NA NA  0  3 NA 
#9  0  4  0  0  3  3 
#10  1  5  1  1  1  1 

Thanks! Now I want the lowest non-NA value, so I need to tweak this code slightly. Changing 'min' to 'pmin(na.rm = T)' seems to work (adding na.rm = T to 'min()' does not seem to work): 'f <- paste0('pmin(', mycols, ', na.rm = T)')' 'df <- df %>% rowwise() %>% mutate_(min = f)' – strangeloop
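
For the non-NA minimum mentioned in the comment above, here is a minimal base R sketch that avoids building code as a string. It is not part of the original answer; it assumes mycols is a plain character vector of column names and relies on pmin() being vectorised across its arguments, so rowwise() is not needed.

```r
df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)

# Assumption: the columns of interest are given as a character vector
mycols <- c("x2", "x5")

# Build the argument list for pmin(): the selected columns plus na.rm = TRUE,
# then call pmin() once on whole columns (element-wise, i.e. row by row)
args <- c(as.list(df[mycols]), list(na.rm = TRUE))
df$min <- do.call(pmin, args)

df$min
```

Rows where every selected column is NA still come out as NA, because pmin() has nothing left to compare after dropping the missing values.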

3

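Here is another solution, a bit more technical, with help from the purrr package from the tidyverse, which is designed for functional programming.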

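First, the matches helper from dplyr takes a regex string as its argument, not a vector. It works well if you can find a regex that matches all of your columns. (In the code below, you can use whichever dplyr select helper you want.)

Then, purrr functions work very well with dplyr once you understand the basic scheme of functional programming.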

Solution to your problem:


df=data.frame(
    x1=c(2,0,0,NA,0,1,1,NA,0,1), 
    x2=c(3,2,NA,5,3,2,NA,NA,4,5), 
    x3=c(0,1,0,1,3,0,NA,NA,0,1), 
    x4=c(1,0,NA,3,0,0,NA,0,0,1), 
    x5=c(1,1,NA,1,3,4,NA,3,3,1)) 


# regex matching only the x2 and x5 columns 
mycols <- "x[25]" 

library(dplyr) 

df %>% 
    mutate(min_x2_x5 = 
      # select columns that you want in df 
      select(., matches(mycols)) %>% 
      # use pmap on this subset to get a vector with the min of each row. 
      # a data frame is a list of columns, so pmap walks them in parallel, i.e. one call to min per row 
      purrr::pmap_dbl(min) 
     ) 
#> x1 x2 x3 x4 x5 min_x2_x5 
#> 1 2 3 0 1 1   1 
#> 2 0 2 1 0 1   1 
#> 3 0 NA 0 NA NA  NA 
#> 4 NA 5 1 3 1   1 
#> 5 0 3 3 0 3   3 
#> 6 1 2 0 0 4   2 
#> 7 1 NA NA NA NA  NA 
#> 8 NA NA NA 0 3  NA 
#> 9 0 4 0 0 3   3 
#> 10 1 5 1 1 1   1 

I won't explain purrr any further here, but it works fine in your case.
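
The same pattern should also work when the columns are given as a character vector, as in the question. The sketch below is an illustration, not code from the answer above: it assumes dplyr and purrr are installed, and it uses one_of(), which accepts a character vector (newer tidyselect versions recommend all_of() for this). The point is that select() needs the data frame as its first argument, which the attempts in the question left out.

```r
library(dplyr)

df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)

mycols <- c("x2", "x5")

# Pass the data frame explicitly; one_of() takes the character vector of names
picked <- select(df, one_of(mycols))

# pmap_dbl() calls min() once per row across the selected columns
df$min_x2_x5 <- purrr::pmap_dbl(picked, min)

df$min_x2_x5
```

As in the answer above, a row with an NA in any selected column yields NA here, since min() is called without na.rm.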