2017-10-20 54 views

Mutating several columns at once with a single function

Some data:

x <- structure(list(X. = c("4,084", "4,084", "4,084", "4,084", "4,084" 
), ADR = c("1,099.69", "68.66", "232.72", "195.66", "98"), hotel_id = c("2,313,076", 
"583,666", "1,251,372", "1,545,890", "298,160"), city_id = c("9,395", 
"17,193", "5,085", "16,808", "8,584"), star_rating = c(5, 2, 
3, 4, 4), accommodation_type_name = c("Hotel", "Bungalow", "Hotel", 
"Hotel", "Hotel"), chain_hotel = c("chain", "non-chain", "non-chain", 
"non-chain", "non-chain"), booking_date = c("10/5/2016", "12/4/2016", 
"11/6/2016", "10/22/2016", "12/11/2016"), checkin_date = c("10/27/2016", 
"12/9/2016", "11/18/2016", "11/3/2016", "12/11/2016"), checkout_date = c("10/30/2016", 
"12/12/2016", "11/20/2016", "11/4/2016", "12/12/2016"), city = c("A", 
"B", "C", "D", "E")), class = "data.frame", row.names = c(NA, 
-5L), .Names = c("X.", "ADR", "hotel_id", "city_id", "star_rating", 
"accommodation_type_name", "chain_hotel", "booking_date", "checkin_date", 
"checkout_date", "city")) 

It looks like this:

> glimpse(x) 
Observations: 27,298 
Variables: 11 
$ X.      <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"... 
$ ADR      <chr> "71.06", "76.56", "153.88", "126.6", "115.08", "81.6", "77.16", "168.36",... 
$ hotel_id    <chr> "297,388", "298,322", "2,313,076", "2,240,838", "2,240,838", "331,350", "... 
$ city_id     <chr> "9,395", "9,395", "9,395", "9,395", "9,395", "9,395", "9,395", "9,395", "... 
$ star_rating    <dbl> 2.5, 3.0, 5.0, 3.5, 3.5, 3.0, 3.0, 5.0, 2.0, 3.0, 4.0, 2.0, 3.0, 2.0, 3.0... 
$ accommadation_type_name <chr> "Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "... 
$ chain_hotel    <chr> "non-chain", "non-chain", "chain", "non-chain", "non-chain", "non-chain",... 
$ booking_date   <chr> "8/2/2016", "8/2/2016", "8/2/2016", "8/4/2016", "8/4/2016", "8/4/2016", "... 
$ checkin_date   <chr> "10/1/2016", "10/1/2016", "10/1/2016", "10/2/2016", "10/2/2016", "10/3/20... 
$ checkout_date   <chr> "10/2/2016", "10/2/2016", "10/2/2016", "10/3/2016", "10/3/2016", "10/5/20... 
$ city     <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"... 

I would like to mutate the columns ADR:star_rating. Specifically, I want to gsub out any commas.

I tried:

x <- x %>% 
    mutate_each(ADR:star_rating, funs(gsub ",", "")) 

However, this throws an error:

Error: unexpected string constant in: 
"x <- x %>% 
    mutate_each(ADR:star_rating, funs(gsub ","" 
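
That message is a parse error, so it is raised before dplyr runs at all: `gsub` is followed directly by a string constant instead of an opening parenthesis. A minimal base R sketch of the difference (the names `bad` and `ok` are illustrative only):

```r
# A function name followed directly by a string constant cannot be parsed.
bad <- try(parse(text = 'gsub ",", ""'), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# With parentheses and all three arguments (pattern, replacement, x),
# the call parses and removes the commas.
ok <- gsub(",", "", "1,099.69")
ok  # "1099.69"
```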

In base R I can do this:

vars <- c("ADR", "hotel_id", "city_id", "star_rating") 
x[vars] <- lapply(x[vars], function(i) gsub(",", "", i)) 

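However, it would be convenient to do this inside a dplyr chain. It would also mean I don't have to write out every variable, as I did when declaring vars above; I could just use ADR:star_rating.

How can I do this with mutate in dplyr?

Answers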
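
For comparison, base R can also build the column range from its two endpoints, so the names need not be typed out. This is only a sketch on a cut-down copy of the data (two rows, fewer columns); `cols` is an illustrative name, not part of the original code.

```r
# Cut-down copy of the example data, for illustration only.
x <- data.frame(
  X. = c("4,084", "4,084"),
  ADR = c("1,099.69", "68.66"),
  hotel_id = c("2,313,076", "583,666"),
  city_id = c("9,395", "17,193"),
  star_rating = c(5, 2),
  city = c("A", "B"),
  stringsAsFactors = FALSE
)

# Look up the positions of the first and last column, then take every name between them.
cols <- names(x)
vars <- cols[match("ADR", cols):match("star_rating", cols)]

# Same lapply() approach as above, applied to the computed range.
x[vars] <- lapply(x[vars], function(i) gsub(",", "", i))
```

Columns outside the range, such as `X.`, keep their commas.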


I think you were almost there. I used mutate_at (I believe mutate_each is deprecated) and wrapped the variable names in vars():

library(dplyr) 
x %>% mutate_at(vars(ADR:star_rating), funs(stringr::str_replace_all(., ",", ""))) 
#>  X.  ADR hotel_id city_id star_rating accommodation_type_name 
#> 1 4,084 1099.69 2313076 9395   5     Hotel 
#> 2 4,084 68.66 583666 17193   2    Bungalow 
#> 3 4,084 232.72 1251372 5085   3     Hotel 
#> 4 4,084 195.66 1545890 16808   4     Hotel 
#> 5 4,084  98 298160 8584   4     Hotel 
#> chain_hotel booking_date checkin_date checkout_date city 
#> 1  chain 10/5/2016 10/27/2016 10/30/2016 A 
#> 2 non-chain 12/4/2016 12/9/2016 12/12/2016 B 
#> 3 non-chain 11/6/2016 11/18/2016 11/20/2016 C 
#> 4 non-chain 10/22/2016 11/3/2016  11/4/2016 D 
#> 5 non-chain 12/11/2016 12/11/2016 12/12/2016 E 
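
A note for readers on newer dplyr releases: `funs()` was deprecated in dplyr 0.8.0, and since dplyr 1.0.0 the scoped verbs such as `mutate_at()` are superseded by `across()`. The sketch below shows what an equivalent call might look like, assuming dplyr 1.0.0 or later and a cut-down copy of the data; it is not part of the original answer.

```r
library(dplyr)

# Cut-down copy of the example data, for illustration only.
x <- data.frame(
  ADR = c("1,099.69", "68.66"),
  hotel_id = c("2,313,076", "583,666"),
  city_id = c("9,395", "17,193"),
  star_rating = c(5, 2),
  city = c("A", "B"),
  stringsAsFactors = FALSE
)

# across() takes the same ADR:star_rating range directly; no vars() wrapper is needed.
cleaned <- x %>%
  mutate(across(ADR:star_rating, ~ gsub(",", "", .x)))
```

Because `gsub()` returns character vectors, every column in the range ends up as character, including the originally numeric `star_rating`. A later `as.numeric()` step would be needed if numbers are wanted.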
+0

Ah, so I have to nest the column selection inside 'vars()'? OK, thanks! Will accept shortly. –

+0

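Just so I understand fully: suppose I wanted to mutate ADR and hotel_id, skip city_id, and then mutate star_rating. The selection would no longer be the sequence ADR:star_rating but c(ADR, hotel_id, star_rating). Do you know how to do that with the approach above? –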

+1

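@DougFir, glad it helped! As far as I understand, inside 'vars' you can use the same things you can use inside 'dplyr::select'. That is, 'vars(ADR, hotel_id, star_rating)' will work. It may also be worth looking at '?dplyr::select_helpers' for other ways to select the variables to mutate. – markdly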
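
The selection described in this comment might look like the sketch below, which uses a cut-down copy of the data. `strip_commas` is a hypothetical helper introduced here so that the example does not depend on `funs()`; `ends_with()` is one of the select helpers mentioned above.

```r
library(dplyr)

# Cut-down copy of the example data, for illustration only.
x <- data.frame(
  ADR = c("1,099.69", "68.66"),
  hotel_id = c("2,313,076", "583,666"),
  city_id = c("9,395", "17,193"),
  star_rating = c(5, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove every comma from a column.
strip_commas <- function(col) gsub(",", "", col)

# Explicit list of columns: city_id is skipped and keeps its commas.
picked <- x %>%
  mutate_at(vars(ADR, hotel_id, star_rating), strip_commas)

# Select helper: only the columns whose names end in "_id" are changed.
by_helper <- x %>%
  mutate_at(vars(ends_with("_id")), strip_commas)
```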
