2016-03-28

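Assume the data below is only a small part of a much larger dataset I am working with. I want to replace values in one data frame column only where two other columns match given values.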

mydf<-data.frame(Date=as.Date(c("2015-01-01","2015-01-10","2015-01-27","2015-02-27","2015-03-15","2015-04-17","2015-04-18")),Expense=c(1566,5646,3456,6546,5313,6466,5456),Details=c('item101 xsda','fuel asa','item102a','fuel asa','fuel sda','fuel','item102a'),Vehicle=c('Car','Bike','Car','Car','Bike','Bike','Bike'),Person=c('John','Smith','Robin',rep(NA,3),'Robin')) 

Date   Expense  Details  Vehicle Person 
1 2015-01-01 1566  item101 xsda Car  John 
2 2015-01-10 5646  fuel asa  Bike  Smith 
3 2015-01-27 3456  item102a  Car  Robin 
4 2015-02-27 6546  fuel asa  Car  <NA> 
5 2015-03-15 5313  fuel sda  Bike  <NA> 
6 2015-04-17 6466  fuel   Bike  <NA> 
7 2015-04-18 5456  item102a  Bike  Robin 

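There are two rules to consider:

1) When the Vehicle is 'Car' and 'fuel' was purchased, the Person is John

2) When the Vehicle is 'Bike' and 'fuel' was purchased, the Person is Smith

My expected output is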

 Date  Expense Details  Vehicle Person 
1 2015-01-01 1566 item101 xsda  Car  John 
2 2015-01-10 5646 fuel    Bike  Smith 
3 2015-01-27 3456 item102a   Car  Robin 
4 2015-02-27 6546 fuel    Car  John 
5 2015-03-15 5313 fuel sda   Bike  Smith 
6 2015-04-17 6466 fuel    Bike  Smith 
7 2015-04-18 5456 item102a   Bike  Robin 

How can I solve this? I used the steps below and got halfway to the solution:

mydf$Details<-as.character(mydf$Details) 
mydf$Details[grepl('fuel',mydf$Details,ignore.case=TRUE)]<-'Fuel' 

mydf is now

Date  Expense  Details  Vehicle Person 
1 2015-01-01 1566  item101 xsda Car  John 
2 2015-01-10 5646  Fuel   Bike  Smith 
3 2015-01-27 3456  item102a  Car  Robin 
4 2015-02-27 6546  Fuel   Car  <NA> 
5 2015-03-15 5313  Fuel   Bike  <NA> 
6 2015-04-17 6466  Fuel   Bike  <NA> 
7 2015-04-18 5456  item102a  Bike  Robin 

Note: if possible, please avoid loops. If there is a better and faster way of doing this, please share.

Answers

1

You are halfway there, as you said. After your step that sets Details to 'Fuel', try these two lines:

mydf$Person[mydf$Details=='Fuel' & mydf$Vehicle=='Car'] <- 'John' 
mydf$Person[mydf$Details=='Fuel' & mydf$Vehicle=='Bike'] <- 'Smith' 
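
The two assignments above rely on Details having already been overwritten with 'Fuel', and they also overwrite any existing Person on matching rows. As a minimal sketch in base R, assuming you want to keep Details untouched and fill only the missing Person values, a named lookup vector (here called `owner`, a name chosen for illustration) can do both rules in one vectorized step, with no loop:

```r
# Rebuild the example data; Person is kept as character, not factor.
mydf <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", rep(NA, 3), "Robin"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: which person buys fuel for which vehicle.
owner <- c(Car = "John", Bike = "Smith")

# Rows that mention fuel and have no Person yet.
fill <- is.na(mydf$Person) & grepl("fuel", mydf$Details, ignore.case = TRUE)

# Index the lookup by vehicle name; vehicles not in the lookup stay NA.
mydf$Person[fill] <- unname(owner[mydf$Vehicle[fill]])
```

Adding a third vehicle then only means adding one entry to `owner`, rather than another assignment line.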
1

You can do this in a few lines using data.table:

library(data.table) 

setDT(mydf) 

mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Car", Person := "John"] 
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Bike", Person := "Smith"] 

mydf 
#>   Date Expense  Details Vehicle Person 
#> 1: 2015-01-01 1566 item101 xsda  Car John 
#> 2: 2015-01-10 5646  fuel asa Bike Smith 
#> 3: 2015-01-27 3456  item102a  Car Robin 
#> 4: 2015-02-27 6546  fuel asa  Car John 
#> 5: 2015-03-15 5313  fuel sda Bike Smith 
#> 6: 2015-04-17 6466   fuel Bike Smith 
#> 7: 2015-04-18 5456  item102a Bike Robin 

With dplyr you can also do a conditional mutate, but the code is longer. I use the stringr package for the string matching:

library(dplyr) 
library(stringr) 
mydf %>% 
  mutate(
    Person = ifelse(
      is.na(Person) & str_detect(Details, "fuel") & Vehicle == "Car", 
      "John", 
      ifelse(
        is.na(Person) & str_detect(Details, "fuel") & Vehicle == "Bike", 
        "Smith", 
        as.character(Person)
      )
    )
  ) 
#>   Date Expense  Details Vehicle Person 
#> 1 2015-01-01 1566 item101 xsda  Car John 
#> 2 2015-01-10 5646  fuel asa Bike Smith 
#> 3 2015-01-27 3456  item102a  Car Robin 
#> 4 2015-02-27 6546  fuel asa  Car John 
#> 5 2015-03-15 5313  fuel sda Bike Smith 
#> 6 2015-04-17 6466   fuel Bike Smith 
#> 7 2015-04-18 5456  item102a Bike Robin 
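
The nested `ifelse()` calls can get hard to read as rules are added. A possible alternative, sketched here under the assumption that Person is a character column, is dplyr's `case_when()`, which lists one condition per line. The helper column `is_fuel` is a name introduced only for this illustration, and base `grepl()` is used in place of stringr to keep the example to one package:

```r
library(dplyr)

# Rebuild the example data; Person is kept as character, not factor.
mydf <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", rep(NA, 3), "Robin"),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  mutate(
    # Temporary flag for rows whose Details mention fuel.
    is_fuel = grepl("fuel", Details, ignore.case = TRUE),
    Person = case_when(
      is.na(Person) & is_fuel & Vehicle == "Car"  ~ "John",
      is.na(Person) & is_fuel & Vehicle == "Bike" ~ "Smith",
      TRUE ~ Person  # every other row keeps its current value
    )
  ) %>%
  select(-is_fuel)
```

Conditions are checked in order and the first match wins, so the final `TRUE` line acts as the fallback.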

With *data.table*, a join + update would be more appropriate here. – Arun


I'm not sure how to do that, then... – cderv
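
One way the join + update idea might look is sketched below. This is an interpretation of the comment, not code from the thread: the rules go into a small lookup table (called `rules` here, with a hypothetical `Owner` column), the rows to fill are flagged in a temporary `needs_fill` column, and `Person` is updated by reference for the rows that join.

```r
library(data.table)

# Rebuild the example data as a data.table.
mydf <- data.table(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", rep(NA, 3), "Robin")
)

# Lookup table: one row per rule. needs_fill = TRUE restricts the join
# to rows flagged below.
rules <- data.table(
  Vehicle = c("Car", "Bike"),
  needs_fill = TRUE,
  Owner = c("John", "Smith")
)

# Flag fuel rows that have no Person yet.
mydf[, needs_fill := is.na(Person) & grepl("fuel", Details, ignore.case = TRUE)]

# Update join: for rows of mydf that match a rule, take Owner from rules.
mydf[rules, on = .(Vehicle, needs_fill), Person := i.Owner]

# Drop the temporary flag.
mydf[, needs_fill := NULL]
```

With this layout, new rules are added as rows of `rules` instead of as extra lines of code.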
