This is a follow-up to a question I asked yesterday: Partial string match two columns R
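The answer provided there was great; however, I have found that many species are never mentioned directly. For example, tortoise is never described directly in the product data, but "exotic" is an acceptable match.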
dats<-data.frame(ID=c(1:4),species=c("dog","cat","rabbit","tortoise"),
species.descriptor=c("all animal dog","all animal cat","rabbit exotic","tortoise exotic"),
product=c(1,2,3,4),product.authorise=c("all animal dog cat rabbit","cat horse pig",
"dog cat","exotic"))
dats
ID species  species.descriptor product product.authorise
1  dog      all animal dog     1       all animal dog cat rabbit
2  cat      all animal cat     2       cat horse pig
3  rabbit   rabbit exotic      3       dog cat
4  tortoise tortoise exotic    4       exotic
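I came up with a solution that works. It binds the species.descriptor and product.authorise columns together into one field, then flags a row as "TRUE" if a particular regular expression occurs two or more times in that field, like this: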
library(stringr)
# Bind the two text columns into one searchable field per row
dats$bound<-paste(dats$product.authorise, dats$species.descriptor)
species_descriptor<-c("all animal","dog","cat","rabbit","exotic","horse","pig","tortoise")
species_descriptor<-setNames(nm=species_descriptor)
# One column per term: "TRUE" where the term occurs at least twice in the bound field
result<-ifelse(sapply(species_descriptor, str_count, string=dats$bound)>=2,"TRUE","FALSE")
result<-as.data.frame(result)
# Count the matching terms per row; a row is authorised if at least one term matched
result$AuthorisedCount<-apply(result[,1:ncol(result)],MARGIN=1,function(x){sum(x=="TRUE",na.rm=TRUE)})
result$SpeciesAuthorised<-ifelse(result$AuthorisedCount>=1,"TRUE","FALSE")
dats<-cbind(dats, result$SpeciesAuthorised)
names(dats)[7]<-"SpeciesAuthorised"
# Drop the helper column
dats$bound<-NULL
dats
ID species  species.descriptor product product.authorise         SpeciesAuthorised
1  dog      all animal dog     1       all animal dog cat rabbit TRUE
2  cat      all animal cat     2       cat horse pig             TRUE
3  rabbit   rabbit exotic      3       dog cat                   FALSE
4  tortoise tortoise exotic    4       exotic                    TRUE
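This works well and runs quickly on a much larger dataset; however, I realise there may be a more elegant way of doing this. Does anyone have any suggestions?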
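For comparison, here is a rough sketch of the kind of simplification I have in mind, using only base R. It is not quite the same rule as my code above: instead of counting two or more occurrences in the bound field, it asks whether a term appears in both columns of the same row, which I assume is the real intent. The vector name `terms` and the matrices `in_descriptor` and `in_product` are just names I made up for this illustration, and the result is a logical column rather than the character "TRUE"/"FALSE" used above.

```r
# Same example data as in the question
dats <- data.frame(
  ID = 1:4,
  species = c("dog", "cat", "rabbit", "tortoise"),
  species.descriptor = c("all animal dog", "all animal cat",
                         "rabbit exotic", "tortoise exotic"),
  product = 1:4,
  product.authorise = c("all animal dog cat rabbit", "cat horse pig",
                        "dog cat", "exotic"),
  stringsAsFactors = FALSE
)

# Terms to look for (hypothetical name; same list as species_descriptor above)
terms <- c("all animal", "dog", "cat", "rabbit", "exotic", "horse", "pig", "tortoise")

# Logical matrices (rows = records, columns = terms).
# fixed = TRUE treats each term as a literal string rather than a regex.
in_descriptor <- sapply(terms, grepl, x = dats$species.descriptor, fixed = TRUE)
in_product    <- sapply(terms, grepl, x = dats$product.authorise, fixed = TRUE)

# Assumption: a row is authorised when at least one term is present in BOTH columns
dats$SpeciesAuthorised <- rowSums(in_descriptor & in_product) > 0
dats
```

On the example data this flags the same rows as my original approach (rows 1, 2 and 4), but the two rules could disagree on real data if a term is repeated within a single column.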