2017-10-10 148 views

Using grep to label text and paste in R

I have two data frames. The first:

keyword <- c("apple","peach","grape","berry","kiwi fruit") 
keyword <- data.frame(keyword) 


The second:

sentence <- c("I like apple","I hate apple","grape is good") 
url <- c("url1","url2","url3") 
sentence <- data.frame(sentence,url) 


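What I need: if a keyword is contained in a sentence, paste that sentence's URL into the label. If several sentences contain the keyword, paste all of their URLs. The final result looks like this:

[image: expected result]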

I tried the code below, but it did not work as expected.

keyword$Label <- character(length(keyword$keyword)) 

for (i in 1:length(keyword$keyword)) { 
keyword$Label[grep(keyword$keyword[i],sentence$sentence)] <- sentence$url 
} 

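Do you need help understanding how to get this done (code-wise), or do you want to know what should be done (conceptually)? I would suggest doing something like a conditional join... (conceptually) – zwep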


I need a code-wise solution. Thanks –

Answer


A solution with stringr + dplyr + tidyr:

library(stringr) 
library(dplyr) 
library(tidyr) 

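sentence %>% 
    # replace each sentence with the first keyword found in it (NA if none)
    mutate(sentence = str_extract(sentence, paste0(keyword$keyword, collapse = "|"))) %>% 
    # right join so that keywords without a matching sentence are kept
    right_join(keyword, by = c("sentence" = "keyword")) %>% 
    group_by(sentence) %>% 
    # number the URLs within each keyword: 1, 2, ...
    mutate(URL = 1:n()) %>% 
    # one column per URL number: URL1, URL2, ...
    spread(URL, url, sep = "") %>% 
    rename(keyword = sentence) 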

Result:

# A tibble: 5 x 3 
# Groups: keyword [5] 
    keyword URL1 URL2 
*  <chr> <chr> <chr> 
1  apple url1 url2 
2  berry <NA> <NA> 
3  grape url3 <NA> 
4 kiwi fruit <NA> <NA> 
5  peach <NA> <NA> 
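
On a recent tidyr, `spread()` is superseded, and the same reshaping can probably be written with `pivot_wider()`. The following is only a sketch, not the answerer's code: the names `pattern` and `wide` are mine, and it assumes tidyr 1.0.0 or later. Like the pipeline above, it records only the first keyword found in each sentence.

```r
library(stringr)
library(dplyr)
library(tidyr)

keyword <- data.frame(
  keyword = c("apple", "peach", "grape", "berry", "kiwi fruit"),
  stringsAsFactors = FALSE
)
sentence <- data.frame(
  sentence = c("I like apple", "I hate apple", "grape is good"),
  url = c("url1", "url2", "url3"),
  stringsAsFactors = FALSE
)

# One regex that matches any of the keywords ("pattern" is an illustrative name)
pattern <- paste0(keyword$keyword, collapse = "|")

wide <- sentence %>%
  # first keyword found in each sentence
  mutate(keyword = str_extract(sentence, pattern)) %>%
  select(keyword, url) %>%
  # keep keywords that no sentence mentions (their url is NA)
  right_join(keyword, by = "keyword") %>%
  group_by(keyword) %>%
  mutate(URL = row_number()) %>%
  ungroup() %>%
  # spread the numbered URLs into columns URL1, URL2, ...
  pivot_wider(names_from = URL, values_from = url, names_prefix = "URL")

print(wide)
```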

Data:

keyword <- c("apple","peach","grape","berry","kiwi fruit") 
keyword <- data.frame(keyword, stringsAsFactors = FALSE) 
sentence <- c("I like apple","I hate apple","grape is good") 
url <- c("url1","url2","url3") 
sentence <- data.frame(sentence,url, stringsAsFactors = FALSE)
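
If a single `Label` column is wanted, as in the question's own attempt, a base R version is also possible. The loop in the question fails because `grep()` returns row positions in `sentence`, which are then used to index `keyword$Label`, and because the whole `sentence$url` vector is assigned each time. The sketch below rests on assumptions: the expected-output image is not available, so the comma separator and the empty string for keywords without a match are my guesses.

```r
keyword <- data.frame(
  keyword = c("apple", "peach", "grape", "berry", "kiwi fruit"),
  stringsAsFactors = FALSE
)
sentence <- data.frame(
  sentence = c("I like apple", "I hate apple", "grape is good"),
  url = c("url1", "url2", "url3"),
  stringsAsFactors = FALSE
)

# For each keyword, find the sentences that contain it and collapse
# their URLs into one string. fixed = TRUE matches the keyword literally
# instead of treating it as a regular expression.
keyword$Label <- vapply(
  keyword$keyword,
  function(k) {
    hits <- grepl(k, sentence$sentence, fixed = TRUE)
    # assumed separator: a comma; no match gives an empty string
    paste(sentence$url[hits], collapse = ",")
  },
  character(1),
  USE.NAMES = FALSE
)

print(keyword)
```

Unlike the `str_extract()` approach, this checks every keyword against every sentence, so a sentence that mentions two keywords contributes its URL to both.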