R: substring matching

I have a column names of character strings containing the following:

Raymond K 
Raymond K-S 
Raymond KS 
Bill D 
Raymond Kerry 
Blanche D 
Blanche Diamond 
Bill Dates

I also have a character vector m_names containing the following:

Raymond K 
Blanche D

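I would like to create a column outcome that returns a non-zero integer when a matching substring exists and 0 when there is no match. For example, for the text column above I would like to see: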

[1] 1 1 1 0 1 2 2 0

So far I have tried the following code:

outcome <- pmatch(as.character(names), m_names, nomatch = 0)

But this only returns the following outcome:

[1] 1 0 0 0 1 2 0 0

How can I make sure that, even when there is no exact match, the code still returns a value identifying the partial match in R?

Source

2016-01-22 grievy

# Create an outcome vector of zeros, one entry per name

outcome <- integer(length(names))

# For each entry of the compare vector (m_names), mark the names it matches;
# when several entries match the same name, the last one wins
for (i in seq_along(m_names)) {
    outcome[grep(m_names[i], names)] <- i
}
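
One thing to watch with the loop above: grep() treats each entry of m_names as a regular expression unless told otherwise, so a name containing "." or "(" may match more than intended. A minimal sketch of the difference, using hypothetical names that are not from the question:

```r
# Hypothetical data, chosen only to show the regex pitfall
people  <- c("J. Smith", "JX Smith", "Bill D")
pattern <- "J. Smith"

# Default: the pattern is a regular expression, so "." matches any character
regex_hits <- grepl(pattern, people)

# fixed = TRUE: the pattern is matched as a literal substring
fixed_hits <- grepl(pattern, people, fixed = TRUE)

print(regex_hits)  # TRUE TRUE FALSE
print(fixed_hits)  # TRUE FALSE FALSE
```

If the entries of m_names are plain text rather than patterns, passing fixed = TRUE to grep() in the loop is probably the safer choice.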

Source

2016-01-22 08:23:14 praneeth

I would do this with stringi:

library("stringi")  

# data example: 

a <- read.table(text=" 
       Raymond K 
       Raymond K-S 
       Raymond KS 
       Bill D 
       Raymond Kerry 
       Blanche D 
       Blanche Diamond 
       Bill Dates", 
       stringsAsFactors=FALSE, sep="\t") 

wek <- c("Raymond K", "Blanche D") 

# solution 

klasa <- numeric(length(a[, 1])) 
# seq_along() also behaves correctly when wek is empty
for (i in seq_along(wek)) { 
    klasa[stri_detect_fixed(a[, 1], wek[i])] <- i 
}
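
Note that a loop of this shape lets later search strings overwrite earlier ones when both match the same row. If the first match should win instead, iterating in reverse order is one option. The sketch below uses hypothetical overlapping patterns, and it swaps in base grepl(..., fixed = TRUE) for stri_detect_fixed so that it runs without stringi installed:

```r
# Hypothetical data with overlapping search strings
x        <- c("Raymond K", "Raymond Kerry", "Bill D")
patterns <- c("Raymond K", "Raymond Ke")

# Forward loop: the last matching pattern wins
last_wins <- integer(length(x))
for (i in seq_along(patterns)) {
  last_wins[grepl(patterns[i], x, fixed = TRUE)] <- i
}

# Reverse loop: earlier patterns are written last, so the first match wins
first_wins <- integer(length(x))
for (i in rev(seq_along(patterns))) {
  first_wins[grepl(patterns[i], x, fixed = TRUE)] <- i
}

print(last_wins)   # 1 2 0
print(first_wins)  # 1 1 0
```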

Source

2016-01-22 07:57:37 Marta

Actually, I went with stringi and found it very helpful! Thanks a lot, Marta. – grievy

You're welcome, @grievy. – Marta

A simple example with some documents and search strings:

# Some documents 
docs <- c("aab", "aba", "bbaa", "b") 

# Some search strings (regular expressions) 
searchstr <- c("aa", "ab")

1) The result should count how many search strings match (1 means either "aa" or "ab" matches, 2 means both match):

Reduce('+', lapply(searchstr, grepl, x = docs)) 
# Returns: [1] 2 1 1 0

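2) The result should indicate which search string matched: 1 for the first search string, 2 for the second. If both match, the highest number is returned. (I assume that is what you intended.)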

n <- length(searchstr) 
Reduce(pmax, lapply(1:n, function(x) x * grepl(searchstr[x], docs))) 
# Returns: [1] 2 2 1 0
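
To see why this works, it may help to look at the intermediate vectors: each grepl() result is multiplied by the index of its search string, and pmax then takes the element-wise maximum across them. A small sketch on the same docs and searchstr (the variable names scaled and result are only for illustration):

```r
docs      <- c("aab", "aba", "bbaa", "b")
searchstr <- c("aa", "ab")

# One vector per search string: index i where it matches, 0 elsewhere
scaled <- lapply(seq_along(searchstr),
                 function(i) i * grepl(searchstr[i], docs))

print(scaled[[1]])  # 1 0 1 0  ("aa" matches docs 1 and 3)
print(scaled[[2]])  # 2 2 0 0  ("ab" matches docs 1 and 2)

# Element-wise maximum keeps the highest matching index per document
result <- Reduce(pmax, scaled)
print(result)       # 2 2 1 0
```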

Now, finally, consider your example:

docs <- c("Raymond K", "Raymond K-S", "Raymond KS", "Bill D", 
          "Raymond Kerry", "Blanche D", "Blanche Diamond", 
          "Bill Dates") 
searchstr <- c("Raymond K", "Blanche D") 
n <- length(searchstr)  # recompute in case the number of search strings changed
Reduce(pmax, lapply(1:n, function(x) x * grepl(searchstr[x], docs))) 
# Returns: [1] 1 1 1 0 1 2 2 0
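
Since the question asks for an outcome column, the same idea could be wrapped in a small helper and assigned to a data frame. This is only a sketch: match_index and df are hypothetical names, and fixed = TRUE is an assumption that the search strings are literal text rather than regular expressions.

```r
# match_index() is a hypothetical helper, not part of base R or any package.
# It returns, for each element of x, the highest index of the patterns
# found in it as a literal substring, or 0 if none is found.
match_index <- function(x, patterns) {
  if (length(patterns) == 0L) return(integer(length(x)))
  hits <- lapply(seq_along(patterns),
                 function(i) i * grepl(patterns[i], x, fixed = TRUE))
  as.integer(Reduce(pmax, hits))
}

df <- data.frame(
  names = c("Raymond K", "Raymond K-S", "Raymond KS", "Bill D",
            "Raymond Kerry", "Blanche D", "Blanche Diamond", "Bill Dates"),
  stringsAsFactors = FALSE
)
m_names <- c("Raymond K", "Blanche D")

df$outcome <- match_index(df$names, m_names)
print(df$outcome)  # 1 1 1 0 1 2 2 0
```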

Source

2016-01-22 07:58:01
