2012-04-18

I'm looking for a gsub pattern that returns all matches of an expression, not just the last one. For example:

data <- list("a sentence with citation (Ref. 12) and another (Ref. 13)", "single (Ref. 14)") 
gsub(".*(Ref. (\\d+)).*", "\\1", data) 

returns

[1] "Ref. 13" "Ref. 14" 

so I have lost reference "Ref. 12".
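Why only the last reference survives: the pattern consumes the whole string and replaces it with the single capture \\1, so each string can yield at most one match. Because the leading .* is greedy, it backtracks only as far as the last "Ref.". A minimal base-R sketch (it uses sub(), which behaves like gsub() here because the pattern spans the whole string; the lazy variant assumes perl = TRUE) shows that a non-greedy prefix only swaps the last match for the first, so a different function is needed to get all of them:

```r
x <- c("a sentence with citation (Ref. 12) and another (Ref. 13)",
       "single (Ref. 14)")

# Greedy prefix: .* grabs as much as it can, so the group lands on the LAST reference
greedy <- sub(".*(Ref\\. \\d+).*", "\\1", x)

# Lazy prefix (PCRE): .*? grabs as little as it can, so the group lands on the FIRST reference
lazy <- sub(".*?(Ref\\. \\d+).*", "\\1", x, perl = TRUE)

greedy  # "Ref. 13" "Ref. 14"
lazy    # "Ref. 12" "Ref. 14"
```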

Answers


You can do this with the strapply function from the gsubfn package:

library(gsubfn) 

data <- list("a sentence with citation (Ref. 12) and another (Ref. 13)", "single (Ref. 14)") 
unlist(strapply(data,"(Ref. (\\d+))")) 
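For comparison, a base-R sketch that needs no extra package. Note that this swaps in gregexpr() plus regmatches() rather than strapply: regmatches() returns a list with one character vector of matches per input string, and unlist() flattens it just as in the strapply call above. The escaped "Ref\\." (a literal dot) is a small tightening of the original pattern.

```r
data <- list("a sentence with citation (Ref. 12) and another (Ref. 13)",
             "single (Ref. 14)")
x <- unlist(data)  # regmatches() expects a character vector, not a list

# One list element per input string, each holding every match found in that string
per_string <- regmatches(x, gregexpr("Ref\\. \\d+", x))

# Flatten into a single vector of references
refs <- unlist(per_string)
refs  # "Ref. 12" "Ref. 13" "Ref. 14"
```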

Here is a function (essentially a wrapper around gregexpr()) that captures multiple references within a single string.
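For context, gregexpr() returns, for each input string, the start position of every match plus a "match.length" attribute; the function below converts those into substr() ranges. A small sketch of that raw output for a single string (the variable names here are illustrative only):

```r
s <- "a sentence with citation (Ref. 12), (Ref. 13), and then (Ref. 14)"

m <- gregexpr("Ref\\. \\d+", s)[[1]]  # first (only) element: the matches in s
starts <- as.integer(m)               # start position of each match
lens <- attr(m, "match.length")       # length of each match
stops <- starts + lens - 1            # inclusive end positions

substring(s, starts, stops)  # "Ref. 12" "Ref. 13" "Ref. 14"

# With no match at all, gregexpr() reports a single start position of -1
none <- gregexpr("Ref\\. \\d+", "no reference here")[[1]]
as.integer(none)  # -1
```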

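extractMatches <- function(data, pattern) { 
    # start positions of all matches within one string
    start <- gregexpr(pattern, data)[[1]] 
    # inclusive end positions, from the "match.length" attribute
    stop <- start + attr(start, "match.length") - 1 
    if(-1 %in% start) {  # gregexpr() reports -1 when nothing matched
     "" ## **Note** you could return NULL if there are no matches 
    } else { 
     # cut each match out of the string
     mapply(substr, start, stop, MoreArgs = list(x = data)) 
    } 
}  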

data <- list("a sentence with citation (Ref. 12), (Ref. 13), and then (Ref. 14)", 
      "another sentence without reference") 
pat <- "Ref. (\\d+)" 

res <- lapply(data, extractMatches, pattern = pat) 
res 
# [[1]] 
# [1] "Ref. 12" "Ref. 13" "Ref. 14" 
# 
# [[2]] 
# [1] "" 

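(**Note**: if you return NULL instead of "" for strings that contain no reference, you can post-process the result with do.call("c", res) to get a single vector containing only the matched references.)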
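A sketch of that variant, assuming NULL is an acceptable result for strings without a match. The name extractMatchesOrNull is hypothetical; the body mirrors extractMatches() except for the early NULL return. Because c() silently drops NULL elements, do.call("c", res) leaves only the matched references:

```r
# Hypothetical variant of extractMatches() that returns NULL when nothing matches
extractMatchesOrNull <- function(data, pattern) {
    start <- gregexpr(pattern, data)[[1]]
    if (-1 %in% start) return(NULL)                  # no match in this string
    end <- start + attr(start, "match.length") - 1   # inclusive end positions
    mapply(substr, start, end, MoreArgs = list(x = data))
}

data <- list("a sentence with citation (Ref. 12), (Ref. 13), and then (Ref. 14)",
             "another sentence without reference")
res <- lapply(data, extractMatchesOrNull, pattern = "Ref. (\\d+)")

res[[2]]           # NULL
do.call("c", res)  # "Ref. 12" "Ref. 13" "Ref. 14"
```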


How about:

sapply(data, stringr::str_extract_all, pattern = "Ref. (\\d+)") 


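Nice. I figured this must already be implemented somewhere. (Also interesting: 'str_extract_all' calls 'str_locate_all', which in turn calls 're_mapply("gregexpr", string, pattern)', which is about as good a pseudocode outline of my function as I could imagine.) – 2012-04-18 18:55:11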
