agrep: only return the best match(es)

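I'm using the 'agrep' function in R, which returns a vector of matches. I'd like a function similar to agrep that returns only the best match, or the best matches if there are ties. Currently I do this with the 'sdists()' function from the 'cba' package on each element of the result vector, but this seems very redundant.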

/edit: Here is the function I'm currently using. I'd like to speed it up, because computing the distance twice seems redundant.

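library(cba)

word  <- 'test'
words <- c('Teest', 'teeeest', 'New York City', 'yeast', 'text', 'Test')

ClosestMatch <- function(string, StringVector) {
    # first pass: agrep() keeps only the approximate matches
    matches <- agrep(string, StringVector, value = TRUE)
    # second pass: cba::sdists() computes edit distances for those candidates
    distance <- sdists(string, matches, method = "ow", weight = c(1, 0, 2))
    matches <- data.frame(matches, distance = as.numeric(distance))
    # keep every candidate at the minimum distance (ties are all returned)
    matches <- subset(matches, distance == min(distance))
    as.character(matches$matches)
}

ClosestMatch(word, words)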

2011-04-19 Zach
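Since computing the distances twice is the redundancy in question, here is a minimal sketch that computes them only once with base R's adist() (generalized Levenshtein distance, from the utils package). This is a swapped-in technique, not the cba approach above: there is no agrep() prefilter, so every candidate is scored, and matching is case-sensitive unless ignore.case = TRUE is passed. The name ClosestMatchBase is hypothetical.

```r
# Hypothetical helper: a single pass of edit-distance computation, keeping ties.
ClosestMatchBase <- function(string, StringVector) {
    # adist() returns a 1 x length(StringVector) matrix of
    # generalized Levenshtein distances; flatten it to a plain vector
    d <- as.vector(adist(string, StringVector))
    # return every candidate at the minimum distance
    StringVector[d == min(d)]
}

words <- c('Teest', 'teeeest', 'New York City', 'yeast', 'text', 'Test')
ClosestMatchBase('test', words)
# [1] "text" "Test"
```

Both 'text' and 'Test' are one substitution away from 'test', so both are returned.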

The RecordLinkage package has been removed from CRAN. Use stringdist instead, which is also roughly 10 times faster:

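library(stringdist)

ClosestMatch2 <- function(string, stringVector) {
    # amatch() returns the position of the closest element of stringVector
    # (default method "osa"); maxDist = Inf means no candidate is ruled out
    stringVector[amatch(string, stringVector, maxDist = Inf)]
}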

2014-11-23 15:14:40
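One caveat: amatch() returns a single position per input string, so when several candidates are equally close (for 'test', both 'text' and 'Test' are one edit away) only one of them comes back, while the question asks for all ties. A minimal sketch that keeps ties by computing the distances with stringdist() and selecting the minimum; ClosestMatchTies is a hypothetical name, and method "osa" (amatch's default) is used so both functions rank candidates the same way.

```r
library(stringdist)

# Hypothetical variant of ClosestMatch2 that returns all tied best matches.
ClosestMatchTies <- function(string, stringVector) {
    # stringdist() recycles `string` against every element of stringVector
    d <- stringdist(string, stringVector, method = "osa")
    stringVector[d == min(d)]
}

words <- c('Teest', 'teeeest', 'New York City', 'yeast', 'text', 'Test')
ClosestMatchTies('test', words)
# [1] "text" "Test"
```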

The 'RecordLinkage' package is available on CRAN again (version 0.4-9 as of 2016-05-02). – Uwe 2016-07-15 09:52:59

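agrep matches strings using the Levenshtein distance. The RecordLinkage package has a C function for computing the Levenshtein distance, which you can call directly to speed up your computation. Here is a modified ClosestMatch function: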

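library(RecordLinkage)

ClosestMatch2 <- function(string, stringVector) {
    # levenshteinSim() returns a similarity (higher = closer), not a distance,
    # so the best matches are those at the maximum; ties are all kept
    similarity <- levenshteinSim(string, stringVector)
    stringVector[similarity == max(similarity)]
}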

2011-04-19 21:55:36 Ramnath
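Because levenshteinSim() returns a similarity, the answer selects the maximum. Assuming the similarity is normalized as 1 - d / max(nchar(a), nchar(b)) (worth checking against the RecordLinkage documentation), the maximum-similarity rule can disagree with a minimum-distance rule when candidate lengths differ. The sketch below reproduces that assumed normalization in base R with adist(); normalizedSim and the example strings are hypothetical.

```r
# Hypothetical helper mimicking a normalized Levenshtein similarity.
# Assumption: similarity = 1 - d / max(nchar(a), nchar(b)).
normalizedSim <- function(string, stringVector) {
    d <- as.vector(adist(string, stringVector))   # raw edit distances
    1 - d / pmax(nchar(string), nchar(stringVector))
}

target     <- "abcdefgh"
candidates <- c("abcd", "abcdefghxxxxx")

as.vector(adist(target, candidates))  # 4 5: "abcd" is closer by raw distance
normalizedSim(target, candidates)     # 0.5 0.615...: the longer string scores higher
```

For the words in the question both rules agree: 'text' and 'Test' have distance 1 and similarity 0.75, the best values in the list.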

@DWin. Thanks for the correction. I edited my answer to fix the spelling. – Ramnath 2011-04-20 02:51:00

Thanks for the answer, this is a great function. What is the package's intended purpose? It may contain other functions relevant to my project. – Zach 2011-04-20 13:25:27

@Zach. Yes. It probably contains many functions relevant to your work. The package's CRAN page has several vignettes worth looking through (http://cran.r-project.org/web/packages/RecordLinkage/index.html) – Ramnath 2011-04-20 14:26:04
