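How to find string matches in R across a column of links?

I have a data table with a list of .txt links in a single column. I am looking for a way to have R search each linked file and check whether it contains "discount rate" or "discounted cash flow". I then want R to create 2 columns next to each link (one for discount rate, one for discounted cash flow), holding 1 if the phrase is present and 0 if it is not.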

[image: current table with links in column websiteURL]

[image: what I want my table to look like]

Here is a short list of sample links that I want to screen:

http://www.sec.gov/Archives/edgar/data/1015328/0000913849-04-000510.txt 
http://www.sec.gov/Archives/edgar/data/1460306/0001460306-09-000001.txt 
http://www.sec.gov/Archives/edgar/data/1063761/0001047469-04-028294.txt 
http://www.sec.gov/Archives/edgar/data/1230588/0001178913-09-000260.txt 
http://www.sec.gov/Archives/edgar/data/1288246/0001193125-04-155851.txt 
http://www.sec.gov/Archives/edgar/data/1436866/0001172661-09-000349.txt 
http://www.sec.gov/Archives/edgar/data/1089044/0001047469-04-026535.txt 
http://www.sec.gov/Archives/edgar/data/1274057/0001047469-04-023386.txt 
http://www.sec.gov/Archives/edgar/data/1300379/0001047469-04-026642.txt 
http://www.sec.gov/Archives/edgar/data/1402440/0001225208-09-007496.txt 
http://www.sec.gov/Archives/edgar/data/35527/0001193125-04-161618.txt

Source

2017-07-26 Kevin Ocampo

'dput()' > imgs – hrbrmstr

Maybe something like this...

# Return 1 if any line of `file` contains `text` (case-insensitive), else 0.
checktext <- function(file, text) {
    filecontents <- readLines(file)
    as.numeric(any(grepl(text, filecontents, ignore.case = TRUE)))
}

# One 0/1 column per search phrase.
df$DR <- sapply(df$file_name, checktext, "discount rate")
df$DCF <- sapply(df$file_name, checktext, "discounted cash flow")
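To try this without touching the network, here is a minimal sketch that writes two throwaway files and runs the same check on them. The file contents, the `demo` data frame, and the temp paths are invented for illustration; they are not from the question.

```r
# Same helper as above: 1 if any line of the file matches `text`, else 0.
checktext <- function(file, text) {
    filecontents <- readLines(file, warn = FALSE)
    as.numeric(any(grepl(text, filecontents, ignore.case = TRUE)))
}

# Two hypothetical files standing in for downloaded filings.
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("We applied a Discount Rate of 10%.", "No other method was used."), f1)
writeLines("Valuation used a discounted cash flow model.", f2)

demo <- data.frame(file_name = c(f1, f2), stringsAsFactors = FALSE)
demo$DR  <- sapply(demo$file_name, checktext, "discount rate")
demo$DCF <- sapply(demo$file_name, checktext, "discounted cash flow")

print(demo[, c("DR", "DCF")])
```

The first file should be flagged only in DR and the second only in DCF, because the match is case-insensitive but still requires the whole phrase.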

A faster version that reads each file only once, thanks to Gregor's comment:

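# Read each file once, then test every phrase in `text` against it.
checktext <- function(file, text) {
    filecontents <- readLines(file)
    sapply(text, function(x) as.numeric(any(grepl(x, filecontents,
        ignore.case = TRUE))))
}

# sapply() returns one column per file, so transpose to get one row per file.
df[, c("DR", "DCF")] <- t(sapply(df$file_name, checktext,
    c("discount rate", "discounted cash flow")))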

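Or, if you are doing this from URLs rather than local files, replace df$file_name with df$websiteURL in the code above. It worked for me on the short list you provided.

Source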
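When the inputs are URLs, some reads may fail or time out, and a single failure would abort the whole `sapply` call. One possible variant, sketched here on the assumption that you would rather record `NA` for an unreadable link than stop, wraps the read in `tryCatch`. The name `checktext_safe` is made up for this example, and the demo runs on local temp files rather than the SEC links.

```r
# Hypothetical helper: like checktext, but returns NA for every phrase
# when the file or URL cannot be read, instead of raising an error.
checktext_safe <- function(file, text) {
    filecontents <- tryCatch(
        suppressWarnings(readLines(file, warn = FALSE)),
        error = function(e) NULL
    )
    if (is.null(filecontents)) {
        return(setNames(rep(NA_real_, length(text)), text))
    }
    sapply(text, function(x) as.numeric(any(grepl(x, filecontents,
        ignore.case = TRUE))))
}

good <- tempfile(fileext = ".txt")
writeLines("The discount rate came from a discounted cash flow analysis.", good)
absent <- tempfile(fileext = ".txt")  # never created, so reading it fails

demo <- data.frame(websiteURL = c(good, absent), stringsAsFactors = FALSE)
patterns <- c("discount rate", "discounted cash flow")
demo[, c("DR", "DCF")] <- t(sapply(demo$websiteURL, checktext_safe, patterns))

print(demo[, c("DR", "DCF")])
```

Using `NA` instead of 0 keeps "phrase not found" distinct from "file could not be read", so failed links can be retried later.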

2017-07-26 19:03:05

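Connecting to and reading the files will be slow, but grep will be fast. It would be more efficient to read each file once and run 'grep' twice. Make 'text' a vector in your 'checktext' function and use 'sapply(text, function(x) as.numeric(any(grepl(x, filecontents, ignore.case = T))))' – Gregor

@Gregor Yes, that would be faster, thank you very much. I have added it to the main answer. –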
