How do I grep percentages in any format from files in R?

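I want my grep function to extract percentages from multiple files, and the files are not all formatted the same way. For example, a percentage could be written in any of the following ways: (5%, 2.46%, 12.9%, 5 %, 2.46 %, 5 12.9 %, 5 percent, 2.46 percent, 5 per cent, etc..), and I want to make sure there is at least a space before and after it, to avoid extracting HTML code or anything like this: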

<TD width="97%"></TD>

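This is the code I am working with, which is obviously wrong. I was wondering whether there is a way to put in placeholders, like the asterisks below, so that it finds all sorts of numbers like these: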

txt<-tryCatch(readLines(DS2[i,temp]), error = function(e) readLines(DS2[i,temp])) 
    t<-grep("**.**%", txt)

Source

2017-07-25 Kevin Ocampo

Something like "[0-9]+\\.[0-9]+ *%" should work. Both "*" and "." are special characters in regular expressions. See here: https://stackoverflow.com/questions/27721008/how-do-i-deal-with-special-characters-like-in-my-regex and this website: http://www.regular-expressions.info/ for more information. – lmo

Maybe 'grep("[0-9]{1,2}\\.[0-9]{1,2}", text)'?

You would be better off using XML parsing functions to extract the relevant attributes. It is easy to do.
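The regex suggestions in the comments above can be sketched roughly as follows. This is only an illustration: the sample lines and the name pct_pattern are made up for the example, and the pattern is one possible way to require whitespace (or a line boundary) on both sides of the percentage.

```r
# "." is a wildcard unless escaped, so the unescaped pattern also matches "2x46%".
grepl("2.46%", "2x46%")     # TRUE
grepl("2\\.46%", "2x46%")   # FALSE

# Hypothetical sample lines, not taken from the asker's files.
lines <- c("Growth was 5% last year",
           "about 2.46 percent of cases",
           '<TD width="97%"></TD>',
           "no numbers here")

# A number with optional decimals, optional whitespace, then "%", "percent"
# or "per cent", with whitespace or a line boundary on both sides.
pct_pattern <- "(^|\\s)[0-9]+(\\.[0-9]+)?\\s*(%|per\\s*cent)(\\s|$)"

has_pct <- grepl(pct_pattern, lines)
which(has_pct)   # indices of the matching lines, as grep() would return them
```

Because the HTML attribute value is preceded by a quote rather than whitespace, the third line is not matched.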

Rather than writing one regular expression, it may be easier to do this in several steps. Using the examples you gave:

x <- c('5%', '2.46%', '12.9%', '5 %', '2.46 %', '5 12.9 %', 
     '5 percent', '2.46 percent', '5 per cent', 
     'etc..', '<TD width="97%"></TD>') 

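get_pct <- function(x) { 
    # Drop percentages that appear inside quoted HTML attribute values
    x <- gsub('="[^"]+%"', '', x) 
    # Normalise "percent", "per cent" and " %" to a bare "%"
    x <- gsub('\\s*per\\s*cent|\\s*%', '%', x) 
    # Only elements containing a number directly followed by "%" count
    is_pct <- grepl('\\d+(\\.\\d+)?%', x) 
    # Capture the number before "%"; everything else becomes NA
    as.numeric(ifelse(is_pct, gsub('.*?(\\d+\\.?\\d*)%.*', '\\1', x), NA)) 
} 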

get_pct(x) 
[1] 5.00 2.46 12.90 5.00 2.46 12.90 5.00 2.46 5.00 NA NA

Here is the same thing step by step:

# Eliminate percentages from html tags 
x <- gsub('="[^"]+%"', '', x) 
x 
[1] "5%"    "2.46%"   "12.9%"   "5 %"    "2.46 %"   "5 12.9 %"  
[7] "5 percent"  "2.46 percent" "5 per cent"  "etc.."   "<TD width></TD>" 

# Standardize % symbol 
x <- gsub('\\s*per\\s*cent|\\s*%', '%', x) 
x 
[1] "5%"    "2.46%"   "12.9%"   "5%"    "2.46%"   "5 12.9%"   
[7] "5%"    "2.46%"   "5%"    "etc.."   "<TD width></TD>" 

# Find percentages (a number directly followed by %) 
is_pct <- grepl('\\d+(\\.\\d+)?%', x) 

# Extract values 
x <- ifelse(is_pct, gsub('.*?(\\d+\\.?\\d*)%.*', '\\1', x), NA) 
as.numeric(x) 
[1] 5.00 2.46 12.90 5.00 2.46 12.90 5.00 2.46 5.00 NA NA
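The steps above return one value per element. If a line read from a file can contain several percentages, one possible variant is to pull out every match with gregexpr() and regmatches(). This is a sketch under assumptions, not part of the answer: the function name extract_pcts and the sample lines are hypothetical.

```r
# Return a list with one numeric vector of percentages per input line.
extract_pcts <- function(lines) {
  # Drop quoted HTML attribute values such as ="97%"
  cleaned <- gsub('="[^"]*"', '', lines)
  # A number with optional decimals, then "%", "percent" or "per cent"
  pattern <- "[0-9]+(\\.[0-9]+)?\\s*(%|per\\s*cent)"
  hits <- regmatches(cleaned, gregexpr(pattern, cleaned))
  # Strip the trailing unit from each match and convert to numeric
  lapply(hits, function(h) as.numeric(sub("\\s*(%|per\\s*cent)$", "", h)))
}

# Hypothetical sample input
sample_lines <- c("rates were 5% and 12.9 percent",
                  "nothing to see",
                  '<TD width="97%"></TD>')
res <- extract_pcts(sample_lines)
```

For a real file, the same function could be called on the result of readLines() with whatever path applies; lines without a percentage come back as empty numeric vectors.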

Source

2017-07-25 21:38:42 Damian
