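Extract text from the innermost nested parentheses of a string

From the text strings below, I am trying to extract a specific subset of each string.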

string <- c("(Intercept)", "scale(AspectCos_30)", "scale(CanCov_500)", 
      "scale(DST50_30)", "scale(Ele_30)", "scale(NDVI_Tin_250)", "scale(Slope_500)", 
      "I(scale(Slope_500)^2)", "scale(SlopeVar_30)", "scale(CanCov_1000)", 
      "scale(NDVI_Tin_1000)", "scale(Slope_1000)", "I(scale(Slope_1000)^2)", 
      "scale(log(SlopeVar_30 + 0.001))", "scale(CanCov_30)", "scale(Slope_30)", 
      "I(scale(Slope_30)^2)")

A good result would return the central text without any special characters, as shown below.

Good <- c("Intercept", "AspectCos", "CanCov", "DST50", "Ele", "NDVI", "Slope", "Slope", 
      "SlopeVar", "CanCov", "NDVI", "Slope", "Slope", "SlopeVar", "CanCov", "Slope", "Slope")

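Preferably, however, the resulting strings would account for the ^2 and log associated with 'Slope' and 'SlopeVar', respectively. Specifically, every string containing ^2 would be converted to 'SlopeSq', and every string containing log would be converted to 'SlopeVarPs', as shown below.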

Best <- c("Intercept", "AspectCos", "CanCov", "DST50", "Ele", "NDVI", "Slope", "SlopeSq", 
      "SlopeVar", "CanCov", "NDVI", "Slope", "SlopeSq", "SlopeVarPs", "CanCov", "Slope", "SlopeSq")

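I have a long, ugly, and inefficient sequence of code that gets me only about halfway to the Good result, and I would appreciate any suggestions.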

Source

2017-06-16 B. Davis

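As a not-so-efficient coder, I like to chain several regular expressions to get the result (what each regex does is explained in the comment on its line):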

library(stringr)  # str_replace_all()
library(dplyr)    # %>% pipe (re-exported from magrittr)
outcome <- string %>%
    str_replace_all(".*log\\((.*?)(_.+?)?\\).*", "\\1Ps") %>% # deal with the "log" entry
    str_replace_all(".*\\((.*?\\))", "\\1") %>% # delete everything up to and including the last "("
    str_replace_all("(_\\d+)?\\)\\^2", "Sq") %>% # take care of ^2
    str_replace_all("(_.+)?\\)?", "") # remove leftover characters at the end (e.g. "_30" and ")")


Best <- c("Intercept", "AspectCos", "CanCov", "DST50", "Ele", "NDVI", "Slope", "SlopeSq", 
      "SlopeVar", "CanCov", "NDVI", "Slope", "SlopeSq", "SlopeVarPs", "CanCov","Slope", "SlopeSq") 
all(outcome == Best) 
## TRUE
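
For comparison, here is a minimal base R sketch of the same idea, split into "find the innermost parentheses", "strip the suffix", and "tag ^2 and log". The function name `clean_terms` and its `best` argument are made up for illustration and are not part of the answer above. It assumes every input has exactly one parenthesis pair with nothing nested inside it, and that the variable name ends at the first underscore.

```r
# Hypothetical helper (not from the original answer); base R only.
clean_terms <- function(x, best = TRUE) {
  # Step 1: keep what sits inside the innermost "( ... )", i.e. a pair of
  # parentheses with no other parenthesis between them.
  inner <- sub(".*\\(([^()]*)\\).*", "\\1", x)
  # Step 2: drop everything from the first underscore onward
  # ("NDVI_Tin_250" -> "NDVI", "SlopeVar_30 + 0.001" -> "SlopeVar").
  name <- sub("_.*$", "", inner)
  if (!best) return(name)
  # Step 3: tag squared and logged terms by inspecting the original string.
  suffix <- ifelse(grepl("^2", x, fixed = TRUE), "Sq",
            ifelse(grepl("log(", x, fixed = TRUE), "Ps", ""))
  paste0(name, suffix)
}

examples <- c("(Intercept)", "scale(NDVI_Tin_250)", "I(scale(Slope_500)^2)",
              "scale(log(SlopeVar_30 + 0.001))")
clean_terms(examples, best = FALSE)
## "Intercept" "NDVI" "Slope" "SlopeVar"
clean_terms(examples)
## "Intercept" "NDVI" "SlopeSq" "SlopeVarPs"
```

With `best = FALSE` this gives the "Good" form and with the default it gives the "Best" form. Strings without any parentheses pass through step 1 unchanged, so they would need separate handling.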

Source

2017-06-16 16:23:31

Much appreciated. Clear and informative! I also didn't know you could use the pipe operator with stringr. Cool.

The pipe actually comes from 'dplyr'. I edited my answer.
