Replace a string with part of the string extracted from it

I have searched through many regex answers here, but could not find a solution to this kind of problem.

My dataset is a tibble of Wikipedia links:

library(tidytext)
library(stringr)
library(dplyr)  # provides %>% and mutate() used further below
text.raw <- "Berthold Speer was een [[Duitsland (hoofdbetekenis)|Duits]] [[architect]]."

I am trying to clean the links out of my text. This:

str_extract_all(text.raw, "[a-zA-Z\\s]+(?=\\])") 
# [1] "Duits"  "architect"

selects the words I need from between the brackets.

This:

str_replace_all(text.raw, "\\[\\[.*?\\]\\]", str_extract(text.raw, "[a-zA-Z\\s]+(?=\\])")) 
# [1] "Berthold Speer was een Duits Duits."

works as expected, but is not quite what I need. This:

str_replace_all(text.raw, "\\[\\[.*?\\]\\]", str_extract_all(text.raw, "[a-zA-Z\\s]+(?=\\])")) 
# Error: `replacement` must be a character vector

gives an error where I expected "Berthold Speer was een Duits architect".

Currently my code looks like this:

text.clean <- data_frame(text = text.raw) %>% 
    mutate(text = str_replace_all(text, "\\[\\[.*?\\]\\]", str_extract_all(text, "[a-zA-Z\\s]+(?=\\])")))

I hope someone knows a solution, or can point me to a duplicate question if one exists. My desired output is "Berthold Speer was een Duits architect".

Asked 2017-07-07 by raoul

What is the final string you want? –

`architect`. I want the `...` part of `[[...]]` or `[[xxx|...]]`. – raoul

`text.raw %>% gsub(pattern = '\\[.+\\|', replacement = '') %>% gsub(pattern = '\\]|\\[', replacement = '')` –

You can use a single gsub operation:

text <- "Berthold Speer was een [[Duitsland (hoofdbetekenis)|Duits]] [[architect]]." 
gsub("\\[{2}(?:[^]|]*\\|)?([^]]*)]{2}", "\\1", text)

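See the online R demo.

The pattern matches:

\\[{2} - two [ symbols
(?:[^]|]*\\|)? - an optional sequence matching
- [^]|]* - zero or more characters other than ] and |
- \\| - a pipe symbol
([^]]*) - Group 1: zero or more characters other than ]
]{2} - two ] symbols.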
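
The same idea can be carried over to the tibble pipeline from the question by passing a backreference as the replacement instead of the result of str_extract_all. This is a minimal sketch, not part of the original answer: it assumes dplyr and stringr are installed, and the name link_pattern is hypothetical. Because stringr uses the ICU regex engine rather than base R's, the ] and | characters inside the character classes are escaped here to be safe.

```r
library(dplyr)
library(stringr)

text.raw <- "Berthold Speer was een [[Duitsland (hoofdbetekenis)|Duits]] [[architect]]."

# Hypothetical name; same structure as the gsub pattern above,
# with ] and | escaped inside the classes for the ICU engine.
link_pattern <- "\\[\\[(?:[^\\]\\|]*\\|)?([^\\]]*)\\]\\]"

# "\\1" inserts capture group 1 (the link label) for every match,
# so each link is replaced by its own label.
text.clean <- tibble(text = text.raw) %>%
  mutate(text = str_replace_all(text, link_pattern, "\\1"))

text.clean$text
# [1] "Berthold Speer was een Duits architect."
```

The replacement is a plain character string, so the "`replacement` must be a character vector" error from the question does not arise.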

Answered 2017-07-07 13:43:21

Your regex skills are ridiculous, +1 –

If there may be single closing brackets, replace `[^]|]*` with `[^]|]*(?:](?!])[^]|]*)*` and `[^]]*` with `[^]]*(?:](?!])[^]]*)*`, and add the `perl=TRUE` argument to gsub. –

Thanks! Really great! – raoul
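
A sketch of the variant described in the comment above, for links that may contain a single ] character. The input text2 and the names pattern2 and result2 are made up for illustration, and the pattern is my reading of the comment rather than code posted by its author. The lookahead (?!]) requires perl = TRUE.

```r
# Hypothetical input: both links contain a lone ] inside them
text2 <- "A [[Duitsland [x]|Duits]] [[arch]itect]]."

# Each "any char but ]" run may be followed by a single ] that is
# not itself followed by another ], repeated as often as needed.
pattern2 <- "\\[{2}(?:[^]|]*(?:](?!])[^]|]*)*\\|)?([^]]*(?:](?!])[^]]*)*)]{2}"

result2 <- gsub(pattern2, "\\1", text2, perl = TRUE)
result2
# [1] "A Duits arch]itect."
```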
