Match and replace multiple quoted strings in REGEX

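I want to replace all spaces inside quotes with underscores in R. I'm not sure how to define a quoted string correctly when there are several quoted substrings. My first attempt failed, and I couldn't even get the single/double quotes working.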

require(stringi) 
s = "The 'quick brown' fox 'jumps over' the lazy dog" 
stri_replace_all(s, regex="('.*) (.*')", '$1_$2') 
#> [1] "The 'quick brown' fox 'jumps_over' the lazy dog"
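One way to see why only one space changes is to inspect what the attempted pattern captures. This is a minimal base R sketch. It uses regexec() with perl = TRUE instead of stringi, an assumption made only to avoid the extra dependency. The greedy .* in the first group runs as far right as it can, so the only space that gets replaced is the last one that still has a quote somewhere after it.

```r
s <- "The 'quick brown' fox 'jumps over' the lazy dog"

# regexec() returns the full match followed by each capture group.
# Group 1 ('.*) is greedy, so it swallows everything up to the last space
# that is still followed by a closing quote somewhere to its right.
m <- regmatches(s, regexec("('.*) (.*')", s, perl = TRUE))[[1]]

m[1]  # full match: "'quick brown' fox 'jumps over'"
m[2]  # group 1:    "'quick brown' fox 'jumps"
m[3]  # group 2:    "over'"
```

The full match spans both quoted phrases, which is why a single $1_$2 replacement can only touch one space.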

Thanks for your help.


2017-05-25 geotheory

Only if you have a dirty mind ;) – geotheory

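Do you need to account for escape sequences inside the quotes? Are you dealing with properly escaped strings? If you can match each whole relevant '...' substring, you can then replace any text inside that match. –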

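Let's assume you need to match all non-overlapping substrings that start with ', continue with 1 or more characters other than ', and then end with '. The pattern is '[^']+'.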
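Before replacing anything, it can help to confirm what '[^']+' actually matches. A minimal base R check (the variable name quoted is only illustrative): because the body excludes ', each match stops at the next quote instead of running on to the last one.

```r
x <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"

# gregexpr() locates every non-overlapping match of '[^']+';
# regmatches() pulls the matched text out (one list element per input string)
quoted <- regmatches(x, gregexpr("'[^']+'", x))[[1]]
print(quoted)
# [1] "'quick cunning brown'" "'jumps up and over'"
```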

Then you can use the following base R code:

x = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog" 
gr <- gregexpr("'[^']+'", x) 
mat <- regmatches(x, gr) 
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_") 
x 
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
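The same three steps can be wrapped in a small function so they apply to a whole character vector. This is a sketch; underscore_quoted is a hypothetical name, not something from the answer. Strings with no quoted span pass through unchanged. The double-quote alternation at the end is an added assumption for the single/double quote case mentioned in the question.

```r
# Hypothetical helper: wraps the gregexpr() / regmatches<- steps above
underscore_quoted <- function(x, pattern = "'[^']+'", perl = FALSE) {
  gr <- gregexpr(pattern, x, perl = perl)
  # Replace every whitespace char inside each match; strings with no
  # match are left as they are
  regmatches(x, gr) <- lapply(regmatches(x, gr), gsub,
                              pattern = "\\s", replacement = "_")
  x
}

v <- c("The 'quick brown' fox", "no quotes here", "'a b' and 'c d e'")
res <- underscore_quoted(v)
# res: "The 'quick_brown' fox", "no quotes here", "'a_b' and 'c_d_e'"

# Assumption: an alternation also covers double-quoted spans
dq <- underscore_quoted('say "hello there" and \'good bye\'',
                        pattern = "'[^']+'|\"[^\"]+\"")
cat(dq, sep = "\n")
# say "hello_there" and 'good_bye'
```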

See this R demo. Alternatively, use gsubfn:

> library(gsubfn) 
> rx <- "'[^']+'" 
> s = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog" 
> gsubfn(rx, ~ gsub("\\s", "_", x), s) 
[1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog" 
>

To support escape sequences, you can use a more complex PCRE regex:

(?<!\\)(?:\\{2})*\K'[^'\\]*(?:\\.[^'\\]*)*'

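Details:

(?<!\\) - no \ immediately before the current position
(?:\\{2})* - zero or more sequences of 2 \s
\K - match reset operator
' - a single quote
[^'\\]* - zero or more characters other than ' and \
(?:\\.[^'\\]*)* - zero or more sequences of:
  - \\. - a \ followed by any char but a line break
  - [^'\\]* - zero or more characters other than ' and \
' - a single quote.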
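The effect of the lookbehind and of \K can be checked on two tiny inputs. This is an illustrative sketch; the test strings and variable names are assumptions, and the regex is the one above written as R string literals. Without \K, the leading pair of escaped backslashes is included in the reported match. With \K, the match starts at the opening quote, and a quote escaped by a single backslash cannot open a match at all.

```r
# Actual characters: \\'a b'  (two backslashes, then a quoted span)
y <- "\\\\'a b'"
# Same regex as R string literals, with and without \K
pat_no_k <- "(?<!\\\\)(?:\\\\{2})*'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'"
pat_k    <- "(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'"

# Without \K the consumed backslash pair is part of the reported match
m_no_k <- regmatches(y, regexpr(pat_no_k, y, perl = TRUE))
print(m_no_k)  # [1] "\\\\'a b'"

# With \K the reported match starts at the opening quote
m_k <- regmatches(y, regexpr(pat_k, y, perl = TRUE))
print(m_k)     # [1] "'a b'"

# Actual characters: \'x y'  (the first quote is escaped by one backslash)
z <- "\\'x y'"
m_z <- regmatches(z, regexpr(pat_k, z, perl = TRUE))
print(m_z)     # character(0): the escaped quote cannot open a match
```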

And the R demo would look like this:

x = "The \\' \\\\\\' \\\\\\\\'quick \\'cunning\\' brown' fox 'jumps up \\'and\\' over' the lazy dog" 
cat(x, sep="\n") 
gr <- gregexpr("(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", x, perl=TRUE) 
mat <- regmatches(x, gr) 
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_") 
cat(x, sep="\n")

Output:

The \' \\\' \\\\'quick \'cunning\' brown' fox 'jumps up \'and\' over' the lazy dog 
The \' \\\' \\\\'quick_\'cunning\'_brown' fox 'jumps_up_\'and\'_over' the lazy dog


2017-05-25 23:15:18

I had the same idea. I wonder whether saving 'mat' separately gains anything, since you have to run 'regmatches' twice anyway. +1 regardless; 'regmatches<-' is a really useful function. – thelatemail
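As the comment suggests, the intermediate mat object is optional. A minimal sketch: nest the extraction inside the replacement call. regmatches() is still evaluated twice, once to read the matches and once (via regmatches<-) to write them back.

```r
x <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
gr <- gregexpr("'[^']+'", x)

# The right-hand side is evaluated first, on the unmodified x,
# so no separate `mat` variable is needed
regmatches(x, gr) <- lapply(regmatches(x, gr), gsub,
                            pattern = "\\s", replacement = "_")
print(x)
# [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
```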

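Yes, I also think base R with the PCRE regex option is very powerful, and it is the most convenient option when you have to handle escape sequences (see the update). –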

Comprehensive, thanks Wiktor. I won't pretend to understand the PCRE example, though. – geotheory

Try this:

require(stringi) 
s = "The 'quick brown' fox 'jumps over' the lazy dog" 
stri_replace_all(s, regex="('[a-z]+) ([a-z]+')", '$1_$2')


2017-05-25 23:06:15 AChervony

This assumes there is exactly one space between the opening ' plus letters and the letters plus closing '. –

This doesn't work for more than two words inside the quotes. – Rahul
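The limitation raised in these comments is easy to reproduce. This minimal sketch assumes base R gsub() with the same pattern is an acceptable stand-in for the stringi call; it swaps $1/$2 for \\1/\\2. Two-word quoted spans are fixed, but a span with three or more words never fits the "letters space letters" shape, so nothing in it is replaced.

```r
# Base R translation of the stringi call (same pattern, \\1/\\2 backrefs)
pat <- "('[a-z]+) ([a-z]+')"

two_words   <- "The 'quick brown' fox 'jumps over' the lazy dog"
three_words <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"

r2 <- gsub(pat, "\\1_\\2", two_words)
print(r2)  # [1] "The 'quick_brown' fox 'jumps_over' the lazy dog"

# No quoted span here is exactly "letters space letters", so nothing matches
r3 <- gsub(pat, "\\1_\\2", three_words)
identical(r3, three_words)  # [1] TRUE
```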

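I think .* is too greedy; that's why being specific and matching only letters may help. You'd need to modify the pattern if your strings contain uppercase letters or special characters. – AChervony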
