Remove special characters from a string in R

I want to clean up strings in R using regular expressions. I want to remove anything that is not alphanumeric or punctuation.

I am using gsub and have been using

gsub("[^[:alnum:]]", " ", x)

But this also removes punctuation. Is there a way to add "[[:punct:]]" and combine the two?

Thanks

Source

2017-10-08 Michael Montgomery

You can use `"[^[:alnum:][:punct:]]+"`. Personally, I would put the `+` at the end so that a run of several characters is replaced with only one space. – Rentrop
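A minimal sketch of the combined pattern from this comment, run on a sample string invented here for illustration (the variable names are not from the thread). Both classes sit inside one negated bracket expression, and the trailing `+` collapses each run of unwanted characters into a single space:

```r
# Sample input (made up for this sketch): contains a tab, a double space and a newline
x <- "Hello,\tworld!  R\nis fun."

# Replace every run of characters that are neither alphanumeric nor punctuation
# with one space
cleaned <- gsub("[^[:alnum:][:punct:]]+", " ", x)
print(cleaned)
# [1] "Hello, world! R is fun."
```

Without the `+`, each unwanted character would be replaced separately, so the double space would stay a double space.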

Does this mean you just want to replace any whitespace character with a space char? Because `[^[:alnum:][:punct:]]` basically matches whitespace.

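Maybe you want to use `gsub("[^A-Za-z0-9\\p{P}]", "", x, perl = TRUE)` to remove everything except ASCII letters, digits, and punctuation (but not symbols, since `\p{P}` does not cover characters such as `$` or `+`)?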
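To make the punctuation-versus-symbol distinction concrete, here is a small sketch on an invented ASCII string. It assumes R's PCRE engine was built with Unicode property support, which `\p{P}` requires:

```r
# Sample input (made up for this sketch): mixes punctuation with symbols
x <- "a$b+c, d! (e) ~f"

# \p{P} is Unicode punctuation only: $, + and ~ are symbols, so they are removed
unicode_punct <- gsub("[^A-Za-z0-9\\p{P}]", "", x, perl = TRUE)
print(unicode_punct)
# [1] "abc,d!(e)f"

# The POSIX class [:punct:] also covers the ASCII symbols, so they are kept
posix_punct <- gsub("[^A-Za-z0-9[:punct:]]", "", x, perl = TRUE)
print(posix_punct)
# [1] "a$b+c,d!(e)~f"
```

Which of the two is appropriate depends on whether characters like `$` should count as "special" in your data.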

You can try this:

x <- "1 [email protected]#[email protected][email protected]#[email protected]#[email protected]#! 11 ;'. R Tutorial" 
gsub("[^A-Za-z0-9,;._-]","",x)

Add other punctuation characters to the bracket expression according to your needs.

Source
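As a sketch of what extending the whitelist might look like, the example below adds `?`, `!` and `:` to the kept characters. The sample string and variable names are invented for illustration. Note that the hyphen stays last inside the brackets so it is read literally rather than as a range:

```r
# Sample input (made up for this sketch)
x <- "R-3.4 (beta): rock_solid? 100%!"

# Whitelist from the answer: letters, digits and , ; . _ -
basic <- gsub("[^A-Za-z0-9,;._-]", "", x)
print(basic)
# [1] "R-3.4betarock_solid100"

# Extended whitelist: additionally keep ? ! and :
extended <- gsub("[^A-Za-z0-9,;._?!:-]", "", x)
print(extended)
# [1] "R-3.4beta:rock_solid?100!"
```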

2017-10-08 11:37:21

I find stringr and rebus to be very handy packages for building regular expressions:

> library(stringr) 
> library(rebus) 
> # string 
> x <- "1 [email protected]#[email protected][email protected]#[email protected]#[email protected]#! 11 ;'. R Tutorial" 
> 
> # define the pattern 
> pat<-or(WRD,char_class(",;._-")) 
> # the regex of the pattern 
> pat 
<regex> (?:\w|[,;._-]) 
> # split the string into single character 
> x1<-unlist(str_split(x,"")) 
> # subset the character based on the pattern and collapse them 
> str_c(str_subset(x1,pat),collapse = "") 
[1] "111;.RTutorial"
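
For comparison, the same split, filter, and collapse steps can be sketched in base R without either package. This is only an illustration on an invented input; it spells out the word class as `[:alnum:]` plus `_` instead of relying on `\w`:

```r
# Sample input (made up for this sketch)
x <- "a1 #b2; c3!"

# Split the string into single characters
chars <- strsplit(x, "")[[1]]

# Keep characters that are alphanumeric, underscore, or one of , ; . -
keep <- grepl("[[:alnum:]_,;.-]", chars)
kept <- paste(chars[keep], collapse = "")
print(kept)
# [1] "a1b2;c3"

# A single gsub() call with the negated class gives the same result
one_step <- gsub("[^[:alnum:]_,;.-]", "", x)
```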

If you want to use the [:punct:] regex class instead:

> # define the pattern using punct 
> pat2<-or(WRD,PUNCT) 
> # the regex of the pattern 
> pat2 
<regex> (?:\w|[:punct:]) 
> # subset the character based on the pattern and collapse them 
> str_c(str_subset(x1,pat2),collapse = "") 
[1] "[email protected]#[email protected][email protected]#[email protected]#[email protected]#!11;'.RTutorial"

Source

2017-10-08 13:33:25
