2014-09-02 173 views
0

Splitting a string recursively

Say I have text like this:

pattern = "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')" 

The challenge is how to split it into words, using the following family of word separators:

c(" ","-","/","\\","_",":","(",")",".",",") 

Desired result:

"This" "is" "some" "word" "expression" "I'd" "like" "to" "parse" "intelligently" "using" "special" "symbols" "like" 

Approach

I can do this with sapply or a for loop:

keywords = unlist(strsplit(pattern," ")) 
keywords = unlist(strsplit(keywords,"-")) 

# etc.

Question:

But what would a solution using Reduce(f, x, init, accumulate = TRUE) look like?
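One way such a Reduce call could look is sketched below. It folds over the separator vector and splits every piece found so far on one more separator. The helper name split_on and the variable names text, separators, steps and words are illustrative assumptions, not part of the original post, and fixed = TRUE is used so that each separator is treated as a literal string rather than a regular expression.

```r
text <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
separators <- c(" ", "-", "/", "\\", "_", ":", "(", ")", ".", ",")

# Hypothetical helper: split every piece found so far on one more literal separator.
split_on <- function(pieces, sep) unlist(strsplit(pieces, sep, fixed = TRUE))

# Plain fold: only the final vector of pieces is returned.
words <- Reduce(split_on, separators, text)

# With accumulate = TRUE the result is a list holding the initial value
# followed by the pieces after each separator has been applied.
steps <- Reduce(split_on, separators, text, accumulate = TRUE)
```

Under these assumptions the last element of steps is the same vector as words. The two quote characters around the dot survive as separate pieces, because the apostrophe is not in the separator list.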

Answers

4

You can use the option perl = TRUE and split on punctuation or whitespace:

> strsplit(pattern, '[[:punct:]]|[[:space:]]', perl = TRUE) 
[[1]] 
[1] "This"   "is"   "some"   "word"   "expression" 
[6] "I"    "d"    "like"   "to"   "parse"   
[11] "intelligently" "using"   "special"  "symbols"  "like"   
[16] ""  
+0

Very elegant indeed! – 2014-09-02 10:25:39

+0

Although... – 2014-09-02 10:34:21

+0

Actually, I don't mind "I" + "d" versus "I'd". For simplicity, I will – 2014-09-02 10:45:56

5

You shouldn't need Reduce here. You should be able to do something like the following:

splitters <- c(" ","/","\\","_",":","(",")",".",",","-") # dash should come last 
pattern <- paste0("[", paste(splitters, collapse = ""), "]") 
string <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')" 
strsplit(string, pattern)[[1]] 
# [1] "This"   "is"   "some"   "word"   
# [5] "expression" "I'd"   "like"   "to"   
# [9] "parse"   "intelligently" "using"   "special"  
# [13] "symbols"  "like"   "'"    "'" 

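Note that in a regular-expression character class, - should be placed first or last (anywhere else it denotes a range between its neighbours), so I have edited the "splitters" vector accordingly. Also, you may want to add + to the end of "pattern" if you want a run of consecutive separators, such as multiple spaces, to be treated as a single split.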
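The two remarks above can be illustrated with a small sketch. The input string messy and all variable names below are hypothetical and chosen only for the demonstration; the splitters vector is the one from the answer.

```r
# Hypothetical input containing runs of separators (not from the original post).
messy <- "some  word -- here"

splitters <- c(" ", "/", "\\", "_", ":", "(", ")", ".", ",", "-")  # dash kept last
char_class <- paste0("[", paste(splitters, collapse = ""), "]")

# Without "+", every single separator splits, leaving "" between neighbours.
one_at_a_time <- strsplit(messy, char_class)[[1]]

# With "+", a whole run of separators counts as one split point.
runs_collapsed <- strsplit(messy, paste0(char_class, "+"))[[1]]

# Dash between two other characters: "[a-c]" is the range a..c, so "b" matches too.
dash_middle <- strsplit("xbx", "[a-c]")[[1]]

# Dash last: "[ac-]" matches only "a", "c" and "-", so "b" is left alone.
dash_last <- strsplit("xbx", "[ac-]")[[1]]
```

Here one_at_a_time contains empty strings wherever two separators are adjacent, while runs_collapsed holds only the three words. The last two lines show why the position of the dash matters: in the middle it silently widens the set of characters that trigger a split.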

+0

@DavidArenburg, it's closer now. – A5C1D2H2I1M1N2O1R2T1 2014-09-02 10:42:14

+0

Very helpful in cases where customization needs to be added to the other answer – 2014-09-02 10:49:10

+0

Any reason why "dash should come last"? – 2014-09-02 12:47:14

2

I would go with the following (this keeps "I'd" together):

strsplit(pattern, "[^[:alnum:][:digit:]']") 
## [[1]] 
## [1] "This"   "is"   "some"   "word"   "expression" "I'd"   "like"   "to"   "parse"   
## [10] "intelligently" "using"   "special"  "symbols"  "like"   "'"    "'"
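The output above still contains the two stray "'" pieces that are absent from the desired result. A possible follow-up step, sketched here as an assumption rather than as part of the original answer, is to keep only the pieces that contain at least one alphanumeric character. The class is written without [:digit:] because [:alnum:] already covers digits; the variable name words is illustrative.

```r
pattern <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"

# Split on anything that is neither alphanumeric nor an apostrophe.
words <- strsplit(pattern, "[^[:alnum:]']")[[1]]

# Drop pieces with no letter or digit in them, such as a lone "'".
words <- words[grepl("[[:alnum:]]", words)]
```

For this input the filter leaves exactly the fourteen words listed in the question, with "I'd" intact.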