2017-08-07 79 views
irb(main):161:0> "Ready for your my next session?".scan(/[A-Za-z]+|\d+|. /) 
=> ["Ready", "for", "your", "my", "next", "session"] 
=> ["Ready", "for", "your", "my", "next", "session", "?"] #==> EXPECTED 
irb(main):162:0> "yo mr. menon how are you? call at 9 a.m. \"okay\"".scan(/[A-Za-z]+|\d+|. /) 
=> ["yo", "mr", ". ", "menon", "how", "are", "you", "? ", "call", "at", "9", "a", "m", ". ", "okay"] 
=> ["yo", "mr", ". ", "menon", "how", "are", "you", "? ", "call", "at", "9", "a",".", "m", ".", "``", "okay", "''"] #==> EXPECTED 

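I am trying to tokenize a string with `scan(/[A-Za-z]+|\d+|. /)`, including punctuation marks and even the escaped quotes (`\"`) inside the string.

But it behaves differently on differently structured strings, as the two results above show. How can I correct this?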

"EXPECTED: "``", "okay", "''"" - are you kidding? `Regexp#scan` cannot convert typewriter double quotes into what you expect. – mudasobwa

_Sidenote:_ to match punctuation, the regex engine has a dedicated matcher: [`\p{Punct}`](https://ruby-doc.org/core-2.4.1/Regexp.html#class-Regexp-label-Character+Properties). – mudasobwa

@mudasobwa If I knew, I wouldn't be joking ;) If it cannot convert them, how do I correct the output into proper tokens? – arjun

Answers

r =/
    (?:   # begin a non-capture group 
     \"?  # optionally (?) match a double-quote 
     \p{alpha}+ # match one or more letters 
     \"?  # optionally (?) match a double-quote 
    )   # end non-capture group 
    |   # or 
    \d+   # match one or more digits 
    |   # or 
    [.,?!:;]  # match a punctuation mark 
    /x   # free-spacing regex definition mode 

"yo mr. menon how are you? call at 9 a.m. \"okay\"".scan(r) 
    #=> ["yo", "mr", ".", "menon", "how", "are", "you", "?", "call", "at", "9", 
    # "a", ".", "m", ".", "\"okay\""] 
puts "\"okay\"" 
    # "okay" 

The regular expression would conventionally be written:

/(?:\"?\p{alpha}+\"?)|\d+|[.,?!:;]/
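If the quotes should come out as separate tokens, as in the question's expected output, one possible variant is sketched below. It swaps the hand-written punctuation class for the `\p{Punct}` property mentioned in the comments, then rewrites the quote tokens in a second pass, since `scan` only returns substrings of the input. The names `TOKEN`, `tokenize`, and `convert_quotes` are hypothetical, and the sketch assumes double quotes strictly alternate between opening and closing.

```ruby
# Hypothetical variant: \p{Punct} stands in for [.,?!:;], so every
# punctuation mark (including ") becomes its own one-character token.
TOKEN = /\p{Alpha}+|\d+|\p{Punct}/

# Split text into word, number, and single-punctuation tokens.
def tokenize(text)
  text.scan(TOKEN)
end

# Assumption: double quotes strictly alternate between opening and closing.
# Replaces each " token with `` (opening) or '' (closing).
def convert_quotes(tokens)
  opening = true
  tokens.map do |token|
    next token unless token == '"'
    replacement = opening ? "``" : "''"
    opening = !opening
    replacement
  end
end

p tokenize("Ready for your my next session?")
  #=> ["Ready", "for", "your", "my", "next", "session", "?"]

p convert_quotes(tokenize("yo mr. menon how are you? call at 9 a.m. \"okay\""))
  #=> ["yo", "mr", ".", "menon", "how", "are", "you", "?", "call", "at", "9",
  #    "a", ".", "m", ".", "``", "okay", "''"]
```

Unlike the `. ` alternative in the question, the punctuation branch here does not require a trailing space, so a mark at the very end of the string is still captured.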