How to split on word boundaries using a regular expression?

I'm trying to do this:

import re 
sentence = "How are you?" 
print(re.split(r'\b', sentence))

The result is:

[u'How are you?']

I would like something like `[u'How', u'are', u'you', u'?']`. How can this be achieved?

2016-05-15 oarfish

[Python cannot split on an empty string](https://mail.python.org/pipermail/tutor/2003-August/024753.html). –

Also, it should return `[u'How', u' ', u'are', u' ', u'you', u'?']`. –

@KennyLau Yes, that's right, but it isn't very important: I can return or ignore the whitespace, since filtering it out is trivial. – oarfish

Unfortunately, Python cannot split on a pattern that matches only the empty string.
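That limitation depends on the Python version. Before Python 3.7, `re.split()` never split on an empty match (Python 2, which the `u'...'` output in the question suggests, simply returns the string unsplit). Since Python 3.7, `re.split()` accepts patterns that can match the empty string, such as `r'\b'`. A minimal sketch, assuming Python 3.7 or later; `split_on_word_boundaries` is a hypothetical helper name:

```python
import re


def split_on_word_boundaries(sentence):
    """Split at every word boundary and drop whitespace-only pieces.

    Assumes Python 3.7+, where re.split() can split on zero-width matches.
    """
    # r'\b' matches zero-width positions between a word and a non-word
    # character (and at a string edge next to a word character), so the
    # raw split also yields a leading '' and the spaces between words.
    parts = re.split(r'\b', sentence)
    # Strip surrounding whitespace from each piece; keep non-empty results.
    return [part.strip() for part in parts if part.strip()]


print(re.split(r'\b', "How are you?"))
# ['', 'How', ' ', 'are', ' ', 'you', '?']
print(split_on_word_boundaries("How are you?"))
# ['How', 'are', 'you', '?']
```

Stripping each piece also separates punctuation from a following space: `split_on_word_boundaries("Hi, there!")` gives `['Hi', ',', 'there', '!']`.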

To work around this, use `findall` instead of `split`. After all, `\b` is just a word boundary.

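It is equivalent to `(?<=\w)(?!\w)|(?<!\w)(?=\w)`: a position with a word character on one side and a non-word character, or the start/end of the string, on the other.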
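One way to check this equivalence is to list the positions where each pattern matches. The sketch below uses only the standard `re` module; `boundary_positions` and the pattern labels are hypothetical names. It compares `\b`, the negative-lookaround form, and the positive-lookaround variant `(?<=\w)(?=\W)|(?<=\W)(?=\w)`. The positive variant misses boundaries at the very start and end of the string, because `\W` needs a real character that does not exist there.

```python
import re

SENTENCE = "How are you?"

# Candidate patterns for "word boundary" (labels are illustrative only).
PATTERNS = {
    "builtin": r"\b",
    # word char behind and none ahead, or vice versa; negative
    # lookarounds also succeed at the edges of the string
    "negative_lookarounds": r"(?<=\w)(?!\w)|(?<!\w)(?=\w)",
    # positive \W lookarounds need an actual non-word character,
    # so they never match at position 0 or at len(text)
    "positive_lookarounds": r"(?<=\w)(?=\W)|(?<=\W)(?=\w)",
}


def boundary_positions(pattern, text):
    """Return the start index of every (zero-width) match of pattern."""
    return [m.start() for m in re.finditer(pattern, text)]


for name, pattern in PATTERNS.items():
    print(name, boundary_positions(pattern, SENTENCE))
# builtin [0, 3, 4, 7, 8, 11]
# negative_lookarounds [0, 3, 4, 7, 8, 11]
# positive_lookarounds [3, 4, 7, 8, 11]
```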

That means the following code works:

import re 
sentence = "How are you?" 
print(re.findall(r'\w+|\W+', sentence))
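For reference, here is what that `findall` call returns, wrapped in a small helper (`tokenize_runs` is a hypothetical name). Because `\w+` and `\W+` together cover every character, the result alternates word runs and non-word runs. So the spaces come back as tokens, and punctuation next to a space stays in one token.

```python
import re


def tokenize_runs(text):
    """Split text into alternating runs of word and non-word characters."""
    # \w+ grabs a run of word characters; \W+ grabs everything in between,
    # including spaces, so joining the tokens rebuilds the input exactly.
    return re.findall(r'\w+|\W+', text)


tokens = tokenize_runs("How are you?")
print(tokens)  # ['How', ' ', 'are', ' ', 'you', '?']
# Dropping whitespace-only tokens gives the list asked for in the question.
print([t for t in tokens if not t.isspace()])  # ['How', 'are', 'you', '?']
print(tokenize_runs("Hi, there!"))  # ['Hi', ', ', 'there', '!']
```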

2016-05-15 11:39:55

Well, the OP doesn't need the whitespace tokens. –

Splitting on `\b` would produce the whitespace too, since `\b` is zero-width. –

I mean, `\w+|[^\w\s]+` might be more appropriate. –
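A sketch of the pattern suggested in that comment (`tokenize_words_and_punct` is a hypothetical name). `[^\w\s]+` matches runs of characters that are neither word characters nor whitespace, i.e. punctuation and symbols. Whitespace matches neither alternative, so it is never returned, and punctuation comes back separated from neighbouring spaces. Adjacent punctuation marks still merge into one token.

```python
import re


def tokenize_words_and_punct(text):
    """Return word runs and punctuation runs, skipping whitespace."""
    # \w+       -> a run of word characters
    # [^\w\s]+  -> a run of characters that are neither word characters
    #              nor whitespace (punctuation, symbols); whitespace matches
    #              neither alternative, so findall skips over it
    return re.findall(r'\w+|[^\w\s]+', text)


print(tokenize_words_and_punct("How are you?"))   # ['How', 'are', 'you', '?']
print(tokenize_words_and_punct("Hi, there!"))     # ['Hi', ',', 'there', '!']
print(tokenize_words_and_punct("Wait...what?!"))  # ['Wait', '...', 'what', '?!']
```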

import re 
split = re.findall(r"[\w']+|[.,!?;]", "How are you?") 
print(split)

Output:

['How', 'are', 'you', '?']

Explanation of the regex:

"[\w']+|[.,!?;]" 

    1st Alternative: [\w']+ 
     [\w']+ match a single character present in the list below 
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy] 
      \w match any word character [a-zA-Z0-9_] 
      ' the literal character ' 
    2nd Alternative: [.,!?;] 
     [.,!?;] match a single character present in the list below 
      .,!?; a single character in the list .,!?; literally
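The `'` in `[\w']` keeps contractions such as "Don't" in a single token. A sketch comparing the pattern with and without it (the sample sentence is made up for illustration): without the apostrophe, `\w+` stops at it, and the `'` itself is discarded because it belongs to neither alternative. The same happens to any character outside both classes, such as `:`.

```python
import re

WITH_APOSTROPHE = r"[\w']+|[.,!?;]"
WITHOUT_APOSTROPHE = r"\w+|[.,!?;]"

text = "Don't panic: it's fine!"

# The apostrophe is part of the word class, so contractions stay whole.
# ':' is in neither class, so it is silently dropped.
print(re.findall(WITH_APOSTROPHE, text))
# ["Don't", 'panic', "it's", 'fine', '!']

# Without it, words are cut at the apostrophe and the ' is dropped too.
print(re.findall(WITHOUT_APOSTROPHE, text))
# ['Don', 't', 'panic', 'it', 's', 'fine', '!']
```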

2016-05-15 13:49:17

http://stackoverflow.com/a/367292/6211883 –

Even the order is the same; quite suspicious. –

Why did you include the `'` character? – oarfish
