2016-05-15 59 views
3

I am trying to do this: how can I split a string at word boundaries using a regular expression?

import re 
sentence = "How are you?" 
print(re.split(r'\b', sentence)) 

The result is

[u'How are you?'] 

I want something like [u'How', u'are', u'you', u'?']. How can this be achieved?

+1

[Python cannot split on an empty string](https://mail.python.org/pipermail/tutor/2003-August/024753.html). –

+1

Also, it should return `[u'How', u' ', u'are', u' ', u'you', u'?']`. –

+0

@KennyLau Yes, correct, but that is not so important. I can return or ignore the whitespace, since filtering it out is trivial. – oarfish

Answers

7

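Unfortunately, Python cannot split on an empty string (before Python 3.7, re.split did not split on empty matches).

To work around this, you need to use findall instead of split. In fact, \b is simply a word boundary.

Apart from matches at the very start or end of the string, it is equivalent to (?<=\w)(?=\W)|(?<=\W)(?=\w)

This means the following code will work: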

import re 
sentence = "How are you?" 
print(re.findall(r'\w+|\W+', sentence)) 
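A minimal sketch of what that call returns, and of the whitespace filtering the asker calls trivial. The variable names `tokens` and `words` are illustrative and not part of the original answer.

```python
import re

sentence = "How are you?"

# \w+ grabs runs of word characters and \W+ grabs runs of everything else,
# so the two alternatives together cover the whole string without gaps.
tokens = re.findall(r'\w+|\W+', sentence)
print(tokens)  # ['How', ' ', 'are', ' ', 'you', '?']

# Drop the pieces that consist only of whitespace.
words = [t for t in tokens if not t.isspace()]
print(words)  # ['How', 'are', 'you', '?']
```

The filter keeps punctuation such as `?` because `str.isspace()` is true only for strings made up entirely of whitespace.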
+1

Well, the OP does not want the whitespace tokens. –

+0

Splitting on `\b` would also produce the whitespace, because `\b` is zero-length. –

+2

I mean `\w+|[^\w\s]+` might be more appropriate. –
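As a rough illustration of the pattern suggested in this comment, the sketch below runs it on the question's sentence and on a second, made-up sentence. The names `pattern`, `result` and `other` are assumptions for the example.

```python
import re

# \w+ matches a word; [^\w\s]+ matches a run of characters that are
# neither word characters nor whitespace, i.e. punctuation and symbols.
pattern = r'\w+|[^\w\s]+'

result = re.findall(pattern, "How are you?")
print(result)  # ['How', 'are', 'you', '?']

# Whitespace matches neither alternative, so it is skipped entirely.
other = re.findall(pattern, "Wait... what?!")
print(other)  # ['Wait', '...', 'what', '?!']
```

Because whitespace is never matched, no separate filtering step is needed afterwards.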

1
import re 
split = re.findall(r"[\w']+|[.,!?;]", "How are you?") 
print(split) 

Output:

['How', 'are', 'you', '?'] 


Explanation of the regex:

"[\w']+|[.,!?;]" 

    1st Alternative: [\w']+ 
     [\w']+ match a single character present in the list below 
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy] 
      \w match any word character [a-zA-Z0-9_] 
      ' the literal character ' 
    2nd Alternative: [.,!?;] 
     [.,!?;] match a single character present in the list below 
      .,!?; a single character in the list .,!?; literally 
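One plausible reason for including `'` in the first character class is to keep contractions together. The sketch below compares the pattern with and without the apostrophe on a made-up sentence. The names `with_apostrophe` and `without_apostrophe` are illustrative.

```python
import re

text = "Don't stop, please!"

# With ' inside the class, a contraction stays in one piece.
with_apostrophe = re.findall(r"[\w']+|[.,!?;]", text)
print(with_apostrophe)  # ["Don't", 'stop', ',', 'please', '!']

# Without it, the apostrophe matches neither alternative and is dropped,
# which splits the contraction into two tokens.
without_apostrophe = re.findall(r"\w+|[.,!?;]", text)
print(without_apostrophe)  # ['Don', 't', 'stop', ',', 'please', '!']
```

Note that this pattern only recognizes the punctuation listed in `[.,!?;]`. Any other symbol is silently skipped.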
+0

http://stackoverflow.com/a/367292/6211883 –

+0

Even the order is the same, which is rather suspicious. –

+0

Why did you include the `'` character? – oarfish
