Converting a string to a list of words?

I'm trying to convert a string to a list of words using Python. I want to take something like the following:

string = 'This is a string, with words!'

And convert it to something like this:

list = ['This', 'is', 'a', 'string', 'with', 'words']

Notice the omission of punctuation and spaces. What would be the fastest way of going about this?

2011-05-31 rectangletangle

Try this:

import re 

mystr = 'This is a string, with words!' 
wordList = re.sub(r"[^\w]", " ", mystr).split()

How it works:

From the docs:

re.sub(pattern, repl, string, count=0, flags=0)

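Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function.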

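So in our case:

The pattern is any non-alphanumeric character.

[\w] means any alphanumeric character. For ASCII text it is equal to the character set [a-zA-Z0-9_]

a to z, A to Z, 0 to 9 and underscore. (In Python 3, \w in a str pattern also matches non-ASCII letters and digits unless the re.ASCII flag is set.)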

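So we match any non-alphanumeric character and replace it with a space,

and then we split() the result, which splits the string on whitespace and converts it to a list.

So 'hello-world'

becomes 'hello world'

with re.sub

and then ['hello', 'world']

after split().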
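The two steps described above can be traced separately. This is a minimal sketch: the sample text and the names spaced and words are illustrative and not part of the original answer.

```python
import re

text = "hello-world, again!"

# Step 1: replace every non-word character with a space.
spaced = re.sub(r"[^\w]", " ", text)
print(repr(spaced))  # 'hello world  again '

# Step 2: split() with no arguments splits on runs of whitespace
# and drops empty strings, so the doubled space does no harm.
words = spaced.split()
print(words)  # ['hello', 'world', 'again']
```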

Let me know if any doubts come up.

2011-05-31 00:13:53 Bryan

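Remember to also handle apostrophes and hyphens, because they aren't included in \w. – Shule 2014-07-30 05:29:26

You may want to handle formatted (typographic) apostrophes and non-breaking hyphens too. – Shule 2014-07-30 05:57:42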

Well, you could use

import re 
list = re.sub(r'[.!,;?]', ' ', string).split()

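Note that list is the name of a built-in type and string is the name of a standard library module, so you probably don't want to use those as your variable names.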

2011-05-31 00:10:30 Cameron

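A regular expression for words would give you the most control. You would want to think carefully about how to deal with words containing dashes or apostrophes, like "I'm".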
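As a sketch of what such a pattern might look like, the following treats a word as a run of word characters optionally joined by single apostrophes or hyphens. That rule is an assumption made for illustration, and WORD_RE and words_with_joiners are hypothetical names; real text may need a different definition of a word.

```python
import re

# Assumed rule (not from the original answers): a word is a run of word
# characters, optionally joined by single apostrophes or hyphens.
# \u2019 is the typographic apostrophe, \u2011 the non-breaking hyphen.
WORD_RE = re.compile(r"\w+(?:['\u2019\u2011-]\w+)*")

def words_with_joiners(text):
    """Return words, keeping internal apostrophes and hyphens."""
    return WORD_RE.findall(text)

print(words_with_joiners("I'm a custom-built sentence, isn't it?"))
# ["I'm", 'a', 'custom-built', 'sentence', "isn't", 'it']
```

Leading or trailing joiners (as in a quoted 'word') are not part of the match, so surrounding quotes are still dropped.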

2011-05-31 00:14:40 tofutim

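Doing this properly is quite complex. For your research, it is known as word tokenization. You should look at NLTK if you want to see what others have done, rather than starting from scratch: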

>>> import nltk 
>>> paragraph = u"Hi, this is my first sentence. And this is my second." 
>>> sentences = nltk.sent_tokenize(paragraph) 
>>> for sentence in sentences: 
...  nltk.word_tokenize(sentence) 
[u'Hi', u',', u'this', u'is', u'my', u'first', u'sentence', u'.'] 
[u'And', u'this', u'is', u'my', u'second', u'.']

2011-05-31 00:15:21

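Using string.punctuation for completeness:

import re
import string

s = 'This is a string, with words!'
# re.escape keeps characters such as ], ^ and \ literal inside the character class
x = re.sub('[' + re.escape(string.punctuation) + ']', '', s).split()

This handles newlines as well.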

2011-05-31 00:24:02 mtrw

This should be the accepted answer. – Epoc 2017-02-08 11:41:10

The simplest way:

>>> import re 
>>> string = 'This is a string, with words!' 
>>> re.findall(r'\w+', string) 
['This', 'is', 'a', 'string', 'with', 'words']

2011-05-31 02:19:14 JBernardo

I think this is the simplest way for anyone else stumbling on this post, given the late response:

>>> string = 'This is a string, with words!' 
>>> string.split() 
['This', 'is', 'a', 'string,', 'with', 'words!']

2012-12-06 00:22:28 gilgamar

You need to separate and eliminate the punctuation from the words (e.g., "string," and "words!"). As is, this doesn't meet the OP's requirements. – Levon 2012-12-06 00:31:45

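You can try doing this:

# Python 3: str.maketrans needs both arguments to have the same length
tryTrans = str.maketrans(",!", "  ")  # map each listed character to a space
text = "This is a string, with words!"
text = text.translate(tryTrans)
listOfWords = text.split()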
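The same translate-then-split idea can be extended to all ASCII punctuation. This is a sketch for Python 3 using str.maketrans; PUNCT_TO_SPACE and split_words are illustrative names, not part of the answer.

```python
import string

# Map every ASCII punctuation character to a space. Note that this also
# splits "I'm" into "I" and "m", since the apostrophe is punctuation.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def split_words(text):
    """Replace punctuation with spaces, then split on whitespace."""
    return text.translate(PUNCT_TO_SPACE).split()

print(split_words("This is a string, with words!"))
# ['This', 'is', 'a', 'string', 'with', 'words']
```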

2013-08-12 13:49:25 user2675185

This is from my attempt at a coding challenge where regular expressions could not be used:

outputList = "".join((c if c.isalnum() or c == "'" else ' ') for c in inputStr).split()

The role of the apostrophe seems interesting.

2015-05-28 06:30:26 guest201505281433

words = mystr.split(" ", mystr.count(" "))  # same result as mystr.split(" "); punctuation stays attached

2015-08-11 15:14:35 sanchit

Inspired by @mtrw's answer, but improved to strip punctuation at word boundaries only:

import re 
import string 

def extract_words(s): 
    return [re.sub('^[{0}]+|[{0}]+$'.format(re.escape(string.punctuation)), '', w) for w in s.split()]

>>> str = 'This is a string, with words!' 
>>> extract_words(str) 
['This', 'is', 'a', 'string', 'with', 'words'] 

>>> str = '''I'm a custom-built sentence with "tricky" words like https://stackoverflow.com/.''' 
>>> extract_words(str) 
["I'm", 'a', 'custom-built', 'sentence', 'with', 'tricky', 'words', 'like', 'https://stackoverflow.com']
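For comparison, the same boundary-only stripping can be sketched without a regular expression by using str.strip. The function name extract_words_strip is illustrative. Unlike extract_words, this variant also drops tokens that consist only of punctuation.

```python
import string

def extract_words_strip(s):
    # str.strip(chars) removes any of the given characters from both ends
    # only, so internal apostrophes, hyphens and slashes are left alone.
    stripped = (w.strip(string.punctuation) for w in s.split())
    # Drop tokens that consisted solely of punctuation, e.g. a lone "-".
    return [w for w in stripped if w]

print(extract_words_strip('I\'m a "tricky" custom-built sentence - really!'))
# ["I'm", 'a', 'tricky', 'custom-built', 'sentence', 'really']
```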

2017-06-08 09:55:37

This way you eliminate every special character outside of the alphabet:

def wordsToList(strn):
    L = strn.split()
    cleanL = []
    abc = 'abcdefghijklmnopqrstuvwxyz'
    ABC = abc.upper()
    letters = abc + ABC
    for e in L:
        word = ''
        for c in e:
            if c in letters:
                word += c
        if word != '':
            cleanL.append(word)
    return cleanL

s = 'She loves you, yea yea yea! ' 
L = wordsToList(s) 
print(L) # ['She', 'loves', 'you', 'yea', 'yea', 'yea']

I'm not sure whether this is fast, optimal, or even the right way to program it.
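One way to check the speed question is to time the candidates with timeit. The sketch below compares a compact letters-only loop with re.findall. The function names and the test input are assumptions made for illustration, and the two approaches only agree on input like this one: on "custom-built" the loop yields one word and findall yields two. Timings depend on the machine and input, so none are quoted here.

```python
import re
import string
import timeit

text = "She loves you, yea yea yea! " * 1000

def with_findall(s):
    # Runs of ASCII letters, found in a single regex pass.
    return re.findall(r"[A-Za-z]+", s)

def with_loop(s):
    # Keep ASCII letters only, token by token.
    words = []
    for token in s.split():
        word = "".join(c for c in token if c in string.ascii_letters)
        if word:
            words.append(word)
    return words

# Both approaches must agree on this input before timing means anything.
assert with_findall(text) == with_loop(text)

for name, func in [("findall", with_findall), ("loop", with_loop)]:
    seconds = timeit.timeit(lambda: func(text), number=100)
    print(f"{name}: {seconds:.3f} s for 100 runs")
```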

2017-07-30 15:22:07
