from nltk.tokenize import word_tokenize
music_comments = [['So cant you just run the bot outside of the US? ', ''], ["Just because it's illegal doesn't mean it will stop. I hope it actually gets enforced. ", ''], ['Can they do something about all the fucking bots on Tinder next? \n\nEdit: Holy crap my inbox just blew up ', '']]
print(word_tokenize(music_comments[1]))
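I found this other question, which says to pass a list of strings to word_tokenize, but word_tokenize in NLTK does not seem to take a list of strings as an argument. When I run the code above, I get the following output: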
Traceback (most recent call last):
  File "testing.py", line 5, in <module>
    print(word_tokenize(music_comments[1]))
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\__init__.py", line 109, in word_tokenize
    return [token for sent in sent_tokenize(text, language)
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\__init__.py", line 94, in sent_tokenize
    return tokenizer.tokenize(text)
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\punkt.py", line 1237, in tokenize
    return list(self.sentences_from_text(text, realign_boundaries))
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\punkt.py", line 1285, in sentences_from_text
    return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\punkt.py", line 1276, in span_tokenize
    return [(sl.start, sl.stop) for sl in slices]
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\punkt.py", line 1276, in <listcomp>
    return [(sl.start, sl.stop) for sl in slices]
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\punkt.py", line 1316, in _realign_boundaries
    for sl1, sl2 in _pair_iter(slices):
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\punkt.py", line 310, in _pair_iter
    prev = next(it)
  File "C:\Users\Shraddheya Shendre\Anaconda3\lib\site-packages\nltk\tokenize\punkt.py", line 1289, in _slices_from_text
    for match in self._lang_vars.period_context_re().finditer(text):
TypeError: expected string or bytes-like object
What is the problem? What am I missing?
You pass ONE string to `word_tokenize()`, not a list. That is what the code in the linked question does. (And of course that is the answer to your question.) – alexis
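
To make the comment concrete: `music_comments[1]` is itself a list (`[text, '']`), so it has to be indexed once more, or looped over, before each string reaches `word_tokenize()`. Below is a minimal sketch of one way to do that. The helper names `tokenize_comment` and `tokenize_all` are made up for illustration and are not part of NLTK, and skipping the empty strings is an assumption about what is wanted here. The tokenizer is passed in as a parameter so the helpers do not depend on NLTK being installed.

```python
def tokenize_comment(comment, tokenize):
    """Tokenize each non-empty string in one comment (a list of strings)."""
    # `tokenize` is any callable that takes ONE string, e.g. word_tokenize.
    return [tokenize(text) for text in comment if text.strip()]


def tokenize_all(comments, tokenize):
    """Apply tokenize_comment to every comment in a list of comments."""
    return [tokenize_comment(comment, tokenize) for comment in comments]


# Shortened copy of the data from the question.
music_comments = [
    ['So cant you just run the bot outside of the US? ', ''],
    ["Just because it's illegal doesn't mean it will stop. "
     "I hope it actually gets enforced. ", ''],
]

try:
    from nltk.tokenize import word_tokenize

    # music_comments[1] is a list, so index once more to get a single string.
    print(word_tokenize(music_comments[1][0]))
    # Or tokenize every string in the nested structure.
    print(tokenize_all(music_comments, word_tokenize))
except (ImportError, LookupError) as exc:
    # Raised when NLTK or its tokenizer data is not installed.
    print("NLTK tokenizer unavailable:", exc)
```

If only the comment text matters and the second element is always empty, `word_tokenize(comment[0])` inside a plain loop would probably be enough.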