2016-11-30

NLTK TweetTokenizer does not work (Python)

I have NLTK installed and ran nltk.download(). However, not all of the packages were installed (it got stuck on panlex_lite).

The thing is, when I try to import TweetTokenizer I get this error:

  File "create_docs.py", line 7, in <module>
    from nltk.tokenize import TweetTokenizer
ImportError: cannot import name TweetTokenizer

How can I deal with this? Cheers!

Have you tried nltk.download('panlex_lite')? – sb0709

@sb0709, yes, it prints '[nltk_data] Downloading package panlex_lite to [nltk_data] /home/vladimir/nltk_data...' but it never finishes.

Answer

This happens because the packages were not installed correctly, so you need to skip the "panlex_lite" package, and then it should work.

There is currently an open issue for this. The suggested solution is as follows:

I guess, we could add something like if id != 'panlex_lite' to the code... 
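To make the "if id != 'panlex_lite'" idea concrete, here is a minimal sketch of a loop that skips one package. It is not the NLTK downloader's actual code: the function name download_all_except and its download parameter are hypothetical, and the downloader is passed in as a callable so the sketch runs without a network connection.

```python
from typing import Callable, Iterable, List


def download_all_except(
    package_ids: Iterable[str],
    download: Callable[[str], object],
    skip: str = "panlex_lite",
) -> List[str]:
    """Call `download` for every package id except `skip`.

    Hypothetical helper for illustration only; it is not part of NLTK.
    Returns the ids that were actually requested.
    """
    fetched = []
    for package_id in package_ids:
        if package_id != skip:  # the check suggested in the issue
            download(package_id)
            fetched.append(package_id)
    return fetched


if __name__ == "__main__":
    # Stand-in downloader that does nothing, so the example runs offline.
    print(download_all_except(["punkt", "panlex_lite", "stopwords"],
                              download=lambda package_id: None))
```

With NLTK installed, nltk.download could probably be passed as the download argument, since it accepts a single package identifier.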

But, as for me, the easiest way looks like this: 

get https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml 
remove panlex from it 
upload it to a public Gist 
pass the gist's url to the downloader: python -m nltk.downloader -d /usr/local/share/nltk_data -u https://gist.githubusercontent.com/demidovakatya/61dab385d74065ae825c80496a197980/raw/c6ff7fbf44265c7f8c9e961e3e1158cd812d6af1/index.xml all 
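The "remove panlex from it" step can be done by hand in a text editor, but a rough sketch with the standard library might look like the following. It assumes the index lists packages as elements with an id attribute and collection members as elements with a ref attribute. The sample XML below is made up for illustration and is not the real index.xml, and the function name drop_package is hypothetical.

```python
import xml.etree.ElementTree as ET


def drop_package(index_xml: str, package_id: str = "panlex_lite") -> str:
    """Return the index XML without any element whose id or ref is `package_id`."""
    root = ET.fromstring(index_xml)
    # Snapshot the elements first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if package_id in (child.get("id"), child.get("ref")):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Made-up sample in the assumed layout (not the real NLTK index).
SAMPLE_INDEX = """<nltk_data>
  <packages>
    <package id="punkt" url="https://example.invalid/punkt.zip"/>
    <package id="panlex_lite" url="https://example.invalid/panlex_lite.zip"/>
  </packages>
  <collections>
    <collection id="all" name="All packages">
      <item ref="punkt"/>
      <item ref="panlex_lite"/>
    </collection>
  </collections>
</nltk_data>"""

if __name__ == "__main__":
    print(drop_package(SAMPLE_INDEX))
```

The filtered text would then be saved as index.xml and hosted somewhere public, as in the remaining steps above.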

Here is the link to the issue: look at the last 2 comments