What is NLTK POS tagger asking me to download?

I have just started using a part-of-speech tagger and I am running into a lot of problems. What is NLTK POS tagger asking me to download?

I started POS tagging with the following:

import nltk 
text=nltk.word_tokenize("We are going out.Just you and me.")

When I try to print the tags for 'text', the following happens:

print nltk.pos_tag(text) 
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
  File "F:\Python26\lib\site-packages\nltk\tag\__init__.py", line 63, in pos_tag 
    tagger = nltk.data.load(_POS_TAGGER) 
  File "F:\Python26\lib\site-packages\nltk\data.py", line 594, in load 
    resource_val = pickle.load(_open(resource_url)) 
  File "F:\Python26\lib\site-packages\nltk\data.py", line 673, in _open 
    return find(path).open() 
  File "F:\Python26\lib\site-packages\nltk\data.py", line 455, in find 
    raise LookupError(resource_not_found) 
LookupError: 
Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not 
found. Please use the NLTK Downloader to obtain the resource: 

>>> nltk.download(). 

Searched in: 
    - 'C:\\Documents and Settings\\Administrator/nltk_data' 
    - 'C:\\nltk_data' 
    - 'D:\\nltk_data' 
    - 'E:\\nltk_data' 
    - 'F:\\Python26\\nltk_data' 
    - 'F:\\Python26\\lib\\nltk_data' 
    - 'C:\\Documents and Settings\\Administrator\\Application Data\\nltk_data'

I used nltk.download(), but it did not work.


2011-12-21 Pearl

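Why are you bolding all of the text? That really isn't necessary. Also, please post a minimal but complete example that demonstrates your error. – 2011-12-21 13:22:34

There, I cleaned it up for you. Please use this as an example of how to format future questions. – 2011-12-21 13:26:40

Thanks... the problem is solved now... – Pearl 2011-12-21 19:06:17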

When you type nltk.download() in Python, the NLTK Downloader interface appears automatically.
Click Models and select maxent_treebank_pos_. It will install automatically.

import nltk 
text = nltk.word_tokenize("We are going out.Just you and me.") 
print(nltk.pos_tag(text))  # parenthesized form runs on Python 2 and 3
# Output:
[('We', 'PRP'), ('are', 'VBP'), ('going', 'VBG'), ('out.Just', 'JJ'), 
('you', 'PRP'), ('and', 'CC'), ('me', 'PRP'), ('.', '.')]
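
Note that `out.Just` comes back as a single token in the output above, because the input has no space after the first period. A minimal sketch of one possible workaround, using only the standard library, is to insert the missing space before tokenizing. The helper name `add_space_after_period` is hypothetical, and the rule it applies (lowercase letter, period, uppercase letter) is an assumption that will not suit every text.

```python
import re


def add_space_after_period(text):
    """Insert a space after a period squeezed between a lowercase
    letter and an uppercase letter, e.g. "out.Just" -> "out. Just".

    Hypothetical helper for illustration; it deliberately leaves
    patterns such as "U.S.A" alone because the letter before each
    period is uppercase.
    """
    return re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", text)


cleaned = add_space_after_period("We are going out.Just you and me.")
print(cleaned)
```

The cleaned string can then be passed to nltk.word_tokenize in place of the original one.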


2011-12-22 04:43:48 Pearl

+16

Also, if you specify the tagger name, as in 'nltk.download('maxent_treebank_pos_tagger');', you can download it directly in code. See this post: http://stackoverflow.com/a/5208563/62921 – ForceMagic 2013-03-27 21:23:54
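
Building on the comment above, here is a minimal sketch of downloading the resource from code only when it is missing. It relies on `nltk.data.find` raising `LookupError` for a missing resource (as the traceback in the question shows) and on `nltk.download` accepting a package name. The helper name `ensure_resource` and its `find`/`download` parameters are hypothetical, added so the logic can be exercised with stand-in functions.

```python
def ensure_resource(resource_path, package_name, find=None, download=None):
    """Download package_name only if resource_path cannot be located.

    Returns True if a download was triggered, False otherwise.
    find and download default to nltk.data.find and nltk.download;
    they are parameters only so that stubs can be passed in.
    """
    if find is None or download is None:
        import nltk  # imported lazily; only needed for the defaults
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)
    except LookupError:
        # Same error type as in the question's traceback
        download(package_name)
        return True
    return False
```

With NLTK installed, a call such as `ensure_resource('taggers/maxent_treebank_pos_tagger/english.pickle', 'maxent_treebank_pos_tagger')` before `nltk.pos_tag` should cover the case from the question.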

import nltk 
text = "Obama delivers his first speech." 

sent = nltk.sent_tokenize(text)  # split the text into sentences

for s in sent: 
    d = nltk.word_tokenize(s)  # split each sentence into word tokens
    print(nltk.pos_tag(d))

Result:

akshayy@ubuntu:~/SUMM$ python nn1.py
[('Obama', 'NNP'), ('delivers', 'NNS'), ('his', 'PRP$'), ('first', 'JJ'), ('speech', 'NN'), ('.', '.')]

(I just asked another question in which I used this code.)


2013-04-04 19:18:27 akshayb

It is worth noting that this parse is not correct - the POS tagger has tagged "delivers" as a plural noun... – simon 2015-07-16 22:15:21
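
As a rough illustration of the point in this comment, the sketch below checks whether a tagged sentence contains any verb tag at all. It assumes Penn Treebank tags, where the verb tags (VB, VBD, VBG, VBN, VBP, VBZ) all start with "VB". The function name `has_verb` is hypothetical, and the check is only a sanity heuristic, not a measure of tagger accuracy.

```python
def has_verb(tagged):
    """Return True if any (word, tag) pair carries a Penn Treebank verb tag."""
    return any(tag.startswith("VB") for _, tag in tagged)


# Tagged output copied from the answer above
obama = [('Obama', 'NNP'), ('delivers', 'NNS'), ('his', 'PRP$'),
         ('first', 'JJ'), ('speech', 'NN'), ('.', '.')]

print(has_verb(obama))  # False: "delivers" was tagged NNS, so no verb was found
```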

nltk.download()

Click Models and select maxent_treebank_pos_. It will install automatically.

import nltk 
text = nltk.word_tokenize("We are going out.Just you and me.") 
print(nltk.pos_tag(text))  # parenthesized form runs on Python 2 and 3
# Output:
[('We', 'PRP'), ('are', 'VBP'), ('going', 'VBG'), ('out.Just', 'JJ'), 
('you', 'PRP'), ('and', 'CC'), ('me', 'PRP'), ('.', '.')]


2013-08-13 13:40:55

From the shell/terminal, you can use:

python -m nltk.downloader maxent_treebank_pos_tagger

(You may need sudo on Linux.)

It will install maxent_treebank_pos_tagger (i.e. the standard Treebank POS tagger in NLTK) and fix your problem.
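
If the download needs to be triggered from a setup script rather than typed by hand, the same command line can be assembled in Python. This is a minimal sketch: `downloader_command` and `run_downloader` are hypothetical helper names, and using `sys.executable` assumes NLTK is installed for the interpreter running the script.

```python
import subprocess
import sys


def downloader_command(package, python=sys.executable):
    """Build the argument list for `python -m nltk.downloader <package>`."""
    return [python, "-m", "nltk.downloader", package]


def run_downloader(package):
    """Run the downloader as a subprocess; raises CalledProcessError on failure."""
    return subprocess.check_call(downloader_command(package))


print(downloader_command("maxent_treebank_pos_tagger", python="python"))
```

Calling `run_downloader('maxent_treebank_pos_tagger')` would then perform the same download as the shell command above.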


2015-09-16 05:47:41

For NLTK versions higher than v3.2, please use:

>>> import nltk 
>>> nltk.__version__ 
'3.2.1' 
>>> nltk.download('averaged_perceptron_tagger') 
[nltk_data] Downloading package averaged_perceptron_tagger to 
[nltk_data]  /home/alvas/nltk_data... 
[nltk_data] Package averaged_perceptron_tagger is already up-to-date! 
True

For NLTK versions that use the old MaxEnt model, i.e. v3.1 and below, please use:

>>> import nltk 
>>> nltk.download('maxent_treebank_pos_tagger') 
[nltk_data] Downloading package maxent_treebank_pos_tagger to 
[nltk_data]  /home/alvas/nltk_data... 
[nltk_data] Package maxent_treebank_pos_tagger is already up-to-date! 
True

For more details on the change to the default pos_tag, see https://github.com/nltk/nltk/pull/1143
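
The version split described in this answer can be sketched as a small helper that picks the package name from a version string. The function name `tagger_package_for` is hypothetical. Treating v3.2 itself like the newer versions is an assumption, since the answer only covers "higher than v3.2" and "v3.1 and below". Other NLTK releases may use different resource names, so the name printed in the LookupError message remains the more reliable guide.

```python
import re


def tagger_package_for(version):
    """Pick the tagger package to download for a given nltk.__version__ string.

    Assumption: versions 3.2 and above use the averaged perceptron tagger;
    3.1 and below use the older MaxEnt Treebank tagger.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor >= (3, 2):
        return "averaged_perceptron_tagger"
    return "maxent_treebank_pos_tagger"


print(tagger_package_for("3.2.1"))
print(tagger_package_for("3.1"))
```

The returned name can be passed to nltk.download, as in the sessions shown above.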


2016-06-06 07:01:47 alvas
