Is it possible to speed up the Wordnet Lemmatizer?

I am using the Wordnet Lemmatizer via NLTK on the Brown Corpus (to determine whether the nouns in it are used more in their singular form or their plural form).
i.e.
from nltk.stem.wordnet import WordNetLemmatizer
l = WordNetLemmatizer()

I've noticed that even the simplest queries, such as the one below, take quite a long time (at least a second or two).
l.lemmatize("cats")

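Presumably this is because a web connection must be made to Wordnet for each query?
I'm wondering whether there is a way to keep using the Wordnet Lemmatizer but have it run much faster. For instance, would it help to download Wordnet onto my machine? Or are there any other suggestions?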

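I'm trying to find out whether the Wordnet Lemmatizer can be made faster rather than switching to a different one, because I've found it works best compared with alternatives such as Porter and Lancaster.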


2013-04-24 ess

I use the lemmatizer like this:

# To download the corpora: python -m nltk.downloader all
from nltk.stem.wordnet import WordNetLemmatizer
lmtzr = WordNetLemmatizer()  # create a lemmatizer object
lemma = lmtzr.lemmatize('cats')

It is not slow at all on my machine. There is no need to connect to the web to do this.


2013-06-14 21:18:25 cindyxiaoxiaoli

It doesn't query the internet; NLTK reads WordNet from your local machine. When you run the first query, NLTK loads WordNet from disk into memory:

>>> from time import time 
>>> t=time(); lemmatize('dogs'); print time()-t, 'seconds' 
u'dog' 
3.38199806213 seconds 
>>> t=time(); lemmatize('cats'); print time()-t, 'seconds' 
u'cat' 
0.000236034393311 seconds
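
For anyone repeating this measurement on Python 3, here is a minimal sketch of the same timing. To keep it runnable without NLTK installed, it times a toy stand-in (`LazyToyLemmatizer`, a hypothetical class that is not part of NLTK) which fakes the one-off load with a short sleep; the `timed` helper is also just an illustrative name.

```python
import time


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


class LazyToyLemmatizer:
    """Hypothetical stand-in that loads its data on first use,
    imitating the lazy WordNet load described above."""

    def __init__(self):
        self._table = None  # nothing is loaded yet

    def lemmatize(self, word):
        if self._table is None:
            time.sleep(0.05)  # simulate the one-off load from disk
            self._table = {"dogs": "dog", "cats": "cat"}
        return self._table.get(word, word)


toy = LazyToyLemmatizer()
first_result, first_time = timed(toy.lemmatize, "dogs")    # pays the load cost
second_result, second_time = timed(toy.lemmatize, "cats")  # data already in memory

print(first_result, first_time, "seconds")
print(second_result, second_time, "seconds")
```

With NLTK and the WordNet data installed, passing `WordNetLemmatizer().lemmatize` to `timed` in place of `toy.lemmatize` should show the same pattern: one slow first call, then fast ones. Making one throwaway call at start-up moves that cost out of the main loop.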

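Lemmatizing is still rather slow if you have to process many thousands of phrases. However, if you are making a lot of redundant queries, you can get some speedup by caching the results of the function: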

from nltk.stem import WordNetLemmatizer 
from functools32 import lru_cache 
wnl = WordNetLemmatizer() 
lemmatize = lru_cache(maxsize=50000)(wnl.lemmatize) 

lemmatize('dogs')
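
On Python 3 the backport is not needed, because `lru_cache` ships in the standard library's `functools`. Below is a rough sketch of the same caching idea that also reports how many lookups were saved. It uses a toy stand-in (`toy_lemmatize`, a hypothetical function that only strips a trailing "s") so that it runs without NLTK; `make_cached` is likewise an illustrative helper name.

```python
from collections import Counter
from functools import lru_cache


def make_cached(lemmatize, maxsize=50000):
    """Wrap any lemmatize-style callable in an LRU cache."""
    return lru_cache(maxsize=maxsize)(lemmatize)


calls = Counter()  # counts how often the underlying function really runs


def toy_lemmatize(word):
    """Hypothetical stand-in for WordNetLemmatizer().lemmatize."""
    calls[word] += 1
    return word[:-1] if word.endswith("s") else word


cached = make_cached(toy_lemmatize)

tokens = ["dogs", "cats", "dogs", "dog", "cats", "dogs"]
lemmas = [cached(token) for token in tokens]

info = cached.cache_info()  # named tuple: hits, misses, maxsize, currsize
print(lemmas)
print(info)
```

In real use you would pass `WordNetLemmatizer().lemmatize` to `make_cached`. The cache key is built from the call arguments, so `cached('dogs')` and `cached('dogs', 'v')` are stored separately. The benefit depends entirely on how often tokens repeat, which `cache_info()` lets you check.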


2014-01-20 20:42:38 bcoughlan

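The key point is that the first query also performs some initialization. After that, it is fast. – justhalf 2014-05-09 09:55:42

lru_cache is great but not available in Python 2.7: consider using repoze.lru (http://docs.repoze.org/lru/) to get similar functionality. – Vorty 2015-05-27 14:53:44

@Vorty The example I gave uses a backport of Python 3's functools, which has lru_cache: https://github.com/MiCHiLU/python-functools32 – bcoughlan 2015-05-28 08:48:00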
