Word sense disambiguation in NLTK Python

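I am new to NLTK Python and I am looking for a sample application that can do word sense disambiguation. My searches turn up plenty of algorithms but no sample application. I just want to pass in a sentence and get the sense of each word by referring to the WordNet library. Thanks.

I found a similar module in Perl: http://marimba.d.umn.edu/allwords/allwords.html. Is there such a module in NLTK Python?

Source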

2010-09-13 thesensemakers

Here is a Python implementation: https://github.com/alvations/pywsd – alvas 2014-02-28 09:56:34


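Yes, this is possible with the wordnet module in NLTK. The similarity measures used by the tool mentioned in your post are also available in the NLTK wordnet module.

Source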

2010-09-18 17:58:59 Jaggu

See http://jaganadhg.freeflux.net/blog/archive/2010/10/16/wordnet-sense-similarity-with-nltk-some-basics.html

Source

2010-10-17 06:41:33 Jaggu

This link is dead. Can you provide a working one? – Hooked 2015-01-09 04:46:55

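NLTK has an API for accessing WordNet. WordNet groups words into synsets (sets of synonyms). This gives you information about a word: its hypernyms, hyponyms, root form and so on.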
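As a rough illustration of the kind of information WordNet exposes through NLTK, the sketch below lists each sense of a word with its gloss, synonyms, hypernyms and hyponyms. It assumes NLTK 3 with the WordNet corpus downloaded (`nltk.download('wordnet')`); the helper names `summarize` and `senses_of` are made up for this example.

```python
def summarize(synset):
    """Collect the basic WordNet facts about one synset into a dict."""
    return {
        "name": synset.name(),                                  # e.g. 'bank.n.01'
        "definition": synset.definition(),                      # the gloss
        "synonyms": synset.lemma_names(),                       # words sharing this sense
        "hypernyms": [h.name() for h in synset.hypernyms()],    # more general senses
        "hyponyms": [h.name() for h in synset.hyponyms()],      # more specific senses
    }


def senses_of(word):
    """Return a summary of every WordNet sense of `word`.

    NLTK is imported lazily; the WordNet corpus must be installed.
    """
    from nltk.corpus import wordnet as wn
    return [summarize(ss) for ss in wn.synsets(word)]


# Example (needs NLTK and its WordNet data):
# for sense in senses_of("bank"):
#     print(sense["name"], "-", sense["definition"])
```

Listing the senses this way only shows the candidates; choosing among them for a given sentence is the disambiguation step.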

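"Python Text Processing with NLTK 2.0 Cookbook" is a good book to get you started with the various features of NLTK. It is easy to read, understand and implement. You can also look at other papers (outside the NLTK world) that discuss using Wikipedia for word sense disambiguation.

Source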

2011-01-02 16:10:12 sprezzatura

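Yes, in fact, there is a book written by the NLTK team, with several chapters on classification, and it explicitly covers how to use WordNet. You can also buy a physical copy of the book from Safari.

FYI: NLTK was written by natural language processing academics for use in their introductory programming courses.

Source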

2011-12-21 18:49:57 Indolering

As far as I can tell, that chapter is devoted to classification, but it does not really address word sense disambiguation. – geekazoid 2014-01-03 03:40:57

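As a practical answer to the OP's request, here are Python implementations of several WSD methods that return senses as NLTK synset(s): https://github.com/alvations/pywsd

It includes

Lesk algorithms (original Lesk, adapted Lesk and simple Lesk)
Baseline algorithms (random sense, first sense, most frequent sense)

It can be used like this: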

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 3 syntax. Assumes the packaged pywsd layout (modules under `pywsd.`)
# and NLTK 3, where Synset.definition() is a method rather than an attribute.

from pywsd.lesk import simple_lesk
from pywsd.baseline import random_sense, first_sense
from pywsd.baseline import max_lemma_count as most_frequent_sense

bank_sents = ['I went to the bank to deposit my money',
              'The river bank was full of dead fishes']

plant_sents = ['The workers at the industrial plant were overworked',
               'The plant was no longer bearing flowers']

print("======== TESTING simple_lesk ===========\n")
print("#TESTING simple_lesk() ...")
print("Context:", bank_sents[0])
answer = simple_lesk(bank_sents[0], 'bank')
print("Sense:", answer)
print("Definition:", answer.definition())
print()

# Restrict the candidate senses to nouns.
print("#TESTING simple_lesk() with POS ...")
print("Context:", bank_sents[1])
answer = simple_lesk(bank_sents[1], 'bank', pos='n')
print("Sense:", answer)
print("Definition:", answer.definition())
print()

# Additionally stem the words before comparing context and glosses.
print("#TESTING simple_lesk() with POS and stems ...")
print("Context:", plant_sents[0])
answer = simple_lesk(plant_sents[0], 'plant', pos='n', stem=True)
print("Sense:", answer)
print("Definition:", answer.definition())
print()

# The baselines ignore the context sentence and look only at the word.
print("======== TESTING baseline ===========\n")
print("#TESTING random_sense() ...")
print("Context:", bank_sents[0])
answer = random_sense('bank')
print("Sense:", answer)
print("Definition:", answer.definition())
print()

print("#TESTING first_sense() ...")
print("Context:", bank_sents[0])
answer = first_sense('bank')
print("Sense:", answer)
print("Definition:", answer.definition())
print()

print("#TESTING most_frequent_sense() ...")
print("Context:", bank_sents[0])
answer = most_frequent_sense('bank')
print("Sense:", answer)
print("Definition:", answer.definition())
print()

[OUT]:

======== TESTING simple_lesk =========== 

#TESTING simple_lesk() ... 
Context: I went to the bank to deposit my money 
Sense: Synset('depository_financial_institution.n.01') 
Definition: a financial institution that accepts deposits and channels the money into lending activities 

#TESTING simple_lesk() with POS ... 
Context: The river bank was full of dead fishes 
Sense: Synset('bank.n.01') 
Definition: sloping land (especially the slope beside a body of water) 

#TESTING simple_lesk() with POS and stems ... 
Context: The workers at the industrial plant were overworked 
Sense: Synset('plant.n.01') 
Definition: buildings for carrying on industrial labor 

======== TESTING baseline =========== 
#TESTING random_sense() ... 
Context: I went to the bank to deposit my money 
Sense: Synset('deposit.v.02') 
Definition: put into a bank account 

#TESTING first_sense() ... 
Context: I went to the bank to deposit my money 
Sense: Synset('bank.n.01') 
Definition: sloping land (especially the slope beside a body of water) 

#TESTING most_frequent_sense() ... 
Context: I went to the bank to deposit my money 
Sense: Synset('bank.n.01') 
Definition: sloping land (especially the slope beside a body of water)
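For reference, the three baselines are simple enough to sketch on top of NLTK's WordNet interface. The functions below are an illustration under assumptions, not pywsd's code: they take a list of synsets (for example from `wn.synsets('bank')`), their names are made up, and the most-frequent-sense variant assumes frequency is estimated by summing `Lemma.count()` over the lemmas of each synset.

```python
import random


def pick_random_sense(synsets, rng=random):
    """Baseline 1: choose any sense uniformly at random."""
    return rng.choice(synsets)


def pick_first_sense(synsets):
    """Baseline 2: choose the first sense WordNet lists."""
    return synsets[0]


def pick_most_frequent_sense(synsets):
    """Baseline 3: choose the sense whose lemmas have the highest total
    corpus count; ties resolve to the earliest sense in the list."""
    return max(synsets, key=lambda ss: sum(lemma.count() for lemma in ss.lemmas()))


# Example (needs NLTK and its WordNet data):
# from nltk.corpus import wordnet as wn
# senses = wn.synsets("bank")
# print(pick_first_sense(senses), pick_most_frequent_sense(senses))
```

None of these looks at the context sentence, which is why they are useful mainly as a floor to compare a real WSD method against.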

Source

2014-01-03 10:21:10 alvas

Recently, part of the pywsd code has been ported into the wsd module of the latest version of NLTK. Try:

>>> from nltk.wsd import lesk 
>>> sent = 'I went to the bank to deposit my money' 
>>> ambiguous = 'bank' 
>>> lesk(sent, ambiguous) 
Synset('bank.v.04') 
>>> lesk(sent, ambiguous).definition() 
u'act as the banker in a game or in gambling'
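To get a sense for every word of a sentence, which is what the question asks for, one option is to call `lesk` once per token. Current NLTK releases document `lesk` as taking an iterable of tokens rather than a raw string, so the sketch splits the sentence first. The helper `disambiguate_sentence` and its `wsd` parameter are made up for this example, and whitespace splitting is a simplification.

```python
def disambiguate_sentence(sentence, wsd=None):
    """Return (token, sense) pairs for every token in `sentence`.

    `wsd` is any callable taking (tokens, word) and returning a sense or
    None; when omitted, NLTK's lesk is used (needs the WordNet corpus).
    """
    if wsd is None:
        from nltk.wsd import lesk as wsd  # imported lazily
    tokens = sentence.split()  # naive tokenization, an assumption
    return [(token, wsd(tokens, token)) for token in tokens]


# Example (needs NLTK and its WordNet data):
# for token, sense in disambiguate_sentence("I went to the bank to deposit my money"):
#     print(token, sense, sense.definition() if sense else "")
```

Words that WordNet does not cover, such as most function words, come back as None, so the caller has to handle that case.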

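For better WSD performance, use the pywsd library rather than the NLTK module. In general, simple_lesk() from pywsd does better than NLTK's lesk. I will try to update the NLTK module when I have time.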

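In response to Chris Spencer's comment, please note the limitations of the Lesk algorithm. I am simply giving an accurate implementation of the algorithm. It is not a silver bullet: http://en.wikipedia.org/wiki/Lesk_algorithm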
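To make that limitation concrete, here is a minimal sketch of the simplified Lesk idea: pick the sense whose gloss shares the most words with the context. The two-sense inventory and its glosses are invented for illustration (they are not WordNet entries), and the helper name `toy_lesk` is hypothetical; real implementations add lemmatization, stemming and richer signatures.

```python
# Hypothetical mini sense inventory: word -> {sense id: gloss}.
TOY_SENSES = {
    "bank": {
        "bank.river": "sloping land beside a body of water such as a river",
        "bank.money": "a financial institution that accepts deposits of money",
    }
}

STOPWORDS = {"a", "an", "the", "of", "to", "that", "such", "as", "i", "my", "was"}


def content_words(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def toy_lesk(sentence, word, senses=TOY_SENSES):
    """Return (best_sense, overlap) for `word` by gloss/context overlap.

    Ties, including the case where nothing overlaps, fall back to the
    first listed sense.
    """
    context = content_words(sentence) - {word}
    best, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best, best_overlap


print(toy_lesk("I went to the bank to deposit my money", "bank"))
print(toy_lesk("The river bank was full of dead fishes", "bank"))
print(toy_lesk("The bank was closed", "bank"))
```

In the first call only "money" matches, because "deposit" and "deposits" differ without stemming. In the third call nothing overlaps at all, so the function silently returns the first listed sense. Short contexts and short glosses make both situations common.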

Also note that, although:

lesk("My cat likes to eat mice.", "cat", "n")

does not give you the right answer, you can use the max_similarity() implementation from pywsd:

>>> from pywsd.similarity import max_similarity
>>> max_similarity('my cat likes to eat mice', 'cat', 'wup', pos='n').definition()
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'
>>> max_similarity('my cat likes to eat mice', 'cat', 'lin', pos='n').definition()
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'
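The idea behind a similarity-based approach can be sketched as follows: score each candidate sense by how similar it is to the senses of the surrounding words, and keep the best-scoring one. The sketch uses Wu-Palmer similarity over a tiny hand-made hypernym tree; the taxonomy, the sense inventory and the function names are all invented for illustration and are not pywsd's implementation.

```python
# Hypothetical hypernym tree: node -> parent ("entity" is the root).
PARENT = {
    "animal": "entity", "artifact": "entity", "procedure": "entity",
    "feline": "animal", "rodent": "animal", "device": "artifact",
    "cat.feline": "feline", "cat.ct_scan": "procedure",
    "mouse.rodent": "rodent", "mouse.pointer": "device",
}

# Hypothetical sense inventory, keyed by surface form (no lemmatization).
SENSES = {
    "cat": ["cat.ct_scan", "cat.feline"],
    "mice": ["mouse.rodent", "mouse.pointer"],
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first shared node is the deepest common ancestor.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def toy_max_similarity(context, word):
    """Pick the sense of `word` most similar, in total, to the context words."""
    def score(sense):
        return sum(
            max(wup(sense, other_sense) for other_sense in SENSES[other])
            for other in context
            if other != word and other in SENSES
        )
    return max(SENSES[word], key=score)


print(toy_max_similarity("my cat likes to eat mice".split(), "cat"))
```

Here the feline sense wins because it shares the "animal" ancestor with the rodent sense of "mice", while the CT-scan sense only meets the other senses at the root. Unlike gloss overlap, this works even when the definitions share no words.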

@Chris, if you want a python setup.py, just make a polite request and I will write it...

Source

2014-06-22 22:23:28 alvas

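Unfortunately, the accuracy is pretty awful. `lesk("My cat likes to eat mice.", "cat", "n")` => `Synset('computerized_tomography.n.01')`. And pywsd does not even have an install script... – Cerin 2014-08-23 02:47:18

Dear Chris, have you tried the other variants of Lesk, especially `simple_lesk()` or `adapted_lesk`? The original version is known to have problems, which is why the other solutions are in the package. http://en.wikipedia.org/wiki/Lesk_algorithm. Also, I maintain this in my free time; it is not what I do for a living... – alvas 2014-08-23 16:52:52

Yes, I tried every variant of Lesk in the package, and none of them worked on my sample corpus. I had to create a variant that also uses the glosses of all hyponyms and meronyms associated with the word to get even a few positive results, and even then its accuracy was only 15%. It is not your code, it is Lesk that is the problem. It is simply not a reliable heuristic. – Cerin 2014-08-24 23:32:58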
