2011-05-11 93 views
2

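I have the following code that extracts words from an input text file and uses WordNet to print each word's synonyms, definitions, and example sentences. It separates the synonyms in the synsets by part of speech, i.e. synonyms that are verbs and synonyms that are adjectives are printed separately. I want to print the part of speech along with each word's synonyms.

For example, for the word flabbergasted the synonyms are 1) flabbergast, boggle, bowl over, which are verbs, and 2) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken, which are adjectives.

How do I print the part of speech along with the synonyms? The code I have so far is below: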


import nltk 
from nltk.corpus import wordnet as wn 
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle') 
fp = open('sample.txt','r') 
data = fp.read() 
tokens= nltk.wordpunct_tokenize(data) 
text = nltk.Text(tokens) 
words = [w.lower() for w in text] 
for a in words: 
    print a 
syns = wn.synsets(a) 
for s in syns: 
    print 
    print "definition:", s.definition
    print "synonyms:"
    for l in s.lemmas:
        print l.name
    print "examples:"
    for b in s.examples:
        print b
    print 

Answers

1

It looks like you messed up your indentation:

for a in words: 
    print a 
syns = wn.synsets(a) 

It seems that syns = wn.synsets(a) should be inside the for loop over words, so that the lookup runs for every word rather than only the last one:

for w in words:
    print w
    syns = wn.synsets(w)
    for s in syns:
        print
        print "definition:", s.definition
        print "synonyms:"
        for l in s.lemmas:
            print l.name
        print "examples:"
        for b in s.examples:
            print b
    print
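
The loop above uses Python 2 print statements and the attribute-style NLTK API of the time. As a rough sketch only, assuming Python 3 and an NLTK 3 release where pos, definition, lemma_names, and examples are methods, the same report with the part of speech added might look like this. The names format_synset and print_word_report are made up for illustration and are not part of NLTK.

```python
def format_synset(synset):
    """Return a printable description of one WordNet synset, POS included."""
    # Assumes the NLTK 3 API, where these four accessors are methods.
    lines = [
        "part of speech: " + synset.pos(),
        "definition: " + synset.definition(),
        "synonyms: " + ", ".join(synset.lemma_names()),
    ]
    # Not every synset has example sentences, so this loop may add nothing.
    for example in synset.examples():
        lines.append("example: " + example)
    return "\n".join(lines)


def print_word_report(word):
    """Print every synset of `word`; needs NLTK and its WordNet corpus."""
    # Imported here so that format_synset itself has no NLTK dependency.
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets(word):
        print(format_synset(synset))
        print()
```

Calling print_word_report(w) for each w in words would then replace the body of the for loop, provided the WordNet corpus has been downloaded.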
0

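A lemma has a synset attribute, and that synset stores its own part of speech in its pos attribute. So, if we have a lemma l, we can access its part of speech like this: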

>>> from nltk.corpus import wordnet as wn
>>> l = wn.lemma('gladden.v.01.joy')
>>> l.synset.pos
'v'
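
As the comments further down point out, newer NLTK releases turned these attributes into methods, so l.synset.pos raises an AttributeError there. A small sketch of a helper that works with either style is shown here. The names _value and lemma_pos are hypothetical helpers, not NLTK functions, and the sketch assumes the only API difference is attribute versus method access.

```python
def _value(attr):
    """Call `attr` if it is a method (newer NLTK), else return it unchanged."""
    return attr() if callable(attr) else attr


def lemma_pos(lemma):
    """Return the part-of-speech letter of the synset a lemma belongs to."""
    synset = _value(lemma.synset)  # lemma.synset() in NLTK 3, lemma.synset before
    return _value(synset.pos)      # synset.pos() in NLTK 3, synset.pos before
```

With NLTK 3 you can also skip the helper and write l.synset().pos() directly, which should give the same letter as in the session above.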

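More generally, we can extend this into a loop that reads through your file. I use the with statement because it closes the file cleanly as soon as the block finishes.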

>>> with open('sample.txt') as f:
...     raw = f.read()
...     for sentence in nltk.sent_tokenize(raw):
...         sentence = nltk.wordpunct_tokenize(sentence)
...         for word in sentence:
...             for synset in wn.synsets(word):
...                 for lemma in synset.lemmas:
...                     print lemma.name, lemma.synset.pos
...
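
The loop above prints one lemma per line with its part of speech. To get output shaped like the question's example (verbs in one group, adjectives in another), one option is to collect the lemma names per part of speech first. This is only a sketch, assuming Python 3 and the NLTK 3 method-style API. The function names group_lemmas_by_pos and synonyms_by_pos are invented for illustration.

```python
from collections import defaultdict


def group_lemmas_by_pos(synsets):
    """Map each part-of-speech letter to the sorted lemma names found under it."""
    grouped = defaultdict(set)
    for synset in synsets:
        # A set removes names that appear in more than one synset.
        # WordNet uses 's' for satellite adjectives, separate from 'a'.
        grouped[synset.pos()].update(synset.lemma_names())
    return {pos: sorted(names) for pos, names in grouped.items()}


def synonyms_by_pos(word):
    """Look `word` up in WordNet (requires NLTK and its WordNet corpus)."""
    from nltk.corpus import wordnet as wn

    return group_lemmas_by_pos(wn.synsets(word))
```

Iterating over synonyms_by_pos(word).items() then gives one (part of speech, synonyms) pair per group to print.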

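If you want to make sure you only select lemmas with the same part of speech as the word you are looking at, then you will also need to determine that word's part of speech: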

>>> import nltk 
>>> from nltk.corpus import wordnet as wn 
>>> with open('sample.txt') as f:
...     raw = f.read()
...     for sentence in nltk.sent_tokenize(raw):
...         sentence = nltk.pos_tag(nltk.wordpunct_tokenize(sentence))
...         for word, pos in sentence:
...             print word, pos

I'll leave combining the two as an exercise for the reader.
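
For anyone who wants a starting point for that exercise, here is one possible sketch rather than the definitive way to do it. nltk.pos_tag returns Penn Treebank tags by default, while WordNet uses single letters, so the tags need translating first. The prefix mapping below is an assumption about which tags should count as adjectives, verbs, nouns, and adverbs, and penn_to_wordnet and synonyms_matching_tags are hypothetical names. The code assumes Python 3 and NLTK 3.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unmapped."""
    # Assumed mapping by two-letter prefix: JJ* adjectives, VB* verbs,
    # NN* nouns, RB* adverbs. Everything else (DT, IN, RP, ...) is skipped.
    prefixes = {"JJ": "a", "VB": "v", "NN": "n", "RB": "r"}
    return prefixes.get(tag[:2])


def synonyms_matching_tags(text):
    """Yield (word, synset_pos, lemma_name) for synsets matching each word's tag.

    Requires NLTK plus its tokenizer, tagger, and WordNet data.
    """
    import nltk
    from nltk.corpus import wordnet as wn

    for sentence in nltk.sent_tokenize(text):
        for word, tag in nltk.pos_tag(nltk.wordpunct_tokenize(sentence)):
            wn_pos = penn_to_wordnet(tag)
            if wn_pos is None:
                continue  # no WordNet category for this tag
            # Restrict the lookup to synsets of the tagged part of speech.
            for synset in wn.synsets(word, pos=wn_pos):
                for lemma in synset.lemmas():
                    yield word, synset.pos(), lemma.name()
```

Reading the file with open('sample.txt') as before and looping over synonyms_matching_tags(raw) then prints only the synonyms whose part of speech agrees with the tagger's guess for each word.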

+0

The pos function gives me the following error: lemma.synset.pos AttributeError: 'function' object has no attribute 'pos' – 2015-05-20 09:52:40

+0

Thanks for the comment. The NLTK API has changed since I wrote this answer. I'll find some time to update it. – 2015-05-20 19:50:24

+0

Can you tell me which functions I can use to get the pos from a lemma and a synset? – 2015-05-20 20:43:53