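How do I print the entire contents of WordNet (preferably with NLTK)?

NLTK provides functions for printing all the words in the Brown (or Gutenberg) corpus, but the equivalent function does not seem to work on WordNet.

Is there a way to do this through NLTK? If not, how would one do it?

This works: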

from nltk.corpus import brown as b
print(b.words())

This raises an AttributeError:

from nltk.corpus import wordnet as wn
print(wn.words())

Source

2015-11-05 zadrozny

WordNet is a word-sense resource, so the elements in the resource are indexed by senses (also known as synsets).

To iterate through the synsets:

>>> from nltk.corpus import wordnet as wn
>>> for ss in wn.all_synsets():
...     print(ss)
...     print(ss.definition())
...     break
...
Synset('able.a.01') 
(usually followed by `to') having the necessary means or skill or know-how or authority to do something

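For each synset (sense/concept), there is a list of words attached to it, called lemmas. A lemma is the canonical ("root") form of a word, the form we look up when we check a dictionary.

To get the full list of lemmas in WordNet with a one-liner:

>>> from itertools import chain
>>> lemmas_in_wordnet = set(chain(*[ss.lemma_names() for ss in wn.all_synsets()]))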

Interestingly, wn.words() will also return all the lemma names:

>>> lemmas_in_words = set(i for i in wn.words()) 
>>> len(lemmas_in_wordnet) 
148730 
>>> len(lemmas_in_words) 
147306

Strangely, though, the total number of words collected with wn.words() does not match the number collected from the synsets.
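One possible cause of the gap, offered here as an assumption and not something confirmed in this thread: lemma names taken from synsets keep their original capitalization, while wn.words() appears to yield lowercased index entries, so forms that differ only in case would be counted separately in the first set. A minimal sketch for checking this; `case_fold_report` and `wordnet_case_report` are hypothetical helper names:

```python
def case_fold_report(lemma_names, words):
    """Compare two collections of names before and after lowercasing."""
    lemma_set = set(lemma_names)
    word_set = set(words)
    folded_lemmas = {name.lower() for name in lemma_set}
    folded_words = {word.lower() for word in word_set}
    return {
        "lemmas": len(lemma_set),
        "words": len(word_set),
        # If this matches "words", case differences explain the gap.
        "lemmas_lowercased": len(folded_lemmas),
        "only_in_lemmas_after_lowercasing": sorted(folded_lemmas - folded_words),
    }


def wordnet_case_report():
    """Run the comparison on the real corpus (assumes NLTK 3.1+ and the WordNet data)."""
    from itertools import chain
    from nltk.corpus import wordnet as wn

    lemma_names = chain.from_iterable(ss.lemma_names() for ss in wn.all_synsets())
    return case_fold_report(lemma_names, wn.words())
```

Calling wordnet_case_report() and comparing "lemmas_lowercased" with "words" should show whether capitalization accounts for the difference on your installation.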

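"Printing the entire contents" of WordNet as text seems overly ambitious, because WordNet is structured somewhat like a hierarchical graph: synsets are connected to one another, and each synset has its own properties/attributes. That is why the WordNet files are not simply kept as a single text file.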
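If a flat dump is still wanted, one option is to write one record per synset and store relations by name, so the graph can be rebuilt later. The following is only a sketch under that assumption: the helper names (`synset_record`, `dump_synsets`, `dump_wordnet`) and the choice of fields are illustrative, and only hypernyms are recorded here although a synset has many other relations.

```python
import json


def synset_record(synset):
    """Flatten one synset into a plain dict using its public accessor methods."""
    return {
        "name": synset.name(),
        "pos": synset.pos(),
        "definition": synset.definition(),
        "lemma_names": list(synset.lemma_names()),
        # Relations are stored as synset names, not nested objects.
        "hypernyms": [h.name() for h in synset.hypernyms()],
    }


def dump_synsets(synsets, out):
    """Write one JSON object per line to the file-like `out`; return the count."""
    count = 0
    for synset in synsets:
        out.write(json.dumps(synset_record(synset)) + "\n")
        count += 1
    return count


def dump_wordnet(path):
    """Dump the real corpus to `path` (assumes NLTK 3.x and the WordNet data)."""
    from nltk.corpus import wordnet as wn

    with open(path, "w", encoding="utf-8") as out:
        return dump_synsets(wn.all_synsets(), out)
```

For example, dump_wordnet("wordnet.jsonl") would write every synset to a file named wordnet.jsonl (a made-up file name) and return how many were written.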

To see what a synset contains:

>>> first_synset = next(wn.all_synsets()) 
>>> dir(first_synset) 
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_all_hypernyms', '_definition', '_examples', '_frame_ids', '_hypernyms', '_instance_hypernyms', '_iter_hypernym_lists', '_lemma_names', '_lemma_pointers', '_lemmas', '_lexname', '_max_depth', '_min_depth', '_name', '_needs_root', '_offset', '_pointers', '_pos', '_related', '_shortest_hypernym_paths', '_wordnet_corpus_reader', 'also_sees', 'attributes', 'causes', 'closure', 'common_hypernyms', 'definition', 'entailments', 'examples', 'frame_ids', 'hypernym_distances', 'hypernym_paths', 'hypernyms', 'hyponyms', 'instance_hypernyms', 'instance_hyponyms', 'jcn_similarity', 'lch_similarity', 'lemma_names', 'lemmas', 'lexname', 'lin_similarity', 'lowest_common_hypernyms', 'max_depth', 'member_holonyms', 'member_meronyms', 'min_depth', 'name', 'offset', 'part_holonyms', 'part_meronyms', 'path_similarity', 'pos', 'region_domains', 'res_similarity', 'root_hypernyms', 'shortest_path_distance', 'similar_tos', 'substance_holonyms', 'substance_meronyms', 'topic_domains', 'tree', 'unicode_repr', 'usage_domains', 'verb_groups', 'wup_similarity']

Going through this howto will help you learn how to access the information you need in WordNet: http://www.nltk.org/howto/wordnet.html

Source

2015-11-05 07:32:20 alvas

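I'm using NLTK 3.0.3 and 'lemmas_in_words = set(i for i in wn.words())' gives me: AttributeError: 'WordNetCorpusReader' object has no attribute 'words' – zadrozny

Upgrade to NLTK 3.1 with 'pip install -U nltk' =) – alvas

Thanks. That did the trick. By the way, I see you're at NTU. I visited TW for a few months this year and really enjoyed it. – zadrozny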
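For code that has to run on both sides of that upgrade, a small fallback can help. This is a sketch, not part of the answer above: `iter_lemma_names` is a hypothetical helper, and it assumes older readers expose all_lemma_names() as the lemma-name iterator.

```python
def iter_lemma_names(reader):
    """Return an iterator over lemma names from a WordNet reader."""
    if hasattr(reader, "words"):
        # Readers that have words() (NLTK 3.1+ according to the comments above).
        return iter(reader.words())
    # Assumed fallback for older readers without words().
    return iter(reader.all_lemma_names())


def wordnet_lemma_names():
    """Apply the fallback to the real corpus (assumes the WordNet data is installed)."""
    from nltk.corpus import wordnet as wn

    return iter_lemma_names(wn)
```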

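Try the following:

from nltk.corpus import wordnet as wn
for word in wn.words():
    print(word)

This should work because wn.words() is actually an iterator that generates a sequence of strings, rather than a list of strings like b.words(). The for loop makes the iterator produce the words one at a time.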
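Because the words come from an iterator, it is also possible to look at just the first few without walking the whole corpus. A minimal sketch using itertools.islice; `first_items` and `preview_wordnet_words` are hypothetical helper names:

```python
from itertools import islice


def first_items(iterable, n=10):
    """Return the first `n` items; an iterator is advanced by at most `n` steps."""
    return list(islice(iterable, n))


def preview_wordnet_words(n=10):
    """Peek at the first `n` WordNet words (assumes NLTK 3.1+ and the WordNet data)."""
    from nltk.corpus import wordnet as wn

    return first_items(wn.words(), n)
```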

Source

2015-11-05 03:50:57

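I get: AttributeError: 'WordNetCorpusReader' object has no attribute 'words' – zadrozny

Hmm... try the following two lines of code and see if you get the same response as I do: wn.words (Out[10]: …) and wn.words() (Out[11]: …) –

No, sorry. Which NLTK are you using? I'm on 3.0.4 – zadrozny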

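from nltk.corpus import wordnet as wn

synonyms = []
for word in wn.words():
    print(word, end=":")
    # Collect the lemma names of every synset this word belongs to.
    for syn in wn.synsets(word):
        for l in syn.lemmas():
            synonyms.append(l.name())
    # Print the de-duplicated synonyms, then reset for the next word.
    print(set(synonyms))
    synonyms.clear()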

Source

2018-03-06 05:23:27 Raveena

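A bit of explanation would be nice! – Vaibhav

This will print the synonyms of every word across all of its synsets – Raveena

Please edit your post to add details. Code-only answers tend to get flagged for deletion because they are "low quality". – Graham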
