2014-10-06 92 views

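Determining hypernym or hyponym using WordNet in NLTK

I want to check the hypernym/hyponym relation between two words (given by the user). Either word may be a hypernym of the other, or there may be no hypernym relation between them at all. Can I use path_similarity for this? I am trying it as shown below; please suggest a better method if there is one. I would also like to know whether it would be better to query the same information with SPARQL.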

from nltk.corpus import wordnet as wn

first = wn.synset('automobile.n.01')
second = wn.synset('car.n.01')
first.path_similarity(second)

You haven't provided any RDF data, or linked to any, and SPARQL is an RDF query language, so we can't really suggest any SPARQL query. Is there some RDF data you're interested in? – 2014-10-06 21:50:44

Answer


First, there is a difference between a word and a synset/concept in WordNet.

Here we see that a word can have multiple meanings (i.e. it links to multiple concepts):

>>> from nltk.corpus import wordnet as wn 
>>> car = 'car' 
>>> auto = 'automobile' 
>>> wn.synsets(auto) 
[Synset('car.n.01'), Synset('automobile.v.01')] 
>>> wn.synsets(car) 
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')] 

In this case, "car" and "automobile" can both refer to the same Synset('car.n.01'). When they do, there is no hyponym/hypernym relation between them.
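
As a minimal sketch of that check, the helper below returns the synsets two words have in common. The name `shared_synsets` and its optional `lookup` parameter are hypothetical (not part of NLTK); by default it uses `wn.synsets`, which assumes the WordNet corpus is installed.

```python
def shared_synsets(word_a, word_b, lookup=None):
    """Return the synsets that both words map to, in word_a's order."""
    if lookup is None:
        # Imported lazily so the function can also be tried with a
        # stand-in lookup when the WordNet corpus is not installed.
        from nltk.corpus import wordnet as wn
        lookup = wn.synsets
    candidates_b = set(lookup(word_b))
    return [s for s in lookup(word_a) if s in candidates_b]


# Example (needs the WordNet corpus):
#   shared_synsets('car', 'automobile')
# A non-empty result means the two words are synonyms in at least one
# sense, so for that sense neither is a hypernym of the other.
```

Given the synset lists shown above, the only synset the two words share is Synset('car.n.01').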

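There is also the notion of a lemma, which complicates things, so let's skip it for now. Assuming you are comparing synsets rather than words, you can simply find all the hyponyms of one synset and see whether the other synset appears among them.

If you are comparing words rather than synsets, see How to get all the hyponyms of a word/synset in python nltk and wordnet?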

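The following shows how to compare synsets. For the sake of example, "car" and "automobile" share only a single noun synset, so the example uses a different pair of words: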

>>> from nltk.corpus import wordnet as wn 
>>> 
>>> fruit = 'fruit' 
>>> wn.synsets(fruit) 
[Synset('fruit.n.01'), Synset('yield.n.03'), Synset('fruit.n.03'), Synset('fruit.v.01'), Synset('fruit.v.02')] 
>>> wn.synsets(fruit)[0].definition() 
u'the ripened reproductive body of a seed plant' 
>>> fruit = wn.synsets(fruit)[0] 
>>> 
>>> apple = 'apple' 
>>> wn.synsets(apple) 
[Synset('apple.n.01'), Synset('apple.n.02')] 
>>> wn.synsets(apple)[0].definition() 
u'fruit with red or yellow or green skin and sweet to tart crisp whitish flesh' 
>>> apple = wn.synsets(apple)[0] 
>>> 

I use "fruit" and "apple" here because they make a more logical pair than "car" and "automobile". We see that apple is not among the direct hyponyms of fruit:

>>> fruit.hyponyms() 
[Synset('accessory_fruit.n.01'), Synset('achene.n.01'), Synset('acorn.n.01'), Synset('aggregate_fruit.n.01'), Synset('berry.n.02'), Synset('buckthorn_berry.n.01'), Synset('buffalo_nut.n.01'), Synset('chokecherry.n.01'), Synset('cubeb.n.01'), Synset('drupe.n.01'), Synset('ear.n.05'), Synset('edible_fruit.n.01'), Synset('fruitlet.n.01'), Synset('gourd.n.02'), Synset('hagberry.n.01'), Synset('hip.n.05'), Synset('juniper_berry.n.01'), Synset('marasca.n.01'), Synset('may_apple.n.01'), Synset('olive.n.01'), Synset('pod.n.02'), Synset('pome.n.01'), Synset('prairie_gourd.n.01'), Synset('pyxidium.n.01'), Synset('quandong.n.02'), Synset('rowanberry.n.01'), Synset('schizocarp.n.01'), Synset('seed.n.01'), Synset('wild_cherry.n.01')] 
>>> 
>>> apple in fruit.hyponyms() 
False 
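
To see why the direct check comes back False, it can help to look at the intermediate synsets between the two. The sketch below is a hypothetical helper (`hypernym_chain` is not an NLTK function) that walks upward through `Synset.hypernyms()` breadth-first and returns the shortest chain it finds. It assumes only ordinary hypernym links matter and ignores instance hypernyms.

```python
from collections import deque


def hypernym_chain(start, target):
    """Return the shortest list of synsets leading from `start` up to
    `target` via hypernym links, or None if `target` is not an ancestor."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for parent in path[-1].hypernyms():
            if parent == target:
                return path + [parent]
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


# Example (needs the WordNet corpus):
#   from nltk.corpus import wordnet as wn
#   hypernym_chain(wn.synset('apple.n.01'), wn.synset('fruit.n.01'))
```

A chain longer than two entries means the relation exists but is indirect, which is exactly the case a plain `in fruit.hyponyms()` test misses.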

So we have to go through all the hyponyms, at every level, and see whether apple is one of them:

>>> hypofruits = set([i for i in fruit.closure(lambda s:s.hyponyms())]) 
>>> apple in hypofruits 
True 

There you have it! For the sake of completeness:

>>> hyperapple = set([i for i in apple.closure(lambda s:s.hypernyms())]) 
>>> fruit in hyperapple 
True 
>>> hypoapple = set([i for i in apple.closure(lambda s:s.hyponyms())]) 
>>> fruit in hypoapple 
False 
>>> hyperfruit = set([i for i in fruit.closure(lambda s:s.hypernyms())]) 
>>> apple in hyperfruit 
False
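
The four checks above can be folded into one helper, and then applied to every pair of senses of two words. This is only a sketch: `synset_relation` and `word_relations` are hypothetical names, the optional `lookup` parameter is an assumption added so a stand-in can replace `wn.synsets`, and lemmas and instance hypernyms are left out.

```python
def synset_relation(a, b):
    """Classify how synset `a` relates to synset `b`:
    'same', 'hyponym', 'hypernym' or 'none'."""
    if a == b:
        return 'same'
    # closure() follows the relation transitively, as in the session above;
    # the `in` test stops as soon as a match is found.
    if b in a.closure(lambda s: s.hypernyms()):
        return 'hyponym'    # b sits above a, so a is a hyponym of b
    if a in b.closure(lambda s: s.hypernyms()):
        return 'hypernym'   # a sits above b, so a is a hypernym of b
    return 'none'


def word_relations(word_a, word_b, lookup=None):
    """Yield (synset_a, relation, synset_b) for each related pair of senses."""
    if lookup is None:
        # Lazy import; requires the WordNet corpus to be installed.
        from nltk.corpus import wordnet as wn
        lookup = wn.synsets
    for sa in lookup(word_a):
        for sb in lookup(word_b):
            relation = synset_relation(sa, sb)
            if relation != 'none':
                yield sa, relation, sb


# Example (needs the WordNet corpus):
#   list(word_relations('apple', 'fruit'))
```

Both checks walk upward through hypernyms, since a synset usually has far fewer ancestors than descendants. Given the results above, the output for 'apple' and 'fruit' should include the pair apple.n.01 / fruit.n.01 labelled 'hyponym'; an empty result means no sense of one word is a hypernym of any sense of the other.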