Named entity recognition with NLTK in Python: identifying the NE

I need to classify words into their parts of speech, such as verb, noun, adverb and so on. I used:

nltk.word_tokenize() #to identify word in a sentence 
nltk.pos_tag()  #to identify the parts of speech 
nltk.ne_chunk()  #to identify Named entities.

The output of nltk.ne_chunk() is a tree. For example:

>>> sentence = "I am Jhon from America" 
>>> sent1 = nltk.word_tokenize(sentence) 
>>> sent2 = nltk.pos_tag(sent1) 
>>> sent3 = nltk.ne_chunk(sent2, binary=True) 
>>> sent3 
Tree('S', [('I', 'PRP'), ('am', 'VBP'), Tree('NE', [('Jhon', 'NNP')]), ('from', 'IN'), Tree('NE', [('America', 'NNP')])])

To access the elements of this tree, I did the following:

>>> sent3[0] 
('I', 'PRP') 
>>> sent3[0][0] 
'I' 
>>> sent3[0][1] 
'PRP'

But when accessing a named entity:

>>> sent3[2] 
Tree('NE', [('Jhon', 'NNP')]) 
>>> sent3[2][0] 
('Jhon', 'NNP') 
>>> sent3[2][1]  
Traceback (most recent call last): 
    File "<pyshell#121>", line 1, in <module> 
    sent3[2][1] 
    File "C:\Python26\lib\site-packages\nltk\tree.py", line 139, in __getitem__ 
    return list.__getitem__(self, index) 
IndexError: list index out of range

I get the error shown above.

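What I want is to get 'NE' as the output, similar to the 'PRP' earlier. As it stands, I cannot identify which word is a named entity. Is there any way to do this with NLTK in Python? If so, please post the command. Or is there a function in the tree library that does this? I need the node value 'NE'.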

2011-04-18 Asl506

This answer may be off base, in which case I will delete it, since I don't have NLTK installed here to try it, but I think you can do:

>>> sent3[2].node 
    'NE'

sent3[2][0] returns the first child of the tree, not the node itself.

Edit: I tried this when I got home, and it does work.
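
To make the difference between a node's label and its children concrete, here is a minimal sketch that rebuilds the tree from the question by hand, so no tokenizer or chunker models are needed. It assumes NLTK 3 or later, where the label is read with label(). The small fallback class is only a hypothetical stand-in (not NLTK's implementation) so that the snippet still runs where NLTK is not installed.

```python
try:
    from nltk.tree import Tree
except ImportError:
    class Tree(list):
        """Hypothetical stand-in, NOT NLTK's class; only what this sketch needs."""
        def __init__(self, label, children):
            super().__init__(children)
            self._label = label

        def label(self):
            return self._label

# Same structure as sent3 in the question, built by hand.
sent3 = Tree('S', [('I', 'PRP'), ('am', 'VBP'),
                   Tree('NE', [('Jhon', 'NNP')]),
                   ('from', 'IN'),
                   Tree('NE', [('America', 'NNP')])])

subtree = sent3[2]
node_label = subtree.label()   # the label of the subtree itself
first_child = subtree[0]       # its first (and only) child, a (word, tag) tuple
print(node_label, first_child)  # NE ('Jhon', 'NNP')
```

Indexing a subtree only ever reaches its children, which is why sent3[2][1] fails: the 'NE' subtree has a single child.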

2011-04-18 20:58:10 bdk

Before looking at the node attribute, you will want to check isinstance(sent3[2], Tree) (after from nltk.tree import Tree). – Jacob 2011-04-19 16:00:56

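@Jacob Thanks mate, really helpful. The next problem I faced was how to tell whether an element is a tree, because I need to iterate over the elements with a for loop. **if isinstance(sent3[2], Tree)** is what I had been looking for all along. Thanks again. – Asl506 2011-04-20 15:22:34

In the current version (3.1), 'node' has been replaced with 'label()'. – Vladimir 2016-01-21 14:36:33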

Here is my code:

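from nltk import ne_chunk

myNE = []  # collected named-entity strings
chunks = ne_chunk(postags, binary=True)  # postags: the output of nltk.pos_tag()
for c in chunks:
    if hasattr(c, 'node'):  # NLTK 2.x attribute; on NLTK 3+ test for 'label' instead
        myNE.append(' '.join(i[0] for i in c.leaves()))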

2013-02-15 05:11:49

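I agree with bdk:

sent3[2].node

O/P - 'NE'

I don't think NLTK has a function for this. The solution above will work, but for reference you can check here

For the looping problem, you can do this: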

for i in range(len(sent3)):
    if "NE" in str(sent3[i]):
        print(sent3[i].node)

I ran this in NLTK and it works fine.

2013-10-09 11:18:15

sent3[2].node is now obsolete.

Use sent3[2].label() instead.
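
If the same script has to run against both the older and the newer API, a small helper can hide the difference. This is only a sketch: chunk_label is a hypothetical name, and OldStyleChunk and NewStyleChunk are made-up stand-ins that mimic the two styles (a plain node attribute versus a label() method) so the snippet runs without NLTK installed.

```python
def chunk_label(subtree):
    """Return a subtree's label under either API style (hypothetical helper)."""
    label = getattr(subtree, 'label', None)
    if callable(label):
        return label()       # newer style: label is a method
    return subtree.node      # older style: node is a plain attribute


class OldStyleChunk:
    """Made-up stand-in mimicking the older API (plain .node attribute)."""
    def __init__(self, node):
        self.node = node


class NewStyleChunk:
    """Made-up stand-in mimicking the newer API (.label() method)."""
    def __init__(self, label):
        self._label = label

    def label(self):
        return self._label


old_result = chunk_label(OldStyleChunk('NE'))
new_result = chunk_label(NewStyleChunk('NE'))
print(old_result, new_result)  # NE NE
```

The helper checks for label() first, so it never touches .node on an object that already provides the newer method.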

2017-04-11 17:34:02 sanju

This will work:

for sent in chunked_sentences:
    for chunk in sent:
        # Subtrees expose label(); plain (word, tag) tuples do not.
        if hasattr(chunk, "label"):
            print(chunk.label())
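
Building on the loop above, here is a hedged sketch that also collects the words of each named entity rather than only printing the label. named_entities is a hypothetical helper name, and the tree is built by hand instead of coming from ne_chunk. It assumes NLTK 3 or later; the fallback class is a stand-in (not NLTK's implementation) so the snippet still runs without NLTK installed.

```python
try:
    from nltk.tree import Tree
except ImportError:
    class Tree(list):
        """Hypothetical stand-in, NOT NLTK's class; only what this sketch needs."""
        def __init__(self, label, children):
            super().__init__(children)
            self._label = label

        def label(self):
            return self._label

        def leaves(self):
            out = []
            for child in self:
                out.extend(child.leaves() if isinstance(child, Tree) else [child])
            return out


def named_entities(tree):
    """Return (label, text) pairs for every chunk directly under the root."""
    found = []
    for child in tree:
        if isinstance(child, Tree):  # plain (word, tag) tuples are skipped
            words = [word for word, _tag in child.leaves()]
            found.append((child.label(), ' '.join(words)))
    return found


# Hand-built example tree with one single-word and one two-word entity.
chunked = Tree('S', [('I', 'PRP'), ('am', 'VBP'),
                     Tree('NE', [('Jhon', 'NNP')]),
                     ('from', 'IN'),
                     Tree('NE', [('New', 'NNP'), ('York', 'NNP')])])
entities = named_entities(chunked)
print(entities)  # [('NE', 'Jhon'), ('NE', 'New York')]
```

Using isinstance(child, Tree) instead of hasattr makes the intent explicit: only subtrees carry a label, while ordinary tokens stay as tuples.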

2017-08-28 19:11:09
