Rule-based entity matcher in spaCy

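I want to use the Python library spaCy to match tokens in a text (adding labels as semantic references). I then want to use those matches to extract relations between the tokens. My first attempt uses spaCy's matcher.add and matcher.add_pattern. matcher.add works fine and I can find the tokens. My code so far: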

# Note: this uses the spaCy 1.x Matcher API (entity_key / label / attrs / specs).
import spacy

nlp = spacy.load('en')

def merge_phrases(matcher, doc, i, matches):
    # Only act once, on the callback for the last match, so that all spans
    # are collected before any of them is merged.
    if i != len(matches) - 1:
        return None
    spans = [(ent_id, label, doc[start:end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        # Merge each matched span into one token: tag, lemma, entity type.
        span.merge('NNP' if label else span.root.tag_, span.text, nlp.vocab.strings[label])

matcher = spacy.matcher.Matcher(nlp.vocab)

# One entity per label; each spec is a list of per-token attribute dicts.
matcher.add(entity_key='1', label='FINANCE', attrs={}, specs=[[{spacy.attrs.ORTH: 'financial'}, {spacy.attrs.ORTH: 'instrument'}]], on_match=merge_phrases)
matcher.add(entity_key='2', label='BUYER', attrs={}, specs=[[{spacy.attrs.ORTH: 'acquirer'}]], on_match=merge_phrases)
matcher.add(entity_key='3', label='CODE', attrs={}, specs=[[{spacy.attrs.ORTH: 'Code'}]], on_match=merge_phrases)

This works well and gives a reasonably good result:

doc = nlp(u'Code used to identify the acquirer of the financial instrument.') 

# Output 
['Code|CODE', 'used|', 'to|', 'identify|', 'the|', 'acquirer|BUYER', 'of|', 'the|', 'financial instrument|FINANCE', '.|']

My question is: how can I use matcher.add_pattern to match relations between the tokens, something like

matcher.add_pattern("IS_OF", [{BUYER}, {'of'}, {FINANCE}])

with the output:

doc = nlp(u'Code used to identify the acquirer of the financial instrument.') 

# Output 
[acquirer of financial instrument]

I have tried different ways to make this work, without success. I think there is something wrong with my understanding of matcher.add_pattern.

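Could someone point me in the right direction on how to do this with spaCy?
Is it possible to add regular expressions here to find patterns, and if so, how?
How can I add multiple tokens with the same label, or somehow create a list of tokens for the same label, e.g. "FINANCE"?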

I would appreciate any input.

2017-04-13 El_Patrón

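Your matcher identifies the tokens, but to find the relations between them you have to do dependency parsing. spaCy has a visual example of such a parse.

You can then traverse the tree to find the relations between tokens: https://spacy.io/docs/usage/dependency-parse#navigating

Each token's dep (enum) and dep_ (verbose name) attributes give you the relation that links it to its head, so walking a token's children shows how each child relates to it.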
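To make the traversal idea concrete, here is a minimal sketch of how a BUYER "of" FINANCE relation could be read off the tree. It does not call spaCy: `Tok` is a hypothetical stand-in that only mimics the `text`, `dep_`, `ent_type_`, `head` and `children` attributes of a spaCy token, and the parse is written by hand on the assumption that the parser attaches "of" to "acquirer" as `prep` and the merged "financial instrument" to "of" as `pobj`. The function name `prep_relations` is also made up for this illustration.

```python
class Tok:
    """Hypothetical stand-in exposing the spaCy Token attributes used below."""

    def __init__(self, text, dep_, ent_type_=""):
        self.text = text
        self.dep_ = dep_            # relation linking this token to its head
        self.ent_type_ = ent_type_  # label assigned by the matcher/merge step
        self.head = self            # spaCy convention: the root is its own head
        self.children = []

    def attach_to(self, head):
        """Make `head` the syntactic parent of this token."""
        self.head = head
        head.children.append(self)
        return self


def prep_relations(tokens, left_label, prep_text, right_label):
    """Return (left, prep, right) texts linked as left -prep-> prep -pobj-> right."""
    found = []
    for left in tokens:
        if left.ent_type_ != left_label:
            continue
        for prep in left.children:
            if prep.dep_ != "prep" or prep.text.lower() != prep_text:
                continue
            for right in prep.children:
                if right.dep_ == "pobj" and right.ent_type_ == right_label:
                    found.append((left.text, prep.text, right.text))
    return found


# Hand-written (assumed) parse of "... identify the acquirer of the financial instrument"
identify = Tok("identify", "xcomp")
acquirer = Tok("acquirer", "dobj", "BUYER").attach_to(identify)
of = Tok("of", "prep").attach_to(acquirer)
instrument = Tok("financial instrument", "pobj", "FINANCE").attach_to(of)
the = Tok("the", "det").attach_to(instrument)

tokens = [identify, acquirer, of, instrument, the]
relations = prep_relations(tokens, "BUYER", "of", "FINANCE")
print(relations)  # [('acquirer', 'of', 'financial instrument')]
```

Because the function only reads those five attributes, the tokens of a parsed spaCy doc (after the merge callback has set the entity types) could in principle be passed in place of the stand-ins. Whether the relation is found then depends on the parse the model actually produces for the sentence.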

2017-04-14 19:35:08 DhruvPathak

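Thank you for your answer, it helps a lot. I am wondering whether it would be more convenient to train a named entity model to find new relevant entities in my source, and then find the relations between those entities. There is some documentation on doing this with NLTK, but how would you approach it with spaCy, specifically the relation extraction part? –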

Could you provide an example of dependency parsing that is compatible with the spaCy matcher, or have I got the wrong idea here? –

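@El_Patrón The link provided in the answer has examples, and yes, it is compatible with the spaCy matcher, because the dependency parse results are available on the spaCy tokens themselves as the dep and dep_ attributes – DhruvPathak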
