2017-01-16 143 views
2

Stanford CoreNLP server's JSON response is missing RelationExtractor annotations

I am processing a simple sentence to test Stanford's RelationExtractor:

Microsoft is based in New York.

(It isn't.)

When I annotate the sentence in Java, using the CoreNLP jar files directly, I get the desired result: CoreNLP finds an OrgBased_In relation between Microsoft and New York.

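for (CoreMap sentence : sentences) {
    // Relation mentions that the relation annotator attached to this sentence
    List<RelationMention> relations = sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class);
    String relationType = relations.get(0).getType(); // => OrgBased_In
}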

However, sending the same sentence to the CoreNLP Server like this:

curl --data 'Microsoft is based in New York.' 'http://localhost:9000/?properties={%22annotators%22%3A%22tokenize%2Cssplit%2Cpos%2Clemma%2Cner%2Cparse%2Cdepparse%2Crelation%22%2C%22outputFormat%22%3A%22json%22}' -o - 
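For anyone calling the server from code instead of curl: the query string in that command is a percent-encoded JSON properties object. Below is a minimal sketch of building it in Java with the standard library only. The names `ServerUrl` and `buildUrl` are made up for illustration and are not part of CoreNLP. `URLEncoder` also encodes the braces, which the curl command leaves literal; both forms decode to the same JSON.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ServerUrl {
    // Hypothetical helper: builds a CoreNLP server request URL by
    // percent-encoding a small JSON properties object into the query string.
    static String buildUrl(String base, String annotators, String outputFormat) {
        String props = "{\"annotators\":\"" + annotators
                + "\",\"outputFormat\":\"" + outputFormat + "\"}";
        return base + "/?properties=" + URLEncoder.encode(props, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same host, annotators and output format as the curl command above.
        String url = buildUrl(
                "http://localhost:9000",
                "tokenize,ssplit,pos,lemma,ner,parse,depparse,relation",
                "json");
        System.out.println(url);
    }
}
```

Building the URL this way mainly guards against typos in the hand-encoded `%22` and `%2C` sequences; it does not change what the server returns.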

yields a JSON response that contains no relation data at all:

{'sentences': [{'basicDependencies': [{'dep': 'ROOT', 
            'dependent': 3, 
            'dependentGloss': 'based', 
            'governor': 0, 
            'governorGloss': 'ROOT'}, 
            {'dep': 'nsubjpass', 
            'dependent': 1, 
            'dependentGloss': 'Microsoft', 
            'governor': 3, 
            'governorGloss': 'based'}, 
            {'dep': 'auxpass', 
            'dependent': 2, 
            'dependentGloss': 'is', 
            'governor': 3, 
            'governorGloss': 'based'}, 
            {'dep': 'case', 
            'dependent': 4, 
            'dependentGloss': 'in', 
            'governor': 6, 
            'governorGloss': 'York'}, 
            {'dep': 'compound', 
            'dependent': 5, 
            'dependentGloss': 'New', 
            'governor': 6, 
            'governorGloss': 'York'}, 
            {'dep': 'nmod', 
            'dependent': 6, 
            'dependentGloss': 'York', 
            'governor': 3, 
            'governorGloss': 'based'}, 
            {'dep': 'punct', 
            'dependent': 7, 
            'dependentGloss': '.', 
            'governor': 3, 
            'governorGloss': 'based'}], 
      'enhancedDependencies': [{'dep': 'ROOT', 
             'dependent': 3, 
             'dependentGloss': 'based', 
             'governor': 0, 
             'governorGloss': 'ROOT'}, 
            {'dep': 'nsubjpass', 
             'dependent': 1, 
             'dependentGloss': 'Microsoft', 
             'governor': 3, 
             'governorGloss': 'based'}, 
            {'dep': 'auxpass', 
             'dependent': 2, 
             'dependentGloss': 'is', 
             'governor': 3, 
             'governorGloss': 'based'}, 
            {'dep': 'case', 
             'dependent': 4, 
             'dependentGloss': 'in', 
             'governor': 6, 
             'governorGloss': 'York'}, 
            {'dep': 'compound', 
             'dependent': 5, 
             'dependentGloss': 'New', 
             'governor': 6, 
             'governorGloss': 'York'}, 
            {'dep': 'nmod:in', 
             'dependent': 6, 
             'dependentGloss': 'York', 
             'governor': 3, 
             'governorGloss': 'based'}, 
            {'dep': 'punct', 
             'dependent': 7, 
             'dependentGloss': '.', 
             'governor': 3, 
             'governorGloss': 'based'}], 
      'enhancedPlusPlusDependencies': [{'dep': 'ROOT', 
               'dependent': 3, 
               'dependentGloss': 'based', 
               'governor': 0, 
               'governorGloss': 'ROOT'}, 
              {'dep': 'nsubjpass', 
               'dependent': 1, 
               'dependentGloss': 'Microsoft', 
               'governor': 3, 
               'governorGloss': 'based'}, 
              {'dep': 'auxpass', 
               'dependent': 2, 
               'dependentGloss': 'is', 
               'governor': 3, 
               'governorGloss': 'based'}, 
              {'dep': 'case', 
               'dependent': 4, 
               'dependentGloss': 'in', 
               'governor': 6, 
               'governorGloss': 'York'}, 
              {'dep': 'compound', 
               'dependent': 5, 
               'dependentGloss': 'New', 
               'governor': 6, 
               'governorGloss': 'York'}, 
              {'dep': 'nmod:in', 
               'dependent': 6, 
               'dependentGloss': 'York', 
               'governor': 3, 
               'governorGloss': 'based'}, 
              {'dep': 'punct', 
               'dependent': 7, 
               'dependentGloss': '.', 
               'governor': 3, 
               'governorGloss': 'based'}], 
      'index': 0, 
      'parse': '(ROOT\n' 
        ' (S\n' 
        ' (NP (NNP Microsoft))\n' 
        ' (VP (VBZ is)\n' 
        '  (VP (VBN based)\n' 
        '  (PP (IN in)\n' 
        '   (NP (NNP New) (NNP York)))))\n' 
        ' (. .)))', 
      'tokens': [{'after': ' ', 
         'before': '', 
         'characterOffsetBegin': 0, 
         'characterOffsetEnd': 9, 
         'index': 1, 
         'lemma': 'Microsoft', 
         'ner': 'ORGANIZATION', 
         'originalText': 'Microsoft', 
         'pos': 'NNP', 
         'word': 'Microsoft'}, 
         {'after': ' ', 
         'before': ' ', 
         'characterOffsetBegin': 10, 
         'characterOffsetEnd': 12, 
         'index': 2, 
         'lemma': 'be', 
         'ner': 'O', 
         'originalText': 'is', 
         'pos': 'VBZ', 
         'word': 'is'}, 
         {'after': ' ', 
         'before': ' ', 
         'characterOffsetBegin': 13, 
         'characterOffsetEnd': 18, 
         'index': 3, 
         'lemma': 'base', 
         'ner': 'O', 
         'originalText': 'based', 
         'pos': 'VBN', 
         'word': 'based'}, 
         {'after': ' ', 
         'before': ' ', 
         'characterOffsetBegin': 19, 
         'characterOffsetEnd': 21, 
         'index': 4, 
         'lemma': 'in', 
         'ner': 'O', 
         'originalText': 'in', 
         'pos': 'IN', 
         'word': 'in'}, 
         {'after': ' ', 
         'before': ' ', 
         'characterOffsetBegin': 22, 
         'characterOffsetEnd': 25, 
         'index': 5, 
         'lemma': 'New', 
         'ner': 'LOCATION', 
         'originalText': 'New', 
         'pos': 'NNP', 
         'word': 'New'}, 
         {'after': '', 
         'before': ' ', 
         'characterOffsetBegin': 26, 
         'characterOffsetEnd': 30, 
         'index': 6, 
         'lemma': 'York', 
         'ner': 'LOCATION', 
         'originalText': 'York', 
         'pos': 'NNP', 
         'word': 'York'}, 
         {'after': '', 
         'before': '', 
         'characterOffsetBegin': 30, 
         'characterOffsetEnd': 31, 
         'index': 7, 
         'lemma': '.', 
         'ner': 'O', 
         'originalText': '.', 
         'pos': '.', 
         'word': '.'}]}]} 

I can see in the CoreNLP server's terminal that the relation extraction model has been loaded:

[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.RelationExtractorAnnotator - Loading relation model from edu/stanford/nlp/models/supervised_relation_extractor/roth_relation_model_pipelineNER.ser 

What am I missing here?

Thanks!

Answer

3

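I think nobody ever got around to adding that annotator's output to the JSON. We can do that eventually.

Right now the relation extraction we mainly support is the new kbp annotator, which extracts the relations defined for the TAC-KBP challenge.

You can find descriptions of the relations here: https://tac.nist.gov//2015/KBP/ColdStart/guidelines/TAC_KBP_2015_Slot_Descriptions_V1.0.pdf

Here is an example command that I ran: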

java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,mention,entitymentions,coref,kbp -file microsoft-example.txt -outputFormat json 

If you look at the JSON, you will see that the correct relation has been extracted.
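If the older relation annotator is still needed, one possible workaround until its output reaches the JSON is to serialize the relation mentions yourself. The following is only a rough sketch under stated assumptions: `RelationJson`, `Relation` and `toJson` are hypothetical names, and `Relation` is a stand-in for whatever fields you copy out of CoreNLP's relation mentions in your own Java code. It uses the standard library only, so the CoreNLP calls themselves are not shown.

```java
import java.util.List;
import java.util.stream.Collectors;

class RelationJson {
    // Hypothetical stand-in for the values copied out of one relation mention.
    static final class Relation {
        final String type;
        final String arg1;
        final String arg2;

        Relation(String type, String arg1, String arg2) {
            this.type = type;
            this.arg1 = arg1;
            this.arg2 = arg2;
        }
    }

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled; a real JSON library would be safer.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders the relations as a JSON object with a single "relations" array.
    static String toJson(List<Relation> relations) {
        return relations.stream()
                .map(r -> "{\"type\":" + quote(r.type)
                        + ",\"arg1\":" + quote(r.arg1)
                        + ",\"arg2\":" + quote(r.arg2) + "}")
                .collect(Collectors.joining(",", "{\"relations\":[", "]}"));
    }

    public static void main(String[] args) {
        // Values taken from the example in the question.
        List<Relation> found = List.of(new Relation("OrgBased_In", "Microsoft", "New York"));
        System.out.println(toJson(found));
        // prints {"relations":[{"type":"OrgBased_In","arg1":"Microsoft","arg2":"New York"}]}
    }
}
```

Keeping the serialization separate from the annotation step means the same code can be reused if the relation model is later retrained with different relation types.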

+1

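My ultimate goal is to train a relation extractor model with new relation types, so I don't think KBP is what I'm looking for. I guess I'll have to roll my own wrapper for the relation extractor. Thanks for the quick reply! – Simon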