Stanford CoreNLP returns the wrong result

I am trying to use Stanford CoreNLP, following this question. My environment is:

Java 1.7
Eclipse 3.4.0
Stanford CoreNLP version 3.4.1 (downloaded from here).

My code snippet is:

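    // Imports needed by this snippet
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    //...........lemmatization starts........................

    // Build a pipeline that tokenizes, splits sentences, POS-tags and lemmatizes
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
    String text = "painting";
    Annotation document = pipeline.process(text);

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    // Print the lemma of every token in every sentence
    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            String lemma = token.get(LemmaAnnotation.class);
            System.out.println("lemmatized version :" + lemma);
        }
    }

    //...........lemmatization ends.........................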

The output I get is:

lemmatized version :painting

whereas I expected:

lemmatized version :paint

Please advise.

2015-02-23 jaykio77

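The problem in this example is that the word "painting" can be either the present participle of "to paint" or a noun, and the lemmatizer's output depends on the part-of-speech tag assigned to the original word.

If you run the tagger on the fragment "painting" alone, there is no context to help the tagger (or a human) decide how to tag the word. In this case it picked the tag NN, and the lemma of the noun "painting" is indeed "painting".

If you run the same code on the sentence "I am painting a flower", the tagger should correctly tag "painting" as VBG, and the lemmatizer should then return "paint".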
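To check this explanation, it can help to print the POS tag next to each lemma. The following is a minimal sketch, assuming the CoreNLP jar and its English models are on the classpath; `TagAndLemmaDemo` and `describe` are hypothetical names introduced here for illustration, and the tags the tagger actually assigns may vary with the model version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// Hypothetical helper class for illustration; not part of CoreNLP.
public class TagAndLemmaDemo {

    // Returns one "word/TAG/lemma" string per token in the text.
    static List<String> describe(StanfordCoreNLP pipeline, String text) {
        Annotation document = pipeline.process(text);
        List<String> out = new ArrayList<String>();
        for (CoreMap sentence : document.get(SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                out.add(token.word() + "/" + token.tag() + "/" + token.lemma());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // The same word, first without context and then inside a sentence.
        System.out.println(describe(pipeline, "painting"));
        System.out.println(describe(pipeline, "I am painting a flower"));
    }
}
```

Comparing the two printed lists shows whether the tag for "painting" changes once the surrounding words are available, and how the lemma follows the tag.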

2015-02-23 18:58:30

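That's right. But what if I have a single word like "painting" and need "paint" as the output? Which other API or tool should I use? I cannot send a sentence to the API. – jaykio77 2015-02-23 19:21:05

If the tag depends on context, no tool will be able to infer the correct POS tag from a single word. However, if you know in advance that the word will be a verb, you can tag the word manually and run the lemmatizer. – 2015-02-23 19:42:14

OK, I will look into it. At least I have a direction now. – jaykio77 2015-02-24 17:09:43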
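One way to try the manual tagging suggested in the comments is CoreNLP's `Morphology` class, whose `lemma(word, tag)` method lemmatizes a single word under a Penn Treebank tag supplied by the caller. This is a minimal sketch rather than the commenter's own code; `SingleWordLemma` and `lemmaWithTag` are hypothetical names, and the caller is assumed to know the right tag in advance.

```java
import edu.stanford.nlp.process.Morphology;

// Hypothetical helper class for illustration; not part of CoreNLP.
public class SingleWordLemma {

    // Lemmatizes one word using a POS tag chosen by the caller
    // instead of one predicted by the tagger.
    static String lemmaWithTag(String word, String tag) {
        Morphology morphology = new Morphology();
        return morphology.lemma(word, tag);
    }

    public static void main(String[] args) {
        // Treat the word as a verb form (VBG) and as a noun (NN).
        System.out.println(lemmaWithTag("painting", "VBG"));
        System.out.println(lemmaWithTag("painting", "NN"));
    }
}
```

Because no tagger is involved, this only needs the CoreNLP jar and not the POS model files. The result is only as good as the tag passed in.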
