2012-08-06 87 views
18

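I'm new to Java and to the Stanford NLP toolkit, and I'm trying to use them for a project. Specifically, I'm trying to use the Stanford CoreNLP toolkit to annotate text (from NetBeans rather than the command line), using the code provided at http://nlp.stanford.edu/software/corenlp.shtml#Usage (Using the Stanford CoreNLP API). The question is: can anyone tell me how to get the output into a file so that I can process it further?

Stanford CoreNLP Java output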

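I tried printing the graph and the sentences to the console, just to see the content. That works. Basically, what I need is to return the annotated document so that I can call it from my main class and write it out to a text file (if that is possible). I have looked through the Stanford CoreNLP API, but given my lack of experience I don't know what the best way to return this kind of information is.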

Here is the code:

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// read some text in the text variable
String text = "the quick fox jumps over the lazy dog";

// create an empty Annotation just with the given text
Annotation document = new Annotation(text);

// run all Annotators on this text
pipeline.annotate(document);

// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);

for (CoreMap sentence : sentences) {
    // traversing the words in the current sentence
    // a CoreLabel is a CoreMap with additional token-specific methods
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
    }

    // this is the parse tree of the current sentence
    Tree tree = sentence.get(TreeAnnotation.class);

    // this is the Stanford dependency graph of the current sentence
    SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
}

// This is the coreference link graph
// Each chain stores a set of mentions that link to each other,
// along with a method for getting the most representative mention
// Both sentence and token offsets start at 1!
Map<Integer, CorefChain> graph =
    document.get(CorefChainAnnotation.class);

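I tried printing the graph and the sentences to the console, just to see the content. That works. Basically, what I need is to return the annotated document so that I can call it from my main class and write it out to a text file (if that is possible). I have looked through the Stanford CoreNLP API, but because of my lack of experience I really don't know what the best way to return this kind of information is. Thank you in advance – SophieM 2012-08-07 11:21:03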

@SophieM I've added that information to the question. In the future, feel free to do this yourself by editing (you can even earn a badge!) – SomeKittens 2012-08-07 14:15:03

Thank you! @SomeKittens – SophieM 2012-08-07 14:31:38

Answer

24

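Once you have any or all of the natural language analyses shown in your code example, all you need to do is send them to a file in the normal Java fashion, e.g., with a FileWriter for text-format output. Concretely, here is a simple, complete example that sends its output to files (if you give it the appropriate command-line arguments):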

import java.io.*;
import java.util.*;

import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

public class StanfordCoreNlpDemo {

    // Arguments (all optional): input text file, text output file, XML output file.
    public static void main(String[] args) throws IOException {
        // Text output goes to args[1] if given, otherwise to the console.
        PrintWriter out;
        if (args.length > 1) {
            out = new PrintWriter(args[1]);
        } else {
            out = new PrintWriter(System.out);
        }
        // XML output is written only if a third argument is given.
        PrintWriter xmlOut = null;
        if (args.length > 2) {
            xmlOut = new PrintWriter(args[2]);
        }

        StanfordCoreNLP pipeline = new StanfordCoreNLP();
        Annotation annotation;
        if (args.length > 0) {
            annotation = new Annotation(IOUtils.slurpFileNoExceptions(args[0]));
        } else {
            annotation = new Annotation("Kosgi Santosh sent an email to Stanford University. He didn't get a reply.");
        }

        pipeline.annotate(annotation);
        pipeline.prettyPrint(annotation, out);
        if (xmlOut != null) {
            pipeline.xmlPrint(annotation, xmlOut);
        }
        // An Annotation is a Map and you can get and use the various analyses individually.
        // For instance, this gets the parse tree of the first sentence in the text.
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        if (sentences != null && sentences.size() > 0) {
            CoreMap sentence = sentences.get(0);
            Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            out.println();
            out.println("The first sentence parsed is:");
            tree.pennPrint(out);
        }

        // Close the writers so that any buffered output is flushed to the files.
        out.close();
        if (xmlOut != null) {
            xmlOut.close();
        }
    }

}
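
If you would rather have your own file layout than the `prettyPrint` format, one option is to collect the per-token values and write them out yourself. The following is only a minimal sketch that uses nothing but the standard library: `TokenTableWriter` and `TokenRow` are hypothetical names (they are not part of CoreNLP), and the sample rows in `main` stand in for the values you would read inside the token loop with `token.get(TextAnnotation.class)`, `token.get(PartOfSpeechAnnotation.class)` and `token.get(NamedEntityTagAnnotation.class)`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class TokenTableWriter {

    /** Hypothetical holder for one token's annotations (not a CoreNLP type). */
    public static final class TokenRow {
        final String word;
        final String pos;
        final String ner;

        public TokenRow(String word, String pos, String ner) {
            this.word = word;
            this.pos = pos;
            this.ner = ner;
        }
    }

    /** Writes one tab-separated line (word, POS tag, NER label) per token. */
    public static void write(List<TokenRow> rows, Path target) throws IOException {
        // try-with-resources closes the writer, which also flushes it to disk.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (TokenRow row : rows) {
                out.write(row.word + "\t" + row.pos + "\t" + row.ner);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in the real program these rows are filled inside the
        // CoreNLP token loop; the values here are made up for illustration.
        List<TokenRow> rows = new ArrayList<>();
        rows.add(new TokenRow("Stanford", "NNP", "ORGANIZATION"));
        rows.add(new TokenRow("replied", "VBD", "O"));

        Path target = Files.createTempFile("tokens", ".tsv");
        write(rows, target);
        System.out.println("Wrote " + rows.size() + " rows to " + target);
    }
}
```

Returning a plain `List` of rows from the method that runs the pipeline also lets your main class decide where the output goes, without that class needing to know about the annotation keys.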

Thanks a million, @Christopher Manning – SophieM 2012-08-14 08:18:39