2016-11-15

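I found that the data in train.txt used to train the sentiment model is in PTB format and looks like the line below. I want to create another train.txt to train a sentiment model for a different domain.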

(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) (3 (2 still) (4 charming))) (2 here)) (2 .)))) 

The underlying sentence is:

Yet the act is still charming here. 
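The sentence is simply the yield of the tree: each node is written `(label child ...)`, the numbers are the sentiment labels, and the words are the leaves. A minimal JDK-only sketch that reads the words back out of a train.txt line, assuming every node carries a label and no token contains spaces or parentheses (`PtbYield` and `sentenceOf` are hypothetical names, not CoreNLP API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: recovers the words (the yield) of a labeled PTB-style
// bracketed tree such as a train.txt line. Assumes every node has a label and
// that no label or word contains spaces or parentheses.
class PtbYield {

    static String sentenceOf(String tree) {
        // Pad brackets with spaces so that split() isolates them as tokens.
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        List<String> words = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            String cur = tokens[i];
            String prev = tokens[i - 1];
            boolean curIsBracket = cur.equals("(") || cur.equals(")");
            boolean prevIsBracket = prev.equals("(") || prev.equals(")");
            // A label always directly follows "(", so a non-bracket token that
            // follows another non-bracket token (the label) must be a word.
            if (!curIsBracket && !prevIsBracket) {
                words.add(cur);
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String line = "(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) (3 (2 still) (4 charming))) (2 here)) (2 .))))";
        System.out.println(sentenceOf(line));
    }
}
```

Running `main` prints `Yet the act is still charming here .`, keeping the tokenized space before the final period.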

But after parsing it with CoreNLP, I got a different structure:

(ROOT (S (CC Yet) (NP (DT the) (NN act)) (VP (VBZ is) (ADJP (RB still) (JJ charming)) (ADVP (RB here))) (. .))) 

Here is my code:

import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class ParseExample { // class name is arbitrary

    public static void main(String[] args){
        // creates a StanfordCoreNLP object with tokenization, sentence splitting, and constituency parsing
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // read some text in the text variable
        String text = "Yet the act is still charming here .";// Add your text here!

        // create an empty Annotation just with the given text
        Annotation annotation = new Annotation(text);

        // run all Annotators on this text
        pipeline.annotate(annotation);

        // these are all the sentences in this document
        // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);

        // int sentiment = 0;
        for(CoreMap sentence: sentences) {
         // get the constituency parse tree of the current sentence
         Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
         System.out.println(tree);
         // System.out.println(tree.yield());
         tree.pennPrint(System.out);
         // Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
         // sentiment = RNNCoreAnnotations.getPredictedClass(tree);
        }

        // System.out.print(sentiment);
    }
}

Two problems come up when I try to build train.txt from my own sentences:

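1. My trees differ from the ones in train.txt. I know the numbers in the latter are the sentiment polarity labels, but the tree structure itself also seems to differ. I want a binarized parse tree, which might look like this: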

((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.)))) 

Once I have the sentiment labels, I can fill them in to build my own train.txt.

2. How do I get the phrase covered by each node of the binarized parse tree? In this example, I should get:

Yet 
the 
act 
the act 
is 
still 
charming 
still charming 
is still charming 
here 
is still charming here 
. 
is still charming here . 
the act is still charming here . 
Yet the act is still charming here. 
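Collecting these phrases is a post-order traversal: a node's phrase is its children's phrases joined together, and emitting each node when its closing parenthesis is reached lists children before parents, which is the order shown above. A JDK-only sketch for the unlabeled notation in point 1, assuming the tree is well formed and words contain no spaces or parentheses (`PhraseLister` is a hypothetical name):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: lists the phrase spanned by every node of an unlabeled
// bracketed tree such as ((Yet) (((the) (act)) ...)), children before parents.
// Assumes the tree is well formed and words contain no spaces or parentheses.
class PhraseLister {

    static List<String> phrases(String tree) {
        // Pad brackets with spaces so that split() isolates them as tokens.
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        List<String> out = new ArrayList<>();
        // One stack entry per currently open node, holding the words seen so far.
        Deque<List<String>> open = new ArrayDeque<>();
        for (String t : tokens) {
            if (t.equals("(")) {
                open.push(new ArrayList<>());
            } else if (t.equals(")")) {
                // A node closes after all of its children, so output is post-order.
                List<String> words = open.pop();
                out.add(String.join(" ", words));
                if (!open.isEmpty()) {
                    open.peek().addAll(words); // the parent spans these words too
                }
            } else {
                open.peek().add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String tree = "((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))";
        for (String phrase : phrases(tree)) {
            System.out.println(phrase);
        }
    }
}
```

The printed lines match the list above one for one; the full sentence keeps the tokenized space before the final period.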

Once I have them, I can pay human annotators to label them.
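After annotation, writing the labels back into the binarized tree gives a line in the train.txt format described in point 1. A sketch under stated assumptions: the tree uses the unlabeled notation from point 1, labels arrive in a hypothetical phrase-to-label `Map`, a phrase that occurs twice gets the same label both times, and phrases missing from the map receive a caller-chosen default (`LabelFiller` and `fill` are hypothetical names; `Map.of` needs Java 9 or later):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: writes sentiment labels into an unlabeled binarized tree
// such as ((Yet) (((the) (act)) ...)), producing a train.txt-style line.
// Assumes words contain no spaces or parentheses; phrases missing from the
// map get defaultLabel, and a repeated phrase gets the same label each time.
class LabelFiller {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private LabelFiller(String tree) {
        // Pad brackets with spaces so that split() isolates them as tokens.
        for (String t : tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            tokens.add(t);
        }
    }

    static String fill(String tree, Map<String, Integer> labels, int defaultLabel) {
        StringBuilder out = new StringBuilder();
        new LabelFiller(tree).node(labels, defaultLabel, out);
        return out.toString();
    }

    // Parses one node at tokens[pos], appends "(label ...)" to out, and
    // returns the node's phrase (its words joined by single spaces).
    private String node(Map<String, Integer> labels, int defaultLabel, StringBuilder out) {
        expect("(");
        StringBuilder inner = new StringBuilder();
        String phrase;
        if (!tokens.get(pos).equals("(")) {
            phrase = tokens.get(pos++);             // leaf node: (word)
            inner.append(' ').append(phrase);
        } else {
            List<String> parts = new ArrayList<>(); // internal node: (child child ...)
            while (tokens.get(pos).equals("(")) {
                inner.append(' ');
                parts.add(node(labels, defaultLabel, inner));
            }
            phrase = String.join(" ", parts);
        }
        expect(")");
        out.append('(').append(labels.getOrDefault(phrase, defaultLabel)).append(inner).append(')');
        return phrase;
    }

    private void expect(String token) {
        if (!tokens.get(pos).equals(token)) {
            throw new IllegalArgumentException("expected " + token + " at token " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        String tree = "((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))";
        // Labels copied from the train.txt example; every other phrase is 2.
        Map<String, Integer> labels = Map.of(
            "charming", 4,
            "still charming", 3,
            "is still charming", 3,
            "is still charming here", 4,
            "is still charming here .", 3,
            "the act is still charming here .", 3,
            "Yet the act is still charming here .", 3);
        System.out.println(fill(tree, labels, 2));
    }
}
```

With the labels read off the train.txt line at the top of the question and a default of 2, `main` prints that same line back, which is a quick check that structure and labels line up.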

I have searched Google a lot but could not solve these, so I am posting here. Any helpful answer would be appreciated!

Answer

Add this to the properties to get binary trees:

props.setProperty("parse.binaryTrees", "true"); 

The binarized tree for each sentence can then be accessed like this:

Tree tree = sentence.get(TreeCoreAnnotations.BinarizedTreeAnnotation.class);

Here is some sample code I wrote:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;

import java.util.ArrayList;
import java.util.Properties;

public class SubTreesExample {

    // Recursively prints every non-leaf node (POS preterminals included):
    // its label, a tab, then the words it spans. Indentation grows with depth.
    public static void printSubTrees(Tree inputTree, String spacing) {
     if (inputTree.isLeaf()) {
      return;
     }
     // collect the words under this node
     ArrayList<Word> words = new ArrayList<Word>();
     for (Tree leaf : inputTree.getLeaves()) {
      words.addAll(leaf.yieldWords());
     }
     System.out.print(spacing+inputTree.label()+"\t");
     for (Word w : words) {
      System.out.print(w.word()+ " ");
     }
     System.out.println();
     // recurse into the children with one more space of indentation
     for (Tree subTree : inputTree.children()) {
      printSubTrees(subTree, spacing + " ");
     }
    }

    public static void main(String[] args) {
     Properties props = new Properties();
     props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
     // ask the parser to also store a binarized version of each tree
     props.setProperty("parse.binaryTrees", "true");
     StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
     String text = "Yet the act is still charming here.";
     Annotation annotation = new Annotation(text);
     pipeline.annotate(annotation);
     // binarized tree of the first (and only) sentence
     Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
       TreeCoreAnnotations.BinarizedTreeAnnotation.class);
     System.out.println("Penn tree:");
     sentenceTree.pennPrint(System.out);
     System.out.println();
     System.out.println("Phrases:");
     printSubTrees(sentenceTree, "");

    }
}
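The binarized tree from this code still carries phrase-structure labels, POS preterminals, and unary nodes such as ROOT over S, whereas the train.txt trees have either one word or two children per node. A JDK-only sketch that drops the labels and collapses unary chains into the unlabeled notation from the question, assuming every node has a label, no token contains spaces or parentheses, and the input is the tree's string form (for example `sentenceTree.toString()`). The demo tree below was binarized by hand to match the question; CoreNLP's actual binarization may group constituents differently (`UnaryCollapser` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reduces a labeled Penn tree string such as
// (ROOT (S (CC Yet) ...)) to the unlabeled binary notation used in the
// question. Labels are dropped, each preterminal (TAG word) becomes (word),
// and any node with a single child is replaced by that child, which collapses
// unary chains like ROOT -> S. Assumes every node has a label and tokens
// contain no spaces or parentheses.
class UnaryCollapser {
    private final String[] tokens;
    private int pos = 0;

    private UnaryCollapser(String tree) {
        // Pad brackets with spaces; split() also absorbs newlines and indentation.
        tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    static String collapse(String tree) {
        return new UnaryCollapser(tree).node();
    }

    // Parses "(LABEL child ...)" starting at tokens[pos] and returns its collapsed form.
    private String node() {
        pos += 2; // skip "(" and the label, which is not kept
        List<String> kids = new ArrayList<>();
        while (!tokens[pos].equals(")")) {
            if (tokens[pos].equals("(")) {
                kids.add(node());
            } else {
                kids.add("(" + tokens[pos++] + ")"); // a word under a preterminal
            }
        }
        pos++; // skip ")"
        // A single child means a unary node: keep the child, drop this level.
        return kids.size() == 1 ? kids.get(0) : "(" + String.join(" ", kids) + ")";
    }

    public static void main(String[] args) {
        // Hand-binarized example, not actual CoreNLP output.
        String binarized = "(ROOT (S (CC Yet) (@S (NP (DT the) (NN act)) (@S (VP (@VP (VBZ is) (ADJP (RB still) (JJ charming))) (ADVP (RB here))) (. .)))))";
        System.out.println(collapse(binarized));
    }
}
```

For this hand-binarized input, `main` prints exactly the unlabeled tree the question asks for in point 1.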

Great! If I want to train a Chinese sentiment model, do the sentences in train.txt still need to be binarized parse trees? @StanfordNLPHelp – ryh