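I want to create another train.txt to train a sentiment model for a different domain. I found that the data used to train the sentiment model, in train.txt, is in PTB format and looks like this: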
(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) (3 (2 still) (4 charming))) (2 here)) (2 .))))
The actual sentence should be:
Yet the act is still charming here.
But after parsing it myself, I get a different structure:
(ROOT (S (CC Yet) (NP (DT the) (NN act)) (VP (VBZ is) (ADJP (RB still) (JJ charming)) (ADVP (RB here))) (. .)))
Here is my code:
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class ParseDemo { // class name is arbitrary
    public static void main(String[] args) {
        // create a StanfordCoreNLP pipeline with tokenization, sentence splitting, and constituency parsing
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // the text to parse
        String text = "Yet the act is still charming here ."; // Add your text here!
        // create an empty Annotation just with the given text
        Annotation annotation = new Annotation(text);
        // run all annotators on this text
        pipeline.annotate(annotation);
        // these are all the sentences in this document
        // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        // int sentiment = 0;
        for (CoreMap sentence : sentences) {
            // the constituency parse tree of the current sentence
            Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            // print the tree on a single line, then pretty-printed in Penn Treebank style
            System.out.println(tree);
            // System.out.println(tree.yield());
            tree.pennPrint(System.out);
            // Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
            // sentiment = RNNCoreAnnotations.getPredictedClass(tree);
        }
        // System.out.print(sentiment);
    }
}
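Two problems come up when I try to use my own sentences to create train.txt.
1. My tree is different from the trees in train.txt. I know the numbers in the latter are sentiment polarity labels, but the tree structure itself also seems to be different. I want to get a binarized parse tree, which might look like this: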
((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))
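Once I have the sentiment labels, I can fill them in to build my own train.txt.
2. How do I get the phrase covered by each node of the binarized parse tree? In this example, I should get: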
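To make the "fill them in" step concrete, here is a rough sketch of what I have in mind. It does not use CoreNLP at all: `Node`, `toLabeled`, `exampleTree`, and `exampleLabels` are names I made up for illustration, the tree is built by hand, and labels are looked up by phrase text, which is a simplification (a phrase that occurs twice would get the same label both times). I am assuming the 0 to 4 label scale with 2 as neutral, and I use 2 as the default for any phrase without an annotation.

```java
import java.util.HashMap;
import java.util.Map;

class LabeledTreeWriter {
    /** Hypothetical binary tree node: either a leaf with a word, or two children. */
    static final class Node {
        final String word; // null for internal nodes
        final Node left, right;
        Node(String word) { this.word = word; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.word = null; this.left = left; this.right = right; }
        /** The words covered by this node, joined by single spaces. */
        String phrase() {
            return word != null ? word : left.phrase() + " " + right.phrase();
        }
    }

    /** Writes the tree in the "(label child child)" / "(label word)" format of train.txt. */
    static String toLabeled(Node n, Map<String, Integer> labels, int defaultLabel) {
        int label = labels.getOrDefault(n.phrase(), defaultLabel);
        if (n.word != null) {
            return "(" + label + " " + n.word + ")";
        }
        return "(" + label + " " + toLabeled(n.left, labels, defaultLabel)
                + " " + toLabeled(n.right, labels, defaultLabel) + ")";
    }

    /** The binarized tree from the question, built by hand. */
    static Node exampleTree() {
        Node theAct = new Node(new Node("the"), new Node("act"));
        Node stillCharming = new Node(new Node("still"), new Node("charming"));
        Node isStillCharming = new Node(new Node("is"), stillCharming);
        Node withHere = new Node(isStillCharming, new Node("here"));
        Node withPeriod = new Node(withHere, new Node("."));
        return new Node(new Node("Yet"), new Node(theAct, withPeriod));
    }

    /** Non-neutral labels copied from the train.txt line quoted in the question. */
    static Map<String, Integer> exampleLabels() {
        Map<String, Integer> labels = new HashMap<>();
        labels.put("charming", 4);
        labels.put("still charming", 3);
        labels.put("is still charming", 3);
        labels.put("is still charming here", 4);
        labels.put("is still charming here .", 3);
        labels.put("the act is still charming here .", 3);
        labels.put("Yet the act is still charming here .", 3);
        return labels;
    }

    public static void main(String[] args) {
        // Everything not listed in the map falls back to 2 (neutral).
        System.out.println(toLabeled(exampleTree(), exampleLabels(), 2));
    }
}
```

With the labels taken from the quoted train.txt line, this should reproduce that line, which is the format I am trying to generate for my own sentences.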
Yet
the
act
the act
is
still
charming
still charming
is still charming
here
is still charming here
.
is still charming here .
the act is still charming here .
Yet the act is still charming here.
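To show exactly what I mean by "the phrase of each node", here is a minimal sketch that reads the unlabeled bracketed tree from point 1 as a plain string and lists the words under every node in post-order (children before their parent). It is pure Java and does not touch CoreNLP; `NodePhrases`, `parse`, `collect`, and `phrases` are hypothetical names of my own, and the parser only handles this simple `(word)` / `(child child)` format.

```java
import java.util.ArrayList;
import java.util.List;

class NodePhrases {
    /** A leaf holds a word; an internal node holds only children. */
    static final class Node {
        final String word; // null for internal nodes
        final List<Node> children = new ArrayList<>();
        Node(String word) { this.word = word; }
    }

    /** Parses an unlabeled bracketed tree such as "((the) (act))". */
    static Node parse(String s) {
        return parse(s, new int[] {0});
    }

    // pos is a one-element array used as a mutable cursor into s.
    private static Node parse(String s, int[] pos) {
        skipSpaces(s, pos);
        expect(s, pos, '(');
        skipSpaces(s, pos);
        Node node;
        if (peek(s, pos) == '(') {
            // Internal node: read children until the closing bracket.
            node = new Node(null);
            while (peek(s, pos) == '(') {
                node.children.add(parse(s, pos));
                skipSpaces(s, pos);
            }
        } else {
            // Leaf: read the word up to whitespace or ')'.
            int start = pos[0];
            while (pos[0] < s.length() && s.charAt(pos[0]) != ')'
                    && !Character.isWhitespace(s.charAt(pos[0]))) {
                pos[0]++;
            }
            node = new Node(s.substring(start, pos[0]));
            skipSpaces(s, pos);
        }
        expect(s, pos, ')');
        return node;
    }

    private static char peek(String s, int[] pos) {
        return pos[0] < s.length() ? s.charAt(pos[0]) : '\0';
    }

    private static void skipSpaces(String s, int[] pos) {
        while (Character.isWhitespace(peek(s, pos))) {
            pos[0]++;
        }
    }

    private static void expect(String s, int[] pos, char c) {
        if (peek(s, pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at index " + pos[0]);
        }
        pos[0]++;
    }

    /** Appends the phrase of every node under n (post-order) and returns n's own phrase. */
    static String collect(Node n, List<String> out) {
        String phrase;
        if (n.children.isEmpty()) {
            phrase = n.word;
        } else {
            List<String> parts = new ArrayList<>();
            for (Node child : n.children) {
                parts.add(collect(child, out));
            }
            phrase = String.join(" ", parts);
        }
        out.add(phrase);
        return phrase;
    }

    /** Convenience wrapper: bracketed string in, list of node phrases out. */
    static List<String> phrases(String tree) {
        List<String> out = new ArrayList<>();
        collect(parse(tree), out);
        return out;
    }

    public static void main(String[] args) {
        String tree = "((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))";
        phrases(tree).forEach(System.out::println);
    }
}
```

For the example tree this yields 15 phrases in the same order as my list above, with tokens joined by single spaces (so the last one ends in "here ."). What I still need is the same traversal over the binarized tree that the parser produces.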
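Once I have them, I can pay human annotators to label them.
I have googled this a lot but could not solve either problem, so I am posting here. Any helpful answer would be greatly appreciated!
Great! If I want to train a Chinese sentiment model, do the sentences in train.txt still need to be binarized parse trees? @StanfordNLPHelp – ryh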