Stanford Parser: frenchFactored.ser.gz

I am using the Stanford Parser (version 3.6.0) for French. My command line is:

java -cp stanford-parser.jar:* edu.stanford.nlp.parser.lexparser.LexicalizedParser -maxlength 30 -outputFormat conll2007 frenchFactored.ser.gz test_french.txt > test_french.conll10

But I do not understand the grammatical functions in the output; see:

1 Je _ CLS CLS _ 2 NULL _ _
2 mange _ V V _ 0 root _ _
3 des _ P P _ 2 NULL _ _
4 pommes _ N N _ 3 NULL _ _
5 . _ PUNC PUNC _ 2 NULL _ _

Did I perhaps miss something in the command line?


2016-03-04 starckman

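Stanford CoreNLP 3.6.0 includes a deep-learning-based French dependency parser.

Download Stanford CoreNLP 3.6.0 here:

http://stanfordnlp.github.io/CoreNLP/download.html

Also make sure to get the French models jar, which is available on the same page.

Then run this command to use the French dependency parser, making sure the French models jar is on your CLASSPATH: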

java -Xmx6g -cp "*:stanford-corenlp-full-2015-12-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-french.properties -file sample-french-document.txt -outputFormat text


2016-03-06 10:37:48 StanfordNLPHelp

Thanks for your reply! – starckman

I ran this command: java -mx1g -cp stanford-corenlp-3.7.0.jar:stanford-french-corenlp-2016-10-31-models.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-french.properties -annotators tokenize,ssplit,pos,depparse -file /Users/Rafael/Desktop/LANGAGES/CORPUS/Sentences_FR/3aube_schtrouFR30.txt -outputFormat sortie.txt but I get this error message: Unable to open "edu/stanford/nlp/models/pos-tagger/french/french.tagger" as class path, filename or URL – starckman

Are those jar files present in the directory where you run this command? You get this error because, for some reason, the French models jar is not on your CLASSPATH. If you run jar -tf on the French models jar, you will see that the tagger file is there. – StanfordNLPHelp
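
One way to check what the last comment describes is to ask the JVM whether the tagger resource is visible on the classpath it was started with. The following is a minimal sketch using only the standard library; the class name ClasspathCheck and the method isOnClasspath are hypothetical names introduced here, and the resource path is the one quoted in the error message.

```java
// Hypothetical helper, not part of CoreNLP: reports whether a resource
// can be found on the classpath the JVM was started with.
class ClasspathCheck {

    static boolean isOnClasspath(String resourcePath) {
        // getSystemResource returns null when no classpath entry provides the resource.
        return ClassLoader.getSystemResource(resourcePath) != null;
    }

    public static void main(String[] args) {
        // Path copied from the error message in the comment above.
        String tagger = "edu/stanford/nlp/models/pos-tagger/french/french.tagger";
        System.out.println(tagger + " on classpath: " + isOnClasspath(tagger));
    }
}
```

Running it with the same -cp value as the failing command should show whether the models jar is actually being picked up; a result of false would point to the classpath rather than to the jar contents.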

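There is nothing wrong with your command:

Known formats are: oneline, penn, latexTree, xmlTree, words, wordsAndTags, rootSymbolOnly, dependencies, typedDependencies, typedDependenciesCollapsed, collocations, semanticGraph, conllStyleDependencies, conll2007. The last two are both tab-separated values formats. The latter has a lot more columns filled with underscores. [...]

Source: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreePrint.html

You could try another -outputFormat.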
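
To make the conll2007 output easier to read: each row has ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), so the grammatical function is the eighth one. The sketch below, with the hypothetical names ConllLabelCheck and countUnlabeled, counts the tokens whose DEPREL is missing. The sample rows are illustrative, modeled on the output in the question rather than copied from a real run.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: reads 10-column CoNLL 2007 rows and counts tokens whose
// dependency label (column 8, DEPREL) is missing.
class ConllLabelCheck {

    // 0-based position of the DEPREL column in the CoNLL 2007 layout.
    static final int DEPREL = 7;

    static long countUnlabeled(List<String> rows) {
        return rows.stream()
                .map(row -> row.split("\t"))
                // Skip blank or truncated lines that have no DEPREL column.
                .filter(cols -> cols.length > DEPREL)
                // Treat both "NULL" and the "_" placeholder as "no label".
                .filter(cols -> cols[DEPREL].equalsIgnoreCase("NULL") || cols[DEPREL].equals("_"))
                .count();
    }

    public static void main(String[] args) {
        // Illustrative rows, assumed for this example.
        List<String> rows = Arrays.asList(
                "1\tJe\t_\tCLS\tCLS\t_\t2\tNULL\t_\t_",
                "2\tmange\t_\tV\tV\t_\t0\troot\t_\t_",
                "3\tdes\t_\tP\tP\t_\t2\tNULL\t_\t_");
        System.out.println(countUnlabeled(rows) + " of " + rows.size() + " tokens have no label");
    }
}
```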


2016-03-04 15:04:09 mejem

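Thanks. With the Chinese parser (xinhuaFactored.ser.gz) I get grammatical functions such as nsubj, auxpass, etc., but with the French one, as you can see, I only get "NULL". Does this mean that grammatical function annotation is not available for French in the Stanford Parser? – starckman

It also works for English (I just tried it). It does not seem to be implemented for French. So your command is fine, but the parser does not work the way you expect. – mejem

OK, this is what I read here, https://mailman.stanford.edu/pipermail/parser-user/2014-June/002937.html: "We don't yet have a (direct) dependency parser; rather, we parse to constituency and then convert, for English and Chinese. You would need to convert the parse trees to French dependencies in a similar way, or use some other group's dependency parser. It's not impossible, but it would be a bunch of work." But since it dates from June 2014, I was not sure whether it was still the case. Thanks! – starckman

What you are asking for is reasonable, but the Stanford Parser does not support it yet (as of version 3.6.0).

The following code prints "false" when the French model is used. The command you are running performs this check internally and silently skips the dependency analysis when the result is false.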

System.out.println(
    LexicalizedParser 
    .loadModel("frenchFactored.ser.gz") 
    .treebankLanguagePack() 
    .supportsGrammaticalStructures() 
);

That is why I use MaltParser (http://www.maltparser.org/).

If you would like output such as the following:

1 Je  Je  C CLS  null 2 suj  _ _ 
2 mange mange V V  null 0 root _ _ 
3 des  des  P P  null 2 mod  _ _ 
4 pommes pommes N N  null 3 obj  _ _ 
5 .  .  P PUNC null 2 mod  _ _

then use the following code to generate it (this cannot be done from the command line alone). I use both Stanford and Malt to achieve it:

LexicalizedParser lexParser = LexicalizedParser.loadModel("frenchFactored.ser.gz"); 
TokenizerFactory<CoreLabel> tokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), ""); 
ConcurrentMaltParserModel parserModel = ConcurrentMaltParserService.initializeParserModel(new File("fremalt-1.7.mco")); 

Tokenizer<CoreLabel> tok = tokenizerFactory.getTokenizer(new StringReader("Je mange des pommes.")); 
List<CoreLabel> rawWords2 = tok.tokenize(); 
Tree parse = lexParser.apply(rawWords2); 

// MaltParser expects its input tokens in the tab-separated MaltTab (CoNLL-style) format.
// Instead of the Stanford tagger, we could have used MElt or another tagger.
String[] tokens = parse.taggedLabeledYield().stream() 
    .map(word -> { 
     CoreLabel w = (CoreLabel)word; 
     String lemma = Morphology.lemmatizeStatic(new WordTag(w.word(), w.tag())).word(); 
     String tag = w.value(); 

     return String.join("\t", new String[]{ 
      String.valueOf(w.index()+1), 
      w.word(), 
      lemma != null ? lemma : w.word(), 
      tag != null ? String.valueOf(tag.charAt(0)) : "_", 
      tag != null ? tag : "_" 
     }); 
    }) 
    .toArray(String[]::new); 

ConcurrentDependencyGraph graph = parserModel.parse(tokens); 
System.out.println(graph);

From there, you can traverse the graph programmatically, starting with:

graph.nTokenNodes()
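
On the input side, the token lines passed to parserModel.parse in the code above follow a simple tab-separated layout: index, form, lemma, coarse tag, fine tag. As a minimal sketch that needs neither library, the helper below builds such lines from word/tag pairs. MaltTabLines and toLines are hypothetical names; the lemma is simply the word itself (an assumption, where the original calls Morphology.lemmatizeStatic), and the coarse tag is the first character of the fine tag, as in the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: builds MaltTab-style input lines from parallel word/tag arrays.
class MaltTabLines {

    // words[i] and tags[i] describe the i-th token of one sentence.
    static List<String> toLines(String[] words, String[] tags) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            String tag = tags[i];
            boolean missing = (tag == null || tag.isEmpty());
            lines.add(String.join("\t",
                    String.valueOf(i + 1),              // 1-based token index
                    words[i],                           // FORM
                    words[i],                           // LEMMA: placeholder, no lemmatizer here
                    missing ? "_" : tag.substring(0, 1), // coarse tag: first character
                    missing ? "_" : tag));              // fine tag
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] words = {"Je", "mange", "des", "pommes", "."};
        String[] tags = {"CLS", "V", "P", "N", "PUNC"};
        toLines(words, tags).forEach(System.out::println);
    }
}
```

The resulting list can be converted with toArray(new String[0]) to match the String[] that the original code hands to the parser.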

If you use Maven, just add the following dependencies to your POM:

<dependency> 
    <groupId>org.maltparser</groupId> 
    <artifactId>maltparser</artifactId> 
    <version>1.8.1</version> 
</dependency> 
<dependency> 
    <groupId>edu.stanford.nlp</groupId> 
    <artifactId>stanford-corenlp</artifactId> 
    <version>3.6.0</version> 
</dependency>

Bonus: the imports

import java.io.File;
import java.io.StringReader;
import java.util.List;

import org.maltparser.concurrent.ConcurrentMaltParserModel;
import org.maltparser.concurrent.ConcurrentMaltParserService; 
import org.maltparser.concurrent.graph.ConcurrentDependencyGraph; 
import org.maltparser.concurrent.graph.ConcurrentDependencyNode; 
import org.maltparser.core.exception.MaltChainedException; 

import edu.stanford.nlp.ling.CoreLabel; 
import edu.stanford.nlp.ling.WordTag; 
import edu.stanford.nlp.parser.lexparser.LexicalizedParser; 
import edu.stanford.nlp.process.CoreLabelTokenFactory; 
import edu.stanford.nlp.process.Morphology; 
import edu.stanford.nlp.process.PTBTokenizer; 
import edu.stanford.nlp.process.Tokenizer; 
import edu.stanford.nlp.process.TokenizerFactory; 
import edu.stanford.nlp.trees.Tree;

Extra: the fremalt-1.7.mco file

http://www.maltparser.org/mco/french_parser/fremalt.html


2016-03-10 04:38:37 antoine

Sorry, I had not logged in for a long time and did not respond; thank you very much. I use Mate Parser for French, which I recommend: https://code.google.com/archive/p/mate-tools/downloads – starckman
