CoreNLP Stanford Dependency Format

Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas

From the sentence above, I expect to get the following typed dependencies:

nsubjpass(submitted, Bills) 
auxpass(submitted, were) 
agent(submitted, Brownback) 
nn(Brownback, Senator) 
appos(Brownback, Republican) 
prep_of(Republican, Kansas) 
prep_on(Bills, ports) 
conj_and(ports, immigration) 
prep_on(Bills, immigration)

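This should be possible according to Table 1, Figure 1 of the Stanford Dependencies documentation.

Using the code below, I have only been able to obtain the following set of dependencies (this is what the code outputs):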

root(ROOT-0, submitted-7) 
nmod:on(Bills-1, ports-3) 
nmod:on(Bills-1, immigration-5) 
case(ports-3, on-2) 
cc(ports-3, and-4) 
conj:and(ports-3, immigration-5) 
nsubjpass(submitted-7, Bills-1) 
auxpass(submitted-7, were-6) 
nmod:agent(submitted-7, Brownback-10) 
case(Brownback-10, by-8) 
compound(Brownback-10, Senator-9) 
punct(Brownback-10, ,-11) 
appos(Brownback-10, Republican-12) 
nmod:of(Republican-12, Kansas-14) 
case(Kansas-14, of-13)

Question - How can I achieve the desired output above?

Code

public void processTestCoreNLP() { 
    String text = "Bills on ports and immigration were submitted " + 
      "by Senator Brownback, Republican of Kansas"; 

    Annotation annotation = new Annotation(text); 
    Properties properties = PropertiesUtils.asProperties(
      "annotators", "tokenize,ssplit,pos,lemma,depparse" 
    ); 

    AnnotationPipeline pipeline = new StanfordCoreNLP(properties); 

    pipeline.annotate(annotation); 

    for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
        SemanticGraph sg = sentence.get(EnhancedPlusPlusDependenciesAnnotation.class);
        Collection<TypedDependency> dependencies = sg.typedDependencies();
        for (TypedDependency td : dependencies) {
            System.out.println(td);
        }
    }
}

Source

2017-07-19 gimg1

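What does the code actually print out, then? – errantlinguist

Apologies for being unclear. The code outputs the second block of dependencies. I have edited the question to make this clearer. – gimg1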

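If you want to get CC-processed and collapsed Stanford Dependencies (SD) from the NN dependency parser, you have to set a property to work around a small bug in CoreNLP.

However, please note that we no longer maintain the Stanford Dependencies code. Unless you have a very good reason to use SD, we recommend that you use Universal Dependencies for any new project. Please see the Universal Dependencies (UD) documentation and Schuster and Manning (2016) for more information on the UD representation.

To get the CCprocessed and collapsed SD representation, set the depparse.language property as follows: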

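// Imports needed by the snippet
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;
import java.util.Collection;
import java.util.Properties;

public void processTestCoreNLP() {
    String text = "Bills on ports and immigration were submitted " +
        "by Senator Brownback, Republican of Kansas";

    Annotation annotation = new Annotation(text);
    Properties properties = PropertiesUtils.asProperties(
        "annotators", "tokenize,ssplit,pos,lemma,depparse");

    // Workaround: setting the language makes depparse produce Stanford Dependencies
    properties.setProperty("depparse.language", "English");

    AnnotationPipeline pipeline = new StanfordCoreNLP(properties);

    pipeline.annotate(annotation);

    for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
        // Read the collapsed, CC-processed graph instead of the enhanced++ UD graph
        SemanticGraph sg = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        Collection<TypedDependency> dependencies = sg.typedDependencies();
        for (TypedDependency td : dependencies) {
            System.out.println(td);
        }
    }
}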

Source

2017-07-27 19:43:08

Thanks, Sebastien. This is exactly what I was looking for. I even searched the mailing list but did not come across this. – gimg1

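CoreNLP recently switched from the old Stanford Dependencies format (the format in the top example) to Universal Dependencies. My first recommendation is to use the new format wherever possible. Continued development of the parser will use Universal Dependencies, and the format is in many ways similar to the old one, modulo cosmetic changes (e.g., prep -> nmod).

However, if you do want the old dependency format, you can get it with the CollapsedCCProcessedDependenciesAnnotation annotation.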
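To illustrate how close the two label sets in the question are, here is a minimal sketch in plain Java that relabels the enhanced UD relation names printed by the question's code into the collapsed SD names the asker expected. `UdToSdLabels` and `toSdLabel` are hypothetical names, not part of CoreNLP, and the mapping only covers the relations that occur in this one sentence. It is a string-level illustration under those assumptions, not a substitute for the parser's own SD output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, NOT part of CoreNLP: string-level relabeling only.
class UdToSdLabels {
    // Relations in the UD output that have no counterpart in the expected SD list.
    private static final Set<String> DROPPED =
        new HashSet<>(Arrays.asList("root", "case", "cc", "punct"));

    /** Returns the SD-style label, or null if the relation is dropped. */
    static String toSdLabel(String udLabel) {
        if (DROPPED.contains(udLabel)) {
            return null;
        }
        if (udLabel.equals("nmod:agent")) {
            return "agent";                       // special case, checked before the nmod: prefix
        }
        if (udLabel.equals("compound")) {
            return "nn";
        }
        if (udLabel.startsWith("nmod:")) {
            return "prep_" + udLabel.substring("nmod:".length());
        }
        if (udLabel.startsWith("conj:")) {
            return "conj_" + udLabel.substring("conj:".length());
        }
        return udLabel;                           // e.g. nsubjpass, auxpass, appos are unchanged
    }

    public static void main(String[] args) {
        // Labels taken from the output shown in the question.
        String[] udLabels = {
            "root", "nmod:on", "case", "cc", "conj:and", "nsubjpass",
            "auxpass", "nmod:agent", "compound", "punct", "appos", "nmod:of"
        };
        for (String ud : udLabels) {
            String sd = toSdLabel(ud);
            System.out.println(ud + " -> " + (sd == null ? "(dropped)" : sd));
        }
    }
}
```

Applied to the fifteen dependencies in the question's output, this relabeling leaves the nine relations from the expected list; any relation not seen in this sentence would pass through unchanged, which may or may not be the right SD name.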

Source

2017-07-20 06:41:06

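Thanks for your answer. From my own investigation this is what I believed to be true, but when using 'CollapsedCCProcessedDependenciesAnnotation' I still receive the same Universal-style dependencies, i.e. 'nmod' still appears where it should be 'prep'. Is there any way to force a fallback to 'prep'? – gimg1

I got further and managed to output 'prep' instead of 'nmod'. What I notice now is that the dependencies have not been collapsed into 'prep_on'. Instead, I have two separate dependencies, 'prep(Bills, on)' and 'pobj(on, immigration)'. How should I collapse these into 'prep_on(Bills, immigration)'? Do I have to do it myself, or is there a method for it? – gimg1

I'm getting in over my head here, but the place to start spelunking would be 'GrammaticalStructure'. That said, it may be that the Stanford Dependencies representation has been abandoned long enough that it has started to rot (e.g. 'CollapsedCCProcessedDependenciesAnnotation' was definitely meant to return the old format). –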
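For readers who end up doing the collapsing step by hand, as asked in the comments above, the following is a rough sketch in plain Java of folding a 'prep(gov, p)' and 'pobj(p, obj)' pair into a single 'prep_p(gov, obj)'. `PrepCollapser` and `Dep` are hypothetical stand-ins, not CoreNLP classes, and tokens are assumed to be plain strings such as "on-2". The sketch does not propagate dependencies across conjunctions (the "CC processing" part), so it is only a partial illustration of what the SD code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT CoreNLP code.
class PrepCollapser {
    /** Minimal stand-in for a typed dependency: rel(gov, dep). */
    static final class Dep {
        final String rel;
        final String gov;
        final String dep;

        Dep(String rel, String gov, String dep) {
            this.rel = rel;
            this.gov = gov;
            this.dep = dep;
        }

        @Override
        public String toString() {
            return rel + "(" + gov + ", " + dep + ")";
        }
    }

    /** Folds prep(gov, p) + pobj(p, obj) into prep_p(gov, obj); other relations are kept. */
    static List<Dep> collapse(List<Dep> deps) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            if (d.rel.equals("prep")) {
                boolean folded = false;
                for (Dep o : deps) {
                    if (o.rel.equals("pobj") && o.gov.equals(d.dep)) {
                        // Strip a trailing "-N" token index to build the label.
                        String word = d.dep.replaceAll("-\\d+$", "").toLowerCase();
                        out.add(new Dep("prep_" + word, d.gov, o.dep));
                        folded = true;
                    }
                }
                if (!folded) {
                    out.add(d);               // preposition without an object: keep as is
                }
            } else if (d.rel.equals("pobj")) {
                boolean hasPrep = false;
                for (Dep p : deps) {
                    if (p.rel.equals("prep") && p.dep.equals(d.gov)) {
                        hasPrep = true;
                    }
                }
                if (!hasPrep) {
                    out.add(d);               // orphan pobj: keep as is
                }
            } else {
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Dep> deps = Arrays.asList(
            new Dep("prep", "Bills-1", "on-2"),
            new Dep("pobj", "on-2", "immigration-5"),
            new Dep("nsubjpass", "submitted-7", "Bills-1"));
        for (Dep d : collapse(deps)) {
            System.out.println(d);
        }
    }
}
```

Matching tokens by their indexed string keeps two occurrences of the same preposition apart; a version built on CoreNLP's own graph classes would compare token objects instead.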
