2015-12-21 130 views
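Stanford NLP parse tree format

This may be a silly question, but how do I iterate over a parse tree that an NLP parser (such as Stanford NLP) produces as output? It is all nested brackets, which is neither an array nor a dictionary nor any other collection type I have used.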

(ROOT\n (S\n (PP (IN As)\n  (NP (DT an) (NN accountant)))\n (NP (PRP I))\n (VP (VBP want)\n  (S\n  (VP (TO to)\n   (VP (VB make)\n   (NP (DT a) (NN payment)))))))) 
FWIW, this is how nested lists are represented in Lisp. Imagine square brackets instead of parentheses, and quotes around the tokens, if that helps. – tripleee

@tripleee Out of curiosity, is there a native Python regex or function that reads Lisp-style nested lists into Python nested lists? – alvas

Definitely not a regex! I could not find a built-in parser, but see http://stackoverflow.com/questions/3182594/parsing-s-expressions-in-python and https://sexpdata.readthedocs.org/en/latest/ – tripleee
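
For the nested-list idea raised in the comments above, a hand-rolled reader is fairly short. This is a minimal sketch, assuming well-formed input whose tokens contain no whitespace or parentheses; `read_brackets` is a made-up name for illustration, not a function from any library. The regex only splits the string into tokens; the nesting itself is tracked with a stack, since a regex alone cannot match balanced brackets.

```python
import re


def read_brackets(text):
    """Turn a bracketed parse string into nested Python lists (illustrative helper)."""
    # A token is "(", ")" or a run of characters that is neither
    # a parenthesis nor whitespace.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]  # stack[0] collects the top-level items
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new nested list
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()    # close the innermost list
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)     # label or word
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


parse = "(ROOT (S (NP (PRP I)) (VP (VBP want))))"
print(read_brackets(parse)[0])
# ['ROOT', ['S', ['NP', ['PRP', 'I']], ['VP', ['VBP', 'want']]]]
```

Each list holds its label first and its children after it. If the parser output contains a literal backslash followed by `n` rather than real newlines, it would probably need replacing with spaces before this is called.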

Answer

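This particular output format of the Stanford Parser is called a "bracketed parse (tree)". It is meant to be read as a graph with

  • words as leaf nodes (e.g. As, an, accountant)
  • phrase/clause categories as labels of the inner nodes (e.g. S, NP, VP)
  • edges that link the nodes hierarchically, and
  • typically an artificial ROOT as the TOP or root node of the parse

(In this case you can read it as a directed acyclic graph (DAG), since it is unidirectional and non-cyclic.)

There are libraries that can read bracketed parses, e.g. nltk.tree.Tree in NLTK (http://www.nltk.org/howto/tree.html):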

>>> from nltk.tree import Tree 
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))' 
>>> parsetree = Tree.fromstring(output) 
>>> print(parsetree)
(ROOT
  (S
    (PP (IN As) (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP
      (VBP want)
      (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print() 
                           ROOT
                            |
                            S
      ______________________|________
     |                  |            VP
     |                  |    ________|____
     |                  |   |             S
     |                  |   |             |
     |                  |   |             VP
     |                  |   |     ________|___
     PP                 |   |    |            VP
  ___|___               |   |    |    ________|___
 |       NP             NP  |    |   |            NP
 |    ___|______        |   |    |   |         ___|_____
 IN  DT         NN     PRP VBP   TO  VB       DT        NN
 |   |          |       |   |    |   |        |         |
 As  an     accountant  I  want  to make      a      payment

>>> parsetree.leaves() 
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
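
To actually iterate over the tree, which is what the question asks, something along these lines should work. This is a sketch assuming NLTK 3; `walk` is a hypothetical helper written for this example, while `label()`, `subtrees()`, `leaves()` and `pos()` are methods of `nltk.tree.Tree`. A `Tree` behaves like a list of its children, and the leaves are plain strings.

```python
from nltk.tree import Tree

parse = ('(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) '
         '(VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))')
tree = Tree.fromstring(parse)


def walk(node, depth=0):
    """Yield (depth, label_or_word) for every node, parents before children."""
    if isinstance(node, Tree):
        yield depth, node.label()
        for child in node:               # iterating a Tree gives its children
            yield from walk(child, depth + 1)
    else:
        yield depth, node                # a leaf is just the word itself


# Print an indented outline of the whole parse.
for depth, name in walk(tree):
    print("  " * depth + name)

# Built-in helpers cover common cases without hand-written recursion:
# collect the words under every NP subtree.
noun_phrases = [" ".join(t.leaves())
                for t in tree.subtrees(lambda t: t.label() == "NP")]
print(noun_phrases)   # ['an accountant', 'I', 'a payment']

# (word, tag) pairs taken from the pre-terminal nodes.
print(tree.pos())
```

The recursive version is handy when the depth or the parent matters; for simple lookups, `subtrees()` with a filter is usually enough.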