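This particular output format of the Stanford Parser is called a "bracketed parse (tree)". It is meant to be read as a graph with
- words as nodes (e.g. As, an, accountant)
- phrases/clauses as labels (e.g. S, NP, VP)
- edges that link the nodes hierarchically, and
- a TOP or root node that is typically an artificial ROOT added by the parser
(In this case it can be read as a directed acyclic graph (DAG), since it is unidirectional and acyclic.)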
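To make the graph reading concrete, here is a minimal sketch that turns a bracketed string into a node table and a list of parent-to-child edges using only the standard library. The function name `bracketed_to_edges` and the integer node ids are my own choices for illustration, not part of the Stanford Parser or NLTK, and it assumes well-formed input.

```python
import re

def bracketed_to_edges(text):
    """Return (nodes, edges) for a well-formed bracketed parse string.

    nodes maps an integer id to a label or word; edges holds
    (parent_id, child_id) pairs pointing from parent to child.
    """
    # The regex only splits the input into "(", ")" and plain tokens.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    nodes, edges, stack = {}, [], []
    for i, tok in enumerate(tokens):
        if tok == "(":
            continue
        if tok == ")":
            stack.pop()  # the current subtree is finished
            continue
        node_id = len(nodes)
        nodes[node_id] = tok
        if stack:
            edges.append((stack[-1], node_id))
        # A token directly after "(" is a label and opens a new subtree.
        if i > 0 and tokens[i - 1] == "(":
            stack.append(node_id)
    return nodes, edges

nodes, edges = bracketed_to_edges("(S (NP (PRP I)) (VP (VBP want)))")
for parent, child in edges:
    print(nodes[parent], "->", nodes[child])
```

Every node except the root ends up with exactly one incoming edge, which is what makes the structure a tree (and therefore a DAG).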
There are libraries that can read bracketed parses, e.g. nltk.tree.Tree in NLTK (http://www.nltk.org/howto/tree.html):
>>> from nltk.tree import Tree
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
>>> parsetree = Tree.fromstring(output)
>>> print(parsetree)
(ROOT
  (S
    (PP (IN As) (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP
      (VBP want)
      (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print()
               ROOT
                |
                S
    ____________|____________
    |               |       VP
    |               |   ____|_____
    |               |   |        S
    |               |   |        |
    |               |   |        VP
    |               |   |    ____|_____
    PP              |   |    |        VP
____|____           |   |    |    ____|____
|       NP          NP  |    |    |       NP
|   ____|____       |   |    |    |    ___|___
IN  DT      NN     PRP VBP   TO   VB   DT    NN
|   |       |       |   |    |    |    |     |
As  an  accountant  I  want  to  make  a  payment
>>> parsetree.leaves()
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
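Beyond printing, the same Tree object can be queried. The following is a small sketch (assuming NLTK is installed and Python 3) that reads off the labels, the word/POS pairs and the noun phrases. The variable names are only for illustration.

```python
from nltk.tree import Tree

output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
parsetree = Tree.fromstring(output)

# Label of the top node and of its only child
root_label = parsetree.label()
clause_label = parsetree[0].label()

# (word, POS tag) pairs read off the leaves, left to right
tagged = parsetree.pos()

# Words covered by each NP subtree, in pre-order
noun_phrases = [
    " ".join(sub.leaves())
    for sub in parsetree.subtrees(filter=lambda t: t.label() == "NP")
]

print(root_label, clause_label)
print(tagged)
print(noun_phrases)
```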
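FWIW, this is how nested lists are represented in Lisp. Imagine square brackets instead of round parentheses, and quotes around the tokens, if that helps. – tripleee
@tripleee Out of curiosity, is there a native Python regex or function to read Lisp-like nested lists as Python nested lists? – alvas
Definitely not a regex! I couldn't find a built-in parser, but see http://stackoverflow.com/questions/3182594/parsing-s-expressions-in-python and https://sexpdata.readthedocs.org/en/latest/ – tripleee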
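For simple cases like the parse above, the conversion described in these comments can be sketched by hand with a stack. This is only a rough illustration under the assumption that tokens contain no quotes, escapes or embedded parentheses. `sexp_to_lists` is a hypothetical helper, not a standard-library function, and the regex here only splits tokens while the stack handles the nesting.

```python
import re

def sexp_to_lists(text):
    """Convert a parenthesised expression into nested lists of strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]  # stack[0] collects the top-level items
    for tok in tokens:
        if tok == "(":
            stack.append([])          # start a new nested list
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)  # attach it to its parent
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]

nested = sexp_to_lists("(NP (DT a) (NN payment))")
print(nested)  # [['NP', ['DT', 'a'], ['NN', 'payment']]]
```

For anything beyond this (quoted strings, numbers, comments), a dedicated parser such as the ones linked above is the safer choice.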