2015-02-24 66 views
0

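I want to get a flattened tree, like the one shown below, from a tree structure. How can I flatten a parse tree and store it in a string for further string operations (Python, NLTK)?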

parse tree

I want to get the whole tree as a single string like this, without running into "bad tree detected" errors:

((S (NP-SBJ (NP (DT The) (JJ high) (JJ seven-day))(PP (IN of) (NP (DT the) (CD 400) (NNS money))))(VP (VBD was) (NP-PRD (CD 8.12) (NN %))(, ,) (ADVP (RB down) (PP (IN from) (NP (CD 8.14) (NN %)))))(. .))) 
+1

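Why do you want to do this? It only makes the data harder to process. Trees are easy to work with and give you a lot of structure that you would otherwise have to reconstruct from the text. – 2015-03-15 09:29:57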

Answers

2

The documentation provides a pprint() method that renders the tree as a bracketed string.

Parsing this sentence:

string = "My name is Ross and I am cool. What's going on world? I'm looking for friends." 

and then calling pprint() produces the following:

u"(NP+SBAR+S\n (S\n (NP (PRP$ my) (NN name))\n (VP\n  (VBZ is)\n  (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.))\n  (SBAR\n  (WHNP (WP What))\n  (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world)))))\n (. ?))\n (S\n (NP (PRP I))\n (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n (. .)))" 

From here, if you want to remove the tabs and newlines, you can use the following split and join (see here):

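# In NLTK 2.x, pprint() returns the string. In NLTK 3+, pprint() prints and
# returns None, so call tree.pformat() here instead.
splitted = tree.pprint().split()   # split on any run of spaces, tabs, or newlines
flat_tree = ' '.join(splitted)     # rejoin the pieces with single spaces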

Running this gives me:

u"(NP+SBAR+S (S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.)) (SBAR (WHNP (WP What)) (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world))))) (. ?)) (S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .)))" 
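The split-and-join step is plain string handling, so it can be tried without NLTK. Below is a minimal sketch on a small made-up tree string; the helper names `flatten_bracketed` and `is_balanced` are hypothetical, not part of NLTK. The balance check is an extra, on the assumption that a "bad tree detected" style error can come from unmatched brackets, and it assumes literal parentheses in the text are escaped Penn Treebank style (`-LRB-`/`-RRB-`).

```python
def flatten_bracketed(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    return " ".join(text.split())


def is_balanced(text):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


# A small made-up multi-line tree string, similar in shape to pretty-printed output.
pretty = "(S\n  (NP (DT The) (NN cat))\n\t(VP (VBD sat))\n  (. .))"

flat = flatten_bracketed(pretty)
print(flat)
print(is_balanced(flat))
```

Calling `split()` with no argument is what makes this work: it splits on any whitespace and drops empty pieces, so indentation depth does not matter.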
1

Tree manipulation and node extraction with Python NLTK:

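from nltk.tree import Tree 

# `trees` is assumed to be an iterable of NLTK Tree objects, e.g. parser output
for tr in trees: 
    tr1 = str(tr)               # bracketed string form of the tree
    s1 = Tree.fromstring(tr1)   # parse the string back into a Tree
    s2 = s1.productions()       # grammar productions (parent -> children) in the tree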
1

You can convert the tree to a string with str(), then split and join it, as follows:

parse_string = ' '.join(str(tree).split()) 

print(parse_string) 
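Once the tree is a single-line string, ordinary string tools apply to it. As a rough sketch of the kind of follow-up string operation the question mentions, the regular expression below pulls out the (tag, word) pairs at the leaves. The sample string and the name `PRETERMINAL` are made up for illustration, and the pattern assumes that tags and words contain no spaces or parentheses.

```python
import re

# A flattened tree string, as produced by the split/join approach (made-up sample).
flat = "(S (NP (DT The) (NN cat)) (VP (VBD sat)) (. .))"

# Matches "(TAG word)" groups whose contents hold no nested brackets,
# i.e. the leaf level of the tree.
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

tagged = PRETERMINAL.findall(flat)          # list of (tag, word) tuples
words = [word for _tag, word in tagged]     # just the surface tokens

print(tagged)
print(" ".join(words))
```

If the Tree object is still available, its own methods are likely a more robust way to get the same information than a regex over the string.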