procedure, when performed, some other text
procedure, limited, some other text
I want to select a VBN that is followed by a comma:
import nltk

sents = [
    ['procedure', ',', 'when', 'performed', ',', 'some', 'other', 'text'],
    ['procedure', ',', 'limited', ',', 'some', 'other', 'text']
]
# POS-tag each pre-tokenized sentence
tokens = [nltk.pos_tag(x) for x in sents]
# Chunk a VBN that is immediately followed by a comma
grammar = r"""
CHUNK: {<VBN><,>}
"""
chunker = nltk.RegexpParser(grammar)
for x in tokens:
    tree = chunker.parse(x)
    print(tree)
This works:
(S procedure/NN ,/, when/WRB (CHUNK performed/VBN ,/,) some/DT other/JJ text/NN)
(S procedure/NN ,/, (CHUNK limited/VBN ,/,) some/DT other/JJ text/NN)
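But I need to select the VBN only when it is enclosed by commas on both sides, something like re.compile(r'(?:,)\s*([a-z]+ed),').
Is there any way to use (?:...) in a RegexpParser grammar?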
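One possible approach that stays within the RegexpParser grammar, offered as a sketch rather than a definitive answer: chunk the VBN together with both surrounding commas, then use a chink rule to drop the commas out of the chunk again. This gives roughly the effect of the non-capturing group in the regex above. The tagged tokens are copied from the output shown earlier so the example does not depend on a tagger model; the helper name comma_wrapped_vbn is hypothetical.

```python
import nltk

# Pre-tagged tokens copied from the question's output (no tagger model needed).
tagged_sents = [
    [('procedure', 'NN'), (',', ','), ('when', 'WRB'), ('performed', 'VBN'),
     (',', ','), ('some', 'DT'), ('other', 'JJ'), ('text', 'NN')],
    [('procedure', 'NN'), (',', ','), ('limited', 'VBN'), (',', ','),
     ('some', 'DT'), ('other', 'JJ'), ('text', 'NN')],
]

grammar = r"""
CHUNK:
    {<,><VBN><,>}   # chunk a VBN together with the comma on each side
    }<,>{           # chink the commas back out, leaving only the VBN
"""
chunker = nltk.RegexpParser(grammar)


def comma_wrapped_vbn(tagged):
    """Return the words inside CHUNK subtrees for one tagged sentence (hypothetical helper)."""
    tree = chunker.parse(tagged)
    return [
        word
        for subtree in tree.subtrees(filter=lambda t: t.label() == 'CHUNK')
        for word, _tag in subtree.leaves()
    ]


results = [comma_wrapped_vbn(s) for s in tagged_sents]
print(results)
```

With this grammar the first sentence should produce no chunk, because "when" sits between the comma and "performed", while the second should chunk only "limited". The rules under one label are applied in order, so the chink rule has to come after the chunk rule.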