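If I want to define a grammar in which one of the tokens matches an integer, how can I do that with NLTK's string CFG? How do I match integers in an NLTK CFG?
For example: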
S -> SK SO FK
SK -> 'SELECT'
SO -> '\d+'
FK -> 'FROM'
Create a number phrase by listing each number as a terminal, for example:
import nltk
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '10'
""")
sent = 'I shot 3 elephants'.split()
parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    print(tree)
[out]:
(S (NP I) (VP (V shot) (NP (NUM 3) (N elephants))))
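Note, however, that this only handles the numbers listed explicitly in the NUM rule. So let's try collapsing every integer into a single token type, e.g. '#NUM#':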
import nltk
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '#NUM#'
""")
sent = 'I shot 333 elephants'.split()
sent = ['#NUM#' if i.isdigit() else i for i in sent]
parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    print(tree)
[out]:
(S (NP I) (VP (V shot) (NP (NUM #NUM#) (N elephants))))
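The masking step can also be factored into a small helper. This is only a sketch: the names `mask_integers`, `INT_RE` and `PLACEHOLDER` are made up for illustration and are not part of NLTK. It uses the `\d+` pattern from the question as a preprocessing regex instead of `str.isdigit()`, which also accepts characters such as superscript digits.

```python
import re

# Hypothetical names, not part of NLTK.
# re.ASCII restricts \d to the characters 0-9.
INT_RE = re.compile(r"\d+", re.ASCII)
PLACEHOLDER = "#NUM#"  # must match the terminal used in the NUM rule


def mask_integers(tokens):
    """Replace integer tokens with PLACEHOLDER.

    Returns (masked_tokens, numbers), where numbers holds the
    original integer tokens in the order they appeared.
    """
    masked, numbers = [], []
    for tok in tokens:
        if INT_RE.fullmatch(tok):
            masked.append(PLACEHOLDER)
            numbers.append(tok)
        else:
            masked.append(tok)
    return masked, numbers


masked, numbers = mask_integers("I shot 333 elephants".split())
print(masked)
print(numbers)
```

The masked list is what gets passed to parser.parse(), and the numbers list is kept around so the original values can be restored afterwards.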
To put the numbers back, try:
import nltk
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '#NUM#'
""")
original_sent = 'I shot 333 elephants'.split()
sent = ['#NUM#' if i.isdigit() else i for i in original_sent]
numbers = [i for i in original_sent if i.isdigit()]
parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    treestr = str(tree)
    # Replace one placeholder per number, left to right
    for n in numbers:
        treestr = treestr.replace('#NUM#', n, 1)
    print(treestr)
[out]:
(S (NP I) (VP (V shot) (NP (NUM 333) (N elephants))))
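If the tree is needed as an object rather than a string, the placeholders can be replaced on the leaves directly. The following is a sketch under assumptions: `restore_numbers` is a hypothetical helper, and the tree is built with `Tree.fromstring` only so that the example stands on its own; in practice it would come from parser.parse().

```python
from nltk.tree import Tree


def restore_numbers(tree, numbers, placeholder="#NUM#"):
    """Hypothetical helper: put the original numbers back into the leaves.

    Modifies the tree in place, filling placeholders left to right.
    """
    remaining = iter(numbers)
    for pos in tree.treepositions("leaves"):
        if tree[pos] == placeholder:
            tree[pos] = next(remaining)
    return tree


# Stand-in for a tree produced by the parser
tree = Tree.fromstring(
    "(S (NP I) (VP (V shot) (NP (NUM #NUM#) (N elephants))))"
)
restore_numbers(tree, ["333"])
print(tree.leaves())
```

Because each tree is modified in place, an ambiguous sentence that yields several parses needs one call per tree, each with the full list of numbers.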