How to match integers in an NLTK CFG?

If I want to define a grammar in which one of the tokens matches an integer, how can I do that with NLTK's string-based CFG?

For example:

S -> SK SO FK 
SK -> 'SELECT' 
SO -> '\d+' 
FK -> 'FROM'

Source

2015-02-05 Sudipta Bhattacharya

Create a number phrase, for example:

import nltk 

groucho_grammar = nltk.CFG.fromstring(""" 
S -> NP VP 
PP -> P NP 
NP -> Det N | Det N PP | 'I' | NUM N 
VP -> V NP | VP PP 
Det -> 'an' | 'my' 
N -> 'elephant' | 'pajamas' | 'elephants' 
V -> 'shot' 
P -> 'in' 
NUM -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '10' 
""") 

sent = 'I shot 3 elephants'.split() 
parser = nltk.ChartParser(groucho_grammar) 
for tree in parser.parse(sent): 
    print(tree)

[out]:

(S (NP I) (VP (V shot) (NP (NUM 3) (N elephants))))
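
If the set of integers is small and known in advance, the NUM alternatives do not have to be typed by hand. The following is a minimal sketch, not part of the original answer, that builds the rule text with plain string formatting; the helper name `num_rule` and its parameters are hypothetical.

```python
def num_rule(low, high, lhs="NUM"):
    """Return one CFG rule line listing every integer in [low, high]
    as a quoted terminal (hypothetical helper, not an NLTK function)."""
    alternatives = " | ".join("'{}'".format(n) for n in range(low, high + 1))
    return "{} -> {}".format(lhs, alternatives)


rule = num_rule(0, 10)
print(rule)
```

The resulting line can be spliced into the grammar string passed to `nltk.CFG.fromstring` in place of the hand-written NUM rule.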

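Note, however, that enumerating terminals only covers the integers explicitly listed under NUM. To handle arbitrary integers, collapse every integer into a single token type such as '#NUM#' before parsing: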

import nltk 

groucho_grammar = nltk.CFG.fromstring(""" 
S -> NP VP 
PP -> P NP 
NP -> Det N | Det N PP | 'I' | NUM N 
VP -> V NP | VP PP 
Det -> 'an' | 'my' 
N -> 'elephant' | 'pajamas' | 'elephants' 
V -> 'shot' 
P -> 'in' 
NUM -> '#NUM#' 
""") 

sent = 'I shot 333 elephants'.split() 
sent = ['#NUM#' if i.isdigit() else i for i in sent] 

parser = nltk.ChartParser(groucho_grammar) 
for tree in parser.parse(sent): 
    print(tree)

[out]:

(S (NP I) (VP (V shot) (NP (NUM #NUM#) (N elephants))))

To put the original numbers back into the parse, try:

import nltk 

groucho_grammar = nltk.CFG.fromstring(""" 
S -> NP VP 
PP -> P NP 
NP -> Det N | Det N PP | 'I' | NUM N 
VP -> V NP | VP PP 
Det -> 'an' | 'my' 
N -> 'elephant' | 'pajamas' | 'elephants' 
V -> 'shot' 
P -> 'in' 
NUM -> '#NUM#' 
""") 

original_sent = 'I shot 333 elephants'.split() 
sent = ['#NUM#' if i.isdigit() else i for i in original_sent] 
numbers = [i for i in original_sent if i.isdigit()] 

parser = nltk.ChartParser(groucho_grammar) 
for tree in parser.parse(sent): 
    treestr = str(tree) 
    for n in numbers: 
        treestr = treestr.replace('#NUM#', n, 1) 
    print(treestr)

[out]:

(S (NP I) (VP (V shot) (NP (NUM 333) (N elephants))))
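
One limitation of the `isdigit()` test is that it rejects signed tokens such as '-5', because the sign is not a digit. Below is a sketch of a stricter, regex-based variant of the same masking idea, assuming an integer means an optional sign followed by ASCII digits. `PLACEHOLDER`, `INT_RE`, `mask_integers`, and `restore_integers` are hypothetical names introduced for illustration, not NLTK functions.

```python
import re

PLACEHOLDER = "#NUM#"
# Assumption: an integer is an optional sign followed by ASCII digits.
INT_RE = re.compile(r"[+-]?[0-9]+")


def mask_integers(tokens):
    """Replace integer tokens with PLACEHOLDER; return (masked, numbers)."""
    masked, numbers = [], []
    for tok in tokens:
        if INT_RE.fullmatch(tok):
            masked.append(PLACEHOLDER)
            numbers.append(tok)
        else:
            masked.append(tok)
    return masked, numbers


def restore_integers(treestr, numbers):
    """Substitute the saved numbers for the placeholders, left to right.
    Assumes treestr contains no more placeholders than there are numbers."""
    remaining = iter(numbers)
    return re.sub(re.escape(PLACEHOLDER), lambda match: next(remaining), treestr)


masked, numbers = mask_integers("I shot -5 elephants".split())
print(masked)
print(restore_integers("(NP (NUM #NUM#) (N elephants))", numbers))
```

Here `masked` plays the role of `sent` in the code above and would be passed to `parser.parse`, while `restore_integers` would be applied to `str(tree)` in place of the replacement loop.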

Source

2015-02-07 11:24:26 alvas
