2014-10-22
2

Simple grammar gives ValueError in Python

I am new to Python, NLTK and NLP. I wrote a simple grammar, but running the program raises the error below. Please help me fix it.

Grammar:

S -> NP 
NP -> PN|PRO|D[NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM 
PP -> P NP 
D[NUM=sg] -> 'a' 
D -> 'the' 
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair' 
N[NUM=pl] -> 'dogs'|'cats' 
PN -> 'saumya'|'dinesh' 
PRO -> 'she'|'he'|'we' 
A -> 'tall'|'naughty'|'long'|'three'|'black' 
P -> 'with'|'in'|'from'|'at' 
QP -> 'some' 
NOM -> A NOM|N[NUM=?n] 

Code:

import nltk 

grammar = nltk.data.load('file:english_grammer.cfg') 
rdparser = nltk.RecursiveDescentParser(grammar) 
sent = "a dogs".split() 
trees = rdparser.parse(sent) 

for tree in trees: print (tree) 

Error:

ValueError: Expected a nonterminal, found: [NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM

Please also post the full error traceback from your code. – alvas 2014-10-22 13:54:13

Answers

5

I don't think NLTK's CFG grammar reader can read rules written with square brackets.

First, let's try a CFG without square brackets:

from nltk.grammar import CFG 

grammar_string = ''' 
S -> NP 
PP -> P NP 
D -> 'the' 
PN -> 'saumya'|'dinesh' 
PRO -> 'she'|'he'|'we' 
A -> 'tall'|'naughty'|'long'|'three'|'black' 
P -> 'with'|'in'|'from'|'at' 
QP -> 'some' 
''' 

grammar = CFG.fromstring(grammar_string) 
print(grammar)

[out]:

Grammar with 18 productions (start state = S) 
    S -> NP 
    PP -> P NP 
    D -> 'the' 
    PN -> 'saumya' 
    PN -> 'dinesh' 
    PRO -> 'she' 
    PRO -> 'he' 
    PRO -> 'we' 
    A -> 'tall' 
    A -> 'naughty' 
    A -> 'long' 
    A -> 'three' 
    A -> 'black' 
    P -> 'with' 
    P -> 'in' 
    P -> 'from' 
    P -> 'at' 
    QP -> 'some' 

Now, let's add square brackets:

from nltk.grammar import CFG 

grammar_string = ''' 
S -> NP 
PP -> P NP 
D -> 'the' 
PN -> 'saumya'|'dinesh' 
PRO -> 'she'|'he'|'we' 
A -> 'tall'|'naughty'|'long'|'three'|'black' 
P -> 'with'|'in'|'from'|'at' 
QP -> 'some' 
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair' 
N[NUM=pl] -> 'dogs'|'cats' 
''' 

grammar = CFG.fromstring(grammar_string) 
print(grammar)

[out]:

Traceback (most recent call last): 
  File "test.py", line 33, in <module>
    grammar = CFG.fromstring(grammar_string)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 519, in fromstring
    encoding=encoding)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 1273, in read_grammar
    (linenum+1, line, e))
ValueError: Unable to parse line 10: N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair' 
Expected an arrow 
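In this demo the bracket sits on the left-hand side of a rule, so the reader fails with "Expected an arrow". In the question the first bracket appears on the right-hand side of the NP rule, which is why the message there is "Expected a nonterminal" instead. A minimal sketch reproducing the question's message, assuming NLTK 3 on Python 3 (the two-rule grammar is a cut-down, illustrative version of the question's):

```python
from nltk.grammar import CFG

# Illustrative two-rule grammar: the NP rule uses the bracketed feature
# syntax on its right-hand side, as in the question's grammar.
grammar_string = """
S -> NP
NP -> PN | D[NUM=?n] N[NUM=?n]
"""

message = ""
try:
    CFG.fromstring(grammar_string)
except ValueError as err:
    # The plain CFG reader accepts "PN" and "D", then finds "[" where it
    # expects another nonterminal, a quoted terminal or "|".
    message = str(err)

print(message)
```

The error is raised while the grammar is being read, so no sentence is ever parsed.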

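Going back to your grammar, it looks like you are using the square brackets to mark constrained and unconstrained non-terminals, so the solution would be to

  • use underscores in the names of the constrained non-terminals, and
  • make each unconstrained non-terminal a rule that expands to its constrained variants,

so that your CFG rules look like this: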

from nltk.parse import RecursiveDescentParser 
from nltk.grammar import CFG 

grammar_string = ''' 
S -> NP 
NP -> PN | PRO | D N | D A N | D N PP | QP N | A N | D NOM PP | D NOM 

PP -> P NP 
PN -> 'saumya'|'dinesh' 
PRO -> 'she'|'he'|'we' 
A -> 'tall'|'naughty'|'long'|'three'|'black' 
P -> 'with'|'in'|'from'|'at' 
QP -> 'some' 

D -> D_def | D_sg 
D_def -> 'the' 
D_sg -> 'a' 

N -> N_sg | N_pl 
N_sg -> 'boy'|'girl'|'room'|'garden'|'hair' 
N_pl -> 'dogs'|'cats' 
''' 

grammar = CFG.fromstring(grammar_string) 

rdparser = RecursiveDescentParser(grammar) 
sent = "a dogs".split() 
trees = rdparser.parse(sent) 

for tree in trees: 
    print (tree) 

[out]:

(S (NP (D (D_sg a)) (N (N_pl dogs)))) 
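The grammar above still accepts "a dogs", because D and N each expand to both their singular and plural variants. One way to push the underscore idea further, so that plain CFG rules enforce number agreement, is to split the noun phrase itself by number. This is a hypothetical sketch, assuming NLTK 3: the names NP_sg and NP_pl are illustrative, not from the original, and only some of the question's NP patterns are kept.

```python
from nltk.grammar import CFG
from nltk.parse import RecursiveDescentParser

# Hypothetical extension of the underscore approach: number is also encoded
# in the noun-phrase non-terminals, so D_sg only combines with N_sg.
# Only some of the question's NP patterns are kept (no PP or NOM rules).
grammar_string = """
S -> NP
NP -> NP_sg | NP_pl | PN | PRO
NP_sg -> D_sg N_sg | D_sg A N_sg | D_def N_sg | D_def A N_sg
NP_pl -> D_def N_pl | D_def A N_pl | QP N_pl | A N_pl
PN -> 'saumya' | 'dinesh'
PRO -> 'she' | 'he' | 'we'
A -> 'tall' | 'naughty' | 'long' | 'three' | 'black'
QP -> 'some'
D_def -> 'the'
D_sg -> 'a'
N_sg -> 'boy' | 'girl' | 'room' | 'garden' | 'hair'
N_pl -> 'dogs' | 'cats'
"""

parser = RecursiveDescentParser(CFG.fromstring(grammar_string))


def count_parses(sentence):
    """Number of trees the grammar assigns to a whitespace-split sentence."""
    return len(list(parser.parse(sentence.split())))


for sentence in ["a boy", "the dogs", "a dogs", "three girl"]:
    print(sentence, "->", count_parses(sentence))
```

The cost is that every rule mentioning a noun phrase has to be written once per number value; that bookkeeping is what NLTK's feature grammars automate.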

Thanks for your response. Actually, what I want is for my grammar to exclude the following ill-formed sentences: (i) a dogs (ii) three girls (iii) his cats – 2014-10-22 17:35:29

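This post solves the error you posted, so I guess the rest is your homework. You can do it; just keep in mind that brackets are not allowed in non-terminals and that the CFG in the NLTK API has no constraints. Have fun! – alvas 2014-10-22 19:33:18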

Thanks for the update. I'll give it a try. – 2014-10-23 10:55:19

1

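It looks like you are trying to use NLTK's feature grammars, which do use square brackets in the grammar to express features and feature agreement. The NLTK parser for feature grammars is FeatureEarleyChartParser (as opposed to RecursiveDescentParser).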

From the NLTK documentation:

>>> from __future__ import print_function 
>>> import nltk 
>>> from nltk import grammar, parse 
>>> g = """ 
... % start DP 
... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a] 
... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that' 
... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those' 
... D[AGR=[NUM='pl', PERS=1]] -> 'we' 
... D[AGR=[PERS=2]] -> 'you' 
... N[AGR=[NUM='sg', GND='m']] -> 'boy' 
... N[AGR=[NUM='pl', GND='m']] -> 'boys' 
... N[AGR=[NUM='sg', GND='f']] -> 'girl' 
... N[AGR=[NUM='pl', GND='f']] -> 'girls' 
... N[AGR=[NUM='sg']] -> 'student' 
... N[AGR=[NUM='pl']] -> 'students' 
... """ 
>>> grammar = grammar.FeatureGrammar.fromstring(g) 
>>> tokens = 'these girls'.split() 
>>> parser = parse.FeatureEarleyChartParser(grammar) 
>>> trees = parser.parse(tokens) 
>>> for tree in trees: print(tree) 
(DP[AGR=[GND='f', NUM='pl', PERS=3]] 
    (D[AGR=[NUM='pl', PERS=3]] these) 
    (N[AGR=[GND='f', NUM='pl']] girls)) 
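Applied to the grammar from the question, this means reading it as a FeatureGrammar and parsing with FeatureEarleyChartParser instead of a plain CFG with RecursiveDescentParser. A sketch assuming NLTK 3; one change to the original rules is an editorial assumption: NOM is given a NUM feature, because otherwise the D[NUM=?n] NOM alternative would still let "a dogs" through (NOM would not carry number).

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# The question's grammar as a feature grammar. Assumed change: NOM carries
# NUM, so "D[NUM=?n] NOM[NUM=?n]" also enforces number agreement.
grammar_string = """
S -> NP
NP -> PN | PRO | D[NUM=?n] N[NUM=?n] | D[NUM=?n] A N[NUM=?n]
NP -> D[NUM=?n] N[NUM=?n] PP | QP N[NUM=?n] | A N[NUM=?n]
NP -> D[NUM=?n] NOM[NUM=?n] PP | D[NUM=?n] NOM[NUM=?n]
PP -> P NP
NOM[NUM=?n] -> A NOM[NUM=?n] | N[NUM=?n]
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy' | 'girl' | 'room' | 'garden' | 'hair'
N[NUM=pl] -> 'dogs' | 'cats'
PN -> 'saumya' | 'dinesh'
PRO -> 'she' | 'he' | 'we'
A -> 'tall' | 'naughty' | 'long' | 'three' | 'black'
P -> 'with' | 'in' | 'from' | 'at'
QP -> 'some'
"""

parser = FeatureEarleyChartParser(FeatureGrammar.fromstring(grammar_string))


def count_parses(sentence):
    """Number of trees the feature grammar assigns to a whitespace-split sentence."""
    return len(list(parser.parse(sentence.split())))


for sentence in ["a boy", "the dogs", "a dogs"]:
    print(sentence, "->", count_parses(sentence))
```

Because D -> 'the' has no NUM value, "the" unifies with either number; "a" is fixed to sg, so it can no longer combine with a plural noun.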

Thanks for the suggestion. I managed to solve the problem by changing grammar = nltk.data.load('file:english_grammer.cfg') to grammar = nltk.data.load('file:english_grammer.fcfg'). – 2015-03-03 09:33:05
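Changing only the extension works because nltk.data.load, with its default format='auto', picks the reader from the file extension: .cfg is read with the plain CFG reader and .fcfg with the feature-grammar reader. A sketch checking this with throwaway files, assuming NLTK 3 (the file names and the four-rule grammar are hypothetical):

```python
import os
import tempfile

import nltk
from nltk.grammar import FeatureGrammar

# Hypothetical grammar file contents that use the bracketed feature syntax.
grammar_text = """
NP -> D[NUM=?n] N[NUM=?n]
D[NUM=sg] -> 'a'
N[NUM=sg] -> 'boy'
N[NUM=pl] -> 'dogs'
"""

cfg_error = None
with tempfile.TemporaryDirectory() as tmp:
    # Write the same text under both extensions.
    for name in ("toy_grammar.cfg", "toy_grammar.fcfg"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write(grammar_text)

    # .cfg -> plain CFG reader, which rejects "[NUM=?n]".
    try:
        nltk.data.load("file:" + os.path.join(tmp, "toy_grammar.cfg"))
    except ValueError as err:
        cfg_error = err

    # .fcfg -> feature-grammar reader.
    grammar = nltk.data.load("file:" + os.path.join(tmp, "toy_grammar.fcfg"))

print("cfg error:", cfg_error)
print("fcfg loaded as FeatureGrammar:", isinstance(grammar, FeatureGrammar))
```

The returned FeatureGrammar still needs a feature-aware parser such as FeatureEarleyChartParser; RecursiveDescentParser does not unify features.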

0

Save the grammar in the NLTK package with the .fcfg extension and load it with load_parser.

For example: english_grammer.fcfg

I loaded it with the following code:

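import nltk
from nltk import load_parser

# The .fcfg extension makes load_parser read a FeatureGrammar
chart = load_parser('file:english_grammer.fcfg')
sent = 'the girl gave the dog a bone'.split()
trees = chart.parse(sent)  # NLTK 3: parse() returns an iterator; nbest_parse() was removed
for tree in trees:
    print(tree)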

This solved my problem.
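A self-contained sketch of the same workflow, using a temporary file in place of english_grammer.fcfg (whose contents are not shown, so the toy grammar below is hypothetical). It assumes NLTK 3, where parse() returns an iterator of trees and nbest_parse() no longer exists:

```python
import os
import tempfile

from nltk import load_parser
from nltk.parse import FeatureChartParser

# Hypothetical stand-in for english_grammer.fcfg (its real contents are not shown).
grammar_text = """
% start NP
NP -> D[NUM=?n] N[NUM=?n]
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy'
N[NUM=pl] -> 'dogs'
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_grammar.fcfg")
    with open(path, "w") as f:
        f.write(grammar_text)
    # The .fcfg extension makes load_parser read a FeatureGrammar; by default
    # it then builds a FeatureChartParser for it.
    parser = load_parser("file:" + path)

results = {}
for sentence in ["the dogs", "a boy", "a dogs"]:
    # parse() returns an iterator of trees; count them.
    results[sentence] = len(list(parser.parse(sentence.split())))
    print(sentence, "->", results[sentence], "parse(s)")
```

The grammar is fully loaded before the temporary directory is deleted, so the parser keeps working afterwards.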