1) Using ANTLR 4.6 with the given grammar and input, I get the following message:
line 3:0 no viable alternative at input 'ACOMAND 1.0 1.0\nACOMAND\nACOMAND '
When debugging a grammar, it is very useful to list the tokens as seen by the lexer:
$ echo $CLASSPATH
.:/usr/local/lib/antlr-4.6-complete.jar
$ alias grun
alias grun='java org.antlr.v4.gui.TestRig'
$ grun Question question -tokens data.txt
[@0,0:9='ACOMAND   ',<KEYWORD>,1:0]
[@1,10:19='       1.0',<COLUMN>,1:10]
[@2,20:29='       1.0',<COLUMN>,1:20]
[@3,30:30='\n',<COLUMN>,1:30]
[@4,31:38='ACOMAND\n',<COLUMN>,2:0]
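Before 4.6, this token was displayed as [@3,30:30='\n',<n>,1:30], and you had to look up in the <grammar>.tokens file which token has the number n. Now the type is shown by name, so you see immediately that the newline has been recognized as a COLUMN token, and not as the NEWLINE you expected. That is because the lexer tries the rules in the order in which they are defined (when two rules match the same amount of input, the one defined first wins):
- does '\n' match [A-Z]? No, so it is not a KEYWORD; on to the next rule
- does '\n' match .+? ? Yes, so it is a COLUMN, and the NEWLINE rule never gets a chance.
So you need to put the COLUMN rule after the NEWLINE rule.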
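The effect of rule order can be illustrated with a small Ruby sketch. This is only a simplified model (longest match wins, ties go to the rule listed first), not ANTLR's actual implementation, and it ignores the semantic predicates of the grammar. The method name first_token_type and the regular expressions are made up for the illustration.

```ruby
# Simplified model of lexer rule selection: the longest match wins and,
# on a tie, the rule listed first wins. Hypothetical helper, not ANTLR code.
def first_token_type(rules, input)
  best_name = nil
  best_len = 0
  rules.each do |name, pattern|
    m = pattern.match(input)
    next if m.nil?
    len = m[0].length
    # Strict '>' keeps the earlier rule when two matches have equal length.
    if len > best_len
      best_name = name
      best_len = len
    end
  end
  best_name
end

KEYWORD = /\A[A-Z][A-Z-]*/
NEWLINE = /\A\r?\n/
COLUMN  = /\A./m # stand-in for the catch-all rule: any single character

column_first  = [[:KEYWORD, KEYWORD], [:COLUMN, COLUMN], [:NEWLINE, NEWLINE]]
newline_first = [[:KEYWORD, KEYWORD], [:NEWLINE, NEWLINE], [:COLUMN, COLUMN]]

p first_token_type(column_first, "\n")  # prints :COLUMN
p first_token_type(newline_first, "\n") # prints :NEWLINE
```

With the catch-all rule listed first, the newline is swallowed as a COLUMN; listing NEWLINE first lets it win the tie.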
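You can also see that the second line of input has been tokenized as [@4,31:38='ACOMAND\n',<COLUMN>,2:0], because it cannot be matched by
KEYWORD: [A-Z] ... WS*?
since the rule expects white space after the keyword, and here there is only a NL. So replace WS*? with (WS* | NEWLINE).
Finally, after simplifying the redundant rules: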
grammar Question;
question
: KEYWORD COLUMN* NEWLINE
;
KEYWORD : [A-Z] {getCharPositionInLine() == 1}? ([A-Z]|'-')* (WS* | NEWLINE) {getCharPositionInLine() <= 10}? ;
NEWLINE : '\r'? '\n' ;
WS : [ \t] ;
COLUMN: .+? {(getCharPositionInLine() % 10) == 0}? ;
Now the lexer gives:
[@0,0:9='ACOMAND   ',<KEYWORD>,1:0]
[@1,10:19='       1.0',<COLUMN>,1:10]
[@2,20:29='       1.0',<COLUMN>,1:20]
[@3,30:30='\n',<NEWLINE>,1:30]
[@4,31:38='ACOMAND\n',<KEYWORD>,2:0]
2) But is all this really useful? Is a parser generator the right tool? Remove one space and see what happens:
line 2:0 extraneous input 'ACOMAND\n' expecting {NEWLINE, COLUMN}
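I think you should let the lexer do a simple job, without these position constraints: create tokens for the non-blank data and discard the white space. Later, in the parser or in a listener, you can check the positions: every token has attributes such as start, stop, line, and so on.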
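As a rough illustration of that two-step idea outside ANTLR, here is a minimal Ruby sketch: first collect the non-blank tokens together with their start and stop columns, then check the alignment separately. The names tokens_with_positions, misaligned and FIELD_WIDTH are hypothetical, and the alignment rules (keyword at column 1, values right-aligned in 10-column fields) are assumed from the layout discussed here.

```ruby
FIELD_WIDTH = 10 # assumed width of one field

# Step 1: return every run of non-blank characters with its 1-based
# start and stop column. No position constraint is applied here.
def tokens_with_positions(line)
  tokens = []
  line.chomp.scan(/\S+/) do
    m = Regexp.last_match
    tokens << { text: m[0], start: m.begin(0) + 1, stop: m.end(0) }
  end
  tokens
end

# Step 2: the first token must start at column 1; every other token must
# end on a multiple of FIELD_WIDTH, i.e. be right-aligned in its field.
# Returns the text of the tokens that violate these rules.
def misaligned(tokens)
  tokens.each_with_index.reject do |tok, i|
    i.zero? ? tok[:start] == 1 : (tok[:stop] % FIELD_WIDTH).zero?
  end.map { |tok, _| tok[:text] }
end

good = "ACOMAND   " + "1.0".rjust(10) + "1.0".rjust(10)
bad  = "ACOMAND   " + "1.0".rjust(10) + "1.0".rjust(9) # one space removed

p misaligned(tokens_with_positions(good)) # prints []
p misaligned(tokens_with_positions(bad))  # prints ["1.0"]
```

In an ANTLR listener the same check would use the position attributes of each token instead of the hash built here.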
Why not a Ruby script? :-)
# Split 80-column lines into 10-column-wide tokens and associate each token
# with its stop position in the line (counting from 1) and an OK/WRONG flag
# telling whether it is aligned correctly.
tokens = []
IO.readlines("data.txt").each_with_index do |line, i|
  if i == 0 # the first line is the column ruler: echo it and skip it
    puts " #{line}"
    next
  end
  line_tokens = []
  line = line.chomp # remove NL
  print "line #{i + 1} : "
  8.times do |n| # n = 0 to 7
    a = n * 10     # begin of split range, counting from 0
    b = n * 10 + 9 # end of range
    token = line.slice(a..b)
    next if token.nil? || token.length == 0 # nil if edge case
    print token
    good_position = 'OK'
    position = b + 1
    case n
    when 0 # first token must be at column 1
      good_position = 'WRONG' if token[0] == ' '
    else # other tokens must be right-aligned in their 10-column-wide field
      # a trailing blank is wrong, unless the whole 10-column field is blank
      if token[-1] == ' ' && token != ' ' * 10
        good_position = 'WRONG'
        unless (pos = token.rindex(' ')).nil?
          position = position - 10 + pos - 1
        end
      end
      if token.length != 10 # last in line
        good_position = 'WRONG'
        position = position - 10 + token.length
      end
    end
    line_tokens << [token.strip, position, good_position]
    break if b > line.length
  end
  puts # print a NL because print doesn't do it
  tokens << line_tokens
end
puts
puts "Lists of tokens : "
p tokens
Input data.txt:
....+....1....+....2....+....3....+....4....+....5....+....6....+....7....+....8
ACOMAND 1.0 1.0
ACOMAND
ACOMAND 1.0
ACOMAND 1.0 1.0 1300.2 .9 1.0
ACOMAND 1.0 1.0 1300.2 .9
ACOMAND OKK 1.0 1300.2 .9 1.0 WOW
ACOMAND 1.0 1.0 1300.2
Output:
$ ruby -w split.rb
....+....1....+....2....+....3....+....4....+....5....+....6....+....7....+....8
line 2 : ACOMAND 1.0 1.0
line 3 : ACOMAND
line 4 : ACOMAND 1.0
line 5 : ACOMAND 1.0 1.0 1300.2 .9 1.0
line 6 : ACOMAND 1.0 1.0 1300.2 .9
line 7 : ACOMAND OKK 1.0 1300.2 .9 1.0 WOW
line 8 : ACOMAND 1.0 1.0 1300.2
Lists of tokens :
[[["ACOMAND", 10, "OK"], ["1.0", 20, "OK"], ["1.0", 29, "WRONG"]],
[["ACOMAND", 10, "OK"]], [["ACOMAND", 10, "OK"], ["1.0", 20, "OK"]],
[["ACOMAND", 10, "OK"], ["1.0", 20, "OK"], ["1.0", 30, "OK"], ["1300.2",
40, "OK"], ["", 50, "OK"], [".9", 58, "WRONG"], ["1.0", 68, "WRONG"]],
[["ACOMAND", 10, "OK"], ["1.0", 20, "OK"], ["1.0", 30, "OK"], ["1300.2",
40, "OK"], ["", 50, "OK"], [".9", 60, "OK"]], [["ACOMAND", 10, "OK"],
["OKK", 20, "OK"], ["1.0", 30, "OK"], ["1300.2", 40, "OK"], ["", 50,
"OK"], [".9", 60, "OK"], ["1.0", 70, "OK"], ["WOW", 80, "OK"]],
[["ACOMAND", 10, "OK"], ["1.0", 20, "OK"], ["1.0", 30, "OK"], ["1300.2",
40, "OK"]]]
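1) Does it matter to check that ACOMAND starts at column 1 and that the other values are aligned at fixed positions? Otherwise, why not simply 'ID VALUE*'? 2) Please give all of the necessary grammar so that we can run it. I am missing WS and get an 'implicit token definition' warning. – BernardK
1) Yes, it matters, which is why I think I have to use predicates to ensure correct alignment. 2) I have now added the missing WS rule; I am sorry I left it out. – rooms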