這裏的問題是,您正試圖在解析器中創建時區分「路徑」。構建詞法分析器內的路徑會更容易些(僞代碼如下):
grammar T;
tokens {
JAVA_TYPE_PATH,
JAVA_FIELD_PATH
}
// parser rules
PATH
: IDENTIFIER ('.' IDENTIFIER)*
{
String s = getText();
if (s is a Java class) {
setType(JAVA_TYPE_PATH);
} else if (s is a Java field) {
setType(JAVA_FIELD_PATH);
}
}
;
fragment IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;
然後在解析器,你會怎麼做:
expression
: JAVA_TYPE_PATH #javaTypeExpression
| JAVA_FIELD_PATH #javaFieldExpression
| PATH #pathExpression
;
不過,當然,這樣的java./*comment*/lang.String
輸入將被標記化錯。
在解析器中處理這一切意味着在令牌流中手動預測並檢查Java類型或字段是否存在。
一個快速演示:
grammar T;
@parser::members {
String getPathAhead() {
Token token = _input.LT(1);
if (token.getType() != IDENTIFIER) {
return null;
}
StringBuilder builder = new StringBuilder(token.getText());
// Try to collect ('.' IDENTIFIER)*
for (int stepsAhead = 2; ; stepsAhead += 2) {
Token expectedDot = _input.LT(stepsAhead);
Token expectedIdentifier = _input.LT(stepsAhead + 1);
if (expectedDot.getType() != DOT || expectedIdentifier.getType() != IDENTIFIER) {
break;
}
builder.append('.').append(expectedIdentifier.getText());
}
return builder.toString();
}
boolean javaTypeAhead() {
String path = getPathAhead();
if (path == null) {
return false;
}
try {
return Class.forName(path) != null;
} catch (Exception e) {
return false;
}
}
boolean javaFieldAhead() {
String path = getPathAhead();
if (path == null || !path.contains(".")) {
return false;
}
int lastDot = path.lastIndexOf('.');
String typeName = path.substring(0, lastDot);
String fieldName = path.substring(lastDot + 1);
try {
Class<?> clazz = Class.forName(typeName);
return clazz.getField(fieldName) != null;
} catch (Exception e) {
return false;
}
}
}
expression
: {javaTypeAhead()}? path #javaTypeExpression
| {javaFieldAhead()}? path #javaFieldExpression
| path #pathExpression
;
path
: dotIdentifierSequence
;
dotIdentifierSequence
: IDENTIFIER (DOT IDENTIFIER)*
;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
DOT
: '.'
;
可與下面的類測試:
package tl.antlr4;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.misc.NotNull;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
public class Main {
public static void main(String[] args) {
String[] tests = {
"mu",
"tl.antlr4.The",
"java.lang.String",
"foo.bar.Baz",
"tl.antlr4.The.answer",
"tl.antlr4.The.ANSWER"
};
for (String test : tests) {
TLexer lexer = new TLexer(new ANTLRInputStream(test));
TParser parser = new TParser(new CommonTokenStream(lexer));
ParseTreeWalker.DEFAULT.walk(new TestListener(), parser.expression());
}
}
}
class TestListener extends TBaseListener {
@Override
public void enterJavaTypeExpression(@NotNull TParser.JavaTypeExpressionContext ctx) {
System.out.println("JavaTypeExpression -> " + ctx.getText());
}
@Override
public void enterJavaFieldExpression(@NotNull TParser.JavaFieldExpressionContext ctx) {
System.out.println("JavaFieldExpression -> " + ctx.getText());
}
@Override
public void enterPathExpression(@NotNull TParser.PathExpressionContext ctx) {
System.out.println("PathExpression -> " + ctx.getText());
}
}
class The {
public static final int ANSWER = 42;
}
這將以下內容打印到控制檯:
PathExpression -> mu
JavaTypeExpression -> tl.antlr4.The
JavaTypeExpression -> java.lang.String
PathExpression -> foo.bar.Baz
PathExpression -> tl.antlr4.The.answer
JavaFieldExpression -> tl.antlr4.The.ANSWER
啊,我從來沒有考慮將其轉移到詞法分析器。雖然它們非常有意義,但它們都代表不同的標記類型。感謝您的創造性解決方案! –
從這個問題,我認爲OP已經在做這種檢查,以確定這樣的標識符是路徑還是類/類型引用,並且需要不同的解決方案。順便說一句。從功能角度來看,如果您在解析器或詞法分析器中進行此檢查,但在詞法分析器手段中進行手動空白處理,則沒有區別。 –