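One of our web applications died because it ran out of memory. The sparse data we gathered from a memory dump points to a problem in our ANTLR parsing implementation: we found an ANTLR token stream containing more than a million items. The input text that caused this has not been found yet. What could cause ANTLR to create a token stream so large that memory runs out?
Could this be related to matching zero-width items? Or is there some other problem in the grammar that would cause excessive memory use?
Here is the grammar we currently use: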
grammar AdvancedQueries;
options {
language = Java;
output = AST;
ASTLabelType=CommonTree;
}
tokens {
FOR;
END;
FIELDSEARCH;
TARGETFIELD;
RELATION;
NOTNODE;
ANDNODE;
NEARDISTANCE;
OUTOFPLACE;
}
@header {
package de.bsmo.fast.parsing;
}
@lexer::header {
package de.bsmo.fast.parsing;
}
startExpression : orEx;
expressionLevel4
: LPARENTHESIS! orEx RPARENTHESIS! | atomicExpression | outofplace;
expressionLevel3
: (fieldExpression) | expressionLevel4 ;
expressionLevel2
: (nearExpression) | expressionLevel3 ;
expressionLevel1
: (countExpression) | expressionLevel2 ;
notEx : NOT^? a=expressionLevel1 ;
andEx : (notEx -> notEx)
(AND? a=notEx -> ^(ANDNODE $andEx $a))*;
orEx : andEx (OR^ andEx)*;
countExpression : COUNT LPARENTHESIS countSub RPARENTHESIS RELATION NUMBERS -> ^(COUNT countSub RELATION NUMBERS);
countSub
: orEx;
nearExpression : NEAR LPARENTHESIS (WORD|PHRASE) MULTIPLESEPERATOR (WORD|PHRASE) MULTIPLESEPERATOR NUMBERS RPARENTHESIS -> ^(NEAR WORD* PHRASE* ^(NEARDISTANCE NUMBERS));
fieldExpression : WORD PROPERTYSEPERATOR fieldSub -> ^(FIELDSEARCH ^(TARGETFIELD WORD) fieldSub);
fieldSub
: WORD | PHRASE | LPARENTHESIS! orEx RPARENTHESIS!;
atomicExpression
: WORD
| PHRASE
| NUMBERS
;
//Out-of-place elements may appear in the parseable input but need to be omitted from the output later.
//Those unwanted elements are captured here.
//MULTIPLESEPERATOR captures an unwanted ","
outofplace
: MULTIPLESEPERATOR -> ^(OUTOFPLACE ^(MULTIPLESEPERATOR));
fragment NUMBER : ('0'..'9');
fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9'|'*'|'?');
fragment QUOTE : ('"');
fragment LESSTHEN : '<';
fragment MORETHEN: '>';
fragment EQUAL: '=';
fragment SPACE : ('\u0009'|'\u0020'|'\u000C'|'\u00A0');
fragment WORDMATTER: ('!'|'0'..'9'|'\u0023'..'\u0027'|'*'|'+'|'\u002D'..'\u0039'|'\u003F'..'\u007E'|'\u00A1'..'\uFFFE');
LPARENTHESIS : '(';
RPARENTHESIS : ')';
AND : ('A'|'a')('N'|'n')('D'|'d');
OR : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT : ('N'|'n')('O'|'o')('T'|'t');
COUNT:('C'|'c')('O'|'o')('U'|'u')('N'|'n')('T'|'t');
NEAR:('N'|'n')('E'|'e')('A'|'a')('R'|'r');
PROPERTYSEPERATOR : ':';
MULTIPLESEPERATOR : ',';
WS : (SPACE) { $channel=HIDDEN; };
NUMBERS : (NUMBER)+;
RELATION : (LESSTHEN | MORETHEN)? EQUAL // '<=', '>=', or '='
| (LESSTHEN | MORETHEN); // '<' or '>'
PHRASE : (QUOTE)(.)*(QUOTE);
WORD : WORDMATTER* ;
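That sounds plausible, especially since zero-length matches like this are not unheard of. Can you think of an input that would trigger it? I mean, ANTLR is greedy by default, so how would it match a zero-length string? – 2013-05-02 13:50:09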
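It usually happens when the input contains a character sequence that matches no (non-zero-length) token. With your grammar, a double-quoted string that is missing its closing '"' should be enough to trigger it. – 2013-05-02 14:25:56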
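Thanks! Hmm, neither ANTLRWorks 1.4.3 nor my code chokes on "xxxyyyzzz". My original question has been answered, but I am struggling to understand why the error could hit our live server and yet cannot be reproduced. Any ideas? – 2013-05-03 07:22:17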
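To make the zero-length match discussed in the comments above more concrete, here is a minimal sketch in plain Java. It is a toy model built on `java.util.regex`, not ANTLR's actual lexer, and the class name `ZeroWidthTokenDemo`, the method `tokenize`, and the `maxTokens` cap are all hypothetical. The pattern `[a-z]*` stands in for a rule like `WORD : WORDMATTER*`, and `[a-z]+` for the same rule with `+` instead of `*`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only (an assumption for illustration, not ANTLR's implementation).
final class ZeroWidthTokenDemo {

    // Repeatedly matches `rule` at the current position and emits the matched
    // text as a token. `maxTokens` is a safety cap so the demo always ends.
    static List<String> tokenize(String input, Pattern rule, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        Matcher m = rule.matcher(input);
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                // Nothing matches here: skip one character, as error recovery would.
                pos++;
                continue;
            }
            tokens.add(m.group());
            // A zero-width match leaves pos unchanged, so the same empty
            // token is emitted again on the next iteration.
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "abc\"def"; // a stray quote that no word character matches
        List<String> star = tokenize(input, Pattern.compile("[a-z]*"), 1000);
        List<String> plus = tokenize(input, Pattern.compile("[a-z]+"), 1000);
        System.out.println("with *: " + star.size() + " tokens");
        System.out.println("with +: " + plus.size() + " tokens " + plus);
    }
}
```

In this sketch the `*` variant stops only because of the cap (1000 tokens, all but the first empty), while the `+` variant yields just `abc` and `def`. Without a cap, the token list would keep growing until memory ran out, which is consistent with the oversized token stream in the memory dump. Whether the real lexer reaches such a state depends on which rule it predicts for the offending character, so an input that looks similar may not reproduce it.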