Error on a new line in Lex - Python

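When I pass the code of a for loop to my program, the line break between { and } is not recognized. Instead, an error is reported, even though a t_newline(t) function exists.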

The input to the program is:

for(int i = 0 ; i < 5 ; i++){ 
}

And the output of the program is:

1 . analizadorLexico.py 
2 . analizadorSintactico.py 
3 . parser.out 
4 . parsetab.py 
5 . prueba1.txt 
6 . cpp.py 
7 . ctokens.py 
8 . lex.py 
9 . yacc.py 
10 . ygen.py 
11 . __init__.py 
12 . lex.cpython-36.pyc 
13 . yacc.cpython-36.pyc 
14 . __init__.cpython-36.pyc 
15 . analizadorLexico.cpython-36.pyc 
16 . parsetab.cpython-36.pyc 

File number: 5 
5 
Escogido el archivoprueba1.txt 
LexToken(FOR,'FOR',1,0) 
LexToken(PA,'(',1,3) 
LexToken(INT,'INT',1,4) 
LexToken(ID,'i',1,8) 
LexToken(ASSIGN,'=',1,10) 
LexToken(NUMBER,0,1,12) 
LexToken(END,';',1,14) 
LexToken(ID,'i',1,16) 
LexToken(LT,'<',1,18) 
LexToken(NUMBER,5,1,20) 
LexToken(END,';',1,22) 
LexToken(ID,'i',1,24) 
LexToken(PLUS,'+',1,25) 
LexToken(PLUS,'+',1,26) 
LexToken(PC,')',1,27) 
LexToken(CA,'{',1,28) 
Error in ' 
' 
LexToken(CC,'}',2,31)

The code:

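# Imports omitted from the post; `ply.lex` is assumed from the file listing above.
import codecs
import os

import ply.lex as lex

reservados = ['FOR','AND','OR','NOT','XOR', 'INT', 'FLOAT', 'DOUBLE',
'SHORT','LONG', 'BOOL']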
tokens = reservados + [ 
     'ID', 
     'NUMBER', 
     'PLUS', 
     'MINUS', 
     'TIMES', 
     'DIVIDE', 
     'DIVE', 
     'ASSIGN', 
     'LT', 
     'MA', 
     'LTE', 
     'MAE', 
     'DIF', 
     'PA', 
     'PC', 
     'ANDC', 
     #'ORC', 
     'NOTC', 
     'MOD', 
     'CMP', 
     'END', 
     'COMMA', 
     'CA', 
     'CC', 
     #'ES' 

] 
t_ignore = ' \t' 
t_ignore_WHITESPACES = r'[ \t]+' 
t_PLUS = r'\+' 
t_MINUS = r'-' 
t_TIMES = r'\*' 
t_DIVIDE = r'/' 
t_ASSIGN = r'=' 
t_LT = r'<' 
t_MA = r'>' 
t_LTE = r'<=' 
t_MAE = r'>=' 
t_DIF = r'\!=' 
t_PA = r'\(' 
t_PC = r'\)' 
t_ANDC = r'\&&' 
#t_ORC = r'\||' 
t_NOTC = r'\!' 
t_DIVE = r'\\' 
t_MOD = r'\%' 
t_CMP = r'==' 
t_END = r'\;' 
t_COMMA = r'\,' 
t_CA = r'{' 
t_CC = r'}' 
#t_ES = r'\ ' 

def t_newline(t): 
    r'\n+' 
    t.lexer.lineno += len(t.value) 

def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    # If the identifier matches a reserved word (in any letter case),
    # upper-case it and use it as the token type.
    if t.value.upper() in reservados:
        t.value = t.value.upper()
        t.type = t.value

    return t

def t_NUMBER(t): 
    r'\d+' 
    t.value = int(t.value)  
    return t 

def t_error(t): 
    print ("Error de sintaxis '%s'" % t.value[0]) 
    t.lexer.skip(1) 


def buscarFicheros(directorio):
    ficheros = []

    numArchivo = ''
    respuesta = False
    cont = 1

    for dirName, subdirList, fileList in os.walk(directorio):
        #print('Directorio encontrado: %s' % dirName)
        for fname in fileList:
            ficheros.append(fname)

    for file in ficheros:
        print ("",cont,".",file)
        cont = cont + 1

    while respuesta == False:
        numArchivo = input('\nNumero del archivo: ')
        print (numArchivo)
        for file in ficheros:
            if file == ficheros[int(numArchivo) - 1]:
                respuesta = True
                break

    print ("Escogido el archivo" + ficheros[int(numArchivo) - 1])
    return ficheros[int(numArchivo) - 1]

directorio = r'C:/Users/Carlos/Desktop/for c++/' 
archivo = buscarFicheros(directorio) 
test = directorio + archivo 

fp = codecs.open(test, "r", "utf-8") 
cadena = fp.read() 
fp.close() 

analizador = lex.lex() 
analizador.input(cadena) 

while True: 
    tok = analizador.token() 
    if not tok : break 
    print (tok)

Thanks for your help.

Source

2017-10-15 ElPapu

Should 't_newline' return 't'? – snakecharmerb

No, @snakecharmerb. This function should not return anything; returning a value raises an exception. – ElPapu

How is 't_newline' passed to the 'analizador' object? – snakecharmerb

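I think the most likely explanation is that the error is caused by Windows line endings (\r\n). \r is not in your list of ignored characters and no rule handles it, so it triggers an error.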
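One way to check this hypothesis is to look at the raw text the lexer receives. The posted token positions are at least consistent with it: '{' is at position 28 and '}' at 31, which leaves exactly two characters in between. The sketch below assumes the input file was saved with \r\n line endings; the file name prueba_crlf.txt and the helper read_both_ways are hypothetical. It compares codecs.open, which the question uses and which reads the file in binary mode, with the built-in open, which translates line endings in text mode.

```python
import codecs
import os
import tempfile


def read_both_ways(path):
    """Return (text via codecs.open, text via built-in open) for comparison."""
    # codecs.open reads the underlying file in binary mode, so "\r\n"
    # reaches the caller unchanged.
    with codecs.open(path, "r", "utf-8") as fp:
        raw = fp.read()
    # Built-in open() in text mode translates "\r\n" to "\n" by default.
    with open(path, "r", encoding="utf-8") as fp:
        translated = fp.read()
    return raw, translated


# Hypothetical sample file, written with Windows line endings.
sample = os.path.join(tempfile.mkdtemp(), "prueba_crlf.txt")
with open(sample, "wb") as fp:
    fp.write(b"for(int i = 0 ; i < 5 ; i++){\r\n}")

raw, translated = read_both_ways(sample)
print(repr(raw))         # the carriage return is still present
print(repr(translated))  # only "\n" remains
```

If repr() of the text read with codecs.open shows \r, the lexer is receiving a character that none of its rules cover.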

If that is the problem, the simplest solution is to add \r to t_ignore. (I don't see any point in having both t_ignore and t_ignore_WHITESPACES, so I suggest you remove one of them.)
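A minimal sketch of that change, assuming PLY is installed and importable as ply.lex. The token set is cut down to three tokens for brevity, and tokenize and errors are hypothetical helpers that are not part of the question's code.

```python
import ply.lex as lex

# Reduced token set, just enough to lex "a{" + newline + "}".
tokens = ['ID', 'CA', 'CC']

# '\r' is added so the carriage return of a Windows "\r\n" ending is skipped.
t_ignore = ' \t\r'

t_CA = r'\{'
t_CC = r'\}'

# Characters that reached t_error, collected for inspection.
errors = []


def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)


def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    return t


def t_error(t):
    errors.append(t.value[0])
    t.lexer.skip(1)


def tokenize(text):
    """Lex `text` and return a list of (type, value, lineno) tuples."""
    lexer = lex.lex()
    lexer.input(text)
    result = []
    while True:
        tok = lexer.token()
        if not tok:
            break
        result.append((tok.type, tok.value, tok.lineno))
    return result


for item in tokenize("a{\r\n}"):
    print(item)
print("characters sent to t_error:", errors)
```

With '\r' in t_ignore the carriage return is dropped before any rule is tried, and t_newline still counts the line because the '\n' that follows is matched as before.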

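However, I could not reproduce the error output you provided. The code in your post does not seem to contain any function that could print the string Error in '..., so this may simply be the result of pasting output from a different version of the code.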

Source

2017-10-16 14:40:02 rici
