Ply Lex parsing problem

Question 1

Ply Lex parsing problem

parsing lex ply

Karan · Feb 16, 2011 · Viewed 8.8k times · Source

Answer

Answer

The reason that this didn't work is related to the way ply prioritises matches of tokens, the longest token regex is tested first.

The easiest way to prevent this problem is to match identifiers and reserved words at the same type, and select an appropriate token type based on the match. The following code is similar to an example in the ply documentation

import ply.lex

tokens = [ 'ID', 'NUMBER', 'LESSEQUAL', 'ASSIGN' ]
reserved = {
    'while' : 'WHILE',
    'then' : 'THEN'
}
tokens += reserved.values()

t_ignore    = ' \t'
t_NUMBER    = '\d+'
t_LESSEQUAL = '\<\='
t_ASSIGN    = '\='

def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    if t.value in reserved:
        t.type = reserved[ t.value ]
    return t

def t_error(t):
    print 'Illegal character'
    t.lexer.skip(1)

lexer = ply.lex.lex()
lexer.input("while n <= 0 then h = 1")
while True:
    tok = lexer.token()
    if not tok:
        break
    print tok

Question 2

I'm using ply as my lex parser. My specifications are the following :

t_WHILE = r'while'  
t_THEN = r'then'  
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'  
t_NUMBER = r'\d+'  
t_LESSEQUAL = r'<='  
t_ASSIGN = r'='  
t_ignore  = r' \t'

When i try to parse the following string :

"while n <= 0 then h = 1"

It gives following output :

LexToken(ID,'while',1,0)  
LexToken(ID,'n',1,6)  
LexToken(LESSEQUAL,'<=',1,8)  
LexToken(NUMBER,'0',1,11)  
LexToken(ID,'hen',1,14)      ------> PROBLEM!  
LexToken(ID,'h',1,18)  
LexToken(ASSIGN,'=',1,20)  
LexToken(NUMBER,'1',1,22)

It doesn't recognize the token THEN, instead it takes "hen" as an identifier.

Any ideas?

Ply Lex parsing problem

Answer

Related questions