I have the following definition for an Identifier:
Identifier --> letter { letter | digit }
Basically I have an identifier function that gets a string from a file and tests it to make sure that it's a valid identifier as defined above.
I've tried this:
if re.match('\w+(\w\d)?', i):
    return True
else:
    return False
but when I run my program, every time it meets an integer it treats it as a valid identifier.
For example, given
c = 0 ;
it prints c as a valid identifier, which is fine, but it also prints 0 as a valid identifier.
What am I doing wrong here?
From the official reference: identifier ::= (letter|"_") (letter | digit | "_")*
So the regular expression is:
^[^\d\W]\w*\Z
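To see why the pattern in the question accepts 0, it may help to compare the two patterns side by side. In the question's pattern, \w matches digits as well as letters, the trailing group is optional, and re.match only anchors at the start of the string, so \w+ alone is enough to match "0". The variable names below (question_pattern, answer_pattern) are only illustrative, and this is a rough sketch rather than a full test:

```python
import re

# Pattern from the question: \w also matches digits, and (\w\d)? is optional,
# so a lone digit is matched by \w+ on its own.
question_pattern = re.compile(r'\w+(\w\d)?')

# Pattern from this answer, piece by piece:
#   [^\d\W]  one character that is neither a digit nor a non-word character,
#            i.e. a letter or an underscore
#   \w*      any number of letters, digits, or underscores
#   \Z       the very end of the string (unlike $, no trailing newline allowed)
answer_pattern = re.compile(r'^[^\d\W]\w*\Z')

for text in ["c", "0", "c0"]:
    print(text, bool(question_pattern.match(text)), bool(answer_pattern.match(text)))
```

Only the second column changes for "0": the question's pattern accepts it, while the anchored pattern rejects it because the first character must not be a digit.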
Example (for Python 2, just omit re.UNICODE):
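import re
# First character: a word character that is not a digit; then any word characters up to the end
identifier = re.compile(r"^[^\d\W]\w*\Z", re.UNICODE)
tests = [ "a", "a1", "_a1", "1a", "aa$%@%", "aa bb", "aa_bb", "aa\n" ]
for test in tests:
    # match() anchors at the start; \Z in the pattern anchors at the end
    result = identifier.match(test)
    print("%r\t= %s" % (test, (result is not None)))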
Result:
'a' = True
'a1' = True
'_a1' = True
'1a' = False
'aa$%@%' = False
'aa bb' = False
'aa_bb' = True
'aa\n' = False
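Note that the pattern above follows Python's own identifier rule, so it accepts underscores and, with re.UNICODE, non-ASCII letters. The grammar in the question is stricter (a letter followed by letters or digits). If that is what you need, something along these lines might work. This is a minimal sketch that assumes "letter" means ASCII A-Z and a-z and "digit" means 0-9; the names STRICT_IDENTIFIER and is_identifier are made up for illustration, and re.fullmatch needs Python 3.4 or later:

```python
import re

# Assumption: letters are ASCII only and underscores are not allowed,
# following the grammar  Identifier --> letter { letter | digit }
STRICT_IDENTIFIER = re.compile(r'[A-Za-z][A-Za-z0-9]*')

def is_identifier(token):
    """Return True if token is a letter followed by letters or digits."""
    # fullmatch requires the whole string to match, so no anchors are needed
    return STRICT_IDENTIFIER.fullmatch(token) is not None

# The line from the question, split on whitespace
for token in "c = 0 ;".split():
    print(repr(token), is_identifier(token))
```

With this version only c is reported as an identifier for the input c = 0 ;.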