I've tried taking this code and converting it to something for a project I'm working on for programming language processing, but I'm running into an issue with a simplified version:
op = oneOf( '+ - / *')
lparen, rparen = Literal('('), Literal(')')
expr = Forward()
expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen) )
I've played around with a number of different modifications of this simple setup. Usually, trying something like:
print(expr.parseString('1+2'))
will return ['1'], while I get caught in deep recursion with something like:
print(expr.parseString('(1+2)'))
What am I missing with respect to simple recursion that I can't parse arbitrary arithmetic expressions, such as 1+(2 * 3-(4*(5+6)-(7))...?
Wow, I guess pyparsing is really on the map! Thanks Alex and John for stepping in on this question. You are both on the mark with your responses. But let me add a comment or two:
If we suppress the opening and closing parenthesis symbols, and group the parenthesized expression using Group, pyparsing will return a structured result that is closer to an AST.
from pyparsing import Literal, Word, ZeroOrMore, Forward, nums, oneOf, Group

def Syntax():
    # All four operators, with no precedence between them
    op = oneOf('+ - * /')
    lpar = Literal('(').suppress()
    rpar = Literal(')').suppress()
    num = Word(nums)
    expr = Forward()
    # Group each parenthesized sub-expression into its own nested list
    atom = num | Group(lpar + expr + rpar)
    expr << atom + ZeroOrMore(op + atom)
    return expr

if __name__ == "__main__":
    expr = Syntax()

    def test(s):
        results = expr.parseString(s)
        print(s, '->', results)

    test("(9 + 3)")
    test("(9 + 3) * (4 / 5)")
Giving:
(9 + 3) -> [['9', '+', '3']]
(9 + 3) * (4 / 5) -> [['9', '+', '3'], '*', ['4', '/', '5']]
Otherwise, pyparsing is just tokenizing, and you have to walk the list of parsed tokens to find the nested expressions.
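To illustrate what walking the grouped result might look like, here is a minimal sketch that evaluates the nested lists produced by a grammar like the one above. The names build_parser, evaluate, and OPS are made up for this illustration and are not part of pyparsing. It also assumes strict left-to-right evaluation, because this grammar carries no operator precedence.

```python
import operator

from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, nums, oneOf

# Hypothetical lookup table mapping operator symbols to functions.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}


def build_parser():
    """Same grammar shape as above: atoms joined by operators, no precedence."""
    op = oneOf('+ - * /')
    lpar, rpar = Suppress('('), Suppress(')')
    expr = Forward()
    atom = Word(nums) | Group(lpar + expr + rpar)
    expr <<= atom + ZeroOrMore(op + atom)
    return expr


def evaluate(tokens):
    """Evaluate a (possibly nested) token list strictly left to right."""
    def value(tok):
        # Plain strings are numbers; anything else is a nested group.
        return float(tok) if isinstance(tok, str) else evaluate(tok)

    tokens = list(tokens)
    result = value(tokens[0])
    # Tokens alternate: operand, operator, operand, operator, ...
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, value(tokens[i + 1]))
    return result


parser = build_parser()
parsed = parser.parseString("1 + 2 * 3", parseAll=True)
print(evaluate(parsed))  # 9.0, since (1 + 2) * 3 is computed left to right
```

Because each Group becomes a nested list, the recursion in evaluate mirrors the nesting of the parentheses. Without Group, the tokens would arrive as one flat list and the parenthesized structure would have to be rebuilt by hand.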
Since op is defined as just oneOf("+ - * /"), there is no precedence of operations. There are examples in the pyparsing repo at https://github.com/pyparsing/pyparsing/tree/master/examples of the manual way to define this (fourFn.py), or the more recent approach using the infixNotation helper (simpleArith.py). Again, this has pyparsing adding more value than just tokenizing.
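As a rough idea of the infixNotation approach, here is a short sketch. It is not copied from simpleArith.py. It assumes integer operands only and two precedence levels, with multiplication and division binding tighter than addition and subtraction.

```python
from pyparsing import Word, infixNotation, nums, oneOf, opAssoc

# Operand: a run of digits. Parentheses are handled by infixNotation itself.
operand = Word(nums)

# Precedence levels are listed from tightest-binding to loosest.
arith = infixNotation(
    operand,
    [
        (oneOf('* /'), 2, opAssoc.LEFT),  # binary, left-associative
        (oneOf('+ -'), 2, opAssoc.LEFT),  # binary, left-associative
    ],
)

result = arith.parseString("1 + 2 * 3", parseAll=True)
print(result.asList())  # [['1', '+', ['2', '*', '3']]]
```

Here the multiplication ends up in its own nested group, so the precedence is already encoded in the structure of the parsed result.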
To the OP: please check out those examples. I think they will help move you forward on your project.
-- Paul