Parsing an arithmetic expression and building a tree from it in Java

ChocolateBear picture ChocolateBear · Jan 4, 2011 · Viewed 59.9k times · Source

I needed some help with creating custom trees given an arithmetic expression. Say, for example, you input this arithmetic expression:

(5+2)*7

The result tree should look like:

    *
   / \
  +   7
 / \
5   2

I have some custom classes to represent the different types of nodes, i.e. PlusOp, LeafInt, etc. I don't need to evaluate the expression, just create the tree, so I can perform other functions on it later. Additionally, the negative operator '-' can only have one child, and to represent '5-2', you must input it as 5 + (-2).

Some validation on the expression would be required to ensure each type of operator has the correct the no. of arguments/children, each opening bracket is accompanied by a closing bracket.

Also, I should probably mention my friend has already written code which converts the input string into a stack of tokens, if that's going to be helpful for this.

I'd appreciate any help at all. Thanks :)

(I read that you can write a grammar and use antlr/JavaCC, etc. to create the parse tree, but I'm not familiar with these tools or with writing grammars, so if that's your solution, I'd be grateful if you could provide some helpful tutorials/links for them.)

Answer

Bill K picture Bill K · Jan 4, 2011

Assuming this is some kind of homework and you want to do it yourself..

I did this once, you need a stack

So what you do for the example is:

    parse    what to do?                Stack looks like
      (      push it onto the stack     (
      5      push 5                     (, 5
      +      push +                     (, 5, +
      2      push 2                     (, 5, +, 2
      )      evaluate until (           7            
      *      push *                     7, *
      7      push 7                     +7, *, 7
      eof    evaluate until top         49

The symbols like "5" or "+" can just be stored as strings or simple objects, or you could store the + as a +() object without setting the values and set them when you are evaluating.

I assume this also requires an order of precedence, so I'll describe how that works.

in the case of: 5 + 2 * 7

you have to push 5 push + push 2 next op is higher precedence so you push it as well, then push the three. When you encounter either a ) or the end of file or an operator with lower or equal precedence you start calculating the stack to the previous ( or the beginning of the file.

Because your stack now contains 5 + 2 * 7, when you evaluate it you pop the 2 * 7 first and push the resulting *(2,7) node onto the stack, then once more you evaluate the top three things on the stack (5 + *node) so the tree comes out correct.

If it was ordered the other way: 5 * 2 + 7, you would push until you got to a stack with "5 * 2" then you would hit the lower precedence + which means evaluate what you've got now. You'd evaluate the 5 * 2 into a *node and push it, then you'd continue by pushing the + and 3 so you had *node + 7, at which point you'd evaluate that.

This means you have a "highest current precedence" varaible that is storing a 1 when you push a +/-, a 2 when you push a * or / and a 3 for "^". This way you can just test the variable to see if your next operator's precedence is < = your current precedence.

if ")" is considered priority 4 you can treat it as other operators except that it removes the matching "(", a lower priority would not.