I have a well-formed tree that represents a mathematical expression. For example, given the string: "1+2-3*4/5"
, this gets parsed into:
subtract(add(1,2),divide(multiply(3,4),5))
Which is expressed as this tree:
What I'd like to be able to do is take this tree and reduce it as much as possible. In the case above, this is pretty simple, because all of the numbers are constants. However, things start to get trickier once I allow for unknowns (denoted with a $
followed by an identifier):
"3*$a/$a"
becomes divide(multiply(3,$a), $a)
This should simplify to 3
, since the $a
terms should cancel each other out. The question is, "how do I go about recognizing this in a generic manner?" How do I recognize that min(3, sin($x))
is always going to be sin($x)
? How do I recognize that sqrt(pow($a, 2))
is abs($a)
? How do I recognize that nthroot(pow(42, $a), $a)
(the ath root of 42 to the ath power) is 42
?
I realize this question is pretty broad, but I've been beating my head against this for a while and haven't come up with anything satisfactory enough.
You probably want to implement a term rewriting system. Regarding the underlying math, have a look at WikiPedia.
Structure of a term rewrite module
Since I implemented a solution recently...
First, prepare a class CExpression, which models the structure of your expression.
Implement CRule
, which contains a pattern and a replacement. Use special symbols as pattern variables, which need to get bound during pattern matching and replaced in the replacement expression.
Then, implement a class CRule
. It's main method applyRule(CExpression, CRule)
tries to match the rule against any applicable subexpression of expression. In case it matches, return the result.
Finaly, implement a class CRuleSet
, which is simply a set of CRule objects. The main method reduce(CExpression)
applies the set of rules as long as no more rules can be applied and then returns the reduced expression.
Additionally, you need a class CBindingEnvironment
, which maps already matched symbols to the matched values.
Try to rewrite expression to a normal form
Don't forget, that this approach works to a certain point, but is likely to be non complete. This is due to the fact, that all of the following rules perform local term rewrites.
To make this local rewrite logic stronger, one should try to transform expressions into something I'd call a normal form. This is my approach:
If a term contains literal values, try to move the term as far to the right as possible.
Eventually, this literal value may appear rightmost and can be evaluated as part of a fully literal expression.
When to evaluate fully literal expression
An interesting question is when to evaluate fully literal expression. Suppose you have an expression
x * ( 1 / 3 )
which would reduce to
x * 0.333333333333333333
Now suppose x gets replaced by 3. This would yield something like
0.999999999999999999999999
Thus eager evaluation returns a slightly incorrect value.
At the other side, if you keep ( 1 / 3 ) and first replace x by 3
3 * ( 1 / 3 )
a rewrite rule would give
1
Thus, it might be useful to evaluate fully literal expression late.
Examples of rewrite rules
Here is how my rules appear inside the application: The _1, _2, ... symbols match any subexpression:
addRule( new TARuleFromString( '0+_1', // left hand side :: pattern
'_1' // right hand side :: replacement
)
);
or a bit more complicated
addRule( new TARuleFromString( '_1+_2*_1',
'(1+_2)*_1'
)
);
Certain special symbols only match special subexpressions. E.g. _Literal1, _Literal2, ... match only literal values:
addRule( new TARuleFromString( 'exp(_Literal1) * exp(_Literal2 )',
'exp( _Literal1 + _Literal2 )'
)
);
This rule moves non-literal expression to the left:
addRule( new TARuleFromString( '_Literal*_NonLiteral',
'_NonLiteral*_Literal'
)
);
Any name, that begins with a '_', is a pattern variable. While the system matches a rule, it keeps a stack of assignments of already matched symbols.
Finally, don't forget that rules may yield non terminating replacement sequences. Thus while reducing expression, make the process remember, which intermediate expressions have already been reached before.
In my implementation, I don't save intermediate expressions directly. I keep an array of MD5() hashes of intermediate expression.
A set of rules as a starting point
Here's a set of rules to get started:
addRule( new TARuleFromString( '0+_1', '_1' ) );
addRule( new TARuleFromString( '_Literal2=0-_1', '_1=0-_Literal2' ) );
addRule( new TARuleFromString( '_1+0', '_1' ) );
addRule( new TARuleFromString( '1*_1', '_1' ) );
addRule( new TARuleFromString( '_1*1', '_1' ) );
addRule( new TARuleFromString( '_1+_1', '2*_1' ) );
addRule( new TARuleFromString( '_1-_1', '0' ) );
addRule( new TARuleFromString( '_1/_1', '1' ) );
// Rate = (pow((EndValue / BeginValue), (1 / (EndYear - BeginYear)))-1) * 100
addRule( new TARuleFromString( 'exp(_Literal1) * exp(_Literal2 )', 'exp( _Literal1 + _Literal2 )' ) );
addRule( new TARuleFromString( 'exp( 0 )', '1' ) );
addRule( new TARuleFromString( 'pow(_Literal1,_1) * pow(_Literal2,_1)', 'pow(_Literal1 * _Literal2,_1)' ) );
addRule( new TARuleFromString( 'pow( _1, 0 )', '1' ) );
addRule( new TARuleFromString( 'pow( _1, 1 )', '_1' ) );
addRule( new TARuleFromString( 'pow( _1, -1 )', '1/_1' ) );
addRule( new TARuleFromString( 'pow( pow( _1, _Literal1 ), _Literal2 )', 'pow( _1, _Literal1 * _Literal2 )' ) );
// addRule( new TARuleFromString( 'pow( _Literal1, _1 )', 'ln(_1) / ln(_Literal1)' ) );
addRule( new TARuleFromString( '_literal1 = pow( _Literal2, _1 )', '_1 = ln(_literal1) / ln(_Literal2)' ) );
addRule( new TARuleFromString( 'pow( _Literal2, _1 ) = _literal1 ', '_1 = ln(_literal1) / ln(_Literal2)' ) );
addRule( new TARuleFromString( 'pow( _1, _Literal2 ) = _literal1 ', 'pow( _literal1, 1 / _Literal2 ) = _1' ) );
addRule( new TARuleFromString( 'pow( 1, _1 )', '1' ) );
addRule( new TARuleFromString( '_1 * _1 = _literal', '_1 = sqrt( _literal )' ) );
addRule( new TARuleFromString( 'sqrt( _literal * _1 )', 'sqrt( _literal ) * sqrt( _1 )' ) );
addRule( new TARuleFromString( 'ln( _Literal1 * _2 )', 'ln( _Literal1 ) + ln( _2 )' ) );
addRule( new TARuleFromString( 'ln( _1 * _Literal2 )', 'ln( _Literal2 ) + ln( _1 )' ) );
addRule( new TARuleFromString( 'log2( _Literal1 * _2 )', 'log2( _Literal1 ) + log2( _2 )' ) );
addRule( new TARuleFromString( 'log2( _1 * _Literal2 )', 'log2( _Literal2 ) + log2( _1 )' ) );
addRule( new TARuleFromString( 'log10( _Literal1 * _2 )', 'log10( _Literal1 ) + log10( _2 )' ) );
addRule( new TARuleFromString( 'log10( _1 * _Literal2 )', 'log10( _Literal2 ) + log10( _1 )' ) );
addRule( new TARuleFromString( 'ln( _Literal1 / _2 )', 'ln( _Literal1 ) - ln( _2 )' ) );
addRule( new TARuleFromString( 'ln( _1 / _Literal2 )', 'ln( _Literal2 ) - ln( _1 )' ) );
addRule( new TARuleFromString( 'log2( _Literal1 / _2 )', 'log2( _Literal1 ) - log2( _2 )' ) );
addRule( new TARuleFromString( 'log2( _1 / _Literal2 )', 'log2( _Literal2 ) - log2( _1 )' ) );
addRule( new TARuleFromString( 'log10( _Literal1 / _2 )', 'log10( _Literal1 ) - log10( _2 )' ) );
addRule( new TARuleFromString( 'log10( _1 / _Literal2 )', 'log10( _Literal2 ) - log10( _1 )' ) );
addRule( new TARuleFromString( '_Literal1 = _NonLiteral + _Literal2', '_Literal1 - _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = _NonLiteral * _Literal2', '_Literal1 / _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = _NonLiteral / _Literal2', '_Literal1 * _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 =_NonLiteral - _Literal2', '_Literal1 + _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral + _Literal2 = _Literal1 ', '_Literal1 - _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral * _Literal2 = _Literal1 ', '_Literal1 / _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral / _Literal2 = _Literal1 ', '_Literal1 * _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral - _Literal2 = _Literal1', '_Literal1 + _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral - _Literal2 = _Literal1 ', '_Literal1 + _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal2 - _NonLiteral = _Literal1 ', '_Literal2 - _Literal1 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = sin( _NonLiteral )', 'asin( _Literal1 ) = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = cos( _NonLiteral )', 'acos( _Literal1 ) = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = tan( _NonLiteral )', 'atan( _Literal1 ) = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = ln( _1 )', 'exp( _Literal1 ) = _1' ) );
addRule( new TARuleFromString( 'ln( _1 ) = _Literal1', 'exp( _Literal1 ) = _1' ) );
addRule( new TARuleFromString( '_Literal1 = _NonLiteral', '_NonLiteral = _Literal1' ) );
addRule( new TARuleFromString( '( _Literal1 / _2 ) = _Literal2', '_Literal1 / _Literal2 = _2 ' ) );
addRule( new TARuleFromString( '_Literal*_NonLiteral', '_NonLiteral*_Literal' ) );
addRule( new TARuleFromString( '_Literal+_NonLiteral', '_NonLiteral+_Literal' ) );
addRule( new TARuleFromString( '_Literal1+(_Literal2+_NonLiteral)', '_NonLiteral+(_Literal1+_Literal2)' ) );
addRule( new TARuleFromString( '_Literal1+(_Literal2+_1)', '_1+(_Literal1+_Literal2)' ) );
addRule( new TARuleFromString( '(_1*_2)+(_3*_2)', '(_1+_3)*_2' ) );
addRule( new TARuleFromString( '(_2*_1)+(_2*_3)', '(_1+_3)*_2' ) );
addRule( new TARuleFromString( '(_2*_1)+(_3*_2)', '(_1+_3)*_2' ) );
addRule( new TARuleFromString( '(_1*_2)+(_2*_3)', '(_1+_3)*_2' ) );
addRule( new TARuleFromString( '(_Literal * _1 ) / _Literal', '_1' ) );
addRule( new TARuleFromString( '(_Literal1 * _1 ) / _Literal2', '(_Literal1 * _Literal2 ) / _1' ) );
addRule( new TARuleFromString( '(_1+_2)+_3', '_1+(_2+_3)' ) );
addRule( new TARuleFromString( '(_1*_2)*_3', '_1*(_2*_3)' ) );
addRule( new TARuleFromString( '_1+(_1+_2)', '(2*_1)+_2' ) );
addRule( new TARuleFromString( '_1+_2*_1', '(1+_2)*_1' ) );
addRule( new TARuleFromString( '_literal1 * _NonLiteral = _literal2', '_literal2 / _literal1 = _NonLiteral' ) );
addRule( new TARuleFromString( '_literal1 + _NonLiteral = _literal2', '_literal2 - _literal1 = _NonLiteral' ) );
addRule( new TARuleFromString( '_literal1 - _NonLiteral = _literal2', '_literal1 - _literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_literal1 / _NonLiteral = _literal2', '_literal1 * _literal2 = _NonLiteral' ) );
Make rules first-class expressions
An interesting point: Since the above rules are special expression, which get correctly evaluate by the expression parser, users can even add new rules and thus enhance the application's rewrite capabilities.
Parsing expressions (or more general: languages)
For Cocoa/OBjC applications, Dave DeLong's DDMathParser is a perfect candidate to syntactically analyse mathematical expressions.
For other languages, our old friends Lex & Yacc or the newer GNU Bison might be of help.
Far younger and with an enourmous set of ready to use syntax-files, ANTLR is a modern parser generator based on Java. Besides purely command-line use, ANTLRWorks provides a GUI frontend to construct and debug ANTLR based parsers. ANTLR generates grammars for various host language, like JAVA, C, Python, PHP or C#. The ActionScript runtime is currently broken.
In case you'd like to learn how to parse expressions (or languages in general) from the bottom-up, I'd propose this free book's text from Niklaus Wirth (or the german book edition), the famous inventor of Pascal and Modula-2.