antlr3 - Generating a Parse Tree

HugeAntlrs picture HugeAntlrs · May 12, 2011 · Viewed 7.1k times · Source

I'm having trouble figuring out the antlr3 API so I can generate and use a parse tree in some javascript code. When I open the grammar file using antlrWorks (their IDE), the interpreter is able to show me the parse tree, and it's even correct.

I'm having a lot of difficulties tracking down resources on how to get this parse tree in my code using the antlr3 runtime. I've been messing around with the various functions in the runtime and Parser files but to no avail:

var input = "(PR=5000)",
cstream = new org.antlr.runtime.ANTLRStringStream(input),
lexer = new TLexer(cstream),
tstream = new org.antlr.runtime.CommonTokenStream(lexer),
parser = new TParser(tstream);

var tree = parser.query().tree;
var nodeStream = new org.antlr.runtime.tree.CommonTreeNodeStream(tree);
nodeStream.setTokenStream(tstream);

parseTree = new org.antlr.runtime.tree.TreeParser(nodeStream);

Since antlrWorks can display the parse tree without any tree grammar from myself, and since I have read that antlr automatically generates a parse tree from the grammar file, I'm assuming that I can access this basic parse tree with some runtime functions that I am probably not aware of. Am I correct in this thinking?

Answer

Bart Kiers picture Bart Kiers · May 12, 2011

HugeAntlrs wrote:

Since antlrWorks can display the parse tree without any tree grammar from myself, and since I have read that antlr automatically generates a parse tree from the grammar file, I'm assuming that I can access this basic parse tree with some runtime functions that I am probably not aware of. Am I correct in this thinking?

No, that is incorrect. ANTLR creates a flat, 1 dimensional stream of tokens.

ANTLRWorks creates its own parse tree on the fly when interpreting some source. You have no access to this tree (not with Javascript or even with Java). You will have to define the tokens that you think should be the roots of your (sub) trees and/or define the tokens that need to be removed from your AST. Checkout the following Q&A that explains how to create a proper AST: How to output the AST built using ANTLR?

EDIT

Since there's no proper JavaScript demo on SO yet, here's a quick demo.

The following grammar parses boolean expression with the following operators:

  • or
  • and
  • is
  • not

where not has the highest precedence.

Of course there are true and false, and the expressions can be grouped using parenthesis.

file: Exp.g

grammar Exp;

options {
  output=AST;
  language=JavaScript;
}

parse
  :  exp EOF -> exp
  ;

exp
  :  orExp
  ;

orExp
  :  andExp (OR^ andExp)*
  ;

andExp
  :  eqExp (AND^ eqExp)*
  ;

eqExp
  :  unaryExp (IS^ unaryExp)*
  ;

unaryExp
  :  NOT atom -> ^(NOT atom)
  |  atom
  ;

atom
  :  TRUE
  |  FALSE
  |  '(' exp ')' -> exp
  ;

OR     : 'or' ;
AND    : 'and' ;
IS     : 'is' ;
NOT    : 'not' ;
TRUE   : 'true' ;
FALSE  : 'false' ;
SPACE  : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

The grammar above produces an AST which can be fed to the tree-walker below:

file: ExpWalker.g

tree grammar ExpWalker;

options {
  tokenVocab=Exp;
  ASTLabelType=CommonTree;
  language=JavaScript;
}

// `walk` returns a string
walk returns [expr]
  :  exp {expr = ($exp.expr == 1) ? 'True' : 'False';}
  ;

// `exp` returns either 1 (true) or 0 (false)
exp returns [expr]
  :  ^(OR  a=exp b=exp) {expr = ($a.expr == 1 || $b.expr == 1) ? 1 : 0;}
  |  ^(AND a=exp b=exp) {expr = ($a.expr == 1 && $b.expr == 1) ? 1 : 0;}
  |  ^(IS  a=exp b=exp) {expr = ($a.expr == $b.expr) ? 1 : 0;}
  |  ^(NOT a=exp)       {expr = ($a.expr == 1) ? 0 : 1;}
  |  TRUE               {expr = 1;}
  |  FALSE              {expr = 0;}
  ;

(apologies for the messy JavaScript code inside { ... }: I have very little experience with JavaScript!)

Now download ANTLR 3.3 (no earlier version!) and the JavaScript runtime files:

Rename antlr-3.3-complete.jar to antlr-3.3.jar and unzip antlr-javascript-runtime-3.1.zip and store all files in the same folder as your Exp.g and ExpWalker.g files.

Now generate the lexer, parser and tree-walker:

java -cp antlr-3.3.jar org.antlr.Tool Exp.g 
java -cp antlr-3.3.jar org.antlr.Tool ExpWalker.g

And test it all with the following html file:

<html>
  <head>
    <script type="text/javascript" src="antlr3-all-min.js"></script>
    <script type="text/javascript" src="ExpLexer.js"></script>
    <script type="text/javascript" src="ExpParser.js"></script>
    <script type="text/javascript" src="ExpWalker.js"></script>

    <script type="text/javascript">

    function init() {
      var evalButton = document.getElementById("eval");
      evalButton.onclick = evalExpression;
    }

    function evalExpression() {
      document.getElementById("answer").innerHTML = "";
      var expression = document.getElementById("exp").value;
      if(expression) {
        var lexer = new ExpLexer(new org.antlr.runtime.ANTLRStringStream(expression));
        var tokens = new org.antlr.runtime.CommonTokenStream(lexer);
        var parser = new ExpParser(tokens);
        var nodes = new org.antlr.runtime.tree.CommonTreeNodeStream(parser.parse().getTree());
        nodes.setTokenStream(tokens);
        var walker = new ExpWalker(nodes);
        var value = walker.walk();
        document.getElementById("answer").innerHTML = expression + " = " + value;
      }
      else {
        document.getElementById("exp").value = "enter an expression here first"; 
      }
    }

    </script>
  </head>
  <body onload="init()">
    <input id="exp" type="text" size="35" />
    <button id="eval">evaluate</button>
    <div id="answer"></div>
  </body>
</html>

And behold the result:

enter image description here