Get Control flow graph from Abstract Syntax Tree

user5915 picture user5915 · Sep 18, 2008 · Viewed 10.3k times · Source

I have an AST derived from the ANTLR Parser Generator for Java. What I want to do is somehow construct a control flow graph of the source code, where each statement or expression is a unique Node. I understand there must be some recursiveness to this identification, I was wondering what you would suggest as the best option and if ANTLR has a toolset I can use for this job. Cheers, Chris


EDIT - My main concern is to get a control flow graph(CFG) from the AST. This way I can get a tree representation of the source. To clarify, both the source code and the implementation language is Java.

Answer

EfForEffort picture EfForEffort · Sep 18, 2008

Usually CFGs are computed on a lower-level representation (e.g. JVM bytecode). Someone did a thesis on such things a few years ago. There might be a helpful way described in there for how to get at that representation.

Since your source and target languages are the same, there's no code generation step -- you're already done! However, now you get to walk the AST. At each node of the AST, you have to ask yourself: is this a "jumping" instruction or not? Method calls and if statements are examples of jumping instructions. So are loop constructs (such as for and while). Instructions such as addition and multiplication are non-jumping.

First associate with each java statement a node in the CFG, along with an entry and exit node. As a first approximation, walk the tree and:

  1. if the current statement is a method call, figure out where the entry node is for the corresponding body of that method call, and make an edge pointing from the current statement to that entry node. if the statement is a method return, enumerate the places that could have called it and add an edge to those.
  2. for each non-jumping statement, make an edge between it and the next statement.

This will give you some kind of CFG. The procedure is slightly hairy in step 2 because the method called may be declared in a library, and not elsewhere in the AST -- if so, either don't make an edge or make an edge to a special node representing the entry to that library method.

Does this make sense?