I have an AST derived from the ANTLR Parser Generator for Java. What I want to do is somehow construct a control flow graph of the source code, where each statement or expression is a unique Node. I understand there must be some recursiveness to this identification, I was wondering what you would suggest as the best option and if ANTLR has a toolset I can use for this job. Cheers, Chris
EDIT - My main concern is to get a control flow graph(CFG) from the AST. This way I can get a tree representation of the source. To clarify, both the source code and the implementation language is Java.
Usually CFGs are computed on a lower-level representation (e.g. JVM bytecode). Someone did a thesis on such things a few years ago. There might be a helpful way described in there for how to get at that representation.
Since your source and target languages are the same, there's no code generation step -- you're already done! However, now you get to walk the AST. At each node of the AST, you have to ask yourself: is this a "jumping" instruction or not? Method calls and if statements are examples of jumping instructions. So are loop constructs (such as for
and while
). Instructions such as addition and multiplication are non-jumping.
First associate with each java statement a node in the CFG, along with an entry and exit node. As a first approximation, walk the tree and:
This will give you some kind of CFG. The procedure is slightly hairy in step 2 because the method called may be declared in a library, and not elsewhere in the AST -- if so, either don't make an edge or make an edge to a special node representing the entry to that library method.
Does this make sense?