Can I get an XML AST dump of C/C++ code with clang without using the compiler?

jdehaan picture jdehaan · Mar 15, 2011 · Viewed 14.3k times · Source

I managed to compile successfully clang for windows with cmake and visual studio 10. I would like to get an XML file as AST representation of the source code. There is one option that provides the result with clang with gcc under linux (ubuntu) but doesn't work on the windows box:

clang -cc1 -ast-print-xml source.c

However, this is invoking the compilation stage (which I would like to avoid). Digging in the source code didn't help me so far as I am quite new to clang. I could manage to generate the binary version of the AST by using:

clang -emit-ast source.c

Unfortunately, this format is unusable directly for parsing. Is there some existing method to directly generate the XML tree instead of a binary one in clang?

The goal is to use the XML representation in other tools in the .NET environment so I would need to make some wrapping around the native clang lib to access the binary AST. Maybe there is a third option if someone already wrote some binary clang AST parser for .NET?

Is it possible that I am missing something like if the AST generated by the clang front end is not equivalent to the one generated in the compilation stage.

Answer

Matthieu M. picture Matthieu M. · Mar 18, 2011

For your information, the XML printer has been removed from the 2.9 version by Douglas Gregor (responsible of CLang FrontEnd).

The issue was that the XML printer was lacking. A number of the AST nodes had never been implemented in the printer, as well as a number of the properties of some nodes, which led to an inaccurate representation of the source code.

Another point raised by Douglas was that the output should be suitable not for debugging CLang itself (which is what the -emit-ast is about) but for consumption by external tools. This requires the output to be stable from one version to another. Notably it should not be a 1-on-1 mapping of CLang internal, but rather translate the source code into standarized language.

Unless there is significant work on the printer (which requires volunteers) it will not be integrated back...