How to approach creating a JVM programming language?

functional · Aug 1, 2010 · Viewed 25.5k times

I have created a compiler in C (using Lex & Bison) for a dynamically typed programming language that supports loops, function declarations inside functions, recursive calls, etc. I also created a virtual machine that runs the intermediate code produced by the compiler.

I was thinking about compiling it to Java bytecode instead of my own intermediate code.

I saw that the question about creating a JVM language has already been asked, but I didn't find the answer very informative.

So here are my questions:

  1. I guess that reading the JVM specification is a must for creating a JVM language. What other books can you suggest (except the Dragon Book, of course)? I’m mostly interested in books or tutorials on how to create a JVM language, not a compiler in general.
  2. There are many Java libraries to read, write and change .class files like jclasslib, bcel, gnu bytecode, etc. Which one would you suggest? Also, are you aware of C libraries that do the same job?
  3. I was thinking about having a look at another language that targets the JVM, like Clojure, Jython or JRuby. But all these languages are very high level and complicated (to create a compiler for). I am looking for a simpler programming language (I don't mind if it's unknown or unused) that targets the JVM and whose compiler is open source. Any ideas?

Answer

theomega · Aug 1, 2010

I would also recommend ASM, but have a look at Jasmin too. I used it (or, rather, had to use it) for a university project, and it worked quite well. I wrote a lexer-parser-analyzer-optimizer-generator combination for a programming language using Java and Jasmin, so it generated JVM code. I uploaded the code here; the interesting part should be the source code itself. In bytecode/InsanelyFastByteCodeCreator.java, you will find a piece of code that transforms an AST into the input format of the Jasmin assembler. It is quite straightforward.
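To give an idea of what such an AST-to-Jasmin transformation can look like, here is a minimal sketch. It is not the code from the project above: the class `JasminSketch`, the node types `Num` and `BinOp`, and the methods `emit`, `maxStack` and `compile` are all made up for illustration, and it only handles integer arithmetic. Each node appends the instructions that leave its value on the operand stack, and `compile` wraps the result in a `main` method that prints it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: walks a tiny expression AST and emits Jasmin assembler text.
final class JasminSketch {

    // Every AST node knows how to emit its own instructions and how much
    // operand stack it needs while being evaluated.
    interface Expr {
        void emit(List<String> out);
        int maxStack();
    }

    // Integer literal: a single constant push.
    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
        public void emit(List<String> out) { out.add("ldc " + value); }
        public int maxStack() { return 1; }
    }

    // Binary integer operation: evaluate left, then right, then combine.
    static final class BinOp implements Expr {
        final char op;
        final Expr left;
        final Expr right;
        BinOp(char op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public void emit(List<String> out) {
            left.emit(out);
            right.emit(out);
            switch (op) {
                case '+': out.add("iadd"); break;
                case '-': out.add("isub"); break;
                case '*': out.add("imul"); break;
                default: throw new IllegalArgumentException("unknown operator: " + op);
            }
        }
        public int maxStack() {
            // The left result stays on the stack while the right side is evaluated.
            return Math.max(left.maxStack(), 1 + right.maxStack());
        }
    }

    // Wraps the expression in a class whose main method prints its value.
    static String compile(String className, Expr expr) {
        List<String> body = new ArrayList<>();
        expr.emit(body);
        StringBuilder sb = new StringBuilder();
        sb.append(".class public ").append(className).append('\n');
        sb.append(".super java/lang/Object\n\n");
        sb.append(".method public static main([Ljava/lang/String;)V\n");
        // One extra slot for the System.out reference below the expression.
        sb.append("    .limit stack ").append(expr.maxStack() + 1).append('\n');
        sb.append("    .limit locals 1\n");
        sb.append("    getstatic java/lang/System/out Ljava/io/PrintStream;\n");
        for (String insn : body) {
            sb.append("    ").append(insn).append('\n');
        }
        sb.append("    invokevirtual java/io/PrintStream/println(I)V\n");
        sb.append("    return\n");
        sb.append(".end method\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // (1 + 2) * 3
        Expr expr = new BinOp('*', new BinOp('+', new Num(1), new Num(2)), new Num(3));
        System.out.print(compile("Demo", expr));
    }
}
```

For `(1 + 2) * 3` the expression body comes out as `ldc 1`, `ldc 2`, `iadd`, `ldc 3`, `imul`. The printed text can be saved to a `.j` file and fed to the Jasmin assembler to get a `.class` file. A real generator also has to deal with local variables, labels for control flow, and method calls, but the pattern of "each node emits its own instructions" stays the same.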

The source language (which was transformed to the AST by the lexer-parser-analyzer) is a subset of Java called MiniJava. It lacks some “complicated” features like inheritance, constructors, static methods, and private fields and methods. None of those features are difficult to implement on the JVM, but the project also required an x86 backend (generating machine assembler), and those things tend to get difficult when there is no JVM to handle some of them for you.

In case you wonder about the strange class name: the task of the university project was to transform the AST into an SSA graph (representing the input code), optimize the graph, and then turn it into Java bytecode. That was about ¾ of the work of the project, and the InsanelyFastByteCodeCreator was just a shortcut to test everything.

Have a look at the “Java Virtual Machine” book by Jon Meyer and Troy Downing. This book heavily references the Jasmin assembler; it’s quite helpful for understanding the JVM internals.