Confused by java compilation process
OK i know this: We write java source code, the compiler which is platform independent translates it into bytecode, then the jvm which is platform dependent translates it into machine code.
So from start, we write java source code. The compiler javac.exe is a .exe file. What exactly is this .exe file? Isn't the java compiler written in java, then how come there is .exe file which executes it? If the compiler code is written is java, then how come compiler code is executed at the compilation stage, since its the job of the jvm to execute java code. How can a language itself compile its own language code? It all seems like chicken and egg problem to me.
Now what exactly does the .class file contain? Is it a abstract syntax tree in text form, is it tabular information, what is it?
can anybody tell me clear and detailed way about how my java source code gets converted in machine code.
OK i know this: We write java source code, the compiler which is platform independent translates it into bytecode,
Actually the compiler itself works as a native executable (hence javac.exe). And true, it transforms source file into bytecode. The bytecode is platform independent, because it's targeted at Java Virtual Machine.
then the jvm which is platform dependent translates it into machine code.
Not always. As for Sun's JVM there are two jvms: client and server. They both can, but not certainly have to compile to native code.
So from start, we write java source code. The compiler javac.exe is a .exe file. What exactly is this .exe file? Isn't the java compiler written in java, then how come there is .exe file which executes it?
This exe
file is a wrapped java bytecode. It's for convenience - to avoid complicated batch scripts. It starts a JVM and executes the compiler.
If the compiler code is written is java, then how come compiler code is executed at the compilation stage, since its the job of the jvm to execute java code.
That's exactly what wrapping code does.
How can a language itself compile its own language code? It all seems like chicken and egg problem to me.
True, confusing at first glance. Though, it's not only Java's idiom. The Ada's compiler is also written in Ada itself. It may look like a "chicken and egg problem", but in truth, it's only a bootstrapping problem.
Now what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, what is it?
It's not Abstract Syntax Tree. AST is only used by tokenizer and compiler at compiling time to represent code in memory. .class
file is like an assembly, but for JVM. JVM, in turn, is an abstract machine which can run specialized machine language - targeted only at virtual machine. In it's simplest, .class
file has a very similar structure to normal assembly. At the beginning there are declared all static variables, then comes some tables of extern function signatures and lastly the machine code.
If You are really curious You can dig into classfile using "javap" utility. Here is sample (obfuscated) output of invoking javap -c Main
:
0: new #2; //class SomeObject
3: dup
4: invokespecial #3; //Method SomeObject."<init>":()V
7: astore_1
8: aload_1
9: invokevirtual #4; //Method SomeObject.doSomething:()V
12: return
So You should have an idea already what it really is.
can anybody tell me clear and detailed way about how my java source code gets converted in machine code.
I think it should be more clear right now, but here's short summary:
You invoke javac
pointing to your source code file. The internal reader (or tokenizer) of javac reads your file and builds an actual AST out of it. All syntax errors come from this stage.
The javac
hasn't finished its job yet. When it has the AST the true compilation can begin. It's using visitor pattern to traverse AST and resolves external dependencies to add meaning (semantics) to the code. The finished product is saved as a .class
file containing bytecode.
Now it's time to run the thing. You invoke java
with the name of .class file. Now the JVM starts again, but to interpret Your code. The JVM may, or may not compile Your abstract bytecode into the native assembly. The Sun's HotSpot compiler in conjunction with Just In Time compilation may do so if needed. The running code is constantly being profiled by the JVM and recompiled to native code if certain rules are met. Most commonly the hot code is the first to compile natively.
Edit: Without the javac
one would have to invoke compiler using something similar to this:
%JDK_HOME%/bin/java.exe -cp:myclasspath com.sun.tools.javac.Main fileToCompile
As you can see it's calling Sun's private API so it's bound to Sun JDK implementation. It would make build systems dependent on it. If one switched to any other JDK (wiki lists 5 other than Sun's) then above code should be updated to reflect the change (since it's unlikely the compiler would reside in com.sun.tools.javac package). Other compilers could be written in native code.
So the standard way is to ship javac
wrapper with JDK.