I was just messing around. I downloaded dex2jar (http://code.google.com/p/dex2jar/) and the Java decompiler JD-GUI (http://java.decompiler.free.fr/?q=jdgui).
I took my own apk file (signed, sealed and on Google Play) and used dex2jar to convert it into a jar file.
Command line (Windows users use the .bat script, everyone else the .sh one):
d2j-dex2jar.bat -f MyAwesomeApp.apk
I dragged and dropped the output into JD-GUI, and all the class files, the original code, reappeared. I was taken aback a bit. Is my Java/Android code this exposed? How is ProGuard protecting my apk if it can be decompiled and regenerated so easily? It doesn't seem obfuscated at all...
Thanks in advance.
Obfuscators usually just rename classes, methods and fields to names that have no meaning. So, if you have "ScoreCalculator.computeScore(Player p, Match m)" you end up with "A.zk(F f, R r)". This is similar to what Uglify or the Closure Compiler do for JavaScript, except that in JavaScript the goal is to reduce source length.
It is possible to understand what the method does anyway, it is only harder.
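To make the renaming concrete, here is a minimal sketch of the same method before and after name obfuscation. The scoring rule and the int parameters are made up for illustration (the original signature takes Player and Match objects); only the names ScoreCalculator.computeScore and A.zk come from the example above.

```java
// Hypothetical example: the scoring rule is invented, only the names matter.
public class RenamingDemo {

    // What you wrote: names tell the reader what is going on.
    static class ScoreCalculator {
        static int computeScore(int goals, int assists) {
            return goals * 3 + assists;
        }
    }

    // Roughly what a decompiler shows after renaming:
    // identical logic, meaningless names.
    static class A {
        static int zk(int f, int r) {
            return f * 3 + r;
        }
    }

    public static void main(String[] args) {
        // Both calls compute the same value.
        System.out.println(ScoreCalculator.computeScore(2, 1));
        System.out.println(A.zk(2, 1));
    }
}
```

Reading A.zk still tells you it multiplies one argument by 3 and adds the other; you only lose the hint about what those numbers mean.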
Also, Java uses late binding (like DLLs or .so files). So calls that go outside your code (to packages like java.util, java.lang, etc.) cannot be obfuscated. Likewise, if your code needs to receive calls from outside (a typical example: registering a listener on a button), the names involved in those calls cannot be obfuscated. The same happens with a DLL, where you can clearly see the names of the methods that need to be called from outside the DLL, as well as its calls to other DLLs.
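As a rough sketch of what survives renaming, the class below imitates obfuscated output. The class and its members (KeptNamesDemo, a, c, d) are hypothetical; Runnable stands in for a button listener so the example runs without Android.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical obfuscated-looking class: your own names are gone,
// but every name owned by someone else is still readable.
public class KeptNamesDemo {

    // Your field: renamed to "a". Its type (java.util.List) is not yours
    // to rename, so it stays visible.
    private final List<String> a = new ArrayList<>();

    // Your method: renamed to "c". It returns a callback that outside
    // code will invoke, so the callback method must keep the name "run".
    Runnable c() {
        return new Runnable() {
            @Override
            public void run() {
                a.add("clicked"); // List.add lives in java.util: name kept
            }
        };
    }

    // Your method: renamed to "d".
    int d() {
        return a.size();
    }

    public static void main(String[] args) {
        KeptNamesDemo demo = new KeptNamesDemo();
        demo.c().run();               // the "outside" caller uses the real name
        System.out.println(demo.d());
    }
}
```

Even without the original names, the library calls and callback signatures give a reader plenty of anchors to work out what the code does.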
However, the mapping between a given piece of source code and the compiled code is not necessarily one to one. Older C compilers used to produce the same opcodes for a given source construct, so decompilers were very effective. Then C compilers added many optimizations to the resulting code, and these optimizations made decompilers mostly ineffective [1].
Java never implemented many optimizations at compile time. To run on different platforms (including different Android devices), Java chose to apply the serious optimizations later, at run time, based on the architecture and hardware properties of the device it is running on (this is what "HotSpot" is mostly about [2]).
Good obfuscators usually also reorder bytecode instructions, or insert some useless ones, or apply some optimizations upfront to make decompilers unable (or less able) to derive source code so easily.
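Real obfuscators do this at the bytecode level, which cannot be shown directly in source form. As an approximation only, the sketch below writes the source-level equivalent of inserting useless instructions and a branch that never runs. All names here are hypothetical, and this is not the output of any particular tool.

```java
// Source-level approximation of junk-code insertion (hypothetical names).
public class JunkCodeDemo {

    // The straightforward version.
    static int plain(int x) {
        return x * 2 + 1;
    }

    // Same result, padded with useless work and a dead branch.
    static int obfuscated(int x) {
        int junk = x ^ 0x5A5A;          // computed but never affects the result
        int r;
        // Always true: the product of two consecutive ints is even,
        // and int overflow does not change parity.
        if ((x * (x + 1)) % 2 == 0) {
            r = x << 1;                 // same as x * 2
        } else {
            r = junk - x;               // dead branch, only there to confuse
        }
        return r + 1;
    }

    public static void main(String[] args) {
        System.out.println(plain(5));
        System.out.println(obfuscated(5));
    }
}
```

A decompiler may emit something even messier than this for the padded version, but stepping through it by hand still recovers x * 2 + 1.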
This technique is useless against people who can read bytecode, just as any possible C obfuscation is useless against a person who can read assembly code.
As the many cracking tools out there demonstrate, reverse engineering is always possible, even with C or other languages, even on firmware (think about iPhone firmware), because the client your code runs on is always untrusted and can always be tampered with.
If you have very mission-critical code, something worth a lot of money that someone else may steal, I'd suggest running it server side, or validating it server side somehow.
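One way to read "validate it server side" is sketched below: the client reports the raw events together with the score it claims, and the server recomputes the score itself instead of trusting the client. The class, the method names and the scoring rule (sum of event points) are all assumptions made for illustration, not part of any real API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical server-side check: never trust a number computed on the client.
public class ServerSideCheck {

    // The valuable rule lives only on the server (invented here: sum of points).
    static int computeScore(List<Integer> eventPoints) {
        int total = 0;
        for (int points : eventPoints) {
            total += points;
        }
        return total;
    }

    // Accept the client's claim only if it matches the server's own result.
    static boolean accept(List<Integer> eventPoints, int claimedScore) {
        return computeScore(eventPoints) == claimedScore;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(3, 1, 3);
        System.out.println(accept(events, 7));    // honest client
        System.out.println(accept(events, 9999)); // tampered client
    }
}
```

With this split, decompiling the apk reveals only how the client reports events, not the rule that decides what they are worth.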