Bytecode features not available in the Java language

Bart van Heukelom picture Bart van Heukelom · Jul 26, 2011 · Viewed 56.9k times · Source

Are there currently (Java 6) things you can do in Java bytecode that you can't do from within the Java language?

I know both are Turing complete, so read "can do" as "can do significantly faster/better, or just in a different way".

I'm thinking of extra bytecodes like invokedynamic, which can't be generated using Java, except that specific one is for a future version.

Answer

Rafael Winterhalter picture Rafael Winterhalter · Apr 22, 2014

After working with Java byte code for quite a while and doing some additional research on this matter, here is a summary of my findings:

Execute code in a constructor before calling a super constructor or auxiliary constructor

In the Java programming language (JPL), a constructor's first statement must be an invocation of a super constructor or another constructor of the same class. This is not true for Java byte code (JBC). Within byte code, it is absolutely legitimate to execute any code before a constructor, as long as:

  • Another compatible constructor is called at some time after this code block.
  • This call is not within a conditional statement.
  • Before this constructor call, no field of the constructed instance is read and none of its methods is invoked. This implies the next item.

Set instance fields before calling a super constructor or auxiliary constructor

As mentioned before, it is perfectly legal to set a field value of an instance before calling another constructor. There even exists a legacy hack which makes it able to exploit this "feature" in Java versions before 6:

class Foo {
  public String s;
  public Foo() {
    System.out.println(s);
  }
}

class Bar extends Foo {
  public Bar() {
    this(s = "Hello World!");
  }
  private Bar(String helper) {
    super();
  }
}

This way, a field could be set before the super constructor is invoked which is however not longer possible. In JBC, this behavior can still be implemented.

Branch a super constructor call

In Java, it is not possible to define a constructor call like

class Foo {
  Foo() { }
  Foo(Void v) { }
}

class Bar() {
  if(System.currentTimeMillis() % 2 == 0) {
    super();
  } else {
    super(null);
  }
}

Until Java 7u23, the HotSpot VM's verifier did however miss this check which is why it was possible. This was used by several code generation tools as a sort of a hack but it is not longer legal to implement a class like this.

The latter was merely a bug in this compiler version. In newer compiler versions, this is again possible.

Define a class without any constructor

The Java compiler will always implement at least one constructor for any class. In Java byte code, this is not required. This allows the creation of classes that cannot be constructed even when using reflection. However, using sun.misc.Unsafe still allows for the creation of such instances.

Define methods with identical signature but with different return type

In the JPL, a method is identified as unique by its name and its raw parameter types. In JBC, the raw return type is additionally considered.

Define fields that do not differ by name but only by type

A class file can contain several fields of the same name as long as they declare a different field type. The JVM always refers to a field as a tuple of name and type.

Throw undeclared checked exceptions without catching them

The Java runtime and the Java byte code are not aware of the concept of checked exceptions. It is only the Java compiler that verifies that checked exceptions are always either caught or declared if they are thrown.

Use dynamic method invocation outside of lambda expressions

The so-called dynamic method invocation can be used for anything, not only for Java's lambda expressions. Using this feature allows for example to switch out execution logic at runtime. Many dynamic programming languages that boil down to JBC improved their performance by using this instruction. In Java byte code, you could also emulate lambda expressions in Java 7 where the compiler did not yet allow for any use of dynamic method invocation while the JVM already understood the instruction.

Use identifiers that are not normally considered legal

Ever fancied using spaces and a line break in your method's name? Create your own JBC and good luck for code review. The only illegal characters for identifiers are ., ;, [ and /. Additionally, methods that are not named <init> or <clinit> cannot contain < and >.

Reassign final parameters or the this reference

final parameters do not exist in JBC and can consequently be reassigned. Any parameter, including the this reference is only stored in a simple array within the JVM what allows to reassign the this reference at index 0 within a single method frame.

Reassign final fields

As long as a final field is assigned within a constructor, it is legal to reassign this value or even not assign a value at all. Therefore, the following two constructors are legal:

class Foo {
  final int bar;
  Foo() { } // bar == 0
  Foo(Void v) { // bar == 2
    bar = 1;
    bar = 2;
  }
}

For static final fields, it is even allowed to reassign the fields outside of the class initializer.

Treat constructors and the class initializer as if they were methods

This is more of a conceptional feature but constructors are not treated any differently within JBC than normal methods. It is only the JVM's verifier that assures that constructors call another legal constructor. Other than that, it is merely a Java naming convention that constructors must be called <init> and that the class initializer is called <clinit>. Besides this difference, the representation of methods and constructors is identical. As Holger pointed out in a comment, you can even define constructors with return types other than void or a class initializer with arguments, even though it is not possible to call these methods.

Create asymmetric records*.

When creating a record

record Foo(Object bar) { }

javac will generate a class file with a single field named bar, an accessor method named bar() and a constructor taking a single Object. Additionally, a record attribute for bar is added. By manually generating a record, it is possible to create, a different constructor shape, to skip the field and to implement the accessor differently. At the same time, it is still possible to make the reflection API believe that the class represents an actual record.

Call any super method (until Java 1.1)

However, this is only possible for Java versions 1 and 1.1. In JBC, methods are always dispatched on an explicit target type. This means that for

class Foo {
  void baz() { System.out.println("Foo"); }
}

class Bar extends Foo {
  @Override
  void baz() { System.out.println("Bar"); }
}

class Qux extends Bar {
  @Override
  void baz() { System.out.println("Qux"); }
}

it was possible to implement Qux#baz to invoke Foo#baz while jumping over Bar#baz. While it is still possible to define an explicit invocation to call another super method implementation than that of the direct super class, this does no longer have any effect in Java versions after 1.1. In Java 1.1, this behavior was controlled by setting the ACC_SUPER flag which would enable the same behavior that only calls the direct super class's implementation.

Define a non-virtual call of a method that is declared in the same class

In Java, it is not possible to define a class

class Foo {
  void foo() {
    bar();
  }
  void bar() { }
}

class Bar extends Foo {
  @Override void bar() {
    throw new RuntimeException();
  }
}

The above code will always result in a RuntimeException when foo is invoked on an instance of Bar. It is not possible to define the Foo::foo method to invoke its own bar method which is defined in Foo. As bar is a non-private instance method, the call is always virtual. With byte code, one can however define the invocation to use the INVOKESPECIAL opcode which directly links the bar method call in Foo::foo to Foo's version. This opcode is normally used to implement super method invocations but you can reuse the opcode to implement the described behavior.

Fine-grain type annotations

In Java, annotations are applied according to their @Target that the annotations declares. Using byte code manipulation, it is possible to define annotations independently of this control. Also, it is for example possible to annotate a parameter type without annotating the parameter even if the @Target annotation applies to both elements.

Define any attribute for a type or its members

Within the Java language, it is only possible to define annotations for fields, methods or classes. In JBC, you can basically embed any information into the Java classes. In order to make use of this information, you can however no longer rely on the Java class loading mechanism but you need to extract the meta information by yourself.

Overflow and implicitly assign byte, short, char and boolean values

The latter primitive types are not normally known in JBC but are only defined for array types or for field and method descriptors. Within byte code instructions, all of the named types take the space 32 bit which allows to represent them as int. Officially, only the int, float, long and double types exist within byte code which all need explicit conversion by the rule of the JVM's verifier.

Not release a monitor

A synchronized block is actually made up of two statements, one to acquire and one to release a monitor. In JBC, you can acquire one without releasing it.

Note: In recent implementations of HotSpot, this instead leads to an IllegalMonitorStateException at the end of a method or to an implicit release if the method is terminated by an exception itself.

Add more than one return statement to a type initializer

In Java, even a trivial type initializer such as

class Foo {
  static {
    return;
  }
}

is illegal. In byte code, the type initializer is treated just as any other method, i.e. return statements can be defined anywhere.

Create irreducible loops

The Java compiler converts loops to goto statements in Java byte code. Such statements can be used to create irreducible loops, which the Java compiler never does.

Define a recursive catch block

In Java byte code, you can define a block:

try {
  throw new Exception();
} catch (Exception e) {
  <goto on exception>
  throw Exception();
}

A similar statement is created implicitly when using a synchronized block in Java where any exception while releasing a monitor returns to the instruction for releasing this monitor. Normally, no exception should occur on such an instruction but if it would (e.g. the deprecated ThreadDeath), the monitor would still be released.

Call any default method

The Java compiler requires several conditions to be fulfilled in order to allow a default method's invocation:

  1. The method must be the most specific one (must not be overridden by a sub interface that is implemented by any type, including super types).
  2. The default method's interface type must be implemented directly by the class that is calling the default method. However, if interface B extends interface A but does not override a method in A, the method can still be invoked.

For Java byte code, only the second condition counts. The first one is however irrelevant.

Invoke a super method on an instance that is not this

The Java compiler only allows to invoke a super (or interface default) method on instances of this. In byte code, it is however also possible to invoke the super method on an instance of the same type similar to the following:

class Foo {
  void m(Foo f) {
    f.super.toString(); // calls Object::toString
  }
  public String toString() {
    return "foo";
  }
}

Access synthetic members

In Java byte code, it is possible to access synthetic members directly. For example, consider how in the following example the outer instance of another Bar instance is accessed:

class Foo {
  class Bar { 
    void bar(Bar bar) {
      Foo foo = bar.Foo.this;
    }
  }
}

This is generally true for any synthetic field, class or method.

Define out-of-sync generic type information

While the Java runtime does not process generic types (after the Java compiler applies type erasure), this information is still attcheched to a compiled class as meta information and made accessible via the reflection API.

The verifier does not check the consistency of these meta data String-encoded values. It is therefore possible to define information on generic types that does not match the erasure. As a concequence, the following assertings can be true:

Method method = ...
assertTrue(method.getParameterTypes() != method.getGenericParameterTypes());

Field field = ...
assertTrue(field.getFieldType() == String.class);
assertTrue(field.getGenericFieldType() == Integer.class);

Also, the signature can be defined as invalid such that a runtime exception is thrown. This exception is thrown when the information is accessed for the first time as it is evaluated lazily. (Similar to annotation values with an error.)

Append parameter meta information only for certain methods

The Java compiler allows for embedding parameter name and modifier information when compiling a class with the parameter flag enabled. In the Java class file format, this information is however stored per-method what makes it possible to only embed such method information for certain methods.

Mess things up and hard-crash your JVM

As an example, in Java byte code, you can define to invoke any method on any type. Usually, the verifier will complain if a type does not known of such a method. However, if you invoke an unknown method on an array, I found a bug in some JVM version where the verifier will miss this and your JVM will finish off once the instruction is invoked. This is hardly a feature though, but it is technically something that is not possible with javac compiled Java. Java has some sort of double validation. The first validation is applied by the Java compiler, the second one by the JVM when a class is loaded. By skipping the compiler, you might find a weak spot in the verifier's validation. This is rather a general statement than a feature, though.

Annotate a constructor's receiver type when there is no outer class

Since Java 8, non-static methods and constructors of inner classes can declare a receiver type and annotate these types. Constructors of top-level classes cannot annotate their receiver type as they most not declare one.

class Foo {
  class Bar {
    Bar(@TypeAnnotation Foo Foo.this) { }
  }
  Foo() { } // Must not declare a receiver type
}

Since Foo.class.getDeclaredConstructor().getAnnotatedReceiverType() does however return an AnnotatedType representing Foo, it is possible to include type annotations for Foo's constructor directly in the class file where these annotations are later read by the reflection API.

Use unused / legacy byte code instructions

Since others named it, I will include it as well. Java was formerly making use of subroutines by the JSR and RET statements. JBC even knew its own type of a return address for this purpose. However, the use of subroutines did overcomplicate static code analysis which is why these instructions are not longer used. Instead, the Java compiler will duplicate code it compiles. However, this basically creates identical logic which is why I do not really consider it to achieve something different. Similarly, you could for example add the NOOP byte code instruction which is not used by the Java compiler either but this would not really allow you to achieve something new either. As pointed out in the context, these mentioned "feature instructions" are now removed from the set of legal opcodes which does render them even less of a feature.