I want to encrypt/decrypt lots of small (2-10 kB) pieces of data. The performance is OK for now: on a Core2Duo, I get about 90 MBytes/s with AES-256 (when using 2 threads). But I may need to improve that in the future, or at least reduce the impact on the CPU.
The JVM will not, by itself, take advantage of special CPU features when executing code which happens to be an AES encryption: recognizing some code as being an implementation of AES is beyond the abilities of the JIT compiler. To use special hardware (e.g. the "Padlock" on VIA processors, or the AES-NI instructions on the newer Intel processors), you must go, at some point, through "native code".
Possibly, a JCE provider could do that for you. I am not aware of any readily available JCE provider which includes optimized native code for AES (there was a project called Apache JuiCE, but it seems to be stalled and I do not know its status). However, it is conceivable that SunJCE will do that in a future version (but with Oracle buying Sun and the overfeaturism of OpenJDK 7, it is unclear when the next Java version will be released).

Alternatively, bite the bullet and use native code yourself. Native code is invoked through JNI, and for the native AES code, a popular implementation is the one from Brian Gladman. When you get a bigger and newer processor with the AES-NI instructions, replace that native code with some code which knows about these instructions, as Intel describes.
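To make the JNI route a bit more concrete, here is a minimal sketch of what the Java side could look like. Everything named here is an assumption for illustration: the library name `fastaes`, the class `AesBackend` and the `nativeCbc` entry point are hypothetical, and the C side (which would wrap something like Gladman's code or an AES-NI routine) is not shown. The sketch also assumes CBC mode with PKCS#5 padding, which the native code would have to match. When the library cannot be loaded, it falls back to the standard JCE cipher.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesBackend {
    // Hypothetical library name; nothing called "fastaes" ships with the JDK.
    private static final String NATIVE_LIB = "fastaes";
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    // Try to load the native library once; report failure instead of throwing.
    private static boolean tryLoad() {
        try {
            System.loadLibrary(NATIVE_LIB);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    // Hypothetical JNI entry point. The C implementation is assumed to do
    // AES-CBC with PKCS#5 padding, matching the pure-Java fallback below.
    private static native byte[] nativeCbc(boolean encrypt, byte[] key, byte[] iv, byte[] data);

    public static boolean usingNative() {
        return NATIVE_AVAILABLE;
    }

    public static byte[] encrypt(byte[] key, byte[] iv, byte[] data) {
        return run(true, key, iv, data);
    }

    public static byte[] decrypt(byte[] key, byte[] iv, byte[] data) {
        return run(false, key, iv, data);
    }

    private static byte[] run(boolean encrypt, byte[] key, byte[] iv, byte[] data) {
        if (NATIVE_AVAILABLE) {
            return nativeCbc(encrypt, key, iv, data);
        }
        // Fallback: whatever AES implementation the default JCE provider offers.
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(encrypt ? Cipher.ENCRYPT_MODE : Cipher.DECRYPT_MODE,
                        new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES operation failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero key and IV: demo only, never in production
        byte[] iv = new byte[16];
        byte[] ct = encrypt(key, iv, "hello".getBytes());
        System.out.println("native backend: " + usingNative()
                + ", round trip: " + new String(decrypt(key, iv, ct)));
    }
}
```

Keeping the native call behind a single method like this means the C implementation can later be swapped for one that uses AES-NI without touching the callers. Note that each JNI call has its own overhead, which matters when the chunks are only a few kilobytes, so it is worth measuring before committing to this design.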
By using AES-128 instead of AES-256 you should get about a 40% speed boost: AES-128 runs 10 rounds per block where AES-256 runs 14. Breaking AES-128 is currently beyond the technological reach of mankind, and should stay so for the next few decades. Do you really need a 256-bit key for AES?
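The gain from the shorter key is easy to measure on your own hardware. The following is a rough sketch rather than a rigorous benchmark: the class name `AesKeySizeBench`, the 4 kB chunk size and the CBC/NoPadding mode are my assumptions, and the numbers will vary with the JVM, the provider and the CPU. It encrypts many small buffers with a given key size and reports the throughput.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesKeySizeBench {

    /**
     * Encrypts `chunks` buffers of `chunkSize` bytes (must be a multiple of 16,
     * since no padding is used) and returns the throughput in MB/s.
     */
    public static double throughputMBps(int keyBits, int chunkSize, int chunks) {
        try {
            SecureRandom rnd = new SecureRandom();
            byte[] key = new byte[keyBits / 8];
            byte[] iv = new byte[16];
            byte[] data = new byte[chunkSize];
            rnd.nextBytes(key);
            rnd.nextBytes(iv);
            rnd.nextBytes(data);

            Cipher cipher = Cipher.getInstance("AES/CBC/NoPadding");
            SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
            IvParameterSpec ivSpec = new IvParameterSpec(iv);

            long start = System.nanoTime();
            for (int i = 0; i < chunks; i++) {
                // Re-initialise per chunk, as independent small messages would.
                // Reusing one IV is acceptable for timing only, not for real data.
                cipher.init(Cipher.ENCRYPT_MODE, keySpec, ivSpec);
                cipher.doFinal(data);
            }
            long elapsedNanos = Math.max(1L, System.nanoTime() - start);

            double megabytes = (double) chunkSize * chunks / 1e6;
            return megabytes / (elapsedNanos / 1e9);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES benchmark failed", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Older JREs cap AES at 128 bits unless the unlimited-strength policy is installed.
        boolean has256 = Cipher.getMaxAllowedKeyLength("AES") >= 256;

        // Warm-up pass so the JIT has compiled the hot paths before timing.
        throughputMBps(128, 4096, 20000);
        if (has256) {
            throughputMBps(256, 4096, 20000);
        }

        System.out.printf("AES-128: %.1f MB/s%n", throughputMBps(128, 4096, 50000));
        if (has256) {
            System.out.printf("AES-256: %.1f MB/s%n", throughputMBps(256, 4096, 50000));
        }
    }
}
```

The measured ratio will usually be somewhat below the 14/10 suggested by the round counts, because the per-call setup (key schedule, object allocation, `init`) is the same for both key sizes and weighs more heavily on small chunks.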