While writing a Matrix class for use with OpenGL libraries, I came across the question of whether to use Java arrays or a Buffer strategy to store the data (JOGL offers direct-buffer copy for Matrix operations). To analyze this, I wrote a small performance test program that compares the relative speeds of loop and bulk operations on Arrays vs Buffers vs direct Buffers.
I'd like to share my results with you here (as I find them rather interesting). Please feel free to comment and/or point out any mistakes.
The code can be viewed at pastebin.com/is7UaiMV.
Loop-read array is implemented as A[i] = B[i] as otherwise the JIT optimizer will completely remove that code. Actual var = A[i] seems to be pretty much the same.
In the sample result for array size of 10,000 it is very likely that the JIT optimizer has replaced the looped array access with a System.arraycopy like implementation.
There is no bulk-get buffer->buffer as Java implements A.get(B) as B.put(A), therefore the results would be the same as the bulk-put results.
Under almost all situations it is strongly recommended to use the Java internal Arrays. Not only is the put/get speed massively faster, the JIT is as well able to perform much better optimizations on the final code.
Buffers should only be used if both the following applies:
Note that a backened-buffer has a Java Array backening the content of the buffer. It is recommended to do operations on this back-buffer instead of looping put/get.
Direct buffers should only be used if you worry about memory usage and never access the underlying data. They are slightly slower than non-direct buffers, much slower if the underlying data is accessed, but use less memory. In addition there is an extra overhead when converting non-byte data (like float-arrays) into bytes when using a direct buffer.
For more details see here:
Note: Percentage is only for ease of reading and has no real meaning.
-- Array tests: -----------------------------------------
Loop-write array: 87.29 ms 11,52%
Arrays.fill: 64.51 ms 8,51%
Loop-read array: 42.11 ms 5,56%
System.arraycopy: 47.25 ms 6,23%
-- Buffer tests: ----------------------------------------
Loop-put buffer: 603.71 ms 79,65%
Index-put buffer: 536.05 ms 70,72%
Bulk-put array->buffer: 105.43 ms 13,91%
Bulk-put buffer->buffer: 99.09 ms 13,07%
Bulk-put bufferD->buffer: 80.38 ms 10,60%
Loop-get buffer: 505.77 ms 66,73%
Index-get buffer: 562.84 ms 74,26%
Bulk-get buffer->array: 137.86 ms 18,19%
-- Direct buffer tests: ---------------------------------
Loop-put bufferD: 570.69 ms 75,29%
Index-put bufferD: 562.76 ms 74,25%
Bulk-put array->bufferD: 712.16 ms 93,96%
Bulk-put buffer->bufferD: 83.53 ms 11,02%
Bulk-put bufferD->bufferD: 118.00 ms 15,57%
Loop-get bufferD: 528.62 ms 69,74%
Index-get bufferD: 560.36 ms 73,93%
Bulk-get bufferD->array: 757.95 ms 100,00%
-- Array tests: -----------------------------------------
Loop-write array: 22.10 ms 6,21%
Arrays.fill: 10.37 ms 2,91%
Loop-read array: 81.12 ms 22,79%
System.arraycopy: 10.59 ms 2,97%
-- Buffer tests: ----------------------------------------
Loop-put buffer: 355.98 ms 100,00%
Index-put buffer: 353.80 ms 99,39%
Bulk-put array->buffer: 16.33 ms 4,59%
Bulk-put buffer->buffer: 5.40 ms 1,52%
Bulk-put bufferD->buffer: 4.95 ms 1,39%
Loop-get buffer: 299.95 ms 84,26%
Index-get buffer: 343.05 ms 96,37%
Bulk-get buffer->array: 15.94 ms 4,48%
-- Direct buffer tests: ---------------------------------
Loop-put bufferD: 355.11 ms 99,75%
Index-put bufferD: 348.63 ms 97,93%
Bulk-put array->bufferD: 190.86 ms 53,61%
Bulk-put buffer->bufferD: 5.60 ms 1,57%
Bulk-put bufferD->bufferD: 7.73 ms 2,17%
Loop-get bufferD: 344.10 ms 96,66%
Index-get bufferD: 333.03 ms 93,55%
Bulk-get bufferD->array: 190.12 ms 53,41%
-- Array tests: -----------------------------------------
Loop-write array: 156.02 ms 4,37%
Arrays.fill: 109.06 ms 3,06%
Loop-read array: 300.45 ms 8,42%
System.arraycopy: 147.36 ms 4,13%
-- Buffer tests: ----------------------------------------
Loop-put buffer: 3385.94 ms 94,89%
Index-put buffer: 3568.43 ms 100,00%
Bulk-put array->buffer: 159.40 ms 4,47%
Bulk-put buffer->buffer: 5.31 ms 0,15%
Bulk-put bufferD->buffer: 6.61 ms 0,19%
Loop-get buffer: 2907.21 ms 81,47%
Index-get buffer: 3413.56 ms 95,66%
Bulk-get buffer->array: 177.31 ms 4,97%
-- Direct buffer tests: ---------------------------------
Loop-put bufferD: 3319.25 ms 93,02%
Index-put bufferD: 3538.16 ms 99,15%
Bulk-put array->bufferD: 1849.45 ms 51,83%
Bulk-put buffer->bufferD: 5.60 ms 0,16%
Bulk-put bufferD->bufferD: 7.63 ms 0,21%
Loop-get bufferD: 3227.26 ms 90,44%
Index-get bufferD: 3413.94 ms 95,67%
Bulk-get bufferD->array: 1848.24 ms 51,79%
Direct buffers are not meant to accelerate access from Java code. (If that were possible there was something wrong with the JVM’s own array implementation.)
These byte buffers are for interfacing with other components as you can write a byte buffer to a ByteChannel
and you can use direct buffers in conjunction with native code such as with the OpenGL libraries you mentioned. It’s intended to accelerate these operation then. Using a graphics card’s chip for rendering can accelerate the overall operation to a degree more than compensating the possibly slower access to the buffer from Java code.
By the way, if you measure the access speed to a byte buffer, especially the direct byte buffers, it’s worth changing the byte order to the native byte order before acquiring a FloatBuffer
view:
FloatBuffer bufferD = ByteBuffer.allocateDirect(SIZE * 4)
.order(ByteOrder.nativeOrder())
.asFloatBuffer();