Does interleaving in VBOs improve performance when using VAOs?

Merni · Sep 17, 2013 · Viewed 8.1k times

You usually get a speedup when you use interleaved VBOs instead of multiple VBOs. Does this still hold when using VAOs?

I ask because it is much more convenient to have one VBO for the positions, one for the normals, and so on, and a single VBO can then be used in multiple VAOs.

Answer

Sam · Sep 17, 2013

VAOs

  • For sharing larger data sets, a dedicated buffer containing a single vertex (attrib) array is certainly a good way to go. One can still interleave other arrays in a second buffer and combine everything in a VAO.

  • A VAO handles the binding of all those buffers and the vertex (attrib) array states, such as array buffer bindings and attrib entries with their (buffer) pointers and enable/disable flags. Aside from its convenience, it is designed to do this job quickly, with a single API call that changes all states at once, without the tedious enabling and disabling of attrib arrays. It basically does what we had to do manually before. However, with my own VAO-like implementation I could not measure any performance loss, even when doing lots of binds. From my point of view, the major advantage is convenience.

So a VAO does not determine draw performance in terms of glDraw*, but it can affect the overhead of state changes.
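To make the "all states at once" point concrete, here is a toy model of the state a VAO-like object has to track, plus a count of the individual calls a manual implementation would issue to switch between two such states. Treat it as a rough sketch only: `AttribEntry`, `VertexArrayState`, `manual_switch_calls` and the slot count `MAX_ATTRIBS` are names made up for this illustration, and a real driver may do something quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTRIBS 16 /* hypothetical fixed slot count for this sketch */

/* One attrib entry: the pointer setup plus the enable/disable flag. */
typedef struct {
    bool     enabled;
    unsigned buffer;     /* buffer the attrib pointer refers to, 0 = none */
    int      components; /* e.g. 3 for a vec3 */
    size_t   stride;     /* bytes between consecutive vertices */
    size_t   offset;     /* byte offset of the attribute in the buffer */
} AttribEntry;

/* Everything our VAO-like object remembers. */
typedef struct {
    AttribEntry attribs[MAX_ATTRIBS];
} VertexArrayState;

static bool same_pointer(const AttribEntry *a, const AttribEntry *b)
{
    return a->buffer == b->buffer && a->components == b->components &&
           a->stride == b->stride && a->offset == b->offset;
}

/* Counts the calls needed to get from `from` to `to` without a VAO:
   one per enable/disable, one per buffer bind, one per pointer setup. */
size_t manual_switch_calls(const VertexArrayState *from,
                           const VertexArrayState *to)
{
    assert(from != NULL && to != NULL);
    size_t   calls = 0;
    unsigned bound = 0; /* buffer currently bound for pointer setup */

    for (size_t i = 0; i < MAX_ATTRIBS; ++i) {
        const AttribEntry *a = &from->attribs[i];
        const AttribEntry *b = &to->attribs[i];

        if (a->enabled != b->enabled)
            ++calls; /* enable or disable the attrib array */

        if (b->enabled && !same_pointer(a, b)) {
            if (bound != b->buffer) {
                ++calls; /* bind the buffer */
                bound = b->buffer;
            }
            ++calls; /* set the attrib pointer */
        }
    }
    return calls;
}
```

With a real VAO the whole switch is a single bind call, whatever number this function returns. That is the convenience (and the potential state-change saving) described above, and it says nothing about the cost of the draw call itself.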

Interleaved data formats...

  • ...cause less GPU cache pressure, because the coordinates and attributes of a single vertex are not scattered across memory. They fit consecutively into a few cache lines, whereas scattered attributes can cause more cache fills and therefore more evictions. In the worst case, each attribute element occupies its own cache line because the memory locations are far apart, while vertices are pulled in a non-deterministic, non-contiguous order in which prediction and prefetching may not kick in. GPUs are very similar to CPUs in this respect.

  • ...are also very useful for various external formats that match the deprecated interleaved formats, because datasets from compatible data sources can be read straight into mapped GPU memory. I ended up re-implementing these interleaved formats with the current API for exactly that reason.

  • ...should be laid out in an alignment-friendly way, just like simple arrays. Mixing data types with different size and alignment requirements may need padding to be GPU and CPU friendly. This is the only downside I know of, apart from the more difficult implementation.

  • ...do not prevent you from pointing to single attrib arrays in them for sharing.
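As an illustration of the layout and alignment points, here is a minimal sketch of one interleaved vertex record in C. The `Vertex` struct, the `AttribLayout` type and the `attrib_address` helper are hypothetical names chosen for this example, and the numbers assume 4-byte floats with 4-byte alignment. The stride and offset values are what you would hand to `glVertexAttribPointer` as its stride and pointer arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One interleaved vertex. The 3-byte color would leave uv misaligned,
   so one byte of explicit padding keeps uv on a 4-byte boundary. */
typedef struct {
    float   position[3]; /* offset  0, 12 bytes */
    float   normal[3];   /* offset 12, 12 bytes */
    uint8_t color[3];    /* offset 24,  3 bytes */
    uint8_t pad;         /* offset 27,  explicit padding */
    float   uv[2];       /* offset 28,  8 bytes */
} Vertex;

/* Compile-time checks of the assumptions stated above. */
_Static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
_Static_assert(offsetof(Vertex, uv) % sizeof(float) == 0, "uv misaligned");
_Static_assert(sizeof(Vertex) == 36, "unexpected padding in Vertex");

/* What an attrib pointer setup needs to know about one attribute. */
typedef struct {
    int    components;
    size_t stride; /* bytes from one vertex to the next */
    size_t offset; /* byte offset inside one vertex */
} AttribLayout;

const AttribLayout POSITION = { 3, sizeof(Vertex), offsetof(Vertex, position) };
const AttribLayout NORMAL   = { 3, sizeof(Vertex), offsetof(Vertex, normal) };
const AttribLayout COLOR    = { 3, sizeof(Vertex), offsetof(Vertex, color) };
const AttribLayout UV       = { 2, sizeof(Vertex), offsetof(Vertex, uv) };

/* Byte position of an attribute for vertex `index` in the buffer. */
size_t attrib_address(AttribLayout attrib, size_t index)
{
    assert(attrib.offset < attrib.stride);
    return index * attrib.stride + attrib.offset;
}
```

A VAO that only needs positions can point at the same buffer using `POSITION` alone: the stride skips over the other attributes, which is how a single attrib array inside an interleaved buffer can still be shared.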

Interleaving will most probably improve draw performance.
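A back-of-the-envelope way to see why is to count how many cache lines one randomly indexed vertex touches in each layout. The sketch below is a toy model under loud assumptions: every array starts on a cache line boundary, the separate arrays are far enough apart that they never share a line, and the line size is simply a parameter. The function names are made up for this example, and real GPU caches are more complicated than this.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines covered by the byte range [addr, addr + size). */
static size_t lines_spanned(size_t addr, size_t size, size_t line_size)
{
    assert(line_size > 0 && size > 0);
    return (addr + size - 1) / line_size - addr / line_size + 1;
}

/* Interleaved: all attributes of vertex `index` form one contiguous record. */
size_t lines_interleaved(size_t index, size_t vertex_size, size_t line_size)
{
    return lines_spanned(index * vertex_size, vertex_size, line_size);
}

/* Separate arrays: each attribute is fetched from its own array, and no
   two arrays share a cache line (assumption of this model). */
size_t lines_separate(size_t index, const size_t *attrib_sizes,
                      size_t attrib_count, size_t line_size)
{
    size_t total = 0;
    for (size_t i = 0; i < attrib_count; ++i)
        total += lines_spanned(index * attrib_sizes[i], attrib_sizes[i],
                               line_size);
    return total;
}
```

For a 32-byte vertex made of 12 + 12 + 8 bytes and an assumed 64-byte line, fetching vertex 1000 touches 1 line when interleaved and 3 lines with separate arrays in this model. When vertices are read strictly in order the gap narrows, because neighbouring elements of a separate array share lines too, so measure before relying on it.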

Conclusion:

From what I experienced, it is best to have cleanly designed interfaces for vertex data sources and 'compiled' VAOs, where one can encapsulate the VAO factory appropriately. This factory can then be altered to initialize interleaved, separate or mixed vertex buffer layouts from data sources, without breaking anything. This is especially useful for profiling.
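One possible shape for the layout-planning part of such a factory is sketched below. It is only an illustration of the idea, not the implementation I used: `LayoutMode`, `AttribSource`, `AttribBinding` and `plan_layout` are hypothetical names, and the sketch assumes attribute sizes that are already alignment friendly (multiples of 4 bytes), so it adds no padding.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LAYOUT_INTERLEAVED, LAYOUT_SEPARATE } LayoutMode;

/* What a data source tells the factory about one attribute. */
typedef struct {
    size_t size; /* bytes per vertex for this attribute */
} AttribSource;

/* Where the factory decided to put that attribute. */
typedef struct {
    size_t buffer; /* index of the buffer that holds it */
    size_t stride; /* bytes from one vertex to the next */
    size_t offset; /* byte offset inside one vertex record */
} AttribBinding;

/* Fills out[0..count) and returns the number of buffers required. */
size_t plan_layout(LayoutMode mode, const AttribSource *src, size_t count,
                   AttribBinding *out)
{
    if (count == 0)
        return 0;
    assert(src != NULL && out != NULL);

    if (mode == LAYOUT_SEPARATE) {
        /* One tightly packed buffer per attribute. */
        for (size_t i = 0; i < count; ++i)
            out[i] = (AttribBinding){ i, src[i].size, 0 };
        return count;
    }

    /* Interleaved: one buffer, stride is the size of a whole vertex. */
    size_t stride = 0;
    for (size_t i = 0; i < count; ++i)
        stride += src[i].size;

    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        out[i] = (AttribBinding){ 0, stride, offset };
        offset += src[i].size;
    }
    return 1;
}
```

The calling code only ever sees the resulting bindings, so switching `mode` for a profiling run does not touch anything outside the factory.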

After all that babbling, my advice is simple: get a proper and sufficiently abstracted design in place first. It comes before optimization, and it is what makes optimization practical.