OpenGL: is it better to batch draw or to have static VBOs?

mk12 · Sep 24, 2011 · Viewed 12.2k times

Which is preferable from an efficiency point of view (or from another point of view, if it's important)?

Situation
An OpenGL application that draws many lines at different positions every frame (60 fps). Let's say there are 10 lines. Or 100 000 lines. Would the answer be different?

  • #1 Have a static VBO that never changes, containing 2 vertices of a line

Every frame would have one glDrawArrays call per line to draw, and in between there would be matrix transformations to position our one line

  • #2 Update the VBO with the data for all the lines every frame

Every frame would have a single draw call

Answer

ssube · Sep 24, 2011

The second is far more efficient.

Changing state, particularly transformation matrices, tends to trigger recalculation of other state and generally more math.

Updating geometry, however, simply involves overwriting a buffer.

With modern video hardware on high-bandwidth buses, sending a few floats across is trivial. The hardware is designed to move large amounts of data quickly; updating vertex buffers is exactly what it does often and fast. If we assume vertices of 32 bytes each (float4 position and float4 color), 100000 line segments (200000 vertices) come to about 6.4 MB, and PCIe 2.0 x16 is about 8 GB/s, I believe, so the copy itself takes on the order of a millisecond.
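To check the numbers in that estimate, here is a minimal sketch in C. The `Vertex` layout (float4 position plus float4 color) is the one assumed in the paragraph above; the function names are made up for illustration, and the transfer time is an ideal lower bound that ignores driver overhead and latency.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vertex layout from the text: float4 position + float4 color. */
typedef struct {
    float position[4];
    float color[4];
} Vertex;

/* Bytes needed for line_count independent segments (2 vertices each). */
size_t line_buffer_bytes(size_t line_count)
{
    return line_count * 2 * sizeof(Vertex);
}

/* Ideal transfer time in milliseconds at the given bandwidth
   (bytes per second). A lower bound only: no overhead is modeled. */
double transfer_ms(size_t bytes, double bytes_per_second)
{
    assert(bytes_per_second > 0.0);
    return (double)bytes / bytes_per_second * 1000.0;
}
```

With 4-byte floats, `line_buffer_bytes(100000)` is 6,400,000 bytes, and `transfer_ms(6400000, 8e9)` comes out at roughly 0.8 ms, a small slice of the 16.7 ms frame budget at 60 fps.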

In some cases, depending on how the driver or card handles transforms, changing one may cause some matrix multiplication and recalculation of other values, including transforms, culling and clipping planes, etc. This isn't a problem if you change the state, draw a few thousand polys, and repeat, but when state changes are frequent, they have a significant cost.

This problem has a well-known solution: batching, which minimizes state changes so that more geometry can be drawn between them. It is used to draw large amounts of geometry more efficiently.
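One common way to batch is to sort the pending draws by the state they need, so that each state is set only once. The sketch below is only an illustration: `DrawItem`, `state_id` and the function names are hypothetical, and it assumes draw order does not matter (for example, opaque geometry with depth testing).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical draw record: state_id stands for whatever state
   (matrix, texture, shader) the draw needs. */
typedef struct {
    int state_id;
    int first;   /* first vertex in the shared buffer */
    int count;   /* number of vertices to draw */
} DrawItem;

static int compare_by_state(const void *lhs, const void *rhs)
{
    const DrawItem *a = lhs;
    const DrawItem *b = rhs;
    return (a->state_id > b->state_id) - (a->state_id < b->state_id);
}

/* Number of state switches needed if items are submitted in array order. */
size_t count_state_changes(const DrawItem *items, size_t n)
{
    size_t changes = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || items[i].state_id != items[i - 1].state_id)
            ++changes;
    }
    return changes;
}

/* Group draws that share a state so each state is set only once. */
void sort_by_state(DrawItem *items, size_t n)
{
    assert(items != NULL);
    qsort(items, n, sizeof *items, compare_by_state);
}
```

For five draws that alternate between two states, submitting them unsorted takes five state switches; after `sort_by_state` it takes two.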

As a very clear example, consider the best case for #1: setting the transform triggers no additional calculation and the driver buffers zealously and perfectly. To draw 100000 lines, you need:

  • 100000 matrix sets (in system RAM)
  • 100000 matrix set calls with function call overhead (to video driver, copying the matrix to the buffer there)
  • 100000 matrices copied to video RAM, performed in a single lump
  • 100000 line draw calls

The function call overhead alone is going to kill performance.

On the other hand, batching involves:

  • 100000 point calculations and sets, in system RAM
  • 1 VBO copy to video RAM. This will be a large chunk, but a single contiguous chunk and both sides know what to expect. It can be handled well.
  • 1 matrix set call
  • 1 matrix copy to video RAM
  • 1 draw call

You may copy somewhat more data, but the VBO contents are unlikely to cost more than the matrices: a 4x4 float matrix is 64 bytes, the same as two 32-byte vertices for one line. Plus, you save a huge amount of CPU time in function calls (200000 down to 2, plus the buffer upload). This simplifies life for you, for the driver (which has to buffer everything, check for redundant calls, optimize, and handle the transfer) and probably for the video card as well (which may have had to recalculate). To make it really clear, visualize simple code for it:

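1:

/* One matrix set and one draw call per line: 200000 calls in total. */
for (i = 0; i < 100000; ++i)
{
    matrix = calcMatrix(i);   /* per-line transform, computed on the CPU */
    setMatrix(matrix);        /* state change, sent to the driver every iteration */
    drawLines(1, vbo);        /* draws the single static 2-vertex line */
}

(now unroll that loop)

2:

/* One matrix set, one buffer upload, and one draw call. */
matrix = calcMatrix();
setMatrix(matrix);
for (i = 0; i < 2 * 100000; ++i)   /* 2 points per line */
{
    localVBO[i] = point[i];
}
setVBO(localVBO);                  /* copy localVBO into the GPU-side vbo */
drawLines(100000, vbo);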
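For something closer to real code, the CPU side of option 2 could look roughly like this. The `Point2`, `Line` and `Vertex` types and the function names are assumptions for illustration, not part of OpenGL; the GL calls appear only in the closing comment because they need a live context, and the vertex attribute setup is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Point2;    /* hypothetical 2D endpoint */
typedef struct { Point2 a, b; } Line;     /* hypothetical line: two endpoints */
typedef struct {
    float position[4];
    float color[4];
} Vertex;                                 /* 32 bytes, as assumed in the answer */

/* Expand a 2D point to a float4 position with a fixed white color. */
static Vertex make_vertex(Point2 p)
{
    Vertex v = { { p.x, p.y, 0.0f, 1.0f }, { 1.0f, 1.0f, 1.0f, 1.0f } };
    return v;
}

/* Writes 2 vertices per line into out (which must hold 2 * line_count
   vertices) and returns the number of vertices written. */
size_t build_line_vertices(const Line *lines, size_t line_count, Vertex *out)
{
    assert(lines != NULL && out != NULL);
    for (size_t i = 0; i < line_count; ++i) {
        out[2 * i]     = make_vertex(lines[i].a);
        out[2 * i + 1] = make_vertex(lines[i].b);
    }
    return 2 * line_count;
}

/* With a live OpenGL context, the per-frame upload and draw would be roughly:
 *   glBindBuffer(GL_ARRAY_BUFFER, vbo);
 *   glBufferData(GL_ARRAY_BUFFER, count * sizeof(Vertex), out, GL_STREAM_DRAW);
 *   glDrawArrays(GL_LINES, 0, (GLsizei)count);
 */
```

The positions are already final when they go into the buffer, so one matrix set covers every line and the whole frame is a single `glDrawArrays` call.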