What are the advantages of using indirect rendering in OpenGL?

viktorzeid · Oct 23, 2013 · Viewed 10.8k times

I read that APIs like glDrawElementsIndirect and glDrawArraysIndirect enable indirect rendering. Indirect rendering differs from direct rendering in that the draw parameters, such as the number of vertices, the number of instances to draw, and the first vertex to read from the buffer object, are supplied through a buffer object that the GPU reads itself rather than being passed by the CPU in the draw call.

I understood that. What I read also said that the advantage is faster rendering because no CPU interaction is involved. But wait, wasn't it the CPU that actually issued the draw call? It still specified the rendering mode (GL_TRIANGLES etc.). It possibly also loaded the vertex attributes.

So is all the performance gain from indirect rendering accounted for by just not having to pass these tiny variables: "count", "primitive count", "first vertex attribute", "instance count"? This doesn't make much sense to me. (It is not changing any state either.)

Answer

Damon · Oct 23, 2013

The performance gain often comes not so much from avoiding passing a small variable like "count" or "instance count", but from the CPU not having to know these values. For the CPU to know them, you must do a round trip from the GPU back to the CPU, which is only possible after the result is available, i.e. after a server sync (and it adds the latency of the bus).

Say you are using transform feedback with a geometry shader. No matter what you feed in, you don't really know what comes out the other end, at least not before the batch has finished and you've queried the counts.
Indirect rendering addresses this: you don't need to know, and actually you don't want to know. The information goes into a buffer object, and the GPU can access it without your intervention.
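To make "the information goes into a buffer object" concrete, here is a minimal CPU-side sketch in C of the record that glDrawArraysIndirect reads from the buffer bound to GL_DRAW_INDIRECT_BUFFER. The struct layout follows the OpenGL specification; the helper names `make_empty_command` and `store_generated_count` are hypothetical and only stand in for whatever GPU pass (for example a compute shader writing to the same buffer) would fill in the count in a real program.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Record read by glDrawArraysIndirect: four tightly packed 32-bit
   unsigned values. The last field is baseInstance since OpenGL 4.2;
   in 4.0/4.1 it was reserved and had to be zero. */
typedef struct {
    uint32_t count;         /* number of vertices to draw */
    uint32_t instanceCount; /* number of instances to draw */
    uint32_t first;         /* index of the first vertex */
    uint32_t baseInstance;  /* base instance for instanced attributes */
} DrawArraysIndirectCommand;

static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "command must be four packed 32-bit values");

/* Hypothetical helper: the command as the CPU would upload it once,
   before it knows how many vertices will be produced. */
DrawArraysIndirectCommand make_empty_command(void)
{
    DrawArraysIndirectCommand cmd = {0u, 1u, 0u, 0u};
    return cmd;
}

/* Hypothetical helper: CPU stand-in for the GPU pass that writes the
   number of vertices it generated. Only `count` changes, and in the
   real case the value never travels back to the CPU. */
void store_generated_count(DrawArraysIndirectCommand *cmd,
                           uint32_t vertices_written)
{
    cmd->count = vertices_written;
}
```

In an actual renderer the CPU would still call glDrawArraysIndirect(GL_TRIANGLES, offset) with a byte offset into that buffer, but it would do so without ever reading `count` back.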

That's analogous to conditional rendering. You could actually skip conditional rendering altogether, couldn't you? Instead of submitting commands to the command queue that will maybe not get executed (how inefficient!), you could run your occlusion query, see whether it passes or not, and then decide whether to submit the objects that you want to draw.
Except this means you must wait until the query (and thus the previous batch) is finished, sync, and do a PCIe transfer before making this decision. During this time, the GPU likely stalls, and afterwards you still have to set up the right buffers/textures and submit the commands. In reality, it is therefore much more efficient to speculatively submit commands and let the driver/GPU decide whether to discard them or draw them.

That's also the idea behind ARB_query_buffer_object, which lets you read a query result into a buffer object.

EDIT:
Also, indirect rendering allows much more efficient submission of batches of render commands (especially in combination with persistent mappings). This may avoid much or all of the server/client and CPU/GPU synchronization that is normally present, the commands may be written by another processor core, and it saves the fixed per-draw-call overhead. See pages 62 onward in Cass Everitt's talk.
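As a rough illustration of batched submission, the following C sketch packs one indexed draw record per mesh into an array, which is the shape of data that glMultiDrawElementsIndirect consumes. The record layout follows the OpenGL specification; `MeshRange` and `build_draw_batch` are hypothetical names invented for this example, and the destination array stands in for a (possibly persistently mapped) indirect buffer.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>
#include <stdint.h>

/* Record read by glDrawElementsIndirect / glMultiDrawElementsIndirect:
   five tightly packed 32-bit values. */
typedef struct {
    uint32_t count;         /* number of indices to draw */
    uint32_t instanceCount; /* number of instances to draw */
    uint32_t firstIndex;    /* offset into the index buffer, in indices */
    int32_t  baseVertex;    /* value added to every fetched index */
    uint32_t baseInstance;  /* base instance for instanced attributes */
} DrawElementsIndirectCommand;

static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "command must be five packed 32-bit values");

/* Hypothetical description of where one mesh lives in shared
   vertex and index buffers. */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  base_vertex;
} MeshRange;

/* Writes one command per mesh into `out` (at most `capacity` records)
   and returns the number of bytes written. Any thread could run this,
   since it only touches plain memory. */
size_t build_draw_batch(const MeshRange *meshes, size_t mesh_count,
                        DrawElementsIndirectCommand *out, size_t capacity)
{
    size_t n = mesh_count < capacity ? mesh_count : capacity;
    for (size_t i = 0; i < n; ++i) {
        out[i].count         = meshes[i].index_count;
        out[i].instanceCount = 1u;
        out[i].firstIndex    = meshes[i].first_index;
        out[i].baseVertex    = meshes[i].base_vertex;
        out[i].baseInstance  = 0u;
    }
    return n * sizeof(DrawElementsIndirectCommand);
}
```

With the records in the buffer bound to GL_DRAW_INDIRECT_BUFFER, a single glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, drawcount, 0) call would issue all of them; a stride of 0 tells OpenGL the records are tightly packed.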