Improving raytracer performance

FeepingCreature · Apr 22, 2009 · Viewed 11.2k times

I'm writing a comparatively straightforward raytracer/path tracer in D (http://dsource.org/projects/stacy), but even with full optimization it still needs several thousand processor cycles per ray. Is there anything else I can do to speed it up? More generally, do you know of good optimizations / faster approaches for ray tracing?

Edit: this is what I'm already doing.

  • Code is already running highly parallel
  • Temporary data is structured in a cache-efficient fashion and aligned to 16 bytes
  • Screen is divided into 32x32 tiles
  • Destination array is arranged so that all subsequent pixels in a tile are sequential in memory
  • Basic scene-graph optimizations are performed
    • Common combinations of objects (plane-plane CSG as in boxes) are replaced with preoptimized objects
  • Vector struct capable of taking advantage of GDC's automatic vectorization support
  • Subsequent hits on a ray are found via lazy evaluation; this prevents needless calculations for CSG
  • Triangles are neither supported nor a priority: plain primitives only, plus CSG operations and basic material properties
  • Bounding is supported

Answer

simon · Apr 22, 2009

The typical first order improvement of raytracer speed is some sort of spatial partitioning scheme. Based only on your project outline page, it seems you haven't done this.

Probably the most usual approach is an octree, but the best approach may well be a combination of methods (e.g. spatial partitioning trees plus techniques like mailboxing). Bounding box/sphere tests are a quick, cheap-and-nasty approach, but you should note two things: 1) they don't help much in many situations, and 2) if your objects are already simple primitives, you aren't going to gain much (and might even lose). A regular grid is easier to implement than an octree for spatial partitioning, but it will only work really well for scenes whose surfaces are somewhat uniformly distributed.

A lot depends on the complexity of the objects you represent, your internal design (i.e. do you allow local transforms, referenced copies of objects, implicit surfaces, etc), as well as how accurate you're trying to be. If you are writing a global illumination algorithm with implicit surfaces the tradeoffs may be a bit different than if you are writing a basic raytracer for mesh objects or whatever. I haven't looked at your design in detail so I'm not sure what, if any, of the above you've already thought about.

Like any performance optimization process, you're going to have to measure first to find where you're actually spending the time, then improve things (algorithmically by preference, then by code bumming as necessary).