In my last rendering architecture post I talked about using geometry caching to ray trace micropolygons. But Psychopath didn't stay there.
It wasn't long after making the first micropolygon renders that I stumbled across the paper "Two-Level Ray Tracing with Reordering for Highly Complex Scenes" by Hanika et al. Instead of using a geometry cache to ray trace diced surfaces, Hanika et al. used something called "ray reordering".
With standard ray tracing you trace one ray at a time, doing all of its intersection tests against all relevant parts of the scene before moving on to the next ray. But with ray reordering you instead queue up many rays at once and attempt (as best you can) to explicitly change the order of the ray tests, with the goal of testing many rays against the same part of the scene at once.
There are many different ways to go about this, but typically it involves some mechanism that lets you pause a ray's traversal through the scene. If you have such a mechanism, you can then pause a ray just before it's tested against an object in the scene. Once the ray is paused, you can move on to other rays in the hopes that some of them will also be paused before that same object. When you have enough rays paused before an object, you can unpause all of them together and do the ray tests for that object all at once.
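To make that concrete, here's a minimal sketch of what such a pause-and-batch mechanism might look like. This is illustrative Rust, not Psychopath's actual code: the `Ray`, `Object`, and `Scene` types and the `BATCH_SIZE` constant are all hypothetical stand-ins.

```rust
const BATCH_SIZE: usize = 256; // tuning knob: how many rays to accumulate

struct Ray {
    origin: [f32; 3],
    dir: [f32; 3],
    t_max: f32, // nearest hit distance found so far
}

struct Object; // stand-in for real scene geometry

impl Object {
    fn intersect(&self, _ray: &mut Ray) {
        // A real intersection test would go here, updating `ray.t_max`
        // if a closer hit is found.
    }
}

struct Scene {
    objects: Vec<Object>,
    queues: Vec<Vec<Ray>>, // one queue of paused rays per object
}

impl Scene {
    /// Pause `ray` just before it would be tested against the object
    /// at `obj_idx`, instead of testing it immediately.
    fn enqueue(&mut self, obj_idx: usize, ray: Ray) {
        self.queues[obj_idx].push(ray);
        if self.queues[obj_idx].len() >= BATCH_SIZE {
            self.flush(obj_idx);
        }
    }

    /// Unpause every ray waiting on `obj_idx` and run their intersection
    /// tests back-to-back, so the object's data only has to be accessed
    /// once for the whole batch.
    fn flush(&mut self, obj_idx: usize) {
        let rays = std::mem::take(&mut self.queues[obj_idx]);
        let obj = &self.objects[obj_idx];
        for mut ray in rays {
            obj.intersect(&mut ray);
            // Here the ray would resume traversal and get re-queued at
            // the next object along its path (omitted).
        }
    }
}
```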
Doing things this way definitely involves some overhead, but the benefit is that—especially if you queue a large number of rays—you end up accessing scene data in a much more coherent way. And that allows you to do some really cool things.
Classically, ray reordering has been used to ray trace scenes that are larger than available RAM: by explicitly trying to access the scene in a coherent way, you can drastically reduce paging to/from disk.
But really, ray reordering can be applied to any situation where accessing scene elements is expensive: it lets you distribute the cost of that access among many rays. Hanika et al. applied it to dicing surfaces into microgeometry. If you re-dice a surface for every single ray test, that's obviously going to be prohibitively slow. But if you re-dice a surface once for a whole batch of rays, the amortized cost per ray can be quite reasonable.
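Sketched as code, the amortization looks something like this. Again, these are hypothetical stand-in types, and `flush_surface` is assumed to be called when a batch of paused rays is released at a surface:

```rust
struct Ray; // stand-in for a real ray (origin, direction, hit info)

struct Surface; // stand-in for an undiced surface (e.g. a curved patch)

struct Micropolygons; // stand-in for diced microgeometry

impl Surface {
    /// Expensive: tessellate the surface into micropolygons.
    fn dice(&self) -> Micropolygons {
        Micropolygons
    }
}

impl Micropolygons {
    /// Cheap per-ray test against the already-diced geometry.
    fn intersect(&self, _ray: &mut Ray) {}
}

/// Dice once, then test every paused ray against the result. With N
/// rays in the batch, each ray effectively pays only 1/N of the
/// dicing cost.
fn flush_surface(surface: &Surface, paused_rays: &mut [Ray]) {
    let micropolys = surface.dice();
    for ray in paused_rays.iter_mut() {
        micropolys.intersect(ray);
    }
}
```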
In a sense, this is just doing the reverse of what a geometry cache does. A geometry cache saves up geometry to (hopefully) be tested against multiple rays. Ray reordering saves up rays to be tested as a (hopefully) large batch against some geometry.
For the second iteration of Psychopath, I did both. I kept the geometry cache, but I also implemented ray reordering as described in the Hanika et al. paper. And the results were interesting.
It turned out that geometry caching and ray reordering complemented each other nicely, each picking up the slack where the other was weak. The ray reordering prevented the hard performance cliff that would otherwise hit when the geometry cache was too small (essentially by accessing the cache in a more coherent way). And the geometry cache reduced the amount of dicing when the ray reordering couldn't extract enough coherence from the ray batch.
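Here's a rough sketch of how the two fit together: when a batch is flushed at a surface, it consults the cache first and only dices on a miss. The types are stand-ins, and the unbounded `HashMap` is just to keep the sketch simple; a real cache would be bounded and evict old entries:

```rust
use std::collections::HashMap;

struct Ray; // stand-in
struct Micropolygons; // stand-in for diced microgeometry

struct Surface {
    id: u64, // unique key for cache lookups
}

impl Surface {
    fn dice(&self) -> Micropolygons {
        Micropolygons
    }
}

impl Micropolygons {
    fn intersect(&self, _ray: &mut Ray) {}
}

struct GeometryCache {
    diced: HashMap<u64, Micropolygons>,
}

fn flush_with_cache(cache: &mut GeometryCache, surface: &Surface, rays: &mut [Ray]) {
    // Because reordering makes consecutive flushes tend to hit the same
    // surfaces, this lookup succeeds far more often than it would with
    // one-ray-at-a-time traversal...
    let micropolys = cache
        .diced
        .entry(surface.id)
        .or_insert_with(|| surface.dice());

    // ...and when it misses, the dicing cost is still amortized over
    // the whole batch rather than paid per ray.
    for ray in rays.iter_mut() {
        micropolys.intersect(ray);
    }
}
```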
So, all in all, this seemed like a great combination. But it turned out there were some pretty severe drawbacks that I didn't realize until I got further along in development. They weren't drawbacks of the combination itself, but rather individual drawbacks of each approach.
But I'll talk about that in another post.