Now that I've covered the drawbacks I ran into with the geometry cache, I'd like to talk about the issue I ran into with ray reordering.
Ray reordering worked extremely well in pretty much every way. It was actually quite fast, it wasn't too complex or difficult to implement, and I was even able to do a pretty nice QBVH implementation thanks to the awesome paper Stackless Multi-BVH Traversal for CPU, MIC and GPU Ray Tracing by Áfra et al.
But the big drawback of ray reordering is that you have to keep ray state small. You have to be able to pause a ray's traversal, which means storing its current traversal state so that you can resume tracing later. And you are typically tracing thousands or even millions of rays at a time, so that state data can't be large.
With a typical BVH this is relatively straight-forward because it's a tree structure. You never have to store where you came from in the tree because it's implicit. No matter where you are in the BVH, you can just go to the parent node, and then the parent's parent node, and so-on.
Or put into different terms, it's a bit like writing an algorithm for walking a maze. There is a classic maze solving algorithm that doesn't require any state to be stored other than the current position: all you do is follow the wall on your right. And this works great as long as the maze is simply connected. By analogy, that's more-or-less how Psychopath handled things, and it allowed ray state to be very small.
Then I decided to introduce instancing (and full, arbitrarily deep, hierarchical instancing at that). When I did this, everything that allowed ray reordering to work came crashing down.
The fundamental problem is that when you introduce instancing you change the BVH from being a tree structure to being a DAG structure. Many nodes will then have more than one parent. Or to use the maze analogy again, it's like introducing loops and overpasses into the maze. When you do that, suddenly you have to start recording additional information about your traversal to do it correctly. And worse, that information can grow to be arbitrarily large.
When tracing thousands or millions of rays at a time, having unbounded storage for each ray is really not feasible. It also introduces memory allocation into the traversal process, which isn't ideal.
To get around this I considered limiting instancing to being a single level deep. That would put simple bounds on the amount of state information. But I really, really wanted hierarchical instancing. As a production feature I think it's really useful (imagine instancing leaves on a tree, and then instancing trees in a forest).
So I banged my head against the wall for a week or so, trying to figure out if there was a way I could reasonably make hierarchical instancing work with ray reordering. In practice, people aren't actually going to make arbitrarily deep instancing hierarchies, so I considered exploiting that in some way. But ultimately all the solutions I came up with seemed overly complex and/or brittle. I wanted things to Just Work™.
In the end, I abandoned ray reordering for something called breadth first ray tracing. I had already been using breadth first ray tracing in Psychopath's surface splitting code, so I had a reasonably good handle on it. And it had some really cool properties all its own. But I'll talk about that in another post.