Psychopath Renderer

a slightly psychotic path tracer

BVH4 Without SIMD

I recently switched the Rust version of Psychopath over from a binary BVH (BVH2) to a 4-branching BVH (BVH4). This is a fairly standard thing to do in a ray tracer to take advantage of SIMD operations on modern CPUs. With a BVH4 you can test against four bounding boxes at once with SIMD, which in theory can get you some pretty significant speedups.

However, my BVH4 implementation in Psychopath doesn't use SIMD, and is nevertheless faster than (or at least comparable to) my BVH2 implementation. This surprised me because I thought (?) the conventional wisdom was that a BVH4 is slower except when used to exploit SIMD.

So I'm writing this post to explain how I arrived here and why BVH4 might always be better.

A Back-of-the-Napkin Analysis

One of the things that makes BVHs (regardless of branch factor) a bit different than some other tree data structures is that in a BVH you have to test all of the children to figure out which ones to traverse into. You don't have to test them all at once—you can test against one child and immediately traverse into it if the ray hits it. But eventually you have to come back and test against the other sibling(s).

The reason for this is that BVH child nodes can overlap in space. Just because you found an intersection in one doesn't rule out a closer intersection in its siblings, so you still have to traverse into any that were hit to check for a closer intersection. This is true even if you test the nodes in front-to-back order.

(Note: in my discussion with François Beaune (of Appleseed fame), he noted that in theory you could store information about whether the children are overlapping or not. However, I don't think that changes this analysis, because you still on average have to test against multiple children, and such a trick would apply to all branch factors anyway. But I've been wrong before!)

So let's say we have a scene with sixteen triangles (or whatever geometry—I'll assume triangles). If we color in the nodes we test against in a minimal traversal that reaches a leaf, it looks like this for a BVH2 and a BVH4:

BVH minimal leaf traversal

Interestingly, it turns out that it's nine nodes in both cases. And this equality holds true for any scene with any number of triangles.

For a BVH2 with N triangles, the number of nodes tested in this kind of minimal traversal can be computed with log2(N) * 2 + 1. And for a BVH4 with log4(N) * 4 + 1.

You can generalize this to any branch factor with logB(N) * B + 1, where B is the branch factor. If you take that equation and graph the number of node tests for various branch factors, you get a graph that looks like this:

BVH cost graph

(X-axis is the branch factor B, and Y-axis is the number of node tests. Note that although this particular graph assumes a fixed N, the patterns hold for all values of N.)

There are a couple of things to note about this graph:

  1. It shows that, yes indeed, branch factors of 2 and 4 do the same number of node tests. Or put another way: the equations I gave above for a BVH2 and BVH4 are equivalent.
  2. When the branch factor exceeds 4, it starts doing worse and worse (more node tests). This is consistent with general wisdom that larger branch factors are progressively worse for efficiency. It just appears to not kick in until after a branch factor of 4.

(Also: 3. It seems to be the case that a branch factor of 3 is actually the most efficient. I haven't explored this at all, but it would be really cool if someone else would—I'd be very curious to see if it actually performs better. The tricky bit, I think, will be constructing a high quality tree with that branch factor.)
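
If you want to play with those numbers yourself, here's a tiny sketch that just evaluates logB(N) * B + 1 over a range of branch factors (the fixed N is arbitrary—the shape of the curve is the same for any N):

fn node_tests(n: f64, b: f64) -> f64 {
    // Number of node tests in a minimal root-to-leaf traversal.
    n.log(b) * b + 1.0
}

fn main() {
    let n = 4096.0; // arbitrary fixed number of leaf triangles
    for b in 2..=16 {
        println!("branch factor {:2}: {:.1} node tests", b, node_tests(n, b as f64));
    }
}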

Implementation in Psychopath

There is still more analysis to cover, but this is the point where my curiosity spurred me to try implementing it in Psychopath. So I'm going to take a brief detour to show what happened.

The implementation is pretty straightforward. To build the BVH4, I first construct a BVH2 and then collapse it down to a BVH4. I believe this is the standard way of constructing BVH4s. To traverse the BVH4 I use the technique described in Efficient Ray Tracing Kernels for Modern CPU Architectures by Fuetterling et al. to determine which order to traverse the nodes in (see section 2.2 of the paper). Importantly, that technique results in the exact same traversal order as my BVH2 implementation.

The primary implications of constructing and traversing the BVH4 this way are:

  1. The topology and bounding boxes of the BVH2 and BVH4 are equivalent: the BVH4 is just missing half of the inner nodes, skipping every other level of the BVH2.
  2. As mentioned above, the traversal order of the BVH2 and BVH4 are identical.
  3. The leaf data (e.g. triangles) tested against due to the BVH2 and BVH4 are also identical.

In other words, the only difference between the BVH2 and the BVH4 is the branch factor itself.
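
For illustration, here's roughly what that collapse looks like on a toy tree representation (these types are simplified stand-ins, not Psychopath's actual node layout):

#[derive(Clone, Copy)]
struct BBox; // stand-in for an axis-aligned bounding box

enum Bvh2 {
    Inner { bounds: BBox, children: [Box<Bvh2>; 2] },
    Leaf { bounds: BBox, tri_indices: Vec<usize> },
}

enum Bvh4 {
    Inner { child_bounds: Vec<BBox>, children: Vec<Bvh4> },
    Leaf { bounds: BBox, tri_indices: Vec<usize> },
}

fn bounds_of(node: &Bvh2) -> BBox {
    match node {
        Bvh2::Inner { bounds, .. } | Bvh2::Leaf { bounds, .. } => *bounds,
    }
}

// Collapse a BVH2 into a BVH4 by pulling each node's grandchildren up one
// level, effectively skipping every other level of the BVH2.
fn collapse(node: &Bvh2) -> Bvh4 {
    match node {
        Bvh2::Leaf { bounds, tri_indices } => Bvh4::Leaf {
            bounds: *bounds,
            tri_indices: tri_indices.clone(),
        },
        Bvh2::Inner { children, .. } => {
            // Gather up to four grandchildren (fewer if a child is a leaf).
            let mut grandkids: Vec<&Bvh2> = Vec::new();
            for child in children.iter() {
                match child.as_ref() {
                    Bvh2::Inner { children: gc, .. } => {
                        grandkids.push(gc[0].as_ref());
                        grandkids.push(gc[1].as_ref());
                    }
                    leaf => grandkids.push(leaf),
                }
            }
            Bvh4::Inner {
                child_bounds: grandkids.iter().map(|n| bounds_of(*n)).collect(),
                children: grandkids.iter().map(|n| collapse(*n)).collect(),
            }
        }
    }
}

A child that is already a leaf just gets carried up as-is, which is why a collapsed node can end up with fewer than four children.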

Once I got everything working I did a bunch of test renders on various test scenes I have. Here are some of those renders (all rendered at 16 samples per pixel):

630 cars (Model by 'thecali', available on BlendSwap.)

drone (Model from SKYWATCH, a super cool short film.)

city

cornell with character

I rendered all of the test scenes with both the BVH2 and BVH4 to compare performance. Here are the render times (excluding BVH build time), calculated as best-out-of-five:

        Cars     Drone   City     Cornell(ish)
BVH2    31.074s  8.392s  22.824s  4.405s
BVH4    30.967s  8.357s  22.882s  4.357s

The only scene where the BVH2 beats the BVH4 is the city scene, and that's a bit suspect because it was an anomalously fast render among its five runs. But the main point is this: it appears that the non-SIMD BVH4 does indeed perform at least comparably to the BVH2.

A More Complete Analysis

The results from the render tests are largely consistent with the initial analysis. But there is nevertheless something important missing from that analysis: it assumes that for every parent, precisely one child will need to be traversed into. No more, and no less.

What if none of the children are hit? In that case the number of node tests looks like this:

no children hit

At first blush, this suggests that the BVH4 should be worse. Except... the render tests didn't bear that out. And that's because there is another case: all child nodes being hit. And that looks like this:

all children hit

So the reality is that there are three cases for the BVH2:

children hit table

Basically, the BVH2 has both a better best case and a worse worst case. While the BVH4 always tests against all four children, the BVH2 can test against as few as two and as many as six children and grandchildren. Nevertheless, the average case is still the same for both the BVH2 and BVH4: four.

However! That's a very loose definition of "average". We're assuming that all cases are equally likely, but in reality that depends on the particular BVH and the particular rays being tested. This, of course, is the kind of thing that makes you curious: how many ray-node tests are actually being done in the test scenes?

So I instrumented the code to count all ray-node tests, and this is the result:

           Cars         Drone       City        Cornell(ish)
BVH2       10619320877  2231421620  2788802110  386592706
BVH4       10461971888  2153069143  2652842455  334467875
% of BVH2  98.5%        96.5%       95.1%       86.5%

I don't claim that these are exhaustive test scenes by any means. For one thing, they're all shaded with gray lambert, and maybe the distribution of rays with other materials would change things. But nevertheless, this is at least interesting. In all of the scenes I tested, the BVH4 did fewer tests than the BVH2. Sometimes by a tiny margin, but it was always less.

Wrapping Up

So that's it. BVH4 appears to be at least comparable to BVH2 (if not a hair faster) even without SIMD. So for a more robust ray tracer implementation, BVH4 may be worth exploring. And not only for the potential (tiny) perf gains, but also because higher branch factors require fewer nodes and thus take up less memory.

Of course, there are some potentially confounding factors here:

  • Psychopath does breadth-first ray tracing with thousands of rays at once, which means that the cost of any per-node computations other than ray tests gets amortized over many rays. Specifically, I'm referring to the traversal code lookup from Efficient Ray Tracing Kernels for Modern CPU Architectures, which might have more relative overhead in a more typical ray tracer. Moreover, if you don't use the traversal code approach, then you have to sort the four children based on ray hit distance, which is (super-linearly) more expensive than sorting two children.
  • My test scenes are far from definitive! Testing with more production-like scenes, with complex shaders, depth complexity, etc. is still needed to make a strong case.
  • I haven't done any analysis of the "overlap" flags idea that François suggested, much less any testing of it with actual scenes. I'm skeptical that this would have any branch-factor-specific impact, but I've been known to be very wrong about such things in the past! And at the very least, it's an interesting idea to explore.

I guess I'm not trying to make any especially strong claims with this (at least not in writing—buy me a beer some time). And since, if you're writing a BVH4, you might as well use SIMD anyway, most of this is pretty moot. But for me this was nevertheless really fascinating to discover.

I hope this write-up made for an interesting read. And if anyone takes a crack at replicating these results, especially with more production-like scenes, I would be very interested to hear about it!

Switching from C++ to Rust

I'll get right to it: I've rewritten Psychopath in Rust. And I feel like this requires an explanation.

The short version is that Psychopath is a personal project that I work on for fun, and I had fun rewriting it in Rust. The rest of this post is the long version, as seen through the lens of my experiences developing in Rust.

But first I want to be really, really clear that I do not advocate rewriting things in Rust. It's apparently a meme to say that things should be rewritten in Rust (or so I hear... I haven't actually witnessed it much). If you have a working code-base that's actually used for real things, then rewriting it in Rust (or any language) requires strong justification. In my case, Psychopath isn't used for real things, so it doesn't take much justification aside from "I felt like it".

With that out of the way, let's dive in!

The good

I'm going to first talk about the things I like most in Rust. But don't worry, we'll get to the bad stuff too!

Cargo

The thing that seems to get talked about most with Rust is its memory safety. But for me, that's actually pretty far down the list of things I like about Rust. What I've enjoyed most, by far, about Rust is the tooling around it.

In C++, if I want to build a project that has a bunch of dependencies it's... kind of a pain. Unless your OS ships with the needed dev dependencies, you're going to need to download and build them all, make sure whatever build system the code-base uses can find them, and probably wrestle with a bunch of other fiddly things. It's a pain. And it's also recursive, because the dependencies may have dependencies. And it's also a pain to help other people when it's your code-base that they're trying to build.

In Rust, you download the code and run:

cargo build --release  

That's it.

Cargo is Rust's official build tool, and it automatically downloads, caches, and builds the needed dependencies (both direct and indirect) and links them into your build. The only time I've seen it get messy is when there are indirect C/C++ dependencies (e.g. via wrappers for C/C++ libraries). But even then, it's no messier than a standard C or C++ code base, and only for those dependencies.

The extent to which that makes my life easier is difficult to overstate. It feels like I can breathe again. Want to quickly try using a library in your code-base? Super fast and easy. Decide you don't like it? Get rid of it just as quickly.

Other tooling

I could go on and on about the tooling, but I'll summarize by pointing out two other things that Rust comes with:

  • Documentation generation
  • Unit testing

Having a standard way to generate documentation from your code-base is great, and it has a trickle-down effect on the ecosystem where more things seem to be documented than not. (Admittedly, the quality of the documentation varies.)

And adding unit tests is as simple as adding the test attribute to a function:

#[test]
fn my_test() {  
    assert!(some_condition);
}

Having these kinds of basic things included just makes life easier.

In short, Rust comes with all the basics of modern tooling for a programming language. And it's done well.

No headers

Another thing that I love about Rust is the lack of header files. I hate header files. They're awful. I don't care what anyone says.

Instead of header files, Rust has a really nice module system. Some people seem to get confused by it, but it made perfect sense to me from the beginning. But even if it's a little confusing at first, it's still far better than headers. I don't have to write function signatures twice, keep them in sync, etc.
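
As a tiny illustration (with hypothetical module and function names), splitting code into a module is just this—no separate declarations to keep in sync:

// main.rs
mod bvh; // tells the compiler to pull in bvh.rs as a module

fn main() {
    bvh::build();
}

// bvh.rs
pub fn build() {
    // ...
}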

Strict typing

C and C++ have strict-ish typing, which is nice. But Rust has super-duper strict typing, which is even nicer. Combined with some other aspects of Rust, it makes refactoring a much more confident exercise.

The super strict typing is also the foundation of Rust's type-inference. Within a function, Rust is able to infer most of your types. So there's almost a contradiction in that Rust's super strict typing actually lets you worry about and wrangle with types less. Both more correct and easier.

(As an aside: Rust does not do inter-function type inference: you have to specify the types in function signatures. Essentially, Rust believes in API boundaries being explicit. This keeps things easy to reason about for the programmer and prevents APIs from accidentally changing.)
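
Here's a small example of both halves of that: the signature is fully explicit, while everything inside is inferred:

// The types in the signature must be spelled out...
fn average(xs: &[f32]) -> f32 {
    // ...but inside, the compiler works the types out on its own.
    let mut sum = 0.0; // inferred as f32 from its use below
    for &x in xs {
        sum += x;
    }
    sum / xs.len() as f32
}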

Memory safety

Okay, you knew it was coming. I do, in fact, like the memory-safe-by-default approach that Rust takes. I spent days pulling my hair out tracking down segfaults and other weird behaviors in the C++ version of Psychopath, and it's just not fun.

In Rust, this rarely happens. And when it does happen it's only because I dropped into unsafe code somewhere, so I know where to start looking. Unsafe code aside, the worst that happens is a run-time panic with a stack trace showing exactly where things went wrong. And usually it doesn't even make it that far. Usually it's a compile error.

But I don't want to over-state this. Memory safety is nice, but I would use Rust even without it. Rust has a lot more going for it than just memory safety. It's a really well-rounded language... assuming you want to do low-level coding.

The bad

So, Rust isn't all sunshine and rainbows, unfortunately. And I would be remiss if I didn't talk about the things I find frustrating.

No unions

(UPDATE: Rust now has unions, as of release 1.19. Yay!)

This is actually on its way, so this will likely be outdated quickly. But up to this point Rust doesn't have unions.

That's not quite as bad as it sounds, because Rust does have sum types (using the enum keyword) which are essentially tagged unions and really nice. But when working on e.g. the BVH in Psychopath it would have been really nice to use the lower-level version of unions to really squeeze those extra bytes out of the node structs.
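
To illustrate the difference (these layouts are purely illustrative, not Psychopath's actual node structs):

// A sum type: safe and ergonomic, but the discriminant tag costs space.
enum NodeData {
    Inner { children_index: u32 },
    Leaf { triangle_index: u32 },
}

// A raw union (stable as of Rust 1.19): the fields overlap in memory, and
// it's up to you to track which one is live (reading a field is unsafe).
union NodeDataRaw {
    children_index: u32,
    triangle_index: u32,
}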

No SIMD intrinsics

Stable Rust has no support for SIMD intrinsics. You can use them on the unstable 'nightly' compiler, but there's no current track for stabilizing them as far as I'm aware.

What this means for Psychopath is that I have a flag you can pass to cargo that tells it to build alternate SIMD versions of some code, but you can only pass that flag (and have it successfully build) if you're using nightly.

EDIT: on the reddit thread stusmall points out that SIMD is planned to be stabilized at some point and is considered "on the horizon". What I meant to say is that there isn't a clear timeline for it. It is considered important and will be stabilized eventually, but it's anyone's best guess when that might be.
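
Concretely, the flag just selects between cfg-gated versions of the relevant functions. A sketch of the pattern (the feature name and function are illustrative, not Psychopath's actual code):

// Scalar version, always available on stable.
#[cfg(not(feature = "simd_perf"))]
fn add_vec4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

// Alternate version, only compiled when building with the simd_perf cargo
// feature enabled on a nightly compiler, where the body could use SIMD
// intrinsics instead.
#[cfg(feature = "simd_perf")]
fn add_vec4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}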

Slow build times

This isn't so much of a hit against Rust compared to C++ (which also has slow build times), but hearing about e.g. Go and D's build times makes me really jealous. And the language that Jonathan Blow is developing seems to just blow things out of the water.

As a new language without all the historical baggage, I would have hoped for Rust to be much faster at building. The Rust team is definitely working on improving the situation, but it is so far off from where I feel like it ought to be that the incremental improvements they're making seem like drops in a bucket.

Mind you, it's entirely possible that I'm being overly pessimistic here. So take it with a grain of salt. But that's how I feel about it at the moment.

Fortunately, Psychopath isn't that large of a code-base, and I've also split it into multiple sub-crates which definitely helps. Building Psychopath itself (assuming external dependencies are already built) takes around 17 seconds on my machine. So it's not terrible, and is about what I would expect from C++ as well.

Immature ecosystem

There aren't a lot of stable libraries out there for Rust. Fortunately, this is being actively worked on. But even so, rendering and vfx is a pretty niche space. And there are just so many things written in C++ for rendering and VFX that are unlikely to get robust Rust equivalents any time soon. It makes writing a renderer in Rust a little daunting.

Of course this isn't Rust's fault by any means. But it is a reality.

The saving grace (perhaps) is that Rust's C FFI story is excellent. Another fellow and I have been working on a Rust wrapper for OpenEXR and it has been a largely pleasant experience. There are some fiddly bits to get right since OpenEXR is C++ rather than C. And it does take time. But it takes a lot less time than writing a full-featured OpenEXR library from scratch.

So I suspect that writing wrappers is what I'll be doing when I need non-trivial functionality from C++ libraries. But that's certainly not as straightforward as just using a library directly. And it somewhat negates the very first positive point I mentioned about easy builds.

On the bright side, if I'm successful at this, then many of those libraries will already be wrapped for other intrepid developers. Eventually.

Fighting the borrow checker

The borrow checker is the part of Rust's compiler that makes sure ownership rules are properly followed. It is the basis of Rust's memory safety guarantees, which is awesome. But it also takes some getting used to.

By and large, when the borrow checker has refused to compile something I wrote, it's because it was right and I was wrong. So my complaint isn't actually that the borrow checker exists, but rather it's that the error messages are sometimes... confusing. And it's also not always clear how to change things to satisfy it.

Mind you, the error messages are nowhere near as bad as template meta-programming errors in C++. And it's certainly better than tracking down mysterious memory safety bugs. And the borrow checker errors have made great strides in getting better. And actually, most of the time they're pretty clear these days...

...but even so, it's a bit of a hurdle that you have to get over. And every once in a while the borrow checker throws you a curve ball that's just really confusing, and that can be really frustrating.

Having said all that, I left the borrow checker until last for the same reason I put memory safety last in the list of good things: I think its importance is overblown. A lot of people seem to talk a lot about it, but I think it's a minor annoyance.

With most new languages there are some new concepts or ways of thinking about things that you have to wrap your head around, and Rust is no different. But if you code in C++, I can't imagine it being a significant hurdle. It might piss you off for the first week, and then you'll get over it.

Overall

I'm certainly not ready to recommend Rust to anyone doing serious rendering or vfx work yet, mostly because of the lack of unions and SIMD in the stable compiler. And I think it absolutely makes sense for people to be cautious and see how things develop over the next few years.

However, I absolutely do recommend giving Rust a spin for a non-trivial fun project. The immature ecosystem might bite you depending on what you're trying to do, but I think the experience of trying to create something in Rust is kind of... eye-opening. You'll likely be a bit frustrated by the borrow checker at first, but once you get over that I think you're equally likely to find programming in Rust really fun. And Rust as a language is clearly extremely capable overall.

Subdivision Surfaces

I recently implemented Catmull-Clark subdivision surfaces in Psychopath.

One of my goals with Psychopath is for all curved surfaces to actually render as curved surfaces rather than as tessellated triangle meshes. To accomplish this for subdivision surfaces I am leveraging OpenSubdiv to pre-process the cage mesh into b-spline patches. Psychopath then converts the b-spline patches into bezier patches which it already knows how to render.
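
For the curious, that conversion is just a fixed change of basis applied to each row and then each column of a patch's 4x4 control grid. Here's a sketch for one row of four control points, assuming uniform knots (which is my understanding of what these regular b-spline patches boil down to):

// Convert four uniform cubic b-spline control points (one row or column of
// a patch) into the equivalent cubic bezier control points.
fn bspline_to_bezier(p: [[f64; 3]; 4]) -> [[f64; 3]; 4] {
    let mut b = [[0.0f64; 3]; 4];
    for i in 0..3 {
        b[0][i] = (p[0][i] + 4.0 * p[1][i] + p[2][i]) / 6.0;
        b[1][i] = (2.0 * p[1][i] + p[2][i]) / 3.0;
        b[2][i] = (p[1][i] + 2.0 * p[2][i]) / 3.0;
        b[3][i] = (p[1][i] + 4.0 * p[2][i] + p[3][i]) / 6.0;
    }
    b
}

Applying that to the four rows, and then to the four columns of the result, yields the sixteen bezier control points of the patch.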

The end result is this:

Pontiac GTO '67

(Model by 'thecali', available on BlendSwap—please note that the version in the render above has some parts missing, as some bits were modeled with curves which Psychopath can't render yet.)

In other words, I can finally render interesting models! Yay! Most things are modeled with Catmull-Clark subdivision surfaces these days, so this is a big step.

The only down-side so far is that OpenSubdiv seems to be a bit on the slow side when converting the mesh to b-spline patches—it takes around 45 seconds for this model. It's possible I'm not using the library optimally. It's certainly something I'll need to investigate. But I'm quite pleased for now!

Spectral Rendering

Up to this point Psychopath has represented color with RGB values. This is a convenient way to handle color in a renderer because most color input to the renderer is going to be in some kind of RGB format (textures, hand-specified colors, etc.) and most color output the renderer produces is going to be in some kind of RGB format as well (final rendered images). So it makes sense that if your input is RGB, and your output is RGB, probably everything in-between should be RGB as well.

But RGB isn't how real light works. Real light is composed of a spectrum of wavelengths, much the same way that sound is. RGB is a bit like sampling audio at only three frequencies.

Even so, rendering with RGB usually works fine because human color vision is also more-or-less RGB. But there are spectral effects whose causes can't be seen directly with the naked eye, yet whose results can. Fluorescent lighting is a good example: the effect it has on colors can't be accurately reproduced in RGB without also tweaking the colors of the materials. Prisms and rainbows are another example.

Admittedly, these examples are largely corner cases that can be faked, and I'm designing Psychopath with animation and VFX in mind—both of which are cases where artistic control is more important than strict accuracy. But nevertheless, being able to render spectral effects is cool! And it might come in handy when trying to match scene lighting for a VFX shot in a fluorescent-lit office building.

So I decided to make Psychopath a spectral renderer.

There are two broad approaches to spectral rendering: binning, and Monte Carlo spectral sampling.

Binning is basically RGB on crack. You can think of RGB color as three spectral samples: one in the reddish part of the spectrum, one in the greenish part, and one in the blueish part. Binning formalizes and generalizes this concept. Instead of three vaguely defined samples, you have N very specific ranges or "bins" over the visible spectrum. Anywhere from 9 to 30 bins is common with this approach.

Monte Carlo spectral sampling takes a completely different approach. Instead of dividing up the spectrum into predefined regions, each light path gets a randomly assigned wavelength. Then as more and more paths are traced, more and more points on the spectrum get sampled, and the image converges to the correct colors in much the same way as everything else in a Monte Carlo path tracer.

Both approaches have their pros and cons.

The biggest pro of the binning approach is that colors still have a clear canonical representation within the renderer: every color in the renderer is specified by a set of N values. Simple, easy, obvious.

In the Monte Carlo approach, colors don't have a single canonical way to be represented. However, that is also one of the pros of the Monte Carlo approach: you can have many different ways to represent color, and as long as they can all be sampled spectrally, they can all coexist effortlessly within the same renderer. Monte Carlo sampling is also more accurate than binning. For example, if you rendered a scene with a prism, the resulting image from a binning approach would only have as many colors as there are bins. But a Monte Carlo approach would, when converged, reveal all the colors of the rainbow in a smooth gradient. Finally, the Monte Carlo approach takes up less memory (only one value instead of 9-30), and might be faster depending on a variety of factors.

In the end, I decided to go with the Monte Carlo approach for Psychopath. The combination of being more accurate and taking less memory was very attractive. It also strikes me as more elegant: since I'm already solving the rest of the rendering equation with (Quasi-) Monte Carlo, why not just extend that into the color spectrum dimension as well?

However, the Monte Carlo approach has one big down side: color noise. Below is a spectral render of a solid medium-gray background with one sample per pixel.

One sample spectral render

As you can see, the resulting render is not a good representation of the underlying color in the scene. Instead of gray, we get rainbow. Even at 16 samples per pixel, the color noise is still quite visible:

Sixteen sample spectral render

With this approach it takes a lot of samples to make the color noise go away. And really, we aren't using our ray tracing very efficiently here: most visual phenomena in the real world are not significantly spectral in nature, and yet we're taking the time to trace an entire light path for each single spectral sample.

There is a clever solution to this problem in the paper "Hero Wavelength Spectral Sampling" by Wilkie et al. The basic idea is simple: instead of just a single wavelength per light path, have several. One of the wavelengths is randomly chosen (the "hero" wavelength), and the others are evenly spaced apart from that to cover the visible spectrum.
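
Here's a minimal sketch of how the wavelengths for a single path might be generated under that scheme (the visible-range bounds and the wrap-around details are my own assumptions, not taken from the paper):

// Given a random number u in [0, 1), produce n wavelengths: one randomly
// chosen "hero" wavelength, plus the rest evenly rotated across the visible
// range, wrapping around at the upper end.
fn hero_wavelengths(u: f64, n: usize) -> Vec<f64> {
    const WL_MIN: f64 = 380.0; // nm
    const WL_MAX: f64 = 700.0; // nm
    let range = WL_MAX - WL_MIN;
    let hero = WL_MIN + u * range;
    (0..n)
        .map(|i| {
            let wl = hero + (i as f64 / n as f64) * range;
            if wl > WL_MAX { wl - range } else { wl }
        })
        .collect()
}

With u drawn from the path's sample sequence and n = 4, this gives the hero wavelength plus three evenly spaced companions.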

The result of this method with four wavelengths per light path at one sample per pixel looks like this for the gray background:

One sample hero spectral render

And looks like this at sixteen samples per pixel:

Sixteen sample hero spectral render

The difference is enormous. Instead of unrecognizable color at one sample per pixel, we can clearly make out the intended color, albeit with noise. And at sixteen samples per pixel the color noise is all but gone.

The hero wavelength paper is very much worth a read, because it also covers how to properly handle cases with strong spectrally-dependent effects. But even just this simple idea makes a big difference, and makes Monte Carlo spectral sampling practical.

One last practical matter: given that artists aren't going to manually specify an entire spectrum for every color they use, the renderer needs a way to convert RGB input into a matching light spectrum automatically. There are a few approaches out there to do this, but the one I'm using in Psychopath is from the recent paper "Physically Meaningful Rendering Using Tristimulus Colours" by Meng et al. The awesome thing about the algorithm they develop in the paper is that it accurately covers (almost) the entire XYZ color space. This is significant because that means it can be used for converting colors from any color space: just convert to XYZ first. This is not the case for any of the other approaches I'm aware of.

Basic Shading System

Psychopath now has a basic shading system! Yay!

As you can see in this artistically-awful image, different objects can now have different shaders assigned to them:

Multiple Materials

This was actually fairly easy to get up and running. It just required some design work. I did indeed go with shading being done at intersection time, as described in my last post.

This basic shading system does not use OSL, and doesn't even support texturing of any kind. Each shader just produces a single fixed BSDF. But building this basic version of a shading system helped me work out the machinery behind how shaders get assigned to objects and how they are processed during rendering.

One of the cool features of this shading system is that shaders are assigned hierarchically. Any node in the scene graph (more-or-less) can have a shader assigned to it, and all nodes under such a node will inherit that shader—unless they themselves have been assigned a different shader. This makes it easy to assign a single shader to the entire scene, for example.
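
Conceptually, resolving which shader a node ends up with is just a recursive walk down the scene graph. A sketch (with made-up types, not Psychopath's actual scene graph):

#[derive(Clone, Copy)]
struct ShaderId(u32);

struct SceneNode {
    shader: Option<ShaderId>, // explicit assignment, if any
    children: Vec<SceneNode>,
}

// Record the shader each node effectively uses: its own assignment if it
// has one, otherwise the one inherited from its ancestors.
fn resolve_shaders(node: &SceneNode, inherited: ShaderId, out: &mut Vec<ShaderId>) {
    let effective = node.shader.unwrap_or(inherited);
    out.push(effective);
    for child in &node.children {
        resolve_shaders(child, effective, out);
    }
}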

Designing a Shading System

One of Psychopath's current limitations is that every surface has to have the same shader. This is obviously not workable if it's going to be used as a "real" renderer, so I need to design a shading system that allows different shaders to be assigned to different objects.

Eventually I want to use Open Shading Language for shading in Psychopath, but as a first pass I'm going to do something a little simpler to get my feet wet. I've never written a shading system, after all.

But even so, since Psychopath traces many rays at once there are actually some interesting choices to make here which will carry over to how Psychopath utilizes OSL. There are two broad approaches I can take:

  1. Do the shading when a ray hits something. This would require storing the result of the shading in the intersection structure, which would be a BSDF closure.

  2. Do the shading after all of the ray intersection tests have been completed. This would require storing all information necessary to do the shading in the intersection structure, which would be things such as a reference to the shader and any required differential geometry data.
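
Roughly, the difference comes down to what the intersection structure has to hold. A sketch with made-up type names (not Psychopath's actual structs):

struct BsdfClosure; // stand-in for a surface response ready to be evaluated
struct ShaderRef;   // stand-in for a handle to a shader to run later

// Approach 1: shade at hit time, store only the shading result.
struct IntersectionShaded {
    t: f32,            // hit distance along the ray
    bsdf: BsdfClosure, // output of the surface shader at the hit point
}

// Approach 2: defer shading, store everything the shader will need later.
struct IntersectionDeferred {
    t: f32,             // hit distance along the ray
    shader: ShaderRef,  // which shader to run
    uvs: Vec<[f32; 2]>, // one entry per UV set, potentially unbounded
    // ...plus whatever other arbitrary geometry data the shader might want.
}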

The biggest benefit of the first approach is that it keeps the architecture a bit simpler. Displacement shaders will have to be evaluated during ray traversal anyway, so the first approach would keep all of the shading code in the same general part of Psychopath.

The biggest benefit of the second approach is that the shading evaluations can be reordered to maximize e.g. coherent texture access.

I've been going back-and-forth trying to decide which approach to take, but I think I've finally settled on the first approach. The first approach gives me more flexibility in how much data I store, and since Psychopath is using breadth first ray tracing there are already strong guarantees about accessing shaders and their associated data in a coherent way.

The main problem with the second approach is that the amount of data that needs to be stored is unbounded. If a mesh has e.g. multiple UV sets, then Psychopath has to store intersection data about all of those UV sets to be able to do the shading later. And I'd also like Psychopath to be able to support arbitrary geometry data for use in shaders, which would exacerbate the problem. It won't be possible to predict ahead of time how much data needs to be stored.

With the first approach, on the other hand, the amount of data can be bounded. There is still the possibility (even likelihood) of multiple mixed closures. But because they are closures, I have more flexibility. I can take a Monte Carlo approach and, after shading, select a subset of them to be stored along with the appropriate weighting information. The downside to that, of course, would be increased noise. But the max number of closures to store could be a render setting, which would allow the user to decide what to do with that memory/noise tradeoff.

The biggest problem with the first approach, however, is that the shader will need to be evaluated at every ray hit during traversal, rather than just the final hit. And especially for non-trivial shaders in a complex scene, that could have a non-negligible performance impact.

But in the end, I think it's a reasonable tradeoff. I would rather keep memory usage predictably bounded, as one of Psychopath's goals is to be able to render very large scenes.