Psychopath Renderer

a slightly psychotic path tracer

Switching from C++ to Rust

I'll get right to it: I've rewritten Psychopath in Rust. And I feel like this requires an explanation.

The short version is that Psychopath is a personal project that I work on for fun, and I had fun rewriting it in Rust. The rest of this post is the long version, as seen through the lens of my experiences developing in Rust.

But first I want to be really, really clear that I do not advocate rewriting things in Rust. It's apparently a meme to say that things should be rewritten in Rust (or so I hear... I haven't actually witnessed it much). If you have a working code-base that's actually used for real things, then rewriting it in Rust (or any language) requires strong justification. In my case, Psychopath isn't used for real things, so it doesn't take much justification aside from "I felt like it".

With that out of the way, let's dive in!

The good

I'm going to first talk about the things I like most in Rust. But don't worry, we'll get to the bad stuff too!

Cargo

The thing that seems to get talked about most with Rust is its memory safety. But for me, that's actually pretty far down the list of things I like about Rust. What I've enjoyed most, by far, about Rust is the tooling around it.

In C++, if I want to build a project that has a bunch of dependencies it's... kind of a pain. Unless your OS ships with the needed dev dependencies, you're going to need to download and build them all, make sure whatever build system the code-base uses can find them, and probably wrestle with a bunch of other fiddly things. It's a pain. And it's also recursive, because the dependencies may have dependencies. And it's also a pain to help other people when it's your code-base that they're trying to build.

In Rust, you download the code and run:

cargo build --release  

That's it.

Cargo is Rust's official build tool, and it automatically downloads, caches, and builds the needed dependencies (both direct and indirect) and links them into your build. The only time I've seen it get messy is when there are indirect C/C++ dependencies (e.g. via wrappers for C/C++ libraries). But even then, it's no messier than a standard C or C++ code base, and only for those dependencies.

The extent to which that makes my life easier is difficult to overstate. It feels like I can breathe again. Want to quickly try out a library in your code-base? Super fast and easy. Decide you don't like it? Get rid of it just as quickly.
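To make this concrete, adding a dependency is a single line in `Cargo.toml` (the package name and crate version below are just illustrative):

```toml
# Cargo.toml -- this is the entire dependency "setup".
[package]
name = "my_renderer"
version = "0.1.0"

[dependencies]
# One line per dependency. On the next `cargo build`, Cargo fetches,
# caches, and builds it (and its own dependencies) automatically.
rand = "0.8"
```

Deleting that line removes the dependency just as easily.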

Other tooling

I could go on and on about the tooling, but I'll summarize by pointing out two other things that Rust comes with:

  • Documentation generation
  • Unit testing

Having a standard way to generate documentation from your code-base is great, and it has a trickle-down effect on the ecosystem where more things seem to be documented than not. (Admittedly, the quality of the documentation varies.)

And adding unit tests is as simple as adding the test attribute to a function:

#[test]
fn my_test() {
    assert!(1 + 1 == 2);
}

Having these kinds of basic things included just makes life easier.

In short, Rust comes with all the basics of modern tooling for a programming language. And it's done well.

No headers

Another thing that I love about Rust is the lack of header files. I hate header files. They're awful. I don't care what anyone says.

Instead of header files, Rust has a really nice module system. Some people seem to get confused by it, but it made perfect sense to me from the beginning. But even if it's a little confusing at first, it's still far better than headers. I don't have to write function signatures twice, keep them in sync, etc.

Strict typing

C and C++ have strict-ish typing, which is nice. But Rust has super-duper strict typing, which is even nicer. Combined with some other aspects of Rust, it makes refactoring a much more confident exercise.

The super strict typing is also the foundation of Rust's type-inference. Within a function, Rust is able to infer most of your types. So there's almost a contradiction in that Rust's super strict typing actually lets you worry about and wrangle with types less. Both more correct and easier.

(As an aside: Rust does not do inter-function type inference: you have to specify the types in function signatures. Essentially, Rust believes in API boundaries being explicit. This keeps things easy to reason about for the programmer and prevents APIs from accidentally changing.)

Memory safety

Okay, you knew it was coming. I do, in fact, like the memory-safe-by-default approach that Rust takes. I spent days pulling my hair out tracking down segfaults and other weird behaviors in the C++ version of Psychopath, and it's just not fun.

In Rust, this rarely happens. And when it does happen it's only because I dropped into unsafe code somewhere, so I know where to start looking. Unsafe code aside, the worst that happens is a run-time panic with a stack trace showing exactly where things went wrong. And usually it doesn't even make it that far. Usually it's a compile error.

But I don't want to over-state this. Memory safety is nice, but I would use Rust even without it. Rust has a lot more going for it than just memory safety. It's a really well-rounded language... assuming you want to do low-level coding.

The bad

So, Rust isn't all sunshine and rainbows, unfortunately. And I would be remiss if I didn't talk about the things I find frustrating.

No unions

This is actually on its way, so this will likely be outdated quickly. But up to this point Rust doesn't have unions.

That's not quite as bad as it sounds, because Rust does have sum types (using the enum keyword) which are essentially tagged unions and really nice. But when working on e.g. the BVH in Psychopath it would have been really nice to use the lower-level version of unions to really squeeze those extra bytes out of the node structs.

No SIMD intrinsics

Stable Rust has no support for SIMD intrinsics. You can use them on the unstable 'nightly' compiler, but there's no current track for stabilizing them as far as I'm aware.

What this means for Psychopath is that I have a flag you can pass to cargo that tells it to build alternate SIMD versions of some code, but you can only pass that flag (and have it successfully build) if you're using nightly.

EDIT: on the reddit thread stusmall points out that SIMD is planned to be stabilized at some point and is considered "on the horizon". What I meant to say is that there isn't a clear timeline for it. It is considered important and will be stabilized eventually, but it's anyone's best guess when that might be.

Slow build times

This isn't so much of a hit against Rust compared to C++ (which also has slow build times), but hearing about e.g. Go and D's build times makes me really jealous. And the language that Jonathan Blow is developing seems to just blow things out of the water.

As a new language without all the historical baggage, I would have hoped for Rust to be much faster at building. The Rust team is definitely working on improving the situation, but it is so far off from where I feel like it ought to be that the incremental improvements they're making seem like drops in the bucket.

Mind you, it's entirely possible that I'm being overly pessimistic here. So take it with a grain of salt. But that's how I feel about it at the moment.

Fortunately, Psychopath isn't that large of a code-base, and I've also split it into multiple sub-crates which definitely helps. Building Psychopath itself (assuming external dependencies are already built) takes around 17 seconds on my machine. So it's not terrible, and is about what I would expect from C++ as well.

Immature ecosystem

There aren't a lot of stable libraries out there for Rust. Fortunately, this is being actively worked on. But even so, rendering and VFX is a pretty niche space. And there are just so many things written in C++ for rendering and VFX that are unlikely to get robust Rust equivalents any time soon. It makes writing a renderer in Rust a little daunting.

Of course this isn't Rust's fault by any means. But it is a reality.

The saving grace (perhaps) is that Rust's C FFI story is excellent. Another fellow and I have been working on a Rust wrapper for OpenEXR and it has been a largely pleasant experience. There are some fiddly bits to get right since OpenEXR is C++ rather than C. And it does take time. But it takes a lot less time than writing a full-featured OpenEXR library from scratch.

So I suspect that writing wrappers is what I'll be doing when I need non-trivial functionality from C++ libraries. But that's certainly not as straightforward as just using a library directly. And it somewhat negates the very first positive point I mentioned about easy builds.

On the bright side, if I'm successful at this, then many of those libraries will already be wrapped for other intrepid developers. Eventually.

Fighting the borrow checker

The borrow checker is the part of Rust's compiler that makes sure ownership rules are properly followed. It is the basis of Rust's memory safety guarantees, which is awesome. But it also takes some getting used to.

By and large, when the borrow checker has refused to compile something I wrote, it's because it was right and I was wrong. So my complaint isn't actually that the borrow checker exists, but rather it's that the error messages are sometimes... confusing. And it's also not always clear how to change things to satisfy it.

Mind you, the error messages are nowhere near as bad as template meta-programming errors in C++. And it's certainly better than tracking down mysterious memory safety bugs. And the borrow checker errors have made great strides in getting better. And actually, most of the time they're pretty clear these days...

...but even so, it's a bit of a hurdle that you have to get over. And every once in a while the borrow checker throws you a curve ball that's just really confusing, and that can be really frustrating.

Having said all that, I left the borrow checker until last for the same reason I put memory safety last in the list of good things: I think its importance is overblown. A lot of people seem to talk a lot about it, but I think it's a minor annoyance.

With most new languages there are some new concepts or ways of thinking about things that you have to wrap your head around, and Rust is no different. But if you code in C++, I can't imagine it being a significant hurdle. It might piss you off for the first week, and then you'll get over it.

Overall

I'm certainly not ready to recommend Rust to anyone doing serious rendering or vfx work yet, mostly because of the lack of unions and SIMD in the stable compiler. And I think it absolutely makes sense for people to be cautious and see how things develop over the next few years.

However, I absolutely do recommend giving Rust a spin for a non-trivial fun project. The immature ecosystem might bite you depending on what you're trying to do, but I think the experience of trying to create something in Rust is kind of... eye opening. You'll likely be a bit frustrated by the borrow checker at first, but once you get over that I think you're equally likely to find programming in Rust really fun. And Rust as a language is clearly extremely capable overall.

Subdivision Surfaces

I recently implemented Catmull-Clark subdivision surfaces in Psychopath.

One of my goals with Psychopath is for all curved surfaces to actually render as curved surfaces rather than as tessellated triangle meshes. To accomplish this for subdivision surfaces I am leveraging OpenSubdiv to pre-process the cage mesh into b-spline patches. Psychopath then converts the b-spline patches into bezier patches which it already knows how to render.

The end result is this:

Pontiac GTO '67

(Model by 'thecali', available on BlendSwap—please note that the version in the render above has some parts missing, as some bits were modeled with curves which Psychopath can't render yet.)

In other words, I can finally render interesting models! Yay! Most things are modeled with Catmull-Clark subdivision surfaces these days, so this is a big step.

The only down-side so far is that OpenSubdiv seems to be a bit on the slow side when converting the mesh to b-spline patches—it takes around 45 seconds for this model. It's possible I'm not using the library optimally. It's certainly something I'll need to investigate. But I'm quite pleased for now!

Spectral Rendering

Up to this point Psychopath has represented color with RGB values. This is a convenient way to handle color in a renderer because most color input to the renderer is going to be in some kind of RGB format (textures, hand-specified colors, etc.) and most color output the renderer produces is going to be in some kind of RGB format as well (final rendered images). So it makes sense that if your input is RGB, and your output is RGB, probably everything in-between should be RGB as well.

But RGB isn't how real light works. Real light is composed of a spectrum of wavelengths, much the same way that sound is. RGB is a bit like sampling audio at only three frequencies.

Even so, rendering with RGB usually works fine because human color vision is also more-or-less RGB. But there are spectral effects whose causes aren't visible to the naked eye even though their results are. Fluorescent lighting is a good example: the effect it has on colors can't be accurately reproduced in RGB without also tweaking the colors of the materials. Prisms and rainbows are another example.

Admittedly, these examples are largely corner cases that can be faked, and I'm designing Psychopath with animation and VFX in mind—both of which are cases where artistic control is more important than strict accuracy. But nevertheless, being able to render spectral effects is cool! And it might come in handy when trying to match scene lighting for a VFX shot in a fluorescent-lit office building.

So I decided to make Psychopath a spectral renderer.

There are two broad approaches to spectral rendering: binning, and Monte Carlo spectral sampling.

Binning is basically RGB on crack. You can think of RGB color as three spectral samples: one in the reddish part of the spectrum, one in the greenish part, and one in the blueish part. Binning formalizes and generalizes this concept. Instead of three vaguely defined samples, you have N very specific ranges or "bins" over the visible spectrum. Anywhere from 9 to 30 bins is common with this approach.

Monte Carlo spectral sampling takes a completely different approach. Instead of dividing up the spectrum into predefined regions, each light path gets a randomly assigned wavelength. Then as more and more paths are traced, more and more points on the spectrum get sampled, and the image converges to the correct colors in much the same way as everything else in a Monte Carlo path tracer.

Both approaches have their pros and cons.

The biggest pro of the binning approach is that colors still have a clear canonical representation within the renderer: every color in the renderer is specified by a set of N values. Simple, easy, obvious.

In the Monte Carlo approach, colors don't have a single canonical way to be represented. However, that is also one of the pros of the Monte Carlo approach: you can have many different ways to represent color, and as long as they can all be sampled spectrally, they can all coexist effortlessly within the same renderer. Monte Carlo sampling is also more accurate than binning. For example, if you rendered a scene with a prism, the resulting image from a binning approach would only have as many colors as there are bins. But a Monte Carlo approach would, when converged, reveal all the colors of the rainbow in a smooth gradient. Finally, the Monte Carlo approach takes up less memory (only one value instead of 9-30), and might be faster depending on a variety of factors.

In the end, I decided to go with the Monte Carlo approach for Psychopath. The combination of being more accurate and taking less memory was very attractive. It also strikes me as more elegant: since I'm already solving the rest of the rendering equation with (Quasi-) Monte Carlo, why not just extend that into the color spectrum dimension as well?

However, the Monte Carlo approach has one big downside: color noise. Below is a spectral render of a solid medium-gray background with one sample per pixel.

One sample spectral render

As you can see, the resulting render is not a good representation of the underlying color in the scene. Instead of gray, we get rainbow. Even at 16 samples per pixel, the color noise is still quite visible:

Sixteen sample spectral render

With this approach it takes a lot of samples to make the color noise go away. And really, we aren't using our ray tracing very efficiently here: most visual phenomena in the real world are not significantly spectral in nature, and yet we're taking the time to trace an entire light path for each single spectral sample.

There is a clever solution to this problem in the paper "Hero Wavelength Spectral Sampling" by Wilkie et al. The basic idea is simple: instead of just a single wavelength per light path, have several. One of the wavelengths is randomly chosen (the "hero" wavelength), and the others are evenly spaced apart from that to cover the visible spectrum.

The result of this method with four wavelengths per light path at one sample per pixel looks like this for the gray background:

One sample hero spectral render

And looks like this at sixteen samples per pixel:

Sixteen sample hero spectral render

The difference is enormous. Instead of unrecognizable color at one sample per pixel, we can clearly make out the intended color, albeit with noise. And at sixteen samples per pixel the color noise is all but gone.

The hero wavelength paper is very much worth a read, because it also covers how to properly handle cases with strong spectrally-dependent effects. But even just this simple idea makes a big difference, and makes Monte Carlo spectral sampling practical.

One last practical matter: given that artists aren't going to manually specify an entire spectrum for every color they use, the renderer needs a way to convert RGB input into a matching light spectrum automatically. There are a few approaches out there to do this, but the one I'm using in Psychopath is from the recent paper "Physically Meaningful Rendering Using Tristimulus Colours" by Meng et al. The awesome thing about the algorithm they develop in the paper is that it accurately covers (almost) the entire XYZ color space. This is significant because that means it can be used for converting colors from any color space: just convert to XYZ first. This is not the case for any of the other approaches I'm aware of.

Basic Shading System

Psychopath now has a basic shading system! Yay!

As you can see in this artistically-awful image, different objects can now have different shaders assigned to them:

Multiple Materials

This was actually fairly easy to get up and running. It just required some design work. I did indeed go with shading being done at intersection time, as described in my last post.

This basic shading system does not use OSL, and doesn't even support texturing of any kind. Each shader just produces a single fixed BSDF. But building this basic version of a shading system helped me work out the machinery behind how shaders get assigned to objects and how they are processed during rendering.

One of the cool features of this shading system is that shaders are assigned hierarchically. Any node in the scene graph (more-or-less) can have a shader assigned to it, and all nodes under such a node will inherit that shader—unless they themselves have been assigned a different shader. This makes it easy to assign a single shader to the entire scene, for example.

Designing a Shading System

One of Psychopath's current limitations is that every surface has to have the same shader. This is obviously not workable if it's going to be used as a "real" renderer, so I need to design a shading system that allows different shaders to be assigned to different objects.

Eventually I want to use Open Shading Language for shading in Psychopath, but as a first pass I'm going to do something a little simpler to get my feet wet. I've never written a shading system, after all.

But even so, since Psychopath traces many rays at once there are actually some interesting choices to make here which will carry over to how Psychopath utilizes OSL. There are two broad approaches I can take:

  1. Do the shading when a ray hits something. This would require storing the result of the shading in the intersection structure, which would be a BSDF closure.

  2. Do the shading after all of the ray intersection tests have been completed. This would require storing all information necessary to do the shading in the intersection structure, which would be things such as a reference to the shader and any required differential geometry data.

The biggest benefit of the first approach is that it keeps the architecture a bit simpler. Displacement shaders will have to be evaluated during ray traversal anyway, so the first approach would keep all of the shading code in the same general part of Psychopath.

The biggest benefit of the second approach is that the shading evaluations can be reordered to maximize e.g. coherent texture access.

I've been going back-and-forth trying to decide which approach to take, but I think I've finally settled on the first approach. The first approach gives me more flexibility in how much data I store, and since Psychopath is using breadth first ray tracing there are already strong guarantees about accessing shaders and their associated data in a coherent way.

The main problem with the second approach is that the amount of data that needs to be stored is unbounded. If a mesh has e.g. multiple UV sets, then Psychopath has to store intersection data about all of those UV sets to be able to do the shading later. And I'd also like Psychopath to be able to support arbitrary geometry data for use in shaders, which would exacerbate the problem. It won't be possible to predict ahead of time how much data needs to be stored.

With the first approach, on the other hand, the amount of data can be bounded. There is still the possibility (even likelihood) of multiple mixed closures. But because they are closures, I have more flexibility. I can take a Monte Carlo approach and, after shading, select a subset of them to be stored along with the appropriate weighting information. The downside to that, of course, would be increased noise. But the max number of closures to store could be a render setting, which would allow the user to decide what to do with that memory/noise tradeoff.

The biggest problem with the first approach, however, is that the shader will need to be evaluated at every ray hit during traversal, rather than just the final hit. And especially for non-trivial shaders in a complex scene, that could have a non-negligible performance impact.

But in the end, I think it's a reasonable tradeoff. I would rather keep memory usage predictably bounded, as one of Psychopath's goals is to be able to render very large scenes.

Breadth First Ray Tracing

Phew! I'm almost finished with my "catching up to the present" retrospective architectural posts. I may sprinkle a few more retrospective posts here and there, but this is the last for this series. This post will expand upon my first and second ray reordering posts and talk about breadth first ray tracing, which is what eventually replaced ray reordering in Psychopath.

First, let me say that breadth first ray tracing is really cool. Like, really cool. Much like ray reordering, it accesses scene data coherently for a batch of rays. But unlike ray reordering it has strong guarantees about those access patterns. Specifically, each instance of a scene element is guaranteed to be accessed at most once for an entire batch of rays. And, perhaps surprisingly, breadth first ray tracing is also much simpler to implement than ray reordering.

So how does it work?

The core idea behind breadth first ray tracing is to trace an entire batch of rays together. This is subtly different from ray reordering where, although you also have a batch of rays, you are tracing each ray separately—you just use pausing to switch between them at strategic times. But in breadth first ray tracing you really trace all of the rays together, even down to traversing the acceleration structure of the scene.

The most fundamental operation of breadth first ray tracing is something called "ray filtering". Imagine that you have a batch of rays and an object's bounding box. You can test the whole batch of rays against the bounding box together (either with SIMD if it's wide enough, or sequentially), and then create a new sub-batch out of the rays that hit it. Once you have that sub-batch, you can then proceed to test only that sub-batch against the object's geometry.

That process of creating a new batch from the rays that hit something is called "ray filtering". And in breadth first ray tracing you do ray filtering at every step of traversing the scene's acceleration structure.

There are multiple ways to implement ray filtering. Some approaches use partitioning to rearrange the rays into two adjacent groups in memory, storing pointers to the sub-lists. Other approaches actually create new lists of pointers to the individual rays. But the basic principle is the same regardless of how you do it (albeit with different memory behavior).

In Psychopath I'm taking the partitioning approach, because it has lower memory requirements and accesses the rays in memory more coherently. Scanning through a linear array of rays while doing the ray tests is a lot faster than chasing pointers and jumping all over the place in memory, especially with a large number of rays.

You might object that there is still the matter of partitioning, which scans the rays again, potentially incoherently. But, in fact, partitioning algorithms also generally access arrays of data quite coherently. And, moreover, although most papers I've read split the ray testing and partitioning into separate steps, you can easily roll them into a single step.

A good partitioning algorithm only executes its predicate once for each item in the array being partitioned. C++'s std::partition, for example, makes this guarantee for random-access containers. And this means that the ray test itself can be used as the predicate.

Pseudo-code for the ray testing and partitioning rolled into one might, then, look something like this:

bbox = get_bounding_box()
rays = get_ray_batch()
filtered_rays = partition(rays, bbox.test_ray)

Where partition is a function that partitions a list (in this case rays) based on a unary predicate (in this case bbox.test_ray) and returns a reference to the part of the list that tested true.

This is almost frighteningly simple.

To incorporate this into a BVH traversal algorithm you need a stack for the ray batch references, which is fairly straight-forward. Pseudo code for the entire BVH traversal algorithm might look something like this:

node_stack.push(get_root_bvh_node())
rays_stack.push(get_ray_batch())

while node_stack is not empty:
    node = node_stack.pop()
    rays = rays_stack.pop()
    filtered_rays = partition(rays, node.bbox.test_ray)
    if filtered_rays is not empty:
        if node is leaf:
            node.data.test_rays(filtered_rays)
        else:
            node_stack.push(node.child1)
            node_stack.push(node.child2)
            rays_stack.push(filtered_rays)
            rays_stack.push(filtered_rays)

With node_stack and rays_stack being the stacks for BVH node references and ray batch references, respectively.

This is really simple code, bordering on elegant. Especially when compared to ray reordering where you have to store ray state, pause traversal, un-pause traversal, store which rays are about to hit which nodes, etc.

Looking at this code, it also starts to become clear why breadth first ray tracing can do instancing: we can use stacks during traversal. The node stack in particular lets us keep track of where we've come from and what we need to do next. And because that stack is shared by the entire ray batch, it takes up only a tiny amount of space even for fairly deep stacks. With ray reordering we would need a separate stack for each ray, which would be huge in aggregate.

And it's not just the node stack that is helpful. With breadth first ray tracing we can make any number of shared stacks for any kind of data we might want as long as that data is the same for each ray. So, for example, we can have a stack of transform matrices representing the current transform space of the rays, which is also necessary for instancing.

So that's breadth first ray tracing in a nutshell.

There are, of course, drawbacks to breadth first ray tracing as well. The three biggest drawbacks I've found so far are:

  1. The overhead of creating/partitioning/whatever the rays lists slows things down.
  2. Because the rays traverse the acceleration structure together, they have to traverse it in the same order. That, in turn, means we end up doing a lot of unnecessary ray tests that could be avoided with individual traversal orders. Specifically, we miss out on optimizations like always testing against the closest node first—the closest node for one ray often isn't the closest node for another.
  3. Implementing typical SIMD optimizations like QBVH's etc. becomes a lot trickier and doesn't provide as much benefit because of drawback #1 above.

Point #1 isn't an enormous bottleneck on its own, especially when ray testing is rolled directly into the partitioning step. But the severity depends on how large the ray data structures are: the larger they are, the more costly partitioning becomes. Unfortunately, rays are fairly fat data structures in Psychopath. That's definitely something I want to improve at some point if I can.

Point #2 can be addressed somewhat by first splitting ray batches into separate directionally-ordered batches. But then you also lose some of the traversal coherence, so it's a trade-off.

Point #3 depends on what you're doing. Most breadth first ray tracing papers I've seen exploit SIMD instructions by testing multiple rays against single bounding boxes, which actually appears to work very well based on those papers. But unfortunately such techniques are not (as far as I know) compatible with efficient motion blur implementations. The fundamental problem is that with motion blur you're essentially testing each ray against a "different" bounding box anyway since it's moving over time.

On the flip side, QBVH's work very well with motion blur, but they're difficult to put into a breadth-first context. It's not impossible, though. I've implemented a breadth-first QBVH traverser successfully by using a small bit stack embedded within each ray. But, on average, it doesn't reduce the partitioning cost and makes the ray data structure larger. Thankfully, it still speeds things up, just not as much as QBVH's with individual ray traversal.

On the whole, though, I'm definitely happy with breadth first ray tracing. It's relatively simple to implement, it gives extremely strong guarantees about scene data access patterns, and it allows for full hierarchical instancing. Definitely a good fit for what I'm trying to do with Psychopath. So unless something better comes along, I expect this will remain a core part of Psychopath's architecture for the foreseeable future.