Blog Archive About

Psychopath Renderer

a slightly psychotic path tracer

2018 - 11 - 13

A Different Approach

I've been away from Psychopath for a while working on other projects, but recently I stumbled upon a blog post by Yining Karl Li, entitled "Mipmapping with Bidirectional Techniques". In it, he describes his solution to a problem I've been pondering for a while: how to handle texture filtering in the context of bidirectional light transport.

The problem essentially comes down to this: texture filtering should be done with respect to the projected area on screen, but when tracing rays starting from a light source you don't know what that projected area is going to be yet. In Psychopath's case it's even worse because it applies not just to texture filtering but also to dicing rates for geometry. In Psychopath you can't even calculate ray intersections without a filter width. So this problem has been bugging me for a while.

Yining explores an approach to this problem that I've also considered, which is to have the filter width simply be independent of the rays being traced. In other words: screw ray differentials, just use camera-based heuristics. The benefit of this approach is that every point on every surface has a single well-defined filter width, regardless of where rays are coming from or how they're being generated. The down side is that there are circumstances (such as magnification through a lens) where those filters become too blurry.

These are all things I've thought about before, and I've gone back and forth many times about how I want to approach this. However, Yining's post also linked to a paper from Weta Digital about their renderer Manuka. And that sent me down a rabbit hole that has me reconsidering how I want Psychopath's entire architecture to work.

Tying Geometry and Shading Together

There are a lot of cool things about Manuka, but the one that stuck out to me—and the one that has me reconsidering a lot of things—is how they handle shading.

Generally speaking, ray tracing renderers do shading when a ray hits something. But Manuka takes a radically different approach. Manuka does all of its shading before rendering even starts, by dicing all geometry into sub-pixel polygons and calculating shading at the vertices.

If you're at all familiar with the Reyes rendering architecture, that should sound really familiar. The difference is that instead of baking colors into the geometry like in Reyes, they bake surface closures. This means that light transport is still calculated with path tracing, but all texture lookups etc. are done up-front and baked into the geometry.

Honestly, I think this is genius. There's an elegance to essentially saying, "geometry and shading are the same thing". It's an elegance that Reyes had, but that I failed to reproduce in Psychopath—even in the early days when I was essentially trying to do path traced Reyes.

One of the fantastic outcomes of this approach is that the scene has a clear static definition regardless of what rays are being traced. This handily solves the bidirectional texture filtering problem, as well as Psychopath's related bidirectional dicing problem. It also, of course, has the afore-mentioned problem with magnification. But I think that's a pretty reasonable trade-off as long as you provide ways for people to work around it when needed.

Having said all of this, there are other trade-offs that Manuka makes that I'm not so inclined to reproduce in Psychopath. Specifically, Manuka literally does all of their dicing and shading up-front and keeps it all in memory to be traced against. I think their reasons for doing that are very sound, but that just doesn't seem interesting to me. So I would still like to explore a more dynamic approach, closer to what I've already been doing.

A New Architecture

So with all of that background out of the way, this is roughly the architecture I am now envisioning for Psychopath:

  • Scene data is effectively statically defined: dicing rates, texture filters, etc. are all determined independent of the rays being traced, and are therefore consistent between all rays.

  • Even though the scene data is conceptually static, it can still be computed dynamically, as long as the result is deterministic and independent of the rays being traced.

  • Geometry and shading are the same thing: shading is defined at the same points that geometry is defined, and should generally be sub-pixel in size.

The specifics of how these decisions are going to play out is still a bit unknown to me. But it's what I'll be working towards and experimenting with for Psychopath's next steps (albeit slowly, as my time is somewhat limited these days).

This is also exciting to me because in some sense this is getting back to Psychopath's original roots. I started the project because I wanted to see if I could merge Reyes rendering and path tracing in a useful way. I've made a lot of detours since then (many of them interesting and worthwhile), but fundamentally I think that's still the idea that intrigues me most. And I think this is a good step in that direction.

Lastly: a shout out to Johannes Hanika, who co-authored the Manuka paper, and somehow seems to be involved in most of the papers I've found genuinely inspiring over the last several years. If you're reading this, Johannes, I would love to buy you a beer!

2017 - 08 - 03

BVH4 Without SIMD

I recently switched the Rust version of Psychopath over from a binary BVH (BVH2) to a 4-branching BVH (BVH4). This is a fairly standard thing to do in a ray tracer to take advantage of SIMD operations on modern CPUs. With a BVH4 you can test against four bounding boxes at once with SIMD, which in theory can get you some pretty significant speedups.

However, my BVH4 implementation in Psychopath doesn't use SIMD, and is nevertheless faster than (or at least comparable to) my BVH2 implementation. This surprised me because I thought (?) the conventional wisdom was that a BVH4 is slower except when used to exploit SIMD.

So I'm writing this post to explain how I arrived here and why BVH4 might always be better.

A Back-of-the-Napkin Analysis

One of the things that makes BVHs (regardless of branch factor) a bit different than some other tree data structures is that in a BVH you have to test all of the children to figure out which ones to traverse into. You don't have to test them all at once—you can test against one child and immediately traverse into it if the ray hits it. But eventually you have to come back and test against the other sibling(s).

The reason for this is that BVH child nodes can overlap in space. Just because you found an intersection in one doesn't rule out a closer intersection in its siblings, so you still have to traverse into any that were hit to check for a closer intersection. This is true even if you test the nodes in front-to-back order.

(Note: in my discussion with François Beaune (of Appleseed fame), he noted that in theory you could store information about whether the children are overlapping or not. However, I don't think that changes this analysis, because you still on average have to test against multiple children, and such a trick would apply to all branch factors anyway. But I've been wrong before!)

So let's say we have a scene with sixteen triangles (or whatever geometry—I'll assume triangles). If we color in the nodes we test against in a minimal traversal that reaches a leaf, it looks like this for a BVH2 and a BVH4:

BVH minimal leaf traversal

Interestingly, it turns out that it's nine nodes in both cases. And this equality holds true for any scene with any number of triangles.

For a BVH2 with N triangles, the number of nodes tested in this kind of minimal traversal can be computed with log2(N) * 2 + 1. And for a BVH4 with log4(N) * 4 + 1.

You can generalize this to any branch factor with logB(N) * B + 1, where B is the branch factor. If you take that equation and graph the number of node tests for various branch factors, you get a graph that looks like this:

BVH cost graph

(X-axis is the branch factor B, and Y-axis is the number of node tests. Note that although this particular graph assumes a fixed N, the patterns hold for all values of N.)

There are a couple of things to note about this graph:

  1. It shows that, yes indeed, branch factors of 2 and 4 do the same number of node tests. Or put another way: the equations I gave above for a BVH2 and BVH4 are equivalent.
  2. When the branch factor exceeds 4, it starts doing worse and worse (more node tests). This is consistent with general wisdom that larger branch factors are progressively worse for efficiency. It just appears to not kick in until after a branch factor of 4.

(Also: 3. It seems to be the case that a branch factor of 3 is actually the most efficient. I haven't explored this at all, but it would be really cool if someone else would—I'd be very curious to see if it actually performs better. The tricky bit, I think, will be constructing a high quality tree with that branch factor.)

Implementation in Psychopath

There is still more analysis to cover, but this is the point in the analysis where my curiousity spurred me to try implementing it in Psychopath. So I'm going to take a brief detour to show what happened.

The implementation is pretty straight-forward. To build the BVH4, I first construct a BVH2 and then collapsing it down to a BVH4. I believe this is the standard way of constructing BVH4's. To traverse the BVH4 I use the technique described in Efficient Ray Tracing Kernels for Modern CPU Architectures by Fuetterling et al. to determine which order to traverse the nodes in (see section 2.2 in the paper). Importantly, that technique results in the exact same traversal order as my BVH2 implementation.

The primary implications of constructing and traversing the BVH4 this way are:

  1. The topology and bounding boxes of the BVH2 and BVH4 are equivalent: the BVH4 is just missing half of the inner nodes, skipping every-other level of the BVH2.
  2. As mentioned above, the traversal order of the BVH2 and BVH4 are identical.
  3. The leaf data (e.g. triangles) tested against due to the BVH2 and BVH4 are also identical.

In other words, the only difference between the BVH2 and the BVH4 is the branch factor itself.

Once I got everything working I did a bunch of test renders on various test scenes I have. Here are some of those renders (all rendered at 16 samples per pixel):

630 cars

(Model by 'thecali', available on BlendSwap.)


(Model from SKYWATCH, a super cool short film.)


cornell with character

I rendered all of the test scenes with both the BVH2 and BVH4 to compare performance. Here are the render times (excluding BVH build time), calculated as best-out-of-five:

Cars Drone City Cornell(ish)
BVH2 31.074s 8.392s 22.824s 4.405s
BVH4 30.967s 8.357s 22.882s 4.357s

The only scene where the BVH2 beats the BVH4 is the city scene, and that's a bit suspect because it was an anomalously fast render among its five runs. But the main point is this: it appears that the non-SIMD BVH4 does indeed perform at least comparably to the BVH2.

A More Complete Analysis

The results from the render tests are largely consistent with the initial analysis. But there is nevertheless something important missing from that analysis: it assumes that for every parent, precisely one child will need to be traversed into. No more, and no less.

What if none of the children are hit? In that case the number of node tests looks like this:

no children hit

At first blush, this suggests that the BVH4 should be worse. Except... the render tests didn't bare that out. And that's because there is another case: all child nodes being hit. And that looks like this:

all children hit

So the reality is that there are three cases for the BVH2:

children hit table

Basically, the BVH2 has both a better best case and a worse worst case. While the BVH4 always tests against all four children, the BVH2 can test against as few as two and as many as six children and grandchildren. Nevertheless, the average case is still the same for both the BVH2 and BVH4: four.

However! That's a very loose definition of "average". We're assuming that all cases are equally likely, but in reality that depends on the particular BVH and the particular rays being tested. This, of course, is the kind of thing that makes you curious: how many ray-node tests are actually being done in the test scenes?

So I instrumented the code to count all ray-node tests, and this is the result:

Cars Drone City Cornell(ish)
BVH2 10619320877 2231421620 2788802110 386592706
BVH4 10461971888 2153069143 2652842455 334467875
% of BVH2 98.5% 96.5% 95.1% 86.5%

I don't claim that these are exhaustive test scenes by any means. For one thing, they're all shaded with gray lambert, and maybe the distribution of rays with other materials would change things. But nevertheless, this is at least interesting. In all of the scenes I tested, the BVH4 did fewer tests than the BVH2. Sometimes by a tiny margin, but it was always less.

Wrapping Up

So that's it. BVH4 appears to be at least comparable to BVH2 (if not a hair faster) even without SIMD. So for a more robust ray tracer implementation, BVH4 may be worth exploring. And not only for the potential (tiny) perf gains, but also because higher branch factors require fewer nodes and thus take up less memory.

Of course, there are some potentially confounding factors here:

  • Psychopath does breadth-first ray tracing with thousands of rays at once, which means that the cost of any per-node computations other than ray tests get amortized over many rays. Specifically I'm referring to the traversal code lookup from Efficient Ray Tracing Kernels for Modern CPU Architectures, which might have more relative overhead in a more typical ray tracer. Moreover, if you don't use the traversal code approach, then you'll have to sort the four children based on ray hit distance, which is (super-linearly) more expensive than sorting two children.
  • My test scenes are far from definitive! Testing with more production-like scenes, with complex shaders, depth complexity, etc. is still needed to make a strong case.
  • I haven't done any analysis of the "overlap" flags idea that François suggested, much less any testing of it with actual scenes. I'm skeptical if this would have any branch-factor-specific impact, but I've been known to be very wrong about such things in the past! And at the very least, it's an interesting idea to explore.

I guess I'm not trying to make any especially strong claims with this (at least not in writing—buy me a beer some time). And since if you're writing a BVH4 you might as well use SIMD anyway, that makes most of this pretty moot. But for me this was nevertheless really fascinating to discover.

I hope this write-up made for an interesting read. And if anyone takes a crack at replicating these results, especially with more production-like scenes, I would be very interested to hear about it!

2017 - 06 - 13

Switching from C++ to Rust

I'll get right to it: I've rewritten Psychopath in Rust. And I feel like this requires an explanation.

The short version is that Psychopath is a personal project that I work on for fun, and I had fun rewriting it in Rust. The rest of this post is the long version, as seen through the lens of my experiences developing in Rust.

But first I want to be really, really clear that I do not advocate rewriting things in Rust. It's apparently a meme to say that things should be rewritten in Rust (or so I hear... I haven't actually witnessed it much). If you have a working code-base that's actually used for real things, then rewriting it in Rust (or any language) requires strong justification. In my case, Psychopath isn't used for real things, so it doesn't take much justification aside from "I felt like it".

With that out of the way, let's dive in!

The good

I'm going to first talk about the things I like most in Rust. But don't worry, we'll get to the bad stuff too!


The thing that seems to get talked about most with Rust is its memory safety. But for me, that's actually pretty far down the list of things I like about Rust. What I've enjoyed most, by far, about Rust is the tooling around it.

In C++, if I want to build a project that has a bunch of dependencies it's... kind of a pain. Unless your OS ships with the needed dev dependencies, you're going to need to download and build them all, make sure whatever build system the code-base uses can find them, and probably wrestle with a bunch of other fiddly things. It's a pain. And it's also recursive, because the dependencies may have dependencies. And it's also a pain to help other people when it's your code-base that they're trying to build.

In Rust, you download the code and run:

cargo build --release

That's it.

Cargo is Rust's official build tool, and it automatically downloads, caches, and builds the needed dependencies (both direct and indirect) and links them into your build. The only time I've seen it get messy is when there are indirect C/C++ dependencies (e.g. via wrappers for C/C++ libraries). But even then, it's no messier than a standard C or C++ code base, and only for those dependencies.

The extent to which that makes my life easier is difficult to over-state. It feels like I can breath again. Want to just try using a library in your code-base quick? Super fast and easy. Decide you don't like it? Get rid of it just as quickly.

Other tooling

I could go on and on about the tooling, but I'll summarize by pointing out two other things that Rust comes with:

  • Documentation generation
  • Unit testing

Having a standard way to generate documentation from your code-base is great, and it has a trickle-down effect on the ecosystem where more things seems to be documented than not. (Admittedly, the quality of the documentation varies.)

And adding unit tests is as simple as adding the test attribute to a function:

fn my_test() {

Having these kinds of basic things included just makes life easier.

In short, Rust comes with all the basics of modern tooling for a programming language. And it's done well.

No headers

Another thing that I love about Rust is the lack of header files. I hate header files. They're awful. I don't care what anyone says.

Instead of header files, Rust has a really nice module system. Some people seem to get confused by it, but it made perfect sense to me from the beginning. But even if it's a little confusing at first, it's still far better than headers. I don't have to write function signatures twice, keep them in sync, etc.

Strict typing

C and C++ have strict-ish typing, which is nice. But Rust has super-duper strict typing, which is even nicer. Combined with some other aspects of Rust, it makes refactoring a much more confident exercise.

The super strict typing is also the foundation of Rust's type-inference. Within a function, Rust is able to infer most of your types. So there's almost a contradiction in that Rust's super strict typing actually lets you worry about and wrangle with types less. Both more correct and easier.

(As an aside: Rust does not do inter-function type inference: you have to specify the types in function signatures. Essentially, Rust believes in API boundaries being explicit. This keeps things easy to reason about for the programmer and prevents APIs from accidentally changing.)

Memory safety

Okay, you knew it was coming. I do, in fact, like the memory-safe-by-default approach that Rust takes. I spent days pulling my hair out tracking down segfaults and other weird behaviors in the C++ version of Psychopath, and it's just not fun.

In Rust, this rarely happens. And when it does happen it's only because I dropped into unsafe code somewhere, so I know where to start looking. Unsafe code aside, the worst that happens is a run-time panic with a stack trace showing exactly where things went wrong. And usually it doesn't even make it that far. Usually it's a compile error.

But I don't want to over-state this. Memory safety is nice, but I would use Rust even without it. Rust has a lot more going for it than just memory safety. It's a really well-rounded language... assuming you want to do low-level coding.

The bad

So, Rust isn't all sunshine and rainbows, unfortunately. And I would be remiss if I didn't talk about the things I find frustrating.

No unions

(UPDATE: Rust now has unions, as of release 1.19. Yay!)

This is actually on its way, so this will likely be outdated quickly. But up to this point Rust doesn't have unions.

That's not quite as bad as it sounds, because Rust does have sum types (using the enum keyword) which are essentially tagged unions and really nice. But when working on e.g. the BVH in Psychopath it would have been really nice to use the lower-level version of unions to really squeeze those extra bytes out of the node structs.

No SIMD intrinsics

(UPDATE: Rust now has stable SIMD intrinsics, as of release 1.27. Yay!)

Stable Rust has no support for SIMD intrinsics. You can use them on the unstable 'nightly' compiler, but there's no current track for stabilizing them as far as I'm aware.

What this means for Psychopath is that I have a flag you can pass to cargo that tells it to build alternate SIMD versions of some code, but you can only pass that flag (and have it successfully build) if you're using nightly.

EDIT: on the reddit thread stusmall points out that SIMD is planned to be stablized at some point and is considered "on the horizon". What I meant to say is that there isn't a clear timeline for it. It is considered important and will be stablized eventually, but it's anyone's best guess when that might be.

Slow build times

This isn't so much of a hit against Rust compared to C++ (which also has slow build times), but hearing about e.g. Go and D's build times makes me really jealous. And the language that Jonathan Blow is developing seems to just blow things out of the water.

As a new language without all the historical baggage, I would have hoped for Rust to be much faster at building. The Rust team is definitely working on improving the situation, but it is so far off from where I feel like it ought to be that the incremental improvements they're making seem like drops in a bucket.

Mind you, it's entirely possible that I'm being overly pessimistic here. So take it with a grain of salt. But that's how I feel about it at the moment.

Fortunately, Psychopath isn't that large of a code-base, and I've also split it into multiple sub-crates which definitely helps. Building Psychopath itself (assuming external dependencies are already built) takes around 17 seconds on my machine. So it's not terrible, and is about what I would expect from C++ as well.

Immature ecosystem

There aren't a lot of stable libraries out there for Rust. Fortunately, this is being actively worked on. But even so, rendering and vfx is a pretty niche space. And there are just so many things written in C++ for rendering and VFX that are unlikely to get robust Rust equivalents any time soon. It makes writing a renderer in Rust a little daunting.

Of course this isn't Rust's fault by any means. But it is a reality.

The saving grace (perhaps) is that Rust's C FFI story is excellent. Me and another fellow have been working on a Rust wrapper for OpenEXR and it has been a largely pleasant experience. There are some fiddly bits to get right since OpenEXR is C++ rather than C. And it does take time. But it takes a lot less time than writing a full-featured OpenEXR library from scratch.

So I suspect that writing wrappers is what I'll be doing when I need non-trivial functionality from C++ libraries. But that's certainly not as straightforward as just using a library directly. And it negates a little the very first positive point I mentioned about easy builds.

On the bright side, if I'm successful at this, then many of those libraries will already be wrapped for other intrepid developers. Eventually.

Fighting the borrow checker

The borrow checker is the part of Rust's compiler that makes sure ownership rules are properly followed. It is the basis of Rust's memory safety guarantees, which is awesome. But it also takes some getting used to.

By and large, when the borrow checker has refused to compile something I wrote, it's because it was right and I was wrong. So my complaint isn't actually that the borrow checker exists, but rather it's that the error messages are sometimes... confusing. And it's also not always clear how to change things to satisfy it.

Mind you, the error messages are nowhere near as bad as template meta-programming errors in C++. And it's certainly better than tracking down mysterious memory safety bugs. And the borrow checker errors have made great strides in getting better. And actually, most of the time they're pretty clear these days...

...but even so, it's a bit of a hurdle that you have to get over. And every once in a while the borrow checker throws you a curve ball that's just really confusing, and that can be really frustrating.

Having said all that, I left the borrow checker until last for the same reason I put memory safety last in the list of good things: I think its importance is overblown. A lot of people seem to talk a lot about it, but I think it's a minor annoyance.

With most new languages there are some new concepts or ways of thinking about things that you have to wrap your head around, and Rust is no different. But if you code in C++, I can't imagine it being a significant hurdle. It might piss you off for the first week, and then you'll get over it.


I'm certainly not ready to recommend Rust to anyone doing serious rendering or vfx work yet, mostly because of the lack of unions and SIMD in the stable compiler. And I think it absolutely makes sense for people to be cautious and see how things develop over the next few years.

However, I absolutely do recommend giving Rust a spin for a non-trivial fun project. The immature ecosystem might bite you depending on what you're trying to do, but I think the experience of trying to create something in Rust is kind of... eye opening. You'll likely be a bit frustrated by the borrow checker at first, but once you get over that I think you're equally likely to find programming in Rust really fun. And Rust as a language is clearly extremely capable over-all.

2015 - 07 - 16

Subdivision Surfaces

I recently implemented Catmull-Clark subdivision surfaces in Psychopath.

One of my goals with Psychopath is for all curved surfaces to actually render as curved surfaces rather than as tessellated triangle meshes. To accomplish this for subdivision surfaces I am leveraging OpenSubdiv to pre-process the cage mesh into b-spline patches. Psychopath then converts the b-spline patches into bezier patches which it already knows how to render.

The end result is this:

Pontiac GTO '67

(Model by 'thecali', available on BlendSwap—please note that the version in the render above has some parts missing, as some bits were modeled with curves which Psychopath can't render yet.)

In other words, I can finally render interesting models! Yay! Most things are modeled with Catmull-Clark subdivision surfaces these days, so this a big step.

The only down-side so far is that OpenSubdiv seems to be a bit on the slow side when converting the mesh to b-spline patches—it takes around 45 seconds for this model. It's possible I'm not using the library optimally. It's certainly something I'll need to investigate. But I'm quite pleased for now!