Blog Archive About

Psychopath Renderer

a slightly psychotic path tracer

2020 - 09 - 22

Storing Color Compactly

One of the tricky things about a Manuka-style rendering architecture is that you need to store your shading information really compactly, since it will be stored at every micropolygon vertex.

Storing surface normals seems like a fairly solved problem at this point, and I recommend the paper Survey of Efficient Representations for Independent Unit Vectors by Cigolle et al. if you haven't already read it.

Color, on the other hand, still seems like a bit of an open problem. For something like Psychopath, ideally I want a color format that meets these criteria:

  • Compact (ideally 32 bits or less).
  • Fast to decode (I want to use this at runtime while rendering, potentially decoding on every ray hit).
  • Covers the entire gamut of human-visible colors.
  • Has sufficient dynamic range to capture day-to-day natural phenomenon (e.g. the sun) without clipping.
  • Has sufficient precision to avoid visible banding.

I took a few detours, but I believe I've come up with a format that meets all of these criteria. And I call it FLuv32.

It's based on the 32-bit variant of LogLuv, which is a color format designed by Greg Ward of Radiance fame. LogLuv ticks almost all of the boxes: it's 32 bits, it covers the full gamut of visible colors, it has a massive dynamic range of 127 stops, and it has precision that exceeds the capabilities of human vision. It's a super cool format, and I strongly recommend reading the original paper.

However, the one box that it doesn't tick is performance. Since it uses a log encoding for luminance, decoding it requires an exp2() call, which is slower than I want.1

FLuv32 is essentially the same as LogLuv, but replaces the log encoding with floating point (hence the "F" in the name) which can be decoded with just a few fast operations. To achieve the same precision with floating point, I dropped the sign bit, foregoing negative luminance capabilities. But aside from that and a couple minor tweaks, it's the same as LogLuv. It has the same precision, the same massive dynamic range, and also fits in 32 bits. It's just a lot faster.

The FLuv32 color format

For anyone wanting to use FLuv32 themselves, I have a reference implementation in Rust here, and below is a description of the format.

Like LogLuv, FLuv32 uses an absolute color space, and as such takes CIE XYZ triplets as input. Also like LogLuv, it uses the Y channel directly as luminance, and computes u' and v' with the following CIELUV formulas:

u' = 4X / (X + 15Y + 3Z)
v' = 9Y / (X + 15Y + 3Z)

FLuv32 stores Y, u', and v' in the following bit layout (from most to least significant bit):

  • 7 bits: Y exponent (bias 42)
  • 9 bits: Y mantissa (implied leading 1, for 10 bits precision)
  • 8 bits: u'
  • 8 bits: v'

To store u' and v' in 8 bits each, their values need to be scaled to a [0, 255] range. u' is in the interval [0, 0.62], and v' is a tad smaller than that, so the original LogLuv uses a scale of 410 for both. FLuv32 departs from that very slightly, and instead uses the following scales:

u'-scale = 817 / 2
v'-scale = 1235 / 3

The motivation for this change is to be able to exactly represent E, the chromaticity of an equal-energy spectrum. This is not perceptually meaningful, which is presumably why the original LogLuv didn't bother. But in the context of a spectral renderer this may be useful, and it's trivial to accommodate.

The luminance, Y, is stored with 7 bits of exponent and 9 bits of mantissa. The exponent has a bias of 42, and the mantissa follows the standard convention of having an implicit leading one, giving it 10 bits of precision. The minimum exponent indicates a value of zero (denormal numbers are not supported). The maximum exponent is given no special treatment (no infinities, no NaN).

The exponent's bias of 42 may seem surprising, since typically a 7-bit exponent would be biased by 63. The reasoning for this is two-fold:

  • The dynamic range is already monstrously excessive for practical imaging applications, so to some extent it doesn't matter.
  • It seems more useful to be able to represent the luminance of supernovae in physical units (which exceed 2^64 cd/m^2) than ludicrously dark values (and even so, 2^-42 is already ludicrously dark).

42 was chosen by putting two thirds of the exponent range above 1.0, and one third below.2

Other formats

I also explored some other formats before arriving at FLuv32.

The obvious first one is simply using three 16-bit floats for RGB values. Surprisingly, although this provides more than enough precision, it doesn't provide enough dynamic range. Standard IEEE half-floats only have 5 bits of exponent, giving only 32 stops. That's not bad, but it's not quite enough to cover the whole range from dark shadows to the brightness of the sun. Moreover, 3 half floats takes up 48 bits, which is unnecessary.

After that, I considered RGBE, which is actually a predecessor to LogLuv, and is used in the Radiance HDR image format. In its typical formulation, 8 bits each are given to R, G, B, and the shared exponent. This format is very fast to decode, but unfortunately doesn't have the precision needed to avoid visible banding.

I implemented two other RGBE variants to try to improve things:

  • 32-bits with a 9-9-9-5 split. While this improved precision, it also unfortunately reduced dynamic range too much.
  • 40-bits with a 11-11-11-7 split. This provides more than enough precision and range, at the expense of slightly increased storage requirements. I think this is actually a solidly good format if RGB is a requirement.

Before stumbling on LogLuv, I also explored my own home-grown luma-chroma based format, based on YCbCr. It didn't pan out, largely because if you don't know the precise RGB colorspace you're working from, you can't accurately separate chroma from luminance, which is critical for storing chroma in fewer bits. You could, of course, just choose a specific RGB colorspace (e.g. ACES2065-1), but even then your chroma don't end up being perceptually uniform, which also thwarts low-bit-count storage.

All-in-all, I would say if you want to store HDR RGB specifically, go with the 40-bit RGBE format, which should easily accommodate any RGB color space with plenty of precision. And if you just want to store HDR color more generally, then both LogLuv and FLuv32 are great formats.

Shower thought

It occurred to me that in a spectral renderer you might not even need to do a full decode from FLuv32 back to XYZ. You have to spectrally upsample all your colors anyway, and usually that's done via some kind of chroma-oriented lookup table. So I'm wondering now if it would be reasonable to build a lookup table that's just directly indexed by the quantized u' and v' coordinates.

I haven't fully explored that yet, but decoding just the luminance from FLuv32 is super fast. On CPU, on my machine, the full XYZ decode is about 3ns, whereas the Y decode is about 1ns. So that would be a significant savings. Even moreso if you would just end up recomputing chroma from XYZ anyway for a spectral lookup.


  1. Arguably you could use a fast approximate exp2 to overcome performance limitations. However, the accuracy would need to be good enough to avoid introducing visible error, otherwise the whole point of the format is somewhat subverted.
  2. That it also happens to be the answer to everything is pure coincidence.
2020 - 04 - 20

Light Trees and The Many Lights Problem

This post will walk you through a particular approach for efficiently importance sampling huge numbers of lights in a scene. I call the approach "Light Trees".

This post has been a long time coming. I actually wrote about this technique (tersely) back in 2014 on But I never got around to writing about it properly on this blog. So I'm fixing that now!

Before I get started, I want to make it clear that I am not the first person to come up with this idea. In particular, the Manuka guys have apparently been doing something like this since the inception of their renderer. And there are several other people that apparently came up with pretty much the same technique around the same time as me.

Really, the core approach is super straightforward. And the fact that I (of all people) independently came up with it is a strong indication that it's pretty obvious. After all, I'm an animator professionally, not a software developer.

So my goal in this post is not to claim any credit for the idea. Quite the contrary—given how many people have independently come up with it, I don't think anyone can reasonably claim credit for it.

Rather, the goal of this post is to clarify and distill the approach, since the papers I've seen about it describe more detailed or complex variants. I discuss those papers and their contributions at the end of this post.

The Many Lights Problem

The problem we're trying to solve with light trees is sometimes called the "many lights" problem. If you're already familiar with this problem and with importance sampling, then please feel free to skip to the "Light Trees" section. These first two sections are just filling in that background.

In a typical forward path tracer, when you want to calculate lighting at a point on a surface, you generally do two things1:

  1. Sample the light(s) in the scene to calculate direct lighting.
  2. Shoot one or more rays out in random directions to calculate indirect lighting.

The many lights problem shows up in that first step.

Calculating direct lighting is really easy if you only have one light: just shoot a shadow ray at the light, and if the ray reaches it, calculate the lighting. But what if there's more than one light? In that case, there are two broad approaches:

  1. Sample all lights.
  2. Pick one light at random.

The first approach—sampling all lights—works really well:

three lights full sampling

The second approach—sampling one light at random—also works well enough, but results in some noise due to the random selection:

three lights random sampling

Both approaches are fine. The former takes longer to render because it needs to do more calculations per shading point, but produces less noise. The latter renders faster but produces more noise, which takes more samples to get rid of. Which approach is better depends on the particular scene.

The many lights problem starts to show up when there are... eh... many lights. A large number of lights. For example, I have a test scene that contains roughly 8000 lights. Rendering this scene by sampling all lights looks like this2:

city scape full sampling

It doesn't look too bad. But it took 3 hours to render at only 16 samples per pixel, which is about 11 minutes per sample. That's because we have to spawn 8000 shadow rays on every ray hit, and potentially do as many lighting calculations. Imagine the render times on an actual production scene with even more lights, and at the sample rates needed to converge to a sufficiently noise-free image.

You might think that randomly selecting lights will save us. And, indeed, that brings our render time down to 6 seconds. But the result is so noisy that it's unusable:

city scape random sampling

That's because the probability of selecting a light that matters to the shading point is so low. Even if we crank our samples into the thousands, the render would still be somewhat noisy.

So it seems like we're stuck, then. Either our render times are intractable, or our renders are unusably noisy. And that, in a nut shell, is the many lights problem.

Importance Sampling

The answer to this problem is something called importance sampling3.

With approach #2 (random selection) we have a lot of wiggle room in how we select our lights, and the idea behind importance sampling is to use that wiggle room to sample important things with higher probability.

As an example, let's make a scene with two lights, with one of the lights much brighter than the other. If we render it with equal probability of selecting either light, it looks like this:

two lights equal sampling

But if we use importance sampling to select the brighter light with higher probability, it looks like this:

two lights importance sampling

So is that it, then? We just select the brighter lights more often, and we're good to go?

Unfortunately, no. You can get a hint of why that won't work by noticing that the area under the dimmer light actually gets more noisy in the importance sampled image.

The problem is that even though one light is brighter than the other, that doesn't mean it actually contributes more light to the point we're shading. For shading points that are sufficiently close to the dim light, the dim light is effectively the brighter one.

For a simple scene like the one above, that's not a big problem. But for something like the city scape scene it's a huge problem. First of all, all the lights in that scene are about the same brightness, so even with brightness-based importance sampling we would just be selecting them equally anyway. But also, even if some of the lights were brighter, what are the chances that any given light has much effect on any given shading point? Very low.

So what we actually want is to somehow sample the lights in proportion to the brightness they contribute to the current shading point. If we do that, things look way better:

two lights light tree sampling

But... how do we do that?

Light Trees

(To summarize for those that skipped to this section: our goal is to select a light from all lights in the scene according to their contributions to the current shading point.)

The naive way to do this would be to calculate the shading from all the lights, and then select a light based on those results. But that doesn't get us anywhere because then we're just sampling all the lights anyway.

So instead we need some efficient way to approximate that. And that's exactly what light trees let us do.

A light tree is a pretty straightforward data structure: it's just a BVH of the lights in the scene, where each node in the BVH acts as an approximation of the lights under it. For it to be an effective approximation, that means we also need to store some extra data at each node.

The simplest kind of light tree would simply store the total energy of all the lights under each node. You could then treat each node as a light with the same size as the spatial bounds of the node, and emitting the same energy as the lights under it.

I'm going to use that simple light tree variant to illustrate the technique. But I'll discuss some possible extensions in a bit.

Okay, so we have our light tree as described above, and we have our shading point on a surface. How do we select a light?

Starting from the root of the tree, we:

  1. Calculate the lighting (or an approximation thereof) from each of the child nodes.
  2. Take those lighting results and select which child to traverse into based on them. For example, if the left node contributes twice as much light, then we'll be twice as likely to traverse into it.
  3. If the selected child is a leaf, return the actual light under it. Otherwise, go back to step 1 with the selected child.

So that's the basic idea. But there are a couple of further details:

  • We also need to return the probability of selecting that light. This is because the calling code needs to weight that light's contribution based on that probability.
  • We need a good "approximate lighting" function to use in step 1. Incidentally, I like to call this the "weighting" function, because we only use it to determine the weighted probability of traversing into each node.

Calculating the light-selection probability is pretty easy: we just combine the probabilities of all the child nodes we traverse into, which is just multiplying them all together.

Coming up with a good weighting function is trickier. The ideal weighting function would exactly calculate the lighting from all the individual lights under the node. But if we could do that efficiently then we wouldn't need the light tree in the first place! So instead we need something that approximates that well. I'm going to punt on this for now, but we'll come back to it later. For the moment, just assume we have a reasonable weighting function.

Given all of those things, our light selection process is the following pseudo code, where tree_root is the root node of the light tree, shading_point is the data of the point we're shading, and n is a random number in the range [0, 1):

def light_tree_select(tree_root, shading_point, n):
    node = tree_root   # Updated as we traverse.
    selection_p = 1.0  # Selection probability, also updated as we go.
        if is_leaf(node):
            return (node.light, selection_p)

        # Child node weights, or "approximate lighting".
        child_1_w = weight(node.child_1, shading_point)
        child_2_w = weight(node.child_2, shading_point)

        # Normalized probabilities of traversing into either child.
        child_1_p = child_1_w / (child_1_w + child_2_w)
        child_2_p = 1.0 - child_1_p

        # Select the child for the next loop.
        if n < child_1_p:
            node = node.child_1
            selection_p *= child_1_p # Accumulate probability.
            n /= child_1_p           # Scale random number.
            node = node.child_2
            selection_p *= child_2_p        # Accumulate probability.
            n = (n - child_1_p) / child_2_p # Scale random number.

And that's it! Well, modulo some floating point corner cases. But that's the entire thing. Really.

One thing I didn't mention yet: you need to scale the random number on each traversal step. That's what lets us get away with using just one number, instead of needing a new random number every time we choose a child. I'm feeling too lazy to explain how it works in detail, but you can think of it as the tree whittling down the space of the random number as it traverses deeper and deeper.

Anyway, if we implement that pseudo code, we get this beautiful result at 16 samples per pixel:

city scape basic light tree 16spp

This particular render uses a pretty simple weighting function: it assumes the shading point is lambert, treats the node as a spherical light, and just uses the analytic formula for lambert shading from a spherical light4.

That isn't an especially good weighting function for anything other than diffuse surfaces. But it works surprisingly well in many cases. It's definitely not sufficient for a robust implementation, however.

Potential Extensions and Improvements

Starting from that base, there are a bunch of potential improvements we could explore. For example:

  • A better weighting function that handles, among other things, a wider variety of materials well. This is more-or-less necessary for a robust implementation.
  • Storing more information than just total energy in the nodes, and using that for better weighting. For example, storing the approximate directionality of lights in cases such as spot lights and some area lights.
  • Alternative tree constructions. For example, using something other than the surface area heuristic to cluster lights, or using higher-arity trees.
  • Accounting for shadowing (somehow...).
  • etc.

The core idea—using a hierarchy to approximate light importance—is always the same. But the potential for variety within that is huge.

As a quick example, this render:

city scape better light tree 16spp also 16 samples per pixel, but (if you flip between it and the last render) it's clearly far less noisy. This was achieved by playing with some of the things above.

Further Reading

There are two published papers I'm aware of that describe variations on this light tree approach, both of which are very much worth reading:

Importance Sampling of Many Lights With Adaptive Tree Splitting is, in my opinion, the more significant paper of the two. It covers a large breadth of things including:

  • A novel tree construction approach that goes beyond the SAH, accounting for factors unique to light trees.
  • A way to account for the directionality of lights in the tree and to use that information to improve weighting calculations.
  • A novel "tree splitting" approach for returning multiple lights with reduced variance.

The main downside is that their weighting functions don't work well without their tree splitting approach, and can approach very high variance without it. (In fact, the Stochastic Lightcuts paper makes comparisons to it without the tree splitting, which highlights how poorly it performs in that case.)

Stochastic Lightcuts proposes the same technique as both this post and my 2014 post on ompf2, but frames it as an extension to lightcuts. I personally find that framing confusing, and it actually took me a couple readings to realize that it is, in fact, the same technique. But some people might find that framing easier to understand, especially if they're already familiar with lightcuts.

However, I believe the significant contribution of Stochastic Lightcuts is the weighting function they propose, which is based on the lightcuts error bounding function, and accounts for different surface materials and lights with directional components. Moreover, unlike the weighting function from the Adaptive Tree Splitting paper, it works really well with the standard traversal approach—no tree splitting required. I would definitely recommend this paper as the baseline for any real implementation of light trees, and then explore improvements from there.


  1. This is a simplification, of course. But it's sufficient for illustrating the problem.

  2. If you compare to later renders in the post, you'll notice that it doesn't quite match. That's because this one was rendered in Cycles, but the others are rendered in Psychopath. Psychopath doesn't support sampling all lights, and I didn't feel like implementing a whole feature just for illustration purposes.

  3. For a more thorough introduction to importance sampling, I strongly recommend the amazing PBRT book, which is available for free online. (But is also very much worth buying!)

  4. See the paper "Area Light Sources for Real-Time Graphics" by John M. Snyder for the appropriate formula.

2020 - 04 - 14

Building a Sobol Sampler

Up until recently I was using Leonhard Grünschloß's Faure-permuted Halton sampler as my main sampler in Psychopath. I even ported it to Rust when I made the switch from C++. And it works really well.

I've also had a Rust port of Grünschloß's Sobol sampler in Psychopath's git repo for a while, and from time to time I played with it. But it never seemed notably better than the Halton sampler in terms of variance, and it was significantly slower at generating samples. This confused me, because Sobol seems like the go-to low discrepancy sequence for 3d rendering, assuming you want to use low discrepancy sequences at all.

However, I recently stumbled across some new (to me) information, which ultimately sent me down a rabbit hole implementing my own Sobol sampler. I'd like to share that journey.

(Incidentally, if you want to look at or use my Sobol sampler, you can grab it here. It's MIT licensed, like the rest of Psychopath.)

Don't Offset Sobol Samples

The first two things I learned are about what not to do. And the title of this section actually refers to both of those things simultaneously:

  1. Cranely Patterson Rotation.
  2. Randomly offseting your starting index into the sequence for each pixel.

These are two different strategies for decorrelating samples between pixels. The first offsets where the samples are in space, and the second offsets where you are in the sequence. Both are bad with Sobol sequences, but for different reasons.

Cranley Patterson Rotation is bad because it increases variance. Not just apparent variance, but actual variance. It's still not entirely clear to me why that's the case—it feels counter-intuitive. But I first found out about it in the slides of a talk by Heitz, et al. about their screen-space blue noise sampler. It's also been borne out in some experiments of my own since then.

So the first lesson is: don't use Cranley Patterson Rotation. It makes your variance worse.

Randomly offsetting what index you start at for each pixel is bad for a completely different reason: it's slow. Due to the way computing Sobol samples works, later samples in the sequence take longer to compute. This is what was making it slow for me. I already used this technique with the Halton sampler, to great success. So I assumed I could just apply it to Sobol as well. And you can. It works correctly. It just ends up being comparatively slow.

So the second lesson is: don't offset into your Sobol sequence (at least not by much).

The Right Way to Decorrelate

So if you can't do Cranley Patterson Rotation, and you can't offset into your sequence, how can you decorrelate your pixels with a Sobol sampler?

The answer is: scrambling.

There are two ways (that I'm aware of) to scramble a Sobol sequence while maintaining its good properties:

  1. Random Digit Scrambling
  2. Owen Scrambling

Random digit scrambling amounts to just xoring the samples with a random integer, using a different random integer for each dimension. The PBRT book covers this in more detail, but it turns out that doing this actually preserves all of the good properties of the Sobol sequence. So it's a very efficient way to decorrelate your pixels.

I gave that a try, and it immediately made the Sobol sampler comparable to the Halton sampler in terms of performance. So it's definitely a reasonable way to do things.

But then, due to a semi-unrelated discussion with the author of Raygon, I ended up re-reading the paper Progressive Multi-Jittered Sample Sequences by Christensen et al. The paper itself is about a new family of samplers, but in it they also compare their new family to a bunch of other samplers, including the Sobol sequence. And not just the plain Sobol sequence, but also the rotated, random-digit scrambled, and Owen-scrambled Sobol sequences.

The crazy thing that I had no idea about, and which they discuss at length in the paper, is that Owen scrambled Sobol actually converges faster than plain Sobol. That just seems nuts to me. Basically, Owen scrambling is the anti-Cranley-Patterson: it's a way of decorrelating your pixels while actually reducing variance. So even if you're not trying to decorrelate your pixels, it's still a good thing to do.

(As an interesting aside: random digit scrambling is a strict subset of Owen scrambling. Or, flipping that around, Owen scrambling is a generalization of random digit scrambling. But what's nice about that is you can use the same proof to show that both approaches preserve the good properties of Sobol sequences.)

Implementing Owen Scrambling Efficiently

The trouble with Owen scrambling is that it's slow. If you do some searches online, you'll find a few papers out there about implementing it. But most of them are really obtuse, and none of them are really trying to improve performance. Except, if I recall correctly, there's one paper about using the raw number crunching power of GPUs to try to make it faster. But that's not really that helpful, since you probably want to use your GPU for other number crunching.

So it basically seems like Owen-scrambling is too slow to be practical for 3d rendering.

Except! There's a single sentence in the "Progressive Multi-Jittered Sample Sequences" paper that almost seems like an afterthought. In fact, I almost missed it:

We do Owen scrambling efficiently with hashing – similar in spirit to Laine and Karras [LK11], but with a better hash function provided by Brent Burley.

"LK11" refers to the paper Stratified sampling for stochastic transparency by Laine et al. The really weird thing is that this paper isn't even about Sobol sampling. But they, too, have a one-liner that's really important:

This produces the same result as performing an Owen scramble [...]

What they're referring to here is a kind of hashing operation. They use it to decorrelate a standard base-2 radical inverse sequence, but the same approach applies just as well to Sobol sequences.

But you can't just use any hash function. In fact, you have to use a really specific, weird, special kind of hash function that would be a very poor choice for any application that you would typically use hashing for.

Here's the basic idea: the key insight Laine et al. provide is that Owen scrambling is equivalent to a hash function that only avalanches downward, from higher bits to lower bits. In other words, a hash function where each bit only affects bits lower than itself. This is a super cool insight! And it means that if we can design a high-quality, efficient hash function with this property, then we have efficient Owen scrambling as well.

It turns out, though, that developing a high-quality hash function that meets these criteria is pretty challenging. For example, the hash function they present in the paper, while fast, is pretty poor quality. Nevertheless, I decided to take a crack at it myself.

So far, I've stuck to the general approach from Laine et al., but worked on optimizing the constants and tweaking a few things. I don't think I've gained any real insights beyond their paper, but I have nevertheless made significant improvements through those tweaks. At this point, I have a hash function that I would call "okay" for the purposes of Owen scrambling. But there is very clearly still a lot of room for improvement.

I'm extremely curious what the implementation from "Progressive Multi-Jittered Sample Sequences" looks like. And I'm especially curious if there are any interesting insights behind it regarding how to construct this bizarre kind of hash function.

My Sobol Sampler

Again, you can grab my Sobol sampler here if you like.

It includes implementations of all the (non-slow) decorrelation approaches I mentioned in this post:

  • Cranley-Patterson rotation
  • Random Digit Scrambling
  • And hash-based Owen Scrambling

It also includes, of course, plain non-scrambled Sobol.

If you do make use of this somewhere, I strongly recommend using the Owen-scrambled version. But it can be fun to play with the others as well for comparison.

However, one thing I noticed in my testing is that Cranley-Patterson rotation seems to do a better job of decorrelating pixels. The two scrambling approaches seem to exhibit some correlation artifacts in the form of clusters of mildly higher-variance splotches in certain scenes. It's mild, and Owen scrambling still seems to win out variance-wise over Cranely-Patterson. But still. This is definitely a work in progress, so be warned.

Regardless, I've had a lot of fun with this, and I've learned a lot. Pretty much everything I talked about in this post was like black magic to me before diving head-first into all of it. I definitely recommend looking into these topics yourself if this post was at all interesting to you.

2020 - 04 - 12

Notes on Color: 01

During my time working on Psychopath, I've slowly gained a better understanding of color science, color management, and other color related topics. I'd like to jot down some notes about my current understanding of these topics, because the angle I learned them from is slightly non-standard. My hope is that this may help other people gain a better understanding of the topic as well.

I don't know how many entries in this series there will be. Maybe this will be the only one! Who knows, I'm lazy sometimes. But my hope is to make additional posts about this in the future. I have a lot that I'd like to write about.

Color Blindness

First, a fun fact about me: I'm partially color blind. More specifically, I'm an anomalous trichromat, meaning that, compared to normal color-vision people, one of the three cones in my eyes is shifted in its spectral sensitivity.

One of the more interesting experiences I've had was the process of being diagnosed with color blindness. Most people are familiar with the Ishihara color blindness test (even if not by name). You're shown a series of circles made up of colored dots, and each circle has an embedded image of some kind. If you can see the image then you have normal color vision, and if you can't then you're color blind to some extent.

What most people don't know, however, is that the full Ishihara test also includes the reverse: images that only color blind people can see but normal color vision people can't. This surprises a lot of people, and trying to understand how that works sent me down a fun rabbit hole about color vision a couple of decades ago.

That was my first foray into color science and color perception, and my understanding of color management and related topics is built on what I learned back then. I'm going to present color from that same perspective because I haven't seen it approached that way in the graphics community before, and I personally think it helps to clarify a lot of things that are otherwise quite... vague and wishy washy.

What is Color Vision?

The first step towards understanding almost anything about color is understanding color vision and how it relates to actual physical light.

Physical light is a fairly complex phenomenon (see e.g. polarization, various quantum effects, etc.). And if I'm being completely honest, I don't fully understand everything about it. However, the aspects of light relevant to color perception are (thankfully) pretty simple, and that's what I'll be talking about here.

To explain light and how it relates to color vision, I like to make an analogy between light and sound. Sound is made up of pressure waves (typically in the air) of various frequencies and amplitudes. For example, the wave of a single pure tone looks like this:

Graph of a tone

A louder version of that same tone looks like this:

Graph of a louder tone

And a higher-pitched version looks like this:

Graph of a higher-pitched tone

However, most sounds aren't a single pure tone. Most sounds have many tones of varying pitch and loudness layered on top of each other. This is what produces the rich harmonies and textures we experience in sounds. For example, here is a graph of a more complex sound: a chord of three notes being played on a piano:

Wave graph of three piano notes

Quick question: can you tell that there are three notes being played in that graph? Me either. That's because these waveform-style graphs aren't a good visualization for this kind of discussion. A more useful graph for us is one that visualizes how loud the sound is at each frequency. Here is the same piano sound graphed1 that way:

Spectrograph of three piano notes

Notice how you can make out the three separate notes, which show up as three spikes on the graph. Neat!

Light is also made up of waves, but they are electromagnetic waves rather than pressure waves. Nevertheless, we can graph light in the same way. For example, here is a graph2 of highly saturated yellow light, showing its frequency spiking around the yellow part of the spectrum:

Spectrograph of a pure yellow

With sound, we perceive different frequencies as being different pitches. But with light, we perceive different frequencies as being different colors. If you had a machine that could produce any frequency of light, and had it slowly move through all the frequencies of visible light, you would see it produce all the colors of the rainbow one after another, as shown along the bottom of the graph.

However, like sound, most light isn't made up of a single pure frequency. Most light has a range of frequencies layered on top of each other. Unfortunately, although our ears are capable of hearing multiple frequencies at once, our eyes aren't that good, and are unable to perceive the equivalent of the three piano notes in color. And that brings us to the next topic!

Human Color Perception & Metamers

Our ears have thousands of tiny hairs that are each sensitive to different frequencies, which is what allows our hearing to distinguish so many frequencies at once. Our eyes, on the other hand, only have three types of light sensors for seeing color, which are called "cones". Each type of cone is sensitive to different frequencies of light: one is sensitive to the long wavelengths ("L"), one to the medium wavelengths ("M"), and one to the short wavelengths ("S"). We can graph the sensitivities of each type of cone like this:

Spectrograph of cone sensitivities

The curve for each cone represents how sensitive it is at different frequencies. Importantly, the cones cannot distinguish where within their sensitivity range they are being stimulated. For example, the L cone can't tell if it's being stimulated at its peak or at the far end of its left tail. All it knows is that it's being stimulated somewhere within that range.

Nevertheless, because the sensitivities of the cones overlap, if they all work together, they can triangulate a single spike (or "tone") in the light spectrum. For example, a pure yellow spike would stimulate both the L and M cones roughly equally:

Spectrograph of a pure yellow with cone sensitivities overlaid

So we can make a pretty good guess that when the L and M cones are stimulated equally, the light spectrum is probably spiking at yellow. However, we could also have two spikes:

Spectrograph of a red and green together

And... well, the cones can't really tell the difference between that and the single yellow spike.3

This is, in fact, how our digital color displays (such as computer monitors) work. I imagine you already know that they use red, green, and blue lights at different intensities to create different colors. But what may not have occurred to you is that the reason that works is because all humans are color blind.

When a computer monitor turns on the red and green emitters of a pixel to display yellow, it's taking advantage of the fact that we can't tell the difference between a harmony of red + green and an actual pure yellow. It is fooling our eyes into seeing a yellow that isn't really there. If our eyes were as good as our ears, they wouldn't be fooled—we'd see both the red and green simultaneously, not yellow.

The big take-away here is that there are a lot of light spectrums that our eyes cannot distinguish. And there is a name for that phenomenon: two or more different light spectrums that appear the same to a given observer are called metamers.

The difference between color blind people and people with normal color vision isn't the existence of metamers, it's the quantity of metamers and which spectrums are metamers. To drive that point home: there are animals such as the mantis shrimp that have far, far better color vision than us humans do, because the mantis shrimp has not just three color sensors in its eyes but at least twelve. Compared to the mantis shrimp, normal color vision people are horribly, horribly color blind—only slightly less so than people with impaired color vision.

This, of course, makes me feel a little better as a color blind person. But the reason it's actually relevant to discussion of color management, 3d rendering, etc. is because it highlights the difference between physical light and human color perception.

Light spectrums are a well defined physical phenomenon: you can measure them, you can make a graph, and that is simply what the light spectrum is. Color, on the other hand, is a perceptual experience caused by light. Light doesn't have a color, it is perceived as a color by a given observer, and it depends on the particular observer. You can still measure and graph color, but only on a per-observer basis.

Wrapping Up

If I've done a good job explaining things, then hopefully you've grasped everything here reasonably well. The concepts I've just introduced are, I believe, critical for understanding color perception and color management, so if any of this was unclear please let me know!

Lastly, for the actual color scientists out there: I realize that I've glossed over some things here. Sorry about that. Nevertheless, I hope you think this is a good introduction to the topic.


This isn't super important, but it's a fun little fact. I said earlier that our eyes can't see harmonies in light spectrums, but that's not quite true. There is one harmonic color that we can see (albeit a very simple one): purple.

We see purple when there is energy in the long and short ends of the visible spectrum, but not the medium length part, like this:

Spectrograph of a red and blue together

Our eyes can distinguish that because both the L and S cones will be strongly stimulated, but not the M cone. So for anyone who's favorite color is purple: congratulations, it's a very special color!


  1. For the pedantic, the data for this graph has actually been further processed to isolate the fundamental frequencies of the piano. But the principle stands.

  2. Most of the graphs in this post are just approximated by hand, so please don't take them as precise in any way. That's also why I (intentionally) left out the scales of the axes.

  3. The astute among you might notice that the S cone actually gets stimulated a little, so the red-green combination isn't quite identical to the pure yellow in how it stimulates the cones. However, if you just add a little bit of energy everywhere in the pure yellow graph (making it a little less saturated of a yellow) then it would be. This is the first hint of something fairly interesting, which is that an RGB display can't reproduce all the fully saturated colors, even if the three RGB colors are themselves fully saturated.