One of the tricky things about a Manuka-style rendering architecture is that you need to store your shading information really compactly, since it will be stored at every micropolygon vertex.
Storing surface normals seems like a fairly solved problem at this point, and I recommend the paper Survey of Efficient Representations for Independent Unit Vectors by Cigolle et al. if you haven't already read it.
Color, on the other hand, still seems like a bit of an open problem. For something like Psychopath, ideally I want a color format that meets these criteria:
 Compact (ideally 32 bits or less).
 Fast to decode (I want to use this at runtime while rendering, potentially decoding on every ray hit).
 Covers the entire gamut of human-visible colors.
 Has sufficient dynamic range to capture day-to-day natural phenomena (e.g. the sun) without clipping.
 Has sufficient precision to avoid visible banding.
I took a few detours, but I believe I've come up with a format that meets all of these criteria. And I call it FLuv32.
It's based on the 32-bit variant of LogLuv, which is a color format designed by Greg Ward of Radiance fame. LogLuv ticks almost all of the boxes: it's 32 bits, it covers the full gamut of visible colors, it has a massive dynamic range of 127 stops, and it has precision that exceeds the capabilities of human vision. It's a super cool format, and I strongly recommend reading the original paper.
However, the one box that it doesn't tick is performance. Since it uses a log encoding for luminance, decoding it requires an exp2() call, which is slower than I want.^{1}
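To make the cost concrete, here is a minimal sketch of decoding the luminance field of 32-bit LogLuv. It follows the formula from Ward's LogLuv encoding, L = 2^((Le + 0.5) / 256 - 64), where Le is the 15-bit log-luminance code. The function name is made up for illustration, and the sign bit is ignored for simplicity.

```rust
/// Decodes the 15-bit log-luminance code of 32-bit LogLuv.
/// Hypothetical helper for illustration; the sign bit is ignored.
fn logluv32_decode_luminance(le: u16) -> f32 {
    let le = le & 0x7fff; // keep only the 15-bit magnitude code
    if le == 0 {
        return 0.0; // a code of zero means zero luminance
    }
    // This exp2() call is the expensive part of the decode.
    ((le as f32 + 0.5) / 256.0 - 64.0).exp2()
}
```

The code 16384 (64 * 256) lands just above 1.0, since the log range is offset by 64 stops.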
FLuv32 is essentially the same as LogLuv, but replaces the log encoding with floating point (hence the "F" in the name), which can be decoded with just a few fast operations. To achieve the same precision with floating point, I dropped the sign bit, forgoing negative luminance capabilities. But aside from that and a couple of minor tweaks, it's the same as LogLuv. It has the same precision, the same massive dynamic range, and also fits in 32 bits. It's just a lot faster.
The FLuv32 color format
For anyone wanting to use FLuv32 themselves, I have a reference implementation in Rust here, and below is a description of the format.
Like LogLuv, FLuv32 uses an absolute color space, and as such takes CIE XYZ triplets as input. Also like LogLuv, it uses the Y channel directly as luminance, and computes u' and v' with the following CIELUV formulas:
u' = 4X / (X + 15Y + 3Z)
v' = 9Y / (X + 15Y + 3Z)
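As a rough sketch, the formulas above and their algebraic inverse might look like this in Rust. The function names are illustrative, not taken from the reference implementation, and the handling of black (falling back to the chromaticity of E) is an assumption.

```rust
/// CIE 1976 u'v' chromaticity of an XYZ color.
fn xyz_to_uv(x: f32, y: f32, z: f32) -> (f32, f32) {
    let denom = x + 15.0 * y + 3.0 * z;
    if denom <= 0.0 {
        // Assumption: give black the chromaticity of E (X = Y = Z).
        return (4.0 / 19.0, 9.0 / 19.0);
    }
    (4.0 * x / denom, 9.0 * y / denom)
}

/// Inverse: recovers XYZ from luminance Y and u'v' chromaticity.
fn yuv_to_xyz(y: f32, u: f32, v: f32) -> (f32, f32, f32) {
    if v <= 0.0 {
        // Under the forward formulas, v' = 0 only happens when Y = 0.
        return (0.0, 0.0, 0.0);
    }
    let x = y * 9.0 * u / (4.0 * v);
    let z = y * (12.0 - 3.0 * u - 20.0 * v) / (4.0 * v);
    (x, y, z)
}
```

For X = Y = Z this gives u' = 4/19 and v' = 9/19, which is the chromaticity of E.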
FLuv32 stores Y, u', and v' in the following bit layout (from most to least significant bit):
 7 bits: Y exponent (bias 42)
 9 bits: Y mantissa (implied leading 1, for 10 bits precision)
 8 bits: u'
 8 bits: v'
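The layout above can be sketched with plain shifts and masks. The helper names here are hypothetical; only the field order and widths come from the list above.

```rust
/// Packs the four fields into one u32, most significant field first:
/// 7-bit Y exponent, 9-bit Y mantissa, 8-bit u', 8-bit v'.
fn pack_fluv32(exp: u32, man: u32, u: u32, v: u32) -> u32 {
    debug_assert!(exp < 128 && man < 512 && u < 256 && v < 256);
    (exp << 25) | (man << 16) | (u << 8) | v
}

/// Splits a packed value back into (exponent, mantissa, u', v').
fn unpack_fluv32(bits: u32) -> (u32, u32, u32, u32) {
    (
        bits >> 25,           // top 7 bits
        (bits >> 16) & 0x1ff, // next 9 bits
        (bits >> 8) & 0xff,   // next 8 bits
        bits & 0xff,          // bottom 8 bits
    )
}
```

One nice property of this order is that the top 16 bits form a self-contained luminance value and the bottom 16 bits form a self-contained chromaticity value.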
To store u' and v' in 8 bits each, their values need to be scaled to a [0, 255] range. u' is in the interval [0, 0.62], and v' is a tad smaller than that, so the original LogLuv uses a scale of 410 for both. FLuv32 departs from that very slightly, and instead uses the following scales:
u' scale = 817 / 2
v' scale = 1235 / 3
The motivation for this change is to be able to exactly represent E, the chromaticity of an equal-energy spectrum. This is not perceptually meaningful, which is presumably why the original LogLuv didn't bother. But in the context of a spectral renderer this may be useful, and it's trivial to accommodate.
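Here is a small sketch of how those scales make E exact. It assumes round-to-nearest quantization and a decode of `code / scale`; the reference implementation may differ in those details, and the function names are illustrative. Since 817 = 19 * 43 and 1235 = 19 * 65, E's chromaticity (4/19, 9/19) scales to exactly (86, 195).

```rust
const U_SCALE: f32 = 817.0 / 2.0;
const V_SCALE: f32 = 1235.0 / 3.0;

/// Quantizes u'v' to 8 bits each (assumed: round to nearest, then clamp).
fn quantize_uv(u: f32, v: f32) -> (u8, u8) {
    let q = |c: f32, scale: f32| (c * scale + 0.5).clamp(0.0, 255.0) as u8;
    (q(u, U_SCALE), q(v, V_SCALE))
}

/// Maps the 8-bit codes back to u'v' (assumed: code / scale).
fn dequantize_uv(u: u8, v: u8) -> (f32, f32) {
    (u as f32 / U_SCALE, v as f32 / V_SCALE)
}
```

With LogLuv's scale of 410, the same chromaticity would scale to roughly (86.3, 194.2), which doesn't land on integer codes.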
The luminance, Y, is stored with 7 bits of exponent and 9 bits of mantissa. The exponent has a bias of 42, and the mantissa follows the standard convention of having an implicit leading one, giving it 10 bits of precision. The minimum exponent indicates a value of zero (denormal numbers are not supported). The maximum exponent is given no special treatment (no infinities, no NaN).
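The following is a sketch of that luminance encoding, not the reference implementation. It assumes a stored exponent e represents 2^(e - 42), that the mantissa is truncated rather than rounded, and that out-of-range values are flushed to zero or saturated. All names are illustrative. Because the layout mirrors IEEE 754 single precision, the decode is a shift and an add on the raw bits.

```rust
const EXP_BIAS: u32 = 42;
// Difference between the f32 exponent bias (127) and the FLuv32 bias.
const BIAS_DIFF: u32 = 127 - EXP_BIAS;

/// Encodes luminance as a 16-bit field: 7-bit exponent, 9-bit mantissa.
fn encode_y(y: f32) -> u16 {
    let bits = y.to_bits();
    let f32_exp = bits >> 23; // exponent field (sign bit is 0 for y > 0)
    if !(y > 0.0) || f32_exp < BIAS_DIFF + 1 {
        return 0; // zero, negative, NaN, or too small: stored as zero
    }
    if f32_exp > BIAS_DIFF + 127 {
        return 0xffff; // too large (including infinity): saturate
    }
    let e = f32_exp - BIAS_DIFF;
    let m = (bits >> 14) & 0x1ff; // top 9 mantissa bits, truncated
    ((e << 9) | m) as u16
}

/// Decodes the 16-bit field back to an f32.
fn decode_y(y16: u16) -> f32 {
    if y16 >> 9 == 0 {
        return 0.0; // minimum exponent means zero
    }
    f32::from_bits(((y16 as u32) << 14) + (BIAS_DIFF << 23))
}
```

Under these assumptions the smallest nonzero value is 2^-41 and the largest is just under 2^86, with a relative error below 2^-9 from truncation.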
The exponent's bias of 42 may seem surprising, since typically a 7-bit exponent would be biased by 63. The reasoning for this is twofold:
 The dynamic range is already monstrously excessive for practical imaging applications, so to some extent it doesn't matter.
 It seems more useful to be able to represent the luminance of supernovae in physical units (which exceed 2^64 cd/m^2) than ludicrously dark values (and even so, 2^-42 is already ludicrously dark).
42 was chosen by putting two thirds of the exponent range above 1.0, and one third below.^{2}
Other formats
I also explored some other formats before arriving at FLuv32.
The obvious first one is simply using three 16-bit floats for RGB values. Surprisingly, although this provides more than enough precision, it doesn't provide enough dynamic range. Standard IEEE half-floats only have 5 bits of exponent, giving only 32 stops. That's not bad, but it's not quite enough to cover the whole range from dark shadows to the brightness of the sun. Moreover, three half-floats take up 48 bits, which is unnecessary.
After that, I considered RGBE, which is actually a predecessor to LogLuv, and is used in the Radiance HDR image format. In its typical formulation, 8 bits each are given to R, G, B, and the shared exponent. This format is very fast to decode, but unfortunately doesn't have the precision needed to avoid visible banding.
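For reference, a decode of the classic 8-8-8-8 RGBE layout can be sketched as below. It follows the convention used in Radiance (an exponent byte of zero means black, the exponent is biased by 128, and half a code value is added on decode). The function name is illustrative.

```rust
/// Decodes a classic 8-8-8-8 RGBE pixel: [R, G, B, shared exponent].
fn rgbe_decode(rgbe: [u8; 4]) -> [f32; 3] {
    if rgbe[3] == 0 {
        return [0.0; 3]; // exponent byte of zero means black
    }
    // Shared scale: 2^(E - 128) divided by the 256 mantissa steps.
    let scale = 2.0f32.powi(rgbe[3] as i32 - (128 + 8));
    [
        (rgbe[0] as f32 + 0.5) * scale,
        (rgbe[1] as f32 + 0.5) * scale,
        (rgbe[2] as f32 + 0.5) * scale,
    ]
}
```

The precision problem is visible here: all three channels share one exponent, so the brightest channel gets at most 8 bits of mantissa and dimmer channels get fewer.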
I implemented two other RGBE variants to try to improve things:
 32 bits with a 9-9-9-5 split. While this improved precision, it also unfortunately reduced dynamic range too much.
 40 bits with an 11-11-11-7 split. This provides more than enough precision and range, at the expense of slightly increased storage requirements. I think this is actually a solidly good format if RGB is a requirement.
Before stumbling on LogLuv, I also explored my own homegrown luma-chroma format, based on YCbCr. It didn't pan out, largely because if you don't know the precise RGB colorspace you're working from, you can't accurately separate chroma from luminance, which is critical for storing chroma in fewer bits. You could, of course, just choose a specific RGB colorspace (e.g. ACES2065-1), but even then your chroma doesn't end up being perceptually uniform, which also thwarts low-bit-count storage.
All in all, I would say if you want to store HDR RGB specifically, go with the 40-bit RGBE format, which should easily accommodate any RGB color space with plenty of precision. And if you just want to store HDR color more generally, then both LogLuv and FLuv32 are great formats.
Shower thought
It occurred to me that in a spectral renderer you might not even need to do a full decode from FLuv32 back to XYZ. You have to spectrally upsample all your colors anyway, and usually that's done via some kind of chroma-oriented lookup table. So I'm wondering now if it would be reasonable to build a lookup table that's just directly indexed by the quantized u' and v' coordinates.
I haven't fully explored that yet, but decoding just the luminance from FLuv32 is super fast. On CPU, on my machine, the full XYZ decode is about 3 ns, whereas the Y decode is about 1 ns. So that would be a significant savings. Even more so if you would just end up recomputing chroma from XYZ anyway for a spectral lookup.
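A sketch of what that split could look like is below. It assumes a stored exponent e represents 2^(e - 42), as in the layout described earlier, and the lookup table itself is hypothetical: the index function only shows that the low 16 bits could address a 65536-entry table directly.

```rust
/// Decodes only the luminance from a packed FLuv32 value.
fn decode_y_only(fluv32: u32) -> f32 {
    if fluv32 >> 25 == 0 {
        return 0.0; // minimum exponent means zero
    }
    // Move the top 16 bits into the f32 exponent/mantissa position,
    // then add (127 - 42) << 23 to convert between exponent biases.
    f32::from_bits(((fluv32 & 0xffff_0000) >> 2) + (85 << 23))
}

/// Index into a hypothetical 256 x 256 spectral-upsampling table,
/// using the quantized u' (high byte) and v' (low byte) as-is.
fn chroma_table_index(fluv32: u32) -> usize {
    (fluv32 & 0xffff) as usize
}
```

With this split, the per-hit work would be a shift, an add, and one table fetch, with no division by v' at all.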
Footnotes

1. Arguably you could use a fast approximate exp2 to overcome performance limitations. However, the accuracy would need to be good enough to avoid introducing visible error, otherwise the whole point of the format is somewhat subverted.
2. That it also happens to be the answer to everything is pure coincidence.