When compressing high-resolution color data (e.g., 24-bit RGB, where each channel is 8-bit) to a lower bit depth (e.g., 12-bit RGB, where each channel is 4-bit), simple truncation results in severe "Color Banding". This manifests as harsh, visible steps in smooth gradients. To preserve the perception of continuous gradations and higher color depth on limited hardware, we employ Dithering.
Instead of simply truncating the bottom bits of a pixel's color value, dithering selectively adds structured noise to the pixel before truncation.
For Ordered Dithering, we use a predefined pattern matrix, typically a Bayer Matrix (4x4):
[ 0, 8, 2, 10 ]
[ 12, 4, 14, 6 ]
[ 3, 11, 1, 9 ]
[ 15, 7, 13, 5 ]
The mathematical operation per pixel at coordinate (x, y) is:
Noise_Value = Bayer_Matrix[y % 4][x % 4]
Pixel_Out = (Pixel_In + Noise_Value) & 0xF0 // Truncate lower 4 bits
By intentionally adding a small amount of noise, pixels that are numerically close to the next 4-bit threshold are pushed over it into the brighter value at some positions of the matrix. Because the Bayer matrix is designed to scatter high and low values evenly, your eye naturally acts as a "low-pass filter," averaging the dots together spatially to perceive the missing intermediate colors.
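The per-pixel operation above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's implementation: the function name `ordered_dither` is hypothetical, and the saturation at 0xFF is an assumption added here, since the formula above does not say how overflow near white is handled.

```python
# 4x4 Bayer matrix, as listed above.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def ordered_dither(pixel_in, x, y):
    """Dither one 8-bit channel value down to 4 significant bits."""
    noise = BAYER_4X4[y % 4][x % 4]
    # Assumption: saturate at 0xFF so values near white do not wrap around.
    total = min(pixel_in + noise, 0xFF)
    return total & 0xF0  # truncate the lower 4 bits


# A flat input of 0x18 sits halfway between the 0x10 and 0x20 output levels.
tile = [[ordered_dither(0x18, x, y) for x in range(4)] for y in range(4)]
bright = sum(v == 0x20 for row in tile for v in row)
mean = sum(v for row in tile for v in row) / 16
print(bright, "of 16 pixels round up; tile mean =", hex(int(mean)))
```

For the flat 0x18 input, half of the tile lands on 0x20 and half on 0x10, so the tile average equals the original value even though no single pixel can represent it.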
Many simple implementations try to be clever by conditionally applying dithering. For example, they might only apply noise if the pixel is above a certain threshold, or they might clip the addition to prevent overflow before truncation.
- The Problem: This often creates a visible, artificial boundary (a 'noise wall') between very dark areas (which remain un-dithered) and slightly brighter areas (where dithering abruptly starts).
To achieve perfect gradients, you must apply the noise mathematically to all pixels, and then truncate.
- The Caveat: If you apply noise to near-black inputs (0x01 to 0x0F; with the 0 to 15 matrix above, 0x00 itself never reaches 0x10), the matrix will occasionally bump them up to 0x10, causing your blacks to emit a faint glowing grid.
- The Solution: In our implementation, we assume that output values below 0x10 effectively do not emit light on the target display. Therefore, we bypass the dithering addition strictly for input pixels < 0x10. Everything else is dithered, preventing artificial boundaries while preserving pure blacks.
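A minimal sketch of the black bypass described above, assuming the same 4x4 matrix and the 0x10 cutoff from the text. The names `BLACK_CUTOFF` and `dither_preserve_black` are hypothetical, and the saturation at 0xFF is an assumption made for this example.

```python
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

BLACK_CUTOFF = 0x10  # from the text: inputs below this skip the noise addition


def dither_preserve_black(pixel_in, x, y):
    """Ordered dither that leaves near-black inputs un-dithered."""
    if pixel_in < BLACK_CUTOFF:
        # Bypass: plain truncation, which always yields 0x00 here.
        return pixel_in & 0xF0
    noise = BAYER_4X4[y % 4][x % 4]
    # Assumption: saturate so bright pixels do not wrap around.
    return min(pixel_in + noise, 0xFF) & 0xF0


# Every input below the cutoff stays black at every matrix position.
dark_outputs = {
    dither_preserve_black(v, x, y)
    for v in range(BLACK_CUTOFF)
    for y in range(4)
    for x in range(4)
}
print(dark_outputs)
```

Inputs at or above the cutoff take the normal path, so the first dithered level is 0x10 and the region below it stays at exactly zero.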
A static Bayer grid stays locked to the exact same screen coordinates on every frame. Instead of keeping it fixed, we scramble it over time.
- The Problem with Simple Scrolling: Moving the matrix by $X+1, Y+1$ every frame creates a visible, distracting "scrolling" or "raining" artifact across the screen.
- 2D Scrambling Solution: We use a 4-bit frame counter, giving us 16 distinct frames. We bit-scramble the counter outputs into our X and Y spatial offsets:
X_offset = {frame_cnt[0], frame_cnt[2]}
Y_offset = {frame_cnt[1], frame_cnt[3]}
- The Result: The matrix jumps pseudo-randomly across all 16 possible starting positions. Because it hits every position exactly once every 16 frames, each pixel sees every noise value once per cycle, so the average brightness integrates exactly over time. The static pattern noise is transformed into temporally varying noise that reads as pleasing "Film Grain."
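The bit scrambling can be checked with a short Python sketch. It assumes the braces denote Verilog-style concatenation with the first element as the most significant bit, and that the offsets are added to the pixel coordinates before the modulo; both are interpretations of the text, and the function names are hypothetical.

```python
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def scrambled_offsets(frame_cnt):
    """Map a 4-bit frame counter to (x_offset, y_offset), each 2 bits wide."""
    b0 = frame_cnt & 1
    b1 = (frame_cnt >> 1) & 1
    b2 = (frame_cnt >> 2) & 1
    b3 = (frame_cnt >> 3) & 1
    x_offset = (b0 << 1) | b2  # {frame_cnt[0], frame_cnt[2]}
    y_offset = (b1 << 1) | b3  # {frame_cnt[1], frame_cnt[3]}
    return x_offset, y_offset


def temporal_noise(x, y, frame_cnt):
    """Bayer noise for pixel (x, y) on a given frame."""
    x_off, y_off = scrambled_offsets(frame_cnt & 0xF)
    return BAYER_4X4[(y + y_off) % 4][(x + x_off) % 4]


positions = [scrambled_offsets(f) for f in range(16)]
noise_at_origin = sorted(temporal_noise(0, 0, f) for f in range(16))
print(positions[:4])
print(noise_at_origin)
```

Because the mapping from counter bits to offsets is a bijection, the 16 frames visit 16 distinct offsets, and any fixed pixel receives each noise value from 0 to 15 exactly once per cycle.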
If you apply the exact same Bayer noise value to the Red, Green, and Blue channels of a single pixel simultaneously, you are exclusively fluctuating the pixel's Luminance (overall brightness).
- The Problem: The human eye is incredibly sensitive to high-frequency variations in Luminance. This makes the black/white Bayer grid dots very harsh and contrasty.
- The Solution (Decorrelation): We read the exact same Bayer matrix, but from slightly different starting spatial coordinates for each color channel:
  - Red: No offset ($X, Y$)
  - Green: Offset ($X+1, Y+2$)
  - Blue: Offset ($X+2, Y+1$)
- The Result: The noise is pushed almost entirely into the Chrominance (Color) domain. Human eyes are very insensitive to high-frequency chroma noise. By "decorrelating" the channels, the harsh Luma grid completely dissolves into a uniform, soft scatter. The perceived image quality is significantly higher, appearing closer to a high-bit-depth display.
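As an illustration of the per-channel offsets listed above, the following Python sketch dithers an RGB triple with one shared matrix read at three different positions. The names `CHANNEL_OFFSETS` and `dither_rgb` are hypothetical, and the saturation at 0xFF is an assumption made for this example.

```python
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

# (dx, dy) per channel, in R, G, B order, taken from the list above.
CHANNEL_OFFSETS = ((0, 0), (1, 2), (2, 1))


def dither_rgb(rgb, x, y):
    """Dither an (r, g, b) triple of 8-bit values to 4 bits per channel."""
    out = []
    for value, (dx, dy) in zip(rgb, CHANNEL_OFFSETS):
        noise = BAYER_4X4[(y + dy) % 4][(x + dx) % 4]
        # Assumption: saturate so bright values do not wrap around.
        out.append(min(value + noise, 0xFF) & 0xF0)
    return tuple(out)


# A mid-step gray: the three channels no longer switch together.
pixel = dither_rgb((0x18, 0x18, 0x18), 0, 0)
print([hex(c) for c in pixel])

# Each channel still averages to the input over a full 4x4 tile.
sums = [0, 0, 0]
for ty in range(4):
    for tx in range(4):
        for i, c in enumerate(dither_rgb((0x18, 0x18, 0x18), tx, ty)):
            sums[i] += c
print([s / 16 for s in sums])
```

At pixel (0, 0) the red channel stays at 0x10 while green and blue step up to 0x20, so the dither shows up as a small color shift rather than a pure brightness step, while each channel's tile average is unchanged.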
To achieve the best possible quality while respecting hardware limits (zero external memory), we implemented a 2-Stage Hybrid Pipeline that combines the strengths of both Ordered Dithering and Error Diffusion.
- Pass 1 (Temporal Ordered Dither):
  - Applies a 2-bit temporally scrambled Bayer matrix strictly to ultra-dark areas (< 0x04).
  - This injects high-frequency dynamic noise (film grain) that breaks up static spatial structures.
- Pass 2 (Low-Gray Energy Accumulator):
  - Historically known as Error Diffusion, but refined for physical hardware constraints.
  - Instead of the classic Floyd-Steinberg approach, which diffuses quantization error across the whole image, this operates as a conditional energy harvester.
  - Step 1: Energy Accumulation: When (Input + Incoming Error) < Hardware_Threshold, the output is suppressed to 0x00. The entire energy value is "harvested" and propagated to neighbors.
  - Step 2: Ignition (Firing): When the accumulated energy hits the threshold (e.g., 0x10), it "fires" a valid PWM pulse by outputting the accumulated value. The error is then reset to 0, preventing unnecessary noise in brighter regions.
- The Result: A clean output in bright areas combined with a "Milky Way" of tiny, bright dots in the ultra-dark regions that the human eye integrates as smooth, deep gradients.
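The accumulate-and-fire behaviour of Pass 2 can be sketched on a single scanline. This is a simplified illustration under stated assumptions: the text says energy is propagated to neighbors without giving the kernel, so the example carries it only to the next pixel on the same row, and the names `accumulate_scanline` and `carry` are hypothetical. Pass 1 is omitted because its exact matrix is not specified here.

```python
HARDWARE_THRESHOLD = 0x10  # example threshold value from the text


def accumulate_scanline(pixels, threshold=HARDWARE_THRESHOLD):
    """Suppress sub-threshold pixels and carry their energy forward."""
    out = []
    carry = 0  # harvested energy from earlier pixels (1D assumption)
    for value in pixels:
        energy = value + carry
        if energy < threshold:
            # Step 1: too dim to display, so output black and keep the energy.
            out.append(0x00)
            carry = energy
        else:
            # Step 2: fire the accumulated value and reset the error.
            out.append(min(energy, 0xFF))
            carry = 0
    return out


dark_row = [0x04] * 8
fired = accumulate_scanline(dark_row)
print([hex(v) for v in fired])

bright_row = [0x80, 0x90, 0xA0]
print(accumulate_scanline(bright_row) == bright_row)
```

On the flat 0x04 row, every fourth pixel fires at 0x10 and the rest stay black, so the total emitted energy matches the input. Pixels that already exceed the threshold pass through unchanged because no error is carried after a firing.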
To demonstrate the power of this hybrid approach, we simulated a display with severe bit-depth limitations:
Notice how the Hard Clamped image suffers from severe black crush and banding. In the Hybrid result, the background gradients are smooth, and the shadow details in the fur are fully recovered through the interaction of temporal noise and spatial diffusion.
Our quantitative analysis shows that the 2-Stage Hybrid architecture achieves a +3.30 dB improvement in PSNR compared to simple truncation, supporting its effectiveness for image restoration on low-grayscale displays.
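For readers who want to reproduce a comparison of this kind, the standard PSNR definition can be computed as below. The source does not state how its figure was measured, so the 8-bit peak value, the flat pixel lists, and the function name `psnr` are assumptions for illustration only.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Toy data: a short dark ramp and its plain 4-bit truncation.
reference = [0x00, 0x08, 0x10, 0x18, 0x20, 0x28]
truncated = [v & 0xF0 for v in reference]
print(round(psnr(reference, truncated), 2), "dB")
```

An improvement such as the one quoted above would be the difference between two such values, each measured against the same high-bit-depth reference.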
(See DESIGN.md for detailed RTL hardware architecture and timing diagrams).


