The Borg (Bring yer Own GRaphics) project—supported by NLnet—is establishing a fully transparent, end-to-end silicon implementation flow for open-source GPU hardware using a 100% libre EDA toolchain. Recognizing that full GPU development is highly complex, the initiative capitalizes on recent advances in low-cost chip manufacturing to make individual tape-outs feasible for small teams.
📖 Read the Borg GPU Book for detailed documentation.
100-frame placement animation from the OpenROAD EDA toolchain. Colors indicate functional Chisel modules.
The design is a TinyQV RISC-V SoC with the Borg FP16 shader processor as a memory-mapped peripheral, targeting both iCE40 FPGAs (pico-ice) and ASIC (IHP SG13G2 via Tiny Tapeout).
The Borg shader processor is a minimal programmable shading unit with:
- FP16 Fused Multiply-Add (FMA) — IEEE-754 compliant HardFloat unit supporting ADD, MUL, FMA, FNEG, FSTEP, and FRCP operations
- 32 general-purpose FP16 registers (r0–r31), MMIO-accessible from the CPU
- 56-word instruction memory for shader programs
- Hardware FP16 reciprocal (RCP) — LUT + linear interpolation for perspective division
- Hardware Tile Buffer — 16-pixel buffer for RGB and Z-buffer depth testing
- Hardware Texture Unit — Morton-encoded texture coordinate expansion
- 4-cycle pipeline with automatic halt-on-zero-instruction
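The LUT-plus-linear-interpolation reciprocal listed above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Borg RTL: the table size (`LUT_BITS`), the use of Python floats instead of FP16 bit patterns, and the function name `rcp_lut` are all hypothetical.

```python
import math

LUT_BITS = 4                 # assumed table size: 16 segments (hypothetical)
LUT_SIZE = 1 << LUT_BITS
# Samples of 1/m at LUT_SIZE + 1 evenly spaced points over m in [0.5, 1.0].
RCP_LUT = [1.0 / (0.5 + 0.5 * i / LUT_SIZE) for i in range(LUT_SIZE + 1)]

def rcp_lut(x):
    """Approximate 1/x with a table lookup plus linear interpolation."""
    if x == 0.0:
        return math.copysign(math.inf, x)
    # Split |x| into mantissa and exponent: |x| = m * 2**e with m in [0.5, 1).
    m, e = math.frexp(abs(x))
    # Position of m inside the table, in units of one segment.
    pos = (m - 0.5) * 2.0 * LUT_SIZE
    i = int(pos)             # segment index, 0 .. LUT_SIZE - 1
    frac = pos - i           # distance into the segment, 0 .. 1
    # Interpolate linearly between the two neighbouring table entries.
    approx = RCP_LUT[i] + frac * (RCP_LUT[i + 1] - RCP_LUT[i])
    # Undo the exponent split: 1/x = (1/m) * 2**(-e), then restore the sign.
    return math.copysign(math.ldexp(approx, -e), x)
```

Only the mantissa goes through the table; the exponent is simply negated. With 16 segments the relative error of this model stays near 0.1%, and a hardware design would pick the table size to match the FP16 precision it needs.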
The firmware implements a full triangle rendering pipeline:
- Vertex Shader — 4×4 MVP matrix multiply with hardware perspective division, executed as a single shader pass on the Borg FPU
- Screen-Space Translation — NDC to pixel coordinates with configurable framebuffer resolution
- Rasterization — Hardware-iterator driven edge evaluation with native FP16 coordinate expansion and FSM auto-chaining
- Fragment Shader — Unified pass (compiled via linear scan allocator) performing barycentric interpolation for RGB, Z, and UV simultaneously
- Hardware Z-Buffer — Per-pixel depth testing in the hardware tile buffer
- Hardware Texturing — Morton-encoded texel fetch with snooped fragment coordinates
- Framebuffer Output — Results written to PSRAM, read by host (RP2040) for display
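The rasterization, fragment, depth-test, and texturing stages listed above can be sketched in software. This is an illustrative model rather than the Borg firmware or RTL: the function names, the vertex layout `(x, y, z, u, v)`, the less-than depth comparison, the 4×4 arrangement of the 16-pixel tile, and the 8×8 texture size are all assumptions.

```python
import math

def edge(ax, ay, bx, by, px, py):
    """2D cross product of (B - A) and (P - A); its sign gives the side of edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def fragment(tri, px, py):
    """Interpolate vertex attributes at (px, py); return None outside the triangle.

    tri holds three vertices of the form (x, y, attr0, attr1, ...).
    """
    (x0, y0, *a0), (x1, y1, *a1), (x2, y2, *a2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return None                      # degenerate triangle
    # Barycentric weights: each edge function divided by the full area.
    w0 = edge(x1, y1, x2, y2, px, py) / area
    w1 = edge(x2, y2, x0, y0, px, py) / area
    w2 = edge(x0, y0, x1, y1, px, py) / area
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None                      # pixel lies outside at least one edge
    return tuple(w0 * p + w1 * q + w2 * r for p, q, r in zip(a0, a1, a2))

def depth_test(zbuf, index, z):
    """Keep the fragment only if it is nearer than the stored depth (assumed: smaller is nearer)."""
    if z < zbuf[index]:
        zbuf[index] = z
        return True
    return False

def morton2d(x, y, bits=8):
    """Interleave the low bits of x (even bit positions) and y (odd bit positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

# One pixel of one triangle; attributes per vertex are (z, u, v).
tri = [(0, 0, 0.0, 0.0, 0.0), (4, 0, 1.0, 1.0, 0.0), (0, 4, 1.0, 0.0, 1.0)]
zbuf = [math.inf] * 16                   # 16-pixel tile, assumed to be 4x4
z, u, v = fragment(tri, 1, 1)            # z = 0.5, u = 0.25, v = 0.25
if depth_test(zbuf, 1 * 4 + 1, z):
    texel_index = morton2d(int(u * 8), int(v * 8))   # 8x8 texture assumed
```

Dividing each edge function by the signed area makes the weights independent of the triangle's winding order, and the same three weights interpolate Z and UV in one pass, which mirrors the unified fragment pass described above.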
Shaders are compiled from GLSL-like source to a compact binary format (SPIR-B) and loaded at runtime from PSRAM — no firmware reflash needed to change shaders.
The MMIO register map is generated automatically from an Accellera SystemRDL description using PeakRDL-chisel, which emits both the Chisel BorgGpuRegs layout and the C headers directly.
The GPU features an asynchronous 2-entry command FIFO, so the CPU can pack and queue drawing packets while the GPU handles geometry and rasterization in the background.
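The back-pressure behaviour of a 2-entry command queue can be modelled in a few lines. This is a behavioural sketch only: the class name `CommandFifo`, its methods, and the string packets are hypothetical and say nothing about the actual register interface or packet format.

```python
from collections import deque

class CommandFifo:
    """Bounded queue: the producer (CPU) must retry when the queue is full."""

    def __init__(self, depth=2):
        self.depth = depth
        self.entries = deque()

    def full(self):
        return len(self.entries) >= self.depth

    def empty(self):
        return len(self.entries) == 0

    def try_push(self, packet):
        """CPU side: enqueue a packet, or report False if there is no room."""
        if self.full():
            return False
        self.entries.append(packet)
        return True

    def pop(self):
        """GPU side: take the oldest packet (the caller checks empty() first)."""
        return self.entries.popleft()

fifo = CommandFifo()
accepted = [fifo.try_push(p) for p in ("tri0", "tri1", "tri2")]  # third push is refused
first = fifo.pop()                        # GPU starts on the oldest packet
retried = fifo.try_push("tri2")           # a slot is free again
```

With two entries the CPU can prepare the next packet while the GPU is still busy with the current one, and it stalls only when it gets two packets ahead.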
The CPU is based on Michael Bell's TinyQV, an RV32I RISC-V core with nibble-serial processing designed for Tiny Tapeout. The original Verilog was rewritten in Chisel and heavily modified: the register file was expanded (RV32E → RV32I), the Borg peripheral bus was integrated, and the pipeline was adapted for QSPI flash/PSRAM and UART.
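To make "nibble-serial" concrete, the following sketch models a 32-bit addition carried out 4 bits per step, so one add takes 8 steps. It is an illustration of the general idea, not a description of the TinyQV datapath; the function name and the step counter are hypothetical.

```python
def nibble_serial_add(a, b):
    """Add two 32-bit values one 4-bit nibble at a time; return (sum, steps)."""
    result = 0
    carry = 0
    steps = 0
    for shift in range(0, 32, 4):        # least significant nibble first
        na = (a >> shift) & 0xF
        nb = (b >> shift) & 0xF
        s = na + nb + carry              # 4-bit add with carry-in
        result |= (s & 0xF) << shift     # keep the low 4 bits of this step
        carry = s >> 4                   # carry-out feeds the next nibble
        steps += 1
    return result, steps                 # the final carry is dropped (wraps mod 2**32)
```

The trade-off is area for time: a 4-bit adder reused over 8 steps is much smaller than a 32-bit adder, which matters on an area-constrained Tiny Tapeout tile.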
make test-all
make test-chisel-borg # Borg FPU unit tests (Chisel)
make test-chisel-core # TinyQV CPU tests (Chisel)
make test-cocotb-soc-core-rtl # CPU SoC integration tests (cocotb)
make test-cocotb-soc-borg-rtl # Borg peripheral tests (cocotb)

Fast C++ simulators for RTL validation, capable of rendering frames locally without an FPGA, featuring a real-time cycle-accurate interactive view.
make -C simulation/verilator vkcube_gui # Run vkcube in the interactive Verilator viewer
make -C simulation/arcilator vkcube_gui # Run in the faster Arcilator viewer

Prerequisites: pico-ice FPGA + Raspberry Pi debug probe.
cd fpga
make burn # Build bitstream and upload to FPGA
make triangle # Run triangle rendering (vertex shader on FPGA, display on RP2040)

make gds # Full RTL-to-GDS flow via LibreLane/OpenROAD

| Milestone | Status |
|---|---|
| FPU integrated into TinyQV SoC | ✅ Done |
| Vertex shader on FPGA | ✅ Done |
| Triangle rasterization + fragment shading | ✅ Done |
| SPIR-B runtime shader loading | ✅ Done |
| Per-vertex color interpolation | ✅ Done |
| Hardware Tile Buffer (Z-Buffer depth test) | ✅ Done |
| Hardware Texture Address Unit (Morton encoding) | ✅ Done |
| 32-bit RISC-V instructions & 32-entry register file | ✅ Done |
| Hardware perspective projection (4×4 MVP shader) | ✅ Done |
| Hardware FP16 reciprocal (FRCP) | ✅ Done |
| Cycle-accurate C++ simulation (Arcilator & Verilator) | ✅ Done |
| Interactive UI Viewer (zero-copy Pygame) | ✅ Done |
| Test manufactured chip | ⏳ Pending |
| Vulkan driver | 📋 Planned |

| Component | Description | License |
|---|---|---|
| Chisel | Hardware construction language (Scala → Verilog) | Apache-2.0 |
| TinyQV | RV32I RISC-V CPU core (rewritten in Chisel) | Apache-2.0 |
| Berkeley HardFloat | IEEE-754 floating-point units (FMA) | BSD-3-Clause |
| LibreLane | RTL-to-GDS ASIC flow orchestrator | Apache-2.0 |
| Yosys | RTL synthesis | ISC |
| OpenROAD | Place and route | BSD-3-Clause |
| Magic | Layout tool, DRC, GDS export | MIT |
| KLayout | GDS viewer and DRC | GPL-2.0 |
| IHP SG13G2 PDK | IHP 130nm process design kit | Apache-2.0 |
| cocotb | Python-based RTL simulation and testing | BSD-3-Clause |
| Icarus Verilog | Verilog simulation (cocotb backend) | GPL-2.0 |
| Verilator | Verilog linting and simulation | LGPL-3.0 |
| nextpnr | FPGA place and route (iCE40) | ISC |
| IceStorm | iCE40 FPGA bitstream tools | ISC |
| Netgen | LVS (Layout vs. Schematic) | MIT |
| GCC | RISC-V cross-compiler (riscv32-embedded) | GPL-3.0 |
| Mill | Scala build tool | MIT |
| Tiny Tapeout Tools | Build and submission orchestrator | Apache-2.0 |
| Nix | Reproducible development environment | LGPL-2.1 |
| CIRCT/firtool | Chisel → Verilog compiler (FIRRTL) | Apache-2.0 (LLVM) |
| Arcilator | Cycle-accurate FIRRTL C++ simulator | Apache-2.0 (LLVM) |
| OpenJDK | Java runtime for Chisel/Mill | GPL-2.0 + CE |
| SystemRDL | Register logic definition standard | Accellera |
| PeakRDL | Toolchain for parsing and exporting SystemRDL | GPL-3.0 |
| nanobind | Zero-overhead C++ to Python bindings | BSD-3-Clause |
| Pygame (SDL2) | Hardware-accelerated UI windowing subsystem | LGPL-2.1 |
If you use the Borg GPU in your research or project, please cite it using the following metadata:
@software{Wendleder_Borg_-_Tiny_2026,
author = {Wendleder, Andreas},
license = {CERN-OHL-S-2.0},
month = apr,
title = {{Borg - Tiny GPU}},
url = {https://github.com/gonsolo/Borg},
version = {0.1.0},
year = {2026},
note = {Funded by NLnet NGI0 Commons Fund. SkyWater 130nm and IHP SG13G2 Tapeouts}
}

Alternatively, see the CITATION.cff file for machine-readable citation information.


