gonsolo/Borg

Borg - Tiny Open Source Graphics Processing Unit

Foundational workflow for an open-source GPU

The Borg (Bring yer Own GRaphics) project—supported by NLnet—is establishing a fully transparent, end-to-end silicon implementation flow for open-source GPU hardware using a 100% libre EDA toolchain. Recognizing that full GPU development is highly complex, the initiative capitalizes on recent advances in low-cost chip manufacturing to make individual tape-outs feasible for small teams.

📖 Read the Borg GPU Book for detailed documentation.

vkcube rendered by the Borg GPU

ASIC Global Placement Evolution

ASIC Global Placement Animation

100-frame placement animation from the OpenROAD EDA toolchain. Colors indicate functional Chisel modules.

Architecture

The design is a TinyQV RISC-V SoC with the Borg FP16 shader processor as a memory-mapped peripheral, targeting both iCE40 FPGAs (pico-ice) and ASIC (IHP SG13G2 via Tiny Tapeout).

Borg Shader Processor

A minimal programmable shading unit with:

  • FP16 Fused Multiply-Add (FMA) — IEEE-754 compliant HardFloat unit supporting ADD, MUL, FMA, FNEG, FSTEP, and FRCP operations
  • 32 general-purpose FP16 registers (r0–r31), MMIO-accessible from the CPU
  • 56-word instruction memory for shader programs
  • Hardware FP16 reciprocal (RCP) — LUT + linear interpolation for perspective division
  • Hardware Tile Buffer — 16-pixel buffer for RGB and Z-buffer depth testing
  • Hardware Texture Unit — Morton-encoded texture coordinate expansion
  • 4-cycle pipeline with automatic halt-on-zero-instruction
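
The "LUT + linear interpolation" reciprocal listed above can be illustrated in software. The following is a minimal sketch, not the hardware implementation: the 16-segment table size (`LUT_BITS`), the function names, and the use of Python floats for the intermediate arithmetic are all assumptions chosen for illustration. Only the final result is rounded to FP16, and only normal, nonzero inputs are handled.

```python
import math
import struct

LUT_BITS = 4  # hypothetical table size: 16 segments over the mantissa range [1, 2)
LUT = [1.0 / (1.0 + i / (1 << LUT_BITS)) for i in range((1 << LUT_BITS) + 1)]


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def rcp_fp16(x: float) -> float:
    """Approximate 1/x for a normal, nonzero input using a table plus lerp."""
    m, e = math.frexp(abs(x))      # abs(x) = m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1          # renormalise so that m is in [1, 2)
    pos = (m - 1.0) * (1 << LUT_BITS)
    idx = int(pos)                 # which table segment the mantissa falls in
    frac = pos - idx               # position inside that segment, in [0, 1)
    approx = LUT[idx] + (LUT[idx + 1] - LUT[idx]) * frac
    result = math.ldexp(approx, -e)  # 1/(m * 2**e) = (1/m) * 2**(-e)
    return to_fp16(math.copysign(result, x))
```

With a table this small the interpolation error stays around 0.1% of the result, which is on the same order as FP16 rounding itself. A real design would pick the table size against its own area and accuracy budget.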

Rendering Pipeline

The firmware implements a full triangle rendering pipeline:

  1. Vertex Shader — 4×4 MVP matrix multiply with hardware perspective division, executed as a single shader pass on the Borg FPU
  2. Screen-Space Translation — NDC to pixel coordinates with configurable framebuffer resolution
  3. Rasterization — Hardware-iterator driven edge evaluation with native FP16 coordinate expansion and FSM auto-chaining
  4. Fragment Shader — Unified pass (compiled via linear scan allocator) performing barycentric interpolation for RGB, Z, and UV simultaneously
  5. Hardware Z-Buffer — Per-pixel depth testing in the hardware tile buffer
  6. Hardware Texturing — Morton-encoded texel fetch with snooped fragment coordinates
  7. Framebuffer Output — Results written to PSRAM, read by host (RP2040) for display
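
Steps 3 to 6 can be sketched for a single pixel in software. This is an illustrative model only, under several assumptions: it uses Python floats instead of FP16, a dictionary in place of the 16-pixel hardware tile buffer, per-pixel edge functions rather than the hardware iterator, and hypothetical names (`edge`, `morton`, `shade_pixel`) that do not come from the Borg sources.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Edge function: twice the signed area of triangle (A, B, P)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def part1by1(n):
    """Spread the low 16 bits of n so a zero bit separates each original bit."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton(u, v):
    """Interleave texel coordinates: u in the even bits, v in the odd bits."""
    return part1by1(u) | (part1by1(v) << 1)


def shade_pixel(tri, px, py, zbuf, tex_size):
    """Return (z, texel_address) if the pixel is covered and passes the depth
    test, otherwise None. tri holds three (x, y, z, u, v) vertices."""
    (x0, y0, z0, u0, v0), (x1, y1, z1, u1, v1), (x2, y2, z2, u2, v2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return None                      # degenerate triangle
    # Barycentric weights: each is the sub-triangle opposite its vertex.
    w0 = edge(x1, y1, x2, y2, px, py) / area
    w1 = edge(x2, y2, x0, y0, px, py) / area
    w2 = edge(x0, y0, x1, y1, px, py) / area
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None                      # pixel lies outside the triangle
    z = w0 * z0 + w1 * z1 + w2 * z2
    if z >= zbuf.get((px, py), math.inf):
        return None                      # hidden behind a closer fragment
    zbuf[(px, py)] = z
    u = w0 * u0 + w1 * u1 + w2 * u2
    v = w0 * v0 + w1 * v1 + w2 * v2
    tu = min(int(u * tex_size), tex_size - 1)
    tv = min(int(v * tex_size), tex_size - 1)
    return z, morton(tu, tv)
```

Morton order keeps texels that are close in 2D close in memory, which is the usual motivation for using it in a texture address unit.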

SPIR-B Shader Format

Shaders are compiled from GLSL-like source to a compact binary format (SPIR-B) and loaded at runtime from PSRAM — no firmware reflash needed to change shaders.

SystemRDL & Hardware Command FIFO

The MMIO register map is generated from an Accellera SystemRDL description using PeakRDL-chisel, which emits both the Chisel BorgGpuRegs layout and the matching C headers.

The MMIO interface includes a 2-entry asynchronous Command FIFO, so the CPU can pack and queue drawing packets while the GPU handles geometry and rasterization in the background.
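
The queueing behaviour of a 2-entry FIFO can be modelled in a few lines. This is a behavioural sketch under assumptions: the class name, method names, and packet strings are hypothetical, and the real hardware signals a full queue through its register interface rather than a return value.

```python
from collections import deque


class CommandFifo:
    """Behavioural model of a small command queue between CPU and GPU."""

    def __init__(self, depth=2):
        self.depth = depth
        self.entries = deque()

    def full(self):
        return len(self.entries) >= self.depth

    def push(self, packet):
        """CPU side: enqueue a packet, or return False if the FIFO is full."""
        if self.full():
            return False
        self.entries.append(packet)
        return True

    def pop(self):
        """GPU side: dequeue the oldest packet, or None if the FIFO is empty."""
        return self.entries.popleft() if self.entries else None
```

With two entries, the CPU can prepare the next packet while the GPU is still working on the current one. A third push is refused until the GPU drains an entry.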

TinyQV CPU

Based on Michael Bell's TinyQV, an RV32I RISC-V core with nibble-serial processing designed for Tiny Tapeout. The original Verilog was rewritten in Chisel and heavily modified — including expanded register file support (RV32E → RV32I), integrated Borg peripheral bus, and adapted pipeline for QSPI flash/PSRAM and UART.
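
Nibble-serial processing means a 32-bit operation is carried out 4 bits at a time over several steps, which trades speed for a much narrower datapath. The sketch below illustrates the idea for addition. It is a software analogy only: the function name is hypothetical and it does not reproduce TinyQV's actual datapath or timing.

```python
def nibble_serial_add(a, b):
    """Add two 32-bit values one 4-bit nibble per step, lowest nibble first.

    Returns (sum modulo 2**32, carry out of the top nibble).
    """
    result = 0
    carry = 0
    for step in range(8):              # 8 nibbles make up 32 bits
        shift = 4 * step
        na = (a >> shift) & 0xF
        nb = (b >> shift) & 0xF
        s = na + nb + carry            # a 4-bit adder with carry-in
        result |= (s & 0xF) << shift
        carry = s >> 4                 # carry feeds the next nibble
    return result, carry
```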

Prerequisites

Building and Testing

Run all tests (Chisel + RTL cocotb)

make test-all

Individual test targets

make test-chisel-borg          # Borg FPU unit tests (Chisel)
make test-chisel-core          # TinyQV CPU tests (Chisel)
make test-cocotb-soc-core-rtl  # CPU SoC integration tests (cocotb)
make test-cocotb-soc-borg-rtl  # Borg peripheral tests (cocotb)

Cycle-Accurate C++ Simulation & Interactive Pygame UI

Fast C++ simulators for RTL validation, capable of rendering frames locally without an FPGA, featuring a real-time cycle-accurate interactive view.

make -C simulation/verilator vkcube_gui  # Run vkcube in the interactive Verilator viewer
make -C simulation/arcilator vkcube_gui  # Run vkcube in the faster Arcilator viewer

FPGA (pico-ice)

Prerequisites: pico-ice FPGA + Raspberry Pi debug probe.

cd fpga
make burn           # Build bitstream and upload to FPGA
make triangle       # Run triangle rendering (vertex shader on FPGA, display on RP2040)

ASIC (Tiny Tapeout)

Borg GPU GDS Render

make gds            # Full RTL-to-GDS flow via LibreLane/OpenROAD

Milestones

| Milestone | Status |
| --- | --- |
| FPU integrated into TinyQV SoC | ✅ Done |
| Vertex shader on FPGA | ✅ Done |
| Triangle rasterization + fragment shading | ✅ Done |
| SPIR-B runtime shader loading | ✅ Done |
| Per-vertex color interpolation | ✅ Done |
| Hardware Tile Buffer (Z-Buffer depth test) | ✅ Done |
| Hardware Texture Address Unit (Morton encoding) | ✅ Done |
| 32-bit RISC-V instructions & 32-entry register file | ✅ Done |
| Hardware perspective projection (4×4 MVP shader) | ✅ Done |
| Hardware FP16 reciprocal (FRCP) | ✅ Done |
| Cycle-accurate C++ simulation (Arcilator & Verilator) | ✅ Done |
| Interactive UI Viewer (zero-copy Pygame) | ✅ Done |
| Test manufactured chip | ⏳ Pending |
| Vulkan driver | 📋 Planned |

Software Bill of Materials

| Component | Description | License |
| --- | --- | --- |
| Chisel | Hardware construction language (Scala → Verilog) | Apache-2.0 |
| TinyQV | RV32I RISC-V CPU core (rewritten in Chisel) | Apache-2.0 |
| Berkeley HardFloat | IEEE-754 floating-point units (FMA) | BSD-3-Clause |
| LibreLane | RTL-to-GDS ASIC flow orchestrator | Apache-2.0 |
| Yosys | RTL synthesis | ISC |
| OpenROAD | Place and route | BSD-3-Clause |
| Magic | Layout tool, DRC, GDS export | MIT |
| KLayout | GDS viewer and DRC | GPL-2.0 |
| IHP SG13G2 PDK | IHP 130nm process design kit | Apache-2.0 |
| cocotb | Python-based RTL simulation and testing | BSD-3-Clause |
| Icarus Verilog | Verilog simulation (cocotb backend) | GPL-2.0 |
| Verilator | Verilog linting and simulation | LGPL-3.0 |
| nextpnr | FPGA place and route (iCE40) | ISC |
| IceStorm | iCE40 FPGA bitstream tools | ISC |
| Netgen | LVS (Layout vs. Schematic) | MIT |
| GCC | RISC-V cross-compiler (riscv32-embedded) | GPL-3.0 |
| Mill | Scala build tool | MIT |
| Tiny Tapeout Tools | Build and submission orchestrator | Apache-2.0 |
| Nix | Reproducible development environment | LGPL-2.1 |
| CIRCT/firtool | Chisel → Verilog compiler (FIRRTL) | Apache-2.0 (LLVM) |
| Arcilator | Cycle-accurate FIRRTL C++ simulator | Apache-2.0 (LLVM) |
| OpenJDK | Java runtime for Chisel/Mill | GPL-2.0 + CE |
| SystemRDL | Register logic definition standard | Accellera |
| PeakRDL | Toolchain for parsing and exporting SystemRDL | GPL-3.0 |
| nanobind | Zero-overhead C++ to Python bindings | BSD-3-Clause |
| Pygame (SDL2) | Hardware-accelerated UI windowing subsystem | LGPL-2.1 |

Citation

If you use the Borg GPU in your research or project, please cite it using the following metadata:

@software{Wendleder_Borg_-_Tiny_2026,
  author = {Wendleder, Andreas},
  license = {CERN-OHL-S-2.0},
  month = apr,
  title = {{Borg - Tiny GPU}},
  url = {https://github.com/gonsolo/Borg},
  version = {0.1.0},
  year = {2026},
  note = {Funded by NLnet NGI0 Commons Fund. SkyWater 130nm and IHP SG13G2 Tapeouts}
}

Alternatively, see the CITATION.cff file for machine-readable citation information.
