Tugbars/VectorFFT

A from-scratch mixed-radix FFT library in C99 with hand-tuned AVX2/AVX-512 codelets.
Beats FFTW on 23 of 39 tested sizes. No external dependencies.


Benchmark Results

[Figure: benchmark chart]

Platform: Intel Core i9-14900KF · 48 KB L1d · DDR5 · AVX2 · FFTW 3.3.10 (FFTW_ESTIMATE)


Accuracy

[Figure: accuracy]

Getting Started

git clone https://github.com/yourusername/VectorFFT.git
cd VectorFFT && mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
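cmake --build . --config Release

#include "vfft_planner.h"
#include "vfft_register_codelets.h"

/* Register the available codelets once. */
vfft_codelet_registry reg;
vfft_register_all(&reg);

/* N is the transform length; re/im and out_re/out_im are caller-provided
   input and output arrays holding real and imaginary parts separately. */
vfft_plan *plan = vfft_plan_create(N, &reg);
vfft_execute_forward(plan, re, im, out_re, out_im);
vfft_plan_destroy(plan);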
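
A quick way to sanity-check the output of a forward transform is Parseval's identity, which needs no reference FFT. The sketch below is not part of VectorFFT: the helper names are hypothetical, and it assumes double-precision split real/imaginary arrays and an unnormalized forward transform, none of which is confirmed by the snippet above.

```c
#include <stddef.h>

/* Sum of |z|^2 over a split-complex array. */
static double energy(size_t n, const double *re, const double *im)
{
    double e = 0.0;
    for (size_t i = 0; i < n; ++i)
        e += re[i] * re[i] + im[i] * im[i];
    return e;
}

/* Relative Parseval mismatch for an unnormalized forward transform:
 * sum|x|^2 should equal (1/N) * sum|X|^2.
 * Returns |lhs - rhs| / lhs, or the absolute mismatch when the
 * input energy is zero. */
static double parseval_rel_error(size_t n,
                                 const double *re, const double *im,
                                 const double *out_re, const double *out_im)
{
    double lhs = energy(n, re, im);
    double rhs = energy(n, out_re, out_im) / (double)n;
    double diff = lhs > rhs ? lhs - rhs : rhs - lhs;
    return lhs > 0.0 ? diff / lhs : diff;
}
```

Calling parseval_rel_error(N, re, im, out_re, out_im) after the forward transform should give a value near machine epsilon if the assumptions hold. A large value points to a wrong normalization assumption or a bad result; a small value does not prove correctness, since permuted outputs also conserve energy.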

Performance Tuning

VectorFFT uses measurement, not heuristics. Three calibration benchmarks determine every performance-critical threshold on your specific hardware:

./bench_walk          # Walk thresholds per radix    → vfft_calibration.txt
./bench_il            # IL crossovers per radix      → vfft_calibration.txt
./bench_factorize     # Optimal factorizations       → vfft_wisdom.txt

Both files are read automatically at plan creation. Without them, the planner uses conservative defaults — calibration typically improves performance by 15–40%.
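
To illustrate what "measurement, not heuristics" can mean for a threshold, here is a minimal sketch of picking a crossover point from timing data. It is not the code behind bench_walk or bench_il: the sample struct, the function name, and the rule "the second variant must win at every larger sample" are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical measurement: timings of two variants at one value of the
 * swept parameter k. */
typedef struct {
    size_t k;     /* parameter being swept */
    double ns_a;  /* measured time of variant A, in nanoseconds */
    double ns_b;  /* measured time of variant B, in nanoseconds */
} sample;

/* Return the smallest k from which variant B is faster at every remaining
 * sample, or 0 if B does not win at the last sample. Samples must be
 * sorted by ascending k. Scanning from the top ignores isolated wins by B
 * at small k that are likely noise. */
static size_t find_crossover(const sample *s, size_t count)
{
    size_t crossover = 0;
    for (size_t i = count; i-- > 0; ) {
        if (s[i].ns_b < s[i].ns_a)
            crossover = s[i].k;  /* B still wins; extend the run downward */
        else
            break;               /* A wins here; the run starts above this k */
    }
    return crossover;
}
```

A planner could then use variant B whenever k is at or above the returned value and the value is nonzero.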

Known limitations:

  • DIF backward codelets are ~10–15% slower than DIT forward codelets; this is the primary source of losses in roundtrip (forward + backward) timings
  • For small N (≤128), per-stage overhead dominates, and FFTW's monolithic codelets win
  • R=5 at K>2048 falls back to log3 derivation; the planner minimizes this via R=10/R=25 fusion
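
The last item can be pictured with a small sketch of radix fusion: pairs of 5s are merged into radix 25, and a leftover 5 is merged with a 2 into radix 10, so that a plain radix-5 stage is needed only when nothing else is available. This is one reading of the idea, not the planner's actual algorithm. The function names are hypothetical, K is ignored, and only sizes of the form 2^a · 5^b are handled.

```c
#include <stddef.h>

/* Append radix r to out; returns 0 if the buffer is full. */
static int emit(size_t *out, size_t max_out, size_t *count, size_t r)
{
    if (*count == max_out) return 0;
    out[(*count)++] = r;
    return 1;
}

/* Illustrative only: split n into stage radices using as few radix-5
 * stages as possible. Returns the number of radices written, or 0 if
 * n <= 1, n has a prime factor other than 2 or 5, or out is too small. */
static size_t fuse_radix5(size_t n, size_t *out, size_t max_out)
{
    size_t twos = 0, fives = 0, count = 0;
    if (n <= 1) return 0;
    while (n % 5 == 0) { n /= 5; ++fives; }
    while (n % 2 == 0) { n /= 2; ++twos; }
    if (n != 1) return 0;

    /* Fuse pairs of 5s into radix 25. */
    for (; fives >= 2; fives -= 2)
        if (!emit(out, max_out, &count, 25)) return 0;

    /* A leftover 5 becomes radix 10 if a 2 is available, else radix 5. */
    if (fives == 1) {
        size_t r = 5;
        if (twos > 0) { r = 10; --twos; }
        if (!emit(out, max_out, &count, r)) return 0;
    }

    /* Remaining 2s are emitted as plain radix-2 stages. */
    for (; twos > 0; --twos)
        if (!emit(out, max_out, &count, 2)) return 0;

    return count;
}
```

For N = 1000 = 2^3 · 5^3 this yields the stages 25, 10, 2, 2, with no plain radix-5 stage. A real planner would also fuse the remaining 2s into larger radices and weigh the choice against measured timings.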

Acknowledgments

FFTW by Matteo Frigo and Steven G. Johnson — the gold standard. VectorFFT's prime codelets (R=17, 19, 23) are translated from FFTW's genfft output.

VectorFFT — Because every nanosecond counts.

About

VectorFFT is a vectorized, pure C FFT library optimized for x86 processors (AVX-512, AVX2, SSE2) with zero external dependencies. It implements mixed-radix algorithms for common sizes and Bluestein's method for arbitrary lengths, with OpenMP multi-threading for large transforms. It is designed for both digital signal processing and financial applications.
