A C++ microbenchmark repository for cache behavior, memory access, synchronization, communication paths, language/runtime overhead, allocator tradeoffs, container lookup, and syscall or network boundary cost.
- Build intuition for cache hierarchy and memory access patterns
- Compare concurrency, communication, language, container, and allocator tradeoffs with reproducible microbenchmarks
- Measure syscall, IPC, and local transport overhead with small focused benchmarks
- Produce evidence-based performance notes from stable runs
- CMake >= 3.20
- A C++20 compiler (clang++ or g++)
- Git + internet access (for fetching google/benchmark)
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

scripts/run_all.sh

Standard pattern:

./build/benchmark/<binary_name> --benchmark_min_time=0.3s

Examples:

./build/benchmark/bm_stride_access --benchmark_min_time=0.3s
./build/benchmark/bm_cache_levels --benchmark_min_time=0.3s

Queue tuned run:

./build/benchmark/bm_queue \
--benchmark_filter='BM_Queue(MutexTransfer/batch:64/backoff:0|SpscRingTransfer/batch:8/backoff:0)$' \
--benchmark_min_time=1s \
--benchmark_repetitions=10 \
--benchmark_report_aggregates_only=true

- `bm_stride_access`: locality loss from larger access stride
- `bm_pointer_chasing`: sequential access vs irregular pointer traversal
- `bm_false_sharing`: adjacent counters vs cache-line-padded counters
- `bm_aos_vs_soa`: layout sensitivity for dense vs sparse field usage
- `bm_mutex_vs_atomic`: contention scaling for shared counter updates
- `bm_cache_levels`: throughput drop as working set crosses cache levels
- `bm_ilp`: dependent vs independent instruction streams
- `bm_branch_prediction`: predictable, alternating, random, and branchless control flow
- `bm_inlining_effects`: forced inline, forced noinline, and function-pointer call shape
- `bm_cache_associativity`: friendly stride vs conflict-prone stride
- `bm_queue`: mutex queue vs tuned SPSC ring transfer
- `bm_mpsc_mpmc_queue`: mutex queue vs bounded lock-free queue under MPSC and MPMC load
- `bm_cv_vs_spin`: condition variable, yield, and spinning handoff cost
- `bm_lock_variants`: mutex, spinlock, and ticket-lock contention scaling
- `bm_queue_message_size`: queue throughput across multiple payload sizes
- `bm_memory_pool`: `new/delete` vs locked pool vs thread-local pool
- `bm_tlb_pressure`: contiguous, page-stride, and randomized page walks
- `bm_cross_thread_free`: producer allocation with consumer-side free across general-purpose, pool, and PMR paths
- `bm_allocator_variants`: `new/delete`, `malloc/free`, `pmr`, and arena-style allocation
- `bm_allocator_mixed_size`: mixed-size allocation across general-purpose and PMR pool paths
- `bm_vector_deque_list`: sequence-container scan cost across `vector`, `deque`, and `list`
- `bm_mmap_vs_read`: sequential `read`, random `pread`, and mapped-file scan
- `bm_clock_overhead`: `chrono`, `clock_gettime`, and `gettimeofday` call cost
- `bm_mmap_cow`: private first-touch/rewrite and shared mapped-write behavior
- `bm_page_fault_mlock`: first-touch, prefaulted, and `mlock`-backed page access
- `bm_memory_order`: throughput and correctness litmus tests across `relaxed`, release/acquire, and `seq_cst`
- `bm_thread_affinity`: default vs shared/split placement-hint thread handoff with verification counters
- `bm_pipe_vs_shm`: pipe syscall handoff vs shared-memory mailbox handoff
- `bm_socketpair_vs_pipe`: Unix stream `socketpair` vs pipe message handoff
- `bm_virtual_vs_template_dispatch`: template, virtual, and function-pointer dispatch
- `bm_std_function_vs_lambda`: lambda, functor, function pointer, and `std::function`
- `bm_exception_vs_error_code`: exception path vs optional-style error signaling
- `bm_variant_vs_virtual`: `std::variant` visitation vs virtual hierarchy dispatch
- `bm_dynamic_cast_vs_tag`: RTTI-based dispatch vs enum-tag dispatch
- `bm_aliasing_effects`: potential aliasing vs `restrict`-style no-alias access
- `bm_container_lookup`: `map`, `unordered_map`, and sorted-vector lookup
- `bm_socket_loopback`: local TCP vs Unix stream loopback message transfer

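
To give a feel for what the "SPSC ring" side of `bm_queue` compares against a mutex queue, here is a minimal sketch of a single-producer/single-consumer ring buffer. It is illustrative only and not taken from this repository: the class name `SpscRing`, the methods `try_push` and `try_pop`, the power-of-two capacity requirement, and the 64-byte cache-line alignment are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical SPSC ring sketch; the repository's actual queue may differ.
// Safe only with exactly one producer thread and one consumer thread.
template <typename T>
class SpscRing {
public:
    // Capacity must be a power of two so index wrapping is a single mask.
    explicit SpscRing(std::size_t capacity)
        : slots_(capacity), mask_(capacity - 1) {
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    // Producer side. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store to head_,
        // so the slot we are about to overwrite has been fully read.
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == slots_.size()) return false;
        slots_[tail & mask_] = value;
        // Release publishes the slot write before the new tail is visible.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to tail_.
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[head & mask_];
        head_.store(head + 1, std::memory_order_release);
        return value;
    }

private:
    std::vector<T> slots_;
    std::size_t mask_;
    // Assumed 64-byte cache lines: keep the two indices on separate lines
    // so producer and consumer do not falsely share one.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

A ring like this needs no lock because each index has exactly one writer. That property is lost as soon as a second producer or consumer is added, which is the territory `bm_mpsc_mpmc_queue` covers.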
results-summary.md: current run summary and conclusions
Generate figures from benchmark runs:
python3 scripts/generate_plots.py

- `benchmark/cache/`: cache, locality, and working-set behavior
- `benchmark/layout/`: data layout experiments
- `benchmark/concurrency/`: synchronization, queues, and thread placement
- `benchmark/memory/`: allocator and pooling behavior
- `benchmark/cpu/`: instruction-throughput experiments
- `benchmark/containers/`: container and lookup tradeoff benchmarks
- `benchmark/language/`: dispatch and callable abstraction benchmarks
- `benchmark/syscalls/`: file and syscall boundary measurements
- `benchmark/ipc/`: communication-path benchmarks
- `benchmark/network/`: local socket and transport-path benchmarks


