Skip to content

viz/core: VK_EXT_debug_utils + VK_EXT_validation_features (debug builds)#472

Draft
farbod-nv wants to merge 20 commits intomainfrom
fm/televiz_vkraii
Draft

viz/core: VK_EXT_debug_utils + VK_EXT_validation_features (debug builds)#472
farbod-nv wants to merge 20 commits intomainfrom
fm/televiz_vkraii

Conversation

@farbod-nv
Copy link
Copy Markdown
Contributor

Foundation for the upcoming vk::raii migration: get the validation harness in place before we start moving handles around.

When Config::enable_validation is on AND VK_LAYER_KHRONOS_validation is available, VkContext now also:

  • enables VK_EXT_debug_utils (instance extension)
  • chains VkValidationFeaturesEXT into instance create info with BEST_PRACTICES + SYNCHRONIZATION_VALIDATION
  • registers a create-time messenger via VkInstanceCreateInfo::pNext (catches errors from vkCreateInstance itself)
  • registers a persistent VkDebugUtilsMessengerEXT routing warning+error messages to stderr

The persistent messenger is torn down before vkDestroyInstance.

Object naming helpers (vkSetDebugUtilsObjectNameEXT) land later when there are concrete call sites to name.

Foundation for the upcoming vk::raii migration: get the validation
harness in place before we start moving handles around.

When Config::enable_validation is on AND VK_LAYER_KHRONOS_validation
is available, VkContext now also:
  - enables VK_EXT_debug_utils (instance extension)
  - chains VkValidationFeaturesEXT into instance create info with
    BEST_PRACTICES + SYNCHRONIZATION_VALIDATION
  - registers a create-time messenger via VkInstanceCreateInfo::pNext
    (catches errors from vkCreateInstance itself)
  - registers a persistent VkDebugUtilsMessengerEXT routing
    warning+error messages to stderr

The persistent messenger is torn down before vkDestroyInstance.

Object naming helpers (vkSetDebugUtilsObjectNameEXT) land later
when there are concrete call sites to name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 05a438c0-4711-4b68-821e-caec3fda208d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fm/televiz_vkraii

Comment @coderabbitai help to get the list of available commands and usage tips.

farbod-nv and others added 18 commits May 6, 2026 11:22
The single canonical include point for Televiz Vulkan code.
Defines VULKAN_HPP_NO_CONSTRUCTORS so struct types are aggregates
and C++20 designated initializers work, then pulls in vulkan.hpp
and vulkan_raii.hpp.

Header comment documents the conventions migration commits will
follow:
  - vk::raii::* for owned handles
  - vk::StructureChain for pNext chains
  - designated initializers for struct creation
  - raw handle extraction (*handle_) only at marked CUDA / OpenXR
    interop boundaries

Two unit tests pin the toolchain: vk::raii::Context constructs
without throwing (loader is reachable), and a designated-init +
StructureChain combination compiles + behaves correctly.

No existing code migrated yet — that starts in the next commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
VkContext now owns its Vulkan handles as vk::raii::* members
(Instance, DebugUtilsMessengerEXT, PhysicalDevice, Device, Queue,
PipelineCache). The reverse-declaration order matches Vulkan's
parent-before-child destruction requirement: pipeline_cache_ →
queue_ → device_ → physical_device_ → debug_messenger_ → instance_.

destroy() now just resets each member to nullptr (move-from
nullptr triggers vk::raii's destructor) — eliminates the manual
"if (h \!= VK_NULL_HANDLE) { vkDestroy*; h = VK_NULL_HANDLE; }"
chain. Helpers (is_validation_layer_available,
device_supports_extensions, find_graphics_compute_queue_family,
score_physical_device) and enumerate_physical_devices()
similarly use vulkan-hpp / vk::raii equivalents.

Public API surface preserved:
  * Existing raw-handle getters (instance(), device(), ...) extract
    via *member_ — consumers (viz_session, viz_layers) keep working
    unchanged.
  * Added raii getters (raii_instance(), raii_device(), ...) for
    in-tree consumers to use during their own migration commits.

create_instance / create_logical_device / create_pipeline_cache
use vk::raii::* constructors directly. Designated initializers on
the create-info structs make the call sites read top-down.

All 52 unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Over-pruned in 3896199 — these aren't narration, they describe
what each Config field controls and what each PhysicalDeviceInfo
field maps to in Vulkan. Useful for callers reading the header.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Color + depth attachments (image, memory, view), render pass, and
framebuffer become vk::raii::* members, declared parent-first so
reverse-order destruction matches Vulkan's child-before-parent
requirement (framebuffer → views → images → memory).

destroy() / destroy_attachments() collapse from ~30 lines of
manual vkDestroy* chain to nullptr-resets. resize() restore-on-
failure logic is preserved unchanged.

Public raw-handle accessors keep working — return *member_ from
the underlying raii types — so viz_session and viz_layers
consumers don't need updates.

All 52 unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fence + two semaphores become vk::raii::Fence / vk::raii::Semaphore
members. destroy() collapses to nullptr-resets. wait() / reset()
go through raii_device().waitForFences / resetFences.

Public raw-handle accessors keep returning the underlying handles
via *member_ for compositor consumers.

All 52 unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The riskiest single file in the migration because of the CUDA-Vulkan
interop: external memory FD export, timeline semaphore FD export,
and a cudaMipmappedArray imported from those.

Vulkan side:
  - VkImage / VkDeviceMemory / VkImageView / VkSemaphore / VkCommandPool
    become vk::raii::* members.
  - pNext chains use vk::StructureChain:
      Image:     ImageCreateInfo + ExternalMemoryImageCreateInfo
      Memory:    MemoryAllocateInfo + ExportMemoryAllocateInfo
      Semaphore: SemaphoreCreateInfo + ExportSemaphoreCreateInfo
                 + SemaphoreTypeCreateInfo (timeline)
  - vkGetMemoryFdKHR / vkGetSemaphoreFdKHR replaced with the
    vk::raii::Device member functions (vulkan-hpp dispatches the
    extension proc internally).
  - One-shot layout transition uses vk::raii::CommandBuffers and
    vk::raii::Queue::submit() — no more manual cmd-buffer lifetime.

destroy() preserves explicit ordering: cudaDeviceSynchronize +
CUDA-side teardown FIRST (the imports are pinned against the
Vulkan handles), THEN waitIdle + raii nullptr-resets. CUDA handles
stay raw — interop boundary.

All 52 unit tests pass. -50 LOC net.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…NO_CONSTRUCTORS placement

Three issues from review:

1. VkValidationFeaturesEXT was wrongly chained as pNext of
   VkDebugUtilsMessengerCreateInfoEXT. Per spec it's only valid as
   pNext of VkInstanceCreateInfo. Also missing the
   VK_EXT_validation_features instance extension itself; without it,
   the validation features struct is silently ignored. Same
   debug_create_info was reused for the persistent messenger,
   propagating the bogus pNext.

   Fix: enable VK_EXT_validation_features when validation is on.
   Use vk::StructureChain<InstanceCreateInfo, DebugUtilsMessengerCreateInfoEXT,
   ValidationFeaturesEXT> for instance creation. Pass a plain
   DebugUtilsMessengerCreateInfoEXT (no pNext) to the persistent
   messenger constructor.

2. DeviceImage::run_one_shot_layout_transition() was reinterpret_cast'ing
   a VkCommandBuffer pointer to vk::CommandBuffer*. vk::CommandBuffer
   wraps the raw handle 1:1 today but layout punning isn't guaranteed.
   Replace with a real vk::CommandBuffer local (constructed from
   *cmd) and take its address.

3. VULKAN_HPP_NO_CONSTRUCTORS was defined inside viz/core/vk.hpp,
   so it took effect only for TUs that included viz/core/vk.hpp
   BEFORE vulkan.hpp. If another header pulled vulkan.hpp first,
   the macro silently no-op'd and structs lost designated-init.
   Promote to a viz_core PUBLIC compile_definition so every Televiz
   TU gets it regardless of include order.

All 52 unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviewer flagged the previous chain ordering as misleading: physically,
StructureChain<InstanceCreateInfo, DebugUtilsMessengerCreateInfoEXT,
ValidationFeaturesEXT> linked ValidationFeaturesEXT as pNext of the
messenger struct, even though both extensions extend InstanceCreateInfo
and the loader walks the whole list regardless of order.

Reorder to put ValidationFeaturesEXT directly after InstanceCreateInfo,
and rewrite the comment to explain that pNext is a flat linked list —
order is not semantic, but the physical linkage now matches the mental
model of "both structs attach to the instance create info".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Internal handles (VkSwapchainKHR, per-image semaphores, surface
capabilities/formats/present-mode queries) move to vk::raii and
vulkan-hpp types. Public API (raw VkImage, VkSemaphore in
AcquiredImage; raw VkSurfaceKHR in create) stays unchanged so
WindowBackend keeps working through this commit.

destroy_semaphores / destroy_swapchain_only collapse into a single
destroy() since vk::raii handles cleanup themselves; recreate() now
moves the old swapchain into a local raii object whose destructor
runs after init() so the oldSwapchain handle stays alive across the
new vkCreateSwapchainKHR call.

vkQueuePresentKHR is still called via the C entry point because
raii::Queue::presentKHR throws on the OUT_OF_DATE / SUBOPTIMAL
result codes we use for flow control.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GlfwWindow::create now takes const vk::raii::Instance& instead of a
raw VkInstance, and adopts the raw VkSurfaceKHR returned by
glfwCreateWindowSurface into vk::raii::SurfaceKHR via the raw-handle
adopt constructor. destroy() drops the surface (via raii reset) before
glfwDestroyWindow, preserving destruction ordering.

WindowBackend passes ctx.raii_instance(); the test that asserted
invalid_argument on a null VkInstance is dropped — the type system
now prevents it at the call site.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
VkBuffer/VkDeviceMemory and VkCommandPool/VkCommandBuffer become
vk::raii equivalents (CommandBuffers — the owning vector — for the
single readback cmd buffer). destroy_readback_staging is folded into
destroy() since raii handles cleanup; mapMemory/unmapMemory now go
through vk::raii::DeviceMemory.

readback_byte_size_ promoted to vk::DeviceSize (nameless typedef of
VkDeviceSize), allocationSize / size now use designated initializers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Command pool / buffer move to vk::raii (CommandBuffers vector for
the single buffer slot). Recording path converts to vulkan-hpp
methods (cmd.begin / beginRenderPass / setScissor / endRenderPass /
end) and a StructureChain<SubmitInfo, TimelineSemaphoreSubmitInfo>
for the timeline submit, with raii_queue() taking over from
vkQueueSubmit's raw entry point only where flow control needs the
result code (kept raw via reinterpret_cast for the submit_or_signal
fallback path).

Layer record() and backend record_post_render_pass remain raw
VkCommandBuffer recipients — that's the recording boundary the
layer base class declares.

Public Config keeps VkClearColorValue (raw boundary); the union is
layout-compatible with vk::ClearColorValue, so it's reinterpreted
when constructing the vk::ClearValue array.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sampler / DescriptorSetLayout / PipelineLayout / Pipeline /
DescriptorPool / DescriptorSets all become vk::raii. ShaderModule
locals in create_pipeline use vk::raii::ShaderModule so the
hand-written ShaderGuard goes away (raii handles that).

destroy() collapses to nullptr-resets in reverse-creation order;
manual sequencing was already required (sets-before-pool, pipeline-
before-layout) and the declared field order makes raii preserve it.

Recording path uses vk::CommandBuffer wrappers for bindPipeline /
bindDescriptorSets / draw; the cmd parameter is still raw VkCommandBuffer
since that's the layer base class's contract. updateDescriptorSets
goes through raii_device().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Whitespace / brace-position adjustments only — no semantic changes.
Picked up by `clang-format --dry-run --Werror` after the vk::raii
migration commits. CI's clang-format check now passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
vk::raii::SwapchainKHR::acquireNextImage throws on the OUT_OF_DATE /
SUBOPTIMAL result codes that we use for flow control (caller's signal
to recreate). The migration regressed window-resize: the throw escaped
out of WindowBackend::begin_frame and aborted the smoke binary.

Drop to vkAcquireNextImageKHR so the VkResult is observable, mirroring
the same workaround already in place for vkQueuePresentKHR.

Repro: ./build/examples/televiz/window_smoke/viz_window_smoke,
resize the GLFW window — main loop terminated with
"vk::SwapchainKHR::acquireNextImage: ErrorOutOfDateKHR".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…es on availability

CI reported on a different vulkan-hpp SDK that vk::PFN_DebugUtilsMessengerCallbackEXT
is not declared, breaking the previous reinterpret_cast. Newer SDKs
(including the local toolchain) wrap the callback as a vk::PFN_*
typedef using vk::Flags<...>; older ones leave it as the raw C
PFN_vk*. Direct assignment fails on the new SDK because vk::Flags<>
is structurally distinct from the raw flag type.

Derive the field type via decltype(std::declval<...>().pfnUserCallback)
so the cast resolves to whatever the active SDK declares — no need
to spell either name. The ABI is identical (vk::Flags<T> is a trivial
uint32_t wrapper).

Also gate VK_EXT_validation_features on instance-extension
availability (not just on the validation layer being present): some
loaders advertise the layer without the extension, in which case
unconditionally requesting it would fail vkCreateInstance. New
helper is_instance_extension_available() and a third instance-create
branch covers the layer-but-no-features case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rgs to locals

Two CI failures on Ubuntu ARM64 Debug, both caused by a stricter / older
vulkan-hpp than the local SDK:

1. find_memory_type in render_target.cpp and device_image.cpp took
   vk::PhysicalDevice by value and relied on implicit conversion from
   const vk::raii::PhysicalDevice& at the call site. The CI SDK doesn't
   provide that conversion. Standardize on const vk::raii::PhysicalDevice&
   (matches OffscreenBackend's already-portable signature) — the body
   calls .getMemoryProperties() which works directly on the raii type.

2. vk::StructureChain<...> constructed with anonymous designated-init
   aggregates as arguments fails template-argument deduction on the
   CI SDK. Same workaround we already used in vk_context.cpp and
   viz_compositor.cpp: lift each chain element to a named local first,
   then pass the locals to the chain constructor. Three sites in
   device_image.cpp (image, allocate, semaphore chains).

Local build + sanitizer build + 52 unit tests stay green; clang-format
clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@farbod-nv farbod-nv force-pushed the fm/televiz_vkraii branch from e352cce to e127860 Compare May 6, 2026 20:22
Older vulkan-hpp SDKs (Ubuntu 22.04 stock) declare both
  vk::Xxx::operator==(const vk::Xxx&) const = default;  // from <=>
  // built-in: operator==(VkXxx, VkXxx)
which makes `*raii_obj == VK_NULL_HANDLE` ambiguous — VK_NULL_HANDLE
matches both candidates. Cast to the raw VkXxx so the comparison
binds unambiguously to the C builtin.

Eight sites: glfw_window (instance), viz_compositor (cmd buffer),
offscreen_backend (buffer), swapchain (swapchain x3, device x2).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant