Skip to content

Signal when SPIR-V legalization needs loop unroll#15

Closed
AnastaZIuk wants to merge 109 commits intomainfrom
unroll
Closed

Signal when SPIR-V legalization needs loop unroll#15
AnastaZIuk wants to merge 109 commits intomainfrom
unroll

Conversation

@AnastaZIuk
Copy link
Copy Markdown
Member

@AnastaZIuk AnastaZIuk commented Mar 20, 2026

Summary

  • add dedicated SpirvEmitter bits for legalization-time loop unroll and legalization-time SSA rewrite
  • set those bits only for the known lowering paths that still need those expensive transforms for correctness
  • pass the bits into the narrowed SPIR-V legalization overload
  • update a few CodeGenSPIRV checks so they validate the semantically relevant SPIR-V shape instead of the older inflated IR shape
  • update the nested SPIR-V submodule pointers so this branch materializes the validated companion optimizer state, including the isolated -O1experimental VariablePointers capability cleanup
  • use the DXC trunk Godbolt reproducer, which is the preprocessed output of our path tracer at about 58k LoC

Root cause

The expensive blanket policy lives in the SPIR-V optimizer recipe, not in HLSL parsing or the DXC CLI front-end.

DXC is still the producer that knows when a specific lowering path genuinely needs those expensive legalization transforms for correctness, so the optimizer should not have to guess that globally.

Two confirmed correctness-sensitive cases are:

  • variable image sample offsets, which still need materialized loop unroll to legalize into forms such as ConstOffset
  • combined image sampler conversion, which does not need loop unroll but still needs legalize-time SSA rewrite cleanup to avoid invalid SPIR-V such as storing sampled image types

The paired SPIRV-Tools branch now also carries one deliberately isolated -O1experimental follow-up: trim stale VariablePointers / VariablePointersStorageBuffer declarations only at the end of the fast path when the final module no longer needs them. This fixes the EX37 sampler regression without changing the shared default trim path.

This patch provides that producer-side signal. The companion optimizer change in KhronosGroup/SPIRV-Tools#6612 then uses it to avoid the broader blanket defaults.

Validation

  • reproducer: godbolt.org/z/o5xf1hq36 (note: Compiler Explorer cache can make repeated runs look much faster than a cold compile)
  • shader payload: preprocessed output of our path tracer at about 58k LoC
  • local machine: AMD Ryzen 5 5600G with Radeon Graphics, 6 physical cores, 12 logical processors, Windows-reported max clock 3901 MHz
  • on the same payload and the same machine, SPIRV-Tools@487ff843bd8a + DXC@bd9a8b1c5365 reduced the workload from 19.161 s to 6.042 s
  • with SPIRV-Tools@57007cf46bb4 + DXC@b02b772e0b50, the same payload measured 4.702 s
  • with the current branch pair SPIRV-Tools@f5339a9dd2e2 + DXC@73a31a4a4b6d, the same payload keeps the reduced hot-path compile cost while fixing the EX37 -O1experimental sampler regression
  • full local CodeGenSPIRV lit/FileCheck passes: 1403 expected passes, 2 expected failures, 0 unexpected

Companion SPIR-V optimizer PR:
KhronosGroup/SPIRV-Tools#6612

AnastaZIuk and others added 30 commits November 10, 2022 12:40
…nd link it with delayed DLL loading approach
…o include directories, archive and runtime output needed by dxcompiler target at configure time with respect to configuration
# Conflicts:
#	CMakeLists.txt
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 20, 2026

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 54afd2c6a5f1a5bcbef66967515fa69f5a3650fd 73a31a4a4b6d56e6a312e8521c3af70e7014b187 -- include/dxc/WinEtwAdapter.h include/dxc/Support/Global.h include/dxc/Support/SPIRVOptions.h include/dxc/Support/dxcapi.use.h include/dxc/WinAdapter.h include/llvm/CodeGen/SchedulerRegistry.h lib/DxcSupport/HLSLOptions.cpp lib/DxcSupport/dxcmem.cpp lib/IR/Core.cpp lib/MSSupport/MSFileSystemImpl.cpp projects/dxilconv/tools/dxilconv/dxilconv.cpp tools/clang/include/clang/AST/ASTContext.h tools/clang/include/clang/AST/HlslTypes.h tools/clang/include/clang/Frontend/CompilerInstance.h tools/clang/include/clang/Frontend/FrontendAction.h tools/clang/lib/AST/ASTContext.cpp tools/clang/lib/Frontend/CompilerInstance.cpp tools/clang/lib/Frontend/FrontendAction.cpp tools/clang/lib/Frontend/Rewrite/RewriteObjC.cpp tools/clang/lib/SPIRV/DeclResultIdMapper.cpp tools/clang/lib/SPIRV/SpirvEmitter.cpp tools/clang/lib/SPIRV/SpirvEmitter.h tools/clang/lib/Sema/SemaHLSL.cpp tools/clang/tools/dxcompiler/DXCompiler.cpp tools/clang/tools/dxcompiler/dxcapi.cpp tools/clang/tools/dxcompiler/dxcompilerobj.cpp tools/clang/tools/dxcvalidator/dxcvalidator.cpp tools/clang/tools/dxildll/dxildll.cpp tools/clang/tools/dxrfallbackcompiler/DXCompiler.cpp tools/clang/tools/dxrfallbackcompiler/dxcapi.cpp
View the diff from clang-format here.
diff --git a/include/dxc/Support/Global.h b/include/dxc/Support/Global.h
index 0f4bb898e..578e7f010 100644
--- a/include/dxc/Support/Global.h
+++ b/include/dxc/Support/Global.h
@@ -354,7 +354,7 @@ inline void OutputDebugFormatA(const char *pszFormat, ...) {
 #define DXASSERT_ARGS(expr, fmt, ...)                                          \
   do {                                                                         \
     if (!(expr)) {                                                             \
-      fprintf(stderr, fmt __VA_OPT__(, ) __VA_ARGS__);                        \
+      fprintf(stderr, fmt __VA_OPT__(, ) __VA_ARGS__);                         \
       assert(false);                                                           \
     }                                                                          \
   } while (0)
diff --git a/include/dxc/Support/dxcapi.use.h b/include/dxc/Support/dxcapi.use.h
index d7fe9681e..d48c5b62a 100644
--- a/include/dxc/Support/dxcapi.use.h
+++ b/include/dxc/Support/dxcapi.use.h
@@ -74,7 +74,8 @@ class SpecificDllLoader : public DllLoader {
                   // test multiple validators versions.
     if (m_dll == nullptr)
       return HRESULT_FROM_WIN32(GetLastError());
-    m_createFn = reinterpret_cast<DxcCreateInstanceProc>(reinterpret_cast<void *>(GetProcAddress(m_dll, fnName)));
+    m_createFn = reinterpret_cast<DxcCreateInstanceProc>(
+        reinterpret_cast<void *>(GetProcAddress(m_dll, fnName)));
 
     if (m_createFn == nullptr) {
       HRESULT hr = HRESULT_FROM_WIN32(GetLastError());
@@ -106,7 +107,8 @@ class SpecificDllLoader : public DllLoader {
       fnName2[s] = '2';
       fnName2[s + 1] = '\0';
 #ifdef _WIN32
-      m_createFn2 = reinterpret_cast<DxcCreateInstance2Proc>(reinterpret_cast<void *>(GetProcAddress(m_dll, fnName2)));
+      m_createFn2 = reinterpret_cast<DxcCreateInstance2Proc>(
+          reinterpret_cast<void *>(GetProcAddress(m_dll, fnName2)));
 #else
       m_createFn2 = (DxcCreateInstance2Proc)::dlsym(m_dll, fnName2);
 #endif
diff --git a/include/dxc/WinAdapter.h b/include/dxc/WinAdapter.h
index ca04bec2d..97b6d76ca 100644
--- a/include/dxc/WinAdapter.h
+++ b/include/dxc/WinAdapter.h
@@ -198,7 +198,8 @@
 #define OutputDebugStringA(msg) fputs(msg, stderr)
 #define OutputDebugFormatA(...) fprintf(stderr, __VA_ARGS__)
 
-// I have no idea if I don't break something like INSTALL targets, requires CI tests
+// I have no idea if I don't break something like INSTALL targets, requires CI
+// tests
 #include "WinEtwAdapter.h"
 
 #define UInt32Add UIntAdd
diff --git a/include/llvm/CodeGen/SchedulerRegistry.h b/include/llvm/CodeGen/SchedulerRegistry.h
index 69ffc384b..02fd338e9 100644
--- a/include/llvm/CodeGen/SchedulerRegistry.h
+++ b/include/llvm/CodeGen/SchedulerRegistry.h
@@ -42,8 +42,9 @@ public:
   static MachinePassRegistry<FunctionPassCtor> Registry;
 
   RegisterScheduler(const char *N, const char *D, FunctionPassCtor C)
-      : MachinePassRegistryNode(N, D, C)
-  { Registry.Add(this); }
+      : MachinePassRegistryNode(N, D, C) {
+    Registry.Add(this);
+  }
   ~RegisterScheduler() { Registry.Remove(this); }
 
 
diff --git a/lib/DxcSupport/HLSLOptions.cpp b/lib/DxcSupport/HLSLOptions.cpp
index 7f13a4b61..ad507b2b6 100644
--- a/lib/DxcSupport/HLSLOptions.cpp
+++ b/lib/DxcSupport/HLSLOptions.cpp
@@ -792,9 +792,8 @@ int ReadDxcOpts(const OptTable *optionTable, unsigned flagsToInclude,
   }
 
   opts.DisableOptimizations = false;
-  if (Arg *A =
-          Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2, OPT_O3,
-                          OPT_Od)) {
+  if (Arg *A = Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2,
+                               OPT_O3, OPT_Od)) {
     if (A->getOption().matches(OPT_O0))
       opts.OptLevel = 0;
     if (A->getOption().matches(OPT_O1))
@@ -1269,8 +1268,7 @@ int ReadDxcOpts(const OptTable *optionTable, unsigned flagsToInclude,
       errors << "-Oconfig should not be specified more than once";
       return 1;
     }
-    if (Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2,
-                        OPT_O3)) {
+    if (Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2, OPT_O3)) {
       errors << "-Oconfig should not be used together with -O";
       return 1;
     }
diff --git a/lib/MSSupport/MSFileSystemImpl.cpp b/lib/MSSupport/MSFileSystemImpl.cpp
index ca734f7b4..e75019bef 100644
--- a/lib/MSSupport/MSFileSystemImpl.cpp
+++ b/lib/MSSupport/MSFileSystemImpl.cpp
@@ -323,8 +323,7 @@ typedef BOOLEAN(WINAPI *PtrCreateSymbolicLinkW)(
 
 PtrCreateSymbolicLinkW create_symbolic_link_api =
     PtrCreateSymbolicLinkW(reinterpret_cast<void *>(::GetProcAddress(
-        ::GetModuleHandleW(L"Kernel32.dll"),
-                                            "CreateSymbolicLinkW")));
+        ::GetModuleHandleW(L"Kernel32.dll"), "CreateSymbolicLinkW")));
 } // namespace
 #endif
 
diff --git a/tools/clang/lib/SPIRV/SpirvEmitter.cpp b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
index 20c515750..8284419f1 100644
--- a/tools/clang/lib/SPIRV/SpirvEmitter.cpp
+++ b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
@@ -593,10 +593,9 @@ SpirvEmitter::SpirvEmitter(CompilerInstance &ci)
       constEvaluator(astContext, spvBuilder), entryFunction(nullptr),
       curFunction(nullptr), curThis(nullptr), seenPushConstantAt(),
       isSpecConstantMode(false), needsLegalization(false),
-      needsLegalizationLoopUnroll(false),
-      needsLegalizationSsaRewrite(false),
-      sawExplicitUnrollHint(false),
-      beforeHlslLegalization(false), mainSourceFile(nullptr) {
+      needsLegalizationLoopUnroll(false), needsLegalizationSsaRewrite(false),
+      sawExplicitUnrollHint(false), beforeHlslLegalization(false),
+      mainSourceFile(nullptr) {
 
   // Get ShaderModel from command line hlsl profile option.
   const hlsl::ShaderModel *shaderModel =
@@ -972,8 +971,7 @@ void SpirvEmitter::HandleTranslationUnit(ASTContext &context) {
       !dsetbindingsToCombineImageSampler.empty() ||
       spirvOptions.signaturePacking;
   needsLegalizationSsaRewrite =
-      needsLegalizationSsaRewrite ||
-      !dsetbindingsToCombineImageSampler.empty();
+      needsLegalizationSsaRewrite || !dsetbindingsToCombineImageSampler.empty();
 
   // Run legalization passes
   if (spirvOptions.codeGenHighLevel) {
@@ -3396,7 +3394,8 @@ SpirvInstruction *SpirvEmitter::processCall(const CallExpr *callExpr) {
           return nullptr;
         }
         if (!canActAsOutParmVar(param)) {
-          emitError("argument for a non SPIR-V intrinsic parameter with vk::ext_reference attribute "
+          emitError("argument for a non SPIR-V intrinsic parameter with "
+                    "vk::ext_reference attribute "
                     "must be a applied to `inout`",
                     arg->getExprLoc());
           return nullptr;
  • Check this box to apply formatting changes to this branch.

Pull in the companion SPIRV-Tools change that adds a dedicated
TrimVariablePointersCapabilitiesPass at the end of the O1experimental fast
performance recipe.

The follow-up fixes a real sampler regression where the final optimized
module still carried stale VariablePointers / VariablePointersStorageBuffer
capability declarations even though the final IR no longer contained the
pointer forms that required them. The failing EX37 shader remained
validator-legal and had only scalar OpSelect %float, but removing only
those stale capability lines fixed the downstream runtime corruption.

Keep the default trim path untouched by keeping the cleanup isolated to the
experimental fast path in SPIRV-Tools.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants