
add no-copy hermitian transpose support to matmul #1173

Open
simonbyrne wants to merge 2 commits into main from sbyrne/hermitian

Conversation

@simonbyrne (Collaborator)

At the moment, calling matmul(hermitianT(A), B) or matmul(conj(transpose_matrix(A)), B) instantiates a temporary intermediate matrix. This change makes it call the appropriate cuBLAS operation directly (similar to how matmul(transpose_matrix(A), B) is handled).
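
Below is a minimal sketch of the call pattern this PR optimizes, using the public MatX API; the shapes and use of the default stream are illustrative:

```cpp
#include <matx.h>

int main() {
  using complexf = cuda::std::complex<float>;
  auto A = matx::make_tensor<complexf>({8, 4});
  auto B = matx::make_tensor<complexf>({8, 2});
  auto C = matx::make_tensor<complexf>({4, 2});

  // Previously, the conjugate transpose of A was materialized into a
  // temporary before the GEMM. With this change, the intent is forwarded
  // to cuBLAS as a conjugate-transpose op, so no copy of A is made.
  (C = matx::matmul(matx::hermitianT(A), B)).run();

  cudaDeviceSynchronize();
  return 0;
}
```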

@copy-pr-bot

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@simonbyrne
Collaborator Author

/build

@greptile-apps
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR avoids materializing an intermediate matrix when hermitianT(A) or conj(bare_tensor) is passed as a matmul operand by introducing a WithMatmulOperand dispatch layer that extracts the inner tensor and forwards the conjugate-transpose intent to cuBLASLt via CUBLAS_OP_C. It also cleanly separates memory-order tracking (orderA/B/C) from the cuBLAS operation flags (opA/opB), which were previously conflated.
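
A hedged sketch of that layout/operation split (names are illustrative, not the PR's exact code): memory order is derived from the operand's strides, while the op flag independently records the requested transform.

```cpp
#include <cublasLt.h>

// Illustrative helper: memory layout comes from strides alone.
inline cublasLtOrder_t OrderFromStrides(long long inner_stride) {
  // Unit inner stride => contiguous rows (row-major layout).
  return inner_stride == 1 ? CUBLASLT_ORDER_ROW : CUBLASLT_ORDER_COL;
}

// e.g. for matmul(hermitianT(A), B):
//   orderA = OrderFromStrides(A.Stride(1));  // memory order, from strides
//   opA    = CUBLAS_OP_C;                    // operation, from hermitianT
```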

  • hermitianT(A) fast path: extracts the inner tensor view, passes it directly with CUBLAS_OP_C — genuinely zero-copy (see the sketch after this list).
  • conj(tensor_view) fast path: creates transpose_matrix(input) (a TransposeMatrixOp) and passes it to getCublasSupportedTensor, but GetSupportedTensor assumes any transform op has already completed PreRun and calls in.Data() directly — yielding a null GPU pointer since PreRun was never invoked here.
  • conj(transpose_matrix(A)) and conj(hermitianT(A)) still fall to the else branch (materializing a copy), which is correct and tested.
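
For concreteness, here is a hedged, self-contained sketch of the dispatch idea; the types and trait below are simplified stand-ins for the MatX originals, not the PR's exact code:

```cpp
#include <cublas_v2.h>
#include <type_traits>

// Simplified stand-ins for the MatX types (hypothetical).
template <typename T> struct TensorView { T *data; };
template <typename T> struct HermitianTransOp {
  TensorView<T> inner;
  auto Input() const { return inner; }  // accessor added by this PR
};

template <typename Op> struct is_hermitian_trans_op : std::false_type {};
template <typename T>
struct is_hermitian_trans_op<HermitianTransOp<T>> : std::true_type {};

// Sketch of the WithMatmulOperand idea: peel off hermitianT and record the
// intent as CUBLAS_OP_C instead of materializing a conjugate-transposed copy.
template <typename Op>
auto WithMatmulOperand(const Op &op, cublasOperation_t &cublas_op) {
  if constexpr (is_hermitian_trans_op<Op>::value) {
    cublas_op = CUBLAS_OP_C;  // cuBLAS applies the conjugate transpose
    return op.Input();        // zero-copy: the inner tensor view
  } else {
    cublas_op = CUBLAS_OP_N;  // operand used as-is
    return op;
  }
}
```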

Confidence Score: 3/5

The hermitianT(A) fast path is correct and zero-copy, but the conj(bare tensor_view) branch hands cuBLASLt a null GPU pointer and will crash at runtime.

The is_conj_tensor_view_unary_op_v branch constructs a TransposeMatrixOp inline without calling PreRun, then passes it to getCublasSupportedTensor. GetSupportedTensor assumes transform ops have already run and reads in.Data() directly, which is null here. No test exercises matmul(conj(a), b) with a raw tensor, so the defect goes undetected. The rest of the change — hermitianT dispatch, orderA/B/C separation, cache-key updates — is clean.
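
For reference, a minimal call that would exercise the missing case looks like this (a sketch against the public MatX API; shapes are illustrative):

```cpp
// conj() applied to a bare tensor view, the case the review flags as untested.
auto a = matx::make_tensor<cuda::std::complex<float>>({4, 8});
auto b = matx::make_tensor<cuda::std::complex<float>>({8, 2});
auto c = matx::make_tensor<cuda::std::complex<float>>({4, 2});

// Takes the is_conj_tensor_view_unary_op_v fast path; per the review, this
// currently reaches cuBLASLt with a null data pointer.
(c = matx::matmul(matx::conj(a), b)).run();
```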

include/matx/transforms/matmul/matmul_cuda.h — the is_conj_tensor_view_unary_op_v branch of WithMatmulOperand (lines 1316-1322).

Important Files Changed

  • include/matx/transforms/matmul/matmul_cuda.h: Core change. Separates memory-order tracking (orderA/B/C) from cuBLAS operation flags (opA/opB), and adds the WithMatmulOperand dispatch that handles hermitianT(A) correctly but has a null-pointer bug in the conj(tensor_view) branch due to using an unrun TransposeMatrixOp.
  • include/matx/operators/hermitian.h: Adds an Input() accessor to HermitianTransOp so the inner operand can be extracted without a copy in the matmul fast path.
  • include/matx/operators/unary_operators.h: Adds an Input() accessor to matxUnaryOp to expose the inner operand for type-trait inspection in matmul dispatch.
  • test/00_transform/MatMul.cu: New hermitian-transpose tests cover hermitianT(A/B), conj(transpose_matrix(A/B)), batched hermitianT, and real-type hermitianT; the conj(bare_tensor) path introduced in WithMatmulOperand has no corresponding test.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["matmul_impl(C, A_, B_)"] --> B{"c is col-major?"}
    B -- yes --> C["matmul_impl with transposed args"]
    B -- no --> D["WithMatmulOperand(A_)"]
    D --> E{"can_use_metadata_op?"}
    E -- no --> F["getCublasSupportedTensor(A_) + opA=N"]
    E -- yes --> G{"is_hermitian_trans_op_v?"}
    G -- yes --> H["extract inner tensor\nopA=CUBLAS_OP_C\nzero-copy"]
    G -- no --> I{"is_conj_tensor_view_unary_op_v?"}
    I -- yes --> J["transpose_matrix(input)\nPreRun not called - null ptr\nopA=CUBLAS_OP_C"]
    I -- no --> F
    H --> K["WithMatmulOperand(B_)"]
    F --> K
    J --> K
    K --> L["MatMulCUDAExecPrepared"]
    L --> M["GetGemmParams\norderA/B from strides\nopA/opB from requested"]
    M --> N["cuBLASLt / CUTLASS"]
```

Reviews (2): Last reviewed commit: "fixes for reviews"

Comment thread include/matx/transforms/matmul/matmul_cuda.h
Comment thread include/matx/transforms/matmul/matmul_cuda.h Outdated
@simonbyrne
Collaborator Author

/build

Comment thread include/matx/transforms/matmul/matmul_cuda.h
@coveralls

coveralls commented May 8, 2026

Coverage Status

Coverage is 94.316% when pulling sbyrne/hermitian into main. No base build found for main.

