Skip to content

Support and test mul(C, A, B, alpha, beta) for triangular mats#727

Merged
maleadt merged 9 commits into
mainfrom
ksh/tri
Jun 4, 2026
Merged

Support and test mul(C, A, B, alpha, beta) for triangular mats#727
maleadt merged 9 commits into
mainfrom
ksh/tri

Conversation

@kshyatt
Copy link
Copy Markdown
Member

@kshyatt kshyatt commented Jun 2, 2026

Should hopefully fix #726

@kshyatt kshyatt requested a review from maleadt June 2, 2026 17:29
@kshyatt
Copy link
Copy Markdown
Member Author

kshyatt commented Jun 2, 2026

Ah but needs the TLC for 1.10 and 1.11, a task for tomorrow...

@kshyatt
Copy link
Copy Markdown
Member Author

kshyatt commented Jun 3, 2026

OK, this actually looks ... somewhat ok? Windows fail seems unrelated, CUDA 1.10 is happening elsewhere also and is due to Pkg issues.

maleadt and others added 6 commits June 3, 2026 14:42
The redundant `=true` definitions are dropped (base already short-circuits
the matching direction), and the load-bearing opposite-triangle queries move
from JLArrays to core, keyed on the GPU array parent so every backend gets
them. The matmul kernels now use istriu/istril uniformly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`mul!(C, A, B, α, 0)` with α≠1 reaches the kernel, where `β * C[i,j]`
turned an uninitialized NaN/Inf into NaN. Guard the β term and add a
regression test that initializes C with NaN.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
upperB was never read (B's structure comes from its getindex), and the
1.10 kernel's wA/wB wraps were computed but never passed to the kernel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The istriu/istril overrides were unsound: UpperTriangular(Diagonal(...))
is also lower triangular, so a type-only `istril = false` lies. The
kernels only need the wrapper's structural guarantee, which a plain `isa`
check gives directly and without scalar indexing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The three version-gated tri×tri methods shared a near-identical body and
kernel. Extract _triangular_matmatmul!(C, A, B, α, β) and reduce each to a
thin forwarder (the pre-1.12 ones unpack α/β from the MulAddMul). Net -47
lines; verified on Julia 1.10, 1.11 and 1.12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Existing cases only covered α=1 (scaling is a no-op) or β=0 (the β·C term
drops), so the general α·(A·B) + β·C path was untested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maleadt
Copy link
Copy Markdown
Member

maleadt commented Jun 3, 2026

OpenCL.jl crash is an LLVM bug, fixed in llvm/llvm-project#201417, back-ported as part of JuliaPackaging/Yggdrasil#13884,

@maleadt maleadt merged commit 82acaed into main Jun 4, 2026
33 of 35 checks passed
@maleadt maleadt deleted the ksh/tri branch June 4, 2026 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mul! on AbstractTriangulars fails if beta is not 0

2 participants