
Add output_shapes for AddMM #3262

Open

pHequals7 wants to merge 3 commits into ml-explore:main from pHequals7:fix/addmm-slice-customkernel-output-shapes

Conversation


@pHequals7 pHequals7 commented Mar 16, 2026

Summary

Adds output_shapes() to AddMM, enabling compile(shapeless=True) for models with biased Linear layers.

Change

AddMM::output_shapes returns inputs[0].shape() (the C matrix shape), which is already validated to match the output shape at construction in ops.cpp.

Context

Most transformer models use biased Linear layers, which dispatch through AddMM. Without this, compile(shapeless=True) throws "primitive does not have shape inference implemented". This follows the same pattern as #2601 (Convolution::output_shapes) and #1727 (shapeless SliceUpdate + Broadcast).

Discovered while porting mlx-whisper to Swift — whisper-small has 145 biased Linear layers that all fail in shapeless compile without this.

Files Changed

| File | Lines | Change |
| --- | --- | --- |
| mlx/primitives.h | +1 | Declaration |
| mlx/primitives.cpp | +6 | Implementation |

Previous scope

Originally included Slice and CustomKernel output_shapes — dropped per review feedback. CustomKernel shape inference via a callback function could be a separate discussion/PR.

Enable `compile(shapeless: true)` for models that use:

1. **AddMM** (biased Linear layers): Most transformer models use Linear
   with bias=true, which dispatches through AddMM. Without output_shapes,
   compile(shapeless:true) fails. The fix matches Matmul::output_shapes.

2. **Slice** (array subscripting): Any compiled function that slices
   arrays (e.g., `array[0..<N]`) needs Slice::output_shapes. The
   implementation re-normalizes slice bounds against runtime input shape.
   Limited to constant-dimension slices; variable-dimension slices
   should use take()/DynamicSlice.

3. **CustomKernel** (metalKernel API): Custom Metal kernels created via
   the metalKernel() API can now work inside compile(shapeless:true).
   Output shapes are stored at construction time and returned during
   compile-time shape inference. A -1 sentinel in output shapes triggers
   dynamic computation from input sizes (total_input_size / num_outputs),
   enabling kernels with variable output sizes (e.g., KV cache append).

Discovered while porting mlx-whisper to Swift using mlx-swift. All three
primitives are essential for compiled inference with custom fused kernels.
Comment on lines +348 to +355
std::vector<Shape> AddMM::output_shapes(const std::vector<array>& inputs) {
// out = alpha * (A @ B) + beta * C
// Output shape matches C (inputs[0]), with last dim from B (inputs[2])
auto out_shape = inputs[0].shape();
out_shape.back() = inputs[2].shape(-1);
return {std::move(out_shape)};
}

Member

Why wouldn't the out shape just be the c shape?

From ops.cpp

  if (c.shape() != out_shape) {
    throw std::invalid_argument(
        "[addmm] input c must broadcast to the output shape");
  }

  auto out = array(
      std::move(out_shape),
      out_type,
      std::make_shared<AddMM>(to_stream(s), alpha, beta),
      {a, b, c});

Author

Good catch — you're right, since c.shape() is already validated against out_shape at construction, we can just return it directly. Simplified in 95382f9.

C (inputs[0]) is already validated to match the output shape at
construction in ops.cpp, so we can return its shape directly
instead of recalculating from B's last dimension.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
if (resolved_shapes[i][j] == -1) resolved_shapes[i][j] = per_out;
}
}
}
Member

I think this is not doing what you were thinking it was doing.

resolved_shapes.size() is the number of outputs. total is the sum of number of elements of all inputs. What is per_out supposed to represent?

Say for instance I write a custom kernel that adds two arrays. per_out would be 2 times the input size 🤷‍♂️

The only way to do this properly is to pass a function that computes the output shapes from the input shapes. If this function is passed then shapeless compilation of the custom kernel will be automatically enabled otherwise not.

Author

yeah you're right, the -1 sentinel was a hack that only worked for my specific kv cache concat case. a shape inference function passed to metalKernel() makes way more sense as a general api.

stripped this pr down to just AddMM which is the straightforward one. happy to open a separate issue for the CustomKernel shape inference function if that's useful — or leave it for someone with better context on the compile internals.

Comment on lines +4786 to +4787
// Works for constant-dimension slices; variable-dimension slices
// should use take()/DynamicSlice instead.
Member

Suggested change
// Works for constant-dimension slices; variable-dimension slices
// should use take()/DynamicSlice instead.

Author

done, dropped it. makes sense that constant slices don't need this.

Drop Slice and CustomKernel changes per review feedback:
- Slice::output_shapes unnecessary for constant-dimension slices
- CustomKernel -1 sentinel is not a general solution; proper
  approach is a shape inference function (separate discussion)

Keeping only AddMM::output_shapes which is straightforward —
C's shape is already validated to match the output at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pHequals7 pHequals7 changed the title Add output_shapes for AddMM, Slice, and CustomKernel Add output_shapes for AddMM Mar 19, 2026
