
Fix dimensions of 97_ScaledDotProductAttention #149

Open

arielge wants to merge 1 commit into ScalingIntelligence:main from arielge:main

Conversation

@arielge

@arielge arielge commented Apr 6, 2026

In the context of `torch.nn.functional.scaled_dot_product_attention`, the input tensors have shape `(batch_size, num_heads, sequence_length, head_dim)`, where `head_dim` is the dimension per attention head.

The total model embedding dimension equals `num_heads × head_dim`, so for a model with an embedding dimension of 1024 and 32 heads, the appropriate `head_dim` would be 1024 / 32 = 32.
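A minimal NumPy sketch of this shape convention (using illustrative values of `batch_size = 2` and `sequence_length = 16`, which are not from the PR itself) shows that each per-head input carries `head_dim = 1024 / 32 = 32` features, not the full embedding dimension:

```python
import numpy as np

# Dimensions from the PR discussion: embedding dim 1024 split across 32 heads.
batch_size, num_heads, seq_len, embed_dim = 2, 32, 16, 1024
head_dim = embed_dim // num_heads  # 1024 / 32 = 32

rng = np.random.default_rng(0)
# Inputs to scaled dot-product attention have shape
# (batch_size, num_heads, sequence_length, head_dim).
q = rng.standard_normal((batch_size, num_heads, seq_len, head_dim))
k = rng.standard_normal((batch_size, num_heads, seq_len, head_dim))
v = rng.standard_normal((batch_size, num_heads, seq_len, head_dim))

def scaled_dot_product_attention(q, k, v):
    """NumPy sketch of the math behind F.scaled_dot_product_attention."""
    # Attention scores, scaled by sqrt(head_dim): (..., seq_len, seq_len).
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the last (key) axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (2, 32, 16, 32): the last axis is head_dim, not embed_dim
```

The output keeps the per-head shape; in a full transformer block the heads would then be concatenated back to the 1024-wide embedding.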

