
Fix dimensions of 97_ScaledDotProductAttention #149

Open

arielge wants to merge 1 commit into ScalingIntelligence:main from arielge:main

Conversation

@arielge

@arielge arielge commented Apr 6, 2026

In the context of `torch.nn.functional.scaled_dot_product_attention`, the input tensors have shape `(batch_size, num_heads, sequence_length, head_dim)`, where `head_dim` is the dimension per attention head.

The total model embedding dimension equals `num_heads × head_dim`, so for a model with an embedding dimension of 1024 and 32 heads, the appropriate `head_dim` would be 1024 / 32 = 32.
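A minimal NumPy sketch of this shape convention (using illustrative values of `batch_size = 2` and `sequence_length = 16`, which are not from the PR itself) shows that each per-head input carries `head_dim = 1024 / 32 = 32` features, not the full embedding dimension:

```python
import numpy as np

# Dimensions from the PR discussion: embedding dim 1024 split across 32 heads.
batch_size, num_heads, seq_len, embed_dim = 2, 32, 16, 1024
head_dim = embed_dim // num_heads  # 1024 / 32 = 32

rng = np.random.default_rng(0)
# Inputs to scaled dot-product attention have shape
# (batch_size, num_heads, sequence_length, head_dim).
q = rng.standard_normal((batch_size, num_heads, seq_len, head_dim))
k = rng.standard_normal((batch_size, num_heads, seq_len, head_dim))
v = rng.standard_normal((batch_size, num_heads, seq_len, head_dim))

def scaled_dot_product_attention(q, k, v):
    """NumPy sketch of the math behind F.scaled_dot_product_attention."""
    # Attention scores, scaled by sqrt(head_dim): (..., seq_len, seq_len).
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the last (key) axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (2, 32, 16, 32): the last axis is head_dim, not embed_dim
```

The output keeps the per-head shape; in a full transformer block the heads would then be concatenated back to the 1024-wide embedding.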

