
Add optional norm for lm head vectors #808

Open

klei22 wants to merge 4 commits into ReaLLMASIC:master from klei22:add-optional-norm-for-lm-head-vectors

Conversation

klei22 (Collaborator) commented May 6, 2026

No description provided.


Copilot AI left a comment


Pull request overview

Adds an optional normalization step for LM-head vectors, applied to lm_head.weight before computing logits, to enable experimentation with logit-space stability and geometry; it also introduces a capped hypersphere normalization variant.

Changes:

  • Add CappedHyperSphereNorm and register it in the normalization variant dictionary (a minimal sketch follows this list).
  • Introduce CLI/config plumbing for norm_variant_lm_head and associated hypersphere-style parameters (norm_lm_head_*), and route LM-head logits through a helper that can normalize the head weight.
  • Add an exploration YAML to compare LM-head norm variants (none / hypersphere / RMS / capped hypersphere).
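For readers unfamiliar with the variant, here is a minimal sketch of what a capped hypersphere norm could look like. The class name and the variant dictionary come from this PR; the max_radius parameterization, defaults, and clamping logic below are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class CappedHyperSphereNorm(nn.Module):
        """Sketch: rescale vectors onto a hypersphere with a learnable,
        upper-capped radius. Parameter names here are assumptions."""

        def __init__(self, dim, max_radius=1.0, eps=1e-8):
            super().__init__()
            # dim is accepted for registry-API compatibility; the radius is
            # a single learnable scalar, clamped in forward().
            self.radius = nn.Parameter(torch.ones(1))
            self.max_radius = max_radius
            self.eps = eps

        def forward(self, x):
            # Unit-normalize along the embedding dimension, then scale by a
            # radius that cannot exceed max_radius, bounding logit magnitudes.
            unit = x / (x.norm(dim=-1, keepdim=True) + self.eps)
            return unit * self.radius.clamp(max=self.max_radius)

    # Hypothetical registration alongside the other variants:
    # norm_dictionary["capped_hypersphere"] = CappedHyperSphereNorm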

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • variations/norm_variations.py: Adds CappedHyperSphereNorm and exposes it via norm_dictionary.
  • train_args.py: Adds CLI options for LM-head norm selection/parameters and includes the new norm variant in choices; also changes the default --device from 'cuda' to 'cuda:0'.
  • model.py: Adds an LM-head weight normalization path and uses it when computing logits.
  • gpt_conf.py: Extends GPTConfig with LM-head norm configuration fields.
  • explorations/default_inf_lm_head_norm_comparison.yaml: Provides an experiment matrix to compare LM-head norm variants under default_inf-like settings.
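As a rough illustration of the train_args.py plumbing: the option name --norm_variant_lm_head and the norm_lm_head_* prefix come from this PR, but the specific choices, defaults, and parameter names below are assumptions.

    import argparse

    parser = argparse.ArgumentParser()
    model_group = parser.add_argument_group("model")

    # None (the default) means lm_head.weight is left untouched; the variant
    # names are guesses matching the comparison matrix in this PR.
    model_group.add_argument(
        "--norm_variant_lm_head",
        default=None,
        choices=["hypersphere", "rmsnorm", "capped_hypersphere"],
        help="optional norm applied to lm_head.weight before computing logits",
    )
    # Hypersphere-style parameters share the norm_lm_head_ prefix.
    model_group.add_argument("--norm_lm_head_max_radius", default=1.0, type=float)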


Comment thread: model.py

      if self.config.norm_variant_abs is not None:
          self.transformer['post_abs_norm'] = self.build_norm_from_variant(config, "norm_variant_abs", "norm_abs")
    + if self.config.norm_variant_lm_head is not None:
    +     self.transformer['lm_head_norm'] = self.build_norm_from_variant(config, "norm_variant_lm_head", "norm_lm_head")
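For context, one plausible shape for build_norm_from_variant (the helper name and the norm_dictionary registry appear in this PR; the prefix-based keyword collection below is an assumption):

    def build_norm_from_variant(self, config, variant_field, param_prefix):
        # e.g. config.norm_variant_lm_head == "capped_hypersphere"
        variant = getattr(config, variant_field)
        norm_cls = norm_dictionary[variant]  # registry from variations/norm_variations.py
        # Gather variant parameters sharing the prefix, e.g.
        # config.norm_lm_head_max_radius -> max_radius=1.0
        kwargs = {
            name[len(param_prefix) + 1:]: value
            for name, value in vars(config).items()
            if name.startswith(param_prefix + "_")
        }
        return norm_cls(config.n_embd, **kwargs)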
Comment thread: model.py (on lines +253 to +254)

    return self.transformer.lm_head_norm(lm_head_weight)
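The enclosing helper is not fully quoted above; here is a sketch consistent with the quoted return statement. The method names normalized_lm_head_weight and compute_logits are hypothetical, but the idea of normalizing lm_head.weight before the matmul is from this PR.

    import torch.nn.functional as F

    def normalized_lm_head_weight(self):
        lm_head_weight = self.lm_head.weight
        if self.config.norm_variant_lm_head is None:
            return lm_head_weight  # default path: head weights untouched
        return self.transformer.lm_head_norm(lm_head_weight)

    def compute_logits(self, x):
        # Equivalent to self.lm_head(x) when no norm variant is selected
        # (assuming a bias-free head, as is typical in GPT implementations).
        return F.linear(x, self.normalized_lm_head_weight())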

Comment thread: train_args.py

      # System args
    - training_group.add_argument('--device', default='cuda', type=str)
    + training_group.add_argument('--device', default='cuda:0', type=str)
