Add CappedHyperSphereNorm variation and LM Head norm option #803

Open
klei22 wants to merge 1 commit into ReaLLMASIC:master from klei22:codex/add-optional-norm-for-lm-head-vectors-fpwgak

Conversation

klei22 (Collaborator) commented May 3, 2026

This pull request introduces support for applying optional normalization variants to the lm_head (language modeling head), allowing experimentation with different normalization strategies. It adds configuration, argument parsing, and implementations for several normalization types, including a new CappedHyperSphereNorm. The changes also add an experiment YAML to enable systematic comparison of these variants.

Support for lm_head normalization:

  • Added new configuration options in GPTConfig for specifying normalization variants and parameters for the lm_head, including type, radius, scale, gain, and radius learning (see the sketch after this list).
  • Updated argument parsing in train_args.py to accept the new lm_head normalization options from the command line.
  • Modified model initialization and the forward pass in model.py to build, apply, and use the specified lm_head normalization in all relevant code paths.
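
For orientation, a minimal sketch of what these GPTConfig additions might look like. Only norm_variant_lm_head appears verbatim on this page; the remaining field names below are illustrative placeholders, not the PR's actual names:

from dataclasses import dataclass
from typing import Optional

@dataclass
class GPTConfig:
    n_embd: int = 768
    # ... existing fields elided ...

    # Optional lm_head normalization; None leaves the head untouched.
    norm_variant_lm_head: Optional[str] = None      # e.g. "cappedhyperspherenorm"
    # Placeholder names for the radius / scale / gain / radius-learning knobs:
    lm_head_norm_radius: Optional[float] = None     # e.g. default to sqrt(n_embd) when unset
    lm_head_norm_scale: float = 1.0
    lm_head_norm_gain: float = 1.0
    lm_head_norm_radius_learning: bool = False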

New normalization variant:

  • Implemented CappedHyperSphereNorm, a normalization layer that projects vectors back onto a hypersphere only when they exceed its radius, and registered it in the normalization dictionary.
  • Added cappedhyperspherenorm to the list of valid normalization choices in argument parsing.

Experimentation and configuration:

  • Added a new YAML experiment config (default_inf_lm_head_norm_comparison.yaml) to systematically compare different lm_head normalization strategies, including the new variant, across multiple head dimensions and other settings.

Copilot AI left a comment

Pull request overview

Adds an optional normalization step to the model's lm_head weights/logit computation so that different normalization strategies can be compared, including a new CappedHyperSphereNorm. Also provides an exploration YAML to compare variants under default_inf-style settings.

Changes:

  • Added norm_variant_lm_head and associated radius/scale/gain/radius_learning config + CLI args.
  • Implemented and registered CappedHyperSphereNorm in the norm variations registry.
  • Updated model forward paths to route lm_head logits computation through a normalization-aware helper; added an exploration config for systematic comparisons.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • variations/norm_variations.py: Adds CappedHyperSphereNorm and registers it in norm_dictionary.
  • train_args.py: Adds CLI args for selecting and configuring lm_head norm variants and parameters.
  • model.py: Builds the optional lm_head_norm module and applies it during logits computation across forward paths.
  • gpt_conf.py: Adds GPTConfig fields for lm_head normalization selection and parameters.
  • explorations/default_inf_lm_head_norm_comparison.yaml: New experiment config to sweep lm_head norm variants and head dims.


Comment thread: model.py

if self.config.norm_variant_abs is not None:
    self.transformer['post_abs_norm'] = self.build_norm_from_variant(config, "norm_variant_abs", "norm_abs")
if self.config.norm_variant_lm_head is not None:
    self.transformer['lm_head_norm'] = self.build_norm_from_variant(config, "norm_variant_lm_head", "norm_lm_head")
Comment thread: model.py
Comment on lines +255 to +258

def compute_lm_head_logits(self, x, lm_head_module):
    # Run the lm_head weight through the optional norm before the logit projection.
    weight = self.apply_lm_head_norm(lm_head_module.weight)
    return F.linear(x, weight, lm_head_module.bias)
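
The apply_lm_head_norm helper itself is not shown in this thread. A minimal sketch of what it plausibly does, assuming it simply routes the weight through the optional lm_head_norm module built during init (the helper name is taken from the diff; the fallback behavior is inferred, not confirmed):

def apply_lm_head_norm(self, weight):
    # Hypothetical reconstruction: pass the lm_head weight rows through the
    # configured norm module, or return them unchanged when no
    # norm_variant_lm_head was set.
    if 'lm_head_norm' in self.transformer:
        return self.transformer['lm_head_norm'](weight)
    return weight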

Comment on lines +207 to +215

class CappedHyperSphereNorm(nn.Module):
    """Project vectors onto a sqrt(n_embd) hypersphere only when outside the radius."""

    def __init__(self, config):
        super().__init__()
        self.radius = math.sqrt(config.n_embd)

    def forward(self, x):
        norms = x.norm(2, dim=-1, keepdim=True)
        # Rescale only vectors whose norm exceeds the radius; leave the rest unchanged.
        scale = torch.where(norms > self.radius, self.radius / (norms + 1e-8), torch.ones_like(norms))
        return x * scale
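
As a quick illustration of the capping behavior, a standalone usage sketch (the Cfg stub, tensor shapes, and scaling factors are invented for the demo; it reuses CappedHyperSphereNorm exactly as defined above):

import math
import torch
import torch.nn as nn

class Cfg:  # hypothetical stub; the real GPTConfig carries many more fields
    n_embd = 64

norm = CappedHyperSphereNorm(Cfg())
x = torch.randn(2, 3, Cfg.n_embd)

big = norm(100.0 * x)    # norms of roughly 800 get capped back to sqrt(64) = 8
small = norm(0.01 * x)   # norms of roughly 0.08 stay inside the sphere

print(big.norm(dim=-1).max())        # ~8.0
print(torch.equal(small, 0.01 * x))  # True: vectors inside the radius pass through unscaled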