Add CappedHyperSphereNorm variation and LM Head norm option #803
Open
klei22 wants to merge 1 commit into
Conversation
Pull request overview
Adds an optional normalization step for the model’s lm_head weights/logit computation to support experimentation with different normalization strategies, including a new CappedHyperSphereNorm, and provides an exploration YAML to compare variants under default_inf-style settings.
Changes:
- Added `norm_variant_lm_head` and associated radius/scale/gain/radius_learning config + CLI args.
- Implemented and registered `CappedHyperSphereNorm` in the norm variations registry.
- Updated model forward paths to route `lm_head` logits computation through a normalization-aware helper; added an exploration config for systematic comparisons.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `variations/norm_variations.py` | Adds `CappedHyperSphereNorm` and registers it in `norm_dictionary`. |
| `train_args.py` | Adds CLI args for selecting/configuring lm_head norm variants and parameters. |
| `model.py` | Builds optional `lm_head_norm` module and applies it during logits computation across forward paths. |
| `gpt_conf.py` | Adds `GPTConfig` fields for lm_head normalization selection and parameters. |
| `explorations/default_inf_lm_head_norm_comparison.yaml` | New experiment config to sweep lm_head norm variants and head dims. |
```python
if self.config.norm_variant_abs is not None:
    self.transformer['post_abs_norm'] = self.build_norm_from_variant(config, "norm_variant_abs", "norm_abs")
if self.config.norm_variant_lm_head is not None:
    self.transformer['lm_head_norm'] = self.build_norm_from_variant(config, "norm_variant_lm_head", "norm_lm_head")
```
Comment on lines +255 to +258
```python
def compute_lm_head_logits(self, x, lm_head_module):
    weight = self.apply_lm_head_norm(lm_head_module.weight)
    return F.linear(x, weight, lm_head_module.bias)
```
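The helper defers the weight transform to `apply_lm_head_norm`, whose body is not shown in this excerpt. A minimal sketch of what such a method could look like, assuming it falls back to the raw weight when no variant is configured (the fallback behavior is an assumption, not visible in the diff):

```python
def apply_lm_head_norm(self, weight):
    # Route the lm_head weight through the configured norm module when present;
    # otherwise return the weight unchanged (assumed pass-through behavior).
    if 'lm_head_norm' in self.transformer:
        return self.transformer['lm_head_norm'](weight)
    return weight
```

Normalizing the weight rather than the activations keeps every forward path on the same `F.linear` call, so a variant can be switched on or off purely through config.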
Comment on lines +207 to +215
| """Project vectors onto a sqrt(n_embd) hypersphere only when outside the radius.""" | ||
|
|
||
| def __init__(self, config): | ||
| super().__init__() | ||
| self.radius = math.sqrt(config.n_embd) | ||
|
|
||
| def forward(self, x): | ||
| norms = x.norm(2, dim=-1, keepdim=True) | ||
| scale = torch.where(norms > self.radius, self.radius / (norms + 1e-8), torch.ones_like(norms)) |
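The excerpt cuts off before the return statement. A self-contained sketch of the whole layer, under the assumption that the forward simply rescales by the computed factor (the natural completion, though not confirmed by the visible diff):

```python
import math
import torch
import torch.nn as nn

class CappedHyperSphereNorm(nn.Module):
    """Project vectors onto a sqrt(n_embd) hypersphere only when outside the radius."""

    def __init__(self, config):
        super().__init__()
        self.radius = math.sqrt(config.n_embd)

    def forward(self, x):
        # L2 norm of each vector along the embedding dimension.
        norms = x.norm(2, dim=-1, keepdim=True)
        # Shrink vectors whose norm exceeds the radius back onto the sphere;
        # leave vectors inside the ball untouched.
        scale = torch.where(norms > self.radius, self.radius / (norms + 1e-8), torch.ones_like(norms))
        return x * scale


# Quick check with a stand-in config (only n_embd is used here):
cfg = type("Cfg", (), {"n_embd": 64})
norm = CappedHyperSphereNorm(cfg)
x = torch.randn(2, 64) * 10           # row norms will typically exceed sqrt(64) = 8
print(norm(x).norm(dim=-1))           # capped rows come back with norm ≈ 8
```

Note that the cap makes the layer the identity for vectors already inside the ball and idempotent overall, unlike an unconditional hypersphere projection, which would also inflate small vectors.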
This pull request introduces support for applying optional normalization variants to the `lm_head` (language modeling head) in the model, allowing experimentation with different normalization strategies. It adds configuration, argument parsing, and implementation for several normalization types, including a new `CappedHyperSphereNorm`. The changes also update the experiment YAML to enable systematic comparison of these variants.

**Support for lm_head normalization:**
- Extended `GPTConfig` with fields for specifying normalization variants and parameters for the `lm_head`, including type, radius, scale, gain, and radius learning.
- Updated `train_args.py` to accept the new `lm_head` normalization options from the command line (a hypothetical sketch of these flags follows this list). [1] [2]
- Updated `model.py` to build, apply, and use the specified `lm_head` normalization in all relevant code paths. [1] [2] [3] [4] [5] [6]
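A sketch of what the new command-line surface might look like; `--norm_variant_lm_head` is named in the PR, while the radius/scale/gain/radius-learning flag spellings below are guesses based on the described config fields:

```python
import argparse

parser = argparse.ArgumentParser()
# Variant selector; the PR adds "cappedhyperspherenorm" to the valid choices.
parser.add_argument("--norm_variant_lm_head", type=str, default=None)
# The parameter flags below are hypothetical spellings of the described
# radius/scale/gain/radius-learning options.
parser.add_argument("--lm_head_norm_radius", type=float, default=None)
parser.add_argument("--lm_head_norm_scale", type=float, default=None)
parser.add_argument("--lm_head_norm_gain", type=float, default=None)
parser.add_argument("--lm_head_norm_radius_learning", action="store_true")
```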
**New normalization variant:**

- Implemented `CappedHyperSphereNorm`, a normalization layer that projects vectors onto a hypersphere only if they exceed a certain radius, and registered it in the normalization dictionary (sketched below). [1] [2]
- Added `cappedhyperspherenorm` to the list of valid normalization choices in argument parsing.
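Registration is presumably a one-line mapping from the choice string to the class, in line with the `norm_dictionary` named in the file summary above (the exact form is an assumption):

```python
# variations/norm_variations.py — map the CLI choice string to the class.
norm_dictionary["cappedhyperspherenorm"] = CappedHyperSphereNorm
```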
**Experimentation and configuration:**

- Added a new experiment configuration (`explorations/default_inf_lm_head_norm_comparison.yaml`) to systematically compare different `lm_head` normalization strategies, including the new variant, across multiple head dimensions and other settings (a rough sketch of such a sweep follows).
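Purely for illustration, a sweep of this shape could be written as below; the repo's actual exploration-YAML schema and parameter spellings are not visible in this PR view, so treat every key here as an assumption:

```yaml
# Hypothetical sweep sketch — schema and key names are assumptions.
parameter_groups:
  - norm_variant_lm_head: ["cappedhyperspherenorm", null]   # new variant vs. no lm_head norm
    n_head: [4, 8, 16]                                      # sweep head dimensions
    lm_head_norm_radius_learning: [false, true]
```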