ci: Update transformers to latest version 5.8.1 #2223
Open
svcnvidia-nemo-ci wants to merge 6 commits into
Signed-off-by: nemo-ci Bot <nemo-ci-bot@nvidia.com>
/ok to test 75f3ed7
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
/ok to test d1afd77
…_completion
transformers 5.5.0 still honored `padding_side: "right"` baked into the
tokenizer's saved tokenizer_config.json, but 5.8.1 ignores that field and
uses the LlamaTokenizer class default ("left"). The label-masking and
attention-mask logic in _package_tokenized_example assumes right padding,
so test_format_prompt_completion_options[128-max_length-True] fails on
5.8.1 (labels[-1] is the answer/EOS token instead of -100).
Temporarily set tokenizer.padding_side = "right" around the tokenize call
and restore it afterward, so the function is robust regardless of the
tokenizer's saved or class-default padding_side.
Signed-off-by: Abhishree <abhishreetm@gmail.com>
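The save-and-restore pattern described in this commit can be sketched as a small context manager. This is a minimal illustration, not the code from the PR: `padding_side` and `StubTokenizer` are hypothetical names, and the stub only models the `padding_side` attribute of a real Hugging Face tokenizer so the example runs without `transformers`.

```python
from contextlib import contextmanager


@contextmanager
def padding_side(tokenizer, side):
    """Hypothetical helper: temporarily set tokenizer.padding_side, then restore it."""
    previous = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        # Restore even if tokenization raises, so the caller's tokenizer is unchanged.
        tokenizer.padding_side = previous


class StubTokenizer:
    """Hypothetical stand-in for an HF tokenizer; only padding behavior is modeled."""

    def __init__(self, padding_side="left", pad_id=0):
        self.padding_side = padding_side
        self.pad_id = pad_id

    def pad(self, ids, length):
        pad = [self.pad_id] * (length - len(ids))
        return ids + pad if self.padding_side == "right" else pad + ids


# A tokenizer whose (class) default is left padding, as with LlamaTokenizer in 5.8.1.
tok = StubTokenizer(padding_side="left")
with padding_side(tok, "right"):
    padded = tok.pad([5, 6, 7], 5)  # pad ids trail the real tokens
```

With right padding the trailing positions are pad tokens, so label masking that assumes right padding sets `labels[-1]` to -100; with left padding the last position holds a real token instead, which is the failure the commit describes.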
… v5 .float() bug

transformers 5.8.1 `PreTrainedModel._init_weights` for `nn.Linear` does `init.normal_(module.weight.float(), ...)`. When `module.weight` is bfloat16 (the cross-encoder loads with torch_dtype=bfloat16), `.float()` returns a *new* fp32 tensor and the init writes into that copy, leaving the original weight as whatever uninitialized garbage `to_empty()` left behind during `_move_missing_keys_from_meta_to_device`.

For LlamaBidirectionalForSequenceClassification, `score.weight` is the only key missing from a backbone-only checkpoint, so it ended up with garbage values (max abs ~1e13). The param L2 norm was logged as NaN, and the cross-encoder forward pass produced NaN loss from step 0, failing L2_Retrieval.

Override `_init_weights` to initialize the score head directly in place using the original tensor (no `.float()`), then defer to super() for the backbone modules, which init correctly on fp32 anyway.

Signed-off-by: Abhishree <abhishreetm@gmail.com>
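The `.float()` pitfall above can be reproduced in isolation with plain PyTorch. This is a sketch, not the transformers or PR code: `torch.nn.init.normal_` stands in for the `init.normal_` call quoted in the commit, and zeroing the weight is an assumed stand-in for the garbage left by `to_empty()`.

```python
import torch
from torch import nn


def demo():
    torch.manual_seed(0)
    with torch.no_grad():
        # bf16 layer, like the cross-encoder's score head.
        lin = nn.Linear(4, 2, dtype=torch.bfloat16)
        lin.weight.zero_()  # stand-in for uninitialized memory

        # Buggy pattern: .float() on a bf16 tensor allocates a NEW fp32 tensor,
        # so the in-place init writes into that copy, not into lin.weight.
        nn.init.normal_(lin.weight.float(), mean=0.0, std=0.02)
        buggy_untouched = bool(torch.all(lin.weight == 0))

        # Fixed pattern: initialize the original parameter in place.
        nn.init.normal_(lin.weight, mean=0.0, std=0.02)
        fixed_written = bool(torch.any(lin.weight != 0))

        # For an fp32 weight, .float() is a no-op that shares storage,
        # so the buggy pattern happens to work there.
        lin32 = nn.Linear(4, 2)
        fp32_aliases = lin32.weight.float().data_ptr() == lin32.weight.data_ptr()

    return buggy_untouched, fixed_written, fp32_aliases
```

The first flag shows the bf16 weight is untouched by the init on the copy; the last one shows why the same call is harmless on fp32 modules.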
…VisionModel

transformers 5.8 flattened SiglipVisionModel. In v5.5 it wrapped a SiglipVisionTransformer as `self.vision_model`, so a Gemma3 vision_tower's checkpoint and in-memory FQNs were `vision_tower.vision_model.embeddings.*`. In 5.8, SiglipVisionModel exposes `embeddings`/`encoder`/`post_layernorm` directly, so in-memory FQNs are now `model.vision_tower.embeddings.*`, while HF Gemma3 checkpoints saved before this flip still ship the legacy `vision_tower.vision_model.*` keys.

The Gemma3 VLM PEFT roundtrip failure (`test_hf_peft_checkpoint` reporting `src_norm=0.0000` on every vision_tower param and a NaN restored loss) happened because the v5.5-era key mapping in `_VLM_KEY_MAPPINGS["gemma3"]` only renamed `vision_tower.` -> `model.vision_tower.`. That produced `model.vision_tower.vision_model.*`, which no longer matches the in-memory model, leaving vision_tower uninitialized (zeros from `to_empty()` -> NaN once forward-pass arithmetic touches them).

Add a more specific renaming rule that strips the legacy `vision_model.` segment for gemma3 (`vision_tower.vision_model.X` -> `model.vision_tower.X`), ordered before the generic `vision_tower.` rule so it wins under `_get_key_renaming_mapping`'s first-match semantics. Update the three Gemma3 VLM checkpoint roundtrip tests (test_dcp_vlm.py, test_hf_sharded_vlm.py, test_hf_consolidated_vlm.py) to expect the new flat keys for the same tensors.

Signed-off-by: Abhishree <abhishreetm@gmail.com>
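Why rule order matters under first-match semantics can be shown with a small standalone sketch. It assumes the rules are regex prefix patterns applied in order, stopping at the first match; `GEMMA3_VISION_RULES` and `rename_key` are hypothetical names and do not reproduce the actual `_VLM_KEY_MAPPINGS` or `_get_key_renaming_mapping` code, and only the two vision_tower rules are included.

```python
import re

# Hypothetical ordered rules: the specific legacy rule must come first.
GEMMA3_VISION_RULES = [
    (r"^vision_tower\.vision_model\.", "model.vision_tower."),  # legacy v5.5 keys
    (r"^vision_tower\.", "model.vision_tower."),                # generic rule
]


def rename_key(key, rules):
    """Apply the first rule whose pattern matches; leave the key unchanged otherwise."""
    for pattern, repl in rules:
        new_key, n = re.subn(pattern, repl, key, count=1)
        if n:
            return new_key
    return key


legacy = "vision_tower.vision_model.embeddings.patch_embedding.weight"
fixed = rename_key(legacy, GEMMA3_VISION_RULES)
# With the generic rule first, the legacy segment survives (the old bug).
buggy = rename_key(legacy, list(reversed(GEMMA3_VISION_RULES)))
```

With the specific rule first, the legacy key maps onto the flattened in-memory FQN; reversing the order reproduces the `model.vision_tower.vision_model.*` keys that no longer match the model.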
/ok to test bcf56e4
beep boop 🤖: Updating transformers to latest version on pypi