ci: Update transformers to latest version 5.8.1 #2223

Open
svcnvidia-nemo-ci wants to merge 6 commits into main from transformers_bump_5.8.1

@svcnvidia-nemo-ci
Contributor

beep boop 🤖: Updating transformers to the latest version on PyPI

Signed-off-by: nemo-ci Bot <nemo-ci-bot@nvidia.com>
svcnvidia-nemo-ci requested a review from a team as a code owner on May 13, 2026 08:47
@copy-pr-bot

copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@akoumpa
Contributor

akoumpa commented May 15, 2026

/ok to test 75f3ed7

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
@thomasdhc
Contributor

/ok to test d1afd77

athitten added 3 commits May 19, 2026 18:52
…_completion

transformers 5.5.0 still honored `padding_side: "right"` baked into the
tokenizer's saved tokenizer_config.json, but 5.8.1 ignores that field and
uses the LlamaTokenizer class default ("left"). The label-masking and
attention-mask logic in _package_tokenized_example assumes right padding,
so test_format_prompt_completion_options[128-max_length-True] fails on
5.8.1 (labels[-1] is the answer/EOS token instead of -100).

Temporarily set tokenizer.padding_side = "right" around the tokenize call
and restore it afterward, so the function is robust regardless of the
tokenizer's saved or class-default padding_side.
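
The set-and-restore pattern described above could be sketched as a small context manager. This is only an illustration under stated assumptions: `force_padding_side`, `ToyTokenizer`, and `labels_for` are hypothetical names invented here, not the code in this PR, and the toy tokenizer merely mimics an object with a `padding_side` attribute.

```python
from contextlib import contextmanager


@contextmanager
def force_padding_side(tokenizer, side="right"):
    """Temporarily override tokenizer.padding_side, restoring it on exit."""
    previous = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        # Restore even if tokenization raises.
        tokenizer.padding_side = previous


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer whose default is left padding."""

    pad_token_id = 0

    def __init__(self, padding_side="left"):
        self.padding_side = padding_side

    def pad(self, ids, max_length):
        padding = [self.pad_token_id] * (max_length - len(ids))
        if self.padding_side == "right":
            return ids + padding
        return padding + ids


def labels_for(padded_ids, pad_token_id):
    """Mask pad positions with -100 so they are ignored by the loss."""
    return [-100 if tok == pad_token_id else tok for tok in padded_ids]


tok = ToyTokenizer(padding_side="left")
with force_padding_side(tok, "right"):
    padded = tok.pad([5, 6, 7], max_length=5)
labels = labels_for(padded, tok.pad_token_id)
```

With right padding forced, the trailing positions are pads, so `labels[-1]` is `-100`, and `tok.padding_side` is back to `"left"` once the block exits.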

Signed-off-by: Abhishree <abhishreetm@gmail.com>
… v5 .float() bug

transformers 5.8.1 `PreTrainedModel._init_weights` for `nn.Linear` does
`init.normal_(module.weight.float(), ...)`. When `module.weight` is
bfloat16 (cross-encoder loads with torch_dtype=bfloat16), `.float()`
returns a *new* fp32 tensor and the init writes into that copy, leaving
the original weight as whatever uninitialized garbage `to_empty()` left
behind during `_move_missing_keys_from_meta_to_device`.

For LlamaBidirectionalForSequenceClassification, `score.weight` is the
only key missing from a backbone-only checkpoint, so it ended up with
garbage values (max abs ~1e13). Param L2 norm logged as NaN and the
cross-encoder forward produced NaN loss from step 0, failing L2_Retrieval.

Override `_init_weights` to init the score head directly in-place using
the original tensor (no `.float()`), then defer to super() for the
backbone modules which init correctly on fp32 anyway.
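
The copy-versus-in-place behavior behind this bug can be reproduced in a few lines of PyTorch. The sketch below is not the override added in this PR: the toy `nn.Linear`, the sentinel value `7.0` (standing in for the uninitialized memory left by `to_empty()`), and the variable names are all assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A bfloat16 layer whose weight holds a sentinel instead of real garbage.
linear = nn.Linear(4, 2, bias=False).to(torch.bfloat16)
with torch.no_grad():
    linear.weight.fill_(7.0)

# Buggy pattern: .float() on a bf16 tensor allocates a new fp32 tensor,
# so the in-place init writes into the copy and the weight is untouched.
nn.init.normal_(linear.weight.float(), mean=0.0, std=0.02)
still_sentinel = bool((linear.weight == 7.0).all())

# Fixed pattern: initialize the original parameter directly, in place.
nn.init.normal_(linear.weight, mean=0.0, std=0.02)
initialized = bool((linear.weight != 7.0).all())

# On fp32 weights .float() returns the same storage, so the buggy
# pattern happens to work there.
w32 = torch.zeros(2, 4, dtype=torch.float32)
fp32_is_same_storage = w32.float().data_ptr() == w32.data_ptr()
```

Here `still_sentinel` shows the weight surviving the buggy init unchanged, while `fp32_is_same_storage` illustrates why fp32 modules are unaffected.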

Signed-off-by: Abhishree <abhishreetm@gmail.com>
…VisionModel

transformers 5.8 flattened SiglipVisionModel: in v5.5 it wrapped a
SiglipVisionTransformer as `self.vision_model`, so a Gemma3 vision_tower's
checkpoint and in-memory FQNs were `vision_tower.vision_model.embeddings.*`.
In 5.8 SiglipVisionModel exposes `embeddings/encoder/post_layernorm`
directly, so in-memory FQNs are now `model.vision_tower.embeddings.*` while
HF gemma3 checkpoints saved before this flip still ship the legacy
`vision_tower.vision_model.*` keys.

The Gemma3 VLM PEFT roundtrip failure (`test_hf_peft_checkpoint` reporting
`src_norm=0.0000` on every vision_tower param and NaN restored loss) was
because the v5.5-era key mapping in `_VLM_KEY_MAPPINGS["gemma3"]` only
renamed `vision_tower.` -> `model.vision_tower.`, producing
`model.vision_tower.vision_model.*` which no longer matches the in-memory
model, leaving vision_tower uninitialized (zeros from `to_empty()` -> NaN
once forward-pass arithmetic touches them).

Add a more specific renaming rule that strips the legacy `vision_model.`
segment for gemma3 (`vision_tower.vision_model.X` -> `model.vision_tower.X`)
ordered before the generic `vision_tower.` rule so it wins under
`_get_key_renaming_mapping`'s first-match semantics.
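
Why the rule order matters under first-match semantics can be shown with a minimal sketch. The `rename_key` helper and the rule lists below are hypothetical; they only imitate an ordered list of regex renames and are not the actual `_VLM_KEY_MAPPINGS` structure or the `_get_key_renaming_mapping` implementation.

```python
import re

# Ordered (pattern, replacement) rules: the specific rule comes first.
SPECIFIC_FIRST = [
    (r"^vision_tower\.vision_model\.", "model.vision_tower."),
    (r"^vision_tower\.", "model.vision_tower."),
]

# Same rules with the generic one first, to show what goes wrong.
GENERIC_FIRST = list(reversed(SPECIFIC_FIRST))


def rename_key(key, rules):
    """Apply the first rule whose pattern matches; leave other keys alone."""
    for pattern, replacement in rules:
        new_key, count = re.subn(pattern, replacement, key, count=1)
        if count:
            return new_key
    return key


legacy = "vision_tower.vision_model.embeddings.position_embedding.weight"
flat = rename_key(legacy, SPECIFIC_FIRST)
stale = rename_key(legacy, GENERIC_FIRST)
untouched = rename_key("language_model.embed_tokens.weight", SPECIFIC_FIRST)
```

With the specific rule first, `flat` drops the `vision_model.` segment; with the generic rule first, `stale` keeps it and would not match the flattened in-memory names.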

Update the three Gemma3 VLM checkpoint roundtrip tests
(test_dcp_vlm.py, test_hf_sharded_vlm.py, test_hf_consolidated_vlm.py) to
expect the new flat keys for the same tensors.

Signed-off-by: Abhishree <abhishreetm@gmail.com>
@athitten
Contributor

/ok to test bcf56e4

4 participants