ci: Update transformers to latest version 5.8.1 by svcnvidia-nemo-ci · Pull Request #2223 · NVIDIA-NeMo/Automodel

svcnvidia-nemo-ci · 2026-05-13T08:47:30Z

beep boop 🤖: Updating transformers to latest version on pypi

Signed-off-by: nemo-ci Bot <nemo-ci-bot@nvidia.com>

copy-pr-bot · 2026-05-13T08:47:33Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

akoumpa · 2026-05-15T16:57:45Z

/ok to test 75f3ed7

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>

thomasdhc · 2026-05-19T18:03:11Z

/ok to test d1afd77

…_completion

transformers 5.5.0 still honored `padding_side: "right"` baked into the tokenizer's saved tokenizer_config.json, but 5.8.1 ignores that field and uses the LlamaTokenizer class default ("left"). The label-masking and attention-mask logic in _package_tokenized_example assumes right padding, so test_format_prompt_completion_options[128-max_length-True] fails on 5.8.1 (labels[-1] is the answer/EOS token instead of -100).

Temporarily set tokenizer.padding_side = "right" around the tokenize call and restore it afterward, so the function is robust regardless of the tokenizer's saved or class-default padding_side.

Signed-off-by: Abhishree <abhishreetm@gmail.com>
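The set-and-restore pattern described in this commit message can be sketched as a small context manager. This is only an illustration, not the code from the commit: `force_padding_side` is a hypothetical helper name, and a `SimpleNamespace` stands in for a real tokenizer so the snippet runs without downloading a model.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def force_padding_side(tokenizer, side="right"):
    """Temporarily set ``tokenizer.padding_side`` and restore it on exit.

    Hypothetical helper; the actual change may be written inline instead.
    """
    original = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        # Restore the previous value even if tokenization raises.
        tokenizer.padding_side = original


# Stand-in for a tokenizer whose effective default is "left".
tok = SimpleNamespace(padding_side="left")

with force_padding_side(tok, "right"):
    # The tokenize call that assumes right padding would go here.
    side_during = tok.padding_side

side_after = tok.padding_side
```

Restoring in a `finally` block keeps the override from leaking to other callers that share the same tokenizer object.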

… v5 .float() bug

transformers 5.8.1 `PreTrainedModel._init_weights` for `nn.Linear` does `init.normal_(module.weight.float(), ...)`. When `module.weight` is bfloat16 (cross-encoder loads with torch_dtype=bfloat16), `.float()` returns a *new* fp32 tensor and the init writes into that copy, leaving the original weight as whatever uninitialized garbage `to_empty()` left behind during `_move_missing_keys_from_meta_to_device`.

For LlamaBidirectionalForSequenceClassification, `score.weight` is the only key missing from a backbone-only checkpoint, so it ended up with garbage values (max abs ~1e13). Param L2 norm logged as NaN and the cross-encoder forward produced NaN loss from step 0, failing L2_Retrieval.

Override `_init_weights` to init the score head directly in-place using the original tensor (no `.float()`), then defer to super() for the backbone modules which init correctly on fp32 anyway.

Signed-off-by: Abhishree <abhishreetm@gmail.com>
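The copy-versus-in-place behavior described above can be reproduced with plain PyTorch, independent of transformers. The following is a minimal sketch: the tensor shape, the sentinel value, and the `std=0.02` setting are illustrative assumptions, not values taken from the commit.

```python
import torch
from torch import nn

torch.manual_seed(0)

# bfloat16 weight filled with a large sentinel, standing in for the
# uninitialized memory left behind by to_empty().
w_bf16 = torch.full((4, 8), 1e13, dtype=torch.bfloat16)

# On a bfloat16 tensor, .float() allocates a new fp32 tensor ...
is_copy = w_bf16.float().data_ptr() != w_bf16.data_ptr()

# ... so this init writes into the temporary copy and the original
# weight still holds the sentinel afterwards.
nn.init.normal_(w_bf16.float(), mean=0.0, std=0.02)
still_sentinel = bool((w_bf16 == w_bf16[0, 0]).all())

# Initializing the original tensor directly updates it in place.
nn.init.normal_(w_bf16, mean=0.0, std=0.02)
max_abs_after = w_bf16.abs().max().item()

# On an fp32 tensor, .float() returns the same storage, which is why
# modules that are already fp32 are unaffected.
w_fp32 = torch.empty(4, 8, dtype=torch.float32)
shares_storage = w_fp32.float().data_ptr() == w_fp32.data_ptr()
```

This is why the fix initializes the score head on the original parameter rather than on the result of `.float()`.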

…VisionModel

transformers 5.8 flattened SiglipVisionModel: in v5.5 it wrapped a SiglipVisionTransformer as `self.vision_model`, so a Gemma3 vision_tower's checkpoint and in-memory FQNs were `vision_tower.vision_model.embeddings.*`. In 5.8 SiglipVisionModel exposes `embeddings/encoder/post_layernorm` directly, so in-memory FQNs are now `model.vision_tower.embeddings.*` while HF gemma3 checkpoints saved before this flip still ship the legacy `vision_tower.vision_model.*` keys.

The Gemma3 VLM PEFT roundtrip failure (`test_hf_peft_checkpoint` reporting `src_norm=0.0000` on every vision_tower param and NaN restored loss) was because the v5.5-era key mapping in `_VLM_KEY_MAPPINGS["gemma3"]` only renamed `vision_tower.` -> `model.vision_tower.`, producing `model.vision_tower.vision_model.*` which no longer matches the in-memory model, leaving vision_tower uninitialized (zeros from `to_empty()` -> NaN once forward-pass arithmetic touches them).

Add a more specific renaming rule that strips the legacy `vision_model.` segment for gemma3 (`vision_tower.vision_model.X` -> `model.vision_tower.X`) ordered before the generic `vision_tower.` rule so it wins under `_get_key_renaming_mapping`'s first-match semantics.

Update the three Gemma3 VLM checkpoint roundtrip tests (test_dcp_vlm.py, test_hf_sharded_vlm.py, test_hf_consolidated_vlm.py) to expect the new flat keys for the same tensors.

Signed-off-by: Abhishree <abhishreetm@gmail.com>
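The effect of rule ordering under first-match semantics can be sketched with a small stand-alone renamer. This is not the `_get_key_renaming_mapping` implementation or the real contents of `_VLM_KEY_MAPPINGS`; the `rename_key` function, the rule lists, and the example parameter name are assumptions made for illustration.

```python
import re

# Hypothetical ordered rules as (pattern, replacement) pairs.
# The specific legacy rule comes before the generic one.
FIXED_RULES = [
    (r"^vision_tower\.vision_model\.", "model.vision_tower."),
    (r"^vision_tower\.", "model.vision_tower."),
]

# The older mapping, with only the generic rule.
OLD_RULES = [
    (r"^vision_tower\.", "model.vision_tower."),
]


def rename_key(key, rules):
    """Apply the first rule whose pattern matches; leave other keys unchanged."""
    for pattern, replacement in rules:
        new_key, n_subs = re.subn(pattern, replacement, key, count=1)
        if n_subs:
            return new_key
    return key


# Illustrative checkpoint key in the legacy (pre-flattening) layout.
legacy_key = "vision_tower.vision_model.embeddings.patch_embedding.weight"

old_result = rename_key(legacy_key, OLD_RULES)
fixed_result = rename_key(legacy_key, FIXED_RULES)
```

With only the generic rule the legacy `vision_model.` segment survives and the key no longer matches the flattened model; with the specific rule listed first, the same key maps onto the flat name.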

athitten · 2026-05-20T17:41:04Z

/ok to test bcf56e4

Update to latest transformers version 5.8.1

cef47a7

Signed-off-by: nemo-ci Bot <nemo-ci-bot@nvidia.com>

svcnvidia-nemo-ci requested a review from a team as a code owner May 13, 2026 08:47

Merge branch 'main' into transformers_bump_5.8.1

75f3ed7

copy-pr-bot Bot temporarily deployed to test May 15, 2026 16:58 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 15, 2026 16:58 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 16:58 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 17:00 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 17:06 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 15, 2026 17:13 Inactive

fix: revert changes not required

d1afd77

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>

copy-pr-bot Bot temporarily deployed to nemo-ci May 19, 2026 18:05 Inactive

copy-pr-bot Bot temporarily deployed to test May 19, 2026 18:05 Inactive

copy-pr-bot Bot temporarily deployed to public May 19, 2026 18:06 Inactive

copy-pr-bot Bot temporarily deployed to public May 19, 2026 18:08 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 19, 2026 18:22 Inactive

athitten added 3 commits May 19, 2026 18:52

athitten requested review from HuiyingLi, ZhiyuLi-Nvidia, adil-a, akoumpa, athitten, hemildesai, pthombre and zyzhou5 as code owners May 20, 2026 17:40

copy-pr-bot Bot temporarily deployed to nemo-ci May 20, 2026 17:41 Inactive

copy-pr-bot Bot temporarily deployed to test May 20, 2026 17:41 Inactive

copy-pr-bot Bot temporarily deployed to public May 20, 2026 17:41 Inactive

copy-pr-bot Bot temporarily deployed to public May 20, 2026 17:43 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 20, 2026 17:45 Inactive

copy-pr-bot Bot temporarily deployed to public May 20, 2026 17:48 Inactive

svcnvidia-nemo-ci wants to merge 6 commits into main from transformers_bump_5.8.1

4 participants
