Bump transformers from 4.39.3 to 5.8.1 by dependabot[bot] · Pull Request #5 · phalberg/InstructScore_HyTe

dependabot · 2026-05-19T17:47:44Z

Bumps transformers from 4.39.3 to 5.8.1.

Release notes

Patch release v5.8.1

This release is mainly to fix the Deepseek V4 integration!!!

[fix] Add fatal_error to ContinuousBatchingManager so the serving... by @qgallouedec, @remi-or

Fix WeightConverter regex incorrectly matching shared_experts as experts by @silencelamb, @claude

Fix deepseek v4 by @ArthurZucker (#45892)

Deepseek v4 csa mask collapse by @ArthurZucker, @Sawyer117 (#45928)
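The `shared_experts` fix listed above points at a common class of pattern bug: an unanchored pattern for `experts` also matches inside `shared_experts`. A minimal sketch of that failure mode follows. The checkpoint key names and both patterns are hypothetical illustrations, not the actual WeightConverter code in transformers.

```python
import re

# Hypothetical checkpoint keys, for illustration only.
KEYS = [
    "model.layers.0.mlp.experts.3.gate_proj.weight",
    "model.layers.0.mlp.shared_experts.gate_proj.weight",
]

# Unanchored: the text "experts." also occurs inside "shared_experts.".
LOOSE = re.compile(r"experts\.")
# Anchored on the preceding dot (or string start), so only a whole
# path component named "experts" matches.
STRICT = re.compile(r"(?:^|\.)experts\.")


def matching_keys(pattern, keys):
    """Return the keys in which the pattern is found anywhere."""
    return [k for k in keys if pattern.search(k)]


loose_hits = matching_keys(LOOSE, KEYS)    # both keys match
strict_hits = matching_keys(STRICT, KEYS)  # only the routed-experts key matches
```

Anchoring on the path separator is one possible fix; a negative lookbehind such as `(?<!shared_)experts\.` would be another.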

Release v5.8.0

New Model additions

DeepSeek-V4

DeepSeek-V4 is the next-generation MoE (Mixture of Experts) language model from DeepSeek that introduces several architectural innovations over DeepSeek-V3. The architecture replaces Multi-head Latent Attention (MLA) with a hybrid local + long-range attention design, swaps residual connections for Manifold-Constrained Hyper-Connections (mHC), and bootstraps the first few MoE layers with a static token-id → expert-id hash table. This implementation covers DeepSeek-V4-Flash, DeepSeek-V4-Pro, and their -Base pretrained variants, which share the same architecture but differ in width, depth, expert count and weights.
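The static token-id → expert-id hash table can be pictured as a fixed lookup that assigns each token to an expert without consulting a learned router. The sketch below is a toy illustration under stated assumptions: the modulo-based table, the sizes, and the function names are all hypothetical and are not taken from the DeepSeek-V4 implementation.

```python
def build_static_table(vocab_size, num_experts):
    """Build a fixed token-id -> expert-id table.

    Hypothetical choice: a simple modulo hash. The real table may be
    constructed quite differently.
    """
    return [token_id % num_experts for token_id in range(vocab_size)]


def route(token_ids, table):
    """Look up the expert for each token; no learned router is involved."""
    return [table[t] for t in token_ids]


table = build_static_table(vocab_size=16, num_experts=4)
assignments = route([0, 5, 10, 15, 5], table)
```

Because the table is static, the same token id always lands on the same expert, which is what distinguishes this kind of routing from a learned, context-dependent gate.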

Links: Documentation | Paper

Add DeepSeek V4 (#45643) by @ArthurZucker in #45643

Gemma 4 Assistant

Gemma 4 Assistant is a small, text-only model that enables speculative decoding for Gemma 4 models using the Multi-Token Prediction (MTP) method and associated candidate generator. The model shares the same Gemma4TextModel backbone as other Gemma 4 models but uses KV sharing throughout the entire model, allowing it to reuse the KV cache populated by the target model and skip the pre-fill phase entirely. This architecture includes cross-attention to make the most of the target model's context, allowing the assistant to accurately predict more drafted tokens per drafting round.
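For readers unfamiliar with speculative decoding, the draft-and-verify loop can be sketched generically as follows. This is a toy greedy variant with hypothetical stand-in models over integer tokens; it does not use the Gemma 4 MTP candidate generator or any transformers API, and real implementations verify all drafted tokens in a single target forward pass rather than one call per token.

```python
def speculative_step(prefix, draft_next, target_next, num_draft):
    """Run one greedy draft-and-verify round and return the accepted tokens."""
    # Draft phase: the small model proposes num_draft tokens autoregressively.
    drafted, ctx = [], list(prefix)
    for _ in range(num_draft):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # Verify phase: keep drafted tokens for as long as the target model agrees.
    accepted, ctx = [], list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # bonus token when every draft is accepted
    return accepted


# Toy models (hypothetical): the target counts up by 1; the draft agrees
# except that it never emits a multiple of 4.
def target_next(ctx):
    return ctx[-1] + 1


def draft_next(ctx):
    nxt = ctx[-1] + 1
    return nxt if nxt % 4 else nxt + 1


result = speculative_step([0], draft_next, target_next, num_draft=5)
```

The more drafted tokens the target accepts per round, the fewer target passes are needed per generated token, which is why a more accurate assistant improves throughput.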

Links: Documentation

First model (#45788) by @SindhuRaghuram97 in #45788

GraniteSpeechPlus

Granite Speech Plus is a variant of Granite Speech that enhances the projector by consuming the concatenation of the encoder's final hidden states with an arbitrary subset of its intermediate hidden states along the feature dimension. It is a multimodal speech-to-text model that can transcribe audio, provide speaker annotation and word level timestamps by responding to text prompts. The model inherits the same architecture components as Granite Speech including the speech encoder, query transformer projector, language model, and optional LoRA adapter.
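The projector input described above, final hidden states concatenated with selected intermediate hidden states along the feature dimension, can be sketched with plain Python lists. The shapes, layer indices, and function name are hypothetical and chosen only to show the concatenation; this is not the GraniteSpeechPlus code.

```python
def concat_features(final_states, intermediate_states, selected_layers):
    """Concatenate final hidden states with chosen intermediate layers, per frame."""
    out = []
    for t, frame in enumerate(final_states):
        merged = list(frame)
        for layer in selected_layers:
            # Append the same frame's features from each selected layer.
            merged.extend(intermediate_states[layer][t])
        out.append(merged)
    return out


# Toy data: 2 frames, feature size 2, three intermediate layers (all hypothetical).
final = [[1.0, 2.0], [3.0, 4.0]]
inter = {
    0: [[0.1, 0.2], [0.3, 0.4]],
    1: [[1.1, 1.2], [1.3, 1.4]],
    2: [[2.1, 2.2], [2.3, 2.4]],
}
projector_input = concat_features(final, inter, selected_layers=[0, 2])
```

The number of frames is unchanged; only the per-frame feature size grows, here from 2 to 2 × (1 + 2) = 6.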

Links: Documentation

Support for a new Granite-Speech-Plus model (#45695) by @zvik in #45695

Granite4Vision

Granite Vision 4.1 is a vision-language model from IBM Research designed for enterprise-grade document data extraction. It specializes in chart extraction (Chart2CSV, Chart2Summary, Chart2Code), table extraction (JSON, HTML, OTSL), and semantic key-value pair extraction. The model builds on LLaVA-NeXT with architectural innovations including SigLIP2 Vision Encoder, Window Q-Former Projectors, and DeepStack Feature Injection with 8 vision-to-LLM injection points.

Links: Documentation

Add Granite 4.1 Vision (granite4_vision) (#45597) by @artem-spector in #45597

... (truncated)

Commits

cc832f9 up
f966d7b Deepseek v4 csa mask collapse (#45928)
283f3f0 Fix deepseek v4 (#45892)
4e0be9c Fix WeightConverter regex incorrectly matching shared_experts as experts in D...
ec85262 [fix] Add fatal_error to ContinuousBatchingManager so the serving layer c...
049d2bf v5.8.0
2871caf Add Granite 4.1 Vision (granite4_vision) (#45597)
aaec109 fix: correct spelling in continuous_api docstring (#45749)
7050d0e Fix link to modular transformers documentation (#45746)
df2f2b5 Gemma4: fix failed test cases (#45568)
Additional commits viewable in compare view

Bumps [transformers](https://github.com/huggingface/transformers) from 4.39.3 to 5.8.1.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](huggingface/transformers@v4.39.3...v5.8.1)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 5.8.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
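The commit metadata labels this bump `version-update:semver-major` because the first version component changes (4 to 5), which under semantic versioning signals possible breaking changes. A minimal sketch of that classification is below; the function names are hypothetical, it handles only plain `MAJOR.MINOR.PATCH` strings, and it is not Dependabot's own logic.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release handling)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_update(old, new):
    """Name the most significant version component that differs."""
    old_v, new_v = parse_version(old), parse_version(new)
    labels = ("semver-major", "semver-minor", "semver-patch")
    for label, a, b in zip(labels, old_v, new_v):
        if a != b:
            return label
    return "none"


update_type = classify_update("4.39.3", "5.8.1")
```

For this PR the result is a major update, so code written against the 4.x API should be re-tested before merging.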

dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels May 19, 2026

phalberg self-assigned this May 19, 2026

dependabot[bot] force-pushed the dependabot/pip/transformers-5.8.1 branch from 968754f to c62a85d May 19, 2026 18:34

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/transformers-5.8.1
