feat: Add Qwen3.5 support to Megatron backend #637

Draft
vivekkalyan wants to merge 3 commits into feat/merged-mode-megatron from feat/qwen35-megatron

Conversation


@vivekkalyan vivekkalyan commented Apr 2, 2026

Summary

This adds Qwen3.5 MoE support to the Megatron backend on top of #636.

With this stack, ART can now train Qwen/Qwen3.5-35B-A3B with Megatron while serving inference from the dedicated vLLM process using merged weight updates.
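As an illustration of the merged-weight update flow described above, here is a minimal, hedged sketch: the trainer holds base weights plus a LoRA adapter, merges the adapter into full-rank weights after a step, and ships the result to the separate inference process. All function and key names (`merge_lora`, `push_to_inference`, `"layer0.attn.qkv"`) are hypothetical stand-ins, not ART's or vLLM's actual APIs; the LoRA merge math itself is the standard `W' = W + (alpha/r) * B @ A`.

```python
import numpy as np

def merge_lora(base, lora_a, lora_b, alpha, rank):
    """Merge a LoRA adapter into a base weight: W' = W + (alpha/r) * B @ A."""
    return base + (alpha / rank) * (lora_b @ lora_a)

def push_to_inference(server_weights, name, merged):
    """Stand-in for the real transfer to the dedicated vLLM process."""
    server_weights[name] = merged

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16
base = rng.standard_normal((d, d))
lora_a = rng.standard_normal((r, d))  # LoRA down-projection A
lora_b = np.zeros((d, r))             # LoRA up-projection B starts at zero

serving = {}
# Before training: B is zero, so the merged weight equals the base weight.
push_to_inference(serving, "layer0.attn.qkv",
                  merge_lora(base, lora_a, lora_b, alpha, r))
assert np.allclose(serving["layer0.attn.qkv"], base)

# After a "training step" updates B, re-merging yields new serving weights.
lora_b += 0.01
push_to_inference(serving, "layer0.attn.qkv",
                  merge_lora(base, lora_a, lora_b, alpha, r))
assert not np.allclose(serving["layer0.attn.qkv"], base)
```

The point of merging before export is that the inference process never needs LoRA-aware kernels; it only ever sees ordinary full-rank weights.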

What changed

  • Bump megatron-bridge to a revision that includes upstream Qwen3 support needed for this path
  • Teach the Megatron provider path to accept Qwen3.5 MoE Bridge models and patch the hybrid layer spec so ART's flex attention only applies to the standard attention layers
  • Add Qwen3.5 LoRA coverage for:
    • gated delta net in-projection
    • shared experts
    • the Qwen3.5 attention packing layout while preserving existing Qwen3 behavior
  • Replace the old ART-specific merged-weight math with a Bridge compatibility layer that converts ART adapters into Bridge AdapterWeights for export and merge
  • Add provider, wrapper, and helper coverage for the Qwen3.5 path
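The hybrid-layer-spec patch in the list above can be sketched as follows. Qwen3.5 MoE interleaves gated-delta-net (linear attention) layers with standard attention layers, so a flex-attention override must be applied only to the latter. The `LayerSpec` dataclass, the `kind` strings, and the 3:1 layer pattern here are illustrative assumptions, not megatron-bridge's real data structures.

```python
from dataclasses import dataclass

@dataclass
class LayerSpec:
    index: int
    kind: str                         # "attention" or "gated_delta_net"
    attention_impl: str = "default"

def patch_flex_attention(specs):
    """Swap in flex attention on standard attention layers only,
    leaving gated-delta-net layers untouched."""
    for spec in specs:
        if spec.kind == "attention":
            spec.attention_impl = "flex_attention"
    return specs

# A toy hybrid pattern: three delta-net layers, then one attention layer.
pattern = ["gated_delta_net"] * 3 + ["attention"]
specs = [LayerSpec(i, pattern[i % 4]) for i in range(8)]
patched = patch_flex_attention(specs)

assert all(s.attention_impl == "flex_attention"
           for s in patched if s.kind == "attention")
assert all(s.attention_impl == "default"
           for s in patched if s.kind == "gated_delta_net")
```

Filtering by layer kind rather than patching every layer is what keeps ART's flex attention from clobbering the linear-attention layers.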

Validation

  • Clean-cluster Qwen3.5 Megatron dedicated + merged smoke test completed through step 2
  • Clean-cluster 20-step yes/no/maybe run completed successfully for Qwen/Qwen3.5-35B-A3B

@vivekkalyan vivekkalyan force-pushed the feat/qwen35-megatron branch from 9fcc5e9 to 3ad7d06 on April 2, 2026 at 03:08