
feat: Add dedicated merged mode to Megatron backend #636

Draft
vivekkalyan wants to merge 2 commits into feat/dedicated-mode-megatron from feat/merged-mode-megatron


Conversation


vivekkalyan (Collaborator) commented Apr 1, 2026

Summary

This adds dedicated merged mode to the Megatron backend.

In dedicated merged mode, ART keeps vLLM running on the inference GPU, trains Megatron on the trainer GPU, and updates inference weights in place through vLLM's native weight transfer APIs. This enables training models that lack LoRA support in vLLM, and faster inference when used with LocalBackend.
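The control flow described above can be sketched as follows. This is an illustrative outline only: the class and method names are hypothetical stand-ins for ART's actual trainer and vLLM interfaces, not the real API.

```python
# Illustrative sketch of one dedicated merged-mode step.
# All names here are hypothetical; only the control flow mirrors the PR:
# train on the trainer GPU, pause generation only for the weight swap,
# push merged weights into vLLM in place, then advance the served alias.

class StubTrainer:
    """Stands in for the Megatron trainer on the trainer GPU."""
    def train_one_step(self):
        pass  # Megatron forward/backward/optimizer step would run here

    def merged_hf_weights(self):
        # Would yield HF/vLLM-named tensors with LoRA deltas already merged.
        yield "model.layers.0.self_attn.q_proj.weight", [0.0]

class StubVLLM:
    """Stands in for the dedicated vLLM server on the inference GPU."""
    def __init__(self):
        self.paused = False
        self.weights = {}

    def pause_generation(self):
        self.paused = True

    def update_weight(self, name, tensor):
        self.weights[name] = tensor  # in-place update via weight transfer

    def resume_generation(self):
        self.paused = False

def merged_train_step(trainer, vllm, step: int) -> str:
    trainer.train_one_step()        # training proceeds independently
    vllm.pause_generation()         # generation pauses only for the swap
    for name, tensor in trainer.merged_hf_weights():
        vllm.update_weight(name, tensor)
    vllm.resume_generation()
    return f"model@{step + 1}"      # served alias advances per merged sync
```

The key property is that vLLM keeps serving between syncs; generation is paused only for the duration of the in-place weight swap.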

What this enables

  • Use Megatron with dedicated trainer and inference GPUs in rollout_weights_mode="merged"
  • Keep inference and training decoupled, pausing generation only during the merged weight swap
  • Advance the served model alias step by step without restarting the dedicated vLLM server
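A minimal configuration sketch for this setup, written as a plain dict: rollout_weights_mode="merged", the trainer/inference GPU split, and the base model name come from this PR, but the surrounding key names are hypothetical, not ART's real config schema.

```python
# Hypothetical config sketch; key names are illustrative, values come from
# this PR's description and validation run (trainer on GPU 0, inference on GPU 1).
dedicated_merged_config = {
    "rollout_weights_mode": "merged",  # named in this PR
    "trainer_gpu": 0,                  # Megatron trains here
    "inference_gpu": 1,                # dedicated vLLM serves here
    "base_model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
}
```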

Implementation

  • Add a Megatron service-to-trainer job protocol for:
    • initial merged sync
    • LoRA training jobs
    • merged training jobs
  • Start dedicated vLLM in merged mode with native weight transfer enabled
  • Initialize NCCL weight transfer between the Megatron trainer and vLLM
  • Convert live Megatron weights through Megatron Bridge into HF/vLLM checkpoint names, merge ART LoRA deltas into those tensors, and send them directly to vLLM
  • Update the served model name after each successful merged sync
  • Reuse a shared TCP port helper instead of depending on a backend-specific implementation
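The "merge ART LoRA deltas into those tensors" step follows the standard LoRA recipe, W_merged = W + (alpha / r) * B @ A. A toy pure-Python sketch of that arithmetic (the real code operates on torch tensors keyed by HF/vLLM checkpoint names):

```python
# Toy sketch of merging a LoRA delta into a base weight before sending it
# to vLLM. Plain nested lists keep the arithmetic visible; real tensors are
# torch tensors. Scaling by alpha / r follows the standard LoRA formulation.
def merge_lora(W, A, B, alpha: float, r: int):
    """Return W + (alpha / r) * B @ A for 2-D lists.

    W: (out_dim, in_dim) base weight
    A: (r, in_dim) LoRA down-projection
    B: (out_dim, r) LoRA up-projection
    """
    scale = alpha / r
    out_dim, in_dim = len(W), len(W[0])
    merged = [row[:] for row in W]  # copy so the base weight is untouched
    for i in range(out_dim):
        for j in range(in_dim):
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged
```

Merging the delta once per sync means vLLM only ever sees plain dense weights, which is what makes this mode work for models without LoRA support in vLLM.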

Validation

  • Unit coverage:
    • tests/unit/test_megatron_dedicated.py
  • Fresh-cluster 2-GPU smoke:
    • trainer on GPU 0
    • inference on GPU 1
    • base model Qwen/Qwen3-30B-A3B-Instruct-2507
    • dedicated merged mode completed two real train steps and advanced the served model from @0 to @2
