
feat: Add dedicated lora mode to Megatron backend#635

Draft
vivekkalyan wants to merge 1 commit into main from feat/dedicated-mode-megatron

Conversation


@vivekkalyan vivekkalyan commented Mar 31, 2026

Summary

This adds a dedicated LoRA mode to the Megatron backend so inference and training can run in parallel on separate GPUs.

In dedicated mode, ART now keeps a dedicated vLLM server on the inference GPU and updates LoRA adapters in place after training steps. That makes the Megatron flow match the dedicated-serving model we already use elsewhere instead of treating training as a blocking operation.
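Updating adapters in place on a running vLLM server is typically done through vLLM's runtime LoRA endpoint, available when the server is started with `--enable-lora` and `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`. A minimal sketch of building that request (the server URL, adapter name, and checkpoint path here are hypothetical, not taken from this PR's code):

```python
import json
import urllib.request


def build_load_adapter_request(
    server_url: str, lora_name: str, lora_path: str
) -> urllib.request.Request:
    """Build the POST that loads (or replaces) a LoRA adapter on a vLLM server.

    vLLM exposes /v1/load_lora_adapter when launched with --enable-lora and
    VLLM_ALLOW_RUNTIME_LORA_UPDATING=True; it accepts a JSON body with
    "lora_name" and "lora_path".
    """
    payload = json.dumps({"lora_name": lora_name, "lora_path": lora_path}).encode()
    return urllib.request.Request(
        f"{server_url}/v1/load_lora_adapter",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage after a train step writes a new adapter checkpoint:
req = build_load_adapter_request(
    "http://localhost:8000", "my-model@2", "/checkpoints/step-0002"
)
```

Sending this request while the server keeps serving rollouts is what lets training stay non-blocking from the inference side.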

What this enables

  • Use Megatron with separate trainer and inference GPUs in dedicated mode
  • Keep rollout serving on the inference GPU while Megatron trains on the trainer GPU
  • Advance served model steps by reloading the next adapter into the running vLLM server
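The trainer/inference split above relies on the standard `CUDA_VISIBLE_DEVICES` pinning mechanism; a minimal sketch of preparing per-process environments for a 2-GPU machine (the variable names are illustrative, not the PR's actual code):

```python
import os


def pinned_env(gpu_index: int) -> dict:
    """Copy the current environment, restricted to a single visible GPU.

    CUDA_VISIBLE_DEVICES is the standard CUDA mechanism: a child process
    launched with this env sees only the listed device, exposed as GPU 0.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


trainer_env = pinned_env(0)    # Megatron trainer sees only physical GPU 0
inference_env = pinned_env(1)  # vLLM server sees only physical GPU 1
```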

Validation

  • Unit coverage:
    • tests/unit/test_megatron_dedicated.py
  • 2-GPU smoke on a fresh H200 machine:
    • trainer on GPU 0
    • inference on GPU 1
    • base model Qwen/Qwen3-30B-A3B-Instruct-2507
    • dedicated LoRA mode completed two real train steps and advanced the served model from @0 to @2
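The "@0 to @2" advancement reflects a per-step version suffix on the served model name. A hypothetical helper showing how a served adapter name and checkpoint path might be derived from the step counter (the checkpoint layout is an assumption, not from this PR):

```python
def next_served_adapter(base_name: str, step: int, ckpt_root: str = "/checkpoints"):
    """Return the (served name, adapter path) for a given train step.

    The "<name>@<step>" convention mirrors how this PR reports the served
    model advancing from @0 to @2; the step-padded directory layout is a
    hypothetical example.
    """
    return f"{base_name}@{step}", f"{ckpt_root}/step-{step:04d}"


name, path = next_served_adapter("qwen3-30b-lora", 2)
# name == "qwen3-30b-lora@2", path == "/checkpoints/step-0002"
```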

