
feat: Add dedicated lora mode to Megatron backend#635

Draft
vivekkalyan wants to merge 1 commit into main from feat/dedicated-mode-megatron

Conversation


@vivekkalyan vivekkalyan commented Mar 31, 2026

Summary

This adds a dedicated LoRA mode to the Megatron backend so inference and training can run in parallel on separate GPUs.

In dedicated mode, ART now keeps a dedicated vLLM server on the inference GPU and updates LoRA adapters in place after training steps. That makes the Megatron flow match the dedicated-serving model we already use elsewhere instead of treating training as a blocking operation.
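Updating adapters in place on a running vLLM server is typically done through vLLM's runtime LoRA endpoint, available when the server is started with `--enable-lora` and `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`. A minimal sketch of building that request (the server URL, adapter name, and checkpoint path here are hypothetical, not taken from this PR's code):

```python
import json
import urllib.request


def build_load_adapter_request(
    server_url: str, lora_name: str, lora_path: str
) -> urllib.request.Request:
    """Build the POST that loads (or replaces) a LoRA adapter on a vLLM server.

    vLLM exposes /v1/load_lora_adapter when launched with --enable-lora and
    VLLM_ALLOW_RUNTIME_LORA_UPDATING=True; it accepts a JSON body with
    "lora_name" and "lora_path".
    """
    payload = json.dumps({"lora_name": lora_name, "lora_path": lora_path}).encode()
    return urllib.request.Request(
        f"{server_url}/v1/load_lora_adapter",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage after a train step writes a new adapter checkpoint:
req = build_load_adapter_request(
    "http://localhost:8000", "my-model@2", "/checkpoints/step-0002"
)
```

Sending this request while the server keeps serving rollouts is what lets training stay non-blocking from the inference side.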

What this enables

  • Use Megatron with separate trainer and inference GPUs in dedicated mode
  • Keep rollout serving on the inference GPU while Megatron trains on the trainer GPU
  • Advance served model steps by reloading the next adapter into the running vLLM server
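The trainer/inference split above relies on the standard `CUDA_VISIBLE_DEVICES` pinning mechanism; a minimal sketch of preparing per-process environments for a 2-GPU machine (the variable names are illustrative, not the PR's actual code):

```python
import os


def pinned_env(gpu_index: int) -> dict:
    """Copy the current environment, restricted to a single visible GPU.

    CUDA_VISIBLE_DEVICES is the standard CUDA mechanism: a child process
    launched with this env sees only the listed device, exposed as GPU 0.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


trainer_env = pinned_env(0)    # Megatron trainer sees only physical GPU 0
inference_env = pinned_env(1)  # vLLM server sees only physical GPU 1
```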

Validation

  • Unit coverage:
    • tests/unit/test_megatron_dedicated.py
  • 2-GPU smoke on a fresh H200 machine:
    • trainer on GPU 0
    • inference on GPU 1
    • base model Qwen/Qwen3-30B-A3B-Instruct-2507
    • dedicated LoRA mode completed two real train steps and advanced the served model from @0 to @2
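The "@0 to @2" advancement reflects a per-step version suffix on the served model name. A hypothetical helper showing how a served adapter name and checkpoint path might be derived from the step counter (the checkpoint layout is an assumption, not from this PR):

```python
def next_served_adapter(base_name: str, step: int, ckpt_root: str = "/checkpoints"):
    """Return the (served name, adapter path) for a given train step.

    The "<name>@<step>" convention mirrors how this PR reports the served
    model advancing from @0 to @2; the step-padded directory layout is a
    hypothetical example.
    """
    return f"{base_name}@{step}", f"{ckpt_root}/step-{step:04d}"


name, path = next_served_adapter("qwen3-30b-lora", 2)
# name == "qwen3-30b-lora@2", path == "/checkpoints/step-0002"
```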

