Add multi-arch (x86, arm) container image support #720

Open. maryamtahhan wants to merge 2 commits into vllm-project:main from maryamtahhan:feat/arm-container

@maryamtahhan (Contributor)

Add ARM64 container image support

  • Create reusable multi-arch build workflow
  • Build AMD64 with all extras, ARM64 with recommended extras
  • Generate manifest lists for automatic platform selection
  • Update development, nightly, and release workflows
  • Update container maintenance to handle arch-specific tags

Fixes #498

Summary

This PR adds multi-architecture container image support for GuideLLM, enabling the container to run on both AMD64 (x86_64) and ARM64 (aarch64) platforms. This addresses issue #498 where users on ARM systems (e.g., Ampere processors) could not run the published container images.

The implementation creates a reusable workflow that builds architecture-specific images and combines them into manifest lists, allowing users to pull images without specifying architecture tags. The container runtime automatically selects the correct architecture.
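As a rough illustration of the manifest-list step, the sketch below prints the podman commands that would combine two architecture-specific tags into one multi-arch tag. The helper name `manifest_commands` and the choice of `podman manifest` are assumptions for illustration only; the workflow in this PR may use a different tool (for example `docker buildx imagetools create`).

```shell
#!/usr/bin/env bash
# Print (do not run) the podman commands that would combine two
# architecture-specific tags into a single manifest list.
# Assumption: arch images are tagged <tag>-amd64 and <tag>-arm64.
manifest_commands() {
  local image="$1" tag="$2" arch
  echo "podman manifest create ${image}:${tag}"
  for arch in amd64 arm64; do
    echo "podman manifest add ${image}:${tag} docker://${image}:${tag}-${arch}"
  done
  echo "podman manifest push --all ${image}:${tag} docker://${image}:${tag}"
}
```

Calling `manifest_commands ghcr.io/vllm-project/guidellm pr-720` prints one `create`, two `add`, and one `push` command. Printing rather than executing keeps the sketch safe to run without registry credentials.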

Key architectural decision: ARM64 images are built with recommended extras (excludes audio/vision) because PyTorch CPU wheels are not available for ARM64 Linux. AMD64 images continue to be built with all extras for full feature support.
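The per-architecture extras choice can be pictured as a simple lookup from build platform to the `GUIDELLM_BUILD_EXTRAS` value. The helper name `extras_for_platform` is hypothetical; the actual workflow presumably sets this through its build matrix rather than a shell function.

```shell
#!/usr/bin/env bash
# Map a build platform to the GUIDELLM_BUILD_EXTRAS value used in this PR.
# Hypothetical helper; unknown platforms are rejected rather than guessed.
extras_for_platform() {
  case "$1" in
    linux/amd64) echo "all" ;;          # audio, vision, perf, tokenizers
    linux/arm64) echo "recommended" ;;  # perf + tokenizers only
    *)
      echo "unsupported platform: $1" >&2
      return 1
      ;;
  esac
}
```

If ARM64 later gains full dependency support, only the `linux/arm64` branch would need to change.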

Details

  • New reusable workflow (.github/workflows/build-multiarch-container.yml):

    • Builds both linux/amd64 and linux/arm64 platforms in parallel using matrix strategy
    • Uses QEMU emulation for cross-platform builds on GitHub's x86_64 runners
    • Creates OCI manifest lists that point to both architecture-specific images
    • Supports both workflow_call and workflow_dispatch triggers for flexibility
    • Platform-specific build caching to optimize CI performance
  • Updated workflows to use the reusable workflow:

    • development.yml: Simplified to call reusable workflow with pr-<number> tags
    • nightly.yml: Simplified to call reusable workflow with nightly tag
    • release.yml: Added version extraction job, calls reusable workflow with release version tag
  • Container maintenance (.github/workflows/container-maintenance.yml):

    • Updated cleanup pattern to handle architecture-specific tags (pr-N-amd64, pr-N-arm64)
    • Manifest list handling is transparent (no changes needed for stable/latest tag updates)
  • Build configuration:

    • AMD64: GUIDELLM_BUILD_EXTRAS=all (includes audio, vision, perf, tokenizers)
    • ARM64: GUIDELLM_BUILD_EXTRAS=recommended (perf + tokenizers only)
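
The cleanup pattern itself is not shown in this description. A minimal sketch of a matcher that accepts both the manifest-list tag and the arch-specific tags, assuming the `pr-N`, `pr-N-amd64`, and `pr-N-arm64` naming above, could look like this (`is_pr_tag` is a hypothetical helper, not the code in container-maintenance.yml):

```shell
#!/usr/bin/env bash
# Succeed if the tag is a PR image tag, with or without an architecture
# suffix: pr-720, pr-720-amd64, pr-720-arm64.
# Assumption: only amd64 and arm64 suffixes are ever published.
is_pr_tag() {
  printf '%s\n' "$1" | grep -Eq '^pr-[0-9]+(-(amd64|arm64))?$'
}
```

Anchoring the pattern at both ends keeps tags such as `nightly` or a release version out of the cleanup set.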

Test Plan

Local Testing (if you have podman/docker):

# Test AMD64 image
podman pull ghcr.io/vllm-project/guidellm:pr-<NUMBER>-amd64
podman run --rm ghcr.io/vllm-project/guidellm:pr-<NUMBER>-amd64 --version

# Test ARM64 image (via QEMU emulation)
podman pull --platform linux/arm64 ghcr.io/vllm-project/guidellm:pr-<NUMBER>-arm64
podman run --rm --platform linux/arm64 ghcr.io/vllm-project/guidellm:pr-<NUMBER>-arm64 --version

# Test manifest list (automatic platform selection)
podman pull ghcr.io/vllm-project/guidellm:pr-<NUMBER>
podman run --rm ghcr.io/vllm-project/guidellm:pr-<NUMBER> --help

Verify Manifest List:

skopeo inspect --raw docker://ghcr.io/vllm-project/guidellm:pr-<NUMBER> | jq '.manifests[] | {arch: .platform.architecture, os: .platform.os}'

Expected output: Two manifests (amd64 and arm64)
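
To turn the expected output into a pass/fail check, a small helper can read the architecture names and confirm both are present. This is a sketch; `has_both_arches` is a hypothetical name, and it assumes the architectures arrive one per line on stdin.

```shell
#!/usr/bin/env bash
# Read architecture names (one per line) on stdin and succeed only if
# both amd64 and arm64 appear as whole lines.
has_both_arches() {
  local arches
  arches="$(cat)"
  printf '%s\n' "$arches" | grep -qx 'amd64' &&
    printf '%s\n' "$arches" | grep -qx 'arm64'
}

# Example use (requires skopeo, jq, and registry access):
#   skopeo inspect --raw docker://ghcr.io/vllm-project/guidellm:pr-<NUMBER> \
#     | jq -r '.manifests[].platform.architecture' \
#     | has_both_arches && echo "manifest list OK"
```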

CI Validation:

  • Workflow YAML syntax is valid
  • Pre-commit hooks pass
  • AMD64 build completes successfully in CI
  • ARM64 build completes successfully in CI
  • Manifest list is created and pushed to ghcr.io
  • Both architecture images are accessible via registry

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

maryamtahhan and others added 2 commits May 7, 2026 11:47
- Create reusable multi-arch build workflow
- Build AMD64 with all extras, ARM64 with recommended extras
- Generate manifest lists for automatic platform selection
- Update development, nightly, and release workflows
- Update container maintenance to handle arch-specific tags

Fixes vllm-project#498

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

Reusable workflows need explicit permissions passed from the caller.
Added 'packages: write' permission to all three workflow callers.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
@maryamtahhan changed the title from "Add ARM64 container image support" to "Add multi-arch (x86, arm) container image support" on May 7, 2026
@dbutenhof added the "community contribution" and "build" labels on May 7, 2026
@sjmonson (Collaborator) commented May 8, 2026

If you look at the original issue, we outlined that we do not want to publish official container images for a platform until we can fully support it with all features. I think torchcodec is still the main thing holding back ARM, so my preference would be to try to help them get ARM builds up (I think they already have them for macOS; it's just missing linux-aarch64-cpu).

@maryamtahhan (Contributor, Author)

The issue is that if we want to benchmark inference on Arm platforms (which we do), we have to use an out-of-tree build in our own Quay repos and point folks to that. We would rather use an official GuideLLM build.

Why not support the limited Arm image until the dependency is resolved, and document the limitation? There is no clear timeline for when the dependency will be resolved.

@jaredoconnell (Collaborator) left a comment

Looks good to me. I think it's a positive that it moves the shared logic to one place.

@dbutenhof (Collaborator) left a comment

> Looks good to me. I think it's a positive that it moves the shared logic to one place.

Agreed. I like the refactor.

The issue I see is whether we want to release/support a "watered down" arm64 container. I do think there's a potential cost in people grabbing it without understanding the limitations, but that may be justified as a (hopefully temporary) compromise.

@jaredoconnell (Collaborator)

> > Looks good to me. I think it's a positive that it moves the shared logic to one place.
>
> Agreed. I like the refactor.
>
> The issue I see is whether we want to release/support a "watered down" arm64 container. I do think there's a potential cost in people grabbing it without understanding the limitations, but that may be justified as a (hopefully temporary) compromise.

Is there a good place to document this limitation? Having a container that supports all recommended dependencies isn't too bad.

@sjmonson (Collaborator)

An update on this: it looks like the only blocker on the torchcodec side is CI to run aarch64 builds, which is being addressed here: meta-pytorch/torchcodec#1372

@jaredoconnell (Collaborator)

I don't think this has to block on that. We can change it after we verify compatibility.

@sjmonson (Collaborator)

> I don't think this has to block on that. We can change it after we verify compatibility.

My point is that this has been addressed (that PR merged) so I assume we can now build aarch64 with the all extras.

Labels

build: Issues affecting CI, packaging, container builds
community contribution: An opportunity for contribution from the GuideLLM community already invested in this area.

Development

Successfully merging this pull request may close these issues.

'guidellm:latest' not available for image platform (linux/arm64)

4 participants