Slim simulator image (multi-stage + runtime-only deps) by YWHyuk · Pull Request #30 · PSAL-POSTECH/LLMServingSimSpec

YWHyuk · 2026-05-18T09:18:52Z

Summary

Cuts the published ghcr.io/psal-postech/llmservingsimspec/sim image from ~3-5 GB to an estimated 300-500 MB by:

Multi-stage Dockerfile — stage 1 (builder) has the C++ toolchain to compile ASTRA-Sim; stage 2 (runtime) carries only the python interpreter and the deps the simulator actually imports. No compilers, no protobuf headers, no .git history.
Runtime-only python deps — serving/ imports only pyyaml / pyinstrument / msgspec / pandas / numpy / rich / protobuf (plus chakra via subprocess). transformers / datasets / scikit-learn / xgboost / matplotlib are workload-generator + bench + power-model-training deps and live in the vLLM image (docker-vllm.sh) only, so they're dropped from the sim image.
.dockerignore — keeps the build context lean: skips profiler/perf/, bench/results/, outputs/, tests/, agent_plan/, docs/, .venv*, host-side CMake outputs, *.o / *.a / *.so, __pycache__, *.pyc. .git is intentionally NOT excluded (the builder stage needs it for git submodule update), but the runtime stage drops every .git subtree during cleanup so it never reaches the final image.

What's not in the slim image

C++ build toolchain (build-essential, cmake, protobuf-compiler, libprotobuf-dev)
Workload generators / bench / power-model-training python deps
The astra-sim CMake build tree (CMakeFiles/, _deps/, CMakeCache.txt, object files, static archives)
.git/ directories from the repo and every submodule
docs/, tests/, agent_plan/, outputs/, profile + bench output dirs
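
The exclusion list above can be turned into a rough check against a file listing taken from the built image. This is a minimal sketch, not something this PR ships: the helper names are hypothetical, and it assumes the ASTRA-Sim tree sits under a directory named `astra-sim`. Build artefacts are only flagged inside that tree, because pip-installed packages may legitimately ship their own `.a` files.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Names taken from the exclusion list above
BUILD_DIR_NAMES = {"CMakeFiles", "_deps"}
BUILD_FILE_PATTERNS = ("CMakeCache.txt", "*.o", "*.a")


def is_leftover(path: str, build_root: str = "astra-sim") -> bool:
    """True if path looks like something the slim image is meant to exclude."""
    parts = PurePosixPath(path).parts
    if ".git" in parts:
        # Covers .git directories and the .git files that submodules use
        return True
    if build_root not in parts:
        return False
    inside = parts[parts.index(build_root) + 1:]
    if not inside:
        return False
    if any(part in BUILD_DIR_NAMES for part in inside):
        return True
    return any(fnmatchcase(inside[-1], pattern) for pattern in BUILD_FILE_PATTERNS)


def leftovers(paths):
    """Filter an iterable of image paths down to the ones that should be gone."""
    return [p for p in paths if is_leftover(p)]
```

An empty result from `leftovers(...)` over the full listing is consistent with the list above. It does not cover the toolchain packages or the dropped Python deps, which would need a package-level check.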

What is in the slim image

ubuntu:24.04 + python3 + pip + ca-certificates
The runtime python deps serving/ imports
The compiled AstraSim_Analytical_Congestion_Unaware binary at its original repo-relative path (so serving/__main__.py's cd astra-sim/ + invocation path keeps working)
The chakra python package, installed via pip install --no-deps from the in-repo source

Test plan

CI builds successfully. The workflow trigger paths include scripts/sim.Dockerfile and .github/workflows/build-sim-image.yml but not .dockerignore, so a change that touches only .dockerignore will not rebuild the image unless the trigger paths are extended; manual workflow_dispatch is a fallback.
After merge, docker pull ghcr.io/psal-postech/llmservingsimspec/sim:latest succeeds and docker images shows a significantly smaller size (estimated 300-500 MB, down from ~3-5 GB)
docker run --rm -it ghcr.io/psal-postech/llmservingsimspec/sim:latest python -m serving --help works without further install
A representative python -m serving ... --dataset workloads/example_trace.jsonl --output /tmp/x.csv run completes inside the container

Notes

The trigger paths in the existing workflow don't currently include .dockerignore. If we want every dockerignore tweak to rebuild the image, that should be appended to the path filter; for now this PR's changes hit scripts/sim.Dockerfile which already triggers.
If anyone needs the broader install-sim.sh dep set inside an image (workload generation, training the power model), they can keep using the vLLM image (docker-vllm.sh) or run install-sim.sh on bare metal — that script is unchanged.
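
The trigger-path behaviour described in the first note can be illustrated with a small model of a path filter. This is only a sketch: `TRIGGER_PATHS` mirrors the two paths named in this PR (the real workflow file may list more), the function name is hypothetical, and `fnmatchcase` only approximates GitHub's glob syntax, although it behaves the same for literal paths like these.

```python
from fnmatch import fnmatchcase

# Trigger paths as described in this PR; an assumption about the workflow file
TRIGGER_PATHS = [
    "scripts/sim.Dockerfile",
    ".github/workflows/build-sim-image.yml",
]


def triggers_build(changed_files, patterns=TRIGGER_PATHS):
    """True if at least one changed file matches at least one path pattern."""
    return any(
        fnmatchcase(changed, pattern)
        for changed in changed_files
        for pattern in patterns
    )


# A .dockerignore-only change is skipped under the current filter
print(triggers_build([".dockerignore"]))
# This PR also touches scripts/sim.Dockerfile, so it triggers
print(triggers_build([".dockerignore", "scripts/sim.Dockerfile"]))
# Appending .dockerignore to the filter catches future tweaks
print(triggers_build([".dockerignore"], TRIGGER_PATHS + [".dockerignore"]))
```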

Generated by Claude Code

Cuts the published ghcr.io image from ~3-5 GB to an estimated 300-500 MB by separating build vs. runtime concerns and dropping deps the simulator doesn't actually use.

scripts/sim.Dockerfile is now multi-stage:
* Stage 1 (builder, ubuntu:24.04 + build-essential + cmake + protobuf-compiler): inits submodules, compiles ASTRA-Sim's analytical backend, then aggressively prunes CMake build trees, downloaded _deps, object files, and every .git directory before the next stage copies its tree.
* Stage 2 (runtime, ubuntu:24.04): only python3 + pip + ca-certs. No compilers, no protobuf-dev headers, no .git. Installs the narrower runtime pip set (pyyaml / pyinstrument / msgspec / pandas / numpy / rich / protobuf) instead of the broader scripts/install-sim.sh set. transformers / datasets / scikit-learn / xgboost / matplotlib are workload-generator + bench + power-model-training deps that the serving/ runtime never imports, so they live in the vLLM image (scripts/docker-vllm.sh) only.

.dockerignore (new) keeps the build context lean:
* perf/ + bench/results/ + outputs/: local run artefacts
* tests/ + agent_plan/ + docs/: non-runtime
* host-side CMake outputs + *.o / *.a / *.so: would collide with the builder stage's fresh compile
* .venv / .venv-cpu: local install-vllm* venvs
* __pycache__ / *.pyc: universal

.git is intentionally NOT excluded: the builder stage needs it for git submodule update, and the runtime stage drops every .git subtree during cleanup so it never reaches the final image.

YWHyuk merged commit 69a6b58 into main May 18, 2026
1 check passed

YWHyuk mentioned this pull request May 18, 2026

Fix dangling AnalyticalAstra symlink in slim sim image #32

Merged

5 tasks

YWHyuk merged 1 commit into main from claude/slim-sim-image
