Slim simulator image (multi-stage + runtime-only deps) #30
Merged
Cuts the published ghcr.io image from ~3-5 GB to an estimated 300-500 MB by separating build vs. runtime concerns and dropping deps the simulator doesn't actually use.

scripts/sim.Dockerfile (now multi-stage):
* Stage 1 (builder, ubuntu:24.04 + build-essential + cmake + protobuf-compiler): inits submodules, compiles ASTRA-Sim's analytical backend, then aggressively prunes CMake build trees, downloaded _deps, object files, and every .git directory before the next stage copies its tree.
* Stage 2 (runtime, ubuntu:24.04): only python3 + pip + ca-certs. No compilers, no protobuf-dev headers, no .git. Installs the narrower runtime pip set (pyyaml / pyinstrument / msgspec / pandas / numpy / rich / protobuf) instead of the broader scripts/install-sim.sh set. transformers / datasets / scikit-learn / xgboost / matplotlib are workload-generator + bench + power-model-training deps that the serving/ runtime never imports, so they live in the vLLM image (scripts/docker-vllm.sh) only.

.dockerignore (new) keeps the build context lean:
* perf/ + bench/results/ + outputs/: local run artefacts
* tests/ + agent_plan/ + docs/: non-runtime
* host-side CMake outputs + *.o / *.a / *.so: would collide with the builder stage's fresh compile
* .venv / .venv-cpu: local install-vllm* venvs
* __pycache__ / *.pyc: universal

.git is intentionally NOT excluded: the builder stage needs it for git submodule update, and the runtime stage drops every .git subtree during cleanup so it never reaches the final image.
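The builder stage's prune step described above could look roughly like the shell helper below. This is a sketch under assumptions, not the PR's actual code: the function name `prune_build_tree` is hypothetical, and the `find` patterns are simply the by-products the PR names (CMake build trees, `_deps`, `CMakeCache.txt`, object files, static archives, `.git`).

```shell
#!/bin/sh
# Hypothetical helper: prune build by-products under the source tree given as $1.
# The function name and the exact find expressions are assumptions for illustration.
prune_build_tree() {
    root="$1"
    # CMake build trees and downloaded dependencies: remove whole directories
    # without descending into them first.
    find "$root" -type d \( -name CMakeFiles -o -name _deps \) -prune -exec rm -rf {} +
    # Git metadata: a directory in the top-level repo, usually a plain file in submodules.
    find "$root" -name .git -prune -exec rm -rf {} +
    # Loose by-products: the CMake cache, object files, static archives.
    find "$root" -type f \( -name CMakeCache.txt -o -name '*.o' -o -name '*.a' \) -exec rm -f {} +
}
```

In a Dockerfile, a step like this would plausibly run as the last `RUN` of the builder stage, so that a later `COPY --from=builder` only sees the pruned tree.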
Summary
Cuts the published `ghcr.io/psal-postech/llmservingsimspec/sim` image from ~3-5 GB to an estimated 300-500 MB by:
* … `.git` history.
* `serving/` imports only `pyyaml / pyinstrument / msgspec / pandas / numpy / rich / protobuf` (plus chakra via subprocess). `transformers / datasets / scikit-learn / xgboost / matplotlib` are workload-generator + bench + power-model-training deps and live in the vLLM image (`docker-vllm.sh`) only, so they're dropped from the sim image.
* `.dockerignore` keeps the build context lean. It skips `profiler/perf/`, `bench/results/`, `outputs/`, `tests/`, `agent_plan/`, `docs/`, `.venv*`, host-side CMake outputs, `*.o / *.a / *.so`, `__pycache__`, `*.pyc`. `.git` is intentionally NOT excluded (the builder stage needs it for `git submodule update`), but the runtime stage drops every `.git` subtree during cleanup so it never reaches the final image.

What's not in the slim image
* `build-essential`, `cmake`, `protobuf-compiler`, `libprotobuf-dev`
* `CMakeFiles/`, `_deps/`, `CMakeCache.txt`, object files, static archives
* `.git/` directories from the repo and every submodule
* `docs/`, `tests/`, `agent_plan/`, `outputs/`, profile + bench output dirs

What is in the slim image
* `ubuntu:24.04` + `python3` + `pip` + `ca-certificates`
* The pip packages `serving/` imports
* `AstraSim_Analytical_Congestion_Unaware` binary at its original repo-relative path (so `serving/__main__.py`'s `cd astra-sim/` + invocation path keeps working)
* … `pip install --no-deps` from the in-repo source

Test plan
* … `scripts/sim.Dockerfile` and `.github/workflows/build-sim-image.yml`, but not `.dockerignore`; the PR will need to be re-pushed or the trigger paths extended to catch it, and manual `workflow_dispatch` is a fallback)
* `docker pull ghcr.io/psal-postech/llmservingsimspec/sim:latest` and `docker images` shows a significantly smaller size
* `docker run --rm -it ghcr.io/psal-postech/llmservingsimspec/sim:latest python -m serving --help` works without further install
* `python -m serving ... --dataset workloads/example_trace.jsonl --output /tmp/x.csv` run completes inside the container

Notes
* … `.dockerignore`. If we want every dockerignore tweak to rebuild the image, that should be appended to the path filter; for now this PR's changes hit `scripts/sim.Dockerfile`, which already triggers.
* … `docker-vllm.sh`) or run `install-sim.sh` on bare metal; that script is unchanged.

Generated by Claude Code
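The pull, size, and `--help` checks from the test plan could be scripted roughly as follows. This is a sketch, not part of the PR: the function names are hypothetical, the 500 MB budget is just the upper end of the PR's size estimate, and MB is taken as 10^6 bytes.

```shell
#!/bin/sh
# Hypothetical smoke check for the slimmed image; names and budget are illustrative.
IMAGE="ghcr.io/psal-postech/llmservingsimspec/sim:latest"

# Succeeds when a size in bytes ($1) is at or under a budget in MB ($2).
size_within_budget() {
    bytes="$1"
    budget_mb="$2"
    [ "$bytes" -le $((budget_mb * 1000000)) ]
}

# Pulls the image, checks its size, and confirms the CLI starts without extra installs.
check_image() {
    docker pull "$IMAGE" || return 1
    bytes=$(docker image inspect --format '{{.Size}}' "$IMAGE") || return 1
    if ! size_within_budget "$bytes" 500; then
        echo "image is larger than 500 MB: $bytes bytes" >&2
        return 1
    fi
    docker run --rm "$IMAGE" python -m serving --help >/dev/null
}
```

Calling `check_image` on a machine with Docker available returns non-zero if any step fails, which makes it easy to wire into a manual check after a `workflow_dispatch` build.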