Merged
67 changes: 67 additions & 0 deletions .dockerignore
@@ -0,0 +1,67 @@
# Keeps the Docker build context small for scripts/sim.Dockerfile.
# .git is intentionally NOT excluded: scripts/install-sim.sh calls
# `git submodule update` in the builder stage; the runtime stage drops
# all .git/ subtrees so they never reach the final image.

# Caches / generated artefacts
**/__pycache__
**/*.pyc
**/*.pyo
.pytest_cache
.ruff_cache
.mypy_cache

# Local venvs (install-vllm.sh / install-vllm-cpu.sh)
.venv
.venv-cpu

# Module outputs that can be hundreds of MB
profiler/perf
bench/results
outputs

# Non-runtime artefacts: tests, planning docs, the docs site. The
# image is meant for running `python -m serving`, not for hacking
# inside; iteration is expected via the dev container (docker-sim.sh)
# with the repo bind-mounted.
tests
agent_plan
docs

# Host-side build artefacts. The builder stage compiles ASTRA-Sim fresh
# in /build inside the container, so any host-side outputs would only
# bloat the build context (and astra-sim/build/ contains the build.sh
# entry point, so the exclusion has to be scoped to the CMake output
# subtree only — NOT all of astra-sim/build/).
astra-sim/build/astra_analytical/build/
**/CMakeCache.txt
**/CMakeFiles/
**/cmake_install.cmake
**/Makefile.cmake
**/*.o
**/*.a
**/*.so
**/*.dylib

# Python packaging artefacts (root-level only — pip install -e for a
# subdirectory submodule creates its own egg-info we want to keep).
/build/
/dist/
*.egg-info

# Notebook checkpoints
**/.ipynb_checkpoints

# Docs site (Docusaurus build deps + node_modules pull a lot)
docs/node_modules
docs/build
docs/.docusaurus

# Editor / OS junk
.DS_Store
.idea
.vscode
*.swp

# CI metadata that's irrelevant inside the image
.github
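
One way to check what these patterns leave in the build context is to build a throwaway image that copies the context and lists it. The sketch below is a hypothetical helper, not part of this PR: the names `probe_dockerfile` and `probe_context` are made up for illustration, and it assumes a local Docker daemon with BuildKit that can pull `busybox`.

```shell
# Hypothetical helper (not part of this PR): list what survives .dockerignore.
# Assumes Docker with BuildKit and network access to pull busybox.

# Print a minimal Dockerfile that copies the whole context and reports on it.
probe_dockerfile() {
  cat <<'EOF'
FROM busybox
COPY . /ctx
RUN du -sh /ctx && find /ctx -mindepth 1 -maxdepth 2 | sort
EOF
}

# Build with the Dockerfile on stdin; "$1" is the context directory, so the
# .dockerignore at its root is applied before COPY sees anything.
probe_context() {
  probe_dockerfile | docker build --no-cache --progress=plain -f - "${1:-.}"
}
```

Running `probe_context .` from the repo root prints the context size and its top two directory levels in the build log. Excluded paths such as `tests` or `.venv` should not appear there.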
2 changes: 1 addition & 1 deletion scripts/README.md
@@ -14,7 +14,7 @@ live with their module — only setup and build helpers are here.
| `install-vllm-cpu.sh` | Bare-metal vLLM **CPU** install. Builds vLLM 0.19.0 from source with `VLLM_TARGET_DEVICE=cpu` into `.venv-cpu` (separate from the GPU venv). Works on x86_64 and aarch64 (NVIDIA Grace). Used by `profiler/profile-cpu.sh`. |
| `install-sim.sh` | Bare-metal simulator install for fresh Docker containers / minimal Linux hosts (root or non-root). Installs apt build deps, inits submodules, pip-installs sim Python deps, builds ASTRA-Sim's analytical backend, and installs the Chakra converter (`--no-deps`). Skips vLLM — profiler/bench need `install-vllm.sh` and a GPU. Idempotent. |
| `compile.sh` | Build ASTRA-Sim's analytical backend and install the Chakra trace converter. |
| `sim.Dockerfile` | Ubuntu 24.04 base + `install-sim.sh`. Built and published to GHCR by `.github/workflows/build-sim-image.yml` on every push to `main` touching the install / build paths. Pull with `docker pull ghcr.io/psal-postech/llmservingsimspec/sim:latest`. |
| `sim.Dockerfile` | Multi-stage Ubuntu 24.04 image. Stage 1 compiles ASTRA-Sim with the C++ toolchain; stage 2 carries only the simulator runtime — python + the deps `serving/` actually imports + the compiled binary + chakra. No compilers, no `.git`, no `transformers / datasets / xgboost / matplotlib / scikit-learn` (those live in the vLLM image, `docker-vllm.sh`). `.dockerignore` keeps the build context lean (no `perf/`, `results/`, venv, CMake outputs). Built and published to GHCR by `.github/workflows/build-sim-image.yml` on every push to `main` touching the install / build paths. Pull with `docker pull ghcr.io/psal-postech/llmservingsimspec/sim:latest`. |

## Typical first-time setup

92 changes: 67 additions & 25 deletions scripts/sim.Dockerfile
@@ -1,45 +1,87 @@
# LLMServingSim simulator image.
# LLMServingSim simulator image (slim, multi-stage).
#
# Built by .github/workflows/build-sim-image.yml on every push to main
# that touches scripts/install-sim.sh, the astra-sim submodule, or this
# file. Published to ghcr.io/<owner>/<repo>/sim.
#
# Stage 1 (builder) carries the C++ toolchain to compile ASTRA-Sim;
# Stage 2 (runtime) carries only the python interpreter, the
# simulator's runtime python deps, the compiled ASTRA-Sim binary, the
# chakra package, and the repo source — no compilers, no headers, no
# .git history. .dockerignore at the repo root strips perf/results/
# venv/CMake artefacts from the context so neither stage sees them.
#
# install-sim.sh also pulls workload-generator / bench / power-model
# training deps (transformers, datasets, scikit-learn, xgboost,
# matplotlib). Those are NOT installed here: the sim image is for
# `python -m serving` only, and workload generation lives in the vLLM
# docker image (scripts/docker-vllm.sh).
#
# Local build:
# docker build -f scripts/sim.Dockerfile -t llmservingsim-sim .
#
# Local run (mount your working copy over /workspace if you want live
# edits — the image carries the repo state at build time):
# docker run -it --rm -v "$(pwd)":/workspace llmservingsim-sim bash

# ============================================================================
# Stage 1: builder — toolchain for ASTRA-Sim native build
# ============================================================================
FROM ubuntu:24.04 AS builder

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update -qq \
&& apt-get install -y -qq --no-install-recommends \
git ca-certificates \
build-essential cmake \
protobuf-compiler libprotobuf-dev \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /build
COPY . /build

# Init submodules, build the analytical backend, then aggressively
# prune everything we don't need at runtime: CMake build trees,
# downloaded cmake _deps, object files, every .git directory. The
# runtime stage copies the leftover tree as-is.
RUN git config --global --add safe.directory '*' \
&& git submodule update --init --recursive --depth 1 \
&& bash astra-sim/build/astra_analytical/build.sh \
&& rm -rf astra-sim/build/astra_analytical/build/CMakeFiles \
astra-sim/build/astra_analytical/build/_deps \
astra-sim/build/astra_analytical/build/CMakeCache.txt \
astra-sim/build/astra_analytical/build/cmake_install.cmake \
astra-sim/build/astra_analytical/build/Makefile \
&& find astra-sim -name '*.o' -delete \
&& find astra-sim -name '*.a' -delete \
&& find . -type d -name .git -prune -exec rm -rf {} + \
&& find . -type d -name __pycache__ -prune -exec rm -rf {} +
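
Note that `-type d -name .git` matches only directories. In a submodule worktree, `.git` is typically a small gitlink file rather than a directory, so it would survive this prune. A sketch on a throwaway tree (the layout under `$tmp` is invented for illustration and is not the repo's real layout):

```shell
# Throwaway tree: a top-level .git directory, a nested .git directory,
# and a submodule-style gitlink FILE named .git.
tmp=$(mktemp -d)
mkdir -p "$tmp/.git/modules/sub" "$tmp/sub/src" "$tmp/vendored/.git"
printf 'gitdir: ../.git/modules/sub\n' > "$tmp/sub/.git"

# Same prune as in the builder stage: only directories named .git are removed.
( cd "$tmp" && find . -type d -name .git -prune -exec rm -rf {} + )

# Whatever is still named .git after the prune.
leftover=$(cd "$tmp" && find . -name .git | sort)
echo "$leftover"

rm -rf "$tmp"
```

Only the gitlink file `./sub/.git` is left in the sketch. Such files are a few bytes each, so the image-size impact is negligible. Dropping `-type d` would remove them as well, if that is wanted.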

# ============================================================================
# Stage 2: runtime — slim, only what's needed to RUN the simulator
# ============================================================================
FROM ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive \
PIP_BREAK_SYSTEM_PACKAGES=1
PIP_BREAK_SYSTEM_PACKAGES=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1

# Pre-install the apt packages install-sim.sh expects. Doing this in a
# dedicated layer lets Docker cache them independently of the script's
# pip / submodule / build steps further down, which is what dominates
# rebuild time when the source changes.
# Runtime apt: python + pip + tls roots. No compilers, no git, no
# protobuf C++ runtime — the python protobuf wheel ships its own.
RUN apt-get update -qq \
&& apt-get install -y -qq --no-install-recommends \
git build-essential cmake \
protobuf-compiler libprotobuf-dev \
python3 python3-pip python-is-python3 \
ca-certificates curl \
&& rm -rf /var/lib/apt/lists/*
ca-certificates \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

WORKDIR /workspace
COPY --from=builder /build /workspace

# Copy the repo (including submodules — the workflow's checkout step
# fetches them on the runner so they're in the build context).
COPY . /workspace

# install-sim.sh is idempotent and detects pre-installed apt packages,
# so its apt phase is a no-op here; the work it actually does in this
# layer is: python deps, ASTRA-Sim analytical-backend build, chakra
# install. See the script for the full sequence.
RUN bash scripts/install-sim.sh
# Simulator runtime python deps. Narrower than scripts/install-sim.sh
# on purpose — see the header comment.
RUN pip3 install --quiet --no-input \
pyyaml pyinstrument msgspec \
pandas numpy rich protobuf \
&& pip3 install --quiet --no-input --no-deps \
/workspace/astra-sim/extern/graph_frontend/chakra \
&& { find / -type d -name __pycache__ -prune -exec rm -rf {} + 2>/dev/null || true; }
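
In a shell chain of the form `a && b && c || true`, `&&` and `||` have equal precedence and associate left to right, so an ungrouped trailing `|| true` also swallows a failure from `a` or `b`. Grouping the best-effort step limits the guard to that step. A minimal sketch with stand-in functions (`install_step` and `cleanup_step` are hypothetical names, not from this PR):

```shell
# Stand-ins for the real steps: the install fails, the cleanup is best-effort.
install_step() { return 1; }
cleanup_step() { return 0; }

# Ungrouped: parsed as (install_step && cleanup_step) || true,
# so the install failure is masked and the overall status is 0.
install_step && cleanup_step || true
ungrouped_status=$?

# Grouped: "|| true" guards only the cleanup; the install failure propagates.
install_step && { cleanup_step || true; }
grouped_status=$?

echo "ungrouped=$ungrouped_status grouped=$grouped_status"
```

In a Dockerfile `RUN`, the ungrouped form would let a failed `pip3 install` produce a successful layer, while the grouped form fails the build.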

# Default entry: interactive shell at the repo root. Match docker-sim.sh
# behaviour so the two surfaces (CI image / dev script) feel the same.
CMD ["/bin/bash"]