Keep .git/ in the sim image for revision tracking #40
Merged
Debugging which commit a running container was built from currently
requires reading /workspace/.git-rev (single SHA, no log/diff context)
because the builder stage explicitly wipes every .git/ subtree before
the COPY --from=builder. A bare SHA is enough to identify a release
tag but useless when the user wants to know which submodule SHAs went
into ASTRA-Sim, whether a hotfix patch landed, or what the last few
commits actually changed.
Changes:
- scripts/sim.Dockerfile builder stage: drop the
``find -name .git -prune -exec rm -rf {} +`` and replace with
``git gc --aggressive --prune=now`` on the main repo plus
``git submodule foreach --recursive 'git gc --aggressive
--prune=now'`` so submodule .git/modules/<name>/ packs get
compacted too (the bulk of the .git footprint).
- scripts/sim.Dockerfile runtime stage: install git, add
``git config --system --add safe.directory '*'`` (containers run as
root for build but the .git was created in stage 1; safe.directory
also covers the enroot --user / docker -u case).
- Update header comment + .dockerignore comment to reflect the new
contract (".git/ KEPT, gc'd").
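The builder-stage compaction described in the list above could be sketched as a shell function like the one below. This is only an illustration under assumptions: the function name `compact_git_tree` and the `/workspace` default are hypothetical, and the actual RUN step in scripts/sim.Dockerfile may be laid out differently. The `git gc` flags are the ones named in this commit message.

```shell
# Hypothetical helper (name is illustrative, not taken from the Dockerfile).
# Compacts the main repo, then every submodule, recursively.
compact_git_tree() {
    repo_dir="${1:-/workspace}"

    # Main repo: a failure here is treated as a real error.
    git -C "$repo_dir" gc --aggressive --prune=now --quiet || return 1

    # Submodules: the subshell plus "|| true" limits the tolerance to this
    # one command, so a failing submodule gc does not fail the function.
    ( git -C "$repo_dir" submodule foreach --recursive \
          'git gc --aggressive --prune=now --quiet' || true )
}
```

Keeping the `|| true` inside the subshell means only the submodule pass is forgiven; anything chained after the function still sees a failure from the main-repo gc.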
After this lands, every published sim image can self-report:
docker run --rm <image> bash -lc 'cd /workspace && git log -3 --oneline'
docker run --rm <image> bash -lc 'cd /workspace && git submodule status --recursive'
docker run --rm <image> bash -lc 'cat /workspace/.git-rev'
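With both /workspace/.git-rev and the .git/ tree present, the two can be cross-checked against each other. The sketch below is a hypothetical helper (the name `check_recorded_rev` is not part of this change) and assumes that .git-rev holds a full, unabbreviated SHA on a single line.

```shell
# Hypothetical helper: compare the SHA recorded in .git-rev with HEAD of the
# kept .git/ tree. Assumes .git-rev contains one full (unabbreviated) SHA.
check_recorded_rev() {
    workspace="${1:-/workspace}"

    # Strip the trailing newline and any stray whitespace from the file.
    recorded=$(tr -d '[:space:]' < "$workspace/.git-rev") || return 2
    actual=$(git -C "$workspace" rev-parse HEAD) || return 2

    if [ "$recorded" = "$actual" ]; then
        echo "OK $actual"
    else
        echo "MISMATCH recorded=$recorded head=$actual"
        return 1
    fi
}
```

Exit status 0 means the two agree, 1 means they differ, and 2 means one of the inputs could not be read.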
Size impact (measured on a full dev clone of this repo):
before gc: 305 MB total .git/ (root + .git/modules/* packs)
after gc: 210 MB (~31% reduction)
CI shallow checkouts (actions/checkout default ``fetch-depth: 1``
with ``submodules: recursive``) start much smaller, so the runtime
image overhead from the published builds will be well below 210 MB.
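The reduction figure above is simple arithmetic on the two measurements. A minimal sketch for repeating the measurement on another clone follows; the helper names `dir_kb` and `pct_reduction` are hypothetical, and the percentage uses integer division, so it rounds down.

```shell
# Hypothetical helpers for repeating the size measurement.

# Size of a directory in KiB (du -sk is POSIX).
dir_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Integer percent reduction from BEFORE to AFTER (same unit), rounded down.
pct_reduction() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Figures quoted in this commit message: 305 MB before gc, 210 MB after.
pct_reduction 305 210   # prints 31
```

To measure a real clone, one would call `dir_kb .git` before and after the gc pass and feed both values to `pct_reduction`.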
Summary
Currently the sim image only carries a single ``/workspace/.git-rev`` line because the builder stage runs ``find -name .git -prune -exec rm -rf {} +``. That's enough to identify a tag but not to do real debugging: no ``git log`` for context, no ``git submodule status`` for submodule SHAs (chakra, ns-3, etc.), no ``git diff`` for hotfix verification.

This patch keeps ``.git/`` (main repo + recursive submodules), installs ``git`` in the runtime stage, and runs ``git gc --aggressive --prune=now`` everywhere to compact packs.

Changes
- ``scripts/sim.Dockerfile`` builder stage: replace the .git wipe with ``git gc --aggressive --prune=now`` on the main repo plus ``git submodule foreach --recursive 'git gc --aggressive --prune=now'`` so submodule ``.git/modules/<name>/`` packs (the bulk of the footprint) get compacted too. The submodule foreach is wrapped in a subshell + ``|| true`` so a single submodule failure doesn't abort the build, scoped so the rest of the ``&&`` chain still aborts on real errors.
- ``scripts/sim.Dockerfile`` runtime stage: install ``git``, add ``git config --system --add safe.directory '*'`` (root-owned .git tree may now be inspected by enroot ``--user`` / docker ``-u`` UIDs).
- Update header comment + ``.dockerignore`` comment to document the new contract.

After this lands
Every published sim image can self-report:
docker run --rm <image> bash -lc 'cd /workspace && git log -3 --oneline'
docker run --rm <image> bash -lc 'cd /workspace && git submodule status --recursive'
docker run --rm <image> bash -lc 'cat /workspace/.git-rev'

Size impact (measured)
Smoke test on a full dev clone of this repo (deep history; CI runs are shallower):
``.git/`` total: 305 MB before gc, 210 MB after gc (~31% reduction)
The root ``.git/`` shrinks ~1 MB because it's already packed; the real win is on ``.git/modules/*`` (278 MB of submodule packs in this dev clone -> ~180 MB after ``submodule foreach gc``).
CI shallow checkouts (``actions/checkout`` with default ``fetch-depth: 1`` + ``submodules: recursive``) start much smaller, so the runtime image overhead from published builds will be well below 210 MB.

Interaction with #39
Both PRs touch ``scripts/sim.Dockerfile``. #39 adds ``ARG GIT_SHA`` + an OCI label for fast version inspection without running the container; this PR adds the full ``.git/`` tree for deep debugging. They're complementary, but whichever merges second will need a tiny rebase (the header comment block and the same ``RUN`` step). No semantic conflict.

Test plan
- ``linux/amd64`` and ``linux/arm64``.
- ``docker run --rm ghcr.io/psal-postech/llmservingsimspec/sim:latest cat /workspace/.git-rev`` returns the merge commit SHA.
- ``docker run --rm <image> bash -lc 'cd /workspace && git log -1 --format=%H'`` matches that SHA.
- ``docker run --rm <image> bash -lc 'cd /workspace && git submodule status --recursive'`` lists every submodule with its SHA -- including ``astra-sim`` and the sub-submodules under ``astra-sim/extern/``.
- ``docker image inspect <image> --format '{{.Size}}'`` is within ~250 MB of the previous published image size (sanity check on .git bloat).

Generated by Claude Code