Consolidate binaries; add 'stream' subcommand for direct-to-remote acquisition #810
Open
demoray wants to merge 6 commits into
Add an opt-in --stream-to-blob mode that writes the in-progress memory
snapshot directly to a Block Blob via stage_block + commit_block_list,
without ever touching local disk. The default
snapshot-to-file-then-upload flow is unchanged.

Library additions:
- image::Image::with_dst<W: Write> opens the memory source as a File
  but threads an arbitrary writer through as the destination.
- snapshot::Snapshot::create_to_writer<W: Write> picks one source up
  front (Source::probe_single_source helper, factored from the
  /dev/stdout branch) and writes to the supplied writer. No source
  auto-retry; blob writes cannot be discarded.
- upload::stream::BlockBlobStream wraps tokio_util::io::SyncIoBridge
  over an internal AsyncWrite. The async side buffers up to block_size,
  then hands each full block to a bounded uploader task that gates
  concurrency via a Semaphore. finalize() awaits in-flight stages then
  commits; abort() awaits in-flight stages but skips commit.
- A pub(crate) BlockStager trait abstracts the two SDK methods so tests
  can substitute an in-memory FakeStager. Eight unit tests cover buffer
  rotation, block-id uniformity, trailing-partial flush, stage failure
  surfaced on finalize, abort path, and concurrency bound.

Binary wiring (bin/avml.rs):
- --stream-to-blob flag, requires --sas-url, conflicts with --url,
  --max-disk-usage, --max-disk-usage-percentage, and --delete (all hard
  errors at parse time).
- filename is required_unless_present = stream_to_blob.
- Block size derived from disk_usage::estimate to fit in Azure's
  50,000-block-per-blob cap; --sas-block-size is a floor.
- Snapshot runs inside spawn_blocking so SyncIoBridge's block_on
  doesn't deadlock the current_thread runtime. The (stream, result)
  tuple comes back so abort always runs on partial failure.

No public API breakage: only additive items (Image::with_dst,
Snapshot::create_to_writer, Snapshot::probe_single_source,
BlockBlobStream, BLOB_MAX_BLOCKS, BlobError re-export).

Cargo.toml: enable tokio-util/io-util and tokio/sync under the
blobstore feature.

Verified: cargo fmt --check, full pedantic clippy (-D warnings), cargo
test --all-features (36 lib + 2 convert + 1 doctest), cargo
semver-checks check-release, and musl release build.
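The buffering behaviour described above (fill a buffer up to block_size, hand off each full block, flush a trailing partial block at the end) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: StageBlock, MemoryStager, and BlockWriter are hypothetical names, staging is synchronous, and there is no Semaphore-bounded uploader task or commit step.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the PR's BlockStager: receives each finished block.
trait StageBlock {
    fn stage(&mut self, index: usize, block: Vec<u8>) -> io::Result<()>;
}

/// Keeps staged blocks in memory (an assumption, loosely modeled on the FakeStager idea).
#[derive(Default)]
struct MemoryStager {
    blocks: Vec<Vec<u8>>,
}

impl StageBlock for MemoryStager {
    fn stage(&mut self, index: usize, block: Vec<u8>) -> io::Result<()> {
        assert_eq!(index, self.blocks.len(), "blocks must arrive in order");
        self.blocks.push(block);
        Ok(())
    }
}

/// Buffers writes and emits one block each time `block_size` bytes accumulate.
struct BlockWriter<S: StageBlock> {
    stager: S,
    block_size: usize,
    buf: Vec<u8>,
    next_index: usize,
}

impl<S: StageBlock> BlockWriter<S> {
    fn new(stager: S, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { stager, block_size, buf: Vec::with_capacity(block_size), next_index: 0 }
    }

    /// Swaps in a fresh buffer and hands the filled one to the stager.
    fn rotate(&mut self) -> io::Result<()> {
        let block = std::mem::replace(&mut self.buf, Vec::with_capacity(self.block_size));
        self.stager.stage(self.next_index, block)?;
        self.next_index += 1;
        Ok(())
    }

    /// Flushes the trailing partial block, if any, and returns the stager.
    fn finish(mut self) -> io::Result<S> {
        if !self.buf.is_empty() {
            self.rotate()?;
        }
        Ok(self.stager)
    }
}

impl<S: StageBlock> Write for BlockWriter<S> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Accept only as much as fits in the current block; write_all loops for the rest.
        let room = self.block_size - self.buf.len();
        let take = room.min(data.len());
        self.buf.extend_from_slice(&data[..take]);
        if self.buf.len() == self.block_size {
            self.rotate()?;
        }
        Ok(take)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut writer = BlockWriter::new(MemoryStager::default(), 4);
    writer.write_all(b"0123456789")?;
    let stager = writer.finish()?;
    let sizes: Vec<usize> = stager.blocks.iter().map(Vec::len).collect();
    println!("{sizes:?}"); // prints [4, 4, 2]
    Ok(())
}
```

Every block except possibly the last has exactly block_size bytes, which is what keeps the block count predictable against a fixed per-blob cap.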
Snapshot streaming is a different workflow from acquire-then-upload —
it picks one memory source up front (no auto-retry), never touches
local disk, and has its own destination protocol. Putting it in a
separate binary mirrors how avml-convert and avml-upload are handled,
and keeps the avml CLI flat (no acquire/stream subcommand split).
avml: revert to the pre-streaming flat CLI shape. The --stream-to-blob
flag and its handler move out; --max-disk-usage, --sas-url, --delete
etc. remain unchanged. The mandatory positional <FILENAME> argument
is now always a plain PathBuf, no more cfg-gated Option<PathBuf>.
avml-stream: new binary, requires the blobstore feature. Subcommand
shape:
avml-stream blob <SAS_URL> [--compress] [--source S]
[--sas-block-size N]
[--sas-block-concurrency N]
The subcommand layer leaves room for adding additional destination
protocols later (e.g., a future 'avml-stream s3-multipart' against
the AWS SDK) without restructuring.
Cargo.toml: register the new [[bin]] with required-features =
['blobstore']. eng/build.sh and the linux artifact upload paths in
build.yml include the new binary. README points at avml-stream for
the no-local-file streaming workflow.
Verified: fmt, full pedantic clippy, all tests (36 lib + 2 convert +
1 doctest), semver-checks (additive only, no version bump), musl
release builds with --all-features and --no-default-features.
--help output of avml is unchanged from main; avml-stream's --help
shows the new subcommand surface.
Streams the snapshot bytes sequentially to a remote TCP listener: the
collector runs something like 'nc -l 9000 > snapshot.lime' and avml
connects as a client.

Implementation is direct: open a tokio::net::TcpStream, wrap in
SyncIoBridge, hand the sync writer to Snapshot::create_to_writer. No
staging, no commit dance, no length prefix; the protocol is just
bytes-on-the-wire in source order.

Failure mode: a connection drop mid-stream surfaces as an io::Error
from SyncIoBridge's write call, which becomes snapshot::Error::Disk
(via image::Error::Io) and propagates out of create_to_writer. No
retry; the source was committed-to up front, the destination has no
resume.

No TLS; pair with an SSH tunnel or stunnel if confidentiality and
integrity matter. Documented in README.

Cargo.toml: add 'net' to tokio's feature list (already optional under
put/blobstore; non-network builds unaffected).

Verified: fmt, full pedantic clippy, all tests (36 lib + 2 convert +
1 doctest), semver-checks (no public-API change), musl release build,
'avml-stream --help' showing both 'blob' and 'tcp' subcommands.
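The wire protocol described above (connect, write the bytes in order, close) is small enough to sketch end to end. The version below is a simplification under stated assumptions: it uses the blocking std::net::TcpStream rather than tokio::net::TcpStream wrapped in SyncIoBridge, and stream_to_tcp and collect_once are hypothetical helper names. A loopback listener stands in for the 'nc -l' collector.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream, ToSocketAddrs};
use std::thread;

/// Sends `src` to `addr` as raw bytes in order: no framing, no length prefix.
fn stream_to_tcp<R: Read, A: ToSocketAddrs>(mut src: R, addr: A) -> io::Result<u64> {
    let mut conn = TcpStream::connect(addr)?;
    // Any mid-stream connection drop surfaces here as an io::Error; there is no retry.
    let sent = io::copy(&mut src, &mut conn)?;
    conn.flush()?;
    // Dropping `conn` closes the socket; EOF is the only end-of-snapshot signal.
    Ok(sent)
}

/// Stand-in for the collector: accept one client and read until EOF.
fn collect_once(listener: TcpListener) -> io::Result<Vec<u8>> {
    let (mut conn, _peer) = listener.accept()?;
    let mut received = Vec::new();
    conn.read_to_end(&mut received)?;
    Ok(received)
}

fn main() -> io::Result<()> {
    // Port 0 asks the OS for any free loopback port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let collector = thread::spawn(move || collect_once(listener));

    let payload = vec![0xA5u8; 64 * 1024];
    let sent = stream_to_tcp(payload.as_slice(), addr)?;
    let received = collector.join().expect("collector thread panicked")?;

    assert_eq!(sent as usize, received.len());
    assert_eq!(payload, received);
    println!("streamed {sent} bytes");
    Ok(())
}
```

Because the receiver cannot distinguish a clean EOF from a truncated stream, a collector would need an out-of-band check (for example comparing sizes or hashes) to confirm a complete snapshot.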
Single avml binary with subcommands:
avml acquire FILE [--compress] [--source S] [--max-disk-usage N]
[--max-disk-usage-percentage P]
[--url U] [--delete]
[--sas-url U] [--sas-block-size N] [--sas-block-concurrency N]
avml convert [--source-format F] [--format F] SRC DST
avml upload put FILE URL
avml upload blob FILE URL [--sas-block-size N] [--sas-block-concurrency N]
avml stream blob SAS_URL [--compress] [--source S]
[--sas-block-size N] [--sas-block-concurrency N]
avml stream tcp ADDR [--compress] [--source S]
Three new top-level features control which subcommands the binary
includes:
convert = [] # no extra deps
upload = ["put", "blobstore"] # subsumes both upload backends
stream = ["blobstore", "tokio/net"] # blob + tcp streaming
default = ["stream", "upload", "convert", "native-tls"].
A --no-default-features build produces an acquire-only binary
(the old "avml-minimal" shape).
Layout: src/bin/avml/main.rs + sibling modules (acquire.rs, convert.rs,
upload.rs, stream.rs), wired via an explicit [[bin]] path so cargo's
bin autodiscovery doesn't try to compile each submodule as its own
binary.
eng/build.sh and the GitHub Actions workflow drop the separate
'--bin avml-upload' step and the avml-{convert,upload,stream}
artifact paths; everything ships in the one avml binary now.
README, eng/test-azure-image.sh, and the CI smoke tests are updated
to use 'avml acquire'.
Breaking CLI changes for users:
avml output.lime -> avml acquire output.lime
avml-convert ... -> avml convert ...
avml-upload put ... -> avml upload put ...
avml-upload upload-blob -> avml upload blob
avml-stream blob ... -> avml stream blob ...
avml-stream tcp ... -> avml stream tcp ...
Verified: fmt, full pedantic clippy across 8 feature combinations
(default, no-default, +convert, +upload, +stream, +convert+upload,
+convert+stream, +upload+stream), all tests (36 lib + 2 convert + 1
doctest), semver-checks (additive only), musl release builds with
--all-features and --no-default-features, --help output for every
subcommand verified end-to-end.
The acquire and stream subcommands depend on Linux-specific kernel
interfaces (/proc/iomem, /proc/kcore, /dev/crash, /dev/mem); they
don't work on macOS, the BSDs, or Windows.

Cfg-gate the subcommand modules and matching Commands enum variants on
'target_os = "linux"' so non-Linux builds drop them from the binary
entirely. Help text and dispatch follow.

On Windows / macOS / BSD with default features, the resulting binary
ships only 'convert' and 'upload'. The library-level streaming
primitives (BlockBlobStream etc.) still compile and remain available
to library consumers regardless of target.

Verified: full pedantic clippy, all tests, cargo check against
x86_64-pc-windows-msvc with --all-features (and with stream-only,
which compiles but produces a no-subcommand binary on Windows).
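The gating pattern can be shown in miniature. The sketch below is an assumption-laden illustration, not the PR's code: it uses a hand-rolled parse function where the real binary has a CLI layer, omits the Cargo feature gates, and the variant names are chosen to match the subcommand names in this PR. On non-Linux targets the compiler may warn about an unused `mut`.

```rust
/// Subcommands compiled into the binary; the Linux-only ones are cfg-gated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Commands {
    #[cfg(target_os = "linux")]
    Acquire,
    Convert,
    Upload,
    #[cfg(target_os = "linux")]
    Stream,
}

/// Maps a subcommand name to a variant; gated names simply do not exist off Linux.
fn parse(name: &str) -> Option<Commands> {
    match name {
        #[cfg(target_os = "linux")]
        "acquire" => Some(Commands::Acquire),
        "convert" => Some(Commands::Convert),
        "upload" => Some(Commands::Upload),
        #[cfg(target_os = "linux")]
        "stream" => Some(Commands::Stream),
        _ => None,
    }
}

/// Lists the subcommands this build supports, as help text might.
fn available() -> Vec<Commands> {
    let mut cmds = vec![Commands::Convert, Commands::Upload];
    #[cfg(target_os = "linux")]
    {
        cmds.push(Commands::Acquire);
        cmds.push(Commands::Stream);
    }
    cmds
}

fn main() {
    println!("available subcommands: {:?}", available());
    for name in ["acquire", "convert", "upload", "stream"] {
        println!("{name}: {:?}", parse(name));
    }
}
```

Because the variants are removed at compile time, the dispatch match stays exhaustive on every target without a catch-all "unsupported platform" arm.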
Consolidate the four standalone binaries (avml, avml-convert, avml-upload, avml-stream) into a single avml binary with subcommands, and add direct-to-remote-storage streaming as a first-class capability.
CLI shape
Feature flags
Three new top-level features gate which subcommands the binary includes:
Feature   Subcommands              Pulls in
convert   avml convert             (none)
upload    avml upload {put,blob}   put + blobstore
stream    avml stream {blob,tcp}   blobstore + tokio/net
default = ["stream", "upload", "convert", "native-tls"]. The existing lower-level put/blobstore features stay for library consumers who want them in isolation. cargo build --no-default-features still produces an acquire-only binary (the previous avml-minimal shape).
Streaming subcommand
For hosts where writing the snapshot to a local file first is undesirable (read-only root, limited disk, forensic chain-of-custody concerns), stream writes bytes sequentially to a remote destination without touching local disk.
- stream blob: Azure Block Blob via stage_block + commit_block_list. Block size is derived from the iomem estimate so the snapshot fits within Azure's 50,000-block-per-blob limit (--sas-block-size is a floor, not an override). --sas-block-concurrency caps in-flight stage_block calls via a tokio::sync::Semaphore; peak RAM is roughly (concurrency + 1) * block_size. On mid-stream failure, staged blocks are abandoned without commit_block_list; Azure discards them per its 7-day default.
- stream tcp: opens a single tokio::net::TcpStream::connect and writes the bytes in order; intended for collectors like nc -l PORT > snapshot.lime. No TLS; pair with an SSH tunnel or stunnel for confidentiality.
Both subcommands pick a memory source once up front (honoring --source if given, otherwise probing kcore → /dev/crash → /dev/mem). Blob writes are not rewindable, so there is no automatic source fallback once any bytes have been emitted.
New library surface (all additive; semver-checks passes)
- image::Image::with_dst<W: Write>(format, src, dst: W) -> Result<Image<File, W>>: generic-destination companion to Image::new. Lets callers thread arbitrary writers (e.g. BlockBlobStream) through.
- snapshot::Snapshot::create_to_writer<W: Write>(&self, dst: W): alternative to create() for non-file destinations. Never auto-retries across sources.
- snapshot::Snapshot::probe_single_source() -> Result<Source>: factored helper from the existing /dev/stdout branch.
- upload::stream::BlockBlobStream { writer(), finish_writes(), finalize().await, abort().await }: sync-write bridge over an AsyncWrite via tokio_util::io::SyncIoBridge; uploader task with Semaphore-bounded concurrency.
Layout
Wired via [[bin]] path = "src/bin/avml/main.rs" so cargo's bin autodiscovery doesn't treat each sibling as a separate binary.
Breaking CLI changes for users
avml output.lime -> avml acquire output.lime
avml-convert ... -> avml convert ...
avml-upload put FILE URL -> avml upload put FILE URL
avml-upload upload-blob FILE URL -> avml upload blob FILE URL
The previous eng/test-azure-image.sh and the GitHub Actions smoke tests are already updated. Downstream consumers that pin command lines (Azure VM Extensions configs, custom scripts) need the same one-token change.
Verification
- cargo fmt --check
- cargo clippy --locked --all-targets --all-features -- -D warnings -D clippy::pedantic -A clippy::missing_errors_doc
- clippy clean across --no-default-features, +convert, +upload, +stream, and pairwise combinations.
- cargo test --all-features: 36 lib (was 28; +8 new BlockBlobStream unit tests against an in-memory FakeStager) + 2 convert + 1 doctest.
- cargo semver-checks check-release: no semver update required (only additive public-API changes).
- cargo build --release --target x86_64-unknown-linux-musl --locked --all-features and --no-default-features both clean.
- --help output exercised end-to-end for every subcommand.
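As a rough illustration of the block-size rule and the peak-RAM estimate from the streaming section, the arithmetic can be written out directly. This is a sketch under assumptions: block_size_for and peak_buffer_bytes are hypothetical helpers, the input is taken to be a size estimate in bytes, and the PR's actual derivation may round differently or add headroom. Only the 50,000-block cap, the floor semantics of --sas-block-size, and the (concurrency + 1) * block_size estimate come from the description.

```rust
/// Azure's per-blob block cap, as stated in the PR description.
const BLOB_MAX_BLOCKS: u64 = 50_000;

/// Smallest block size that fits `estimated_bytes` within the cap, never below `floor`.
fn block_size_for(estimated_bytes: u64, floor: u64) -> u64 {
    // Ceiling division without relying on newer integer helpers.
    let needed = estimated_bytes / BLOB_MAX_BLOCKS
        + u64::from(estimated_bytes % BLOB_MAX_BLOCKS != 0);
    // The user-supplied size is a floor, not an override; also avoid a zero-sized block.
    needed.max(floor).max(1)
}

/// Rough peak buffer memory: one block being filled plus `concurrency` in flight.
fn peak_buffer_bytes(block_size: u64, concurrency: u64) -> u64 {
    (concurrency + 1) * block_size
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    let estimate = 64 * 1024 * MIB; // a 64 GiB memory estimate

    // With a 1 MiB floor the cap dictates the size; with an 8 MiB floor the floor wins.
    let small_floor = block_size_for(estimate, MIB);
    let large_floor = block_size_for(estimate, 8 * MIB);
    println!("1 MiB floor -> {small_floor} bytes per block");
    println!("8 MiB floor -> {large_floor} bytes per block");
    println!("peak buffers at concurrency 4: {} bytes", peak_buffer_bytes(large_floor, 4));
}
```

For the 64 GiB case a 1 MiB floor is too small to stay under the cap, so the derived size (about 1.31 MiB) is used instead; raising concurrency then trades memory for upload throughput in block_size increments.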