Consolidate binaries; add 'stream' subcommand for direct-to-remote acquisition #810

Open
demoray wants to merge 6 commits into main from bcaswell/stream-to-blob

demoray commented May 26, 2026

Consolidate the four standalone binaries (avml, avml-convert, avml-upload, avml-stream) into a single avml binary with subcommands, and add direct-to-remote-storage streaming as a first-class capability.

CLI shape

avml acquire FILE [--compress] [--source S] [--max-disk-usage N]
                  [--max-disk-usage-percentage P]
                  [--url U] [--delete]
                  [--sas-url U] [--sas-block-size N] [--sas-block-concurrency N]
avml convert [--source-format F] [--format F] SRC DST
avml upload  put  FILE URL
avml upload  blob FILE URL [--sas-block-size N] [--sas-block-concurrency N]
avml stream  blob SAS_URL [--compress] [--source S]
                          [--sas-block-size N] [--sas-block-concurrency N]
avml stream  tcp  ADDR    [--compress] [--source S]

Feature flags

Three new top-level features gate which subcommands the binary includes:

| feature | enables | extra deps |
| --- | --- | --- |
| convert | avml convert | none |
| upload | avml upload {put,blob} | put + blobstore |
| stream | avml stream {blob,tcp} | blobstore + tokio/net |

default = ["stream", "upload", "convert", "native-tls"]. The existing lower-level put/blobstore features stay for library consumers who want them in isolation. cargo build --no-default-features still produces an acquire-only binary (the previous avml-minimal shape).
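As a rough illustration of how feature gating determines the subcommand set, the sketch below lists what a given build would expose. The feature names come from the table above; the function name `compiled_subcommands` is hypothetical, and the real binary presumably gates enum variants and modules with `#[cfg(...)]` rather than building a list at run time.

```rust
/// Lists the subcommands this build would expose, based on the three
/// top-level features named in the PR description.
/// `acquire` is not feature-gated, so it is always listed here.
/// (Hypothetical helper for illustration; not part of the avml codebase.)
fn compiled_subcommands() -> Vec<&'static str> {
    let mut cmds = vec!["acquire"];
    // `cfg!` evaluates to a compile-time boolean for the given predicate.
    if cfg!(feature = "convert") {
        cmds.push("convert");
    }
    if cfg!(feature = "upload") {
        cmds.push("upload");
    }
    if cfg!(feature = "stream") {
        cmds.push("stream");
    }
    cmds
}

fn main() {
    println!("subcommands: {}", compiled_subcommands().join(", "));
}
```

Compiled with no features enabled, the list contains only `acquire`, which corresponds to the `--no-default-features` shape described above.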

Streaming subcommand

For hosts where writing the snapshot to a local file first is undesirable (read-only root, limited disk, forensic chain-of-custody concerns), stream writes bytes sequentially to a remote destination without touching local disk.

  • stream blob: Azure Block Blob via stage_block + commit_block_list. Block size is derived from the iomem estimate so the snapshot fits within Azure's 50,000-block-per-blob limit (--sas-block-size is a floor, not an override). --sas-block-concurrency caps in-flight stage_block calls via a tokio::sync::Semaphore; peak RAM is roughly (concurrency + 1) * block_size. On mid-stream failure, staged blocks are abandoned without calling commit_block_list; Azure garbage-collects uncommitted blocks after roughly seven days.
  • stream tcp: opens a single connection with tokio::net::TcpStream::connect and writes the bytes in order; intended for collectors like nc -l PORT > snapshot.lime. No TLS; pair with an SSH tunnel or stunnel for confidentiality.
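The block-size rule and the RAM estimate above can be sketched as plain arithmetic. This is a minimal model, not the PR's code: the function names `derive_block_size` and `peak_ram` are hypothetical, the 8 MiB floor and concurrency of 4 are example values, and the real derivation may round or align differently.

```rust
/// Azure's cap on the number of blocks in one block blob (from the PR text).
const BLOB_MAX_BLOCKS: u64 = 50_000;

/// Smallest block size that lets `estimated_bytes` fit within the block cap,
/// never going below the caller-supplied floor (the --sas-block-size value).
fn derive_block_size(estimated_bytes: u64, floor: u64) -> u64 {
    let needed = estimated_bytes.div_ceil(BLOB_MAX_BLOCKS);
    needed.max(floor).max(1)
}

/// Rough peak buffer memory: one block being filled plus up to
/// `concurrency` blocks in flight.
fn peak_ram(block_size: u64, concurrency: u64) -> u64 {
    (concurrency + 1) * block_size
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    const GIB: u64 = 1024 * MIB;

    // 64 GiB fits in 50,000 blocks of 8 MiB, so the floor wins.
    let small = derive_block_size(64 * GIB, 8 * MIB);
    // 1 TiB does not, so the derived size exceeds the floor.
    let large = derive_block_size(1024 * GIB, 8 * MIB);

    println!("64 GiB -> {} bytes per block", small);
    println!("1 TiB  -> {} bytes per block", large);
    println!("peak RAM at concurrency 4: {} bytes", peak_ram(small, 4));
}
```

The point of treating `--sas-block-size` as a floor is visible in the second case: honoring a smaller user-supplied size would need more than 50,000 blocks and the final commit would fail.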

Both subcommands pick a memory source once up front (honoring --source if given, otherwise probing kcore → /dev/crash → /dev/mem). Blob writes are not rewindable, so there is no automatic source fallback once any bytes have been emitted.
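The pick-once behaviour can be sketched as follows. The `Source` enum, its variant names, and `pick_source` are illustrative stand-ins rather than the crate's actual types, and whether an explicit `--source` is also probed before use is an assumption left out of this sketch.

```rust
/// Candidate memory sources, in the probe order given in the PR text.
/// (Illustrative enum; the real crate's type may differ.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    ProcKcore,
    DevCrash,
    DevMem,
}

impl Source {
    fn path(self) -> &'static str {
        match self {
            Source::ProcKcore => "/proc/kcore",
            Source::DevCrash => "/dev/crash",
            Source::DevMem => "/dev/mem",
        }
    }
}

/// Picks exactly one source up front: the explicit choice if given,
/// otherwise the first candidate that `usable` accepts.
/// There is no fallback after this point.
fn pick_source(explicit: Option<Source>, usable: impl Fn(Source) -> bool) -> Option<Source> {
    if let Some(source) = explicit {
        return Some(source);
    }
    [Source::ProcKcore, Source::DevCrash, Source::DevMem]
        .iter()
        .copied()
        .find(|s| usable(*s))
}

fn main() {
    // Treat "can be opened for reading" as the usability check.
    let chosen = pick_source(None, |s| std::fs::File::open(s.path()).is_ok());
    println!("chosen source: {:?}", chosen);
}
```

Because the destination cannot be rewound, the choice is made before any bytes are written; a source that fails halfway through ends the run instead of triggering a retry with the next candidate.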

New library surface (all additive — semver-checks passes)

  • image::Image::with_dst<W: Write>(format, src, dst: W) -> Result<Image<File, W>> — generic-destination companion to Image::new. Lets callers thread arbitrary writers (e.g. BlockBlobStream) through.
  • snapshot::Snapshot::create_to_writer<W: Write>(&self, dst: W) — alternative to create() for non-file destinations. Never auto-retries across sources.
  • snapshot::Snapshot::probe_single_source() -> Result<Source> — factored helper from the existing /dev/stdout branch.
  • upload::stream::BlockBlobStream { writer(), finish_writes(), finalize().await, abort().await } — sync-write bridge over an AsyncWrite via tokio_util::io::SyncIoBridge; uploader task with Semaphore-bounded concurrency.
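To make the generic-writer idea concrete, here is a simplified, synchronous model of threading a block-cutting writer through a `W: Write` destination. It is a sketch under stated assumptions: `BlockStager` and `FakeStager` borrow names from the commit message, but `BlockWriter`, `copy_to_writer`, and all signatures are hypothetical, and the real `BlockBlobStream` is async with concurrent uploads.

```rust
use std::io::{self, Read, Write};

/// Stand-in for the two SDK calls (signatures are assumptions).
trait BlockStager {
    fn stage_block(&mut self, id: String, data: Vec<u8>) -> io::Result<()>;
    fn commit_block_list(&mut self, ids: &[String]) -> io::Result<()>;
}

/// Sync writer that cuts the byte stream into fixed-size blocks.
struct BlockWriter<S: BlockStager> {
    stager: S,
    block_size: usize,
    buf: Vec<u8>,
    ids: Vec<String>,
}

impl<S: BlockStager> BlockWriter<S> {
    fn new(stager: S, block_size: usize) -> Self {
        assert!(block_size > 0);
        Self { stager, block_size, buf: Vec::with_capacity(block_size), ids: Vec::new() }
    }

    /// Hands the current buffer to the stager and starts a fresh one.
    fn rotate(&mut self) -> io::Result<()> {
        // Fixed-width ids keep every id the same length. Real Azure block
        // ids must additionally be base64-encoded; omitted here.
        let id = format!("{:08}", self.ids.len());
        let data = std::mem::replace(&mut self.buf, Vec::with_capacity(self.block_size));
        self.stager.stage_block(id.clone(), data)?;
        self.ids.push(id);
        Ok(())
    }

    /// Flushes the trailing partial block, then commits the block list.
    fn finalize(mut self) -> io::Result<S> {
        if !self.buf.is_empty() {
            self.rotate()?;
        }
        self.stager.commit_block_list(&self.ids)?;
        Ok(self.stager)
    }
}

impl<S: BlockStager> Write for BlockWriter<S> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Accept only what fits in the current block; callers using
        // write_all or io::copy will loop for the remainder.
        let n = (self.block_size - self.buf.len()).min(data.len());
        self.buf.extend_from_slice(&data[..n]);
        if self.buf.len() == self.block_size {
            self.rotate()?;
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// In-memory stager that records what it was given.
#[derive(Default)]
struct FakeStager {
    staged: Vec<(String, Vec<u8>)>,
    committed: Vec<String>,
}

impl BlockStager for FakeStager {
    fn stage_block(&mut self, id: String, data: Vec<u8>) -> io::Result<()> {
        self.staged.push((id, data));
        Ok(())
    }
    fn commit_block_list(&mut self, ids: &[String]) -> io::Result<()> {
        self.committed = ids.to_vec();
        Ok(())
    }
}

/// Generic-destination copy, in the spirit of a `W: Write` parameter.
fn copy_to_writer<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<W> {
    io::copy(&mut src, &mut dst)?;
    Ok(dst)
}

fn main() -> io::Result<()> {
    let src: &[u8] = b"0123456789";
    let writer = copy_to_writer(src, BlockWriter::new(FakeStager::default(), 4))?;
    let stager = writer.finalize()?;
    println!("committed ids: {:?}", stager.committed);
    Ok(())
}
```

The caller owns the writer again after the copy, which is what allows an explicit finalize (or abort) step once the snapshot has been written.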

Layout

src/bin/avml/main.rs        (Cmd + dispatch)
src/bin/avml/acquire.rs
src/bin/avml/convert.rs     (was src/bin/avml-convert.rs)
src/bin/avml/upload.rs      (was src/bin/avml-upload.rs)
src/bin/avml/stream.rs

Wired via [[bin]] path = "src/bin/avml/main.rs" so cargo's bin autodiscovery doesn't treat each sibling as a separate binary.

Breaking CLI changes for users

| before | after |
| --- | --- |
| avml output.lime | avml acquire output.lime |
| avml-convert ... | avml convert ... |
| avml-upload put FILE URL | avml upload put FILE URL |
| avml-upload upload-blob FILE URL | avml upload blob FILE URL |

eng/test-azure-image.sh and the GitHub Actions smoke tests are already updated. Downstream consumers that pin command lines (Azure VM Extensions configs, custom scripts) need the equivalent rewrite of the leading command tokens.
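For consumers that generate command lines programmatically, the mapping above could be applied mechanically. The helper below is a hypothetical sketch (the name `migrate` and the `NEW_SUBCOMMANDS` list are not part of avml); it only rewrites the leading tokens and passes the remaining arguments through untouched.

```rust
/// Subcommand names of the consolidated binary, used to avoid rewriting
/// a command line that is already in the new form.
const NEW_SUBCOMMANDS: [&str; 4] = ["acquire", "convert", "upload", "stream"];

/// Rewrites a pre-consolidation argv into the single-binary form.
/// Caveat: an old-style `avml acquire` (a snapshot file literally named
/// "acquire") would be mistaken for the new form.
fn migrate(argv: &[&str]) -> Vec<String> {
    let (prefix, skip): (Vec<&str>, usize) = match argv {
        ["avml-convert", ..] => (vec!["avml", "convert"], 1),
        ["avml-upload", "upload-blob", ..] => (vec!["avml", "upload", "blob"], 2),
        ["avml-upload", ..] => (vec!["avml", "upload"], 1),
        ["avml", first, ..] if !NEW_SUBCOMMANDS.contains(first) => {
            (vec!["avml", "acquire"], 1)
        }
        // Anything else (including new-style invocations) is left alone.
        _ => (Vec::new(), 0),
    };
    prefix
        .into_iter()
        .chain(argv[skip..].iter().copied())
        .map(String::from)
        .collect()
}

fn main() {
    let old = ["avml-upload", "upload-blob", "snapshot.lime", "https://example.invalid/sas"];
    println!("{}", migrate(&old).join(" "));
}
```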

Verification

  • cargo fmt --check
  • cargo clippy --locked --all-targets --all-features -- -D warnings -D clippy::pedantic -A clippy::missing_errors_doc
  • The same clippy invocation is clean across --no-default-features, +convert, +upload, +stream, and pairwise combinations.
  • cargo test --all-features — 36 lib (was 28; +8 new BlockBlobStream unit tests against an in-memory FakeStager) + 2 convert + 1 doctest.
  • cargo semver-checks check-release — no semver update required (only additive public-API changes).
  • cargo build --release --target x86_64-unknown-linux-musl --locked --all-features and --no-default-features both clean.
  • --help output exercised end-to-end for every subcommand.

demoray added 4 commits May 21, 2026 17:05
Add an opt-in --stream-to-blob mode that writes the in-progress memory
snapshot directly to a Block Blob via stage_block + commit_block_list,
without ever touching local disk. The default snapshot-to-file-then-upload
flow is unchanged.

Library additions:
- image::Image::with_dst<W: Write> opens the memory source as a File but
  threads an arbitrary writer through as the destination.
- snapshot::Snapshot::create_to_writer<W: Write> picks one source up front
  (Snapshot::probe_single_source helper, factored from the /dev/stdout
  branch) and writes to the supplied writer. No source auto-retry; blob
  writes cannot be discarded.
- upload::stream::BlockBlobStream wraps tokio_util::io::SyncIoBridge over
  an internal AsyncWrite. The async side buffers up to block_size, then
  hands each full block to a bounded uploader task that gates concurrency
  via a Semaphore. finalize() awaits in-flight stages then commits;
  abort() awaits in-flight stages but skips commit.
- A pub(crate) BlockStager trait abstracts the two SDK methods so tests
  can substitute an in-memory FakeStager. Eight unit tests cover buffer
  rotation, block-id uniformity, trailing-partial flush, stage failure
  surfaced on finalize, abort path, and concurrency bound.
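The concurrency bound can be modelled without an async runtime. The sketch below swaps in a small blocking semaphore built from `Mutex` and `Condvar` in place of `tokio::sync::Semaphore`, and uses threads in place of the uploader task; `stage_all` and the 5 ms sleep are hypothetical stand-ins for staging real blocks.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore; a blocking stand-in for tokio::sync::Semaphore.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }

    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// "Stages" `blocks` fake blocks with at most `limit` in flight and
/// returns the highest number of simultaneous uploads observed.
fn stage_all(blocks: usize, limit: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let counters = Arc::new(Mutex::new((0usize, 0usize))); // (current, peak)
    let mut handles = Vec::new();

    for _ in 0..blocks {
        // The producer waits here for a permit, so it cannot run ahead
        // of the uploads by more than `limit` blocks.
        sem.acquire();
        let sem = Arc::clone(&sem);
        let counters = Arc::clone(&counters);
        handles.push(thread::spawn(move || {
            {
                let mut c = counters.lock().unwrap();
                c.0 += 1;
                c.1 = c.1.max(c.0);
            }
            thread::sleep(Duration::from_millis(5)); // pretend to upload
            counters.lock().unwrap().0 -= 1;
            sem.release();
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }
    let peak = counters.lock().unwrap().1;
    peak
}

fn main() {
    let peak = stage_all(12, 3);
    println!("peak in-flight uploads: {} (limit 3)", peak);
}
```

Acquiring the permit on the producer side, before the block is handed off, is what ties the memory bound to the concurrency setting: a block only leaves the producer when there is room for it.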

Binary wiring (bin/avml.rs):
- --stream-to-blob flag, requires --sas-url, conflicts with --url,
  --max-disk-usage, --max-disk-usage-percentage, and --delete (all hard
  errors at parse time).
- filename is required_unless_present = stream_to_blob.
- Block size derived from disk_usage::estimate to fit in Azure's
  50,000-block-per-blob cap; --sas-block-size is a floor.
- Snapshot runs inside spawn_blocking so SyncIoBridge's block_on doesn't
  deadlock the current_thread runtime. The (stream, result) tuple comes
  back so abort always runs on partial failure.

No public API breakage: only additive items
(Image::with_dst, Snapshot::create_to_writer, Snapshot::probe_single_source,
BlockBlobStream, BLOB_MAX_BLOCKS, BlobError re-export).

Cargo.toml: enable tokio-util/io-util and tokio/sync under the blobstore
feature.

Verified: cargo fmt --check, full pedantic clippy (-D warnings), cargo
test --all-features (36 lib + 2 convert + 1 doctest), cargo
semver-checks check-release, and musl release build.

Snapshot streaming is a different workflow from acquire-then-upload:
it picks one memory source up front (no auto-retry), never touches
local disk, and has its own destination protocol. Putting it in a
separate binary mirrors how avml-convert and avml-upload are handled,
and keeps the avml CLI flat (no acquire/stream subcommand split).

avml: revert to the pre-streaming flat CLI shape. The --stream-to-blob
flag and its handler move out; --max-disk-usage, --sas-url, --delete
etc. remain unchanged. The mandatory positional <FILENAME> argument
is now always a plain PathBuf, no more cfg-gated Option<PathBuf>.

avml-stream: new binary, requires the blobstore feature. Subcommand
shape:

  avml-stream blob <SAS_URL> [--compress] [--source S]
                             [--sas-block-size N]
                             [--sas-block-concurrency N]

The subcommand layer leaves room for adding additional destination
protocols later (e.g., a future 'avml-stream s3-multipart' against
the AWS SDK) without restructuring.

Cargo.toml: register the new [[bin]] with required-features =
['blobstore']. eng/build.sh and the linux artifact upload paths in
build.yml include the new binary. README points at avml-stream for
the no-local-file streaming workflow.

Verified: fmt, full pedantic clippy, all tests (36 lib + 2 convert +
1 doctest), semver-checks (additive only, no version bump), musl
release builds with --all-features and --no-default-features.
--help output of avml is unchanged from main; avml-stream's --help
shows the new subcommand surface.

Streams the snapshot bytes sequentially to a remote TCP listener: the
collector runs something like 'nc -l 9000 > snapshot.lime' and avml
connects as a client.

Implementation is direct: open a tokio::net::TcpStream, wrap in
SyncIoBridge, hand the sync writer to Snapshot::create_to_writer.
No staging, no commit dance, no length prefix; the protocol is just
bytes-on-the-wire in source order.
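
A minimal sketch of the same bytes-on-the-wire protocol, using the
blocking std::net::TcpStream instead of tokio plus SyncIoBridge. The
function names stream_to_tcp and round_trip are hypothetical, and the
loopback listener stands in for a collector such as nc.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Connects to `addr` and writes `bytes` in order: no framing, no
/// length prefix, no retry.
fn stream_to_tcp(addr: &str, bytes: &[u8]) -> io::Result<()> {
    let mut conn = TcpStream::connect(addr)?;
    // A dropped connection surfaces here as an io::Error.
    conn.write_all(bytes)?;
    conn.flush()
    // `conn` is dropped on return, closing the socket; the collector
    // sees that as end-of-stream.
}

/// Runs a throwaway loopback collector, streams `payload` to it, and
/// returns whatever the collector received.
fn round_trip(payload: &[u8]) -> io::Result<Vec<u8>> {
    // Port 0 asks the OS for any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?.to_string();

    let collector = thread::spawn(move || -> io::Result<Vec<u8>> {
        let (mut peer, _) = listener.accept()?;
        let mut received = Vec::new();
        peer.read_to_end(&mut received)?;
        Ok(received)
    });

    stream_to_tcp(&addr, payload)?;
    collector.join().expect("collector thread panicked")
}

fn main() -> io::Result<()> {
    let payload = b"snapshot bytes";
    let received = round_trip(payload)?;
    println!("sent {} bytes, collector received {}", payload.len(), received.len());
    Ok(())
}
```

Since end-of-stream is the only completion signal, a collector cannot
tell a finished snapshot from a truncated one without an out-of-band
size or hash check.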

Failure mode: a connection drop mid-stream surfaces as an io::Error
from SyncIoBridge's write call, which becomes snapshot::Error::Disk
(via image::Error::Io) and propagates out of create_to_writer. No
retry; the source was committed-to up front, the destination has no
resume.

No TLS — pair with an SSH tunnel or stunnel if confidentiality and
integrity matter. Documented in README.

Cargo.toml: add 'net' to tokio's feature list (already optional under
put/blobstore; non-network builds unaffected).

Verified: fmt, full pedantic clippy, all tests (36 lib + 2 convert +
1 doctest), semver-checks (no public-API change), musl release build,
'avml-stream --help' showing both 'blob' and 'tcp' subcommands.

Single avml binary with subcommands:

  avml acquire FILE [--compress] [--source S] [--max-disk-usage N]
                    [--max-disk-usage-percentage P]
                    [--url U] [--delete]
                    [--sas-url U] [--sas-block-size N] [--sas-block-concurrency N]
  avml convert [--source-format F] [--format F] SRC DST
  avml upload put  FILE URL
  avml upload blob FILE URL [--sas-block-size N] [--sas-block-concurrency N]
  avml stream blob SAS_URL  [--compress] [--source S]
                            [--sas-block-size N] [--sas-block-concurrency N]
  avml stream tcp  ADDR     [--compress] [--source S]

Three new top-level features control which subcommands the binary
includes:

  convert = []                        # no extra deps
  upload  = ["put", "blobstore"]      # subsumes both upload backends
  stream  = ["blobstore", "tokio/net"] # blob + tcp streaming

default = ["stream", "upload", "convert", "native-tls"].

A --no-default-features build produces an acquire-only binary
(the old "avml-minimal" shape).

Layout: src/bin/avml/main.rs + sibling modules (acquire.rs, convert.rs,
upload.rs, stream.rs), wired via an explicit [[bin]] path so cargo's
bin autodiscovery doesn't try to compile each submodule as its own
binary.

eng/build.sh and the GitHub Actions workflow drop the separate
'--bin avml-upload' step and the avml-{convert,upload,stream}
artifact paths; everything ships in the one avml binary now.
README, eng/test-azure-image.sh, and the CI smoke tests are updated
to use 'avml acquire'.

Breaking CLI changes for users:
  avml output.lime         -> avml acquire output.lime
  avml-convert ...         -> avml convert ...
  avml-upload put ...      -> avml upload put ...
  avml-upload upload-blob  -> avml upload blob
  avml-stream blob ...     -> avml stream blob ...
  avml-stream tcp ...      -> avml stream tcp ...

Verified: fmt, full pedantic clippy across 8 feature combinations
(default, no-default, +convert, +upload, +stream, +convert+upload,
+convert+stream, +upload+stream), all tests (36 lib + 2 convert + 1
doctest), semver-checks (additive only), musl release builds with
--all-features and --no-default-features, --help output for every
subcommand verified end-to-end.

demoray added 2 commits May 26, 2026 23:16
The acquire and stream subcommands both depend on Linux-specific kernel interfaces
(/proc/iomem, /proc/kcore, /dev/crash, /dev/mem); they don't work on
macOS, the BSDs, or Windows. Cfg-gate the subcommand modules and
matching Commands enum variants on 'target_os = "linux"' so non-Linux
builds drop them from the binary entirely. Help text and dispatch
follow.

On Windows / macOS / BSD with default features, the resulting binary
ships only 'convert' and 'upload'. The library-level streaming
primitives (BlockBlobStream etc.) still compile and remain available
to library consumers regardless of target.

Verified: full pedantic clippy, all tests, cargo check against
x86_64-pc-windows-msvc with --all-features (and with stream-only,
which compiles but produces a no-subcommand binary on Windows).