ZFS template ENOSPC retry: tighten end-to-end verification #6

@sodre

Follow-up to #5 (Plan B). The ENOSPC retry path in zfs::ensure_template and zfs::docker_install_from_layers is structurally correct and partially verified, but it has never been observed end-to-end in a single test run. This issue tracks closing that gap.

What's verified today

PR #5 documents the following tests passing on a loopback ZFS pool (Linux 6.12.75, aarch64, zfs-2.4.1):

  • unsquashfs exits non-zero on "Disk quota exceeded" (verified with quota=3M, smaller than a single template's extracted size).
  • The retry path catches the failure and prints [WARN] Extraction failed; evicting all warm templates and retrying.
  • The sweep itself works under WARM_SECONDS=0 (Plan B Task 6 tests).
  • The retry attempt fires (a second unsquashfs invocation runs).
  • The final-error path fires when the retry also fails: it prints [ERROR] Extraction failed even after evicting warm templates, destroys the .tmp dataset, and leaves no orphan container.
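For reference, the single-retry control flow these checks exercise can be sketched as below. This is a minimal sketch, not the actual zfs::ensure_template code: extract_once, sweep_all_warm and destroy_tmp are hypothetical hooks standing in for the unsquashfs call, the warm-template sweep and destruction of the .tmp dataset, and any cleanup the real code performs between attempts is omitted.

```shell
#!/usr/bin/env bash
# Minimal sketch of the single-retry control flow; NOT the real
# zfs::ensure_template. The three hooks are hypothetical stand-ins:
#   extract_once   - one unsquashfs (or merge) attempt into the .tmp dataset
#   sweep_all_warm - evict every warm template
#   destroy_tmp    - destroy the .tmp dataset after the final failure
extract_with_retry() {
    # First attempt: if it succeeds, no sweep is needed.
    if extract_once; then
        return 0
    fi
    echo "[WARN] Extraction failed; evicting all warm templates and retrying" >&2
    sweep_all_warm
    # Retry with the identical command.
    if extract_once; then
        return 0
    fi
    echo "[ERROR] Extraction failed even after evicting warm templates" >&2
    destroy_tmp
    return 1
}
```

Stubbing the hooks (for example, an extract_once that fails only on its first call) gives a quick check of both outcomes without any ZFS pool.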

What's not yet verified

Success after retry, in one continuous run: first attempt hits ENOSPC → sweep evicts a warm template that frees enough space → retry attempt succeeds → final state has the new template installed.

The pieces are individually correct (the retry runs exactly the same command as the first attempt), but observing the transition in a single run requires three quota constraints to line up:

  1. The new template's extracted size must be ≤ the space available after eviction.
  2. The warm and new templates' combined extracted size must exceed the quota (so the first attempt fails).
  3. ZFS refuses quota=N if N is below current usage, so the quota has to be set while the warm template is already present: loose enough that it fits, yet tight enough that adding the second template pushes usage over.
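The three constraints can be written as inequalities on the quota Q. Assuming U is the space already charged to the templates dataset apart from the two templates, W is the warm template's size and N is the new template's size (all as ZFS accounts them, i.e. after compression), constraint 3 needs Q ≥ U + W, constraint 2 needs Q < U + W + N, and constraint 1 needs Q ≥ U + N. A hypothetical helper that prints the resulting inclusive window:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "LOW HIGH", the inclusive range of quota values
# (bytes) satisfying all three constraints, or return 1 if none exists.
#   $1 used: bytes charged to the templates dataset, excluding both templates
#   $2 warm: size of the warm template the sweep will evict
#   $3 new:  size of the template being installed
quota_window() {
    local used=$1 warm=$2 new=$3 low high
    # (3) quota may not be set below current usage, which includes the warm template
    low=$((used + warm))
    # (1) the new template must fit once the warm one is evicted
    if [ $((used + new)) -gt "$low" ]; then
        low=$((used + new))
    fi
    # (2) warm + new together must overflow the quota
    high=$((used + warm + new - 1))
    if [ "$low" -gt "$high" ]; then
        return 1
    fi
    echo "$low $high"
}

# Example: 1 MiB overhead, 40 MiB warm template, 25 MiB new template.
#   quota_window $((1 << 20)) $((40 << 20)) $((25 << 20))   # prints "42991616 69206015"
```

On paper the window is only min(W, N) bytes wide, so any space the inputs do not account for narrows it further.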

I tried several payload sizes and quotas on a 3.75G test pool and either both fit (no ENOSPC) or neither fit (no warm eviction recovery).

Also untested: Docker variant ENOSPC behavior

The .sqsh path uses unsquashfs, which surfaces ENOSPC immediately as a non-zero exit. The Docker path uses tar | tar inside enroot-nsenter and showed a different failure mode in one test: the receiving tar hung instead of returning a clean exit code, because ZFS quota enforcement against the writing process lags behind its writes until the transaction group commits. The retry's if ! guard never fired, and the merge command had to be killed with SIGKILL.

If this proves flaky in production, options include:

  • Wrap the receiving tar with a poller that watches the dataset's available property and kills the receiving tar once it reaches zero, so the sending tar gets SIGPIPE and the pipe fails instead of hanging.
  • Pre-flight: run zfs::sweep_templates more aggressively before the merge if under_pressure is borderline (say, >= threshold - 10).
  • Add a hard timeout around the merge command and treat timeout-with-no-progress as ENOSPC.
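The third option can be sketched with GNU coreutils timeout, which exits 124 when the time limit expires and 137 (128 + SIGKILL) when -k has to escalate to SIGKILL. run_merge_bounded and MERGE_TIMEOUT are hypothetical names, the 600-second default is arbitrary, and the sketch bounds total wall-clock time rather than detecting a lack of progress, which simplifies the option as written. It assumes the caller's if ! guard then runs the same sweep-and-retry path as for ENOSPC.

```shell
#!/usr/bin/env bash
# Sketch of option 3: bound the merge with GNU coreutils `timeout` and turn a
# timeout into an ordinary failure so the existing `if !` guard can react.
# run_merge_bounded and MERGE_TIMEOUT are hypothetical names.
MERGE_TIMEOUT=${MERGE_TIMEOUT:-600}   # seconds; arbitrary default

run_merge_bounded() {
    local status=0
    # -k 10: if the command ignores SIGTERM, send SIGKILL 10 seconds later.
    timeout -k 10 "$MERGE_TIMEOUT" "$@" || status=$?
    # 124 = timed out; 137 = 128 + SIGKILL after escalation.
    if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
        echo "[WARN] merge did not finish within ${MERGE_TIMEOUT}s; treating as ENOSPC" >&2
        return 1
    fi
    # Any other status (including success) passes through unchanged.
    return "$status"
}

# Example (paths hypothetical; the real merge runs under enroot-nsenter).
# pipefail makes a failure on either side of the pipe fail the whole command:
#   run_merge_bounded bash -c 'set -o pipefail; tar -C "$1" -cf - . | tar -C "$2" -xf -' _ "$src" "$dst"
```

Without --foreground, timeout also signals the command's children, so wrapping the whole pipeline in one bash -c bounds both tars rather than just the shell.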

Suggested verification approaches

  1. Dedicated tiny pool. Create a 64MB loopback file and build a 64MB pool on it (64MB is the smallest vdev ZFS accepts). Quotas can be tuned more tightly, and smaller payloads exercise them with less ambiguity.
  2. Synthetic failure injection. Replace unsquashfs (and the merge command) with a wrapper script that exits non-zero on the first call and zero on the second. This verifies the success-after-retry transition without depending on real ENOSPC behavior and is useful as a unit-style check.
  3. zfs reservation instead of quota. A reservation on a sibling dataset can squeeze the templates dataset's available bytes deterministically, sidestepping the "can't shrink quota below current usage" rule.
  4. Concurrent-extraction race against the quota: N workers each extract a unique template against a quota that fits half of them, forcing sweep + retry under contention.
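Approach 2 only needs a fake command whose behavior depends on how many times it has been called. A minimal sketch, assuming the retry path finds unsquashfs (or the merge command) through PATH; make_flaky_cmd and the counter-file layout are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper for approach 2 (synthetic failure injection): writes an
# executable named $2 into directory $1 that fails on its first call and
# succeeds on every later call. Calls are counted in a hidden file next to it.
make_flaky_cmd() {
    local dir=$1 name=$2
    mkdir -p "$dir"
    : > "$dir/.$name.calls"    # reset the call counter
    cat > "$dir/$name" <<EOF
#!/bin/sh
# Fake $name generated for retry testing; ignores its arguments.
echo call >> "$dir/.$name.calls"
n=\$(( \$(wc -l < "$dir/.$name.calls") ))
if [ "\$n" -eq 1 ]; then
    echo "$name: Disk quota exceeded (injected failure)" >&2
    exit 1
fi
exit 0
EOF
    chmod +x "$dir/$name"
}
```

Prepending the directory to PATH before installing a template should then produce exactly one [WARN] line followed by a successful retry. Note that the fake writes nothing, so any later step that inspects the extracted tree would need a fake that also populates it.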

Acceptance criteria

  • A reproducible test recipe (in doc/zfs.md admin notes or a script in pkg/ if/when test infra lands) that triggers each of these in a single run:
    • First attempt ENOSPC → sweep → retry → success → new template installed.
    • First attempt ENOSPC → sweep evicts nothing → retry → second ENOSPC → final error → .tmp cleaned.
    • Docker tar | tar variant: returns a clean ENOSPC exit code (no hang).
  • A short note in doc/zfs.md documenting the test recipe so admins can verify their own pool sizing produces the expected behavior.
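For the recipe, the setup half of approaches 1 and 3 can be scripted. This minimal sketch only prints the commands by default (RUN=echo); running them for real (RUN= as root) creates and destroys a pool, so it belongs on a disposable machine. The pool name, image path and dataset layout are hypothetical and would need to match wherever the project keeps its templates; installing the warm and new templates afterwards is left to the project's own commands.

```shell
#!/usr/bin/env bash
# Setup for a tiny-pool + sibling-reservation test, printed as a dry run by
# default. Pool, dataset and file names are hypothetical.
set -eu

RUN=${RUN-echo}                           # set RUN= (empty) to execute for real
POOL=${POOL:-enospc_test}
IMG=${IMG:-/var/tmp/enospc_test.img}

setup_tiny_pool() {
    local squeeze=$1                      # size to reserve on the sibling dataset
    $RUN truncate -s 64M "$IMG"           # 64MB backing file
    $RUN zpool create "$POOL" "$IMG"      # file vdevs need an absolute path
    $RUN zfs create "$POOL/templates"
    # Space reserved for the sibling is unavailable to $POOL/templates,
    # shrinking its available bytes without touching its quota.
    $RUN zfs create -o reservation="$squeeze" "$POOL/squeeze"
    # Report what the templates dataset can still use (exact bytes).
    $RUN zfs get -Hp -o value available "$POOL/templates"
}

teardown_tiny_pool() {
    $RUN zpool destroy "$POOL"
    $RUN rm -f "$IMG"
}
```

Raising or lowering the reservation between runs moves the templates dataset's available space without the "can't shrink quota below current usage" restriction.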

Out of scope

  • Implementing a more sophisticated retry policy (e.g. multiple sweep aggressiveness levels, exponential backoff). The current single-retry behavior matches the plan; this issue is about verification, not redesign.
