Aron/delegation token propagation #223
Closed
aron-muon wants to merge 20 commits into
Add a maximum_background field to FUSEMountConfiguration. When set, it caps the FUSE asynchronous request queue (FUSE_INIT max_background); when zero, go-fuse's default of 12 is used. The default of 12 suits bb_clientd, but is too low for bb_worker, where a single mount serves many concurrent actions and the kernel CongestionThreshold (max_background * 3/4) is reached almost as soon as multiple actions begin reading. Reads beyond the threshold queue in kernel D-state, which presents as the worker wedging while reporting actions as still executing. Recommended value for bb_worker is 1024.
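The arithmetic behind the recommendation can be sketched as follows. This is a minimal illustration, not code from go-fuse or Buildbarn: the helper names `effectiveMaxBackground` and `congestionThreshold` are hypothetical, and the default of 12 and the 3/4 rule are taken from the commit message above.

```go
package main

import "fmt"

// defaultMaxBackground is the go-fuse default quoted in the commit message.
const defaultMaxBackground = 12

// effectiveMaxBackground (hypothetical helper) returns the configured value,
// falling back to the default when the field is left at zero.
func effectiveMaxBackground(configured int) int {
	if configured == 0 {
		return defaultMaxBackground
	}
	return configured
}

// congestionThreshold (hypothetical helper) applies the
// max_background * 3/4 rule using integer arithmetic.
func congestionThreshold(maxBackground int) int {
	return maxBackground * 3 / 4
}

func main() {
	for _, configured := range []int{0, 1024} {
		mb := effectiveMaxBackground(configured)
		fmt.Printf("maximum_background=%d -> max_background=%d, congestion threshold=%d\n",
			configured, mb, congestionThreshold(mb))
	}
}
```

With the default, congestion starts at 9 outstanding background requests; with 1024 it starts at 768.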
Redirect bb_worker_container_push from container_push_official's ghcr.io/buildbarn/ pin to ghcr.io/aron-muon/bb-worker so we can publish images built from this fork without colliding with upstream artifacts. Not for upstream merge — this commit lives on the aron-muon fork only.
Picks up upstream commit d0c6f26 ("Forward termination signals to child
process", PR #332). Without it, bb_worker and bb_runner running as PID 1
in containers ignore SIGTERM — kubelet has to SIGKILL after the grace
period, killing in-flight actions and accumulating Failed pods.
Validated locally: a Linux build of cmd/bb_worker now contains the
"Failed to forward signal %#v to child process" log message, confirming
the fix is linked into the binary.
Bumps both go.mod (gazelle's go_deps source) and the git_override in
MODULE.bazel (which actually controls bzlmod resolution).
Aron/fuse max background
Pulls in 55de026c372 ("program: exit cleanly when signal-to-self races")
from aron-muon/bb-storage. Without that, bb_worker pods exit with
code 1 on every clean SIGTERM shutdown — the time.Sleep(5) fallback
in terminateWithSignal evaluates to 5 nanoseconds and fires before
the signal-to-self can deliver.
Revert this back to the upstream HEAD once
https://github.com/buildbarn/bb-storage/pull/<TBD> lands and is released.
The upstream script gated all stamp values behind GITHUB_ACTIONS=true
and used GNU date's --date "@<unix_ts>" syntax. That meant local
builds on macOS produced no BUILD_SCM_* values, so image_push targets
that template-expand them (e.g. //cmd/bb_worker:bb_worker_container_push)
failed with `function "BUILD_SCM_TIMESTAMP" not defined`.
Drop the GITHUB_ACTIONS guard and detect GNU vs BSD date so the same
script produces the same `${TIMESTAMP}-${SHA}` tag whether it runs
on a Linux CI runner or a macOS dev machine. No behavior change in
CI; macOS builds now get a usable stamp instead of an opaque template
error.
# Conflicts:
#	MODULE.bazel
#	pkg/filesystem/virtual/configuration/fuse_mount_enabled.go
#	pkg/proto/configuration/filesystem/virtual/virtual.pb.go
#	pkg/proto/configuration/filesystem/virtual/virtual.proto
rules_img's should_stamp() only checks templates.values() for {{...}}
placeholders, not tag_file content. With our tag template living in
stamped_tags.txt (via tag_file = ...), stamp files weren't being passed
to expand-template, causing 'function "BUILD_SCM_TIMESTAMP" not
defined' build failures. Set stamp = "enabled" to force inclusion.
… fix) Real fix this time: drops the signal-raise dance entirely instead of just retiming the fallback. The previous bump (55de026) didn't help because Go's runtime.dieFromSignal exits 2 before our fallback runs — confirmed via strace on staging. Container exit code goes from 2 (Failed/Error) to 0 (Succeeded).
…bb-runner-installer bb_runner_installer embeds the bb_runner binary, which links our hotfixed bb-storage. Without a parallel push target the upstream ghcr.io/buildbarn/bb-runner-installer is used, meaning the bb-runner sidecar still exits 1 on graceful SIGTERM (the original 5-ns sleep race in terminateWithSignal) even though bb-worker now exits 0. Mirrors the cmd/bb_worker/BUILD.bazel pattern.
…ment

Add a configurable channel for propagating per-build identity tokens (e.g., bb-credential-broker delegation JWTs) from the Bazel client through the scheduler to the worker's action environment, without contaminating the Action digest or CAS.

Scheduler (P1): new `forward_request_headers` config field. When set, the scheduler extracts the named gRPC metadata headers from incoming Execute() calls, wraps each value in a ForwardedRequestHeader Any message, and appends it to the existing AuxiliaryMetadata slice that ships in DesiredState_Executing.

Worker (P2): new `forward_auxiliary_metadata_to_environment` config field on RunnerConfiguration. When true, the worker decodes ForwardedRequestHeader messages from inbound AuxiliaryMetadata and injects them into the action's environment variables. Command-proto variables (part of the Action digest) take precedence, so a client cannot shadow digest-committed variables.

Both features default to off; behaviour is identical to upstream when unconfigured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
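The two halves of this flow can be sketched without gRPC or protobuf. Everything here is a stand-in rather than the code in this PR: `ForwardedHeader` mimics the ForwardedRequestHeader message, plain maps replace gRPC metadata and the Command proto, and the header-to-variable mapping in `envName` is a hypothetical choice, since the commit message does not say how names are derived.

```go
package main

import (
	"fmt"
	"strings"
)

// ForwardedHeader is a hypothetical stand-in for ForwardedRequestHeader.
type ForwardedHeader struct {
	Name  string
	Value string
}

// extractForwardedHeaders models the scheduler side: only allow-listed
// headers are copied. gRPC metadata keys are lower-case, so names are
// normalised before the lookup.
func extractForwardedHeaders(md map[string][]string, allow []string) []ForwardedHeader {
	var out []ForwardedHeader
	for _, name := range allow {
		key := strings.ToLower(name)
		for _, v := range md[key] {
			out = append(out, ForwardedHeader{Name: key, Value: v})
		}
	}
	return out
}

// envName is an assumed mapping from header name to variable name.
func envName(header string) string {
	return strings.ToUpper(strings.ReplaceAll(header, "-", "_"))
}

// mergeEnvironment models the worker side: forwarded values are applied
// first, then Command variables overwrite them, so digest-committed
// variables always win.
func mergeEnvironment(command map[string]string, forwarded []ForwardedHeader) map[string]string {
	env := make(map[string]string, len(command)+len(forwarded))
	for _, h := range forwarded {
		env[envName(h.Name)] = h.Value
	}
	for k, v := range command {
		env[k] = v
	}
	return env
}

func main() {
	md := map[string][]string{
		"bb-delegation-jwt": {"example-token"},
		"authorization":     {"not-forwarded"},
	}
	fwd := extractForwardedHeaders(md, []string{"bb-delegation-jwt"})
	fmt.Println(mergeEnvironment(map[string]string{"PATH": "/usr/bin"}, fwd))
	fmt.Println(mergeEnvironment(map[string]string{"BB_DELEGATION_JWT": "pinned"}, fwd))
}
```

Applying the Command variables last is what keeps a client from shadowing a variable that is already part of the Action digest.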
…arn#222)

* fuse: expose maximum_background in FUSEMountConfiguration

  Add a maximum_background field to FUSEMountConfiguration. When set, it caps the FUSE asynchronous request queue (FUSE_INIT max_background); when zero, go-fuse's default of 12 is used. The default of 12 suits bb_clientd, but is too low for bb_worker, where a single mount serves many concurrent actions and the kernel CongestionThreshold (max_background * 3/4) is reached almost as soon as multiple actions begin reading. Reads beyond the threshold queue in kernel D-state, which presents as the worker wedging while reporting actions as still executing. Recommended value for bb_worker is 1024.

* deps: bump bb-storage to upstream/main HEAD

  Picks up commit d0c6f26 ("Forward termination signals to child process", PR #332). Without that fix, processes running as PID 1 in containers silently ignore SIGTERM and the kubelet has to fall back to SIGKILL once the grace period expires. For bb_worker that means in-flight actions are killed and the scheduler has to retry them on another worker. Bumps both go.mod (gazelle's go_deps source) and the git_override in MODULE.bazel that actually controls bzlmod resolution; they need to stay in sync.

* proto: load proto_library from @protobuf//bazel

  The @rules_proto//proto:defs.bzl source has been deprecated in favour of @protobuf//bazel:proto_library.bzl in newer protobuf module versions. This commit switches all proto_library load statements over.
- Regenerate bb_scheduler.pb.go, bb_worker.pb.go for new proto fields
- Check in generated forwarded_request_header.pb.go and gRPC stub
- Fix trailing double-space on forwardAuxiliaryMetadataToEnvironment
- Combine consecutive bool params in NewLocalBuildExecutor signature
- Alphabetize BUILD.bazel load statements in cmd/bb_worker

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…duler Replace upstream's container_push_official (ghcr.io/buildbarn/) with a fork-specific image_push target pointing at ghcr.io/aron-muon/bb-scheduler. Same pattern as cmd/bb_worker. The target name is unchanged so the existing workflow step works without modification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Regenerate virtual.pb.go (import reorder from protoc-gen-go)
- Fix 755→644 permissions on forwarded_request_header pb.go files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a new bb_runner decorator that exchanges a delegation JWT for real upstream credentials via bb-credential-broker's /token endpoint, then writes credential files (.netrc for MVP) into the action's input root before delegating to the base LocalRunner.

The decorator reads BB_DELEGATION_JWT (configurable) from the action's environment variables, removes it before the action spawns, and calls the broker for each configured destination. If no JWT is present, the request passes through unmodified (mixed-pool compatibility).

New config proto: CredentialInjectionConfiguration on ApplicationConfiguration (field 15) with destinations, broker_url, and credential file specs. When unset, behaviour is identical to upstream.

Credential files are written atomically (write-to-temp + rename) to avoid partial reads by concurrent processes reading .netrc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No description provided.