Eliminate oc --image-for calls and narrow semaphore scope#773
Conversation
Profiling the changelog handler for a two-release diff
(4.22.0-rc.4 → 4.22.0-rc.5, two machine-OS streams) showed the
following oc invocations in the NodeImageSectionMarkdown path:
Step Duration
oc adm release info -o json <to> ~4.2s ListMachineOSStreams
oc adm release info --image-for \
rhel-coreos <from> ~4.1s ImageReferenceForComponent
oc adm release info --image-for \
rhel-coreos-10 <from> ~4.0s ImageReferenceForComponent
oc adm release info --rpmdb-diff ... ~18s RpmDiffForStream (×2)
The two --image-for calls together added ~8 seconds of wall time per
changelog request. Each invoked a separate oc process for data that is
already present in the release JSON: references.spec.tags[*].from.name
contains the full pullspec for every component in the release, and that
JSON is already cached in groupcache under the "releaseinfo" key.
Changes:
- Extend releaseInfoImageRefs with from.name so imageRefFromReleaseJSON
can extract pullspecs from the cached release JSON.
- ExecReleaseInfo.ImageReferenceForComponent now calls ReleaseInfo() and
parses the JSON instead of shelling out to oc --image-for.
- CachingReleaseInfo "imagefor" groupcache getter fetches "releaseinfo"
via cache.Get and parses locally — no extra oc invocation.
- CachingReleaseInfo "machineosstreams" getter likewise reads from the
"releaseinfo" cache entry instead of calling info.ListMachineOSStreams
(which bypassed groupcache and ran oc independently), eliminating a
duplicate oc adm release info -o json call per request.
rh-pre-commit.version: 2.4.0
rh-pre-commit.check-secrets: ENABLED
ImageReferenceForComponent now resolves from groupcache (JSON parse, no oc subprocess, no /tmp/rpmdb/ access) after the --image-for elimination. Move it outside the RpmdbOCSlots semaphore in both NodeImageSectionMarkdown and the API handler so the slot is only held during RpmListForStream and RpmDiffForStream — the calls that actually spawn oc subprocesses and write to /tmp/rpmdb/. On a cold cache with two machine-OS streams, profiling showed: ImageReferenceForComponent × 2: ~8s (now outside semaphore) RpmListForStream × 2: ~8s (inside semaphore) RpmDiffForStream × 2: ~18s (inside semaphore) This reduces per-request semaphore hold time from ~34s to ~26s, allowing more concurrent changelog requests to proceed before the 16-slot limit causes 429 responses. On a freshly restarted controller where hundreds of release pages hit the semaphore simultaneously, this frees ~8 seconds of slot time per request for other callers. rh-pre-commit.version: 2.4.0 rh-pre-commit.check-secrets: ENABLED
📝 WalkthroughWalkthroughThis PR pre-resolves release-image "from" references from cached release JSON, updates groupcache getters to use that JSON, and narrows semaphore protection to only rpmdb/oc subprocess work; it also adds a script, README, metadata, and .gitignore to produce a prepopulated rpmdb cache. ChangesRelease Image Resolution & Caching Optimization
RPMDB Cache Generation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/release-controller/machine_os_tags.go`:
- Around line 35-49: Define a package-level sentinel error (e.g.
ErrReleaseComponentNotFound = errors.New("release component not found")) and
update imageRefFromReleaseJSON to return that sentinel when the loop finishes
without finding componentName (instead of the current fmt.Errorf("component %q
not found in release", ...)); keep all other error returns (JSON unmarshal error
and the empty ref fmt.Errorf) unchanged so callers can differentiate a genuine
not-found via errors.Is(err, ErrReleaseComponentNotFound) while still treating
other errors as real failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b31d3b52-19d8-4048-bbb1-86ea60f7c2e5
📒 Files selected for processing (4)
cmd/release-controller-api/http.gopkg/release-controller/machine_os_tags.gopkg/release-controller/release_info.gopkg/rhcos/node_image.go
| func imageRefFromReleaseJSON(raw, componentName string) (string, error) { | ||
| var ri releaseInfoImageRefs | ||
| if err := json.Unmarshal([]byte(raw), &ri); err != nil { | ||
| return "", err | ||
| } | ||
| for _, t := range ri.References.Spec.Tags { | ||
| if t.Name == componentName { | ||
| ref := strings.TrimSpace(t.From.Name) | ||
| if ref == "" { | ||
| return "", fmt.Errorf("empty image reference for component %q", componentName) | ||
| } | ||
| return ref, nil | ||
| } | ||
| } | ||
| return "", fmt.Errorf("component %q not found in release", componentName) |
There was a problem hiding this comment.
Return a distinguishable not-found error here.
The new pre-resolution paths use ImageReferenceForComponent to decide whether a stream existed in the from-release, but this helper currently turns “component missing” into a generic error string. That makes genuine absence indistinguishable from malformed release JSON / empty from.name, so those callers can silently drop RPM diffs instead of surfacing the real failure. Please expose a sentinel/typed not-found error (or a boolean) so callers can ignore only the expected absence case.
Suggested direction
+var ErrReleaseComponentNotFound = errors.New("release component not found")
+
func imageRefFromReleaseJSON(raw, componentName string) (string, error) {
var ri releaseInfoImageRefs
if err := json.Unmarshal([]byte(raw), &ri); err != nil {
return "", err
}
for _, t := range ri.References.Spec.Tags {
if t.Name == componentName {
ref := strings.TrimSpace(t.From.Name)
if ref == "" {
return "", fmt.Errorf("empty image reference for component %q", componentName)
}
return ref, nil
}
}
- return "", fmt.Errorf("component %q not found in release", componentName)
+ return "", fmt.Errorf("%w: %q", ErrReleaseComponentNotFound, componentName)
}Then the pre-resolution callers can use errors.Is(err, ErrReleaseComponentNotFound) and treat everything else as a real failure.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func imageRefFromReleaseJSON(raw, componentName string) (string, error) { | |
| var ri releaseInfoImageRefs | |
| if err := json.Unmarshal([]byte(raw), &ri); err != nil { | |
| return "", err | |
| } | |
| for _, t := range ri.References.Spec.Tags { | |
| if t.Name == componentName { | |
| ref := strings.TrimSpace(t.From.Name) | |
| if ref == "" { | |
| return "", fmt.Errorf("empty image reference for component %q", componentName) | |
| } | |
| return ref, nil | |
| } | |
| } | |
| return "", fmt.Errorf("component %q not found in release", componentName) | |
| var ErrReleaseComponentNotFound = errors.New("release component not found") | |
| func imageRefFromReleaseJSON(raw, componentName string) (string, error) { | |
| var ri releaseInfoImageRefs | |
| if err := json.Unmarshal([]byte(raw), &ri); err != nil { | |
| return "", err | |
| } | |
| for _, t := range ri.References.Spec.Tags { | |
| if t.Name == componentName { | |
| ref := strings.TrimSpace(t.From.Name) | |
| if ref == "" { | |
| return "", fmt.Errorf("empty image reference for component %q", componentName) | |
| } | |
| return ref, nil | |
| } | |
| } | |
| return "", fmt.Errorf("%w: %q", ErrReleaseComponentNotFound, componentName) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/release-controller/machine_os_tags.go` around lines 35 - 49, Define a
package-level sentinel error (e.g. ErrReleaseComponentNotFound =
errors.New("release component not found")) and update imageRefFromReleaseJSON to
return that sentinel when the loop finishes without finding componentName
(instead of the current fmt.Errorf("component %q not found in release", ...));
keep all other error returns (JSON unmarshal error and the empty ref fmt.Errorf)
unchanged so callers can differentiate a genuine not-found via errors.Is(err,
ErrReleaseComponentNotFound) while still treating other errors as real failures.
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bradmwilliams, sdodson The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hack/rpmdb-cache-generate.sh`:
- Around line 112-163: The metadata commit currently happens unconditionally
after the streams loop (the jq write to "${METADATA}" and increment of
new_count), which records versions even when some stream caching failed; change
the logic in the for stream ... done block to track success with a boolean/flag
(e.g., success=true; set success=false when the rpmdb check or extension
extraction fails — references: variables/steps ext_pull, ext_cache, the oc adm
release info check and the ext extract branch), and only run the jq --arg v
"${version}" ... > "${METADATA}.tmp" mv and new_count=$((new_count + 1)) when
that success flag remains true (otherwise skip metadata write so partial
versions are not marked complete).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d538c8d9-004a-42f4-b11e-c0d107ef60db
📒 Files selected for processing (4)
hack/rpmdb-cache-generate.shrpmdb-cache/.gitignorerpmdb-cache/README.mdrpmdb-cache/metadata.json
✅ Files skipped from review due to trivial changes (2)
- rpmdb-cache/.gitignore
- rpmdb-cache/metadata.json
| for stream in ${streams}; do | ||
| ext_tag="${stream}-extensions" | ||
|
|
||
| # Populate sha256_* via --rpmdb-cache | ||
| if ! oc adm release info --rpmdb --rpmdb-cache="${CACHE_DIR}/" \ | ||
| --rpmdb-image="${stream}" --output=json \ | ||
| "${REGISTRY}:${tag}" >/dev/null 2>&1; then | ||
| echo -n "(rpmdb FAIL for ${stream}) " | ||
| continue | ||
| fi | ||
|
|
||
| # Resolve extensions image pullspec from release JSON | ||
| ext_pull=$(echo "${release_json}" | python3 -c " | ||
| import sys, json | ||
| data = json.load(sys.stdin) | ||
| for t in data['references']['spec']['tags']: | ||
| if t['name'] == '${ext_tag}': | ||
| print(t['from']['name']) | ||
| break | ||
| " 2>/dev/null) || true | ||
|
|
||
| if [[ -n "${ext_pull}" ]]; then | ||
| ext_sha="${ext_pull##*@sha256:}" | ||
| ext_cache="${CACHE_DIR}/extensions-${ext_sha}" | ||
|
|
||
| if [[ ! -f "${ext_cache}" ]]; then | ||
| tmpdir=$(mktemp -d) | ||
| if oc image extract "${ext_pull}[-1]" \ | ||
| --file="${EXTENSIONS_PATH}" \ | ||
| --path="${EXTENSIONS_PATH}:${tmpdir}" 2>/dev/null; then | ||
| if [[ -f "${tmpdir}/${EXTENSIONS_PATH}" ]]; then | ||
| mv "${tmpdir}/${EXTENSIONS_PATH}" "${ext_cache}" | ||
| elif [[ -f "${tmpdir}/extensions.json" ]]; then | ||
| mv "${tmpdir}/extensions.json" "${ext_cache}" | ||
| fi | ||
| else | ||
| echo -n "(ext extract FAIL for ${ext_tag}) " | ||
| fi | ||
| rm -rf "${tmpdir}" | ||
| fi | ||
| fi | ||
|
|
||
| stream_list_json=$(echo "${stream_list_json}" | jq --arg s "${stream}" '. + [$s]') | ||
| done | ||
|
|
||
| # Record in metadata | ||
| jq --arg v "${version}" --argjson s "${stream_list_json}" \ | ||
| '.versions[$v] = {"streams": $s}' "${METADATA}" > "${METADATA}.tmp" | ||
| mv "${METADATA}.tmp" "${METADATA}" | ||
|
|
||
| new_count=$((new_count + 1)) | ||
| echo "OK" |
There was a problem hiding this comment.
Only record a version after all discovered streams are cached successfully.
Line [158] writes metadata even when stream caching failed earlier (Lines [116]-[121]). That marks a partial version as complete, and future runs skip it, so missing artifacts are never retried.
Proposed fix
echo -n "streams=[${streams}] "
stream_list_json="[]"
+ total_streams=0
+ cached_streams=0
+ stream_failures=0
for stream in ${streams}; do
+ total_streams=$((total_streams + 1))
ext_tag="${stream}-extensions"
# Populate sha256_* via --rpmdb-cache
if ! oc adm release info --rpmdb --rpmdb-cache="${CACHE_DIR}/" \
--rpmdb-image="${stream}" --output=json \
"${REGISTRY}:${tag}" >/dev/null 2>&1; then
echo -n "(rpmdb FAIL for ${stream}) "
+ stream_failures=$((stream_failures + 1))
continue
fi
@@
stream_list_json=$(echo "${stream_list_json}" | jq --arg s "${stream}" '. + [$s]')
+ cached_streams=$((cached_streams + 1))
done
+
+ if [[ ${stream_failures} -ne 0 || ${cached_streams} -ne ${total_streams} || ${cached_streams} -eq 0 ]]; then
+ echo "FAILED (${cached_streams}/${total_streams} streams cached)"
+ error_count=$((error_count + 1))
+ continue
+ fi
# Record in metadata
jq --arg v "${version}" --argjson s "${stream_list_json}" \
'.versions[$v] = {"streams": $s}' "${METADATA}" > "${METADATA}.tmp"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for stream in ${streams}; do | |
| ext_tag="${stream}-extensions" | |
| # Populate sha256_* via --rpmdb-cache | |
| if ! oc adm release info --rpmdb --rpmdb-cache="${CACHE_DIR}/" \ | |
| --rpmdb-image="${stream}" --output=json \ | |
| "${REGISTRY}:${tag}" >/dev/null 2>&1; then | |
| echo -n "(rpmdb FAIL for ${stream}) " | |
| continue | |
| fi | |
| # Resolve extensions image pullspec from release JSON | |
| ext_pull=$(echo "${release_json}" | python3 -c " | |
| import sys, json | |
| data = json.load(sys.stdin) | |
| for t in data['references']['spec']['tags']: | |
| if t['name'] == '${ext_tag}': | |
| print(t['from']['name']) | |
| break | |
| " 2>/dev/null) || true | |
| if [[ -n "${ext_pull}" ]]; then | |
| ext_sha="${ext_pull##*@sha256:}" | |
| ext_cache="${CACHE_DIR}/extensions-${ext_sha}" | |
| if [[ ! -f "${ext_cache}" ]]; then | |
| tmpdir=$(mktemp -d) | |
| if oc image extract "${ext_pull}[-1]" \ | |
| --file="${EXTENSIONS_PATH}" \ | |
| --path="${EXTENSIONS_PATH}:${tmpdir}" 2>/dev/null; then | |
| if [[ -f "${tmpdir}/${EXTENSIONS_PATH}" ]]; then | |
| mv "${tmpdir}/${EXTENSIONS_PATH}" "${ext_cache}" | |
| elif [[ -f "${tmpdir}/extensions.json" ]]; then | |
| mv "${tmpdir}/extensions.json" "${ext_cache}" | |
| fi | |
| else | |
| echo -n "(ext extract FAIL for ${ext_tag}) " | |
| fi | |
| rm -rf "${tmpdir}" | |
| fi | |
| fi | |
| stream_list_json=$(echo "${stream_list_json}" | jq --arg s "${stream}" '. + [$s]') | |
| done | |
| # Record in metadata | |
| jq --arg v "${version}" --argjson s "${stream_list_json}" \ | |
| '.versions[$v] = {"streams": $s}' "${METADATA}" > "${METADATA}.tmp" | |
| mv "${METADATA}.tmp" "${METADATA}" | |
| new_count=$((new_count + 1)) | |
| echo "OK" | |
| echo -n "streams=[${streams}] " | |
| stream_list_json="[]" | |
| total_streams=0 | |
| cached_streams=0 | |
| stream_failures=0 | |
| for stream in ${streams}; do | |
| total_streams=$((total_streams + 1)) | |
| ext_tag="${stream}-extensions" | |
| # Populate sha256_* via --rpmdb-cache | |
| if ! oc adm release info --rpmdb --rpmdb-cache="${CACHE_DIR}/" \ | |
| --rpmdb-image="${stream}" --output=json \ | |
| "${REGISTRY}:${tag}" >/dev/null 2>&1; then | |
| echo -n "(rpmdb FAIL for ${stream}) " | |
| stream_failures=$((stream_failures + 1)) | |
| continue | |
| fi | |
| # Resolve extensions image pullspec from release JSON | |
| ext_pull=$(echo "${release_json}" | python3 -c " | |
| import sys, json | |
| data = json.load(sys.stdin) | |
| for t in data['references']['spec']['tags']: | |
| if t['name'] == '${ext_tag}': | |
| print(t['from']['name']) | |
| break | |
| " 2>/dev/null) || true | |
| if [[ -n "${ext_pull}" ]]; then | |
| ext_sha="${ext_pull##*`@sha256`:}" | |
| ext_cache="${CACHE_DIR}/extensions-${ext_sha}" | |
| if [[ ! -f "${ext_cache}" ]]; then | |
| tmpdir=$(mktemp -d) | |
| if oc image extract "${ext_pull}[-1]" \ | |
| --file="${EXTENSIONS_PATH}" \ | |
| --path="${EXTENSIONS_PATH}:${tmpdir}" 2>/dev/null; then | |
| if [[ -f "${tmpdir}/${EXTENSIONS_PATH}" ]]; then | |
| mv "${tmpdir}/${EXTENSIONS_PATH}" "${ext_cache}" | |
| elif [[ -f "${tmpdir}/extensions.json" ]]; then | |
| mv "${tmpdir}/extensions.json" "${ext_cache}" | |
| fi | |
| else | |
| echo -n "(ext extract FAIL for ${ext_tag}) " | |
| fi | |
| rm -rf "${tmpdir}" | |
| fi | |
| fi | |
| stream_list_json=$(echo "${stream_list_json}" | jq --arg s "${stream}" '. + [$s]') | |
| cached_streams=$((cached_streams + 1)) | |
| done | |
| if [[ ${stream_failures} -ne 0 || ${cached_streams} -ne ${total_streams} || ${cached_streams} -eq 0 ]]; then | |
| echo "FAILED (${cached_streams}/${total_streams} streams cached)" | |
| error_count=$((error_count + 1)) | |
| continue | |
| fi | |
| # Record in metadata | |
| jq --arg v "${version}" --argjson s "${stream_list_json}" \ | |
| '.versions[$v] = {"streams": $s}' "${METADATA}" > "${METADATA}.tmp" | |
| mv "${METADATA}.tmp" "${METADATA}" | |
| new_count=$((new_count + 1)) | |
| echo "OK" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hack/rpmdb-cache-generate.sh` around lines 112 - 163, The metadata commit
currently happens unconditionally after the streams loop (the jq write to
"${METADATA}" and increment of new_count), which records versions even when some
stream caching failed; change the logic in the for stream ... done block to
track success with a boolean/flag (e.g., success=true; set success=false when
the rpmdb check or extension extraction fails — references: variables/steps
ext_pull, ext_cache, the oc adm release info check and the ext extract branch),
and only run the jq --arg v "${version}" ... > "${METADATA}.tmp" mv and
new_count=$((new_count + 1)) when that success flag remains true (otherwise skip
metadata write so partial versions are not marked complete).
6dcf014 to
c05975a
Compare
|
I didn't intend to push the rpmdb cache generation script, sorry about that. I'm running it locally and will see about submitting a new PR once it's finished generating and we can discuss more about how to handle the cache. |
|
@sdodson: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
oc adm release info --image-forcalls by parsing image pullspecs from the release JSON (references.spec.tags[*].from.name) already cached in groupcache under the"releaseinfo"key. BothExecReleaseInfo.ImageReferenceForComponentand theCachingReleaseInfo"imagefor"/"machineosstreams"getters now resolve from cached JSON instead of spawningocsubprocesses.RpmdbOCSlotssemaphore scope so it is only held duringRpmListForStreamandRpmDiffForStream(the calls that spawnocsubprocesses and write to/tmp/rpmdb/).ListMachineOSStreamsandImageReferenceForComponentnow resolve outside the semaphore since they only do groupcache lookups and JSON parsing.Motivation
Profiling the changelog handler for a two-release diff (4.22.0-rc.4 → rc.5, two machine-OS streams) showed:
ListMachineOSStreamsImageReferenceForComponent×2RpmListForStream×2RpmDiffForStream×2Before (original code): semaphore held for ~34s per request (included
--image-foroc calls + duplicaterelease info -o json).After these changes:
On a freshly restarted controller where hundreds of release pages hit the semaphore simultaneously, this reduces per-request slot hold time from ~34s to ~6-12s — roughly 3-5x more concurrent changelog requests can proceed before the 16-slot limit causes 429 responses.
Test plan
go build ./...go test ./pkg/release-controller/... ./pkg/rhcos/...🤖 Generated with Claude Code
Summary by CodeRabbit
Performance
New Features
Documentation
Bug Fixes