
NE-2572: use add/del haproxy api calls#763

Open
jcmoraisjr wants to merge 2 commits into openshift:master from jcmoraisjr:NE-2572-add-del-api

Conversation

@jcmoraisjr
Member

@jcmoraisjr jcmoraisjr commented Apr 10, 2026

DCM currently works by creating blueprint backends and empty backend server slots upfront. These empty slots are later filled with new endpoints during scale-out operations, without the need to fork and reload HAProxy. Scale-in operations work similarly: backend servers are disabled when their endpoints are no longer available.

HAProxy has provided add/del server API calls since version 2.5, which removes the need to create empty slots. This means we no longer need to predict the ideal number of empty servers per backend: backends can be configured with only the currently active endpoints, and they can scale to any number of new backend servers without a reload.
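For reference, the runtime API commands involved can be sketched as simple string builders. This is a minimal illustration with hypothetical helper names, not the PR's actual implementation (the real command builders live in backend.go):

```go
package main

import "fmt"

// addServerCmd builds an HAProxy runtime-API "add server" command
// (available since HAProxy 2.5). All helper names here are hypothetical.
func addServerCmd(backend, name, addr string) string {
	return fmt.Sprintf("add server %s/%s %s", backend, name, addr)
}

// setServerAddrCmd updates the address of an existing server slot.
func setServerAddrCmd(backend, name, addr string) string {
	return fmt.Sprintf("set server %s/%s addr %s", backend, name, addr)
}

// enableHealthCmd turns health checks on for a server.
func enableHealthCmd(backend, name string) string {
	return fmt.Sprintf("enable health %s/%s", backend, name)
}

// delServerCmd removes a server from a backend at runtime.
func delServerCmd(backend, name string) string {
	return fmt.Sprintf("del server %s/%s", backend, name)
}

func main() {
	fmt.Println(addServerCmd("be_http:default:hello", "pod-1", "10.128.0.12:8080"))
	fmt.Println(delServerCmd("be_http:default:hello", "pod-1"))
}
```

Each command is written to the HAProxy stats socket; no fork-and-reload is required for any of them.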

This update is being made in at least two phases:

Phase 1: update the config manager (manager.go) and the client's backend (backend.go) with the minimum changes needed to support the new API calls. The priorities are a fully functional new approach and a code update that is easy to review.

Phase 2: clean up dead code, eventually combining types into a single, coherent, and simplified struct.

Later phases should revisit other parts of the code that can be optimized with the new approach or swept up in the cleanup, such as changing route resources and the handling of the HAProxy lookup maps.

Summary by CodeRabbit

  • New Features

    • Direct runtime backend server management (add/update/delete) with per-server health-check control.
  • Deprecated

    • The --max-dynamic-servers option is deprecated (flag kept for parsing only).
  • Changes

    • Router now emits a dynamic routing cookie key when dynamic management is enabled and no longer generates template-based server-templates; per-service endpoint tracking and endpoint-replacement behavior are improved.
    • New working-directory and default destination CA options for dynamic management.
  • Tests

    • New comprehensive tests for backend dynamic operations (add/update/delete, health checks).

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 10, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 10, 2026

@jcmoraisjr: This pull request references NE-2572 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Removed template-driven HAProxy dynamic-server generation and MaxDynamicServers. Introduced runtime HAProxy server management (add/update/delete/health) via a HAProxyClient interface, threaded backend/service context into ConfigManager APIs, and added per-service EndpointTable plus WorkingDir/DefaultDestinationCA options.
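The HAProxyClient abstraction mentioned above can be sketched as follows; the interface and method signatures here are assumptions for illustration, not the PR's definitions, and the fake client mirrors the idea used by the new tests:

```go
package main

import (
	"fmt"
	"strings"
)

// haproxyClient is a hypothetical stand-in for the exported HAProxyClient
// interface (RunCommand, Execute) described in the walkthrough.
type haproxyClient interface {
	RunCommand(cmd string) (string, error)
}

// fakeClient records commands instead of talking to a real HAProxy
// runtime socket, the same approach as a fake client in unit tests.
type fakeClient struct {
	commands []string
}

func (f *fakeClient) RunCommand(cmd string) (string, error) {
	f.commands = append(f.commands, cmd)
	return "", nil
}

// addServer issues an "add server" runtime command through any client,
// so production code and tests share the same path.
func addServer(c haproxyClient, backend, name, addr string) error {
	_, err := c.RunCommand(fmt.Sprintf("add server %s/%s %s", backend, name, addr))
	return err
}

func main() {
	fake := &fakeClient{}
	_ = addServer(fake, "be_edge_http:default:hello", "pod-1", "10.128.0.12:8080")
	fmt.Println(strings.Join(fake.commands, "\n"))
}
```

Threading an interface rather than a concrete client is what lets TestBackendDynamicUpdate exercise the add/update/delete sequences without a running HAProxy.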

Changes

Runtime dynamic-backend refactor

  • Data Shape / Interface (pkg/router/template/types.go): ServiceAliasConfig adds EndpointTable; ConfigManagerOptions drops MaxDynamicServers and adds WorkingDir and DefaultDestinationCA; the ConfigManager interface's Register now takes backend *ServiceAliasConfig and ReplaceRouteEndpoints takes svc *ServiceUnit; server-template methods removed.
  • HAProxy client API (pkg/router/template/configmanager/haproxy/client.go): added exported HAProxyClient interface (RunCommand, Execute) for runtime HAProxy interactions.
  • Core backend implementation (pkg/router/template/configmanager/haproxy/backend.go): Backend switched to use HAProxyClient; removed exported server FQDN/InitialWeight; added AddServer, UpdateServer, EnableHealthCheck, DisableHealthCheck, DeleteServer and internal helpers to build/execute add/set/del commands; introduced apiType and execCommand validation.
  • Manager logic / orchestration (pkg/router/template/configmanager/haproxy/manager.go): haproxyConfigManager stores backend *ServiceAliasConfig, adds workingDir/defaultDestinationCA, removes dynamic-server name/template tracking, and now applies endpoint diffs by calling DeleteServer/UpdateServer/AddServer directly; aggregates errors with errors.Join.
  • Router wiring (pkg/router/template/router.go): passes backend into dynamicConfigManager.Register, initializes/maintains ServiceAliasConfig.EndpointTable, threads *ServiceUnit into ReplaceRouteEndpoints, refactors dynamic add/replace flows, and simplifies logging.
  • Template removal / emission (images/router/haproxy/conf/haproxy-config.template): removed template-driven server-template/dynamic server emission; emits dynamic-cookie-key {{ $cfg.RoutingKeyName }} when a dynamic manager is present; per-endpoint cookie is only emitted when the dynamic manager is absent.
  • CLI / bootstrap options (pkg/cmd/infra/router/template.go): removed MaxDynamicServers field propagation; replaced the active flag with a deprecated parsing-only flag and now forwards WorkingDir and DefaultDestinationCA into templateplugin.ConfigManagerOptions.
  • Tests (pkg/router/template/configmanager/haproxy/backend_test.go, pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go, pkg/router/router_test.go): added TestBackendDynamicUpdate covering add/update/health/delete sequences with a fake HAProxy client; updated fakeConfigManager and tests to the new Register/ReplaceRouteEndpoints signatures; removed assertions relying on template-driven server-template output.
  • Helpers & misc (pkg/router/template/template_helper.go, go.mod): added exported FirstMatch wrapper; promoted k8s.io/utils to a direct require in go.mod.

Sequence Diagram(s)

sequenceDiagram
    participant Router as Router
    participant CM as ConfigManager
    participant HA as HAProxyRuntime
    Router->>CM: Register(id, backend, route)
    Router->>CM: ReplaceRouteEndpoints(id, svc, oldEPs, newEPs, weight)
    CM->>CM: compute added/modified/deleted endpoints
    CM->>HA: add server <backend> <name> addr <ip> [ssl/verify/...]
    HA-->>CM: OK / error
    CM->>HA: set server <backend>/<name> addr|weight|state ...
    HA-->>CM: OK / error
    CM->>HA: enable/disable health <backend>/<name>
    HA-->>CM: OK / error
    CM->>HA: del server <backend>/<name>
    HA-->>CM: OK / error
    CM-->>Router: result (nil or aggregated error)
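The "compute added/modified/deleted endpoints" step in the diagram can be sketched as follows; the endpoint type and function names are hypothetical stand-ins, not the PR's code:

```go
package main

import "fmt"

// endpoint is a simplified stand-in for the router's Endpoint type.
type endpoint struct {
	ID   string
	IP   string
	Port int
}

// diffEndpoints splits old/new endpoint sets into deleted, modified,
// and added groups keyed by endpoint ID, mirroring the diagram's flow.
func diffEndpoints(oldEPs, newEPs []endpoint) (deleted, modified, added []endpoint) {
	oldByID := make(map[string]endpoint, len(oldEPs))
	for _, ep := range oldEPs {
		oldByID[ep.ID] = ep
	}
	for _, ep := range newEPs {
		if prev, ok := oldByID[ep.ID]; ok {
			if prev != ep {
				modified = append(modified, ep) // same ID, changed addr/port
			}
			delete(oldByID, ep.ID)
		} else {
			added = append(added, ep) // not present before
		}
	}
	for _, ep := range oldByID {
		deleted = append(deleted, ep) // no longer present
	}
	return deleted, modified, added
}

func main() {
	oldEPs := []endpoint{{"a", "10.0.0.1", 80}, {"b", "10.0.0.2", 80}}
	newEPs := []endpoint{{"b", "10.0.0.2", 8080}, {"c", "10.0.0.3", 80}}
	del, mod, add := diffEndpoints(oldEPs, newEPs)
	fmt.Println(len(del), len(mod), len(add)) // 1 1 1
}
```

Deleted entries map to "del server", modified entries to "set server", and added entries to "add server" in the sequence above.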

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Test Structure And Quality (⚠️ Warning): assertions in TestBackendDynamicUpdate lack meaningful failure messages; lines 319-324 have require.NoError, require.EqualError, and assert.Equal calls without messages to help diagnose failures. Resolution: add assertion messages to all require/assert calls, for example require.NoError(t, err, "failed to execute command for case %s", name) and assert.Equal(t, expected, actual, "command mismatch for case %s", name).
✅ Passed checks (10 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the PR title clearly and concisely describes the main change, adoption of HAProxy add/del API calls for dynamic backend server management, which is the primary objective of this changeset.
  • Linked Issues Check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names (✅ Passed): all test names are stable; backend_test.go has 24 static names and router_test.go has 28 static test case names, with no dynamic content detected.
  • Microshift Test Compatibility (✅ Passed): no Ginkgo e2e tests added in this PR; all new tests are standard Go unit tests (TestBackendDynamicUpdate, etc.), so the custom check is not applicable.
  • Single Node OpenShift (SNO) Test Compatibility (✅ Passed): no Ginkgo e2e tests are added; all test changes are standard Go unit tests (testing.T pattern) with no Ginkgo imports or Describe/It/Context patterns.
  • Topology-Aware Scheduling Compatibility (✅ Passed): the PR modifies HAProxy router internals and backend management, not deployment manifests or operator code; no pod scheduling constraints, affinity rules, or topology assumptions are introduced.
  • OTE Binary Stdout Contract (✅ Passed): no new process-level stdout writes detected; pre-existing fmt.Println calls in router_test.go TestMain are test infrastructure (not added by this PR) and are correctly outside production binary scope.
  • IPv6 And Disconnected Network Test Compatibility (✅ Passed): no Ginkgo e2e tests were added; the only new test (TestBackendDynamicUpdate) is a standard Go unit test using testify, so this check is not applicable.






@openshift-ci openshift-ci Bot requested review from davidesalerno and gcs278 on April 10, 2026 23:51

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 374-376: The ca-file path currently uses string(b.name) (HAProxy
backend name) which doesn't match the on-disk PEM name for route-specific
DestinationCACertificate; change the path construction so it uses the
destination CA certificate ID (the key used in cfg.Certificates / the route's
DestinationCACertificate) instead of string(b.name). Concretely, when choosing
the CA file in the conditional that checks defaultDestinationCA and
cfg.Certificates[cfg.Host+"_pod"], build the path with the destination CA ID
variable (use defaultDestinationCA when that's the chosen CA, or the
route-specific DestinationCACertificate ID otherwise) so the
path.Join(workingDir, "router/cacerts", "<destination-ca-id>.pem") matches the
actual written filename. Ensure references to b.name are removed from that
ca-file path logic.

In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 509-519: The code enables health checks for newly added endpoints
but mistakenly re-enables the former single endpoint from entry.activeEndpoints,
which is empty during initial sync; update the block that builds newEPs to
append the former endpoint from oldEndpoints (the pre-scale-up list) instead of
entry.activeEndpoints so the original server gets health checks on a 1→N
scale-up; locate the logic around newEndpoints, addedEndpoints, newEPs and
entry.activeEndpoints in the function in manager.go and replace the use of
entry.activeEndpoints with the appropriate element(s) from oldEndpoints.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 00ae6c70-df1a-4e7b-9f2b-a256506f707a

📥 Commits

Reviewing files that changed from the base of the PR and between 8963907 and 74a07f4.

📒 Files selected for processing (9)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/cmd/infra/router/template.go
  • pkg/router/router_test.go
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/template_helper.go
  • pkg/router/template/types.go

@jcmoraisjr jcmoraisjr force-pushed the NE-2572-add-del-api branch from 74a07f4 to c005e0f on April 14, 2026 14:50
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 14, 2026

@jcmoraisjr: This pull request references NE-2572 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
pkg/router/router_test.go (1)

177-179: Typo in connection info path: "dymmy" should be "dummy".

While this may be pre-existing, since this line is being modified, it's a good opportunity to fix the typo.

✏️ Suggested fix
 		DynamicConfigManager: haproxyconfigmanager.NewHAProxyConfigManager(templateplugin.ConfigManagerOptions{
-			ConnectionInfo: "unix:///var/lib/dymmy",
+			ConnectionInfo: "unix:///var/lib/dummy",
 		}),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/router_test.go` around lines 177 - 179, The ConnectionInfo string
passed into haproxyconfigmanager.NewHAProxyConfigManager via the
ConfigManagerOptions in the DynamicConfigManager setup contains a typo
("unix:///var/lib/dymmy"); update that value to the correct path
("unix:///var/lib/dummy") so the DynamicConfigManager initialization
(haproxyconfigmanager.NewHAProxyConfigManager and
ConfigManagerOptions.ConnectionInfo) uses the proper connection path.
pkg/router/template/configmanager/haproxy/manager.go (1)

562-570: Inconsistent error handling: fail-fast vs aggregated errors.

RemoveRouteEndpoints returns immediately on the first DeleteServer error (fail-fast), while ReplaceRouteEndpoints aggregates all errors using errors.Join. Consider aligning the error handling patterns for consistency.

♻️ Option to align with ReplaceRouteEndpoints pattern
+	var errs []error
 	for _, ep := range endpoints {
 		log.V(4).Info("deleting server for endpoint", "endpoint", ep.ID)
 		if err := backend.DeleteServer(ep); err != nil {
-			return err
+			errs = append(errs, fmt.Errorf("error deleting server %s: %w", ep.ID, err))
 		}
 	}

-	return nil
+	return errors.Join(errs...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/template/configmanager/haproxy/manager.go` around lines 562 - 570,
RemoveRouteEndpoints currently fails fast on the first backend.DeleteServer
error while ReplaceRouteEndpoints aggregates errors with errors.Join; update
RemoveRouteEndpoints to mirror ReplaceRouteEndpoints by collecting each
DeleteServer error into a slice (or []error), continue deleting remaining
endpoints, log per-endpoint failures with log.V(...) as currently done, and
finally return errors.Join(collectedErrs...) (or nil if none) instead of
returning immediately on the first error.
pkg/router/template/types.go (1)

186-191: Add documentation for new configuration fields.

The WorkingDir and DefaultDestinationCA fields have empty doc comments. These should be documented to clarify their purpose.

📝 Suggested documentation
-	//
+	// WorkingDir is the router's working directory containing configuration
+	// files, certificates, and other router-managed resources.
 	WorkingDir string

-	//
+	// DefaultDestinationCA is the path to the default CA certificate file used
+	// to verify backend server certificates for re-encrypt routes when no
+	// route-specific destination CA is configured.
 	DefaultDestinationCA string
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/template/types.go` around lines 186 - 191, Add descriptive doc
comments for the WorkingDir and DefaultDestinationCA struct fields: explain that
WorkingDir is the path the router uses as its runtime working directory for
temporary files/operations, and DefaultDestinationCA is the PEM-encoded CA
certificate (or path/identifier used by your config) applied as the default
trust anchor for outbound TLS destination verification when no per-destination
CA is configured. Update the comments directly above the WorkingDir and
DefaultDestinationCA field declarations in types.go so they succinctly describe
expected value format and purpose for maintainers and users.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 495-497: The current early return when errsEndpoints has entries
aborts the subsequent health-check configuration and can leave backends
inconsistent; update the flow in the function that processes endpoints so that
after collecting errsEndpoints you still call the health-check configuration
routine (the existing health check code or method—e.g., configureHealthChecks /
the block immediately following endpoint processing) for successfully processed
backends, collect any health-check errors, and then return a joined error
containing both errsEndpoints and health-check errors (using errors.Join).
Ensure you do not short-circuit before invoking the health-check configuration
and aggregate all errors before returning.

---

Nitpick comments:
In `@pkg/router/router_test.go`:
- Around line 177-179: The ConnectionInfo string passed into
haproxyconfigmanager.NewHAProxyConfigManager via the ConfigManagerOptions in the
DynamicConfigManager setup contains a typo ("unix:///var/lib/dymmy"); update
that value to the correct path ("unix:///var/lib/dummy") so the
DynamicConfigManager initialization
(haproxyconfigmanager.NewHAProxyConfigManager and
ConfigManagerOptions.ConnectionInfo) uses the proper connection path.

In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 562-570: RemoveRouteEndpoints currently fails fast on the first
backend.DeleteServer error while ReplaceRouteEndpoints aggregates errors with
errors.Join; update RemoveRouteEndpoints to mirror ReplaceRouteEndpoints by
collecting each DeleteServer error into a slice (or []error), continue deleting
remaining endpoints, log per-endpoint failures with log.V(...) as currently
done, and finally return errors.Join(collectedErrs...) (or nil if none) instead
of returning immediately on the first error.

In `@pkg/router/template/types.go`:
- Around line 186-191: Add descriptive doc comments for the WorkingDir and
DefaultDestinationCA struct fields: explain that WorkingDir is the path the
router uses as its runtime working directory for temporary files/operations, and
DefaultDestinationCA is the PEM-encoded CA certificate (or path/identifier used
by your config) applied as the default trust anchor for outbound TLS destination
verification when no per-destination CA is configured. Update the comments
directly above the WorkingDir and DefaultDestinationCA field declarations in
types.go so they succinctly describe expected value format and purpose for
maintainers and users.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: da89419c-e254-4010-955c-9113979cb278

📥 Commits

Reviewing files that changed from the base of the PR and between 74a07f4 and c005e0f.

📒 Files selected for processing (9)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/cmd/infra/router/template.go
  • pkg/router/router_test.go
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/template_helper.go
  • pkg/router/template/types.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/router/template/template_helper.go
  • pkg/cmd/infra/router/template.go
  • pkg/router/template/router.go
  • pkg/router/template/configmanager/haproxy/backend.go

@jcmoraisjr jcmoraisjr force-pushed the NE-2572-add-del-api branch from c005e0f to e157196 on April 14, 2026 19:06
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 14, 2026

@jcmoraisjr: This pull request references NE-2572 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Line 352: The current log.Info call uses a printf-style format ("disabling
backend server %s...") but is invoked as structured logging and therefore never
interpolates the server name; update the call in backend.go to include the
server identifier and error as structured fields instead of a format string
(e.g. replace the log.Info(...) line with a structured call that attaches the
server name/ID and err — using the existing err variable and the server
identifier variable in scope — such as log.With("server", <serverIdentifier>,
"error", err).Info("disabling backend server instead of deleting due to a delete
failure")).
- Around line 347-355: The DeleteServer function currently calls
innerSetServerState and then calls innerDeleteServer but swallows any error from
innerDeleteServer after logging; change DeleteServer to propagate the
innerDeleteServer error instead of silently returning nil (or at minimum log at
V(0)); specifically, in DeleteServer (which calls innerSetServerState and
innerDeleteServer) return the error from innerDeleteServer when it fails (or
upgrade the log to log.V(0).Info if you intentionally want to keep state change
and not fail), so callers see the deletion failure rather than the method
appearing to succeed.

In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 401-407: FindBackend returns a backend struct without populating
its servers map, so before iterating backend.servers you must call
backend.Refresh() and handle its error; specifically, after calling
cm.client.FindBackend(backendName) check the error, then call backend.Refresh()
and return any error, then proceed to log and loop over backend.servers calling
backend.DeleteServer(templaterouter.Endpoint{ID: server.Name}) as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5fc1d692-59ad-4478-bbdf-cfbeae5a7e78

📥 Commits

Reviewing files that changed from the base of the PR and between c005e0f and e157196.

📒 Files selected for processing (9)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/cmd/infra/router/template.go
  • pkg/router/router_test.go
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/template_helper.go
  • pkg/router/template/types.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/router/template/template_helper.go
  • pkg/cmd/infra/router/template.go
  • pkg/router/router_test.go
  • pkg/router/template/router.go

Comment thread pkg/router/template/configmanager/haproxy/backend.go Outdated
Comment thread pkg/router/template/configmanager/haproxy/backend.go Outdated
Comment thread pkg/router/template/configmanager/haproxy/manager.go
@jcmoraisjr jcmoraisjr force-pushed the NE-2572-add-del-api branch from e157196 to e506ef8 Compare April 15, 2026 19:44
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 15, 2026

@jcmoraisjr: This pull request references NE-2572 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

DCM currently works by creating blueprint backends and empty backend server slots upfront. These empty slots are updated later with new endpoints, created on scale-out operations, without the need to fork-and-reload HAProxy. Scale-in operations work in a similar way: backend servers are disabled when endpoints are no longer available.

HAProxy has provided add/del server APIs since 2.5, which brings the benefit of not having to create empty slots. This means we don't need to predict the ideal number of empty servers per backend, so backends can be configured with only the currently active endpoints, and they can scale to any number of new backend servers without the need to reload.

This update is being made in at least two phases:

Phase 1: update the config manager (manager.go) and the client's backend (backend.go) with the minimum changes needed to support the new API calls. The priorities are a fully functional new approach and an easier code update to review.

Phase 2: clean up dead code, eventually combining types into a single, coherent, simplified struct.

Further phases should revisit other parts of the code that can be optimized with the new approach, or continue the code cleanup, such as changing route resources and the handling of the HAProxy lookup maps.

Summary by CodeRabbit

  • New Features

  • Direct runtime backend server management (add/update/delete servers) including per-server health-check control.

  • Deprecated

  • The --max-dynamic-servers option is now deprecated (flag preserved for parsing).

  • Changes

  • Dynamic routing cookie key emitted when dynamic management is enabled; template-driven server-template generation removed in favor of explicit per-endpoint server entries.

  • New working-directory and destination-CA options; per-service endpoint tracking improved and endpoint replacement logic refined.

  • Tests

  • Comprehensive backend dynamic update tests added covering add/update/delete and health-check scenarios.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/router/template/configmanager/haproxy/backend.go (1)

570-571: Unreachable validation case for empty response.

The valid = response == "" check for apiSetHealth, apiSetServerWeight, and apiSetServerState is unreachable since empty responses already return nil at line 558-560. This code path can never set valid = true.

This is functionally correct (non-empty responses will correctly fail validation), but the logic is misleading. Consider simplifying:

♻️ Suggested simplification
 	case apiSetHealth, apiSetServerWeight, apiSetServerState:
-		valid = response == ""
+		// These commands return empty on success; any response indicates an error.
+		valid = false
 	default:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/template/configmanager/haproxy/backend.go` around lines 570 - 571,
The switch case handling apiSetHealth, apiSetServerWeight, and apiSetServerState
contains an unreachable assignment "valid = response == """ because empty
responses are already returned as nil earlier (the early return at the
empty-response check); remove the redundant "valid = response == """ branch from
that case (or remove the entire case and let the default/failure path handle
non-empty responses) so the validation logic in the function no longer contains
misleading/unreachable code while preserving the existing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 570-571: The switch case handling apiSetHealth,
apiSetServerWeight, and apiSetServerState contains an unreachable assignment
"valid = response == """ because empty responses are already returned as nil
earlier (the early return at the empty-response check); remove the redundant
"valid = response == """ branch from that case (or remove the entire case and
let the default/failure path handle non-empty responses) so the validation logic
in the function no longer contains misleading/unreachable code while preserving
the existing behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 20f3c29b-9f87-45cd-a87f-669758923b4a

📥 Commits

Reviewing files that changed from the base of the PR and between e157196 and e506ef8.

📒 Files selected for processing (11)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/cmd/infra/router/template.go
  • pkg/router/router_test.go
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/backend_test.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/client.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/template_helper.go
  • pkg/router/template/types.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/router/template/template_helper.go
  • pkg/cmd/infra/router/template.go
  • pkg/router/router_test.go
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/manager.go

DCM currently works by creating blueprint backends and empty backend
server slots upfront. These empty slots are updated later with new
endpoints, created on scale-out operations, without the need to
fork-and-reload HAProxy. Scale-in operations work in a similar way:
backend servers are disabled when endpoints are no longer available.

HAProxy has provided add/del server APIs since 2.5, which brings the
benefit of not having to create empty slots. This means we don't need
to predict the ideal number of empty servers per backend, so backends
can be configured with only the currently active endpoints, and they
can scale to any number of new backend servers without the need to
reload.

This update is being made in at least two phases:

Phase 1: update the config manager (manager.go) and the client's
backend (backend.go) with the minimum changes needed to support the
new API calls. The priorities are a fully functional new approach and
an easier code update to review.

Phase 2: clean up dead code, eventually combining types into a single,
coherent, simplified struct.

Further phases should revisit other parts of the code that can be
optimized with the new approach, or continue the code cleanup, such as
changing route resources and the handling of the HAProxy lookup maps.
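
For reference, the runtime API calls this change relies on can be sketched as follows. This is a minimal illustration, not the router's actual code: the backend and server names are hypothetical, and the real config manager builds richer server options (TLS, cookies, health checks).

```go
package main

import "fmt"

// addServerCmd builds an HAProxy runtime API "add server" command.
// Since HAProxy 2.5 this registers a new backend server at runtime,
// so no pre-allocated empty slot is needed.
func addServerCmd(backend, server, addr string) string {
	return fmt.Sprintf("add server %s/%s %s", backend, server, addr)
}

// delServerCmd builds the matching "del server" command. The server
// must be in maintenance mode and drained before it can be removed.
func delServerCmd(backend, server string) string {
	return fmt.Sprintf("del server %s/%s", backend, server)
}

func main() {
	fmt.Println(addServerCmd("be_http:default:app", "pod:1", "10.128.0.5:8080"))
	fmt.Println(delServerCmd("be_http:default:app", "pod:1"))
}
```

Since the commands are issued against the running process, scale-out and scale-in no longer require predicting slot counts up front.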
@jcmoraisjr jcmoraisjr force-pushed the NE-2572-add-del-api branch from e506ef8 to a94c31f Compare April 15, 2026 20:15
@bentito
Contributor

bentito commented Apr 22, 2026

/assign @bentito @davidesalerno

Contributor

@davidesalerno davidesalerno left a comment

The changes look almost good, even if I have some feedback to share that could help us improve the final work.

return err
}
if err := b.innerDeleteServer(ep); err != nil {
log.Info("disabling backend server instead of deleting due to a delete failure", "server", ep.ID, "error", err.Error())
Contributor

We are returning nil even when innerDeleteServer() fails, so callers cannot distinguish “server was drained/put in maint” from “server was fully removed from HAProxy”.
This makes error handling less explicit and can hide partial cleanup failures.
If keeping the backend in maint is intentional to avoid reloads, I’d suggest either returning a typed/non-fatal error to the caller or documenting this degraded-success contract in the method comment so the behavior is explicit to maintainers.

Member Author

This is an expected scenario, but your concern and comments make sense. Just changed the method signature to return whether a removal fully succeeded or not, added doc, improved test, but ignored from the caller since we don't need this state right now.
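
A minimal sketch of what such a signature could look like, with hypothetical names and function-valued parameters standing in for the real HAProxy calls: a nil error means the endpoint no longer receives traffic, while the boolean reports whether the server was actually removed rather than left in maintenance mode.

```go
package main

import (
	"errors"
	"fmt"
)

var errInFlight = errors.New("server has in-flight connections")

// deleteServer drains a server and tries to remove it, reporting
// whether the removal fully succeeded. On delete failure the server
// is left in maintenance mode: a degraded success, surfaced via the
// bool so callers can distinguish the two outcomes.
func deleteServer(setMaint func() error, del func() error) (removed bool, err error) {
	if err := setMaint(); err != nil {
		return false, err
	}
	if err := del(); err != nil {
		// Server stays in maint and stops receiving traffic;
		// callers may ignore this state or schedule a reload later.
		return false, nil
	}
	return true, nil
}

func main() {
	removed, err := deleteServer(
		func() error { return nil },
		func() error { return errInFlight },
	)
	fmt.Println(removed, err)
}
```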

return nil, fmt.Errorf("no server found for id: %s", id)
}

func (b *Backend) AddServer(cfg *templaterouter.ServiceAliasConfig, svc *templaterouter.ServiceUnit, ep templaterouter.Endpoint, weight int32, workingDir, defaultDestinationCA string) error {
Contributor

I think it could be useful to add short documentation to all the exposed/public functions of the backend struct.

Member Author

Sure makes sense. Just added.

valid = response == "Server deleted."
case apiSetServerAddr:
valid = response == "nothing changed" || strings.HasPrefix(response, "IP changed from ") || strings.HasPrefix(response, "port changed from ") || strings.HasPrefix(response, "no need to change ")
case apiSetHealth, apiSetServerWeight, apiSetServerState:
Contributor

Minor readability nit: this branch seems to be unreachable as written, because empty responses already return at the top of execCommand().

The behavior is correct, but the switch suggests those commands can still validate response == "" here. I’d simplify this case so non-empty responses for those commands always fall through to the generic error path; that makes the success contract clearer.

Member Author

In fact we need those entries just here; otherwise they would panic as invalid IDs. So I changed it to a hard-coded false instead and added a comment, wondering if it sounds better now.

func (b *Backend) innerAddServer(cfg *templaterouter.ServiceAliasConfig, svc *templaterouter.ServiceUnit, ep templaterouter.Endpoint, weight int32, workingDir, defaultDestinationCA string) error {
// This should always follow the template, changes here should be reflected there, both regular and passthrough backends
//
// TODO: either read this configuration from the template, or instead make the template read from here.
Contributor

I think the TODO here is worth treating as real follow-up feedback: the server definition is now duplicated between the template and innerAddServer(), which is fragile for a code path this sensitive. Any future change to reencrypt options, proto h2, health checks, cookies, or maxconn has to be updated in two places and kept byte-for-byte aligned.
I’d strongly prefer a single source of truth soon after this lands—either generate the runtime command from shared logic, or have the template consume the same data/renderer so static and dynamic backends cannot drift.

Member Author

Sure, just added https://redhat.atlassian.net/browse/NE-2646 to handle this.

@jcmoraisjr
Member Author

/test e2e-upgrade

}
// Failed due to already existing server left behind, in maintenance mode, due to in-flight connections.
// Let's just give it another chance to be deleted.
if err := b.innerDeleteServer(ep); err != nil {
Contributor

Should we first put the server into maint mode (like DeleteServer does)?

Member Author

This is not the case here. The only scenario that makes sense is a server already in maint that previously failed to be deleted due to established connections; now we're trying to remove it again, and it should fail, by returning an error, if there are still active connections and the only option is to reload.

delete(modifiedEndpoints, relatedEndpointID)
if len(oldEndpoints) == 1 {
// enabling also for the former single endpoint as well
newEPs = append(newEPs, oldEndpoints[0])
Contributor

I think this ignores modifiedEndpoints. If an existing endpoint was modified in a way that requires its health check state to change, it will be missed, no?

Member Author

This condition is only to catch the former "single endpoint" which by design doesn't have its health check enabled. Modifications don't disable health checks, and additions are already covered elsewhere.

Contributor

@bentito bentito left a comment

Adversarial review findings

// passthrough is a TCP listener and does not use ssl or proto related config
}

if !ep.NoHealthCheck {
Contributor

If we remove this if !ep.NoHealthCheck, the health check parameters are always initialized. As-is, I think HAProxy rejects future dynamic enable health commands for this endpoint because the check was never configured.

Member Author

Enable health shouldn't happen since NoHealthCheck denies it, so we should be good. Anyway you're right, in case NoHealthCheck changes we'd need a reload just to configure check and inter, and a dynamically added endpoint already starts with health check disabled. Lemme better explore this, I think we can in fact drop this condition and always add check and inter. Thanks for the heads up.

Member Author

Addressed this one. Note that I also removed NoHealthCheck from the backend tests since it is not used in this part of the client.
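
The resolution above can be sketched as follows, with a hypothetical helper standing in for the router's template and runtime-command logic: the server arguments always carry check and inter, so a later runtime health-check toggle can succeed without a reload, regardless of the endpoint's NoHealthCheck state at creation time.

```go
package main

import "fmt"

// serverArgs builds the address and health-check portion of a server
// definition. The "check" and "inter" parameters are always present,
// because a server whose check was never configured cannot have its
// health check enabled dynamically later.
func serverArgs(addr string, interMS int) string {
	return fmt.Sprintf("%s check inter %dms", addr, interMS)
}

func main() {
	fmt.Println(serverArgs("10.128.0.5:8080", 5000))
}
```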

@jcmoraisjr jcmoraisjr force-pushed the NE-2572-add-del-api branch from 6f2571a to 80b8c12 Compare May 5, 2026 12:31
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/router/template/configmanager/haproxy/client.go (1)

27-30: 💤 Low value

Add a godoc comment to the exported HAProxyClient interface.

This is the public abstraction that Backend now depends on, but it has no documentation. Per Go convention (and tools like revive/staticcheck), exported types should carry a doc comment that begins with the type name.

Suggested doc comment
+// HAProxyClient is the minimal client surface used by Backend to issue
+// HAProxy dynamic config API commands. RunCommand applies an optional
+// response Converter, while Execute returns the raw response bytes.
 type HAProxyClient interface {
 	RunCommand(cmd string, converter Converter) ([]byte, error)
 	Execute(cmd string) ([]byte, error)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/configmanager/haproxy/client.go` around lines 27 - 30,
Add a Godoc comment for the exported HAProxyClient interface that begins with
"HAProxyClient" and briefly describes its purpose as the public abstraction used
by Backend to run and execute HAProxy commands; place the comment immediately
above the type declaration and mention the primary methods RunCommand(cmd
string, converter Converter) and Execute(cmd string) so readers/tools understand
it wraps command execution and conversion behavior.
pkg/router/template/configmanager/haproxy/backend.go (1)

12-12: 💤 Low value

Use the repo-wide routev1 alias for consistency.

The rest of the codebase (e.g., pkg/router/template/types.go, the new backend_test.go in this PR, and other configmanager files) imports this package as routev1. Using v1 here makes greps/symbol navigation for routev1.TLSTermination* skip this file.

Suggested change
-	v1 "github.com/openshift/api/route/v1"
+	routev1 "github.com/openshift/api/route/v1"
 	templaterouter "github.com/openshift/router/pkg/router/template"

And at the switch on lines 381–403:

-	case v1.TLSTerminationReencrypt:
+	case routev1.TLSTerminationReencrypt:
 		...
-	case "", v1.TLSTerminationEdge:
+	case "", routev1.TLSTerminationEdge:
 		...
-	case v1.TLSTerminationPassthrough:
+	case routev1.TLSTerminationPassthrough:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/configmanager/haproxy/backend.go` at line 12, The import
of "github.com/openshift/api/route/v1" uses the alias v1 but the repo standard
is routev1; change the import alias to routev1 and update all usages (e.g., the
switch handling TLSTermination types that currently references
v1.TLSTermination*) to use routev1.TLSTerminationEdge,
routev1.TLSTerminationReencrypt, routev1.TLSTerminationPassthrough (and any
other routev1.* symbols) so symbol lookups and greps match the rest of the
codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 310-323: The code in the add-server retry path currently detects
the "already exists" condition by doing strings.Contains(err.Error(), "Already
exists a server "), which relies on an undocumented HAProxy error message;
update the code by (1) adding a clear explanatory comment immediately above that
strings.Contains check in the add-server block (near innerAddServer /
innerDeleteServer calls) that documents the dependency on that exact HAProxy
message and references the unit test "should retry adding insecure server" in
backend_test.go, and (2) encapsulate the match in a small helper (e.g.,
isAlreadyExistsError(err)) so the detection logic and its documentation live in
one place for easier future updates or replacement with a version-specific
check.

---

Nitpick comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Line 12: The import of "github.com/openshift/api/route/v1" uses the alias v1
but the repo standard is routev1; change the import alias to routev1 and update
all usages (e.g., the switch handling TLSTermination types that currently
references v1.TLSTermination*) to use routev1.TLSTerminationEdge,
routev1.TLSTerminationReencrypt, routev1.TLSTerminationPassthrough (and any
other routev1.* symbols) so symbol lookups and greps match the rest of the
codebase.

In `@pkg/router/template/configmanager/haproxy/client.go`:
- Around line 27-30: Add a Godoc comment for the exported HAProxyClient
interface that begins with "HAProxyClient" and briefly describes its purpose as
the public abstraction used by Backend to run and execute HAProxy commands;
place the comment immediately above the type declaration and mention the primary
methods RunCommand(cmd string, converter Converter) and Execute(cmd string) so
readers/tools understand it wraps command execution and conversion behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e144d092-a57a-401b-94e5-c1a0c83f707f

📥 Commits

Reviewing files that changed from the base of the PR and between e506ef8 and 80b8c12.

📒 Files selected for processing (12)
  • go.mod
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/cmd/infra/router/template.go
  • pkg/router/router_test.go
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/backend_test.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/client.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/template_helper.go
  • pkg/router/template/types.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/router/router_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/router/template/template_helper.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/cmd/infra/router/template.go
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go

Comment on lines +310 to +323
if err := b.innerAddServer(cfg, svc, ep, weight, workingDir, defaultDestinationCA); err != nil {
if !strings.Contains(err.Error(), "Already exists a server ") {
return err
}
// Failed due to already existing server left behind, in maintenance mode, due to in-flight connections.
// Let's just give it another chance to be deleted.
if err := b.innerDeleteServer(ep); err != nil {
// No way, need to fail which will ask for a fork-and-reload. This will leave the existing connections in the old process.
return err
}
if err := b.innerAddServer(cfg, svc, ep, weight, workingDir, defaultDestinationCA); err != nil {
return err
}
}
@coderabbitai coderabbitai Bot May 5, 2026

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

HAProxy "add server" CLI dynamic API "Already exists a server" exact response wording across 2.5, 2.6, 2.8, 3.0

💡 Result:

No sources provide the exact wording of the "Already exists a server" error response for the HAProxy "add server" CLI command across versions 2.5, 2.6, 2.8, or 3.0. Official documentation states that the server name cannot already be in use in the backend [1][2][3], and design discussions confirm a check for "server name already registered for the backend" [4]. Source code snippets do not reveal the precise CLI rejection message [5][6][7]. In 3.0, a different error "'enabled' option is not accepted for dynamic server" is shown for invalid keywords [1]. The feature was experimental in 2.5 [8][9][10], stabilized in later versions. No version-specific differences in the error wording were found; it likely rejects with a message indicating the server already exists, but exact phrasing remains unconfirmed in available sources.

Citations:


🏁 Script executed:

# Search for test files related to backend.go
fd -e '*test\.go' | rg -i 'backend' | head -20

Repository: openshift/router

Length of output: 42


🏁 Script executed:

# Search for references to "Already exists" or innerAddServer in the codebase
rg -i "already exists" --type go -A 3 -B 3

Repository: openshift/router

Length of output: 50372


🏁 Script executed:

# Search for innerAddServer and innerDeleteServer implementations
rg "func.*innerAddServer|func.*innerDeleteServer" --type go -A 10

Repository: openshift/router

Length of output: 2419


🏁 Script executed:

# Find test files in the haproxy directory
find pkg/router/template/configmanager/haproxy -name "*test.go" -o -name "*_test.go"

Repository: openshift/router

Length of output: 356


🏁 Script executed:

# Get the full innerAddServer implementation
sed -n '/func.*innerAddServer/,/^func/p' pkg/router/template/configmanager/haproxy/backend.go | head -60

Repository: openshift/router

Length of output: 2393


🏁 Script executed:

# Search for where the "Already exists a server" message might be checked
rg "innerAddServer|innerDeleteServer" --type go -B 2 -A 15 | head -100

Repository: openshift/router

Length of output: 7207


🏁 Script executed:

# Check if backend_test.go has tests for the "Already exists a server" scenario
rg "Already exists|AddServer" pkg/router/template/configmanager/haproxy/backend_test.go -A 10 -B 2

Repository: openshift/router

Length of output: 1003


🏁 Script executed:

# Search for any other references to AddServer method in test files
rg "AddServer" pkg/router/template/configmanager/haproxy/ --type go -B 2 -A 10

Repository: openshift/router

Length of output: 10284


Error message match relies on undocumented HAProxy behavior — consider documenting or asserting test coverage.

The retry path depends on strings.Contains(err.Error(), "Already exists a server ") to detect when a server already exists. This substring is not part of HAProxy's official API documentation, and no version-specific guarantees exist across 2.5, 2.6, 2.8, and 3.0. While a unit test in backend_test.go (test case "should retry adding insecure server") already validates this error path, consider adding an explanatory comment documenting the reliance on this specific HAProxy response wording to protect against silent degradation if the message changes in future versions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/configmanager/haproxy/backend.go` around lines 310 - 323,
The code in the add-server retry path currently detects the "already exists"
condition by doing strings.Contains(err.Error(), "Already exists a server "),
which relies on an undocumented HAProxy error message; update the code by (1)
adding a clear explanatory comment immediately above that strings.Contains check
in the add-server block (near innerAddServer / innerDeleteServer calls) that
documents the dependency on that exact HAProxy message and references the unit
test "should retry adding insecure server" in backend_test.go, and (2)
encapsulate the match in a small helper (e.g., isAlreadyExistsError(err)) so the
detection logic and its documentation live in one place for easier future
updates or replacement with a version-specific check.

Member Author

Good point. This is however an unfortunate consequence of how the HAProxy API is designed, and our whole client code is designed to handle HAProxy particularities. This is not only related to the responses you pointed out, but also to behaviors that depend on the current state and what changes. We are already dealing with those particularities on two fronts: ensuring correct implementation via unit tests, as you already found, and ensuring correct behavior via e2e tests that cover changes in messages and behaviors.


@jcmoraisjr, that makes sense — thank you for the context. Relying on HAProxy's response strings is an inherent aspect of the dynamic API, and having both unit tests for implementation correctness and e2e tests to catch any upstream behavioral or message changes is a solid two-layer safety net for this kind of coupling. I'll close out the comment.


📝 Learnings were identified and not saved (knowledge base disabled).

@jcmoraisjr
Member Author

/retest-required

@openshift-ci
Contributor

openshift-ci Bot commented May 6, 2026

@jcmoraisjr: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@davidesalerno
Contributor

/approve

@openshift-ci
Contributor

openshift-ci Bot commented May 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidesalerno

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label May 8, 2026

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants