NE-2572: use add/del haproxy api calls #763
jcmoraisjr wants to merge 2 commits into openshift:master
Conversation
@jcmoraisjr: This pull request references NE-2572 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Note: Reviews paused. This branch appears to be under active development, so CodeRabbit has automatically paused its review to avoid an influx of comments on new commits. This behavior is configurable, and reviews can be managed with CodeRabbit's PR commands.
Walkthrough: Removed template-driven HAProxy dynamic-server generation and MaxDynamicServers. Introduced runtime HAProxy server management (add/update/delete/health) via a HAProxyClient interface, threaded backend/service context into ConfigManager APIs, and added a per-service EndpointTable plus WorkingDir/DefaultDestinationCA options.
Changes: runtime dynamic-backend refactor.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Router as Router
    participant CM as ConfigManager
    participant HA as HAProxyRuntime
    Router->>CM: Register(id, backend, route)
    Router->>CM: ReplaceRouteEndpoints(id, svc, oldEPs, newEPs, weight)
    CM->>CM: compute added/modified/deleted endpoints
    CM->>HA: add server <backend> <name> addr <ip> [ssl/verify/...]
    HA-->>CM: OK / error
    CM->>HA: set server <backend>/<name> addr|weight|state ...
    HA-->>CM: OK / error
    CM->>HA: enable/disable health <backend>/<name>
    HA-->>CM: OK / error
    CM->>HA: del server <backend>/<name>
    HA-->>CM: OK / error
    CM-->>Router: result (nil or aggregated error)
```
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~45 minutes.
🚥 Pre-merge checks: ✅ 10 passed | ❌ 2 failed (2 warnings).
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 374-376: The ca-file path currently uses string(b.name) (HAProxy
backend name) which doesn't match the on-disk PEM name for route-specific
DestinationCACertificate; change the path construction so it uses the
destination CA certificate ID (the key used in cfg.Certificates / the route's
DestinationCACertificate) instead of string(b.name). Concretely, when choosing
the CA file in the conditional that checks defaultDestinationCA and
cfg.Certificates[cfg.Host+"_pod"], build the path with the destination CA ID
variable (use defaultDestinationCA when that's the chosen CA, or the
route-specific DestinationCACertificate ID otherwise) so the
path.Join(workingDir, "router/cacerts", "<destination-ca-id>.pem") matches the
actual written filename. Ensure references to b.name are removed from that
ca-file path logic.
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 509-519: The code enables health checks for newly added endpoints
but mistakenly re-enables the former single endpoint from entry.activeEndpoints,
which is empty during initial sync; update the block that builds newEPs to
append the former endpoint from oldEndpoints (the pre-scale-up list) instead of
entry.activeEndpoints so the original server gets health checks on a 1→N
scale-up; locate the logic around newEndpoints, addedEndpoints, newEPs and
entry.activeEndpoints in the function in manager.go and replace the use of
entry.activeEndpoints with the appropriate element(s) from oldEndpoints.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 00ae6c70-df1a-4e7b-9f2b-a256506f707a
📒 Files selected for processing (9)
- images/router/haproxy/conf/haproxy-config.template
- pkg/cmd/infra/router/template.go
- pkg/router/router_test.go
- pkg/router/template/configmanager/haproxy/backend.go
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
- pkg/router/template/configmanager/haproxy/manager.go
- pkg/router/template/router.go
- pkg/router/template/template_helper.go
- pkg/router/template/types.go
Force-pushed: 74a07f4 to c005e0f
Actionable comments posted: 1
🧹 Nitpick comments (3)
pkg/router/router_test.go (1)
177-179: Typo in connection info path: "dymmy" should be "dummy". While this may be pre-existing, since this line is being modified, it's a good opportunity to fix the typo.
✏️ Suggested fix

```diff
 DynamicConfigManager: haproxyconfigmanager.NewHAProxyConfigManager(templateplugin.ConfigManagerOptions{
-	ConnectionInfo: "unix:///var/lib/dymmy",
+	ConnectionInfo: "unix:///var/lib/dummy",
 }),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/router_test.go` around lines 177 - 179, The ConnectionInfo string passed into haproxyconfigmanager.NewHAProxyConfigManager via the ConfigManagerOptions in the DynamicConfigManager setup contains a typo ("unix:///var/lib/dymmy"); update that value to the correct path ("unix:///var/lib/dummy") so the DynamicConfigManager initialization (haproxyconfigmanager.NewHAProxyConfigManager and ConfigManagerOptions.ConnectionInfo) uses the proper connection path.
pkg/router/template/configmanager/haproxy/manager.go (1)
562-570: Inconsistent error handling: fail-fast vs aggregated errors.
`RemoveRouteEndpoints` returns immediately on the first `DeleteServer` error (fail-fast), while `ReplaceRouteEndpoints` aggregates all errors using `errors.Join`. Consider aligning the error handling patterns for consistency.
♻️ Option to align with ReplaceRouteEndpoints pattern

```diff
+	var errs []error
 	for _, ep := range endpoints {
 		log.V(4).Info("deleting server for endpoint", "endpoint", ep.ID)
 		if err := backend.DeleteServer(ep); err != nil {
-			return err
+			errs = append(errs, fmt.Errorf("error deleting server %s: %w", ep.ID, err))
 		}
 	}
-	return nil
+	return errors.Join(errs...)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/template/configmanager/haproxy/manager.go` around lines 562 - 570, RemoveRouteEndpoints currently fails fast on the first backend.DeleteServer error while ReplaceRouteEndpoints aggregates errors with errors.Join; update RemoveRouteEndpoints to mirror ReplaceRouteEndpoints by collecting each DeleteServer error into a slice (or []error), continue deleting remaining endpoints, log per-endpoint failures with log.V(...) as currently done, and finally return errors.Join(collectedErrs...) (or nil if none) instead of returning immediately on the first error.
pkg/router/template/types.go (1)
186-191: Add documentation for new configuration fields. The `WorkingDir` and `DefaultDestinationCA` fields have empty doc comments. These should be documented to clarify their purpose.
📝 Suggested documentation

```diff
-	//
+	// WorkingDir is the router's working directory containing configuration
+	// files, certificates, and other router-managed resources.
 	WorkingDir string
-	//
+	// DefaultDestinationCA is the path to the default CA certificate file used
+	// to verify backend server certificates for re-encrypt routes when no
+	// route-specific destination CA is configured.
 	DefaultDestinationCA string
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/template/types.go` around lines 186 - 191, Add descriptive doc comments for the WorkingDir and DefaultDestinationCA struct fields: explain that WorkingDir is the path the router uses as its runtime working directory for temporary files/operations, and DefaultDestinationCA is the PEM-encoded CA certificate (or path/identifier used by your config) applied as the default trust anchor for outbound TLS destination verification when no per-destination CA is configured. Update the comments directly above the WorkingDir and DefaultDestinationCA field declarations in types.go so they succinctly describe expected value format and purpose for maintainers and users.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 495-497: The current early return when errsEndpoints has entries
aborts the subsequent health-check configuration and can leave backends
inconsistent; update the flow in the function that processes endpoints so that
after collecting errsEndpoints you still call the health-check configuration
routine (the existing health check code or method—e.g., configureHealthChecks /
the block immediately following endpoint processing) for successfully processed
backends, collect any health-check errors, and then return a joined error
containing both errsEndpoints and health-check errors (using errors.Join).
Ensure you do not short-circuit before invoking the health-check configuration
and aggregate all errors before returning.
---
Nitpick comments:
In `@pkg/router/router_test.go`:
- Around line 177-179: The ConnectionInfo string passed into
haproxyconfigmanager.NewHAProxyConfigManager via the ConfigManagerOptions in the
DynamicConfigManager setup contains a typo ("unix:///var/lib/dymmy"); update
that value to the correct path ("unix:///var/lib/dummy") so the
DynamicConfigManager initialization
(haproxyconfigmanager.NewHAProxyConfigManager and
ConfigManagerOptions.ConnectionInfo) uses the proper connection path.
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 562-570: RemoveRouteEndpoints currently fails fast on the first
backend.DeleteServer error while ReplaceRouteEndpoints aggregates errors with
errors.Join; update RemoveRouteEndpoints to mirror ReplaceRouteEndpoints by
collecting each DeleteServer error into a slice (or []error), continue deleting
remaining endpoints, log per-endpoint failures with log.V(...) as currently
done, and finally return errors.Join(collectedErrs...) (or nil if none) instead
of returning immediately on the first error.
In `@pkg/router/template/types.go`:
- Around line 186-191: Add descriptive doc comments for the WorkingDir and
DefaultDestinationCA struct fields: explain that WorkingDir is the path the
router uses as its runtime working directory for temporary files/operations, and
DefaultDestinationCA is the PEM-encoded CA certificate (or path/identifier used
by your config) applied as the default trust anchor for outbound TLS destination
verification when no per-destination CA is configured. Update the comments
directly above the WorkingDir and DefaultDestinationCA field declarations in
types.go so they succinctly describe expected value format and purpose for
maintainers and users.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: da89419c-e254-4010-955c-9113979cb278
📒 Files selected for processing (9)
- images/router/haproxy/conf/haproxy-config.template
- pkg/cmd/infra/router/template.go
- pkg/router/router_test.go
- pkg/router/template/configmanager/haproxy/backend.go
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
- pkg/router/template/configmanager/haproxy/manager.go
- pkg/router/template/router.go
- pkg/router/template/template_helper.go
- pkg/router/template/types.go
✅ Files skipped from review due to trivial changes (1)
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/router/template/template_helper.go
- pkg/cmd/infra/router/template.go
- pkg/router/template/router.go
- pkg/router/template/configmanager/haproxy/backend.go
Force-pushed: c005e0f to e157196
@jcmoraisjr: This pull request references NE-2572 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Line 352: The current log.Info call uses a printf-style format ("disabling
backend server %s...") but is invoked as structured logging and therefore never
interpolates the server name; update the call in backend.go to include the
server identifier and error as structured fields instead of a format string
(e.g. replace the log.Info(...) line with a structured call that attaches the
server name/ID and err — using the existing err variable and the server
identifier variable in scope — such as log.With("server", <serverIdentifier>,
"error", err).Info("disabling backend server instead of deleting due to a delete
failure")).
- Around line 347-355: The DeleteServer function currently calls
innerSetServerState and then calls innerDeleteServer but swallows any error from
innerDeleteServer after logging; change DeleteServer to propagate the
innerDeleteServer error instead of silently returning nil (or at minimum log at
V(0)); specifically, in DeleteServer (which calls innerSetServerState and
innerDeleteServer) return the error from innerDeleteServer when it fails (or
upgrade the log to log.V(0).Info if you intentionally want to keep state change
and not fail), so callers see the deletion failure rather than the method
appearing to succeed.
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 401-407: FindBackend returns a backend struct without populating
its servers map, so before iterating backend.servers you must call
backend.Refresh() and handle its error; specifically, after calling
cm.client.FindBackend(backendName) check the error, then call backend.Refresh()
and return any error, then proceed to log and loop over backend.servers calling
backend.DeleteServer(templaterouter.Endpoint{ID: server.Name}) as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5fc1d692-59ad-4478-bbdf-cfbeae5a7e78
📒 Files selected for processing (9)
- images/router/haproxy/conf/haproxy-config.template
- pkg/cmd/infra/router/template.go
- pkg/router/router_test.go
- pkg/router/template/configmanager/haproxy/backend.go
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
- pkg/router/template/configmanager/haproxy/manager.go
- pkg/router/template/router.go
- pkg/router/template/template_helper.go
- pkg/router/template/types.go
✅ Files skipped from review due to trivial changes (1)
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/router/template/template_helper.go
- pkg/cmd/infra/router/template.go
- pkg/router/router_test.go
- pkg/router/template/router.go
Force-pushed: e157196 to e506ef8
🧹 Nitpick comments (1)
pkg/router/template/configmanager/haproxy/backend.go (1)
570-571: Unreachable validation case for empty response. The `valid = response == ""` check for `apiSetHealth`, `apiSetServerWeight`, and `apiSetServerState` is unreachable, since empty responses already return `nil` at lines 558-560; this code path can never set `valid = true`. This is functionally correct (non-empty responses will correctly fail validation), but the logic is misleading. Consider simplifying:
♻️ Suggested simplification
```diff
 case apiSetHealth, apiSetServerWeight, apiSetServerState:
-	valid = response == ""
+	// These commands return empty on success; any response indicates an error.
+	valid = false
 default:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/template/configmanager/haproxy/backend.go` around lines 570 - 571, The switch case handling apiSetHealth, apiSetServerWeight, and apiSetServerState contains an unreachable assignment "valid = response == """ because empty responses are already returned as nil earlier (the early return at the empty-response check); remove the redundant "valid = response == """ branch from that case (or remove the entire case and let the default/failure path handle non-empty responses) so the validation logic in the function no longer contains misleading/unreachable code while preserving the existing behavior.
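The success contract under discussion can be sketched as a standalone validator. The response strings mirror the ones visible in this review's diffs, but the command identifiers and the function itself are hypothetical simplifications of the real `execCommand` logic in `backend.go`:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical command identifiers mirroring the apiDeleteServer/apiSet*
// constants referenced in the review.
const (
	apiDeleteServer  = "delete-server"
	apiSetServerAddr = "set-server-addr"
	apiSetHealth     = "set-health"
)

// validResponse reports whether a non-empty HAProxy response indicates
// success for the given command. As in the reviewed code, empty responses
// are assumed to have been treated as success before this point, so the
// set-health case is a hard-coded false rather than `response == ""`.
func validResponse(cmd, response string) bool {
	switch cmd {
	case apiDeleteServer:
		return response == "Server deleted."
	case apiSetServerAddr:
		return response == "nothing changed" ||
			strings.HasPrefix(response, "IP changed from ") ||
			strings.HasPrefix(response, "port changed from ")
	case apiSetHealth:
		// Returns empty on success; any output means an error.
		return false
	}
	return false
}

func main() {
	fmt.Println(validResponse(apiDeleteServer, "Server deleted."))
	fmt.Println(validResponse(apiSetHealth, "unexpected output"))
}
```

Keeping the commands listed in the switch, even with a hard-coded false, documents which commands are expected to answer with an empty response.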
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 570-571: The switch case handling apiSetHealth,
apiSetServerWeight, and apiSetServerState contains an unreachable assignment
"valid = response == """ because empty responses are already returned as nil
earlier (the early return at the empty-response check); remove the redundant
"valid = response == """ branch from that case (or remove the entire case and
let the default/failure path handle non-empty responses) so the validation logic
in the function no longer contains misleading/unreachable code while preserving
the existing behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 20f3c29b-9f87-45cd-a87f-669758923b4a
📒 Files selected for processing (11)
- images/router/haproxy/conf/haproxy-config.template
- pkg/cmd/infra/router/template.go
- pkg/router/router_test.go
- pkg/router/template/configmanager/haproxy/backend.go
- pkg/router/template/configmanager/haproxy/backend_test.go
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
- pkg/router/template/configmanager/haproxy/client.go
- pkg/router/template/configmanager/haproxy/manager.go
- pkg/router/template/router.go
- pkg/router/template/template_helper.go
- pkg/router/template/types.go
🚧 Files skipped from review as they are similar to previous changes (6)
- pkg/router/template/template_helper.go
- pkg/cmd/infra/router/template.go
- pkg/router/router_test.go
- images/router/haproxy/conf/haproxy-config.template
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
- pkg/router/template/configmanager/haproxy/manager.go
DCM currently works by creating blueprint backends and empty backend server slots upfront. These empty slots are updated later with new endpoints, created on scale-out operations, without the need to fork-and-reload haproxy. Scale-in operations work in a similar way: backend servers are disabled when endpoints are no longer available.

HAProxy has had add/del server APIs since 2.5, which removes the need to create empty slots. This means we don't need to predict the ideal number of empty servers per backend, so backends can be configured with only the currently active endpoints, and they can scale to any number of new backend servers without a reload.

This update is being made in at least two phases:

Phase 1: update the config manager (manager.go) and the client's backend (backend.go) the minimum needed to support the new API calls. The priorities are a fully functional new approach and an easier code update to review.

Phase 2: clean up dead code, eventually combining types into a single, coherent and simplified struct.

Further phases should revisit other parts of the code that can be optimized with the new approach or the cleanup, like changing route resources and the handling of the haproxy lookup maps.
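As a rough sketch of what the new approach sends over the HAProxy runtime API (command syntax per HAProxy 2.5+'s `add server`/`del server`; the helper names and backend/server names below are illustrative, not the PR's actual code):

```go
package main

import "fmt"

// addServerCmd builds an HAProxy runtime "add server" command. Dynamically
// added servers start in maintenance mode and must be enabled separately
// (e.g. "set server <backend>/<name> state ready").
func addServerCmd(backend, name, addr string, weight int32) string {
	return fmt.Sprintf("add server %s/%s %s weight %d", backend, name, addr, weight)
}

// delServerCmd builds the matching "del server" command; HAProxy only
// deletes servers that are in maintenance mode with no active connections.
func delServerCmd(backend, name string) string {
	return fmt.Sprintf("del server %s/%s", backend, name)
}

func main() {
	fmt.Println(addServerCmd("be_http:demo:app", "pod-1", "10.128.0.5:8080", 256))
	fmt.Println(delServerCmd("be_http:demo:app", "pod-1"))
}
```

In the real router these strings are written to HAProxy's stats/runtime socket, so no reload is needed, which is the point of the phase 1 change.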
Force-pushed: e506ef8 to a94c31f
/assign @bentito @davidesalerno
davidesalerno left a comment:
The changes look almost good, even if I have some feedback to share that could help us improve the final work.
```go
	return err
}
if err := b.innerDeleteServer(ep); err != nil {
	log.Info("disabling backend server instead of deleting due to a delete failure", "server", ep.ID, "error", err.Error())
```
We are returning nil even when innerDeleteServer() fails, so callers cannot distinguish “server was drained/put in maint” from “server was fully removed from HAProxy”.
This makes error handling less explicit and can hide partial cleanup failures.
If keeping the backend in maint is intentional to avoid reloads, I’d suggest either returning a typed/non-fatal error to the caller or documenting this degraded-success contract in the method comment so the behavior is explicit to maintainers.
This is an expected scenario, but your concern and comments make sense. Just changed the method signature to return whether a removal fully succeeded or not, added doc, improved test, but ignored from the caller since we don't need this state right now.
```go
	return nil, fmt.Errorf("no server found for id: %s", id)
}

func (b *Backend) AddServer(cfg *templaterouter.ServiceAliasConfig, svc *templaterouter.ServiceUnit, ep templaterouter.Endpoint, weight int32, workingDir, defaultDestinationCA string) error {
```
I think it could be useful to add some short documentation to all the exposed/public functions of the backend struct.
Sure makes sense. Just added.
```go
	valid = response == "Server deleted."
case apiSetServerAddr:
	valid = response == "nothing changed" || strings.HasPrefix(response, "IP changed from ") || strings.HasPrefix(response, "port changed from ") || strings.HasPrefix(response, "no need to change ")
case apiSetHealth, apiSetServerWeight, apiSetServerState:
```
Minor readability nit: this branch seems to be unreachable as written, because empty responses already return at the top of execCommand().
The behavior is correct, but the switch suggests those commands can still validate response == "" here. I’d simplify this case so non-empty responses for those commands always fall through to the generic error path; that makes the success contract clearer.
In fact we need those entries just here, otherwise it would panic as invalid IDs. So I changed it to a hard coded false instead and added a comment, wondering if it sounds better now.
```go
func (b *Backend) innerAddServer(cfg *templaterouter.ServiceAliasConfig, svc *templaterouter.ServiceUnit, ep templaterouter.Endpoint, weight int32, workingDir, defaultDestinationCA string) error {
	// This should always follow the template, changes here should be reflected there, both regular and passthrough backends
	//
	// TODO: either read this configuration from the template, or instead make the template read from here.
```
I think the TODO here is worth treating as real follow-up feedback: the server definition is now duplicated between the template and innerAddServer(), which is fragile for a code path this sensitive. Any future change to reencrypt options, proto h2, health checks, cookies, or maxconn has to be updated in two places and kept byte-for-byte aligned.
I’d strongly prefer a single source of truth soon after this lands—either generate the runtime command from shared logic, or have the template consume the same data/renderer so static and dynamic backends cannot drift.
Sure, just added https://redhat.atlassian.net/browse/NE-2646 to handle this.
/test e2e-upgrade
```go
}
// Failed due to already existing server left behind, in maintenance mode, due to in-flight connections.
// Let's just give it another chance to be deleted.
if err := b.innerDeleteServer(ep); err != nil {
```
Should we first put the server into maint mode (like DeleteServer does)?
This is not the case here, the only scenario that makes sense is an already maint server that failed to be deleted before due to established connections, now we're trying to remove it again, and it should fail if there are still active connections and the only option is to reload - by returning an error.
```go
delete(modifiedEndpoints, relatedEndpointID)
if len(oldEndpoints) == 1 {
	// enabling also for the former single endpoint as well
	newEPs = append(newEPs, oldEndpoints[0])
```
I think this ignores modifiedEndpoints. If an existing endpoint was modified in a way that requires its health check state to change, it will be missed, no?
This condition is only to catch the former "single endpoint" which by design doesn't have its health check enabled. Modifications don't disable health checks, and additions are already covered elsewhere.
bentito left a comment:
Adversarial review findings
```go
	// passthrough is a TCP listener and does not use ssl or proto related config
}

if !ep.NoHealthCheck {
```
If we remove this if !ep.NoHealthCheck, the health check parameters are always initialized. As-is, I think HAProxy rejects future dynamic enable health commands for this endpoint because the check was never configured.
Enable health shouldn't happen since NoHealthCheck denies it, so we should be good. Anyway you're right, in case NoHealthCheck changes we'd need a reload just to configure check and inter, and a dynamically added endpoint already starts with health check disabled. Lemme better explore this, I think we can in fact drop this condition and always add check and inter. Thanks for the heads up.
Addressed this one. Note that I also removed NoHealthCheck from the backend tests since it is not used in this part of the client.
Force-pushed: 6f2571a to 80b8c12
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/router/template/configmanager/haproxy/client.go (1)
27-30: 💤 Low value. Add a godoc comment to the exported `HAProxyClient` interface.
This is the public abstraction that `Backend` now depends on, but it has no documentation. Per Go convention (and tools like `revive`/`staticcheck`), exported types should carry a doc comment that begins with the type name.
Suggested doc comment

```diff
+// HAProxyClient is the minimal client surface used by Backend to issue
+// HAProxy dynamic config API commands. RunCommand applies an optional
+// response Converter, while Execute returns the raw response bytes.
 type HAProxyClient interface {
 	RunCommand(cmd string, converter Converter) ([]byte, error)
 	Execute(cmd string) ([]byte, error)
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/template/configmanager/haproxy/client.go` around lines 27 - 30, Add a Godoc comment for the exported HAProxyClient interface that begins with "HAProxyClient" and briefly describes its purpose as the public abstraction used by Backend to run and execute HAProxy commands; place the comment immediately above the type declaration and mention the primary methods RunCommand(cmd string, converter Converter) and Execute(cmd string) so readers/tools understand it wraps command execution and conversion behavior.
pkg/router/template/configmanager/haproxy/backend.go (1)
12-12: 💤 Low value. Use the repo-wide `routev1` alias for consistency.
The rest of the codebase (e.g., pkg/router/template/types.go, the new backend_test.go in this PR, and other configmanager files) imports this package as `routev1`. Using `v1` here makes greps/symbol navigation for `routev1.TLSTermination*` skip this file.
Suggested change

```diff
-	v1 "github.com/openshift/api/route/v1"
+	routev1 "github.com/openshift/api/route/v1"
 	templaterouter "github.com/openshift/router/pkg/router/template"
```

And at the switch on lines 381-403:

```diff
-	case v1.TLSTerminationReencrypt:
+	case routev1.TLSTerminationReencrypt:
 	...
-	case "", v1.TLSTerminationEdge:
+	case "", routev1.TLSTerminationEdge:
 	...
-	case v1.TLSTerminationPassthrough:
+	case routev1.TLSTerminationPassthrough:
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/template/configmanager/haproxy/backend.go` at line 12, The import of "github.com/openshift/api/route/v1" uses the alias v1 but the repo standard is routev1; change the import alias to routev1 and update all usages (e.g., the switch handling TLSTermination types that currently references v1.TLSTermination*) to use routev1.TLSTerminationEdge, routev1.TLSTerminationReencrypt, routev1.TLSTerminationPassthrough (and any other routev1.* symbols) so symbol lookups and greps match the rest of the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 310-323: The code in the add-server retry path currently detects
the "already exists" condition by doing strings.Contains(err.Error(), "Already
exists a server "), which relies on an undocumented HAProxy error message;
update the code by (1) adding a clear explanatory comment immediately above that
strings.Contains check in the add-server block (near innerAddServer /
innerDeleteServer calls) that documents the dependency on that exact HAProxy
message and references the unit test "should retry adding insecure server" in
backend_test.go, and (2) encapsulate the match in a small helper (e.g.,
isAlreadyExistsError(err)) so the detection logic and its documentation live in
one place for easier future updates or replacement with a version-specific
check.
---
Nitpick comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Line 12: The import of "github.com/openshift/api/route/v1" uses the alias v1
but the repo standard is routev1; change the import alias to routev1 and update
all usages (e.g., the switch handling TLSTermination types that currently
references v1.TLSTermination*) to use routev1.TLSTerminationEdge,
routev1.TLSTerminationReencrypt, routev1.TLSTerminationPassthrough (and any
other routev1.* symbols) so symbol lookups and greps match the rest of the
codebase.
In `@pkg/router/template/configmanager/haproxy/client.go`:
- Around line 27-30: Add a Godoc comment for the exported HAProxyClient
interface that begins with "HAProxyClient" and briefly describes its purpose as
the public abstraction used by Backend to run and execute HAProxy commands;
place the comment immediately above the type declaration and mention the primary
methods RunCommand(cmd string, converter Converter) and Execute(cmd string) so
readers/tools understand it wraps command execution and conversion behavior.
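A sketch of what that Godoc and interface could look like follows; the `Converter` type, both return types, and the `fakeClient` demo are assumptions for illustration only, and the real package may declare them differently:

```go
package main

import "fmt"

// Converter is assumed here to be a response-transforming function; the
// real package may define it differently.
type Converter func([]byte) ([]byte, error)

// HAProxyClient is the public abstraction Backend uses to run and execute
// HAProxy commands: RunCommand executes a command and converts its
// response, while Execute runs a command only for its side effect.
// (Signatures here are illustrative guesses.)
type HAProxyClient interface {
	RunCommand(cmd string, converter Converter) ([]byte, error)
	Execute(cmd string) error
}

// fakeClient is a trivial in-memory implementation for the demo.
type fakeClient struct{ log []string }

func (f *fakeClient) RunCommand(cmd string, c Converter) ([]byte, error) {
	f.log = append(f.log, cmd)
	return c([]byte("response"))
}

func (f *fakeClient) Execute(cmd string) error {
	f.log = append(f.log, cmd)
	return nil
}

func main() {
	var client HAProxyClient = &fakeClient{}
	out, err := client.RunCommand("show info", func(b []byte) ([]byte, error) { return b, nil })
	fmt.Println(string(out), err) // response <nil>
}
```

Leading the doc comment with the interface name keeps it discoverable by godoc tooling, which is the substance of the review request.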
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e144d092-a57a-401b-94e5-c1a0c83f707f
📒 Files selected for processing (12)
- go.mod
- images/router/haproxy/conf/haproxy-config.template
- pkg/cmd/infra/router/template.go
- pkg/router/router_test.go
- pkg/router/template/configmanager/haproxy/backend.go
- pkg/router/template/configmanager/haproxy/backend_test.go
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
- pkg/router/template/configmanager/haproxy/client.go
- pkg/router/template/configmanager/haproxy/manager.go
- pkg/router/template/router.go
- pkg/router/template/template_helper.go
- pkg/router/template/types.go
✅ Files skipped from review due to trivial changes (1)
- pkg/router/router_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
- pkg/router/template/template_helper.go
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
- pkg/cmd/infra/router/template.go
- images/router/haproxy/conf/haproxy-config.template
- pkg/router/template/configmanager/haproxy/manager.go
- pkg/router/template/router.go
```go
if err := b.innerAddServer(cfg, svc, ep, weight, workingDir, defaultDestinationCA); err != nil {
	if !strings.Contains(err.Error(), "Already exists a server ") {
		return err
	}
	// Failed due to already existing server left behind, in maintenance mode, due to in-flight connections.
	// Let's just give it another chance to be deleted.
	if err := b.innerDeleteServer(ep); err != nil {
		// No way, need to fail which will ask for a fork-and-reload. This will leave the existing connections in the old process.
		return err
	}
	if err := b.innerAddServer(cfg, svc, ep, weight, workingDir, defaultDestinationCA); err != nil {
		return err
	}
}
```
🧩 Analysis chain
🌐 Web query:
HAProxy "add server" CLI dynamic API "Already exists a server" exact response wording across 2.5, 2.6, 2.8, 3.0
💡 Result:
No sources provide the exact wording of the "Already exists a server" error response for the HAProxy "add server" CLI command across versions 2.5, 2.6, 2.8, or 3.0. Official documentation states that the server name cannot already be in use in the backend [1][2][3], and design discussions confirm a check for "server name already registered for the backend" [4]. Source code snippets do not reveal the precise CLI rejection message [5][6][7]. In 3.0, a different error "'enabled' option is not accepted for dynamic server" is shown for invalid keywords [1]. The feature was experimental in 2.5 [8][9][10], stabilized in later versions. No version-specific differences in the error wording were found; it likely rejects with a message indicating the server already exists, but exact phrasing remains unconfirmed in available sources.
Citations:
- 1: https://www.haproxy.com/documentation/haproxy-runtime-api/reference/add-server/
- 2: http://docs.haproxy.org/2.5/management.html
- 3: http://docs.haproxy.org/2.5/configuration.html
- 4: Add new server during runtime via cli haproxy/haproxy#1136
- 5: https://github.com/haproxy/haproxy/blob/0f95e730/src/server_state.c
- 6: https://github.com/haproxy/haproxy/blob/master/src/backend.c
- 7: https://github.com/haproxy/haproxy/blob/0f95e730/src/lb_map.c
- 8: cli: segfault with the "add server" command haproxy/haproxy#1865
- 9: Segfault in add server operation haproxy/haproxy#1481
- 10: 2.5 Dynamic server capabilities accessible in lua haproxy/haproxy#1501
🏁 Script executed:

```shell
# Search for test files related to backend.go
fd -e '*test\.go' | rg -i 'backend' | head -20
```

Repository: openshift/router
Length of output: 42

🏁 Script executed:

```shell
# Search for references to "Already exists" or innerAddServer in the codebase
rg -i "already exists" --type go -A 3 -B 3
```

Repository: openshift/router
Length of output: 50372

🏁 Script executed:

```shell
# Search for innerAddServer and innerDeleteServer implementations
rg "func.*innerAddServer|func.*innerDeleteServer" --type go -A 10
```

Repository: openshift/router
Length of output: 2419

🏁 Script executed:

```shell
# Find test files in the haproxy directory
find pkg/router/template/configmanager/haproxy -name "*test.go" -o -name "*_test.go"
```

Repository: openshift/router
Length of output: 356

🏁 Script executed:

```shell
# Get the full innerAddServer implementation
sed -n '/func.*innerAddServer/,/^func/p' pkg/router/template/configmanager/haproxy/backend.go | head -60
```

Repository: openshift/router
Length of output: 2393

🏁 Script executed:

```shell
# Search for where the "Already exists a server" message might be checked
rg "innerAddServer|innerDeleteServer" --type go -B 2 -A 15 | head -100
```

Repository: openshift/router
Length of output: 7207

🏁 Script executed:

```shell
# Check if backend_test.go has tests for the "Already exists a server" scenario
rg "Already exists|AddServer" pkg/router/template/configmanager/haproxy/backend_test.go -A 10 -B 2
```

Repository: openshift/router
Length of output: 1003

🏁 Script executed:

```shell
# Search for any other references to AddServer method in test files
rg "AddServer" pkg/router/template/configmanager/haproxy/ --type go -B 2 -A 10
```

Repository: openshift/router
Length of output: 10284
Error message match relies on undocumented HAProxy behavior — consider documenting or asserting test coverage.
The retry path depends on strings.Contains(err.Error(), "Already exists a server ") to detect when a server already exists. This substring is not part of HAProxy's official API documentation, and no version-specific guarantees exist across 2.5, 2.6, 2.8, and 3.0. While a unit test in backend_test.go (test case "should retry adding insecure server") already validates this error path, consider adding an explanatory comment documenting the reliance on this specific HAProxy response wording to protect against silent degradation if the message changes in future versions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/router/template/configmanager/haproxy/backend.go` around lines 310 - 323,
The code in the add-server retry path currently detects the "already exists"
condition by doing strings.Contains(err.Error(), "Already exists a server "),
which relies on an undocumented HAProxy error message; update the code by (1)
adding a clear explanatory comment immediately above that strings.Contains check
in the add-server block (near innerAddServer / innerDeleteServer calls) that
documents the dependency on that exact HAProxy message and references the unit
test "should retry adding insecure server" in backend_test.go, and (2)
encapsulate the match in a small helper (e.g., isAlreadyExistsError(err)) so the
detection logic and its documentation live in one place for easier future
updates or replacement with a version-specific check.
Good point. This is, however, an unfortunate consequence of how the HAProxy API is designed, and our whole client code is built to handle HAProxy's particularities. This is not only about the responses you pointed out, but also about behaviors that depend on the current state and on what changes. We already deal with those particularities on two fronts: ensuring correct implementation via unit tests, as you already found, and ensuring correct behavior via e2e tests that cover changes in messages and behaviors.
@jcmoraisjr, that makes sense — thank you for the context. Relying on HAProxy's response strings is an inherent aspect of the dynamic API, and having both unit tests for implementation correctness and e2e tests to catch any upstream behavioral or message changes is a solid two-layer safety net for this kind of coupling. I'll close out the comment.
📝 Learnings were identified and not saved (knowledge base disabled).
/retest-required
@jcmoraisjr: all tests passed! Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidesalerno

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
DCM currently works by creating blueprint backends and empty backend server slots upfront. These empty slots are later filled with new endpoints created on scale-out operations, without the need to fork-and-reload HAProxy. Scale-in operations work in a similar way: backend servers are disabled when their endpoints are no longer available.
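For context, the pre-allocation approach means the rendered configuration carries idle slots in every dynamically managed backend, along the lines of this simplified sketch (the backend name, server names, and addresses are illustrative, not taken from the actual template):

```
backend be_http:demo:app
  # active endpoint
  server pod-abc123 10.128.0.12:8080 weight 256 check
  # pre-created empty slots, kept disabled until a scale-out assigns them
  server _dynamic-pod-1 172.4.0.4:8765 weight 0 disabled
  server _dynamic-pod-2 172.4.0.4:8765 weight 0 disabled
```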
HAProxy has had add/del server APIs since 2.5, which brings the benefit of not having to create empty slots. This means we don't need to predict the ideal number of empty servers per backend, so backends can be configured with only the currently active endpoints, and they can scale to any number of new backend servers without a reload.
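To illustrate the runtime calls this enables (command spellings follow the HAProxy management socket documentation for 2.5+; the helper functions and the backend/server names below are hypothetical, not the PR's actual code):

```go
package main

import "fmt"

// addServerCommand builds the runtime API command that registers a new
// server in an existing backend without a reload (HAProxy >= 2.5). The
// server is created in maintenance mode and must be enabled separately.
func addServerCommand(backend, name, addr string, port, weight int) string {
	return fmt.Sprintf("add server %s/%s %s:%d weight %d", backend, name, addr, port, weight)
}

// delServerCommand builds the command that removes a stopped server.
func delServerCommand(backend, name string) string {
	return fmt.Sprintf("del server %s/%s", backend, name)
}

func main() {
	fmt.Println(addServerCommand("be_http:demo:app", "pod-abc123", "10.128.0.12", 8080, 256))
	// add server be_http:demo:app/pod-abc123 10.128.0.12:8080 weight 256
	fmt.Println(delServerCommand("be_http:demo:app", "pod-abc123"))
	// del server be_http:demo:app/pod-abc123
}
```

Since a freshly added dynamic server starts in maintenance, a follow-up `set server <backend>/<name> state ready` (or `enable server`) is needed before it receives traffic.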
This update is being made in at least two phases:
Phase 1: update the config manager (manager.go) and the client's backend (backend.go) with the minimum changes needed to support the new API calls. The priorities are a fully functional new approach and a code change that is easy to review.
Phase 2: clean up dead code, eventually combining types into a single, coherent, and simplified struct.
Further phases should revisit other parts of the code that can be optimized with the new approach, or continue the code cleanup, such as changing route resources and the handling of the HAProxy lookup maps.
Summary by CodeRabbit
New Features
Deprecated
Changes
Tests