feat(acm): add ACM certificate management feature by the-technat · Pull Request #4554 · kubernetes-sigs/aws-load-balancer-controller

the-technat · 2026-01-27T12:56:32Z

Issue

Description

This PR adds support for automatically provisioning ACM certificates based on ingress annotations.

An initial design idea is described in the issue: #2509 (comment).

How it works: see the user docs added in this PR for a description of the feature.

I focused on Ingress objects to keep the change "small", but I think most of the Synthesizer / Manager could be reused to implement the same thing for the Gateway API, possibly as an alternative approach to #4494 (skipping cert-manager and managing the certificate directly).

Internals

Some internals not mentioned in the user-facing docs:

- Because there is a delay between requesting a certificate and its issuance, the controller waits a fixed (currently not user-configurable) amount of time for the certificate to become issued. This is implemented similarly to the wait behavior of the ELBv2 service.
  - If the timeout expires before the certificate is issued, the reconciliation fails with an error and is retried according to the controller's standard retry mechanism.
  - A subsequent reconciliation attempt discovers the already requested certificate and waits another round for it to become issued, unless that certificate has exceeded a certain age (currently 5 minutes), in which case it is recreated.
- Certificate state tracking is based on the Tagging Manager interface that is also used for other resources, so orphaned certificates are cleaned up as part of the PostSynthesize phase. Because there is a delay between switching a listener to a new certificate and the listener actually releasing the previously used one, deleting an orphaned certificate is retried a few times if it is still in use.
- Certificate validation (only required for Amazon-issued certificates) is only supported using the DNS method with Route53. The validation records are created automatically and pruned when a certificate is deleted. Since multiple certificates can use the same CNAME record, "not found" and "other value" errors are ignored: another certificate not managed by the controller that has the same domains / CA will use the exact same CNAME record for validation, just with another value.
- Modifying an Ingress object so that its set of hosts changes triggers a request for a new certificate with more Subject Alternative Names.

Tests

I added unit tests where possible and feasible.

All cases have been manually tested multiple times in a real environment using EKS, PCA, ACM and an intermediate build of the AWS Load Balancer Controller.

API rate limiting

To visualize the impact of this feature on API requests, I tried to collect the number of API requests that occur per reconciliation attempt of one Ingress object.

I took the rates from the ACM API rate quotas page and identified the following operations as used by this feature: RequestCertificate, ListTagsForCertificate, ListCertificates, DeleteCertificate and DescribeCertificate. Each of these operations has a rate limit of either 5 or 10 requests per second.

Here's how often they are called:

- ListTagsForCertificate & ListCertificates: one request every minute to rebuild the in-memory cache in the Certificate Discovery part & the ACM Tagging Manager.
- RequestCertificate: at most once per reconciliation. If an existing matching certificate is found, it is used and the controller waits for its issuance rather than requesting another certificate.
- DescribeCertificate: used in 3 different places:
  - DNS validation records: after requesting a certificate, we wait for it to provide the DNS record values. This uses a retry mechanism with a request every 5s up to a 30s timeout -> at most 6 requests with a 5s delay in between.
  - Certificate waiter: waits for the certificate to be issued, with a timeout of 5m and a minDelay between requests of 60s, increasing exponentially up to a maxDelay of 120s.
  - Deleting certificates: one request per certificate deletion attempt is made to obtain the validation record values that need to be cleaned up.
- DeleteCertificate: retries the deletion if a request fails, with a retry interval of 5s and a total timeout of 30s -> at most 6 requests per certificate with a 5s delay in between.

In addition to the ACM API rate quotas, the Route53 ones might also be relevant.

These rates are taken from the Route53 API request limits page.

Here's how often they are called:

- ListHostedZones: one request every 5 minutes to rebuild the in-memory cache in the Route53 service.
- ChangeResourceRecordSets: one request per certificate issuance request and one request per certificate deletion attempt.

Checklist

Added tests that cover your change (if possible)
Added/modified documentation as required (such as the README.md, or the docs directory)
Manually tested
Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

Backfilled missing tests for code in same general area 🎉
Refactored something and made the world a better place 🌟

k8s-ci-robot · 2026-01-27T12:56:42Z

Hi @the-technat. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 27, 2026

k8s-ci-robot requested review from M00nF1sh and wweiwei-li January 27, 2026 12:56

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 27, 2026

k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 27, 2026

the-technat force-pushed the issue/2509 branch 23 times, most recently from 1fbc015 to 8fd5564 Compare January 29, 2026 14:49

the-technat added 30 commits March 10, 2026 14:30

chore: add new algorithms for ContainsSubMap and IsDiffStringSlice

d470a74

feat: add route53 client to cloud interface

ba955cd

feat: add ownerTags filter to certificate discovery mechanism

5d741f0

chore: add controller flags & annotation names

aa8313d

feat: add user-docs for acm certificate management

02f6b34

feat: add more functionality to the ACM service

a057b62

feat: add model builder for ACM certificates

bf38406

feat: use certificate model builder if feature flag is active

1cc903e

chore: adjust existing use of certificates to use core.StringToken for certificate references

d4e9aa1

feat: add certificate manager

9ce97be

feat: add tagging manager for acm

8bbd285

feat: add certificate synthesizer

59fee79

chore: use certificate synthesizer in stack deployer

e967ed2

chore: go mod tidy

252911a

fix: existing tests need to be adapted for core.StringToken cert references

6bcc3f5

chore: add tests for acm tagging manager

8102420

chore: add test for certificate discovery filter tags

88ee1b5

fix: use constanst for describe cert retry intervall/timeout & lower certificate tags cache expiry

72b5850

fix: remove cert discovery from certificate synthesizer

e4e4679

chore: add tests for certificate synthesizer, remove empty testfile for certificate manager

9ff8918

fix: reissueWaitTime exceeding doesn't actually recreate the certificate

3ff471d

chore: move feature flag to feature gates

7f0aea2

fix: remove unexecuted code

0475bc8

fix: add dependencies for listener correctly & set validation timeout to 30 minutes

7cc2037

chore: add E2E test for certificate mgmt feature

bc1d0ae

fix: retry route53 creation also when no validation records present at start

d5f0c0d

chore: add feature gate to helm feature gates

962b249

fix: Synthesize test needs to return non-empty validation options

c83b231

fix: cert discovery should ignore all certs with stack tags

229232c

fix: use new structure for feature gate

49b0436

the-technat wants to merge 30 commits into kubernetes-sigs:main from the-technat:issue/2509
