Skip to content

fix(parametric): retry loop for flaky Python adaptive sampling tests#6951

Draft
bm1549 wants to merge 2 commits into
mainfrom
brian.marks/python-adaptive-sampling-retry-loops
Draft

fix(parametric): retry loop for flaky Python adaptive sampling tests#6951
bm1549 wants to merge 2 commits into
mainfrom
brian.marks/python-adaptive-sampling-retry-loops

Conversation

@bm1549
Copy link
Copy Markdown
Contributor

@bm1549 bm1549 commented May 15, 2026

Motivation

Three Python TestDynamicConfigSamplingRules tests have been flagged as flaky for months (APMAPI-857, APMAPI-1051, APMAPI-1860). Vlad Scherbich diagnosed the race in early 2026: set_and_wait_rc ACKs new RC config before the tracer's apm-tracing.rc event has propagated to the sampler, so the next span assertion can race against stale rules.

Vlad's commits 5d9142e51 and 7932d7c2f fixed two of these tests (test_remote_sampling_rules_retention and test_trace_sampling_rules_with_tags) by adding a retry loop that polls _dd.rule_psr until it matches the expected rate. This PR applies the same pattern to the two remaining flaky tests and unflags the third (already covered by Vlad's earlier fix).

Cross-language context: this is the Python piece of a multi-tracer adaptive-sampling investigation tracked locally at ~/dd/polyglot-tracer-reports/adaptive-sampling/01-investigation.md.

Changes

tests/parametric/test_dynamic_configuration.py — wrap four post-set_and_wait_rc assertion blocks in retry loops:

  • test_trace_sampling_rules_override_env (2 assertion blocks)
  • test_trace_sampling_rules_override_rate (2 assertion blocks)

Each loop matches Vlad's pattern: max_propagation_retries = 30, time.sleep(0.1) per iteration, breaks on rate match, asserts as before.

manifests/python.yml — removes three flaky entries now resolved:

  • test_remote_sampling_rules_retention: flaky (APMAPI-857, APMAPI-1860) (already fixed by Vlad)
  • test_trace_sampling_rules_override_env: flaky (APMAPI-1860)
  • test_trace_sampling_rules_override_rate: flaky (APMAPI-1051)

Workflow

  1. Draft until CI passes
  2. Mark ready once green

Reviewer checklist

  • No code or scenario changes — test + manifest only
  • No docker base image changes
  • No scenario added, removed, or renamed

…ests

The set_and_wait_rc helper ACKs new RC config before the tracer's
apm-tracing.rc event has propagated to the sampler, so an assertion
immediately after set_and_wait_rc can race against stale rules.

Vlad Scherbich fixed this for test_remote_sampling_rules_retention and
test_trace_sampling_rules_with_tags in commits 5d9142e and 7932d7c
by adding a retry loop that waits for _dd.rule_psr to match the
expected rate. Applies the same pattern to:

- TestDynamicConfigSamplingRules::test_trace_sampling_rules_override_env
  (APMAPI-1860)
- TestDynamicConfigSamplingRules::test_trace_sampling_rules_override_rate
  (APMAPI-1051)

Also unflags test_remote_sampling_rules_retention since Vlad's earlier
fix already covers it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bm1549 bm1549 added the ai-generated The pull request includes a significant amount of AI-generated code label May 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

CODEOWNERS have been resolved as:

manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
tests/parametric/test_dynamic_configuration.py                          @DataDog/system-tests-core @DataDog/apm-sdk-capabilities

Lines 838 and 909 reassigned `trace = None` inside the retry loops, but
mypy had narrowed `trace` to `list[Span]` after the earlier assignments
from get_sampled_trace(). Add explicit `list[Span] | None` annotations
matching the pattern Vlad used in test_remote_sampling_rules_retention
(and the one already present on line 813 of the same test).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-generated The pull request includes a significant amount of AI-generated code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant