Skip to content

CSS v1.2.0: Phase A — parametric coverage additions #6938

Draft
ichinaski wants to merge 17 commits into main from inigo/css-spec-coverage

@ichinaski ichinaski commented May 14, 2026

First slice of system-tests coverage for the gaps identified in the CSS v1.2.0 status report.

CI is green on the last commit (411 passed / 0 real failures / 47 skipped).

What's added

Four parametric tests in tests/parametric/test_library_tracestats.py:

| Test | Spec | What it asserts |
|---|---|---|
| test_http_method_endpoint_TS011 | §5 ClientGroupedStats | HTTPMethod and HTTPEndpoint are populated from http.method / http.endpoint / http.route span meta |
| test_payload_metadata_TS012 | §3 ClientStatsPayload | Hostname, Env, Version, RuntimeID, Sequence are populated; Service is present either at payload level or in any ClientGroupedStats |
| test_agent_populated_fields_empty_TS013 | §3 ClientStatsPayload | ContainerID, Tags, ImageTag, AgentAggregation, ProcessTagsHash are absent or empty when the payload leaves the tracer (these are agent-populated) |
| test_partial_version_excluded_TS014 | §7 Span Exclusions | Spans with _dd.partial_version set do not contribute to stats |
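
The TS013 and TS014 assertions can be sketched roughly as below. This is a minimal illustration, not the code in this PR. It assumes the payload has already been decoded into plain dicts keyed by the spec's field names, with buckets under `Stats` and grouped entries under each bucket's own `Stats` key. The helper names and the example payload are hypothetical.

```python
AGENT_POPULATED_FIELDS = (
    "ContainerID",
    "Tags",
    "ImageTag",
    "AgentAggregation",
    "ProcessTagsHash",
)


def agent_fields_left_unset(payload: dict) -> bool:
    """TS013-style check: every agent-populated field is missing or falsy."""
    return all(not payload.get(field) for field in AGENT_POPULATED_FIELDS)


def resources_with_stats(payload: dict) -> set:
    """Collect the Resource of every grouped-stats entry across all buckets."""
    return {
        group.get("Resource")
        for bucket in payload.get("Stats", [])
        for group in bucket.get("Stats", [])
    }


# Hypothetical decoded payload: one bucket holding only the control span.
example = {
    "Hostname": "test-host",
    "Tags": [],  # an empty list counts as unset
    "Stats": [{"Stats": [{"Resource": "control", "Hits": 1}]}],
}
```

A TS014-style test would then assert that the control span's resource is in `resources_with_stats(example)` and that the resource of the span carrying `_dd.partial_version` is not.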

A new _find_raw_v06_stats helper reads the raw msgpack body, since the decoded V06StatsAggr view is intentionally narrower than the spec.
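
A helper of this kind might look like the sketch below. It is not the helper from this PR: the request shape (`url` and `body` keys) is an assumption, and the decoder is passed in so the sketch does not depend on a particular msgpack package (in practice something like `msgpack.unpackb` would be supplied). The base64 step reflects the test agent returning the request body as a base64-encoded str.

```python
import base64
import json
from typing import Any, Callable


def find_raw_v06_stats(requests: list[dict], unpack: Callable[[bytes], Any]) -> list[Any]:
    """Decode the body of every captured /v0.6/stats request."""
    decoded = []
    for request in requests:
        if not request["url"].endswith("/v0.6/stats"):
            continue
        raw = base64.b64decode(request["body"])  # body arrives as a base64 str
        decoded.append(unpack(raw))
    return decoded


# Stand-in data: JSON replaces msgpack so the sketch needs no third-party package.
captured = [
    {"url": "http://agent:8126/v0.4/traces", "body": ""},
    {
        "url": "http://agent:8126/v0.6/stats",
        "body": base64.b64encode(json.dumps({"Hostname": "test-host"}).encode()).decode(),
    },
]
payloads = find_raw_v06_stats(captured, json.loads)
```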

Parametric harness fix (python)

utils/build/docker/python/parametric/apm_test_client/server.py — the /trace/stats/flush endpoint was still using ddtrace.internal.processor.stats.SpanStatsProcessorV06, which dd-trace-py removed when it moved CSS to libdatadog. Without that processor in the chain the endpoint silently no-op'd, so the test agent never received a /v0.6/stats payload inside a single test invocation (libdatadog's native TraceExporter only flushes stats on its 10-second internal timer or on shutdown).

The endpoint now falls back to writer.on_shutdown() + writer.recreate() when the legacy processor is absent. The old behavior is preserved when the legacy processor is present. With this fix, python passes all four tests locally and in CI.
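
The control flow of the fallback can be sketched as follows. Both classes are hypothetical stand-ins, not ddtrace types: only the `on_shutdown()` and `recreate()` names come from the description above, while the `flush()` method and the assumption that `recreate()` returns a fresh writer are illustrative.

```python
class LegacyProcessor:
    """Hypothetical stand-in for the removed Python-side stats processor."""

    def __init__(self):
        self.flushed = 0

    def flush(self):  # hypothetical method name
        self.flushed += 1


class NativeWriter:
    """Hypothetical stand-in for a writer backed by the native exporter."""

    def __init__(self):
        self.shut_down = False

    def on_shutdown(self):
        self.shut_down = True  # shutting down forces pending stats out

    def recreate(self):
        return NativeWriter()  # assumed: a fresh writer for later spans


def flush_stats(legacy_processor, writer):
    """Flush stats and return the writer to keep using afterwards."""
    if legacy_processor is not None:
        legacy_processor.flush()  # old path, unchanged
        return writer
    writer.on_shutdown()  # fallback: flush through shutdown
    return writer.recreate()
```

Returning the writer makes the swap explicit: on the fallback path the caller must continue with the recreated writer, since the old one has been shut down.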

Manifest entries — real spec divergences only

After three CI rounds we narrowed the markers to actual gaps (not test bugs or harness limitations):

| SDK | Test | Reason |
|---|---|---|
| golang | TS012 | stats.go:96-103 PayloadAggregationKey omits RuntimeID; payload-level RuntimeID empty |
| java | TS014 | Exclusion path uses internal longRunningVersion (ConflatingMetricsAggregator.java:308), not the spec's _dd.partial_version metric |
| dotnet | TS014 | Spans flagged with _dd.partial_version aren't excluded |
| rust | all 4 | Parametric harness doesn't flush /v0.6/stats deterministically (same root cause as python pre-fix; libdatadog backend with no Python-equivalent fallback yet) |
| nodejs / php / ruby / cpp | all 4 | CSS not implemented in tracer |

Test bug fixes uncovered during CI

A few of the early failures were the tests' fault, not the SDKs':

  • TS011 was using http.route only — dd-trace-go reads http.endpoint (ddtrace/ext/tags.go:65). Test now sets both.
  • TS011 needed DD_TRACE_RESOURCE_RENAMING_ENABLED=true to make dd-trace-java extract HTTPMethod/HTTPEndpoint (Config.java:2278 defaults it to false unless AppSec is on).
  • TS012 needed DD_TRACE_REPORT_HOSTNAME=true because both dd-trace-go (option.go:297) and dd-trace-java (Config.java:2005) gate hostname population on it.
  • TS012's Service check is now lenient (accepts payload-level or per-bucket) because the trace-agent uses ClientStatsPayload.Service only as a partition-key hint in PayloadAggregationKey.BaseService (pkg/trace/stats/client_stats_aggregator.go:178); the per-bucket ClientGroupedStats.Service is the spec-required source of truth.
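
Taken together, the fixes above amount to something like the sketch below. The two environment flags are the ones named in the list; the span meta values, the constant names, and the helper are hypothetical, and the payload is assumed to be decoded into plain dicts keyed by the spec's field names.

```python
# Flags that gate field population in some tracers.
LIBRARY_ENV = {
    "DD_TRACE_RESOURCE_RENAMING_ENABLED": "true",  # dd-trace-java: HTTPMethod/HTTPEndpoint
    "DD_TRACE_REPORT_HOSTNAME": "true",  # dd-trace-go and dd-trace-java: Hostname
}

# Hypothetical span meta: both keys are set so a tracer reading either one
# ends up with the same HTTPEndpoint.
SPAN_META = {
    "http.method": "GET",
    "http.endpoint": "/users/{id}",
    "http.route": "/users/{id}",
}


def service_is_reported(payload: dict) -> bool:
    """Lenient TS012-style check: Service at payload level or on any grouped entry."""
    if payload.get("Service"):
        return True
    return any(
        group.get("Service")
        for bucket in payload.get("Stats", [])
        for group in bucket.get("Stats", [])
    )
```

The lenient check accepts a tracer that writes Service only per bucket while still failing one that reports no service anywhere.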

dd-trace-go gaps surfaced (not addressed in this PR)

This work concretely revealed the following dd-trace-go spec divergences worth follow-up tickets:

  1. Payload-level RuntimeID never set (ddtrace/tracer/stats.go:96-103)
  2. Payload-level Service never set (only per-bucket; ddtrace/tracer/stats.go:181)
  3. Hostname only populated when DD_TRACE_REPORT_HOSTNAME=true (option.go:297)
  4. HTTPEndpoint reads http.endpoint, inconsistent with OTel's http.route semantic convention
  5. /info version field not parsed in infoResponse struct
  6. Configurable retry on stats send (stats.go:266) contradicts spec's no-retry guidance
  7. gRPC status code extraction delegated to agent (not done in tracer)
  8. api.errors metric not emitted from the stats endpoint error path

Phases B-E (follow-ups on this branch)

  • Phase B — trace filters (filter_tags, filter_tags_regex, ignore_resources) — largest cross-tracer gap
  • Phase C — sampler interactions and extended obfuscation (Cassandra/Redis)
  • Phase D — internal DogStatsD metrics (needs a stats interceptor)
  • Phase E — stretch goals (stochastic rounding, bucket-end-time assignment, header naming)

Full plan lives outside the repo at css-spec-coverage-plan.md so it survives across sessions.

ichinaski added 4 commits May 14, 2026 15:24
Asserts that span.http.method and span.http.route metadata are populated as HTTPMethod and HTTPEndpoint in the /v0.6/stats payload, per CSS v1.2.0 spec §5. Mark nodejs, php, ruby, cpp as missing_feature since stats computation isn't implemented.
Asserts Hostname, Env, Version, Service, RuntimeID, and Sequence are populated in the /v0.6/stats payload per CSS v1.2.0 spec §3 (deployment-level identifiers). Mark nodejs, php, ruby, cpp as missing_feature since stats computation isn't implemented.
Asserts that ContainerID, Tags, ImageTag, AgentAggregation, and ProcessTagsHash are absent or empty in the tracer-sent /v0.6/stats payload, per CSS v1.2.0 spec §3 (these fields are agent-populated). Mark nodejs, php, ruby, cpp as missing_feature.
Asserts spans with the _dd.partial_version metric set are excluded from stats aggregation, per CSS v1.2.0 spec §7 (Span Exclusions). A control span without the metric must still produce stats. Mark nodejs, php, ruby, cpp as missing_feature.

github-actions Bot commented May 14, 2026

CODEOWNERS have been resolved as:

manifests/cpp.yml                                                       @DataDog/dd-trace-cpp
manifests/dotnet.yml                                                    @DataDog/apm-dotnet @DataDog/asm-dotnet
manifests/golang.yml                                                    @DataDog/dd-trace-go-guild
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java
manifests/nodejs.yml                                                    @DataDog/dd-trace-js
manifests/php.yml                                                       @DataDog/apm-php @DataDog/asm-php
manifests/ruby.yml                                                      @DataDog/ruby-guild @DataDog/asm-ruby
manifests/rust.yml                                                      @DataDog/apm-rust
tests/parametric/test_library_tracestats.py                             @DataDog/system-tests-core @DataDog/apm-sdk-capabilities
utils/build/docker/python/parametric/apm_test_client/server.py          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core

ichinaski added 3 commits May 15, 2026 11:23
Test agent returns the /v0.6/stats request body as a base64-encoded str, not bytes. Mypy correctly flagged the annotation mismatch on TS011.
Split type-and-truthiness assertions into separate checks per ruff's pytest rule against compound assert statements.

datadog-prod-us1-6 Bot commented May 15, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: cbdda2e

ichinaski added 10 commits May 15, 2026 11:39
…ey fail

CI revealed real per-SDK gaps:
- python: parametric harness does not flush /v0.6/stats (TS011-TS014 all)
- golang: HTTPEndpoint not populated from http.route; Hostname not set on payload (TS011, TS012)
- dotnet: _dd.partial_version spans not excluded from stats (TS014)

These reflect real implementation gaps in those SDKs (or the parametric harness), not test bugs — markers explain the gap per spec section.
Resolve manifests/dotnet.yml conflict on TS001 version (main bumped to <3.43.0).
Replace ': ' with ' - ' in CSS v1.2.0 missing_feature reasons to avoid YAML
parsing errors (colon was being interpreted as a mapping value).
Java fails TS011 (HTTPMethod=None), TS012 (Hostname=''), and TS014 (partial.snapshot not excluded) on both dev and prod. TS013 passes. Rust fails all 4 (parametric harness does not flush /v0.6/stats, same root cause as python).
TS011: was setting only http.route. dd-trace-go (and the spec field name) reads http.endpoint. Now sets both http.endpoint and http.route. Also adds DD_TRACE_RESOURCE_RENAMING_ENABLED=true so dd-trace-java's gate on HTTPMethod/HTTPEndpoint extraction (Config.java:2278) is on.

TS012: tracers do not auto-detect hostname in the parametric harness; pin DD_HOSTNAME=test-host so the field is populated as the spec requires.

Removed missing_feature markers from golang (TS011, TS012) and java (TS011, TS012) — those were test bugs, not implementation gaps. Java's TS014 marker remains: dd-trace-java's exclusion uses the internal longRunningVersion field, not the _dd.partial_version metric, which the spec mandates.
…ERVICE env

Root cause of python failures: dd-trace-py >= 3.x delegates CSS to libdatadog's native TraceExporter. The exporter only flushes /v0.6/stats on its 10-second internal timer or on shutdown — the parametric server's /trace/stats/flush endpoint was still using the long-removed Python-side SpanStatsProcessorV06 and silently no-op'd.

- Update the parametric server to fall back to writer.on_shutdown() + writer.recreate() when the legacy processor is absent. This deterministically flushes libdatadog stats at the end of each parametric test.
- TS012 also needed DD_SERVICE (payload-level Service is the configured main service name per spec §3, not the per-span service). Added to library_env alongside DD_HOSTNAME.

With these fixes, all four python CSS tests pass locally. Removing python missing_feature markers for TS011-TS014.
dd-trace-go option.go:297 and dd-trace-java Config.java:2005 only populate the Hostname field on the ClientStatsPayload when DD_TRACE_REPORT_HOSTNAME is on. Without it, both SDKs return empty Hostname even when DD_HOSTNAME is set.
dd-trace-go stats.go:181 only writes Service at the per-bucket StatSpanConfig level (the span's service), not at the payload level. The spec mandates Service in ClientStatsPayload. This is the same divergence already documented for Go on test_top_level_service in the e2e suite.
Two fixes informed by checking the trace-agent's actual use of these fields:

1. Service: the trace-agent uses payload-level ClientStatsPayload.Service only as a partition-key hint in PayloadAggregationKey.BaseService (client_stats_aggregator.go:178). The per-bucket ClientGroupedStats.Service is the spec-required source of truth that ends up at the backend. dd-trace-go intentionally writes Service only at the per-bucket level (stats.go:181). Accept either location.

2. Env: dd-trace-java's WellKnownTags does not apply the spec's 'unknown-env' default when DD_ENV is unset. Pin DD_ENV (plus DD_VERSION for completeness) so the assertion is deterministic across SDKs.

Removed the Go TS012 marker — the test now passes for Go under the lenient Service assertion.
dd-trace-go stats.go:96-103 builds the PayloadAggregationKey without a RuntimeID field, so the /v0.6/stats payload sent to the agent has RuntimeID empty. The trace-agent passes RuntimeID through to the backend (writer/stats.go:408) for message-uniqueness/deduplication but doesn't aggregate by it. Functionally non-fatal, but it's a spec-mandated field.
@Eldolfin

Relevant to phase B: #6952
And C: #6648

Both are still in draft because I'm waiting for at least one tracer to pass the tests (they do on local versions of dd-trace-py + libdatadog tho)
