feat(sidecar): forward FFE exposures to EVP proxy #2026
Adds SidecarAction::FfeExposures variant so the PHP tracer can hand a batched exposure payload to the sidecar, and adds an ffe_flusher module that POSTs the payload to the agent's EVP proxy at /evp_proxy/v2/api/v2/exposures with X-Datadog-EVP-Subdomain: event-platform-intake. Matches dd-trace-go / ruby / python / js / dotnet wire protocol. Fire-and-forget; non-2xx is logged and dropped (no agent_info gating, consistent with other tracers). Also exposes ddog_sidecar_send_ffe_exposures FFI in datadog-sidecar-ffi for the PHP extension to call from its RSHUTDOWN / MSHUTDOWN hooks. Tests: 3 httpmock-backed cases cover POST method + path + subdomain header + body, non-2xx drop, and endpoint-path override while preserving authority / scheme / auth / timeout.
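As a rough illustration of the wire shape described above, the request could be modelled as below. This is a minimal sketch, not the `ffe_flusher` code: the `ExposureRequest` struct, `build_exposure_request`, and `should_log_and_drop` are hypothetical names, and the `Content-Type` value is taken from the later commit message that contrasts `application/json` with the protobuf metrics path. Only the path and the subdomain header come directly from the description.

```rust
/// Exposure intake path on the agent's EVP proxy (from the PR description).
const EXPOSURES_PATH: &str = "/evp_proxy/v2/api/v2/exposures";
/// Header that routes the payload to the right intake (from the PR description).
const EVP_SUBDOMAIN_HEADER: &str = "X-Datadog-EVP-Subdomain";
const EVP_SUBDOMAIN_VALUE: &str = "event-platform-intake";

/// Hypothetical, transport-agnostic description of the outgoing request.
#[derive(Debug, PartialEq)]
pub struct ExposureRequest {
    pub method: &'static str,
    pub url: String,
    pub headers: Vec<(&'static str, &'static str)>,
    pub body: String,
}

/// Builds the POST for a batched JSON exposure payload.
/// `agent_base_url` is assumed to carry scheme and authority only.
pub fn build_exposure_request(agent_base_url: &str, json_payload: &str) -> ExposureRequest {
    // Avoid a double slash when the base URL ends with '/'.
    let base = agent_base_url.trim_end_matches('/');
    ExposureRequest {
        method: "POST",
        url: format!("{}{}", base, EXPOSURES_PATH),
        headers: vec![
            ("Content-Type", "application/json"),
            (EVP_SUBDOMAIN_HEADER, EVP_SUBDOMAIN_VALUE),
        ],
        body: json_payload.to_owned(),
    }
}

/// Fire-and-forget policy: any non-2xx status is logged and the batch dropped.
pub fn should_log_and_drop(status: u16) -> bool {
    !(200..300).contains(&status)
}
```

Keeping request construction separate from the HTTP client is what makes the three httpmock-style checks (method, path, header, body) easy to express.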
All green! All tests passed. Commit SHA: d780e49
Codecov Report
@@ Coverage Diff @@
## main #2026 +/- ##
==========================================
+ Coverage 72.90% 73.26% +0.35%
==========================================
Files 460 462 +2
Lines 76396 77293 +897
==========================================
+ Hits 55696 56626 +930
+ Misses 20700 20667 -33
Adds a parallel pathway for PHP feature-flag evaluation metrics mirroring the FfeExposures forwarder. dd-trace-php encodes `feature_flag.evaluations` counters as OTLP/protobuf in PHP (via its existing PHP 7-safe `OtlpMetricEncoder`) and ships the encoded bytes to the sidecar, which POSTs them to the user-configured OTLP HTTP metrics intake.
Why a sibling action instead of reusing FfeExposures:
- The OTLP collector is not the Datadog Agent. It's user-configurable via OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (default http://localhost:4318/v1/metrics), so the endpoint travels with the payload rather than being derived from the sidecar session's agent base URL.
- Content type differs (application/x-protobuf vs application/json).
- No EVP subdomain header.
- The payload is binary protobuf, not a JSON string.
dd-trace-php side (PR DataDog/dd-trace-php#3911) will refactor its existing `OtlpHttpMetricTransport` (which currently does PHP-side HTTP I/O, violating the architectural rule "no I/O outside the sidecar") to call this new FFI.
Validation:
- `cargo test -p datadog-sidecar ffe` passes 7 tests (3 exposures + 4 metrics).
- `cargo check -p datadog-sidecar-ffi` clean.
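The four differences listed above map naturally onto two enum variants. The following is a simplified sketch under stated assumptions: `FfeAction` and its methods are hypothetical stand-ins, and the real `SidecarAction` variants may carry different fields.

```rust
/// Hypothetical, simplified stand-in for the two FFE sidecar actions.
#[derive(Debug, Clone, PartialEq)]
pub enum FfeAction {
    /// Batched exposures as a JSON string; the target is derived from the agent URL.
    Exposures(String),
    /// OTLP/protobuf bytes; the user-configured endpoint travels with the payload.
    Metrics { endpoint: String, payload: Vec<u8> },
}

impl FfeAction {
    /// Content type differs between the two pathways.
    pub fn content_type(&self) -> &'static str {
        match self {
            FfeAction::Exposures(_) => "application/json",
            FfeAction::Metrics { .. } => "application/x-protobuf",
        }
    }

    /// Only the EVP proxy path needs the subdomain routing header.
    pub fn needs_evp_subdomain_header(&self) -> bool {
        matches!(self, FfeAction::Exposures(_))
    }

    /// Exposures go to the agent's EVP proxy; metrics go wherever the tracer said.
    pub fn target_url(&self, agent_base_url: &str) -> String {
        match self {
            FfeAction::Exposures(_) => format!(
                "{}/evp_proxy/v2/api/v2/exposures",
                agent_base_url.trim_end_matches('/')
            ),
            FfeAction::Metrics { endpoint, .. } => endpoint.clone(),
        }
    }
}
```

Carrying the endpoint inside the metrics variant is the design point: the sidecar session knows the agent URL but has no reason to know the OTLP collector's address.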
Adds Mermaid sources and rendered PNGs for the hook (this) PR plus a README documenting the regeneration workflow.
- `docs/php-ffe-stack/stack-pr3909.mmd` + `.png`: 4-PR stack with this PR highlighted (M1 done; EVP and metrics as siblings to come).
- `docs/php-ffe-stack/system-pr3909.mmd` + `.png`: target system architecture; this PR contributes the EvaluationCompletedHook + OpenFeature provider hook surface. All downstream nodes (writers, sidecar FFI, sidecar process, backends) marked future.
- `docs/php-ffe-stack/README.md`: npx invocation for regenerating PNGs locally; PR-by-PR diagram table; architectural rule note.
The architectural rule encoded in the system diagram (all I/O via the libdatadog sidecar) is the same rule Bob applied to PR #3910. See DataDog/libdatadog#2026 for the sidecar-side support.
Per Bob's PR review (2026-05-22), the tracer extension must perform no I/O outside the sidecar. Replaces the raw-socket `AgentExposureTransport` with `SidecarExposureTransport`, which forwards exposure batches to the libdatadog sidecar via a new native PHP function `\DDTrace\send_ffe_exposures` that calls the `ddog_sidecar_send_ffe_exposures` FFI added in DataDog/libdatadog#2026.
PHP side:
- Delete `Internal/Exposure/AgentExposureTransport.php` (raw socket POST to the Agent EVP proxy).
- Add `Internal/Exposure/SidecarExposureTransport.php` that JSON-encodes the batch and calls `\DDTrace\send_ffe_exposures()`. Fire-and-forget; the sidecar handles retries.
- Update `ExposureWriter::createDefault()` to instantiate the sidecar transport.
- Drop the obsolete `testAgentTransportBuildsAgentEvpRequest` PHPUnit test (HTTP construction now lives in libdatadog, covered by `cargo test -p datadog-sidecar ffe_flusher`).
- Add `Internal/DefaultEvaluationCompletedHook` and `Internal/CompositeEvaluationCompletedHook` so production callers go through a composite hook factory. In this PR the composite contains only `ExposureHook`; the metrics PR (#3911) contributes `EvaluationMetricHook` and the file conflict at merge resolves by combining both. Update `Client::create()` to call `DefaultEvaluationCompletedHook::create()`.
C/Rust bridge:
- Declare `ddog_ByteSlice` (and underlying `ddog_Slice_U8`) in `components-rs/common.h` for the metrics path; declare both `ddog_sidecar_send_ffe_exposures` and `ddog_sidecar_send_ffe_metrics` in `components-rs/sidecar.h`.
- Add C wrappers `ddtrace_sidecar_send_ffe_exposures(zend_string *)` and `ddtrace_sidecar_send_ffe_metrics(zend_string *endpoint, zend_string *payload_bytes)` in `ext/sidecar.{h,c}` that call the FFI with the current sidecar transport + instance id + queue id.
- Declare native PHP functions `\DDTrace\send_ffe_exposures(string): bool` and `\DDTrace\send_ffe_metrics(string, string): bool` in `ext/ddtrace.stub.php`; add corresponding arginfo entries and `ZEND_FUNCTION` registrations in `ext/ddtrace_arginfo.h`; implement `PHP_FUNCTION(DDTrace_send_ffe_exposures)` and `PHP_FUNCTION(DDTrace_send_ffe_metrics)` in `ext/ddtrace.c`.
- Bump `libdatadog` submodule to FFE branch tip `29762335c` (which provides both FFIs). The submodule will be bumped to the libdatadog main commit once #2026 merges.
Docs:
- Add `docs/php-ffe-stack/{stack,system}-pr3910.{mmd,png}` for this PR.
Validation:
- `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags` → 41 tests, 174 assertions, OK.
- libdatadog sidecar tests (`cargo test -p datadog-sidecar ffe_flusher`) → 3 passed, on the pinned submodule commit.
- Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`.
`make test_featureflags` and `make test_c TESTS=tests/ext/ffe/...` will run in CI; running them locally requires rebuilding the extension which is gated behind libdatadog #2026 merging.
Adds the M3 evaluation-metrics layer on top of the hook PR (#3909) as a sibling of the EVP exposures PR (#3910). Records `feature_flag.evaluations` for both PHP 7 (DD Client hook) and PHP 8 (OpenFeature SDK hook); both paths share `EvaluationMetricHook::sharedWriter()` for unified aggregation. OTLP/protobuf payloads are encoded in PHP via the existing `OtlpMetricEncoder` and delivered to the user-configured OTLP HTTP metrics intake through the libdatadog sidecar (`ddog_sidecar_send_ffe_metrics` FFI added in DataDog/libdatadog#2026).
This branch is force-pushed (user-authorized one-time exception to the no-force-push rule, 2026-05-23) to restructure history away from being linearly stacked on the M2 exposures PR (#3910). The PR now stacks directly on the hook PR (#3909) as a sibling of the EVP PR.
PHP side:
- Add `Internal/Metric/EvaluationMetricWriter` with bounded series aggregation, drop accounting, and shutdown flush.
- Add `Internal/Metric/EvaluationMetricHook` (DD Client hook) and `OtlpMetricEncoder` (PHP 7-safe protobuf encoding).
- Add `Internal/Metric/SidecarOtlpMetricsTransport` that calls `\DDTrace\send_ffe_metrics()` (FFI declared in #3910). Endpoint resolution: `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, falling back to `OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics`, default `http://localhost:4318/v1/metrics`.
- Add `DDTrace\OpenFeature\EvalMetricsHook` implementing `OpenFeature\interfaces\hooks\Hook` (after + error stages), registered on `DataDogProvider` via `setHooks()`.
- `DataDogProvider` constructs its internal DD `Client` with `DefaultEvaluationCompletedHook::createWithoutMetric()` so the OpenFeature path records the metric via the OpenFeature hook (PR 3911 scope) and NOT via the DD Client hook, preventing double-counting. PHP 7 path keeps recording via the DD Client hook.
- Add `Internal/CompositeEvaluationCompletedHook` and `Internal/DefaultEvaluationCompletedHook` (metric-only composite). This is the merge-conflict point with PR #3910's `[ExposureHook]` composite; the second merge resolves by combining both hooks.
- Update `Client::create()` to call `DefaultEvaluationCompletedHook::create()`.
- Drop the obsolete `testOtlpTransportBuildsHttpProtobufRequest` PHPUnit test (HTTP construction now lives in libdatadog, covered by `cargo test -p datadog-sidecar ffe_metrics_flusher`).
- Add `_files_openfeature.php` entry for `EvalMetricsHook.php`.
C/Rust bridge: the `\DDTrace\send_ffe_metrics()` native function, its C wrapper `ddtrace_sidecar_send_ffe_metrics()`, and the `ddog_sidecar_send_ffe_metrics` FFI declaration in `components-rs/sidecar.h` were already added in #3910. This PR's branch picks up those changes once #3910 merges (or via the same libdatadog submodule pin during review). For development locally the libdatadog submodule is pinned to the FFE branch tip (`29762335c`).
Docs:
- Add `docs/php-ffe-stack/{stack,system}-pr3911.{mmd,png}` per the 4-PR documentation convention.
Validation:
- `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags` → 40 tests, 160 assertions, OK.
- Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`.
`make test_featureflags`, OpenFeature PHPUnit, and ffe-dogfooding end-to-end validation will run in CI / are validated separately by FOLLOW-05 Steps 4–5.
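The endpoint resolution order named in this commit lives on the PHP side; for illustration only, the same precedence can be sketched in Rust. `resolve_metrics_endpoint` and `resolve_from_env` are hypothetical helpers, and treating blank values as unset and trimming a trailing slash before appending `/v1/metrics` are assumptions, not behavior confirmed by the PR.

```rust
use std::env;

/// Default from the commit message.
const DEFAULT_METRICS_ENDPOINT: &str = "http://localhost:4318/v1/metrics";

/// Treats missing or blank values as unset (an assumption of this sketch).
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Precedence: signal-specific endpoint, then generic endpoint + /v1/metrics, then default.
pub fn resolve_metrics_endpoint(
    metrics_endpoint: Option<&str>,
    base_endpoint: Option<&str>,
) -> String {
    if let Some(endpoint) = non_empty(metrics_endpoint) {
        // The signal-specific variable is used verbatim.
        return endpoint.to_owned();
    }
    if let Some(base) = non_empty(base_endpoint) {
        // The generic variable is a base URL; the metrics path is appended.
        return format!("{}/v1/metrics", base.trim_end_matches('/'));
    }
    DEFAULT_METRICS_ENDPOINT.to_owned()
}

/// Convenience wrapper reading the two OpenTelemetry environment variables.
pub fn resolve_from_env() -> String {
    let metrics = env::var("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT").ok();
    let base = env::var("OTEL_EXPORTER_OTLP_ENDPOINT").ok();
    resolve_metrics_endpoint(metrics.as_deref(), base.as_deref())
}
```

Whatever the tracer resolves is then passed as the `endpoint` argument of `\DDTrace\send_ffe_metrics()`, so the sidecar never has to read these variables itself.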
The PHP FFE writers (`SidecarExposureTransport`,
`SidecarOtlpMetricsTransport`) can fire as soon as evaluations begin —
which is often earlier than the first remote-config metadata call that
registers the application against a `QueueId`.
Previously, FFE dispatch lived inside the
`if let Entry::Occupied(entry) = applications.entry(queue_id) { ... }`
block in `enqueue_actions`. That block is only entered after the PHP
runtime has called `set_remote_config_data` or `set_request_config` for
this queue. For shorter-lived PHP processes (parametric test client,
CLI tools, eager evaluators) the FFE batch arrives before the app
registration call lands, so the entire batch was silently dropped.
This change filters `FfeExposures` and `FfeMetrics` actions out of
the action vec before the application-entry gate and dispatches them
directly: both only need session-level state (the trace endpoint /
the user-supplied OTLP endpoint), not per-application telemetry context.
Validated locally with dd-trace-php system-tests parametric
`Test_Feature_Flag_Parametric_Evaluation_Metrics::test_php_ffe_evaluation_metric`,
which now passes (26/27 FFE-scoped tests; remaining failure is the
exposure_event test on a branch that lacks the exposure code path).
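The reordering described in this commit can be sketched as a partition step that runs before the application lookup. This is a simplified model, not the `sidecar_server.rs` code: `Action`, `Server`, and the `dispatched_ffe` vector (standing in for the flusher calls) are hypothetical, and the real handler uses `Entry::Occupied` rather than `get_mut`.

```rust
use std::collections::HashMap;

type QueueId = u64;

/// Hypothetical, simplified action set.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    FfeExposures(String),
    FfeMetrics { endpoint: String, payload: Vec<u8> },
    Telemetry(String),
}

#[derive(Default)]
struct Server {
    /// Registered applications; an entry exists only after registration.
    applications: HashMap<QueueId, Vec<Action>>,
    /// Stand-in for handing FFE payloads to the flushers.
    dispatched_ffe: Vec<Action>,
}

impl Server {
    fn enqueue_actions(&mut self, queue_id: QueueId, actions: Vec<Action>) {
        // Split FFE actions out first: they need only session-level state.
        let (ffe, rest): (Vec<Action>, Vec<Action>) = actions
            .into_iter()
            .partition(|a| matches!(a, Action::FfeExposures(_) | Action::FfeMetrics { .. }));
        self.dispatched_ffe.extend(ffe);

        // Everything else still requires a registered application.
        if let Some(app_actions) = self.applications.get_mut(&queue_id) {
            app_actions.extend(rest);
        }
        // Otherwise the remaining actions are dropped, as before this change.
    }
}
```

With this ordering, a short-lived process that flushes its FFE batch before registering against the queue still gets the batch delivered.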
Pair the EVP-exposure forwarder name with its sibling `ffe_metrics_flusher`. The unqualified `ffe_flusher` predates the OTLP-metrics forwarder and the asymmetry was leaving readers wondering whether `ffe_flusher` was a parent/umbrella module or a sibling. Renames the file via `git mv` (preserving blame history) and updates all references (mod.rs, sidecar_server.rs dispatch arm, ffe_metrics_flusher.rs cross-reference in the module doc, and the CODEOWNERS entry). No functional change.
The renamed identifier pushed one debug! line past rustfmt's column limit. Apply `cargo fmt -p datadog-sidecar -p datadog-sidecar-ffi` to break the macro across three lines, matching CI's nightly-2026-02-08 rustfmt.
Single architecture diagram showing the end-to-end FFE delivery path
through the sidecar:
tracer payload → ddog_sidecar_send_ffe_{exposures,metrics} FFI
→ tarpc enqueue_actions IPC
→ sidecar_server.rs enqueue_actions handler
→ FFE filter (lifted out of applications.entry gate, this PR)
→ ffe_exposures_flusher / ffe_metrics_flusher
→ NativeCapabilities HTTP client
→ Agent EVP proxy / OTLP HTTP intake
Uses `flowchart TD` and a quoted YAML title (Mermaid's frontmatter
parser eats unquoted `#` as comments). PNG rendered at 2400×2400
`--scale 3 -b white` for legible PR-page thumbnails.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8be471fbc1
    context: &FfeTelemetryContext<'_>,
    exposures: Slice<FfeExposure<'_>>,
) -> MaybeError {
    if exposures.is_empty() {
Validate the exposure slice before calling is_empty
When a C caller passes a malformed Slice such as a null exposures pointer with a non-zero length, this is_empty() call dereferences through Slice::as_slice() and panics before the later try_as_slice() can turn the bad slice into a MaybeError. Because this is an extern "C" entry point, that panic can cross the FFI boundary instead of being reported to the caller; validate with try_as_slice() first (or check the raw length without dereferencing) and then handle the empty case.
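The suggested ordering (validate first, then test for emptiness) might look like the following. This is a sketch under assumptions: `RawSlice` is a hypothetical stand-in for the FFI `Slice` type, its `try_as_slice` is written here from scratch rather than copied from libdatadog, and a plain `Result` replaces `MaybeError`.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical C-compatible slice as it might arrive over FFI.
#[repr(C)]
pub struct RawSlice<T> {
    pub ptr: *const T,
    pub len: usize,
}

impl<T> RawSlice<T> {
    /// Validates the raw parts and returns an error instead of panicking.
    pub fn try_as_slice(&self) -> Result<&[T], &'static str> {
        if self.len == 0 {
            // A zero length is empty regardless of the pointer; never dereference it.
            return Ok(&[]);
        }
        if self.ptr.is_null() {
            return Err("null pointer with non-zero length");
        }
        if (self.ptr as usize) % align_of::<T>() != 0 {
            return Err("misaligned pointer");
        }
        if self.len > isize::MAX as usize / size_of::<T>().max(1) {
            return Err("length exceeds isize::MAX bytes");
        }
        // SAFETY: non-null, aligned, and size-bounded; the caller still has to
        // guarantee that the memory is valid for `len` elements.
        Ok(unsafe { slice::from_raw_parts(self.ptr, self.len) })
    }
}

/// Entry-point shape: validate first, then handle the empty case.
pub fn send_exposures(exposures: &RawSlice<u8>) -> Result<usize, &'static str> {
    let items = exposures.try_as_slice()?;
    if items.is_empty() {
        return Ok(0); // nothing to send
    }
    Ok(items.len())
}
```

Because the malformed input becomes an `Err` before any dereference, no panic can reach the `extern "C"` boundary from this path.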
sameerank
left a comment
I think the main thing I'd double-check is the panic risk; the rest are minor nits. It would also be preferable to hear from another reviewer who knows Rust better.
Motivation
PHP FFE exposure delivery needs a native path with a cache that persists beyond a single PHP request/thread. The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0
This PR is exposure-only. Metrics were split into #2052 so reviewers can evaluate exposure cache and delivery separately from OTLP evaluation metrics.
Changes
This adds caller-driven FFE exposure sidecar actions, exposure payload forwarding through the Agent EVP proxy, and a shared exposure cache that deduplicates repeated `(service, env, version, flag, subject)` assignments across PHP requests and sidecar connections.
The reusable FFE-domain pieces now live in `datadog-ffe` behind the `exposure-events` feature: exposure input types, the LRU deduplication cache, and JSON payload encoding. `datadog-sidecar` keeps only sidecar-specific work: deriving the agent EVP endpoint, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions.
Current PHP MVP path:
flowchart LR
  Eval["PHP native evaluation<br/>ddog_ffe_evaluate"]
  Batch["PHP tracer native memory<br/>request/thread-local exposure batch"]
  Shutdown["PHP RSHUTDOWN<br/>flush exposure batch"]
  Action["sidecar action<br/>record FFE exposures"]
  Domain["datadog-ffe<br/>feature: exposure-events<br/>types + cache + JSON encoder"]
  Sidecar["shared sidecar<br/>cross-request and cross-thread exposure cache"]
  Agent["Datadog Agent<br/>EVP proxy"]
  Intake["FFE exposure intake"]
  Eval -->|"doLog=true assignment"| Batch
  Batch --> Shutdown
  Shutdown --> Action
  Action --> Domain
  Domain --> Sidecar
  Sidecar --> Agent
  Agent --> Intake
Future Python/Ruby connection:
flowchart LR
  PyToday["dd-trace-py today<br/>host-language exposure writer"]
  RbToday["dd-trace-rb today<br/>host-language exposure writer"]
  PyFuture["dd-trace-py future<br/>explicit native opt-in"]
  RbFuture["dd-trace-rb future<br/>explicit native opt-in"]
  Native["libdatadog caller-driven<br/>FFE exposure action"]
  Shared["shared sidecar<br/>dedupe + EVP delivery"]
  Agent["Datadog Agent<br/>EVP proxy"]
  PyToday -. "current direct EVP path" .-> Agent
  RbToday -. "current direct EVP path" .-> Agent
  PyFuture -. "after ownership switch" .-> Native
  RbFuture -. "after ownership switch" .-> Native
  Native --> Shared
  Shared --> Agent
The future Python/Ruby arrows are intentionally not active behavior in this PR. They show why the reusable code lives in `datadog-ffe` rather than directly in sidecar internals, while preserving today's host-language ownership.
Why Python/Ruby do not double count today:
Reference implementation check: dd-trace-java follows the same exposure semantics and user ergonomics. Java's `DDEvaluator` is SDK-owned evaluation code; after resolving an assignment, it checks allocation `doLog`, builds an exposure event with flag, variant, allocation, targeting key, and context, and dispatches it through `FeatureFlaggingGateway`. `ExposureWriterImpl` subscribes to those exposure events, queues them, deduplicates with an LRU exposure cache, serializes service/env/version context, and posts to the Agent EVP proxy. Application code only calls the OpenFeature provider; it does not call an exposure API.
PHP mirrors that canonical shape, with PHP-specific lifecycle mechanics: the dd-trace-php evaluation bridge records `doLog=true` exposure candidates internally, request shutdown flushes the batch, and this PR's sidecar path owns cross-request deduplication and EVP delivery. For future Python/Ruby migration, the same rule applies: wire native exposure recording inside the SDK-owned evaluation path, and turn off the existing host-language exposure writer for those evaluations.
Decisions
No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This remains required for Python/Ruby coexistence because those SDKs currently log exposures and metrics in host-language code.
The sidecar cache deduplicates only exposure candidates sent through this native sidecar path; it cannot deduplicate direct host-language EVP writers.
Future Python/Ruby migration must be an ownership switch, not an additional writer. When those SDKs opt into this native exposure path, their host-language exposure writers must be disabled or bypassed for the same evaluations to avoid double counting.
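The shared cache that these decisions refer to deduplicates on the `(service, env, version, flag, subject)` tuple. A minimal LRU-style sketch of that idea follows, using only the standard library. `ExposureKey` and `ExposureCache` are hypothetical names, the eviction policy and capacity handling are assumptions, and the actual `datadog-ffe` cache may track more (for example the assigned variant) than this key alone.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical dedup key built from the tuple named in the PR description.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ExposureKey {
    service: String,
    env: String,
    version: String,
    flag: String,
    subject: String,
}

impl ExposureKey {
    pub fn new(service: &str, env: &str, version: &str, flag: &str, subject: &str) -> Self {
        Self {
            service: service.into(),
            env: env.into(),
            version: version.into(),
            flag: flag.into(),
            subject: subject.into(),
        }
    }
}

/// Bounded cache; the front of `order` is the least recently used key.
pub struct ExposureCache {
    capacity: usize,
    seen: HashSet<ExposureKey>,
    order: VecDeque<ExposureKey>,
}

impl ExposureCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, seen: HashSet::new(), order: VecDeque::new() }
    }

    /// Returns true when the exposure has not been seen and should be forwarded.
    pub fn should_send(&mut self, key: ExposureKey) -> bool {
        if self.seen.contains(&key) {
            // Duplicate: refresh its recency (O(n) scan, fine for a sketch).
            if let Some(pos) = self.order.iter().position(|k| k == &key) {
                self.order.remove(pos);
            }
            self.order.push_back(key);
            return false;
        }
        if self.capacity == 0 {
            return true; // caching disabled, always forward
        }
        if self.order.len() == self.capacity {
            // Evict the least recently used key to stay within bounds.
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        self.seen.insert(key.clone());
        self.order.push_back(key);
        true
    }
}
```

Because the cache sees only candidates routed through the sidecar, a host-language writer posting directly to the EVP proxy bypasses it entirely, which is why the migration has to be an ownership switch.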
Validation
Current head (`8be471fbc`) local validation:
Results: datadog-ffe exposure tests passed (4 passed), sidecar exposure tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings.
Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3910 using this PR at `6d23848a`:
System-tests downstream validation:
Result: 11 passed in 77.53 seconds.
Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3910, #2052, DataDog/system-tests#7031.