Add fork safety hooks and cancellation token for trace exporter FFI#2051
Add fork safety hooks and cancellation token for trace exporter FFI#2051lloeki wants to merge 2 commits into
Conversation
|
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2051 +/- ##
==========================================
- Coverage 72.88% 72.86% -0.02%
==========================================
Files 460 460
Lines 76463 76584 +121
==========================================
+ Hits 55731 55805 +74
- Misses 20732 20779 +47
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 22e92370e5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| tokio = { version = "1.23", features = ["macros"] } | ||
| tokio-util = "0.7.11" |
There was a problem hiding this comment.
Regenerate Cargo.lock for the new dependencies
Adding tokio/tokio-util to this crate without committing the corresponding root Cargo.lock update breaks locked builds: I checked cargo check -p libdd-data-pipeline-ffi --locked, and Cargo exits with the lock file /workspace/libdatadog/Cargo.lock needs to be updated. Any CI or consumer build using the repository lockfile will fail before compiling this crate until the lock entry for libdd-data-pipeline-ffi includes these new dependencies.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
edit: I think I was wrong, looks like a legit failure: https://github.com/DataDog/libdatadog/actions/runs/26578731968/job/78305509370?pr=2051
original comment:
This feels like a case of "AI doesn't know how "libdd-data-pipeline-ffi" is used.
This might information be worth writing down (maybe in https://github.com/DataDog/libdatadog/blob/22e92370e5d66c0efd6ac5fe949bd8b6514fd902/libdd-data-pipeline-ffi/README.md, or other place if this is common across multiple libdatadog/* crates).
| "TracerSpan" = "ddog_TracerSpan" | ||
| "TracerSpanFields" = "ddog_TracerSpanFields" | ||
| "TracerTraceChunks" = "ddog_TracerTraceChunks" | ||
| "Handle_TokioCancellationToken" = "ddog_TraceExporterCancelToken" |
There was a problem hiding this comment.
Declare the opaque cancellation token type in cbindgen
With only the handle rename here, cbindgen emits typedef ddog_CancellationToken ddog_TokioCancellationToken; in data-pipeline.h, but this header only includes common.h, which does not define ddog_CancellationToken; I regenerated the header and cc -fsyntax-only reports unknown type name 'ddog_CancellationToken'. C/C++ consumers that include the generated data-pipeline header will not compile when this new token type is present, so this needs the same opaque token rename/forward declaration pattern used by the profiling FFI header.
Useful? React with 👍 / 👎.
| /// | ||
| /// All clones of the same token observe the cancellation. If the token is | ||
| /// currently passed to [`ddog_trace_exporter_send_trace_chunks`], the | ||
| /// in-flight HTTP request will be aborted cooperatively. |
There was a problem hiding this comment.
What does cooperatively mean here?
| /// Cancel a cancellation token. | ||
| /// | ||
| /// All clones of the same token observe the cancellation. If the token is | ||
| /// currently passed to [`ddog_trace_exporter_send_trace_chunks`], the |
There was a problem hiding this comment.
| /// currently passed to [`ddog_trace_exporter_send_trace_chunks`], the | |
| /// currently help by [`ddog_trace_exporter_send_trace_chunks`], the |
Wouldn't "currently held" be more accurate than "currently passed" here, since "ddog_trace_exporter_send_trace_chunks" might have finished its work already, thus not holding the token anymore?
|
|
||
| /// Cancel a cancellation token. | ||
| /// | ||
| /// All clones of the same token observe the cancellation. If the token is |
There was a problem hiding this comment.
What if the token was not passed to ddog_trace_exporter_send_trace_chunks, but will soon, is a cancellation: scheduled to happen; ddog_trace_exporter_send_trace_chunks won't start because it's already cancelled; nothing?
It's good to document what happens when the token is not held yet.
And also, if the token is no longer held (aka ddog_trace_exporter_send_trace_chunks is done), which I assume means nothing happens on ddog_trace_exporter_cancel_token_cancel.
| /// # Safety | ||
| /// | ||
| /// * `token` must point to a valid [`Handle`] returned by | ||
| /// [`ddog_trace_exporter_cancel_token_new`]. |
There was a problem hiding this comment.
| /// # Safety | |
| /// | |
| /// * `token` must point to a valid [`Handle`] returned by | |
| /// [`ddog_trace_exporter_cancel_token_new`]. | |
| /// * `token` is the return of [`ddog_trace_exporter_cancel_token_new`]. |
I don't think we need to say that we should give this function a "valid" input.
There was a problem hiding this comment.
The same applies to ddog_trace_exporter_cancel_token_drop.
There was a problem hiding this comment.
There's more of these Safety comment blocks in down this file. All the ones in a similar "type-checked argument must be type-valid" is likely not needed.
|
|
||
| /// Free a cancellation token handle. | ||
| /// | ||
| /// After this call the pointer is invalid and must not be reused. |
There was a problem hiding this comment.
Not sure if this is useful additional information "After this call the pointer is invalid and must not be reused.".
Freed resources cannot be used and are likely invalid.
If we want to provide additional context here, the only thing I can think of is: what happens if you free it while ddog_trace_exporter_send_trace_chunks still owns it. But I feel like that's kind of a natural contract in Rust/native: freeing things in use == bad.
Not sure if we need an extra comment here.
| /// * `chunks` is consumed and must not be used after this call. | ||
| /// * If `response_out` is non-null it must point to valid writable memory for a | ||
| /// `Box<ExporterResponse>`. | ||
| /// * If `cancel` is non-null it must point to a valid cancellation token handle. |
There was a problem hiding this comment.
Similar comment to those in ddog_trace_exporter_cancel_token_cancel: is it even possible to pass an invalid cancellation token since the argument is typed *mut Handle<TokioCancellationToken>?
marcotc
left a comment
There was a problem hiding this comment.
Despite my other comments, the implementation looks good 👍
| exporter: Option<&TraceExporter>, | ||
| chunks: Box<TracerTraceChunks>, | ||
| response_out: Option<NonNull<Box<ExporterResponse>>>, | ||
| mut cancel: *mut Handle<TokioCancellationToken>, |
There was a problem hiding this comment.
I think it's better to not use the Handle and use an Option<&TokioCancellationToken> instead. This way the null pointer is treated as None without any conversion.
| let block_result = exporter.shared_runtime().block_on(async { | ||
| tokio::select! { | ||
| res = exporter.send_trace_chunks_async(chunks.0) => res, | ||
| _ = ct.cancelled() => Err( | ||
| std::io::Error::new( | ||
| std::io::ErrorKind::Interrupted, | ||
| "send cancelled via cancellation token", | ||
| ).into() | ||
| ), | ||
| } | ||
| }); | ||
| match block_result { | ||
| Ok(inner) => inner, | ||
| Err(io_err) => Err(io_err.into()), | ||
| } | ||
| None | ||
| } else { | ||
| exporter.send_trace_chunks(chunks.0) | ||
| }; | ||
|
|
||
| match result { | ||
| Ok(resp) => { | ||
| if let Some(out) = response_out { |
There was a problem hiding this comment.
The cancel token is a nice add-on and should be added to the actual trace exporter, we try to keep logic in the *-ffi crates to a minimum.
There was a problem hiding this comment.
Also the send_trace_chunks_async has a bug and shouldn't be used currently (it should have been marked as experimental). I can get it fixed, but the simplest way to bypass it is to add support for the cancel token to send_trace_chunks
| /// Returns a reference to the underlying [`SharedRuntime`]. | ||
| pub fn shared_runtime(&self) -> &SharedRuntime { | ||
| &self.shared_runtime | ||
| } | ||
|
|
There was a problem hiding this comment.
The SharedRuntime reference in the TraceExporter should remain private if you need to interact with the SharedRuntime (e.g. for fork hooks) you should create a SharedRuntime with libdd-shared-runtime-ffi and pass it to the TraceExporter builder
| let block_result = exporter.shared_runtime().block_on(async { | ||
| tokio::select! { | ||
| res = exporter.send_trace_chunks_async(chunks.0) => res, | ||
| _ = ct.cancelled() => Err( | ||
| std::io::Error::new( | ||
| std::io::ErrorKind::Interrupted, | ||
| "send cancelled via cancellation token", | ||
| ).into() | ||
| ), | ||
| } | ||
| }); |
There was a problem hiding this comment.
It should be called out in the function's doc that using the cancel token may result in data loss
Add `ddog_trace_exporter_before_fork`, `_after_fork_in_parent`, and `_after_fork_in_child` that delegate to the underlying SharedRuntime. These allow C callers to coordinate the tokio runtime lifecycle around process forks.
Introduce `ddog_trace_exporter_cancel_token_new`, `_cancel`, and `_drop` for managing cancellation tokens. Thread the token into `ddog_trace_exporter_send_trace_chunks` via `tokio::select!` so callers can cooperatively abort in-flight HTTP requests.
22e9237 to
c5c339a
Compare
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
What does this PR do?
Add fork safety and cooperative cancellation support to the trace exporter FFI:
Fork hooks: Expose
ddog_trace_exporter_before_fork,_after_fork_in_parent,and
_after_fork_in_childthat delegate to the underlyingSharedRuntime. Theseallow C callers to coordinate the tokio runtime lifecycle around process forks.
Cancellation token: Introduce
ddog_trace_exporter_cancel_token_new,_cancel,and
_dropfor managing cancellation tokens. Thread the token intoddog_trace_exporter_send_trace_chunksviatokio::select!so callers cancooperatively abort in-flight HTTP requests.
Motivation
FUP to #1952.
Ruby application servers (Puma, Unicorn, Passenger) fork worker processes. The tokio
runtime does not survive
fork()— its threads are not carried over. Without the forkhooks, the trace exporter is dead in child processes.
The existing
RUBY_UBF_IOunblock function on the dd-trace-rb side sends a signalthat cannot actually cancel an in-flight Rust HTTP request. The cancellation token
enables cooperative abort: when Ruby interrupts a thread (shutdown,
Thread#kill),the UBF cancels the token, which causes
tokio::select!to abort the send and returnpromptly.
Additional Notes
pub fn shared_runtime()accessor toTraceExportersince the field was private.Handle<CancellationToken>fromlibdd-common-ffi, samepattern as profiling-ffi, with a
ddog_TraceExporterCancelTokencbindgen rename toavoid symbol conflicts.
cancelparameter onsend_trace_chunksis nullable; passingNULLpreservesexisting behavior.
AI was used to accelerate implementation; all code was reviewed and understood.
How to test the change?
46 tests pass (40 existing + 6 new for fork hooks and cancellation token).