feat(shared-runtime)!: add BorrowedRuntime for caller-owned tokio runtimes#2061
feat(shared-runtime)!: add BorrowedRuntime for caller-owned tokio runtimes#2061Aaalibaba42 wants to merge 3 commits into
Conversation
b0f4da8 to
50536ba
Compare
📚 Documentation Check Results📦
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13d5a971de
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: df92857 | Docs | Datadog PR Page | Give us feedback! |
375b11a to
64603e7
Compare
64603e7 to
4e49301
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e49301fa3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let (_state, res) = self | ||
| .shutdown_tracker | ||
| .cv | ||
| .wait_timeout_while(state, timeout, |s| s.completed < s.expected) |
There was a problem hiding this comment.
Avoid blocking the only Tokio worker during borrowed shutdown
When wait_shutdown_done is called from a task running on a new_multi_thread runtime configured with one worker thread, this blocking Condvar wait parks the only executor thread while the pause/shutdown tasks from trigger_shutdown_signal were spawned onto that same handle, so they cannot be polled and shutdown just times out. The current RuntimeFlavor::CurrentThread guard does not cover this valid multi-thread configuration; use block_in_place/an async wait path or reject calls from the host runtime when no other worker can drive the shutdown tasks.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
RuntimeFlavor::CurrentThread doesn't cover worker_threads(1) multi-thread runtimes, and I can't detect that case programmatically for tokio doesn't expose worker count or distinguish executor threads from blocking threads through the Handle API...
I've changed the documentation to better reflect this as a caller responsibility.
There was a problem hiding this comment.
I wonder if this kind of requirement is practical. If the borrowed runtime is, downstream, some random client's app tokio runtime, do we really have any control over how it is configured? I'm not sure how wait_shutdown_done is called, but if it's called in an async context, I suppose we could design an async wait operation instead (e.g., while polled, the future checks if s.completed < s.expected)?
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2061 +/- ##
==========================================
+ Coverage 72.93% 72.98% +0.05%
==========================================
Files 460 461 +1
Lines 76463 76688 +225
==========================================
+ Hits 55766 55972 +206
- Misses 20697 20716 +19
🚀 New features to boost your workflow:
|
… not call this on executor thread
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
|
It looks like the description is a bit off mentioning fixes and bugs that seem to be coming from previous commits of the branch and not from the current state of the feature |
I tried letting a cheap AI read the commits and make a description, I don't think it's too valuable and can even be detrimental, as it were. |
|
I rewrote it cleaner and straighter to the point |
There was a problem hiding this comment.
The code LGTM overall, but I'm not entirely sure to fully grasp in which context (sync or async) is this API supposed to be consumed. In particular we use standard sync thread mechanisms (condvars, mutexes) in an async context, which could be problematic (even in a multi-thread context, a condvar will block the current executor thread, which is not ideal).
On a different front, I remember @paullegranddc said he would rather spawn our own shared runtime in a separate thread than hooking in the client's runtime. I don't have a strong opinion myself, but I wonder if this discussion had a conclusion.
| } | ||
|
|
||
| /// Returns a clone of the caller-supplied tokio runtime handle. | ||
| pub fn runtime_handle(&self) -> Handle { |
There was a problem hiding this comment.
Most Handle methods take &self. Any reason to do a preventive clone here, instead of having a getter and letting the caller decide to clone if needed?
| let mut guard = match self.registry.workers.lock() { | ||
| Ok(g) => g, | ||
| Err(poison) => poison.into_inner(), | ||
| }; |
There was a problem hiding this comment.
Nit
| let mut guard = match self.registry.workers.lock() { | |
| Ok(g) => g, | |
| Err(poison) => poison.into_inner(), | |
| }; | |
| let mut guard = self.registry.workers.lock().unwrap_or_else(|poison| poison.into_inner()); |
| let (_state, res) = self | ||
| .shutdown_tracker | ||
| .cv | ||
| .wait_timeout_while(state, timeout, |s| s.completed < s.expected) |
There was a problem hiding this comment.
I wonder if this kind of requirement is practical. If the borrowed runtime is, downstream, some random client's app tokio runtime, do we really have any control over how it is configured? I'm not sure how wait_shutdown_done is called, but if it's called in an async context, I suppose we could design an async wait operation instead (e.g., while polled, the future checks if s.completed < s.expected)?
| let mut state = self | ||
| .shutdown_tracker | ||
| .state | ||
| .lock() |
There was a problem hiding this comment.
We are potentially locking a sync mutex in a tokio runtime, which might block the executor's thread. Is that ok in the current context (and if yes, why?)? I'm not sure to have the bigger picture. Are the other mutex-locking functions like spawn_worker and cos also expected to be run from within a runtime? I guess if yes, we can just use a tokio async mutex. If the mutex can be locked/unlocked from both within and outside the runtime, then I'm not sure...
What ?
Add
BorrowedRuntimetolibdd-shared-runtime: runs backgroundworkers on a caller-supplied tokio runtime instead of creating one internally.Why ?
SharedRuntimeused to always create and own a tokio runtime. Rust programs that already run one ended up with two runtime in the same process, which lead to several tokio-in-tokio panics and hustles.How ?
BorrowedRuntimewraps a tokio handle and intentionally omits operations that would be unsafe against a runtime we don't own: no fork hooks or synchronous block_on, no runtime teardown. Workers it spawned can be stopped via eithershutdown_asyncortrigger_shutdown_signal+wait_shutdown_done, afterwards the instance is terminal and subsequentspawn_workercalls returnRuntimeShuttingDown.Additional Notes
Breaking:
SharedRuntimeErrorgains two new variants (RuntimeShuttingDown,WouldDeadlock), match arms must be updated for exhaustiveness.