Fix high CPU usage on Linux due to epoll busy-loop#191
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #191 +/- ##
==========================================
- Coverage 92.80% 92.40% -0.40%
==========================================
Files 68 69 +1
Lines 3448 3543 +95
==========================================
+ Hits 3200 3274 +74
- Misses 248 269 +21 ☔ View full report in Codecov by Sentry. |
|
@swhitty just pinging on this :) |
| @Sendable | ||
| public func accept() async throws -> AsyncSocket { | ||
| try await pool.loopUntilReady(for: .connection, on: socket) { | ||
| try await pool.loopUntilReady(for: .read, on: socket) { |
There was a problem hiding this comment.
could we localise this change to just linux for now — I remember I needed write events for iOS disconnection detection so I'll need to look into that
#if canImport(Glibc)
static let connection: Self = [.read]
#else
static let connection: Self = [.read, .write]
#endif| } | ||
|
|
||
| mutating func setEvents(_ events: Socket.Events, for socket: Socket.FileDescriptor) throws { | ||
| if existing[socket] == events { return } |
There was a problem hiding this comment.
| if existing[socket] == events { return } | |
| guard existing[socket] != events else { return } |
| var epollEvents: EPOLLEvents { | ||
| reduce(EPOLLEvents()) { [$0, $1.epollEvent] } | ||
| var events = reduce(EPOLLEvents()) { [$0, $1.epollEvent] } | ||
| events.insert(.edgeTriggered) |
There was a problem hiding this comment.
Would it be possible to pass this as an option to structure ePoll so users can revert to level triggered events if they need to?
let pool = SocketPool<ePoll>(triggering: .edge, maxEvents: ....)
let socket = AsyncSocket(pool: pool)
I don't mind if the default changes to .edge, just a way to easily go back if there is an issue
|
Thanks for this Luke, just a couple of changes to provide a way to revert the changes at runtime if we need |
|
also, is there an easy way I can reproduce this issue on Linux with FlyingFoxCLI? I've tried in docker / container and it isn't looping when waiting for connections, do I need to use a VM? |
I was only able to reproduce it with my code but, I'll try get Claude or Amp to make a repro case. |
6a0cf67 to
b81b2f3
Compare
Three fixes for 100% CPU spinning when HTTPServer is idle on Linux: 1. Use edge-triggered (EPOLLET) mode for all sockets, not just the canary eventfd. Level-triggered epoll causes epoll_wait() to return immediately when the listening socket is readable, creating a busy loop. 2. Change accept() to wait on .read events only instead of .connection ([.read, .write]). A listening socket is always writable (EPOLLOUT), so registering for write events causes immediate spurious wakeups. accept() only needs readability (a pending connection). 3. Skip redundant epoll_ctl(EPOLL_CTL_MOD) calls when the event mask hasn't changed. EPOLL_CTL_MOD re-arms edge-triggered events and can cause spurious wakeups when the condition is already met. Fixes: swhitty#186
NB: these fixes were generated by AmpCode. They appear to fix the issue and also do not break tests.
—
Three fixes for 100% CPU spinning when HTTPServer is idle on Linux:
Use edge-triggered (EPOLLET) mode for all sockets, not just the canary eventfd. Level-triggered epoll causes epoll_wait() to return immediately when the listening socket is readable, creating a busy loop.
Change accept() to wait on .read events only instead of .connection ([.read, .write]). A listening socket is always writable (EPOLLOUT), so registering for write events causes immediate spurious wakeups. accept() only needs readability (a pending connection).
Skip redundant epoll_ctl(EPOLL_CTL_MOD) calls when the event mask hasn't changed. EPOLL_CTL_MOD re-arms edge-triggered events and can cause spurious wakeups when the condition is already met.
Fixes: #186