In testing I also saw something concerning:
- in at least Linux, clients (`upslog` here) were apparently piling up and exceeding the MAXCONN value embedded into the generated tests/NIT/tmp/etc/upsd.conf file (two swarm sizes plus 30, to be generous). Maybe they were retrying connections as their earlier attempts failed (due to the initial bug; this may still be possible when upsd is overwhelmed), compounding the issue: upsd remembers older connection attempts, earlier in its list, and until those time out it has little chance of seeing newer ones. Not sure offhand whether "connection closed/timed out" is evaluated at that time, or whether such detection is only fast-tracked for drivers.
Originally posted by @jimklimov in #3302
Upon a quick look at `upsd.c` `mainloop()`, there are checks for FD validity when deciding whether or not to add them to the list of sockets to check for activity. This seems to be a "static" check based on a previously known disconnection, and so a saved "invalid" FD/handle value. Clients are not looked at after 60 seconds of inactivity (hard-coded).
Maybe the fixes done for issues #3302 and/or #3365 provide timely enough detection of disconnections to make this point moot; but if not (so such a "leak" or pile-up remains), keep this in mind. Maybe detection of disconnections should be more aggressive somehow.