
NUT swarm stress testing: revise cleanup of disconnected sockets #3366

@jimklimov

Description


In testing I also saw something concerning:

  • at least on Linux, clients (upslog here) were apparently piling up and exceeding the MAXCONN value embedded in the generated tests/NIT/tmp/etc/upsd.conf file (two swarm sizes plus 30, to be generous). Perhaps they kept retrying connections after their earlier attempts failed (due to the initial bug; this may still be possible when upsd is overwhelmed), compounding the issue: upsd remembers older connection attempts earlier in its list, and until those time out it has little chance of seeing newer ones. Not sure offhand whether "connection closed/timed out" is evaluated at that point, or whether such handling is only fast-tracked for drivers.

Originally posted by @jimklimov in #3302

A quick look at the upsd.c mainloop() shows checks for FD validity when deciding whether or not to add each one to the list of sockets polled for activity. This appears to be a "static" check based on a previously recorded disconnection, i.e. a saved "invalid" FD/handle value. Clients are only dropped after 60 seconds of inactivity (hard-coded).

Maybe the fixes made for issues #3302 and/or #3365 provide timely enough detection of disconnections to make this point moot; but if not (i.e. such a "leak" or pile-up remains), keep this in mind. Perhaps detection of disconnections should somehow be more aggressive.

Metadata


Assignees

No one assigned

    Labels

    Connection stability issues (issues about driver<->device and/or networked connections (upsd<->upsmon...) going AWOL over time), enhancement

    Type

    No type

    Projects

    Status

    No status

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests
