Investigate issue #3302 driver behavior when upsd aborts #3368
jimklimov wants to merge 10 commits into networkupstools:master
❌ Build nut 2.8.4.4369-master failed (commit 1c5d56839b by @jimklimov)
Force-pushed from fd80697 to 97eca57
✅ Build nut 2.8.4.4370-master completed (commit 0a7e64ee19 by @jimklimov)
…proctag() is called) [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…etworkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…rkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…gs [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…ally and flip to specified upsname later [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…sing setproctag() [networkupstools#3302, networkupstools#3368] Did not work for parallel scanning threads, where it would be most useful, because they share the same process space... Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…pthreads so far [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
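The setproctag() commit above notes that the tag did not help the parallel scanning threads because they share one process space. A minimal sketch of that limitation, assuming only standard C11 and pthreads; it does not reflect NUT's actual setproctag() internals, and process_tag, thread_tag, set_tags(), scanner_thread() and run_tagged_thread() are hypothetical names. A process-wide buffer is overwritten by whichever thread writes last, while a _Thread_local buffer keeps one copy per thread.

```c
/* Hypothetical sketch (not NUT code): a process-wide tag cannot tell
 * threads apart, while C11 thread-local storage can. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* One buffer per process: all threads share (and overwrite) it. */
static char process_tag[64] = "main";

/* One buffer per thread: every thread starts with its own "main" copy. */
static _Thread_local char thread_tag[64] = "main";

/* Set both tags from the calling thread. */
static void set_tags(const char *tag)
{
    assert(tag != NULL);
    snprintf(process_tag, sizeof(process_tag), "%s", tag);
    snprintf(thread_tag, sizeof(thread_tag), "%s", tag);
}

/* Thread body standing in for one parallel scanning thread. */
static void *scanner_thread(void *arg)
{
    set_tags((const char *)arg);
    return NULL;
}

/* Start one tagged thread and wait for it; returns 0 on success. */
static int run_tagged_thread(const char *tag)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, scanner_thread, (void *)tag);
    if (rc != 0)
        return rc;
    return pthread_join(t, NULL);
}
```

After run_tagged_thread("scan-1") returns, the main thread sees process_tag as "scan-1" (clobbered by the other thread), while its own thread_tag is still "main". On older toolchains, link with -pthread.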
❌ Build nut 2.8.4.4371-master failed (commit 0f2f5925f5 by @jimklimov)
✅ Build nut 2.8.4.4373-master completed (commit dd1c3aa017 by @jimklimov)
NOTE: After #3363 it seems that
UPDATE: Older Windows builds behaved similarly (tested with 2.8.4.1572-1572+g69e282b3b+v2.8.5+rc5 and a small swarm of 50 drivers, to stay under 64 connections):
Older Linux build (2.8.4.1541.9-1550+g7cd79ab73, with 3 dummy devices from NIT):
Force-pushed from 5ca8690 to cf14d94
This all seems to work (even survives
Better to include it in the next release cycle, so the current one completes in finite time (at the cost of having a known, rarely-triggered bug in v2.8.5).
This PR also introduces better NSS error reporting (older methods did not always work) and generally more legible logging messages in
Although on the client side the error is not as visible:
I was under the impression that the server would tell the client (maybe in plaintext
Maybe we should accept the attempt with any cert or lack thereof, just to drop it gracefully?
A ZIP file with the standard source tarball and another tarball with pre-built docs for commit 0d462f2 is temporarily available: NUT-tarballs-PR-3368.zip.
Rebased after offloading relatively neutral but massive changes into the master branch via the PRs linked above.
✅ Build nut 2.8.4.4478-master completed (commit 23564cd710 by @jimklimov)
✅ Build nut 2.8.4.4479-master completed (commit 15dcee8e68 by @jimklimov)
✅ Build nut 2.8.4.4487-master completed (commit 5971c6bf53 by @jimklimov)
✅ Build nut 2.8.5.4494-master completed (commit 92e7c3b437 by @jimklimov)
✅ Build nut 2.8.5.4495-master completed (commit 5a4d3aeedc by @jimklimov)
✅ Build nut 2.8.5.4513-master completed (commit ed4b2056ae by @jimklimov)
…-check before retry - POSIX part [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…-check before retry - also for WIN32 [networkupstools#3302] Also revised WaitForSingleObject() result checking - there has to be a chance to succeed ;) Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…isconnect() [networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…upstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…_disconnect() implementation [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…iling CreateNamedPipe() [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…ing error codes; document the methods [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…) for faults in NSS setup [networkupstools#3379, networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…te() [networkupstools#3379, networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
✅ Build nut 2.8.5.4547-master completed (commit 2833f6dd73 by @jimklimov)
Start by poking upsdrvctl for both WIN32 and POSIX builds... Includes code from PR #3367 to try reproducing the issue.
UPDATE: Maybe specific to dummy-ups. Reproduced with standalone starts of the driver program, with one driver via upsdrvctl (note: the latter does not seem to propagate the exit code and returns 0, at least on Windows; it should probably indicate an error), and with a swarm of drivers via upsdrvctl (which also exits with code 0 even if all drivers died abruptly). Sometimes it took several starts of upsd, each killed a few seconds later, to reproduce. In all these cases the final words were like:
upsd sometimes logs the clean-up:
On the dummy-ups side, it seems to always end with the same "entering parse_data_file()" call (and exit code 127) after failing to write to the server:
I don't think I've reproduced or ruled out the problem on non-Windows builds yet.
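The note that upsdrvctl returns 0 even when every driver died suggests that a launcher should fold its children's exit statuses into its own. A minimal POSIX sketch of that idea, not upsdrvctl's actual code (the observation above was made at least on Windows, where process handling differs); status_to_code(), spawn_fake_driver() and wait_for_children() are hypothetical names, and mapping a signal death to 128 + signal number is an assumed shell-style convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one waitpid() status to a shell-style exit code:
 * normal exit -> its own code, killed by a signal -> 128 + signal number. */
static int status_to_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return 1; /* not expected without WUNTRACED; count it as a failure */
}

/* Hypothetical stand-in for starting one driver: a child process that
 * exits with `code`, or kills itself with SIGKILL if `code` is negative. */
static pid_t spawn_fake_driver(int code)
{
    pid_t pid = fork();
    if (pid == 0) {
        if (code < 0)
            raise(SIGKILL);
        _exit(code);
    }
    return pid; /* -1 if fork() failed */
}

/* Reap every child in pids[0..n-1]. Returns 0 only if all of them exited
 * with code 0, otherwise the code of the first failure in array order. */
static int wait_for_children(const pid_t *pids, size_t n)
{
    int result = 0;
    assert(pids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int status = 0, code;
        pid_t r = -1;
        if (pids[i] > 0) {
            do {
                r = waitpid(pids[i], &status, 0);
            } while (r < 0 && errno == EINTR);
        }
        /* never started, or could not be reaped: also a failure */
        code = (r < 0) ? 1 : status_to_code(status);
        if (result == 0 && code != 0)
            result = code;
    }
    return result;
}
```

With two children exiting 0 the result is 0; if one of three exits with 127 and another is killed, the first failure (127) is reported, so the launcher's own exit code is no longer 0.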
Per GDB and added debug-logging traces, it seems to crash around malloc() calls, whether in PCONF context init or in vupslog() a bit before it gets there:
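The dummy-ups trace above ends right after a failed write to the server. On POSIX systems, writing to a stream socket whose peer has gone away raises SIGPIPE, whose default action terminates the process; with SIGPIPE ignored, write() fails with EPIPE instead and the caller can decide what to do. Whether this relates to the Windows crashes (Windows has no SIGPIPE) is not established here. A minimal sketch, not NUT's actual networking code; send_to_server(), ignore_sigpipe() and enum send_result are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of one attempt to send a message to the server. */
enum send_result { SEND_OK, SEND_PEER_GONE, SEND_ERROR };

/* Ignore SIGPIPE process-wide, so writing to a dead peer makes write()
 * fail with EPIPE instead of terminating the process. Returns 0 on success. */
static int ignore_sigpipe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Write all of buf to fd, classifying failures instead of aborting. */
static enum send_result send_to_server(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            if (errno == EPIPE || errno == ECONNRESET)
                return SEND_PEER_GONE; /* the other end closed or died */
            return SEND_ERROR;
        }
        buf += n;
        len -= (size_t)n;
    }
    return SEND_OK;
}
```

For example, once the "server" end of a socketpair() is closed, send_to_server() on the other end returns SEND_PEER_GONE rather than killing the process. On Linux, send() with MSG_NOSIGNAL suppresses SIGPIPE per call instead of process-wide.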