Skip to content

feat: WiFi scale support — WebSocket control + events#54

Open
skialpine wants to merge 5 commits into
decentespresso:mainfrom
skialpine:feat/wifi-scale-support
Open

feat: WiFi scale support — WebSocket control + events#54
skialpine wants to merge 5 commits into
decentespresso:mainfrom
skialpine:feat/wifi-scale-support

Conversation

@skialpine
Copy link
Copy Markdown
Contributor

@skialpine skialpine commented May 21, 2026

Summary

Brings the on-device /snapshot WebSocket up to feature parity with BLE so the web UI and a separate app (phone/desktop) can both drive the scale.

New protocol (c0eb374)

  • One-way {"grams","ms"} weight stream extended into a full control + event protocol.
  • Event broadcasts: button presses (sendWebsocketButton) and power-off reasons (sendWebsocketPowerOff), wired into the same call sites as the BLE/USB equivalents.
  • Status frames (sendWebsocketStatus / sendWebsocketStatusAll) publishing battery %, voltage, charging, timer, display, soft-sleep, LED, and rate state — broadcast periodically every 5 s after events on, and on demand.
  • Commands accepted as text or JSON: status, events on/off, tare, timer start/stop/zero, led <r,g,b>|off, display on/off, low_power on/off, sleep on/off, power off.
  • Selectable weight rate: 2 Hz / 5 Hz / 10 Hz via rate <hz>, interval <ms>, or JSON (rate_hz / hz / rate / interval_ms / {command:rate,value:…}).

Freeze fix (1051681)

  • The WS event handler runs on the AsyncTCP task, so calling u8g2.setPowerSave/setContrast or digitalWrite(PWR_CTRL,…) directly raced the main loop's OLED draws and stalled the device for ~4 s under feature-test load.
  • Added a portMUX-protected wsPendingMask drained early in loop(). Hardware ops deferred; state flags updated synchronously so status frames stay accurate.
  • Tightened include/webserver.h: handlers registered once across stop/start (server.end() doesn't clear the handler list), server.begin() moved to the end, removed a duplicate addHandler(&websocket).
  • WS DATA callback: String((const char*)data, len) instead of O(N²) char-by-char concat; only accepts complete unfragmented text frames.
  • static unsigned long lastUpdate (was signed → wrong across millis() rollover).
  • Documented in wifi_setup.cpp that WiFi.setSleep(false) is the wrong call with BLE active (measured: 8 % packet loss, HTTP all-timeout because BT+WiFi share the 2.4 GHz radio and the coex layer needs WiFi's sleep windows for BT slots).

Review feedback (74d7f0f)

  • Mask collapse fix: wsPendingMask |= bit let opposing actions accumulate, so display off then display on could end up off while status reported on. Added wsReplacePending(set, clear) for the three mutually-exclusive pairs.
  • Marked cross-task globals (weightWebsocketNotifyInterval, b_websocketEventsEnabled, b_softSleep, b_u8g2Sleep, …) volatile for dual-core ESP32-S3 cache coherence.
  • Cast stopWatch.elapsed() to (unsigned long) for %lu. Fixed two stale doc comments (wsQueuePending and processWsPendingCmds). README documents the JSON rate forms (rate_hz, hz, interval_ms) that were accepted but undocumented.

Multi-client (7a3b51c)

  • Dropped the 1-client middleware so the web UI and an app can both connect to the same scale.
  • cleanupClients() now uses the library default (DEFAULT_MAX_WS_CLIENTS = 8 on ESP32).
  • WS_EVT_DISCONNECT only resets the shared rate/events state when server->count() == 0 (last client out) — otherwise one client leaving would wipe the other's negotiated state.

Verified on device

  • 60/60 WebSocket feature regression PASS (status / events / tare / timer / led / display / low_power / sleep / rate forms / unknown-input handling / soft-sleep round trip).
  • 120 s freeze stress (10 Hz WS + HTTP every 1 s + ~5 cmd/s feature chaos): max weight gap 0.60 s (was 4.18 s pre-fix), 0 HTTP failures, 0 slow.
  • Multi-client 5/5 PASS: two simultaneous handshakes both succeed; both clients stream weight at the negotiated rate; both see the periodic status broadcast; disconnecting one client leaves the other's stream and state intact.

🤖 Generated with Claude Code

skialpine and others added 3 commits May 21, 2026 11:24
Extend the /snapshot WebSocket beyond a one-way weight stream into a
full control + event protocol so web/app clients reach feature parity
with BLE:

  * Event broadcasts: button presses (sendWebsocketButton) and
    power-off reasons (sendWebsocketPowerOff) wired into the same call
    sites as their BLE/USB equivalents.
  * Status frames: sendWebsocketStatus / sendWebsocketStatusAll publish
    battery %, voltage, charging, timer, display, soft-sleep, LED, and
    rate state. Periodic broadcast every WEBSOCKET_STATUS_NOTIFY_INTERVAL_MS.
  * Commands accepted as text or JSON: status, events on/off, tare,
    timer start/stop/zero, led <r,g,b>|off, display on/off,
    low_power on/off, sleep on/off, power off.
  * Selectable weight rate: 2 Hz / 5 Hz / 10 Hz, via "rate <hz>",
    "interval <ms>", or JSON {rate_hz|hz|rate|interval_ms|...}.
    Falls back to default on client disconnect.
  * README: document the protocol (events, status, commands, rates).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The /snapshot WebSocket handler runs on the AsyncTCP task. Several
commands (display, low_power, sleep, power_off) were calling
u8g2.setPowerSave / u8g2.setContrast / digitalWrite(PWR_CTRL,...)
directly from that task while the main loop hammered the same
peripherals — under feature-test load at 10Hz this caused the device
to stall for ~4s and HTTP responses to time out mid-body.

  * Add a portMUX-protected pending-action mask. The WS callback
    queues the hardware op and updates the matching state flag
    (b_u8g2Sleep / b_softSleep) synchronously so the status frame is
    accurate; the main loop drains the mask early in loop() (before
    the b_softSleep guard so SLEEP_OFF can wake the device).
  * StopWatch.* is just bool/uint32 writes, so timer ops stay
    synchronous in the callback.
  * webserver.h: register handlers only once across stop/start
    cycles, drop the duplicate addHandler(&websocket), and move
    server.begin() to the end of startWebServer() so handlers are
    in place before the server accepts requests.
  * WS DATA callback: build the String with the (buf,len)
    constructor instead of O(N²) char-by-char concatenation, and
    only process complete unfragmented text frames.
  * Use unsigned long for the WS broadcast lastUpdate to avoid
    signed-subtraction wrap at the millis() 49.7-day rollover.
  * Document why WiFi.setSleep(false) is the wrong call here:
    tested it, caused 8% loss and HTTP all-timeout when BLE is
    active because BT/WiFi share the 2.4GHz radio and the coex
    layer needs WiFi sleep windows for BT slots.

Stress test (10Hz WS + HTTP every 1s + 5 cmd/s feature chaos, 120s):

           pre-fix       post-fix
  max gap   4.18 s       0.90 s
  HTTP fail 1            0
  HTTP slow 2            0

Feature regression (60 WS commands) still 60/60 PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code review on PR decentespresso#54 caught two real bugs in the deferred-action fix:

1. The pending-mask let opposing actions accumulate. If a client sent
   "display off" then "display on" before the main loop drained, both
   bits were set and processWsPendingCmds ran them in source order —
   final hardware ended up off while b_u8g2Sleep reported on. Add a
   wsReplacePending(set, clear) helper that atomically supersedes
   the opposing bit, and use it for the three mutually-exclusive
   pairs (display, low_power, sleep).

2. The new WS state globals (weightWebsocketNotifyInterval,
   b_websocketEventsEnabled, b_websocketLowPower/LedEnabled,
   i_websocketLedR/G/B, t_lastWebsocketStatusUpdate) and the pre-
   existing b_softSleep / b_u8g2Sleep / b_powerOff that are now
   written from the AsyncTCP task are missing `volatile`. On dual-
   core ESP32-S3 the compiler may cache stale values in a register
   on the loop core across the WS gate check. Mark them volatile.

Also fold in smaller doc/format cleanups from the same review:
  * Cast stopWatch.elapsed() (uint32_t) to (unsigned long) for the
    %lu format spec in sendWebsocketStatus / sendWebsocketStatusAll.
    Works today on ESP32 (both 32-bit) but technically a type
    mismatch.
  * Correct the wsQueuePending doc-comment (StopWatch is safe to
    mutate inline; only u8g2 and power-rail GPIOs need deferring).
  * Correct the processWsPendingCmds comment to note that
    b_powerOff is set here on purpose.
  * README: document the JSON rate forms accepted by the parser
    that weren't listed (rate_hz, hz, interval_ms).

Re-verified:
  * 60/60 WS feature regression PASS
  * 120s freeze stress: max weight gap 0.60s (was 4.18s pre-fix),
    0 HTTP failures, 0 slow, 590 commands sent

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@skialpine
Copy link
Copy Markdown
Contributor Author

Code review

Found 1 issue, fixed in 74d7f0f:

  1. Deferred-action mask collapses opposing commands. wsPendingMask was set with |= only, so if a client sent display off then display on before the main loop drained, both bits ended up set and processWsPendingCmds ran them in source order — final hardware was off while b_u8g2Sleep reported on, leaving the status frame inconsistent with the device. Same shape for low_power on/off and sleep on/off. Fixed by adding wsReplacePending(set, clear) and atomically superseding the opposing bit at the three pair sites.

    https://github.com/skialpine/openscale/blob/7c533a241af23df7654c7e3cc2f087704ce1580b/src/hds.ino#L496-L508

    https://github.com/skialpine/openscale/blob/7c533a241af23df7654c7e3cc2f087704ce1580b/src/hds.ino#L751-L803


5 below-threshold issues (50-74), all addressed in 74d7f0f

Verification after these fixes: 60/60 WS feature regression PASS, 120s freeze stress max weight gap 0.60s (4.18s pre-fix), 0 HTTP failures.

False positives discarded: an events on → invalid claim (command == "events" is exact equality, not a prefix match, so the first call falls through and the split-and-retry path handles it correctly); a sendWebsocketPowerOff ordering risk (it fires before esp32_sleep() which is where stopWebServer() runs, so the event is delivered before WS teardown).

Generated with Claude Code

@skialpine skialpine force-pushed the feat/wifi-scale-support branch from 7c533a2 to 74d7f0f Compare May 21, 2026 18:52
@skialpine skialpine marked this pull request as ready for review May 21, 2026 18:56
skialpine and others added 2 commits May 21, 2026 13:56
The previous middleware returned 503 to any client past the first,
which meant opening the on-device web UI prevented a phone/desktop
app from connecting to the same scale. Drop the cap and let the
library handle multi-client.

  * include/webserver.h: remove the count-checking middleware so any
    handshake within the library's DEFAULT_MAX_WS_CLIENTS limit
    succeeds.
  * src/hds.ino: call websocket.cleanupClients() with no argument so
    we inherit the library default (DEFAULT_MAX_WS_CLIENTS, which is
    8 on ESP32). Plenty of RAM headroom — ~12-14 KB per client on
    top of ~240 KB free heap.
  * src/hds.ino WS_EVT_DISCONNECT: only reset the shared session
    state (weightWebsocketNotifyInterval, b_websocketEventsEnabled,
    t_lastWebsocketStatusUpdate) when the LAST client leaves
    (server->count() == 0). Without this guard, one client
    disconnecting would wipe rate/events state for any other client
    still connected.

Verified with /tmp/ws_multi.py (5/5):
  * Two simultaneous handshakes both succeed (was: 2nd got 503).
  * Both clients receive the weight stream at the negotiated rate
    (2Hz default, 10Hz after `rate 10`).
  * Both clients see the periodic status broadcast after `events on`.
  * Disconnecting client A leaves client B's stream + state intact.

Single-client feature regression still 60/60 PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A bare hds.local A record isn't reliably resolvable on Android (the
OS resolver doesn't do mDNS and a direct app-socket query gets the
unicast reply dropped). A DNS-SD service is discoverable through
Android's NsdManager, Bonjour, and Avahi.

Register `_decentscale._tcp` on port 80 (the HTTP/WS port) alongside
the existing hostname, with a friendly instance name and TXT records
so a discovery UI can label the device and connect without guessing:

  instance: "Half Decent Scale"
  port:     80
  TXT:      fw=FW: 3.0.9   model=hds   proto=ws   path=/snapshot

Uses the Arduino ESPmDNS wrapper (same ESP-IDF mdns component as the
raw mdns_service_add API). config.h is included for FIRMWARE_VER;
cast to const char* to disambiguate the addServiceTxt overloads.

Verified on device:
  dns-sd -B _decentscale._tcp        -> finds "Half Decent Scale"
  dns-sd -L "Half Decent Scale" ...  -> hds.local.:80
                                        path=/snapshot proto=ws
                                        model=hds fw=FW: 3.0.9

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@tadelv tadelv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like ask you to move the websocket code into a file that is not hds.ino. Some separation of concerns would be nice to have.

Comment thread src/wifi_setup.cpp

WiFi.begin(params.getSSID(), params.getPass());
WiFi.setTxPower(WIFI_POWER_18_5dBm);
// Leaving WiFi modem-sleep at the Arduino default (WIFI_PS_MIN_MODEM):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment here seems redundant?

Comment thread src/hds.ino
if (command == "rate" && doc["value"].is<const char *>()) {
return setWebsocketRateFromValue(client, doc["value"].as<String>());
}
if (command == "led" &&
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LedRGB seems to have snuck in somehow, since it doesn't really do anything, could you remove all the references and code related to it?

Copy link
Copy Markdown
Collaborator

@tadelv tadelv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strict behavioral pass — four findings beyond the cleanup/structure notes I left inline. None block the protocol design, but #1 and #2 undercut the stated multi-client goal/threat model, and #3#4 make status frames report state that isn't real.

Comment thread src/hds.ino
if (websocket.availableForWriteAll() > 0) {
if (!b_adc_recovery_active) {
websocket.printfAll("{ \"grams\": %.2f, \"ms\": %lu }", f_displayedValue, current);
websocket.printfAll("{\"grams\":%.2f,\"ms\":%lu}", f_displayedValue, current);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

availableForWriteAll() returns the minimum writable across all clients, so this all-or-nothing gate couples every client to the slowest one: if any single client's TX buffer is full (slow/dead socket), the weight broadcast — and the periodic status below — is skipped for everyone. That defeats the PR's multi-client independence goal. Consider iterating clients and sending per-client (check each client->canSend()) instead of gating the shared printfAll.

Comment thread src/hds.ino
String msg((const char *)data, len);
Serial.print("Websocket recv: ");
Serial.println(msg);
handleWebsocketRateCommand(client, msg);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trust-model note: pre-PR this socket was read-only weight + tare. It now accepts power off, soft_sleep on, display off, led, timer, etc. with no authentication. Anyone on the LAN can power the scale off (and power off can't be woken remotely) or sleep it — easy griefing/DoS. If a trusted-LAN assumption is intended, please state it explicitly; otherwise consider gating the destructive commands.

Comment thread src/hds.ino
}

if (command == "timer") {
// StopWatch is just bool/uint32 state — safe to mutate from the AsyncTCP
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"safe to mutate from the AsyncTCP task" isn't quite right. StopWatch is multi-field (running flag + start timestamp + accumulator), and it's also mutated from the main loop (hds.ino shot logic), BLE (ble.h), and USB (usbcomm.h). Concurrent start/stop/reset here plus elapsed()/isRunning() reads in the status frame can tear or read mid-update across cores. requestRemoteTare() already uses a portMUX for exactly this reason — the timer path warrants the same guarding rather than an assumption that bool/uint32 writes are atomic-as-a-group.

Comment thread src/hds.ino
return true;
}

if (command == "low_power") {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low_power on defers setContrast(0), but the button-press path sets setContrast(255) (hds.ino:94). After any button press the contrast reverts to max while b_websocketLowPowerEnabled stays true, so the status frame reports low_power:true that no longer matches the hardware.

Comment thread src/hds.ino
Serial.println("Websocket soft sleep on detected.");
// Set state flags synchronously so status reflects the requested mode.
// u8g2 + GPIO power-rail writes are deferred to the main loop.
b_softSleep = true;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

soft_sleep on calls setPowerSave(1) (display off) but never sets b_u8g2Sleep = true. Since the status frame derives display_on from b_u8g2Sleep, it reports display_on:true while the OLED is physically off during soft sleep. soft_sleep off does set b_u8g2Sleep = false, so the pair is asymmetric — set b_u8g2Sleep = true here too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants