feat: verified-credential lifecycle — close the silent-failure window#1259
Merged
Conversation
… (steps 1-2) First two behavior-neutral steps of the push-as-commit / verified-credential lifecycle (docs/phase2-push-as-commit-spec.md), closing the residual silent-failure window left open by PR #1258. Step 1 - coordinator verified state: - LockUsercodeUpdateCoordinator gains a parallel ``_verified: dict[int, bool]`` kept in lockstep with ``data``, plus ``is_verified(slot)`` (absent -> True). - ``push_update`` takes ``optimistic: bool = False``; the default marks slots verified (every existing caller unchanged), ``optimistic=True`` marks them unverified. A verified-flag-only change updates the map without re-notifying. Step 2 - WriteResult enum: - ``async_set_credential`` / ``async_set_usercode`` / ``_set_credential`` return ``WriteResult`` (NO_CHANGE / CONFIRMED / OPTIMISTIC) instead of ``bool``. - All seven providers migrated to CONFIRMED/NO_CHANGE (pure rename; no behavior change yet -- OPTIMISTIC is wired in a later step). - ``async_internal_set_usercode`` consumes ``result.changed`` for the refresh decision. Tests/mocks updated to the enum. Nothing reads ``is_verified`` or returns ``OPTIMISTIC`` yet, so behavior is identical; the lifecycle switch lands in the state-machine step. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Entire-Checkpoint: 18c469f4c1f4
…step 3) Adds the BaseLock plumbing for the verified-credential lifecycle (dormant until a provider returns WriteResult.OPTIMISTIC in a later step): - _pending_writes: slot -> (believed_pin, monotonic deadline); PENDING_WRITE_TTL=60s. - _push_credential_update(..., optimistic=) threads the flag (kwarg only passed when optimistic, so verified pushes keep their call shape). - _record_optimistic_write / _confirm_slot / _expire_pending_writes. - The seam records an optimistic write when async_set_credential returns OPTIMISTIC (no provider does yet, so behavior is unchanged). Direct unit tests cover record/confirm/expire. Full suite green (1259). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Entire-Checkpoint: 0a470d2b2493
…ep 4) Wires the verified-credential lifecycle into the sync state machine (still dormant -- no provider returns OPTIMISTIC until step 5, so behavior is unchanged for all current providers): - SyncState.PENDING_CONFIRMATION: an optimistic write awaiting confirmation. - calculate_in_sync gates on coordinator.is_verified(slot): an unverified slot is never in sync (absent flag -> verified, so a no-op for non-optimistic providers). - The tick parks a slot with an unexpired pending write in PENDING_CONFIRMATION (no re-write while waiting); on timeout it records a slot-breaker failure and falls through to re-sync, so an unconfirmed write ends in a visible suspend rather than a silent in-sync. Tests: verified gate, PENDING_CONFIRMATION hold, expiry-records-failure+resync. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Entire-Checkpoint: 84612b50267b
…nd (step 5) Makes the verified-credential lifecycle live for the Z-Wave unified path and adds the dropped-push backstop: - zwave async_set_credential returns WriteResult.OPTIMISTIC (was CONFIRMED) for a driver ERROR_UNKNOWN: the seam records it pending + pushes unverified, so a masked-but-unconfirmed write is reconciled rather than trusted. - zwave credential push handlers route through _confirm_slot: a credential event confirms a pending write (keeping the believed value even when the lock reports it masked); an external change is taken as-is. - coordinator._apply_read: a genuine read (poll or hard refresh) confirms a pending write when the slot is observed present (the dropped-push backstop), keeps waiting when still absent, and marks non-pending reads verified. - sync tick: an optimistic write parks in PENDING_CONFIRMATION after _perform_sync instead of being judged immediately -- so a masked write is not penalised before its confirming event arrives. The breaker is charged only on timeout. - frontend: slot-card renders the new pending_confirmation status as "Confirming" (mdi:progress-clock); +1 frontend test. End-to-end: optimistic write -> PENDING_CONFIRMATION -> (confirm -> IN_SYNC) or (repeated timeout -> visible suspend), never a silent in-sync. Full suite green (1263) + 709 frontend tests. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Entire-Checkpoint: 6e93c13cbac8
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1259 +/- ##
========================================
Coverage 97.15% 97.15%
========================================
Files 54 54
Lines 6434 6540 +106
Branches 461 462 +1
========================================
+ Hits 6251 6354 +103
- Misses 183 186 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Address the verified-credential-lifecycle review (concern: overly loose or overly restrictive end states): - Over-restrictive false suspend: a slot left unverified by an optimistic write that later resolves to an authoritative WriteResult.CONFIRMED was never re-marked verified (the CONFIRMED path neither cleared the pending entry nor the unverified flag), so calculate_in_sync stayed False and the slot churned to a suspend despite the lock acknowledging the write. The seam now clears the pending entry and calls coordinator.mark_verified on CONFIRMED. - Over-restrictive double breaker charge: on pending-write expiry the tick charged one failure and then fell through to _perform_sync, which could charge a second failure in the same tick (halving the effective window). Expiry now records one failure, parks at OUT_OF_SYNC, and re-syncs on the next tick -- which also gives a confirming push that lands between ticks a chance to resolve the slot first. - _confirm_slot masked a readable external change: a readable observation of a *different* code during the pending window is an external change and is now surfaced instead of being overwritten with the believed value. - Remove dead _expire_pending_writes (the tick expires inline per slot). The over-loose User Code CC fallback path (set returns CONFIRMED even when the V1 verify poll can't confirm) is documented as a known, pre-existing (non-regression) limitation: that path confirms inline via the poll rather than via a later push, so wiring it into the lifecycle needs it to own its pending bookkeeping -- deferred, since the path is temporary. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 81b909299405
… a push Empirical trace through node-zwave-js + zwave-js-server + the python client shows a masked-accept write (the #1251 case, our only WriteResult.OPTIMISTIC producer) emits NO confirming credential event: in feature-apis/AccessControl.ts setCredential, node.emit("credential added/modified") is gated on `if (succeeded)`, and `succeeded` is the driver's own `verified.userCode === codeData` check -- the exact check that fails on a masked lock. The event and the Error_Unknown return are mutually exclusive, so there is nothing to wait for and nothing to clobber. That made the lifecycle wait for a push that never arrives. The only backstop was the hourly drift refresh, but the breaker suspends at 3 failures / 5-min window (~3 min), so every masked-accept lock was being suspended -- a regression from 3.x, which trusted the masked write. Fix: the seam's OPTIMISTIC branch now calls coordinator.async_confirm_pending_writes() -- an on-demand hard read right after recording the optimistic write. LCM projects masked codes to unreadable() (present), so _apply_read confirms the slot where the driver's verify could not; a genuinely-absent slot stays pending and re-syncs on the next tick. This is order-independent (depends on no event) and non-fatal on a read failure (no backoff; the tick/TTL reconciles). Tests: 5 new (coordinator confirm: present-masked/absent/read-failure/no-op; seam: OPTIMISTIC drives the confirm read). Full suite 1269 passing; prek clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 1a64c8fdb5ac
5 tasks
…ycle Code-review follow-up: - Over-restrictive (must-fix): a stale optimistic write blocked reconciliation of a changed desired state. The PENDING_CONFIRMATION tick guard keyed only on the pending entry + deadline, so if the user edited the PIN or disabled the slot while a write was outstanding, the slot held PENDING_CONFIRMATION (and re-synced the OLD value) for up to the 60s TTL, then charged a spurious breaker failure. The guard now drops a pending entry that no longer matches the desired state and re-syncs the new target without charging the breaker. A clear also pops any pending set up front (a clear supersedes it). - The on-demand confirm read could escape the set seam: it caught only LockCodeManagerError, so a non-LCM error (or a _normalize_keys failure) would propagate into the seam and suspend the slot, violating the "best-effort, never fatal" contract. It now also swallows unexpected errors (logged) and leaves the pending write for the tick to reconcile. - _apply_read now mirrors _confirm_slot's readable-mismatch guard: a genuine read observing a *readable* code different from a pending write takes the observation (an external change) instead of masking it with the believed value -- otherwise a drift refresh, whose purpose is to surface out-of-band changes, would silently overwrite one. Full suite green (1273); prek clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Entire-Checkpoint: b2aa9ea43c2e
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proposed change
Phase 2 of the credential-write reliability work (follows #1258). Closes the residual silent-failure window that #1258 deliberately left open: a Z-Wave write returning an ambiguous
ERROR_UNKNOWNwhere the code did not land was treated as a completed set, so the slot could report in-sync with no working code.Introduces a per-slot verified / unverified lifecycle. An optimistic write (ambiguous, treated as completed) is not trusted until the lock actually confirms it. If confirmation never comes, the slot re-syncs and, after repeated failures, visibly suspends — never a silent in-sync.
Confirmation is an active read-back, not a wait for a push
The central finding of this work (traced end-to-end through
node-zwave-js→zwave-js-server→ the python client): a masked-accept write — our only source ofOPTIMISTIC— emits no confirming credential event at all. Innode-zwave-jsAccessControl.setCredential,node.emit("credential added/modified")is gated onif (succeeded), andsucceededis the driver's ownverified.userCode === codeDatacheck — the exact check that fails on a lock that reports codes back masked (the #1251 root cause). So the event and theError_Unknownreturn are mutually exclusive.Because no push arrives, the lifecycle actively reads the slot back immediately after an optimistic write (
coordinator.async_confirm_pending_writes()). LCM projects masked codes tounreadable()(present), so its read confirms the slot where the driver's own verify could not. This is order-independent (depends on no event timing) and non-fatal on a read failure (the slot stays pending and the sync tick reconciles it within the TTL).Design (full spec was developed locally)
WriteResultenum replaces theboolreturn ofasync_set_credential:NO_CHANGE/CONFIRMED/OPTIMISTIC. Only the Z-Wave unifiedERROR_UNKNOWNpath returnsOPTIMISTICtoday; everything else isCONFIRMED(authoritative) — so poll/eventual-consistency providers (Schlage, ZHA, …) are unaffected and keep their existing convergence._verifiedmap +push_update(optimistic=);is_verified(slot)defaults to verified for absent slots;mark_verified(slot)clears a stale unverified flag;async_confirm_pending_writes()is the on-demand read-back backstop._pending_writes(slot → believed pin + 60s deadline);_record_optimistic_write,_confirm_slot(a present observation confirms a pending write, keeping the believed value even when the lock reports it masked; a differing readable observation is taken as an external change). The seam drives the read-back on an optimistic result.coordinator._apply_read: the dropped-/no-push backstop for genuine reads — confirms a pending write when the slot is observed present, keeps waiting when absent, marks non-pending reads verified.SyncState.PENDING_CONFIRMATION;calculate_in_syncgates onis_verified; an optimistic write parks in PENDING_CONFIRMATION (not penalised before confirmation); the breaker is charged once on timeout, then re-syncs on the next tick (no same-tick double-charge).mdi:progress-clock).Why this isn't over-restrictive or over-loose
CONFIRMEDwrite also clears any stale unverified flag so a recovered slot can converge.Reviews
Two adversarial review passes (stepping through every initial state × provider, stress-testing both success and failure paths) drove additional hardening: clearing the stale unverified flag on
CONFIRMED, removing the expiry/sync same-tick double-charge, taking a differing readable observation as an external change, and — most importantly — replacing the assumed "push/hourly-refresh confirms it" path with the empirically-verified active read-back above.Type of change
Additional information
WriteResultenum → base plumbing → state machine → make it live → two rounds of review-driven hardening. Each commit keeps the suite green._apply_read, and the read-back confirm (present-masked / absent / read-failure / no-op); base record/confirm (incl. readable external change); sync verified-gate, PENDING_CONFIRMATION hold, confirmed→IN_SYNC, expiry→failure+resync (two-tick); seam OPTIMISTIC drives the confirm read; frontendpending_confirmationrender.node-zwave-jsfixes the unified API), and its looseness is pre-existing 3.x behavior, not a regression.🤖 Generated with Claude Code