Skip to content

feat: verified-credential lifecycle — close the silent-failure window#1259

Merged
raman325 merged 7 commits into
mainfrom
feat/phase2-verified-credential-lifecycle
Jun 15, 2026
Merged

feat: verified-credential lifecycle — close the silent-failure window#1259
raman325 merged 7 commits into
mainfrom
feat/phase2-verified-credential-lifecycle

Conversation

@raman325

@raman325 raman325 commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Proposed change

Phase 2 of the credential-write reliability work (follows #1258). Closes the residual silent-failure window that #1258 deliberately left open: a Z-Wave write returning an ambiguous ERROR_UNKNOWN where the code did not land was treated as a completed set, so the slot could report in-sync with no working code.

Introduces a per-slot verified / unverified lifecycle. An optimistic write (ambiguous, treated as completed) is not trusted until the lock actually confirms it. If confirmation never comes, the slot re-syncs and, after repeated failures, visibly suspends — never a silent in-sync.

Confirmation is an active read-back, not a wait for a push

The central finding of this work (traced end-to-end through node-zwave-jszwave-js-server → the python client): a masked-accept write — our only source of OPTIMISTIC — emits no confirming credential event at all. In node-zwave-js AccessControl.setCredential, node.emit("credential added/modified") is gated on if (succeeded), and succeeded is the driver's own verified.userCode === codeData check — the exact check that fails on a lock that reports codes back masked (the #1251 root cause). So the event and the Error_Unknown return are mutually exclusive.

Because no push arrives, the lifecycle actively reads the slot back immediately after an optimistic write (coordinator.async_confirm_pending_writes()). LCM projects masked codes to unreadable() (present), so its read confirms the slot where the driver's own verify could not. This is order-independent (depends on no event timing) and non-fatal on a read failure (the slot stays pending and the sync tick reconciles it within the TTL).

Design (full spec was developed locally)

  • WriteResult enum replaces the bool return of async_set_credential: NO_CHANGE / CONFIRMED / OPTIMISTIC. Only the Z-Wave unified ERROR_UNKNOWN path returns OPTIMISTIC today; everything else is CONFIRMED (authoritative) — so poll/eventual-consistency providers (Schlage, ZHA, …) are unaffected and keep their existing convergence.
  • Coordinator gains a parallel _verified map + push_update(optimistic=); is_verified(slot) defaults to verified for absent slots; mark_verified(slot) clears a stale unverified flag; async_confirm_pending_writes() is the on-demand read-back backstop.
  • BaseLock tracks _pending_writes (slot → believed pin + 60s deadline); _record_optimistic_write, _confirm_slot (a present observation confirms a pending write, keeping the believed value even when the lock reports it masked; a differing readable observation is taken as an external change). The seam drives the read-back on an optimistic result.
  • coordinator._apply_read: the dropped-/no-push backstop for genuine reads — confirms a pending write when the slot is observed present, keeps waiting when absent, marks non-pending reads verified.
  • Sync state machine: SyncState.PENDING_CONFIRMATION; calculate_in_sync gates on is_verified; an optimistic write parks in PENDING_CONFIRMATION (not penalised before confirmation); the breaker is charged once on timeout, then re-syncs on the next tick (no same-tick double-charge).
  • Frontend: the slot card renders the new state as "Confirming" (mdi:progress-clock).

Why this isn't over-restrictive or over-loose

  • Not over-restrictive: a masked-but-accepted write is confirmed by the immediate read-back (or, failing that, the sync tick) rather than waiting on a push that never arrives, so it is not driven to a false suspend. An authoritative CONFIRMED write also clears any stale unverified flag so a recovered slot can converge.
  • Not over-loose: a write the lock never committed reads back absent → stays pending → re-syncs → after repeated timeouts the breaker trips and the slot suspends (visible), instead of silently reporting in-sync.

Reviews

Two adversarial review passes (stepping through every initial state × provider, stress-testing both success and failure paths) drove additional hardening: clearing the stale unverified flag on CONFIRMED, removing the expiry/sync same-tick double-charge, taking a differing readable observation as an external change, and — most importantly — replacing the assumed "push/hourly-refresh confirms it" path with the empirically-verified active read-back above.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • Built in behavior-isolated commits: coordinator flag + WriteResult enum → base plumbing → state machine → make it live → two rounds of review-driven hardening. Each commit keeps the suite green.
  • Full Python suite green (1269) + 709 frontend tests. New/updated tests: coordinator verified/optimistic, _apply_read, and the read-back confirm (present-masked / absent / read-failure / no-op); base record/confirm (incl. readable external change); sync verified-gate, PENDING_CONFIRMATION hold, confirmed→IN_SYNC, expiry→failure+resync (two-tick); seam OPTIMISTIC drives the confirm read; frontend pending_confirmation render.
  • The User Code CC fallback path is a known, documented exception: it confirms inline via its V1 verify poll and does not yet participate in the lifecycle. It is the temporary workaround (removed once upstream node-zwave-js fixes the unified API), and its looseness is pre-existing 3.x behavior, not a regression.
  • Related: closes the deferred follow-up from fix(zwave_js,matter): tolerate ambiguous credential writes; universal masked read projection (#1251, #1257) #1258.

🤖 Generated with Claude Code

raman325 and others added 4 commits June 13, 2026 15:55
… (steps 1-2)

First two behavior-neutral steps of the push-as-commit / verified-credential
lifecycle (docs/phase2-push-as-commit-spec.md), closing the residual
silent-failure window left open by PR #1258.

Step 1 - coordinator verified state:
- LockUsercodeUpdateCoordinator gains a parallel ``_verified: dict[int, bool]``
  kept in lockstep with ``data``, plus ``is_verified(slot)`` (absent -> True).
- ``push_update`` takes ``optimistic: bool = False``; the default marks slots
  verified (every existing caller unchanged), ``optimistic=True`` marks them
  unverified. A verified-flag-only change updates the map without re-notifying.

Step 2 - WriteResult enum:
- ``async_set_credential`` / ``async_set_usercode`` / ``_set_credential`` return
  ``WriteResult`` (NO_CHANGE / CONFIRMED / OPTIMISTIC) instead of ``bool``.
- All seven providers migrated to CONFIRMED/NO_CHANGE (pure rename; no behavior
  change yet -- OPTIMISTIC is wired in a later step).
- ``async_internal_set_usercode`` consumes ``result.changed`` for the refresh
  decision. Tests/mocks updated to the enum.

Nothing reads ``is_verified`` or returns ``OPTIMISTIC`` yet, so behavior is
identical; the lifecycle switch lands in the state-machine step.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Entire-Checkpoint: 18c469f4c1f4
…step 3)

Adds the BaseLock plumbing for the verified-credential lifecycle (dormant
until a provider returns WriteResult.OPTIMISTIC in a later step):

- _pending_writes: slot -> (believed_pin, monotonic deadline); PENDING_WRITE_TTL=60s.
- _push_credential_update(..., optimistic=) threads the flag (kwarg only passed
  when optimistic, so verified pushes keep their call shape).
- _record_optimistic_write / _confirm_slot / _expire_pending_writes.
- The seam records an optimistic write when async_set_credential returns
  OPTIMISTIC (no provider does yet, so behavior is unchanged).

Direct unit tests cover record/confirm/expire. Full suite green (1259).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Entire-Checkpoint: 0a470d2b2493
…ep 4)

Wires the verified-credential lifecycle into the sync state machine (still
dormant -- no provider returns OPTIMISTIC until step 5, so behavior is
unchanged for all current providers):

- SyncState.PENDING_CONFIRMATION: an optimistic write awaiting confirmation.
- calculate_in_sync gates on coordinator.is_verified(slot): an unverified slot
  is never in sync (absent flag -> verified, so a no-op for non-optimistic
  providers).
- The tick parks a slot with an unexpired pending write in PENDING_CONFIRMATION
  (no re-write while waiting); on timeout it records a slot-breaker failure and
  falls through to re-sync, so an unconfirmed write ends in a visible suspend
  rather than a silent in-sync.

Tests: verified gate, PENDING_CONFIRMATION hold, expiry-records-failure+resync.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Entire-Checkpoint: 84612b50267b
…nd (step 5)

Makes the verified-credential lifecycle live for the Z-Wave unified path and
adds the dropped-push backstop:

- zwave async_set_credential returns WriteResult.OPTIMISTIC (was CONFIRMED) for
  a driver ERROR_UNKNOWN: the seam records it pending + pushes unverified, so a
  masked-but-unconfirmed write is reconciled rather than trusted.
- zwave credential push handlers route through _confirm_slot: a credential event
  confirms a pending write (keeping the believed value even when the lock reports
  it masked); an external change is taken as-is.
- coordinator._apply_read: a genuine read (poll or hard refresh) confirms a
  pending write when the slot is observed present (the dropped-push backstop),
  keeps waiting when still absent, and marks non-pending reads verified.
- sync tick: an optimistic write parks in PENDING_CONFIRMATION after _perform_sync
  instead of being judged immediately -- so a masked write is not penalised
  before its confirming event arrives. The breaker is charged only on timeout.
- frontend: slot-card renders the new pending_confirmation status as
  "Confirming" (mdi:progress-clock); +1 frontend test.

End-to-end: optimistic write -> PENDING_CONFIRMATION -> (confirm -> IN_SYNC) or
(repeated timeout -> visible suspend), never a silent in-sync. Full suite green
(1263) + 709 frontend tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Entire-Checkpoint: 6e93c13cbac8
Copilot AI review requested due to automatic review settings June 13, 2026 20:24

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@github-actions github-actions Bot added python Pull requests that update Python code javascript Pull requests that update javascript code bug Something isn't working labels Jun 13, 2026
@codecov

codecov Bot commented Jun 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.72727% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.15%. Comparing base (ebac273) to head (a5f499b).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
custom_components/lock_code_manager/domain/sync.py 86.95% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##             main    #1259    +/-   ##
========================================
  Coverage   97.15%   97.15%            
========================================
  Files          54       54            
  Lines        6434     6540   +106     
  Branches      461      462     +1     
========================================
+ Hits         6251     6354   +103     
- Misses        183      186     +3     
Flag Coverage Δ
python 97.70% <97.63%> (-0.02%) ⬇️
typescript 95.08% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...components/lock_code_manager/domain/coordinator.py 97.46% <100.00%> (+0.91%) ⬆️
...components/lock_code_manager/domain/credentials.py 100.00% <100.00%> (ø)
...stom_components/lock_code_manager/domain/models.py 100.00% <100.00%> (ø)
...om_components/lock_code_manager/providers/_base.py 97.05% <100.00%> (+0.17%) ⬆️
...onents/lock_code_manager/providers/_zwave_js_uc.py 100.00% <100.00%> (ø)
...m_components/lock_code_manager/providers/akuvox.py 95.70% <100.00%> (ø)
...m_components/lock_code_manager/providers/matter.py 100.00% <100.00%> (ø)
..._components/lock_code_manager/providers/schlage.py 96.29% <100.00%> (ø)
..._components/lock_code_manager/providers/virtual.py 98.50% <100.00%> (ø)
...stom_components/lock_code_manager/providers/zha.py 94.57% <100.00%> (ø)
... and 4 more
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

raman325 and others added 2 commits June 13, 2026 16:50
Address the verified-credential-lifecycle review (concern: overly loose
or overly restrictive end states):

- Over-restrictive false suspend: a slot left unverified by an optimistic
  write that later resolves to an authoritative WriteResult.CONFIRMED was
  never re-marked verified (the CONFIRMED path neither cleared the pending
  entry nor the unverified flag), so calculate_in_sync stayed False and the
  slot churned to a suspend despite the lock acknowledging the write. The
  seam now clears the pending entry and calls coordinator.mark_verified on
  CONFIRMED.

- Over-restrictive double breaker charge: on pending-write expiry the tick
  charged one failure and then fell through to _perform_sync, which could
  charge a second failure in the same tick (halving the effective window).
  Expiry now records one failure, parks at OUT_OF_SYNC, and re-syncs on the
  next tick -- which also gives a confirming push that lands between ticks a
  chance to resolve the slot first.

- _confirm_slot masked a readable external change: a readable observation of
  a *different* code during the pending window is an external change and is
  now surfaced instead of being overwritten with the believed value.

- Remove dead _expire_pending_writes (the tick expires inline per slot).

The over-loose User Code CC fallback path (set returns CONFIRMED even when
the V1 verify poll can't confirm) is documented as a known, pre-existing
(non-regression) limitation: that path confirms inline via the poll rather
than via a later push, so wiring it into the lifecycle needs it to own its
pending bookkeeping -- deferred, since the path is temporary.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 81b909299405
… a push

Empirical trace through node-zwave-js + zwave-js-server + the python client
shows a masked-accept write (the #1251 case, our only WriteResult.OPTIMISTIC
producer) emits NO confirming credential event: in
feature-apis/AccessControl.ts setCredential, node.emit("credential
added/modified") is gated on `if (succeeded)`, and `succeeded` is the driver's
own `verified.userCode === codeData` check -- the exact check that fails on a
masked lock. The event and the Error_Unknown return are mutually exclusive, so
there is nothing to wait for and nothing to clobber.

That made the lifecycle wait for a push that never arrives. The only backstop
was the hourly drift refresh, but the breaker suspends at 3 failures / 5-min
window (~3 min), so every masked-accept lock was being suspended -- a
regression from 3.x, which trusted the masked write.

Fix: the seam's OPTIMISTIC branch now calls
coordinator.async_confirm_pending_writes() -- an on-demand hard read right
after recording the optimistic write. LCM projects masked codes to
unreadable() (present), so _apply_read confirms the slot where the driver's
verify could not; a genuinely-absent slot stays pending and re-syncs on the
next tick. This is order-independent (depends on no event) and non-fatal on a
read failure (no backoff; the tick/TTL reconciles).

Tests: 5 new (coordinator confirm: present-masked/absent/read-failure/no-op;
seam: OPTIMISTIC drives the confirm read). Full suite 1269 passing; prek clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1a64c8fdb5ac
@raman325 raman325 changed the title feat: verified-credential lifecycle (push-as-commit) — close the silent-failure window feat: verified-credential lifecycle — close the silent-failure window Jun 14, 2026
@raman325 raman325 added the lcm-minor Minor version bump label Jun 15, 2026
…ycle

Code-review follow-up:

- Over-restrictive (must-fix): a stale optimistic write blocked reconciliation
  of a changed desired state. The PENDING_CONFIRMATION tick guard keyed only on
  the pending entry + deadline, so if the user edited the PIN or disabled the
  slot while a write was outstanding, the slot held PENDING_CONFIRMATION (and
  re-synced the OLD value) for up to the 60s TTL, then charged a spurious
  breaker failure. The guard now drops a pending entry that no longer matches
  the desired state and re-syncs the new target without charging the breaker. A
  clear also pops any pending set up front (a clear supersedes it).

- The on-demand confirm read could escape the set seam: it caught only
  LockCodeManagerError, so a non-LCM error (or a _normalize_keys failure) would
  propagate into the seam and suspend the slot, violating the "best-effort,
  never fatal" contract. It now also swallows unexpected errors (logged) and
  leaves the pending write for the tick to reconcile.

- _apply_read now mirrors _confirm_slot's readable-mismatch guard: a genuine
  read observing a *readable* code different from a pending write takes the
  observation (an external change) instead of masking it with the believed
  value -- otherwise a drift refresh, whose purpose is to surface out-of-band
  changes, would silently overwrite one.

Full suite green (1273); prek clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b2aa9ea43c2e
@raman325 raman325 merged commit 89841f4 into main Jun 15, 2026
19 checks passed
@raman325 raman325 deleted the feat/phase2-verified-credential-lifecycle branch June 15, 2026 03:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working javascript Pull requests that update javascript code lcm-minor Minor version bump python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants