fix: unsynchronized reads of OSIdentityModel state (jwt) #1665
fix unsynchronized reads of OSIdentityModel state (jwtBearerToken in particular)
Two follow-on JWT concurrency issues exposed while reviewing the prior fix.
1. OSIdentityModelRepo.updateJwtToken fired the model's change notifier synchronously (→ onModelUpdated → onJwtTokenChanged → executor listeners) while still holding the repo's NSLock. Nothing re-enters the repo lock today, so it avoids a deadlock only by luck, but it's a trap for any future listener. The fix collects matching models under the lock and mutates them outside, so the notifier fires lock-free.
2. invalidateJwtForExternalId had a TOCTOU between its "is it already invalid?" read and the "set to invalid" write. A concurrent valid-token write landing between them would be overwritten with INVALID and trigger a needless re-auth. The transition is now an atomic compare-and-set on the model (invalidateJwtBearerToken); only the thread that wins the transition calls fireJwtExpired.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
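The first fix (collect under the lock, mutate outside it) can be illustrated with a minimal sketch. SketchModel, SketchRepo, and Recorder are hypothetical stand-ins invented for this example, not the SDK's classes; the only point is that a listener which re-enters the repo lock completes, because the repo lock is no longer held when the notifier fires.

```swift
import Foundation

// Hypothetical stand-in for an identity model (not the SDK's OSIdentityModel).
final class SketchModel {
    let externalId: String
    private let lock = NSLock()
    private var token: String?
    // Listener invoked after each token change; it may take locks of its own.
    var onChange: ((String?) -> Void)?

    init(externalId: String) {
        self.externalId = externalId
    }

    func setToken(_ newValue: String?) {
        lock.lock()
        token = newValue
        lock.unlock()
        onChange?(newValue) // fired with no lock held
    }
}

// Hypothetical stand-in for the model repo.
final class SketchRepo {
    private let lock = NSLock() // non-recursive, like the repo lock described above
    private var models: [String: SketchModel] = [:]

    func add(_ model: SketchModel, forKey key: String) {
        lock.lock()
        defer { lock.unlock() }
        models[key] = model
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return models.count
    }

    func updateToken(externalId: String, token: String) {
        // 1. Snapshot the matching models while holding the repo lock.
        lock.lock()
        let matches = models.values.filter { $0.externalId == externalId }
        lock.unlock()
        // 2. Mutate outside the lock, so listeners never run under it.
        for model in matches {
            model.setToken(token)
        }
    }
}

// Small helper that records what the listener observed.
final class Recorder {
    var tokens: [String?] = []
    var repoSizes: [Int] = []
}

let repo = SketchRepo()
let model = SketchModel(externalId: "user-1")
let recorder = Recorder()
model.onChange = { value in
    recorder.repoSizes.append(repo.count) // re-enters the repo lock
    recorder.tokens.append(value)
}
repo.add(model, forKey: "primary")
repo.updateToken(externalId: "user-1", token: "abc")
```

Had updateToken called setToken while still holding the non-recursive NSLock, the repo.count call inside the listener would block forever on the same thread.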
This PR touches JWT/auth code with subtle concurrency reasoning (lock consolidation, TOCTOU fix, notifier-outside-lock pattern). No bugs were found, but the security-sensitive nature warrants a human review.
Overview
The PR consolidates aliasesLock into a single lock that now also guards jwtBearerToken, converts jwtBearerToken into a computed property backed by a locked private field, adds two new APIs (getValidJwt() for atomic read+validate and invalidateJwtBearerToken() for atomic compare-and-set), and updates all call sites in OneSignalUserManagerImpl and OSUserRequest to use them. OSIdentityModelRepo.updateJwtToken is restructured to snapshot matching models under the repo lock and only call the setter outside the lock to avoid deadlock against the change-notifier → onJwtTokenChanged chain.
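As a rough sketch of the shape described in this overview (a computed property over a locked private field, plus an atomic read-and-validate accessor), consider the following. SketchIdentity, tokenLocked, getValidToken, and the sentinel value are assumptions made for illustration; the PR text does not show the real OS_JWT_TOKEN_INVALID value or the exact validity rule, so "valid" here simply means non-nil and not the sentinel.

```swift
import Foundation

// Assumed sentinel; the real OS_JWT_TOKEN_INVALID value is not shown in the PR text.
let sketchInvalidToken = "INVALID"

// Hypothetical stand-in for the identity model.
final class SketchIdentity {
    private let lock = NSRecursiveLock()
    // Only read/write under `lock`.
    private var tokenLocked: String?
    // Simplification: the listener itself is not synchronized in this sketch.
    var onTokenChanged: ((String?) -> Void)?

    var token: String? {
        get {
            lock.lock()
            defer { lock.unlock() }
            return tokenLocked
        }
        set {
            lock.lock()
            tokenLocked = newValue
            lock.unlock()
            // Notify after releasing the lock so listeners can take other locks.
            onTokenChanged?(newValue)
        }
    }

    // One locked read, then validate the snapshot. Callers use the returned
    // value directly instead of checking validity and reading again.
    func getValidToken() -> String? {
        lock.lock()
        let snapshot = tokenLocked
        lock.unlock()
        guard let value = snapshot, value != sketchInvalidToken else {
            return nil
        }
        return value
    }
}

let identity = SketchIdentity()
identity.token = "abc"
let header = identity.getValidToken().map { "Bearer \($0)" }
identity.token = sketchInvalidToken
let headerAfterInvalidation = identity.getValidToken().map { "Bearer \($0)" }
```

Returning the validated value rather than a Bool means a caller cannot observe one token during the check and a different one when building the header.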
Security risks
This touches authentication-adjacent code (JWT bearer token lifecycle). The risks are less about injection or exposure than about correctness: if locking is wrong, a stale or already-invalidated token could be used on a request, or invalidateJwtForExternalId could double-fire or fail to fire the JWT-invalidated listener. The new getValidJwt() correctly snapshots once, and invalidateJwtBearerToken() does the check-and-set atomically and only returns true on actual transition, which fixes the prior TOCTOU between jwtBearerToken != OS_JWT_TOKEN_INVALID and the subsequent write. The setter intentionally invokes self.set(property:...) outside the lock; the inline comment explains this is to avoid deadlocks with listeners that take other locks, which matches the deadlock pattern that updateJwtToken was also restructured to avoid.
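The compare-and-set behaviour discussed here can be sketched as follows. SketchJwtHolder, EventCounter, and the sentinel value are hypothetical names for this illustration, and treating any token other than the sentinel (including nil) as eligible for the transition is an assumption, not something the PR text states.

```swift
import Foundation

// Assumed sentinel standing in for OS_JWT_TOKEN_INVALID.
let sketchInvalid = "INVALID"

// Hypothetical token holder with an atomic invalidate.
final class SketchJwtHolder {
    private let lock = NSLock()
    private var token: String?

    init(token: String?) {
        self.token = token
    }

    // Check and write happen under one lock acquisition, so no other write
    // can land between them. Returns true only for the caller that performs
    // the transition to the invalid sentinel.
    func invalidate() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if token == sketchInvalid {
            return false
        }
        token = sketchInvalid
        return true
    }
}

// Thread-safe counter standing in for "fire the expired event".
final class EventCounter {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        value += 1
        lock.unlock()
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let holder = SketchJwtHolder(token: "abc")
let expiredEvents = EventCounter()

// Eight racing invalidations; only the winner reports the event.
DispatchQueue.concurrentPerform(iterations: 8) { _ in
    if holder.invalidate() {
        expiredEvents.increment()
    }
}
```

However the eight calls interleave, exactly one of them sees a non-invalid token under the lock, so the event counter ends at one.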
Level of scrutiny
High. Concurrency + auth code changes deserve careful human eyes even when the diff is small. The reasoning is sound on inspection, but assessing reentrancy and deadlock correctness fully requires tracing every listener attached to the identity model's changeNotifier (including OSIdentityModelRepo.onModelUpdated → jwtConfig.onJwtTokenChanged) and confirming none take a lock that any caller of the setter already holds. The PR description is also the unfilled template (no motivation/scope/testing notes), and the affected-code and testing checklists are unchecked, which makes it harder to assess intended scope and test coverage from the PR alone.
Other factors
No bugs were found by the bug hunting system and the changes look internally consistent. However, the title ("identity verification crashes") implies this is a crash fix that should be validated against the original crash repro; that's something a human owner of this area is best positioned to confirm.
Description
One Line Summary
Fix production crashes on Identity Verification beta caused by unsynchronized access to OSIdentityModel.jwtBearerToken and two adjacent JWT-lifecycle hazards.
Details
Motivation
8 production crashes reported
Two adjacent issues were exposed while fixing the first and are addressed in the same PR:
- OSIdentityModelRepo.updateJwtToken fired the model's change notifier (→ onModelUpdated → onJwtTokenChanged → executor listeners) synchronously while holding the repo's NSLock. Nothing re-enters today, so it avoids a deadlock only by luck, but it's a trap for any future listener.
- invalidateJwtForExternalId had a TOCTOU between its "already invalid?" read and the "set to invalid" write. A concurrent valid-token write landing between them would be overwritten with OS_JWT_TOKEN_INVALID and trigger a needless re-auth.
Scope
Affects the JWT (Identity Verification) hot paths only: how OSIdentityModel.jwtBearerToken is read/written, how the model repo propagates token updates to listeners, and how invalidation transitions. No public API changes. Network payloads, request shapes, and persisted UserDefaults schema are unchanged (encode/decode continues to write/read the same OS_JWT_BEARER_TOKEN key, just through a renamed private backing storage).
What changed
- jwtBearerToken is backed by a private jwtBearerTokenLocked (commented "only read/write under self.lock") with a locked getter and a locked setter. The change notifier fires outside the lock: NSRecursiveLock only saves us from same-thread re-entry, so firing the notifier under the lock would be a deadlock waiting to happen as soon as a listener grabs another lock. encode(with:) and init?(coder:) access the backing storage directly (already inside the lock / no race during init).
- isJwtValid() -> Bool is replaced by getValidJwt() -> String?. It snapshots once and returns the usable value, eliminating the read-then-check race callers were prone to. addJWTHeaderIsValid and getFullPushHeader updated to consume it.
- OSIdentityModelRepo.updateJwtToken now snapshots matching models under the repo lock and mutates them outside, so the model's change notifier fires lock-free.
- OSIdentityModel.invalidateJwtBearerToken() performs an atomic compare-and-set to OS_JWT_TOKEN_INVALID and returns whether the transition occurred. invalidateJwtForExternalId only fires fireJwtExpired when it wins the transition.
Testing
Unit testing
Existing OneSignalUserTests continue to pass; they exercise the JWT setter, invalidation, and re-queue paths covered by this change. No new unit tests added in this PR; a TSan stress test that races login(externalId:token:) against executor flushing is planned as a follow-up.
Manual testing
- login(externalId, token:) on main + addTag bursts on a background queue against a JWT-enabled app: no crashes, no spurious JwtExpired events.
- updateRequestQueue in UserDefaults: decode succeeds and queued requests drain normally (encode/decode now uses jwtBearerTokenLocked directly).
- login with same external ID → property update burst) on the dev app; no reproduction of the 8 crash signatures.
Affected code checklist
Checklist
Overview
Testing
Final pass