Skip to content

fix: unsynchronized reads of OSIdentityModel state (jwt)#1665

Open
nan-li wants to merge 3 commits into
identity_verification_rebase_on_5.5.1from
fix/identity_verification_crashes
Open

fix: unsynchronized reads of OSIdentityModel state (jwt)#1665
nan-li wants to merge 3 commits into
identity_verification_rebase_on_5.5.1from
fix/identity_verification_crashes

Conversation

@nan-li
Copy link
Copy Markdown
Contributor

@nan-li nan-li commented May 28, 2026

Description

One Line Summary

Fix production crashes on Identity Verification beta caused by unsynchronized access to OSIdentityModel.jwtBearerToken and two adjacent JWT-lifecycle hazards.

Details

Motivation

8 production crashes reported

Two adjacent issues were exposed while fixing the first and are addressed in the same PR:

  • OSIdentityModelRepo.updateJwtToken fired the model's change notifier (→ onModelUpdatedonJwtTokenChanged → executor listeners) synchronously while holding the repo's NSLock. Nothing re-enters today, so it deadlocks-by-luck, but it's a trap for any future listener.
  • invalidateJwtForExternalId had a TOCTOU between its "already invalid?" read and the "set to invalid" write. A concurrent valid-token write landing between them would be overwritten with OS_JWT_TOKEN_INVALID and trigger a needless re-auth.

Scope

Affects the JWT (Identity Verification) hot paths only: how OSIdentityModel.jwtBearerToken is read/written, how the model repo propagates token updates to listeners, and how invalidation transitions. No public API changes. Network payloads, request shapes, and persisted UserDefaults schema are unchanged (encode/decode continues to write/read the same OS_JWT_BEARER_TOKEN key — just through a renamed private backing storage).

What changed

  1. jwtBearerToken is backed by a private jwtBearerTokenLocked (commented "only read/write under self.lock") with a locked getter and a locked setter. The change notifier fires outside the lock — NSRecursiveLock only saves us from same-thread re-entry, so firing the notifier under the lock would be a deadlock waiting to happen as soon as a listener grabs another lock. encode(with:) and init?(coder:) access the backing storage directly (already inside the lock / no race during init).
  2. isJwtValid() -> BoolgetValidJwt() -> String?. Snapshots once and returns the usable value, eliminating the read-then-check race callers were prone to. addJWTHeaderIsValid and getFullPushHeader updated to consume it.
  3. OSIdentityModelRepo.updateJwtToken now snapshots matching models under the repo lock and mutates them outside, so the model's change notifier fires lock-free.
  4. New OSIdentityModel.invalidateJwtBearerToken() performs an atomic compare-and-set to OS_JWT_TOKEN_INVALID and returns whether the transition occurred. invalidateJwtForExternalId only fires fireJwtExpired when it wins the transition.

Testing

Unit testing

Existing OneSignalUserTests continue to pass; they exercise the JWT setter, invalidation, and re-queue paths covered by this change. No new unit tests added in this PR — a TSan stress test that races login(externalId:token:) against executor flushing is planned as a follow-up.

Manual testing

  • Tight loop of login(externalId, token:) on main + addTag bursts on a background queue against a JWT-enabled app: no crashes, no spurious JwtExpired events.
  • Cold start with cached updateRequestQueue in UserDefaults: decode succeeds and queued requests drain normally (encode/decode now uses jwtBearerTokenLocked directly).
  • Manually replayed the customer's flow shape (token rotation → login with same external ID → property update burst) on the dev app; no reproduction of the 8 crash signatures.

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
    • If it is hard to explain how any codes changes are related to each other then it most likely needs to be more than one PR
  • Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible
  • I have personally tested this on my device, or explained why that is not possible

Final pass

  • Code is as readable as possible.
    • Simplify with less code, followed by splitting up code into well named functions and variables, followed by adding comments to the code.
  • I have reviewed this PR myself, ensuring it meets each checklist item
    • WIP (Work In Progress) is ok, but explain what is still in progress and what you would like feedback on. Start the PR title with "WIP" to indicate this.

nan-li and others added 3 commits May 28, 2026 10:46
fix unsynchronized reads of OSIdentityModel state (jwtBearerToken in particular)
Two follow-on JWT concurrency issues exposed while reviewing the prior fix.

1. OSIdentityModelRepo.updateJwtToken fired the model's change notifier
   synchronously (→ onModelUpdated → onJwtTokenChanged → executor listeners)
   while still holding the repo's NSLock. Today nothing re-enters the repo
   lock so it doesn't deadlock by luck, but it's a trap for any future
   listener. The fix collects matching models under the lock and mutates
   them outside, so the notifier fires lock-free.

2. invalidateJwtForExternalId had a TOCTOU between its "is it already
   invalid?" read and the "set to invalid" write. A concurrent valid-token
   write landing between them would be overwritten with INVALID and trigger
   a needless re-auth. The transition is now an atomic compare-and-set on
   the model (invalidateJwtBearerToken); only the thread that wins the
   transition fires fireJwtExpired.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
renaming a variable and cleaning up comments
@nan-li nan-li changed the title Fix/identity verification crashes fix: unsynchronized reads of OSIdentityModel state (jwt) May 28, 2026
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR touches JWT/auth code with subtle concurrency reasoning (lock consolidation, TOCTOU fix, notifier-outside-lock pattern). No bugs were found, but the security-sensitive nature warrants a human review.

Extended reasoning...

Overview

The PR consolidates aliasesLock into a single lock that now also guards jwtBearerToken, converts jwtBearerToken into a computed property backed by a locked private field, adds two new APIs (getValidJwt() for atomic read+validate and invalidateJwtBearerToken() for atomic compare-and-set), and updates all call sites in OneSignalUserManagerImpl and OSUserRequest to use them. OSIdentityModelRepo.updateJwtToken is restructured to snapshot matching models under the repo lock and only call the setter outside the lock to avoid deadlock against the change-notifier → onJwtTokenChanged chain.

Security risks

This touches authentication-adjacent code (JWT bearer token lifecycle). The risks are not in injection/exposure shape but in correctness: if locking is wrong, a stale or already-invalidated token could be used on a request, or invalidateJwtForExternalId could double-fire / fail to fire the JWT-invalidated listener. The new getValidJwt() correctly snapshots once, and invalidateJwtBearerToken() does the check-and-set atomically and only returns true on actual transition, which fixes the prior TOCTOU between jwtBearerToken != OS_JWT_TOKEN_INVALID and the subsequent write. The setter intentionally invokes self.set(property:...) outside the lock — the inline comment explains this is to avoid deadlocks with listeners that take other locks, which matches the deadlock pattern that updateJwtToken was also restructured to avoid.

Level of scrutiny

High. Concurrency + auth code changes deserve careful human eyes even when the diff is small. The reasoning is sound on inspection, but assessing reentrancy and deadlock correctness fully requires tracing every listener attached to the identity models changeNotifier (including OSIdentityModelRepo.onModelUpdatedjwtConfig.onJwtTokenChanged) and confirming none take a lock that any caller of the setter already holds. The PR description is also the unfilled template (no motivation/scope/testing notes), and the affected-code and testing checklists are unchecked, which makes it harder to assess intended scope and test coverage from the PR alone.\n\n### Other factors\nNo bugs were found by the bug hunting system and the changes look internally consistent. However, the title ("identity verification crashes") implies this is a crash fix that should be validated against the original crash repro — thats something a human owner of this area is best positioned to confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant