fix(lockservice): fence stale binds by allocator epoch by iamlinjunhong · Pull Request #24906 · matrixorigin/matrixone

iamlinjunhong · 2026-06-09T10:55:37Z

What type of PR is this?

Which issue(s) this PR fixes:

issue #24896

What this PR does / why we need it:

fix(lockservice): fence stale binds by allocator epoch

qodo-code-review · 2026-06-09T10:55:41Z

Qodo reviews are paused for this user.


aptend

Review Result

Approved. I did not find any blocking issues in this PR.

What I Checked

- GetBind and KeepLockTableBind propagate the allocator epoch to CN.
- CN stale-bind purge follows the existing LockTable.Version semantics: cached binds with Version < observed allocator epoch are stale allocator state.
- main-branch bind-change handling was reviewed against the epoch purge path, including waiter cleanup and commit-side bind validation.
- The added tests cover get-bind refresh, keepalive failure/epoch refresh, concurrent observation, remote/proxy bind purge, and waiter cleanup.
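The purge rule in the checklist above can be sketched roughly as follows. This is a simplified illustration, not the lockservice implementation: `bind`, `bindCache`, `put`, and `observe` are hypothetical names, and the handling of epoch zero and regressed epochs is only inferred from the test names listed under Local Verification.

```go
package main

import (
	"fmt"
	"sync"
)

// bind is a hypothetical, simplified stand-in for a cached lock table bind.
type bind struct {
	Table   uint64
	Version uint64 // allocator epoch under which the bind was issued
}

// bindCache is a hypothetical CN-side cache, not the real lockservice type.
type bindCache struct {
	mu    sync.Mutex
	epoch uint64 // highest allocator epoch observed so far
	binds map[uint64]bind
}

func newBindCache() *bindCache {
	return &bindCache{binds: make(map[uint64]bind)}
}

// put caches a bind, replacing any earlier bind for the same table.
func (c *bindCache) put(b bind) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.binds[b.Table] = b
}

// observe records an allocator epoch seen in a response and purges every
// cached bind with Version < epoch. It returns the number of purged binds.
// Assumption: epoch zero (unknown) and epochs that are not newer than the
// last observed one purge nothing.
func (c *bindCache) observe(epoch uint64) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if epoch == 0 || epoch <= c.epoch {
		return 0
	}
	c.epoch = epoch
	purged := 0
	for id, b := range c.binds {
		if b.Version < epoch {
			delete(c.binds, id) // deleting during range is safe in Go
			purged++
		}
	}
	return purged
}

func main() {
	c := newBindCache()
	c.put(bind{Table: 1, Version: 3})
	c.put(bind{Table: 2, Version: 5})
	fmt.Println("purged at epoch 5:", c.observe(5))
	fmt.Println("purged at regressed epoch 4:", c.observe(4))
}
```

In this sketch a bind whose Version equals the observed epoch is kept, which matches the "same epoch" case named in the test list; whether the real code tracks the observed epoch exactly this way is an assumption.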

Local Verification

git diff --check origin/main...HEAD
go test ./pkg/lockservice -run 'Test(CommitDetectsStaleLocalBindAfterAllocatorRestart|AllocatorVersionZeroKeepsLocalBinds|AllocatorObserverDoesNotPurgeSameEpochBindVersions|GetBindAllowsRegressedAllocatorVersionWithoutPurging|KeepaliveEpochPurgeKeepsGroupMovePop|AllocatorObserverPurgesRemoteAndProxyLockTables|AllocatorObserverConcurrentKeepaliveAndGetBind|AllocatorObserverCloseWaitersOnStaleLocalBind)$'
timeout 180s go test ./pkg/lockservice

Note: the full go test ./pkg/lockservice hit the 180s local timeout without output. The PR-specific test set passed locally, and CI is green.

XuPeng-SH

I re-checked the latest head. I do not see a new high-confidence correctness bug to block on, and the unit coverage is already much stronger, but with a lockservice fix this concurrency-heavy I still think a few unhappy-path tests are missing before I am comfortable approving.

The main gaps I would still ask to close are:

1. Repeated allocator restarts / superseded-ID accumulation
   The new fencing now relies on tracking superseded allocator identities, but I did not see a focused test that drives 3+ sequential allocator replacements and proves both correctness and bounded behavior over repeated generations.

2. Client-side cancel before the remote lock request is actually sent
   The PR now keeps more remote-lock bookkeeping alive for later cleanup, and there is timeout coverage, but I did not find the specific unhappy path where the caller is canceled before the remote side ever received the lock request. I would like a test that proves the later unlock/close path handles that harmlessly.

3. The narrow race between getLocalLockTable() and taking bindChangeMu.RLock()
   The latest code intentionally adds a second stale-bind recheck around that window. That is exactly the kind of defensive race fix that should be pinned by a test, and I did not find one that explicitly exercises a bind change in that gap.

If those concurrency-edge cases are covered, I am happy to recheck the latest head.
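One way the third gap could be pinned deterministically is sketched below. It is only an illustration of the look-up, lock, recheck pattern: the names `getLocalLockTable` and `bindChangeMu` are borrowed from the review comment, while `service`, `table`, `changeBind`, `lockWithRecheck`, and the `gap` hook are hypothetical and do not describe the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errBindChanged = errors.New("bind changed")

// table is a hypothetical local lock table tagged with its bind version.
type table struct{ version uint64 }

// service is a hypothetical stand-in for the lock service.
type service struct {
	bindChangeMu sync.RWMutex // held for writing while a bind changes
	mu           sync.Mutex   // guards current
	current      *table
}

func (s *service) getLocalLockTable() *table {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.current
}

// changeBind swaps in a new table while holding the write lock.
func (s *service) changeBind(version uint64) {
	s.bindChangeMu.Lock()
	defer s.bindChangeMu.Unlock()
	s.mu.Lock()
	s.current = &table{version: version}
	s.mu.Unlock()
}

// lockWithRecheck looks up the table, then takes the read lock and checks
// that the table is still the current one. gap is a test-only hook that
// runs inside the window between the lookup and RLock.
func (s *service) lockWithRecheck(gap func()) (uint64, error) {
	t := s.getLocalLockTable()
	if gap != nil {
		gap()
	}
	s.bindChangeMu.RLock()
	defer s.bindChangeMu.RUnlock()
	if s.getLocalLockTable() != t {
		return 0, errBindChanged // bind changed in the gap: caller retries
	}
	return t.version, nil
}

func main() {
	s := &service{current: &table{version: 1}}
	_, err := s.lockWithRecheck(func() { s.changeBind(2) })
	fmt.Println("bind change inside the gap:", err)
	v, err := s.lockWithRecheck(nil)
	fmt.Println("after retry:", v, err)
}
```

Injecting the bind change through a hook keeps the test free of sleeps and scheduling luck; the real code may need a different seam to expose the same window.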

iamlinjunhong requested review from XuPeng-SH and aptend as code owners June 9, 2026 10:55

iamlinjunhong temporarily deployed to ci June 9, 2026 10:55 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci June 9, 2026 10:55 — with GitHub Actions Error

iamlinjunhong had a problem deploying to ci June 9, 2026 10:56 — with GitHub Actions Error

matrix-meow added the size/L Denotes a PR that changes [500,999] lines label Jun 9, 2026

mergify Bot temporarily deployed to ci June 9, 2026 10:56 Inactive

mergify Bot temporarily deployed to ci June 9, 2026 10:57 Inactive

aptend approved these changes Jun 9, 2026


iamlinjunhong temporarily deployed to ci June 10, 2026 02:22 — with GitHub Actions Inactive

iamlinjunhong added 3 commits June 10, 2026 15:55

fix(lockservice): fence stale binds by allocator epoch

732f2b5

test(lockservice): cover keepalive stale bind purge

d586e80

fix(lockservice): fence allocator replacements by instance id

542ae65

iamlinjunhong force-pushed the m-24896 branch from c42eef7 to 542ae65 Compare June 10, 2026 07:55

iamlinjunhong temporarily deployed to ci June 10, 2026 07:56 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci June 10, 2026 07:56 — with GitHub Actions Error

iamlinjunhong temporarily deployed to ci June 10, 2026 07:56 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci June 10, 2026 07:56 — with GitHub Actions Failure

iamlinjunhong temporarily deployed to ci June 10, 2026 07:56 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci June 10, 2026 07:56 — with GitHub Actions Error

fix(lockservice): remove unused allocator observer helper

eb7eded

iamlinjunhong temporarily deployed to ci June 10, 2026 08:42 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci June 10, 2026 08:42 — with GitHub Actions Error

iamlinjunhong temporarily deployed to ci June 10, 2026 08:42 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci June 10, 2026 08:42 — with GitHub Actions Error

iamlinjunhong temporarily deployed to ci June 10, 2026 08:42 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci June 10, 2026 08:42 — with GitHub Actions Error

iamlinjunhong temporarily deployed to ci June 10, 2026 08:42 — with GitHub Actions Inactive

iamlinjunhong added 2 commits June 10, 2026 17:16

fix(lockservice): keep remote lock bookkeeping on timeout

b80d70c

fix(lockservice): fence keepalive purge and reject stale allocator

3e1f7b5

XuPeng-SH requested changes Jun 18, 2026


iamlinjunhong wants to merge 6 commits into matrixorigin:main from iamlinjunhong:m-24896

4 participants
