Extract pool blocking-period error state into BlockingPeriodErrorState#4395
Extract pool blocking-period error state into BlockingPeriodErrorState#4395mdaigle wants to merge 6 commits into
Conversation
Move the connection pool's blocking-period (error backoff) logic out of WaitHandleDbConnectionPool into a reusable, testable BlockingPeriodErrorState class with exponential backoff (5s..60s), cached-exception fast-fail, and an injectable TimeProvider for deterministic tests. Add DbConnectionPoolGroup.IsBlockingPeriodEnabled() and the ADP.UnsafeCreateTimer(TimeProvider,...) overload it depends on. Includes BlockingPeriodErrorStateTest.
There was a problem hiding this comment.
Pull request overview
Refactors the legacy wait-handle connection pool’s blocking-period (error backoff) logic into a dedicated BlockingPeriodErrorState component, adds a pool-group helper for determining when blocking is enabled, and introduces a TimeProvider-based timer factory to enable deterministic unit testing.
Changes:
- Added
BlockingPeriodErrorState(cached exception + exponential backoff + exit timer) and corresponding unit tests usingFakeTimeProvider. - Refactored
WaitHandleDbConnectionPoolto delegate blocking-period logic to the new state object and moved the blocking-period enablement decision intoDbConnectionPoolGroup. - Added an
ADP.UnsafeCreateTimer(TimeProvider, ...)overload returningITimerto supportTimeProvider-driven timers.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/tests/UnitTests/ConnectionPool/BlockingPeriodErrorStateTest.cs | Adds comprehensive unit tests for the new blocking-period state logic using FakeTimeProvider. |
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/WaitHandleDbConnectionPool.cs | Replaces inlined error/backoff fields and timer logic with BlockingPeriodErrorState. |
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/DbConnectionPoolGroup.cs | Centralizes the PoolBlockingPeriod decision into IsBlockingPeriodEnabled(). |
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/BlockingPeriodErrorState.cs | Introduces the new reusable blocking-period error state implementation. |
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/Common/AdapterUtil.cs | Adds a TimeProvider-aware UnsafeCreateTimer overload returning ITimer. |
| newTimer = ADP.UnsafeCreateTimer( | ||
| _timeProvider, | ||
| ExitCallback, | ||
| null, | ||
| Timeout.InfiniteTimeSpan, |
| using System; | ||
| using System.Threading; | ||
| using Microsoft.Extensions.Time.Testing; | ||
| using Microsoft.Data.SqlClient.ConnectionPool; | ||
| using Xunit; |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4395 +/- ##
==========================================
- Coverage 65.32% 64.01% -1.32%
==========================================
Files 285 282 -3
Lines 43373 66702 +23329
==========================================
+ Hits 28335 42698 +14363
- Misses 15038 24004 +8966
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| // so callers can inject a test double for deterministic scheduling. | ||
| if (ExecutionContext.IsFlowSuppressed()) | ||
| { | ||
| return timeProvider.CreateTimer(callback, state, dueTime, period); |
There was a problem hiding this comment.
Creating a timer via a TimeProvider allows for faking time in unit tests. I chose not to change existing overloads to feed into this one because there are many spots already use them and it would be a more extensive refactor.
| // instead obtained creation mutex | ||
|
|
||
| DbConnectionInternal obj = null; | ||
| if (ErrorOccurred) |
There was a problem hiding this comment.
There was a check/act race condition here. The new class uses a local variable to check and throw.
| SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, Errors are set.", Id); | ||
| Interlocked.Decrement(ref _waitCount); | ||
| throw TryCloneCachedException(); | ||
| _errorState.ThrowIfActive(); |
There was a problem hiding this comment.
There is a chance this may not throw, in which case, loop back around to check available wait handles.
|
|
||
| // Reset the error wait: | ||
| _errorWait = ERROR_WAIT_DEFAULT; | ||
| // A successful creation clears any prior error state and resets backoff. |
There was a problem hiding this comment.
This behavior was missing before. If we somehow successfully create a connection while in the blocking period (via some race condition with checking the error state), then we should reset the whole error state including the ManualResetEvent and the cached exception, not just the wait period. There's no need to wait until the timer fires to allow more creates if we know we're succeeding now.
| newObj = null; // set to null, so we do not return bad new object | ||
|
|
||
| // Failed to create instance | ||
| _resError = e; |
There was a problem hiding this comment.
Lots of race conditions possible here. I decided to move this all under a lock so that all of the relevant values are updated atomically.
|
|
||
| // Clone SqlExceptions so stack traces are not shared across callers; other | ||
| // exception types are rethrown as-is. | ||
| throw cached is SqlException sqlEx ? sqlEx.InternalClone() : cached; |
There was a problem hiding this comment.
This was the existing cloning logic
| internal static ITimer UnsafeCreateTimer( | ||
| TimeProvider timeProvider, | ||
| TimerCallback callback, | ||
| object state, | ||
| TimeSpan dueTime, | ||
| TimeSpan period) |
There was a problem hiding this comment.
Agreed - let's put this new code inside a nullable block.
| _timeProvider, | ||
| ExitCallback, | ||
| null, | ||
| wait, | ||
| Timeout.InfiniteTimeSpan); |
There was a problem hiding this comment.
Let's just make it impossible for our ExitCallback() to fail to complete its mission (i.e. swallow failing Invoke and Dispose calls).
There was a problem hiding this comment.
Copilot is right here. The most important thing that the ExitCallback does is run the onExit action. At the same time, I don't know if we can really be responsible for gracefully handling aborted threads. We have no way to guarantee consistency in those situations.
There was a problem hiding this comment.
I'll swallow any exception in the callback since that runs in a threadpool thread. But other exceptions should bubble up.
| /// <summary> | ||
| /// Verifies that the onEnter callback is invoked after the internal lock is released. | ||
| /// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> and calls | ||
| /// <see cref="BlockingPeriodErrorState.ThrowIfActive"/> — operations that are safe only | ||
| /// when the lock is not held. If the implementation were changed to hold the lock during | ||
| /// the callback invocation, any re-entrant call from the callback that tries to acquire the | ||
| /// same lock (on a non-re-entrant lock) would deadlock. | ||
| /// </summary> | ||
| [Fact] | ||
| public void OnEnter_CalledOutsideLock_CanCallBackIntoState() | ||
| { |
| /// <summary> | ||
| /// Verifies that the onExit callback is invoked after the internal lock is released. | ||
| /// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> — confirming it | ||
| /// observes the cleared state — and calls <see cref="BlockingPeriodErrorState.ThrowIfActive"/> | ||
| /// without deadlocking. If the implementation were changed to hold the lock during the | ||
| /// callback, any re-entrant call from the callback would deadlock on a non-re-entrant lock. | ||
| /// </summary> | ||
| [Fact] | ||
| public void OnExit_CalledOutsideLock_CanCallBackIntoState() | ||
| { | ||
| // Arrange |
| /// error state so that any waiting callers do not observe a stale exception after | ||
| /// the owning pool is torn down. | ||
| /// </summary> | ||
| public void Dispose() |
There was a problem hiding this comment.
I like to put Dispose() (and the finalizer if it exists) beside the constructors since it's essentially the opposite operation, and it's nice to be able to see both together. And I try to remember to use a #region Construction. Just a style suggestion.
| _timeProvider, | ||
| ExitCallback, | ||
| null, | ||
| wait, | ||
| Timeout.InfiniteTimeSpan); |
There was a problem hiding this comment.
Let's just make it impossible for our ExitCallback() to fail to complete its mission (i.e. swallow failing Invoke and Dispose calls).
| // (onEnter/onExit) can never diverge from the internal state transitions under | ||
| // concurrent Enter/Clear/exit-timer activity. The callbacks are expected to be | ||
| // cheap, non-reentrant operations. | ||
| _onEnter?.Invoke(); |
There was a problem hiding this comment.
Guard against misbehaving callbacks and Dispose() with try/catch that swallows. Same for Clear(), ExitCallback(), and our Dispose().
| /// Clears the cached error state, disposes the exit timer, and resets the backoff to | ||
| /// its initial value. | ||
| /// </summary> | ||
| internal void Clear() |
There was a problem hiding this comment.
There's definitely some overlap between Clear, ExitCallback, and Dispose. I'm having trouble understanding why they aren't all the same. It seems like _cachedException, _inElevatedState, _exitTimer, and _nextWait could all live/die together in a helper class.
| // (onEnter/onExit) can never diverge from the internal state transitions under | ||
| // concurrent Enter/Clear/exit-timer activity. The callbacks are expected to be | ||
| // cheap, non-reentrant operations. | ||
| _onEnter?.Invoke(); |
| lock (_lock) | ||
| { | ||
| _cachedException = null; | ||
| _nextWait = InitialWait; | ||
| _inElevatedState = false; | ||
| oldTimer = _exitTimer; | ||
| _exitTimer = null; | ||
|
|
||
| // Signal and trace under the lock so the exit signal is ordered consistently | ||
| // with the state transition relative to concurrent Enter/exit-timer callbacks. | ||
| _onExit?.Invoke(); | ||
| } |
| /// <summary> | ||
| /// Creates an <see cref="ITimer"/> using the supplied <see cref="TimeProvider"/> without | ||
| /// capturing the current <see cref="ExecutionContext"/>. This overload accepts a | ||
| /// parameterless <see cref="Action"/> for callbacks that do not require state. | ||
| /// </summary> | ||
| /// <param name="timeProvider">The time provider used to create the timer.</param> | ||
| /// <param name="callback">The parameterless delegate invoked when the timer fires.</param> | ||
| /// <param name="state"></param> | ||
| /// <param name="dueTime">The amount of time to wait before the first invocation, or | ||
| /// <see cref="Timeout.InfiniteTimeSpan"/> to create the timer disarmed.</param> | ||
| /// <param name="period">The interval between invocations, or | ||
| /// <see cref="Timeout.InfiniteTimeSpan"/> to disable periodic signaling.</param> | ||
| /// <returns>An <see cref="ITimer"/> created by <paramref name="timeProvider"/>.</returns> |
| /// <summary> | ||
| /// Verifies that the onEnter callback is invoked after the internal lock is released. | ||
| /// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> and calls | ||
| /// <see cref="BlockingPeriodErrorState.ThrowIfActive"/> — operations that are safe only | ||
| /// when the lock is not held. If the implementation were changed to hold the lock during | ||
| /// the callback invocation, any re-entrant call from the callback that tries to acquire the | ||
| /// same lock (on a non-re-entrant lock) would deadlock. | ||
| /// </summary> |
| /// <summary> | ||
| /// Verifies that the onExit callback is invoked after the internal lock is released. | ||
| /// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> — confirming it | ||
| /// observes the cleared state — and calls <see cref="BlockingPeriodErrorState.ThrowIfActive"/> | ||
| /// without deadlocking. If the implementation were changed to hold the lock during the | ||
| /// callback, any re-entrant call from the callback would deadlock on a non-re-entrant lock. | ||
| /// </summary> |
paulmedynski
left a comment
There was a problem hiding this comment.
Latest changes look good. Waiting for further offline discussion on my earlier comments.
| /// <param name="period">The interval between invocations, or | ||
| /// <see cref="Timeout.InfiniteTimeSpan"/> to disable periodic signaling.</param> | ||
| /// <returns>An <see cref="ITimer"/> created by <paramref name="timeProvider"/>.</returns> | ||
| // TODO: Route the other UnsafeCreateTimer overloads through this method (passing |
There was a problem hiding this comment.
We can likely route all other UnsafeCreateTimer methods through this one (and probably remove some of the overloads due to type inference). But let's save that for another PR to avoid cluttering this one up.
| dotnet_diagnostic.xUnit1030.severity=none | ||
|
|
||
| # Disables warning for unnamed enum case values | ||
| # e.g. providing an invalid int value for an enum that wraps int |
There was a problem hiding this comment.
Is this something we only want suppressed in tests? When would legit driver code want to fake an enum value? I suppose we may have some existing debt in this area.
Summary
This is Part 1 of 3 splitting #4376 ("Dev/mdaigle/pool rate limit") into stacked, reviewable PRs.
This PR extracts the connection pool's blocking-period (error backoff) logic out of
WaitHandleDbConnectionPoolinto a reusable, testableBlockingPeriodErrorStateclass. This keeps the pool's connection-acquisition path focused on capacity/queue concerns. Part 2 will use the new class to implement blocking period support in the ChannelDbConnectionPool.The pool blocking period follows this state machine:

The pool calls
Enter()any time blocking is enabled and an error occurs while opening a connection. This puts the pool in the blocked state and blocks future requests from attempting an open until the period ends. Subsequent errors double the blocking period (up to a 60s max). Successful connections reset the error state and blocking period.To show that nothing changed for the existing pool, I added new unit tests and applied them selectively in a separate branch. If these pass, then the refactors have not impacted behavior: #4411
Changes
BlockingPeriodErrorState— encapsulates cached-exception fast-fail, exponential backoff (5s → 60s cap), exit timer, and synchronization. Takes an injectableTimeProviderso timer scheduling is deterministic in tests.WaitHandleDbConnectionPool— refactored to useBlockingPeriodErrorStateinstead of inline error-state fields/logic.DbConnectionPoolGroup.IsBlockingPeriodEnabled()— new helper centralizing thePoolBlockingPerioddecision (Auto→ not an Azure SQL endpoint,AlwaysBlock→ true,NeverBlock→ false). Consumed by Part 2.ADP.UnsafeCreateTimer(TimeProvider, ...)— new overload returningITimerthat suppressesExecutionContextflow while honoring the injectedTimeProvider; required byBlockingPeriodErrorState.BlockingPeriodErrorStateTest— unit tests usingFakeTimeProvider(29 tests).Stacking
mainChecklist
FakeTimeProvider)