Extract pool blocking-period error state into BlockingPeriodErrorState by mdaigle · Pull Request #4395 · dotnet/SqlClient

mdaigle · 2026-06-23T17:24:37Z

Summary

This is Part 1 of 3 splitting #4376 ("Dev/mdaigle/pool rate limit") into stacked, reviewable PRs.

This PR extracts the connection pool's blocking-period (error backoff) logic out of WaitHandleDbConnectionPool into a reusable, testable BlockingPeriodErrorState class. This keeps the pool's connection-acquisition path focused on capacity/queue concerns. Part 2 will use the new class to implement blocking period support in the ChannelDbConnectionPool.

The pool blocking period follows this state machine:

The pool calls Enter() any time blocking is enabled and an error occurs while opening a connection. This puts the pool in the blocked state and blocks future requests from attempting an open until the period ends. Subsequent errors double the blocking period (up to a 60s max). Successful connections reset the error state and blocking period.

To show that nothing changed for the existing pool, I added new unit tests and applied them selectively in a separate branch. If these pass, then the refactors have not impacted behavior: #4411

Changes

New BlockingPeriodErrorState — encapsulates cached-exception fast-fail, exponential backoff (5s → 60s cap), exit timer, and synchronization. Takes an injectable TimeProvider so timer scheduling is deterministic in tests.
WaitHandleDbConnectionPool — refactored to use BlockingPeriodErrorState instead of inline error-state fields/logic.
DbConnectionPoolGroup.IsBlockingPeriodEnabled() — new helper centralizing the PoolBlockingPeriod decision (Auto → not an Azure SQL endpoint, AlwaysBlock → true, NeverBlock → false). Consumed by Part 2.
ADP.UnsafeCreateTimer(TimeProvider, ...) — new overload returning ITimer that suppresses ExecutionContext flow while honoring the injected TimeProvider; required by BlockingPeriodErrorState.
BlockingPeriodErrorStateTest — unit tests using FakeTimeProvider (29 tests).

Stacking

Part 1 (this PR) → main
Part 2 (channel-pool rate limiting) → this branch
Part 3 (GitHub instructions) → Part 2

Checklist

Tests added (29 passing via FakeTimeProvider)
No public API changes
No breaking changes
Verified against customer repro (n/a — internal refactor)

Move the connection pool's blocking-period (error backoff) logic out of WaitHandleDbConnectionPool into a reusable, testable BlockingPeriodErrorState class with exponential backoff (5s..60s), cached-exception fast-fail, and an injectable TimeProvider for deterministic tests. Add DbConnectionPoolGroup.IsBlockingPeriodEnabled() and the ADP.UnsafeCreateTimer(TimeProvider,...) overload it depends on. Includes BlockingPeriodErrorStateTest.

Copilot

Pull request overview

Refactors the legacy wait-handle connection pool’s blocking-period (error backoff) logic into a dedicated BlockingPeriodErrorState component, adds a pool-group helper for determining when blocking is enabled, and introduces a TimeProvider-based timer factory to enable deterministic unit testing.

Changes:

Added BlockingPeriodErrorState (cached exception + exponential backoff + exit timer) and corresponding unit tests using FakeTimeProvider.
Refactored WaitHandleDbConnectionPool to delegate blocking-period logic to the new state object and moved the blocking-period enablement decision into DbConnectionPoolGroup.
Added an ADP.UnsafeCreateTimer(TimeProvider, ...) overload returning ITimer to support TimeProvider-driven timers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/Microsoft.Data.SqlClient/tests/UnitTests/ConnectionPool/BlockingPeriodErrorStateTest.cs	Adds comprehensive unit tests for the new blocking-period state logic using `FakeTimeProvider`.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/WaitHandleDbConnectionPool.cs	Replaces inlined error/backoff fields and timer logic with `BlockingPeriodErrorState`.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/DbConnectionPoolGroup.cs	Centralizes the `PoolBlockingPeriod` decision into `IsBlockingPeriodEnabled()`.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/BlockingPeriodErrorState.cs	Introduces the new reusable blocking-period error state implementation.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/Common/AdapterUtil.cs	Adds a `TimeProvider`-aware `UnsafeCreateTimer` overload returning `ITimer`.

+                newTimer = ADP.UnsafeCreateTimer(
+                    _timeProvider,
+                    ExitCallback,
+                    null,
+                    Timeout.InfiniteTimeSpan,


+using System;
+using System.Threading;
+using Microsoft.Extensions.Time.Testing;
+using Microsoft.Data.SqlClient.ConnectionPool;
+using Xunit;


codecov · 2026-06-23T20:14:37Z

Codecov Report

❌ Patch coverage is 95.90164% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.01%. Comparing base (44e1b72) to head (5a4ae39).
⚠️ Report is 10 commits behind head on main.

Files with missing lines	Patch %	Lines
...SqlClient/src/Microsoft/Data/Common/AdapterUtil.cs	75.00%	2 Missing ⚠️
...lient/ConnectionPool/WaitHandleDbConnectionPool.cs	88.23%	2 Missing ⚠️
...lClient/ConnectionPool/BlockingPeriodErrorState.cs	98.87%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4395      +/-   ##
==========================================
- Coverage   65.32%   64.01%   -1.32%     
==========================================
  Files         285      282       -3     
  Lines       43373    66702   +23329     
==========================================
+ Hits        28335    42698   +14363     
- Misses      15038    24004    +8966

Flag	Coverage Δ
CI-SqlClient	`?`
PR-SqlClient-Project	`64.01% <95.90%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

mdaigle · 2026-06-23T17:43:51Z

+            // so callers can inject a test double for deterministic scheduling.
+            if (ExecutionContext.IsFlowSuppressed())
+            {
+                return timeProvider.CreateTimer(callback, state, dueTime, period);


Creating a timer via a TimeProvider allows for faking time in unit tests. I chose not to change existing overloads to feed into this one because there are many spots already use them and it would be a more extensive refactor.

mdaigle · 2026-06-23T18:01:58Z

            // instead obtained creation mutex

            DbConnectionInternal obj = null;
-            if (ErrorOccurred)


There was a check/act race condition here. The new class uses a local variable to check and throw.

mdaigle · 2026-06-23T18:03:26Z

                                SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, Errors are set.", Id);
                                Interlocked.Decrement(ref _waitCount);
-                                throw TryCloneCachedException();
+                                _errorState.ThrowIfActive();


There is a chance this may not throw, in which case, loop back around to check available wait handles.

mdaigle · 2026-06-23T18:07:22Z


-                // Reset the error wait:
-                _errorWait = ERROR_WAIT_DEFAULT;
+                // A successful creation clears any prior error state and resets backoff.


This behavior was missing before. If we somehow successfully create a connection while in the blocking period (via some race condition with checking the error state), then we should reset the whole error state including the ManualResetEvent and the cached exception, not just the wait period. There's no need to wait until the timer fires to allow more creates if we know we're succeeding now.

mdaigle · 2026-06-23T18:26:37Z

                newObj = null; // set to null, so we do not return bad new object

-                // Failed to create instance
-                _resError = e;


Lots of race conditions possible here. I decided to move this all under a lock so that all of the relevant values are updated atomically.

mdaigle · 2026-06-23T20:43:22Z

+
+            // Clone SqlExceptions so stack traces are not shared across callers; other
+            // exception types are rethrown as-is.
+            throw cached is SqlException sqlEx ? sqlEx.InternalClone() : cached;


This was the existing cloning logic

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

paulmedynski · 2026-06-24T13:11:32Z

+        internal static ITimer UnsafeCreateTimer(
+            TimeProvider timeProvider,
+            TimerCallback callback,
+            object state,
+            TimeSpan dueTime,
+            TimeSpan period)


Agreed - let's put this new code inside a nullable block.

paulmedynski · 2026-06-24T17:52:44Z

+                    _timeProvider,
+                    ExitCallback,
+                    null,
+                    wait,
+                    Timeout.InfiniteTimeSpan);


Let's just make it impossible for our ExitCallback() to fail to complete its mission (i.e. swallow failing Invoke and Dispose calls).

Copilot is right here. The most important thing that the ExitCallback does is run the onExit action. At the same time, I don't know if we can really be responsible for gracefully handling aborted threads. We have no way to guarantee consistency in those situations.

I'll swallow any exception in the callback since that runs in a threadpool thread. But other exceptions should bubble up.

+        /// <summary>
+        /// Verifies that the onEnter callback is invoked after the internal lock is released.
+        /// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> and calls
+        /// <see cref="BlockingPeriodErrorState.ThrowIfActive"/> — operations that are safe only
+        /// when the lock is not held. If the implementation were changed to hold the lock during
+        /// the callback invocation, any re-entrant call from the callback that tries to acquire the
+        /// same lock (on a non-re-entrant lock) would deadlock.
+        /// </summary>
+        [Fact]
+        public void OnEnter_CalledOutsideLock_CanCallBackIntoState()
+        {


+        /// <summary>
+        /// Verifies that the onExit callback is invoked after the internal lock is released.
+        /// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> — confirming it
+        /// observes the cleared state — and calls <see cref="BlockingPeriodErrorState.ThrowIfActive"/>
+        /// without deadlocking. If the implementation were changed to hold the lock during the
+        /// callback, any re-entrant call from the callback would deadlock on a non-re-entrant lock.
+        /// </summary>
+        [Fact]
+        public void OnExit_CalledOutsideLock_CanCallBackIntoState()
+        {
+            // Arrange


paulmedynski · 2026-06-24T13:36:25Z

+        /// error state so that any waiting callers do not observe a stale exception after
+        /// the owning pool is torn down.
+        /// </summary>
+        public void Dispose()


I like to put Dispose() (and the finalizer if it exists) beside the constructors since it's essentially the opposite operation, and it's nice to be able to see both together. And I try to remember to use a #region Construction. Just a style suggestion.

paulmedynski · 2026-06-24T17:52:44Z

+                    _timeProvider,
+                    ExitCallback,
+                    null,
+                    wait,
+                    Timeout.InfiniteTimeSpan);


Let's just make it impossible for our ExitCallback() to fail to complete its mission (i.e. swallow failing Invoke and Dispose calls).

paulmedynski · 2026-06-24T17:53:17Z

+                // (onEnter/onExit) can never diverge from the internal state transitions under
+                // concurrent Enter/Clear/exit-timer activity. The callbacks are expected to be
+                // cheap, non-reentrant operations.
+                _onEnter?.Invoke();


Guard against misbehaving callbacks and Dispose() with try/catch that swallows. Same for Clear(), ExitCallback(), and our Dispose().

paulmedynski · 2026-06-24T18:00:49Z

+        /// Clears the cached error state, disposes the exit timer, and resets the backoff to
+        /// its initial value.
+        /// </summary>
+        internal void Clear()


There's definitely some overlap between Clear, ExitCallback, and Dispose. I'm having trouble understanding why they aren't all the same. It seems like _cachedException, _inElevatedState, _exitTimer, and _nextWait could all live/die together in a helper class.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

+                // (onEnter/onExit) can never diverge from the internal state transitions under
+                // concurrent Enter/Clear/exit-timer activity. The callbacks are expected to be
+                // cheap, non-reentrant operations.
+                _onEnter?.Invoke();


+            lock (_lock)
+            {
+                _cachedException = null;
+                _nextWait = InitialWait;
+                _inElevatedState = false;
+                oldTimer = _exitTimer;
+                _exitTimer = null;
+
+                // Signal and trace under the lock so the exit signal is ordered consistently
+                // with the state transition relative to concurrent Enter/exit-timer callbacks.
+                _onExit?.Invoke();
+            }


+        /// <summary>
+        /// Creates an <see cref="ITimer"/> using the supplied <see cref="TimeProvider"/> without
+        /// capturing the current <see cref="ExecutionContext"/>. This overload accepts a
+        /// parameterless <see cref="Action"/> for callbacks that do not require state.
+        /// </summary>
+        /// <param name="timeProvider">The time provider used to create the timer.</param>
+        /// <param name="callback">The parameterless delegate invoked when the timer fires.</param>
+        /// <param name="state"></param>
+        /// <param name="dueTime">The amount of time to wait before the first invocation, or
+        /// <see cref="Timeout.InfiniteTimeSpan"/> to create the timer disarmed.</param>
+        /// <param name="period">The interval between invocations, or
+        /// <see cref="Timeout.InfiniteTimeSpan"/> to disable periodic signaling.</param>
+        /// <returns>An <see cref="ITimer"/> created by <paramref name="timeProvider"/>.</returns>


+        /// <summary>
+        /// Verifies that the onEnter callback is invoked after the internal lock is released.
+        /// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> and calls
+        /// <see cref="BlockingPeriodErrorState.ThrowIfActive"/> — operations that are safe only
+        /// when the lock is not held. If the implementation were changed to hold the lock during
+        /// the callback invocation, any re-entrant call from the callback that tries to acquire the
+        /// same lock (on a non-re-entrant lock) would deadlock.
+        /// </summary>


+        /// <summary>
+        /// Verifies that the onExit callback is invoked after the internal lock is released.
+        /// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> — confirming it
+        /// observes the cleared state — and calls <see cref="BlockingPeriodErrorState.ThrowIfActive"/>
+        /// without deadlocking. If the implementation were changed to hold the lock during the
+        /// callback, any re-entrant call from the callback would deadlock on a non-re-entrant lock.
+        /// </summary>


paulmedynski

Latest changes look good. Waiting for further offline discussion on my earlier comments.

paulmedynski · 2026-06-26T12:36:57Z

        /// <param name="period">The interval between invocations, or
        /// <see cref="Timeout.InfiniteTimeSpan"/> to disable periodic signaling.</param>
        /// <returns>An <see cref="ITimer"/> created by <paramref name="timeProvider"/>.</returns>
-        // TODO: Route the other UnsafeCreateTimer overloads through this method (passing


We can likely route all other UnsafeCreateTimer methods through this one (and probably remove some of the overloads due to type inference). But let's save that for another PR to avoid cluttering this one up.

paulmedynski · 2026-06-26T12:41:01Z

 dotnet_diagnostic.xUnit1030.severity=none
+
+# Disables warning for unnamed enum case values
+# e.g. providing an invalid int value for an enum that wraps int


Is this something we only want suppressed in tests? When would legit driver code want to fake an enum value? I suppose we may have some existing debt in this area.

Copilot AI review requested due to automatic review settings June 23, 2026 17:24

github-project-automation Bot added this to SqlClient Board Jun 23, 2026

github-project-automation Bot moved this to To triage in SqlClient Board Jun 23, 2026

Copilot started reviewing on behalf of mdaigle June 23, 2026 17:25 View session

Copilot AI reviewed Jun 23, 2026

View reviewed changes

This was referenced Jun 23, 2026

Add connection-creation rate limiting to ChannelDbConnectionPool #4396

Draft

Document pool rate limiting in Copilot instructions #4397

Draft

mdaigle added 2 commits June 23, 2026 10:53

Add comments

2a4a1f4

Improve locking

3517c19

mdaigle commented Jun 23, 2026

View reviewed changes

mdaigle marked this pull request as ready for review June 23, 2026 20:45

Copilot AI review requested due to automatic review settings June 23, 2026 20:45

mdaigle requested a review from a team as a code owner June 23, 2026 20:45

mdaigle added this to the 7.1.0-preview2 milestone Jun 23, 2026

github-actions Bot assigned paulmedynski and priyankatiwari08 Jun 23, 2026

Copilot started reviewing on behalf of mdaigle June 23, 2026 20:46 View session

Copilot AI reviewed Jun 23, 2026

View reviewed changes

paulmedynski requested changes Jun 24, 2026

View reviewed changes

github-project-automation Bot moved this from To triage to Waiting for customer in SqlClient Board Jun 24, 2026

mdaigle added 2 commits June 24, 2026 16:53

First round of review changes.

b3de8e7

Require exhaustive switch statements.

5a4ae39

Copilot AI review requested due to automatic review settings June 25, 2026 23:24

Copilot started reviewing on behalf of mdaigle June 25, 2026 23:24 View session

Copilot AI reviewed Jun 25, 2026

View reviewed changes

paulmedynski reviewed Jun 26, 2026

View reviewed changes

Add wait handle pool unit tests for blocking period.

6ba4b82

mdaigle mentioned this pull request Jun 26, 2026

Dev/automation/pool blocking tests on main #4411

Draft

Uh oh!

Conversation

mdaigle commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Stacking

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

paulmedynski left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mdaigle commented Jun 23, 2026 •

edited

Loading

codecov Bot commented Jun 23, 2026 •

edited

Loading