FIX: Release GIL around teardown SQLFreeHandle/SQLFreeStmt (#565)#604
Draft
saurabh500 wants to merge 2 commits into
Draft
FIX: Release GIL around teardown SQLFreeHandle/SQLFreeStmt (#565)#604saurabh500 wants to merge 2 commits into
saurabh500 wants to merge 2 commits into
Conversation
The 1.7.x fix released the GIL around SQLDriverConnect, SQLSetConnectAttr, SQLEndTran and SQLDisconnect, which fixed the plain-connect deadlock through in-process Python TCP forwarders (paramiko + sshtunnel). However, when a parametrized SELECT leaves an open server-side cursor, the connection teardown cascade still blocks: SqlHandle::free() calls SQLFreeHandle on the STMT handle, which sends a SQL_CLOSE-equivalent network packet to the server. With the GIL held, the forwarder thread cannot run sock.sendall and the call deadlocks forever. This change releases the GIL around the three remaining teardown calls that may perform network I/O: * SqlHandle::free() -> SQLFreeHandle(STMT/DBC) * SqlHandle::close_cursor() -> SQLFreeStmt(SQL_CLOSE) * SQLFreeHandle_wrap() -> SQLFreeHandle Each release is gated on PyGILState_Check() AND !is_python_finalizing(), because these paths can also be reached from interpreter-shutdown cleanup where gil_scoped_release would crash with Fatal Python error: PyEval_SaveThread: ... the GIL is released Adds a second regression test (test_param_query_close_through_python_tcp_forwarder_does_not_deadlock) that reproduces the latest @jschuba scenario via an in-process TCP forwarder + parametrized SELECTs + conn.close() under a hard watchdog. Verified the new test hangs (watchdog fires at 30s) without the C++ change and passes (completes in <0.5s) with it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/pybind/ddbc_bindings.cppLines 1402-1411 1402 // interpreter is not finalizing - gil_scoped_release is unsafe during
1403 // shutdown even if PyGILState_Check() reports the GIL as held.
1404 if (!pythonShuttingDown && PyGILState_Check()) {
1405 py::gil_scoped_release release;
! 1406 SQLFreeHandle_ptr(_type, _handle);
! 1407 } else {
1408 SQLFreeHandle_ptr(_type, _handle);
1409 }
1410 _handle = nullptr;
1411 }📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 76.2%
mssql_python.row.py: 76.9%
mssql_python.__init__.py: 77.3%
mssql_python.pybind.connection.connection.cpp: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.logging.py: 85.5%
mssql_python.connection.py: 85.6%🔗 Quick Links
|
Audit follow-up to the previous teardown fix. Identified one additional ODBC call site that performs a server round-trip while holding the GIL, which deadlocks when the connection is routed through an in-process Python TCP forwarder (the same #565-family pattern). Fixed call site: * BindParameters() -> SQLDescribeParam Issued by the driver as sp_describe_undeclared_parameters when a parameter has SQL_UNKNOWN_TYPE - hit on every parametrized query that includes a None value. Earlier drafts of this commit also wrapped Connection::isAlive(), Connection::getInfo() and Connection::getAutocommit(), but those were removed after empirical verification: SQL_ATTR_CONNECTION_DEAD and SQL_ATTR_AUTOCOMMIT GETs are driver-cached client-side state per the ODBC spec (MSDN explicitly states SQL_ATTR_CONNECTION_DEAD 'does not query the data source for activity'), and SQLGetInfo for the standard info types (SQL_DBMS_NAME, SQL_DATABASE_NAME, ...) returns data the driver cached at login time. None of these are network calls. Reached only from a Python-callable wrapper with the GIL held, so a plain py::gil_scoped_release is sufficient (no is_python_finalizing guard needed - that's only required for the destructor cascade fixed in the prior commit). Adds test_param_describe_through_python_tcp_forwarder_does_not_deadlock which runs a parametrized SELECT with a None argument through the in-process Python TCP forwarder under a 30s watchdog. Verified: the test hangs (watchdog fires at 30s) without the SQLDescribeParam GIL release and passes (<1s) with it. Full pytest suite: 1767 passed, 0 failed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
a43f44c to
898bd29
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Work Item / Issue Reference
GitHub Issue: #565
Summary
Fixes two distinct GIL-held ODBC network round-trips that deadlock when the
connection is routed through an in-process Python TCP forwarder (the
SSH-tunnel-style scenario reported on issue #565). When the GIL is held
across a blocking network syscall, the Python-implemented forwarder thread
cannot acquire the GIL to pump bytes, and the whole interpreter wedges.
Two commits, two independent deadlock sites:
1. Teardown path (
9a3b84c6)SqlHandle::free(),SqlHandle::close_cursor()andSQLFreeHandle_wrap()call
SQLFreeHandle/SQLFreeStmt(SQL_CLOSE)while holding the GIL.For a statement with an open server-side cursor, both of those issue a
network round-trip to close it. Hit on every
conn.close()/cursor.close()after a parametrized query — matches jschuba's latest reprocomment exactly.
Fix: release the GIL around those three calls. Because two of the three
sites are reachable from C++ destructors (which can run during interpreter
finalization), the release is guarded by
!is_python_finalizing() && PyGILState_Check()to avoidPyEval_SaveThreadaborts during shutdown.2. Parametrized-query introspection path (
898bd29a)BindParameters()callsSQLDescribeParamwhile holding the GIL whenevera parameter has
SQL_UNKNOWN_TYPE(e.g.Noneargument). The MS ODBCdriver implements this as a
sp_describe_undeclared_parametersserverround-trip, so it deadlocks the forwarder thread on every parametrized
query that includes a
Nonevalue.Fix: release the GIL around the
SQLDescribeParamcall.Note on scope: an earlier draft of this audit also wrapped
Connection::isAlive(),Connection::getInfo()andConnection::getAutocommit(). Those were dropped after empiricalverification —
SQL_ATTR_CONNECTION_DEADandSQL_ATTR_AUTOCOMMITGETsare driver-cached client-side state per the ODBC spec (MSDN explicitly
states
SQL_ATTR_CONNECTION_DEAD"does not query the data source foractivity"), and
SQLGetInfofor the standard info types returns data thedriver cached at login time. None of those are network calls, so the
GIL release was unnecessary.
Test coverage
Two new tests in
tests/test_023_ssh_tunnel_gil_release.py, both usingthe existing in-process TCP forwarder + 30s watchdog pattern:
test_param_query_close_through_python_tcp_forwarder_does_not_deadlockreproduces jschuba's repro: parametrized SELECT + fetch +
conn.close().test_param_describe_through_python_tcp_forwarder_does_not_deadlockexercises a parametrized SELECT with a
Noneargument (forcesSQLDescribeParam), plusConnection.getinfo(SQL_DBMS_NAME)andconn.autocommitas local sanity probes.Both tests reliably hang to the 30s watchdog timeout without the C++
changes and pass in <1s with them. Full pytest suite: 1767 passed,
58 skipped, 0 failed (~114s).