fix: always pass an explicit HTTP agent to avoid Node's 5s idle timeout (cut 1.1.110) #1344

Merged
Martin Torp (mtorp) merged 3 commits into v1.x from martin/fix-sdk-global-agent-idle-timeout
May 29, 2026

@mtorp Martin Torp (mtorp) commented May 29, 2026

Problem

upload-manifest-files (used by socket scan reach and socket fix) intermittently fails for some enterprise customers: the request is torn down at almost exactly 5 seconds with no response, which the API load balancer logs as client_disconnected_before_any_response. A server-side heartbeat shipped earlier only partially helped.

Root cause

Node ≥19's global HTTP/HTTPS agent ships { keepAlive: true, timeout: 5000 }. Node applies that timeout as a per-socket inactivity timeout. The CLI made requests without an explicit agent on the common path, so they inherited the global agent and its 5s timeout — even when SOCKET_CLI_API_TIMEOUT is unset (the 5s comes entirely from Node, not from the CLI/SDK).
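
To check this on a given Node version, one can compare the options carried by the global agent with those of a freshly constructed one. This is a minimal sketch: describeAgent is a hypothetical helper, not part of the CLI, and it reads the agent's runtime options object through a cast because the type definitions may not declare it.

```typescript
import * as http from "node:http";
import * as https from "node:https";

// Hypothetical helper (not part of the CLI): report the timeout an agent
// would apply to the sockets it creates.
function describeAgent(agent: http.Agent): { timeout: unknown } {
  // `options` holds the constructor options at runtime; cast because the
  // type definitions may not expose it.
  const { options } = agent as unknown as { options: Record<string, unknown> };
  return { timeout: options.timeout };
}

// Per the analysis above, Node >=19 is expected to report 5000 here.
console.log("global agent:", describeAgent(https.globalAgent));
// A freshly constructed agent carries no timeout unless one is passed in.
console.log("explicit agent:", describeAgent(new https.Agent()));
```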

For the upload, the multipart body is sent Transfer-Encoding: chunked. When the server takes >5s to handle auth + multipart parsing before sending any response byte, the socket sits idle, Node fires the 5s 'timeout', and the request is destroyed — the client disconnects before any response is received.

This explains every observed signal: ~5.0s latency, Node-CLI-only (Python requests sends Content-Length, not chunked, and has no analogous default idle timeout), and before_any_response (the timeout fires during the pre-handler stall, before the heartbeat can write a byte).

Reproduced locally

Driving the real SDK against a slow mock HTTP server with no load balancer in the path reproduced the teardown at exactly 5.0s. A trace of Socket.prototype.setTimeout showed setTimeout(5000) originating from Node's Agent.createConnection (the global agent), with the CLI/SDK passing timeout=undefined. An explicit agent with no timeout completes an 8s-stall upload with 200 OK.
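
The reproduction can be approximated without the SDK. The sketch below is not the harness used for this PR: requestAgainstSlowServer is a hypothetical helper, and the timings are scaled down (a 100 ms agent timeout and a 400 ms server stall stand in for 5 s and 8 s). It assumes the caller destroys the request when 'timeout' fires, since Node itself only emits the event.

```typescript
import * as http from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical helper: POST to a local server that stalls `stallMs` before
// sending any response byte, using the supplied agent. Resolves with the
// status code, or rejects if the socket inactivity timeout fires first.
async function requestAgainstSlowServer(agent: http.Agent, stallMs: number): Promise<number> {
  const server = http.createServer((_req, res) => {
    const timer = setTimeout(() => {
      res.writeHead(200);
      res.end("ok");
    }, stallMs);
    // Stop the pending response if the client has already gone away.
    res.on("close", () => clearTimeout(timer));
  });
  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address() as AddressInfo;
  try {
    return await new Promise<number>((resolve, reject) => {
      const req = http.request({ host: "127.0.0.1", port, method: "POST", agent }, (res) => {
        res.resume();
        res.on("end", () => resolve(res.statusCode ?? 0));
      });
      // Node only emits 'timeout'; tearing the request down is the caller's choice.
      req.on("timeout", () => req.destroy(new Error("socket idle timeout")));
      req.on("error", reject);
      req.end("payload");
    });
  } finally {
    agent.destroy();
    server.closeAllConnections();
    server.close();
  }
}
```

Passing new http.Agent({ timeout: 100 }) with a 400 ms stall should reject with the idle-timeout error, while new http.Agent() with the same stall should resolve with 200.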

Fix

Both of the CLI's HTTP stacks now always pass an explicit Agent (which carries no timeout), so requests are bounded only by an explicit SOCKET_CLI_API_TIMEOUT or until interrupted — restoring the CLI's documented "no timeout unless configured" intent:

  1. SDK path: setupSdk (src/utils/sdk.mts) supplies an explicit Agent (by protocol) on the no-proxy/no-CA path. Because the SocketSdk builds its request options once, every SDK call is covered (uploadManifestFiles, getOrganizations, createOrgFullScan, getOrgFullScan, searchDependencies, batchPackageStream, …), not just the upload.
  2. Raw apiFetch path: getHttpsAgent (src/utils/api.mts) now always returns an explicit HttpsAgent instead of undefined when no CA cert is configured. This covers queryApiSafeText/queryApiSafeJson, sendApiRequest, and the direct apiFetch download paths (streaming full-scan responses, binary/tarball downloads), which had the identical bug and contradicted the file's own "no body timeout" comment.

Proxy and SSL_CERT_FILE paths are unchanged (both already used explicit agents with no such timeout).
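
The two changes can be sketched roughly as follows. This is an illustration, not the code in sdk.mts or api.mts: explicitAgentFor and getHttpsAgentSketch are hypothetical names, and the optional ca parameter stands in for however the CLI loads a configured CA certificate.

```typescript
import { Agent as HttpAgent } from "node:http";
import { Agent as HttpsAgent } from "node:https";

// Hypothetical stand-in for the SDK path: choose an explicit agent by the
// protocol of the API base URL, so the global agent is never used.
function explicitAgentFor(baseUrl: string): HttpAgent | HttpsAgent {
  return new URL(baseUrl).protocol === "http:" ? new HttpAgent() : new HttpsAgent();
}

// Hypothetical stand-in for the raw fetch path: a lazily created, cached
// HTTPS agent. `undefined` only means "not created yet".
let cachedHttpsAgent: HttpsAgent | undefined;

function getHttpsAgentSketch(ca?: string | Buffer): HttpsAgent {
  // Always returns an agent; with a CA it carries the certificate,
  // otherwise it is a plain agent with no timeout option.
  cachedHttpsAgent ??= new HttpsAgent(ca ? { ca } : {});
  return cachedHttpsAgent;
}
```

Because the getter always returns an agent, callers no longer need to handle an undefined result, and the cached value itself signals that initialization has happened.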

Compatibility

  • Explicit new HttpsAgent() defaults to keepAlive: false (pre-Node-19 behavior); fine for a short-lived CLI.
  • Removing the idle timeout means a true black-hole connection hangs until interrupted (the pre-Node-19 status quo); users can bound it with SOCKET_CLI_API_TIMEOUT.
  • Zero server-side changes required.
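
For users who do want a bound, the sketch below shows one way an opt-in timeout could be layered on top of an explicit agent. It is not the CLI's implementation: parseApiTimeout and buildRequestOptions are hypothetical, and the sketch assumes SOCKET_CLI_API_TIMEOUT holds a positive number of milliseconds.

```typescript
import * as https from "node:https";

// Assumption: the variable holds a positive number of milliseconds.
// Anything else is treated as "not configured".
function parseApiTimeout(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const ms = Number(raw);
  return Number.isFinite(ms) && ms > 0 ? ms : undefined;
}

// Hypothetical builder: always an explicit agent, and a timeout only when
// the user configured one.
function buildRequestOptions(env: Record<string, string | undefined>): https.RequestOptions {
  const timeout = parseApiTimeout(env["SOCKET_CLI_API_TIMEOUT"]);
  return {
    agent: new https.Agent(),
    ...(timeout === undefined ? {} : { timeout }),
  };
}
```

Calling buildRequestOptions(process.env) with the variable unset yields options with no timeout key at all. Note that Node's request timeout option only causes a 'timeout' event to be emitted, so the caller still has to destroy the request in its handler.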

Tests

  • sdk.test.mts: updated the obsolete "no agent by default" assertion and added a regression test asserting setupSdk always passes an explicit agent with no timeout.
  • api.test.mts: updated the obsolete "no agent / agent: undefined" assertion and now asserts apiFetch uses an explicit agent with no timeout.
  • test:unit src/utils/api.test.mts src/utils/sdk.test.mts: 36 passed; check:tsc and check:lint green.

Release

  • Bumped package.json to 1.1.110.
  • Added a Fixed entry under ## [1.1.110] - 2026-05-29 in CHANGELOG.md.

Durable follow-up (separate, not this PR)

The same Node global-agent default would bite any other consumer of @socketsecurity/sdk. Worth fixing the agent default in the SDK itself so non-CLI consumers are covered too.


Note

Medium Risk
Changes the default HTTP agent for all CLI API traffic (SDK and raw fetch), which affects connection lifecycle globally; behavior is intentional and covered by regression tests, but any subtle networking edge cases would surface across many commands.

Overview
Fixes intermittent ~5 second client disconnects on manifest uploads (socket scan reach, socket fix) and other slow or streaming API calls by ensuring every outbound HTTP path uses an explicit Node Agent, not Node ≥19’s default global agent (which applies a 5s per-socket inactivity timeout even when SOCKET_CLI_API_TIMEOUT is unset).

setupSdk (sdk.mts) now always passes an agent on the default no-proxy path (HttpAgent / HttpsAgent by base URL). apiFetch (api.mts) always returns an explicit HttpsAgent from getHttpsAgent() instead of undefined, so direct https.request traffic no longer inherits the global timeout.

Unit tests were updated and regression cases added to assert an explicit agent with no timeout option. Release 1.1.110 with changelog entry.

Reviewed by Cursor Bugbot for commit 1f13e3b.

@mtorp Martin Torp (mtorp) changed the title from "fix(sdk): always pass an explicit HTTP agent to avoid Node's 5s idle timeout" to "fix: always pass an explicit HTTP agent to avoid Node's 5s idle timeout (cut 1.1.109)" May 29, 2026
…timeout

Node >=19's global HTTP/HTTPS agent enables keepAlive with a 5s socket
timeout, which Node applies as a per-socket inactivity timeout. setupSdk
only supplied an explicit agent for the proxy and SSL_CERT_FILE cases, so
the common path inherited the global agent's 5s timeout even when
SOCKET_CLI_API_TIMEOUT is unset.

This caused upload-manifest-files to fail intermittently: the SDK streams
the multipart body with Transfer-Encoding: chunked, and when the server
takes >5s to parse auth/multipart before sending any response byte, the
socket goes idle, Node fires the 5s timeout, and the SDK destroys the
request, so the client disconnects before receiving any response.

Always pass a fresh Agent (no timeout) so a request is bounded only by an
explicit SOCKET_CLI_API_TIMEOUT or until interrupted. Reproduced locally
against a slow mock server with no load balancer in the path.
apiFetch's https.request used the default (global) agent when no CA cert
was configured, inheriting Node >=19's keepAlive 5s socket timeout — the
same issue just fixed for the SDK. getHttpsAgent now always returns an
explicit HttpsAgent (no timeout), covering queryApiSafe*/sendApiRequest
and the direct apiFetch download paths (streaming full-scan responses,
binary and tarball downloads).

Bumps the version to 1.1.110 and adds the changelog entry.

getHttpsAgent now always creates an agent on first call, so its return
type is HttpsAgent (was HttpsAgent | undefined) and the _httpsRequestFetch
agent parameter drops | undefined. The cached _httpsAgent keeps | undefined
since it is the lazy-init sentinel (undefined only before the first call).
The _httpsAgentResolved flag is removed: a set _httpsAgent is itself the
"resolved" signal. Pure polish from review; no behavior change.
@mtorp Martin Torp (mtorp) force-pushed the martin/fix-sdk-global-agent-idle-timeout branch from 29f49ab to 1f13e3b Compare May 29, 2026 07:26
@mtorp Martin Torp (mtorp) changed the title from "fix: always pass an explicit HTTP agent to avoid Node's 5s idle timeout (cut 1.1.109)" to "fix: always pass an explicit HTTP agent to avoid Node's 5s idle timeout (cut 1.1.110)" May 29, 2026
@mtorp Martin Torp (mtorp) marked this pull request as ready for review May 29, 2026 07:28
LGTM 👍

@mtorp Martin Torp (mtorp) merged commit 20688fc into v1.x May 29, 2026
13 checks passed
@mtorp Martin Torp (mtorp) deleted the martin/fix-sdk-global-agent-idle-timeout branch May 29, 2026 07:42