Wait for soroban ledger ingestion at chain head instead of failing fatally on -32600 #167

Open
ahmdssi wants to merge 1 commit into subquery:main from ahmdssi:fix/wait-soroban-ledger-ingestion

@ahmdssi ahmdssi commented Jun 12, 2026

Problem

When an indexer is fully caught up and follows the chain head, @subql/node-stellar can enter a fatal crash-loop:

Maximum number of retries reached
ERROR { code: -32600, message: 'startLedger must be within the ledger range: 62875164 - 62996123' }
Failed to fetch block, waiting for fetched blocks to be processed before shutting down.
Error: Failed to fetch block 62996124.
    at .../node-core/dist/indexer/blockDispatcher/base-block-dispatcher.js:246

We observed this in production on Stellar mainnet behind a hosted RPC provider (QuickNode): the pod crashed and restarted every ~8 minutes for hours. Across 7 consecutive crashes, the fatal block was always exactly range max + 1 (in the log above, 62996123 + 1 = 62996124).

Root cause

The target height comes from Horizon (getFinalizedBlockHeight, via ledgers().order('desc')), while soroban events are fetched from a separate soroban endpoint (sorobanClient.getEvents({ startLedger })). stellar-rpc rejects getEvents with JSON-RPC -32600 when startLedger is greater than the last ledger ingested by the serving backend (get_events.go), not the network head. With hosted, load-balanced RPCs, the backend serving getEvents can lag the Horizon target by a few ledgers for tens of seconds.

StellarApi does not recognize this error (it only handles the legacy 'start is after newest ledger' / 'start is before oldest ledger' messages), so it propagates as an ordinary fetch error: node-core retries 5 times (~20s total) and then terminates the process. The ledger exists and becomes available seconds later, so this is a transient condition, not an invalid request.

Note: the stellar-rpc maintainers acknowledge this case as distinct in the getEvents v2 proposal (ledger_future, stellar-rpc#593), but v1 conflates it into -32600.
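
To make the distinction concrete, here is a minimal TypeScript sketch of how the -32600 range message could be told apart from an out-of-retention request. The names parseLedgerRange, classifyGetEventsError, IngestStatus, and RpcError are hypothetical and are not taken from this PR; only the error code and the message wordings come from the description above.

```typescript
// Illustrative sketch only: helper names are assumptions, not the PR's code.
type IngestStatus = 'not-yet-ingested' | 'out-of-retention' | 'unknown';

interface RpcError {
  code?: number;
  message?: string;
}

// Matches the v1 wording shown in the crash log.
const RANGE_RE = /startLedger must be within the ledger range: (\d+) - (\d+)/;

function parseLedgerRange(message: string): { min: number; max: number } | undefined {
  const m = RANGE_RE.exec(message);
  if (!m) return undefined;
  return { min: Number(m[1]), max: Number(m[2]) };
}

function classifyGetEventsError(err: RpcError, requested: number): IngestStatus {
  const message = err.message ?? '';
  // Legacy wordings are checked first.
  if (message.includes('start is after newest ledger')) return 'not-yet-ingested';
  if (message.includes('start is before oldest ledger')) return 'out-of-retention';
  if (err.code !== -32600) return 'unknown';
  const range = parseLedgerRange(message);
  // Unrecognized -32600 wording: the caller needs another way to decide.
  if (!range) return 'unknown';
  if (requested > range.max) return 'not-yet-ingested';
  if (requested < range.min) return 'out-of-retention';
  return 'unknown';
}
```

With the error from the log, a request for ledger 62996124 classifies as 'not-yet-ingested' because it is one past the reported maximum, while a request below 62875164 classifies as 'out-of-retention'.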

Fix

fetchAndWrapLedger now calls getEventsWhenIngested(sequence), which waits for the soroban endpoint to ingest the ledger instead of failing the fetch:

  • Detects the "not yet ingested" condition via the -32600 range message (and the legacy 'start is after newest ledger' message), with a structural fallback for unrecognized -32600 wordings: compare the requested sequence against the endpoint's own getLatestLedger().
  • Polls every 6s (≈ one ledger close) until a configurable deadline — new sorobanIngestWaitSeconds endpoint config (default 600s, documented to stay below --timeout) — then rethrows, preserving the previous fail-fast behaviour as a backstop for genuine outages.
  • 'start is before oldest ledger' (out of retention) keeps its immediate explanatory error.
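
The behaviour listed above can be sketched roughly as follows. This is not the PR's getEventsWhenIngested implementation: retryUntilIngested, isAheadOfEndpoint, WaitOptions, and LatestLedgerSource are hypothetical names, and the delay and clock are injected here only so the loop can be exercised without real waiting.

```typescript
// Illustrative sketch only: names and signatures are assumptions.
interface WaitOptions {
  pollIntervalMs: number; // e.g. 6000, roughly one ledger close
  maxWaitMs: number; // e.g. 600_000, mirroring the 600s default
  delay?: (ms: number) => Promise<void>;
  now?: () => number;
}

interface LatestLedgerSource {
  getLatestLedger(): Promise<{ sequence: number }>;
}

// Structural fallback: the ledger counts as "not yet ingested" when the
// endpoint's own latest ledger is still behind the requested sequence.
async function isAheadOfEndpoint(source: LatestLedgerSource, requested: number): Promise<boolean> {
  const { sequence } = await source.getLatestLedger();
  return requested > sequence;
}

async function retryUntilIngested<T>(
  fetchEvents: () => Promise<T>,
  isNotYetIngested: (err: unknown) => boolean | Promise<boolean>,
  opts: WaitOptions,
): Promise<T> {
  const delay = opts.delay ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const now = opts.now ?? (() => Date.now());
  const deadline = now() + opts.maxWaitMs;
  for (;;) {
    try {
      return await fetchEvents();
    } catch (err) {
      // Any other failure (e.g. out of retention) is rethrown immediately.
      if (!(await isNotYetIngested(err))) throw err;
      // Deadline reached: rethrow so the existing fail-fast path still applies.
      if (now() >= deadline) throw err;
      await delay(opts.pollIntervalMs);
    }
  }
}
```

Rethrowing the original error at the deadline, rather than wrapping it, keeps the caller's existing error handling unchanged in this sketch.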

Mapping the error to BlockUnavailableError was deliberately avoided: the dispatcher would permanently skip the block (Near/Solana semantics) and silently drop that ledger's events.

Testing

  • New mock-based unit tests in api.stellar.spec.ts (no network required, delay mocked): recovery after transient failures, immediate rethrow below the retention window, both legacy messages, the getLatestLedger fallback (both directions), deadline exhaustion, and end-to-end wiring through fetchAndWrapLedger.
  • Existing live-endpoint tests in the file still pass.
  • The same logic, applied as a patch on @subql/node-stellar@6.2.0 dist, stopped the crash-loop scenario described above.

coderabbitai Bot commented Jun 12, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 5d41a0f and 6599649.

📒 Files selected for processing (5)
  • packages/node/CHANGELOG.md
  • packages/node/src/stellar/api.stellar.spec.ts
  • packages/node/src/stellar/api.stellar.ts
  • packages/types/CHANGELOG.md
  • packages/types/src/project.ts

@ahmdssi ahmdssi force-pushed the fix/wait-soroban-ledger-ingestion branch from 64f7af7 to 6599649 Compare June 12, 2026 16:52