Skip to content

log(db): Add db failure cause log message#4748

Closed
TheodoreSpeaks wants to merge 3 commits into
stagingfrom
debug/failed-schedule
Closed

log(db): Add db failure cause log message#4748
TheodoreSpeaks wants to merge 3 commits into
stagingfrom
debug/failed-schedule

Conversation

@TheodoreSpeaks
Copy link
Copy Markdown
Collaborator

Summary

  • Log the underlying Postgres/driver error cause (code, severity, detail, routine, errno, syscall) when workflow execution fails
  • Drizzle wraps the real driver error as error.cause, which the logger's Error serializer dropped — leaving only an opaque Failed query: ... message with no pg error code
  • Walk the cause chain (depth-bounded to 10) and surface the first error carrying a code; this distinguishes a connection drop (08006), rejected connection (53300), and statement timeout (57014)
  • Wrapped in try/catch so the diagnostic can never throw or hang inside the error handler — worst case is no cause field, never a masked failure

Type of Change

  • Improvement (logging/observability)

Testing

Tested manually; bun run lint clean, bun run check:api-validation:strict passed, tsc --noEmit clean. Motivated by intermittent schedule-execution failures where the workspace/personal env reads in preprocessing fail with an opaque "Failed query" and no driver error code.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment May 26, 2026 11:43pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 26, 2026

PR Summary

Medium Risk
Migration CI path and statement_timeout changes affect how schema deploys run; execution logging is observability-only with a guarded helper.

Overview
Improves workflow execution failure logs by walking Drizzle’s wrapped error.cause chain and attaching driver/Postgres fields (code, severity, detail, etc.) so intermittent DB errors are easier to classify than a bare “Failed query” message.

CI database migrations now run via bun run ./scripts/migrate.ts instead of drizzle-kit migrate, and that script SET statement_timeout = 0 before applying migrations so long-running DDL is not cut off by a server timeout.

Reviewed by Cursor Bugbot for commit 63e3927. Bugbot is set up for automated code reviews on this repo. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR improves observability for workflow execution failures by surfacing the underlying Postgres/driver error that Drizzle wraps in error.cause, and switches the CI migration runner to a custom script with richer diagnostics and a session-level SET statement_timeout = 0 guard.

  • describeErrorCause walks the cause chain (depth-bounded to 10, wrapped in try/catch) and extracts the first error carrying a code, then passes it as a structured cause field to logger.error — enabling CloudWatch Log Insights to distinguish connection drops, pool exhaustion, and statement timeouts behind the previously opaque "Failed query" message.
  • packages/db/scripts/migrate.ts is now the CI migration runner via .github/workflows/migrations.yml; it disables statement_timeout for the session and prints comprehensive Postgres diagnostic fields on failure via printMigrationError.

Confidence Score: 4/5

Safe to merge; changes are purely additive diagnostics with no behavioral impact on the execution path.

The core execution path is unchanged — describeErrorCause can never throw or mask the original error thanks to its try/catch. The migration script improvement is well-guarded. The only rough edge is the unbounded recursion in printMigrationError, which is harmless in the real failure modes this code targets but worth hardening before it becomes a pattern.

packages/db/scripts/migrate.ts — the recursive printMigrationError helper has no depth guard; worth adding one consistent with the approach used in describeErrorCause.

Important Files Changed

Filename Overview
apps/sim/lib/workflows/executor/execution-core.ts Adds describeErrorCause helper and passes its result as a third arg to logger.error; logic is depth-bounded, wrapped in try/catch, and integrates correctly with the structured-JSON logger path.
packages/db/scripts/migrate.ts Prepends SET statement_timeout = 0 to prevent long DDL from timing out; printMigrationError recurses through the cause chain without a depth guard, which could infinite-loop on a pathological self-referential cause but is safe in practice.
.github/workflows/migrations.yml Switches migration runner from bunx drizzle-kit migrate to the custom bun run ./scripts/migrate.ts, enabling richer error output and the new statement_timeout = 0 guard.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[executeWorkflowCore throws] --> B[catch block]
    B --> C[logger.error with error + describeErrorCause]
    C --> D{describeErrorCause}
    D --> E{error instanceof Error?}
    E -- No --> F[return undefined]
    E -- Yes --> G[Walk cause chain up to depth 10]
    G --> H{candidate.code defined?}
    H -- Yes --> I[driver = candidate, break]
    H -- No --> J[advance to candidate.cause]
    J --> G
    G -- no Error found --> K[driver = first Error or undefined]
    I --> L[filterUndefined on driver fields]
    K --> L
    L --> M{driver exists?}
    M -- No --> F
    M -- Yes --> N[Return: name/message/code/severity/detail/routine/errno/syscall]
    N --> O[logger merges as cause into structured JSON log entry]
    F --> P[cause field omitted from log entry]
Loading

Comments Outside Diff (1)

  1. packages/db/scripts/migrate.ts, line 81-84 (link)

    P2 Unbounded recursion in printMigrationError

    printMigrationError calls itself on error.cause with no depth limit. While real Postgres/Drizzle cause chains are shallow, a contrived or unexpected cycle (e.g. e.cause = e) would overflow the call stack and crash the migration process before it logs anything useful. Adding a simple depth counter (like the depth-bounded walk in describeErrorCause) would make this safe without sacrificing the diagnostic output.

Reviews (1): Last reviewed commit: "log(db): Add db failure cause log messag..." | Re-trigger Greptile

@TheodoreSpeaks
Copy link
Copy Markdown
Collaborator Author

Superseded by #4749, which isolates the logging change cleanly off staging. The two DB/CI commits on this branch (fix(db): disable statement_timeout, fix(ci): route migration workflow) aren't in staging/main yet and will get their own PR.

@waleedlatif1 waleedlatif1 deleted the debug/failed-schedule branch May 27, 2026 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant