Skip to content

Retry E2E retryable service errors by default#1709

Open
yxzhaao wants to merge 1 commit into
larksuite:mainfrom
yxzhaao:e2e-retryable-default
Open

Retry E2E retryable service errors by default#1709
yxzhaao wants to merge 1 commit into
larksuite:mainfrom
yxzhaao:e2e-retryable-default

Conversation

@yxzhaao

@yxzhaao yxzhaao commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

What changed

  • Make the CLI E2E RunCmd helper automatically retry structured service errors when the CLI returns error.retryable: true.
  • Split the actual process execution into runCmdOnce so RunCmdWithRetry remains a custom retry entrypoint and does not nest the default RunCmd retry policy.
  • Share JSON payload extraction between retryable-error detection and cleanup-error suppression, covering stderr that contains progress text before the JSON envelope.
  • Add fake CLI coverage for retryable errors, non-retryable errors, and the no-nested-retry contract.

Why

Live E2E ownership is now distributed across feature owners. Transient server contention, such as Drive delete returning 1061045 with retryable: true, should be handled by the framework by default instead of depending on every test author remembering to call RunCmdWithRetry.

Summary by CodeRabbit

  • Bug Fixes
    • Improved CLI reliability by automatically retrying commands when a structured service error indicates the failure is retryable.
    • Prevented unnecessary retries for non-retryable errors, helping failures surface more predictably.
    • Updated error handling to better interpret structured output from command results.

Change-Id: I3f0254523cd55ed24c2e87e6ac19116af59c212f
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

RunCmd is refactored to delegate execution to RunCmdWithRetry, retrying only when output contains a structured retryable service error. A new runCmdOnce helper holds single-execution logic. JSON payload extraction is centralized via new helpers, and tests cover retry and non-retry paths.

Changes

CLI Retry Behavior

Layer / File(s) Summary
Centralized retry defaults
tests/cli_e2e/core.go
Package-level default retry variables (attempts, delays, backoff multiplier) replace hard-coded values.
RunCmd delegates to RunCmdWithRetry
tests/cli_e2e/core.go
RunCmd now calls RunCmdWithRetry with ResultHasRetryableError as the retry condition; single execution logic moved into runCmdOnce; the retry loop calls runCmdOnce instead of RunCmd, and missing retry options are filled from the new defaults.
Retryable error detection and JSON extraction
tests/cli_e2e/core.go
Adds ResultHasRetryableError and extractJSONPayload helpers to detect error.retryable=true in output; cleanup-suppression logic is refactored to reuse the same JSON payload extraction.
Retry behavior tests and fake CLI commands
tests/cli_e2e/core_test.go
Adds useFastDefaultRetry helper, fail-once-retryable/always-non-retryable fake CLI commands with persisted counters, TestRunCmd subtests for retryable/non-retryable errors, and TestRunCmdWithRetry verifying explicit retry options are not nested with defaults.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant RunCmd
  participant RunCmdWithRetry
  participant runCmdOnce
  participant ResultHasRetryableError

  Caller->>RunCmd: invoke command
  RunCmd->>RunCmdWithRetry: delegate with retry condition
  loop attempts
    RunCmdWithRetry->>runCmdOnce: execute command once
    runCmdOnce-->>RunCmdWithRetry: Result
    RunCmdWithRetry->>ResultHasRetryableError: check result
    ResultHasRetryableError-->>RunCmdWithRetry: retryable true/false
  end
  RunCmdWithRetry-->>Caller: final Result
Loading

Suggested labels: size/M

Suggested reviewers: liangshuo-1

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers the change and rationale, but it does not follow the required template and is missing Test Plan and Related Issues sections. Rewrite the description to use the template headings: Summary, Changes, Test Plan, and Related Issues, and fill in the missing details.
Docstring Coverage ⚠️ Warning Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main change: default retrying of structured E2E service errors.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@CLAassistant

CLAassistant commented Jul 1, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added the size/S Low-risk docs, CI, test, or chore only changes label Jul 1, 2026
@yxzhaao yxzhaao marked this pull request as ready for review July 1, 2026 13:51

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/cli_e2e/core_test.go (1)

383-394: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

useFastDefaultRetry mutates package-level globals — not parallel-safe.

Overwriting defaultRetryInitialDelay / defaultRetryMaxDelay and restoring via t.Cleanup works only because these tests run serially. If any test in this package is later marked t.Parallel(), this becomes a data race under -race. Consider documenting the serial-only assumption here (or gating retry timing through injected RetryOptions) to prevent a future footgun.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cli_e2e/core_test.go` around lines 383 - 394, The helper
useFastDefaultRetry mutates package-level retry globals, so it is not safe if
tests in this package ever run in parallel. Update the helper to avoid
shared-state overrides where possible, ideally by threading retry timing through
injected RetryOptions or similar test-scoped configuration; if that refactor is
too large, add an explicit comment near useFastDefaultRetry noting the
serial-only assumption so future t.Parallel() usage does not introduce a race.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/cli_e2e/core_test.go`:
- Around line 383-394: The helper useFastDefaultRetry mutates package-level
retry globals, so it is not safe if tests in this package ever run in parallel.
Update the helper to avoid shared-state overrides where possible, ideally by
threading retry timing through injected RetryOptions or similar test-scoped
configuration; if that refactor is too large, add an explicit comment near
useFastDefaultRetry noting the serial-only assumption so future t.Parallel()
usage does not introduce a race.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fca14ddb-be9b-4e39-b7ab-93d8c8e0f309

📥 Commits

Reviewing files that changed from the base of the PR and between a6797ac and 7658240.

📒 Files selected for processing (2)
  • tests/cli_e2e/core.go
  • tests/cli_e2e/core_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Low-risk docs, CI, test, or chore only changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants