Skip to content

THREESCALE-14654 Fix status reconciler requeue logic#1177

Open
borisurbanik wants to merge 1 commit into
3scale:masterfrom
borisurbanik:THREESCALE-14654
Open

THREESCALE-14654 Fix status reconciler requeue logic#1177
borisurbanik wants to merge 1 commit into
3scale:masterfrom
borisurbanik:THREESCALE-14654

Conversation

@borisurbanik

@borisurbanik borisurbanik commented May 7, 2026

Copy link
Copy Markdown
Contributor

Fixes https://issues.redhat.com/browse/THREESCALE-14654

Summary

  • Fixes equalStatus && newAvailable guard: the unavailable-but-equal case fell through to a no-op status write on every reconcile. Changed to if equalStatus which short-circuits both available and unavailable steady states.
  • Switches both unavailable return paths from Requeue: true (immediate tight-loop) to RequeueAfter: 30s, giving components time to recover between checks and avoiding extending the backoff counter.

Test plan

  • Unit tests: go test ./controllers/apps/... -run TestAPIManagerStatusReconciler
  • TestAPIManagerStatusReconciler_Reconcile_requeueAfterWhenUnavailable verifies RequeueAfter is non-zero on True→False transition

Manual testing setup

Tested manually with following setup:

export NAMESPACE=3scale-test
make NAMESPACE=$NAMESPACE cluster/prepare/local

cat << EOF | oc create -f -
kind: Secret
apiVersion: v1
metadata:
  name: s3-credentials
  namespace: $NAMESPACE
data:
  AWS_ACCESS_KEY_ID: c29tZXRoaW5nCg==
  AWS_BUCKET: c29tZXRoaW5nCg==
  AWS_REGION: dXMtd2VzdC0xCg==
  AWS_SECRET_ACCESS_KEY: c29tZXRoaW5nCg==
type: Opaque
EOF

DOMAIN=$(oc get routes console -n openshift-console -o json | jq -r '.status.ingress[0].routerCanonicalHostname' | sed 's/router-default.//')
cat << EOF | oc create -f -
kind: APIManager
apiVersion: apps.3scale.net/v1alpha1
metadata:
  name: 3scale
  namespace: $NAMESPACE
spec:
  wildcardDomain: $DOMAIN
  system:
    fileStorage:
      simpleStorageService:
        configurationSecretRef:
          name: s3-credentials
  externalComponents:
    backend:
      redis: true
    system:
      database: true
      redis: true
EOF

sleep 120 && make run

After system provisions and stabilises, delete one of the route, to induce "Available: false", observe the logs for some time - the reconcile was triggered every 30s.

🤖 Co-authored with Claude Code

@borisurbanik borisurbanik requested a review from a team as a code owner May 7, 2026 20:26
@borisurbanik borisurbanik changed the title THREESCALE-14654 Fix status reconciler requeue logic THREESCALE-14224 (THREESCALE-14654) Fix status reconciler requeue logic May 7, 2026
@tkan145

tkan145 commented May 8, 2026

Copy link
Copy Markdown
Contributor

/retest

@borisurbanik borisurbanik changed the title THREESCALE-14224 (THREESCALE-14654) Fix status reconciler requeue logic THREESCALE-14654 Fix status reconciler requeue logic May 8, 2026
@codecov-commenter

codecov-commenter commented May 8, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.98%. Comparing base (3e09410) to head (bf7312e).
⚠️ Report is 20 commits behind head on master.

Files with missing lines Patch % Lines
controllers/apps/apimanager_status_reconciler.go 25.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1177      +/-   ##
==========================================
+ Coverage   43.16%   43.98%   +0.82%     
==========================================
  Files         203      204       +1     
  Lines       20885    20927      +42     
==========================================
+ Hits         9015     9205     +190     
+ Misses      11056    10920     -136     
+ Partials      814      802      -12     
Flag Coverage Δ
unit 43.98% <25.00%> (+0.82%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
apis/apps/v1alpha1 (u) 63.19% <ø> (+0.74%) ⬆️
apis/capabilities/v1alpha1 (u) 3.50% <ø> (ø)
apis/capabilities/v1beta1 (u) 20.21% <ø> (ø)
controllers (i) 12.08% <20.00%> (-0.01%) ⬇️
pkg (u) 63.69% <81.15%> (+1.30%) ⬆️
Files with missing lines Coverage Δ
controllers/apps/apimanager_status_reconciler.go 73.43% <25.00%> (-0.58%) ⬇️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

if equalStatus {
s.logger.V(1).Info("Status was not updated")
if !newAvailable {
return reconcile.Result{RequeueAfter: 30 * time.Second}, nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we now return reconcile.Result{RequeueAfter: 30 * time.Second}, nil, but inside apimanager_controller.go we only check Requeue, so the result is ignored.

if statusResult.Requeue {
	logger.Info("Reconciling not finished. Requeueing.")
	return statusResult, nil
}

…us write

Two issues in the previous requeue logic:

1. The guard `equalStatus && newAvailable` caused the unavailable-but-equal
   case to fall through to a status write even when nothing had changed,
   producing a no-op write on every reconcile while the instance remained
   unavailable. The new `if equalStatus` guard short-circuits both the
   available and unavailable steady states, avoiding the unnecessary write.

2. Both unavailable return paths used `Requeue: true` (immediate requeue),
   which causes tight-loop reconciliation against an instance that is still
   unavailable. Switching to `RequeueAfter: 30s` gives components time to
   recover between checks and avoids extending backoff counter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@borisurbanik

Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@tkan145

tkan145 commented May 15, 2026

Copy link
Copy Markdown
Contributor

/retest

return specResult, nil
}

if statusResult.Requeue {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only check the last one? reconcileAPIManagerStatus is called more than once

s.logger.V(1).Info("Status is different")
s.apimanagerResource.Status = *newStatus
if err := s.Client().Status().Update(s.Context(), s.apimanagerResource); err != nil {
return reconcile.Result{}, fmt.Errorf("failed to update status: %w", err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now ignore the Conflict error?

@tkan145

tkan145 commented May 17, 2026

Copy link
Copy Markdown
Contributor

/retest

@openshift-ci

openshift-ci Bot commented May 18, 2026

Copy link
Copy Markdown

@urbanikb: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/test-integration 9a9dee8 link true /test test-integration

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants