OPRUN-4392,OPRUN-4393: Add OLMv1 progress deadline QE tests + fixes#755
OPRUN-4392,OPRUN-4393: Add OLMv1 progress deadline QE tests + fixes#755dtfranz wants to merge 1 commit into
Conversation
|
@dtfranz: This pull request references OPRUN-4392 which is a valid jira issue. This pull request references OPRUN-4393 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds a QE Ginkgo test for ChangesProgress deadline test flow
Estimated code review effort: 3 (Moderate) | ~25 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dtfranz The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
openshift/tests-extension/test/qe/specs/olmv1_ce_progress_deadline.go (1)
405-416: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider documenting the 12-minute timeout rationale.
The 12-minute timeout here is notably longer than other assertions (3-5 minutes). While this is correct for waiting beyond the 10-minute progress deadline set in test case 88331, a comment explaining the relationship would improve maintainability.
📝 Optional improvement
func expectClusterObjectSetCondition(ctx context.Context, name, conditionType string, status metav1.ConditionStatus, reason string) { + // Timeout is 12 minutes to accommodate the 10-minute progress deadline in test case 88331, + // plus buffer time for controller processing and status updates. eventually(func(g o.Gomega) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openshift/tests-extension/test/qe/specs/olmv1_ce_progress_deadline.go` around lines 405 - 416, Add a comment above the eventually function call in expectClusterObjectSetCondition to document why the 12-minute timeout is used. The comment should explain that this timeout is intentionally longer than other assertions (3-5 minutes) to wait beyond the 10-minute progress deadline configured in test case 88331, establishing the relationship between the timeout value and the deadline requirement for maintainability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@openshift/tests-extension/test/qe/specs/olmv1_ce_progress_deadline.go`:
- Around line 263-264: The error message in the o.Expect assertion for the oc
start-build command includes the full output variable, which can contain
extensive build logs violating QE guidelines. Remove the output variable from
the o.Expect error message assertion at the line where start-build is run and
Run method is called, and instead log the output separately using g.By() before
the assertion or truncate the output to a reasonable size if the error needs to
include diagnostic information. This ensures large build logs are handled
through proper logging mechanisms rather than being embedded in the error
expectation message.
---
Nitpick comments:
In `@openshift/tests-extension/test/qe/specs/olmv1_ce_progress_deadline.go`:
- Around line 405-416: Add a comment above the eventually function call in
expectClusterObjectSetCondition to document why the 12-minute timeout is used.
The comment should explain that this timeout is intentionally longer than other
assertions (3-5 minutes) to wait beyond the 10-minute progress deadline
configured in test case 88331, establishing the relationship between the timeout
value and the deadline requirement for maintainability.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2af580ca-f800-4db0-8859-d59b8a61f185
📒 Files selected for processing (2)
openshift/tests-extension/.openshift-tests-extension/openshift_payload_olmv1.jsonopenshift/tests-extension/test/qe/specs/olmv1_ce_progress_deadline.go
| output, err := oc.AsAdmin().WithoutNamespace().Run("start-build").Args(name, "-n", namespace, "--from-archive="+archive, "--wait").Output() | ||
| o.Expect(err).NotTo(o.HaveOccurred(), "failed to build image %s: %s", name, output) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Don't include large build output in error messages.
The output variable from oc start-build could contain extensive build logs (hundreds of lines). Including this directly in the o.Expect error message violates the QE guideline: "Don't put large log outputs in error messages (use proper log messages instead of o.Expect with large output)".
Consider logging the output separately with g.By() or truncating it:
♻️ Suggested fix
output, err := oc.AsAdmin().WithoutNamespace().Run("start-build").Args(name, "-n", namespace, "--from-archive="+archive, "--wait").Output()
-o.Expect(err).NotTo(o.HaveOccurred(), "failed to build image %s: %s", name, output)
+if err != nil {
+ g.By(fmt.Sprintf("Build output for %s:\n%s", name, output))
+ o.Expect(err).NotTo(o.HaveOccurred(), "failed to build image %s", name)
+}Or truncate the output:
output, err := oc.AsAdmin().WithoutNamespace().Run("start-build").Args(name, "-n", namespace, "--from-archive="+archive, "--wait").Output()
-o.Expect(err).NotTo(o.HaveOccurred(), "failed to build image %s: %s", name, output)
+truncated := output
+if len(truncated) > 500 {
+ truncated = truncated[:500] + "... (truncated)"
+}
+o.Expect(err).NotTo(o.HaveOccurred(), "failed to build image %s: %s", name, truncated)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openshift/tests-extension/test/qe/specs/olmv1_ce_progress_deadline.go` around
lines 263 - 264, The error message in the o.Expect assertion for the oc
start-build command includes the full output variable, which can contain
extensive build logs violating QE guidelines. Remove the output variable from
the o.Expect error message assertion at the line where start-build is run and
Run method is called, and instead log the output separately using g.By() before
the assertion or truncate the output to a reasonable size if the error needs to
include diagnostic information. This ensures large build logs are handled
through proper logging mechanisms rather than being embedded in the error
expectation message.
Source: Coding guidelines
|
/retest |
|
/test e2e-aws-upgrade-ovn-single-node |
4 similar comments
|
/test e2e-aws-upgrade-ovn-single-node |
|
/test e2e-aws-upgrade-ovn-single-node |
|
/test e2e-aws-upgrade-ovn-single-node |
|
/test e2e-aws-upgrade-ovn-single-node |
| return input | ||
| } | ||
|
|
||
| func expectClusterCatalogServing(ctx context.Context, name string) { |
There was a problem hiding this comment.
suggestion: The file introduces expectClusterCatalogServing, expectClusterExtensionCondition, expectClusterObjectSetCondition, and deleteObject — all of which duplicate existing helpers:
expectClusterCatalogServing→helpers.ExpectCatalogToBeServing(pkg/helpers/catalogs.go:38)expectClusterExtensionCondition→olmv1util.ClusterExtensionDescription.WaitClusterExtensionConditiondeleteObject→ cleanup functions returned byhelpers.CreateClusterCatalog,helpers.EnsureCleanupClusterExtension, etc.
Reusing the existing helpers would cut ~34 lines and keep timeouts consistent across the test suite (these use hardcoded 3/5/12 min vs the established DefaultTimeout/DefaultPolling constants).
There was a problem hiding this comment.
deleteObject is still used for a few things that don't appear to have relevant helpers - but I've otherwise cleaned things up by using some other available helpers for clusterextension and catalog.
| command: | ||
| - /scripts/httpd.sh | ||
| ports: | ||
| - containerPort: 8080 |
There was a problem hiding this comment.
suggestion: containerPort: 8080 but httpd.sh listens on 8081 (and all probes correctly target 8081). The field is informational-only so nothing breaks, but it's misleading if someone reads the CSV template later. Consider changing to 8081.
Automate the ClusterExtension rollout failure coverage for OCP-88331 and OCP-88332 by building in-cluster bundle and catalog images for successful and failing bundle versions. The new QE specs verify ProgressDeadlineExceeded on an initial failed rollout and ProbeFailure while upgrading to a bad revision under the BoxCutter runtime. Signed-off-by: Daniel Franz <dfranz@redhat.com> Co-authored-by: Bruno Andrade <bruno.balint@gmail.com>
1d56a97 to
41eb704
Compare
|
@dtfranz: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
perdasilva
left a comment
There was a problem hiding this comment.
Code review of the progress deadline QE tests. 9 findings below (ranked by severity).
| g.AfterEach(func() { | ||
| if g.CurrentSpecReport().Failed() { | ||
| helpers.DescribeAllClusterCatalogs(context.Background()) | ||
| helpers.DescribeAllClusterExtensions(context.Background(), "") |
There was a problem hiding this comment.
Bug: helpers.DescribeAllClusterExtensions(context.Background(), "") passes an empty namespace, which causes the function to return immediately (dump.go:88-90) — no ClusterExtension diagnostic output is ever printed on test failure.
All 12 other call sites in the codebase pass a real namespace. The fixture creates resources in a known namespace (e.g., "ns-88331") but that value is not captured here.
Suggested fix: capture the fixture namespace and pass it:
g.AfterEach(func() {
if g.CurrentSpecReport().Failed() {
helpers.DescribeAllClusterCatalogs(context.Background())
helpers.DescribeAllClusterExtensions(context.Background(), fixture.Namespace)
}
})This requires restructuring so fixture is available in the AfterEach scope.
| g.By("creating a test namespace") | ||
| ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: namespace}} | ||
| o.Expect(k8sClient.Create(ctx, ns)).To(o.Succeed(), "failed to create Namespace") | ||
| g.DeferCleanup(deleteObject, ns) |
There was a problem hiding this comment.
Convention violation: DeferCleanup is registered after Create, but the AGENTS.md convention says:
Register defer cleanup BEFORE creating resources — Pattern:
defer resource.Delete(oc)thenresource.Create(oc)— Why: Ensures cleanup even if Create partially succeeds then fails
If Create succeeds server-side but returns a network error, o.Expect fails, DeferCleanup is never reached, and the Namespace leaks.
Same issue at lines 145, 173, 200 (RoleBinding, ImageStream, BuildConfig).
Suggested fix for each — register cleanup first:
g.DeferCleanup(deleteObject, ns)
o.Expect(k8sClient.Create(ctx, ns)).To(o.Succeed(), "failed to create Namespace")| func deleteObject(obj client.Object) { | ||
| err := env.Get().K8sClient.Delete(context.Background(), obj) | ||
| if err != nil && !errors.IsNotFound(err) { | ||
| o.Expect(err).NotTo(o.HaveOccurred(), "failed to delete %s %s", obj.GetObjectKind().GroupVersionKind().Kind, obj.GetName()) |
There was a problem hiding this comment.
Diagnostics issue: obj.GetObjectKind().GroupVersionKind().Kind returns "" for typed Go structs that haven't been through scheme conversion. The error message will read "failed to delete ns-88331" (empty kind).
Also, calling o.Expect(err).NotTo(...) during cleanup can mask the original test failure. The dominant codebase pattern in in_cluster_bundles.go uses _ = k8sClient.Delete(...) to silently ignore cleanup errors.
Suggested fix — use client.IgnoreNotFound and drop the function:
g.DeferCleanup(func() {
_ = env.Get().K8sClient.Delete(context.Background(), obj)
})Or if you want to keep the helper, use client.IgnoreNotFound(err) and log warnings instead of asserting.
| } | ||
| } | ||
|
|
||
| func buildImage(ctx g.SpecContext, oc *exutil.CLI, namespace, name string, files map[string][]byte) { |
There was a problem hiding this comment.
Reuse: buildImage, createBuildArchive, namespace creation, and image-puller RoleBinding setup (lines 120-234) reimplement functionality already present in pkg/helpers/in_cluster_bundles.go (createImageStream, createBuildConfig, startBuild, waitForBuildToFinish, createTempTarBall, createNamespace, createImagePullerRoleBinding).
Notably, the existing waitForBuildToFinish dumps oc get build -oyaml and oc logs build/... --tail=200 on failure (lines 324-331), which the new code does not — build failures here will produce opaque errors.
Consider exporting the existing helpers (or extracting a shared higher-level function) rather than maintaining a parallel implementation.
| g.DeferCleanup(cleanup) | ||
|
|
||
| g.By("waiting for the first ClusterObjectSet to report ProgressDeadlineExceeded") | ||
| expectClusterObjectSetCondition(ctx, ceName+"-1", olmv1.TypeProgressing, metav1.ConditionFalse, olmv1.ReasonProgressDeadlineExceeded) |
There was a problem hiding this comment.
Fragility: ClusterObjectSet names are hardcoded as ceName+"-1" and ceName+"-2" (also at lines 98, 101, 104), coupling tightly to the controller's fmt.Sprintf("%s-%d", ext.Name, revisionNumber) naming convention.
The test already has expectActiveRevisions which reads names from status.ActiveRevisions — the same approach could discover COS names dynamically, making the tests resilient to future naming changes.
Suggested approach:
revisions := getActiveRevisions(ctx, ceName)
o.Expect(revisions).To(o.HaveLen(1))
expectClusterObjectSetCondition(ctx, revisions[0], ...)| }, 3*time.Minute) | ||
| } | ||
|
|
||
| func eventually(callback func(o.Gomega), timeout time.Duration) { |
There was a problem hiding this comment.
Missing context propagation: The eventually wrapper does not call .WithContext(ctx), so if the Ginkgo SpecContext deadline fires, the Eventually continues polling for the full timeout (up to 12 minutes) before failing.
All three callers receive a ctx parameter. Gomega supports .WithContext(ctx) for early cancellation.
Suggested fix:
func eventually(ctx context.Context, callback func(o.Gomega), timeout time.Duration) {
o.Eventually(callback).WithContext(ctx).WithTimeout(timeout).WithPolling(helpers.DefaultPolling).Should(o.Succeed())
}| g.DeferCleanup(cleanup) | ||
|
|
||
| g.By("waiting for the initial installation to complete") | ||
| expectClusterExtensionCondition(ctx, ceName, olmv1.TypeInstalled, metav1.ConditionTrue, "", "") |
There was a problem hiding this comment.
nit: These two sequential expectClusterExtensionCondition calls re-fetch the same ClusterExtension object to check different conditions. A single Eventually block checking both TypeInstalled=True and TypeProgressing=True/ReasonSucceeded would halve API calls and avoid a 6-minute worst-case timeout (3+3). Same pattern at lines 101-104 with the COS (24-minute worst case: 12+12).
| } | ||
| } | ||
|
|
||
| const bundleDockerfile = `FROM scratch |
There was a problem hiding this comment.
nit: 130+ lines of inline YAML templates bypass the go-bindata pipeline (pkg/bindata/) used by other in-cluster-build tests. This means the YAML is never validated by make bindata / make verify, and can't be reused by future tests without copy-paste. Consider moving these to pkg/bindata/qe/ for consistency.
| expectClusterObjectSetCondition(ctx, ceName+"-1", olmv1.TypeProgressing, metav1.ConditionFalse, olmv1.ReasonProgressDeadlineExceeded) | ||
|
|
||
| g.By("verifying the ClusterExtension reports ProgressDeadlineExceeded") | ||
| expectClusterExtensionCondition(ctx, ceName, olmv1.TypeProgressing, metav1.ConditionFalse, olmv1.ReasonProgressDeadlineExceeded, "Revision has not rolled out for 10 minute(s).") |
There was a problem hiding this comment.
nit: The exact substring "Revision has not rolled out for 10 minute(s)." is tightly coupled to the controller's message formatting. If the wording changes even slightly (drops the period, changes pluralization), this test fails for a cosmetic reason. The Reason check (ProgressDeadlineExceeded) already validates the semantic behavior — consider dropping the message assertion or matching a looser substring like "10 minute".
Automate the ClusterExtension rollout failure coverage for OCP-88331 and OCP-88332 by building in-cluster bundle and catalog images for successful and failing bundle versions.
The new QE specs verify ProgressDeadlineExceeded on an initial failed rollout and ProbeFailure while upgrading to a bad revision under the BoxCutter runtime.
Supersedes #745
Additional changes:
httpdscriptTest pass shown here: PR, CI run
Summary by CodeRabbit
ProgressDeadlineMinutesin test setups.