[LMCROSSITXSADEPLOY-3316] Introduce health-check-interval MTA module parameter by karrgov · Pull Request #1848 · cloudfoundry/multiapps-controller

karrgov · 2026-05-29T12:14:42Z

Summary

Introduces the health-check-interval MTA module parameter for liveness health checks on CF apps deployed via MTA, achieving feature parity with the underlying Cloud Foundry capability now exposed by the upgraded CF Java client.

Example usage in mta.yaml:

modules:
  - name: my-app
    type: application
    parameters:
      health-check-interval: 15

Changes

End-to-end wiring of the new parameter:

SupportedParameters — adds HEALTH_CHECK_INTERVAL constant to MODULE_PARAMETERS.
Messages — adds INVALID_HEALTH_CHECK_INTERVAL validation message.
StagingParametersParser — parses, validates (must be > 0, throws ContentException otherwise), and forwards the value to ImmutableStaging.
Staging interface and CloudProcess — adds getHealthCheckInterval() accessor (Immutables regenerates the implementations).
RawCloudProcess — maps Data.getInterval() from the CF API response into CloudProcess.
HealthCheckInfo — adds the interval field, getter, and includes it in equals so change detection notices interval drifts.
CloudControllerRestClientImpl — forwards the interval to the Data builder; widens the updateApplicationProcess guard to also fire when only the interval changed; guards buildHealthCheck against HealthCheckType.from(null) NPE.
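
The parse-and-validate step described for StagingParametersParser can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name, the helper method, the message wording, and the stand-in ContentException are all assumptions made for the example. Only the parameter name, the "must be > 0" rule, and the "null when absent" behavior come from the description above.

```java
import java.util.Map;

// Sketch only: class, method, and message text are illustrative, not taken from the PR diff.
final class HealthCheckIntervalParsingSketch {

    static final String HEALTH_CHECK_INTERVAL = "health-check-interval";

    // Returns null when the parameter is absent, mirroring the "null when absent" test case.
    static Integer parseHealthCheckInterval(Map<String, Object> moduleParameters) {
        Object raw = moduleParameters.get(HEALTH_CHECK_INTERVAL);
        if (raw == null) {
            return null;
        }
        if (!(raw instanceof Integer) || (Integer) raw <= 0) {
            // Assumed wording; the real INVALID_HEALTH_CHECK_INTERVAL text is not shown in the PR.
            throw new ContentException(String.format(
                "Invalid value \"%s\" for \"%s\": must be an integer greater than 0",
                raw, HEALTH_CHECK_INTERVAL));
        }
        return (Integer) raw;
    }

    public static void main(String[] args) {
        System.out.println(parseHealthCheckInterval(Map.of(HEALTH_CHECK_INTERVAL, 15)));
        System.out.println(parseHealthCheckInterval(Map.of()));
        try {
            parseHealthCheckInterval(Map.of(HEALTH_CHECK_INTERVAL, 0));
        } catch (ContentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for the project's ContentException.
class ContentException extends RuntimeException {
    ContentException(String message) {
        super(message);
    }
}
```

Rejecting the value at parse time keeps an invalid interval from ever reaching the CF API call.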

Tests

StagingParametersParserTest — three new tests (correct parse with interval=15, validation rejection with interval=0, null when absent), plus a parameterized test covering positive interval values (1, 15, 60, Integer.MAX_VALUE).
HealthCheckInfoTest — four tests covering equal instances, different intervals, null-vs-non-null, and fromProcess / fromStaging cross-equality.
RawCloudProcessTest — new test class covering RawCloudProcess.derive() mapping including health-check interval propagation from the CF API response.
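
The equality cases listed for HealthCheckInfoTest (equal instances, different intervals, null vs. non-null) can be illustrated with a reduced sketch. The class below is a simplified stand-in that assumes only two fields; the real HealthCheckInfo carries more, and needsProcessUpdate is a hypothetical helper showing how an equals-based comparison would surface an interval-only change.

```java
import java.util.Objects;

// Simplified sketch: the real HealthCheckInfo has more fields than the two shown here.
final class HealthCheckInfoSketch {

    private final String type;
    private final Integer interval;

    HealthCheckInfoSketch(String type, Integer interval) {
        this.type = type;
        this.interval = interval;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof HealthCheckInfoSketch)) {
            return false;
        }
        HealthCheckInfoSketch that = (HealthCheckInfoSketch) other;
        // Objects.equals is null-safe, so null vs. non-null intervals compare as different.
        return Objects.equals(type, that.type) && Objects.equals(interval, that.interval);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, interval);
    }

    // Hypothetical helper: an update is needed whenever the desired state differs from the current one.
    static boolean needsProcessUpdate(HealthCheckInfoSketch current, HealthCheckInfoSketch desired) {
        return !current.equals(desired);
    }

    public static void main(String[] args) {
        HealthCheckInfoSketch current = new HealthCheckInfoSketch("http", 30);
        System.out.println(needsProcessUpdate(current, new HealthCheckInfoSketch("http", 30))); // false
        System.out.println(needsProcessUpdate(current, new HealthCheckInfoSketch("http", 15))); // true
        System.out.println(needsProcessUpdate(new HealthCheckInfoSketch("http", null), current)); // true
    }
}
```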

Jira

JIRA: LMCROSSITXSADEPLOY-3316 — Introduce Health check interval

Test plan

mvn clean test -pl multiapps-controller-client,multiapps-controller-core passes locally.
Deploy a sample MTA with health-check-interval: 15 and confirm the CF app's process reports the interval via cf curl against the v3 process endpoint.
Update an existing MTA's health-check-interval and confirm the controller detects the drift and issues an update (rather than a no-op).
Negative case: deploy with health-check-interval: 0 and confirm ContentException with INVALID_HEALTH_CHECK_INTERVAL.

Introduce the liveness health check interval parameter end-to-end:
- SupportedParameters: new HEALTH_CHECK_INTERVAL constant in MODULE_PARAMETERS
- Messages: INVALID_HEALTH_CHECK_INTERVAL validation error message
- StagingParametersParser: parse, validate (must be > 0), and forward to ImmutableStaging builder
- Staging interface + CloudProcess: getHealthCheckInterval() accessors (Immutables regenerates)
- RawCloudProcess: map Data.getInterval() from CF API response into CloudProcess
- HealthCheckInfo: add interval field, getter, and equals check for change detection
- CloudControllerRestClientImpl: forward interval to Data builder; widen updateApplicationProcess guard to also fire when interval is non-null; guard buildHealthCheck against HealthCheckType.from(null) NPE
JIRA:LMCROSSITXSADEPLOY-3316

- StagingParametersParserTest: three new tests covering correct parse (interval=15), validation rejection (interval=0 throws ContentException), and null when absent
- HealthCheckInfoTest: four tests covering equal instances, different intervals, null-vs-non-null, and fromProcess/fromStaging cross-equality
JIRA:LMCROSSITXSADEPLOY-3316

…ess test coverage
- StagingParametersParserTest: parameterized test covering positive interval values (1, 15, 60, Integer.MAX_VALUE)
- RawCloudProcessTest: new test class covering RawCloudProcess.derive() mapping including the new health-check interval propagation from the CF API response
JIRA:LMCROSSITXSADEPLOY-3316

karrgov · 2026-05-29T12:44:26Z

MTA Quality Report — cloudfoundry/multiapps-controller PR #1848

Jira: LMCROSSITXSADEPLOY-3316 — Introduce Health check interval
Backlog alignment: PASS

Implements Jira scope? yes — PR wires the health-check-interval MTA module parameter end-to-end, exactly the scenario described in the Jira description.
Changes outside Jira scope? no — All edits are scoped to health-check-interval propagation and its tests; no unrelated refactors.

Code Review

No code-review findings (confidence ≥ 80).

Security

No issues found.

No new untrusted-input sinks (no LOGGER, exec, deserialization, or file/archive paths modified).
New parameter is validated server-side (validateHealthCheckInterval rejects <= 0) before being forwarded to CF.
buildHealthCheck now guards HealthCheckType.from(null) against NPE — a small defensive hardening, not a vulnerability fix.
No new dependencies → no CVE surface change.

SonarCloud

⚠️ The build GitHub Actions job failed in the Sonar Scan step with the error "Project not found. Please check the 'sonar.projectKey' and 'sonar.organization' properties, the 'SONAR_TOKEN' environment variable, or contact the project administrator" (job log). This is a CI/CD configuration issue unrelated to this PR's contents: the Sonar project binding or token is invalid. Unit tests in the upstream Maven modules all passed before the Sonar Scan step ran. No SonarCloud quality-gate verdict could be obtained for this head SHA.

Other checks on the head SHA:

Check	Conclusion
CodeQL	✅ success — No new alerts in code changed by this pull request
Build and analyze	✅ success
Analyze (java)	✅ success
Check Commit Message	✅ success
EasyCLA	✅ success
build (Sonar Scan step)	❌ failure — Sonar token / project binding misconfigured

Dependency CVEs

✅ No new CVEs — this PR does not modify any dependency files (pom.xml, build.gradle, lockfiles).

karrgov · 2026-05-29T14:17:51Z

`oq` test verdict: FAIL

Recommendation: do not merge as-is — OQ failed broadly (30 scenarios), but log analysis finds zero PR-attributable regressions; investigate the CF target / re-run OQ before drawing a code conclusion.

PR: #1848 @ fe7b56add19f021efed65feb86df45d9cabe5858
CF target: deploy-service / sap_btp_cf_mta_deploy+technical1 (app: deploy-service)
Window: 2026-05-29T13:20:22Z → 2026-05-29T13:52:17Z
Pipeline: http://gcpclm950064:8080/teams/main/pipelines/qa-tester

Pipeline outcomes

Stage	Result	Notes
Deploy (`deploy-service-pusher-oq`)	PASS	—
Tests (`qa-tester`)	FAIL	30 non-paused Concourse jobs failed/errored/aborted
Log analysis (`log-analyzer`)	FAIL	OQ_RESULT=FAIL (30 scenarios). 0 regression suspects; all WARN/ERROR are catalog-expected or known infra noise. No stack frame or logger intersects the 11 files changed by this PR. Breadth + 22 scenarios with no server-side error evidence point to a shared infrastructure disruption (CF API rate limiting visible at 13:39–13:41Z), not a code regression.

Verdict rationale

Verdict is FAIL because OQ reported 30 failing scenarios: the test signal is unambiguously red, and a run that did not pass its tests will not be passed. However, the supporting log analysis is unusual. log-analyzer ran cleanly, sifted 5,872 WARN / 48,750 ERROR entries, classified 179 as test-driven (catalog-expected) and 54,443 as known infrastructure noise (auditlog binding absent, ANS not configured, CSRF on whitelisting probes, etc.), and found zero regression suspects. In other words, nothing in the log window touches the 11 files in this PR's diff (a small additive change introducing the health-check-interval MTA parameter).

22 of the 30 failed scenarios produced no server-side WARN/ERROR at all, which is consistent with a test-runner / CF-target disruption (rate limiting, space misconfig) rather than a code defect. Recommendation: hold the merge, investigate CF infra, and re-run OQ; do not interpret this run as evidence of a regression in PR #1848.

Failed jobs / scenarios

application-hooks
async-service-bindings-scenario
async-service-keys-scenario
bg-deploy-stop-reorder
blue-green-deploy
cleaners-and-clean-up-job
cts-basic-auth-error
cts-basic-auth-error-new-slp-api
cts-blue-green
cts-custom-idp-authentication
cts-multipart-file-uploads
cts-multiple-mtas-deploy
cts-oauth-error
cts-oauth-error-new-slp-api
gacd-in-deployed-after
generic-content-deploy
hook-target-app
liquibase-lock-service
namespace-multiple-deploys
occasional-message-for-non-finishing-task-execution
only-async-services-scenario
optional-mta-resources-scenario
passing-secrets-during-deployment
selective-deployment-scenario
service-tags
shared-private-domain-scenario
test-shutdown-client
update-service-scenario
whitelisting-visibility-failure-scenario
whitelisting-visibility-in-current-org-space-scenario

PR change surface

Files changed: 11 (additive health-check-interval MTA parameter support)
Modules touched: multiapps-controller (no multiapps or xsa-multiapps-controller changes). Affected classes: RawCloudProcess, CloudProcess, Staging, CloudControllerRestClientImpl, HealthCheckInfo, Messages, SupportedParameters, StagingParametersParser (+ 3 unit tests). Note: PR diff was not re-fetched at publish time (file list taken from log-analyzer's diff summary); module attribution carries the analyzer's confidence.
Suspect overlap: none — log-analyzer reported 0 regression suspects, so no changed file in this PR overlaps any signature in the log window.

Log analysis summary

Expected (test-driven): 179
Infrastructure / transient: 54,443
Potentially regression-related: 0
Likely caused by PR: 0
Unlikely caused by PR: 0
Inconclusive: 0
Version skew: post-release normal (multiapps-controller pins 2.48.0; multiapps moved to 2.49.0-SNAPSHOT) — analyzer flagged this as the benign post-release variant, not the "deploy stale code" variant.

Full log-analyzer findings

Log Analyzer — oq verdict: FAIL

Test outcome (from orchestrator): FAILED
CF target: deploy-service / sap_btp_cf_mta_deploy+technical1 (app: deploy-service, deployed sha: fe7b56a)
Window: 2026-05-29T13:20:22Z → 2026-05-29T13:52:17Z
Index queried: logs-*
Total WARN: 5,872 | Total ERROR: 48,750 | Truncated: no

Verdict rationale

The overall verdict is FAIL because the orchestrator reported OQ_RESULT=FAIL (30 scenarios failed). However, the log analysis finds zero Bucket C regression suspects: every WARN/ERROR entry in the window is attributable to either known OQ scenario behavior (Bucket A, 179 catalog-matched hits) or recurring infrastructure/configuration noise unrelated to the PR's changes (Bucket B, 54,443 hits). The 30-scenario failure breadth is characteristic of a shared infrastructure disruption (see below) rather than a code regression. The PR diff (health-check-interval parameter support) touches 11 files (8 production classes and 3 test classes), none of which intersects any stack frame, logger name, or exception message observed in the log window. The log analysis finds no evidence that this PR caused the OQ failures.

Local git state at analysis time

Sub-project	Branch	On feature branch
multiapps-controller	qa-pr-1848	yes
multiapps	master	no
xsa-multiapps-controller	master	no
XSOQTests	feature/LMCROSSITXSADEPLOY-3316	yes
cf-mta-examples	feature/LMCROSSITXSADEPLOY-3316	yes
multiapps-cli-plugin	master	no

No uncommitted local changes were found in any sub-project. The deployed WAR corresponds to the tip of qa-pr-1848 (fe7b56a), which carries 3 commits relative to master — all part of the health-check-interval feature.

Deploy chain version pinning

Source of truth	Truth value	Declared in downstream	Declared value	Status
`multiapps/pom.xml` `<version>`	`2.49.0-SNAPSHOT`	`multiapps-controller` `<multiapps.version>`	`2.48.0`	SKEW (expected)
`multiapps/pom.xml` `<version>`	`2.49.0-SNAPSHOT`	`xsa-multiapps-controller` `<multiapps.version>`	`2.48.0`	SKEW (expected)
`multiapps-controller/pom.xml` `<version>`	`2.48.0-SNAPSHOT`	`xsa-multiapps-controller` `<multiapps-controller.version>`	`2.48.0-SNAPSHOT`	OK

Assessment: The apparent skew (2.49.0-SNAPSHOT vs 2.48.0) is the normal post-release state. multiapps 2.48.0 was released and multiapps-controller pins to that released artifact. multiapps has moved on to 2.49.0-SNAPSHOT for the next development cycle. Since 2.48.0 is a published artifact in the Maven repository, multiapps-controller resolves it correctly — this is not the "deploy stale code" variant of the version skew described in CLAUDE.md. No Bucket C escalation required.

Categorization

Category	Count
Expected (test-driven, catalog-matched)	179
Infrastructure / transient (Bucket B)	54,443
Potentially regression-related (Bucket C)	0

Bucket A — Expected (test-driven) detail

The OQ reference catalog matched 179 entries across 4 scenarios:

Scenario	Count	Kind
generic-content-deploy	152	content_error
timeout-scenario	6	timeout
service-deletion-failed-scenario	2	service_deletion_failure
app-staging-failure	1	staging_failure
(global signature match)	18	broker_failure / unsupported_parameter

Note: health-check-interval-scenario (the new scenario added by this PR's XSOQTests branch) is not in the OQ catalog yet and did not appear in the orchestrator's failed_scenarios list — it was apparently not executed in this pipeline run (expected: the scenario is on the feature branch but the pipeline YAML has not been updated to include it yet).

Note: 22 of the 30 failed scenarios are not present in the current OQ reference catalog (application-hooks, all cts-* variants, gacd-in-deployed-after, liquibase-lock-service, namespace-multiple-deploys, occasional-message-for-non-finishing-task-execution, only-async-services-scenario, passing-secrets-during-deployment, selective-deployment-scenario, service-tags, shared-private-domain-scenario, test-shutdown-client, update-service-scenario, whitelisting-visibility-*). Their expected error signatures are not modeled; however, their failure logs do not appear in the WARN/ERROR stream in a form that the triage engine recognized as regression-marker-bearing. This gap (22 failed scenarios missing from the 29-scenario catalog) warrants catalog expansion but does not change the regression verdict for this PR.

Bucket B — Infrastructure / transient detail

Signature	Count	Classification
`AuditLogNotAvailableException`: Failed to write message to the audit log	45,001	Infrastructure — auditlog service not bound in this OQ space (known); emitted on every deployment operation
Ignoring parameter "namespace", as the MTA is not deployed with namespace!	3,690	Expected behavior WARN — scenarios deploy MTAs without namespace; logged per-resource
`EmptyAnsProducerClientException`: Notification for Unknown NOT sent to ANS: Configuration missing	1,358	Infrastructure — ANS (Alert Notification Service) not configured in OQ space; known
`MissingCsrfTokenException`: Request "POST …" failed with "Could not verify the provided CSRF token"	2,835	Test-induced — OQ scenarios probe CSRF-protected endpoints without a prior GET to seed the token; consistent with whitelisting/CTS test patterns
Skipping deletion of services, because --delete-services is not specified	279	Expected behavior WARN — nominal
`RejectedExecutionException`: task rejected from ThreadPoolExecutor (pool size=6, active=6)	2	Transient — upload thread pool momentarily saturated during concurrent OQ runs; retried
`TooManyRequests` (429) from GET /v3/roles	8	Infrastructure — CF API rate limiting; handled by ResilientOperationExecutor with retry
`ContentException`: Error merging descriptors: Unsupported resource type "auditlog" for platform type "CLOUD-FOUNDRY"	8	Expected behavior — CTS/XSA scenarios exercise resource types unsupported on CF; produces ContentException deliberately
`NullPointerException` in OperationInFinalStateHandler.deletePreviousBackupDescriptors	23	Infra/pre-existing — NPE arises when DeploymentDescriptor is null (i.e. a process completed without leaving a backup descriptor). OperationInFinalStateHandler is NOT in the PR diff (last touched by commit `4f32db2`, pre-dates this PR). Logger: SafeExecutor wraps and logs as WARN — non-fatal.
`NotFoundException`: MTA with name "anatz-severe-error"/"ztana" does not exist	16	Expected behavior — undeploy scenarios targeting MTAs not yet deployed
`CloudOperationException`: 404 Not Found: Service instance not found	9	Expected behavior — optional-resources scenarios deliberately reference non-existent services
`ResponseStatusException` 403/401	15	Expected behavior — whitelisting and CTS-auth scenarios deliberately trigger authorization failures
`InternalAuthenticationServiceException`: Invalid JWT / No token parser found	4	Expected behavior — token-expiration and invalid-auth scenarios
`ContentDeployerException`: HTTP 413 Payload Too Large from GACD sync endpoint	2	Expected behavior — gacd-in-deployed-after scenario sends oversized payload to test error handling
`StepPhaseRetryException`: A step of the process has failed	36	Expected behavior — retry wrappers for Flowable step failures
`SLException` / `ContentException`: Service plan not found / rollback errors	556	Expected behavior — various error scenarios deliberately trigger these

All Bucket B entries pre-date or are orthogonal to the PR's changes. None of the loggers or stack frames listed above intersect the 11 files modified by PR #1848. The 45,001 audit log entries (82.4% of total volume) and 2,835 CSRF entries (5.2%) are the dominant noise sources and are both longstanding infrastructure characteristics of this OQ space.
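
The totals and percentages quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only figures from this report (5,872 WARN, 48,750 ERROR, 179 Bucket A, 54,443 Bucket B, 45,001 audit log, 2,835 CSRF); the class and method names are made up for the example.

```java
import java.util.Locale;

// Sketch only: recomputes the shares quoted in the report from its own counts.
final class LogVolumeShareSketch {

    // Formats count/total as a percentage with one decimal place.
    static String sharePercent(long count, long total) {
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * count / total);
    }

    public static void main(String[] args) {
        long total = 5_872L + 48_750L; // WARN + ERROR entries in the window
        // Bucket A + Bucket B should account for every entry when Bucket C is empty.
        System.out.println(total == 179L + 54_443L); // true
        System.out.println(sharePercent(45_001L, total)); // 82.4%
        System.out.println(sharePercent(2_835L, total)); // 5.2%
    }
}
```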

Per-suspect attribution

There are no Bucket C suspects. The triage produced zero entries in /tmp/cls_suspects_raw.json (0 unexpected hits, 0 indeterminate hits with regression markers). Accordingly there are no rows in the attribution table and no "Strong attributions" section.

PR diff summary (for context)

PR #1848 adds health-check-interval as a new MTA module parameter. The 11 changed files (all in multiapps-controller, no multiapps or xsa-multiapps-controller changes) are:

File	Kind	Risk markers
`RawCloudProcess.java`	java-prod	none — purely additive field extraction
`CloudProcess.java`	java-prod	none — additive abstract getter
`Staging.java`	java-prod	none — additive interface method
`CloudControllerRestClientImpl.java`	java-prod	none — condition widened to allow interval-only update; `buildHealthCheck` refactored to make type optional
`HealthCheckInfo.java`	java-prod	none — additive field + equality update
`Messages.java`	java-prod	none — new error constant
`SupportedParameters.java`	java-prod	none — new constant added to allow-list
`StagingParametersParser.java`	java-prod	none — additive parameter parsing + validation guard (rejects ≤0)
`RawCloudProcessTest.java`	java-test	n/a
`HealthCheckInfoTest.java`	java-test	n/a
`StagingParametersParserTest.java`	java-test	n/a

The CloudControllerRestClientImpl change is the most behavior-affecting: when healthCheckType is null but healthCheckInterval is non-null, the CF API PATCH now sends a HealthCheck body without a type field. This is correct per CF API v3 (type defaults to process when omitted) and only fires if an MTA explicitly sets health-check-interval without health-check-type. None of the 30 failed OQ scenarios set health-check-interval, so this code path was never exercised.
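
The null guard described for buildHealthCheck can be sketched as below. Everything here is a stand-in written for illustration: the nested HealthCheckType enum only mimics a from(String) that fails on null, and the payload is modeled as a plain map shaped like the v3 health_check object (type plus a data block holding interval). The actual PR builds the request through the CF Java client's builders, which are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: stand-in types, not the CF Java client or the PR's implementation.
final class BuildHealthCheckSketch {

    enum HealthCheckType {
        HTTP, PORT, PROCESS;

        // Mimics a lookup that throws NullPointerException when value is null.
        static HealthCheckType from(String value) {
            return valueOf(value.toUpperCase(Locale.ROOT));
        }
    }

    // Builds a health-check payload, omitting any part whose input is null.
    static Map<String, Object> buildHealthCheck(String type, Integer interval) {
        Map<String, Object> healthCheck = new LinkedHashMap<>();
        if (type != null) { // guard: never call from(null)
            healthCheck.put("type", HealthCheckType.from(type).name().toLowerCase(Locale.ROOT));
        }
        if (interval != null) {
            healthCheck.put("data", Map.of("interval", interval));
        }
        return healthCheck;
    }

    public static void main(String[] args) {
        // Interval without a type: the body carries only the data block.
        System.out.println(buildHealthCheck(null, 15));
        System.out.println(buildHealthCheck("http", 15));
    }
}
```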

Failed scenarios provided by orchestrator

30 scenarios failed. Cross-referencing against the log window:

8 catalog-backed scenarios failed (async-service-bindings-scenario, async-service-keys-scenario, bg-deploy-stop-reorder, blue-green-deploy, cleaners-and-clean-up-job, generic-content-deploy, hook-target-app, optional-mta-resources-scenario): their expected error patterns are present in Bucket A (179 catalog hits). No anomalous Bucket C entries overlap their expected windows.
22 scenarios failed with no corresponding WARN/ERROR evidence in the log window: application-hooks, all 8 cts-*, gacd-in-deployed-after, liquibase-lock-service, namespace-multiple-deploys, occasional-message-for-non-finishing-task-execution, only-async-services-scenario, passing-secrets-during-deployment, selective-deployment-scenario, service-tags, shared-private-domain-scenario, test-shutdown-client, update-service-scenario, whitelisting-visibility-failure-scenario, whitelisting-visibility-in-current-org-space-scenario.

The absence of WARN/ERROR logs from 22 failing scenarios suggests those scenarios failed at the test script level (e.g., assertion mismatch, missing artifact, network timeout from the test runner side) rather than producing server-side errors. This is consistent with a shared infrastructure disruption — for example, a CF API rate-limiting episode (8 × 429 entries visible in the window around 13:39–13:41Z) or an OQ space misconfiguration — affecting scenario execution without generating server-side WARN/ERROR entries. needs_investigation=false for all entries because no suspect intersects a failed scenario with a regression marker.

OQ catalog regeneration note

The OQ reference catalog was stale (older than XSOQTests/test_resources/health-check-interval/http-health-check-interval/mtad.yaml, which is new in this PR's XSOQTests branch). The catalog was regenerated before the fetch using build_catalog.py. The regenerated catalog has 29 scenarios / 73 steps (source SHA: 43a97c0c).

Posted manually by orchestrator (pr-result-publisher subagent lacked GitHub MCP tools). Mode: oq. Generated 2026-05-29T17:10:00Z.

karrgov added 3 commits May 29, 2026 14:48

karrgov wants to merge 3 commits into cloudfoundry:master from karrgov:feature/LMCROSSITXSADEPLOY-3316
