
WIP: feat(retention): OSS 5-day community floor + prod-compose LOG_* fix (#1039 — OSS floor)#1065

Draft
dolho wants to merge 3 commits into dev from feature/1039-retention-oss-floor

dolho commented Jun 4, 2026

Summary

Stage 1 of #1039 — the OSS floor (the issue's "must land first" chunk). The enterprise retention module (private submodule) + Settings → Retention UI follow as separate PRs on the #847 seam.

Prod bug fixed (silent no-op)

docker-compose.prod.yml omitted LOG_RETENTION_DAYS / LOG_ARCHIVE_ENABLED / LOG_CLEANUP_HOUR from the backend service's environment: block. Prod launches standalone (-f docker-compose.prod.yml, with no merge of the base compose file), so an operator-set LOG_RETENTION_DAYS never reached the container and retention always fell back to the code default. Added the three lines.
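A minimal sketch of why the omission failed silently, assuming the backend reads the window through a plain environment lookup with an in-code fallback. The helper `log_retention_days` and the constant name are hypothetical; the real lookup lives in log_archive_service / logs.py and may differ.

```python
import os
from typing import Mapping

# Hypothetical constant: the in-code default this PR lowers from 90 to 5.
CODE_DEFAULT_LOG_RETENTION_DAYS = 5


def log_retention_days(env: Mapping[str, str] = os.environ) -> int:
    """Return the log retention window in days (hypothetical helper).

    When the compose file does not forward LOG_RETENTION_DAYS into the
    container, the key is simply absent from the container's environment,
    so the code default is returned without any error or warning.
    """
    raw = env.get("LOG_RETENTION_DAYS")
    if raw is None:
        return CODE_DEFAULT_LOG_RETENTION_DAYS
    return int(raw)


# The operator sets LOG_RETENTION_DAYS=30 on the host.
before_fix: dict[str, str] = {}            # prod compose never passed it through
after_fix = {"LOG_RETENTION_DAYS": "30"}   # backend environment now forwards it

print(log_retention_days(before_fix))  # 5: silently the code default
print(log_retention_days(after_fix))   # 30: the operator's value
```

Nothing in the lookup can distinguish "not configured" from "configured but not forwarded", which is why the fix belongs in the compose file rather than in code.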

5-day community floor

Lowered the operator-tunable defaults to the 5-day community floor:

| Setting | Was (days) | Now (days) |
|---|---|---|
| LOG_RETENTION_DAYS | 90 | 5 |
| execution_log_retention_days | 30 | 5 |
| execution_row_retention_days | 90 | 5 |
| health_check_retention_days | 7 | 5 |
| agent_soft_delete_retention_days | 180 | 5 |
| schedule_soft_delete_retention_days | 30 | 5 |
| audit_log_retention_days | 365 | 365 (exempt: integrity floor) |

New COMMUNITY_RETENTION_FLOOR_DAYS / RETENTION_OPS_KEYS constants — the enterprise module reuses these to clamp unentitled writes.
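One way the floor defaults and the two new constants could be laid out, as a sketch only. `COMMUNITY_RETENTION_FLOOR_DAYS`, `RETENTION_OPS_KEYS` and `OPS_SETTINGS_DEFAULTS` are named in this PR, but their exact definitions here (a read-only mapping plus a frozenset derived from it) and `AUDIT_LOG_RETENTION_FLOOR_DAYS` are assumptions.

```python
from types import MappingProxyType

# Named in the PR; value per the table above.
COMMUNITY_RETENTION_FLOOR_DAYS = 5

# Assumed shape: the five operator-tunable (OPS) windows, read-only.
OPS_SETTINGS_DEFAULTS = MappingProxyType({
    "execution_log_retention_days": COMMUNITY_RETENTION_FLOOR_DAYS,
    "execution_row_retention_days": COMMUNITY_RETENTION_FLOOR_DAYS,
    "health_check_retention_days": COMMUNITY_RETENTION_FLOOR_DAYS,
    "agent_soft_delete_retention_days": COMMUNITY_RETENTION_FLOOR_DAYS,
    "schedule_soft_delete_retention_days": COMMUNITY_RETENTION_FLOOR_DAYS,
})

# Assumed: the keys an enterprise setter may manage (and clamp) are exactly
# the OPS windows, so the set is derived rather than maintained by hand.
RETENTION_OPS_KEYS = frozenset(OPS_SETTINGS_DEFAULTS)

# Hypothetical name: the audit log keeps its own 365-day integrity floor and
# is deliberately not part of RETENTION_OPS_KEYS.
AUDIT_LOG_RETENTION_FLOOR_DAYS = 365
```

Deriving `RETENTION_OPS_KEYS` from the defaults keeps the enterprise clamp and the community defaults from drifting apart if a window is added later.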

Read surface

GET /api/settings/retention (admin) reports the effective windows in use + the active edition (community vs enterprise via the retention entitlement) + documented precedence: enterprise (license) DB setting → env → 5-day community default.
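A sketch of the documented precedence for a single window, under stated assumptions: `resolve_window`, `EffectiveWindow` and `edition` are hypothetical names, the enterprise DB value is assumed to count only while the `retention` entitlement is active, and each window is assumed to map to one env var. It is not the endpoint's actual code.

```python
from dataclasses import dataclass
from typing import Mapping

COMMUNITY_DEFAULT_DAYS = 5


@dataclass(frozen=True)
class EffectiveWindow:
    days: int
    source: str  # "enterprise", "env" or "community-default"


def edition(entitled: bool) -> str:
    """Edition reported by the read surface, keyed off the retention entitlement."""
    return "enterprise" if entitled else "community"


def resolve_window(
    key: str,
    env_var: str,
    *,
    entitled: bool,
    enterprise_db: Mapping[str, int],
    env: Mapping[str, str],
) -> EffectiveWindow:
    """Resolve one window: enterprise DB setting -> env -> community default."""
    # 1. Licensed DB setting wins, but only when entitled (assumption).
    if entitled and key in enterprise_db:
        return EffectiveWindow(enterprise_db[key], "enterprise")
    # 2. Env var: the unsupported self-host escape hatch, not clamped in OSS.
    raw = env.get(env_var)
    if raw is not None:
        return EffectiveWindow(int(raw), "env")
    # 3. Fall back to the 5-day community default.
    return EffectiveWindow(COMMUNITY_DEFAULT_DAYS, "community-default")


window = resolve_window(
    "log", "LOG_RETENTION_DAYS", entitled=False, enterprise_db={}, env={}
)
print(edition(False), window.days, window.source)
```

Returning the source alongside the value lets the read surface show the operator *why* a window has its current value, not only what it is.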

Design note

OSS does not hard-clamp env/OPS values — they remain an unsupported self-host escape hatch (per the issue: "not a cryptographic lock on a constant"). The clamp-to-floor lives in the enterprise retention module's managed setter (Stage 2).
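The managed setter's clamp can be sketched as below, assuming the rules stated elsewhere in this PR (0 disables a sweep; sub-floor values are raised to the floor). `clamp_to_floor` and `clamp_update` are hypothetical names; the real setter in the enterprise module may handle entitlement and validation differently.

```python
from typing import Mapping

COMMUNITY_RETENTION_FLOOR_DAYS = 5


def clamp_to_floor(days: int, floor: int = COMMUNITY_RETENTION_FLOOR_DAYS) -> int:
    """Clamp one requested window (in days) to the community floor."""
    if days < 0:
        raise ValueError(f"retention window must be >= 0, got {days}")
    if days == 0:
        return 0  # 0 disables the sweep, so it passes through unchanged
    return max(days, floor)  # values below the floor are raised to it


def clamp_update(update: Mapping[str, int]) -> dict[str, int]:
    """Apply the clamp to every window in a PUT-style update payload."""
    return {key: clamp_to_floor(days) for key, days in update.items()}


print(clamp_update({"execution_row": 90, "execution_log": 2}))
# {'execution_row': 90, 'execution_log': 5}
```

The env path is intentionally absent here: per the design note, only the managed setter clamps, and env/OPS values stay unclamped in OSS.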

⚠️ Release-notes call-out

This sharply shortens soft-delete recovery on community installs (agent 180→5, schedule 30→5). Deliberate per the issue; an enterprise license restores longer windows.

Verification

  • tests/unit/test_retention_floor.py → 7/7 (floor defaults, audit exemption, read-surface edition + windows, audit 365-floor, enterprise-set OPS window).
  • Cleanup/retention suites green (test_cleanup_inner_sweeps, test_execution_retention_prune, test_audit_retention_prune, test_agent_cleanup_parity).
  • Live: GET /api/settings/retention → edition: community, OPS windows 5, audit 365; backend recreated with the new compose → LOG_RETENTION_DAYS=5 flows through to the container.

Remaining #1039 (follow-up PRs)

  • Enterprise retention module (private trinity-enterprise): enterprise_retention_config table on the two-track runner, GET/PUT /api/enterprise/retention/* gated by requires_entitlement("retention"), live-read write-through, clamp-to-floor when unentitled.
  • Frontend: Settings → Retention panel gated on the retention entitlement; community shows the fixed 5-day default + upgrade hint.

Related to #1039

🤖 Generated with Claude Code

dolho commented Jun 4, 2026

Now spans all OSS-side stages of #1039 (Stages 1 + 3); the enterprise stage (Stage 2) is trinity-enterprise#4

Per the plan to deliver #1039 end-to-end, this PR now carries both public-repo stages; the enterprise backend lives in its own repo PR.

✅ Stage 1 — OSS floor (this PR)

Prod-compose LOG_* fix · 5-day floor across all 6 operator-tunable windows (audit exempt, 365) · GET /api/settings/retention read surface (+ edition + precedence) · COMMUNITY_RETENTION_FLOOR_DAYS/RETENTION_OPS_KEYS constants. 7/7 unit tests, live-verified.

✅ Stage 3 — frontend (this PR)

Settings → Retention tab (admin). Reads /api/settings/retention (all editions); community = read-only 5-day floor + upgrade hint; enterprise = editable per-class windows → PUT /api/enterprise/retention/config, applied live. Audit window shown, never editable. Gated via the existing enterprise store (isEntitled('retention')). SFC compiles clean (Vite HMR).

✅ Stage 2 — enterprise module → Abilityai/trinity-enterprise#4

Private enterprise_retention_config + GET/PUT /api/enterprise/retention/config (double-gated) + live-read write-through to OSS system_settings (no recreate) + clamp-to-floor. 5/5 module tests, enterprise suite 24/24.
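A minimal sketch of live-read write-through, using an in-memory SQLite database in place of the real stores. The table names come from this PR, but the schemas, the `init_db` / `put_retention_config` / `current_window` helpers, and the assumption that each cleanup sweep re-reads system_settings at run time are illustrative only.

```python
import sqlite3
from typing import Mapping

FLOOR_DAYS = 5


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schemas standing in for the OSS and enterprise tables.
    conn.execute("CREATE TABLE system_settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE enterprise_retention_config (key TEXT PRIMARY KEY, days INTEGER NOT NULL)"
    )


def put_retention_config(conn: sqlite3.Connection, update: Mapping[str, int]) -> None:
    """Store clamped windows privately and mirror them into system_settings."""
    with conn:  # one transaction: both tables change together or not at all
        for key, days in update.items():
            days = 0 if days == 0 else max(days, FLOOR_DAYS)
            conn.execute(
                "INSERT OR REPLACE INTO enterprise_retention_config (key, days) VALUES (?, ?)",
                (key, days),
            )
            conn.execute(
                "INSERT OR REPLACE INTO system_settings (key, value) VALUES (?, ?)",
                (key, str(days)),
            )


def current_window(conn: sqlite3.Connection, key: str, default: int = FLOOR_DAYS) -> int:
    """What a sweep reads at run time, so no restart or recreate is needed."""
    row = conn.execute("SELECT value FROM system_settings WHERE key = ?", (key,)).fetchone()
    return int(row[0]) if row else default


conn = sqlite3.connect(":memory:")
init_db(conn)
put_retention_config(
    conn, {"execution_row_retention_days": 90, "execution_log_retention_days": 2}
)
print(current_window(conn, "execution_row_retention_days"))  # 90
print(current_window(conn, "execution_log_retention_days"))  # 5
```

Because the OSS side only ever reads system_settings, it needs no knowledge of the private table; the enterprise module owns both the clamp and the mirror.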

Live end-to-end (full stack, submodule mounted)

  • enterprise_features = ['audit','retention','siem'] → read surface edition: enterprise.
  • PUT {execution_row:90, execution_log:2} → 90 allowed, 2 clamped to 5, and GET /api/settings/retention reflects execution_row=90 → write-through confirmed, no recreate.

Notes for the reviewer

  • The submodule pointer is intentionally NOT bumped in this PR (it would pin an unmerged enterprise branch); the UI gates at runtime via enterprise_features. Bump the pointer when trinity-enterprise#4 merges.
  • ⚠️ Release notes: community soft-delete recovery shrinks (agent 180→5, schedule 30→5).

Related to #1039

dolho and others added 3 commits June 4, 2026 15:39
…ap (#1039)

OSS floor for the #1039 retention work (must land first; the enterprise
`retention` module + Settings UI follow on the #847 seam).

Prod bug fix:
- docker-compose.prod.yml omitted LOG_RETENTION_DAYS / LOG_ARCHIVE_ENABLED /
  LOG_CLEANUP_HOUR from backend.environment. Prod launches standalone (no
  base-compose merge), so operator-set values never reached the container and
  retention silently fell back to the code default. Added the three lines.

Community 5-day floor (was: log 90 / exec-log 30 / exec-row 90 / health 7 /
agent soft-delete 180 / schedule soft-delete 30):
- OPS_SETTINGS_DEFAULTS: the five operator-tunable windows default to 5.
- LOG_RETENTION_DAYS default 5 (log_archive_service, logs.py, docker-compose.yml).
- Audit log EXEMPT — keeps the 365-day integrity floor (audit_retention_service).
- New COMMUNITY_RETENTION_FLOOR_DAYS / RETENTION_OPS_KEYS constants (the
  enterprise module reuses these to clamp unentitled writes).

Read surface:
- GET /api/settings/retention (admin) reports the effective windows in use +
  the active edition (community vs enterprise via the `retention` entitlement)
  + documented precedence (enterprise → env → community-default).

OSS does NOT hard-clamp env/OPS — they remain an unsupported self-host escape
hatch (per the issue: "not a cryptographic lock on a constant"); the clamp is
the enterprise module's managed setter.

Tests: tests/unit/test_retention_floor.py (7) — floor defaults, audit exemption,
read-surface edition + windows. Cleanup/retention suites green (no test pinned
the old defaults).

NOTE: this sharply shortens soft-delete recovery (agent 180→5, schedule 30→5)
on community installs — call out in release notes; enterprise restores it.

Related to #1039

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#1039 Stage 3)

Adds a Retention tab to Settings (admin-only). Reads the OSS read surface
GET /api/settings/retention (available in every edition) and shows the effective
windows + an edition badge.

- Community: read-only windows at the fixed 5-day floor + an upgrade hint.
- Enterprise (retention entitlement present): editable per-class windows that
  PUT /api/enterprise/retention/config and apply live (no restart). 0 disables
  a sweep; sub-floor values are raised to the floor (mirrors the backend clamp).
- Audit-log window always shown, never editable (365-day integrity floor).

Gating reuses the existing enterprise store (enterpriseStore.isEntitled
('retention')); the tab is visible in both editions (read-only vs editable),
matching the issue's "community shows the fixed default + upgrade hint".

Related to #1039

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ollution (#1039)

The regression-diff CI gate flagged the 4 endpoint tests in test_retention_floor
as new failures under pytest-randomly seeds 12345/67890 (but not 99999): a
classic case of test-ordering pollution, not a logic bug.

Root cause: the tests imported `routers.settings` (lazily), which drags
routers/__init__ → routers.agents → `from services.agent_service import
get_agents_by_prefix`. Another unit test (#612) loads services.agent_service
under a fake sys.modules name, so under some orderings that import resolves to
the partial fake module and raises `ImportError: cannot import name
'get_agents_by_prefix'`.
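The failure mode can be reproduced in isolation. This is a sketch: `demo_agent_service` is a hypothetical stand-in for services.agent_service so the real package is never touched; the point is only that a bare ModuleType registered in sys.modules satisfies the import lookup but lacks the names the real module defines.

```python
import sys
import types

FAKE_NAME = "demo_agent_service"  # hypothetical stand-in for services.agent_service


def pollute() -> None:
    """Mimic the other test: register a partial fake under the module's name."""
    sys.modules[FAKE_NAME] = types.ModuleType(FAKE_NAME)  # defines no attributes


def import_like_routers_agents() -> str:
    """Perform the same kind of from-import that routers.agents does."""
    try:
        from demo_agent_service import get_agents_by_prefix  # noqa: F401
    except ImportError as exc:
        return f"ImportError: {exc}"
    return "ok"


pollute()
print(import_like_routers_agents())  # an ImportError naming get_agents_by_prefix
sys.modules.pop(FAKE_NAME, None)     # undo the pollution
```

On the polluting side, registering fakes with pytest's `monkeypatch.setitem(sys.modules, name, fake)` restores the previous entry at teardown, which is one way to keep a partial module from leaking into later tests.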

Fix: load routers/settings.py directly from file under a private module name
(spec_from_file_location), bypassing routers/__init__ entirely. settings.py
imports only models/database/dependencies/services.* — none of the polluted
modules — so the load is robust regardless of collection order. Mirrors the
existing conftest EntitlementCls pattern. Also pins LOG_/AUDIT_ env per call so
a polluted process env can't leak in.
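The load follows the standard importlib recipe for importing a source file directly. A minimal sketch: `load_isolated`, `call_with_pinned_env`, `demo`, the private module name and the pinned values (including the `AUDIT_LOG_RETENTION_DAYS` variable name) are hypothetical, not the test file's actual helpers.

```python
import importlib.util
import os
import sys
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Callable, TypeVar
from unittest import mock

T = TypeVar("T")

# Hypothetical pinned values; the real test pins its own LOG_/AUDIT_ variables.
PINNED_ENV = {"LOG_RETENTION_DAYS": "5", "AUDIT_LOG_RETENTION_DAYS": "365"}


def load_isolated(path: Path, private_name: str) -> ModuleType:
    """Execute one .py file as a module without running its package's __init__."""
    spec = importlib.util.spec_from_file_location(private_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[private_name] = module  # registered under the private name only
    spec.loader.exec_module(module)
    return module


def call_with_pinned_env(fn: Callable[[], T]) -> T:
    """Call fn with LOG_/AUDIT_ vars pinned; os.environ is restored afterwards."""
    with mock.patch.dict(os.environ, PINNED_ENV):
        return fn()


def demo() -> tuple[int, bool, int]:
    """Load a throwaway settings-like file and read a window under a pinned env."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "settings_demo.py"
        path.write_text(
            "import os\n"
            "VALUE = 42\n"
            "def window():\n"
            "    return int(os.environ.get('LOG_RETENTION_DAYS', '90'))\n"
        )
        module = load_isolated(path, "_isolated_settings_demo")
    return module.VALUE, "settings_demo" in sys.modules, call_with_pinned_env(module.window)
```

Calling `demo()` loads the file under the private name only (the public name never appears in sys.modules) and reads the pinned window regardless of what the surrounding process env contains.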

Verified: full unit suite under seeds 12345 + 67890 — all 7 test_retention_floor
tests pass; the only remaining failures are the 7 pre-existing base failures
(git_pull_branch, orphaned_execution_recovery), unchanged by this PR.

Related to #1039

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dolho force-pushed the feature/1039-retention-oss-floor branch from af8175a to b10a57b June 4, 2026 12:41
dolho marked this pull request as draft June 4, 2026 14:38
dolho changed the title from "feat(retention): OSS 5-day community floor + prod-compose LOG_* fix (#1039 — OSS floor)" to "WIP: feat(retention): OSS 5-day community floor + prod-compose LOG_* fix (#1039 — OSS floor)" Jun 4, 2026

github-actions Bot commented Jun 5, 2026

⚠️ Nightly unit-suite check skipped — merge conflict against dev.

Resolve by running git merge dev locally and pushing the result. The next nightly run will re-test once the conflict is gone.
