WIP: feat(retention): OSS 5-day community floor + prod-compose LOG_* fix (#1039: OSS floor) #1065
dolho wants to merge 3 commits into …
Now spans all OSS-side stages of #1039 (Stages 1 + 3); enterprise is trinity-enterprise#4

Per the plan to deliver #1039 end-to-end, this PR now carries both public-repo stages; the enterprise backend lives in its own repo PR.

- ✅ Stage 1: OSS floor (this PR). Prod-compose …
- ✅ Stage 3: frontend (this PR). Settings → Retention tab (admin). Reads …
- ✅ Stage 2: enterprise module → Abilityai/trinity-enterprise#4. Private …

Live end-to-end (full stack, submodule mounted)
Notes for the reviewer
Related to #1039
…ap (#1039)

OSS floor for the #1039 retention work (must land first; the enterprise `retention` module + Settings UI follow on the #847 seam).

Prod bug fix:
- docker-compose.prod.yml omitted LOG_RETENTION_DAYS / LOG_ARCHIVE_ENABLED / LOG_CLEANUP_HOUR from backend.environment. Prod launches standalone (no base-compose merge), so operator-set values never reached the container and retention silently fell back to the code default. Added the three lines.

Community 5-day floor (was: log 90 / exec-log 30 / exec-row 90 / health 7 / agent soft-delete 180 / schedule soft-delete 30):
- OPS_SETTINGS_DEFAULTS: the five operator-tunable windows default to 5.
- LOG_RETENTION_DAYS default 5 (log_archive_service, logs.py, docker-compose.yml).
- Audit log EXEMPT: keeps the 365-day integrity floor (audit_retention_service).
- New COMMUNITY_RETENTION_FLOOR_DAYS / RETENTION_OPS_KEYS constants (the enterprise module reuses these to clamp unentitled writes).

Read surface:
- GET /api/settings/retention (admin) reports the effective windows in use, the active edition (community vs enterprise, via the `retention` entitlement) and the documented precedence (enterprise → env → community default). OSS does NOT hard-clamp env/OPS values; they remain an unsupported self-host escape hatch (per the issue: "not a cryptographic lock on a constant"). The clamp is the enterprise module's managed setter.

Tests: tests/unit/test_retention_floor.py (7): floor defaults, audit exemption, read-surface edition + windows. Cleanup/retention suites green (no test pinned the old defaults).

NOTE: this sharply shortens soft-delete recovery on community installs (agent 180→5, schedule 30→5); call it out in the release notes. Enterprise restores it.

Related to #1039

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
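A minimal sketch of the community floor from the commit above, assuming plain module-level constants. `COMMUNITY_RETENTION_FLOOR_DAYS`, `RETENTION_OPS_KEYS`, `OPS_SETTINGS_DEFAULTS` and the five key names come from this PR; only the retention entries are shown, and the dict layout, `AUDIT_LOG_RETENTION_DAYS` and `log_retention_days()` are hypothetical. It also shows why the prod-compose omission was silent: when `LOG_RETENTION_DAYS` never reaches the container, the lookup just returns the code default.

```python
import os
from typing import Mapping

# Named in this PR: the community floor, in days.
COMMUNITY_RETENTION_FLOOR_DAYS = 5

# The five operator-tunable windows (key names as listed in this PR).
RETENTION_OPS_KEYS = (
    "execution_log_retention_days",
    "execution_row_retention_days",
    "health_check_retention_days",
    "agent_soft_delete_retention_days",
    "schedule_soft_delete_retention_days",
)

# Every tunable window defaults to the floor (dict layout is an assumption).
OPS_SETTINGS_DEFAULTS = {key: COMMUNITY_RETENTION_FLOOR_DAYS for key in RETENTION_OPS_KEYS}

# Hypothetical name: audit retention is exempt and keeps its 365-day integrity floor.
AUDIT_LOG_RETENTION_DAYS = 365


def log_retention_days(environ: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: read LOG_RETENTION_DAYS, defaulting to the floor.

    If the variable never reaches the container (the prod-compose bug),
    this quietly returns the default instead of the operator's value.
    """
    raw = environ.get("LOG_RETENTION_DAYS")
    return int(raw) if raw else COMMUNITY_RETENTION_FLOOR_DAYS


if __name__ == "__main__":
    print(log_retention_days({}))  # 5: variable missing, code default wins
    print(log_retention_days({"LOG_RETENTION_DAYS": "30"}))  # 30
```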
…#1039 Stage 3)

Adds a Retention tab to Settings (admin-only). It reads the OSS read surface GET /api/settings/retention (available in every edition) and shows the effective windows plus an edition badge.
- Community: read-only windows at the fixed 5-day floor, plus an upgrade hint.
- Enterprise (retention entitlement present): editable per-class windows that PUT /api/enterprise/retention/config and apply live (no restart). 0 disables a sweep; sub-floor values are raised to the floor (mirrors the backend clamp).
- Audit-log window: always shown, never editable (365-day integrity floor).

Gating reuses the existing enterprise store (enterpriseStore.isEntitled('retention')); the tab is visible in both editions (read-only vs editable), matching the issue's "community shows the fixed default + upgrade hint".

Related to #1039

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
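The editing rule in the commit above (0 disables a sweep; any other value below the floor is raised to it) fits in a small pure function. `clamp_window` is a hypothetical name and this Python version only mirrors the described behaviour; the real checks live in the frontend tab and the enterprise backend. Rejecting negative values is an added assumption.

```python
COMMUNITY_RETENTION_FLOOR_DAYS = 5


def clamp_window(days: int, floor: int = COMMUNITY_RETENTION_FLOOR_DAYS) -> int:
    """Hypothetical: normalise an enterprise-edited retention window.

    0 is kept as-is (it disables that sweep); a positive value below the
    floor is raised to the floor; values at or above the floor pass through.
    """
    if days < 0:
        # Assumption: negative windows are invalid input.
        raise ValueError("retention window must be >= 0")
    if days == 0:
        return 0
    return max(days, floor)


for requested in (0, 2, 30):
    print(requested, "->", clamp_window(requested))
```

Keeping 0 as a special case before the `max` is what lets "disabled" survive the clamp instead of being raised to 5.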
…ollution (#1039)

The regression-diff CI gate flagged the 4 endpoint tests in test_retention_floor as new failures under pytest-randomly seeds 12345/67890 (but not 99999): classic test-ordering pollution, not a logic bug.

Root cause: the tests imported `routers.settings` (lazily), which drags in routers/__init__ → routers.agents → `from services.agent_service import get_agents_by_prefix`. Another unit test (#612) loads services.agent_service under a fake sys.modules name, so under some orderings that import resolves to the partial fake module and raises `ImportError: cannot import name 'get_agents_by_prefix'`.

Fix: load routers/settings.py directly from file under a private module name (spec_from_file_location), bypassing routers/__init__ entirely. settings.py imports only models/database/dependencies/services.*, none of which are polluted, so the load is robust regardless of collection order. This mirrors the existing conftest EntitlementCls pattern. The tests also pin LOG_/AUDIT_ env per call so a polluted process env can't leak in.

Verified: full unit suite under seeds 12345 + 67890. All 7 test_retention_floor tests pass; the only remaining failures are the 7 pre-existing base failures (git_pull_branch, orphaned_execution_recovery), unchanged by this PR.

Related to #1039

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
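The loading trick from the commit above, sketched with the standard `importlib.util` API. `load_module_from_path` and the private module name are hypothetical. Executing a single file by location never imports its parent package, so `routers/__init__` and everything it drags in are skipped. This sketch also does not register the module in `sys.modules` (an assumption; a module that looks itself up there would need the entry).

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: Path, private_name: str) -> ModuleType:
    """Hypothetical helper: execute one .py file as a standalone module.

    Loading by file location bypasses the parent package, so its
    __init__.py (and whatever that imports) is never executed.
    """
    spec = importlib.util.spec_from_file_location(private_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Usage (path and private name are illustrative):
# settings = load_module_from_path(
#     Path("routers/settings.py"), "_retention_test_settings_router"
# )
```

Absolute imports inside the loaded file (e.g. `models`, `services.*`) still resolve through `sys.modules` as usual; only the parent-package hop is avoided.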
Summary
Stage 1 of #1039: the OSS floor (the issue's "must land first" chunk). The enterprise `retention` module (private submodule) + Settings → Retention UI follow as separate PRs on the #847 seam.

Prod bug fixed (silent no-op)
`docker-compose.prod.yml` omitted `LOG_RETENTION_DAYS` / `LOG_ARCHIVE_ENABLED` / `LOG_CLEANUP_HOUR` from `backend.environment:`. Prod launches standalone (`-f docker-compose.prod.yml`, no base-compose merge), so an operator-set `LOG_RETENTION_DAYS` never reached the container and retention always fell back to the code default. Added the three lines.

5-day community floor
Lowered the operator-tunable defaults to the 5-day community floor:
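| Setting | Before | After |
|---|---|---|
| `LOG_RETENTION_DAYS` | 90 | 5 |
| `execution_log_retention_days` | 30 | 5 |
| `execution_row_retention_days` | 90 | 5 |
| `health_check_retention_days` | 7 | 5 |
| `agent_soft_delete_retention_days` | 180 | 5 |
| `schedule_soft_delete_retention_days` | 30 | 5 |
| `audit_log_retention_days` | 365 | 365 (exempt: integrity floor) |

New `COMMUNITY_RETENTION_FLOOR_DAYS` / `RETENTION_OPS_KEYS` constants: the enterprise module reuses these to clamp unentitled writes.

Read surface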
`GET /api/settings/retention` (admin) reports the effective windows in use, the active edition (`community` vs `enterprise`, via the `retention` entitlement) and the documented precedence: enterprise (license) DB setting → env → 5-day community default.

Design note
OSS does not hard-clamp env/OPS values; they remain an unsupported self-host escape hatch (per the issue: "not a cryptographic lock on a constant"). The clamp-to-floor lives in the enterprise `retention` module's managed setter (Stage 2).

This sharply shortens soft-delete recovery on community installs (agent 180→5, schedule 30→5). This is deliberate per the issue; an enterprise license restores longer windows.
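The precedence on the read surface (enterprise DB setting → env → 5-day community default), together with the point that OSS does not clamp env values, can be sketched as one resolver. `resolve_window` and its arguments are hypothetical; it assumes the enterprise value counts only when the `retention` entitlement is present and that an unset source is passed as `None`.

```python
from typing import Optional, Tuple

COMMUNITY_RETENTION_FLOOR_DAYS = 5


def resolve_window(
    enterprise_value: Optional[int],
    env_value: Optional[str],
    entitled: bool,
) -> Tuple[int, str]:
    """Hypothetical: return (effective days, source) for one retention window.

    Precedence: enterprise DB setting (only with the retention entitlement)
    -> environment variable -> community default. The env value is used
    as-is: OSS does not clamp it to the floor.
    """
    if entitled and enterprise_value is not None:
        return enterprise_value, "enterprise"
    if env_value:
        return int(env_value), "env"
    return COMMUNITY_RETENTION_FLOOR_DAYS, "community-default"


print(resolve_window(None, None, entitled=False))   # community default
print(resolve_window(None, "2", entitled=False))    # env escape hatch, below floor
print(resolve_window(180, "2", entitled=True))      # enterprise wins
```

The second call is the self-host escape hatch: a sub-floor env value passes through unchanged, because clamping is left to the enterprise setter.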
Verification
- `tests/unit/test_retention_floor.py`: 7/7 (floor defaults, audit exemption, read-surface edition + windows, audit 365-floor, enterprise-set OPS window).
- … `test_cleanup_inner_sweeps`, `test_execution_retention_prune`, `test_audit_retention_prune`, `test_agent_cleanup_parity`).
- `GET /api/settings/retention` → `edition: community`, OPS windows 5, audit 365; backend recreated with the new compose → `LOG_RETENTION_DAYS=5` flows through to the container.

Remaining #1039 (follow-up PRs)
- `retention` module (private `trinity-enterprise`): `enterprise_retention_config` table on the two-track runner, `GET`/`PUT /api/enterprise/retention/*` gated by `requires_entitlement("retention")`, live-read write-through, clamp-to-floor when unentitled.
- … `retention` entitlement; community shows the fixed 5-day default + upgrade hint.

Related to #1039
🤖 Generated with Claude Code