Extract variant branches into a generic extension mechanism#4871
Open
caseydavenport wants to merge 56 commits into
Add WithContext/ComponentHandlerOption to NewComponentHandler (variadic, backward-compatible) and call operator.ApplyPatches in CreateOrUpdateOrDelete for components implementing render.Named.
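A minimal sketch of the variadic-option pattern this commit describes, to show why existing call sites keep compiling. The handler and Context types here are simplified stand-ins; the real NewComponentHandler takes further arguments that are omitted, and the Variant field is a hypothetical illustration.

```go
package main

import "fmt"

// Context is a simplified stand-in for the reconcile-derived state
// that registered modifiers receive (fields are hypothetical).
type Context struct {
	Variant string
}

// componentHandler is a reduced stand-in for the operator's handler.
type componentHandler struct {
	ctx *Context
}

// ComponentHandlerOption configures a handler at construction time.
type ComponentHandlerOption func(*componentHandler)

// WithContext attaches a Context to the handler.
func WithContext(ctx *Context) ComponentHandlerOption {
	return func(h *componentHandler) { h.ctx = ctx }
}

// NewComponentHandler accepts zero or more options, so callers that
// pass none behave exactly as before the option was introduced.
func NewComponentHandler(opts ...ComponentHandlerOption) *componentHandler {
	h := &componentHandler{}
	for _, opt := range opts {
		opt(h)
	}
	return h
}

func main() {
	legacy := NewComponentHandler()
	updated := NewComponentHandler(WithContext(&Context{Variant: "TigeraSecureEnterprise"}))
	fmt.Println(legacy.ctx == nil, updated.ctx.Variant)
}
```

Because the options are variadic, adding a new option later does not touch any caller that does not need it.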
Pulls the enterprise RBAC extra-rules and MULTI_INTERFACE_MODE env branches out of pkg/render/typha.go into a new pkg/enterprise package. The enterprise package registers a patch via operator.Patch on startup; pkg/render/typha.go now has zero IsEnterprise branches.
Calls enterprise.Register() at startup so the typha modifier is wired in. Builds an operator.Context in the installation reconciler and passes it to the component handler so registered modifiers receive reconcile-derived state.
Extracts the image override registry into a leaf pkg/imageoverride package (no render/operator transitive deps) to avoid the render→operator import cycle. operator.OverrideImage/ResolveImage now delegate there. Registers the enterprise node image override in pkg/enterprise. Removes the IsEnterprise image switch from render/node.go; FIPS handling is preserved via a post-resolve check.
…sion The OSS installation controller no longer directly creates the node-prometheus keypair or fetches the prometheus/esgw certs. Those are now handled by a registered InstallationExtension in pkg/enterprise. Port value derivation and the kube-controller TLS block remain in the OSS controller unchanged.
Moves the calico-node-metrics Service out of OSS node render and into the enterprise node modifier, where it derives ports from ctx.FelixConfiguration. Also exports NodeBGPReporterPort so the modifier can reference it.
# Conflicts:
# pkg/controller/installation/core_controller_test.go
# pkg/controller/utils/component.go
# pkg/render/node.go
caseydavenport
commented
Jun 3, 2026
The registry package is renamed to extensions. The installation controller builds the render context through a registered factory, and the componentHandler applies registered modifiers to component output. The node and typha variant branches now live in enterprise modifiers, and the calico log directory is mounted for both variants.
Drop the functional-options builder for Inputs (one call site, all fields always set) in favor of a plain struct literal, and replace the single-method RenderContextFactory interface with a registered builder func. All three extension seams now register a func.
Register modifiers, image overrides, and the render context builder per variant. The registries now gate on the installation variant, so the enterprise funcs drop their self-gate guards (the IsEnterprise checks the PR set out to remove) and the image override drops its decline bool - it only runs for its own variant. Move the node prometheus reporter keypair mounting (volume, mount, cert-management init container, pod hash annotation) into the node modifier, and remove NodeConfiguration.PrometheusServerTLS along with the round-trip through the installation controller. Core node render no longer carries a prometheus mount; in calico the keypair is never created. Rename Extensible.Name() to ModifierKey() so an unrelated Name() method can't make a component modifier-eligible by accident.
Merge the per-component image override and modifier into a single
Extension{Image, Modify} registered once per (variant, component) via
extensions.Register, so all of a component's variance lives in one place.
The image half still lands in the imageoverride leaf so render resolves it
without an import cycle; the fan-out is internal to Register.
Rename the variant-level render context builder to Setup
(RegisterSetup/RunSetup). That names the two phases a reader has to hold:
Setup is the controller-side work that builds the RenderContext baton, and
Extension hooks are the pure render-time funcs. Three registries with three
key schemes become two concepts split by when they run.
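The single registration per (variant, component) could look roughly like the sketch below. It is an illustration under assumptions: the Registry type, its method names other than Register, and the use of plain strings for images and rendered objects are all hypothetical simplifications, not the PR's actual types.

```go
package main

import "fmt"

// key identifies one component of one product variant.
type key struct{ variant, component string }

// Extension bundles a component's variance: an optional image
// override and an optional modifier over the rendered objects.
// Objects are modelled as strings to keep the sketch small.
type Extension struct {
	Image  string
	Modify func(objs []string) []string
}

// Registry keeps the two halves in separate maps, mirroring the idea
// that the image half lands in a leaf store render can read.
type Registry struct {
	images    map[key]string
	modifiers map[key]func([]string) []string
}

func NewRegistry() *Registry {
	return &Registry{
		images:    map[key]string{},
		modifiers: map[key]func([]string) []string{},
	}
}

// Register fans one Extension out into both stores.
func (r *Registry) Register(variant, component string, ext Extension) {
	k := key{variant, component}
	if ext.Image != "" {
		r.images[k] = ext.Image
	}
	if ext.Modify != nil {
		r.modifiers[k] = ext.Modify
	}
}

// ResolveImage returns the override for the variant, or the default.
func (r *Registry) ResolveImage(variant, component, def string) string {
	if img, ok := r.images[key{variant, component}]; ok {
		return img
	}
	return def
}

// Apply runs the variant's modifier if one is registered.
func (r *Registry) Apply(variant, component string, objs []string) []string {
	if m, ok := r.modifiers[key{variant, component}]; ok {
		return m(objs)
	}
	return objs
}

func main() {
	r := NewRegistry()
	r.Register("TigeraSecureEnterprise", "node", Extension{
		Image:  "enterprise-node",
		Modify: func(objs []string) []string { return append(objs, "Service/calico-node-metrics") },
	})
	fmt.Println(r.ResolveImage("Calico", "node", "oss-node"))
	fmt.Println(r.Apply("TigeraSecureEnterprise", "node", []string{"DaemonSet/calico-node"}))
}
```

Since lookups are keyed by variant, a registered func only ever runs for its own variant and needs no self-gate.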
modCtx read like "modifier context"; the value is an extensions.RenderContext, so name it for what it is.
Add a package doc that lays out the two-phase model (Setup vs Extension) so the whole seam is legible from `go doc`. Fix two comments that still called the setup a render context builder.
Add a per-component context channel: a component implements render.ExtensionContextProvider to hand its modifier the config that the modifier can't derive from the shared RenderContext (config only the component's controller has). The componentHandler reads it into RenderContext.Component before applying the modifier. node's setup-produced keypair keeps its own field; this is for component-config-derived inputs.

Move the windows enterprise branches into a pkg/enterprise extension: the two windows image overrides, the node-metrics Service, the calico log volume (swapped in for the OSS cni-log mount), the enterprise felix env, the trusted DNS servers for openshift/rke2, and the prometheus reporter keypair mount. The windows component exposes its reporter port, keypair, and trusted bundle via ExtensionContext; the windows controller wires the render context into its handler. Core windows render is now OSS-only.
The installation, windows, and clusterconnection reconcilers copied a handful of ControllerOptions values into their own struct fields. Drop those and read them off r.opts so the options live in one place. Also removes the dead kubernetesVersion field on the installation reconciler.
Set.Decorate wraps a component so its objects pass through the registered modifier, and the wrapped value is itself a render.Component the handler renders like any other. Rename Extension to ComponentExtension.
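The decorator idea can be sketched as follows. This is a simplified illustration, not the PR's code: Component is reduced to a single Objects method returning strings, RegisterModifier is shown without typed config, and the deployment type is hypothetical. The empty ModifierKey case mirrors the commit text about components that are never decorated.

```go
package main

import "fmt"

// Component is a reduced stand-in for render.Component.
type Component interface {
	Objects() []string
}

// Extensible marks a component as modifier-eligible.
type Extensible interface {
	ModifierKey() string
}

// Modifier rewrites a component's rendered objects.
type Modifier func(objs []string) []string

// Set holds the modifiers registered for one variant.
type Set struct {
	modifiers map[string]Modifier
}

func NewSet() *Set { return &Set{modifiers: map[string]Modifier{}} }

func (s *Set) RegisterModifier(key string, m Modifier) { s.modifiers[key] = m }

// decorated embeds the wrapped component and overrides Objects, so it
// still satisfies Component and the handler can render it as usual.
type decorated struct {
	Component
	modify Modifier
}

func (d decorated) Objects() []string { return d.modify(d.Component.Objects()) }

// Decorate returns the component unchanged when it is not extensible,
// reports an empty key, or has no modifier registered.
func (s *Set) Decorate(c Component) Component {
	e, ok := c.(Extensible)
	if !ok || e.ModifierKey() == "" {
		return c
	}
	m, ok := s.modifiers[e.ModifierKey()]
	if !ok {
		return c
	}
	return decorated{Component: c, modify: m}
}

// deployment is a hypothetical component used only for the demo.
type deployment struct{ name, key string }

func (d deployment) Objects() []string   { return []string{"Deployment/" + d.name} }
func (d deployment) ModifierKey() string { return d.key }

func main() {
	set := NewSet()
	set.RegisterModifier("calico-kube-controllers", func(objs []string) []string {
		return append(objs, "Service/metrics")
	})
	a := set.Decorate(deployment{name: "calico-kube-controllers", key: "calico-kube-controllers"})
	b := set.Decorate(deployment{name: "es-calico-kube-controllers"}) // empty key: never decorated
	fmt.Println(a.Objects())
	fmt.Println(b.Objects())
}
```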
…er hook A Set holds one Variant bundle per product variant; the controller selects one from the installation variant, so each component has at most one extension and a modifier never re-checks the variant. ControllerExtension (Validate, ExtendContext) replaces the setup func as the controller-side hook. ControllerContext embeds RenderContext and carries the cluster-access deps, so a modifier given a RenderContext can't do I/O. Component modifiers register with RegisterModifier, which hands them typed config and removes RenderContext.Component.
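One way to picture the split between the two contexts is the sketch below, in which only the controller-side context carries a client. All names other than RenderContext and ControllerContext are hypothetical (the Client interface, fakeClient, the secret name, and the string-typed objects), and the hook is shown returning the updated controller context as an assumption for illustration.

```go
package main

import "fmt"

// RenderContext carries data only. It holds no client, so a modifier
// that receives it has no way to reach the cluster.
type RenderContext struct {
	Variant   string
	Extension any // opaque slot filled by the controller-side hook
}

// Client is a hypothetical stand-in for cluster access.
type Client interface {
	Get(name string) (string, error)
}

// ControllerContext embeds RenderContext and adds the I/O dependencies.
type ControllerContext struct {
	RenderContext
	Client Client
}

// extendContext is the controller-phase hook: it may do I/O and
// stashes the result in the embedded render context.
func extendContext(cc ControllerContext) (ControllerContext, error) {
	keypair, err := cc.Client.Get("node-prometheus-tls") // hypothetical name
	if err != nil {
		return cc, err
	}
	cc.Extension = keypair
	return cc, nil
}

// modifyNode is the render-phase hook: a pure function of its inputs.
func modifyNode(rc RenderContext, objs []string) []string {
	out := append([]string(nil), objs...) // copy so the input is untouched
	if keypair, ok := rc.Extension.(string); ok {
		out = append(out, "volume:"+keypair)
	}
	return out
}

// fakeClient is an in-memory Client for the demo.
type fakeClient map[string]string

func (f fakeClient) Get(name string) (string, error) {
	v, ok := f[name]
	if !ok {
		return "", fmt.Errorf("%s not found", name)
	}
	return v, nil
}

func main() {
	cc := ControllerContext{
		RenderContext: RenderContext{Variant: "TigeraSecureEnterprise"},
		Client:        fakeClient{"node-prometheus-tls": "keypair"},
	}
	cc, err := extendContext(cc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(modifyNode(cc.RenderContext, []string{"DaemonSet/calico-node"}))
}
```

Passing only cc.RenderContext to the modifier is what enforces the no-I/O property at the type level.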
caseydavenport
commented
Jun 18, 2026
Image overrides are plain components.Component values instead of funcs - an override only picks which image (registry, path, and FIPS handling are applied downstream in render), so registration reads v.Image(name, image) and the ImageOverride alias is gone. Rename the modifier RenderContext param from ctx to rc, split the RegisterModifier signature one arg per line, use slices.Contains over the local helpers, and trim the over-comments in the component handler and the ControllerExtension docs.
RenderContext no longer names the enterprise node prometheus keypair. It has an opaque Extension slot the controller extension fills and its own modifiers assert back out. The installation extension stashes the keypair there for the node modifier and returns it as one the controller should manage, so the controller no longer references the enterprise keypair in its cert-management and warning wiring.
…hook Controller extensions are now registered per controller (a ControllerName plus constants) and selected by ControllerContext.Controller, so each controller runs its own hook. The windows controller runs its hook to fetch the node prometheus keypair into the render context's Extension slot; its IsEnterprise branch is gone and WindowsConfiguration no longer carries the enterprise reporter port or keypair (the modifier derives the port from FelixConfiguration and reads the keypair from the slot). Rename the installation hook to coreControllerExtension.
caseydavenport
commented
Jun 18, 2026
The installation hook already fetches the prometheus and esgw certs into the trusted bundle; fold the manager-internal cert in too and drop the IsEnterprise branch from the core controller.
Add a Watcher companion interface to ControllerExtension; each controller's Add() calls Set.SetupWatches, which runs the watch hook of every variant's extension for that controller. The enterprise installation and windows extensions register the enterprise CR and secret watches they need, so core no longer names ManagementCluster, ManagementClusterConnection, LogCollector, or the enterprise prometheus secrets. Still gated by EnterpriseCRDExists for now.
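A companion interface of this kind is usually discovered with a type assertion. The sketch below illustrates that under assumptions: the method signatures, the nested map layout of Set, and the two demo extension types are hypothetical, and watches are modelled as a callback taking a kind name.

```go
package main

import "fmt"

// ControllerExtension is a reduced stand-in for the controller hook.
type ControllerExtension interface {
	Validate() error
}

// Watcher is the optional companion. Only extensions that need extra
// watches implement it.
type Watcher interface {
	SetupWatches(watch func(kind string) error) error
}

// Set maps variant -> controller name -> extension.
type Set struct {
	extensions map[string]map[string]ControllerExtension
}

// SetupWatches runs the watch hook of every variant's extension for
// the named controller, skipping extensions that are not Watchers.
func (s *Set) SetupWatches(controller string, watch func(kind string) error) error {
	for _, byController := range s.extensions {
		ext, ok := byController[controller]
		if !ok {
			continue
		}
		w, ok := ext.(Watcher)
		if !ok {
			continue
		}
		if err := w.SetupWatches(watch); err != nil {
			return err
		}
	}
	return nil
}

// enterpriseInstallation registers the enterprise CR watches.
type enterpriseInstallation struct{}

func (enterpriseInstallation) Validate() error { return nil }

func (enterpriseInstallation) SetupWatches(watch func(kind string) error) error {
	for _, kind := range []string{"ManagementCluster", "ManagementClusterConnection", "LogCollector"} {
		if err := watch(kind); err != nil {
			return err
		}
	}
	return nil
}

// plainExtension has no watches of its own.
type plainExtension struct{}

func (plainExtension) Validate() error { return nil }

func main() {
	set := &Set{extensions: map[string]map[string]ControllerExtension{
		"TigeraSecureEnterprise": {"installation": enterpriseInstallation{}},
		"Calico":                 {"installation": plainExtension{}},
	}}
	var watched []string
	err := set.SetupWatches("installation", func(kind string) error {
		watched = append(watched, kind)
		return nil
	})
	fmt.Println(watched, err)
}
```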
The OSS node render read the enterprise LogCollector CR to set HostPID and FELIX_FLOWLOGSCOLLECTPROCESSPATH - enterprise flow-log behavior (OSS flow logs go through Goldmane, not LogCollector). The installation hook now fetches the LogCollector and records whether process-path collection is on; the node modifier sets HostPID and the env from it. LogCollector is gone from NodeConfiguration and the core controller.
The kube-controllers component renders from a generic config (name, rules, enabled controllers, extra env, network policy) with no IsEnterprise or component-name branching. es-calico-kube-controllers assembly and its constants live in pkg/enterprise and fill that config; calico-kube-controllers still assembles in render.
The installation extension hook creates the kube-controllers metrics serving keypair and returns it as a managed keypair; a calico-kube-controllers modifier mounts it onto the deployment (env, volume, mount, cert-management init container, hash annotation). The kube-controllers render base no longer carries MetricsServerTLS, and the installation controller no longer creates that certificate. The component reports a config-driven modifier key so the shared es-calico-kube-controllers deployment, which leaves it empty, is never decorated.
calico-kube-controllers renders as pure OSS now: the common rules plus the node and loadbalancer controllers, no IsEnterprise. The enterprise extension layers on the rest through a modifier - the enterprise RBAC, the service/federatedservices/usage controllers, the metrics serving TLS, and the WAF v3 (Gateway API add-on) surface (the WASM env, the in-process admission webhook, and the network policy ingress rule). The installation hook produces the controller-side inputs the modifier can't (the webhook keypair, the merged wasm pull secret, the resolved wasm image, the operator CA) and hands them over through the render context; the WASM image resolves with the same GetReference the base uses, via the ImageSet the hook reads itself. The WAF/WASM symbols and the es-kube-controllers pull-secret helper move to pkg/enterprise. The base kube-controllers config no longer carries any WAF/WASM fields.
The installation controller's variant-specific FelixConfiguration defaulting now goes through a FelixConfigDefaulter companion interface, so the enterprise provider-specific dnsTrustedServers default (openshift-dns, rke2-coredns) lives in the enterprise extension instead of an IsEnterprise branch in setDefaultsOnFelixConfiguration. It is a companion rather than part of ExtendContext because felix defaulting persists early in reconcile, before ExtendContext runs.
Strip the enterprise-only fields off APIServerConfiguration (ManagementCluster, ManagementClusterConnection, ApplicationLayer, RequiresQueryServer, the query-server cert, and KeyValidatorConfig) so the base render is variant-blind. A new apiserver controller hook fetches the enterprise CRs, builds the trusted bundle and query-server cert, and resolves the L7 sidecar images, stashing them on the render context. The modifier reads that slot and find-or-creates the deployment skeleton when the base did not render an aggregation server. APIServerPolicy becomes extensible so the OIDC egress rule and L7 ingress port move to a policy modifier too.
…sions # Conflicts: # pkg/render/guardian.go
…rs package RenderContext is a render-phase input (what a modifier consumes), and ControllerContext is a controller-phase concept (the data and machinery a controller gathers). Neither is part of the extension mechanism itself, so having them in pkg/extensions was backwards. Move RenderContext to pkg/render and ControllerContext (plus ControllerName) to a new pkg/controller/contexts. extensions already imported render and render never imports extensions, so the direction is clean: extensions and contexts both depend on render, and extensions depends on contexts. No cycle.
Each extension had its own one-line accessor doing the same type assertion against RenderContext.Extension. Replace that boilerplate with a single generic render.ExtractExtensionData[T] and have the per-component accessors call it.
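A generic accessor along these lines might look like the following sketch. The signature is an assumption (the commit names render.ExtractExtensionData[T] but does not show it), and nodeData, apiServerData, and their fields are hypothetical.

```go
package main

import "fmt"

// RenderContext is reduced to the opaque slot the extensions share.
type RenderContext struct {
	Extension any
}

// ExtractExtensionData asserts the opaque slot back to the caller's
// type. It reports false when the slot is empty or holds another type.
func ExtractExtensionData[T any](rc RenderContext) (T, bool) {
	data, ok := rc.Extension.(T)
	return data, ok
}

// nodeData and apiServerData are hypothetical per-extension payloads.
type nodeData struct{ KeyPairName string }
type apiServerData struct{ QueryServerCert string }

// nodeRenderData is a per-component accessor that delegates to the
// generic helper instead of repeating the type assertion.
func nodeRenderData(rc RenderContext) (nodeData, bool) {
	return ExtractExtensionData[nodeData](rc)
}

func main() {
	rc := RenderContext{Extension: nodeData{KeyPairName: "node-prometheus-tls"}}

	d, ok := nodeRenderData(rc)
	fmt.Println(d.KeyPairName, ok)

	// Asking for a different payload type fails cleanly.
	_, ok = ExtractExtensionData[apiServerData](rc)
	fmt.Println(ok)
}
```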
The stub component and applyExtensions helper for exercising a modifier through Set.Decorate were copy-pasted into the extensions, render, and enterprise test packages. Pull them into pkg/extensions/extensionstest so there's one copy, and have the three test packages import it.
The flat pkg/enterprise mixed every component's extension in one package. Split
it so each subpackage maps to the render component it extends and exposes a
Register func that New() composes:
enterprise/typha, guardian, apiserver, windows - self-contained.
enterprise/installation - the installation controller hook plus the node and
calico-kube-controllers modifiers, which share installationRenderData (kept
internal here).
enterprise/kubecontrollers - the es-calico-kube-controllers assembly plus the
enterprise kube-controllers cluster role rules it shares with the calico
modifier (exported as KubeControllersEnterpriseCommonRules).
The shared felix reporter-port helpers move with node (exported as
NodeReporterPort/ValidateReporterPort) so the windows hook can reuse them.
logstorage now imports enterprise/kubecontrollers for the es-kube-controllers
symbols instead of the old flat package. Each subpackage gets its own test
suite; the decorate helper is the shared extensionstest package.
The enterprise integration cases (managed-cluster resources, impersonation, SCC, the enterprise GuardianPolicy) ran base render plus the real modifier, so they pulled enterprise.New() into the render test package. Move them next to the modifier in pkg/enterprise/guardian, where they run against the real Set. What stays in render is the OSS render path: the OSS guardian-access policy and the public-CA deployment cases. Those never run the modifier, so render_test no longer depends on pkg/enterprise for guardian.
The CalicoEnterprise windows cases ran base render plus the real modifier (and needed the enterprise image overrides), so they pulled enterprise.New() into the render test package. Move them next to the modifier in pkg/enterprise/windows. The OSS render tests stay put, now built with an empty imageoverride.New() instead of the enterprise Set's overrides, so render_test no longer depends on pkg/enterprise for windows.
node_enterprise_test.go ran the real ExtendContext hook and node/typha modifiers, so move it whole into pkg/enterprise/installation next to those. node_test.go only ever rendered base output (no modifier), so its CalicoEnterprise cases are variant-blindness checks and stay put. Swap its cfg from the enterprise Set's image overrides to an empty imageoverride.New() so render_test no longer depends on pkg/enterprise for node.
The apiserver OSS tests went through the enterprise modifier only to pick up the Calico-variant cleanup deletes, but those deletes come from the base render itself (and the cleanup modifier is covered in pkg/enterprise/apiserver). Render the base component directly instead. With apiserver no longer reaching for the enterprise Set, nothing in the render test suite uses it, so remove enterprise_setup_test.go. pkg/render no longer depends on pkg/enterprise, in production or test code.
…lers The es-kube-controllers assembly lives in pkg/enterprise/kubecontrollers, but its render tests sat in pkg/render/kubecontrollers and pulled the enterprise package into the render test deps. Move them next to the assembly. The calico-kube-controllers (OSS) tests stay put, and pkg/render/kubecontrollers no longer depends on pkg/enterprise.
The ManagementClusterConnection controller renders Guardian for both OSS (Whisker) and Enterprise (managed-cluster tunnel), and had IsEnterprise branches woven through its reconcile: the management/managed mutual-exclusion check, the managed cluster version (CNX vs Calico), and the license-gated egress network policy. Add a clusterconnection ControllerExtension (pkg/enterprise/clusterconnection) with Validate (the mutual-exclusion check) and ExtendContext (CNXVersion + the license check), stashing the result as a render.GuardianRenderData in the render context. The controller reads it back generically to fill GuardianConfiguration, falling back to OSS defaults (CalicoVersion, the Whisker Guardian client keypair, egress disabled) when no extension is registered. No IsEnterprise left in the reconcile data path. Still variant-aware in the controller: the ManagementClusterConnection CR validation and defaulting (impersonation, public CA) and the ImageSet selection. Those would need a CR validator/defaulter companion to move, left as a follow-up.
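The validate, extend, and fall-back flow described above could be sketched as below. It is deliberately simplified and rests on assumptions: the interface signatures, the fields of GuardianRenderData, the version strings, and the guardianConfig helper are hypothetical, and the data is returned directly rather than stashed in a render context.

```go
package main

import (
	"errors"
	"fmt"
)

// GuardianRenderData is a reduced stand-in; field names are hypothetical.
type GuardianRenderData struct {
	Version      string
	EgressPolicy bool
}

// ControllerExtension is a simplified controller hook for this sketch.
type ControllerExtension interface {
	Validate(hasManagementCluster, hasConnection bool) error
	ExtendContext(licensed bool) GuardianRenderData
}

// enterpriseExt carries the checks that used to be IsEnterprise branches.
type enterpriseExt struct{}

func (enterpriseExt) Validate(hasManagementCluster, hasConnection bool) error {
	if hasManagementCluster && hasConnection {
		return errors.New("a cluster cannot be both a management and a managed cluster")
	}
	return nil
}

func (enterpriseExt) ExtendContext(licensed bool) GuardianRenderData {
	return GuardianRenderData{Version: "cnx-version", EgressPolicy: licensed}
}

// guardianConfig is the variant-blind controller path: it uses the
// extension when one is registered and OSS defaults otherwise.
func guardianConfig(ext ControllerExtension, hasMC, hasConn, licensed bool) (GuardianRenderData, error) {
	if ext == nil {
		return GuardianRenderData{Version: "calico-version"}, nil
	}
	if err := ext.Validate(hasMC, hasConn); err != nil {
		return GuardianRenderData{}, err
	}
	return ext.ExtendContext(licensed), nil
}

func main() {
	oss, _ := guardianConfig(nil, false, true, false)
	fmt.Println(oss)

	ent, _ := guardianConfig(enterpriseExt{}, false, true, true)
	fmt.Println(ent)

	_, err := guardianConfig(enterpriseExt{}, true, true, true)
	fmt.Println(err)
}
```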
Move the Calico Enterprise cases (enterprise images, the license/tier-gated calico-system policy, and impersonation) out of clusterconnection_controller_test.go into clusterconnection_controller_enterprise_test.go. The generic controller mechanics (default reconcile, the guardian finalizer, proxy settings, and the tigerastatus conditions) stay in the main file. Same specs, just relocated; the package-scope proxy helpers are shared between the two files.
ExtendContext handed back a render context while the controller context it was given already embeds one, leaving two near-identical copies in play. Return the updated controller context instead, so a single context flows through the reconcile and callers read its embedded render context.
The extension Set computes its variant's controller-phase options once at startup (ComputeOptions, run from main) and carries them onto each ControllerContext, so the shared types no longer name an enterprise-only setting. The ControllerContext holds an opaque Options the variant's hooks assert back out; the enterprise multi-tenant flag and its discovery live entirely in pkg/enterprise.
This is phase 1 of prepping the operator for the monorepo merge, where the Calico and Calico Enterprise code paths eventually live apart. Today they share one codebase with `IsEnterprise()` checks sprinkled through the render and controller code, and that coupling is the thing that makes the split hard.

This PR pulls the enterprise-specific behavior out of the core code and behind a generic extension mechanism, so the enterprise build registers its own additions and the core operator stays variant-blind. After this, core render and controller code has no idea enterprise exists.
How it works
Enterprise registers extensions against a set keyed by variant. There are two extension points, one per phase of a reconcile:
The core operator registers nothing and runs the base path. All the enterprise wiring lives in `pkg/enterprise`, one subpackage per component. After the split, that package is what the enterprise build constructs and the core build drops.

Every extracted component (node, typha, guardian, windows, apiserver, kube-controllers) and the clusterconnection controller now run their enterprise behavior through this mechanism, with no `IsEnterprise()` left in their core paths. Behavior is unchanged: the test gate is the existing core tests plus the relocated enterprise tests, which now run against the real extension set.

A few shared-code cleanups and ergonomic refactors are left as follow-ups, tracked in CORE-13042.