
Extract variant branches into a generic extension mechanism #4871

Open
caseydavenport wants to merge 56 commits into tigera:master from caseydavenport:casey-variant-extensions

caseydavenport (Member) commented May 29, 2026

This is phase 1 of prepping the operator for the monorepo merge, where the Calico and Calico Enterprise code paths eventually live apart. Today they share one codebase with IsEnterprise() checks sprinkled through the render and controller code, and that coupling is the thing that makes the split hard.

This PR pulls the enterprise-specific behavior out of the core code and behind a generic extension mechanism, so the enterprise build registers its own additions and the core operator stays variant-blind. After this, core render and controller code has no idea enterprise exists.

How it works

Enterprise registers extensions against a set keyed by variant. There are two extension points, one per phase of a reconcile:

  • Controller phase: a per-controller hook does the variant's reconcile-time work (validating config, creating certificates, assembling the trusted bundle, registering extra watches) and hands the result to the render phase.
  • Render phase: a per-component modifier runs at the single point where the operator writes a component's objects, wrapping the component and adjusting its output. A separate image-override registry lets a variant swap an image without core branching.

The core operator registers nothing and runs the base path. All the enterprise wiring lives in pkg/enterprise, one subpackage per component. After the split, that package is what the enterprise build constructs and the core build drops.
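The two-phase model can be sketched in Go. This is a minimal illustration, not the operator's code: the `Set`, `Variant`, `RegisterHook`, `RegisterModifier`, `RunHook`, and `Render` names and signatures are hypothetical, and objects are plain strings instead of Kubernetes objects. It shows the key property: a variant with nothing registered gets the base output unchanged.

```go
package main

import "fmt"

// Variant names a product variant. These string values are placeholders.
type Variant string

const (
	Calico     Variant = "Calico"
	Enterprise Variant = "Enterprise"
)

// Modifier is a render-phase hook: it adjusts a component's rendered objects.
type Modifier func(objs []string) []string

// ControllerHook is a controller-phase hook: it does reconcile-time work and
// returns data for the render phase.
type ControllerHook func() (any, error)

// bundle is everything one variant registers.
type bundle struct {
	hooks     map[string]ControllerHook // keyed by controller name
	modifiers map[string]Modifier       // keyed by component modifier key
}

// Set holds one bundle per variant.
type Set struct{ variants map[Variant]*bundle }

func NewSet() *Set { return &Set{variants: map[Variant]*bundle{}} }

func (s *Set) get(v Variant) *bundle {
	b, ok := s.variants[v]
	if !ok {
		b = &bundle{hooks: map[string]ControllerHook{}, modifiers: map[string]Modifier{}}
		s.variants[v] = b
	}
	return b
}

func (s *Set) RegisterHook(v Variant, controller string, h ControllerHook) {
	s.get(v).hooks[controller] = h
}

func (s *Set) RegisterModifier(v Variant, key string, m Modifier) {
	s.get(v).modifiers[key] = m
}

// RunHook runs the variant's hook for a controller; with none registered it returns nil data.
func (s *Set) RunHook(v Variant, controller string) (any, error) {
	if b, ok := s.variants[v]; ok {
		if h, ok := b.hooks[controller]; ok {
			return h()
		}
	}
	return nil, nil
}

// Render passes base output through the variant's modifier, if one is registered.
func (s *Set) Render(v Variant, key string, base []string) []string {
	if b, ok := s.variants[v]; ok {
		if m, ok := b.modifiers[key]; ok {
			return m(base)
		}
	}
	return base
}

func main() {
	s := NewSet()
	// Only the enterprise build registers anything; the core build leaves the Set empty.
	s.RegisterModifier(Enterprise, "typha", func(objs []string) []string {
		return append(objs, "ClusterRole/typha-enterprise-rules")
	})
	base := []string{"Deployment/calico-typha"}
	fmt.Println(s.Render(Calico, "typha", base))
	fmt.Println(s.Render(Enterprise, "typha", base))
}
```

Because lookups are keyed by variant, a registered modifier never has to check `IsEnterprise()` itself.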

Every extracted component (node, typha, guardian, windows, apiserver, kube-controllers) and the clusterconnection controller now run their enterprise behavior through this mechanism, with no IsEnterprise() left in their core paths. Behavior is unchanged: the test gate is the existing core tests plus the relocated enterprise tests, which now run against the real extension set.

A few shared-code cleanups and ergonomic refactors are left as follow-ups, tracked in CORE-13042.

None

Add WithContext/ComponentHandlerOption to NewComponentHandler (variadic,
backward-compatible) and call operator.ApplyPatches in
CreateOrUpdateOrDelete for components implementing render.Named.
Pulls the enterprise RBAC extra-rules and MULTI_INTERFACE_MODE env branches out of pkg/render/typha.go into a new pkg/enterprise package. The enterprise package registers a patch via operator.Patch on startup; pkg/render/typha.go now has zero IsEnterprise branches.
Calls enterprise.Register() at startup so the typha modifier is wired in.
Builds an operator.Context in the installation reconciler and passes it to
the component handler so registered modifiers receive reconcile-derived state.
Extracts the image override registry into a leaf pkg/imageoverride
package (no render/operator transitive deps) to avoid the render→operator
import cycle. operator.OverrideImage/ResolveImage now delegate there.
Registers the enterprise node image override in pkg/enterprise. Removes
the IsEnterprise image switch from render/node.go; FIPS handling is
preserved via a post-resolve check.
…sion

The OSS installation controller no longer directly creates the node-prometheus
keypair or fetches the prometheus/esgw certs. Those are now handled by a
registered InstallationExtension in pkg/enterprise. Port value derivation and
the kube-controller TLS block remain in the OSS controller unchanged.
Moves the calico-node-metrics Service out of OSS node render and into
the enterprise node modifier, where it derives ports from
ctx.FelixConfiguration. Also exports NodeBGPReporterPort so the modifier
can reference it.
# Conflicts:
#	pkg/controller/installation/core_controller_test.go
#	pkg/controller/utils/component.go
#	pkg/render/node.go
The registry package is renamed to extensions. The installation controller builds the render context through a registered factory, and the componentHandler applies registered modifiers to component output. The node and typha variant branches now live in enterprise modifiers, and the calico log directory is mounted for both variants.
Drop the functional-options builder for Inputs (one call site, all
fields always set) in favor of a plain struct literal, and replace the
single-method RenderContextFactory interface with a registered builder
func. All three extension seams now register a func.
Register modifiers, image overrides, and the render context builder per
variant. The registries now gate on the installation variant, so the
enterprise funcs drop their self-gate guards (the IsEnterprise checks the
PR set out to remove) and the image override drops its decline bool - it
only runs for its own variant.

Move the node prometheus reporter keypair mounting (volume, mount,
cert-management init container, pod hash annotation) into the node
modifier, and remove NodeConfiguration.PrometheusServerTLS along with the
round-trip through the installation controller. Core node render no longer
carries a prometheus mount; in calico the keypair is never created.

Rename Extensible.Name() to ModifierKey() so an unrelated Name() method
can't make a component modifier-eligible by accident.
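A minimal sketch of why a dedicated method name matters, assuming eligibility is decided by a Go interface type assertion (an assumption; the PR only names `ModifierKey()`). The `Component`, `typha`, and `helper` types and the `modifierKey` function are hypothetical stand-ins.

```go
package main

import "fmt"

// Component is a stand-in for render.Component.
type Component interface {
	Objects() []string
}

// Extensible is the opt-in: only a component that implements ModifierKey is eligible.
type Extensible interface {
	ModifierKey() string
}

type typha struct{}

func (typha) Objects() []string   { return []string{"Deployment/calico-typha"} }
func (typha) ModifierKey() string { return "typha" }

// helper has a Name method for unrelated reasons; it must not become eligible.
type helper struct{}

func (helper) Objects() []string { return nil }
func (helper) Name() string      { return "typha" }

// modifierKey reports a component's key only if it explicitly opted in.
func modifierKey(c Component) (string, bool) {
	e, ok := c.(Extensible)
	if !ok {
		return "", false
	}
	return e.ModifierKey(), true
}

func main() {
	for _, c := range []Component{typha{}, helper{}} {
		key, ok := modifierKey(c)
		fmt.Printf("%T eligible=%v key=%q\n", c, ok, key)
	}
}
```

With a generic `Name()` as the opt-in, `helper` would have matched the `typha` modifier by accident.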
Merge the per-component image override and modifier into a single
Extension{Image, Modify} registered once per (variant, component) via
extensions.Register, so all of a component's variance lives in one place.
The image half still lands in the imageoverride leaf so render resolves it
without an import cycle; the fan-out is internal to Register.

Rename the variant-level render context builder to Setup
(RegisterSetup/RunSetup). That names the two phases a reader has to hold:
Setup is the controller-side work that builds the RenderContext baton, and
Extension hooks are the pure render-time funcs. Three registries with three
key schemes become two concepts split by when they run.
modCtx read like "modifier context"; the value is an extensions.RenderContext,
so name it for what it is.
Add a package doc that lays out the two-phase model (Setup vs Extension)
so the whole seam is legible from `go doc`. Fix two comments that still
called the setup a render context builder.
Add a per-component context channel: a component implements
render.ExtensionContextProvider to hand its modifier config that the modifier
can't derive from the shared RenderContext (config only the component's
controller has). The componentHandler reads it into RenderContext.Component
before applying the modifier. node's setup-produced keypair keeps its own
field; this channel is for component-config-derived inputs.

Move the windows component's enterprise branches into a pkg/enterprise extension: the two
windows image overrides, the node-metrics Service, the calico log volume
(swapped in for the OSS cni-log mount), the enterprise felix env, the
trusted DNS servers for openshift/rke2, and the prometheus reporter keypair
mount. The windows component exposes its reporter port, keypair, and trusted
bundle via ExtensionContext; the windows controller wires the render context
into its handler. Core windows render is now OSS-only.
The installation, windows, and clusterconnection reconcilers copied a handful
of ControllerOptions values into their own struct fields. Drop those and read
them off r.opts so the options live in one place. Also removes the dead
kubernetesVersion field on the installation reconciler.
Set.Decorate wraps a component so its objects pass through the registered
modifier, and the wrapped value is itself a render.Component the handler
renders like any other. Rename Extension to ComponentExtension.
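`Set.Decorate` can be sketched as a wrapper that satisfies the same interface as the component it wraps, so the handler needs no special case. The types and the map-based `Set` below are hypothetical; only the `Decorate` and `ModifierKey` names come from the PR.

```go
package main

import "fmt"

type Component interface {
	Objects() []string
}

type Extensible interface {
	ModifierKey() string
}

type Modifier func([]string) []string

type Set struct{ modifiers map[string]Modifier }

// decorated wraps a component and passes its objects through a modifier.
type decorated struct {
	inner Component
	mod   Modifier
}

func (d decorated) Objects() []string { return d.mod(d.inner.Objects()) }

// Decorate returns c unchanged unless it opts in and a modifier is registered for its key.
func (s Set) Decorate(c Component) Component {
	e, ok := c.(Extensible)
	if !ok {
		return c
	}
	m, ok := s.modifiers[e.ModifierKey()]
	if !ok {
		return c
	}
	return decorated{inner: c, mod: m}
}

type node struct{}

func (node) Objects() []string   { return []string{"DaemonSet/calico-node"} }
func (node) ModifierKey() string { return "node" }

func main() {
	s := Set{modifiers: map[string]Modifier{
		"node": func(objs []string) []string { return append(objs, "Service/calico-node-metrics") },
	}}
	// The handler renders the wrapped value like any other component.
	var c Component = s.Decorate(node{})
	fmt.Println(c.Objects())
}
```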
…er hook

A Set holds one Variant bundle per product variant; the controller selects one
from the installation variant, so each component has at most one extension and a
modifier never re-checks the variant.

ControllerExtension (Validate, ExtendContext) replaces the setup func as the
controller-side hook. ControllerContext embeds RenderContext and carries the
cluster-access deps, so a modifier given a RenderContext can't do I/O. Component
modifiers register with RegisterModifier, which hands them typed config and
removes RenderContext.Component.
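The I/O boundary can be sketched with plain Go types: the modifier's signature only admits a `RenderContext`, so it cannot reach the client that lives on `ControllerContext`. This sketch uses the later form from this PR in which `ExtendContext` returns the updated controller context; `Client`, `fakeClient`, `keypairHook`, the `prometheus-keypair` name, and `reconcile` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// RenderContext is pure data handed to render-phase modifiers.
type RenderContext struct {
	Extension any // opaque slot filled by the variant's controller hook
}

// Client stands in for cluster access (a Kubernetes client in the operator).
type Client interface {
	Get(name string) (string, error)
}

// ControllerContext embeds RenderContext and adds the cluster-access deps.
type ControllerContext struct {
	RenderContext
	Client Client
}

// ControllerExtension is the controller-phase hook.
type ControllerExtension interface {
	Validate(cc ControllerContext) error
	ExtendContext(cc ControllerContext) (ControllerContext, error)
}

// nodeModifier only receives a RenderContext, so it has no client to do I/O with.
func nodeModifier(rc RenderContext, objs []string) []string {
	if kp, ok := rc.Extension.(string); ok {
		return append(objs, "Volume/"+kp)
	}
	return objs
}

// keypairHook is a hypothetical extension that fetches a keypair during reconcile.
type keypairHook struct{}

func (keypairHook) Validate(ControllerContext) error { return nil }

func (keypairHook) ExtendContext(cc ControllerContext) (ControllerContext, error) {
	kp, err := cc.Client.Get("prometheus-keypair")
	if err != nil {
		return cc, err
	}
	cc.Extension = kp // promoted field of the embedded RenderContext
	return cc, nil
}

// reconcile runs both phases; only the embedded RenderContext crosses into render.
func reconcile(ext ControllerExtension, cc ControllerContext, base []string) ([]string, error) {
	if err := ext.Validate(cc); err != nil {
		return nil, err
	}
	cc, err := ext.ExtendContext(cc)
	if err != nil {
		return nil, err
	}
	return nodeModifier(cc.RenderContext, base), nil
}

type fakeClient map[string]string

func (f fakeClient) Get(name string) (string, error) {
	if v, ok := f[name]; ok {
		return v, nil
	}
	return "", errors.New("not found: " + name)
}

func main() {
	cc := ControllerContext{Client: fakeClient{"prometheus-keypair": "prometheus-keypair"}}
	objs, err := reconcile(keypairHook{}, cc, []string{"DaemonSet/calico-node"})
	fmt.Println(objs, err)
}
```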
Image overrides are plain components.Component values instead of funcs - an
override only picks which image (registry, path, and FIPS handling are applied
downstream in render), so registration reads v.Image(name, image) and the
ImageOverride alias is gone.
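A sketch of image overrides as plain values, assuming a simplified `Component` with only an image name and version (the real `components.Component` carries more) and a resolve step that only prefixes a registry; path and FIPS handling are omitted. `Overrides`, `For`, and `Resolve` are hypothetical; `Image(name, image)` mirrors the registration call described above. The image names are illustrative.

```go
package main

import "fmt"

// Component is a simplified image entry: which image, not where it is pulled from.
type Component struct {
	Image   string
	Version string
}

// Variant collects one variant's image overrides, keyed by component name.
type Variant struct{ images map[string]Component }

// Image registers an override value; no func, no decline flag.
func (v *Variant) Image(name string, c Component) { v.images[name] = c }

type Overrides struct{ variants map[string]*Variant }

func NewOverrides() *Overrides { return &Overrides{variants: map[string]*Variant{}} }

// For returns (creating if needed) the override bundle for a variant.
func (o *Overrides) For(variant string) *Variant {
	v, ok := o.variants[variant]
	if !ok {
		v = &Variant{images: map[string]Component{}}
		o.variants[variant] = v
	}
	return v
}

// Resolve picks the override if present, else the default, then applies the registry once.
func (o *Overrides) Resolve(variant, name string, def Component, registry string) string {
	c := def
	if v, ok := o.variants[variant]; ok {
		if ov, ok := v.images[name]; ok {
			c = ov
		}
	}
	return registry + c.Image + ":" + c.Version
}

func main() {
	o := NewOverrides()
	o.For("Enterprise").Image("node", Component{Image: "tigera/cnx-node", Version: "v1"})
	base := Component{Image: "calico/node", Version: "v1"}
	fmt.Println(o.Resolve("Calico", "node", base, "quay.io/"))
	fmt.Println(o.Resolve("Enterprise", "node", base, "quay.io/"))
}
```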

Rename the modifier RenderContext param from ctx to rc, split the RegisterModifier
signature one arg per line, use slices.Contains over the local helpers, and trim
the over-comments in the component handler and the ControllerExtension docs.
RenderContext no longer names the enterprise node prometheus keypair. It has an
opaque Extension slot the controller extension fills and its own modifiers assert
back out. The installation extension stashes the keypair there for the node
modifier and returns it as one the controller should manage, so the controller no
longer references the enterprise keypair in its cert-management and warning wiring.
…hook

Controller extensions are now registered per controller (keyed by a ControllerName type and its
constants) and selected by ControllerContext.Controller, so each controller runs
its own hook. The windows controller runs its hook to fetch the node prometheus
keypair into the render context's Extension slot; its IsEnterprise branch is gone
and WindowsConfiguration no longer carries the enterprise reporter port or
keypair (the modifier derives the port from FelixConfiguration and reads the
keypair from the slot). Rename the installation hook to coreControllerExtension.
The installation hook already fetches the prometheus and esgw certs into the
trusted bundle; fold the manager-internal cert in too and drop the IsEnterprise
branch from the core controller.
Add a Watcher companion interface to ControllerExtension; each controller's Add()
calls Set.SetupWatches, which runs the watch hook of every variant's extension
for that controller. The enterprise installation and windows extensions register
the enterprise CR and secret watches they need, so core no longer names
ManagementCluster, ManagementClusterConnection, LogCollector, or the enterprise
prometheus secrets. Still gated by EnterpriseCRDExists for now.
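The companion-interface pattern can be sketched as an optional interface discovered by type assertion. `Watcher`, `Watches`, `Set`, and the string kinds are hypothetical simplifications; the real hook presumably registers controller-runtime watches rather than returning kind names. Running every variant's hook follows the description above; the sort only makes the order deterministic for the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

type ControllerName string

const Installation ControllerName = "installation"

// ControllerExtension is reduced to one method for this sketch.
type ControllerExtension interface {
	Validate() error
}

// Watcher is the optional companion a ControllerExtension may also implement.
type Watcher interface {
	Watches() []string
}

// Set maps variant -> controller -> extension.
type Set map[string]map[ControllerName]ControllerExtension

// SetupWatches runs the watch hook of every variant's extension for one controller.
func (s Set) SetupWatches(c ControllerName, watch func(kind string) error) error {
	variants := make([]string, 0, len(s))
	for v := range s {
		variants = append(variants, v)
	}
	sort.Strings(variants)
	for _, v := range variants {
		w, ok := s[v][c].(Watcher) // nil extension or no companion: skip
		if !ok {
			continue
		}
		for _, kind := range w.Watches() {
			if err := watch(kind); err != nil {
				return fmt.Errorf("variant %s: %w", v, err)
			}
		}
	}
	return nil
}

type enterpriseInstallation struct{}

func (enterpriseInstallation) Validate() error { return nil }
func (enterpriseInstallation) Watches() []string {
	return []string{"ManagementCluster", "ManagementClusterConnection", "LogCollector"}
}

func main() {
	s := Set{"Calico": {}, "Enterprise": {Installation: enterpriseInstallation{}}}
	var watched []string
	_ = s.SetupWatches(Installation, func(k string) error { watched = append(watched, k); return nil })
	fmt.Println(watched)
}
```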
The OSS node render read the enterprise LogCollector CR to set HostPID and
FELIX_FLOWLOGSCOLLECTPROCESSPATH - enterprise flow-log behavior (OSS flow logs go
through Goldmane, not LogCollector). The installation hook now fetches the
LogCollector and records whether process-path collection is on; the node modifier
sets HostPID and the env from it. LogCollector is gone from NodeConfiguration and
the core controller.
The kube-controllers component renders from a generic config (name, rules,
enabled controllers, extra env, network policy) with no IsEnterprise or
component-name branching. es-calico-kube-controllers assembly and its constants
live in pkg/enterprise and fill that config; calico-kube-controllers still
assembles in render.
The installation extension hook creates the kube-controllers metrics serving
keypair and returns it as a managed keypair; a calico-kube-controllers modifier
mounts it onto the deployment (env, volume, mount, cert-management init container,
hash annotation). The kube-controllers render base no longer carries
MetricsServerTLS, and the installation controller no longer creates that
certificate. The component reports a config-driven modifier key so the shared
es-calico-kube-controllers deployment, which leaves it empty, is never decorated.
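A sketch of a config-driven modifier key, where an empty key opts a component out of decoration. `KubeControllersConfig`, `decorate`, and the `Secret/metrics-tls` object are hypothetical; only the idea that the es-calico-kube-controllers config leaves the key empty comes from the PR.

```go
package main

import "fmt"

type KubeControllersConfig struct {
	Name        string
	ModifierKey string // empty means "never decorate"
}

type kubeControllers struct{ cfg KubeControllersConfig }

func (k kubeControllers) Objects() []string   { return []string{"Deployment/" + k.cfg.Name} }
func (k kubeControllers) ModifierKey() string { return k.cfg.ModifierKey }

type Modifier func([]string) []string

// decorate applies the modifier for the component's key; an empty key is never looked up.
func decorate(mods map[string]Modifier, c kubeControllers) []string {
	objs := c.Objects()
	key := c.ModifierKey()
	if key == "" {
		return objs
	}
	if m, ok := mods[key]; ok {
		return m(objs)
	}
	return objs
}

func main() {
	mods := map[string]Modifier{
		"calico-kube-controllers": func(o []string) []string { return append(o, "Secret/metrics-tls") },
	}
	calico := kubeControllers{KubeControllersConfig{Name: "calico-kube-controllers", ModifierKey: "calico-kube-controllers"}}
	es := kubeControllers{KubeControllersConfig{Name: "es-calico-kube-controllers"}}
	fmt.Println(decorate(mods, calico))
	fmt.Println(decorate(mods, es))
}
```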
calico-kube-controllers renders as pure OSS now: the common rules plus the node
and loadbalancer controllers, no IsEnterprise. The enterprise extension layers on
the rest through a modifier - the enterprise RBAC, the service/federatedservices/usage
controllers, the metrics serving TLS, and the WAF v3 (Gateway API add-on) surface
(the WASM env, the in-process admission webhook, and the network policy ingress rule).
The installation hook produces the controller-side inputs the modifier can't (the
webhook keypair, the merged wasm pull secret, the resolved wasm image, the operator CA)
and hands them over through the render context; the WASM image resolves with the same
GetReference the base uses, via the ImageSet the hook reads itself.

The WAF/WASM symbols and the es-kube-controllers pull-secret helper move to pkg/enterprise.
The base kube-controllers config no longer carries any WAF/WASM fields.
The installation controller's variant-specific FelixConfiguration defaulting now
goes through a FelixConfigDefaulter companion interface, so the enterprise
provider-specific dnsTrustedServers default (openshift-dns, rke2-coredns) lives in
the enterprise extension instead of an IsEnterprise branch in setDefaultsOnFelixConfiguration.
It is a companion rather than part of ExtendContext because felix defaulting persists
early in reconcile, before ExtendContext runs.
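A sketch of the `FelixConfigDefaulter` companion, assuming a method that mutates the spec and reports whether anything changed so the controller knows to persist it. The method name, the provider strings, and the trusted-server values are illustrative assumptions, not the operator's exact strings.

```go
package main

import "fmt"

// FelixConfigurationSpec is a stand-in with only the field this sketch defaults.
type FelixConfigurationSpec struct {
	DNSTrustedServers *[]string
}

// FelixConfigDefaulter is the optional companion; it reports whether it changed fc.
type FelixConfigDefaulter interface {
	DefaultFelixConfiguration(provider string, fc *FelixConfigurationSpec) bool
}

type enterpriseInstallation struct{}

func (enterpriseInstallation) DefaultFelixConfiguration(provider string, fc *FelixConfigurationSpec) bool {
	if fc.DNSTrustedServers != nil {
		return false // never overwrite a value that is already set
	}
	var servers []string
	switch provider {
	case "OpenShift":
		servers = []string{"k8s-service:openshift-dns/dns-default"}
	case "RKE2":
		servers = []string{"k8s-service:kube-system/rke2-coredns-rke2-coredns"}
	default:
		return false
	}
	fc.DNSTrustedServers = &servers
	return true
}

// setDefaults runs early in reconcile, before any ExtendContext hook.
func setDefaults(ext any, provider string, fc *FelixConfigurationSpec) bool {
	if d, ok := ext.(FelixConfigDefaulter); ok {
		return d.DefaultFelixConfiguration(provider, fc)
	}
	return false // core: no defaulter registered
}

func main() {
	fc := &FelixConfigurationSpec{}
	changed := setDefaults(enterpriseInstallation{}, "OpenShift", fc)
	fmt.Println(changed, *fc.DNSTrustedServers)
}
```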
Strip the enterprise-only fields off APIServerConfiguration
(ManagementCluster, ManagementClusterConnection, ApplicationLayer,
RequiresQueryServer, the query-server cert, and KeyValidatorConfig) so
the base render is variant-blind. A new apiserver controller hook fetches
the enterprise CRs, builds the trusted bundle and query-server cert, and
resolves the L7 sidecar images, stashing them on the render context. The
modifier reads that slot and find-or-creates the deployment skeleton when
the base did not render an aggregation server. APIServerPolicy becomes
extensible so the OIDC egress rule and L7 ingress port move to a policy
modifier too.
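A find-or-create step can be sketched over a stub object list. `Deployment`, `Container`, `findOrCreateDeployment`, `apiserverModifier`, and the `calico-apiserver` and `tigera-queryserver` names are hypothetical; the point is that the modifier works whether or not the base rendered the deployment.

```go
package main

import "fmt"

type Container struct{ Name string }

type Deployment struct {
	Name       string
	Containers []Container
}

// findOrCreateDeployment returns the named deployment, appending an empty skeleton if absent.
func findOrCreateDeployment(objs []any, name string) ([]any, *Deployment) {
	for _, o := range objs {
		if d, ok := o.(*Deployment); ok && d.Name == name {
			return objs, d
		}
	}
	d := &Deployment{Name: name}
	return append(objs, d), d
}

// apiserverModifier adds an enterprise container, creating the deployment if needed.
func apiserverModifier(objs []any) []any {
	objs, d := findOrCreateDeployment(objs, "calico-apiserver")
	d.Containers = append(d.Containers, Container{Name: "tigera-queryserver"})
	return objs
}

func main() {
	objs := apiserverModifier(nil)
	fmt.Println(len(objs), objs[0].(*Deployment).Containers)
}
```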
…rs package

RenderContext is a render-phase input (what a modifier consumes), and
ControllerContext is a controller-phase concept (the data and machinery a
controller gathers). Neither is part of the extension mechanism itself, so
having them in pkg/extensions was backwards. Move RenderContext to pkg/render
and ControllerContext (plus ControllerName) to a new pkg/controller/contexts.

extensions already imported render and render never imports extensions, so the
direction is clean: extensions and contexts both depend on render, and
extensions depends on contexts. No cycle.
Each extension had its own one-line accessor doing the same type assertion
against RenderContext.Extension. Replace that boilerplate with a single generic
render.ExtractExtensionData[T] and have the per-component accessors call it.
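A minimal sketch of the generic accessor, assuming it returns the value plus an ok flag rather than panicking (the PR does not say which). The `installationRenderData` fields and the `nodeData` wrapper are hypothetical.

```go
package main

import "fmt"

type RenderContext struct {
	Extension any // opaque slot
}

// ExtractExtensionData asserts the opaque slot back to T, reporting whether it held a T.
func ExtractExtensionData[T any](rc RenderContext) (T, bool) {
	v, ok := rc.Extension.(T)
	return v, ok
}

type installationRenderData struct {
	PrometheusKeypair  string
	CollectProcessPath bool
}

// nodeData is a per-component accessor reduced to one call.
func nodeData(rc RenderContext) (installationRenderData, bool) {
	return ExtractExtensionData[installationRenderData](rc)
}

func main() {
	rc := RenderContext{Extension: installationRenderData{PrometheusKeypair: "kp", CollectProcessPath: true}}
	d, ok := nodeData(rc)
	fmt.Println(d, ok)
}
```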
The stub component and applyExtensions helper for exercising a modifier through
Set.Decorate were copy-pasted into the extensions, render, and enterprise test
packages. Pull them into pkg/extensions/extensionstest so there's one copy, and
have the three test packages import it.
The flat pkg/enterprise mixed every component's extension in one package. Split
it so each subpackage maps to the render component it extends and exposes a
Register func that New() composes:

  enterprise/typha, guardian, apiserver, windows - self-contained.
  enterprise/installation - the installation controller hook plus the node and
    calico-kube-controllers modifiers, which share installationRenderData (kept
    internal here).
  enterprise/kubecontrollers - the es-calico-kube-controllers assembly plus the
    enterprise kube-controllers cluster role rules it shares with the calico
    modifier (exported as KubeControllersEnterpriseCommonRules).

The shared felix reporter-port helpers move with node (exported as
NodeReporterPort/ValidateReporterPort) so the windows hook can reuse them.
logstorage now imports enterprise/kubecontrollers for the es-kube-controllers
symbols instead of the old flat package. Each subpackage gets its own test
suite; the decorate helper is the shared extensionstest package.
The enterprise integration cases (managed-cluster resources, impersonation,
SCC, the enterprise GuardianPolicy) ran base render plus the real modifier, so
they pulled enterprise.New() into the render test package. Move them next to the
modifier in pkg/enterprise/guardian, where they run against the real Set.

What stays in render is the OSS render path: the OSS guardian-access policy and
the public-CA deployment cases. Those never run the modifier, so render_test no
longer depends on pkg/enterprise for guardian.
The CalicoEnterprise windows cases ran base render plus the real modifier (and
needed the enterprise image overrides), so they pulled enterprise.New() into the
render test package. Move them next to the modifier in pkg/enterprise/windows.

The OSS render tests stay put, now built with an empty imageoverride.New()
instead of the enterprise Set's overrides, so render_test no longer depends on
pkg/enterprise for windows.
node_enterprise_test.go ran the real ExtendContext hook and node/typha
modifiers, so move it whole into pkg/enterprise/installation next to those.

node_test.go only ever rendered base output (no modifier), so its CalicoEnterprise
cases are variant-blindness checks and stay put. Swap its cfg from the enterprise
Set's image overrides to an empty imageoverride.New() so render_test no longer
depends on pkg/enterprise for node.
The apiserver OSS tests went through the enterprise modifier only to pick up the
Calico-variant cleanup deletes, but those deletes come from the base render
itself (and the cleanup modifier is covered in pkg/enterprise/apiserver). Render
the base component directly instead.

With apiserver no longer reaching for the enterprise Set, nothing in the render
test suite uses it, so remove enterprise_setup_test.go. pkg/render no longer
depends on pkg/enterprise, in production or test code.
…lers

The es-kube-controllers assembly lives in pkg/enterprise/kubecontrollers, but its
render tests sat in pkg/render/kubecontrollers and pulled the enterprise package
into the render test deps. Move them next to the assembly. The calico-kube-controllers
(OSS) tests stay put, and pkg/render/kubecontrollers no longer depends on
pkg/enterprise.
The ManagementClusterConnection controller renders Guardian for both OSS (Whisker)
and Enterprise (managed-cluster tunnel), and had IsEnterprise branches woven through
its reconcile: the management/managed mutual-exclusion check, the managed cluster
version (CNX vs Calico), and the license-gated egress network policy.

Add a clusterconnection ControllerExtension (pkg/enterprise/clusterconnection) with
Validate (the mutual-exclusion check) and ExtendContext (CNXVersion + the license
check), stashing the result as a render.GuardianRenderData in the render context.
The controller reads it back generically to fill GuardianConfiguration, falling
back to OSS defaults (CalicoVersion, the Whisker Guardian client keypair, egress
disabled) when no extension is registered. No IsEnterprise left in the reconcile
data path.
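The fallback can be sketched as a type assertion with a default branch. The field names and the `calicoVersion` constant are hypothetical; the real `GuardianConfiguration` also carries the Whisker Guardian client keypair, omitted here.

```go
package main

import "fmt"

const calicoVersion = "v0.0.0-calico" // hypothetical placeholder

// GuardianRenderData is what the enterprise extension stashes in the render context.
type GuardianRenderData struct {
	Version      string
	EgressPolicy bool
}

type RenderContext struct{ Extension any }

type GuardianConfiguration struct {
	Version      string
	EgressPolicy bool
}

// guardianConfig reads extension data generically, falling back to OSS defaults.
func guardianConfig(rc RenderContext) GuardianConfiguration {
	if d, ok := rc.Extension.(GuardianRenderData); ok {
		return GuardianConfiguration{Version: d.Version, EgressPolicy: d.EgressPolicy}
	}
	// No extension registered: OSS defaults, egress policy disabled.
	return GuardianConfiguration{Version: calicoVersion, EgressPolicy: false}
}

func main() {
	fmt.Println(guardianConfig(RenderContext{}))
	fmt.Println(guardianConfig(RenderContext{Extension: GuardianRenderData{Version: "v-cnx", EgressPolicy: true}}))
}
```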

Still variant-aware in the controller: the ManagementClusterConnection CR
validation and defaulting (impersonation, public CA) and the ImageSet selection.
Those would need a CR validator/defaulter companion to move, left as a follow-up.
Move the Calico Enterprise cases (enterprise images, the license/tier-gated
calico-system policy, and impersonation) out of clusterconnection_controller_test.go
into clusterconnection_controller_enterprise_test.go. The generic controller
mechanics (default reconcile, the guardian finalizer, proxy settings, and the
tigerastatus conditions) stay in the main file. Same specs, just relocated; the
package-scope proxy helpers are shared between the two files.
ExtendContext handed back a render context while the controller context it was
given already embeds one, leaving two near-identical copies in play. Return the
updated controller context instead, so a single context flows through the
reconcile and callers read its embedded render context.
The extension Set computes its variant's controller-phase options once at startup
(ComputeOptions, run from main) and carries them onto each ControllerContext, so
the shared types no longer name an enterprise-only setting. The ControllerContext
holds an opaque Options the variant's hooks assert back out; the enterprise
multi-tenant flag and its discovery live entirely in pkg/enterprise.
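A sketch of computing variant options once and carrying them opaquely, under hypothetical names (`Set.computeOptions`, `NewControllerContext`, `enterpriseOptions`, `MultiTenant`). The counter in the check shows the discovery runs once no matter how many contexts are built.

```go
package main

import "fmt"

// ControllerContext carries options the shared code never inspects.
type ControllerContext struct {
	Options any // opaque; only the variant's own hooks know its type
}

type Set struct {
	computeOptions func() (any, error) // nil for the core build
	options        any
}

// ComputeOptions runs once from main, before any controller starts.
func (s *Set) ComputeOptions() error {
	if s.computeOptions == nil {
		return nil
	}
	o, err := s.computeOptions()
	if err != nil {
		return err
	}
	s.options = o
	return nil
}

func (s *Set) NewControllerContext() ControllerContext {
	return ControllerContext{Options: s.options}
}

// enterpriseOptions lives in the enterprise package in this model.
type enterpriseOptions struct{ MultiTenant bool }

// multiTenant is how an enterprise hook would assert its options back out.
func multiTenant(cc ControllerContext) bool {
	opts, _ := cc.Options.(enterpriseOptions)
	return opts.MultiTenant
}

func main() {
	s := &Set{computeOptions: func() (any, error) { return enterpriseOptions{MultiTenant: true}, nil }}
	if err := s.ComputeOptions(); err != nil {
		panic(err)
	}
	fmt.Println(multiTenant(s.NewControllerContext()))
}
```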