Releases: Cloudzero/cloudzero-agent

v1.2.12

18 Jun 13:44

1.2.12 (2026-06-18)

Release 1.2.12 is a maintenance release focused on deployment flexibility and reliability. Highlights include improved monitoring support, Secrets Store CSI Driver support for API key delivery, chart-wide environment variable injection (enabling egress proxy support), aggregator self-healing for persistent error states, and the ability to disable the webhook server when the experimental KubeState plugin is enabled.

Key Features

  • Selectable Monitoring Discovery (ServiceMonitors or Annotations): When monitoring is enabled, the chart now advertises the agent's metrics via either Prometheus Operator ServiceMonitor resources or prometheus.io/* annotations, rather than emitting both at once. This refines the monitoring integration introduced for validation in 1.2.11. components.monitoring.discovery.method (auto | serviceMonitors | annotations, default auto) selects the mechanism: serviceMonitors renders the ServiceMonitor + PrometheusRule bundle (requires the Prometheus Operator CRDs), while annotations emits prometheus.io/* annotations only. Monitoring itself is gated behind components.monitoring.enabled (default false). See helm/docs/monitoring-infrastructure.md for the full reference. Note: the default install now emits nothing monitoring-related; see Upgrade Steps below.

  • Secrets Store CSI Driver for API Key: New components.apiKey.secretProviderClass value mounts the CloudZero API key via the Secrets Store CSI Driver instead of a Kubernetes Secret, allowing the key to be sourced directly from an external vault (validated against AWS Secrets Manager and Azure Key Vault). The existing apiKey and existingSecretName values continue to work and take priority. This replaces the broader extraVolumes/extraVolumeMounts mechanism from community PR #772 with a focused, single-purpose option.

  • Chart-Wide Environment Injection: New defaults.env value injects a list of environment variables into every container the chart manages. The most common use case is HTTP_PROXY / HTTPS_PROXY / NO_PROXY for clusters behind a corporate egress proxy — every bundled binary honors http.ProxyFromEnvironment — but the field covers any chart-wide binary configuration. Chart-emitted entries (e.g. SERVER_PORT, NODE_NAME) always win on name collision, and component-specific env (e.g. server.env) overrides defaults.env.

  • Optional Webhook Server: The admission webhook server can now be disabled via components.webhookServer.enabled (true | false | "auto", default "auto", which maps to enabled today). When disabled, the entire webhook surface (Deployment, Service, ConfigMap, HPA, PDB, ValidatingWebhookConfiguration, certificate/issuer/secret, init-cert RBAC, and the backfill job) is removed cleanly with no dangling objects. This is primarily groundwork for future deployment topologies — disabling the webhook server is only appropriate for clusters already relying on the experimental KubeState-based collection path, and is not recommended for general use. Default behavior is unchanged. The legacy insightsController.enabled value still takes precedence when explicitly set and is now marked deprecated.
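
As a rough illustration of the values described above, the fragment below sets the three new options together. It is a sketch, not a reference configuration: the SecretProviderClass name and proxy addresses are placeholders, and two details are assumptions based on common Kubernetes conventions rather than anything stated in these notes, namely that secretProviderClass takes the name of an existing SecretProviderClass and that each defaults.env entry uses the name/value shape.

```yaml
# Sketch only; placeholder values and assumed shapes are marked below.
components:
  apiKey:
    # Assumed: the name of a SecretProviderClass that already exists
    # in the agent namespace (placeholder name).
    secretProviderClass: cloudzero-api-key
  webhookServer:
    # true | false | "auto"; "auto" is the default and maps to enabled today.
    enabled: "auto"
defaults:
  env:
    # Assumed entry shape, mirroring a Kubernetes container env list.
    - name: HTTPS_PROXY
      value: http://proxy.example.internal:3128
    - name: NO_PROXY
      value: localhost,127.0.0.1,.svc,.cluster.local
```

Because apiKey and existingSecretName take priority, secretProviderClass presumably only takes effect when neither of those is set.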

Reliability Improvements

  • Aggregator Self-Healing: The aggregator now detects and recovers from a pod stuck in a persistent 5xx state. /healthz returns 503 when the /collector 5xx rate exceeds 20% (with at least 3 failures in the last 60s), so readiness pulls the pod from the Service within ~30s. A new /livez endpoint applies a 5-minute sticky latch to trigger a pod restart if the broken state persists. Every 5xx response now sets Connection: close so clients reconnect and re-balance immediately. The chart adds aggregator.readinessProbe/aggregator.livenessProbe defaults with per-container override slots.

  • Backfill Memory Footprint: The backfill job no longer reuses the webhook server's in-memory SQLite store. It now uses a streaming store that batches records (500 at a time) directly to the collector via remote_write, holding memory flat at ~53 MiB regardless of cluster size (previously ~597 MiB at 1M resources). The unnecessary pusher, housekeeper, and secret monitor no longer run during backfill.

  • GC-Induced OOM Prevention: All agent binaries now set GOMEMLIMIT to 90% of the detected cgroup memory limit via automemlimit, so the Go garbage collector reclaims aggressively before RSS reaches the pod limit. This fixes customer-reported OOM kills of the backfill job under allocation churn.

Security Improvements

  • Webhook Server Credential Surface: As part of ongoing security hardening, the webhook server no longer mounts the CloudZero API key. It now pushes metrics exclusively to the in-cluster aggregator, so the key mount, the Authorization header, and the in-process secret-rotation monitor have been dropped from its code path. This is an incremental reduction of the credential mount surface — the API key secret itself is unchanged, and other components that need it (the aggregator's shipper, the agent init containers, the config-loader job) are unaffected.

Build and Infrastructure

  • Fixed KUTTL end-to-end suites that were silently no-op'ing (command/spec: wrapper bugs) and wired the freshly built image into the matrix kind tests so the suites genuinely execute

Upgrade Steps

To upgrade to version 1.2.12, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.12

The default install no longer emits any monitoring resources or prometheus.io/* annotations. If you relied on annotation-based scrape discovery in a prior version, restore it explicitly:

components:
  monitoring:
    enabled: true
    discovery:
      method: annotations

v1.2.11

23 Apr 19:16

1.2.11 (2026-04-23)

Release 1.2.11 is a maintenance release focused on integration with existing observability tooling, better support diagnostics, and a round of dependency and security updates.

Key Features

  • Prometheus Operator Integration: The chart now optionally generates ServiceMonitor and PrometheusRule resources so customers running the Prometheus Operator can monitor the agent out of the box. The bundle includes four ServiceMonitors covering every agent component and 14 alerts spanning critical failures, degraded operation, data quality issues, and container health. Shipper thresholds are derived automatically from aggregator.database.costMaxInterval so alerts stay aligned with configured upload cadence. Enable via components.monitoring.enabled, which supports an auto mode that only renders the resources when the Prometheus Operator CRDs are present on the cluster. See helm/docs/monitoring-infrastructure.md for the full alert catalog.

  • User-Supplied Alloy Configuration: The existing prometheusConfig.scrapeJobs.additionalScrapeJobs extension point only applies to Prometheus mode. Clustered (Alloy) mode now has the equivalent capability via a new prometheusConfig.additionalAlloyConfig value, which accepts a raw Alloy River string that is merged into the generated pipeline. User components can forward to prometheus.remote_write.cloudzero.receiver to ship metrics through the same path as the built-in scrape jobs, making it possible to extend clustered mode with custom scrape targets or transformations.
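
A minimal sketch of the Alloy extension point described above, assuming a hypothetical workload that exposes metrics at my-app.default.svc:9090. The component label custom_app and the target address are illustrative only; prometheus.remote_write.cloudzero.receiver is the receiver named in these notes.

```yaml
prometheusConfig:
  additionalAlloyConfig: |
    // Hypothetical extra scrape job, merged into the generated pipeline.
    prometheus.scrape "custom_app" {
      targets    = [{"__address__" = "my-app.default.svc:9090"}]
      forward_to = [prometheus.remote_write.cloudzero.receiver]
    }
```

Forwarding to the built-in receiver, rather than defining a separate remote_write component, keeps custom metrics on the same shipping path as the chart's own scrape jobs.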

Bug Fixes

  • Clustered Mode Startup: Fixed a startup failure in clustered mode where the Alloy container was unable to initialize its storage path. Clustered mode installations no longer require any values-level workaround.

Improvements

  • Webhook and TLS Diagnostics: The anaximander support script now collects validating webhook configuration (including certificate bundle validation) and TLS certificate metadata from the agent namespace. Certificate subjects, issuers, expiration dates, fingerprints, and SANs are included; private key material is never collected. This targets one of the most common silent failure modes — a missing or invalid caBundle combined with failurePolicy: Ignore — so CloudZero support can diagnose webhook issues from a single diagnostic bundle.

Build and Infrastructure

  • Go toolchain updated to 1.26.2
  • Embedded Alloy binary updated to v1.15.1
  • Prometheus library updated to 0.311.2
  • Kubernetes client libraries aligned at v0.36.0
  • Numerous dependency updates across Go modules, GitHub Actions, and tooling
  • Removed a transitive dependency on github.com/docker/docker from the production build

Upgrade Steps

To upgrade to version 1.2.11, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.11

v1.2.10

25 Mar 13:47

1.2.10 (2026-03-23)

Release 1.2.10 brings significant new integrations, security improvements, enhanced diagnostics configurability, and numerous bug fixes. Highlights include Istio service mesh support, a more secure cAdvisor collection mode, granular control over validator diagnostic checks, and the switch to a CloudZero-maintained Alloy binary.

Key Features

  • Istio Service Mesh Integration: Added comprehensive Istio support with automatic detection and runtime validation. The agent detects sidecar and ambient mesh modes, validates cluster ID configuration for multicluster environments, and automatically applies Istio port exclusion annotations. Includes DestinationRule and VirtualService templates for traffic fencing. Configure with integrations.istio.enabled (defaults to auto-detection) and integrations.istio.clusterID for multicluster setups.

  • Direct Kubelet cAdvisor Collection: New integrations.cAdvisor.directNodeAccess.enabled option collects cAdvisor metrics directly from node kubelets (port 10250) instead of through the Kubernetes API server proxy. This significantly improves security posture by requiring only nodes/metrics RBAC permission instead of nodes/proxy, which grants cluster-wide remote code execution capability. Works in both Prometheus and Alloy modes. The previous prometheusConfig.scrapeJobs.cadvisor.enabled is now deprecated in favor of integrations.cAdvisor.enabled.

  • Per-Check Validator Diagnostics: Replaced the per-stage enforce boolean with a granular per-check type system. Each diagnostic check can now be configured as required (blocks pod startup on failure), optional (warns but allows startup), informative (always passes, gathers information), or disabled (skipped). Configure via components.validator.checks.<stage>.<check>: <type> in values.yaml.

  • CloudZero Alloy Fork: Clustered mode now uses a CloudZero-maintained Alloy binary embedded directly in the agent image, eliminating the separate Grafana Alloy container image pull. Users can still override via clusteredNode.image and the new clusteredNode.command value.
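
The options above might be combined in a values file along the following lines. This is a sketch under stated assumptions: the cluster ID is a placeholder, and postStart is an assumed spelling of the validator stage key, since the notes give only the pattern components.validator.checks.<stage>.<check>: <type>. The check name webhook_server_reachable is taken from the bug fixes listed for this release.

```yaml
integrations:
  istio:
    enabled: true              # leave unset to keep the default auto-detection
    clusterID: cluster-east-1  # placeholder; relevant for multicluster setups
  cAdvisor:
    enabled: true
    directNodeAccess:
      enabled: true            # collect from kubelets (port 10250), not the API server proxy
components:
  validator:
    checks:
      # Assumed stage key; types are required | optional | informative | disabled.
      postStart:
        webhook_server_reachable: optional
```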

Bug Fixes

  • Webhook Service Name in Validator: Fixed incorrect service name reference in the validator ConfigMap (cloudzero-agent-cz-webhook-svc vs actual cloudzero-agent-cz-webhook). This caused the webhook_server_reachable post-start check to fail with DNS lookup errors, adding ~70 seconds of unnecessary delay to pod startup.

  • Image Pull Secrets Inheritance: Fixed defaults.image.pullSecrets not being applied to 6 of 8 workload templates. All templates now correctly use the fallback chain: component-specific image.pullSecrets → defaults.image.pullSecrets → deprecated top-level imagePullSecrets.

  • cAdvisor Schema Constraint: Removed an enum: [true] constraint on prometheusConfig.scrapeJobs.cadvisor.enabled that made it impossible to disable the cAdvisor scrape job via values without bypassing schema validation.

Improvements

  • Scout Configuration Override: Scout-detected values (region, cloud account ID, cluster name) from instance metadata now always override customer-provided values, with customer values used only as a fallback when detection returns empty. A warning is logged when detected and configured values differ.

  • EndpointSlice Service Discovery: Migrated Prometheus and Alloy internal service discovery from the deprecated role: endpoints to role: endpointslice, ensuring forward compatibility with Kubernetes 1.33+ where the Endpoints API is deprecated.

  • Active Series Metric: Now collects prometheus_remote_write_wal_storage_active_series from Alloy's self-scrape job, useful for diagnosing memory usage in high-volume deployments.

  • Configuration Documentation: Added extensive inline documentation to Alloy and Prometheus configuration templates, including architecture diagrams, component reference tables, and data flow explanations for each scrape job.

Support Tooling

  • Label/Annotation Enumeration: New scripts/kube-list-labels-annotations.sh script enumerates all labels and annotations in use across a cluster with frequency counts, useful for debugging label-based cost allocation.

  • Diagnostic Script Improvements: The anaximander diagnostic script now tracks command success/failure with a summary report, and uses updated label selectors matching the current naming conventions.

Experimental Features

  • KubeState Plugin: Added an experimental option to replace the kube-state-metrics subchart with an embedded Alloy plugin for Kubernetes state metrics collection. Enable with components.agent.kubeState.enabled: true in clustered mode.
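
For reference, the two settings named above would look roughly like this in a values file. Both keys come from the release notes (components.agent.mode was introduced in 1.2.9); nothing else about the plugin's configuration is assumed here.

```yaml
components:
  agent:
    mode: clustered   # the KubeState plugin is described as a clustered-mode option
    kubeState:
      enabled: true   # experimental; replaces the kube-state-metrics subchart
```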

Build and Infrastructure

  • Go version updated to 1.25.7
  • Base image updated to latest distroless/static-debian12
  • Updated copyright year to 2026
  • Numerous dependency updates (Prometheus, Kubernetes client libraries, etc.)

Upgrade Steps

To upgrade to version 1.2.10, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.10

v1.2.9

25 Nov 17:09

1.2.9 (2025-11-25)

Release 1.2.9 focuses on quality, scalability, configurability, and consistency. It includes significant improvements to the organization and configurability of the Helm chart, Prometheus 3.x support (with 2.x support preserved for now), an HPA for the webhook server, as well as some early previews of experimental functionality we hope to stabilize over the next few releases.

Key Improvements

  • Prometheus 3.x Support: Upgraded default Prometheus version from 2.55.1 to 3.7.3 with automatic backward compatibility. The chart detects Prometheus version and uses the appropriate agent mode flag (--agent for 3.x, --enable-feature=agent for 2.x). Customers using custom Prometheus 2.x images will continue to work without changes.

  • Webhook Server Autoscaling: Added Horizontal Pod Autoscaler support for the webhook server, enabling automatic scaling based on CPU and memory utilization. Enable with insightsController.autoscaling.enabled: true.
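
In values form, the autoscaling switch mentioned above is a single flag. Replica bounds and utilization targets are not listed in these notes, so this sketch leaves them at whatever defaults the chart provides.

```yaml
insightsController:
  autoscaling:
    enabled: true
```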

Configuration Improvements

  • Unified Label/Annotation System: Refactored label and annotation generation with new generateLabels and generateAnnotations helpers. Labels now follow Kubernetes recommended practices with app.kubernetes.io/name for component identity and app.kubernetes.io/part-of: cloudzero-agent for chart membership.

  • Component-Specific Metadata: Added support for component-specific labels, annotations, podLabels, and podAnnotations across all workload types, providing fine-grained control over Kubernetes metadata.

  • Centralized Resource Names: Implemented unified resource naming pattern ({release-name}-cz-{component}) across all Kubernetes resources, improving consistency and enabling programmatic name reconstruction.

  • Unified Mode Configuration: New components.agent.mode property consolidates deployment mode selection with values: agent, server, federated, and clustered. Legacy properties continue to work with automatic derivation.

  • Cohesive Replicas System: New defaults.replicas property provides global default with mode-specific constraints.

  • Persistent Volume Strategy: Changed deployment strategy to Recreate when persistent volumes are enabled, preventing volume mount conflicts during rolling updates.

Reliability Improvements

  • Subchart Isolation: Excluded .global sections from configuration checksum calculation, preventing unnecessary pod restarts when parent chart globals change in subchart deployments.

  • Cert-Manager Compatibility: Fixed ArgoCD reconciliation failures by removing empty caBundle key from webhook configuration when using cert-manager (which injects the CA bundle via annotation).

  • DNS Resolution: Updated cAdvisor configuration to use fully qualified domain name kubernetes.default.svc.cluster.local:443 for improved DNS resolution reliability.

Support Tooling

  • Diagnostic Script: Added scripts/anaximander.sh for comprehensive diagnostic information gathering. Customers can run this script to collect logs, configurations, resource status, and environment context for CloudZero support.

  • Post-Install Guidance: Updated Helm NOTES.txt with improved post-installation guidance and next steps.

Experimental Features

The following features are experimental and may change in future releases:

  • Grafana Alloy Integration: Added Grafana Alloy as an alternative to Prometheus for metrics collection in high-volume environments. Configure with components.agent.mode: clustered to enable.

  • GPU Metrics Collection: Added NVIDIA DCGM GPU metrics scraping. Enable with prometheusConfig.scrapeJobs.gpu.enabled: true. Note that this covers collection only; CloudZero does not yet support cost allocation based on GPU.
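
The two experimental switches above are shown together below purely for illustration. They are separate settings, and these notes do not say whether the GPU scrape job also applies in clustered mode, so treat the combination as an assumption.

```yaml
components:
  agent:
    mode: clustered     # experimental Grafana Alloy collection path
prometheusConfig:
  scrapeJobs:
    gpu:
      enabled: true     # NVIDIA DCGM scraping; collection only
```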

Upgrade Steps

This release includes changes to immutable Kubernetes selectors, requiring the --force flag to recreate affected resources:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.9 --force

v1.2.8

10 Oct 13:43

1.2.8 (2025-10-10)

Release 1.2.8 is a maintenance release focused on quality assurance improvements, with enhanced testing infrastructure, better configuration validation, and useful configuration enhancements.

Key Features

  • Configurable Labels and Annotations: All Kubernetes resources now support customizable labels and annotations through Helm values, providing better integration with organizational policies and tooling.
  • Enhanced Build System: Specialized build configurations now support environment-specific customization (e.g., Replicated builds), simplifying multi-environment deployments.

Configuration Improvements

  • Centralized Validation: Moved apiKey/existingSecretName validation from Helm templates to JSON Schema, centralizing all configuration validation in a single location for improved maintainability.
  • Service Port Protocol: Added configurable protocol field for webhook server service ports, improving compatibility with service mesh configurations.

Quality Assurance

  • CI/CD Infrastructure Overhaul: Restructured the entire CI testing infrastructure to support more comprehensive testing during development, including expanded Kubernetes version coverage (now testing against 1.33 and 1.34) and improved test isolation.
  • Unified Testing Framework: Introduced consolidated testing infrastructure with new test-all target covering unit tests, integration tests, Helm tests, and KUTTL end-to-end tests.
  • Workflow Validation: Added actionlint for GitHub Actions workflow validation and markdownlint-cli2 for documentation quality checks.
  • Documentation Expansion: Significantly expanded project documentation with comprehensive guides for development, testing, architecture, and troubleshooting.

Upgrade Steps

To upgrade to version 1.2.8, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.8

v1.2.7

27 Aug 00:15
1bb74d6

1.2.7 (2025-08-22)

Release 1.2.7 removes the dependency on the Bitnami kubectl container image, replacing it with a new cloudzero-certifik8s executable. This is critical because Broadcom (which controls Bitnami through VMware) is retiring the old Bitnami images.

Key Features

  • Go-Based Certificate Management: Complete transformation from bash scripts to modern Go-based certificate management with the new cloudzero-certifik8s tool, providing enhanced security, testability, and maintainability.
  • Comprehensive Security Context: Added security context to all Kubernetes resources (pods, containers, jobs, deployments, daemonsets) with secure defaults and component-specific overrides.
  • Enhanced Shipper Reliability: Improved shipper logging and fixed a replay file processing bug that could cause successful uploads to be incorrectly abandoned.

Security Enhancements

  • Certificate Management Security: Replaced bash scripts with secure Go-based certificate generation, eliminating dependency on deprecated bitnami/kubectl Docker image and implementing proper RBAC with reduced permissions.
  • Security Context Implementation: Added comprehensive security context to all Helm templates with secure defaults (runAsUser: 65534, runAsNonRoot: true) and proper property filtering for pod vs container contexts.
  • Checkov Security Compliance: Enabled security context rules (CKV_K8S_29, CKV_K8S_30, CKV_K8S_23) after implementing proper security contexts across all resources.
  • RBAC Improvements: Enhanced cluster-scoped permissions for certificate management with resource-specific restrictions and proper Kubernetes client integration.

Shipper Reliability Improvements

Replay File Processing Fix:

  • Fixed critical bug where successfully uploaded files were incorrectly abandoned
  • Corrected replay request loop to iterate over reference IDs instead of URLs
  • Enhanced abandon operation logging with file-specific details (reference_id and reason)
  • Added comprehensive debug logging for replay request processing

Enhanced Logging:

  • Improved abandon operation logging to include file-specific details
  • Added debug logging for replay request processing
  • Fixed smoke test failures related to replay request processing

Configuration Enhancements

CloudAccountId Validation:

  • Enhanced JSON schema to allow quoted values for better user experience
  • Added support for quoted numeric and UUID values (e.g., '1234567890', '123e4567-e89b-12d3-a456-426614174000')
  • Implemented comprehensive test coverage for all quote scenarios
  • Added warning notes discouraging manual configuration of auto-detectable properties
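
As an illustration of the quoting change, either of the forms below is meant to pass schema validation as of this release. The values are the examples given above; placing cloudAccountId at the top level of the values file is an assumption.

```yaml
# Quoted numeric account ID (example value from the notes).
cloudAccountId: '1234567890'
# A quoted UUID is accepted in the same way:
# cloudAccountId: '123e4567-e89b-12d3-a456-426614174000'
```

Quoting keeps YAML from parsing a long numeric ID as a number, which is the usual reason a schema rejects it.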

Upgrade Steps

To upgrade to version 1.2.7, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.7

v1.2.6

07 Aug 02:42

1.2.6 (2025-08-05)

Release 1.2.6 introduces a new CronJob-based backfill system, comprehensive resource management improvements, and enhanced security and reliability features.

Key Features

  • Enhanced Shutdown Coordination: Implemented robust file-based shutdown coordination between collector and shipper containers with intelligent waiting mechanisms and timeout protection, ensuring graceful shutdown sequences.
  • Dual Backfill System: Implemented both a CronJob for scheduled runs (default: every 12 hours) and an immediate Job for instant execution on install, providing both immediate execution and configurable recurring runs for ongoing data collection.
  • Comprehensive Resource Management: Systematic refactoring of all components with centralized resource generation, providing consistent resource request/limit configurations across all containers.

Security Enhancements

  • Checkov Security Integration: Added comprehensive security analysis with Checkov to build system and CI, fixing multiple Kubernetes security violations including missing liveness and readiness probes.
  • Fail-Open Webhook Validation: Implemented true fail-open behavior for webhook validation with always-allow behavior, ensuring webhook validation never blocks Kubernetes resource operations.
  • Enhanced Health Monitoring: Added proper liveness and readiness probes to prometheus-config-reloader container, improving container health monitoring and automatic restart capabilities.
  • Comprehensive Security Context Implementation: Added configurable security context support to all Kubernetes pods and containers.

Additional Enhancements

  • Observability Improvements: Removed observability files on upload to prevent storage bloat and improve performance.
  • Scout Configuration Enhancement: Updated scout to return Google project number instead of project ID for improved metadata accuracy.
  • Cloud Account Validation: Added JSON Schema validation for cloudAccountId contents to ensure proper configuration.
  • Image Pull Secrets: Added image pull secrets support for config loader and helmless jobs for enhanced security.

Technical Improvements

  • Centralized Resource Generation: Created reusable helper functions for consistent resource configuration patterns across all templates.
  • Backward Compatibility: Maintained full backward compatibility through legacy precedence logic, ensuring existing deployments continue to work without changes.
  • Comprehensive Testing: Added 20 test suites with 87 total tests covering all fallback scenarios, security context functionality, and edge cases.

Resource Configuration Details

New Component Structure:

  • Core Components: components.agent.resources, components.aggregator.collector.resources, components.aggregator.shipper.resources, components.webhookServer.resources
  • Job Components: components.miscellaneous.configLoader.resources, components.webhookServer.backfill.resources, components.agent.federatedNode.resources
  • New Components: components.helmless.resources, components.initCertJob.resources
  • Specialized Components: components.agent.configmapReloader.resources, components.validator.resources, components.kubeStateMetrics.resources
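
A short sketch of how two of the paths listed above might be set. The quantities are placeholders, and the requests/limits shape is assumed to follow the standard Kubernetes container resources block; the notes name only the paths.

```yaml
components:
  agent:
    resources:
      requests:
        cpu: 250m        # placeholder quantity
        memory: 512Mi    # placeholder quantity
      limits:
        memory: 1Gi      # placeholder quantity
  aggregator:
    shipper:
      resources:
        requests:
          cpu: 100m      # placeholder quantity
          memory: 64Mi   # placeholder quantity
```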

Upgrade Steps

To upgrade to version 1.2.6, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.6

v1.2.5

25 Jul 15:02

1.2.5 (2025-07-25)

Release 1.2.5 is a critical maintenance release that fixes a webhook configuration issue affecting resource metadata collection. Due to a single-character difference in resource names (using singular instead of plural), the webhook server was not collecting the necessary information for labels and annotations. Customers on versions 1.2.3 and 1.2.4 should upgrade immediately.

Critical Fix

  • Webhook Configuration Fix: Fixed a critical bug where the webhook server was not collecting resource metadata due to incorrect resource name configuration. This affected label and annotation collection for all resources processed by the webhook.

Key Features

  • Enhanced Webhook Configuration: Fixed webhook misconfiguration issues and improved integration testing infrastructure with comprehensive validation and debugging capabilities.
  • AWS IMDSv1 Fallback Support: The CloudZero Agent's AWS scout implementation now gracefully falls back from IMDSv2 to IMDSv1 when the token endpoint is unavailable, ensuring compatibility with clusters that don't have IMDSv2 enabled. This maintains security preference for IMDSv2 while providing compatibility with IMDSv1-only environments.
  • Comprehensive Troubleshooting Guide: Added a troubleshooting guide covering quick diagnosis, component-specific troubleshooting, network policies, certificate issues, and scaling problems with clear escalation paths.

Additional Enhancements

  • Security Documentation: Significantly expanded SECURITY.md with detailed security considerations, vulnerability reporting procedures, and best practices for secure deployment.
  • Scout Error Messages: Enhanced scout configuration error messages with specific Helm chart parameter guidance, making troubleshooting more actionable.
  • Cloud Provider Detection: Added cloud provider information to cluster configuration for improved metadata collection and environment awareness.
  • Test Infrastructure: Improved webhook integration testing with centralized Kind cluster configuration, enhanced test maintainability, and comprehensive validation.
  • Dependency Updates: All third-party dependencies have been updated to the latest versions.

Technical Improvements

  • Webhook Reliability: Fixed service name resolution and improved webhook test validation with comprehensive debugging capabilities
  • Documentation Quality: Added systematic troubleshooting approach with label selector commands and component-specific diagnostic procedures
  • Build System: Enhanced test infrastructure with better organization and maintainability
  • AWS Metadata Service Compatibility: Implemented robust fallback mechanism for AWS metadata retrieval with clear error distinction between IMDSv2 and IMDSv1 failures

Upgrade Steps

⚠️ CRITICAL: Customers on versions 1.2.3 and 1.2.4 should upgrade immediately due to the webhook configuration fix.

To upgrade to version 1.2.5, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.5

v1.2.4

17 Jul 17:46

1.2.4 (2025-07-17)

Release 1.2.4 is a maintenance release that includes improved metrics filtering and collector interval adjustments for better performance. This release focuses on operational improvements, build efficiency, and enhanced visibility into metric processing.

Key Features

  • Optimized Collection Intervals: Increased cost metrics collection interval from 10 minutes to 30 minutes for better performance in smaller clusters, while reducing observability metrics timeout to 10 minutes to maintain cluster connectivity visibility.
  • Enhanced Scout Auto-Detection: The confload job now leverages the Scout system to automatically detect cloud environment metadata (region, account ID, cluster name) when these values are not explicitly provided, significantly simplifying deployment configuration.
  • Dramatic Docker Build Performance: Build times reduced from 2:30-3:00 minutes to ~12 seconds through multi-stage builds with platform-specific caching, selective file copying, and conditional dependency generation.
  • Dropped Metrics Tracking: The metric filter now provides visibility into filtered-out metrics through debug logging, making it easier to debug filter configurations and understand metric processing behavior.

Additional Enhancements

  • Backfiller Reliability: Fixed GroupVersionKind issues and race conditions in namespace and node processing, with comprehensive integration testing.
  • Test Infrastructure: Improved test reliability by fixing flaky tests related to file monitoring, file locking, and SQL timestamp formatting.
  • Development Tooling: Added semantic diff targets (*.{yaml,json}-semdiff) for better visibility into Helm template changes during development.
  • Dependency Management: Updated Dependabot to run on Wednesdays instead of Fridays for better alignment with patch release cycles.

Upgrade Steps

To upgrade to version 1.2.4, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.4

v1.2.3

03 Jul 14:45
a170573

1.2.3 (2025-07-03)

Release 1.2.3 introduces Cloud Service Provider Auto-Detection, significant Performance Optimizations for the admission controller, enhanced Istio Integration, and numerous reliability improvements. This release dramatically simplifies deployment configuration while improving performance and compatibility with service mesh environments.

Key Features

  • Cloud Service Provider Auto-Detection: The CloudZero agent now includes a comprehensive "scout" system that automatically detects cloud environment metadata including provider, region, account ID, and cluster name. This eliminates the need to manually configure these values in many deployments.
    • AWS Support: Automatically detects region and account ID from EC2 instance metadata
    • Google Cloud Support: Automatically detects region, project ID, and cluster name from GCE metadata
    • Azure Support: Automatically detects region and subscription ID from Azure IMDS
  • Webhook Server Optimization: The webhook server now explicitly requests only the Kubernetes resource types it needs instead of receiving all resources, significantly reducing network traffic and improving performance.
  • Enhanced Istio Integration: The webhook server now automatically includes sidecar.istio.io/inject: "false" annotation by default, providing seamless out-of-the-box compatibility with Istio service mesh environments without requiring manual configuration.

Additional Enhancements

  • Improved Load Balancing: Enhanced webhook server connection handling with periodic connection rotation to ensure proper load distribution across service replicas in multi-replica deployments.
  • Configurable Webhook Timeout: Added ability to configure webhook admission controller timeout values, and changed the default from 15 seconds to 1 second.

Upgrade Steps

To upgrade to version 1.2.3, run the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.3