
Cluster Headroom

Cluster Headroom lets you reserve capacity in advance so your cluster is ready for new workloads. Keeping spare capacity available prevents the cluster from becoming fully allocated and speeds up scheduling of new pods. ScaleOps provides granular control over resource reservation across node pools, lifecycles (spot/on-demand), and time-based schedules.


Version Information: The enhanced Cluster Headroom feature with granular configuration control, node pool targeting, scheduling, and GitOps support is available in version 1.25.14 and later. For versions prior to 1.25.14, Cluster Headroom supports basic static and proportional resource reservation strategies with optional scheduling. Existing configurations are automatically migrated to the new implementation upon upgrade.

Cluster Headroom Visibility

The Cluster Headroom page shows metrics and graphs of headroom reservations for CPU, memory, and GPU across the cluster, so you can quickly see how the reservations for each resource change over time.

Figure: Cluster Headroom graphs

Managing Headroom Configurations

You define your cluster headroom logic by creating cluster headroom configurations. The configurations table lists all cluster headroom configurations defined in the cluster. To add one, click Create configuration.

Cluster Headroom Configuration Fields

A cluster headroom configuration has the following fields:

Figure: Cluster Headroom configuration form

Name

Unique identifier for the configuration. It may contain only alphanumeric characters and hyphens.

Resources

Configure how resources are reserved:

  • Static: Fixed resource amounts (e.g., 4 CPU, 8 GiB Memory).
  • Dynamic: Percentage of workload requests within the configuration’s scope (e.g., 10%). Automatically adjusts as workload demand changes within the targeted nodes.
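To make the difference between the two modes concrete, here is a minimal Python sketch of how a reserved amount could be derived. The `ResourceSetting` type and `reserved_amount` function are hypothetical, and how ScaleOps sums the workload requests within a configuration's scope is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ResourceSetting:
    """One resource entry, mirroring the static/dynamic choice (hypothetical type)."""
    type: str     # "static" or "dynamic"
    value: float  # absolute amount (static) or percentage (dynamic)


def reserved_amount(setting: ResourceSetting, workload_requests: float) -> float:
    """Return how much of a resource to reserve.

    workload_requests is assumed to be the sum of requests of the workloads in
    the configuration's scope, e.g. CPU cores requested on the targeted nodes.
    """
    if setting.type == "static":
        # Static: the configured amount, independent of workload demand.
        return setting.value
    if setting.type == "dynamic":
        # Dynamic: a percentage of the in-scope workload requests.
        return workload_requests * setting.value / 100.0
    raise ValueError(f"unknown type: {setting.type!r}")


# Static 4 CPU always reserves 4 cores; dynamic 10% of 50 requested cores reserves 5.
print(reserved_amount(ResourceSetting("static", 4), workload_requests=50))
print(reserved_amount(ResourceSetting("dynamic", 10), workload_requests=50))
```

Because the dynamic amount is recomputed from current requests, it grows and shrinks with demand, while the static amount stays fixed.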

Lifecycle

Target specific node types:

  • Spot: Headroom pods are scheduled only on spot/preemptible nodes.
  • On-Demand: Headroom pods are scheduled only on on-demand nodes.
  • Both: No lifecycle preference.
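One way a lifecycle choice can be enforced is a required node affinity on a capacity-type label. The sketch below assumes nodes carry Karpenter's `karpenter.sh/capacity-type` label (values `spot` and `on-demand`); the label ScaleOps actually matches on is not stated here, and the `lifecycle_affinity` helper is hypothetical.

```python
from typing import Optional

# Assumption: Karpenter's well-known capacity-type label. Managed node groups on
# other platforms may label spot and on-demand nodes differently.
CAPACITY_TYPE_LABEL = "karpenter.sh/capacity-type"

# Maps the configuration's lifecycle value to the label value on matching nodes.
LIFECYCLE_TO_LABEL_VALUE = {"spot": "spot", "onDemand": "on-demand"}


def lifecycle_affinity(lifecycle: Optional[str]) -> Optional[dict]:
    """Build a required nodeAffinity for a lifecycle, or None for "Both"."""
    if not lifecycle:
        # Both (empty lifecycle): no preference, so no affinity constraint.
        return None
    value = LIFECYCLE_TO_LABEL_VALUE[lifecycle]
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {"matchExpressions": [
                        {"key": CAPACITY_TYPE_LABEL, "operator": "In", "values": [value]}
                    ]}
                ]
            }
        }
    }


print(lifecycle_affinity("onDemand"))
print(lifecycle_affinity(None))
```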

Node Pools

Select specific node pools to target. This setting is supported only on EKS, AKS, and GKE clusters. When node pools are selected, ScaleOps schedules the cluster headroom pods only on those node pools.

Note: When using Cluster Autoscaler, Cloud Node integration is required for node pool selection. When using Karpenter, Cloud Node integration is not required.
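Node pool targeting generally comes down to matching a node pool label. The mapping below uses well-known labels set by each platform, but which label ScaleOps relies on is an assumption, and `node_pool_match_expression` is a hypothetical helper that returns a node affinity match expression.

```python
from typing import List, Optional

# Assumption: well-known node pool labels per platform (illustrative mapping).
NODE_POOL_LABELS = {
    "gke": "cloud.google.com/gke-nodepool",
    "eks": "eks.amazonaws.com/nodegroup",
    "aks": "kubernetes.azure.com/agentpool",
    "karpenter": "karpenter.sh/nodepool",
}


def node_pool_match_expression(platform: str, node_pools: List[str]) -> Optional[dict]:
    """Return a matchExpression restricting pods to the selected node pools."""
    if not node_pools:
        # No node pools selected: headroom pods may land on any eligible node.
        return None
    if platform not in NODE_POOL_LABELS:
        raise ValueError(f"node pool selection not supported on {platform!r}")
    return {"key": NODE_POOL_LABELS[platform], "operator": "In", "values": sorted(node_pools)}


print(node_pool_match_expression("gke", ["pool-b", "pool-a"]))
```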

Advanced Settings - Tolerations

Add custom tolerations for tainted nodes.

Note: When you target node pools that have taints, matching tolerations (auto-tolerations) are added automatically. They are applied alongside your configured tolerations and do not modify your configuration.
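The auto-toleration behavior can be pictured as merging two lists without touching the stored one. This is a minimal sketch with hypothetical helper names; deriving one exact-match toleration per taint and de-duplicating against user tolerations are assumptions about how ScaleOps does it.

```python
from typing import Dict, List

Toleration = Dict[str, str]
Taint = Dict[str, str]


def toleration_for_taint(taint: Taint) -> Toleration:
    """Build a toleration that matches a node taint exactly (key, value, effect)."""
    return {
        "key": taint["key"],
        "operator": "Equal",
        "value": taint.get("value", ""),
        "effect": taint["effect"],
    }


def effective_tolerations(user: List[Toleration], pool_taints: List[Taint]) -> List[Toleration]:
    """Combine configured tolerations with auto-tolerations for node pool taints.

    The user's list is copied, never modified, so the stored configuration
    stays exactly as written.
    """
    result = [dict(t) for t in user]
    for taint in pool_taints:
        auto = toleration_for_taint(taint)
        if auto not in result:  # skip auto-tolerations the user already declared
            result.append(auto)
    return result


# Hypothetical example: one user toleration plus taints read from the targeted node pools.
user = [{"key": "spot", "operator": "Equal", "value": "true", "effect": "NoSchedule"}]
taints = [
    {"key": "spot", "value": "true", "effect": "NoSchedule"},
    {"key": "gpu", "value": "present", "effect": "NoSchedule"},
]
merged = effective_tolerations(user, taints)
print(len(merged), len(user))
```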

Advanced Settings - Node Selector

Add custom node selector labels for additional scheduling constraints.

Schedule

Define time windows when the configuration is active:

  • Start time and end time in HH:MM format (24-hour)
  • Select days of the week
  • Supports midnight wrap (e.g., 22:00 - 06:00)
  • Multiple schedules can be added
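Midnight wrap is the subtle part of schedule evaluation. The sketch below checks whether a single window is active; it assumes that `days` names the day on which a window starts (so a Friday 22:00-06:00 window also covers early Saturday) and that the end time is exclusive. Both are assumptions, and `window_active` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta
from typing import List


def parse_hhmm(value: str) -> time:
    """Parse a 24-hour HH:MM string."""
    return datetime.strptime(value, "%H:%M").time()


def window_active(now: datetime, start: str, end: str, days: List[str]) -> bool:
    """Return True if `now` falls inside the window.

    Assumption: `days` lists the day a window starts on; the end time is exclusive.
    """
    begin, finish = parse_hhmm(start), parse_hhmm(end)
    today = now.strftime("%A")  # e.g. "Monday" (English/C locale assumed)
    yesterday = (now - timedelta(days=1)).strftime("%A")
    t = now.time()
    if begin <= finish:
        # Same-day window, e.g. 09:00-18:00.
        return today in days and begin <= t < finish
    # Midnight wrap, e.g. 22:00-06:00: the evening part starts today, and the
    # early-morning part belongs to a window that started yesterday.
    return (today in days and t >= begin) or (yesterday in days and t < finish)


# January 5, 2024 was a Friday, so 02:00 on January 6 is inside Friday's 22:00-06:00 window.
print(window_active(datetime(2024, 1, 6, 2, 0), "22:00", "06:00", ["Friday"]))
```

With multiple schedules, a configuration would be active if any one of its windows is active.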

Figure: Cluster Headroom schedule configuration

Managing Cluster Headroom with GitOps

Cluster Headroom supports GitOps workflows through a static ConfigMap. When the static ConfigMap exists, it takes precedence over the UI configuration, and UI mutations are blocked.

To define cluster headroom configurations via GitOps, create a ConfigMap named scaleops-static-headroom in the ScaleOps namespace.

Configuration Fields

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique configuration identifier |
| cpu.type | string | Yes | static or dynamic |
| cpu.value | number | Yes | CPU cores (static) or percentage (dynamic) |
| memory.type | string | Yes | static or dynamic |
| memory.value | number | Yes | GiB (static) or percentage (dynamic) |
| gpu.type | string | No | static or dynamic |
| gpu.value | number | No | GPU count (static) or percentage (dynamic) |
| lifecycle | string | No | spot, onDemand, or empty |
| nodePools | string[] | No | List of target node pool names |
| schedule | object[] | No | Time windows (see below) |
| tolerations | object[] | No | Standard K8s tolerations |
| nodeSelector | map | No | Key-value label selector |
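Before committing a ConfigMap, it can help to check each entry against the field table. This is a client-side sketch, not ScaleOps' own validation: `validate_configuration` is hypothetical, the name rule comes from the UI field description, and schedule entries are only checked for being a list.

```python
import re
from typing import Any, Dict, List

RESOURCE_TYPES = ("static", "dynamic")
LIFECYCLES = ("spot", "onDemand", "", None)  # "" or None means no lifecycle preference
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")  # alphanumeric characters plus hyphens


def validate_configuration(cfg: Dict[str, Any]) -> List[str]:
    """Return the problems found in one configuration entry (empty list if none)."""
    errors: List[str] = []
    name = cfg.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        errors.append("name: required; alphanumeric characters and hyphens only")
    for resource, required in (("cpu", True), ("memory", True), ("gpu", False)):
        entry = cfg.get(resource)
        if entry is None:
            if required:
                errors.append(f"{resource}: required")
            continue
        if not isinstance(entry, dict):
            errors.append(f"{resource}: must be an object")
            continue
        if entry.get("type") not in RESOURCE_TYPES:
            errors.append(f"{resource}.type: must be 'static' or 'dynamic'")
        value = entry.get("value")
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{resource}.value: must be a number")
    if cfg.get("lifecycle") not in LIFECYCLES:
        errors.append("lifecycle: must be 'spot', 'onDemand', or empty")
    for key, kind in (("nodePools", list), ("schedule", list),
                      ("tolerations", list), ("nodeSelector", dict)):
        if cfg.get(key) is not None and not isinstance(cfg[key], kind):
            errors.append(f"{key}: must be a {kind.__name__} or omitted")
    return errors


ok = {"name": "production-headroom", "cpu": {"type": "static", "value": 4},
      "memory": {"type": "static", "value": 8}, "lifecycle": "onDemand"}
bad = {"name": "bad_name", "cpu": {"type": "fixed", "value": 4}}
print(validate_configuration(ok))
print(validate_configuration(bad))
```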

Schedule Object

| Field | Type | Description |
|---|---|---|
| startTime | string | Start time in HH:MM format (24-hour) |
| endTime | string | End time in HH:MM format (24-hour) |
| days | string[] | Days of week: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday |

Example:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: scaleops-static-headroom
  namespace: scaleops-system # or your ScaleOps namespace
data:
  configurationv2: |
    {
      "configurations": [
        {
          "name": "production-headroom",
          "cpu": { "type": "static", "value": 4 },
          "memory": { "type": "static", "value": 8 },
          "gpu": { "type": "static", "value": 0 },
          "lifecycle": "onDemand",
          "nodePools": [],
          "schedule": null,
          "tolerations": [],
          "nodeSelector": {}
        },
        {
          "name": "spot-buffer",
          "cpu": { "type": "dynamic", "value": 10 },
          "memory": { "type": "dynamic", "value": 10 },
          "gpu": { "type": "static", "value": 0 },
          "lifecycle": "spot",
          "nodePools": ["spot-pool-1", "spot-pool-2"],
          "schedule": [
            {
              "weeklyConfig": {
                "beginTime": "09:00",
                "endTime": "18:00",
                "days": [1, 2, 3, 4, 5]
              }
            }
          ],
          "tolerations": [
            {
              "key": "spot",
              "operator": "Equal",
              "value": "true",
              "effect": "NoSchedule"
            }
          ],
          "nodeSelector": { "workload-type": "batch" }
        }
      ]
    }
```
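Because configurationv2 is a JSON document embedded as a string, hand-editing it is error-prone. A minimal Python sketch can generate the manifest from ordinary data structures instead. The function name is hypothetical, and the namespace default follows the example and may differ in your installation.

```python
import json
from typing import Any, Dict, List


def build_static_headroom_configmap(configurations: List[Dict[str, Any]],
                                    namespace: str = "scaleops-system") -> Dict[str, Any]:
    """Wrap headroom configurations in a scaleops-static-headroom ConfigMap manifest."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "scaleops-static-headroom", "namespace": namespace},
        # configurationv2 holds all configurations as a single JSON string.
        "data": {"configurationv2": json.dumps({"configurations": configurations}, indent=2)},
    }


configs = [{
    "name": "production-headroom",
    "cpu": {"type": "static", "value": 4},
    "memory": {"type": "static", "value": 8},
    "gpu": {"type": "static", "value": 0},
    "lifecycle": "onDemand",
    "nodePools": [],
    "schedule": None,  # serialized as null
    "tolerations": [],
    "nodeSelector": {},
}]
manifest = build_static_headroom_configmap(configs)
# kubectl accepts JSON manifests, so this output can be saved and applied with kubectl apply -f.
print(json.dumps(manifest, indent=2))
```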

GitOps Precedence

When the static ConfigMap exists:

  • The static ConfigMap configuration takes full precedence
  • UI shows configurations as read-only
  • Create/Edit/Delete buttons are disabled with a tooltip explaining GitOps mode
  • Deleting the ConfigMap returns control to the UI
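The precedence rules reduce to a simple decision. This sketch, with a hypothetical `effective_configurations` helper, returns the configurations in effect and whether the UI should be read-only.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Config = Dict[str, Any]


def effective_configurations(static_configmap: Optional[Dict[str, Any]],
                             ui_configurations: List[Config]) -> Tuple[List[Config], bool]:
    """Return (configurations in effect, whether the UI is read-only)."""
    if static_configmap is not None:
        # GitOps mode: the ConfigMap wins outright and UI mutations are blocked.
        data = json.loads(static_configmap["data"]["configurationv2"])
        return data["configurations"], True
    # No ConfigMap (or it was deleted): the UI is back in control.
    return ui_configurations, False


ui = [{"name": "from-ui"}]
print(effective_configurations(None, ui))  # ([{'name': 'from-ui'}], False)
```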

Implementation

ScaleOps creates Deployments that reserve the configured capacity. These deployments use the pause container image and are configured with:

  • A low-priority PriorityClass, so headroom pods are preempted when real workloads need resources. If the PriorityClass is misconfigured, a message appears on the page and actions are blocked.
    • Pods with preemptionPolicy: Never: Cluster Headroom relies on Kubernetes preemption to instantly free capacity for new workloads. Pods configured with preemptionPolicy: Never (either directly or via their PriorityClass) will not preempt headroom pods, even if they have higher priority. These pods will instead wait for the cluster autoscaler to provision new nodes.
  • Resource requests of 1 CPU, 1 GiB memory, or 1 GPU per replica
  • Replicas calculated from your configured values (e.g., 4 CPU = 4 replicas)
  • Affinity rules based on lifecycle and node pool targeting
  • Tolerations from configuration plus auto-tolerations from targeted node pools
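The replica arithmetic and the preemption rule above can be sketched in a few functions. Assumptions: one pause Deployment per resource type, rounding up for fractional amounts, `nvidia.com/gpu` as the GPU resource name, and hypothetical names and labels. The preemption check is simplified to the priority and `preemptionPolicy` rules only.

```python
import math
from typing import Any, Dict, Optional

# Assumption: one pause Deployment per resource type, each replica requesting one unit.
UNIT_REQUESTS = {
    "cpu": {"cpu": "1"},
    "memory": {"memory": "1Gi"},
    "gpu": {"nvidia.com/gpu": "1"},  # assumed GPU resource name
}


def replicas_for(amount: float) -> int:
    """Replicas needed to reserve `amount` units; rounding up is an assumption."""
    return math.ceil(amount) if amount > 0 else 0


def headroom_deployment(resource: str, amount: float, priority_class: str) -> Dict[str, Any]:
    """Sketch of a pause Deployment reserving `amount` units of `resource`."""
    labels = {"app": f"headroom-{resource}"}  # hypothetical labels and name
    resources: Dict[str, Any] = {"requests": dict(UNIT_REQUESTS[resource])}
    if resource == "gpu":
        # Extended resources such as GPUs must set limits equal to requests.
        resources["limits"] = dict(UNIT_REQUESTS[resource])
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"headroom-{resource}"},
        "spec": {
            "replicas": replicas_for(amount),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": priority_class,  # low-priority class
                    "containers": [{
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",
                        "resources": resources,
                    }],
                },
            },
        },
    }


def can_preempt_headroom(pod_priority: int, preemption_policy: Optional[str],
                         headroom_priority: int) -> bool:
    """Simplified check: may a pending pod preempt headroom pods?

    Real scheduler preemption also checks whether evicting pods actually frees
    enough room on a node; only the priority and policy rules are modeled here.
    """
    return preemption_policy != "Never" and pod_priority > headroom_priority


# 4 CPU of static headroom becomes 4 replicas, each requesting 1 CPU.
print(headroom_deployment("cpu", 4, "headroom-low-priority")["spec"]["replicas"])
print(can_preempt_headroom(1000, "Never", -10))
```

The last call illustrates the preemptionPolicy: Never case: even a high-priority pod does not evict headroom pods and instead waits for new nodes.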

Prerequisites

  • Cluster Autoscaler or Karpenter: Required for headroom pods to trigger node scale-up
  • Cloud integration: Required for node pool selection unless the cluster uses Karpenter. Without it, Cluster Headroom still works, but the node pool selection section is greyed out.

Migration from legacy Cluster Headroom configuration

Upon version upgrade, existing cluster headroom configurations are automatically migrated to the new implementation.