Azure with default Cluster Autoscaler
Spot Optimization: available in v1.18.8+
This guide covers Azure-specific implementation details for Spot Optimization using AKS with Cluster Autoscaler. For general information about Spot Optimization, see the main overview.
This feature is currently available for Azure AKS clusters with autoscaling enabled on node pools.
Feature Enablement
Before using Spot Optimization, complete the Azure Cloud Integration setup. Once enabled, the feature will be available in the Spot Optimization section of your ScaleOps dashboard.
Azure Implementation with Cluster Autoscaler
Scheduling
In AKS with Cluster Autoscaler, ScaleOps uses a direct scheduling approach that gives it immediate control over pod placement. When spot instances are available, ScaleOps uses Required Affinity to force Cluster Autoscaler to scale up spot nodes, ensuring pods are scheduled on spot nodes. If spot nodes become unavailable and fallback is enabled, ScaleOps automatically switches to Preferred Affinity, which allows pods to be scheduled on on-demand nodes. The desired spot percentage is restored once spot capacity becomes available again.
Fallback to On-Demand Toggle
The fallback mechanism in AKS works as follows:
- When Enabled: Uses Required Affinity initially, then switches to Preferred Affinity if spot nodes aren’t available
- When Disabled: Uses Required Affinity only, meaning pods will only schedule on spot nodes
ScaleOps automatically detects spot availability and adjusts the affinity rules accordingly.
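The toggle behavior above can be sketched as a small function that picks between the two Kubernetes node affinity forms. This is only an illustration, not ScaleOps' actual implementation: the function name is hypothetical, and the label key is assumed to be the AKS built-in Spot label, whereas ScaleOps may key on its own lifecycle label.

```python
from typing import Any

# Assumption: AKS's built-in Spot node label. ScaleOps may use a different lifecycle label.
SPOT_LABEL_KEY = "kubernetes.azure.com/scalesetpriority"
SPOT_LABEL_VALUE = "spot"


def build_node_affinity(spot_available: bool, fallback_enabled: bool) -> dict[str, Any]:
    """Return a nodeAffinity stanza following the fallback rules described above."""
    match_spot = {
        "matchExpressions": [
            {"key": SPOT_LABEL_KEY, "operator": "In", "values": [SPOT_LABEL_VALUE]}
        ]
    }
    if fallback_enabled and not spot_available:
        # Preferred Affinity: spot is favored, but on-demand nodes are acceptable.
        return {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "preference": match_spot}
            ]
        }
    # Required Affinity: pods may only be scheduled on spot nodes.
    return {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [match_spot]
        }
    }
```

With fallback disabled, the function returns the required form regardless of spot availability, which matches the "When Disabled" case: pods stay pending until a spot node exists.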
DaemonSet Taint Handling
ScaleOps automatically adds tolerations to DaemonSets to enable scheduling on both On-Demand and Spot nodes. This includes tolerations for both ScaleOps custom taints and the AKS default Spot taint that is automatically applied to all Spot nodes. As a result, DaemonSets that previously couldn’t be scheduled on Spot nodes (before Spot optimization was activated) are now able to run on them.
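As a rough sketch of what such a patch might look like, the function below appends tolerations to a DaemonSet manifest held as a plain dictionary. The AKS Spot taint (`kubernetes.azure.com/scalesetpriority=spot:NoSchedule`) is the one AKS applies by default; the custom taint key is a hypothetical placeholder, since this guide does not name the ScaleOps taints, and the function itself is illustrative only.

```python
import copy
from typing import Any

# Toleration for the taint AKS applies to Spot nodes by default.
AKS_SPOT_TOLERATION = {
    "key": "kubernetes.azure.com/scalesetpriority",
    "operator": "Equal",
    "value": "spot",
    "effect": "NoSchedule",
}
# Hypothetical placeholder: the real ScaleOps taint key is not given in this guide.
CUSTOM_TOLERATION = {"key": "example.com/hypothetical-taint", "operator": "Exists"}


def with_spot_tolerations(daemonset: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a DaemonSet manifest with the tolerations appended once."""
    patched = copy.deepcopy(daemonset)
    pod_spec = patched["spec"]["template"]["spec"]
    tolerations = pod_spec.setdefault("tolerations", [])
    for toleration in (AKS_SPOT_TOLERATION, CUSTOM_TOLERATION):
        if toleration not in tolerations:  # keep the patch idempotent
            tolerations.append(dict(toleration))
    return patched
```

Working on a deep copy leaves the input manifest untouched, and the membership check means applying the patch twice does not duplicate entries.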
Limitations
Prerequisites
- Enable Azure Cloud Integration
- Node pools must have autoscaling enabled
- Sufficient Spot quota in your Azure subscription. If quota is insufficient, optimization will be blocked and optimization gaps will be displayed.
Workload Support
- Currently, only Deployments are supported
Node Pool Constraints
- A maximum of 40 node pools is supported per cluster. If more than 40 node pools are present, the feature will be disabled and optimization gaps will appear for all workloads.
- Only node pools using the VM type VirtualMachineScaleSets are supported (CLI property typePropertiesType: VirtualMachineScaleSets). Workloads running on unsupported node pool types will not be optimized and will display optimization gaps.
- Only node pools with mode set to user are supported. Node pools with mode=system are not supported. Workloads running on system node pools will not be optimized and will display optimization gaps.
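The node pool constraints above, together with the autoscaling prerequisite, can be checked ahead of time with a short script. The sketch below assumes node pool records shaped like the JSON printed by `az aks nodepool list -o json`; the field names `name`, `mode`, `typePropertiesType`, and `enableAutoScaling` are assumptions about that output, and the function name is hypothetical.

```python
from typing import Any

MAX_NODE_POOLS = 40  # per-cluster limit stated above


def find_node_pool_gaps(pools: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each node pool name to the reasons it would not be supported."""
    gaps: dict[str, list[str]] = {}
    too_many = len(pools) > MAX_NODE_POOLS
    for pool in pools:
        reasons = []
        if too_many:
            # Exceeding the limit disables the feature for every workload.
            reasons.append(f"cluster has more than {MAX_NODE_POOLS} node pools")
        if pool.get("typePropertiesType") != "VirtualMachineScaleSets":
            reasons.append("VM type is not VirtualMachineScaleSets")
        if str(pool.get("mode", "")).lower() != "user":
            reasons.append("mode is not user")
        if not pool.get("enableAutoScaling"):
            reasons.append("autoscaling is disabled")
        if reasons:
            gaps[pool.get("name", "<unnamed>")] = reasons
    return gaps
```

An empty result means no pool violates these particular constraints; it does not cover Spot quota, which has to be checked in the Azure subscription itself.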
Existing Configuration Constraints
- Workloads with an existing required affinity, node selector, or topology spread constraint on the lifecycle label will not be optimized. They are shown as an optimization gap in the workload table and the workloads overview.
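To find workloads affected by this constraint, a pod spec can be scanned for the three constructs listed above. The following is a minimal sketch: the label key is a hypothetical placeholder because this guide refers to "the lifecycle label" without naming it, and the function is not part of any ScaleOps tooling.

```python
from typing import Any

# Hypothetical label key; the guide does not name the lifecycle label.
LIFECYCLE_LABEL = "example.com/lifecycle"


def blocking_constraints(pod_spec: dict[str, Any], label: str = LIFECYCLE_LABEL) -> list[str]:
    """List the existing pod-spec constraints that reference the lifecycle label."""
    found = []
    if label in (pod_spec.get("nodeSelector") or {}):
        found.append("nodeSelector")
    node_affinity = (pod_spec.get("affinity") or {}).get("nodeAffinity") or {}
    required = node_affinity.get("requiredDuringSchedulingIgnoredDuringExecution") or {}
    for term in required.get("nodeSelectorTerms", []):
        if any(expr.get("key") == label for expr in term.get("matchExpressions", [])):
            found.append("required affinity")
            break
    for constraint in pod_spec.get("topologySpreadConstraints") or []:
        if constraint.get("topologyKey") == label:
            found.append("topology spread constraint")
            break
    return found
```

Preferred affinity is deliberately ignored here, since only required affinity is listed as a blocker.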
Increased Max Nodes Limits
Because ScaleOps creates mirrored node pools, the combined maximum node count across the original and mirrored pools is effectively double your cluster’s original maximum node capacity.
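As a worked example of the arithmetic, the helper below totals the maximum node counts under the assumption that every original pool gets exactly one mirrored pool with the same maximum. Both the function name and that one-to-one assumption are illustrative, not confirmed details of how ScaleOps sizes mirrored pools.

```python
def effective_max_nodes(original_max_counts: dict[str, int]) -> int:
    """Total max nodes, assuming each mirrored pool copies its original's max count."""
    original_total = sum(original_max_counts.values())
    mirrored_total = original_total  # assumption: one mirror per pool, same maximum
    return original_total + mirrored_total
```

For two pools capped at 10 and 5 nodes, the original ceiling is 15 and the combined ceiling under this assumption is 30.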