Migration Example: Cluster Autoscaler to Karpenter
This page provides an illustrative example of how you might migrate from Cluster Autoscaler to Karpenter. Note: The YAML below is for conceptual understanding only and will need to be adapted to your environment and Karpenter version.
Example Conversion
Below is an example of converting several Cluster Autoscaler node groups with different CPU:memory ratios and instance types into a more flexible Karpenter configuration.
Cluster Autoscaler (eksctl) Example - AWS
# ILLUSTRATIVE EXAMPLE - cluster-autoscaler-multiple-nodegroups.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
managedNodeGroups:
  # Memory-optimized on-demand node group
  - name: memory-optimized-ondemand
    instanceType: r5.2xlarge # High memory:cpu ratio
    minSize: 1
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 100
    volumeType: gp3
    securityGroups:
      attachIDs: ["sg-0123456789abcdef0"]
    ssh:
      allow: true
      publicKeyName: my-keypair
    tags:
      NodeGroupType: "memory-optimized"
      Environment: "production"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  # Compute-optimized on-demand node group
  - name: compute-optimized-ondemand
    instanceType: c5.2xlarge # High cpu:memory ratio
    minSize: 1
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 100
    volumeType: gp3
    securityGroups:
      attachIDs: ["sg-0123456789abcdef0"]
    ssh:
      allow: true
      publicKeyName: my-keypair
    tags:
      NodeGroupType: "compute-optimized"
      Environment: "production"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  # Memory-optimized spot node group
  - name: memory-optimized-spot
    instanceTypes: ["r5.2xlarge", "r5a.2xlarge", "r5d.2xlarge"]
    minSize: 0
    maxSize: 10
    desiredCapacity: 2
    spot: true
    volumeSize: 100
    volumeType: gp3
    securityGroups:
      attachIDs: ["sg-0123456789abcdef0"]
    ssh:
      allow: true
      publicKeyName: my-keypair
    tags:
      NodeGroupType: "memory-optimized-spot"
      Environment: "production"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  # Compute-optimized spot node group
  - name: compute-optimized-spot
    instanceTypes: ["c5.2xlarge", "c5a.2xlarge", "c5d.2xlarge"]
    minSize: 0
    maxSize: 10
    desiredCapacity: 2
    spot: true
    volumeSize: 100
    volumeType: gp3
    securityGroups:
      attachIDs: ["sg-0123456789abcdef0"]
    ssh:
      allow: true
      publicKeyName: my-keypair
    tags:
      NodeGroupType: "compute-optimized-spot"
      Environment: "production"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  # GPU nodes for machine learning workloads (with taints)
  - name: gpu-workloads
    instanceTypes: ["g4dn.xlarge", "g4dn.2xlarge"]
    minSize: 0
    maxSize: 3
    desiredCapacity: 0
    volumeSize: 100
    volumeType: gp3
    taints:
      - key: "nvidia.com/gpu"
        value: "true"
        effect: "NoSchedule"
    labels:
      workload-type: "gpu"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

Equivalent Karpenter Configuration
# ILLUSTRATIVE EXAMPLE - karpenter-consolidated-configuration.yaml
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: consolidated
spec:
  # AL2023 is recommended: EKS does not publish AL2 AMIs for Kubernetes 1.33 and later
  # The v1 API requires amiSelectorTerms; an alias selects the AMI family and version together
  amiSelectorTerms:
    - alias: al2023@v20240807 # Pinned version; al2023@latest tracks the newest AMI instead
  role: "KarpenterNodeRole-my-cluster"
  # Select subnets by tags - can also select by IDs or mix approaches
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
        # To select only private subnets add:
        # private-subnet: "true"
  securityGroupSelectorTerms:
    - id: "sg-0123456789abcdef0"
  tags:
    Environment: "production"
    karpenter.sh/discovery: "my-cluster"
    Name: "karpenter-node" # Static value; tag values are applied as written
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        deleteOnTermination: true
        encrypted: true # Enable encryption for security compliance
        # kmsKeyID: "arn:aws:kms:region:account:key/key-id" # Optional KMS key
  # Example userData that enables the SSM agent, replacing SSH key pair access
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="//"

    --//
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    # Enable SSM agent for secure access
    systemctl enable amazon-ssm-agent
    systemctl start amazon-ssm-agent
    --//--
---
# General purpose node pool for most workloads
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-workloads
spec:
  # Higher weight means higher priority when multiple NodePools can satisfy requirements
  weight: 100
  template:
    metadata:
      labels:
        Environment: "production"
    spec:
      # The v1 API requires group, kind, and name on nodeClassRef
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: consolidated
      # Set node expiration to ensure regular rotation for security and updates
      expireAfter: 720h # 30 days
      # Requirements section determines what type of nodes Karpenter will provision
      requirements:
        # Use either specific instance types OR category/generation for better flexibility
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "r", "m"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["4"]
        # Capacity type - when both spot and on-demand are allowed, Karpenter prioritizes spot
        # and falls back to on-demand if spot is unavailable; the order of values has no effect
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        # Architecture requirement - only amd64 in this example
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
  # Disruption configuration controls how Karpenter manages node termination
  disruption:
    # WhenEmptyOrUnderutilized allows more aggressive consolidation for cost savings
    consolidationPolicy: WhenEmptyOrUnderutilized
    # Time to wait after a pod is added or removed before considering the node for consolidation
    consolidateAfter: 30s
  # Resource limits constrain the total size of the pool
  # Limits prevent Karpenter from creating new instances once the limit is exceeded
  limits:
    cpu: "1000"
    memory: 1000Gi
---
# Specialized node pool for GPU workloads, matching the Cluster Autoscaler GPU node group
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-workloads
spec:
  # Lower weight than the general-purpose pool, so that pool is considered first;
  # the taint below is what keeps non-GPU workloads off GPU nodes
  weight: 50
  template:
    metadata:
      labels:
        workload-type: "gpu"
    spec:
      # The v1 API requires group, kind, and name on nodeClassRef
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: consolidated
      # Same taints as in Cluster Autoscaler configuration to ensure only GPU workloads use these nodes
      taints:
        - key: "nvidia.com/gpu"
          value: "true"
          effect: "NoSchedule"
      # Fixed 7-day expiry for GPU nodes to ensure they're regularly refreshed
      expireAfter: 168h
      requirements:
        # Specific GPU instance types matching Cluster Autoscaler configuration
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["g4dn.xlarge", "g4dn.2xlarge"]
        # GPU nodes often use on-demand for stability
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
  disruption:
    # Less aggressive consolidation for GPU workloads
    consolidationPolicy: WhenEmpty
    # Set explicitly alongside the policy; 5m is an illustrative value to tune
    consolidateAfter: 5m
  # GPU-specific resource limits
  limits:
    cpu: "100"
    memory: 100Gi
    nvidia.com/gpu: 4

Pod Scheduling Example
# ILLUSTRATIVE EXAMPLE - Pod scheduling
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-intensive-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: memory-intensive
  template:
    metadata:
      labels:
        app: memory-intensive
    spec:
      # No nodeSelector or affinity needed - Karpenter will automatically
      # provision appropriate nodes based on pod resource requirements
      containers:
        - name: memory-app
          image: my-memory-app:latest
          resources:
            requests:
              memory: "24Gi"
              cpu: "4"
            limits:
              memory: "24Gi"
              cpu: "4"

Example for GPU Workload with Tolerations
# ILLUSTRATIVE EXAMPLE - GPU workload with tolerations
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-training-job
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-training
  template:
    metadata:
      labels:
        app: gpu-training
    spec:
      # Toleration required to schedule on GPU nodes with the nvidia.com/gpu taint
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: ml-training
          image: my-gpu-training:latest
          resources:
            requests:
              nvidia.com/gpu: 1
              memory: "8Gi"
              cpu: "2"
            limits:
              nvidia.com/gpu: 1
              memory: "8Gi"
              cpu: "2"

Key Mapping
| Cluster Autoscaler | Karpenter | Notes |
|---|---|---|
| Instance types in node groups | requirements in NodePool | Use operators like In for flexibility, and karpenter.k8s.aws/instance-category for broader selection |
| Spot configuration | karpenter.sh/capacity-type in requirements | When both spot and on-demand are allowed, Karpenter prioritizes spot and falls back to on-demand; the order of values does not matter |
| Min/Max/Desired capacity | Use resource limits in NodePool | Karpenter uses resource-based limits (CPU/memory) instead of static min/max node counts, scaling based on actual demand |
| Scale-down behavior | Use disruption.consolidationPolicy in NodePool | Cluster Autoscaler scales down based on utilization thresholds; Karpenter uses consolidation policies (WhenEmpty/WhenEmptyOrUnderutilized) for more flexible scale-down |
| Labels on node groups | template.metadata.labels | Preserved in NodePool definition |
| Taints on node groups | template.spec.taints | Same format, preserved in NodePool |
| Volume configuration | blockDeviceMappings in NodeClass | Centralized in the NodeClass with additional security options like encryption |
| Security groups | securityGroupSelectorTerms in NodeClass | Centralized in the NodeClass |
| IAM roles | role in NodeClass | Simplified reference in NodeClass |
| SSH key pairs | UserData in NodeClass or SSM | Modern approach uses SSM/EC2 Instance Connect instead of key pairs |
| Multiple node groups | NodePool weighting + specialized NodePools | Use weight field to prioritize NodePools and create specialized pools for specific workloads |
| Node rotation (refreshing nodes) | expireAfter in NodePool | Cluster Autoscaler has no built-in rotation; Karpenter allows automatic node replacement via expiration time |
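
The first two rows of the table describe how instance type and capacity type become NodePool requirements. A workload that previously depended on a dedicated node group (for example memory-optimized-ondemand) can narrow those requirements on its own pod spec instead. The sketch below is a hypothetical Deployment, not part of the original configuration: the name and app label are invented for illustration, and it assumes the general-workloads NodePool above, which allows both the "r" category and on-demand capacity. The two nodeSelector keys are Karpenter's well-known node labels.

```yaml
# ILLUSTRATIVE EXAMPLE - pinning a workload to on-demand, memory-optimized nodes
# (hypothetical name and labels; adapt to your environment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-intensive-ondemand
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memory-intensive-ondemand
  template:
    metadata:
      labels:
        app: memory-intensive-ondemand
    spec:
      # Both values must be permitted by the NodePool requirements,
      # otherwise Karpenter cannot provision a node for this pod
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
        karpenter.k8s.aws/instance-category: r
      containers:
        - name: memory-app
          image: my-memory-app:latest
          resources:
            requests:
              memory: "24Gi"
              cpu: "4"
```

Workloads without such a selector remain free to land on whichever capacity Karpenter picks, so only the pods that need on-demand or a particular instance family have to carry the constraint.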
For more details, see the official Karpenter documentation.