
Migration Example: Cluster Autoscaler to Karpenter

This page provides an illustrative example of how you might migrate from Cluster Autoscaler to Karpenter. Note: The YAML below is for conceptual understanding only and will need to be adapted to your environment and Karpenter version.

Example Conversion

Below is an example of converting several Cluster Autoscaler node groups with different CPU:memory ratios and instance types into a more flexible Karpenter configuration.

Cluster Autoscaler (eksctl) Example - AWS

# ILLUSTRATIVE EXAMPLE - cluster-autoscaler-multiple-nodegroups.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
managedNodeGroups:
  # Memory-optimized on-demand node group
  - name: memory-optimized-ondemand
    instanceType: r5.2xlarge # High memory:cpu ratio
    minSize: 1
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 100
    volumeType: gp3
    securityGroups:
      attachIDs: ["sg-0123456789abcdef0"]
    ssh:
      allow: true
      publicKeyName: my-keypair
    tags:
      NodeGroupType: "memory-optimized"
      Environment: "production"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  # Compute-optimized on-demand node group
  - name: compute-optimized-ondemand
    instanceType: c5.2xlarge # High cpu:memory ratio
    minSize: 1
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 100
    volumeType: gp3
    securityGroups:
      attachIDs: ["sg-0123456789abcdef0"]
    ssh:
      allow: true
      publicKeyName: my-keypair
    tags:
      NodeGroupType: "compute-optimized"
      Environment: "production"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  # Memory-optimized spot node group
  - name: memory-optimized-spot
    instanceTypes: ["r5.2xlarge", "r5a.2xlarge", "r5d.2xlarge"]
    minSize: 0
    maxSize: 10
    desiredCapacity: 2
    spot: true
    volumeSize: 100
    volumeType: gp3
    securityGroups:
      attachIDs: ["sg-0123456789abcdef0"]
    ssh:
      allow: true
      publicKeyName: my-keypair
    tags:
      NodeGroupType: "memory-optimized-spot"
      Environment: "production"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  # Compute-optimized spot node group
  - name: compute-optimized-spot
    instanceTypes: ["c5.2xlarge", "c5a.2xlarge", "c5d.2xlarge"]
    minSize: 0
    maxSize: 10
    desiredCapacity: 2
    spot: true
    volumeSize: 100
    volumeType: gp3
    securityGroups:
      attachIDs: ["sg-0123456789abcdef0"]
    ssh:
      allow: true
      publicKeyName: my-keypair
    tags:
      NodeGroupType: "compute-optimized-spot"
      Environment: "production"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  # GPU nodes for machine learning workloads (with taints)
  - name: gpu-workloads
    instanceTypes: ["g4dn.xlarge", "g4dn.2xlarge"]
    minSize: 0
    maxSize: 3
    desiredCapacity: 0
    volumeSize: 100
    volumeType: gp3
    taints:
      - key: "nvidia.com/gpu"
        value: "true"
        effect: "NoSchedule"
    labels:
      workload-type: "gpu"
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

Equivalent Karpenter Configuration

# ILLUSTRATIVE EXAMPLE - karpenter-consolidated-configuration.yaml
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: consolidated
spec:
  # AL2023 is recommended as AL2 will be deprecated with Kubernetes 1.33
  # The v1 API requires amiSelectorTerms; an alias selects the AMI family
  # and pins a version for better versioning control
  amiSelectorTerms:
    - alias: al2023@v20240807
  role: "KarpenterNodeRole-my-cluster"
  # Select subnets by tags - can also select by IDs or mix approaches
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
        # To select only private subnets add:
        # private-subnet: "true"
  securityGroupSelectorTerms:
    - id: "sg-0123456789abcdef0"
  tags:
    Environment: "production"
    karpenter.sh/discovery: "my-cluster"
    Name: "karpenter-node-{{.Name}}" # Dynamic naming template
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        deleteOnTermination: true
        encrypted: true # Enable encryption for security compliance
        # kmsKeyID: "arn:aws:kms:region:account:key/key-id" # Optional KMS key
  # Example UserData for SSH access via SSM
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="//"

    --//
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    # Enable SSM agent for secure access
    systemctl enable amazon-ssm-agent
    systemctl start amazon-ssm-agent
    --//--
---
# General purpose node pool for most workloads
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-workloads
spec:
  # Higher weight means higher priority when multiple NodePools can satisfy requirements
  weight: 100
  template:
    metadata:
      labels:
        Environment: "production"
    spec:
      # The v1 API requires group, kind, and name in nodeClassRef
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: consolidated
      # Set node expiration to ensure regular rotation for security and updates
      expireAfter: 720h # 30 days
      # Requirements section determines what type of nodes Karpenter will provision
      requirements:
        # Use either specific instance types OR category/generation for better flexibility
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "r", "m"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["4"]
        # Capacity type - when both spot and on-demand are allowed, Karpenter prioritizes
        # spot and falls back to on-demand if spot capacity is unavailable.
        # The order of the values does not affect this behavior.
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        # Architecture requirement - only amd64 in this example
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
  # Disruption configuration controls how Karpenter manages node termination
  disruption:
    # WhenEmptyOrUnderutilized allows more aggressive consolidation for cost savings
    consolidationPolicy: WhenEmptyOrUnderutilized
    # Time to wait after a pod leaves before considering node for consolidation
    consolidateAfter: 30s
  # Resource limits constrain the total size of the pool
  # Limits prevent Karpenter from creating new instances once the limit is exceeded
  limits:
    cpu: "1000"
    memory: 1000Gi
---
# Specialized node pool for GPU workloads, matching the Cluster Autoscaler GPU node group
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-workloads
spec:
  # Lower weight than general-purpose pool - avoids using GPUs for non-GPU workloads
  weight: 50
  template:
    metadata:
      labels:
        workload-type: "gpu"
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: consolidated
      # Same taints as in Cluster Autoscaler configuration to ensure only GPU workloads use these nodes
      taints:
        - key: "nvidia.com/gpu"
          value: "true"
          effect: "NoSchedule"
      # Fixed 7-day expiry for GPU nodes to ensure they're regularly refreshed
      expireAfter: 168h
      requirements:
        # Specific GPU instance types matching Cluster Autoscaler configuration
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["g4dn.xlarge", "g4dn.2xlarge"]
        # GPU nodes often use on-demand for stability
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
  disruption:
    # Less aggressive consolidation for GPU workloads
    consolidationPolicy: WhenEmpty
    # The v1 API expects consolidateAfter alongside the policy; 5m is an illustrative value
    consolidateAfter: 5m
  # GPU-specific resource limits
  limits:
    cpu: "100"
    memory: 100Gi
    nvidia.com/gpu: 4

Pod Scheduling Example

# ILLUSTRATIVE EXAMPLE - Pod scheduling
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-intensive-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: memory-intensive
  template:
    metadata:
      labels:
        app: memory-intensive
    spec:
      # No nodeSelector or affinity needed - Karpenter will automatically
      # provision appropriate nodes based on pod resource requirements
      containers:
        - name: memory-app
          image: my-memory-app:latest
          resources:
            requests:
              memory: "24Gi"
              cpu: "4"
            limits:
              memory: "24Gi"
              cpu: "4"
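If a workload previously depended on a particular node group (for example, one of the on-demand groups), one option is to constrain it with the well-known labels Karpenter sets on its nodes rather than creating a dedicated pool. The fragment below is a minimal sketch, assuming the general-workloads NodePool shown earlier; the Deployment name and app label are hypothetical.

```yaml
# ILLUSTRATIVE SKETCH - optional scheduling constraints (names are hypothetical)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ondemand-memory-app # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ondemand-memory
  template:
    metadata:
      labels:
        app: ondemand-memory
    spec:
      # Both selectors fall within the requirements of the general-workloads NodePool
      nodeSelector:
        karpenter.sh/capacity-type: "on-demand" # keep this workload off spot capacity
        karpenter.k8s.aws/instance-category: "r" # memory-optimized instance families
      containers:
        - name: memory-app
          image: my-memory-app:latest
          resources:
            requests:
              memory: "24Gi"
              cpu: "4"
```

Because the NodePool allows both capacity types and the "r" category, Karpenter can satisfy these selectors from the same pool that serves unconstrained workloads.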

Example for GPU Workload with Tolerations

# ILLUSTRATIVE EXAMPLE - GPU workload with tolerations
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-training-job
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-training
  template:
    metadata:
      labels:
        app: gpu-training
    spec:
      # Toleration required to schedule on GPU nodes with the nvidia.com/gpu taint
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: ml-training
          image: my-gpu-training:latest
          resources:
            requests:
              nvidia.com/gpu: 1
              memory: "8Gi"
              cpu: "2"
            limits:
              nvidia.com/gpu: 1
              memory: "8Gi"
              cpu: "2"

Key Mapping

| Cluster Autoscaler | Karpenter | Notes |
| --- | --- | --- |
| Instance types in node groups | requirements in NodePool | Use operators like In for flexibility, and karpenter.k8s.aws/instance-category for broader selection |
| Spot configuration | karpenter.sh/capacity-type in requirements | Allow both "spot" and "on-demand"; Karpenter prioritizes spot when both are allowed, regardless of the order of the values |
| Min/Max/Desired capacity | Resource limits in NodePool | Karpenter uses resource-based limits (CPU/memory) instead of static min/max node counts, scaling based on actual demand |
| Scale-down behavior | disruption.consolidationPolicy in NodePool | Cluster Autoscaler scales down based on utilization thresholds; Karpenter uses consolidation policies (WhenEmpty/WhenEmptyOrUnderutilized) for more flexible scale-down |
| Labels on node groups | template.metadata.labels | Preserved in NodePool definition |
| Taints on node groups | template.spec.taints | Same format, preserved in NodePool |
| Volume configuration | blockDeviceMappings in NodeClass | Centralized in the NodeClass with additional security options like encryption |
| Security groups | securityGroupSelectorTerms in NodeClass | Centralized in the NodeClass |
| IAM roles | role in NodeClass | Simplified reference in NodeClass |
| SSH key pairs | UserData in NodeClass or SSM | Modern approach uses SSM/EC2 Instance Connect instead of key pairs |
| Multiple node groups | NodePool weighting + specialized NodePools | Use weight field to prioritize NodePools and create specialized pools for specific workloads |
| Node rotation (refreshing nodes) | expireAfter in NodePool | Cluster Autoscaler has no built-in rotation; Karpenter allows automatic node replacement via expiration time |
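
To carry the old node-count ceilings over to resource limits, one rough approach is to multiply each node group's maxSize by the per-node capacity of its instance type. The sketch below assumes the published sizes of the 2xlarge instances used in the eksctl example (8 vCPU each; 64 GiB for the r5 family, 16 GiB for the c5 family). Treat the result as a starting point, not a recommended value.

```yaml
# ILLUSTRATIVE SKETCH - NodePool limits matching the four non-GPU node groups
# memory-optimized-ondemand:   5 nodes x (8 vCPU, 64 GiB) =  40 vCPU,  320 GiB
# compute-optimized-ondemand:  5 nodes x (8 vCPU, 16 GiB) =  40 vCPU,   80 GiB
# memory-optimized-spot:      10 nodes x (8 vCPU, 64 GiB) =  80 vCPU,  640 GiB
# compute-optimized-spot:     10 nodes x (8 vCPU, 16 GiB) =  80 vCPU,  160 GiB
# Total:                                                    240 vCPU, 1200 GiB
spec:
  limits:
    cpu: "240"
    memory: 1200Gi
```

Limits only cap the total capacity of a pool. minSize and desiredCapacity have no counterpart here, because Karpenter provisions nodes in response to pending pods.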

For more details, see the official Karpenter documentation.