Node Pools Optimization (Beta)
Available in v1.17.0+

ScaleOps provides comprehensive tools for managing and optimizing Node Pools in your Karpenter-based cluster, helping you reduce infrastructure costs while maintaining performance.

Node pools are groups of nodes with shared configurations that are managed as a unit within your Kubernetes cluster. Optimizing these pools is essential for maximizing resource utilization and achieving cost efficiency. For detailed information about node pools, see the Karpenter Node Pools documentation.

Common Practices for Organizing Node Pools

Organizing your Karpenter node pools well can improve performance, cost efficiency, and manageability. Karpenter differs fundamentally from Cluster Autoscaler: instead of creating and managing numerous node groups with predefined instance types, it takes a dynamic, workload-driven approach with fewer, more flexible node pools.

Here are some recommended approaches to consider when structuring your node pools:

Consider a Versatile Default Node Pool

Many organizations find it useful to start with a flexible default node pool that can accommodate the majority of general workloads:

  • Configure it with a variety of instance types (c, m, r families) to allow Karpenter to select the most cost-effective instances
  • Use the WhenEmptyOrUnderutilized consolidation policy to maximize cost savings
  • Set appropriate expiration policies (TTL) to ensure node rotation and keep your infrastructure up-to-date
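A minimal sketch of such a default node pool, assuming the karpenter.sh/v1 API on AWS and an existing EC2NodeClass named default. The names, the 720h expiry, and the 1m consolidation delay are illustrative assumptions, not ScaleOps defaults:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                      # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                # assumes this EC2NodeClass already exists
      requirements:
        # Let Karpenter choose among the c, m, and r instance categories
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      expireAfter: 720h              # rotate nodes after 30 days (illustrative value)
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m             # illustrative value
```

In older Karpenter API versions the expiry setting lives in a different place, so check the field layout against the version installed in your cluster.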

Options for Managing Critical Services

For operational stability, consider the following:

  • Maintain a small, separate node infrastructure (using managed node groups or a dedicated Karpenter node pool) for system-critical workloads
  • Run core cluster services like CoreDNS, metrics-server, and other essential components on this dedicated infrastructure
  • Configure node affinity for these critical services to ensure they run on appropriate nodes
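As a rough illustration of the node affinity step, the fragment below pins a critical workload to a dedicated Karpenter node pool by matching the karpenter.sh/nodepool label that Karpenter sets on the nodes it launches. The pool name system-critical is a hypothetical example:

```yaml
# Fragment of a pod template (for example, a Deployment's spec.template.spec)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/nodepool
              operator: In
              values: ["system-critical"]   # hypothetical node pool name
```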

Create Purpose-Specific Node Pools

Unlike Cluster Autoscaler, which often requires separate node groups for each instance type, Karpenter allows you to create a smaller number of well-defined pools based on:

  • Resource profiles: Separate node pools for compute-intensive, memory-intensive, or GPU workloads
  • Team or application boundaries: Use labels and node affinity to direct specific applications to appropriate node pools
  • Compliance requirements: Separate workloads with specific compliance needs (e.g., PCI, HIPAA)

Use Node Pool Weights for Clear Prioritization

When creating multiple node pools that could potentially satisfy the same workloads:

  • Assign weights to establish clear provisioning priorities
  • Higher weights (e.g., 10-100) will be prioritized over lower weights (e.g., 1-9)
  • This provides more predictable behavior than using only mutually exclusive node pools
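A sketch of two overlapping node pools ordered by weight, assuming the karpenter.sh/v1 API. The pool names and weight values are hypothetical, and the template and disruption sections are omitted, so these fragments are not complete manifests:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: preferred-pool       # hypothetical name
spec:
  weight: 50                 # considered first
  # template and disruption sections omitted for brevity
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: fallback-pool        # hypothetical name
spec:
  weight: 10                 # considered after the higher-weight pool
  # template and disruption sections omitted for brevity
```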

Apply Taints and Tolerations

For specialized or expensive resources:

  • Apply appropriate taints to node pools designated for specific workloads (e.g., dedicated=gpu:NoSchedule)
  • Add corresponding tolerations only to pods that should run on those nodes
  • This prevents general workloads from consuming expensive specialized resources
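The taint from the example above and its matching toleration might look like this, assuming the karpenter.sh/v1 NodePool layout. Both parts are fragments rather than complete manifests:

```yaml
# NodePool fragment: taint every node launched from this pool
spec:
  template:
    spec:
      taints:
        - key: dedicated
          value: gpu
          effect: NoSchedule
---
# Pod spec fragment: only pods carrying this toleration may land on those nodes
tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule
```

A toleration only permits scheduling onto the tainted nodes; it does not attract pods to them. Pair it with a node selector or node affinity if the workload must run there.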

Consider Spot Instance Strategies

Karpenter offers several approaches for Spot instance management that differ from Cluster Autoscaler. While Cluster Autoscaler typically requires separate node groups for each instance type, Karpenter can work with a broad range of instance types for Spot requests, potentially increasing availability and reducing interruption impact in your EKS cluster.

Some configuration options you might consider:

  • Enable capacity type: Add a karpenter.sh/capacity-type requirement that includes spot to your node pool requirements
  • Avoid overly constraining instance types: For Spot workloads, consider allowing a wide variety of instance types and sizes to improve availability and pricing options
  • Leverage multiple availability zones: Configure your node pool to utilize multiple AZs to increase the diversity of available Spot capacity
  • Set appropriate tolerations: Add interruption tolerations to workloads suitable for Spot instances

For workloads that can handle interruptions, you might explore creating a dedicated Spot node pool with appropriate tolerations.
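A sketch of the requirements for a dedicated Spot node pool, assuming the karpenter.sh/v1 API on AWS. The zone names are hypothetical placeholders, and the fragment omits the rest of the NodePool:

```yaml
# NodePool fragment: Spot capacity, broad instance selection, several zones
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        # Keep instance selection broad to widen the available Spot capacity
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]   # hypothetical zones
```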

Establish Resource Limits for Cost Control

To prevent unexpected scaling and costs:

  • Set CPU and memory limits on each node pool (spec.limits)
  • Create AWS billing alerts to monitor your compute spend
  • Start with conservative limits and gradually adjust based on actual usage patterns
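For illustration, a spec.limits fragment could look like the following. The numbers are arbitrary placeholders to be replaced with values based on your own usage:

```yaml
# NodePool fragment: cap the total resources this pool may provision
spec:
  limits:
    cpu: "200"        # total vCPUs across all nodes in the pool (illustrative)
    memory: 800Gi     # total memory across all nodes in the pool (illustrative)
```

Once the pool's combined capacity reaches a limit, Karpenter stops launching additional nodes from it.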

These practices can be helpful whether you’re setting up a new cluster or transitioning from other scaling solutions like Cluster Autoscaler.

Migration from Cluster Autoscaler

When migrating from Cluster Autoscaler to Karpenter, consider these important differences in node management and follow a phased approach:

Migration Process

  1. Start with coexistence: Deploy Karpenter alongside Cluster Autoscaler, initially handling only new or specific workloads
  2. Gradually shift workloads: Move workloads from Cluster Autoscaler-managed nodes to Karpenter-managed pools in batches, starting with non-critical applications
  3. Maintain core stability: Keep critical system services on static managed node groups until you’re comfortable with Karpenter’s behavior
  4. Scale down carefully: As workloads migrate to Karpenter, gradually reduce the size of your Cluster Autoscaler managed node groups

Key Configuration Differences

  • Launch Template Handling: Unlike Cluster Autoscaler, which relies heavily on launch templates, Karpenter manages node creation directly. When migrating, focus on translating your launch template configurations to NodeClass properties.

  • IAM Role Configuration: Ensure your Karpenter NodeClass references the appropriate IAM role that has permissions for both EKS cluster access and any additional AWS resources your workloads require.

  • Node Labels Preservation: If you rely on specific node labels in Cluster Autoscaler, ensure these are configured in your NodePool template to maintain application compatibility during migration.

  • Pod Disruption Budgets: Review your PDB configurations to ensure they properly protect applications during the migration process when nodes are being replaced.
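The last two points can be sketched as follows, assuming the karpenter.sh/v1 NodePool layout and the policy/v1 PodDisruptionBudget API. The label, names, and replica count are hypothetical:

```yaml
# NodePool fragment: labels applied to every node launched from this pool
spec:
  template:
    metadata:
      labels:
        team: payments             # hypothetical label carried over from the old node group
---
# Keep at least two replicas running while nodes are drained and replaced
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb               # hypothetical name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: checkout                # hypothetical application label
```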

Refer to the Migration Example for more details.

Creating a New Node Pool

Prerequisites: Before creating a node pool, you must first have a node class configured. Node classes define the infrastructure blueprint that Karpenter will use to provision nodes. See Node Classes for details on creating and configuring node classes.

To create a new node pool, click the “Create Node Pool” button located in the top-right corner of the Node Pools table.

[Screenshot: Create Node Pool button]

When you press the “Create Node Pool” button, you’ll be guided through a simple creation process.

Step 1: Choose Unique Name

[Screenshot: Create Node Pool name dialog]

First, you’ll see a modal where you need to enter a unique name for your new node pool. The system validates that the name doesn’t already exist.

Step 2: Configure Specifications

After entering the name, you’ll proceed to the configuration page with three main tabs. All tabs come pre-populated with ScaleOps recommended default values, so you don’t need to configure anything unless you want to make changes:

Instance Types Tab

[Screenshot: Create Node Pool, Instance Types tab]

Default instance type configuration includes:

  • Instance Category: Pre-configured to the c, m, and r categories for optimal instance selection

Disruption Policy Tab

[Screenshot: Create Node Pool, Disruption Policy tab]

Default disruption and consolidation policies are already configured with recommended settings for optimal cost savings while maintaining workload stability:

  • Consolidation Policy: Set to WhenEmptyOrUnderutilized for maximum cost efficiency
  • Disruption Budget: Set to “always allow”

These defaults are tuned to allow Karpenter to efficiently consolidate underutilized nodes while ensuring your applications remain available during optimization processes.

General Tab

[Screenshot: Create Node Pool, General tab]

Default settings are already configured for:

  • Life cycle: Pre-selected from Spot and On-Demand instances
  • Architecture: Default processor architecture (amd64, arm64)
  • Operating system: Default OS selection (Linux)

All tabs come pre-populated with ScaleOps recommended values that are optimized for your cluster’s requirements and cost efficiency.

Saving Your Node Pool

You can simply save the node pool with these defaults or modify any settings if needed.

Managing Node Pools with ScaleOps

The dashboard displays all Node Pools in your cluster with the following key information:

| Column | Description |
| --- | --- |
| Weight | The relative priority of the node pool during scheduling decisions (higher values indicate higher priority) |
| Available Savings | Potential cost reductions achievable by optimizing instance types and disruption budgets |
| Life Cycle | Instance provisioning type (On-Demand or Spot instances) |
| Instance Types | Available EC2 instance types for this pool, configurable by category, family, or specific types |
| Disruption Policy | Configuration for node consolidation (Empty nodes only or Underutilized nodes as well) |
| Disruption Budgets | Limits on concurrent node disruptions to maintain application availability during scaling events |

Optimization Strategies

Instance Types Optimization

Each EC2 instance type provides a specific ratio of CPU to memory along with unique hardware characteristics. With hundreds of instance types available from AWS, selecting the most cost-efficient options for your workload mix can be challenging.

While Karpenter dynamically selects optimal instance types at runtime based on your Node Pool configuration, ScaleOps enhances this process by recommending improvements to your instance type requirements. These recommendations:

  1. Provide Karpenter with maximum flexibility to select instances with optimal CPU-to-memory ratios based on current workload needs
  2. Preserve critical hardware characteristics you’ve specified (e.g., network-optimized instances)
  3. Estimate potential cost savings from each recommendation

For example, if your Node Pool currently includes only compute-optimized instance types (“c” category), ScaleOps might recommend adding balanced (“m”) and memory-optimized (“r”) options to better accommodate diverse workloads and reduce resource waste.
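In NodePool terms, the change described in this example amounts to widening one requirement. The fragment below is an illustrative sketch that assumes the AWS instance-category label; it is not the exact output that ScaleOps generates:

```yaml
# NodePool fragment under spec.template.spec
requirements:
  # Before: compute-optimized instances only
  # - key: karpenter.k8s.aws/instance-category
  #   operator: In
  #   values: ["c"]
  # After: compute-optimized, balanced, and memory-optimized instances
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]
```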

Setting All Recommendations at Once

You can apply all ScaleOps recommendations simultaneously for maximum optimization efficiency:

[Screenshot: Instance Types, set all recommendations]

This option allows you to implement all suggested instance type optimizations in one action, maximizing your potential cost savings and resource efficiency.

Manual Selection of Recommendations

Alternatively, you can choose to apply only specific recommendations that fit your requirements:

[Screenshot: Instance Types, manual selection]

This gives you granular control over which optimizations to implement, allowing you to balance cost savings with your specific workload requirements and constraints.

Optimized View

After applying optimizations, you can see the improved configuration:

[Screenshot: Instance Types, optimized view]

The optimized view shows your updated instance type configuration with the applied recommendations, highlighting the improvements made to your node pool.

YAML Comparison

You can compare the original and optimized configurations by hovering over the comparison icon:

[Screenshot: Instance Types, YAML comparison]

This feature provides a side-by-side comparison of your original YAML configuration versus the optimized version, making it easy to understand exactly what changes were applied.

For more background information on instance types, see Instance Types.

Disruption Policy Optimization

Karpenter implements a mechanism called “Consolidation” that proactively reduces cluster cost by removing nodes when they become empty or underutilized, and by replacing expensive nodes with cheaper ones (as long as all workloads can still be placed). To limit disruption to applications, you can define “disruption budgets” that control when and how many nodes can be disrupted (terminated or replaced) simultaneously. While strict budgets help maintain application availability, they can prevent cost-saving optimizations like instance consolidation.

ScaleOps provides intelligent recommendations for adjusting your disruption budgets to:

  1. Make them less restrictive while preserving cluster reliability
  2. Enable more effective resource scale-down during periods of low utilization
  3. Generate meaningful cost savings

For example, if you’ve configured a cluster with a scheduled disruption budget that applies every 6 hours for a 5-hour duration, ScaleOps will recommend switching to “always allow” because it is more cost-effective.
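A sketch of what such a scheduled budget could look like in a NodePool, assuming the karpenter.sh/v1 API and assuming the budget blocks all disruptions (nodes: "0") while it is active. The second fragment shows one possible unscheduled budget; the exact values ScaleOps writes for “always allow” are not specified here:

```yaml
# Restrictive: blocks disruptions for 5 hours, starting every 6 hours
spec:
  disruption:
    budgets:
      - nodes: "0"                # assumption: no nodes may be disrupted in the window
        schedule: "0 */6 * * *"   # cron syntax: minute 0 of every 6th hour
        duration: 5h
---
# Less restrictive: no schedule, so the budget applies at all times
spec:
  disruption:
    budgets:
      - nodes: "10%"              # illustrative value
```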

[Screenshot: Disruption Budget]

General Node Pool Settings

In addition to Instance Types and Disruption Policy, you can use the “General” section of the Node Pool panel to view and customize node pool weight and CPU/memory limits, as well as taints associated with individual nodes.

[Screenshot: General settings]

Export YAML for GitOps Integration

For teams managing infrastructure through GitOps workflows using tools like Argo CD or Terraform, ScaleOps provides an Export YAML feature that allows you to get the node pool YAML configuration and use it in your own deployment pipelines.

In the YAML tab, click Copy or Download, then choose whether to copy or download the full YAML or the specification only.

[Screenshot: Export YAML]

Alternatively, use the Actions button to copy the YAML to the clipboard or download it as a file.

[Screenshot: Actions menu, Download YAML]