Installation

Quick Installation

Add a single flag to your Helm install command to enable automatic GPU discovery and cloud configuration:

--set gpu.enabled=true

Advanced Installation

For custom configuration options (e.g., setting a nodeSelector), follow the DCGM Exporter Helm reference.

GPU Device List Strategy

ScaleOps supports all GPU device list strategies, which control how GPU devices are exposed to containers. ScaleOps detects the cluster environment and adjusts automated pods to match it; no manual configuration is required.

Note - Mixed strategies: Automation is not supported when different nodes in the same cluster use different device list strategies.
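
The mixed-strategy restriction amounts to a uniformity check across nodes. The sketch below is illustrative only and is not ScaleOps code. It assumes each node's device list strategy has already been collected into a dictionary (this page does not describe how the strategy is detected), and the function name `automation_supported` is hypothetical.

```python
from typing import Dict

# The three strategy names listed under "Supported Strategies".
KNOWN_STRATEGIES = {"envvar", "volume-mounts", "cdi"}


def automation_supported(node_strategies: Dict[str, str]) -> bool:
    """Return True when all nodes share one known device list strategy.

    `node_strategies` maps node name -> strategy. This input shape is an
    assumption made for illustration; it is not a ScaleOps API.
    """
    strategies = set(node_strategies.values())
    unknown = strategies - KNOWN_STRATEGIES
    if unknown:
        raise ValueError(f"unknown strategies: {sorted(unknown)}")
    # At most one distinct strategy means the cluster is not mixed.
    return len(strategies) <= 1


# Hypothetical node names, for demonstration only.
uniform = {"gpu-node-a": "cdi", "gpu-node-b": "cdi"}
mixed = {"gpu-node-a": "cdi", "gpu-node-b": "envvar"}
print(automation_supported(uniform))  # True
print(automation_supported(mixed))    # False
```

A check like this can help explain why automation is unavailable in a cluster before looking for other causes.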

Supported Strategies

| Strategy | Description |
| --- | --- |
| envvar | GPU devices are exposed via environment variables (e.g., NVIDIA_VISIBLE_DEVICES). Default when using nvidia-device-plugin directly (without the GPU Operator). |
| volume-mounts | GPU devices are exposed via volume mounts. |
| cdi | GPU devices are exposed using the Container Device Interface (CDI) standard. Also used when NRI (Node Resource Interface) is enabled. Default when using the NVIDIA GPU Operator (NRI is disabled by default). |

When Volume Mounts Are Added to Automated Pods

ScaleOps automatically adds volume mounts to automated pods in the following cases:

  • GKE — volume mounts are added for accessing the GPU driver location.
  • Bottlerocket OS with volume-mounts strategy — volume mounts are added to support the volume-mounts device list strategy. Note that cdi (default) and envvar are also supported on Bottlerocket and do not require volume mounts.
  • MPS-enabled pods — when MPS is enabled in the workload policy, volume mounts for the MPS pipe directory are added to MPS capacity pods and automated MPS pods.
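
The three cases above can be read as independent conditions, any of which adds its own volume mounts. The following sketch encodes them as a small decision function. It is a reading aid under stated assumptions, not the ScaleOps implementation: the `PodContext` fields and the `volume_mount_reasons` function are hypothetical names, and the returned strings are descriptive labels rather than real mount paths.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodContext:
    # All fields are hypothetical inputs chosen for illustration.
    on_gke: bool = False
    on_bottlerocket: bool = False
    strategy: str = "cdi"        # "envvar", "volume-mounts", or "cdi"
    mps_enabled: bool = False    # MPS enabled in the workload policy


def volume_mount_reasons(ctx: PodContext) -> List[str]:
    """List why volume mounts would be added to an automated pod."""
    reasons: List[str] = []
    if ctx.on_gke:
        reasons.append("GPU driver location (GKE)")
    # On Bottlerocket, only the volume-mounts strategy needs mounts;
    # cdi and envvar do not.
    if ctx.on_bottlerocket and ctx.strategy == "volume-mounts":
        reasons.append("volume-mounts device list strategy (Bottlerocket)")
    if ctx.mps_enabled:
        reasons.append("MPS pipe directory")
    return reasons


print(volume_mount_reasons(PodContext(on_bottlerocket=True, strategy="cdi")))
# []
print(volume_mount_reasons(PodContext(on_gke=True, mps_enabled=True)))
# ['GPU driver location (GKE)', 'MPS pipe directory']
```

An empty list corresponds to a pod that receives no additional volume mounts, such as a Bottlerocket node running the default cdi strategy without MPS.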