
Alert Manager

ScaleOps integrates with Alertmanager to send alerts on cluster metrics, system events, and resource utilization patterns.

Enable Alertmanager Integration

Add the following Helm values to your values.yml file, replacing myalertmanager.com with the address of your Alertmanager:

```yaml
prometheus:
  alerts:
    enabled: true
    host: myalertmanager.com
```

Built-in Alerts

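ScaleOps provides a set of built-in alerts that you can enable or disable individually under prometheus.alerts.rules by setting each rule's enabled value. In the values below, the first six rules are enabled and the remaining rules are disabled.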

```yaml
prometheus:
  alerts:
    additionalRules: []
    rules:
      workloadRequestIncrease:
        enabled: true
        annotations:
          summary: "Workload resource requests increased by over 300% in the last 24 hours"
      cpuThrottling:
        enabled: true
        annotations:
          summary: "CPU throttling is above 90% for 15 minutes"
      outOfMemory:
        enabled: true
        annotations:
          summary: "More than 5 OOM events in the last 60 minutes"
      overProvisioned:
        enabled: true
        annotations:
          summary: "Over 70% of workloads are over provisioned for 24 hours"
      underProvisioned:
        enabled: true
        annotations:
          summary: "Over 40% of workloads are under provisioned for 24 hours"
      nodeUtilization:
        enabled: true
        annotations:
          summary: "Node CPU or memory usage is over 90% for 15 minutes"
      failedCreateEvent:
        enabled: false
        annotations:
          summary: "Failed create events"
      resourceQuotaPods:
        enabled: false
        annotations:
          summary: "Pod ResourceQuota usage exceeded 95% for 15 minutes"
      resourceQuotaRequestsCPU:
        enabled: false
        annotations:
          summary: "CPU requests quota usage is above 95% for 15 minutes"
      resourceQuotaRequestsMemory:
        enabled: false
        annotations:
          summary: "Memory requests quota usage is above 95% for 15 minutes"
      resourceQuotaLimitsCPU:
        enabled: false
        annotations:
          summary: "CPU limits quota usage is above 95% for 15 minutes"
      resourceQuotaLimitsMemory:
        enabled: false
        annotations:
          summary: "Memory limits quota usage is above 95% for 15 minutes"
      replicationControllers:
        enabled: false
        annotations:
          summary: "ReplicationControllers quota usage is above 95% for 15 minutes"
      countReplicaSets:
        enabled: false
        annotations:
          summary: "ReplicaSets quota usage is above 95% for 15 minutes"
```
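Helm deep-merges the maps in the values you supply with the chart's defaults, so an override file usually needs to contain only the keys you want to change. The sketch below assumes the ScaleOps chart follows this standard Helm merge behavior. It turns on the pod quota rule, turns off the CPU throttling rule, and rewords one summary. The file name and the replacement summary text are illustrative only.

```yaml
# Hypothetical override file (e.g. alerts-overrides.yml); only changed keys are listed,
# assuming the chart's defaults supply everything else.
prometheus:
  alerts:
    enabled: true
    host: myalertmanager.com   # placeholder; use your Alertmanager address
    rules:
      # Disabled in the listing above; turn it on
      resourceQuotaPods:
        enabled: true
      # Enabled in the listing above; turn it off
      cpuThrottling:
        enabled: false
      # Leave the rule enabled and change only the alert text
      outOfMemory:
        annotations:
          summary: "Frequent OOM events in the last 60 minutes"
```

When you pass several files with -f, the rightmost file takes precedence. Lists are replaced rather than merged, so any additionalRules list you supply replaces the chart's default list in full.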