Alerting Policies define predictive alerts that help ensure compliance with your Service Level Objectives (SLOs). To create a new Alerting Policy, complete the following steps.
- Navigate to MENU > Services and find the service to which you want to add an Alerting Policy.
- Click the service to display its details.
- Select the SLO to which you want to add a new Alerting Policy. If no SLOs are defined for the service, you must first create a new Service Level Objective.
- In the Alerting Policies section, click CREATE NEW.
- In the SLI field, select the Service Level Indicator template you want to use. This is only a template; you can modify its parameters as needed for your environment. After you select an SLI, its description is displayed under its name. Click the icon next to the name of the SLI to display its YAML configuration.
- Enter the BURN RATE THRESHOLD above which you want to alert.
- In the PRIMARY LOOKBACK WINDOW field, enter the time frame over which the burn rate is calculated.
- (Optional) Configure a CONTROL LOOKBACK WINDOW to ensure that the alert is triggered only while the error budget is still being consumed.
- (Optional) To alert only if the burn rate stays above the specified threshold for a given period, set the ALERT AFTER field. This avoids triggering the alert for short spikes when the burn rate is otherwise normal.
- Select the SEVERITY of the alert (ticket or page). You can use this field to route the alert in Prometheus Alertmanager.
- Enter a NAME for the Alerting Policy, then click CREATE.
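To get a feel for suitable BURN RATE THRESHOLD and PRIMARY LOOKBACK WINDOW values, the following Python sketch works through the arithmetic. It assumes the commonly used definition of burn rate: the observed error rate divided by the error rate that the SLO allows. The 99.9% target, the 30-day period, and the function names are illustrative assumptions, not Backyards defaults or APIs.

```python
from datetime import timedelta


def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    allowed_error_rate = 1.0 - slo_target
    return error_rate / allowed_error_rate


def budget_consumed(rate: float, window: timedelta, slo_period: timedelta) -> float:
    """Fraction of the whole error budget spent if `rate` holds for `window`."""
    return rate * (window / slo_period)


# Hypothetical inputs, chosen only for illustration.
slo_target = 0.999                   # 99.9% availability objective
slo_period = timedelta(days=30)      # 30-day rolling SLO period
primary_window = timedelta(hours=1)  # primary lookback window

rate = burn_rate(error_rate=0.0144, slo_target=slo_target)
spent = budget_consumed(rate, primary_window, slo_period)
print(f"burn rate {rate:.1f} spends {spent:.1%} of the budget in one hour")
```

Under these assumptions, a threshold of 14.4 with a one-hour lookback window corresponds to spending about 2% of a 30-day error budget within that hour.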
CR reference
This section describes the fields of the AlertingPolicy custom resource.
apiVersion: sre.banzaicloud.io/v1alpha1
kind: AlertingPolicy
spec:
  burnRate:
    conditionMetDuration: 2m0s
    lookBackWindow: 1h0m0s
    secondaryWindow: 5m0s
    severity: page
    sloRef:
      name: movies-30d-rolling-availability
      namespace: backyards-demo
    threshold: '14.4'
apiVersion (string)
Must be sre.banzaicloud.io/v1alpha1
kind (string)
Must be AlertingPolicy
spec (object)
The configuration and parameters of the resource.
spec.burnRate (object)
Specifies the burn rate parameters of the alerting policy.
conditionMetDuration (string)
To alert only if the burn rate stays above the specified threshold for a given period, set the conditionMetDuration field. This avoids triggering the alert for short spikes when the burn rate is otherwise normal.
lookBackWindow (string)
The time frame over which the burn rate is calculated.
secondaryWindow (string)
A control lookback window that ensures the alert is triggered only while the error budget is still being consumed.
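The interplay of lookBackWindow, secondaryWindow, and conditionMetDuration can be sketched in Python as follows. This follows the general multi-window burn-rate pattern; whether Backyards evaluates the condition in exactly this way is an assumption, and the function names and sample values are hypothetical.

```python
from typing import Sequence


def windows_exceed(primary_rate: float, control_rate: float, threshold: float) -> bool:
    """Both the long (primary) and short (control) windows must exceed the threshold."""
    return primary_rate > threshold and control_rate > threshold


def held_for(firing_history: Sequence[bool], evaluations_required: int) -> bool:
    """True if the condition was met in each of the last N evaluations."""
    if evaluations_required <= 0:
        return True
    recent = firing_history[-evaluations_required:]
    return len(recent) == evaluations_required and all(recent)


threshold = 14.4

# Ongoing incident: both windows are burning budget too fast.
ongoing = windows_exceed(primary_rate=16.0, control_rate=20.0, threshold=threshold)

# Recovered incident: the 1h window is still elevated by past errors,
# but the 5m control window shows the budget is no longer being consumed.
recovered = windows_exceed(primary_rate=16.0, control_rate=0.5, threshold=threshold)

# A single-evaluation spike does not satisfy a "two evaluations in a row" rule.
spike = held_for([False, True], evaluations_required=2)
sustained = held_for([True, True], evaluations_required=2)
```

The control window is what stops the alert from continuing to fire after recovery, while the duration check filters out brief spikes.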
severity (string)
The severity of the alert (ticket or page). You can use this field to route the alert in Prometheus Alertmanager.
sloRef (object)
The name and namespace of the Service Level Objective that the alerting policy will alert on. For example:
sloRef:
  name: movies-30d-rolling-availability
  namespace: backyards-demo
threshold (string)
The burn rate threshold above which you want to alert.
status (object)
The current state of the resource. This object is managed by Backyards.
Further information
- For further information, watch the recording of our webinar on Tracking and enforcing SLOs on Kubernetes, or read the Tracking and enforcing SLOs on Kubernetes blog post.
- For detailed examples of burn-rate alerting strategies, see our Burn Rate Based Alerting Demystified blog post.