After creating your SLIs, you can use them to improve your system on an ongoing basis. Below, you'll see how to use your SLIs across different areas of the New Relic platform.
Tracking your service level objectives
You should treat service levels as a recurring practice, just like testing, alerting, game days, and others. You could think of them as a tool you use to measure the "health" of your systems. But like all tools, service levels require calibration.
Include the service level practice in your team's process. We recommend the following based on our experience using service levels. Adjust these recommendations to fit your team's specific requirements:
- Do a periodic review of the service levels, and pay close attention to:
  - Do the SLIs reflect incidents and pages?
  - What's your error budget for a week?
    - If it's too low, investigate what caused the drop, using the "Analyze" feature to find the bad events behind it.
    - If it's 100%, make sure your indicator is correct and the SLO is aggressive enough. Being at 100% indicates the SLO is too safe.
  - What trend do you observe across different time periods (1d/7d/28d)?
- Keep an eye on SLIs during game days. SLIs should reflect the impact, just like your alerts do.
- When your error budget drops in production, evaluate why the same drop didn't show up in staging.
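The weekly error budget check from the review above comes down to a little arithmetic. The sketch below assumes an event-based SLI (bad events out of valid events) and a common definition of error budget: the share of allowed bad events that has not been used yet. The function name and parameters are hypothetical, and the "error budget remaining" figure shown in New Relic may be computed differently.

```python
def error_budget_remaining(valid_events: int, bad_events: int, slo_target: float) -> float:
    """Return the unspent share of the error budget, as a percentage.

    slo_target is a fraction, for example 0.995 for a 99.5% objective.
    A negative result means the budget for the period is overspent.
    """
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    # Number of bad events the objective tolerates over the period.
    allowed_bad = valid_events * (1 - slo_target)
    return (1 - bad_events / allowed_bad) * 100
```

For example, with 1,000,000 valid events, 2,500 bad events, and a 99.5% objective, roughly half of the weekly budget is left. A result of exactly 100 means no bad events were recorded at all, which is the case the review asks you to double-check.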
Business value
Ultimately, service level management focuses on reducing the cost of business-impacting incidents. Service levels also help you quantify estimated revenue loss during incidents, as well as estimated revenue at risk for subscription-based businesses.
For example, you can estimate revenue loss for businesses that generate revenue per transaction, such as online retail, as well as penalties paid if your business has service level agreement contracts with built-in penalties.
Revenue at risk is for subscription-based (SaaS) business models where each customer has a monthly or annual subscription value. You can easily estimate the number of customers impacted and their subscription revenue by period to calculate "revenue at risk."
Tip
Subscription businesses can also have penalties within a service level agreement contract, which should be included as stated below.
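As a rough illustration of the "revenue at risk" estimate, the sketch below counts the customers impacted by a breach and sums their subscription value for the period. The `Customer` record, its fields, and the sample values are hypothetical. In practice, the list of impacted customers would come from your own service level and billing data, and any contractual penalties would be added on top.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    monthly_subscription: float  # subscription value for the period
    impacted: bool               # True if the breach affected this customer


def revenue_at_risk(customers: list[Customer]) -> tuple[int, float]:
    """Return (number of impacted customers, their combined subscription value)."""
    impacted = [c for c in customers if c.impacted]
    return len(impacted), sum(c.monthly_subscription for c in impacted)


# Hypothetical sample data: two of three customers were impacted.
customers = [
    Customer("acme", 1200.0, True),
    Customer("globex", 300.0, False),
    Customer("initech", 450.0, True),
]
count, at_risk = revenue_at_risk(customers)
print(f"{count} customers impacted, {at_risk:.2f} in monthly revenue at risk")
```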
Automation
Once you have established what does and doesn't work for your stakeholders, you can begin to scale SLM with automation. You can start learning about automating service level management by studying the New Relic Terraform library.
Alert quality management
Alert quality management is another observability maturity practice that complements service level management. Combining alert quality data with service level data lets you see whether your alert policies align with real impact or just create noise. You'll be able to identify good alerts, missing alerts, and noisy alerts.
You can do this by creating a custom dashboard with an SLI compliance query side-by-side with an alerting quality query. Just check out our alert quality management doc for more details.
Quantifying the direct cost of service level agreement breaches
Determine the cost of previous breaches. For example, online retail businesses know the estimated revenue loss per minute during service loss (downtime). Legal can tell you the penalty costs of service level agreement (SLA) contract breaches. Both losses can be estimated in real time using New Relic data on service level breaches.
Quantifying the revenue opportunity costs of service level breaches
Determine the three variables below.
- (A) number of breaches that trigger penalties or revenue loss
- (B) average duration of breaches
- (C) average penalty or revenue loss per minute/hour
Multiply those three variables (A × B × C) to calculate the total revenue opportunity to recover.
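The multiplication can be sketched as a small helper. The function and parameter names are hypothetical and the figures in the usage note are made up for illustration. The one thing to watch is that B and C use the same time unit (both per minute or both per hour).

```python
def revenue_opportunity(breach_count: int,
                        avg_breach_duration_minutes: float,
                        loss_per_minute: float) -> float:
    """Estimate the total revenue opportunity to recover: A x B x C.

    A = number of breaches that trigger penalties or revenue loss
    B = average duration of a breach, in minutes
    C = average penalty or revenue loss per minute
    """
    if breach_count < 0 or avg_breach_duration_minutes < 0 or loss_per_minute < 0:
        raise ValueError("inputs must be non-negative")
    return breach_count * avg_breach_duration_minutes * loss_per_minute
```

With 12 breaches averaging 30 minutes each at a loss of 150 per minute, `revenue_opportunity(12, 30, 150)` gives 54,000 in the same currency as C.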
Quantifying revenue leakage
Determine the two variables below.
- (A) Total revenue (per period)
- (B) Total penalties payments made to customers (per same period as A)
Divide B by A (B / A), then multiply by 100 to express the revenue leakage rate as a percentage.
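A minimal sketch of the leakage calculation, assuming both amounts cover the same period and currency. The function name and the sample figures are hypothetical.

```python
def revenue_leakage_rate(total_revenue: float, total_penalties: float) -> float:
    """Return penalties paid as a percentage of revenue for the same period (B / A * 100)."""
    if total_revenue <= 0:
        raise ValueError("total_revenue must be positive")
    if total_penalties < 0:
        raise ValueError("total_penalties must be non-negative")
    return total_penalties / total_revenue * 100
```

For instance, 25,000 in penalties against 2,000,000 in revenue for the same quarter works out to a leakage rate of about 1.25%.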
What's next?
If you want to delve even further into service level management, we highly recommend our free interactive online course on service levels.