A DevOps transformation requires a cultural shift so that teams can build new skills and motivations for the type of cross-team work required in a true DevOps practice. The transformation can be difficult when the people involved do not see the benefits of change as a clear objective.
Service level objectives (SLOs) provide a powerful mechanism to codify the goals of a DevOps team in a way that can be measured and shared. They also provide clear boundaries on service expectations that help teams achieve greater velocity and freedom in experimenting with new approaches.
In this tutorial, you define SLOs for successful service delivery and use New Relic instrumentation to surface your current performance metrics relative to those objectives. Measurable SLOs, together with visibility into your current progress against them, help you properly assess future optimization efforts.
Service level components
An SLO is an agreed-upon means of measuring the performance of your service. The SLO defines a target value for a specified quantitative measure, which is called a service level indicator (SLI).
SLOs clarify a target value for SLIs; for example:
- Average response time should be less than 200 ms
- 95% of requests should be completed within 250 ms
- Availability of the service should be 99.99%
Logically group SLOs together to provide an overall boolean indicator of whether or not the service is meeting expectations. For example, a helpful SLO for alerting purposes could be:
95% of requests completed within 250 ms AND availability is 99.99%
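The combined SLO above can be sketched as a single boolean check. This is a minimal illustration, not New Relic functionality: the `ServiceWindow` structure and the function names are hypothetical, and "within 250 ms" is assumed to mean less than or equal to 250 ms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceWindow:
    """Measurements for one evaluation window (hypothetical structure)."""
    response_times_ms: List[float]  # one entry per completed request
    successful_checks: int          # availability checks that passed
    total_checks: int               # availability checks attempted


def fraction_within(response_times_ms, threshold_ms):
    """Fraction of requests completed within the threshold."""
    if not response_times_ms:
        return 0.0
    fast = sum(1 for t in response_times_ms if t <= threshold_ms)
    return fast / len(response_times_ms)


def availability(successful_checks, total_checks):
    """Fraction of availability checks that succeeded."""
    return successful_checks / total_checks if total_checks else 0.0


def meets_slo(window):
    """True only when BOTH conditions of the combined SLO hold."""
    latency_ok = fraction_within(window.response_times_ms, 250) >= 0.95
    availability_ok = availability(
        window.successful_checks, window.total_checks) >= 0.9999
    return latency_ok and availability_ok


# 96 of 100 requests are fast, and every availability check passed.
healthy = ServiceWindow([100.0] * 96 + [400.0] * 4, 10000, 10000)
# Same latency profile, but only 99.9% of availability checks passed.
degraded = ServiceWindow([100.0] * 96 + [400.0] * 4, 9990, 10000)
```

Because the conditions are joined with AND, a window that meets the latency target but misses the availability target still fails the SLO, which is what makes the combined form useful for alerting.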
|Service level components||Example values|
|SLI (Indicator)||HTTP status codes|
|SLO (Objective)||< 1% HTTP 500s over 30 days|
|SLA (Agreement)||For every additional .1% of HTTP 500s, 5% refund of total contract|
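To make the example values in the table concrete, here is a hedged sketch of the refund arithmetic. The table does not say how partial steps or an upper limit are handled, so this sketch assumes that only complete 0.1% steps above the 1% objective count and that the refund is capped at 100% of the contract. The function name is hypothetical.

```python
from fractions import Fraction

SLO_ERROR_RATE = Fraction(1, 100)    # objective: < 1% HTTP 500s over 30 days
REFUND_STEP = Fraction(1, 1000)      # every additional 0.1% of HTTP 500s
REFUND_PER_STEP = Fraction(5, 100)   # 5% refund of total contract per step


def refund_fraction(http_500_count, total_requests):
    """Fraction of the total contract to refund for a 30-day window."""
    if total_requests == 0:
        return Fraction(0)
    error_rate = Fraction(http_500_count, total_requests)
    excess = error_rate - SLO_ERROR_RATE
    if excess <= 0:
        return Fraction(0)
    # Assumption: only complete 0.1% steps trigger a refund.
    full_steps = excess // REFUND_STEP
    # Assumption: the refund cannot exceed the whole contract.
    return min(Fraction(1), full_steps * REFUND_PER_STEP)


# 1,250 HTTP 500s in 100,000 requests is 1.25%: two full steps over the SLO.
example_refund = refund_fraction(1250, 100000)  # Fraction(1, 10), a 10% refund
```

Exact fractions are used instead of floats so that step boundaries such as 1.1% are not affected by rounding.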
Value stream mapping can be a useful exercise to work through before setting SLOs. Work with your teams to clarify key components of your service and the appropriate metrics. Use these inputs as starting points for this tutorial.
- Learn about SLOs, SLIs, and SLAs from the Google Cloud Platform blog.
- Learn how New Relic has applied SLOs and SLIs in its reliability practices from this SREcon18 Americas presentation (approximately 21 minutes).
1. Build an inventory of services requiring SLOs
Start defining SLOs for your application by first taking an inventory of the services that your application provides to both your internal and external customers.
- Draft a list of services. Make the scope of services you consider as comprehensive as possible.
- Engage your team members and other stakeholders to validate the list for completeness.
- Segment your application stack to understand the potential components that might require SLOs.
For example, most applications can be segmented as:
- Application (backend/microservices)
- Dependency services (such as the message queue)
- Underlying servers
This example lists components that would benefit from SLOs:
|Customer type||Component name||Owner||Language stack||Operating system|
|External||Service 1||John D.||Java||RHEL 6|
|Internal||Service 2||Jane A.||.NET||Win2003 R2|
|External||Website||Jane A.||Classic ASP||Win2000|
|Internal||MS SQL||Dave Z.||n/a||Win2003 R2|
Building a definitive list of services that require an SLO can be challenging, because an application often consists of many endpoints with complex interdependencies.
Begin your SLO journey with pragmatism. Start by defining a broader, simpler set of SLOs that are driven by what your customers care about most and what your team can control. As your teams better align around SLOs, you can then begin to fine-tune and add more complexity.
2. Research customer expectations for SLOs
Once you have an inventory of services, begin to gather the information you need to define the SLOs for those services. Interviews with customers that depend on your services are often valuable for understanding service expectations. For example, to define SLOs for internal teams, New Relic asks questions such as:
- If possible, can you broadly categorize the types of requests we can expect from you and your service?
- To what extent do you or your service depend on timely responses to requests?
- Are there requests for which response time is not critical?
- How does your service handle unavailable dependencies or data?
- What is the maximum amount of unavailable data that your service can handle?
- At what threshold does your service fail if a request takes too long?
- What are acceptable rates of errors?
- What would an SLA look like between our product and yours?
Existing usage data can also be a helpful research input.
3. Define SLOs
Using the research on customer expectations that you gathered, draft a focused set of SLOs. New Relic recommends setting SLOs against one or more of the following SLIs:
- Application availability percentage
- Average response time
- Response percentile
- Error rate
- Apdex value
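To show how several of these SLIs relate to raw request data, here is a minimal sketch that computes them from a list of response times. It is illustrative only and is not how New Relic agents calculate these values. The function names are hypothetical, the percentile uses the simple nearest-rank method with an integer percentile, and the Apdex calculation uses the standard formula (satisfied requests plus half of tolerating requests, divided by all requests) without treating errored requests specially.

```python
def average_response_time(durations_ms):
    """Mean response time across all sampled requests."""
    return sum(durations_ms) / len(durations_ms)


def response_percentile(durations_ms, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(durations_ms)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def error_rate(error_count, total_count):
    """Fraction of requests that ended in an error."""
    return error_count / total_count


def apdex(durations_ms, threshold_ms):
    """Apdex for threshold T: satisfied <= T, tolerating <= 4T."""
    satisfied = sum(1 for d in durations_ms if d <= threshold_ms)
    tolerating = sum(
        1 for d in durations_ms if threshold_ms < d <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(durations_ms)


# Hypothetical sample of ten request durations in milliseconds.
sample = [100, 150, 200, 250, 300, 400, 500, 600, 1200, 2500]
```

With this sample, the average (620 ms) is pulled up by two slow requests while the median stays at 300 ms, which is one reason to track a percentile alongside the average.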
Also, consider instrumenting and tracking the following SLIs:
- Throughput (peak and trough)
- Database call count and duration
- DNS and SSL timing
- DOM processing and page rendering
- Mean-time-to-detection (MTTD)
For a more comprehensive list of potential areas to measure, see Measuring DevOps.
Recommendation: To determine if your application is performing to customer expectations:
- Consider combining multiple SLIs (for example, availability and response time) into one SLO.
- Aim to define a consistent set of conditions across all of the services in your list.
- Consult your team and stakeholders to validate that the SLOs you set are reasonable, consistently attainable (even if you are not currently meeting them), and aligned to customer expectations.
After you finish this step, you should have a set of well-defined SLOs and SLIs.
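When you validate whether an availability target is reasonable and attainable, it can help to translate the percentage into the amount of downtime it permits. The sketch below does that arithmetic; the function name is hypothetical, and the 30-day window is an assumption borrowed from the example SLO earlier in this tutorial.

```python
from datetime import timedelta


def allowed_downtime(availability_pct, window=timedelta(days=30)):
    """Downtime permitted by an availability target over the window."""
    unavailable_fraction = 1 - availability_pct / 100
    return window * unavailable_fraction


# 99.99% over 30 days permits roughly 4.3 minutes of downtime;
# 99.9% over the same window permits roughly 43 minutes.
four_nines = allowed_downtime(99.99)
three_nines = allowed_downtime(99.9)
```

Discussing a target in minutes rather than percentages often makes it easier for stakeholders to judge whether the team can consistently meet it.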
4. Determine what can be instrumented
Now you are ready to deploy agents or monitors to establish a performance baseline for the SLIs you created. With proper instrumentation in place, you will have visibility into the performance indicators that matter for your team and your customers. You will also have a clearer understanding of what it takes to meet your SLOs.
- Identify the service components your team will optimize.
- Verify which application tiers meet New Relic monitoring requirements.
- To ensure you have robust baselines from which to work, determine the level of instrumentation that is possible (or allowed) within your organization.
There may be situations where instrumenting the current on-premises environment is not viable. For example, firewalls with certain settings may not permit New Relic agents to transmit data.
Recommendation: If the application has a web front end in these situations, use New Relic Synthetics. Synthetics offers non-agent monitoring while still providing the ability to establish a baseline.
To instrument the example applications and components in this tutorial, use these New Relic products:
- New Relic products
|Customer type||Component name||Tier owner||Language stack||Server OS||New Relic products|
|External||Service 1||John D.||Java||RHEL 6||APM, Infrastructure, Synthetics|
|Internal||Service 2||Jane A.||.NET||Win2003 R2||APM, Infrastructure|
|Internal||ActiveMQ||John D.||Java||AIX||APM, Plugins|
|External||Website||Jane A.||Classic ASP||Win2000||Synthetics|
|Internal||MS SQL||Dave Z.||n/a||Win2003 R2||Infrastructure, On-host integration|
- APM installation
After reviewing the compatibility and requirements for New Relic APM, install the APM agent on your application stack. Steps for installing APM agents vary based on language agent type. Follow the installation procedures for specific APM agents.
- Infrastructure installation
After reviewing the requirements for New Relic Infrastructure, follow the installation procedures to install the Infrastructure agent on instances that host your applications. The New Relic Infrastructure agent requires the following host permissions:
- Infrastructure on-host integrations
To gain extended visibility into applications that your code depends on, deploy on-host integrations based on their availability. New Relic supports several commonly used application components, including MySQL, Apache, NGINX, and more. For more information, see New Relic's on-host integrations documentation.
New Relic Synthetics is a suite of automated, scriptable tools to monitor your websites, critical business transactions, and API endpoints. Follow the procedures to create a simple browser check. Be sure to verify that your website URL is accessible from the Synthetics public network locations.
New Relic Browser provides deep insights into how your users are interacting with your application or website. Browser complements Synthetics with data based on actual user experiences, which is useful in discerning how DevOps efforts are ultimately improving the experience for the customer. For more information, see the compatibility and requirements, then install the New Relic Browser agent.
The growing role of mobile apps in customer experience often spurs new performance data needs. Installation of New Relic Mobile lets DevOps teams instrument iOS and Android applications to gain a fuller understanding of service delivery quality.
- Developers can use the New Relic Plugins product to create plugins that monitor numerical metrics provided by external services, hosts, or equipment. You can install existing plugins from Plugin Central, or plan, develop, test, and create your own plugins.
5. Review the default metrics
After you deploy the agents and monitors, use service maps to review the default metrics that New Relic captures. For example, a typical service map shows many of the common SLIs that application teams rely on, including response time, Apdex, throughput, and error rate metrics from APM. It also shows page load time, Ajax response, throughput, and error rate from Browser.
6. Set up custom instrumentation
To close any remaining gaps in visibility for your SLIs, use custom instrumentation. New Relic provides several avenues for adding custom instrumentation, including:
- Making API calls to agents from inside your source code
- Packaging XML-based custom instrumentation modules with deployed applications
- Adding UI-based instrumentation without a code deploy
In addition, you can add custom attributes to each transaction event that match application performance factors to critical business information. Then you can track those attributes in Insights dashboards. For more information, see the custom instrumentation documentation for your application:
7. Create Insights dashboards to track SLIs
Once you implement the appropriate instrumentation, it is easy to visualize your service level indicators with New Relic Insights dashboards. Insights provides a single location to query and view all the data that New Relic products gather. For example, Insights helps visualize the following data gathered from New Relic products:
- Infrastructure: Use default Infrastructure events and attributes for your systems, processes, events, storage, and network; Infrastructure integrations; and custom attributes.
- APM: Use
- Browser: Use
- Mobile: Use several Mobile event types.
- Synthetics: Use
To query and view the data from the SLIs you selected for baselining:
- Use the Insights Data explorer to create widgets for your dashboards.
- Create dashboards that include widgets for the SLI baseline data.
- Use these widgets and dashboards to establish team dashboards that you can share and use to conduct operations reviews.
The metrics you capture will become your application's baseline. Share your Insights dashboard with your application team and stakeholders to provide visibility into what is happening with your application and to monitor future performance.
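As a starting point for baseline widgets, the following sketch assembles a few NRQL queries as strings that you could paste into Insights. It assumes the default APM `Transaction` event with its `duration` (in seconds), `error`, and `appName` attributes; the `sli_queries` helper, the application name, and the time window are hypothetical, so adjust them to match your own data.

```python
APP_NAME = "Service 1"  # hypothetical name taken from the example inventory


def sli_queries(app_name, since="1 week ago"):
    """Return NRQL query strings for a few common SLIs, keyed by SLI name."""
    where = f"WHERE appName = '{app_name}'"
    window = f"SINCE {since}"
    selects = {
        "average_response_time": "SELECT average(duration)",
        "response_percentile_95": "SELECT percentile(duration, 95)",
        # duration is reported in seconds, so 250 ms is 0.25
        "requests_within_250_ms":
            "SELECT percentage(count(*), WHERE duration <= 0.25)",
        "error_rate": "SELECT percentage(count(*), WHERE error IS true)",
    }
    return {
        name: f"{select} FROM Transaction {where} {window}"
        for name, select in selects.items()
    }


queries = sli_queries(APP_NAME)
```

Keeping the queries in one place makes it easier to apply the same set of SLI widgets consistently across every service in your inventory.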