Creating and managing service levels requires the following:
- You must be a full platform user.
- You must have the capability for modifying and deleting events-to-metrics.
If you run into either of the following problems, check your user permissions:
- The UI has disabled the option to save an SLI/SLO.
- The API returns the error message “Cannot query field
For New Relic organizations that have multiple accounts: Service levels can only be associated with a single account. If you're trying to create a service level for a workload with entities across multiple accounts, you may want to restructure the workloads so that all of their associated entities are in the same account. You can create a maximum of 500 SLIs on an account.
New Relic ingests data in many ways and from many different sources, and each source shapes its data differently. In some scenarios, the characteristics of the data make it impossible to configure service levels:
- Subqueries. Subqueries are not supported.
- Addition of sum functions. While it's possible to use SELECT sum(attributeA + attributeB), the expression SELECT sum(attributeA) + sum(attributeB) is not supported.
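Because addition distributes over summation, the unsupported form can often be rewritten as the supported one. The following Python sketch checks that the two forms are equal on made-up sample events. It assumes both attributes are present and numeric on every event; how NRQL treats events where one attribute is missing is not covered here, and attributeA and attributeB are only the placeholder names used above.

```python
# Hypothetical sample events; attributeA and attributeB are placeholder names.
events = [
    {"attributeA": 3, "attributeB": 4},
    {"attributeA": 10, "attributeB": 0},
    {"attributeA": 7, "attributeB": 5},
]

# Shape of the supported expression: sum(attributeA + attributeB)
sum_of_row_totals = sum(e["attributeA"] + e["attributeB"] for e in events)

# Shape of the unsupported expression: sum(attributeA) + sum(attributeB)
total_of_column_sums = (
    sum(e["attributeA"] for e in events) + sum(e["attributeB"] for e in events)
)

# Both shapes produce the same total when every event carries both attributes.
print(sum_of_row_totals, total_of_column_sums)
```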
Keep these concepts in mind when defining SLIs and SLOs.
Start by thinking of the highest-level key user experiences your team owns, then focus on underlying key user experiences until more granularity doesn't provide value. When choosing which service levels to start with, we recommend a top-down approach: start with the least granular ones, and create more granular ones only if necessary.
First of all, identify a "system boundary." This is a part of your system your users perceive as a "black box" of functionality. Some examples:
- In the case of an API, it might simply be a service.
- For a data pipeline, it might be a chain of services necessary to process data end-to-end.
Once you have established these top-level service levels, you might find that not all the endpoints of your service behave in the same way, and might want to split it further. For example:
- Login transactions might need a higher SLO target for errors than browsing transactions.
- The duration of some operations is much higher than that of the rest.
For example, at a high level, a key user experience at New Relic could be: a customer sends us telemetry data and that data is later available to be queried in our product API or UI.
For that user experience, we could create an SLO like:
| Time window | Target | Category | SLI |
|---|---|---|---|
| last 28 days | 99.9% | latency | data ingested by a user is available to query in less than 1 minute |
Note that these kinds of user experiences typically involve more than one service and span multiple team and organization boundaries.
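To get a feel for what a 99.9% target leaves as error budget, the sketch below computes how many bad requests a request-based SLO could absorb in one compliance period. The function name and the request volume are hypothetical; the target is passed as a string so that Decimal arithmetic avoids floating-point rounding at the boundary.

```python
from decimal import Decimal


def allowed_bad_requests(target_percent: str, valid_requests: int) -> int:
    """Largest number of bad requests that still meets the target.

    Illustrative helper, not part of any New Relic API.
    """
    budget_fraction = (Decimal(100) - Decimal(target_percent)) / Decimal(100)
    # int() truncates, so a fractional request is never counted as budget.
    return int(valid_requests * budget_fraction)


# Assumed volume: one million valid requests in the 28-day window.
budget = allowed_bad_requests("99.9", 1_000_000)
print(budget)
```

With the assumed volume, one bad request per thousand valid requests still satisfies the target; anything beyond that exhausts the budget for the period.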
Increasing the granularity of underlying user experiences, another key user experience at New Relic could be: a customer can use a custom dashboard to visualize their telemetry data.
This SLO could look like:
| Time window | Target | Category | SLI |
|---|---|---|---|
| last 28 days | 99.9% | availability | user interacts successfully with the dashboard UI |
As an example of taking the granularity too far, adding a chart widget in a dashboard is also a user experience. However, creating a specific SLO for this action doesn't provide additional value compared to the previous SLO about users successfully interacting with the dashboard UI.
In summary, use a top-down approach and start with the least granular service levels. Create more granular service levels only if necessary.
In the New Relic ecosystem, every service level is linked to another entity, which is any element in your stack that reports data to us, or that generates data that we have access to. The entity that a service level is related to determines where the SLI/SLO results show.
You can define SLIs on any NRDB event or dimensional metric that is reported to New Relic. Most custom events are not related to a single New Relic entity, but provide higher level business and user experience insights. In this case, you can still relate the SLI to a specific entity or to a workload.
Keep in mind that SLI queries must be scoped to the same account as the related entity.
SLIs are defined as the percentage of good responses out of the total number of valid requests. Most often you’ll set up your SLIs by defining the valid and good pieces:
- A valid request is any request that you want to count as meaningful for your SLIs (for example, all transactions related to an endpoint that weren’t initiated by a health check).
- A good response is any response that you consider to provide a good output for the end-user or client service (for example, the service responded in less than 2 seconds, providing a good navigation experience for the end user).
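As an illustration of the two definitions above, this Python sketch counts valid requests and good responses in a small, made-up batch. The Request type, its field names, and the sample data are hypothetical; the health-check exclusion and the 2-second threshold come from the examples in the list.

```python
from dataclasses import dataclass


@dataclass
class Request:
    path: str
    duration_s: float
    is_health_check: bool


def count_valid_and_good(requests, max_duration_s=2.0):
    """Return (valid, good) counts for a batch of requests."""
    # Valid: every request that was not initiated by a health check.
    valid = [r for r in requests if not r.is_health_check]
    # Good: valid requests answered in less than the duration threshold.
    good = [r for r in valid if r.duration_s < max_duration_s]
    return len(valid), len(good)


sample = [
    Request("/checkout", 0.4, False),
    Request("/checkout", 2.7, False),
    Request("/health", 0.01, True),
    Request("/search", 1.9, False),
]
valid_count, good_count = count_valid_and_good(sample)
print(valid_count, good_count)  # 3 valid requests, 2 of them good
```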
Alternatively, you can define what you consider to be the bad responses instead:
- A bad response is any response that you consider to provide a bad output (for example, the service responded with a server error, causing the client to fail its flow). New Relic will automatically derive the count of good responses as valid - bad.
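The relationship between the two ways of defining an SLI can be sketched as a small Python function. It accepts either a good count or a bad count and, in the second case, derives good as valid - bad. The function name and the sample counts are hypothetical, and rejecting a period with no valid requests is an assumption of this sketch rather than documented product behavior.

```python
from typing import Optional


def sli_percentage(
    valid: int, good: Optional[int] = None, bad: Optional[int] = None
) -> float:
    """Percentage of good responses out of valid requests.

    Pass exactly one of good or bad.
    """
    if (good is None) == (bad is None):
        raise ValueError("provide exactly one of good or bad")
    if valid <= 0:
        raise ValueError("valid must be positive")
    if good is None:
        good = valid - bad  # good responses derived from the bad count
    return 100.0 * good / valid


# The same SLI expressed both ways (made-up counts).
from_bad = sli_percentage(2000, bad=5)
from_good = sli_percentage(2000, good=1995)
print(from_bad, from_good)  # 99.75 99.75
```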
Request-based SLOs are based on an SLI defined as the ratio of the number of good requests to the total number of requests. A request-based SLO is met when that ratio meets or exceeds the target for the compliance period.
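A minimal sketch of that compliance check, using hypothetical names and made-up counts. Treating a period without valid requests as an error is an assumption made for this example only.

```python
def slo_is_met(good: int, valid: int, target_percent: float) -> bool:
    """True when the good/valid ratio meets or exceeds the target."""
    if valid <= 0:
        # Assumption for this sketch: a period without traffic is rejected.
        raise ValueError("no valid requests in the compliance period")
    # Cross-multiplied form of: 100 * good / valid >= target_percent
    return good * 100 >= target_percent * valid


# Made-up counts for one compliance period.
print(slo_is_met(good=999_500, valid=1_000_000, target_percent=99.9))  # True
print(slo_is_met(good=998_000, valid=1_000_000, target_percent=99.9))  # False
```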
In this section you’ll find some SLIs that are typically used to measure the performance of services and browser applications.
Based on Transaction events, these SLIs are the most common for request-driven services:
Based on OpenTelemetry spans, these SLIs are the most common for request-driven services:
The following SLIs are based on Google's Browser Core Web Vitals.
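As a rough illustration of how a Core Web Vitals SLI can be read, the sketch below computes the share of page loads that fall within Google's published "good" thresholds (LCP of 2.5 seconds or less, INP of 200 milliseconds or less, CLS of 0.1 or less). The dictionary keys, function name, and sample values are hypothetical, and the thresholds that New Relic's suggested SLIs apply may differ.

```python
# Google's published "good" thresholds for Core Web Vitals.
GOOD_THRESHOLDS = {
    "lcp_s": 2.5,   # Largest Contentful Paint, seconds
    "inp_ms": 200,  # Interaction to Next Paint, milliseconds
    "cls": 0.1,     # Cumulative Layout Shift, unitless
}


def good_share(samples, metric):
    """Percentage of samples at or below the "good" threshold for a metric."""
    limit = GOOD_THRESHOLDS[metric]
    good = sum(1 for value in samples if value <= limit)
    return 100.0 * good / len(samples)


# Made-up LCP measurements, in seconds, from four page loads.
lcp_samples = [1.2, 2.4, 3.1, 0.9]
lcp_share = good_share(lcp_samples, "lcp_s")
print(lcp_share)  # 75.0
```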
You can create SLIs and SLOs from several places in our UI:
- Go to one.newrelic.com > All capabilities > Service levels. You can associate the SLI with any entity across your accounts, including workloads.
- From the Service levels page in any service, key transaction, browser application, or synthetic monitor. The SLI will be associated with that specific entity. If you use this starting point, New Relic will automatically create the most common service level indicators for this entity type, based on the latest available data.
- From the Service levels tab in any workload. You can associate the SLI with any entity in the workload, or the whole workload.
Data doesn't appear right away after you create an SLI. Expect a delay of a few minutes before you see the first SLI attainment results. The data has 13-month retention by default.
Remember that service levels can only be associated with a single account. For details on that, see the requirements.
To create service levels, follow these steps:
Once you've created an SLI, you can edit it from the service levels list page by clicking the ... menu and then Edit, as shown here:
or you can do that same thing through the summary page, by clicking
For information on how to optimize your SLM implementation, see our Observability maturity SLM guide.