Outlier detection (NRQL alert)

Alerts offers NRQL conditions in three threshold types: static, baseline, and outlier. This document explains how the outlier threshold type works, gives some example use cases and NRQL queries, and explains how to create an outlier condition.

NRQL alerts do not affect Alerts policies for a Synthetic monitor. For example, muting a NRQL alert will not mute a Synthetic monitor's alerts.

What is outlier detection?

In software development and operations, it is common to have a group whose members you expect to behave approximately the same. For example, for servers behind a load balancer, traffic may go up or down, but the traffic across all the servers should stay in a fairly tight grouping.

The NRQL alert outlier detection feature parses the data returned by your faceted NRQL query and:

  • Looks for the number of expected groups that you specify
  • Looks for outliers (values deviating from a group) based on the sensitivity and time range you set

Additionally, for queries that have more than one group, you can choose to be notified when groups start behaving the same.

[Figure: New Relic Alerts, outlier NRQL alerts. The visual aid shows the types of situations that will trigger a violation and those that won't.]

For more on the rules and logic behind this calculation, see Outlier detection rules.

Note: this feature does not take into account the past behavior of the monitored values; it looks for outliers only in the currently reported data. For an alert type that takes into account past behavior, see Baseline alerting.

Example use cases

These use cases will help you understand when to use the outlier threshold type. Note that the outlier feature requires a NRQL query with a FACET clause.

Notify if load-balanced servers have uneven workload

A load balancer divides web traffic approximately evenly across five different servers. You can set a notification to be sent if any server starts getting significantly more or less traffic than the other servers.

Example query:

SELECT average(cpuPercent) FROM SystemSample WHERE apmApplicationNames = 'MY-APP-NAME' FACET hostname

Notify if load-balanced application has misbehaving instances

Application instances behind a load balancer should have similar throughput, error rates, and response times. If an instance is in a bad state, or a load balancer is misconfigured, this will not be the case. Detecting one or two bad app instances using aggregate metrics may be difficult if there is not a significant rise in the overall error rate of the application.

You can set a notification for when an app instance’s throughput, error rate, or response time deviates too far from the rest of the group.

Example query:

SELECT average(duration) FROM Transaction WHERE appName = 'MY-APP-NAME' FACET host

Notify of changes in different environments

An application is deployed in two different environments, with ten application instances in each. One environment is experimental and gets more errors than the other. But the instances that are in the same environment should get approximately the same number of errors.

You can set a notification for when an instance starts getting more errors than the other instances in the same environment. Also, you can set a notification for when the two environments start to have the same number of errors as each other.

Notify of timezone-related changes

The number of logged-in users for a company is about the same for each of four applications, but varies significantly across the three time zones the company operates in.

You can set a notification for when any application starts getting more or less traffic from a certain time zone than the other applications. Sometimes the traffic from the different time zones is the same, so you would set up the alert condition so that you are not notified when the time zone groups overlap.

For more details on how this feature works, see Outlier rules and logic.

Create an outlier alert condition

To create a NRQL alert that uses outlier detection:

  1. When creating a condition, under Select a product, select NRQL.
  2. For Threshold type, select Outlier.
  3. Create a NRQL query with a FACET clause that returns the values you want to alert on.
  4. Depending on how the returned values group together, set the Number of expected groups.
  5. Adjust the deviation from the center of the group(s) and the duration that will trigger a violation.
  6. Optional: Add a warning threshold and set its deviation.
  7. Set any remaining available options and save.

Rules and logic

Here are the rules and logic behind how outlier detection works:

Details about alert condition logic

After the condition is created, the query is run once every harvest cycle and the condition is applied. Unlike baseline alerts, outlier detection uses no historical data in its calculation; it's calculated using the currently collected data.

Alerts will attempt to divide the data returned from the query into the number of groups selected during condition creation.

For each group, the approximate average value is calculated. The allowable deviation you have chosen when creating the condition is centered around that average value. If a member of the group is outside the allowed deviation, it produces a violation.
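The grouping and deviation check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the algorithm Alerts actually runs: the source does not say how values are divided into groups, so the sketch splits the sorted values at the widest gaps and uses a plain arithmetic mean as the group's "approximate average". The names `split_into_groups`, `find_outliers`, and the sample host values are hypothetical.

```python
from statistics import mean


def split_into_groups(facet_values, expected_groups):
    """Assumed grouping rule: sort by value, cut at the widest gaps."""
    ordered = sorted(facet_values.items(), key=lambda item: item[1])
    # Each gap is stored with the index where a cut would be made.
    gaps = [
        (ordered[i + 1][1] - ordered[i][1], i + 1)
        for i in range(len(ordered) - 1)
    ]
    widest = sorted(gaps, reverse=True)[: max(expected_groups - 1, 0)]
    cut_points = sorted(index for _, index in widest)
    groups, start = [], 0
    for cut in cut_points + [len(ordered)]:
        groups.append(dict(ordered[start:cut]))
        start = cut
    return groups


def find_outliers(facet_values, expected_groups, deviation):
    """Return facet names farther than `deviation` from their group's mean."""
    outliers = []
    for group in split_into_groups(facet_values, expected_groups):
        if not group:
            continue  # nothing was returned for this group
        center = mean(group.values())
        outliers += [
            name for name, value in group.items()
            if abs(value - center) > deviation
        ]
    return sorted(outliers)


# Hypothetical CPU percentages for five load-balanced hosts.
hosts = {"host-a": 40.0, "host-b": 42.0, "host-c": 41.0,
         "host-d": 43.0, "host-e": 55.0}
print(find_outliers(hosts, expected_groups=1, deviation=5))
```

Note that in this sketch a far-off value pulls the mean toward itself, so a very tight deviation could also flag well-behaved members; a production implementation might use a more robust center.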

If Trigger when groups overlap has been selected, Alerts detects a convergence of groups. If the condition is looking for two or more groups, and the returned values cannot be separated into that number of distinct groups, then that will produce a violation. This type of “overlap” event is represented on a chart by group bands touching.
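One way to picture the overlap check is to treat each group as a band that extends the allowed deviation above and below its center, and to report an overlap when two neighboring bands touch. The sketch below assumes exactly that band test; the source only says the values "cannot be separated" into distinct groups, so the rule, the function name `bands_overlap`, and the sample numbers are illustrative assumptions.

```python
def bands_overlap(group_centers, deviation):
    """Return True if any two neighboring group bands touch or overlap.

    Each band is assumed to span [center - deviation, center + deviation].
    """
    ordered = sorted(group_centers)
    return any(
        upper - deviation <= lower + deviation
        for lower, upper in zip(ordered, ordered[1:])
    )


# Two environments with clearly different error counts: no overlap.
print(bands_overlap([10, 50], deviation=5))
# The experimental environment drifts toward the stable one: bands touch.
print(bands_overlap([10, 18], deviation=5))
```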

Because this feature does not take past behavior into account, data is never considered to "belong" to a certain group. For example, a value that switches places with another value wouldn't trigger a violation. Additionally, an entire group that moves together also wouldn't trigger a violation.
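The stateless behavior can be illustrated with a check that receives only the current snapshot of values. In the sketch below, which assumes two expected groups and a simple "split at the widest gap" rule (an assumption, not the documented algorithm), swapping two members or shifting a whole group leaves the result unchanged. The function name and sample values are hypothetical.

```python
from statistics import mean


def outliers_in_snapshot(snapshot, deviation):
    """Flag facets far from both group centers, using only current values."""
    ordered = sorted(snapshot.values())
    # Assumed grouping rule for two groups: cut at the widest gap.
    cut = max(range(1, len(ordered)), key=lambda i: ordered[i] - ordered[i - 1])
    centers = (mean(ordered[:cut]), mean(ordered[cut:]))
    return sorted(
        name for name, value in snapshot.items()
        if min(abs(value - center) for center in centers) > deviation
    )


before = {"a": 10, "b": 12, "c": 50, "d": 52}
swapped = {"a": 50, "b": 12, "c": 10, "d": 52}   # a and c switch places
moved = {"a": 10, "b": 12, "c": 80, "d": 82}     # the upper group moves together
stray = {"a": 10, "b": 12, "c": 50, "d": 52, "e": 60}

for snapshot in (before, swapped, moved, stray):
    print(outliers_in_snapshot(snapshot, deviation=5))
```

Only the last snapshot contains a member that sits outside its group's band; the first three produce no outliers because no history of group membership is kept.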

NRQL query rules and limits

The NRQL query must be a faceted query, and can only facet on one attribute. Queries that facet on more than one attribute won't work.

The number of unique values returned must be 500 or fewer. If the query returns more than this number of values, the condition won't be created. If the query starts returning more than this number after the condition has been created, the alert will fail.
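Before creating a condition, you could check a query and its results against these two limits yourself. The sketch below is a hypothetical local pre-check, not part of any New Relic API: `facet_attributes` uses a deliberately naive regular expression to read the FACET clause (it does not handle every NRQL form), and `check_outlier_query` takes the facet values as a plain dictionary that you would have to fetch separately.

```python
import re

MAX_FACET_VALUES = 500  # limit stated in this document

# Naive pattern: capture everything after FACET up to a following clause.
_FACET_CLAUSE = re.compile(
    r"\bFACET\s+(.+?)(?:\s+(?:LIMIT|SINCE|UNTIL|TIMESERIES|COMPARE)\b|$)",
    re.IGNORECASE,
)


def facet_attributes(query):
    """Return the comma-separated attributes of the FACET clause, if any."""
    match = _FACET_CLAUSE.search(query.strip())
    if not match:
        return []
    return [part.strip() for part in match.group(1).split(",") if part.strip()]


def check_outlier_query(query, facet_values):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    attributes = facet_attributes(query)
    if len(attributes) != 1:
        problems.append(
            f"expected exactly one FACET attribute, found {len(attributes)}"
        )
    if len(facet_values) > MAX_FACET_VALUES:
        problems.append(
            f"query returned {len(facet_values)} unique values, "
            f"limit is {MAX_FACET_VALUES}"
        )
    return problems


query = ("SELECT average(duration) FROM Transaction "
         "WHERE appName = 'MY-APP-NAME' FACET host")
print(check_outlier_query(query, {"host-1": 0.12, "host-2": 0.15}))
```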

Zero values for unreturned data

When a query returns a set of values, only values that are actually returned are taken into account. If a value is not available for calculation (including if it goes from being collected one harvest cycle to not being collected), it is rendered as a zero and is not considered. In other words, the behavior of unreturned zero values will never trigger violations.
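The difference between how a missing value is displayed and how it is evaluated can be sketched as follows. This assumes a harvest result is represented as a dictionary in which a facet that reported nothing holds `None`; that representation and both function names are hypothetical.

```python
def chart_points(harvest_result):
    """Values as they would be rendered: missing data is shown as zero."""
    return {
        name: 0.0 if value is None else value
        for name, value in harvest_result.items()
    }


def values_for_evaluation(harvest_result):
    """Values the outlier check considers: missing data is dropped."""
    return {
        name: value
        for name, value in harvest_result.items()
        if value is not None
    }


# host-c reported in the previous cycle but returned nothing in this one.
current_cycle = {"host-a": 40.0, "host-b": 42.0, "host-c": None}
print(chart_points(current_cycle))
print(values_for_evaluation(current_cycle))
```

Because `host-c` is excluded rather than evaluated as 0, it cannot be flagged as an outlier and does not pull the group average down.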
