You can use NRQL queries to create alert conditions. Once you've defined your signal, you can define warning and critical threshold levels. These thresholds determine when an alert violation is created.
Read on to learn more about how to do this.
For more information on key concepts relating to NRQL alert conditions and streaming alerts, see Streaming alerts: key terms and concepts.
Create a NRQL alert condition
To create a NRQL alert condition for a policy:
- On one.newrelic.com, in the header click Alerts & AI, then in the left sidebar click Policies.
- Select an existing policy or click New alert policy to create a new policy.
- Click Add a condition.
- Under Select a product click NRQL, and then click Next, define thresholds.
NRQL alert syntax
Here's the basic syntax for creating all NRQL alert conditions. The FACET clause is required for Outlier threshold types, optional for Static, and not allowed for Baseline.
SELECT function(attribute) FROM Event WHERE attribute [comparison] [AND|OR ...]
Supported functions that return numbers include:
If you use the
If you see this error, use
Only one data type can be targeted.
Supported data types:
Required for outlier conditions, but not baseline or static
Include an optional
If the query returns more than the maximum number of values, the alert condition can't be created. If you create the condition and the query later returns more than this number, the alert will fail. Modify your query so that it returns fewer values.
NRQL alert threshold examples
- Alert on specific segments of your data
Create constrained alerts that target a specific segment of your data, such as a few key customers or a range of data. Use the WHERE clause to define those conditions.
SELECT average(duration) FROM Transaction WHERE account_id in (91290, 102021, 20230)
SELECT percentile(duration, 95) FROM Transaction WHERE name LIKE 'Controller/checkout/%'
- Alert on Nth percentile of your data
Create alerts when an Nth percentile of your data hits a specified threshold; for example, maintaining SLA service levels. Since we evaluate the NRQL query in one-minute time windows, percentiles will be calculated for each minute separately.
SELECT percentile(duration, 95) FROM Transaction
SELECT percentile(databaseDuration, 75) FROM Transaction
- Alert on max, min, avg of your data
Create alerts when your data hits a certain maximum, minimum, or average; for example, ensuring that a duration or response time does not pass a certain threshold.
SELECT max(duration) FROM Transaction
SELECT average(duration) FROM Transaction
- Alert on a percentage of your data
Create alerts when a proportion of your data goes above or below a certain threshold.
SELECT percentage(count(*), WHERE duration > 2) FROM Transaction
SELECT percentage(count(*), WHERE httpResponseCode = '500') FROM Transaction
- Alert on Apdex with any T-value
Create alerts on Apdex, applying your own T-value for certain transactions. For example, get an alert notification when your Apdex for a T-value of 500ms on transactions for production apps goes below 0.8.
SELECT apdex(duration, t:0.5) FROM Transaction WHERE appName like '%prod%'
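To make the T-value concrete, here is a minimal Python sketch of the standard Apdex formula: responses at or below T count as satisfied, responses between T and 4T count as tolerating (half weight), and the rest are frustrated. The function name and sample durations are hypothetical, and the sketch does not model any extra handling (such as errored transactions) that the NRQL apdex function may apply.

```python
def apdex(durations, t):
    """Standard Apdex score for response times in seconds and a T-value t."""
    if not durations:
        return None  # no samples in the window, so no score
    satisfied = sum(1 for d in durations if d <= t)
    tolerating = sum(1 for d in durations if t < d <= 4 * t)
    # Frustrated samples (slower than 4T) contribute nothing to the score.
    return (satisfied + tolerating / 2) / len(durations)


# Hypothetical one-minute window of transaction durations, T = 0.5 s.
durations = [0.2, 0.4, 0.9, 1.5, 2.5]
score = apdex(durations, t=0.5)
below_threshold = score < 0.8  # the example condition alerts below 0.8
```

With these sample values, two responses are satisfied, two are tolerating, and one is frustrated, so the score falls below the 0.8 threshold used in the example.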
NRQL condition creation tips
Here are some tips for creating and using a NRQL condition:
|Condition threshold types||
NRQL condition threshold types include static, baseline, and outlier.
|Create a description||
For NRQL conditions, you can create a custom description to add to each violation. Descriptions can be enhanced with variable substitution based on metadata in the specific violation.
For details, see Description
|Query results||Queries must return a number. The condition evaluates the returned number against the thresholds you've set.|
As with all alert conditions, NRQL conditions evaluate one single minute at a time. The implicit
Also, if a query will generate intermittent data, consider using the
|Lost signal threshold (loss of signal detection)||
You can use loss of signal detection to alert on when your data (a telemetry signal) should be considered lost. A signal loss can indicate that a service or entity is no longer online or that a periodic job failed to run. You can also use this to make sure that violations for sporadic data, such as error counts, are closed when no signal is coming in.
To learn more about signal loss and how to request access to it, see this announcement.
|Advanced signal settings||
Use the Condition settings to:
|Limits on conditions||See the maximum values.|
|Health status||NRQL alert conditions don't affect an entity's health status display.|
For more information, see:
Alert threshold types
When you create a NRQL alert, you can choose from different types of thresholds:
|NRQL alert threshold types||Description|
|Static||This is the simplest type of NRQL threshold. It allows you to create a condition based on a NRQL query that returns a numeric value.
Optional: Include a FACET clause.|
|Baseline||Uses a self-adjusting condition based on the past behavior of the monitored values. Uses the same NRQL query form as the static type, except you can't use a FACET clause.|
|Outlier||Looks for group behavior and values that are outliers from those groups. Uses the same NRQL query form as the static type, but requires a FACET clause.|
Sum of query results (limited or intermittent data)
Available only for static (basic) threshold types.
If a query returns intermittent or limited data, it may be difficult to set a meaningful threshold. Missing or limited data will sometimes generate false positives or false negatives. You can use loss of signal, aggregation duration, and gap filling settings to minimize these false notifications.
To avoid this problem when using the static threshold type, you can set the selector to sum of query results. This lets you set the alert on an aggregated sum instead of a value from a single harvest cycle. Up to two hours of one-minute data checks can be aggregated. The duration you select determines the width of the rolling sum and the preview chart will update accordingly.
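The rolling sum described above can be sketched in Python as follows. This is only an illustration: the function name and sample data are hypothetical, and treating a minute with no data as zero is an assumption of the sketch, not a statement about how the alerts pipeline handles gaps.

```python
from collections import deque


def rolling_sums(minute_values, window_minutes):
    """Return the sum of the last `window_minutes` one-minute values at each step."""
    window = deque(maxlen=window_minutes)  # oldest minute drops off automatically
    sums = []
    for value in minute_values:
        # Assumption: a minute with no data contributes 0 to the sum.
        window.append(0 if value is None else value)
        sums.append(sum(window))
    return sums


# Hypothetical sporadic error counts, one entry per minute (None = no data).
errors_per_minute = [0, 3, None, 0, 4, 0]
sums = rolling_sums(errors_per_minute, window_minutes=3)
breaches = [s > 3 for s in sums]  # static threshold of 3 on the rolling sum
```

A wider window smooths sporadic data more, at the cost of reacting more slowly to a change.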
Set the loss of signal threshold
Loss of signal occurs when no data matches the NRQL condition over a specific period of time. You can set your loss of signal threshold duration and also what happens when the threshold is crossed.
You may also manage these settings using the GraphQL API (recommended), or the REST API. Go here for specific GraphQL API examples.
Loss of signal settings:
Loss of signal settings include a time duration and two possible actions.
- Signal loss expiration time
- UI label: Signal is lost after:
- GraphQL Node: expiration.expirationDuration
- Expiration duration is a timer that starts and resets when we receive a data point in the streaming alerts pipeline. If we don't receive another data point before your 'expiration time' expires, we consider that signal to be lost. This can be because no data is being sent to New Relic or the WHERE clause of your NRQL query is filtering that data out before it is streamed to the alerts pipeline.
- The loss of signal expiration time is independent of the threshold duration and triggers as soon as the timer expires.
- The maximum expiration duration is 48 hours. This is helpful when monitoring for the execution of infrequent jobs. The minimum is 30 seconds, but we recommend using at least 3-5 minutes.
- Loss of signal actions
Once a signal is considered lost, you can close open violations, open new violations, or both.
- Close all current open violations: This closes all open violations that are related to a specific signal. It won't necessarily close all violations for a condition. If you're alerting on an ephemeral service, or on a sporadic signal, you'll want to choose this action to ensure that violations are closed properly. The GraphQL node name for this is "closeViolationsOnExpiration".
- Open new violations: This will open a new violation when the signal is considered lost. These violations will indicate that they are due to a loss of signal. Based on your incident preferences, this should trigger a notification. The GraphQL node name for this is "openViolationOnExpiration".
- When you enable both actions, we'll close all open violations first, and then open a new violation for loss of signal.
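The timer and actions described in this list can be modeled with a short Python sketch. It is a simplified illustration, not the pipeline's implementation: the function names are hypothetical, timestamps are plain seconds, and the action strings only stand in for what the alerts service does.

```python
def signal_is_lost(last_data_point_at, now, expiration_seconds):
    """The timer restarts at each data point; the signal is lost once it runs out."""
    return now - last_data_point_at >= expiration_seconds


def actions_on_expiration(close_violations, open_violation):
    """Return the enabled loss-of-signal actions in the order they are applied."""
    actions = []
    if close_violations:  # corresponds to closeViolationsOnExpiration
        actions.append("close open violations for this signal")
    if open_violation:  # corresponds to openViolationOnExpiration
        actions.append("open loss-of-signal violation")
    return actions


# Hypothetical signal with a 180-second expiration duration.
lost_without_data = signal_is_lost(last_data_point_at=0, now=180, expiration_seconds=180)
lost_after_reset = signal_is_lost(last_data_point_at=100, now=180, expiration_seconds=180)
both_actions = actions_on_expiration(close_violations=True, open_violation=True)
```

In the second call a data point arrived at second 100, so the timer restarted and the signal is not yet considered lost at second 180.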
To create a NRQL alert configured with loss of signal detection in the UI:
- For a policy, when you create a condition, under Select a product, click NRQL, then click Next, define thresholds.
- Write a NRQL query that returns the values you want to alert on.
- For Threshold type, select Static or Baseline.
- Click + Add lost signal threshold, then set the signal expiration duration time in minutes or seconds in the Signal is lost after field.
- Choose what you want to happen when the signal is lost. You can check one or both of Close all current open violations and Open new "lost signal" violation. These control how loss of signal violations will be handled for the condition.
- Make sure you name your condition before you save it.
Loss of signal detection doesn't work on NRQL queries that use nested aggregation or sub-queries.
For data that consistently takes longer to arrive, you can use offset evaluation to consistently delay the NRQL condition evaluation. Waiting longer increases accuracy, but also increases latency.
The total supported latency is the aggregation window duration multiplied by the evaluation offset. In the screenshot example, the latency is 15 minutes (a 5-minute aggregation window × 3 windows).
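As a worked version of that arithmetic, here is a small Python sketch. The function name is hypothetical; it only restates the product described above.

```python
def supported_latency_minutes(aggregation_window_minutes, evaluation_offset):
    """Total supported latency = aggregation window duration x evaluation offset."""
    return aggregation_window_minutes * evaluation_offset


# A 5-minute aggregation window with an evaluation offset of 3 windows.
latency = supported_latency_minutes(5, 3)
# 1-minute aggregation windows with an evaluation offset of 3.
short_latency = supported_latency_minutes(1, 3)
```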
If the event type data comes from an APM language agent and is aggregated from many app instances (for example, TransactionErrors, etc.), we recommend using an evaluation offset of 3 with 1-minute aggregation windows.
When creating NRQL conditions for data collected from Infrastructure Cloud Integrations such as AWS CloudWatch or Azure, we recommend that you start with an evaluation offset of 15 minutes, then adjust up or down depending on how long it takes to collect your data.
Fill data gaps
Gap filling lets you customize the values to use when your signals don't have any data. You can fill gaps in your data streams with None, the last value received, or a static value.
How to edit data gap values:
- In the NRQL conditions UI, go to Condition settings > Advanced signal settings > fill data gaps with, and then choose None, Last known value, or Custom static value.
- In the NerdGraph API (preferred), you'll find this node located at:
actor : account : alerts : nrqlCondition : signal : fillOption | fillValue
- In the REST API Explorer, you'll see this under the "signal" section of the Alert NRQL conditions API.
Gap filling options:
- None: (Default) Choose this if you don't want to take any action on empty aggregation windows. On evaluation, an empty aggregation window will reset the threshold duration timer. For example, if a condition says that all aggregation windows must have data points above the threshold for 5 minutes, and 1 of the 5 aggregation windows is empty, then the condition won't be in violation.
- Custom static value: Choose this if you'd like to insert a custom static value into the empty aggregation windows before they're evaluated. This option has an additional, required parameter of fillValue (as named in the API) that specifies what static value should be used. This defaults to
- Last known value: This option inserts the last seen value before evaluation occurs. We maintain the state of the last seen value for 2 hours.
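The three gap filling options can be compared with a minimal Python sketch. The option strings "NONE", "STATIC", and "LAST_VALUE" and both function names are assumptions made for illustration, and the sketch does not model the 2-hour retention of the last seen value.

```python
def fill_gaps(windows, fill_option="NONE", fill_value=None):
    """Fill empty aggregation windows (None) before they are evaluated."""
    filled = []
    last_known = None
    for value in windows:
        if value is not None:
            last_known = value
            filled.append(value)
        elif fill_option == "STATIC":
            filled.append(fill_value)  # custom static value
        elif fill_option == "LAST_VALUE":
            filled.append(last_known)  # stays None if nothing was seen yet
        else:
            filled.append(None)  # "NONE": leave the window empty
    return filled


def violates_for_all(windows, threshold):
    """True only if every window has a data point above the threshold."""
    return all(v is not None and v > threshold for v in windows)


# Five one-minute windows, one of them empty, with a threshold of 10.
windows = [12, 11, None, 13, 12]
with_none = violates_for_all(fill_gaps(windows, "NONE"), 10)
with_last = violates_for_all(fill_gaps(windows, "LAST_VALUE"), 10)
with_static = violates_for_all(fill_gaps(windows, "STATIC", fill_value=0), 10)
```

With no filling, the empty window prevents a violation. Filling with the last known value keeps the run above the threshold intact, and filling with a static 0 breaks it.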