An alert condition is the core element that defines when an incident is created. It's the essential starting point for building any meaningful alert. Alert conditions contain the parameters or thresholds that must be met before you're notified. They can mitigate excessive alerting or tell your team when new or unusual behavior appears.
Create a new alert condition
An alert condition is a continuously running query that measures a given set of events against a defined threshold and opens an incident when the threshold is met for a specified window of time.
This example demonstrates manually creating a new alert condition from the Alert condition details page, but there are several other ways to create one.
You can use a NRQL query to define the signals you want an alert condition to use as the foundation for your alert. For this example, you will be using this query:
SELECT average(duration)
FROM PageView
WHERE appName = 'WebPortal'
Using this query for your alert condition tells New Relic you want to know the average latency, or duration, to load pages within your WebPortal application. Proactive alerting on latency, a core golden signal, provides early warnings of potential degradation.
To learn more about how to use NRQL, New Relic's query language, see our NRQL documentation.
Fine-tune advanced signal settings
After you've defined your signal, click Run. A chart will appear and display the parameters that you've set.
For this example, the chart will show the average latency for your WebPortal application. Click Next and begin configuring your alert condition.
For this example, you will customize these advanced signal settings for the condition you created to monitor latency in your WebPortal application.
The window duration defines how New Relic groups your data for analysis in an alert condition. Choosing the right setting depends on your data's frequency and your desired level of detail:
High-frequency data (for example, pageviews every minute): Set the window duration to match the data frequency (1 minute in this case) for real-time insights into fluctuations and trends.
Low-frequency data (for example, hourly signals): Choose a window duration that captures enough data to reveal patterns and anomalies (for example, 60 minutes for hourly signals).
Remember, you can customize the window duration based on your needs and experience. We recommend starting with the defaults and experimenting as you become more comfortable creating alert conditions.
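If you want to preview how a particular window duration will group your data before you commit to it, you can run the underlying query with a matching TIMESERIES bucket. This is a sketch using the example WebPortal query, assuming a 1-minute window:
SELECT average(duration)
FROM PageView
WHERE appName = 'WebPortal'
TIMESERIES 1 minute SINCE 1 hour ago
Each point on the resulting chart approximates one aggregation window, which makes it easier to judge whether 1 minute gives you enough data per window.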
Traditional aggregation methods can fall short when dealing with data that's sparsely populated or exhibits significant fluctuations between intervals. Here's how to use sliding window aggregation to analyze such data and trigger timely alerts effectively:
Smooth out the noise: Start by creating a large aggregation window. This window (for example, 5 minutes) acts as a buffer, smoothing out the inherent "noise" or variability in your data. This helps prevent spurious alerts triggered by isolated spikes or dips.
Avoid lag with a sliding window: While a large window helps in data analysis, if you wait for the entire interval to elapse before checking thresholds, you can experience significant delays in alert notifications. We recommend smaller sliding windows (for example, one minute). Imagine this sliding window as a moving frame scanning your data within the larger aggregation window. Each time the frame advances by its smaller interval, it calculates an aggregate value (for example, average).
Set your threshold duration: Now, you can define your alert threshold within the context of the smaller sliding window. This allows you to trigger alerts quickly when the aggregate value in the current frame deviates significantly from the desired range without sacrificing the smoothing effect of the larger window.
You can learn more about sliding window aggregation in this NRQL tutorial.
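To get a feel for how the two intervals interact, here's a sketch of the same WebPortal query using NRQL's SLIDE BY clause, assuming a 5-minute aggregation window that advances every minute:
SELECT average(duration)
FROM PageView
WHERE appName = 'WebPortal'
TIMESERIES 5 minutes SLIDE BY 1 minute
Each plotted value averages the last 5 minutes of data but is recalculated every minute, which is the smoothing-without-lag behavior described above.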
In general, we recommend the event flow streaming method, which is best for data that arrives frequently and steadily. There are specific cases where event timer might be the better choice, but for your first alert we recommend the default, event flow. To better understand which streaming method to choose, watch this brief video (approx. 5:31 minutes).
The delay feature in alert conditions safeguards against potential issues arising from inconsistent data collection. It acts as a buffer, allowing extra time for data to arrive and be processed before triggering an alert. This helps prevent false positives and ensures more accurate incident creation.
How it works:
The appropriate delay setting is determined by evaluating the consistency of your incoming data:
Consistent data: A lower delay setting is sufficient if data points consistently arrive with timestamps within a single minute.
Inconsistent data: If data points arrive with timestamps spanning multiple minutes in the past or future, a higher delay setting is necessary to accommodate the inconsistency.
Creating a buffer:
The selected delay setting introduces a waiting period before the alert condition assesses data against defined thresholds.
This buffer allows time for data discrepancies to settle, reducing the likelihood of misleading alerts.
You're creating an alert condition to notify your team of any latency issues with the WebPortal application. In this example, your application consistently sends New Relic data. There is a constant stream of signals being sent from your application to New Relic, and there is no expected gap in signal, so you won't need to select a gap-filling strategy.
Gap-filling strategies address scenarios where data collection might be intermittent or incomplete. They provide a method for substituting missing data points with estimated values, ensuring that alert conditions can still function effectively even with gaps in the data stream.
When to leave gap-filling off:
Consistent data flow: If your application consistently sends data to New Relic without expected gaps, as in the case of the WebPortal application, gap-filling is generally unnecessary. Leaving the gap-filling strategy set to none is often the most appropriate approach in such cases.
Key considerations:
Popular use case: A common use of gap filling is to insert a value of 0 for windows with no data received.
Anomaly thresholds: The gap-filling value is interpreted as the number of standard deviations from the last observed value when using anomaly thresholds. For example, filling gaps with a value of 0 would replicate the last value seen, effectively assuming no change.
Learn more about gap-filling strategies in our lost signal docs.
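Before you decide whether you need a gap-filling strategy at all, it can help to check your signal for empty aggregation windows. A sketch, assuming the same WebPortal data and a 1-minute window:
SELECT count(*)
FROM PageView
WHERE appName = 'WebPortal'
TIMESERIES 1 minute SINCE 1 hour ago
If every window returns a non-zero count, leaving gap filling set to none is usually safe; frequent empty windows suggest you should choose a strategy deliberately.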
Set thresholds for alert conditions
If an alert condition is a container, then thresholds are the rules each alert condition must follow. As data streams into your system, the alert condition watches for any breach of these rules. If the alert condition sees data from your system that meets all the criteria you've set, it creates an incident. An incident signals that something is off in your system and you should investigate.
Anomaly thresholds are ideal when you're more concerned about deviations from expected patterns than specific numerical values. They enable you to monitor for unusual activity without needing to set predefined limits. New Relic's anomaly detection dynamically analyzes your data over time, adapting thresholds to reflect evolving system behavior.
Setting up anomaly detection:
Choose upper or lower:
Select upper and lower to be alerted when values deviate above or below what's expected.
Select upper only to focus solely on unusually high values.
Assign priority:
Set the priority level to critical for your initial alert to ensure prompt attention to potential issues.
Define breach criteria:
Start with the default settings: open an incident when a query returns a value that deviates from the predicted value by three standard deviations for at least five minutes.
Customize these settings as needed to align with your specific application and alerting requirements.
Learn more about anomaly threshold and model behaviors in our anomaly documentation.
Unlike an anomaly threshold, a static threshold doesn't look at your data set as a whole or determine what's unusual based on your system's history. Instead, a static threshold opens an incident whenever your system behaves differently from the criteria you set.
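When choosing the value for a static threshold, it helps to look at the signal's recent distribution first. As a sketch, assuming the same WebPortal data, a percentile query shows what your typical worst-case latency looks like over the past week:
SELECT percentile(duration, 95)
FROM PageView
WHERE appName = 'WebPortal'
SINCE 1 week ago
A static threshold set somewhat above that value will catch genuine degradation without firing on routine variation.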
You need to set the priority level for both anomaly and static thresholds. See the section above for more details.
The lost signal threshold determines how long to wait before considering a missing signal lost. If the signal doesn't return within that time, you can choose to open a new incident or close any related ones. You can also choose to skip opening an incident when a signal is expected to terminate. Set the threshold based on your system's expected behavior and data collection frequency. For example, if a website experiences a complete loss of traffic, or throughput, the corresponding telemetry data sent to New Relic will also cease. Monitoring for this loss of signal can serve as an early warning system for such outages.
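For example, a throughput-style query like this (a sketch using the same WebPortal data) is a natural candidate for a lost signal threshold, because if traffic stops entirely, the signal itself disappears:
SELECT count(*)
FROM PageView
WHERE appName = 'WebPortal'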
Add alert condition details
At this point, you have a fully defined condition with all the rules needed to ensure an incident opens when you want it to. If your alert condition sees behavior in your system that breaches the thresholds you've set, it creates an incident. Now all you need to do is name the condition and attach it to a policy.
The policy is the sorting system for the incident. When you create a policy, you create the tool that organizes all your incoming incidents. You can connect policies to workflows that tell New Relic where you want all this incoming information to go, how often you want it to be sent, and where.
A best practice for condition naming involves a structured format that conveys essential information at a glance. Include the following elements in your condition names:
Priority: Indicate the severity or urgency of the alert, like P1, P2, P3.
Signal: Specify the metric or condition being monitored, like High Avg Latency or Low Throughput.
Entity: Identify the affected system, application, or component, like WebPortal App or Database Server.
An example of a well-formed condition name following this structure would be P2 | High Avg Latency | WebPortal App.
If you already have a policy you want to connect to an alert condition, then select the existing policy.
Balancing responsiveness and alert fatigue in your alerting strategy is crucial, especially when monitoring pageviews for your WebPortal application. Let's explore the policy options:
One issue per policy (default):
Pros: Reduces noise and ensures immediate action.
Cons: Groups all incidents under one issue, even if triggered by different conditions. It's not ideal for multiple pageview concerns.
One issue per condition:
Pros: Creates separate issues for each condition, ideal for isolating and addressing specific latency issues.
Cons: Can generate more alerts, potentially leading to fatigue.
An issue for every incident:
Pros: Provides the most granular detail, which can be useful for feeding external systems.
Cons: It's the noisiest option, it can overwhelm internal consumers, and it makes it challenging to track broader trends and prioritize effectively.
An incident automatically closes when the targeted signal returns to a non-breaching state for the period indicated in the condition's thresholds. This wait time is called the recovery period.
For example, if you're measuring latency and the breaching behavior is that duration to load pages in your WebPortal application has increased to more than 3 seconds, the incident will automatically close when duration is equal to or lower than 3 seconds for 5 consecutive minutes.
When an incident closes automatically:
The closing timestamp is backdated to the start of the recovery period.
The evaluation resets and restarts from when the previous incident ended.
All conditions have an incident time limit setting that automatically force-closes a long-lasting incident.
New Relic defaults this to 3 days, and we recommend using the default settings for your first alert.
Another way to close an open incident when the signal does not return data is by configuring a loss of signal threshold. Refer to the lost signal threshold section above for more details.
Since you're creating an alert condition that lets you know if there are any latency issues with your WebPortal application, you want to make sure your developers have all the information they need when notified about this incident. You will use workflows to notify a team Slack channel when an incident is created.
Learn more about custom incident descriptions in our docs.
Using the title template is optional but we recommend it. An alert condition defines a set of thresholds you want to monitor. If any of those thresholds are breached, an incident is created. Meaningful title templates help you pinpoint issues and resolve outages faster.
From there, you'll see the Alert condition details page. This page contains all the elements you set when you created your condition. You can edit specific aspects of the alert condition by clicking the pencil icon in the top right of each section.
Signal history
Under Signal history, you can see the most recent results for the NRQL query you used to create your alert condition. For this example, you would see the average latency on the WebPortal app for the specific time frame you've set.
For all alert conditions built with NRQL queries, the signal history is presented as a line chart.
Any alert condition built with a synthetic monitor will be a bit different. This is because synthetic monitors allow you to ping your application from multiple locations, which can produce positive or negative results each time the monitor runs. This data can only be presented with a table.
Types of conditions
The primary and recommended condition type is a NRQL alert condition, but there are other types of conditions. We've included a complete list of our condition types below.
Anomaly alerting allows you to create conditions that dynamically adjust to changing data and trends, such as weekly or seasonal patterns. This feature is available for APM and browser apps, as well as NRQL queries.
You can set thresholds that open an incident when they are breached by any of your Java app's instance metrics.
By scoping thresholds to specific instances, you can more quickly identify where potential problems are originating. This is useful, for example, to detect anomalies that are occurring only in a subset of your app's instances. These sorts of anomalies are easy to miss for apps that aggregate metrics across a large number of instances.
For Java apps monitored by APM, you can set thresholds that open an incident when the heap size or number of threads for a single JVM is out of the expected operating range.
We evaluate alerting threshold breaches individually for each of the app's selected instances. When creating your condition, select JVM health metric as the type of condition for your Java app's alert policy, then select any of the available thresholds:
Deadlocked threads
Heap memory usage
CPU utilization time
Garbage collection CPU time
Incidents automatically close when the inverse of the threshold is met, but you can also use the UI to change when an incident force-closes for a JVM health metric. The default is 24 hours.
We include the option to define a percentile as the threshold for your condition when your web app's response time is above, below, or equal to this value. This is useful, for example, when Operations personnel want to alert on a percentile for an app server's overall web transaction response time rather than the average web response time.
Select Web transactions percentiles as the type of condition for your app's condition, then select a single app. (To alert on more than one app, create an individual Web transactions percentiles condition for each.)
To define the thresholds that open the incident, type the Percentile nth response time value, then select its frequency (above, below, or equal to this value).
We store the transaction time in milliseconds, although the user interface shows the Critical and Warning values as seconds. If you want to define milliseconds, be sure to include the decimal point in your value.
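If you'd rather express a similar threshold as a NRQL condition, a percentile query gives you the signal to alert on. This is a sketch; the 95th percentile and the WebPortal app name are assumptions, not settings from the condition type described above:
SELECT percentile(duration, 95)
FROM Transaction
WHERE appName = 'WebPortal'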
By applying labels to applications, you can automatically link these entities to your condition. This makes it easy to manage all the applications within a dynamic environment. We recommend using the agent configuration file to best maintain entity labels.
A single label identifies all entities associated with that label (maximum 10,000 entities). Multiple labels only identify entities which share all the selected labels.
Using dynamic targeting with your condition also requires that you set an incident close timer.
To add, edit, or remove up to ten labels for a condition:
Select APM > Application metric as the product type.
When identifying entities, select the Labels tab. Search for a label by name, or select a label from the list of categories.
You can also create conditions directly within the context of what you're monitoring with Infrastructure.
For example, to get a notification when an infrastructure agent stops receiving data, use the host not reporting condition type. This allows you to dynamically alert on filtered groups of hosts and configure the time window from 5 to 60 minutes.
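A NRQL-based alternative, sketched here with an assumed hostname pattern, is to alert on loss of signal from the infrastructure agent's SystemSample events for a filtered group of hosts:
SELECT count(*)
FROM SystemSample
WHERE hostname LIKE 'web-%'
FACET hostname
Pairing this query with a lost signal threshold can behave much like the host not reporting condition type, while letting you reuse the NRQL workflow described earlier.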
From the list of alert conditions, click on the three dots icon of the alert you want to copy and select Duplicate condition.
From the Copy alert condition window, search or scroll the list to select the policy where you want to add this condition.
Optional: Change the condition's name if necessary.
Optional: Click the toggle switch to Enable on save.
Select Copy condition.
By default, the selected alert policy will add the copied condition in the Disabled state. Follow standard procedures to add or copy more conditions to the alert policy, and then Enable the condition as needed. Additionally, the new condition will not copy any tags added to the original condition.
Enable/disable a condition
To disable or re-enable a condition:
Go to one.newrelic.com > All capabilities > Alerts > Alert conditions, then find the condition in the list.
Click the On/Off switch to toggle it.
If you copy a condition, it's automatically saved in the new policy as disabled (Off), even if the condition was enabled (On) in the original policy.
Delete a condition
To turn a condition off but keep it with the policy, disable it. To delete one or more conditions: