Proactive Detection with New Relic AI

With New Relic AI's Proactive Detection, you are notified in Slack of unusual app behavior and receive an automatic analysis of that behavior. Or, you can set up a webhook to deliver messages when you need them. These events are automatically available in New Relic's database (NRDB) for dashboarding, alerting, and integration with Incident Intelligence.

Why it matters

With Proactive Detection, New Relic AI delivers insights about anomalies in your production system. When an anomaly is detected, a real-time failure warning is sent directly to your Slack channel, where your teammates can see it, and New Relic AI provides an automatic analysis of the anomaly.

How it works

New Relic Proactive Detection uses the following methods to detect anomalies in your app data:

  1. Proactive Detection monitors metric data reported by an APM agent, building a model of your typical application dynamics, and focuses on key golden signals: throughput, response time, and errors.

  2. If one of these golden signals shows anomalous behavior, the system flags it and tracks recovery to normal behavior.

  3. The system adapts to changes in your data, and continuously updates models based on new data.
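
The three steps above can be illustrated with a toy example. This is only a sketch of the general idea (learn a baseline, flag deviations, track recovery, keep learning), not New Relic's actual model, which is not described here. The class name, the rolling z-score technique, the window size, and the threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Toy detector (illustrative only): flags points far from a rolling baseline."""

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.window = deque(maxlen=window)  # recent observations treated as "typical"
        self.threshold = threshold          # z-score required to flag a point
        self.in_anomaly = False

    def observe(self, value: float) -> str:
        """Return 'anomaly', 'recovered', or 'normal' for one data point."""
        if len(self.window) < 5:
            # Warm-up: not enough data to build a baseline yet.
            self.window.append(value)
            return "normal"
        baseline = mean(self.window)
        spread = stdev(self.window) or 1e-9  # avoid dividing by zero on flat data
        if abs(value - baseline) / spread > self.threshold:
            self.in_anomaly = True
            return "anomaly"
        # Only non-anomalous points update the baseline, so the model
        # keeps adapting to new typical data.
        self.window.append(value)
        if self.in_anomaly:
            self.in_anomaly = False
            return "recovered"
        return "normal"
```

Feeding the detector a steady throughput series followed by a sudden drop yields "anomaly" for the drop and "recovered" for the first typical point afterward. A production system would also need to handle sustained level shifts, which this sketch does not.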

Receiving notifications: We send notifications when we detect anomalous changes in throughput or response time. The notifications are sent to selected Slack channels, or sent via webhook. When the anomaly goes back to normal, a recovery message is sent.

Anomaly analysis: For each anomaly, we provide a link in Slack to an anomaly analysis page. The same page is available from the anomaly overview page, which lists all recent anomalies. The analysis page uses your existing APM and Proactive Detection data to generate automatic insights and explain the likely cause of the anomaly.

Applications will not always generate anomalies, so it can be normal to not receive any detections.

Requirements

To use Proactive Detection, ensure you have:

  • Access to New Relic One, which requires a paid subscription or free trial
  • An APM agent installed on applications to monitor
  • For Slack configuration: the New Relic AI Slack application, installed in your Slack workspace by an IT administrator

For more details, see Data limits.

Set up Proactive Detection

You can configure these features in the Proactive Detection section of the New Relic AI dashboard:

Set up for Slack
  1. Go to one.newrelic.com > New Relic AI > Proactive Detection > Configure.
  2. Select Real-time failure warnings in Slack. Then click plus Add configuration.
  3. Input the following information into the form:
    • Choose a name for your configuration that helps you easily identify it from others in your account.
    • Select an account.
    • Select up to 200 applications. Note that certain applications with low throughput might not be good candidates for Proactive Detection, as they can be more sensitive to smaller amounts of data fluctuation.
    • Choose which Slack channels receive notifications (you can send them to an existing channel or create a new one). This prompts the workflow to add the New Relic AI Slack application to your selected channel.

      If you experience an error when assigning Slack channels, make sure that the New Relic AI Slack application has been added to your Slack workspace.

  4. Enable the configuration. You can modify the applications for each configuration at any time by selecting Edit configuration in the configuration table.

Set up for webhooks
  1. Go to one.newrelic.com > New Relic AI > Proactive Detection > Configure.
  2. Select Real-time failure warnings in Slack. Then click plus Add configuration.
  3. Input the following information into the form:
    • Choose a name for your configuration that helps you easily identify it from others in your account.
    • Select an account.
    • Select up to 200 applications. Note that certain applications with low throughput might not be good candidates for Proactive Detection, as they can be more sensitive to smaller amounts of data fluctuation.
    • Provide the webhook URL.
    • Provide optional custom headers.
    • Choose to edit the custom payload, or enable using the default payload.
  4. Enable the configuration. You can modify the applications for each configuration at any time by selecting Edit configuration in the configuration table.
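
One possible use of the optional custom headers is to carry a shared secret so that the receiving endpoint can reject requests that did not come from your configuration. The sketch below assumes a hypothetical header named X-Webhook-Token; the header name and the function are illustrative and not part of New Relic's API.

```python
import hmac


def is_authorized(headers: dict, expected_token: str,
                  header_name: str = "X-Webhook-Token") -> bool:
    """Check a shared-secret custom header (the header name is a hypothetical choice)."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    supplied = normalized.get(header_name.lower(), "")
    # compare_digest takes time independent of where the strings differ.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

For example, is_authorized({"x-webhook-token": "s3cret"}, "s3cret") returns True, while a missing or different token returns False.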

Muting notifications (Slack only)

In Slack, detections coming from specific applications can be muted temporarily or permanently. The entire channel can also be muted temporarily. This is useful in the case of an incident or when the channel should otherwise not be interrupted.

To mute in Slack, select Mute this app’s warnings or Mute all warnings, then select the duration you wish to mute for. We will resume sending notifications for any detections once the muting duration has completed.

Permanently muting an application removes it from the configuration. To add it back, go to New Relic AI > Proactive Detection, and select the configuration to edit.

Muting Proactive Detection notifications does not affect New Relic Alerts.

Using Proactive Detection Slack messages

Each anomaly message has several key pieces of information you can use to learn more about and start troubleshooting the potential issue:

  • The application name and a link to more information about it in New Relic One.
  • The metric experiencing an anomaly and a link to its details in New Relic One.
  • A graph of the metric over time to provide a visual understanding of the anomaly’s behavior and degree.
  • An Analyze button that navigates to an analysis page in New Relic AI that identifies key attributes that are unique to the anomaly, anomalies found upstream or downstream, and any other relevant signals.
  • A recovery notification, sent once the anomaly has returned to normal, with the option to provide feedback. Select Yes or No to tell us whether the detection was helpful. Your feedback gives our development team input to help us improve detection quality.

Anomaly overview page

In addition to real-time failure notifications that give you information about anomalies at your fingertips via Slack or webhook, Proactive Detection also includes a UI view with more information about the anomalies in your environment. This provides a list of all the recent anomalies from every configuration in the selected account.

Using anomaly events with NRDB

Once you configure Proactive Detection (with either Slack or a webhook), anomaly events are sent to New Relic’s database (NRDB). You can query NRDB for Proactive Detection events and use them in conjunction with other New Relic AI tools.

Send Proactive Detection events to Incident Intelligence

You can send Proactive Detection events to Incident Intelligence to be processed and correlated with other activity in your system:

  1. Create an alert condition for a NRQL query that pulls Proactive Detection events from NRDB.
  2. Configure a new Incident Intelligence source for your alert condition.
  3. (Optional) Create Decision logic to correlate future anomalies with related events.

Webhook payload and examples

Proactive Detection will send the event body in JSON format via HTTPS POST. The system expects the endpoint to return a successful HTTP code (2xx). If you use webhooks to configure Proactive Detection, use these examples of the webhook body format and JSON schema.
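
As a minimal sketch of a receiving endpoint, the following uses only the Python standard library to accept the POST, parse the JSON body, and answer with a 2xx code. The function and class names, the port, and the choice to answer 400 for malformed bodies are assumptions for illustration. The sketch serves plain HTTP, so it assumes TLS is terminated in front of it (for example, by a reverse proxy).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process_event(raw_body: bytes) -> int:
    """Return the HTTP status code to send back for one webhook delivery."""
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400
    entity = event.get("entity") if isinstance(event, dict) else None
    if not isinstance(entity, dict):
        return 400
    # Hypothetical handling: log which entity reported the anomaly.
    print(f"Anomaly event for {entity.get('name')!r}: {event.get('detectionType')}")
    return 200  # any 2xx code signals success to the sender


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = process_event(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the receiver; blocks until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. Keeping the parsing in process_event, separate from the handler, makes the logic easy to test without opening a socket.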

Attributes, with type and description:

  • category (Enum: "web throughput", "non-web throughput", "web transactions", "non-web transactions", "error class"): The category of data that was analyzed.
  • data (List): The time series data leading up to the detection.
  • data[].timestamp (Number): The timestamp of the data point in epoch milliseconds. Example: 1584366819000
  • data[].unit (String): The unit describing the value of the data point. Data units include: count, milliseconds, and error_rate
  • data[].value (Number): The value of the data point. Example: 1.52
  • detectionType (Enum: "latency", "throughput", "error_rate"): The type of data that was analyzed.
  • entity (Object): The entity that reported the unusual data.
  • entity.accountId (Number): The ID of the account to which the entity belongs.
  • entity.domain (Enum): The domain to which the entity belongs. Example: APM
  • entity.domainId (String): The ID used to uniquely identify the entity within the domain.
  • entity.guid (String): The GUID used to uniquely identify the entity across all products.
  • entity.name (String): The name of the entity. Example: "Laura’s coffee service"
  • entity.link (String): A link to view the entity. Example: https://rpm.newrelic.com/accounts/YOUR_ACCOUNT_ID/applications/987654321
  • severity (Enum: "NORMAL", "WARNING", "CRITICAL"): A description of how unusual the change was.
  • version (String): Version used to describe the data being provided. Example: v1
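
A receiver may want to check incoming events against the attribute values listed above. The following is a sketch under the assumption that those lists are exhaustive; the function name and message wording are illustrative. Because real payloads could carry values not listed here, it is probably safer to treat the results as warnings rather than reasons to reject an event.

```python
from typing import List

# Allowed values, copied from the attribute descriptions above.
CATEGORIES = {"web throughput", "non-web throughput", "web transactions",
              "non-web transactions", "error class"}
DETECTION_TYPES = {"latency", "throughput", "error_rate"}
SEVERITIES = {"NORMAL", "WARNING", "CRITICAL"}
UNITS = {"count", "milliseconds", "error_rate"}


def validate_event(event: dict) -> List[str]:
    """Return a list of problems; an empty list means the event matches the attribute list."""
    problems = []
    checks = [("category", CATEGORIES),
              ("detectionType", DETECTION_TYPES),
              ("severity", SEVERITIES)]
    for field, allowed in checks:
        if event.get(field) not in allowed:
            problems.append(f"{field}: unexpected value {event.get(field)!r}")
    for i, point in enumerate(event.get("data", [])):
        if point.get("unit") not in UNITS:
            problems.append(f"data[{i}].unit: unexpected value {point.get('unit')!r}")
        for key in ("timestamp", "value"):
            if not isinstance(point.get(key), (int, float)):
                problems.append(f"data[{i}].{key}: expected a number")
    return problems
```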

JSON schema example

Template:

{
  "version": "{{version}}", 
  "entity": {
    "type": "{{entity.type}}",
    "name": "{{entity.name}}",
    "link": "{{entity.link}}",
    "entityGuid": "{{entity.entityGuid}}",
    "domainId": "{{entity.domainId}}",
    "domain": "{{entity.domain}}",
    "accountId": {{entity.accountId}}
  },
  "detectionType": "{{detectionType}}",
  "category": "{{category}}",
  "data": [{{#each data}}
    {
      "value": {{value}},
      "unit": "{{unit}}",
      "timestamp": {{timestamp}}
    }
    {{#unless @last}},{{/unless}}
  {{/each}}]
}
    

Sample Payload:

{
  "version": "v1", 
  "entity": {
    "type": "APPLICATION",
    "name": "My Application",
    "link": "https://rpm.newrelic.com/accounts/ACCOUNT_ID/applications/123",
    "entityGuid": "foo",
    "domainId": "123",
    "domain": "APM",
    "accountId": YOUR_ACCOUNT_ID
  },
  "detectionType": "throughput",
  "category": "web throughput",
  "severity": "CRITICAL",
  "data": [
    {
      "value": 100,
      "unit": "count",
      "timestamp": 1584047560917
    },
    {
      "value": 99,
      "unit": "count",
      "timestamp": 1584047620917
    },
    {
      "value": 0,
      "unit": "count",
      "timestamp": 1584047680917
    }
  ]
}
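
To show how the data list in a payload like the one above might be used, the sketch below converts the epoch-millisecond timestamps to UTC and reports how the value changed across the series. The function name and the shape of the returned summary are assumptions for illustration.

```python
from datetime import datetime, timezone


def summarize_series(data: list) -> dict:
    """Summarize the time series from a webhook payload's "data" list."""
    points = sorted(data, key=lambda p: p["timestamp"])
    first, last = points[0], points[-1]

    def to_utc(ms: int) -> str:
        # Payload timestamps are epoch milliseconds; datetime expects seconds.
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()

    return {
        "start": to_utc(first["timestamp"]),
        "end": to_utc(last["timestamp"]),
        "duration_seconds": (last["timestamp"] - first["timestamp"]) / 1000,
        "first_value": first["value"],
        "last_value": last["value"],
        "change": last["value"] - first["value"],
        "unit": last["unit"],
    }
```

For the sample payload, the summary covers 120 seconds and shows the count dropping from 100 to 0.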

Data limits

In addition to requirements, data limits include:

  • APM app transactions per month: up to 100 million included free
  • Monitored APM applications: limited to 1,000 per configuration
  • Slack configurations: limited to 200 per account
  • Webhook configurations: limited to 200 per account

For more help

Recommendations for learning more: