Our Incident Intelligence uses logic to correlate your incidents. We call this logic "decisions." We provide built-in decisions, and you can create and customize your own on the Incident Intelligence Decisions UI page. The better you configure your decisions, the better we can group and correlate your incident events, which reduces noise and gives on-call teams more context.
one.newrelic.com > Applied Intelligence > Incident intelligence > Decisions: Our UI shows how each decision correlates incidents.
Here are some key concepts for understanding our decisions logic:
- What is "correlation" and how does it work? For the incident events being sent into Incident Intelligence from your various alerting engines, the most recently created and active incidents are available for correlation. Correlation occurs between two events whenever the criteria in any decision are met. All the events available for correlation are tested against each other in all possible pair combinations, and a "greedy merge" is performed. This means that if incident A correlates with incident B into one issue, and incident B correlates with incident C into another issue, then AB and BC also merge, resulting in a single issue that includes A, B, and C.
- What types of logic can be used in a decision? At a high level, decisions allow you to define logic based on time (duration between events), frequency (number of events), context (metadata structure & values), and topology (entity relationships).
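The "greedy merge" described above can be sketched as a union-find over all incident pairs. This is only an illustration of the idea, not the actual Incident Intelligence implementation: the function name `greedy_merge`, the `correlates` callback, and the sample incidents are all assumptions made for the example.

```python
from itertools import combinations


def greedy_merge(incidents, correlates):
    """Group incidents into issues by merging every correlated pair transitively."""
    parent = {incident: incident for incident in incidents}

    def find(x):
        # Follow parent links to the group's representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Test every possible pair; any correlated pair merges the two groups.
    for a, b in combinations(incidents, 2):
        if correlates(a, b):
            parent[find(a)] = find(b)

    issues = {}
    for incident in incidents:
        issues.setdefault(find(incident), set()).add(incident)
    return list(issues.values())


# Hypothetical pairwise results: A correlates with B, and B correlates with C.
correlated_pairs = {frozenset("AB"), frozenset("BC")}
issues = greedy_merge(
    ["A", "B", "C", "D"],
    lambda a, b: frozenset((a, b)) in correlated_pairs,
)
print(issues)
```

Here A, B, and C end up in one issue even though A and C never correlated directly, while D stays in an issue of its own.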
Decisions determine how Incident Intelligence correlates incidents together. By default, a broad set of global decisions is enabled when you start using Incident Intelligence.
To review existing decisions:
- Go to one.newrelic.com and click Alerts & AI. In the left navigation, under Incident Intelligence, click Decisions.
- Review the list of active decisions. To see the rule logic that creates correlations between your issues, click the decision.
- To see examples of incidents the decision correlated, click the Recent correlations tab.
- Use any of the other options to enable or disable these global decisions.
We routinely analyze your decisions for efficacy and other best practices, and attach recommendations for you to review.
Whether you use the suggested decisions that our pattern recognition algorithms provide or add your own correlation logic, you'll get insights into your correlation rate, flapping effectiveness, noise reduction improvement, and number of correlated issues, as shown below.
From the UI, you can view the underlying NRQL queries and create your own custom charts and dashboards from this data.
one.newrelic.com > Applied Intelligence > Incident intelligence > Decisions: Some example statistics from the decisions UI.
Some definitions of statistics:
- Correlation rate: The percentage of the time correlations are occurring versus not.
- Total correlated issues: Number of issues correlated with another issue.
- Noise reduction: Total number of issues after correlation divided by the total number of issues before correlation.
- Flapping effectiveness: The percentage of issues where flapping was detected (opening and closing frequently) and new redundant issues were prevented from opening.
- Correlation reason: Shows which decisions are correlating issues the most.
Information about the types of suggested decisions:
- Suggested decisions: The data from your selected sources is continuously inspected for patterns to help reduce noise. Once patterns have been observed in your data, decisions are suggested that would allow these events to correlate in the future.
- Accelerated suggested decisions: If you have been using New Relic Alerts for a while, when you add alert policies to sources, we're able to use that historical data to accelerate the pattern recognition step and suggest decisions up to 30% faster.
To get started, click on a suggested decision, located under the statistics block on the Decisions UI page. You'll see information on the logic behind the suggested decision, why we think it will help you, and the estimated correlation rate for that decision.
one.newrelic.com > Applied Intelligence > Incident intelligence > Decisions: Suggested decision block
If there isn't enough data to see the correlation rate, a link right below the percentage estimate will guide you to other sources you can add to get stronger results. Note: if you have fewer than 5,000 incidents per month, you probably won't have suggested decisions.
To add the suggested decision, click Activate decision and it will appear enabled, alongside the other decisions. If the decision isn't relevant to your needs, click Dismiss.
You can reduce noise and improve correlation by building your own custom decisions. To start building a decision, go to one.newrelic.com and click Alerts & AI. In the left navigation, under Incident Intelligence, click Decisions, then click Add a decision. Tips on how to use that UI are in the table below.
When building a decision, steps 1, 2, and 3 are each optional, but you must define at least one of them to create a decision.
Step 1: Filter your data
In this step you'll define your filters. Remember that correlation occurs between two incidents. If no filters are defined, all incoming incidents will be considered by the decision.
Define your filters for the first segment (or bucket) of incidents, and for the second segment of incidents. Filter operators range from substring matching to regex matching, to help you target the incident events you want and exclude those you don't.
All combinations of event pairs between segment one and segment two are used in the next steps of the decision.
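As a rough sketch of how two filtered segments produce candidate pairs, the snippet below filters a list of incidents twice and takes every combination across the two results. The incident attributes, the `segment` helper, and the rule that an incident is not paired with itself are assumptions made for illustration.

```python
import re
from itertools import product

# Hypothetical incident events; attribute names are illustrative only.
incidents = [
    {"id": 1, "source": "alerts", "title": "High CPU on host-a"},
    {"id": 2, "source": "alerts", "title": "High memory on host-a"},
    {"id": 3, "source": "pagerduty", "title": "Checkout latency"},
]


def segment(events, predicate=None):
    """Return the events that pass a segment's filter; no filter keeps them all."""
    return [e for e in events if predicate is None or predicate(e)]


# Segment one uses a substring filter, segment two a regex filter.
segment_one = segment(incidents, lambda e: "CPU" in e["title"])
segment_two = segment(incidents, lambda e: re.fullmatch(r".*host-a", e["title"]))

# Every pair across the two segments moves on to the next steps.
candidate_pairs = [
    (a["id"], b["id"])
    for a, b in product(segment_one, segment_two)
    if a["id"] != b["id"]  # assumption: an incident is not paired with itself
]
print(candidate_pairs)  # [(1, 2)]
```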
Step 2: Contextual correlation
Once you've filtered your data, define the logic used when comparing the incidents' context. You can correlate events based on the following methods:
Step 3: Topology correlation
Topology correlation is currently in limited release. Contact your account representative to enable this feature. Once enabled, you'll first need to ingest your topology data via our NerdGraph
Give it a name
After you configure your decision logic, give it a recognizable name and description. This is used in notifications and other areas of the UI to indicate which decision caused a pair of incidents to be correlated together.
Optional. The advanced settings area allows you to further customize how your decision behaves when correlating events. Each setting has a default value so customization is optional.
Here are technical details on the similarity algorithms we use:
When building a decision, available operators include:
contains (regex): used in Step 1: Filter your data.
regular expression match: used in Step 2: Contextual correlation.
The decision builder follows the standards outlined in these documents for regular expressions.
In order for your regex to test as true, the entire attribute value (the data you’re evaluating) must be matched by the regular expression provided. Captured groups can be used but are not explicitly evaluated.
For instance, if the attribute value is foobarbaz, these examples would meet the criteria and test as true:
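As a rough illustration of whole-value matching, the sketch below uses `re.fullmatch` from Python's standard library as a stand-in for the decision builder's regex engine, which may differ in details. The patterns and the `filter_matches` helper are illustrative assumptions, not examples taken from the product.

```python
import re


def filter_matches(pattern, value):
    """True only when the pattern matches the entire attribute value."""
    return re.fullmatch(pattern, value) is not None


value = "foobarbaz"
print(filter_matches(r"foo.*", value))        # True: the whole value is covered
print(filter_matches(r"foo(bar)baz", value))  # True: a group is allowed but not compared
print(filter_matches(r"bar", value))          # False: only a substring would match
```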
In order for your regex to test as true, the entire attribute values for incident 1 and incident 2 must be included in the match. Also, each captured group (expressions in ( ) parentheses) must exist in both values (incident 1 and incident 2 attributes), and have the same value:
- The number of captured groups must be equal for both incident attributes.
- Each group must be equal to the corresponding group between attribute values: the value of the first captured group in the incident 1 attribute value is equal to the value of the first captured group in the incident 2 attribute.
For instance, if attribute value 1 is abc-123-xyz and attribute value 2 has the same first and third segments, the expression (\w+)-(?:\w+)-(\w+) would meet the criteria:
- The whole value is matched by the expression.
- The captured groups (the first and third segments) have the same respective values.
- The second segment is placed in a non-capturing group using ?:, so it is needed for the whole value to match but isn't used in the capture group comparison.
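The pairwise rule above can be sketched as follows, again using Python's `re` module as a stand-in for the decision builder's engine. The second attribute value, abc-777-xyz, is a hypothetical value chosen for this example, and `context_regex_correlates` is an illustrative helper name.

```python
import re


def context_regex_correlates(pattern, value_1, value_2):
    """Both values must match entirely, and every captured group must be equal."""
    match_1 = re.fullmatch(pattern, value_1)
    match_2 = re.fullmatch(pattern, value_2)
    if match_1 is None or match_2 is None:
        return False
    groups_1, groups_2 = match_1.groups(), match_2.groups()
    # Each captured group must exist in both values and hold the same text.
    if None in groups_1 or None in groups_2:
        return False
    return groups_1 == groups_2


pattern = r"(\w+)-(?:\w+)-(\w+)"
print(context_regex_correlates(pattern, "abc-123-xyz", "abc-777-xyz"))  # True
print(context_regex_correlates(pattern, "abc-123-xyz", "abd-123-xyz"))  # False
# Capturing the middle segment makes it part of the comparison.
print(context_regex_correlates(r"(\w+)-(\w+)-(\w+)", "abc-123-xyz", "abc-777-xyz"))  # False
```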
No flags are enabled by default. Some useful flags to include in regular expressions in the decision builder are:
- CASE_INSENSITIVE: (?i)
- MULTILINE: (?m)
- DOTALL: (?s)
See field detail for more notes on the function and implementation of each of these flags.
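To show the effect of these inline flags, here is a small sketch using Python's `re` module, which accepts the same `(?i)`, `(?m)`, and `(?s)` prefixes. The decision builder's engine may behave differently in edge cases, and the sample values are made up for the example.

```python
import re


def whole_match(pattern, value):
    """True when the pattern matches the entire value."""
    return re.fullmatch(pattern, value) is not None


# (?i) ignores case.
case_sensitive = whole_match(r"host-\d+", "HOST-42")
case_insensitive = whole_match(r"(?i)host-\d+", "HOST-42")

# (?s) lets "." match newline characters too.
without_dotall = whole_match(r"error.*", "error\nstack trace")
with_dotall = whole_match(r"(?s)error.*", "error\nstack trace")

# (?m) lets "^" and "$" match at line boundaries inside the value.
without_multiline = whole_match(r"a$\n^b", "a\nb")
with_multiline = whole_match(r"(?m)a$\n^b", "a\nb")

print(case_sensitive, case_insensitive)      # False True
print(without_dotall, with_dotall)           # False True
print(without_multiline, with_multiline)     # False True
```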
You can use the correlation assistant to more quickly analyze incidents, create decision logic, and test the logic with a simulation. To use the correlation assistant:
- From one.newrelic.com, click Alerts & AI, click Overview, then click the Incidents tab.
- Check the boxes of incidents you'd like to correlate. Then, at the bottom of the incident list, click Correlate incidents.
- For best results for correlating incidents, select common attributes with a low frequency percentage. Learn more about using frequency.
- Click Simulate to see the likely effect of your new decision on the last week of your data.
- Click on examples of correlation pairs and determine if those correlations are desired.
- If you like what's been simulated, click Next, and then name and describe your decision.
- If the simulation result shows too many potential incidents, you may want to choose a different set of attributes and incidents for your decision and run another simulation. Learn more about simulation.
Here are explanations of the two types of attribute analysis displayed in the UI:
- Common attributes: This analysis simply highlights attributes and values that are the exact same between all selected incidents.
- Similar attributes: Similarity analysis uses the Levenshtein algorithm with a distance of 3 to find attributes whose values would be the same after 3 or fewer character changes. Other details:
  - Numerical values as well as single character values are filtered out of the results.
  - Requires two incidents to be selected. Similarity analysis is not performed when 3 or more incidents are selected.
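For reference, the similarity check described above can be sketched with a standard dynamic-programming Levenshtein distance. This is a generic textbook implementation, not New Relic's code; the `is_similar` helper and the sample values are assumptions for illustration.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def is_similar(value_1, value_2, max_distance=3):
    """True when the two values are within max_distance character changes."""
    return levenshtein(value_1, value_2) <= max_distance


print(levenshtein("kitten", "sitting"))                    # 3
print(is_similar("host-east-01", "host-east-02"))          # True
print(is_similar("checkout-service", "payments-service"))  # False
```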
To create the best decisions, we recommend choosing common attributes that have a lower frequency in your incidents. Here are tips for understanding how choosing low or high frequency attributes affects your decisions:
- Low frequency: As an example, an attribute with 0% in the frequency column is likely a unique identifier or an attribute that only started reporting in your data within the last month. Choosing low frequency attributes may correlate few events.
- High frequency: On the other end, an attribute with 100% frequency would be one that is present on all your data. Choosing these attributes would correlate all of your events together.
By default, the attributes are sorted by frequency with the least frequently reported attributes at the top. Click an attribute's frequency percentage to get more information about the distribution of values we've seen reported for that attribute in the last month.
Using a set of incidents, the correlation assistant identifies common attributes among those incidents as well as attributes with similar values. Select attributes you believe are good indicators that events should correlate and then simulate the decision. Simulation will test the logic against the last week of your data and show you how many correlations would have happened, in addition to actual examples to inspect.
If the simulation looks good, continue creating your real decision. If the simulation doesn't show examples of useful correlations, choose a different set of attributes, and run the simulation again.
Here’s a breakdown of the decision preview information displayed when you create a simulation:
- Potential correlation rate: The percentage of tested incidents this decision would have affected.
- Total created incidents: The number of incidents tested by this decision.
- Total estimated correlated incidents: The estimated number of incidents this decision would have correlated.
- Incident examples: A list of incident pairs this decision would have correlated. You can click on these to see a side by side comparison of all attributes and values to help you determine if the correlation is desired or not.
Run the simulation with different attributes as many times as you need until you see results you like. When you’re ready, follow the UI prompts to save your decision.
What is “topology”? At a high level, topology is a representation of your infrastructure service map. It represents how different services and resources within your infrastructure relate to one another. By ingesting your service map through our aiTopologyCollector API in NerdGraph, we can correlate your incident events based on this information.
When you are building a custom decision, you can use our topology correlation to find relationships between incident sources to determine if incidents should correlate. Topology is a powerful way to increase the quality of your correlations and the speed at which they're found.
Topology correlation is currently in limited release. Requirements:
- Access to the topology feature, granted by a New Relic representative
- Limit of 10K vertices per New Relic organization per environment
Here's an overview of what needs to be in place for topology correlation to work:
- A topology graph has been created using the NerdGraph aiTopologyCollector APIs.
- Vertices with "defining attributes" exist in the topology and they have been connected together with edges to indicate relationship.
- The defining attributes on the vertices are also present on your incident events within Incident Intelligence.
Here's a step-by-step explanation of how topology correlation works:
- First, all relevant incidents are gathered. This includes incidents where decision-logic steps 1 and 2 are true, and that are also within the defined time window in advanced settings.
- Next, we attempt to associate each incident to a vertex in the topology graph using a vertex's defining attributes and the available attributes on the incident.
- Next, the pairs of vertices which were associated with incidents are tested using the "topologically dependent" operator to determine if these vertices are connected to each other. This operator checks whether there is any path in the graph that connects the two vertices within 5 hops.
- The incidents are then correlated and the issues are merged together.
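The steps above can be sketched as a lookup from incident to vertex followed by a breadth-first search limited to 5 hops. This is a simplified model rather than the actual operator: the vertex names, CID values, and function names are hypothetical, edges are treated as undirected, and vertex classes are ignored.

```python
from collections import deque

# Hypothetical topology: vertex name -> defining attributes, plus edges.
vertices = {
    "checkout-service": {"newrelic/tags/CID": "101"},
    "payments-db": {"newrelic/tags/CID": "102"},
    "search-service": {"newrelic/tags/CID": "103"},
}
edges = [("checkout-service", "payments-db")]


def find_vertex(incident):
    """Return the first vertex whose defining attributes all appear on the incident."""
    for name, defining in vertices.items():
        if all(incident.get(key) == value for key, value in defining.items()):
            return name
    return None


def within_hops(start, goal, edge_list, max_hops=5):
    """Breadth-first search for a path of at most max_hops edges."""
    neighbors = {}
    for a, b in edge_list:
        # Assumption: edges are traversed in both directions.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return True
        if hops < max_hops:
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return False


def topologically_dependent(incident_1, incident_2):
    """Associate both incidents with vertices, then test whether they are connected."""
    v1, v2 = find_vertex(incident_1), find_vertex(incident_2)
    return v1 is not None and v2 is not None and within_hops(v1, v2, edges)


print(topologically_dependent({"newrelic/tags/CID": "101"}, {"newrelic/tags/CID": "102"}))  # True
print(topologically_dependent({"newrelic/tags/CID": "101"}, {"newrelic/tags/CID": "103"}))  # False
```

Breadth-first search is used here because it visits vertices in order of hop count, which makes the hop limit easy to enforce.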
To create and maintain your topology graph, use the aiTopologyCollector APIs in NerdGraph to create vertices and connect them together with edges. Note that deleting a vertex will remove all edges connected with it.
Here are definitions of key topology concepts:
- Vertices: You can think of a vertex as an entity: the source your incident events come from, or the thing whose problematic symptom they describe. To create a vertex, you need to define these attributes:
  - Name: The name of the vertex, such as an application, service, or host name. This value must be unique within the graph.
  - Defining attributes: A vertex requires a set of special attributes (key/value pairs) to allow it to be associated with incoming incident events. Unique identifiers that appear on all incidents are recommended, such as entity GUIDs or other IDs.
  - Vertex class (optional): An optional class can be set, such as datastore. This allows your decision logic to restrict the vertices the "topologically dependent" operator traverses to only those you define. For example, "test if vertex A is connected to vertex B traversing only application type vertices".
- Edges: An edge is simply a connection between two vertices. Use edges to describe how everything is connected together. To create an edge, you need to define these attributes:
  - from vertex name: The name of the vertex where the edge starts.
  - to vertex name: The name of the vertex where the edge ends.
  - directed: A boolean indicating that the edge connection goes in only one direction. The direction, if thought of as an arrow, starts at the from vertex name and points at the to vertex name. This is currently used in the dependent operator.
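To make these concepts concrete, here is a small in-memory model of vertices and edges. It is a local sketch only and does not call or mirror the NerdGraph aiTopologyCollector API: the class names, method names, sample values, and the rule that an edge's vertices must already exist are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    from_vertex_name: str
    to_vertex_name: str
    directed: bool = False


@dataclass
class TopologyGraph:
    """Local, in-memory model of a topology graph (hypothetical, not the NerdGraph API)."""
    vertices: dict = field(default_factory=dict)  # vertex name -> vertex details
    edges: set = field(default_factory=set)

    def add_vertex(self, name, defining_attributes, vertex_class=None):
        # Vertex names must be unique within the graph.
        if name in self.vertices:
            raise ValueError(f"vertex name must be unique: {name}")
        self.vertices[name] = {
            "defining_attributes": dict(defining_attributes),
            "vertex_class": vertex_class,
        }

    def add_edge(self, from_vertex_name, to_vertex_name, directed=False):
        # Assumption: both vertices must exist before they can be connected.
        for name in (from_vertex_name, to_vertex_name):
            if name not in self.vertices:
                raise KeyError(f"unknown vertex: {name}")
        self.edges.add(Edge(from_vertex_name, to_vertex_name, directed))

    def delete_vertex(self, name):
        # Deleting a vertex also removes every edge connected to it.
        del self.vertices[name]
        self.edges = {
            edge for edge in self.edges
            if name not in (edge.from_vertex_name, edge.to_vertex_name)
        }


graph = TopologyGraph()
graph.add_vertex("checkout-service", {"CID": "101"}, vertex_class="application")
graph.add_vertex("payments-db", {"CID": "102"}, vertex_class="datastore")
graph.add_edge("checkout-service", "payments-db", directed=True)
graph.delete_vertex("payments-db")
print(len(graph.vertices), len(graph.edges))  # 1 0
```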
Incidents are connected to vertices using a vertex's defining attributes. In the example topology graph under Topology explained, each vertex has a defining attribute "CID" with a unique value. Next, we find a vertex that matches the attribute of an incident.
If the defining attribute you'd like to use on your vertices isn't already on your incident events, there are a few ways to add it:
- Tag your entities in New Relic: By tagging your entities, those tags will enrich the incident events generated by Alerts. For example, if you've tagged your entities with CID and their corresponding unique values, then you can have defining attributes on your vertex as follows:
'newrelic/tags/CID' : CID_VALUE
- Facet your data: Creating NRQL alert conditions with one or more facets defined will group your data by attribute. Also, incident events emitted will be enriched with those attributes and values. For incidents, faceted attributes follow the same format: