Our incident intelligence uses logic to correlate your incidents. We call this logic "decisions." We have built-in decisions, and you can also create and customize your own on the incident intelligence Decisions UI page. The better you configure your decisions, the better we can group and correlate your incident events, which reduces noise and gives on-call teams more context.
one.newrelic.com > Applied intelligence > Incident intelligence > Decisions: Our UI shows how each decision correlates incidents.
Here are some key concepts for understanding our decisions logic:
Among the incident events sent into incident intelligence from your various alerting engines, the most recently created and active incidents are available for correlation. Two events correlate whenever they meet the criteria of any decision.
All events available for correlation are tested against each other in every possible pair, and a "greedy merge" is performed. This means that if incident A correlates with incident B into one issue, and incident B correlates with incident C into another issue, the AB and BC issues also merge. The result is a single issue containing A, B, and C.
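The greedy merge amounts to grouping incidents into connected components, where each correlated pair links two incidents. The following is a minimal Python sketch of that idea, not incident intelligence's actual implementation: `correlates(a, b)` is a hypothetical stand-in for "some decision's criteria are met for this pair", and union-find is simply our choice of technique for illustration.

```python
from itertools import combinations
from typing import Callable, Hashable


def greedy_merge(
    incidents: list,
    correlates: Callable[[Hashable, Hashable], bool],
) -> list:
    """Group incidents into issues: any chain of correlated pairs ends up in one issue."""
    parent = {i: i for i in incidents}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Test every pair once; merge the two issues when the pair correlates.
    for a, b in combinations(incidents, 2):
        if correlates(a, b):
            parent[find(a)] = find(b)

    issues: dict = {}
    for i in incidents:
        issues.setdefault(find(i), set()).add(i)
    return list(issues.values())


# Hypothetical decision outcome: A-B and B-C correlate; A-C and D do not.
pairs = {frozenset("AB"), frozenset("BC")}
result = greedy_merge(["A", "B", "C", "D"], lambda a, b: frozenset((a, b)) in pairs)
print(sorted(sorted(issue) for issue in result))  # [['A', 'B', 'C'], ['D']]
```

Even though A and C never correlate directly, they land in the same issue because both correlate with B.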
At a high level, decisions allow you to define logic based on time (duration between events), frequency (number of events), context (metadata structure and values), and topology (entity relationships).
Decisions determine how incident intelligence correlates incidents together. By default, a broad set of global decisions is enabled when you start using incident intelligence.
To review existing decisions:
- Go to one.newrelic.com and click Alerts & AI. In the left navigation, under Incident intelligence, click Decisions.
- Review the list of active decisions. To see the rule logic that creates correlations between your issues, click the decision.
- To see examples of incidents the decision correlated, click the Recent correlations tab.
- Use any of the other options to enable or disable these global decisions.
We routinely analyze your decisions for efficacy and adherence to best practices, and attach recommendations for you to review.
Whether you use the suggested decisions generated by our pattern recognition algorithms or add your own correlation logic, you'll get insights into your correlation rate, noise reduction, and the number of correlated issues.
From the UI, you can view the underlying NRQL queries and create your own custom charts and dashboards from this data.
one.newrelic.com > Applied intelligence > Incident intelligence > Decisions: Some example statistics from the decisions UI.
Some definitions of statistics:
- Correlation rate: The percentage of the time correlations are occurring versus not.
- Total correlated issues: Number of issues correlated with another issue.
- Noise reduction: Total number of issues after correlation divided by the total number of issues before correlation.
- Correlation reason: Shows which decisions are correlating issues the most.
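As a rough illustration of how two of these statistics relate, the Python sketch below computes them from the merged groups of a correlation run. The input shape (one set of original issue IDs per resulting issue) and the function name are hypothetical, and the UI computes its numbers over its own time window. The noise reduction formula follows the definition above, so a lower value means more issues were merged.

```python
def decision_stats(issue_groups: list) -> dict:
    """Summarize a correlation run, given the merged groups of original issues."""
    before = sum(len(g) for g in issue_groups)  # issues before correlation
    after = len(issue_groups)                   # issues after correlation
    # An issue counts as correlated if it was merged with at least one other issue.
    correlated = sum(len(g) for g in issue_groups if len(g) > 1)
    return {
        "total_correlated_issues": correlated,
        # As defined above: issues after correlation divided by issues before.
        "noise_reduction": after / before if before else 0.0,
    }


print(decision_stats([{"A", "B", "C"}, {"D"}]))
# {'total_correlated_issues': 3, 'noise_reduction': 0.5}
```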
Information about the types of suggested decisions:
- Suggested decisions: The data from your selected sources is continuously inspected for patterns to help reduce noise. Once patterns have been observed in your data, decisions are suggested that would allow these events to correlate in the future.
- Accelerated suggested decisions: If you've been using alerts for a while, adding your alert policies to sources lets us use their historical data to accelerate the pattern recognition step and suggest decisions up to 30% faster.
To get started, click on a suggested decision, located under the statistics block on the Decisions UI page. You'll see information on the logic behind the suggested decision, why we think it will help you, and the estimated correlation rate for that decision.
one.newrelic.com > Applied intelligence > Incident intelligence > Decisions: Suggested decision block.
If there isn't enough data to show the correlation rate, a link below the percentage estimate will guide you to other sources you can add to get stronger results. If you have fewer than 5,000 incidents per month, you probably won't see suggested decisions.
To enable the suggested decision, click Activate decision. If the decision isn't relevant to your needs, click Dismiss.
You can reduce noise and improve correlation by building your own custom decisions. To start building a decision, go to one.newrelic.com and click Alerts & AI. In the left navigation, under Incident intelligence, click Decisions, then click Add a decision.
When building a decision, steps 1, 2, and 3 are optional on their own, but at least one must be defined in order to create a decision.
Step 1: Filter your data (optional)
In this step you'll define your filters. Remember that correlation occurs between two incidents. If no filters are defined, the decision considers all incoming incidents.
Define filters for the first segment (or bucket) of incidents and for the second segment. Filter operators range from substring matching to regex matching, so you can target the incident events you want and exclude those you don't.
All combinations of event pairs between segment one and segment two are used in the next steps of the decision.
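To make the pairing concrete, here's a minimal Python sketch under stated assumptions: incidents are plain dictionaries of attributes, each segment is defined by a single full-match regex on one attribute, and the `segment` helper and sample data are hypothetical. Real decisions support more filter operators than this.

```python
import re
from itertools import product

# Hypothetical incidents: each is a dict of attribute name -> value.
incidents = [
    {"id": 1, "source": "apm", "title": "High error rate on checkout"},
    {"id": 2, "source": "infra", "title": "CPU above 90% on host-7"},
    {"id": 3, "source": "infra", "title": "Disk full on host-7"},
]


def segment(items, attribute, pattern):
    """Keep incidents whose attribute value fully matches the regex (illustrative filter)."""
    return [i for i in items if re.fullmatch(pattern, str(i.get(attribute, "")))]


segment_one = segment(incidents, "source", "apm")
segment_two = segment(incidents, "source", "infra")

# Every (segment one, segment two) combination is passed to the later steps.
candidate_pairs = [(a["id"], b["id"]) for a, b in product(segment_one, segment_two)]
print(candidate_pairs)  # [(1, 2), (1, 3)]
```

Tighter filters shrink both segments, and therefore the number of pairs the context and topology steps have to evaluate.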
Step 2: Correlate context (optional)
Once you've filtered your data, define the logic used when comparing the incidents' context. You can correlate events based on the following methods:
Step 3: Correlate topology (optional)
In addition, you can set up topology data using NerdGraph, our GraphQL API.
Step 4: Give it a name
After you configure your decision logic, give it a recognizable name and description. This is used in notifications and other areas of the UI to indicate which decision caused a pair of incidents to be correlated together.
Step 5: Use advanced settings (optional)
Use the advanced settings area to further customize how your decision behaves when correlating events. Each setting has a default value, so you only need to change the ones you care about.
Here are technical details on the similarity algorithms we use:
When building a decision, available operators include:
contains (regex): used in Step 1: Filter your data.
regular expression match: used in Step 2: Correlate context.
The decision builder follows the regular expression standards of Java's java.util.regex.Pattern class, as documented by Oracle.
For the contains (regex) operator, the regex tests as true only if it matches the entire attribute value (the data you're evaluating). Captured groups can be used but are not explicitly evaluated.
For instance, if the attribute value is `foobarbaz`, these examples would meet the criteria and test as true:
For the regular expression match operator, the regex tests as true only if it matches the entire attribute values of both incident 1 and incident 2. Also, each captured group (an expression in `( )` parentheses) must exist in both values (the incident 1 and incident 2 attributes) and have the same value:
- The number of captured groups must be equal for both incident attributes.
- Each group must be equal to the corresponding group between attribute values: the value of the first captured group in the incident 1 attribute value is equal to the value of the first captured group in the incident 2 attribute.
For instance, if attribute value 1 is `abc-123-xyz` and attribute value 2 shares its first and last segments (`abc` and `xyz`) but may differ in the middle, the expression `(\w+)-(?:\w+)-(\w+)` would meet the criteria:
- The whole value is matched by the expression.
- The first and third captured groups have the same respective values.
- The second group is marked non-capturing with `?:`, which lets the whole value match without including that group in the capture group comparison.
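To illustrate the difference between the two operators, here's a minimal Python sketch. It uses Python's `re` module rather than Java's regex engine; for simple ASCII patterns like these the two behave the same, but the flavors differ in places. The function names are hypothetical, and the values other than `foobarbaz` and `abc-123-xyz` (such as `abc-456-xyz`) are our own illustrations, not taken from the product.

```python
import re


def contains_regex(pattern: str, value: str) -> bool:
    """Step 1 style check: the regex must match the whole value; groups are ignored."""
    return re.fullmatch(pattern, value) is not None


def regex_match(pattern: str, value_1: str, value_2: str) -> bool:
    """Step 2 style check: both values fully match and every captured group is equal."""
    m1 = re.fullmatch(pattern, value_1)
    m2 = re.fullmatch(pattern, value_2)
    if m1 is None or m2 is None:
        return False
    # groups() returns only capturing groups, so (?:...) is left out of the comparison.
    g1, g2 = m1.groups(), m2.groups()
    # A group that did not participate in the match (None) does not "exist" in that value.
    if None in g1 or None in g2:
        return False
    return g1 == g2


print(contains_regex(r"foo.*", "foobarbaz"))  # True
print(contains_regex(r"bar", "foobarbaz"))    # False: only part of the value matches

pattern = r"(\w+)-(?:\w+)-(\w+)"
print(regex_match(pattern, "abc-123-xyz", "abc-456-xyz"))  # True
print(regex_match(pattern, "abc-123-xyz", "abd-123-xyz"))  # False: first groups differ
```

Because both values are tested against the same pattern in this sketch, their group counts always line up; the comparison that matters is whether each group holds the same text.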
No flags are enabled by default. Some useful flags to include in regular expressions in the decision builder are:
- CASE_INSENSITIVE: (?i)
- MULTILINE: (?m)
- DOTALL: (?s)
See Oracle's field detail documentation for more notes on the function and implementation of each of these flags.
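Embedded flags go at the start of the expression. The sketch below shows their effect using Python's `re` module, which accepts the same `(?i)`, `(?m)`, and `(?s)` syntax as Java's `Pattern`; the sample value is hypothetical, and the decision builder itself evaluates expressions with its own engine.

```python
import re

value = "Disk FULL\non host-7"

# No flags: case-sensitive, and "." does not match the newline.
print(re.fullmatch(r"disk full.*", value) is not None)  # False

# (?i) makes the match case-insensitive; (?s) lets "." match "\n" (DOTALL).
print(re.fullmatch(r"(?is)disk full.*", value) is not None)  # True

# (?m) lets ^ and $ match at each line boundary, not just at the ends of the value.
print(re.fullmatch(r"(?sm).*^on host-\d+$", value) is not None)  # True
```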
You can use the correlation assistant to more quickly analyze incidents, create decision logic, and test the logic with a simulation. To use the correlation assistant:
- From one.newrelic.com, click Alerts & AI, click Issues & Activity, then click the Incidents tab.
- Check the boxes of incidents you'd like to correlate. Then, at the bottom of the incident list, click Correlate incidents.
- For the best correlation results, select common attributes with a low frequency percentage. Learn more about using frequency.
- Click Simulate to see the likely effect of your new decision on the last week of your data.
- Click on examples of correlation pairs to determine which correlations to use.
- If you like what's been simulated, click Next, and then name and describe your decision.
- If the simulation result shows too many potential incidents, you may want to choose a different set of attributes and incidents for your decision and run another simulation. Learn more about simulation.
Two types of attribute analysis appear in the UI:
- Common attributes: This analysis simply highlights attributes and values that are the exact same between all selected incidents.
- Similar attributes: Similarity analysis uses the Levenshtein algorithm with a distance of 3 to find attributes whose values would be the same if 3 or fewer character changes are performed.
  - Numerical values as well as single character values are filtered out of the results.
  - Requires two incidents to be selected. Similarity analysis is not performed when 3 or more incidents are selected.
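The similarity rule above can be sketched with the classic dynamic-programming Levenshtein distance. This is a minimal Python illustration, not the product's code: the helper names and sample incidents are hypothetical, treating any value that parses as a float as "numerical" is our assumption, and so is excluding identical values (which already appear as common attributes).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def _is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def similar_attributes(inc_1: dict, inc_2: dict, max_distance: int = 3) -> dict:
    """Shared attributes whose values differ but are within max_distance edits."""
    similar = {}
    for key in inc_1.keys() & inc_2.keys():
        v1, v2 = str(inc_1[key]), str(inc_2[key])
        # Skip numerical and single-character values.
        if any(_is_number(v) or len(v) <= 1 for v in (v1, v2)):
            continue
        if v1 != v2 and levenshtein(v1, v2) <= max_distance:
            similar[key] = (v1, v2)
    return similar


inc_1 = {"host": "web-prod-01", "region": "us-east-1", "severity": "5", "env": "p", "service": "checkout"}
inc_2 = {"host": "web-prod-02", "region": "eu-central-1", "severity": "4", "env": "q", "service": "checkout"}
print(similar_attributes(inc_1, inc_2))  # {'host': ('web-prod-01', 'web-prod-02')}
```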
To create the best decisions, we recommend choosing common attributes that have a lower frequency in your incidents. Here are tips for understanding how choosing low or high frequency attributes affects your decisions:
- Low frequency: For example, an attribute with 0% in the frequency column is likely a unique identifier, or an attribute that only started reporting in your data within the last month. Choosing low frequency attributes may correlate few events.
- High frequency: At the other extreme, an attribute with 100% frequency is present on all your data. Choosing these attributes would correlate all of your events together.
By default, the attributes are sorted by frequency with the least frequently reported attributes at the top. Click an attribute's frequency percentage to get more information about the distribution of values we've seen reported for that attribute in the last month.
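As an illustration of that ordering, the Python sketch below computes a frequency for each attribute and sorts least frequent first. It assumes frequency means the percentage of incidents (over the last month) that report the attribute, which is consistent with "100% ... present on all your data" but is our interpretation; the function name and sample incidents are hypothetical.

```python
from collections import Counter


def attribute_frequency(incidents: list) -> list:
    """Percent of incidents reporting each attribute, least frequent first."""
    if not incidents:
        return []
    counts = Counter(key for incident in incidents for key in incident)
    total = len(incidents)
    freq = [(key, 100.0 * n / total) for key, n in counts.items()]
    # Sort by frequency, then by name so ties come out in a stable order.
    return sorted(freq, key=lambda kv: (kv[1], kv[0]))


# Hypothetical month of incidents.
month = [
    {"title": "CPU high", "host": "web-1", "deploy_id": "d-42"},
    {"title": "CPU high", "host": "web-2"},
    {"title": "Disk full", "host": "db-1"},
    {"title": "Latency", "service": "checkout"},
]
for key, pct in attribute_frequency(month):
    print(f"{key}: {pct:.0f}%")
# deploy_id: 25%
# service: 25%
# host: 75%
# title: 100%
```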
Using a set of incidents, the correlation assistant identifies common attributes among those incidents as well as attributes with similar values. Select attributes you believe are good indicators that events should correlate, and then simulate the decision. Simulation will test the logic against the last week of your data and show you how many correlations would have happened, in addition to actual examples to inspect.
If the simulation looks good, continue creating your real decision. If the simulation doesn't show examples of useful correlations, choose a different set of attributes, and run the simulation again.
Here's a breakdown of the decision preview information displayed when you create a simulation:
- Potential correlation rate: The percentage of tested incidents this decision would have affected.
- Total created incidents: The number of incidents tested by this decision.
- Total estimated correlated incidents: The estimated number of incidents this decision would have correlated.
- Incident examples: A list of incident pairs this decision would have correlated. Click a pair to see a side-by-side comparison of all attributes and values, which helps you determine whether the correlation is desired.
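The preview numbers can be approximated by replaying a candidate decision over past incidents. Here's a minimal Python sketch, assuming an incident counts as "affected" if it appears in at least one correlated pair; the `simulate` function, the `same_cluster` decision, and the sample week of incidents are all hypothetical.

```python
from itertools import combinations


def simulate(incidents, decision):
    """Replay a candidate decision over past incidents and summarize the preview."""
    examples = [(a["id"], b["id"]) for a, b in combinations(incidents, 2) if decision(a, b)]
    correlated_ids = {i for pair in examples for i in pair}
    total = len(incidents)
    return {
        "total_created_incidents": total,
        "total_estimated_correlated_incidents": len(correlated_ids),
        "potential_correlation_rate": 100.0 * len(correlated_ids) / total if total else 0.0,
        "incident_examples": examples,
    }


def same_cluster(a, b):
    # Hypothetical decision: correlate incidents that share a cluster attribute.
    return a["cluster"] == b["cluster"]


week = [
    {"id": 1, "cluster": "k8s-a", "title": "Pod restarts"},
    {"id": 2, "cluster": "k8s-a", "title": "High latency"},
    {"id": 3, "cluster": "k8s-b", "title": "Pod restarts"},
    {"id": 4, "cluster": "k8s-c", "title": "Disk full"},
]
preview = simulate(week, same_cluster)
print(preview["potential_correlation_rate"], preview["incident_examples"])  # 50.0 [(1, 2)]
```

If the rate is near 100%, the chosen attributes are probably too common; if it is near 0%, they may be too specific.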
Run the simulation with different attributes as many times as you need until you see results you like. When you're ready, follow the UI prompts to save your decision.
What do we mean by topology? For New Relic's applied intelligence, topology is a representation of your service map: how the services and resources in your infrastructure relate to one another.
If you use decisions, a default topology decision is added to your account and enabled. You also have the option to create custom decisions.
Our topology correlation finds relationships between incident sources to determine if incidents should correlate. Topology correlation is designed to improve the quality of your correlations and the speed at which they're found.
For automatic topology correlation (without explicitly setting up a topology graph), make sure your telemetry data is collected by New Relic agents. The more types of New Relic agents you install across your services and environment, the more opportunities topology decisions have to correlate your incidents.
In this service map, the hosts and apps are the vertices, and the lines showing their relationships are the edges.
Customized topology correlation relies on two main concepts:
- Vertex: A vertex represents a monitored entity. It's the source your incident events come from, or the entity whose problematic symptom they describe. A vertex has attributes (key/value pairs) configured for it, like entity GUIDs or other IDs, which allow it to be associated with incoming incident events.
- Edge: An edge is a connection between two vertices. Edges describe the relationship between vertices.
It may help to understand how topology is used to correlate incidents:
First, New Relic gathers all relevant incidents: those for which decision logic steps 1 and 2 are true and that also fall within the time window defined in advanced settings.
In this example, all of the incidents in the dotted-line selection have met these requirements: they've gone through the decision logic in steps 1 and 2, and all contextual comparisons made in decision logic step 2 are true.
Next, we attempt to associate each incident to a vertex in your topology graph, using a vertex's defining attributes and the available attributes on the incident.
Here's an example of the steps for associating incidents with the information in the topology graph.
Then, the pairs of vertices that were associated with incidents are tested using the "topologically dependent" operator to determine whether these vertices are connected to each other. This operator checks whether any path in the graph connects the two vertices within five hops.
The incidents are then correlated and the issues are merged together.
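The five-hop check can be sketched as a breadth-first search that stops expanding at the hop limit. This minimal Python version is an illustration, not the product's implementation: the edge list and vertex names are hypothetical, and treating edges as undirected is our assumption, since the text above only says "any path in the graph."

```python
from collections import deque


def topologically_dependent(edges, start, goal, max_hops=5):
    """Return True if a path of at most max_hops edges connects start and goal."""
    graph = {}
    for a, b in edges:
        # Assumption: edges are treated as undirected for this check.
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    # Breadth-first search visits vertices in order of hop count.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, hops = queue.popleft()
        if vertex == goal:
            return True
        if hops == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return False


# Hypothetical chain: v1 - v2 - v3 - v4 - v5 - v6 - v7
edges = [(f"v{i}", f"v{i + 1}") for i in range(1, 7)]
print(topologically_dependent(edges, "v1", "v6"))  # True: 5 hops
print(topologically_dependent(edges, "v1", "v7"))  # False: 6 hops
```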
Incidents are connected to vertices using a vertex's defining attributes. (In the example topology under Topology explained, each vertex has a defining attribute "CID" with a unique value.) Next, applied intelligence finds a vertex that matches the attribute.
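Here's a minimal Python sketch of that lookup, assuming a vertex matches when every one of its defining attributes appears on the incident with the same value. The vertex IDs, the CID values, and the `find_vertex` helper are hypothetical; the `newrelic/tags/CID` key mirrors the tag format described in the tagging option that follows.

```python
from typing import Optional


def find_vertex(incident: dict, vertices: dict) -> Optional[str]:
    """Return the first vertex whose defining attributes all appear on the incident."""
    for vertex_id, defining in vertices.items():
        # Skip vertices without defining attributes; they would match every incident.
        if defining and all(incident.get(key) == value for key, value in defining.items()):
            return vertex_id
    return None


# Hypothetical topology: each vertex is defined by a unique CID tag value.
vertices = {
    "app-1": {"newrelic/tags/CID": "1001"},
    "host-9": {"newrelic/tags/CID": "2002"},
}
incident = {"title": "CPU above 90%", "newrelic/tags/CID": "2002"}
print(find_vertex(incident, vertices))  # host-9
```

An incident that carries none of the defining attributes (here, `find_vertex({"title": "x"}, vertices)`) maps to no vertex, so topology can't correlate it.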
If the defining attribute you'd like to use on your vertices isn't already on your incident events, use either of these options to add it:
- Tag your entities in New Relic: Tags on your entities enrich the incident events generated by alerts. For example, if you've tagged your entities with `CID` and their corresponding unique values, you can define the defining attributes on your vertex as follows: `'newrelic/tags/CID' : CID_VALUE`
- Facet your data: Creating NRQL alert conditions with one or more facets defined will group your data by attribute. Also, incident events emitted will be enriched with those attributes and values. For incidents, faceted attributes follow the same format:
To set up your topology or view existing topology, see the NerdGraph topology tutorial.