After you set up Incident Intelligence, our system will begin finding issues from your data sources.
In the issue feed, you can find an overview of all your issues, along with helpful information about them. You can also click any individual issue for more detail, including its analysis summary, event log, and details about correlated issues.
This screenshot shows an example issue feed, which describes your issues' statuses, correlations, and more.
What's the difference between an issue, incident, and event? In short, these terms are like building blocks. Events are raw data from your sources. Incidents are made up of one or more events. Issues are composed of one or more incidents.
In more detail:
- Events indicate a state change or trigger defined by your monitoring systems. Each event contains information about the affected entity, and events are almost always triggered automatically by the system.
- Incidents are groups of events that describe the "symptoms" of your system over time. These symptoms are detected by your monitoring tools, which evaluate your data streams and events.
- Issues are groups of incidents that describe the underlying problem behind your symptoms. When a new incident is created, Incident Intelligence opens an issue and evaluates other open issues for correlations.
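The building-block relationship can be pictured as a small data model. The sketch below is a minimal Python illustration under stated assumptions, not New Relic's internal schema. The class names, the `entity` and `description` fields, and the `open_issue` helper are all hypothetical, and it does not model correlation between issues.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Event:
    """Raw state change from a monitoring source (hypothetical fields)."""
    entity: str        # the affected entity, e.g. an app name
    description: str


@dataclass
class Incident:
    """A "symptom": one or more events."""
    events: List[Event] = field(default_factory=list)

    def entities(self) -> Set[str]:
        return {e.entity for e in self.events}


@dataclass
class Issue:
    """The underlying problem: one or more incidents."""
    incidents: List[Incident] = field(default_factory=list)

    def event_count(self) -> int:
        return sum(len(i.events) for i in self.incidents)


def open_issue(incident: Incident) -> Issue:
    # Hypothetical helper: a new incident starts in its own issue.
    # Evaluating other open issues for correlations is not modeled here.
    return Issue(incidents=[incident])


cpu = Incident([Event("checkout-service", "CPU > 90%")])
latency = Incident([
    Event("checkout-service", "p95 latency > 2s"),
    Event("checkout-service", "p99 latency > 5s"),
])
issue = open_issue(cpu)
issue.incidents.append(latency)  # e.g. after the two incidents are correlated
print(issue.event_count())  # 3
```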
The issue page is designed to give you the bottom-line insights first, so you can understand the problem quickly and minimize the time it takes to resolve it.
The issue page is made up of four sections:
- Analysis Summary: Contains two machine learning modules, one for golden signals and one for related components.
- Suggested Responder: Suggests who on your team you could reach out to for help with a specific problem.
- Impacted Entities: An entity is anything that reports data you can monitor. This section focuses on incidents from New Relic sources. It extracts their entities and provides a summary, and each entity listed is unique.
- Label Sets: Label sets cover incidents that come from third-party sources, such as PagerDuty, AWS CloudWatch, and REST APIs, as well as incidents from NRQL queries. They take the form of key:value pairs.
Depending on the data in an issue, its page may show all four of these sections or only some of them.
In addition to these four sections, you can open the anomaly overview and entity overview directly from the issue feed.
If you hover over an impacted application entity, you’ll see two calls to action: Anomaly overview and Entity overview. Anomaly overview opens the application's anomalies page and is only available for applications set up for Proactive Detection.
Finally, the issue page contains deployment events.
APM’s Deployment page lists recent deployments and their impact on your end-user and app server Apdex scores, response times, throughput, and errors. This section only appears if New Relic identifies applications with deployments among the Impacted Entities.
There are two types of deployment events: deployments and related deployments. Click Show all deployments to see all your deployment events when they arrive. Click a specific deployment to see its APM deployments page.
If you’re using PagerDuty or New Relic alerts violations as your incident notification tools, Incident Intelligence suggests relevant team members who can help resolve your issues.
Incident Intelligence learns from your PagerDuty and alerts violations data to provide suggestions for each new incident. Once you receive a suggestion, you can contact the responder or search for relevant documentation that person may have written.
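Incident Intelligence uses a trained model for these suggestions, and the model itself isn't described here. To give some intuition, the sketch below substitutes a much simpler technique. It ranks people who responded to past incidents by how many labels those incidents share with the new one. The history format, the label names, and the `suggest_responders` function are hypothetical. The sketch also ignores on-call availability.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical history: (labels of a past incident, who responded to it).
History = List[Tuple[Dict[str, str], str]]


def suggest_responders(history: History, labels: Dict[str, str], top_n: int = 2) -> List[str]:
    """Rank past responders by label overlap with the new incident.

    A frequency-count stand-in, not Incident Intelligence's actual model.
    """
    scores: Counter = Counter()
    new_items = set(labels.items())
    for past_labels, responder in history:
        overlap = len(new_items & set(past_labels.items()))
        if overlap:
            scores[responder] += overlap
    return [name for name, _ in scores.most_common(top_n)]


history = [
    ({"service": "checkout", "region": "us-east"}, "alice"),
    ({"service": "checkout", "region": "eu-west"}, "alice"),
    ({"service": "search", "region": "us-east"}, "bob"),
]
print(suggest_responders(history, {"service": "checkout", "region": "us-east"}))
```

Here `alice` responded to two past incidents that overlap with the new labels, so she ranks ahead of `bob`, who shares only the region.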
To get started, enable PagerDuty or alerts violations as a source for Incident Intelligence. Afterwards, you can view the suggestions in two places:
- The issue feed, where you can also provide feedback on the suggestions.
- Directly within PagerDuty (both the UI and the API). If you’re also using PagerDuty as a destination, the suggestions appear in your issue notification payload.
This feature doesn't account for on-call availability at the time of the incident.
To train the model, we use the information PagerDuty provides about individuals. We ingest incident information only, not users’ contact details.
Root cause analysis automatically finds potential causes for an issue and its impacted entities. It shows you why open issues occurred, which deployments contributed, and relevant error logs and attributes. With this, you can investigate the problem and reduce your mean time to resolution (MTTR).
Root cause analysis depends on other New Relic data sources and features, so root cause analysis information may not be present for every issue.
When you select an issue, you may see Root cause analysis information.
Root cause analysis includes three main UI sections:
- Deployment events: If you’ve set up deployments, we show the deployment closest to the time the issue was created. Changes such as deployments account for a large share of incident root causes, so having that information at hand can help you diagnose and resolve issues.
- Error logs: You can explore millions of log messages with a single click and use manual querying to help you find anomalous patterns and hard-to-find problems.
- Attributes to investigate: We scan the distribution of attributes and surface possible causes by finding significant changes in those distributions. For example, across your transaction events, we can check whether an individual user starts to account for an anomalous share of the requests sent to your app. You can also query attributes that look interesting.
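The "attributes to investigate" idea can be illustrated by comparing how requests are shared among attribute values in a baseline window and in the current window. This is a hypothetical sketch and not the statistical method New Relic uses. It flags any attribute value (here, a user ID) whose share of requests grew by at least a fixed threshold. The function names, the `min_increase` threshold, and the sample data are assumptions.

```python
from collections import Counter
from typing import Dict, List


def shares(values: List[str]) -> Dict[str, float]:
    """Fraction of requests attributed to each attribute value."""
    total = len(values)
    if total == 0:
        return {}
    return {v: c / total for v, c in Counter(values).items()}


def anomalous_values(baseline: List[str], current: List[str],
                     min_increase: float = 0.2) -> Dict[str, float]:
    """Return values whose share grew by at least min_increase (assumed threshold)."""
    before = shares(baseline)
    flagged = {}
    for value, share in shares(current).items():
        increase = share - before.get(value, 0.0)
        if increase >= min_increase:
            flagged[value] = round(increase, 3)
    return flagged


baseline = ["u1", "u2", "u3", "u4"] * 25          # four users, 25% each
current = ["u1", "u2", "u3"] * 10 + ["u4"] * 70   # u4 jumps to 70%
print(anomalous_values(baseline, current))
```

Here `u4` goes from 25% to 70% of requests, so it is the only value flagged. A fixed threshold keeps the example short. A production approach would more likely use a significance test that accounts for sample size.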
The issue timeline, as presented below, shows you a breakdown of:
- The trends taking place
- What incidents are active
- What incidents are resolved
- Which incidents are correlated with each other
- Various milestones at different issue levels
In addition, you’ll see a grey line at the top of the timeline. While the rest of the timeline shows changes to each incident, the grey line represents changes to the issue as a whole.
Mouse over the grey line to see details of the event.
Finally, mouse over an incident to see information on its location, timing, and level of importance.
This figure shows a particular incident populated on January 11th with a level of Critical.
To view the issue in text format, click Switch to issue log view in the right-hand corner.
The related activity section shows the set of incidents that a rule-based system has aggregated into a single issue.
For each entry, this section shows the Last Update, Source location, State, number of Related Events, and where it Originated. You can also copy the Payload or click Analyze for more information.
To further reduce noise or improve incident correlation, you can change or customize your decisions. Decisions determine how Incident Intelligence groups incidents together.
To get started, see Decisions.
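You configure decisions in Incident Intelligence itself. The sketch below models only the general idea of rule-based grouping. It assumes a hypothetical decision that correlates two incidents when they share the same value for a label key and were created within a time window. Each incident is then added greedily to the first issue that contains a match. The `Decision` class, its fields, and the sample incidents are illustrative and are not the product's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Incident:
    id: str
    created_at: int            # seconds since epoch
    labels: Dict[str, str]


@dataclass
class Decision:
    """Hypothetical rule: same value for `key`, created within `window_s` seconds."""
    key: str
    window_s: int

    def matches(self, a: Incident, b: Incident) -> bool:
        value = a.labels.get(self.key)
        same_value = value is not None and value == b.labels.get(self.key)
        return same_value and abs(a.created_at - b.created_at) <= self.window_s


def group_into_issues(incidents: List[Incident], decisions: List[Decision]) -> List[List[str]]:
    """Add each incident, oldest first, to the first issue holding a matching incident."""
    issues: List[List[Incident]] = []
    for inc in sorted(incidents, key=lambda i: i.created_at):
        for issue in issues:
            if any(d.matches(inc, other) for d in decisions for other in issue):
                issue.append(inc)
                break
        else:
            issues.append([inc])  # no decision matched: open a new issue
    return [[i.id for i in issue] for issue in issues]


incidents = [
    Incident("a", 0, {"service": "checkout"}),
    Incident("b", 120, {"service": "checkout"}),
    Incident("c", 130, {"service": "search"}),
    Incident("d", 5000, {"service": "checkout"}),
]
print(group_into_issues(incidents, [Decision(key="service", window_s=600)]))
```

With a 600-second window, incidents `a` and `b` end up in the same issue. Incident `c` has a different service and incident `d` falls outside the window, so each of them opens its own issue.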
If you need more help, check out these support and learning resources:
- Browse the Explorers Hub to get help from the community and join in discussions.
- Find answers on our sites and learn how to use our support portal.
- Run New Relic Diagnostics, our troubleshooting tool for Linux, Windows, and macOS.
- Review New Relic's data security and licenses documentation.