The Issues feed page is where you can find an overview of all your issues, along with helpful information about them. You can also click any individual issue for more detail, including its analysis summary, event log, and details about correlated issues.
Go to one.newrelic.com > All capabilities > Alerts & AI > Issues & activity. This screenshot shows an example issue feed, which describes your issues' statuses, correlations, and more.
You can search for any issue by typing free text into the search bar. The search bar lets you search by:
- Issue name
- Issue ID
- Policy Name
- Condition Name
- Entity name
- Entity ID
If you click the filter icon, you can filter by any issue attributes or any related tag:
- Issue state (created, active, or closed)
- Issue acknowledged (true or false)
- Issue muted (true or false)
- Issue correlated (true or false)
- Issue priority (low, medium, high, or critical)
- Issue source (NR alerts, NR anomalies, or REST API)
- Alert policy
- Alert condition
- Tags (search for any tag related to the issue)
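As a rough illustration of how these filters combine, here is a minimal Python sketch that narrows a list of issues by attribute values and tags. The `Issue` class, its field names, and the `filter_issues` helper are hypothetical and exist only for this example. They are not part of any New Relic API, and the sketch assumes that all selected filters must match at once.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    # Hypothetical record mirroring the filterable attributes listed above.
    name: str
    state: str       # "created", "active", or "closed"
    priority: str    # "low", "medium", "high", or "critical"
    acknowledged: bool = False
    muted: bool = False
    correlated: bool = False
    source: str = "NR alerts"
    tags: dict = field(default_factory=dict)


def filter_issues(issues, tags=None, **attributes):
    """Keep issues that match every given attribute and every given tag."""
    tags = tags or {}
    return [
        issue for issue in issues
        if all(getattr(issue, name) == value for name, value in attributes.items())
        and all(issue.tags.get(key) == value for key, value in tags.items())
    ]


feed = [
    Issue("High CPU on checkout", state="active", priority="critical",
          tags={"team": "payments"}),
    Issue("Slow queries", state="active", priority="high", muted=True),
    Issue("Disk full", state="closed", priority="critical", acknowledged=True),
]

matches = filter_issues(feed, state="active", priority="critical")
print([issue.name for issue in matches])
```

Passing `tags={"team": "payments"}` narrows the result further, in the same way that adding a tag filter does in the UI.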
You can also use the feed sorting option to find what you're looking for faster. For example, sort by issue duration or created time.
The Issues page provides you with bottom-line insights so you can first understand the problem, and then minimize the time to resolve it.
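The sorting behavior can be pictured with a short Python sketch. The dictionary keys (`created_at`, `duration`) and the `sort_feed` helper are assumptions made for illustration only. They do not reflect how the feed is implemented.

```python
from datetime import datetime, timedelta

# Hypothetical feed entries; the keys are illustrative, not a New Relic schema.
feed = [
    {"name": "High CPU", "created_at": datetime(2024, 1, 11, 9, 30),
     "duration": timedelta(minutes=45)},
    {"name": "Slow queries", "created_at": datetime(2024, 1, 11, 11, 0),
     "duration": timedelta(hours=3)},
    {"name": "Disk full", "created_at": datetime(2024, 1, 11, 10, 15),
     "duration": timedelta(minutes=10)},
]


def sort_feed(issues, by, descending=True):
    """Return a new list of issues ordered by the given key."""
    return sorted(issues, key=lambda issue: issue[by], reverse=descending)


longest_first = sort_feed(feed, by="duration")
oldest_first = sort_feed(feed, by="created_at", descending=False)
```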
We've created a quick demo here to walk you through the issue page user interface:
The Issues page includes the following sections:
Issue payload: This provides you the issue payload details and lets you copy the payload with a click of a button.
Issue destinations: Under the issue name, you'll see a category called Notified, which will showcase your destinations. Hover over the destinations and you'll see a detailed presentation of the destinations and a link to the ticket that has been opened.
Issue duration: At the top of the Issue name, you'll see the duration that the issue lasted.
Number of incidents: At the top of the incident activity section, you'll see the total number of incidents within the issue.
Incident list: On the left side of the incident activity section, you'll see the incident activity list, which displays the most relevant information about each incident, such as the priority, state, name of the incident, the date and time it was created, and the duration. You can also sort the list by duration, by muted incidents, from newest to oldest, and from critical to low priority. Finally, you can show only open incidents by enabling the Show open only button.
Incident activity chosen: Clicking on an incident from the incident activity list will open the incident on the right in a full view mode, which will include information like alert policy, the alert condition, and the alert condition type.
Incident graph: Clicking on an incident from the incident activity list will also open up the incident chart in full scope, which will allow you to better visualize the degradation, incident, and recovery periods.
User actions (above the chart): Clicking on an incident from the incident activity list will display certain actions that you can take:
- A Runbook URL will appear here if this condition has a Runbook URL defined.
- An entity overview button that redirects you to the entity summary page with the incident time window (any entity type).
- A See errors button appears only when there are error groups from errors inbox that are related to the same entity and occurred in the same time window as the incident.
- The ellipsis icon ... has a dropdown menu that contains the option to close a certain incident (only when the issue has more than 1 incident) and also displays the incident payload.
Incident entity section (below the chart): If you click on an incident from the incident activity list, you will see the following:
The impacted entity, the entity type, and the name of the account this incident belongs to.
A list of tags (entity & condition tags) and their values. Click on the Show all button to display all the tags.
A postmortem is a retrospective process that teams use to analyze what worked and what didn't when responding to and resolving an incident.
In the New Relic platform, the postmortem feature is a tool that automatically collects data related to an incident, freeing up your team to focus on analysis and action items for improved responses to future incidents.
The postmortem includes:
- the record of an incident, including descriptions
- a timeline of the incident
- the incident's impact
- the incident's root causes
- mitigation measures taken by your team
- follow-up action items to prevent the incident from recurring in the future
For detailed steps on creating a postmortem or to watch our walk-through demo, visit our Postmortem documentation page.
Root cause analysis automatically finds potential causes for an issue and its impacted entities. It shows you why open issues occurred, which deployments contributed, and relevant error logs and attributes. With this, you can investigate the problem and reduce your mean time to resolution (MTTR).
Note that root cause analysis is dependent on other New Relic data sources and features. This is why root cause analysis information may not always be present for every issue.
When you select an issue, you may see Root cause analysis information.
Root cause analysis includes three main UI sections:
- Deployment events: When you set up deployments, we provide the deployment nearest to the issue creation. Changes, such as deployments, account for a high percentage of the root causes of incidents and having that information at hand can help diagnose and resolve issues.
- Error logs: You can explore millions of log messages with a single click and use manual querying to help you find anomalous patterns and hard-to-find problems.
- Attributes to investigate: We scan the distribution of attributes and surface possible causes by finding significant changes in the distribution. This section also shows changes in database and external metrics. You can also query interesting attributes.
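To make the "deployment nearest to the issue creation" idea concrete, here is a minimal Python sketch. It assumes "nearest" means the smallest absolute time difference, whether the deployment happened before or after the issue was created. The source doesn't state the exact rule, so treat this as one plausible reading. The `nearest_deployment` helper and the record fields are hypothetical.

```python
from datetime import datetime


def nearest_deployment(deployments, issue_created_at):
    """Return the deployment closest in time to issue creation, or None.

    Assumption: distance is the absolute difference between timestamps.
    """
    if not deployments:
        return None
    return min(
        deployments,
        key=lambda d: abs((d["timestamp"] - issue_created_at).total_seconds()),
    )


# Hypothetical deployment records for one entity.
deployments = [
    {"version": "1.4.0", "timestamp": datetime(2024, 1, 11, 9, 0)},
    {"version": "1.4.1", "timestamp": datetime(2024, 1, 11, 11, 45)},
    {"version": "1.5.0", "timestamp": datetime(2024, 1, 12, 8, 0)},
]

issue_created_at = datetime(2024, 1, 11, 12, 0)
closest = nearest_deployment(deployments, issue_created_at)
print(closest["version"])
```

If only deployments that precede the issue should count as candidate causes, filter the list to timestamps at or before `issue_created_at` before calling the helper.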
Mouse over an entity to see more information about it.
In the Impacted entities section, an issue map is available for any issue involving two or more entities. The map shows not only the affected entities, but also the services and resources directly related to those entities.
Impacted entities: An entity is anything that has data you can monitor. Specifically, these are focused on incidents from New Relic sources, extracting the entities and providing a summary. Each entity is unique. You can see your entities in a list or on a map.
Depending on the data in an issue, all four of these sections can show up together for each issue or separately. All you have to do is click the three dots next to the entity to open the dropdown menu with the following options:
- See dashboards helps users in your account look at and interact with dashboards you've created that are related to an entity. The queries you've run to power the various widgets are automatically mapped to entities whenever possible and are presented back to you here for quick access and discovery.
- Entity view will open the application's anomalies page. This is only available for applications that are set up for proactive detection.
- There are two types of deployment events: deployments and related deployments. Click Show all deployments to see all your deployment events when they arrive, or click a specific deployment to see its deployments page. The APM deployment page lists recent deployments and their impact on your end user and app server's Apdex scores, response times, throughput, and errors. This section will only show up if New Relic has identified applications under the impacted entities that have deployments.
The issue timeline, as presented below, shows you a breakdown of:
- The trends taking place
- What incidents are active
- What incidents are resolved
- Which incidents are correlated with each other
- Various milestones at different issue levels
To view the issue log, toggle the issue log button. There you can view the timestamp and notification details. You can also click "show more" to see the full issue log.
In addition, you'll see a grey line at the top of the timeline. In comparison to the visual timeline that shows the changes to each incident, the grey line represents changes to the issue.
Mouse over the grey line to see details of the event.
Finally, mouse over the incident to see information on the location, timing, and level of importance of a specific incident.
This figure shows a particular incident populated on January 11th with a level of Critical.
To view the issues in a text format, click Switch to issue log view in the right-hand corner.
Here you'll find an indication of which of the four golden signals is related to the issue. A related signal means that there is a problem affecting the performance or availability of your distributed system in one or more of these key areas.
We analyze the issue title and, based on that analysis, select a value from the list of components.
We also present a list of tags that are derived from the stack trace, based on Stack Overflow tags analysis.
To further reduce noise or get improved incident correlation, you can change or customize your decisions. Decisions determine how incidents are grouped together.
To get started, see Decisions.