Error analytics: Explore the events behind the errors

Depending on your New Relic agent version and the time period you select, New Relic APM's Error analytics feature can provide either:

  • An events view, which allows detailed investigation of error information within the past eight days through grouping and filtering
  • A metrics view, which provides summary error rate information for any time period that falls outside or extends beyond the past eight days

Access to this feature depends on your subscription level.

Caps on error reporting

New Relic caps error reporting at 100 events per minute per agent instance. This prevents error reporting from negatively impacting application performance. If your app reports more errors than this cap allows, a Too many errors banner appears on the Error analytics page to let you know that New Relic did not record every error.

Examples:

  • App running on five hosts (one agent instance per host): New Relic caps error reporting at 100 events per minute × 5 instances = 500 events per minute.
  • App running on one host with ten instances: New Relic caps error reporting at 100 events per minute × 10 instances = 1,000 events per minute.
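
The cap arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming a flat limit of 100 events per minute per agent instance (as stated above) and assuming that events over an instance's limit in a given minute are simply not recorded; both helper functions are hypothetical. Because the cap applies per instance, an app can lose events even when its total stays within the app-wide figure, if one instance is busier than the others.

```python
# ASSUMPTION: a flat cap of 100 error events per minute per agent instance,
# with events beyond an instance's cap in a given minute not recorded.
CAP_PER_INSTANCE_PER_MINUTE = 100  # value stated in this article


def app_cap_per_minute(instances: int) -> int:
    """Hypothetical helper: total error events per minute across all instances."""
    if instances < 1:
        raise ValueError("an app needs at least one agent instance")
    return CAP_PER_INSTANCE_PER_MINUTE * instances


def unrecorded_errors(errors_per_instance: list[int]) -> int:
    """Hypothetical helper: errors in one minute that exceed each instance's own cap."""
    return sum(max(0, n - CAP_PER_INSTANCE_PER_MINUTE) for n in errors_per_instance)


if __name__ == "__main__":
    print(app_cap_per_minute(5))    # five hosts, one instance each -> 500
    print(app_cap_per_minute(10))   # one host running ten instances -> 1000
    # Three instances report 300 errors in total (the app-wide cap is 300),
    # yet the busiest instance still exceeds its own cap by 20.
    print(unrecorded_errors([120, 80, 100]))  # -> 20
```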

View the Error analytics page

To view the Error analytics page, do either of the following from rpm.newrelic.com:

  • Select APM > Applications > (selected app) > Events > Error analytics.
  • Select APM > Applications > (selected app) > Monitoring > Overview, then select the Error rate table's title.

Use any of New Relic's standard page functions to drill down into detailed information.

Screenshot: Error analytics with alerts. APM > Applications > (selected app) > Events > Error analytics: The Error rate chart always shows the rate and count for all errors. To drill down further, use the grouping and filter options for the Top 5 errors chart. This example shows an app in an alert state: the chart background changes to light pink at the Warning threshold and to dark pink for a Critical condition. You can also use the Error traces table and Error frequency heatmap to explore specific error details and trends over time.

Select the time period for error data

With the Error analytics events view and the time picker, you can examine error event details from the past week. The events view supports a window of up to seven days, drawn from data collected over the last eight days.
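
The time-window rules above can be expressed as a small check. This is a minimal sketch, assuming only the two limits stated in this article (eight days of retained event data, at most seven days per events-view window). It ignores agent version and subscription level, which also affect which view you get, and it treats any window longer than seven days as metrics-only; the function name pick_view is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: limits taken from this article, not from any New Relic API.
EVENT_RETENTION = timedelta(days=8)    # event data covers the last eight days
MAX_EVENTS_WINDOW = timedelta(days=7)  # one events-view window spans at most seven days


def pick_view(start: datetime, end: datetime, now: datetime) -> str:
    """Hypothetical helper: return "events" or "metrics" for a time-picker window."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    inside_retention = start >= now - EVENT_RETENTION and end <= now
    short_enough = end - start <= MAX_EVENTS_WINDOW
    return "events" if inside_retention and short_enough else "metrics"


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    print(pick_view(now - timedelta(days=7), now, now))                       # events
    print(pick_view(now - timedelta(days=10), now - timedelta(days=3), now))  # metrics: starts too early
    print(pick_view(now - timedelta(days=8), now, now))                       # metrics: window too long
```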

If your selected time window ends now, you may notice slight differences in counts. This happens because the list and the table may request their counts at slightly different times as the page auto-refreshes.

Use the error events view workflow

Here is a basic workflow for getting the most out of the Error analytics events view.

  1. Start with the Error rate chart to see at a glance whether there are any unexpected spikes, dips, or patterns with errors in general.
  2. Correlate any general patterns on the Top 5 errors chart with alerts occurring during the same time period. Use the groups and filters to examine the error events and attributes in more detail, and look for patterns in error messages or transaction names.
  3. Explore and share Error traces table information, including specific stack trace details such as the associated host, user, framework code, and custom attributes.
  4. Identify error patterns on the Error frequency heatmap for a selected grouping (host, error message, custom attributes, etc.) within a time range.

Error rate chart: See patterns immediately

Start with the Error rate chart to see at a glance whether there are any unexpected spikes, dips, or patterns with errors in general. For example, are there any spikes near a recent deployment? You may want to change the selected time period to look for other historical patterns.

This chart always shows the overall error rate and count for the selected time period, even when you filter the rest of the page. If you want to focus your investigation on a particular type of error, use the Top 5 errors chart or the Error traces table.
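
To reason about what the chart plots, here is a minimal sketch that computes a per-minute error rate and flags spikes. It assumes error rate means the percentage of transactions in each time bucket that ended in an error; the Bucket structure and the spike threshold are hypothetical and are not New Relic settings. Like the chart, it counts every error and ignores any filters applied elsewhere on the page.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """One point on the chart (hypothetical structure)."""
    minute: str
    transactions: int
    errors: int


def error_rate(bucket: Bucket) -> float:
    """ASSUMPTION: error rate = errored transactions / all transactions, as a percentage."""
    if bucket.transactions == 0:
        return 0.0
    return 100.0 * bucket.errors / bucket.transactions


def spikes(buckets: list[Bucket], threshold_pct: float) -> list[str]:
    """Minutes whose error rate is above a threshold you pick (not a New Relic setting)."""
    return [b.minute for b in buckets if error_rate(b) > threshold_pct]


if __name__ == "__main__":
    data = [
        Bucket("10:00", transactions=1000, errors=5),
        Bucket("10:01", transactions=1000, errors=4),
        Bucket("10:02", transactions=800, errors=40),  # e.g. just after a deployment
    ]
    for b in data:
        print(b.minute, f"{error_rate(b):.1f}%")  # 0.5%, 0.4%, 5.0%
    print(spikes(data, threshold_pct=2.0))  # ['10:02']
```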

Screenshot: Error analytics events view. APM > Applications > (selected app) > Events > Error analytics: Here is an example of the events view. The Error rate chart always shows the rate and count for all errors. To drill down further, use grouping and filter options for the Top 5 errors chart, or examine the Error traces and Error frequency data.

Top five errors: Correlate to alerts or Insights events

Use the Top 5 errors chart to identify which types of errors occurred, and how many of each, during the time period shown in the Error rate chart. For example:

  • Is the error spike related to a specific class?
  • Do the top errors point to a new host that recently got moved into production?
  • Are the top error messages repeatedly about failed connections by hosts that you know are in a specific region of your organization?
  • Have the chart backgrounds changed color to indicate an alert condition? (Light pink indicates the alert condition's Warning threshold, and dark pink indicates the Critical threshold.)

Screenshot: Error analytics grouping and filters. APM > Applications > (selected app) > Events > Error analytics: Here is an example of grouping by HTTP response codes, then selecting 404 from the list of HTTP response codes to filter and look for trends related to 404 errors. The Error rate chart still shows all errors, but the Top 5 errors chart now shows only 404 errors during the same time period.

To adjust the Top 5 errors chart:

  • Change the "top 5" selection: By default, the Top 5 errors chart shows the top five errors by class. To filter or group by other attributes, such as error message, host, or transaction name, or to select any of your custom attributes, use the search window or select Back to groupings list.
  • Explore or share the error data in Insights: The Top 5 errors chart uses the default attributes of New Relic Insights error events, along with any custom attributes you have added to this event type. To examine the Top 5 errors data in more detail, or to share it with others, select the View query or View in Insights link that appears when you hover below the chart.
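
Conceptually, the grouping and filter controls count events per attribute value after discarding the events that don't match the filter. The sketch below illustrates that idea with hypothetical event dictionaries; the attribute names (class, status, host, transaction) and the top_n helper are illustrative only and are not meant to match the attribute names New Relic stores.

```python
from collections import Counter
from typing import Optional

# Hypothetical error events with illustrative attribute names.
EVENTS = [
    {"class": "NotFound", "status": 404, "host": "web-1", "transaction": "/products"},
    {"class": "NotFound", "status": 404, "host": "web-2", "transaction": "/cart"},
    {"class": "Timeout", "status": 504, "host": "web-1", "transaction": "/checkout"},
    {"class": "NotFound", "status": 404, "host": "web-1", "transaction": "/products"},
    {"class": "DBError", "status": 500, "host": "web-3", "transaction": "/checkout"},
]


def top_n(events, group_by: str, where: Optional[dict] = None, n: int = 5):
    """Count events per value of `group_by`, keeping only events that match `where`."""
    where = where or {}
    matching = [e for e in events if all(e.get(k) == v for k, v in where.items())]
    # most_common() breaks ties by first occurrence, so results are deterministic.
    return Counter(e.get(group_by) for e in matching).most_common(n)


if __name__ == "__main__":
    print(top_n(EVENTS, "class"))   # default view: group by error class
    print(top_n(EVENTS, "status"))  # group by HTTP status instead
    print(top_n(EVENTS, "host", where={"status": 404}))  # filter to 404s, then group by host
```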

Error traces: Dive deeper into stack traces, framework code, and more

Supplementing the two charts, the Error traces table groups errors by transaction name and error class, and links each group to its error traces.

Screenshot: Error analytics traces sort. APM > Applications > (selected app) > Events > Error analytics: Changing the sort order on any Error traces table column can help surface patterns more quickly. Here is an example of sorting by error message, which immediately reveals a pattern with the "execution expired" message.

Each row helps you find answers to questions such as:

  • How many errors with this transaction name and class occurred within the selected time period?
  • What is the most recent error message?
  • When did it first and last occur?

Sometimes it is more useful to sort error trace data from lowest to highest. For example:

  • Which error has the fewest occurrences?
  • When did a particular error stop (Last occurrence)?

You can change the sort order or filter options to focus on just the types of errors that matter the most to you and your teams. In addition, from the Error traces table, you can drill down into the stack trace and framework code, explore the related transaction, file a ticket, and more.
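
The table's rows and sort options can be approximated as follows. This is a minimal sketch over hypothetical (timestamp, transaction, class, message) tuples; the Row fields mirror the questions listed above (count, most recent message, first and last occurrence), and trace_rows is a hypothetical helper, not part of any New Relic API.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical error events: (timestamp, transaction name, error class, message).
EVENTS = [
    (datetime(2024, 1, 1, 9, 0), "/checkout", "Timeout", "execution expired"),
    (datetime(2024, 1, 1, 9, 5), "/checkout", "Timeout", "execution expired"),
    (datetime(2024, 1, 1, 9, 7), "/cart", "NotFound", "no such item"),
    (datetime(2024, 1, 1, 9, 9), "/checkout", "Timeout", "read timeout"),
]


@dataclass
class Row:
    transaction: str
    error_class: str
    count: int
    latest_message: str
    first_seen: datetime
    last_seen: datetime


def trace_rows(events) -> list[Row]:
    """Group events by (transaction, class), producing one row per group."""
    rows: dict[tuple[str, str], Row] = {}
    for ts, txn, cls, msg in sorted(events):  # oldest first
        row = rows.get((txn, cls))
        if row is None:
            rows[(txn, cls)] = Row(txn, cls, 1, msg, ts, ts)
        else:
            row.count += 1
            row.latest_message = msg  # events are in time order, so this is the newest
            row.last_seen = ts
    return list(rows.values())


if __name__ == "__main__":
    rows = trace_rows(EVENTS)
    # Lowest to highest: fewest occurrences first.
    for r in sorted(rows, key=lambda r: r.count):
        print(r.transaction, r.error_class, r.count, r.latest_message, r.last_seen)
```

Sorting by last_seen instead of count answers the "when did a particular error stop" question.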

Error frequency: Compare counts over time

To examine error counts and traces by a particular category within a specific time period, select any of the available attributes from the Back to groupings list. For example, to compare error counts between hosts, select Host as the grouping, and then filter by an individual host to see only the error traces for it.

Screenshot: Error analytics frequency. APM > Applications > (selected app) > Events > Error analytics: Here is an example of the Error frequency heatmap grouped by error message over the past seven days. The darker the color, the more errors occurred during that period.

The shaded heatmap helps you identify patterns at a glance: the darker the color, the more errors occurred during that period. To explore even deeper, select any area on the heatmap to see details including:

  • Total count
  • Number of traces created
  • Time period
  • Error trace details
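
A heatmap like this is essentially a count for each (grouping value, time bucket) cell. This minimal sketch builds that grid from hypothetical (timestamp, group) pairs and maps counts to a shade between 0 and 1 relative to the busiest cell. The daily bucket size and linear shading are assumptions, since this article does not say how New Relic sizes buckets or scales colors; heatmap and shade are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical error events: (timestamp, value of the selected grouping, here a host).
EVENTS = [
    (datetime(2024, 1, 1, 10), "web-1"),
    (datetime(2024, 1, 1, 11), "web-1"),
    (datetime(2024, 1, 2, 9), "web-2"),
    (datetime(2024, 1, 3, 23), "web-1"),
    (datetime(2024, 1, 9, 0), "web-1"),  # after the seven-day range, so ignored
]


def heatmap(events, start: datetime, bucket: timedelta, buckets: int) -> dict[str, list[int]]:
    """Count events per group and per time bucket; out-of-range events are skipped."""
    counts: Counter = Counter()
    for ts, group in events:
        index = (ts - start) // bucket  # timedelta // timedelta gives an int
        if 0 <= index < buckets:
            counts[(group, index)] += 1
    groups = sorted({group for group, _ in counts})
    return {g: [counts[(g, i)] for i in range(buckets)] for g in groups}


def shade(grid: dict[str, list[int]]) -> dict[str, list[float]]:
    """Map counts to 0.0-1.0, where 1.0 is the darkest (busiest) cell."""
    peak = max((c for row in grid.values() for c in row), default=0)
    return {g: [c / peak if peak else 0.0 for c in row] for g, row in grid.items()}


if __name__ == "__main__":
    grid = heatmap(EVENTS, datetime(2024, 1, 1), timedelta(days=1), buckets=7)
    print(grid)           # {'web-1': [2, 0, 1, 0, 0, 0, 0], 'web-2': [0, 1, 0, 0, 0, 0, 0]}
    print(grid["web-1"])  # filtering to one host amounts to picking its row
    print(shade(grid)["web-2"])
```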

Use the error metrics view

Use the error metrics view to analyze error trends in your applications over time. The metrics view is available for accounts, agents, and time windows that do not have access to the error events view.

The error metrics view includes these components:

  • A frequency chart of the top five errors for the specified time window, by transaction name
  • Application overview metrics showing the overall error rate, plus alert status and deployment markers for additional context
  • A list of recent error traces, depending on the selected time period

Screenshot: Error analytics metrics view. APM > Applications > (selected app) > Events > Error analytics: Here is an example of the metrics view showing data over the selected seven-day time period. Deployment markers may help identify reasons behind spikes or dips in errors.

For more help

Join the discussion about New Relic APM in the New Relic Online Technical Community! The Technical Community is a public platform to discuss and troubleshoot your New Relic toolset.

If you need additional help, get support at support.newrelic.com.