Understand and use distributed trace UI

New Relic's distributed tracing helps you monitor and analyze the behavior of your distributed system. This document explains how to understand and use the UI.

Get started

In New Relic, you can find the distributed tracing UI in two places:

In New Relic One

To understand how distributed tracing data is displayed in New Relic One, see Distributed tracing in New Relic One.

In New Relic APM

Go to rpm.newrelic.com/apm > (select an application) > Distributed tracing.

Overview of UI

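Our distributed tracing feature has three main UI views, each described later in this document: the list of traces page, the trace details page, and the span details pane.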

Solve problems with distributed tracing

Here are some examples of using distributed tracing to solve problems:

Diagnosing and fixing a latency issue
  1. An engineer receives an alert notification showing that their service is experiencing frequent high latency.
  2. The engineer goes to the APM Overview page and sees a large increase in response time from dependent services.
  3. They go to the list of traces and apply a filter for duration > 2, so they can look at only the traces with spans over that duration.
  4. They select a trace with unusually long latency. On the trace details page, they see there's a call being made to a dependent service that is querying a database that is responding very slowly.
  5. They check with the owner of that service and learn that the team is currently working on scaling the database due to unexpectedly high throughput.

Analyze errors in requests spanning multiple services
  1. An engineer is tasked with troubleshooting errors occurring in a complex transaction spanning many services.
  2. They go to the list of traces and filter down to that specific request.
  3. They filter down to show only traces containing errors.
  4. On the trace details page, they can see the span along the request route that originated the error.
  5. Noting the error class and message, they navigate to the service from its span in the trace and see that the error is occurring at a high rate. They ask the service owner to look into that error.

List of traces UI page

New Relic distributed tracing main page with numbers
rpm.newrelic.com/apm > (select an application) > Distributed tracing: On the first distributed tracing UI page, you can see all available traces and filter down to those having specific criteria. (This screenshot shows New Relic APM, not New Relic One distributed tracing.)

Some features of the trace list (screenshot above):

  1. Trace scatter plot: Shows a plot of traces that lets you easily see outliers. Select a plot point to see details for that trace.
  2. Trace list: Shows a list of traces, along with information like the root span duration, the number of spans in a trace, the number of errors, and the number of services. Select a trace to see trace details.
  3. Filter: View only traces that contain spans with certain traits. For advanced options, select the filter dropdown. For details on using this feature, see Filter details.
  4. Group by: Group the displayed trace scatter plot in different ways. For more on the grouping options, see Trace grouping.

Trace details UI page

New Relic distributed tracing trace waterfall UI
rpm.newrelic.com/apm > (select an application) > Distributed tracing > (select a trace) > (select a span): See the spans in a trace. Examine individual span details and see notifications for spans with anomalous behavior. (This screenshot shows New Relic APM, not New Relic One distributed tracing.)

When you select a trace from the trace list, you see that trace's timeline and spans (see screenshot above).

The UI indicates some basic span properties with icons:

Span property | Indicator | Description
Service | New Relic distributed tracing datastore span icon | This icon represents a span that's a service's entry point.
In-process | New Relic distributed tracing in-process span icon | This icon represents an in-process span, which is a span that takes place within a process (as opposed to a cross-process span). Examples: middleware instrumentation, user-created spans.
Datastore | New Relic distributed tracing datastore span icon | This icon represents a span call to a datastore.
External | New Relic distributed tracing external span icon | This icon represents a call to an external service made via HTTP.
Lambda | New Relic distributed tracing external span icon | This icon represents a span from a Lambda function.

Some spans will have other indicators of span properties and relationships:

Span property | Indicator | Description
Type of connection | Lines (dotted, solid) | Solid lines indicate a direct parent-child relationship; in other words, one process or function directly calling another. A dotted line indicates a non-direct relationship. For more on relationships between spans, see Trace structure.
Errors | Red text | A span with red text indicates the presence of an error.
Anomalous | New Relic distributed tracing datastore span icon | This icon represents the detection of an anomalous span.
Orphaned spans | New Relic distributed tracing - fragmented trace icon | Some spans may be "orphaned," or separated, from the trace. These spans will appear at the bottom of the trace. For more details, see Fragmented traces.
Multiple app names | New Relic distributed tracing multiple app names indicator | When beside a span name, this represents an application that has had multiple app names set in New Relic. Selecting this will show you all app names it reports to.
Client/server time difference | New Relic distributed tracing client-server time difference indicator | If a span's duration indicator is not completely colored in (like in this example), it means that there is a time discrepancy between the server-side duration and the client-side duration for that activity. For details on this, see Client/server time difference.

For more on the trace structure and how span properties are determined, see Trace structure.

Span details pane

When you select a span, a pane opens up with span details. These details can be helpful for troubleshooting performance issues. Details include:

What exactly a span displays is based on its span type. For example, a datastore span's name attribute will contain the datastore query.

Trace grouping options

When viewing the trace scatter plot, there are several Group traces by options:

  • Errors: Group by whether traces contain errors or not.
  • Root service: Group by the name of the first service in traces. In a trace where Service A calls Service B and Service B calls Service C, the root service would be Service A.
  • Root entry span: Group by the root transaction: in other words, the root service's endpoint. In a trace where Service A calls Service B and Service B calls Service C, the root entry span is Service A's endpoint. For example: "Service A - GET /user/%".
  • Service entry span: Group by the span name of the service currently being viewed in APM. For example, for a trace where Service A calls Service B and Service B calls Service C: if you are viewing Service B in APM and select this grouping, the traces will be represented by their Service B span names. If a service has multiple spans in a trace, this grouping option will use that service's first entry point.
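
To make the grouping options more concrete, here is a minimal Python sketch of how traces could be bucketed by errors, root service, or root entry span. The record layout and field names (root_service, root_entry_span, error_count) are assumptions made for illustration only; they are not New Relic's data model or API.

```python
from collections import defaultdict

# Hypothetical trace records; the field names are assumptions for illustration,
# not New Relic's actual data model.
traces = [
    {"id": "t1", "root_service": "Service A",
     "root_entry_span": "Service A - GET /user/%", "error_count": 0},
    {"id": "t2", "root_service": "Service A",
     "root_entry_span": "Service A - POST /order", "error_count": 2},
    {"id": "t3", "root_service": "Service D",
     "root_entry_span": "Service D - GET /health", "error_count": 0},
]


def group_traces(traces, key_fn):
    """Bucket trace IDs by whatever key_fn returns for each trace."""
    groups = defaultdict(list)
    for trace in traces:
        groups[key_fn(trace)].append(trace["id"])
    return dict(groups)


# Errors: group by whether a trace contains errors or not.
by_errors = group_traces(
    traces, lambda t: "errors" if t["error_count"] > 0 else "no errors"
)
# Root service: group by the name of the first service in the trace.
by_root_service = group_traces(traces, lambda t: t["root_service"])
# Root entry span: group by the root service's endpoint.
by_root_entry_span = group_traces(traces, lambda t: t["root_entry_span"])
```

Each grouping only changes the key used to bucket the same set of traces, which is why switching the Group traces by option changes how the scatter plot is organized without changing which traces are shown.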

Query trace data

For how to query your trace data, see Example queries.

Additional UI details

Some additional distributed tracing UI details, rules, and limits:

Anomalous spans

If a span is displayed as anomalous in the UI, it means that the following are both true:

  • The span is more than two standard deviations slower than the average of all spans with the same name from the same service over the last six hours.
  • The span's duration is more than 10% of the trace's duration.
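
The two conditions above can be sketched in Python as follows. This is an illustration under stated assumptions, not New Relic's implementation: the function name and parameters are hypothetical, the baseline durations are assumed to have been collected already, and the use of the population standard deviation is a guess.

```python
from statistics import mean, pstdev


def is_anomalous(span_duration, trace_duration, baseline_durations):
    """Return True only when both anomaly conditions hold.

    baseline_durations: durations of spans with the same name from the same
    service over the last six hours (gathering them is out of scope here).
    """
    if len(baseline_durations) < 2 or trace_duration <= 0:
        return False  # assumption: too little data to judge

    avg = mean(baseline_durations)
    std = pstdev(baseline_durations)  # assumption: population std deviation

    # Condition 1: more than two standard deviations slower than the average.
    slower_than_peers = span_duration > avg + 2 * std
    # Condition 2: the span accounts for more than 10% of the trace's duration.
    significant_share = span_duration > 0.10 * trace_duration
    return slower_than_peers and significant_share


baseline = [100, 110, 90, 105, 95]
slow_and_significant = is_anomalous(150, 1000, baseline)
slow_but_small_share = is_anomalous(150, 2000, baseline)
```

In this example the first call meets both conditions, while the second span is just as slow but makes up less than 10% of its trace, so it would not be flagged.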
Client span duration: time differences between client and server spans

When a process calls another process, and both processes are instrumented by New Relic, the trace contains both a client-side representation of the call and a server-side representation. The client span (calling process) can have time-related differences when compared to the server span (called process). These differences could be due to:

  • Clock skew, due to system clock time differences
  • Differences in duration, due to things like network latency or DNS resolution delay

The UI shows these time-related differences by displaying an outline of the client span in the same space as the server span. This outline represents the duration of the client span.

It isn't possible to determine every factor contributing to these time-related discrepancies, but here are some common span patterns and tips for understanding them:

New Relic distributed tracing client vs server time discrepancy diagram
  1. When a client span is longer than the server span, this could be due to latency in a number of areas, such as network time, queue time, DNS resolution time, or time spent in a load balancer that we cannot see.
  2. When a client span starts and ends before a server span begins, this could be due to clock skew, or due to the server doing asynchronous work that continues after sending the response.
  3. When a client span starts after a server span, this is most likely clock skew.
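
The three patterns above can be expressed as a small Python classifier. This is a sketch for reasoning about the patterns, not how the UI decides what to draw; the function name is hypothetical and the timestamps are assumed to be numbers in the same unit (for example, milliseconds).

```python
def describe_time_difference(client_start, client_end, server_start, server_end):
    """Map client/server span timings to the three patterns listed above."""
    # Pattern 2: the client span starts and ends before the server span begins.
    if client_end <= server_start:
        return "client ends before server begins: clock skew or async server work"
    # Pattern 3: the client span starts after the server span.
    if client_start > server_start:
        return "client starts after server: most likely clock skew"
    # Pattern 1: the client span is longer than the server span.
    if (client_end - client_start) > (server_end - server_start):
        return "client longer than server: network, queue, DNS, or load balancer latency"
    return "no notable discrepancy"


pattern_1 = describe_time_difference(0, 100, 10, 60)
pattern_2 = describe_time_difference(0, 50, 60, 120)
pattern_3 = describe_time_difference(20, 80, 10, 60)
```

The checks run from the most specific pattern to the most general one, because a client span that ends before the server span begins could otherwise also be reported as simply "longer" or "shorter".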
Fragmented traces

Fragmented traces are traces with missing spans. When a span is missing or has an invalid parent span ID, its child spans become separated from the rest of the trace; we refer to these spans as "orphaned." Orphaned spans appear at the bottom of the trace, and they lack connecting lines to the rest of the trace. Types of orphaned span properties indicated in the UI:

  • No root span. Missing the root span, which is the first operation in the request. When this happens, the span with the earliest timestamp is displayed as the root.
  • Orphaned span. A single span with a missing parent span. This could be due to the parent span having an ID that doesn't match its child span.
  • Orphaned trace fragment. A group of connected spans where the first span in the group is an orphan span.

This can happen for a number of reasons, including:

  • Collection limits. Some high-throughput applications may exceed collection limits (for example, APM agent collection limits, or API limits). When this happens, it may result in traces having missing spans. One way to remedy this is to turn off some reporting, so that the limit is not reached.
  • Incorrect instrumentation. If an application is instrumented incorrectly, it won't pass trace context correctly and this will result in fragmented traces. To remedy this, examine the data source that is generating orphan spans to ensure instrumentation is done correctly. To discover a span's data source, select it and examine its span details.
  • Spans still arriving. If some parent spans haven't been collected yet, this can result in temporary gaps until the entire trace has reported.
  • UI display limits. Orphaned spans may result if a trace exceeds the 10K span display limit.
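
As an illustration of the orphaned span and orphaned trace fragment definitions above, the following Python sketch finds spans whose parent is not present in the collected trace and then gathers each orphan's descendants. The span records and their keys (id, parent_id) are hypothetical, and the sketch assumes span IDs are unique and parent links contain no cycles.

```python
def find_orphans(spans):
    """Return (root_ids, orphan_ids) for a list of span dicts.

    Each span is a dict with hypothetical keys "id" and "parent_id";
    parent_id is None for a root span.
    """
    known_ids = {s["id"] for s in spans}
    roots = [s["id"] for s in spans if s["parent_id"] is None]
    # An orphan names a parent that is not among the collected spans.
    orphans = [
        s["id"] for s in spans
        if s["parent_id"] is not None and s["parent_id"] not in known_ids
    ]
    return roots, orphans


def fragment_members(spans, orphan_id):
    """Collect an orphan and all of its descendants (an orphaned trace fragment)."""
    children = {}
    for s in spans:
        children.setdefault(s["parent_id"], []).append(s["id"])
    members, stack = [], [orphan_id]
    while stack:
        current = stack.pop()
        members.append(current)
        stack.extend(children.get(current, []))
    return members


spans = [
    {"id": "a", "parent_id": None},
    {"id": "b", "parent_id": "a"},
    {"id": "d", "parent_id": "c"},  # parent "c" was never collected
    {"id": "e", "parent_id": "d"},
]
roots, orphans = find_orphans(spans)
fragment = fragment_members(spans, orphans[0])
```

Here span "d" is an orphan because its parent "c" is missing, and "d" together with its child "e" forms an orphaned trace fragment.
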
Filter functionality

There are differences in how trace filtering works in New Relic One and New Relic APM:

New Relic APM

Some rules governing trace filtering:

  • Filtering based on current application. The attributes available for filtering are only those available in the application you are currently viewing in New Relic. If you do not see attributes you expect to see, it's probably because they are not available for the application you are viewing. If that is the case, you will need to go to the application index and select the application where those attributes are captured.
  • Multi-attribute filter affected by first attribute selected. Distributed tracing reports two types of event data: transaction events and span events. When you select an attribute in the filter, the event type that attribute is attached to dictates which other attributes are available. For example, if you filter on an attribute that is attached to a Transaction event, only Transaction event attributes will be available when you add filters on additional attribute values.

New Relic One

In New Relic One distributed tracing, you can search for attributes from across all spans in a trace.

For details, see Distributed tracing in New Relic One.

Trace details obfuscated based on account access

If you don’t have access to the New Relic accounts that monitor other services, some of the span and service details will be obfuscated in the UI. Obfuscation can include:

  • Span name concealed by asterisks
  • Service name replaced with New Relic account ID and app ID

The two main factors affecting this obfuscation:

  • Account permissions. Master/sub-account relationships will impact access. If you have access to only a sub-account, you’ll be able to see details for only that sub-account. If you have access to a master account, you’ll be able to see details for that account’s sub-accounts.
  • Authentication. You’ll be able to see span details only for New Relic accounts you can access based on your current login. This means that, for example, even the admin of a master account may not be able to see all details if the trace crosses the boundaries of different authentication mechanisms.

Span collection and display limits, and sampling details

New Relic APM agents have a limit of 1,000 on the number of spans that can be collected per agent instance. The agents use sampling to select the requests chosen for a trace. For more on sampling and agent span limits, see Sampling.

The maximum total number of spans displayed in the UI is 10,000.

Incomplete span names in waterfall view

When viewing the span waterfall, span names may be displayed in an incomplete form that is more human-readable than the complete span name. To find the complete name, select that span and look for the Full span name. Knowing the complete name can be valuable for querying that data in Insights.

Missing spans and span/service count discrepancies

A trace may sometimes have (or seem to have) missing spans or services. This can manifest as a discrepancy between the count of a trace's spans or services displayed in the trace list and the count displayed on the trace details page.

Reasons for missing spans and count discrepancies include:

All spans collected, including those not displayed, can be queried in Insights.

For more help

Recommendations for learning more: