Understand and use the distributed tracing UI

Distributed tracing helps you monitor and analyze the behavior of your distributed system. After you enable distributed tracing, you can use our UI tools to search for traces and analyze them.

For example, let's say you are an engineer troubleshooting errors in a complex transaction spanning many services. Here's what you can do in our UI:

  1. Open the distributed tracing UI page.
  2. Sort through your traces using a filter to find that specific request and show only traces containing errors.
  3. On the trace details page, review the span along the request route where the error originated.
  4. Note the error class and message, then navigate to the service from its span in the trace so you can see that the error is occurring at a high rate.

Read on to explore the options in the distributed tracing UI.

Open the distributed tracing UI

Here's how you can access the distributed tracing UI, depending on the type of search you want to do:

View traces for a specific service

The Entity explorer and APM are two menu options that help you navigate to a specific service so you can see traces that include that service.

  1. Go to one.newrelic.com.
  2. Click Entity explorer or APM in the top menu bar.
  3. Filter to the service you enabled for distributed tracing by typing the service name, and then press Enter.
  4. In the left navigation's Monitor section, click Distributed tracing.

View traces across all accounts

If you want to view traces from across all accounts you have access to, there are two ways to view that UI:

If you don't have access to accounts for some services in a trace, we'll obfuscate some details for those services.

Sort through your traces

The opening distributed tracing page is populated with a list of traces, and you can quickly refine this list using these tools:

  • Filter using the query bar
  • Filter using the scatter plot
  • Filter traces using histograms

In addition to these tools, you can also use other options mentioned in Query distributed trace data.

Filter using the query bar

The Find traces query bar is a quick way to narrow your search for traces. You can either start typing in the query bar or use the dropdown to create a compound query.

Query results are based on span attributes, not on trace attributes. You define the criteria that spans must meet, and the search displays the traces that contain those spans.
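
To make the span-based matching concrete, here is a minimal Python sketch of the idea. It is only an illustration, not how the query bar is implemented: the span records and the field names `trace_id`, `service`, and `error` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def traces_containing(spans: Iterable[Dict], matches: Callable[[Dict], bool]) -> Set[str]:
    """Return the IDs of traces that contain at least one span satisfying `matches`."""
    return {span["trace_id"] for span in spans if matches(span)}


# Hypothetical span records, for illustration only.
spans = [
    {"trace_id": "t1", "service": "WebPortal", "error": False},
    {"trace_id": "t1", "service": "Inventory Service", "error": True},
    {"trace_id": "t2", "service": "WebPortal", "error": False},
]

# The criterion is evaluated per span; the result is a set of traces.
error_traces = traces_containing(spans, lambda s: s["error"])
```

Here only the trace `t1` is selected, because it is the only trace that contains a span with an error, even though its WebPortal span has none.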

If you use a multi-attribute filter, it is affected by the first attribute you select. Distributed tracing reports on two types of data: transaction events and spans. When you select an attribute in the filter, the data type that attribute is attached to dictates which other attributes are available. For example, if you filter on an attribute that is attached to a transaction event, only transaction event attributes are available when you add filters on additional attribute values.

Queries for traces are similar to NRQL (our query language). Here are the main exceptions:

  • String values don't require quotation marks (for example, you can use either appName = MyApp or appName = 'MyApp').
  • The like operator doesn’t require % (for example, you can use either appName like product or appName like %product%).
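
The two exceptions above can be sketched as a small normalizer that turns a relaxed query bar condition into the stricter quoted form. This is a hedged illustration only: the function name `to_strict_condition` is hypothetical, it handles just the `=` and `like` operators, it treats every value as a string, and it does not escape quotes embedded in values.

```python
import re

# attribute, operator (= or like), then the rest of the line as the value
_CONDITION = re.compile(r"^\s*(\S+)\s+(=|like)\s+(.+?)\s*$", re.IGNORECASE)


def to_strict_condition(condition: str) -> str:
    """Add the quotes and % wildcards that the query bar lets you omit."""
    match = _CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    attribute, operator, value = match.groups()
    # Drop surrounding single quotes if the value is already quoted.
    if len(value) >= 2 and value[0] == value[-1] == "'":
        value = value[1:-1]
    # A bare like pattern is treated as a "contains" match.
    if operator.lower() == "like" and "%" not in value:
        value = f"%{value}%"
    return f"{attribute} {operator} '{value}'"
```

With this sketch, both `appName = MyApp` and `appName = 'MyApp'` normalize to `appName = 'MyApp'`, and both `appName like product` and `appName like %product%` normalize to `appName like '%product%'`.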

Some queries that return a large number of results may return false positives. The trace list limits these incorrect results to 10% of the returned results. False positives may also result in histogram chart results that are not displayed in the trace list.

Here are two query bar examples:

Find traces that touch two services

The query in the image below finds traces that:

  1. Pass through both WebPortal and Inventory Service applications
  2. Have an Inventory Service datastore call that takes longer than 500 ms
  3. Contain an error in any span

Go to one.newrelic.com > Apps > Distributed tracing

Find error spans using the like operator

The query in the image below finds traces that:

  1. Contain spans that pass through the WebPortal application and where an error occurred on any span in the WebPortal application
  2. Contain spans where the customer_user_email attribute contains a value ending with hotmail.com anywhere in the trace.
New Relic One distributed tracing - query example 2
Go to one.newrelic.com > Apps > Distributed tracing

Filter using the scatter plot

The trace scatter plot on the opening page of distributed tracing is a quick way to search for outlying traces. You can move the cursor across the chart to view trace details and you can click individual points to get details.

Screenshot showing the distributed tracing scatter plot chart.
Go to one.newrelic.com > Apps > Favorites > Distributed tracing or go to one.newrelic.com > APM > (select an app) > Monitor > Distributed tracing.

When viewing the trace scatter plot, you can select the duration type in View by, and you can select one of these options in Group traces by:

  • Errors: Group by whether traces contain errors or not.
  • Root service: Group by the name of the first service in traces. In a trace where Service A calls Service B and Service B calls Service C, the root service would be Service A.
  • Root entry span: Group by the root transaction, which is the root service's endpoint. In a trace where Service A calls Service B and Service B calls Service C, the root entry span is Service A's endpoint. For example: "Service A - GET /user/%".
  • Service entry span: Group by the span name of the service currently being viewed in APM. For example, for a trace where Service A calls Service B and Service B calls Service C, if you are viewing Service B in APM and select this grouping, the traces will be represented by their Service B span names. If a service has multiple spans in a trace, this grouping option will use that service's first entry point.
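
The grouping options above can be sketched in Python as a function that computes a group label for one trace. This is an illustration under stated assumptions, not the UI's implementation: the span fields (`id`, `parent_id`, `service`, `name`, `timestamp`, `error`) and the mode names are hypothetical, the root span is assumed to be the one without a parent, and a service's first entry point is approximated as its earliest span.

```python
from typing import Dict, List, Optional


def group_key(spans: List[Dict], mode: str, viewed_service: Optional[str] = None) -> str:
    """Return the label a trace would be grouped under for the given mode."""
    if mode == "errors":
        return "error" if any(s.get("error") for s in spans) else "no error"
    # Assumes the trace is complete, so exactly one span has no parent.
    root = next(s for s in spans if s["parent_id"] is None)
    if mode == "root_service":
        return root["service"]
    if mode == "root_entry_span":
        return f"{root['service']} - {root['name']}"
    if mode == "service_entry_span":
        own = [s for s in spans if s["service"] == viewed_service]
        if not own:
            raise ValueError(f"no spans for service {viewed_service!r}")
        # Approximation: the earliest span of the service is its first entry point.
        return min(own, key=lambda s: s["timestamp"])["name"]
    raise ValueError(f"unknown grouping mode: {mode!r}")


# Service A calls Service B, and Service B calls Service C (hypothetical data).
trace = [
    {"id": "1", "parent_id": None, "service": "Service A", "name": "GET /user/%", "timestamp": 0, "error": False},
    {"id": "2", "parent_id": "1", "service": "Service B", "name": "GET /orders", "timestamp": 5, "error": False},
    {"id": "3", "parent_id": "2", "service": "Service B", "name": "validate", "timestamp": 6, "error": False},
    {"id": "4", "parent_id": "3", "service": "Service C", "name": "POST /charge", "timestamp": 8, "error": True},
]
```

For this trace, grouping by root entry span yields `Service A - GET /user/%`, and grouping by service entry span while viewing Service B yields `GET /orders`.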

Filter traces using histograms

Trace histogram charts are available only on the global distributed tracing page.

The histogram charts on the global distributed tracing page give you a quick understanding of trace distribution for important values, like duration. You can use the sliders under the charts to control what traces are displayed. For example, you can drag the Trace duration chart slider to show only traces over 500 ms, as shown in the histogram example below.

New Relic One distributed tracing - histogram

Some queries that produce many results may result in false positives in histograms. This could manifest as histograms showing trace results that are not in the trace list.

Trace details UI page

When you select a trace from the trace list, you see that trace's timeline and spans:

New Relic distributed tracing UI - trace details page
one.newrelic.com > APM > (select an application) > Monitor > Distributed tracing > (select a trace) > (select a span): See the spans in a trace. Examine individual span details and see notifications for spans with anomalous behavior.

The UI indicates some span properties with icons:

| Span property | Indicator | Description |
| --- | --- | --- |
| Service | New Relic distributed tracing service icon | This icon represents a span that's a service's entry point. |
| In-process | New Relic distributed tracing in-process span icon | This icon represents an in-process span, which is a span that takes place within a process (as opposed to a cross-process span). Examples: middleware instrumentation, user-created spans. |
| Datastore | New Relic distributed tracing datastore span icon | This icon represents a span call to a datastore. |
| External | New Relic distributed tracing external span icon | This icon represents a call to an external service made via HTTP. |
| Browser app | New Relic distributed tracing browser span icon | This icon represents a browser application span. |
| Lambda | New Relic distributed tracing external span icon | This icon represents a span from a Lambda function. |

Some spans will have additional indicators:

| Span property | Indicator | Description |
| --- | --- | --- |
| Type of connection | New Relic distributed tracing connecting lines image | Solid lines indicate a direct parent-child relationship; in other words, one process or function directly calling another. A dotted line indicates a non-direct relationship. For more on relationships between spans, see Trace structure. |
| Errors | New Relic distributed tracing error icon | A span with an error. See How to understand span errors. |
| Anomalous | New Relic distributed tracing datastore span icon | This icon represents the detection of an anomalous span. |
| Orphaned spans | New Relic distributed tracing orphaned span icon | Some spans may be "orphaned," or separated, from the trace. These spans will appear at the bottom of the trace. For more details, see Fragmented traces. |
| Multiple app names | New Relic distributed tracing multiple app names indicator | When beside a span name, this represents an entity that has had multiple app names set. Select this to see all app names it reports to. To search trace data by alternate app names, use the appName attribute. |
| Client/server time difference | New Relic distributed tracing client-server time difference indicator | If a span's duration indicator is not completely colored in (like in this example), it means that there is a time discrepancy between the server-side duration and the client-side duration for that activity. For details on this, see Client/server time difference. |

For more on the trace structure and how span properties are determined, see Trace structure.

Span details pane

When you select a span, a pane opens up with span details. These details can be helpful for troubleshooting performance issues. Details include:

What a span displays is based on its span type. For example, a datastore span's name attribute will contain the datastore query.

View related logs

If you are using our logs in context feature together with our log management, you can see any logs that are linked to your traces:

  1. Go to the trace details page by clicking on a trace.
  2. Click See logs in the upper-right corner.
  3. For details related to an individual log message, click directly on the message.

Additional UI details

Here are some additional distributed tracing UI details, rules, and limits:

How to understand span errors

Span-level errors show you where errors originated in a process, how they bubbled up, and where they were handled. Every span that ends with an exception is shown with an error in the UI and contributes to the total error count for that trace.

Here are some general tips about understanding span errors:

  • Spans with errors are highlighted red in the distributed tracing UI. You can see more information on the Error Details pane for each span.
  • All spans that exit with errors are counted in the span error count.
  • When multiple errors occur on the same span, only one is written to the span in this order of precedence:
    • A noticeError
    • The most recent span exception
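
The order of precedence above can be sketched as a small selection function. This is a hedged illustration: the function and parameter names are hypothetical, and `span_exceptions` is assumed to be in chronological order.

```python
from typing import List, Optional


def error_written_to_span(noticed_error: Optional[str], span_exceptions: List[str]) -> Optional[str]:
    """Pick the single error recorded on a span when several occurred."""
    # A noticed error takes precedence over any span exception.
    if noticed_error is not None:
        return noticed_error
    # Otherwise fall back to the most recent span exception, if there is one.
    return span_exceptions[-1] if span_exceptions else None
```

For example, a span with a noticed error and two exceptions records only the noticed error, while a span with only the two exceptions records the later one.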

This table describes how different span errors are handled:

| Error type | Description |
| --- | --- |
| Spans ending in exceptions | An exception that leaves the boundary of a span results in an error on that span and on any ancestor spans that also exit with an error, until the exception is caught or exits the transaction. You can see if an exception is caught in an ancestor span. |
| Notice errors | Errors noticed by calls to the agent noticeError API or by the automatic agent instrumentation are attached to the currently executing span. |
| Response code errors | Response code errors are attached to the associated span. For a client span, this applies to external transactions prefixed with http or db. For an entry span, this applies when a transaction ends in a response code error. The response code for these spans is captured as the attribute httpResponseCode and attached to that span. |

Anomalous spans

If a span is displayed as anomalous in the UI, it means that the following are both true:

  • The span is more than two standard deviations slower than the average of all spans with the same name from the same service over the last six hours.
  • The span's duration is more than 10% of the trace's duration.
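
The two conditions above can be expressed as a short Python check. This is a sketch under assumptions, not the actual detection logic: the function name is hypothetical, `baseline_durations` stands for the durations of same-named spans from the same service over the last six hours, and the population standard deviation is used because the exact variant is not specified here.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(span_duration: float, baseline_durations: Sequence[float], trace_duration: float) -> bool:
    """Return True only when both anomaly conditions hold."""
    if len(baseline_durations) < 2 or trace_duration <= 0:
        return False  # not enough data to judge
    # Condition 1: more than two standard deviations slower than the average.
    threshold = mean(baseline_durations) + 2 * pstdev(baseline_durations)
    slower_than_peers = span_duration > threshold
    # Condition 2: the span accounts for more than 10% of the trace's duration.
    significant_share = span_duration > 0.10 * trace_duration
    return slower_than_peers and significant_share
```

With a baseline of 90 to 110 ms, a 150 ms span is flagged in a 1,000 ms trace but not in a 2,000 ms trace, where it falls below the 10% share.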

Client span duration: time differences between client and server spans

When a process calls another process, and both processes are instrumented by New Relic, the trace contains both a client-side representation of the call and a server-side representation. The client span (calling process) can have time-related differences when compared to the server span (called process). These differences could be due to:

  • Clock skew, due to system clock time differences
  • Differences in duration, due to things like network latency or DNS resolution delay

The UI shows these time-related differences by displaying an outline of the client span in the same space as the server span. This outline represents the duration of the client span.

It isn't possible to determine every factor contributing to these time-related discrepancies, but here are some common span patterns and tips for understanding them:

New Relic distributed tracing client vs server time discrepancy diagram
  1. When a client span is longer than the server span, this could be due to latency in a number of areas, such as: network time, queue time, DNS resolution time, or from a load balancer that we cannot see.
  2. When a client span starts and ends before a server span begins, this could be due to clock skew, or due to the server doing asynchronous work that continues after sending the response.
  3. When a client span starts after a server span, this is most likely clock skew.
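
The three patterns above can be sketched as a classifier over span start and end times. This is an illustration only: the function name and the returned labels are hypothetical, all timestamps are assumed to be in the same unit, and the checks simply mirror the tips listed above rather than any logic in the UI.

```python
def classify_time_difference(client_start: float, client_end: float,
                             server_start: float, server_end: float) -> str:
    """Map a client/server span pair to one of the common discrepancy patterns."""
    # Pattern 2: the client span starts and ends before the server span begins.
    if client_end < server_start:
        return "clock skew or asynchronous server work"
    # Pattern 3: the client span starts after the server span.
    if client_start > server_start:
        return "most likely clock skew"
    # Pattern 1: the client span is longer than the server span.
    if (client_end - client_start) > (server_end - server_start):
        return "latency (network, queue, DNS, or load balancer)"
    return "no notable discrepancy"
```

For example, a client span from 0 to 100 ms wrapping a server span from 10 to 90 ms falls under the latency pattern.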

Fragmented traces

Fragmented traces are traces with missing spans. When a span is missing or has an invalid parent span ID, its child spans become separated from the rest of the trace, which we refer to as "orphaned." Orphaned spans appear at the bottom of the trace, and they lack connecting lines to the rest of the trace. These are the types of orphaned span properties indicated in the UI:

  • No root span. Missing the root span, which is the first operation in the request. When this happens, the span with the earliest timestamp is displayed as the root.
  • Orphaned span. A single span with a missing parent span. This could be due to the parent span having an ID that doesn't match the parent ID reported by its child span.
  • Orphaned trace fragment. A group of connected spans where the first span in the group is an orphan span.
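
The three cases above can be sketched as a check over the spans collected for one trace. This is a minimal illustration, not the UI's logic: the function name and the span fields `id` and `parent_id` are hypothetical, and a root span is assumed to be one with no parent ID.

```python
from collections import defaultdict
from typing import Dict, List


def classify_fragments(spans: List[Dict]) -> Dict:
    """Report whether a root exists, and list orphaned spans and orphaned fragments."""
    ids = {s["id"] for s in spans}
    children = defaultdict(list)
    for s in spans:
        if s["parent_id"] is not None:
            children[s["parent_id"]].append(s["id"])

    result = {
        "has_root": any(s["parent_id"] is None for s in spans),
        "orphaned_spans": [],
        "orphaned_fragments": [],
    }
    for s in spans:
        parent = s["parent_id"]
        if parent is None or parent in ids:
            continue  # root span, or parent was collected
        # The parent is missing: gather this span and all of its descendants.
        group, stack = [], [s["id"]]
        while stack:
            current = stack.pop()
            group.append(current)
            stack.extend(children[current])
        if len(group) == 1:
            result["orphaned_spans"].append(s["id"])
        else:
            result["orphaned_fragments"].append(group)
    return result
```

A span whose parent was never collected is reported on its own as an orphaned span, and together with its descendants as an orphaned trace fragment when it has any.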

This can happen for a number of reasons, including:

  • Collection limits. Some high-throughput applications may exceed collection limits (for example, APM agent collection limits, or API limits). When this happens, it may result in traces having missing spans. One way to remedy this is to turn off some reporting, so that the limit is not reached.
  • Incorrect instrumentation. If an application is instrumented incorrectly, it won't pass trace context correctly and this will result in fragmented traces. To remedy this, examine the data source that is generating orphan spans to ensure instrumentation is done correctly. To discover a span's data source, select it and examine its span details.
  • Spans still arriving. If some parent spans haven't been collected yet, this can result in temporary gaps until the entire trace has reported.
  • UI display limits. Orphaned spans may result if a trace exceeds the 10K span display limit.

Trace details obfuscated based on account access

If you don’t have access to the New Relic accounts that monitor other services, some of the span and service details will be obfuscated in the UI. Obfuscation can include:

  • Span name concealed by asterisks
  • Service name replaced with New Relic account ID and app ID

The two main factors affecting this obfuscation:

  • Account permissions. Master/sub-account relationships will impact access. If you have access to only a sub-account, you’ll be able to see details for only that sub-account. If you have access to a master account, you’ll be able to see details for that account’s sub-accounts.
  • Authentication. You’ll be able to see span details only for New Relic accounts you can access based on your current login. This means that, for example, even the admin of a master account may not be able to see all details if the trace crosses the boundaries of different authentication mechanisms.

Span limits and sampling

See Sampling.

Incomplete span names in waterfall view

When viewing the span waterfall, span names may be displayed in an incomplete form that is more human-readable than the complete span name. To find the complete name, select that span and look for the Full span name. Knowing the complete name can be valuable for querying that data with NRQL.

Missing spans and span/service count discrepancies

A trace may sometimes have (or seem to have) missing spans or services. This can manifest as a discrepancy between the count of a trace's spans or services displayed in the trace list and the count displayed on the trace details page.

Reasons for missing spans and count discrepancies include:

All spans collected, including those not displayed, can be queried with NRQL.
