New Relic's distributed tracing helps you monitor and analyze the behavior of your distributed system.
Access to this feature requires APM Pro and enabling of distributed tracing.
This document is focused on how the UI features work. For a technical explanation, see How distributed tracing works.
Distributed tracing UI page features
Here are some features of the main Distributed tracing UI page (screenshot above):
- Trace scatter plot: Shows a plot of traces that lets you easily see outliers. Select a plot point to see details for that trace.
- Trace list: Shows a list of traces, along with information like the root span duration, the number of spans in a trace, the number of errors, and the number of services. Select a trace to see trace details.
- Filter: Filter by attributes to display only the traces containing spans with those values. To see suggested filters, click in the filter search field. To use additional operators (!=, >, <, IN), select the filter dropdown. For more on filtering, see filter rules.
- Group by: Group the displayed trace scatter plot in different ways. For more on the grouping options, see Trace grouping.
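To make the filter behavior concrete, here is a minimal Python sketch of "display only the traces containing spans with those values". The data shape (a dict of trace IDs to lists of span dicts), the function names, and the sample attributes are assumptions made for illustration. Equality is assumed to be the default operator, since the text lists only the additional ones. This is not New Relic's implementation or API.

```python
import operator

# Equality (assumed default) plus the additional operators named above.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "IN": lambda value, options: value in options,
}


def span_matches(span, attribute, op, target):
    """Return True if the span carries the attribute and it satisfies the condition."""
    if attribute not in span:
        return False
    return OPERATORS[op](span[attribute], target)


def filter_traces(traces, attribute, op, target):
    """Keep only the traces in which at least one span satisfies the condition."""
    return {
        trace_id: spans
        for trace_id, spans in traces.items()
        if any(span_matches(span, attribute, op, target) for span in spans)
    }


# Hypothetical sample data: two traces keyed by trace ID.
traces = {
    "trace-1": [
        {"name": "GET /user", "duration": 0.4},
        {"name": "db query", "duration": 2.7},
    ],
    "trace-2": [{"name": "GET /cart", "duration": 0.3}],
}

slow = filter_traces(traces, "duration", ">", 2)
print(list(slow))  # ['trace-1']
```

Note that a trace is kept when any one of its spans matches, which is why a single slow span is enough to surface the whole trace.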
Trace details UI features
When a trace is selected, you see details for that trace (see screenshot above):
- Trace timeline: Gives a visual representation of the trace over time. To zoom in on a specific time range of the trace, click and drag over an area. Red spans indicate errors.
- Trace waterfall: Displays the spans in a trace. Red spans indicate errors. Also displays anomalous spans and time differences between client and server spans. In-process spans are initially collapsed but can be expanded.
- Service colors: Select the Services dropdown to see the services displayed in the trace and their colors.
- Anomalous spans detected: Displays a count of anomalous spans. Select the dropdown for links to those spans.
- Span details: Select a span to see details, such as charts, attributes, anomalous span data, and the full span name. Different span types will have different charts and details available.
For technical details about some parts of the UI, and commonly asked questions, see Additional UI details.
For more on how spans are created, see Distributed tracing explained.
There are four types of spans:
|Span type||UI icon||Description|
|Datastore||Represents a call to a datastore service. Datastore span events have a
|External||Represents a call to an external service made via HTTP. External span events have a
|Service||Represents the entry point into a service monitored by New Relic APM.|
|In-process||Represents a span that takes place within a process (as opposed to cross-process spans, which cross process boundaries). Examples: middleware instrumentation, user-created spans. Service span events have a
Some spans in the UI may be marked as anomalous.
Example use cases
Here are some examples of using our distributed tracing feature to solve problems.
- Diagnosing and fixing a latency issue
- An engineer receives an alert notification showing that their service is experiencing frequent high latency.
- The engineer goes to the APM Overview page and sees a large increase in response time from dependent services.
- They go to the Distributed tracing UI page and apply a filter for duration > 2, so they can look at only the traces with spans over that duration.
- They examine one trace with unusually long latency. They see that there is a call being made to a dependent service that is querying a database that is responding very slowly.
- They check with the owner of that service and learn that their team is currently working on scaling the database due to unexpected high throughput.
- Analyzing errors in requests spanning multiple services
- An engineer is tasked with troubleshooting errors occurring in a complex transaction spanning many services.
- They go to the Distributed tracing UI page and filter down to that specific request.
- They filter down to traces containing errors.
- Looking at the trace details, they can see the span along the request route that originated the error.
- Noting the error class and message, they navigate to the service from its span in the trace and see that the error is occurring at a high rate. They then ask the service owner to look into it.
Trace grouping options
When viewing the trace scatter plot, there are several Group traces by options:
- Errors: Group by whether traces contain errors or not.
- Root service: Group by the name of the first service in traces. In a trace where Service A calls Service B and Service B calls Service C, the root service would be Service A.
- Root entry span: Group by the root transaction: in other words, the root service's endpoint. In a trace where Service A calls Service B and Service B calls Service C, the root entry span is Service A's endpoint. For example: "Service A - GET /user/%".
- Service entry span: Group by the span name of the service currently being viewed in APM. For example, for a trace where Service A calls Service B and Service B calls Service C: if you are viewing Service B in APM and select this grouping, the traces will be represented by their Service B span names. If a service has multiple spans in a trace, this grouping option will use that service's first entry point.
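The grouping options above can be sketched as a function that maps a trace to a group label. This is a hedged illustration only: the trace structure (an `entry_spans` list ordered from the root outward, an `error_count` field), the option names, and the function names are hypothetical and do not come from New Relic.

```python
from collections import defaultdict


def group_key(trace, grouping, current_service=None):
    """Return the label a trace would be grouped under for a given option."""
    entry_spans = trace["entry_spans"]  # assumed ordered from the root outward
    root = entry_spans[0]
    if grouping == "errors":
        return "errors" if trace["error_count"] > 0 else "no errors"
    if grouping == "root_service":
        return root["service"]
    if grouping == "root_entry_span":
        return f'{root["service"]} - {root["name"]}'
    if grouping == "service_entry_span":
        # Use the first entry point of the service currently being viewed.
        for span in entry_spans:
            if span["service"] == current_service:
                return span["name"]
        return None
    raise ValueError(f"unknown grouping: {grouping}")


def group_traces(traces, grouping, current_service=None):
    """Bucket trace IDs by their group label."""
    groups = defaultdict(list)
    for trace in traces:
        groups[group_key(trace, grouping, current_service)].append(trace["id"])
    return dict(groups)


# Service A calls Service B, which calls Service C; B is entered twice.
trace = {
    "id": "t1",
    "error_count": 0,
    "entry_spans": [
        {"service": "Service A", "name": "GET /user/%"},
        {"service": "Service B", "name": "GET /profile"},
        {"service": "Service C", "name": "GET /avatar"},
        {"service": "Service B", "name": "POST /audit"},
    ],
}

print(group_key(trace, "root_service"))                     # Service A
print(group_key(trace, "root_entry_span"))                  # Service A - GET /user/%
print(group_key(trace, "service_entry_span", "Service B"))  # GET /profile
```

The last call shows the "first entry point" rule: Service B appears twice in the trace, and only its first entry span names the group.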
Query data in Insights
For how to query distributed tracing data in New Relic Insights, see Example Insights queries.
Additional UI details
Here are some additional rules and limits for this feature:
- Anomalous spans
If a span is displayed as anomalous in the UI, it means that the following are both true:
- The span is more than two standard deviations slower than the average of all spans with the same name from the same service over the last six hours.
- The span's duration is more than 10% of the trace's duration.
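The two conditions can be expressed as a small check. The sketch below assumes durations in seconds, a list of baseline durations for spans with the same name from the same service over the comparison window, and the sample standard deviation. Whether New Relic uses the sample or population form is not stated here, and the function name and the handling of short baselines are illustrative assumptions.

```python
from statistics import mean, stdev


def is_anomalous(span_duration, trace_duration, baseline_durations):
    """Apply both conditions described above to a single span.

    baseline_durations: durations of spans with the same name from the same
    service over the comparison window (six hours in the text above).
    """
    if len(baseline_durations) < 2:
        return False  # assumption: too little history to compute a deviation
    # Condition 1: more than two standard deviations slower than the average.
    threshold = mean(baseline_durations) + 2 * stdev(baseline_durations)
    slower_than_baseline = span_duration > threshold
    # Condition 2: more than 10% of the whole trace's duration.
    significant_share = span_duration > 0.10 * trace_duration
    return slower_than_baseline and significant_share


baseline = [0.10, 0.12, 0.11, 0.09, 0.10]

print(is_anomalous(0.5, 2.0, baseline))   # True: slow and 25% of the trace
print(is_anomalous(0.5, 10.0, baseline))  # False: slow but only 5% of the trace
print(is_anomalous(0.11, 0.5, baseline))  # False: within the normal range
```

Requiring both conditions means a span that is slow relative to its own history is flagged only when it also accounts for a meaningful share of the trace.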
- Time differences between client and server spans
When a process calls another process, and both processes are instrumented by New Relic, the trace contains both a client-side representation of the call and a server-side representation. The client span (calling process) can have time-related differences when compared to the server span (called process). These differences could be due to:
- Clock skew, due to system clock time differences
- Differences in duration, due to things like network latency or DNS resolution delay
The UI shows these time-related differences by displaying an outline of the client span in the same space as the server span. This outline represents the duration of the client span.
It isn't possible to determine every factor contributing to these time-related discrepancies, but here are some common span patterns and tips for understanding them:
- When a client span is longer than the server span, this could be due to latency in a number of areas, such as: network time, queue time, DNS resolution time, or from a load balancer that we cannot see.
- When a client span starts and ends before a server span begins, this could be due to clock skew, or due to the server doing asynchronous work that continues after sending the response.
- When a client span starts after a server span, this is most likely clock skew.
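As a rough aid for reading these patterns, the following Python sketch classifies a client/server span pair by its timestamps. The function name, the returned labels, and the order in which the checks are applied are assumptions for illustration; the likely causes are taken from the list above, and real traces may combine several effects.

```python
def describe_time_difference(client_start, client_end, server_start, server_end):
    """Map a client/server span pair to one of the patterns listed above.

    All arguments are timestamps in the same unit (for example, milliseconds).
    """
    if client_start > server_start:
        return "client starts after server: most likely clock skew"
    if client_end < server_start:
        return ("client ends before server begins: "
                "clock skew or asynchronous server work")
    client_duration = client_end - client_start
    server_duration = server_end - server_start
    if client_duration > server_duration:
        return ("client longer than server: "
                "network, queue, DNS, or load balancer latency")
    return "no notable pattern"


print(describe_time_difference(0, 10, 2, 8))
# client longer than server: network, queue, DNS, or load balancer latency
print(describe_time_difference(5, 9, 2, 8))
# client starts after server: most likely clock skew
print(describe_time_difference(0, 1, 2, 5))
# client ends before server begins: clock skew or asynchronous server work
```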
- Filter rules
Some rules governing filters:
- Attributes available for filtering are only from the current application. You can filter only on attributes captured by the application you are currently viewing in New Relic. If you do not see attributes you expect, they are probably not available for that application. In that case, go to the application index and select the application where those attributes are captured.
- Multi-attribute filter affected by first attribute selected. Distributed tracing reports two types of event data: transaction events and span events. When you select an attribute in the filter, the event type that attribute is attached to dictates which attributes are available next. For example, if you filter on an attribute that is attached to a Transaction event, only Transaction event attributes will be available when you add filters on additional attribute values.
- Lines between spans: dotted vs solid
In the waterfall view, lines between spans are either solid or dotted. Solid lines indicate a direct parent-child relationship; in other words, one process or function directly calling another. A dotted line indicates a non-direct relationship.
- Span details obfuscated if you don't have account access
If you don’t have access to the New Relic accounts that monitor other services in a distributed trace, some of the span and service details will be obfuscated in the UI. Obfuscation includes:
- The span’s name is concealed by asterisks.
- The service name is replaced with the New Relic account ID and app ID.
The two main factors affecting this obfuscation:
- Authentication. You’ll be able to see span details only for New Relic accounts you can access based on your current login. This means that, for example, even the admin of a master account may not be able to see all span details if the trace crosses the boundaries of different authentication mechanisms.
- Account permissions. Master/sub-account relationships will impact access. If you have access to only a sub-account, you’ll be able to see details for only that sub-account. If you have access to a master account, you’ll be able to see details for that account’s sub-accounts.
- Span collection and display limits, and sampling details
New Relic APM agents have a limit of 1,000 on the number of spans that can be collected per agent instance. The agents use sampling to select the requests chosen for a trace. For more on sampling and agent span limits, see Sampling.
The maximum total number of spans displayed in the UI is 10,000.
- Missing spans and span/service count discrepancies
A trace may sometimes have (or seem to have) missing spans or services. This can manifest as a discrepancy between the count of a trace's spans or services displayed in the trace list and the count displayed on the trace details page.
Reasons for missing spans and count discrepancies include:
- An APM agent may have hit its 1K span collection limit.
- A span may be initially counted but not make it into a trace display, for reasons such as network latency or a query issue.
- If an APM-monitored application reports as multiple app names, the trace waterfall view will display a single span for that service, but will actually report multiple span events, one for each application name.
- The UI may have hit its 10K span display limit.
All spans collected, including those not displayed, can be queried in Insights.
- Incomplete span names in waterfall view
When viewing the span waterfall, span names may be displayed in an incomplete form that is more human-readable than the complete span name. To find the complete name, select that span and look for the Full span name. Knowing the complete name can be valuable for querying that data in Insights.