The UI for external services is a great place to analyze a single service along with its upstream and downstream services. The UI starts you off with a map of your selected service, along with performance charts showing the top five results for response time, throughput, and error rate. If you prefer, you can also see the same map details represented in a table format.
The external services feature is available from the left navigation pane once you select an APM-monitored service. You can open it by going to one.newrelic.com > All capabilities > APM & Services. Select an app, then under the Monitor section, click External services.
The opening map displays your selected service as a vertex (hexagonal shape) with rectangles around the upstream or downstream services. The initial view is the downstream service, so the Downstream entities tab is selected by default. When you're on that tab, the performance charts (Response time, Throughput, and Error rate) apply to the downstream services. You can click Upstream entities to switch to those performance charts.
On the initial page for external services, each of the rectangles contains vertices that represent the upstream or downstream services. The vertices are connected across services by edges (lines). When you drill into a particular service, the vertices on those drill-down pages become the service endpoints so you can see transaction detail.
The thickness of each line represents the throughput between the connected services, and the darkness of the line represents time consumed (throughput multiplied by duration).
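To make the "time consumed" idea concrete, here is a minimal sketch that multiplies throughput by average duration for each edge and ranks the edges, which mirrors looking for the darkest line on the map. The `Edge` class, the units, the numbers, and the Inventory and Billing services are hypothetical, chosen only for illustration. This is not how the UI itself computes or scales line darkness.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """A call path between two services (hypothetical structure)."""
    source: str
    target: str
    throughput_rpm: float    # calls per minute (assumed unit)
    avg_duration_ms: float   # average call duration (assumed unit)

    @property
    def time_consumed(self) -> float:
        # Time consumed = throughput times duration, as described above.
        return self.throughput_rpm * self.avg_duration_ms


def rank_by_time_consumed(edges):
    """Return edges ordered from most to least time consumed."""
    return sorted(edges, key=lambda e: e.time_consumed, reverse=True)


# Illustrative values only.
edges = [
    Edge("Order-Composer", "Order-Status", 1200, 85.0),
    Edge("Order-Composer", "Inventory", 3000, 12.0),
    Edge("Order-Composer", "Billing", 150, 400.0),
]

for edge in rank_by_time_consumed(edges):
    print(f"{edge.source} -> {edge.target}: {edge.time_consumed:.0f}")
```

In this made-up data, the Inventory edge has the highest throughput (the thickest line) but not the highest time consumed, which is why thickness and darkness are worth reading together.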
In the map legend, you have the option to select two types of services:
- Services: These are services you own and have instrumented.
- Uninstrumented externals: These are uninstrumented services that you may or may not own.
The opening page of external services displays three performance charts. For APM agents, these initial performance charts are populated by metric data, while for OpenTelemetry, the initial values are populated by sampled data.
As you drill below the initial page, whether you use OpenTelemetry or APM agents, each child page is populated by sampled data. This means that if you're not seeing the data you expect on pages that show sampled data, you may need to increase your sampling.
The performance charts always reflect the data from the page you're viewing, but the set of performance charts changes as you drill below the opening page. Here's what you need to understand these charts:
Response time
The average duration of calls between services in the initial view or between transactions in the drill-down views. The initial view for APM shows the response time as metric data, which is based on all calls. The initial view for OpenTelemetry shows the response time as trace data, which is based on sampled calls only.
On all drill-down pages, response time is shown as trace data, which is based on sampled calls only. How well this represents actual system performance depends on the effective sample rate.
Throughput
The total number of calls between two services.
Error rate
The number of errors per minute for calls between two services.
Traced call count
Represents the number of sampled calls we have for a given path between two services or transactions. This is lower than the total throughput, unless you are sampling 100% of your requests.
Traced error count
The number of sampled calls between two services or transactions that had errors.
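As a rough way to judge how representative the sampled charts are, you can compare the traced call count with the total throughput over the same time window. The sketch below assumes that the effective sample rate is simply the ratio of the two. The function name and the numbers are hypothetical, and this is not an official formula.

```python
def effective_sample_rate(traced_call_count: int, total_calls: int) -> float:
    """Fraction of calls that were sampled, assuming both counts
    cover the same path and the same time window."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return traced_call_count / total_calls


# Illustrative values: 250 traced calls out of 10,000 total calls.
rate = effective_sample_rate(250, 10_000)
print(f"Effective sample rate: {rate:.1%}")
```

A low ratio suggests that the drill-down values rest on a small share of the actual traffic, which is one sign that increasing sampling may help.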
As a companion to the map view, the table view lists all the related services in a columnar format. When you click List in the upper right of the page, you see the same services from the map view.
Similar to the map view, you can click on specific entities (services) to see transactions in the drill-down tables. As you drill down and find an interesting endpoint, you can click on Traces to switch over to distributed tracing details.
To the right of each performance column is a corresponding % change (percentage change) column. The percentage change calculation is based on the time frame you choose in the main time picker and the comparison time picker (Compare to). The comparison time picker indicates how much before the main time window the comparison should start.
Here's an example with Response time: If the current time is 11 a.m. and the main time picker is last 30 minutes and the Compare to time picker is 1 hour ago:
- The duration is the average from 10:30-11:00 a.m.
- The % change compares that to the average from 9:30-10:00 a.m.
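The time-window arithmetic in this example can be sketched as follows. The function names are hypothetical, the date is arbitrary, and the percent change formula is the conventional relative change, which is assumed here because the page does not spell out the exact calculation.

```python
from datetime import datetime, timedelta


def comparison_windows(now, window, compare_offset):
    """Return (current, previous) time windows as (start, end) tuples.

    The previous window has the same length as the current one and
    starts `compare_offset` earlier.
    """
    current = (now - window, now)
    previous = (current[0] - compare_offset, current[1] - compare_offset)
    return current, previous


def percent_change(current_avg, previous_avg):
    """Conventional relative change in percent (assumed formula)."""
    if previous_avg == 0:
        return None  # undefined when there is no baseline
    return (current_avg - previous_avg) / previous_avg * 100.0


# 11 a.m., main time picker "last 30 minutes", Compare to "1 hour ago".
now = datetime(2024, 1, 1, 11, 0)
current, previous = comparison_windows(
    now, timedelta(minutes=30), timedelta(hours=1)
)
print(current)   # 10:30 to 11:00
print(previous)  # 9:30 to 10:00

# Illustrative averages: 120 ms now versus 100 ms in the earlier window.
print(percent_change(120.0, 100.0))
```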
Here is a typical map workflow:
- Look for the thickest and darkest line on the map and follow it to its upstream or downstream service.
- Click on the upstream or downstream vertex.
- View a breakdown of transactions between the two services.
In this example, one of the thicker edges (lines) goes from the Order-Composer service to the warehouse endpoint in the Order Status service.
- If you decide that a particular transaction is taking the most time, click on that transaction to focus specifically on its dependencies.
In this drill-down view, you can see the transaction between the Order-Composer service and the warehouse endpoint in the Order-Status service.
- From any point in this flow, consult the supporting performance charts, which show changes over time.
- If you reach a point in the drill-down where you want to see distributed tracing, click List in the upper right, and then click Traces in the table.
The classic external services view is still available if you are monitoring existing services that use cross-application tracing. Because the default view is the expanded external services view, you need to click the Show new view toggle to switch to the classic view.