App remediation: Gather performance statistics

Application remediation involves gathering metrics and performance statistics about your application. This helps you identify bottlenecks and errors that can lead to instability in your software (and frustration for your users).

New Relic provides views of your application’s performance to help you quickly identify and remediate errors. These quick wins ensure you and your colleagues can accelerate the momentum of DevOps in your organization.

1. Use semantic naming throughout New Relic

Use semantic naming to give meaningful, structured names and labels to the applications you’re monitoring. Create a scalable syntax that can be understood, filtered, and sorted by any team who might use New Relic now or in the future. When you create your naming structure, be sure to consider growth and the potential for scaling/changes in your application’s architecture.

Consider the following naming structure:

  • store-us-web-prod
  • store-us-web-stage
  • store-us-web-dev

This structure leaves room to scale (for example, store-eu-web-prod-02) and makes the application’s environment searchable. Base any convention on your organization’s own structure, and choose one that lets you optimize your environment as needed.
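
As a rough illustration of how such a convention can be enforced, the following Python sketch parses and validates names of the form shown above. The field names (product, region, tier, environment, instance) and the allowed environment values are assumptions made for this example; they are not part of New Relic, and your own convention may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical convention: <product>-<region>-<tier>-<environment>[-<instance>]
NAME_PATTERN = re.compile(
    r"^(?P<product>[a-z0-9]+)"
    r"-(?P<region>[a-z]{2})"
    r"-(?P<tier>[a-z0-9]+)"
    r"-(?P<environment>prod|stage|dev)"
    r"(?:-(?P<instance>\d{2}))?$"
)


@dataclass(frozen=True)
class AppName:
    product: str
    region: str
    tier: str
    environment: str
    instance: Optional[str] = None

    def __str__(self) -> str:
        # Rebuild the name from its parts, skipping the optional instance suffix.
        parts = [self.product, self.region, self.tier, self.environment]
        if self.instance:
            parts.append(self.instance)
        return "-".join(parts)


def parse_app_name(name: str) -> AppName:
    """Split a name into its parts, or raise ValueError if it breaks the convention."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return AppName(**match.groupdict())


if __name__ == "__main__":
    for candidate in ["store-us-web-prod", "store-eu-web-prod-02"]:
        parsed = parse_app_name(candidate)
        print(candidate, "->", parsed.environment, parsed.region)
```

A check like this could run in a deployment pipeline so that a misnamed application is caught before it starts reporting data.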

2. Deploy the New Relic agent

Review the compatibility and requirements for New Relic agents and products, and then follow the documentation to install the appropriate agent in your application. Once the agent is installed, you should see data begin to populate in New Relic within minutes.

If New Relic is not showing performance information after several minutes, see No data appears.

3. Identify application performance outliers and errors

The default charts in New Relic tell a story about your application’s performance. If there are performance issues, patterns in the charts highlight the causes. The following figure, from the Overview page in New Relic APM, shows a large spike at noon on March 27th that was nearly double the response time of our application’s overall trend:

APM_transactions.png > (select an app) > Monitoring > Overview

Additionally, the response time breakdown shows a spike in Web external (which indicates downstream dependencies). Since the spike seems to correlate with periods of higher throughput, we can then use the Throughput chart to analyze the problem in more detail.
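
The "nearly double the overall trend" observation can also be checked mechanically on exported data. The sketch below is only an illustration of that idea: it flags samples that exceed a multiple of the median response time. The sample values, the 1.8 factor, and the function name are hypothetical, and this is not how New Relic draws or analyzes its charts.

```python
from statistics import median


def find_spikes(samples, factor=1.8):
    """Return the (timestamp, value) pairs whose value is at least
    `factor` times the median of all values."""
    baseline = median(value for _, value in samples)
    return [(ts, value) for ts, value in samples if value >= factor * baseline]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    response_times = [
        ("11:00", 210),
        ("11:30", 220),
        ("12:00", 430),
        ("12:30", 215),
    ]
    print(find_spikes(response_times))
```

The median is used as the baseline because, unlike the mean, it is not pulled upward by the spike itself.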

4. Drill into specific time frames

Click and drag on a graph to drill into a focused time slice of performance data and better isolate the issue you’re investigating. In this example, we’ve narrowed the view to the spike we saw in web transaction times:

2018-05-31_15-17-28.png > (select an app) > Monitoring > Overview

Here, it’s obvious that some part of Web external, which is an application or service called by our Web portal app, is likely the source of the issue.

5. Drill into transaction traces to investigate outliers

New Relic lists transactions with the most time-consuming transaction at the top. Click a transaction to navigate to the Transactions page, expand that transaction into its component parts, and see any transaction traces that were captured as a result of performance issues. New Relic automatically captures these traces whenever a transaction exceeds the trace threshold, which by default is four times your Apdex T value and can alternatively be set to a specific number of seconds.

If you’re not capturing any transaction traces, you may not be violating these thresholds. Be sure to configure transaction traces for your applications as needed.
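
For context, the "four times" figure comes from the standard Apdex definition: responses at or below the target time T count as satisfied, responses up to 4T as tolerating, and anything slower as frustrated. The following sketch computes an Apdex score from a list of response times under that definition. The sample times and the 0.5-second target are hypothetical, and the code does not reproduce New Relic's trace-capture logic.

```python
from collections import Counter


def classify(response_time, apdex_t):
    """Bucket one response time using the standard Apdex zones."""
    if response_time <= apdex_t:
        return "satisfied"
    if response_time <= 4 * apdex_t:
        return "tolerating"
    return "frustrated"


def apdex_score(response_times, apdex_t):
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    if not response_times:
        raise ValueError("at least one response time is required")
    counts = Counter(classify(rt, apdex_t) for rt in response_times)
    return (counts["satisfied"] + counts["tolerating"] / 2) / len(response_times)


if __name__ == "__main__":
    # Hypothetical response times in seconds, with a target T of 0.5 seconds.
    times = [0.2, 0.4, 1.0, 2.5]
    print(apdex_score(times, apdex_t=0.5))
```

With these sample values, only the 2.5-second response falls into the frustrated zone, which is the zone that would cross a default trace threshold of four times T.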

In the following example, we see that interceptor.ServletConfigInter... has significant transaction times:

2018-05-31_15-17-50.png > (select an app) > Monitoring > Overview

In fact, it’s responsible for 99.9% of the app server time, which means we’re getting closer to identifying the culprit of our spike.

Here, we see the same spike from before, but the performance of the transaction is broken down into its components:

app_Performance_DevOps_Catalyst.png > (select an app) > Monitoring > Overview

While the share of the response times for most components of this transaction remained stable, GetPlansController (in brown) spiked massively.

From the Transaction traces table, we can drill further into the transaction trace to get method-level detail of where the issue is occurring.

6. Identify performance outliers in the database

We identified that GetPlansController is consuming the vast majority of our response time. Here we see it in the transaction trace summary:

Trace_Details_DevOps_Catalyst.png > (select an app) > Monitoring > Overview

Trace details shows an execution timeline of this transaction, and we see that Plan Service is the external transaction causing the issues—the red color-coding indicates the problem.

2018-05-31_15-18-15.png > (select an app) > Monitoring > Overview

From this point, we can navigate to the Transactions summary page for Plan Service:

2018-05-31_15-18-36.png > (select an app) > Monitoring > Transactions

The breakdown of the GetPlans transaction shows that database calls, particularly MySQL PlansTable select, appear to be a significant portion of the overall response time. The Breakdown table further identifies the problem: the number of database calls per transaction is very high. Note again that it’s highlighted in red.

2018-05-31_15-18-47.png > (select an app) > Monitoring > Transactions

Once again, we can look at a transaction trace to find what might be causing these queries.

2018-05-31_15-18-59.png > (select an app) > Monitoring > Transactions

Finally, we find that an extremely large number of select methods account for the majority of transaction time. We can now take steps to address this potential N+1 query problem.
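
To make the N+1 pattern concrete, here is a minimal, self-contained sketch. It uses an in-memory SQLite database rather than the MySQL database in the example above, and the customers and plans tables, their columns, and the helper names are all hypothetical. It contrasts one query per row with a single joined query and counts how many statements each approach issues.

```python
import sqlite3


def build_db():
    """Create an in-memory database with 50 customers, one plan each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE plans (id INTEGER PRIMARY KEY, customer_id INTEGER, name TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 51)],
    )
    conn.executemany(
        "INSERT INTO plans VALUES (?, ?, ?)",
        [(i, i, f"plan-{i}") for i in range(1, 51)],
    )
    return conn


class CountingConnection:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0

    def query(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params).fetchall()


def plans_n_plus_one(db):
    """One query for the customers, then one more query per customer."""
    result = {}
    for customer_id, name in db.query("SELECT id, name FROM customers"):
        rows = db.query(
            "SELECT name FROM plans WHERE customer_id = ?", (customer_id,)
        )
        result[name] = [row[0] for row in rows]
    return result


def plans_single_query(db):
    """The same data fetched with a single JOIN."""
    result = {}
    rows = db.query(
        "SELECT c.name, p.name FROM customers c "
        "JOIN plans p ON p.customer_id = c.id"
    )
    for customer_name, plan_name in rows:
        result.setdefault(customer_name, []).append(plan_name)
    return result


if __name__ == "__main__":
    slow = CountingConnection(build_db())
    fast = CountingConnection(build_db())
    assert plans_n_plus_one(slow) == plans_single_query(fast)
    print("N+1 queries:", slow.query_count)
    print("Joined queries:", fast.query_count)
```

In a transaction trace, the first approach shows up as the long run of near-identical select calls described above, while the joined version produces a single database call.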

After we remediate this issue, improved response times will lead to fewer frustrated requests and more satisfied customers—which will be reflected in this application's Apdex score.

7. Explore and resolve clusters of errors

When you need to track down what causes errors in your app, it may not be easy to identify the cause. Using applied intelligence developed by New Relic, APM Error Profiles automatically compare one set of events to another. An error profile is a collection of attributes with significantly different traits compared to non-errors. New Relic displays pie charts that sort error attributes by the greatest deviation from the “norm.”
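
The idea of comparing errors against non-errors can be approximated with a simple frequency comparison. The sketch below is a simplified stand-in and not New Relic's actual algorithm: for one attribute, it computes how much more (or less) often each value appears among error events than among non-error events, and sorts by the size of that gap. The event dictionaries, attribute name, and values are hypothetical.

```python
from collections import Counter


def attribute_deviation(errors, non_errors, attribute):
    """Rank the values of `attribute` by the absolute difference between
    their share of error events and their share of non-error events."""
    if not errors or not non_errors:
        raise ValueError("both event lists must be non-empty")
    error_counts = Counter(event.get(attribute) for event in errors)
    ok_counts = Counter(event.get(attribute) for event in non_errors)
    deviations = {
        value: error_counts[value] / len(errors) - ok_counts[value] / len(non_errors)
        for value in set(error_counts) | set(ok_counts)
    }
    return sorted(deviations.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical events: most errors come from one transaction.
    errors = [{"transaction": "validate_coupon"}] * 4 + [{"transaction": "checkout"}]
    non_errors = (
        [{"transaction": "validate_coupon"}]
        + [{"transaction": "checkout"}] * 2
        + [{"transaction": "search"}] * 2
    )
    for value, gap in attribute_deviation(errors, non_errors, "transaction"):
        print(f"{value}: {gap:+.2f}")
```

A value with a large positive gap is over-represented among errors and is a reasonable place to start investigating.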

For backend errors, go to APM, select Error Analytics in the left nav, and then click on the Error Profile tab. In this example, the culprit is obviously a web transaction validating coupons, and now you can view the error stack trace, message, and the line of code from which the error was thrown.

New Relic APM Error Profiles > (select an app) > Error Analytics > Error Profile

As a quality customer experience increasingly relies on complex client-side logic, it’s important to quickly analyze and understand JavaScript errors. In New Relic Browser, select JS errors in the left navigation menu. Expand the details about JavaScript errors by clicking on an attribute. In this case, we've expanded the transaction names that are related to the errors:

New Relic JS Error Profiles > (select an app) > JS errors

Roughly half of the errors come from phone.jsp, so that is the place to start investigating. Then determine if you can safely ignore the error, or if you should resolve the error with code edits and a new deployment, or provide communication about the issue to your customers.

Now that you have dealt with performance outliers and clusters of errors, you are well on your way to optimizing your application so you can baseline it for trend analysis.

For more help

  • Error Rate: Watch for spikes or values changing under load

  • Throughput: Watch for high spikes or low dips