In a DevOps world, a deep understanding of customer expectations and your progress in meeting those expectations is vital to providing stability, maintaining goodwill with your users, and increasing business value. With New Relic, you can measure the success of your DevOps initiatives and learn how to identify, reduce, and resolve gaps in your DevOps performance.
In this tutorial, you’ll gather key service delivery stakeholders to assess your team's progress against service level objectives (SLOs) and service level agreements (SLAs), and to identify further optimizations.
This tutorial assumes you’ve completed the Establish objectives and baselines tutorial.
1. Assemble a cross-functional team to review service delivery
The first (and most important) step is assembling the right team. Identify the stakeholders and representatives who will play active parts in the operations review process. This team should include individuals who develop applications, work on service delivery, maintain your ecosystem, and resolve problems for customers.
While operations review teams often focus on technical members, the best teams have broad representation across the company, including representatives from Business Operations, Marketing, and Support.
These cross-functional teams help ensure that the service delivery process stays closely aligned with customer expectations. If you can identify specifically how technical improvements meet customer expectations and improve the business’s bottom line, your operations team is functioning at its best.
Ideally, the cross-functional operations review team should also be the team that defines your SLOs. If this is not possible, try to ensure that some members of the operations review team are also on the team responsible for SLOs.
2. Review service records and note key metrics
Schedule a recurring meeting to review your service records. New Relic recommends reviewing previous service records and pinpointing specific metrics to analyze at every review.
For example, start with application state, alert conditions, and runtime anomalies. Compare the same metrics over two separate time periods to identify and assess patterns, inconsistencies, and anomalies.
Monitor these metrics using the service delivery Insights dashboards you created in the Establish objectives and baselines tutorial. The widgets on these dashboards provide a high-level overview of the relationships between different performance indicators and baselines.
For a thorough performance review, build several dashboards with corresponding widgets, and use them to hone in on two specific time periods you want to compare. This comparative analysis can cover anything from Infrastructure or Browser performance to Synthetics testing or business impact.
To get started with performance indicators in Insights, review the following example NRQL queries and consider incorporating their results into your operations review.
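If you export the values for two review periods (for example, by running the same dashboard queries with two different SINCE windows), a small script can flag which metrics moved enough to be worth discussing. The following is a minimal Python sketch under that assumption; the metric names, the sample values, and the 10% threshold are hypothetical. Within Insights itself, NRQL's COMPARE WITH clause can show a similar period-over-period comparison directly.

```python
from dataclasses import dataclass


@dataclass
class MetricChange:
    name: str
    previous: float
    current: float

    @property
    def percent_change(self) -> float:
        """Relative change from the earlier period to the later one, in percent."""
        if self.previous == 0:
            # Any movement away from zero is treated as an unbounded change.
            return float("inf") if self.current != 0 else 0.0
        return (self.current - self.previous) / self.previous * 100


def compare_periods(previous: dict[str, float],
                    current: dict[str, float],
                    threshold_pct: float = 10.0) -> list[MetricChange]:
    """Return metrics present in both periods that moved by more than threshold_pct."""
    flagged = []
    for name in sorted(previous.keys() & current.keys()):
        change = MetricChange(name, previous[name], current[name])
        if abs(change.percent_change) > threshold_pct:
            flagged.append(change)
    return flagged


if __name__ == "__main__":
    # Hypothetical values exported for two review periods.
    last_week = {"avg_page_load_s": 2.0, "error_rate_pct": 1.0, "uptime_pct": 99.9}
    this_week = {"avg_page_load_s": 2.6, "error_rate_pct": 1.05, "uptime_pct": 99.5}
    for c in compare_periods(last_week, this_week):
        # prints: avg_page_load_s: 2.0 -> 2.6 (+30.0%)
        print(f"{c.name}: {c.previous} -> {c.current} ({c.percent_change:+.1f}%)")
```

A relative threshold keeps the check meaningful across metrics with very different units; an absolute threshold per metric is an equally reasonable choice if your team prefers it.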
- Daily uptime
SELECT percentage(count(result), WHERE result = 'SUCCESS') FROM SyntheticCheck SINCE 1 day ago
Create dashboards dedicated to the functionality that drives your software development process, such as testing. For example, collect metrics on uptime, monitor types, geographic locations, and any other data points you need to measure SLAs appropriately.
- Device performance breakdown
SELECT count(*) as '# Pages',average(duration) as 'AVG',percentile(duration,50,75) as '%',average(duration - backendDuration) as 'Front',average(backendDuration) as 'Back',average(connectionSetupDuration) as 'Connection',average(domProcessingDuration) as 'DOM Processing',average(pageRenderingDuration) as 'Page Rendering' FROM PageView FACET deviceType LIMIT 3 SINCE 1 day ago
- Top URL performance
SELECT count(*) as '# Pages',average(duration) as 'AVG',percentile(duration,50,75) as '%',average(duration - backendDuration) as 'Front',average(backendDuration) as 'Back',average(connectionSetupDuration) as 'Connection',average(domProcessingDuration) as 'DOM Processing',average(pageRenderingDuration) as 'Page Rendering' FROM PageView FACET pageUrl SINCE 1 day ago LIMIT 30
- Memory usage
SELECT average(memoryUsedBytes)/1000000 AS 'Avg MB Used', average(memoryFreeBytes)/1000000 AS 'Avg MB Free', average(memoryUsedBytes/memoryTotalBytes)*100 AS 'Memory used %' FROM SystemSample SINCE 30 minutes ago
- Server CPU
SELECT average(cpuPercent) FROM SystemSample SINCE 3 hours ago FACET hostname LIMIT 400
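The Daily uptime query reports the share of synthetic checks whose result is 'SUCCESS'. When the team discusses that number against an SLO or SLA target, it can help to see the arithmetic spelled out. The Python sketch below reproduces the percentage from a list of check results and converts a target into the downtime it permits; the 99.9% target, the one-check-per-minute schedule, and the 'FAILED' result value are illustrative assumptions, and any value other than 'SUCCESS' is simply counted as a failure.

```python
def uptime_percentage(results: list[str]) -> float:
    """Percentage of check results equal to 'SUCCESS' (same ratio as the NRQL percentage() query)."""
    if not results:
        raise ValueError("no check results in the selected window")
    successes = sum(1 for r in results if r == "SUCCESS")
    return successes / len(results) * 100


def meets_target(results: list[str], target_pct: float) -> bool:
    """True if observed uptime is at or above the SLO/SLA target."""
    return uptime_percentage(results) >= target_pct


def allowed_downtime_minutes(target_pct: float, days: int) -> float:
    """Minutes of downtime a target permits over a window of `days` days."""
    return (1 - target_pct / 100) * days * 24 * 60


if __name__ == "__main__":
    # Hypothetical day: one check per minute (1,440 checks), two failures.
    day = ["SUCCESS"] * 1438 + ["FAILED"] * 2
    print(f"uptime: {uptime_percentage(day):.3f}%")        # uptime: 99.861%
    print("meets 99.9% target:", meets_target(day, 99.9))  # meets 99.9% target: False
    print(f"30-day allowance at 99.9%: {allowed_downtime_minutes(99.9, 30):.1f} min")  # 43.2 min
```

Two failed checks in a day look minor, but they already put this hypothetical service below a 99.9% target for that day, which is the kind of observation worth raising in the review.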
Combine the results into a single dashboard to help drive decisions during your operations review. Dashboards that give a brief overview of performance across your entire application stack are invaluable for cross-functional team reviews.
Beyond Insights, APM reports let you see how you’re performing on a daily, weekly, and monthly basis, with built-in SLA reports and other detailed reports. The out-of-the-box details from these reports give your operations team a starting point for conversations about progress against objectives and overall performance.
3. Summarize events related to application alerts, downtime, and errors
Now that you’ve gathered data about your service delivery, the next step is to dive deeper into specific incidents that warrant further investigation or surface a need for broader team action. For example, notable downtime and errors are key areas to investigate as a team.
After completing the Set up proactive alerting tutorial, you can explore notifications of violations as they occur. When a violation is brought to your attention, acknowledge the incident and keep an ongoing record of both open and closed incidents. Use the incident reports as a focal point for discussing gaps.
Ultimately, it’s important to solicit feedback from the cross-functional team on the causes of incidents so you can determine how to improve service delivery processes and prevent recurrences. One approach is to focus on one or two notable incidents, use the New Relic UI to walk through the data points leading up to them, and assess the actions taken. Summarize each incident and the team’s feedback on causes and potential solutions as succinctly as possible. Over time, you’ll notice patterns that require deeper action.
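An ongoing record of open and closed incidents is easier to review when it is summarized the same way at every meeting. The following minimal Python sketch assumes you have exported each incident with an open time and an optional close time; the `Incident` fields and sample timestamps are hypothetical and do not reflect the shape of any New Relic export.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    incident_id: str
    condition: str                         # alert condition that opened the incident
    opened_at: datetime
    closed_at: Optional[datetime] = None   # None while the incident is still open

    @property
    def is_open(self) -> bool:
        return self.closed_at is None


def summarize(incidents: list[Incident]) -> dict:
    """Count open and closed incidents and average the time to close the closed ones."""
    closed = [i for i in incidents if not i.is_open]
    durations = [i.closed_at - i.opened_at for i in closed]
    avg_close = sum(durations, timedelta()) / len(durations) if durations else None
    return {
        "open": len(incidents) - len(closed),
        "closed": len(closed),
        "avg_time_to_close": avg_close,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    log = [
        Incident("1", "High error rate", t0, t0 + timedelta(minutes=30)),
        Incident("2", "Synthetics failure", t0, t0 + timedelta(minutes=90)),
        Incident("3", "High CPU", t0),  # still open
    ]
    s = summarize(log)
    # prints: 1 open, 2 closed, avg time to close 1:00:00
    print(f"{s['open']} open, {s['closed']} closed, avg time to close {s['avg_time_to_close']}")
```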
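As incident summaries accumulate across reviews, tallying which alert conditions keep coming back is one simple way to surface the patterns that need deeper action. This Python sketch assumes each review's incidents are recorded as (review ID, condition) pairs; the review IDs, condition names, and the threshold of three reviews are hypothetical.

```python
from collections import Counter


def recurring_conditions(history: list[tuple[str, str]],
                         min_reviews: int = 3) -> list[tuple[str, int]]:
    """Return conditions seen in at least `min_reviews` distinct reviews, most frequent first.

    `history` holds (review_id, condition) pairs; repeats within one review count once.
    """
    # Deduplicate so a noisy condition within a single review does not dominate the tally.
    per_review = set(history)
    counts = Counter(condition for _review, condition in per_review)
    # Sort by count (descending), then by name, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cond, n) for cond, n in ranked if n >= min_reviews]


if __name__ == "__main__":
    history = [
        ("2024-W01", "High error rate"), ("2024-W01", "High error rate"),
        ("2024-W01", "High CPU"),
        ("2024-W02", "High error rate"),
        ("2024-W03", "High error rate"), ("2024-W03", "High CPU"),
    ]
    print(recurring_conditions(history))  # [('High error rate', 3)]
```

Counting distinct reviews rather than raw alerts is a deliberate choice here: a condition that fires once in each of several reviews usually says more about a systemic gap than one that fires many times during a single bad week.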
4. Create follow-up tracking tickets
With built-in or customizable integrations with ServiceNow and other common ticketing systems, New Relic helps you follow up on anomalies and performance shortfalls as they arise. Add the information provided by New Relic to the tracking system, and ensure that the team charged with solving the problem has all of the details they need to track down and resolve the issue.
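Whichever ticketing system you use, the aim is that each ticket carries enough context for the owning team to act without re-investigating. The following Python sketch assembles such a ticket from review details as a plain data object; the field names, the priority rule, the team name, and the link are hypothetical placeholders. Submitting the ticket to ServiceNow or another system would go through that system's own API or New Relic's integration, which this sketch does not call.

```python
from dataclasses import dataclass, field


@dataclass
class FollowUpTicket:
    title: str
    owner_team: str
    description: str
    links: list[str] = field(default_factory=list)
    priority: str = "medium"


def ticket_from_incident(condition: str, summary: str, owner_team: str,
                         links: list[str], customer_impact: bool) -> FollowUpTicket:
    """Build a follow-up ticket that carries the context gathered in the operations review."""
    description = "\n".join([
        f"Alert condition: {condition}",
        f"Review summary: {summary}",
        "Evidence:",
        *[f"- {url}" for url in links],
    ])
    return FollowUpTicket(
        title=f"[Ops review] {condition}",
        owner_team=owner_team,
        description=description,
        links=list(links),
        # Hypothetical rule: anything the review judged customer-facing is high priority.
        priority="high" if customer_impact else "medium",
    )


if __name__ == "__main__":
    ticket = ticket_from_incident(
        condition="High error rate",
        summary="Errors rose after Tuesday's deploy; rollback resolved it.",
        owner_team="checkout-team",
        links=["https://example.com/incident/123"],
        customer_impact=True,
    )
    print(ticket.title)     # [Ops review] High error rate
    print(ticket.priority)  # high
```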