This guide walks you through establishing KPIs and using New Relic to improve and optimize the quality and cadence of your code deployments. It's part of our series on observability maturity.
A development team is ultimately measured by the frequency and success of their releases. Teams that release too slowly won't be able to keep pace with business demands and innovation. Teams that create too many unsuccessful releases will have a negative impact on customer satisfaction, revenue, and overall system stability.
Google's DevOps Research and Assessment (DORA) team has identified four key metrics that indicate the performance of a software development organization. Our Innovation and Growth value driver uses those metrics to create an overall program that drives more efficient and responsive development teams, along with more reliable and performant applications. This Release quality (RelQual) guide contributes to that program by driving deployment frequency, application performance, and application reliability.
You will use this Release Quality guide's processes to increase the frequency of deployments while reducing their size and scope. This will result in a faster time for an idea to be implemented in production code, while also reducing the impact (or blast radius) of software defects.
In addition, the RelQual process will be used to identify performance bottlenecks and sources of errors and feed them back into the development process so they can be resolved. This will increase reliability and customer satisfaction, while driving down infrastructure costs.
RelQual's overall goal is to create a continuous cycle of performance improvement and to drive a pattern of more releases with fewer changes. By doing this, you'll reduce the risks to business reputation and earnings and increase application performance and reliability.
Key concepts include:
One of the central themes of New Relic's observability maturity practice is "Communicate, remediate, innovate." RelQual supports that theme by enabling you to communicate the current state of your development practices and applications to stakeholders using specific KPIs. You will then use those KPIs to adjust your development practices and to identify slow and unreliable application components so they can be fixed in subsequent development sprints. Finally, you will use those KPIs to accelerate your development practices, freeing up additional time to innovate.
Trunk-based development is a practice identified by the DevOps Research and Assessment (DORA) organization as a key capability that drives faster delivery and higher organizational performance. It's a required practice for CI/CD.
In short, trunk-based development divides development work into small batches which are performed against branches of a single trunk. As soon as the batch of work is completed, the branch is merged back into the trunk. Each branch is expected to have a short lifetime, thus making merges back into the trunk simple and ensuring every developer is working from a recent release of the code base.
RelQual drives the adoption of this DevOps best practice and ensures that you adhere to it. By doing so, you will accelerate the delivery of code into production.
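One way to check adherence to the short-lived branch rule is to measure how long each merged branch stayed open. The following is a minimal sketch, not a prescribed method: the `Branch` fields, the sample data, and the two-day threshold are all assumptions made for illustration, and in practice the timestamps would come from your source repository's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Branch:
    """A merged branch. Field names are illustrative, not from any specific tool."""
    name: str
    created_at: datetime
    merged_at: datetime


def long_lived_branches(branches, max_lifetime=timedelta(days=2)):
    """Return (name, lifetime) pairs for branches open longer than max_lifetime.

    The two-day default is an assumed threshold; pick one that fits your team.
    """
    flagged = []
    for branch in branches:
        lifetime = branch.merged_at - branch.created_at
        if lifetime > max_lifetime:
            flagged.append((branch.name, lifetime))
    # Longest-lived first, so the worst offenders are reviewed first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data.
branches = [
    Branch("fix-login", datetime(2024, 3, 4, 9), datetime(2024, 3, 4, 16)),
    Branch("new-checkout", datetime(2024, 3, 1, 9), datetime(2024, 3, 12, 9)),
    Branch("update-deps", datetime(2024, 3, 5, 9), datetime(2024, 3, 8, 9)),
]

flagged = long_lived_branches(branches)
for name, lifetime in flagged:
    print(f"{name}: open for {lifetime.days} days")
```

A falling count of flagged branches over successive sprints suggests that work is being split into smaller batches.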
The RelQual use case works at the level of the IT service boundary. By measuring the service at the boundary, we get a picture of what's happening upstream from it.
As you may have seen, the service level management guide in our Observability maturity series uses the service boundary concept to measure the response time and error rate of the service and its dependencies. RelQual uses the same concept to measure the impact that your development practices have on the service and then to improve your development team's responsiveness, ability to innovate, and application stability as the IT service evolves.
You'll use the release quality process to collect and measure the following KPIs:
- Release rate
- Release size/scope
- App responsiveness and error rate
- Production impact
- Support ticket volume
- Infrastructure costs
- Observability coverage
Detailed information on each metric follows.
Before you begin, if you don't have equivalent experience, you should complete the New Relic University (NRU) Overview Course. You should also have a basic understanding of:
As with any continuous improvement process, the first step of RelQual is to establish the current state of your KPIs. To do so, you'll perform the following tasks:
- Identify the application
- Gather the required KPIs
- Implement the RelQual dashboards
- Establish a baseline
- Perform initial enablement
- Start continuous improvement process
These are explained in more detail below.
The first step is for you to identify the application or applications that are in-scope for the first iterations of the RelQual process. Applications that are good candidates for inclusion in the RelQual process are ones that:
- Are under active development
- Are a key operational service
- Have slow development cycles
- Have a track record of failed deployments
The success of the release quality use case depends heavily on its key performance indicators. Start by gathering the KPIs, as defined, from definitive sources such as your CI/CD platform, source repository, and observability solution. Once you identify the sources for your KPIs, you'll need to identify methods for extracting them and importing them into the New Relic platform.
The KPIs and the minimum attributes required by this use case are listed in the key performance indicators section. Typically, you'll use your development toolchain's APIs to extract the KPIs and their attributes and then submit them to New Relic using the custom events API.
Prior to starting any custom integration work, you should determine if any out-of-the-box integrations exist that meet your goals.
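If no existing integration fits, a custom submission might look like the sketch below. It builds a deployment event and prepares an HTTP POST to the Event API without sending it. The event type `RelQualDeployment`, the attribute names, the account ID, and the key are hypothetical placeholders. The endpoint shown is assumed to be the US-region Event API host and the `Api-Key` header is assumed to carry a license key, so verify both against the current Event API documentation before relying on them.

```python
import json
import urllib.request

# Assumption: US-region Event API endpoint; other regions use a different host.
EVENT_API_URL = "https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"


def build_deployment_event(app_name, version, commit_count, lines_changed, result):
    """Build one custom event as a flat dict of strings and numbers.

    "RelQualDeployment" and every attribute name here are hypothetical.
    """
    return {
        "eventType": "RelQualDeployment",
        "appName": app_name,
        "version": version,
        "commitCount": commit_count,
        "linesChanged": lines_changed,
        "result": result,  # for example "success" or "failure"
    }


def build_request(account_id, license_key, events):
    """Prepare (but do not send) the POST request that carries the events."""
    return urllib.request.Request(
        EVENT_API_URL.format(account_id=account_id),
        data=json.dumps(events).encode("utf-8"),
        headers={"Content-Type": "application/json", "Api-Key": license_key},
        method="POST",
    )


event = build_deployment_event("checkout-service", "1.42.0", 3, 120, "success")
request = build_request("1234567", "YOUR_LICENSE_KEY", [event])

# To actually submit the event, uncomment the next line:
# urllib.request.urlopen(request, timeout=10)
```

Keeping request construction separate from sending makes the payload easy to inspect and test inside a CI/CD job before any data reaches your account.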
The RelQual dashboards are the primary drivers of the quality improvement process. They'll show KPIs and trends so you can identify and prioritize your improvement efforts.
Sample RelQual dashboards can be found in our observability maturity resource center on GitHub.
Some of the information you present in the dashboards is dependent on your development toolchain, so your dashboard may need to be modified from the sample.
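When you adapt the sample dashboards, the underlying queries will depend on how you named your events. As a rough illustration only, the sketch below generates NRQL for two of the KPIs, release rate and release size. It assumes deployments are stored as custom events of a hypothetical type `RelQualDeployment` with hypothetical `appName` and `linesChanged` attributes; the sample dashboards on GitHub may use different names.

```python
# Assumption: deployments are recorded as custom events with these
# hypothetical names. Adjust them to match your own event schema.
EVENT_TYPE = "RelQualDeployment"


def release_rate_query(app_name, since="3 months ago"):
    """NRQL for the number of deployments per week for one application."""
    return (
        f"SELECT count(*) FROM {EVENT_TYPE} "
        f"WHERE appName = '{app_name}' "
        f"SINCE {since} TIMESERIES 1 week"
    )


def release_size_query(app_name, since="3 months ago"):
    """NRQL for the average lines changed per deployment, week by week."""
    return (
        f"SELECT average(linesChanged) FROM {EVENT_TYPE} "
        f"WHERE appName = '{app_name}' "
        f"SINCE {since} TIMESERIES 1 week"
    )


rate_nrql = release_rate_query("checkout-service")
size_nrql = release_size_query("checkout-service")
print(rate_nrql)
print(size_nrql)
```

Plotting both queries side by side makes it easier to see whether release rate is rising while release size is falling.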
The RelQual process requires a baseline so you can perform the initial enablement. Your baseline should include a representative sample of your release activity. This should include at least one complete release cycle, but may require more cycles to get a valid baseline.
Your baseline collection and evaluation cycle should be aligned with your Agile sprints, and you should periodically ensure that RelQual event data is accumulating as expected.
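To make the idea of a baseline concrete, the following sketch summarizes a small, made-up set of deployments into two baseline figures: average releases per week and median release size. The sample data and the choice of lines changed as the size measure are assumptions. Note that this simple version averages only over weeks that had at least one release.

```python
from collections import Counter
from datetime import date
from statistics import mean, median

# Hypothetical sample: (deployment date, lines changed).
deployments = [
    (date(2024, 3, 4), 420),
    (date(2024, 3, 6), 180),
    (date(2024, 3, 13), 950),
    (date(2024, 3, 19), 300),
    (date(2024, 3, 21), 150),
    (date(2024, 3, 22), 200),
]


def compute_baseline(deployments):
    """Summarize release rate and release size for a set of deployments."""
    # Group by ISO (year, week) so that weeks are comparable across the sample.
    # Weeks with no releases do not appear here and so are not averaged in.
    per_week = Counter(day.isocalendar()[:2] for day, _ in deployments)
    sizes = [size for _, size in deployments]
    return {
        "releases_per_week": mean(per_week.values()),
        "median_lines_changed": median(sizes),
    }


baseline = compute_baseline(deployments)
print(baseline)
```

The median is used for release size because a single unusually large release would otherwise distort the baseline.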
In this stage, you'll introduce development teams and other stakeholders to the baseline RelQual data and the ongoing continuous improvement process you will be following.
The process consists of four activities:
- Introduce the concepts of trunk-based development. You and the stakeholders will review the core concepts of trunk-based development, identify where your current practices differ, and then create strategies to implement it.
- Review your release KPIs and trends. You'll review the release rate and release size/scope KPIs to ensure you're making progress towards implementing trunk-based development. Your goal is to increase your release rate while reducing the size/scope of new releases.
- Review your application KPIs and trends. Here, you'll review your application's performance and error KPIs to identify and prioritize your efforts towards improving application reliability and performance.
- Make technical recommendations. Here, you and the relevant stakeholders will identify and review technical recommendations, such as making changes to your release workflows or observability strategies.
You can use the session template presentation to keep this part of the RelQual process organized.
This is the ongoing phase of the continuous improvement process. During this phase you'll repeat the initial enablement steps to review your progress against your baseline and adjust your strategies and tactics so you deliver the required positive business outcome.
Each cycle of the improvement process should occur after several iterations of your release process. As an alternative, you can schedule evaluations at the mid-point and end of each Agile sprint.
During this phase you should:
- Report your KPIs to upper management to ensure that the stakeholder teams are appropriately prioritizing the work, and to show that progress is being made towards the promised business outcomes.
- Record and retain your weekly KPIs over periods of months to years to establish a baseline and to show the rate of improvement.
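The rate of improvement can be expressed as the percentage change of each retained weekly KPI relative to the baseline. This is a minimal sketch with made-up numbers. It treats the first recorded week as the baseline, which is an assumption; you may prefer to use the baseline you established earlier.

```python
def percent_change(baseline, current):
    """Percentage change of a KPI relative to its baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percentage change")
    return (current - baseline) / baseline * 100.0


# Hypothetical weekly release rates, oldest first.
weekly_release_rate = [2, 2, 3, 3, 4]

# Assumption: the first recorded week serves as the baseline.
baseline_rate = weekly_release_rate[0]
improvement = [percent_change(baseline_rate, rate) for rate in weekly_release_rate]
print(improvement)
```

For KPIs where lower is better, such as release size or error rate, a negative percentage change indicates improvement.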
Keep in mind that this is a continuous improvement process. You'll continue to collect and evaluate the KPIs over the long term to ensure you're meeting your RelQual goals.
Once your implementation of the RelQual process becomes mature, you'll see a reduction in the business and technical risks associated with application deployments and improved application performance. This will result in improvements to customer satisfaction, reduced overhead costs, and an improvement in earnings potential.
If you haven't already done it, we also recommend reviewing and using the Development quality guide.