This guide walks you through making sure you have the telemetry needed for monitoring and optimizing your digital services. It's part of our series on observability maturity.
"Do I have all the telemetry I need to measure the service I'm providing to my customers?"
With service observability, there is little guidance on how developers can meaningfully contribute to production telemetry definitions. If you are a developer looking for practical advice on how to assess the quality of your telemetry, this guide is for you. Observability practices that link developer expectations with the runtime behavior of production systems are more effective at diagnosing and remediating aberrant conditions than those that don't. This more direct developer involvement produces services that are more robust and performant.
You're a good candidate for using this guide if any of the following are true:
- Your development teams are disconnected from production observability design.
- You have new services or capabilities that run in production before telemetry and alerting are fully established.
- You need to provide additional business context to your instrumentation to improve diagnosis and business KPI measurement.
- You employ a highly customized or proprietary software framework.
- Your service is under active development. Legacy services, and services built from commercial-off-the-shelf platforms, tend to be better served with generic instrumentation options.
This guide focuses on the metrics derived from your application's runtime operation (its code execution) as well as external measurements of execution (through synthetic testing). Service instrumentation planning is the approach used to describe a single service runtime through telemetry.
Modern monitoring systems provide deep insight into the technical details of service implementation. The power of distributed tracing, bytecode, or script instrumentation allows operations teams to quickly collect detailed service telemetry. Unfortunately, operations teams are often not in the best position to evaluate the quality of the telemetry gathered from that instrumentation. This challenge is compounded by the fact that service delivery teams are often asked to implement telemetry collection for the first time in live production systems.
Exposing inadequately instrumented services to production users for the purposes of refining that instrumentation creates a period that puts customer satisfaction at risk. This burn-in period often becomes difficult to escape as new features are delivered from code bases without a strong linkage between software delivery and observability programs.
Involving developers in instrumentation has the following benefits:
- Improved troubleshooting:
  - Good telemetry naming gives operations staff a common language to use with developers during incidents, reducing the time to triage and remediate.
  - More precise and contextually relevant telemetry from your service allows for more accurate and actionable detection of faults.
- Better informed development decisions by:
  - Detecting areas of volatility or unexpected behavior and addressing them.
  - Understanding what dependencies in your code lack redundancy or robustness, and taking measures to refactor the service.
  - Appreciating how end-user cohorts are employing your software. You can better understand where improvements will have the biggest impact.
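To make the points about naming and contextual relevance concrete, here is a minimal sketch in plain Python. It does not use any particular monitoring agent or SDK. The `<team>.<service>.<operation>` naming convention, the `TelemetryEvent` type, the `build_event` helper, and the attribute names (`customer_tier`, `region`) are all hypothetical, chosen only for illustration. The idea is that a developer-agreed name and a few business attributes travel with every measurement.

```python
import re
from dataclasses import dataclass, field

# Hypothetical convention: exactly three lowercase, dot-separated segments,
# read as <team>.<service>.<operation>.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*){2}$")


@dataclass
class TelemetryEvent:
    """One measurement plus the business context needed to interpret it."""
    name: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)


def build_event(name: str, duration_ms: float, **business_context) -> TelemetryEvent:
    """Reject names that break the convention, then attach business context."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"telemetry name {name!r} does not follow team.service.operation")
    return TelemetryEvent(name=name, duration_ms=duration_ms,
                          attributes=dict(business_context))


if __name__ == "__main__":
    # Illustrative values only; in a real service these would come from the request.
    event = build_event(
        "payments.checkout.authorize_card",
        182.5,
        customer_tier="enterprise",
        region="eu-west",
    )
    print(event)
```

In a real service, the event would be handed to whatever agent or SDK you already use. Rejecting malformed names at the point of creation is one possible way to keep the shared vocabulary consistent between developers and operations staff.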
Key performance indicators
It's important to identify some simple KPIs that help you gauge the ongoing improvements in your software delivery and operations programs. There are two main types of KPIs to consider as you invest in improved instrumentation.
- Business KPIs are aligned to your overall program objectives and should be consistently measured to demonstrate ongoing program improvement for each service.
- Practitioner KPIs are used to measure changes in the execution of job functions for those participating in the development and management of services.
We'll examine these in more detail below.
Business KPIs include:
Practitioner KPIs include: