Observability Center of Excellence: Create your observability team

This guide focuses on how to create a Center of Excellence around observability best practices. Adopting an Observability Center of Excellence (OCoE) will help you drive standards, speed, and scale across your organization, and maximize the return on the investment in observability solutions. This is part of our series on observability maturity.

What is a Center of Excellence?

A center of excellence (CoE) is a team of dedicated individuals from multiple functional areas of an organization. Their task is to lead the way in exploring and adopting tools, techniques, and practices for the organization.

CoEs exist around a host of disciplines, including security, applications, cloud services, and automation.

Why build an Observability CoE?

Creating an Observability Center of Excellence (OCoE) delivers benefits in three key areas: adoption, implementation, and expertise.

Adoption:

  • Leads to wider adoption of observability practices, tools, and processes
  • Reduces startup time for new teams
  • Breaks down skill silos

Implementation:

  • Maintains standards and provisions frameworks for implementation
  • Creates a supported framework for enablement, learning, and training
  • Keeps observability fresh, new, and part of the everyday

Expertise:

  • Encourages collaboration and sharing of successes and best practices
  • Enables practitioners to be supported by their peers
  • Helps prevent expertise decay

Successful implementation of an OCoE helps organizations to reduce risk, drive operational efficiencies, and improve their service delivery. This in turn helps businesses to achieve their goals and objectives as their applications and services evolve to meet changing demands.

Core concepts

An Observability Center of Excellence primarily consists of three core elements: the framework, key stakeholders, and the operating model.

Framework

Key stakeholders

The OCoE key stakeholders are the primary consumers of, and contributors to, the OCoE itself. They're responsible for:

  • Creating, maintaining, and supporting the OCoE
  • Selecting the standards and best practices to be included in the OCoE

There are three types of stakeholders (these may be known by other names in your organization): the core team, council, and guild.

Stakeholder 1: Core team

This is the central team responsible for maintaining the OCoE.

Their responsibilities include:

  • Maintaining a relationship with New Relic
  • Administration and coordination of accounts and users
  • Onboarding of new teams and individuals
  • Maintenance of resources and knowledge base
  • Promotion of collaboration and sharing amongst teams

Typically this would be a team of 1-3 people with a background in observability and strong familiarity with New Relic. On average, they might work 2-3 hours per week on the OCoE alongside their regular role.

Stakeholder 2: Council

The council stakeholder is a group of leaders responsible for selecting or producing, ratifying, and promoting observability standards across the organization.

Their responsibilities include:

  • Thought leadership and direction
  • Defining and agreeing on standards and tooling
  • Ensuring implementation quality

The size of this team will vary based upon your organization's size and structure. It will consist of individuals from across the organization who have a vested interest in ensuring observability standards are implemented and maintained across the business. Ideal candidates for this function include those with leadership roles in operations, development, and support. This is an advisory role that does not require a significant commitment of time.

Stakeholder 3: Guild

The guild stakeholder is a cross-functional group of individuals across the organization who have experience with and passion for observability and tooling, and who are willing to help others. They are the heroes for your OCoE and the main contributors to supporting others and generating content for it.

Their responsibilities include:

  • Answering questions from others
  • Contributing to the knowledge base
  • Demonstrating best practice
  • Fostering an environment of self-help and collaboration
  • Attending New Relic technical enablement workshops and events

Guild members are not appointed; they volunteer based upon their passion for and expertise in observability. The number of guild members in any organization will vary, and generally increases as the adoption of observability best practices increases across your teams. Being part of the guild becomes a normal part of their day job, and it helps both them and others to benefit from the OCoE.

Operating model

Having covered the entities that make up your OCoE and the key stakeholders who interact with it, the last core element is the operating model, which brings all the elements together. The model is covered in more detail later in this guide.

The creation and operation of an OCoE breaks down into two key stages:

  • Stage 1: Creating the OCoE framework that will support the adoption of the selected observability best practices.
  • Stage 2: The ongoing maintenance and support of the OCoE to ensure that the best practices and standards, along with the content and enablement materials, are kept up to date and relevant.

Stage 1: Ramp up

The ramp-up stage for an OCoE involves introducing the concepts behind the initiative, running a workshop to understand the objectives and primary stakeholders, selecting supporting technologies and the best practices to adopt, and then seeding your OCoE with the assets to support them.

Steps

Collaboration channels

Having determined the collaboration tool of choice for your OCoE, we recommend setting up a regular clinic and collaboration channels.

New Relic clinic

A New Relic clinic is a regular open session for the OCoE core team to answer questions, share information, and address issues with the practitioners.

A typical agenda for this session would be:

  • Help and support
  • Platform updates
  • Best practice sharing
  • Demos and examples

Typical attendees would be:

  • Core team (coordinators)
  • Guild heroes
  • Practitioners

Messaging channels

  • #help-newrelic: A channel for all users of New Relic to ask questions, share knowledge, and get help.

    Typical members of this channel would be:

    • All observability practitioners
    • Guild heroes (providing active support)
    • Core team (providing active support)
  • #core-newrelic: A channel for the core team and New Relic CSS to freely communicate and escalate issues and problems.

    Typical members of this channel would be:

    • OCoE core team
    • Guild heroes (possibly)
    • New Relic CSM / SA / TAM

Knowledge base

The knowledge base provides a single destination for practitioners to find standards, configuration, examples, code snippets, quick start guides, and other related material that is focused on your organization's configuration of New Relic and implementation of observability best practices. It provides links out to New Relic documentation and learning tools to support self-paced enablement and onboarding.

To seed your knowledge base, New Relic provides a starter kit for building your own wiki-like knowledge base, intended to jump-start your OCoE. The wiki skeleton consists of a number of Markdown pages that cover many areas of the knowledge base, from account information to quick start reference guides.

You can find the knowledge base on GitHub, where you can clone and customize the content to suit your needs.

Stage 2: Ongoing support

Once the framework of your OCoE is up and running, it needs to be maintained to keep the content current and relevant to your objectives. To achieve this, New Relic provides ongoing support in four key areas to help you get the most value from your OCoE.

Conclusion

By implementing some or all of the guidance outlined in this document, you will have put in place the key components to create a collaborative culture around your observability practice: an environment that fosters passion and develops strong expertise around observability, while also delivering business benefits aligned to improving:

  • Operational efficiency
  • Uptime
  • Performance and reliability
  • Customer experience
  • Innovation and growth

The OCoE helps your organization develop and retain the skills and best practices that are essential to developing and operating today's complex modern application architectures.

Copyright © 2023 New Relic Inc.
