Baseline your data ingest

Data ingest governance is the practice of getting optimal value from the telemetry data an organization collects. This is especially important for a complex organization with numerous business units and working groups. This is the second part of a four-part guide to optimizing your New Relic data ingest.

For this stage of your data ingest governance practice, you need a high-level view of all of the telemetry your organization currently generates. This unit focuses on breaking down ingest stats into groups such as account, telemetry type, and application. These figures will inform the Optimize your ingest data and Forecast your ingest data stages.

You'll learn how to generate a structured breakdown report for the following dimensions:

  • Organization
  • Specific accounts in your organization
  • Billable telemetry type

In addition you'll learn how to create highly granular breakdowns including:

  • Application (APM | browser | mobile)
  • Kubernetes cluster
  • Infrastructure integration

Desired outcome

Understand exactly which groups within your organization are contributing which types of data and how much.

Prerequisites

Process

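Here are the major steps in this data ingest governance improvement procedure:

  • Install the data ingest governance baseline dashboard
  • Add ingest target indicators to your dashboard
  • Generate a tabular 30-day ingest report
  • Customize your report
  • Detect ingest anomalies
  • Install the entity breakdown dashboard (optional)
  • Install the cloud integration dashboard (optional)

We'll describe these steps in more detail below.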

Install the data ingest governance baseline dashboard

To install the dashboard:

  1. Navigate to the data ingest governance quickstart.
  2. Click Install this quickstart in the upper right portion of your browser window.
  3. If applicable: select your primary or top-level account in the account dropdown.
  4. Click Done.
  5. When the quickstart is done installing, open the Data ingest governance baseline dashboard.

That will bring you to the newly installed dashboard.

Dashboard overview

The main overview tab shows a variety of charts including some powerful time series views.

Organization wide baseline ingest time series

The second tab provides a baseline report by sub-account and usage metric.

Organization wide baseline tabular view

The remaining tabs provide detailed views of specific telemetry types such as browser data, APM data, logs, and traces. For example, this screenshot shows the browser detail page:

Example of an ingest detail focused on a single telemetry type (in this case browser data).

Detail tabs include:

  • APM: ApmEventsBytes
  • Tracing: TracingBytes
  • Browser: BrowserEventsBytes
  • Mobile: MobileEventsBytes
  • Infra (host): InfraHostBytes
  • Infra (process): InfraProcessBytes
  • Infra (integration): InfraIntegrationBytes
  • Custom events: CustomEventsBytes
  • Serverless: ServerlessBytes
  • Pixie: PixieBytes

Add ingest target indicators to your dashboard

In the prerequisites section we discussed the concept of a monthly usage target. You may actually have several targets to help keep you on track:

  • An overall organizational target on daily rate or monthly ingest.
  • Targets per data type to ensure the optimal breakdown (for example 1 TB per day for logs and 2 TB per day for metrics).
  • Targets for specific sub-accounts or business units.

In our example, an organization targets a total organizational ingest of less than 360 TB per month. This is a new target, set after reducing ingest from over 20 TB per day (600 TB per month).

To make the target easier to measure against, we added a threshold line to the chart by adding the static value 360000 (360 TB expressed in GB) to our SELECT statement.

SELECT 360000, rate(sum(GigabytesIngested), 30 day) AS '30 Day Rate' FROM NrConsumption WHERE productLine='DataPlatform' SINCE 30 days ago LIMIT MAX COMPARE WITH 1 month ago TIMESERIES 7 days

We can use NRQL to render a line representing our 30-day ingest target.

We can also apply a daily rate target line. Dividing 360000 by 30 gives 12000, which we'll use as our daily rate target. Update the Daily ingest rate (compare with 3 months prior) chart:

SELECT 12000, rate(sum(GigabytesIngested), 1 day) AS avgGbIngestTimeseries FROM NrConsumption WHERE productLine='DataPlatform' TIMESERIES AUTO SINCE 9 months ago LIMIT MAX COMPARE WITH 3 months ago

We can use NRQL to render a line representing our daily ingest target.
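The arithmetic behind the two threshold values can be sketched in a few lines of Python. This is only an illustration: the function name `ingest_targets` is hypothetical, and it assumes decimal units (1 TB = 1000 GB) and a 30-day month, as the 360000 and 12000 figures above imply.

```python
# Hypothetical helper, not part of any New Relic tooling.
GB_PER_TB = 1000  # assumption: decimal units, so 360 TB -> 360000 GB


def ingest_targets(monthly_tb, days=30):
    """Return (monthly target in GB, daily rate target in GB) for a monthly TB target."""
    monthly_gb = monthly_tb * GB_PER_TB
    return monthly_gb, monthly_gb / days


monthly_gb, daily_gb = ingest_targets(360)
print(f"30-day threshold: {monthly_gb} GB, daily threshold: {daily_gb:.0f} GB")
```

The same calculation could be repeated for per-data-type or per-account targets to get the static values for their own threshold lines.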

Generate a tabular 30-day ingest report

To create a 30-day ingest report:

  1. Open the previously installed data ingest governance baseline dashboard.
  2. Click on the Baseline report tab.
  3. Click on ... in the upper right of the "Last 30 days" table and choose Export as CSV.
  4. Import the CSV into Google Sheets, or the spreadsheet of your choice.

Alternatively, if you didn't install the dashboard, you may simply use this query to create a custom chart in the query builder:

SELECT sum(GigabytesIngested) AS 'gb_ingest_30_day_sum', rate(sum(GigabytesIngested), 1 day) AS 'gb_ingest_daily_rate', derivative(GigabytesIngested, 90 day) AS 'gb_ingest_90_day_derivative' FROM NrConsumption WHERE productLine='DataPlatform' SINCE 30 days ago FACET consumingAccountName, usageMetric LIMIT MAX

Below is an example of a sheet we imported into Google Sheets.

A spreadsheet exported from the baseline dashboard tabular page

The screenshot shows the table sorted by 30-day ingest total.

Feel free to adjust your timeline and some of the details as needed. For example, we chose to extract a 90-day derivative to have some sense of change over the past few months. You could easily alter the time period of the derivative to suit your objectives.
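If you prefer to process the exported report with a script instead of a spreadsheet, the following Python sketch ranks rows by 30-day ingest and adds each row's share of the total. It assumes the CSV column headers match the facet names and aliases in the query above, which may not hold for your export. The sample rows are made-up values for illustration only, and `top_contributors` is a hypothetical helper name.

```python
import csv
import io

# Made-up rows for illustration; a real file would come from "Export as CSV".
# Assumption: column headers match the facet names and aliases in the query.
SAMPLE_CSV = """consumingAccountName,usageMetric,gb_ingest_30_day_sum,gb_ingest_daily_rate
Account A,ApmEventsBytes,9000,300
Account B,BrowserEventsBytes,3000,100
Account A,InfraHostBytes,6000,200
"""


def top_contributors(csv_text, n=3):
    """Return the n largest (account, metric, GB, percent of total) rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["gb_ingest_30_day_sum"] = float(row["gb_ingest_30_day_sum"])
    total = sum(row["gb_ingest_30_day_sum"] for row in rows)
    ranked = sorted(rows, key=lambda r: r["gb_ingest_30_day_sum"], reverse=True)
    return [
        (
            r["consumingAccountName"],
            r["usageMetric"],
            r["gb_ingest_30_day_sum"],
            round(100 * r["gb_ingest_30_day_sum"] / total, 1),
        )
        for r in ranked[:n]
    ]


for account, metric, gb, pct in top_contributors(SAMPLE_CSV):
    print(f"{account:<12}{metric:<22}{gb:>10.0f} GB {pct:>6.1f}%")
```

To use it on a real export, read the file contents into a string and pass that string in place of `SAMPLE_CSV`.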

Customize your report

Add useful columns to your report to facilitate other phases of data ingest governance, such as Optimize and Forecast. The following fields will help guide optimization and planning decisions:

  • Notes: Note any growth anomalies and any relevant explanations for them. Indicate any major expected growth if foreseen.
  • Technical contact: Name of the manager of a given account or someone related to a specific telemetry type.

Detect ingest anomalies

Here are some steps for detecting ingest anomalies.

Alert on ingest anomalies

Use this ingest alerts guide to make sure that an increase in data consumption doesn't catch you by surprise. At a minimum, create:

  • A threshold alert to notify if you exceed monthly targets for data ingest beyond seasonal increases
  • A baseline alert to notify you of a sudden sharp increase in ingest data
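To illustrate the difference between the two alert types, here is a simplified Python sketch. It is not how New Relic evaluates alert conditions: the baseline check below swaps in a plain mean plus standard deviation rule over a trailing window, and the function names, window size, and sample values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def exceeds_threshold(daily_gb, target_gb):
    """Static threshold check: is the latest daily ingest above the target?"""
    return daily_gb[-1] > target_gb


def deviates_from_baseline(daily_gb, window=7, sigmas=3.0):
    """Simplified baseline check: is the latest value far above the
    mean of the preceding `window` days (mean + sigmas * stdev)?"""
    history = daily_gb[-(window + 1):-1]
    if len(history) < 2:
        return False  # not enough history to estimate variation
    return daily_gb[-1] > mean(history) + sigmas * stdev(history)


# Made-up daily ingest values in GB; the last day jumps sharply.
daily = [10000, 10200, 9900, 10100, 10050, 9950, 10000, 11500]

print(exceeds_threshold(daily, 12000))   # still under the 12000 GB daily target
print(deviates_from_baseline(daily))     # but well above recent behavior
```

In this sample, the last day stays under the daily target yet stands out against the previous week, which is the kind of change a threshold alert alone would miss.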

In addition to using alerts to identify consumption anomalies, you can use New Relic Lookout to explore potential ingest anomalies.

Lookout view

Lookout allows you to provide nearly any NRQL query and it will search for anomalies over a given period of time. The view below is based on this query:

SELECT rate(sum(GigabytesIngested), 1 day) AS avgGbIngest FROM NrConsumption WHERE productLine='DataPlatform' FACET usageMetric

We can use Lookout to find anomalies in our ingest by usageMetric.

Change the facet field to consumingAccountName to get this view:

We can use Lookout to find anomalies in our ingest by consumingAccountName.

Install the entity breakdown dashboard (optional)

In a previous section you installed the ingest baseline dashboard, which uses NrConsumption as its primary source. In addition to that high-level view, you can create other visualizations that use bytecountestimate() to estimate ingest for nearly any event or metric. A detailed overview of bytecountestimate() was provided in the prerequisites section.

To install the entity breakdown dashboard:

  1. Go to the same quickstart you used for the baseline dashboard.

  2. Click Install this quickstart in the upper right section of your browser window.

  3. Use the import dashboard function to install it into any account that contains APM, browser monitoring, mobile monitoring, or Kubernetes clusters. You can install this dashboard into multiple accounts. If you have a partnership, don't install it into a partnership owner account (POA). If you have a parent/child account structure, you can install it into a parent account and modify it so that account-specific charts all appear in one dashboard.

  4. Click Done.

  5. When the quickstart is done installing, open the Data governance entity breakdowns dashboard.

    The entity breakdown dashboard uses bytecountestimate() to facet ingest by useful attributes such as application or cluster name.

You can refer back to this section to see exactly which event types are used in these breakdowns.

Tip

These queries consume more resources because they don't work from a pre-aggregated data source like NrConsumption. You may need to adjust the time frames by using additional WHERE and LIMIT clauses to make them work better in some of your environments.

Install the cloud integration dashboard (optional)

New Relic's cloud integrations can often be a significant source of data ingest growth. Without good visualizations it can be very difficult to pinpoint where the growth is coming from. This is partly because these integrations are so easy to configure and they aren't part of an organization's normal CI/CD pipeline. They may also not be part of a formal configuration management system.

Fortunately this powerful set of dashboards can be installed directly from New Relic Instant Observability.

Individual dashboards installed by this package include:

  • AWS integrations
  • Azure integrations
  • Google Cloud Platform integrations
  • On-host integrations
  • Kubernetes

This quickstart contains a highly granular set of dashboards breaking down data by nearly every cloud integration, on-host integration, and the Kubernetes integration.

Exercise

Answering the following questions will help you develop confidence in your ability to interpret baseline data and make correct inferences. These questions can be answered using the data ingest baseline and data ingest entity breakdown dashboards. Install those dashboards as described and see how many of these questions you can answer.

Questions
  1. What is the typical daily ingest rate for the entire organization (all accounts) in the past week? What was it three months prior?
  2. What are the top three telemetry types (for the organization as a whole) by ingest? List each telemetry type and its most recent 30-day ingest rate.
  3. How many accounts contribute to this organization's ingest?
  4. How many accounts (if any) currently contribute more than 50 TB per month?
  5. What are the top three accounts in terms of ingest for the past 30 days?
  6. What is the GB ingest for the calendar month of this past January for the highest consuming account?
  7. What are the top three accounts in terms of ApmEventsBytes ingest for the past 30 days?
  8. What is the single largest increase in terms of telemetry type ingest for a given account in the last 9 months? What about decreases?
  9. Go to the account that contributes the most ApmEventsBytes and install/open the data governance entity breakdown dashboard. List the top three APM applications by ingest for the past 24 hours and their respective 24-hour ingest rates.

Conclusion

The process section took you through the creation of data ingest visualizations and reports. You can now review data ingest with a data-driven, visual approach that you and your peers can use to collaborate.

Going forward, decide which visualizations to use for:

Additional resources

Other related resources include:

Copyright © 2022 New Relic Inc.
