Data ingest governance is the practice of getting optimal value for the telemetry data collected by an organization. This is especially important for a complex organization that has numerous business units and working groups. This is the second part of a four-part guide to optimizing your New Relic data ingest, and is part of our series on observability maturity.
Before you start
This guide contains detailed recommendations for optimizing your data ingest. Before using this guide, we recommend you review our general data management docs.
About this stage
For this stage of your data ingest governance practice, you need a high-level view of all the telemetry currently being generated by your organization. This stage focuses on breaking down ingest stats into groups such as account, telemetry type, and application. These figures will be used to inform the Optimize your ingest data and Forecast your ingest data stages.
You'll learn how to generate a structured breakdown report for the following dimensions:
Organization
Specific accounts in your organization
Billable telemetry type
In addition you'll learn how to create highly granular breakdowns including:
Application (APM | browser | mobile)
Kubernetes cluster
Infrastructure integration
Desired outcome
Understand exactly which groups within your organization are contributing which types of data and how much.
Prerequisites
All billable telemetry is tracked with our NrConsumption and NrMTDConsumption events. This guide focuses on how to query NrConsumption, which provides more granular, real-time data than NrMTDConsumption. The NrConsumption attribute usageMetric denotes the telemetry type.
Using NrConsumption, you can ask questions like "How much browser monitoring data has each account ingested in the last 30 days?" and "How has the ingest changed since the previous 30 days?" Here's a query returning that data:
FROM NrConsumption SELECT sum(GigabytesIngested) WHERE usageMetric = 'BrowserEventsBytes' SINCE 30 days AGO COMPARE WITH 30 days AGO FACET consumingAccountName
The response shows you how many GBs of browser monitoring data you've ingested by account.
Banking platform: 75 GB, +2.9%
Marketing platform: 40 GB, -1.3%
Below is a breakdown of the different usageMetric types, the constituent events (event types where the data is stored), and the type of agent or mechanism responsible for creating the data ingest.
| usageMetric | Constituent events | Data ingest source |
|---|---|---|
| LoggingBytes | Log, as well as partition-specific events of the pattern [partition].Log | Various (Fluentd, Fluent Bit, Syslog, cloud-specific streaming services) |
| MetricEventBytes | Metric | The Metric API and integrations that use it (dimensional metrics), or agents such as the browser, APM, or mobile agent (metric timeslice data) |
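To see which usageMetric types your own accounts are actually generating, and how much of each, you can facet consumption by usageMetric. A sketch using the same NrConsumption attributes shown above:

FROM NrConsumption SELECT sum(GigabytesIngested) WHERE productLine = 'DataPlatform' FACET usageMetric SINCE 30 days AGO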
For our usage-based pricing model, telemetry data and users both contribute to your usage and cost. This guide is focused on maximizing the value of telemetry data. The mention of users in this section is to help you understand different options for balancing users and data.
There are three general types of usage plans. Your usage plan may affect how you set ingest targets for your organization.
Commitment contracts
If you have a commitment contract, you'll likely have a monthly target budget for data ingest. For example, you may have set a target of 5TB per day and 100 full platform users. In this type of contract, users can be "traded off" but it's best to discuss this with other stakeholders in your organization to ensure you're getting the right mix for your observability goals. Although some customers will plan for variability in their consumption during the year, let's assume for now your monthly consumption budget is your total yearly commitment fee divided by 12.
If you know the number of full platform users and core users you need, you can use this formula:
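The exact terms vary by contract, but as a rough sketch (assuming a flat per-GB list price beyond the free tier; substitute the rates and user costs from your own agreement):

monthly ingest budget (GB) = ((yearly commitment) - 12 x (monthly cost of full platform and core users)) / (12 x per-GB price)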
In a pay-as-you-go plan, you won't have a predetermined yearly commitment; however, you'll likely have an understood limit on your monthly spend. In this model, you'd do the following to determine your target ingest:
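A rough sketch, under the same pricing assumptions as above:

monthly ingest target (GB) = ((monthly spend limit) - (monthly cost of users)) / (per-GB price)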
With all our editions, you get 100GB data ingest per month for free. For details on data ingest prices above the free amount, see the list price table.
When to use
Use the rate operator when you need to take a sample of data pulled from a certain time period and produce a given rate. For example, take a daily sample of data and compute a 30 day rate based on that.
Compute rate based on a given sample of data
See what your daily average ingest has been for the past month.
SELECT rate(sum(GigabytesIngested), 1 day) AS 'Daily Ingest Rate (GB)' FROM NrConsumption WHERE productLine = 'DataPlatform' LIMIT MAX SINCE 30 days AGO
The response for the entire organization is:
Daily ingest rate: 30.4 k
This query shows that the daily ingest rate was approximately 30 TB per day for the last month.
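The rate operator isn't limited to daily rates. For example, to compute each account's 30-day ingest rate from the same consumption data:

FROM NrConsumption SELECT rate(sum(GigabytesIngested), 30 day) AS '30 Day Rate (GB)' WHERE productLine = 'DataPlatform' FACET consumingAccountName SINCE 30 days AGO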
When to use
Use this when it's important to constrain an ingest calculation to specific calendar months. For example, ingest for an integration may have increased in late January and continued through mid February. This operator will help facet the ingest to the specific calendar months used for billing.
Facet by calendar month
SELECT sum(GigabytesIngested) AS 'GB Ingested' FROM NrConsumption WHERE productLine = 'DataPlatform' FACET monthOf(timestamp) LIMIT MAX SINCE 56 weeks AGO
The resulting table shows fairly high variability. Note that ingest was fairly high in August and September. Some of that reflects our organization's seasonality, but it was also related to increasing the breadth of our telemetry coverage.
| Month of timestamp | GB ingested |
|---|---|
| December 2021* | 636 k |
| November 2021 | 901 k |
| October 2021 | 873 k |
| September 2021 | 1.05 M |
| August 2021 | 1.08 M |
| July 2021 | 1.05 M |
| June 2021 | 887 k |
| May 2021 | 881 k |
When to use
Use this when you want to evaluate the change in ingest volume or rate between one time period and another. This is important for knowing whether your ingest is creeping up unexpectedly.
Simple Change Analysis
SELECT sum(GigabytesIngested) FROM NrConsumption WHERE productLine = 'DataPlatform' AND usageMetric = 'BrowserEventsBytes' SINCE 6 months AGO UNTIL 1 week AGO TIMESERIES 7 weeks COMPARE WITH 2 months AGO
Example chart showing the use of COMPARE WITH to understand growth patterns.
When to use
Use this when you need to remove the effects of regular variability of ingest to see the broader pattern.
Telemetry is inherently noisy. Real-world phenomena happen in spurts, leaving many random peaks and troughs in the signal. This is good in a way, as it lets us view the full complexity of a phenomenon. However, when we're seeking trends, we can be distracted by detail. NRQL provides a powerful way to smooth out any time series by combining each data point with slightly older points. This lets us focus on the overall temporal trend rather than one extreme increase or decrease.
Note the jaggedness of the raw time series for the 1-day ingest rate:
FROM NrConsumption SELECT rate(sum(GigabytesIngested), 1 day) WHERE productLine = 'DataPlatform' SINCE 26 weeks AGO TIMESERIES 1 day
Daily rate time series without smoothing
Now if we use a sliding window of four days to reduce the impact of single day events we'll see a clearer picture. Four days is a good choice since it will blur the impact of weekends, so data for a Sunday will be combined somewhat with data for a Friday, etc.
FROM NrConsumption SELECT rate(sum(GigabytesIngested), 1 day) WHERE productLine = 'DataPlatform' SINCE 24 weeks AGO TIMESERIES 1 day SLIDE BY 4 days
Daily rate time series with smoothing
When to use
Use this to estimate the statistical rate of change over a given time period. The rate of change is calculated using a linear least-squares regression to approximate the derivative.
NRQL provides us with some tools to assess the rate of change. This is useful because, as we saw in the previous example, we had a very large increase in browser metrics over the past several months. This rate of change analysis uses the derivative operator, and it gives us some confidence that the main growth happened back in early September. Our growth rate based on the 7-day derivative is somewhat negative, so we may have reached a new plateau in BrowserEventsBytes ingest for the moment.
SELECT derivative(sum(GigabytesIngested), 7 day) FROM NrConsumption WHERE productLine = 'DataPlatform' AND usageMetric = 'BrowserEventsBytes' LIMIT MAX SINCE 3 MONTHS AGO UNTIL THIS MONTH TIMESERIES 1 MONTH SLIDE BY 3 days COMPARE WITH 1 WEEK AGO
Using a seven day derivative to explore ingest trends
When to use
Use bytecountestimate() when you need to estimate the ingest data footprint for a subset of raw events or metrics.
Examples
Run these queries in each sub-account or in a dashboard with account-specific charts. The queries estimate a 30 day rate based on 1 week of collection.
Estimate 30 day rate
APM:
FROM Transaction, TransactionError, TransactionTrace, SqlTrace, ErrorTrace, Span SELECT rate(bytecountestimate()/10e8, 30 day) AS 'GB Ingest' FACET appName SINCE 1 WEEK AGO
Browser:
FROM PageAction, PageView, PageViewTiming, AjaxRequest, JavaScriptError SELECT rate(bytecountestimate()/10e8, 30 day) AS 'GB Ingest' FACET appName SINCE 1 WEEK AGO
Mobile:
FROM Mobile, MobileRequestError, MobileSession SELECT rate(bytecountestimate()/10e8, 30 day) AS 'GB Ingest' FACET appName SINCE 1 WEEK AGO
Some examples of usage.integrationName values that will show up with this facet are:
com.newrelic.mssql (the New Relic MSSQL on-host integration)
com.newrelic.rabbitmq (the New Relic RabbitMQ on-host integration)
EC2 (the AWS EC2 integration)
Lambda (the Lambda integration)
Run these queries in each specific account or in a dashboard with account-specific charts.
Estimate 30 day rate:
FROM Metric SELECT rate(bytecountestimate()/10e8, 30 day) FACET usage.integrationName SINCE 1 WEEK AGO
Seven day sum:
FROM Metric SELECT bytecountestimate()/10e8 FACET usage.integrationName SINCE 1 WEEK AGO
Of all New Relic telemetry types, log data is the one with the most variation. A log record can contain nearly any field, and it's often unknown in advance what a given record will contain. Because there's no common schema, baselining log ingest may require a bit more analysis than baselining other data types.
One of the more useful basic log ingest techniques is to try to estimate ingest by host, container, or even by Kubernetes cluster. Here are some examples:
Log ingest by host for past 3 hours (total):
FROM Log SELECT bytecountestimate()/10e8 WHERE host IS NOT NULL SINCE 3 hours ago FACET host
Log ingest by host (30 day rate):
FROM Log SELECT rate(bytecountestimate()/10e8, 30 day) WHERE host IS NOT NULL SINCE 3 hours ago FACET host
Log ingest by cluster_name (30 day rate):
FROM Log SELECT rate(bytecountestimate()/10e8, 30 day) WHERE host IS NOT NULL SINCE 3 hours ago FACET cluster_name
Log ingest by cluster_name and container_name (30 day rate):
FROM Log SELECT rate(bytecountestimate()/10e8, 30 day) WHERE host IS NOT NULL SINCE 3 hours ago FACET cluster_name, container_name
Estimate 30 day rate
FROM K8sClusterSample, K8sContainerSample, K8sDaemonsetSample, K8sDeploymentSample, K8sEndpointSample, K8sHpaSample, K8sNamespaceSample, K8sNodeSample, K8sPodSample, K8sReplicasetSample, K8sServiceSample, K8sVolumeSample SELECT rate(bytecountestimate()/10e8, 30 day) AS 'GB Ingest' FACET clusterName SINCE 1 WEEK AGO
ProcessSample can be quite a high-volume event. In this example, we'll compute the 30-day ingest rate per command name.
Estimate 30 day rate by command name
FROM ProcessSample SELECT rate(bytecountestimate()/10e8, 30 day) AS 'GB Ingested' FACET commandName SINCE 1 DAY AGO
When to use
Use the eventType() operator when you need to have event level granularity in your query and when you are unfamiliar with what custom events are present in your account.
Often we'll use a query that selects from multiple event types. This is one of the primary means we have of determining how much data a given agent or integration is sending us.
The following query tells us how much data the Kubernetes integration is sending us:
FROM
K8sApiServerSample,
K8sClusterSample,
K8sContainerSample,
K8sControllerManagerSample,
K8sDaemonsetSample,
K8sDeploymentSample,
K8sEndpointSample,
K8sNamespaceSample,
K8sNodeSample,
K8sPodSample,
K8sReplicasetSample,
K8sSchedulerSample,
K8sServiceSample,
K8sStatefulsetSample,
K8sVolumeSample
SELECT bytecountestimate()/10e8 AS 'Gigabytes'
SINCE 1 DAY AGO
LIMIT MAX
This query is powerful by itself, but it returns only a single aggregate value:
42.341 Gigabytes
When we need to drill deeper to know how much data a specific event type is consuming, we can use eventType() in a FACET clause to get that result.
Adding the clause FACET eventType() to the previous query gives us:
Listing of ingest by K8s event type
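For reference, the full faceted query is the previous one with the FACET eventType() clause appended:

FROM K8sApiServerSample, K8sClusterSample, K8sContainerSample, K8sControllerManagerSample, K8sDaemonsetSample, K8sDeploymentSample, K8sEndpointSample, K8sNamespaceSample, K8sNodeSample, K8sPodSample, K8sReplicasetSample, K8sSchedulerSample, K8sServiceSample, K8sStatefulsetSample, K8sVolumeSample SELECT bytecountestimate()/10e8 AS 'Gigabytes' FACET eventType() SINCE 1 DAY AGO LIMIT MAX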
When to use
Use SHOW EVENT TYPES when you're uncertain of the events that exist in your account.
SHOW EVENT TYPES lists all event types in an account for a given time period. For more detail, see SHOW EVENT TYPES. Using a specific time window can be useful to better understand when a given event started to come into the system.
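For example, to list only the event types seen in the past week:

SHOW EVENT TYPES SINCE 1 week ago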
When to use
Use FACET metricName when you need metric-name-level granularity in your query.
The best way to really explore metrics and get a sense of the relative volume of data coming from each is to use FACET metricName on a SELECT FROM Metric query. It's possible to incorporate WHERE clauses to narrow the list down. For example, to view the relative ingest volume for metrics with the text kube_pod in metricName run a query like this:
SELECT bytecountestimate()/10e8 AS 'Gigabytes' FROM Metric FACET metricName WHERE metricName LIKE '%kube_pod%' SINCE 1 day ago LIMIT MAX
Listing of ingest by the metricName attribute of the Metric namespace
Process
Here are the major steps you'll take as part of this data ingest governance improvement procedure:
1. Click Install this quickstart in the upper right portion of your browser window.
2. If applicable: select your primary or top-level account in the account switcher.
3. Click Done.
4. When the quickstart is done installing, open the Data ingest governance baseline dashboard.
That will bring you to the newly installed dashboard.
Dashboard overview
The main overview tab shows a variety of charts including some powerful time series views.
Organization wide baseline ingest time series
The second tab provides a baseline report by sub-account and usage metric.
Organization wide baseline reports view
The remaining tabs provide detailed views of specific telemetry types such as browser data, APM data, logs, and traces. For example, this screenshot shows the browser detail page:
Example of an ingest detail focused on a single telemetry type (in this case browser data).
Detail tabs include:
APM: ApmEventsBytes
Tracing: TracingBytes
Browser: BrowserEventsBytes
Mobile: MobileEventsBytes
Infra (host): InfraHostBytes
Infra (process): InfraProcessBytes
Infra (integration): InfraIntegrationBytes
Custom events: CustomEventsBytes
Serverless: ServerlessBytes
Pixie: PixieBytes
Add ingest target indicators to your dashboard
In the prerequisites section we discussed the concept of a monthly usage target. You may actually have several targets to help keep you on track:
An overall organizational target on daily rate or monthly ingest.
Targets per data type to ensure the optimal breakdown (for example 1 TB per day for logs and 2 TB per day for metrics).
Targets for specific sub-accounts or business units.
In our example, the organization targets total organizational ingest of less than 360 TB per month. This was a new target, set after reducing ingest from over 20 TB per day (600 TB per month).
To make the target easier to measure against, we added a threshold line to the chart by adding the static number 360000 to our SELECT statement.
SELECT 360000, rate(sum(GigabytesIngested), 30 day) AS '30 Day Rate' FROM NrConsumption WHERE productLine = 'DataPlatform' SINCE 30 days AGO LIMIT MAX COMPARE WITH 1 month AGO TIMESERIES 7 days
We can use NRQL to render a line representing our target thirty-day ingest target.
We can also apply a daily rate target line. Let's just divide 360000 by 30 and we'll use 12000 as our daily rate target. Update the Daily ingest rate (compare with 3 months prior) chart:
SELECT 12000, rate(sum(GigabytesIngested), 1 day) AS avgGbIngestTimeseries FROM NrConsumption WHERE productLine = 'DataPlatform' TIMESERIES AUTO SINCE 9 months AGO LIMIT MAX COMPARE WITH 3 months AGO
We can use NRQL to render a line representing our daily ingest target.
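The per-telemetry-type targets mentioned earlier can get the same treatment. A sketch, assuming a hypothetical 1 TB per day target for logs tracked under the LoggingBytes usage metric (adapt the static value and the usageMetric filter to your own targets):

SELECT 1000, rate(sum(GigabytesIngested), 1 day) AS 'Daily Log Ingest (GB)' FROM NrConsumption WHERE productLine = 'DataPlatform' AND usageMetric = 'LoggingBytes' SINCE 30 days AGO TIMESERIES 1 day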
Generate a tabular 30-day ingest report
To create a 30-day ingest report:
1. Open the previously installed data ingest governance baseline dashboard.
2. Click the Baseline report tab.
3. Click ... in the upper right of the Last 30 days table and choose Export as CSV.
4. Import the CSV into Google Sheets, or the spreadsheet of your choice.
Alternatively, if you didn't install the dashboard, you may simply use this query to create a custom chart in the query builder:
SELECT sum(GigabytesIngested) AS 'gb_ingest_30_day_sum', rate(sum(GigabytesIngested), 1 day) AS 'gb_ingest_daily_rate', derivative(GigabytesIngested, 90 day) AS 'gb_ingest_90_day_derivative' FROM NrConsumption WHERE productLine = 'DataPlatform' SINCE 30 days AGO FACET consumingAccountName, usageMetric LIMIT MAX
Below is an example of a sheet we imported into Google Sheets.
A spreadsheet exported from the baseline dashboard tabular page
The screenshot shows the table sorted by 30 day ingest total.
Feel free to adjust your timeline and some of the details as needed. For example, we chose to extract a 90-day derivative to have some sense of change over the past few months. You could easily alter the time period of the derivative to suit your objectives.
Customize your report
Add useful columns to your report to facilitate other phases of data ingest governance, such as Optimize and Forecast. The following fields will help guide optimization and planning decisions:
Notes: Note any growth anomalies and any relevant explanations for them. Indicate any major expected growth if foreseen.
Technical contact: Name of the manager of a given account or someone related to a specific telemetry type.
Detect ingest anomalies
Here are some steps for detecting ingest anomalies.
Alert on ingest anomalies
Use this ingest alerts guide to make sure that an increase in data consumption doesn't catch you by surprise.
At a minimum, create:
A threshold alert to notify you if you exceed monthly targets for data ingest beyond seasonal increases (a sample condition query follows this list)
An anomaly alert to notify you of a sudden sharp increase in ingest data
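For the threshold alert, the NRQL condition can be built on a simple consumption query; a sketch is below. The threshold value itself (for example, a daily rate derived from your monthly target) is set in the alert condition rather than in the query:

FROM NrConsumption SELECT sum(GigabytesIngested) WHERE productLine = 'DataPlatform'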
In addition to using alerts to identify consumption anomalies, you can use New Relic Lookout to explore potential ingest anomalies.
Lookout view
Lookout allows you to provide nearly any NRQL query and it will search for anomalies over a given period of time. The view below is based on this query:
SELECT rate(sum(GigabytesIngested), 1 day) AS avgGbIngest FROM NrConsumption WHERE productLine = 'DataPlatform' FACET usageMetric
We can use Lookout to find anomalies in our ingest by usageMetric.
Change the facet field to consumingAccountName to get this view:
We can use Lookout to find anomalies in our ingest by consumingAccountName.
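For reference, the modified query is:

SELECT rate(sum(GigabytesIngested), 1 day) AS avgGbIngest FROM NrConsumption WHERE productLine = 'DataPlatform' FACET consumingAccountName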
Install the entity breakdown dashboard (optional)
In a previous section you installed the ingest baseline dashboard, which uses NrConsumption as its primary source. In addition to that high-level view, you can create other visualizations that use bytecountestimate() to estimate ingest for nearly any event or metric. A detailed overview of bytecountestimate() appears in the prerequisites section.
To install the entity breakdown dashboard:
Go to the same quickstart you used for the baseline dashboard.
Click Install this quickstart in the upper right section of your browser window.
Using the import dashboard function, install it into any account that contains APM, browser monitoring, mobile monitoring, or Kubernetes clusters. (If you have a partnership, don't install this dashboard into a partnership owner account, or POA.) You can install this dashboard into multiple accounts. If you have a parent/child account structure, you can install the dashboard into a parent account and modify it so you have account-specific charts all in one dashboard.
Click Done.
When the quickstart is done installing, open the Data governance entity breakdowns dashboard.
The entity breakdown dashboard uses bytecountestimate() to facet ingest by useful attributes such as application or cluster name.
You can refer back to this section to see exactly which event types are used in these breakdowns.
Tip
These queries consume more resources because they don't work from a pre-aggregated data source like NrConsumption. You may need to adjust the time frames by using additional WHERE and LIMIT clauses to make them work better in some of your environments.
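For example, you might constrain the APM breakdown to a single day and the top 20 applications (the appName pattern here is hypothetical; substitute one that matches your naming conventions):

FROM Transaction, TransactionError, TransactionTrace, SqlTrace, ErrorTrace, Span SELECT rate(bytecountestimate()/10e8, 30 day) AS 'GB Ingest' WHERE appName LIKE 'portal-%' FACET appName SINCE 1 day AGO LIMIT 20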
Install the cloud integration dashboard (optional)
New Relic's cloud integrations can often be a significant source of data ingest growth. Without good visualizations it can be very difficult to pinpoint where the growth is coming from. This is partly because these integrations are so easy to configure and they aren't part of an organization's normal CI/CD pipeline. They may also not be part of a formal configuration management system.
Individual dashboards installed by this package include:
AWS integrations
Azure integrations
Google Cloud Platform integrations
On-host integrations
Kubernetes
This quickstart contains a highly granular set of dashboards breaking down data by nearly every cloud integration, on-host integration, and the Kubernetes integration.
Exercise
Answering the following questions will help you develop confidence in your ability to interpret baseline data and make correct inferences. These questions can be answered using the data ingest baseline and data ingest entity breakdown dashboards. Install those dashboards as described and see how many of these questions you can answer.
Questions
What is the typical daily ingest rate for the entire organization (all accounts) in the past week? What was it three months prior?
What are the top three telemetry types (for the organization as a whole) by ingest? List each telemetry type and its most recent 30 day ingest rate.
How many accounts contribute to this organization's ingest?
How many accounts (if any) currently contribute more than 50TB per month?
What are the top three accounts in terms of ingest for the past 30 days?
What is the GB ingest for the calendar month of this past January for the highest consuming account?
What are the top three accounts in terms of ApmEventsBytes ingest for the past 30 days?
What is the single largest increase in terms of telemetry type ingest for a given account in the last 9 months? What about decreases?
Go to the account that contributes the most ApmEventsBytes and install/open the data governance entity breakdown dashboard. List the top three APM applications by ingest for the past 24 hours and their respective 24-hour ingest rates.
Conclusion
The process section took you through the creation of data ingest visualizations and reports. You can now review data ingest with a data-driven, visual approach that you and your peers can use to collaborate.
Going forward, decide which visualizations to use for: