
Manage OpenTelemetry data ingest volume

One of OpenTelemetry's core strengths is its rich set of tools providing unparalleled control over your telemetry data pipeline. This control complements New Relic's consumption-based pricing model.

This page covers the concepts that contribute to data volume when using OpenTelemetry with New Relic, and the tools and patterns available for managing your telemetry data pipeline. It's organized into sections on the key concepts driving data volume for resources, traces, metrics, and logs, followed by a catalog of tools for managing that volume.

Key concepts: Resources

OTLP defines a similar hierarchical message structure across all signals, which avoids repetition at the protocol level by sharing information across records.

  • Each export request contains many Resource{SignalRecord}s
  • Each Resource{SignalRecord} contains many Scope{SignalRecord}s
  • Each Scope{SignalRecord} contains many {SignalRecord}s
  • {SignalRecord} is one of Span, Metric, or LogRecord, depending on the signal

This hierarchical structure is flattened when sent to the New Relic OTLP endpoint, and each resource attribute is denormalized onto individual Span, Metric, and Log records. For more information on how OpenTelemetry data is handled in New Relic, see the related New Relic OpenTelemetry documentation pages.

Most OpenTelemetry language implementations provide packages with resource detectors, which contribute resource attributes based on information detected in the environment. These attributes can be extremely useful, but may include more information than necessary.

For details on enabling and disabling resource detectors, see the SDK resource detectors entry in the tool catalog below.

Key concepts: Traces

Sampling

Sampling is the process of controlling which spans are exported to an observability backend. Trace data can provide highly valuable insights, but left unchecked it can quickly fill up hard drives (and run up bills!).

By default, OpenTelemetry SDKs use the ParentBased(root=AlwaysOn) sampler. The ParentBased sampler delegates to different configurable samplers based on whether there is a parent span, whether that parent is local to the current process or remote, and whether that parent is sampled. The default ParentBased(root=AlwaysOn) sampler will sample a span if either of the following is true:

  • There is no parent span (i.e. this span is the root)
  • The parent is sampled, regardless of whether the parent is local or remote

In other words, it will sample the span unless the parent is unsampled.

This is a good default for the OpenTelemetry community since it allows users to install instrumentation and see trace data without first needing to be aware of the sampling concept. However, you should be cautious with production deployments of OpenTelemetry. Under this policy, all spans are sampled unless there is some upstream component or gateway making intelligent sampling decisions for downstream systems to conform to.

For alternatives, see the SDK ParentBased(root=TraceIdRatioBased) sampler and collector tailsampling processor entries in the tool catalog below.

Span data

OpenTelemetry spans are composed of a variety of top-level fields (name, kind, etc.), attributes, span events, and span links. The amount of data attached to spans directly corresponds to data volume in New Relic.

Instrumentation libraries make decisions about what pieces of information to attach to spans, often following the semantic conventions. When exceptions occur, the information is often attached to the span in the form of an exception span event. This event includes attributes representing the exception message, type, and stack trace, which, depending on the language and application, can consist of thousands of bytes. If exceptions occur frequently, stack traces may produce high data volume in New Relic.
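If frequent exceptions drive high data volume, one option is to strip stack traces before export. Below is a minimal sketch, assuming a collector transform processor in your traces pipeline (see the catalog at the end of this page); the statement targets the standard exception span event:

```yaml
processors:
  transform:
    trace_statements:
      - context: spanevent
        statements:
          # Remove the potentially multi-kilobyte stack trace from
          # exception span events; exception.type and exception.message
          # are left intact.
          - 'delete_key(attributes, "exception.stacktrace") where name == "exception"'
```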

For more strategies for managing span data, see the SDK span limits and collector transform processor entries in the tool catalog below.

Tracers

An instrumentation scope is a logical unit of application code with telemetry associated with it. Each instrumentation library has one (or more) unique scopes, and corresponding tracer(s).

Tracers which do not produce high-value trace data can be selectively disabled without breaking traces.

For more details, see SDK disable Tracers, Meters, Loggers in the tool catalog below.

Key concepts: Metrics

Collection interval

Metrics aggregate individual measurements and export the aggregated state out of process. For push-based protocols like OTLP, exporting occurs on a configurable interval defaulting to 60s. This interval directly corresponds to data volume in New Relic. Reduce the interval to 30s and data volume should approximately double. Increase the interval to 120s and the data volume should approximately cut in half.

The 60s default is reasonable, as it balances the tradeoff between data volume and observability lag. Set the interval too high, and delays in critical signals reaching New Relic dashboards and alerts can exacerbate issues.

For more details, see SDK periodic exporting Metric Reader exportIntervalMillis in the tool catalog below.

Cardinality

The measurements that metrics aggregate are associated with a set of attributes. The number of distinct sets of attributes is called cardinality. Cardinality is important because it dictates how much application memory is required to maintain the aggregated state of the measurements, how much data is exported each collection interval, and the data volume in New Relic.

If cardinality is too high, consider omitting attributes which contribute to it. If you control the instrumentation, this could mean recording fewer attributes with each measurement. However, the instrumentation is often not directly configurable.
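When the instrumentation isn't configurable, attributes can instead be dropped downstream. Below is a minimal sketch using the collector transform processor (see the catalog at the end of this page); client.address stands in for whatever high-cardinality attribute you want to drop:

```yaml
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          # Drop a high-cardinality attribute from every data point.
          - 'delete_key(attributes, "client.address")'
```

Note that dropping a key can leave series that differ only by the removed attribute; the collector metricstransform processor can merge such duplicates.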

For more on dropping attributes from metrics, see the SDK metric views and collector metricstransform processor entries in the tool catalog below.

Aggregation temporality

In OpenTelemetry metrics, the concept of aggregation temporality defines whether or not the state of aggregated measurements resets after each collection. When aggregation temporality is cumulative, the aggregated state does not reset and metrics represent the cumulative values since application start. When aggregation temporality is delta, the aggregated state resets after each collection and metrics represent the difference since the previous collection.

While New Relic's OTLP endpoint supports cumulative aggregation temporality, the New Relic metrics architecture is a delta metrics system. Using the default cumulative setting will generally incur more memory usage in SDKs and result in higher data ingest. Conversion from cumulative to delta aggregation is a stateful activity, since you need to hold on to the previous point of each time series in order to compute the difference. For this reason, it's best to configure delta aggregation temporality in the SDK, which saves application memory and avoids extra complexity downstream.

For more details, see the SDK delta aggregation temporality and collector cumulativetodelta processor entries in the tool catalog below.

Meters and Instruments

An instrumentation scope is a logical unit of application code with telemetry associated with it. Each instrumentation library has one (or more) unique scopes and corresponding meter(s), and each meter has one or more instruments.

Meters or instruments that don't provide valuable metric data can be selectively disabled.

For more details, see the SDK disable Tracers, Meters, Loggers and SDK metric views entries in the tool catalog below.

Key concepts: Logs

LogRecord data

OpenTelemetry log records are composed of a variety of top-level fields (timestamp, severity, body, etc.) and attributes. The amount of data attached to log records directly corresponds to data volume in New Relic.

Instrumentation libraries make decisions about what pieces of information to attach to log records, often following the semantic conventions. (In the OpenTelemetry log space, these libraries are called log appenders, since the OpenTelemetry log bridge API is meant for bridging logs from existing log APIs into OpenTelemetry.)

For strategies on managing log data, see the SDK LogRecord limits and SDK configure log appenders entries in the tool catalog below.

Loggers

An instrumentation scope is a logical unit of application code with telemetry associated with it. For OpenTelemetry logging, each distinct logger (bridged by a log appender using the OpenTelemetry log bridge API) has a unique instrumentation scope.

Loggers that don't produce high-value log data can be selectively disabled.

For more details, see the SDK disable Tracers, Meters, Loggers entry in the tool catalog below.

Pipeline management tool catalog

The following catalog lists a variety of useful tools for managing your OpenTelemetry data pipeline; each entry's name indicates whether the tool lives in the language SDKs or the collector. Note that OpenTelemetry is a highly extensible ecosystem. If these tools do not suffice, other tools may be available, or you may be able to write custom extension logic for the language SDKs or collector to accomplish your goals.

SDK resource detectors

OpenTelemetry language SDKs package resource detectors which supply resource attributes based on the environment. Some of these are often enabled by default with zero-code instrumentation options like the OpenTelemetry Java Agent. See individual language docs for details on how to enable or disable resource detectors.

SDK ParentBased(root=TraceIdRatioBased) sampler

The ParentBased sampler with the root set to the TraceIdRatioBased sampler is a simple and reasonable alternative to the default ParentBased sampler with the root set to AlwaysOn. With the root set to TraceIdRatioBased, spans which represent new traces are sampled probabilistically, and child spans are sampled according to the sampling decision of their parent (so long as other applications are configured with the same sampling policy). The sampler can be configured programmatically on the SDK TracerProvider, but it's common to use env vars:

```bash
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.25
```

Set the TraceIdRatioBased sampler to sample 25% of root spans.

SDK span limits

The OpenTelemetry trace SDK allows span limits to be configured to specify the max length and quantity of attributes, the max number of span events, and the max number of span links. Span limits can be configured programmatically on the SDK TracerProvider, but it's common to use env vars:

```bash
export OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT=4095
export OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT=64
```

Set the span limits to align with the New Relic OTLP endpoint's attribute limits.

SDK LogRecord limits

The OpenTelemetry log SDK allows LogRecord limits to be configured to specify the max length and quantity of attributes. LogRecord limits can be configured programmatically on the SDK LoggerProvider, but it's common to use env vars:

```bash
export OTEL_LOGRECORD_ATTRIBUTE_VALUE_LENGTH_LIMIT=4095
export OTEL_LOGRECORD_ATTRIBUTE_COUNT_LIMIT=64
```

Set the log record limits to align with the New Relic OTLP endpoint's attribute limits.

SDK disable Tracers, Meters, Loggers

The OpenTelemetry SDK defines TracerConfigurator, MeterConfigurator, and LoggerConfigurator to configure and disable Tracers, Meters, and Loggers respectively. This concept is currently under development and not available in all language implementations. See individual language docs and reach out to language maintainers to check the status.

SDK periodic exporting Metric Reader exportIntervalMillis

The OpenTelemetry metric SDK allows the collection interval of the periodic exporting Metric reader to be configured. The interval can be configured programmatically, but it's common to use env vars:

```bash
export OTEL_METRIC_EXPORT_INTERVAL=60000
```

Set the collection interval to 60s (60,000 ms). This is the default, but it can be adjusted to suit.

SDK metric views

The OpenTelemetry metric SDK allows MeterProvider to be configured with views, which specify various options including the set of attribute keys to retain, the aggregation type, and whether to drop the metric. Generally, views are configured programmatically. See individual language docs to check for alternatives in your language. For example, OpenTelemetry Java has experimental support for configuring views in a YAML file.
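As an illustration, a view that keeps only two attribute keys on an instrument might look like the following sketch of the experimental opentelemetry-java-contrib YAML view format (the schema is experimental and may change between versions; the instrument and attribute names are examples):

```yaml
# Experimental view configuration sketch (schema may vary by version).
- selector:
    instrument_name: http.server.request.duration
  view:
    # Retain only these attribute keys; others are dropped at aggregation.
    attribute_keys:
      - http.request.method
      - http.response.status_code
```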

SDK delta aggregation temporality

The OpenTelemetry metric SDK allows aggregation temporality to be configured for the OTLP exporter. The temporality can be configured programmatically, but it's common to use env vars:

```bash
export OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta
```

Set the OTLP metric exporter aggregation temporality to be delta, aligning with the New Relic OTLP endpoint's preference.

SDK configure log appenders

The OpenTelemetry log bridge API is designed for use by log appenders, which bridge logs from log APIs into OpenTelemetry. These log appenders may be automatically installed with zero-code instrumentation options like the OpenTelemetry Java Agent, or may require manual installation steps. They often have configuration options to specify which logs are bridged into OpenTelemetry and what data is included. Additionally, it's often possible to configure the log API being bridged to specify which logs (based on severity or logger name) should be passed to the log appender. See individual language docs for details.

Collector filter processor

The collector filter processor can be used to filter spans, metrics, and log records out of your observability pipeline. For spans, it can function as a simple tail sampler, acting on completed spans but without access to the full trace (note: this can result in broken traces). For metrics, it can be used to drop metrics or series which are not high value. For logs, it can be used to drop log records which are not high value (e.g. logs with fine-grained severity or from noisy loggers).
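A minimal sketch (the route and severity threshold are examples; add the processor to the relevant pipelines):

```yaml
processors:
  filter:
    error_mode: ignore
    traces:
      span:
        # Drop spans for a noisy health check endpoint (example route).
        - 'attributes["url.path"] == "/health"'
    logs:
      log_record:
        # Drop log records with severity below INFO.
        - 'severity_number < SEVERITY_NUMBER_INFO'
```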

Collector tailsampling processor

The collector tailsampling processor allows you to make sampling decisions based on completed traces. For example, you can emphasize retaining traces which have errors or which touch high-interest areas of the system. The drawback is that the tailsampling processor adds complexity to your observability pipeline, since it requires that all spans of a trace be routed to the same collector instance and that the collector hold spans in memory while waiting for the trace to complete. This requires careful planning when your observability pipeline reaches a scale which cannot be handled by a single collector instance.
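A minimal sketch of a policy set that keeps all error traces plus a percentage of the rest (the wait time and percentage are examples to tune):

```yaml
processors:
  tail_sampling:
    # How long to wait after the first span of a trace before deciding.
    decision_wait: 10s
    policies:
      # Keep every trace containing an error...
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # ...and 10% of the remaining traces.
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```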

Collector resource processor

The collector resource processor allows you to write simple rules to manipulate resource attributes of spans, metrics, and logs. For example, you can delete attributes which are not high value.
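A minimal sketch (process.command_args stands in for whichever verbose resource attribute you want to remove):

```yaml
processors:
  resource:
    attributes:
      # Delete this resource attribute from all telemetry flowing
      # through pipelines that include the processor.
      - key: process.command_args
        action: delete
```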

Collector metricstransform processor

The collector metricstransform processor allows you to manipulate metric data. For example, you can delete series which are not high value, or merge timeseries to reduce cardinality (sometimes called spatial re-aggregation).
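A minimal sketch of spatial re-aggregation (the metric name and label set are examples; data points differing only in other labels are merged by summing):

```yaml
processors:
  metricstransform:
    transforms:
      - include: my.request.count
        action: update
        operations:
          # Keep only these labels, summing data points across all
          # other label values to reduce cardinality.
          - action: aggregate_labels
            label_set: [http.request.method, http.response.status_code]
            aggregation_type: sum
```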

Collector transform processor

The collector transform processor allows you to transform observability data, using conditions to select data and statements to modify it. For example, you can remove resource attributes, truncate attributes, change top-level fields of span, metric, and log records, and much more.
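A minimal sketch that truncates span and log attribute values (4095 matches the attribute limits used elsewhere on this page):

```yaml
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Truncate every span attribute value to 4095 characters.
          - truncate_all(attributes, 4095)
    log_statements:
      - context: log
        statements:
          # Do the same for log record attributes.
          - truncate_all(attributes, 4095)
```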

Collector cumulativetodelta processor

The collector cumulativetodelta processor allows you to transform metric aggregation temporality from cumulative to delta. This is useful to align your metrics with the preferred aggregation temporality of New Relic's OTLP endpoint. Note that translation from cumulative to delta aggregation is a stateful process, requiring the collector to store the last point of each timeseries in memory in order to compute and emit the difference. This requires careful planning of collector memory resources and structuring the observability pipeline to ensure that all points of the same series arrive at the same collector instance.
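A minimal sketch (without an include block, all cumulative metrics are converted; the metric names here are examples narrowing the conversion):

```yaml
processors:
  cumulativetodelta:
    include:
      metrics:
        # Convert only these metrics; omit the include block to
        # convert all cumulative metrics.
        - system.network.io
        - system.disk.operations
      match_type: strict
```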
