New Relic data types: metrics, events, logs, and traces (MELT)
The New Relic platform is built around the four fundamental telemetry data types we believe are necessary for complete and effective system monitoring: metrics, events, logs, and traces (often referred to as "MELT" in the observability industry).
In this doc, we'll give a fairly technical explanation of our core MELT data types, their structure, and how they're used in our features. You can use most of our features without needing to understand the underlying data structure. But having a better understanding of this can help you get data into New Relic, understand the data you see in our UI, and query and chart your data.
Metrics
First, we'll explain the definition of metrics from a monitoring industry perspective, and then we'll explain how New Relic handles metrics.
Metrics in the monitoring industry
In the software monitoring industry, a metric means a numeric measurement of an application or system. Metrics are typically reported on a regular schedule.
Two major types of metrics are:
Aggregated data. For example: a count of events over one minute's time, or the rate of some event per minute.
A numeric status at a moment in time. For example: a CPU temperature reading, or a “CPU% used” status.
Metrics are relatively easy to report and store because a single record can represent a range of time. They can also be aggregated more and more over time. For example, per-minute data may be “rolled up” to per-hour aggregations after some amount of time, and eventually may be rolled up to a per-day aggregation. This approach is efficient for long-term data storage.
Metrics are a strong solution for storing data long-term, and understanding trends over time. One potential downside is that it can be difficult to do detailed analysis of older data that has been aggregated over time; when high detail is required about specific important actions, event data can be used.
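To make the roll-up idea concrete, here is a minimal sketch of aggregating per-minute counts into per-hour buckets. The function name `roll_up_to_hours` and the sample data are hypothetical and only illustrate the general technique; this is not a description of how any particular platform implements it.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def roll_up_to_hours(per_minute_counts):
    """Sum per-minute counts into per-hour buckets.

    per_minute_counts maps a minute-aligned datetime to an event count.
    """
    hourly = defaultdict(int)
    for minute, count in per_minute_counts.items():
        # Truncate the timestamp to the start of its hour.
        hour = minute.replace(minute=0, second=0, microsecond=0)
        hourly[hour] += count
    return dict(hourly)


# Hypothetical data: 120 one-minute records, each counting 5 events.
start = datetime(2024, 1, 1, 10, 0)
per_minute = {start + timedelta(minutes=i): 5 for i in range(120)}

hourly = roll_up_to_hours(per_minute)
print(len(per_minute), "records rolled up into", len(hourly), "records")
```

The totals survive the roll-up, but the per-minute detail does not: after aggregation you can no longer tell which minute within an hour was busiest.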
Metrics at New Relic
Conceptually, "metrics" is a broad, general category. There are various ways New Relic measures and reports metrics but, in practice, when using the New Relic UI, you usually won't have to understand how exactly this happens. In our documentation, we typically will just refer to "metrics," regardless of how that data is reported, unless there's a reason you need to know more (like understanding how to query your data).
Here are some of the ways metrics are reported and stored across the New Relic platform:
In the monitoring industry, "dimensional" metrics refer to metric data that has a variety of attributes (dimensions) attached, such as duration-related attributes (start time, end time), entity ID, region, host, and more. This level of detail allows for in-depth analysis and querying.
At New Relic, this metric data is attached to our Metric data type. This is our primary metric data type and is used by many of our tools.
To query the Metric data type, you could use a NRQL query like:
SELECT * FROM Metric
As time passes, metrics are increasingly aggregated into larger time buckets. This is done to optimize your ability to query data over a long period of time.
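As an illustration of why attached dimensions make in-depth analysis possible, the sketch below models a few dimensional data points as plain dictionaries and averages them by one dimension. The metric name `request.duration`, the attribute keys, and the helper `average_by` are all hypothetical; the actual Metric data type has its own schema.

```python
from collections import defaultdict

# Hypothetical data points: a name, a numeric value, and free-form dimensions.
points = [
    {"name": "request.duration", "value": 0.25,
     "attributes": {"host": "web-1", "region": "us-east"}},
    {"name": "request.duration", "value": 0.75,
     "attributes": {"host": "web-2", "region": "us-east"}},
    {"name": "request.duration", "value": 1.5,
     "attributes": {"host": "web-3", "region": "eu-west"}},
]


def average_by(points, dimension):
    """Average the values of the points, grouped by one attribute."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for point in points:
        key = point["attributes"].get(dimension)
        sums[key] += point["value"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


by_region = average_by(points, "region")
print(by_region)
```

Because every point carries its own attributes, the same data can be regrouped by `host`, `region`, or any other dimension without having been pre-aggregated that way.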
New Relic's APM, browser, and mobile monitoring report and display metrics in a simple data format that we refer to as metric timeslice data. A metric timeslice consists of three parts: a metric name, the segment of time the metric represents (the "timeslice"), and a numeric value (the measurement).
For example: a metric timeslice for time spent in a particular transaction is named WebTransaction/URI/foo, and might have a response time of 0.793 for a one-minute timeslice from 10:20am to 10:21am. These metrics usually follow a pattern like <category>/<class>/<method>.
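The three-part structure described above can be sketched as a small record type. The class `MetricTimeslice`, its field names, and the calendar date are assumptions made for illustration; only the metric name, the one-minute window, and the value 0.793 come from the example in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetricTimeslice:
    """Hypothetical model: a name, the time segment covered, and a value."""
    name: str
    start: datetime
    end: datetime
    value: float

    def name_parts(self):
        # Split a name such as <category>/<class>/<method> into its segments.
        return self.name.split("/")


timeslice = MetricTimeslice(
    name="WebTransaction/URI/foo",
    start=datetime(2024, 1, 1, 10, 20),  # the date itself is arbitrary
    end=datetime(2024, 1, 1, 10, 21),
    value=0.793,
)

category, metric_class, method = timeslice.name_parts()
print(category, metric_class, method, timeslice.value)
```

Note that the record has no room for arbitrary attributes, which is the sense in which timeslice data is lighter than dimensional metric data.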
Our agents (APM, browser, and mobile) can collect thousands of metric timeslices per minute for a variety of performance metrics. For example: error rate, bandwidth usage, and garbage collection time. You also have the ability to create custom metrics.
Metric timeslice data is a lightweight data type and lacks the detail that dimensional metrics have.
Ways to explore and query metric timeslice data:
For APM: metric timeslice data is converted to dimensional metrics and can be queried via NRQL.
If you want to learn more about the structure of metric timeslice data and see some examples, expand the collapser below.
Here are some common metric timeslice data examples, with a focus on those used by Ruby applications.
ActiveMerchant
New Relic tracks a variety of metrics on ActiveMerchant transactions, which can be used for business analytics as well as performance monitoring. The metrics are summarized by operation as well as by gateway.
ActiveRecord
ActiveRecord is the Object-Relational Mapping API used by Ruby on Rails applications. The metrics shown here measure the performance of ActiveRecord's find and save methods.
Apdex
Apdex is a measure of user satisfaction with page load times.
Controller
In Ruby on Rails applications, HTTP requests are handled by Controller actions. A Rails application has many controllers, each of which has one or more actions. When your Rails application receives an HTTP request, that request is routed to the appropriate controller and action, based on the URL of that request. That action then does whatever processing is necessary to generate an HTTP response, which is most often a web page, but could also be a page fragment, an XML document, or any other kind of data that is requested by the client.
The following metrics track the performance of controller actions, regardless of routing, and without taking into account any network or web server effects.
Errors
This metric tracks the number of errors or exceptions raised while processing requests.
regex
sample metric
legend name
Errors/all
Errors/all
External services
External service instrumentation captures calls to out-of-process services such as web services, resources in the cloud, and any other network calls. It does not include other first-class backend components such as MemCache and the database.
In Ruby applications, we instrument the Net::Http library to capture all HTTP services.
regex | sample metric | legend name
External/[^/]+/all$ | External/service.example.com/all | All service.example.com calls
External/ | External/host.aws.com/Net::Http::POST | Net::Http::POST[host.aws.com]
External/all$ | External/all | External Services
External/[^/]+/(?!all)/ | External/service.example.com/all | All service.example.com calls
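To show how patterns like these select metric names, here is a small sketch that tests sample names against two of the expressions listed above. The labels `per-host summary` and `all external calls`, and the helper `matching_patterns`, are hypothetical. The sketch uses Python's `re` module rather than the agent's own matching code, so treat it as an approximation of the behavior.

```python
import re

# Two patterns taken from the listing above; the labels are hypothetical.
patterns = {
    "per-host summary": r"External/[^/]+/all$",
    "all external calls": r"External/all$",
}

names = [
    "External/service.example.com/all",
    "External/all",
    "External/host.aws.com/Net::Http::POST",
]


def matching_patterns(name):
    """Return the labels of every pattern found in the metric name."""
    return [label for label, pattern in patterns.items()
            if re.search(pattern, name)]


for name in names:
    print(name, "->", matching_patterns(name))
```

The `[^/]+` segment requires exactly one path element between `External/` and `/all`, which is why the per-host pattern does not also match the overall `External/all` summary.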
HTTP dispatcher
This metric represents a summary of the throughput and response time of all web requests.
regex | sample metric | legend name
^HttpDispatcher$ | HttpDispatcher | HttpDispatcher
MemCache
MemCache is a popular technology that enables applications to access shared memory provided by any number of physical machines as a global cache. Applications that heavily use the database often use MemCache for performance and scalability benefits.
These metrics measure the frequency and response time of calls to MemCache to read and write data from the cache. Response times should be low (less than 5 ms) for a well-performing MemCache deployment.
regex | sample metric | legend name
MemCache/.* | MemCache/read | MemCache read operations
MemCache/read | MemCache/read | MemCache read operations
MemCache/write | MemCache/write | MemCache write operations
Mongrel
This metric measures the length of the Mongrel queue, which holds pending HTTP requests to be processed by Mongrel. The HTTP Activity graph overlays the maximum queue length for a given period. The value is zero if Mongrel is processing a request but has no other requests waiting in its queue.
When looking at this value across an aggregate cluster of Mongrels, the queue lengths of all Mongrels are added together, showing the sum of all queue lengths.
A Mongrel queue length should be at or near zero; if it is consistently at a higher level, it indicates that your Rails application is having trouble keeping up with its load requirements.
regex | sample metric | legend name
Mongrel/Queue Length | Mongrel/Queue Length | Queue Length
View
ActionView is a package in Rails that is used to render the output that is the response to an HTTP request, such as an HTML page or an XML document. The View is rendered by the controller that is handling the request.
If View metrics represent a large portion of your controller's response time, it could mean you are doing a lot of database operations inside the view template itself.
Because event-type data can have any type of key-value pair data attached to it, one way metrics can be reported is as attributes attached to an event.
A couple of examples of this at New Relic:
Our infrastructure monitoring reports many metrics that are attached to events. For example, we report a ProcessSample event, which has various sample-based metrics attached to it, like CPU percentage. To learn more about infrastructure monitoring data, see Infrastructure data.
In APM, the Transaction event has several metrics attached to it, including databaseDuration.
To learn more about this data and how to query it, see Events.
Metrics can be formed by counting New Relic events, or doing some other mathematical calculation on those events. For example, if you wanted to measure the total number of Transaction events over the last half hour, you might run this NRQL query:
SELECT count(*) FROM Transaction SINCE 30 minutes ago
Another example: if you wanted to compute the average response time for your service, you might run a query like:
FROM Transaction SELECT average(duration) SINCE 30 minutes ago
Some New Relic charts are generated with these kinds of queries. The downside of this approach is that there are limits on how many events a monitoring system (including ours) can report. This means that sometimes, for high-throughput systems, the count may not accurately represent the total activity on that system. To learn more about how this can be addressed, see Event limits and sampling.
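The two calculations above (a count and an average over a recent time window) can be sketched over an in-memory list of events. The event dictionaries, the fixed `now` value, and the helper `since` are hypothetical stand-ins for stored Transaction events; the sketch only shows the arithmetic, not how queries are actually executed.

```python
from datetime import datetime, timedelta

now = datetime(2024, 1, 1, 12, 0)  # fixed "current time" for a repeatable example

# Hypothetical stored events with a timestamp and a duration in seconds.
events = [
    {"eventType": "Transaction", "timestamp": now - timedelta(minutes=5), "duration": 0.5},
    {"eventType": "Transaction", "timestamp": now - timedelta(minutes=20), "duration": 1.5},
    {"eventType": "Transaction", "timestamp": now - timedelta(minutes=45), "duration": 4.0},
]


def since(events, now, window):
    """Keep only the events whose timestamp falls inside the window."""
    cutoff = now - window
    return [event for event in events if event["timestamp"] >= cutoff]


recent = since(events, now, timedelta(minutes=30))
count = len(recent)
average_duration = sum(event["duration"] for event in recent) / count
print(count, average_duration)
```

Both results depend entirely on which events were stored. If a high-throughput system reports only a sample of its events, a count computed this way undercounts the real activity.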
Event data
First, we'll explain the definition of events from a monitoring industry perspective, and then we'll explain some specifics about how New Relic handles event data.
Events in the monitoring industry
In the software industry, events can be thought of as simply “things that occur in a system.” For example, a server setting being changed would be an event. Another example: a website user clicking a mouse.
Some events will generate a stored record, and that record is typically also called an event.
Event data represents discrete occurrences and typically will have a high level of detail, so event data is suited for detailed analysis and querying. The downside to the use of event data is that there are typically so many events reported that it can become difficult to query that large dataset over longer time ranges.
Events at New Relic
At New Relic, we report events to data objects also called events. These events have multiple attributes (key-value pairs) attached. Event data is used in some UI charts and tables, and you can also query it. How long event data remains available is determined by data retention rules.
One example of an event: APM reports an event type named Transaction, which represents a logical unit of work in an application. To see the attributes attached to this event, you could use a NRQL query like:
SELECT * FROM Transaction
To increase the availability of your event data for querying/charting, you can turn events into metrics.
Some systems generate a large number of events that exceeds collection limits and results in incomplete query results. For more on this, see Event sampling.
Because event is a general term, in some New Relic contexts it will refer to any data type that can be queried via NRQL. For example, when you run a NRQL query, it returns a count of inspected events: this is a count of all data types queried.
Log data
First, we'll explain the definition of logs from a monitoring industry perspective, and then we'll explain some specifics about how New Relic handles log reporting.
Logs in the monitoring industry
A log is a message about a system used to understand the activity of the system and to diagnose problems.
Logs at New Relic
Our capabilities give you a centralized platform that connects your log data with other New Relic-monitored data. For example, you can see logs alongside your APM data.
In New Relic, log data is reported with multiple attributes (key-value data) attached. To query your log data, you could use a NRQL query like:
SELECT * FROM Log
Trace data
First, we'll explain the definition of traces from a monitoring industry perspective, and then we'll explain some specifics about how New Relic handles tracing.
Tracing in the monitoring industry
In the application/infrastructure-monitoring world, tracing is a general term used to refer to various ways to report information about how a program or system is operating. For example, a stack trace provides in-depth information about a program's subroutines.
For large modern systems, which are often distributed across many services and micro-services, “tracing” often refers to distributed tracing, which is a way to monitor requests as they propagate through a complex, distributed environment.
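As a rough illustration of how a request can be followed through several services, the sketch below links span-like records through parent references and walks them from the root. The field names (`trace_id`, `span_id`, `parent_id`, `service`), the sample services, and the helper `service_path` are all hypothetical and chosen for readability; real tracing systems define their own attribute names.

```python
from collections import defaultdict

# Hypothetical spans: each records which trace it belongs to and its parent.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service": "frontend"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service": "checkout"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "b", "service": "payments"},
    {"trace_id": "t2", "span_id": "d", "parent_id": None, "service": "frontend"},
]


def service_path(spans, trace_id):
    """Return the services one request touched, root first, depth-first."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span["trace_id"] != trace_id:
            continue
        if span["parent_id"] is None:
            root = span
        else:
            children[span["parent_id"]].append(span)

    path = []
    stack = [root] if root is not None else []
    while stack:
        span = stack.pop()
        path.append(span["service"])
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(children[span["span_id"]]))
    return path


print(service_path(spans, "t1"))
```

The shared trace identifier groups the spans of one request, and the parent references recover the order in which services called one another.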
Tracing at New Relic
New Relic offers a distributed tracing feature that tracks requests across a distributed system, and provides a dedicated UI for understanding and analyzing your traces. In New Relic, trace data is reported as Span objects, with multiple attributes (key-value pairs) attached.
To query your tracing data, you could use a NRQL query like:
SELECT * FROM Span