Amazon SageMaker integration

New Relic's integration with Amazon Web Services (AWS) reports your Amazon SageMaker metrics and other data to New Relic.

This document explains how to activate the integration, and describes the data reported.

Features

Collect and send telemetry data to New Relic from your Amazon SageMaker services using our integration. Monitor your services, query incoming data, and build dashboards to observe everything at a glance.

Activate integration

This integration is available through CloudWatch Metric Streams.

To enable this integration, see how to connect AWS services to New Relic via CloudWatch Metric Streams.

Find and use data

To find your integration's metrics, go to one.newrelic.com > Metrics and events and filter by aws.sagemaker.
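As a rough illustration, the same data can also be queried with NRQL instead of the UI filter. The sketch below only builds a query string; it does not call any New Relic API. It assumes that each metric is stored under the name `aws.sagemaker.<MetricName>` (inferred from the `aws.sagemaker` filter prefix above) and that the data lives in the `Metric` data type. The helper name `sagemaker_nrql` is hypothetical.

```python
from typing import Optional

# Aggregations listed in the metric table headers of this document.
ALLOWED_AGGREGATES = {"min", "max", "average", "count", "sum"}


def sagemaker_nrql(
    metric: str,
    aggregate: str = "average",
    since: str = "1 hour ago",
    facet: Optional[str] = None,
) -> str:
    """Build an NRQL query string for one SageMaker metric.

    Assumption: metrics are named aws.sagemaker.<MetricName>.
    """
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    query = f"SELECT {aggregate}(aws.sagemaker.{metric}) FROM Metric"
    if facet:
        # The facet attribute name is supplied by the caller; verify it in
        # your own account before relying on it.
        query += f" FACET {facet}"
    return f"{query} SINCE {since}"


print(sagemaker_nrql("ModelLatency"))
```

Paste the printed query into the query builder to check that the assumed metric name matches what your account actually receives.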

Metric data

This New Relic infrastructure integration collects the following Amazon SageMaker data:

SageMaker Metric data

Metric (min, max, average, count, sum)	Unit	Description
`Invocations`	Count	The number of InvokeEndpoint requests sent to a model endpoint.
`InvocationsPerInstance`	Count	The number of invocations sent to a model, normalized by InstanceCount in each ProductionVariant.
`OverheadLatency`	Microseconds	The time that SageMaker overhead adds to the time taken to respond to a client request.
`ModelLatency`	Microseconds	The interval of time taken by a model to respond to a SageMaker API request.
`Invocation4XXErrors`	Count	The number of InvokeEndpoint requests where the model returned a 4xx HTTP response code.
`Invocation5XXErrors`	Count	The number of InvokeEndpoint requests where the model returned a 5xx HTTP response code.
`InvocationModelErrors`	Count	The number of model invocation requests that did not result in a 2XX HTTP response.

All imported data from SageMaker have one dimension: EndpointName
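Two small calculations are common when working with these values: converting the latency metrics from microseconds to milliseconds, and deriving an error rate from the invocation counts. The sketch below is a minimal example under the assumption that `Invocations` counts every request, including those that returned 4xx or 5xx responses. The function names and the sample numbers are hypothetical.

```python
def microseconds_to_ms(value_us: float) -> float:
    """Convert a latency reported in microseconds to milliseconds."""
    return value_us / 1000.0


def error_rate(invocations: float, errors_4xx: float, errors_5xx: float) -> float:
    """Fraction of invocations that returned a 4xx or 5xx response.

    Assumption: `invocations` includes the failed requests.
    Returns 0.0 when there were no invocations in the window.
    """
    if invocations <= 0:
        return 0.0
    return (errors_4xx + errors_5xx) / invocations


# Made-up sample values for one time window.
model_latency_ms = microseconds_to_ms(2500)   # ModelLatency of 2500 microseconds
rate = error_rate(200, 3, 1)                  # 200 requests, 3 4xx, 1 5xx
print(f"{model_latency_ms} ms, {rate:.1%} errors")
```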

SageMaker Endpoints Metric data

Metric (min, max, average, count, sum)	Unit	Description
`MemoryUtilization`	Percent	The percentage of memory that is used by the containers on an instance. For endpoint variants, the value is the sum of the memory utilization of the primary and supplementary containers on the instance.
`DiskUtilization`	Percent	The percentage of disk space used by the containers on an instance. For endpoint variants, the value is the sum of the disk space utilization of the primary and supplementary containers on the instance.
`CPUUtilization`	Percent	The sum of each individual CPU core's utilization. For endpoint variants, the value is the sum of the CPU utilization of the primary and supplementary containers on the instance.
`GPUMemoryUtilization`	Percent	The percentage of GPU memory used by the containers on an instance. For endpoint variants, the value is the sum of the GPU memory utilization of the primary and supplementary containers on the instance.
`GPUUtilization`	Percent	The percentage of GPU units that are used by the containers on an instance. For endpoint variants, the value is the sum of the GPU utilization of the primary and supplementary containers on the instance.

All imported data from SageMaker Endpoints have one dimension: Host
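Because `CPUUtilization` is the sum of each core's utilization, it can exceed 100 percent on a multi-core instance. The following sketch shows one way to rescale it to a 0 to 100 range. The core count is not part of the reported data, so `core_count` is a hypothetical input that you would supply from the instance type.

```python
def normalized_cpu_percent(cpu_utilization: float, core_count: int) -> float:
    """Rescale a summed per-core CPU utilization to a 0-100 range.

    `core_count` is an assumption supplied by the caller; the metric
    itself does not report it.
    """
    if core_count <= 0:
        raise ValueError("core_count must be a positive integer")
    return cpu_utilization / core_count


# Made-up sample: 260 percent summed across an instance with 4 cores.
print(normalized_cpu_percent(260.0, 4))
```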

SageMaker Training Jobs Metric data

Metric (min, max, average, count, sum)	Unit	Description
`MemoryUtilization`	Percent	The percentage of memory that is used by the containers on an instance. For training jobs, the value is the memory utilization of the algorithm container on the instance.
`DiskUtilization`	Percent	The percentage of disk space used by the containers on an instance. For training jobs, the value is the disk space utilization of the algorithm container on the instance.
`CPUUtilization`	Percent	The sum of each individual CPU core's utilization. For training jobs, the value is the CPU utilization of the algorithm container on the instance.
`TrainErrors`	Count	The number of training errors in a training job.

All imported data from SageMaker Training Jobs have one dimension: Host

Create alerts

You can set up alerts to notify you if there are any changes. For example, you can set up an alert to notify relevant parties of critical or fatal errors.

Learn more about creating alerts here.