Amazon SageMaker integration

New Relic integrates with Amazon Web Services (AWS) to report your Amazon SageMaker metrics and other data to New Relic.

This document explains how to activate the integration, and describes the data reported.

Features

Collect and send telemetry data to New Relic from your Amazon SageMaker services using our integration. Monitor your services, query incoming data, and build dashboards to observe everything at a glance.

Activate integration

This integration is available through CloudWatch Metric Streams.

To enable this integration, see how to connect AWS services to New Relic via CloudWatch Metric Streams.

Find and use data

To find your integration's metrics, go to one.newrelic.com > Metrics and events and filter by aws.sagemaker.
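If you prefer to query the data rather than browse it, the following is a minimal sketch of how an NRQL query string for these metrics could be assembled. It assumes that metrics arrive in the `Metric` data type under the `aws.sagemaker` prefix named above, and that the dimension is exposed as an attribute called `aws.sagemaker.EndpointName`. That attribute name is an assumption, so check it against your own data. The function name `build_sagemaker_nrql` and its parameters are hypothetical helpers, not part of any New Relic library.

```python
import re
from typing import Optional

# Prefix taken from the filter described above.
METRIC_PREFIX = "aws.sagemaker"
# Aggregations listed in the metric tables of this document.
ALLOWED_AGGREGATES = {"min", "max", "average", "count", "sum"}


def build_sagemaker_nrql(
    metric: str,
    aggregate: str = "average",
    endpoint_name: Optional[str] = None,
    since_minutes: int = 30,
) -> str:
    """Build an NRQL query string for one SageMaker metric (hypothetical helper)."""
    if not re.fullmatch(r"[A-Za-z0-9]+", metric):
        raise ValueError(f"unexpected metric name: {metric!r}")
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate!r}")
    if since_minutes <= 0:
        raise ValueError("since_minutes must be positive")

    query = f"SELECT {aggregate}({METRIC_PREFIX}.{metric}) FROM Metric"
    if endpoint_name is not None:
        # Only letters, digits, and hyphens are accepted, so the value
        # can be placed inside single quotes without escaping.
        if not re.fullmatch(r"[A-Za-z0-9-]+", endpoint_name):
            raise ValueError(f"unexpected endpoint name: {endpoint_name!r}")
        # Assumed attribute name for the EndpointName dimension.
        query += f" WHERE {METRIC_PREFIX}.EndpointName = '{endpoint_name}'"
    return f"{query} SINCE {since_minutes} minutes ago"


print(build_sagemaker_nrql("ModelLatency", endpoint_name="my-endpoint"))
```

The resulting string can be pasted into the query builder. Validating the inputs keeps arbitrary text out of the query, which matters if the endpoint name comes from user input.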

Metric data

This New Relic infrastructure integration collects the following Amazon SageMaker data:

SageMaker Metric data

| Metric (min, max, average, count, sum) | Unit | Description |
| --- | --- | --- |
| Invocations | Count | The number of InvokeEndpoint requests sent to a model endpoint. |
| InvocationsPerInstance | Count | The number of invocations sent to a model, normalized by InstanceCount in each ProductionVariant. |
| OverheadLatency | Microseconds | The interval of time added to the time taken to respond to a client request by SageMaker overheads. |
| ModelLatency | Microseconds | The interval of time taken by a model to respond to a SageMaker API request. |
| Invocation4XXErrors | Count | The number of InvokeEndpoint requests where the model returned a 4xx HTTP response code. |
| Invocation5XXErrors | Count | The number of InvokeEndpoint requests where the model returned a 5xx HTTP response code. |
| InvocationModelErrors | Count | The number of model invocation requests that did not result in a 2xx HTTP response. |

All imported data from SageMaker have one dimension: EndpointName
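Both latency metrics above are reported in microseconds, and OverheadLatency is described as time added on top of the model's own response time. A minimal sketch of how the two could be combined into an approximate end-to-end latency in milliseconds follows. The function name `total_latency_ms` is hypothetical, and the sample values are made up for illustration.

```python
MICROSECONDS_PER_MILLISECOND = 1000.0


def total_latency_ms(model_latency_us: float, overhead_latency_us: float) -> float:
    """Approximate end-to-end latency in milliseconds.

    Adds ModelLatency and OverheadLatency (both in microseconds) and
    converts the sum to milliseconds. Hypothetical helper.
    """
    if model_latency_us < 0 or overhead_latency_us < 0:
        raise ValueError("latency values must be non-negative")
    return (model_latency_us + overhead_latency_us) / MICROSECONDS_PER_MILLISECOND


# Illustrative values only: 120,000 us of model time plus 8,000 us of overhead.
print(total_latency_ms(120_000, 8_000))
```

Combine values that use the same aggregation and time window, for example the average of both metrics over the same minute. Mixing a maximum with an average gives a number that is hard to interpret.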

SageMaker Endpoints Metric data

| Metric (min, max, average, count, sum) | Unit | Description |
| --- | --- | --- |
| MemoryUtilization | Percent | The percentage of memory that is used by the containers on an instance. For endpoint variants, the value is the sum of the memory utilization of the primary and supplementary containers on the instance. |
| DiskUtilization | Percent | The percentage of disk space used by the containers on an instance. For endpoint variants, the value is the sum of the disk space utilization of the primary and supplementary containers on the instance. |
| CPUUtilization | Percent | The sum of each individual CPU core's utilization. For endpoint variants, the value is the sum of the CPU utilization of the primary and supplementary containers on the instance. |
| GPUMemoryUtilization | Percent | The percentage of GPU memory used by the containers on an instance. For endpoint variants, the value is the sum of the GPU memory utilization of the primary and supplementary containers on the instance. |
| GPUUtilization | Percent | The percentage of GPU units that are used by the containers on an instance. For endpoint variants, the value is the sum of the GPU utilization of the primary and supplementary containers on the instance. |

All imported data from SageMaker Endpoints have one dimension: Host
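Because CPUUtilization is the sum of each core's utilization, a reading on a multi-core instance can exceed 100. A minimal sketch of how such a reading could be normalized to a 0 to 100 scale follows. The core count is not one of the metrics listed here, so it is a hypothetical input that you would supply from your instance type. The function name `normalized_cpu_percent` is also hypothetical.

```python
def normalized_cpu_percent(cpu_utilization_sum: float, core_count: int) -> float:
    """Scale a summed per-core CPU reading to a 0-100 range.

    cpu_utilization_sum: the CPUUtilization value (sum across cores).
    core_count: number of CPU cores on the instance (assumed, supplied by the caller).
    """
    if core_count <= 0:
        raise ValueError("core_count must be a positive integer")
    if cpu_utilization_sum < 0:
        raise ValueError("cpu_utilization_sum must be non-negative")
    return cpu_utilization_sum / core_count


# Illustrative values only: a reading of 250 on a 4-core instance.
print(normalized_cpu_percent(250.0, 4))
```

Normalizing this way makes readings comparable across instance types with different core counts, which is useful when a single dashboard or alert threshold covers several hosts.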

SageMaker Training Jobs Metric data

| Metric (min, max, average, count, sum) | Unit | Description |
| --- | --- | --- |
| MemoryUtilization | Percent | The percentage of memory that is used by the containers on an instance. For training jobs, the value is the memory utilization of the algorithm container on the instance. |
| DiskUtilization | Percent | The percentage of disk space used by the containers on an instance. For training jobs, the value is the disk space utilization of the algorithm container on the instance. |
| CPUUtilization | Percent | The sum of each individual CPU core's utilization. For training jobs, the value is the CPU utilization of the algorithm container on the instance. |
| TrainErrors | Count | The number of train errors in a training job. |

All imported data from SageMaker Training Jobs have one dimension: Host

Create alerts

You can set up alerts to notify you if there are any changes. For example, you can set up an alert to notify relevant parties of critical or fatal errors.

Learn more about creating alerts here.
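As one illustration of an error-based alert, the sketch below computes an error rate from the Invocations, Invocation4XXErrors, and Invocation5XXErrors counts and compares it with a threshold. It only shows the arithmetic behind such a condition and does not represent how New Relic evaluates alerts. The function names and the 5% default threshold are hypothetical.

```python
def error_rate(invocations: float, errors_4xx: float, errors_5xx: float) -> float:
    """Fraction of invocations that returned a 4xx or 5xx response."""
    if min(invocations, errors_4xx, errors_5xx) < 0:
        raise ValueError("counts must be non-negative")
    if invocations == 0:
        # No traffic in the window, so there is nothing to rate.
        return 0.0
    return (errors_4xx + errors_5xx) / invocations


def should_alert(
    invocations: float,
    errors_4xx: float,
    errors_5xx: float,
    threshold: float = 0.05,  # hypothetical default: 5% of requests
) -> bool:
    """Return True when the error rate is above the threshold."""
    return error_rate(invocations, errors_4xx, errors_5xx) > threshold


# Illustrative counts only, summed over the same time window.
print(should_alert(invocations=200, errors_4xx=30, errors_5xx=0))
```

Using a rate rather than a raw error count keeps the condition meaningful as traffic grows or shrinks. All three counts should be sums over the same time window.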

Copyright © 2024 New Relic Inc.
