AWS EMR monitoring integration

Access to this feature depends on your subscription level. Requires Infrastructure Pro.

New Relic Infrastructure integrations include an integration for reporting your AWS EMR (Elastic MapReduce) data to New Relic products. This document explains how to activate this integration and describes the data that can be reported.

Features

You can monitor and alert on your EMR data directly from New Relic Infrastructure, and query data and create dashboards in New Relic Insights.

Activate integration

To enable this integration:

  1. Make sure you have installed the Infrastructure agent before you activate AWS integrations from your Infrastructure account.
  2. Follow standard procedures to Connect AWS services to Infrastructure.

Configuration and polling

You can change the polling frequency and filter data using configuration options.

Default polling information for the AWS EMR integration:

  • New Relic polling interval: 5 minutes
  • Resolution: 1 data point every 5 minutes

Explore integration data

To use your integration data in Infrastructure, go to infrastructure.newrelic.com > Integrations > Amazon Web Services and select one of the EMR integration links.

In New Relic Insights, data is attached to the ElasticMapReduceClusterSample event type, with a provider value of ElasticMapReduceCluster.
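As an illustration of how the event type and provider value fit together in a query, the following sketch builds an NRQL query string in Python. The helper name `build_emr_nrql` and its parameters are hypothetical, and the sketch assumes that metric attributes are stored under the names listed in this document; check the attribute names on your own events, since they may differ (for example, they may carry a prefix or suffix).

```python
import re

# Event type and provider value as stated in this document.
EVENT_TYPE = "ElasticMapReduceClusterSample"
PROVIDER = "ElasticMapReduceCluster"

# NRQL aggregator functions this sketch allows.
ALLOWED_AGGREGATES = {"average", "max", "min", "sum", "latest"}


def build_emr_nrql(metric: str, aggregate: str = "average",
                   since: str = "1 hour ago") -> str:
    """Return an NRQL query string for one EMR metric (hypothetical helper)."""
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    # Accept plain or dotted attribute names only, so that arbitrary
    # text cannot end up inside the query.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.]*", metric):
        raise ValueError(f"unexpected metric name: {metric}")
    return (
        f"SELECT {aggregate}({metric}) "
        f"FROM {EVENT_TYPE} "
        f"WHERE provider = '{PROVIDER}' "
        f"SINCE {since}"
    )


if __name__ == "__main__":
    print(build_emr_nrql("hdfsUtilizationPercentage", aggregate="max"))
```

The resulting string can be pasted into the Insights query bar or used as the query behind a dashboard widget.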

Metric data

This New Relic Infrastructure integration collects the following Amazon EMR data. For use cases and additional information, see Amazon's EMR documentation.

Name Description
isIdle

Indicates that a cluster is no longer performing work, but is still alive and accruing charges. It is set to 1 if no tasks are running and no jobs are running, and set to 0 otherwise.

This value is checked at five-minute intervals, and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. Recommendation: To avoid false positives, trigger an alert only when this value has been 1 for more than one consecutive five-minute check. For example, alert on this value if it has been 1 for thirty minutes or longer.

coreNodesRunning

The number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists.

coreNodesPending

The number of core nodes waiting to be assigned. All of the core nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists.

liveDataNodesPercentage

The percentage of data nodes that are receiving work from Hadoop.

s3WrittenBytes

The number of bytes written to Amazon S3. This metric aggregates MapReduce jobs only. It does not apply for other workloads on EMR.

s3ReadBytes

The number of bytes read from Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on EMR.

hdfsUtilizationPercentage

The percentage of HDFS storage currently used.

hdfsReadBytes

The number of bytes read from HDFS.

hdfsWrittenBytes

The number of bytes written to HDFS.

missingBlocks

The number of blocks in which HDFS has no replicas. These might be corrupt blocks.

totalLoad

The current, total number of readers and writers reported by all DataNodes in a cluster.

mostRecentBackupDurationMinutes

The amount of time it took the previous backup to complete. This metric is set regardless of whether the last completed backup succeeded or failed.

While the backup is ongoing, this metric returns the number of minutes after the backup started. This metric is only reported for HBase clusters.

timeSinceLastSuccessfulBackupMinutes

The number of elapsed minutes after the last successful HBase backup started on your cluster. This metric is only reported for HBase clusters.
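
The isIdle recommendation in the table above (alert only after several consecutive idle checks) can be sketched as follows. This is a minimal illustration, not New Relic's alerting logic: the function name `idle_long_enough` is hypothetical, and it assumes each sample stands for one five-minute check, so thirty minutes corresponds to six consecutive samples.

```python
from typing import Sequence

# isIdle is checked at five-minute intervals, per the table above.
CHECK_INTERVAL_MINUTES = 5


def idle_long_enough(samples: Sequence[int], min_idle_minutes: int = 30) -> bool:
    """Return True if the most recent isIdle samples were all 1 for at
    least min_idle_minutes (hypothetical helper).

    samples holds isIdle values ordered oldest to newest, one per check.
    """
    # Number of consecutive checks needed to cover the window (ceiling division).
    required = -(-min_idle_minutes // CHECK_INTERVAL_MINUTES)
    # A single idle check is never enough: the recommendation asks for
    # more than one consecutive check.
    required = max(required, 2)
    if len(samples) < required:
        return False
    return all(value == 1 for value in samples[-required:])


if __name__ == "__main__":
    print(idle_long_enough([0, 1, 1, 1, 1, 1, 1]))  # six trailing idle checks
    print(idle_long_enough([1, 1, 1, 0, 1, 1]))     # interrupted by activity
```

A single 0 inside the window resets the count, which is what keeps a briefly idle cluster from triggering an alert.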

The following metrics appear in the sample depending on the Hadoop version of the resource.

Name Description
jobsRunning The number of jobs in the cluster that are currently running.
jobsFailed The number of jobs in the cluster that have failed.
mapTasksRunning The number of running map tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated.
mapTasksRemaining The number of remaining map tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. A remaining map task is one that is not in any of the following states: Running, Killed, or Completed.
mapSlotsOpen The unused map task capacity. This is calculated as the maximum number of map tasks for a given cluster, less the total number of map tasks currently running in that cluster.
remainingMapTasksPerSlot The ratio of the total map tasks remaining to the total map slots available in the cluster.
reduceTasksRunning The number of running reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated.
reduceTasksRemaining The number of remaining reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated.
reduceSlotsOpen Unused reduce task capacity. This is calculated as the maximum reduce task capacity for a given cluster, less the number of reduce tasks currently running in that cluster.
taskNodesRunning The number of task nodes working. Data points for this metric are reported only when a corresponding instance group exists.
taskNodesPending The number of task nodes waiting to be assigned. All of the task nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists.
liveTaskTrackersPercentage The percentage of task trackers that are functional.
backupFailed Whether the last backup failed. This is set to 0 by default and updated to 1 if the previous backup attempt failed. This metric is only reported for HBase clusters.

Name Description
containerAllocated The number of resource containers allocated by the ResourceManager.
containerReserved The number of resource containers reserved.
containerPending The number of containers in the queue that have not yet been allocated.
containerPendingRatio The ratio of pending containers to containers allocated.
appsCompleted The number of applications submitted to YARN that have completed.
appsFailed The number of applications submitted to YARN that have failed to complete.
appsKilled The number of applications submitted to YARN that have been killed.
appsPending The number of applications submitted to YARN that are in a pending state.
appsRunning The number of applications submitted to YARN that are running.
appsSubmitted The number of applications submitted to YARN.
mrTotalNodes The number of nodes presently available to MapReduce jobs.
mrActiveNodes The number of nodes presently running MapReduce tasks or jobs.
mrLostNodes The number of nodes allocated to MapReduce that have been marked in a LOST state.
mrUnhealthyNodes The number of nodes available to MapReduce jobs marked in an UNHEALTHY state.
mrDecommissionedNodes The number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state.
mrRebootedNodes The number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state.
corruptBlocks The number of blocks that HDFS reports as corrupted.
memoryTotalBytes The total amount of memory in the cluster.
memoryReservedBytes The amount of memory reserved.
memoryAvailableBytes The amount of memory available to be allocated.
memoryAllocatedBytes The amount of memory allocated to the cluster.
yarnMemoryAvailablePercentage The percentage of remaining memory available to YARN.
underReplicatedBlocks The number of blocks that need to be replicated one or more times.
dfsPendingReplicationBlocks The status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests.
capacityRemainingBytes The amount of remaining HDFS disk capacity.
hbaseBackupFailed Whether the last backup failed. This is set to 0 by default and updated to 1 if the previous backup attempt failed. This metric is only reported for HBase clusters.
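
Several metrics in the tables above are described as calculations over other values: mapSlotsOpen, remainingMapTasksPerSlot, and containerPendingRatio. The sketch below restates those descriptions as arithmetic. The function names are hypothetical, and returning `None` when the denominator is zero is a choice made for this sketch; the descriptions above do not say how the reported metrics behave in that case.

```python
from typing import Optional


def map_slots_open(max_map_tasks: int, map_tasks_running: int) -> int:
    """Unused map task capacity: maximum map tasks for the cluster,
    less the map tasks currently running."""
    return max_map_tasks - map_tasks_running


def remaining_map_tasks_per_slot(map_tasks_remaining: int,
                                 total_map_slots: int) -> Optional[float]:
    """Ratio of total map tasks remaining to total map slots available."""
    if total_map_slots == 0:
        return None  # sketch-only choice: no slots, so no ratio
    return map_tasks_remaining / total_map_slots


def container_pending_ratio(container_pending: int,
                            container_allocated: int) -> Optional[float]:
    """Ratio of pending containers to containers allocated."""
    if container_allocated == 0:
        return None  # sketch-only choice: nothing allocated, so no ratio
    return container_pending / container_allocated


if __name__ == "__main__":
    print(map_slots_open(max_map_tasks=10, map_tasks_running=4))
    print(remaining_map_tasks_per_slot(map_tasks_remaining=12, total_map_slots=4))
    print(container_pending_ratio(container_pending=3, container_allocated=6))
```

Read this way, a containerPendingRatio that stays above 1 means more containers are waiting than are allocated, which can be a useful signal when deciding whether a cluster needs more capacity.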

Inventory data

Inventory data provides information about the service's state and configuration. EMR configuration options are reported as inventory data.

Object Attributes
aws/emr/cluster

  • id
  • name
  • status
  • tags
  • applications
  • autoScalingRole
  • autoTerminate
  • configurations
  • customAmiId
  • ebsRootVolumeSize
  • ec2InstanceAttributes
  • instanceCollectionType
  • logUri
  • masterPublicDnsName
  • normalizedInstanceHours
  • releaseLabel
  • repoUpgradeOnBoot
  • requestedAmiVersion
  • runningAmiVersion
  • scaleDownBehavior
  • securityConfiguration
  • serviceRole
  • terminationProtected
  • visibleToAllUsers

aws/emr/instance

  • id
  • ec2InstanceId
  • instanceFleetId
  • instanceGroupId
  • instanceType
  • privateDnsName
  • privateIpAddress
  • publicDnsName
  • publicIpAddress
  • status
  • ebsVolumes
  • market

aws/emr/instance-fleet

  • id
  • name
  • status
  • instanceFleetType
  • instanceTypeSpecifications
  • launchSpecifications
  • provisionedOnDemandCapacity
  • provisionedSpotCapacity
  • targetOnDemandCapacity
  • targetSpotCapacity

aws/emr/instance-group

  • id
  • name
  • status
  • instanceType
  • instanceGroupType
  • autoScalingPolicy
  • bidPrice
  • configurations
  • ebsBlockDevices
  • ebsOptimized
  • market
  • requestedInstanceCount
  • runningInstanceCount
  • shrinkPolicy
