AWS Elasticsearch monitoring integration

Access to this feature depends on your subscription level. Requires Infrastructure Pro.

New Relic Infrastructure's integrations include an integration for reporting Amazon Elasticsearch data to New Relic products. This document explains the integration's features, how to activate it, and what data can be reported.

Features

Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time capabilities along with the availability, scalability, and security required by production workloads. New Relic's Elasticsearch monitoring integration allows you to track cluster status, CPU utilization, read/write latency, throughput, and other metrics, at specific points in time. Elasticsearch data is also available for querying, data analysis, and chart creation in New Relic Insights.

Activate integration

To enable this integration, follow standard procedures to Connect AWS services to Infrastructure.

Configuration and polling

You can change the polling frequency and filter data using configuration options.

Default polling information for the AWS Elasticsearch integration:

  • New Relic polling interval: 5 minutes
  • Amazon CloudWatch data interval: 1 minute

View and use data

To view and use this integration's data in Infrastructure, go to infrastructure.newrelic.com > Integrations > Amazon Web Services and select one of the Elasticsearch integration links.

In New Relic Insights, data is attached to the DatastoreSample event type, with a provider value of ElasticsearchCluster for clusters and a provider value of ElasticsearchNode for nodes.
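As a rough illustration of querying this data, the sketch below builds an NRQL query string that averages one attribute of DatastoreSample events for a given provider value. The helper name build_nrql and the attribute name provider.CPUUtilization.Maximum are assumptions made for illustration; check the attribute names that actually appear on your own DatastoreSample events before relying on the query.

```python
def build_nrql(provider: str, attribute: str, since: str = "1 hour ago") -> str:
    """Build an NRQL query that averages one attribute for one provider value."""
    # Backticks let NRQL accept attribute names that contain dots.
    return (
        f"SELECT average(`{attribute}`) FROM DatastoreSample "
        f"WHERE provider = '{provider}' SINCE {since}"
    )


# Hypothetical attribute name; verify it against your own event data.
cluster_query = build_nrql("ElasticsearchCluster", "provider.CPUUtilization.Maximum")
print(cluster_query)
```

The resulting string can be pasted into the Insights query bar. Keeping the provider value as a parameter makes it easy to reuse the same helper for other provider values.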

Metric data

The Elasticsearch integration collects these metrics for clusters:

Name Relevant statistics Description
ClusterStatus.green Minimum, Maximum

Indicates that all index shards are allocated to nodes in the cluster.

ClusterStatus.yellow Minimum, Maximum

Indicates that the primary shards for all indices are allocated to nodes in a cluster, but the replica shards for at least one index are not. Single node clusters always initialize with this cluster status because there is no second node to which a replica can be assigned.

You can either increase your node count to obtain a green cluster status, or you can use the Amazon ES API to set the number_of_replicas setting for your index to 0. For more information, see Amazon's documentation for Updating indices settings.

ClusterStatus.red Minimum, Maximum

Indicates that the primary and replica shards of at least one index are not allocated to nodes in a cluster. For more information, see Amazon's documentation on Red Cluster Status.

Nodes Minimum, Maximum, Average

The number of nodes in the Amazon ES cluster.

SearchableDocuments Minimum, Maximum, Average

The total number of searchable documents across all indices in the cluster.

DeletedDocuments Minimum, Maximum, Average

The total number of deleted documents across all indices in the cluster.

CPUUtilization Minimum, Maximum, Average

The maximum percentage of CPU resources used for data nodes in the cluster.

FreeStorageSpace Minimum

The free space, in megabytes, for all data nodes in the cluster.

ClusterUsedSpace Minimum, Maximum

The total used space, in megabytes, for a cluster.

ClusterIndexWritesBlocked Maximum

Indicates whether your cluster is accepting or blocking incoming write requests. A value of 0 means that the cluster is accepting requests. A value of 1 means that it is blocking requests.

JVMMemoryPressure Maximum

The maximum percentage of the Java heap used for all data nodes in the cluster.

AutomatedSnapshotFailure Minimum, Maximum

The number of failed automated snapshots for the cluster. A value of 1 indicates that no automated snapshot was taken for the domain in the previous 36 hours.

CPUCreditBalance Minimum

The remaining CPU credits available for data nodes in the cluster. A CPU credit provides the performance of a full CPU core for one minute. This metric is available only for the t2.micro.elasticsearch, t2.small.elasticsearch, and t2.medium.elasticsearch instance types.

KibanaHealthyNodes Minimum

A health check for Kibana. A value of 1 indicates normal behavior. A value of 0 indicates that Kibana is inaccessible. In most cases, the health of Kibana mirrors the health of the cluster.

KMSKeyError Minimum, Maximum

A value of 1 indicates that the KMS customer master key used to encrypt data at rest has been disabled. To restore the domain to normal operations, re-enable the key.

KMSKeyInaccessible Minimum, Maximum

A value of 1 indicates that the KMS customer master key used to encrypt data at rest has been deleted or has had its grants to Amazon ES revoked. You can't recover domains that are in this state. If you have a manual snapshot, though, you can use it to migrate the domain's data to a new domain.

InvalidHostHeaderRequests Sum

The number of HTTP requests made to the Elasticsearch cluster that included an invalid (or missing) host header.

ElasticsearchRequests Sum

The number of requests made to the Elasticsearch cluster.

RequestCount Sum

The number of requests to a domain and the HTTP response code (2xx, 3xx, 4xx, 5xx) for each request.

MasterCPUUtilization Average

The maximum percentage of CPU resources used by the dedicated master nodes. We recommend increasing the size of the instance type when this metric reaches 60 percent.

MasterJVMMemoryPressure Maximum

The maximum percentage of the Java heap used for all dedicated master nodes in the cluster. We recommend moving to a larger instance type when this metric reaches 85 percent.

MasterCPUCreditBalance Minimum

The remaining CPU credits available for dedicated master nodes in the cluster. A CPU credit provides the performance of a full CPU core for one minute. This metric is available only for the t2.micro.elasticsearch, t2.small.elasticsearch, and t2.medium.elasticsearch instance types.

MasterReachableFromNode Minimum

A health check for MasterNotDiscovered exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that /_cluster/health/ is failing.

Failures mean that the master node stopped or is not reachable. They are usually the result of a network connectivity issue or AWS dependency problem.

ReadLatency Minimum, Maximum, Average

The latency, in seconds, for read operations on EBS volumes.

WriteLatency Minimum, Maximum, Average

The latency, in seconds, for write operations on EBS volumes.

ReadThroughput Minimum, Maximum, Average

The throughput, in bytes per second, for read operations on EBS volumes.

WriteThroughput Minimum, Maximum, Average

The throughput, in bytes per second, for write operations on EBS volumes.

DiskQueueDepth Minimum, Maximum, Average

The number of pending input and output (I/O) requests for an EBS volume.

ReadIOPS Minimum, Maximum, Average

The number of input and output (I/O) operations per second for read operations on EBS volumes.

WriteIOPS Minimum, Maximum, Average

The number of input and output (I/O) operations per second for write operations on EBS volumes.
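The ClusterStatus.yellow description above mentions setting number_of_replicas to 0 through the Amazon ES API. A minimal sketch of what that request could look like follows, using the standard Elasticsearch index settings endpoint. The function name build_replica_update, the endpoint, and the index name are placeholders for illustration. The code only builds the request and does not send it, because many domains have access policies that require signed requests, which this sketch does not handle.

```python
import json
import urllib.request


def build_replica_update(endpoint: str, index: str, replicas: int = 0) -> urllib.request.Request:
    """Build (but do not send) a PUT request that updates number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder endpoint and index name, for illustration only.
request = build_replica_update("https://search-example.us-east-1.es.amazonaws.com", "my-index")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Setting replicas to 0 removes redundancy for that index, so it is generally better suited to single-node test domains than to production clusters.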

The following metrics are collected for Elasticsearch clusters, and optionally for each instance or node in a domain as well:

Name Relevant statistics Description
IndexingLatency

For nodes: Average

For clusters: Average, Maximum

The average time, in milliseconds, that it takes a shard to complete an indexing operation.

IndexingRate

For nodes: Average

For clusters: Average, Maximum, Sum

The number of indexing operations per minute.

SearchLatency

For nodes: Average

For clusters: Average, Maximum

The average time, in milliseconds, that it takes a shard to complete a search operation.

SearchRate

For nodes: Average

For clusters: Average, Maximum, Sum

The total number of search requests per minute for all shards on a node.

SysMemoryUtilization Minimum, Maximum, Average

The percentage of the instance's memory that is in use.

JVMGCYoungCollectionCount

For nodes: Maximum

For clusters: Sum, Maximum, Average

The number of times that "young generation" garbage collection has run. A large, ever-growing number of runs is a normal part of cluster operations.

JVMGCYoungCollectionTime

For nodes: Maximum

For clusters: Sum, Maximum, Average

The amount of time, in milliseconds, that the cluster has spent performing "young generation" garbage collection.

JVMGCOldCollectionCount

For nodes: Maximum

For clusters: Sum, Maximum, Average

The number of times that "old generation" garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently.

JVMGCOldCollectionTime

For nodes: Maximum

For clusters: Sum, Maximum, Average

The amount of time, in milliseconds, that the cluster has spent performing "old generation" garbage collection.

ThreadpoolForce_mergeQueue

For nodes: Maximum

For clusters: Sum, Maximum, Average

The number of queued tasks in the force merge thread pool. If the queue size is consistently high, consider scaling your cluster.

ThreadpoolForce_mergeRejected

For nodes: Maximum

For clusters: Sum

The number of rejected tasks in the force merge thread pool. If this number continually grows, consider scaling your cluster.

ThreadpoolForce_mergeThreads

For nodes: Maximum

For clusters: Sum, Average

The size of the force merge thread pool.

ThreadpoolIndexQueue

For nodes: Maximum

For clusters: Sum, Maximum, Average

The number of queued tasks in the index thread pool. If the queue size is consistently high, consider scaling your cluster. The maximum index queue size is 200.

ThreadpoolIndexRejected

For nodes: Maximum

For clusters: Sum

The number of rejected tasks in the index thread pool. If this number continually grows, consider scaling your cluster.

ThreadpoolIndexThreads

For nodes: Maximum

For clusters: Sum, Average

The size of the index thread pool.

ThreadpoolSearchQueue

For nodes: Maximum

For clusters: Sum, Maximum, Average

The number of queued tasks in the search thread pool. If the queue size is consistently high, consider scaling your cluster. The maximum search queue size is 1000.

ThreadpoolSearchRejected

For nodes: Maximum

For clusters: Sum

The number of rejected tasks in the search thread pool. If this number continually grows, consider scaling your cluster.

ThreadpoolSearchThreads

For nodes: Maximum

For clusters: Sum, Average

The size of the search thread pool.

ThreadpoolBulkQueue

For nodes: Maximum

For clusters: Sum, Maximum, Average

The number of queued tasks in the bulk thread pool. If the queue size is consistently high, consider scaling your cluster.

ThreadpoolBulkRejected

For nodes: Maximum

For clusters: Sum

The number of rejected tasks in the bulk thread pool. If this number continually grows, consider scaling your cluster.

ThreadpoolBulkThreads

For nodes: Maximum

For clusters: Sum, Average

The size of the bulk thread pool.
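The metric descriptions above quote a few concrete numbers: 60 percent for MasterCPUUtilization, 85 percent for MasterJVMMemoryPressure, and maximum queue sizes of 200 and 1000 for the index and search thread pools. The sketch below shows one way to compare reported values against those numbers. The function name find_breaches and the sample values are made up for illustration, and the plain dictionary input is an assumption about how you might hold the values after fetching them.

```python
# Thresholds quoted in the metric descriptions above.
THRESHOLDS = {
    "MasterCPUUtilization": 60,      # percent
    "MasterJVMMemoryPressure": 85,   # percent
    "ThreadpoolIndexQueue": 200,     # maximum index queue size
    "ThreadpoolSearchQueue": 1000,   # maximum search queue size
}


def find_breaches(samples: dict) -> list:
    """Return the sorted names of metrics whose value has reached its threshold."""
    return sorted(
        name
        for name, value in samples.items()
        if name in THRESHOLDS and value >= THRESHOLDS[name]
    )


# Made-up sample values, for illustration only.
samples = {
    "MasterCPUUtilization": 72.5,
    "MasterJVMMemoryPressure": 40.0,
    "ThreadpoolSearchQueue": 1000,
}
print(find_breaches(samples))
```

Metrics without a quoted threshold are ignored, so the same function can be fed a full set of values without filtering first.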

Inventory data

The integration collects this Elasticsearch data as inventory data.

Configuration

The integration collects this data from aws/elasticsearch/cluster/config:

Inventory Description
aRN The Amazon resource name (ARN) of an Elasticsearch domain.
accessPolicies The IAM access policy.
created The domain creation status. True if the creation of an Elasticsearch domain is complete. False if domain creation is still in progress.
deleted The domain deletion status. True if a delete request has been received for the domain but resource cleanup is still in progress. False if the domain has not been deleted.
domainId The unique identifier for the specified Elasticsearch domain.
domainName Name of the Elasticsearch domain.
elasticsearchVersion Elasticsearch version.
endpoint The Elasticsearch domain endpoint that you use to submit index and search requests.
processing The status of the Elasticsearch domain configuration. True if Amazon Elasticsearch Service is processing configuration changes. False if the configuration is active.
upgradeProcessing The status of an Elasticsearch domain version upgrade. True if Amazon Elasticsearch Service is undergoing a version upgrade. False if the configuration is active.

eBSOptions

The integration collects this data from aws/elasticsearch/cluster/config/eBSOptions:

Name Description
eBSEnabled Specifies whether EBS-based storage is enabled.
iops Specifies the IOPS for a Provisioned IOPS EBS volume (SSD).
volumeSize Integer to specify the size of an EBS volume.
volumeType Specifies the volume type for EBS-based storage.

snapshotOptions

The integration collects this data from aws/elasticsearch/cluster/config/snapshotOptions:

Name Description
automatedSnapshotStartHour Specifies the time, in UTC format, when the service takes a daily automated snapshot of the specified Elasticsearch domain.

elasticsearchClusterConfig

The integration collects this data from aws/elasticsearch/cluster/config/elasticsearchClusterConfig:

Name Description
dedicatedMasterCount Total number of dedicated master nodes, active and on standby, for the cluster.
dedicatedMasterEnabled A boolean value to indicate whether a dedicated master node is enabled.
dedicatedMasterType The instance type for a dedicated master node.
instanceCount The number of instances in the specified domain cluster.
instanceType The instance type for an Elasticsearch cluster.
zoneAwarenessEnabled A boolean value to indicate whether zone awareness is enabled.
