Amazon Elasticsearch monitoring integration

Important

Enable the AWS CloudWatch Metric Streams integration to monitor all CloudWatch metrics from your AWS services, including custom namespaces. Individual integrations are no longer our recommended option.

New Relic infrastructure integrations include an integration for reporting Amazon Elasticsearch data to New Relic. This document explains the integration's features, how to activate it, and what data can be reported.

Features

Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time capabilities along with the availability, scalability, and security required by production workloads. New Relic's Elasticsearch monitoring integration allows you to track cluster status, CPU utilization, read/write latency, throughput, and other metrics at specific points in time. You can also query, analyze, and chart your Elasticsearch data.

Activate integration

To enable this integration, follow standard procedures to connect AWS services to New Relic.

Configuration and polling

You can change the polling frequency and filter data using configuration options.

Default polling information for the Amazon Elasticsearch integration:

New Relic polling interval: 5 minutes
Amazon CloudWatch data interval: 1 minute

View and use data

To view and use this integration's data, go to one.newrelic.com > All capabilities > Infrastructure > AWS and select one of the Elasticsearch integration links.

To query and explore your data, use the DatastoreSample event type with the appropriate provider value:

ElasticsearchCluster for clusters
ElasticsearchNode for nodes
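
As a rough illustration of how the provider values above might be used, the following Python sketch builds an NRQL query string for one metric. The helper name `build_nrql` and the `provider.<metric>.<statistic>` attribute naming are assumptions made for this example; check the attribute names that actually appear on your `DatastoreSample` events before relying on them.

```python
# Minimal sketch: build an NRQL query for one Elasticsearch metric.
# ASSUMPTION: attributes follow the pattern provider.<metric>.<statistic>;
# verify this against your own DatastoreSample events.

ALLOWED_PROVIDERS = {"ElasticsearchCluster", "ElasticsearchNode"}


def build_nrql(provider: str, metric: str, statistic: str = "Average",
               since: str = "1 hour ago") -> str:
    """Return an NRQL query string (hypothetical helper, not a New Relic API)."""
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown provider value: {provider!r}")
    attribute = f"provider.{metric}.{statistic}"
    return (
        f"SELECT average({attribute}) FROM DatastoreSample "
        f"WHERE provider = '{provider}' SINCE {since}"
    )


if __name__ == "__main__":
    print(build_nrql("ElasticsearchCluster", "CPUUtilization"))
```

The resulting string can be pasted into the query builder. Restricting `provider` to the two documented values keeps typos from silently returning no data.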

Metric data

The Elasticsearch integration collects these metrics for clusters:

Name	Relevant statistics	Description
`ClusterStatus.green`	Minimum, Maximum	Indicates that all index shards are allocated to nodes in the cluster.
`ClusterStatus.yellow`	Minimum, Maximum	Indicates that the primary shards for all indices are allocated to nodes in a cluster, but the replica shards for at least one index are not. Single-node clusters always initialize with this cluster status because there is no second node to which a replica can be assigned. You can either increase your node count to obtain a green cluster status, or you can use the Amazon ES API to set the `number_of_replicas` setting for your index to 0. For more information, see Amazon's documentation for Updating indices settings.
`ClusterStatus.red`	Minimum, Maximum	Indicates that the primary and replica shards of at least one index are not allocated to nodes in a cluster. For more information, see Amazon's documentation on Red Cluster Status.
`Nodes`	Minimum, Maximum, Average	The number of nodes in the Amazon ES cluster.
`SearchableDocuments`	Minimum, Maximum, Average	The total number of searchable documents across all indices in the cluster.
`DeletedDocuments`	Minimum, Maximum, Average	The total number of deleted documents across all indices in the cluster.
`CPUUtilization`	Minimum, Maximum, Average	The maximum percentage of CPU resources used for data nodes in the cluster.
`FreeStorageSpace`	Minimum	The free space, in megabytes, for all data nodes in the cluster.
`ClusterUsedSpace`	Minimum, Maximum	The total used space, in megabytes, for a cluster.
`ClusterIndexWritesBlocked`	Maximum	Indicates whether your cluster is accepting or blocking incoming write requests. A value of 0 means that the cluster is accepting requests. A value of 1 means that it is blocking requests.
`JVMMemoryPressure`	Maximum	The maximum percentage of the Java heap used for all data nodes in the cluster.
`AutomatedSnapshotFailure`	Minimum, Maximum	The number of failed automated snapshots for the cluster. A value of 1 indicates that no automated snapshot was taken for the domain in the previous 36 hours.
`CPUCreditBalance`	Minimum	The remaining CPU credits available for data nodes in the cluster. A CPU credit provides the performance of a full CPU core for one minute. This metric is available only for the t2.micro.elasticsearch, t2.small.elasticsearch, and t2.medium.elasticsearch instance types.
`KibanaHealthyNodes`	Minimum	A health check for Kibana. A value of 1 indicates normal behavior. A value of 0 indicates that Kibana is inaccessible. In most cases, the health of Kibana mirrors the health of the cluster.
`KMSKeyError`	Minimum, Maximum	A value of 1 indicates that the KMS customer master key used to encrypt data at rest has been disabled. To restore the domain to normal operations, re-enable the key.
`KMSKeyInaccessible`	Minimum, Maximum	A value of 1 indicates that the KMS customer master key used to encrypt data at rest has been deleted or has had its grants to Amazon ES revoked. You can't recover domains that are in this state. If you have a manual snapshot, though, you can use it to migrate the domain's data to a new domain.
`InvalidHostHeaderRequests`	Sum	The number of HTTP requests made to the Elasticsearch cluster that included an invalid (or missing) host header.
`ElasticsearchRequests`	Sum	The number of requests made to the Elasticsearch cluster.
`RequestCount`	Sum	The number of requests to a domain and the HTTP response code (2xx, 3xx, 4xx, 5xx) for each request.
`MasterCPUUtilization`	Average	The maximum percentage of CPU resources used by the dedicated master nodes. We recommend increasing the size of the instance type when this metric reaches 60 percent.
`MasterJVMMemoryPressure`	Maximum	The maximum percentage of the Java heap used for all dedicated master nodes in the cluster. We recommend moving to a larger instance type when this metric reaches 85 percent.
`MasterCPUCreditBalance`	Minimum	The remaining CPU credits available for dedicated master nodes in the cluster. A CPU credit provides the performance of a full CPU core for one minute. This metric is available only for the t2.micro.elasticsearch, t2.small.elasticsearch, and t2.medium.elasticsearch instance types.
`MasterReachableFromNode`	Minimum	A health check for `MasterNotDiscovered` exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that `/_cluster/health/` is failing. Failures mean that the master node stopped or is not reachable. They are usually the result of a network connectivity issue or AWS dependency problem.
`ReadLatency`	Minimum, Maximum, Average	The latency, in seconds, for read operations on EBS volumes.
`WriteLatency`	Minimum, Maximum, Average	The latency, in seconds, for write operations on EBS volumes.
`ReadThroughput`	Minimum, Maximum, Average	The throughput, in bytes per second, for read operations on EBS volumes.
`WriteThroughput`	Minimum, Maximum, Average	The throughput, in bytes per second, for write operations on EBS volumes.
`DiskQueueDepth`	Minimum, Maximum, Average	The number of pending input and output (I/O) requests for an EBS volume.
`ReadIOPS`	Minimum, Maximum, Average	The number of input and output (I/O) operations per second for read operations on EBS volumes.
`WriteIOPS`	Minimum, Maximum, Average	The number of input and output (I/O) operations per second for write operations on EBS volumes.
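
Several of the cluster metrics above have documented meanings for specific values (for example, 0 or 1 flags and the 60 and 85 percent recommendations for dedicated master nodes). The sketch below shows one way these could be turned into simple checks. The function name, the finding labels, and the idea of passing metrics as a plain dictionary keyed by metric name are assumptions made for illustration, not part of the integration.

```python
from typing import Dict, List

# Thresholds taken from the metric descriptions in the table above.
MASTER_CPU_THRESHOLD = 60.0   # percent
MASTER_JVM_THRESHOLD = 85.0   # percent


def evaluate_cluster(metrics: Dict[str, float]) -> List[str]:
    """Return finding labels for a dict of metric name -> latest value.

    Hypothetical helper: the dict shape and the labels are illustrative only.
    Missing metrics are treated as healthy.
    """
    findings: List[str] = []
    if metrics.get("ClusterStatus.red", 0) >= 1:
        findings.append("cluster_red")
    elif metrics.get("ClusterStatus.yellow", 0) >= 1:
        findings.append("cluster_yellow")
    if metrics.get("ClusterIndexWritesBlocked", 0) >= 1:
        findings.append("writes_blocked")
    if metrics.get("AutomatedSnapshotFailure", 0) >= 1:
        findings.append("snapshot_failed")
    if metrics.get("KibanaHealthyNodes", 1) < 1:
        findings.append("kibana_unreachable")
    if metrics.get("MasterReachableFromNode", 1) < 1:
        findings.append("master_unreachable")
    if metrics.get("MasterCPUUtilization", 0) >= MASTER_CPU_THRESHOLD:
        findings.append("master_cpu_high")
    if metrics.get("MasterJVMMemoryPressure", 0) >= MASTER_JVM_THRESHOLD:
        findings.append("master_jvm_high")
    return findings


if __name__ == "__main__":
    sample = {"ClusterStatus.yellow": 1, "MasterJVMMemoryPressure": 91.0}
    print(evaluate_cluster(sample))
```

In practice you would more likely express these conditions as alert conditions on the reported data; the function only makes the documented value meanings explicit.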

The following metrics are collected for Elasticsearch clusters, and optionally for each instance or node in a domain as well:

Name	Relevant statistics	Description
`IndexingLatency`	For nodes: Average; for clusters: Average, Maximum	The average time, in milliseconds, that it takes a shard to complete an indexing operation.
`IndexingRate`	For nodes: Average; for clusters: Average, Maximum, Sum	The number of indexing operations per minute.
`SearchLatency`	For nodes: Average; for clusters: Average, Maximum	The average time, in milliseconds, that it takes a shard to complete a search operation.
`SearchRate`	For nodes: Average; for clusters: Average, Maximum, Sum	The total number of search requests per minute for all shards on a node.
`SysMemoryUtilization`	Minimum, Maximum, Average	The percentage of the instance's memory that is in use.
`JVMGCYoungCollectionCount`	For nodes: Maximum; for clusters: Sum, Maximum, Average	The number of times that "young generation" garbage collection has run. A large, ever-growing number of runs is a normal part of cluster operations.
`JVMGCYoungCollectionTime`	For nodes: Maximum; for clusters: Sum, Maximum, Average	The amount of time, in milliseconds, that the cluster has spent performing "young generation" garbage collection.
`JVMGCOldCollectionCount`	For nodes: Maximum; for clusters: Sum, Maximum, Average	The number of times that "old generation" garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently.
`JVMGCOldCollectionTime`	For nodes: Maximum; for clusters: Sum, Maximum, Average	The amount of time, in milliseconds, that the cluster has spent performing "old generation" garbage collection.
`ThreadpoolForce_mergeQueue`	For nodes: Maximum; for clusters: Sum, Maximum, Average	The number of queued tasks in the force merge thread pool. If the queue size is consistently high, consider scaling your cluster.
`ThreadpoolForce_mergeRejected`	For nodes: Maximum; for clusters: Sum	The number of rejected tasks in the force merge thread pool. If this number continually grows, consider scaling your cluster.
`ThreadpoolForce_mergeThreads`	For nodes: Maximum; for clusters: Sum, Average	The size of the force merge thread pool.
`ThreadpoolIndexQueue`	For nodes: Maximum; for clusters: Sum, Maximum, Average	The number of queued tasks in the index thread pool. If the queue size is consistently high, consider scaling your cluster. The maximum index queue size is 200.
`ThreadpoolIndexRejected`	For nodes: Maximum; for clusters: Sum	The number of rejected tasks in the index thread pool. If this number continually grows, consider scaling your cluster.
`ThreadpoolIndexThreads`	For nodes: Maximum; for clusters: Sum, Average	The size of the index thread pool.
`ThreadpoolSearchQueue`	For nodes: Maximum; for clusters: Sum, Maximum, Average	The number of queued tasks in the search thread pool. If the queue size is consistently high, consider scaling your cluster. The maximum search queue size is 1000.
`ThreadpoolSearchRejected`	For nodes: Maximum; for clusters: Sum	The number of rejected tasks in the search thread pool. If this number continually grows, consider scaling your cluster.
`ThreadpoolSearchThreads`	For nodes: Maximum; for clusters: Sum, Average	The size of the search thread pool.
`ThreadpoolBulkQueue`	For nodes: Maximum; for clusters: Sum, Maximum, Average	The number of queued tasks in the bulk thread pool. If the queue size is consistently high, consider scaling your cluster.
`ThreadpoolBulkRejected`	For nodes: Maximum; for clusters: Sum	The number of rejected tasks in the bulk thread pool. If this number continually grows, consider scaling your cluster.
`ThreadpoolBulkThreads`	For nodes: Maximum; for clusters: Sum, Average	The size of the bulk thread pool.
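
The thread pool guidance above ("consistently high" queues, "continually grows" for rejections) can be made concrete in a small sketch. The queue maximums of 200 and 1000 come from the table; the helper names and the choice to treat "continually grows" as "every sample is larger than the one before" are assumptions for illustration only.

```python
from typing import Sequence

# Maximum queue sizes stated in the metric descriptions above.
INDEX_QUEUE_MAX = 200
SEARCH_QUEUE_MAX = 1000


def queue_fill_ratio(queued: float, queue_max: int) -> float:
    """Fraction of the thread pool queue currently in use (0.0 to 1.0)."""
    return queued / queue_max


def rejections_growing(samples: Sequence[float]) -> bool:
    """True if each sample is strictly greater than the previous one.

    ASSUMPTION: this is one simple reading of "continually grows";
    pick a window and rule that suit your own workload.
    """
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )


if __name__ == "__main__":
    # Example values are made up for the demonstration.
    print(queue_fill_ratio(150, INDEX_QUEUE_MAX))
    print(rejections_growing([0, 3, 7, 12]))
```

Comparing a queue metric to its documented maximum gives a normalized value that is easier to compare across the index and search pools than raw queue counts.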