Amazon Elasticsearch monitoring integration

Important

Enable the AWS CloudWatch Metric Streams integration to monitor all CloudWatch metrics from your AWS services, including custom namespaces. Individual integrations are no longer our recommended option.

New Relic infrastructure integrations include an integration for reporting Amazon Elasticsearch data to New Relic. This document explains the integration's features, how to activate it, and what data can be reported.

Features

Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch's easy-to-use APIs and real-time capabilities along with the availability, scalability, and security required by production workloads. New Relic's Elasticsearch monitoring integration lets you track cluster status, CPU utilization, read/write latency, throughput, and other metrics at specific points in time. You can also query, analyze, and chart your Elasticsearch data.

Activate integration

To enable this integration, follow standard procedures to connect AWS services to New Relic.

Configuration and polling

You can change the polling frequency and filter data using configuration options.

Default polling information for the Amazon Elasticsearch integration:

  • New Relic polling interval: 5 minutes
  • Amazon CloudWatch data interval: 1 minute

View and use data

To view and use this integration's data, go to one.newrelic.com > All capabilities > Infrastructure > AWS and select one of the Elasticsearch integration links.

To query and explore your data, use the DatastoreSample event type with the appropriate provider value:

  • ElasticsearchCluster for clusters
  • ElasticsearchNode for nodes
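As a minimal sketch of how the provider values above fit into a query, the following Python helper builds an NRQL string against DatastoreSample. The function name `build_nrql` is hypothetical, and the attribute name `provider.CPUUtilization.Average` is an assumption about how the metric and statistic are exposed; check the attribute names on your own DatastoreSample events before relying on it.

```python
# Provider values documented for this integration.
ALLOWED_PROVIDERS = {"ElasticsearchCluster", "ElasticsearchNode"}


def build_nrql(provider: str, attribute: str, aggregate: str = "average",
               since: str = "30 minutes ago") -> str:
    """Return an NRQL query string that filters DatastoreSample by provider.

    `attribute` is passed through unchanged; its naming is an assumption
    and should be confirmed against your own event data.
    """
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return (
        f"SELECT {aggregate}({attribute}) FROM DatastoreSample "
        f"WHERE provider = '{provider}' SINCE {since}"
    )


# Hypothetical attribute name, used only for illustration.
query = build_nrql("ElasticsearchCluster", "provider.CPUUtilization.Average")
print(query)
```

The printed string can be pasted into the query builder. Swapping the provider to ElasticsearchNode targets node-level data instead.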

Metric data

The Elasticsearch integration collects these metrics for clusters:

| Name | Relevant statistics | Description |
| --- | --- | --- |
| ClusterStatus.green | Minimum, Maximum | Indicates that all index shards are allocated to nodes in the cluster. |
| ClusterStatus.yellow | Minimum, Maximum | Indicates that the primary shards for all indices are allocated to nodes in a cluster, but the replica shards for at least one index are not. Single node clusters always initialize with this cluster status because there is no second node to which a replica can be assigned. You can either increase your node count to obtain a green cluster status, or you can use the Amazon ES API to set the number_of_replicas setting for your index to 0. For more information, see Amazon's documentation for Updating indices settings. |
| ClusterStatus.red | Minimum, Maximum | Indicates that the primary and replica shards of at least one index are not allocated to nodes in a cluster. For more information, see Amazon's documentation on Red Cluster Status. |
| Nodes | Minimum, Maximum, Average | The number of nodes in the Amazon ES cluster. |
| SearchableDocuments | Minimum, Maximum, Average | The total number of searchable documents across all indices in the cluster. |
| DeletedDocuments | Minimum, Maximum, Average | The total number of deleted documents across all indices in the cluster. |
| CPUUtilization | Minimum, Maximum, Average | The maximum percentage of CPU resources used for data nodes in the cluster. |
| FreeStorageSpace | Minimum | The free space, in megabytes, for all data nodes in the cluster. |
| ClusterUsedSpace | Minimum, Maximum | The total used space, in megabytes, for a cluster. |
| ClusterIndexWritesBlocked | Maximum | Indicates whether your cluster is accepting or blocking incoming write requests. A value of 0 means that the cluster is accepting requests. A value of 1 means that it is blocking requests. |
| JVMMemoryPressure | Maximum | The maximum percentage of the Java heap used for all data nodes in the cluster. |
| AutomatedSnapshotFailure | Minimum, Maximum | The number of failed automated snapshots for the cluster. A value of 1 indicates that no automated snapshot was taken for the domain in the previous 36 hours. |
| CPUCreditBalance | Minimum | The remaining CPU credits available for data nodes in the cluster. A CPU credit provides the performance of a full CPU core for one minute. This metric is available only for the t2.micro.elasticsearch, t2.small.elasticsearch, and t2.medium.elasticsearch instance types. |
| KibanaHealthyNodes | Minimum | A health check for Kibana. A value of 1 indicates normal behavior. A value of 0 indicates that Kibana is inaccessible. In most cases, the health of Kibana mirrors the health of the cluster. |
| KMSKeyError | Minimum, Maximum | A value of 1 indicates that the KMS customer master key used to encrypt data at rest has been disabled. To restore the domain to normal operations, re-enable the key. |
| KMSKeyInaccessible | Minimum, Maximum | A value of 1 indicates that the KMS customer master key used to encrypt data at rest has been deleted or has had its grants to Amazon ES revoked. You can't recover domains that are in this state. If you have a manual snapshot, though, you can use it to migrate the domain's data to a new domain. |
| InvalidHostHeaderRequests | Sum | The number of HTTP requests made to the Elasticsearch cluster that included an invalid (or missing) host header. |
| ElasticsearchRequests | Sum | The number of requests made to the Elasticsearch cluster. |
| RequestCount | Sum | The number of requests to a domain and the HTTP response code (2xx, 3xx, 4xx, 5xx) for each request. |
| MasterCPUUtilization | Average | The maximum percentage of CPU resources used by the dedicated master nodes. We recommend increasing the size of the instance type when this metric reaches 60 percent. |
| MasterJVMMemoryPressure | Maximum | The maximum percentage of the Java heap used for all dedicated master nodes in the cluster. We recommend moving to a larger instance type when this metric reaches 85 percent. |
| MasterCPUCreditBalance | Minimum | The remaining CPU credits available for dedicated master nodes in the cluster. A CPU credit provides the performance of a full CPU core for one minute. This metric is available only for the t2.micro.elasticsearch, t2.small.elasticsearch, and t2.medium.elasticsearch instance types. |
| MasterReachableFromNode | Minimum | A health check for MasterNotDiscovered exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that /_cluster/health/ is failing. Failures mean that the master node stopped or is not reachable. They are usually the result of a network connectivity issue or AWS dependency problem. |
| ReadLatency | Minimum, Maximum, Average | The latency, in seconds, for read operations on EBS volumes. |
| WriteLatency | Minimum, Maximum, Average | The latency, in seconds, for write operations on EBS volumes. |
| ReadThroughput | Minimum, Maximum, Average | The throughput, in bytes per second, for read operations on EBS volumes. |
| WriteThroughput | Minimum, Maximum, Average | The throughput, in bytes per second, for write operations on EBS volumes. |
| DiskQueueDepth | Minimum, Maximum, Average | The number of pending input and output (I/O) requests for an EBS volume. |
| ReadIOPS | Minimum, Maximum, Average | The number of input and output (I/O) operations per second for read operations on EBS volumes. |
| WriteIOPS | Minimum, Maximum, Average | The number of input and output (I/O) operations per second for write operations on EBS volumes. |
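The flag values and thresholds described for the cluster metrics above can be combined into a simple check. The following Python sketch is illustrative only: the function name `cluster_findings` and the input shape (a plain dictionary keyed by metric name) are assumptions, not part of the integration. The 60 percent and 85 percent limits are the recommendations quoted in the MasterCPUUtilization and MasterJVMMemoryPressure descriptions.

```python
from typing import Dict, List

# Recommendation thresholds quoted in the metric descriptions (percent).
MASTER_CPU_LIMIT = 60.0
MASTER_JVM_LIMIT = 85.0


def cluster_findings(sample: Dict[str, float]) -> List[str]:
    """Return human-readable findings for one set of cluster metric values.

    `sample` is a hypothetical dict keyed by metric name; missing keys are
    treated as healthy.
    """
    findings: List[str] = []

    # Red takes precedence over yellow when both flags are set.
    if sample.get("ClusterStatus.red", 0) >= 1:
        findings.append("red: primary and replica shards of at least one index are unallocated")
    elif sample.get("ClusterStatus.yellow", 0) >= 1:
        findings.append("yellow: replica shards of at least one index are unallocated")

    # 1 means the cluster is blocking incoming write requests.
    if sample.get("ClusterIndexWritesBlocked", 0) >= 1:
        findings.append("writes blocked: cluster is rejecting incoming write requests")

    # 1 means no automated snapshot in the previous 36 hours.
    if sample.get("AutomatedSnapshotFailure", 0) >= 1:
        findings.append("snapshot: no automated snapshot in the previous 36 hours")

    # 0 means the health check is failing; the default of 1 is healthy.
    if sample.get("KibanaHealthyNodes", 1) < 1:
        findings.append("kibana: Kibana is inaccessible")
    if sample.get("MasterReachableFromNode", 1) < 1:
        findings.append("master: master node stopped or unreachable")

    cpu = sample.get("MasterCPUUtilization", 0.0)
    if cpu >= MASTER_CPU_LIMIT:
        findings.append(f"master CPU at {cpu}%: consider a larger instance type")

    jvm = sample.get("MasterJVMMemoryPressure", 0.0)
    if jvm >= MASTER_JVM_LIMIT:
        findings.append(f"master JVM pressure at {jvm}%: consider a larger instance type")

    return findings


example = {
    "ClusterStatus.yellow": 1,
    "ClusterIndexWritesBlocked": 0,
    "MasterCPUUtilization": 72.5,
    "MasterJVMMemoryPressure": 40.0,
}
for finding in cluster_findings(example):
    print(finding)
```

In this example the sample reports a yellow cluster and a master CPU value above the 60 percent recommendation, so two findings are printed.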

The following metrics are collected for Elasticsearch clusters, and optionally for each instance or node in a domain as well:

| Name | Relevant statistics | Description |
| --- | --- | --- |
| IndexingLatency | For nodes: Average. For clusters: Average, Maximum | The average time, in milliseconds, that it takes a shard to complete an indexing operation. |
| IndexingRate | For nodes: Average. For clusters: Average, Maximum, Sum | The number of indexing operations per minute. |
| SearchLatency | For nodes: Average. For clusters: Average, Maximum | The average time, in milliseconds, that it takes a shard to complete a search operation. |
| SearchRate | For nodes: Average. For clusters: Average, Maximum, Sum | The total number of search requests per minute for all shards on a node. |
| SysMemoryUtilization | Minimum, Maximum, Average | The percentage of the instance's memory that is in use. |
| JVMGCYoungCollectionCount | For nodes: Maximum. For clusters: Sum, Maximum, Average | The number of times that "young generation" garbage collection has run. A large, ever-growing number of runs is a normal part of cluster operations. |
| JVMGCYoungCollectionTime | For nodes: Maximum. For clusters: Sum, Maximum, Average | The amount of time, in milliseconds, that the cluster has spent performing "young generation" garbage collection. |
| JVMGCOldCollectionCount | For nodes: Maximum. For clusters: Sum, Maximum, Average | The number of times that "old generation" garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently. |
| JVMGCOldCollectionTime | For nodes: Maximum. For clusters: Sum, Maximum, Average | The amount of time, in milliseconds, that the cluster has spent performing "old generation" garbage collection. |
| ThreadpoolForce_mergeQueue | For nodes: Maximum. For clusters: Sum, Maximum, Average | The number of queued tasks in the force merge thread pool. If the queue size is consistently high, consider scaling your cluster. |
| ThreadpoolForce_mergeRejected | For nodes: Maximum. For clusters: Sum | The number of rejected tasks in the force merge thread pool. If this number continually grows, consider scaling your cluster. |
| ThreadpoolForce_mergeThreads | For nodes: Maximum. For clusters: Sum, Average | The size of the force merge thread pool. |
| ThreadpoolIndexQueue | For nodes: Maximum. For clusters: Sum, Maximum, Average | The number of queued tasks in the index thread pool. If the queue size is consistently high, consider scaling your cluster. The maximum index queue size is 200. |
| ThreadpoolIndexRejected | For nodes: Maximum. For clusters: Sum | The number of rejected tasks in the index thread pool. If this number continually grows, consider scaling your cluster. |
| ThreadpoolIndexThreads | For nodes: Maximum. For clusters: Sum, Average | The size of the index thread pool. |
| ThreadpoolSearchQueue | For nodes: Maximum. For clusters: Sum, Maximum, Average | The number of queued tasks in the search thread pool. If the queue size is consistently high, consider scaling your cluster. The maximum search queue size is 1000. |
| ThreadpoolSearchRejected | For nodes: Maximum. For clusters: Sum | The number of rejected tasks in the search thread pool. If this number continually grows, consider scaling your cluster. |
| ThreadpoolSearchThreads | For nodes: Maximum. For clusters: Sum, Average | The size of the search thread pool. |
| ThreadpoolBulkQueue | For nodes: Maximum. For clusters: Sum, Maximum, Average | The number of queued tasks in the bulk thread pool. If the queue size is consistently high, consider scaling your cluster. |
| ThreadpoolBulkRejected | For nodes: Maximum. For clusters: Sum | The number of rejected tasks in the bulk thread pool. If this number continually grows, consider scaling your cluster. |
| ThreadpoolBulkThreads | For nodes: Maximum. For clusters: Sum, Average | The size of the bulk thread pool. |
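The thread pool guidance above (a consistently high queue, or a rejected count that keeps growing) can be sketched as two small checks. This is a rough illustration under stated assumptions: the names `queue_fill_ratio` and `is_growing` are hypothetical, and treating "continually grows" as a strict increase across consecutive samples is one possible interpretation, not a documented rule. The queue limits of 200 and 1000 are the maximum sizes quoted in the index and search queue descriptions.

```python
from typing import Dict, Sequence

# Maximum queue sizes quoted in the metric descriptions.
MAX_QUEUE_SIZE: Dict[str, int] = {"Index": 200, "Search": 1000}


def queue_fill_ratio(pool: str, queued: float) -> float:
    """Return the fraction of the pool's maximum queue size currently in use."""
    return queued / MAX_QUEUE_SIZE[pool]


def is_growing(values: Sequence[float]) -> bool:
    """Return True if every sample is larger than the one before it.

    Assumption: "continually grows" is read as a strict increase across
    at least two consecutive samples.
    """
    return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))


# Hypothetical samples, oldest first.
search_queue = 250
index_rejected = [1, 3, 7]

print(queue_fill_ratio("Search", search_queue))
print(is_growing(index_rejected))
```

A ratio close to 1 over several polling intervals, or a rejected series that keeps increasing, corresponds to the cases where the descriptions suggest scaling the cluster.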

Copyright © 2024 New Relic Inc.