HashiCorp Consul monitoring integration

The HashiCorp Consul on-host integration collects and sends inventory and metrics from your Consul environment to New Relic Infrastructure, where you can see the health of your Consul datacenter environment. We collect data on both the datacenter and agent/node levels.

Read on to install the integration, and to see what data we collect.

Compatibility and requirements

Our integration is compatible with HashiCorp Consul 1.0 or newer.

Before installing the integration, make sure that you meet the following requirements:

This integration is released as open source under the MIT license on GitHub.

Install and activate

To install the HashiCorp Consul integration, follow the instructions for your environment:

ECS

See Monitor service running on ECS.

Kubernetes

See Monitor service running on Kubernetes.

Linux
  1. Follow the instructions for installing an integration, using the file name nri-consul.
  2. Change directory to the integrations folder:
    cd /etc/newrelic-infra/integrations.d
  3. Copy the sample configuration file:
    sudo cp consul-config.yml.sample consul-config.yml
  4. Edit the consul-config.yml file as described in the configuration settings.
  5. Restart the infrastructure agent.

Windows
  1. Download the nri-consul .MSI installer image from:

    http://download.newrelic.com/infrastructure_agent/windows/integrations/nri-consul/nri-consul-amd64.msi

  2. To install from the Windows command prompt, run:
    msiexec.exe /qn /i PATH\TO\nri-consul-amd64.msi
  3. In the Integrations directory, C:\Program Files\New Relic\newrelic-infra\integrations.d\, create a copy of the sample configuration file by running:

    copy consul-config.yml.sample consul-config.yml
  4. Edit the consul-config.yml file as described in the configuration settings.
  5. Restart the infrastructure agent.

Additional notes:

Configure the integration

Use the Consul integration's configuration to set the required login credentials and to control how data is collected. Which options you change depends on your setup and preferences.

There are several ways to configure the integration, depending on how it was installed:

Config options are below. For an example configuration, see Example config file.

Commands

The configuration accepts one command:

  • all_data: collects both inventory and metrics for the HashiCorp Consul environment.

If you run a cluster environment in Kubernetes and use autodiscovery to monitor Consul pods, you may receive duplicated cluster data. Facet by node when creating dashboards and alerts.

Arguments

The configuration command accepts the following arguments:

  • hostname: The hostname or IP address of a Consul node within the cluster. Default: localhost. Required.

  • port: The port on which to connect to the node. Default: 8500. Required.

  • fan_out: Whether to connect to other Consul agents to collect node data from them. Default: true.

  • token: The ACL token to use, if token authentication is enabled.

  • enable_ssl: Whether or not to connect using SSL. Default: false.

  • trust_server_certificate: If set to true, the server certificate is not verified when connecting over SSL. If set to false, the certificate is verified against the supplied CA bundles. Default: false.

  • ca_bundle_file: Alternative Certificate Authority bundle file, required if enable_ssl is set to true and trust_server_certificate is set to false.

  • ca_bundle_dir: Alternative Certificate Authority bundle directory, required if enable_ssl is set to true and trust_server_certificate is set to false.

Example consul-config.yml file configuration:

integration_name: com.newrelic.consul

instances:
  - name: consul-prod
    command: all_data
    arguments:
      hostname: consul-dev-0.consul.localnet
      token: my_token
    labels:
      env: production
      role: consul
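
The defaults and the CA bundle rule from the arguments listed above can be checked before you restart the agent. The following Python sketch is for illustration only and is not part of the integration: resolve_arguments and base_url are hypothetical helper names, and the sketch assumes that supplying either ca_bundle_file or ca_bundle_dir satisfies the CA bundle requirement.

```python
from typing import Any, Dict

# Defaults as documented in the arguments list.
DEFAULTS: Dict[str, Any] = {
    "hostname": "localhost",
    "port": 8500,
    "fan_out": True,
    "enable_ssl": False,
    "trust_server_certificate": False,
}


def resolve_arguments(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user-supplied arguments over the documented defaults and check
    the CA bundle rule. Hypothetical helper, not part of nri-consul."""
    resolved = {**DEFAULTS, **arguments}
    needs_ca = resolved["enable_ssl"] and not resolved["trust_server_certificate"]
    # Assumption: either a bundle file or a bundle directory is enough.
    has_ca = bool(resolved.get("ca_bundle_file") or resolved.get("ca_bundle_dir"))
    if needs_ca and not has_ca:
        raise ValueError(
            "enable_ssl is true and trust_server_certificate is false: "
            "set ca_bundle_file or ca_bundle_dir"
        )
    return resolved


def base_url(resolved: Dict[str, Any]) -> str:
    """Build the Consul HTTP API base URL implied by the resolved arguments."""
    scheme = "https" if resolved["enable_ssl"] else "http"
    return f"{scheme}://{resolved['hostname']}:{resolved['port']}"


if __name__ == "__main__":
    # Arguments taken from the example configuration above.
    args = resolve_arguments(
        {"hostname": "consul-dev-0.consul.localnet", "token": "my_token"}
    )
    print(base_url(args))  # http://consul-dev-0.consul.localnet:8500
```

With the example configuration, no SSL arguments are set, so the sketch resolves to a plain HTTP connection on the default port 8500. Setting enable_ssl to true without a CA bundle or trust_server_certificate raises an error in this sketch.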

Find and use data

Data from this service is reported to an integration dashboard.

Metrics are attached to these event types: ConsulDatacenterSample and ConsulAgentSample.

You can query this data for troubleshooting purposes or to create custom charts and dashboards.

For more on how to find and use your data, see Understand integration data.

Metric data

The HashiCorp Consul integration collects the following metric data attributes.

These attributes are attached to the ConsulDatacenterSample event type:

  • consul.catalog.nodes_critical: The number of registered nodes with service status critical.

  • consul.catalog.nodes_passing: The number of registered nodes with service status passing.

  • consul.catalog.nodes_up: The number of nodes.

  • consul.catalog.nodes_warning: The number of registered nodes with service status warning.

  • consul.catalog.total_nodes: The number of nodes registered in the Consul cluster.

  • consul.memberlist.msg.suspect: The number of times an agent suspects another as failed while probing during the gossip protocol.

  • consul.raft.apply: The number of Raft transactions occurring.

  • consul.raft.commitTime.avg: The average time it takes to commit a new entry to the Raft log on the leader.

  • consul.raft.commitTime.count: The number of samples of raft.commitTime.

  • consul.raft.commitTime.max: The maximum time it takes to commit a new entry to the Raft log on the leader.

  • consul.raft.commitTime.median: The median time it takes to commit a new entry to the Raft log on the leader.

  • consul.raft.leader.dispatchLog.avg: The average time it takes for the leader to write log entries to disk.

  • consul.raft.leader.dispatchLog.count: The number of samples of raft.leader.dispatchLog.

  • consul.raft.leader.dispatchLog.max: The maximum time it takes for the leader to write log entries to disk.

  • consul.raft.leader.dispatchLog.median: The median time it takes for the leader to write log entries to disk.

  • consul.raft.leader.lastContact.avg: The average time elapsed since the leader was last able to check its lease with followers.

  • consul.raft.leader.lastContact.count: The number of samples of raft.leader.lastContact.

  • consul.raft.leader.lastContact.max: The maximum time elapsed since the leader was last able to check its lease with followers.

  • consul.raft.leader.lastContact.median: The median time elapsed since the leader was last able to check its lease with followers.

  • consul.raft.state.candidate: The number of initiated leader elections.

  • consul.raft.state.leader: The number of completed leader elections.

  • consul.serf.member.flap: The number of times an agent is marked dead and then quickly recovers.

These attributes are attached to the ConsulAgentSample event type:

  • agent.aclCacheHit: ACL cache hits.

  • agent.aclCacheMiss: ACL cache misses.

  • agent.kvStores: The number of samples of kvs.apply.

  • agent.kvStoresAvgInMilliseconds: The average time it takes to complete an update to the KV store.

  • agent.kvStoresMaxInMilliseconds: The maximum time it takes to complete an update to the KV store.

  • agent.kvStoresMedianInMilliseconds: The median time it takes to complete an update to the KV store.

  • agent.peers: The number of peers in the peer set.

  • agent.staleQueries: Served queries within the allowed stale threshold.

  • agent.txnAvgInMilliseconds: The average time it takes to apply a transaction operation.

  • agent.txnMaxInMilliseconds: The maximum time it takes to apply a transaction operation.

  • agent.txnMedianInMilliseconds: The median time it takes to apply a transaction operation.

  • agent.txns: The number of samples of txn.apply.

  • client.rpcFailed: Measure of failed RPC requests.

  • client.rpcLoad: Measure of how much an agent is loading Consul servers.

  • client.rpcRateLimited: Measure of RPC requests that get rate limited.

  • net.agent.maxLatencyInMilliseconds: Maximum latency from this node to all others.

  • net.agent.medianLatencyInMilliseconds: Median latency from this node to all others.

  • net.agent.minLatencyInMilliseconds: Minimum latency from this node to all others.

  • net.agent.p25LatencyInMilliseconds: P25 latency from this node to all others.

  • net.agent.p75LatencyInMilliseconds: P75 latency from this node to all others.

  • net.agent.p90LatencyInMilliseconds: P90 latency from this node to all others.

  • net.agent.p95LatencyInMilliseconds: P95 latency from this node to all others.

  • net.agent.p99LatencyInMilliseconds: P99 latency from this node to all others.

  • runtime.allocations: Cumulative count of heap objects allocated.

  • runtime.allocationsInBytes: The current bytes allocated by the Consul process.

  • runtime.frees: Cumulative count of heap objects freed.

  • runtime.gcCycles: The number of completed GC cycles.

  • runtime.gcPauseInMilliseconds: Cumulative nanoseconds in GC stop-the-world pauses since Consul started.

  • runtime.goroutines: The number of running goroutines.

  • runtime.heapObjects: The number of objects allocated on the heap.

  • runtime.virtualAddressSpaceInBytes: Total size of the virtual address space reserved by the Go runtime.

Inventory data

The HashiCorp Consul integration captures the configuration parameters and current settings of the Consul Agent nodes. It collects the results of the /v1/agent/self REST API endpoint. It pulls the Config and DebugConfig sections from that response.

Note: Nested sections within Config and DebugConfig are not collected.
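
To preview which settings are likely to appear as inventory, you can query the same endpoint yourself. The following Python sketch is an approximation and not the integration's own code: top_level_settings and fetch_agent_self are hypothetical helpers, the "Section/Key" naming of the result is chosen only for illustration, and the sketch assumes that a "nested section" is any value that is itself a JSON object. The sample data is made up.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Sections of the /v1/agent/self response that the integration reads.
SECTIONS = ("Config", "DebugConfig")


def top_level_settings(agent_self: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the top-level entries of Config and DebugConfig, skipping any
    value that is itself a JSON object (treated here as a nested section)."""
    settings: Dict[str, Any] = {}
    for section in SECTIONS:
        for key, value in (agent_self.get(section) or {}).items():
            if isinstance(value, dict):
                continue  # nested section: skipped
            settings[f"{section}/{key}"] = value  # illustrative key naming
    return settings


def fetch_agent_self(base_url: str, token: Optional[str] = None) -> Dict[str, Any]:
    """GET /v1/agent/self from a Consul agent, sending the ACL token if given."""
    request = urllib.request.Request(f"{base_url}/v1/agent/self")
    if token:
        request.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Made-up response fragment; a live agent returns many more keys.
    sample = {
        "Config": {"Datacenter": "dc1", "Server": True, "Nested": {"a": 1}},
        "DebugConfig": {"LogLevel": "INFO"},
        "Member": {"Name": "node-1"},
    }
    print(top_level_settings(sample))
```

Against a live agent, you could pass the result of fetch_agent_self("http://localhost:8500") to top_level_settings instead of the sample data, adding a token if ACLs are enabled.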

The data is available on the Infrastructure Inventory page, under the config/consul source. For more about inventory data, see Understand integration data.
