The New Relic Kafka on-host integration reports metrics and configuration data from your Kafka service. We instrument all the key elements of your cluster, including brokers (discovered through either ZooKeeper or Bootstrap), producers, consumers, and topics.
To install the Kafka monitoring integration, you must run through the following steps:
For a comprehensive list of specific Windows and Linux versions, check the table of compatible operating systems.
System requirements
A New Relic account. Don't have one? Sign up for free! No credit card required.
If Kafka is not running on Kubernetes or Amazon ECS, you can install the infrastructure agent on a Linux or Windows host, or on a host that can remotely access the host where Kafka is installed. Otherwise, see the requirements for monitoring Kafka on Kubernetes or Amazon ECS.
Only Java-based consumers and producers are supported, and they must have JMX enabled.
The total number of monitored topics must be fewer than 10,000.
Connectivity requirements
The integration needs to be configured and allowed to connect to:
Hosts listed in zookeeper_hosts over the Zookeeper protocol, using the Zookeeper authentication mechanism, if autodiscover_strategy is set to zookeeper.
Hosts defined in bootstrap_broker_host over the Kafka protocol, using the Kafka broker's authentication/transport mechanisms, if autodiscover_strategy is set to bootstrap.
All brokers in the cluster over the Kafka protocol and port, using the Kafka brokers' authentication/transport mechanisms.
All brokers in the cluster over the JMX protocol and port, using the authentication/transport mechanisms specified in the JMX configuration of the brokers.
All producers/consumers specified in producers and consumers over the JMX protocol and port, if you want producer/consumer monitoring. JMX settings for the consumer must be the same as for the brokers.
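As a quick sanity check before configuring the integration, you can verify that each required endpoint is reachable from the host running the integration. The hostnames and ports below are placeholders for your environment:

```bash
# Hypothetical hosts/ports; substitute your own values.
nc -zv kafka-broker-1 9092   # Kafka protocol port on each broker
nc -zv kafka-broker-1 9999   # JMX/RMI port on each broker
nc -zv zookeeper-1 2181      # Zookeeper port (only for zookeeper discovery)
```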
Important
Security groups in AWS, and their equivalents in other cloud providers, don't have the required ports open by default. JMX requires two ports in order to work: the JMX port and the RMI port. These can be set to the same value when configuring the JVM to enable JMX, and they must be open for the integration to connect to and collect metrics from brokers.
Prepare for the installation
Kafka is a complex piece of software that is built as a distributed system. For this reason, you need to ensure that the integration can contact all the required hosts and services so the data is collected correctly.
Given the distributed nature of Kafka, the actual number and list of brokers is usually not fixed by the configuration, and it is instead quite dynamic. For this reason, the Kafka integration offers two mechanisms to perform automatic discovery of the list of brokers in the cluster: Bootstrap and Zookeeper. The mechanism you use depends on the setup of the Kafka cluster being monitored.
Bootstrap
With the bootstrap mechanism, the integration uses a bootstrap broker to perform the autodiscovery. This is a broker whose address is well known, and which is asked for all the other brokers it is aware of. The integration needs to be able to contact this broker at the address provided in the bootstrap_broker_host parameter for bootstrap discovery to work.
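As a rough sketch, a bootstrap-based instance in kafka-config.yml might look like the following (the cluster name, hostnames, and ports are placeholders; check the parameter names against your version's sample file):

```yaml
integrations:
  - name: nri-kafka
    env:
      CLUSTER_NAME: my-cluster                # placeholder label for your cluster
      AUTODISCOVER_STRATEGY: bootstrap
      BOOTSTRAP_BROKER_HOST: kafka-broker-1   # the well-known bootstrap broker
      BOOTSTRAP_BROKER_KAFKA_PORT: 9092
      BOOTSTRAP_BROKER_JMX_PORT: 9999
      METRICS: "true"
    interval: 30s
```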
Zookeeper
Alternatively, the Kafka integration can also talk to a Zookeeper server in order to obtain the list of brokers. To do this, the integration needs to be provided with the following:
The list of Zookeeper hosts, zookeeper_hosts, to contact.
The proper authentication secrets to connect with the hosts.
Together with the list of brokers it knows about, Zookeeper will also advertise which connection mechanisms are supported by each broker.
You can configure the Kafka integration to try directly with one of these mechanisms with the preferred_listener parameter. If this parameter is not provided, the integration will try to contact the brokers with all the advertised configurations until one of them succeeds.
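A comparable sketch for Zookeeper-based discovery, again with placeholder hosts and parameter names taken from the sample configuration file:

```yaml
integrations:
  - name: nri-kafka
    env:
      CLUSTER_NAME: my-cluster
      AUTODISCOVER_STRATEGY: zookeeper
      ZOOKEEPER_HOSTS: '[{"host": "zookeeper-1", "port": 2181}]'
      # Only needed if Zookeeper authentication is enabled:
      # ZOOKEEPER_AUTH_SCHEME: digest
      # ZOOKEEPER_AUTH_SECRET: username:password
      PREFERRED_LISTENER: PLAINTEXT   # optional; omit to try all advertised listeners
      METRICS: "true"
    interval: 30s
```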
Tip
The integration will use Zookeeper only for discovering brokers and will not retrieve metrics from it.
To correctly list the topics processed by the brokers, the integration needs to contact the brokers over the Kafka protocol. Depending on how the brokers are configured, this might require setting up SSL and/or SASL to match the broker configuration. The integration needs DESCRIBE access to the topics.
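If your cluster enforces ACLs, the principal the integration authenticates as needs DESCRIBE on the topics. As an illustrative example using Kafka's own CLI (the principal name and broker address are hypothetical):

```bash
# Grant DESCRIBE on all topics to the integration's principal.
bin/kafka-acls.sh --bootstrap-server kafka-broker-1:9092 \
  --add --allow-principal User:nri-kafka \
  --operation Describe --topic '*'
```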
The Kafka integration queries JMX, a standard Java extension for exchanging metrics in Java applications. JMX is not enabled by default in Kafka brokers, and you need to enable it for metrics collection to work properly. JMX requires RMI to be enabled, and the RMI port needs to be set to the same port as JMX.
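For example, one common way to enable JMX on a broker is through the KAFKA_JMX_OPTS environment variable before starting it. This is a minimal, unauthenticated sketch suitable only for a trusted network (see the Important note below); the port is arbitrary, and the JMX and RMI ports are set to the same value:

```bash
# Enable JMX on a Kafka broker before starting it.
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9999 \
  -Dcom.sun.management.jmxremote.rmi.port=9999 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Djava.rmi.server.hostname=kafka-broker-1"   # hostname is a placeholder
bin/kafka-server-start.sh config/server.properties
```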
You can configure JMX to use username/password authentication, as well as SSL. If such features have been enabled in the broker's JMX settings, you need to configure the integration accordingly.
If autodiscovery is set to bootstrap, the JMX settings defined for the bootstrap broker are applied to all other discovered brokers, so the JMX port and other settings should be the same on all brokers.
Important
We don't recommend enabling anonymous and/or unencrypted JMX/RMI access on public or untrusted network segments, because this poses a serious security risk.
The offsets and lag of the consumers and consumer groups of the topics can be retrieved as a KafkaOffsetSample by setting the CONSUMER_OFFSET=true flag. This flag should be set in a separate instance, because when it is activated the instance will not collect other samples.
Producers and consumers written in Java can also be monitored to get more specific metadata through the same mechanism (JMX). This generates KafkaConsumerSamples and KafkaProducerSamples. JMX needs to be enabled and configured on those applications, where it is not enabled by default.
Non-Java producers and consumers do not support JMX and are therefore not supported by the Kafka integration.
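To monitor Java producers and consumers, point the integration at the JMX endpoints of those applications. As a sketch (the JSON field names follow the sample configuration file; the names, hosts, and ports are placeholders):

```yaml
integrations:
  - name: nri-kafka
    env:
      CLUSTER_NAME: my-cluster
      # JMX endpoints of the Java applications to monitor:
      PRODUCERS: '[{"name": "my-producer", "host": "producer-host", "port": 9989}]'
      CONSUMERS: '[{"name": "my-consumer", "host": "consumer-host", "port": 9987}]'
      METRICS: "true"
```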
Install and activate the integration
To install the Kafka integration, follow the instructions for your environment:
Linux installation
Follow the instructions for installing an integration, and replace the INTEGRATION_FILE_NAME variable with nri-kafka.
Change the directory to the integrations configuration folder by running:
```bash
cd /etc/newrelic-infra/integrations.d
```
Copy the sample configuration file by running:
```bash
sudo cp kafka-config.yml.sample kafka-config.yml
```
Edit the kafka-config.yml configuration file with your favorite editor. Check out some configuration file examples.
If installed on-host, edit the config in the integration's YAML config file, kafka-config.yml. An integration's YAML-format configuration is where you can place required login credentials and configure how data is collected. Which options you change depends on your setup and preferences. The configuration file has common settings applicable to all integrations, such as interval, timeout, and inventory_source. To read all about these common settings, refer to our Configuration Format document.
Important
If you are still using our Legacy configuration and definition files, refer to this document for help.
As with other integrations, a single kafka-config.yml configuration file can hold many instances of the integration, each collecting metrics from different brokers, consumers, and producers. You can see configuration examples with one or multiple instances in the kafka-config.yml sample files.
Specific settings related to Kafka are defined using the env section of each instance in the kafka-config.yml configuration file. These settings control the connection to your brokers, Zookeeper, and JMX, as well as other security settings and features. The list of valid settings is described in Kafka's configuration settings.
The integration has two modes of operation on each instance, which are mutually exclusive, that you can set up with the CONSUMER_OFFSET parameter:
Core collection (default): set CONSUMER_OFFSET = false to collect metrics from brokers, topics, producers, and consumers.
Consumer offset collection: set CONSUMER_OFFSET = true to collect KafkaOffsetSample.
These modes are mutually exclusive because consumer offset collection takes a long time to run and has high performance requirements. To collect both groups of samples, set up two instances, one with each mode, as in the sketch below.
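A rough sketch of such a two-instance setup (the discovery settings, cluster name, and the CONSUMER_GROUP_REGEX filter from the sample file are placeholders to adapt to your environment):

```yaml
integrations:
  # Instance 1: core collection (brokers, topics, producers, consumers)
  - name: nri-kafka
    env:
      CLUSTER_NAME: my-cluster
      AUTODISCOVER_STRATEGY: bootstrap
      BOOTSTRAP_BROKER_HOST: kafka-broker-1
      CONSUMER_OFFSET: "false"
      METRICS: "true"
    interval: 30s
  # Instance 2: consumer offset collection only (KafkaOffsetSample)
  - name: nri-kafka
    env:
      CLUSTER_NAME: my-cluster
      AUTODISCOVER_STRATEGY: bootstrap
      BOOTSTRAP_BROKER_HOST: kafka-broker-1
      CONSUMER_OFFSET: "true"
      CONSUMER_GROUP_REGEX: '.*'   # limits which consumer groups are collected
    interval: 30s
```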
The values for these settings can be defined in several ways:
Adding the value directly in the config file. This is the most common way.
Using secrets management. Use this to protect sensitive information, such as passwords that would be exposed in plain text on the configuration file. For more information, see secrets management.
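For example, with the agent's secrets management you can declare a variable and reference it from the env section instead of hard-coding a password. The Vault URL, token, and key names below are hypothetical:

```yaml
variables:
  jmx_creds:
    vault:
      http:
        url: http://my-vault-host:8200/v1/secret/data/kafka-jmx   # hypothetical
        headers:
          X-Vault-Token: my_vault_token                           # hypothetical
integrations:
  - name: nri-kafka
    env:
      DEFAULT_JMX_USER: ${jmx_creds.user}
      DEFAULT_JMX_PASSWORD: ${jmx_creds.password}
```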
Offset monitoring
When setting CONSUMER_OFFSET = true, by default, only the metrics from consumer groups with active consumers (and consumer metrics) will be collected.
To also collect the metrics from consumer groups with inactive consumers, you must set INACTIVE_CONSUMER_GROUP_OFFSET to true.
When a consumer group is monitoring more than one topic, it's valuable to have consumer group metrics separated by topic, especially if one of the topics has inactive consumers, because then it's possible to spot in which topic the consumer group is lagging and whether there are active consumers for that consumer group and topic.
To get consumer group metrics separated by topic, you must set CONSUMER_GROUP_OFFSET_BY_TOPIC to true (it defaults to false), as in the snippet below.
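Putting these flags together, the env section of an offset-monitoring instance might include:

```yaml
      CONSUMER_OFFSET: "true"
      # Also report consumer groups that currently have no active consumers:
      INACTIVE_CONSUMER_GROUP_OFFSET: "true"
      # Break consumer group metrics down per topic (defaults to false):
      CONSUMER_GROUP_OFFSET_BY_TOPIC: "true"
```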
The Kafka integration collects the following metrics. Each metric name is prefixed with a category indicator and a period, such as broker. or consumer..
Broker metrics:

| Metric | Description |
| --- | --- |
| broker.bytesWrittenToTopicPerSecond | Number of bytes written to a topic by the broker per second. |
| broker.IOInPerSecond | Network IO into brokers in the cluster in bytes per second. |
| broker.IOOutPerSecond | Network IO out of brokers in the cluster in bytes per second. |
| broker.logFlushPerSecond | Log flush rate. |
| broker.messagesInPerSecond | Incoming messages per second. |
| follower.requestExpirationPerSecond | Rate of request expiration on followers in evictions per second. |
| net.bytesRejectedPerSecond | Rejected bytes per second. |
| replication.isrExpandsPerSecond | Rate of replicas joining the ISR pool. |
| replication.isrShrinksPerSecond | Rate of replicas leaving the ISR pool. |
| replication.leaderElectionPerSecond | Leader election rate. |
| replication.uncleanLeaderElectionPerSecond | Unclean leader election rate. |
| replication.unreplicatedPartitions | Number of unreplicated partitions. |
| request.avgTimeFetch | Average time per fetch request in milliseconds. |
| request.avgTimeMetadata | Average time for metadata request in milliseconds. |
| request.avgTimeMetadata99Percentile | Time for metadata requests for 99th percentile in milliseconds. |
| request.avgTimeOffset | Average time for an offset request in milliseconds. |
| request.avgTimeOffset99Percentile | Time for offset requests for 99th percentile in milliseconds. |
| request.avgTimeProduceRequest | Average time for a produce request in milliseconds. |
| request.avgTimeUpdateMetadata | Average time for a request to update metadata in milliseconds. |
| request.avgTimeUpdateMetadata99Percentile | Time for update metadata requests for 99th percentile in milliseconds. |
| request.clientFetchesFailedPerSecond | Client fetch request failures per second. |
| request.fetchTime99Percentile | Time for fetch requests for 99th percentile in milliseconds. |
| request.handlerIdle | Average fraction of time the request handler threads are idle. |
| request.produceRequestsFailedPerSecond | Failed produce requests per second. |
| request.produceTime99Percentile | Time for produce requests for 99th percentile. |
| topic.diskSize | Topic disk size per broker and per topic. Only present if COLLECT_TOPIC_SIZE is enabled. |
| topic.offset | Topic offset per broker and per topic. Only present if COLLECT_TOPIC_OFFSET is enabled. |
Consumer metrics:

| Metric | Description |
| --- | --- |
| consumer.avgFetchSizeInBytes | Average number of bytes fetched per request for a specific topic. |
| consumer.avgRecordConsumedPerTopic | Average number of records in each request for a specific topic. |
| consumer.avgRecordConsumedPerTopicPerSecond | Average number of records consumed per second for a specific topic in records per second. |
| consumer.bytesInPerSecond | Consumer bytes per second. |
| consumer.fetchPerSecond | The minimum rate at which the consumer sends fetch requests to a broker in requests per second. |
| consumer.maxFetchSizeInBytes | Maximum number of bytes fetched per request for a specific topic. |
| consumer.maxLag | Maximum consumer lag. |
| consumer.messageConsumptionPerSecond | Rate of consumer message consumption in messages per second. |
| consumer.offsetKafkaCommitsPerSecond | Rate of offset commits to Kafka in commits per second. |
| consumer.offsetZooKeeperCommitsPerSecond | Rate of offset commits to ZooKeeper in writes per second. |
| consumer.requestsExpiredPerSecond | Rate of delayed consumer request expiration in evictions per second. |
Producer metrics:

| Metric | Description |
| --- | --- |
| producer.ageMetadataUsedInMilliseconds | Age in seconds of the current producer metadata being used. |
| producer.availableBufferInBytes | Total amount of buffer memory that is not being used in bytes. |
| producer.avgBytesSentPerRequestInBytes | Average number of bytes sent per partition per-request. |
| producer.avgCompressionRateRecordBatches | Average compression rate of record batches. |
| producer.avgRecordAccumulatorsInMilliseconds | Average time in ms record batches spent in the record accumulator. |
| producer.avgRecordSizeInBytes | Average record size in bytes. |
| producer.avgRecordsSentPerSecond | Average number of records sent per second. |
| producer.avgRecordsSentPerTopicPerSecond | Average number of records sent per second for a topic. |
| producer.AvgRequestLatencyPerSecond | Producer average request latency. |
| producer.avgThrottleTime | Average time that a request was throttled by a broker in milliseconds. |
| producer.bufferMemoryAvailableInBytes | Maximum amount of buffer memory the client can use in bytes. |
| producer.bufferpoolWaitTime | Fraction of time an appender waits for space allocation. |
| producer.bytesOutPerSecond | Producer bytes per second out. |
| producer.compressionRateRecordBatches | Average compression rate of record batches for a topic. |
| producer.iOWaitTime | Producer I/O wait time in milliseconds. |
| producer.maxBytesSentPerRequestInBytes | Max number of bytes sent per partition per-request. |
| producer.maxRecordSizeInBytes | Maximum record size in bytes. |
| producer.maxRequestLatencyInMilliseconds | Maximum request latency in milliseconds. |
| producer.maxThrottleTime | Maximum time a request was throttled by a broker in milliseconds. |
| producer.messageRatePerSecond | Producer messages per second. |
| producer.responsePerSecond | Number of producer responses per second. |
| producer.requestPerSecond | Number of producer requests per second. |
| producer.requestsWaitingResponse | Current number of in-flight requests awaiting a response. |
| producer.threadsWaiting | Number of user threads blocked waiting for buffer memory to enqueue their records. |
Topic metrics:

| Metric | Description |
| --- | --- |
| topic.partitionsWithNonPreferredLeader | Number of partitions per topic that are not being led by their preferred replica. |
| topic.respondMetaData | Number of topics responding to meta data requests. |
| topic.retentionSizeOrTime | Whether a partition is retained by size or both size and time. A value of 0 = time and a value of 1 = both size and time. |
| topic.underReplicatedPartitions | Number of partitions per topic that are under-replicated. |
Consumer offset metrics (KafkaOffsetSample):

| Metric | Description |
| --- | --- |
| consumer.offset | The last consumed offset on a partition by the consumer group. |
| consumer.lag | The difference between a broker's high water mark and the consumer's offset (consumer.hwm - consumer.offset). |
| consumer.hwm | The offset of the last message written to a partition (high water mark). |
| consumer.totalLag | The sum of lags across partitions consumed by a consumer. |
| consumerGroup.totalLag | The sum of lags across all partitions consumed by a consumerGroup. |
| consumerGroup.maxLag | The maximum lag across all partitions consumed by a consumerGroup. |
| consumerGroup.activeConsumers | The number of active consumers in this consumerGroup. |
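Once data is flowing, you can chart these metrics with NRQL. A couple of illustrative queries follow; the broker event type and the FACET attribute names are assumptions to verify against your account's data:

```sql
-- Incoming messages per second, per broker entity:
SELECT average(broker.messagesInPerSecond) FROM KafkaBrokerSample FACET entityName TIMESERIES

-- Total lag per consumer group and topic (requires CONSUMER_OFFSET = true):
SELECT latest(consumerGroup.totalLag) FROM KafkaOffsetSample FACET consumerGroup, topic
```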