Kafka monitoring integration

The New Relic Kafka on-host integration reports metrics and configuration data from your Kafka service. We instrument all the key elements of your cluster, including brokers (both ZooKeeper and Bootstrap), producers, consumers, and topics.

Read on to install the Kafka integration, and to see what data it collects. To monitor Kafka with our Java agent, see Instrument Kafka message queues.

Compatibility and requirements

Our integration is compatible with Kafka versions 0.8 or higher.

Before installing the integration, make sure that you meet the following requirements:

  • If Kafka is not running on Kubernetes or Amazon ECS, you must install the infrastructure agent on a host that's running Kafka. Otherwise:
  • Java 8 or higher
  • JMX enabled on all brokers
  • Java-based consumers and producers only, and with JMX enabled
  • Total number of monitored topics must be fewer than 300

For Kafka running on Kubernetes, see the Kubernetes requirements.

Prepare for the installation

Kafka is a complex piece of software that is built as a distributed system. For this reason, you’ll need to ensure that the integration can contact all the required hosts and services so the data is collected correctly.

Autodiscovery

Given the distributed nature of Kafka, the actual number and list of brokers is usually not fixed by the configuration, and it is instead quite dynamic. For this reason, the Kafka integration offers two mechanisms to perform automatic discovery of the list of brokers in the cluster: Bootstrap and Zookeeper. The mechanism you use depends on the setup of the Kafka cluster being monitored.

Bootstrap

With the bootstrap mechanism, the integration uses a bootstrap broker to perform the autodiscovery. This is a broker whose address is well known and that is asked for any other brokers it is aware of. For bootstrap discovery to work, the integration must be able to contact this broker at the address provided in the bootstrap_broker_host parameter.

Zookeeper

Alternatively, the Kafka integration can talk to a Zookeeper server to obtain the list of brokers. To do this, the integration needs to be provided with the following:

  • The list of Zookeeper hosts to contact (zookeeper_hosts).
  • The proper authentication secrets to connect with the hosts.

Together with the list of brokers it knows about, Zookeeper will also advertise which connection mechanisms are supported by each broker.

You can configure the Kafka integration to try directly with one of these mechanisms with the preferred_listener parameter. If this parameter is not provided, the integration will try to contact the brokers with all the advertised configurations until one of them succeeds.
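The listener fallback described above can be sketched roughly as follows. This is an illustration only, not the integration's actual code: choose_listener and the can_connect callback are hypothetical names, and the sketch assumes that no other listener is tried when preferred_listener is set.

```python
from typing import Callable, Iterable, Optional


def choose_listener(
    advertised: Iterable[str],
    can_connect: Callable[[str], bool],
    preferred: Optional[str] = None,
) -> Optional[str]:
    """Return the listener to use for a broker, or None if none works.

    `advertised` stands for the mechanisms Zookeeper reports for a broker,
    and `can_connect` for a test connection (both hypothetical).
    """
    listeners = list(advertised)
    if preferred is not None:
        # Assumption: with preferred_listener set, only that mechanism is tried.
        if preferred in listeners and can_connect(preferred):
            return preferred
        return None
    # Without a preference, try each advertised mechanism until one succeeds.
    for listener in listeners:
        if can_connect(listener):
            return listener
    return None
```

For example, choose_listener(["PLAINTEXT", "SSL"], probe) returns the first of the two mechanisms for which the hypothetical probe function reports a successful connection.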

Tip

The integration will use Zookeeper only for discovering brokers and will not retrieve metrics from it.

Topic listing

To correctly list the topics processed by the brokers, the integration needs to contact brokers over the Kafka protocol. Depending on how the brokers are configured, this might require setting up SSL and/or SASL to match the broker configuration.

Broker monitoring (JMX)

The Kafka integration queries JMX, a standard Java extension for exchanging metrics in Java applications. JMX is not enabled by default in Kafka brokers, and you need to enable it for metrics collection to work properly. JMX requires RMI to be enabled, and the RMI port needs to be set to the same port as JMX.

You can configure JMX to use username/password authentication, as well as SSL. If such features have been enabled in the broker's JMX settings, you need to configure the integration accordingly.

Important

We do not recommend enabling anonymous and/or unencrypted JMX/RMI access on public or untrusted network segments because this poses a big security risk.
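As a rough illustration of the JMX settings discussed in this section, the sketch below assembles the standard JVM system properties for remote JMX, with the RMI port pinned to the same value as the JMX port. The helper name kafka_jmx_opts and the example hostname are hypothetical. Kafka's start scripts typically pass the KAFKA_JMX_OPTS environment variable to the JVM, but check the scripts shipped with your Kafka version before relying on that.

```python
def kafka_jmx_opts(port: int, hostname: str,
                   authenticate: bool = True, ssl: bool = True) -> str:
    """Build a string of JVM flags that enable remote JMX on one port.

    Authentication and SSL need further JVM settings (password file,
    keystore) that are not shown in this sketch.
    """
    flags = {
        "com.sun.management.jmxremote": "true",
        "com.sun.management.jmxremote.port": port,
        # RMI port set to the same value as the JMX port.
        "com.sun.management.jmxremote.rmi.port": port,
        "com.sun.management.jmxremote.authenticate": str(authenticate).lower(),
        "com.sun.management.jmxremote.ssl": str(ssl).lower(),
        # Address that remote JMX clients are told to connect back to.
        "java.rmi.server.hostname": hostname,
    }
    return " ".join(f"-D{key}={value}" for key, value in flags.items())


if __name__ == "__main__":
    # Hypothetical broker address and port, for illustration only.
    print(kafka_jmx_opts(9999, "broker1.example.com"))
```

The defaults keep authentication and SSL switched on, in line with the warning above about anonymous or unencrypted access.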

Producer/consumer monitoring (JMX)

Producers and consumers written in Java can also be monitored through the same mechanism (JMX). JMX needs to be enabled and configured on those applications where it is not enabled by default.

Non-Java producers and consumers do not support JMX and are therefore not supported by the Kafka integration.

Connectivity requirements

As a summary, the integration needs to be configured and allowed to connect to:

  • Hosts listed in zookeeper_hosts over the Zookeeper protocol, using the Zookeeper authentication mechanism (if autodiscover_strategy is set to zookeeper).
  • Hosts defined in bootstrap_broker_host over the Kafka protocol, using the Kafka broker’s authentication/transport mechanisms (if autodiscover_strategy is set to bootstrap).
  • All brokers in the cluster over the Kafka protocol and port, using the Kafka brokers' authentication/transport mechanisms.
  • All brokers in the cluster over the JMX protocol and port, using the authentication/transport mechanisms specified in the JMX configuration of the brokers.
  • All producers/consumers specified in producers and consumers over the JMX protocol and port, if you want producer/consumer monitoring. JMX settings for the consumer must be the same as for the brokers.
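Before configuring the integration, it can help to confirm that the host running it can reach each of these endpoints. The following is a minimal sketch using plain TCP connections; check_reachable is a hypothetical helper, and any hosts and ports you pass to it are your own. A successful TCP connection only shows that the port is open. It does not validate the Zookeeper, Kafka, or JMX authentication and transport settings.

```python
import socket
from typing import Dict, Iterable, Tuple

Endpoint = Tuple[str, int]


def check_reachable(endpoints: Iterable[Endpoint],
                    timeout: float = 3.0) -> Dict[Endpoint, bool]:
    """Attempt a TCP connection to each (host, port) pair.

    Returns a mapping from endpoint to True (connected) or False
    (refused, timed out, or unresolvable).
    """
    results: Dict[Endpoint, bool] = {}
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            results[(host, port)] = False
    return results
```

A typical check would pass the Zookeeper or bootstrap broker address, plus the Kafka and JMX ports of every broker, producer, and consumer you plan to monitor.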

Important

For the cloud: Security Groups in AWS (and their equivalents in other cloud providers) do not have the required ports open by default. JMX requires two ports in order to work: the JMX port and the RMI port. These can be set to the same value when configuring the JVM to enable JMX, and they must be open for the integration to connect to brokers and collect metrics from them.

Install and activate

To install the Kafka integration, choose your setup:

Additional notes:

Configure the integration

An integration's YAML-format configuration is where you can place required login credentials and configure how data is collected. Which options you change depends on your setup and preference. The entire environment can be monitored remotely or on any node in that environment.

There are several ways to configure the integration, depending on how it was installed:

For examples of typical configurations, see the example configurations.

Important

With secrets management, you can configure on-host integrations with New Relic infrastructure's agent to use sensitive data (such as passwords) without having to write them as plain text into the integration's configuration file. For more information, see Secrets management.

Commands

The configuration accepts the following commands:

  • inventory: collects configuration status
  • metrics: collects performance metrics
  • consumer_offset: collects consumer group offset data

Arguments

The configuration accepts the following arguments:

General arguments:

  • cluster_name: user-defined name to uniquely identify the cluster being monitored. Required.
  • kafka_version: the version of the Kafka broker you're connecting to, used for setting optimum API versions. Defaults to 1.0.0. Versions older than 1.0.0 may be missing some features.
  • autodiscover_strategy: the method of discovering brokers. Options are zookeeper or bootstrap. Defaults to zookeeper.

Zookeeper autodiscovery arguments (only relevant when autodiscover_strategy is zookeeper):

  • zookeeper_hosts: the list of Apache ZooKeeper hosts (in JSON format) that need to be connected.
  • zookeeper_auth_scheme: the ZooKeeper authentication scheme that is used to connect. Currently, the only supported value is digest. If omitted, no authentication is used.
  • zookeeper_auth_secret: the ZooKeeper authentication secret that is used to connect. Should be of the form username:password. Only required if zookeeper_auth_scheme is specified.
  • zookeeper_path: the Zookeeper node under which the Kafka configuration resides. Defaults to /.
  • preferred_listener: use a specific listener to connect to a broker. If unset, the first listener that passes a successful test connection is used. Supported values are PLAINTEXT, SASL_PLAINTEXT, SSL, and SASL_SSL. Note: The SASL_* protocols only support Kerberos (GSSAPI) authentication.
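Because zookeeper_hosts is a JSON list and zookeeper_auth_secret has the form username:password, these values can be generated rather than typed by hand. The sketch below is one way to do that. The helper name zookeeper_arguments is hypothetical, and the host and port key names inside the JSON objects are an assumption that should be checked against the kafka-config.yml.sample file.

```python
import json
from typing import Dict, Iterable, Optional, Tuple


def zookeeper_arguments(
    hosts: Iterable[Tuple[str, int]],
    username: Optional[str] = None,
    password: Optional[str] = None,
    path: str = "/",
) -> Dict[str, str]:
    """Build the Zookeeper autodiscovery arguments as plain strings."""
    args = {
        "autodiscover_strategy": "zookeeper",
        # Assumed JSON shape: a list of objects with "host" and "port" keys.
        "zookeeper_hosts": json.dumps(
            [{"host": host, "port": port} for host, port in hosts]
        ),
        "zookeeper_path": path,
    }
    if username is not None and password is not None:
        # digest is the only authentication scheme listed as supported.
        args["zookeeper_auth_scheme"] = "digest"
        args["zookeeper_auth_secret"] = f"{username}:{password}"
    return args
```

When no credentials are given, the two authentication arguments are left out, which matches the documented behavior of using no authentication when zookeeper_auth_scheme is omitted.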

Bootstrap broker discovery arguments (only relevant when autodiscover_strategy is bootstrap):

  • bootstrap_broker_host: the host for the bootstrap broker.
  • bootstrap_broker_kafka_port: the Kafka port for the bootstrap broker.
  • bootstrap_broker_kafka_protocol: the protocol to use to connect to the bootstrap broker. Supported values are PLAINTEXT, SASL_PLAINTEXT, SSL, and SASL_SSL. Note: The SASL_* protocols only support Kerberos (GSSAPI) authentication. Default: PLAINTEXT.
  • bootstrap_broker_jmx_port: the JMX port to use for collection.
  • bootstrap_broker_jmx_user: the JMX user to use for collection.
  • bootstrap_broker_jmx_password: the JMX password to use for collection.

Producer and consumer collection:

  • producers: producers to collect. For each producer a name, hostname, port, username, and password can be specified in JSON form. name is the producer’s name as it appears in Kafka. hostname, port, username, and password are optional and use the default if unspecified.
  • consumers: consumers to collect. For each consumer a name, hostname, port, username, and password can be specified in JSON form. name is the consumer’s name as it appears in Kafka. hostname, port, username, and password are optional and use the default if unspecified.

JMX connection options:

  • default_jmx_host: the default host to collect JMX metrics. If the host field is omitted from a producer or consumer configuration, this value will be used.
  • default_jmx_port: the default port to collect JMX metrics. If the port field is omitted from a producer or consumer configuration, this value will be used.
  • default_jmx_user: the default user that is connecting to the JMX host to collect metrics. This field should only be used if all brokers have a non-default username. If the username field is omitted from a producer or consumer configuration, this value will be used.
  • default_jmx_password: the default password to connect to the JMX host. This field should only be used if all brokers have a non-default password. If the password field is omitted from a producer or consumer configuration, this value will be used.
  • key_store: the filepath of the keystore containing the JMX client's SSL certificate.
  • key_store_password: the password for the JMX SSL key store.
  • trust_store: the filepath of the trust keystore containing the JMX server's SSL certificate.
  • trust_store_password: the password for the JMX trust store.
  • timeout: the timeout for individual JMX queries in milliseconds. Default: 10000.
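The way the default_jmx_* arguments fill in omitted producer or consumer fields can be pictured with the sketch below. It is an illustration of the documented fallback, not the integration's code. resolve_jmx_target is a hypothetical name, and the entry field names (hostname, port, username, password) follow the argument descriptions above, so the exact JSON keys remain an assumption.

```python
from typing import Any, Dict

# Entry field -> argument that supplies its default value.
FALLBACKS = {
    "hostname": "default_jmx_host",
    "port": "default_jmx_port",
    "username": "default_jmx_user",
    "password": "default_jmx_password",
}


def resolve_jmx_target(entry: Dict[str, Any],
                       arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Fill fields omitted from a producer/consumer entry with the defaults."""
    resolved = {"name": entry["name"]}  # name is always taken from the entry
    for field, default_key in FALLBACKS.items():
        # Use the entry's own value when present, else the default_jmx_* value.
        resolved[field] = entry.get(field, arguments.get(default_key))
    return resolved
```

For instance, an entry that only sets name and port would take its host, user, and password from default_jmx_host, default_jmx_user, and default_jmx_password.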

Broker connection options:

  • tls_ca_file: the certificate authority file for SSL and SASL_SSL listeners, in PEM format.
  • tls_cert_file: the client certificate file for SSL and SASL_SSL listeners, in PEM format.
  • tls_key_file: the client key file for SSL and SASL_SSL listeners, in PEM format.
  • tls_insecure_skip_verify: skip verifying the server's certificate chain and host name.
  • sasl_mechanism: the type of SASL authentication to use. Supported options are SCRAM-SHA-512, SCRAM-SHA-256, PLAIN, and GSSAPI.
  • sasl_gssapi_realm: Kerberos realm. Required for SASL_SSL or SASL_PLAINTEXT.
  • sasl_gssapi_service_name: Kerberos service name. Required for SASL_SSL or SASL_PLAINTEXT.
  • sasl_gssapi_username: Kerberos username. Required for SASL_SSL or SASL_PLAINTEXT.
  • sasl_gssapi_key_tab_path: path to the Kerberos keytab. Required for SASL_SSL or SASL_PLAINTEXT.
  • sasl_gssapi_kerberos_config_path: path to the kerberos config file. Default: /etc/krb5.conf
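A small pre-flight check can catch missing Kerberos arguments before the integration runs. The sketch below takes the "Required for SASL_SSL or SASL_PLAINTEXT" notes above at face value. missing_kerberos_arguments is a hypothetical helper, and whether the same arguments are needed for non-GSSAPI values of sasl_mechanism is not covered here.

```python
from typing import Dict, List

# Arguments marked as required for SASL_SSL or SASL_PLAINTEXT.
GSSAPI_REQUIRED = (
    "sasl_gssapi_realm",
    "sasl_gssapi_service_name",
    "sasl_gssapi_username",
    "sasl_gssapi_key_tab_path",
)


def missing_kerberos_arguments(protocol: str,
                               arguments: Dict[str, str]) -> List[str]:
    """Return the required Kerberos arguments that are unset or empty.

    `protocol` is the listener protocol in use, for example the value of
    preferred_listener or bootstrap_broker_kafka_protocol.
    """
    if not protocol.startswith("SASL_"):
        return []  # PLAINTEXT and SSL listeners need no Kerberos settings
    return [name for name in GSSAPI_REQUIRED if not arguments.get(name)]
```

An empty result means that every argument listed as required is present for the chosen protocol.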

Collection filtering:

  • collect_broker_topic_data: signals if broker and topic metrics are collected. Options are true or false, defaults to true. Should only be set to false when monitoring only producers and consumers, and topic_mode is set to all.
  • local_only_collection: collect only the metrics related to the configured bootstrap broker. Only used if autodiscover_strategy is bootstrap. Default: false
  • consumer_group_regex: regex pattern that matches the consumer groups to collect offset statistics for. This is limited to collecting statistics for 300 consumer groups. Note: consumer_groups has been deprecated, use this argument instead.
  • topic_mode: determines how many topics we collect. Options are all, none, list, or regex.
  • collect_topic_size: collect the metric Topic size. Options are true or false, defaults to false. topic_size is a resource-intensive metric to collect.
  • topic_list: array of topic names to monitor. Only in effect if topic_mode is set to list.
  • topic_regex: regex pattern that matches the topic names to monitor. Only in effect if topic_mode is set to regex.
  • topic_bucket: used to split topic collection across multiple instances. Should be of the form <bucket number>/<number of buckets>. Default: 1/1.
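The interaction between topic_mode, topic_list, topic_regex, and topic_bucket can be sketched as follows. This is an illustration under stated assumptions rather than the integration's implementation: the function names are hypothetical, the regex is applied with a search rather than a full match, and the bucket assignment uses a CRC32 hash chosen only because it is stable across runs. The integration's real hashing scheme may differ.

```python
import re
import zlib
from typing import Iterable, List, Optional, Tuple


def parse_topic_bucket(value: str = "1/1") -> Tuple[int, int]:
    """Parse '<bucket number>/<number of buckets>' into two integers."""
    number, _, count = value.partition("/")
    bucket, buckets = int(number), int(count)
    if not 1 <= bucket <= buckets:
        raise ValueError(f"invalid topic_bucket: {value!r}")
    return bucket, buckets


def select_topics(
    topics: Iterable[str],
    topic_mode: str,
    topic_list: Iterable[str] = (),
    topic_regex: Optional[str] = None,
    topic_bucket: str = "1/1",
) -> List[str]:
    """Return the topics one instance would monitor."""
    if topic_mode == "none":
        return []
    if topic_mode == "all":
        chosen = list(topics)
    elif topic_mode == "list":
        wanted = set(topic_list)
        chosen = [t for t in topics if t in wanted]
    elif topic_mode == "regex":
        if topic_regex is None:
            raise ValueError("topic_regex is required when topic_mode is regex")
        pattern = re.compile(topic_regex)
        chosen = [t for t in topics if pattern.search(t)]
    else:
        raise ValueError(f"unknown topic_mode: {topic_mode!r}")
    bucket, buckets = parse_topic_bucket(topic_bucket)
    # Assumption: each topic belongs to exactly one bucket, chosen by a
    # stable hash of its name.
    return [t for t in chosen
            if zlib.crc32(t.encode("utf-8")) % buckets == bucket - 1]
```

With two instances configured as 1/2 and 2/2, each topic lands in exactly one of the two result lists, so the load is split without overlap.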

Labels

Labels are optional tags which help to identify collection data. Some examples are included below.

  • env: label to identify the environment. For example: production.
  • role: label to identify which role is accessing the data.

Example configuration

Tip

For more details on configuration parameters, see the kafka-config.yml.sample config file on GitHub.

For more about the general structure of on-host integration configuration, see Configuration.

Find and use data

Data from this service is reported to an integration dashboard.

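Kafka data is attached to the following event types:

  • KafkaBrokerSample
  • KafkaConsumerSample
  • KafkaProducerSample
  • KafkaTopicSample
  • KafkaOffsetSample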

You can query this data for troubleshooting purposes or to create charts and dashboards.

For more on how to find and use your data, see Understand integration data.

Metric data

The Kafka integration collects the following metric data attributes. Each metric name is prefixed with a category indicator and a period, such as broker. or consumer..

KafkaBrokerSample event

  • broker.bytesWrittenToTopicPerSecond: Number of bytes written to a topic by the broker per second.
  • broker.IOInPerSecond: Network IO into brokers in the cluster in bytes per second.
  • broker.IOOutPerSecond: Network IO out of brokers in the cluster in bytes per second.
  • broker.logFlushPerSecond: Log flush rate.
  • broker.messagesInPerSecond: Incoming messages per second.
  • follower.requestExpirationPerSecond: Rate of request expiration on followers in evictions per second.
  • net.bytesRejectedPerSecond: Rejected bytes per second.
  • replication.isrExpandsPerSecond: Rate of replicas joining the ISR pool.
  • replication.isrShrinksPerSecond: Rate of replicas leaving the ISR pool.
  • replication.leaderElectionPerSecond: Leader election rate.
  • replication.uncleanLeaderElectionPerSecond: Unclean leader election rate.
  • replication.unreplicatedPartitions: Number of unreplicated partitions.
  • request.avgTimeFetch: Average time per fetch request in milliseconds.
  • request.avgTimeMetadata: Average time for metadata request in milliseconds.
  • request.avgTimeMetadata99Percentile: Time for metadata requests for 99th percentile in milliseconds.
  • request.avgTimeOffset: Average time for an offset request in milliseconds.
  • request.avgTimeOffset99Percentile: Time for offset requests for 99th percentile in milliseconds.
  • request.avgTimeProduceRequest: Average time for a produce request in milliseconds.
  • request.avgTimeUpdateMetadata: Average time for a request to update metadata in milliseconds.
  • request.avgTimeUpdateMetadata99Percentile: Time for update metadata requests for 99th percentile in milliseconds.
  • request.clientFetchesFailedPerSecond: Client fetch request failures per second.
  • request.fetchTime99Percentile: Time for fetch requests for 99th percentile in milliseconds.
  • request.handlerIdle: Average fraction of time the request handler threads are idle.
  • request.produceRequestsFailedPerSecond: Failed produce requests per second.
  • request.produceTime99Percentile: Time for produce requests for 99th percentile.

KafkaConsumerSample event

  • consumer.avgFetchSizeInBytes: Average number of bytes fetched per request for a specific topic.
  • consumer.avgRecordConsumedPerTopic: Average number of records in each request for a specific topic.
  • consumer.avgRecordConsumedPerTopicPerSecond: Average number of records consumed per second for a specific topic in records per second.
  • consumer.bytesInPerSecond: Consumer bytes per second.
  • consumer.fetchPerSecond: The minimum rate at which the consumer sends fetch requests to a broker in requests per second.
  • consumer.maxFetchSizeInBytes: Maximum number of bytes fetched per request for a specific topic.
  • consumer.maxLag: Maximum consumer lag.
  • consumer.messageConsumptionPerSecond: Rate of consumer message consumption in messages per second.
  • consumer.offsetKafkaCommitsPerSecond: Rate of offset commits to Kafka in commits per second.
  • consumer.offsetZooKeeperCommitsPerSecond: Rate of offset commits to ZooKeeper in writes per second.
  • consumer.requestsExpiredPerSecond: Rate of delayed consumer request expiration in evictions per second.

KafkaProducerSample event

  • producer.ageMetadataUsedInMilliseconds: Age in seconds of the current producer metadata being used.
  • producer.availableBufferInBytes: Total amount of buffer memory that is not being used in bytes.
  • producer.avgBytesSentPerRequestInBytes: Average number of bytes sent per partition per request.
  • producer.avgCompressionRateRecordBatches: Average compression rate of record batches.
  • producer.avgRecordAccumulatorsInMilliseconds: Average time in ms record batches spent in the record accumulator.
  • producer.avgRecordSizeInBytes: Average record size in bytes.
  • producer.avgRecordsSentPerSecond: Average number of records sent per second.
  • producer.avgRecordsSentPerTopicPerSecond: Average number of records sent per second for a topic.
  • producer.AvgRequestLatencyPerSecond: Producer average request latency.
  • producer.avgThrottleTime: Average time that a request was throttled by a broker in milliseconds.
  • producer.bufferMemoryAvailableInBytes: Maximum amount of buffer memory the client can use in bytes.
  • producer.bufferpoolWaitTime: Fraction of time an appender waits for space allocation.
  • producer.bytesOutPerSecond: Producer bytes per second out.
  • producer.compressionRateRecordBatches: Average compression rate of record batches for a topic.
  • producer.iOWaitTime: Producer I/O wait time in milliseconds.
  • producer.maxBytesSentPerRequestInBytes: Max number of bytes sent per partition per request.
  • producer.maxRecordSizeInBytes: Maximum record size in bytes.
  • producer.maxRequestLatencyInMilliseconds: Maximum request latency in milliseconds.
  • producer.maxThrottleTime: Maximum time a request was throttled by a broker in milliseconds.
  • producer.messageRatePerSecond: Producer messages per second.
  • producer.responsePerSecond: Number of producer responses per second.
  • producer.requestPerSecond: Number of producer requests per second.
  • producer.requestsWaitingResponse: Current number of in-flight requests awaiting a response.
  • producer.threadsWaiting: Number of user threads blocked waiting for buffer memory to enqueue their records.

KafkaTopicSample event

  • topic.diskSize: Current topic disk size per broker in bytes.
  • topic.partitionsWithNonPreferredLeader: Number of partitions per topic that are not being led by their preferred replica.
  • topic.respondMetaData: Number of topics responding to meta data requests.
  • topic.retentionSizeOrTime: Whether a partition is retained by size or both size and time. A value of 0 = time and a value of 1 = both size and time.
  • topic.underReplicatedPartitions: Number of partitions per topic that are under-replicated.

KafkaOffsetSample event

  • consumer.offset: The last consumed offset on a partition by the consumer group.
  • consumer.lag: The difference between a broker's high water mark and the consumer's offset (consumer.hwm - consumer.offset).
  • consumer.hwm: The offset of the last message written to a partition (high water mark).
  • consumer.totalLag: The sum of lags across partitions consumed by a consumer.
  • consumerGroup.totalLag: The sum of lags across all partitions consumed by a consumerGroup.
  • consumerGroup.maxLag: The maximum lag across all partitions consumed by a consumerGroup.
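The relationship between the offset metrics can be worked through with a short sketch. It applies the definitions above (lag = consumer.hwm - consumer.offset, then a sum and a maximum across partitions) to made-up numbers. PartitionOffset and lag_summary are hypothetical names, and the sample values are not real data.

```python
from typing import Dict, Iterable, NamedTuple


class PartitionOffset(NamedTuple):
    topic: str
    partition: int
    hwm: int      # high water mark: offset of the last message written
    offset: int   # last offset consumed by the consumer group


def lag_summary(partitions: Iterable[PartitionOffset]) -> Dict[str, int]:
    """Compute total and maximum lag across the given partitions."""
    lags = [p.hwm - p.offset for p in partitions]
    return {
        "totalLag": sum(lags),
        "maxLag": max(lags, default=0),
    }


if __name__ == "__main__":
    sample = [
        PartitionOffset("orders", 0, hwm=100, offset=90),  # lag 10
        PartitionOffset("orders", 1, hwm=50, offset=20),   # lag 30
    ]
    print(lag_summary(sample))  # {'totalLag': 40, 'maxLag': 30}
```

Passing all partitions consumed by a group corresponds to consumerGroup.totalLag and consumerGroup.maxLag, while passing only the partitions of a single consumer corresponds to consumer.totalLag.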

Inventory data

The Kafka integration captures the non-default broker and topic configuration parameters, and collects the topic partition schemes as reported by ZooKeeper. The data is available on the Inventory UI page under the config/kafka source.

Troubleshooting

Troubleshooting tips:

Check the source code

This integration is open source software. That means you can browse its source code and send improvements or create your own fork and build it.

Copyright © 2020 New Relic Inc.