This page documents the metrics collected when monitoring Apache Kafka with OpenTelemetry. Metrics are collected via the Kafka metrics receiver, the JMX receiver, and, for client-side metrics, the OpenTelemetry Java agent.
Metric collection methods
OpenTelemetry Kafka monitoring uses two complementary receivers:
- Kafka metrics receiver: Connects to Kafka's bootstrap port to collect cluster, topic, partition, and consumer group metrics
- JMX receiver: Connects to the JMX port (typically 9999) to collect detailed broker metrics and JVM metrics
Kafka metrics receiver metrics
These metrics are collected from Kafka brokers over the Kafka protocol (bootstrap port). The list is based on the kafkametricsreceiver metadata; some metrics are disabled by default or in typical configurations.
Metric name | Description | Type |
|---|---|---|
| Total number of brokers in the cluster | Gauge (int) |
Metric name | Description | Type | Attributes |
|---|---|---|---|
| Number of partitions in topic | Sum (int) | topic |
| Minimum in-sync replicas of a topic | Gauge (int) | topic |
| Replication factor of a topic | Gauge (int) | topic |
Metric name | Description | Type | Attributes |
|---|---|---|---|
| Number of in-sync replicas for a partition | Sum (int) | topic, partition |
| Total number of in-sync replicas aggregated across all partitions for a topic | Sum (int) | topic |
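The topic and partition values above are often read together: a partition's in-sync replica count is compared against the topic's replication factor and its minimum in-sync replicas. The following is a minimal Python sketch of that comparison, assuming you have already fetched the three values; the function name and the status labels are hypothetical and not part of any receiver.

```python
def partition_health(in_sync: int, replication_factor: int, min_insync: int) -> str:
    """Classify a partition from its in-sync replica count.

    The labels are illustrative only; they are not emitted by any receiver.
    """
    if in_sync < min_insync:
        # Fewer in-sync replicas than the topic's configured minimum.
        return "below-min-isr"
    if in_sync < replication_factor:
        # Some replicas have fallen out of sync, but the minimum is still met.
        return "under-replicated"
    return "healthy"


print(partition_health(3, 3, 2))  # healthy
print(partition_health(2, 3, 2))  # under-replicated
print(partition_health(1, 3, 2))  # below-min-isr
```

Checking the minimum first means the more severe condition wins when both apply.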
Metric name | Description | Type | Attributes |
|---|---|---|---|
| Count of members in the consumer group | Sum (int) | group |
| Current offset of the consumer group at partition of topic | Gauge (int) | group, topic, partition |
| Sum of consumer group offset across partitions of topic | Gauge (int) | group, topic |
| Current approximate lag of consumer group at partition of topic | Gauge (int) | group, topic, partition |
| Current approximate sum of consumer group lag across all partitions of topic | Gauge (int) | group, topic |
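As a rough illustration of how the two lag rows relate, per-partition lag is conventionally the partition's latest offset minus the group's current offset, and the topic-level value is the sum of those differences. The sketch below uses hypothetical offsets and helper names; it is not the receiver's implementation.

```python
from typing import Dict


def partition_lag(latest_offset: int, group_offset: int) -> int:
    """Records written to the partition that the group has not consumed yet."""
    # Clamp at zero so a stale latest-offset reading never yields negative lag.
    return max(latest_offset - group_offset, 0)


def topic_lag_sum(latest: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum per-partition lag over partitions present in both mappings."""
    return sum(partition_lag(latest[p], committed[p]) for p in latest if p in committed)


# Hypothetical offsets keyed by partition number.
latest_offsets = {0: 1500, 1: 980, 2: 2040}
group_offsets = {0: 1450, 1: 980, 2: 1900}

per_partition = {p: partition_lag(latest_offsets[p], group_offsets[p]) for p in latest_offsets}
print(per_partition)                                 # {0: 50, 1: 0, 2: 140}
print(topic_lag_sum(latest_offsets, group_offsets))  # 190
```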
JMX receiver metrics
The JMX receiver collects detailed metrics from Kafka broker MBeans via JMX (typically port 9999). Metrics are collected using two configurations:
- Default Kafka target system - Built-in Kafka-specific metrics from target_system: kafka (kafka.yaml)
- Custom JMX metrics - Additional Kafka metrics and JVM metrics defined in custom configuration
Default Kafka target system metrics
These metrics are automatically collected when using target_system: kafka:
These metrics are collected from the controller broker and provide cluster-wide information:
Metric name | Description | Type |
|---|---|---|
| The number of partitions offline in the cluster | Gauge |
| The leader election count | Counter |
| Unclean leader election count - increasing indicates broker failures | Counter |
Metric name | Description | Type |
|---|---|---|
| The number of messages received by the broker | Counter |
| The number of requests received by the broker | Counter |
| The number of failed requests | Counter |
| The total time spent processing requests (ms) | Counter |
| Average request processing time (ms) | Gauge |
| 50th percentile request time (ms) | Gauge |
| 99th percentile request time (ms) | Gauge |
| Bytes received or sent by broker per second (includes direction attribute: in/out) | Counter |
| The number of requests waiting in purgatory (Produce and Fetch operations) | Gauge |
| The number of partitions on the broker | Gauge |
| The number of under-replicated partitions on this broker | Gauge |
| In-sync replica operations (shrink or expand) | Counter |
| Maximum lag between follower and leader replicas | Gauge |
| Whether this broker is the active controller (0 or 1) | Gauge |
| Log flush count | Counter |
| Log flush time - 50th percentile (ms) | Gauge |
| Log flush time - 99th percentile (ms) | Gauge |
Attributes: Many metrics include a type attribute indicating the request type (e.g., fetch, produce), a state attribute for ISR operations (e.g., shrink, expand), or a direction attribute for network I/O (in, out).
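Counter-type metrics in the table above report cumulative totals, so they are usually converted to per-second rates before charting. The following Python sketch shows one way to do that from raw samples, assuming a drop in value means the counter restarted (for example after a broker restart). The sample data and function name are hypothetical.

```python
from typing import List, Tuple


def counter_rates(samples: List[Tuple[float, float]]) -> List[float]:
    """Per-second rates between consecutive (timestamp_seconds, cumulative_value) samples.

    A decrease is treated as a counter reset, so the new value itself is
    used as the increase for that interval.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order timestamps
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append(increase / elapsed)
    return rates


# Hypothetical samples taken 60 seconds apart; the third shows a reset.
samples = [(0.0, 1000.0), (60.0, 1600.0), (120.0, 300.0), (180.0, 900.0)]
print(counter_rates(samples))  # [10.0, 5.0, 10.0]
```

Most query backends perform this conversion for you; the sketch only makes the arithmetic explicit.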
Custom JMX metrics
These additional Kafka metrics and JVM metrics are collected when using a custom JMX configuration file (as shown in the self-hosted setup). The custom configuration allows you to collect additional Kafka-specific metrics beyond the default set, as well as JVM health metrics.
Metric name | Description | Type |
|---|---|---|
| The total number of topics in the cluster | Gauge |
| The total number of partitions in the cluster | Gauge |
| The number of fenced brokers in the cluster | Gauge |
| The count of topic partitions for which the leader is not the preferred leader | Gauge |
Metric name | Description | Type |
|---|---|---|
| The number of partitions where the number of in-sync replicas is less than the minimum | Gauge |
| Broker uptime (ms) | Gauge |
| Number of partitions for which this broker is the leader | Gauge |
Metric name | Description | Type | Attributes |
|---|---|---|---|
| The number of messages received per topic | Counter | topic |
| The bytes received or sent per topic | Counter | topic, direction (in/out) |
Metric name | Description | Type |
|---|---|---|
| Current heap memory used (bytes) | Gauge |
| Maximum heap memory available (bytes) | Gauge |
| Committed heap memory (bytes) | Gauge |
| Total number of garbage collections that have occurred | Counter |
| The approximate accumulated collection elapsed time (ms) | Counter |
| Total thread count (Kafka typical range 100-300 threads) | Gauge |
| System load average (1 minute) - alert if greater than CPU count | Gauge |
| Number of processors available | Gauge |
| Recent CPU utilization for JVM process (0.0 to 1.0) | Gauge |
| Recent CPU utilization for whole system (0.0 to 1.0) | Gauge |
| Number of open file descriptors - alert if greater than 80% of ulimit | Gauge |
| Currently loaded class count | Gauge |
| Memory pool usage by generation (G1 Old Gen, Eden, Survivor) in bytes | Gauge |
| Maximum memory pool size (bytes) | Gauge |
| Memory used after last GC - shows retained memory baseline (bytes) | Gauge |
Attributes: JVM metrics include attributes like name (for GC collector names or memory pool names).
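The alert hints in the JVM table (load average above the CPU count, open file descriptors above 80% of ulimit) can be expressed as simple threshold checks. The sketch below assumes the gauge values have already been read into a record; the field names are hypothetical, and the file descriptor limit is an assumed external input because it is not one of the listed metrics.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JvmSnapshot:
    """One reading of the JVM gauges; field names are hypothetical."""
    load_average_1m: float
    cpu_count: int
    open_file_descriptors: int
    fd_limit: int  # assumption: the ulimit is supplied by you, not collected above
    heap_used_bytes: int
    heap_max_bytes: int


def jvm_alerts(s: JvmSnapshot, fd_ratio: float = 0.80) -> List[str]:
    """Return a message for each threshold from the table that is exceeded."""
    alerts = []
    if s.load_average_1m > s.cpu_count:
        alerts.append("load average exceeds CPU count")
    if s.open_file_descriptors > fd_ratio * s.fd_limit:
        alerts.append(f"open file descriptors above {fd_ratio:.0%} of ulimit")
    return alerts


def heap_utilization(s: JvmSnapshot) -> Optional[float]:
    """Used heap as a fraction of maximum heap, or None if the maximum is undefined."""
    if s.heap_max_bytes <= 0:
        return None
    return s.heap_used_bytes / s.heap_max_bytes


snapshot = JvmSnapshot(9.5, 8, 52_000, 65_536, 6 * 1024**3, 8 * 1024**3)
print(jvm_alerts(snapshot))        # ['load average exceeds CPU count']
print(heap_utilization(snapshot))  # 0.75
```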
Kafka client metrics (OpenTelemetry Java agent)
These metrics are collected from Kafka producer and consumer applications instrumented with the OpenTelemetry Java agent with Kafka instrumentation enabled. These provide client-side visibility into application interactions with Kafka brokers and complement the broker-side metrics by providing the application perspective.
Producer metrics
Connection and network metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.connection_count | Number of active connections | client-id |
kafka.producer.connection_creation_rate | Rate of new connections established | client-id |
kafka.producer.connection_creation_total | Total connections created | client-id |
kafka.producer.connection_close_rate | Rate of connections closed | client-id |
kafka.producer.network_io_rate | Rate of network operations | client-id |
kafka.producer.network_io_total | Total network operations | client-id |
kafka.producer.outgoing_byte_rate | Rate of outgoing bytes | client-id, node-id |
kafka.producer.outgoing_byte_total | Total outgoing bytes | client-id, node-id |
Request and response metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.request_rate | Rate of requests sent | client-id, node-id |
kafka.producer.request_total | Total requests sent | client-id, node-id |
kafka.producer.request_size_avg | Average request size | client-id, node-id |
kafka.producer.request_size_max | Maximum request size | client-id, node-id |
kafka.producer.request_latency_avg | Average request latency (ms) | client-id, node-id |
kafka.producer.request_latency_max | Maximum request latency (ms) | client-id, node-id |
kafka.producer.response_rate | Rate of responses received | client-id, node-id |
kafka.producer.response_total | Total responses received | client-id, node-id |
kafka.producer.requests_in_flight | Number of in-flight requests | client-id |
Record metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.record_send_rate | Rate of records sent | client-id, topic |
kafka.producer.record_send_total | Total records sent | client-id, topic |
kafka.producer.record_error_rate | Rate of record send errors | client-id, topic |
kafka.producer.record_error_total | Total record send errors | client-id, topic |
kafka.producer.record_retry_rate | Rate of record retries | client-id, topic |
kafka.producer.record_retry_total | Total record retries | client-id, topic |
kafka.producer.record_size_avg | Average record size | client-id |
kafka.producer.record_size_max | Maximum record size | client-id |
kafka.producer.record_queue_time_avg | Average time records spend in send buffer (ms) | client-id |
kafka.producer.record_queue_time_max | Maximum time records spend in send buffer (ms) | client-id |
kafka.producer.records_per_request_avg | Average records per request | client-id |
Throughput metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.byte_rate | Rate of bytes produced | client-id, topic |
kafka.producer.byte_total | Total bytes produced | client-id, topic |
kafka.producer.compression_rate | Average compression rate | client-id, topic |
kafka.producer.compression_rate_avg | Average compression ratio | client-id |
Batching metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.batch_size_avg | Average batch size | client-id |
kafka.producer.batch_size_max | Maximum batch size | client-id |
kafka.producer.batch_split_rate | Rate of batch splits | client-id |
kafka.producer.batch_split_total | Total batch splits | client-id |
Buffer metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.buffer_total_bytes | Total buffer memory | client-id |
kafka.producer.buffer_available_bytes | Available buffer memory | client-id |
kafka.producer.buffer_exhausted_rate | Rate of buffer exhaustion | client-id |
kafka.producer.buffer_exhausted_total | Total buffer exhaustions | client-id |
kafka.producer.bufferpool_wait_ratio | Fraction of time waiting for buffer space | client-id |
kafka.producer.bufferpool_wait_time_total | Total time waiting for buffer space | client-id |
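The total and available buffer values are easier to reason about as a single utilization ratio. A minimal Python sketch of that calculation follows, using hypothetical byte values and a hypothetical function name.

```python
def buffer_utilization(total_bytes: int, available_bytes: int) -> float:
    """Fraction of the producer buffer currently in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used = total_bytes - available_bytes
    return used / total_bytes


# Hypothetical readings of kafka.producer.buffer_total_bytes and
# kafka.producer.buffer_available_bytes for one client-id.
print(buffer_utilization(33_554_432, 8_388_608))  # 0.75
```

A ratio that stays close to 1.0 would typically coincide with rising values in the buffer exhaustion and wait metrics listed above.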
I/O metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.io_ratio | Fraction of time spent in I/O | client-id |
kafka.producer.io_time_ns_avg | Average I/O time (ns) | client-id |
kafka.producer.io_wait_time_ns_avg | Average I/O wait time (ns) | client-id |
kafka.producer.io_wait_ratio | Fraction of time waiting for I/O | client-id |
kafka.producer.iotime_total | Total I/O time | client-id |
kafka.producer.io_waittime_total | Total I/O wait time | client-id |
Throttling metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.produce_throttle_time_avg | Average throttle time (ms) | client-id |
kafka.producer.produce_throttle_time_max | Maximum throttle time (ms) | client-id |
Authentication metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.successful_authentication_rate | Rate of successful authentications | client-id |
kafka.producer.successful_authentication_total | Total successful authentications | client-id |
kafka.producer.successful_authentication_no_reauth_total | Successful authentications without reauthentication | client-id |
kafka.producer.successful_reauthentication_rate | Rate of successful reauthentications | client-id |
kafka.producer.successful_reauthentication_total | Total successful reauthentications | client-id |
kafka.producer.failed_authentication_rate | Rate of failed authentications | client-id |
kafka.producer.failed_authentication_total | Total failed authentications | client-id |
kafka.producer.failed_reauthentication_rate | Rate of failed reauthentications | client-id |
kafka.producer.failed_reauthentication_total | Total failed reauthentications | client-id |
kafka.producer.reauthentication_latency_avg | Average reauthentication latency (ms) | client-id |
kafka.producer.reauthentication_latency_max | Maximum reauthentication latency (ms) | client-id |
Miscellaneous metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.producer.metadata_age | Age of current metadata (seconds) | client-id |
kafka.producer.waiting_threads | Number of threads waiting for buffer space | client-id |
kafka.producer.select_rate | Rate of select calls | client-id |
kafka.producer.select_total | Total select calls | client-id |
Consumer metrics
Connection and network metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.connection_count | Number of active connections | client-id |
kafka.consumer.connection_creation_rate | Rate of new connections established | client-id |
kafka.consumer.connection_creation_total | Total connections created | client-id |
kafka.consumer.connection_close_rate | Rate of connections closed | client-id |
kafka.consumer.connection_close_total | Total connections closed | client-id |
kafka.consumer.network_io_rate | Rate of network operations | client-id |
kafka.consumer.network_io_total | Total network operations | client-id |
kafka.consumer.outgoing_byte_rate | Rate of outgoing bytes | client-id, node-id |
kafka.consumer.incoming_byte_rate | Rate of incoming bytes | client-id, node-id |
Request and response metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.request_rate | Rate of requests sent | client-id, node-id |
kafka.consumer.request_total | Total requests sent | client-id, node-id |
kafka.consumer.request_size_avg | Average request size | client-id, node-id |
kafka.consumer.request_size_max | Maximum request size | client-id, node-id |
kafka.consumer.request_latency_avg | Average request latency (ms) | client-id, node-id |
kafka.consumer.request_latency_max | Maximum request latency (ms) | client-id, node-id |
kafka.consumer.response_rate | Rate of responses received | client-id, node-id |
kafka.consumer.response_total | Total responses received | client-id, node-id |
Consumption metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.bytes_consumed_rate | Rate of bytes consumed | client-id, topic |
kafka.consumer.bytes_consumed_total | Total bytes consumed | client-id, topic |
kafka.consumer.records_consumed_rate | Rate of records consumed | client-id, topic |
kafka.consumer.records_consumed_total | Total records consumed | client-id, topic |
kafka.consumer.records_per_request_avg | Average records per request | client-id, topic |
Consumer lag metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.records_lag | Current lag in number of records | partition, client-id, topic |
kafka.consumer.records_lag_avg | Average consumer lag | partition, client-id, topic |
kafka.consumer.records_lag_max | Maximum consumer lag | partition, client-id, topic |
kafka.consumer.records_lead | Current lead in number of records | partition, client-id, topic |
kafka.consumer.records_lead_avg | Average consumer lead | partition, client-id, topic |
kafka.consumer.records_lead_min | Minimum consumer lead | partition, client-id, topic |
Fetch metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.fetch_rate | Rate of fetch requests | client-id |
kafka.consumer.fetch_total | Total fetch requests | client-id |
kafka.consumer.fetch_size_avg | Average fetch size | client-id, topic |
kafka.consumer.fetch_size_max | Maximum fetch size | client-id, topic |
kafka.consumer.fetch_latency_avg | Average fetch latency (ms) | client-id |
kafka.consumer.fetch_latency_max | Maximum fetch latency (ms) | client-id |
kafka.consumer.fetch_throttle_time_avg | Average fetch throttle time (ms) | client-id |
kafka.consumer.fetch_throttle_time_max | Maximum fetch throttle time (ms) | client-id |
Consumer group coordination metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.assigned_partitions | Number of partitions assigned | client-id |
kafka.consumer.commit_rate | Rate of offset commits | client-id |
kafka.consumer.commit_total | Total offset commits | client-id |
kafka.consumer.commit_latency_avg | Average commit latency (ms) | client-id |
kafka.consumer.commit_latency_max | Maximum commit latency (ms) | client-id |
kafka.consumer.heartbeat_rate | Rate of heartbeats sent | client-id |
kafka.consumer.heartbeat_total | Total heartbeats sent | client-id |
kafka.consumer.heartbeat_response_time_max | Maximum heartbeat response time (ms) | client-id |
kafka.consumer.last_heartbeat_seconds_ago | Seconds since last heartbeat | client-id |
kafka.consumer.last_poll_seconds_ago | Seconds since last poll | client-id |
Rebalance metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.rebalance_total | Total rebalances | client-id |
kafka.consumer.rebalance_rate_per_hour | Rebalances per hour | client-id |
kafka.consumer.rebalance_latency_avg | Average rebalance latency (ms) | client-id |
kafka.consumer.rebalance_latency_max | Maximum rebalance latency (ms) | client-id |
kafka.consumer.rebalance_latency_total | Total rebalance time (ms) | client-id |
kafka.consumer.failed_rebalance_total | Total failed rebalances | client-id |
kafka.consumer.failed_rebalance_rate_per_hour | Failed rebalances per hour | client-id |
kafka.consumer.last_rebalance_seconds_ago | Seconds since last rebalance | client-id |
kafka.consumer.partition_assigned_latency_avg | Average partition assignment latency (ms) | client-id |
kafka.consumer.partition_assigned_latency_max | Maximum partition assignment latency (ms) | client-id |
kafka.consumer.partition_revoked_latency_avg | Average partition revocation latency (ms) | client-id |
kafka.consumer.partition_revoked_latency_max | Maximum partition revocation latency (ms) | client-id |
kafka.consumer.partition_lost_latency_avg | Average partition loss latency (ms) | client-id |
kafka.consumer.partition_lost_latency_max | Maximum partition loss latency (ms) | client-id |
Sync group metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.sync_rate | Rate of group syncs | client-id |
kafka.consumer.sync_total | Total group syncs | client-id |
kafka.consumer.sync_time_avg | Average sync time (ms) | client-id |
kafka.consumer.sync_time_max | Maximum sync time (ms) | client-id |
kafka.consumer.join_rate | Rate of group joins | client-id |
kafka.consumer.join_total | Total group joins | client-id |
kafka.consumer.join_time_avg | Average join time (ms) | client-id |
kafka.consumer.join_time_max | Maximum join time (ms) | client-id |
I/O metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.io_ratio | Fraction of time spent in I/O | client-id |
kafka.consumer.io_time_ns_avg | Average I/O time (ns) | client-id |
kafka.consumer.io_wait_time_ns_avg | Average I/O wait time (ns) | client-id |
kafka.consumer.io_wait_ratio | Fraction of time waiting for I/O | client-id |
kafka.consumer.iotime_total | Total I/O time | client-id |
kafka.consumer.io_waittime_total | Total I/O wait time | client-id |
Polling metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.poll_idle_ratio_avg | Average fraction of time consumer is idle during poll | client-id |
kafka.consumer.time_between_poll_avg | Average time between polls (ms) | client-id |
kafka.consumer.time_between_poll_max | Maximum time between polls (ms) | client-id |
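One common use of the polling metrics is to compare the maximum time between polls against the consumer's max.poll.interval.ms setting, since a consumer that exceeds it is removed from its group. The sketch below assumes the Kafka default of 300000 ms for that setting, which is not documented on this page; the function name is hypothetical.

```python
def poll_budget_used(time_between_poll_max_ms: float,
                     max_poll_interval_ms: float = 300_000) -> float:
    """Fraction of the poll interval budget consumed by the slowest poll gap.

    Values at or above 1.0 mean the gap reached the configured interval.
    """
    if max_poll_interval_ms <= 0:
        raise ValueError("max_poll_interval_ms must be positive")
    return time_between_poll_max_ms / max_poll_interval_ms


# Hypothetical reading of kafka.consumer.time_between_poll_max.
print(poll_budget_used(240_000))  # 0.8
```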
Authentication metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.successful_authentication_rate | Rate of successful authentications | client-id |
kafka.consumer.successful_authentication_total | Total successful authentications | client-id |
kafka.consumer.successful_authentication_no_reauth_total | Successful authentications without reauthentication | client-id |
kafka.consumer.successful_reauthentication_rate | Rate of successful reauthentications | client-id |
kafka.consumer.successful_reauthentication_total | Total successful reauthentications | client-id |
kafka.consumer.failed_authentication_rate | Rate of failed authentications | client-id |
kafka.consumer.failed_authentication_total | Total failed authentications | client-id |
kafka.consumer.failed_reauthentication_rate | Rate of failed reauthentications | client-id |
kafka.consumer.failed_reauthentication_total | Total failed reauthentications | client-id |
kafka.consumer.reauthentication_latency_avg | Average reauthentication latency (ms) | client-id |
kafka.consumer.reauthentication_latency_max | Maximum reauthentication latency (ms) | client-id |
Miscellaneous metrics
| Metric name | Description | Attributes |
|---|---|---|
kafka.consumer.select_rate | Rate of select calls | client-id |
kafka.consumer.select_total | Total select calls | client-id |
Metric attributes
Metrics can be filtered and grouped using the following attributes:
Common attributes:
- kafka.cluster.name - Name of the Kafka cluster (all metrics)
- instrumentation.provider - Always opentelemetry (all metrics)
- topic - Kafka topic name
- partition - Partition number
- group - Consumer group name
- broker.id - Broker identifier (JMX metrics)
- client-id - Client identifier (client metrics)
- node-id - Broker node identifier (client metrics)
- type - Request type (e.g., fetch, produce)
- direction - Data direction (in, out)
- state - ISR operation state (shrink, expand)
- name - GC collector or memory pool name (JVM metrics)
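To make filtering and grouping by attribute concrete, here is a small Python sketch that works on exported data points held in memory. The data point shape (a dict with "attributes" and "value" keys), the sample values, and the helper names are all assumptions for illustration; a query language would normally do this work.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DataPoint = Dict[str, object]  # hypothetical shape: {"attributes": {...}, "value": number}


def filter_points(points: List[DataPoint], required: Dict[str, str]) -> List[DataPoint]:
    """Keep only points whose attributes match every required key/value pair."""
    return [
        p for p in points
        if all(p["attributes"].get(k) == v for k, v in required.items())
    ]


def sum_by(points: List[DataPoint], keys: Sequence[str]) -> Dict[Tuple, float]:
    """Sum point values grouped by the given attribute keys."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for p in points:
        group = tuple(p["attributes"].get(k) for k in keys)
        totals[group] += p["value"]
    return dict(totals)


# Hypothetical kafka.consumer.records_lag data points.
points = [
    {"attributes": {"topic": "orders", "partition": "0", "client-id": "c1"}, "value": 50},
    {"attributes": {"topic": "orders", "partition": "1", "client-id": "c1"}, "value": 20},
    {"attributes": {"topic": "payments", "partition": "0", "client-id": "c2"}, "value": 140},
]

print(len(filter_points(points, {"topic": "orders"})))  # 2
print(sum_by(points, ["topic"]))  # {('orders',): 70.0, ('payments',): 140.0}
```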
Next steps
- Query and visualize your data - Find metrics in the New Relic UI, write NRQL queries, create dashboards, and set up alerts
- Query metric data types - Learn advanced techniques for working with OpenTelemetry metrics