You can collect metrics about your Confluent Cloud-managed Kafka deployment with the OpenTelemetry collector. The collector is a component of OpenTelemetry that collects, processes, and exports telemetry data to New Relic (or any observability back-end).
Tip
If you're looking for help with other collector use cases, see the newrelic-opentelemetry-examples repository.
Complete the steps below to collect Kafka metrics from Confluent.
Step 1: Sign up for New Relic!
- If you haven't already done so, sign up for a free New Relic account.
- Get the license key for the New Relic account to which you want to report data.
Step 2: Prerequisites
Before you start, make sure you have:
- A Confluent Cloud account with a Kafka cluster.
- A Confluent Cloud API key and secret, plus an API key and secret for the cluster itself (both are used in the config in step 4).
- Git, Go, and make installed, since step 3 builds the collector from source. A quick check follows this list.
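You can confirm the build tools are available before cloning (a sketch; the exact Go version required is whatever the Contrib repo's Makefile expects):
```shell
# Confirm the tools needed to clone and build the collector are installed.
git --version
go version
make --version
```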
Step 3: Compile from PR source repo
Important
New Relic supports the OpenTelemetry community by contributing our work upstream to both the Core and Contrib repos.
When PR14167 on the OpenTelemetry Collector Contrib repo has been merged, the documentation below will be updated to reflect the main branch of the Contrib repo.
See https://github.com/abeach-nr/opentelemetry-collector-contrib.git for the latest installation instructions.
```shell
git clone https://github.com/abeach-nr/opentelemetry-collector-contrib.git
cd opentelemetry-collector-contrib
make otelcontribcol
```
The compiled binary is placed under ./bin.
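For example, on a Linux amd64 host you should see a platform-specific binary (the name varies by OS and architecture, matching the pattern used in step 5):
```shell
ls ./bin
# otelcontribcol_linux_amd64
```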
Step 4: Configure the OpenTelemetry collector
Create a new file called config.yaml from the example below.
Replace the following keys in the file with your own values (a quick way to sanity-check the Confluent values follows this list):
- Confluent Cloud API key:
  - CONFLUENT_API_ID
  - CONFLUENT_API_SECRET
- Kafka cluster API key (the key and secret must be specific to this cluster):
  - CLUSTER_API_KEY
  - CLUSTER_API_SECRET
- New Relic ingest license key:
  - NEW_RELIC_LICENSE_KEY
- CLUSTER_ID: the cluster ID from Confluent Cloud.
- CLUSTER_BOOTSTRAP_SERVER: the bootstrap server Confluent provides for the cluster, for example xxx-xxxx.us-east-2.aws.confluent.cloud:9092.
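Before wiring these values into the collector, you can optionally verify the Confluent Cloud API key and cluster ID with a direct call to the same endpoint the prometheus receiver scrapes below (a sketch using the placeholder values above):
```shell
# A valid key and cluster ID should return Prometheus-format metrics for the cluster.
curl -s -u CONFLUENT_API_ID:CONFLUENT_API_SECRET \
  "https://api.telemetry.confluent.cloud/v2/metrics/cloud/export?resource.kafka.id=CLUSTER_ID"
```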
```yaml
receivers:
  kafkametrics:
    brokers:
      - CLUSTER_BOOTSTRAP_SERVER
    protocol_version: 2.0.0
    scrapers:
      - brokers
      - topics
      - consumers
    auth:
      sasl:
        username: CLUSTER_API_KEY
        password: CLUSTER_API_SECRET
        mechanism: PLAIN
      tls:
        insecure_skip_verify: false
    collection_interval: 30s
  prometheus:
    config:
      scrape_configs:
        - job_name: "confluent"
          scrape_interval: 60s # Do not go any lower than this or you'll hit rate limits
          static_configs:
            - targets: ["api.telemetry.confluent.cloud"]
          scheme: https
          basic_auth:
            username: CONFLUENT_API_ID
            password: CONFLUENT_API_SECRET
          metrics_path: /v2/metrics/cloud/export
          params:
            "resource.kafka.id":
              - CLUSTER_ID

exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: NEW_RELIC_LICENSE_KEY

processors:
  batch:
  memory_limiter:
    limit_mib: 400
    spike_limit_mib: 100
    check_interval: 5s

service:
  telemetry:
    logs:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]
    metrics/kafka:
      receivers: [kafkametrics]
      processors: [batch]
      exporters: [otlp]
```
Step 5: Run the collector
Execute the following, making sure to insert the operating system (for example, darwin or linux):
```shell
./bin/otelcontribcol_INSERT_THE_OPERATING_SYSTEM_amd64 --config config.yaml
```
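After the collector has been running for a minute or two, you can confirm metrics are arriving with a NRQL query like the following (a sketch; adjust the time window to taste):
```sql
FROM Metric SELECT uniques(metricName)
WHERE metricName LIKE 'kafka.%' OR metricName LIKE 'confluent_kafka_server%'
SINCE 30 minutes ago
```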
Step 6: Set up dashboards in New Relic
Check out this New Relic example dashboard that uses these metrics:
Kafka instance metrics
Name | Description |
---|---|
kafka.brokers | Number of brokers in the cluster |
kafka.brokers.consumer_fetch_rate_avg | Average consumer fetch rate |
kafka.brokers.incoming_byte_rate_avg | Average incoming byte rate in bytes/second |
kafka.brokers.outgoing_byte_rate_avg | Average outgoing byte rate in bytes/second |
kafka.brokers.request_latency_avg | Request latency average in ms |
kafka.brokers.request_rate_avg | Average request rate per second |
kafka.brokers.request_size_avg | Average request size in bytes |
kafka.brokers.requests_in_flight | Requests in flight |
kafka.brokers.response_rate_avg | Average response rate per second |
kafka.brokers.response_size_avg | Average response size in bytes |
kafka.consumer_group.lag | Current approximate lag of consumer group at partition of topic |
kafka.consumer_group.lag_sum | Current approximate sum of consumer group lag across all partitions of topic |
kafka.consumer_group.members | Count of members in the consumer group |
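For example, a NRQL query like this charts consumer lag over time (a sketch; the topic and partition attribute names come from the kafkametrics receiver and are an assumption here):
```sql
FROM Metric SELECT latest(kafka.consumer_group.lag)
FACET topic, partition
TIMESERIES SINCE 1 hour ago
```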
Confluent Cloud metrics
Name | Description |
---|---|
confluent_kafka_server_received_bytes | The delta count of bytes of the customer's data received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds. |
confluent_kafka_server_sent_bytes | The delta count of bytes of the customer's data sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_received_records | The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds. |
confluent_kafka_server_sent_records | The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_retained_bytes | The current count of bytes retained by the cluster. The count is sampled every 60 seconds. |
confluent_kafka_server_active_connection_count | The count of active authenticated connections. |
confluent_kafka_server_request_count | The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_partition_count | The number of partitions. |
confluent_kafka_server_successful_authentication_count | The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_consumer_lag_offsets | The lag between a group member's committed offset and the partition's high watermark. |
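As a starting point for a throughput widget, a query like this (a sketch) charts the cluster's ingress in bytes per minute:
```sql
FROM Metric SELECT rate(sum(confluent_kafka_server_received_bytes), 1 minute)
TIMESERIES SINCE 1 hour ago
```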