You can collect metrics about your Confluent Cloud-managed Kafka deployment with the OpenTelemetry collector. The collector is a component of OpenTelemetry that collects, processes, and exports telemetry data to New Relic (or any observability back-end).
If you're looking for help with other collector use cases, see the newrelic-opentelemetry-examples repository.
Complete the steps below to collect Kafka metrics from Confluent.
- If you haven't already done so, sign up for a free New Relic account.
- Get the license key for the New Relic account to which you want to report data.
See https://github.com/abeach-nr/opentelemetry-collector-contrib.git for the latest installation instructions.
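Building the collector from source requires a working build toolchain. As a quick sanity check (the exact tool list here is an assumption, not taken from the installation instructions), you can verify that each required tool is on your `PATH`:

```shell
# Hypothetical prerequisite check: the contrib collector is built with
# make and the Go toolchain, and the source is fetched with git.
for tool in git make go; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
```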
```shell
git clone https://github.com/abeach-nr/opentelemetry-collector-contrib.git
cd opentelemetry-collector-contrib
make otelcontribcol
```
The binary will be installed under `./bin`.
Create a new file called `config.yaml` from the example below.
Replace the following placeholders in the file with your own values:

- `CONFLUENT_API_ID` / `CONFLUENT_API_SECRET`: your Confluent Cloud API key and secret
- `CLUSTER_API_KEY` / `CLUSTER_API_SECRET`: the Kafka client API key and secret; these should be specific to this cluster
- `NEW_RELIC_LICENSE_KEY`: your New Relic ingest license key
- `CLUSTER_ID`: the cluster ID from Confluent Cloud
- `CLUSTER_BOOTSTRAP_SERVER`: the bootstrap server Confluent provides for the cluster (for example, `xxx-xxxx.us-east-2.aws.confluent.cloud:9092`)
```yaml
receivers:
  kafkametrics:
    brokers:
      - CLUSTER_BOOTSTRAP_SERVER
    protocol_version: 2.0.0
    scrapers:
      - brokers
      - topics
      - consumers
    auth:
      sasl:
        username: CLUSTER_API_KEY
        password: CLUSTER_API_SECRET
        mechanism: PLAIN
      tls:
        insecure_skip_verify: false
    collection_interval: 30s
  prometheus:
    config:
      scrape_configs:
        - job_name: "confluent"
          scrape_interval: 60s # Do not go any lower than this or you'll hit rate limits
          static_configs:
            - targets: ["api.telemetry.confluent.cloud"]
          scheme: https
          basic_auth:
            username: CONFLUENT_API_ID
            password: CONFLUENT_API_SECRET
          metrics_path: /v2/metrics/cloud/export
          params:
            "resource.kafka.id":
              - CLUSTER_ID

exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: NEW_RELIC_LICENSE_KEY

processors:
  batch:
  memory_limiter:
    limit_mib: 400
    spike_limit_mib: 100
    check_interval: 5s

service:
  telemetry:
    logs:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]
    metrics/kafka:
      receivers: [kafkametrics]
      processors: [batch]
      exporters: [otlp]
```
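Rather than editing `config.yaml` by hand, you can substitute the placeholders from environment variables. This is a sketch, assuming the placeholder names used in the example configuration and that a matching environment variable is exported for each:

```shell
# Sketch: replace each placeholder in config.yaml with the value of an
# environment variable of the same name (export the variables first).
# sed -i.bak writes the change in place and keeps a .bak backup, which
# works with both GNU and BSD sed.
for name in CLUSTER_BOOTSTRAP_SERVER CLUSTER_API_KEY CLUSTER_API_SECRET \
            CONFLUENT_API_ID CONFLUENT_API_SECRET CLUSTER_ID NEW_RELIC_LICENSE_KEY; do
  eval "value=\$$name"
  sed -i.bak "s|$name|$value|g" config.yaml
done
```

Keep the resulting file out of version control, since it now contains your secrets.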
Execute the following, making sure to insert your operating system (for example, `linux`) into the binary name:

```shell
./bin/otelcontribcol_INSERT_THE_OPERATING_SYSTEM_amd64 --config config.yaml
```
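If you'd rather not type the platform by hand, the operating system portion of the binary name can usually be derived from `uname`. This sketch assumes the `_<os>_amd64` naming shown above and an amd64 build:

```shell
# Sketch: derive the lowercase OS name (e.g. "linux" or "darwin") from
# uname and run the matching collector binary. Assumes an amd64 build.
OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
./bin/otelcontribcol_${OS}_amd64 --config config.yaml
```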
Check out this New Relic example dashboard that uses these metrics:
- Number of brokers in the cluster
- Average consumer fetch rate
- Average incoming byte rate in bytes/second
- Average outgoing byte rate in bytes/second
- Average request latency in ms
- Average request rate per second
- Average request size in bytes
- Requests in flight
- Average response rate per second
- Average response size in bytes
- Current approximate lag of a consumer group at a partition of a topic
- Current approximate sum of consumer group lag across all partitions of a topic
- Count of members in the consumer group
- The delta count of bytes of the customer's data received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds.
- The delta count of bytes of the customer's data sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
- The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds.
- The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds.
- The current count of bytes retained by the cluster. The count is sampled every 60 seconds.
- The count of active authenticated connections.
- The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
- The number of partitions.
- The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds.
- The lag between a group member's committed offset and the partition's high watermark.