
Collector for Confluent Cloud & Kafka monitoring

You can collect metrics about your Confluent Cloud-managed Kafka deployment with the OpenTelemetry collector. The collector is an OpenTelemetry component that collects, processes, and exports telemetry data to New Relic (or any other observability backend).

Complete the steps below to collect Kafka metrics from Confluent.

Step 1: Sign up for New Relic!

  • If you haven't already done so, sign up for a free New Relic account.
  • Get the license key for the New Relic account to which you want to report data.

Step 2: Prerequisites

  • Ensure Go is installed before proceeding.
  • Make sure GOPATH is set, and add $GOPATH/bin to your PATH variable (see the example after this list).
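For example, a minimal shell setup (assuming a default Go installation; adjust the paths for your system):

bash
$ export GOPATH="$(go env GOPATH)"
$ export PATH="$PATH:$GOPATH/bin"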

Step 3: Compile from PR source repo

Important

New Relic supports the OpenTelemetry community by contributing our work upstream to both the Core and Contrib repos.

When PR14167 on the OpenTelemetry Collector Contrib repo has been merged, the documentation below will be updated to reflect the main branch of the Contrib repo.

See https://github.com/abeach-nr/opentelemetry-collector-contrib.git for the latest installation instructions.

bash
$ git clone https://github.com/abeach-nr/opentelemetry-collector-contrib.git
$ cd opentelemetry-collector-contrib
$ make otelcontribcol

The binary will be installed under ./bin.
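You can confirm the build succeeded by listing that directory. The binary name includes your OS and architecture, so the exact name will vary; for example, on Linux you might see:

bash
$ ls ./bin
otelcontribcol_linux_amd64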

Step 4: Configure the OpenTelemetry collector

Create a new file called config.yaml from the example below.

Replace the following placeholders in the file with your own values (you can sanity-check the Confluent Cloud key with the curl command after this list):

  • Cloud API key
    • CONFLUENT_API_ID
    • CONFLUENT_API_SECRET
  • Kafka client API key
    • CLUSTER_API_KEY
    • CLUSTER_API_SECRET
  • New Relic ingest key
    • NEW_RELIC_LICENSE_KEY
  • CLUSTER_ID
    • Cluster ID from Confluent Cloud
    • The cluster key/secret should be specific to this cluster
  • CLUSTER_BOOTSTRAP_SERVER
    • Bootstrap server provided by Confluent for the cluster
    • Example: xxx-xxxx.us-east-2.aws.confluent.cloud:9092
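Before starting the collector, you can optionally verify the Cloud API key with a direct request to the same endpoint the Prometheus receiver scrapes. This is a minimal sketch assuming curl is available; substitute your real values:

bash
$ curl -s -u CONFLUENT_API_ID:CONFLUENT_API_SECRET \
  "https://api.telemetry.confluent.cloud/v2/metrics/cloud/export?resource.kafka.id=CLUSTER_ID"

A successful response returns Prometheus-formatted metric text, confirming the key and cluster ID are valid.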
receivers:
  kafkametrics:
    brokers:
      - CLUSTER_BOOTSTRAP_SERVER
    protocol_version: 2.0.0
    scrapers:
      - brokers
      - topics
      - consumers
    auth:
      sasl:
        username: CLUSTER_API_KEY
        password: CLUSTER_API_SECRET
        mechanism: PLAIN
      tls:
        insecure_skip_verify: false
    collection_interval: 30s
  prometheus:
    config:
      scrape_configs:
        - job_name: "confluent"
          scrape_interval: 60s # Do not go any lower than this or you'll hit rate limits
          static_configs:
            - targets: ["api.telemetry.confluent.cloud"]
          scheme: https
          basic_auth:
            username: CONFLUENT_API_ID
            password: CONFLUENT_API_SECRET
          metrics_path: /v2/metrics/cloud/export
          params:
            "resource.kafka.id":
              - CLUSTER_ID
exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: NEW_RELIC_LICENSE_KEY
processors:
  batch:
  memory_limiter:
    limit_mib: 400
    spike_limit_mib: 100
    check_interval: 5s
service:
  telemetry:
    logs:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]
    metrics/kafka:
      receivers: [kafkametrics]
      processors: [batch]
      exporters: [otlp]
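Note that the memory_limiter processor is declared under processors but not referenced by either pipeline, so it is inactive as written. If you want it enabled, list it ahead of batch in each pipeline (memory_limiter is conventionally placed first); a minimal sketch for one pipeline:

    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, batch]
      exporters: [otlp]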

Step 5: Run the collector

Execute the following, making sure to insert the operating system (for example, darwin or linux):

./bin/otelcontribcol_INSERT_THE_OPERATING_SYSTEM_amd64 --config config.yaml
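For example, on Linux:

bash
$ ./bin/otelcontribcol_linux_amd64 --config config.yaml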

Step 6: Set up dashboards in New Relic

Check out this New Relic example dashboard that uses these metrics:

Kafka instance metrics

Name | Description
kafka.brokers | Number of brokers in the cluster
kafka.brokers.consumer_fetch_rate_avg | Average consumer fetch rate
kafka.brokers.incoming_byte_rate_avg | Average incoming byte rate in bytes/second
kafka.brokers.outgoing_byte_rate_avg | Average outgoing byte rate in bytes/second
kafka.brokers.request_latency_avg | Average request latency in ms
kafka.brokers.request_rate_avg | Average request rate per second
kafka.brokers.request_size_avg | Average request size in bytes
kafka.brokers.requests_in_flight | Requests in flight
kafka.brokers.response_rate_avg | Average response rate per second
kafka.brokers.response_size_avg | Average response size in bytes
kafka.consumer_group.lag | Current approximate lag of a consumer group at a topic partition
kafka.consumer_group.lag_sum | Current approximate sum of consumer group lag across all partitions of a topic
kafka.consumer_group.members | Count of members in the consumer group

Confluent Cloud metrics

Name | Description
confluent_kafka_server_received_bytes | The delta count of bytes of the customer's data received from the network. Each sample is the number of bytes received since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_sent_bytes | The delta count of bytes of the customer's data sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_received_records | The delta count of records received. Each sample is the number of records received since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_sent_records | The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_retained_bytes | The current count of bytes retained by the cluster. The count is sampled every 60 seconds.
confluent_kafka_server_active_connection_count | The count of active authenticated connections.
confluent_kafka_server_request_count | The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_partition_count | The number of partitions.
confluent_kafka_server_successful_authentication_count | The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_consumer_lag_offsets | The lag between a group member's committed offset and the partition's high watermark.
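Once these metrics are flowing, you can chart them in New Relic with NRQL. A minimal example query (the topic facet is an attribute reported by the kafkametrics receiver; adjust to the attributes you see in your own data):

FROM Metric SELECT latest(kafka.consumer_group.lag) FACET topic TIMESERIES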
