Apache Flink monitoring integration

With our Apache Flink dashboard, you can easily track your logs, keep an eye on your instrumentation sources, and get an overview of uptime and downtime for all your app instances. Built with our infrastructure agent and our Prometheus OpenMetrics integration, the Flink integration takes advantage of OpenMetrics endpoint scraping, so you can view all your most important data, all in one place.

A screenshot of a dashboard with Apache Flink metrics.

After setting up Flink with New Relic, your data will display in dashboards like these, right out of the box.

Install the infrastructure agent

To use the Apache Flink integration, you need to also install the infrastructure agent on the same host. The infrastructure agent monitors the host itself, while the integration you'll install in the next step extends your monitoring with Flink-specific data such as cluster and job metrics.

  1. Ensure your Flink instance contains flink-metrics-prometheus-VERSION.jar in the following path: FLINK-DIRECTORY/plugins/metrics-prometheus/. Move it to that location if it isn't already there (a quick check follows this list).

  2. Update the Flink configuration file to expose metrics on ports 9250 to 9260.

  3. To do that, open the configuration file at FLINK-DIRECTORY/conf/flink-conf.yaml and paste in the following content:

    metrics.reporters: prom
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.host: localhost
    metrics.reporter.prom.port: 9250-9260
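
To confirm the reporter jar from step 1 is where Flink expects it, you can list the plugin directory. This is a quick check assuming a standard Flink distribution layout; the jar's version number will vary:

    $ ls FLINK-DIRECTORY/plugins/metrics-prometheus/
    flink-metrics-prometheus-VERSION.jar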

Check for metrics

  1. Use the following command to start an Apache Flink cluster:

    $ ./bin/start-cluster.sh
  2. Check for metrics on the following URLs:

    Job manager metrics:

    http://YOUR_DOMAIN:9250

    Task manager metrics:

    http://YOUR_DOMAIN:9251
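
If the endpoints are live, requesting each port returns plain-text metrics in Prometheus format. A quick sanity check from the host (substitute YOUR_DOMAIN, for example with localhost):

    $ curl http://YOUR_DOMAIN:9250
    $ curl http://YOUR_DOMAIN:9251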
Configure the integration

  1. Create the following file if it doesn't exist: /etc/newrelic-infra/integrations.d/nri-prometheus-config.yml

  2. Paste in the following content, updating cluster_name and urls:

    integrations:
      - name: nri-prometheus
        config:
          standalone: false
          # Defaults to true. When standalone is set to `false`, `nri-prometheus` requires an infrastructure agent to send data.
          emitters: infra-sdk
          # When running with the infrastructure agent, emitters must include `infra-sdk`.
          cluster_name: "YOUR_DESIRED_CLUSTER_NAME"
          # Match the name of your cluster with the name you want to see in New Relic.
          targets:
            - description: "YOUR_DESIRED_DESCRIPTION_HERE"
              urls: ["http://YOUR_DOMAIN:9250", "http://YOUR_DOMAIN:9251"]
              # tls_config:
              #   ca_file_path: "/etc/etcd/etcd-client-ca.crt"
              #   cert_file_path: "/etc/etcd/etcd-client.crt"
              #   key_file_path: "/etc/etcd/etcd-client.key"
          verbose: false
          # Defaults to false. Determines whether the integration runs in verbose mode.
          audit: false
          # Defaults to false. Audit mode logs the uncompressed data sent to New Relic and can lead to a high log volume.
          # scrape_timeout: "YOUR_TIMEOUT_DURATION"
          # `scrape_timeout` is optional and defaults to 30s. It sets the HTTP client timeout when fetching data from endpoints.
          scrape_duration: "5s"
          # worker_threads: 4
          # `worker_threads` is optional and defaults to 4 for clusters with more than 400 endpoints. Slowly increase the value until scrape time falls below the desired `scrape_duration`. Note: setting this too high can cause very high memory consumption if too many metrics are scraped at once.
          insecure_skip_verify: false
          # Defaults to false. Determines whether the integration should skip TLS verification.
        timeout: 10s
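
After saving the file, restart the infrastructure agent so it picks up the new configuration. For example, on systemd-based Linux hosts:

    $ sudo systemctl restart newrelic-infra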

Forward Apache Flink logs

You can use our log forwarding documentation to forward application-specific logs to New Relic.

After you install the infrastructure agent on a Linux machine, your log forwarding configuration file, named logging.yml, should be present at /etc/newrelic-infra/logging.d/.

If you don't see the file at that path, create it by following the log forwarding documentation above.

Flink log file names look similar to this example:

- name: flink-u1-taskexecutor-0-u1-VirtualBox.log

Add the following snippet to the logging.yml file to send Apache Flink logs to New Relic:

logs:
  - name: flink-<REST_OF_THE_FILE_NAME>.log
    file: <FLINK-DIRECTORY>/log/flink-<REST_OF_THE_FILE_NAME>.log
    attributes:
      logtype: flink_logs
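
Flink writes one log file per component and host and rotates them, so instead of listing each file you can match them all with a wildcard pattern, which the infrastructure agent's file parameter supports. A sketch (the rule name flink-logs is an arbitrary label of our choosing):

logs:
  # `flink-logs` is an arbitrary rule label; the glob matches every Flink component log.
  - name: flink-logs
    file: <FLINK-DIRECTORY>/log/flink-*.log
    attributes:
      logtype: flink_logs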

Find and use data

Data from this integration can be found by going to: one.newrelic.com > Infrastructure > Third-party services > Apache Flink.

Apache Flink data is ingested as Dimensional Metrics. You can query this data for troubleshooting purposes or to create custom charts and dashboards.

You can use NRQL to query your data. For example, if you want to view the total number of completed checkpoints on New Relic's Query Builder, use this NRQL query:

FROM Metric SELECT latest(flink_jobmanager_job_numberOfCompletedCheckpoints) AS 'Number of Completed Checkpoints'
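
To chart that metric over time instead of reading a single value, append a TIMESERIES clause:

FROM Metric SELECT latest(flink_jobmanager_job_numberOfCompletedCheckpoints) AS 'Number of Completed Checkpoints' TIMESERIES AUTO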

Apache Flink configuration options

The Apache Flink integration collects both metrics and inventory information. This table provides a description for each config setting and whether it applies to metrics, inventory, or both.

| Setting | Description | Default | Applies to |
| --- | --- | --- | --- |
| STATUS_URL | The URL set up to provide the metrics using the status module. | http://127.0.0.1/server-status?auto | Metrics, inventory |
| REMOTE_MONITORING | Enable multi-tenancy monitoring. | true | Metrics, inventory |
| BINARY_PATH | Set location of the Apache Flink binary file. | [None] | Inventory |
| CA_BUNDLE_FILE | Alternative certificate authority bundle file. | [None] | Metrics |
| CA_BUNDLE_DIR | Alternative certificate authority bundle directory. | [None] | Metrics |
| VALIDATE_CERTS | Set to false if the status URL is HTTPS with a self-signed certificate. | true | Metrics |
| METRICS | Set to true to enable metrics-only collection. | false | |
| INVENTORY | Set to true to enable inventory-only collection. | false | |

Example configurations

Here are some example YAML configurations:
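
For instance, a metrics-only setup pointed at the default status URL could look like the sketch below. The entry name nri-flink and the file layout are assumptions for illustration; match them to your actual installation:

integrations:
  - name: nri-flink    # hypothetical integration entry name
    env:
      # Collect metrics only (see the METRICS setting above).
      METRICS: "true"
      # Endpoint providing metrics via the status module; the table above shows the default.
      STATUS_URL: http://127.0.0.1/server-status?auto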

Labels

You can further decorate your metrics using labels. Labels allow you to add attributes (key/value pairs) to your metrics, which you can then use to query, filter, or group your metrics.

Our default sample config file includes examples of labels but, because they're not mandatory, you can remove, modify, or add new ones of your choice. The following example adds the attributes env: production and role: load_balancer to reported metrics.

labels:
  env: production
  role: load_balancer
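
In an integration config file, labels sit at the level of the integration entry. A sketch reusing the hypothetical nri-flink entry from the earlier example:

integrations:
  - name: nri-flink    # hypothetical entry name, as above
    env:
      METRICS: "true"
    labels:
      env: production
      role: load_balancer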

Metric data

The Apache Flink integration collects the following metric data attributes. Each metric name is prefixed with a category indicator and a period, such as net. or server.

| Name | Description |
| --- | --- |
| net.bytesPerSecond | Rate of the number of bytes served, in bytes per second. |
| net.requestsPerSecond | Rate of the number of client requests, in requests per second. |
| server.busyWorkers | Current number of busy workers. |
| server.idleWorkers | Current number of idle workers. |
| server.scoreboard.closingWorkers | Current number of workers closing TCP connection after serving the response. |
| server.scoreboard.dnsLookupWorkers | Current number of workers performing a DNS lookup. |
| server.scoreboard.finishingWorkers | Current number of workers gracefully finishing. |
| server.scoreboard.idleCleanupWorkers | Current number of idle workers ready for cleanup. |
| server.scoreboard.keepAliveWorkers | Current number of workers maintaining a keep-alive connection. |
| server.scoreboard.loggingWorkers | Current number of workers that are logging. |
| server.scoreboard.readingWorkers | Current number of workers reading requests (headers or body). |
| server.scoreboard.startingWorkers | Current number of workers that are starting up. |
| server.scoreboard.totalWorkers | Total number of workers available. Workers that are not needed to process requests may not be started. |
| server.scoreboard.writingWorkers | Current number of workers that are writing. |

Inventory data

Inventory data captures the version numbers from running Apache Flink and from all loaded Apache Flink modules. Those version numbers are added under the config/Apache Flink namespace. For more about inventory data, see Understand data.

System metadata

Besides the standard attributes collected by the infrastructure agent, the integration collects inventory data associated with the ApacheFlinkSample event type:

| Name | Description |
| --- | --- |
| software.version | The version of the Apache Flink server. Example: Apache Flink/2.4.7 (Ubuntu). |

Troubleshooting