Apache Flink monitoring integration
With our Apache Flink dashboard, you can easily track your logs, keep an eye on your instrumentation sources, and get an overview of uptime and downtime for all your app instances. Built with our infrastructure agent and our Prometheus OpenMetrics integration, the Flink integration takes advantage of OpenMetrics endpoint scraping, so you can view all your most important data in one place.
![Apache Flink dashboard landing page A screenshot of a dashboard with Apache Flink metrics.](/images/dashboards_screenshot-full_apache-flink-quickstart.webp)
After setting up Flink with New Relic, your data will display in dashboards like these, right out of the box.
Install the infrastructure agent
To use the Apache Flink integration, you also need to install the infrastructure agent on the same host. The infrastructure agent monitors the host itself, while the integration you'll install in the next step extends your monitoring with specific data such as database and instance metrics.
Configure Apache Flink to expose metrics
Ensure your Flink instance contains the `flink-metrics-prometheus-VERSION.jar` file in the following path: `FLINK-DIRECTORY/plugins/metrics-prometheus/`. Move it to that location if it isn't already there.

Then update the Flink configuration file to expose metrics on ports 9250 to 9260: open (or create) the config file `FLINK-DIRECTORY/conf/flink-conf.yaml` and paste in the following content:

```yaml
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.host: localhost
metrics.reporter.prom.port: 9250-9260
```
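The jar placement step can be sketched as the shell commands below. This is an illustrative sketch only: a temporary directory stands in for your real `FLINK-DIRECTORY`, and the jar name with version `1.16.0` is a stand-in for the one shipped with your Flink distribution (typically found under `opt/`).

```shell
# Stand-in for FLINK-DIRECTORY; substitute your real install path.
FLINK_DIR=$(mktemp -d)
mkdir -p "$FLINK_DIR/opt" "$FLINK_DIR/plugins/metrics-prometheus"

# Stand-in jar; in a real install this ships with the Flink distribution.
touch "$FLINK_DIR/opt/flink-metrics-prometheus-1.16.0.jar"

# Copy the Prometheus reporter jar into the plugins directory Flink loads from.
cp "$FLINK_DIR"/opt/flink-metrics-prometheus-*.jar "$FLINK_DIR/plugins/metrics-prometheus/"

# Confirm the jar is in place.
ls "$FLINK_DIR/plugins/metrics-prometheus/"
```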
Check for metrics
Use the following command to start an Apache Flink cluster:

```bash
./bin/start-cluster.sh
```

Then check for metrics on the following URLs:

- Job manager metrics: `http://YOUR_DOMAIN:9250`
- Task manager metrics: `http://YOUR_DOMAIN:9251`
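When you hit those URLs, the endpoints serve plain-text metrics in the Prometheus exposition format; in practice you would run `curl -s http://YOUR_DOMAIN:9250` and look for lines beginning with `flink_`. The sketch below uses illustrative sample lines (not real output from your cluster) to show what to expect and how to count Flink metric lines:

```shell
# Illustrative sample of Prometheus exposition output from a Flink endpoint.
# Metric names and values here are examples, not real cluster data.
sample='# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers 1
flink_jobmanager_job_numberOfCompletedCheckpoints 3'

# Count the metric lines (lines starting with the flink_ prefix).
printf '%s\n' "$sample" | grep -c '^flink_'
```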
Configure Prometheus for Apache Flink
Create the following file if it doesn't exist: `/etc/newrelic-infra/integrations.d/nri-prometheus-config.yml`. Paste in the following information, updating `cluster_name` and `urls`:

```yaml
integrations:
  - name: nri-prometheus
    config:
      standalone: false
      # Defaults to true. When standalone is set to `false`, `nri-prometheus` requires an infrastructure agent to send data.
      emitters: infra-sdk
      # When running with the infrastructure agent, emitters must include infra-sdk.
      cluster_name: "YOUR_DESIRED_CLUSTER_NAME"
      # Match the name of your cluster with the name seen in New Relic.
      targets:
        - description: "YOUR_DESIRED_DESCRIPTION_HERE"
          urls: ["http://YOUR_DOMAIN:9250", "http://YOUR_DOMAIN:9251"]
          # tls_config:
          #   ca_file_path: "/etc/etcd/etcd-client-ca.crt"
          #   cert_file_path: "/etc/etcd/etcd-client.crt"
          #   key_file_path: "/etc/etcd/etcd-client.key"
      verbose: false
      # Defaults to false. Determines whether the integration runs in verbose mode.
      audit: false
      # Defaults to false and does not include verbose mode. Audit mode logs the uncompressed data sent to New Relic and can lead to a high log volume.
      # scrape_timeout: "YOUR_TIMEOUT_DURATION"
      # `scrape_timeout` is not mandatory and defaults to 30s. The HTTP client timeout when fetching data from endpoints.
      scrape_duration: "5s"
      # worker_threads: 4
      # `worker_threads` is not mandatory and defaults to `4` for clusters with more than 400 endpoints. Slowly increase the worker threads until scrape time falls within the desired `scrape_duration`. Note: increasing this value too much can result in huge memory consumption if too many metrics are scraped at once.
      insecure_skip_verify: false
      # Defaults to false. Determines whether the integration should skip TLS verification.
    timeout: 10s
```
Forward your Apache Flink logs
You can use our log forwarding documentation to forward application-specific logs to New Relic.

After installing the infrastructure agent on your Linux machines, a log configuration file named `logging.yml` should be present in this path: `/etc/newrelic-infra/logging.d/`. If you don't see the file in that path, create it by following the log forwarding documentation above.

Here is an example of what a Flink log file name looks like:

```
flink-u1-taskexecutor-0-u1-VirtualBox.log
```

Add the following script to the `logging.yml` file to send Apache Flink logs to New Relic:
```yaml
logs:
  - name: flink-<REST_OF_THE_FILE_NAME>.log
    file: <FLINK-DIRECTORY>/log/flink-<REST_OF_THE_FILE_NAME>.log
    attributes:
      logtype: flink_logs
```
Find and use data
Data from this integration can be found by going to: one.newrelic.com > Infrastructure > Third-party services > Apache Flink.
Apache Flink data is ingested as Dimensional Metrics. You can query this data for troubleshooting purposes or to create custom charts and dashboards.
You can use NRQL to query your data. For example, if you want to view the total number of completed checkpoints on New Relic's Query Builder, use this NRQL query:
```sql
FROM Metric SELECT latest(flink_jobmanager_job_numberOfCompletedCheckpoints) AS 'Number of Completed Checkpoints'
```
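You can also chart the same metric over time. The following is a sketch using the metric name from the query above; the `TIMESERIES` clause and time window are optional additions you can adjust to taste:

```sql
FROM Metric SELECT latest(flink_jobmanager_job_numberOfCompletedCheckpoints) SINCE 30 minutes ago TIMESERIES AUTO
```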
Apache Flink configuration options
The Apache Flink integration collects both metrics and inventory information. This table provides a description for each config setting and whether it applies to metrics, inventory, or both.
| Setting | Description | Default | Applies to |
|---|---|---|---|
| | The URL set up to provide the metrics using the status module. | | Metrics, inventory |
| | Enable multi-tenancy monitoring. | | Metrics, inventory |
| | Set location of the Apache Flink binary file. | [None] | Inventory |
| | Alternative certificate authority bundle file. | [None] | Metrics |
| | Alternative certificate authority bundle directory. | [None] | Metrics |
| | Set to | | Metrics |
| | Set to | | |
| | Set to | | |
Example configurations
Here are some example YAML configurations:
Labels
You can further decorate your metrics using labels. Labels allow you to add attributes (key/value pairs) to your metrics, which you can then use to query, filter, or group your metrics.
Our default sample config file includes examples of labels but, because they're not mandatory, you can remove, modify, or add new ones of your choice. The following example adds the attributes `env: production` and `role: load_balancer` to reported metrics.
```yaml
labels:
  env: production
  role: load_balancer
```
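Once applied, labels appear as attributes on each reported metric, so you can filter or facet by them in NRQL. A sketch, assuming the `env` and `role` labels from the example above:

```sql
FROM Metric SELECT count(*) WHERE env = 'production' FACET role SINCE 1 hour ago
```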
Metric data
The Apache Flink integration collects the following metric data attributes. Each metric name is prefixed with a category indicator and a period, such as `net.` or `server.`.
| Name | Description |
|---|---|
| | Rate of the number of bytes served, in bytes per second. |
| | Rate of the number of client requests, in requests per second. |
| | Current number of busy workers. |
| | Current number of idle workers. |
| | Current number of workers closing a TCP connection after serving the response. |
| | Current number of workers performing a DNS lookup. |
| | Current number of workers gracefully finishing. |
| | Current number of idle workers ready for cleanup. |
| | Current number of workers maintaining a keep-alive connection. |
| | Current number of workers that are logging. |
| | Current number of workers reading requests (headers or body). |
| | Current number of workers that are starting up. |
| | Total number of workers available. Workers that are not needed to process requests may not be started. |
| | Current number of workers that are writing. |
Inventory data
Inventory data captures the version numbers from the running Apache Flink instance and from all loaded Apache Flink modules. Those version numbers are added under the `config/Apache Flink` namespace. For more about inventory data, see Understand data.
System metadata
Besides the standard attributes collected by the infrastructure agent, the integration collects inventory data associated with the `Apache FlinkSample` event type:
| Name | Description |
|---|---|
| | The version of the Apache Flink server. Example: |