Configure Prometheus OpenMetrics integrations in large Kubernetes environments

CPU and memory limits and requests can vary according to the number of targets monitored, and the number of metrics exposed by each target. For example, a Prometheus OpenMetrics integration which scrapes 800 targets, exposing 1000 timeseries each, with a latency of 150ms and a scrape_duration of 30 seconds, consumes 2.5CPU and 700MB of RAM.

Configure the integration for large environments

To estimate the size of the environment you are monitoring, run the following query to see how many targets are being scraped:

SELECT latest(nr_stats_targets) FROM Metric where clusterName=’clusterName’ SINCE 30 MINUTES AGO TIMESERIES

In huge environments with hundreds of targets to be scraped, the latency on the /metrics endpoints must be below 1 second. Run this query to check the latency of the different targets. This query retrieves the data exposed by the Prometheus OpenMetrics integration, and shows the time required to fetch each endpoint.

SELECT average(nr_stats_integration_fetch_target_duration_seconds) FROM Metric where clusterName=’clustername' SINCE 30 MINUTES AGO FACET target LIMIT 30

In order to keep the time needed to scrape all the targets below 30 seconds, use the following configurations:

Targets Configuration
Targets < 400, with 1000 metrics each No modification is required. CPU ranges roughly between 0.1 and 1.5 cores, and the memory required should be no more than 256MB.
400 < targets < 1000, with 1000 metrics each

The number of workers should be increased to 6-8. CPU ranges roughly between 1.5 and 3.5 cores, and the memory required is around 100MB.

Targets > 1000, with 1000 metrics each The number of workers should be increased to 10 or more. CPU is over 3.5 cores, and the memory required is around 1GB or more.

For more help

If you need more help, check out these support and learning resources: