You have a device that collects the expected data sometimes, but there are inconsistent gaps in the charts.
This usually happens when a device can't reliably respond to the SNMP requests within the timeout period because the network between the
ktranslate container and the polled device is experiencing bandwidth contention, packet loss, or high latency.
Another scenario might be that the device is overloaded and can't respond to the SNMP requests quickly. This usually happens when you try to collect OIDs from very large tables with a
poll_time_sec that is too short for the device to keep up with.
As a general rule, locate your polling container as close to the monitored devices as you can to reduce the chance of the UDP SNMP payloads not making the trip.
In cases where you must poll across WAN links with more latency, you may need to edit the
snmp-base.yaml file to increase the
timeout_ms from the default 5000ms to a longer interval.
If you believe you may have an unreliable connection to the remote site, consider increasing the
retries from the default of 0. A high number of retries combined with a timeout that is too short is unlikely to improve your situation, and it can increase the load on the monitored device: the device keeps trying to answer more requests, but none of its responses get back before the timeout expires.
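As a rough illustration, the two settings described above might be raised for a single device polled across a high-latency link. This is a minimal sketch, not a recommended configuration: the device name, IP address, and chosen values are placeholders, other required device fields are omitted, and the per-device placement alongside a global section is an assumption you should check against your own snmp-base.yaml file.

```yaml
# Sketch only: values are illustrative, not recommendations.
devices:
  wan_router_example:            # hypothetical device entry
    device_name: wan_router_example
    device_ip: 192.0.2.10        # placeholder address from the documentation range
    timeout_ms: 10000            # longer than the 5000 ms default, for a high-latency WAN link
    retries: 2                   # retry lost requests; the default is 0
global:
  timeout_ms: 5000               # default timeout applied to other devices
  retries: 0                     # default retry count applied to other devices
```

Raising the timeout first and adding only a small number of retries keeps the number of extra requests low, so a slow device is not asked to answer more often than it can manage.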
If you're polling large tables of data from devices, such as a busy load balancer, then it can take some additional time for the monitored device to gather the required data for the response. This can require setting a longer
timeout_ms period and setting a longer delay for