KTranslate Docker container health monitoring

While running the KTranslate Docker container for New Relic network monitoring, you can monitor the health of the container to proactively detect potential issues.

The KTranslate container image accepts the -tee_logs=true and -metrics=jchf settings at runtime, which let it send its own logs and health metrics directly to New Relic. Both are enabled by default when you install network monitoring through the New Relic guided install. We recommend that you also set them when you install network monitoring manually.

Logs from KTranslate

Tip

If you want to check the logs locally from the Docker host, run docker logs $CONTAINER_NAME. For example, docker logs ktranslate-snmp.

Tip

If you installed KTranslate from the Linux package, check the logs locally by running journalctl -u ktranslate.

The -tee_logs=true option sends logs to New Relic when polling devices. To see them, do the following:

Go to one.newrelic.com > All capabilities > Logs.
In Find logs where, enter collector.name:"ktranslate" and click Query logs.

Common log searches

Below are some common searches that can be used during troubleshooting to gather data for support:

What version of KTranslate am I running?

Logs UI:

collector.name:"ktranslate" message:"*KTranslate Running -- Version*"

NRQL:

FROM Log SELECT * WHERE `collector.name` = 'ktranslate' AND `message` LIKE '%KTranslate Running -- Version%'

Expected Results:

KTranslate Running -- Version kt-2021-12-06-1546870234; Build Mon Dec  6 22:22:56 UTC 2021
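If you are reading logs locally rather than in New Relic, the same version string can be pulled out of the raw output. The following is a minimal sketch, assuming the startup line has the same shape as the expected result above; the function name kt_version is hypothetical and not part of KTranslate.

```shell
# Hypothetical helper (not part of KTranslate): read log lines on stdin and
# print the version token that follows "KTranslate Running -- Version".
# Assumes the token is terminated by a semicolon, as in the sample above.
kt_version() {
  sed -n 's/.*KTranslate Running -- Version \([^;]*\);.*/\1/p'
}
```

One way to use it from the Docker host is docker logs ktranslate-snmp 2>&1 | kt_version. Lines that do not match are ignored, so no output means the startup line was not found in the logs you piped in.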

What arguments were passed to Docker at runtime?

Logs UI:

collector.name:"ktranslate" message:"*KTranslate CLI:*"

NRQL:

FROM Log SELECT * WHERE `collector.name` = 'ktranslate' AND `message` LIKE '%KTranslate CLI:%'

Expected Results:

KTranslate CLI: [ktranslate -listen off -mapping /etc/ktranslate/config.json -geo /etc/ktranslate/GeoLite2-Country.mmdb -udrs /etc/ktranslate/udr.csv -api_devices /etc/ktranslate/devices.json -asn /etc/ktranslate/GeoLite2-ASN.mmdb -log_level info -snmp /snmp-base.yaml -nr_account_id=2583772 -log_level=info -metrics=jchf -tee_logs=true -service_name=snmp nr1.snmp]
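The CLI line is a quick way to confirm that the two health-monitoring settings described at the top of this page were actually passed to the container. The following is a minimal sketch, assuming the flags appear in the flag=value form shown in the expected result; the function name kt_check_health_flags is hypothetical.

```shell
# Hypothetical helper (not part of KTranslate): read a "KTranslate CLI: [...]"
# log line on stdin and report whether each health-monitoring flag is present.
# Assumes flags are written as -name=value and separated by single spaces.
kt_check_health_flags() {
  cli=$(cat)
  for flag in -tee_logs=true -metrics=jchf; do
    case " $cli " in
      # Match the flag as a whole word, or as the last item before "]".
      *" $flag "*|*" $flag]"*) printf '%s: present\n' "$flag" ;;
      *) printf '%s: missing\n' "$flag" ;;
    esac
  done
}
```

For example, docker logs ktranslate-snmp 2>&1 | grep 'KTranslate CLI:' | kt_check_health_flags prints one line per flag. A flag reported as missing suggests the container was started manually without it.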

What errors am I experiencing?

Without a parsing rule applied to your logs

Logs UI:

collector.name:"ktranslate" message:-*\[Info\]*

NRQL:

FROM Log SELECT * WHERE `collector.name` = 'ktranslate' AND `message` NOT LIKE '%[Info]%'

With a parsing rule applied to your logs

Logs UI:

collector.name:"ktranslate" severity:-"Info"

NRQL:

FROM Log SELECT * WHERE `collector.name` = 'ktranslate' AND `severity` != 'Info'

Expected Results:

KTranslate>cisco-7513 There was an SNMP polling error with the CustomDeviceMetrics walking OID .1.3.6.1.2.1.4.31.1.1.21 after 0 retries: request timeout (after 0 retries).

Tip

KTranslate has the following log severity levels: Info, Warn, and Error.
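The same "everything except Info" filter can be applied to local log output. The following is a minimal sketch, assuming raw log lines carry a bracketed severity tag such as [Info], as the search without a parsing rule implies; the function name kt_non_info is hypothetical.

```shell
# Hypothetical helper (not part of KTranslate): read log lines on stdin and
# drop every line tagged [Info], leaving Warn and Error lines.
# -F treats the pattern as a fixed string, so the brackets need no escaping.
kt_non_info() {
  grep -v -F '[Info]'
}
```

For example, docker logs ktranslate-snmp 2>&1 | kt_non_info on the Docker host, or journalctl -u ktranslate | kt_non_info for the Linux package.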

Is my match_attributes filter working on my device?

Logs UI:

collector.name:"ktranslate" message:"*Match Attribute*"

NRQL:

FROM Log SELECT * WHERE `collector.name` = 'ktranslate' AND `message` LIKE '%Match Attribute%'

Expected Results:

KTranslate>cisco-7513 Added 1 Match Attribute(s)

Every device should report at least 1 match attribute, inherited from the default monitor_admin_shut: true configuration. A device to which you have added a single match attribute should therefore report a value of 2.
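To compare the reported counts across devices in local logs, the device name and count can be extracted from each matching line. The following is a minimal sketch, assuming lines follow the shape of the expected result above and that device names contain no spaces; the function name kt_match_attributes is hypothetical.

```shell
# Hypothetical helper (not part of KTranslate): read log lines on stdin and
# print "<device> <count>" for every "Added N Match Attribute(s)" line.
# Assumes the device name directly follows "KTranslate>" and has no spaces.
kt_match_attributes() {
  sed -n 's/.*KTranslate>\([^ ]*\) Added \([0-9][0-9]*\) Match Attribute(s).*/\1 \2/p'
}
```

For example, docker logs ktranslate-snmp 2>&1 | kt_match_attributes. A device that shows 1 has only the inherited default, which may mean a custom match attribute was not picked up.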

Tip

You can further filter these results by adding the device name to your query: collector.name:"ktranslate" message:"*$DEVICE_NAME*Match Attribute*".

Metrics from KTranslate

The -metrics option captures the following performance metrics when polling devices:

Metric	Granularity	Description
`baseserver_healthcheck_execution_total`	Top Level	Rate of internal health checks. Mainly confirms that the process is not deadlocked; it should always be greater than 0.
`inputq`	Top Level	Messages per second (msg/sec) received over the last 60 seconds from all SNMP, Flow, and VPC inputs combined.
`jchfq`	Top Level	Gauge of the number of available pre-allocated buffers. It should be about 8,000.
`delivery_metrics_nr`	Delivery to New Relic	Batches per second (batches/sec) sent over the last 60 seconds for all metrics to New Relic.
`delivery_logs_nr`	Delivery to New Relic	Logs per second (logs/sec) sent over the last 60 seconds for all logs to New Relic.
`delivery_wins_nr`	Delivery to New Relic	Wins per second (wins/sec) of 200 HTTP codes received over the last 60 seconds from sending metrics and events to New Relic.
`device_metrics`	SNMP	Polls per second (polls/sec) of SNMP polling over the last 60 seconds for device level metrics.
`interface_metrics`	SNMP	Polls per second (polls/sec) of SNMP polling over the last 60 seconds for interface level metrics.
`snmp_fail`	SNMP	Gauge that shows whether SNMP polling is working, faceted by `device_name`. A value of 1 means good and 2 means fail.
`netflow.flows`	Netflow	Flows per second (fps) received over the last 60 seconds for all device flow data: IPFIX, NetFlow, or sFlow.
`syslog_queue`	Syslog	Gauge of syslog messages waiting to be processed.
`syslog_errors`	Syslog	Errors per second (errors/sec) over the last 60 seconds while processing syslog messages.
`syslog_messages`	Syslog	Messages per second (msg/sec) received over the last 60 seconds for all syslog data.

Common metrics searches

To see these metrics in New Relic:

Go to one.newrelic.com > All capabilities > Query your data.
Enter one of the following NRQL queries:

What is the health of my KTranslate application?

FROM Metric
SELECT
latest(kentik.ktranslate.chf.kkc.baseserver_healthcheck_execution_total) AS 'healthcheck_total',
latest(kentik.ktranslate.chf.kkc.inputq) AS 'input_per_second',
latest(kentik.ktranslate.chf.kkc.jchfq) AS 'buffer'
FACET host AS 'docker_host', svc AS 'container_service'
WHERE provider = 'kentik-agent'
AND instrumentation.name = 'heartbeat'

What is the health of my deliveries to New Relic?

FROM Metric
SELECT
latest(kentik.ktranslate.chf.kkc.delivery_metrics_nr) AS 'delivery_metric_batches_per_second',
latest(kentik.ktranslate.chf.kkc.delivery_logs_nr) AS 'delivery_logs_per_second',
latest(kentik.ktranslate.chf.kkc.delivery_wins_nr) AS 'delivery_wins_per_second'
FACET host AS 'docker_host', svc AS 'container_service'
WHERE provider = 'kentik-agent'
AND instrumentation.name = 'heartbeat'

What is the health of my SNMP collection overall?

FROM Metric
SELECT
latest(kentik.ktranslate.chf.kkc.device_metrics) AS 'device_polls_per_second',
latest(kentik.ktranslate.chf.kkc.interface_metrics) AS 'interface_polls_per_second'
FACET host AS 'docker_host', svc AS 'container_service'
WHERE provider = 'kentik-agent'
AND instrumentation.name = 'heartbeat'

What devices are failing SNMP collection?

SELECT
max(snmp_fail)
FROM (
  FROM Metric
  SELECT
  latest(kentik.ktranslate.chf.kkc.snmp_fail) AS 'snmp_fail'
  FACET host AS 'docker_host', svc AS 'container_service', device_name AS 'snmp_device'
  WHERE provider = 'kentik-agent'
  AND instrumentation.name = 'heartbeat'
)
FACET docker_host, container_service, snmp_device
WHERE snmp_fail = 2

What is the health of my flow data collection?

FROM Metric
SELECT
max(kentik.ktranslate.chf.kkc.netflow) AS 'flows_per_second'
FACET host AS 'docker_host', svc AS 'container_service', device_name AS 'flow_device'
WHERE provider = 'kentik-agent'
AND instrumentation.name = 'heartbeat'

What is the health of my syslog collection?

FROM Metric
SELECT
latest(kentik.ktranslate.chf.kkc.syslog_queue) AS 'syslog_queue_total',
latest(kentik.ktranslate.chf.kkc.syslog_errors) AS 'syslog_errors_per_second',
latest(kentik.ktranslate.chf.kkc.syslog_messages) AS 'syslog_messages_per_second'
FACET host AS 'docker_host', svc AS 'container_service'
WHERE provider = 'kentik-agent'
AND instrumentation.name = 'heartbeat'

