Kubernetes integration troubleshooting: Error messages

Problem

You are getting error messages for the New Relic Kubernetes integration in your New Relic Infrastructure logs.

Solution

Below are some solutions to the most common Kubernetes integration errors. These errors will show up in the standard (non-verbose) Infrastructure agent logs. If you need more detailed logs (for example, when working with New Relic support), see Kubernetes logging.

Invalid New Relic license

If the license key you are using is invalid, you will see an error like this in the logs:

2018-04-09T14:20:17.750893186Z time="2018-04-09T14:20:17Z" level=error 
msg="metric sender can't process 0 times" error="InventoryIngest: events
 were not accepted: 401 401 Unauthorized Invalid license key."

To resolve this problem, make sure you specify a valid license key. The key should be surrounded by quotes, with no leading or trailing spaces. Example:

- name: "NRIA_LICENSE_KEY"
  value: "1234567890abcdefghijklmnopqrstuvwxyz1234"

Error sending events

If the agent is not able to connect to New Relic servers, you will see an error like the following in the logs:

2018-04-09T18:16:35.497195185Z time="2018-04-09T18:16:35Z" level=error 
msg="metric sender can't process 1 times" error="Error sending events: 
Post https://staging-infra-api.newrelic.com/metrics/events/bulk: 
net/http: request canceled (Client.Timeout exceeded while awaiting headers)"

Depending on the exact nature of the error, the message in the logs may differ.

To address this problem, see the New Relic networks documentation.
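
If you need to test connectivity from inside the cluster, one quick check is to run a temporary pod and request the endpoint that appears in your error message. The command below is only a sketch: the curlimages/curl image and the infra-api.newrelic.com endpoint are examples, and the exact endpoint and any proxy settings depend on your environment:

# Launch a throwaway pod and attempt an HTTPS request to the ingest endpoint
kubectl run nr-connectivity-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- curl -v https://infra-api.newrelic.com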

Failed to discover kube-state-metrics

The Kubernetes integration requires kube-state-metrics. If that is not found, you will see an error like the following in the newrelic-infra container logs:

2018-04-11T08:02:41.765236022Z time="2018-04-11T08:02:41Z" level=error 
msg="executing data source" data prefix="integration/com.newrelic.kubernetes" 
error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-04-11T08:02:41Z\" 
level=fatal msg=\"failed to discover kube-state-metrics endpoint, 
got error: no service found by label k8s-app=kube-state-metrics\"\n"

Common reasons for this error include:

  • kube-state-metrics has not been deployed into the cluster.
  • kube-state-metrics is deployed using a custom deployment.
  • There are multiple versions of kube-state-metrics running and the Kubernetes integration is not finding the correct one.

The Kubernetes integration automatically discovers kube-state-metrics in your cluster using this logic:

  1. It looks for a kube-state-metrics service running in the kube-system namespace.
  2. If that is not found, it looks for a service tagged with label "k8s-app: kube-state-metrics".
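
You can check both discovery paths manually with kubectl. The commands below are a sketch that assumes you have kubectl access to the cluster where the integration runs:

# Step 1: look for a kube-state-metrics service in the kube-system namespace
kubectl get services -n kube-system | grep kube-state-metrics

# Step 2: look for a service labeled k8s-app=kube-state-metrics in any namespace
kubectl get services --all-namespaces -l k8s-app=kube-state-metrics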

The integration also requires the kube-state-metrics pod to have the label k8s-app: kube-state-metrics or app: kube-state-metrics. If neither of those labels is found, there will be a log entry like the following:

2018-04-11T09:25:00.825532798Z time="2018-04-11T09:25:00Z" level=error 
msg="executing data source" data prefix="integration/com.newrelic.kubernetes" 
error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-04-11T09:25:00Z\" 
level=fatal msg=\"failed to discover nodeIP with kube-state-metrics, 
got error: no pod found by label k8s-app=kube-state-metrics\"\n"

To solve this issue, add the k8s-app=kube-state-metrics label to the kube-state-metrics pod.
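
For example, if your kube-state-metrics pod only carries the app: kube-state-metrics label, the commands below are one way to inspect and add the missing label (the namespace and selector are assumptions; adjust them to your deployment). Labels added directly to a pod are lost when the deployment recreates it, so for a permanent fix add the label to the pod template in the kube-state-metrics deployment:

# Show the labels currently on the kube-state-metrics pods
kubectl get pods --all-namespaces -l app=kube-state-metrics --show-labels

# Add the label the integration looks for (temporary: lasts only until the pod is recreated)
kubectl label pods -n kube-system -l app=kube-state-metrics k8s-app=kube-state-metrics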

Missing metrics for Namespaces, Deployments, and ReplicaSets

If metrics for Kubernetes nodes, pods, and containers are showing but metrics for namespaces, deployments, and replica sets are missing, the Kubernetes integration is not able to connect to kube-state-metrics.

Indicators of missing data

Indicators of missing namespace, deployment, and replica set data:

  • In the # of K8s objects chart, namespace, deployment, and replica set data is missing.
  • In Insights, queries for K8sNamespaceSample, K8sDeploymentSample, and K8sReplicaSetSample don't show any data.
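
Before digging into the causes below, it can help to confirm that kube-state-metrics itself is up and serving metrics. One way to do this (the service name, namespace, and port are the common defaults and may differ in your cluster) is to port-forward to the service and request its metrics endpoint:

# Forward a local port to the kube-state-metrics service and fetch a few metrics
kubectl port-forward -n kube-system service/kube-state-metrics 8080:8080 &
sleep 2 && curl -s http://localhost:8080/metrics | head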

There are two possible reasons for this:

  1. The kube-state-metrics service has been customized to listen on port 80. If that is the case, you may see an error like the following in the logs:

    time="2018-04-04T09:35:47Z" level=error msg="executing data source" 
    data prefix="integration/com.newrelic.kubernetes" error="exit status 1" 
    plugin name=nri-kubernetes stderr="time=\"2018-04-04T09:35:47Z\" 
    level=fatal msg=\"Non-recoverable error group: error querying KSM. 
    Get http://kube-state-metrics.kube-system.svc.cluster.local:0/metrics: 
    net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\"\n"
    

    The issue is known and will be fixed in a future release.

    As a workaround, change the kube-state-metrics service to listen on a different port (for example, 8080); see this example on Github, or use the kubectl patch sketch after this list.

  2. The kube-state-metrics instance is running behind kube-rbac-proxy. New Relic does not currently support this configuration (a way to check for this is shown after the list). You may see an error like the following in the logs:

    time="2018-03-28T23:09:12Z" level=error msg="executing data source" 
    data prefix="integration/com.newrelic.kubernetes" error="exit status 1" 
    plugin name=nri-kubernetes stderr="time=\"2018-03-28T23:09:12Z\" 
    level=fatal msg=\"Non-recoverable error group: error querying KSM. 
    Get http://192.168.132.37:8443/metrics: net/http: HTTP/1.x 
    transport connection broken: malformed HTTP response \\\"\\\\x15\\\\x03\\\\x01\\\\x00\\\\x02\\\\x02\\\"\"\n"
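
The sketches below cover both cases; the service name, namespace, and labels are the common kube-state-metrics defaults and may differ in your cluster:

# Case 1 workaround: patch the kube-state-metrics service to listen on port 8080
# instead of 80 (edit the manifest instead if you manage it declaratively)
kubectl patch service kube-state-metrics -n kube-system --type=json \
  -p '[{"op": "replace", "path": "/spec/ports/0/port", "value": 8080}]'

# Case 2 check: list the containers in the kube-state-metrics pods; a
# kube-rbac-proxy sidecar indicates the unsupported configuration
kubectl get pods --all-namespaces -l k8s-app=kube-state-metrics \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'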
    
Cannot list pods at the cluster scope

The newrelic pods and the newrelic service account are not deployed in the same namespace, usually because the current context specifies a namespace. If this is the case, you will see an error like the following:

time=\"2018-05-31T10:55:39Z\" level=panic msg=\"p
ods is forbidden: User \\\"system:serviceaccount:kube-system:newrelic\\\" cannot list pods at the cluster scope\"

To check whether this is the case, run:

kubectl describe serviceaccount newrelic | grep Namespace
kubectl get pods -l name=newrelic-infra --all-namespaces
kubectl config get-contexts

To resolve this problem, change the namespace of the service account in the New Relic daemonset YAML file so that it matches the namespace of the current context:

- kind: ServiceAccount
  name: newrelic
  namespace: default
---
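
After updating the manifest, re-apply it and confirm that the service account and the agent pods are now in the same namespace. The file name below is an example; use the manifest you deploy the integration with, and adjust the namespace (default here matches the example above):

kubectl apply -f newrelic-infra.yml
kubectl describe serviceaccount newrelic -n default | grep Namespace
kubectl get pods -l name=newrelic-infra --all-namespaces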

For more help

Recommendations for learning more: