You are getting error messages for the New Relic Kubernetes integration in your New Relic Infrastructure logs.
Below are some solutions to the most common Kubernetes integration errors. These errors will show up in the standard (non-verbose) Infrastructure agent logs. If you need more detailed logs (for example, when working with New Relic support), see Kubernetes logging.
- Invalid New Relic license
If the license you are using is invalid then you will see an error like this in the logs:
2018-04-09T14:20:17.750893186Z time="2018-04-09T14:20:17Z" level=error msg="metric sender can't process 0 times" error="InventoryIngest: events were not accepted: 401 401 Unauthorized Invalid license key."
To resolve this problem make sure you specify a valid license key. The key should be surrounded with quotes and no leading or trailing spaces. Example:
- name: "NRIA_LICENSE_KEY" value: "1234567890abcdefghijklmnopqrstuvwxyz1234"
- Error sending events
If the agent is not able to connect to New Relic servers you will see an error like the following in the logs:
2018-04-09T18:16:35.497195185Z time="2018-04-09T18:16:35Z" level=error msg="metric sender can't process 1 times" error="Error sending events: Post https://staging-infra-api.newrelic.com/metrics/events/bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Depending on the exact nature of the error the message in the logs may differ.
To address this problem, see the New Relic networks documentation.
- Failed to discover kube-state-metrics
The Kubernetes integration requires kube-state-metrics. If that is not found, you will see an error like the following in the newrelic-infra container logs:
2018-04-11T08:02:41.765236022Z time="2018-04-11T08:02:41Z" level=error msg="executing data source" data prefix="integration/com.newrelic.kubernetes" error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-04-11T08:02:41Z\" level=fatal msg=\"failed to discover kube-state-metrics endpoint, got error: no service found by label k8s-app=kube-state-metrics\"\n"
Common reasons for this error include:
- kube-state-metrics has not been deployed into the cluster.
- kube-state-metrics is deployed using a custom deployment.
- There are multiple versions of kube-state-metrics running and the Kubernetes integration is not finding the correct one.
The Kubernetes integration automatically discovers kube-state-metrics in your cluster using this logic:
- It looks for a kube-state-metrics service running on the kube-system namespace.
- If that is not found, it looks for a service tagged with label
The integration also requires the kube-state-metrics pod to have the label
app: kube-state-metrics. If neither of those are found, there will be a log entry like the following:
2018-04-11T09:25:00.825532798Z time="2018-04-11T09:25:00Z" level=error msg="executing data source" data prefix="integration/com.newrelic.kubernetes" error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-04-11T09:25:00Z\" level=fatal msg=\"failed to discover nodeIP with kube-state-metrics, got error: no pod found by label k8s-app=kube-state-metrics\"\n
To solve this issue, add the
k8s-app=kube-state-metricslabel to the kube-state-metrics pod.
- Missing metrics for Namespaces, Deployments, and ReplicaSets
If metrics for Kubernetes nodes, pods, and containers are showing but metrics for namespaces, deployments and replica sets are missing, the Kubernetes integration is not able to connect to kube-state-metrics.
- Indicators of missing data
Indicators of missing namespace, deployment, and replicate set data:
In the # of K8s objects chart, that data is missing:
In Insights, queries for
K8sReplicasetSampledon't show any data:
There are two possible reasons for this:
kube-state-metrics service has been customized to listen on port 80. If that is the case, you may see an error like the following in the
time="2018-04-04T09:35:47Z" level=error msg="executing data source" data prefix="integration/com.newrelic.kubernetes" error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-04-04T09:35:47Z\" level=fatal msg=\"Non-recoverable error group: error querying KSM. Get http://kube-state-metrics.kube-system.svc.cluster.local:0/metrics: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\"\n"
This is a known problem that happens in some clusters where it takes too much time for kube-state-metrics to collect all the cluster's information before sending to the integration.
As a workaround, increase the kube-state-metrics client timeout.
kube-state-metrics instance is running behind kube-rbac-proxy. New Relic does not currently support this configuration. You may see an error like the following in the
time="2018-03-28T23:09:12Z" level=error msg="executing data source" data prefix="integration/com.newrelic.kubernetes" error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-03-28T23:09:12Z\" level=fatal msg=\"Non-recoverable error group: error querying KSM. Get http://192.168.132.37:8443/metrics: net/http: HTTP/1.x transport connection broken: malformed HTTP response \\\"\\\\x15\\\\x03\\\\x01\\\\x00\\\\x02\\\\x02\\\"\"\n"
- Cannot list pods at the cluster scope
Newrelic pods and newrelic service account are not deployed in the same namespace. This is usually because the current context specifies a namespace. If this is the case, you will see an error like the following:
time=\"2018-05-31T10:55:39Z\" level=panic msg=\"p ods is forbidden: User \\\"system:serviceaccount:kube-system:newrelic\\\" cannot list pods at the cluster scope\"
To check to see if this is the case, run:
kubectl describe serviceaccount newrelic | grep Namespace kubectl get pods -l name=newrelic-infra --all-namespaces kubectl config get-contexts
To resolve this problem, change the namespace for the service account in the New Relic daemonset yaml file to be the same as the namespace for the current context:
- kind: ServiceAccount name: newrelic namespace: default ---