Problem
You are getting error messages for the New Relic Kubernetes integration, either in your terminal during the integration installation or in your New Relic Infrastructure logs after the installation.
If you see the following error message during your manual Kubernetes integration installation:
repo newrelic not found
You may have forgotten or skipped the command that adds the newrelic repository to Helm:
helm repo add newrelic https://helm-charts.newrelic.com
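If you're not sure whether the repository is already configured, a quick check with standard Helm commands (refresh the local index, then list the configured repositories) confirms it before you retry the installation:

# Refresh the local chart index and confirm the newrelic repository is present.
helm repo update
helm repo list | grep newrelic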
If the license key you are using is invalid, you will see an error like the following in the logs of the agent or forwarder containers:
2018-04-09T14:20:17.750893186Z time="2018-04-09T14:20:17Z" level=error msg="metric sender can't process 0 times" error="InventoryIngest: events were not accepted: 401 401 Unauthorized Invalid license key."
To resolve this problem, make sure you specify a valid license key.
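If you installed with Helm, the license key is usually passed through the chart values. A minimal sketch, assuming the nri-bundle chart and its global.licenseKey and global.cluster values (check the values of the chart version you use):

# Reinstall or upgrade the bundle with a valid license key (placeholder values shown).
helm upgrade --install newrelic-bundle newrelic/nri-bundle \
  --namespace newrelic \
  --set global.licenseKey=YOUR_VALID_LICENSE_KEY \
  --set global.cluster=YOUR_CLUSTER_NAME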
If the agent is not able to connect to New Relic servers, you will see an error like the following in the logs of the agent or forwarder containers:
2018-04-09T18:16:35.497195185Z time="2018-04-09T18:16:35Z" level=error msg="metric sender can't process 1 times" error="Error sending events: Post https://staging-infra-api.newrelic.com/metrics/events/bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Depending on the exact nature of the error, the message in the logs may differ.
To address this problem, see the New Relic networks documentation.
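A quick way to confirm whether pods in the cluster can reach New Relic is to launch a throwaway pod and try the endpoint directly. A minimal sketch, assuming outbound HTTPS to infra-api.newrelic.com (use the endpoint for your region and environment from the networks documentation):

# Run a temporary curl pod and attempt an HTTPS connection to the Infrastructure API.
kubectl run nr-netcheck --rm -it --image=curlimages/curl --restart=Never -- \
  curl -sv --max-time 10 https://infra-api.newrelic.com/ -o /dev/null
# A timeout here points at a firewall, proxy, or egress policy blocking the traffic.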
The Kubernetes integration requires kube-state-metrics. If it is not found, you will see an error like the following in the newrelic-infra container logs:
time="2022-06-21T09:12:20Z" level=error msg="retrieving scraper data: retrieving ksm data: discovering KSM endpoints: timeout discovering endpoints"
Common reasons for this error include:
- kube-state-metrics has not been deployed into the cluster (see the check after this list).
- kube-state-metrics is deployed using a custom deployment.
- There are multiple versions of kube-state-metrics running and the Kubernetes integration is not finding the correct one.
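To narrow down which of these applies, list the kube-state-metrics workloads that exist in the cluster. A minimal check with standard kubectl commands:

# Look for kube-state-metrics pods in any namespace using its conventional label.
kubectl get pods --all-namespaces -l app.kubernetes.io/name=kube-state-metrics
# List every kube-state-metrics deployment to spot duplicates or custom installs.
kubectl get deployments --all-namespaces | grep kube-state-metrics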
The Kubernetes integration automatically discovers kube-state-metrics in your cluster, by default using the label "app.kubernetes.io/name=kube-state-metrics" across all namespaces.
You can change the discovery behavior in the ksm.config section of the Helm chart values.
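For example, if your kube-state-metrics deployment uses a different label, port, or a fixed endpoint, discovery can be overridden in the chart values. A minimal sketch; the field names below (selector, scheme, port, staticURL) should be verified against the values.yaml of the chart version you use:

# values.yaml fragment for the nri-kubernetes chart (verify keys against your chart version).
ksm:
  config:
    # Label selector used to discover kube-state-metrics across namespaces.
    selector: "app.kubernetes.io/name=kube-state-metrics"
    scheme: "http"
    # port: 8080
    # Or skip discovery entirely and scrape a fixed endpoint:
    # staticURL: "http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics"

If you install through the nri-bundle chart, these values are typically nested under its newrelic-infrastructure subchart key.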
During the guided install, an output error like the one below indicates that you are experiencing a networking connection issue between the Kubernetes client and Kubernetes API server. Please make sure your Kubernetes client can connect to your Kubernetes API server before you run the guided install again.
Unable to connect to the server: dial tcp [7777:777:7777:7777:77::77]:443: i/o timeout
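Before rerunning the guided install, you can confirm basic connectivity from the machine running the install to the API server with standard kubectl commands:

# Both commands talk to the Kubernetes API server; a hang or timeout here reproduces the issue.
kubectl cluster-info
kubectl get nodes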
During the guided install, an output error like nrk8s-kubelet pod is not starting indicates that the Kubernetes kubelet pod cannot be started within 5 minutes and the installation script exits due to this timeout.
In this case, you can run kubectl get pods -o wide -n newrelic | grep nrk8s-kubelet to see the pod's status and restarts.
- If the pod is in ImagePullBackOff status, please check your network connection to allow image pulling from the domains listed here.
- If the pod is in Pending or ContainerCreating status, please use kubectl logs newrelic-bundle-nrk8s-kubelet-***** -n newrelic and kubectl logs newrelic-bundle-nrk8s-kubelet-***** -n newrelic -c kubelet to figure out the potential reasons from the logs (see also the commands after this list).
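Beyond the logs, the pod's events often state the failure reason directly (failed scheduling, image pull errors, unbound volumes, and so on). The pod name suffix below is a placeholder, as in the commands above:

# Show events and container states for the failing kubelet pod.
kubectl describe pod newrelic-bundle-nrk8s-kubelet-***** -n newrelic
# Recent events in the namespace, sorted by time.
kubectl get events -n newrelic --sort-by=.lastTimestamp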
Solution for integration version v2
Below are some solutions to the most common Kubernetes integration errors. These errors show up in the standard non-verbose Infrastructure agent logs.
If you need more detailed logs (for example, when working with New Relic support), see Kubernetes logs.
The Kubernetes integration requires kube-state-metrics. If it is not found, you will see an error like the following in the newrelic-infra container logs:
2018-04-11T08:02:41.765236022Z time="2018-04-11T08:02:41Z" level=error msg="executing data source" data prefix="integration/com.newrelic.kubernetes" error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-04-11T08:02:41Z\" level=fatal msg=\"failed to discover kube-state-metrics endpoint, got error: no service found by label k8s-app=kube-state-metrics\"\n"
Common reasons for this error include:
- kube-state-metrics has not been deployed into the cluster.
- kube-state-metrics is deployed using a custom deployment.
- There are multiple versions of kube-state-metrics running and the Kubernetes integration is not finding the correct one.
The Kubernetes integration automatically discovers kube-state-metrics in your cluster using this logic (see the check after this list):
- It looks for a kube-state-metrics service running in the kube-system namespace.
- If that is not found, it looks for a service tagged with the label "k8s-app: kube-state-metrics".
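You can check whether either discovery path will succeed with two quick queries, using standard kubectl commands:

# Path 1: a kube-state-metrics service in the kube-system namespace.
kubectl get services -n kube-system | grep kube-state-metrics
# Path 2: any service carrying the k8s-app=kube-state-metrics label.
kubectl get services --all-namespaces -l k8s-app=kube-state-metrics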
The integration also requires the kube-state-metrics pod to have the label k8s-app: kube-state-metrics or app: kube-state-metrics. If neither of those is found, there will be a log entry like the following:
2018-04-11T09:25:00.825532798Z time="2018-04-11T09:25:00Z" level=error msg="executing data source" data prefix="integration/com.newrelic.kubernetes" error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-04-11T09:25:00Z\" level=fatal msg=\"failed to discover nodeIP with kube-state-metrics, got error: no pod found by label k8s-app=kube-state-metrics\"\n
To solve this issue, add the k8s-app=kube-state-metrics label to the kube-state-metrics pod.
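A sketch of how to add it with kubectl; labeling the running pod directly is lost when the pod is recreated, so patching the Deployment's pod template is the durable fix. The pod and deployment names below assume a standard kube-state-metrics install in kube-system:

# Quick test: label the running pod directly (replace the placeholder pod name with yours).
kubectl label pod kube-state-metrics-xxxxxxxxxx-xxxxx -n kube-system k8s-app=kube-state-metrics
# Durable fix: add the label to the Deployment's pod template so new pods inherit it.
kubectl patch deployment kube-state-metrics -n kube-system --type merge \
  -p '{"spec":{"template":{"metadata":{"labels":{"k8s-app":"kube-state-metrics"}}}}}'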
If metrics for Kubernetes nodes, pods, and containers are showing but metrics for namespaces, deployments, and ReplicaSets are missing, the Kubernetes integration is not able to connect to kube-state-metrics.
Indicators of missing namespace, deployment, and ReplicaSet data:
- In the # of K8s objects chart, that data is missing.
- Queries for K8sNamespaceSample, K8sDeploymentSample, and K8sReplicasetSample don't show any data (see the sample query after this list).
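For example, a minimal NRQL check like the following should return a count for each of those event types in a healthy cluster:

SELECT count(*) FROM K8sNamespaceSample, K8sDeploymentSample, K8sReplicasetSample FACET eventType() SINCE 30 minutes ago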
There are a few possible reasons for this:
- The kube-state-metrics service has been customized to listen on port 80. If that is the case, you may see an error like the following in the verbose logs:
time="2018-04-04T09:35:47Z" level=error msg="executing data source" data prefix="integration/com.newrelic.kubernetes" error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-04-04T09:35:47Z\" level=fatal msg=\"Non-recoverable error group: error querying KSM. Get http://kube-state-metrics.kube-system.svc.cluster.local:0/metrics: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\"\n"
This is a known problem that happens in some clusters where it takes too much time for kube-state-metrics to collect all the cluster's information before sending it to the integration. As a workaround, increase the kube-state-metrics client timeout.
- The kube-state-metrics instance is running behind kube-rbac-proxy. New Relic does not currently support this configuration. You may see an error like the following in the verbose logs:
time="2018-03-28T23:09:12Z" level=error msg="executing data source" data prefix="integration/com.newrelic.kubernetes" error="exit status 1" plugin name=nri-kubernetes stderr="time=\"2018-03-28T23:09:12Z\" level=fatal msg=\"Non-recoverable error group: error querying KSM. Get http://192.168.132.37:8443/metrics: net/http: HTTP/1.x transport connection broken: malformed HTTP response \\\"\\\\x15\\\\x03\\\\x01\\\\x00\\\\x02\\\\x02\\\"\"\n"
- The KSM payload is quite large, and the Kubernetes integration process handling the data is being OOM-killed. Since the integration is not the main process of the container, the pod is not restarted. This situation can be spotted in the logs of the newrelic-infra pod running on the same node as KSM:
time="2020-12-10T17:40:44Z" level=error msg="Integration command failed" error="signal: killed" instance=nri-kubernetes integration=com.newrelic.kubernetes
As a workaround, increase the DaemonSet memory limits so the process is not killed (sketched after this list).
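A sketch of that memory workaround: raise the memory limit of the integration's container in the DaemonSet manifest, or through the resources block of your Helm values. The numbers below are only examples, and the exact location of the block depends on how you installed:

# DaemonSet container resources fragment (or the equivalent block in your Helm values).
resources:
  limits:
    memory: 300Mi
  requests:
    memory: 150Mi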
The newrelic pods and the newrelic service account are not deployed in the same namespace. This is usually because the current context specifies a namespace. If this is the case, you will see an error like the following:
time=\"2018-05-31T10:55:39Z\" level=panic msg=\"pods is forbidden: User \\\"system:serviceaccount:kube-system:newrelic\\\" cannot list pods at the cluster scope\"
To check whether this is the case, run:
kubectl describe serviceaccount newrelic | grep Namespace
kubectl get pods -l name=newrelic-infra --all-namespaces
kubectl config get-contexts
To resolve this problem, change the namespace for the service account in the New Relic DaemonSet YAML file to be the same as the namespace for the current context:
- kind: ServiceAccount
  name: newrelic
  namespace: default
---
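After editing the manifest, re-apply it and confirm that the service account and the pods now report the same namespace. The manifest filename below is only an example:

# Re-apply the edited manifest.
kubectl apply -f newrelic-infrastructure.yaml
# Both should now show the same namespace.
kubectl get serviceaccounts --all-namespaces | grep newrelic
kubectl get pods -l name=newrelic-infra --all-namespaces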