Problem
You have completed the installation procedure for New Relic's Kubernetes integration and you are seeing Kubernetes data in your New Relic account, but there is no data from any of the control plane components.
If control plane data is missing, for example K8sSchedulerSample, the first thing to do is to check the verbose logs of the control plane components. Read how to enable verbose logging.
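If you install through the nri-bundle Helm chart, a minimal sketch of such an override could look like the following; this assumes your chart version exposes the verboseLog value, so verify the key location against your chart:
# Hypothetical values override; verify the key against your chart version.
newrelic-infrastructure:
  verboseLog: true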
One possibility is that autodiscovery is not finding the control plane pods. The integration looks for them in the cluster using the most common labels; if no pod is found for a given component, it does not fail, to avoid losing data from the remaining components. In this scenario you will see logs similar to the following:
time="2022-06-21T12:21:25Z" level=debug msg="Autodiscovering pods for \"scheduler\""time="2022-06-21T12:21:25Z" level=debug msg="0 pods found with labels \"tier=control-plane,component=kube-scheduler\""time="2022-06-21T12:21:25Z" level=debug msg="No pod found for \"scheduler\" with labels \"tier=control-plane,component=kube-scheduler\""time="2022-06-21T12:21:25Z" level=debug msg="0 pods found with labels \"k8s-app=kube-scheduler\""time="2022-06-21T12:21:25Z" level=debug msg="No pod found for \"scheduler\" with labels \"k8s-app=kube-scheduler\""time="2022-06-21T12:21:25Z" level=debug msg="0 pods found with labels \"app=openshift-kube-scheduler,scheduler=true\""time="2022-06-21T12:21:25Z" level=debug msg="No pod found for \"scheduler\" with labels \"app=openshift-kube-scheduler,scheduler=true\""time="2022-06-21T12:21:25Z" level=debug msg="No \"scheduler\" pod has been discovered"In this case you can change the discovery behavior with the
controlplane.config.[component].autodiscover[].selector
config of the helm chart values. Read more about control plane components.It is also possible that the controlplane component is found, but the authentication with the endpoint fails. In this scenario you will see logs similar to the following:
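As an illustration, a hedged sketch of a selector override for the scheduler might look like this; the label app=my-custom-scheduler is a hypothetical example, and field names such as matchNode assume the v3 chart layout:
# Sketch of a custom autodiscovery selector for the scheduler component.
# Replace "app=my-custom-scheduler" with the labels your pods actually carry.
controlplane:
  config:
    scheduler:
      autodiscover:
        - selector: "app=my-custom-scheduler"
          namespace: kube-system
          matchNode: true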
time="2022-06-21T15:54:52Z" level=debug msg="Endpoint \"https://localhost:10257\" probe failed, skipping: http request failed with status: 403 Forbidden"In this case you can change the authentication behavior for each endpoints with the
controlplane.config.[component].autodiscover[].endpoints[].auth
config of the helm chart values.It is also possible that the controlplane component of the integration is not running on all master nodes.
It is also possible that the control plane component of the integration is not running on all of your master nodes. You can double-check this by running:
kubectl get pod -n <NEWRELIC_NAMESPACE> -l app.kubernetes.io/component=controlplane -o wide
If a control plane pod you want to monitor is running on a node without a New Relic monitoring instance, you can change controlplane.affinity, controlplane.nodeSelector, and controlplane.tolerations in the Helm chart values as needed.
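As a sketch, the following values pin the control plane pods to nodes labeled node-role.kubernetes.io/control-plane and tolerate the matching taint; these keys are common defaults, not guaranteed, so adjust them to the labels and taints your cluster actually uses:
# Sketch: schedule the controlplane component onto control plane nodes.
controlplane:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule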
If the control plane component of the integration does not successfully autodiscover or scrape any control plane pod, it enters CrashLoopBackOff.
As described in the previous section, you can change the autodiscovery behavior and the authentication methods to meet your needs.
On the other hand, if you are not interested in that data, you can simply disable the control plane component by setting controlplane.enabled=false in the Helm chart values.
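In values form, that setting is simply the following; note that the exact key casing may vary by chart version:
# Disables control plane monitoring entirely.
controlplane:
  enabled: false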
Solution for integration version V2
Execute the following commands to manually find the master nodes:
kubectl get nodes -l node-role.kubernetes.io/master=""
kubectl get nodes -l kubernetes.io/role="master"
If the master nodes follow the labeling convention defined in the discovery of master nodes and control plane components documentation section, you should get some output like:
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-42-24-4.ec2.internal   Ready    master   42d   v1.14.8
If no nodes are found, there are two scenarios:
Your master nodes don’t have the required labels that identify them as masters. In this case, you need to add both labels to your master nodes.
You’re in a managed cluster and your provider is handling the master nodes for you. In this case there is nothing you can do, since your provider is limiting access to those nodes.
Replace the placeholder in the following command with one of the node names returned in the previous step to get an integration pod running on a master node:
kubectl get pods --field-selector spec.nodeName=NODE_NAME -l name=newrelic-infra --all-namespaces
The next command does the same, but selects the node for you:
kubectl get pods --field-selector spec.nodeName=$(kubectl get nodes -l node-role.kubernetes.io/master="" -o jsonpath="{.items[0].metadata.name}") -l name=newrelic-infra --all-namespaces
If everything is correct you should get some output like:
NAME                   READY   STATUS    RESTARTS   AGE
newrelic-infra-whvzt   1/1     Running   0          6d20h
If the integration is not running on your master nodes, check that the daemonset has all the desired instances running and ready.
kubectl get daemonsets -l app=newrelic-infra --all-namespaces
Refer to the discovery of master nodes and control plane components documentation section and look for the labels the integration uses to discover the components. Then run the following commands to see if there are any pods with such labels and the nodes where they are running:
kubectl get pods -l k8s-app=kube-apiserver --all-namespaces
If there is a component with the given label, you should see something like:
NAMESPACE     NAME                                         READY   STATUS    RESTARTS   AGE
kube-system   kube-apiserver-ip-10-42-24-42.ec2.internal   1/1     Running   3          49d
The same should be done with the rest of the components:
kubectl get pods -l k8s-app=etcd-manager-main --all-namespaces
kubectl get pods -l k8s-app=kube-scheduler --all-namespaces
kubectl get pods -l k8s-app=kube-controller-manager --all-namespaces
To retrieve the logs, follow the instructions in get logs from pod running on a master node. For every component, the integration logs the message “Running job: COMPONENT_NAME”. For example:
Running job: scheduler
Running job: etcd
Running job: controller-manager
Running job: api-server
If you didn’t specify the ETCD_TLS_SECRET_NAME configuration option, you’ll find the following message in the logs:
Skipping job creation for component etcd: etcd requires TLS configuration, none given
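As a sketch, assuming you created a secret holding the etcd client certificates (the name newrelic-etcd-tls-secret below is hypothetical), the V2 manifest typically passes it to the integration through environment variables such as:
# Hypothetical excerpt from the V2 DaemonSet container spec; the secret
# name and namespace must match the secret you created for etcd mTLS.
env:
  - name: ETCD_TLS_SECRET_NAME
    value: newrelic-etcd-tls-secret
  - name: ETCD_TLS_SECRET_NAMESPACE
    value: default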
If any error occurs while querying the metrics of any component, it will be logged after the “Running job” message.
Refer to the discovery of master nodes and control plane components documentation section to get the endpoint of the control plane component you want to query. With the endpoint, you can use the integration pod running on the same node as the component to perform the query. The following are examples of how to query the Kubernetes scheduler:
kubectl exec -ti POD_NAME -- wget -O - localhost:10251/metrics
The following command does the same, but also chooses the pod for you:
kubectl exec -ti $(kubectl get pods --all-namespaces --field-selector spec.nodeName=$(kubectl get nodes -l node-role.kubernetes.io/master="" -o jsonpath="{.items[0].metadata.name}") -l name=newrelic-infra -o jsonpath="{.items[0].metadata.name}") -- wget -O - localhost:10251/metrics
If everything is correct, you should get some metrics in Prometheus format, something like:
Connecting to localhost:10251 (127.0.0.1:10251)
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3600"} 0