Kubernetes monitoring integration

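New Relic Infrastructure's on-host integrations include a Kubernetes integration that gives you increased visibility into the performance of your Kubernetes environment. This documentation explains the integration's features, requirements, installation, data, alerts, attributes, and logging.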

Features

New Relic's Kubernetes integration instruments the container orchestration layer by reporting metrics from Kubernetes objects. The integration gives you insight into your Kubernetes nodes, namespaces, deployments, replica sets, pods, and containers. More functionality is planned for future releases.

Features include:

  • View your data in pre-built dashboards for immediate insight into your Kubernetes environment.
  • Create your own custom queries and charts in Insights from automatically reported data.
  • Create alert conditions on Kubernetes data.

These features are in addition to the data New Relic Infrastructure already reports for containerized processes running on instrumented hosts.

Compatibility and requirements

The New Relic Kubernetes integration requires:

New Relic has built and tested this integration using kube-state-metrics versions 1.1.0, 1.2.0, and 1.3.0. While it may work with other versions, New Relic does not officially support them.

Install and configure

To activate the Kubernetes integration, deploy the newrelic-infra agent onto a Kubernetes cluster as a daemon set:

  1. Install kube-state-metrics and get it running on the cluster:

    • For version 1.3.0:

      curl -o kube-state-metrics-1.3.zip https://codeload.github.com/kubernetes/kube-state-metrics/zip/release-1.3 && \
        unzip kube-state-metrics-1.3.zip && \
        cd kube-state-metrics-release-1.3 && \
        kubectl apply -f kubernetes
      
  2. Download the integration configuration file:

    curl -O https://download.newrelic.com/infrastructure_agent/integrations/kubernetes/newrelic-infrastructure-k8s-latest.yaml
    

    In the configuration file, add your New Relic license key and a cluster name to identify your Kubernetes cluster.

    Recommendation: Do not change the NRIA_PASSTHROUGH_ENVIRONMENT or NRIA_DISPLAY_NAME value in your configuration file.

    env:
      - name: NRIA_LICENSE_KEY
        value: YOUR_LICENSE_KEY
      - name: CLUSTER_NAME
        value: YOUR_CLUSTER_NAME
  3. Optional steps:

    1. If the URL of kube-state-metrics was changed from the default, uncomment and configure the following lines:

       - name: "KUBE_STATE_METRICS_URL"
         value: "http://<kube-state-metrics IP or FQDN>:<port>"
    2. To deploy in a namespace other than default, change every namespace value in the configuration file.

  4. Confirm that kube-state-metrics is installed in the namespace kube-system.

    kubectl get pods -n kube-system | grep kube-state-metrics
  5. Create the daemon set with the following command:

    kubectl create -f newrelic-infrastructure-k8s-latest.yaml
  6. Confirm that the daemon set has been created successfully by looking for newrelic-infra in the results of the following command:

    kubectl get daemonsets
  7. To confirm that the integration has been configured correctly, wait a few minutes, then run this New Relic Insights query to see if data has been reported:

    SELECT * FROM K8sPodSample since 1 day ago

If you do not see Kubernetes data, see the troubleshooting documentation.
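
The placeholder substitution in step 2 can also be scripted. The following is a minimal sketch, assuming the downloaded file contains the literal placeholders YOUR_LICENSE_KEY and YOUR_CLUSTER_NAME as in the snippet above. The function name fill_k8s_config and the output file name are illustrative and not part of the integration.

```shell
# Hypothetical helper: write a copy of the configuration file with the two
# placeholders replaced. Assumes the placeholders appear literally in the
# file and that neither value contains "|", "&", or a backslash.
#   $1 = input file, $2 = license key, $3 = cluster name, $4 = output file
fill_k8s_config() {
  sed -e "s|YOUR_LICENSE_KEY|$2|g" \
      -e "s|YOUR_CLUSTER_NAME|$3|g" "$1" > "$4"
}

# Example (requires the downloaded file; not run automatically):
#   fill_k8s_config newrelic-infrastructure-k8s-latest.yaml "$LICENSE_KEY" my-cluster newrelic-k8s.yaml
#   kubectl create -f newrelic-k8s.yaml
```

Writing to a separate output file keeps the downloaded original untouched, which makes it easier to compare against a newer download later.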

Install on Kubernetes managed services and platforms

In addition to installation directly on a server or via VMs, New Relic provides installation instructions for the Kubernetes integration on the following platforms: Amazon EKS, Google Kubernetes Engine (GKE), and OpenShift.

To deploy in Amazon EKS, use the version of kubectl provided by AWS. Then, follow the standard instructions to install and configure the Kubernetes integration.

The Kubernetes integration monitors worker nodes. In Amazon EKS, master nodes are managed by Amazon and abstracted from the Kubernetes platforms.

The Kubernetes integration monitors worker nodes. In GKE, master nodes are managed by Google and abstracted from the Kubernetes platforms.

To install the Kubernetes integration on GKE, you need to ensure you have sufficient permissions before installation:

  1. Go to https://console.cloud.google.com/iam-admin/iam and find your username. Click edit.
  2. Ensure you have permissions to create Roles and ClusterRoles. If you are not sure, adding the Kubernetes Engine Cluster Admin role will grant you sufficient permissions.

    If you can't edit your user role, ask the owner of the GCP project to give you the necessary permissions.

  3. Ensure you have a RoleBinding that grants you the same permissions created above:

    kubectl create clusterrolebinding YOUR_USERNAME-cluster-admin-binding --clusterrole=cluster-admin --user=YOUR_GCP_EMAIL

    Creating a RoleBinding is necessary because of a known RBAC issue in Kubernetes and Kubernetes Engine versions 1.6 or later. This issue is related to the manner in which Kubernetes Engine checks permissions when you create a Role or ClusterRole. As a workaround, the code above creates a RoleBinding that gives your Google identity a cluster-admin role. For more information, see Google Cloud's documentation on Defining Permissions in a Role.

  4. Follow the standard instructions to install and configure the Kubernetes integration.

To install the Kubernetes integration with OpenShift:

  1. Add the newrelic service account to your privileged Security Context Constraints:

    oc adm policy add-scc-to-user privileged \
    system:serviceaccount:<namespace>:newrelic
            
    Follow the standard instructions to install and configure the Kubernetes integration.
  2. Edit the Kubernetes integration configuration file. In the securityContext: section, add a privileged: true setting:

    spec:
      serviceAccountName: newrelic
      containers:
         - name: newrelic-infra
           image: newrelic/infrastructure-k8s:1.0.0
           securityContext:
             privileged: true
           resources:
             limits:
               memory: 100Mi
  3. Save your changes.

The instructions are meant to be run and deployed in the default namespace.

Update to the latest version

If you are already running the Kubernetes integration and want to update the newrelic-infra agent to the latest version:

  1. Download the integration configuration file:

    curl -O https://download.newrelic.com/infrastructure_agent/integrations/kubernetes/newrelic-infrastructure-k8s-latest.yaml
    
  2. Copy the changes you made to your previous configuration file (at a minimum, CLUSTER_NAME and NRIA_LICENSE_KEY) and paste them into the configuration file you downloaded in step 1.

  3. Delete the daemon set currently running:

    kubectl delete -f PREVIOUS_CONFIGURATION_FILE
    
  4. Install the latest daemon set with the following command:

    kubectl create -f newrelic-infrastructure-k8s-latest.yaml
    

To check your agent version, see Check version.

Find and use data

To view the Kubernetes integration's dashboard, go to infrastructure.newrelic.com > Integrations > On Host Integrations, then select the Kubernetes dashboard link. This will open the automatically created default Kubernetes dashboard in Insights.

To create your own dashboards, go to insights.newrelic.com and create NRQL queries. Kubernetes data is attached to the following event types:

Event name             Type of Kubernetes data
K8sNodeSample          Node data
K8sNamespaceSample     Namespace data
K8sDeploymentSample    Deployment data
K8sReplicaSetSample    Replica set data
K8sPodSample           Pod data
K8sContainerSample     Container data

For general tips on using integration data, see Understand and use integration data.

Alerts

You can create alert conditions for your Kubernetes data:

Create an alert

To create an alert condition for the Kubernetes integration:

  1. Go to infrastructure.newrelic.com > Settings > Alerts > Kubernetes, then select Create alert condition.
  2. To limit the alert to Kubernetes entities that have the chosen attributes, select Filter.
  3. Select the threshold settings. For more on the Trigger an alert when... options, see Alert types.
  4. Select an existing alert policy, or create a new one.
  5. Select Create.

When an alert condition's threshold is triggered, New Relic sends a notification to the policy's notification channels.

infrastructure.newrelic.com > Settings > Alerts > Kubernetes > Create alert condition: Infrastructure includes alert conditions specific to Kubernetes.

Alert types and thresholds

There are several Kubernetes-specific alert criteria:

  • Available pods are less than desired pods: This alert type monitors replica sets. The alert triggers if the number of available replicas (pods) for a deployment is less than the number of replicas you chose when creating the deployment. This can happen if there are not enough resources in your cluster to schedule all pods for a deployment. The alert is applied individually to each deployment that matches the specified filter.
  • Container CPU usage: This alert type compares the CPU consumption of a container with the limit that you defined when it was created. The alert triggers if usage exceeds the threshold. Container CPU usage is defined as:

    (CPU cores used / CPU cores limit) * 100
    
  • Container memory usage: This alert type compares the memory consumption of a container with the limit that was defined when it was created. The alert triggers if usage exceeds the threshold. Container memory usage is defined as:

    (memory used / memory limit) * 100
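
As a worked example of the two formulas above, this sketch computes the percentage from a usage value and a limit. The function name usage_percent and the sample numbers are illustrative; they are not produced by the integration.

```shell
# Hypothetical helper: percentage of a limit that is in use.
#   $1 = amount used, $2 = limit (same unit for both: cores or bytes)
usage_percent() {
  awk -v used="$1" -v limit="$2" 'BEGIN {
    if (limit <= 0) { print "limit must be positive" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", (used / limit) * 100
  }'
}

usage_percent 0.25 0.5              # 0.25 cores used of a 0.5-core limit; prints 50.0
usage_percent 188743680 268435456   # 180 MiB used of a 256 MiB limit; prints 70.3
```

With an alert threshold of 60, the first container would not trigger an alert and the second would.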
    
Alert notifications

When an alert condition's threshold is triggered, New Relic sends a message to the notification channel(s) chosen in the alert policy. Depending on the type of notification, you may have the following options:

The entity identifier that triggered the alert appears near the top of the notification message. The format of the identifier depends on the alert type:

  • Available pods are less than desired pods alerts:

    k8s:CLUSTER_NAME:PARENT_NAMESPACE:replicaset:REPLICASET_NAME
  • CPU or memory usage alerts:

    k8s:CLUSTER_NAME:PARENT_NAMESPACE:POD_NAME:container:CONTAINER_NAME

Here are some examples.

Pod alert notification example

For Available pods are less than desired pods alerts, the ID of the replica set triggering the issue might look like this:

k8s:beam-production:default:replicaset:nginx-deployment-1623441481

This identifier contains the following information:

  • Cluster name: beam-production
  • Parent namespace: default
  • ReplicaSet name: nginx-deployment-1623441481

Container resource notification example

For container CPU or memory usage alerts, the entity might look like this:

k8s:beam-production:kube-system:kube-state-metrics-797bb87c75-zncwn:container:kube-state-metrics

This identifier contains the following information:

  • Cluster name: beam-production
  • Parent namespace: kube-system
  • Pod name: kube-state-metrics-797bb87c75-zncwn
  • Container name: kube-state-metrics
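
If you process notifications in a script, a container identifier can be split on its colons. This is a minimal sketch that assumes the six-field format shown above and that no field contains a colon; parse_container_entity is a hypothetical helper name.

```shell
# Hypothetical helper: split a container alert identifier of the form
#   k8s:CLUSTER_NAME:PARENT_NAMESPACE:POD_NAME:container:CONTAINER_NAME
# into labelled fields, one per line. Fails on any other shape.
parse_container_entity() {
  printf '%s\n' "$1" | awk -F: '
    NF != 6 || $5 != "container" {
      print "unexpected identifier format" > "/dev/stderr"; exit 1
    }
    { printf "cluster=%s\nnamespace=%s\npod=%s\ncontainer=%s\n", $2, $3, $4, $6 }'
}

parse_container_entity "k8s:beam-production:kube-system:kube-state-metrics-797bb87c75-zncwn:container:kube-state-metrics"
# prints:
#   cluster=beam-production
#   namespace=kube-system
#   pod=kube-state-metrics-797bb87c75-zncwn
#   container=kube-state-metrics
```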

Create alert conditions using NRQL

See the detailed information about creating alert conditions for NRQL queries.

Kubernetes attributes and metrics

The Kubernetes integration collects the following metrics and other attributes. For more on using integration data, see Find and use data.

Node data

Query the K8sNodeSample event in New Relic Insights for node data:

Name                            Type       Description
clusterName                     string     Name that you assigned to the cluster when you installed the Kubernetes integration.
cpuUsedCoreMilliseconds         gauge      Node CPU usage measured in core milliseconds.
cpuUsedCores                    gauge      Node CPU usage measured in cores.
fsAvailableBytes                gauge      Bytes available in the node filesystem.
fsCapacityBytes                 gauge      Total capacity of the node filesystem in bytes.
fsInodes                        gauge      Total number of inodes in the node filesystem.
fsInodesFree                    gauge      Free inodes in the node filesystem.
fsInodesUsed                    gauge      Used inodes in the node filesystem.
fsUsedBytes                     gauge      Used bytes in the node filesystem.
memoryAvailableBytes            gauge      Bytes of memory available in the node.
memoryMajorPageFaultsPerSecond  gauge      Number of major page faults per second in the node.
memoryRssBytes                  gauge      Bytes of RSS memory.
memoryUsedBytes                 gauge      Bytes of memory used.
memoryWorkingSetBytes           gauge      Bytes of memory in the working set.
net.errorCountPerSecond         gauge      Number of errors per second while receiving/transmitting over the network.
nodeName                        string     Host name of the node.
runtimeAvailableBytes           gauge      Bytes available to the container runtime filesystem.
runtimeCapacityBytes            gauge      Total capacity assigned to the container runtime filesystem in bytes.
runtimeInodes                   gauge      Total number of inodes in the container runtime filesystem.
runtimeInodesFree               gauge      Free inodes in the container runtime filesystem.
runtimeInodesUsed               gauge      Used inodes in the container runtime filesystem.
runtimeUsedBytes                gauge      Used bytes in the container runtime filesystem.

Namespace data

Query the K8sNamespaceSample event in New Relic Insights for namespace data:

Name                            Type       Description
clusterName                     string     Name that you assigned to the cluster when you installed the Kubernetes integration.
createdAt                       timestamp  Timestamp of when the namespace was created.
namespace                       string     Name of the namespace to be used as an identifier.
label.LABEL_NAME                string     Labels associated with your namespace, so you can filter and query for specific namespaces.
status                          string     Current status of the namespace. The value can be Active or Terminated.

Deployment data

Query the K8sDeploymentSample event in New Relic Insights for deployment data:

Name                            Type       Description
clusterName                     string     Name that you assigned to the cluster when you installed the Kubernetes integration.
createdAt                       timestamp  Timestamp of when the deployment was created.
deploymentName                  string     Name of the deployment to be used as an identifier.
namespace                       string     Name of the namespace that the deployment belongs to.
label.LABEL_NAME                string     Labels associated with your deployment, so you can filter and query for specific deployments.
podsAvailable                   gauge      Number of replicas that are currently available.
podsDesired                     gauge      Number of replicas that you defined in the deployment.
podsTotal                       gauge      Total number of replicas that are currently running.
podsUnavailable                 gauge      Number of replicas that are currently unavailable.
podsUpdated                     gauge      Number of replicas that have been updated to achieve the desired state of the deployment.
updatedAt                       timestamp  Timestamp of when the deployment was last updated.

Replica set data

Query the K8sReplicaSetSample event in New Relic Insights for replica set data:

Name                            Type       Description
clusterName                     string     Name that you assigned to the cluster when you installed the Kubernetes integration.
createdAt                       timestamp  Timestamp of when the replica set was created.
deploymentName                  string     Name of the deployment to be used as an identifier.
namespace                       string     Name of the namespace that the replica set belongs to.
observedGeneration              integer    Generation observed by the replica set.
podsDesired                     gauge      Number of replicas that you defined in the deployment.
podsFullyLabeled                gauge      Number of pods that have labels that match the replica set pod template labels.
podsReady                       gauge      Number of replicas that are ready for this replica set.
podsTotal                       gauge      Total number of replicas that are currently running.
replicasetName                  string     Name of the replica set to be used as an identifier.

Pod data

Query the K8sPodSample event in New Relic Insights for pod data:

Name                            Type       Description
clusterName                     string     Name that you assigned to the cluster when you installed the Kubernetes integration.
createdAt                       timestamp  Timestamp of when the pod was created.
createdBy                       string     Name of the Kubernetes object that created the pod. For example, newrelic-infra.
createdKind                     string     Kind of Kubernetes object that created the pod. For example, DaemonSet.
deploymentName                  string     Name of the deployment to be used as an identifier.
isReady                         boolean    Whether or not the pod is ready to serve requests.
isScheduled                     boolean    Whether or not the pod has been scheduled to run on a node.
label.LABEL_NAME                string     Labels associated with your pod, so you can filter and query for specific pods.
namespace                       string     Name of the namespace that the pod belongs to.
net.errorCountPerSecond         count      Number of errors per second while receiving/transmitting over the network.
net.rxBytesPerSecond            rate       Number of bytes per second received over the network.
net.txBytesPerSecond            rate       Number of bytes per second transmitted over the network.
nodeIP                          string     Host IP address that the pod is running on.
nodeName                        string     Host name that the pod is running on.
podName                         string     Name of the pod to be used as an identifier.
startTime                       timestamp  Timestamp of when the pod started running.
status                          string     Current status of the pod. The value can be Pending, Running, Succeeded, Failed, or Unknown.

Container data

Query the K8sContainerSample event in New Relic Insights for container data:

Name                            Type       Description
clusterName                     string     Name that you assigned to the cluster when you installed the Kubernetes integration.
containerID                     string     Unique ID associated with the container. If you are running Docker, this is the Docker container ID.
containerImage                  string     Name of the image that the container is running.
containerImageID                string     Unique ID associated with the image that the container is running.
containerName                   string     Name associated with the container.
cpuLimitCores                   integer    Limit CPU cores defined for the container in the pod specification.
cpuRequestedCores               gauge      Requested CPU cores defined for the container in the pod specification.
cpuUsedCores                    gauge      CPU cores actually used by the container.
deploymentName                  string     Name of the deployment to be used as an identifier.
isReady                         boolean    Whether or not the container's readiness check succeeded.
label.LABEL_NAME                string     Labels associated with your container, so you can filter and query for specific containers.
memoryLimitBytes                integer    Limit bytes of memory defined for the container in the pod specification.
memoryRequestedBytes            integer    Requested bytes of memory defined for the container in the pod specification.
memoryUsedBytes                 integer    Bytes of memory actually used by the container.
namespace                       string     Name of the namespace that the container belongs to.
nodeIP                          string     Host IP address the container is running on.
nodeName                        string     Host name that the container is running on.
podName                         string     Name of the pod that the container is in, to be used as an identifier.
restartCount                    count      Number of times the container has been restarted.
status                          string     Current status of the container. The value can be Running, Terminated, or Unknown.

Logging and version info

Below are instructions for generating verbose logs and getting version and configuration information.

Get verbose logs

For the Kubernetes integration, the Infrastructure agent adds a log entry only in the event of an error. Most common errors are displayed in the standard (non-verbose) logs. If you are doing a more in-depth investigation on your own or with New Relic support, you can enable verbose mode.

Verbose mode will significantly increase the amount of information that is sent to log files and should only be enabled temporarily for troubleshooting purposes.

To temporarily enable verbose logging:

  1. Enable verbose mode: In the deployment file, set the value of NRIA_VERBOSE to 1, then apply the change with the command shown in step 4.
  2. Leave verbose mode on for a few minutes, or until enough activity has occurred.
  3. Disable verbose mode: Set the NRIA_VERBOSE value back to 0.
  4. Apply the configuration by running:

    kubectl apply -f your_newrelic_k8s.yaml
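
The edits in steps 1 and 3 can be made with sed instead of by hand. This sketch assumes that, in your deployment file, the value: line directly follows the name: NRIA_VERBOSE line and that the value is a quoted string; check your file before relying on it. set_verbose is a hypothetical helper name.

```shell
# Hypothetical helper: print the deployment file with NRIA_VERBOSE set to $2.
#   $1 = deployment file, $2 = new value (1 to enable, 0 to disable)
# The sed script finds the "name: NRIA_VERBOSE" line, moves to the next line,
# and rewrites that line's value. All other lines pass through unchanged.
set_verbose() {
  sed "/name: NRIA_VERBOSE/{
n
s/value: .*/value: \"$2\"/
}" "$1"
}

# Example (requires kubectl and cluster access; not run automatically):
#   set_verbose your_newrelic_k8s.yaml 1 > verbose.yaml && kubectl apply -f verbose.yaml
#   ... collect logs ...
#   kubectl apply -f your_newrelic_k8s.yaml
```

Because the helper writes to standard output, the original file keeps NRIA_VERBOSE at its normal value, and reapplying it turns verbose mode off again.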
    

You may want even more in-depth logging details. This may sometimes be necessary when attempting to troubleshoot a problem with New Relic support. Here are some recommended procedures for getting very detailed logging information:

  1. Set the log level to verbose: in the deployment file, set the value of NRIA_VERBOSE to 1.

    Verbose mode will significantly increase the amount of information that is sent to log files and should only be enabled temporarily for troubleshooting purposes.

  2. Apply your new configuration by running:

    kubectl apply -f your_newrelic_k8s.yaml
    
  3. Get a list of nodes in the environment by running:

    kubectl get nodes
    
  4. Get a list of Infrastructure and kube-state-metrics pods:

    kubectl get pods --all-namespaces -o wide | grep -E 'newrelic|kube-state-metrics'
    
  5. Get logs from the pod connecting to kube-state-metrics.
  6. Retrieve kube-state-metrics service configuration.
  7. Disable verbose mode: Set the NRIA_VERBOSE value in the deployment file back to 0.
  8. Apply your new configuration by running:

    kubectl apply -f your_newrelic_k8s.yaml
    
Get Infrastructure version

For the Kubernetes integration, the Infrastructure agent is distributed as a Docker image that contains the Infrastructure agent and the Kubernetes integration. The Docker image is tagged with a version and the Infrastructure agent also has its own version.

Assuming the agent is successfully sending information to New Relic, you can retrieve the versions of the Infrastructure agent for Kubernetes (the Docker image) you are running in your clusters using the following Insights query:

FROM K8sContainerSample SELECT uniqueCount(entityId) WHERE containerName = 'newrelic-infra' facet clusterName, containerImage

insights.newrelic.com > Query > (create your query): In this result you can see that cluster aws-rbac-cluster is running version 1.0.0-beta4, and the other two clusters are running 1.0.0-beta4-2.

If the agent is not reporting any data, you can still get the version(s) of the New Relic integration for Kubernetes that you are running in a cluster using kubectl:

kubectl get pods --all-namespaces -l name=newrelic-infra -o jsonpath="{.items..spec..containers..image}"

This will return something like:

newrelic/infrastructure-k8s:1.0.0
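
To compare versions in a script, the tag can be separated from the image name. This is a minimal sketch, assuming each image reference ends in :TAG (no digest, and no registry port without a tag); image_tag is a hypothetical helper name.

```shell
# Hypothetical helper: print the tag of an image reference such as
# newrelic/infrastructure-k8s:1.0.0 (everything after the last colon).
image_tag() {
  case "$1" in
    *:*) printf '%s\n' "${1##*:}" ;;
    *)   echo "no tag in image reference: $1" >&2; return 1 ;;
  esac
}

image_tag "newrelic/infrastructure-k8s:1.0.0"   # prints 1.0.0

# Example with the kubectl command above (requires cluster access; not run here):
#   for img in $(kubectl get pods --all-namespaces -l name=newrelic-infra \
#       -o jsonpath="{.items..spec..containers..image}"); do image_tag "$img"; done
```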

Get kube-state-metrics version

You can retrieve the version of kube-state-metrics running on your clusters with the following Insights query:

FROM K8sContainerSample SELECT uniqueCount(entityId) WHERE containerName = 'kube-state-metrics' facet clusterName, containerImage

insights.newrelic.com > Query > (create your query): This example shows that the aws-rbac-cluster is running version 1.1.0 while the other two clusters are running version 1.2.0. New Relic officially tests and supports the integration only with the kube-state-metrics versions listed under Compatibility and requirements.

Get logs from pod connecting to kube-state-metrics

To get the logs from the pod that's connecting to kube-state-metrics:

  • Get the node that kube-state-metrics is running on:

    kubectl get pods --all-namespaces -o wide | grep kube-state-metrics
    

    This will have an output like:

    kube-system   kube-state-metrics-5c6f5cb9b5-pclhh     2/2       Running   4          4d        172.17.0.3   minikube
    
  • Get the New Relic Infrastructure pod that is running on the same node as kube-state-metrics:

    kubectl describe node minikube | grep newrelic-infra
    

    This will have an output like:

    default                    newrelic-infra-5wcv6                     100m (5%)    0 (0%)      100Mi (5%)       100Mi (5%)
    
  • Retrieve the logs for that pod by running:

    kubectl logs newrelic-infra-5wcv6
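
The three steps above can be chained. This sketch assumes the column layout shown in the sample outputs: node name in the eighth column of the wide pod listing, and namespace and pod name in the first two columns of the node description. Other kubectl versions may print different columns. The helper names ksm_node and infra_pod are hypothetical.

```shell
# Read "kubectl get pods --all-namespaces -o wide" output on stdin and print
# the node (8th column) of the first kube-state-metrics pod.
ksm_node() {
  awk '$2 ~ /^kube-state-metrics/ { print $8; exit }'
}

# Read "kubectl describe node NODE" output on stdin and print
# "NAMESPACE POD" for the first newrelic-infra pod listed.
infra_pod() {
  awk '$2 ~ /^newrelic-infra/ { print $1, $2; exit }'
}

# Example wiring (requires kubectl and cluster access; not run automatically):
#   node=$(kubectl get pods --all-namespaces -o wide | ksm_node)
#   set -- $(kubectl describe node "$node" | infra_pod)
#   kubectl logs -n "$1" "$2"
```

Passing the namespace to kubectl logs explicitly keeps the last command working when the agent is deployed outside the default namespace.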
    
Retrieve kube-state-metrics service configuration

To retrieve the configuration, run:

kubectl get pods --all-namespaces | grep "kube-state-metrics"

And you will get a response like:

kube-system   kube-state-metrics-5c6f5cb9b5-5wf9m     2/2       Running       8          6d

The first column is the namespace. Use it in place of <namespace> in the following command:

kubectl describe service kube-state-metrics -n <namespace>

For troubleshooting help, see Not seeing data or Error messages.
