Kubernetes monitoring integration

New Relic Infrastructure's on-host integrations include a Kubernetes integration that gives you increased visibility into the performance of your Kubernetes environment.

Features

New Relic's Kubernetes integration instruments the container orchestration layer by reporting metrics from Kubernetes objects. The integration gives you insight into your Kubernetes nodes, namespaces, deployments, replica sets, pods, and containers. More functionality is planned for future releases.

Features include:

  • View your data in pre-built dashboards for immediate insight into your Kubernetes environment.
  • Create your own custom queries and charts in Insights from automatically reported data.
  • Create alert conditions on Kubernetes data.

These features are in addition to the data New Relic Infrastructure already reports for containerized processes running on instrumented hosts.

Compatibility and requirements

The New Relic Kubernetes integration requires:

Kubernetes integration requirements Comments
New Relic Infrastructure Infrastructure Pro or trial subscription.
Linux distribution Must be compatible with New Relic Infrastructure.
Kubernetes cluster Currently tested with versions 1.6 to 1.10, and 1.11 (beta).
Kubernetes cluster GKE Currently tested with versions 1.9 and 1.10.
Kubernetes cluster EKS Currently tested with version 1.10.
Kubernetes cluster AKS Currently tested with version 1.10.
Kubernetes cluster OpenShift Currently tested with versions 3.7 and 3.9.
Kubernetes cluster Tectonic Currently tested with version 1.9.
kube-state-metrics Requires kube-state-metrics version 1.1.0, 1.2.0, or 1.3.0 running on the cluster. Other versions may work, but New Relic does not officially support them.
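
As a minimal sketch of how the kube-state-metrics requirement could be checked before installing, the shell function below reports whether a version string is one of the three supported versions named in the requirements. The function name ksm_version_supported is hypothetical and is not part of any New Relic tooling; the version list is copied from the requirements and would need updating if the supported versions change.

```shell
# Hypothetical helper: succeeds (exit status 0) only for the
# kube-state-metrics versions named in the requirements.
ksm_version_supported() {
  case "${1#v}" in            # tolerate a leading "v", as in v1.3.0
    1.1.0|1.2.0|1.3.0) return 0 ;;
    *)                 return 1 ;;
  esac
}

# Example: warn when a version falls outside the supported list.
if ! ksm_version_supported "1.4.0"; then
  echo "kube-state-metrics 1.4.0 is not officially supported"
fi
```

An unsupported version may still work, as noted above, so a warning is probably more appropriate than a hard failure.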

Install and configure

To activate the Kubernetes integration, deploy the newrelic-infra agent onto a Kubernetes cluster as a daemon set:

  1. Install kube-state-metrics and get it running on the cluster. For example, for version 1.3.0:

    curl -o kube-state-metrics-1.3.zip https://codeload.github.com/kubernetes/kube-state-metrics/zip/release-1.3 && unzip kube-state-metrics-1.3.zip && kubectl apply -f kube-state-metrics-release-1.3/kubernetes
  2. Download the integration configuration file:

    curl -O https://download.newrelic.com/infrastructure_agent/integrations/kubernetes/newrelic-infrastructure-k8s-latest.yaml
    

    In the configuration file, add your New Relic license key and a cluster name to identify your Kubernetes cluster.

    Recommendation: Do not change the NRIA_PASSTHROUGH_ENVIRONMENT or NRIA_DISPLAY_NAME value in your configuration file.

    env:
      - name: NRIA_LICENSE_KEY
        value: YOUR_LICENSE_KEY
      - name: CLUSTER_NAME
        value: YOUR_CLUSTER_NAME
  3. Optional: Complete these steps as applicable:

    Specify the Kubernetes API host and port

    This is necessary when you are using SSL and not using the default FQDN. The Kubernetes API FQDN needs to match the FQDN of the SSL certificate.

    You do not need to specify both variables. For example, if you only specify the HOST, the default PORT will be used.

    - name: "KUBERNETES_SERVICE_HOST" 
      value: "<Kubernetes API host>"
    - name: "KUBERNETES_SERVICE_PORT" 
      value: "<Kubernetes API TCP port>"
      
    Kubernetes versions 1.6 to 1.7.5: Edit manifest file

    For Kubernetes versions 1.6 to 1.7.5, uncomment these two lines in the manifest file:

    - name: "CADVISOR_PORT" # Enable direct connection to cAdvisor by specifying the port.  Needed for Kubernetes versions prior to 1.7.6.
      value: "4194"
      
    Use environment variables

    If you use a proxy, configure its URL with the environment variables that can be passed to the Kubernetes integration.

    kube-state-metrics URL: Check if changed from default

    If the URL of kube-state-metrics was changed from the default, uncomment and configure the following lines:

     - name: "KUBE_STATE_METRICS_URL"
       value: "http://<kube-state-metrics IP or FQDN>:<port>"
    
    Non-default namespace deployments: Edit config file

    If you intend to deploy in a different namespace from default, change all values of namespace in the configuration file.

  4. Confirm that kube-state-metrics is installed.

    kubectl get pods --all-namespaces | grep kube-state-metrics
  5. Create the daemon set:

    kubectl create -f newrelic-infrastructure-k8s-latest.yaml
  6. Confirm that the daemon set has been created successfully by looking for newrelic-infra in the results generated by this command:

    kubectl get daemonsets
  7. To confirm that the integration has been configured correctly, wait a few minutes, then run this New Relic Insights query to see if data has been reported:

    SELECT * FROM K8sPodSample since 1 day ago

If you do not see Kubernetes data, review these Kubernetes integration and configuration procedures again, then follow the troubleshooting procedures. If necessary, follow the uninstall procedures.
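
For the optional CADVISOR_PORT step in the procedure above, the following sketch shows one way to decide whether a given Kubernetes version falls in the range that needs the two lines uncommented. The function name needs_cadvisor_port is hypothetical, and the comparison relies on sort -V, a GNU coreutils extension that may not be available on every system. The function only checks "older than 1.7.6" and does not verify that the version is one the integration supports.

```shell
# Hypothetical helper: succeeds when the given Kubernetes version is older
# than 1.7.6, the range for which the manifest's CADVISOR_PORT lines apply.
# Relies on "sort -V" (version sort), a GNU coreutils extension.
needs_cadvisor_port() {
  version="${1#v}"                       # tolerate a leading "v"
  [ "$version" != "1.7.6" ] || return 1  # 1.7.6 itself no longer needs it
  oldest=$(printf '%s\n' "$version" "1.7.6" | sort -V | head -n 1)
  [ "$oldest" = "$version" ]             # true if the input sorts first
}

for v in 1.6.4 1.7.5 1.7.6 1.10.0; do
  if needs_cadvisor_port "$v"; then
    echo "$v: uncomment CADVISOR_PORT"
  else
    echo "$v: leave CADVISOR_PORT commented out"
  fi
done
```

A plain string comparison would misorder 1.10.0 and 1.7.6, which is why the sketch uses a version-aware sort.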

Install on Kubernetes managed services and platforms

In addition to installation directly on a server or when using VMs, the Kubernetes integration can be installed on the following platforms:

Amazon EKS

To deploy in Amazon EKS, use the version of kubectl provided by AWS. Then, follow New Relic's standard procedures to install and configure the Kubernetes integration.

The Kubernetes integration monitors worker nodes. In Amazon EKS, master nodes are managed by Amazon and abstracted from the Kubernetes platforms.

Google Kubernetes Engine (GKE)

The Kubernetes integration monitors worker nodes. In GKE, master nodes are managed by Google and abstracted from the Kubernetes platforms.

Before installing the Kubernetes integration on GKE, ensure you have sufficient permissions:

  1. Go to https://console.cloud.google.com/iam-admin/iam and find your username. Click edit.
  2. Ensure you have permissions to create Roles and ClusterRoles: If you are not sure, add the Kubernetes Engine Cluster Admin role. If you cannot edit your user role, ask the owner of the GCP project to give you the necessary permissions.

  3. Ensure you have a RoleBinding that grants you the same permissions to create Roles and ClusterRoles:

    kubectl create clusterrolebinding YOUR_USERNAME-cluster-admin-binding --clusterrole=cluster-admin --user=YOUR_GCP_EMAIL

    Creating a RoleBinding is necessary because of a known RBAC issue in Kubernetes and Kubernetes Engine versions 1.6 or higher. This issue is related to the manner in which Kubernetes Engine checks permissions when you create a Role or ClusterRole. As a workaround, the code creates a RoleBinding that gives your Google identity a cluster-admin role. For more information, see Google Cloud's documentation on defining permissions in a role.

  4. Follow New Relic's standard procedures to install and configure the Kubernetes integration.

OpenShift

To install the Kubernetes integration with OpenShift:

  1. Add the newrelic service account to your privileged Security Context Constraints:

    oc adm policy add-scc-to-user privileged \
    system:serviceaccount:<namespace>:newrelic
            
    Follow New Relic's standard procedures to install and configure the Kubernetes integration.
  2. Edit the Kubernetes integration configuration file. In the securityContext: section, add a privileged: true setting to be run and deployed in the default namespace:

    spec:
      serviceAccountName: newrelic
      containers:
         - name: newrelic-infra
           image: newrelic/infrastructure-k8s:1.0.0
           securityContext:
             privileged: true
           resources:
             limits:
               memory: 100Mi
  3. Save your changes.

Azure Kubernetes Service (AKS)

To deploy in Azure Kubernetes Service (AKS), follow New Relic's standard procedures to install and configure the Kubernetes integration.

The Kubernetes integration monitors worker nodes. In Azure Kubernetes Service, master nodes are managed by Azure and abstracted from the Kubernetes platforms.

Update to the latest version

If you are already running the Kubernetes integration and want to update the newrelic-infra agent to the latest agent version:

  1. Download the integration configuration file:

    curl -O https://download.newrelic.com/infrastructure_agent/integrations/kubernetes/newrelic-infrastructure-k8s-latest.yaml
    
  2. Copy the changes you made to your previous configuration file (at a minimum, CLUSTER_NAME and NRIA_LICENSE_KEY), and paste them into the configuration file you downloaded.

  3. Delete the daemon set currently running:

    kubectl delete -f PREVIOUS_CONFIGURATION_FILE
    
  4. Install the latest daemon set with the following command:

    kubectl create -f newrelic-infrastructure-k8s-latest.yaml
    

Find and use data

To view the Kubernetes integration's dashboard:

  1. Go to infrastructure.newrelic.com > Integrations > On host integrations.
  2. Select the Kubernetes dashboard link to open the Kubernetes dashboard.
  3. To create your own dashboards, go to insights.newrelic.com and create NRQL queries.

Kubernetes data is attached to the following event types.

Event name Type of Kubernetes data
K8sNodeSample Node data
K8sNamespaceSample Namespace data
K8sDeploymentSample Deployment data
K8sReplicaSetSample Replica set data
K8sPodSample Pod data
K8sContainerSample Container data
K8sVolumeSample Volume data

Manage alerts

You can be notified about alert violations for your Kubernetes data.

Create an alert condition

To create an alert condition for the Kubernetes integration:

  1. Go to infrastructure.newrelic.com > Settings > Alerts > Kubernetes, then select Create alert condition.
  2. To filter the alert to Kubernetes entities that only have the chosen attributes, select Filter.
  3. Select the threshold settings. For more on the Trigger an alert when... options, see Alert types.
  4. Select an existing alert policy, or create a new one.
  5. Select Create.

When an alert condition's threshold is triggered, New Relic sends a notification to the policy's notification channels.

infrastructure.newrelic.com > Settings > Alerts > Kubernetes > Create alert condition: Infrastructure includes alert conditions specific to Kubernetes.

Use alert types and thresholds

To use any of the available Kubernetes-specific alert criteria, select the Kubernetes alert type:

Kubernetes alert types Comments
Available pods are less than desired pods

This alert type monitors replica sets. The alert triggers if the number of available replicas (pods) for a deployment is less than the number of replicas you chose when creating the deployment. This can happen if there are not enough resources in your cluster to schedule all pods for a deployment. New Relic applies the alert conditions individually to each deployment matching the specified filter.

Container CPU usage

This alert type compares the CPU consumption of a container with the limit that you defined when it was created. The alert triggers if usage exceeds the threshold. Container CPU usage is defined as:

(CPU cores used / CPU cores limit) * 100

Container memory usage

This alert type compares the memory consumption of a container with the limit that was defined when it was created. The alert triggers if usage exceeds the threshold. Container memory usage is defined as:

(memory used / memory limit) * 100
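
To make the two usage formulas concrete, here is a small sketch that evaluates them for sample values. The helper name usage_percent is hypothetical. Pairing the formulas with the cpuUsedCores/cpuLimitCores and memoryUsedBytes/memoryLimitBytes attributes of K8sContainerSample is an assumption about which reported values feed the alert, not something this document states.

```shell
# Hypothetical helper: usage as a percentage of the limit, two decimals.
# Arguments: used, limit (same unit for both, e.g. cores or bytes).
usage_percent() {
  awk -v used="$1" -v limit="$2" 'BEGIN {
    if (limit <= 0) { print "n/a"; exit }   # no limit defined: ratio undefined
    printf "%.2f\n", (used / limit) * 100
  }'
}

usage_percent 0.25 0.5            # CPU: 0.25 of 0.5 cores -> 50.00
usage_percent 78643200 104857600  # memory: 75 MiB of a 100 MiB limit -> 75.00
```

Both formulas divide by the container's limit, so a container created without a limit has no meaningful percentage; the sketch prints n/a in that case rather than dividing by zero.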

In addition, you can create an alert condition for any metric collected by any New Relic integration you use, including the Kubernetes integration:

  1. Select the alert type Integrations.
  2. From the Select a data source dropdown, select a Kubernetes (K8s) data source.

Select alert notifications

When an alert condition's threshold is triggered, New Relic sends a message to the notification channel(s) chosen in the alert policy. Depending on the type of notification, you may have the following options:

The entity identifier that triggered the alert appears near the top of the notification message. The format of the identifier depends on the alert type:

  • Available pods are less than desired pods alerts:

    K8s:CLUSTER_NAME:PARENT_NAMESPACE:replicaset:REPLICASET_NAME
  • CPU or memory usage alerts:

    K8s:CLUSTER_NAME:PARENT_NAMESPACE:POD_NAME:container:CONTAINER_NAME

Here are some examples.

Pod alert notification example

For Available pods are less than desired pods alerts, the ID of the replica set triggering the issue might look like this:

k8s:beam-production:default:replicaset:nginx-deployment-1623441481

This identifier contains the following information:

  • Cluster name: beam-production
  • Parent namespace: default
  • ReplicaSet name: nginx-deployment-1623441481

Container resource notification example

For container CPU or memory usage alerts, the entity might look like this:

k8s:beam-production:kube-system:kube-state-metrics-797bb87c75-zncwn:container:kube-state-metrics

This identifier contains the following information:

  • Cluster name: beam-production
  • Parent namespace: kube-system
  • Pod name: kube-state-metrics-797bb87c75-zncwn
  • Container name: kube-state-metrics
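
The two identifier formats above can be split mechanically on the colon separator. The sketch below is one way to do that in a shell script, for example when routing notifications. The function name describe_entity_id and its output format are hypothetical, and it assumes that none of the names themselves contain a colon.

```shell
# Hypothetical helper: print the labeled fields of an entity identifier.
# Note: the variables it sets are global, as in any plain sh function.
describe_entity_id() {
  # Split on ":" into positional fields; unused trailing fields stay empty.
  IFS=: read -r _prefix cluster namespace f4 f5 f6 <<EOF
$1
EOF
  echo "cluster=$cluster"
  echo "namespace=$namespace"
  if [ "$f4" = "replicaset" ]; then
    echo "replicaset=$f5"            # "available pods" alerts
  elif [ "$f5" = "container" ]; then
    echo "pod=$f4"                   # CPU or memory usage alerts
    echo "container=$f6"
  else
    echo "unrecognized identifier format" >&2
    return 1
  fi
}

describe_entity_id "k8s:beam-production:default:replicaset:nginx-deployment-1623441481"
describe_entity_id "k8s:beam-production:kube-system:kube-state-metrics-797bb87c75-zncwn:container:kube-state-metrics"
```

The function tells the two formats apart by the position of the literal replicaset or container keyword, which mirrors the layouts shown above.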

Create alert conditions using NRQL

Follow standard procedures to create alert conditions for NRQL queries.

Kubernetes attributes and metrics

The Kubernetes integration collects the following metrics and other attributes. For more on using integration data, see Find and use data.

Node data

Query the K8sNodeSample event in New Relic Insights for node data:

Node attribute Description
clusterName Name that you assigned to the cluster when you installed the Kubernetes integration.
cpuUsedCoreMilliseconds Node CPU usage measured in core milliseconds.
cpuUsedCores Node CPU usage measured in cores.
fsAvailableBytes Bytes available in the node filesystem.
fsCapacityBytes Total capacity of the node filesystem in bytes.
fsInodes Total number of inodes in the node filesystem.
fsInodesFree Free inodes in the node filesystem.
fsInodesUsed Used inodes in the node filesystem.
fsUsedBytes Used bytes in the node filesystem.
memoryAvailableBytes Bytes of memory available in the node.
memoryMajorPageFaultsPerSecond Number of major page faults per second in the node.
memoryRssBytes Bytes of rss memory.
memoryUsedBytes Bytes of memory used.
memoryWorkingSetBytes Bytes of memory in the working set.
net.errorCountPerSecond Number of errors per second while receiving/transmitting over the network.
nodeName Host name of the node.
runtimeAvailableBytes Bytes available to the container runtime filesystem.
runtimeCapacityBytes Total capacity assigned to the container runtime filesystem in bytes.
runtimeInodes Total number of inodes in the container runtime filesystem.
runtimeInodesFree Free inodes in the container runtime filesystem.
runtimeInodesUsed Used inodes in the container runtime filesystem.
runtimeUsedBytes Used bytes in the container runtime filesystem.

Namespace data

Query the K8sNamespaceSample event in New Relic Insights for namespace data:

Namespace attribute Description
clusterName Name that you assigned to the cluster when you installed the Kubernetes integration.
createdAt Timestamp of the namespace when it was created.
namespace Name of the namespace to be used as an identifier.
label.LABEL_NAME Labels associated with your namespace, so you can filter and query for specific namespaces.
status Current status of the namespace. The value can be Active or Terminated.

Deployment data

Query the K8sDeploymentSample event in New Relic Insights for deployment data:

Deployment attribute Description
clusterName Name that you assigned to the cluster when you installed the Kubernetes integration.
createdAt Timestamp of when the deployment was created.
deploymentName Name of the deployment to be used as an identifier.
namespace Name of the namespace that the deployment belongs to.
label.LABEL_NAME Labels associated with your deployment, so you can filter and query for specific deployments.
podsAvailable Number of replicas that are currently available.
podsDesired Number of replicas that you defined in the deployment.
podsTotal Total number of replicas that are currently running.
podsUnavailable Number of replicas that are currently unavailable.
podsUpdated Number of replicas that have been updated to achieve the desired state of the deployment.
updatedAt Timestamp of when the deployment was updated.

Replica set data

Query the K8sReplicaSetSample event in New Relic Insights for replica set data:

Replica attribute Description
clusterName Name that you assigned to the cluster when you installed the Kubernetes integration.
createdAt Timestamp of when the replica set was created.
deploymentName Name of the deployment to be used as an identifier.
namespace Name of the namespace that the replica set belongs to.
observedGeneration Integer representing generation observed by the replica set.
podsDesired Number of replicas that you defined in the deployment.
podsFullyLabeled Number of pods that have labels that match the replica set pod template labels.
podsReady Number of replicas that are ready for this replica set.
podsTotal Total number of replicas that are currently running.
replicasetName Name of the replica set to be used as an identifier.

Pod data

Query the K8sPodSample event in New Relic Insights for pod data:

Pod attribute Description
clusterName Name that you assigned to the cluster when you installed the Kubernetes integration.
createdAt Timestamp of when the pod was created.
createdBy Name of the Kubernetes object that created the pod. For example, newrelic-infra.
createdKind Kind of Kubernetes object that created the pod. For example, DaemonSet.

deploymentName Name of the deployment to be used as an identifier.
isReady Boolean representing whether or not the pod is ready to serve requests.
isScheduled Boolean representing whether or not the pod has been scheduled to run on a node.
label.LABEL_NAME Labels associated with your pod, so you can filter and query for specific pods.
namespace Name of the namespace that the pod belongs to.
net.errorCountPerSecond Number of errors per second while receiving/transmitting over the network.
net.rxBytesPerSecond Number of bytes per second received over the network.
net.txBytesPerSecond Number of bytes per second transmitted over the network.
nodeIP Host IP address that the pod is running on.
nodeName Host name that the pod is running on.
podName Name of the pod to be used as an identifier.
startTime Timestamp of when the pod started running.
status Current status of the pod. The value can be Pending, Running, Succeeded, Failed, Unknown.

Container data

Query the K8sContainerSample event in New Relic Insights for container data:

Container attribute Description
clusterName Name that you assigned to the cluster when you installed the Kubernetes integration.
containerID Unique ID associated with the container. If you are running Docker, this is the Docker container id.
containerImage Name of the image that the container is running.
containerImageID Unique ID associated with the image that the container is running.
containerName Name associated with the container.
cpuLimitCores Integer representing limit CPU cores defined for the container in the pod specification.
cpuRequestedCores Requested CPU cores defined for the container in the pod specification.
cpuUsedCores CPU cores actually used by the container.
deploymentName Name of the deployment to be used as an identifier.
isReady Boolean. Whether or not the container's readiness check succeeded.
label.LABEL_NAME Labels associated with your container, so you can filter and query for specific containers.
memoryLimitBytes Integer representing limit bytes of memory defined for the container in the pod specification.
memoryRequestedBytes Integer. Requested bytes of memory defined for the container in the pod specification.
memoryUsedBytes Integer. Bytes of memory actually used by the container.
namespace Name of the namespace that the container belongs to.
nodeIP Host IP address the container is running on.
nodeName Host name that the container is running on.
podName Name of the pod that the container is in, to be used as an identifier.
restartCount Number of times the container has been restarted.
status Current status of the container. The value can be Running, Terminated, or Unknown.

Volume data

Query the K8sVolumeSample event in New Relic Insights for volume data:

Volume attribute Description
volumeName Name that you assigned to the volume at creation.
clusterName Cluster where the volume is configured.
namespace Namespace where the volume is configured.
podName The pod that the volume is attached to. The Kubernetes monitoring integration lists Volumes that are attached to a pod.
persistent If this is a persistent volume, this value is set to "true".
pvcNamespace Namespace where the Persistent Volume Claim is configured.
pvcName Name that you assigned to the Persistent Volume Claim at creation.
fsCapacityBytes Capacity of the volume in Bytes.
fsUsedBytes Usage of the volume in Bytes.
fsAvailableBytes Capacity available of the volume in Bytes.
fsUsedPercent Usage of the volume in Percentage.
fsInodes Total Inodes of the volume.
fsInodesUsed Inodes used in the volume.
fsInodesFree Inodes available in the volume.

Add Kubernetes metadata to APM-monitored applications

New Relic APM lets you add custom attributes, and that metadata is available in transaction traces. Custom attributes can provide information about the exact Kubernetes node, pod, or namespace where the transaction happened. This blog post explains how to add this type of Kubernetes metadata to APM-monitored application transactions.

Get logs and version

To generate verbose logs and get version and configuration information, follow these procedures.

Get verbose logs

For the Kubernetes integration, the Infrastructure agent adds a log entry only in the event of an error. Most common errors are displayed in the standard (non-verbose) logs. If you are doing a more in-depth investigation on your own or with New Relic Support, you can enable verbose mode.

Verbose mode will significantly increase the amount of information that is sent to log files. Temporarily enable this mode only for troubleshooting purposes, and reset the log level when finished.

To get verbose logging details:

  1. Enable verbose logging: In the deployment file, set the value of NRIA_VERBOSE to 1.
  2. Apply the configuration by running:

    kubectl apply -f your_newrelic_k8s.yaml

  3. Leave verbose mode on for a few minutes, or until you feel enough activity has occurred.
  4. Disable verbose mode: Set the NRIA_VERBOSE value back to 0, then apply the configuration again.
    
  5. Get a list of nodes in the environment:

    kubectl get nodes
    
  6. Get a list of Infrastructure and kube-state-metrics pods:

    kubectl get pods --all-namespaces -o wide | grep -E 'newrelic|kube-state-metrics'
    
  7. Get logs from the pod connecting to kube-state-metrics.
  8. Retrieve kube-state-metrics service configuration.

Get Infrastructure version

For the Kubernetes integration, the Infrastructure agent is distributed as a Docker image that contains the Infrastructure agent and the Kubernetes integration. The Docker image is tagged with a version, and the Infrastructure agent also has its own version.

When the agent is successfully sending information to New Relic, you can retrieve the versions of the Infrastructure agent for Kubernetes (the Docker image) you are running in your clusters by using the following Insights query:

FROM K8sContainerSample SELECT uniqueCount(entityId) WHERE containerName = 'newrelic-infra' facet clusterName, containerImage

insights.newrelic.com > Query > (create your query): In this result you can see that cluster aws-rbac-cluster is running version 1.0.0-beta4, and the other two clusters are running 1.0.0-beta4-2.

If the agent is not reporting any data:

  1. Get the version(s) of the New Relic integration for Kubernetes that you are running in a cluster using kubectl:

    kubectl get pods --all-namespaces -l name=newrelic-infra -o jsonpath="{.items..spec..containers..image}"
    
  2. Look for output similar to this:

    newrelic/infrastructure-k8s:1.0.0
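
If you want only the version from output like the line above, the tag can be separated from the image name with shell parameter expansion. This is a sketch under the assumption that each image reference ends in a :TAG suffix; the helper name image_tag is hypothetical.

```shell
# Hypothetical helper: print the tag portion of an image reference.
# Assumes the reference ends in ":TAG"; references pinned by digest
# (@sha256:...) or using a registry port without a tag are not handled.
image_tag() {
  case "$1" in
    *:*) echo "${1##*:}" ;;                      # keep text after the last ":"
    *)   echo "no tag in: $1" >&2; return 1 ;;
  esac
}

image_tag "newrelic/infrastructure-k8s:1.0.0"   # prints 1.0.0
```

When the kubectl command returns several images separated by spaces, the function can be called once per image in a for loop.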
    
Get kube-state-metrics version

To retrieve the version of kube-state-metrics running on your clusters, run the following Insights NRQL query:

FROM K8sContainerSample SELECT uniqueCount(entityId) WHERE containerName = 'kube-state-metrics' facet clusterName, containerImage

insights.newrelic.com > Query > (create your query): This example shows that the aws-rbac-cluster is running version 1.1.0 while the other two clusters are running version 1.2.0. New Relic officially supports only the kube-state-metrics versions listed under Compatibility and requirements.

Get logs from pod connecting to kube-state-metrics

To get the logs from the pod connecting to kube-state-metrics:

  1. Get the node that kube-state-metrics is running on:

    kubectl get pods --all-namespaces -o wide | grep kube-state-metrics
    

    Look for output similar to this:

    kube-system   kube-state-metrics-5c6f5cb9b5-pclhh     2/2       Running   4          4d        172.17.0.3   minikube
    
  2. Get the New Relic Infrastructure pod that is running on the same node as kube-state-metrics:

    kubectl describe node minikube | grep newrelic-infra
    

    Look for output similar to this:

    default                    newrelic-infra-5wcv6                     100m (5%)     0 (0%)      100Mi (5%)       100Mi (5%)
    
  3. Retrieve the logs for that pod by running:

    kubectl logs newrelic-infra-5wcv6
    
Retrieve kube-state-metrics service configuration

To retrieve the configuration:

  1. Run:

    kubectl get pods --all-namespaces | grep "kube-state-metrics"
    
  2. Look for a response similar to this:

    kube-system   kube-state-metrics-5c6f5cb9b5-5wf9m     2/2       Running       8          6d
    
  3. Note the namespace in the first column, then use it to retrieve the service configuration:

    kubectl describe service kube-state-metrics -n <namespace>
    

For troubleshooting help, see Not seeing data or Error messages.

Uninstall

Each cluster will have a single node where kubectl is running. To uninstall the Kubernetes integration, use the following command on each of these nodes:

kubectl delete -f newrelic-infrastructure-k8s-latest.yaml
