VMware vSphere monitoring integration

New Relic's VMware vSphere integration helps you understand the health and performance of your vSphere environment. You can:

  • Query data to get insights on the performance of your hypervisors, virtual machines, and more.
  • Go from high level views down to the most granular data.
Sample dashboard - VMware vSphere Integration
vSphere data visualized in a New Relic dashboard: operating systems, status, average CPU and memory consumption, and more.

Our integration uses the vSphere API to collect metrics and events generated by all of vSphere's components, and forwards the data to our platform via the infrastructure agent.

Why it matters

With our vSphere integration you can:

  • Instrument and monitor multiple vSphere instances using the same account.

  • Collect data on snapshots, VMs, hosts, resource pools, clusters, and datastores, including tags.

  • Monitor the health of your hypervisors and VMs using our charts and dashboards.

  • Use the data retrieved to monitor key performance and key capacity scaling indicators.

  • Set alerts based on any metrics collected from vCenter.

  • Create workloads to group resources and focus on key data.

    vSphere data in New Relic Workloads
    You can create workloads using data collected via the vSphere integration.

Compatibility and requirements

Our integration is compatible with VMware vSphere 6.5 or higher.

Before installing the integration, make sure that you meet the following requirements:

In large environments, where the number of virtual machines is greater than 800, the integration is not currently able to report all data and might fail.

There is a known workaround for these environments that preserves all metrics and events but disables entity registration. To apply the workaround, add the following environment variables to the configuration file: EVENTS: true and METRICS: true.

Install and activate

To install the vSphere integration, choose your setup:

Linux installation
  1. Follow the instructions for installing an integration, using the file name nri-vsphere.
  2. Change the directory to the integrations folder:
    cd /etc/newrelic-infra/integrations.d
  3. Create a copy of the sample configuration file:
    sudo cp vsphere-config.yml.sample vsphere-config.yml
  4. Edit the vsphere-config.yml file as described in the configuration settings.
  5. Restart the infrastructure agent.
Windows installation
  1. Download the nri-vsphere MSI installer image from:

    https://download.newrelic.com/infrastructure_agent/windows/integrations/nri-vsphere/nri-vsphere-amd64.msi

  2. To install from the Windows command prompt, run:
    msiexec.exe /qn /i PATH\TO\nri-vsphere-amd64.msi
  3. In the Integrations directory, C:\Program Files\New Relic\newrelic-infra\integrations.d\, create a copy of the sample configuration file by running:

    copy vsphere-config.yml.sample vsphere-config.yml
  4. Edit the vsphere-config.yml file as described in the configuration settings.
  5. Restart the infrastructure agent.
Tarball installation (advanced)

You can also install the integration from a tarball file. This gives you full control over the installation and configuration process.

Configure the integration

An integration's YAML-format configuration is where you can place required login credentials and configure how data is collected. Which options you change depends on your setup and preference.

To configure the vSphere integration, you must define the URL of the vSphere API endpoint, and your vSphere username and password. For configuration examples, see the sample configuration files.

Some features of the vSphere integration are optional and can be enabled via configuration settings.

With secrets management, you can configure on-host integrations with New Relic Infrastructure's agent to use sensitive data (such as passwords) without having to write them as plain text into the integration's configuration file. For more information, see Secrets management.

Performance metrics provide a better understanding of the current status of VMware resources. They can be collected in addition to the metrics that are collected by default and included in the samples described at the bottom of the page.

All metrics collected are included in the corresponding sample with the perf. prefix attached to the name. For example, net.packetsRx.summation is collected and sent as perf.net.packetsRx.summation.
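The naming rule above can be sketched in a few lines of Python. This is only an illustration of the perf. prefix convention, not the integration's code; the helper names and the sample values are hypothetical.

```python
PERF_PREFIX = "perf."


def perf_attribute_name(metric_name: str) -> str:
    """Return the attribute name a vSphere performance metric is reported under."""
    return PERF_PREFIX + metric_name


def perf_attributes(sample: dict) -> dict:
    """Keep only the performance metrics (perf.-prefixed keys) of a sample."""
    return {k: v for k, v in sample.items() if k.startswith(PERF_PREFIX)}


# Hypothetical sample row; the values are made up for illustration.
sample = {
    "vmConfigName": "vm-01",
    "mem.usage": 2048,
    perf_attribute_name("net.packetsRx.summation"): 1200,
}
print(perf_attributes(sample))
```

Filtering on the prefix like this is one way to separate performance metrics from the default attributes when post-processing exported sample data.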

To collect vSphere performance metrics, use the ENABLE_VSPHERE_PERF_METRICS environment variable.

Data is collected according to the settings in the vsphere-performance.metrics configuration file. You can override the location of the performance metrics config file using the PERF_METRIC_FILE environment variable. Note that the integration follows VMware's data collection levels (1 to 4).

When ENABLE_VSPHERE_PERF_METRICS is set, all level 1 metrics are collected. The data collection level of the performance metrics collected can be modified using PERF_LEVEL. Each metric in the config file can be commented out and new ones can be added if needed.

Collection of performance data can increase the load in vCenter and the time needed to collect data. We recommend including only the metrics you need in the configuration file.

To fine-tune data collection, the number of entities and metrics retrieved per request can be modified using BATCH_SIZE_PERF_ENTITIES and BATCH_SIZE_PERF_METRICS.
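To get a feel for how the two batch sizes interact, here is a minimal Python sketch of generic batching. It assumes, purely for illustration, that each request carries one batch of entities and one batch of metrics; the integration's real request strategy may differ, and the function names and metric names below are not taken from it.

```python
from typing import Iterator, List, Sequence, Tuple


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_requests(
    entities: Sequence[str],
    metrics: Sequence[str],
    batch_entities: int,
    batch_metrics: int,
) -> List[Tuple[List[str], List[str]]]:
    """Pair every entity batch with every metric batch (assumed: one pair per request)."""
    return [
        (entity_batch, metric_batch)
        for entity_batch in chunks(entities, batch_entities)
        for metric_batch in chunks(metrics, batch_metrics)
    ]


# Illustrative inputs only.
vms = [f"vm-{i}" for i in range(5)]
metrics = ["cpu.usage.average", "mem.usage.average", "net.packetsRx.summation"]
requests = plan_requests(vms, metrics, batch_entities=2, batch_metrics=2)
print(len(requests))  # 3 entity batches x 2 metric batches = 6
```

Under this assumption, larger batch sizes mean fewer but heavier requests, which is the trade-off to keep in mind when tuning the two settings.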

For more information on vSphere performance metrics, see the VMware documentation.

To collect vSphere events, use the ENABLE_VSPHERE_EVENTS environment variable.

The integration collects events between the current time and the last fetched event for each datacenter. It stores the information regarding the last fetched event in a cache that is updated after each execution. Events are only available if the integration is connected to a vCenter and not directly to an ESXi host.
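The collection window described above can be sketched as follows. This is a simplified Python model under stated assumptions: the cache is an in-memory dict keyed by datacenter name, events are (timestamp, summary) tuples, and the function name is hypothetical. The integration's actual cache format is not documented here.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Event = Tuple[datetime, str]  # (creation time, summary)


def fetch_new_events(
    cache: Dict[str, datetime],
    datacenter: str,
    events: List[Event],
    now: datetime,
) -> List[Event]:
    """Return events newer than the last fetched one, up to `now`, and update the cache."""
    last_seen = cache.get(datacenter)
    fresh = sorted(
        e for e in events
        if (last_seen is None or e[0] > last_seen) and e[0] <= now
    )
    if fresh:
        # Remember the newest event so the next run starts after it.
        cache[datacenter] = fresh[-1][0]
    return fresh


utc = timezone.utc
cache: Dict[str, datetime] = {}
events = [
    (datetime(2020, 7, 14, 9, 0, tzinfo=utc), "User logged in"),
    (datetime(2020, 7, 14, 9, 3, tzinfo=utc), "User logged out"),
]
now = datetime(2020, 7, 14, 9, 5, tzinfo=utc)
first = fetch_new_events(cache, "Prod Datacenter", events, now)
second = fetch_new_events(cache, "Prod Datacenter", events, now)
print(len(first), len(second))  # 2 0
```

The second call returns nothing because the cache already points at the newest event, which is why repeated executions do not report the same event twice in this model.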

The number of events collected per request can be tuned by modifying EVENTS_PAGE_SIZE, which is set to 100 by default.

Events are available in the Events page and can be queried via NRQL as InfrastructureEvent under vSphereEvent. Here is an example of vSphere events data:

"summary": "User dcui@127.0.0.1 logged out (login time: Tuesday, 14 July, 2020 08:32:09 AM, number of API invocations: 0, user agent: VMware-client/6.5.0)",
"vSphereEvent.computeResource": "cluster1",
"vSphereEvent.datacenter": "Prod Datacenter",
"vSphereEvent.date": "Tue, 14 Jul 2020 09:03:51 UTC",
"vSphereEvent.host": "192.168.0.230",
"vSphereEvent.userName": "dcui"

To collect snapshot data, use the ENABLE_VSPHERE_SNAPSHOTS environment variable.

Snapshot data can be found in VSphereSnapshotVmSample. Collected data covers total and unique space occupied by disk and memory files, snapshot tree, and creation time.

You can use this information to create NRQL queries, dashboards, and alerts, since it's linked to the corresponding virtual machine entity.

To collect vSphere tags, use the ENABLE_VSPHERE_TAGS environment variable.

Tags are available as attributes in the corresponding entity sample as label.tagCategory:tagName.

If two tags of the same category are assigned to a resource, they are added to a unique attribute separated by a pipe character. For example: label.tagCategory:tagName|tagName2.
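As an illustration of the attribute format, the following Python sketch groups tags by category and joins names with a pipe. It reads label.tagCategory:tagName as an attribute named label.tagCategory with the value tagName, which is an interpretation of the notation above; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tag_attributes(tags: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build label.<category> attributes, joining same-category names with '|'."""
    names_by_category: Dict[str, List[str]] = defaultdict(list)
    for category, name in tags:
        names_by_category[category].append(name)
    return {
        f"label.{category}": "|".join(names)
        for category, names in names_by_category.items()
    }


print(tag_attributes([("region", "us"), ("region", "eu"), ("team", "storage")]))
```

When consuming such attributes, splitting the value on the pipe character recovers the individual tag names.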

Tags can be used to run NRQL queries, filter entities in the entity explorer, and to create dashboards and alerts.

Resource filtering allows you to specify which resources you want to monitor by declaring a set of tags that resources must have in order to be monitored.

Resources require a match on any (one or more) of the filter tags in order to be included. If none of the resource tags match any of the filter tags, no information about that resource is sent to New Relic.

To filter resources by tag, you must have the ENABLE_VSPHERE_TAGS environment variable enabled.

A tag filter expression is a space-separated list of pairs of strings with the format category=name.

For example, to retrieve only resources tagged with the category region and the names us or eu, use a filter expression like: region=us region=eu

INCLUDE_TAGS: >
  region=us
  region=eu

To enable resource filtering by tag, edit your integration configuration file and add the option INCLUDE_TAGS with the filter expression you want.

Note that datacenter resources acting as the root of the resource tree MUST have tags attached AND match the filter expression in order for other child resources to be fetched.
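The matching rule for individual resources (a resource is included when any of its tags matches any filter pair) can be sketched in Python as below. This models only the per-resource check, not the datacenter requirement, and the function names are hypothetical rather than part of the integration.

```python
from typing import Iterable, Set, Tuple

Tag = Tuple[str, str]  # (category, name)


def parse_tag_filter(expression: str) -> Set[Tag]:
    """Parse a whitespace-separated list of category=name pairs."""
    pairs: Set[Tag] = set()
    for token in expression.split():
        category, sep, name = token.partition("=")
        if not (category and sep and name):
            raise ValueError(f"expected category=name, got {token!r}")
        pairs.add((category, name))
    return pairs


def is_included(resource_tags: Iterable[Tag], filters: Set[Tag]) -> bool:
    """A resource is kept when at least one of its tags matches a filter pair."""
    return any(tag in filters for tag in resource_tags)


# Splitting on any whitespace accepts both single-line and multi-line expressions.
filters = parse_tag_filter("region=us\nregion=eu\n")
print(is_included([("region", "eu"), ("env", "prod")], filters))  # True
print(is_included([("region", "asia")], filters))  # False
print(is_included([], filters))  # False: untagged resources never match
```

The last case shows why untagged resources drop out once a filter is set: with no tags, there is nothing that can match.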

If you connect the integration directly to the ESXi host, vCenter data is not available (for example, events, tags, or datacenter metadata).

Here are examples of the vSphere integration configuration, including performance metrics:

For more information, see our documentation about the general structure of on-host integration configurations.

The configuration option inventory_source is not compatible with this integration.

Update your integration

On-host integrations do not automatically update.

For best results, regularly update the integration package and the infrastructure agent.

View and use data

Data from this service is reported to an integration dashboard. You can query this data for troubleshooting purposes or to create charts and dashboards.

vSphere data is attached to these event types:

  • VSphereHostSample
  • VSphereClusterSample
  • VSphereVmSample
  • VSphereDatastoreSample
  • VSphereDatacenterSample
  • VSphereResourcePoolSample
  • VSphereSnapshotVmSample

Performance data is enabled and configured separately (see Enable and configure performance metrics).

For more on how to view and use your data, see Understand integration data.

Metric data

The vSphere integration provides metric data attached to the following New Relic events:

VSphereHost

Name Description
cpu.totalMHz Sum of the MHz for all the individual cores on the host
cpu.coreMHz Speed of the CPU cores
cpu.available Amount of free CPU MHz in the host
cpu.overallUsage CPU usage across all cores on the host in MHz
cpu.percent Percentage of CPU utilization in the host
cpu.cores Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package
cpu.threads Number of physical CPU threads on the host
disk.totalMiB Total capacity of disks mounted in host, in MiB
mem.free Amount of available memory in the host, in MiB
mem.usage Amount of used memory in the host, in MiB
mem.size Total memory capacity of the host, in MiB
vmCount Number of virtual machines in the host
hypervisorHostname Name of the host
uuid The hardware BIOS identification
datacenterName Name of the datacenter related to the host
clusterName Name of the cluster related to the host
resourcePoolNameList List of names of the resource pools related to the host
datastoreNameList List of names of datastores related to the host
datacenterLocation Datacenter location
networkNameList List of names of networks related to the host
overallStatus
  • gray: Status is unknown
  • green: Entity is OK
  • yellow: Entity might have a problem
  • red: Entity definitely has a problem
connectionState The host connection state:
  • connected: Connected to the server. For ESX Server, this is the default setting.
  • disconnected: The user has explicitly taken the host down. VirtualCenter does not expect to receive heartbeats from the host. The next time a heartbeat is received, the host is moved to the connected state again and an event is logged.
  • notResponding: VirtualCenter is not receiving heartbeats from the server. The state automatically changes to connected once heartbeats are received again. This state is typically used to trigger an alarm on the host.
inMaintenanceMode The flag to indicate whether or not the host is in maintenance mode. This flag is set when the host has entered the maintenance mode. It is not set during the entering phase of maintenance mode.
inQuarantineMode The flag to indicate whether or not the host is in quarantine mode. InfraUpdateHa recommends setting this flag based on the HealthUpdates received by the HealthUpdateProviders configured for the cluster. A host that is reported as degraded will be recommended to enter quarantine mode, while a host that is reported as healthy will be recommended to exit quarantine mode. Execution of these recommended actions will set this flag. Hosts in quarantine mode will be avoided by vSphere DRS as long as the increased consolidation in the cluster does not negatively affect VM performance.
powerState The host power state:
  • poweredOff: The host was specifically powered off by the user through VirtualCenter. This state is not a certain state, because after VirtualCenter issues the command to power off the host, the host might crash, or kill all the processes but fail to power off.
  • poweredOn: The host is powered on. A host that is entering standby mode is also in this state.
  • standBy: The host was specifically put in standby mode, either explicitly by the user or automatically by DPM. This state is not a certain state, because after VirtualCenter issues the command to put the host in standby state, the host might crash, or kill all the processes but fail to power off. A host that is exiting standby mode is also in this state.
  • unknown: If the host is disconnected or notResponding, its power state cannot be known, so the host is marked as unknown.
standbyMode The host’s standby mode. The property is only populated by vCenter server. If queried directly from the ESX host, the property is unset.
  • entering: The host is entering standby mode.
  • exiting: The host is exiting standby mode.
  • in: The host is in standby mode.
  • none: The host is not in standby mode, and it is not in the process of entering or exiting standby mode.
cryptoState Encryption state of the host. Valid values are enumerated by the CryptoState type:
  • incapable: The host is not safe for receiving sensitive material.
  • prepared: The host is prepared for receiving sensitive material but does not have a host key set yet.
  • safe: The host is crypto safe and has a host key set.
bootTime The time when the host was booted.

VSphereVm

Name Description
mem.size Memory size of the virtual machine, in MiB
mem.usage Guest memory utilization statistics, in MiB. This is also known as active guest memory. The value can range between 0 and the configured memory size of the virtual machine. Valid while the virtual machine is running.
mem.free Guest memory available, in MiB. The value can range between 0 and the configured memory size of the virtual machine. Valid while the virtual machine is running.
mem.ballooned The size of the balloon driver in the virtual machine, in MiB. The host will inflate the balloon driver to reclaim physical memory from the virtual machine. This is a sign that there is memory pressure on the host.
mem.swapped The portion of memory, in MiB, that is granted to this virtual machine from the host's swap space. This is a sign that there is memory pressure on the host.
mem.swappedSsd The amount of memory swapped to fast disk device such as SSD, in MiB
cpu.allocationLimit Resource limits for CPU, in MHz. If set to -1, there is no fixed allocation limit.
cpu.overallUsage Basic CPU performance statistics, in MHz. Valid while the virtual machine is running.
cpu.hostUsagePercent Percent of the host CPU used by the virtual machine. In case a limit is configured, the percentage is calculated by taking the limit as the total.
cpu.cores Number of processors in the virtual machine
disk.totalMiB Total storage space, committed to this virtual machine across all datastores, in MiB
ipAddress Primary guest IP address, if available
ipAddresses List of IPs associated with the VM (except ipAddress). A pipe or vertical bar character (|) is used as a separator.
connectionState Indicates whether or not the virtual machine is available for management:
  • connected: Server has access to the virtual machine.
  • disconnected: Server is currently disconnected from the virtual machine, since its host is disconnected.
  • inaccessible: One or more of the virtual machine configuration files are inaccessible.
  • invalid: The virtual machine configuration format is invalid.
  • orphaned: The virtual machine is no longer registered on its associated host.
powerState The current power state of the virtual machine: poweredOff, poweredOn, or suspended.
guestHeartbeatStatus
  • gray: Status is unknown.
  • green: Entity is OK.
  • yellow: Entity might have a problem.
  • red: Entity definitely has a problem.
operatingSystem Operating system of the virtual machine
guestFullName Guest operating system full name, if available from guest tools
hypervisorHostname Name of the host where the virtual machine is running
instanceUuid Unique identification of the virtual machine
datacenterName Name of the datacenter
clusterName Name of the cluster
resourcePoolNameList List of names of the resource pools
datastoreNameList List of names of datastores
networkNameList List of names of networks
datacenterLocation Datacenter location
overallStatus
  • gray: Status is unknown.
  • green: Entity is OK.
  • yellow: Entity might have a problem.
  • red: Entity definitely has a problem.
disk.suspendMemory Size of the snapshot file, in bytes
disk.suspendMemoryUnique Size of the snapshot file, unique blocks, in bytes
disk.totalUncommittedMiB Additional storage space potentially used by this virtual machine on all datastores, in MiB. Essentially an aggregate of the property uncommitted across all datastores that this virtual machine is located on.
disk.totalUnsharedMiB Total storage space occupied by the virtual machine across all datastores that is not shared with any other virtual machine, in MiB
mem.hostUsage Host memory usage, in MiB
resourcePoolName Resource pool name
vmConfigName VM config name
vmHostname VM hostname

VSphereDatastore

Name Description
capacity Maximum capacity of this datastore, in GiB, if accessible is true
freeSpace Available space of this datastore, in GiB, if accessible is true
uncommitted Total additional storage space, potentially used by all virtual machines on this datastore, in GiB, if accessible is true
vmCount Number of virtual machines attached to the datastore
datacenterLocation Datacenter location
datacenterName Datacenter name
hostCount Number of hosts attached to the datastore
overallStatus
  • gray: Status is unknown.
  • green: Entity is OK.
  • yellow: Entity might have a problem.
  • red: Entity definitely has a problem.
accessible Connectivity status of the datastore. If this is set to false, the datastore is not accessible.
url Unique locator for the datastore, if accessible is true
fileSystemType Type of file system volume, such as VMFS or NFS
name Name of the datastore
nas.remoteHost Host that runs the NFS/CIFS server
nas.remotePath Remote path of NFS/CIFS mount point

VSphereDatacenter

Name Description
datastore.totalUsedGiB Total used space in the datastores, in GiB
datastore.totalFreeGiB Total free space in the datastores, in GiB
datastore.totalGiB Total size of the datastores, in GiB
cpu.cores Total CPU count per datacenter
cpu.overallUsagePercentage Total CPU usage, in percentage
cpu.overallUsage Total CPU usage, in MHz
cpu.totalMHz Total CPU capacity, in MHz
mem.usage Total memory usage, in MiB
mem.size Total memory, in MiB
mem.usagePercentage Total memory usage as percentage
clusters Total cluster count per datacenter
resourcePools Total resource pools per datacenter
datastores Total datastores per datacenter
networks Total network adapter count per datacenter
overallStatus
  • gray: Status is unknown
  • green: Entity is OK
  • yellow: Entity might have a problem
  • red: Entity definitely has a problem
hostCount Total host system count per datacenter
vmCount Total virtual machines count per datacenter

VSphereResourcePool

Name Description
cpu.TotalMHz Resource pool CPU total capacity, in MHz
cpu.overallUsage Resource pool CPU usage, in MHz
mem.size Resource pool total memory reserved, in MiB
mem.usage Resource pool memory usage, in MiB
mem.free Resource pool memory available, in MiB
mem.ballooned Size of the balloon driver in the resource pool, in MiB
mem.swapped Portion of memory, in MiB, that is granted to this resource pool from the host's swap space
vmCount Number of virtual machines in the resource pool
overallStatus
  • gray: Status is unknown.
  • green: Entity is OK.
  • yellow: Entity might have a problem.
  • red: Entity definitely has a problem.
resourcePoolName Name of the resource pool
datacenterLocation Datacenter location
datacenterName Name of the datacenter
clusterName Name of the cluster

VSphereCluster

Name Description
cpu.totalEffectiveMHz Effective CPU resources, in MHz, available to virtual machines. This is the aggregated effective resource level from all running hosts. Hosts that are in maintenance mode or are unresponsive are not counted. Resources used by the VMware Service Console are not included in the aggregate. This value represents the amount of resources available for the root resource pool for running virtual machines.
cpu.totalMHz Aggregated CPU resources of all hosts, in MHz. It does not filter out CPU used by the system or related to hosts under maintenance.
cpu.cores Number of physical CPU cores. Physical CPU cores are the processors contained by a CPU package.
cpu.threads Aggregated number of CPU threads.
mem.size Aggregated memory resources of all hosts, in MiB. It does not filter out memory used by system or related to hosts under maintenance.
mem.effectiveSize Effective memory resources, in MiB, available to run virtual machines. This is the aggregated effective resource level from all running hosts. Hosts that are in maintenance mode or are unresponsive are not counted. Resources used by the VMware Service Console are not included in the aggregate. This value represents the amount of resources available for the root resource pool for running virtual machines.
effectiveHosts Total number of effective hosts. This number excludes hosts under maintenance.
hosts Total number of hosts
overallStatus
  • gray: Status is unknown.
  • green: Entity is OK.
  • yellow: Entity might have a problem.
  • red: Entity definitely has a problem.
datastoreList List of datastores used by the cluster. A pipe or vertical bar character (|) is used as a separator.
hostList List of hosts belonging to the cluster. A pipe or vertical bar character (|) is used as a separator.
networkList List of networks attached to the cluster. A pipe or vertical bar character (|) is used as a separator.
drsConfig.vmotionRate Threshold for generated ClusterRecommendations. DRS generates only those recommendations that are above the specified vmotionRate. Ratings vary from 1 to 5. This setting applies to manual, partiallyAutomated, and fullyAutomated DRS clusters.
dasConfig.restartPriorityTimeout Maximum time, in seconds, the lower priority VMs should wait for the higher priority VMs to be ready
datacenterName Datacenter name
datacenterLocation Datacenter location
drsConfig.enabled Flag indicating whether or not the service is enabled
drsConfig.enableVmBehaviorOverrides Flag that dictates whether DRS behavior overrides for individual virtual machines (ClusterDrsVmConfigInfo) are enabled
drsConfig.defaultVmBehavior Specifies the cluster-wide default DRS behavior for virtual machines. You can override the default behavior for a virtual machine by using the ClusterDrsVmConfigInfo object.
dasConfig.enabled Flag to indicate whether or not the vSphere HA feature is enabled
dasConfig.admissionControlEnabled Flag that determines whether strict admission control is enabled
dasConfig.isolationResponse Indicates whether or not the virtual machine should be powered off if a host determines that it is isolated from the rest of the compute resource
dasConfig.restartPriority Restart priority for a virtual machine
dasConfig.hostMonitoring Determines whether HA restarts virtual machines after a host fails
dasConfig.vmMonitoring Level of HA Virtual Machine Health Monitoring Service
dasConfig.vmComponentProtecting Indicates whether the vSphere HA VM Component Protection service is enabled
dasConfig.hbDatastoreCandidatePolicy The policy on what datastores will be used by vCenter Server to choose heartbeat datastores: allFeasibleDs, allFeasibleDsWithUserPreference, userSelectedDs

VSphereSnapshotVm

Name Description
snapshotTreeInfo Tree info for the snapshot. For example: Cluster:Vm:Snapshot1:Snapshot2
name Snapshot name
creationTime Snapshot creation time
powerState The power state of the virtual machine when this snapshot was taken
snapshotId The unique identifier that distinguishes this snapshot from other snapshots of the virtual machine
quiesced Flag to indicate whether or not the snapshot was created with the "quiesce" option, ensuring a consistent state of the file system
backupManifest The relative path from the snapshotDirectory pointing to the backup manifest. Available for certain quiesced snapshots only.
description Description of the snapshot
replaySupported Flag to indicate whether this snapshot is associated with a recording session on the virtual machine that can be replayed
totalMemoryInDisk Total size of memory in disk
totalUniqueMemoryInDisk Total size of the file corresponding to the file blocks that were allocated uniquely to store memory. In other words, if the underlying storage supports sharing of file blocks across disk files, the property corresponds to the size of the file blocks that were allocated only in context of this file, i.e. it does not include shared blocks that were allocated in other files. This property will be unset if the underlying implementation is unable to compute this information.
totalDisk Total size of snapshot files in disk
totalUniqueDisk Total size of the file corresponding to the file blocks that were allocated uniquely to store snapshot data in disk. In other words, if the underlying storage supports sharing of file blocks across disk files, the property corresponds to the size of the file blocks that were allocated only in context of this file, i.e. it does not include shared blocks that were allocated in other files. This property will be unset if the underlying implementation is unable to compute this information.
datastorePathDisk Disk file path in the datastore
datastorePathMemory Memory file path in the datastore
