Google Cloud Dataproc monitoring integration

BETA

Access to this feature depends on your subscription level. Requires Infrastructure Pro.

New Relic Infrastructure integrations include one that reports your GCP Dataproc data to our products. This document explains how to activate the integration and what data it collects.

Activate integration

To enable the integration, follow the standard procedures to connect your GCP service to New Relic Infrastructure.

Configuration and polling

You can change the polling frequency and filter data using configuration options.

Default polling information for the GCP Dataproc integration:

  • New Relic polling interval: 5 minutes
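As a rough illustration of what the default interval implies, the following sketch counts how many polls fit in a given time window. The function name `expected_polls` is hypothetical, and the calculation assumes exactly one poll per interval with no missed or delayed polls.

```python
from datetime import timedelta

# Default polling interval stated above.
POLLING_INTERVAL = timedelta(minutes=5)


def expected_polls(window: timedelta, interval: timedelta = POLLING_INTERVAL) -> int:
    """Return how many whole polling intervals fit in the window.

    Assumption: one poll per interval, with no missed or delayed polls.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Dividing two timedeltas with // yields an integer count.
    return window // interval


polls_per_hour = expected_polls(timedelta(hours=1))
polls_per_day = expected_polls(timedelta(days=1))
```

If you change the polling frequency, pass the new interval as the second argument to see how the count changes.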

Find and use data

To find your integration data in Infrastructure, go to infrastructure.newrelic.com > GCP and select an integration.

Data is attached to the following event type:

Entity    Event Type                  Provider
Cluster   GcpDataprocClusterSample    GcpDataprocCluster

For more on how to use your data, see Understand and use integration data.
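One way to explore the event type above is with an NRQL query. The sketch below only builds the query string; it does not call any New Relic API. It assumes that each sample carries a `provider` attribute holding the Provider value from the table, and the helper name `build_sample_query` is hypothetical.

```python
# Event type and provider as listed in the table above.
EVENT_TYPE = "GcpDataprocClusterSample"
PROVIDER = "GcpDataprocCluster"


def build_sample_query(since_minutes: int = 30, limit: int = 10) -> str:
    """Build an NRQL query that lists recent Dataproc cluster samples.

    Assumption: samples expose a `provider` attribute matching PROVIDER.
    """
    if since_minutes <= 0 or limit <= 0:
        raise ValueError("since_minutes and limit must be positive")
    return (
        f"SELECT * FROM {EVENT_TYPE} "
        f"WHERE provider = '{PROVIDER}' "
        f"SINCE {since_minutes} minutes ago LIMIT {limit}"
    )


query = build_sample_query()
```

You can paste the resulting string into the query builder to inspect which attributes your account actually receives before you write more specific queries.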

Metric data

This integration collects GCP Dataproc data for Cluster.

Dataproc Cluster data

Metric Unit Description

cluster.hdfs.datanodes

Count Indicates the number of HDFS DataNodes that are running inside a cluster.

cluster.Hdfs.StorageCapacity

Gibibytes Indicates the capacity of the HDFS system running on the cluster, in GiB.

cluster.Hdfs.StorageUtilization

Percent The percentage of HDFS storage currently used.

cluster.Hdfs.UnhealthyBlocks

Count Indicates the number of unhealthy blocks inside the cluster.

cluster.Job.CompletionTime

Seconds The time jobs took to complete from the time the user submits a job to the time Dataproc reports it is completed.

cluster.job.duration

Seconds The time jobs have spent in a given state.

cluster.job.Failures

Count Indicates the number of jobs that have failed on a cluster.

cluster.Job.Running

Count Indicates the number of jobs that are running on a cluster.

cluster.Job.Submitted

Count Indicates the number of jobs that have been submitted to a cluster.

cluster.Operation.CompletionTime

Seconds The time operations took to complete from the time the user submits an operation to the time Dataproc reports it is completed.

cluster.operation.duration

Seconds The time operations have spent in a given state.

cluster.operation.Failures

Count Indicates the number of operations that have failed on a cluster.

cluster.Operation.Running

Count Indicates the number of operations that are running on a cluster.

cluster.Operation.Submitted

Count Indicates the number of operations that have been submitted to a cluster.

cluster.Yarn.AllocatedMemoryPercentage

Percent The percentage of YARN memory that is allocated.

cluster.yarn.apps

Count Indicates the number of active YARN applications.

cluster.yarn.containers

Count Indicates the number of YARN containers.

cluster.Yarn.MemorySize

Gibibytes Indicates the YARN memory size in GiB.

cluster.yarn.nodemanagers

Count Indicates the number of YARN NodeManagers running inside the cluster.

cluster.Yarn.PendingMemorySize

Gibibytes The current memory request, in GiB, that is pending to be fulfilled by the scheduler.

cluster.Yarn.VirtualCores

Count Indicates the number of virtual cores in YARN.
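The metrics above can be combined into derived values. The sketch below shows two possibilities: estimating used HDFS storage from capacity and utilization, and computing a job failure ratio. It assumes that utilization is reported on a 0 to 100 scale and that the failure and submission counts cover the same time window. The function names and input values are hypothetical, not real cluster data.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def hdfs_used_gib(capacity_gib: float, utilization_percent: float) -> float:
    """Estimate used HDFS storage in GiB.

    Assumption: utilization_percent is on a 0-100 scale.
    """
    if capacity_gib < 0:
        raise ValueError("capacity_gib must not be negative")
    if not 0 <= utilization_percent <= 100:
        raise ValueError("utilization_percent must be between 0 and 100")
    return capacity_gib * utilization_percent / 100


def job_failure_ratio(failures: int, submitted: int) -> float:
    """Return failed jobs as a fraction of submitted jobs.

    Assumption: both counts cover the same time window.
    """
    if failures < 0 or submitted < 0:
        raise ValueError("counts must not be negative")
    if submitted == 0:
        return 0.0  # no jobs submitted, so nothing could fail
    return failures / submitted


# Hypothetical readings, for illustration only.
used_gib = hdfs_used_gib(capacity_gib=500.0, utilization_percent=40.0)
used_bytes = used_gib * GIB
failure_ratio = job_failure_ratio(failures=3, submitted=60)
```

Check the scale and aggregation of the values your account reports before relying on derived figures like these.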

For more help

Recommendations for learning more: