
Databricks integration

The New Relic Databricks integration can collect telemetry from Spark running on Databricks as well as from any Spark deployment that is not running on Databricks.

By default, the integration automatically connects to and collects telemetry from Spark deployments in all clusters created through the UI or API in the specified workspace. This integration supports the Collect Spark telemetry capability.

Set up the integration

This integration uses a standalone tool from the New Relic experimental repository. It can run on a host, or locally for testing, and supports these host platforms:

  • Linux amd64
  • Windows amd64

Deploy on-host

To deploy this integration on a host (example: EC2), follow these steps:

  1. Download the appropriate archive for your platform from the latest release.

  2. Extract the archive to a new or existing directory.

  3. Create a directory named configs in the same directory.

  4. Create a file named config.yml in the configs directory and copy the contents of the configs/config.template.yml file in this repository into it.

  5. Edit the config.yml file to configure the integration appropriately for your environment. (A hedged sketch of a minimal configuration follows the run command below.)

  6. From the directory where the archive was extracted, execute the integration binary with the following command, adding any command-line options as necessary:

```bash
# Linux
./newrelic-databricks-integration

# Windows
.\newrelic-databricks-integration.exe
```
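Step 4 copies the template configuration into place. The sketch below shows what that setup might look like; the configuration keys are assumptions, so copy configs/config.template.yml from the repository rather than relying on these exact names.

```bash
# Create the configs directory next to the extracted binary (steps 3-4).
mkdir -p configs

# Write a minimal config.yml. These keys are illustrative assumptions;
# the authoritative template is configs/config.template.yml in the repository.
cat > configs/config.yml <<'EOF'
licenseKey: YOUR_NEW_RELIC_LICENSE_KEY    # assumption: New Relic ingest key
interval: 60                              # assumption: collection interval, seconds
databricks:
  workspaceHost: dbc-1234-example.cloud.databricks.com  # hypothetical workspace host
  accessToken: YOUR_DATABRICKS_PAT        # personal access token (or OAuth credentials)
EOF
```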

Deploy on a Databricks cluster

The New Relic Databricks integration can be deployed on the driver node of a Databricks cluster using a cluster-scoped init script. The init script uses custom environment variables to specify the configuration parameters the integration needs.

To install the init script, follow these steps:

  1. Log in to your Databricks account and navigate to the desired workspace.

  2. Follow the recommendations for init scripts to store the cluster_init_integration.sh script within your workspace. For example, if your workspace is enabled for Unity Catalog, store the init script in a Unity Catalog volume. (An illustrative sketch of such a script follows this list.)

  3. Go to the Compute tab and select the desired all-purpose or job compute to open the compute details UI.

  4. Click the Edit button to edit the compute's configuration.

  5. Follow the steps to use the UI to configure a cluster-scoped init script and point to the location where you stored the init script in step 2 above.

  6. If your cluster is not running, click the Confirm button to save your changes, then restart the cluster. If your cluster is already running, click the Confirm and restart button to save your changes and restart the cluster.
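For reference, a cluster-scoped init script along these lines would download the integration and launch it on the driver node. This is a hedged sketch, not the repository's cluster_init_integration.sh: the release URL, archive name, and install path are assumptions, so prefer the script shipped with the repository.

```bash
#!/bin/bash
# Illustrative sketch only; use the repository's cluster_init_integration.sh.
# DB_IS_DRIVER is set by Databricks, so this runs on the driver node only.
if [ "$DB_IS_DRIVER" = "TRUE" ]; then
  # Hypothetical release asset URL; substitute the real latest-release archive.
  curl -sL -o /tmp/newrelic-databricks-integration.tar.gz \
    "https://github.com/newrelic-experimental/newrelic-databricks-integration/releases/latest/download/newrelic-databricks-integration_Linux_amd64.tar.gz"
  mkdir -p /opt/newrelic-databricks-integration
  tar -xzf /tmp/newrelic-databricks-integration.tar.gz -C /opt/newrelic-databricks-integration
  # The cluster's custom environment variables (see below) supply the configuration.
  nohup /opt/newrelic-databricks-integration/newrelic-databricks-integration \
    > /tmp/newrelic-databricks-integration.log 2>&1 &
fi
```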

Additionally, follow the steps to set environment variables to add the environment variables the integration requires; an illustrative set appears after the tip below.

Tip

Note that the NEW_RELIC_API_KEY and NEW_RELIC_ACCOUNT_ID are currently unused, but are required by the new-relic-client-go module used by the integration.

Additionally, note that you should specify either the personal access token or the OAuth credentials, but not both. If both are specified, the OAuth credentials take precedence.

Finally, make sure to restart the cluster following the configuration of the environment variables.
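As a hedged illustration, the cluster's environment variable set might look like the following. Only NEW_RELIC_API_KEY and NEW_RELIC_ACCOUNT_ID are named in this document; every other variable name here is an assumption, so consult the repository README for the exact keys.

```bash
# Entered under the cluster's Advanced options > Spark > Environment variables.
NEW_RELIC_API_KEY=NRAK-EXAMPLE           # currently unused; required by new-relic-client-go
NEW_RELIC_ACCOUNT_ID=1234567             # currently unused; required by new-relic-client-go
NEW_RELIC_LICENSE_KEY=YOUR_LICENSE_KEY   # assumption: New Relic ingest license key
DATABRICKS_ACCESS_TOKEN=dapiEXAMPLE      # assumption: personal access token (omit if using OAuth)
DATABRICKS_OAUTH_CLIENT_ID=YOUR_CLIENT_ID         # assumption: OAuth credentials; take
DATABRICKS_OAUTH_CLIENT_SECRET=YOUR_CLIENT_SECRET # precedence over the token if both are set
```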

Install our Databricks monitoring dashboard

To set up our pre-built Databricks dashboard to monitor your application metrics, go to the Databricks dashboard installation and follow the instructions. Once installed, the dashboard should display metrics.

If you need help with dashboards, see:

  • Introduction to dashboards to customize your dashboard and carry out different actions.
  • Manage your dashboard to adjust your display mode, or to add more content to your dashboard.