Databricks

Heimdall ML connects directly to Databricks so you can bring your Databricks tables in for modeling. When you build an ML model, you can select Databricks as a data source. Heimdall then prompts you for your Databricks connection parameters.

Connection Parameters

  • Hostname - the hostname associated with your SQL warehouse
  • HTTP path - the path to your specific warehouse instance
  • Personal Access Token - an authentication token for your instance
note

Heimdall does not save the connection parameters for your Databricks warehouse.

To find the connection details for a Databricks SQL warehouse:

  1. Log in to your Databricks workspace.
  2. In the sidebar, click SQL > SQL Warehouses.
  3. From the list of available warehouses, click the target warehouse’s name.
  4. On the Connection Details tab, copy the server hostname and HTTP path.

To create a Databricks personal access token for your Databricks workspace user:

  1. In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the dropdown.
  2. Click Developer.
  3. Next to Access tokens, click Manage.
  4. Click Generate new token.
tip

Optionally, enter a comment that helps you identify this token in the future, and change the token’s default lifetime of 90 days if needed. To create a token that never expires (not recommended), leave the Lifetime (days) box empty.

  5. Click Generate.
  6. Copy the displayed token and store it in a secure location, then click Done.
warning

Databricks displays the token only once, so make sure to save it before proceeding.
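
If you want to sanity-check the three connection parameters before entering them in Heimdall, something like the following may help. It is a minimal sketch and not part of Heimdall: the helper names `normalize_connection_params` and `check_connection` are hypothetical, and `check_connection` assumes the open-source `databricks-sql-connector` package is installed (`pip install databricks-sql-connector`).

```python
from urllib.parse import urlparse


def normalize_connection_params(hostname: str, http_path: str, token: str) -> dict:
    """Tidy up and sanity-check the three values (hypothetical helper)."""
    hostname = hostname.strip()
    # Accept a pasted URL such as "https://host/" and keep only the host part.
    if "://" in hostname:
        hostname = urlparse(hostname).netloc
    hostname = hostname.rstrip("/")

    # The HTTP path is expected to start with a leading slash.
    http_path = http_path.strip()
    if not http_path.startswith("/"):
        http_path = "/" + http_path

    token = token.strip()
    if not hostname or not token or http_path == "/":
        raise ValueError("hostname, HTTP path, and token are all required")

    # Keys match the keyword arguments of databricks.sql.connect().
    return {
        "server_hostname": hostname,
        "http_path": http_path,
        "access_token": token,
    }


def check_connection(params: dict) -> bool:
    """Run a trivial query against the warehouse to confirm the values work."""
    # Imported lazily so normalize_connection_params works without the package.
    from databricks import sql

    with sql.connect(**params) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone() is not None
```

For example, `check_connection(normalize_connection_params(host, path, token))` should return `True` when the warehouse is reachable and the token is valid. Avoid hard-coding the token in scripts; reading it from an environment variable is a safer choice.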

Importing and Using your Data

Once you are authenticated, Heimdall shows a dropdown of the tables in your SQL warehouse. Select the table you want to build a model with and continue. On the next screen, pick your target variable, that is, the variable you want to predict.

tip

Based on your target variable, Heimdall will automatically select the best classification or regression algorithms.
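
This page does not describe how Heimdall decides between classification and regression. Purely as an illustration of the general idea, the sketch below shows one common heuristic: non-numeric targets, or integer-valued targets with few distinct values, are treated as classification, and everything else as regression. The function name `infer_task_type` and the `max_classes` threshold are assumptions made for this example, not Heimdall behavior.

```python
def infer_task_type(target_values, max_classes: int = 20) -> str:
    """Guess 'classification' or 'regression' from a target column's values."""
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column has no non-null values")

    # bool is a subclass of int in Python, so treat it as categorical explicitly.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not all_numeric:
        return "classification"

    # Few distinct whole-number values usually indicate class labels.
    distinct = set(values)
    all_integral = all(float(v).is_integer() for v in distinct)
    if all_integral and len(distinct) <= max_classes:
        return "classification"
    return "regression"
```

With this heuristic, a target such as `["yes", "no", "yes"]` or `[0, 1, 1, 0]` is treated as classification, while `[12.5, 3.1, 7.75]` is treated as regression.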

After selecting your target variable, click Optimize to let Heimdall find the modeling technique with the best overall performance and accuracy. You can then review high-level metrics for your model and give it a unique name. Your new model will appear in the model inventory, where you can view more details.