
Heimdall Lab

Lab runs Python analytics on your Lake tables — profile data, chart it, and save transforms back to silver.

Before you start
  • At least one bronze or silver table in Lake
  • Sidebar → Data Warehouse → Lab (/lab)
  • Part of Path C when data quality is unknown
Product page

See heimdallapp.org/product/lab for an overview and use cases.

Loading data in code

Lab follows the same pattern as Databricks (spark.table()), Hex, and Observable — you load tables in code, not through a separate import UI.

```python
list_tables()  # print your bronze/silver catalog

orders = lake("bronze", "Q1 Upload")  # load a bronze table
clean = lake("silver", "Clean Orders")  # load a silver dataset

result = orders.groupby("category")["amount"].sum().reset_index()
result
```

The Lake catalog sidebar inserts lake() snippets when you click a table. Table names must match the catalog spelling exactly, although matching is case-insensitive.

What you can do

  • Load up to 6 tables per cell via lake()
  • Run pandas, numpy, matplotlib, seaborn, and plotly in code cells
  • Preview tables and charts inline
  • Save results as a new silver dataset (never overwrites existing data)
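Because a cell can load several tables, a common pattern is to join them with pandas before previewing or charting. The sketch below uses small in-memory DataFrames as stand-ins for lake() results so it runs anywhere; the table contents and the column names (order_id, category_id, amount, category) are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for two lake() loads; in Lab these would come from lake("bronze", ...).
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "category_id": [10, 20, 10, 30],
    "amount": [120.0, 80.0, 45.5, 60.0],
})
categories = pd.DataFrame({
    "category_id": [10, 20],
    "category": ["Hardware", "Software"],
})

# Left join keeps every order; indicator=True adds a _merge column showing which rows matched.
joined = orders.merge(categories, on="category_id", how="left", indicator=True)

# Orders whose category_id has no row in the lookup table (here: category_id 30).
unmatched = joined[joined["_merge"] == "left_only"]

# Total amount per category for matched rows, kept as a DataFrame for inline preview.
totals = (
    joined[joined["_merge"] == "both"]
    .groupby("category", as_index=False)["amount"]
    .sum()
)
totals
```

Checking the unmatched rows before aggregating is a cheap way to catch join keys that exist in one table but not the other.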

Workflow

  1. Open Lab from the sidebar.
  2. Create a notebook and run list_tables() to see available data.
  3. Load tables with lake("bronze", "name") or click tables in the catalog sidebar.
  4. Run cells to explore and visualize.
  5. Use Save result as silver when you want to persist a DataFrame output to the Lake catalog.
  6. Publish gold from Lake when ready for ML or Forecast.
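Step 5 persists a DataFrame output, which is why the loading example ends with reset_index(): in pandas, groupby(...)["amount"].sum() returns a Series indexed by the group key, and reset_index() turns it back into a two-column DataFrame. The sketch below uses hypothetical sample rows and assumes, from the wording of step 5, that the save action expects a DataFrame.

```python
import pandas as pd

# Hypothetical rows standing in for lake("bronze", "Q1 Upload").
orders = pd.DataFrame({
    "category": ["A", "B", "A", "C"],
    "amount": [10, 5, 7, 3],
})

# Selecting one column after groupby yields a Series keyed by category.
by_category = orders.groupby("category")["amount"].sum()

# reset_index() moves the group key back into a regular column, giving a DataFrame.
result = by_category.reset_index()

# Guard before saving: confirm the output is a DataFrame with the expected columns.
assert isinstance(result, pd.DataFrame)
assert list(result.columns) == ["category", "amount"]
result
```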

Profiling bronze → silver

Use Lab between bronze ingest and gold publish when you need to inspect quality before modeling:

  1. Profile bronze — load with lake("bronze", "Q1 Upload"), check dtypes, null counts, and distributions with pandas.
  2. Chart outliers — plot key columns with matplotlib, seaborn, or plotly to spot bad rows or skew.
  3. Save to silver — filter, dedupe, or derive columns in code, then Save result as silver (creates a new catalog entry; never overwrites bronze).
  4. Publish gold — when the silver dataset is model-ready, switch to Lake and publish for modeling.
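Steps 1 and 2 can start from a small profiling cell like the one below. It is a sketch with made-up rows in place of lake("bronze", "Q1 Upload"); the column names and the 1.5 * IQR outlier rule are illustrative choices, not something Lab prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical bronze rows; in Lab: orders = lake("bronze", "Q1 Upload")
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5],
    "category": ["A", "B", "B", None, "A", "C"],
    "amount": [20.0, 22.0, 22.0, 19.0, np.nan, 400.0],
})

# Column types and missing values per column.
dtypes = orders.dtypes
null_counts = orders.isna().sum()

# Summary statistics (count, mean, quartiles, ...) for the numeric column.
summary = orders["amount"].describe()

# Flag outliers with the common 1.5 * IQR rule; NaN amounts are never flagged.
q1, q3 = orders["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = orders[
    (orders["amount"] < q1 - 1.5 * iqr) | (orders["amount"] > q3 + 1.5 * iqr)
]

# Count of exact duplicate rows, which often come from repeated uploads.
duplicate_rows = orders.duplicated().sum()
```

The outlier rows are the ones worth charting in step 2 before deciding whether to filter them.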

The steps above match the exploratory profiling and ad-hoc transform use cases on the marketing site.
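A step 3 transform might look like the sketch below: drop exact duplicates, filter rows that fail basic checks, and derive a column, leaving a DataFrame as the cell's output for Save result as silver. The data, column names, and rules (category required, positive amounts) are hypothetical.

```python
import pandas as pd

# Hypothetical bronze rows; in Lab: orders = lake("bronze", "Q1 Upload")
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "order_date": ["2024-01-05", "2024-02-11", "2024-02-11", "2024-03-02", "2024-03-15"],
    "category": ["A", "B", "B", None, "A"],
    "amount": [20.0, 22.0, 22.0, 19.0, -5.0],
})

clean = (
    orders
    .drop_duplicates()            # remove exact duplicate rows
    .dropna(subset=["category"])  # require a category
    .query("amount > 0")          # drop non-positive amounts
    .assign(
        # assign() evaluates in order, so month sees the converted order_date
        order_date=lambda df: pd.to_datetime(df["order_date"]),
        month=lambda df: df["order_date"].dt.to_period("M").astype(str),
    )
    .reset_index(drop=True)
)
clean  # the DataFrame you would persist with Save result as silver
```

The method chain returns a new DataFrame and leaves orders unchanged, which mirrors Lab's rule that bronze data is never overwritten.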

Security model

  • Cells are read-only — they cannot modify bronze, silver, or gold tables.
  • Code runs in an isolated worker with no AWS credentials or network access.
  • Saving to silver is a separate explicit action that only creates new datasets.

Limits

  • 30-second execution timeout per cell
  • Up to 6 lake() loads per cell
  • Allowed imports: pandas, numpy, matplotlib, seaborn, plotly, and basic stdlib helpers

Common mistakes

| Mistake | Fix |
|---|---|
| lake() can't find a table | Run list_tables(); names must match the catalog |
| Cell timeout | Reduce the data scanned, or aggregate before plotting |
| Skipped silver save | Transforms are never written back to bronze; use Save result as silver to persist them |
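For cell timeouts, the usual fix is to let pandas reduce the data to a handful of points before any plotting call. The sketch below builds a hypothetical 50,000-row table in place of a lake() load, resamples it to daily totals, and plots those totals with matplotlib; the column names and daily granularity are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical event-level data standing in for a large lake() table.
rng = np.random.default_rng(0)
n = 50_000
events = pd.DataFrame({
    "ts": pd.date_range("2024-01-01", periods=n, freq="min"),
    "amount": rng.uniform(1, 100, size=n),
})

# Aggregate first: one row per day instead of one row per event.
daily = events.resample("D", on="ts")["amount"].sum()

# Plot the small aggregated series rather than all 50,000 raw rows.
fig, ax = plt.subplots(figsize=(8, 3))
daily.plot(ax=ax)
ax.set_xlabel("day")
ax.set_ylabel("total amount")
```

Plotting 35 daily points instead of 50,000 raw rows keeps rendering cheap; the same idea applies with groupby for non-time columns.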

Next steps