Heimdall Lab
Lab runs Python analytics on your Lake tables — profile data, chart it, and save transforms back to silver.
Before you start
- At least one bronze or silver table in Lake
- Sidebar → Data Warehouse → Lab (/lab)
- Part of Path C when data quality is unknown
Product page
See heimdallapp.org/product/lab for an overview and use cases.
Loading data in code
Lab follows the same pattern as Databricks (spark.table()), Hex, and Observable — you load tables in code, not through a separate import UI.
list_tables() # print your bronze/silver catalog
orders = lake("bronze", "Q1 Upload") # load a bronze table
clean = lake("silver", "Clean Orders") # load a silver dataset
result = orders.groupby("category")["amount"].sum().reset_index()
result
The Lake catalog sidebar inserts lake() snippets when you click a table. Table names must match the catalog name in full, but matching is case-insensitive.
What you can do
- Load up to 6 tables per cell via lake()
- Run pandas, numpy, matplotlib, seaborn, and plotly in code cells
- Preview tables and charts inline
- Save results as a new silver dataset (never overwrites existing data)
Workflow
- Open Lab from the sidebar.
- Create a notebook and run list_tables() to see available data.
- Load tables with lake("bronze", "name") or click tables in the catalog sidebar.
- Run cells to explore and visualize.
- Use Save result as silver when you want to persist a DataFrame output to the Lake catalog.
- Publish gold from Lake when ready for ML or Forecast.
Profiling bronze → silver
Use Lab between bronze ingest and gold publish when you need to inspect quality before modeling:
- Profile bronze: load with lake("bronze", "Q1 Upload"), then check dtypes, null counts, and distributions with pandas.
- Chart outliers: plot key columns with matplotlib, seaborn, or plotly to spot bad rows or skew.
- Save to silver: filter, dedupe, or derive columns in code, then Save result as silver (creates a new catalog entry; never overwrites bronze).
- Publish gold: when the silver dataset is model-ready, switch to Lake and publish for modeling.
This matches the exploratory profiling and ad-hoc transforms use cases on the marketing site.
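The profile and clean steps above could look roughly like the following in a cell. This is a minimal sketch, not Lab's own profiling logic: the DataFrame is a hand-built stand-in for lake("bronze", "Q1 Upload"), the column names are hypothetical, and the 1.5 × IQR rule is just one common heuristic for flagging outliers before you chart them.

```python
import numpy as np
import pandas as pd

# Stand-in for `orders = lake("bronze", "Q1 Upload")`; columns are hypothetical.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4, 5],
        "category": ["toys", "books", "books", None, "toys", "games"],
        "amount": [20.0, 35.5, 35.5, 12.0, np.nan, 9000.0],
    }
)

# 1. Profile: dtypes, null counts, and basic distribution statistics.
profile = pd.DataFrame(
    {"dtype": orders.dtypes.astype(str), "nulls": orders.isna().sum()}
)
stats = orders["amount"].describe()

# 2. Flag outliers with the 1.5 * IQR rule (an assumed heuristic, not a Lab feature).
q1, q3 = orders["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (orders["amount"] < q1 - 1.5 * iqr) | (orders["amount"] > q3 + 1.5 * iqr)

# 3. Clean: drop flagged outliers, exact duplicate rows, and rows with missing values.
clean = (
    orders[~is_outlier]
    .drop_duplicates()
    .dropna(subset=["category", "amount"])
    .reset_index(drop=True)
)
clean
```

Leaving `clean` as the last expression makes it the cell's DataFrame output, which is the kind of result that Save result as silver persists. The save itself is a separate action in the UI, not a function call in the cell.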
Security model
- Cells are read-only — they cannot modify bronze, silver, or gold tables.
- Code runs in an isolated worker with no AWS credentials or network access.
- Saving to silver is a separate explicit action that only creates new datasets.
Limits
- 30-second execution timeout per cell
- Up to 6 lake() loads per run
- Allowed imports: pandas, numpy, matplotlib, seaborn, plotly, and basic stdlib helpers
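One way to stay inside the execution timeout is to shrink the data before charting it. The sketch below assumes a large table with hypothetical day, category, and amount columns (generated here with numpy instead of loaded with lake()) and aggregates it to one row per day before any plotting happens.

```python
import numpy as np
import pandas as pd

# Stand-in for a large bronze table; in Lab this would come from lake(...).
rng = np.random.default_rng(0)
n_rows = 200_000
events = pd.DataFrame(
    {
        "day": pd.Timestamp("2024-01-01")
        + pd.to_timedelta(rng.integers(0, 90, n_rows), unit="D"),
        "category": rng.choice(["books", "games", "toys"], n_rows),
        "amount": rng.gamma(2.0, 25.0, n_rows),
    }
)

# Keep only the columns the chart needs, then aggregate to one row per day and category.
daily = (
    events[["day", "category", "amount"]]
    .groupby(["day", "category"], as_index=False)["amount"]
    .sum()
)

# Pivot to a wide frame (one row per day, one column per category) that is cheap to plot.
wide = daily.pivot(index="day", columns="category", values="amount")
print(len(events), "rows reduced to a frame of shape", wide.shape)
```

Plotting `wide` draws a few hundred points instead of 200,000, so the chart renders quickly and the totals are unchanged.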
Common mistakes
| Mistake | Fix |
|---|---|
| lake() can't find table | Run list_tables(); names must match the catalog |
| Cell timeout | Reduce data scanned or aggregate before plotting |
| Skipped silver save | Bronze is never overwritten — use Save result as silver to persist transforms |
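When lake() cannot find a table, the usual cause is a small typo in the name. The helper below is a hypothetical convenience, not part of Lab: it assumes you have copied the names printed by list_tables() into a list by hand, and it uses the standard library's difflib to suggest the closest catalog name.

```python
from difflib import get_close_matches

# Hypothetical catalog names, copied by hand from what list_tables() prints.
catalog = ["Q1 Upload", "Q2 Upload", "Clean Orders"]


def suggest_table(name, names, cutoff=0.6):
    """Return the catalog name closest to `name`, ignoring case, or None."""
    by_lower = {n.lower(): n for n in names}
    key = name.lower()
    if key in by_lower:  # already the same name apart from letter case
        return by_lower[key]
    matches = get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


print(suggest_table("clean order", catalog))
print(suggest_table("q1 uploads", catalog))
```

The cutoff of 0.6 is difflib's default similarity threshold. Raising it makes the helper return None more often instead of guessing at a distant name.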
Next steps
- Publish gold when data is model-ready
- User journeys Path C
- Clean and combine — UI alternative to Lab for simple joins