Configure Bundles
This how-to walks you through wiring an existing Lakehouse Plumber (LHP) project to
Databricks Asset Bundles: initializing with bundle scaffolding, the file layout LHP
creates, the lhp generate → bundle YAML flow, environment overrides, and the
databricks bundle deploy invocation.
Bundle integration is enabled by default for new LHP projects. The text below documents the integration knobs LHP exposes. For a Databricks-level introduction to Asset Bundles, see the Databricks documentation.
Initialize a bundle-enabled project
Run lhp init without any bundle flag. Bundle scaffolding is created by default:
lhp init my_data_platform
cd my_data_platform
To create a project without bundle scaffolding, pass --no-bundle:
lhp init my_data_platform --no-bundle
There is no --bundle flag. Bundle support is on by default, so the only choice is to opt out with --no-bundle.
What LHP creates
lhp init with bundle support writes two bundle-specific artifacts at the project
root:
databricks.yml
    The top-level bundle definition. LHP pre-fills bundle.name, bundle.uuid, an include block, and three targets: dev, tst, and prod. Each target has a placeholder workspace.host you must edit.
resources/lhp/
    The directory LHP owns. Every pipeline you define under pipelines/ produces one <pipeline_name>.pipeline.yml file here, regenerated on each lhp generate run.
LHP writes pipeline resources only inside resources/lhp/. Hand-written bundle
resource files (jobs, dashboards, secret scopes, schemas) belong under
resources/ directly, and LHP leaves them alone. The one exception, the generated
monitoring job file, is covered under Directory layout. The default include
block in databricks.yml picks up both directories:
include:
- resources/*.yml
- resources/lhp/*.yml
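To see which files the default include block would load, the two patterns can be evaluated locally. The sketch below is not part of LHP or the Databricks CLI. It assumes the include patterns behave like ordinary single-level globs, and the helper name included_files is hypothetical.

```python
from pathlib import Path

# The two default patterns from the include block in databricks.yml.
DEFAULT_INCLUDE = ("resources/*.yml", "resources/lhp/*.yml")


def included_files(project_root, patterns=DEFAULT_INCLUDE):
    """Return project-relative paths matched by the include patterns.

    Assumption: the patterns act like ordinary globs, where `*` does not
    cross directory boundaries.
    """
    root = Path(project_root)
    matched = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                matched.add(path.relative_to(root).as_posix())
    return sorted(matched)
```

Under that assumption, a hand-written file in another subdirectory of resources/, or one using a .yaml extension, would not be matched, so such a project would likely need an extra include pattern.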
Edit databricks.yml
Open databricks.yml and replace the <databricks_host> placeholder in each
target with your workspace URL. The generated targets look like this:
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://your-dev-workspace.cloud.databricks.com
      root_path: ~/.bundle/${bundle.name}/${bundle.target}
  prod:
    mode: production
    workspace:
      host: https://your-prod-workspace.cloud.databricks.com
      root_path: /Workspace/<code_location>/.bundle/${bundle.name}/${bundle.target}
    run_as:
      service_principal_name: <service_principal_id>
Note
Target names in databricks.yml MUST match the substitution file names under
substitutions/. If a target tst exists in databricks.yml, a file
substitutions/tst.yaml must exist too. LHP rejects mismatches.
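The naming rule in the note can be checked before running LHP. The following is a minimal sketch, not LHP's own validation: the function name check_targets is hypothetical, and the target names are passed in explicitly so the example needs no YAML parser. Whether LHP also rejects substitution files that have no matching target is not stated here, so the second return value is informational only.

```python
from pathlib import Path


def check_targets(targets, substitutions_dir):
    """Compare bundle target names with substitution file names.

    Returns (missing, extra):
      missing - targets with no substitutions/<target>.yaml file
      extra   - substitution files with no matching target
    """
    stems = {p.stem for p in Path(substitutions_dir).glob("*.yaml")}
    wanted = set(targets)
    return sorted(wanted - stems), sorted(stems - wanted)
```

For the three default targets, a call such as check_targets(["dev", "tst", "prod"], "substitutions") returns two empty lists when every target has its file.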
Generate pipeline resources
Run lhp generate with the target environment. LHP generates Python files
and synchronizes resources/lhp/ with one .pipeline.yml per pipeline:
lhp generate --env dev
Expected output:
Updated 1 bundle resource file(s)
Inspect a generated resource file:
# Generated by LakehousePlumber - Bundle Resource for bronze_load
resources:
  pipelines:
    bronze_load_pipeline:
      name: bronze_load_pipeline
      catalog: my_dev_catalog
      schema: bronze
      serverless: true
      libraries:
        - glob:
            include: ${workspace.file_path}/generated/${bundle.target}/bronze_load/**
      root_path: ${workspace.file_path}/generated/${bundle.target}/bronze_load
      configuration:
        bundle.sourcePath: ${workspace.file_path}/generated/${bundle.target}
The catalog and schema values come from pipeline_config.yaml. See
Configuring catalog and schema for pipelines for per-pipeline and project-default resolution.
Every lhp generate run wipes resources/lhp/ and regenerates one
<pipeline_name>.pipeline.yml per pipeline directory. Files outside
resources/lhp/ are left untouched, apart from the generated monitoring job
file described under Directory layout.
To skip bundle sync entirely for one generate run, pass --no-bundle:
lhp generate --env dev --no-bundle
Directory layout
resources/lhp/ is exclusively managed by LHP. Every lhp generate wipes
its contents and regenerates them. Place your own resource YAML files in
resources/ (top-level) or any subdirectory other than resources/lhp/.
Files outside resources/lhp/ are never touched by LHP, with one exception:
the monitoring job YAML at resources/<name>.job.yml, which LHP identifies
by its sentinel header (# Generated by LakehousePlumber - Monitoring Job)
and replaces on each run.
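Because LHP recognizes its monitoring job file by the sentinel header, the same check can flag which files under resources/ would be replaced on the next run. This is a rough sketch rather than LHP's detection logic: the function name is hypothetical, and it assumes the sentinel sits on the first line of the file.

```python
from pathlib import Path

# Sentinel header quoted in the text above.
SENTINEL = "# Generated by LakehousePlumber - Monitoring Job"


def is_lhp_monitoring_job(path):
    """Return True if the file's first line starts with the sentinel header.

    Assumption: the sentinel is the first line of the generated file.
    """
    with Path(path).open(encoding="utf-8") as handle:
        first_line = handle.readline()
    return first_line.startswith(SENTINEL)
```

A hand-written job file that does not carry this header returns False under this check.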
Apply environment-specific overrides
Bundle resource files contain values resolved per environment. LHP applies
substitution tokens from substitutions/<env>.yaml to every field in
pipeline_config.yaml before rendering the bundle YAML — node types, policies,
notification emails, autoscale bounds, and catalog/schema are all
environment-aware.
The recommended layout pairs a per-environment pipeline config with matching substitution files:
config/
├── pipeline_config-dev.yaml
└── pipeline_config-prod.yaml
substitutions/
├── dev.yaml
└── prod.yaml
Generate each environment with its own config:
lhp generate --env dev --pipeline-config config/pipeline_config-dev.yaml
lhp generate --env prod --pipeline-config config/pipeline_config-prod.yaml
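When these commands are scripted, it can help to derive both file paths from the environment name and fail early if either is missing. The sketch below assumes the recommended layout shown above. The helper names env_files and generate_command are hypothetical, and only the --env and --pipeline-config flags from this page are used.

```python
from pathlib import Path


def env_files(env, root="."):
    """Return the (pipeline config, substitution file) paths for one environment."""
    base = Path(root)
    return (
        base / "config" / f"pipeline_config-{env}.yaml",
        base / "substitutions" / f"{env}.yaml",
    )


def generate_command(env, root="."):
    """Build the lhp generate argument list, failing early if a file is missing."""
    config, substitutions = env_files(env, root)
    for required in (config, substitutions):
        if not required.is_file():
            raise FileNotFoundError(required)
    return ["lhp", "generate", "--env", env, "--pipeline-config", str(config)]
```

The returned list can be passed to subprocess.run(..., check=True). Since each lhp generate run wipes resources/lhp/, generate and deploy one environment before moving on to the next.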
Typical fields that differ across environments include cluster size and node
type, photon and edition settings, autoscale bounds, notification
recipients, performance_target on jobs, and run_as service principals.
All belong in pipeline_config-<env>.yaml rather than databricks.yml.
For every key the bundle pipeline template accepts (catalog, schema,
clusters, serverless, photon, edition, channel,
notifications, tags, configuration, event_log, environment,
permissions), see the configuration reference.
Deploy
After lhp generate, deploy with the Databricks CLI from the project root:
databricks bundle validate --target dev
databricks bundle deploy --target dev
To run a deployed pipeline:
databricks bundle run bronze_load_pipeline --target dev
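The generate, validate, and deploy steps can be chained in a small wrapper so that a failure stops the sequence. This is an illustrative sketch, not an LHP feature: the function names are hypothetical, and it assumes the LHP environment name equals the bundle target name, as the naming rule for substitution files implies.

```python
import subprocess


def deploy_steps(target):
    """Return the commands from this guide, in order, for one target."""
    return [
        ["lhp", "generate", "--env", target],
        ["databricks", "bundle", "validate", "--target", target],
        ["databricks", "bundle", "deploy", "--target", target],
    ]


def deploy(target, runner=subprocess.run):
    """Run each step in order; check=True makes a failing step raise."""
    for command in deploy_steps(target):
        runner(command, check=True)
```

The runner parameter exists only so the sequence can be dry-run or tested without invoking either CLI.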
For CI/CD wiring — pinning required_lhp_version, promoting the same commit
SHA across targets, and managing approval gates — follow the CI/CD guide linked
below.
CLI quick reference
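| Command | Purpose |
|---|---|
| lhp init my_data_platform | New project, bundle scaffolding included. |
| lhp init my_data_platform --no-bundle | New project, no bundle scaffolding. |
| lhp generate --env dev | Regenerate Python and rewrite resources/lhp/. |
| lhp generate --env dev --no-bundle | Generate Python only; skip bundle sync. |
|  | Generate an orchestration job under |
| databricks bundle deploy --target dev | Deploy bundle. |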
See also
Architecture: How LHP separates YAML authoring from generated Python and bundle YAML.
Configuring catalog and schema for pipelines: Catalog and schema configuration via pipeline_config.yaml.
Bundle Configuration Reference: Exhaustive bundle configuration schema covering pipeline_config.yaml fields, job_config.yaml fields, and the bundle sync decision matrix.
How to Set Up CI/CD for an LHP Project: Trunk-based promotion, tag-driven deployment, and approval gates for LHP bundle projects.
Dependency Analysis & Job Generation: Building orchestration jobs that chain LHP pipelines.