Configure Bundles¶

This how-to walks you through wiring a Lakehouse Plumber (LHP) project to Databricks Asset Bundles: initializing a project with bundle scaffolding, understanding the file layout LHP creates, generating bundle YAML with lhp generate, applying environment overrides, and deploying with databricks bundle deploy.

Bundle integration is enabled by default for new LHP projects. This page covers only the integration settings LHP exposes. For a Databricks-level introduction to Asset Bundles, see the Databricks documentation.

Initialize a bundle-enabled project¶

Run lhp init without any bundle flag. Bundle scaffolding is created by default:

lhp init my_data_platform
cd my_data_platform

To create a project without bundle scaffolding, pass --no-bundle:

lhp init my_data_platform --no-bundle

There is no --bundle flag: bundle scaffolding is on by default, so the only choice is to opt out.

What LHP creates¶

lhp init with bundle support writes two bundle-specific artifacts at the project root:

databricks.yml: The top-level bundle definition. LHP pre-fills bundle.name, bundle.uuid, an include block, and three targets: dev, tst, and prod. Each target has a placeholder workspace.host you must edit.
resources/lhp/: The directory LHP owns. Every pipeline you define under pipelines/ produces one <pipeline_name>.pipeline.yml file here, regenerated on each lhp generate run.

LHP only writes inside resources/lhp/. Hand-written bundle resource files (jobs, dashboards, secret scopes, schemas) belong under resources/ directly — LHP never touches them. The default include block in databricks.yml picks up both directories:

include:
  - resources/*.yml
  - resources/lhp/*.yml
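
To see which files the two default patterns would pick up, here is a minimal Python sketch. It assumes that `*` matches within a single path segment only, which is how `PurePosixPath.match` behaves and is taken here as an approximation of the bundle include semantics. The candidate file names are hypothetical.

```python
from pathlib import PurePosixPath

# Include patterns copied from the default databricks.yml shown above.
INCLUDE_PATTERNS = ["resources/*.yml", "resources/lhp/*.yml"]


def is_included(relative_path: str) -> bool:
    """Return True if a project-relative path matches any include pattern.

    Assumption: '*' never crosses a '/' boundary, as in PurePosixPath.match.
    """
    path = PurePosixPath(relative_path)
    return any(path.match(pattern) for pattern in INCLUDE_PATTERNS)


# Hypothetical file names, used only for illustration.
candidates = [
    "resources/nightly.job.yml",               # hand-written, top level
    "resources/lhp/bronze_load.pipeline.yml",  # generated by LHP
    "resources/dashboards/sales.yml",          # nested directory
    "resources/notes.yaml",                    # .yaml, not .yml
]
included = [p for p in candidates if is_included(p)]
print(included)
```

Under this approximation, a file in a subdirectory such as resources/dashboards/ is not matched by the default patterns, so it would presumably need its own include entry.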

Edit `databricks.yml`¶

Open databricks.yml and replace the <databricks_host> placeholder in each target with your workspace URL. With example hosts filled in, the targets look like this (any remaining <...> values are placeholders to replace as well):

databricks.yml (excerpt)¶

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://your-dev-workspace.cloud.databricks.com
      root_path: ~/.bundle/${bundle.name}/${bundle.target}

  prod:
    mode: production
    workspace:
      host: https://your-prod-workspace.cloud.databricks.com
      root_path: /Workspace/<code_location>/.bundle/${bundle.name}/${bundle.target}
    run_as:
      service_principal_name: <service_principal_id>

Note

Target names in databricks.yml MUST match the substitution file names under substitutions/. If a target tst exists in databricks.yml, a file substitutions/tst.yaml must exist too. LHP rejects mismatches.
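
A pre-flight check for this rule can be sketched in a few lines of Python. This is not LHP's own validation, only an illustration of the rule as stated above. The function name is hypothetical, and the target names are passed in by hand rather than parsed from databricks.yml.

```python
from pathlib import Path


def missing_substitution_files(targets, substitutions_dir):
    """Return the target names that have no <target>.yaml in substitutions_dir."""
    directory = Path(substitutions_dir)
    return sorted(
        name for name in targets
        if not (directory / f"{name}.yaml").is_file()
    )


if __name__ == "__main__":
    # Assumption: these names mirror the keys under `targets:` in databricks.yml.
    for name in missing_substitution_files(["dev", "tst", "prod"], "substitutions"):
        print(f"missing substitutions/{name}.yaml")
```

Running it from the project root prints one line per target that lacks a substitution file and prints nothing when every target has one.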

Generate pipeline resources¶

Run lhp generate with the target environment. LHP generates Python files and synchronizes resources/lhp/ with one .pipeline.yml per pipeline:

lhp generate --env dev

Expected output:

Updated 1 bundle resource file(s)

Inspect a generated resource file:

resources/lhp/bronze_load.pipeline.yml¶

# Generated by LakehousePlumber - Bundle Resource for bronze_load
resources:
  pipelines:
    bronze_load_pipeline:
      name: bronze_load_pipeline
      catalog: my_dev_catalog
      schema: bronze
      serverless: true
      libraries:
        - glob:
            include: ${workspace.file_path}/generated/${bundle.target}/bronze_load/**
      root_path: ${workspace.file_path}/generated/${bundle.target}/bronze_load
      configuration:
        bundle.sourcePath: ${workspace.file_path}/generated/${bundle.target}

The catalog and schema values come from pipeline_config.yaml. See Configuring catalog and schema for pipelines for per-pipeline and project-default resolution.
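
The naming relationship between a pipeline, its resource file, and its bundle resource key can be written down as a small Python sketch. The file path follows the layout described above. The `_pipeline` suffix on the resource key is inferred from the single bronze_load example and is an assumption, as are both function names.

```python
from pathlib import PurePosixPath


def resource_file_for(pipeline_name: str) -> PurePosixPath:
    """Project-relative path of the generated resource file for a pipeline."""
    return PurePosixPath("resources/lhp") / f"{pipeline_name}.pipeline.yml"


def resource_key_for(pipeline_name: str) -> str:
    """Bundle resource key for a pipeline.

    Assumption: the '_pipeline' suffix seen in the bronze_load example
    applies to every pipeline.
    """
    return f"{pipeline_name}_pipeline"


print(resource_file_for("bronze_load"))
print(resource_key_for("bronze_load"))
```

The resource key matters later because it is the name passed to databricks bundle run.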

Every lhp generate run wipes resources/lhp/ and regenerates one <pipeline_name>.pipeline.yml per pipeline directory. Files outside resources/lhp/ are left alone, apart from the LHP-generated monitoring job file described under Directory layout.

To skip bundle sync entirely for one generate run, pass --no-bundle:

lhp generate --env dev --no-bundle

Directory layout¶

resources/lhp/ is exclusively managed by LHP. Every lhp generate wipes its contents and regenerates them. Place your own resource YAML files in resources/ (top-level) or any subdirectory other than resources/lhp/. Files outside resources/lhp/ are never touched by LHP, with one exception: the monitoring job YAML at resources/<name>.job.yml, which LHP identifies by its sentinel header (# Generated by LakehousePlumber - Monitoring Job) and replaces on each run.
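
The ownership rule above can be expressed as a short Python sketch, which may help when deciding where to put a hand-written file. It assumes the sentinel header is the first line of the monitoring job file. The function name is hypothetical and this is not LHP's own detection logic.

```python
from pathlib import Path, PurePosixPath

# Sentinel header quoted from the paragraph above.
MONITORING_SENTINEL = "# Generated by LakehousePlumber - Monitoring Job"


def is_lhp_managed(project_root, relative_path) -> bool:
    """Return True if LHP would overwrite the file on the next generate run."""
    relative = PurePosixPath(relative_path)
    parts = relative.parts

    # Everything under resources/lhp/ is wiped and regenerated.
    if parts[:2] == ("resources", "lhp"):
        return True

    # The one exception outside resources/lhp/: resources/<name>.job.yml
    # carrying the monitoring sentinel.
    is_top_level_job = (
        len(parts) == 2
        and parts[0] == "resources"
        and relative.name.endswith(".job.yml")
    )
    if not is_top_level_job:
        return False

    candidate = Path(project_root) / relative
    if not candidate.is_file():
        return False
    # Assumption: the sentinel is the first line of the file.
    with candidate.open(encoding="utf-8") as handle:
        return handle.readline().startswith(MONITORING_SENTINEL)
```

A hand-written resources/nightly.job.yml without the sentinel returns False here, so it would be safe from regeneration under the rule as described.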

Apply environment-specific overrides¶

Bundle resource files contain values resolved per environment. LHP applies substitution tokens from substitutions/<env>.yaml to every field in pipeline_config.yaml before rendering the bundle YAML — node types, policies, notification emails, autoscale bounds, and catalog/schema are all environment-aware.

The recommended layout pairs a per-environment pipeline config with matching substitution files:

config/
├── pipeline_config-dev.yaml
└── pipeline_config-prod.yaml
substitutions/
├── dev.yaml
└── prod.yaml

Generate each environment with its own config:

lhp generate --env dev --pipeline-config config/pipeline_config-dev.yaml
lhp generate --env prod --pipeline-config config/pipeline_config-prod.yaml

Typical fields that differ across environments include cluster size and node type, photon and edition settings, autoscale bounds, notification recipients, performance_target on jobs, and run_as service principals. All belong in pipeline_config-<env>.yaml rather than databricks.yml.

For every key the bundle pipeline template accepts (catalog, schema, clusters, serverless, photon, edition, channel, notifications, tags, configuration, event_log, environment, permissions), see the configuration reference.

Deploy¶

After lhp generate, deploy with the Databricks CLI from the project root:

databricks bundle validate --target dev
databricks bundle deploy --target dev
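
Because resources/lhp/ is regenerated with environment-resolved values on every run, the files on disk reflect the most recent --env. It can therefore help to run generate, validate, and deploy as one sequence per environment. The Python wrapper below is a minimal sketch using only the commands and flags shown on this page. The function names are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def release_commands(env, pipeline_config=None):
    """Build the generate, validate, and deploy commands for one environment."""
    generate = ["lhp", "generate", "--env", env]
    if pipeline_config:
        generate += ["--pipeline-config", pipeline_config]
    return [
        generate,
        ["databricks", "bundle", "validate", "--target", env],
        ["databricks", "bundle", "deploy", "--target", env],
    ]


def release(env, pipeline_config=None, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for command in release_commands(env, pipeline_config):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError, so a failed step
            # stops the sequence before deploy is reached.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: prints the three commands for the dev target.
    release("dev", "config/pipeline_config-dev.yaml")
```

Passing dry_run=False executes the commands from the current working directory, which should be the project root.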

To run a deployed pipeline:

databricks bundle run bronze_load_pipeline --target dev

For CI/CD wiring (pinning required_lhp_version, promoting the same commit SHA across targets, and managing approval gates), see the CI/CD guide.

CLI quick reference¶

Command	Purpose
`lhp init <name>`	New project, bundle scaffolding included.
`lhp init <name> --no-bundle`	New project, no bundle scaffolding.
`lhp generate --env <env> -pc <file>`	Regenerate Python and rewrite `resources/lhp/`.
`lhp generate --env <env> --no-bundle`	Generate Python only; skip bundle sync.
`lhp deps --format job --bundle-output`	Generate an orchestration job under `resources/lhp/`.
`databricks bundle deploy --target <env>`	Deploy bundle.
