Configure Bundles
=================

.. meta::
   :description: How to integrate a Lakehouse Plumber project with Databricks Asset Bundles: initialization, bundle structure, generation, and deployment.

This how-to walks you through wiring an existing Lakehouse Plumber (LHP)
project to :term:`Databricks Asset Bundles`: initializing with bundle
scaffolding, the file layout LHP creates, the ``lhp generate`` → bundle YAML
flow, environment overrides, and the ``databricks bundle deploy`` invocation.

Bundle integration is **enabled by default** for new LHP projects. The text
below documents the integration knobs LHP exposes. For a Databricks-level
introduction to Asset Bundles, see the Databricks documentation.

Initialize a bundle-enabled project
-----------------------------------

Run ``lhp init`` without any bundle flag. Bundle scaffolding is created by
default:

.. code-block:: bash

   lhp init my_data_platform
   cd my_data_platform

To create a project **without** bundle scaffolding, pass ``--no-bundle``:

.. code-block:: bash

   lhp init my_data_platform --no-bundle

There is no ``--bundle`` flag. Opt out only: bundle is on by default.

What LHP creates
----------------

``lhp init`` with bundle support writes two bundle-specific artifacts at the
project root:

``databricks.yml``
   The top-level bundle definition. LHP pre-fills ``bundle.name``,
   ``bundle.uuid``, an ``include`` block, and three ``targets``: ``dev``,
   ``tst``, and ``prod``. Each target has a placeholder ``workspace.host`` you
   must edit.

``resources/lhp/``
   The directory LHP owns. Every pipeline you define under ``pipelines/``
   produces one ``.pipeline.yml`` file here, regenerated on each
   ``lhp generate`` run.

LHP only writes inside ``resources/lhp/``. Hand-written bundle resource files
(jobs, dashboards, secret scopes, schemas) belong under ``resources/``
directly; LHP never touches them.

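Because ``resources/lhp/`` is regenerated on each ``lhp generate`` run, a
hand-written file placed there by mistake is at risk. The following is a
minimal sketch of a check for that situation. It assumes that every
LHP-generated file starts with a ``# Generated by LakehousePlumber`` comment
line, as the generated examples in this guide do. The function name
``find_handwritten`` is hypothetical and is not part of LHP.

```shell
# Sketch: print files in an LHP-owned directory that lack the generated header.
# Assumption: LHP-generated files begin with "# Generated by LakehousePlumber".
find_handwritten() {
  local dir="${1:-resources/lhp}"
  local f
  for f in "$dir"/*.yml; do
    [ -f "$f" ] || continue   # skip the unexpanded glob when nothing matches
    case "$(head -n 1 "$f")" in
      "# Generated by LakehousePlumber"*) ;;   # generated by LHP, nothing to report
      *) echo "$f" ;;                          # likely hand-written
    esac
  done
}
```

Run ``find_handwritten`` from the project root before generating. Any path it
prints is a candidate to move up into ``resources/``.
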
The default ``include`` block in ``databricks.yml`` picks up both directories::

   include:
     - resources/*.yml
     - resources/lhp/*.yml

Edit ``databricks.yml``
-----------------------

Open ``databricks.yml`` and replace the placeholder ``workspace.host`` value in
each target with your workspace URL. The generated targets look like this:

.. code-block:: yaml
   :caption: databricks.yml (excerpt)

   targets:
     dev:
       mode: development
       default: true
       workspace:
         host: https://your-dev-workspace.cloud.databricks.com
         root_path: ~/.bundle/${bundle.name}/${bundle.target}
     prod:
       mode: production
       workspace:
         host: https://your-prod-workspace.cloud.databricks.com
         root_path: /Workspace//.bundle/${bundle.name}/${bundle.target}
       run_as:
         service_principal_name:

.. note::

   Target names in ``databricks.yml`` MUST match the substitution file names
   under ``substitutions/``. If a target ``tst`` exists in ``databricks.yml``,
   a file ``substitutions/tst.yaml`` must exist too. LHP rejects mismatches.

Generate pipeline resources
---------------------------

Run ``lhp generate`` with the target environment. LHP generates Python files
**and** synchronizes ``resources/lhp/`` with one ``.pipeline.yml`` per
pipeline:

.. code-block:: bash

   lhp generate --env dev

Expected output::

   Updated 1 bundle resource file(s)

Inspect a generated resource file:

.. code-block:: yaml
   :caption: resources/lhp/bronze_load.pipeline.yml

   # Generated by LakehousePlumber - Bundle Resource for bronze_load
   resources:
     pipelines:
       bronze_load_pipeline:
         name: bronze_load_pipeline
         catalog: my_dev_catalog
         schema: bronze
         serverless: true
         libraries:
           - glob:
               include: ${workspace.file_path}/generated/${bundle.target}/bronze_load/**
         root_path: ${workspace.file_path}/generated/${bundle.target}/bronze_load
         configuration:
           bundle.sourcePath: ${workspace.file_path}/generated/${bundle.target}

The ``catalog`` and ``schema`` values come from ``pipeline_config.yaml``. See
:doc:`configure_catalog_schema` for per-pipeline and project-default
resolution.

Every ``lhp generate`` run wipes ``resources/lhp/`` and regenerates one
``.pipeline.yml`` per pipeline directory. Files outside ``resources/lhp/`` are
never touched.

To skip bundle sync entirely for one generate run, pass ``--no-bundle``:

.. code-block:: bash

   lhp generate --env dev --no-bundle

Directory layout
~~~~~~~~~~~~~~~~

``resources/lhp/`` is exclusively managed by LHP. Every ``lhp generate`` wipes
its contents and regenerates them. Place your own resource YAML files in
``resources/`` (top-level) or any subdirectory other than ``resources/lhp/``.

Files outside ``resources/lhp/`` are never touched by LHP, with one exception:
the monitoring job YAML at ``resources/.job.yml``, which LHP identifies by its
sentinel header (``# Generated by LakehousePlumber - Monitoring Job``) and
replaces on each run.

Apply environment-specific overrides
------------------------------------

Bundle resource files contain values resolved per environment. LHP applies
substitution tokens from ``substitutions/.yaml`` to **every field** in
``pipeline_config.yaml`` before rendering the bundle YAML, so node types,
policies, notification emails, autoscale bounds, and catalog/schema are all
environment-aware.

The recommended layout pairs a per-environment pipeline config with matching
substitution files:

.. code-block:: text

   config/
   ├── pipeline_config-dev.yaml
   └── pipeline_config-prod.yaml
   substitutions/
   ├── dev.yaml
   └── prod.yaml

Generate each environment with its own config:

.. code-block:: bash

   lhp generate --env dev --pipeline-config config/pipeline_config-dev.yaml
   lhp generate --env prod --pipeline-config config/pipeline_config-prod.yaml

Typical fields that differ across environments include cluster size and node
type, ``photon`` and ``edition`` settings, autoscale bounds, notification
recipients, ``performance_target`` on jobs, and ``run_as`` service principals.
All belong in ``pipeline_config-.yaml`` rather than ``databricks.yml``.

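The pairing of a per-environment config with its substitution file can be
wrapped in a small helper. This is only a sketch: ``generate_env`` is a
hypothetical function name, and it assumes the ``config/`` and
``substitutions/`` layout shown above. It uses no LHP options beyond the
``--env`` and ``--pipeline-config`` flags already shown in this guide.

```shell
# Sketch: run `lhp generate` for one environment after checking its input files.
# Assumes config/pipeline_config-<env>.yaml and substitutions/<env>.yaml exist.
generate_env() {
  local env="$1"
  local config="config/pipeline_config-${env}.yaml"
  local subs="substitutions/${env}.yaml"
  local f
  for f in "$config" "$subs"; do
    if [ ! -f "$f" ]; then
      echo "missing $f" >&2   # report the missing input and stop before generating
      return 1
    fi
  done
  lhp generate --env "$env" --pipeline-config "$config"
}
```

Calling ``generate_env dev`` from the project root reports a missing config or
substitution file before ``lhp generate`` starts.
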
For every key the bundle pipeline template accepts (``catalog``, ``schema``,
``clusters``, ``serverless``, ``photon``, ``edition``, ``channel``,
``notifications``, ``tags``, ``configuration``, ``event_log``,
``environment``, ``permissions``), see the configuration reference.

Deploy
------

After ``lhp generate``, deploy with the Databricks CLI from the project root:

.. code-block:: bash

   databricks bundle validate --target dev
   databricks bundle deploy --target dev

To run a deployed pipeline:

.. code-block:: bash

   databricks bundle run bronze_load_pipeline --target dev

For CI/CD wiring (pinning ``required_lhp_version``, promoting the same commit
SHA across targets, and managing approval gates), follow the CI/CD guide linked
below.

CLI quick reference
-------------------

============================================ =======================================================
Command                                      Purpose
============================================ =======================================================
``lhp init``                                 New project, bundle scaffolding included.
``lhp init --no-bundle``                     New project, no bundle scaffolding.
``lhp generate --env -pc``                   Regenerate Python and rewrite ``resources/lhp/``.
``lhp generate --env --no-bundle``           Generate Python only; skip bundle sync.
``lhp deps --format job --bundle-output``    Generate an orchestration job under ``resources/lhp/``.
``databricks bundle deploy --target``        Deploy bundle.
============================================ =======================================================

See also
--------

* :doc:`architecture`: How LHP separates YAML authoring from generated Python and bundle YAML.
* :doc:`configure_catalog_schema`: Catalog and schema configuration via ``pipeline_config.yaml``.
* :doc:`bundle_config_reference`: Exhaustive bundle configuration schema: ``pipeline_config.yaml`` fields, ``job_config.yaml`` fields, and bundle sync decision matrix.
* :doc:`cicd`: Trunk-based promotion, tag-driven deployment, and approval gates for LHP bundle projects.
* :doc:`dependency_analysis`: Building orchestration jobs that chain LHP pipelines.