Configure Bundles
=================

.. meta::
   :description: How to integrate a Lakehouse Plumber project with Databricks Asset Bundles — initialization, bundle structure, generation, and deployment.

This how-to walks you through connecting a Lakehouse Plumber (LHP) project to
:term:`Databricks Asset Bundles <DAB>`: initializing a project with bundle scaffolding, the
file layout LHP creates, the ``lhp generate`` → bundle YAML flow, environment overrides,
and the ``databricks bundle deploy`` invocation.

Bundle integration is **enabled by default** for new LHP projects. The text below
documents the integration knobs LHP exposes. For a Databricks-level introduction to
Asset Bundles, see the Databricks documentation.

Initialize a bundle-enabled project
-----------------------------------

Run ``lhp init`` without any bundle flag. Bundle scaffolding is created by default:

.. code-block:: bash

   lhp init my_data_platform
   cd my_data_platform

To create a project **without** bundle scaffolding, pass ``--no-bundle``:

.. code-block:: bash

   lhp init my_data_platform --no-bundle

There is no ``--bundle`` flag: bundle scaffolding is on by default, and ``--no-bundle``
is the only way to opt out.

What LHP creates
----------------

``lhp init`` with bundle support writes two bundle-specific artifacts at the project
root:

``databricks.yml``
   The top-level bundle definition. LHP pre-fills ``bundle.name``, ``bundle.uuid``,
   an ``include`` block, and three ``targets``: ``dev``, ``tst``, and ``prod``.
   Each target has a placeholder ``workspace.host`` you must edit.

``resources/lhp/``
   The directory LHP owns. Every pipeline you define under ``pipelines/`` produces
   one ``<pipeline_name>.pipeline.yml`` file here, regenerated on each
   ``lhp generate`` run.

LHP writes bundle resources only inside ``resources/lhp/``, with one exception
described under `Directory layout`_. Hand-written bundle resource files (jobs,
dashboards, secret scopes, schemas) belong directly under ``resources/``, where
LHP leaves them alone. The default ``include`` block in ``databricks.yml`` picks
up both directories:

.. code-block:: yaml

   include:
     - resources/*.yml
     - resources/lhp/*.yml
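
Bundle ``include`` globs match a single directory level, so the two default
patterns do not load files in other subdirectories of ``resources/``, nor files
ending in ``.yaml``. The sketch below lists such files. ``list_unincluded_resources``
is a hypothetical helper, not part of LHP or the Databricks CLI, and it assumes the
default ``include`` block shown above is unchanged.

.. code-block:: bash

   # Hypothetical helper (not part of LHP or the Databricks CLI): list YAML files
   # under resources/ that the default include patterns do not load:
   #   resources/*.yml and resources/lhp/*.yml
   list_unincluded_resources() {
     local root="${1:-.}"
     (
       cd "$root" || exit 1
       [ -d resources ] || exit 0
       find resources -type f \( -name '*.yml' -o -name '*.yaml' \) | sort |
         while IFS= read -r f; do
           # In a shell `case`, "*" also matches "/", so nested paths come first.
           case "$f" in
             *.yaml)            echo "$f" ;;  # default patterns only match .yml
             resources/lhp/*/*) echo "$f" ;;  # nested below resources/lhp/
             resources/lhp/*)   ;;            # loaded by resources/lhp/*.yml
             resources/*/*)     echo "$f" ;;  # any other subdirectory
             *)                 ;;            # loaded by resources/*.yml
           esac
         done
     )
   }
   # Usage, from the project root: list_unincluded_resources

For each file it reports, add a matching pattern to ``include`` (for example
``resources/*.yaml`` or a pattern for the subdirectory) or move the file.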

Edit ``databricks.yml``
-----------------------

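Open ``databricks.yml`` and replace the ``<databricks_host>`` placeholder in each
target with your workspace URL. In ``prod``, also fill in the ``<code_location>``
and ``<service_principal_id>`` placeholders. After editing, the ``dev`` and ``prod``
targets look like this (``tst`` is omitted):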

.. code-block:: yaml
   :caption: databricks.yml (excerpt)

   targets:
     dev:
       mode: development
       default: true
       workspace:
         host: https://your-dev-workspace.cloud.databricks.com
         root_path: ~/.bundle/${bundle.name}/${bundle.target}

     prod:
       mode: production
       workspace:
         host: https://your-prod-workspace.cloud.databricks.com
         root_path: /Workspace/<code_location>/.bundle/${bundle.name}/${bundle.target}
       run_as:
         service_principal_name: <service_principal_id>

.. note::
   Target names in ``databricks.yml`` MUST match the substitution file names under
   ``substitutions/``. If a target ``tst`` exists in ``databricks.yml``, a file
   ``substitutions/tst.yaml`` must exist too. LHP rejects mismatches.
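
LHP rejects mismatches on its own. To catch one earlier, for example in a
pre-commit hook, the following sketch compares target names with substitution
files. ``missing_substitution_files`` is a hypothetical helper, not an LHP command,
and it assumes target names sit exactly two spaces under a top-level ``targets:``
key, as in the generated ``databricks.yml``.

.. code-block:: bash

   # Hypothetical pre-flight check (not an LHP command): print each target in
   # databricks.yml that has no matching substitutions/<target>.yaml file.
   # Assumes target names sit exactly two spaces under a top-level "targets:" key.
   missing_substitution_files() {
     local root="${1:-.}"
     awk '
       /^targets:[ \t]*$/ { in_targets = 1; next }   # start of the targets block
       /^[^ \t#]/         { in_targets = 0 }         # any other top-level key ends it
       in_targets && /^  [A-Za-z0-9_-]+:[ \t]*$/ {
         name = $1
         sub(/:$/, "", name)                         # strip the trailing colon
         print name
       }
     ' "$root/databricks.yml" | while IFS= read -r target; do
       [ -f "$root/substitutions/${target}.yaml" ] || echo "$target"
     done
   }
   # Usage, from the project root: missing_substitution_files

An empty result means every target has a substitution file.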

Generate pipeline resources
---------------------------

Run ``lhp generate`` with the target environment. LHP generates Python files
**and** synchronizes ``resources/lhp/`` with one ``.pipeline.yml`` per pipeline:

.. code-block:: bash

   lhp generate --env dev

Expected output::

   Updated 1 bundle resource file(s)

Inspect a generated resource file:

.. code-block:: yaml
   :caption: resources/lhp/bronze_load.pipeline.yml

   # Generated by LakehousePlumber - Bundle Resource for bronze_load
   resources:
     pipelines:
       bronze_load_pipeline:
         name: bronze_load_pipeline
         catalog: my_dev_catalog
         schema: bronze
         serverless: true
         libraries:
           - glob:
               include: ${workspace.file_path}/generated/${bundle.target}/bronze_load/**
         root_path: ${workspace.file_path}/generated/${bundle.target}/bronze_load
         configuration:
           bundle.sourcePath: ${workspace.file_path}/generated/${bundle.target}

The ``catalog`` and ``schema`` values come from ``pipeline_config.yaml``. See
:doc:`configure_catalog_schema` for per-pipeline and project-default resolution.

Every ``lhp generate`` run wipes ``resources/lhp/`` and regenerates one
``<pipeline_name>.pipeline.yml`` per pipeline directory. Files outside
``resources/lhp/`` are left alone, except for the monitoring job file described
below.

To skip bundle sync entirely for one generate run, pass ``--no-bundle``:

.. code-block:: bash

   lhp generate --env dev --no-bundle

Directory layout
~~~~~~~~~~~~~~~~

``resources/lhp/`` is managed exclusively by LHP: every ``lhp generate`` wipes
its contents and regenerates them. Place your own resource YAML files in
``resources/`` (top level) or in any subdirectory other than ``resources/lhp/``.
For a subdirectory, also add a matching pattern (for example
``resources/jobs/*.yml``) to the ``include`` block, because the default patterns
do not descend into other subdirectories. LHP leaves files outside
``resources/lhp/`` alone, with one exception: the monitoring job YAML at
``resources/<name>.job.yml``, which LHP identifies by its sentinel header
(``# Generated by LakehousePlumber - Monitoring Job``) and replaces on each run.
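
Before adding a hand-written file to ``resources/``, you can check which
top-level files carry the monitoring-job sentinel and will therefore be replaced
on the next run. ``list_lhp_replaced_resources`` is a hypothetical helper that
assumes the sentinel text starts a line exactly as quoted above.

.. code-block:: bash

   # Hypothetical helper: list top-level resources/*.yml files that start a line
   # with LHP's monitoring-job sentinel, i.e. files LHP replaces on every run.
   list_lhp_replaced_resources() {
     local root="${1:-.}"
     # -l prints matching file names only; errors (for example, no *.yml files)
     # are discarded so the helper still succeeds.
     grep -l '^# Generated by LakehousePlumber - Monitoring Job' \
       "$root"/resources/*.yml 2>/dev/null
     return 0
   }
   # Usage, from the project root: list_lhp_replaced_resources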

Apply environment-specific overrides
------------------------------------

Bundle resource files contain values resolved per environment. LHP applies
substitution tokens from ``substitutions/<env>.yaml`` to **every field** in
``pipeline_config.yaml`` before rendering the bundle YAML — node types, policies,
notification emails, autoscale bounds, and catalog/schema are all
environment-aware.

The recommended layout pairs a per-environment pipeline config with matching
substitution files:

.. code-block:: text

   config/
   ├── pipeline_config-dev.yaml
   └── pipeline_config-prod.yaml
   substitutions/
   ├── dev.yaml
   └── prod.yaml

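Generate each environment with its own config. Because every run rewrites
``resources/lhp/`` with that environment's values, generate for a target
immediately before you validate and deploy it: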

.. code-block:: bash

   lhp generate --env dev --pipeline-config config/pipeline_config-dev.yaml
   lhp generate --env prod --pipeline-config config/pipeline_config-prod.yaml

Typical fields that differ across environments include cluster size and node
type, ``photon`` and ``edition`` settings, autoscale bounds, notification
recipients, ``performance_target`` on jobs, and ``run_as`` service principals.
All belong in ``pipeline_config-<env>.yaml`` rather than ``databricks.yml``.

For every key the bundle pipeline template accepts (``catalog``, ``schema``,
``clusters``, ``serverless``, ``photon``, ``edition``, ``channel``,
``notifications``, ``tags``, ``configuration``, ``event_log``, ``environment``,
``permissions``), see :doc:`bundle_config_reference`.

Deploy
------

After ``lhp generate``, deploy with the Databricks CLI from the project root:

.. code-block:: bash

   databricks bundle validate --target dev
   databricks bundle deploy --target dev
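
The sketch below wraps generation, validation, and deployment for one target so
that ``resources/lhp/`` always holds the values of the environment being
deployed. ``deploy_target`` is a hypothetical helper; it uses only the commands
shown in this guide and assumes the ``config/pipeline_config-<env>.yaml`` naming
from the layout above.

.. code-block:: bash

   # Sketch of a per-target release step built only from commands in this guide.
   # Generating right before deploying keeps resources/lhp/ in step with the
   # target, because each `lhp generate` run rewrites it for one environment.
   # Assumes the config/pipeline_config-<env>.yaml naming used above.
   deploy_target() {
     local target="${1:-}"
     if [ -z "$target" ]; then
       echo "usage: deploy_target <target>" >&2
       return 2
     fi
     local cfg="config/pipeline_config-${target}.yaml"
     if [ ! -f "$cfg" ]; then
       echo "missing ${cfg}" >&2
       return 1
     fi
     lhp generate --env "$target" --pipeline-config "$cfg" || return 1
     databricks bundle validate --target "$target" || return 1
     databricks bundle deploy --target "$target" || return 1
   }
   # Usage, from the project root: deploy_target dev

In a pipeline that promotes through several targets, call it once per target
rather than generating every environment up front.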

To run a deployed pipeline:

.. code-block:: bash

   databricks bundle run bronze_load_pipeline --target dev
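
The argument to ``databricks bundle run`` is the resource key under
``resources.pipelines`` (``bronze_load_pipeline`` in the generated example), not
the file name. ``list_lhp_pipeline_keys`` is a hypothetical helper that lists those
keys from LHP's generated files, assuming the indentation shown in that example.

.. code-block:: bash

   # Hypothetical helper: print the pipeline resource keys (the names that
   # `databricks bundle run` accepts) defined in LHP's generated files.
   # Assumes the generated layout shown earlier: keys indented four spaces
   # under "  pipelines:", which sits under a top-level "resources:".
   list_lhp_pipeline_keys() {
     local root="${1:-.}" f
     for f in "$root"/resources/lhp/*.pipeline.yml; do
       [ -e "$f" ] || continue   # the glob matched no files
       awk '
         /^  pipelines:[ \t]*$/     { in_p = 1; next }
         /^[^ \t#]/ || /^  [^ \t#]/ { in_p = 0 }   # a sibling or top-level key ends it
         in_p && /^    [A-Za-z0-9_-]+:[ \t]*$/ {
           key = $1
           sub(/:$/, "", key)
           print key
         }
       ' "$f"
     done
   }
   # Usage, from the project root: list_lhp_pipeline_keys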

For CI/CD wiring, such as pinning ``required_lhp_version``, promoting the same
commit SHA across targets, and managing approval gates, see :doc:`cicd`.

CLI quick reference
-------------------

============================================ =====================================================
Command                                      Purpose
============================================ =====================================================
``lhp init <name>``                          New project, bundle scaffolding included.
``lhp init <name> --no-bundle``              New project, no bundle scaffolding.
``lhp generate --env <env> -pc <file>``      Regenerate Python and rewrite ``resources/lhp/``.
``lhp generate --env <env> --no-bundle``     Generate Python only; skip bundle sync.
``lhp deps --format job --bundle-output``    Generate an orchestration job under ``resources/lhp/``.
``databricks bundle deploy --target <env>``  Deploy the bundle to the ``<env>`` target.
============================================ =====================================================

See also
--------

* :doc:`architecture` — How LHP separates YAML authoring from generated Python and bundle YAML.
* :doc:`configure_catalog_schema` — Catalog and schema configuration via
  ``pipeline_config.yaml``.
* :doc:`bundle_config_reference` — Exhaustive bundle configuration schema:
  ``pipeline_config.yaml`` fields, ``job_config.yaml`` fields, and bundle sync
  decision matrix.
* :doc:`cicd` — Trunk-based promotion, tag-driven deployment, and approval gates
  for LHP bundle projects.
* :doc:`dependency_analysis` — Building orchestration jobs that chain LHP
  pipelines.