Bundle Configuration Reference
==============================

.. meta::
   :description: Exhaustive schema reference for LHP's Databricks Asset Bundle integration: CLI flags, pipeline_config.yaml, job_config.yaml, multi-job rules, and generated resource layout.

This page catalogs every option LHP exposes for :term:`Databricks Asset Bundle` (DAB) integration. For the step-by-step walk-through, see :doc:`configure_bundles`.

Scope
-----

LHP generates DAB pipeline and job resource YAML under ``resources/lhp/``. It does not replace the Databricks CLI and never modifies ``databricks.yml``.

Catalog and schema must come from ``pipeline_config.yaml``; see :doc:`configure_catalog_schema` for the resolution rules.

Bundle activation
-----------------

CLI flags
~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Flag
     - Command
     - Behavior
   * - ``--no-bundle``
     - ``lhp init``
     - Skip ``databricks.yml`` and ``resources/lhp/`` scaffolding. Bundle is enabled by default.
   * - ``--no-bundle``
     - ``lhp generate``
     - Disable bundle sync even when ``databricks.yml`` exists.
   * - ``--pipeline-config FILE``, ``-pc FILE``
     - ``lhp generate``
     - Path to pipeline config YAML (relative to project root).
   * - ``--force``, ``-f``
     - ``lhp generate``
     - With ``--pipeline-config``, rewrites existing LHP-owned bundle YAML resource files. Without ``-pc`` the flag has no effect (Python is always regenerated).
   * - ``--job-config FILE``, ``-jc FILE``
     - ``lhp deps``
     - Path to job config YAML.
   * - ``--bundle-output``, ``-b``
     - ``lhp deps``
     - Write generated job YAML under ``resources/`` for bundle deployment.
   * - ``--format``, ``-f``
     - ``lhp deps``
     - One of ``dot``, ``json``, ``text``, ``job``, ``all``. Default ``all``.

There is no ``--bundle`` flag. Bundle integration is active whenever ``databricks.yml`` exists in the project root and ``--no-bundle`` is not set.

Project layout
--------------

.. code-block:: text

   /
   ├── databricks.yml          # User-owned; LHP does not modify it
   ├── lhp.yaml                # LHP project config
   ├── pipelines/              # Flowgroup YAML
   ├── substitutions/          # .yaml per target
   ├── config/                 # Optional pipeline_config.yaml / job_config.yaml
   ├── resources/
   │   ├── lhp/                # LHP-owned; regenerated each run
   │   └── *.job.yml           # User-owned; LHP leaves them alone
   └── generated/              # Auto-generated Python; do not edit

Files under ``resources/lhp/`` carry a ``# Generated by LakehousePlumber`` header. Manual edits are overwritten on the next generate cycle.

Sync behavior
-------------

Conservative sync runs after every successful ``lhp generate``. The decision matrix is enforced by ``BundleManager.sync_resources_with_generated_files``:

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Generated dir
     - File in ``resources/lhp/``
     - Action
   * - Exists
     - LHP-owned
     - Preserve (no rewrite)
   * - Exists
     - LHP-owned + ``--force`` + ``--pipeline-config``
     - Regenerate
   * - Exists
     - User-edited (no LHP header)
     - Rename to ``.bkup`` and recreate
   * - Exists
     - None
     - Create
   * - Missing
     - Any
     - Delete
   * - Any
     - Multiple files defining the same pipeline
     - Raise ``BundleResourceError``

Backup names use ``.bkup``, ``.bkup.1``, ``.bkup.2`` on collision.

Target binding rules
--------------------

- Each target in ``databricks.yml`` must have a matching ``substitutions/.yaml``.
- Pipeline names in ``pipelines/`` must use ``[a-zA-Z0-9_-]+`` only.
- Generated resource files: ``.pipeline.yml`` (preferred) or ``.yml``.
- Generated pipeline resource key: ``_pipeline``.

Pipeline configuration
----------------------

File format
~~~~~~~~~~~

Multi-document YAML. The first document holds ``project_defaults``; each subsequent document targets one or more pipelines via the ``pipeline`` key.

.. code-block:: yaml
   :caption: config/pipeline_config.yaml

   project_defaults:
     catalog: "${catalog}"
     schema: "${schema}"
     serverless: true
   ---
   pipeline:
     - bronze_load
   serverless: false
   clusters:
     - label: default
       node_type_id: Standard_D16ds_v5
       autoscale:
         min_workers: 2
         max_workers: 10

Pass the file path with ``--pipeline-config`` / ``-pc``.

Top-level keys
~~~~~~~~~~~~~~

The following keys are explicitly rendered by ``pipeline_resource.yml.j2``. Source of truth: ``EXPLICITLY_RENDERED_PIPELINE_CONFIG_KEYS`` in ``src/lhp/bundle/manager.py``.

.. list-table::
   :header-rows: 1
   :widths: 20 12 18 50

   * - Key
     - Type
     - Default
     - Notes
   * - ``catalog``
     - string
     - none
     - Unity Catalog name. Required if ``schema`` is set. Supports ``${token}``.
   * - ``schema``
     - string
     - none
     - Schema name. Required if ``catalog`` is set. Supports ``${token}``.
   * - ``serverless``
     - bool
     - ``true``
     - Pipeline compute mode.
   * - ``edition``
     - string
     - ``ADVANCED``
     - One of ``CORE``, ``PRO``, ``ADVANCED``. Ignored when ``serverless: true``.
   * - ``channel``
     - string
     - ``CURRENT``
     - One of ``CURRENT``, ``PREVIEW``.
   * - ``continuous``
     - bool
     - ``false``
     - Streaming/continuous mode.
   * - ``photon``
     - bool
     - none
     - Photon engine. Non-serverless only.
   * - ``clusters``
     - list
     - none
     - Cluster specs. Used when ``serverless: false``. See `Cluster keys`_.
   * - ``configuration``
     - dict
     - none
     - Spark/DLT properties. Values must be quoted strings.
   * - ``notifications``
     - list
     - none
     - Email recipients + alert types. See `Notification keys`_.
   * - ``tags``
     - dict
     - none
     - Pipeline tags. Non-serverless only.
   * - ``event_log``
     - dict or ``false``
     - none
     - Per-pipeline event log override. ``false`` opts out of project-level event_log.
   * - ``environment``
     - dict
     - none
     - Pip dependencies passed through as-is.
   * - ``permissions``
     - list
     - none
     - Pipeline ACL entries. See `Permission keys`_.

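
The constraints in the table above can be expressed as a small pre-flight check. The following is a minimal sketch, not LHP's implementation: ``check_pipeline_config`` and the two ``VALID_*`` sets are hypothetical names introduced here, and the rules encoded are only those stated in the table (paired ``catalog``/``schema``, allowed ``edition`` and ``channel`` values, string-only ``configuration`` values).

```python
# Hypothetical helper for illustration; LHP's real validation may differ.
VALID_EDITIONS = {"CORE", "PRO", "ADVANCED"}
VALID_CHANNELS = {"CURRENT", "PREVIEW"}


def check_pipeline_config(config: dict) -> list[str]:
    """Return a list of problems found in one pipeline config document."""
    problems = []

    # catalog and schema must be set together (both or neither).
    has_catalog = bool(config.get("catalog"))
    has_schema = bool(config.get("schema"))
    if has_catalog != has_schema:
        problems.append("catalog and schema must both be set, or neither")

    # edition and channel accept only the values listed in the table.
    edition = config.get("edition")
    if edition is not None and edition not in VALID_EDITIONS:
        problems.append(f"edition must be one of {sorted(VALID_EDITIONS)}")
    channel = config.get("channel")
    if channel is not None and channel not in VALID_CHANNELS:
        problems.append(f"channel must be one of {sorted(VALID_CHANNELS)}")

    # configuration values must be strings (quoted in the YAML source).
    for key, value in (config.get("configuration") or {}).items():
        if not isinstance(value, str):
            problems.append(f"configuration[{key!r}] must be a quoted string")

    return problems
```

A check like this could run in CI before ``lhp generate`` to surface typos earlier; LHP's own validation remains authoritative.
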

Any other top-level key is rendered verbatim via the pass-through filter, including ``run_as``, ``trigger``, ``budget_policy_id``, ``edit_mode``, and any Databricks Pipelines API field added after your LHP release.

Cluster keys
~~~~~~~~~~~~

Each entry under ``clusters``:

.. list-table::
   :header-rows: 1
   :widths: 25 60

   * - Key
     - Notes
   * - ``label``
     - Required. ``default`` for the main cluster.
   * - ``node_type_id``
     - Optional. Mutually exclusive with ``instance_pool_id``.
   * - ``instance_pool_id``
     - Optional.
   * - ``driver_node_type_id``
     - Optional.
   * - ``driver_instance_pool_id``
     - Optional.
   * - ``policy_id``
     - Optional cluster policy.
   * - ``autoscale.min_workers``
     - Required when ``autoscale`` is set.
   * - ``autoscale.max_workers``
     - Required when ``autoscale`` is set.
   * - ``autoscale.mode``
     - Optional. Typically ``ENHANCED``.

Notification keys
~~~~~~~~~~~~~~~~~

Each entry under ``notifications``:

- ``email_recipients``: list of email strings.
- ``alerts``: list of alert types, any of ``on-update-success``, ``on-update-failure``, ``on-update-fatal-failure``, ``on-flow-failure``.

Permission keys
~~~~~~~~~~~~~~~

Each entry under ``permissions``:

- ``level``: one of ``CAN_VIEW``, ``CAN_RUN``, ``CAN_MANAGE``.
- ``user_name``, ``group_name``, or ``service_principal_name``: exactly one.

Configuration block
~~~~~~~~~~~~~~~~~~~

The ``configuration`` dict is merged with LHP's mandatory ``bundle.sourcePath`` entry. All values **must** be quoted strings; unquoted booleans or integers raise a validation error. Any user-supplied ``bundle.sourcePath`` is silently ignored.

Monitoring alias
~~~~~~~~~~~~~~~~

The reserved key ``__eventlog_monitoring`` under ``pipeline:`` targets the monitoring pipeline generated by ``monitoring`` in ``lhp.yaml``. See :doc:`monitoring_reference` for resolution rules.

Merge precedence
~~~~~~~~~~~~~~~~

``DEFAULT_PIPELINE_CONFIG`` → ``project_defaults`` → pipeline-specific. Later layers override earlier ones. Dicts are deep-merged; lists are replaced wholesale. Substitution applies to every field.
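
The dict-versus-list merge rule is easiest to see on a concrete case. The sketch below assumes a hypothetical ``deep_merge`` helper and made-up layer contents (the actual ``DEFAULT_PIPELINE_CONFIG`` values are not reproduced here); it illustrates the documented behavior rather than LHP's code.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into a copy of base: dicts recurse, everything else is replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists from the later layer win outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative layers only (assumed values, not LHP's real defaults).
defaults = {"serverless": True, "channel": "CURRENT"}
project_defaults = {
    "configuration": {"spark.sql.shuffle.partitions": "200"},
    "notifications": [
        {"email_recipients": ["team@example.com"], "alerts": ["on-update-failure"]}
    ],
}
pipeline_specific = {
    "serverless": False,
    "configuration": {"pipelines.trigger.interval": "5 minutes"},
    "notifications": [
        {"email_recipients": ["oncall@example.com"], "alerts": ["on-update-fatal-failure"]}
    ],
}

# Apply the layers in precedence order: defaults, project_defaults, pipeline-specific.
resolved = deep_merge(deep_merge(defaults, project_defaults), pipeline_specific)
```

Here ``configuration`` ends up with both properties because dicts merge key by key, while ``notifications`` holds only the pipeline-specific entry because lists are replaced wholesale.
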
Tokens resolve from ``substitutions/.yaml`` at generate time.

Catalog/schema validation
~~~~~~~~~~~~~~~~~~~~~~~~~

- Both ``catalog`` and ``schema`` must be set, or neither.
- Both must be non-empty after substitution.
- Missing or partial definition raises ``BundleResourceError`` with ``docs_reference="docs/configure_catalog_schema.rst"``.

See :doc:`configure_catalog_schema` for per-pipeline and ``project_defaults`` configuration, resolution order, and the full error reference.

Job configuration
-----------------

File format
~~~~~~~~~~~

.. code-block:: yaml
   :caption: config/job_config.yaml

   project_defaults:
     max_concurrent_runs: 1
     performance_target: STANDARD
     queue:
       enabled: true
   ---
   job_name:
     - bronze_ingestion_job
   timeout_seconds: 7200
   schedule:
     quartz_cron_expression: "0 0 2 * * ?"
     timezone_id: America/New_York

Pass the file path with ``--job-config`` / ``-jc``. Use ``--bundle-output`` to write the job file under ``resources/`` for bundle deployment.

Top-level keys
~~~~~~~~~~~~~~

The following keys are explicitly rendered by ``job_resource.yml.j2``. Source of truth: ``EXPLICITLY_RENDERED_JOB_CONFIG_KEYS`` in ``src/lhp/core/services/job_generator.py``. Defaults come from ``JobGenerator.DEFAULT_JOB_CONFIG``.

.. list-table::
   :header-rows: 1
   :widths: 25 18 57

   * - Key
     - Default
     - Notes
   * - ``max_concurrent_runs``
     - ``1``
     - Concurrent run cap.
   * - ``performance_target``
     - ``STANDARD``
     - One of ``STANDARD``, ``PERFORMANCE_OPTIMIZED``.
   * - ``queue.enabled``
     - ``true``
     - Job queueing.
   * - ``timeout_seconds``
     - none
     - Job-level timeout.
   * - ``tags``
     - none
     - Job tag dict.
   * - ``email_notifications``
     - none
     - ``on_start`` / ``on_success`` / ``on_failure`` lists of recipients.
   * - ``webhook_notifications``
     - none
     - ``on_start`` / ``on_success`` / ``on_failure`` lists of ``{id}`` entries.
   * - ``permissions``
     - none
     - Job ACL entries (same shape as pipeline ``permissions``).
   * - ``schedule.quartz_cron_expression``
     - none
     - Required when ``schedule`` is set.
   * - ``schedule.timezone_id``
     - none
     - Required when ``schedule`` is set.
   * - ``schedule.pause_status``
     - none
     - ``PAUSED`` or ``UNPAUSED``.
   * - ``notebook_cluster``
     - none
     - Monitoring job only. ``new_cluster`` dict or ``existing_cluster_id``.
   * - ``generate_master_job``
     - ``true``
     - LHP-internal; controls master-job emission. Never written to output.
   * - ``master_job_name``
     - none (auto)
     - LHP-internal; overrides the master-job name. Never written to output.

Any other top-level key passes through as-is. Common examples: ``trigger.file_arrival``, ``continuous``, ``run_as.service_principal_name``, ``git_source``, ``health``, ``parameters``, ``environments``, ``edit_mode``, ``budget_policy_id``. LHP does not validate pass-through fields against the Databricks Jobs API; misspellings surface at deploy time.

Merge precedence
~~~~~~~~~~~~~~~~

``DEFAULT_JOB_CONFIG`` → ``project_defaults`` → job-specific. Later layers override earlier ones. Dicts are deep-merged; lists are replaced wholesale. Author key order is preserved.

Multi-job orchestration
-----------------------

Set ``job_name`` on flowgroups in ``pipelines/*.yaml`` to split execution into named jobs.

.. code-block:: yaml
   :caption: pipelines/bronze/customer.yaml (excerpt)

   pipeline: data_bronze
   flowgroup: customer_ingestion
   job_name:
     - bronze_ingestion_job

Rules
~~~~~

- All-or-nothing: if any flowgroup sets ``job_name``, every flowgroup must set it.
- Format: ``^[a-zA-Z0-9_-]+$``.
- The ``--pipeline`` filter is rejected in multi-job mode.

Generated artifacts
~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   resources/
   ├── .job.yml            # One per unique job_name
   └── _master.job.yml     # Master orchestrator

The master job wires individual jobs together via ``task_key`` references with ``depends_on`` edges resolved from dependency analysis.

Generated resource example
--------------------------

.. code-block:: yaml
   :caption: resources/lhp/bronze_load.pipeline.yml

   # Generated by LakehousePlumber - Bundle Resource for bronze_load
   resources:
     pipelines:
       bronze_load_pipeline:
         name: bronze_load_pipeline
         catalog: ${var.catalog}
         schema: ${var.bronze_schema}
         serverless: true
         libraries:
           - glob:
               include: ${workspace.file_path}/generated/${bundle.target}/bronze_load/**
         root_path: ${workspace.file_path}/generated/${bundle.target}/bronze_load
         configuration:
           bundle.sourcePath: ${workspace.file_path}/generated/${bundle.target}

LHP always emits ``libraries`` as a glob, ``root_path`` under ``${workspace.file_path}/generated/${bundle.target}``, and the ``bundle.sourcePath`` configuration entry.

Configuration templates
-----------------------

``lhp init`` writes starter templates under ``config/``:

- ``config/pipeline_config.yaml.tmpl``
- ``config/job_config.yaml.tmpl``

Copy each template to a file name without the ``.tmpl`` suffix before editing.

Version enforcement
-------------------

The optional ``required_lhp_version`` key in ``lhp.yaml`` pins generation to a specific LHP release range, so the same project produces the same Python output across development and CI. ``lhp validate`` and ``lhp generate`` fail when the installed LHP version falls outside the range. Informational commands such as ``lhp show`` skip the check so you can inspect a project even on a mismatched LHP version.

LHP accepts any :pep:`440` version specifier:

.. code-block:: yaml
   :caption: lhp.yaml: version specifier examples

   # Exact pin
   required_lhp_version: "==0.4.1"

   # Allow patch updates only (equivalent to >=0.4.1,<0.5.0)
   required_lhp_version: "~=0.4.1"

   # Range with exclusion
   required_lhp_version: ">=0.4.1,<0.5.0,!=0.4.3"

   # Allow minor updates
   required_lhp_version: ">=0.4.0,<1.0.0"

Projects without ``required_lhp_version`` run on any installed LHP version.

Emergency bypass
~~~~~~~~~~~~~~~~

Set ``LHP_IGNORE_VERSION=1`` to skip version checking temporarily:

.. code-block:: bash
   :caption: Bypass version checking

   export LHP_IGNORE_VERSION=1
   lhp generate -e dev

   # Or inline for a single command
   LHP_IGNORE_VERSION=1 lhp validate -e prod

.. warning::

   ``LHP_IGNORE_VERSION=1`` defeats the purpose of version pinning. Reserve it for incident response, not regular workflows.

CI/CD integration
~~~~~~~~~~~~~~~~~

Install the LHP version matching the project requirement before running ``lhp validate`` or ``lhp generate``:

.. code-block:: bash
   :caption: CI pipeline with version enforcement

   # Install the exact range from lhp.yaml
   pip install "lakehouse-plumber$(yq -r .required_lhp_version lhp.yaml)"

   # Or pin a known-good range
   pip install "lakehouse-plumber>=0.4.1,<0.5.0"

   # Validate and generate (fail-fast on mismatch)
   lhp validate -e prod
   lhp generate -e prod

Error codes
-----------

- ``BundleResourceError``: Missing, incomplete, or empty ``catalog``/``schema`` after substitution (carries ``docs_reference="docs/configure_catalog_schema.rst"``). Also raised on multiple files defining the same pipeline, malformed YAML in ``resources/lhp/``, or filesystem failure. See :doc:`configure_catalog_schema` for catalog/schema cases.
- ``LHPConfigError 028``: ``BundleManager`` initialized with no ``project_root``.

See also
--------

- :doc:`configure_bundles`: Bundle setup walk-through.
- :doc:`configure_catalog_schema`: Catalog and schema configuration via ``pipeline_config.yaml``.
- :doc:`cicd`: CI/CD patterns and deployment workflows.
- :doc:`architecture`: How LHP's generation and sync layers fit together.
- :doc:`dependency_analysis`: Pipeline dependency graph and orchestration job generation.
- :doc:`monitoring_reference`: Event log and monitoring pipeline schema.
- :doc:`errors_reference`: Full error code catalog.