Glossary
========

.. meta::
   :description: Definitions of Lakehouse Plumber (LHP) terms: Pipeline, FlowGroup,
      Action, Preset, Template, Blueprint, substitution syntaxes, and Lakeflow target
      types.

Definitions of Lakehouse Plumber (LHP) terms used throughout this documentation.
Each entry is the canonical wording; other pages link here with ``:term:`` roles.

.. glossary::
   :sorted:

   Pipeline
      A logical grouping label declared by every FlowGroup as ``pipeline:``. All
      FlowGroups sharing that name generate Python files into the same output
      folder and produce one Databricks Lakeflow Declarative Pipeline resource per
      Pipeline name. A Pipeline is the deployment unit for Asset Bundles. See
      :doc:`architecture`.

   FlowGroup
      A logical slice of a Pipeline, typically one source table or business
      entity, composed of an ordered list of Actions. One YAML file can hold one or
      many FlowGroups and declares its parent Pipeline, its own name, optional
      ``job_name``, optional local ``variables``, applied presets, and an optional
      template reference.

   Action
      A single step inside a FlowGroup. Actions have four top-level ``type``
      values (``load``, ``transform``, ``write``, ``test``), each with sub-types
      selected by an additional discriminator field. The full catalogue is in
      :doc:`actions/index`.

   Load action
      An Action with ``type: load`` that reads external data into a temporary
      view. Sub-types: ``cloudfiles``, ``delta``, ``sql``, ``python``, ``jdbc``,
      ``kafka``, ``custom_datasource``. One Load action per data source.

   Transform action
      An Action with ``type: transform`` that reshapes or checks data already
      loaded into a view. Sub-types: ``sql``, ``python``, ``data_quality``,
      ``schema``, ``temp_table``. Zero or many per FlowGroup.

   Write action
      An Action with ``type: write`` that persists the final dataset. Sub-types:
      ``streaming_table``, ``materialized_view``, ``sink``. One Write action per
      output table or sink.

   Test action
      An Action with ``type: test`` that asserts a property of the data: row
      count, uniqueness, referential integrity, completeness, range, schema match,
      lookup coverage, custom SQL, or custom expectations. Test actions only run
      when ``lhp generate`` is invoked with ``--include-tests``.

   Preset
      A YAML file of default *values* deep-merged into Actions matched by type.
      Presets resolve before substitutions and before validation; explicit
      FlowGroup config wins over preset defaults. Presets may extend other presets
      via ``extends:``. See :doc:`presets_reference`.

   Template
      A YAML file of parametrised *actions* applied to a FlowGroup via
      ``use_template:`` and ``template_parameters:``. LHP renders Jinja2 ``{{ }}``
      placeholders inside the template and appends the rendered actions to the
      FlowGroup. See :doc:`templates_reference`.

   Blueprint
      A higher-order template that instantiates *multiple* FlowGroups at once.
      Where a template parameterises actions inside one FlowGroup, a blueprint
      parameterises the FlowGroups themselves. Blueprints declare ``parameters``
      and a ``flowgroups`` array of FlowGroup specs. See :doc:`blueprints`.

   Blueprint instance
      A YAML file that supplies parameter values to a Blueprint via
      ``use_blueprint:`` plus a nested ``parameters:`` block. The legacy
      ``blueprint:`` plus flat top-level keys form is deprecated and removed in
      V0.9; mixing the two raises ``LHP-CFG-061``.

   Substitution
      Token replacement in YAML resolved by LHP before code generation. Four
      syntaxes resolve in a fixed order: ``%{local_var}`` then
      ``{{ template_param }}`` then ``${env_token}`` then
      ``${secret:scope/key}``. The order is enforced inside ``FlowgroupProcessor``
      so each layer's output can feed the next. See :doc:`substitutions`.

   Local variable
      A FlowGroup-scoped value declared under ``variables:`` and referenced as
      ``%{name}``. Local variables resolve first in the substitution order, so
      their output can contain template parameters, environment tokens, or secret
      references that later layers resolve.

   Environment token
      A substitution of the form ``${name}`` resolved against
      ``substitutions/.yaml``. The same FlowGroup YAML generates different Python
      files per environment because environment tokens differ between ``dev``,
      ``staging``, and ``prod``. The bare ``{name}`` form is deprecated.

   Streaming table
      A Lakeflow Declarative Pipelines target type for incremental, append-only or
      CDC datasets. Generated by Write actions with
      ``write_target.type: streaming_table``. Supports standard, CDC, and snapshot
      CDC modes plus append flows from multiple sources.

   Materialized view
      A Lakeflow Declarative Pipelines target type for full-refresh datasets
      defined by a SQL query. Generated by Write actions with
      ``write_target.type: materialized_view``. Supports optional
      ``refresh_schedule`` and inline ``sql`` or external ``sql_path``.

   Sink
      A Write action with ``write_target.type: sink`` that emits data to an
      external system rather than a Unity Catalog table. Supported sink types are
      ``delta``, ``kafka``, ``custom``, and ``foreachbatch``.

   Append flow
      A Lakeflow Declarative Pipelines construct that appends rows from a source
      into a streaming table. LHP emits one append flow per source when a Write
      action lists multiple sources, and one combined flow when a single source
      feeds the target.

   Snapshot CDC
      A streaming table mode where LHP invokes a user-supplied
      ``source_function`` to produce point-in-time snapshots, then applies them via
      Lakeflow's ``AUTO CDC FROM SNAPSHOT`` API. Configured via
      ``write_target.mode: snapshot_cdc`` and ``snapshot_cdc_config``.

   CDC
      Change Data Capture: the practice of replicating row-level inserts, updates,
      and deletes from a source system into a target table. LHP exposes CDC
      through streaming table Write actions with ``mode: cdc`` or
      ``mode: snapshot_cdc``.

   SCD
      Slowly Changing Dimension: a dimensional modelling pattern for tracking
      historical attribute changes. LHP supports SCD Type 1 (overwrite) and Type 2
      (history rows with effective dates) through streaming table CDC
      configuration.

   Schema hints
      A semicolon-separated DDL string passed to Auto Loader as the
      ``cloudFiles.schemaHints`` option, pinning column types during schema
      inference. LHP can load schema hints from an external file referenced by
      path; the ``schema`` transform action also generates schema hints from
      structured YAML.

   Expectation
      A data quality rule applied by a ``data_quality`` transform action.
      Expectations are loaded from an ``expectations_file`` and translated into
      Lakeflow ``@dlt.expect``-family decorators. Combined with quarantine mode
      they support DLQ recycling.

   Quarantine
      A data quality mode (``mode: quarantine``) that splits a stream into passing
      rows and failing rows, writing failures to a configured ``dlq_table`` for
      replay rather than failing the pipeline. Also called DLQ recycling. See
      :doc:`quarantine`.

   Operational metadata
      Auto-injected columns (timestamps, source file paths, pipeline run IDs, and
      similar) added to Write targets. Defined under ``operational_metadata`` in
      ``lhp.yaml`` as named columns and presets, then enabled per FlowGroup or
      Action. See :doc:`operational_metadata`.

   Event log monitoring
      A project-level feature that generates two artefacts from ``monitoring:`` in
      ``lhp.yaml``: a notebook that unions pipeline event logs into a Delta table,
      and a Lakeflow pipeline of materialized views reading that table. A
      Databricks job chains the two.

   Skill
      The LHP Claude Code skill installed by ``lhp skill install``. The skill
      package ships inside LHP and is copied to ``.claude/skills/lhp/`` (or
      ``~/.claude/skills/lhp/`` with ``--user``); it provides agent-targeted
      context for authoring LHP configurations.

   Lakeflow Declarative Pipeline
      The current canonical name for the Databricks framework that LHP generates
      Python for. LHP code uses the ``pyspark.pipelines`` API. This framework was
      previously called Delta Live Tables (DLT) and Spark Declarative Pipelines
      (SDP). All three names refer to the same runtime across LHP history.

   DLT
      Delta Live Tables: the historical name for what is now called Lakeflow
      Declarative Pipelines. The term survives in older documentation, generated
      Python comments, and migration notes; new content uses *Lakeflow Declarative
      Pipelines*.

   SDP
      Spark Declarative Pipelines: a transitional name for Lakeflow Declarative
      Pipelines used between the DLT and Lakeflow renames. LHP internal modules and
      generated imports reflect this lineage (``pyspark.pipelines``).

   DAB
      Databricks Asset Bundle: the deployment format LHP targets. When a project
      is initialised with ``lhp init --bundle``, generation also emits resource
      YAML under ``resources/lhp/`` so ``databricks bundle deploy`` can deploy the
      pipelines.

   FlowgroupProcessor
      The internal service in ``core/services/flowgroup_processor.py`` that runs
      each FlowGroup through the substitution layer cake, preset merge, template
      expansion, and Pydantic validation. Referenced here because error messages
      and several documentation pages cite it by name.

See also
--------

* :doc:`architecture` for an explanation of how these terms compose.
* :doc:`how_to_index` for the task-oriented how-to landing page.
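
Substitution order sketch
-------------------------

The fixed order given under the Substitution entry (local variables, then
template parameters, then environment tokens, then secret references) can be
pictured with a small Python sketch. This is not LHP's implementation, which
the glossary places in ``FlowgroupProcessor``. The function name ``resolve``,
the regular expressions, and the ``<secret scope/key>`` placeholder output are
all assumptions made for illustration only.

.. code-block:: python

   import re

   # Hypothetical patterns for the four syntaxes named in the glossary.
   LOCAL_VAR = re.compile(r"%\{(\w+)\}")
   TEMPLATE_PARAM = re.compile(r"\{\{\s*(\w+)\s*\}\}")
   ENV_TOKEN = re.compile(r"\$\{(\w+)\}")  # no colon allowed, so secrets are skipped
   SECRET_REF = re.compile(r"\$\{secret:([^/}]+)/([^}]+)\}")


   def resolve(text, local_vars, template_params, env_tokens):
       """Apply the four layers in order; an unknown name raises KeyError."""
       # 1. Local variables: their values may contain later-layer tokens.
       text = LOCAL_VAR.sub(lambda m: local_vars[m.group(1)], text)
       # 2. Template parameters.
       text = TEMPLATE_PARAM.sub(lambda m: template_params[m.group(1)], text)
       # 3. Environment tokens.
       text = ENV_TOKEN.sub(lambda m: env_tokens[m.group(1)], text)
       # 4. Secret references, rendered as a placeholder (an assumption).
       text = SECRET_REF.sub(
           lambda m: f"<secret {m.group(1)}/{m.group(2)}>", text
       )
       return text


   # Example values are invented for this sketch.
   local_vars = {"target": "${catalog}.{{ schema }}.orders"}
   template_params = {"schema": "bronze"}
   env_tokens = {"catalog": "dev_catalog", "db_password": "${secret:kv/db_password}"}

   print(resolve("%{target}", local_vars, template_params, env_tokens))
   # dev_catalog.bronze.orders
   print(resolve("${db_password}", local_vars, template_params, env_tokens))
   # <secret kv/db_password>

The first call shows why the order matters: the local variable expands to text
that still contains a template parameter and an environment token, and the
later layers resolve both. Reversing the order would leave those tokens
unresolved in the output.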