Glossary
========

.. meta::
   :description: Definitions of Lakehouse Plumber (LHP) terms — Pipeline, FlowGroup, Action, Preset, Template, Blueprint, substitution syntaxes, and Lakeflow target types.

Definitions of Lakehouse Plumber (LHP) terms used throughout this documentation.
Each entry is the canonical wording; other pages link here with ``:term:`` roles.

.. glossary::
   :sorted:

   Pipeline
       A logical grouping label declared by every FlowGroup as ``pipeline: <name>``.
       All FlowGroups sharing that name generate Python files into the same output
       folder and produce one Databricks Lakeflow Declarative Pipeline resource per
       Pipeline name. A Pipeline is the deployment unit for Asset Bundles. See
       :doc:`architecture`.

   FlowGroup
       A logical slice of a Pipeline — typically one source table or business
       entity — composed of an ordered list of Actions. One YAML file can hold one
       or many FlowGroups and declares its parent Pipeline, its own name, optional
       ``job_name``, optional local ``variables``, applied presets, and an optional
       template reference.
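
       As a rough illustration, a FlowGroup file might look like the sketch
       below. The ``pipeline``, ``job_name``, ``variables``, and
       ``write_target.type`` keys come from this glossary; the ``flowgroup``,
       ``presets``, ``actions``, ``name``, and ``source`` key names, and every
       value shown, are assumptions made for illustration. Check
       :doc:`actions/index` for the actual schema.

       .. code-block:: yaml

          # Hypothetical FlowGroup file; names and values are illustrative only.
          pipeline: bronze_ingest          # parent Pipeline label
          flowgroup: customers             # FlowGroup name (key name assumed)
          job_name: nightly_ingest         # optional
          variables:                       # optional, referenced as %{entity}
            entity: customers
          presets:                         # applied presets (key name assumed)
            - bronze_defaults
          actions:                         # ordered list of Actions (key name assumed)
            - name: write_%{entity}        # "name" and "source" keys are assumed
              type: write
              source: v_%{entity}_clean
              write_target:
                type: streaming_table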

   Action
       A single step inside a FlowGroup. Every Action has one of four top-level
       ``type`` values (``load``, ``transform``, ``write``, ``test``), and each
       type has sub-types selected by an additional discriminator field. The
       full catalogue is in :doc:`actions/index`.

   Load action
       An Action with ``type: load`` that reads external data into a temporary view.
       Sub-types: ``cloudfiles``, ``delta``, ``sql``, ``python``, ``jdbc``, ``kafka``,
       ``custom_datasource``. One Load action per data source.

   Transform action
       An Action with ``type: transform`` that reshapes or checks data already
       loaded into a view. Sub-types: ``sql``, ``python``, ``data_quality``,
       ``schema``, ``temp_table``. Zero or many per FlowGroup.

   Write action
       An Action with ``type: write`` that persists the final dataset. Sub-types:
       ``streaming_table``, ``materialized_view``, ``sink``. One Write action per
       output table or sink.

   Test action
       An Action with ``type: test`` that asserts a property of the data —
       row count, uniqueness, referential integrity, completeness, range,
       schema match, lookup coverage, custom SQL, or custom expectations. Test
       actions only run when ``lhp generate`` is invoked with ``--include-tests``.

   Preset
       A YAML file of default *values* deep-merged into Actions matched by type.
       Presets resolve before substitutions and before validation; explicit
       FlowGroup config wins over preset defaults. Presets may extend other
       presets via ``extends:``. See :doc:`presets_reference`.

   Template
       A YAML file of parametrised *actions* applied to a FlowGroup via
       ``use_template:`` and ``template_parameters:``. LHP renders Jinja2
       ``{{ }}`` placeholders inside the template and appends the rendered
       actions to the FlowGroup. See :doc:`templates_reference`.
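
       A minimal sketch of the idea, assuming a template named
       ``ingest_table`` with one parameter called ``table_name`` (both
       hypothetical). The top-level layout of the template file and the
       ``name`` and ``source`` keys are also assumptions; only ``use_template``,
       ``template_parameters``, and the ``{{ }}`` placeholder syntax come from
       the definition above.

       .. code-block:: yaml

          # Hypothetical template file: actions parametrised with Jinja2 placeholders.
          actions:
            - name: write_{{ table_name }}
              type: write
              source: v_{{ table_name }}
              write_target:
                type: streaming_table

       A FlowGroup could then apply it like this:

       .. code-block:: yaml

          # Fragment of a FlowGroup file that applies the template above.
          use_template: ingest_table       # hypothetical template name
          template_parameters:
            table_name: orders             # rendered into the {{ table_name }} slots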

   Blueprint
       A higher-order template that instantiates *multiple* FlowGroups at once.
       Where a template parameterises actions inside one FlowGroup, a blueprint
       parameterises the FlowGroups themselves. Blueprints declare ``parameters``
       and a ``flowgroups`` array of FlowGroup specs. See :doc:`blueprints`.

   Blueprint instance
       A YAML file that supplies parameter values to a Blueprint via
       ``use_blueprint:`` plus a nested ``parameters:`` block. The legacy
       ``blueprint:`` plus flat top-level keys form is deprecated and removed in
       V0.9; mixing the two raises ``LHP-CFG-061``.
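
       The two forms can be contrasted as follows. The blueprint name and the
       parameter names are hypothetical; only the ``use_blueprint:``,
       ``parameters:``, and ``blueprint:`` keys come from the definition above.

       .. code-block:: yaml

          # Current form: blueprint reference plus a nested parameters block.
          use_blueprint: regional_ingest   # hypothetical blueprint name
          parameters:
            region: emea                   # hypothetical parameters
            source_system: sap

       .. code-block:: yaml

          # Legacy form (deprecated): flat top-level keys next to "blueprint:".
          blueprint: regional_ingest
          region: emea
          source_system: sap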

   Substitution
       Token replacement in YAML resolved by LHP before code generation. Four
       syntaxes resolve in a fixed order: ``%{local_var}`` then
       ``{{ template_param }}`` then ``${env_token}`` then ``${secret:scope/key}``.
       The order is enforced inside ``FlowgroupProcessor`` so each layer's output
       can feed the next. See :doc:`substitutions`.
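
       The fragment below places the four syntaxes side by side, numbered in
       resolution order. It is not a complete LHP file: the ``settings`` block
       and its keys, the token names, and the secret scope are all made up for
       illustration.

       .. code-block:: yaml

          # Illustrative fragment only; key names and values are assumptions.
          variables:
            entity: orders                                 # local variable
          settings:
            view_name: v_%{entity}_raw                     # 1. %{local_var}
            table: "{{ table_name }}"                      # 2. {{ template_param }}
            path: "${landing_path}/%{entity}"              # 3. ${env_token}
            password: "${secret:jdbc_scope/db_password}"   # 4. ${secret:scope/key}

       Because ``%{entity}`` resolves first, the ``path`` value already reads
       ``${landing_path}/orders`` by the time the environment token layer runs.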

   Local variable
       A FlowGroup-scoped value declared under ``variables:`` and referenced as
       ``%{name}``. Local variables resolve first in the substitution order, so
       their output can contain template parameters, environment tokens, or
       secret references that later layers resolve.

   Environment token
       A substitution of the form ``${name}`` resolved against
       ``substitutions/<env>.yaml``. The same FlowGroup YAML generates different
       Python files per environment because environment tokens differ between
       ``dev``, ``staging``, and ``prod``. The bare ``{name}`` form is deprecated.

   Streaming table
       A Lakeflow Declarative Pipelines target type for incremental, append-only
       or CDC datasets. Generated by Write actions with
       ``write_target.type: streaming_table``. Supports standard, CDC, and
       snapshot CDC modes plus append flows from multiple sources.

   Materialized view
       A Lakeflow Declarative Pipelines target type for full-refresh datasets
       defined by a SQL query. Generated by Write actions with
       ``write_target.type: materialized_view``. Supports optional
       ``refresh_schedule`` and inline ``sql`` or external ``sql_path``.
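
       A sketch of such a Write action, assuming that ``sql`` and
       ``refresh_schedule`` sit directly under ``write_target``. The action
       ``name`` key, the query, the token names, and the schedule placeholder
       are illustrative, and other fields the real schema may require are
       omitted.

       .. code-block:: yaml

          # Hypothetical Write action producing a materialized view.
          - name: write_daily_sales        # "name" key assumed
            type: write
            write_target:
              type: materialized_view
              refresh_schedule: "<schedule expression>"   # optional; format not shown here
              sql: |
                SELECT order_date, SUM(amount) AS total_amount
                FROM ${catalog}.${silver_schema}.orders
                GROUP BY order_date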

   Sink
       A Write action with ``write_target.type: sink`` that emits data to an
       external system rather than a Unity Catalog table. Supported sink types are
       ``delta``, ``kafka``, ``custom``, and ``foreachbatch``.

   Append flow
       A Lakeflow Declarative Pipelines construct that appends rows from a source
       into a streaming table. When a Write action lists multiple sources, LHP
       emits one append flow per source; when a single source feeds the target,
       it emits a single flow.

   Snapshot CDC
       A streaming table mode where LHP invokes a user-supplied
       ``source_function`` to produce point-in-time snapshots, then applies them
       via Lakeflow's ``AUTO CDC FROM SNAPSHOT`` API. Configured via
       ``write_target.mode: snapshot_cdc`` and ``snapshot_cdc_config``.
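
       For orientation, the configuration might be shaped roughly like this.
       The nesting of ``snapshot_cdc_config`` under ``write_target``, the
       action ``name`` key, and the function name are assumptions; the real
       ``source_function`` value may well be a nested mapping rather than a
       plain string.

       .. code-block:: yaml

          # Hypothetical Write action using snapshot CDC mode.
          - name: write_customer_snapshots   # "name" key assumed
            type: write
            write_target:
              type: streaming_table
              mode: snapshot_cdc
              snapshot_cdc_config:
                source_function: next_customer_snapshot   # user-supplied; shape assumed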

   CDC
       Change Data Capture — the practice of replicating row-level inserts,
       updates, and deletes from a source system into a target table. LHP exposes
       CDC through streaming table Write actions with ``mode: cdc`` or
       ``mode: snapshot_cdc``.

   SCD
       Slowly Changing Dimension — a dimensional modelling pattern for tracking
       historical attribute changes. LHP supports SCD Type 1 (overwrite) and
       Type 2 (history rows with effective dates) through streaming table CDC
       configuration.

   Schema hints
       A semicolon-separated DDL string passed to Auto Loader as the
       ``cloudFiles.schemaHints`` option, pinning column types during schema
       inference. LHP can load schema hints from an external file referenced by
       path; the ``schema`` transform action also generates schema hints from
       structured YAML.

   Expectation
       A data quality rule applied by a ``data_quality`` transform action.
       Expectations are loaded from an ``expectations_file`` and translated into
       Lakeflow ``@dlt.expect``-family decorators. Combined with quarantine mode,
       they support dead-letter queue (DLQ) recycling.

   Quarantine
       A data quality mode (``mode: quarantine``) that splits a stream into
       passing rows and failing rows, writing failures to a configured
       ``dlq_table`` for replay rather than failing the pipeline. Also called
       DLQ recycling. See :doc:`quarantine`.
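
       A possible shape for a quarantining ``data_quality`` action is sketched
       below. Only ``mode: quarantine``, ``dlq_table``, and
       ``expectations_file`` come from this glossary; the ``transform_type``
       discriminator, the ``name``, ``source``, and ``target`` keys, the
       nesting, and all values are assumptions. See :doc:`quarantine` for the
       actual layout.

       .. code-block:: yaml

          # Hypothetical data_quality transform in quarantine mode.
          - name: check_orders
            type: transform
            transform_type: data_quality     # discriminator key name assumed
            source: v_orders_raw
            target: v_orders_checked
            expectations_file: expectations/orders.yaml
            mode: quarantine
            dlq_table: "${catalog}.${ops_schema}.orders_dlq"   # failing rows land here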

   Operational metadata
       Auto-injected columns (timestamps, source file paths, pipeline run IDs,
       and similar) added to Write targets. Defined under ``operational_metadata``
       in ``lhp.yaml`` as named columns and presets, then enabled per FlowGroup or
       Action. See :doc:`operational_metadata`.

   Event log monitoring
       A project-level feature that generates two artefacts from
       ``monitoring:`` in ``lhp.yaml``: a notebook that unions pipeline event
       logs into a Delta table, and a Lakeflow pipeline of materialized views
       reading that table. A Databricks job chains the two.

   Skill
       The LHP Claude Code skill installed by ``lhp skill install``. The skill
       package ships inside LHP and is copied to ``.claude/skills/lhp/`` (or
       ``~/.claude/skills/lhp/`` with ``--user``); it provides agent-targeted
       context for authoring LHP configurations.

   Lakeflow Declarative Pipeline
       The current canonical name for the Databricks framework that LHP
       generates Python for. LHP code uses the ``pyspark.pipelines`` API. The
       framework was previously called Delta Live Tables (DLT) and, for a
       transitional period, Spark Declarative Pipelines (SDP). Throughout
       LHP's history, all three names refer to the same runtime.

   DLT
       Delta Live Tables — the historical name for what is now called Lakeflow
       Declarative Pipelines. The term survives in older documentation,
       generated Python comments, and migration notes; new content uses
       *Lakeflow Declarative Pipelines*.

   SDP
       Spark Declarative Pipelines — a transitional name for Lakeflow
       Declarative Pipelines used between the DLT and Lakeflow renames. LHP
       internal modules and generated imports reflect this lineage
       (``pyspark.pipelines``).

   DAB
       Databricks Asset Bundle — the deployment format LHP targets. When a
       project is initialised with ``lhp init --bundle``, generation also emits
       resource YAML under ``resources/lhp/`` so ``databricks bundle deploy``
       can deploy the pipelines.

   FlowgroupProcessor
       The internal service in ``core/services/flowgroup_processor.py`` that
       runs each FlowGroup through the ordered substitution layers, preset
       merge, template expansion, and Pydantic validation. It is listed here
       because error messages and several documentation pages cite it by name.

See also
--------

* :doc:`architecture` — explanation of how these terms compose.
* :doc:`how_to_index` — task-oriented how-to landing page.