Glossary

Definitions of Lakehouse Plumber (LHP) terms used throughout this documentation. Each entry is the canonical wording; other pages link here with :term: roles.

Action

A single step inside a FlowGroup. Actions have four top-level type values (load, transform, write, test), each with sub-types selected by an additional discriminator field. The full catalogue is in Actions Reference.

Append flow

A Lakeflow Declarative Pipelines construct that appends rows from a source into a streaming table. LHP emits one append flow per source when a Write action lists multiple sources, and a single flow when only one source feeds the target.

Blueprint

A higher-order template that instantiates multiple FlowGroups at once. Where a template parameterises actions inside one FlowGroup, a blueprint parameterises the FlowGroups themselves. Blueprints declare parameters and a flowgroups array of FlowGroup specs. See Blueprints.

Blueprint instance

A YAML file that supplies parameter values to a Blueprint via use_blueprint: plus a nested parameters: block. The legacy blueprint: plus flat top-level keys form is deprecated and removed in V0.9; mixing the two raises LHP-CFG-061.

CDC

Change Data Capture — the practice of replicating row-level inserts, updates, and deletes from a source system into a target table. LHP exposes CDC through streaming table Write actions with mode: cdc or mode: snapshot_cdc.

DAB

Databricks Asset Bundle — the deployment format LHP targets. When a project is initialised with lhp init --bundle, generation also emits resource YAML under resources/lhp/ so databricks bundle deploy can deploy the pipelines.

DLT

Delta Live Tables — the historical name for what is now called Lakeflow Declarative Pipelines. The term survives in older documentation, generated Python comments, and migration notes; new content uses Lakeflow Declarative Pipelines.

Environment token

A substitution of the form ${name} resolved against substitutions/<env>.yaml. The same FlowGroup YAML generates different Python files per environment because environment tokens differ between dev, staging, and prod. The bare {name} form is deprecated.
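
The per-environment behaviour can be illustrated with a minimal sketch. It is not LHP's implementation: the SUBSTITUTIONS dictionary is a hypothetical stand-in for the parsed contents of substitutions/dev.yaml and substitutions/prod.yaml, and the token names catalog and schema are assumptions chosen for illustration.

```python
import re

# Hypothetical stand-in for the parsed substitutions/<env>.yaml files.
SUBSTITUTIONS = {
    "dev": {"catalog": "dev_catalog", "schema": "bronze"},
    "prod": {"catalog": "prod_catalog", "schema": "bronze"},
}

# Matches ${name} only; the colon in ${secret:scope/key} keeps secrets out of this pattern.
TOKEN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_tokens(text: str, env: str) -> str:
    """Replace each ${name} with its value for the given environment.

    Unknown tokens are left untouched in this sketch.
    """
    values = SUBSTITUTIONS[env]
    return TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), text)


target = "${catalog}.${schema}.orders"
print(resolve_env_tokens(target, "dev"))   # dev_catalog.bronze.orders
print(resolve_env_tokens(target, "prod"))  # prod_catalog.bronze.orders
```

The same input string yields a different fully qualified table name per environment, which is why one FlowGroup YAML can produce different generated Python for dev and prod.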

Event log monitoring

A project-level feature, configured under monitoring: in lhp.yaml, that generates two artefacts: a notebook that unions pipeline event logs into a Delta table, and a Lakeflow pipeline of materialized views that reads that table. A Databricks job chains the two.

Expectation

A data quality rule applied by a data_quality transform action. Expectations are loaded from an expectations_file and translated into Lakeflow @dlt.expect-family decorators. Combined with quarantine mode they support DLQ recycling.

FlowGroup

A logical slice of a Pipeline — typically one source table or business entity — composed of an ordered list of Actions. One YAML file can hold one or many FlowGroups and declares its parent Pipeline, its own name, optional job_name, optional local variables, applied presets, and an optional template reference.

FlowgroupProcessor

The internal service in core/services/flowgroup_processor.py that runs each FlowGroup through the substitution layer cake, preset merge, template expansion, and Pydantic validation. Referenced here because error messages and several documentation pages cite it by name.

Lakeflow Declarative Pipeline

The current canonical name for the Databricks framework that LHP generates Python for. LHP code uses the pyspark.pipelines API. This framework was previously called Delta Live Tables (DLT) and Spark Declarative Pipelines (SDP). All three names refer to the same runtime across LHP history.

Load action

An Action with type: load that reads external data into a temporary view. Sub-types: cloudfiles, delta, sql, python, jdbc, kafka, custom_datasource. One Load action per data source.

Local variable

A FlowGroup-scoped value declared under variables: and referenced as %{name}. Local variables resolve first in the substitution order, so their output can contain template parameters, environment tokens, or secret references that later layers resolve.

Materialized view

A Lakeflow Declarative Pipelines target type for full-refresh datasets defined by a SQL query. Generated by Write actions with write_target.type: materialized_view. Supports optional refresh_schedule and inline sql or external sql_path.

Operational metadata

Auto-injected columns (timestamps, source file paths, pipeline run IDs, and similar) added to Write targets. Defined under operational_metadata in lhp.yaml as named columns and presets, then enabled per FlowGroup or Action. See Operational Metadata.

Pipeline

A logical grouping label declared by every FlowGroup as pipeline: <name>. All FlowGroups sharing that name generate Python files into the same output folder and produce one Databricks Lakeflow Declarative Pipeline resource per Pipeline name. A Pipeline is the deployment unit for Asset Bundles. See Architecture.

Preset

A YAML file of default values deep-merged into Actions matched by type. Presets resolve before substitutions and before validation; explicit FlowGroup config wins over preset defaults. Presets may extend other presets via extends:. See Presets Reference.
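
The deep-merge rule, in which explicit FlowGroup config wins over preset defaults, can be sketched as follows. This is an illustration of the general technique, not LHP's merge code; the function name deep_merge and the shape of the example action (a name plus an options mapping) are assumptions.

```python
from copy import deepcopy


def deep_merge(defaults: dict, explicit: dict) -> dict:
    """Return defaults overlaid with explicit; nested dicts merge key by key."""
    merged = deepcopy(defaults)
    for key, value in explicit.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: recurse so sibling defaults survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys: the explicit value wins outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical preset defaults and explicit action config.
preset_defaults = {
    "options": {
        "cloudFiles.format": "json",
        "cloudFiles.inferColumnTypes": "true",
    }
}
action = {"name": "load_orders", "options": {"cloudFiles.format": "csv"}}

merged = deep_merge(preset_defaults, action)
print(merged["options"]["cloudFiles.format"])            # csv (explicit value wins)
print(merged["options"]["cloudFiles.inferColumnTypes"])  # true (default kept)
```

A shallow merge would replace the whole options mapping and silently drop the preset's other defaults, which is the failure a deep merge avoids.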

Quarantine

A data quality mode (mode: quarantine) that splits a stream into passing rows and failing rows, writing failures to a configured dlq_table for replay rather than failing the pipeline. Also called DLQ recycling. See Quarantine (Dead Letter Queue).
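
As a rough picture of the split, the sketch below routes in-memory rows to a passing list or a failing list. It is plain Python rather than the streaming code LHP generates; the rule names, the row shape, and the _failed_rules column are hypothetical.

```python
from typing import Callable

Row = dict


def split_quarantine(
    rows: list[Row], rules: dict[str, Callable[[Row], bool]]
) -> tuple[list[Row], list[Row]]:
    """Split rows into (passing, failing); failing rows record which rules they broke."""
    passing, failing = [], []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            # Hypothetical audit column so failures can be inspected and replayed.
            failing.append({**row, "_failed_rules": failed})
        else:
            passing.append(row)
    return passing, failing


rules = {
    "id_not_null": lambda r: r["id"] is not None,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}
rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]

passing, failing = split_quarantine(rows, rules)
print(len(passing), len(failing))  # 1 2
```

The point of the pattern is that bad rows are retained alongside the reason they failed instead of aborting the run.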

SCD

Slowly Changing Dimension — a dimensional modelling pattern for tracking historical attribute changes. LHP supports SCD Type 1 (overwrite) and Type 2 (history rows with effective dates) through streaming table CDC configuration.
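
The difference between the two types can be shown with a small in-memory sketch. It illustrates the modelling pattern only and does not reflect how Lakeflow applies CDC; the field names key, attrs, start_date, and end_date are hypothetical.

```python
from datetime import date


def apply_scd1(dim: dict, key, attrs: dict) -> None:
    """Type 1: overwrite the current attributes; no history is kept."""
    dim[key] = attrs


def apply_scd2(history: list, key, attrs: dict, effective: date) -> None:
    """Type 2: close the open row for key, then append a new open row."""
    for row in history:
        if row["key"] == key and row["end_date"] is None:
            if row["attrs"] == attrs:
                return  # Nothing changed, so keep the open row as is.
            row["end_date"] = effective
    history.append(
        {"key": key, "attrs": attrs, "start_date": effective, "end_date": None}
    )


dim = {}
apply_scd1(dim, 1, {"city": "Oslo"})
apply_scd1(dim, 1, {"city": "Bergen"})
print(dim)  # {1: {'city': 'Bergen'}}

history = []
apply_scd2(history, 1, {"city": "Oslo"}, date(2024, 1, 1))
apply_scd2(history, 1, {"city": "Bergen"}, date(2024, 6, 1))
print(len(history))  # 2
```

After the two Type 2 updates, the Oslo row carries an end date of 2024-06-01 and the Bergen row stays open, so both the current and the historical value remain queryable.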

Schema hints

A comma-separated DDL string passed to Auto Loader as the cloudFiles.schemaHints option, pinning column types during schema inference. LHP can load schema hints from an external file referenced by path; the schema transform action also generates schema hints from structured YAML.
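
For orientation, the sketch below builds a hints string from a column-to-type mapping, joining the column definitions with commas as Auto Loader's DDL format expects. The helper build_schema_hints and the column names are hypothetical, and the way LHP converts structured YAML into hints may differ.

```python
def build_schema_hints(columns: dict[str, str]) -> str:
    """Join "name TYPE" pairs into one DDL string for cloudFiles.schemaHints."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns.items())


# Hypothetical columns whose types should be pinned rather than inferred.
hints = build_schema_hints(
    {
        "order_id": "BIGINT",
        "amount": "DECIMAL(18,2)",
        "tags": "MAP<STRING,STRING>",
    }
)
print(hints)  # order_id BIGINT, amount DECIMAL(18,2), tags MAP<STRING,STRING>

# In a Spark session the string would be passed as a reader option, for example:
#   spark.readStream.format("cloudFiles").option("cloudFiles.schemaHints", hints)
```

Columns not named in the hints are still inferred; the hints only override the types of the columns they list.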

SDP

Spark Declarative Pipelines — a transitional name for Lakeflow Declarative Pipelines used between the DLT and Lakeflow renames. LHP internal modules and generated imports reflect this lineage (pyspark.pipelines).

Sink

A Write action with write_target.type: sink that emits data to an external system rather than a Unity Catalog table. Supported sink types are delta, kafka, custom, and foreachbatch.

Skill

The LHP Claude Code skill installed by lhp skill install. The skill package ships inside LHP and is copied to .claude/skills/lhp/ (or ~/.claude/skills/lhp/ with --user); it provides agent-targeted context for authoring LHP configurations.

Snapshot CDC

A streaming table mode where LHP invokes a user-supplied source_function to produce point-in-time snapshots, then applies them via Lakeflow’s AUTO CDC FROM SNAPSHOT API. Configured via write_target.mode: snapshot_cdc and snapshot_cdc_config.

Streaming table

A Lakeflow Declarative Pipelines target type for incremental, append-only or CDC datasets. Generated by Write actions with write_target.type: streaming_table. Supports standard, CDC, and snapshot CDC modes plus append flows from multiple sources.

Substitution

Token replacement in YAML resolved by LHP before code generation. Four syntaxes resolve in a fixed order: %{local_var} then {{ template_param }} then ${env_token} then ${secret:scope/key}. The order is enforced inside FlowgroupProcessor so each layer’s output can feed the next. See Substitutions & Secrets.
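
Why the order matters can be seen in a minimal sketch that applies the four layers in sequence. This is not the FlowgroupProcessor code: the regular expressions are simplified, the function resolve and its arguments are hypothetical, and the secret layer only emits a placeholder string because how LHP renders secret references is not described here.

```python
import re


def resolve(text: str, local_vars: dict, template_params: dict, env_tokens: dict) -> str:
    """Apply the four substitution layers in their fixed order."""
    # 1. %{local_var}: its value may itself contain tokens for later layers.
    text = re.sub(r"%\{(\w+)\}", lambda m: local_vars[m.group(1)], text)
    # 2. {{ template_param }}
    text = re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(template_params[m.group(1)]), text)
    # 3. ${env_token}: \w+ cannot match the colon in a secret reference.
    text = re.sub(r"\$\{(\w+)\}", lambda m: env_tokens[m.group(1)], text)
    # 4. ${secret:scope/key}: replaced by a hypothetical placeholder in this sketch.
    text = re.sub(
        r"\$\{secret:([^/}]+)/([^}]+)\}",
        lambda m: f"<secret {m.group(1)}/{m.group(2)}>",
        text,
    )
    return text


# A local variable whose value contains an env token and a template parameter.
result = resolve(
    "%{target}",
    local_vars={"target": "${catalog}.{{ layer }}.orders"},
    template_params={"layer": "silver"},
    env_tokens={"catalog": "dev_catalog"},
)
print(result)  # dev_catalog.silver.orders
```

If the layers ran in the reverse order, the ${catalog} and {{ layer }} tokens introduced by the local variable would appear only after their layers had already run and would be left unresolved.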

Template

A YAML file of parametrised actions applied to a FlowGroup via use_template: and template_parameters:. LHP renders Jinja2 {{ }} placeholders inside the template and appends the rendered actions to the FlowGroup. See Templates Reference.

Test action

An Action with type: test that asserts a property of the data — row count, uniqueness, referential integrity, completeness, range, schema match, lookup coverage, custom SQL, or custom expectations. Test actions only run when lhp generate is invoked with --include-tests.

Transform action

An Action with type: transform that reshapes or checks data already loaded into a view. Sub-types: sql, python, data_quality, schema, temp_table. Zero or many per FlowGroup.

Write action

An Action with type: write that persists the final dataset. Sub-types: streaming_table, materialized_view, sink. One Write action per output table or sink.

See also

  • Architecture — explanation of how these terms compose.

  • Overview — task-oriented how-to landing page.