Glossary¶
Definitions of Lakehouse Plumber (LHP) terms used throughout this documentation.
Each entry is the canonical wording; other pages link here with :term: roles.
- Action¶
A single step inside a FlowGroup. Actions have four top-level type values (load, transform, write, test), each with sub-types selected by an additional discriminator field. The full catalogue is in Actions Reference.
- Append flow¶
A Lakeflow Declarative Pipelines construct that appends rows from a source into a streaming table. LHP emits one append flow per source when a Write action lists multiple sources, and one combined flow when a single source feeds the target.
- Blueprint¶
A higher-order template that instantiates multiple FlowGroups at once. Where a template parameterises actions inside one FlowGroup, a blueprint parameterises the FlowGroups themselves. Blueprints declare parameters and a flowgroups array of FlowGroup specs. See Blueprints.
- Blueprint instance¶
A YAML file that supplies parameter values to a Blueprint via use_blueprint: plus a nested parameters: block. The legacy form (blueprint: plus flat top-level keys) is deprecated and removed in V0.9; mixing the two forms raises LHP-CFG-061.
- CDC¶
Change Data Capture: the practice of replicating row-level inserts, updates, and deletes from a source system into a target table. LHP exposes CDC through streaming table Write actions with mode: cdc or mode: snapshot_cdc.
- DAB¶
Databricks Asset Bundle: the deployment format LHP targets. When a project is initialised with lhp init --bundle, generation also emits resource YAML under resources/lhp/ so databricks bundle deploy can deploy the pipelines.
- DLT¶
Delta Live Tables — the historical name for what is now called Lakeflow Declarative Pipelines. The term survives in older documentation, generated Python comments, and migration notes; new content uses Lakeflow Declarative Pipelines.
- Environment token¶
A substitution of the form ${name} resolved against substitutions/<env>.yaml. The same FlowGroup YAML generates different Python files per environment because environment tokens differ between dev, staging, and prod. The bare {name} form is deprecated.
- Event log monitoring¶
A project-level feature driven by monitoring: in lhp.yaml. It generates two artefacts: a notebook that unions pipeline event logs into a Delta table, and a Lakeflow pipeline of materialized views reading that table. A Databricks job chains the two.
- Expectation¶
A data quality rule applied by a data_quality transform action. Expectations are loaded from an expectations_file and translated into Lakeflow @dlt.expect-family decorators. Combined with quarantine mode they support DLQ recycling.
- FlowGroup¶
A logical slice of a Pipeline (typically one source table or business entity) composed of an ordered list of Actions. One YAML file can hold one or many FlowGroups; each declares its parent Pipeline, its own name, optional job_name, optional local variables, applied presets, and an optional template reference.
- FlowgroupProcessor¶
The internal service in core/services/flowgroup_processor.py that runs each FlowGroup through the substitution layer cake, preset merge, template expansion, and Pydantic validation. It is listed here because error messages and several documentation pages cite it by name.
- Lakeflow Declarative Pipeline¶
The current canonical name for the Databricks framework that LHP generates Python for. LHP code uses the pyspark.pipelines API. This framework was previously called Delta Live Tables (DLT) and then Spark Declarative Pipelines (SDP). All three names refer to the same runtime across LHP history.
- Load action¶
An Action with type: load that reads external data into a temporary view. Sub-types: cloudfiles, delta, sql, python, jdbc, kafka, custom_datasource. One Load action per data source.
- Local variable¶
A FlowGroup-scoped value declared under variables: and referenced as %{name}. Local variables resolve first in the substitution order, so their output can contain template parameters, environment tokens, or secret references that later layers resolve.
- Materialized view¶
A Lakeflow Declarative Pipelines target type for full-refresh datasets defined by a SQL query. Generated by Write actions with write_target.type: materialized_view. Supports optional refresh_schedule and inline sql or external sql_path.
- Operational metadata¶
Auto-injected columns (timestamps, source file paths, pipeline run IDs, and similar) added to Write targets. Defined under operational_metadata in lhp.yaml as named columns and presets, then enabled per FlowGroup or Action. See Operational Metadata.
- Pipeline¶
A logical grouping label declared by every FlowGroup as pipeline: <name>. All FlowGroups sharing that name generate Python files into the same output folder and produce one Databricks Lakeflow Declarative Pipeline resource per Pipeline name. A Pipeline is the deployment unit for Asset Bundles. See Architecture.
- Preset¶
A YAML file of default values deep-merged into Actions matched by type. Presets resolve before substitutions and before validation; explicit FlowGroup config wins over preset defaults. Presets may extend other presets via extends:. See Presets Reference.
- Quarantine¶
A data quality mode (mode: quarantine) that splits a stream into passing rows and failing rows, writing failures to a configured dlq_table for replay rather than failing the pipeline. Also called DLQ recycling. See Quarantine (Dead Letter Queue).
- SCD¶
Slowly Changing Dimension — a dimensional modelling pattern for tracking historical attribute changes. LHP supports SCD Type 1 (overwrite) and Type 2 (history rows with effective dates) through streaming table CDC configuration.
- Schema hints¶
A comma-separated DDL string passed to Auto Loader as the cloudFiles.schemaHints option, pinning column types during schema inference. LHP can load schema hints from an external file referenced by path; the schema transform action also generates schema hints from structured YAML.
- SDP¶
Spark Declarative Pipelines: a transitional name for Lakeflow Declarative Pipelines used between the DLT and Lakeflow renames. LHP internal modules and generated imports reflect this lineage (pyspark.pipelines).
- Sink¶
A Write action with write_target.type: sink that emits data to an external system rather than a Unity Catalog table. Supported sink types are delta, kafka, custom, and foreachbatch.
- Skill¶
The LHP Claude Code skill installed by lhp skill install. The skill package ships inside LHP and is copied to .claude/skills/lhp/ (or ~/.claude/skills/lhp/ with --user); it provides agent-targeted context for authoring LHP configurations.
- Snapshot CDC¶
A streaming table mode where LHP invokes a user-supplied source_function to produce point-in-time snapshots, then applies them via Lakeflow’s AUTO CDC FROM SNAPSHOT API. Configured via write_target.mode: snapshot_cdc and snapshot_cdc_config.
- Streaming table¶
A Lakeflow Declarative Pipelines target type for incremental, append-only or CDC datasets. Generated by Write actions with write_target.type: streaming_table. Supports standard, CDC, and snapshot CDC modes plus append flows from multiple sources.
- Substitution¶
Token replacement in YAML resolved by LHP before code generation. Four syntaxes resolve in a fixed order: %{local_var}, then {{ template_param }}, then ${env_token}, then ${secret:scope/key}. The order is enforced inside FlowgroupProcessor so each layer’s output can feed the next. See Substitutions & Secrets.
- Template¶
A YAML file of parameterised actions applied to a FlowGroup via use_template: and template_parameters:. LHP renders Jinja2 {{ }} placeholders inside the template and appends the rendered actions to the FlowGroup. See Templates Reference.
- Test action¶
An Action with type: test that asserts a property of the data: row count, uniqueness, referential integrity, completeness, range, schema match, lookup coverage, custom SQL, or custom expectations. Test actions only run when lhp generate is invoked with --include-tests.
- Transform action¶
An Action with type: transform that reshapes or checks data already loaded into a view. Sub-types: sql, python, data_quality, schema, temp_table. Zero or many per FlowGroup.
- Write action¶
An Action with type: write that persists the final dataset. Sub-types: streaming_table, materialized_view, sink. One Write action per output table or sink.
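The fixed resolution order described under Substitution and Local variable can be illustrated with a small Python sketch. This is not LHP's implementation: the regular expressions, the resolve function, and the placeholder rendering of secrets are all hypothetical, chosen only to show why resolving local variables first lets their values carry tokens that later layers fill in.

```python
import re

# Hypothetical patterns for the four token syntaxes named in the glossary.
LOCAL_VAR = re.compile(r"%\{(\w+)\}")                      # %{local_var}
TEMPLATE_PARAM = re.compile(r"\{\{\s*(\w+)\s*\}\}")        # {{ template_param }}
ENV_TOKEN = re.compile(r"\$\{(\w+)\}")                     # ${env_token}
SECRET_REF = re.compile(r"\$\{secret:([\w-]+)/([\w-]+)\}")  # ${secret:scope/key}


def resolve(text, local_vars, template_params, env_tokens):
    """Apply the four layers in order; each layer's output feeds the next."""
    text = LOCAL_VAR.sub(lambda m: local_vars[m.group(1)], text)
    text = TEMPLATE_PARAM.sub(lambda m: template_params[m.group(1)], text)
    # ENV_TOKEN cannot match ${secret:...} because ':' is not a word character.
    text = ENV_TOKEN.sub(lambda m: env_tokens[m.group(1)], text)
    # Illustrative placeholder only; how LHP renders secrets is not shown here.
    text = SECRET_REF.sub(lambda m: f"<secret {m.group(1)}/{m.group(2)}>", text)
    return text


# A local variable whose value still contains a template parameter and an
# environment token; both are filled in by the later layers.
result = resolve(
    "%{target}",
    local_vars={"target": "{{ layer }}_${catalog}.orders"},
    template_params={"layer": "bronze"},
    env_tokens={"catalog": "dev"},
)
print(result)
```

Reversing the order in this sketch (environment tokens before local variables, say) would leave ${catalog} unresolved in the output, which is the situation the fixed order avoids.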
See also¶
Architecture — explanation of how these terms compose.
Overview — task-oriented how-to landing page.