Lakehouse Plumber

Managing dozens of Lakeflow/DLT pipelines means thousands of lines of repetitive Python — inconsistent patterns, boilerplate sprawl, and painful maintenance across environments.

Lakehouse Plumber turns concise YAML actions into fully-featured Databricks Lakeflow Declarative Pipelines (formerly Delta Live Tables) — without hiding the Databricks platform you already know and love.

How LHP Solves It

  • Eliminates boilerplate — a template + 5-line config replaces 86 lines of Python per table.

  • Zero runtime overhead — pure code generation, not a runtime framework.

  • Transparent output — readable Python files, version-controlled and debuggable in the Databricks IDE.

  • Fits DataOps workflows — CI/CD, automated testing, multi-environment substitutions.

  • No lock-in — the output is plain Python & SQL you own and control.

  • Data democratization — power users create artifacts within platform standards.

Real-World Example

Instead of repeating 86 lines of Python per table, write a 5-line configuration:

customer_ingestion.yaml (5 lines per table)
pipeline: raw_ingestions
flowgroup: customer_ingestion

use_template: csv_ingestion_template
template_parameters:
  table_name: customer
  landing_folder: customer

Result: 4,300 lines of repetitive Python → 250 lines total (1 template + 50 simple configs). See Quickstart for the full template and generated output.
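
To make the template-plus-parameters idea concrete, here is a minimal sketch of code generation by substitution. It uses Python's standard `string.Template` and is not LHP's actual template engine. The template body, the `/landing/...` path, the `_raw` table suffix, and the `render_flowgroup` helper are all hypothetical; they only illustrate how the two parameters from the YAML above could expand into a complete table definition.

```python
from string import Template

# Hypothetical template text, for illustration only.
# A real LHP template is defined in YAML and is considerably richer.
CSV_INGESTION_TEMPLATE = Template('''\
import dlt

@dlt.table(name="${table_name}_raw")
def ${table_name}_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/landing/${landing_folder}")
    )
''')


def render_flowgroup(template_parameters: dict) -> str:
    """Expand the template into Python source text.

    Raises KeyError if a required parameter is missing.
    """
    return CSV_INGESTION_TEMPLATE.substitute(template_parameters)


# The same parameters as in customer_ingestion.yaml above.
generated = render_flowgroup({"table_name": "customer", "landing_folder": "customer"})
print(generated)
```

Because the result is ordinary source text, it can be written to a file, reviewed in a pull request, and diffed like any other code, which is the property the "pure code generation" approach relies on.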

Quick Start

Get started in minutes:

pip install lakehouse-plumber
lhp init my_project
cd my_project

# Edit your YAML flowgroups (IntelliSense auto-configured)
lhp validate --env dev
lhp generate --env dev

# Inspect the generated/ directory — readable Python ready for Databricks

Note

New to LHP? Follow the Quickstart to build your first pipeline in 10 minutes.

Core Workflow

The execution model is deliberately simple:

    graph LR
        A[Load] --> B{0..N Transform}
        B --> C[Write]
    
  1. Load: ingest raw data from CloudFiles, Delta, JDBC, SQL, or custom Python.

  2. Transform: apply zero or many transforms (SQL, Python, schema, data-quality, temp-tables…).

  3. Write: persist results as Streaming Tables, Materialized Views, or Snapshots.
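
The three stages above can be modeled as plain function composition. The sketch below is a conceptual illustration only: it uses in-memory lists of dicts instead of Spark DataFrames, and `run_flow` and the sample functions are hypothetical names that are not part of LHP or Databricks.

```python
from typing import Callable, Iterable

Row = dict
Rows = list[Row]


def run_flow(
    load: Callable[[], Rows],
    transforms: Iterable[Callable[[Rows], Rows]],
    write: Callable[[Rows], None],
) -> None:
    """Run one load, zero or more transforms in order, then one write."""
    rows = load()
    for transform in transforms:
        rows = transform(rows)
    write(rows)


def load_customers() -> Rows:
    # Stand-in for a real source such as CloudFiles or JDBC.
    return [{"id": 1, "name": " Ada "}, {"id": None, "name": "Bob"}]


def drop_missing_ids(rows: Rows) -> Rows:
    # A data-quality style transform: discard rows without an id.
    return [r for r in rows if r["id"] is not None]


def trim_names(rows: Rows) -> Rows:
    # A cleanup transform: strip surrounding whitespace from names.
    return [{**r, "name": r["name"].strip()} for r in rows]


target: Rows = []  # Stand-in for a streaming table or materialized view.
run_flow(load_customers, [drop_missing_ids, trim_names], target.extend)
print(target)  # [{'id': 1, 'name': 'Ada'}]
```

Passing an empty list of transforms is valid and copies the loaded rows straight to the target, which mirrors the "0..N Transform" node in the diagram.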

Where to next

The sidebar groups documentation by purpose, following the Diátaxis framework:

  • Get Started — install LHP, set up your editor, and ship your first pipeline.

  • How-to — task-shaped recipes for common data-engineering problems.

  • Explanation — the why behind LHP’s design and patterns.

  • Reference — exhaustive lookup tables for CLI flags, YAML keys, error codes, and the public API.

If you have a specific problem, jump straight to Overview. If you want to understand the execution model first, read Architecture.