Requirements

Prerequisites to install and use Lakehouse Plumber (LHP). Facts here mirror pyproject.toml for LHP 0.8.6.

System requirements

Python

LHP requires Python 3.11 or later. Tested against:

Python version

Status

3.11

Supported

3.12

Recommended

3.13

Supported

pip install lakehouse-plumber on Python 3.10 or earlier fails the requires-python check.

Operating system

LHP runs on Linux, macOS, and Windows. CI exercises Linux and macOS on Python 3.12, plus import smoke tests across Python 3.11, 3.12, and 3.13 on all three platforms.

Runtime dependencies

pip install lakehouse-plumber pulls in the libraries pinned in pyproject.toml:

  • click >= 8.3.0 — CLI framework.

  • pyyaml >= 6.0.3, ruamel.yaml >= 0.19.0 — YAML parsing.

  • jinja2 >= 3.0.0 — template rendering.

  • pydantic >= 2.12.0 — configuration validation.

  • jsonschema >= 4.26.0 — schema validation.

  • rich >= 14.0.0 — formatted CLI output.

  • networkx >= 3.6.0 — dependency graph for lhp deps.

  • packaging >= 23.2 — version parsing for required_lhp_version.

  • black == 26.3.1 — formats generated Python before write.

Pinning required_lhp_version in lhp.yaml (see How to Set Up CI/CD for an LHP Project) locks this set transitively.

Databricks workspace requirements

LHP generates code, not infrastructure. The target workspace needs:

  • Unity Catalog enabled. Generated pipelines write three-part names (catalog.schema.table). Hive metastore-only workspaces are unsupported.

  • Write access to one catalog and schema. Their names populate substitutions/<env>.yaml as ${catalog} and ${bronze_schema}.

  • Lakeflow Declarative Pipeline available. Serverless is the default for bundles produced by lhp init; set serverless: false in pipeline_config.yaml for classic compute.

LHP does not pin a Databricks Runtime version. The generated from pyspark import pipelines as dp import requires whatever Lakeflow Declarative Pipelines version Databricks ships with the pipelines module — match your workspace defaults.

Some sovereign-cloud workspaces (GovCloud, China) do not provision the samples catalog used by Quickstart. Use your own landing volume instead.

Editor requirements

LHP ships JSON schemas for every YAML file it consumes. Editor support is optional but recommended — it surfaces invalid YAML before lhp validate runs.

Editor

Status

VS Code

Supported; lhp init writes .vscode/settings.json.

Cursor

Supported (VS Code-compatible).

JetBrains IDEs

Manual setup; map YAML schemas to schemas/.

Other editors

Any editor with JSON Schema support over YAML works.

For VS Code and Cursor, install the YAML extension by Red Hat. lhp init generates .vscode/settings.json and .vscode/schemas/ — IntelliSense, autocomplete, and hover docs work without further setup. See Editor setup for manual configuration.

Optional dependencies

Asset Bundle deployment

Deploying generated bundles requires the Databricks CLI on the machine running databricks bundle deploy:

  • CLI version 0.205 or later (Asset Bundle support).

  • Authentication via OAuth, a personal access token, or a service principal. The databricks/setup-cli GitHub Action installs the CLI in CI runners.

LHP does not invoke the Databricks CLI itself — lhp generate produces bundle YAML; you call databricks bundle deploy separately. See Configure Bundles.

CI/CD

For pipelined deployment from a runner:

  • A Databricks service principal with deploy permissions on each target workspace.

  • OIDC trust (recommended) or a stored client secret.

  • Python 3.11+ in the runner image. Examples in How to Set Up CI/CD for an LHP Project use Python 3.12 on ubuntu-latest.

Development

To develop LHP itself, install the dev extra (pip install -e ".[dev]"). Pulls in pytest, pytest-cov, pytest-mock, flake8, isort, mypy, pre-commit, plus security tooling (pip-audit, bandit, liccheck).

See also