Requirements¶
Prerequisites to install and use Lakehouse Plumber (LHP). Facts here mirror
pyproject.toml for LHP 0.8.6.
System requirements¶
Python¶
LHP requires Python 3.11 or later. Tested against:
| Python version | Status |
|---|---|
| 3.11 | Supported |
| 3.12 | Recommended |
| 3.13 | Supported |
pip install lakehouse-plumber on Python 3.10 or earlier fails the
requires-python check.
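The version floor can be checked before attempting an install. The following is a minimal sketch using only the standard library; the name meets_python_floor is illustrative and not part of LHP.

```python
import sys

# Minimum interpreter version stated above (mirrors requires-python).
MIN_PYTHON = (3, 11)


def meets_python_floor(version=None):
    """Return True when the (major, minor) pair satisfies the 3.11 floor.

    With no argument, the running interpreter is checked.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor >= MIN_PYTHON
```

Calling meets_python_floor() at the top of a bootstrap script gives a clearer failure than waiting for pip to reject the package.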
Operating system¶
LHP runs on Linux, macOS, and Windows. CI exercises Linux and macOS on Python 3.12, plus import smoke tests across Python 3.11, 3.12, and 3.13 on all three platforms.
Runtime dependencies¶
pip install lakehouse-plumber pulls in the libraries declared in
pyproject.toml (minimum versions, except for the exact pin on black):
click >= 8.3.0: CLI framework.
pyyaml >= 6.0.3, ruamel.yaml >= 0.19.0: YAML parsing.
jinja2 >= 3.0.0: template rendering.
pydantic >= 2.12.0: configuration validation.
jsonschema >= 4.26.0: schema validation.
rich >= 14.0.0: formatted CLI output.
networkx >= 3.6.0: dependency graph for lhp deps.
packaging >= 23.2: version parsing for required_lhp_version.
black == 26.3.1: formats generated Python before write.
Pinning required_lhp_version in lhp.yaml (see How to Set Up CI/CD for an LHP Project) locks this
set transitively.
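To confirm that an existing environment meets these floors, the installed versions can be compared against the list. The sketch below is not an LHP feature: the helper names are hypothetical, and the hand-rolled parser ignores pre-release ordering (packaging.version.Version would be more accurate where packaging is available).

```python
from importlib import metadata

# Lower bounds copied from the dependency list; black is an exact pin and is
# deliberately left out of this floor check.
MINIMUMS = {
    "click": "8.3.0",
    "pyyaml": "6.0.3",
    "ruamel.yaml": "0.19.0",
    "jinja2": "3.0.0",
    "pydantic": "2.12.0",
    "jsonschema": "4.26.0",
    "rich": "14.0.0",
    "networkx": "3.6.0",
    "packaging": "23.2",
}


def version_tuple(text):
    """Parse leading numeric components: '3.6.0' -> (3, 6, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(found, floor):
    """Compare two version strings after zero-padding to equal length."""
    a, b = version_tuple(found), version_tuple(floor)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(name):
    """Return the installed version string, or None when the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def unmet_minimums(minimums=MINIMUMS, lookup=installed_version):
    """Map each package that is missing or below its floor to what was found."""
    problems = {}
    for name, floor in minimums.items():
        found = lookup(name)
        if found is None or not at_least(found, floor):
            problems[name] = found
    return problems
```

An empty dictionary from unmet_minimums() means every listed floor is satisfied; the lookup parameter exists only so the check can be exercised without touching the real environment.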
Databricks workspace requirements¶
LHP generates code, not infrastructure. The target workspace needs:
Unity Catalog enabled. Generated pipelines write three-part names (catalog.schema.table). Hive metastore-only workspaces are unsupported.
Write access to one catalog and schema. Their names populate substitutions/<env>.yaml as ${catalog} and ${bronze_schema}.
Lakeflow Declarative Pipelines available. Serverless is the default for bundles produced by lhp init; set serverless: false in pipeline_config.yaml for classic compute.
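To illustrate how the ${catalog} and ${bronze_schema} tokens combine into a three-part name, the sketch below uses Python's string.Template as a stand-in. It is not LHP's substitution engine, and the values and the table name orders are hypothetical.

```python
from string import Template

# Hypothetical values standing in for the contents of substitutions/dev.yaml.
dev_substitutions = {"catalog": "dev_catalog", "bronze_schema": "bronze"}

# A three-part Unity Catalog name written with the two tokens named above.
# The table name "orders" is illustrative.
target = Template("${catalog}.${bronze_schema}.orders")

# substitute() raises KeyError if a token has no value, which surfaces a
# missing catalog or schema early instead of producing a partial name.
resolved = target.substitute(dev_substitutions)
```

Here resolved holds dev_catalog.bronze.orders, the catalog.schema.table form that Unity Catalog expects.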
LHP does not pin a Databricks Runtime version. Generated code imports the
pipeline API with from pyspark import pipelines as dp, so the pipeline must
run on a Lakeflow Declarative Pipelines release that ships the pipelines
module. In practice, stay on your workspace defaults.
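Whether a given Python environment exposes the pipelines module can be probed without running a pipeline. This is a rough sketch with a hypothetical helper name; it is only meaningful when run inside the environment that executes the pipeline, since a local machine may have a different pyspark or none at all.

```python
from importlib import util


def module_available(dotted_name):
    """Return True when the named module can be located.

    find_spec imports parent packages to resolve a dotted name, so a missing
    parent raises ModuleNotFoundError; that case is treated as "not available".
    """
    try:
        return util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


# The module that the generated import relies on.
has_pipelines = module_available("pyspark.pipelines")
```

A False result suggests that the generated import would fail in that environment.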
Some sovereign-cloud workspaces (GovCloud, China) do not provision the
samples catalog used by Quickstart. Use your own landing volume
instead.
Editor requirements¶
LHP ships JSON schemas for every YAML file it consumes. Editor support is
optional but recommended — it surfaces invalid YAML before lhp validate
runs.
| Editor | Status |
|---|---|
| VS Code | Supported; |
| Cursor | Supported (VS Code-compatible). |
| JetBrains IDEs | Manual setup; map YAML schemas to |
| Other editors | Any editor with JSON Schema support over YAML works. |
For VS Code and Cursor, install the YAML extension by Red Hat. lhp init
generates .vscode/settings.json and .vscode/schemas/ — IntelliSense,
autocomplete, and hover docs work without further setup. See
Editor setup for manual configuration.
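If IntelliSense does not appear, a first step is to confirm that the files lhp init is described as generating are present. The sketch below only checks for their existence; the helper name is hypothetical and nothing is assumed about the files' contents.

```python
from pathlib import Path

# Paths that lhp init is described as generating, relative to the project root.
EXPECTED = (".vscode/settings.json", ".vscode/schemas")


def editor_files_present(project_root="."):
    """Report, per expected path, whether it exists under project_root."""
    root = Path(project_root)
    return {rel: (root / rel).exists() for rel in EXPECTED}
```

A False entry points to the manual configuration route described in Editor setup.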
Optional dependencies¶
Asset Bundle deployment¶
Deploying generated bundles requires the Databricks CLI on the machine
running databricks bundle deploy:
CLI version 0.205 or later (Asset Bundle support).
Authentication via OAuth, a personal access token, or a service principal. The databricks/setup-cli GitHub Action installs the CLI in CI runners.
LHP does not invoke the Databricks CLI itself — lhp generate produces
bundle YAML; you call databricks bundle deploy separately. See
Configure Bundles.
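Because the deploy step is separate, a pre-deploy check on the CLI version can catch an outdated install early. The sketch below assumes the version text contains a dotted version number; the sample strings in the comments are illustrative, and the helper name is hypothetical.

```python
import re

# Minimum Databricks CLI version with Asset Bundle support, per the text above.
MIN_CLI = (0, 205)


def cli_supports_bundles(version_output):
    """Find the first dotted version in the text and compare it to the floor.

    Example inputs: "Databricks CLI v0.218.0" or "Version 0.18.0".
    Returns False when no version number can be found.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= MIN_CLI
```

In a deploy script, the input would typically be the captured output of databricks --version, for example via subprocess.run with capture_output=True and text=True.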
CI/CD¶
For automated deployment from a CI runner:
A Databricks service principal with deploy permissions on each target workspace.
OIDC trust (recommended) or a stored client secret.
Python 3.11+ in the runner image. Examples in How to Set Up CI/CD for an LHP Project use Python 3.12 on
ubuntu-latest.
Development¶
To develop LHP itself, install the dev extra
(pip install -e ".[dev]"). It pulls in pytest, pytest-cov,
pytest-mock, flake8, isort, mypy, and pre-commit, plus
security tooling (pip-audit, bandit, liccheck).
See also¶
Quickstart — Build a first pipeline against a workspace that meets these requirements.
Configure Bundles — Enable Asset Bundle integration.
Editor setup — Configure VS Code, Cursor, or JetBrains IDEs for YAML IntelliSense.
How to Set Up CI/CD for an LHP Project — Promote bundles across
dev, uat, and prod.