Requirements ============ .. meta:: :description: System, workspace, and editor requirements for Lakehouse Plumber (LHP), plus optional dependencies for Asset Bundle deployment and CI/CD. Prerequisites to install and use Lakehouse Plumber (LHP). Facts here mirror ``pyproject.toml`` for LHP ``0.8.6``. System requirements ------------------- Python ~~~~~~ LHP requires **Python 3.11 or later**. Tested against: ============== =========== Python version Status ============== =========== 3.11 Supported 3.12 Recommended 3.13 Supported ============== =========== ``pip install lakehouse-plumber`` on Python 3.10 or earlier fails the ``requires-python`` check. Operating system ~~~~~~~~~~~~~~~~ LHP runs on Linux, macOS, and Windows. CI exercises Linux and macOS on Python 3.12, plus import smoke tests across Python 3.11, 3.12, and 3.13 on all three platforms. Runtime dependencies ~~~~~~~~~~~~~~~~~~~~ ``pip install lakehouse-plumber`` pulls in the libraries pinned in ``pyproject.toml``: * ``click >= 8.3.0`` — CLI framework. * ``pyyaml >= 6.0.3``, ``ruamel.yaml >= 0.19.0`` — YAML parsing. * ``jinja2 >= 3.0.0`` — template rendering. * ``pydantic >= 2.12.0`` — configuration validation. * ``jsonschema >= 4.26.0`` — schema validation. * ``rich >= 14.0.0`` — formatted CLI output. * ``networkx >= 3.6.0`` — dependency graph for ``lhp deps``. * ``packaging >= 23.2`` — version parsing for ``required_lhp_version``. * ``black == 26.3.1`` — formats generated Python before write. Pinning ``required_lhp_version`` in ``lhp.yaml`` (see :doc:`cicd`) locks this set transitively. Databricks workspace requirements --------------------------------- LHP generates code, not infrastructure. The target workspace needs: * **Unity Catalog enabled.** Generated pipelines write three-part names (``catalog.schema.table``). Hive metastore-only workspaces are unsupported. * **Write access to one catalog and schema.** Their names populate ``substitutions/.yaml`` as ``${catalog}`` and ``${bronze_schema}``. * :term:`Lakeflow Declarative Pipeline` **available.** Serverless is the default for bundles produced by ``lhp init``; set ``serverless: false`` in ``pipeline_config.yaml`` for classic compute. LHP does not pin a Databricks Runtime version. The generated ``from pyspark import pipelines as dp`` import requires whatever Lakeflow Declarative Pipelines version Databricks ships with the ``pipelines`` module — match your workspace defaults. Some sovereign-cloud workspaces (GovCloud, China) do not provision the ``samples`` catalog used by :doc:`quickstart`. Use your own landing volume instead. Editor requirements ------------------- LHP ships JSON schemas for every YAML file it consumes. Editor support is optional but recommended — it surfaces invalid YAML before ``lhp validate`` runs. ================ ========================================================== Editor Status ================ ========================================================== VS Code Supported; ``lhp init`` writes ``.vscode/settings.json``. Cursor Supported (VS Code-compatible). JetBrains IDEs Manual setup; map YAML schemas to ``schemas/``. Other editors Any editor with JSON Schema support over YAML works. ================ ========================================================== For VS Code and Cursor, install the **YAML** extension by Red Hat. ``lhp init`` generates ``.vscode/settings.json`` and ``.vscode/schemas/`` — IntelliSense, autocomplete, and hover docs work without further setup. See :doc:`editor_setup` for manual configuration. Optional dependencies --------------------- Asset Bundle deployment ~~~~~~~~~~~~~~~~~~~~~~~ Deploying generated bundles requires the **Databricks CLI** on the machine running ``databricks bundle deploy``: * CLI version 0.205 or later (Asset Bundle support). * Authentication via OAuth, a personal access token, or a service principal. The ``databricks/setup-cli`` GitHub Action installs the CLI in CI runners. LHP does not invoke the Databricks CLI itself — ``lhp generate`` produces bundle YAML; you call ``databricks bundle deploy`` separately. See :doc:`configure_bundles`. CI/CD ~~~~~ For pipelined deployment from a runner: * A Databricks service principal with deploy permissions on each target workspace. * OIDC trust (recommended) or a stored client secret. * Python 3.11+ in the runner image. Examples in :doc:`cicd` use Python 3.12 on ``ubuntu-latest``. Development ~~~~~~~~~~~ To develop LHP itself, install the ``dev`` extra (``pip install -e ".[dev]"``). Pulls in ``pytest``, ``pytest-cov``, ``pytest-mock``, ``flake8``, ``isort``, ``mypy``, ``pre-commit``, plus security tooling (``pip-audit``, ``bandit``, ``liccheck``). See also -------- * :doc:`quickstart` — Build a first pipeline against a workspace that meets these requirements. * :doc:`configure_bundles` — Enable Asset Bundle integration. * :doc:`editor_setup` — Configure VS Code, Cursor, or JetBrains IDEs for YAML IntelliSense. * :doc:`cicd` — Promote bundles across ``dev``, ``uat``, and ``prod``.