Package Pipelines as Wheels¶
This how-to switches a Lakehouse Plumber (LHP) pipeline from emitting loose
.py files to emitting a single deterministic Python wheel. Wheel packaging
collapses a pipeline’s synced workspace footprint to one small runner file plus
one uploaded artifact, which reduces databricks bundle deploy upload time and
the file-write slice of lhp generate on large projects. Packaging is opt-in
per pipeline; the generated pipeline logic is byte-identical to source mode.
Added in version 0.9.0.
When to use wheel packaging¶
Wheel packaging is aimed at large projects. As a rule of thumb, reach for it
once a project generates on the order of 5,000 or more Python files: at that
scale the per-file write phase of lhp generate and the file-by-file upload of
databricks bundle deploy both grow long, and collapsing each pipeline to one
uploaded artifact plus a one-line runner is where that time is recovered. Below
that scale the saving is marginal and source mode is simpler to operate.
LHP recommends keeping dev and test in source mode, even when production runs
on wheels. Source mode writes the generated logic as clear-text .py files you
can open, read, and edit in place in the workspace for a quick iteration — a
workflow that is central to developing with LHP and that a wheel, which must be
rebuilt and reinstalled to change, cannot offer. Because the packaging toggle is
per environment (see Turn on wheel packaging), a pipeline can be source in
dev and wheel in prod with no change to its logic. To read the code a
wheel-mode build actually produced, lhp inspect-wheel lists or extracts its
modules (see Inspect a built wheel) — read-only inspection, not the in-place
edit loop source mode gives you.
Requirements¶
An LHP project with Declarative Automation Bundles enabled (a
databricks.ymlat the project root). See Configure Bundles.A Unity Catalog volume your deploy identity can write to. Serverless compute installs custom wheels only from a Unity Catalog volume or PyPI, so the artifact destination must be a
/Volumes/...path.In wheel mode each flowgroup is packed under a sanitized import package, so a top-level Python
importof one flowgroup module from another will not resolve. Relate flowgroups through the Lakeflow Spark Declarative Pipelines (SDP) dataset graph, not Python imports.
Set the artifact volume¶
Declare the Unity Catalog volume that built wheels upload to in a top-level
wheel block in lhp.yaml. The value is substitution-token-aware: LHP
resolves ${...} tokens from substitutions/<env>.yaml per environment,
exactly as it resolves any other per-environment value, so one lhp.yaml entry
serves every environment.
wheel:
artifact_volume: "/Volumes/${catalog}/${artifact_schema}/bundle_artifacts"
Define the tokens per environment so the resolved path is environment-specific:
prod:
catalog: prod_catalog
artifact_schema: lhp_artifacts
The resolved value must start with /Volumes/. If a pipeline is set to
wheel mode and wheel.artifact_volume is missing, empty, or resolves to a
non-/Volumes/ path, lhp generate fails with LHP-CFG-061. A malformed
wheel block — for example artifact_volume set to a non-string — fails with
LHP-CFG-060.
Turn on wheel packaging¶
Set packaging: wheel for a pipeline. The toggle lives in
pipeline_config-<env>.yaml and resolves with this precedence, lowest to
highest:
the built-in default,
source;project_defaults.packaging— the environment-wide default;packaginginside a per-pipeline block — wins for that pipeline.
Because the toggle lives in the per-environment pipeline config, a pipeline can be
source in one environment and wheel in another. The only accepted values
are source and wheel; any other value fails with LHP-VAL-062.
To package every pipeline in an environment as a wheel, set the default and opt individual pipelines back out:
project_defaults:
catalog: "${catalog}"
schema: "${raw_schema}"
packaging: wheel # environment-wide default
---
pipeline: interactive_debug_pipeline
packaging: source # opt this one pipeline back out
---
pipeline:
- large_ingest_a
- large_ingest_b # inherit project_defaults.packaging: wheel
To package a single pipeline and leave the rest as source, omit the default and
set packaging only on that pipeline:
---
pipeline: large_ingest_a
packaging: wheel
Generate and deploy¶
Run lhp generate for the target environment. Source-mode and wheel-mode
pipelines mix freely in one run; LHP builds each wheel once, in process.
lhp generate --env prod --pipeline-config config/pipeline_config-prod.yaml
databricks bundle validate --target prod
databricks bundle deploy --target prod
On deploy, Declarative Automation Bundles uploads each pre-built wheel to
<artifact_path>/.internal/ under the artifact volume and rewrites the
pipeline’s environment dependency to that uploaded location; it does not trigger a
second Python build.
What LHP generates¶
For a wheel-mode pipeline, lhp generate writes a runner and a wheel, and wires
the bundle:
generated/<env>/<pipeline>/<import_pkg>_runner.pyThe only file synced to the workspace. The runner is the single source file SDP loads; it publishes the ambient
sparkanddbutilsglobals intobuiltinsand then imports the wheel’s flowgroup modules. Its content references only the import-package name, so it is stable across content changes.generated/<env>/_wheels/<pipeline>/dist/lhp_<pipeline>_<env>_<hash>-<lhp-version>-py3-none-any.whlThe built wheel; its filename encodes the pipeline’s identity and content (see How wheels are named and versioned). It is gitignored and excluded from the bundle file sync, and reaches the workspace only when Declarative Automation Bundles uploads it on deploy. It contains the pipeline’s flowgroup package, the
custom_python_functionspackage as a top-level import root, and any generated test or quality and reporting modules.resources/lhp/<pipeline>.pipeline.ymlThe pipeline resource. LHP appends a file-relative local path to the prebuilt wheel as the last entry under
environment.dependencies, preserving any dependencies you declared. The reference is local (../../generated/<env>/_wheels/<pipeline>/dist/...whl), not a/Volumes/...path: Declarative Automation Bundles classifies it as a prebuilt wheel, uploads it to<artifact_path>/.internal/on deploy, and rewrites the reference to the uploaded location itself. LHP does not compose the remote path. This file never contains thepackagingkey — the toggle is consumed by LHP, not by Databricks.resources/lhp/_wheels.bundle.ymlAn LHP-owned bundle file, regenerated on every run. It sets
targets.<env>.workspace.artifact_pathto the resolved artifact volume and adds async.excludefor the_wheels/staging directory. It deliberately carries noartifactsblock: because each wheel is a prebuilt local dependency reference that Declarative Automation Bundles uploads and rewrites, declaring a wheel artifact with no source files would make the deploy try to rebuild the already-built wheel. LHP does not edit yourdatabricks.yml. Because Declarative Automation Bundles resolvesartifact_pathitself at upload time, per-user development-mode isolation works without any further configuration.
How wheels are named and versioned¶
The wheel is deterministic and content-addressed, and its filename encodes the pipeline’s identity — so reading the filename tells you what changed between builds:
lhp_<pipeline>_<env>_<hash>-<lhp-version>-py3-none-any.whl
lhp_A fixed brand prefix that marks the distribution as LHP-generated. It appears in the distribution name only; the package imported inside the wheel keeps a separate, sanitized name derived from the pipeline name.
<pipeline>and<env>The pipeline name and target environment, so the same pipeline produces a distinct wheel per environment.
<hash>A content hash: the first 12 hexadecimal characters of a SHA-256 digest taken over the packaged
.pymembers — each member’s path and its exact bytes. It covers generated code only. An unchanged pipeline therefore hashes to the same value and rebuilds a byte-identical wheel with an identical filename, so its deploy is a no-op; any change to the generated logic changes the hash, and with it the filename.<lhp-version>The version of LHP that built the wheel (for example
0.9.0), normalized to a PEP 440 public version. The wheel has no version of its own — it is stamped with the tool version for traceability. One consequence, an LHP upgrade re-stamping every wheel, is covered under Limitations below.
The same commit regenerates an identical wheel on any machine, so rollback is a
redeploy of a prior commit. lhp validate and dependency analysis operate on
YAML and are unaffected by the packaging mode.
Inspect a built wheel¶
Wheel mode leaves no loose .py files on disk, so to see what code a build
packaged, use lhp inspect-wheel. It reads a built wheel and either lists its
Python modules (the default) or extracts them with --extract; it never
modifies the wheel.
Identify the wheel by pipeline name — which resolves the content-hashed
filename for you and so requires --env — or by an explicit path to the
.whl:
# List the .py modules in a pipeline's built wheel (resolves the hashed name)
lhp inspect-wheel large_ingest_a --env prod
# List by explicit path (the shell glob expands to the single wheel)
lhp inspect-wheel generated/prod/_wheels/large_ingest_a/dist/*.whl
The listing has one row per packaged module — its in-wheel path and uncompressed
size — covering the flowgroup package, the custom_python_functions import
root, and any generated test or quality modules. The .dist-info metadata is
not listed.
To read or diff the generated code in an editor, extract the modules with
--extract. LHP writes only the .py members, preserves their in-wheel
directory structure, and creates the target directory if it is missing:
lhp inspect-wheel large_ingest_a --env prod --extract /tmp/large_ingest_a
Extraction is for inspection only: editing the extracted files does not change
the deployed wheel, which is rebuilt from your YAML on the next lhp generate.
A selector that ends in .whl or contains a path separator is read as a path;
anything else is a pipeline name (--env is ignored in path mode). The command
exits non-zero when the wheel is missing (LHP-IO-022), is not
a wheel file (LHP-IO-023), or is corrupt
(LHP-IO-024); when a pipeline name resolves to zero or several
wheels (LHP-GEN-001); or when a pipeline name is given without --env (a
usage error). See CLI Reference for the full command syntax.
Limitations¶
Wheel packaging in this release has the following boundaries.
The LHP tool version is the wheel version. Every wheel’s metadata
Versionis the LHP tool version that built it. Upgrading LHP re-stamps every wheel filename, which triggers a one-time reinstall of all wheel-mode pipelines on the next deploy even where the pipeline logic is unchanged. This is intended for traceability.Code that reads files relative to
__file__behaves differently. Inside a wheel,__file__resolves into the installed package location, notgenerated/, and only.pyfiles are packaged. A flowgroup or custom helper that loads a sidecar data file (CSV, JSON, SQL) by a__file__-relative path, or that assumes a loose on-disk layout, does not find it in wheel mode. Avoid__file__-relative reads, or keep such a pipeline in source mode.Monitoring pipelines are not wheeled. The centralized event-log monitoring pipeline is always emitted in source mode regardless of the packaging toggle (internal tracking: WHEEL-MONITORING-DEFER). See Enable Monitoring.
Note
The mechanisms behind two of these behaviors are now implemented, with only
live-workspace verification still pending: the bundle honoring the
sync.exclude for the wheels directory (internal tracking:
WHEEL-RUNTIME-O3 — LHP emits the sync.exclude), and deploy uploading the
pre-built wheel without rebuilding (WHEEL-RUNTIME-O4 — the wheel ships as a
prebuilt local dependency reference with no artifacts block, so the deploy
uploads it rather than rebuilding it). Two further behaviors are confirmed
against the design but not yet exercised on a live workspace
(WHEEL-RUNTIME-O2 and WHEEL-RUNTIME-O5): wheel install on classic compute, and
serverless reusing a cached environment when a pipeline’s dependencies are
unchanged. Verify all of these against your own workspace before relying on them
in production.