Environments
For the step-by-step procedure for writing substitution files, see Substitutions & Secrets. This page explains why LHP separates environment concerns the way it does and what that buys you.
Why environment-agnostic configs matter
LHP’s central environment-promotion claim is that the same YAML configs deploy to dev, staging, and prod. Only the substitution file changes. The generated Python is environment-specific because the substituted values differ, but the YAML source you commit is identical.
The reason this matters is that environment drift is one of the most common sources of production bugs. A pipeline works in dev because the dev catalog has a slightly different schema, or the dev table happens to have a column that prod does not, or a flag was flipped manually in prod six months ago. When the YAML is identical across environments, any configuration difference between them can only come from a substitution token, and the substitution file is a small, reviewable YAML document.
The other shape of environment bug is the reverse: a change works in prod and fails in dev. With environment-agnostic configs, you can run lhp generate --env dev and lhp generate --env prod side by side and diff the outputs. Every difference traces back to a substitution value, so every difference is intentional.
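A minimal Python sketch of that comparison follows. It assumes you have kept each environment's generated code in its own directory. How you arrange that, and what the directories are called, is up to you; nothing here is an LHP option. The function walks both trees and returns a unified diff for every file that differs. A file that exists on only one side is diffed against an empty file.

```python
import difflib
from pathlib import Path


def diff_generated(dev_dir: str, prod_dir: str) -> list[str]:
    """Return unified-diff lines for every file that differs between two output trees."""
    dev_root, prod_root = Path(dev_dir), Path(prod_dir)
    # Union of relative paths from both trees, so files present on one side only are reported.
    rel_paths = sorted(
        {p.relative_to(dev_root) for p in dev_root.rglob("*") if p.is_file()}
        | {p.relative_to(prod_root) for p in prod_root.rglob("*") if p.is_file()}
    )
    out: list[str] = []
    for rel in rel_paths:
        a, b = dev_root / rel, prod_root / rel
        # A missing file is treated as empty, so its whole content shows up as added/removed.
        a_lines = a.read_text().splitlines(keepends=True) if a.exists() else []
        b_lines = b.read_text().splitlines(keepends=True) if b.exists() else []
        out.extend(
            difflib.unified_diff(a_lines, b_lines, fromfile=f"dev/{rel}", tofile=f"prod/{rel}")
        )
    return out
```

If you write the returned lines to standard output, for example with sys.stdout.writelines, you get a patch-style report. Every hunk in it should correspond to a substitution value you chose.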
The four-tier substitution order
LHP resolves substitution syntaxes in a specific order: local variables, then template parameters, then environment tokens, then secret references. The order is not arbitrary — it reflects the lifecycle of each kind of substitution.
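| Layer and syntax | Scope | Resolved when |
|---|---|---|
| Local variable: %{var} | One FlowGroup | YAML parse time, before template expansion |
| Template parameter | One template instance | Template expansion time |
| Environment token: ${TOKEN} | One environment | After template expansion, before code generation |
| Secret reference: ${secret:scope/key} | Runtime | Generated as a dbutils.secrets.get() call; the value is retrieved when the pipeline runs |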
The earliest layer (local variables) has the narrowest scope — one FlowGroup. The latest layer (secrets) has the widest — the value is only retrieved when the pipeline runs in Databricks. Each layer resolves before the next sees the YAML, so a local variable can be used as a template parameter, which can produce text containing an environment token, which can include a secret reference.
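A toy resolver makes the ordering concrete. This Python sketch is not LHP's implementation. It treats every layer as plain text substitution, applied in the documented order. It also uses {{ name }} for template parameters, which is an assumed syntax chosen only for illustration. The last stage never produces a secret value; it only rewrites the reference into a dbutils.secrets.get call.

```python
import re

LOCAL = re.compile(r"%\{(\w+)\}")
TEMPLATE = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # assumed template syntax, illustration only
ENV = re.compile(r"\$\{(\w+)\}")  # does not match ${secret:...} because of the ':'
SECRET = re.compile(r"\$\{secret:([\w-]+)/([\w-]+)\}")


def resolve(text: str, local: dict, params: dict, env: dict) -> str:
    # 1. Local variables: FlowGroup-scoped, resolved first.
    text = LOCAL.sub(lambda m: local[m.group(1)], text)
    # 2. Template parameters: their values may contain environment tokens.
    text = TEMPLATE.sub(lambda m: params[m.group(1)], text)
    # 3. Environment tokens: their values may contain secret references.
    text = ENV.sub(lambda m: env[m.group(1)], text)
    # 4. Secret references: turned into a runtime lookup, never into a value.
    return SECRET.sub(
        lambda m: f'dbutils.secrets.get(scope="{m.group(1)}", key="{m.group(2)}")', text
    )
```

Each stage runs once, in order, so every later stage sees the text an earlier stage produced. That is what lets a template parameter expand into an environment token, which in turn expands into a secret reference. In this sketch a missing name raises KeyError; LHP reports its own unresolved-token error instead.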
The visual difference between ${SCREAMING_SNAKE_CASE} for env tokens and %{lower_snake_case} for local variables is intentional. The two differ in sigil as well as in case, so a reader can tell at a glance which resolution layer applies without remembering syntax details. This pays off in PR review, where you want to spot a mistake just by reading the diff.
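Because the sigils differ, a review script can also read off the layer mechanically. Below is a minimal Python sketch. It assumes only the three syntaxes spelled out on this page and leaves template parameters out.

```python
import re

# Secret references also start with "${", so that alternative is tried first.
PLACEHOLDER = re.compile(r"\$\{secret:[^}]+\}|\$\{[^}]+\}|%\{[^}]+\}")


def classify(text: str) -> list[tuple[str, str]]:
    """Return (placeholder, layer) pairs in the order they appear in text."""
    result = []
    for m in PLACEHOLDER.finditer(text):
        ph = m.group(0)
        if ph.startswith("${secret:"):
            layer = "secret reference"
        elif ph.startswith("${"):
            layer = "environment token"
        else:
            layer = "local variable"
        result.append((ph, layer))
    return result
```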
Local variables are for FlowGroup-scoped repetition
When the same value — usually a table name, a schema, or a path segment — appears multiple times within one FlowGroup, define it as a local variable instead of repeating it:
```yaml
variables:
  entity: orders
  source_schema: raw

actions:
  - name: load_%{entity}
    source:
      type: delta
      catalog: "${BRONZE_CATALOG}"
      database: "%{source_schema}"
      table: "%{entity}"
```
The local variable does not change between environments: orders is orders everywhere. The environment token (${BRONZE_CATALOG}) captures what does change. Confusing the two is a common mistake. People put environment-varying values in local variables, then discover the FlowGroup cannot be promoted across environments because the variables: block hard-codes a dev value.
The rule is: if the value differs between dev and prod, it is an environment token. If it is the same everywhere but repeats inside one FlowGroup, it is a local variable.
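Restated as code, the rule is a two-question check. The Python function below is only that restatement. Its inputs are hypothetical: the value a setting takes in each environment, and how often it appears in one FlowGroup. The third outcome, writing a value inline when it neither varies nor repeats, is an assumption rather than something this page prescribes.

```python
def placement(values_by_env: dict[str, str], uses_in_flowgroup: int) -> str:
    """Decide where a value belongs, following the rule above."""
    if len(set(values_by_env.values())) > 1:
        # Differs between environments: it belongs in the substitution file.
        return "environment token"
    if uses_in_flowgroup > 1:
        # Same everywhere, but repeated inside one FlowGroup.
        return "local variable"
    # Assumption: a value that neither varies nor repeats can stay inline.
    return "literal"
```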
The global section eliminates duplication
Substitution files support a global section whose values apply to
every environment. Environment-specific sections override globals:
```yaml
global:
  catalog_prefix: main
  storage_account: companylake

dev:
  catalog: "${catalog_prefix}_dev"
  landing_path: "abfss://landing@${storage_account}.dfs.core.windows.net/dev"

prod:
  catalog: "${catalog_prefix}_prod"
  landing_path: "abfss://landing@${storage_account}.dfs.core.windows.net/prod"
```
LHP supports recursive token expansion — a token can reference another
token, up to ten iterations. The combination of recursive expansion and
the global section means most substitution files end up small and
diff-friendly. A change to the storage account name affects one line.
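The merge-then-expand behaviour can be modelled in a few lines of Python. This is a sketch, not LHP's code. It assumes environment values override globals key by key. It then re-scans for ${name} references for at most ten passes, mirroring the limit described above. Anything still unresolved after that, whether undefined or cyclic, raises an error.

```python
import re

TOKEN = re.compile(r"\$\{(\w+)\}")  # ${secret:...} contains ':' and is never matched
MAX_ITERATIONS = 10  # mirrors the documented recursion limit


def resolve_tokens(config: dict, env: str) -> dict[str, str]:
    """Merge global and environment values, then expand ${name} references."""
    # Environment-specific values override globals with the same name.
    merged = {**config.get("global", {}), **config.get(env, {})}
    tokens = {name: str(value) for name, value in merged.items()}
    for _ in range(MAX_ITERATIONS):
        changed = False
        for name, value in tokens.items():
            # Unknown names are left in place so they can be reported below.
            expanded = TOKEN.sub(lambda m: tokens.get(m.group(1), m.group(0)), value)
            if expanded != value:
                tokens[name] = expanded
                changed = True
        if not changed:
            break
    unresolved = sorted(name for name, value in tokens.items() if TOKEN.search(value))
    if unresolved:
        raise ValueError(f"unresolved or cyclic tokens: {unresolved}")
    return tokens
```

Fed the substitution file above as a dict, resolve_tokens(config, "prod") yields main_prod for catalog. It expands landing_path using the single global storage_account value.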
A standard medallion token set keeps the substitution files predictable across projects:
```yaml
global:
  bronze_catalog: "${catalog_prefix}_bronze"
  silver_catalog: "${catalog_prefix}_silver"
  gold_catalog: "${catalog_prefix}_gold"
  landing_path_base: "abfss://landing@${storage_account}.dfs.core.windows.net"
```
Using the same token names across projects means the same FlowGroup template works in different repos.
Why secret literals stay out of substitution files
Substitution files are committed to version control. A secret literal in one of them is a leak — even if the file is committed to a private repo, every developer who clones the repo can read the secret, and nothing prevents the value from showing up in a backup, a Slack paste, or a deleted-but-recoverable branch.
LHP’s ${secret:scope/key} syntax solves this. The substitution file
contains a reference, not a value. LHP converts the reference into a
dbutils.secrets.get(scope="scope", key="key") call in the generated
Python. The actual secret is stored in a Databricks secret scope and
retrieved at pipeline runtime. The version-controlled artifact contains
only the indirection.
Warning
Any secret literal in a substitution file is leaked the moment the
file is committed. There is no “private substitution file” — they all
ship to version control as part of the project. Always use
${secret:scope/key}.
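The rule can be enforced mechanically before a commit lands. Below is a minimal Python sketch. It assumes the substitution file has already been loaded into a nested dict, for example with a YAML parser. It flags any string value under a secret-looking key that is not a ${secret:scope/key} reference. The key-name list SUSPICIOUS is a hypothetical heuristic, and lists are not traversed.

```python
import re

SECRET_REF = re.compile(r"^\$\{secret:[\w-]+/[\w-]+\}$")
SUSPICIOUS = ("password", "secret", "token", "api_key")  # hypothetical key-name heuristic


def find_secret_literals(node, path: str = "") -> list[str]:
    """Return dotted paths of values that look like committed secret literals."""
    hits: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_secret_literals(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, str):
        leaf = path.rsplit(".", 1)[-1].lower()
        if any(word in leaf for word in SUSPICIOUS) and not SECRET_REF.match(node):
            hits.append(path)
    return hits
```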
The substitution file can declare a default scope and named scope aliases so references stay readable:
```yaml
secrets:
  default_scope: prod-secrets
  scopes:
    data-vault: data-vault-prod

jdbc_password: "${secret:default/db_password}"
vault_token: "${secret:data-vault/api_token}"
```
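The following Python sketch shows how a reference might be turned into the generated call. It is illustrative only, not LHP's code generator. It assumes three rules. The alias default maps to default_scope. A name listed under scopes maps to its configured scope. Any other name is used as a literal scope name.

```python
import re

SECRET_REF = re.compile(r"\$\{secret:([\w-]+)/([\w-]+)\}")


def render_secret_refs(text: str, secrets_cfg: dict) -> str:
    """Replace each ${secret:alias/key} with a dbutils.secrets.get(...) call."""

    def to_call(m: re.Match) -> str:
        alias, key = m.group(1), m.group(2)
        if alias == "default":
            scope = secrets_cfg["default_scope"]
        else:
            # Assumption: names not listed under scopes are used as literal scope names.
            scope = secrets_cfg.get("scopes", {}).get(alias, alias)
        return f'dbutils.secrets.get(scope="{scope}", key="{key}")'

    return SECRET_REF.sub(to_call, text)
```

Only the scope and key names reach the generated Python. The secret itself is fetched by dbutils when the pipeline runs.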
Per-environment overrides for behaviour, not just values
Substitution tokens cover most environment differences — catalog names, schemas, storage paths, alert email addresses. The cases that need different behaviour per environment (different DQE expectations in dev versus prod, for example) usually have to be modelled through presets or template parameters that take a token as input.
The reason is that LHP keeps substitution resolution textual: it
replaces tokens with values. It does not flip flags or skip actions.
If a pipeline needs to drop bad rows in prod and only warn in dev, the
failureAction in the expectations file must come from a token, and
the substitution file picks the value per environment.
```yaml
# expectations file
- name: valid_order_id_not_null
  constraint: "order_id IS NOT NULL"
  failureAction: "${BRONZE_DQE_ACTION}"  # warn in dev, drop in prod
```
This is the canonical pattern: parameterise behaviour the same way you parameterise values. The substitution file then captures the whole environment difference in one place.
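On the generated side, the resolved failureAction value most plausibly selects one of the Delta Live Tables Python expectation decorators: dlt.expect (records violations and keeps the rows), dlt.expect_or_drop, or dlt.expect_or_fail. The Python mapping below is an assumption about how that translation works, and render_expectation is a hypothetical helper. Check the generated code in your own project.

```python
# Assumed mapping from a resolved failureAction value to a DLT decorator name.
DECORATOR_FOR_ACTION = {
    "warn": "dlt.expect",
    "drop": "dlt.expect_or_drop",
    "fail": "dlt.expect_or_fail",
}


def render_expectation(name: str, constraint: str, action: str) -> str:
    """Render one expectation as decorator source text (hypothetical helper)."""
    try:
        decorator = DECORATOR_FOR_ACTION[action]
    except KeyError:
        raise ValueError(f"unknown failureAction {action!r} for {name}") from None
    return f'@{decorator}("{name}", "{constraint}")'
```

With BRONZE_DQE_ACTION set to warn in dev and drop in prod, the same expectations file would render as @dlt.expect in one environment and @dlt.expect_or_drop in the other.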
Auditing the available tokens
Before writing FlowGroups in a fresh project, run lhp substitutions --env dev to dump the resolved token set. The command prints every token visible to --env dev, including inherited globals and recursively expanded values. A frequent class of bug, an unresolved-token error at generation time, comes from a typo or a missing token. lhp substitutions surfaces both before you write the FlowGroup that uses them.
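A similar check can be approximated offline. Below is a minimal Python sketch. It assumes you already have the set of resolved token names and the raw FlowGroup text. It lists ${...} tokens that are not defined and uses difflib.get_close_matches to suggest the likely intended name. Name matching here is case-sensitive, which is an assumption about how tokens are compared.

```python
import difflib
import re

TOKEN = re.compile(r"\$\{(\w+)\}")  # secret references contain ':' and are skipped


def unresolved_tokens(flowgroup_text: str, known: set[str]) -> dict[str, list[str]]:
    """Map each undefined token to up to three close matches among the known names."""
    missing = {name for name in TOKEN.findall(flowgroup_text) if name not in known}
    return {
        name: difflib.get_close_matches(name, sorted(known), n=3)
        for name in sorted(missing)
    }
```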
Anti-patterns
- Secrets in substitution files. They leak to version control. Use ${secret:scope/key} syntax instead.
- Hardcoded catalog or schema names in YAML. They break environment promotion, and the whole point of the substitution layer is to push these out of FlowGroup YAML. Use ${BRONZE_CATALOG}.${schema}.${table}.
- Local variables for environment-varying values. %{var} is FlowGroup-scoped and resolved at parse time, before LHP knows which environment is in play. Putting a dev value in a local variable silently produces prod code that points at dev resources.
- Different YAML configs per environment. If you find yourself maintaining orders_dev.yaml and orders_prod.yaml with mostly the same content, the difference belongs in the substitution file, not in duplicate sources.
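The hardcoded-catalog anti-pattern is easy to catch in review tooling. Below is a Python sketch. It assumes FlowGroups have already been parsed into dicts shaped like the local-variable example earlier on this page. It flags actions whose source catalog contains no environment token. The field layout (actions, source, catalog) is taken from that example and may not cover every action type.

```python
import re

ENV_TOKEN = re.compile(r"\$\{(?!secret:)[^}]+\}")


def hardcoded_catalogs(flowgroup: dict) -> list[str]:
    """Return names of actions whose source catalog contains no environment token."""
    flagged = []
    for action in flowgroup.get("actions", []):
        catalog = action.get("source", {}).get("catalog")
        if catalog is not None and not ENV_TOKEN.search(str(catalog)):
            flagged.append(action.get("name", "<unnamed>"))
    return flagged
```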
See also
Substitutions & Secrets for the procedural how-to and full syntax reference.
Configure Bundles for how environment substitutions integrate with Databricks Asset Bundle targets.
Governance for operational-metadata patterns that include pipeline_id and pipeline_run_id for cross-environment audit.