Environments
============

.. meta::
   :description: Reasoning behind LHP environment management: substitutions, the four-tier resolution order, secret references, and per-environment overrides.

For the step-by-step procedure for writing substitution files, see
:doc:`../substitutions`. This page explains why LHP separates environment concerns the
way it does and what that buys you.

Why environment-agnostic configs matter
---------------------------------------

LHP's central environment-promotion claim is that the same YAML configs deploy to dev,
staging, and prod. Only the substitution file changes. The generated Python is
environment-specific because the substituted values differ, but the YAML source you
commit is identical.

The reason this matters is that environment drift is the most common shape of
production bug. A pipeline works in dev because the dev catalog has a slightly
different schema, or the dev table happens to have a column that prod does not, or a
flag was flipped manually in prod six months ago. When the YAML is identical across
environments, those drifts can only come from a substitution token, and the
substitution file is a small, reviewable YAML document.

The other shape of environment bug is the reverse: a change works in prod and fails in
dev. With environment-agnostic configs, you can run ``lhp generate --env dev`` and
``lhp generate --env prod`` side by side and diff the outputs. Any difference is
intentional.

The four-tier substitution order
--------------------------------

LHP resolves :term:`substitution` syntaxes in a specific order:
:term:`local variables`, then template parameters, then :term:`environment tokens`,
then secret references. The order is not arbitrary; it reflects the lifecycle of each
kind of substitution.

.. list-table::
   :header-rows: 1
   :widths: 20 25 55

   * - Syntax
     - Scope
     - Resolved when
   * - ``%{local_var}``
     - One FlowGroup
     - YAML parse time, before template expansion
   * - ``{{ template_param }}``
     - One template instance
     - Template expansion time
   * - ``${env_token}``
     - One environment
     - After template expansion, before code generation
   * - ``${secret:scope/key}``
     - Runtime
     - Generated as ``dbutils.secrets.get(...)`` call

The earliest layer (local variables) has the narrowest scope: one FlowGroup. The latest
layer (secrets) has the widest: the value is only retrieved when the pipeline runs in
Databricks. Each layer resolves before the next sees the YAML, so a local variable can
be used as a template parameter, which can produce text containing an environment
token, which can include a secret reference.

The case difference between ``${SCREAMING_SNAKE_CASE}`` for env tokens and
``%{lower_snake_case}`` for local variables is intentional: a reader can tell at a
glance which resolution layer applies, without remembering syntax details. This pays
off in PR review, where you want to spot a mistake by reading.

Local variables are for FlowGroup-scoped repetition
---------------------------------------------------

When the same value (usually a table name, a schema, or a path segment) appears
multiple times within one FlowGroup, define it as a local variable instead of
repeating it:

.. code-block:: yaml
   :caption: Local variables compress repetition

   variables:
     entity: orders
     source_schema: raw

   actions:
     - name: load_%{entity}
       source:
         type: delta
         catalog: "${BRONZE_CATALOG}"
         database: "%{source_schema}"
         table: "%{entity}"

The local variable does not change between environments: ``orders`` is ``orders``
everywhere. The environment token (``${BRONZE_CATALOG}``) captures what does change.
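
As a minimal sketch of that split, the substitution file backing the FlowGroup above
might define the token once per environment. The key is assumed to be spelled exactly
as the token appears in the FlowGroup, and the catalog values are hypothetical:

.. code-block:: yaml
   :caption: Hypothetical substitution entries for ``${BRONZE_CATALOG}``

   # Assumed values for illustration; only these entries differ per environment.
   dev:
     BRONZE_CATALOG: dev_bronze

   prod:
     BRONZE_CATALOG: prod_bronze

Under these assumptions, the ``variables:`` block and the action definition stay
identical in dev and prod, because nothing in them names an environment.
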
Mixing the two is the common mistake: people put environment-varying values in local
variables, then discover the FlowGroup cannot be promoted across environments because
the ``variables:`` block hard-codes a dev value. The rule is: if the value differs
between dev and prod, it is an environment token. If it is the same everywhere but
repeats inside one FlowGroup, it is a local variable.

The ``global`` section eliminates duplication
---------------------------------------------

Substitution files support a ``global`` section whose values apply to every
environment. Environment-specific sections override globals:

.. code-block:: yaml
   :caption: Shape of a substitution file

   global:
     catalog_prefix: main
     storage_account: companylake

   dev:
     catalog: "${catalog_prefix}_dev"
     landing_path: "abfss://landing@${storage_account}.dfs.core.windows.net/dev"

   prod:
     catalog: "${catalog_prefix}_prod"
     landing_path: "abfss://landing@${storage_account}.dfs.core.windows.net/prod"

LHP supports recursive token expansion: a token can reference another token, up to ten
iterations. The combination of recursive expansion and the ``global`` section means
most substitution files end up small and diff-friendly. A change to the storage
account name affects one line.

A standard medallion token set keeps the substitution files predictable across
projects:

.. code-block:: yaml

   global:
     bronze_catalog: "${catalog_prefix}_bronze"
     silver_catalog: "${catalog_prefix}_silver"
     gold_catalog: "${catalog_prefix}_gold"
     landing_path_base: "abfss://landing@${storage_account}.dfs.core.windows.net"

Using the same token names across projects means the same FlowGroup template works in
different repos.

Why secret literals stay out of substitution files
--------------------------------------------------

Substitution files are committed to version control.
A secret literal in one of them is a leak. Even if the file is committed to a private
repo, every developer who clones the repo can read the secret, and nothing prevents
the value from showing up in a backup, a Slack paste, or a deleted-but-recoverable
branch.

LHP's ``${secret:scope/key}`` syntax solves this. The substitution file contains a
reference, not a value. LHP converts the reference into a
``dbutils.secrets.get(scope="scope", key="key")`` call in the generated Python. The
actual secret is stored in a Databricks secret scope and retrieved at pipeline
runtime. The version-controlled artifact contains only the indirection.

.. warning::

   Any secret literal in a substitution file is leaked the moment the file is
   committed. There is no "private substitution file"; they all ship to version
   control as part of the project. Always use ``${secret:scope/key}``.

The substitution file can declare a default scope and named scope aliases so
references stay readable:

.. code-block:: yaml

   secrets:
     default_scope: prod-secrets
     scopes:
       data-vault: data-vault-prod

   jdbc_password: "${secret:default/db_password}"
   vault_token: "${secret:data-vault/api_token}"

Per-environment overrides for behaviour, not just values
--------------------------------------------------------

Substitution tokens cover most environment differences: catalog names, schemas,
storage paths, alert email addresses. The cases that need different *behaviour* per
environment (different DQE expectations in dev versus prod, for example) usually have
to be modelled through presets or template parameters that take a token as input.

The reason is that LHP keeps substitution resolution textual: it replaces tokens with
values. It does not flip flags or skip actions. If a pipeline needs to drop bad rows
in prod and only warn in dev, the ``failureAction`` in the expectations file must come
from a token, and the substitution file picks the value per environment.

.. code-block:: yaml

   # expectations file
   - name: valid_order_id_not_null
     constraint: "order_id IS NOT NULL"
     failureAction: "${BRONZE_DQE_ACTION}"  # warn in dev, drop in prod

This is the canonical pattern: parameterise behaviour the same way you parameterise
values. The substitution file then captures the whole environment difference in one
place.

Auditing the available tokens
-----------------------------

Before writing FlowGroups in a fresh project, run ``lhp substitutions --env dev`` to
dump the resolved token set. The command prints every token visible to ``--env dev``,
including inherited globals and recursively expanded values.

The most common class of bug (an unresolved-token error at generation time) comes from
a typo or a missing token. ``lhp substitutions`` surfaces both before you write the
FlowGroup that uses them.

Anti-patterns
-------------

**Secrets in substitution files.** Leaks to version control. Use
``${secret:scope/key}`` syntax.

**Hardcoded catalog or schema names in YAML.** Breaks environment promotion. The whole
point of the substitution layer is to push these out of FlowGroup YAML. Use
``${BRONZE_CATALOG}.${schema}.${table}``.

**Local variables for environment-varying values.** ``%{var}`` is FlowGroup-scoped and
resolved at parse time, before LHP sees which environment is in play. Putting a dev
value in a local variable silently breaks prod generation.

**Different YAML configs per environment.** If you find yourself maintaining
``orders_dev.yaml`` and ``orders_prod.yaml`` with mostly the same content, the
difference belongs in the substitution file, not in duplicate sources.

See also
--------

- :doc:`../substitutions` for the procedural how-to and full syntax reference.
- :doc:`../configure_bundles` for how environment substitutions integrate with
  Databricks Asset Bundle targets.
- :doc:`governance` for operational-metadata patterns that include ``pipeline_id`` and
  ``pipeline_run_id`` for cross-environment audit.