Monitoring Reference
This page catalogs every event_log and monitoring key in lhp.yaml, the
schema of the dedicated job-config file, the reserved __eventlog_monitoring
alias, and the validation errors LHP raises. For the walk-through, see
Enable Monitoring.
Prerequisites: Databricks Asset Bundles enabled (databricks.yml exists),
Unity Catalog workspace, and a cloud storage path (typically a Unity Catalog
volume) for streaming checkpoints.
Event Log Configuration
The event_log block in lhp.yaml injects event_log: blocks into every
generated pipeline resource file during lhp generate. String fields support
${token} substitution from substitutions/<env>.yaml.
| Field | Type | Default | Description |
|---|---|---|---|
| | boolean | | Set to |
| | string | (required when | Unity Catalog name for event log tables. |
| | string | (required when | Schema name for event log tables. |
| | string | | Prefix prepended to the generated event log table name. |
| | string | | Suffix appended to the generated event log table name. |
Per-pipeline table name formula: {name_prefix}{pipeline_name}{name_suffix}.
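The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `event_log_table_name` is hypothetical, and the empty-string defaults for the prefix and suffix are an assumption, not something this page confirms.

```python
def event_log_table_name(
    pipeline_name: str,
    name_prefix: str = "",  # assumption: no prefix when the key is unset
    name_suffix: str = "",  # assumption: no suffix when the key is unset
) -> str:
    """Compose {name_prefix}{pipeline_name}{name_suffix}."""
    return f"{name_prefix}{pipeline_name}{name_suffix}"


# "bronze_ingest" is a made-up pipeline name used only for illustration.
print(event_log_table_name("bronze_ingest", name_suffix="_event_log"))
# prints bronze_ingest_event_log
```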
Pipeline-Level Overrides
In pipeline_config.yaml (loaded by lhp generate -pc):
- An event_log: mapping for a pipeline fully replaces the project-level block (no merge).
- event_log: false opts that pipeline out of event logging. Opt-outs are excluded from the monitoring notebook’s source list.
For the full pipeline_config.yaml schema, see Bundle Configuration Reference.
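The override rules in this section (full replacement, no merge, and `false` as an opt-out) can be modeled roughly as follows. This is a sketch of the described behavior, not LHP's code: `resolve_event_log` and the `_UNSET` sentinel are hypothetical names.

```python
_UNSET = object()  # hypothetical sentinel: the pipeline declares no event_log key


def resolve_event_log(project_block, override=_UNSET):
    """Return the effective event_log block for one pipeline, or None if opted out."""
    if override is _UNSET:
        return dict(project_block)   # no override: project-level block applies
    if override is False:
        return None                  # event_log: false opts the pipeline out
    if isinstance(override, dict):
        return dict(override)        # mapping replaces the project block entirely
    raise TypeError("event_log override must be a mapping or false")


project = {"catalog": "main", "schema": "_meta", "name_suffix": "_event_log"}

# The override does not inherit name_suffix, because nothing is merged.
print(resolve_event_log(project, {"catalog": "other", "schema": "logs"}))
```

The point of the example is the missing `name_suffix` in the result: a pipeline-level mapping has to restate every key it still needs.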
Monitoring Configuration
The monitoring block generates three artifacts: a union notebook, an
MVs-only Lakeflow Declarative Pipeline, and a Databricks Workflow job chaining
them. Monitoring requires event_log enabled (LHP raises LHP-CFG-008
otherwise).
| Field | Type | Default | Description |
|---|---|---|---|
| | boolean | | When |
| | string | required when | Base path for streaming checkpoints. Each monitored pipeline gets the subdirectory |
| | string | required when | Project-root-relative path to a flat single-document YAML file describing the monitoring Workflow job. See Workflow Job Schema. |
| | integer | | |
| | string | | Used for the generated pipeline, job, and the directory under |
| | string | inherits | Override catalog for monitoring tables. |
| | string | inherits | Override schema for monitoring tables. |
| | string | | Delta table the union notebook writes into (regular Delta table, created by Structured Streaming on first run). |
| | list | one default view | Materialized view definitions. See Materialized Views. |
| | boolean | | Adds a |
Both required strings support ${token} substitution. When job_config_path
contains unresolved tokens, LHP defers the file-existence check until the
environment is selected. monitoring: {} parses but always fails validation,
because both required strings are empty.
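The two checks described above (required strings must be non-empty, and the file-existence check waits until tokens are resolved) might look like the sketch below. All function names are hypothetical, and the regular expression is an assumption about what counts as a `${token}`; LHP's own validation may differ.

```python
import re
from typing import Dict, List

# Assumption: a token is "${" followed by any characters up to the next "}".
_TOKEN_PATTERN = re.compile(r"\$\{[^}]+\}")

REQUIRED_STRINGS = ("checkpoint_path", "job_config_path")


def missing_required(monitoring: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_STRINGS if not monitoring.get(key)]


def has_unresolved_tokens(value: str) -> bool:
    """True when the string still contains a ${token} placeholder."""
    return bool(_TOKEN_PATTERN.search(value))


def can_check_file_now(job_config_path: str) -> bool:
    """The existence check only makes sense once every token is resolved."""
    return not has_unresolved_tokens(job_config_path)


# monitoring: {} parses, but both required strings are reported as missing.
print(missing_required({}))
```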
Minimal Valid Configuration
event_log:
  catalog: "${catalog}"
  schema: _meta
  name_suffix: "_event_log"
monitoring:
  checkpoint_path: "/Volumes/${catalog}/_meta/checkpoints/event_logs"
  job_config_path: "config/monitoring_job_config.yaml"
Override Resolution
Monitoring catalog and schema resolve as: monitoring.catalog /
monitoring.schema (when set) wins; otherwise inherits from event_log. The
Delta table fully qualified name is {catalog}.{schema}.{streaming_table}.
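As a minimal sketch of that resolution order, assuming both blocks are available as plain dictionaries (the function name `monitoring_table_fqn` is hypothetical, and the table name passed in is a sample value):

```python
def monitoring_table_fqn(event_log: dict, monitoring: dict, streaming_table: str) -> str:
    """Build {catalog}.{schema}.{streaming_table} with monitoring overrides winning."""
    # monitoring.catalog / monitoring.schema win when set; otherwise inherit.
    catalog = monitoring.get("catalog") or event_log["catalog"]
    schema = monitoring.get("schema") or event_log["schema"]
    return f"{catalog}.{schema}.{streaming_table}"


event_log = {"catalog": "main", "schema": "_meta"}

# Only the schema is overridden; the catalog is inherited from event_log.
print(monitoring_table_fqn(event_log, {"schema": "monitoring"}, "all_pipelines_event_log"))
# prints main.monitoring.all_pipelines_event_log
```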
Materialized Views
The materialized_views field has three modes:
| Value | Behavior |
|---|---|
| omitted or | LHP generates the default |
| | No materialized views and no MVs pipeline file. The Workflow job contains only the notebook task. Cleanup removes any previously generated |
| explicit list | Only the listed views are generated, replacing the default. |
View Definition Fields
Each entry must declare name (required, unique within the list) and exactly
one of sql (inline SQL string) or sql_path (project-root-relative path to
a SQL file). Each view becomes a @dp.materialized_view function in
generated/<env>/<pipeline_name>/monitoring.py, fully qualified as
{monitoring_catalog}.{monitoring_schema}.{view_name}.
monitoring:
  checkpoint_path: "/Volumes/${catalog}/_meta/checkpoints/event_logs"
  job_config_path: "config/monitoring_job_config.yaml"
  materialized_views:
    - name: error_events
      sql: "SELECT * FROM all_pipelines_event_log WHERE event_type = 'error'"
    - name: custom_analysis
      sql_path: "sql/monitoring_custom_analysis.sql"
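The entry rules for this section (a required, unique `name` and exactly one of `sql` or `sql_path`) can be checked with a short routine like the one below. It is a sketch only: `validate_views` is a hypothetical helper, and it returns plain messages rather than the error codes LHP raises.

```python
from typing import Dict, List


def validate_views(views: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems found in a materialized_views list."""
    errors: List[str] = []
    seen = set()
    for index, view in enumerate(views):
        name = view.get("name")
        if not name:
            errors.append(f"entry {index}: 'name' is required")
        elif name in seen:
            errors.append(f"entry {index}: duplicate name '{name}'")
        else:
            seen.add(name)
        # Exactly one of sql / sql_path: both set or both missing is invalid.
        has_sql = bool(view.get("sql"))
        has_path = bool(view.get("sql_path"))
        if has_sql == has_path:
            errors.append(f"entry {index}: exactly one of 'sql' or 'sql_path' is required")
    return errors


views = [
    {"name": "error_events", "sql": "SELECT 1"},
    {"name": "custom_analysis", "sql_path": "sql/monitoring_custom_analysis.sql"},
]
print(validate_views(views))  # an empty list means no problems were found
```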
Default events_summary Schema
The default view aggregates run status, timing, row metrics, and run config from the union Delta table.
| Column | Type | Description |
|---|---|---|
| | STRING | Pipeline identifiers and run (update) ID. |
| | STRING | Final status ( |
| | STRING | |
| | BOOLEAN | Full refresh vs. incremental. |
| | STRING | Databricks Runtime version. |
| | STRING | |
| | TIMESTAMP | Run start and end. |
| | DOUBLE | Run duration in minutes (2 decimal places). |
| | BIGINT | Distinct tables (flows) processed. |
| | BIGINT | Row counts across all tables. |
| | BIGINT | Records dropped by data quality expectations. |
Workflow Job Schema
monitoring.job_config_path must point to a flat single-document YAML mapping.
LHP deep-merges the file over its defaults (max_concurrent_runs=1,
performance_target=STANDARD, queue.enabled=true) and then token-substitutes
via the active environment’s substitutions/<env>.yaml. The job name is fixed
at {pipeline_name}_job and the pipeline_task is generated automatically.
| Field | Description |
|---|---|
| | New-cluster spec for the notebook task. Mutually exclusive with |
| | Attach the notebook task to an existing cluster. |
| | Enable job-run queueing. |
| | |
| | Job-level timeout in seconds. |
| | Maximum concurrent runs. Default |
| | Quartz cron schedule: |
| | Free-form |
| | |
| | User/group permission entries. |
Disallowed keys: a top-level project_defaults: wrapper, job_name:, and
pipeline_task entries (silently overridden).
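The deep-merge of the job-config file over the defaults, described at the top of this section, can be approximated as below. This is a sketch under stated assumptions: `deep_merge` is a hypothetical helper, and placing `performance_target` at the top level of the defaults is a guess based on how the defaults are listed here.

```python
import copy

# Defaults as listed in this section; the nesting of "queue" follows queue.enabled.
DEFAULTS = {
    "max_concurrent_runs": 1,
    "performance_target": "STANDARD",
    "queue": {"enabled": True},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with override merged over base, recursing into mappings."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested mappings
        else:
            merged[key] = copy.deepcopy(value)            # scalars and lists replace
    return merged


# A user file that raises concurrency and turns queueing off.
user_config = {"max_concurrent_runs": 2, "queue": {"enabled": False}}
print(deep_merge(DEFAULTS, user_config))
```

Keys the file does not mention, such as `performance_target` in this example, keep their default values.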
Warning
Legacy auto-pickup of templates/bundle/job_config.yaml for monitoring jobs
is removed. Use monitoring.job_config_path to point at a dedicated
file. The __eventlog_monitoring alias inside the generic
config/job_config.yaml consumed by lhp deps for orchestration jobs is
unaffected.
Job Monitoring
When monitoring.enable_job_monitoring: true, LHP adds a Python-load chain
that correlates Databricks Jobs with their pipeline runs via the Databricks SDK.
Generated files under generated/<env>/<pipeline_name>/:
- monitoring.py: adds a v_jobs_stats view and a jobs_stats MV.
- jobs_stats_loader.py: calls the SDK to scan recent job runs, correlate each pipeline update with its triggering job, and enrich rows with pipeline tags (spec.tags) and job tags (settings.tags).
Default SDK lookback is 7 days, configurable via the lookback_hours pipeline
parameter. The jobs_stats view inherits the monitoring pipeline’s catalog
and schema.
| Column | Type | Description |
|---|---|---|
| | STRING | Pipeline identifiers. |
| | STRING | Pipeline update correlated with the job run. |
| | STRING | Triggering job identifiers and name. |
| | TIMESTAMP | Job run start and end. |
| | STRING | Final status ( |
| | STRING | JSON map of |
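For the lookback window mentioned earlier in this section, a 7-day default expressed through an hours-based parameter works out to 168 hours. The sketch below shows that arithmetic only; `lookback_cutoff` is a hypothetical helper and says nothing about how the generated loader actually queries the SDK.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_LOOKBACK_HOURS = 7 * 24  # 7 days, matching the documented default


def lookback_cutoff(now: datetime, lookback_hours: int = DEFAULT_LOOKBACK_HOURS) -> datetime:
    """Return the earliest start time a job run may have to be scanned."""
    return now - timedelta(hours=lookback_hours)


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
print(lookback_cutoff(now))                     # seven days before `now`
print(lookback_cutoff(now, lookback_hours=24))  # a one-day window instead
```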
Reserved Aliases
The __eventlog_monitoring alias (double-underscore prefix) targets the
monitoring pipeline from pipeline_config.yaml without hardcoding its name. It
resolves to monitoring.pipeline_name (default
${project_name}_event_log_monitoring) at generation time.
| Constraint | Violation result |
|---|---|
| Monitoring must be enabled in | Alias entry is silently dropped with a warning. |
| Cannot coexist with the actual monitoring pipeline name in the same config. | LHP raises |
| Must appear as a standalone | LHP raises |
The alias inside the generic orchestration config/job_config.yaml (consumed
by lhp deps) is independent and still supported.
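The alias resolution described in this section amounts to a name lookup with a default. The sketch below illustrates only that lookup; `resolve_pipeline_name` is a hypothetical function, and it does not model the constraints in the table above.

```python
ALIAS = "__eventlog_monitoring"


def resolve_pipeline_name(name: str, project_name: str, monitoring: dict) -> str:
    """Map the reserved alias to the monitoring pipeline name; pass other names through."""
    if name != ALIAS:
        return name
    # monitoring.pipeline_name wins; otherwise fall back to the documented default.
    return monitoring.get("pipeline_name") or f"{project_name}_event_log_monitoring"


# "acme" and "ops_monitoring" are made-up values used only for illustration.
print(resolve_pipeline_name(ALIAS, "acme", {}))
print(resolve_pipeline_name(ALIAS, "acme", {"pipeline_name": "ops_monitoring"}))
```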
Automatic Cleanup
LHP reconciles monitoring artifacts on every lhp generate. Toggling
monitoring, renaming pipeline_name, or switching materialized_views
between populated and empty never leaves stale files.
| Artifact | Reconciliation |
|---|---|
| Notebook directory | |
| Job resource | Any |
| Pipeline directory | When monitoring is disabled, LHP scans |
Validation Errors
| Code | Trigger |
|---|---|
| | |
| | |
| | |
| | Both |
| | |
| | At post-substitution time, the resolved |
For the full error catalog, see Error Reference.
See also
- Enable Monitoring: how-to walk-through for turning monitoring on, with the workflow-job configuration steps and pre-V0.8.2 migration notes.
- Operational Metadata: reference for project-level operational metadata columns that complement event log monitoring.
- Architecture: explanation of the LHP generation model, including how the orchestrator finalizes monitoring artifacts.
- Bundle Configuration Reference: bundle integration and pipeline_config.yaml schema.
- Error Reference: full error code reference.