Overview
A climate tech product team shipped complex model outputs into customer dashboards, but the connection between model versions, input datasets, and dashboard tiles was opaque. Customers misinterpreted figures, support fielded preventable questions, and PMs hesitated to rely on usage signals. Intelligex implemented a lineage view that tied model runs to input datasets and downstream dashboard tiles, with validation checks and publish gates. Customers saw coherent narratives from source data to visualization, support tickets related to misalignment declined, and PMs trusted what dashboards said about adoption and impact, without changing the modeling stack or the BI tools.
Client Profile
- Industry: Climate technology and environmental analytics
- Company size (range): Multi-product platform serving enterprise and public sector customers
- Stage: Mature modeling pipelines and customer-facing dashboards; lineage and version mapping handled ad hoc
- Department owner: Product Management & R&D
- Other stakeholders: Data Science/Modeling, Data Engineering, Analytics, Customer Success, Design/UX, Security/Privacy, Legal/Compliance, DevOps, Sales/Partnerships
The Challenge
Models produced forecasts, risk indices, and scenario analyses using heterogeneous inputs: meteorological datasets, emissions inventories, grid telemetry, and customer-supplied data. Each model had variants by region, horizon, and scenario assumptions. Downstream, BI dashboards combined outputs into tiles and narratives. When a customer asked why a tile changed, the answer lived somewhere between model notes, transformation jobs, and cached extracts. Not every dashboard tile was clearly tied to a model version or input dataset, and units or scenario flags were sometimes lost in transformation. The same output appeared with different labels depending on where it landed.
Misinterpretation was common. A dashboard might show a risk index that corresponded to one scenario, while the customer expected another. Units and normalization choices were not always visible at the point of decision. Customer Success spent time rebuilding context, PMs hesitated to draw product insights from usage, and Data Science fielded basic lineage questions instead of working on improvements. The team wanted a predictable way to see what model and data each tile represented, if it passed validation, and when changes occurred.
Constraints were practical. Modeling ran in Python notebooks and pipelines using xarray and netCDF; outputs landed in object storage and the warehouse; dashboards ran in the existing BI tool. Any solution needed to respect customer data boundaries, avoid duplicating large artifacts, and align with domain standards for climate and geospatial data. For lineage and registry patterns, the team referenced OpenLineage and the MLflow Model Registry. Validation checks followed ideas from Great Expectations, and conventions for geospatial arrays were informed by the CF Conventions.
Why It Was Happening
Root causes were fragmented metadata and inconsistent handoffs. Modeling pipelines wrote outputs with informal naming for scenario and units. Data Engineering transformed arrays and tables for the warehouse without consistently carrying forward version and assumption fields. BI projects referenced curated tables but did not expose lineage at the tile level. Effective dates and cache refreshes drifted, so a model update might not propagate cleanly, and the dashboard would mix versions until the next full run. Without a canonical link across model, dataset, transformation, and tile, customers and PMs reconstructed the story by hand.
Ownership was diffuse. Data Science owned the models, Engineering owned transformations, Analytics owned the warehouse, and PMs owned the dashboards. Each team optimized locally: fast runs, clean tables, attractive tiles. No shared layer tied artifacts together with validation rules before surfaces were updated, which made late-stage questions frequent and avoidable.
The Solution
Intelligex implemented a lineage and validation layer that connected model runs, input datasets, transformations, and dashboard tiles. Pipelines emitted run metadata (model version, parameters, scenario tags, input dataset fingerprints) and registered artifacts. Transform jobs carried lineage forward to curated tables. BI tiles were tagged with lineage identifiers and assumptions, and a publish gate withheld updates if validation failed. A lineage view let users click from a tile to the model run and inputs, with validation status and effective dates. Human reviewers remained in the loop for assumption changes and customer-specific overrides.
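The run-metadata emission described above can be sketched as a minimal, OpenLineage-like event payload. This is an illustrative shape only, not the team's actual implementation or the full OpenLineage spec; the namespaces, job name format, and facet keys are assumptions for the example.

```python
from datetime import datetime, timezone

def build_run_event(model_name, model_version, scenario, params, input_fingerprints):
    """Assemble a lineage event (OpenLineage-like shape) for a completed model run.

    input_fingerprints maps each input dataset identifier to its content fingerprint.
    """
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Job identifies the model family and pinned version.
        "job": {"namespace": "climate-models", "name": f"{model_name}:{model_version}"},
        # Run facets carry the scenario tag and parameters used for this execution.
        "run": {"facets": {"scenario": scenario, "parameters": params}},
        # Inputs record which datasets fed the run, with their fingerprints.
        "inputs": [
            {"namespace": "object-storage", "name": ds_id, "facets": {"fingerprint": fp}}
            for ds_id, fp in input_fingerprints.items()
        ],
    }
```

Downstream transform jobs would emit similar events with the curated tables as outputs, so the metadata store can join the full path from inputs to tiles.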
- Integrations: Model metadata and artifacts registered in the team's registry (patterns aligned with MLflow Model Registry); transformation pipelines emitted lineage events to a metadata store using the OpenLineage spec; validation checks based on Great Expectations-style assertions; dashboards in the existing BI tool (for example, Looker or Mode) with tile-level tags; warehouse remained the system of record.
- Canonical lineage schema: Fields for model name, version, scenario, parameters, input dataset identifiers and checksums, units, spatial/temporal resolution, transformations applied, and consuming tiles. Effective dating preserved history.
- Tile tagging and exposure: BI tiles carried lineage IDs and assumption badges; deep links opened a lineage view with model run, inputs, validations, and effective dates.
- Validation checks: Assertions on units, coordinate conventions, expected ranges, spatial/temporal alignment, and schema compatibility between inputs and model expectations. Failures blocked tile publication and opened review tasks.
- Publish gates: Tiles refreshed only when lineage was complete and validations passed; assumption changes required reviewer approval and customer communications where needed.
- Dashboards: Customer-facing assumption banners and tooltips; internal views of lineage completeness, recent changes, and validation status by model and product area.
- Governance and audit: Change control for lineage fields, validation suites, and assumption mappings; immutable logs of model promotions, validation results, approvals, and tile publishes.
- Security and privacy: Row- and project-level permissions; lineage stored metadata and fingerprints, not raw customer datasets; access aligned to data-sharing agreements.
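The canonical lineage schema above can be sketched as a simple record type with a completeness check of the kind the publish gates would rely on. Field names and the choice of required fields are illustrative assumptions, not the team's actual schema.

```python
from dataclasses import dataclass

@dataclass
class LineageRecord:
    """One lineage record tying a model run to its inputs and consuming tiles."""
    model_name: str
    model_version: str
    scenario: str
    units: str
    input_datasets: dict   # dataset identifier -> checksum/fingerprint
    transformations: list  # ordered transformation job names
    consuming_tiles: list  # tile identifiers in the BI tool
    effective_from: str    # ISO date; effective dating preserves history
    spatial_resolution: str = ""
    temporal_resolution: str = ""

# Fields that must be populated before a tile referencing this record can publish.
REQUIRED_FIELDS = ("model_name", "model_version", "scenario", "units", "input_datasets")

def lineage_complete(record: LineageRecord) -> bool:
    """True only when every required field is non-empty."""
    return all(bool(getattr(record, f)) for f in REQUIRED_FIELDS)
```

A record like this stores only metadata and fingerprints, consistent with the security constraint that raw customer datasets never enter the lineage layer.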
Implementation
- Discovery: Mapped model families, input datasets, and existing transformation flows; inventoried dashboard tiles and their data sources; collected common misinterpretations from support tickets; reviewed domain conventions for units, coordinates, and scenario tags.
- Design: Defined the lineage schema and required fields; authored validation suites per model family; specified how tiles would carry lineage tags and assumption badges; designed the lineage view and publish gates; agreed on ownership and reviewer roles for assumption changes.
- Build: Instrumented modeling and transformation pipelines to emit lineage; integrated the registry and metadata store; implemented validation checks and failure routing; added tile-level tags in BI and built the lineage view; created assumption banners and tooltips; assembled dashboards and notifications.
- Testing/QA: Ran in shadow mode: captured lineage and ran validations while dashboards continued with current refresh patterns; replayed prior model updates and customer scenarios to verify lineage and checks; tuned unit mappings, scenario tags, and spatial/temporal tolerances; included a human-in-the-loop panel with Data Science, Analytics, PMs, and Customer Success.
- Rollout: Enabled lineage and validation on high-traffic tiles first; activated publish gates with conservative rules; expanded across model families and product areas as teams gained confidence; retained legacy refresh as a controlled fallback early on.
- Training/hand-off: Delivered sessions for PMs, Data Science, Analytics, and Customer Success on reading lineage, interpreting assumption badges, and handling validation failures; updated SOPs for model promotion and customer communications; transferred ownership of lineage fields and validation suites to Product Ops and Data under change control.
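The publish-gate behavior enabled during rollout can be sketched as a small decision function over validation results. The check names, expectation shape, and return strings are assumptions for illustration; the team's Great Expectations-style suites would be richer.

```python
def run_validations(output, expectations):
    """Run Great Expectations-style assertions on a tile's source output.

    Returns a list of (check_name, passed) pairs.
    """
    results = []
    # Units must match what the model family declares.
    results.append(("units_match", output["units"] == expectations["units"]))
    # Values must fall inside the expected physical range.
    lo, hi = expectations["value_range"]
    results.append(("values_in_range", all(lo <= v <= hi for v in output["values"])))
    # A scenario tag must survive transformation.
    results.append(("scenario_tag_present", bool(output.get("scenario"))))
    return results

def publish_gate(lineage_is_complete, validation_results):
    """Promote a tile refresh only when lineage is complete and all checks pass."""
    if not lineage_is_complete:
        return "BLOCKED: incomplete lineage"
    failed = [name for name, ok in validation_results if not ok]
    if failed:
        return "BLOCKED: " + ", ".join(failed)
    return "PUBLISH"
```

In shadow mode, the same gate would run but only record its decision, letting the team compare outcomes against legacy refreshes before enforcement.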
Results
Customers saw a coherent story from model to dashboard. Tiles displayed the scenario and units in context, and a click revealed the model run, inputs, and validation status. When assumptions changed, banners and release notes reflected the update, and Customer Success answered questions with linked evidence. The same model output could no longer appear under different labels, and transformations preserved the metadata needed to interpret results.
PMs trusted usage signals and planned accordingly. Adoption trends tied to specific model families and scenarios, so roadmap debates referenced shared facts. Support tickets asking "why did this change" decreased as tiles carried lineage and assumptions. Data Science spent less time reconstructing run context and more time improving models. The modeling stack, warehouse, and BI stayed in place; the shift was a governed layer that connected artifacts and validated them before they reached customers.
What Changed for the Team
- Before: Model outputs and dashboard tiles drifted apart. After: Tiles were tagged with lineage IDs and assumption badges linked to model runs and inputs.
- Before: Units and scenarios were hard to verify. After: Validation checks enforced units and scenario tags, blocking publishes when mismatched.
- Before: Customers asked for backstory on changes. After: Lineage views and assumption banners made context visible at the point of use.
- Before: PMs questioned usage signals. After: Adoption tied to model families and scenarios, enabling confident prioritization.
- Before: Data Science fielded basic lineage questions. After: A shared view answered provenance and validation status on demand.
- Before: Refreshes were all-or-nothing. After: Publish gates promoted only validated, fully tagged tiles.
Key Takeaways
- Connect models to tiles; lineage from run to visualization prevents misinterpretation.
- Carry assumptions forward; scenario and unit tags need to survive every transformation.
- Validate before publish; checks on units, ranges, and alignment catch silent drift.
- Expose provenance; tile badges and lineage views reduce support effort and build trust.
- Keep humans on assumptions; reviewer gates for scenario changes protect customers.
- Integrate, don't replace; instrument pipelines and BI to add lineage and validation around existing tools.
FAQ
What tools did this integrate with? Modeling pipelines registered artifacts and versions in the existing registry (patterns aligned with the MLflow Model Registry), transformation jobs emitted lineage events using the OpenLineage spec, and validation checks followed patterns from Great Expectations. Dashboards in the team's BI tool carried lineage tags and assumption badges; the warehouse remained the system of record.
How did you handle quality control and governance? Required lineage fields were enforced in pipelines; validation suites checked units, schema, and spatial/temporal alignment; and publish gates blocked tiles when checks failed. Assumption changes required human approval with reason codes, and all promotions, validations, and edits were immutably logged. Lineage and validation rules lived under change control with Product and Data ownership.
How did you roll this out without disruption? The lineage and validation layer ran in shadow mode first, capturing metadata and running checks while dashboards continued normal refreshes. Prior updates and customer scenarios were replayed to tune mappings and tolerances. Publish gates were enabled on high-visibility tiles first, with legacy refresh retained as a controlled fallback until teams were comfortable.
How were model and dataset versions mapped to dashboard tiles? Pipelines emitted lineage IDs per model run tied to input dataset fingerprints. Transformation jobs carried those IDs into curated tables. BI tiles included the lineage ID in query tags or metadata, and the lineage view resolved the path from tile to model run and inputs with effective dates and validation status.
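The fingerprint-to-lineage-ID mapping described here can be sketched with content hashing. Hashing choices (SHA-256, truncation length, the `lin-` prefix) are illustrative assumptions; what matters is that the ID is deterministic and order-independent over inputs.

```python
import hashlib

def dataset_fingerprint(dataset_id: str, content: bytes) -> str:
    """Deterministic fingerprint of an input dataset: hash of its identifier plus content."""
    h = hashlib.sha256()
    h.update(dataset_id.encode("utf-8"))
    h.update(content)
    return h.hexdigest()[:16]

def lineage_id(model_name: str, model_version: str, input_fingerprints) -> str:
    """Stable lineage ID per model run, derived from the version and sorted input fingerprints.

    Sorting makes the ID independent of the order inputs were registered.
    """
    h = hashlib.sha256()
    h.update(f"{model_name}:{model_version}".encode("utf-8"))
    for fp in sorted(input_fingerprints):
        h.update(fp.encode("utf-8"))
    return "lin-" + h.hexdigest()[:12]
```

Because only the hash is stored, large netCDF artifacts are never duplicated into the lineage layer; the ID travels through curated tables and into tile query tags.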
How did you handle units, geospatial conventions, and scenarios? Models declared units, coordinate reference, and scenario tags explicitly. Validation checked for expected units and CF-aligned coordinate metadata, verified spatial and temporal resolution, and ensured the scenario tag matched tile intent. Tiles showed assumption badges and tooltips, and changes required reviewer sign-off.
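A minimal sketch of the CF-aligned metadata check, assuming variable attributes are available as a plain dict (as they are on an xarray variable's `.attrs`). The specific attribute names checked (`units`, `standard_name`, `long_name`) follow CF Conventions; the failure-message wording is an assumption.

```python
def check_cf_metadata(var_attrs: dict, expected_units: str) -> list:
    """Check a variable's attributes against CF-style expectations.

    Returns a list of failure messages; an empty list means the check passed.
    """
    failures = []
    if "units" not in var_attrs:
        failures.append("missing units attribute")
    elif var_attrs["units"] != expected_units:
        failures.append(f"units {var_attrs['units']!r} != expected {expected_units!r}")
    # CF expects a standard_name where one exists, or at least a long_name.
    if "standard_name" not in var_attrs and "long_name" not in var_attrs:
        failures.append("no standard_name or long_name")
    return failures
```

Any returned failure would feed the publish gate, blocking the tile and opening a review task rather than surfacing mislabeled values to customers.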
What about customer-specific or sensitive datasets? Lineage stored metadata and fingerprints, not raw payloads. Access respected row-level and project-level permissions, and customer data never left agreed boundaries. Lineage views displayed only what the viewer was entitled to see, and sensitive fields were masked in dashboards.
Department/Function: Analytics & Executive Leadership; IT & Infrastructure; Product Management & R&D; Strategy
Capability: Data Integration; Pipelines & Reliability