Overview

A robotics R&D group struggled to reuse simulation scenarios across teams, which fragmented test coverage and hid risks until late stages. Each team kept its own bags, worlds, and scripts, and there was no reliable way to find or compare similar scenarios. Intelligex deployed a governed scenario library with enforced metadata standards, a vector search index that grouped like scenarios via embeddings, and pipeline hooks for Azure DevOps so squads pulled the same vetted scenarios into continuous integration. Teams reused validated scenarios, coverage conversations aligned around shared artifacts, and failures traced back to consistent scenarios rather than ad-hoc scripts, all without changing simulators, ROS test harnesses, or existing pipelines.

Client Profile

  • Industry: Robotics and autonomous systems (mobile platforms, perception, and manipulation)
  • Company size (range): Multi-team R&D organization with shared simulation infrastructure
  • Stage: Established ROS-based stacks and simulators; scenarios scattered across repos and notebooks
  • Department owner: Product Management & R&D
  • Other stakeholders: Perception, Planning, Controls, Simulation/Tools, Test Engineering, DevOps, Compliance/Safety, Program Management, IT/Security

The Challenge

Simulation was core to validation, yet scenarios lived as world files, launch configs, and notebooks stored in separate team repos. One team’s “warehouse pallet spill” differed from another’s despite similar goals. Metadata was sparse or inconsistent, so finding a scenario that matched a new bug or requirement meant asking around or re-creating it. Tests drifted as forks multiplied, and CI jobs invoked different sets of worlds and assets depending on who authored the pipeline step.

In practice, artifacts spanned ROS launch files, Gazebo or Isaac worlds, bag files, random seeds, sensor models, and control constraints. Some scenarios embedded assumptions about robot frames or sensor latency that were not obvious from filenames. Coverage reports compared counts of runs rather than intent or risk categories. When a failure surfaced in one program, other teams struggled to reproduce it because the underlying scenario was named differently, configured differently, or missing entirely.

Leadership wanted reuse and traceability, not a new simulator. The group used ROS with common simulators such as Gazebo, and Azure DevOps Pipelines ran tests on shared runners. The missing piece was a searchable scenario library with enforced metadata, a way to find similar scenarios by description and artifacts, and pipeline integration so CI could resolve scenarios by tags rather than by file paths.

Why It Was Happening

Root causes were unstructured scenarios and uneven metadata. Teams encoded environment, sensor, and actuator details in launch files and scripts without a shared schema. Names and tags were local, so “dusty aisle” in one repo meant something else elsewhere. Without a canonical model for intent, terrain, assets, sensors, faults, and success criteria, search devolved to tribal knowledge. As scenarios forked, subtle differences in seeds, physics, or models created inconsistent outcomes.

Ownership was split. Simulation Tools owned runners and base images, Perception and Planning owned their test worlds, and Product asked for coverage across risks and requirements. No shared library or governance enforced that a scenario was documented, deduplicated, and approved for reuse. CI pulled whatever a job listed by path, so teams optimized locally and coverage became hard to compare.

The Solution

Intelligex built a scenario library backed by a metadata standard and a vector search index, then integrated it with Azure DevOps. Each scenario carried a canonical manifest describing intent, assets, sensors, faults, randomization, configuration, and expected assertions. An embedding service indexed descriptions and artifacts so teams could find similar scenarios by natural language or by drop-in similarity. Pipeline tasks resolved scenario references by tags and risk categories, fetched validated artifacts, and wrote results back to the library. A human-in-the-loop curation gate prevented drift and kept duplicates under control.
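
As a sketch of what such a manifest and its publish gate might look like (the field names and lint rules here are illustrative, not the production schema):

```python
from dataclasses import dataclass, field

# Hypothetical manifest model; the real schema's fields and names will differ.
REQUIRED_FIELDS = ("intent", "environment", "assets", "sensors", "assertions")

@dataclass
class ScenarioManifest:
    intent: str = ""            # e.g. "pallet spill in warehouse aisle, avoidance"
    environment: str = ""       # world / terrain identifier
    assets: list = field(default_factory=list)      # meshes, maps, bag files
    sensors: list = field(default_factory=list)     # sensor model identifiers
    faults: list = field(default_factory=list)      # injected faults, if any
    seed_range: tuple = (0, 0)                      # randomization seed bounds
    assertions: list = field(default_factory=list)  # machine-readable pass criteria
    simulators: list = field(default_factory=list)  # e.g. ["gazebo", "isaac"]

def lint(manifest: ScenarioManifest) -> list:
    """Return a list of lint errors; an empty list means publishable."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not getattr(manifest, name):
            errors.append(f"missing required field: {name}")
    return errors
```

A manifest that omits, say, its environment would be rejected at publish time with `missing required field: environment`, rather than entering the library half-documented.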

  • Integrations: Azure DevOps pipelines for CI orchestration; ROS launch and test harnesses; simulators such as Gazebo or NVIDIA Isaac Sim; artifact storage in the existing registry; dashboards surfaced in the team’s BI tool.
  • Metadata standard: Canonical manifest fields for scenario intent, environment, assets, sensor models, robot configs, faults, random seeds or ranges, required assertions, and simulator compatibility. Optional alignment to domain scenario formats like ASAM OpenSCENARIO where applicable.
  • Embedding and search: Vector index over scenario descriptions, tags, and manifest snippets; keyword and semantic search combined to retrieve near-matches and variants quickly.
  • Validation and deduplication: Linting for manifests and file layouts; similarity checks flagged near-duplicates for review; required fields enforced before publish.
  • Pipeline tasks: Azure DevOps tasks pulled scenarios by tag or manifest query, hydrated assets and configs, invoked the simulator with standard arguments, and uploaded results and logs back to the library.
  • Assertions and outcomes: Common assertion templates for perception, planning, and control (e.g., detection latency bands, path feasibility, stability), emitted as machine-readable results for coverage views.
  • Dashboards: Coverage by risk category, subsystem, and scenario family; failure clustering by scenario lineage; trend views of regressions tied to common worlds and seeds.
  • Permissions and audit: Role-based publish rights, read access by program; immutable logs of publishes, edits, merges, and pipeline results.
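
The near-duplicate check above can be sketched with a simple set-similarity measure over tags; the deployed system paired this style of check with embedding distance, and the threshold below is illustrative:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_near_duplicates(scenarios: dict, threshold: float = 0.6) -> list:
    """Pairwise tag-overlap check over {scenario_id: [tags]}.
    Pairs at or above the threshold go to a curator for review
    rather than being auto-merged."""
    names = sorted(scenarios)
    flagged = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(set(scenarios[x]), set(scenarios[y]))
            if score >= threshold:
                flagged.append((x, y, round(score, 2)))
    return flagged
```

Routing flagged pairs to a human curator, instead of merging automatically, is what kept intentional variants (same world, different fault injection) from being collapsed.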

Implementation

  • Discovery: Inventory of existing scenarios, simulators, and pipelines; cataloged common environments, assets, and failure modes; reviewed how coverage was reported and where duplication and drift occurred.
  • Design: Defined the manifest schema and tag taxonomy; specified similarity signals for embedding and keyword search; authored lint rules; designed Azure DevOps tasks and result posting; agreed on curator roles and approval gates.
  • Build: Implemented the library service and storage layout; built the embedding index and search API; added manifest linters and dedupe checks; created Azure DevOps tasks to resolve scenarios, hydrate assets, and post results; assembled dashboards.
  • Testing/QA: Ran in shadow mode with selected teams; mirrored pipeline jobs to pull scenarios from the library while legacy paths remained; tuned search signals, manifests, and assertion templates; included human-in-the-loop curation to manage early duplicates.
  • Rollout: Onboarded by subsystem and program; migrated high-value scenarios first; enabled pipeline tasks as defaults after stable cycles; kept direct file paths as a controlled fallback while teams adjusted.
  • Training/hand-off: Delivered short sessions for Perception, Planning, Controls, and Test on manifests, search, and pipeline use; updated SOPs for publishing scenarios and defining assertions; transferred ownership of schema and curation to Simulation Tools and Product Ops under change control.
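
The resolution step a pipeline task performs can be sketched as below, assuming an in-memory library index (the real tasks queried the library service over its API, and the controlled fallback mirrors the rollout strategy above):

```python
def resolve_scenarios(library: dict, tags: set, fallback_paths=None) -> list:
    """Return IDs of scenarios whose tags cover the requested tag set.

    library: {scenario_id: {"tags": [...], ...}} -- a stand-in for the
    library service. If nothing matches, fall back to explicitly listed
    file paths, mirroring the controlled fallback kept during rollout.
    """
    matches = [sid for sid, meta in sorted(library.items(), key=lambda kv: kv[0])
               if tags <= set(meta.get("tags", []))]
    if matches:
        return matches
    return list(fallback_paths or [])
```

A CI job then asks for, say, `{"warehouse", "risk:collision"}` instead of hard-coding a world file path, so every squad's pipeline resolves to the same vetted scenario set.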

Results

Scenario reuse became the norm. Teams searched by intent or environment and pulled the same validated scenarios rather than re-creating close variants. CI jobs referenced tags and risk categories instead of hard-coded paths, and dashboards showed coverage by scenario family and subsystem. When a regression appeared, failures traced to shared scenarios, so root cause discussions started from the same artifacts.

Coverage conversations improved. Product and technical leads reviewed risk categories tied to scenario families rather than lists of files. Curators merged duplicates and evolved manifests, and assertions carried forward across programs. Simulators and ROS harnesses remained in place; the difference was a governed library and pipeline integration that made simulation a shared resource instead of a set of team islands.

What Changed for the Team

  • Before: Scenarios lived in team repos with ad-hoc names. After: A library stored scenarios with enforced manifests and tags.
  • Before: Finding similar scenarios meant asking around. After: Vector and keyword search surfaced near-matches and validated variants.
  • Before: Pipelines invoked worlds by path. After: Azure DevOps tasks resolved scenarios by tag and hydrated assets automatically.
  • Before: Coverage was a count of runs. After: Coverage rolled up by risk category, subsystem, and scenario family.
  • Before: Failures were local to a team’s script. After: Failures traced to shared scenarios with lineage and assertions.
  • Before: Duplicates proliferated silently. After: Dedupe checks and curation merged overlap and kept variants meaningful.

Key Takeaways

  • Define a scenario manifest; shared metadata is the foundation for reuse and search.
  • Index for similarity, not just names; embeddings and tags make near-matches easy to find.
  • Integrate with CI; resolve scenarios by intent in pipelines so teams run the same tests.
  • Standardize assertions; comparable outcomes turn coverage into a cross-team conversation.
  • Curate with a light touch; human review merges duplicates and keeps the library clean.
  • Integrate, don’t replace; keep ROS and simulators, add governance and search around them.

FAQ

What tools did this integrate with? The library connected to existing simulators such as Gazebo and ROS launch/test harnesses, added Azure DevOps tasks so pipelines could resolve scenarios by tag, and posted results back to the library and dashboards. Teams continued using their artifact registries and runners; the integration added manifests, search, and CI hooks.

How did you handle quality control and governance? Manifests followed a schema with required fields and linting. Similarity checks flagged near-duplicates for curator review, and publish rights were role-based. Scenario changes and merges were logged with rationale, and assertion templates were versioned under change control. Pipelines posted outcomes to the library, creating a traceable link between scenarios and results.

How did you roll this out without disruption? The library ran alongside existing paths first. Selected teams mirrored pipeline steps to pull scenarios from the library while keeping direct paths as a fallback. After manifests and search signals stabilized, pipeline tasks became defaults by subsystem and program. No simulator or ROS changes were required.

How did the embedding index work and what about privacy? Descriptions, tags, and select manifest fields were embedded to enable semantic search. Only metadata entered the vector index; large artifacts stayed in existing storage. Access respected project permissions, and previews redacted sensitive details where necessary. All searches and publishes were auditable.
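
A toy illustration of metadata-only semantic search: the production index used learned embeddings from an embedding service, while this stand-in uses bag-of-words vectors so the ranking logic stays visible. Only the short descriptions enter the index; artifacts never do.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real service used a learned model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(index: dict, query: str, k: int = 3) -> list:
    """Rank scenario descriptions (metadata only) against a free-text query."""
    q = embed(query)
    ranked = sorted(index.items(), key=lambda kv: cosine(q, embed(kv[1])),
                    reverse=True)
    return [sid for sid, _ in ranked[:k]]
```

With real embeddings, “dusty corridor” and “low-visibility aisle” land near each other even with zero word overlap; the access-control and redaction layers sit in front of this ranking step.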

How did you keep scenarios portable across simulators? Manifests captured simulator compatibility and required assets, and scenarios included adapter scripts where needed. The library surfaced compatible variants and encouraged common assertion templates so outcomes remained comparable even when teams used different engines.

How did you manage compute cost and pipeline time? Pipelines selected scenarios by tag and risk category rather than running everything by default. Cached assets and containerized runners reduced setup time, and the library tracked runtime characteristics to guide which scenarios ran on quick vs. deep suites.
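
One way to sketch the quick-vs-deep selection, assuming per-scenario runtime estimates recorded by the library (the field names and the greedy policy are illustrative; the real policy also weighed recent failure history):

```python
def select_suite(scenarios: list, budget_min: float,
                 required_tags=frozenset()) -> list:
    """Greedy suite selection under a wall-clock budget (minutes).

    scenarios: [{"id": ..., "tags": [...], "runtime_min": ...}, ...]
    Scenarios carrying a required tag are considered first, then the
    remainder shortest-first, until the budget is spent.
    """
    must = [s for s in scenarios if required_tags & set(s["tags"])]
    rest = [s for s in scenarios if s not in must]
    chosen, spent = [], 0.0
    for s in must + sorted(rest, key=lambda s: s["runtime_min"]):
        if spent + s["runtime_min"] <= budget_min:
            chosen.append(s["id"])
            spent += s["runtime_min"]
    return chosen
```

A pull-request gate might run `select_suite(all, budget_min=15, required_tags={"risk:collision"})` while the nightly deep suite runs with a much larger budget, so fast feedback and broad coverage draw from the same library.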

Need a similar solution?

Get a FREE Proof of Concept & Consultation

No Cost, No Commitment!