Overview

A metals processor was logging production deviations but rarely closing them with consistent root cause and evidence, so similar issues kept returning. Intelligex implemented an investigation workflow inside the existing Quality Management System (QMS) that automatically pulled context from the Manufacturing Execution System (MES) and the plant historian, enforced standard root cause methods, and routed approvals to the right roles. Investigations became faster to complete, evidence packages were more complete, and repeat deviations tapered as corrective actions were verified and embedded in standard work—without changing how operators ran the lines.

Client Profile

  • Industry: Metals processing (coils, plate, and finishing)
  • Company size (range): Mid-market, multi-site operations
  • Stage: Mature ISO 9001 environment with legacy MES and historian
  • Department owner: Operations & Manufacturing
  • Other stakeholders: Quality Assurance, Process Engineering, Maintenance, IT/OT, EHS, Supply Chain

The Challenge

Deviations were recorded in the QMS, but follow-up was inconsistent. Some tickets carried short notes and a quick closure; others sat open while engineers chased logs and production memories faded. Evidence like trend charts, setpoints, operator comments, and inspection results lived in different systems. By the time a meeting convened, screens had changed and tags had been re-labeled, so the team argued from anecdotes rather than shared facts.

Each investigation required manual exports from MES and the historian. Engineers hand-built plots of furnace temperatures, line speeds, or coolant pressures, then pasted images into documents. Approvals traveled by email, and action items that required maintenance or programming changes were tracked in separate tools. Nothing guaranteed that a corrective action was verified later, so the same failure modes reappeared on other shifts or lines.

The company could not pause production or rebuild controls. OT security limited new connections into Programmable Logic Controllers (PLCs), change windows were tight, and the plant historian was managed by a central team with strict policies. Any solution had to sit within the QMS people already used, bring data to the investigation rather than pulling engineers into new tools, and respect read-only access to production systems.

Why It Was Happening

Root causes centered on fragmentation and process drift. The QMS captured the existence of a deviation but did not capture the evidence needed for analysis. MES, historian, and lab systems held the facts, yet each investigation assembled them from scratch. Engineers used different templates and methods; one preferred a fishbone diagram, another a quick “5 Whys,” and a third skipped analysis to clear the queue. Without standardized steps, handoffs broke down and learning did not stick.

Ownership was also blurred. Operations opened the deviation, Quality managed the record, Engineering had the tools to pull data, and Maintenance owned many corrective actions. Approvals bounced between teams in email threads. The result was slow progress and shallow conclusions, making recurrence more likely. Measures of success focused on closing tickets rather than preventing repeat events.

The Solution

Intelligex configured a structured investigation workflow inside the QMS and connected it to plant data sources. When a deviation was logged, the workflow automatically assembled a time-aligned evidence package from the MES and historian, then walked the team through a standard root cause process with clear roles, required fields, and gated approvals. Corrective and preventive actions were issued from the same record and tracked to completion, with verification steps scheduled and visible.

  • Integrations: Read-only connectors pulled batch context, genealogy, and operator comments from MES platforms such as SAP ME, Siemens Opcenter, or Rockwell FactoryTalk ProductionCentre. Trend data and alarms were sourced from the historian via approved APIs or OPC UA mirrors. Where applicable, lab results were imported from LIMS exports.
  • Data shaping: Time windows were anchored to the reported event, product, and line. Tags were mapped to a common asset model so plots of temperatures, speeds, or pressures were consistent across lines.
  • Root cause methods: The workflow embedded standard tools—5 Whys, cause-and-effect (fishbone) diagrams, and optional 8D—for consistency. Guidance prompts and examples were available in-line. For background on these techniques, see the American Society for Quality’s overview of root cause analysis.
  • Evidence capture: Auto-attached plots, alarm histories, and relevant MES records. Users could annotate charts, add photos, and link to control plan elements. Document versioning ensured attachments were immutable.
  • Approvals and roles: Role-based gates moved records from containment to root cause, corrective action, and verification. Operations, Quality, Engineering, and Maintenance sign-offs were required at each stage, with e-signature and audit trails.
  • Action tracking: Corrective actions generated tasks in the QMS and, when equipment work was needed, opened work orders in the Computerized Maintenance Management System (CMMS), such as IBM Maximo. Verification-of-effectiveness steps were scheduled automatically.
  • Dashboards: Live views showed aging, bottlenecks, and recurrence by line, shift, and failure mode. Leaders could see which actions were overdue and which controls had been updated.
  • Security: All plant data access used read-only, least-privilege service accounts. OT network boundaries remained intact; data crossed via approved gateways. For interoperability with control systems, the team leveraged OPC UA to avoid custom drivers.
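To make the data-shaping step above concrete, here is a minimal sketch of anchoring a historian query window to a reported deviation and resolving legacy tag names through a common asset model. The tag names, crosswalk, and window sizes are hypothetical illustrations, not the actual connector configuration:

```python
from datetime import datetime, timedelta

# Hypothetical crosswalk from legacy historian tag names to the
# common asset model used for consistent cross-line plotting.
TAG_CROSSWALK = {
    "F1_TC_04": "line1.furnace.zone4.temperature",
    "L1_SPD": "line1.drive.speed",
    "CW_PRES_1": "line1.quench.coolant_pressure",
}

def evidence_queries(event_time, legacy_tags,
                     pre=timedelta(hours=2), post=timedelta(minutes=30)):
    """Anchor a read-only query window to the reported event and
    translate legacy tag names into canonical asset-model names."""
    start, end = event_time - pre, event_time + post
    return [
        {"tag": TAG_CROSSWALK[t], "start": start, "end": end}
        for t in legacy_tags
        if t in TAG_CROSSWALK  # unknown tags are skipped, never guessed
    ]

queries = evidence_queries(datetime(2023, 5, 4, 14, 10),
                           ["F1_TC_04", "L1_SPD"])
```

In the deployed workflow, each resulting query would be executed against the historian through the approved read-only gateway; here the function only builds the request list.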

Implementation

  • Discovery: Mapped the current deviation-to-CAPA path, cataloged data sources and tag owners, and reviewed example investigations to identify evidence gaps. Documented approval roles and the most common recurrence patterns.
  • Design: Defined a standard investigation workflow with severity-based paths, required fields, and sign-off gates. Built a tag-to-asset model and a crosswalk for legacy tag names. Selected 5 Whys and fishbone as defaults, with 8D for complex events.
  • Build: Configured QMS forms, e-signature checkpoints, and automated evidence pulls. Implemented data transforms for time alignment and unit consistency. Set up dashboards and notification channels in Microsoft Teams for line-specific queues.
  • Testing/QA: Ran the workflow in shadow mode against recent deviations to validate evidence capture, approvals, and data ties to MES and historian records. Adjusted templates and prompts based on engineer feedback. Included a human-in-the-loop review board to evaluate early cases.
  • Rollout: Launched in phases by area, starting with lines that showed frequent recurrence. Paper or spreadsheet templates remained as a controlled fallback for the first cycles. No changes were made to PLC logic or SCADA alarms.
  • Training/hand-off: Delivered role-based sessions for operators, supervisors, QA, and engineers. Provided quick-reference guides for the analysis steps. Instituted a weekly cross-functional review where open investigations and verifications were checked live.
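The severity-based paths defined during Design can be pictured as a simple routing table. The severities, default methods, and sign-off roles below are illustrative; the actual paths were configured inside the QMS workflow engine:

```python
# Illustrative mapping of deviation severity to the required analysis
# method and sign-off gates; real paths lived in the QMS configuration.
ROUTES = {
    "minor":    {"method": "5 Whys",   "gates": ["Quality"]},
    "major":    {"method": "Fishbone", "gates": ["Quality", "Engineering"]},
    "critical": {"method": "8D",       "gates": ["Quality", "Engineering",
                                                 "Operations", "Maintenance"]},
}

def route(severity):
    """Return the analysis method and required sign-offs for a deviation."""
    try:
        return ROUTES[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}")

path = route("critical")
```

The point of the table is that no severity level can reach closure without its named gates, which is what replaced the ad hoc email approvals.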

Results

Investigations moved faster because the evidence showed up already assembled. Engineers no longer spent hours pulling historian trends and MES records; they focused on analysis. Root cause narratives became more consistent, with supporting plots, annotations, and operator comments embedded directly in the record. Approvals flowed through the QMS with clear ownership, and corrective tasks were visible in one place.

Recurrence declined as corrective actions were verified and institutionalized. Changes to setpoints, maintenance intervals, or work instructions were linked to the investigation and tracked through verification of effectiveness. Supervisors gained visibility into patterns by line and shift, making it easier to prioritize improvements. Auditors found a predictable set of records with complete audit trails, and leaders saw fewer repeat events attributed to the same causes.

What Changed for the Team

  • Before: Engineers exported historian data and built plots manually. After: Time-aligned plots and alarm histories attached automatically to the investigation.
  • Before: Each person used a different analysis template. After: Standard 5 Whys, fishbone, and 8D steps guided every case.
  • Before: Approvals bounced in email. After: Role-based gates and e-signatures moved investigations through the QMS.
  • Before: Corrective actions lived in separate trackers. After: Actions and work orders linked to the case and stayed visible through closure and verification.
  • Before: Repeat issues resurfaced across shifts. After: Verified changes updated standard work and control plans to prevent recurrence.
  • Before: Leaders saw a list of open deviations. After: Dashboards showed where investigations stalled and where patterns emerged.

Key Takeaways

  • Put the investigation where governance already lives—the QMS—and bring plant data into it automatically.
  • Standardize root cause steps so every case reaches the same depth of analysis, not just the fastest path to closure.
  • Auto-pull MES and historian context; engineers should analyze, not assemble data.
  • Route approvals by role and link corrective actions to maintenance and procedure changes, with verification scheduled.
  • Use open, read-only interfaces like OPC UA to respect OT boundaries while improving visibility.
  • Start with high-severity or high-recurrence areas, prove the workflow, then expand by line or product family.

FAQ

What tools did this integrate with? The workflow ran in the existing QMS (for example, ETQ Reliance, MasterControl, or TrackWise). It pulled batch context and operator entries from the MES (such as SAP ME, Siemens Opcenter, or Rockwell FactoryTalk ProductionCentre) and retrieved trends and alarms from the plant historian via approved APIs or OPC UA mirrors. Corrective actions that required equipment work opened work orders in the CMMS (for example, IBM Maximo). Notifications and status updates were shared in Microsoft Teams.

How did you handle quality control and governance? The QMS enforced required fields, standardized analysis steps, and e-signature checkpoints with full audit trails. All data feeds used read-only, least-privilege service accounts. Changes to templates and workflows followed site change control. A human-in-the-loop review board met regularly to evaluate investigations, approve root causes, and confirm that corrective actions and verification plans were appropriate.

How did you roll this out without disruption? The team piloted the workflow on a subset of lines and deviation types, keeping the previous process as a controlled fallback. Evidence pulls ran in shadow mode at first to validate accuracy. No modifications were made to PLC logic, and OT boundaries were maintained using approved gateways. Training was short and role-based, focused on how to use the new steps without changing how the line runs.

How did the automatic evidence collection work? When a deviation was created and linked to a line, product, and time window, the workflow requested relevant MES records and historian tags mapped to that asset. It generated plots of key parameters and included alarm summaries and operator comments. The system normalized units, flagged bad data quality, and attached the package to the QMS record for annotation and review. For interoperability, connectors leveraged OPC UA and vendor-supported APIs.
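The unit normalization and bad-data flagging mentioned above might look like this in outline. The sample records and "good"/"bad" quality labels are made up for illustration; the production connectors read the historian's native quality codes (for example, OPC UA status codes):

```python
# Each raw sample: (timestamp_iso, value, quality). The quality strings
# stand in for the historian's native quality codes.
raw = [
    ("2023-05-04T14:00:00", 1652.0, "good"),   # furnace temp, degrees F
    ("2023-05-04T14:01:00", None,   "bad"),    # sensor dropout
    ("2023-05-04T14:02:00", 1660.5, "good"),
]

def normalize_f_to_c(samples):
    """Convert good Fahrenheit samples to Celsius; flag bad ones."""
    good, flagged = [], []
    for ts, value, quality in samples:
        if quality == "good" and value is not None:
            good.append((ts, round((value - 32.0) * 5.0 / 9.0, 1)))
        else:
            flagged.append(ts)
    return good, flagged

series, bad_points = normalize_f_to_c(raw)
```

Flagged timestamps would appear as gaps or markers on the auto-generated plots, so reviewers see where the data cannot be trusted rather than a silently interpolated line.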

Which root cause methods were supported? The workflow included guided 5 Whys, cause-and-effect (fishbone) diagrams, and an 8D path for complex or customer-impacting events. Prompts, examples, and required fields kept analyses consistent. Background and best practices were aligned with resources from the American Society for Quality, including its overview of root cause analysis.
