Overview
A retail chain launched an AI KPI copilot for the CEO, but early use produced inconsistent answers and exposed sensitive customer segments. The copilot pulled from mixed sources with uneven definitions and no guardrails, so outputs varied by prompt and some responses surfaced personally identifiable information (PII). Intelligex rebuilt the service with a retrieval layer constrained to certified data marts and endorsed BI models, added PII redaction and row-level security enforcement, and introduced an approval queue for prompts that required new sources or fields. Executives received quick, trustworthy answers grounded in certified metrics, with safer access and fewer errors.
Client Profile
- Industry: Retail (multi-banner, omnichannel)
- Company size (range): Enterprise with corporate, regional, and store operations
- Stage: Scaling AI-assisted decision support for the C-suite
- Department owner: Strategy, Analytics & Executive Leadership (Corporate Strategy / Enterprise Analytics)
- Other stakeholders: Finance/FP&A, Merchandising, Digital/CRM, Store Operations, Data Governance, Information Security, Legal & Privacy, IT/Cloud Platforms
The Challenge
The initial KPI copilot answered natural-language questions against a patchwork of tables and wikis. Some prompts hit draft datasets, others bypassed row-level security, and the model sometimes stitched answers from outdated definitions. An innocent question about customer behavior returned segment labels that were not approved for executive distribution. Meanwhile, the same KPI could return different values depending on phrasing because the copilot lacked a stable semantic or certified data layer.
Security and trust suffered. Leaders encountered caveats and walk-backs after answers were checked against Finance's books, and Information Security paused the rollout when PII appeared in test transcripts. The business did not want to abandon the copilot; it needed to harden it: limit retrieval to certified marts and endorsed BI datasets, redact sensitive text by default, and add a governed path to expand scope when new data was truly needed.
Why It Was Happening
Inputs and definitions were fragmented. The copilot had access to exploratory schemas and knowledge base pages that used different metric names and refresh windows. There was no central registry for certified datasets or a way for the model to inherit row/column security from BI semantic models. Prompt routing had no policy awareness, so a cleverly worded question could pull fields that should have been masked or excluded.
Governance arrived after the fact. There was no approval step to add a new subject area to the copilot's scope, no audit trail that tied a response to dataset versions, and no PII screening before answers were returned. The model optimized for recall, not for certified truth or privacy, which led to inconsistent outputs and unsafe content.
The Solution
We rebuilt the copilot around a governed retrieval and execution layer. Retrieval was limited to certified marts in Snowflake and endorsed BI semantic models; the RAG service used only whitelisted documentation and metric definitions. Row-level and column-level security followed the user via tokens, PII was redacted in transit, and every answer included citations to certified datasets with last-refresh stamps. A gated workflow handled scope changes: when a prompt required new sources or fields, it entered an approval queue for Data Governance and Legal. Nothing was replatformed: Snowflake remained the warehouse, dbt the transformation layer, Power BI the semantic and visualization layer, and identity continued through Microsoft Entra ID. The new layer enforced which data the copilot could see, how it could query, and what it could say.
- Certified marts and endorsed models as the only retrieval sources, with dataset endorsement visible in BI (Power BI) and transformations encoded in dbt
- Warehouse access constrained to governed schemas in Snowflake with role-based policies and masking rules
- Retrieval-augmented generation using a "use your data" pattern scoped to certified content (Azure OpenAI "use your data")
- Vector index built only from approved metric definitions, glossary entries, and executive-ready narratives using Azure Cognitive Search
- PII detection and redaction in responses using Microsoft Presidio, with human-in-the-loop review for flagged content
- Row- and column-level security passed through from BI semantic models; results trimmed by each user's entitlements (Microsoft Entra groups)
- Answer cards that cite dataset names, last-refresh, and definition links; inline warnings when a definition is in transition
- Approval queue for scope expansion: prompts that require non-certified sources route to Data Governance and Legal with reason codes (using Power Automate Approvals)
- Audit logging for prompts, retrieved sources, executed queries, redaction events, and approvers
- Change control for adding datasets to the certified scope, including dbt tests and steward sign-off
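The certified-scope gate in the first bullets can be sketched as a simple allow-list check. Dataset names below are hypothetical, not from the actual deployment; a production version would load the certified list from the warehouse or BI catalog rather than hard-coding it:

```python
# Sketch of the certified-scope gate. Dataset names are illustrative only;
# a real service would read the certified registry from Snowflake or Power BI.
CERTIFIED = {"sales_daily_mart", "margin_mart", "customer_kpi_model"}

def route_sources(requested: set[str]) -> tuple[set[str], set[str]]:
    """Split a prompt's requested sources into (retrievable now, needs approval)."""
    allowed = requested & CERTIFIED
    pending = requested - CERTIFIED
    return allowed, pending

# A prompt touching one certified mart and one draft dataset: the draft half
# is withheld and routed to the approval queue instead of being retrieved.
allowed, pending = route_sources({"sales_daily_mart", "draft_segments"})
```

Anything in `pending` never reaches the retriever; it becomes the seed of a scope-expansion request rather than an answer.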
Implementation
- Discovery: Cataloged all datasets the copilot touched, mapped which were certified vs. exploratory, and reviewed lineage and refresh windows. Collected examples of inconsistent answers and unsafe content. Aligned with Data Governance, Security, and Legal on the certification bar, PII policies, and approval roles.
- Design: Defined the certified retrieval scope and the semantic model pass-through pattern for row/column security. Authored the answer card template with citations and refresh stamps. Specified PII detection thresholds and redaction behaviors. Designed the approval queue for adding sources, with reason codes and steward sign-off. Documented audit fields for prompts, sources, and outcomes.
- Build: Restricted warehouse roles to certified schemas in Snowflake; encoded definitions and tests in dbt; indexed only approved definitions and narratives in Azure Cognitive Search; configured Azure OpenAI "use your data" with source whitelisting; implemented Presidio redaction; built the approval flow in Power Automate; and integrated Entra ID groups for entitlement trimming.
- Testing and QA: Ran a library of prompts to compare outputs before/after guardrails; validated that only certified datasets were cited; confirmed row/column security trimming per role; stress-tested PII detection with edge cases; and dry-ran the approval queue with mock scope expansion requests.
- Rollout: Deployed the governed copilot in parallel with the prior service for a limited audience. After validating answer consistency, citations, and redaction, retired the legacy endpoint and enforced the approval queue for new sources. Maintained a manual exception lane for urgent executive asks with post-review documentation.
- Training and hand-off: Delivered quick guides for executives on asking effective questions and reading citations, for stewards on certifying datasets and maintaining definitions, and for Governance/Legal on approval workflows. Established a cadence to review prompt logs, false-positive redactions, and scope changes.
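One check from the testing step, that every answer cites only certified datasets and carries a refresh stamp, can be sketched as below. The card structure and field names are assumptions for illustration, not the deployment's actual schema:

```python
# Hypothetical QA check over answer cards: an answer passes only if every
# citation names a certified dataset and includes a last-refresh stamp.
from datetime import date

CERTIFIED = {"sales_daily_mart", "margin_mart"}  # illustrative names

def answer_card_valid(card: dict) -> bool:
    citations = card.get("citations", [])
    if not citations:
        return False  # uncited answers fail QA outright
    return all(
        c.get("dataset") in CERTIFIED and c.get("last_refresh") is not None
        for c in citations
    )

card = {"citations": [{"dataset": "sales_daily_mart", "last_refresh": date(2024, 1, 5)}]}
```

Run across the whole prompt library, a check like this turns "only certified datasets were cited" from a spot check into a regression gate.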
Results
Executives received concise answers that matched certified dashboards and Finance's views, with clear citations to governed datasets and definition pages. Sensitive terms were masked automatically, and follow-up questions respected the same security scopes users saw in BI. When a question required data outside the certified scope, the system explained why and routed a request with context rather than guessing.
Trust and safety improved. Answer variance fell as retrieval anchored on endorsed models, and PII exposures were prevented at the pipeline level. Governance became part of the flow: new data needs were approved with rationale, dataset changes appeared with visible definition links, and audit logs captured what the copilot saw and why it answered the way it did. The CEO's team kept the speed of a chat interface without sacrificing provenance or control.
What Changed for the Team
- Before: The copilot pulled from mixed sources with drifting definitions. After: Retrieval was limited to certified marts and endorsed models with tested definitions.
- Before: Answers sometimes exposed sensitive segments. After: PII detection and redaction ran by default, with row/column security enforced from the semantic model.
- Before: The same KPI returned different values by phrasing. After: Answers cited the certified dataset and definition, producing consistent results.
- Before: New data appeared in answers without review. After: Scope expansion requests entered an approval queue with steward and Legal sign-off.
- Before: No clear record of how answers were constructed. After: Audit logs tied prompts to sources, queries, redactions, and approvers.
Key Takeaways
- Constrain retrieval to certified data and endorsed semantic models; speed is valuable only when answers are authoritative.
- Carry row/column security and PII redaction into the copilot layer; privacy must be enforced before text is generated.
- Require citations and refresh stamps in every answer; provenance builds confidence and speeds follow-up.
- Introduce an approval queue for scope changes; expanding what the copilot can see should be deliberate and auditable.
- Keep the stack; add governance, definitions in code, and safe retrieval rather than rebuilding platforms.
FAQ
What tools did this integrate with?
The copilot retrieved from certified marts in Snowflake and endorsed semantic models surfaced in Power BI, with definitions and tests encoded in dbt. Retrieval and generation used Azure OpenAI "use your data" and Azure Cognitive Search. PII redaction used Microsoft Presidio, approvals ran in Power Automate, and access was governed by Microsoft Entra groups.
How did you handle quality control and governance?
We whitelisted certified schemas and endorsed BI models as the only retrieval sources, encoded metric definitions and tests in dbt, and enforced row/column security via the semantic model. Every answer carried citations and refresh stamps. PII detection and redaction ran in the response pipeline, and prompts that required new data entered an approval queue for Data Governance and Legal, with audit logs binding prompts to sources and decisions.
How did you roll this out without disruption?
The governed copilot ran alongside the legacy interface for a period while executives compared answers and citations. After validation, the legacy endpoint was retired and the approval queue became the standard path for scope changes. Existing warehouse, BI, and identity tools remained in place; the new layer added retrieval controls, redaction, and approvals.
How did you ensure the copilot respected security scopes?
The service generated answers by querying the semantic model with the user's identity token, inheriting row and column security. Retrieval from documentation and definitions used the same role-based filters. If a user lacked access to a field or slice, the copilot trimmed the result and cited the applicable scope.
What happens when executives need a new source or field?
The copilot returns an explanation that the request falls outside the certified scope and creates a pre-filled approval request with the prompt, intended use, and proposed source. Data Governance and Legal review the request, confirm definitions and controls, and either approve the addition with steward sign-off and tests, or suggest an alternative within the current scope.