Overview
A government agency's engineers and analysts struggled to answer routine questions because operational logs and knowledge lived in many places. Splunk contained security and application logs, Elasticsearch powered other workloads, and runbooks and incident write-ups sat in wikis and tickets. People relied on tribal knowledge to form the right queries, and escalations repeated because prior fixes were hard to find. Intelligex built an enterprise search portal that federated Splunk and Elasticsearch queries, layered permissions-aware retrieval-augmented generation (RAG) over runbooks and past incidents, and embedded citations and runbook links in answers. Analysts found relevant evidence without guesswork, escalation loops subsided, and resolution steps became consistent, while Splunk, Elasticsearch, wikis, and the ITSM system remained as they were.
Client Profile
- Industry: Public sector agency
- Company size (range): Central IT with distributed program teams and shared services
- Stage: Splunk and Elasticsearch in use for logging and analytics; wikis and ITSM tickets for runbooks and incident history; ad hoc search and expert lookups
- Department owner: IT & Infrastructure (Platform Engineering and Operations)
- Other stakeholders: Security Operations, Application Support, Network Engineering, Helpdesk/NOC, Compliance/Records, Privacy and Legal, Internal Audit
The Challenge
Logs were scattered across platforms, and every tool spoke a different dialect. Analysts toggled between Splunk searches, Elasticsearch queries, device consoles, and ticket histories, then pasted snippets into chats to compare findings. Saved searches and dashboards existed, but they reflected how a few experts worked rather than how the broader team investigated. Runbooks covered the same issues in multiple places, with different commands and owners. The result was slow triage, repeated escalations, and inconsistent fixes.
Context was missing at the moment of need. When a service alarm fired, responders wanted similar incidents and the steps that worked last time. Instead, they started new threads, asked who knew the system, and rebuilt searches from scratch. Compliance and privacy requirements added friction: certain indices and fields were limited to specific roles, and there was no consistent way to apply those controls in a cross-tool search. Audit requests asking what was searched, by whom, and against which source data were answered with log fragments and screenshots.
Why It Was Happening
Root causes were fragmented telemetry, uneven search fluency, and no shared knowledge retrieval. Splunk and Elasticsearch were both systems of record, but there was no federated search or schema to unify index names, sourcetypes, or field conventions. Runbooks and incident write-ups were rarely linked to the logs and dashboards that proved a fix, so lessons learned stayed buried. Permissions were enforced within each tool, not across tools, so cross-platform searches either broke or risked overexposure. Without a governed search front door, people defaulted to personal bookmarks and memory.
Ownership and timing were misaligned. Platform teams ran logging, application teams wrote runbooks, and the NOC managed incidents, yet no single place joined logs, prior incidents, and resolution steps with role-appropriate access. As products and programs evolved, naming and index structures drifted, and search recipes went stale.
The Solution
Intelligex delivered a permissions-aware enterprise search portal that federates queries across Splunk and Elasticsearch and augments results with runbooks and incident history. The portal normalizes index and field naming through a mapping layer, executes live searches against each backend using the requester's identity, and ranks results by relevance and freshness. A RAG layer uses vector search over runbooks and tickets to suggest likely causes and step-by-step fixes, with citations back to source pages and past incidents. Answers include links to the exact dashboards and saved searches in Splunk and Elasticsearch. Integrations followed Splunk's REST API patterns and Elasticsearch's native APIs, and the vector search used Elasticsearch's k-NN search capabilities.
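The mapping layer can be pictured as a table from canonical field names to each backend's native names, used to render one search template into both query dialects. The sketch below is a minimal illustration under stated assumptions, not Intelligex's implementation: the canonical names, the `FIELD_MAP` contents, and the `TemplateQuery`, `to_spl`, and `to_es_request` helpers are hypothetical. It only builds an SPL search string for Splunk and an Elasticsearch `_search` path plus query DSL body; submitting them and handling credentials is left out.

```python
from dataclasses import dataclass, field

# Hypothetical canonical -> native field names. The Elasticsearch side uses
# ECS-style names; a real map would come from the Discovery catalog.
FIELD_MAP = {
    "host": {"splunk": "host", "elastic": "host.name"},
    "status": {"splunk": "status", "elastic": "http.response.status_code"},
    "correlation_id": {"splunk": "trace_id", "elastic": "trace.id"},
}


@dataclass
class TemplateQuery:
    """A search template filled in with parameters (all names hypothetical)."""
    splunk_index: str
    elastic_index: str
    filters: dict = field(default_factory=dict)  # canonical field -> value
    # Simple relative window such as "-15m" or "-1d" (no Splunk "@" snapping).
    earliest: str = "-15m"


def to_spl(q: TemplateQuery) -> str:
    """Render the template as an SPL search string with Splunk field names."""
    terms = [f"index={q.splunk_index}", f"earliest={q.earliest}"]
    for name, value in sorted(q.filters.items()):
        terms.append(f'{FIELD_MAP[name]["splunk"]}="{value}"')
    return "search " + " ".join(terms)


def to_es_request(q: TemplateQuery) -> tuple[str, dict]:
    """Render the template as an Elasticsearch _search path and bool/filter body."""
    clauses = [{"term": {FIELD_MAP[name]["elastic"]: value}}
               for name, value in sorted(q.filters.items())]
    # Elasticsearch date math expresses the same window as "now-15m".
    clauses.append({"range": {"@timestamp": {"gte": "now" + q.earliest}}})
    return f"/{q.elastic_index}/_search", {"query": {"bool": {"filter": clauses}}}
```

For example, `TemplateQuery("web", "logs-web-*", {"host": "web01", "status": 500})` renders to `search index=web earliest=-15m host="web01" status="500"` on the Splunk side. Keeping the mapping as data means an index or field rename becomes a reviewed edit to `FIELD_MAP` rather than a change to every saved search.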
- Integrations: Splunk search via REST; Elasticsearch queries via native APIs; ITSM for incident and problem tickets; wiki and document repositories for runbooks; identity provider for SSO and group claims; optional chat integration for one-click search from channels.
- Federated query layer: Mapping of common fields across Splunk and Elasticsearch; live query orchestration with per-user credentials; relevance merging and deduplication of results.
- Permissions-aware RAG: Vector index of runbooks, KB articles, and incident write-ups limited by document ACLs; retrieval and summarization constrained to what the requester can view; citations to pages and tickets included in every answer.
- Search templates and playbooks: Curated queries per platform and service with parameters for time, host, environment, and correlation IDs; links to dashboards and saved searches; one-click pivot to the native tool.
- Operational context: Enrichment with recent incidents, open changes, and service ownership; related alerts and monitors surfaced alongside results.
- Governance and audit: Full logging of queries, sources touched, documents retrieved, and links followed; reason codes for sensitive searches; exportable evidence packs for audit.
- Safety and privacy: Field-level redaction in summaries; strict pass-through of access using SSO claims; no expansion of scope beyond what the native systems allow.
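Relevance merging and deduplication across two engines needs a common score scale, a shared deduplication key, and a freshness signal. The following is a minimal sketch under stated assumptions: each connector returns hits as dicts with `event_id`, `score`, and `timestamp`. Elasticsearch supplies `_score` natively, but Splunk search results do not ordinarily carry a relevance score, so a Splunk connector would have to derive one (for example from template match strength). Per-backend min-max scaling and a half-life decay on age are illustrative choices, not the portal's documented ranker.

```python
from datetime import datetime


def normalize(hits):
    """Min-max scale one backend's scores to [0, 1] so backends are comparable."""
    if not hits:
        return []
    scores = [h["score"] for h in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every hit has the same score, treat them all as equally relevant.
    return [{**h, "score": (h["score"] - lo) / span if span else 1.0} for h in hits]


def merge_results(splunk_hits, es_hits, now: datetime, half_life_hours: float = 24.0):
    """Merge hits, keep the best-ranked copy per event, and sort by rank."""
    best = {}
    for hit in normalize(splunk_hits) + normalize(es_hits):
        age_hours = max((now - hit["timestamp"]).total_seconds() / 3600, 0.0)
        # Exponential decay: a hit half_life_hours old keeps half its relevance.
        rank = hit["score"] * 0.5 ** (age_hours / half_life_hours)
        key = hit["event_id"]  # assumed shared key, such as a correlation ID
        if key not in best or rank > best[key]["rank"]:
            best[key] = {**hit, "rank": rank}
    return sorted(best.values(), key=lambda h: h["rank"], reverse=True)
```

The half-life controls how quickly older evidence sinks; when the same event arrives from both backends, only the higher-ranked copy is kept.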
Implementation
- Discovery: Cataloged Splunk indexes and saved searches, Elasticsearch indices and query patterns, and the top runbooks and incident classes; inventoried SSO groups and data access policies; reviewed common escalation paths and where searches stalled; gathered audit and records requirements.
- Design: Defined a canonical search schema and field mappings; specified how SSO claims constrain queries and document retrieval; authored search templates per service; selected citation formats and evidence capture; designed dashboards for portal usage and coverage; planned redaction and retention policies.
- Build: Implemented connectors to Splunk and Elasticsearch; built the federated orchestrator and result ranker; created the vector index of runbooks and incidents with document-level ACLs; developed the permissions-aware RAG component and answer templates with citations; integrated ITSM tickets for context; enabled SSO and audit logging.
- Testing/QA: Ran in shadow mode using recorded queries; validated that answers respected permissions and matched native tool results; tuned field mappings and ranker signals; tested redaction with privacy partners; piloted with NOC and Security on common incident types; refined search templates and runbook links.
- Rollout: Launched to the NOC and Application Support first; retained native search as the everyday tool with the portal as the front door for cross-platform questions; expanded to Security and Network Engineering as trust grew; required runbooks to include portal-ready summaries and tags for new services.
- Training/hand-off: Delivered short clinics on asking effective questions, using templates, and interpreting citations; onboarded runbook owners to tagging and updates; updated SOPs to reference the portal for initial triage; transferred ownership of templates, mappings, and RAG content filters to Platform Operations under change control.
- Human-in-the-loop review: Established a review cadence for low-confidence answers, stale runbooks, and missing templates; decisions recorded with rationale and effective dates; improvements looped back into mappings and content.
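The shadow-mode checks in Testing/QA reduce to two questions per replayed query: did the portal return anything the user could not see natively, and how much of the native result did it reproduce? A minimal sketch, assuming each recorded query is replayed through both the portal and the native tool under the same user's identity and time window, and that results are compared by event or document ID; the 0.9 recall threshold and the helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowResult:
    query_id: str
    # Portal-only IDs; with both runs on the same window, these mean overexposure.
    overexposed: set = field(default_factory=set)
    recall: float = 1.0  # share of native IDs the portal also returned


def compare_shadow(query_id: str, portal_ids, native_ids) -> ShadowResult:
    """Compare one replayed query's portal results with the same user's native results."""
    portal, native = set(portal_ids), set(native_ids)
    recall = len(portal & native) / len(native) if native else 1.0
    return ShadowResult(query_id, portal - native, recall)


def blocking_queries(results, min_recall: float = 0.9) -> list:
    """Query IDs that should block promotion: any overexposure, or low recall."""
    return [r.query_id for r in results if r.overexposed or r.recall < min_recall]
```

Overexposure blocks promotion regardless of recall, which mirrors the rule that the portal must never broaden access beyond the native tools.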
Results
Triage began with a single search instead of a chain of pings and screenshots. The portal returned live evidence from Splunk and Elasticsearch, suggested relevant past incidents and runbooks that matched the context, and linked to the exact dashboards and tickets. Analysts no longer needed to know which index or sourcetype hid an answer; they selected a template, adjusted parameters, and followed citations.
Escalations grew cleaner and less frequent. When a hand-off was needed, the receiving team got links, queries, and runbook steps rather than summaries of summaries. Answers respected permissions and redacted sensitive fields, which satisfied privacy partners and audit. Runbooks improved because owners saw which steps were used and where citations failed. The logging stack and ITSM stayed; the change was a federated, permissions-aware search and RAG layer that turned scattered knowledge into consistent, cited guidance.
What Changed for the Team
- Before: Analysts guessed index names and rebuilt searches. After: Templates, field mappings, and federated queries surfaced the right data sources.
- Before: Prior fixes were buried in wikis and tickets. After: RAG suggested relevant runbooks and incidents with citations.
- Before: Cross-tool permissions caused breaks or overexposure. After: SSO claims constrained queries and document retrieval end-to-end.
- Before: Escalations repeated without clear steps. After: Hand-offs carried links to dashboards, queries, and consistent runbook actions.
- Before: Audit trails were screenshots. After: Queries, sources, and citations were logged with evidence packs.
- Before: Runbooks drifted from reality. After: Usage insights and citations drove updates under change control.
Key Takeaways
- Federate, don't duplicate; query Splunk and Elasticsearch in place and normalize fields through a mapping layer.
- Make knowledge discoverable; permissions-aware RAG over runbooks and incidents brings relevant steps to the front.
- Bind access to identity; carry SSO claims through searches and retrieval so answers reflect need-to-know.
- Cite everything; link answers to the exact dashboards, queries, and documents used.
- Run in shadow first; validate mappings, redaction, and confidence thresholds before broad rollout.
- Govern content; version templates and runbooks and review low-confidence answers on a schedule.
FAQ
What tools did this integrate with? The portal executed live searches against Splunk via REST and Elasticsearch via native APIs. Runbooks and incident histories came from the agency's wikis and ITSM. Elasticsearch's vector (k-NN) search supported the RAG layer, and SSO enforced identity and group claims across all calls.
How did you handle quality control and governance? Field mappings, search templates, and content filters lived under change control with owners and rationale. The portal used the requester's identity for every data call, enforced document ACLs in retrieval, and redacted sensitive fields in summaries. All queries, sources touched, and citations were immutably logged with reason codes for sensitive searches, and evidence packs supported audit and records needs.
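The answer above says queries, sources, and citations were logged immutably but not how. One common technique for making an application log tamper-evident is hash chaining, where each record's hash also covers the previous record's hash. The sketch below shows that technique on the assumption that it fits this kind of portal; the record fields and function names are hypothetical. Chaining alone cannot stop someone with write access from rewriting the whole tail, so a production system would also keep the log in write-once storage or periodically anchor the newest hash somewhere independent.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first record


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, user: str, query: str, sources, citations, reason_code=None) -> dict:
    """Append an audit record whose hash also covers the previous record's hash."""
    record = {
        "user": user,
        "query": query,
        "sources": sorted(sources),
        "citations": list(citations),
        "reason_code": reason_code,  # hypothetical code required for sensitive searches
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record["hash"] = _digest(record)
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Recompute the chain. An edited record, or a removed record other than
    the newest, fails unless every later hash is also rewritten."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True
```

An evidence pack could then include the relevant records together with the anchored head hash, so an auditor can re-run `verify` independently.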
How did you roll this out without disruption? The system ran in shadow mode, replaying common searches and comparing portal results to native tools. Early users validated permissions, citations, and redaction. Rollout started with NOC and Application Support, expanded gradually, and native tools remained in use. Adoption increased as templates and runbooks were tuned to real questions.
How were permissions enforced across tools? The portal never broadened access. It passed SSO claims to Splunk and Elasticsearch and filtered document retrieval for RAG by the same claims. If a user lacked rights to a source or page, the answer excluded that content and indicated a restricted source, with a link to request access through the standard process.
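Permissions-aware retrieval means ACL filtering happens before anything reaches the summarizer. A minimal sketch, assuming each indexed chunk carries a set of allowed SSO groups (`acl`), a source label (`source`), and an embedding (`vector`). Brute-force cosine similarity stands in for the vector index here; in Elasticsearch, the ACL condition would more typically be pushed into the kNN search's `filter` clause so restricted documents are never candidates. Reporting only the source labels of withheld documents that would otherwise have ranked in the top k is one illustrative way to indicate a restricted source without revealing its content; all field names and labels are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, user_groups: set, k: int = 3):
    """Return (top-k readable docs, source labels of relevant docs withheld)."""
    def readable(d):
        return bool(d["acl"] & user_groups)  # any shared SSO group grants read

    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    allowed = [d for d in ranked if readable(d)]
    # Report a restricted source only if it would have made the unfiltered
    # top k, and report its label, never its content.
    restricted = {d["source"] for d in ranked[:k] if not readable(d)}
    return allowed[:k], restricted
```

A NOC analyst whose best match sits in a Security-only ticket would get the readable runbooks plus a note that an `itsm:secops` source was withheld, along with the access-request link described above.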
How did you keep content and mappings current? A scheduled job refreshed index metadata and field mappings, and runbook and ticket corpora were re-indexed under change control. Usage analytics flagged stale templates and uncited steps. Owners received prompts to update content, and updates were reviewed before promotion.
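Staleness flagging of this kind can be as simple as two rules over review dates and usage counts. This sketch assumes each template or runbook record has a `last_reviewed` date and a `citations_90d` count derived from the portal's usage analytics; both field names and the 180-day and one-citation thresholds are hypothetical.

```python
from datetime import date


def flag_stale(items, today: date, max_age_days: int = 180, min_citations: int = 1):
    """Return (item_id, reason) pairs for owners to review."""
    flags = []
    for item in items:
        age = (today - item["last_reviewed"]).days
        if age > max_age_days:
            flags.append((item["id"], f"not reviewed in {age} days"))
        if item["citations_90d"] < min_citations:
            flags.append((item["id"], f"cited {item['citations_90d']} times in the last 90 days"))
    return flags
```

Each flag becomes an owner prompt; the fix itself still goes through review before promotion.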
How were citations and accuracy handled? Every answer carried links to the exact dashboards, saved searches, runbook sections, and tickets used. Low-confidence answers were flagged for human review, and analysts could rate or correct summaries. That feedback informed relevance tuning and template updates.
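A low-confidence gate can combine the two signals this answer names: whether citations are present and how strong retrieval was. A minimal sketch, assuming retrieval scores on a 0 to 1 scale; the `Answer` structure and the 0.75 threshold are illustrative rather than the portal's actual values.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    text: str
    citations: list = field(default_factory=list)  # links to dashboards, runbooks, tickets
    retrieval_scores: list = field(default_factory=list)  # assumed 0..1 similarity
    needs_review: bool = False
    reasons: list = field(default_factory=list)


def gate_answer(answer: Answer, min_top_score: float = 0.75) -> Answer:
    """Mark an answer for human review if it is uncited or weakly grounded."""
    if not answer.citations:
        answer.reasons.append("no citations")
    top = max(answer.retrieval_scores, default=0.0)
    if top < min_top_score:
        answer.reasons.append(f"top retrieval score {top:.2f} below {min_top_score}")
    answer.needs_review = bool(answer.reasons)
    return answer
```

Recording the reasons alongside the flag gives reviewers a starting point and makes the review queue easy to slice by failure type.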
What about privacy and records requirements? The design respected native RBAC and applied redaction for sensitive fields. Retention and export followed agency records policy, with role?based access to logs and evidence. Search and retrieval actions were logged for accountability without exposing content beyond authorized roles.
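Redaction in summaries is safest when sensitive fields are masked before event text reaches the summarization step. A minimal sketch, assuming flattened event fields and a field-to-group policy table; the ECS-style field names and the group names in `SENSITIVE_FIELDS` are hypothetical, and a real policy would come from the agency's privacy and records requirements.

```python
# Hypothetical policy: field name -> SSO group allowed to see it unredacted.
SENSITIVE_FIELDS = {
    "user.email": "privacy-cleared",
    "source.ip": "secops",
}


def redact(event: dict, user_groups: set, mask: str = "[REDACTED]") -> dict:
    """Return a copy of a log event with sensitive fields masked for this user."""
    out = dict(event)  # never mutate the event fetched from the backend
    for name, required_group in SENSITIVE_FIELDS.items():
        if name in out and required_group not in user_groups:
            out[name] = mask
    return out
```

Because the policy is keyed by SSO group, the same claims that constrain the native queries also decide what a summary may show.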
Department/Function: IT & Infrastructure, Legal & Compliance
Capability: Enterprise Search & Knowledge Management