Insurance Client UC1 POC Specification

Executive Summary

Problem: Insurance Client struggles to reliably extract structured fields from unstructured policy documents (PDS, schedules, endorsements). Poor extraction cascades downstream, causing delays, rework, and critical errors.

Solution: Deploy an AI-powered document extraction platform using Kyndryl's Agent Builder to automatically extract key policy information, embed quality checks, and deliver trusted structured data to downstream systems.

Business Outcomes:

✓ 95% Extraction Accuracy
✓ 60% Processing Speed Improvement
✓ 80% Error Reduction

Phase 4: Prototype (5 Days)

Rapid Validation

Interactive Demo

Browser-based prototype demonstrating:

Policy document upload workflow
Automated extraction of key fields
Real-time quality scoring
Dual-persona UI (Claims Adjuster + Underwriter)
Side-by-side document vs. extracted data comparison
Human review and correction workflow

Key Features

Document Upload: Drag-and-drop for PDF/image files with 50+ samples
LLM Extraction: Azure AI services or alternative LLM with confidence scoring
Quality Dashboard: Field-level confidence, anomaly detection, standardization validation
Persona Views: Adjuster focus (coverage) vs. Underwriter focus (risk assessment)
Side-by-Side UI: Original document (left) + extracted fields (right) with highlighting

Success Criteria (Phase 4)

UI prototype fully functional and clickable
LLM extraction working on demo documents
Quality scoring visible and actionable
Dual-persona experience validated
Stakeholder sign-off on workflow and UX

Phase 5: POC (30 Days)

Production Validation

Data: 50+ real Insurance Client policy documents (PDS, schedules, endorsements) + synthetic variations
Integration: Azure Document Intelligence + LLM extraction + downstream system APIs

Architecture: 13-Step Processing Pipeline

Step 1: UPLOAD HANDLER
└─ Receives uploaded policy documents (PDF, JPG, PNG)

Step 2: DOCUMENT IDENTIFIER
└─ Detects document type & extracts metadata

Step 3: OCR / LAYOUT EXTRACTION ⭐ [NEW]
└─ Azure Document Intelligence / Claude Vision
└─ Extracts text + spatial layout

Step 4: CLAUSE / SECTION PARSER ⭐ [NEW]
└─ Identifies policy sections & hierarchical structure
└─ Extracts condition boundaries

Step 5: POLICY EXTRACTOR
└─ LLM-powered field extraction (policy#, date, coverage, exclusions)
└─ Confidence scoring per field

Step 6: QUALITY CHECKER
└─ Validates fields & detects anomalies
└─ Missing field detection & consistency checks

Step 7: DATA NORMALIZER
└─ Standardizes formats (dates, amounts, text casing)
└─ Converts to canonical forms

Step 8: CLAUSE INDEXER
└─ Indexes normalized clauses and sections for rapid retrieval
└─ Creates structured clause metadata (section#, type, hierarchy)
└─ Enables fast lookup and cross-reference capabilities

Step 9: EMBEDDING / VECTORIZATION
└─ Converts normalized structured data into vector embeddings
└─ Uses embedding models (OpenAI, Azure OpenAI embeddings)
└─ Enables semantic search, similarity matching, ML workflows

Step 10: VECTOR STORE
└─ Stores embeddings in Vector Database (Azure Cognitive Search, Pinecone, etc.)
└─ Indexes vectors for semantic retrieval and similarity search
└─ Links vectors to source policy records for traceability

Step 11: RULES TRANSFORMER
└─ Converts extracted clauses → machine-executable rules
└─ PolicyAsCode format with vector-backed context

Step 12: POLICY STORE
└─ PostgreSQL: Structured data + clause metadata
└─ MongoDB: Audit trail & full document metadata
└─ Redis: Cache for fast lookups

Step 13: RULE EVALUATOR / DECISION ENGINE
└─ Claims eligibility checking
└─ Underwriting decisioning
└─ Benefit calculation
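
The step sequence above could be wired together roughly as follows. This is a minimal sketch, not the Agent Builder implementation: the `Pipeline` class, the step function names, and the context-dictionary convention are all hypothetical, and only three of the thirteen steps are stubbed out, with placeholder values standing in for real OCR and LLM calls.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]  # hypothetical convention: a step returns the keys it adds


class Pipeline:
    """Runs named steps in order, merging each step's output into a shared context."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Step]] = []

    def add(self, name: str, fn: Step) -> "Pipeline":
        self.steps.append((name, fn))
        return self

    def run(self, initial: Context) -> Context:
        ctx: Context = dict(initial)
        trace: List[str] = []
        for name, fn in self.steps:
            ctx.update(fn(ctx))   # merge the step's output into the context
            trace.append(name)    # record execution order for auditing
        ctx["trace"] = trace
        return ctx


def upload_handler(ctx: Context) -> Context:
    # Step 1 stub: a real handler would validate and store the file.
    return {"stored": True}


def document_identifier(ctx: Context) -> Context:
    # Step 2 stub: naive filename check standing in for real classification.
    name = ctx["filename"].lower()
    return {"doc_type": "schedule" if "schedule" in name else "unknown"}


def policy_extractor(ctx: Context) -> Context:
    # Step 5 stub: placeholder values where the LLM call would go.
    return {"fields": {"policy_number": {"value": "PLACEHOLDER-001", "confidence": 0.97}}}


pipeline = (
    Pipeline()
    .add("upload_handler", upload_handler)
    .add("document_identifier", document_identifier)
    .add("policy_extractor", policy_extractor)
)
result = pipeline.run({"filename": "Home_Schedule.pdf"})
print(result["trace"])
```

Keeping every step behind the same signature is one way to let individual steps be swapped or versioned without touching the rest of the chain.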

Data Flow Summary

Step	Component	Input	Output	Purpose
1	Upload Handler	PDF/Image	Stored document	Ingestion & validation
2	Document ID	Stored file	Type + metadata	Route to pipeline
3-4	OCR & Parser	Image	Sections + boundaries	Text extraction & structure
5-6	Extractor & QC	Structured sections	Extracted fields + quality score	LLM extraction & validation
7-10	Normalizer → Vector Store	Validated fields	Indexed vectors + clause metadata	Standardization, indexing, vectorization, storage
11-13	Rules Transformer → Decision	Vectors + clauses	Decision + audit log	Rule generation and business logic execution
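
As an illustration of the standardization stage in the table above (Step 7), the following sketch normalizes dates to ISO 8601 and monetary amounts to two-decimal values. The accepted input formats and the function names are assumptions; the real policy documents may need a different set.

```python
import re
from datetime import datetime
from decimal import Decimal

# Assumed input formats (day-first); extend once real samples are reviewed.
DATE_FORMATS = ("%d/%m/%Y", "%d %B %Y", "%Y-%m-%d")


def normalize_date(raw: str) -> str:
    """Return the date as an ISO 8601 string (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {raw!r}")


def normalize_amount(raw: str) -> Decimal:
    """Strip currency symbols and separators; return a two-decimal Decimal."""
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    if not cleaned:
        raise ValueError(f"No numeric content in amount: {raw!r}")
    return Decimal(cleaned).quantize(Decimal("0.01"))


print(normalize_date("01/07/2024"))      # 2024-07-01
print(normalize_date("1 July 2024"))     # 2024-07-01
print(normalize_amount("$1,250,000.5"))  # 1250000.50
```

Raising on unrecognized input, rather than guessing, lets the quality checker treat a failed normalization as an anomaly to escalate.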

Key Capabilities

Batch & Real-Time: Handle 50+ documents in batch or real-time
Multi-Document: PDS, schedules, endorsements, amendments
Quality Engine: Field confidence scoring, anomaly detection, escalation workflow
Clause Indexing: Fast retrieval and cross-reference of policy clauses and sections
Vectorization + Semantic Search: Post-extraction vectorization for semantic search, similarity matching, and ML integration
Vector Store Integration: Embeddings stored for rapid semantic retrieval across policy corpus
Integration: Claims Management, Underwriting Platform, Data Warehouse
Human-in-Loop: Manual review for low-confidence extractions with feedback loop
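
To make the semantic search capability concrete, here is a small sketch of similarity ranking over clause embeddings. It uses an in-memory dictionary and hand-written three-dimensional vectors purely for illustration; in the POC the vectors would come from an embedding model and live in the vector store, and the clause identifiers shown are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k clause ids most similar to the query vector."""
    scored = [(clause_id, cosine_similarity(query, vec)) for clause_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors (illustrative only, not real embeddings).
store = {
    "flood_exclusion": [0.9, 0.1, 0.0],
    "theft_cover": [0.1, 0.9, 0.2],
    "storm_cover": [0.8, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
print([clause_id for clause_id, _ in results])  # ['flood_exclusion', 'storm_cover']
```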

Success Criteria (Phase 5)

95% Extraction Accuracy on 50+ test documents
60% Processing Speed Improvement (time per document)
80% Error Reduction in downstream data errors
99% Uptime in extraction pipeline
2+ System Integrations (claims + underwriting)
Full audit trail and compliance validation
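
The criteria above do not state how extraction accuracy is measured. One possible definition, shown here as an assumption rather than the agreed metric, is field-level exact match against a hand-labelled ground truth after normalization. The field names and values are illustrative.

```python
from typing import Dict

Fields = Dict[str, str]


def field_accuracy(predicted: Dict[str, Fields], expected: Dict[str, Fields]) -> float:
    """Fraction of ground-truth fields whose predicted value matches exactly."""
    total = 0
    correct = 0
    for doc_id, truth in expected.items():
        pred = predicted.get(doc_id, {})
        for name, value in truth.items():
            total += 1
            if pred.get(name) == value:  # a missing field counts as an error
                correct += 1
    return correct / total if total else 0.0


# Illustrative data: 3 of 4 ground-truth fields are matched.
expected = {
    "doc1": {"policy_number": "A1", "start_date": "2024-07-01"},
    "doc2": {"policy_number": "B2", "start_date": "2024-08-01"},
}
predicted = {
    "doc1": {"policy_number": "A1", "start_date": "2024-07-01"},
    "doc2": {"policy_number": "B2", "start_date": "2024-08-10"},
}
accuracy = field_accuracy(predicted, expected)
print(accuracy)          # 0.75
print(accuracy >= 0.95)  # False: below the Phase 5 target
```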

Technology Stack

Infrastructure

Document Processing

OCR/Reading: Azure Document Intelligence or Claude Vision
LLM: Claude 3.5 Sonnet, GPT-4, or Azure OpenAI (flexible)
Output Format: JSON with metadata, confidence scores, source tracking
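
A sketch of what the JSON output might look like, given the three elements named above (metadata, confidence scores, source tracking). The schema, field names, and values are assumptions for illustration; the actual contract would be agreed with the downstream system owners.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SourceRef:
    page: int        # page the value was read from
    text_span: str   # verbatim text supporting the value


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0
    source: SourceRef


def to_json(document_id: str, model: str, fields: List[ExtractedField]) -> str:
    """Serialize one document's extraction result (hypothetical schema)."""
    payload = {
        "metadata": {"document_id": document_id, "model": model},
        "fields": [asdict(f) for f in fields],  # asdict also converts the nested SourceRef
    }
    return json.dumps(payload, indent=2)


output = to_json(
    "doc-001",             # placeholder identifier
    "example-model-v1",    # placeholder model label
    [ExtractedField("policy_number", "PLACEHOLDER-001", 0.97,
                    SourceRef(page=1, text_span="Policy No: PLACEHOLDER-001"))],
)
print(output)
```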

Agent Architecture

Framework: Kyndryl Agent Builder (LangChain, CrewAI, or Semantic Kernel)
Protocol: JSON-RPC 2.0 (A2A Open Protocol for agent communication)
Orchestration: Multi-agent workflow with quality validation gates
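
For reference, a JSON-RPC 2.0 exchange between two agents could be built as below. The envelope members (`jsonrpc`, `method`, `params`, `id`, `result`, `error`) and the `-32601` error code follow the JSON-RPC 2.0 specification; the method name `extract_fields` and the parameter names are hypothetical and are not taken from the A2A protocol.

```python
import itertools
import json
from typing import Any, Dict

_ids = itertools.count(1)  # simple per-process request id generator


def make_request(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": next(_ids)}


def make_result(request: Dict[str, Any], result: Any) -> Dict[str, Any]:
    """Build a success response; the id must echo the request id."""
    return {"jsonrpc": "2.0", "result": result, "id": request["id"]}


def make_error(request: Dict[str, Any], code: int, message: str) -> Dict[str, Any]:
    """Build an error response (code -32601 means 'Method not found')."""
    return {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": request["id"]}


# Hypothetical call from an orchestrator to the extraction agent.
request = make_request("extract_fields", {"document_id": "doc-001"})
response = make_result(request, {"policy_number": {"value": "PLACEHOLDER-001", "confidence": 0.97}})
error = make_error(request, -32601, "Method not found")

print(json.dumps(request))
print(json.dumps(response))
```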

Infrastructure

Cloud: Azure (client preference)
Services: Document Intelligence, OpenAI API, Storage, Functions, Cognitive Search
Data Stores: PostgreSQL (structured + clause metadata), MongoDB (audit trail), Redis (cache), Azure Blob (documents)
Vector Infrastructure: Azure Cognitive Search (vector store), Embedding models (OpenAI/Azure OpenAI)

Resources & Timeline

Project Delivery

Kyndryl Team

Role	Duration	FTE
Engagement Lead	10 weeks	0.5
AI/Extraction Architect	10 weeks	0.8
LLM/Integration Engineer	10 weeks	1.0
Azure/DevOps	10 weeks	0.5
QA/Testing	10 weeks	0.4

Timeline

Weeks 1-2: Phase 4 Prototype (5-day build + 2-day refinement)
Weeks 3-4: Phase 4 Stakeholder Review & Feedback
Weeks 5-8: Phase 5 POC Build & Integration
Weeks 9-10: Phase 5 Testing, Validation, Documentation
Week 11: Project Wrap-up, Business Case, Handover

Risk Mitigation

Risk	Likelihood	Impact	Mitigation
Document quality / OCR accuracy	Medium	High	Azure DI + LLM validation; test diverse samples
LLM hallucinations	Medium	High	Quality gates; human-in-loop for low confidence
Policy domain complexity	Medium	High	Close collaboration with underwriting/claims teams
Azure integration delays	Low	Medium	Pre-stage Azure resources; test environment setup
Data privacy/confidentiality	Medium	High	Anonymize samples; Privacy Act compliance; governance approval

KAF Non-Functional Requirements

Enterprise AI Governance

Kyndryl Agentic Framework (KAF) provides the enterprise control plane for Agentic AI, addressing trust, safety, cost, and compliance gaps that Kubernetes does not natively solve. The Insurance Client UC1 extraction pipeline leverages KAF capabilities:

KAF Dimension	Requirement for Insurance Client UC1	How It Enables Success
Agent Identity & Trust	Secure agent-to-agent communication (A2A)	Each extraction agent (Indexer, Vectorizer, Transformer, etc.) has verified identity, certificates, mutual trust
Agent Lifecycle	Register, version, retire agents dynamically	Hot-swap agents (e.g., upgrade LLM model) without stopping the 13-step pipeline
A2A Protocol & Routing	JSON-RPC 2.0 standardized messaging	Extraction steps communicate reliably; guarantees message delivery, routing
Tool Governance (MCP)	Centralized tool catalog with permissions	Agents discover & execute tools (Azure DI, LLM, Vector DB) with audit trails
LLM Management	Model routing, versioning, A/B testing	Switch between Claude, GPT-4, Azure OpenAI; test model changes before production
Token Economics	Cost attribution & budgets per agent	Track LLM tokens per extraction step; alert when over budget; optimize cost
Prompt Security	Injection & jailbreak protection	Prevent prompt manipulation attacks; secure field extraction from documents
Output Guardrails	Hallucination & PII checks	Validation gates on extracted data; detect anomalies; mask sensitive info
Memory & State	Short & long-term context	Agents retain cross-document context for consistency; session state for auditing
RAG Infrastructure	Governed embeddings & retrieval	Vector store (Step 10) with access controls, refresh policies, audit trail
Human-in-the-Loop	Approval workflows for flagged extractions	Low-confidence fields (< 85%) escalate to reviewers; feedback improves models
Explainability & Audit	Decision traces & audit logs	Full trace of extraction decisions (why field extracted, which LLM, which version, confidence score)
Agent Observability	Per-agent metrics & cost tracking	Monitor latency, accuracy, cost per step; identify bottlenecks; optimize pipeline
Multi-Tenancy	Tenant-isolated agents & policies	Insurance Client data isolated; separate policies for claims vs. underwriting agents
Graceful AI Degradation	Fallback when models fail	If LLM unavailable, escalate field to human; don't fail entire pipeline
Responsible AI	Bias & fairness enforcement	Ensure extraction unbiased across customer demographics; policy compliance
Testing & Simulation	Synthetic data for regression testing	Test the pipeline with 500K synthetic documents before production; validate accuracy targets
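
The Human-in-the-Loop and Graceful AI Degradation rows in the table above can be read together as a routing rule. The sketch below assumes that fields under the 85% threshold go to review, that a field the extractor fails to return is also reviewed, and that an extractor failure escalates every expected field instead of raising. The function names and the shape of the extractor's return value are hypothetical.

```python
from typing import Callable, Dict, List

REVIEW_THRESHOLD = 0.85  # from the Human-in-the-Loop row: below 85% escalates


def route(confidence: float) -> str:
    """Decide whether a field is accepted automatically or sent to a reviewer."""
    return "human_review" if confidence < REVIEW_THRESHOLD else "auto_accept"


def extract_and_route(
    extract: Callable[[str], Dict[str, dict]],
    text: str,
    expected_fields: List[str],
) -> Dict[str, str]:
    """Run the extractor and route each expected field; never raise on model failure."""
    try:
        fields = extract(text)
    except Exception:
        # Graceful degradation: the model is unavailable, so reviewers get every field.
        return {name: "human_review" for name in expected_fields}
    routes = {}
    for name in expected_fields:
        item = fields.get(name)
        routes[name] = "human_review" if item is None else route(item["confidence"])
    return routes


def fake_extractor(text: str) -> Dict[str, dict]:
    # Stand-in for the LLM call, with fixed illustrative confidences.
    return {"policy_number": {"confidence": 0.97}, "excess": {"confidence": 0.60}}


def failing_extractor(text: str) -> Dict[str, dict]:
    raise TimeoutError("model unavailable")


names = ["policy_number", "excess", "start_date"]
print(extract_and_route(fake_extractor, "...", names))
print(extract_and_route(failing_extractor, "...", names))
```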

KAF Governance Summary: Every extraction agent in the 13-step pipeline is governed by KAF controls — identity verification, message routing guarantees, cost tracking, tool permissions, output validation, human escalation, audit trails, and explainability. This enables Insurance Client to trust, control, explain, and safely run extraction at enterprise scale.
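
As one illustration of the cost-tracking control mentioned in the summary, per-agent token budgets could be tracked along these lines. This is a sketch under assumptions: the `TokenBudget` class, the agent names, and the limits are hypothetical and do not represent KAF's actual interface.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class TokenBudget:
    """Accumulates token usage per agent and flags agents that exceed their limit."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = dict(limits)
        self.used: Dict[str, int] = defaultdict(int)

    def record(self, agent: str, tokens: int) -> bool:
        """Add usage for an agent; return True if it is now over budget."""
        self.used[agent] += tokens
        limit = self.limits.get(agent)
        return limit is not None and self.used[agent] > limit

    def report(self) -> Dict[str, Tuple[int, Optional[int]]]:
        """Map each agent to (tokens used, limit or None if unlimited)."""
        return {agent: (used, self.limits.get(agent)) for agent, used in self.used.items()}


budget = TokenBudget({"policy_extractor": 1000})  # illustrative limit
first = budget.record("policy_extractor", 600)    # within budget
second = budget.record("policy_extractor", 500)   # cumulative 1100, over budget
print(first, second)
print(budget.report())
```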
