Executive Summary
Problem: Insurance Client struggles to reliably extract structured fields from unstructured policy documents (PDS, schedules, endorsements). Poor extraction cascades downstream, causing delays, rework, and critical errors.
Solution: Deploy an AI-powered document extraction platform using Kyndryl's Agent Builder to automatically extract key policy information, embed quality checks, and deliver trusted structured data to downstream systems.
✓ 95% Extraction Accuracy
✓ 60% Processing Speed Improvement
✓ 80% Error Reduction
Phase 4: Prototype (5 Days)
Interactive Demo
Browser-based prototype demonstrating:
- Policy document upload workflow
- Automated extraction of key fields
- Real-time quality scoring
- Dual-persona UI (Claims Adjuster + Underwriter)
- Side-by-side document vs. extracted data comparison
- Human review and correction workflow
Key Features
- Document Upload: Drag-and-drop upload for PDF/image files, demonstrated with 50+ sample documents
- LLM Extraction: Azure AI services or alternative LLM with confidence scoring
- Quality Dashboard: Field-level confidence, anomaly detection, standardization validation
- Persona Views: Adjuster focus (coverage) vs. Underwriter focus (risk assessment)
- Side-by-Side UI: Original document (left) + extracted fields (right) with highlighting
Success Criteria (Phase 4)
- UI prototype fully functional and clickable
- LLM extraction working on demo documents
- Quality scoring visible and actionable
- Dual-persona experience validated
- Stakeholder sign-off on workflow and UX
Phase 5: POC (30 Days)
Data: 50+ real Insurance Client policy documents (PDS, schedules, endorsements) + synthetic variations
Integration: Azure Document Intelligence + LLM extraction + downstream system APIs
Architecture: 13-Step Processing Pipeline
Data Flow Summary
| Step | Component | Input | Output | Purpose |
|---|---|---|---|---|
| 1 | Upload Handler | PDF/Image | Stored document | Ingestion & validation |
| 2 | Document ID | Stored file | Type + metadata | Route to pipeline |
| 3-4 | OCR & Parser | Image | Sections + boundaries | Text extraction & structure |
| 5-6 | Extractor & QC | Structured sections | Extracted fields + quality score | LLM extraction & validation |
| 7-10 | Normalizer → Vector Store | Validated fields | Indexed vectors + clause metadata | Standardization, indexing, vectorization, storage |
| 11-13 | Rules Transformer → Decision | Vectors + clauses | Decision + audit log | Rule generation and business logic execution |
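The stage grouping in the table above can be sketched as an ordered list of handlers that each take and return a document record. This is a minimal illustration only: the `Stage` class, the `passthrough` placeholder and the trace names are assumptions made for the example, not part of Kyndryl Agent Builder or any Azure SDK, and each placeholder would be replaced by the real component.

```python
from dataclasses import dataclass
from typing import Any, Callable

Doc = dict[str, Any]


@dataclass(frozen=True)
class Stage:
    steps: str                    # step range as labelled in the table, e.g. "3-4"
    component: str                # component name from the table
    run: Callable[[Doc], Doc]     # handler for this stage (placeholder here)


def passthrough(name: str) -> Callable[[Doc], Doc]:
    """Hypothetical placeholder: records that the stage ran, does no real work."""
    def handler(doc: Doc) -> Doc:
        return {**doc, "trace": doc.get("trace", []) + [name]}
    return handler


PIPELINE = [
    Stage("1", "Upload Handler", passthrough("upload")),
    Stage("2", "Document ID", passthrough("identify")),
    Stage("3-4", "OCR & Parser", passthrough("ocr_parse")),
    Stage("5-6", "Extractor & QC", passthrough("extract_qc")),
    Stage("7-10", "Normalizer → Vector Store", passthrough("normalize_index")),
    Stage("11-13", "Rules Transformer → Decision", passthrough("rules_decision")),
]


def run_pipeline(doc: Doc) -> Doc:
    """Run every stage in table order, passing each output to the next stage."""
    for stage in PIPELINE:
        doc = stage.run(doc)
    return doc


result = run_pipeline({"filename": "policy.pdf"})
print(result["trace"])
```

Keeping each stage behind the same input/output shape is one way to let a single component be swapped or versioned without touching its neighbours.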
Key Capabilities
- Batch & Real-Time: Process 50+ documents in batch or in real time
- Multi-Document: PDS, schedules, endorsements, amendments
- Quality Engine: Field confidence scoring, anomaly detection, escalation workflow
- Clause Indexing: Fast retrieval and cross-reference of policy clauses and sections
- Vectorization + Semantic Search: Post-extraction vectorization for semantic search, similarity matching, and ML integration
- Vector Store Integration: Embeddings stored for rapid semantic retrieval across policy corpus
- Integration: Claims Management, Underwriting Platform, Data Warehouse
- Human-in-Loop: Manual review for low-confidence extractions with feedback loop
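The escalation step in the Quality Engine and Human-in-Loop capabilities can be illustrated with a simple confidence split. The sketch below assumes the 85% threshold quoted in the KAF requirements table; the `ExtractedField` class, the function name and the sample values are hypothetical.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # assumption: the 85% cut-off quoted in the KAF table


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


def split_for_review(fields, threshold=REVIEW_THRESHOLD):
    """Return (accepted, flagged); flagged fields go to a human reviewer."""
    accepted = [f for f in fields if f.confidence >= threshold]
    flagged = [f for f in fields if f.confidence < threshold]
    return accepted, flagged


# Illustrative values only, not real policy data.
fields = [
    ExtractedField("policy_number", "EXAMPLE-12345", 0.98),
    ExtractedField("excess_amount", "500", 0.62),
    ExtractedField("start_date", "2024-01-01", 0.85),
]
accepted, flagged = split_for_review(fields)
print([f.name for f in flagged])
```

A field exactly at the threshold is accepted here; whether the boundary should be inclusive is a design choice to confirm with the review team.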
Success Criteria (Phase 5)
- 95% Extraction Accuracy on 50+ test documents
- 60% Processing Speed Improvement (time per document)
- 80% Error Reduction in downstream data errors
- 99% Uptime in extraction pipeline
- 2+ system integrations (claims + underwriting)
- Full audit trail and compliance validation
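The accuracy criterion above needs an agreed measurement method before it can be tested. One possible definition, offered as an assumption rather than the agreed metric, is field-level accuracy: the share of expected fields whose extracted value matches a labelled ground truth after whitespace and case normalisation. The function names and sample values below are hypothetical.

```python
def normalize(value: str) -> str:
    """Collapse whitespace and ignore case before comparing values."""
    return " ".join(value.split()).casefold()


def field_accuracy(predicted: dict[str, str], expected: dict[str, str]) -> float:
    """Fraction of expected fields that were extracted with a matching value."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    correct = sum(
        1
        for name, truth in expected.items()
        if name in predicted and normalize(predicted[name]) == normalize(truth)
    )
    return correct / len(expected)


# Illustrative values only: one of four fields is wrong.
expected = {
    "policy_number": "AB-123",
    "insured_name": "Jane Doe",
    "sum_insured": "550000",
    "start_date": "2024-01-01",
}
predicted = {
    "policy_number": "AB-123",
    "insured_name": "jane  doe",
    "sum_insured": "500000",
    "start_date": "2024-01-01",
}
accuracy = field_accuracy(predicted, expected)
print(f"{accuracy:.0%}")
```

Under this definition a missing field counts as an error, so the score reflects both wrong and omitted extractions.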
Technology Stack
Document Processing
- OCR/Reading: Azure Document Intelligence or Claude Vision
- LLM: Claude 3.5 Sonnet, GPT-4, or Azure OpenAI (flexible)
- Output Format: JSON with metadata, confidence scores, source tracking
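To make the output format concrete, the sketch below builds one possible JSON record carrying metadata, a confidence score and source tracking for each field. The key names, model label and values are illustrative assumptions; the actual schema would be agreed with the downstream system owners.

```python
import json

# Hypothetical record layout: every key name here is an assumption.
record = {
    "document": {"id": "doc-001", "type": "schedule"},
    "fields": [
        {
            "name": "policy_number",
            "value": "EXAMPLE-12345",
            "confidence": 0.97,
            # Source tracking: where in the document the value was found.
            "source": {"page": 1, "section": "Policy Details"},
        },
    ],
    "metadata": {"extractor_model": "example-model", "pipeline_version": "0.1"},
}

payload = json.dumps(record, indent=2)
print(payload)
```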
Agent Architecture
- Framework: Kyndryl Agent Builder (LangChain, CrewAI, or Semantic Kernel)
- Protocol: JSON-RPC 2.0 (A2A Open Protocol for agent communication)
- Orchestration: Multi-agent workflow with quality validation gates
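For readers unfamiliar with JSON-RPC 2.0, the sketch below shows the general shape of a request and how a response or error is read. The envelope keys (`jsonrpc`, `method`, `params`, `id`, `result`, `error`) come from the JSON-RPC 2.0 specification; the method name `extraction.extract_fields` and its parameters are hypothetical and are not taken from the A2A protocol or Kyndryl Agent Builder.

```python
import itertools
import json

_request_ids = itertools.count(1)


def make_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request object with a fresh numeric id."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    }


def parse_response(raw: str):
    """Return the result of a JSON-RPC 2.0 response, or raise on an error object."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    return message["result"]


# Hypothetical method name and parameters, for illustration only.
request = make_request("extraction.extract_fields", {"document_id": "doc-001"})
print(json.dumps(request))

# A response echoes the request id so the caller can match it to the request.
reply = json.dumps({"jsonrpc": "2.0", "result": {"status": "ok"}, "id": request["id"]})
print(parse_response(reply))
```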
Infrastructure
- Cloud: Azure (client preference)
- Services: Document Intelligence, OpenAI API, Storage, Functions, Cognitive Search
- Data Stores: PostgreSQL (structured + clause metadata), MongoDB (audit trail), Redis (cache), Azure Blob (documents)
- Vector Infrastructure: Azure Cognitive Search (vector store), Embedding models (OpenAI/Azure OpenAI)
Resources & Timeline
Kyndryl Team
| Role | Duration | FTE |
|---|---|---|
| Engagement Lead | 10 weeks | 0.5 |
| AI/Extraction Architect | 10 weeks | 0.8 |
| LLM/Integration Engineer | 10 weeks | 1.0 |
| Azure/DevOps | 10 weeks | 0.5 |
| QA/Testing | 10 weeks | 0.4 |
Timeline
- Weeks 1-2: Phase 4 Prototype (5-day build + 2-day refinement)
- Weeks 3-4: Phase 4 Stakeholder Review & Feedback
- Weeks 5-8: Phase 5 POC Build & Integration
- Weeks 9-10: Phase 5 Testing, Validation, Documentation
- Week 11: Project Wrap-up, Business Case, Handover
Risk Mitigation
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Document quality / OCR accuracy | Medium | High | Azure DI + LLM validation; test diverse samples |
| LLM hallucinations | Medium | High | Quality gates; human-in-loop for low confidence |
| Policy domain complexity | Medium | High | Close collaboration with underwriting/claims teams |
| Azure integration delays | Low | Medium | Pre-stage Azure resources; test environment setup |
| Data privacy/confidentiality | Medium | High | Anonymize samples; Privacy Act compliance; governance approval |
KAF Non-Functional Requirements
Kyndryl Agentic Framework (KAF) provides the enterprise control plane for Agentic AI, addressing trust, safety, cost, and compliance gaps that Kubernetes does not natively solve. The Insurance Client UC1 extraction pipeline leverages KAF capabilities:
| KAF Dimension | Requirement for Insurance Client UC1 | How It Enables Success |
|---|---|---|
| Agent Identity & Trust | Secure agent-to-agent communication (A2A) | Each extraction agent (Indexer, Vectorizer, Transformer, etc.) has verified identity, certificates, mutual trust |
| Agent Lifecycle | Register, version, retire agents dynamically | Hot-swap agents (e.g., upgrade LLM model) without stopping the 13-step pipeline |
| A2A Protocol & Routing | JSON-RPC 2.0 standardized messaging | Extraction steps communicate reliably; guarantees message delivery, routing |
| Tool Governance (MCP) | Centralized tool catalog with permissions | Agents discover & execute tools (Azure DI, LLM, Vector DB) with audit trails |
| LLM Management | Model routing, versioning, A/B testing | Switch between Claude, GPT-4, Azure OpenAI; test model changes before production |
| Token Economics | Cost attribution & budgets per agent | Track LLM tokens per extraction step; alert when over budget; optimize cost |
| Prompt Security | Injection & jailbreak protection | Prevent prompt manipulation attacks; secure field extraction from documents |
| Output Guardrails | Hallucination & PII checks | Validation gates on extracted data; detect anomalies; mask sensitive info |
| Memory & State | Short & long-term context | Agents retain cross-document context for consistency; session state for auditing |
| RAG Infrastructure | Governed embeddings & retrieval | Vector store (Step 10) with access controls, refresh policies, audit trail |
| Human-in-the-Loop | Approval workflows for flagged extractions | Low-confidence fields (< 85%) escalate to reviewers; feedback improves models |
| Explainability & Audit | Decision traces & audit logs | Full trace of extraction decisions (why field extracted, which LLM, which version, confidence score) |
| Agent Observability | Per-agent metrics & cost tracking | Monitor latency, accuracy, cost per step; identify bottlenecks; optimize pipeline |
| Multi-Tenancy | Tenant-isolated agents & policies | Insurance Client data isolated; separate policies for claims vs. underwriting agents |
| Graceful AI Degradation | Fallback when models fail | If LLM unavailable, escalate field to human; don't fail entire pipeline |
| Responsible AI | Bias & fairness enforcement | Ensure extraction is unbiased across customer demographics; policy compliance |
| Testing & Simulation | Synthetic data for regression testing | Test pipeline with 500K synthetic documents before production; validate accuracy targets |
KAF Governance Summary: Every extraction agent in the 13-step pipeline is governed by KAF controls — identity verification, message routing guarantees, cost tracking, tool permissions, output validation, human escalation, audit trails, and explainability. This enables Insurance Client to trust, control, explain, and safely run extraction at enterprise scale.
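The graceful degradation control in the table above (escalate a field to a human when the model is unavailable, rather than failing the whole pipeline) can be sketched as a per-field fallback. This is an illustration under stated assumptions, not KAF's actual mechanism: `extract_with_fallback`, the stub extractor and the field names are all hypothetical.

```python
def extract_with_fallback(field_names, extract_fn):
    """Extract each field independently; failed fields are escalated, not fatal."""
    extracted, escalated = {}, []
    for name in field_names:
        try:
            extracted[name] = extract_fn(name)
        except Exception as exc:  # a real implementation would catch narrower errors
            escalated.append({"field": name, "reason": str(exc)})
    return extracted, escalated


def stub_extractor(name: str) -> str:
    """Hypothetical stand-in for an LLM call that fails for one field."""
    if name == "excess_amount":
        raise ConnectionError("model endpoint unavailable")
    return f"value-of-{name}"


extracted, escalated = extract_with_fallback(
    ["policy_number", "excess_amount", "start_date"], stub_extractor
)
print(extracted)
print(escalated)
```

The escalation entries keep the failure reason so the audit trail can show why a field was routed to a reviewer.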