Insurance Client UC1 — System Architecture

Policy Data Extraction & Quality Improvement: 13-Step Processing Pipeline

Complete 13-Step Processing Pipeline (NEW: Steps 3-4 OCR/Parser, Steps 8-10 Vectorization)

[Diagram: 13-Step Policy Extraction & Vectorization Pipeline]
Input layer: Policy documents (PDF, JPG, PNG), 50+ samples
Stage panels: 1 Reception & Routing | 2 OCR & Text Extraction (new steps) | 3 Extraction & Processing | 4 Vectorization & Indexing | 5 Storage & Decision
Step boxes: 1 Upload Handler (file validation) → 2 Document ID (type detection) → 3 OCR/Layout Extraction (text + structure) → 4 Clause Parser (section boundaries) → 5 Policy Extractor (LLM-based) → 6 Quality Checker (validation) → 7 Data Normalizer (format standardization) → 8 Clause Indexer (fast retrieval) → 9 Embedding/Vectorization (OpenAI API) → 10 Vector Store (Azure Cognitive Search) → 11 Rules Transformer (PolicyAsCode) → 12 Policy Store (DB / cache) → 13 Rule Evaluator (decisioning)
Downstream integrations: Claims Management System, Underwriting Platform, Data Warehouse (analytics, reporting, BI)

13-Step Pipeline Summary
Steps 3-4 (OCR & parsing): OCR/Layout Extraction + Clause Parser for document structure analysis
Steps 8-10 (vectorization): Clause Indexer → Embedding/Vectorization (OpenAI) → Vector Store (Azure Cognitive Search) for semantic search
Steps 11-13 (rules & decision): Rules Transformer → Policy Store (PostgreSQL/MongoDB) → Rule Evaluator for business logic execution
Targets: 95% accuracy | 60% speed improvement | 80% error reduction | human-in-the-loop review for fields below 85% confidence

✓ Complete 13-step architecture showing extraction pipeline (Steps 1-7), vectorization & indexing (Steps 8-10), rules transformation (Step 11), storage (Step 12), and decision engine (Step 13)

📥 Input Layer
Input Policy Documents
Unstructured policy documents (PDFs, images, scanned documents)
PDS (Product Disclosure Statements)
Policy Schedules
Endorsements & Amendments
🔄 Stage 1: Document Reception & Routing
1 Upload Handler
Receives and validates uploaded files
File format validation (PDF, JPG, PNG)
Virus/malware scanning
Temporary storage in buffer
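The format check in the Upload Handler can be illustrated with a minimal sketch that inspects file signatures rather than trusting the extension. The function names and the size limit below are assumptions made for illustration; they do not come from the architecture itself, and virus scanning is left out because it depends on an external scanner.

```python
from typing import Optional

# Leading signature ("magic") bytes for the three accepted formats.
MAGIC_BYTES = {
    "pdf": b"%PDF-",
    "jpg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # hypothetical limit, not from the source


def detect_format(data: bytes) -> Optional[str]:
    """Return 'pdf', 'jpg', or 'png' based on the file signature, else None."""
    for fmt, signature in MAGIC_BYTES.items():
        if data.startswith(signature):
            return fmt
    return None


def validate_upload(filename: str, data: bytes) -> dict:
    """Check size and signature; the file extension alone is not trusted."""
    if not data:
        return {"ok": False, "reason": "empty file"}
    if len(data) > MAX_UPLOAD_BYTES:
        return {"ok": False, "reason": "file too large"}
    fmt = detect_format(data)
    if fmt is None:
        return {"ok": False, "reason": "unsupported format"}
    return {"ok": True, "format": fmt, "filename": filename}
```

A file that passes this check would then go to the malware scan and the temporary buffer.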
2 Document Identifier
Classifies document type and extracts metadata
PDS vs. Schedule vs. Endorsement
Extract issue date, policy#
Route to appropriate pipeline
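As a rough illustration of what the Document Identifier returns, the sketch below uses keyword matching and regular expressions as a stand-in for the LLM classifier named in the step table. The keyword lists, the label formats ("Policy No:", "Issue Date:"), and the date format are assumptions, not details taken from the client's documents.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Keyword cues per document type; a simple stand-in for LLM classification.
TYPE_KEYWORDS = {
    "PDS": ["product disclosure statement"],
    "SCHEDULE": ["policy schedule"],
    "ENDORSEMENT": ["endorsement", "amendment"],
}

# Assumed label formats; the captured policy number must contain a digit.
POLICY_NO = re.compile(
    r"Policy\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]*\d[A-Z0-9-]*)", re.IGNORECASE
)
ISSUE_DATE = re.compile(r"Issue\s*Date\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE)


@dataclass
class DocumentInfo:
    doc_type: str
    policy_number: Optional[str]
    issue_date: Optional[str]


def identify(text: str) -> DocumentInfo:
    """Classify the document and pull out the policy number and issue date."""
    lowered = text.lower()
    doc_type = "UNKNOWN"
    for name, keywords in TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            doc_type = name
            break
    number = POLICY_NO.search(text)
    issued = ISSUE_DATE.search(text)
    return DocumentInfo(
        doc_type,
        number.group(1) if number else None,
        issued.group(1) if issued else None,
    )
```

The returned `doc_type` is the value a router would use to pick the downstream pipeline.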
🖼️ Stage 2: OCR & Text Extraction (NEW)
3 OCR / Layout Extraction (New)
Cloud-based OCR with spatial layout awareness
Azure Document Intelligence / Claude Vision
Extract text + spatial positioning
Handle skewed/poor quality images
Output: Normalized text + layout data
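One way to picture the "normalized text + layout data" output is a step that turns positioned words into reading order. The sketch below assumes the OCR service returns words with top-left coordinates; the `Word` type and the line tolerance are hypothetical and do not reflect any particular OCR SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    x: float  # left edge of the word
    y: float  # top edge of the word


def to_reading_order(words: List[Word], line_tolerance: float = 5.0) -> List[str]:
    """Group words into lines by vertical position, then sort left to right."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # A word joins the current line if it sits close to that line's first word.
        if lines and abs(lines[-1][0].y - word.y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [
        " ".join(w.text for w in sorted(line, key=lambda w: w.x))
        for line in lines
    ]
```

The tolerance absorbs small vertical drift from slightly skewed scans; heavily skewed pages would need deskewing first.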
4 Clause / Section Parser (New)
Hierarchical document structure parsing
Identify sections (Coverage, Exclusions, etc.)
Parse clauses & sub-clauses
Extract condition boundaries
Tag regulatory references
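The hierarchical parsing described above can be sketched with a stack that tracks the currently open clause at each depth. This assumes clauses carry dotted numbers such as "2.1"; the step table lists an LLM for this step, so the regex approach here is only an illustrative stand-in, and the `Clause` type is hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches numbered headings such as "2 Exclusions" or "2.1 War".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")


@dataclass
class Clause:
    number: str
    title: str
    body: List[str] = field(default_factory=list)
    children: List["Clause"] = field(default_factory=list)


def parse_clauses(text: str) -> List[Clause]:
    """Build a tree of clauses from numbered headings and their body lines."""
    roots: List[Clause] = []
    stack: List[Clause] = []  # path from the root to the currently open clause
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match is None:
            if stack:
                stack[-1].body.append(line)  # plain text belongs to the open clause
            continue
        clause = Clause(match.group(1), match.group(2))
        depth = clause.number.count(".") + 1
        del stack[depth - 1:]  # close clauses at this depth or deeper
        (stack[-1].children if stack else roots).append(clause)
        stack.append(clause)
    return roots
```

A body line that happens to start with a number (for example "30 days notice") would be misread as a heading, which is one reason a production parser would need more context than a regex.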
🤖 Stage 3: AI-Powered Field Extraction
5 Policy Extractor (LLM)
Claude / GPT-4 powered field extraction
Extract: Policy#, date, coverage, benefits
Insured details: name, DOB, occupation
Exclusions & special conditions
Generate confidence score per field
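The per-field confidence score is what drives human-in-the-loop review. The sketch below shows one possible shape for that hand-off, using the 85% threshold stated in the pipeline targets; the `ExtractedField` type and function name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

REVIEW_THRESHOLD = 0.85  # stated target: human review below 85% confidence


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extractor


def split_for_review(
    fields: List[ExtractedField],
) -> Tuple[Dict[str, str], List[ExtractedField]]:
    """Separate auto-accepted values from fields that need a human reviewer."""
    accepted: Dict[str, str] = {}
    needs_review: List[ExtractedField] = []
    for item in fields:
        if item.confidence < REVIEW_THRESHOLD:
            needs_review.append(item)
        else:
            accepted[item.name] = item.value
    return accepted, needs_review
```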
6 Quality Checker
Validation & anomaly detection
Schema validation (required fields present)
Anomaly detection (dates, values out of range)
Cross-reference consistency checks
Output: Quality score 0-100%
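The three kinds of checks listed for the Quality Checker can be combined into a single score. The following is a minimal sketch under assumed field names and an assumed scoring rule (share of checks passed); neither is specified in the architecture.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical required fields; the real schema is not given in the source.
REQUIRED_FIELDS = ["policy_number", "issue_date", "coverage_amount"]


def check_quality(record: Dict[str, Any], today: date) -> Dict[str, Any]:
    """Run schema, anomaly, and consistency checks and return a 0-100 score."""
    issues: List[str] = []

    # Schema validation: every required field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            issues.append(f"missing:{name}")

    # Anomaly detection: dates and values outside a plausible range.
    issue_date = record.get("issue_date")
    if isinstance(issue_date, date) and issue_date > today:
        issues.append("anomaly:issue_date_in_future")
    amount = record.get("coverage_amount")
    if isinstance(amount, (int, float)) and amount <= 0:
        issues.append("anomaly:coverage_amount_not_positive")

    # Cross-reference consistency: expiry must fall after the issue date.
    expiry = record.get("expiry_date")
    if isinstance(issue_date, date) and isinstance(expiry, date) and expiry <= issue_date:
        issues.append("inconsistent:expiry_before_issue")

    total_checks = len(REQUIRED_FIELDS) + 3
    score = round(100 * (total_checks - len(issues)) / total_checks)
    return {"score": score, "issues": issues}
```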
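7 Data Normalizer
Standardizes validated fields into canonical formats
Input: Validated fields
Output: Normalized canonical values
🔢 Stage 4: Vectorization & Indexing (NEW)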
8 Clause Indexer (New)
Index normalized clauses for fast retrieval
Create clause index with metadata (type, coverage area)
Map original text to extracted clauses
Build clause relationships and cross-references
Enable fast semantic lookup
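A clause index with metadata can be pictured as a few dictionaries keyed by ID, type, and term. The sketch below covers only the metadata and keyword side of the Clause Indexer; semantic lookup is assumed to come from the embedding and vector store steps. All class and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class IndexedClause:
    clause_id: str
    clause_type: str              # e.g. "coverage" or "exclusion"
    coverage_area: str
    text: str
    source_span: Tuple[int, int]  # character offsets in the original text


class ClauseIndex:
    def __init__(self) -> None:
        self.by_id: Dict[str, IndexedClause] = {}
        self.by_type: Dict[str, Set[str]] = defaultdict(set)
        self.by_term: Dict[str, Set[str]] = defaultdict(set)

    def add(self, clause: IndexedClause) -> None:
        """Register the clause under its ID, its type, and each word it contains."""
        self.by_id[clause.clause_id] = clause
        self.by_type[clause.clause_type].add(clause.clause_id)
        for term in set(clause.text.lower().split()):
            self.by_term[term.strip(".,;:()")].add(clause.clause_id)

    def lookup(self, term: str, clause_type: Optional[str] = None) -> List[str]:
        """Return sorted clause IDs containing the term, optionally filtered by type."""
        ids = set(self.by_term.get(term.lower(), set()))
        if clause_type is not None:
            ids &= self.by_type.get(clause_type, set())
        return sorted(ids)
```

The `source_span` field is what maps an indexed clause back to its original text.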
9 Embedding / Vectorization (New)
Convert clause text to semantic vectors
OpenAI Embeddings API (1536-dimension vectors)
Batch processing for 50+ documents
Semantic similarity computation
Token economics tracking for cost
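Batching, similarity, and usage tracking can be sketched without committing to a specific client library. In the example below, `embed_batch` is a caller-supplied function standing in for the embeddings API call, and the token count is a crude whitespace-based approximation rather than the provider's real tokenizer; both are assumptions.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 16,
) -> Tuple[List[List[float]], int]:
    """Embed texts batch by batch and return the vectors plus a rough token count."""
    vectors: List[List[float]] = []
    approx_tokens = 0
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
        # Whitespace word count as a stand-in for real token accounting.
        approx_tokens += sum(len(text.split()) for text in batch)
    return vectors, approx_tokens
```

Multiplying the running token count by the provider's published price would give the cost figure that the token economics bullet refers to.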
10 Vector Store (New)
Store vectors in searchable index
Azure Cognitive Search or Pinecone
Semantic search for policy clause retrieval
Real-time query performance optimization
Multi-tenant vector isolation
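Multi-tenant isolation is easiest to see in a toy store where each tenant has its own partition. The in-memory class below is only an illustration of that idea; it is not how Azure Cognitive Search or Pinecone implement isolation, and the class and method names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Keeps one partition per tenant so a query never sees another tenant's data."""

    def __init__(self) -> None:
        self._partitions: Dict[str, Dict[str, List[float]]] = defaultdict(dict)

    def upsert(self, tenant_id: str, clause_id: str, vector: List[float]) -> None:
        self._partitions[tenant_id][clause_id] = vector

    def search(
        self, tenant_id: str, query: Sequence[float], top_k: int = 3
    ) -> List[Tuple[str, float]]:
        """Return the top_k most similar clauses within the tenant's partition."""
        partition = self._partitions.get(tenant_id, {})
        scored = [(cid, _cosine(query, vec)) for cid, vec in partition.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

This brute-force scan is linear in the partition size; a managed vector index would replace it with an approximate nearest-neighbour search.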
⚙️ Stage 5: Rules Transformation, Storage & Decision
11 Rules Transformer
Convert policies to machine-executable rules
Generate PolicyAsCode rules from extracted data
Build decision trees for claims/underwriting
Version control & conflict detection
Integrate vector embeddings for semantic matching
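To make "PolicyAsCode" concrete, the sketch below turns an extracted policy record into a few declarative rules and flags conflicting definitions. The rule shape, operator names, and policy field names are all assumptions; the source does not define the rule DSL.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Rule:
    rule_id: str
    field: str      # name of the claim attribute to test
    operator: str   # one of "<=", ">=", "not_in"
    value: Any
    version: int = 1


def policy_to_rules(policy: Dict[str, Any]) -> List[Rule]:
    """Derive simple declarative rules from extracted policy fields."""
    pid = policy["policy_number"]
    rules = [
        Rule(f"{pid}:limit", "claim_amount", "<=", policy["coverage_amount"]),
        Rule(f"{pid}:start", "loss_date", ">=", policy["issue_date"]),
    ]
    if policy.get("exclusions"):
        rules.append(Rule(f"{pid}:excl", "cause", "not_in", tuple(policy["exclusions"])))
    return rules


def find_conflicts(rules: List[Rule]) -> List[str]:
    """Flag rule IDs that appear more than once with different content."""
    seen: Dict[str, Rule] = {}
    conflicts: List[str] = []
    for rule in rules:
        first = seen.setdefault(rule.rule_id, rule)
        if first != rule:
            conflicts.append(rule.rule_id)
    return conflicts
```

Keeping rules as plain data makes them easy to version, diff, and store alongside the policy.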
12 Policy Store
Persist rules and metadata across systems
PostgreSQL: Structured policies & rules
MongoDB: Flexible document metadata
Redis: Cache hot policies & vectors
Blob: Full document archive & audit trail
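The split between a system of record and a cache for hot policies suggests a cache-aside pattern. The sketch below uses plain dictionaries as stand-ins for PostgreSQL and Redis so the read and invalidation logic is visible; the class and its methods are hypothetical, not part of the described system.

```python
from typing import Any, Dict, Optional


class PolicyStore:
    def __init__(self, database: Dict[str, Any], cache: Dict[str, Any]) -> None:
        self.database = database  # stand-in for PostgreSQL
        self.cache = cache        # stand-in for Redis
        self.cache_hits = 0

    def save(self, policy_id: str, record: Any) -> None:
        """Write to the database and drop any cached copy so reads stay fresh."""
        self.database[policy_id] = record
        self.cache.pop(policy_id, None)

    def load(self, policy_id: str) -> Optional[Any]:
        """Serve from the cache when possible, otherwise read through and cache."""
        if policy_id in self.cache:
            self.cache_hits += 1
            return self.cache[policy_id]
        record = self.database.get(policy_id)
        if record is not None:
            self.cache[policy_id] = record
        return record
```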
13 Rule Evaluator / Decision Engine
Execute rules and generate decisions
Real-time policy rule evaluation
A2A protocol messages to Claims, Underwriting, DW
Audit logging & traceability
Performance monitoring against KPI targets (95% accuracy, 60% speed improvement, 80% error reduction)
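Rule evaluation with an audit trail can be sketched as a loop that records the outcome of every rule it checks. The tuple-based rule shape, the operator set, and the "approve" / "refer" outcomes are assumptions for illustration; A2A messaging to downstream systems is not shown.

```python
import operator
from typing import Any, Dict, List, Tuple

OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "not_in": lambda actual, banned: actual not in banned,
}


def evaluate(
    rules: List[Tuple[str, str, str, Any]], request: Dict[str, Any]
) -> Dict[str, Any]:
    """Each rule is (rule_id, field, operator, value); all must pass to approve."""
    audit: List[Dict[str, Any]] = []
    for rule_id, field, op, expected in rules:
        if field not in request:
            passed = False  # missing data never passes silently
        else:
            passed = bool(OPERATORS[op](request[field], expected))
        audit.append({"rule_id": rule_id, "passed": passed})
    # With no rules to check, the request is referred rather than auto-approved.
    approved = bool(audit) and all(entry["passed"] for entry in audit)
    return {"decision": "approve" if approved else "refer", "audit": audit}
```

The `audit` list gives one entry per rule, which is the raw material for the traceability requirement above.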
📤 Output Layer: Downstream System Integration
Claims Management System
API integration for claim processing
Underwriting Platform
API for coverage & risk assessment
Policy Admin System
Master data synchronization
Data Warehouse
Batch analytics & BI dashboards
Architecture Key
Base Steps (1-2, 5-7, 11-13)
New Steps (3-4, 8-10): OCR & Clause Parser; Vectorization & Indexing
Human-in-the-Loop for low-confidence extractions
📊 Data Flow & Step Details
| Step | Component | Input | Output | Technology |
|------|-----------|-------|--------|------------|
| 1 | Upload Handler | PDF/Image file | Validated file in buffer | Python Flask |
| 2 | Document Identifier | Buffered file | Doc type + metadata | LLM Classification |
| 3 | OCR / Layout Extraction ⭐ | Image/scanned PDF | Text + spatial layout | Azure Document Intelligence |
| 4 | Clause / Section Parser ⭐ | Normalized text | Structured sections | Claude / LLM |
| 5 | Policy Extractor (LLM) | Structured sections | Extracted fields + confidence | Claude 3.5 / GPT-4 |
| 6 | Quality Checker | Extracted fields | Quality score + anomalies | Validation rules engine |
| 7 | Data Normalizer | Validated fields | Normalized canonical values | Python transformers |
| 8 | Clause Indexer ⭐ | Normalized clauses | Indexed clause metadata | Search index + metadata store |
| 9 | Embedding / Vectorization ⭐ | Clause text + metadata | Dense vectors (1536-dim) | OpenAI Embeddings API |
| 10 | Vector Store ⭐ | Vectors + metadata | Searchable vector index | Azure Cognitive Search |
| 11 | Rules Transformer | Normalized data + vectors | PolicyAsCode rules | Rule DSL + LLM |
| 12 | Policy Store | Rules + metadata + vectors | Stored in multi-DB | PostgreSQL, MongoDB, Redis, Blob |
| 13 | Rule Evaluator / Decision Engine | Stored rules + request + vectors | Decision + audit log | Decision engine + observability |

⭐ = new step