Insurance Client UC1 — System Architecture
Policy Data Extraction & Quality Improvement: 13-Step Processing Pipeline
Complete 13-Step Processing Pipeline (NEW: Steps 3-4 OCR/Parser, Steps 8-10 Vectorization)
✓ Complete 13-step architecture showing extraction pipeline (Steps 1-7), vectorization & indexing (Steps 8-10), rules transformation (Step 11), storage (Step 12), and decision engine (Step 13)
📥 Input Layer
Input
Policy Documents
Unstructured policy documents (PDFs, images, scanned documents)
PDS (Product Disclosure Statements)
Policy Schedules
Endorsements & Amendments
🔄 Stage 1: Document Reception & Routing
1
Upload Handler
Receives and validates uploaded files
File format validation (PDF, JPG, PNG)
Virus/malware scanning
Temporary storage in buffer
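A minimal sketch of the format validation step, assuming the handler checks leading "magic" bytes rather than trusting the file extension. The function names and the 50 MB size cap are illustrative assumptions, not part of the documented design; virus scanning and buffering are out of scope here.

```python
from typing import Optional

# Leading "magic" bytes for the three formats the Upload Handler accepts.
MAGIC_BYTES = {
    "pdf": b"%PDF-",
    "jpg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def detect_format(data: bytes) -> Optional[str]:
    """Return 'pdf', 'jpg', or 'png' if the header matches, else None."""
    for name, magic in MAGIC_BYTES.items():
        if data.startswith(magic):
            return name
    return None


def validate_upload(data: bytes, max_bytes: int = 50 * 1024 * 1024) -> str:
    """Return the detected format, or raise ValueError for a rejected file.

    max_bytes is an assumed limit; the source does not specify one.
    """
    if not data:
        raise ValueError("empty file")
    if len(data) > max_bytes:
        raise ValueError("file too large")
    fmt = detect_format(data)
    if fmt is None:
        raise ValueError("unsupported file format")
    return fmt
```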
2
Document Identifier
Classifies document type and extracts metadata
PDS vs. Schedule vs. Endorsement
Extract issue date and policy number
Route to appropriate pipeline
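The classification and metadata step can be illustrated with a keyword-based stand-in. The data flow table lists LLM classification for this step, so the keyword lists, the `DocumentInfo` type, and the policy-number pattern below are all assumptions made only to show the shape of the output.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed trigger phrases per document type (a stand-in for the LLM classifier).
KEYWORDS = {
    "PDS": ("product disclosure statement",),
    "SCHEDULE": ("policy schedule",),
    "ENDORSEMENT": ("endorsement", "amendment"),
}

# Assumed policy-number layout, e.g. "Policy No: AB-12345".
POLICY_NO = re.compile(
    r"policy\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
)


@dataclass
class DocumentInfo:
    doc_type: str
    policy_number: Optional[str]


def identify(text: str) -> DocumentInfo:
    """Classify the document and pull out a policy number if one is present."""
    lowered = text.lower()
    doc_type = "UNKNOWN"
    for label, phrases in KEYWORDS.items():
        if any(phrase in lowered for phrase in phrases):
            doc_type = label
            break
    match = POLICY_NO.search(text)
    return DocumentInfo(doc_type, match.group(1) if match else None)
```

A document typed `UNKNOWN` would presumably be routed to manual triage rather than to an extraction pipeline.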
🖼️ Stage 2: OCR & Text Extraction (NEW)
3
OCR / Layout Extraction
New
Cloud-based OCR with spatial layout awareness
Azure Document Intelligence / Claude Vision
Extract text + spatial positioning
Handle skewed/poor quality images
Output: Normalized text + layout data
4
Clause / Section Parser
New
Hierarchical document structure parsing
Identify sections (Coverage, Exclusions, etc.)
Parse clauses & sub-clauses
Extract condition boundaries
Tag regulatory references
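As a rough illustration of hierarchical clause parsing, the sketch below builds a clause tree from decimal numbering ("1", "1.1", "1.2"). It assumes clauses are numbered this way and that unnumbered lines continue the previous clause; the table lists Claude / LLM for this step, so this regex approach is only a simplified stand-in.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches lines such as "2.1 War and terrorism" (number, then clause text).
CLAUSE_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


@dataclass
class Clause:
    number: str
    text: str
    children: List["Clause"] = field(default_factory=list)


def parse_clauses(lines: List[str]) -> List[Clause]:
    """Build a clause tree; nesting depth is the number of dots in the numbering."""
    roots: List[Clause] = []
    stack: List[Clause] = []  # path from the current root to the latest clause
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = CLAUSE_RE.match(line)
        if match is None:
            # Unnumbered line: treat it as a continuation of the latest clause.
            if stack:
                stack[-1].text += " " + line
            continue
        clause = Clause(match.group(1), match.group(2))
        depth = clause.number.count(".")
        del stack[depth:]  # climb back up to this clause's parent level
        (stack[-1].children if stack else roots).append(clause)
        stack.append(clause)
    return roots
```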
🤖 Stage 3: AI-Powered Field Extraction
5
Policy Extractor (LLM)
Claude / GPT-4 powered field extraction
Extract: Policy#, date, coverage, benefits
Insured details: name, DOB, occupation
Exclusions & special conditions
Generate confidence score per field
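One way to carry a confidence score per field is to have the LLM return JSON and parse it into typed records. The JSON shape (`{"field": {"value": ..., "confidence": ...}}`) and the 0.0 to 1.0 scale are assumptions for illustration; the source does not define the extractor's output format.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExtractedField:
    value: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def parse_extraction(raw_json: str) -> Dict[str, ExtractedField]:
    """Parse the assumed LLM JSON output, clamping confidence into [0, 1]."""
    payload = json.loads(raw_json)
    fields: Dict[str, ExtractedField] = {}
    for name, item in payload.items():
        confidence = float(item.get("confidence", 0.0))
        confidence = min(1.0, max(0.0, confidence))
        fields[name] = ExtractedField(str(item.get("value", "")), confidence)
    return fields
```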
6
Quality Checker
Validation & anomaly detection
Schema validation (required fields present)
Anomaly detection (dates, values out of range)
Cross-reference consistency checks
Output: Quality score 0-100%
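The checks above could be combined into a single score roughly as follows. The required field names, the two anomaly rules, and the equal weighting of checks are assumptions chosen for the sketch, not the documented scoring method.

```python
from datetime import date
from typing import Dict, List, Tuple

# Assumed required fields; the real schema is not given in the source.
REQUIRED = ("policy_number", "issue_date", "expiry_date", "sum_insured")


def check_quality(fields: Dict[str, object]) -> Tuple[float, List[str]]:
    """Return (score from 0 to 100, list of issues) for one extracted policy."""
    issues: List[str] = []

    # Schema validation: every required field must be present and non-empty.
    for name in REQUIRED:
        if fields.get(name) in (None, ""):
            issues.append(f"missing:{name}")

    # Anomaly checks: date ordering and value range.
    issue, expiry = fields.get("issue_date"), fields.get("expiry_date")
    if isinstance(issue, date) and isinstance(expiry, date) and expiry <= issue:
        issues.append("anomaly:expiry_not_after_issue")
    amount = fields.get("sum_insured")
    if isinstance(amount, (int, float)) and amount <= 0:
        issues.append("anomaly:sum_insured_not_positive")

    # Each check carries equal weight (an assumption).
    total_checks = len(REQUIRED) + 2
    score = 100.0 * (total_checks - len(issues)) / total_checks
    return score, issues
```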
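7
Data Normalizer
Normalizes validated fields into canonical values
Input: Validated fields
Output: Normalized canonical values
Technology: Python transformers
🔢 Stage 4: Vectorization & Indexing (NEW)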
8
Clause Indexer
New
Index normalized clauses for fast retrieval
Create clause index with metadata (type, coverage area)
Map original text to extracted clauses
Build clause relationships and cross-references
Enable fast semantic lookup
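A small in-memory sketch of a clause index with metadata, source mapping, and cross-references. The class and field names are hypothetical, and this covers only metadata lookup; semantic lookup would rely on the vector steps that follow.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class IndexedClause:
    clause_id: str
    clause_type: str              # e.g. "coverage" or "exclusion"
    coverage_area: str            # e.g. "fire", "theft"
    source_span: Tuple[int, int]  # character offsets into the original text
    text: str


class ClauseIndex:
    def __init__(self) -> None:
        self._by_id: Dict[str, IndexedClause] = {}
        self._by_type: Dict[str, Set[str]] = defaultdict(set)
        self._refs: Dict[str, Set[str]] = defaultdict(set)

    def add(self, clause: IndexedClause, refers_to: Iterable[str] = ()) -> None:
        """Store a clause, index it by type, and record its cross-references."""
        self._by_id[clause.clause_id] = clause
        self._by_type[clause.clause_type].add(clause.clause_id)
        self._refs[clause.clause_id].update(refers_to)

    def find(self, clause_type: str,
             coverage_area: Optional[str] = None) -> List[IndexedClause]:
        """Return clauses of a type, optionally filtered by coverage area."""
        ids = sorted(self._by_type.get(clause_type, set()))
        hits = [self._by_id[i] for i in ids]
        if coverage_area is not None:
            hits = [c for c in hits if c.coverage_area == coverage_area]
        return hits

    def references(self, clause_id: str) -> Set[str]:
        """Return the ids of clauses that the given clause refers to."""
        return set(self._refs.get(clause_id, set()))
```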
9
Embedding / Vectorization
New
Convert clause text to semantic vectors
OpenAI Embeddings API (1536-dimension vectors)
Batch processing for 50+ documents
Semantic similarity computation
Token usage tracking for cost control
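Two pieces of this step can be shown without calling any external service: splitting clauses into batches before they are sent for embedding, and computing cosine similarity between the resulting vectors. The helper names and the toy two-dimensional vectors are assumptions; real vectors would be 1536-dimensional.

```python
import math
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Each batch would be passed to the embedding service in one request, which keeps the request count low when processing many documents.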
10
Vector Store
New
Store vectors in searchable index
Azure Cognitive Search or Pinecone
Semantic search for policy clause retrieval
Real-time query performance optimization
Multi-tenant vector isolation
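To make the tenant isolation idea concrete, here is an in-memory stand-in for the vector store that keeps one namespace per tenant and searches only within it. It is a brute-force sketch with hypothetical names; a managed service such as those named above would replace it in practice.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    def __init__(self) -> None:
        # tenant id -> clause id -> vector; tenants never share a namespace.
        self._data: Dict[str, Dict[str, List[float]]] = defaultdict(dict)

    def upsert(self, tenant: str, clause_id: str, vector: Sequence[float]) -> None:
        """Insert or overwrite a clause vector inside one tenant's namespace."""
        self._data[tenant][clause_id] = list(vector)

    def search(self, tenant: str, query: Sequence[float],
               top_k: int = 3) -> List[Tuple[str, float]]:
        """Return up to top_k (clause_id, similarity) pairs for one tenant."""
        scored = [(cid, _cosine(query, vec))
                  for cid, vec in self._data.get(tenant, {}).items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```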
⚙️ Stage 5: Rules Transformation, Storage & Decision
11
Rules Transformer
Convert policies to machine-executable rules
Generate PolicyAsCode rules from extracted data
Build decision trees for claims/underwriting
Version control & conflict detection
Integrate vector embeddings for semantic matching
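The source does not define the PolicyAsCode rule format, so the sketch below invents a simple one: each rule tests one request field and carries an outcome. The `Rule` type, the input field names, and the conflict check (same condition, different outcome) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: str
    version: int
    field: str
    operator: str  # "eq" or "lte" in this sketch
    value: object
    outcome: str   # "approve" or "deny"


def rules_from_policy(policy: Dict[str, object], version: int = 1) -> List[Rule]:
    """Turn extracted policy data into rules: one limit rule plus one per exclusion."""
    pid = policy["policy_number"]
    rules = [Rule(f"{pid}:max_amount", version, "claim_amount", "lte",
                  policy["sum_insured"], "approve")]
    for peril in policy.get("exclusions", []):
        rules.append(Rule(f"{pid}:excl:{peril}", version, "peril", "eq",
                          peril, "deny"))
    return rules


def find_conflicts(rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return id pairs of rules that test the same condition with different outcomes."""
    seen: Dict[tuple, Rule] = {}
    conflicts: List[Tuple[str, str]] = []
    for rule in rules:
        key = (rule.field, rule.operator, rule.value)
        other = seen.get(key)
        if other is not None and other.outcome != rule.outcome:
            conflicts.append((other.rule_id, rule.rule_id))
        seen.setdefault(key, rule)
    return conflicts
```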
12
Policy Store
Persist rules and metadata across systems
PostgreSQL: Structured policies & rules
MongoDB: Flexible document metadata
Redis: Cache hot policies & vectors
Blob: Full document archive & audit trail
13
Rule Evaluator / Decision Engine
Execute rules and generate decisions
Real-time policy rule evaluation
A2A protocol messages to Claims, Underwriting, and the Data Warehouse
Audit logging & traceability
Performance monitoring (95%, 60%, 80% KPIs)
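Rule evaluation with an audit trail might look like the sketch below. It assumes every rule is a deny rule stored as a plain dictionary and that a request is approved unless some rule matches (a deny-overrides policy); neither assumption comes from the source.

```python
import operator
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple

# Supported comparison operators for this sketch.
OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def evaluate(rules: List[Dict[str, Any]],
             request: Dict[str, Any]) -> Tuple[str, List[Dict[str, Any]]]:
    """Evaluate every deny rule; return the decision and one audit entry per rule."""
    decision = "approve"
    audit: List[Dict[str, Any]] = []
    for rule in rules:
        actual = request.get(rule["field"])
        # A rule cannot match when the request lacks the field it tests.
        matched = actual is not None and OPS[rule["op"]](actual, rule["value"])
        audit.append({
            "rule_id": rule["id"],
            "matched": matched,
            "evaluated_at": datetime.now(timezone.utc).isoformat(),
        })
        if matched:
            decision = "deny"
    return decision, audit
```

Every rule is evaluated even after a deny, so the audit log records the full set of rules that applied to the request.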
📤 Output Layer: Downstream System Integration
Claims Management System
API integration for claim processing
Underwriting Platform
API for coverage & risk assessment
Policy Admin System
Master data synchronization
Data Warehouse
Batch analytics & BI dashboards
Architecture Key
Base Steps (1-2, 5-7, 11-13)
New Steps (3-4, 8-10): OCR & Clause Parser, Vectorization & Indexing
Human-in-the-Loop for low-confidence extractions
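Human-in-the-loop routing can be sketched as a threshold on per-field confidence: fields at or above the threshold are accepted automatically and the rest are queued for review. The 0.85 threshold and the function name are assumptions; the source does not state a cutoff.

```python
from typing import Dict, List, Tuple


def split_for_review(confidences: Dict[str, float],
                     threshold: float = 0.85) -> Tuple[List[str], List[str]]:
    """Return (auto-accepted field names, field names needing human review)."""
    auto: List[str] = []
    review: List[str] = []
    for name, confidence in sorted(confidences.items()):
        (auto if confidence >= threshold else review).append(name)
    return auto, review
```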
📊 Data Flow & Step Details
| Step | Component | Input | Output | Technology |
|---|---|---|---|---|
| 1 | Upload Handler | PDF/Image file | Validated file in buffer | Python Flask |
| 2 | Document Identifier | Buffered file | Doc type + metadata | LLM Classification |
| 3 | OCR / Layout Extraction ⭐ | Image/scanned PDF | Text + spatial layout | Azure Document Intelligence |
| 4 | Clause / Section Parser ⭐ | Normalized text | Structured sections | Claude / LLM |
| 5 | Policy Extractor (LLM) | Structured sections | Extracted fields + confidence | Claude 3.5 / GPT-4 |
| 6 | Quality Checker | Extracted fields | Quality score + anomalies | Validation rules engine |
| 7 | Data Normalizer | Validated fields | Normalized canonical values | Python transformers |
| 8 | Clause Indexer ⭐ | Normalized clauses | Indexed clause metadata | Search index + metadata store |
| 9 | Embedding / Vectorization ⭐ | Clause text + metadata | Dense vectors (1536-dim) | OpenAI Embeddings API |
| 10 | Vector Store ⭐ | Vectors + metadata | Searchable vector index | Azure Cognitive Search |
| 11 | Rules Transformer | Normalized data + vectors | PolicyAsCode rules | Rule DSL + LLM |
| 12 | Policy Store | Rules + metadata + vectors | Stored in multi-DB | PostgreSQL, MongoDB, Redis, Blob |
| 13 | Rule Evaluator / Decision Engine | Stored rules + request + vectors | Decision + audit log | Decision engine + observability |