Insurance Data Quality AI Readiness Assessment: A Practitioner’s Checklist
I’ve seen claims teams burn months of runway because they treated data quality as an afterthought. One mid-size P&C carrier kicked off a fraud detection pilot with 12% missing claimant addresses. The model’s precision tanked, and the project was shelved. That’s avoidable. This guide walks through a repeatable framework to assess whether your data is fit for AI—before you write a single line of code.
Why a checklist is non-negotiable
Insurance data is messy by design: policy inception dates, loss runs, bordereaux from TPAs, and unstructured adjuster notes. AI models chew through this raw material. If your data has a 20% duplicate rate or 30% missing values in key rating fields, your combined ratio on AI projects will climb fast.
Phase 1: Inventory & Criticality Mapping
Start with a data catalog that answers: what exists, where it lives, who owns it, and how it’s used. Without this, you’re flying blind.
Step 1: Define your AI use case boundary
- Goal: Narrow the scope to something bounded like “predictive underwriting for small commercial auto” or “fraud signal extraction from SIU reports.”
- Trade-off: If you try to boil the ocean (e.g., “transform claims end-to-end”), your checklist becomes a sprawling spreadsheet that no one updates.
Example: A regional carrier targeting a 5% improvement in loss ratio through telematics underwriting limited its AI scope to renewal auto policies with ≤$500K premium.
Step 2: Map data sources to use case
Create a table like the one below. Include the data’s origin (TPA, MGA, internal system), format (CSV, JSON, PDF), and update cadence.
| Data Asset | Source System | Format | Update Frequency | Use Case Role |
|---|---|---|---|---|
| Loss Runs (last 3 years) | Guidewire ClaimCenter | CSV (via ETL) | Weekly | Target variable for severity models |
| Vehicle Telematics (last 12 months) | OEM API (via third-party telematics vendor) | JSON | Daily | Predictive underwriting features |
| SIU Referral Narratives | Internal SharePoint | Unstructured PDF | Ad-hoc | Fraud signal extraction |
| Bordereaux (TPA) | Cognizant TPA Platform | Excel (monthly batch) | Monthly | Premium audit reconciliation |
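The inventory can also be kept as structured records so that gaps are caught automatically. The following is a minimal sketch, not a prescribed catalog format: the `DataAsset` class, the `owner` field, and the "Claims Analytics" owner value are illustrative assumptions layered on top of the table above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataAsset:
    """One row of the source-to-use-case inventory."""
    name: str
    source_system: str
    data_format: str
    update_frequency: str
    use_case_role: str
    owner: str = ""  # assumption: the table above does not list owners


def missing_attributes(asset):
    """Return the names of attributes that are blank for this asset."""
    return [f.name for f in fields(asset) if not getattr(asset, f.name).strip()]


inventory = [
    DataAsset("Loss Runs (last 3 years)", "Guidewire ClaimCenter", "CSV (via ETL)",
              "Weekly", "Target variable for severity models",
              owner="Claims Analytics"),  # hypothetical owner
    DataAsset("Bordereaux (TPA)", "Cognizant TPA Platform", "Excel (monthly batch)",
              "Monthly", "Premium audit reconciliation"),
]

# Collect every asset that still has an unanswered catalog question
gaps = {a.name: missing_attributes(a) for a in inventory if missing_attributes(a)}
print(gaps)
```

Any asset that shows up in `gaps` still has an open question about what it is, where it lives, who owns it, or how it is used.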
Step 3: Identify key fields and business rules
For each use case, list the 10–15 fields that directly impact outcomes. For example:
- Underwriting: Driver age, vehicle make/model, ZIP code, prior losses.
- Claims: Date of loss, incurred loss amount, cause of loss, adjuster notes.
Trade-off: Including “nice-to-have” fields like policyholder’s favorite color will dilute model performance and inflate data cleaning costs.
Resource estimate (Phase 1)
- Analyst time: 2–3 weeks full-time.
- Tooling: Collibra or Alation (if you have budget); otherwise, a shared Confluence page with manual updates.
Phase 2: Data Profiling & Quality Metrics
Step 4: Run automated profiling
Use open-source tools like ydata-profiling (Python; formerly pandas-profiling) or commercial suites like Informatica Data Quality. Focus on:
- Completeness: % missing values per field.
- Uniqueness: Duplicate policy numbers or claim IDs.
- Consistency: Inconsistent date formats (MM/DD/YYYY vs. YYYY-MM-DD).
- Validity: ZIP codes that are not five digits, or ICD-10 codes that don’t exist.
Example snippet:
```python
import pandas as pd
from ydata_profiling import ProfileReport

# Load loss run data
df = pd.read_csv("loss_runs_2023.csv")

# Generate an HTML profile report
profile = ProfileReport(df, title="Loss Runs 2023")
profile.to_file("loss_runs_profile.html")
```
Trade-off: Profiling tools can flag 100+ issues. Prioritize those that violate business rules (e.g., negative incurred loss amounts) over cosmetic outliers.
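To make the four dimensions concrete, and to show what a business-rule check looks like next to a generic profile, here is a small standard-library sketch. The rows and field names (`claim_id`, `loss_date`, `zip`, `incurred`) are hypothetical, and the five-digit ZIP rule is a simplification.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical loss-run rows; field names are illustrative assumptions
rows = [
    {"claim_id": "C1", "loss_date": "2023-01-15", "zip": "30301", "incurred": 1200.0},
    {"claim_id": "C2", "loss_date": "01/20/2023", "zip": "3030", "incurred": -50.0},
    {"claim_id": "C2", "loss_date": "2023-02-02", "zip": None, "incurred": 800.0},
    {"claim_id": "C4", "loss_date": "2023-03-09", "zip": "30305", "incurred": 0.0},
]


def is_iso_date(value):
    """True if the value parses as YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def profile_rows(rows):
    """Compute one simple metric per quality dimension."""
    n = len(rows)
    id_counts = Counter(r["claim_id"] for r in rows)
    return {
        # Completeness: share of rows with no ZIP code
        "zip_missing_pct": 100 * sum(r["zip"] is None for r in rows) / n,
        # Uniqueness: claim IDs that appear more than once
        "duplicate_claim_ids": sorted(k for k, c in id_counts.items() if c > 1),
        # Consistency: dates not in the expected YYYY-MM-DD format
        "non_iso_dates": [r["loss_date"] for r in rows if not is_iso_date(r["loss_date"])],
        # Validity: present ZIP codes that are not exactly five digits
        "invalid_zips": [r["zip"] for r in rows
                         if r["zip"] is not None and not re.fullmatch(r"\d{5}", r["zip"])],
        # Business rule: incurred loss should never be negative
        "negative_incurred": [r["claim_id"] for r in rows if r["incurred"] < 0],
    }


report = profile_rows(rows)
print(report)
```

The last entry is the kind of finding to triage first, because it breaks a business rule rather than a formatting convention.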
Step 5: Calculate downstream impact
Quantify the cost of poor quality. For example:
- If 15% of claims are missing “cause of loss,” your fraud model’s precision drops by ~12% (based on a 2022 study by ISO ClaimSearch).
- If ZIP code is missing for 20% of policies, your territorial ratemaking error could exceed 8% of premium (per NAIC data).
Step 6: Validate against regulatory & actuarial standards
Check against:
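- NAIC statutory reporting: For example, Schedule P of the NAIC Annual Statement reports loss development, and those figures need to be auditable. If your data lacks transaction IDs for each loss payment, you will struggle to tie reported development back to individual payments.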
- IFRS 17: For reserving models, ensure granularity for each cash-flow component (e.g., claims handling expenses broken down by adjuster).
Trade-off: Fixing regulatory gaps is mandatory but often requires manual workarounds (e.g., retrofitting legacy systems). This can delay AI timelines by 3–6 months.
Resource estimate (Phase 2)
- Analyst time: 3–4 weeks.
- Tooling: ydata-profiling (free; formerly pandas-profiling), Informatica Data Quality (~$50K/year for mid-size carriers), or Great Expectations (open-source).
Phase 3: Data Lineage & Governance Gaps
Step 7: Trace lineage from source to model input
Use a lightweight tool like OpenLineage or manually document:
- ETL/ELT pipelines that transform raw claims data into features.
- Downstream systems (e.g., rating engine) that consume the data.
- Manual overrides (e.g., an underwriter changing a ZIP code post-issuance).
Example: A carrier’s telematics pipeline pulled “hard braking events” from an OEM API, but the API refreshed daily while the rating engine expected real-time data. This caused a 7% increase in model error during validation.
Step 8: Identify governance gaps
Check for:
- Access controls: Can only 3 people in claims access loss runs? This will bottleneck feature engineering.
- Metadata standards: Are field definitions consistent across systems? For example, “incurred loss” might mean different things in ClaimCenter vs. the general ledger.
- Change management: Does the TPA update bordereaux schemas without notice? If so, your auto-feature pipelines will break.
Trade-off: Implementing strict governance (e.g., data contracts) adds friction but prevents pipeline failures. One carrier saw a 40% reduction in downstream breaks after adopting Great Expectations for schema validation.
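The idea behind a data contract can be sketched in a few lines of plain Python. This is a stand-in for what a validation tool would do, not the Great Expectations API; the column names and types in `EXPECTED_SCHEMA` are hypothetical.

```python
# Hypothetical contract for a TPA bordereaux row: column name -> expected type
EXPECTED_SCHEMA = {
    "policy_id": str,
    "claim_id": str,
    "paid_loss": float,
    "report_month": str,
}


def schema_violations(row, expected=EXPECTED_SCHEMA):
    """Compare one incoming row against the contract and list every mismatch."""
    problems = []
    for col in expected.keys() - row.keys():
        problems.append(f"missing column: {col}")
    for col in row.keys() - expected.keys():
        problems.append(f"unexpected column: {col}")
    for col in expected.keys() & row.keys():
        if not isinstance(row[col], expected[col]):
            problems.append(f"wrong type for {col}: {type(row[col]).__name__}")
    return sorted(problems)


# The TPA renamed paid_loss to paid_amount without notice
incoming = {"policy_id": "P1", "claim_id": "C9", "paid_amount": 100.0, "report_month": "2023-06"}
violations = schema_violations(incoming)
print(violations)
```

Running a check like this at ingestion turns a silent schema change into an explicit failure before any feature pipeline consumes the file.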
Step 9: Assess bias and fairness risks
For models affecting underwriting or claims, run a bias scan on protected classes and likely proxies for them (e.g., age, gender, and ZIP code as a proxy). Use tools like IBM’s AI Fairness 360 or Google’s What-If Tool.
- Risk: If your training data has 10x more claims from urban ZIP codes, your model may underpredict severity for rural areas, leading to adverse selection.
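Before reaching for a fairness toolkit, a quick representation check can surface this kind of imbalance. The sketch below uses made-up group labels and counts, and the 5x alert level is an assumed threshold, not an industry standard.

```python
from collections import Counter

MAX_RATIO = 5.0  # assumed alert level; tune to your own book of business


def representation_ratio(groups):
    """Return (largest group size / smallest group size, counts per group)."""
    counts = Counter(groups)
    return max(counts.values()) / min(counts.values()), counts


# Hypothetical training set: 1,000 urban claims and 100 rural claims
training_groups = ["urban"] * 1000 + ["rural"] * 100

ratio, counts = representation_ratio(training_groups)
imbalanced = ratio > MAX_RATIO
print(ratio, dict(counts), imbalanced)
```

A high ratio is a prompt to inspect per-group error metrics, not proof of bias on its own.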
Resource estimate (Phase 3)
- Analyst time: 2–3 weeks.
- Tooling: OpenLineage (free), Collibra (~$100K/year), or manual tracking in Jira.
Phase 4: Feature Engineering Readiness
Step 10: Evaluate feature availability and latency
For each feature in your model spec, ask:
- Is it available at prediction time? For example, a telematics feature like “average speed over 7 days” must be calculable at renewal, not just at claim time.
- What’s the latency? If a bordereaux from the TPA arrives monthly, you can’t use it for real-time underwriting.
Example: A carrier tried to use “prior claim frequency” as a feature for new business underwriting, but the TPA’s loss runs arrived 6 weeks after policy inception. The model’s lift vanished.
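The availability question reduces to a date comparison: has the data describing a given point in time been delivered by the moment you need to score? A minimal sketch follows, with the dates chosen for illustration and the 42-day lag taken from the six-week example above.

```python
from datetime import date, timedelta


def available_at_prediction(prediction_date, data_as_of, delivery_lag_days):
    """True if data describing `data_as_of` has arrived by `prediction_date`."""
    return data_as_of + timedelta(days=delivery_lag_days) <= prediction_date


inception = date(2023, 1, 1)   # illustrative policy inception
renewal = date(2024, 1, 1)     # illustrative renewal date
LOSS_RUN_LAG_DAYS = 42         # loss runs arrive six weeks late

# New business: scoring happens at inception, before the loss runs arrive
usable_for_new_business = available_at_prediction(inception, inception, LOSS_RUN_LAG_DAYS)

# Renewal: the same data has long since been delivered
usable_for_renewal = available_at_prediction(renewal, inception, LOSS_RUN_LAG_DAYS)

print(usable_for_new_business, usable_for_renewal)
```

Running this check for every feature in the model spec, per use case, shows which features must be dropped or replaced before training begins.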
Step 11: Test for leakage
Common leakage sources in insurance:
- Post-loss features: Including “incurred loss amount” in a renewal underwriting model.
- Look-ahead features: Using “loss ratio for the policy year” before it’s closed.
- Aggregated features: “Total paid losses” for the policyholder without accounting for IBNR.
Trade-off: Detecting leakage requires deep domain knowledge. One team spent 8 weeks debugging a model that used “adjuster notes” as a feature—only to realize the notes were written after the claim was flagged for fraud.
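One mechanical check that catches some of these cases is to record when each feature value becomes known and compare that to the prediction date. The sketch below assumes you can assign such a date to every feature; the feature names and dates are hypothetical, and this does not replace domain review.

```python
from datetime import date


def leaky_features(feature_known_dates, prediction_date):
    """Return features whose values only become known after the prediction date."""
    return sorted(
        name for name, known in feature_known_dates.items() if known > prediction_date
    )


prediction_date = date(2023, 6, 1)  # assumed renewal quote date

# Hypothetical metadata: the date each feature value is first available
feature_known_dates = {
    "driver_age": date(2022, 6, 1),
    "prior_losses_3y": date(2023, 5, 15),
    "incurred_loss_amount": date(2023, 9, 20),   # post-loss feature
    "policy_year_loss_ratio": date(2024, 6, 1),  # look-ahead: known after the year closes
}

flagged = leaky_features(feature_known_dates, prediction_date)
print(flagged)
```

The adjuster-notes case above would only be caught this way if the notes carried a reliable written-at timestamp, which is why timestamping free text at capture is worth the effort.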
Step 12: Simulate feature pipelines
Build a minimal feature pipeline to test end-to-end latency and failure modes. For example:
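```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create (or reuse) a Spark session
spark = SparkSession.builder.appName("telematics_features").getOrCreate()

# Simulate telematics feature engineering over the trailing 30 days
telematics_df = (
    spark.read.parquet("s3://telematics-raw/2023/")
    .filter(F.col("event_date") >= F.date_sub(F.current_date(), 30))
    .groupBy("policy_id")
    .agg(
        F.avg("hard_braking_events").alias("avg_hard_brakes"),
        F.max("speeding_events").alias("max_speeding_events"),
    )
)

# Write to the feature store, replacing the previous snapshot
# (Delta format requires the Delta Lake package on the cluster)
telematics_df.write.format("delta").mode("overwrite").save("s3://feature-store/telematics_latest/")
```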
Trade-off: Spark pipelines add operational overhead. A mid-size carrier spent $120K/year on Databricks compute before realizing they could downsize to dbt + Snowflake for 60% less.
Resource estimate (Phase 4)
- Data engineer time: 4–6 weeks.
- Tooling: Feature store (Feast, Tecton), Spark/Databricks, or dbt + Snowflake.
Phase 5: Model Readiness & Risk Mitigation
Step 13: Assess label quality
For supervised models, label quality is everything. For example:
- Underwriting: Labels = “loss ratio in next 12 months.” If 30% of policies have immature losses (e.g., 3 months old), your label is noisy.
- Claims: Labels = “fraud vs. non-fraud.” If SIU referrals are inconsistent (some adjusters flag everything), your label is biased.
Trade-off: Cleaning labels is labor-intensive. One team hired 5 retired adjusters for 3 months to re-label 10K claims—costing ~$75K but improving model precision by 18%.
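For the underwriting label, a cheap first step is to measure how many policies have not yet had 12 months to develop. This is a rough sketch with hypothetical policies and an assumed evaluation date; it counts whole months only and ignores loss development beyond the 12-month window.

```python
from datetime import date


def months_between(start, end):
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def split_by_maturity(policies, as_of, min_months=12):
    """Separate policy IDs into those with a mature label and those without."""
    mature, immature = [], []
    for p in policies:
        bucket = mature if months_between(p["inception"], as_of) >= min_months else immature
        bucket.append(p["policy_id"])
    return mature, immature


# Hypothetical policies and evaluation date
policies = [
    {"policy_id": "P1", "inception": date(2022, 1, 1)},
    {"policy_id": "P2", "inception": date(2023, 3, 15)},
    {"policy_id": "P3", "inception": date(2022, 6, 30)},
]
as_of = date(2023, 6, 15)

mature, immature = split_by_maturity(policies, as_of)
immature_share = len(immature) / len(policies)
print(mature, immature, round(immature_share, 2))
```

If the immature share is large, either exclude those policies from training or develop their losses to ultimate before computing the label.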
Step 14: Test for distribution shift
Insurance data drifts fast. For example:
- Catastrophe events: After a hurricane, claim severity distributions skew for years.
- Regulatory changes: New no-fault laws can invalidate territorial ratemaking assumptions.
Monitor drift using:
- Population stability index (PSI): Flag if PSI > 0.25 (per industry benchmarks).
- Kullback-Leibler divergence: For continuous features like loss ratios.
Example: A carrier’s auto underwriting model saw PSI jump from 0.12 to 0.38 after COVID-19 lockdowns—prompting a full retrain.
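Both metrics are simple to compute once a feature is binned. With e_i the baseline share and a_i the current share in bin i, PSI = Σ (a_i − e_i) · ln(a_i / e_i), which equals the sum of the two KL divergences taken in each direction. The sketch below uses made-up loss ratios and bin edges; the small `eps` floor for empty bins is a common convention, not a fixed rule.

```python
import math


def bin_proportions(values, edges, eps=1e-6):
    """Share of values per bin (bins split at sorted `edges`); eps avoids log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(v > e for e in edges)  # number of edges strictly below v
        counts[idx] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(expected, actual):
    """Population stability index between baseline and current bin shares."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) over matching bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


edges = [0.5, 0.7, 0.9]  # illustrative loss-ratio bin edges
baseline = [0.4, 0.6, 0.8, 1.0]
current = [0.4, 0.6, 0.6, 0.8, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0]

expected = bin_proportions(baseline, edges)
actual = bin_proportions(current, edges)

psi_value = psi(expected, actual)
kl_value = kl_divergence(actual, expected)
needs_review = psi_value > 0.25
print(round(psi_value, 4), round(kl_value, 4), needs_review)
```

In this toy example the PSI lands just under the 0.25 threshold, which shows how sensitive the flag is to bin choice; fix the bin edges on the training data and reuse them for every comparison.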
Step 15: Document model risk controls
For each model, define:
- Performance thresholds: Precision ≥ 85% for fraud models, MAE ≤ $2K for severity models.
- Monitoring cadence: Daily for high-velocity models (e.g., telematics underwriting), monthly for low-velocity (e.g., catastrophe reserves).
- Fallback mechanisms: If drift is detected, revert to a simpler model or manual underwriting.
Trade-off: Over-engineering monitoring adds cost. One insurtech spent $200K/year on model monitoring before realizing their fraud model’s drift was negligible.
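Writing the controls down as data keeps them reviewable and testable. The sketch below encodes the thresholds named above (precision of at least 0.85, MAE of at most $2K, PSI above 0.25 as drift); the `CONTROLS` layout and the `evaluate` function are illustrative assumptions, not a standard format.

```python
# Thresholds taken from the controls listed above; the structure is an assumption
CONTROLS = {
    "fraud": {"metric": "precision", "min": 0.85},
    "severity": {"metric": "mae", "max": 2000.0},
}
PSI_LIMIT = 0.25


def evaluate(model, metric_value, psi_value):
    """Return (action, breaches) for one monitoring run of the given model."""
    rule = CONTROLS[model]
    breaches = []
    if "min" in rule and metric_value < rule["min"]:
        breaches.append(rule["metric"])
    if "max" in rule and metric_value > rule["max"]:
        breaches.append(rule["metric"])
    if psi_value > PSI_LIMIT:
        breaches.append("psi")
    # Any breach routes traffic to the simpler model or manual underwriting
    action = "fallback" if breaches else "serve"
    return action, breaches


healthy = evaluate("fraud", metric_value=0.90, psi_value=0.10)
degraded = evaluate("severity", metric_value=2500.0, psi_value=0.30)
print(healthy, degraded)
```

Keeping the rules this small also makes it easy to see when monitoring is over-built relative to how much the model actually drifts.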
Resource estimate (Phase 5)
- Data scientist time: 3–5 weeks.
- Tooling: Evidently AI (free tier), Arize ($10K/year), or custom Python scripts.
Phase 6: Deployment & Operational Readiness
Step 16: Validate STP (Straight-Through Processing) pipelines
For models in production, test:
- Latency: Can the model score a policy in < 500ms? If not, real-time underwriting is off the table.
- Reliability: What’s the uptime SLA, and do upstream dependencies match it? One carrier’s telematics model ran at 99.5% uptime while the OEM API’s SLA was 99.9%. At 99.5%, that is roughly 44 hours of downtime per year, versus about 9 hours at 99.9%.
- Fallbacks: If the model API fails, does the system revert to a rule-based model or manual review?
Example: A carrier’s fraud model ran in a microservice with a 2-second timeout. During peak claims, latency spiked to 4 seconds, causing 12% of claims to bypass the model entirely.
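Two small helpers illustrate the reliability and fallback checks: one converts an uptime percentage into hours of downtime per year, and one wraps a model call in a timeout so that a slow response falls back to a rule instead of skipping scoring. This is a sketch under stated assumptions: `slow_model`, `rule_based`, and the $10,000 referral rule are hypothetical, and a production service would more likely enforce timeouts at the API gateway or client level.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def downtime_hours_per_year(uptime_pct):
    """Hours of downtime per year implied by an uptime percentage."""
    return (100.0 - uptime_pct) / 100.0 * HOURS_PER_YEAR


def score_with_fallback(model_fn, fallback_fn, claim, timeout_s):
    """Return (score, source); use fallback_fn if model_fn misses the timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(model_fn, claim)
    try:
        return future.result(timeout=timeout_s), "model"
    except FutureTimeout:
        return fallback_fn(claim), "fallback"
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the slow call


def slow_model(claim):
    time.sleep(0.3)  # simulates a latency spike beyond the timeout
    return 0.91


def rule_based(claim):
    # Hypothetical rule: refer large claims for manual review
    return 1.0 if claim["incurred"] > 10_000 else 0.0


result = score_with_fallback(slow_model, rule_based, {"incurred": 25_000}, timeout_s=0.05)
print(result)  # (1.0, 'fallback')
print(round(downtime_hours_per_year(99.5), 1))  # 43.8
```

With a wrapper like this, claims that arrive during a latency spike still receive a score from the rule-based path and are tagged with their source for later review.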
Step 17: Align with actuarial and regulatory reviews
For models affecting ratemaking or reserving:
- Actuarial sign-off: Review model documentation (e.g., GLMs with feature importance, neural nets with SHAP values).
- Regulatory filings: If the model is used for ratemaking, file it with the DOI. Include data lineage, feature descriptions, and validation results.
- External audit: Engage a third-party actuary to validate model assumptions (e.g., loss development patterns).
Trade-off: Regulatory filings add 2–3 months to timelines. One carrier waited 6 months for DOI approval on a predictive underwriting model, missing a competitive window.
Step 18: Plan for continuous improvement
Define a feedback loop:
- Human-in-the-loop: Adjusters flag model errors (e.g., incorrect severity predictions).
- Automated retraining: Trigger retraining when drift exceeds thresholds (e.g., PSI > 0.25).
- Model versioning: Use MLflow or SageMaker Model Registry to track versions and roll back if needed.
Example: A carrier’s telematics model retrained monthly using new data. They saw a 4% improvement in loss ratio over 12 months but had to manually approve 15% of model predictions due to edge cases.
Resource estimate (Phase 6)
- DevOps/ML engineer time: 4–6 weeks.