Insurance Data Quality AI Readiness Assessment: A Practitioner’s Checklist

I’ve seen claims teams burn months of runway because they treated data quality as an afterthought. One mid-size P&C carrier kicked off a fraud detection pilot with 12% missing claimant addresses. The model’s precision tanked, and the project was shelved. That’s avoidable. This guide walks through a repeatable framework to assess whether your data is fit for AI—before you write a single line of code.

Why a checklist is non-negotiable

Insurance data is messy by design: policy inception dates, loss runs, bordereaux from TPAs, and unstructured adjuster notes. AI models chew through this raw material. If your data has a 20% duplicate rate or 30% missing values in key rating fields, your combined ratio on AI projects will climb fast.


Phase 1: Inventory & Criticality Mapping

Start with a data catalog that answers: what exists, where it lives, who owns it, and how it’s used. Without this, you’re flying blind.

Step 1: Define your AI use case boundary

  • Goal: Narrow the scope to something bounded like “predictive underwriting for small commercial auto” or “fraud signal extraction from SIU reports.”
  • Trade-off: If you try to boil the ocean (e.g., “transform claims end-to-end”), your checklist becomes a sprawling spreadsheet that no one updates.

Example: A regional carrier targeting a 5% uplift in loss ratio through telematics underwriting limited its AI scope to renewal auto policies with ≤$500K premium.

Step 2: Map data sources to use case

Create a table like the one below. Include the data’s origin (TPA, MGA, internal system), format (CSV, JSON, PDF), and update cadence.

| Data Asset | Source System | Format | Update Frequency | Use Case Role |
|---|---|---|---|---|
| Loss Runs (last 3 years) | Guidewire ClaimCenter | CSV (via ETL) | Weekly | Target variable for severity models |
| Vehicle Telematics (last 12 months) | OEM API (via third-party telematics vendor) | JSON | Daily | Predictive underwriting features |
| SIU Referral Narratives | Internal SharePoint | Unstructured PDF | Ad-hoc | Fraud signal extraction |
| Bordereaux (TPA) | Cognizant TPA Platform | Excel (monthly batch) | Monthly | Premium audit reconciliation |

Step 3: Identify key fields and business rules

For each use case, list the 10–15 fields that directly impact outcomes. For example:

  • Underwriting: Driver age, vehicle make/model, ZIP code, prior losses.
  • Claims: Date of loss, incurred loss amount, cause of loss, adjuster notes.

Trade-off: Including “nice-to-have” fields like policyholder’s favorite color will dilute model performance and inflate data cleaning costs.

Resource estimate (Phase 1)

  • Analyst time: 2–3 weeks full-time.
  • Tooling: Collibra or Alation (if you have budget); otherwise, a shared Confluence page with manual updates.

Phase 2: Data Profiling & Quality Metrics

Step 4: Run automated profiling

Use open-source tools like ydata-profiling (Python; formerly pandas-profiling) or commercial suites like Informatica Data Quality. Focus on:

  • Completeness: % missing values per field.
  • Uniqueness: Duplicate policy numbers or claim IDs.
  • Consistency: Inconsistent date formats (MM/DD/YYYY vs. YYYY-MM-DD).
  • Validity: ZIP codes that aren’t five digits, or ICD-10 codes that don’t exist.

Example snippet:

import pandas as pd
from ydata_profiling import ProfileReport

# Load loss run data
df = pd.read_csv("loss_runs_2023.csv")

# Generate profile
profile = ProfileReport(df, title="Loss Runs 2023")
profile.to_file("loss_runs_profile.html")

Trade-off: Profiling tools can flag 100+ issues. Prioritize those that violate business rules (e.g., negative incurred loss amounts) over cosmetic outliers.
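
If you want to see what the four dimensions from Step 4 look like before reaching for a profiling suite, here is a minimal, dependency-free sketch. The sample rows, the field names (claim_id, loss_date, zip, incurred), and the choice of ISO dates as the expected format are all assumptions for illustration, not a standard schema. It also includes one business-rule check (negative incurred amounts) of the kind worth prioritizing.

```python
import re
from datetime import datetime

# Hypothetical sample rows; field names are assumptions for illustration.
rows = [
    {"claim_id": "C1", "loss_date": "2023-01-15", "zip": "30301", "incurred": 1200.0},
    {"claim_id": "C2", "loss_date": "01/20/2023", "zip": "3030", "incurred": -50.0},
    {"claim_id": "C2", "loss_date": "2023-02-02", "zip": None, "incurred": 800.0},
    {"claim_id": "C4", "loss_date": "2023-03-09", "zip": "94105", "incurred": 0.0},
]


def pct(count, total):
    """Percentage of rows, rounded to one decimal place."""
    return round(100.0 * count / total, 1) if total else 0.0


def is_iso_date(value):
    """True if value parses as YYYY-MM-DD (the format assumed to be expected here)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def profile(rows):
    n = len(rows)
    ids = [r["claim_id"] for r in rows]
    zips = [r["zip"] for r in rows]
    return {
        # Completeness: missing ZIP codes
        "zip_missing_pct": pct(sum(z is None for z in zips), n),
        # Uniqueness: rows whose claim ID repeats an earlier one
        "duplicate_claim_id_pct": pct(n - len(set(ids)), n),
        # Consistency: dates not in the expected format
        "non_iso_loss_date_pct": pct(sum(not is_iso_date(r["loss_date"]) for r in rows), n),
        # Validity: present but not exactly five digits
        "invalid_zip_pct": pct(
            sum(z is not None and not re.fullmatch(r"\d{5}", z) for z in zips), n
        ),
        # Business rule: incurred loss should never be negative
        "negative_incurred_pct": pct(sum(r["incurred"] < 0 for r in rows), n),
    }


report = profile(rows)
print(report)
```

On the four sample rows, each check trips on exactly one row, so every metric comes out at 25.0. On real data, the business-rule line is usually the one to triage first.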

Step 5: Calculate downstream impact

Quantify the cost of poor quality. For example:

  • If 15% of claims are missing “cause of loss,” your fraud model’s precision drops by ~12% (based on a 2022 study by ISO ClaimSearch).
  • If ZIP code is missing for 20% of policies, your territorial ratemaking error could exceed 8% of premium (per NAIC data).

Step 6: Validate against regulatory & actuarial standards

Check against:

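  • NAIC statutory reporting: For example, Schedule P of the Annual Statement reports loss development by accident year, so the underlying payment history needs to be auditable. If your data lacks transaction IDs for each loss payment, you will struggle to reconcile model inputs to what was filed.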
  • IFRS 17: For reserving models, ensure granularity for each cash-flow component (e.g., claims handling expenses broken down by adjuster).

Trade-off: Fixing regulatory gaps is mandatory but often requires manual workarounds (e.g., retrofitting legacy systems). This can delay AI timelines by 3–6 months.

Resource estimate (Phase 2)

  • Analyst time: 3–4 weeks.
  • Tooling: ydata-profiling (free; formerly pandas-profiling), Informatica Data Quality (~$50K/year for mid-size carriers), or Great Expectations (open-source).

Phase 3: Data Lineage & Governance Gaps

Step 7: Trace lineage from source to model input

Use a lightweight tool like OpenLineage or manually document:

  • ETL/ELT pipelines that transform raw claims data into features.
  • Downstream systems (e.g., rating engine) that consume the data.
  • Manual overrides (e.g., an underwriter changing a ZIP code post-issuance).

Example: A carrier’s telematics pipeline pulled “hard braking events” from an OEM API, but the API refreshed daily while the rating engine expected real-time data. This caused a 7% increase in model error during validation.

Step 8: Identify governance gaps

Check for:

  • Access controls: Can only 3 people in claims access loss runs? This will bottleneck feature engineering.
  • Metadata standards: Are field definitions consistent across systems? For example, “incurred loss” might mean different things in ClaimCenter vs. the general ledger.
  • Change management: Does the TPA update bordereaux schemas without notice? If so, your automated feature pipelines will break.

Trade-off: Implementing strict governance (e.g., data contracts) adds friction but prevents pipeline failures. One carrier saw a 40% reduction in downstream breaks after adopting Great Expectations for schema validation.
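
A data contract does not have to start as a platform purchase. The sketch below shows the idea in plain Python: compare an incoming file's header and one sample row against an agreed column list. The column names and types in EXPECTED_COLUMNS are hypothetical, and this is not how Great Expectations works internally; it only illustrates the kind of check such a tool automates.

```python
# Hypothetical contract for a monthly TPA bordereaux file (names are assumptions).
EXPECTED_COLUMNS = {
    "policy_id": str,
    "claim_id": str,
    "paid_loss": float,
    "loss_date": str,
}


def check_contract(header, sample_row, expected=EXPECTED_COLUMNS):
    """Return a list of human-readable contract violations (empty if none)."""
    problems = []
    for col in expected:
        if col not in header:
            problems.append(f"missing column: {col}")
    for col in header:
        if col not in expected:
            problems.append(f"unexpected column: {col}")
    for col, expected_type in expected.items():
        if col in sample_row and not isinstance(sample_row[col], expected_type):
            actual = type(sample_row[col]).__name__
            problems.append(f"wrong type for {col}: {actual}")
    return problems


# The TPA silently renamed paid_loss to paid_amount in this month's file.
header = ["policy_id", "claim_id", "paid_amount", "loss_date"]
row = {"policy_id": "P-100", "claim_id": "C-9", "paid_amount": 1250.0, "loss_date": "2023-05-02"}

violations = check_contract(header, row)
print(violations)
```

Running a check like this at ingestion, and failing loudly, turns a silent schema change into a ticket instead of a broken feature pipeline.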

Step 9: Assess bias and fairness risks

For models affecting underwriting or claims, run a bias scan on protected classes (e.g., age, gender) and on fields that can act as proxies for them (e.g., ZIP code). Use tools like IBM’s AI Fairness 360 or Google’s What-If Tool.

  • Risk: If your training data has 10x more claims from urban ZIP codes, your model may underpredict severity for rural areas, leading to adverse selection.
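
Before running a full fairness toolkit, a quick segment-level sanity check can surface the urban/rural problem described above. The sketch below is not a substitute for AI Fairness 360; it only counts records per segment and computes the mean signed error (predicted minus actual severity). The segment labels and dollar amounts are made up for illustration.

```python
from collections import defaultdict


def segment_report(records):
    """records: iterable of (segment, actual_severity, predicted_severity)."""
    stats = defaultdict(lambda: {"n": 0, "signed_error": 0.0})
    for segment, actual, predicted in records:
        stats[segment]["n"] += 1
        stats[segment]["signed_error"] += predicted - actual
    return {
        seg: {"n": s["n"], "mean_signed_error": s["signed_error"] / s["n"]}
        for seg, s in stats.items()
    }


# Hypothetical validation records: urban claims dominate the sample.
records = [
    ("urban", 5000.0, 5100.0),
    ("urban", 4000.0, 3900.0),
    ("urban", 6000.0, 6000.0),
    ("urban", 5000.0, 5000.0),
    ("rural", 9000.0, 6000.0),
]

report = segment_report(records)
print(report)
```

A strongly negative mean signed error in a thinly represented segment (here, rural at -3000.0 on a single record) is the pattern to look for: the model underpredicts exactly where it has seen the least data.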

Resource estimate (Phase 3)

  • Analyst time: 2–3 weeks.
  • Tooling: OpenLineage (free), Collibra (~$100K/year), or manual tracking in Jira.

Phase 4: Feature Engineering Readiness

Step 10: Evaluate feature availability and latency

For each feature in your model spec, ask:

  • Is it available at prediction time? For example, a telematics feature like “average speed over 7 days” must be calculable at renewal, not just at claim time.
  • What’s the latency? If a bordereaux from the TPA arrives monthly, you can’t use it for real-time underwriting.

Example: A carrier tried to use “prior claim frequency” as a feature for new business underwriting, but the TPA’s loss runs arrived 6 weeks after policy inception. The model’s lift vanished.
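
The availability question reduces to date arithmetic: when was the data observed, how long does delivery take, and when is the decision made? A minimal sketch follows, with a hypothetical helper name and made-up dates. The 42-day lag mirrors the six-week loss-run delay in the example above.

```python
from datetime import date, timedelta


def usable_at_decision(decision_date, data_as_of, delivery_lag_days):
    """True if data observed on data_as_of has been delivered by decision_date."""
    delivered_on = data_as_of + timedelta(days=delivery_lag_days)
    return delivered_on <= decision_date


inception = date(2023, 6, 1)   # hypothetical new-business decision date
as_of = date(2023, 5, 31)      # last day the source data describes

# Loss runs arrive about six weeks late; telematics arrives the next day.
loss_runs_ok = usable_at_decision(inception, as_of, delivery_lag_days=42)
telematics_ok = usable_at_decision(inception, as_of, delivery_lag_days=1)

print(loss_runs_ok, telematics_ok)
```

Running this check for every feature in the model spec, using each source's real delivery lag, is a cheap way to catch features that only exist in the training set.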

Step 11: Test for leakage

Common leakage sources in insurance:

  • Post-loss features: Including “incurred loss amount” in a renewal underwriting model.
  • Look-ahead features: Using “loss ratio for the policy year” before it’s closed.
  • Aggregated features: “Total paid losses” for the policyholder without accounting for IBNR.

Trade-off: Detecting leakage requires deep domain knowledge. One team spent 8 weeks debugging a model that used “adjuster notes” as a feature—only to realize the notes were written after the claim was flagged for fraud.
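
Domain review is still the main defense, but one class of leakage can be screened mechanically if your sources carry timestamps: any feature recorded at or after the event that defines the label. The sketch below assumes you can attach a recorded-at timestamp to each feature value; the feature names and times are hypothetical.

```python
from datetime import datetime


def leaking_features(feature_timestamps, label_event_time):
    """Return names of features recorded at or after the label event."""
    return sorted(
        name for name, ts in feature_timestamps.items() if ts >= label_event_time
    )


# Hypothetical claim: flagged for fraud on April 10.
fraud_flagged_at = datetime(2023, 4, 10, 9, 0)
feature_timestamps = {
    "fnol_channel": datetime(2023, 4, 1, 14, 30),
    "prior_claim_count": datetime(2023, 4, 1, 14, 30),
    "adjuster_notes": datetime(2023, 4, 12, 16, 45),  # written after the flag
}

suspects = leaking_features(feature_timestamps, fraud_flagged_at)
print(suspects)
```

Here only adjuster_notes is reported. This catches the timing problem, not subtler leakage such as aggregates that quietly include post-loss payments.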

Step 12: Simulate feature pipelines

Build a minimal feature pipeline to test end-to-end latency and failure modes. For example:

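from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Reuse the active session if one exists (e.g., on Databricks); otherwise create one
spark = SparkSession.builder.appName("telematics-features").getOrCreate()

# Aggregate the last 30 days of telematics events per policy
telematics_df = (
    spark.read.parquet("s3://telematics-raw/2023/")
    .filter(F.col("event_date") >= F.date_sub(F.current_date(), 30))
    .groupBy("policy_id")
    .agg(
        F.avg("hard_braking_events").alias("avg_hard_brakes"),
        F.max("speeding_events").alias("max_speeding_events"),
    )
)

# Write to the feature store, replacing the previous snapshot so reruns don't fail
telematics_df.write.format("delta").mode("overwrite").save("s3://feature-store/telematics_latest/")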

Trade-off: Spark pipelines add operational overhead. A mid-size carrier spent $120K/year on Databricks compute before realizing they could downsize to dbt + Snowflake for 60% less.

Resource estimate (Phase 4)

  • Data engineer time: 4–6 weeks.
  • Tooling: Feature store (Feast, Tecton), Spark/Databricks, or dbt + Snowflake.

Phase 5: Model Readiness & Risk Mitigation

Step 13: Assess label quality

For supervised models, label quality is everything. For example:

  • Underwriting: Labels = “loss ratio in next 12 months.” If 30% of policies have immature losses (e.g., 3 months old), your label is noisy.
  • Claims: Labels = “fraud vs. non-fraud.” If SIU referrals are inconsistent (some adjusters flag everything), your label is biased.

Trade-off: Cleaning labels is labor-intensive. One team hired 5 retired adjusters for 3 months to re-label 10K claims—costing ~$75K but improving model precision by 18%.
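
For the underwriting label, the cheapest cleanup is often a maturity filter: drop policies that have not yet had a full 12 months to develop. A minimal sketch, assuming each policy record carries an inception date and that maturity is measured in whole months from inception to the valuation date (both assumptions; your actuarial team may define maturity differently):

```python
from datetime import date


def months_between(start, end):
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def mature_policies(policies, valuation_date, min_months=12):
    """Keep IDs of policies with at least min_months of loss development."""
    return [
        policy_id
        for policy_id, inception in policies
        if months_between(inception, valuation_date) >= min_months
    ]


# Hypothetical policies: (policy_id, inception_date)
policies = [
    ("P1", date(2023, 1, 1)),
    ("P2", date(2024, 3, 31)),   # only about three months old at valuation
    ("P3", date(2023, 6, 30)),
]

kept = mature_policies(policies, valuation_date=date(2024, 6, 30))
print(kept)
```

With a June 30, 2024 valuation date, P2 is excluded and P1 and P3 are kept. The trade-off is sample size: a strict maturity filter removes your most recent, and often most relevant, business.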

Step 14: Test for distribution shift

Insurance data drifts fast. For example:

  • Catastrophe events: After a hurricane, claim severity distributions skew for years.
  • Regulatory changes: New no-fault laws can invalidate territorial ratemaking assumptions.

Monitor drift using:

  • Population stability index (PSI): Flag if PSI > 0.25 (a widely used rule-of-thumb threshold, not a formal standard).
  • Kullback-Leibler divergence: For continuous features like loss ratios.

Example: A carrier’s auto underwriting model saw PSI jump from 0.12 to 0.38 after COVID-19 lockdowns—prompting a full retrain.
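
Both drift metrics are short enough to compute by hand, which helps when you need to explain them to an actuary. The sketch below takes pre-binned counts (baseline vs. current) and computes PSI as the sum over bins of (q - p) * ln(q / p), where p and q are the baseline and current bin shares. The bin counts are made up, and the small floor on empty bins is one common workaround rather than a standard.

```python
import math


def _shares(counts, eps=1e-6):
    """Convert bin counts to proportions, flooring at eps to avoid log(0)."""
    total = sum(counts)
    return [max(c / total, eps) for c in counts]


def kl_divergence(p_counts, q_counts):
    """KL(p || q) over matching bins."""
    p, q = _shares(p_counts), _shares(q_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def psi(baseline_counts, current_counts):
    """Population stability index over matching bins."""
    p, q = _shares(baseline_counts), _shares(current_counts)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical quartile bins of a rating feature.
baseline = [25, 25, 25, 25]
mild_shift = [10, 20, 30, 40]
strong_shift = [5, 15, 30, 50]

print(round(psi(baseline, mild_shift), 4))
print(round(psi(baseline, strong_shift), 4))
```

The mild shift lands at about 0.23, just under the 0.25 threshold, while the strong shift clears it comfortably. PSI is the symmetrized form of KL divergence: it equals KL(p || q) + KL(q || p), so the two metrics listed above are closely related rather than independent checks.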

Step 15: Document model risk controls

For each model, define:

  • Performance thresholds: Precision ≥ 85% for fraud models, MAE ≤ $2K for severity models.
  • Monitoring cadence: Daily for high-velocity models (e.g., telematics underwriting), monthly for low-velocity (e.g., catastrophe reserves).
  • Fallback mechanisms: If drift is detected, revert to a simpler model or manual underwriting.

Trade-off: Over-engineering monitoring adds cost. One insurtech spent $200K/year on model monitoring before realizing their fraud model’s drift was negligible.
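
Controls are easier to audit when they live in one small, versioned object rather than scattered across dashboards. The sketch below encodes the example thresholds from this step (precision of at least 85%, PSI above 0.25 as the drift trigger). The class name, the action labels, and the choice to send a precision breach to human review are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskControls:
    min_precision: float   # performance threshold
    max_psi: float         # drift threshold


def next_action(controls, precision, psi_value):
    """Decide what to do with the model given the latest monitoring values."""
    if psi_value > controls.max_psi:
        # Drift detected: revert to a simpler model or manual underwriting.
        return "fallback"
    if precision < controls.min_precision:
        # Assumed policy: performance breach without drift goes to human review.
        return "review"
    return "serve"


fraud_controls = RiskControls(min_precision=0.85, max_psi=0.25)

print(next_action(fraud_controls, precision=0.90, psi_value=0.12))
print(next_action(fraud_controls, precision=0.80, psi_value=0.12))
print(next_action(fraud_controls, precision=0.90, psi_value=0.38))
```

The three calls return "serve", "review", and "fallback" respectively. Keeping the decision in a pure function like this also makes it trivial to unit test before a regulator or auditor asks how the fallback is triggered.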

Resource estimate (Phase 5)

  • Data scientist time: 3–5 weeks.
  • Tooling: Evidently AI (free tier), Arize ($10K/year), or custom Python scripts.

Phase 6: Deployment & Operational Readiness

Step 16: Validate STP (Straight-Through Processing) pipelines

For models in production, test:

  • Latency: Can the model score a policy in < 500ms? If not, real-time underwriting is off the table.
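  • Reliability: What’s the uptime SLA? One carrier’s telematics model had 99.5% uptime while the OEM API’s SLA was 99.9%. At 8,760 hours per year, that is roughly 44 hours of model downtime against about 9 hours allowed for the API, so the model itself was the weak link.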
  • Fallbacks: If the model API fails, does the system revert to a rule-based model or manual review?

Example: A carrier’s fraud model ran in a microservice with a 2-second timeout. During peak claims, latency spiked to 4 seconds, causing 12% of claims to bypass the model entirely.
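
Uptime percentages are easy to misread, so it is worth converting them to hours before signing off on an SLA. The sketch below does that conversion and also estimates end-to-end availability when a request must pass through several components in series. The serial formula assumes failures are independent, which is a simplification; the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


def downtime_hours(uptime_pct):
    """Expected hours of downtime per year at a given uptime percentage."""
    return (1 - uptime_pct / 100) * HOURS_PER_YEAR


def serial_availability(*uptime_pcts):
    """Combined uptime (%) when every component must be up, assuming independence."""
    combined = 1.0
    for pct in uptime_pcts:
        combined *= pct / 100
    return combined * 100


print(round(downtime_hours(99.5), 2))   # model service
print(round(downtime_hours(99.9), 2))   # upstream API
print(round(serial_availability(99.5, 99.9), 4))
```

A 99.5% service is down about 43.8 hours a year and a 99.9% service about 8.76 hours. Chained together they deliver roughly 99.4%, which is why the fallback path in the next bullet matters as much as either SLA.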

Step 17: Align with actuarial and regulatory reviews

For models affecting ratemaking or reserving:

  • Actuarial sign-off: Review model documentation (e.g., GLMs with feature importance, neural nets with SHAP values).
  • Regulatory filings: If the model is used for ratemaking, file it with the DOI. Include data lineage, feature descriptions, and validation results.
  • External audit: Engage a third-party actuary to validate model assumptions (e.g., loss development patterns).

Trade-off: Regulatory filings add 2–3 months to timelines. One carrier waited 6 months for DOI approval on a predictive underwriting model, missing a competitive window.

Step 18: Plan for continuous improvement

Define a feedback loop:

  • Human-in-the-loop: Adjusters flag model errors (e.g., incorrect severity predictions).
  • Automated retraining: Trigger retraining when drift exceeds thresholds (e.g., PSI > 0.25).
  • Model versioning: Use MLflow or SageMaker Model Registry to track versions and roll back if needed.

Example: A carrier’s telematics model retrained monthly using new data. They saw a 4% improvement in loss ratio over 12 months but had to manually approve 15% of model predictions due to edge cases.

Resource estimate (Phase 6)

  • DevOps/ML engineer time: 4–6 weeks.