AI in Insurance Is a Regulatory Powder Keg — And Most Carriers Aren’t Ready

By 2027, the EU’s AI Act will force every insurer operating in Europe to classify each AI model as "unacceptable risk" (banned), "high risk" (heavily regulated), or "limited/minimal risk" (largely self-governed). The catch? Most carriers still treat AI governance like an IT checkbox instead of a core business risk. In a 2024 Deloitte survey, 68% of insurers said they lack a formal AI governance framework. That number rises to 84% for P&C carriers using AI in underwriting.

While regulators scramble to define rules, the market is already fragmenting. Some insurers are building internal frameworks that exceed regulatory minimums. Others are outsourcing governance to third-party risk models that may not align with regional laws. The result? A compliance minefield where one misstep could trigger fines, reputational damage, or even litigation.

This isn’t just about ticking boxes for AI use in claims or pricing. It’s about proving to regulators that your models are auditable, explainable, and fair — or facing penalties that could dwarf GDPR fines. Let’s break down what an AI governance framework actually needs to include, where the biggest gaps lie, and why the clock is ticking faster than most realize.

---

Why AI Governance in Insurance Isn’t Optional Anymore

Insurance is one of the most regulated industries in the world, but until recently, AI governance was treated as an afterthought. That changed in 2023 when the European Banking Authority (EBA) and European Insurance and Occupational Pensions Authority (EIOPA) jointly warned that AI models could violate anti-discrimination laws under the EU’s Gender Equality Directive and the Race Equality Directive. Within months, the EU AI Act was finalized, placing AI systems used for risk assessment and pricing of life and health insurance for individuals in the "high risk" category.

That classification isn’t symbolic. High-risk AI systems require:

  • Risk management systems that identify and mitigate bias, errors, and unintended outcomes.
  • Data governance that tracks lineage, quality, and compliance with data protection laws.
  • Transparency and explainability — not just for regulators, but for policyholders who may challenge decisions.
  • Human oversight in critical processes like underwriting and claims adjudication.

In the U.S., the regulatory picture is even messier. The NAIC AI Task Force has been working on model governance principles since 2019, but progress has been slow. Only 14 states have adopted any AI-specific regulations, and those that have are inconsistent. California’s SB 1199, for example, requires insurers to disclose AI use in underwriting, while New York’s Circular Letter 1 (2023) demands annual model risk reports. Meanwhile, the Federal Reserve is quietly pushing banks and insurers to adopt the same principles as the FDIC’s AI risk management framework, even if it’s not yet law.

The trade-off here is clear: carriers that wait for regulators to act will face higher compliance costs and potential legal exposure. Those that act now are spending heavily to stay ahead. Allianz recently disclosed it spent €50 million in 2023 on AI risk management, including hiring former regulators to audit its models. AXA went further, creating a global AI Ethics Board with external experts to pre-approve high-risk models before deployment.

---

The Core Components of an Insurance AI Governance Framework

An effective AI governance framework isn’t just a policy document — it’s a living system that evolves with regulatory changes and business needs. Here’s what it must include, broken into six critical layers.

1. Model Inventory and Classification

First, you need a single source of truth for every AI model in use. This isn’t just a spreadsheet — it’s a dynamic register that tracks:

  • Model purpose (e.g., fraud detection, dynamic pricing, claims triage).
  • Risk tier (based on the EU AI Act, NAIC principles, or internal criteria).
  • Data sources and lineage (where data comes from, how it’s transformed).
  • Deployment status (pilot, production, deprecated).
  • Model owner, developer, and validator.
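As a rough illustration of the fields above, a register entry could be modeled as a small data structure. This is a minimal sketch, not a description of any vendor tool: the names `ModelRecord`, `ModelRegistry`, and `Status`, and the rule that rejects entries with no recorded data sources, are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PILOT = "pilot"
    PRODUCTION = "production"
    DEPRECATED = "deprecated"


@dataclass
class ModelRecord:
    """One entry in a hypothetical model register."""
    model_id: str
    purpose: str              # e.g. "fraud detection"
    risk_tier: int            # 1 = highest risk under internal criteria
    data_sources: list[str]   # upstream datasets, in lineage order
    status: Status
    owner: str
    developer: str
    validator: str


class ModelRegistry:
    """In-memory register keyed by model ID (illustrative only)."""

    def __init__(self) -> None:
        self._records: dict[str, ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        # Assumed rule: a model without traceable data sources is rejected.
        if not record.data_sources:
            raise ValueError(f"{record.model_id}: data lineage is required")
        self._records[record.model_id] = record

    def in_production(self, max_tier: int) -> list[ModelRecord]:
        """Production models whose tier number is max_tier or lower."""
        return [
            r for r in self._records.values()
            if r.status is Status.PRODUCTION and r.risk_tier <= max_tier
        ]
```

A query such as `registry.in_production(max_tier=2)` would then list the live models that need the heaviest oversight, which is typically the first thing an auditor asks for.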

EY found that 42% of insurers can’t fully trace their AI models’ data lineage, a critical failure when regulators demand explanations for denied claims or pricing decisions. The fix? Automated model discovery tools like Fiddler AI or Aria Systems’ Model Governance, which integrate with cloud platforms to log models in real time.

Limitation: These tools can’t fix poor data quality. If your training data is biased or incomplete, the best governance framework won’t save you from regulatory scrutiny.

2. Risk Assessment and Tiering

Not all AI models carry the same risk. A chatbot that schedules appointments poses far less regulatory risk than a deep learning model that predicts mortality rates. Most insurers use a tiering system like this:

| Tier | Examples | Regulatory Requirements | Example Insurers |
|---|---|---|---|
| Tier 1: Critical | Underwriting models, mortality predictions, fraud detection in claims | Full model risk management, annual third-party audits, human-in-the-loop validation | Swiss Re, Munich Re |
| Tier 2: High | Dynamic pricing, chatbots for policy servicing, customer segmentation | Documented explainability, bias testing, periodic reviews | Lemonade, Hippo |
| Tier 3: Medium | Predictive maintenance for insured assets, internal analytics | Self-assessment, transparency reports | Most mid-size P&C carriers |
| Tier 4: Low | Marketing personalization, internal NLP for document processing | Minimal oversight, no regulatory filings | Smaller MGAs, InsurTechs |

PwC recommends using a risk matrix that weighs:

  • Impact (financial, reputational, legal harm from model failure).
  • Frequency (how often the model is used in critical decisions).
  • Uncertainty (how well the model’s behavior is understood).
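One way to turn these three factors into a tier is a simple multiplicative score. The sketch below is an assumption-laden illustration: the 1-to-5 rating scale, the product formula, and the cut-off values are all hypothetical, not PwC's published method, and a real framework would calibrate them against its own portfolio.

```python
def risk_tier(impact: int, frequency: int, uncertainty: int) -> int:
    """Map three 1-5 ratings to a tier from 1 (critical) to 4 (low).

    The scale and thresholds are hypothetical placeholders.
    """
    ratings = {"impact": impact, "frequency": frequency, "uncertainty": uncertainty}
    for name, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")

    score = impact * frequency * uncertainty  # ranges from 1 to 125
    if score >= 60:
        return 1
    if score >= 27:
        return 2
    if score >= 8:
        return 3
    return 4


# Example ratings (made up for illustration).
underwriting_tier = risk_tier(impact=5, frequency=4, uncertainty=3)  # score 60
pricing_tier = risk_tier(impact=4, frequency=4, uncertainty=2)       # score 32
marketing_tier = risk_tier(impact=1, frequency=2, uncertainty=2)     # score 4
```

Re-running the function whenever a rating changes is a cheap way to approximate tiering that moves as the model does.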

For example, Liberty Mutual classifies its AI underwriting model as Tier 1, requiring sign-off from its Chief Risk Officer, CIO, and external auditors. The model, which uses 15 years of historical claims data, was paused for six months in 2023 after an internal audit flagged potential bias in pricing for certain ZIP codes.

Trade-off: Over-classifying models as high-risk increases compliance costs but reduces liability. Under-classifying risks regulatory penalties. The sweet spot is dynamic tiering that adjusts as models evolve.

3. Bias Testing and Fairness Monitoring

Bias in insurance AI isn’t theoretical — it’s already costing carriers. In 2022, ProPublica analyzed a major insurer’s pricing model and found it charged Black policyholders 10% more on average for the same coverage. The insurer, which was not named, settled a class-action lawsuit for $14 million.

To avoid this, insurers need a bias testing framework that goes beyond basic demographic parity. Fairlearn, an open-source toolkit from Microsoft, is gaining traction, but it’s not enough on its own. Insurers should also run:

  • Segment analysis: Test model performance across protected classes (race, gender, age) and common proxies for them, such as ZIP code.
  • Counterfactual fairness: Check if the model would give the same decision if a policyholder’s protected attributes were changed.
  • Adversarial testing: Use synthetic data to probe for hidden biases that might not appear in real-world datasets.
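The first two checks can be sketched in plain Python. This is a simplified illustration under stated assumptions: the record layout (dicts with an `approved` flag), the function names, and the toy model in the usage note are hypothetical, and the attribute-swap test is only a rough proxy for counterfactual fairness, which in its formal sense also accounts for features causally downstream of the protected attribute.

```python
from collections import defaultdict
from typing import Callable


def approval_rates(decisions: list[dict], group_key: str) -> dict[str, float]:
    """Segment analysis: share of approved applications per group."""
    totals: dict[str, int] = defaultdict(int)
    approved: dict[str, int] = defaultdict(int)
    for d in decisions:
        totals[d[group_key]] += 1
        approved[d[group_key]] += int(d["approved"])
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: dict[str, float]) -> float:
    """Lowest group approval rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


def counterfactual_flips(
    model: Callable[[dict], bool],
    applicants: list[dict],
    attribute: str,
    alternative: str,
) -> list[dict]:
    """Applicants whose decision changes when only `attribute` is swapped."""
    flipped = []
    for applicant in applicants:
        counterfactual = {**applicant, attribute: alternative}
        if model(applicant) != model(counterfactual):
            flipped.append(applicant)
    return flipped
```

For instance, passing a deliberately biased toy model such as `lambda a: a["credit_score"] >= 650 and a["gender"] != "F"` to `counterfactual_flips` with `attribute="gender"` and `alternative="M"` returns every otherwise-qualified applicant the model rejected on gender alone.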

Verisk offers a commercial tool called AI Fairness, which integrates with underwriting models to flag biased predictions in real time. Guidewire’s Cyence Risk Analytics goes further, simulating how changes in socioeconomic factors (e.g., gentrification) could impact loss ratios over time.

Limitation: Bias testing is only as good as the data you feed it. If your training data excludes certain demographics (a common issue in legacy underwriting datasets), no fairness tool will fix the underlying problem.

4. Explainability and Transparency

Regulators aren’t just asking for model transparency — they’re demanding it in a way that’s understandable to non-technical stakeholders. This is where most insurers fail. McKinsey found that 73% of insurers can’t explain how their AI models arrive at decisions in plain language.

Explainability isn’t a one-size-fits-all requirement. The approach depends on the model type:

  • Linear models (e.g., GLMs): Coefficients are inherently explainable, but you still need to document how they’re used in underwriting.
  • Tree-based models (e.g., XGBoost, Random Forest): Use SHAP values or LIME to show feature importance. DataRobot and H2O.ai offer built-in explainability tools for these models.
  • Deep learning (e.g., neural networks): Requires post-hoc explainability methods such as LIME, SHAP, or, for convolutional networks on image data, Grad-CAM. These are less precise but still provide some insight into model behavior.
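For the linear-model case, the explanation can be read straight off the coefficients. The sketch below assumes a GLM with a log link, where each feature contributes a multiplicative relativity of exp(coefficient × value) to the premium. The feature names, coefficient values, and helper functions are hypothetical and chosen only to show the mechanics.

```python
import math


def explain_glm_premium(
    intercept: float,
    coefficients: dict[str, float],
    features: dict[str, float],
) -> tuple[float, dict[str, float]]:
    """Return the premium and per-feature relativities for a log-link GLM."""
    relativities = {
        name: math.exp(coefficients[name] * value)
        for name, value in features.items()
    }
    premium = math.exp(intercept) * math.prod(relativities.values())
    return premium, relativities


def plain_language_summary(relativities: dict[str, float]) -> list[str]:
    """One sentence per feature, largest effect first."""
    ranked = sorted(
        relativities.items(),
        key=lambda item: abs(math.log(item[1])),
        reverse=True,
    )
    lines = []
    for name, relativity in ranked:
        if relativity == 1.0:
            continue  # feature has no effect on this quote
        change = (relativity - 1.0) * 100
        direction = "raises" if change > 0 else "lowers"
        lines.append(f"{name} {direction} the premium by {abs(change):.0f}%")
    return lines


# Hypothetical property quote: base premium of 500 before any adjustments.
premium, relativities = explain_glm_premium(
    intercept=math.log(500),
    coefficients={"roof_age_years": 0.02, "wildfire_zone": 0.4},
    features={"roof_age_years": 10, "wildfire_zone": 1},
)
summary = plain_language_summary(relativities)
```

Because the relativities multiply, each sentence in `summary` describes a feature's effect holding the others fixed, which is usually the framing a regulator or policyholder expects.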

For high-risk models, insurers should also provide:

  • Model cards: Standardized documents that explain the model’s purpose, limitations, and performance metrics. Google’s Model Cards are a good template.
  • Human-readable summaries: A plain-language explanation of how the model affects policyholders. Lloyd’s of London now requires this for all AI-driven underwriting models.
  • Audit trails: A log of every model decision, including inputs, outputs, and the rationale behind the decision. This is critical for regulatory requests.
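An audit trail entry of the kind described in the last bullet might look like the following. This is a minimal sketch: the field names, the JSON Lines layout, and the example values are assumptions, and a production log would also need access controls and tamper protection that are out of scope here.

```python
import io
import json
from datetime import datetime, timezone
from typing import Any, TextIO


def build_audit_record(
    model_id: str,
    model_version: str,
    inputs: dict[str, Any],
    output: str,
    rationale: list[str],
) -> dict[str, Any]:
    """Bundle one model decision with the context needed to reconstruct it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "model_version": model_version,
        "inputs": inputs,
        "output": output,
        "rationale": rationale,
    }


def write_audit_record(stream: TextIO, record: dict[str, Any]) -> None:
    """Append the record as one JSON object per line."""
    stream.write(json.dumps(record, sort_keys=True) + "\n")


# Usage with an in-memory stream; a real system would write to durable storage.
log = io.StringIO()
write_audit_record(
    log,
    build_audit_record(
        model_id="property-uw",            # hypothetical identifier
        model_version="2.3.1",             # hypothetical version
        inputs={"roof_age_years": 25, "wildfire_zone": True},
        output="declined",
        rationale=["roof age above threshold", "wildfire-prone area"],
    ),
)
```

Recording the model version alongside inputs and rationale matters because a regulator's question often arrives long after the model that made the decision has been retrained.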

Chubb faced a regulatory challenge in 2023 when a state insurance department demanded an explanation for why a commercial property policy was denied. The insurer’s AI model flagged the risk as "high" due to a combination of factors, including roof age and proximity to wildfire-prone areas. While the model’s decision was technically correct, Chubb struggled to explain it in a way that satisfied the regulator. The company now uses Anaconda’s AI governance suite to generate automated explanations for denied policies.

Trade-off: More granular explainability increases operational overhead. A model that requires a 10-page explanation for each decision may not be scalable for high-volume underwriting.

5. Human Oversight and Escalation Pathways

AI governance frameworks often pay lip service to "human-in-the-loop" oversight, but in practice, many insurers treat it as a rubber stamp. That’s a mistake. The EU AI Act explicitly requires "meaningful human oversight" for high-risk models, meaning humans must be able to:

  • Override model decisions without undue delay.
  • Understand the model’s limitations and potential failure modes.
  • Escalate issues to a designated oversight committee.

Most insurers implement this through a tiered approval system:

| Model Type | Oversight Level | Approval Path | Example |
|---|---|---|---|
| Tier 1 (Underwriting) | Executive Committee | CRO, CIO, and CEO sign-off for deployment; quarterly reviews | Prudential Financial |
| Tier 2 (Dynamic Pricing) | Model Risk Manager | Automated alerts for anomalies; monthly reviews | Progressive Insurance |
| Tier 3 (Customer Segmentation) | Line of Business Head | Ad-hoc reviews for complaints; annual audits | Many regional carriers |
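The approval paths in the table can be enforced in code as a deployment gate. The sketch below is illustrative only: the role names are taken from the table, but the `REQUIRED_SIGNOFFS` mapping and both function names are hypothetical, and the review cadences are not modeled.

```python
# Sign-offs required before deployment, keyed by model tier (from the table above).
REQUIRED_SIGNOFFS: dict[int, frozenset[str]] = {
    1: frozenset({"CRO", "CIO", "CEO"}),
    2: frozenset({"Model Risk Manager"}),
    3: frozenset({"Line of Business Head"}),
}


def missing_signoffs(tier: int, received: set[str]) -> set[str]:
    """Roles that still need to approve a model of the given tier."""
    if tier not in REQUIRED_SIGNOFFS:
        raise ValueError(f"no approval path defined for tier {tier}")
    return set(REQUIRED_SIGNOFFS[tier] - received)


def can_deploy(tier: int, received: set[str]) -> bool:
    """True only when every required role has signed off."""
    return not missing_signoffs(tier, received)
```

Calling `missing_signoffs(1, {"CRO", "CIO"})` returns `{"CEO"}`, so a release pipeline can block deployment and name exactly who still has to review the model.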

Prudential took this a step further in 2024 by creating a dedicated AI Ethics Committee, chaired by the CRO and including external advisors. The committee has veto power over any AI model that could disproportionately impact protected classes. AXA has a similar structure, with a board that includes a former regulator and a civil rights advocate.

Limitation: Human oversight adds latency to processes. For example, a claims triage model that requires manual review for every high-severity case could slow down payouts, increasing complaints and potentially violating state prompt-pay laws.

6. Continuous Monitoring and Model Drift Detection

AI models degrade over time. A pricing model trained on 2019 data may not account for post-pandemic inflation, supply chain disruptions, or climate-related risks. That’s why regulators require continuous monitoring of:

  • Performance drift: Has the model’s accuracy or precision changed significantly?
  • Concept drift: Has the relationship between input features and outcomes shifted? (e.g., the correlation between credit score and loss ratio may have changed post-2020).
  • Data drift: Have the statistical properties of the input data changed? (e.g., a surge in claims for water damage in a region