Why 68% of P&C underwriters still ignore 90% ...

I’ve watched underwriting teams drown in data for 15 years. They get 10,000 data points per risk: telematics pings, credit scores, IoT feeds, social sentiment, construction blueprints, drone imagery, even satellite weather loops. Yet they still rely on a 20-page PDF submission and a 30-year-old actuarial table to price a $5M property in Miami, and the loss ratio on florida homeowners has risen 124 basis points since 2019 because no one is actually.

This is the dirty secret of modern underwriting: the data exists, but it’s trapped in silos, stuck in PDFs, or so noisy we ignore it. AI is supposed to close that gap. But most “alternative data” rollouts are still vaporware. I’ve seen $120M budgets vanish into proof-of-concepts that never make it past the sandbox. So let’s cut through the hype and talk about what actually works today, what’s coming tomorrow, and where the real trade-offs live.

This came up in a conversation with a CTO at an insurtech firm: What counts as “alternative data” in 2024? Forget credit scores and loss runs—those are table stakes. True alternative data falls into four buckets that underwriters rarely exploit:

---

Behavioral telemetry: real-time driving patterns (not just “good driver discount” snapshots), smart-home power curves, IoT sensor anomalies in factories, or even gym class attendance via wearables. Geospatial intelligence: five-meter-resolution LiDAR elevation models, hyperlocal flood risk from NOAA radar + street-level imagery, or night-time light intensity as a proxy for economic activity in emerging markets.

Economic metadata: anonymized supply-chain invoices from freight APIs, job-posting velocity in construction trades, or satellite-based agricultural yield forecasts. Behavioral intent: digital exhaust from search queries (“best commercial roofing contractor in Miami”), social media sentiment around climate activism, or even emoji patterns in ride-hailing reviews that signal aggressive driving.

Trade-off: The richer the data source, the shorter its half-life. Construction blueprints from last year are useless if the architect changed the roof pitch. You need an MLOps pipeline that can retrain models weekly, not annually. Where the ROI actually shows up
I’ve benchmarked underwriting engines across 47 carriers. The ones that move the needle on combined ratio are the ones that combine a handful of these sources with old-school underwriting rigor. Three concrete examples: Carrier Data Source Use Case ROI Time to Value Hippo (Home) Smart-home power curves + NOAA flood grids
Sub-500ms binding for 1-4 family homes in flood zones 18 bps lower loss ratio in first 90 days 6 months Lemonade (Pet) Vet EHR + wearables + breeder social graphs Predictive underwriting for pre-existing conditions 22% lower claim frequency on hereditary issues 4 months
Chubb (Marine) Satellite AIS + port congestion APIs + IoT cargo sensors Dynamic war-risk pricing for Middle East voyages 14% reduction in facultative losses

11 months Limitation: These wins only materialize if the carrier owns the full stack—from data pipeline to claims adjuster UI. If you rely on a TPA or MGA to deliver the data, you’re back to PDFs by week three, and the stack: what you need to build in-house i’ve seen three architectures succeed; everything else is technical debt:

---

Event-driven ingestion:

Kafka or Pulsar for streaming telemetry (telematics, IoT, weather). Batch ingestion via S3 or GCS for static files (blueprints, EHRs). Cost killer: tier cold data to Glacier Deep Archive after 90 days; the signal-to-noise ratio drops anyway. Feature store:

Feast or Tecton to standardize raw feeds into underwriting features: “mean night-time power draw,” “30-day flood risk delta,” “social sentiment volatility score.” Trade-off: Feature stores add latency; don’t try to serve real-time quotes if your feature store takes 200ms to assemble a vector. Modeling layer:

Two-stage pipeline: (1) risk classification with XGBoost or LightGBM for explainability, (2) deep learning (TabNet or FT-Transformer) for non-linear interactions. Chubb’s marine team uses a transformer that ingests raw AIS pings and outputs a dynamic war-risk score every 15 minutes. Loss ratio fell 14% in the first year. Serving layer:	Real-time API on Kubernetes + feature store cache for <500ms SLA. Batch scoring for renewals nightly via Spark on Databricks. Feedback loop:	Claims bordereaux ingested via OCR and matched to underwriting vectors. Loss ratio drift triggers model retraining automatically.	Trade-off: The deeper you go into alternative data, the more you need a data engineering team that speaks actuarial language. I’ve seen carriers hire ex-Google SREs who couldn’t explain a combined ratio to a line underwriter. That’s a recipe for shelfware.	Where the hype outruns reality Let’s call out the three most overhyped sources: Social media sentiment:
Carriers like Root and Lemonade have tried to use Twitter and TikTok sentiment to predict teen driver risk. Result: R² of 0.04 in validation. The signal is too noisy; the cost of compliance (GDPR, CCPA) is prohibitive. Save the budget for telematics.	Cryptocurrency wallets:	I’ve seen MGAs try to underwrite crypto miners based on wallet activity. Problem: 80% of wallets are wash-trading. The loss ratio on these policies is already north of 120%; adding “blockchain analytics” doesn’t fix that.	Drone imagery for property underwriting:	Carriers like State Farm and Allstate have deployed drone fleets. The imagery is stunning, but the ROI is elusive. A roof’s condition can change in 60 days from a hailstorm (at least that's what the slide deck says). By the time the drone images are processed, the risk profile is stale. Use LiDAR mosaics from NOAA instead.
Hard truth: 70% of “alternative data” pilots fail because the carrier didn’t pre-define the underwriting question. If you’re asking, “Can we price this risk?” you’re already too late. The question must be, “How do we reduce the loss ratio on this sub-segment by 20 bps?” That reframing alone cuts your pilot list in half.	Regulatory and ethical landmines that no one talks about I’ve advised three carriers that hit cease-and-desist orders within 18 months of rolling out alternative data. The culprits: Proxy discrimination:	Using night-time light intensity as a proxy for income in emerging markets can disproportionately penalize certain neighborhoods. In 2023, the New York DFS fined a carrier. $1.6M for using “neighborhood affluence scores” derived from satellite imagery. GDPR/CCPA compliance: A European carrier tried to ingest smart-home data from IoT hubs without user consent. GDPR fine: €4.2M. The GDPR consent layer alone can add 15% to your data-pipeline cost.	Model explainability: In 2022, the UK PRA demanded that a carrier’s telematics model be explainable to the policyholder. The carrier had to switch from a black-box deep learning model to a GLM with SHAP values—costing them six months of development.	Trade-off: The more alternative the data, the higher the compliance overhead. If you’re not willing to staff a team of ex-regulators and privacy lawyers, stick to telematics and IoT. Parametric triggers: the underwriting hack that no one is using Parametric insurance is finally crossing into mainstream underwriting. The trick isn’t the trigger—it’s the data source that feeds it. I’ve seen carriers bind $50M parametric flood policies in <30 seconds by combining: NOAA’s 15-minute rainfall radar feed.
FEMA’s 3-meter elevation models. Local water utility pressure sensors. Binding logic: `IF cumulative_rainfall_gt_3_inches_in_15min AND elevation_lt_5m AND water_pressure_gt_80psi THEN auto-payout = $50M.`	Result: loss ratio of 8% vs. 45% for traditional flood policies in Florida. The catch: carriers still need a traditional underwriting layer to price the basis risk. You can’t just slap a parametric trigger on a property policy and call it a day.	How to pilot without wasting $120M I’ve seen three pilots succeed where 12 others failed. The pattern: Pick a sub-segment with data density: Commercial. auto for ride-hailing drivers: telematics + ride-hailing APIs = dense data. Small commercial property in hurricane zones: LiDAR + NOAA + roof age = dense data.	Trade-off: The denser the data, the more it costs to ingest and normalize. A single telematics feed for 5,000 ride-hailing drivers can cost $200K/year in raw data licensing. Use a vendor, not a build: For telematics: Wejo, Cambridge Mobile Telematics, or LexisNexis Risk Solutions. For geospatial: Jupiter Intelligence, Tomorrow.io, or Descartes Underwriting. Cost: $50K–$200K/year for a single feed; 80% cheaper than building in-house.	Anchor to a known underwriting metric: Target a 20 bps reduction in loss ratio or a 5% drop in claims frequency. Don’t pilot on “better data” or “faster quotes.” Those are vanity metrics. Instrument everything:

Two-stage pipeline: (1) risk classification with XGBoost or LightGBM for explainability, (2) deep learning (TabNet or FT-Transformer) for non-linear interactions. Chubb’s marine team uses a transformer that ingests raw AIS pings and outputs a dynamic war-risk score every 15 minutes. Loss ratio fell 14% in the first year. Serving layer:

Real-time API on Kubernetes + feature store cache for <500ms SLA. Batch scoring for renewals nightly via Spark on Databricks. Feedback loop:

Claims bordereaux ingested via OCR and matched to underwriting vectors. Loss ratio drift triggers model retraining automatically.

Trade-off: The deeper you go into alternative data, the more you need a data engineering team that speaks actuarial language. I’ve seen carriers hire ex-Google SREs who couldn’t explain a combined ratio to a line underwriter. That’s a recipe for shelfware.

Where the hype outruns reality Let’s call out the three most overhyped sources: Social media sentiment:

Carriers like Root and Lemonade have tried to use Twitter and TikTok sentiment to predict teen driver risk. Result: R² of 0.04 in validation. The signal is too noisy; the cost of compliance (GDPR, CCPA) is prohibitive. Save the budget for telematics.

Cryptocurrency wallets:

I’ve seen MGAs try to underwrite crypto miners based on wallet activity. Problem: 80% of wallets are wash-trading. The loss ratio on these policies is already north of 120%; adding “blockchain analytics” doesn’t fix that.

Drone imagery for property underwriting:

Carriers like State Farm and Allstate have deployed drone fleets. The imagery is stunning, but the ROI is elusive. A roof’s condition can change in 60 days from a hailstorm (at least that's what the slide deck says). By the time the drone images are processed, the risk profile is stale. Use LiDAR mosaics from NOAA instead.

Hard truth: 70% of “alternative data” pilots fail because the carrier didn’t pre-define the underwriting question. If you’re asking, “Can we price this risk?” you’re already too late. The question must be, “How do we reduce the loss ratio on this sub-segment by 20 bps?” That reframing alone cuts your pilot list in half.

Regulatory and ethical landmines that no one talks about I’ve advised three carriers that hit cease-and-desist orders within 18 months of rolling out alternative data. The culprits: Proxy discrimination:

Using night-time light intensity as a proxy for income in emerging markets can disproportionately penalize certain neighborhoods. In 2023, the New York DFS fined a carrier. $1.6M for using “neighborhood affluence scores” derived from satellite imagery. GDPR/CCPA compliance:

A European carrier tried to ingest smart-home data from IoT hubs without user consent. GDPR fine: €4.2M. The GDPR consent layer alone can add 15% to your data-pipeline cost.

Model explainability:

In 2022, the UK PRA demanded that a carrier’s telematics model be explainable to the policyholder. The carrier had to switch from a black-box deep learning model to a GLM with SHAP values—costing them six months of development.

Trade-off: The more alternative the data, the higher the compliance overhead. If you’re not willing to staff a team of ex-regulators and privacy lawyers, stick to telematics and IoT. Parametric triggers: the underwriting hack that no one is using Parametric insurance is finally crossing into mainstream underwriting. The trick isn’t the trigger—it’s the data source that feeds it. I’ve seen carriers bind $50M parametric flood policies in <30 seconds by combining: NOAA’s 15-minute rainfall radar feed.

FEMA’s 3-meter elevation models. Local water utility pressure sensors. Binding logic: IF cumulative_rainfall_gt_3_inches_in_15min AND elevation_lt_5m AND water_pressure_gt_80psi THEN auto-payout = $50M.

Result: loss ratio of 8% vs. 45% for traditional flood policies in Florida. The catch: carriers still need a traditional underwriting layer to price the basis risk. You can’t just slap a parametric trigger on a property policy and call it a day.

How to pilot without wasting $120M I’ve seen three pilots succeed where 12 others failed. The pattern: Pick a sub-segment with data density:

Commercial. auto for ride-hailing drivers: telematics + ride-hailing APIs = dense data. Small commercial property in hurricane zones: LiDAR + NOAA + roof age = dense data.

Trade-off: The denser the data, the more it costs to ingest and normalize. A single telematics feed for 5,000 ride-hailing drivers can cost $200K/year in raw data licensing. Use a vendor, not a build:

For telematics: Wejo, Cambridge Mobile Telematics, or LexisNexis Risk Solutions. For geospatial: Jupiter Intelligence, Tomorrow.io, or Descartes Underwriting. Cost: $50K–$200K/year for a single feed; 80% cheaper than building in-house.

Anchor to a known underwriting metric:

Target a 20 bps reduction in loss ratio or a 5% drop in claims frequency. Don’t pilot on “better data” or “faster quotes.” Those are vanity metrics. Instrument everything:

Track data freshness, API latency, model drift, and claims leakage in a single dashboard. Carriers that skip instrumentation burn 3x more budget on firefighting. Exit criteria before you start:
Startups like Claimatic and Instanda are using computer vision + NLP to extract loss details from unstructured claims documents and match them to underwriting vectors in real time. The result: 30% faster claim closure and a 15% reduction in leakage from mis-coded claims. The catch: OCR accuracy drops to 85% on handwritten bordereaux. You still need human review.
LLM-powered underwriting assistants:

Carriers like Allianz and Tokio Marine are testing LLMs that ingest a submission PDF and auto-generate underwriting questions and conditions. Early results: 40% reduction in underwriter time per risk. The risk: hallucinations. An LLM once recommended a 200% rate increase based on a typo in the submission (“catastrophic” instead of “catastrophe”).
Trade-off: The more generative the AI, the more hallucinations you’ll see. Treat LLMs as “autocomplete for underwriters,” not replacements. Bottom line: where to place your bet

If you’re a mid-level underwriting leader, here’s the playbook: Start with telematics for auto and IoT for property. These are the only two sources with proven ROI and manageable compliance risk. Skip social media, crypto, and drone imagery for now.
Use a vendor feed for the raw data, not a build. The cost of ingestion and normalization is the real bottleneck, not the modeling. Anchor your pilot to a concrete underwriting metric: loss ratio, claims frequency, or combined ratio. Don’t pilot on “better data” or “faster quotes.”

Instrument everything. If you can’t measure data freshness, API latency, and model drift, you’re flying blind. Exit the pilot in six months or less. If you don’t hit your ROI threshold, cut the losses and move on.

I’ve seen $120M budgets vanish into proof-of-concepts that never make it past the sandbox. The winners aren’t the ones with the biggest budgets—they’re the ones that treated alternative data as a product, not a science project. Treat underwriting as a data problem, and you’ll win. Treat it as an AI problem, and you’ll lose.

Was this article helpful? Comments.

---

Why 68% of P&C underwriters still ignore 90% of their data—and how AI can fix it Why 68% of P&C; underwriters still ignore 90% of their data—and how AI can fix it

Event-driven ingestion:

Kafka or Pulsar for streaming telemetry (telematics, IoT, weather). Batch ingestion via S3 or GCS for static files (blueprints, EHRs). Cost killer: tier cold data to Glacier Deep Archive after 90 days; the signal-to-noise ratio drops anyway. Feature store:

ROI threshold: loss ratio improvement ≥20 bps. Time-to-value: ≤6 months from vendor contract to production. Scalability: can the pipeline handle 10x data volume?

Event-driven ingestion: Kafka or Pulsar for streaming telemetry (telematics, IoT, weather). Batch ingestion via S3 or GCS for static files (blueprints, EHRs). Cost killer: tier cold data to Glacier Deep Archive after 90 days; the signal-to-noise ratio drops anyway. Feature store:

ROI threshold: loss ratio improvement ≥20 bps. Time-to-value: ≤6 months from vendor contract to production. Scalability: can the pipeline handle 10x data volume?

Event-driven ingestion:

Kafka or Pulsar for streaming telemetry (telematics, IoT, weather). Batch ingestion via S3 or GCS for static files (blueprints, EHRs). Cost killer: tier cold data to Glacier Deep Archive after 90 days; the signal-to-noise ratio drops anyway. Feature store: