Agentic AI + Human Oversight: Insurance's Winning Formula

AIG deployed agentic AI across underwriting and claims with orchestration. The result: 370,000 submissions processed in 2025 without proportional headcount growth. Here's the blueprint for scale without sacrificing human judgment.

Marcus DeWitt · Feb 20, 2026 · 9 min read

The Core Problem

Insurance has a scale problem that hiring alone can't solve. The industry processes millions of submissions annually—each one requiring an underwriter to extract data, compare it against historical cases, flag anomalies, and render a risk decision. The volume is growing. The pool of qualified underwriters isn't. Hiring your way to capacity costs more than the book of business justifies.

AIG's answer wasn't to build one mega-AI and hand it the keys. Instead, they layered multiple specialized agents—data extraction, enrichment, anomaly detection, recommendation—and built an orchestration layer to route work between them based on complexity and risk level. Lexington Insurance processed 370,000 submissions in 2025 through this system and is tracking toward 500,000 by 2030, without proportional headcount growth. In the Lloyd's Syndicate 2479 deal with Palantir and Amwins, agents pre-screened risks against structured appetite rules, cutting human review time without removing human judgment from the equation. That distinction—cutting time, not judgment—matters enormously in a regulated industry.

What AIG Actually Shipped

AIG Assist: The internal agent layer

AIG Assist is now live across most commercial lines. It sits between intake and the underwriter's desktop, handling the repetitive work that eats 60–70% of an underwriter's day: extracting submission data, summarizing policy history, surfacing comparable cases, and flagging inconsistencies. The tool doesn't make the underwriting decision. It compresses the analysis that precedes it.

An underwriter who would spend 45 minutes reading a submission packet, cross-referencing historical claims, and comparing to similar accounts can now review a pre-processed summary and comparative insights in about 10 minutes. That's not a marginal improvement—it's a capacity multiplier. The underwriter's judgment stays central; the grunt work doesn't.

The Everest portfolio absorption

When AIG acquired Everest's retail commercial portfolio, they faced a classic integration challenge: two books of business with different underwriting philosophies, different data structures, and different risk definitions. Manually aligning them would have taken months of analyst time.

Instead, AIG built an ontology of Everest's portfolio, mapped it to their own risk taxonomy using LLMs, and used agents to surface which accounts matched which categories. Humans approved the mapping decisions; agents did the classification heavy lifting. What would have been a manual nightmare—thousands of hours of analyst work—became a structured agent task with a clear audit trail.

Pre-screening at Lloyd's Syndicate 2479

In the Palantir and Amwins syndicate deal, AIG deployed agents to check whether Amwins' program portfolio matched Syndicate 2479's risk appetite. The syndicate has explicit rules: no environmental liability over a certain threshold, no Class A contractors in certain regions, a defined maximum aggregate per account. Rather than having underwriters scroll through Amwins' program database manually, agents did the pre-screening. Only accounts that genuinely required human judgment went to underwriters. Clear cases—fit the appetite or don't—got filtered algorithmically with full auditability. The result was a significant reduction in human review time without any reduction in human accountability for final decisions.

How Orchestration and Oversight Actually Manage Risk

Orchestration versus monolithic automation

The defining feature of AIG's approach is that they didn't deploy one model to "solve underwriting." They built an orchestration layer—a router that coordinates multiple specialized agents and decides which agent handles which piece of work. This matters because underwriting isn't one task; it's dozens of subtasks with different data requirements and different risk profiles. One agent is good at OCR—pulling numbers from forms accurately. Another excels at anomaly detection—finding outliers in claims history that shouldn't be there. A third specializes in geospatial risk: flood, wildfire, seismic exposure by account location. Rather than forcing one model to be mediocre at all three, orchestration lets each agent be genuinely strong in its domain. The orchestration layer coordinates them like a thoughtful manager, not a traffic cop.
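The router pattern described above can be sketched in a few lines. This is an illustrative toy, not AIG's implementation: the agent names, the `Submission` fields, and the registration API are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Submission:
    id: str
    documents: list[str]
    location: str

class Orchestrator:
    """Routes each subtask to the specialist agent registered for it."""

    def __init__(self) -> None:
        self._agents: dict[str, Callable[[Submission], dict]] = {}

    def register(self, task: str, agent: Callable[[Submission], dict]) -> None:
        self._agents[task] = agent

    def run(self, submission: Submission, tasks: list[str]) -> dict:
        # Each specialist contributes its own findings; the router merges
        # them into one result for the human underwriter to review.
        return {task: self._agents[task](submission) for task in tasks}

orchestrator = Orchestrator()
# Stand-ins for real extraction and geospatial agents.
orchestrator.register("extract", lambda s: {"fields": len(s.documents)})
orchestrator.register("geospatial", lambda s: {"flood_zone": s.location == "coastal"})

sub = Submission(id="S-001", documents=["app.pdf", "loss_runs.pdf"], location="coastal")
print(orchestrator.run(sub, ["extract", "geospatial"]))
```

The durable piece is the routing interface, not the agents behind it: any registered agent can be replaced by a stronger one without touching the orchestration logic.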

Humans as decision owners, agents as companions

The critical design principle throughout AIG's system: agents are companions to human judgment, not replacements for it. This plays out at every stage of the workflow. At intake, agents extract data; humans decide if intake is complete. In risk assessment, agents flag concerns; humans decide if the risk is acceptable. On edge cases, agents surface what they can and defer when they can't resolve it without clearer guidance. In claims handling, agents triage and recommend; humans approve payouts. Every final decision is human-owned.

This isn't just philosophical—it's a liability design choice. If an agent makes a bad recommendation and a human approves it without understanding why, both the company and the human are exposed. By keeping final authority with underwriters and maintaining documented reasoning throughout, AIG preserves regulatory accountability and legal defensibility at every step.

Front-to-back workflow compression

AIG describes their approach as "front-to-back workflow compression": the entire path from intake through risk assessment through claims now flows through orchestrated agents that eliminate redundant, repetitive steps. But not all steps—only the ones where compression doesn't degrade judgment. A complex submission runs through an intake agent that pulls data from application documents and flags missing information, an enrichment agent that cross-references claims history and geospatial data, an anomaly agent that identifies inconsistencies between stated information and external sources, and a recommendation agent that synthesizes those signals and suggests an underwriting decision with confidence scores attached. The human underwriter reviews the agents' work, makes the final call, and documents their reasoning. Every agent action throughout is logged with timestamps, models used, and confidence scores—a complete, auditable record.
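A compressed pipeline like the one above can be sketched as a sequence of logged stages. This is a minimal sketch under assumed field names; the stage functions are stand-ins for real agents.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Pipeline:
    """Runs agent stages in order, appending an audit record per stage."""
    audit_log: list[dict] = field(default_factory=list)

    def run_stage(self, name: str, fn, data: dict) -> dict:
        output = fn(data)
        # Every agent action is logged with timestamp and confidence,
        # producing the complete auditable record the text describes.
        self.audit_log.append({
            "agent": name,
            "timestamp": time.time(),
            "confidence": output.get("confidence"),
        })
        data.update(output)
        return data

# Illustrative stand-ins for the intake and anomaly agents.
def intake(d): return {"missing_fields": [], "confidence": 0.94}
def anomaly(d): return {"flags": ["claims gap 2023"], "confidence": 0.71}

pipe = Pipeline()
data = {"submission_id": "S-002"}
for name, fn in [("intake", intake), ("anomaly", anomaly)]:
    data = pipe.run_stage(name, fn, data)

# The underwriter reviews the merged data; auditors replay the trail.
print([entry["agent"] for entry in pipe.audit_log])  # ['intake', 'anomaly']
```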

Three Core Risk-Control Levers

1. Codified ontologies (risk appetite made explicit)

Before agents can make useful recommendations, they need a clear definition of what "good" looks like. AIG codifies risk appetite into structured ontologies that define acceptable risk by line of business, geography, account size, and claims history. These aren't principles or guidelines—they're rules with numbers attached. If the claims ratio exceeds 85%, flag for human review. If the combined ratio projection crosses 100%, recommend decline. If geospatial aggregate exposure in a given area exceeds a defined threshold, route to a senior underwriter. Agents reference these rules constantly and can't override them—they can only flag exceptions. When humans override agent recommendations, they do so with documented reasoning that auditors can examine later. The agents are constrained; the humans retain judgment.
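The thresholds in the paragraph above translate directly into rules with numbers attached. A hedged sketch, using the article's own example thresholds; in practice these rules would live in a versioned, queryable store owned by underwriting, not hard-coded:

```python
# Each rule: (name, predicate over account data, action when triggered).
APPETITE_RULES = [
    ("claims_ratio_high",   lambda a: a["claims_ratio"] > 0.85,            "human_review"),
    ("combined_ratio_over", lambda a: a["combined_ratio_proj"] > 1.0,      "recommend_decline"),
    ("aggregate_exceeded",  lambda a: a["area_aggregate"] > a["area_limit"], "senior_underwriter"),
]

def evaluate(account: dict) -> list[tuple[str, str]]:
    """Agents can flag exceptions against these rules but never override them."""
    return [(name, action) for name, pred, action in APPETITE_RULES if pred(account)]

account = {
    "claims_ratio": 0.90,           # above the 85% threshold -> flag
    "combined_ratio_proj": 0.97,    # under 100% -> no decline recommendation
    "area_aggregate": 40_000_000,
    "area_limit": 50_000_000,
}
print(evaluate(account))  # [('claims_ratio_high', 'human_review')]
```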

2. Escalation rules (when agents defer)

Not every decision is agent-ready, and the system is honest about that. Complex submissions—novel risk profiles, high-value accounts, cases where multiple agents flag concerns but can't reach consensus—route to humans immediately. AIG defines escalation triggers explicitly: if confidence score falls below 70%, route to the underwriter. If multiple agents flag concerns without reaching agreement, route to a senior underwriter. If account size exceeds a defined threshold, always require human review. If claims history looks unusual in either direction—too clean or too problematic—route to a human who can add context the data can't capture. These rules aren't static; they're tuned over time based on agent performance and actual loss ratio outcomes. If agents are consistently over-trusting certain submission types, the rules tighten. If they're routing too much to humans unnecessarily, the rules relax as confidence data accumulates.
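The escalation triggers above amount to a short routing function. The field names and the account-size threshold are assumptions for illustration; the confidence cutoff follows the 70% figure in the text:

```python
def route(assessment: dict) -> str:
    """Decide where an agent-assessed submission goes next."""
    if assessment["account_size"] > 5_000_000:   # always human above a size threshold
        return "human"
    if assessment["confidence"] < 0.70:          # low confidence -> underwriter
        return "human"
    if len(assessment["agent_flags"]) > 1 and not assessment["agents_agree"]:
        return "senior_underwriter"              # unresolved agent disagreement
    return "auto_with_review"                    # clear case; human still owns sign-off

assessment = {
    "account_size": 250_000,
    "confidence": 0.64,   # below the 70% trigger
    "agent_flags": [],
    "agents_agree": True,
}
print(route(assessment))  # human
```

Tuning the rules over time, as the text describes, means adjusting these thresholds against loss-ratio outcomes rather than rewriting the routing logic.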

3. Auditability (who did what, when, and why)

Insurance is a regulated industry where decisions get scrutinized. If a claim is denied, regulators can ask for the full decision trail: what information was reviewed, who reviewed it, what model outputs informed the recommendation, and why the final decision went the way it did. Orchestrated agents maintain that trail automatically. Every agent logs its reasoning, confidence scores, data sources consulted, and recommendation made. When a human overrides an agent or accepts its recommendation, that decision and reasoning gets logged alongside. AIG can show regulators a complete, chronological record: here's what the intake agent found, here's what anomaly detection flagged, here's the underwriter's note on why they accepted the risk despite the flag, here's the appeal process, here are the outcomes for similar accounts over the past three years. That level of transparency is only possible with orchestration-first architecture. Monolithic black-box models can't produce it.
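One way to picture that trail is a single append-only log shared by agents and humans. The record shape below is an assumption, not AIG's schema; the point is that overrides and agent flags land in the same chronological stream:

```python
from datetime import datetime, timezone

def log_event(trail: list, actor: str, action: str, detail: dict) -> None:
    """Append one event; actor is an agent name or an underwriter id."""
    trail.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,     # e.g. "flag", "recommend", "override"
        "detail": detail,
    })

trail: list = []
log_event(trail, "anomaly_agent", "flag",
          {"issue": "claims gap", "confidence": 0.71})
log_event(trail, "uw_4481", "override",
          {"reason": "gap explained by ownership change"})

# A regulator's question -- "who decided, and why?" -- is a replay of the list.
print([e["actor"] for e in trail])  # ['anomaly_agent', 'uw_4481']
```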

The Mid-Tier Carrier Playbook

Most mid-tier carriers don't have AIG's engineering budget or data infrastructure. But the pattern is genuinely portable—the architecture doesn't require AIG's scale to work.

Start small and measure relentlessly. Don't try to orchestrate the entire book of business out of the gate. Pick one line—claims triage or low-complexity underwriting—and deploy agents there first. Measure loss ratio, cycle time, underwriter satisfaction, and claims accuracy before scaling to other lines. Twelve months of data from a contained pilot is worth more than broad deployment with no clear baseline.

Build the orchestration layer without building from scratch. You don't need custom models for everything. Use LLM APIs (OpenAI, Anthropic) for text understanding and summarization. Use commercial ML platforms (DataRobot, H2O) for classification and recommendation. Glue them together with a simple orchestration framework in Python that routes work between models based on complexity and confidence scores. The orchestration logic is the durable investment; the underlying models can be swapped as better ones become available.
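Keeping models swappable means the routing code depends on an interface, not a vendor. A minimal sketch, assuming a hypothetical `Classifier` protocol; the keyword-based stand-in would be replaced by an API-backed client in practice:

```python
from typing import Protocol

class Classifier(Protocol):
    def classify(self, text: str) -> tuple[str, float]:
        """Return (label, confidence)."""
        ...

class KeywordClassifier:
    # Stand-in for a real model client; any object with the same
    # classify() signature can be slotted in without touching the router.
    def classify(self, text: str) -> tuple[str, float]:
        label = "complex" if "environmental" in text.lower() else "simple"
        return label, 0.9

def route_submission(model: Classifier, text: str) -> str:
    label, confidence = model.classify(text)
    return "human" if label == "complex" or confidence < 0.7 else "agent_pipeline"

print(route_submission(KeywordClassifier(), "GL policy, environmental exposure"))  # human
```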

Define risk appetite explicitly before agents touch anything. This is the step most carriers underinvest in, and it's where most implementations struggle. Before agents can run, underwriters—not IT—must codify what good looks like: what claims ratios are acceptable by line, what geographies carry elevated risk, what account characteristics should trigger escalation. Once those rules exist in a structured, queryable form, agents can reference them. Once they're automated, humans can focus on the exceptions that actually require judgment rather than clearing cases that follow obvious patterns.

Establish human-review triggers that err conservative at first. Define which decisions always route to humans: confidence score below 65%, multiple conflicting signals from different agents, high-value accounts above a defined premium threshold, novel risk profiles the system hasn't seen before, anything with regulatory sensitivity. Start with rules that send more to humans than necessary. Tune them down as trust builds and performance data shows where the agents are reliably right.

Measure the right things over 12 months. Cycle time from submission to underwriting decision. Submissions per underwriter per day. Loss ratio on agent-assisted decisions compared to fully manual underwriting. Appeal rate—how often are decisions successfully overturned? Agent trust rate—do underwriters actually use agent recommendations, or are they ignoring the system? If loss ratio rises, agents are being too permissive; tighten the rules. If underwriters consistently override agent recommendations, the agents aren't reliable yet—retrain or simplify the model. If cycle time doesn't improve, the orchestration layer has friction somewhere—trace the bottleneck and remove it.
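Most of these metrics fall out of the same audit data the orchestration layer already produces. A toy computation over assumed decision-record fields:

```python
from statistics import mean

# Illustrative decision records; field names are assumptions.
decisions = [
    {"cycle_hours": 4.0,  "agent_assisted": True,  "overridden": False, "appealed": False},
    {"cycle_hours": 30.0, "agent_assisted": False, "overridden": False, "appealed": True},
    {"cycle_hours": 5.0,  "agent_assisted": True,  "overridden": True,  "appealed": False},
]

assisted = [d for d in decisions if d["agent_assisted"]]
metrics = {
    # Cycle time on agent-assisted work vs. the manual baseline.
    "avg_cycle_assisted": mean(d["cycle_hours"] for d in assisted),
    # High override rate signals the agents aren't reliable yet.
    "override_rate": sum(d["overridden"] for d in assisted) / len(assisted),
    # Rising appeal rate signals decisions are getting worse, not faster.
    "appeal_rate": sum(d["appealed"] for d in decisions) / len(decisions),
}
print(metrics)
```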

Why This Matters for the Industry

Insurance profit has always come from underwriting quality—making better risk decisions more consistently than your competitors. The historical constraint is that good underwriters are expensive, hard to find, and can only personally process so much volume. Legacy rule-based automation tried to solve this with rigid logic: "If loss ratio exceeds 80%, decline." Rules are brittle; they fail at the edges and require constant manual updates. Agents plus orchestration are different. They learn from thousands of examples, surface patterns that humans miss at scale, and defer to human judgment on the cases that genuinely require it.

Orchestration also addresses the problem regulators have with AI in insurance. Traditional neural networks making underwriting decisions can't explain themselves—they produce an output without a legible reasoning trail. Orchestration sidesteps this: every agent decision is logged with the data it consulted, the rules it checked, and the confidence level it assigned. The reasoning is transparent and inspectable. That's not just a compliance advantage—it's a trust advantage with the humans actually using the system.

AIG's competitive edge isn't that they process more submissions than before. It's that they process more submissions in the same time with the same headcount, while maintaining—and in some lines improving—underwriting accuracy. Competitors that hire more staff and keep manual workflows will see cycle times and per-submission costs drift upward. Competitors that deploy orchestrated agents intelligently will compress both. By 2027, the gap between those two groups will be measurable in margin and market share.

The Bottom Line

Agentic AI in insurance isn't about replacing underwriters. It's about letting underwriters spend their time on judgment rather than busywork. AIG's model—orchestrated agents, explicit risk appetite, human decision authority, complete auditability—is the pattern the industry is converging on. Carriers that haven't adopted some form of orchestrated automation by 2027 won't be competitive on cycle time. And carriers that deploy agents without meaningful human oversight will face regulatory pressure they aren't prepared for.

The formula is straightforward even if the implementation isn't: agents exceptional in their specific domains, humans owning final decisions and accountability, orchestration routing work intelligently between them. AIG proved it works at real scale. The mid-tier carriers that build toward this pattern first will have a 12–18 month advantage before it becomes the baseline expectation across the industry.
