We independently score your live AI agents against your own policy forms — and flag the answers that don't hold up before they become claims.
EasyEssence is an independent oversight function for the customer-facing AI agents that mid-market P&C carriers already have in production. On a bi-weekly cadence, we sample live conversations and score them against your own policy forms and escalation rules — using a six-dimension rubric built for insurance agent behavior. We detect drift as your agents evolve, and deliver monthly executive scorecards your leadership can act on. We also map our work to the NAIC Evaluation Tool exhibits, so the evidence your AIS Program references is documented as it accumulates — not assembled in a panic when an examiner asks.
The NAIC Model Bulletin is rewriting what regulators expect from carriers deploying AI.
“Outcome-only oversight does not capture responsible AI governance — regulators want process-level continuous behavioral oversight.”
— Commissioner Yaworsky · NAIC Spring 2026
This is the layer carriers most often miss. Performance dashboards tell you the conversation felt good. Script QA tells you the playbook was followed. Neither tells you whether the answer was actually right against the policy form.
The Oversight Gap
Most carriers can tell you their AI is running. Few can tell you whether it's giving the right answers.
Performance & CX Analytics
Sentiment, resolution rate, talk time, CSAT, conversation intelligence. Tells you how the conversation felt and whether the system was up.
Doesn't evaluate whether the answer was actually right.
Script & Keyword QA
Script adherence, required-disclosure presence, keyword matching, prohibited-language flags. Tells you whether rule-based formatting requirements were satisfied.
Can't detect when an agent sounds right but is factually wrong.
Behavioral Risk & Decision Integrity
Are the answers actually correct? Independent evaluation of agent decisions against your policy forms, regulatory expectations, and the NAIC AIS Program framework. Six-dimension rubric, scored bi-weekly.
Flags what doesn't hold up before it becomes a claim.
AI Governance Platforms
Not behavioral evaluation.
Enterprise governance fabric — AI inventory, policy enforcement workflows, drift signals pulled from MLOps pipelines, framework-mapped audit documentation for EU AI Act / NIST / ISO 42001.
Governs that AI is in use; doesn't evaluate what the AI is saying against your policy forms.
The three layers are complementary, not competitive — most insurance AI deployments will need all three. A governance platform may be a fourth piece in the stack, but it doesn't substitute for any of them.
The Cost of a Wrong Answer
A confident AI agent can sound professional while misrepresenting a policyholder's actual terms — creating liability the carrier doesn't see until it's too late.
Regulatory Fines
Misrepresented terms can trigger market conduct exams.
Unintended Coverage Liability
Overstated benefits can bind the carrier in court.
Claims Leakage
Wrong coverage amounts compound across thousands of interactions.
Erosion of Regulatory Trust
Repeated inaccuracies give DOIs grounds for deeper examination.
Customer: I was rear-ended last week and my car is at the shop. Does my policy cover a rental car while it's being repaired?
AI agent: Absolutely. Your auto policy includes rental reimbursement coverage at $50/day for up to 30 days while your vehicle is in the shop. I can help you get that set up right now.
Pass: Performance · Script Compliance · Keyword Scan
The agent cited $50/day for 30 days. The customer's actual policy endorsement shows $30/day with a 14-day cap. Wrong coverage tier applied.
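To make the gap concrete, here is a minimal sketch of the comparison behind that finding. The `RentalTerms` structure and `find_discrepancies` helper are hypothetical names used only for illustration, not our production scoring code; the dollar figures are the ones from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RentalTerms:
    """Rental reimbursement terms (hypothetical structure for illustration)."""
    daily_limit: int  # dollars per day
    max_days: int     # maximum number of covered days

    @property
    def max_payout(self) -> int:
        # Largest total reimbursement these terms allow.
        return self.daily_limit * self.max_days


def find_discrepancies(stated: RentalTerms, on_file: RentalTerms) -> list[str]:
    """List each term where the agent's statement differs from the endorsement."""
    issues = []
    if stated.daily_limit != on_file.daily_limit:
        issues.append(
            f"daily limit: stated ${stated.daily_limit}, endorsement ${on_file.daily_limit}"
        )
    if stated.max_days != on_file.max_days:
        issues.append(
            f"day cap: stated {stated.max_days}, endorsement {on_file.max_days}"
        )
    return issues


stated = RentalTerms(daily_limit=50, max_days=30)   # what the agent told the customer
on_file = RentalTerms(daily_limit=30, max_days=14)  # what the endorsement says

issues = find_discrepancies(stated, on_file)
overstatement = stated.max_payout - on_file.max_payout  # 1500 - 420

for issue in issues:
    print(issue)
print(f"Maximum overstated benefit: ${overstatement}")
```

Under these assumptions, one conversation overstates the maximum benefit by $1,080 ($1,500 promised against $420 actually covered), and none of the three checks that passed would have surfaced it.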
Six Dimensions of Agent Behavior
The six-dimension rubric is fixed; what we calibrate is the anchor language for your specific agent and policy forms — a claims chatbot scores against different correctness anchors than a policy Q&A bot.
How We Work
Not a one-time audit. A bi-weekly rhythm that catches drift as it emerges — and a documented record of when, what, and how it was caught.
Sample
Live conversations pulled bi-weekly — random plus risk-triggered based on coverage language and escalation signals.
Score
Each conversation evaluated across six rubric dimensions against actual policy documents and escalation rules.
Flag
Below-threshold interactions flagged, classified by failure type, and ranked by severity for human review.
Report
Monthly executive scorecards with pass rates, trends, and risk exposure — built for the boardroom and the regulator.
Improve
Actionable recommendations for prompt and escalation refinements. Then we sample again.
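For technical readers, the Flag and Report steps can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not our actual implementation: the 0.7 threshold, the 0-to-1 score scale, the placeholder dimension names (`dimension_1` through `dimension_6`), and the choice to rank severity by the weakest dimension are all hypothetical.

```python
from dataclasses import dataclass

THRESHOLD = 0.7  # hypothetical pass mark on an assumed 0-1 scale


@dataclass
class ScoredConversation:
    conversation_id: str
    scores: dict[str, float]  # rubric dimension -> score

    @property
    def weakest(self) -> tuple[str, float]:
        """Return the lowest-scoring dimension and its score."""
        return min(self.scores.items(), key=lambda item: item[1])


def flag(batch: list[ScoredConversation]) -> list[ScoredConversation]:
    """Keep conversations with any dimension below threshold, worst first."""
    flagged = [c for c in batch if c.weakest[1] < THRESHOLD]
    return sorted(flagged, key=lambda c: c.weakest[1])


def pass_rate(batch: list[ScoredConversation]) -> float:
    """Share of the batch with every dimension at or above threshold."""
    return 1 - len(flag(batch)) / len(batch)


def make(conversation_id: str, *values: float) -> ScoredConversation:
    # Placeholder dimension names; the real rubric labels are not shown here.
    scores = {f"dimension_{i}": v for i, v in enumerate(values, start=1)}
    return ScoredConversation(conversation_id, scores)


batch = [
    make("conv-001", 0.9, 0.8, 0.95, 0.9, 0.85, 0.9),
    make("conv-002", 0.9, 0.4, 0.9, 0.9, 0.9, 0.9),
    make("conv-003", 0.6, 0.9, 0.9, 0.9, 0.9, 0.9),
]

flagged = flag(batch)
for conv in flagged:
    dimension, score = conv.weakest
    print(f"{conv.conversation_id}: failed {dimension} at {score}")
print(f"Pass rate: {pass_rate(batch):.0%}")
```

In this sketch a conversation's failure type is simply its weakest dimension. In practice, failure classification and severity ranking feed a human review queue rather than an automated decision.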
From the Founder
For ten years, I've watched what happens when oversight gets treated as something that can wait. I've seen the consent orders. I've seen the remediation budgets that run into the billions. I've seen executive teams spend years rebuilding what should have been built right the first time.
Insurance carriers running customer-facing AI agents in production are early in that same story — and the structure is similar. Technology has outrun the function built to assure it. The carriers who treat this as a checkbox will repeat that arc. The carriers who treat it as a real function — independent, documented, defensible — will not.
I built EasyEssence to be that independent assurance function: not the author of your governance program, but the function that produces the evidence your program references. My career has been split between insurance and finance, at one of the country's largest insurance brokerages and across two senior program roles at one of the country's largest banks (a CEO-reviewed liquidity forecasting capability, and delivery for the firm's federal consent-order remediation). I am also PMP-certified. Those are the conventional credentials.
The less conventional one is this: I have lived through the aftermath of this pattern more than once. I know what it costs. I know what it asks. I'd like to help you write a different ending: quietly, properly, before anyone asks.
— Phillip Kangari
Mapped to the NAIC Evaluation Tool
Our scoring framework aligns with all four NAIC exhibits — the same questionnaire regulators use during market conduct exams. Every scorecard we produce is documentation your compliance team can hand directly to examiners.

Independence is the asset.
What we are: an independent assurance function. We produce scored conversations, drift reports, monthly executive scorecards, and the evidence record your AIS Program references.
What we are not: the authors of your AIS Program, your representative to regulators, a legal or compliance opinion, or a real-time guardrail in your agent's request path.
If we wrote your Program, we couldn't credibly evaluate the agents that operate under it. If we represented you to regulators, we couldn't credibly produce evidence a regulator would accept.
The Questions That Bring Carriers to Us
"What's our liability exposure?"
For the leaders who own risk. Your AI agents are making coverage statements on your behalf. If they're wrong, you own the outcome.
"How do we prove our oversight is real?"
For the leaders facing regulators. When the NAIC Evaluation Tool arrives, you need evidence that your AI oversight is operating, not theoretical.
"Can we scale without adding headcount?"
For the leaders building AI strategy. Independent oversight lets your engineering team focus on building while someone else watches the output.
"Are our agents making promises we'll have to honor?"
For the leaders who own claims and operations. A confident agent can overstate coverage and bind the carrier — months before anyone sees the claim.
Let's Talk About Your AI Agents
Tell us what your agents handle, how they're built, and where you think the risks might be. No commitment required.
Independent assurance for insurance AI — evidence your board, your regulator, and your E&O carrier can rely on.