What Happened at KPMG — And Why It's Your Warning
KPMG set a mandatory target: 75% AI usage across the firm. Its people responded by hitting the number. One accounting team member asked the AI what the weather was. Another had it generate filler text. The metric was met. No workflows changed.
This is not a KPMG failure. This is what happens when a firm measures the wrong thing.
According to KPMG's own 2026 Global AI in Finance Report, active AI use has doubled — climbing from 30% to 75% in two years. At the same time, only 42% of organizations can trace or explain the AI-assisted financial decisions they've made. The math is simple: rising usage does not equal rising integration. Your employees can hit the adoption target without changing how a single deliverable gets built.
For any firm rolling out AI mandates or tracking adoption metrics, this matters now. Not in six months when the initiative stalls. Now.
The Three Measurement Tiers — Which One Are You Using?
Most firms measure AI adoption using one of three approaches. They work differently, and only one actually tells you whether AI is changing your firm.
Tier 1: Usage Counts (Gameable)
This is what KPMG's employees hacked. Usage metrics count activity: prompts submitted, tool logins, documents processed. They're easy to track. They're also the easiest to fake.
A partner can hit a 75% usage target by asking the AI trivial questions. A staff member can run a standard template through the tool every Friday and call it integration. A tax manager at a regional firm could report that 90% of her team used an AI research tool, when in fact they were only using it to generate boilerplate client letters, not to actually change the research workflow. The metric reports success. The work hasn't changed.
Usage counts tell you whether someone is using the tool. They do not tell you whether the tool is changing how they work.
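To see how far a usage count can drift from real work, here is a minimal sketch. It assumes a hypothetical prompt log in which each event carries a flag saying whether the prompt was tied to a client deliverable. The names `PromptEvent`, `usage_rate`, and `deliverable_usage_rate` and the sample data are illustrative, not taken from any real tool.

```python
from dataclasses import dataclass


@dataclass
class PromptEvent:
    user: str
    tied_to_deliverable: bool  # hypothetical flag: was this prompt part of client work?


def usage_rate(events, headcount):
    """Share of staff with at least one logged prompt (a Tier 1 usage count)."""
    active_users = {e.user for e in events}
    return len(active_users) / headcount


def deliverable_usage_rate(events, headcount):
    """Share of staff with at least one prompt tied to a real deliverable."""
    working_users = {e.user for e in events if e.tied_to_deliverable}
    return len(working_users) / headcount


# Illustrative data only: four staff, three of whom touched the tool.
events = [
    PromptEvent("ana", False),  # asked about the weather
    PromptEvent("ben", False),  # generated filler text
    PromptEvent("cho", True),   # used AI while drafting a client memo
]
headcount = 4

print(usage_rate(events, headcount))              # 0.75
print(deliverable_usage_rate(events, headcount))  # 0.25
```

In this made-up log the firm hits a 75% usage target while only a quarter of staff used the tool on actual work. The flag itself would have to come from somewhere, such as engagement codes or reviewer tagging, and that is the hard part in practice.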
Tier 2: Task Integration (Harder to Game)
Integration metrics track whether AI has been woven into actual workflows. Did the team redesign the client intake process to use AI? Did the tax team reorder the return assembly steps now that research is faster? Did the staff member shift from manual data entry to reviewing AI output?
At one 30-person regional CPA firm in the Midwest, the partners thought they had achieved strong AI integration after six months. They pointed to new procedures and training. But an audit showed that only one of the firm's four teams had actually changed its standard operating procedures. The other three had downloaded the tool and logged in, so the usage metric was hit. They hadn't redesigned a single workflow, so the integration wasn't real.
These metrics require actual process change, not just activity bumps. A firm can't claim integration without documentation. It can't report workflow updates without showing new standard operating procedures.
Integration metrics are better than usage metrics. They're still not the final truth.
Tier 3: Outcome Change (The Real Measure)
Only outcome metrics tell you whether AI adoption is working. Outcome metrics measure what changed as a result: faster client deliverables, higher quality (fewer revisions, fewer errors), lower cost per engagement, higher margin on services that use AI.
When Baker Tilly's Colorado practice implemented an AI research tool, it measured cycle time and revision counts for standard tax returns before and after. Before: 6.5 days average, 3.2 revisions per return. After: 4.8 days average, 2.1 revisions per return. These numbers made the adoption real. They became the basis for scaling the tool across offices.
These are the metrics that matter to a firm's business. And they're the hardest to argue with. You can't claim a 20% improvement in tax return cycle time without the data to back it. You can't report a 15% reduction in rework without audit trails.
Outcome metrics require baseline measurement (before AI) and ongoing comparison (after implementation). They take work to track. They're also nearly impossible to game, because the only way to move them is to change the work itself.
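The before-and-after figures quoted above for the tax return pilot can be turned into percentage changes with a few lines of arithmetic. This is a minimal sketch; the helper name `percent_change` is an illustration, and the inputs are the numbers stated in the text.

```python
def percent_change(before, after):
    """Relative change from the baseline, as a percentage (negative means a reduction)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100


# Figures from the tax return example: days per return and revisions per return.
cycle_time = percent_change(6.5, 4.8)
revisions = percent_change(3.2, 2.1)

print(f"Cycle time: {cycle_time:.1f}%")  # Cycle time: -26.2%
print(f"Revisions: {revisions:.1f}%")    # Revisions: -34.4%
```

The calculation only works because a baseline exists. Without the 6.5 and 3.2, there is nothing to divide by.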
| Measurement Tier | What It Counts | Ease of Gaming | What It Actually Tells You |
|---|---|---|---|
| Usage | Prompts, logins, activities | Very easy | Whether people are using the tool |
| Integration | Workflows updated, processes redesigned | Harder | Whether AI has been embedded in firm work |
| Outcome | Speed, quality, cost, margin | Nearly impossible | Whether AI adoption is delivering business value |
Why Governance Matters More Than Your Tool
Gartner found that AI initiatives in finance succeed roughly 50% of the time. The firms that win are not winning because they chose a better tool. They're winning because they built governance structures that force outcome measurement.
High-performing finance teams do four things that struggling teams don't:
1. Document the baseline. Before implementing AI, measure the current state. How long does a client tax return take today? How many revisions does it go through? What's the cost per engagement? What percentage of client deliverables get flagged for quality review? These numbers become the before snapshot.
2. Implement AI in one controlled workflow first. Do not roll out AI to all tax returns on day one. Pilot it on one type of return (partnership returns, S-corp returns, a specific client segment) where you can track the outcome shift carefully. Measure the after state.
3. Compare outcomes explicitly. Calculate the difference. If your baseline shows 3.2 client revisions per return and your pilot shows 2.1 revisions, you have data. If cycle time dropped from 6.5 days to 4.8 days, you have evidence. These are not opinions. They're measurements.
4. Require documented review before signing off on AI work. If an AI tool prepared a client return or tax memo, someone with CPA credentials reviewed and certified it. Document that review. Create an audit trail. Make it clear that the human did not skip the judgment step.
These four steps take work. They're not glamorous. They're also the reason some firms' AI initiatives pay off and others become compliance theater.
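Step 4 can be enforced in software as well as in policy. The sketch below assumes a hypothetical record type for AI-prepared work. `AIDeliverable`, `ReviewRecord`, and `sign_off` are illustrative names, not part of any real practice-management system, and the reviewer and date are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ReviewRecord:
    reviewer: str      # credentialed person who reviewed the work
    reviewed_on: date
    notes: str         # what was checked; this is the audit trail


@dataclass
class AIDeliverable:
    name: str
    review: Optional[ReviewRecord] = None

    def sign_off(self) -> str:
        """Refuse sign-off unless a documented human review is attached."""
        if self.review is None:
            raise PermissionError(f"{self.name}: no documented review on file")
        return (
            f"{self.name} signed off; reviewed by {self.review.reviewer} "
            f"on {self.review.reviewed_on.isoformat()}"
        )


draft = AIDeliverable("Partnership return draft")
try:
    draft.sign_off()
except PermissionError as err:
    print(err)  # sign-off is blocked until a review exists

# Placeholder reviewer and date, for illustration only.
draft.review = ReviewRecord("R. Example, CPA", date(2026, 1, 15), "Checked allocations")
print(draft.sign_off())
```

The design choice is that the review is data, not a checkbox: the sign-off step cannot run unless someone's name, a date, and notes are on file.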
Why This Matters for Your Firm Right Now
If your firm set an AI adoption target (usage metric) without building outcome measurement first, you've created the conditions for the KPMG problem. Your people can hit the target. Your work won't change. Six months from now, you'll report strong adoption and weak ROI simultaneously. Both will be true.
The fix is not to abandon the AI tool. The fix is to measure the right thing from the start. Before you scale any AI initiative, know what you're measuring. If it's activity counts, you're measuring compliance with a mandate. If it's outcomes, you're measuring whether the initiative works.
Real Adoption at Work: The Wilson Martinez Example
A 22-person firm called Wilson Martinez Accounting piloted AI research tools last fall. They started with a baseline: their senior accountant took 4 hours to research IRS revenue rulings for complex partnership structures. After implementing AI for initial research and drafting, the same research task took 1.5 hours, with better documentation. That's real adoption. They knew what changed because they measured it.
What You Should Do This Week
Pick one workflow where your firm is using AI (or planning to). Document the answer to three questions:
Question 1: What metric are you actually tracking right now? Is it usage (prompts, logins)? Is it integration (process redesign)? Is it outcome (speed, quality, cost)? Write down the specific metric. If you don't have a specific metric yet, that's the first gap to fix.
Question 2: Can you game this metric without changing the actual work? If the answer is yes, you're measuring the wrong thing. Usage metrics are easy to game. Outcome metrics are far harder to game.
Question 3: Do you have a baseline? Before this AI tool, what was the state? How long did this workflow take? How many revisions, errors, or rework cycles happened? If you don't have a before number, you cannot calculate the after difference.
If you answer "yes" to question 2 or "no" to question 3, you have your starting point for the next sprint. Shift one AI workflow to outcome-based measurement. Build a baseline. Pilot in a controlled way. Compare explicitly. That's how adoption becomes real.
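The three questions can be written down as a small self-check. This is a sketch under stated assumptions: `MetricCheck` and `find_gaps` are hypothetical names, and the answers are whatever your firm records for the workflow it picked.

```python
from dataclasses import dataclass


@dataclass
class MetricCheck:
    metric: str         # Question 1: the specific metric tracked ("" if none yet)
    gameable: bool      # Question 2: can it be hit without changing the work?
    has_baseline: bool  # Question 3: is there a before-AI measurement?


def find_gaps(check):
    """Return the gaps to fix before scaling, in question order."""
    gaps = []
    if not check.metric.strip():
        gaps.append("no specific metric defined")
    if check.gameable:
        gaps.append("metric can be gamed without changing the work")
    if not check.has_baseline:
        gaps.append("no baseline measurement")
    return gaps


# Example answers for a firm tracking logins with no baseline.
print(find_gaps(MetricCheck("weekly tool logins", gameable=True, has_baseline=False)))

# Example answers for a firm tracking cycle time against a measured baseline.
print(find_gaps(MetricCheck("days per return", gameable=False, has_baseline=True)))
```

An empty list means the workflow is ready for an outcome-based pilot. Anything else is the starting point for the next sprint.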
Sources
- KPMG Global AI in Finance Report 2026 — Active AI use doubled from 30% to 75%; only 42% of organizations are assurance-ready
- Going Concern — KPMG Employees Maliciously Comply with AI Usage Requirements — Documented case of metric gaming
- CFO Dive / Gartner — Four Tips for Building a Strong Finance AI Bench — AI initiatives succeed 50% of the time; governance structure differentiates winners
- McKinsey — From Promise to Impact: How Companies Can Measure and Realize the Full Value of AI — Five-layer measurement framework for AI success
Jim Smart is the founder and editor in chief of Nexairi. A Business Intelligence Developer with experience building data systems for Verizon, U.S. Army operations, and enterprise finance teams, Jim spent years turning complex data into decisions that executives could act on — dashboards, forecasting models, and automation pipelines across telecom and government contracting. He founded Nexairi to apply that same clarity to AI: making emerging technology understandable and actionable for the operators, accountants, and business owners who need it most. Jim holds GenAI certifications from the University of South Florida Bellini College of AI and completed Springboard's Data Science Career Track.

