March 02, 2026

80% of Banks Use AI. A Similar Share Sees No Bottom-Line Impact.

McKinsey's latest report on agentic AI in banking puts a number to a pattern we've been seeing: most banks are stuck in pilot purgatory. Our analysis of what's missing from the conversation.

Team Avido

A McKinsey report published last month on agentic AI in banking surfaces a striking paradox: roughly 80% of financial institutions now use some form of artificial intelligence, yet a comparable proportion report no significant impact on their bottom line. The consultancy calls it “pilot purgatory,” a state in which banks deploy AI across narrow use cases without capturing the transformative value the technology promises.

The report makes a compelling case for strategic change. It advocates C-suite alignment, cross-functional collaboration between COOs, CTOs, and chief risk officers, and a shift from point solutions to end-to-end domain transformation. One Asian bank, McKinsey notes, mapped 600 operational processes to identify the ten worth transforming first. JPMorgan Chase has made AI adoption a CEO-level priority, backed by internal marketing campaigns, usage tracking, and executive training.

The strategic direction is increasingly well understood. What remains less examined is a more practical question: as banks scale their AI portfolios from a handful of pilots to dozens of production applications, how do they actually verify that those systems are performing as intended?

A gap between building and proving

Consider the typical bank with a growing AI footprint: a customer service chatbot, a credit memo drafting agent, a fraud detection model, and several internal knowledge tools, each developed by a separate team with its own standards. The chatbot was last tested formally three months ago. The credit agent was reviewed by a single analyst. The fraud model carries proper machine learning metrics, but no one has assessed whether its generated explanations satisfy the compliance team responsible for defending them to regulators.

When leadership asks whether these applications are production-ready, the answer tends to be uncertain, not because the technology isn't capable, but because there is no systematic framework for demonstrating quality across stakeholders.

McKinsey's David Deninzon describes what he calls the "fingers and toes problem": the tendency for banks to automate fragments of existing workflows rather than redesigning them entirely. Robotic process automation ran into the same limitation a decade ago. A similar dynamic is playing out with AI quality assurance. Engineering teams can measure technical performance: latency, uptime, benchmark accuracy. But the criteria that matter most for production readiness often sit elsewhere in the organization.

Whether a credit memo meets documentation standards is a question for the credit team. Whether a chatbot's responses align with customer experience guidelines falls to the service organization. Whether a model's outputs can withstand regulatory scrutiny is a compliance concern. These domain experts can identify a poor output in seconds, but they typically have no mechanism for translating their expertise into formal evaluation criteria. Quality ownership, as a result, tends to default to no one.
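To make "formal evaluation criteria" concrete, here is a minimal sketch of what codifying a reviewer's checklist might look like. The rubric names and checks below are illustrative assumptions, not any specific bank's standards or any vendor's schema; real criteria would be richer (and often judged by a model rather than string rules), but the shape is the same: each plain-language rule a domain expert owns is paired with an automated check.

```python
# Minimal sketch: a domain expert's informal review checklist expressed
# as a declarative, machine-checkable rubric. All rule names and checks
# are illustrative placeholders.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str                      # the plain-language rule the expert owns
    check: Callable[[str], bool]   # an automated proxy for that rule

# A compliance reviewer's checklist for a drafted credit memo (hypothetical).
CREDIT_MEMO_RUBRIC = [
    Criterion(
        "States the recommendation explicitly",
        lambda text: "recommend" in text.lower(),
    ),
    Criterion(
        "Cites at least one financial figure",
        lambda text: any(ch.isdigit() for ch in text),
    ),
    Criterion(
        "Avoids hedging the decision itself",
        lambda text: "might approve" not in text.lower(),
    ),
]

def evaluate(text: str, rubric: list[Criterion]) -> dict[str, bool]:
    """Run every criterion against an output; return per-rule pass/fail."""
    return {c.name: c.check(text) for c in rubric}

if __name__ == "__main__":
    memo = "We recommend approval based on 2024 revenue of $4.2M."
    for rule, passed in evaluate(memo, CREDIT_MEMO_RUBRIC).items():
        print(f"{'PASS' if passed else 'FAIL'}: {rule}")
```

The point of the shape, not the specific rules: once criteria live in a structured rubric rather than in one analyst's head, every new model output can be scored against the same standard, and the standard itself can be reviewed, versioned, and defended to a regulator.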

What scaling does to the problem

McKinsey projects that AI could create 40 to 70 percent capacity gains in banking operations, a figure that implies not a few pilots but a substantial portfolio of AI-powered applications running in parallel across the institution.

The quality challenge compounds at that scale. If each application requires its own bespoke evaluation setup, designed and maintained by the team that built it, testing overhead grows linearly with the number of systems deployed. Engineering, rather than the AI itself, becomes the constraint on production readiness.

The alternative is to distribute quality ownership. When domain experts and business users can define evaluation criteria and run assessments in their own terms, without writing code, the bottleneck shifts. Testing capacity scales with the portfolio rather than against it.

The governance gap

The McKinsey report closes with an analogy offered by the COO of a large Asian bank: "AI is not the pilot replacing the crew. It is the new engine that makes the aircraft go farther and faster with the same team on board."

It is an instructive comparison. New engines come with maintenance programs, inspection protocols, and performance checklists designed for the full crew, not just the engineers who installed them. The same logic applies to AI in banking. The strategy conversation around agentic AI has matured considerably. The governance and quality assurance conversation, the infrastructure that turns promising pilots into trusted production systems, has not yet caught up.

The institutions that bridge that gap, building quality processes that are systematic, cross-functional, and scalable, will be the ones that move beyond pilot purgatory. Not necessarily because they deploy better models, but because they can prove those models work.

Source: ["The paradigm shift: How agentic AI is redefining banking operations,"](https://www.mckinsey.com/capabilities/operations/our-insights/the-paradigm-shift-how-agentic-ai-is-redefining-banking-operations) McKinsey & Company, February 2026.
