November 25, 2024

Breadth vs. Depth in AI Testing: Why It’s Not Just About What You Test, But How

In AI testing, it’s crucial to focus on both breadth and depth. Learn how variance can reveal unexpected user behaviors and improve your AI’s real-world readiness.

Team Avido

Breadth vs. Depth in AI Testing: Why It’s Not Just About What You Test, But How

In the exciting (and occasionally terrifying) world of AI development, most teams spend a lot of time thinking about breadth. How many different things could users possibly throw at this tool? Have we covered all the edge cases? Will users understand what we built this thing to do, or will they try to make it do something absurd—like, say, write poetry in Klingon?

But here’s the thing: it’s not just about how many different use cases your AI can handle. There’s another layer that often gets overlooked. It’s the way those same use cases can show up in ways you’d never expect. This is what we call variance—and ignoring it can lead to some interesting (read: embarrassing) surprises.

A Sneaky Lesson in Variance

Let me tell you a story. I was part of a team that built an AI assistant to help people manage their finances. You know, give insights on their spending habits, help them set budgets, maybe throw in a few financial literacy quizzes to make them feel like budgeting rockstars. What we didn’t want, though, was for people to use the tool as a generic “do everything” assistant. Specifically, we didn’t want it to turn into a coding sidekick.

So we created a whole bunch of filters. If anyone asked for code—boom!—the assistant would politely decline. We tested this in all sorts of ways: “Please write me some code,” “Can I get a script for X?” Every variation we could think of. And guess what? We nailed it. No code, no problem. We were high-fiving all around, thinking we were untouchable.

Then one night, our CTO (because it’s always the CTO) was playing around with the system. Instead of using the word “code,” they asked, “Hey, can I get a script for this?” And, to our horror, out popped a block of code.

Turns out, we had been laser-focused on the word “code” but hadn’t thought much about “script.” Same use case, different wording. And our super-smart AI assistant? Yeah, it didn’t catch that. Fun times.

Why You Can’t Ignore Variance

Here’s what that taught us: it’s not enough to just cover a wide range of scenarios (what we call breadth). You’ve got to dig deeper. Even within the most common use cases, users will find all sorts of creative ways to ask for the same thing. It’s like they’re playing a game of “stump the AI,” and believe me, they’re good at it.

At Avido, we believe in testing for this kind of variation. Not just, “Can the AI handle a budget query?” but “Can it handle someone asking for a budget in five different ways?” It’s this attention to detail that stops those little slip-ups from turning into big problems down the road.

Automating the “But What If” Scenarios

So how do we do it? We automate task variants. Imagine asking the AI for the same thing, but in ten different ways—each with different wording, intent, or even a completely offbeat phrasing. Does it still respond the way we expect? If not, we know there’s work to do. And trust me, you’d be surprised how often these little tweaks make a big difference.

This isn’t about over-engineering. It’s about real-world readiness. If your AI can’t handle the tiny changes in how people ask for things, it’s only a matter of time before it gets tripped up in the wild.

Wrapping Up

In AI development, it’s tempting to focus on how many different things your tool can do. But just as important is figuring out all the ways your users might ask for the same thing. That’s where variance comes in—and that’s where we come in. At Avido, we make sure your AI not only works but works in the real world, where people love to test the boundaries (sometimes without even realizing it).

So next time you’re thinking about AI testing, don’t just think about breadth. Think about depth. And while you’re at it, maybe think of a few more synonyms for “code.”