You're About to Invest in AI for Testing. Do This First.
Most QE teams are automating the wrong 20%. Here's how to find the other 80%.
Every QE leader I talk to right now is under pressure to “adopt AI.” The mandate comes from the CTO, from the board, from the analyst reports piling up in their inbox. And most of them are about to make the same mistake.
They’re going to pick a tool. They’re going to run a pilot on their UI regression suite. They’re going to report early wins. And eighteen months from now, they’re going to wonder why their testing operation still feels the same.
I’ve seen this pattern play out enough times to know why it happens.
The problem isn’t the tool. It’s the target.
When most organizations say “we’re adopting AI in QE,” what they mean is: we’re going to use AI to speed up test execution. Maybe auto-generate some Selenium scripts. Maybe add a copilot for writing test cases.
That’s not wrong. But it’s optimizing a stage that typically represents 15-20% of total QE effort.
The other 80% — coverage design, data preparation, environment setup, results analysis, defect triage, reporting, regression management — stays untouched. Manual. Expensive. Invisible.
A team can report 70% automation and still spend the majority of its budget on manual work. The metric everyone watches measures the wrong thing.
The real question isn’t “which AI tool should we buy?”
It’s: where in our testing operation would AI actually move the needle?
And you can’t answer that if you don’t know where the effort actually goes.
I’ve worked with QE organizations that were convinced their bottleneck was test execution speed. When we actually mapped the operation — every lifecycle stage, every test type, every release type — we found the real constraint was somewhere else entirely. Data preparation eating two days per release. Environment contention blocking three teams simultaneously. Results analysis where a senior engineer spent half their week manually triaging false failures.
These aren’t glamorous problems. They don’t make for exciting vendor demos. But they’re where the money is.
Why discovery has to come first
Here’s what I mean by discovery: before you evaluate a single AI tool, map your current testing operation from end to end. Not the process diagram on the wiki — what actually happens.
For every test type your organization performs, trace it through all ten stages of the testing lifecycle:
Coverage design — how do you decide what to test?
Test case creation — who writes them, how long does it take?
Script development — what’s automated, what’s maintained by hand?
Data preparation — where does test data come from?
Environment setup — how long do you wait?
Execution — the one stage everyone already focuses on
Results analysis — how long does triage take?
Defect management — what’s the false positive rate?
Reporting — can you answer “are we safe to ship?”
Regression management — is the suite being groomed, or just growing?
For each stage, capture who does the work, how they do it, how long it takes, and what it costs. Then compare that to what’s actually possible today with modern tooling and agentic AI.
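One lightweight way to build that fact base is to log estimated hours per lifecycle stage for each test type, then tally where the effort actually concentrates. Here is a minimal sketch; the test types, stage names, and hours are purely illustrative, not drawn from any real engagement:

```python
from collections import defaultdict

# Illustrative effort log: (test_type, lifecycle_stage, hours_per_release).
# Replace with data from your own discovery interviews; these numbers are made up.
effort_log = [
    ("ui_regression", "execution", 6),
    ("ui_regression", "script_development", 10),
    ("ui_regression", "results_analysis", 14),
    ("api_tests", "data_preparation", 16),
    ("api_tests", "environment_setup", 12),
    ("api_tests", "execution", 4),
]

# Sum hours per lifecycle stage across all test types.
hours_by_stage = defaultdict(int)
for _test_type, stage, hours in effort_log:
    hours_by_stage[stage] += hours

# Rank stages by effort to see where the budget actually goes.
total = sum(hours_by_stage.values())
for stage, hours in sorted(hours_by_stage.items(), key=lambda kv: -kv[1]):
    print(f"{stage:20s} {hours:3d}h  {100 * hours / total:.0f}%")
```

Even with toy numbers, the shape of the result is the point: in this sketch, execution lands at roughly 16% of total effort, while data preparation and results analysis together dwarf it — exactly the pattern the mapping exercise is meant to surface.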
When you do this honestly, patterns emerge. You find lifecycle stages where the gap between the current state and the art of the possible is enormous. You find stages where a single intervention would cascade through the entire operation. You find that the thing you were about to automate wasn’t actually the bottleneck.
That’s the fact base. Without it, every AI investment is a guess.
The two-phase approach
I frame this as a two-phase journey:
Phase 1: Discover. Map the operation. Build the fact base. Identify where the gaps are largest and where AI would deliver the most impact. Sequence priorities by dependency and ROI.
Phase 2: Transform. Match findings to solutions. Run proofs of concept against your actual environment. Train the team. Deploy. Optimize.
Most organizations skip Phase 1 and jump straight to Phase 2. They pick a tool because a vendor gave a compelling demo, pilot it on the most visible test type, and declare success based on a narrow metric. Meanwhile, the operating model stays the same.
Phase 1 takes 2-3 hours per application. Phase 2, done right, takes months. But Phase 1 is what makes Phase 2 successful.
I built a framework for Phase 1
I’ve put together a structured discovery document that walks you through this process. It covers all five dimensions of a testing operating model — crown jewels, release types, test phases, test types, and the full ten-stage lifecycle — with templates for both deep-dive and lightweight assessment.
It includes:
A routing section so you only complete what’s relevant
Full and lightweight lifecycle assessment templates for each test type
An “art of the possible” comparison for every lifecycle stage showing what AI-enabled testing looks like today
A priority matrix to sequence where to invest first
A results summary template you can take to your CTO
This isn’t a maturity model. There’s no score at the end. It’s a fact base — the kind of clarity you need before you spend a dollar on AI tooling.
Get the discovery framework
I’m releasing this as a free PDF — it’s v0.9, a beta. I want feedback from practitioners who actually run QE organizations.
Subscribe to this newsletter and I’ll send you the framework directly. It’s free — just drop your email.
If you complete it and want a second opinion on your findings, or if you need help turning them into a funded roadmap, I’m happy to talk. You can book a discovery call at qualityreimagined.com.
The AI tools are getting better every month. The organizations that win won’t be the ones that adopted first — they’ll be the ones that knew where to point them.
Richie Yu works with QE leaders navigating the shift to agentic AI. His focus is on the operating model — not just the tools, but how testing work actually flows and where modernization delivers measurable returns.
