The Point of View: Quality Reimagined in the Agentic Era
A thesis anchor for Quality Engineering leaders
This Substack is about modernizing QA by modernizing how release confidence is produced.
Executive Summary
AI is showing up in two places: inside software delivery, and inside the software itself. QA feels both shifts at once.
First, AI-augmented coding makes shipping change cheaper and faster. Release cadence tightens, and volume increases. The current QA operating model relies heavily on human analysis between test runs; left unchanged, it cannot keep up with that pace of delivery.
Second, modern architectures increasingly include agentic workflows. These components plan steps and adapt behavior at runtime. Traditional deterministic testing is still necessary, but it cannot be your only signal anymore.
The response is to evolve QA from a testing function into an Assurance System. Execution is the easy part. The hard part is producing a defensible decision. The modern system must automate the analysis and evidence packaging that sits around the test run.
One line to remember: Quality is not just “more testing.” Quality is the system that produces a release decision with evidence, at the pace of delivery.
What Changed
For years, software delivery had a practical limit: delivering change was expensive. Even with good automation, volume was constrained by human throughput across design, coding, and reviews.
That limit is moving.
AI-augmented development reduces the cost of getting changes into production. Teams can generate more variants, refactors, fixes, and features with less friction. The business pushes for smaller, more frequent deployments because it feels safer.
At the same time, the software itself is changing. Products now include agentic capabilities. Instead of following a fixed script, these systems choose actions based on context, tools, and policies. Output can vary even when inputs look similar.
This is why Quality Engineering needs a new point of view.
The Moment the Old Model Breaks
It is Thursday afternoon. The release cadence used to be bi-weekly. Now it is weekly. Product wants twice a week because “we can do smaller changes safely.”
You have more automation than ever. The pipeline runs fast. The dashboards look busy.
And yet, confidence is low.
The readiness call starts, and the questions are predictable but difficult to answer: What actually changed since the last safe build? What is the blast radius? Do we have evidence this specific risk is covered?
Someone shares a pass rate. It’s green overall, but nobody can explain what the red failures mean for this release. Someone says they are flaky. Someone wants one more run. The decision turns into a negotiation based on intuition rather than data.
At that point, it becomes clear: The bottleneck is not test execution. The bottleneck is confidence production.
You can run a million tests, but if the results require hours of human interpretation to determine whether the product is safe, you are still too slow.
The Key Reframing: The QA Org is an Assurance System
Most organizations talk about QA as a department or a set of activities. A more useful framing is that QA is an Assurance System.
It is the set of people, partners, practices, and tools that exist to answer one question repeatedly: Is this change safe enough to ship, and what is the evidence?
The system already exists today. It just has two gaps that are exposed by this new era:
Speed: It relies on too much manual work (analysis, data prep, triage) to produce readiness decisions at the speed of modern delivery.
Coverage: It lacks standardized methods to evaluate non-deterministic (agentic) workflows.
The mandate is not to “optimize testing.” It is to evolve the system so it can produce decisions as fast as developers ship code.
The Solution: Two Lanes, One System
This is not about splitting QA into two disconnected worlds. It is about extending the Assurance System to cover a broader surface.
Lane 1: Deterministic Quality
This is the discipline we know. UI, API, mobile, and data flows where expected results are stable. The goal here is efficiency.
We must apply AI to the “human middleware” steps. Automate change impact analysis, test data generation, and failure triage so that a “Pass” result actually equals “Ready to Ship” without a manual interpretation phase.
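As one concrete illustration of removing "human middleware," failure triage can be partially automated with even a simple heuristic before any AI is involved. The sketch below is a hypothetical example, not a prescribed implementation: it classifies a failing test as likely flaky when its recent history is mostly green, so humans review only probable regressions.

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    name: str
    passed: bool
    history: list[bool]  # recent pass/fail outcomes, newest last

def triage(result: TestResult, flake_threshold: float = 0.2) -> str:
    """Classify a failure so a 'Pass' pipeline needs no manual interpretation.

    Heuristic (an assumption for illustration): a test that fails now but
    passed in the vast majority of recent runs is likely flaky.
    """
    if result.passed:
        return "pass"
    if not result.history:
        return "regression"  # no history: treat as real until proven flaky
    fail_rate = result.history.count(False) / len(result.history)
    # Rare, intermittent historical failures suggest flakiness, not a defect.
    return "likely-flaky" if 0 < fail_rate <= flake_threshold else "regression"
```

A real system would also weigh change impact (did the failing test touch changed code?) and rerun outcomes, but the design point is the same: the classification happens in the pipeline, not in the readiness call.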
Lane 2: Agentic Quality
This is the new discipline. For agentic workflows, the question is not “did it match the expected value?” but “did it behave within acceptable boundaries?”
This requires new methods:
Scored Evaluations: Grading outputs against policies and reference sets rather than exact matches.
Constraint Checks: Verifying that the agent did not attempt prohibited actions or tools.
Drift Monitoring: Detecting if behavior is shifting over time compared to a baseline.
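The three methods above can be sketched in a few lines each. This is a deliberately minimal illustration, with assumed inputs (reference answers, a prohibited-action list, a baseline metrics dict); production systems typically use rubric- or model-based graders rather than token overlap.

```python
def scored_eval(output: str, references: list[str]) -> float:
    """Grade an output 0.0-1.0 by overlap with reference answers.
    (Token overlap is a stand-in for a real rubric or judge model.)"""
    tokens = set(output.lower().split())
    best = 0.0
    for ref in references:
        ref_tokens = set(ref.lower().split())
        if ref_tokens:
            best = max(best, len(tokens & ref_tokens) / len(ref_tokens))
    return best

def constraint_check(actions: list[str], prohibited: set[str]) -> list[str]:
    """Return any prohibited tools or actions the agent attempted."""
    return [a for a in actions if a in prohibited]

def drift_score(baseline: dict[str, float], current: dict[str, float]) -> float:
    """Mean absolute change in tracked behavior metrics vs a baseline run."""
    keys = baseline.keys() & current.keys()
    if not keys:
        return 0.0
    return sum(abs(current[k] - baseline[k]) for k in keys) / len(keys)
```

The point is not the specific math: each check yields a score or a boundary violation rather than an exact-match pass/fail, which is what makes non-deterministic behavior assessable at all.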
One readiness decision needs both signals. One Assurance System owns both.
What “Good” Looks Like
A modern Assurance System behaves like a closed loop. Every meaningful change triggers the same process:
Ingest: It understands what changed in the code and behavior.
Plan: It determines the blast radius and what coverage is required.
Execute: It verifies across both Deterministic and Agentic lanes.
Decide: It produces a clear Verdict + Evidence Pack that explains why.
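The four steps above can be sketched as one function. Everything here is illustrative and assumed (the coverage map, the test runner, the evidence-pack shape); the takeaway is the shape of the loop: the output is a verdict plus the evidence behind it, not a raw pass rate.

```python
def assurance_loop(changed_files: list[str],
                   coverage_map: dict[str, list[str]],
                   run_test) -> dict:
    """One pass of the Ingest -> Plan -> Execute -> Decide loop.

    changed_files: what changed (Ingest)
    coverage_map:  file -> tests covering it (assumed to exist)
    run_test:      callable(test_name) -> bool (Execute, both lanes)
    """
    # Plan: blast radius = the union of tests covering the changed files.
    required = sorted({t for f in changed_files
                       for t in coverage_map.get(f, [])})
    # Execute: run only what the blast radius requires.
    results = {t: run_test(t) for t in required}
    failures = [t for t, ok in results.items() if not ok]
    # Decide: a verdict plus an evidence pack explaining why.
    return {
        "verdict": "ship" if not failures else "hold",
        "evidence": {
            "changed_files": changed_files,
            "blast_radius": required,
            "results": results,
            "failures": failures,
        },
    }
```

When the readiness call receives this structure instead of a dashboard, the questions from the Thursday-afternoon scene (what changed, what is the blast radius, what is the evidence) are already answered before the meeting starts.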
Humans stay in the loop, but their role shifts up the stack. Less time is spent on repetitive execution or arguing about flaky tests. More time is spent defining policies, risk boundaries, and coverage intent.
What Comes Next
This series will double-click on the practical implementation of this view.
The Assessment: A candid walkthrough of today’s assurance lifecycle to identify where it breaks under load.
The Mechanics: What the modern loop looks like in practice, including the capabilities that matter most to automate the high-friction steps (like data and environments).
The Migration: How to move from “testing” to “assurance” without boiling the ocean, starting with the highest-risk workflows.
The goal is simple: Keep shipping more change, while making release confidence faster, clearer, and defensible.
