The Fourth Wave: AI Is Writing the Code. Who’s Testing It?

Agentic Software Development, what I have called the Fourth Wave, is accelerating at a rate few could have predicted. Advances in frontier models, rapid enterprise adoption, and seemingly endless capital investment have made coding agents one of the fastest technology diffusions in history. AI coding agents (and copilots) have fundamentally changed the pace of coding. In many enterprise environments, 40–50% of code is now AI-generated, and adoption across development teams is approaching universal. Coding tasks that used to take days or weeks can now be done in hours.

But as we’ve outlined in prior discussions on the Fourth Wave, software delivery is not a single step. It’s a complex, interconnected system spanning planning, coding, testing, securing, releasing, and monitoring. In the enterprise, that system often involves tens of thousands of people (and now likely even more agents), globally distributed and operating under strict compliance requirements.

And as coding accelerates, a new constraint is becoming clear: The bottleneck has shifted—and for many enterprises, it has shifted to testing.

Testing Hasn’t Kept Pace with Development (Even Before AI)

For years, organizations have invested in modernizing QA. Automation frameworks, CI/CD pipelines, and test automation tooling have all improved. But at enterprise scale, the reality is far less advanced than many assume.

Depending on the industry and system complexity, 50–70% of testing effort is still manual. Yes, in the age of AI, many organizations still test manually. And even those that have invested heavily in automation commonly have coverage gaps, particularly across mobile, web, performance, and accessibility testing.

The data is starting to bear this out. Recent industry analysis found that AI-authored pull requests average significantly more issues than human-written ones — with logic and correctness errors and security findings both materially higher. Developer confidence hasn’t kept pace with adoption either: surveys consistently show the majority of developers aren’t fully confident deploying AI-generated code, and a meaningful percentage of teams have already rolled back releases because of it. More code is shipping. More problems are coming with it.

And where automation does exist, it often struggles to scale.

A consistent pattern emerges across enterprises:

  • Test suites become brittle as applications evolve
  • Scripts fall out of sync with the codebase
  • Maintenance overhead grows faster than coverage

Industry estimates suggest that 40–50% of automated test failures are not actual defects, but rather the result of changes in the application, environment instability, or test data issues.

In other words, a significant portion of testing effort is spent chasing noise.

AI Changes Test Creation—But Exposes New Constraints

AI is now making it dramatically easier to generate automated tests. What used to be one of the biggest barriers to automation, test creation itself, is rapidly disappearing. But this does not solve the problem. It shifts it.

As test generation accelerates, so does complexity:

  • Test volumes grow exponentially alongside AI-generated code
  • Tests drift out of sync faster as applications change more frequently
  • Execution demands increase across devices, browsers, and environments
  • Determining why a test actually failed becomes even harder

It has never been easier to create tests—and never been harder to ensure they’re actually reliable at scale.

The Real Bottleneck: Everything After the Test Is Written

In the Fourth Wave, the constraints in testing are no longer about creating tests. They are about orchestrating, executing, and operating them. Across large enterprises, a disproportionate amount of time is spent not on finding defects but on understanding test results. Studies and industry benchmarks indicate that up to 50% of QA and engineering time in automated environments is spent triaging failures, answering basic but critical questions:

  • Is this a real defect in the application?
  • Is the test out of sync with the code?
  • Did the environment introduce instability?
  • Is the issue related to data or configuration?

This is the hidden tax of modern automated testing. And it becomes even more pronounced in mobile and web environments, where variability is the norm. Device fragmentation, OS differences, network conditions, and third-party dependencies all introduce layers of complexity that are difficult to simulate—and even harder to debug.
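To make the four triage questions concrete, here is a deliberately simple rule-based classifier. It is a minimal sketch under stated assumptions, not how any particular platform works: the `FailureRecord` fields, the `FailureCause` buckets, and the keyword heuristics (a failure that passes on an unchanged rerun is treated as environmental; a missing-element error is treated as test drift) are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCause(Enum):
    """Hypothetical triage buckets mirroring the four questions above."""
    APPLICATION_DEFECT = "real defect in the application"
    TEST_DRIFT = "test out of sync with the code"
    ENVIRONMENT = "environment instability"
    DATA_OR_CONFIG = "data or configuration issue"


@dataclass
class FailureRecord:
    # Hypothetical fields a CI system might record for one failed test run.
    test_name: str
    error_message: str
    passed_on_rerun: bool


# Illustrative keyword heuristics (assumptions); real systems need richer signals.
DRIFT_MARKERS = ("NoSuchElementException", "element not found", "locator")
DATA_MARKERS = ("fixture", "seed data", "missing config", "KeyError")


def triage(failure: FailureRecord) -> FailureCause:
    """Assign one probable cause using simple rules, checked in order."""
    msg = failure.error_message.lower()
    # A failure that disappears on an unchanged rerun points to flakiness.
    if failure.passed_on_rerun:
        return FailureCause.ENVIRONMENT
    # Missing UI elements usually mean the test no longer matches the app.
    if any(marker.lower() in msg for marker in DRIFT_MARKERS):
        return FailureCause.TEST_DRIFT
    if any(marker.lower() in msg for marker in DATA_MARKERS):
        return FailureCause.DATA_OR_CONFIG
    # Anything reproducible that matches no other rule is treated as a defect.
    return FailureCause.APPLICATION_DEFECT


if __name__ == "__main__":
    failures = [
        FailureRecord("test_login", "TimeoutError: page did not load", True),
        FailureRecord("test_checkout", "NoSuchElementException: #pay-btn", False),
        FailureRecord("test_report", "KeyError: 'region' in fixture", False),
        FailureRecord("test_total", "AssertionError: expected 100, got 90", False),
    ]
    for f in failures:
        print(f"{f.test_name}: {triage(f).value}")
```

The rule order is a design choice: the rerun check runs before any error text is inspected. Hand-written rules like these are also exactly the kind of logic that grows brittle as applications and environments change.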

At scale, this creates a compounding effect:

More code → more tests → more failures → more time spent diagnosing → slower delivery.

The Shift in Value Creation

In prior waves, the primary value in testing came from moving from manual to automated testing. In the Fourth Wave, AI is making automation easier than ever, and the opportunity for new value has shifted to a different set of capabilities:

  • Maintaining alignment between tests and rapidly evolving applications
  • Orchestrating intelligent, risk-based test execution across complex pipelines
  • Scaling execution across real-world environments that accurately reflect production
  • Accelerating root cause analysis to reduce time lost in failure triage

This is where most enterprises are now constrained. They can build faster. They can even generate tests faster. But they cannot validate quality at the speed of development.
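As one illustration of what “risk-based test execution” can mean, here is a minimal sketch of a common idea: score each test by how much of its covered code changed in a commit and by its recent failure rate, then run the highest-risk tests first within a time budget. The `CandidateTest` fields, the coverage map, and the 0.7/0.3 weights are hypothetical assumptions, not features of any specific platform.

```python
from dataclasses import dataclass


@dataclass
class CandidateTest:
    name: str
    covered_files: set[str]  # hypothetical coverage map for this test
    failure_rate: float      # share of recent runs that failed, 0.0 to 1.0
    duration_s: float        # typical runtime in seconds


def risk_score(test: CandidateTest, changed_files: set[str],
               change_weight: float = 0.7, history_weight: float = 0.3) -> float:
    """Blend change impact and failure history into a 0..1 score (weights are assumptions)."""
    if test.covered_files:
        # Fraction of this test's covered files touched by the change.
        impact = len(test.covered_files & changed_files) / len(test.covered_files)
    else:
        impact = 0.0
    return change_weight * impact + history_weight * test.failure_rate


def select_tests(tests: list[CandidateTest], changed_files: set[str],
                 budget_s: float) -> list[str]:
    """Greedily pick the highest-risk tests that still fit in the time budget."""
    ranked = sorted(tests, key=lambda t: risk_score(t, changed_files), reverse=True)
    selected: list[str] = []
    used = 0.0
    for t in ranked:
        if used + t.duration_s <= budget_s:
            selected.append(t.name)
            used += t.duration_s
    return selected


if __name__ == "__main__":
    suite = [
        CandidateTest("test_checkout_flow", {"cart.py", "payment.py"}, 0.10, 120),
        CandidateTest("test_login", {"auth.py"}, 0.02, 30),
        CandidateTest("test_payment_retry", {"payment.py"}, 0.25, 60),
        CandidateTest("test_profile_page", {"profile.py"}, 0.05, 45),
    ]
    print(select_tests(suite, {"payment.py"}, budget_s=200))
    # → ['test_payment_retry', 'test_checkout_flow']
```

Greedy selection by score is a deliberate simplification: it is easy to reason about, but it is not an optimal knapsack solution, and the result depends heavily on how accurate the coverage map and failure history are.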

Why This Matters Now

Quality is no longer a downstream checkpoint. It is a gating factor: it determines whether faster development actually accelerates the flow of value and delivers on the promise of the Fourth Wave.

When testing cannot keep pace:

  • Release cycles slow down despite faster development
  • Defects escape into production environments
  • Customer experience degrades
  • Regulatory and compliance risks increase

And critically, the promised gains of AI-driven development are never fully realized.

The Case for Autonomous Testing at Scale

This is why we are seeing a shift toward intelligent, autonomous testing platforms—solutions designed to address not just test creation, but the full lifecycle of testing in an AI-driven world.

Platforms like Digital.ai Testing focus on the new bottlenecks:

  • Keeping tests in sync through AI-driven self-healing
  • Executing at scale across real devices and environments
  • Orchestrating tests intelligently across pipelines
  • Reducing triage time through faster agentic root cause analysis

The Fourth Wave demands all of them.

The Strategic Implication

The Fourth Wave is redefining where value is created across the software lifecycle. Coding is no longer the primary constraint; the bottlenecks now live upstream and downstream of it. In many organizations, testing is one of those bottlenecks, not because of a shortage of tests but because of the inability to manage, execute, and learn from them at scale.

Organizations that recognize the need for innovation in testing will unlock the full potential of AI-driven development and thrive in the Fourth Wave. Those that don’t will find themselves constrained by quality bottlenecks, even as their coding capacity continues to grow.

Because in this new environment:

Coding is easy. Quality is the hard part.

The winners in the Fourth Wave will not just generate more software. They will deliver high-quality, resilient applications at machine speed.

Smarter software. Agentic speed. Quality at scale.
