Trusted AI is Governed AI - Testing and Versioning Intelligent Agents
The testing approaches that got you HERE aren't going to get you THERE
If you’re looking for a tough AI-related challenge, try finding a definitive source for testing or versioning Intelligent Agents. While there’s no shortage of articles, white papers, and vendor toolkits, I’ve yet to find what I would call true industry-standard guidance.
UPDATE: As I was finalizing this blog entry (week of 06-30-2025), OWASP® announced the launch of the OWASP® AI Testing Guide (AITG) — a new, community-driven project focused on the security testing and assurance of AI systems. This could be an important step toward establishing industry norms.
Testing and Versioning - Defining the Problem Space
Testing is fundamental to building trust in any system - but when it comes to Intelligent Agents, the stakes and complexity are much higher.
Let us revisit the commonly accepted definition from Wikipedia:
An intelligent agent is an entity that perceives its environment, takes actions autonomously to achieve goals, and may improve its performance through machine learning or by acquiring knowledge.
Look closely at the key phrases. These capabilities - perceiving environments, acting autonomously, learning over time - set Intelligent Agents apart from traditional software systems. And they raise an important question:
How do you test something designed to behave like a human - and learn like one, too?
Traditional methods such as unit testing or regression testing rely on clearly defined inputs and predictable outputs. But what happens when your system evolves, adapts, or “thinks” in ways that weren’t explicitly programmed?
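One adaptation already emerging in practice is to stop asserting an exact output string and instead assert properties that every acceptable response must satisfy: required facts present, length within bounds, no forbidden content. Here is a minimal sketch in Python; the `answer_question` agent is a hypothetical stand-in for a non-deterministic system, not any real API:

```python
import random

def answer_question(question: str) -> str:
    """Hypothetical stand-in for a non-deterministic agent:
    the phrasing varies between calls, but the facts should not."""
    templates = [
        "Paris is the capital of France.",
        "The capital of France is Paris.",
        "France's capital city is Paris.",
    ]
    return random.choice(templates)

def check_response(response: str) -> bool:
    """Property-based checks instead of an exact-match assertion:
    required facts present, output within length bounds, no refusal."""
    has_fact = "Paris" in response and "France" in response
    within_bounds = 0 < len(response) <= 200
    no_refusal = "I cannot" not in response
    return has_fact and within_bounds and no_refusal

# Sample the agent repeatedly, since any single call may vary.
results = [check_response(answer_question("What is the capital of France?"))
           for _ in range(20)]
print(all(results))  # True: every sampled response satisfies the properties
```

The exact string differs on every run, yet the test remains meaningful, because it encodes what "correct" means rather than what one past output looked like.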
The Versioning Dilemma
Let’s assume you manage to create a solid testing process for your Intelligent Agent and deploy version 1.0 into production.
Then along comes version 2.0, which has acquired new knowledge or behaviors based on updated training data. Suddenly, it starts producing different (though potentially better) outputs.
So, how do you validate this new version?
Are the changes improvements or regressions?
Are outputs now inconsistent with compliance expectations or ethical constraints?
How do you maintain trust while allowing the system to evolve?
These are questions we’ll all need to answer - and quickly - as Intelligent Agents continue to mature.
Where We Are Today
In our “Trusted AI is Governed AI” methodology, the first step in any evaluation is to look for established industry standards. (For more on this, see my earlier post: Trusted AI - Embracing Industry Standards.)
But when standards don’t yet exist, our best move is to seek out whatever guidance is available and apply it pragmatically to our current situation.
Noteworthy Resources
Full Disclosure: Intelligent Agent testing hasn’t been a primary focus of my work - yet. But here are a few resources I came across while preparing this post that I believe are worth exploring.
OWASP AI Testing Guide (AITG)
This newly launched initiative could become a foundational resource. The early drafts are available for public review and community input. It’s still evolving, but its release is a positive signal that formalized guidance is on the way.
Vendor-Specific Approaches
In the absence of universal standards, vendors have been filling the gap with their own testing frameworks. While vendor-aligned, many offer useful starting points. Look for documentation from major cloud providers and AI platform vendors.
Explainable AI (XAI)
Still in its early stages, Explainable AI aims to make machine-generated decisions understandable in natural language. Over time, XAI could be integral to how we test and validate agent behavior - particularly in regulated industries.
arXiv.org Research Papers
Hosted by Cornell University, arXiv is a treasure trove of leading-edge academic research on AI and Intelligent Agents. While much of it is theoretical or experimental, it’s an essential resource for staying ahead of what's next.
Final Thoughts
The path forward for Intelligent Agent testing and versioning is still evolving. If we want to trust these systems, we must learn how to test them and govern their evolution.
How can I help you “Keep Pace” with AI and Intelligent Agents?
With the rapid evolution of AI, a strong AI governance function is mandatory. I offer a range of consulting, governance, and architecture services designed to help organizations achieve their AI goals.
I also host a free weekly webinar every Sunday at 9:00 AM EST, where I break down the week’s most important AI news and articles—and discuss what they mean for you and your business.
The webinar streams LIVE on the following sites:
www.facebook.com/keeppace
www.twitter.com/keeppace
www.linkedin.com/in/keeppace
And the history (and playlists) of previous videos is available at:
www.youtube.com/keeppace