In the corporate AI arms race, the pressure is immense: adopt the latest, most powerful models or risk being left behind. Shareholders expect it, vendors promise breakthrough returns, and the fear of missing out is palpable.
But this race is being run on a dangerously flawed metric. We are rewarding vendors for a narrow, superficial definition of “performance” – raw accuracy and photorealistic outputs – while ignoring the true measure of enterprise-grade AI: resilience.
This incentive structure is creating a hidden crisis. To win deals based on dazzling demos, vendors are building ever-more complex ‘hybrid’ models that fuse multiple AI architectures. This complexity drives the impressive results, but it also creates an opaque and unpredictable system where risks don’t just add up; they multiply.
This isn’t a theoretical problem. It’s a real-world, $440 million problem.
On 1 August 2012, the financial firm Knight Capital lost $440 million in 45 minutes and was driven to the brink of bankruptcy. The cause was not one faulty algorithm. It was the unintended, destructive interaction between a new trading system and a piece of old, supposedly defunct code. The components, safe in isolation, created a devastating feedback loop when combined. As the US Securities and Exchange Commission (SEC) later confirmed, this was a failure of systems interacting in an unforeseen way, running “without any substantial human intervention”.
This is the blueprint for a future hybrid AI disaster. We are incentivising the creation of systems where one component could be silently failing, while another expertly masks the error, producing outputs that are plausible, confident, and catastrophically wrong.
The Governance Gap: When the Vendor’s “Best” is Your Biggest Risk ⚖️
Most leaders rightly protest that they have no control over a vendor’s proprietary architecture. This is true, but it’s also the heart of the problem. Vendor opacity does not absolve an organisation of its duty of care; it dramatically increases the required level of diligence.
The legal and regulatory landscape is already struggling to keep pace. The UK’s Law Commission, in a recent discussion paper, highlights the immense difficulty of assigning liability within a complex “AI supply chain”. The paper notes that the “opacity” of these systems poses a fundamental challenge to establishing accountability, making it “unclear which parties owe the victim a duty of care”.
When the legal framework itself is admitting uncertainty, relying on a vendor’s performance metrics is no longer a viable governance strategy. If you cannot inspect the engine, you must become ruthless in your oversight of the vehicle’s safety features and operational limits.
A New Playbook for AI Leadership 🎯
A chief data officer’s power lies not in second-guessing code, but in redefining the terms of engagement. That means shifting from chasing performance to demanding resilience.
Redefine Procurement Diligence: From ‘Capability’ to ‘Fragility’.
The conversation with vendors must be fundamentally re-engineered. Change your metrics to change their behaviour.
- Old Question: “What is your model’s accuracy?”
- New Question: “Describe your system’s three most likely failure modes, particularly those involving cascading errors between internal components. How do you test for them?”
- New Question: “What is your protocol for a root cause analysis when the model produces a nonsensical but plausible output? How do you prove it wasn’t an emergent failure from system complexity?”
Mandate Contractual Assurance and Auditability.
The legal ambiguity surrounding AI is a powerful lever. Use it to build assurance into the contractual framework.
- Demand Diagnostic Access: Insist on the right to access tools and logs that can trace a bad outcome to a specific input or component, even if the “why” remains opaque (see the sketch after this list for the kind of trace record to ask for).
- Introduce Independent Audits: Propose clauses requiring periodic, independent “adversarial audits,” where a third party actively tries to break the model. This is becoming a recognised practice, with frameworks like the NIST AI Risk Management Framework promoting continuous testing and independent reviews.
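To make “diagnostic access” concrete, here is a minimal sketch in Python of the kind of per-decision trace record a buyer might ask a vendor to expose. Every field name, the log path, and the format are illustrative assumptions rather than any real vendor API; the point is simply that a bad outcome can be traced back to a specific input and a specific set of component versions.

```python
# A minimal sketch of a traceability record a buyer might ask a vendor to expose.
# All field names and the log path are hypothetical, not a real vendor API.
import hashlib
import json
import uuid
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class DecisionTrace:
    """One auditable record per model output: enough to trace a bad outcome
    back to the exact input and component versions involved."""
    request_id: str
    timestamp: str
    component_versions: dict   # e.g. {"retriever": "1.4.2", "generator": "0.9.1"}
    input_sha256: str          # hash of the raw input, so it can be matched later
    output_summary: str        # truncated output, not the full payload
    confidence: float
    flags: list = field(default_factory=list)  # e.g. ["fallback_used"]


def record_decision(raw_input: str, output: str, confidence: float,
                    component_versions: dict,
                    log_path: str = "decision_trace.jsonl") -> DecisionTrace:
    """Append one trace record per decision to a JSON Lines audit log."""
    trace = DecisionTrace(
        request_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        component_versions=component_versions,
        input_sha256=hashlib.sha256(raw_input.encode()).hexdigest(),
        output_summary=output[:200],
        confidence=confidence,
    )
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(trace)) + "\n")
    return trace
```

Even a record this simple lets an incident review answer “which input, which components, which versions, when” without ever needing to open the black box itself.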
Build an Internal ‘Business Red Team’.
Your organisation’s best defence is its own people. A “Business Red Team” is not a technical function; it is a formalised process where staff from the business – customer service, finance, marketing – are tasked with being the “user from hell.”
- Their mandate is to probe the AI system with the messy, illogical, and adversarial behaviour of real-world humans. Their findings are not bug reports; they are a vital, qualitative dataset on the system’s true resilience.
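As a rough illustration of what that qualitative dataset could look like, the sketch below captures each probe alongside the expected and observed behaviour so resilience can be trended across releases. The field names, severity scale, and file path are all hypothetical assumptions, not a standard.

```python
# A minimal sketch of capturing Business Red Team findings as structured data.
# Field names, severity scale, and file path are hypothetical, not a standard.
import csv
from dataclasses import dataclass, asdict, fields


@dataclass
class RedTeamFinding:
    tester_role: str         # e.g. "customer service", "finance"
    probe: str               # the messy, real-world input used
    expected_behaviour: str  # what a resilient system should have done
    observed_behaviour: str  # what it actually did
    severity: int            # 1 (cosmetic) to 5 (plausible but badly wrong)


def save_findings(findings: list[RedTeamFinding],
                  path: str = "red_team_findings.csv") -> None:
    """Write findings to CSV so they can be compared across releases."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(RedTeamFinding)])
        writer.writeheader()
        writer.writerows(asdict(finding) for finding in findings)


# Example: one finding from a finance user probing a hypothetical expense assistant.
save_findings([RedTeamFinding(
    tester_role="finance",
    probe="Half-corrupted spreadsheet paste plus a question in broken shorthand",
    expected_behaviour="Ask for clarification or decline to answer",
    observed_behaviour="Returned a confident, plausible, but fabricated total",
    severity=5,
)])
```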
The age of accepting AI systems as impenetrable black boxes is over. The true role of a data leader is to be the chief assurance officer for these powerful, complex, and fragile new capabilities. We do not need to build the engine, but we are absolutely accountable for the crash test results.
The AI arms race is moving faster than our governance frameworks.
☛ How are you and your teams redefining ‘performance’ to prioritise resilience over raw power? I’m keen to hear your perspectives in the comments. ☚
About me: I help organisations turn complex data into clear decisions and commercial outcomes. My focus is on enabling better decision-making and unlocking new value through data-driven innovation – especially where the stakes are high and the problems are difficult and poorly defined.
Follow me on LinkedIn for more insights.

References
Yazdani, S., Singh, A., Saxena, N., et al. (2025). Generative AI in depth: A survey of recent advances, model variants, and real-world applications. Journal of Big Data, 12(230).
U.S. Securities and Exchange Commission. (2013, October 16). Order Instituting Administrative and Cease-and-Desist Proceedings… against Knight Capital Americas LLC. (Release No. 70694). Available at: https://www.sec.gov/files/litigation/admin/2013/34-70694.pdf
Law Commission. (2025, August). AI and the Law: A Discussion Paper. Available at: https://lawcom.gov.uk/publication/artificial-intelligence-and-the-law-a-discussion-paper/
National Institute of Standards and Technology. (2023). AI Risk Management Framework (AI RMF 1.0). U.S. Department of Commerce. Available at: https://www.nist.gov/itl/ai-risk-management-framework