Why the Best AI Plans Fail (And How Red Teaming Saves Them)

12 min read · By Jeroen Janssen

Last updated: April 2, 2026

A mid-sized Dutch professional services firm (regulated, profitable, well governed) hired one of the world's most respected consulting firms to design its AI transformation. The engagement lasted eight months. The result was impeccable: 180 slides, a phased roadmap, governance recommendations, and a business case projecting €14 million in annual efficiency gains by year three.

The board approved the programme unanimously.

Eighteen months later, the programme was quietly absorbed into “ongoing operations.” No evaluation was published. The efficiency gains were never measured, because the measurement framework was never built. The vendor dependencies that the roadmap introduced were never stress-tested. A critical regulatory assumption, that the AI Act would not apply to their use cases, turned out to be wrong. By the time the compliance team raised the alarm, the architecture was already in production and the first AI Act enforcement deadlines had passed.

Nobody was negligent. Nobody was incompetent. But everyone was committed to success, not to the truth.

This is not an unusual story. It is the standard.

The problem you cannot see from the inside

Every organisation tests its products. Its code. Its financial controls. Pharmaceutical companies run clinical trials. Engineers stress-test bridges. Software teams run thousands of automated tests before rolling out a single feature. The principle is the same everywhere: if it matters, you verify it under conditions designed to make it fail.

Virtually no organisation applies this principle to the one thing that determines whether everything else matters: the strategy itself.

You wouldn't ship software without testing it. You wouldn't approve a drug without a trial. Why would you approve a €5 million strategic programme based on nothing more than the confidence of the people who designed it?

The reason is structural, not personal. The architects of a strategy are psychologically unfit to dismantle it. The consultants who developed it are paid to deliver a recommendation, not to tear it apart. The executives who approved it have tied their credibility to the decision. The programme managers leading the implementation need the strategy to be right, because their careers depend on it.

This is not cynicism. It is how organisations function. Strategies build momentum precisely because they are not challenged. Every approval layer adds commitment. Every month of execution adds cost. And at every step, the price of discovering that a fundamental assumption was wrong increases.

By the time reality delivers the test, the cost of correction is enormous. And the people who could have seen it coming are the same people who were never asked to look.

What AI changes, and what it doesn't

Artificial intelligence is the most significant technology investment most organisations will make this decade. The benefits are real: faster processing, pattern recognition at scale, automation of tasks that used to take thousands of hours. Organisations that deploy AI well gain a genuine competitive advantage. Those that ignore it risk irrelevance.

But AI also introduces a category of risk that most governance frameworks were not designed for. Models drift. Dependencies concentrate. Decisions that previously required human judgement are now made by systems whose reasoning is not always explainable. Regulatory pressure is no longer theoretical: the EU AI Act is in force, its first enforcement deadlines have passed, and DORA and NIS2 impose operational resilience requirements that brook no further delay. AI capability is developing so fast that the strategy you approved six months ago may already rest on assumptions that both the technology and the legislator have outgrown.

This is not a reason to slow down. It is a reason to test harder.

The organisations that come through this well are not the ones with the best AI. They are the ones that know, with evidence, where their AI strategy holds up and where it does not: where the governance is real and where it exists only on paper, where the business case rests on measurement and where it rests on hope.

That knowledge does not come from the strategy process itself. It requires a separate discipline. An adversarial discipline.

What adversarial testing actually involves

Red teaming has its origins in military intelligence. The principle is disarmingly simple: if you want to know where your plan breaks, assign someone whose explicit task is to break it. Not to be difficult. Not to be contrarian. But to find the vulnerability before the enemy does.

In cybersecurity, this is standard practice. In financial stress testing, regulators require it. In AI safety, it has become a core discipline at the frontier labs. But at the level where the biggest bets are placed (strategic decision-making), adversarial testing barely exists.

The boardroom, where assumptions most often go unchallenged, is precisely the place where this discipline is most needed.

Strategic red teaming, as I have developed it, applies this logic to organisational strategy. It subjects every strategic claim to four distinct adversarial perspectives: strategic coherence, regulatory readiness, economic viability, and technical resilience. Each perspective has its own standard of evidence. Each produces testable hypotheses. And each operates under a simple rule: no conclusion without documented evidence, no green score without independent verification.

The Governance Envelope

The result is not an opinion but a forensic map. I call the boundary between what we know for certain and what we merely hope the Governance Envelope. The term comes from aviation, where the flight envelope is the boundary within which an aircraft can operate safely. Push beyond it, and you lose control.

Inside the boundary → Decisions supported by audit-ready evidence. Defensible under oversight, under board-level pressure, under regulatory scrutiny.

Outside the boundary → Exposure to blind spots, reputational risk, and enforcement actions. And now you know exactly where.

The goal is not to prove that the strategy is wrong. The goal is to discover which parts of it actually hold up, and which parts everyone simply assumed were correct.
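The rule stated earlier (no conclusion without documented evidence, no green score without independent verification) can be sketched as a small classification step that places each claim inside or outside the Governance Envelope. This is a minimal illustration under stated assumptions, not the ARES implementation: the `Claim` fields, the `score` and `governance_envelope` functions, and the intermediate "amber" rating are hypothetical names chosen for this example. Only the four perspectives and the two conditions come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Perspective(Enum):
    # The four adversarial perspectives named in the text.
    STRATEGIC_COHERENCE = "strategic coherence"
    REGULATORY_READINESS = "regulatory readiness"
    ECONOMIC_VIABILITY = "economic viability"
    TECHNICAL_RESILIENCE = "technical resilience"


@dataclass
class Claim:
    # Hypothetical data model for a single strategic claim under review.
    text: str
    perspective: Perspective
    evidence: list[str] = field(default_factory=list)  # references to documented evidence
    independently_verified: bool = False  # checked by a reviewer with no stake in the outcome


def score(claim: Claim) -> str:
    """Apply the two rules: no conclusion without evidence, no green without verification."""
    if not claim.evidence:
        return "no conclusion"  # nothing documented, so nothing can be concluded
    if not claim.independently_verified:
        return "amber"  # assumed intermediate rating: evidence exists but is unverified
    return "green"


def governance_envelope(claims: list[Claim]) -> dict[str, list[str]]:
    """Split claims into those inside the envelope (green) and those outside it."""
    envelope: dict[str, list[str]] = {"inside": [], "outside": []}
    for claim in claims:
        side = "inside" if score(claim) == "green" else "outside"
        envelope[side].append(claim.text)
    return envelope


# Illustrative claims, loosely based on the opening story.
claims = [
    Claim("The AI Act does not apply to our use cases", Perspective.REGULATORY_READINESS),
    Claim("€14 million in annual efficiency gains by year three",
          Perspective.ECONOMIC_VIABILITY, ["business case slides"]),
    Claim("Critical vendor dependency has a tested fallback",
          Perspective.TECHNICAL_RESILIENCE, ["failover test report"],
          independently_verified=True),
]
print(governance_envelope(claims))
```

With these illustrative claims, only the independently verified one lands inside the envelope. The unverified regulatory assumption and the business case backed only by slides both stay outside it, which is exactly where the firm in the opening story was exposed.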

Why this requires independence

A legitimate question: why can't the consultants who designed the strategy also test it? Or the internal team that implemented it?

For the same reason an auditor cannot audit their own work. The value of adversarial testing rests entirely on independence. The examiner must have no stake in the outcome. No implementation revenue to protect. No vendor relationship colouring the analysis. No organisational politics to navigate.

This is also why adversarial review works best alongside traditional advisory work, not instead of it. You have already made the investment. You have hired serious professionals to design a serious strategy. The question is not whether they did good work. The question is whether that work holds up when confronted with conditions it was not designed for.

A €30,000 adversarial review that exposes a single flawed assumption in an €800,000 strategy does not replace the strategy. It protects the investment.

What this looks like in practice

I built Apparens to make this discipline available to organisations that need it. Not as a product you buy and install, but as an applied practice: rigorous, confidential, and deliberately bounded.

The methodology uses agentic AI, a system I call ARES, to generate thousands of scenarios and hundreds of testable hypotheses from a single strategic position. But the AI does the heavy lifting, not the thinking. Every finding is manually verified. Every conclusion is tested against documented evidence. The analysis is more than 80% human work. The AI accelerates what would otherwise take months. The judgement remains mine.

Engagements are scoped, time-bound, and unilaterally terminable. There is no retainer and no dependency. Client data is processed within EU jurisdiction, never used for model training, and not retained after delivery. The purpose is to inform board-level decisions, not to create regulatory exposure.

I work from the Netherlands, which matters more than it might seem. The European regulatory reality is not an abstraction for me. It is my professional context, both at Apparens and in my work as a strategic advisor on AI governance at one of the country's largest public IT organisations. The enforcement of the AI Act, the operational requirements of DORA, the reporting obligations of NIS2: that is my daily reality. What I learn in one environment sharpens what I deliver in the other.

Apparens works with organisations in regulated sectors, the public sector, and mid-sized companies navigating AI-driven transformation. The kind of organisations where a strategic failure is not a write-off, but a crisis. Where the board doesn't just want a plan, but evidence that the plan holds up.

The question beneath the question

Most organisations that approach me do not walk in saying “we need adversarial testing.” They come with something more practical. Something like: “we've spent a lot on this strategy and something doesn't feel right, but we can't articulate what.” Or: “the board keeps asking questions we don't have answers to.” Or, most honestly: “we need to know what we don't know.”

That instinct is usually right. And acting on it while the strategy is still in motion (before the regulator asks, before the audit lands, before reality delivers the test you didn't design) is the most valuable thing a decision-maker can do.

Not because the strategy is wrong. But because you deserve to know whether it is right.

Strategy only works when it survives contact with reality. Most strategies never get that test. The assumptions harden. The momentum builds. And by the time the gap between conviction and evidence becomes visible, the cost of correction is enormous.

The discipline of finding that gap early is not glamorous. It is not comfortable. Nobody wants their assumptions examined.

Until they need them to hold up.

What's next?

If you sense that your AI roadmap rests on “accepted conclusions” rather than tested assumptions, there are three ways to apply the Apparens discipline:

1. Direct Review

A time-bound adversarial check on your current AI strategy. Ten working days. One conversation to start.

2. The Strategic Stress Test

Download the diagnostic PDF (60+ dimensions, four domains) to find the first cracks yourself.

3. ARES Insights

View a Diagnostic Snapshot of what AI-driven Red Teaming looks like in practice.

Read the complete guide: What is Strategic Red Teaming? →