A Supervisory-Evidence Ontology for Agentic AI under EU Law

Abstract

Agentic AI has outpaced the ontologies intended to govern it. Commercial and academic ontologies released between 2024 and 2026 cluster around a shared enterprise core of Agent, Skill, Policy, Memory, and Outcome, but none was designed to produce evidence that a European supervisor can ingest. Current supervisory practice relies on ad-hoc documentation produced per controller and per request. This paper proposes a shared representational layer for agentic AI accountability evidence under EU law, structured in three components. The first is a candidate Minimum Conceptual Set of 23 conceptual slots, separated into an agent-behaviour core (twelve slots) and a supervisory-evidence layer (eleven slots). Under strict reuse-zero accounting these 23 slots correspond to 16 net-new classes plus 7 reuse slots; the Turtle vocabulary contains 69 owl:Class declarations once subtypes, support classes, and named categories are counted. Each slot is mapped to evidence needs arising under the GDPR, the AI Act, or NIS2, or is motivated by a structured reading of a twenty-five-case sample of EU ADM enforcement. The second is a temporal extension expressed in OWL-Time and made structurally checkable through SHACL shapes for delegation validity, revocation propagation records, policy versioning, and evidence decay. The third is an integration layer that reuses GDPRov, DPV, and PROV-O through owl:imports rather than reinventing their concepts. A Standardised Supervisory Ingestion Interface is identified as a research direction in §8 rather than claimed as a delivered contribution. The paper does not claim reference-architecture status. It claims that the synthesis and its design choices are intended to be defensible, reproducible, and testably better than ad-hoc practice; the empirical validation tracks remain pending. Validation is pre-registered through three open tracks; Track 1 (inter-rater consistency on the case sample) is committed for a follow-up revision.
A companion v1.2 SHACL release ships Profile A (AP-inspired permissive, quantitative) and Profile B (CNIL/German-guidance-inspired stricter, qualitative) alongside this paper, with a size-based SME proportionality profile as a separate axis.

Keywords: agent ontology, EU AI Act, GDPR Article 22, supervisory evidence, SHACL, PROV-O, temporal governance, agentic AI, Minimum Conceptual Set

1 Introduction

A European data protection authority receives a notification that an agentic AI system at a regulated controller has produced an outcome that may engage GDPR Article 22, AI Act Annex III, or NIS2 Article 23. The supervisor opens an inquiry and asks the five standard questions. Who took this action. On what authority. Drawing on what data. Under what policy version. With what human intervention.

Today the answer arrives in one of three forms. A vendor compliance dashboard, queryable only through that vendor’s UI, comparable to no other controller’s dashboard. A long narrative attestation, expensive, retrospective, and not machine-readable. Or nothing structured at all: a Slack reconstruction, an email trail, a manually compiled audit memo. None of these scale. None are queryable. None support cross-controller comparison, which is what European-level supervision under the EDPB, the European AI Office, and the GDPR Article 60 cooperation procedure requires to operate at all.

The problem is not a lack of regulation. The GDPR, the AI Act, NIS2, DORA, the CRA, and the revised Product Liability Directive supply operative provisions in dense quantity. Nor is it a lack of technical ontology work: W3C PROV-O, OWL-Time, SHACL, DPV, and GDPRov have been available and stable for years. The gap sits between the two: no agent ontology released between 2024 and 2026 encodes the concepts a European supervisor needs, and no supervisory instrument specifies the ontological form in which it expects evidence to arrive.

This paper proposes a shared representational layer for agentic AI accountability evidence under EU law. The design has three layers. The first is a core Minimum Conceptual Set (MCS) of twenty-three conceptual slots, separated into an agent-behaviour core (twelve slots) and a supervisory-evidence layer (eleven slots), each mapped to evidence needs arising under a specific provision of the GDPR, the AI Act, or NIS2, or motivated by a structured reading of a sampled enforcement corpus (§4). The second is a temporal and validation layer expressed in OWL-Time with SHACL shapes for delegation, revocation, policy versioning, and evidence decay (§6). The third is an integration layer that reuses GDPRov, DPV, and PROV-O via owl:imports and invokes sectoral regimes (MDR, MiFID II, CRR/CRD, IDD, Awb) without absorbing them. A Standardised Supervisory Ingestion Interface (SSII), discussed as a research direction in §8, is not claimed as a delivered contribution at this release state.

The research questions are two. RQ1 is structural: what is the minimum conceptual set necessary to produce supervisor-ingestible accountability evidence for agentic AI under EU law. RQ2 is operational: how does the MCS map to specific statutory obligations under the GDPR, the AI Act, and NIS2, and what SHACL structures would allow supervisors to validate that evidence is structurally complete before it leaves the controller’s perimeter.

The central claim is narrower than it may appear. The MCS is not the only possible synthesis. Competing syntheses are possible and will likely follow. The contribution is a candidate defensible minimum, grounded in existing open ontologies where they suffice, extended where the legal-semantic gap requires new work, targeted at supervisory-grade accountability for agentic AI operating under EU law. The test of originality is not uniqueness. It is whether the synthesis is defensible, reproducible, and testably better than ad-hoc practice for the teams that would use it. Validation is treated as the second half of the contribution, but the release-state evidence remains a validation design rather than completed validation.

Positioning. This is a working paper, not a reference architecture. The MCS has a single-author empirical base pending independent inter-rater validation, an ingestion interface that remains a research direction, and specification governance within a single commercial consultancy rather than a standards body. A reader who treats this paper as reference architecture is applying a standard the paper does not claim.

The paper has an author disclosure. The MCS was developed by the author through Apparens, a consultancy providing AI governance services. The author has a professional interest in the adoption of specifications of this class. §7.8 analyses the distributive consequences of that interest directly rather than deflecting the point.

2 Methods

2.1 Ontology selection

Search executed January 2024 to April 2026 across the W3C Standards directory, the ISO/IEC JTC 1/SC 42 published catalogue, the CEN-CENELEC JTC 21 work programme, the Gartner Magic Quadrant for Metadata Management Solutions (2024 and 2025), arXiv categories cs.AI, cs.MA, cs.SE, and vendor documentation pages. Inclusion criteria: active version published or updated after 1 January 2024; public schema available as downloadable OWL, RDF, SHACL, or TTL, or an academic description in a peer-reviewed venue, an arXiv preprint, or a standards-body deliverable; either at least three independent citations or development under standards-body governance.

Fourteen ontologies met the criteria and appear in Table 4.1: Aviso, Atlan, Microsoft Fabric IQ, Skan Agentic Ontology of Work, Agent Spec, ACP Framework, OntoBoom, Foundation AgenticOS, PROV-O, OWL-Time, GDPRov, DPV, ISO/IEC 42001 plus 23894 plus 5338 (grouped), and CEN-CENELEC JTC 21 drafts (prEN 18286, prEN 18228, prEN 18284).

2.2 Coverage matrix coding

Two matrices operate at different resolutions. The survey matrix in §4.1 rates 14 ontologies × 18 concepts = 252 cells with three values: present (formal class or property in a published schema), partial (covered conceptually in prose without formal construct), absent (no construct or concept matching the row label). The deep-coding matrix referenced in Appendix F.3 rates 25 MCS class slots × 5 ontologies (PROV-O, GDPRov, DPV, Agent Spec, ACP) under a stricter rubric that additionally tests owl:imports-reusability. Coding performed 23 April 2026 in round 1 by the author for both matrices. Round 2 self-consistency audit scheduled 7 May 2026 under the rubric in Appendix F.2 / F.3. The audit is intra-rater rather than inter-rater. Track 1 of the validation pack commissions an independent second coder; until Track 1 reports, both matrices are read as one author’s structured reading rather than as inter-subjectively validated.

2.3 Case-law selection

Two selection criteria operate in parallel. Criterion A: CJEU judgments interpreting GDPR Article 22 directly, decided 1 January 2023 to 31 December 2025, and cited in published DPA guidance available by April 2026. Criterion B: enforcement decisions (CJEU, national courts, DPAs) decided 25 May 2018 to April 2026 in which Article 22 applicability was the contested or determined issue, or the case was cited by at least one DPA as an Article 22 reference point. Two cases meet Criterion A (C-634/21 SCHUFA, 7 December 2023; C-203/22 Dun and Bradstreet Austria, 27 February 2025). Twenty-five cases meet Criterion B and form the empirical validation sample (Appendix B).

2.4 Supervisor selection

Dutch market supervisory bodies named in the AP-RDI Final advice on the supervisory structure for the AI Act (7 November 2024), plus bodies that have issued explicit AI-related guidance with publication date on or before April 2026. Nine distinct entities, ten supervisory roles: AFM, DNB, IGJ, AP, RDI, Nederlandse Arbeidsinspectie/SZW, NCSC, ILT, ACM consumer-and-markets, ACM energy. References to “ten supervisors” should be read as “ten supervisory roles”. The list should be reassessed as Article 70 designations crystallise via the Uitvoeringswet AI-verordening (expected Q4 2026).

2.5 MCS derivation principle

Stated principle: the MCS is the minimum set of classes such that every operative provision in GDPR Articles 4, 22, and 30, and AI Act Articles 12, 13, 14, 26, 72, and 73, has at least one ontological correlate.

Counting convention. The MCS organises 23 conceptual slots. Under strict reuse-zero accounting, 5 DPV reuses (slots 6 to 10) and 2 PROV-O reuses (slots 3 and 19) each contribute 0 net-new classes. The slot count of 23 therefore reduces to 16 net-new MCS classes. NonQualifyingADM is the 16th net-new class, anchored in the empirical validation in Appendix B. The Turtle vocabulary contains 69 owl:Class declarations once subtypes, support classes, and named categories are counted; this raw figure is the file count, not the conceptual count. The 23 slot count is the working figure for MCS scope; the 16 net-new-class figure is the working figure for ontology size; the 69 owl:Class figure is the file count. Conflating slot count with class count is the inconsistency previously noted in earlier drafts.
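The counting convention is plain set arithmetic, and a short Python sketch makes the three figures easier to keep apart. The slot numbers and reuse assignments are taken from the paragraph above; the layer boundary (slots 1 to 12 core, slots 13 to 23 supervisory) follows Tables 4.3a and 4.3b; all variable names are illustrative and not part of the MCS release.

```python
# Slots that reuse an existing vocabulary contribute 0 net-new classes.
DPV_REUSE = set(range(6, 11))   # slots 6 to 10
PROV_REUSE = {3, 19}            # slots 3 and 19
ALL_SLOTS = set(range(1, 24))   # the 23 conceptual slots

# Layer boundary as laid out in Tables 4.3a and 4.3b.
CORE = set(range(1, 13))
SUPERVISORY = ALL_SLOTS - CORE

reused = DPV_REUSE | PROV_REUSE
net_new = ALL_SLOTS - reused


def net_new_in(layer):
    """Number of net-new classes contributed by the slots of one layer."""
    return len(layer & net_new)


summary = {
    "slots": len(ALL_SLOTS),
    "reuse_slots": len(reused),
    "net_new_classes": len(net_new),
    "core_net_new": net_new_in(CORE),
    "supervisory_net_new": net_new_in(SUPERVISORY),
}
print(summary)
```

The 69 owl:Class figure is deliberately absent from the sketch: it is a property of the Turtle file, not something derivable from the slot list.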

Architectural layering. The 23 slots are organised into two layers, separated architecturally in §4.3. The agent-behaviour core (mcs-core) contains twelve slots covering agents and legal roles, action and memory primitives, the data-protection context reused from DPV and GDPRov, and decision and intervention semantics. The supervisory-evidence layer (mcs-supervisory) contains eleven slots covering policy versioning, risk classification, intended purpose, sectoral regime pointers, supervisory scope, delegation and revocation structure, provenance chains, typed log events, monitoring observations, drift indicators, evidence artefacts, serious incidents, and the NonQualifyingADM residual.

2.6 Regulation verification

Every statutory provision cited was retrieved from EUR-Lex in its authentic-language form. GDPR (Regulation 2016/679) in English and Dutch. AI Act (Regulation 2024/1689) in English and Dutch. NIS2 Directive (2022/2555). Product Liability Directive (Directive 2024/2853). Each direct quotation, paraphrase, and article reference was matched against EUR-Lex line by line. Cross-check finding: NIS2 Article 23(3) limbs are alternative (disjunction), not conjunctive; severity qualifier is part of the test; the capable-of-causing branch applies alongside the caused branch; entity classification is separate from the significance threshold.

2.7 Case-classification protocol

The 25-case empirical sample in Appendix B is classified using a five-step protocol grounding Finding 2 (NonQualifyingADM) and Finding 3 (Solely/Assisted boundary instability). The full rubric appears in Appendix F.6.

1. Operative-provision test. If the operative ground is GDPR Arts 5, 6, 9, 25, or 35, or Art. 8 ECHR rather than Art. 22: the case is Out-of-Art.22-Scope and populates NonQualifyingADM.
2. AI involvement test. If no ADM/profiling: NonAutomated, exits the protocol.
3. WP251rev.01 meaningful-intervention test. All six properties met → Assisted; any failure → Solely.
4. Borderline cases. Art. 22 reasoning present but operative ground is Arts 5, 6, 9, 25, or 35: primary coding Out-of-Art.22-Scope; permissive alternative recorded.
5. Profile B cross-check. Under the stricter profile, borderline Assisted may reclassify as Solely; reclassification recorded.
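The five steps can be read as a small decision procedure. The Python sketch below is one possible reading, not the Appendix F.6 rubric: the function and field names are hypothetical, the six meaningful-intervention properties are passed as plain booleans without naming them, and the "permissive alternative" is assumed to mean the label the case would receive if it were treated as an Art. 22 case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Operative grounds that route a case out of Art. 22 scope (step 1).
NON_ART22_GROUNDS = frozenset({
    "GDPR Art. 5", "GDPR Art. 6", "GDPR Art. 9",
    "GDPR Art. 25", "GDPR Art. 35", "ECHR Art. 8",
})


@dataclass(frozen=True)
class Coding:
    primary: str
    permissive_alternative: Optional[str] = None  # recorded in step 4
    profile_b: Optional[str] = None               # recorded in step 5


def intervention_label(properties_met: Sequence[bool]) -> str:
    """Step 3: all six properties met gives Assisted, any failure gives Solely."""
    if len(properties_met) != 6:
        raise ValueError("expected six meaningful-intervention properties")
    return "Assisted" if all(properties_met) else "Solely"


def classify(operative_ground: str,
             adm_involved: bool,
             art22_reasoning_present: bool,
             properties_met: Sequence[bool],
             properties_met_profile_b: Optional[Sequence[bool]] = None) -> Coding:
    # Step 1: operative-provision test (applied first, as in the protocol).
    if operative_ground in NON_ART22_GROUNDS:
        alternative = None
        # Step 4: borderline case, keep the permissive alternative on record.
        if art22_reasoning_present and adm_involved:
            alternative = intervention_label(properties_met)
        return Coding("Out-of-Art.22-Scope", permissive_alternative=alternative)
    # Step 2: AI involvement test.
    if not adm_involved:
        return Coding("NonAutomated")
    # Step 3: meaningful-intervention test.
    primary = intervention_label(properties_met)
    # Step 5: Profile B cross-check, only relevant for Assisted codings.
    profile_b = None
    if primary == "Assisted" and properties_met_profile_b is not None:
        profile_b = intervention_label(properties_met_profile_b)
    return Coding(primary, profile_b=profile_b)
```

For example, `classify("GDPR Art. 6", True, True, [True] * 6)` yields a primary coding of Out-of-Art.22-Scope with Assisted kept as the permissive alternative.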

Operationalised thresholds: 60-second default for interventionDuration (defensible operational presumption, not statutory); raised to 5 minutes for complex clinical triage; lowered to 15 seconds for high-volume fraud-detection alerts conditional on case-specific review evidence. Sector calibrations are declared per case. Round-1 coding 23 April 2026 by the author. Track 1 commissions an independent second coder with target Cohen’s κ ≥ 0.70; failure at κ < 0.60 triggers rubric revision rather than cell-level reconciliation.
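A minimal Python sketch of the two numerical rules in this paragraph follows. The threshold values and the κ cut-offs are those stated above; the sector keys, the function names, the fallback to the 60-second default when review evidence is missing, and the label for the band between 0.60 and 0.70 are assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence

# interventionDuration presumptions in seconds (operational, not statutory).
DURATION_THRESHOLD_S = {
    "default": 60,
    "complex_clinical_triage": 300,
    "high_volume_fraud_alert": 15,
}


def duration_threshold(sector: str = "default", review_evidence: bool = False) -> int:
    """Applicable threshold; the lowered fraud-alert value is conditional
    on case-specific review evidence (assumed fallback: the default)."""
    if sector == "high_volume_fraud_alert" and not review_evidence:
        return DURATION_THRESHOLD_S["default"]
    return DURATION_THRESHOLD_S.get(sector, DURATION_THRESHOLD_S["default"])


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders labelling the same cases."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty case list")
    n = len(coder_a)
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


def track1_outcome(kappa: float) -> str:
    """Map a kappa value onto the Track 1 decision rule."""
    if kappa >= 0.70:
        return "target met"
    if kappa < 0.60:
        return "rubric revision"
    return "below target, above revision floor"
```

With four cases coded `["S", "S", "A", "A"]` and `["S", "S", "A", "S"]`, observed agreement is 0.75 and chance agreement 0.5, so κ is 0.5 and the rule calls for rubric revision.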

3 Related work

3.1 Surveyed ontologies (operational context)

Vendor ontologies model enterprise workflow fluently but ignore data-protection primitives. Atlan leads the 2025 Gartner Magic Quadrant for Metadata Management and offers a context graph for agentic AI with decision-trace reification, but does not import DPV and has no public reference schema. Aviso’s Ontology Layer has paying enterprise customers but no public schema. Microsoft Fabric IQ has one named pilot (ENMAX Power) and is in public preview. OntoBoom offers SHACL/OWL authoring but not regulatory semantics. None ships a public reference schema at the time of writing.

Academic and framework-level ontologies have different gaps. Agent Spec (Open Agent Specification) emphasises portability through symbolic references; the portability creates tension with supervisory requirements (audit trails fragment across runtimes). The ACP Framework offers the strongest delegation semantics in the surveyed set, with verifiable chained delegation, cryptographic signatures, mandatory expiry, and transitive revocation. ACP does not encode GDPR legal primitives.

Open normative scaffolds remain PROV-O (W3C Recommendation, 30 April 2013) and OWL-Time (W3C Recommendation, 2017; Candidate Recommendation update 2022). Two GDPR-specific extensions bridge PROV-O to data-protection semantics: GDPRov (Pandit and Lewis, SEMANTiCS 2017); GDPRtEXT for GDPR text structure. The Data Privacy Vocabulary (DPV) is a Final Community Group Report (v2.3, 25 February 2026), the most mature open vocabulary for data-protection concepts. Neither DPV nor GDPRov carries formal regulatory authority; integration into the MCS rests on technical maturity and community adoption, not on regulatory status.

ISO/IEC JTC 1/SC 42 has produced prose standards (22989, 23053, 23894, 5338, 42001, 42005, 42006, 38507) but no machine-readable ontology. CEN-CENELEC JTC 21 is drafting harmonised standards under Standardisation Request M/593 plus M/613. prEN 18286 reached public Enquiry stage 30 October 2025 to 22 January 2026. prEN 18228 and prEN 18284 are identifiable by scope but technical text is confidential.

3.2 Computational law and legal informatics

The intellectual ancestors of the MCS are eight works across forty years of computational law: Sergot et al. (1986) on the British Nationality Act as a logic program; Sartor (2005) on defeasible and deontic reasoning (the dual SHACL profiles in §5.5 are a concrete application); Bench-Capon and Sartor (2003) on case-based reasoning; Pagallo (2013) on legal personhood (the MCS does not give agents separate standing); Verheij (2003, 2017) on argumentative disagreement; Hashmi et al. (2018) on compliance representation in BPM; Atkinson et al. (2020) on explanation taxonomy; Pandit et al. (2019) on DPV. The MCS is not novel in believing that statutes admit ontological representation. What it adds is a specific operational target: a 23-slot, SHACL-validatable, supervisor-ingestible evidentiary fabric for agentic AI under EU law, anchored in named statutory provisions and motivated by a structured reading of named enforcement cases.

3.3 Position against closest contemporary competitors

Cobbe, Singh and Sheridan (2025), “Governance and accountability frameworks for AI agents”. Closest competitor on supervisor-ingestibility. Treats the same problem at a higher level of abstraction without committing to specific ontological structures or regulatory anchors. The MCS commits to named GDPR, AI Act, and NIS2 provisions and to SHACL-validatable evidence; Cobbe et al. stop at principles. The MCS presupposes their principles and operationalises them against EU statutory text.

Agent Spec. Closest competitor on cross-runtime portability. Achieves portability through symbolic references; the cost is that audit trails fragment, defeating longitudinal supervisory reconstruction. The two projects solve adjacent but orthogonal problems.

ACP Framework. Closest competitor on delegation semantics. Stronger than the MCS’s DelegationGrant class in cryptographic guarantee, but cannot represent the three-way Decision taxonomy or the Article 22 drawsStronglyOn predicate. An implementation that uses ACP for delegation and the MCS for legal-semantic ingestion is architecturally available; the MCS provides the owl:imports surface.

4 RQ1 answer: a Minimum Conceptual Set for supervisory evidence

4.1 Cross-ontology coverage

Table 4.1 maps fourteen surveyed ontologies against eighteen concepts a European supervisor plausibly needs (252 cells, presented as four sub-tables for portrait readability). Legend: ● present (formal class or property in a published schema); P partial (covered conceptually or in standards-body prose); ○ absent. ISO/IEC and JTC 21 drafts are prose standards; under the strict legend their cells are P or ○. Column abbreviations: Av Aviso; At Atlan; FIQ Microsoft Fabric IQ; Skan Skan AOW; AS Agent Spec; ACP ACP Framework; OB OntoBoom; FAOS Foundation AgenticOS; PROV PROV-O; OWL-T OWL-Time; GDPRov GDPRov; DPV DPV; ISO ISO/IEC 42001+23894+5338; JTC21 CEN-CENELEC JTC 21 drafts.

Concept	Av	At	FIQ	Skan	AS	ACP	OB	FAOS	PROV	OWL-T	GDPRov	DPV	ISO	JTC21
Agent	P	P	P	●	●	●	P	●	●	○	●	●	P	P
Action / tool invocation	P	P	P	●	●	P	P	●	●	○	●	P	P	P
Memory read	P	●	P	●	P	○	○	P	P	○	○	○	○	○
Memory write	P	●	P	●	P	○	○	P	P	○	○	○	○	○
Delegation	○	○	○	P	P	●	○	P	●	○	P	P	P	P
Revocation / invalidation	○	P	○	○	○	●	○	○	●	○	●	●	P	P
Policy / constraint	P	●	●	●	●	P	●	●	○	○	P	●	P	P
Human oversight	P	P	P	●	P	○	○	P	○	○	○	P	P	P
Temporal interval	○	P	P	○	○	●	P	○	P	●	P	P	P	P
Authority scope	○	●	P	P	P	●	○	P	P	○	○	P	P	P
Provenance	○	●	P	P	●	P	○	P	●	○	●	P	P	P
Versioning	○	P	P	P	●	○	P	○	P	○	P	P	P	P
Risk classification	○	○	○	P	○	○	○	P	○	○	○	P	P	P
Data subject reference	○	○	○	○	○	○	○	○	○	○	●	●	○	P
Purpose of processing	○	P	○	P	P	○	○	P	○	○	●	●	P	P
Lawful basis	○	○	○	○	○	○	○	○	○	○	●	●	○	P
Personal-data cat. (Art. 9)	○	○	○	○	○	○	○	○	○	○	P	●	○	P
Consent record (lifecycle)	○	○	○	○	○	○	○	○	○	○	●	●	○	P

Table 4.1: Cross-ontology coverage matrix (14 ontologies × 18 concepts, 252 cells). Round-1 coding 23 April 2026 by a single coder; rubric in Appendix F.2.

Five findings follow. First, ACP Framework provides the strongest delegation semantics. Second, Agent Spec advances portability at the cost of supervisory depth. Third, PROV-O remains the normative spine but is necessary, not sufficient. Fourth, the five GDPR-supervisory-critical concepts (Data Subject, Purpose, Lawful Basis, Personal-Data Categories, Consent Record) are all covered at least partially by GDPRov and DPV, while four of them are absent from every agent ontology surveyed and Purpose reaches only partial coverage there (Atlan, Skan, Agent Spec, Foundation AgenticOS); integration, not invention, closes this gap. Fifth, risk classification is nowhere formally modelled and reaches only partial coverage (Skan, Foundation AgenticOS, DPV, ISO/IEC, JTC 21), while intended purpose, sectoral regime, supervisory scope, typed log events, monitoring observations, drift indicators, evidence artefacts, and serious incident are absent everywhere except partially in ISO/IEC and JTC 21 drafts in prose, which are not machine-readable.
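The GDPR-concept finding can be re-derived mechanically from the last five rows of Table 4.1. The Python sketch below copies those rows verbatim and checks which concepts are absent from all eight agent ontologies; the variable names are illustrative, and treating the first eight columns as "agent ontologies" is an assumption about how the finding groups the columns.

```python
AGENT_ONTOLOGIES = ["Av", "At", "FIQ", "Skan", "AS", "ACP", "OB", "FAOS"]
COLUMNS = AGENT_ONTOLOGIES + ["PROV", "OWL-T", "GDPRov", "DPV", "ISO", "JTC21"]

# Rows copied from Table 4.1 (● present, P partial, ○ absent).
ROWS = {
    "Data subject reference":      "○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ● ● ○ P",
    "Purpose of processing":       "○ P ○ P P ○ ○ P ○ ○ ● ● P P",
    "Lawful basis":                "○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ● ● ○ P",
    "Personal-data cat. (Art. 9)": "○ ○ ○ ○ ○ ○ ○ ○ ○ ○ P ● ○ P",
    "Consent record (lifecycle)":  "○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ● ● ○ P",
}


def parse(row: str) -> dict:
    """Turn one table row into a column -> cell mapping."""
    cells = row.split()
    if len(cells) != len(COLUMNS):
        raise ValueError("row does not have 14 cells")
    return dict(zip(COLUMNS, cells))


matrix = {concept: parse(row) for concept, row in ROWS.items()}

# Concepts with no construct in any of the eight agent ontologies.
absent_in_all_agents = [
    concept for concept, cells in matrix.items()
    if all(cells[o] == "○" for o in AGENT_ONTOLOGIES)
]
# Agent ontologies that cover Purpose only in prose.
partial_purpose = [
    o for o in AGENT_ONTOLOGIES if matrix["Purpose of processing"][o] == "P"
]
print(len(absent_in_all_agents), partial_purpose)
```

Four of the five concepts come out as absent across the agent ontologies; Purpose is the exception, with partial coverage in Atlan, Skan, Agent Spec, and Foundation AgenticOS.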

4.2 Integration with GDPRov, DPV, and the Legal Role Triad

The integration pattern reuses existing open vocabularies via owl:imports rather than reinventing their concepts. A minimal bridging pattern connects an agent action to its GDPR legal context:

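# Prefix declarations. The gov: namespace IRI is a placeholder assumption;
# the MCS namespace is not stated in this section.
@prefix gov:    <https://example.org/gov#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix dpv:    <https://w3id.org/dpv#> .
@prefix gdprov: <https://w3id.org/GDPRov#> .

# An agent action is both a PROV-O activity and a GDPRov processing step.
gov:AgentAction a owl:Class ;
    rdfs:subClassOf prov:Activity , gdprov:DataProcessingStep .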

gov:AgentActionShape a sh:NodeShape ;
    sh:targetClass gov:AgentAction ;
    sh:property [
        sh:path dpv:hasLegalBasis ;
        sh:minCount 1 ;
        sh:class dpv:LegalBasis ;
        sh:message "Art. 6(1) GDPR requires a lawful basis for every processing activity"
    ] ;
    sh:property [
        sh:path dpv:hasPurpose ;
        sh:minCount 1 ;
        sh:class dpv:Purpose ;
        sh:message "Art. 5(1)(b) GDPR: purpose limitation"
    ] ;
    sh:property [
        sh:path dpv:hasDataSubject ;
        sh:class dpv:DataSubject ;
        sh:message "Data subject must be identifiable for Art. 15 access requests"
    ] .

This pattern reduces the absent gap from nine MCS concepts to zero for the GDPR legal-semantic layer, without inventing new classes. The remaining gap is the AI Act legal-role layer (Provider, Deployer, Manufacturer) and the three-way Decision taxonomy, both of which require new classes because DPV predates the AI Act. The Legal Role Triad ontology declares Provider, Deployer, Distributor, Importer, Manufacturer, and AuthorisedRepresentative as subclasses of a new gov:AIActRole root. A single organisation can simultaneously hold dpv:DataController under GDPR and gov:Deployer under the AI Act; the gov:RoleAssignment class makes role attribution time-indexed.
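The idea of time-indexed role attribution can be illustrated outside RDF. The Python sketch below is an assumption-laden analogue of gov:RoleAssignment: the field names, the half-open validity interval, the plain-string role labels, and the example organisation and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class RoleAssignment:
    organisation: str
    role: str                               # e.g. "controller", "deployer"
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means still in force

    def holds_at(self, t: datetime) -> bool:
        """True if the role is held at t (interval is [valid_from, valid_until))."""
        return self.valid_from <= t and (self.valid_until is None or t < self.valid_until)


def roles_at(assignments: Iterable[RoleAssignment], organisation: str, t: datetime) -> Set[str]:
    """All roles one organisation holds at a given instant."""
    return {a.role for a in assignments if a.organisation == organisation and a.holds_at(t)}


# Hypothetical example: a controller that later also becomes a deployer.
assignments = [
    RoleAssignment("ExampleBank", "controller", datetime(2024, 1, 1)),
    RoleAssignment("ExampleBank", "deployer", datetime(2025, 6, 1)),
]
```

Querying `roles_at(assignments, "ExampleBank", datetime(2026, 1, 1))` returns both roles, while the same query for 1 January 2025 returns only the controller role, which is the distinction a supervisor needs when reconstructing who was responsible at the time of an action.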

4.3 The 23 MCS slots, split into core and supervisory layers

The 23 conceptual slots yield 16 net-new classes under strict reuse-zero accounting in §2.5. mcs-core holds 12 slots / 6 net-new classes; mcs-supervisory holds 11 slots / 10 net-new classes.

#	Class (MCS)	Operational definition	Regulatory anchor
1	gov:Agent (SoftwareAgent, HumanAgent, OrganizationalAgent)	Supervisory-scope agent abstraction	GDPR Art. 4(7)-(8); AI Act Art. 3(3)-(4); SCHUFA ¶48
2	gov:LegalRole (8 AI Act/GDPR subtypes)	Time-indexed role attribution via gov:RoleAssignment	GDPR Art. 4(7)-(8), 26, 28; AI Act Art. 3(3)-(8)
3	gov:Activity (PROV-O reuse)	Agent action; inherits startedAtTime, endedAtTime	AI Act Art. 12
4	gov:ToolInvocation	Subclass of Activity; hasTool, hasInvocationInput/Output	AI Act Art. 13
5	gov:MemoryEvent (Read, Write)	Provenance of model/agent memory access	AI Act Art. 12, 72
6	dpv:DataSubject (DPV reuse)	Data subject reference	GDPR Art. 4(1), 15, 22
7	dpv:PersonalData (DPV reuse)	Personal-data category incl. Art. 9 special categories	GDPR Art. 9, 10, 30(1)(c)
8	dpv:Purpose (DPV reuse)	Purpose of processing	GDPR Art. 5(1)(b), 6(4), 30
9	dpv:LegalBasis (DPV reuse)	Lawful basis for processing	GDPR Art. 6, 9
10	dpv:Consent + GDPRov lifecycle (reuse)	Consent record with lifecycle	GDPR Art. 6(1)(a), 7
11	gov:Decision (Solely / Assisted / NonAutomated) + gov:Score + gov:drawsStronglyOn	Three-way Article 22 taxonomy plus upstream attribution	GDPR Art. 22(1); SCHUFA ¶73; D&B ¶40
12	gov:HumanIntervention (six properties)	Meaningful-intervention schema incl. authorityActuallyExercised	GDPR Art. 22(3); AI Act Art. 14; WP251rev.01; AP handvatten 2025

Table 4.3a: mcs-core agent-behaviour layer (twelve slots).

#	Class (MCS)	Operational definition	Regulatory anchor
13	gov:Policy, gov:PolicyVersion	Versioned policy with temporal validity	AI Act Art. 9, 17
14	gov:RiskClassification (Prohibited/High/Limited/Minimal)	Risk classification aligned with AI Act	AI Act Art. 5, 6, 50, Annex III
15	gov:IntendedPurpose	Developer-attested AI system property, distinct from dpv:Purpose	AI Act Art. 3(12), 13
16	gov:SectoralRegime (MiFID II, MDR, CRR/CRD, IDD, Awb)	Sectoral regime pointers with competentSupervisor links	Sectoral statutes
17	gov:SupervisoryScope (10 NL supervisors + EU bodies)	Competent authority with supervisory mandate	AI Act Art. 70; AP-RDI 7 Nov 2024
18	gov:DelegationGrant (subclass prov:Delegation) + gov:RevocationEvent + gov:ValidityInterval + gov:AuthorityScope	Validity windows and revocation semantics	GDPR Art. 28; AI Act Art. 25
19	gov:ProvenanceChain (PROV-O reuse)	Reuse of prov:wasDerivedFrom, wasInformedBy, wasInfluencedBy	AI Act Art. 10, 12
20	gov:LogEvent (RiskID, PostMarketMonitoring, OperationMonitoring, BiometricID, Incident)	Typed log events at AI Act statutory-subclass granularity	AI Act Art. 12(2)-(3), 26(6); NIS2 Art. 23
21	gov:MonitoringObservation	observedMetric, observedAtTime, monitoringPurpose	AI Act Art. 72; ECB Guide; SAFEST
22	gov:DriftIndicator (Data, Concept, Performance, Control)	driftMagnitude, driftDetectedAt, triggersRefresh	AI Act Art. 72; PCAOB AS 1105
23	gov:SupervisoryEvidenceArtifact + gov:Incident + gov:NonQualifyingADM (Art5/Art6/Art9/Art25/Art35)	Evidence artefact hierarchy; serious incident; residual ADM class	AI Act Art. 26(10), 73; NIS2 Art. 23; GDPR Arts 5(1)(a), 6, 9, 25, 35; Art. 8 ECHR

Table 4.3b: mcs-supervisory supervisory-evidence layer (eleven slots). Slot 23 counts three thematic classes (SupervisoryEvidenceArtifact, Incident, NonQualifyingADM) as one slot under the counting convention.
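Slot 18 combines a validity window, a revocation event, and an authority scope. As a rough illustration of how those three pieces interact when a supervisor asks whether an action was authorised, the Python sketch below checks a single grant. It is not the SHACL shape set of §6: the field names, the representation of scope as a set of tool names, and the rule that revocation takes effect at its timestamp are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DelegationGrant:
    delegator: str
    delegate: str
    authority_scope: FrozenSet[str]        # hypothetical: permitted tool names
    valid_from: datetime
    valid_until: datetime
    revoked_at: Optional[datetime] = None  # timestamp of the revocation event

    def covers(self, tool: str, at: datetime) -> bool:
        """True if `tool` may be invoked under this grant at time `at`."""
        in_window = self.valid_from <= at < self.valid_until
        not_revoked = self.revoked_at is None or at < self.revoked_at
        return in_window and not_revoked and tool in self.authority_scope


# Hypothetical grant, revoked two months into a six-month window.
grant = DelegationGrant(
    delegator="ops-lead",
    delegate="agent-7",
    authority_scope=frozenset({"crm.read"}),
    valid_from=datetime(2026, 1, 1),
    valid_until=datetime(2026, 7, 1),
    revoked_at=datetime(2026, 3, 1),
)
```

Here `grant.covers("crm.read", datetime(2026, 2, 1))` is true, while the same call one day after the revocation, or for an out-of-scope tool such as `"crm.write"`, is false.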

4.4 EU-AIAct extension

The W3C DPVCG EU-AIAct extension (v2.3, 25 February 2026) covers Provider, Deployer, RiskLevel, CEMarking, AISystemPerformance, AIRegulatorySandbox, ProviderHumanOversightMeasure, DeployerHumanOversightMeasure, QualityManagementSystem, and PostMarketMonitoringSystem. Adopting the extension closes roughly ten of the twenty-three MCS gaps simultaneously at the AI Act legal-role layer. The extension does not cover the NonQualifyingADM residual class (GDPR Arts 5, 6, 9, 25, 35 governance paths) and does not cover typed Article 12 log events at statutory-subclass granularity. These remain MCS-side work.

4.5 Empirical validation findings

The 25-case sample yields four findings (full per-case rationales in mcs_case_sample_v0_3.csv; cases listed in Appendix B).

Finding 1: Solely dominates the litigated sample. Fifteen of 25 cases classify as Solely; three as Assisted; six as Out-of-Art.22-Scope (plus one Pending). The Solely dominance is selection bias, not population distribution; cases reach courts because classification was contested.

Finding 2: ADM enforcement decided on GDPR provisions other than Article 22 is a structural category. Six of 25 cases (B17, B18, B19, B20, B22, B24) are Out-of-Art.22-Scope: an ADM substrate is present, but the case was decided on Art. 5 fairness, Art. 6 lawful basis, Art. 9 special category, Art. 25 by-design, or Art. 8 ECHR. This is the empirical anchor for gov:NonQualifyingADM. Sensitivity analysis: the proportion ranges over [16%, 36%] under the F2-relevant subset flip and over [16%, 44%] under the inclusive-BORDERLINE flip. The Wilson 95% CI anchored on k=6, n=25 is [11.5%, 43.4%]. The proportion is not a stable empirical fact; the existence and structural significance of the category are robust.
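The Wilson interval quoted for Finding 2 can be reproduced with a few lines of Python. The sketch below uses the standard Wilson score formula with z ≈ 1.96 for a 95% interval; the function name is illustrative, and only k=6 and n=25 come from the text.

```python
from math import sqrt


def wilson_interval(k: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(6, 25)
print(f"{low:.1%} to {high:.1%}")  # prints "11.5% to 43.4%"
```

The width of the interval, roughly 32 percentage points, is the quantitative reason the proportion itself should not be read as a stable estimate.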

Finding 3: The Solely/Assisted boundary is jurisprudentially unstable. The Uber/Ola Amsterdam sequence (B9, B10, B11, B12, B16) shows the same fact pattern classified differently depending on whether courts apply a formal or a substantive test. The Hof Amsterdam (4 April 2023) reversed the first-instance ruling, holding the Krakow review to be “not much more than a purely symbolic act”. This validates the dual-profile SHACL design in §5.5 and grounds gov:authorityActuallyExercised as a sixth HumanIntervention property.

Finding 4: SCHUFA upstream attribution is operative in national courts. Cases B1, B2, B3 confirm the drawsStronglyOn path is applied by national courts post-SCHUFA. The Wiesbaden Verwaltungsgericht (January 2026) treated the existence of a specialised scoring model as itself evidence of strong reliance, without requiring quantification. This validates the predicate but reveals courts apply it qualitatively, not quantitatively. v1.2 makes drawsStronglyOn method a profile-level choice (Profile A quantitative; Profile B qualitative).

Reliability caveat. All four findings rest on single-coder classification by the author in round 1 (23 April 2026). Round-2 self-consistency 7 May 2026. Track 1 commissions an independent second coder. Until Track 1 reports κ ≥ 0.60, treat the findings as one coder’s structured reading rather than inter-subjectively validated empirical claims.

5 RQ2 answer: operational mapping to European legal categories

5.1 Article 22 doctrinal position

Two CJEU judgments frame Article 22 interpretation as of April 2026. C-634/21 SCHUFA Holding (Scoring), judgment of 7 December 2023: an automated probability score is itself a “decision” within Article 22(1) where the downstream actor draws strongly on it (paras 50, 73; reliance test grounded in para 48). C-203/22 Dun and Bradstreet Austria, judgment of 27 February 2025: “meaningful information about the logic” under Article 15(1)(h) must be functional rather than merely algorithmic.

Supervisory practice diverges. The Dutch AP (Advies artikel 22 AVG, 10 October 2024; Handvatten betekenisvolle menselijke tussenkomst, July 2025) reads Art. 22(2)(a) and Art. 22(3) as permitting automated selection followed by meaningful human treatment to fall outside Art. 22 engagement. CNIL, AEPD (March 2024), and several German state DPAs read SCHUFA more broadly: a score drawn strongly on by a downstream decision engages Art. 22(1) regardless of intervening human review unless intervention meets a higher bar. The MCS does not pre-decide the divergence. §5.5 operationalises it as two SHACL profiles over the same evidence graph.

5.2 Ten-supervisor model (Dutch scope)

Dutch AI governance operates through ten supervisory roles held by nine distinct legal entities, each with dated authorities: AFM (Agenda 2026, 19 January 2026); DNB (SAFEST, 25 July 2019); IGJ (MDR Rule 11 guidance, 10 February 2025); AP (coordinating algorithm supervisor since February 2025; Advies art. 22, 10 October 2024; Handvatten July 2025); RDI (AP-RDI Final advice, 7 November 2024; AI regulatory sandbox supervisor as of August 2026); Nederlandse Arbeidsinspectie/SZW (Algorithmic management, 20 November 2024); NCSC (NIS2); ILT (autonomous transport AI); ACM consumer-and-markets and ACM energy. The Uitvoeringswet AI-verordening (expected Q4 2026) will clarify coordination; current practice operates on the AP-RDI advice.

5.3 Mapping matrix with inline formal sketches

Rows are agent-behaviour primitives (8). Columns are legal categories (6). Each cell carries a status flag — [SD] statutory derivation, [IP] interpretive proposal, [RC] research conjecture — plus a confidence grade (H/M/L). The full 8 × 6 matrix is in Appendix A.3 (and as mcs_mapping_matrix_v0_3.csv in the Zenodo deposit). Two representative cells with inline sketches:

F1 Decision × Art. 22 [SD]

∀d. Decision(d) →
  Art22Engaged(d) ↔
    ( ( SolelyAutomated(d)
      ∨ ∃s. Score(s) ∧ drawsStronglyOn(d, s) )
    ∧ HasLegalOrSignificantEffect(d, subject) )

SolelyAutomated(d) ↔
  ¬∃hi. HumanIntervention(hi) ∧ hi.contributesTo(d) ∧ MeaningfulIntervention(hi)

MeaningfulIntervention(hi) ↔
  hi.hasAuthorityToDeviate = true
  ∧ hi.hasUnderstandingOfLogic = true
  ∧ hi.hasMarginOfDiscretion = true
  ∧ hi.consideredAllRelevantData = true
  ∧ hi.authorityActuallyExercised = true   [Profile B]

Captures Art. 22(1) scope plus SCHUFA upstream-attribution; the legal-or-significant-effect qualifier; WP251rev.01 meaningful-intervention criteria as conjunctive; the authorityActuallyExercised property responding to Uber/Ola. Does not capture Art. 22(2) exemptions, Art. 22(4) special-category restriction, or Art. 22(3) safeguards when engaged under Art. 22(2)(a) or (c).
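The F1 sketch can also be read as an executable predicate. The Python below is a minimal illustration only: the class names HumanIntervention and Decision, the field names, and the profile_b switch are hypothetical and are not part of the MCS vocabulary. In particular, it assumes that the [Profile B] marker means authorityActuallyExercised is required only under Profile B.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HumanIntervention:
    # Field names mirror the F1 properties; the class itself is hypothetical.
    has_authority_to_deviate: bool = False
    has_understanding_of_logic: bool = False
    has_margin_of_discretion: bool = False
    considered_all_relevant_data: bool = False
    authority_actually_exercised: bool = False


@dataclass
class Decision:
    has_legal_or_significant_effect: bool
    draws_strongly_on_score: bool = False
    interventions: List[HumanIntervention] = field(default_factory=list)


def meaningful(hi: HumanIntervention, profile_b: bool) -> bool:
    base = (
        hi.has_authority_to_deviate
        and hi.has_understanding_of_logic
        and hi.has_margin_of_discretion
        and hi.considered_all_relevant_data
    )
    # Assumption: the fifth conjunct is required only under Profile B.
    return base and (hi.authority_actually_exercised or not profile_b)


def solely_automated(d: Decision, profile_b: bool) -> bool:
    # No contributing intervention qualifies as meaningful.
    return not any(meaningful(hi, profile_b) for hi in d.interventions)


def art22_engaged(d: Decision, profile_b: bool = True) -> bool:
    return (
        solely_automated(d, profile_b) or d.draws_strongly_on_score
    ) and d.has_legal_or_significant_effect
```

Under this reading, a reviewer who holds all four structural properties but never exercises the authority leaves the decision engaged under Profile B and not engaged under Profile A, which is the divergence that §5.5 encodes as two SHACL profiles.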

F4 Tool invocation × NIS2 significant incident [IP]

∀i. Incident(i) →
  SignificantIncident(i) ↔
    ( ( CausedSevereOperationalDisruption(i, entity)
      ∨ CapableOfCausingSevereOperationalDisruption(i, entity)
      ∨ CausedSevereFinancialLoss(i, entity)
      ∨ CapableOfCausingSevereFinancialLoss(i, entity) )
    ∨
      ( CausedConsiderableDamageToOthers(i)
      ∨ CapableOfCausingConsiderableDamageToOthers(i) )
    )
where Severe(i) is judged by:
  ExtentOfFunctionImpaired(i) ∧
  DurationOf(i) ∧
  NumberOfRecipientsAffected(i)         (Recital 101)

EntityClassification(entity) ∈ {Essential, Important}
  determines reporting regime under Art. 3 and the
  reporting cadence under Art. 23(4),
  not the significance threshold itself.

Heaviest-load NIS2 cell. A naive encoding as F(operational_disruption ∧ essential_services) misclassifies reportable incidents in three ways. A conjunction would require harm to both the entity and external persons, whereas Art. 23(3) treats either as sufficient. Dropping the severity qualifier lowers the threshold below the statutory text. Treating “essential services” as a threshold component conflates entity classification with the significance test, which are two distinct provisions. A controller relying on such an encoding could fail to report incidents that NIS2 Art. 23(3) requires to be reported.
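The gap between the disjunctive sketch and the naive encoding can be made concrete with a small comparison. The Python below is a sketch under stated assumptions: the Incident class and its boolean flags are hypothetical stand-ins for the F4 predicates, and severity is taken as already judged upstream.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical boolean flags, one per predicate in the F4 sketch.
    caused_severe_operational_disruption: bool = False
    capable_of_severe_operational_disruption: bool = False
    caused_severe_financial_loss: bool = False
    capable_of_severe_financial_loss: bool = False
    caused_considerable_damage_to_others: bool = False
    capable_of_considerable_damage_to_others: bool = False
    # Entity classification selects the reporting regime; it is not a threshold input.
    entity_is_essential: bool = False


def significant_incident(i: Incident) -> bool:
    """Disjunctive reading of F4: either limb is sufficient."""
    entity_limb = (
        i.caused_severe_operational_disruption
        or i.capable_of_severe_operational_disruption
        or i.caused_severe_financial_loss
        or i.capable_of_severe_financial_loss
    )
    others_limb = (
        i.caused_considerable_damage_to_others
        or i.capable_of_considerable_damage_to_others
    )
    return entity_limb or others_limb


def naive_significant_incident(i: Incident) -> bool:
    """The naive encoding criticised in the text, kept only for comparison."""
    return i.caused_severe_operational_disruption and i.entity_is_essential


# An important (non-essential) entity whose incident could damage third parties.
example = Incident(capable_of_considerable_damage_to_others=True)
print(significant_incident(example), naive_significant_incident(example))
```

The example incident is significant under the F4 reading but is missed by the naive encoding, which is the under-reporting risk described above.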

Sketches F2, F3, and F5–F16, including the three NIS2-parity sketches (F11 delegation revocation × NIS2 incident window; F12 memory write × Art. 23(4) reporting cadence; F13 policy version change × Art. 21 measures), appear in Appendix A.4–A.19 below.

5.4 Formal-methods scope

The MCS adopts PROV-O, OWL-Time and SHACL as the constrained baseline for general supervisory ingestion. Fuller temporal-logic verification (LTL, CTL model checking, Event-B refinement) may be appropriate for safety-critical AI subsystems where the cost of a wrong classification justifies the verification overhead. The natural site for such verification is §8 research-agenda item 6 (causal blast radius algorithms for revocation propagation); outside that niche the MCS treats SHACL-time validation as the operative tool.

5.5 Supervisory divergence as conflicting SHACL profiles (v1.2)

In the companion SHACL release (mcs_profiles_v1_2.ttl), both Profile A and Profile B ship, plus Profile C (UK ICO post-Brexit) and Profile D (Italian Garante post-Law 132/2025) as stubs.

Profile A (AP-aligned, permissive, quantitative). A preparatory gov:Score is in-scope under Art. 22(1) only where a downstream gov:Decision with hasLegalEffect true satisfies drawsStronglyOn under the quantitative flip-rate test (ε = 10% of population range, M = 50% flip rate) and no gov:AssistedDecisionWithMeaningfulHumanIntervention pattern intervenes.

Profile B (CNIL-aligned, strict, qualitative). A preparatory gov:Score is in-scope whenever a downstream decision drawsStronglyOn it under any of three judicial indicia (specialised scoring model; refusal frequency ≥ 0.90; absence of documented intervention authority), unless intervention meets all five conjunctive properties including gov:authorityActuallyExercised.

gov:ProfileA a gov:SupervisoryProfile ;
    gov:hasDrawsStronglyOnMethod gov:QuantitativeFlipRate ;
    gov:quantitativeEpsilon "0.10"^^xsd:decimal ;
    gov:quantitativeM "0.50"^^xsd:decimal .

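# Profile A scope shape. Every solution of sh:select is reported as a validation
# result, so the shape flags each gov:Score that Profile A treats as in scope.
# The query matches gov:drawsStronglyOn as a triple already present in the graph;
# it does not itself run the flip-rate test.
gov:ProfileA_Art22ScopeShape a sh:NodeShape ;
    sh:targetClass gov:Score ;
    sh:sparql [ sh:select """
        SELECT $this WHERE {
            $this a gov:Score .
            ?d gov:drawsStronglyOn $this ;
               gov:hasLegalEffect true .
            FILTER NOT EXISTS {
                ?d a gov:AssistedDecisionWithMeaningfulHumanIntervention .
            }
        } """ ] .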

gov:ProfileB a gov:SupervisoryProfile ;
    gov:hasDrawsStronglyOnMethod gov:QualitativeJudicialIndicia .

# Profile B scope shape. Flags each gov:Score on which a legally effective
# decision draws strongly, unless some intervention on that decision carries
# all five properties, including gov:authorityActuallyExercised.
gov:ProfileB_Art22ScopeShape a sh:NodeShape ;
    sh:targetClass gov:Score ;
    sh:sparql [ sh:select """
        SELECT $this WHERE {
            $this a gov:Score .
            ?d gov:drawsStronglyOn $this ;
               gov:hasLegalEffect true .
            FILTER NOT EXISTS {
                ?d gov:hasHumanIntervention ?hi .
                ?hi gov:hasAuthorityToDeviate true ;
                    gov:hasMarginOfDiscretion true ;
                    gov:hasUnderstandingOfLogic true ;
                    gov:consideredAllRelevantData true ;
                    gov:authorityActuallyExercised true .
            }
        } """ ] .

An agent validated against both profiles surfaces the delta as a governance artefact. Decisions that pass Profile A but fail Profile B trigger a gov:ProfileDelta record per gov:CrossProfileGovernanceShape; the delta requires either jurisdictional routing, elevation of human intervention attributes, or a documented supervisory dialogue. This converts a latent regulatory conflict into an explicit decision for the controller, recorded as provenance. Empirical support comes from the Uber/Ola Amsterdam sequence (§4.5 Finding 3).
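One way to picture the delta computation is as a set difference over the focus nodes reported by two validation runs. The Python below is a sketch, not the behaviour of gov:CrossProfileGovernanceShape: the ProfileDelta record, the function name, and the example IRIs are hypothetical, and it assumes that "passing" a profile means the node is not flagged by that profile's scope shape.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ProfileDelta:
    # Hypothetical record standing in for a gov:ProfileDelta individual.
    focus_node: str
    passes: str
    fails: str


def profile_deltas(flagged_by_a: Set[str], flagged_by_b: Set[str]) -> List[ProfileDelta]:
    """Nodes that Profile A does not flag but Profile B does."""
    return [
        ProfileDelta(focus_node=node, passes="ProfileA", fails="ProfileB")
        for node in sorted(flagged_by_b - flagged_by_a)
    ]


# Focus nodes reported by two separate validation runs (illustrative IRIs).
flagged_a = {"ex:score-17"}
flagged_b = {"ex:score-17", "ex:score-42"}
for delta in profile_deltas(flagged_a, flagged_b):
    print(delta)
```

Each resulting record would then need one of the three responses named above: jurisdictional routing, elevated intervention attributes, or a documented supervisory dialogue.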

5.6 SME proportionality profile

The dual-profile architecture addresses jurisdictional divergence; it does not address size-based capability asymmetry. The MCS specifies an SME proportionality profile (mcs_sme_profile_v0_2.ttl) that operates on two axes. Eligibility: EU Recommendation 2003/361/EC (<250 employees, turnover ≤ €50m or balance sheet ≤ €43m), reassessed annually. Substitution structure: a subset of SHACL shapes is marked gov:SMEExempt true; a structured accountability narrative substitutes for SHACL-checked structural evidence on those shapes. The narrative declares (i) what evidence would have populated the shape; (ii) why operational scale makes structural evidence disproportionate; (iii) what alternative accountability mechanism is provided.

What stays in SHACL-validated scope for SMEs: core Decision classification, meaningful-intervention schema, policy version recording, delegation/revocation, Article 22 scope shapes (both profiles). What becomes prose-substitutable: typed Article 12 LogEvent subclasses at full granularity, post-market monitoring at sub-monthly cadence, Bayesian decay structure (scalar priors permitted), cross-profile governance shape for single-jurisdiction controllers. A supervisor may suspend the proportionality treatment for a specific dimension during an investigation window via gov:ExemptionSuspension; outside the window proportionality resumes. The profile does not reduce substantive legal obligation.
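The two axes of the SME profile can be sketched as a pair of checks. The Python below is illustrative only: the function and class names are hypothetical, the eligibility test covers only the headcount and financial ceilings quoted above (not the other criteria of the Recommendation), and the substitution logic is one plausible reading of how gov:SMEExempt and gov:ExemptionSuspension interact.

```python
from dataclasses import dataclass
from typing import Optional


def sme_eligible(employees: int, turnover_eur: float, balance_sheet_eur: float) -> bool:
    """Headcount and financial ceilings as quoted in the text."""
    return employees < 250 and (
        turnover_eur <= 50_000_000 or balance_sheet_eur <= 43_000_000
    )


@dataclass
class AccountabilityNarrative:
    evidence_that_would_populate_shape: str        # item (i)
    why_structural_evidence_disproportionate: str  # item (ii)
    alternative_mechanism: str                     # item (iii)

    def complete(self) -> bool:
        # All three declarations must be non-empty.
        return all(
            part.strip()
            for part in (
                self.evidence_that_would_populate_shape,
                self.why_structural_evidence_disproportionate,
                self.alternative_mechanism,
            )
        )


def shape_satisfied(
    shacl_conforms: bool,
    sme_exempt_shape: bool,
    eligible: bool,
    narrative: Optional[AccountabilityNarrative] = None,
    exemption_suspended: bool = False,
) -> bool:
    """Structural evidence always satisfies a shape; a complete narrative may
    substitute only on SMEExempt shapes, for eligible SMEs, outside a suspension."""
    if shacl_conforms:
        return True
    may_substitute = sme_exempt_shape and eligible and not exemption_suspended
    return may_substitute and narrative is not None and narrative.complete()
```

The sketch keeps the narrative as a substitute for structural evidence only; it says nothing about the underlying legal obligation, which the profile leaves unchanged.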

Temporal governance

6.1 Revocation propagation

When a delegation is revoked, the MCS specifies five effects that may apply, any or all: BlocksNewWork (no new tasks), CancelsQueuedWork (queued tasks cancelled), StopsInflightWork (in-progress tasks stopped), RequiresReview (recently completed tasks reviewed before acceptance), MarksArtifactsStale (downstream artefacts flagged stale). Default in absence of explicit configuration: BlocksNewWork + MarksArtifactsStale. StopsInflightWork is opt-in because it can leave consequential actions half-executed. Property gov:executedUnderRevokedAuthority flags decisions produced under revoked authority; gov:cachedUnderRevokedAuthority flags artefacts produced before but accessed after revocation.
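The five effects combine freely, which suggests a flag-style representation. The Python below is a minimal sketch with hypothetical names (RevocationEffect, effective_effects, authority_flags); the boundary rule for an item produced exactly at the revocation instant is an assumption, not something the text specifies.

```python
from datetime import datetime
from enum import Flag, auto
from typing import Dict, Optional


class RevocationEffect(Flag):
    BLOCKS_NEW_WORK = auto()
    CANCELS_QUEUED_WORK = auto()
    STOPS_INFLIGHT_WORK = auto()
    REQUIRES_REVIEW = auto()
    MARKS_ARTIFACTS_STALE = auto()


# Default stated in the text when nothing is configured.
DEFAULT_EFFECTS = RevocationEffect.BLOCKS_NEW_WORK | RevocationEffect.MARKS_ARTIFACTS_STALE


def effective_effects(configured: Optional[RevocationEffect]) -> RevocationEffect:
    """Fall back to the default only when no explicit configuration exists."""
    return DEFAULT_EFFECTS if configured is None else configured


def authority_flags(
    produced_at: datetime, revoked_at: datetime, accessed_at: Optional[datetime] = None
) -> Dict[str, bool]:
    """Derive the two flag properties from timestamps.

    Assumption: an item produced exactly at the revocation instant counts as
    produced under revoked authority.
    """
    executed = produced_at >= revoked_at
    cached = (
        produced_at < revoked_at
        and accessed_at is not None
        and accessed_at >= revoked_at
    )
    return {
        "executedUnderRevokedAuthority": executed,
        "cachedUnderRevokedAuthority": cached,
    }
```

Keeping STOPS_INFLIGHT_WORK out of the default mirrors the opt-in rule: it only applies when a controller configures it explicitly.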

6.2 Evidence decay

The temporal reliability of evidence decays. The MCS replaces a scalar gov:probativeWeight default with a Bayesian decay function anchored in declared sectoral priors:

gov:probativeWeight(e, t) = gov:initialWeight(e) × exp(-λ(sector) × Δt)

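where Δt is elapsed time since e’s last refresh and λ(sector) is a sector-specific decay rate derived from the declared half-life T½(sector) as λ(sector) = ln 2 / T½(sector), so that the weight halves after each half-life. Declared priors: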

Sector / regime	Half-life	Anchor
ISAE 3402 Type 2 audit evidence	12 mo	ISAE 3402 annual re-assertion
ECB model validation (CRR/CRD)	12 mo	ECB Guide to Internal Models
MDR post-market surveillance	6 mo	MDR Annex III §1.1(a)
AI Act Art. 12 automatic log retention	6 mo	AI Act Art. 26(6) minimum retention floor
AI Act Art. 72 post-market monitoring	12 mo	AI Act Art. 72 annual cycle
NIS2 Art. 21 measure attestation	6 mo	NIS2 Art. 20(1) management-body cadence
Manual human attestation	3 mo	Operational presumption pending empirical anchor

Table 6.2: Declared sectoral half-lives. Defensible defaults, not statutory; controllers running with different priors record the divergence. SHACL shape gov:EvidenceDecayShape validates that priors are either declared defaults or documented divergence. The AI Act Art. 26(6) six-month log retention is interpreted as a decay floor: probativeWeight is clamped to max(Bayesian_decay(e,t), 0.5) during the first six months.

The declared-priors framework does not resolve the underlying empirical anchor gap. It converts the gap from an unreasoned absence to a reasoned presumption that can be falsified as empirical data arrives.
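The decay function and the declared half-lives translate directly into a few lines of arithmetic. The Python below is a sketch: the dictionary keys and the function name are hypothetical, the rate is obtained from the half-life as ln 2 divided by the half-life, and the optional floor models the six-month clamp described for Art. 12 logs.

```python
import math
from typing import Optional

# Declared half-lives in months, taken from Table 6.2; the keys are hypothetical.
HALF_LIFE_MONTHS = {
    "isae3402_type2": 12,
    "ecb_model_validation": 12,
    "mdr_pms": 6,
    "ai_act_art12_logs": 6,
    "ai_act_art72_pmm": 12,
    "nis2_art21": 6,
    "manual_attestation": 3,
}


def probative_weight(
    initial_weight: float,
    sector: str,
    months_since_refresh: float,
    floor: Optional[float] = None,
    floor_months: float = 6.0,
) -> float:
    """Exponential decay with a sector half-life and an optional early floor."""
    decay_rate = math.log(2) / HALF_LIFE_MONTHS[sector]
    weight = initial_weight * math.exp(-decay_rate * months_since_refresh)
    # Clamp only inside the floor window (the first six months by default).
    if floor is not None and months_since_refresh <= floor_months:
        weight = max(weight, floor)
    return weight


print(probative_weight(1.0, "manual_attestation", 3))
print(probative_weight(0.6, "ai_act_art12_logs", 3, floor=0.5))
```

A controller running with different priors would swap the dictionary values and record the divergence, as gov:EvidenceDecayShape expects.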

6.3 Temporal extension with SHACL

The temporal extension defines six classes and twenty-one SHACL shapes. Classes: gov:DelegationGrant (subclass of prov:Delegation), gov:RevocationEvent, gov:ValidityInterval (subclass of time:ProperInterval), gov:AuthorityScope, gov:PolicyVersion (subclass of prov:Entity), gov:SupervisoryEvidenceArtifact. Shapes: 14 core plus 7 profile-layer in v1.2. The fourteen core shapes are DelegationGrantShape, PolicyVersionShape, EvidenceDecayShape, EvidenceQualityShape, MonitoringObservationShape, DriftIndicatorShape, AgentActionShape, RoleAssignmentShape, BiometricLogShape, DecisionShape, SchufaUpstreamShape, HumanInterventionShape, RevocationEventShape, LogEventSubclassGranularityShape.

The shapes fire as structural completeness checks, not as legal-compliance checks. sh:conforms=true indicates only that data meets the structural constraints. It does not establish compliance with the AI Act, GDPR, NIS2, DORA, the CRA, or any other instrument of Union or national law. This disclaimer is embedded in the vocabulary and shapes headers.

6.4 drawsStronglyOn operationalisation

Profile A (quantitative). drawsStronglyOn(d, s, ε, M) holds iff the empirical flip rate of the pair (s, d) under counterfactual perturbation of magnitude ε exceeds threshold M. The rate is computable from PROV-O DAGs: perturb s by ε across its empirical distribution; re-run the decision pipeline; measure the proportion of replicas in which d flips outcome. Parameters: ε = 10%, M = 50%. This is a controller-side compliance instrument; it may be rejected by supervisors applying qualitative indicia.
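The flip-rate test can be sketched in a few lines once the decision pipeline is available as a function of the score. The Python below is one possible operationalisation, not the MCS reference procedure: the function names are hypothetical, ε is interpreted as a fraction of the observed score range (as in §5.5), and a replica counts as a flip if a perturbation in either direction changes the outcome, which is an assumption.

```python
from typing import Callable, Sequence


def flip_rate(
    scores: Sequence[float],
    decide: Callable[[float], bool],
    epsilon: float = 0.10,
) -> float:
    """Share of observed scores whose decision flips under a +/- epsilon perturbation."""
    if not scores:
        raise ValueError("at least one observed score is required")
    # epsilon is read as a fraction of the observed score range.
    delta = epsilon * (max(scores) - min(scores))
    flips = 0
    for s in scores:
        baseline = decide(s)
        # Assumption: a flip in either direction counts once.
        if decide(s + delta) != baseline or decide(s - delta) != baseline:
            flips += 1
    return flips / len(scores)


def draws_strongly_on(
    scores: Sequence[float],
    decide: Callable[[float], bool],
    epsilon: float = 0.10,
    m: float = 0.50,
) -> bool:
    """Profile A test: the flip rate must exceed the threshold M."""
    return flip_rate(scores, decide, epsilon) > m


def approve(score: float) -> bool:
    # Illustrative pipeline: approve at or above a cut-off of 50.
    return score >= 50


print(flip_rate([0, 20, 45, 55, 80, 100], approve))
print(draws_strongly_on([49, 50, 51, 50, 49, 51, 0, 100], approve))
```

The result depends heavily on where the observed scores sit relative to the cut-off, which is one reason the (ε, M) pairing is listed as needing empirical validation.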

Profile B (qualitative). drawsStronglyOn(d, s) holds iff any of three judicial indicia applies. Indicium 1: the score uses a specialised scoring model (Wiesbaden VG January 2026). Indicium 2: refusal frequency given insufficient score ≥ 0.90 (operationalises SCHUFA ¶48 “in almost all cases”). Indicium 3: the decision lacks documented intervention with the five Profile B properties (operationalises D&B ¶40 “lacked any manual oversight”). Profile B is alignment with decided practice, not research conjecture.
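The three indicia form a simple disjunction that can be evaluated per score use. The Python below is a sketch under stated assumptions: the ScoreUse record and its field names are hypothetical, and the refusal frequency is computed as refusals divided by cases with an insufficient score.

```python
from dataclasses import dataclass
from typing import Dict

REFUSAL_FREQUENCY_THRESHOLD = 0.90  # Indicium 2 threshold from the text


@dataclass
class ScoreUse:
    # Hypothetical per-score summary; field names are illustrative.
    uses_specialised_scoring_model: bool
    refusals_given_insufficient_score: int
    cases_with_insufficient_score: int
    has_documented_five_property_intervention: bool


def indicia(u: ScoreUse) -> Dict[str, bool]:
    """Evaluate each Profile B indicium separately so the basis is recorded."""
    if u.cases_with_insufficient_score:
        frequency = u.refusals_given_insufficient_score / u.cases_with_insufficient_score
    else:
        frequency = 0.0
    return {
        "specialised_scoring_model": u.uses_specialised_scoring_model,
        "refusal_frequency": frequency >= REFUSAL_FREQUENCY_THRESHOLD,
        "no_documented_intervention": not u.has_documented_five_property_intervention,
    }


def draws_strongly_on_profile_b(u: ScoreUse) -> bool:
    """Any single indicium is sufficient."""
    return any(indicia(u).values())
```

Returning the individual indicia rather than only the boolean keeps the basis for the classification available as evidence.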

Discussion

7.0 Cross-cutting design tensions

Six tensions shape the MCS, surfaced so that a reader who disagrees can locate the design lever.
Expressiveness vs auditability. Resolved towards constrained expressiveness: 23 conceptual slots split into 12 + 11.
Ontology richness vs supervisor usability. Layered design, with mcs-core as the minimum supervisory core.
Temporal precision vs implementation feasibility. Minimum mandatory time fields, with optional richer logic for safety-critical niches.
Legal caution vs operational utility. Conditional rules with status flags (SD/IP/RC). The MCS never asserts legal classification; it produces evidence that controllers and supervisors then classify.
Provenance completeness vs privacy minimisation. Tension with GDPR Art. 5(1)(c); the pseudonymisation pattern is an open research item (§8.16).
Single supervisor profile vs cross-jurisdictional portability. v1.2 ships Profiles A and B, with C and D as stubs.

7.1 Internal coherence as necessary but not sufficient

The MCS has been stress-tested via seven worked-example scenarios (companion document): retail banking, hospital triage, municipal benefits, platform delivery, revocation propagation, cross-border one-stop-shop, NIS2 + GDPR concurrent incident. All seven admit MCS class population from realistic event streams. The scenarios confirm the MCS admits population; they do not establish that deployed controllers will populate correctly, that supervisors will ingest the resulting evidence, or that the evidence will change regulatory outcomes. Those are validation-pack questions.

7.2 No empirical validation on real systems

The MCS has not been populated by any live controller, ingested by any live supervisor, or tested against any live enforcement outcome. The validation pack publishes three pre-registered open tracks: inter-rater consistency, SHACL throughput, structural fit. Track 1 is committed to for a follow-up revision. Failure at κ < 0.60 triggers rubric revision; publication of failure is part of the commitment. The empirical-validation deficit is shared with every prescriptive governance framework in the reviewed corpus; it requires a research programme.

7.3 Structural completeness is not legal compliance

A SHACL validation report with sh:conforms=true indicates only structural completeness. A supervisor who reads only the conforms flag without reading the underlying evidence is misusing the tool. The disclaimer is not self-enforcing.

7.4 Dual-profile implementation (v1.2 status)

v1.2 SHACL shapes (mcs_profiles_v1_2.ttl) implement Profile A (AP-permissive, quantitative) and Profile B (CNIL-strict, qualitative). Profiles C (UK ICO) and D (Italian Garante post-Law 132/2025) ship as stubs. A controller running v1.2 against a decision graph populates both profiles and surfaces deltas via gov:CrossProfileGovernanceShape. The implementation does not resolve underlying legal divergence; it makes divergence representable and auditable.

7.5 Multi-stakeholder supply chains

The MCS addresses third-party risk through DelegationGrant, RoleAssignment time-indexing, and SectoralRegime, treating model provider, orchestration platform, tool provider, and end-user deployer as distinct legal roles with distinct RoleAssignments against a shared Agent. The architecture reflects shared responsibility, not pure vendor-management. It does not specify governance interfaces between entities (e.g., model provider QMS evidence flowing structurally into deployer conformity assessment); that interface specification is flagged as an open item in §8.

7.6 Interoperability and the limits of regulatory-coherence

The MCS v1 treats EU regulation as coherent enough to permit a single structured evidence ontology to bridge regimes. Three limits surface. (i) NIS2 transposition is incomplete: as of late 2025, roughly 14 of 27 Member States had fully transposed NIS2; the European Commission issued reasoned opinions to 19 Member States on 7 May 2025; and targeted amendments were proposed on 20 January 2026. (ii) Supervisor guidance increasingly says “existing frameworks apply”: FCA (UK), BaFin (DE, 18 December 2025: AI as DORA ICT asset), CNIL (FR), AP (NL). The MCS must therefore position itself as complementary to existing frameworks, not as a substitute. (iii) Italy is a counter-example: Law No. 132/2025 (effective 10 October 2025) adds sector-specific AI obligations beyond the AI Act. The MCS responds with Profile D as a stub and a commitment to populate national variants (Italy, UK).

7.7 Dual-use considerations

Released artefacts contain only data-model components: RDF/OWL vocabulary, SHACL shapes, workload generator, framework-inventory document. These cannot observe, gate, or enforce behaviour. The concern arises from the design trajectory: a machine-readable evidence vocabulary, SHACL validation shapes, live agent-event streams emitting conformant data, an SSII, and Runtime Autonomy Gates with a push-based Compliance MCP Server — when four or five exist and are integrated, the system shifts from documentation infrastructure to operational control infrastructure. Five risks are acknowledged: surveillance risk; Hildebrandt-type meta-critique (framing, portability, formalism, ripple-effect, solutionism traps); vendor lock-in; regulatory capture; weaponisation by authoritarian deployers. The split-publication strategy (data-model artefacts open; operational-layer artefacts closed pending threat-model review) is partial mitigation, not a technical control.

7.8 Distributive effects

If the MCS or a comparable specification became the de facto compliance norm, who would gain and who would lose? Three classes are advantaged: large controllers with established compliance budgets (existing ISO 27001, ISAE 3402 Type 2, and internal model-validation functions absorb the MCS as a marginal extension); standards-literate consultancies and integrators (Apparens is one such firm; the Section 1 disclosure applies with full force); and DPAs with technical staff. Three classes are disadvantaged: SMEs without compliance budgets (facing a choice between consultancy at €25k–€100k/yr, a Big-4 engagement at €500k–€2m/yr, or in-house adoption with significant error risk); data subjects without technical representation; and SMEs in regulated sectors who are simultaneously controllers and data subjects of upstream agent decisions. Net effect: a landscape in which large controllers and well-resourced supervisors interact through specialised consultancies, with SMEs and data subjects as second-class users. Three mitigations are proposed: mandatory plain-language SHACL ValidationReport summaries (prototype report_translator_v0_2.py); a reference open-source SSII maintained by a non-commercial body; and the SME proportionality profile (§5.6, mcs_sme_profile_v0_2.ttl). The author has specified the first and third; the second is an agenda item. A specification author who benefits commercially from the rise of the specification class is structurally limited in capacity to design mitigations against that benefit; follow-on work led by a different team is the natural locus.

Research agenda

Sixteen items ordered by precedence. Items are single-team-feasible unless otherwise flagged.

EU-AIAct extension and AI Act ontology alignment. Drive the W3C DPVCG EU-AIAct extension to mature status, or develop an equivalent CEN-CENELEC AI Act ontology covering Provider, Deployer, Manufacturer roles, Annex III categories, Art. 12 typed logs, Art. 72 monitoring, and the NonQualifyingADM residual. Highest-impact single action: eliminates ten of twenty-three MCS gaps simultaneously. NonQualifyingADM is not addressed by EU-AIAct alignment as currently scoped.
Standardised Supervisory Ingestion Interface (SSII). Transport-layer specification over MCS evidence graphs. Two operating modes: pull (SPARQL endpoints with supervisor-side query templates for the five standard questions) and push (Compliance MCP Server emitting structured evidence on schedule or trigger). Open items: threat model, minimum-viable reference endpoint, at least one supervisor pilot. Treated as research direction, not delivered contribution.
Temporal Delta Auditing system. Reconstruct how agent reasoning would differ under alternative policy versions, with versioned Ontology Ledgers and lex mitior compliance evaluation. The lex mitior application to administrative compliance has no case-law anchor as of April 2026; frame any pilot as a doctrinal proposal.
Empirical validation of Profile A quantitative pairings. Do the controller-side parameters (ε=10%, M=50%) predict supervisor- and court-endorsed Solely vs Assisted classifications?
Meaningful-intervention predicate. Do the five properties jointly predict the substantive-review test that courts apply (empirical anchor: Uber/Ola Hof Amsterdam April 2023)?
Causal blast radius algorithms for revocation propagation. Formal-methods-grade algorithms; natural site for LTL/CTL verification.
Authority-utilisation metrics for human intervention. Deviation rate, latency, time-on-task as predictors of substantive review.
Evidence decay algorithms for autonomous systems. Sector-specific decay functions with empirically validated refresh cadences.
Neurosymbolic auditability for LLM agents. Chain-of-Thought reasoning that natively outputs RDF triples aligned with enterprise ontologies.
Dynamic risk re-classification. Runtime monitoring detecting movement Low-Risk → High-Risk Annex III with automatic escalation. RC-flagged.
European Agent Identity Registry. Persistent agent identity built on SPIFFE/SPIRE.
Resolving the Article 22 rights-vs-procedural-obligation debate ontologically. Whether to encode DataSubjectInvocationEvent (rights) or continuous ComplianceAssessment (procedural), or both.
Bridge artefact maintenance and Member State variants. Italy and UK first.
Validation-pack execution and aggregation. Tracks 1, 2, 3.
Open test corpora for retrospective supervisory reconstruction. Synthetic but realistic agent-fleet event streams; natural maintainer is academic consortium or CEN-CENELEC-adjacent expert group.
Privacy-preserving provenance patterns. Minimum personal-data retention compatible with full authority and decision-trace reconstruction.

Conclusion

Agentic AI has outpaced the ontologies intended to govern it. Vendor ontologies model enterprise workflow fluently but ignore data-protection primitives. The open normative stack (PROV-O, OWL-Time, SHACL) supplies the spine but not the legal-semantic layer. GDPRov and DPV supply the GDPR layer but not AI Act roles, the Article 22 Decision taxonomy, or the Article 12 typed log events. ISO/IEC and CEN-CENELEC supply prose, not machine-readable ontology. No instrument combines these into a supervisor-ingestible whole.

The MCS proposes such a combination. It comprises 23 conceptual slots organised into a twelve-slot agent-behaviour core and an eleven-slot supervisory-evidence layer, anchored in operative statutory provisions or in empirical case-law validation, and contributing 16 net-new classes under strict reuse-zero accounting (69 owl:Class declarations in the Turtle source); a temporal and validation layer with fourteen core SHACL shapes plus seven profile-layer shapes in v1.2, together with a Bayesian evidence-decay structure with declared sectoral priors; an integration layer reusing GDPRov, DPV, and PROV-O via owl:imports; a Standardised Supervisory Ingestion Interface, presented as a research direction in §8 rather than a delivered contribution; and a companion validation pack with three pre-registered open tracks, of which Track 1 is committed.

The MCS succeeds if European supervisors, regulated controllers, and peer researchers find the synthesis defensible enough to build on and sharp enough to critique. It fails if that community finds the synthesis non-reproducible, operationally infeasible, or structurally inadequate. The validation pack’s three tracks test these failure modes in order. Both outcomes — success and failure — are scientifically useful and both are publishable as version 2. That commitment is the paper’s final claim.

Companion artefacts and appendices

The full appendix corpus (A–F), the Turtle vocabulary, SHACL profiles, TLA+ revocation model, 25-case sample, mapping matrix, deep-coding matrix, scenarios, validation pack, reproducibility scripts, and Track 1 blinded coder package are deposited on Zenodo as DOI 10.5281/zenodo.19758441. Per the authority hierarchy in the release README, machine-readable files are authoritative for the ontology, SHACL, CSV coding, and scripts; markdown files are human-readable specifications; PDF and DOCX renderings are publication outputs derived from the markdown source.

Appendix A

Operational mapping and formal sketches

Full 8 × 6 mapping matrix (primitives × legal categories) under Profile A coding (Table A.1). Sixteen formal sketches: F1 Decision × Art. 22, F2 Score × SCHUFA upstream, F3 Tool invocation × effect propagation, F4 Tool invocation × NIS2 significant incident, F5 Memory read × Art. 22 contributing factor, F6 Delegation × Art. 28 processor chain, F7 Policy version × lex mitior, F8 Evidence × AS 1105 reliability hierarchy, F9 Human intervention × authority utilisation, F10 Risk classification × Annex III, F11–F13 NIS2-parity sketches (revocation window, memory write Art. 23(4), policy version Art. 21), F14–F16 autonomous goal setting, cumulative goal execution, delegation × Art. 14. Source: mcs_mapping_matrix_v0_3.csv.

Appendix B

25-case empirical sample

Twenty-five EU ADM enforcement decisions (25 May 2018 to April 2026): SCHUFA (B1), Dun & Bradstreet Austria (B2), Wiesbaden VG (B3), the Italian Garante cluster (B4–B7, B23), Belgian APD (B8), the Uber/Ola Amsterdam sequence (B9–B12, B16), The Hague gun-licence (B13), AMS AMAS (B14), Slovak e-kasa (B15), Norwegian IB grading (B17), Clearview AI cluster (B18, B24), live FR in schools (B19), Toeslagen (B20), AP toezichtarrangement (B21), SyRI (B22), Hamburg DPA SCHUFA position (B25). Round-1 coding 23 April 2026 by the author. Source: mcs_case_sample_v0_3.csv.

Appendix C

Worked scenarios

Seven scenarios maintained in companion mcs_scenarios_v0_5_1.md: retail banking, hospital triage, municipal benefits, platform delivery, revocation propagation, cross-border one-stop-shop, NIS2 + GDPR concurrent incident.

Appendix D

Validation pack architecture

Three pre-registered tracks. Track 1 Independent Replication: one independent coder applies the rubric to the 25-case sample blind to the author’s round-1 classifications; Cohen’s κ reported; pre-registered hypothesis κ ≥ 0.70; failure at κ < 0.60 triggers rubric revision. Track 2 Technical Feasibility Benchmark: SHACL throughput at 1, 10, 100, 1000 decisions/sec on pyshacl, Apache Jena SHACL, TopBraid SHACL API; pre-registered p95 latency < 100ms at 100 dps. Track 3 Structural Fit Pilot: contributor with different agent topology validates structural sufficiency. Source: validation_pack_v1.md.
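The Track 1 statistic is straightforward to compute from two coders' label sequences. The Python below is a minimal sketch of Cohen's κ with the pre-registered thresholds applied; it is not the deposited compute_track1_kappa_v0_5_1.py script, the function names and example labels are hypothetical, and the wording for the band between 0.60 and 0.70 is an assumption because the text does not specify an outcome there.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders labelling the same cases."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of cases")
    n = len(coder_a)
    # Observed agreement.
    p_observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's label frequencies.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def track1_outcome(kappa: float) -> str:
    if kappa >= 0.70:
        return "hypothesis supported"
    if kappa < 0.60:
        return "rubric revision"
    # Assumption: the text does not name an outcome for this band.
    return "between thresholds"


author = ["Solely", "Solely", "Assisted", "Assisted"]
coder = ["Solely", "Solely", "Assisted", "Solely"]
kappa = cohens_kappa(author, coder)
print(kappa, track1_outcome(kappa))
```

With 25 cases and a small label set, κ is sensitive to a handful of disagreements, so reporting the confusion table alongside the statistic would make a failure easier to diagnose.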

Appendix E

Coding rubrics for reproducibility

Rubrics for the survey coverage matrix, the deep-coding reusability matrix, the mapping matrix status/confidence coding, and case classification (the five-step protocol promoted to §2.7). E.7 publishes report_translator_v0_2.py: a plain-language SHACL ValidationReport translator (~300 lines, Python 3.11, rdflib + pyshacl). Licensed CC BY 4.0 (deliberately more permissive than the paper’s licence) so data-subject representation organisations can build production variants. Full rubrics in coding_rubrics_v0_2.md.

Appendix F

Revocation propagation, LTL formalisation

LTL formulas for the five §6.1 revocation effects: F.3.1 transitive revocation; F.3.2 BlocksNewWork as default post-revocation state; F.3.3 CancelsQueuedWork (immediate); F.3.4 StopsInflightWork as opt-in; F.3.5 MarksArtifactsStale (eventual + persistent). LTL chosen over CTL/TCTL because the five effects are linear-trace properties; TCTL would be required for hard-deadline revocation in safety-critical subsystems. Transition system in mcs_revocation_model_v0_2.tla with config .cfg; TLC model-checking transcript pending (TLC_STATUS_v0_5.md). Until checked, the formulas are publication of intent rather than verified properties.

Data and code availability. The complete v0.5.1 source pack is on Zenodo as DOI 10.5281/zenodo.19758441. The deposit contains the OWL/RDF vocabulary mcs_vocabulary.ttl (517 triples, 69 owl:Class, 16 owl:ObjectProperty, 24 owl:DatatypeProperty), the SHACL profiles mcs_profiles_v1_2.ttl (21 NodeShapes) and mcs_sme_profile_v0_2.ttl (3 NodeShapes), the TLA+ revocation model with v0.5.1 property rewrites, the case sample, the mapping matrix, the deep-coding matrix, the worked scenarios, the validation pack, the reproducibility scripts (reproduce_package_counts_v0_5_1.py, compute_track1_kappa_v0_5_1.py, generate_synthetic_mcs_workload_v0_5.py, run_track2_minimal_benchmark_v0_5_1.py), the SHACL pass and fail examples with expected reports, the Track 1 blinded coder package, the legal mapping review protocol and template, the ERRATA reconciliation against v0.5 and v0.4, and the artefact manifest with verified counts. The Concept DOI on Zenodo resolves to the latest version. Cite the version DOI for reproducibility.


References

Argyris, C. and Schön, D. A. (1978) Organizational Learning: A Theory of Action Perspective. Reading, MA: Addison-Wesley.
Atkinson, K., Bench-Capon, T. and Bollegala, D. (2020) Explanation in AI and law: Past, present and future. Artificial Intelligence, 289.
Autoriteit Persoonsgegevens (2024) Advies artikel 22 AVG en geautomatiseerde selectie-instrumenten, 10 October 2024.
Autoriteit Persoonsgegevens (2025) Handvatten betekenisvolle menselijke tussenkomst, July 2025.
Autoriteit Persoonsgegevens and Rijksinspectie Digitale Infrastructuur (2024) Final advice on the supervisory structure for the AI Act, 7 November 2024.
AFM (2026) AFM Agenda 2026, 19 January 2026.
Bench-Capon, T. and Sartor, G. (2003) A model of legal reasoning with cases incorporating theories and values. Artificial Intelligence, 150(1-2), pp. 97–143.
BaFin (2025) Guidance on ICT risks in the use of AI at financial entities, 18 December 2025.
CJEU (2023) Case C-634/21 SCHUFA Holding (Scoring), judgment of 7 December 2023.
CJEU (2023) Case C-26/22 SCHUFA Holding (Discharge), judgment of 7 December 2023.
CJEU (2025) Case C-203/22 CK v Dun and Bradstreet Austria, judgment of 27 February 2025.
Cobbe, J., Singh, J. and Sheridan, T. (2025) Governance and accountability frameworks for AI agents. Working paper.
DNB (2019) General principles for the use of Artificial Intelligence in the financial sector (SAFEST), 25 July 2019.
European Parliament and Council (2016) Regulation (EU) 2016/679 (General Data Protection Regulation). OJ L 119, 4 May 2016.
European Parliament and Council (2022) Directive (EU) 2022/2555 (NIS2 Directive). OJ L 333, 27 December 2022.
European Parliament and Council (2024) Regulation (EU) 2024/1689 (Artificial Intelligence Act). OJ L series, 12 July 2024.
European Parliament and Council (2024) Directive (EU) 2024/2853 (Revised Product Liability Directive). OJ L series, 18 November 2024.
Hashmi, M., Governatori, G., Lam, H.-P. and Wynn, M. T. (2018) Are we done with business process compliance: state of the art and challenges ahead. Knowledge and Information Systems, 57, pp. 79–133.
Hildebrandt, M. (2015) Smart Technologies and the End(s) of Law. Cheltenham: Edward Elgar.
Janssen, J. (2025) The AI Accountability Trap. Deventer: Apparens.
Janssen, J. (2026) From Battlefield to Boardroom: Strategic Red Teaming as an Epistemic Governance Instrument in the Age of AI. Apparens Working Paper. Available at: https://apparens.nl/essay-red-teaming.
Janssen, J. (2026) The Implementation Gap: The AI Enterprise Control Index as an Operational Governance Instrument for Agentic AI Systems. Apparens Working Paper. Available at: https://apparens.nl/essay-implementation-gap.
Klein, G. A. (1993) A recognition-primed decision (RPD) model of rapid decision making. In Klein, G. A. et al. (eds.) Decision Making in Action. Norwood, NJ: Ablex.
Knublauch, H. and Kontokostas, D. (2017) Shapes Constraint Language (SHACL). W3C Recommendation.
Lebo, T., Sahoo, S. and McGuinness, D. (2013) PROV-O: The PROV Ontology. W3C Recommendation.
Mittelstadt, B. (2019) Principles alone cannot guarantee ethical AI. Nature Machine Intelligence, 1(11), pp. 501–507.
Pagallo, U. (2013) The Laws of Robots: Crimes, Contracts, and Torts. Dordrecht: Springer.
Pandit, H. J. and Lewis, D. (2017) Modelling provenance for GDPR compliance using linked open data vocabularies. SEMANTiCS 2017.
Pandit, H. J., et al. (2019) Creating a vocabulary for data privacy (DPV). OTM Confederated International Conferences.
Sartor, G. (2005) Legal Reasoning: A Cognitive Approach to the Law. Dordrecht: Springer.
Sergot, M. J., Sadri, F., Kowalski, R. A., Kriwaczek, F., Hammond, P. and Cory, H. T. (1986) The British Nationality Act as a logic program. Communications of the ACM, 29(5), pp. 370–386.
Veale, M. and Edwards, L. (2018) Clarity, surprises, and further questions in the Article 29 Working Party draft guidance on automated decision-making and profiling. Computer Law and Security Review, 34(2).
Verheij, B. (2003) Dialectical argumentation with argumentation schemes: An approach to legal logic. Artificial Intelligence and Law, 11(2-3).
Verheij, B. (2017) Formalizing arguments, rules and cases. ICAIL 2017.
Wachter, S., Mittelstadt, B. and Floridi, L. (2017) Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation. International Data Privacy Law, 7(2), pp. 76–99.
Article 29 Data Protection Working Party (2018) Guidelines on automated individual decision-making and profiling (WP251rev.01).