AI Agents Under EU Law: what providers actually have to do
High-risk agentic systems with untraceable behavioural drift cannot currently satisfy the essential requirements of the AI Act
A new working paper sets out the first systematic compliance map for AI agents on the EU market. I’m one of nine co-authors, alongside Luca Nannini (corresponding author), Michele Joshua Maggini, Enrico Panai, Sandra Feliciano, Aleksandr Tiulkanov, Elena Maran, James Gealy and Piercosma Bisconti. The paper is on arXiv: 2604.04604. It has been very well received on LinkedIn and elsewhere, so here is a TLDR version.
If you build, sell or deploy AI agents in the EU, the headline is simple: the AI Act is one of at least nine instruments you need to plan against, and high-risk agents whose behaviour drifts at runtime cannot meet the essential requirements today.
What the paper does
Three contributions:
A practical taxonomy of nine agent deployment categories, mapping concrete actions (sending an email, processing a refund, screening a CV) to the legislation that those actions trigger.
A list of agent-specific compliance challenges that the draft harmonised standards address only partially: cybersecurity, human oversight, transparency across multi-party action chains and runtime behavioural drift.
A twelve-step compliance architecture that integrates the AI Act with eight parallel EU instruments through the QMS standard (prEN 18286) as coordinator.
The paper draws on the January 2026 working drafts of the M/613 harmonised standards, the GPAI Code of Practice, the M/606 CRA standards programme and the Digital Omnibus proposals of November 2025.
The core argument
The AI Act has no legal definition of “agent”. This is deliberate: the Act regulates AI systems, not architectural patterns. An AI agent satisfies every element of the Article 3(1) AI system definition. The functional properties that distinguish agents — autonomous tool invocation, multi-step planning, environmental interaction, adaptive execution — increase regulatory risks that the framework already addresses in principle.
Your regulatory profile is determined by what your agent does in deployment, not by what is inside it. The same LLM-with-tool-calling pattern produces wildly different obligations depending on domain and external effects. An agent screening CVs hits Annex III high-risk classification and the full weight of Chapter III. An agent summarising meeting notes triggers only Article 50 transparency.
The provider’s foundational task is therefore to conduct an exhaustive inventory of the agent’s external actions, data flows, connected systems and affected persons. Classification follows from that inventory.
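As a rough illustration of what "inventory first" might look like in practice, here is a minimal Python sketch. The ExternalAction record, its field names and the candidate_instruments helper are hypothetical and do not come from the paper. The triggers are only the ones summarised in this post, and the output is a list of instruments to review, not a legal classification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExternalAction:
    """One row of a hypothetical agent action inventory."""
    name: str
    connected_system: str = ""
    affected_persons: Tuple[str, ...] = ()
    processes_personal_data: bool = False
    reads_communications: bool = False      # emails, messages, browser data
    annex_iii_use: bool = False             # e.g. screening CVs
    financial_sector: bool = False
    serves_essential_entity: bool = False


def candidate_instruments(action: ExternalAction) -> List[str]:
    """Return instruments to review for this action (assumed triggers only)."""
    found: List[str] = []
    if action.annex_iii_use:
        found.append("AI Act Chapter III (high-risk)")
    if action.processes_personal_data:
        found.append("GDPR")
    if action.reads_communications:
        found.append("ePrivacy Directive Article 5")
    if action.financial_sector:
        found.append("DORA")
    if action.serves_essential_entity:
        found.append("NIS2")
    return found


screen_cv = ExternalAction(
    name="screen_cv",
    connected_system="applicant tracking system",
    affected_persons=("job applicants",),
    processes_personal_data=True,
    annex_iii_use=True,
)
print(candidate_instruments(screen_cv))
```

The point of the structure is that classification is computed from recorded external effects, so adding a new tool to the agent forces a new inventory row before anyone decides which Articles apply.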
Four agent-specific challenges
Cybersecurity: enforce privilege outside the model. A system prompt saying “do not delete files” is not a security control. Article 15(4) compliance requires architectural enforcement: the agent’s API simply does not expose capabilities it should not have. An email-summarisation agent gets a read endpoint, not send or delete.
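To make the email-summarisation example concrete, here is a minimal Python sketch of privilege enforced outside the model. ToolGateway, CapabilityNotExposed and read_message are hypothetical names chosen for illustration, and the mailbox call is a stand-in. The idea is only that a capability which was never registered cannot be reached, whatever the model outputs.

```python
from typing import Callable, Dict


class CapabilityNotExposed(Exception):
    """Raised when the agent asks for a tool that was never registered."""


class ToolGateway:
    """Hypothetical single path from model output to side effects."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: object) -> object:
        # Enforcement lives here, not in the system prompt.
        if name not in self._tools:
            raise CapabilityNotExposed(name)
        return self._tools[name](**kwargs)


def read_message(message_id: str) -> str:
    # Stand-in for a read-only mailbox endpoint.
    return f"body of {message_id}"


# The summarisation agent is given read access only.
gateway = ToolGateway()
gateway.register("read_message", read_message)

print(gateway.call("read_message", message_id="m1"))
try:
    gateway.call("delete_message", message_id="m1")
except CapabilityNotExposed as exc:
    print(f"blocked: {exc}")
```

In a real deployment the same restriction would also need to hold at the credential level (for example, a read-only scope on the mailbox token), so that the gateway is not the only barrier.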
Human oversight: design for evasion risk. LLMs trained on corpora containing oversight-circumvention patterns, or fine-tuned with RL where evasion was rewarded, may take actions that bypass oversight even when the system prompt tells them not to. Article 14 oversight must be designed as an external constraint rather than an internal instruction. For irreversible actions (sending external emails, executing transactions, modifying production databases), retrospective oversight is structurally insufficient where the risk management process produces unacceptable residual risk.
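One way to picture oversight as an external constraint is a gate that sits between the agent's proposed action and its execution. The sketch below is an assumption-laden illustration: ProposedAction, ApprovalRequired, execute and the IRREVERSIBLE set are hypothetical names, and the approver callback stands in for whatever human confirmation channel a provider actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed list of irreversible action names, mirroring the examples above.
IRREVERSIBLE = frozenset(
    {"send_external_email", "execute_transaction", "modify_production_db"}
)


@dataclass
class ProposedAction:
    name: str
    payload: dict = field(default_factory=dict)


class ApprovalRequired(Exception):
    """Raised when an irreversible action has no human approval."""


def execute(
    action: ProposedAction,
    run: Callable[[ProposedAction], object],
    approver: Optional[Callable[[ProposedAction], bool]] = None,
) -> object:
    """Run the action, but only after approval if it is irreversible."""
    if action.name in IRREVERSIBLE:
        # The check happens before execution and outside the model.
        if approver is None or not approver(action):
            raise ApprovalRequired(action.name)
    return run(action)


def demo_run(action: ProposedAction) -> str:
    return f"ran {action.name}"


print(execute(ProposedAction("summarise_notes"), demo_run))
try:
    execute(ProposedAction("execute_transaction", {"amount": 10}), demo_run)
except ApprovalRequired as exc:
    print(f"held for review: {exc}")
```

Because the gate raises before run is called, the model cannot talk its way past it. The decision is made by code and a human, not by the prompt.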
Transparency across action chains. The obligations under Article 50 extend beyond the user to every party affected by the agent’s actions. When an agent emails a third party, posts to a platform, or modifies another person’s account, those people are affected individuals who may not know they are interacting with AI. The 2025 AI Agent Index found fewer than 20% of agent developers disclose formal safety policies, and fewer than 10% report external safety evaluations.
Runtime behavioural drift and substantial modification (Article 3(23)). Three mechanisms with different regulatory consequences:
Anticipated adaptive behaviour (tool selection from a documented catalogue, in-context learning, RAG): not a substantial modification if foreseen, tested, documented and risk-assessed.
Continuous learning post-deployment (weight updates, online fine-tuning): a candidate for substantial modification if not anticipated in the conformity assessment.
Emergent drift (novel tool-use patterns, persistent cross-session memory shifts, oversight-evasion strategies): the hardest case. If you cannot detect and characterise it, you cannot demonstrate that the system stays inside the assessed envelope.
The paper’s position: high-risk agentic systems with untraceable behavioural drift cannot currently satisfy the essential requirements. This is the legal position now, not a future risk. Runtime state must therefore be treated as versioned architecture: scoped tool catalogues, replayable memory, monitored behavioural metrics, and automated drift detection against the conformity assessment baseline.
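As a sketch of what automated drift detection against a baseline could look like, the Python below compares the observed mix of tool calls with the mix recorded at assessment time, and flags any tool outside the documented catalogue. The metric (total variation distance), the 0.2 threshold and all function names are assumptions made for illustration. The paper does not prescribe a specific metric.

```python
from collections import Counter
from typing import Dict, Iterable, List


def distribution(calls: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each tool name in a call log."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()} if total else {}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two tool-use distributions."""
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)


def check_drift(
    baseline_calls: List[str],
    observed_calls: List[str],
    catalogue: Iterable[str],
    threshold: float = 0.2,  # assumed tolerance, to be set by risk assessment
) -> dict:
    """Compare observed tool use with the assessed baseline."""
    unknown = sorted(set(observed_calls) - set(catalogue))
    distance = total_variation(
        distribution(baseline_calls), distribution(observed_calls)
    )
    return {
        "unknown_tools": unknown,
        "distance": distance,
        "drifted": bool(unknown) or distance > threshold,
    }


baseline = ["search"] * 8 + ["read"] * 2
observed = ["search"] * 2 + ["read"] * 8
print(check_drift(baseline, observed, catalogue={"search", "read"}))
```

A frequency comparison like this only catches shifts in which tools are used. Detecting novel sequences of calls or memory-driven changes would need richer behavioural metrics and replayable logs.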
The regulatory perimeter beyond the AI Act
The paper maps eight more instruments:
GDPR: applies whenever personal data is processed in training or operation. Near-universal for agents.
ePrivacy Directive: Article 5 (confidentiality of communications) applies independently of the GDPR whenever an agent reads emails, messages, or browser data.
Cyber Resilience Act: applies to “products with digital elements”, including standalone software with network connectivity. Vulnerability reporting from September 2026, full conformity by December 2027.
Data Act: has applied to connected products and related services since September 2025.
Digital Services Act: applies to agents operating as, or within, intermediary, hosting, or platform services.
NIS2: applies when agents serve essential or important entities.
DORA: the primary digital resilience framework for financial-sector agents and their ICT third-party providers, applicable since January 2025.
Revised Product Liability Directive: strict liability for defective AI systems from late 2026. Non-compliance with Article 15 (accuracy) is strong evidence of a defect.
Sector-specific: MDR/IVDR, MiFID II, PSD2, EASA.
Two supervisory authorities have already moved on agentic GDPR. The Spanish DPA published 71 pages of guidance on 18 February 2026, adopting the “rule of 2” heuristic: an agent should not simultaneously process untrusted input, access sensitive data, and take autonomous action affecting individuals without human oversight. The Dutch DPA followed soon after.
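The heuristic as described above is simple enough to express as a design-time check. The Python sketch below is a hypothetical encoding: AgentProfile and violates_rule_of_two are illustrative names, and the three flags paraphrase the wording in this post rather than the text of the guidance itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical summary of one agent deployment."""
    processes_untrusted_input: bool
    accesses_sensitive_data: bool
    acts_autonomously_on_individuals: bool
    human_oversight: bool = False


def violates_rule_of_two(profile: AgentProfile) -> bool:
    """True if all three properties combine without human oversight."""
    all_three = (
        profile.processes_untrusted_input
        and profile.accesses_sensitive_data
        and profile.acts_autonomously_on_individuals
    )
    return all_three and not profile.human_oversight


inbox_agent = AgentProfile(
    processes_untrusted_input=True,
    accesses_sensitive_data=True,
    acts_autonomously_on_individuals=True,
)
print(violates_rule_of_two(inbox_agent))
```

A check like this could run in a deployment pipeline, so that a configuration combining all three properties is rejected unless an oversight step is declared.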
The Cyber Resilience Act (CRA) standards-free zone
Two standardisation tracks run in parallel: M/613 (AI-specific) and M/606 (CRA product-level). ETSI CYBER-EUSR is producing seventeen vertical CRA standards, and none of them covers AI products. Providers must therefore self-assess AI agents against horizontal CRA standards written for conventional software, while separately demonstrating compliance with AI-specific threat standards. From mid-2026 to late 2027, the requirements are enforceable but the harmonised standards are not finalised.
Practical takeaways for providers
Inventory first, classify second. Your obligations come from external effects, not internal architecture. Catalogue every action, every data flow, every connected system, every affected person, before you decide which Articles and Annexes apply.
Treat the M/613 standards as a dependency graph, not a checklist. The QMS standard coordinates the suite. Conformity with one in isolation is structurally insufficient.
Engineer least privilege at the API. Capabilities you do not want exercised should not be reachable, regardless of what the prompt says.
Design oversight as an external constraint. Put confirmation prompts and execution controls in front of irreversible actions, with dependency-aware selective continuation so that independent branches keep running while a blocked step awaits approval.
Version your runtime state. If you cannot replay an action chain or detect drift against a baseline, you cannot demonstrate that the system is still inside its conformity assessment.
Map adjacent legislation deliberately. GDPR, ePrivacy, CRA, DSA, NIS2, DORA, the Data Act, the PLD, and sectoral rules are each triggered by specific actions. Step 9 of the twelve-step sequence is where this lives.
If you sell a general-purpose agent platform, decide now: contractually and technically restrict high-risk uses, or design for the highest tier of foreseeable misuse under Article 3(13).
The full paper is at arxiv.org/abs/2604.04604.

