EU GMP Annex 11 has governed pharmaceutical computerised systems for over two decades. EU GMP Annex 22, published in draft by the EMA in 2023 and moving toward implementation in 2026, extends the framework specifically to artificial intelligence and machine learning systems used in GxP decision-making. The two annexes are not alternatives — they apply simultaneously. Organisations that treat Annex 22 as a standalone AI policy exercise, separate from their existing CSV programme, will fail inspection. This article explains exactly what each framework requires, where they overlap, and what quality and compliance leaders must do right now.
Why This Matters in 2026
The pharmaceutical and life science sector is undergoing the fastest technology transition in its regulatory history. AI systems are now embedded in batch release decisions, visual inspection lines, pharmacovigilance signal detection, clinical data review, and quality control laboratory analysis. The technology has outpaced the regulatory framework — until now.
EU GMP Annex 22 changes that. For the first time, the EMA has produced a GxP-specific regulatory framework that directly addresses artificial intelligence and machine learning: how to validate them, how to govern them through their operational lifecycle, and critically, what happens when they change — because machine learning models do change, even without human intervention.
For Quality Heads, Validation Managers and CTOs in life science organisations, 2026 is the year this moves from "watch brief" to "implementation mandate." Health authorities are already asking about AI governance in pre-approval inspections. Several FDA 483 observations in 2024–2025 specifically cited inadequate AI system oversight in GxP contexts. The regulatory community has signalled its direction clearly.
The organisations that understand both Annex 11 and Annex 22 — and build validation programmes that satisfy both simultaneously — will be inspection-ready. Those that treat them as separate projects will have gaps that regulators will find.
What Annex 11 Actually Requires — and Where It Falls Short for AI
EU GMP Annex 11 has been the cornerstone of pharmaceutical computerised system validation since its original publication; its most recent significant revision dates from 2011. It applies to all computerised systems used in GxP-regulated activities — manufacturing execution systems (MES), laboratory information management systems (LIMS), quality management systems (QMS), clinical data management platforms, and any software that generates, modifies, maintains, archives, retrieves or transmits electronic data used in GxP decisions.
What Annex 11 covers well
Annex 11 establishes clear requirements across the full computerised system lifecycle: risk-based validation, supplier assessment, user requirements, electronic records integrity, audit trails, access controls, disaster recovery, and the management of legacy systems. For deterministic software — systems that behave identically given the same inputs — these requirements are well-understood and implementable through established GAMP 5 methodology.
Where Annex 11 was never designed to go
The fundamental assumption embedded in Annex 11 is that software behaviour is predictable, testable and stable. A LIMS that generates the same results given the same inputs, every time, can be validated through installation, operational and performance qualification (IQ/OQ/PQ) and then managed through change control. If the software changes, you run change control, revalidate the affected functions, and document the outcome.
Machine learning systems do not work this way. A model trained on historical batch data to predict yield may, over time, shift its internal decision-making as it processes new production data — without any human-initiated change, without any software update, without triggering your change control procedure. The model has not malfunctioned. It has learned. But the basis on which it was validated may no longer accurately reflect how it is making decisions.
Annex 11 was designed for software that behaves identically given the same inputs. Machine learning models can change their behaviour without anyone changing a single line of code. That is not a defect — it is the point of the technology. But it is a validation problem that Annex 11 was never built to solve.
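The point can be made concrete with a deliberately simplified sketch. The toy "model" below is an illustrative assumption, not any real inspection or yield system: its decision threshold updates itself as it processes production data, so the same input can be classified differently over time even though not a single line of code has changed.

```python
class AdaptiveYieldClassifier:
    """Toy 'model': flags a batch as low-yield against a threshold that
    tracks the running mean of all yields it has seen (online learning)."""

    def __init__(self, initial_threshold: float):
        self.threshold = initial_threshold
        self.n = 0

    def classify(self, batch_yield: float) -> str:
        result = "low" if batch_yield < self.threshold else "normal"
        # Incremental update: the decision boundary follows the data.
        # No human-initiated change, no software update, nothing to
        # trigger a conventional change control procedure.
        self.n += 1
        self.threshold += (batch_yield - self.threshold) / self.n
        return result

clf = AdaptiveYieldClassifier(initial_threshold=90.0)
print(clf.classify(88.0))          # "low" against the validated boundary

for y in [80.0, 78.0, 82.0, 79.0]:  # process yields drift downward
    clf.classify(y)

print(clf.classify(88.0))          # now "normal": same input, different output
```

A traditional IQ/OQ/PQ package executed on day one would pass; the behaviour it documented no longer exists a few hundred batches later. That is the gap Annex 22 is designed to close.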
This is not a theoretical concern. In 2024, an FDA investigator asked a pharmaceutical manufacturer to demonstrate how they would detect if their AI-based visual inspection system had shifted its classification boundary over time. The site could not answer the question. The observation was cited. The system had an Annex 11-compliant validation package. It did not have an Annex 22-compliant governance programme.
What Annex 22 Introduces — The Seven Core Requirements
EU GMP Annex 22 (draft, 2023 — implementation expected 2026) does not replace Annex 11. It extends it. Every AI system in a GxP environment must satisfy both frameworks. Annex 22 introduces seven requirements that have no direct equivalent in Annex 11:
- Intended use boundaries. Before deployment, the organisation must define the operational envelope within which the AI system's outputs can be trusted. This includes the data types, ranges and contexts for which the model was trained and validated. Outputs generated outside these boundaries are out-of-specification and must be treated accordingly.
- Training data governance. The quality, provenance, representativeness and integrity of training data must be documented and controlled. This includes data versioning, bias assessment, and records of data preprocessing decisions. This is ALCOA+ applied to machine learning training datasets.
- Model validation protocol. Beyond traditional IQ/OQ/PQ, Annex 22 requires a model-specific validation protocol that includes performance metrics, acceptance criteria, and test datasets that are genuinely independent of training data. Overfitting must be assessed and controlled.
- Explainability and human oversight. For GxP-critical decisions, the AI system must support human review. The nature and extent of human oversight must be documented and justified based on risk. Black-box decisions affecting patient safety require higher levels of explainability and human intervention capability.
- Continuous performance monitoring. The model must be monitored throughout its operational life against the performance criteria established at validation. This is not periodic review — it is continuous monitoring with defined alert thresholds and escalation procedures.
- Model drift detection and management. Organisations must define what constitutes unacceptable drift from validated performance, and establish procedures for detecting it, responding to it, and revalidating when thresholds are exceeded.
- Change control for model updates. Any intentional update to the model — retraining, fine-tuning, architecture change — must go through a documented change control process with revalidation requirements proportionate to the change risk.
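The first of these requirements, intended use boundaries, lends itself to a simple illustration. The following sketch shows one possible shape for an operating-envelope check; the feature names and ranges are invented for illustration and are not drawn from Annex 22, which prescribes the requirement but not an implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingEnvelope:
    """Validated input ranges, fixed at validation and version-controlled."""
    ranges: dict  # feature name -> (min, max) seen in the validation dataset

    def check(self, sample: dict) -> list:
        """Return the features that fall outside the validated envelope."""
        violations = []
        for feature, (lo, hi) in self.ranges.items():
            value = sample.get(feature)
            if value is None or not (lo <= value <= hi):
                violations.append(feature)
        return violations

# Illustrative envelope for a hypothetical fill-line inspection model.
envelope = OperatingEnvelope(ranges={
    "fill_volume_ml": (9.5, 10.5),
    "line_speed_vials_min": (200, 400),
})

in_spec = envelope.check({"fill_volume_ml": 10.1, "line_speed_vials_min": 350})
out_of_spec = envelope.check({"fill_volume_ml": 10.1, "line_speed_vials_min": 450})
print(in_spec)       # [] -> the model's output may be relied upon
print(out_of_spec)   # ["line_speed_vials_min"] -> treat the output as OOS
```

The design point is that the check runs before the model's output is consumed, so an out-of-envelope result can be routed to the out-of-specification procedure rather than silently acted on.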
Annex 11 vs Annex 22 — Side by Side
| Requirement | Annex 11 | Annex 22 |
|---|---|---|
| System scope | All GxP computerised systems | AI/ML systems in GxP contexts specifically |
| Validation approach | IQ/OQ/PQ, risk-based | IQ/OQ/PQ + model validation protocol, performance metrics, independent test sets |
| Change control | Software change → revalidation of affected functions | Model update (including retraining) → documented change control + proportionate revalidation |
| Ongoing monitoring | Periodic review of validated status | Continuous performance monitoring with drift detection and alert thresholds |
| Data governance | Electronic records, audit trails (ALCOA+) | ALCOA+ for operational data AND training data provenance, versioning, bias assessment |
| Human oversight | Access controls, user authorisation | Explainability, human review capability, oversight requirements proportionate to risk |
| Intended use | URS defines intended use | Explicit intended use boundaries defining valid operating envelope |
| Supplier assessment | Required for all suppliers | Required + AI-specific supplier audit criteria for algorithm developers |
The Risks of Getting This Wrong
Health authority inspectors in the EU, UK and US are now specifically trained to assess AI governance in GxP environments. An AI system with an Annex 11-compliant validation package but no Annex 22 governance programme is a major finding waiting to happen — regardless of how technically sophisticated the system is.
Beyond the regulatory exposure, there are three categories of operational risk that Quality Heads and CTOs need to understand:
Patient safety risk from undetected model drift
A visual inspection AI trained to detect particulates in injectable products may, over time, recalibrate its sensitivity threshold as it processes new images — potentially missing defects it would previously have caught. Without continuous performance monitoring and drift detection, this change is invisible until a product failure or health authority investigation reveals it. The regulatory consequence of a patient safety event attributed to an unmonitored AI system is existential for a pharmaceutical manufacturer.
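One way to make such a shift visible is a rate-based control check on the system's output. The sketch below is an assumption about one workable approach (a binomial control limit on the reject rate), not a method prescribed by Annex 22; the baseline, window size and z-limit would all need justification in the monitoring programme.

```python
import math

def reject_rate_drift(baseline_rate: float, recent_rejects: int,
                      recent_total: int, z_limit: float = 3.0) -> bool:
    """True if the recent reject rate sits outside +/- z_limit standard
    errors of the validated baseline rate (binomial approximation)."""
    observed = recent_rejects / recent_total
    std_err = math.sqrt(baseline_rate * (1 - baseline_rate) / recent_total)
    return abs(observed - baseline_rate) > z_limit * std_err

# Validated baseline: 2.0% of units flagged for particulates.
# Recent window: 40 rejects in 5,000 units (0.8%) -- sensitivity may have drifted.
print(reject_rate_drift(0.02, recent_rejects=40, recent_total=5000))   # True
# Recent window: 95 rejects in 5,000 units (1.9%) -- within control limits.
print(reject_rate_drift(0.02, recent_rejects=95, recent_total=5000))   # False
```

A breach of the limit does not prove the model has drifted — the process itself may have improved — but it converts an invisible change into a documented alert that triggers the investigation and escalation procedure.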
Commercial risk from invalidated systems
If a health authority determines during an inspection that your AI system does not meet Annex 22 requirements and demands remediation, the system may need to be taken offline or its outputs quarantined pending revalidation. For a batch release AI or a yield prediction system embedded in production, this is a significant operational disruption. One mid-size European pharmaceutical manufacturer estimated a six-week production impact when an AI system was suspended following an EMA inspection observation in 2024.
Data integrity risk from training data governance failures
FDA 483 observations in 2024–2025 include several cases where the training data used to develop GxP AI systems could not be reconstructed or audited. If you cannot demonstrate the provenance, quality and representativeness of your training data, you cannot demonstrate that your model was built on reliable foundations. This is a data integrity failure — one of the most serious categories of GxP non-compliance.
The AjaCertX Implementation Framework — Seven Steps
Based on our experience supporting pharmaceutical, biotech and medical device organisations through GxP AI validation programmes, the following framework addresses both Annex 11 and Annex 22 requirements in an integrated approach. This is not a sequential checklist — several workstreams run in parallel.
- AI system inventory and risk classification. Identify every AI or ML system operating in your GxP environment. This includes vendor-supplied AI embedded in your MES, LIMS or QMS platforms — not just bespoke systems you have developed. Classify each system by GxP impact using a risk matrix aligned to GAMP 5 AI Supplement guidance. High-risk systems (patient safety, batch release, product quality) require full Annex 22 compliance. Lower-risk systems require proportionate controls.
- Intended use boundary definition. For each system, document the intended use boundaries: the data types, ranges, contexts and decision types for which the system was trained and validated. Establish what constitutes out-of-boundary operation and how it is detected and managed.
- Training data audit and governance programme. Audit the training datasets used to develop each system. Document provenance, preprocessing decisions, data quality assessment and bias analysis. Establish a training data governance programme for any future model development or retraining. This is the most commonly missing element in AI validation packages.
- Model validation protocol development. Develop a model validation protocol that extends your existing IQ/OQ/PQ framework with model-specific elements: performance metrics, acceptance criteria, independent test dataset selection, and overfitting assessment methodology. The validation protocol must be approved before testing begins.
- Continuous monitoring programme design. Design and implement the continuous monitoring programme: what metrics are monitored, at what frequency, against what thresholds, with what alert and escalation procedures. This programme must be operational before the system is deployed in GxP use.
- Explainability and human oversight documentation. Document the human oversight model: what decisions require human review, what information is provided to the reviewer, how the reviewer can override or escalate, and how override decisions are recorded and trended.
- Change control procedure update. Update your existing change control procedure to include AI system-specific categories: model retraining, fine-tuning, architecture change, training data update, and performance threshold adjustment. Define revalidation requirements for each category.
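Step 3 above, the training data governance programme, ultimately reduces to being able to prove later exactly which data a model was trained on. A minimal sketch of one possible mechanism is shown below: a fingerprinted, append-only provenance record. The field names and the hashing choice are illustrative assumptions, not Annex 22 terminology.

```python
import hashlib
import json
from datetime import datetime, timezone

def dataset_fingerprint(rows: list) -> str:
    """Deterministic SHA-256 over a canonical serialisation of the dataset,
    so any later change to the data produces a different fingerprint."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def provenance_record(rows: list, source: str, preprocessing_steps: list) -> dict:
    """One entry in an append-only training data provenance log."""
    return {
        "fingerprint": dataset_fingerprint(rows),
        "row_count": len(rows),
        "source": source,
        "preprocessing": preprocessing_steps,
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical training extract for a yield-prediction model.
training_rows = [
    {"batch": "B-1021", "yield": 94.2},
    {"batch": "B-1022", "yield": 95.1},
]
record = provenance_record(
    training_rows,
    source="MES export, 2024-Q3 (illustrative)",
    preprocessing_steps=["deduplicate", "unit harmonisation"],
)
print(record["row_count"], record["fingerprint"][:12])
```

At retraining time, the fingerprint recorded here is what allows an auditor to confirm that the dataset in the validation package is byte-for-byte the dataset the model actually saw.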
Realistic Timelines and Costs
Organisations approaching Annex 22 compliance for the first time consistently underestimate the timelines involved. The following estimates are based on actual implementation experience across pharmaceutical, biotech and medical device organisations:
| Organisation type | AI systems in scope | Estimated timeline | Key complexity drivers |
|---|---|---|---|
| Small pharma / biotech | 1–3 systems | 4–8 months | Training data reconstruction, monitoring programme design |
| Mid-size manufacturer | 4–10 systems | 8–18 months | Vendor AI in commercial platforms, change control integration |
| Large multinational | 10+ systems, multi-site | 18–36 months | Site harmonisation, legacy AI systems, regulatory agency alignment |
The single largest cost driver in most programmes is training data reconstruction — the process of auditing and documenting the provenance and quality of data used to train systems that were deployed before Annex 22 governance requirements were understood. For organisations that deployed AI systems in 2020–2023 without formal training data governance, this can be a significant remediation exercise.
Based on our engagements, a comprehensive Annex 11 and Annex 22 compliance programme for a mid-size pharmaceutical manufacturer with five to eight AI systems in scope typically requires between 2,000 and 4,500 specialist days across validation, quality assurance, IT and regulatory functions — depending on the maturity of the existing CSV programme and the degree of training data remediation required.
The Five Most Common Mistakes Organisations Make
- Treating Annex 22 as an IT project, not a quality project. AI governance in GxP environments is a quality assurance responsibility, not an IT project. Organisations that route Annex 22 compliance through their IT function without QA leadership find themselves with technically sophisticated monitoring programmes that do not satisfy regulatory requirements for documented quality oversight.
- Ignoring vendor-supplied AI. Most pharmaceutical organisations have more AI in their GxP environment than they realise — embedded in commercial MES, LIMS, QMS and analytical instrument software. Vendor-supplied AI must be assessed under Annex 22 just as bespoke AI must. Vendor assessment criteria need to be updated to include AI-specific qualification questions.
- Conflating model testing with model validation. Running a model against a test dataset and recording the results is not validation. Validation requires a pre-approved protocol, independent test data, pre-defined acceptance criteria, and documented evidence of protocol execution. Many organisations have performed model testing but not model validation in the regulatory sense.
- Building monitoring without alert procedures. Continuous monitoring is only useful if there are documented procedures for what happens when a metric exceeds its threshold. Organisations frequently build monitoring dashboards without corresponding alert, investigation and escalation SOPs. This satisfies the monitoring requirement but not the governance requirement.
- Underestimating the training data problem. The EMA's draft Annex 22 is explicit that training data quality is a GxP data integrity matter. For AI systems developed before this was understood, the training data may be scattered across development environments, inadequately documented, or in some cases impossible to reconstruct. This is a material compliance gap that organisations need to assess honestly and remediate early.
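The fourth mistake, monitoring without alert procedures, has a simple structural remedy: bind every monitored metric to its threshold and its documented escalation action in one place, so a dashboard can never exist without a response. The metrics, thresholds and actions below are illustrative assumptions.

```python
ALERT_RULES = [
    # (metric name, threshold check, documented escalation action)
    ("false_negative_rate", lambda v: v > 0.01,
     "Quarantine affected batches; open deviation; notify QA head"),
    ("out_of_envelope_inputs", lambda v: v > 0.05,
     "Suspend automated disposition; route outputs to human review"),
]

def evaluate_alerts(metrics: dict) -> list:
    """Return (metric, action) pairs for every breached threshold."""
    return [(name, action)
            for name, breached, action in ALERT_RULES
            if name in metrics and breached(metrics[name])]

actions = evaluate_alerts({
    "false_negative_rate": 0.02,      # breached
    "out_of_envelope_inputs": 0.01,   # within limits
})
for metric, action in actions:
    print(f"ALERT {metric}: {action}")
```

Keeping the rule, the threshold and the SOP reference in a single controlled artefact means the governance requirement and the monitoring requirement are satisfied together rather than separately.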
Decision-Making Checklist — Are You Annex 22 Ready?
Work through each item. If you cannot answer yes with documented evidence, you have a gap to address.
- Do you have a complete inventory of every AI/ML system operating in your GxP environment, including vendor-supplied AI embedded in MES, LIMS, QMS and analytical instrument software?
- Are intended use boundaries documented for each system, with defined detection and handling of out-of-boundary operation?
- Can you reconstruct and audit the provenance, preprocessing and quality of every training dataset?
- Was each model validated under a pre-approved protocol with independent test data and pre-defined acceptance criteria?
- Is continuous performance monitoring operational, with alert thresholds and documented investigation and escalation SOPs?
- Have you defined what constitutes unacceptable drift, how it is detected, and when revalidation is triggered?
- Does your change control procedure cover model retraining, fine-tuning, architecture changes and training data updates?
- Is the human oversight model documented, including how reviewers can override or escalate AI-generated decisions?
How AjaCertX Helps
AjaCertX delivers integrated Annex 11 and Annex 22 compliance programmes for pharmaceutical, biotech, medical device and clinical research organisations. Our team combines deep GxP validation expertise with specialist AI governance capability — the combination that most single-discipline firms cannot offer.
What we deliver
- AI system inventory and risk classification against GAMP 5 AI Supplement and Annex 22 criteria
- Training data audit and governance programme design
- Model validation protocol development and execution support
- Continuous performance monitoring programme design and implementation
- Gap assessment for legacy AI systems deployed before Annex 22 governance requirements
- Change control procedure update for AI-specific change categories
- Vendor assessment criteria development for AI platform providers
- Inspection readiness support including mock inspection preparation
- QA team training on Annex 22 requirements and AI validation methodology
Our approach
We work alongside your validation, QA and IT teams — building internal capability rather than creating dependency on external support. Every engagement is designed to leave your organisation more capable of managing its AI governance programme independently than it was before we arrived.
Proposals are fixed-price, with clear milestones and deliverables. We typically respond with a detailed proposal within 48 hours of an initial conversation.
Speak to a GxP and AI Validation specialist. Detailed proposal within 48 hours.
Conclusion
EU GMP Annex 11 and Annex 22 are not competing frameworks. They are complementary — and in 2026, life science organisations must satisfy both simultaneously for every AI system operating in their GxP environment. The organisations that understand this now, and build integrated validation and governance programmes that address both annexes from the outset, will be inspection-ready when the EMA finalises Annex 22 and enforcement begins in earnest.
The seven requirements introduced by Annex 22 — intended use boundaries, training data governance, model validation protocols, explainability, continuous monitoring, drift detection and AI-specific change control — are not burdensome if they are built into a programme from the start. They become very burdensome when applied retrospectively to systems that were deployed without them.
The question is not whether to build this programme. It is whether to build it now, on your terms, with adequate time — or to build it under inspection pressure, on the health authority's terms, with urgency and consequences.