Article · 10 min read

EU GMP Annex 11 vs Annex 22: What Life Science Organisations Must Understand About Computerised Systems and AI in 2026

Two regulatory frameworks. One validation programme. Most pharmaceutical, biotech and medical device organisations are not ready for what the intersection of Annex 11 and the new Annex 22 actually demands — and the gap is closing fast.

Published 12 May 2026 · Updated May 2026 · Life Science · AI Governance · GxP · Annex 22 · AI Validation
Executive Summary

EU GMP Annex 11 has governed pharmaceutical computerised systems for over two decades. EU GMP Annex 22, published in draft by the EMA in 2023 and moving toward implementation in 2026, extends the framework specifically to artificial intelligence and machine learning systems used in GxP decision-making. The two annexes are not alternatives — they apply simultaneously. Organisations that treat Annex 22 as a standalone AI policy exercise, separate from their existing CSV programme, will fail inspection. This article explains exactly what each framework requires, where they overlap, and what quality and compliance leaders must do right now.

68% — of FDA inspections with major findings include at least one data integrity observation (2024–2025)
2026 — expected year of EU GMP Annex 22 finalisation and enforcement commencement
40%+ — of pharma manufacturing sites now deploying AI or ML in at least one GxP-regulated process

Why This Matters in 2026

The pharmaceutical and life science sector is undergoing the fastest technology transition in its regulatory history. AI systems are now embedded in batch release decisions, visual inspection lines, pharmacovigilance signal detection, clinical data review, and quality control laboratory analysis. The technology has outpaced the regulatory framework — until now.

EU GMP Annex 22 changes that. For the first time, the EMA has produced a GxP-specific regulatory framework that directly addresses artificial intelligence and machine learning: how to validate them, how to govern them through their operational lifecycle, and critically, what happens when they change — because machine learning models do change, even without human intervention.

For Quality Heads, Validation Managers and CTOs in life science organisations, 2026 is the year this moves from "watch brief" to "implementation mandate." Health authorities are already asking about AI governance in pre-approval inspections. Several FDA 483 observations in 2024–2025 specifically cited inadequate AI system oversight in GxP contexts. The regulatory community has signalled its direction clearly.

The organisations that understand both Annex 11 and Annex 22 — and build validation programmes that satisfy both simultaneously — will be inspection-ready. Those that treat them as separate projects will have gaps that regulators will find.

What Annex 11 Actually Requires — and Where It Falls Short for AI

EU GMP Annex 11 has been the cornerstone of pharmaceutical computerised system validation since its original publication in 1992, with its most recent significant revision in 2011. It applies to all computerised systems used in GxP-regulated activities — manufacturing execution systems (MES), laboratory information management systems (LIMS), quality management systems (QMS), clinical data management platforms, and any software that generates, modifies, maintains, archives, retrieves or transmits electronic data used in GxP decisions.

What Annex 11 covers well

Annex 11 establishes clear requirements across the full computerised system lifecycle: risk-based validation, supplier assessment, user requirements, electronic records integrity, audit trails, access controls, disaster recovery, and the management of legacy systems. For deterministic software — systems that behave identically given the same inputs — these requirements are well-understood and implementable through established GAMP 5 methodology.

Where Annex 11 was never designed to go

The fundamental assumption embedded in Annex 11 is that software behaviour is predictable, testable and stable. A LIMS that generates the same results given the same inputs, every time, can be validated through installation, operational and performance qualification (IQ/OQ/PQ) and then managed through change control. If the software changes, you run change control, revalidate the affected functions, and document the outcome.

Machine learning systems do not work this way. A model trained on historical batch data to predict yield may, over time, shift its internal decision-making as it processes new production data — without any human-initiated change, without any software update, without triggering your change control procedure. The model has not malfunctioned. It has learned. But the basis on which it was validated may no longer accurately reflect how it is making decisions.

Annex 11 was designed for software that behaves identically given the same inputs. Machine learning models can change their behaviour without anyone changing a single line of code. That is not a defect — it is the point of the technology. But it is a validation problem that Annex 11 was never built to solve.

AjaCertX GxP & AI Validation Practice

This is not a theoretical concern. In 2024, an FDA investigator asked a pharmaceutical manufacturer to demonstrate how they would detect if their AI-based visual inspection system had shifted its classification boundary over time. The site could not answer the question. The observation was cited. The system had an Annex 11-compliant validation package. It did not have an Annex 22-compliant governance programme.
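The investigator's question — how would you detect a shifted classification boundary? — has well-established statistical answers. As an illustrative sketch only (not a validated method, and with hypothetical bin fractions and heuristic thresholds), a Population Stability Index (PSI) comparison between the model's output-score distribution at validation and its distribution in current production is one common way to surface such a shift:

```python
import math

def psi(baseline_fracs, current_fracs, eps=1e-6):
    """Population Stability Index between two binned score distributions.
    Common heuristic: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant."""
    total = 0.0
    for b, c in zip(baseline_fracs, current_fracs):
        b = max(b, eps)  # guard against empty bins
        c = max(c, eps)
        total += (c - b) * math.log(c / b)
    return total

# Hypothetical binned fractions of model confidence scores:
# recorded at validation vs. observed in the current production window.
baseline = [0.10, 0.20, 0.40, 0.20, 0.10]
current  = [0.05, 0.10, 0.30, 0.30, 0.25]  # distribution has shifted right

shift = psi(baseline, current)  # > 0.25 here: would trigger investigation
```

The point is not the specific statistic — PSI is one option among several — but that the site in the anecdote had no mechanism of this kind at all, and no documented threshold for acting on one.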

What Annex 22 Introduces — The Seven Core Requirements

EU GMP Annex 22 (draft, 2023 — implementation expected 2026) does not replace Annex 11. It extends it. Every AI system in a GxP environment must satisfy both frameworks. Annex 22 introduces seven requirements that have no direct equivalent in Annex 11:

  1. Intended use boundaries. Before deployment, the organisation must define the operational envelope within which the AI system's outputs can be trusted. This includes the data types, ranges and contexts for which the model was trained and validated. Outputs generated outside these boundaries are out-of-specification and must be treated accordingly.
  2. Training data governance. The quality, provenance, representativeness and integrity of training data must be documented and controlled. This includes data versioning, bias assessment, and records of data preprocessing decisions. This is ALCOA+ applied to machine learning training datasets.
  3. Model validation protocol. Beyond traditional IQ/OQ/PQ, Annex 22 requires a model-specific validation protocol that includes performance metrics, acceptance criteria, and test datasets that are genuinely independent of training data. Overfitting must be assessed and controlled.
  4. Explainability and human oversight. For GxP-critical decisions, the AI system must support human review. The nature and extent of human oversight must be documented and justified based on risk. Black-box decisions affecting patient safety require higher levels of explainability and human intervention capability.
  5. Continuous performance monitoring. The model must be monitored throughout its operational life against the performance criteria established at validation. This is not periodic review — it is continuous monitoring with defined alert thresholds and escalation procedures.
  6. Model drift detection and management. Organisations must define what constitutes unacceptable drift from validated performance, and establish procedures for detecting it, responding to it, and revalidating when thresholds are exceeded.
  7. Change control for model updates. Any intentional update to the model — retraining, fine-tuning, architecture change — must go through a documented change control process with revalidation requirements proportionate to the change risk.

Annex 11 vs Annex 22 — Side by Side

| Requirement | Annex 11 | Annex 22 |
| --- | --- | --- |
| System scope | All GxP computerised systems | AI/ML systems in GxP contexts specifically |
| Validation approach | IQ/OQ/PQ, risk-based | IQ/OQ/PQ + model validation protocol, performance metrics, independent test sets |
| Change control | Software change → revalidation of affected functions | Model update (including retraining) → documented change control + proportionate revalidation |
| Ongoing monitoring | Periodic review of validated status | Continuous performance monitoring with drift detection and alert thresholds |
| Data governance | Electronic records, audit trails (ALCOA+) | ALCOA+ for operational data AND training data provenance, versioning, bias assessment |
| Human oversight | Access controls, user authorisation | Explainability, human review capability, oversight requirements proportionate to risk |
| Intended use | URS defines intended use | Explicit intended use boundaries defining valid operating envelope |
| Supplier assessment | Required for all suppliers | Required + AI-specific supplier audit criteria for algorithm developers |

The Risks of Getting This Wrong

Regulatory Risk

Health authority inspectors in the EU, UK and US are now specifically trained to assess AI governance in GxP environments. An AI system with an Annex 11-compliant validation package but no Annex 22 governance programme is a major finding waiting to happen — regardless of how technically sophisticated the system is.

Beyond the regulatory exposure, there are three categories of operational risk that Quality Heads and CTOs need to understand:

Patient safety risk from undetected model drift

A visual inspection AI trained to detect particulates in injectable products may, over time, recalibrate its sensitivity threshold as it processes new images — potentially missing defects it would previously have caught. Without continuous performance monitoring and drift detection, this change is invisible until a product failure or health authority investigation reveals it. The regulatory consequence of a patient safety event attributed to an unmonitored AI system is existential for a pharmaceutical manufacturer.

Commercial risk from invalidated systems

If a health authority determines during an inspection that your AI system does not meet Annex 22 requirements and demands remediation, the system may need to be taken offline or its outputs quarantined pending revalidation. For a batch release AI or a yield prediction system embedded in production, this is a significant operational disruption. One mid-size European pharmaceutical manufacturer estimated a six-week production impact when an AI system was suspended following an EMA inspection observation in 2024.

Data integrity risk from training data governance failures

FDA 483 observations in 2024–2025 include several cases where the training data used to develop GxP AI systems could not be reconstructed or audited. If you cannot demonstrate the provenance, quality and representativeness of your training data, you cannot demonstrate that your model was built on reliable foundations. This is a data integrity failure — one of the most serious categories of GxP non-compliance.
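The remedy for unreconstructable training data is unglamorous: freeze the dataset, fingerprint it, and record the preprocessing decisions alongside it. The sketch below is a minimal, hypothetical illustration of that idea — a content-addressed manifest, so that any later change to the training data is detectable — not a substitute for a full data governance programme.

```python
import hashlib
import json
from datetime import date

def dataset_fingerprint(records):
    """Order-independent SHA-256 over serialised training records:
    any addition, removal or edit changes the fingerprint."""
    h = hashlib.sha256()
    for line in sorted(json.dumps(r, sort_keys=True) for r in records):
        h.update(line.encode("utf-8"))
    return h.hexdigest()

def build_manifest(name, version, records, preprocessing_steps):
    """Auditable snapshot of a frozen training dataset (illustrative fields)."""
    return {
        "dataset": name,
        "version": version,
        "record_count": len(records),
        "sha256": dataset_fingerprint(records),
        "preprocessing": preprocessing_steps,  # documented decisions, in order
        "frozen_on": date.today().isoformat(),
    }
```

A manifest like this, stored under document control with the validation package, is what lets you answer an inspector's "show me exactly what this model was trained on" years after deployment.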

The AjaCertX Implementation Framework — Seven Steps

Based on our experience supporting pharmaceutical, biotech and medical device organisations through GxP AI validation programmes, the following framework addresses both Annex 11 and Annex 22 requirements in an integrated approach. This is not a sequential checklist — several workstreams run in parallel.

  1. AI system inventory and risk classification. Identify every AI or ML system operating in your GxP environment. This includes vendor-supplied AI embedded in your MES, LIMS or QMS platforms — not just bespoke systems you have developed. Classify each system by GxP impact using a risk matrix aligned to GAMP 5 AI Supplement guidance. High-risk systems (patient safety, batch release, product quality) require full Annex 22 compliance. Lower-risk systems require proportionate controls.
  2. Intended use boundary definition. For each system, document the intended use boundaries: the data types, ranges, contexts and decision types for which the system was trained and validated. Establish what constitutes out-of-boundary operation and how it is detected and managed.
  3. Training data audit and governance programme. Audit the training datasets used to develop each system. Document provenance, preprocessing decisions, data quality assessment and bias analysis. Establish a training data governance programme for any future model development or retraining. This is the most commonly missing element in AI validation packages.
  4. Model validation protocol development. Develop a model validation protocol that extends your existing IQ/OQ/PQ framework with model-specific elements: performance metrics, acceptance criteria, independent test dataset selection, and overfitting assessment methodology. The validation protocol must be approved before testing begins.
  5. Continuous monitoring programme design. Design and implement the continuous monitoring programme: what metrics are monitored, at what frequency, against what thresholds, with what alert and escalation procedures. This programme must be operational before the system is deployed in GxP use.
  6. Explainability and human oversight documentation. Document the human oversight model: what decisions require human review, what information is provided to the reviewer, how the reviewer can override or escalate, and how override decisions are recorded and trended.
  7. Change control procedure update. Update your existing change control procedure to include AI system-specific categories: model retraining, fine-tuning, architecture change, training data update, and performance threshold adjustment. Define revalidation requirements for each category.

Realistic Timelines and Costs

Organisations approaching Annex 22 compliance for the first time consistently underestimate the timelines involved. The following estimates are based on actual implementation experience across pharmaceutical, biotech and medical device organisations:

| Organisation type | AI systems in scope | Estimated timeline | Key complexity drivers |
| --- | --- | --- | --- |
| Small pharma / biotech | 1–3 systems | 4–8 months | Training data reconstruction, monitoring programme design |
| Mid-size manufacturer | 4–10 systems | 8–18 months | Vendor AI in commercial platforms, change control integration |
| Large multinational | 10+ systems, multi-site | 18–36 months | Site harmonisation, legacy AI systems, regulatory agency alignment |

The single largest cost driver in most programmes is training data reconstruction — the process of auditing and documenting the provenance and quality of data used to train systems that were deployed before Annex 22 governance requirements were understood. For organisations that deployed AI systems in 2020–2023 without formal training data governance, this can be a significant remediation exercise.

Cost Benchmark

Based on our engagements, a comprehensive Annex 11 and Annex 22 compliance programme for a mid-size pharmaceutical manufacturer with five to eight AI systems in scope typically requires between 2,000 and 4,500 specialist days across validation, quality assurance, IT and regulatory functions — depending on the maturity of the existing CSV programme and the degree of training data remediation required.

The Five Most Common Mistakes Organisations Make

  1. Treating Annex 22 as an IT project, not a quality project. AI governance in GxP environments is a quality assurance responsibility, not an IT project. Organisations that route Annex 22 compliance through their IT function without QA leadership find themselves with technically sophisticated monitoring programmes that do not satisfy regulatory requirements for documented quality oversight.
  2. Ignoring vendor-supplied AI. Most pharmaceutical organisations have more AI in their GxP environment than they realise — embedded in commercial MES, LIMS, QMS and analytical instrument software. Vendor-supplied AI must be assessed under Annex 22 just as bespoke AI must. Vendor assessment criteria need to be updated to include AI-specific qualification questions.
  3. Conflating model testing with model validation. Running a model against a test dataset and recording the results is not validation. Validation requires a pre-approved protocol, independent test data, pre-defined acceptance criteria, and documented evidence of protocol execution. Many organisations have performed model testing but not model validation in the regulatory sense.
  4. Building monitoring without alert procedures. Continuous monitoring is only useful if there are documented procedures for what happens when a metric exceeds its threshold. Organisations frequently build monitoring dashboards without corresponding alert, investigation and escalation SOPs. This satisfies the monitoring requirement but not the governance requirement.
  5. Underestimating the training data problem. The EMA's draft Annex 22 is explicit that training data quality is a GxP data integrity matter. For AI systems developed before this was understood, the training data may be scattered across development environments, inadequately documented, or in some cases impossible to reconstruct. This is a material compliance gap that organisations need to assess honestly and remediate early.

Decision-Making Checklist — Are You Annex 22 Ready?

Work through each item. If you cannot answer yes with documented evidence, you have a gap to address.

Annex 22 Readiness Checklist
  • We have a complete inventory of AI and ML systems operating in our GxP environment, including vendor-supplied AI
  • Each AI system has a documented risk classification and a defined scope of Annex 22 requirements proportionate to that risk
  • The training data for each system has been audited for provenance, quality and representativeness
  • Each system has a validated intended use boundary and an out-of-boundary detection procedure
  • Model validation protocols with pre-defined acceptance criteria and independent test datasets exist for each high-risk system
  • A continuous performance monitoring programme is operational for each system, with documented alert thresholds
  • Human oversight requirements are documented and justified based on the risk classification of each system
  • Our change control procedure covers AI-specific change categories including model retraining and fine-tuning
  • Our vendor assessment process includes AI-specific qualification criteria for algorithm developers and platform providers
  • Our QA team has received training on Annex 22 requirements and can assess AI validation packages against them

Frequently Asked Questions

Does Annex 22 apply to AI embedded in commercial software platforms we purchase from vendors?
Yes. If a commercial platform — your MES, LIMS, QMS or analytical instrument software — contains an AI component that makes or contributes to GxP decisions, that component is in scope for Annex 22. This includes AI-powered anomaly detection in process control systems, predictive quality algorithms in MES platforms, and AI-assisted data review tools in clinical systems. You cannot delegate Annex 22 compliance to the vendor — you must assess and document it. Your vendor assessment process should include questions specifically about how the vendor manages AI governance, model validation, drift monitoring and change control for AI components in their platforms.
We deployed several AI systems before Annex 22 was published. Do we need to revalidate them?
Almost certainly yes — at least partially. Systems deployed before Annex 22 was understood likely have gaps in training data documentation, performance monitoring and human oversight documentation. The approach to remediation should be risk-based: start with your highest-risk systems (batch release, visual inspection, clinical decision support) and work through a gap assessment against the seven Annex 22 requirements. Where gaps exist, a remediation plan with documented evidence of the remediation is required. Full revalidation from scratch is rarely necessary — gap closure with supplementary documentation is usually achievable for systems where the original validation was otherwise sound.
How does the EU AI Act interact with Annex 22?
The EU AI Act and EU GMP Annex 22 are separate regulatory frameworks with different legal bases, different enforcement authorities and different timelines — but they overlap significantly for life science organisations. AI systems used in medical devices, in vitro diagnostic devices, and clinical decision support are explicitly classified as high-risk under the EU AI Act, which means they are subject to both Annex 22 (health authority inspection) and EU AI Act conformity assessment (notified body). Building a governance programme that satisfies both simultaneously — rather than managing them as separate compliance workstreams — is significantly more efficient. We have published a detailed analysis of the overlap and divergence between the two frameworks.
What is the EMA's current inspection posture on Annex 22?
The EMA has been conducting pre-approval inspections and routine GMP inspections with Annex 22 questions since 2024, even ahead of the final publication of the annex. Inspectors are asking about AI system inventories, validation documentation, and monitoring programmes. The posture is currently educational rather than punitive — inspectors are sharing observations rather than issuing formal non-compliance statements in most cases. However, this is expected to change following finalisation of the annex. Organisations that have demonstrable programmes in place — even if not fully complete — are being treated more favourably than those with no evidence of awareness or activity.
How long should a model validation protocol take to execute?
This depends heavily on the complexity of the model, the availability of independent test data and the number of performance metrics being assessed. For a relatively straightforward classification model — a visual inspection system with binary pass/fail outputs — protocol execution typically takes four to eight weeks, with an additional two to four weeks for report writing, review and approval. More complex models, particularly those with multi-dimensional outputs or those requiring statistical analysis of large test datasets, can take three to six months. Training data reconstruction, if required, adds significantly to the timeline and is better addressed as a separate upstream workstream before the validation protocol begins.

How AjaCertX Helps

AjaCertX delivers integrated Annex 11 and Annex 22 compliance programmes for pharmaceutical, biotech, medical device and clinical research organisations. Our team combines deep GxP validation expertise with specialist AI governance capability — the combination that most single-discipline firms cannot offer.

What we deliver

  • AI system inventory and risk classification against GAMP 5 AI Supplement and Annex 22 criteria
  • Training data audit and governance programme design
  • Model validation protocol development and execution support
  • Continuous performance monitoring programme design and implementation
  • Gap assessment for legacy AI systems deployed before Annex 22 governance requirements
  • Change control procedure update for AI-specific change categories
  • Vendor assessment criteria development for AI platform providers
  • Inspection readiness support including mock inspection preparation
  • QA team training on Annex 22 requirements and AI validation methodology

Our approach

We work alongside your validation, QA and IT teams — building internal capability rather than creating dependency on external support. Every engagement is designed to leave your organisation more capable of managing its AI governance programme independently than it was before we arrived.

Proposals are fixed-price, with clear milestones and deliverables. We typically respond with a detailed proposal within 48 hours of an initial conversation.

Ready to assess your Annex 11 and Annex 22 compliance position?

Speak to a GxP and AI Validation specialist. Detailed proposal within 48 hours.

Conclusion

EU GMP Annex 11 and Annex 22 are not competing frameworks. They are complementary — and in 2026, life science organisations must satisfy both simultaneously for every AI system operating in their GxP environment. The organisations that understand this now, and build integrated validation and governance programmes that address both annexes from the outset, will be inspection-ready when the EMA finalises Annex 22 and enforcement begins in earnest.

The seven requirements introduced by Annex 22 — intended use boundaries, training data governance, model validation protocols, explainability, continuous monitoring, drift detection and AI-specific change control — are not burdensome if they are built into a programme from the start. They become very burdensome when applied retrospectively to systems that were deployed without them.

The question is not whether to build this programme. It is whether to build it now, on your terms, with adequate time — or to build it under inspection pressure, on the health authority's terms, with urgency and consequences.

About AjaCertX
AjaCertX is a specialist compliance, certification and assurance partner serving life science, technology, manufacturing and regulated industries across global markets. Our GxP and AI Validation practice combines deep regulatory expertise with practical implementation experience across FDA, EMA, MHRA and TGA jurisdictions. We work alongside quality and compliance teams to build systems that satisfy regulators — and keep satisfying them.