Practical Guide · 14 pages · Free

GAMP 5 AI Gap Report: The 6 Most Common Validation Failures for AI Systems in GxP Environments

Most GxP AI systems deployed between 2020 and 2024 have at least two of these six validation gaps. Inspectors are now specifically trained to look for them. This guide identifies each gap, explains why it arises, and provides the remediation pathway.

Published May 2026 · Life Science · GAMP 5 · AI Validation · GxP · Annex 22

Why GAMP 5 AI Validation Is Different

GAMP (Good Automated Manufacturing Practice) guidance has been the primary validation methodology for GxP computerised systems since its first edition in 1994; the current version, GAMP 5, was first published in 2008 and updated in a Second Edition in 2022. Its category-based risk classification, lifecycle approach, and IQ/OQ/PQ validation framework have guided thousands of system validations across pharmaceutical, biotech and medical device organisations. For conventional software, the framework works well.

AI and machine learning systems challenge several GAMP 5 assumptions. GAMP 5 was designed for deterministic software: given the same inputs, the system produces the same outputs. AI systems, particularly those that learn from operational data, may not meet this condition. A model that was validated against historical training data may behave differently as it processes new operational data over time. The GAMP 5 Second Edition, published in 2022, extends the framework to address these characteristics, including a dedicated appendix on AI and machine learning. The six validation gaps in this guide are the most common failures to apply this extended framework correctly.


The Six GAMP 5 AI Validation Failures

Gap 1 — Intended use boundaries not defined

The single most common GAMP 5 AI validation gap. Intended use boundaries define the data types, ranges, contexts and decision types for which the AI model was trained and validated. Without defined boundaries, there is no basis for determining when the system is operating within validated conditions — and no mechanism for detecting when it is not. Remediation: document intended use boundaries in the user requirements specification and validation summary report, and implement monitoring that detects out-of-boundary operation.
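As an illustration, out-of-boundary detection can be reduced to a range check against the documented boundaries; the feature names and ranges below are hypothetical, not drawn from any specific validation package:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntendedUseBoundary:
    """Documented validated range for one numeric input feature (illustrative)."""
    feature: str
    min_value: float
    max_value: float

# Hypothetical boundaries as they might appear in a URS / validation summary.
BOUNDARIES = [
    IntendedUseBoundary("ph", 5.5, 8.0),
    IntendedUseBoundary("temperature_c", 15.0, 30.0),
]

def out_of_boundary(sample: dict) -> list[str]:
    """Return the features of a sample that fall outside validated boundaries.

    A missing feature also counts as a violation, since the model was not
    validated for incomplete inputs.
    """
    violations = []
    for b in BOUNDARIES:
        value = sample.get(b.feature)
        if value is None or not (b.min_value <= value <= b.max_value):
            violations.append(b.feature)
    return violations
```

In production this check would sit in front of the model and feed the monitoring programme described in Gap 5, so that out-of-boundary operation generates a recorded event rather than a silent prediction.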

Gap 2 — Training data not governed as GxP data

Training data is the foundation on which model behaviour is built. If training data quality is not assessed, provenance is not documented, and data preprocessing decisions are not recorded, the validation package cannot demonstrate that the model was built on reliable foundations. This is a data integrity failure. Remediation: apply ALCOA+ principles to training data — document provenance, preprocessing methodology, quality assessment criteria, and data version control. Retain training datasets as GxP records.
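One way to make provenance and version control concrete is to fingerprint each training dataset and record the fingerprint alongside its source and preprocessing history. The `provenance_record` helper and its field names below are illustrative, not a prescribed schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(raw: bytes, source: str, preprocessing: list[str]) -> dict:
    """Build an ALCOA+-style provenance record for one training dataset version.

    The SHA-256 fingerprint ties the record to the exact bytes used for
    training (original, enduring); source and timestamp make it attributable
    and contemporaneous; the preprocessing list documents how the data
    was transformed before training.
    """
    return {
        "sha256": hashlib.sha256(raw).hexdigest(),
        "source": source,
        "preprocessing": preprocessing,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

def to_gxp_record(record: dict) -> str:
    """Serialise the record for retention alongside the dataset itself."""
    return json.dumps(record, sort_keys=True)
```

Retaining the serialised record and the dataset together means a later inspection can confirm that the model in production was trained on exactly the data the validation package describes.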

Gap 3 — Test datasets not genuinely independent of training data

A model tested against data that was used in training will appear more capable than it is. Test datasets must be selected before training begins, must not overlap with training data, and must represent the operational conditions the model will face — not just the conditions it was trained on. Remediation: document test dataset selection methodology in the validation protocol, confirm independence from training data, and include operational edge cases that were not well-represented in training.
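Independence can also be verified mechanically, for example by hashing each record and checking for intersection between the two sets. This sketch assumes records can be serialised to strings; the function names are illustrative:

```python
import hashlib

def record_hash(record: str) -> str:
    """Stable fingerprint for one serialised data record."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

def train_test_overlap(train: list[str], test: list[str]) -> set[str]:
    """Return test records that also appear in the training set.

    For a genuinely independent test dataset this must be empty;
    any non-empty result should block execution of the validation protocol.
    """
    train_hashes = {record_hash(r) for r in train}
    return {r for r in test if record_hash(r) in train_hashes}
```

Running this check at protocol execution time, and filing the empty result as evidence, turns the independence claim into a documented, repeatable verification rather than an assertion.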

Gap 4 — Performance metrics not defined before testing

Acceptance criteria for AI systems must be defined before validation testing begins — not selected after observing results. Post-hoc selection of metrics that the model happens to meet is not validation. Remediation: define primary performance metrics, acceptance thresholds, and the minimum test dataset size required to demonstrate statistical confidence in the metrics before the validation protocol is executed.
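The minimum test dataset size can be estimated up front from the expected accuracy and the acceptance threshold. The sketch below uses a normal-approximation lower confidence bound, which is one common statistical choice, not a regulatory requirement:

```python
import math

def min_test_set_size(expected_acc: float, threshold: float, z: float = 1.96) -> int:
    """Smallest n such that, if observed accuracy equals expected_acc, the
    normal-approximation 95% lower confidence bound still clears the threshold:

        expected_acc - z * sqrt(p * (1 - p) / n) >= threshold
    """
    margin = expected_acc - threshold
    if margin <= 0:
        raise ValueError("expected accuracy must exceed the acceptance threshold")
    p = expected_acc
    return math.ceil((z ** 2) * p * (1 - p) / margin ** 2)
```

For example, demonstrating with 95% confidence that a model expected to reach 97% accuracy clears a 95% threshold requires roughly 280 test records; tightening the margin rapidly inflates the required dataset, which is exactly why the calculation belongs in the protocol before testing begins.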

Gap 5 — No continuous performance monitoring programme

AI systems that are not continuously monitored may drift from their validated performance without detection. A model that was validated with 97% accuracy may, over time, degrade to 89% — below the acceptance criterion established at validation — without any system-generated alert. Remediation: implement continuous monitoring for primary performance metrics, define alert thresholds, establish escalation procedures for threshold breaches, and integrate monitoring into the change control procedure as a trigger for revalidation assessment.
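A minimal monitoring loop can be sketched as a rolling window over recent prediction outcomes, with an alert when windowed accuracy falls below the validated acceptance criterion. The window size and threshold here are illustrative:

```python
from collections import deque

class PerformanceMonitor:
    """Rolling-accuracy monitor with an alert threshold (illustrative sketch)."""

    def __init__(self, threshold: float, window: int = 100):
        self.threshold = threshold          # acceptance criterion from validation
        self.results = deque(maxlen=window) # most recent prediction outcomes

    def record(self, correct: bool) -> bool:
        """Record one confirmed prediction outcome; return True if an alert fires.

        No alert is raised until the window is full, so early noisy
        estimates do not trigger spurious escalations.
        """
        self.results.append(correct)
        if len(self.results) < self.results.maxlen:
            return False
        accuracy = sum(self.results) / len(self.results)
        return accuracy < self.threshold
```

An alert from `record` would feed the documented escalation procedure and, per the remediation above, trigger a change-control assessment of whether revalidation is required.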

Gap 6 — Change control not extended to AI-specific changes

Standard software change control was designed for intentional, human-initiated changes. AI systems can change their behaviour through mechanisms that traditional change control does not capture: model retraining, fine-tuning, changes to training data, and in some architectures, continuous learning from operational data. Remediation: extend your change control procedure with AI-specific change categories and define the revalidation requirements for each category.
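The category-to-requirement mapping can live in a simple lookup consulted by change-control tooling. The categories below mirror the mechanisms named above; the requirement wording is illustrative, not prescriptive:

```python
# Hypothetical AI-specific change categories mapped to revalidation requirements.
AI_CHANGE_CATEGORIES = {
    "model_retraining":     "full revalidation against the original protocol",
    "fine_tuning":          "targeted revalidation of affected performance metrics",
    "training_data_change": "data governance review plus regression testing",
    "continuous_learning":  "periodic revalidation at a defined interval",
}

def revalidation_requirement(category: str) -> str:
    """Return the documented revalidation requirement for an AI-specific change.

    An uncategorised change must not pass silently: it is routed to
    quality assurance for assessment instead.
    """
    try:
        return AI_CHANGE_CATEGORIES[category]
    except KeyError:
        raise ValueError(f"uncategorised change '{category}': requires QA assessment")
```

The point of the lookup is that every model change lands in exactly one category with a pre-agreed revalidation consequence, so no retraining or data change can reach production without a documented assessment.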

Self-Assessment — Where Is Your AI System?

GAMP 5 AI Validation Gap Assessment
☐ Intended use boundaries are documented in the validation package, specifying data types, ranges, contexts and decision types
☐ Training data governance documentation demonstrates ALCOA+ compliance: provenance, preprocessing, quality assessment, version control
☐ Test datasets were selected before training began, and their independence from training data is documented and confirmed
☐ Primary performance metrics and acceptance thresholds were defined in the validation protocol before testing
☐ A continuous performance monitoring programme is operational, with documented alert thresholds and escalation procedures
☐ The change control procedure has been extended with AI-specific change categories and revalidation requirements

Remediation Priority Framework

| Gap | Risk level | Typical remediation timeline | Inspection consequence if unaddressed |
|---|---|---|---|
| Intended use boundaries | Critical | 2–4 weeks | Major finding — caps element capability assessment |
| Training data governance | Critical | 4–12 weeks (depends on data reconstruction complexity) | Critical data integrity finding |
| Independent test datasets | High | 2–6 weeks | Major finding — validation package rejected |
| Pre-defined acceptance criteria | High | 1–2 weeks if model not yet deployed | Major finding for new systems; observation for legacy |
| Continuous monitoring | High | 4–8 weeks | Major finding — system may be considered uncontrolled |
| AI change control | Medium | 2–4 weeks | Observation — potential major if changes have been made without assessment |
Assessing your GxP AI validation programme?

GAMP 5 and AI validation specialists. Gap assessment proposal within 48 hours.

About AjaCertX
AjaCertX is a specialist compliance, certification and assurance partner serving life science organisations globally. Our GAMP 5 and AI Validation practice delivers GxP AI system validation, Annex 22 compliance programmes, and validation gap remediation for pharmaceutical, biotech and medical device organisations.