RAI — GARP Risk and AI Certificate Quick Review

Last revised: July 1, 2026

Fast review for the GARP Risk and AI Certificate (RAI): AI risk, governance, model validation, data, bias, explainability, and GenAI controls.

Quick Orientation for RAI Candidates

This quick review is for candidates preparing for the GARP Risk and AI Certificate (exam code RAI). Use it as a final-pass review before moving into independent companion practice: topic drills, mock exams, and detailed explanations.

The exam rewards more than vocabulary. Be ready to apply AI risk concepts to practical scenarios involving governance, data, model development, validation, deployment, monitoring, bias, explainability, operational resilience, and generative AI.

Practical study rule: if you can explain what can go wrong, why it matters, how to detect it, and what control reduces the risk, you are reviewing at the right level.

High-Yield RAI Review Map

Area	What to Know	Common Exam Angle
AI and ML fundamentals	Supervised, unsupervised, reinforcement, generative AI, model training, inference	Match model type to use case and risk
Data risk	Quality, lineage, representativeness, privacy, leakage, bias	Identify flawed data assumptions
Model development	Feature engineering, train/test split, overfitting, hyperparameters	Choose better development practice
Model validation	Independent review, conceptual soundness, performance testing, limitations	Distinguish validation from monitoring
Governance	Accountability, roles, risk appetite, policies, escalation	Identify missing ownership or weak controls
Explainability	Global/local explanations, SHAP, LIME, counterfactuals	Select explanation method by audience and use
Fairness and bias	Disparate outcomes, proxy variables, label bias, fairness trade-offs	Avoid “remove protected variable” trap
Operational risk	Deployment, change management, outages, human override, vendor reliance	Recognize production risk beyond model accuracy
Cyber and adversarial risk	Data poisoning, evasion, prompt injection, model extraction	Select control for attack type
Generative AI	Hallucination, grounding, RAG, guardrails, red teaming, human review	Apply controls to LLM-specific risks
Monitoring	Drift, degradation, threshold breaches, incident response	Know what to monitor and when to retrain
Third-party risk	Vendor models, APIs, cloud, documentation, auditability	Identify due diligence gaps

AI, ML, and Model Types

Core Distinctions

Concept	Meaning	Watch For
Artificial intelligence	Systems performing tasks associated with human intelligence	Broad umbrella; not all AI is machine learning
Machine learning	Models learn patterns from data rather than explicit rules	Data quality and representativeness become central risks
Deep learning	Neural network methods with many layers	Often high performance but less transparent
Generative AI	Produces new text, images, code, or other content	Hallucination, misuse, IP/privacy, prompt risks
Foundation model	Large pretrained model adaptable to many tasks	Broad capability creates broad risk surface
Inference	Applying a trained model to new inputs	Production controls matter here
Training	Estimating model parameters from data	Leakage, bias, overfitting, and compute risk arise here

Model Family Quick Review

Model Type	Typical Use	Strength	Key Risk
Linear/logistic regression	Scoring, classification, interpretable baselines	Transparent, stable	Misses nonlinear relationships
Decision tree	Rules-based classification/regression	Easy to explain	Overfits if unconstrained
Random forest	Ensemble prediction	Robust, strong performance	Less interpretable than single tree
Gradient boosting	Credit, fraud, pricing, risk scoring	High predictive power	Sensitive to tuning; explainability challenge
Neural network	Complex patterns, images, language, nonlinear prediction	Flexible	Opaque, data/compute intensive
Clustering	Segmentation, anomaly grouping	Finds structure without labels	Clusters may lack business meaning
Anomaly detection	Fraud, cyber, operational exceptions	Identifies rare patterns	High false positives if poorly calibrated
Reinforcement learning	Sequential decisions, optimization	Learns from reward feedback	Safety and unintended strategy risk
Large language model	Text generation, summarization, assistants	Flexible natural language capability	Hallucination, prompt injection, data leakage

Supervised vs. Unsupervised vs. Generative

Question	Likely Category
“We have labeled historical outcomes and want to predict a future outcome.”	Supervised learning
“We want to group customers or transactions without labels.”	Unsupervised learning
“We want the model to create text, code, or synthetic content.”	Generative AI
“We want an agent to learn actions over time based on rewards.”	Reinforcement learning

AI Risk Lifecycle

    flowchart LR
        A[Use Case Definition] --> B[Data Sourcing and Governance]
        B --> C[Model Development]
        C --> D[Independent Review and Validation]
        D --> E[Approval and Deployment]
        E --> F[Production Monitoring]
        F --> G[Change Management]
        G --> H[Retire, Replace, or Revalidate]
        F --> C

Lifecycle Control Points

Stage	Key Question	Control Focus
Use case definition	Is AI appropriate for the decision?	Materiality, risk appetite, human impact
Data sourcing	Is the data fit for purpose?	Lineage, quality, permissions, representativeness
Development	Is the model designed soundly?	Method selection, feature review, documentation
Validation	Does the model work as intended?	Independent challenge, testing, limitations
Approval	Who accepts the residual risk?	Governance, sign-off, accountability
Deployment	Is implementation faithful and secure?	Access control, testing, rollback plans
Monitoring	Is performance stable over time?	Drift, accuracy, bias, usage, incidents
Change management	What changed and who approved it?	Version control, revalidation triggers
Retirement	Is the model still needed?	Decommissioning, replacement, record retention

Governance and Accountability

Governance Building Blocks

Element	Purpose	Weak Signal
Risk appetite	Defines acceptable risk levels	No escalation when thresholds are breached
Policy and standards	Create consistent minimum expectations	Teams invent their own controls
Roles and responsibilities	Clarify ownership	“The model owns itself” or no named accountable owner
Independent challenge	Tests assumptions and limitations	Development team validates its own work without review
Documentation	Supports auditability and repeatability	Key decisions exist only in emails or code comments
Inventory	Tracks AI systems and materiality	Shadow AI tools used outside approval
Escalation process	Ensures issues reach decision-makers	Monitoring flags ignored
Human oversight	Keeps accountable judgment in the loop	Rubber-stamp review or no override path

Three-Lines-of-Defense Style Thinking

Function	Typical Responsibility	Exam Trap
First line	Owns and operates the AI use case	Cannot outsource accountability to validation or vendor
Second line	Sets risk framework, challenges, oversees	Should not become the model developer
Third line	Independent audit/assurance	Reviews framework effectiveness, not daily tuning

Governance Decision Rules

High materiality + low explainability requires stronger documentation, validation, monitoring, and human oversight.
Customer-impacting decisions require special attention to fairness, transparency, complaint handling, and override processes.
Automated decisions without review raise governance stakes, especially if adverse outcomes are possible.
Vendor-provided AI still needs internal accountability, due diligence, performance monitoring, and exit planning.
GenAI used for advice, summaries, or decisions needs controls for hallucination, grounding, prompt injection, and user misuse.

Data Risk Quick Review

Common Data Risks

Risk	Meaning	Example	Control
Poor quality	Inaccurate, incomplete, inconsistent data	Missing income fields	Data checks, cleansing, reconciliation
Bias	Data reflects historical inequities or sampling issues	Underrepresented borrower group	Bias testing, representative sampling
Leakage	Training data contains future or target information	Using post-default collection status to predict default	Feature review, temporal validation
Drift	Production data changes over time	New customer mix after product launch	Drift monitoring, retraining triggers
Lineage gaps	Unknown origin or transformations	Vendor data field cannot be traced	Data lineage documentation
Privacy exposure	Sensitive information used improperly	PII in prompts or logs	Minimization, masking, access controls
Proxy variables	Non-sensitive variables approximate sensitive traits	ZIP code as socioeconomic proxy	Fairness review, feature testing
Label error	Outcome variable is wrong or biased	Fraud labels based only on detected fraud	Label audit, alternative labels

Data Leakage Traps

Data leakage is one of the most important candidate traps. Look for features that would not be available at the time of decision.

Suspicious Feature	Why It May Leak
Collection outcome used in credit approval model	Outcome occurs after approval
Claim settlement amount used in claim triage at filing	Settlement occurs later
Fraud investigation result used at transaction authorization	Investigation occurs after transaction
Customer churn reason used to predict churn	Reason is known only after churn
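The availability test above can be sketched in a few lines of Python. This is a minimal illustration, not a complete leakage review: the feature names, dates, and the `find_leaky_features` helper are all hypothetical, and it assumes you can record when each field first becomes known.

```python
from datetime import date

def find_leaky_features(feature_known_dates: dict, decision_date: date) -> list:
    """Return features whose values only become known after the decision date."""
    return sorted(
        name for name, known_on in feature_known_dates.items()
        if known_on > decision_date
    )

# Hypothetical credit approval decision made on 2024-03-01.
decision_date = date(2024, 3, 1)
feature_known_dates = {
    "stated_income": date(2024, 2, 20),        # known before the decision
    "credit_bureau_score": date(2024, 2, 28),  # known before the decision
    "collection_outcome": date(2025, 1, 15),   # only known long after approval
}

leaky = find_leaky_features(feature_known_dates, decision_date)
print(leaky)  # collection_outcome is flagged as a likely leak
```

A check like this only catches timing-based leakage; features that encode the target indirectly still need a manual feature review.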

Train, Validation, and Test Sets

Dataset	Purpose	Candidate Trap
Training set	Fit model parameters	Do not use it as proof of real-world performance
Validation set	Tune hyperparameters and select model	Repeated tuning can overfit validation data
Test set	Final unbiased performance estimate	Do not tune after looking at test results
Out-of-time sample	Tests temporal stability	Often more realistic for financial data
Production data	Live operating environment	Must be monitored; not the same as test data
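As a rough sketch of the out-of-time idea in the table above, the split can be driven by an observation date rather than by random sampling. The record layout, the `as_of` field, and the `out_of_time_split` helper are assumptions made for illustration.

```python
from datetime import date

def out_of_time_split(records, cutoff):
    """Hold out every record observed on or after the cutoff date."""
    development = [r for r in records if r["as_of"] < cutoff]
    out_of_time = [r for r in records if r["as_of"] >= cutoff]
    return development, out_of_time

# Hypothetical loan records with an observation date and a default flag.
records = [
    {"id": 1, "as_of": date(2022, 6, 30), "defaulted": 0},
    {"id": 2, "as_of": date(2022, 12, 31), "defaulted": 1},
    {"id": 3, "as_of": date(2023, 6, 30), "defaulted": 0},
    {"id": 4, "as_of": date(2024, 3, 31), "defaulted": 1},
]

development, out_of_time = out_of_time_split(records, cutoff=date(2024, 1, 1))
print(len(development), len(out_of_time))
```

The development portion would still be divided into training and validation sets; the point of the sketch is only that the final holdout comes from a later period than anything the model was fitted or tuned on.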

Model Development and Performance

Overfitting vs. Underfitting

Pattern	Meaning	Typical Evidence	Response
Overfitting	Learns noise or idiosyncrasies	High training performance, weak test performance	Simplify model, regularize, more data, cross-validation
Underfitting	Too simple to capture signal	Weak training and test performance	Add features, richer model, improve data
Data drift	Input distribution changes after deployment	Input distributions shift; performance may degrade over time	Monitor, recalibrate, retrain
Concept drift	Relationship between inputs and target changes	Old predictors no longer work	Reassess model design and assumptions

Classification Metrics

Use the confusion matrix language carefully:

Term	Meaning
True positive	Model predicts positive and actual outcome is positive
False positive	Model predicts positive but actual outcome is negative
True negative	Model predicts negative and actual outcome is negative
False negative	Model predicts negative but actual outcome is positive

Key formulas:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \]

\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
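To make the formulas concrete, here is a small Python sketch that computes them from confusion-matrix counts. The counts are invented for illustration (10,000 transactions, 100 of them fraudulent), and `classification_metrics` is a hypothetical helper, not part of any library.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical fraud model: catches 60 of 100 fraud cases, raises 140 false alerts.
metrics = classification_metrics(tp=60, fp=140, tn=9760, fn=40)

# Naive baseline that always predicts "no fraud".
naive = classification_metrics(tp=0, fp=0, tn=9900, fn=100)

for name, m in (("model", metrics), ("always-negative", naive)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With these made-up counts the always-negative baseline has higher accuracy (0.99 versus 0.982) while its recall is zero, which is why accuracy alone is a weak guide when positives are rare.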

Metric Selection Traps

Situation	Better Focus	Why
Rare fraud events	Precision, recall, PR curve, false positives	Accuracy can look high by predicting “no fraud”
Safety-critical missed events	Recall / false negative rate	Missing positives is costly
Manual review capacity is limited	Precision and alert volume	Too many false positives overwhelm teams
Ranking customers by risk	AUC, lift, calibration, decile analysis	Threshold may be chosen later
Probability used in pricing or capital	Calibration	Ranking alone is insufficient
Highly imbalanced data	Precision-recall metrics	ROC-AUC may appear optimistic
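The calibration row is easier to remember with a small example. The sketch below bins predicted probabilities and compares the average prediction with the observed event rate in each bin. It is a simplified illustration with invented data; `calibration_table` is a hypothetical helper, and equal-width bins are an assumption rather than a required method.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Compare mean predicted probability with observed event rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append({"bin": i, "n": len(members),
                     "mean_pred": mean_pred, "observed": observed})
    return rows

# Invented scores: the model separates low and high risk, but is overconfident.
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]

table = calibration_table(probs, outcomes)
for row in table:
    print(row)
```

In this toy data the ranking is sensible (the high-score bin has more events), yet predictions of about 0.1 correspond to an observed rate of 0.25 and predictions of about 0.9 to 0.75. A model like this could rank well and still misprice risk if its probabilities were used directly.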

Regression Metrics

Metric	Use	Trap
Mean absolute error	Average absolute prediction error	Easy to interpret but treats all errors linearly
Mean squared error	Penalizes large errors	Sensitive to outliers
Root mean squared error	Error in original units	Still outlier-sensitive
R-squared	Share of variance explained	Can look strong on development data yet fail to hold out of sample
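The regression metrics above can be computed directly from prediction errors. The following sketch uses invented numbers and a hypothetical `regression_metrics` helper; it assumes equal-length lists of actual and predicted values.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R-squared for paired actual/predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Invented data: three small errors and one large miss on the last observation.
actual = [10, 20, 30, 40]
predicted = [12, 18, 30, 50]

result = regression_metrics(actual, predicted)
print({k: round(v, 3) for k, v in result.items()})
```

Here MAE is 3.5 while RMSE is roughly 5.2: the single large miss pulls the squared-error metrics up far more than the absolute-error metric, which is the outlier sensitivity noted in the table.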

Validation and Independent Challenge

What Validation Should Cover

Validation Area	Questions to Ask
Conceptual soundness	Does the model approach make sense for the use case?
Data suitability	Is the data accurate, representative, and available at decision time?
Implementation	Was the model coded and deployed correctly?
Performance	Does it perform well on relevant samples and segments?
Stability	Does performance hold over time and across conditions?
Explainability	Can users and reviewers understand drivers and limitations?
Fairness	Are outcomes unfairly adverse for groups or segments?
Limitations	Are weaknesses documented and accepted?
Monitoring plan	Are thresholds, owners, and actions defined?

Validation Is Not the Same as Monitoring

Activity	Timing	Purpose
Validation	Before approval and after material change	Assess fitness for intended use
Monitoring	Ongoing after deployment	Detect degradation, drift, misuse, or control failures
Audit	Periodic independent assurance	Assess whether governance and controls work
Development testing	During model build	Improve the model, not independently challenge it

Common Validation Mistakes

Validating only accuracy while ignoring data quality, implementation, fairness, and explainability.
Treating vendor documentation as a substitute for internal review.
Testing on random splits when time-based splits are more appropriate.
Ignoring subgroup performance because aggregate performance looks strong.
Approving a model without defined monitoring thresholds.
Failing to document known limitations and compensating controls.

Explainability and Transparency

Explainability Methods

Method	Best For	Limitation
Feature importance	Global driver overview	May not explain individual decisions
Partial dependence plot	Average effect of a feature	Can mislead when features are correlated
SHAP-style explanation	Local and global contribution analysis	Can be complex and computationally expensive
LIME-style explanation	Local approximation around one prediction	Approximation may be unstable
Counterfactual explanation	“What would need to change?”	Must be realistic and actionable
Surrogate model	Simple model approximates complex model	Approximation is not the true model
Model cards / documentation	Communicate intended use, data, limits	Only useful if accurate and maintained

Explanation Audience Matters

Audience	Needs
Model developer	Technical diagnostics and feature behavior
Validator	Assumptions, limitations, robustness, implementation evidence
Business owner	Decision drivers, risk trade-offs, control implications
Customer or affected party	Clear, actionable reason for outcome where applicable
Senior management	Material risk, residual risk, accountability, escalation

Explainability Traps

A model can be explainable but still biased.
A model can be accurate but not appropriate for a high-impact decision.
Global explanations do not necessarily explain a specific individual outcome.
Removing complex algorithms does not automatically remove risk.
More explanation is not always better; the explanation must be truthful, relevant, and usable.

Fairness, Bias, and Responsible AI

Sources of Bias

Source	Description	Example
Historical bias	Past decisions reflect unfair patterns	Historical lending approvals encode discrimination
Sampling bias	Training data underrepresents a group	Few observations for a region or customer type
Measurement bias	Variables measure groups differently	Inconsistent income verification methods
Label bias	Target variable reflects imperfect process	Fraud labels only for investigated cases
Proxy bias	Neutral-looking variable captures sensitive trait	Location or education as proxy
Deployment bias	Model used differently than intended	Advisory score becomes automatic rejection

Fairness Metrics: Conceptual Differences

Concept	Basic Idea	Trap
Demographic parity	Similar positive prediction rates across groups	May ignore true risk differences
Equal opportunity	Similar true positive rates across groups	Looks only at true positives; false positive rates may still differ
Equalized odds	Similar true positive and false positive rates	Often hard to satisfy with other goals
Calibration by group	Predicted probabilities mean the same across groups	May conflict with parity metrics
Individual fairness	Similar individuals treated similarly	Requires defining “similar” appropriately

High-Yield Fairness Decision Rules

Do not assume fairness because protected attributes are excluded. Proxies may remain.
Do not assume equal accuracy means equal impact. Error types may differ by group.
Do not assume one fairness metric solves all fairness concerns. Metrics can conflict.
Do test subgroup performance. Aggregate metrics can hide harm.
Do connect fairness findings to governance. Someone must decide, document, and monitor residual risk.
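The subgroup testing described above can be illustrated with a short sketch that computes per-group selection rate, true positive rate, and false positive rate. The data, group labels, and the `group_rates` helper are invented for illustration; real fairness testing would also need adequate sample sizes and a governance decision on which metric matters for the use case.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate, and false positive rate."""
    counts = {}
    for y, p, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"n": 0, "pred_pos": 0, "pos": 0, "neg": 0, "tp": 0, "fp": 0})
        c["n"] += 1
        c["pred_pos"] += p
        c["pos"] += y
        c["neg"] += 1 - y
        c["tp"] += 1 if (y == 1 and p == 1) else 0
        c["fp"] += 1 if (y == 0 and p == 1) else 0
    return {
        g: {
            "selection_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
        }
        for g, c in counts.items()
    }

# Invented outcomes (1 = favorable) for two hypothetical groups, A and B.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
parity_gap = rates["A"]["selection_rate"] - rates["B"]["selection_rate"]
tpr_gap = rates["A"]["tpr"] - rates["B"]["tpr"]
print(rates, parity_gap, tpr_gap)
```

In this toy data both groups have the same overall accuracy (6 of 8 correct in total, 3 of 4 in each group), yet the selection-rate gap and the true-positive-rate gap are both 0.5. Equal accuracy did not mean equal impact, because the error types differ by group.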

Generative AI and LLM Risk

GenAI Risk Quick Review

Risk	Meaning	Control
Hallucination	Plausible but false output	Grounding, retrieval, citations, human review
Prompt injection	User input or embedded content manipulates model instructions	Input filtering, instruction hierarchy, sandboxing
Data leakage	Sensitive data exposed in prompts, outputs, or logs	Data minimization, masking, access controls
Toxic or harmful output	Unsafe, biased, or inappropriate generation	Guardrails, moderation, red teaming
Model misuse	Users rely on outputs beyond intended use	Usage policy, training, disclaimers, monitoring
Overreliance	Human accepts output without review	Human-in-the-loop, confidence indicators
Model drift/version change	Provider updates affect behavior	Version tracking, regression testing
Retrieval error	RAG system retrieves wrong or stale context	Curated knowledge base, freshness controls
Automation bias	Users defer to AI recommendation	Review requirements, challenge prompts

RAG and Grounding

Retrieval-augmented generation, or RAG, connects a generative model to external documents or databases. It can reduce hallucination, but it does not eliminate risk.

RAG Component	Risk
Source documents	May be stale, wrong, or unauthorized
Retrieval ranking	May retrieve irrelevant context
Prompt assembly	May expose sensitive data
Generation	May distort retrieved content
Citation output	May cite sources incorrectly
User interface	May encourage overtrust
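One way to picture a source-document control is a filter that runs between retrieval and prompt assembly. The sketch below is a simplified assumption about how such a check might look: the document fields (`approved`, `last_reviewed`), the one-year window, and the `filter_context` helper are all hypothetical.

```python
from datetime import date, timedelta

def filter_context(docs, today, max_age_days=365):
    """Keep only approved documents reviewed within the freshness window."""
    cutoff = today - timedelta(days=max_age_days)
    return [d for d in docs if d["approved"] and d["last_reviewed"] >= cutoff]

# Hypothetical retrieved documents with curation metadata.
retrieved = [
    {"id": "policy-v3", "approved": True, "last_reviewed": date(2025, 11, 1)},
    {"id": "policy-v1", "approved": True, "last_reviewed": date(2021, 4, 15)},   # stale
    {"id": "draft-memo", "approved": False, "last_reviewed": date(2025, 12, 1)}, # unauthorized
]

usable = filter_context(retrieved, today=date(2026, 1, 15))
print([d["id"] for d in usable])
```

A filter like this addresses only stale or unauthorized sources. Retrieval ranking, generation, and citation errors from the table still need their own controls, such as output verification and human review.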

GenAI Control Stack

Layer	Examples
Use-case control	Approved use cases, prohibited uses, risk tiering
Data control	No sensitive data in prompts unless authorized and protected
Prompt control	Templates, system instructions, prompt injection defenses
Model control	Approved models, versioning, performance testing
Output control	Human review, moderation, citations, confidence warnings
Access control	Role-based access, logging, authentication
Monitoring	Usage analytics, incidents, quality samples, abuse detection
Incident response	Escalation, containment, user notification process where applicable

Cyber, Operational, and Third-Party AI Risk

Adversarial and Cyber Risks

Attack / Risk	Description	Likely Control
Data poisoning	Training data manipulated	Data provenance, anomaly checks, trusted sources
Evasion attack	Inputs crafted to avoid detection	Robust testing, adversarial testing
Model extraction	Attacker replicates model through queries	Rate limits, monitoring, API controls
Membership inference	Attacker infers whether data was in training set	Privacy controls, differential privacy where appropriate
Prompt injection	Malicious instructions override intended behavior	Prompt defenses, tool-use restrictions
Jailbreak	User bypasses safety constraints	Red teaming, guardrails, monitoring
Supply chain risk	Dependency, model, or library compromised	Vendor review, dependency management

Operational Risk Questions

Ask these for any AI deployment:

Who can access the system?
What happens if the model fails or becomes unavailable?
Can humans override the model?
Are overrides tracked and reviewed?
Is there a rollback plan?
Are model versions controlled?
Are inputs and outputs logged appropriately?
Are incidents escalated?
Are users trained on limitations?
Is the model being used only for its approved purpose?

Third-Party and Vendor AI

Due Diligence Area	What to Review
Intended use	Is the vendor solution appropriate for the business decision?
Data usage	What data is sent, stored, retained, or used for training?
Model transparency	What documentation, limitations, and testing evidence are available?
Security	Access controls, encryption, incident procedures
Resilience	Availability, service continuity, fallback options
Change management	How are updates communicated and tested?
Audit rights	Can the organization obtain needed assurance?
Exit strategy	Can the organization replace the service if needed?

Monitoring, Drift, and Ongoing Control

What to Monitor

Monitoring Area	Examples
Input data	Missing values, distributions, outliers, population shifts
Output data	Score distributions, approval rates, alert volume
Performance	Accuracy, recall, precision, error rates, calibration
Fairness	Group outcomes, error rates, adverse impact indicators
Stability	Drift, volatility, threshold breaches
Usage	Approved vs. actual use, user behavior, overrides
Operations	Latency, availability, failures, incidents
GenAI quality	Hallucination samples, unsafe outputs, user feedback
Security	Suspicious queries, abuse, unauthorized access
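Monitoring is only useful when each metric has a limit, an owner, and a defined action. The sketch below shows one hypothetical way to encode that; the `MonitoringRule` fields, limits, owners, and actions are illustrative assumptions, not prescribed values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MonitoringRule:
    metric: str
    limit: float
    breach_if: str  # "below" or "above"
    owner: str
    action: str

def find_breaches(rules, observed):
    """Return (metric, value, owner, action) for every rule whose limit is breached."""
    breaches = []
    for rule in rules:
        value = observed.get(rule.metric)
        if value is None:
            continue  # in practice a missing metric deserves its own alert
        if rule.breach_if == "below":
            breached = value < rule.limit
        else:
            breached = value > rule.limit
        if breached:
            breaches.append((rule.metric, value, rule.owner, rule.action))
    return breaches

# Hypothetical rules and one period of observed values.
rules = [
    MonitoringRule("recall", 0.70, "below", "Model owner", "Investigate drift; consider retraining"),
    MonitoringRule("alert_volume", 500, "above", "Operations lead", "Review threshold and staffing"),
    MonitoringRule("approval_rate_gap", 0.10, "above", "Risk function", "Escalate for fairness review"),
]
observed = {"recall": 0.62, "alert_volume": 430, "approval_rate_gap": 0.14}

breaches = find_breaches(rules, observed)
for metric, value, owner, action in breaches:
    print(f"{metric}={value} -> {owner}: {action}")
```

Keeping the owner and action next to the limit makes it harder for a breach to be flagged and then ignored, which is the weak signal called out under escalation.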

Drift Types

Drift Type	Meaning	Example
Data drift	Input distribution changes	New customer segment uses product
Concept drift	Relationship between inputs and target changes	Fraud patterns evolve
Prediction drift	Output distribution changes	Sudden spike in high-risk scores
Performance drift	Actual model quality declines	Recall falls after market change
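Data drift is often quantified by comparing the binned distribution of an input at development time with its distribution in production. The sketch below uses the population stability index (PSI), a metric this review does not name but that is widely used for the purpose; the bin counts are invented and `population_stability_index` is a hypothetical helper.

```python
import math

def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # floor avoids log(0) for empty bins
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi

# Invented counts per score band: development sample vs. two production periods.
development = [250, 250, 250, 250]
stable_period = [250, 250, 250, 250]
shifted_period = [100, 200, 300, 400]

psi_stable = population_stability_index(development, stable_period)
psi_shifted = population_stability_index(development, shifted_period)
print(round(psi_stable, 4), round(psi_shifted, 4))
```

An identical distribution gives a PSI of zero, and the shifted period gives a clearly positive value. What level counts as a breach is a tolerance the organization has to set and document; the metric itself says nothing about whether performance has actually degraded.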

Revalidation Triggers

Material model change.
New data source or feature set.
New use case or user population.
Significant performance degradation.
Significant drift or threshold breach.
Vendor model update.
Change in operating environment.
Incident, complaint trend, or unexpected harm.
Regulatory, policy, or governance framework change where relevant.

Risk Assessment and Control Thinking

Inherent vs. Residual Risk

Term	Meaning
Inherent risk	Risk before controls
Control	Measure designed to prevent, detect, or correct risk
Residual risk	Risk remaining after controls
Risk appetite	Level of risk the organization is willing to accept
Risk tolerance	Specific thresholds or limits supporting appetite

Control Types

Control Type	Purpose	Example
Preventive	Stop issue before it occurs	Access restriction, approved feature list
Detective	Identify issue after or during occurrence	Drift monitoring, exception reports
Corrective	Fix or mitigate issue	Rollback, retraining, incident remediation
Compensating	Reduce risk when primary control is imperfect	Human review for low-explainability model

Control Matching Drill

If the Problem Is…	A Stronger Control Is Usually…
Unclear accountability	Named model owner and governance approval
Poor data lineage	Data documentation and lineage controls
Overfitting	Out-of-sample testing, regularization, simpler model
Biased outcomes	Fairness testing, feature review, governance decision
Hallucination	RAG, human review, output verification
Prompt injection	Input filtering, tool restrictions, red teaming
Vendor opacity	Due diligence, monitoring, contractual assurance
Drift	Monitoring thresholds and retraining process
Overreliance	Human-in-the-loop and user training
Unapproved usage	Access control and usage monitoring

Scenario Decision Rules

Choose the Best Answer by Asking

What is the primary risk? Data, model, governance, fairness, cyber, operational, or third-party?
Where in the lifecycle is the issue? Development, validation, deployment, monitoring, or change?
Is the control preventive, detective, or corrective?
Who should own the action? Developer, business owner, risk function, validator, audit, vendor manager?
Is the proposed action sufficient for materiality?
Does the answer confuse performance with governance?
Does the answer rely on a simplistic fix, such as “remove the variable” or “use a more accurate model”?

Common “Best Answer” Patterns

Scenario	Strong Answer
Model performs well overall but poorly for one group	Investigate subgroup performance, fairness, data quality, and mitigation
New vendor AI tool is proposed	Conduct due diligence, assess data/security/model risk, define monitoring
LLM gives confident false answers	Add grounding, verification, human review, and monitoring
Model accuracy declines after launch	Investigate drift, data changes, implementation, and retraining triggers
Business wants to bypass validation to meet deadline	Escalate governance issue; do not skip independent review for material models
Feature is highly predictive but may be a proxy	Test for proxy effects and fairness implications
Model is used for a new decision	Reassess intended use, materiality, validation, and approval
Users ignore model limitations	Improve training, interface controls, oversight, and documentation

Candidate Mistakes to Avoid

Equating AI risk management with model accuracy only.
Forgetting that data problems can dominate model problems.
Treating explainability as the same thing as fairness.
Assuming a black-box model is unacceptable in all cases.
Assuming a simple model is automatically low risk.
Ignoring human process risk around the model.
Overlooking production implementation and monitoring.
Thinking vendor AI removes internal responsibility.
Choosing a technical fix when the scenario is actually a governance failure.
Choosing a governance policy when the scenario needs a specific operational control.
Ignoring materiality: higher-impact use cases require stronger control.
Relying on aggregate metrics without segment analysis.
Treating GenAI outputs as reliable because they sound confident.
Forgetting that monitoring must have thresholds, owners, and escalation.

Fast Final Review Checklist

Before you move into topic drills or a mock exam, make sure you can answer these quickly:

Can you distinguish supervised, unsupervised, reinforcement, and generative AI?
Can you identify data leakage in a scenario?
Can you choose the right metric for imbalanced classification?
Can you explain overfitting and how to reduce it?
Can you separate development testing, validation, monitoring, and audit?
Can you identify fairness risks even when protected variables are removed?
Can you match explainability tools to the audience and decision type?
Can you name practical controls for hallucination and prompt injection?
Can you explain why vendor AI still requires internal oversight?
Can you identify when revalidation is needed?
Can you connect materiality to stronger governance?
Can you distinguish inherent risk, controls, and residual risk?

How to Use Practice Questions After This Review

Use this Quick Review as a bridge into original practice questions. For the GARP Risk and AI Certificate (RAI), efficient practice should include:

Topic drills for data risk, validation, fairness, explainability, GenAI, and governance.
Scenario questions that require selecting the best control, not just defining terms.
Mock exams to practice pacing and mixed-topic recognition.
Detailed explanations to understand why tempting answers are incomplete or misaligned.

Next step: work through independent companion practice questions by topic, review every explanation, and keep a short error log of missed concepts, especially around data leakage, model monitoring, fairness trade-offs, vendor risk, and generative AI controls.

Study Plan