RAI — GARP Risk and AI Certificate Exam Blueprint

Last revised: July 1, 2026

Practical exam blueprint for the GARP Risk and AI Certificate (RAI), covering AI risk governance, model lifecycle, validation, data, controls, and finance use cases.

How to Use This Exam Blueprint

This independent checklist is for candidates preparing for GARP's Risk and AI Certificate (RAI) exam. Use it as a practical study blueprint: identify the topic areas you can explain, the scenarios you can handle, and the weak spots to revisit before exam day.

Because official weights can change, this page does not assign point values or imply section weightings. Treat every area below as a readiness area for applied AI risk judgment in a finance context.

A strong candidate should be able to:

Explain AI and machine learning concepts in risk-management language.
Identify data, model, operational, compliance, and governance risks across the AI lifecycle.
Evaluate controls for traditional predictive models and newer generative AI systems.
Interpret model performance, fairness, explainability, and monitoring evidence.
Choose appropriate risk responses in finance scenarios involving credit, markets, fraud, operations, compliance, vendors, and client-facing tools.

Topic-area readiness map

Readiness area	What to review	You are ready when you can…	Quick self-test
AI and machine learning foundations	Supervised learning, unsupervised learning, reinforcement learning, generative AI, model training, features, labels, parameters, hyperparameters	Classify an AI use case by model type and explain how the model learns	Is a fraud clustering model supervised or unsupervised? What changes if labels are added?
Finance risk context	Credit risk, market risk, liquidity risk, operational risk, model risk, compliance risk, conduct risk, reputational risk	Connect an AI failure mode to a financial risk consequence	If an AI limit-monitoring tool misses a breach, which risks are implicated?
Data lifecycle and data risk	Data sourcing, lineage, quality, representativeness, missing values, outliers, leakage, privacy, retention	Spot when a model problem is primarily a data problem	What is the risk of training on data that includes post-decision outcomes?
Model development lifecycle	Problem framing, target definition, feature engineering, training, testing, tuning, deployment	Explain how design choices create or reduce risk	Why can a poorly defined target produce a high-performing but unsuitable model?
Model evaluation	Accuracy, precision, recall, specificity, F1, ROC/AUC, calibration, MAE, RMSE, overfitting, underfitting	Select evaluation metrics based on the business error cost	In fraud detection, why might recall matter more than accuracy?
Validation and independent challenge	Conceptual soundness, outcomes analysis, benchmark/challenger models, sensitivity testing, stress testing, documentation review	Distinguish model development from validation and challenge	What should a validator question even if back-test results look strong?
Governance and accountability	Model inventory, ownership, approval, escalation, policy exceptions, committees, documentation, audit trails	Identify who should approve, monitor, and challenge AI use	Who owns a vendor model used in a bank decision process?
Explainability and transparency	Global vs local explanations, feature importance, reason codes, model cards, limitations, human interpretability	Match the level of explanation to model risk and use case	What explanation is needed for an adverse credit decision versus a marketing segmentation model?
Bias, fairness, and ethics	Proxy variables, disparate impact, sampling bias, measurement bias, fairness metrics, human oversight	Recognize fairness concerns even when protected attributes are not used directly	Can ZIP code, school, or employment history act as a proxy?
Generative AI and LLM risk	Prompting, hallucination, retrieval, training data exposure, prompt injection, output controls, human review	Identify controls for AI-generated text, code, summaries, and decisions	What controls are needed before using an LLM to draft risk reports?
Cybersecurity and resilience	Access control, adversarial attacks, data poisoning, model extraction, prompt injection, logging, incident response	Explain how AI systems can be attacked or misused	How could a malicious prompt alter an internal chatbot’s output?
Third-party and vendor AI risk	Due diligence, contractual controls, model opacity, service levels, data sharing, audit rights, concentration risk	Evaluate vendor risk without assuming “vendor-owned” means “risk-free”	What evidence should be requested for a black-box scoring model?
Monitoring and change management	Drift, stability, performance decay, threshold changes, retraining, versioning, production controls	Define what should be monitored after deployment	What does a sudden drop in approval rates require you to investigate?
Regulation, compliance, and professional conduct	Governance expectations, documentation, disclosures, privacy, consumer protection, accountability, ethics	Apply principles without relying on a single jurisdiction-specific rule	What makes an AI control defensible to a regulator, auditor, or risk committee?

Core “can you do this?” checklist

Use this section as a fast readiness test. If you cannot do an item without notes, mark it for review.

AI and model concepts

Distinguish AI, machine learning, deep learning, natural language processing, and generative AI.
Explain the difference between supervised, unsupervised, semi-supervised, and reinforcement learning.
Identify examples of classification, regression, clustering, ranking, anomaly detection, recommendation, and text generation.
Explain training, validation, test, and production datasets in plain language.
Describe the role of features, labels, targets, parameters, hyperparameters, loss functions, and thresholds.
Explain overfitting, underfitting, data leakage, concept drift, and model decay.
Describe why a model can be statistically accurate but still unsuitable, unfair, unstable, or noncompliant.
Explain why correlation, prediction, and causation are not the same.

Risk governance and lifecycle controls

Map an AI system from use-case proposal through retirement.
Identify control points before development, before deployment, and after deployment.
Define model owner, business owner, developer, validator, approver, user, and auditor roles.
Explain why high-impact finance use cases need stronger governance than low-impact internal productivity tools.
Identify when a model should be escalated for independent validation.
Recognize when a change is material enough to require reapproval or revalidation.
Describe what belongs in a model inventory.
Explain why documentation is a control, not just an administrative task.

Data and feature risk

Assess whether data is complete, accurate, timely, representative, and fit for purpose.
Identify missing-value, outlier, duplicate, stale-data, and inconsistent-definition risks.
Detect target leakage and training-serving skew.
Explain data lineage from source system to model input.
Identify privacy and confidentiality risks in training, testing, prompting, logging, and output storage.
Recognize proxy variables that may encode sensitive or protected characteristics.
Explain why historical bias can be learned and amplified by an AI system.
Describe controls for sensitive data access, masking, minimization, retention, and deletion.
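Target leakage often appears as a timing problem: a feature whose value was only recorded after the decision it is supposed to predict. The sketch below is a minimal point-in-time check. It assumes each feature carries a hypothetical `available_on` date showing when its value was actually known. The class name, function name, and feature names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class FeatureRecord:
    name: str
    available_on: date  # assumed field: date the value was actually known


def leaked_features(features: List[FeatureRecord], decision_date: date) -> List[str]:
    """Return names of features that were not yet known on the decision date.

    Values recorded after the decision (for example, a later delinquency flag)
    can make back-tests look strong but will not exist when the model is used
    live. Same-day values are treated as available; intraday timing may need
    a finer timestamp.
    """
    return [f.name for f in features if f.available_on > decision_date]


if __name__ == "__main__":
    decision = date(2025, 3, 1)  # illustrative application date
    features = [
        FeatureRecord("stated_income", date(2025, 2, 27)),
        FeatureRecord("bureau_score", date(2025, 3, 1)),
        FeatureRecord("days_past_due_90", date(2025, 9, 15)),  # post-decision outcome
    ]
    print(leaked_features(features, decision))
```

A real check would read availability dates from data lineage records rather than hand-entered values.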

Model evaluation and validation

Interpret confusion-matrix outcomes: true positives, false positives, true negatives, and false negatives.
Choose between precision, recall, specificity, accuracy, F1, ROC/AUC, calibration, MAE, RMSE, and stability metrics based on the use case.
Explain why class imbalance can make accuracy misleading.
Describe validation tests for conceptual soundness, data quality, performance, sensitivity, stability, and implementation.
Explain benchmark and challenger model analysis.
Identify limitations of back-testing when market regimes, customer behavior, or fraud patterns change.
Distinguish model validation from internal audit, quality assurance, and business sign-off.
Explain how validation findings should be rated, remediated, and tracked.

Explainability, transparency, and accountability

Distinguish global explanations from local explanations.
Explain feature importance, partial dependence, reason codes, and local explanations at a conceptual level.
Identify when explainability is needed for users, customers, validators, auditors, or regulators.
Explain the trade-off between model complexity and interpretability.
Recognize when a simpler model may be preferred because it is more transparent, stable, or controllable.
Describe how documentation, user training, and human review support accountability.
Identify when “the AI said so” is not an acceptable decision rationale.

Generative AI and LLM risk

Explain hallucination, prompt injection, data leakage, insecure output, copyright/intellectual-property risk, and overreliance.
Distinguish open-ended generation from deterministic scoring or classification.
Identify controls for LLM prompts, retrieved context, output review, logging, access, and prohibited use.
Explain why an LLM-generated answer may be fluent but wrong.
Describe retrieval-augmented generation at a conceptual level and its risk benefits and limitations.
Identify when human review is required before using generated text in risk, compliance, client, or regulatory communication.
Recognize that using a third-party LLM can create data, vendor, operational, and compliance risk.

Metrics and calculation readiness

The RAI exam may emphasize applied judgment more than computation, but you should be comfortable interpreting common model metrics. Do not memorize formulas in isolation; know what each metric means and when it can mislead.

Classification metrics

Use these terms consistently:

Term	Meaning	Common risk interpretation
True positive	Model correctly identifies the positive class	Correctly flags fraud, default, breach, or event
False positive	Model flags positive when actual is negative	Unnecessary review, declined good customer, operational burden
True negative	Model correctly identifies the negative class	Correctly clears a normal case
False negative	Model misses an actual positive	Undetected fraud, missed default, missed breach, uncontrolled exposure

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{Recall or Sensitivity} = \frac{TP}{TP + FN} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \]

\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
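The classification formulas translate directly into code. The sketch below is minimal and assumes confusion-matrix counts are already available. The names `ConfusionCounts` and `classification_metrics` are illustrative. Returning 0.0 on a zero denominator is a convention chosen here; some libraries warn or return NaN instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier at one fixed threshold."""
    tp: int  # events correctly flagged
    fp: int  # non-events flagged in error
    tn: int  # non-events correctly cleared
    fn: int  # events missed


def safe_ratio(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (e.g. nothing flagged)."""
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    precision = safe_ratio(c.tp, c.tp + c.fp)
    recall = safe_ratio(c.tp, c.tp + c.fn)  # also called sensitivity
    return {
        "accuracy": safe_ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_ratio(c.tn, c.tn + c.fp),
        # F1 is the harmonic mean of precision and recall
        "f1": safe_ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Illustrative fraud-alert counts, not real data
    counts = ConfusionCounts(tp=40, fp=60, tn=880, fn=20)
    for name, value in classification_metrics(counts).items():
        print(f"{name:<12}{value:.3f}")
```

With these illustrative counts, accuracy is 0.92 while precision is 0.40. Most alerts are therefore false alarms, even though overall accuracy looks high.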

Readiness checks:

Explain why high accuracy can be meaningless when the positive class is rare.
Explain the trade-off between false positives and false negatives.
Choose recall-focused controls when missing the event is costly.
Choose precision-focused controls when excessive false alarms are costly.
Explain why changing a classification threshold changes business outcomes.
Connect threshold decisions to risk appetite, capacity, customer impact, and compliance.
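A small simulation shows two of these checks together: accuracy on a rare positive class, and the effect of moving the threshold. The scored population below is invented for illustration. It has 1,000 cases, 10 of which are fraud, with scores placed by hand. A threshold above every score stands in for a model that flags nothing.

```python
# Illustrative scored population: 990 legitimate cases (label 0) and 10 frauds (label 1).
SCORES = [0.10] * 950 + [0.45] * 30 + [0.75] * 10 + [0.20] * 2 + [0.50] * 3 + [0.90] * 5
LABELS = [0] * 990 + [1] * 10


def confusion_at_threshold(scores, labels, threshold):
    """Return (tp, fp, tn, fn) when every case scoring >= threshold is flagged."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(threshold, scores=SCORES, labels=LABELS):
    tp, fp, tn, fn = confusion_at_threshold(scores, labels, threshold)
    return {
        "accuracy": (tp + tn) / len(labels),
        "recall": tp / (tp + fn),  # share of frauds caught
        "alerts": tp + fp,         # workload created for investigators
        "missed": fn,              # frauds that pass undetected
    }


if __name__ == "__main__":
    # 1.01 is above every score, so it behaves like a model that flags nothing.
    for threshold in (1.01, 0.80, 0.40):
        print(threshold, summarize(threshold))
```

Flagging nothing scores 99% accuracy while catching no fraud. Lowering the threshold from 0.80 to 0.40 raises the number of frauds caught from 5 to 8 of 10. It also raises alerts from 5 to 48. This is the capacity and customer-friction trade-off the checks above describe.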

Regression and forecasting metrics

Metric	Use	What to watch
MAE	Average absolute error, in the target's units	Weights every unit of error equally; less sensitive to outliers than RMSE
RMSE	Penalizes larger errors more heavily	Sensitive to outliers
R-squared	Share of the target's variance explained by the model	In-sample value never falls when features are added to a least-squares fit; can look strong in-sample yet fail out of sample
Calibration	Whether predicted probabilities match observed outcomes	Critical when probabilities drive risk decisions
Back-test result	Comparison of predictions with realized outcomes	May fail when conditions change

\[ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \]

\[ MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i| \]
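The difference between the two error formulas shows most clearly when total error is held fixed but its distribution changes. The sketch below is minimal. The forecast values are invented, and the function names are illustrative.

```python
import math


def _paired_errors(actual, predicted):
    # zip() would silently drop unmatched items, so check lengths explicitly
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return [y - y_hat for y, y_hat in zip(actual, predicted)]


def mae(actual, predicted):
    """Mean absolute error, in the target's units; every unit of error counts equally."""
    errors = _paired_errors(actual, predicted)
    return sum(abs(e) for e in errors) / len(errors)


def rmse(actual, predicted):
    """Root mean squared error; squaring gives large misses extra weight."""
    errors = _paired_errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


if __name__ == "__main__":
    actual = [100, 100, 100, 100]      # illustrative realized losses
    forecast_a = [110, 90, 110, 90]    # small misses spread across cases
    forecast_b = [100, 100, 100, 140]  # one large miss
    for label, forecast in (("A", forecast_a), ("B", forecast_b)):
        print(label, mae(actual, forecast), rmse(actual, forecast))
```

Both forecasts miss by 10 on average (MAE = 10). Forecast B, however, puts the entire miss in one case, so its RMSE is 20 against 10 for forecast A.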

Readiness checks:

Explain why a model with lower RMSE may still be less acceptable if errors are concentrated in high-risk segments.
Interpret calibration problems in probability-of-default, fraud-risk, or loss-forecasting settings.
Explain why back-tests should be supplemented by sensitivity, stress, and stability analysis.
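A basic calibration check groups cases by predicted probability and compares each group's average prediction with its observed event rate. The sketch below is minimal and assumes binary outcomes (1 = event) with equal-width bins. The function name and data are illustrative.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Compare mean predicted probability with observed event rate per bin.

    Uses equal-width bins on [0, 1]; returns (bin, count, mean_pred, observed)
    for each non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the top bin
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((i, len(members), mean_pred, observed))
    return rows


if __name__ == "__main__":
    # Illustrative PD model output: 100 loans scored at 2%, 100 at 35%.
    probs = [0.02] * 100 + [0.35] * 100
    defaults = [1] * 6 + [0] * 94 + [1] * 35 + [0] * 65
    for b, n, pred, obs in calibration_table(probs, defaults):
        print(f"bin {b}: n={n} predicted={pred:.3f} observed={obs:.3f}")
```

In the invented data, the low-risk bin predicts 2% but observes 6%, so the model understates default risk threefold there. The 35% bin is well calibrated. Equal-width bins are only one choice; quantile bins are also common. Bins with few cases give noisy observed rates.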

Drift and stability checks

Concept	What it means	Scenario cue
Data drift	Input data distribution changes	Customer income, transaction patterns, or market variables shift
Concept drift	Relationship between inputs and target changes	Fraudsters adapt; default drivers change after macro stress
Population stability	Scored population differs from development population	New product, new region, changed underwriting strategy
Performance decay	Model output quality declines over time	More overrides, complaints, losses, or investigation misses

Population Stability Index is one common way to summarize distribution shift:

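\[ PSI = \sum_i (\text{Actual}_i - \text{Expected}_i)\ln\left(\frac{\text{Actual}_i}{\text{Expected}_i}\right) \]

where Expected_i is the share of the development (reference) population in bin i, and Actual_i is the share of the current population in the same bin.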
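The PSI formula can be computed over pre-defined bins as shown below. This is a minimal sketch. It assumes the bin shares are already computed for the development (expected) and current (actual) populations. The small floor `eps` for empty bins is an assumption added here, because ln(0) and division by zero are undefined.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index over pre-defined bins.

    expected: share of the development (reference) population in each bin
    actual:   share of the current population in the same bins
    eps:      floor applied to empty bins so the log term stays defined
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


if __name__ == "__main__":
    development = [0.25, 0.25, 0.25, 0.25]  # illustrative score-band shares
    current = [0.10, 0.20, 0.30, 0.40]
    print(round(psi(development, current), 4))
```

For this four-bin example, the result is about 0.23. Practitioners often quote rough bands, with values below about 0.1 treated as little shift and values above about 0.25 as significant. These bands are conventions, not exam or regulatory standards. PSI shows that the population has shifted; it does not by itself show whether model performance has changed.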

Readiness checks:

Know that no single drift metric proves a model is safe or unsafe.
Explain why monitoring should include data, performance, overrides, complaints, exceptions, and business outcomes.
Identify when drift requires investigation, threshold adjustment, retraining, fallback controls, or model retirement.

Finance scenario and decision-point checks

Applied scenario table

Scenario cue	Main issue being tested	Better answer direction	Common trap
A credit model has strong overall performance but worse outcomes for a demographic segment	Fairness, segmentation, proxy variables, governance	Investigate bias, proxies, data representativeness, adverse impact, explanations, and remediation	Saying the model is acceptable because protected attributes were excluded
A fraud model flags many legitimate transactions	Precision/recall trade-off, customer impact, operations capacity	Review thresholds, false-positive costs, customer friction, escalation workflow, and monitoring	Optimizing only for maximum fraud capture
A model performs well in development but poorly after launch	Implementation error, drift, training-serving skew, monitoring	Compare production inputs, code versions, thresholds, data definitions, and population changes	Assuming the model was validated once, so the issue is not model risk
A vendor provides a black-box score with limited documentation	Third-party model risk, explainability, accountability	Request methodology, validation evidence, performance by segment, limitations, data controls, and audit rights	Treating vendor secrecy as a reason to skip validation
An LLM summarizes internal risk reports	Hallucination, confidentiality, source grounding, human review	Use approved data sources, retrieval controls, output review, citations where appropriate, and logging	Assuming polished language equals factual accuracy
Staff paste confidential client data into a public AI tool	Privacy, security, data leakage, policy breach	Stop use, contain exposure, investigate, notify internally, strengthen access and training controls	Treating it as only an employee training issue
A trading-support model is retrained after a volatile period	Regime change, validation, market risk, change control	Test performance across regimes, stress assumptions, document changes, approve deployment	Assuming recent data is always more relevant
A customer chatbot gives product guidance	Conduct risk, disclosure, suitability-style concerns, escalation	Limit scope, monitor outputs, provide disclaimers, route complex cases to humans	Letting the chatbot provide unrestricted financial advice
A compliance alert model reduces investigations by suppressing low-risk alerts	False negatives, regulatory exposure, model explainability	Validate suppression logic, review missed-event risk, sample suppressed alerts, monitor outcomes	Celebrating efficiency without checking missed suspicious activity
A model uses social-media-derived features	Data ethics, consent, representativeness, reputational risk	Review legality, fairness, purpose limitation, bias, explainability, and customer expectations	Focusing only on predictive power
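For the first scenario in the table, the starting evidence is outcome rates by segment rather than the overall metric. The sketch below is minimal. It assumes decisions arrive as (segment, approved) pairs, and the segment labels are hypothetical. A gap is a signal to investigate proxies, data representativeness, and explanations; it is not proof of bias on its own.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (segment, approved) pairs, approved being True/False."""
    tally = defaultdict(lambda: [0, 0])  # segment -> [approved count, total count]
    for segment, approved in decisions:
        tally[segment][0] += int(approved)
        tally[segment][1] += 1
    return {seg: ok / total for seg, (ok, total) in tally.items()}


def relative_to_best(rates):
    """Each segment's approval rate divided by the highest segment's rate."""
    best = max(rates.values())
    return {seg: (r / best if best else 0.0) for seg, r in rates.items()}


if __name__ == "__main__":
    # Illustrative decisions for two hypothetical segments, 100 applicants each
    decisions = ([("segment_a", True)] * 80 + [("segment_a", False)] * 20
                 + [("segment_b", True)] * 52 + [("segment_b", False)] * 48)
    rates = approval_rates(decisions)
    print(rates, relative_to_best(rates))
```

Here segment_b is approved at 65% of segment_a's rate. The same grouping can be applied to error rates, default rates, or overrides.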

Decision prompts to practice

For each prompt, practice giving a complete answer in 60 to 90 seconds.

Use-case classification: What decision does the AI system support, who relies on it, and what happens if it is wrong?
Risk materiality: Is the use case customer-impacting, financially material, regulatory-facing, safety-critical, or operationally critical?
Data fitness: Are the data sources authorized, representative, current, complete, and aligned with the intended decision?
Model fitness: Does the model type match the problem, and are its limitations understood by users?
Performance trade-off: Which error is more costly, a false positive or a false negative? Who bears the cost?
Fairness and ethics: Could the model create unjustified differences in outcomes across groups or customer segments?
Explainability: Can the decision be explained to the appropriate audience: user, customer, validator, auditor, or regulator?
Human oversight: Is human review meaningful, trained, documented, and empowered to override the model?
Monitoring: What indicators would show the model is degrading, being misused, or operating outside intended conditions?
Response plan: If the model fails, is there a fallback process, an incident response path, and an owner for remediation?

Lifecycle artifact checklist

Be ready to recognize the purpose of each artifact and what a weak version would look like.

Artifact	What it should show	Weak version or red flag
AI use-case intake form	Business purpose, users, decision impact, data used, risk rating, owner	Vague purpose, no owner, no impact assessment
Model inventory entry	Model name, owner, use case, status, materiality, version, dependencies	Missing vendor tools, spreadsheets, or embedded AI features
Data lineage record	Source systems, transformations, controls, access, retention	Unclear source, manual extracts, undocumented transformations
Development documentation	Problem definition, target, features, training process, assumptions, limitations	Focuses only on performance results
Validation report	Independent review of data, methodology, performance, limitations, implementation, monitoring	No challenge, no limitations, no remediation plan
Model card or factsheet	Intended use, prohibited use, performance, fairness, data, limitations	Marketing-style summary with no risk information
Explainability evidence	Global drivers, local explanations, reason codes where relevant	Black-box output with no decision rationale
Monitoring dashboard	Data drift, performance, overrides, exceptions, incidents, business outcomes	Only shows uptime or model volume
Change log	Versions, retraining events, threshold changes, approvals	Untracked parameter, data, or prompt changes
Incident log	Failure, impact, containment, root cause, remediation, lessons learned	Issues handled informally with no escalation
Vendor due diligence file	Methodology, controls, security, privacy, validation, service levels, auditability	Vendor refuses all evidence but model is still approved
User procedures	How to use outputs, limitations, escalation, override rules	Users treat AI outputs as mandatory decisions
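The model inventory entry row can be read as a record shape. The sketch below is a hypothetical schema built from the fields in that row. It adds an assumed `origin` field so that vendor tools, spreadsheets, and embedded AI features are recorded explicitly rather than silently omitted. It is not a GARP or regulatory template.

```python
from dataclasses import dataclass, field
from typing import List

REQUIRED_FIELDS = ("name", "owner", "use_case", "status", "materiality", "version", "origin")


@dataclass
class InventoryEntry:
    """Hypothetical inventory record based on the artifact table's fields."""
    name: str
    owner: str
    use_case: str
    status: str       # e.g. "in development", "approved", "retired"
    materiality: str  # e.g. "high", "medium", "low"
    version: str
    origin: str       # assumed field: "in-house", "vendor", "spreadsheet", "embedded feature"
    dependencies: List[str] = field(default_factory=list)


def missing_fields(entry: InventoryEntry) -> List[str]:
    """List required fields that are blank, such as an entry with no named owner."""
    return [f for f in REQUIRED_FIELDS if not getattr(entry, f).strip()]


if __name__ == "__main__":
    chatbot = InventoryEntry(
        name="policy-qa-assistant", owner="", use_case="internal policy Q&A",
        status="approved", materiality="medium", version="2.1",
        origin="vendor", dependencies=["vendor LLM API", "policy document index"],
    )
    print(missing_fields(chatbot))
```

A blank owner on an approved entry is the "no owner" red flag from the intake-form row, surfaced automatically rather than found during an audit.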

Governance and control readiness

Lines of accountability

Role or function	What to know for exam readiness
Business owner	Owns the use case, business process, outcomes, and risk acceptance
Model developer	Builds or configures the model and documents methodology and assumptions
Independent validator	Challenges data, design, performance, implementation, monitoring, and limitations
Risk management	Sets standards, oversees risk appetite, reviews material risks, tracks remediation
Compliance/legal/privacy	Reviews regulatory, conduct, privacy, disclosure, and customer-impact concerns
Information security	Reviews access, security architecture, attack surfaces, and incident response
Internal audit	Provides independent assurance over governance and control effectiveness
Senior management or committee	Approves higher-risk use, exceptions, remediation priorities, and accountability

Readiness checks:

Explain why accountability cannot be outsourced to a vendor or algorithm.
Identify when a business owner should not self-approve a high-risk AI model.
Explain why independence matters in validation.
Distinguish policy approval, model approval, deployment approval, and exception approval.
Identify how risk appetite should affect thresholds, controls, monitoring, and escalation.

Control types to recognize

Control type	Examples	What it mitigates
Preventive	Use-case approval, access limits, data minimization, approved model library	Unauthorized or unsuitable AI use
Detective	Monitoring, drift alerts, exception reports, audit logs, output sampling	Model degradation, misuse, policy breaches
Corrective	Rollback, retraining, threshold adjustment, incident remediation, customer correction	Harm after failure or unexpected behavior
Compensating	Human review, manual fallback, dual control, restricted output use	Residual risk when primary controls are limited
Governance	Policy, committee review, inventory, documentation, validation standards	Inconsistent or uncontrolled AI deployment

Generative AI-specific checklist

Generative AI requires different controls from many traditional scoring models because outputs can be open-ended, variable, and difficult to verify.

Risk area	What to check	Ready response
Hallucination	Output may invent facts, sources, numbers, or reasoning	Require grounding, verification, human review, and limited-use policies
Prompt injection	User or document instructions manipulate the model	Sanitize inputs, restrict tool access, separate system instructions, monitor outputs
Data leakage	Confidential data may enter prompts, logs, or vendor systems	Use approved environments, access controls, masking, and retention limits
Overreliance	Users defer to fluent but unverified output	Train users, require review, show limitations, restrict high-impact decisions
Toxic or biased output	Generated content may be unfair, offensive, or discriminatory	Test outputs, apply filters, monitor complaints, include escalation
Inconsistent output	Same prompt may produce different results	Define acceptable variability, use templates, test repeatability where needed
Tool misuse	LLM can trigger actions, retrieve data, or generate code	Limit permissions, log actions, require approvals for high-impact actions
Vendor opacity	Model training and controls may be unclear	Perform due diligence, contract for controls, document residual risk
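The tool-misuse row (limit permissions, log actions, require approval for high-impact actions) can be enforced in ordinary code that sits between the LLM and the systems it can reach. The sketch below is minimal and assumes a hypothetical tool catalogue. The tool names and the `authorize` function are illustrative, not any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Hypothetical catalogue of actions an internal LLM assistant may request.
TOOL_POLICY: Dict[str, Dict[str, bool]] = {
    "search_policy_docs": {"high_impact": False},
    "draft_client_email": {"high_impact": False},
    "send_client_email": {"high_impact": True},
}

AUDIT_LOG: List[dict] = []


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize(tool: str, approver: Optional[str] = None) -> Decision:
    """Gate an LLM-requested action: deny unknown tools, require a named
    human approver for high-impact tools, and log every request."""
    if tool not in TOOL_POLICY:
        decision = Decision(False, "tool not on allowlist")
    elif TOOL_POLICY[tool]["high_impact"] and not approver:
        decision = Decision(False, "high-impact action requires human approval")
    else:
        decision = Decision(True, "permitted")
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "approver": approver,
        "allowed": decision.allowed,
        "reason": decision.reason,
    })
    return decision


if __name__ == "__main__":
    print(authorize("draft_client_email"))
    print(authorize("send_client_email"))
    print(authorize("send_client_email", approver="supervisor_01"))
    print(authorize("export_client_list"))
```

Because the check runs outside the model, an injected prompt can change what the model requests but not what the gate permits. The log records denied requests as well as permitted ones, which gives monitoring evidence of attempted misuse.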

Can you do this?

Explain why LLM output should not be treated as verified fact without controls.
Identify prohibited or restricted uses for generative AI in a finance organization.
Design a control set for an internal chatbot that answers policy questions.
Design a control set for an LLM that drafts customer-facing communications.
Explain why logs, prompts, retrieved documents, and outputs all create risk evidence.
Identify when a generative AI tool should be included in an AI/model inventory.

Finance use-case readiness checks

Use case	Key AI risks	Controls to consider
Credit underwriting or account management	Bias, explainability, data leakage, adverse customer impact, model drift	Fairness testing, reason codes, validation, monitoring, override review
Fraud detection	Class imbalance, false positives, adaptive adversaries, customer friction	Threshold governance, alert sampling, recall/precision review, drift monitoring
Financial crime or compliance surveillance	Missed suspicious activity, alert suppression, explainability, regulator scrutiny	Independent validation, sampling, audit trails, escalation controls
Market risk or trading support	Regime change, data latency, model instability, automation risk	Stress testing, limits, human oversight, change control, kill-switch concepts
Portfolio analytics	Overfitting, unstable correlations, misleading optimization, scenario weakness	Sensitivity testing, benchmark comparison, assumptions review
Liquidity or stress forecasting	Rare-event uncertainty, macro regime changes, data limitations	Scenario analysis, expert challenge, conservative assumptions, monitoring
Customer service chatbot	Misleading answers, privacy exposure, conduct risk, escalation failure	Approved knowledge base, human escalation, output monitoring, disclosure controls
Operations automation	Process errors, exception handling, access control, resilience	Workflow controls, fallback procedures, reconciliation, logging
Vendor risk scoring	Opaque methodology, dependency risk, inconsistent updates	Due diligence, service-level controls, validation evidence, performance monitoring
Employee productivity AI	Confidentiality, unapproved use, inaccurate summaries, policy violations	Usage policy, approved tools, training, data restrictions, review procedures

Common weak areas and traps

Trap	Why it is wrong	Better exam mindset
“The model is accurate, so it is low risk.”	Accuracy may hide bias, instability, explainability gaps, or high-impact errors.	Evaluate performance, use case, impact, controls, and limitations together.
“Protected attributes were excluded, so fairness is solved.”	Proxy variables can reproduce sensitive characteristics.	Test outcomes and drivers, not just input names.
“The vendor owns the model, so the vendor owns the risk.”	The institution still owns its decisions and customer/process impact.	Apply third-party risk and model governance controls.
“Validation is a one-time pre-launch step.”	Models degrade and environments change.	Monitor, revalidate after material change, and track performance over time.
“Human in the loop always reduces risk.”	Human review can be superficial, biased, overloaded, or ignored.	Ensure review is trained, documented, empowered, and effective.
“More complex models are always better.”	Complexity can reduce transparency, stability, and control.	Match complexity to use case, evidence, and risk appetite.
“Generative AI is just another predictive model.”	LLMs can hallucinate, leak data, follow malicious prompts, and create open-ended outputs.	Apply generation-specific controls.
“Back-testing proves future performance.”	Future regimes may differ from historical data.	Combine back-testing with stress, sensitivity, drift, and expert challenge.
“Monitoring only means checking uptime.”	A model can be available but wrong, biased, stale, or misused.	Monitor data, outputs, outcomes, exceptions, and incidents.
“Documentation is optional if the team understands the model.”	Staff change, audits occur, and decisions need evidence.	Treat documentation as a core control.
“Threshold changes are minor tuning.”	Thresholds can materially change approvals, alerts, losses, and customer impact.	Govern thresholds as decision controls.
“AI policy is only a technology issue.”	AI risk spans business, legal, compliance, risk, privacy, security, and operations.	Use cross-functional governance.

Final-week review checklist

High-yield final review tasks

Task	Done?
Rebuild the AI lifecycle from memory: intake, design, data, development, validation, approval, deployment, monitoring, change, retirement.	[ ]
Create a one-page table of model risks and controls by lifecycle stage.	[ ]
Review confusion-matrix metrics and practice choosing the right metric for a scenario.	[ ]
Practice explaining false positives and false negatives in credit, fraud, compliance, and customer-service examples.	[ ]
Review data leakage, proxy bias, drift, overfitting, and model explainability.	[ ]
Review generative AI risks: hallucination, prompt injection, data leakage, overreliance, and output governance.	[ ]
Practice third-party AI scenarios where the model is opaque or vendor-controlled.	[ ]
Review artifacts: model inventory, validation report, model card, monitoring dashboard, change log, incident log.	[ ]
Practice deciding when to approve, reject, restrict, escalate, retrain, or retire an AI system.	[ ]
Make a list of your top five weak areas and answer scenario questions for each.	[ ]

Last-pass “ready or not” questions

You are close to ready when you can answer each without guessing:

What makes an AI use case high risk?
What evidence should be reviewed before deployment?
What evidence should be monitored after deployment?
How can a model be biased without using protected attributes?
How can strong historical performance fail in a new environment?
What is the difference between model development, validation, governance, and audit?
What controls reduce LLM hallucination and data leakage risk?
When should a vendor AI tool be challenged or restricted?
How do you choose between precision and recall?
What should happen after a material model change?

Practical next step

Pick the three checklist areas with the most unchecked items. For each one, write a short scenario, identify the AI risk, choose the control response, and explain why weaker alternatives fall short. Then move to timed practice so you can apply the checklist quickly under exam conditions.
