RAI — GARP Risk and AI Certificate Exam Blueprint

Practical exam blueprint for the GARP Risk and AI Certificate (RAI), covering AI risk governance, model lifecycle, validation, data, controls, and finance use cases.

How to Use This Exam Blueprint

This independent checklist is for candidates preparing for GARP's Risk and AI Certificate exam (exam code RAI). Use it as a practical study blueprint: identify the topic areas you can explain, the scenarios you can handle, and the weak spots to revisit before exam day.

Because official weights can change, this page does not assign point values or imply section weightings. Treat every area below as a readiness area for applied AI risk judgment in a finance context.

A strong candidate should be able to:

  • Explain AI and machine learning concepts in risk-management language.
  • Identify data, model, operational, compliance, and governance risks across the AI lifecycle.
  • Evaluate controls for traditional predictive models and newer generative AI systems.
  • Interpret model performance, fairness, explainability, and monitoring evidence.
  • Choose appropriate risk responses in finance scenarios involving credit, markets, fraud, operations, compliance, vendors, and client-facing tools.

Topic-area readiness map

Readiness area | What to review | You are ready when you can… | Quick self-test
AI and machine learning foundations | Supervised learning, unsupervised learning, reinforcement learning, generative AI, model training, features, labels, parameters, hyperparameters | Classify an AI use case by model type and explain how the model learns | Is a fraud clustering model supervised or unsupervised? What changes if labels are added?
Finance risk context | Credit risk, market risk, liquidity risk, operational risk, model risk, compliance risk, conduct risk, reputational risk | Connect an AI failure mode to a financial risk consequence | If an AI limit-monitoring tool misses a breach, which risks are implicated?
Data lifecycle and data risk | Data sourcing, lineage, quality, representativeness, missing values, outliers, leakage, privacy, retention | Spot when a model problem is primarily a data problem | What is the risk of training on data that includes post-decision outcomes?
Model development lifecycle | Problem framing, target definition, feature engineering, training, testing, tuning, deployment | Explain how design choices create or reduce risk | Why can a poorly defined target produce a high-performing but unsuitable model?
Model evaluation | Accuracy, precision, recall, specificity, F1, ROC/AUC, calibration, MAE, RMSE, overfitting, underfitting | Select evaluation metrics based on the business error cost | In fraud detection, why might recall matter more than accuracy?
Validation and independent challenge | Conceptual soundness, outcomes analysis, benchmark/challenger models, sensitivity testing, stress testing, documentation review | Distinguish model development from validation and challenge | What should a validator question even if back-test results look strong?
Governance and accountability | Model inventory, ownership, approval, escalation, policy exceptions, committees, documentation, audit trails | Identify who should approve, monitor, and challenge AI use | Who owns a vendor model used in a bank decision process?
Explainability and transparency | Global vs local explanations, feature importance, reason codes, model cards, limitations, human interpretability | Match the level of explanation to model risk and use case | What explanation is needed for an adverse credit decision versus a marketing segmentation model?
Bias, fairness, and ethics | Proxy variables, disparate impact, sampling bias, measurement bias, fairness metrics, human oversight | Recognize fairness concerns even when protected attributes are not used directly | Can ZIP code, school, or employment history act as a proxy?
Generative AI and LLM risk | Prompting, hallucination, retrieval, training data exposure, prompt injection, output controls, human review | Identify controls for AI-generated text, code, summaries, and decisions | What controls are needed before using an LLM to draft risk reports?
Cybersecurity and resilience | Access control, adversarial attacks, data poisoning, model extraction, prompt injection, logging, incident response | Explain how AI systems can be attacked or misused | How could a malicious prompt alter an internal chatbot’s output?
Third-party and vendor AI risk | Due diligence, contractual controls, model opacity, service levels, data sharing, audit rights, concentration risk | Evaluate vendor risk without assuming “vendor-owned” means “risk-free” | What evidence should be requested for a black-box scoring model?
Monitoring and change management | Drift, stability, performance decay, threshold changes, retraining, versioning, production controls | Define what should be monitored after deployment | What does a sudden drop in approval rates require you to investigate?
Regulation, compliance, and professional conduct | Governance expectations, documentation, disclosures, privacy, consumer protection, accountability, ethics | Apply principles without relying on a single jurisdiction-specific rule | What makes an AI control defensible to a regulator, auditor, or risk committee?

Core “can you do this?” checklist

Use this section as a fast readiness test. If you cannot do an item without notes, mark it for review.

AI and model concepts

  • Distinguish AI, machine learning, deep learning, natural language processing, and generative AI.
  • Explain the difference between supervised, unsupervised, semi-supervised, and reinforcement learning.
  • Identify examples of classification, regression, clustering, ranking, anomaly detection, recommendation, and text generation.
  • Explain training, validation, test, and production datasets in plain language.
  • Describe the role of features, labels, targets, parameters, hyperparameters, loss functions, and thresholds.
  • Explain overfitting, underfitting, data leakage, concept drift, and model decay.
  • Describe why a model can be statistically accurate but still unsuitable, unfair, unstable, or noncompliant.
  • Explain why correlation, prediction, and causation are not the same.
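
Data leakage in the list above is often a timing problem: a feature only becomes known after the decision it is supposed to inform. The Python sketch below flags such features by comparing the date each feature becomes available with the decision date. The feature names, dates, and the `leaky_features` helper are hypothetical and chosen only for illustration.

```python
from datetime import date


def leaky_features(available_on: dict[str, date], decision_date: date) -> list[str]:
    """Return the names of features that only become known after the decision date."""
    return sorted(name for name, known in available_on.items() if known > decision_date)


# Hypothetical credit example: the lending decision is made on 1 March.
decision = date(2024, 3, 1)
features = {
    "income_at_application": date(2024, 2, 20),
    "bureau_score": date(2024, 2, 28),
    "days_past_due_after_booking": date(2024, 6, 1),  # post-decision outcome
}

print(leaky_features(features, decision))
```

A model trained with the flagged feature could look strong in testing and still fail in production, because that value does not exist when the decision is made.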

Risk governance and lifecycle controls

  • Map an AI system from use-case proposal through retirement.
  • Identify control points before development, before deployment, and after deployment.
  • Define model owner, business owner, developer, validator, approver, user, and auditor roles.
  • Explain why high-impact finance use cases need stronger governance than low-impact internal productivity tools.
  • Identify when a model should be escalated for independent validation.
  • Recognize when a change is material enough to require reapproval or revalidation.
  • Describe what belongs in a model inventory.
  • Explain why documentation is a control, not just an administrative task.
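
As a study aid for the model inventory item above, the sketch below represents one inventory entry as a Python dataclass. The field list follows the inventory fields named in this blueprint's artifact checklist (name, owner, use case, status, materiality, version, dependencies). The class name, the `missing_fields` helper, and the sample values are assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class InventoryEntry:
    name: str = ""
    owner: str = ""
    use_case: str = ""
    status: str = ""
    materiality: str = ""
    version: str = ""
    dependencies: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """List text fields left blank. Dependencies may legitimately be empty."""
        return [
            f.name
            for f in fields(self)
            if f.name != "dependencies" and not getattr(self, f.name).strip()
        ]


# Hypothetical entry for a vendor tool that was registered without an owner.
entry = InventoryEntry(
    name="Vendor fraud score",
    use_case="Card fraud alerts",
    status="production",
    materiality="high",
    version="2.1",
    dependencies=["vendor scoring API"],
)

print(entry.missing_fields())
```

An entry with a blank owner is the kind of gap a reviewer would be expected to challenge before approval.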

Data and feature risk

  • Assess whether data is complete, accurate, timely, representative, and fit for purpose.
  • Identify missing-value, outlier, duplicate, stale-data, and inconsistent-definition risks.
  • Detect target leakage and training-serving skew.
  • Explain data lineage from source system to model input.
  • Identify privacy and confidentiality risks in training, testing, prompting, logging, and output storage.
  • Recognize proxy variables that may encode sensitive or protected characteristics.
  • Explain why historical bias can be learned and amplified by an AI system.
  • Describe controls for sensitive data access, masking, minimization, retention, and deletion.
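
The proxy-variable and historical-bias items above call for checking outcomes, not just input names. A minimal Python sketch, assuming you already have a group label and an approve/decline outcome for each case, compares approval rates across groups. The group labels, the sample data, and both helper names are hypothetical. The ratio shown is one simple summary among many fairness measures, and no pass/fail threshold is implied.

```python
from collections import defaultdict


def approval_rates(groups, approved):
    """Approval rate per group from parallel lists of group labels and 0/1 outcomes."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, outcome in zip(groups, approved):
        totals[group] += 1
        approvals[group] += outcome
    return {group: approvals[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means identical rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical outcomes for two segments, e.g. defined by a suspected proxy feature.
groups = ["A"] * 5 + ["B"] * 5
approved = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

rates = approval_rates(groups, approved)
print(rates, rate_ratio(rates))
```

A large gap does not prove unfairness on its own, but it is a cue to investigate drivers, data representativeness, and possible proxies.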

Model evaluation and validation

  • Interpret confusion-matrix outcomes: true positives, false positives, true negatives, and false negatives.
  • Choose between precision, recall, specificity, accuracy, F1, ROC/AUC, calibration, MAE, RMSE, and stability metrics based on the use case.
  • Explain why class imbalance can make accuracy misleading.
  • Describe validation tests for conceptual soundness, data quality, performance, sensitivity, stability, and implementation.
  • Explain benchmark and challenger model analysis.
  • Identify limitations of back-testing when market regimes, customer behavior, or fraud patterns change.
  • Distinguish model validation from internal audit, quality assurance, and business sign-off.
  • Explain how validation findings should be rated, remediated, and tracked.

Explainability, transparency, and accountability

  • Distinguish global explanations from local explanations.
  • Explain feature importance, partial dependence, reason codes, and local explanations at a conceptual level.
  • Identify when explainability is needed for users, customers, validators, auditors, or regulators.
  • Explain the trade-off between model complexity and interpretability.
  • Recognize when a simpler model may be preferred because it is more transparent, stable, or controllable.
  • Describe how documentation, user training, and human review support accountability.
  • Identify when “the AI said so” is not an acceptable decision rationale.

Generative AI and LLM risk

  • Explain hallucination, prompt injection, data leakage, insecure output, copyright/intellectual-property risk, and overreliance.
  • Distinguish open-ended generation from deterministic scoring or classification.
  • Identify controls for LLM prompts, retrieved context, output review, logging, access, and prohibited use.
  • Explain why an LLM-generated answer may be fluent but wrong.
  • Describe retrieval-augmented generation at a conceptual level and its risk benefits and limitations.
  • Identify when human review is required before using generated text in risk, compliance, client, or regulatory communication.
  • Recognize that using a third-party LLM can create data, vendor, operational, and compliance risk.

Metrics and calculation readiness

The RAI exam may emphasize applied judgment more than computation, but you should be comfortable interpreting common model metrics. Do not memorize formulas in isolation; know what each metric means and when it can mislead.

Classification metrics

Use these terms consistently:

Term | Meaning | Common risk interpretation
True positive | Model correctly identifies the positive class | Correctly flags fraud, default, breach, or event
False positive | Model flags positive when actual is negative | Unnecessary review, declined good customer, operational burden
True negative | Model correctly identifies the negative class | Correctly clears a normal case
False negative | Model misses an actual positive | Undetected fraud, missed default, missed breach, uncontrolled exposure

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall or Sensitivity} = \frac{TP}{TP + FN} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \]
\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

Readiness checks:

  • Explain why high accuracy can be meaningless when the positive class is rare.
  • Explain the trade-off between false positives and false negatives.
  • Choose recall-focused controls when missing the event is costly.
  • Choose precision-focused controls when excessive false alarms are costly.
  • Explain why changing a classification threshold changes business outcomes.
  • Connect threshold decisions to risk appetite, capacity, customer impact, and compliance.
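
To make the readiness checks above concrete, the Python sketch below computes the classification metrics from confusion-matrix counts and then shows how moving a threshold changes them. The scores, labels, and function names are hypothetical. Treating a ratio with a zero denominator as 0.0 is a simplifying assumption of this sketch, not a standard convention.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count (TP, FP, TN, FN) when scores at or above the threshold are flagged."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Metrics from confusion counts; zero-denominator ratios are reported as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Rare-event case: 10 frauds in 1,000 transactions and a model that flags nothing.
print(metrics(tp=0, fp=0, tn=990, fn=10))

# Hypothetical scores: lowering the threshold raises recall and adds false positives.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
for threshold in (0.5, 0.25):
    print(threshold, metrics(*confusion_at_threshold(scores, labels, threshold)))
```

In the rare-event case accuracy is 0.99 while recall is 0.0, which is why accuracy alone says little when the positive class is rare. In the second example, the lower threshold catches every positive case at the cost of one extra false alarm.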

Regression and forecasting metrics

Metric | Use | What to watch
MAE | Average absolute error | Easier to interpret in original units
RMSE | Penalizes larger errors more heavily | Sensitive to outliers
R-squared | Share of variance explained in a simple regression context | Can be misleading outside its context or with poor validation
Calibration | Whether predicted probabilities match observed outcomes | Critical when probabilities drive risk decisions
Back-test result | Comparison of predictions with realized outcomes | May fail when conditions change

\[ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \]
\[ MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i| \]

Readiness checks:

  • Explain why a model with lower RMSE may still be less acceptable if errors are concentrated in high-risk segments.
  • Interpret calibration problems in probability-of-default, fraud-risk, or loss-forecasting settings.
  • Explain why back-tests should be supplemented by sensitivity, stress, and stability analysis.
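
The first readiness check above can be illustrated with a small Python sketch: one model has the lower overall RMSE, but its error sits entirely in the high-risk segment. All numbers, segment labels, and function names are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_for_segment(actual, predicted, segments, segment):
    """RMSE restricted to the observations in one segment."""
    pairs = [(a, p) for a, p, s in zip(actual, predicted, segments) if s == segment]
    return rmse([a for a, _ in pairs], [p for _, p in pairs])


actual = [100, 100, 100, 100]
segments = ["low", "low", "low", "high"]
model_a = [90, 110, 90, 110]    # misses by 10 on every observation
model_b = [100, 100, 100, 118]  # exact on low risk, misses by 18 on high risk

for name, predicted in (("A", model_a), ("B", model_b)):
    print(
        name,
        mae(actual, predicted),
        rmse(actual, predicted),
        rmse_for_segment(actual, predicted, segments, "high"),
    )
```

Model B wins on overall MAE and RMSE, yet its high-risk segment error is larger than Model A's. That is the pattern a validator should look for behind a single headline figure.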

Drift and stability checks

Concept | What it means | Scenario cue
Data drift | Input data distribution changes | Customer income, transaction patterns, or market variables shift
Concept drift | Relationship between inputs and target changes | Fraudsters adapt; default drivers change after macro stress
Population stability | Scored population differs from development population | New product, new region, changed underwriting strategy
Performance decay | Model output quality declines over time | More overrides, complaints, losses, or investigation misses

Population Stability Index is one common way to summarize distribution shift:

\[ PSI = \sum_i (Actual_i - Expected_i)\ln\left(\frac{Actual_i}{Expected_i}\right) \]

Readiness checks:

  • Know that no single drift metric proves a model is safe or unsafe.
  • Explain why monitoring should include data, performance, overrides, complaints, exceptions, and business outcomes.
  • Identify when drift requires investigation, threshold adjustment, retraining, fallback controls, or model retirement.
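
The PSI formula above can be sketched in a few lines of Python, assuming the development ("expected") and production ("actual") populations have already been grouped into the same bins. The bin counts, the function names, and the small `floor` used to avoid taking the log of zero for an empty bin are all assumptions of this sketch.

```python
import math


def proportions(counts):
    """Convert bin counts to bin proportions."""
    total = sum(counts)
    return [count / total for count in counts]


def psi(expected, actual, floor=1e-6):
    """Population Stability Index over matching bins of proportions."""
    total = 0.0
    for e, a in zip(expected, actual):
        # Floor tiny proportions so an empty bin does not produce log(0).
        e, a = max(e, floor), max(a, floor)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical score-band counts at development time and in production.
development = proportions([250, 250, 250, 250])
production = proportions([100, 200, 300, 400])

print(psi(development, development))  # identical distributions give 0.0
print(psi(development, production))   # a shifted distribution gives a positive value
```

Every term in the sum is non-negative, so PSI only grows as the two distributions move apart. It summarizes the size of a shift, not whether the model is still safe to use.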

Finance scenario and decision-point checks

Applied scenario table

Scenario cue | Main issue being tested | Better answer direction | Common trap
A credit model has strong overall performance but worse outcomes for a demographic segment | Fairness, segmentation, proxy variables, governance | Investigate bias, proxies, data representativeness, adverse impact, explanations, and remediation | Saying the model is acceptable because protected attributes were excluded
A fraud model flags many legitimate transactions | Precision/recall trade-off, customer impact, operations capacity | Review thresholds, false-positive costs, customer friction, escalation workflow, and monitoring | Optimizing only for maximum fraud capture
A model performs well in development but poorly after launch | Implementation error, drift, training-serving skew, monitoring | Compare production inputs, code versions, thresholds, data definitions, and population changes | Assuming the model was validated once, so the issue is not model risk
A vendor provides a black-box score with limited documentation | Third-party model risk, explainability, accountability | Request methodology, validation evidence, performance by segment, limitations, data controls, and audit rights | Treating vendor secrecy as a reason to skip validation
An LLM summarizes internal risk reports | Hallucination, confidentiality, source grounding, human review | Use approved data sources, retrieval controls, output review, citations where appropriate, and logging | Assuming polished language equals factual accuracy
Staff paste confidential client data into a public AI tool | Privacy, security, data leakage, policy breach | Stop use, contain exposure, investigate, notify internally, strengthen access and training controls | Treating it as only an employee training issue
A trading-support model is retrained after a volatile period | Regime change, validation, market risk, change control | Test performance across regimes, stress assumptions, document changes, approve deployment | Assuming recent data is always more relevant
A customer chatbot gives product guidance | Conduct risk, disclosure, suitability-style concerns, escalation | Limit scope, monitor outputs, provide disclaimers, route complex cases to humans | Letting the chatbot provide unrestricted financial advice
A compliance alert model reduces investigations by suppressing low-risk alerts | False negatives, regulatory exposure, model explainability | Validate suppression logic, review missed-event risk, sample suppressed alerts, monitor outcomes | Celebrating efficiency without checking missed suspicious activity
A model uses social-media-derived features | Data ethics, consent, representativeness, reputational risk | Review legality, fairness, purpose limitation, bias, explainability, and customer expectations | Focusing only on predictive power

Decision prompts to practice

For each prompt, practice giving a complete answer in 60 to 90 seconds.

  1. Use-case classification: What decision does the AI system support, who relies on it, and what happens if it is wrong?

  2. Risk materiality: Is the use case customer-impacting, financially material, regulatory-facing, safety-critical, or operationally critical?

  3. Data fitness: Are the data sources authorized, representative, current, complete, and aligned with the intended decision?

  4. Model fitness: Does the model type match the problem, and are its limitations understood by users?

  5. Performance trade-off: Which error is more costly: false positive or false negative? Who bears the cost?

  6. Fairness and ethics: Could the model create unjustified differences in outcomes across groups or customer segments?

  7. Explainability: Can the decision be explained to the appropriate audience: user, customer, validator, auditor, or regulator?

  8. Human oversight: Is human review meaningful, trained, documented, and empowered to override the model?

  9. Monitoring: What indicators would show the model is degrading, being misused, or operating outside intended conditions?

  10. Response plan: If the model fails, is there a fallback process, incident response path, and owner for remediation?

Lifecycle artifact checklist

Be ready to recognize the purpose of each artifact and what a weak version would look like.

Artifact | What it should show | Weak version or red flag
AI use-case intake form | Business purpose, users, decision impact, data used, risk rating, owner | Vague purpose, no owner, no impact assessment
Model inventory entry | Model name, owner, use case, status, materiality, version, dependencies | Missing vendor tools, spreadsheets, or embedded AI features
Data lineage record | Source systems, transformations, controls, access, retention | Unclear source, manual extracts, undocumented transformations
Development documentation | Problem definition, target, features, training process, assumptions, limitations | Focuses only on performance results
Validation report | Independent review of data, methodology, performance, limitations, implementation, monitoring | No challenge, no limitations, no remediation plan
Model card or factsheet | Intended use, prohibited use, performance, fairness, data, limitations | Marketing-style summary with no risk information
Explainability evidence | Global drivers, local explanations, reason codes where relevant | Black-box output with no decision rationale
Monitoring dashboard | Data drift, performance, overrides, exceptions, incidents, business outcomes | Only shows uptime or model volume
Change log | Versions, retraining events, threshold changes, approvals | Untracked parameter, data, or prompt changes
Incident log | Failure, impact, containment, root cause, remediation, lessons learned | Issues handled informally with no escalation
Vendor due diligence file | Methodology, controls, security, privacy, validation, service levels, auditability | Vendor refuses all evidence but model is still approved
User procedures | How to use outputs, limitations, escalation, override rules | Users treat AI outputs as mandatory decisions

Governance and control readiness

Lines of accountability

Role or function | What to know for exam readiness
Business owner | Owns the use case, business process, outcomes, and risk acceptance
Model developer | Builds or configures the model and documents methodology and assumptions
Independent validator | Challenges data, design, performance, implementation, monitoring, and limitations
Risk management | Sets standards, oversees risk appetite, reviews material risks, tracks remediation
Compliance/legal/privacy | Reviews regulatory, conduct, privacy, disclosure, and customer-impact concerns
Information security | Reviews access, security architecture, attack surfaces, and incident response
Internal audit | Provides independent assurance over governance and control effectiveness
Senior management or committee | Approves higher-risk use, exceptions, remediation priorities, and accountability

Readiness checks:

  • Explain why accountability cannot be outsourced to a vendor or algorithm.
  • Identify when a business owner should not self-approve a high-risk AI model.
  • Explain why independence matters in validation.
  • Distinguish policy approval, model approval, deployment approval, and exception approval.
  • Identify how risk appetite should affect thresholds, controls, monitoring, and escalation.

Control types to recognize

Control type | Examples | What it mitigates
Preventive | Use-case approval, access limits, data minimization, approved model library | Unauthorized or unsuitable AI use
Detective | Monitoring, drift alerts, exception reports, audit logs, output sampling | Model degradation, misuse, policy breaches
Corrective | Rollback, retraining, threshold adjustment, incident remediation, customer correction | Harm after failure or unexpected behavior
Compensating | Human review, manual fallback, dual control, restricted output use | Residual risk when primary controls are limited
Governance | Policy, committee review, inventory, documentation, validation standards | Inconsistent or uncontrolled AI deployment

Generative AI-specific checklist

Generative AI requires different controls from many traditional scoring models because outputs can be open-ended, variable, and difficult to verify.

Risk area | What to check | Ready response
Hallucination | Output may invent facts, sources, numbers, or reasoning | Require grounding, verification, human review, and limited-use policies
Prompt injection | User or document instructions manipulate the model | Sanitize inputs, restrict tool access, separate system instructions, monitor outputs
Data leakage | Confidential data may enter prompts, logs, or vendor systems | Use approved environments, access controls, masking, and retention limits
Overreliance | Users defer to fluent but unverified output | Train users, require review, show limitations, restrict high-impact decisions
Toxic or biased output | Generated content may be unfair, offensive, or discriminatory | Test outputs, apply filters, monitor complaints, include escalation
Inconsistent output | Same prompt may produce different results | Define acceptable variability, use templates, test repeatability where needed
Tool misuse | LLM can trigger actions, retrieve data, or generate code | Limit permissions, log actions, require approvals for high-impact actions
Vendor opacity | Model training and controls may be unclear | Perform due diligence, contract for controls, document residual risk

Can you do this?

  • Explain why LLM output should not be treated as verified fact without controls.
  • Identify prohibited or restricted uses for generative AI in a finance organization.
  • Design a control set for an internal chatbot that answers policy questions.
  • Design a control set for an LLM that drafts customer-facing communications.
  • Explain why logs, prompts, retrieved documents, and outputs all create risk evidence.
  • Identify when a generative AI tool should be included in an AI/model inventory.
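
Masking is one of the data-leakage controls listed in the table above. The Python sketch below shows the idea: replace obviously sensitive tokens before a prompt leaves an approved environment. The two regular expressions, the placeholder labels, and the `mask_prompt` helper are illustrative assumptions. They are far from a complete detector of confidential data and would need to sit alongside access controls, logging, and policy.

```python
import re

# Illustrative patterns only: e-mail addresses and 8 to 12 digit account-like numbers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ACCOUNT = re.compile(r"\b\d{8,12}\b")


def mask_prompt(text: str) -> str:
    """Replace e-mail addresses and long digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return ACCOUNT.sub("[ACCOUNT]", text)


prompt = "Summarize the complaint from jane.doe@example.com about account 1234567890"
print(mask_prompt(prompt))
```

Pattern-based masking will miss names, free-text identifiers, and unusual formats, so it reduces exposure rather than eliminating it.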

Finance use-case readiness checks

Use case | Key AI risks | Controls to consider
Credit underwriting or account management | Bias, explainability, data leakage, adverse customer impact, model drift | Fairness testing, reason codes, validation, monitoring, override review
Fraud detection | Class imbalance, false positives, adaptive adversaries, customer friction | Threshold governance, alert sampling, recall/precision review, drift monitoring
Financial crime or compliance surveillance | Missed suspicious activity, alert suppression, explainability, regulator scrutiny | Independent validation, sampling, audit trails, escalation controls
Market risk or trading support | Regime change, data latency, model instability, automation risk | Stress testing, limits, human oversight, change control, kill-switch concepts
Portfolio analytics | Overfitting, unstable correlations, misleading optimization, scenario weakness | Sensitivity testing, benchmark comparison, assumptions review
Liquidity or stress forecasting | Rare-event uncertainty, macro regime changes, data limitations | Scenario analysis, expert challenge, conservative assumptions, monitoring
Customer service chatbot | Misleading answers, privacy exposure, conduct risk, escalation failure | Approved knowledge base, human escalation, output monitoring, disclosure controls
Operations automation | Process errors, exception handling, access control, resilience | Workflow controls, fallback procedures, reconciliation, logging
Vendor risk scoring | Opaque methodology, dependency risk, inconsistent updates | Due diligence, service-level controls, validation evidence, performance monitoring
Employee productivity AI | Confidentiality, unapproved use, inaccurate summaries, policy violations | Usage policy, approved tools, training, data restrictions, review procedures

Common weak areas and traps

Trap | Why it is wrong | Better exam mindset
“The model is accurate, so it is low risk.” | Accuracy may hide bias, instability, explainability gaps, or high-impact errors. | Evaluate performance, use case, impact, controls, and limitations together.
“Protected attributes were excluded, so fairness is solved.” | Proxy variables can reproduce sensitive characteristics. | Test outcomes and drivers, not just input names.
“The vendor owns the model, so the vendor owns the risk.” | The institution still owns its decisions and customer/process impact. | Apply third-party risk and model governance controls.
“Validation is a one-time pre-launch step.” | Models degrade and environments change. | Monitor, revalidate after material change, and track performance over time.
“Human in the loop always reduces risk.” | Human review can be superficial, biased, overloaded, or ignored. | Ensure review is trained, documented, empowered, and effective.
“More complex models are always better.” | Complexity can reduce transparency, stability, and control. | Match complexity to use case, evidence, and risk appetite.
“Generative AI is just another predictive model.” | LLMs can hallucinate, leak data, follow malicious prompts, and create open-ended outputs. | Apply generation-specific controls.
“Back-testing proves future performance.” | Future regimes may differ from historical data. | Combine back-testing with stress, sensitivity, drift, and expert challenge.
“Monitoring only means checking uptime.” | A model can be available but wrong, biased, stale, or misused. | Monitor data, outputs, outcomes, exceptions, and incidents.
“Documentation is optional if the team understands the model.” | Staff change, audits occur, and decisions need evidence. | Treat documentation as a core control.
“Threshold changes are minor tuning.” | Thresholds can materially change approvals, alerts, losses, and customer impact. | Govern thresholds as decision controls.
“AI policy is only a technology issue.” | AI risk spans business, legal, compliance, risk, privacy, security, and operations. | Use cross-functional governance.

Final-week review checklist

High-yield final review tasks

Task | Done?
Rebuild the AI lifecycle from memory: intake, design, data, development, validation, approval, deployment, monitoring, change, retirement. | [ ]
Create a one-page table of model risks and controls by lifecycle stage. | [ ]
Review confusion-matrix metrics and practice choosing the right metric for a scenario. | [ ]
Practice explaining false positives and false negatives in credit, fraud, compliance, and customer-service examples. | [ ]
Review data leakage, proxy bias, drift, overfitting, and model explainability. | [ ]
Review generative AI risks: hallucination, prompt injection, data leakage, overreliance, and output governance. | [ ]
Practice third-party AI scenarios where the model is opaque or vendor-controlled. | [ ]
Review artifacts: model inventory, validation report, model card, monitoring dashboard, change log, incident log. | [ ]
Practice deciding when to approve, reject, restrict, escalate, retrain, or retire an AI system. | [ ]
Make a list of your top five weak areas and answer scenario questions for each. | [ ]

Last-pass “ready or not” questions

You are close to ready when you can answer each without guessing:

  • What makes an AI use case high risk?
  • What evidence should be reviewed before deployment?
  • What evidence should be monitored after deployment?
  • How can a model be biased without using protected attributes?
  • How can strong historical performance fail in a new environment?
  • What is the difference between model development, validation, governance, and audit?
  • What controls reduce LLM hallucination and data leakage risk?
  • When should a vendor AI tool be challenged or restricted?
  • How do you choose between precision and recall?
  • What should happen after a material model change?

Practical next step

Pick the three checklist areas with the most unchecked items. For each one, write a short scenario, identify the AI risk, choose the control response, and explain why weaker alternatives are insufficient. Then move into timed practice so you can apply the checklist quickly under exam conditions for the GARP Risk and AI Certificate (RAI).