RAI — GARP Risk and AI Certificate Exam Blueprint
Practical exam blueprint for the GARP Risk and AI Certificate (RAI), covering AI risk governance, model lifecycle, validation, data, controls, and finance use cases.
How to Use This Exam Blueprint
This independent checklist is for candidates preparing for GARP's Risk and AI Certificate (RAI) exam (exam code RAI). Use it as a practical study blueprint: identify the topic areas you can explain, the scenarios you can handle, and the weak spots to revisit before exam day.
Because official weights can change, this page does not assign point values or imply section weightings. Treat every area below as a readiness area for applied AI risk judgment in a finance context.
A strong candidate should be able to:
- Explain AI and machine learning concepts in risk-management language.
- Identify data, model, operational, compliance, and governance risks across the AI lifecycle.
- Evaluate controls for traditional predictive models and newer generative AI systems.
- Interpret model performance, fairness, explainability, and monitoring evidence.
- Choose appropriate risk responses in finance scenarios involving credit, markets, fraud, operations, compliance, vendors, and client-facing tools.
Topic-area readiness map
| Readiness area | What to review | You are ready when you can… | Quick self-test |
|---|---|---|---|
| AI and machine learning foundations | Supervised learning, unsupervised learning, reinforcement learning, generative AI, model training, features, labels, parameters, hyperparameters | Classify an AI use case by model type and explain how the model learns | Is a fraud clustering model supervised or unsupervised? What changes if labels are added? |
| Finance risk context | Credit risk, market risk, liquidity risk, operational risk, model risk, compliance risk, conduct risk, reputational risk | Connect an AI failure mode to a financial risk consequence | If an AI limit-monitoring tool misses a breach, which risks are implicated? |
| Data lifecycle and data risk | Data sourcing, lineage, quality, representativeness, missing values, outliers, leakage, privacy, retention | Spot when a model problem is primarily a data problem | What is the risk of training on data that includes post-decision outcomes? |
| Model development lifecycle | Problem framing, target definition, feature engineering, training, testing, tuning, deployment | Explain how design choices create or reduce risk | Why can a poorly defined target produce a high-performing but unsuitable model? |
| Model evaluation | Accuracy, precision, recall, specificity, F1, ROC/AUC, calibration, MAE, RMSE, overfitting, underfitting | Select evaluation metrics based on the business error cost | In fraud detection, why might recall matter more than accuracy? |
| Validation and independent challenge | Conceptual soundness, outcomes analysis, benchmark/challenger models, sensitivity testing, stress testing, documentation review | Distinguish model development from validation and challenge | What should a validator question even if back-test results look strong? |
| Governance and accountability | Model inventory, ownership, approval, escalation, policy exceptions, committees, documentation, audit trails | Identify who should approve, monitor, and challenge AI use | Who owns a vendor model used in a bank decision process? |
| Explainability and transparency | Global vs local explanations, feature importance, reason codes, model cards, limitations, human interpretability | Match the level of explanation to model risk and use case | What explanation is needed for an adverse credit decision versus a marketing segmentation model? |
| Bias, fairness, and ethics | Proxy variables, disparate impact, sampling bias, measurement bias, fairness metrics, human oversight | Recognize fairness concerns even when protected attributes are not used directly | Can ZIP code, school, or employment history act as a proxy? |
| Generative AI and LLM risk | Prompting, hallucination, retrieval, training data exposure, prompt injection, output controls, human review | Identify controls for AI-generated text, code, summaries, and decisions | What controls are needed before using an LLM to draft risk reports? |
| Cybersecurity and resilience | Access control, adversarial attacks, data poisoning, model extraction, prompt injection, logging, incident response | Explain how AI systems can be attacked or misused | How could a malicious prompt alter an internal chatbot’s output? |
| Third-party and vendor AI risk | Due diligence, contractual controls, model opacity, service levels, data sharing, audit rights, concentration risk | Evaluate vendor risk without assuming “vendor-owned” means “risk-free” | What evidence should be requested for a black-box scoring model? |
| Monitoring and change management | Drift, stability, performance decay, threshold changes, retraining, versioning, production controls | Define what should be monitored after deployment | What does a sudden drop in approval rates require you to investigate? |
| Regulation, compliance, and professional conduct | Governance expectations, documentation, disclosures, privacy, consumer protection, accountability, ethics | Apply principles without relying on a single jurisdiction-specific rule | What makes an AI control defensible to a regulator, auditor, or risk committee? |
Core “can you do this?” checklist
Use this section as a fast readiness test. If you cannot do an item without notes, mark it for review.
AI and model concepts
- Distinguish AI, machine learning, deep learning, natural language processing, and generative AI.
- Explain the difference between supervised, unsupervised, semi-supervised, and reinforcement learning.
- Identify examples of classification, regression, clustering, ranking, anomaly detection, recommendation, and text generation.
- Explain training, validation, test, and production datasets in plain language.
- Describe the role of features, labels, targets, parameters, hyperparameters, loss functions, and thresholds.
- Explain overfitting, underfitting, data leakage, concept drift, and model decay.
- Describe why a model can be statistically accurate but still unsuitable, unfair, unstable, or noncompliant.
- Explain why correlation, prediction, and causation are not the same.
Risk governance and lifecycle controls
- Map an AI system from use-case proposal through retirement.
- Identify control points before development, before deployment, and after deployment.
- Define model owner, business owner, developer, validator, approver, user, and auditor roles.
- Explain why high-impact finance use cases need stronger governance than low-impact internal productivity tools.
- Identify when a model should be escalated for independent validation.
- Recognize when a change is material enough to require reapproval or revalidation.
- Describe what belongs in a model inventory.
- Explain why documentation is a control, not just an administrative task.
Data and feature risk
- Assess whether data is complete, accurate, timely, representative, and fit for purpose.
- Identify missing-value, outlier, duplicate, stale-data, and inconsistent-definition risks.
- Detect target leakage and training-serving skew.
- Explain data lineage from source system to model input.
- Identify privacy and confidentiality risks in training, testing, prompting, logging, and output storage.
- Recognize proxy variables that may encode sensitive or protected characteristics.
- Explain why historical bias can be learned and amplified by an AI system.
- Describe controls for sensitive data access, masking, minimization, retention, and deletion.
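Proxy variables cannot be spotted from input names alone, so one practical check is to compare outcomes across groups. The sketch below computes an approval-rate ratio between two groups. The data, the group labels, and the 0.8 review trigger are all hypothetical; the appropriate fairness metric and threshold depend on the institution's policy and jurisdiction.

```python
def approval_rate(decisions):
    """Share of approvals in a list of 0/1 decisions (1 = approved)."""
    return sum(decisions) / len(decisions)


def approval_rate_ratio(group_decisions, reference_decisions):
    """Approval rate of a group divided by the reference group's rate."""
    return approval_rate(group_decisions) / approval_rate(reference_decisions)


# Hypothetical decisions: 80 of 100 approved in the reference group,
# 50 of 100 approved in the comparison group.
reference = [1] * 80 + [0] * 20
comparison = [1] * 50 + [0] * 50

ratio = approval_rate_ratio(comparison, reference)

# Hypothetical review trigger; a policy choice, not a rule from this blueprint.
REVIEW_TRIGGER = 0.8
needs_review = ratio < REVIEW_TRIGGER

print(f"approval-rate ratio: {ratio:.3f}, needs review: {needs_review}")
```

A low ratio does not prove unfair treatment on its own. It is a cue to investigate drivers, proxies, and data representativeness, even when protected attributes were excluded from the model inputs.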
Model evaluation and validation
- Interpret confusion-matrix outcomes: true positives, false positives, true negatives, and false negatives.
- Choose between precision, recall, specificity, accuracy, F1, ROC/AUC, calibration, MAE, RMSE, and stability metrics based on the use case.
- Explain why class imbalance can make accuracy misleading.
- Describe validation tests for conceptual soundness, data quality, performance, sensitivity, stability, and implementation.
- Explain benchmark and challenger model analysis.
- Identify limitations of back-testing when market regimes, customer behavior, or fraud patterns change.
- Distinguish model validation from internal audit, quality assurance, and business sign-off.
- Explain how validation findings should be rated, remediated, and tracked.
Explainability, transparency, and accountability
- Distinguish global explanations from local explanations.
- Explain feature importance, partial dependence, reason codes, and local explanations at a conceptual level.
- Identify when explainability is needed for users, customers, validators, auditors, or regulators.
- Explain the trade-off between model complexity and interpretability.
- Recognize when a simpler model may be preferred because it is more transparent, stable, or controllable.
- Describe how documentation, user training, and human review support accountability.
- Identify when “the AI said so” is not an acceptable decision rationale.
Generative AI and LLM risk
- Explain hallucination, prompt injection, data leakage, insecure output, copyright/intellectual-property risk, and overreliance.
- Distinguish open-ended generation from deterministic scoring or classification.
- Identify controls for LLM prompts, retrieved context, output review, logging, access, and prohibited use.
- Explain why an LLM-generated answer may be fluent but wrong.
- Describe retrieval-augmented generation at a conceptual level and its risk benefits and limitations.
- Identify when human review is required before using generated text in risk, compliance, client, or regulatory communication.
- Recognize that using a third-party LLM can create data, vendor, operational, and compliance risk.
Metrics and calculation readiness
The RAI exam may emphasize applied judgment more than computation, but you should be comfortable interpreting common model metrics. Do not memorize formulas in isolation; know what each metric means and when it can mislead.
Classification metrics
Use these terms consistently:
| Term | Meaning | Common risk interpretation |
|---|---|---|
| True positive | Model correctly identifies the positive class | Correctly flags fraud, default, breach, or event |
| False positive | Model flags positive when actual is negative | Unnecessary review, declined good customer, operational burden |
| True negative | Model correctly identifies the negative class | Correctly clears a normal case |
| False negative | Model misses an actual positive | Undetected fraud, missed default, missed breach, uncontrolled exposure |
Readiness checks:
- Explain why high accuracy can be meaningless when the positive class is rare.
- Explain the trade-off between false positives and false negatives.
- Choose recall-focused controls when missing the event is costly.
- Choose precision-focused controls when excessive false alarms are costly.
- Explain why changing a classification threshold changes business outcomes.
- Connect threshold decisions to risk appetite, capacity, customer impact, and compliance.
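The checks above can be made concrete with a small calculation. This is a minimal sketch on hypothetical fraud data (10 frauds in 1,000 transactions, with made-up scores); the function names are illustrative and not taken from any exam material.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
    }


def apply_threshold(scores, threshold):
    """Flag a case as positive when its score meets the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical data: 10 frauds followed by 990 legitimate transactions.
y_true = [1] * 10 + [0] * 990

# A "model" that never flags fraud still scores 99% accuracy.
baseline = classification_metrics(y_true, [0] * 1000)
print("never flag:", baseline)

# Hypothetical scores: frauds first, then 20 borderline and 970 clean cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1] + [0.5] * 20 + [0.05] * 970

for threshold in (0.65, 0.25):
    m = classification_metrics(y_true, apply_threshold(scores, threshold))
    print(f"threshold={threshold}: precision={m['precision']:.2f}, recall={m['recall']:.2f}")
```

In this sketch the stricter cutoff catches 3 of the 10 frauds with no false alarms, while the looser cutoff catches 7 of 10 but adds 20 false alarms. Which outcome is preferable depends on risk appetite, review capacity, and customer impact rather than on the metrics alone.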
Regression and forecasting metrics
| Metric | Use | What to watch |
|---|---|---|
| MAE | Average absolute error | Easy to interpret in original units, but can understate the impact of rare large errors |
| RMSE | Penalizes larger errors more heavily | Sensitive to outliers |
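| R-squared | Share of target variance explained by the model, most naturally in a simple regression context | In-sample values can overstate out-of-sample fit; a high value does not show that errors are acceptable where they matter most |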
| Calibration | Whether predicted probabilities match observed outcomes | Critical when probabilities drive risk decisions |
| Back-test result | Comparison of predictions with realized outcomes | May fail when conditions change |
Readiness checks:
- Explain why a model with lower RMSE may still be less acceptable if errors are concentrated in high-risk segments.
- Interpret calibration problems in probability-of-default, fraud-risk, or loss-forecasting settings.
- Explain why back-tests should be supplemented by sensitivity, stress, and stability analysis.
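As a worked illustration of the metrics in the table, the sketch below compares MAE and RMSE on two hypothetical forecasts and builds a simple binned calibration table. All numbers are invented for illustration, and the equal-width binning is one assumption among several reasonable choices.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; squaring weights large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def calibration_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width bins and compare the mean predicted
    probability with the observed event rate in each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        bins[idx].append((p, y))
    rows = []
    for idx, members in enumerate(bins):
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((idx, len(members), mean_pred, observed))
    return rows


# Hypothetical loss forecasts: same MAE, very different RMSE.
actual = [100, 100, 100, 100]
model_a = [90, 110, 90, 110]    # four moderate misses of 10
model_b = [100, 100, 100, 60]   # one large miss of 40

print("A:", mae(actual, model_a), rmse(actual, model_a))
print("B:", mae(actual, model_b), rmse(actual, model_b))

# Hypothetical default predictions: the low-risk bin defaults more often than predicted.
probs = [0.1] * 10 + [0.9] * 10
outcomes = [1] * 3 + [0] * 7 + [1] * 9 + [0] * 1
for idx, n, mean_pred, observed in calibration_table(probs, outcomes):
    print(f"bin {idx}: n={n}, mean predicted={mean_pred:.2f}, observed={observed:.2f}")
```

Both forecasts have the same MAE, but model B's single large miss doubles its RMSE. In the calibration example, the bin predicted at about 10% shows an observed rate of 30%, the kind of gap that matters when probabilities drive pricing, limits, or provisioning.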
Drift and stability checks
| Concept | What it means | Scenario cue |
|---|---|---|
| Data drift | Input data distribution changes | Customer income, transaction patterns, or market variables shift |
| Concept drift | Relationship between inputs and target changes | Fraudsters adapt; default drivers change after macro stress |
| Population stability | Scored population differs from development population | New product, new region, changed underwriting strategy |
| Performance decay | Model output quality declines over time | More overrides, complaints, losses, or investigation misses |
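The Population Stability Index (PSI) is one common way to summarize distribution shift:
\[ \mathrm{PSI} = \sum_i (\mathrm{Actual}_i - \mathrm{Expected}_i)\ln\left(\frac{\mathrm{Actual}_i}{\mathrm{Expected}_i}\right) \]
Here \(\mathrm{Expected}_i\) is the proportion of the development (baseline) population in bin \(i\), and \(\mathrm{Actual}_i\) is the proportion of the currently scored population in the same bin.
Readiness checks: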
- Know that no single drift metric proves a model is safe or unsafe.
- Explain why monitoring should include data, performance, overrides, complaints, exceptions, and business outcomes.
- Identify when drift requires investigation, threshold adjustment, retraining, fallback controls, or model retirement.
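The PSI calculation can be sketched in a few lines. The bin proportions below are hypothetical, and the small floor applied to empty bins is an assumption added to avoid taking the logarithm of zero; it is not part of the PSI definition.

```python
import math


def psi(expected_props, actual_props, floor=1e-6):
    """Population Stability Index over matching bins.

    expected_props: bin proportions in the development (baseline) population.
    actual_props:   bin proportions in the currently scored population.
    floor:          assumed minimum proportion, used only to avoid log(0).
    """
    total = 0.0
    for expected, actual in zip(expected_props, actual_props):
        expected = max(expected, floor)
        actual = max(actual, floor)
        total += (actual - expected) * math.log(actual / expected)
    return total


# Hypothetical score-band proportions.
baseline = [0.25, 0.25, 0.25, 0.25]
unchanged = [0.25, 0.25, 0.25, 0.25]
shifted = [0.10, 0.20, 0.30, 0.40]

print(f"no shift: {psi(baseline, unchanged):.4f}")
print(f"shifted:  {psi(baseline, shifted):.4f}")
```

Identical distributions give a PSI of zero, and the value grows as the scored population moves away from the baseline. How large a value should trigger investigation is a monitoring-policy decision, and the result also depends on how the bins are chosen, which is one reason no single drift metric settles whether a model is safe.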
Finance scenario and decision-point checks
Applied scenario table
| Scenario cue | Main issue being tested | Better answer direction | Common trap |
|---|---|---|---|
| A credit model has strong overall performance but worse outcomes for a demographic segment | Fairness, segmentation, proxy variables, governance | Investigate bias, proxies, data representativeness, adverse impact, explanations, and remediation | Saying the model is acceptable because protected attributes were excluded |
| A fraud model flags many legitimate transactions | Precision/recall trade-off, customer impact, operations capacity | Review thresholds, false-positive costs, customer friction, escalation workflow, and monitoring | Optimizing only for maximum fraud capture |
| A model performs well in development but poorly after launch | Implementation error, drift, training-serving skew, monitoring | Compare production inputs, code versions, thresholds, data definitions, and population changes | Assuming the model was validated once, so the issue is not model risk |
| A vendor provides a black-box score with limited documentation | Third-party model risk, explainability, accountability | Request methodology, validation evidence, performance by segment, limitations, data controls, and audit rights | Treating vendor secrecy as a reason to skip validation |
| An LLM summarizes internal risk reports | Hallucination, confidentiality, source grounding, human review | Use approved data sources, retrieval controls, output review, citations where appropriate, and logging | Assuming polished language equals factual accuracy |
| Staff paste confidential client data into a public AI tool | Privacy, security, data leakage, policy breach | Stop use, contain exposure, investigate, notify internally, strengthen access and training controls | Treating it as only an employee training issue |
| A trading-support model is retrained after a volatile period | Regime change, validation, market risk, change control | Test performance across regimes, stress assumptions, document changes, approve deployment | Assuming recent data is always more relevant |
| A customer chatbot gives product guidance | Conduct risk, disclosure, suitability-style concerns, escalation | Limit scope, monitor outputs, provide disclaimers, route complex cases to humans | Letting the chatbot provide unrestricted financial advice |
| A compliance alert model reduces investigations by suppressing low-risk alerts | False negatives, regulatory exposure, model explainability | Validate suppression logic, review missed-event risk, sample suppressed alerts, monitor outcomes | Celebrating efficiency without checking missed suspicious activity |
| A model uses social-media-derived features | Data ethics, consent, representativeness, reputational risk | Review legality, fairness, purpose limitation, bias, explainability, and customer expectations | Focusing only on predictive power |
Decision prompts to practice
For each prompt, practice giving a complete answer in 60 to 90 seconds.
- Use-case classification: What decision does the AI system support, who relies on it, and what happens if it is wrong?
- Risk materiality: Is the use case customer-impacting, financially material, regulatory-facing, safety-critical, or operationally critical?
- Data fitness: Are the data sources authorized, representative, current, complete, and aligned with the intended decision?
- Model fitness: Does the model type match the problem, and are its limitations understood by users?
- Performance trade-off: Which error is more costly: false positive or false negative? Who bears the cost?
- Fairness and ethics: Could the model create unjustified differences in outcomes across groups or customer segments?
- Explainability: Can the decision be explained to the appropriate audience: user, customer, validator, auditor, or regulator?
- Human oversight: Is human review meaningful, trained, documented, and empowered to override the model?
- Monitoring: What indicators would show the model is degrading, being misused, or operating outside intended conditions?
- Response plan: If the model fails, is there a fallback process, incident response path, and owner for remediation?
Lifecycle artifact checklist
Be ready to recognize the purpose of each artifact and what a weak version would look like.
| Artifact | What it should show | Weak version or red flag |
|---|---|---|
| AI use-case intake form | Business purpose, users, decision impact, data used, risk rating, owner | Vague purpose, no owner, no impact assessment |
| Model inventory entry | Model name, owner, use case, status, materiality, version, dependencies | Missing vendor tools, spreadsheets, or embedded AI features |
| Data lineage record | Source systems, transformations, controls, access, retention | Unclear source, manual extracts, undocumented transformations |
| Development documentation | Problem definition, target, features, training process, assumptions, limitations | Focuses only on performance results |
| Validation report | Independent review of data, methodology, performance, limitations, implementation, monitoring | No challenge, no limitations, no remediation plan |
| Model card or factsheet | Intended use, prohibited use, performance, fairness, data, limitations | Marketing-style summary with no risk information |
| Explainability evidence | Global drivers, local explanations, reason codes where relevant | Black-box output with no decision rationale |
| Monitoring dashboard | Data drift, performance, overrides, exceptions, incidents, business outcomes | Only shows uptime or model volume |
| Change log | Versions, retraining events, threshold changes, approvals | Untracked parameter, data, or prompt changes |
| Incident log | Failure, impact, containment, root cause, remediation, lessons learned | Issues handled informally with no escalation |
| Vendor due diligence file | Methodology, controls, security, privacy, validation, service levels, auditability | Vendor refuses all evidence but model is still approved |
| User procedures | How to use outputs, limitations, escalation, override rules | Users treat AI outputs as mandatory decisions |
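To make one artifact concrete, the sketch below represents a model inventory entry as a small record and lists the fields left blank. The field names follow the inventory row in the table above; the class, the helper, and the sample values are hypothetical and not drawn from any GARP template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class InventoryEntry:
    """Hypothetical inventory record; fields mirror the table row above."""
    model_name: str = ""
    owner: str = ""
    use_case: str = ""
    status: str = ""
    materiality: str = ""
    version: str = ""
    dependencies: list = field(default_factory=list)


def missing_fields(entry):
    """Return names of empty fields. An empty dependencies list is flagged too,
    so that someone confirms no vendor tools or embedded AI features were omitted."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Hypothetical entry with no named owner.
entry = InventoryEntry(
    model_name="card-fraud-score",
    use_case="card fraud alerts",
    status="production",
    materiality="high",
    version="2.1",
    dependencies=["vendor device-risk score"],
)

print(missing_fields(entry))
```

A completeness check like this only shows that fields are filled in. Whether the stated owner, materiality rating, and dependencies are accurate still needs human review.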
Governance and control readiness
Lines of accountability
| Role or function | What to know for exam readiness |
|---|---|
| Business owner | Owns the use case, business process, outcomes, and risk acceptance |
| Model developer | Builds or configures the model and documents methodology and assumptions |
| Independent validator | Challenges data, design, performance, implementation, monitoring, and limitations |
| Risk management | Sets standards, oversees risk appetite, reviews material risks, tracks remediation |
| Compliance/legal/privacy | Reviews regulatory, conduct, privacy, disclosure, and customer-impact concerns |
| Information security | Reviews access, security architecture, attack surfaces, and incident response |
| Internal audit | Provides independent assurance over governance and control effectiveness |
| Senior management or committee | Approves higher-risk use, exceptions, remediation priorities, and accountability |
Readiness checks:
- Explain why accountability cannot be outsourced to a vendor or algorithm.
- Identify when a business owner should not self-approve a high-risk AI model.
- Explain why independence matters in validation.
- Distinguish policy approval, model approval, deployment approval, and exception approval.
- Identify how risk appetite should affect thresholds, controls, monitoring, and escalation.
Control types to recognize
| Control type | Examples | What it mitigates |
|---|---|---|
| Preventive | Use-case approval, access limits, data minimization, approved model library | Unauthorized or unsuitable AI use |
| Detective | Monitoring, drift alerts, exception reports, audit logs, output sampling | Model degradation, misuse, policy breaches |
| Corrective | Rollback, retraining, threshold adjustment, incident remediation, customer correction | Harm after failure or unexpected behavior |
| Compensating | Human review, manual fallback, dual control, restricted output use | Residual risk when primary controls are limited |
| Governance | Policy, committee review, inventory, documentation, validation standards | Inconsistent or uncontrolled AI deployment |
Generative AI-specific checklist
Generative AI requires different controls from many traditional scoring models because outputs can be open-ended, variable, and difficult to verify.
| Risk area | What to check | Ready response |
|---|---|---|
| Hallucination | Output may invent facts, sources, numbers, or reasoning | Require grounding, verification, human review, and limited-use policies |
| Prompt injection | User or document instructions manipulate the model | Sanitize inputs, restrict tool access, separate system instructions, monitor outputs |
| Data leakage | Confidential data may enter prompts, logs, or vendor systems | Use approved environments, access controls, masking, and retention limits |
| Overreliance | Users defer to fluent but unverified output | Train users, require review, show limitations, restrict high-impact decisions |
| Toxic or biased output | Generated content may be unfair, offensive, or discriminatory | Test outputs, apply filters, monitor complaints, include escalation |
| Inconsistent output | Same prompt may produce different results | Define acceptable variability, use templates, test repeatability where needed |
| Tool misuse | LLM can trigger actions, retrieve data, or generate code | Limit permissions, log actions, require approvals for high-impact actions |
| Vendor opacity | Model training and controls may be unclear | Perform due diligence, contract for controls, document residual risk |
Can you do this?
- Explain why LLM output should not be treated as verified fact without controls.
- Identify prohibited or restricted uses for generative AI in a finance organization.
- Design a control set for an internal chatbot that answers policy questions.
- Design a control set for an LLM that drafts customer-facing communications.
- Explain why logs, prompts, retrieved documents, and outputs all create risk evidence.
- Identify when a generative AI tool should be included in an AI/model inventory.
Finance use-case readiness checks
| Use case | Key AI risks | Controls to consider |
|---|---|---|
| Credit underwriting or account management | Bias, explainability, data leakage, adverse customer impact, model drift | Fairness testing, reason codes, validation, monitoring, override review |
| Fraud detection | Class imbalance, false positives, adaptive adversaries, customer friction | Threshold governance, alert sampling, recall/precision review, drift monitoring |
| Financial crime or compliance surveillance | Missed suspicious activity, alert suppression, explainability, regulator scrutiny | Independent validation, sampling, audit trails, escalation controls |
| Market risk or trading support | Regime change, data latency, model instability, automation risk | Stress testing, limits, human oversight, change control, kill-switch concepts |
| Portfolio analytics | Overfitting, unstable correlations, misleading optimization, scenario weakness | Sensitivity testing, benchmark comparison, assumptions review |
| Liquidity or stress forecasting | Rare-event uncertainty, macro regime changes, data limitations | Scenario analysis, expert challenge, conservative assumptions, monitoring |
| Customer service chatbot | Misleading answers, privacy exposure, conduct risk, escalation failure | Approved knowledge base, human escalation, output monitoring, disclosure controls |
| Operations automation | Process errors, exception handling, access control, resilience | Workflow controls, fallback procedures, reconciliation, logging |
| Vendor risk scoring | Opaque methodology, dependency risk, inconsistent updates | Due diligence, service-level controls, validation evidence, performance monitoring |
| Employee productivity AI | Confidentiality, unapproved use, inaccurate summaries, policy violations | Usage policy, approved tools, training, data restrictions, review procedures |
Common weak areas and traps
| Trap | Why it is wrong | Better exam mindset |
|---|---|---|
| “The model is accurate, so it is low risk.” | Accuracy may hide bias, instability, explainability gaps, or high-impact errors. | Evaluate performance, use case, impact, controls, and limitations together. |
| “Protected attributes were excluded, so fairness is solved.” | Proxy variables can reproduce sensitive characteristics. | Test outcomes and drivers, not just input names. |
| “The vendor owns the model, so the vendor owns the risk.” | The institution still owns its decisions and customer/process impact. | Apply third-party risk and model governance controls. |
| “Validation is a one-time pre-launch step.” | Models degrade and environments change. | Monitor, revalidate after material change, and track performance over time. |
| “Human in the loop always reduces risk.” | Human review can be superficial, biased, overloaded, or ignored. | Ensure review is trained, documented, empowered, and effective. |
| “More complex models are always better.” | Complexity can reduce transparency, stability, and control. | Match complexity to use case, evidence, and risk appetite. |
| “Generative AI is just another predictive model.” | LLMs can hallucinate, leak data, follow malicious prompts, and create open-ended outputs. | Apply generation-specific controls. |
| “Back-testing proves future performance.” | Future regimes may differ from historical data. | Combine back-testing with stress, sensitivity, drift, and expert challenge. |
| “Monitoring only means checking uptime.” | A model can be available but wrong, biased, stale, or misused. | Monitor data, outputs, outcomes, exceptions, and incidents. |
| “Documentation is optional if the team understands the model.” | Staff change, audits occur, and decisions need evidence. | Treat documentation as a core control. |
| “Threshold changes are minor tuning.” | Thresholds can materially change approvals, alerts, losses, and customer impact. | Govern thresholds as decision controls. |
| “AI policy is only a technology issue.” | AI risk spans business, legal, compliance, risk, privacy, security, and operations. | Use cross-functional governance. |
Final-week review checklist
High-yield final review tasks
| Task | Done? |
|---|---|
| Rebuild the AI lifecycle from memory: intake, design, data, development, validation, approval, deployment, monitoring, change, retirement. | [ ] |
| Create a one-page table of model risks and controls by lifecycle stage. | [ ] |
| Review confusion-matrix metrics and practice choosing the right metric for a scenario. | [ ] |
| Practice explaining false positives and false negatives in credit, fraud, compliance, and customer-service examples. | [ ] |
| Review data leakage, proxy bias, drift, overfitting, and model explainability. | [ ] |
| Review generative AI risks: hallucination, prompt injection, data leakage, overreliance, and output governance. | [ ] |
| Practice third-party AI scenarios where the model is opaque or vendor-controlled. | [ ] |
| Review artifacts: model inventory, validation report, model card, monitoring dashboard, change log, incident log. | [ ] |
| Practice deciding when to approve, reject, restrict, escalate, retrain, or retire an AI system. | [ ] |
| Make a list of your top five weak areas and answer scenario questions for each. | [ ] |
Last-pass “ready or not” questions
You are close to ready when you can answer each without guessing:
- What makes an AI use case high risk?
- What evidence should be reviewed before deployment?
- What evidence should be monitored after deployment?
- How can a model be biased without using protected attributes?
- How can strong historical performance fail in a new environment?
- What is the difference between model development, validation, governance, and audit?
- What controls reduce LLM hallucination and data leakage risk?
- When should a vendor AI tool be challenged or restricted?
- How do you choose between precision and recall?
- What should happen after a material model change?
Practical next step
Pick the three checklist areas with the most unchecked items. For each one, write a short scenario, identify the AI risk, choose the control response, and explain why weaker alternatives are insufficient. Then move into timed practice so you can apply the checklist quickly under exam conditions for the GARP Risk and AI Certificate (RAI).