RAI — GARP Risk and AI Certificate Quick Reference

Last revised: July 1, 2026

Compact exam-prep reference for the GARP Risk and AI Certificate (RAI): AI risk concepts, model governance, validation, fairness, explainability, and GenAI controls.

Exam Identity and How to Use This Page

This independent Quick Reference supports candidates preparing for the Risk and AI Certificate (exam code RAI) offered by GARP. It focuses on high-yield distinctions, applied risk-management decisions, and compact review tables rather than broad textbook coverage.

Use it to quickly answer exam-style questions such as:

What AI or machine learning method fits a financial risk use case?
What validation test or metric should be used?
What model risk control addresses a specific failure mode?
How do explainability, fairness, governance, and monitoring differ?
What additional risks arise from generative AI and large language models?

Core AI, ML, and Risk Vocabulary

Term	Exam-ready meaning	Common trap
Artificial intelligence	Broad field of systems performing tasks associated with human intelligence, such as prediction, classification, language generation, or decision support	Treating all AI as machine learning; rule-based systems can also be AI
Machine learning	Models learn patterns from data rather than relying only on explicit rules	Assuming ML removes the need for human judgment or controls
Supervised learning	Trains on labeled examples where the target outcome is known	Using it where no reliable target label exists
Unsupervised learning	Finds structure without labeled outcomes, such as clusters or anomalies	Treating clusters as causal or inherently meaningful
Reinforcement learning	Learns actions through rewards and penalties over time	Overlooking that live exploration in finance can affect customers, markets, or capital
Generative AI	Produces new text, code, images, data, or other content based on learned patterns	Confusing fluent output with verified truth
Large language model	GenAI model trained to predict and generate language-like sequences	Assuming it looks up facts in a database or reasons reliably
Model	A quantitative, statistical, AI, or rule-based method that transforms inputs into estimates, rankings, decisions, or recommendations	Excluding vendor tools, spreadsheets, or embedded AI from the model inventory
Model risk	Risk of adverse consequences from model errors, misuse, or inappropriate reliance	Thinking model risk only means coding mistakes
Explainability	Ability to understand why a model behaves as it does	Assuming explanation equals proof of correctness
Interpretability	How directly understandable the model structure or logic is	Treating post-hoc explanations as the same as transparent design
Bias	Systematic error or unfair disparity in data, model design, outcomes, or deployment	Looking only at training data and ignoring downstream decision processes
Drift	Change over time in data, relationships, behavior, or performance	Monitoring accuracy only, while input distributions change silently
Human-in-the-loop	Human review, approval, override, or escalation within an AI process	Assuming human review is effective without training, authority, and audit trail
Third-party AI risk	Risk from vendor models, data, platforms, APIs, or embedded AI services	Assuming outsourced AI transfers accountability to the vendor

AI Lifecycle for Financial Risk Management

    flowchart LR
        A[Business objective] --> B[Use-case risk assessment]
        B --> C[Data sourcing and permissions]
        C --> D[Feature engineering and model design]
        D --> E[Training and tuning]
        E --> F[Independent validation]
        F --> G[Approval and deployment]
        G --> H[Monitoring and controls]
        H --> I[Change management]
        I --> D
        H --> J[Retirement or replacement]

Lifecycle stage	Main exam focus	Key control questions
Business objective	Link AI to a legitimate business or risk-management purpose	What decision will the model support? Who is affected? What risk is created if wrong?
Use-case assessment	Classify criticality, materiality, customer impact, and regulatory sensitivity	Is the model high-impact, automated, customer-facing, or used for capital/liquidity/risk limits?
Data sourcing	Data quality, lineage, rights, representativeness, privacy, and bias	Is the data fit for purpose? Are labels reliable? Are proxies creating unfair outcomes?
Feature engineering	Transform raw data into predictive inputs	Are features stable, explainable, permissible, and available at decision time?
Training and tuning	Model selection, objective function, overfitting control	Is performance measured out-of-sample? Has tuning leaked validation information?
Validation	Independent challenge of conceptual soundness, implementation, and outcomes	Does the model work as intended under normal and stressed conditions?
Approval	Governance before production use	Are limitations documented? Are owners, thresholds, and escalation paths defined?
Deployment	Controlled implementation into systems and workflows	Does production code match validated code? Are access, logs, and fallbacks in place?
Monitoring	Ongoing performance, drift, exceptions, and use	Are thresholds actionable? Who reviews breaches?
Change management	Updates, retraining, vendor changes, new data, new use	Does the change require revalidation or reapproval?
Retirement	Remove obsolete or unsafe models	Are dependencies, records, and replacement controls managed?

Learning Types and When to Use Them

Learning type	Typical finance use cases	Strengths	Key risks
Supervised classification	Default prediction, fraud detection, AML alert triage, churn prediction	Clear target, measurable classification performance	Class imbalance, biased labels, threshold misuse
Supervised regression	Loss forecasting, exposure estimation, pricing, demand forecasting	Predicts continuous values	Outliers, unstable relationships, extrapolation error
Unsupervised clustering	Customer segmentation, peer grouping, portfolio pattern discovery	Useful when labels are absent	Clusters may be unstable or non-actionable
Anomaly detection	Fraud, cyber, trading surveillance, operational incidents	Finds unusual behavior	High false positives; unusual does not always mean suspicious
Time-series forecasting	Market variables, liquidity flows, macroeconomic indicators	Captures temporal structure	Regime shifts, autocorrelation mistakes, look-ahead bias
Natural language processing	News/sentiment analysis, document review, complaint classification	Converts text into structured signals	Context loss, language bias, hallucination with GenAI
Reinforcement learning	Execution algorithms, dynamic strategies, resource allocation	Optimizes sequential actions	Unsafe exploration, hard-to-explain behavior, feedback loops
Generative AI	Summaries, drafting, code assistance, research support, synthetic data	Scalable content generation and language interface	Hallucination, confidentiality, prompt injection, copyright/IP, overreliance

Common Model Families

Model family	Use when	Advantages	Limitations and exam traps
Linear regression	Continuous target; relationship is approximately linear	Simple, interpretable, fast	Sensitive to outliers and multicollinearity; poor for nonlinear effects
Logistic regression	Binary classification such as default/no default	Interpretable coefficients; useful baseline	Linear decision boundary unless features are transformed
Decision tree	Nonlinear rules and interactions are important	Intuitive splits; handles mixed data	Overfits easily if unconstrained
Random forest	Need robust nonlinear prediction	Reduces variance versus single tree	Less interpretable; may mask bias
Gradient boosting	High predictive performance on tabular data	Often strong for credit/fraud/risk scoring	Sensitive to tuning; overfitting and explainability concerns
Neural network	Complex patterns, unstructured data, images, text, high-dimensional signals	Flexible function approximation	Data-hungry, harder to validate and explain
Support vector machine	Classification with complex boundaries	Effective in some high-dimensional settings	Scaling and interpretability issues
k-means clustering	Partition observations into similar groups	Simple and fast	Requires preselected number of clusters; sensitive to scaling/outliers
Principal component analysis	Dimensionality reduction, factor extraction	Reduces correlated features	Components may be hard to interpret
Bayesian models	Need probabilistic updating or prior information	Explicit uncertainty treatment	Prior choice and computation may be challenging
Large language model	Language generation, summarization, extraction, Q&A support	Natural interface and broad text capability	Output may be plausible but wrong; needs grounding and guardrails

Financial Risk Use-Case Matrix

Use case	AI contribution	Primary risks	Controls to emphasize
Credit underwriting	PD estimation, scorecards, alternative data analysis	Discrimination, proxy variables, adverse selection, explainability gaps	Fairness testing, reason codes, feature governance, threshold review
Credit portfolio monitoring	Early warning indicators, migration prediction	Drift, macro regime change, false reassurance	Backtesting, stress testing, scenario overlays
Market risk	Volatility forecasting, pricing proxies, anomaly detection	Nonstationarity, tail underestimation, model opacity	Stress testing, benchmark models, sensitivity analysis
Liquidity risk	Cash-flow forecasting, deposit behavior prediction	Behavioral shifts, concentration risk, feedback effects	Scenario analysis, conservative overlays, monitoring
Operational risk	Loss event classification, incident detection	Incomplete labels, low-frequency high-severity events	Expert review, scenario analysis, qualitative controls
Fraud detection	Transaction scoring, anomaly detection, network analysis	Class imbalance, adversarial behavior, customer friction	Threshold tuning, feedback loops, alert quality metrics
AML/KYC	Alert prioritization, entity resolution, transaction monitoring	False positives, explainability, regulatory sensitivity	Human review, audit trail, typology testing
Trading and execution	Signal generation, execution optimization	Overfitting, market impact, feedback loops	Out-of-sample testing, kill switches, limit controls
Customer service	Chatbots, complaint routing, document summaries	Hallucination, unfair treatment, privacy leakage	Retrieval grounding, escalation, conversation logging
Risk reporting	Narrative generation, data extraction, dashboards	Misstatement, stale data, weak lineage	Source linking, reconciliation, approval workflow

Model Risk Management Reference

Three-Lines View

Function	Typical role in AI/model risk	What to remember for exam scenarios
First line	Owns business use, model development, operation, controls, and day-to-day performance	Cannot outsource accountability to validators or vendors
Second line	Sets policy, challenges risk assessment, performs or oversees independent validation, monitors risk appetite	Independence and effective challenge matter
Third line	Provides internal audit assurance over governance and controls	Reviews whether the framework works, not just one model’s performance

Model Governance Artifacts

Artifact	Purpose	Common deficiency
Model inventory	Complete list of models, AI tools, owners, status, and materiality	Missing vendor models, spreadsheets, GenAI tools, or embedded analytics
Model tiering	Prioritizes governance by risk, materiality, complexity, and impact	Classifying by complexity only and ignoring business impact
Development documentation	Explains objective, data, assumptions, methods, limitations, and testing	Documentation written after the fact and not tied to design decisions
Validation report	Independent assessment of conceptual soundness, implementation, outcomes, and limitations	Merely reproducing developer metrics without challenge
Approval record	Evidence that authorized governance body accepted use and limitations	Approval without conditions, owners, or monitoring thresholds
Monitoring plan	Defines metrics, thresholds, frequency, escalation, and remediation	Metrics tracked but no action when breached
Change log	Records retraining, feature changes, code changes, data changes, and vendor updates	Treating “minor” data or API changes as non-model changes
Issue log	Tracks limitations, findings, remediation owners, and due dates	Findings closed without evidence
User procedures	Explains how outputs should and should not be used	Users treat scores as final decisions despite intended advisory use

Independent Validation: What to Test

Validation area	Key question	Example techniques
Conceptual soundness	Is the model design appropriate for the objective?	Method review, assumptions review, benchmark comparison
Data quality	Is the data accurate, complete, representative, and permitted?	Lineage review, missing-value analysis, outlier review, label audit
Feature appropriateness	Are inputs available, stable, explainable, and acceptable?	Leakage checks, proxy analysis, correlation review
Implementation	Does production match approved design?	Code review, replication, unit testing, environment checks
Performance	Does the model predict well on unseen data?	Holdout testing, cross-validation, backtesting
Stability	Does performance hold across time, segments, and conditions?	Temporal validation, population stability, stress periods
Fairness	Are outcomes unjustifiably different across groups?	Disparity metrics, proxy review, segment-level error analysis
Explainability	Can stakeholders understand drivers and limitations?	SHAP/LIME, reason codes, sensitivity analysis
Robustness	Can the model handle noise, adversarial inputs, and edge cases?	Perturbation testing, stress tests, scenario tests
Use and controls	Is the model used as intended with oversight?	Workflow review, override analysis, access control review
Ongoing monitoring	Are deterioration and misuse detected?	Thresholds, alerts, drift metrics, periodic review
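
The population stability check listed under Stability is often summarized with a population stability index (PSI). The sketch below assumes the commonly used definition, PSI = sum over bins of (actual share - expected share) x ln(actual share / expected share). The function name, the bin shares, and the small floor `eps` are illustrative assumptions rather than anything prescribed by the exam materials, and alert thresholds are institution-specific.

```python
import math

def population_stability_index(expected_share, actual_share, eps=1e-6):
    """PSI over pre-binned shares; each list should sum to roughly 1."""
    if len(expected_share) != len(actual_share):
        raise ValueError("both distributions must use the same bins")
    psi = 0.0
    for e, a in zip(expected_share, actual_share):
        # Floor empty bins so the logarithm is defined (an assumption of this sketch).
        e = max(e, eps)
        a = max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical score-band shares at development time and in recent production.
development = [0.10, 0.20, 0.40, 0.20, 0.10]
production = [0.05, 0.15, 0.40, 0.25, 0.15]

print(round(population_stability_index(development, development), 4))
print(round(population_stability_index(development, production), 4))
```

An unchanged population gives a PSI of zero, and the value grows as the production mix moves away from the development mix.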

Validation and Testing Traps

Trap	Why it matters	Better exam answer
Data leakage	Model uses information unavailable at decision time	Rebuild features using only information known at prediction time
Look-ahead bias	Future data contaminates training or testing	Use time-consistent splits for time-dependent data
Target leakage	Predictor directly encodes the label or outcome	Remove or redefine leaky features
Overfitting	Model learns noise rather than general patterns	Use holdout data, cross-validation, regularization, pruning
Underfitting	Model too simple to capture meaningful structure	Add relevant features or more appropriate model class
Class imbalance	High accuracy can hide poor detection of rare events	Use precision, recall, F1, ROC/PR curves, cost-sensitive thresholds
Survivorship bias	Failed or exited entities are omitted	Include inactive, defaulted, closed, or failed observations where relevant
Sample-selection bias	Training population differs from deployment population	Test representativeness and segment performance
Proxy discrimination	Innocent-looking variables replicate protected traits	Conduct proxy analysis and fairness testing
Concept drift	Relationship between inputs and target changes	Monitor performance and retrain or recalibrate
Data drift	Input distribution changes	Monitor feature distributions and population stability
Automation bias	Users overtrust model output	Require training, explanations, overrides, and escalation
Feedback loop	Model decisions affect future data labels	Separate observed outcomes from model-influenced outcomes where possible
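
The "time-consistent splits" answer to look-ahead bias can be sketched as follows. The record layout, the field names `as_of` and `defaulted`, and the cutoff date are hypothetical; the only point illustrated is that every training observation precedes every test observation.

```python
from datetime import date

def time_consistent_split(rows, cutoff):
    """Train on observations strictly before the cutoff, test on the rest."""
    train = [r for r in rows if r["as_of"] < cutoff]
    test = [r for r in rows if r["as_of"] >= cutoff]
    return train, test

# Hypothetical monthly observations for one year.
rows = [
    {"as_of": date(2023, month, 1), "defaulted": month % 4 == 0}
    for month in range(1, 13)
]

train, test = time_consistent_split(rows, cutoff=date(2023, 10, 1))

# No training date falls on or after any test date, so no future data leaks in.
latest_train = max(r["as_of"] for r in train)
earliest_test = min(r["as_of"] for r in test)
print(len(train), len(test), latest_train < earliest_test)
```

A random shuffle of the same rows would mix later months into training, which is the pattern the table warns against for time-dependent data.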

Core Metrics and Formulas

Confusion Matrix Terms

Term	Meaning
True positive	Model predicts positive and actual outcome is positive
False positive	Model predicts positive but actual outcome is negative
True negative	Model predicts negative and actual outcome is negative
False negative	Model predicts negative but actual outcome is positive

For a fraud model, “positive” often means flagged as fraud. For a credit default model, “positive” may mean predicted default. Always identify the positive class before interpreting precision, recall, or false-positive rates.

Classification Metrics

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall or Sensitivity} = \frac{TP}{TP + FN} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \]
\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

Metric	Best used for	Trap
Accuracy	Balanced classes and similar error costs	Misleading when positives are rare
Precision	Costly false positives, such as unnecessary investigations or customer friction	High precision may miss many true positives
Recall / sensitivity	Costly false negatives, such as missed fraud or missed defaults	High recall may create excessive false positives
Specificity	Ability to correctly reject negatives	Can be high even when positive detection is weak
F1 score	Balance between precision and recall	Does not include true negatives
ROC AUC	Ranking quality across thresholds	Can look strong under heavy class imbalance even when precision is poor
Precision-recall curve	Rare-event detection	More informative than ROC in many imbalanced settings
Calibration	Whether predicted probabilities match realized frequencies	A high-ranking model may still be poorly calibrated
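
The classification formulas above can be checked with a small worked example. The confusion-matrix counts below are made up for illustration (a fraud model where "positive" means flagged as fraud), and the function name is an assumption of this sketch.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics defined above; a zero denominator yields None."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = None
    if precision is not None and recall is not None and precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": f1,
    }

# Hypothetical portfolio: 10,000 transactions, of which 100 are fraud.
metrics = classification_metrics(tp=20, fp=10, tn=9890, fn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts accuracy is 99.1% while recall is only 20%, which is the class-imbalance trap noted in the table: the model misses 80 of 100 frauds yet looks excellent on accuracy.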

Regression and Forecasting Metrics

\[ MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \]
\[ MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
\[ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]

Metric	Interpretation	Trap
MAE	Average absolute error; easy to interpret in original units	Treats all errors linearly
MSE	Average squared error	Harder to interpret; heavily penalizes large errors
RMSE	Error measure in original units, sensitive to large misses	Can be dominated by outliers
MAPE	Percentage error measure	Problematic when actual values are near zero
R-squared	Share of variance explained in sample	High value does not prove causality or out-of-sample performance
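
A short sketch of the MAE, MSE, and RMSE formulas above, using two invented forecast series chosen so that the average absolute error is the same but the error pattern differs. The function and variable names are illustrative only.

```python
import math

def regression_errors(actual, predicted):
    """Return MAE, MSE, and RMSE for paired actual and predicted values."""
    n = len(actual)
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}

actual = [100.0, 100.0, 100.0, 100.0]
steady_misses = [90.0, 110.0, 90.0, 110.0]   # four errors of size 10
one_large_miss = [100.0, 100.0, 100.0, 60.0]  # one error of size 40

steady = regression_errors(actual, steady_misses)
spiky = regression_errors(actual, one_large_miss)
print(steady)
print(spiky)
```

Both forecasts have an MAE of 10, but the RMSE is 10 for the steady misses and 20 for the single large miss, which shows why RMSE is described as sensitive to large errors.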

Credit and Risk Formulas

\[ \text{Expected Loss} = PD \times LGD \times EAD \]
\[ \text{Unexpected Loss} = \text{Potential loss above expected loss, often linked to volatility or tail outcomes} \]
\[ VaR_{\alpha} = \text{Loss threshold not exceeded with confidence level } \alpha \text{ over a specified horizon} \]
\[ ES_{\alpha} = \text{Average loss conditional on losses exceeding } VaR_{\alpha} \]

Formula concept	Exam focus	Common trap
PD	Probability of default	Confusing default probability with loss amount
LGD	Loss severity if default occurs	Ignoring collateral, seniority, and recovery uncertainty
EAD	Exposure at default	Treating current balance as always equal to future exposure
Expected loss	Average credit loss estimate	Not the same as economic capital or unexpected loss
VaR	Quantile-based loss measure	Says little about severity beyond the threshold
Expected shortfall	Tail-loss average beyond VaR	More informative about tail severity, but model-dependent
Stress testing	Scenario-based adverse impact assessment	Not a probability forecast unless explicitly modeled as one
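
The expected loss, VaR, and expected shortfall definitions above can be illustrated with a minimal sketch. It assumes a simple empirical (historical-sample) estimator, which is only one of several conventions; the inputs, the loss sample, and the function names are hypothetical.

```python
import math

def expected_loss(prob_default, loss_given_default, exposure_at_default):
    """Expected Loss = PD x LGD x EAD."""
    return prob_default * loss_given_default * exposure_at_default

def empirical_var_es(losses, alpha):
    """Empirical VaR and ES from a loss sample (positive numbers are losses)."""
    ordered = sorted(losses)
    n = len(ordered)
    # Smallest loss with at least alpha of the sample at or below it;
    # round() guards against floating-point noise in alpha * n.
    k = max(math.ceil(round(alpha * n, 9)) - 1, 0)
    var = ordered[k]
    # ES here averages only losses strictly beyond VaR, matching the definition above.
    tail = [x for x in ordered if x > var]
    es = sum(tail) / len(tail) if tail else var
    return var, es

el = expected_loss(prob_default=0.02, loss_given_default=0.45,
                   exposure_at_default=1_000_000)

# Hypothetical sample of 100 loss outcomes: 1, 2, ..., 100.
losses = list(range(1, 101))
var_95, es_95 = empirical_var_es(losses, alpha=0.95)
print(f"EL={el:.2f} VaR95={var_95} ES95={es_95}")
```

In this sample the 95% VaR is 95 and the expected shortfall is 98, so ES reports the average severity of the five outcomes beyond the VaR threshold, which VaR alone does not reveal.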

Thresholds, Cutoffs, and Business Decisions

Decision point	What changes when threshold moves up?	What changes when threshold moves down?
Fraud alert threshold	Fewer alerts, higher precision, more missed fraud risk	More alerts, higher recall, more operational burden
Credit approval cutoff	Fewer approvals, lower default risk, potential lost revenue	More approvals, higher default risk, possible fairness impact
AML alert priority	Fewer escalations, risk of missed suspicious activity	More escalations, analyst overload
Model override threshold	Fewer manual reviews, faster processing	More review effort, potentially better control of edge cases
GenAI escalation threshold (bar for routing to human review)	More automated responses, higher hallucination exposure	More human review, slower response

Exam-style threshold questions usually require balancing error costs, risk appetite, customer impact, regulatory sensitivity, and operational capacity.
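
The trade-off in the fraud alert row can be made concrete with a toy threshold sweep. The scores and labels below are invented, and the rule "flag when score is at or above the threshold" is an assumption of this sketch.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP, FP, TN, FN when cases scoring at or above the threshold are flagged."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Hypothetical model scores and true outcomes (1 = fraud).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for threshold in (0.25, 0.50, 0.75):
    tp, fp, tn, fn = confusion_at_threshold(scores, labels, threshold)
    alerts = tp + fp
    precision = tp / alerts if alerts else 0.0
    recall = tp / (tp + fn)
    print(f"threshold={threshold:.2f} alerts={alerts} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

In this sample, raising the threshold from 0.25 to 0.75 cuts alerts from 7 to 3, lifts precision from about 0.57 to about 0.67, and drops recall from 1.00 to 0.50. Real score distributions need not move this smoothly, so the chosen cutoff still has to reflect error costs and operational capacity.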

Explainability and Interpretability

Concept or method	What it does	Best use	Limitation
Intrinsic interpretability	Model is understandable by design, such as linear models or shallow trees	High-stakes decisions requiring clear rationale	May sacrifice predictive performance
Post-hoc explanation	Explains a trained model after the fact	Complex models where transparency is limited	May approximate rather than reveal true logic
Global explanation	Describes model behavior overall	Policy, governance, validation	May hide segment-level behavior
Local explanation	Explains one prediction	Customer-level decision review, investigation	Can be unstable near decision boundaries
Feature importance	Ranks influential inputs	Model review and governance	Does not show direction, causality, or fairness
Partial dependence plot	Shows average effect of a feature	Understanding nonlinear relationships	Can mislead when features are correlated
ICE plot	Shows feature effect for individual observations	Detecting heterogeneity	Can be noisy and hard to summarize
SHAP	Allocates prediction contribution across features	Local and global explanations	Computational complexity; assumptions matter
LIME	Local surrogate explanation around one prediction	Quick local explanation	Sensitive to sampling and neighborhood definition
Counterfactual explanation	Shows minimal change needed to alter outcome	Actionable adverse-action-style reasoning	Must be realistic and permissible
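
For an intrinsically interpretable model, a local explanation can be read directly from the coefficients. The sketch below is not SHAP or LIME; it assumes a hypothetical logistic scorecard and decomposes one applicant's log-odds, relative to a baseline applicant, into additive per-feature contributions. All coefficients, baseline values, and feature names are invented.

```python
# Hypothetical logistic scorecard: log-odds of default = intercept + sum(coef * value).
INTERCEPT = -3.0
COEFFICIENTS = {"utilization": 2.0, "missed_payments": 0.8, "years_on_file": -0.1}
# Hypothetical reference applicant (for example, portfolio averages).
BASELINE = {"utilization": 0.30, "missed_payments": 0.0, "years_on_file": 8.0}

def log_odds(features):
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)

def contributions(features):
    """Per-feature change in log-odds relative to the baseline applicant."""
    return {
        name: COEFFICIENTS[name] * (features[name] - BASELINE[name])
        for name in COEFFICIENTS
    }

applicant = {"utilization": 0.90, "missed_payments": 2.0, "years_on_file": 3.0}
parts = contributions(applicant)

# Rank features by how much they push the score toward default.
reason_codes = sorted(parts, key=parts.get, reverse=True)
print(reason_codes)
print({name: round(value, 2) for name, value in parts.items()})
```

Because the model is linear in log-odds, the contributions sum exactly to the gap between the applicant and the baseline, so the explanation is faithful by construction. Post-hoc methods for complex models only approximate this property.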

Explanation Quality Checklist

A useful explanation should be:

Faithful: reflects actual model behavior, not a convenient story.
Stable: similar inputs produce similar explanations unless a real boundary is crossed.
Actionable: helps users understand what can be changed or reviewed.
Audience-appropriate: different detail for developers, validators, executives, customers, and auditors.
Documented: stored with assumptions, limitations, and intended use.
Tested: checked for consistency across segments and edge cases.

Fairness, Bias, and Responsible AI

Bias source	Example in finance	Control
Historical bias	Past lending decisions reflect unequal access to credit	Review labels, outcomes, and policy history
Representation bias	Training data underrepresents certain groups or regions	Sampling review and segment testing
Measurement bias	Income, employment, or address data have unequal quality	Data quality checks by segment
Label bias	“Default” or “fraud” labels reflect prior detection practices	Label audit and alternative outcome definitions
Proxy bias	ZIP code, device type, or merchant behavior correlates with protected traits	Proxy analysis and feature governance
Aggregation bias	One model performs poorly for a subgroup	Segment-level validation and monitoring
Deployment bias	Users apply model outside intended scope	Training, access control, use restrictions
Feedback bias	Model-driven decisions affect future observed outcomes	Monitor feedback loops and use independent samples where possible

Fairness Metrics: What They Mean

Metric	Plain-language question	Trap
Demographic parity	Are positive decision rates similar across groups?	May ignore legitimate risk differences
Equal opportunity	Are true positive rates similar across groups?	Focuses on positives, not false positives
Equalized odds	Are both true positive and false positive rates similar?	Generally cannot hold together with calibration by group when base rates differ
Predictive parity	Is precision similar across groups?	May conflict with equalized odds when base rates differ
Calibration by group	Do predicted probabilities match outcomes within each group?	Calibrated models can still produce different approval rates
Disparate impact analysis	Do outcomes disproportionately affect a group?	Legal meaning depends on jurisdiction and context
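
Several of the metrics above reduce to comparing simple rates by group. The sketch below uses invented records of the form (group, actual outcome, model decision) and hypothetical group labels "A" and "B"; it computes the selection rate (demographic parity), true positive rate (equal opportunity), and false positive rate (the second half of equalized odds).

```python
def group_rates(records):
    """Per-group selection rate, TPR, and FPR from (group, actual, predicted) triples."""
    counts = {}
    for group, actual, predicted in records:
        c = counts.setdefault(group, {"n": 0, "selected": 0, "pos": 0,
                                      "tp": 0, "neg": 0, "fp": 0})
        c["n"] += 1
        c["selected"] += predicted
        if actual == 1:
            c["pos"] += 1
            c["tp"] += predicted
        else:
            c["neg"] += 1
            c["fp"] += predicted
    return {
        group: {
            "selection_rate": c["selected"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
        }
        for group, c in counts.items()
    }

# Hypothetical data: 10 records per group.
records = (
    [("A", 1, 1)] * 5 + [("A", 1, 0)] * 1 + [("A", 0, 1)] * 1 + [("A", 0, 0)] * 3
    + [("B", 1, 1)] * 2 + [("B", 1, 0)] * 2 + [("B", 0, 1)] * 1 + [("B", 0, 0)] * 5
)

rates = group_rates(records)
for group, r in rates.items():
    print(group, {k: round(v, 3) for k, v in r.items()})
```

Here group A has a selection rate of 0.60 against 0.30 for group B, and a true positive rate of about 0.83 against 0.50. Whether such gaps are unjustified depends on context, base rates, and the applicable legal standard, as the table notes.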

Responsible AI Principles in Exam Scenarios

Principle	Practical implication
Accountability	Named owners remain responsible for AI outcomes
Transparency	Stakeholders understand purpose, limitations, and decision role
Fairness	Outcomes and errors are assessed across relevant groups
Robustness	Model withstands noise, drift, stress, and adversarial behavior
Privacy	Data use is limited, protected, and appropriate
Security	Model, data, prompts, APIs, and outputs are protected
Human oversight	People can intervene meaningfully where risk warrants
Auditability	Evidence exists for design, validation, approval, and use

Data Risk Reference

Data issue	Why it matters	Detection or mitigation
Missing data	Can bias estimates if not random	Missingness analysis, imputation policy, segment checks
Outliers	Can distort training and metrics	Winsorization, robust methods, investigation
Duplicates	Can overweight observations	Deduplication and entity resolution
Incorrect labels	Directly corrupt supervised learning	Label audit, reconciliation, expert review
Non-representative sample	Model may fail in production	Population comparison and monitoring
Stale data	Relationships may no longer hold	Recency controls and drift monitoring
Poor lineage	Weak auditability and reproducibility	Data catalog, lineage documentation
Unauthorized data	Legal, ethical, and contractual risk	Data permissions review and access controls
Sensitive data	Privacy, fairness, and conduct risk	Minimization, masking, encryption, governance
Alternative data	Potential predictive lift with higher uncertainty	Source diligence, explainability, stability testing

Generative AI and LLM Risk Reference

GenAI concept	Meaning	Exam-relevant control
Prompt	User or system instruction given to a model	Prompt standards, testing, access controls
System prompt	Higher-priority instruction defining behavior and constraints	Protect from disclosure or override
Hallucination	Plausible but false or unsupported output	Grounding, source citation, human review
Retrieval-augmented generation	Model uses retrieved documents or data to support output	Curated knowledge base, source validation
Fine-tuning	Additional training on specific data or tasks	Data governance, evaluation, version control
Embedding	Numeric representation of text or objects for similarity search	Privacy review and vector database controls
Vector database	Stores embeddings for retrieval	Access control, deletion process, data lineage
Prompt injection	Malicious instruction attempts to override controls	Input filtering, isolation, output validation
Data exfiltration	Sensitive data leaked through prompts or outputs	DLP controls, redaction, logging
Model inversion	Attempt to infer training data from model behavior	Privacy-preserving controls and access limits
Jailbreak	User circumvents model safety restrictions	Adversarial testing and guardrails
Guardrail	Technical or process control around model behavior	Policy filters, allowlists, escalation rules
Temperature	Parameter affecting randomness of generated output	Use lower values where consistent output is needed; higher values increase variability
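
The effect of temperature can be illustrated with the standard temperature-scaled softmax, in which each raw score is divided by the temperature before normalizing. The logits below are invented, and real LLM services may combine temperature with other sampling settings, so treat this as a sketch of the concept only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a low temperature almost all probability sits on the top-scoring token, so output is more repeatable; at a higher temperature the probabilities flatten and sampled output varies more.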

GenAI Use-Case Control Matrix

Use case	Lower-risk pattern	Higher-risk pattern
Internal summarization	Human-reviewed summaries from approved documents	Unverified summary used for official reporting
Customer chatbot	Narrow scope, retrieval grounding, escalation to human	Open-ended advice with no audit trail
Code assistant	Developer review, testing, secure repository controls	Direct production deployment of generated code
Research support	Source-linked drafts reviewed by analyst	Trading or credit decision based on unsourced output
Synthetic data	Tested for utility and privacy leakage	Used as if it were real observed data
Policy interpretation	References approved policy library	Creates new policy language without approval

AI Security, Privacy, and Operational Resilience

Risk	Example	Control
Adversarial input	Slightly altered transaction evades fraud model	Robustness testing, adversarial training, monitoring
Model theft	Unauthorized extraction of model behavior or parameters	Rate limits, access controls, API monitoring
Data poisoning	Malicious or corrupted training data changes behavior	Data validation, source controls, anomaly detection
Prompt injection	External text instructs LLM to ignore controls	Content sanitization, tool isolation, output checks
Sensitive data leakage	Confidential client data appears in prompt or output	Redaction, DLP, encryption, retention controls
Vendor outage	AI service becomes unavailable	Fallback process, resilience planning
Concentration risk	Many processes depend on one AI provider or model	Vendor diversification or contingency planning
Unauthorized model change	Vendor or developer changes model without review	Versioning, change notification, revalidation triggers
Weak logging	Cannot reconstruct AI-assisted decision	Audit logs, prompt/output retention policy
Over-permissioned tools	AI agent accesses systems beyond need	Least privilege and tool-use constraints

Vendor and Third-Party AI Due Diligence

Diligence area	Questions to ask	Evidence to seek
Model purpose	What is the tool intended and not intended to do?	Product documentation, use-case restrictions
Data	What data trained or powers the model? Can client data be used for training?	Data handling terms, privacy controls
Performance	How was performance measured? On what population?	Validation reports, benchmarks, test methodology
Explainability	Can outputs be explained at the level users need?	Feature drivers, reason codes, interpretability tools
Bias and fairness	Has the vendor tested relevant disparities?	Fairness testing documentation
Security	How are prompts, data, APIs, and outputs protected?	Security controls, certifications, incident process
Change management	How are model updates communicated and controlled?	Version notes, release governance
Auditability	Can the institution log and reconstruct decisions?	Logging capabilities, export options
Resilience	What happens during outage or degraded service?	SLAs, contingency procedures
Subcontractors	Are other providers involved?	Third-party dependency list
Exit	Can the institution transition away?	Data export, deletion, termination procedures

Regulatory and Standards Themes to Recognize

Do not treat this section as jurisdiction-specific legal advice. For exam purposes, focus on recurring supervisory and governance themes: accountability, transparency, fairness, privacy, security, resilience, documentation, validation, and human oversight.

Source or framework type	High-yield theme
Model risk management guidance	Models require inventory, governance, validation, documentation, monitoring, and effective challenge
Banking and financial supervision	AI use should align with safety and soundness, consumer protection, operational resilience, and risk governance
Data protection frameworks	Personal data use should be lawful, limited, protected, and transparent where applicable
AI risk management frameworks	Identify, measure, manage, monitor, and govern AI risks across the lifecycle
Conduct and consumer protection expectations	Avoid unfair, deceptive, discriminatory, or unsuitable outcomes
Cybersecurity standards	Protect AI systems, data, APIs, identities, logs, and third-party connections
Operational resilience expectations	Critical AI-supported services need contingency, incident response, and recovery planning
Emerging AI laws and policies	Higher-impact AI use cases typically receive greater governance scrutiny

High-Yield Distinctions

Distinction	Know the difference
Accuracy vs calibration	Accuracy measures correct classifications; calibration measures whether probabilities match realized frequencies
Correlation vs causation	Predictive association does not prove one variable causes another
Explainability vs fairness	A model can be explainable but unfair, or fairer by some metric but hard to explain
Model validation vs model monitoring	Validation is pre-use or periodic independent challenge; monitoring is ongoing production surveillance
Development testing vs independent validation	Developers optimize and test; independent validators challenge assumptions and use
Data drift vs concept drift	Data drift means inputs change; concept drift means relationships between inputs and outcomes change
Bias vs variance	Bias is systematic error from oversimplification; variance is instability from sensitivity to training data
White-box vs black-box	White-box models are easier to inspect; black-box models may need post-hoc explanation and stronger controls
Human-in-the-loop vs human-on-the-loop	In-the-loop requires a human to act before each decision takes effect; on-the-loop means a human supervises automated decisions and can intervene
Automation vs augmentation	Automation replaces a task; augmentation supports a human decision
Predictive model vs decision policy	A model estimates risk; a policy converts estimates into actions
Model output vs business outcome	Good statistical performance may still create poor customer, financial, or operational outcomes
Validation exception vs limitation	Exception is a finding requiring remediation; limitation is a known boundary that must be accepted and controlled
Vendor validation vs user validation	Vendor evidence helps, but the financial institution still needs fit-for-purpose assessment
GenAI fluency vs reliability	Well-written output is not evidence of truth, completeness, or suitability
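
The accuracy vs calibration row can be made concrete with a small sketch. The labels and scores below are invented for illustration, and the check used here (mean predicted probability against observed event rate, sometimes called calibration-in-the-large) is only one simple diagnostic, not a full reliability analysis.

```python
def accuracy(y_true, probs, threshold=0.5):
    """Share of cases classified correctly at the given cutoff."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def calibration_gap(y_true, probs):
    """Absolute gap between mean predicted probability and observed event rate."""
    mean_pred = sum(probs) / len(probs)
    event_rate = sum(y_true) / len(y_true)
    return abs(mean_pred - event_rate)


# Hypothetical data: 3 events in 10 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [0.8, 0.7, 0.6, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1]
model_b = [0.55, 0.55, 0.55, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, accuracy(y_true, probs), round(calibration_gap(y_true, probs), 2))
```

Both hypothetical models classify every case correctly at a 0.5 cutoff, yet model B's average predicted probability (0.48) overstates the observed 0.30 event rate. Identical accuracy therefore says nothing about whether the probabilities can be trusted for pricing, provisioning, or limit setting.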

Scenario Decision Tables

If the Scenario Says…

Scenario clue	Likely concept tested	Best response direction
Model performs well in training but poorly in production	Overfitting, leakage, drift, or deployment mismatch	Check out-of-sample testing, leakage, monitoring, implementation
Rare event model has 99% accuracy	Class imbalance	Examine recall, precision, confusion matrix, PR curve
Model uses postal code and produces group disparities	Proxy bias	Conduct fairness/proxy analysis and review feature permissibility
Business adopts vendor AI chatbot quickly	Third-party and GenAI risk	Require due diligence, use-case approval, guardrails, logging
LLM gives confident but false answer	Hallucination	Use retrieval grounding, source checks, human review
Production model changes after vendor update	Change management	Trigger impact assessment and possible revalidation
Model decision cannot be explained to affected user	Explainability and governance	Add reason codes, interpretable model, or human review
Performance degrades after economic shift	Concept drift or regime change	Reassess assumptions, recalibrate, retrain, or apply overlays
Analysts ignore model warnings	Model use risk and human factors	Review workflow, training, incentives, escalation
Analysts blindly accept outputs	Automation bias	Strengthen human oversight and challenge
Many false fraud alerts overwhelm staff	Threshold and operational capacity	Tune cutoff, prioritize alerts, measure precision
New data source improves accuracy but includes sensitive attributes	Data ethics and compliance risk	Review permissions, fairness, privacy, and necessity
Model works overall but fails for a segment	Aggregation bias	Perform segment validation and targeted remediation
AI system controls a critical process with no fallback	Operational resilience	Add contingency, manual fallback, incident plan
Developers cannot reproduce results	Weak documentation or environment control	Require versioning, lineage, code/data reproducibility
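
The rare-event row above is worth working through once by hand. The sketch below uses an invented portfolio and a deliberately useless "model" that never flags fraud; the helper names are hypothetical and exist only for this illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the rare event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Return 0.0 instead of dividing by zero (e.g. no alerts raised)."""
    return numerator / denominator if denominator else 0.0


# Hypothetical portfolio: 10 fraud cases among 1,000 transactions.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a "model" that never flags fraud

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = safe_ratio(tp, tp + fn)
precision = safe_ratio(tp, tp + fp)
print(accuracy, recall, precision)
```

Accuracy is 0.99 while recall is 0.0, because all 10 fraud cases are missed. This is why the confusion matrix, recall, and precision are better evidence than accuracy when classes are imbalanced.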

Control Selection Matrix

Problem	Primary control	Supporting controls
Overfitting	Out-of-sample validation	Regularization, pruning, simpler benchmark
Data leakage	Feature review confirming each input is available at decision time	Time-based splits, independent replication
Biased outcomes	Fairness testing	Proxy review, policy review, segment monitoring
Black-box model	Explainability tools	Simpler challenger, documentation, human review
GenAI hallucination	Grounding and verification	Source citation, confidence rules, escalation
Unauthorized data use	Data governance	Access controls, lineage, privacy review
Vendor opacity	Third-party due diligence	Contractual reporting, independent testing
Drift	Ongoing monitoring	Retraining triggers, recalibration, alerts
Poor user adoption	Training and workflow design	Feedback collection, override tracking
Excessive reliance	Human oversight	Decision limits, second review, audit trail
Cyberattack on model	Security controls	Adversarial testing, monitoring, incident response
Production mismatch	Implementation validation	Code review, deployment controls, reconciliations
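
The drift row pairs ongoing monitoring with alerts and retraining triggers. The source does not name a specific drift metric; the sketch below assumes the population stability index (PSI), one commonly used measure of input or score distribution shift. The bin shares and the 0.2 alert threshold are illustrative assumptions, not prescribed values.

```python
import math


def population_stability_index(expected_shares, actual_shares, floor=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    psi = 0.0
    for expected, actual in zip(expected_shares, actual_shares):
        e = max(expected, floor)  # floor avoids log(0) for empty bins
        a = max(actual, floor)
        psi += (a - e) * math.log(a / e)
    return psi


# Hypothetical score-band shares at development and in production.
development = [0.25, 0.25, 0.25, 0.25]
production = [0.10, 0.20, 0.30, 0.40]

ALERT_THRESHOLD = 0.2  # assumed trigger; set per internal policy

psi = population_stability_index(development, production)
print(round(psi, 3), "investigate" if psi > ALERT_THRESHOLD else "stable")
```

A PSI breach signals that the input or score distribution has shifted (data drift). It does not show whether the relationship between inputs and outcomes has changed (concept drift), which requires comparing predictions with realized outcomes once they become available.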

Compact Exam Checklist

Before answering an RAI scenario question, identify:

Use case: credit, market, operational, compliance, customer, trading, reporting, or GenAI support.
Decision role: advisory, automated, customer-facing, internal, capital-related, or critical process.
Data risk: quality, lineage, permission, representativeness, leakage, proxies, privacy.
Model risk: complexity, overfitting, drift, calibration, explainability, robustness.
Human impact: customer harm, fairness, conduct, access to services, appeal or override.
Governance need: inventory, tiering, validation, approval, monitoring, change control.
Metric fit: classification, regression, ranking, calibration, fairness, tail risk.
Control fit: technical control, process control, governance control, or human oversight.
Residual risk: what remains after controls and whether it is acceptable.
Evidence: documentation, logs, test results, approvals, issue tracking.

Final Review Prompts

Use these prompts to test readiness:

Can you explain why high accuracy may be weak evidence for a fraud model?
Can you choose between precision, recall, F1, ROC AUC, and calibration for a scenario?
Can you identify leakage, drift, overfitting, proxy bias, and automation bias from short fact patterns?
Can you distinguish model validation from monitoring and governance approval?
Can you select appropriate GenAI controls for hallucination, prompt injection, and data leakage?
Can you describe why vendor AI still requires internal accountability and fit-for-purpose review?
Can you connect AI controls to financial risk outcomes, customer impact, and operational resilience?

Practical Next Step

Turn each table into short practice drills: read a scenario, name the AI risk, choose the right metric or control, and explain why two tempting alternatives are weaker. Then complete a fresh set of RAI-style practice questions under timed conditions to build speed and applied judgment.

Scenario Guide

History and Overview of AI Concepts