RAI — GARP Risk and AI Certificate Quick Reference

Compact exam-prep reference for the GARP Risk and AI Certificate (RAI): AI risk concepts, model governance, validation, fairness, explainability, and GenAI controls.

Exam Identity and How to Use This Page

This independent Quick Reference supports candidates preparing for the GARP Risk and AI Certificate (RAI), exam code RAI, offered by GARP. It focuses on high-yield distinctions, applied risk-management decisions, and compact review tables rather than broad textbook coverage.

Use it to quickly answer exam-style questions such as:

  • What AI or machine learning method fits a financial risk use case?
  • What validation test or metric should be used?
  • What model risk control addresses a specific failure mode?
  • How do explainability, fairness, governance, and monitoring differ?
  • What additional risks arise from generative AI and large language models?

Core AI, ML, and Risk Vocabulary

Term | Exam-ready meaning | Common trap
Artificial intelligence | Broad field of systems performing tasks associated with human intelligence, such as prediction, classification, language generation, or decision support | Treating all AI as machine learning; rule-based systems can also be AI
Machine learning | Models learn patterns from data rather than relying only on explicit rules | Assuming ML removes the need for human judgment or controls
Supervised learning | Trains on labeled examples where the target outcome is known | Using it where no reliable target label exists
Unsupervised learning | Finds structure without labeled outcomes, such as clusters or anomalies | Treating clusters as causal or inherently meaningful
Reinforcement learning | Learns actions through rewards and penalties over time | Often risky in finance if live experimentation affects customers, markets, or capital
Generative AI | Produces new text, code, images, data, or other content based on learned patterns | Confusing fluent output with verified truth
Large language model | GenAI model trained to predict and generate language-like sequences | Assuming it has a database of facts or reliable reasoning
Model | A quantitative, statistical, AI, or rule-based method that transforms inputs into estimates, rankings, decisions, or recommendations | Excluding vendor tools, spreadsheets, or embedded AI from the model inventory
Model risk | Risk of adverse consequences from model errors, misuse, or inappropriate reliance | Thinking model risk only means coding mistakes
Explainability | Ability to understand why a model behaves as it does | Assuming explanation equals proof of correctness
Interpretability | How directly understandable the model structure or logic is | Treating post-hoc explanations as the same as transparent design
Bias | Systematic error or unfair disparity in data, model design, outcomes, or deployment | Looking only at training data and ignoring downstream decision processes
Drift | Change over time in data, relationships, behavior, or performance | Monitoring accuracy only, while input distributions change silently
Human-in-the-loop | Human review, approval, override, or escalation within an AI process | Assuming human review is effective without training, authority, and audit trail
Third-party AI risk | Risk from vendor models, data, platforms, APIs, or embedded AI services | Assuming outsourced AI transfers accountability to the vendor

AI Lifecycle for Financial Risk Management

    flowchart LR
        A[Business objective] --> B[Use-case risk assessment]
        B --> C[Data sourcing and permissions]
        C --> D[Feature engineering and model design]
        D --> E[Training and tuning]
        E --> F[Independent validation]
        F --> G[Approval and deployment]
        G --> H[Monitoring and controls]
        H --> I[Change management]
        I --> D
        H --> J[Retirement or replacement]

Lifecycle stage | Main exam focus | Key control questions
Business objective | Link AI to a legitimate business or risk-management purpose | What decision will the model support? Who is affected? What risk is created if wrong?
Use-case assessment | Classify criticality, materiality, customer impact, and regulatory sensitivity | Is the model high-impact, automated, customer-facing, or used for capital/liquidity/risk limits?
Data sourcing | Data quality, lineage, rights, representativeness, privacy, and bias | Is the data fit for purpose? Are labels reliable? Are proxies creating unfair outcomes?
Feature engineering | Transform raw data into predictive inputs | Are features stable, explainable, permissible, and available at decision time?
Training and tuning | Model selection, objective function, overfitting control | Is performance measured out-of-sample? Has tuning leaked validation information?
Validation | Independent challenge of conceptual soundness, implementation, and outcomes | Does the model work as intended under normal and stressed conditions?
Approval | Governance before production use | Are limitations documented? Are owners, thresholds, and escalation paths defined?
Deployment | Controlled implementation into systems and workflows | Does production code match validated code? Are access, logs, and fallbacks in place?
Monitoring | Ongoing performance, drift, exceptions, and use | Are thresholds actionable? Who reviews breaches?
Change management | Updates, retraining, vendor changes, new data, new use | Does the change require revalidation or reapproval?
Retirement | Remove obsolete or unsafe models | Are dependencies, records, and replacement controls managed?

Learning Types and When to Use Them

Learning type | Typical finance use cases | Strengths | Key risks
Supervised classification | Default prediction, fraud detection, AML alert triage, churn prediction | Clear target, measurable classification performance | Class imbalance, biased labels, threshold misuse
Supervised regression | Loss forecasting, exposure estimation, pricing, demand forecasting | Predicts continuous values | Outliers, unstable relationships, extrapolation error
Unsupervised clustering | Customer segmentation, peer grouping, portfolio pattern discovery | Useful when labels are absent | Clusters may be unstable or non-actionable
Anomaly detection | Fraud, cyber, trading surveillance, operational incidents | Finds unusual behavior | High false positives; unusual does not always mean suspicious
Time-series forecasting | Market variables, liquidity flows, macroeconomic indicators | Captures temporal structure | Regime shifts, autocorrelation mistakes, look-ahead bias
Natural language processing | News/sentiment analysis, document review, complaint classification | Converts text into structured signals | Context loss, language bias, hallucination with GenAI
Reinforcement learning | Execution algorithms, dynamic strategies, resource allocation | Optimizes sequential actions | Unsafe exploration, hard-to-explain behavior, feedback loops
Generative AI | Summaries, drafting, code assistance, research support, synthetic data | Scalable content generation and language interface | Hallucination, confidentiality, prompt injection, copyright/IP, overreliance

Common Model Families

Model family | Use when | Advantages | Limitations and exam traps
Linear regression | Continuous target; relationship is approximately linear | Simple, interpretable, fast | Sensitive to outliers and multicollinearity; poor for nonlinear effects
Logistic regression | Binary classification such as default/no default | Interpretable coefficients; useful baseline | Linear decision boundary unless features are transformed
Decision tree | Nonlinear rules and interactions are important | Intuitive splits; handles mixed data | Overfits easily if unconstrained
Random forest | Need robust nonlinear prediction | Reduces variance versus single tree | Less interpretable; may mask bias
Gradient boosting | High predictive performance on tabular data | Often strong for credit/fraud/risk scoring | Sensitive to tuning; overfitting and explainability concerns
Neural network | Complex patterns, unstructured data, images, text, high-dimensional signals | Flexible function approximation | Data-hungry, harder to validate and explain
Support vector machine | Classification with complex boundaries | Effective in some high-dimensional settings | Scaling and interpretability issues
k-means clustering | Partition observations into similar groups | Simple and fast | Requires preselected number of clusters; sensitive to scaling/outliers
Principal component analysis | Dimensionality reduction, factor extraction | Reduces correlated features | Components may be hard to interpret
Bayesian models | Need probabilistic updating or prior information | Explicit uncertainty treatment | Prior choice and computation may be challenging
Large language model | Language generation, summarization, extraction, Q&A support | Natural interface and broad text capability | Output may be plausible but wrong; needs grounding and guardrails

Financial Risk Use-Case Matrix

Use case | AI contribution | Primary risks | Controls to emphasize
Credit underwriting | PD estimation, scorecards, alternative data analysis | Discrimination, proxy variables, adverse selection, explainability gaps | Fairness testing, reason codes, feature governance, threshold review
Credit portfolio monitoring | Early warning indicators, migration prediction | Drift, macro regime change, false reassurance | Backtesting, stress testing, scenario overlays
Market risk | Volatility forecasting, pricing proxies, anomaly detection | Nonstationarity, tail underestimation, model opacity | Stress testing, benchmark models, sensitivity analysis
Liquidity risk | Cash-flow forecasting, deposit behavior prediction | Behavioral shifts, concentration risk, feedback effects | Scenario analysis, conservative overlays, monitoring
Operational risk | Loss event classification, incident detection | Incomplete labels, low-frequency high-severity events | Expert review, scenario analysis, qualitative controls
Fraud detection | Transaction scoring, anomaly detection, network analysis | Class imbalance, adversarial behavior, customer friction | Threshold tuning, feedback loops, alert quality metrics
AML/KYC | Alert prioritization, entity resolution, transaction monitoring | False positives, explainability, regulatory sensitivity | Human review, audit trail, typology testing
Trading and execution | Signal generation, execution optimization | Overfitting, market impact, feedback loops | Out-of-sample testing, kill switches, limit controls
Customer service | Chatbots, complaint routing, document summaries | Hallucination, unfair treatment, privacy leakage | Retrieval grounding, escalation, conversation logging
Risk reporting | Narrative generation, data extraction, dashboards | Misstatement, stale data, weak lineage | Source linking, reconciliation, approval workflow

Model Risk Management Reference

Three-Lines View

Function | Typical role in AI/model risk | What to remember for exam scenarios
First line | Owns business use, model development, operation, controls, and day-to-day performance | Cannot outsource accountability to validators or vendors
Second line | Sets policy, challenges risk assessment, performs or oversees independent validation, monitors risk appetite | Independence and effective challenge matter
Third line | Provides internal audit assurance over governance and controls | Reviews whether the framework works, not just one model’s performance

Model Governance Artifacts

Artifact | Purpose | Common deficiency
Model inventory | Complete list of models, AI tools, owners, status, and materiality | Missing vendor models, spreadsheets, GenAI tools, or embedded analytics
Model tiering | Prioritizes governance by risk, materiality, complexity, and impact | Classifying by complexity only and ignoring business impact
Development documentation | Explains objective, data, assumptions, methods, limitations, and testing | Documentation written after the fact and not tied to design decisions
Validation report | Independent assessment of conceptual soundness, implementation, outcomes, and limitations | Merely reproducing developer metrics without challenge
Approval record | Evidence that authorized governance body accepted use and limitations | Approval without conditions, owners, or monitoring thresholds
Monitoring plan | Defines metrics, thresholds, frequency, escalation, and remediation | Metrics tracked but no action when breached
Change log | Records retraining, feature changes, code changes, data changes, and vendor updates | Treating “minor” data or API changes as non-model changes
Issue log | Tracks limitations, findings, remediation owners, and due dates | Findings closed without evidence
User procedures | Explains how outputs should and should not be used | Users treat scores as final decisions despite intended advisory use

Independent Validation: What to Test

Validation area | Key question | Example techniques
Conceptual soundness | Is the model design appropriate for the objective? | Method review, assumptions review, benchmark comparison
Data quality | Is the data accurate, complete, representative, and permitted? | Lineage review, missing-value analysis, outlier review, label audit
Feature appropriateness | Are inputs available, stable, explainable, and acceptable? | Leakage checks, proxy analysis, correlation review
Implementation | Does production match approved design? | Code review, replication, unit testing, environment checks
Performance | Does the model predict well on unseen data? | Holdout testing, cross-validation, backtesting
Stability | Does performance hold across time, segments, and conditions? | Temporal validation, population stability, stress periods
Fairness | Are outcomes unjustifiably different across groups? | Disparity metrics, proxy review, segment-level error analysis
Explainability | Can stakeholders understand drivers and limitations? | SHAP/LIME, reason codes, sensitivity analysis
Robustness | Can the model handle noise, adversarial inputs, and edge cases? | Perturbation testing, stress tests, scenario tests
Use and controls | Is the model used as intended with oversight? | Workflow review, override analysis, access control review
Ongoing monitoring | Are deterioration and misuse detected? | Thresholds, alerts, drift metrics, periodic review
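
The temporal validation idea in the table above can be sketched as a time-ordered split: train only on observations dated before a cutoff and test on later ones, so no future information reaches training. This is a minimal illustration; the record layout, the `as_of` field name, and the cutoff date are assumptions made up for the example.

```python
from datetime import date

def time_ordered_split(records, cutoff):
    """Split records so training data strictly precedes test data in time."""
    train = [r for r in records if r["as_of"] < cutoff]
    test = [r for r in records if r["as_of"] >= cutoff]
    return train, test

# Hypothetical loan observations (dates and labels are invented).
records = [
    {"as_of": date(2021, 6, 30), "defaulted": 0},
    {"as_of": date(2021, 12, 31), "defaulted": 0},
    {"as_of": date(2022, 6, 30), "defaulted": 1},
    {"as_of": date(2022, 12, 31), "defaulted": 0},
    {"as_of": date(2023, 6, 30), "defaulted": 1},
    {"as_of": date(2023, 12, 31), "defaulted": 0},
]

train, test = time_ordered_split(records, cutoff=date(2023, 1, 1))

# Sanity check: the latest training date is earlier than the earliest test date.
latest_train = max(r["as_of"] for r in train)
earliest_test = min(r["as_of"] for r in test)
print(len(train), len(test), latest_train < earliest_test)
```

A random shuffle of the same records would mix 2023 observations into training, which is the look-ahead pattern that time-consistent splits are meant to prevent.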

Validation and Testing Traps

Trap | Why it matters | Better exam answer
Data leakage | Model uses information unavailable at decision time | Rebuild features using only information known at prediction time
Look-ahead bias | Future data contaminates training or testing | Use time-consistent splits for time-dependent data
Target leakage | Predictor directly encodes the label or outcome | Remove or redefine leaky features
Overfitting | Model learns noise rather than general patterns | Use holdout data, cross-validation, regularization, pruning
Underfitting | Model too simple to capture meaningful structure | Add relevant features or more appropriate model class
Class imbalance | High accuracy can hide poor detection of rare events | Use precision, recall, F1, ROC/PR curves, cost-sensitive thresholds
Survivorship bias | Failed or exited entities are omitted | Include inactive, defaulted, closed, or failed observations where relevant
Sample-selection bias | Training population differs from deployment population | Test representativeness and segment performance
Proxy discrimination | Innocent-looking variables replicate protected traits | Conduct proxy analysis and fairness testing
Concept drift | Relationship between inputs and target changes | Monitor performance and retrain or recalibrate
Data drift | Input distribution changes | Monitor feature distributions and population stability
Automation bias | Users overtrust model output | Require training, explanations, overrides, and escalation
Feedback loop | Model decisions affect future data labels | Separate observed outcomes from model-influenced outcomes where possible
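
For the data drift row above, one common way to monitor population stability is the population stability index (PSI), which compares the share of observations per bin between a reference sample and a recent sample. The sketch below is an illustration, not a prescribed method: the bin counts are invented, and the small `floor` used to avoid taking the log of zero is an assumption of this example.

```python
import math

def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """PSI = sum over bins of (actual share - expected share) * ln(actual / expected)."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the shares so empty bins do not produce log(0) or division by zero.
        e_share = max(e / e_total, floor)
        a_share = max(a / a_total, floor)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi

# Hypothetical score-band counts at development time and in recent production.
development = [25, 25, 25, 25]
production_same = [50, 50, 50, 50]   # same shape, different volume
production_shifted = [10, 20, 30, 40]

psi_same = population_stability_index(development, production_same)
psi_shifted = population_stability_index(development, production_shifted)
print(round(psi_same, 4), round(psi_shifted, 4))
```

PSI is zero when the two distributions have identical shares and grows as they diverge. Any alert threshold applied to it is a policy choice that belongs in the monitoring plan.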

Core Metrics and Formulas

Confusion Matrix Terms

Term | Meaning
True positive | Model predicts positive and actual outcome is positive
False positive | Model predicts positive but actual outcome is negative
True negative | Model predicts negative and actual outcome is negative
False negative | Model predicts negative but actual outcome is positive

For a fraud model, “positive” often means flagged as fraud. For a credit default model, “positive” may mean predicted default. Always identify the positive class before interpreting precision, recall, or false-positive rates.

Classification Metrics

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall or Sensitivity} = \frac{TP}{TP + FN} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \]
\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

Metric | Best used for | Trap
Accuracy | Balanced classes and similar error costs | Misleading when positives are rare
Precision | Costly false positives, such as unnecessary investigations or customer friction | High precision may miss many true positives
Recall / sensitivity | Costly false negatives, such as missed fraud or missed defaults | High recall may create excessive false positives
Specificity | Ability to correctly reject negatives | Can be high even when positive detection is weak
F1 score | Balance between precision and recall | Does not include true negatives
ROC AUC | Ranking quality across thresholds | Can look strong under heavy class imbalance
Precision-recall curve | Rare-event detection | More informative than ROC in many imbalanced settings
Calibration | Whether predicted probabilities match realized frequencies | A high-ranking model may still be poorly calibrated
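
The formulas above can be checked with a short calculation. The sketch below computes each metric from confusion-matrix counts. The fraud counts are hypothetical and chosen only to show how high accuracy can coexist with weak detection when positives are rare.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics defined above from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical fraud model: 10,000 transactions, of which 100 are actual fraud.
metrics = classification_metrics(tp=40, fp=60, tn=9840, fn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.988 while precision, recall, and F1 are all 0.400, which is the class-imbalance trap listed in the table.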

Regression and Forecasting Metrics

\[ MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \]
\[ MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
\[ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]

Metric | Interpretation | Trap
MAE | Average absolute error; easy to interpret in original units | Treats all errors linearly
MSE | Average squared error | Harder to interpret; heavily penalizes large errors
RMSE | Error measure in original units, sensitive to large misses | Can be dominated by outliers
MAPE | Percentage error measure | Problematic when actual values are near zero
R-squared | Share of variance explained in sample | High value does not prove causality or out-of-sample performance
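
A small numeric comparison helps show why RMSE is more sensitive to large misses than MAE. The sketch below uses two invented forecast sets with the same total absolute error: one spreads the error evenly, the other concentrates it in a single observation.

```python
import math

def regression_errors(actual, predicted):
    """Return MAE, MSE, and RMSE as defined in the formulas above."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}

actual = [100.0, 100.0, 100.0, 100.0]
small_misses = [98.0, 102.0, 98.0, 102.0]   # every forecast is off by 2
one_big_miss = [100.0, 100.0, 100.0, 92.0]  # one forecast is off by 8

even = regression_errors(actual, small_misses)
concentrated = regression_errors(actual, one_big_miss)
print(even)
print(concentrated)
```

Both forecast sets have an MAE of 2, but RMSE rises from 2 to 4 when the error is concentrated in one large miss.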

Credit and Risk Formulas

\[ \text{Expected Loss} = PD \times LGD \times EAD \]
\[ \text{Unexpected Loss} = \text{Potential loss above expected loss, often linked to volatility or tail outcomes} \]
\[ VaR_{\alpha} = \text{Loss threshold not exceeded with confidence level } \alpha \text{ over a specified horizon} \]
\[ ES_{\alpha} = \text{Average loss conditional on losses exceeding } VaR_{\alpha} \]

Formula concept | Exam focus | Common trap
PD | Probability of default | Confusing default probability with loss amount
LGD | Loss severity if default occurs | Ignoring collateral, seniority, and recovery uncertainty
EAD | Exposure at default | Treating current balance as always equal to future exposure
Expected loss | Average credit loss estimate | Not the same as economic capital or unexpected loss
VaR | Quantile-based loss measure | Says little about severity beyond the threshold
Expected shortfall | Tail-loss average beyond VaR | More informative about tail severity, but model-dependent
Stress testing | Scenario-based adverse impact assessment | Not a probability forecast unless explicitly modeled as one
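
As a worked illustration of the definitions above, the sketch below computes expected loss for one hypothetical exposure and a simple historical-simulation VaR and expected shortfall from a list of losses. The inputs are invented. The empirical quantile convention (the smallest observed loss with at least a fraction alpha of observations at or below it) is an assumption of this example, since practitioners use several conventions.

```python
import math

def expected_loss(pd, lgd, ead):
    """Expected Loss = PD x LGD x EAD."""
    return pd * lgd * ead

def historical_var_es(losses, alpha):
    """Empirical VaR and ES from observed losses (positive numbers are losses)."""
    ordered = sorted(losses)
    n = len(ordered)
    # Position of the empirical alpha-quantile; round() guards against float noise.
    k = max(math.ceil(round(alpha * n, 9)) - 1, 0)
    var = ordered[k]
    # ES: average of the losses that exceed VaR (falls back to VaR if none do).
    tail = [x for x in ordered if x > var]
    es = sum(tail) / len(tail) if tail else var
    return var, es

el = expected_loss(pd=0.02, lgd=0.45, ead=1_000_000)

losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]  # hypothetical loss observations
var_80, es_80 = historical_var_es(losses, alpha=0.80)
print(round(el, 2), var_80, es_80)
```

Here the 80% VaR is 8 while the expected shortfall is 14.5, because the single loss of 20 sits in the tail that VaR does not describe.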

Thresholds, Cutoffs, and Business Decisions

Decision point | What changes when threshold moves up? | What changes when threshold moves down?
Fraud alert threshold | Fewer alerts, higher precision, more missed fraud risk | More alerts, higher recall, more operational burden
Credit approval cutoff | Fewer approvals, lower default risk, potential lost revenue | More approvals, higher default risk, possible fairness impact
AML alert priority | Fewer escalations, risk of missed suspicious activity | More escalations, analyst overload
Model override threshold | Fewer manual reviews, faster processing | More review effort, potentially better control of edge cases
GenAI confidence/escalation rule | More automated responses, higher hallucination exposure | More human review, slower response

Exam-style threshold questions usually require balancing error costs, risk appetite, customer impact, regulatory sensitivity, and operational capacity.
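
The fraud alert row of the table can be reproduced with a small threshold sweep. The scores and labels below are invented, and the rule "flag when score is at or above the threshold" is an assumption of this sketch.

```python
def alert_stats(scores, labels, threshold):
    """Count alerts and compute precision and recall at a given score threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    alerts = tp + fp
    precision = tp / alerts if alerts else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"alerts": alerts, "precision": precision, "recall": recall}

# Hypothetical model scores and true fraud labels (1 = fraud).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

low = alert_stats(scores, labels, threshold=0.25)
high = alert_stats(scores, labels, threshold=0.85)
print("low threshold: ", low)
print("high threshold:", high)
```

Raising the threshold from 0.25 to 0.85 cuts alerts from 7 to 2 and lifts precision from about 0.57 to 1.0, while recall falls from 1.0 to 0.5. Which point is preferable depends on the error costs and capacity constraints named in the sentence above.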

Explainability and Interpretability

Concept or method | What it does | Best use | Limitation
Intrinsic interpretability | Model is understandable by design, such as linear models or shallow trees | High-stakes decisions requiring clear rationale | May sacrifice predictive performance
Post-hoc explanation | Explains a trained model after the fact | Complex models where transparency is limited | May approximate rather than reveal true logic
Global explanation | Describes model behavior overall | Policy, governance, validation | May hide segment-level behavior
Local explanation | Explains one prediction | Customer-level decision review, investigation | Can be unstable near decision boundaries
Feature importance | Ranks influential inputs | Model review and governance | Does not show direction, causality, or fairness
Partial dependence plot | Shows average effect of a feature | Understanding nonlinear relationships | Can mislead when features are correlated
ICE plot | Shows feature effect for individual observations | Detecting heterogeneity | Can be noisy and hard to summarize
SHAP | Allocates prediction contribution across features | Local and global explanations | Computational complexity; assumptions matter
LIME | Local surrogate explanation around one prediction | Quick local explanation | Sensitive to sampling and neighborhood definition
Counterfactual explanation | Shows minimal change needed to alter outcome | Actionable adverse-action-style reasoning | Must be realistic and permissible
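
The counterfactual explanation row can be illustrated with an intrinsically interpretable model. The sketch below uses a hypothetical two-feature logistic scorecard. The coefficients, intercept, feature names, and cutoff are all made up for the example. It solves for the value of one feature that would bring the predicted default probability to the cutoff while holding the other feature fixed.

```python
import math

# Hypothetical logistic scorecard (all numbers are invented for illustration).
COEFFICIENTS = {"debt_to_income": 4.0, "missed_payments": 0.8}
INTERCEPT = -3.0

def default_probability(applicant):
    """Logistic model: PD = 1 / (1 + exp(-z)), z = intercept + sum(coef * feature)."""
    z = INTERCEPT + sum(COEFFICIENTS[k] * applicant[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))

def counterfactual_value(applicant, feature, cutoff):
    """Value of `feature` that puts PD exactly at `cutoff`, other inputs unchanged."""
    target_z = math.log(cutoff / (1.0 - cutoff))
    other = INTERCEPT + sum(
        COEFFICIENTS[k] * applicant[k] for k in COEFFICIENTS if k != feature
    )
    return (target_z - other) / COEFFICIENTS[feature]

applicant = {"debt_to_income": 0.5, "missed_payments": 2}
pd_now = default_probability(applicant)
needed_dti = counterfactual_value(applicant, "debt_to_income", cutoff=0.5)
print(round(pd_now, 3), round(needed_dti, 3))
```

In this toy case the applicant's predicted PD is about 0.646, and lowering debt-to-income from 0.50 to 0.35 would bring it to the 0.5 cutoff. Whether such a change is realistic and permissible is the limitation noted in the table.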

Explanation Quality Checklist

A useful explanation should be:

  • Faithful: reflects actual model behavior, not a convenient story.
  • Stable: similar inputs produce similar explanations unless a real boundary is crossed.
  • Actionable: helps users understand what can be changed or reviewed.
  • Audience-appropriate: different detail for developers, validators, executives, customers, and auditors.
  • Documented: stored with assumptions, limitations, and intended use.
  • Tested: checked for consistency across segments and edge cases.

Fairness, Bias, and Responsible AI

Bias source | Example in finance | Control
Historical bias | Past lending decisions reflect unequal access to credit | Review labels, outcomes, and policy history
Representation bias | Training data underrepresents certain groups or regions | Sampling review and segment testing
Measurement bias | Income, employment, or address data have unequal quality | Data quality checks by segment
Label bias | “Default” or “fraud” labels reflect prior detection practices | Label audit and alternative outcome definitions
Proxy bias | ZIP code, device type, or merchant behavior correlates with protected traits | Proxy analysis and feature governance
Aggregation bias | One model performs poorly for a subgroup | Segment-level validation and monitoring
Deployment bias | Users apply model outside intended scope | Training, access control, use restrictions
Feedback bias | Model-driven decisions affect future observed outcomes | Monitor feedback loops and use independent samples where possible

Fairness Metrics: What They Mean

Metric | Plain-language question | Trap
Demographic parity | Are positive decision rates similar across groups? | May ignore legitimate risk differences
Equal opportunity | Are true positive rates similar across groups? | Focuses on positives, not false positives
Equalized odds | Are both true positive and false positive rates similar? | Often conflicts with calibration
Predictive parity | Is precision similar across groups? | May conflict with equalized odds when base rates differ
Calibration by group | Do predicted probabilities match outcomes within each group? | Calibrated models can still produce different approval rates
Disparate impact analysis | Do outcomes disproportionately affect a group? | Legal meaning depends on jurisdiction and context
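
The first three metrics in the table reduce to per-group rates that are easy to compute side by side. The sketch below is illustrative only. The group labels, decisions, and outcomes are invented, and `decision = 1` is assumed to mean a positive model decision while `outcome = 1` means an actual positive.

```python
from collections import defaultdict

def group_rates(decisions, outcomes, groups):
    """Per-group selection rate, true positive rate, and false positive rate."""
    counts = defaultdict(
        lambda: {"n": 0, "selected": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0}
    )
    for d, y, g in zip(decisions, outcomes, groups):
        c = counts[g]
        c["n"] += 1
        c["selected"] += d
        if y == 1:
            c["pos"] += 1
            c["tp"] += d
        else:
            c["neg"] += 1
            c["fp"] += d
    return {
        g: {
            "selection_rate": c["selected"] / c["n"],        # demographic parity
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,  # equal opportunity
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,  # with TPR: equalized odds
        }
        for g, c in counts.items()
    }

# Hypothetical decisions and outcomes for two groups, A and B.
decisions = [1, 1, 0, 0, 1, 0, 0, 0]
outcomes = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(decisions, outcomes, groups)
print(rates)
```

In this toy data both groups have a true positive rate of 0.5, so equal opportunity holds, yet selection rates (0.50 vs 0.25) and false positive rates (0.5 vs 0.0) differ. A model can therefore satisfy one fairness metric while failing others.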

Responsible AI Principles in Exam Scenarios

Principle | Practical implication
Accountability | Named owners remain responsible for AI outcomes
Transparency | Stakeholders understand purpose, limitations, and decision role
Fairness | Outcomes and errors are assessed across relevant groups
Robustness | Model withstands noise, drift, stress, and adversarial behavior
Privacy | Data use is limited, protected, and appropriate
Security | Model, data, prompts, APIs, and outputs are protected
Human oversight | People can intervene meaningfully where risk warrants
Auditability | Evidence exists for design, validation, approval, and use

Data Risk Reference

Data issue | Why it matters | Detection or mitigation
Missing data | Can bias estimates if not random | Missingness analysis, imputation policy, segment checks
Outliers | Can distort training and metrics | Winsorization, robust methods, investigation
Duplicates | Can overweight observations | Deduplication and entity resolution
Incorrect labels | Directly corrupt supervised learning | Label audit, reconciliation, expert review
Non-representative sample | Model may fail in production | Population comparison and monitoring
Stale data | Relationships may no longer hold | Recency controls and drift monitoring
Poor lineage | Weak auditability and reproducibility | Data catalog, lineage documentation
Unauthorized data | Legal, ethical, and contractual risk | Data permissions review and access controls
Sensitive data | Privacy, fairness, and conduct risk | Minimization, masking, encryption, governance
Alternative data | Potential predictive lift with higher uncertainty | Source diligence, explainability, stability testing

Generative AI and LLM Risk Reference

GenAI concept | Meaning | Exam-relevant control
Prompt | User or system instruction given to a model | Prompt standards, testing, access controls
System prompt | Higher-priority instruction defining behavior and constraints | Protect from disclosure or override
Hallucination | Plausible but false or unsupported output | Grounding, source citation, human review
Retrieval-augmented generation | Model uses retrieved documents or data to support output | Curated knowledge base, source validation
Fine-tuning | Additional training on specific data or tasks | Data governance, evaluation, version control
Embedding | Numeric representation of text or objects for similarity search | Privacy review and vector database controls
Vector database | Stores embeddings for retrieval | Access control, deletion process, data lineage
Prompt injection | Malicious instruction attempts to override controls | Input filtering, isolation, output validation
Data exfiltration | Sensitive data leaked through prompts or outputs | DLP controls, redaction, logging
Model inversion | Attempt to infer training data from model behavior | Privacy-preserving controls and access limits
Jailbreak | User circumvents model safety restrictions | Adversarial testing and guardrails
Guardrail | Technical or process control around model behavior | Policy filters, allowlists, escalation rules
Temperature | Parameter affecting randomness of generated output | Lower for deterministic tasks; higher increases variability
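
The temperature row can be made concrete with the usual textbook formulation, in which next-token scores (logits) are divided by the temperature before a softmax turns them into probabilities. The sketch below assumes that formulation and uses invented logits. How a specific vendor model applies temperature may differ.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1 / temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens

sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

At the lower temperature most of the probability mass sits on the top-scoring token, so sampled output is more repeatable. At the higher temperature the distribution flattens and sampled output varies more.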

GenAI Use-Case Control Matrix

Use case | Lower-risk pattern | Higher-risk pattern
Internal summarization | Human-reviewed summaries from approved documents | Unverified summary used for official reporting
Customer chatbot | Narrow scope, retrieval grounding, escalation to human | Open-ended advice with no audit trail
Code assistant | Developer review, testing, secure repository controls | Direct production deployment of generated code
Research support | Source-linked drafts reviewed by analyst | Trading or credit decision based on unsourced output
Synthetic data | Tested for utility and privacy leakage | Used as if it were real observed data
Policy interpretation | References approved policy library | Creates new policy language without approval

AI Security, Privacy, and Operational Resilience

Risk | Example | Control
Adversarial input | Slightly altered transaction evades fraud model | Robustness testing, adversarial training, monitoring
Model theft | Unauthorized extraction of model behavior or parameters | Rate limits, access controls, API monitoring
Data poisoning | Malicious or corrupted training data changes behavior | Data validation, source controls, anomaly detection
Prompt injection | External text instructs LLM to ignore controls | Content sanitization, tool isolation, output checks
Sensitive data leakage | Confidential client data appears in prompt or output | Redaction, DLP, encryption, retention controls
Vendor outage | AI service becomes unavailable | Fallback process, resilience planning
Concentration risk | Many processes depend on one AI provider or model | Vendor diversification or contingency planning
Unauthorized model change | Vendor or developer changes model without review | Versioning, change notification, revalidation triggers
Weak logging | Cannot reconstruct AI-assisted decision | Audit logs, prompt/output retention policy
Over-permissioned tools | AI agent accesses systems beyond need | Least privilege and tool-use constraints

Vendor and Third-Party AI Due Diligence

Diligence area | Questions to ask | Evidence to seek
Model purpose | What is the tool intended and not intended to do? | Product documentation, use-case restrictions
Data | What data trained or powers the model? Can client data be used for training? | Data handling terms, privacy controls
Performance | How was performance measured? On what population? | Validation reports, benchmarks, test methodology
Explainability | Can outputs be explained at the level users need? | Feature drivers, reason codes, interpretability tools
Bias and fairness | Has the vendor tested relevant disparities? | Fairness testing documentation
Security | How are prompts, data, APIs, and outputs protected? | Security controls, certifications, incident process
Change management | How are model updates communicated and controlled? | Version notes, release governance
Auditability | Can the institution log and reconstruct decisions? | Logging capabilities, export options
Resilience | What happens during outage or degraded service? | SLAs, contingency procedures
Subcontractors | Are other providers involved? | Third-party dependency list
Exit | Can the institution transition away? | Data export, deletion, termination procedures

Regulatory and Standards Themes to Recognize

Do not treat this section as jurisdiction-specific legal advice. For exam purposes, focus on recurring supervisory and governance themes: accountability, transparency, fairness, privacy, security, resilience, documentation, validation, and human oversight.

Source or framework type | High-yield theme
Model risk management guidance | Models require inventory, governance, validation, documentation, monitoring, and effective challenge
Banking and financial supervision | AI use should align with safety and soundness, consumer protection, operational resilience, and risk governance
Data protection frameworks | Personal data use should be lawful, limited, protected, and transparent where applicable
AI risk management frameworks | Identify, measure, manage, monitor, and govern AI risks across the lifecycle
Conduct and consumer protection expectations | Avoid unfair, deceptive, discriminatory, or unsuitable outcomes
Cybersecurity standards | Protect AI systems, data, APIs, identities, logs, and third-party connections
Operational resilience expectations | Critical AI-supported services need contingency, incident response, and recovery planning
Emerging AI laws and policies | Higher-impact AI use cases typically receive greater governance scrutiny

High-Yield Distinctions

Distinction | Know the difference
Accuracy vs calibration | Accuracy measures correct classifications; calibration measures whether probabilities match realized frequencies
Correlation vs causation | Predictive association does not prove one variable causes another
Explainability vs fairness | A model can be explainable but unfair, or fairer by some metric but hard to explain
Model validation vs model monitoring | Validation is pre-use or periodic independent challenge; monitoring is ongoing production surveillance
Development testing vs independent validation | Developers optimize and test; independent validators challenge assumptions and use
Data drift vs concept drift | Data drift means inputs change; concept drift means relationships between inputs and outcomes change
Bias vs variance | Bias is systematic error from oversimplification; variance is instability from sensitivity to training data
White-box vs black-box | White-box models are easier to inspect; black-box models may need post-hoc explanation and stronger controls
Human-in-the-loop vs human-on-the-loop | In-the-loop requires human action before decision; on-the-loop means human oversight of automated activity
Automation vs augmentation | Automation replaces a task; augmentation supports a human decision
Predictive model vs decision policy | A model estimates risk; a policy converts estimates into actions
Model output vs business outcome | Good statistical performance may still create poor customer, financial, or operational outcomes
Validation exception vs limitation | Exception is a finding requiring remediation; limitation is a known boundary that must be accepted and controlled
Vendor validation vs user validation | Vendor evidence helps, but the financial institution still needs fit-for-purpose assessment
GenAI fluency vs reliability | Well-written output is not evidence of truth, completeness, or suitability
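
The accuracy vs calibration distinction can be made concrete with a small sketch. The outcomes and scores below are invented for illustration, and the helper names (accuracy, calibration_table) are hypothetical rather than taken from any library. The point is only that two models can classify identically while one of them states probabilities that do not match realized frequencies.

```python
def accuracy(y_true, p_pred, threshold=0.5):
    """Share of cases where the thresholded prediction matches the outcome."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


def calibration_table(y_true, p_pred, n_bins=5):
    """Per non-empty probability bin: (mean predicted probability, observed event rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_p = sum(p for _, p in members) / n
            event_rate = sum(y for y, _ in members) / n
            rows.append((mean_p, event_rate, n))
    return rows


# Illustrative data (assumed): 10 cases, 2 realized events.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
model_a = [0.20] * 10  # states a 20% event probability for every case
model_b = [0.01] * 10  # states a 1% event probability for every case

for name, scores in (("model_a", model_a), ("model_b", model_b)):
    print(name, "accuracy:", accuracy(y_true, scores))
    for mean_p, event_rate, n in calibration_table(y_true, scores):
        print(f"  predicted {mean_p:.2f} vs observed {event_rate:.2f} (n={n})")
```

Both models classify every case as a non-event at the 0.5 cutoff, so both score 0.8 accuracy. Only model_a's stated probability lines up with the observed 20% event rate, which matters wherever the probability itself feeds pricing, provisioning, or capital.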

Scenario Decision Tables

If the Scenario Says…

Scenario clue | Likely concept tested | Best response direction
Model performs well in training but poorly in production | Overfitting, leakage, drift, or deployment mismatch | Check out-of-sample testing, leakage, monitoring, implementation
Rare event model has 99% accuracy | Class imbalance | Examine recall, precision, confusion matrix, PR curve
Model uses postal code and produces group disparities | Proxy bias | Conduct fairness/proxy analysis and review feature permissibility
Business adopts vendor AI chatbot quickly | Third-party and GenAI risk | Require due diligence, use-case approval, guardrails, logging
LLM gives confident but false answer | Hallucination | Use retrieval grounding, source checks, human review
Production model changes after vendor update | Change management | Trigger impact assessment and possible revalidation
Model decision cannot be explained to affected user | Explainability and governance | Add reason codes, interpretable model, or human review
Performance degrades after economic shift | Concept drift or regime change | Reassess assumptions, recalibrate, retrain, or apply overlays
Analysts ignore model warnings | Use risk and human factors | Review workflow, training, incentives, escalation
Analysts blindly accept outputs | Automation bias | Strengthen human oversight and challenge
Many false fraud alerts overwhelm staff | Threshold and operational capacity | Tune cutoff, prioritize alerts, measure precision
New data source improves accuracy but includes sensitive attributes | Data ethics and compliance risk | Review permissions, fairness, privacy, and necessity
Model works overall but fails for a segment | Aggregation bias | Perform segment validation and targeted remediation
AI system controls a critical process with no fallback | Operational resilience | Add contingency, manual fallback, incident plan
Developers cannot reproduce results | Weak documentation or environment control | Require versioning, lineage, code/data reproducibility
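
The "rare event model has 99% accuracy" clue can be worked through numerically. The sketch below uses an invented portfolio of 1,000 cases with 10 frauds; the function names and both candidate models are assumptions made for illustration only. It compares a model that never raises an alert with one that catches most frauds at the cost of some false alerts.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_summary(y_true, y_pred):
    """Accuracy, precision, and recall; a ratio with a zero denominator is reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative portfolio (assumed): 10 frauds among 1,000 cases.
y_true = [1] * 10 + [0] * 990

# Hypothetical model 1: never flags anything.
never_alerts = [0] * 1000

# Hypothetical model 2: catches 8 of 10 frauds and raises 12 false alerts.
alerting = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

print("never_alerts:", classification_summary(y_true, never_alerts))
print("alerting:    ", classification_summary(y_true, alerting))
```

The never-alerting model reaches 0.99 accuracy with zero recall, while the alerting model has slightly lower accuracy (0.986) but 0.8 recall and 0.4 precision. Precision is also the figure to watch when false alerts overwhelm staff, since it measures how many raised alerts are genuine.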

Control Selection Matrix

Problem | Primary control | Supporting controls
Overfitting | Out-of-sample validation | Regularization, pruning, simpler benchmark
Data leakage | Feature review at decision-time boundary | Time-based splits, independent replication
Biased outcomes | Fairness testing | Proxy review, policy review, segment monitoring
Black-box model | Explainability tools | Simpler challenger, documentation, human review
GenAI hallucination | Grounding and verification | Source citation, confidence rules, escalation
Unauthorized data use | Data governance | Access controls, lineage, privacy review
Vendor opacity | Third-party due diligence | Contractual reporting, independent testing
Drift | Ongoing monitoring | Retraining triggers, recalibration, alerts
Poor user adoption | Training and workflow design | Feedback collection, override tracking
Excessive reliance | Human oversight | Decision limits, second review, audit trail
Cyber attack on model | Security controls | Adversarial testing, monitoring, incident response
Production mismatch | Implementation validation | Code review, deployment controls, reconciliations
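
As one possible illustration of the drift row, the sketch below computes a population stability index (PSI) between a baseline and a production distribution of a binned input, then maps it to a monitoring status. PSI is swapped in here as a common industry choice, not a method this reference prescribes. The bin shares are invented, and the 0.10 and 0.25 cutoffs are a widely quoted rule of thumb treated here as assumptions; an institution would set its own triggers.

```python
import math


def psi(expected_share, actual_share, floor=1e-6):
    """Population stability index over matching bins.

    Each argument is a list of bin shares summing to about 1.
    Shares are floored to avoid taking the log of zero.
    """
    total = 0.0
    for e, a in zip(expected_share, actual_share):
        e = max(e, floor)
        a = max(a, floor)
        total += (a - e) * math.log(a / e)
    return total


def drift_status(value, watch=0.10, act=0.25):
    """Map a PSI value to a status; thresholds are illustrative assumptions."""
    if value < watch:
        return "stable"
    if value < act:
        return "watch"
    return "act"


# Illustrative bin shares (assumed) for one input feature.
baseline = [0.25, 0.25, 0.25, 0.25]  # development sample
stable = [0.24, 0.26, 0.25, 0.25]    # production month with little change
shifted = [0.10, 0.20, 0.30, 0.40]   # production month after a population shift

for name, current in (("stable", stable), ("shifted", shifted)):
    value = psi(baseline, current)
    print(f"{name}: PSI={value:.4f} status={drift_status(value)}")
```

A measure like this detects data drift only, because it compares input distributions. Concept drift changes the relationship between inputs and outcomes, so it needs outcome-based monitoring such as tracking calibration or error rates once realized outcomes arrive.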

Compact Exam Checklist

Before answering an RAI scenario question, identify:

  1. Use case: credit, market, operational, compliance, customer, trading, reporting, or GenAI support.
  2. Decision role: advisory, automated, customer-facing, internal, capital-related, or critical process.
  3. Data risk: quality, lineage, permission, representativeness, leakage, proxies, privacy.
  4. Model risk: complexity, overfitting, drift, calibration, explainability, robustness.
  5. Human impact: customer harm, fairness, conduct, access to services, appeal or override.
  6. Governance need: inventory, tiering, validation, approval, monitoring, change control.
  7. Metric fit: classification, regression, ranking, calibration, fairness, tail risk.
  8. Control fit: technical control, process control, governance control, or human oversight.
  9. Residual risk: what remains after controls and whether it is acceptable.
  10. Evidence: documentation, logs, test results, approvals, issue tracking.

Final Review Prompts

Use these prompts to test readiness:

  • Can you explain why high accuracy may be weak evidence for a fraud model?
  • Can you choose between precision, recall, F1, ROC AUC, and calibration for a scenario?
  • Can you identify leakage, drift, overfitting, proxy bias, and automation bias from short fact patterns?
  • Can you distinguish model validation from monitoring and governance approval?
  • Can you select appropriate GenAI controls for hallucination, prompt injection, and data leakage?
  • Can you describe why vendor AI still requires internal accountability and fit-for-purpose review?
  • Can you connect AI controls to financial risk outcomes, customer impact, and operational resilience?

Practical Next Step

Turn each table into short practice drills: read a scenario, name the AI risk, choose the right metric or control, and explain why two tempting alternatives are weaker. Then complete a fresh set of RAI-style practice questions under timed conditions to build speed and applied judgment.