Exam Identity and How to Use This Page
This independent Quick Reference supports candidates preparing for GARP's Risk and AI Certificate (exam code RAI). It focuses on high-yield distinctions, applied risk-management decisions, and compact review tables rather than broad textbook coverage.
Use it to quickly answer exam-style questions such as:
- What AI or machine learning method fits a financial risk use case?
- What validation test or metric should be used?
- What model risk control addresses a specific failure mode?
- How do explainability, fairness, governance, and monitoring differ?
- What additional risks arise from generative AI and large language models?
Core AI, ML, and Risk Vocabulary
| Term | Exam-ready meaning | Common trap |
|---|---|---|
| Artificial intelligence | Broad field of systems performing tasks associated with human intelligence, such as prediction, classification, language generation, or decision support | Treating all AI as machine learning; rule-based systems can also be AI |
| Machine learning | Models learn patterns from data rather than relying only on explicit rules | Assuming ML removes the need for human judgment or controls |
| Supervised learning | Trains on labeled examples where the target outcome is known | Using it where no reliable target label exists |
| Unsupervised learning | Finds structure without labeled outcomes, such as clusters or anomalies | Treating clusters as causal or inherently meaningful |
| Reinforcement learning | Learns actions through rewards and penalties over time | Overlooking that live experimentation can be risky in finance when it affects customers, markets, or capital |
| Generative AI | Produces new text, code, images, data, or other content based on learned patterns | Confusing fluent output with verified truth |
| Large language model | GenAI model trained to predict and generate language-like sequences | Assuming it looks answers up in a database of verified facts or that its reasoning is reliable |
| Model | A quantitative, statistical, AI, or rule-based method that transforms inputs into estimates, rankings, decisions, or recommendations | Excluding vendor tools, spreadsheets, or embedded AI from the model inventory |
| Model risk | Risk of adverse consequences from model errors, misuse, or inappropriate reliance | Thinking model risk only means coding mistakes |
| Explainability | Ability to understand why a model behaves as it does | Assuming explanation equals proof of correctness |
| Interpretability | How directly understandable the model structure or logic is | Treating post-hoc explanations as the same as transparent design |
| Bias | Systematic error or unfair disparity in data, model design, outcomes, or deployment | Looking only at training data and ignoring downstream decision processes |
| Drift | Change over time in data, relationships, behavior, or performance | Monitoring accuracy only, while input distributions change silently |
| Human-in-the-loop | Human review, approval, override, or escalation within an AI process | Assuming human review is effective without training, authority, and audit trail |
| Third-party AI risk | Risk from vendor models, data, platforms, APIs, or embedded AI services | Assuming outsourced AI transfers accountability to the vendor |
AI Lifecycle for Financial Risk Management
flowchart LR
A[Business objective] --> B[Use-case risk assessment]
B --> C[Data sourcing and permissions]
C --> D[Feature engineering and model design]
D --> E[Training and tuning]
E --> F[Independent validation]
F --> G[Approval and deployment]
G --> H[Monitoring and controls]
H --> I[Change management]
I --> D
H --> J[Retirement or replacement]
| Lifecycle stage | Main exam focus | Key control questions |
|---|---|---|
| Business objective | Link AI to a legitimate business or risk-management purpose | What decision will the model support? Who is affected? What risk is created if wrong? |
| Use-case assessment | Classify criticality, materiality, customer impact, and regulatory sensitivity | Is the model high-impact, automated, customer-facing, or used for capital/liquidity/risk limits? |
| Data sourcing | Data quality, lineage, rights, representativeness, privacy, and bias | Is the data fit for purpose? Are labels reliable? Are proxies creating unfair outcomes? |
| Feature engineering | Transform raw data into predictive inputs | Are features stable, explainable, permissible, and available at decision time? |
| Training and tuning | Model selection, objective function, overfitting control | Is performance measured out-of-sample? Has tuning leaked validation information? |
| Validation | Independent challenge of conceptual soundness, implementation, and outcomes | Does the model work as intended under normal and stressed conditions? |
| Approval | Governance before production use | Are limitations documented? Are owners, thresholds, and escalation paths defined? |
| Deployment | Controlled implementation into systems and workflows | Does production code match validated code? Are access, logs, and fallbacks in place? |
| Monitoring | Ongoing performance, drift, exceptions, and use | Are thresholds actionable? Who reviews breaches? |
| Change management | Updates, retraining, vendor changes, new data, new use | Does the change require revalidation or reapproval? |
| Retirement | Remove obsolete or unsafe models | Are dependencies, records, and replacement controls managed? |
Learning Types and When to Use Them
| Learning type | Typical finance use cases | Strengths | Key risks |
|---|---|---|---|
| Supervised classification | Default prediction, fraud detection, AML alert triage, churn prediction | Clear target, measurable classification performance | Class imbalance, biased labels, threshold misuse |
| Supervised regression | Loss forecasting, exposure estimation, pricing, demand forecasting | Predicts continuous values | Outliers, unstable relationships, extrapolation error |
| Unsupervised clustering | Customer segmentation, peer grouping, portfolio pattern discovery | Useful when labels are absent | Clusters may be unstable or non-actionable |
| Anomaly detection | Fraud, cyber, trading surveillance, operational incidents | Finds unusual behavior | High false positives; unusual does not always mean suspicious |
| Time-series forecasting | Market variables, liquidity flows, macroeconomic indicators | Captures temporal structure | Regime shifts, autocorrelation mistakes, look-ahead bias |
| Natural language processing | News/sentiment analysis, document review, complaint classification | Converts text into structured signals | Context loss, language bias, hallucination with GenAI |
| Reinforcement learning | Execution algorithms, dynamic strategies, resource allocation | Optimizes sequential actions | Unsafe exploration, hard-to-explain behavior, feedback loops |
| Generative AI | Summaries, drafting, code assistance, research support, synthetic data | Scalable content generation and language interface | Hallucination, confidentiality, prompt injection, copyright/IP, overreliance |
Common Model Families
| Model family | Use when | Advantages | Limitations and exam traps |
|---|---|---|---|
| Linear regression | Continuous target; relationship is approximately linear | Simple, interpretable, fast | Sensitive to outliers and multicollinearity; poor for nonlinear effects |
| Logistic regression | Binary classification such as default/no default | Interpretable coefficients; useful baseline | Linear decision boundary unless features are transformed |
| Decision tree | Nonlinear rules and interactions are important | Intuitive splits; handles mixed data | Overfits easily if unconstrained |
| Random forest | Need robust nonlinear prediction | Reduces variance versus single tree | Less interpretable; may mask bias |
| Gradient boosting | High predictive performance on tabular data | Often strong for credit/fraud/risk scoring | Sensitive to tuning; overfitting and explainability concerns |
| Neural network | Complex patterns, unstructured data, images, text, high-dimensional signals | Flexible function approximation | Data-hungry, harder to validate and explain |
| Support vector machine | Classification with complex boundaries | Effective in some high-dimensional settings | Scaling and interpretability issues |
| k-means clustering | Partition observations into similar groups | Simple and fast | Requires preselected number of clusters; sensitive to scaling/outliers |
| Principal component analysis | Dimensionality reduction, factor extraction | Reduces correlated features | Components may be hard to interpret |
| Bayesian models | Need probabilistic updating or prior information | Explicit uncertainty treatment | Prior choice and computation may be challenging |
| Large language model | Language generation, summarization, extraction, Q&A support | Natural interface and broad text capability | Output may be plausible but wrong; needs grounding and guardrails |
Financial Risk Use-Case Matrix
| Use case | AI contribution | Primary risks | Controls to emphasize |
|---|---|---|---|
| Credit underwriting | PD estimation, scorecards, alternative data analysis | Discrimination, proxy variables, adverse selection, explainability gaps | Fairness testing, reason codes, feature governance, threshold review |
| Credit portfolio monitoring | Early warning indicators, migration prediction | Drift, macro regime change, false reassurance | Backtesting, stress testing, scenario overlays |
| Market risk | Volatility forecasting, pricing proxies, anomaly detection | Nonstationarity, tail underestimation, model opacity | Stress testing, benchmark models, sensitivity analysis |
| Liquidity risk | Cash-flow forecasting, deposit behavior prediction | Behavioral shifts, concentration risk, feedback effects | Scenario analysis, conservative overlays, monitoring |
| Operational risk | Loss event classification, incident detection | Incomplete labels, low-frequency high-severity events | Expert review, scenario analysis, qualitative controls |
| Fraud detection | Transaction scoring, anomaly detection, network analysis | Class imbalance, adversarial behavior, customer friction | Threshold tuning, feedback loops, alert quality metrics |
| AML/KYC | Alert prioritization, entity resolution, transaction monitoring | False positives, explainability, regulatory sensitivity | Human review, audit trail, typology testing |
| Trading and execution | Signal generation, execution optimization | Overfitting, market impact, feedback loops | Out-of-sample testing, kill switches, limit controls |
| Customer service | Chatbots, complaint routing, document summaries | Hallucination, unfair treatment, privacy leakage | Retrieval grounding, escalation, conversation logging |
| Risk reporting | Narrative generation, data extraction, dashboards | Misstatement, stale data, weak lineage | Source linking, reconciliation, approval workflow |
Model Risk Management Reference
Three-Lines View
| Function | Typical role in AI/model risk | What to remember for exam scenarios |
|---|---|---|
| First line | Owns business use, model development, operation, controls, and day-to-day performance | Cannot outsource accountability to validators or vendors |
| Second line | Sets policy, challenges risk assessment, performs or oversees independent validation, monitors risk appetite | Independence and effective challenge matter |
| Third line | Provides internal audit assurance over governance and controls | Reviews whether the framework works, not just one model’s performance |
Model Governance Artifacts
| Artifact | Purpose | Common deficiency |
|---|---|---|
| Model inventory | Complete list of models, AI tools, owners, status, and materiality | Missing vendor models, spreadsheets, GenAI tools, or embedded analytics |
| Model tiering | Prioritizes governance by risk, materiality, complexity, and impact | Classifying by complexity only and ignoring business impact |
| Development documentation | Explains objective, data, assumptions, methods, limitations, and testing | Documentation written after the fact and not tied to design decisions |
| Validation report | Independent assessment of conceptual soundness, implementation, outcomes, and limitations | Merely reproducing developer metrics without challenge |
| Approval record | Evidence that authorized governance body accepted use and limitations | Approval without conditions, owners, or monitoring thresholds |
| Monitoring plan | Defines metrics, thresholds, frequency, escalation, and remediation | Metrics tracked but no action when breached |
| Change log | Records retraining, feature changes, code changes, data changes, and vendor updates | Treating “minor” data or API changes as non-model changes |
| Issue log | Tracks limitations, findings, remediation owners, and due dates | Findings closed without evidence |
| User procedures | Explains how outputs should and should not be used | Users treat scores as final decisions despite intended advisory use |
Independent Validation: What to Test
| Validation area | Key question | Example techniques |
|---|---|---|
| Conceptual soundness | Is the model design appropriate for the objective? | Method review, assumptions review, benchmark comparison |
| Data quality | Is the data accurate, complete, representative, and permitted? | Lineage review, missing-value analysis, outlier review, label audit |
| Feature appropriateness | Are inputs available, stable, explainable, and acceptable? | Leakage checks, proxy analysis, correlation review |
| Implementation | Does production match approved design? | Code review, replication, unit testing, environment checks |
| Performance | Does the model predict well on unseen data? | Holdout testing, cross-validation, backtesting |
| Stability | Does performance hold across time, segments, and conditions? | Temporal validation, population stability, stress periods |
| Fairness | Are outcomes unjustifiably different across groups? | Disparity metrics, proxy review, segment-level error analysis |
| Explainability | Can stakeholders understand drivers and limitations? | SHAP/LIME, reason codes, sensitivity analysis |
| Robustness | Can the model handle noise, adversarial inputs, and edge cases? | Perturbation testing, stress tests, scenario tests |
| Use and controls | Is the model used as intended with oversight? | Workflow review, override analysis, access control review |
| Ongoing monitoring | Are deterioration and misuse detected? | Thresholds, alerts, drift metrics, periodic review |
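
The "population stability" and "drift metrics" techniques in the table above are often implemented as a Population Stability Index (PSI). The following is a minimal sketch, assuming bin cut points fixed from the development sample. The function names `bin_shares` and `psi`, the cut points, the sample scores, and the small floor `eps` are illustrative assumptions rather than a prescribed standard.

```python
import math
from bisect import bisect_right

def bin_shares(values, edges, eps=1e-6):
    """Share of observations in each bin defined by interior cut points `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    n = len(values)
    # Floor each share at eps (an assumption) so the logarithm is always defined.
    return [max(c / n, eps) for c in counts]

def psi(expected, actual, edges):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    e = bin_shares(expected, edges)
    a = bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

# Hypothetical credit scores: development sample vs. recent production samples.
dev = [580, 610, 640, 660, 690, 700, 720, 740, 760, 790]
prod_same = list(dev)
prod_shifted = [s - 60 for s in dev]  # whole population drifts toward lower scores
edges = [620, 680, 740]               # interior cut points, giving 4 bins

print(psi(dev, prod_same, edges))     # identical distributions give 0.0
print(psi(dev, prod_shifted, edges))  # a clearly positive value signals drift
```

Commonly cited rules of thumb treat a PSI below roughly 0.1 as stable and above roughly 0.25 as a material shift, but the thresholds that trigger escalation should come from the institution's own monitoring plan.
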
Validation and Testing Traps
| Trap | Why it matters | Better exam answer |
|---|---|---|
| Data leakage | Model uses information unavailable at decision time | Rebuild features using only information known at prediction time |
| Look-ahead bias | Future data contaminates training or testing | Use time-consistent splits for time-dependent data |
| Target leakage | Predictor directly encodes the label or outcome | Remove or redefine leaky features |
| Overfitting | Model learns noise rather than general patterns | Use holdout data, cross-validation, regularization, pruning |
| Underfitting | Model too simple to capture meaningful structure | Add relevant features or more appropriate model class |
| Class imbalance | High accuracy can hide poor detection of rare events | Use precision, recall, F1, ROC/PR curves, cost-sensitive thresholds |
| Survivorship bias | Failed or exited entities are omitted | Include inactive, defaulted, closed, or failed observations where relevant |
| Sample-selection bias | Training population differs from deployment population | Test representativeness and segment performance |
| Proxy discrimination | Innocent-looking variables replicate protected traits | Conduct proxy analysis and fairness testing |
| Concept drift | Relationship between inputs and target changes | Monitor performance and retrain or recalibrate |
| Data drift | Input distribution changes | Monitor feature distributions and population stability |
| Automation bias | Users overtrust model output | Require training, explanations, overrides, and escalation |
| Feedback loop | Model decisions affect future data labels | Separate observed outcomes from model-influenced outcomes where possible |
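
The "time-consistent splits" remedy for look-ahead bias can be sketched as follows. This is a minimal illustration, assuming each record carries an observation date. The field names `as_of` and `defaulted`, the function name `time_split`, and the records themselves are hypothetical.

```python
from datetime import date

def time_split(records, cutoff):
    """Train on observations strictly before `cutoff`; test on the rest."""
    train = [r for r in records if r["as_of"] < cutoff]
    test = [r for r in records if r["as_of"] >= cutoff]
    return train, test

# Hypothetical loan observations, deliberately not in date order.
records = [
    {"as_of": date(2023, 6, 30), "defaulted": 1},
    {"as_of": date(2022, 3, 31), "defaulted": 0},
    {"as_of": date(2022, 9, 30), "defaulted": 0},
    {"as_of": date(2023, 3, 31), "defaulted": 0},
    {"as_of": date(2022, 12, 31), "defaulted": 1},
]

train, test = time_split(records, cutoff=date(2023, 1, 1))

# Every training observation precedes every test observation,
# so the test set never leaks future information into training.
print(len(train), len(test))
print(max(r["as_of"] for r in train) < min(r["as_of"] for r in test))
```

A random shuffle would mix 2023 observations into training and 2022 observations into testing, which is exactly the contamination the table warns about.
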
Confusion Matrix Terms
| Term | Meaning |
|---|---|
| True positive | Model predicts positive and actual outcome is positive |
| False positive | Model predicts positive but actual outcome is negative |
| True negative | Model predicts negative and actual outcome is negative |
| False negative | Model predicts negative but actual outcome is positive |
For a fraud model, “positive” often means flagged as fraud. For a credit default model, “positive” may mean predicted default. Always identify the positive class before interpreting precision, recall, or false-positive rates.
Classification Metrics
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
\[
\text{Precision} = \frac{TP}{TP + FP}
\]
\[
\text{Recall or Sensitivity} = \frac{TP}{TP + FN}
\]
\[
\text{Specificity} = \frac{TN}{TN + FP}
\]
\[
\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]
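
The formulas above can be checked with a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `classification_metrics` and the confusion-matrix counts are hypothetical, chosen to show how a rare positive class can inflate accuracy.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics defined above from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical fraud model: 10,000 transactions, 100 of them truly fraudulent.
m = classification_metrics(tp=40, fp=60, tn=9840, fn=60)
for name, value in m.items():
    print(name, round(value, 3))
# accuracy 0.988, precision 0.4, recall 0.4, specificity 0.994, f1 0.4
```

Accuracy is close to 99% even though the model misses 60 of the 100 fraud cases, which is why accuracy alone is a weak answer for rare-event questions.
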
| Metric | Best used for | Trap |
|---|---|---|
| Accuracy | Balanced classes and similar error costs | Misleading when positives are rare |
| Precision | Costly false positives, such as unnecessary investigations or customer friction | High precision may miss many true positives |
| Recall / sensitivity | Costly false negatives, such as missed fraud or missed defaults | High recall may create excessive false positives |
| Specificity | Ability to correctly reject negatives | Can be high even when positive detection is weak |
| F1 score | Balance between precision and recall | Does not include true negatives |
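| ROC AUC | Ranking quality across thresholds | Can look strong under heavy class imbalance even when precision on the rare class is poor |
| Precision-recall curve | Rare-event detection | Often more informative than ROC in imbalanced settings, but its baseline shifts with positive-class prevalence |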
| Calibration | Whether predicted probabilities match realized frequencies | A high-ranking model may still be poorly calibrated |
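
The calibration row can be made concrete with a simple binned comparison of predicted probabilities and realized frequencies. This is a minimal sketch, assuming equal-width probability bins. The function name `calibration_table`, the bin count, and the data are hypothetical.

```python
def calibration_table(probs, outcomes, n_bins=2):
    """Per bin: (count, mean predicted probability, observed event rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for rows in bins:
        if rows:  # skip empty bins
            mean_pred = sum(p for p, _ in rows) / len(rows)
            observed = sum(y for _, y in rows) / len(rows)
            table.append((len(rows), mean_pred, observed))
    return table

# Hypothetical default model: it ranks borrowers well (high scores default more
# often) but its probabilities are too extreme at both ends.
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

for count, mean_pred, observed in calibration_table(probs, outcomes):
    print(count, round(mean_pred, 2), round(observed, 2))
# 5 0.1 0.2
# 5 0.9 0.6
```

The ordering is right, but predicted 90% corresponds to a realized 60%, so the scores would need recalibration before being used as probabilities in pricing or provisioning.
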
Regression and Forecasting Metrics
\[
MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|
\]
\[
MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\]
\[
RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\]
| Metric | Interpretation | Trap |
|---|---|---|
| MAE | Average absolute error; easy to interpret in original units | Treats all errors linearly |
| MSE | Average squared error | Expressed in squared units, so harder to interpret; heavily penalizes large errors |
| RMSE | Error measure in original units, sensitive to large misses | Can be dominated by outliers |
| MAPE | Percentage error measure | Problematic when actual values are near zero |
| R-squared | Share of variance explained in sample | High value does not prove causality or out-of-sample performance |
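
The difference between MAE and RMSE is easiest to see on two forecasts with the same total absolute error. This is a minimal sketch. The function name `regression_errors` and the numbers are hypothetical.

```python
import math

def regression_errors(actual, predicted):
    """Return MAE, MSE, and RMSE as defined above."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}

actual = [100.0, 120.0, 80.0, 90.0]
small_misses = [110.0, 110.0, 90.0, 80.0]  # four misses of 10 each
one_big_miss = [100.0, 120.0, 80.0, 50.0]  # a single miss of 40

print(regression_errors(actual, small_misses))
# {'mae': 10.0, 'mse': 100.0, 'rmse': 10.0}
print(regression_errors(actual, one_big_miss))
# {'mae': 10.0, 'mse': 400.0, 'rmse': 20.0}
```

Both forecasts have an MAE of 10, but RMSE doubles for the forecast with one large miss, which matters when large errors are disproportionately costly.
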
\[
\text{Expected Loss} = PD \times LGD \times EAD
\]
\[
\text{Unexpected Loss} = \text{Potential loss above expected loss, often linked to volatility or tail outcomes}
\]
\[
VaR_{\alpha} = \text{Loss threshold not exceeded with confidence level } \alpha \text{ over a specified horizon}
\]
\[
ES_{\alpha} = \text{Average loss conditional on losses exceeding } VaR_{\alpha}
\]
| Formula concept | Exam focus | Common trap |
|---|---|---|
| PD | Probability of default | Confusing default probability with loss amount |
| LGD | Loss severity if default occurs | Ignoring collateral, seniority, and recovery uncertainty |
| EAD | Exposure at default | Treating current balance as always equal to future exposure |
| Expected loss | Average credit loss estimate | Not the same as economic capital or unexpected loss |
| VaR | Quantile-based loss measure | Says little about severity beyond the threshold |
| Expected shortfall | Tail-loss average beyond VaR | More informative about tail severity, but model-dependent |
| Stress testing | Scenario-based adverse impact assessment | Not a probability forecast unless explicitly modeled as one |
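
The expected loss, VaR, and expected shortfall concepts above can be illustrated numerically. This is a minimal sketch under stated assumptions: losses are recorded as positive numbers, VaR is taken as a simple order statistic of a historical sample, and the tail average includes observations equal to VaR. The function names and all figures are hypothetical, and other quantile conventions would give slightly different values.

```python
import math

def expected_loss(pd, lgd, ead):
    """Expected Loss = PD x LGD x EAD."""
    return pd * lgd * ead

def historical_var_es(losses, alpha):
    """Empirical VaR and ES from a loss sample (positive values are losses)."""
    ordered = sorted(losses)
    # Order-statistic convention (an assumption): the ceil(alpha * n)-th smallest loss.
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    var = ordered[k]
    tail = [x for x in ordered if x >= var]  # losses at or beyond VaR
    return var, sum(tail) / len(tail)

# Hypothetical loan: 2% PD, 45% LGD, exposure of 1,000,000.
print(round(expected_loss(0.02, 0.45, 1_000_000), 2))  # 9000.0

# Hypothetical sample of 20 daily losses (negative values are gains).
daily_losses = [2, -1, 0, 5, 3, -4, 1, 7, -2, 4,
                12, 0, -3, 6, 9, 1, -1, 20, 2, 3]
var_90, es_90 = historical_var_es(daily_losses, alpha=0.90)
print(var_90, round(es_90, 2))  # 9 13.67
```

The 90% VaR of 9 says nothing about the losses of 12 and 20 beyond it, while expected shortfall averages that tail, which is the distinction the table highlights.
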
Thresholds, Cutoffs, and Business Decisions
| Decision point | What changes when threshold moves up? | What changes when threshold moves down? |
|---|---|---|
| Fraud alert threshold | Fewer alerts, typically higher precision, more missed fraud risk | More alerts, higher recall, more operational burden |
| Credit approval cutoff | Fewer approvals, lower default risk, potential lost revenue | More approvals, higher default risk, possible fairness impact |
| AML alert priority | Fewer escalations, risk of missed suspicious activity | More escalations, analyst overload |
| Model override threshold | Fewer manual reviews, faster processing | More review effort, potentially better control of edge cases |
| GenAI escalation rule (uncertainty level that triggers human review) | More automated responses, higher hallucination exposure | More human review, slower response |
Exam-style threshold questions usually require balancing error costs, risk appetite, customer impact, regulatory sensitivity, and operational capacity.
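
The trade-off can be sketched by sweeping a fraud alert threshold and choosing the cutoff with the lowest total error cost. This is an illustrative example only: the scores, labels, candidate thresholds, and cost weights are hypothetical, and a real decision would also weigh risk appetite, customer impact, and analyst capacity.

```python
def alert_counts(scores, labels, threshold):
    """Count true positives, false positives, and false negatives at a threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    return tp, fp, fn

def cheapest_threshold(scores, labels, candidates, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total error cost."""
    def total_cost(t):
        _, fp, fn = alert_counts(scores, labels, t)
        return fp * cost_fp + fn * cost_fn
    return min(candidates, key=total_cost)

# Hypothetical model scores and true fraud labels (1 = fraud).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
candidates = [0.25, 0.50, 0.75]

for t in sorted(candidates, reverse=True):
    tp, fp, fn = alert_counts(scores, labels, t)
    print(t, "alerts:", tp + fp,
          "precision:", round(tp / (tp + fp), 2),
          "recall:", round(tp / (tp + fn), 2))
# 0.75 alerts: 3 precision: 0.67 recall: 0.5
# 0.5 alerts: 5 precision: 0.6 recall: 0.75
# 0.25 alerts: 7 precision: 0.57 recall: 1.0

# Missed fraud ten times as costly as a false alert favors a low threshold.
print(cheapest_threshold(scores, labels, candidates, cost_fp=1, cost_fn=10))  # 0.25
# False alerts ten times as costly as missed fraud favors a high threshold.
print(cheapest_threshold(scores, labels, candidates, cost_fp=10, cost_fn=1))  # 0.75
```
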
Explainability and Interpretability
| Concept or method | What it does | Best use | Limitation |
|---|---|---|---|
| Intrinsic interpretability | Model is understandable by design, such as linear models or shallow trees | High-stakes decisions requiring clear rationale | May sacrifice predictive performance |
| Post-hoc explanation | Explains a trained model after the fact | Complex models where transparency is limited | May approximate rather than reveal true logic |
| Global explanation | Describes model behavior overall | Policy, governance, validation | May hide segment-level behavior |
| Local explanation | Explains one prediction | Customer-level decision review, investigation | Can be unstable near decision boundaries |
| Feature importance | Ranks influential inputs | Model review and governance | Does not show direction, causality, or fairness |
| Partial dependence plot | Shows average effect of a feature | Understanding nonlinear relationships | Can mislead when features are correlated |
| ICE plot | Shows feature effect for individual observations | Detecting heterogeneity | Can be noisy and hard to summarize |
| SHAP | Allocates prediction contribution across features | Local and global explanations | Computational complexity; assumptions matter |
| LIME | Local surrogate explanation around one prediction | Quick local explanation | Sensitive to sampling and neighborhood definition |
| Counterfactual explanation | Shows minimal change needed to alter outcome | Actionable adverse-action-style reasoning | Must be realistic and permissible |
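
One simple way to produce the "feature importance" ranking in the table is permutation importance: shuffle one input and measure how much prediction error grows. This technique is swapped in here for illustration and is not named in the table. The function names, the toy scoring function, and the data are hypothetical.

```python
import random

def permutation_importance(predict, X, y, feature, n_repeats=20, seed=0):
    """Average increase in mean absolute error after shuffling one feature."""
    rng = random.Random(seed)

    def mae(rows):
        return sum(abs(predict(r) - t) for r, t in zip(rows, y)) / len(y)

    baseline = mae(X)
    increases = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this feature and the target
        shuffled = [row[:feature] + [v] + row[feature + 1:]
                    for row, v in zip(X, column)]
        increases.append(mae(shuffled) - baseline)
    return sum(increases) / n_repeats

# Hypothetical scoring function that uses feature 0 and ignores feature 1.
def predict(row):
    return 2.0 * row[0]

X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = [predict(row) for row in X]

print(permutation_importance(predict, X, y, feature=0))  # positive: the model relies on it
print(permutation_importance(predict, X, y, feature=1))  # 0.0: the model ignores it
```

As the table notes, the resulting ranking shows reliance only. It says nothing about direction, causality, or fairness, and it can mislead when features are strongly correlated.
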
Explanation Quality Checklist
A useful explanation should be:
- Faithful: reflects actual model behavior, not a convenient story.
- Stable: similar inputs produce similar explanations unless a real boundary is crossed.
- Actionable: helps users understand what can be changed or reviewed.
- Audience-appropriate: different detail for developers, validators, executives, customers, and auditors.
- Documented: stored with assumptions, limitations, and intended use.
- Tested: checked for consistency across segments and edge cases.
Fairness, Bias, and Responsible AI
| Bias source | Example in finance | Control |
|---|---|---|
| Historical bias | Past lending decisions reflect unequal access to credit | Review labels, outcomes, and policy history |
| Representation bias | Training data underrepresents certain groups or regions | Sampling review and segment testing |
| Measurement bias | Income, employment, or address data have unequal quality | Data quality checks by segment |
| Label bias | “Default” or “fraud” labels reflect prior detection practices | Label audit and alternative outcome definitions |
| Proxy bias | ZIP code, device type, or merchant behavior correlates with protected traits | Proxy analysis and feature governance |
| Aggregation bias | One model performs poorly for a subgroup | Segment-level validation and monitoring |
| Deployment bias | Users apply model outside intended scope | Training, access control, use restrictions |
| Feedback bias | Model-driven decisions affect future observed outcomes | Monitor feedback loops and use independent samples where possible |
Fairness Metrics: What They Mean
| Metric | Plain-language question | Trap |
|---|---|---|
| Demographic parity | Are positive decision rates similar across groups? | May ignore legitimate risk differences |
| Equal opportunity | Are true positive rates similar across groups? | Focuses on positives, not false positives |
| Equalized odds | Are both true positive and false positive rates similar? | Often conflicts with calibration when base rates differ |
| Predictive parity | Is precision similar across groups? | May conflict with equalized odds when base rates differ |
| Calibration by group | Do predicted probabilities match outcomes within each group? | Calibrated models can still produce different approval rates |
| Disparate impact analysis | Do outcomes disproportionately affect a group? | Legal meaning depends on jurisdiction and context |
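
The first three metrics in the table reduce to comparing simple rates by group. This is a minimal sketch, assuming `pred = 1` means a favorable decision (approval) and `actual = 1` means the favorable true outcome (repaid). The group labels, field names, and counts are hypothetical.

```python
def group_rates(records, group):
    """Selection rate, true positive rate, and false positive rate for one group."""
    rows = [r for r in records if r["group"] == group]
    positives = [r for r in rows if r["actual"] == 1]
    negatives = [r for r in rows if r["actual"] == 0]
    return {
        "selection_rate": sum(r["pred"] for r in rows) / len(rows),
        "tpr": sum(r["pred"] for r in positives) / len(positives),
        "fpr": sum(r["pred"] for r in negatives) / len(negatives),
    }

def make_rows(group, actual, pred, count):
    """Helper that repeats one hypothetical outcome combination `count` times."""
    return [{"group": group, "actual": actual, "pred": pred}] * count

records = (
    make_rows("A", 1, 1, 4) + make_rows("A", 1, 0, 1)
    + make_rows("A", 0, 1, 1) + make_rows("A", 0, 0, 4)
    + make_rows("B", 1, 1, 2) + make_rows("B", 1, 0, 2)
    + make_rows("B", 0, 1, 1) + make_rows("B", 0, 0, 5)
)

a = group_rates(records, "A")
b = group_rates(records, "B")

# Demographic parity compares selection rates.
print(round(a["selection_rate"] - b["selection_rate"], 2))  # 0.2
# Equal opportunity compares true positive rates.
print(round(a["tpr"] - b["tpr"], 2))                        # 0.3
# Equalized odds also requires similar false positive rates.
print(round(a["fpr"] - b["fpr"], 2))                        # 0.03
```

Whether a gap of this size is acceptable is a policy and legal question. The code only measures it.
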
Responsible AI Principles in Exam Scenarios
| Principle | Practical implication |
|---|---|
| Accountability | Named owners remain responsible for AI outcomes |
| Transparency | Stakeholders understand purpose, limitations, and decision role |
| Fairness | Outcomes and errors are assessed across relevant groups |
| Robustness | Model withstands noise, drift, stress, and adversarial behavior |
| Privacy | Data use is limited, protected, and appropriate |
| Security | Model, data, prompts, APIs, and outputs are protected |
| Human oversight | People can intervene meaningfully where risk warrants |
| Auditability | Evidence exists for design, validation, approval, and use |
Data Risk Reference
| Data issue | Why it matters | Detection or mitigation |
|---|---|---|
| Missing data | Can bias estimates if not random | Missingness analysis, imputation policy, segment checks |
| Outliers | Can distort training and metrics | Winsorization, robust methods, investigation |
| Duplicates | Can overweight observations | Deduplication and entity resolution |
| Incorrect labels | Directly corrupt supervised learning | Label audit, reconciliation, expert review |
| Non-representative sample | Model may fail in production | Population comparison and monitoring |
| Stale data | Relationships may no longer hold | Recency controls and drift monitoring |
| Poor lineage | Weak auditability and reproducibility | Data catalog, lineage documentation |
| Unauthorized data | Legal, ethical, and contractual risk | Data permissions review and access controls |
| Sensitive data | Privacy, fairness, and conduct risk | Minimization, masking, encryption, governance |
| Alternative data | Potential predictive lift with higher uncertainty | Source diligence, explainability, stability testing |
Generative AI and LLM Risk Reference
| GenAI concept | Meaning | Exam-relevant control |
|---|---|---|
| Prompt | User or system instruction given to a model | Prompt standards, testing, access controls |
| System prompt | Higher-priority instruction defining behavior and constraints | Protect from disclosure or override |
| Hallucination | Plausible but false or unsupported output | Grounding, source citation, human review |
| Retrieval-augmented generation | Model uses retrieved documents or data to support output | Curated knowledge base, source validation |
| Fine-tuning | Additional training on specific data or tasks | Data governance, evaluation, version control |
| Embedding | Numeric representation of text or objects for similarity search | Privacy review and vector database controls |
| Vector database | Stores embeddings for retrieval | Access control, deletion process, data lineage |
| Prompt injection | Malicious instruction attempts to override controls | Input filtering, isolation, output validation |
| Data exfiltration | Sensitive data leaked through prompts or outputs | DLP controls, redaction, logging |
| Model inversion | Attempt to infer training data from model behavior | Privacy-preserving controls and access limits |
| Jailbreak | User circumvents model safety restrictions | Adversarial testing and guardrails |
| Guardrail | Technical or process control around model behavior | Policy filters, allowlists, escalation rules |
| Temperature | Parameter affecting randomness of generated output | Lower for tasks needing consistent, repeatable output; higher increases variability |
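
The effect of temperature can be sketched with the common formulation in which token scores (logits) are divided by the temperature before a softmax. This is an illustration of the general idea, not any vendor's implementation. The function name and the logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]

cold = softmax_with_temperature(logits, temperature=0.5)
warm = softmax_with_temperature(logits, temperature=2.0)

print([round(p, 3) for p in cold])  # probability mass concentrates on the top token
print([round(p, 3) for p in warm])  # probability mass spreads across the alternatives
```

Lower temperature makes the top-scoring token far more likely to be sampled, which is why it suits tasks needing repeatable output. It does not make the output more truthful, so grounding and review controls still apply.
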
GenAI Use-Case Control Matrix
| Use case | Lower-risk pattern | Higher-risk pattern |
|---|---|---|
| Internal summarization | Human-reviewed summaries from approved documents | Unverified summary used for official reporting |
| Customer chatbot | Narrow scope, retrieval grounding, escalation to human | Open-ended advice with no audit trail |
| Code assistant | Developer review, testing, secure repository controls | Direct production deployment of generated code |
| Research support | Source-linked drafts reviewed by analyst | Trading or credit decision based on unsourced output |
| Synthetic data | Tested for utility and privacy leakage | Used as if it were real observed data |
| Policy interpretation | References approved policy library | Creates new policy language without approval |
AI Security, Privacy, and Operational Resilience
| Risk | Example | Control |
|---|---|---|
| Adversarial input | Slightly altered transaction evades fraud model | Robustness testing, adversarial training, monitoring |
| Model theft | Unauthorized extraction of model behavior or parameters | Rate limits, access controls, API monitoring |
| Data poisoning | Malicious or corrupted training data changes behavior | Data validation, source controls, anomaly detection |
| Prompt injection | External text instructs LLM to ignore controls | Content sanitization, tool isolation, output checks |
| Sensitive data leakage | Confidential client data appears in prompt or output | Redaction, DLP, encryption, retention controls |
| Vendor outage | AI service becomes unavailable | Fallback process, resilience planning |
| Concentration risk | Many processes depend on one AI provider or model | Vendor diversification or contingency planning |
| Unauthorized model change | Vendor or developer changes model without review | Versioning, change notification, revalidation triggers |
| Weak logging | Cannot reconstruct AI-assisted decision | Audit logs, prompt/output retention policy |
| Over-permissioned tools | AI agent accesses systems beyond need | Least privilege and tool-use constraints |
Vendor and Third-Party AI Due Diligence
| Diligence area | Questions to ask | Evidence to seek |
|---|---|---|
| Model purpose | What is the tool intended and not intended to do? | Product documentation, use-case restrictions |
| Data | What data trained or powers the model? Can client data be used for training? | Data handling terms, privacy controls |
| Performance | How was performance measured? On what population? | Validation reports, benchmarks, test methodology |
| Explainability | Can outputs be explained at the level users need? | Feature drivers, reason codes, interpretability tools |
| Bias and fairness | Has the vendor tested relevant disparities? | Fairness testing documentation |
| Security | How are prompts, data, APIs, and outputs protected? | Security controls, certifications, incident process |
| Change management | How are model updates communicated and controlled? | Version notes, release governance |
| Auditability | Can the institution log and reconstruct decisions? | Logging capabilities, export options |
| Resilience | What happens during outage or degraded service? | SLAs, contingency procedures |
| Subcontractors | Are other providers involved? | Third-party dependency list |
| Exit | Can the institution transition away? | Data export, deletion, termination procedures |
Regulatory and Standards Themes to Recognize
Do not treat this section as jurisdiction-specific legal advice. For exam purposes, focus on recurring supervisory and governance themes: accountability, transparency, fairness, privacy, security, resilience, documentation, validation, and human oversight.
| Source or framework type | High-yield theme |
|---|---|
| Model risk management guidance | Models require inventory, governance, validation, documentation, monitoring, and effective challenge |
| Banking and financial supervision | AI use should align with safety and soundness, consumer protection, operational resilience, and risk governance |
| Data protection frameworks | Personal data use should be lawful, limited, protected, and transparent where applicable |
| AI risk management frameworks | Identify, measure, manage, monitor, and govern AI risks across the lifecycle |
| Conduct and consumer protection expectations | Avoid unfair, deceptive, discriminatory, or unsuitable outcomes |
| Cybersecurity standards | Protect AI systems, data, APIs, identities, logs, and third-party connections |
| Operational resilience expectations | Critical AI-supported services need contingency, incident response, and recovery planning |
| Emerging AI laws and policies | Higher-impact AI use cases typically receive greater governance scrutiny |
High-Yield Distinctions
| Distinction | Know the difference |
|---|---|
| Accuracy vs calibration | Accuracy measures correct classifications; calibration measures whether probabilities match realized frequencies |
| Correlation vs causation | Predictive association does not prove one variable causes another |
| Explainability vs fairness | A model can be explainable but unfair, or fairer by some metric but hard to explain |
| Model validation vs model monitoring | Validation is pre-use or periodic independent challenge; monitoring is ongoing production surveillance |
| Development testing vs independent validation | Developers optimize and test; independent validators challenge assumptions and use |
| Data drift vs concept drift | Data drift means inputs change; concept drift means relationships between inputs and outcomes change |
| Bias vs variance | Bias is systematic error from oversimplification; variance is instability from sensitivity to training data |
| White-box vs black-box | White-box models are easier to inspect; black-box models may need post-hoc explanation and stronger controls |
| Human-in-the-loop vs human-on-the-loop | In-the-loop requires a human to act before each decision takes effect; on-the-loop means a human supervises automated activity and can intervene |
| Automation vs augmentation | Automation replaces a task; augmentation supports a human decision |
| Predictive model vs decision policy | A model estimates risk; a policy converts estimates into actions |
| Model output vs business outcome | Good statistical performance may still create poor customer, financial, or operational outcomes |
| Validation exception vs limitation | Exception is a finding requiring remediation; limitation is a known boundary that must be accepted and controlled |
| Vendor validation vs user validation | Vendor evidence helps, but the financial institution still needs fit-for-purpose assessment |
| GenAI fluency vs reliability | Well-written output is not evidence of truth, completeness, or suitability |
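
The accuracy vs calibration distinction above can be made concrete with a small sketch. The labels and probabilities are made up for illustration, and the helper names are hypothetical. The calibration check used here is the simplest one (average predicted probability compared with the observed event rate); binned reliability checks follow the same idea.

```python
def accuracy(y_true, probs, cutoff=0.5):
    """Share of cases where the thresholded prediction matches the label."""
    preds = [1 if p >= cutoff else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def calibration_gap(y_true, probs):
    """Average predicted probability minus observed event rate (0 is ideal)."""
    return sum(probs) / len(probs) - sum(y_true) / len(y_true)


# Illustrative labels: 1 = default, 0 = no default (made-up data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Model A: probabilities roughly in line with the 20% observed default rate.
model_a = [0.8, 0.7, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.2, 0.1]
# Model B: same side of the 0.5 cutoff for every case, but all scores near 0.5.
model_b = [0.55, 0.52, 0.48, 0.45, 0.45, 0.48, 0.45, 0.45, 0.48, 0.45]

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, probs), round(calibration_gap(y_true, probs), 3))
```

Both models classify every case correctly at the 0.5 cutoff, yet Model B's average predicted probability sits far above the observed default rate. Its probabilities would be unreliable for uses such as pricing or provisioning, even though its accuracy matches Model A's.
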
Scenario Decision Tables
If the Scenario Says…
| Scenario clue | Likely concept tested | Best response direction |
|---|---|---|
| Model performs well in training but poorly in production | Overfitting, leakage, drift, or deployment mismatch | Check out-of-sample testing, leakage, monitoring, implementation |
| Rare event model has 99% accuracy | Class imbalance | Examine recall, precision, confusion matrix, PR curve |
| Model uses postal code and produces group disparities | Proxy bias | Conduct fairness/proxy analysis and review feature permissibility |
| Business adopts vendor AI chatbot quickly | Third-party and GenAI risk | Require due diligence, use-case approval, guardrails, logging |
| LLM gives confident but false answer | Hallucination | Use retrieval grounding, source checks, human review |
| Production model changes after vendor update | Change management | Trigger impact assessment and possible revalidation |
| Model decision cannot be explained to affected user | Explainability and governance | Add reason codes, interpretable model, or human review |
| Performance degrades after economic shift | Concept drift or regime change | Reassess assumptions, recalibrate, retrain, or apply overlays |
| Analysts ignore model warnings | Use risk and human factors | Review workflow, training, incentives, escalation |
| Analysts blindly accept outputs | Automation bias | Strengthen human oversight and challenge |
| Many false fraud alerts overwhelm staff | Threshold and operational capacity | Tune cutoff, prioritize alerts, measure precision |
| New data source improves accuracy but includes sensitive attributes | Data ethics and compliance risk | Review permissions, fairness, privacy, and necessity |
| Model works overall but fails for a segment | Aggregation bias | Perform segment validation and targeted remediation |
| AI system controls a critical process with no fallback | Operational resilience | Add contingency, manual fallback, incident plan |
| Developers cannot reproduce results | Weak documentation or environment control | Require versioning, lineage, code/data reproducibility |
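
The "rare event model has 99% accuracy" clue is easier to recognize with numbers. The sketch below uses hypothetical confusion-matrix counts (10,000 transactions, 100 of them fraudulent) and a hypothetical helper, `confusion_metrics`, to compare a model that never flags fraud with one that flags some cases.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Precision: of the cases flagged, how many were truly fraud?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the true fraud cases, how many were flagged?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts: 100 fraud cases among 10,000 transactions (1% base rate).
never_flags = confusion_metrics(tp=0, fp=0, fn=100, tn=9900)
flags_some = confusion_metrics(tp=60, fp=140, fn=40, tn=9760)

print("never flags:", never_flags)
print("flags some: ", flags_some)
```

The model that never flags anything has higher accuracy but zero recall, which is why accuracy alone is weak evidence when the event is rare. The second model's lower precision also shows the operational side of the trade-off: more alerts for staff to review.
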
Control Selection Matrix
| Problem | Primary control | Supporting controls |
|---|---|---|
| Overfitting | Out-of-sample validation | Regularization, pruning, simpler benchmark |
| Data leakage | Review of whether each feature is available at decision time | Time-based splits, independent replication |
| Biased outcomes | Fairness testing | Proxy review, policy review, segment monitoring |
| Black-box model | Explainability tools | Simpler challenger, documentation, human review |
| GenAI hallucination | Grounding and verification | Source citation, confidence rules, escalation |
| Unauthorized data use | Data governance | Access controls, lineage, privacy review |
| Vendor opacity | Third-party due diligence | Contractual reporting, independent testing |
| Drift | Ongoing monitoring | Retraining triggers, recalibration, alerts |
| Poor user adoption | Training and workflow design | Feedback collection, override tracking |
| Excessive reliance | Human oversight | Decision limits, second review, audit trail |
| Cyber attack on model | Security controls | Adversarial testing, monitoring, incident response |
| Production mismatch | Implementation validation | Code review, deployment controls, reconciliations |
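
For the drift row above, one commonly used monitoring statistic is the population stability index (PSI), which compares the distribution of an input or score between a baseline sample and a current sample. The sketch below is a minimal version under stated assumptions: the bin counts are made up, the function name is hypothetical, and the 0.1 and 0.25 levels mentioned in the comments are an industry rule of thumb rather than anything this page prescribes.

```python
import math


def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """PSI across matching bins: sum of (a - e) * ln(a / e) over bin shares."""
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the shares so an empty bin does not cause log(0) or division by zero.
        e_share = max(e / exp_total, floor)
        a_share = max(a / act_total, floor)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi


# Made-up score-band counts: development sample vs recent production data.
baseline = [200, 300, 300, 200]
current = [100, 200, 300, 400]

psi = population_stability_index(baseline, current)
# Rule of thumb (assumption): below 0.1 stable, 0.1 to 0.25 investigate, above 0.25 material shift.
print(round(psi, 4))
```

PSI detects changes in input or score distributions (data drift). It does not show whether the relationship between inputs and outcomes has changed, so concept drift still requires tracking realized performance once outcomes are known.
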
Compact Exam Checklist
Before answering an RAI scenario question, identify:
- Use case: credit, market, operational, compliance, customer, trading, reporting, or GenAI support.
- Decision role: advisory, automated, customer-facing, internal, capital-related, or critical process.
- Data risk: quality, lineage, permission, representativeness, leakage, proxies, privacy.
- Model risk: complexity, overfitting, drift, calibration, explainability, robustness.
- Human impact: customer harm, fairness, conduct, access to services, appeal or override.
- Governance need: inventory, tiering, validation, approval, monitoring, change control.
- Metric fit: classification, regression, ranking, calibration, fairness, tail risk.
- Control fit: technical control, process control, governance control, or human oversight.
- Residual risk: what remains after controls and whether it is acceptable.
- Evidence: documentation, logs, test results, approvals, issue tracking.
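
For the human impact and metric fit items in the checklist above, one simple fairness check is to compare outcome rates across groups. The sketch below assumes hypothetical approval decisions for two groups labeled "A" and "B"; the helper names are illustrative, and a rate ratio is only one of several fairness metrics a scenario might call for.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> approval rate per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in decisions:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {group: a / t for group, (a, t) in counts.items()}


def rate_ratio(rates):
    """Lowest group approval rate divided by the highest (1.0 means parity).

    Assumes at least one group has a non-zero approval rate.
    """
    return min(rates.values()) / max(rates.values())


# Made-up decisions: group A approved 80 of 100, group B approved 50 of 100.
decisions = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 50 + [("B", False)] * 50
)

rates = approval_rates(decisions)
print(rates, round(rate_ratio(rates), 3))
```

A low ratio is a signal to investigate, not proof of unfair treatment: the next steps are the ones the tables already name, such as proxy review, segment validation, and a policy review of which features are permissible.
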
Final Review Prompts
Use these prompts to test readiness:
- Can you explain why high accuracy may be weak evidence for a fraud model?
- Can you choose among precision, recall, F1, ROC AUC, and calibration for a scenario?
- Can you identify leakage, drift, overfitting, proxy bias, and automation bias from short fact patterns?
- Can you distinguish model validation from monitoring and governance approval?
- Can you select appropriate GenAI controls for hallucination, prompt injection, and data leakage?
- Can you describe why vendor AI still requires internal accountability and fit-for-purpose review?
- Can you connect AI controls to financial risk outcomes, customer impact, and operational resilience?
Practical Next Step
Turn each table into short practice drills: read a scenario, name the AI risk, choose the right metric or control, and explain why two tempting alternatives are weaker. Then complete a fresh set of RAI-style practice questions under timed conditions to build speed and applied judgment.