Quick Orientation for RAI Candidates
This quick review is for candidates preparing for the GARP Risk and AI Certificate (RAI), exam code RAI, offered by GARP. Use it as a final-pass review before moving into independent companion practice, topic drills, mock exams, and detailed explanations.
The exam rewards more than vocabulary. Be ready to apply AI risk concepts to practical scenarios involving governance, data, model development, validation, deployment, monitoring, bias, explainability, operational resilience, and generative AI.
Practical study rule: if you can explain what can go wrong, why it matters, how to detect it, and what control reduces the risk, you are reviewing at the right level.
High-Yield RAI Review Map
| Area | What to Know | Common Exam Angle |
|---|
| AI and ML fundamentals | Supervised, unsupervised, reinforcement, generative AI, model training, inference | Match model type to use case and risk |
| Data risk | Quality, lineage, representativeness, privacy, leakage, bias | Identify flawed data assumptions |
| Model development | Feature engineering, train/test split, overfitting, hyperparameters | Choose better development practice |
| Model validation | Independent review, conceptual soundness, performance testing, limitations | Distinguish validation from monitoring |
| Governance | Accountability, roles, risk appetite, policies, escalation | Identify missing ownership or weak controls |
| Explainability | Global/local explanations, SHAP, LIME, counterfactuals | Select explanation method by audience and use |
| Fairness and bias | Disparate outcomes, proxy variables, label bias, fairness trade-offs | Avoid “remove protected variable” trap |
| Operational risk | Deployment, change management, outages, human override, vendor reliance | Recognize production risk beyond model accuracy |
| Cyber and adversarial risk | Data poisoning, evasion, prompt injection, model extraction | Select control for attack type |
| Generative AI | Hallucination, grounding, RAG, guardrails, red teaming, human review | Apply controls to LLM-specific risks |
| Monitoring | Drift, degradation, threshold breaches, incident response | Know what to monitor and when to retrain |
| Third-party risk | Vendor models, APIs, cloud, documentation, auditability | Identify due diligence gaps |
AI, ML, and Model Types
Core Distinctions
| Concept | Meaning | Watch For |
|---|
| Artificial intelligence | Systems performing tasks associated with human intelligence | Broad umbrella; not all AI is machine learning |
| Machine learning | Models learn patterns from data rather than explicit rules | Data quality and representativeness become central risks |
| Deep learning | Neural network methods with many layers | Often high performance but less transparent |
| Generative AI | Produces new text, images, code, or other content | Hallucination, misuse, IP/privacy, prompt risks |
| Foundation model | Large pretrained model adaptable to many tasks | Broad capability creates broad risk surface |
| Inference | Applying a trained model to new inputs | Production controls matter here |
| Training | Estimating model parameters from data | Leakage, bias, overfitting, and compute risk arise here |
Model Family Quick Review
| Model Type | Typical Use | Strength | Key Risk |
|---|
| Linear/logistic regression | Scoring, classification, interpretable baselines | Transparent, stable | Misses nonlinear relationships |
| Decision tree | Rules-based classification/regression | Easy to explain | Overfits if unconstrained |
| Random forest | Ensemble prediction | Robust, strong performance | Less interpretable than single tree |
| Gradient boosting | Credit, fraud, pricing, risk scoring | High predictive power | Sensitive to tuning; explainability challenge |
| Neural network | Complex patterns, images, language, nonlinear prediction | Flexible | Opaque, data/compute intensive |
| Clustering | Segmentation, anomaly grouping | Finds structure without labels | Clusters may lack business meaning |
| Anomaly detection | Fraud, cyber, operational exceptions | Identifies rare patterns | High false positives if poorly calibrated |
| Reinforcement learning | Sequential decisions, optimization | Learns from reward feedback | Safety and unintended strategy risk |
| Large language model | Text generation, summarization, assistants | Flexible natural language capability | Hallucination, prompt injection, data leakage |
Supervised vs. Unsupervised vs. Generative
| Question | Likely Category |
|---|
| “We have labeled historical outcomes and want to predict a future outcome.” | Supervised learning |
| “We want to group customers or transactions without labels.” | Unsupervised learning |
| “We want the model to create text, code, or synthetic content.” | Generative AI |
| “We want an agent to learn actions over time based on rewards.” | Reinforcement learning |
AI Risk Lifecycle
flowchart LR
A[Use Case Definition] --> B[Data Sourcing and Governance]
B --> C[Model Development]
C --> D[Independent Review and Validation]
D --> E[Approval and Deployment]
E --> F[Production Monitoring]
F --> G[Change Management]
G --> H[Retire, Replace, or Revalidate]
F --> C
Lifecycle Control Points
| Stage | Key Question | Control Focus |
|---|
| Use case definition | Is AI appropriate for the decision? | Materiality, risk appetite, human impact |
| Data sourcing | Is the data fit for purpose? | Lineage, quality, permissions, representativeness |
| Development | Is the model designed soundly? | Method selection, feature review, documentation |
| Validation | Does the model work as intended? | Independent challenge, testing, limitations |
| Approval | Who accepts the residual risk? | Governance, sign-off, accountability |
| Deployment | Is implementation faithful and secure? | Access control, testing, rollback plans |
| Monitoring | Is performance stable over time? | Drift, accuracy, bias, usage, incidents |
| Change management | What changed and who approved it? | Version control, revalidation triggers |
| Retirement | Is the model still needed? | Decommissioning, replacement, record retention |
Governance and Accountability
Governance Building Blocks
| Element | Purpose | Weak Signal |
|---|
| Risk appetite | Defines acceptable risk levels | No escalation when thresholds are breached |
| Policy and standards | Create consistent minimum expectations | Teams invent their own controls |
| Roles and responsibilities | Clarify ownership | “The model owns itself” or no named accountable owner |
| Independent challenge | Tests assumptions and limitations | Development team validates its own work without review |
| Documentation | Supports auditability and repeatability | Key decisions exist only in emails or code comments |
| Inventory | Tracks AI systems and materiality | Shadow AI tools used outside approval |
| Escalation process | Ensures issues reach decision-makers | Monitoring flags ignored |
| Human oversight | Keeps accountable judgment in the loop | Rubber-stamp review or no override path |
Three-Lines-of-Defense Style Thinking
| Function | Typical Responsibility | Exam Trap |
|---|
| First line | Owns and operates the AI use case | Cannot outsource accountability to validation or vendor |
| Second line | Sets risk framework, challenges, oversees | Should not become the model developer |
| Third line | Independent audit/assurance | Reviews framework effectiveness, not daily tuning |
Governance Decision Rules
- High materiality + low explainability requires stronger documentation, validation, monitoring, and human oversight.
- Customer-impacting decisions require special attention to fairness, transparency, complaint handling, and override processes.
- Automated decisions without review raise governance stakes, especially if adverse outcomes are possible.
- Vendor-provided AI still needs internal accountability, due diligence, performance monitoring, and exit planning.
- GenAI used for advice, summaries, or decisions needs controls for hallucination, grounding, prompt injection, and user misuse.
Data Risk Quick Review
Common Data Risks
| Risk | Meaning | Example | Control |
|---|
| Poor quality | Inaccurate, incomplete, inconsistent data | Missing income fields | Data checks, cleansing, reconciliation |
| Bias | Data reflects historical inequities or sampling issues | Underrepresented borrower group | Bias testing, representative sampling |
| Leakage | Training data contains future or target information | Using post-default collection status to predict default | Feature review, temporal validation |
| Drift | Production data changes over time | New customer mix after product launch | Drift monitoring, retraining triggers |
| Lineage gaps | Unknown origin or transformations | Vendor data field cannot be traced | Data lineage documentation |
| Privacy exposure | Sensitive information used improperly | PII in prompts or logs | Minimization, masking, access controls |
| Proxy variables | Non-sensitive variables approximate sensitive traits | ZIP code as socioeconomic proxy | Fairness review, feature testing |
| Label error | Outcome variable is wrong or biased | Fraud labels based only on detected fraud | Label audit, alternative labels |
Data Leakage Traps
Data leakage is one of the most important candidate traps. Look for features that would not be available at the time of decision.
| Suspicious Feature | Why It May Leak |
|---|
| Collection outcome used in credit approval model | Outcome occurs after approval |
| Claim settlement amount used in claim triage at filing | Settlement occurs later |
| Fraud investigation result used at transaction authorization | Investigation occurs after transaction |
| Customer churn reason used to predict churn | Reason is known only after churn |
Train, Validation, and Test Sets
| Dataset | Purpose | Candidate Trap |
|---|
| Training set | Fit model parameters | Do not use it as proof of real-world performance |
| Validation set | Tune hyperparameters and select model | Repeated tuning can overfit validation data |
| Test set | Final unbiased performance estimate | Do not tune after looking at test results |
| Out-of-time sample | Tests temporal stability | Often more realistic for financial data |
| Production data | Live operating environment | Must be monitored; not the same as test data |
Overfitting vs. Underfitting
| Pattern | Meaning | Typical Evidence | Response |
|---|
| Overfitting | Learns noise or idiosyncrasies | High training performance, weak test performance | Simplify model, regularize, more data, cross-validation |
| Underfitting | Too simple to capture signal | Weak training and test performance | Add features, richer model, improve data |
| Data drift | Relationship changes after deployment | Performance degrades over time | Monitor, recalibrate, retrain |
| Concept drift | Target relationship changes | Old predictors no longer work | Reassess model design and assumptions |
Classification Metrics
Use the confusion matrix language carefully:
| Term | Meaning |
|---|
| True positive | Model predicts positive and actual outcome is positive |
| False positive | Model predicts positive but actual outcome is negative |
| True negative | Model predicts negative and actual outcome is negative |
| False negative | Model predicts negative but actual outcome is positive |
Key formulas:
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]\[
\text{Precision} = \frac{TP}{TP + FP}
\]\[
\text{Recall} = \frac{TP}{TP + FN}
\]\[
\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]
Metric Selection Traps
| Situation | Better Focus | Why |
|---|
| Rare fraud events | Precision, recall, PR curve, false positives | Accuracy can look high by predicting “no fraud” |
| Safety-critical missed events | Recall / false negative rate | Missing positives is costly |
| Manual review capacity is limited | Precision and alert volume | Too many false positives overwhelm teams |
| Ranking customers by risk | AUC, lift, calibration, decile analysis | Threshold may be chosen later |
| Probability used in pricing or capital | Calibration | Ranking alone is insufficient |
| Highly imbalanced data | Precision-recall metrics | ROC-AUC may appear optimistic |
Regression Metrics
| Metric | Use | Trap |
|---|
| Mean absolute error | Average absolute prediction error | Easy to interpret but treats all errors linearly |
| Mean squared error | Penalizes large errors | Sensitive to outliers |
| Root mean squared error | Error in original units | Still outlier-sensitive |
| R-squared | Share of variance explained | Can be misleading outside development context |
Validation and Independent Challenge
What Validation Should Cover
| Validation Area | Questions to Ask |
|---|
| Conceptual soundness | Does the model approach make sense for the use case? |
| Data suitability | Is the data accurate, representative, and available at decision time? |
| Implementation | Was the model coded and deployed correctly? |
| Performance | Does it perform well on relevant samples and segments? |
| Stability | Does performance hold over time and across conditions? |
| Explainability | Can users and reviewers understand drivers and limitations? |
| Fairness | Are outcomes unfairly adverse for groups or segments? |
| Limitations | Are weaknesses documented and accepted? |
| Monitoring plan | Are thresholds, owners, and actions defined? |
Validation Is Not the Same as Monitoring
| Activity | Timing | Purpose |
|---|
| Validation | Before approval and after material change | Assess fitness for intended use |
| Monitoring | Ongoing after deployment | Detect degradation, drift, misuse, or control failures |
| Audit | Periodic independent assurance | Assess whether governance and controls work |
| Development testing | During model build | Improve the model, not independently challenge it |
Common Validation Mistakes
- Validating only accuracy while ignoring data quality, implementation, fairness, and explainability.
- Treating vendor documentation as a substitute for internal review.
- Testing on random splits when time-based splits are more appropriate.
- Ignoring subgroup performance because aggregate performance looks strong.
- Approving a model without defined monitoring thresholds.
- Failing to document known limitations and compensating controls.
Explainability and Transparency
Explainability Methods
| Method | Best For | Limitation |
|---|
| Feature importance | Global driver overview | May not explain individual decisions |
| Partial dependence plot | Average effect of a feature | Can mislead when features are correlated |
| SHAP-style explanation | Local and global contribution analysis | Can be complex and computationally expensive |
| LIME-style explanation | Local approximation around one prediction | Approximation may be unstable |
| Counterfactual explanation | “What would need to change?” | Must be realistic and actionable |
| Surrogate model | Simple model approximates complex model | Approximation is not the true model |
| Model cards / documentation | Communicate intended use, data, limits | Only useful if accurate and maintained |
Explanation Audience Matters
| Audience | Needs |
|---|
| Model developer | Technical diagnostics and feature behavior |
| Validator | Assumptions, limitations, robustness, implementation evidence |
| Business owner | Decision drivers, risk trade-offs, control implications |
| Customer or affected party | Clear, actionable reason for outcome where applicable |
| Senior management | Material risk, residual risk, accountability, escalation |
Explainability Traps
- A model can be explainable but still biased.
- A model can be accurate but not appropriate for a high-impact decision.
- Global explanations do not necessarily explain a specific individual outcome.
- Removing complex algorithms does not automatically remove risk.
- More explanation is not always better; the explanation must be truthful, relevant, and usable.
Fairness, Bias, and Responsible AI
Sources of Bias
| Source | Description | Example |
|---|
| Historical bias | Past decisions reflect unfair patterns | Historical lending approvals encode discrimination |
| Sampling bias | Training data underrepresents a group | Few observations for a region or customer type |
| Measurement bias | Variables measure groups differently | Inconsistent income verification methods |
| Label bias | Target variable reflects imperfect process | Fraud labels only for investigated cases |
| Proxy bias | Neutral-looking variable captures sensitive trait | Location or education as proxy |
| Deployment bias | Model used differently than intended | Advisory score becomes automatic rejection |
Fairness Metrics: Conceptual Differences
| Concept | Basic Idea | Trap |
|---|
| Demographic parity | Similar positive prediction rates across groups | May ignore true risk differences |
| Equal opportunity | Similar true positive rates across groups | Focuses on access to favorable correct outcomes |
| Equalized odds | Similar true positive and false positive rates | Often hard to satisfy with other goals |
| Calibration by group | Predicted probabilities mean the same across groups | May conflict with parity metrics |
| Individual fairness | Similar individuals treated similarly | Requires defining “similar” appropriately |
High-Yield Fairness Decision Rules
- Do not assume fairness because protected attributes are excluded. Proxies may remain.
- Do not assume equal accuracy means equal impact. Error types may differ by group.
- Do not assume one fairness metric solves all fairness concerns. Metrics can conflict.
- Do test subgroup performance. Aggregate metrics can hide harm.
- Do connect fairness findings to governance. Someone must decide, document, and monitor residual risk.
Generative AI and LLM Risk
GenAI Risk Quick Review
| Risk | Meaning | Control |
|---|
| Hallucination | Plausible but false output | Grounding, retrieval, citations, human review |
| Prompt injection | User manipulates model instructions | Input filtering, instruction hierarchy, sandboxing |
| Data leakage | Sensitive data exposed in prompts, outputs, or logs | Data minimization, masking, access controls |
| Toxic or harmful output | Unsafe, biased, or inappropriate generation | Guardrails, moderation, red teaming |
| Model misuse | Users rely on outputs beyond intended use | Usage policy, training, disclaimers, monitoring |
| Overreliance | Human accepts output without review | Human-in-the-loop, confidence indicators |
| Model drift/version change | Provider updates affect behavior | Version tracking, regression testing |
| Retrieval error | RAG system retrieves wrong or stale context | Curated knowledge base, freshness controls |
| Automation bias | Users defer to AI recommendation | Review requirements, challenge prompts |
RAG and Grounding
Retrieval-augmented generation, or RAG, connects a generative model to external documents or databases. It can reduce hallucination, but it does not eliminate risk.
| RAG Component | Risk |
|---|
| Source documents | May be stale, wrong, or unauthorized |
| Retrieval ranking | May retrieve irrelevant context |
| Prompt assembly | May expose sensitive data |
| Generation | May distort retrieved content |
| Citation output | May cite sources incorrectly |
| User interface | May encourage overtrust |
GenAI Control Stack
| Layer | Examples |
|---|
| Use-case control | Approved use cases, prohibited uses, risk tiering |
| Data control | No sensitive data in prompts unless authorized and protected |
| Prompt control | Templates, system instructions, prompt injection defenses |
| Model control | Approved models, versioning, performance testing |
| Output control | Human review, moderation, citations, confidence warnings |
| Access control | Role-based access, logging, authentication |
| Monitoring | Usage analytics, incidents, quality samples, abuse detection |
| Incident response | Escalation, containment, user notification process where applicable |
Cyber, Operational, and Third-Party AI Risk
Adversarial and Cyber Risks
| Attack / Risk | Description | Likely Control |
|---|
| Data poisoning | Training data manipulated | Data provenance, anomaly checks, trusted sources |
| Evasion attack | Inputs crafted to avoid detection | Robust testing, adversarial testing |
| Model extraction | Attacker replicates model through queries | Rate limits, monitoring, API controls |
| Membership inference | Attacker infers whether data was in training set | Privacy controls, differential privacy where appropriate |
| Prompt injection | Malicious instructions override intended behavior | Prompt defenses, tool-use restrictions |
| Jailbreak | User bypasses safety constraints | Red teaming, guardrails, monitoring |
| Supply chain risk | Dependency, model, or library compromised | Vendor review, dependency management |
Operational Risk Questions
Ask these for any AI deployment:
- Who can access the system?
- What happens if the model fails or becomes unavailable?
- Can humans override the model?
- Are overrides tracked and reviewed?
- Is there a rollback plan?
- Are model versions controlled?
- Are inputs and outputs logged appropriately?
- Are incidents escalated?
- Are users trained on limitations?
- Is the model being used only for its approved purpose?
Third-Party and Vendor AI
| Due Diligence Area | What to Review |
|---|
| Intended use | Is the vendor solution appropriate for the business decision? |
| Data usage | What data is sent, stored, retained, or used for training? |
| Model transparency | What documentation, limitations, and testing evidence are available? |
| Security | Access controls, encryption, incident procedures |
| Resilience | Availability, service continuity, fallback options |
| Change management | How are updates communicated and tested? |
| Audit rights | Can the organization obtain needed assurance? |
| Exit strategy | Can the organization replace the service if needed? |
Monitoring, Drift, and Ongoing Control
What to Monitor
| Monitoring Area | Examples |
|---|
| Input data | Missing values, distributions, outliers, population shifts |
| Output data | Score distributions, approval rates, alert volume |
| Performance | Accuracy, recall, precision, error rates, calibration |
| Fairness | Group outcomes, error rates, adverse impact indicators |
| Stability | Drift, volatility, threshold breaches |
| Usage | Approved vs. actual use, user behavior, overrides |
| Operations | Latency, availability, failures, incidents |
| GenAI quality | Hallucination samples, unsafe outputs, user feedback |
| Security | Suspicious queries, abuse, unauthorized access |
Drift Types
| Drift Type | Meaning | Example |
|---|
| Data drift | Input distribution changes | New customer segment uses product |
| Concept drift | Relationship between inputs and target changes | Fraud patterns evolve |
| Prediction drift | Output distribution changes | Sudden spike in high-risk scores |
| Performance drift | Actual model quality declines | Recall falls after market change |
Revalidation Triggers
- Material model change.
- New data source or feature set.
- New use case or user population.
- Significant performance degradation.
- Significant drift or threshold breach.
- Vendor model update.
- Change in operating environment.
- Incident, complaint trend, or unexpected harm.
- Regulatory, policy, or governance framework change where relevant.
Risk Assessment and Control Thinking
Inherent vs. Residual Risk
| Term | Meaning |
|---|
| Inherent risk | Risk before controls |
| Control | Measure designed to prevent, detect, or correct risk |
| Residual risk | Risk remaining after controls |
| Risk appetite | Level of risk the organization is willing to accept |
| Risk tolerance | Specific thresholds or limits supporting appetite |
Control Types
| Control Type | Purpose | Example |
|---|
| Preventive | Stop issue before it occurs | Access restriction, approved feature list |
| Detective | Identify issue after or during occurrence | Drift monitoring, exception reports |
| Corrective | Fix or mitigate issue | Rollback, retraining, incident remediation |
| Compensating | Reduce risk when primary control is imperfect | Human review for low-explainability model |
Control Matching Drill
| If the Problem Is… | A Stronger Control Is Usually… |
|---|
| Unclear accountability | Named model owner and governance approval |
| Poor data lineage | Data documentation and lineage controls |
| Overfitting | Out-of-sample testing, regularization, simpler model |
| Biased outcomes | Fairness testing, feature review, governance decision |
| Hallucination | RAG, human review, output verification |
| Prompt injection | Input filtering, tool restrictions, red teaming |
| Vendor opacity | Due diligence, monitoring, contractual assurance |
| Drift | Monitoring thresholds and retraining process |
| Overreliance | Human-in-the-loop and user training |
| Unapproved usage | Access control and usage monitoring |
Scenario Decision Rules
Choose the Best Answer by Asking
- What is the primary risk? Data, model, governance, fairness, cyber, operational, or third-party?
- Where in the lifecycle is the issue? Development, validation, deployment, monitoring, or change?
- Is the control preventive, detective, or corrective?
- Who should own the action? Developer, business owner, risk function, validator, audit, vendor manager?
- Is the proposed action sufficient for materiality?
- Does the answer confuse performance with governance?
- Does the answer rely on a simplistic fix, such as “remove the variable” or “use a more accurate model”?
Common “Best Answer” Patterns
| Scenario | Strong Answer |
|---|
| Model performs well overall but poorly for one group | Investigate subgroup performance, fairness, data quality, and mitigation |
| New vendor AI tool is proposed | Conduct due diligence, assess data/security/model risk, define monitoring |
| LLM gives confident false answers | Add grounding, verification, human review, and monitoring |
| Model accuracy declines after launch | Investigate drift, data changes, implementation, and retraining triggers |
| Business wants to bypass validation to meet deadline | Escalate governance issue; do not skip independent review for material models |
| Feature is highly predictive but may be a proxy | Test for proxy effects and fairness implications |
| Model is used for a new decision | Reassess intended use, materiality, validation, and approval |
| Users ignore model limitations | Improve training, interface controls, oversight, and documentation |
Candidate Mistakes to Avoid
- Equating AI risk management with model accuracy only.
- Forgetting that data problems can dominate model problems.
- Treating explainability as the same thing as fairness.
- Assuming a black-box model is unacceptable in all cases.
- Assuming a simple model is automatically low risk.
- Ignoring human process risk around the model.
- Overlooking production implementation and monitoring.
- Thinking vendor AI removes internal responsibility.
- Choosing a technical fix when the scenario is actually a governance failure.
- Choosing a governance policy when the scenario needs a specific operational control.
- Ignoring materiality: higher-impact use cases require stronger control.
- Relying on aggregate metrics without segment analysis.
- Treating GenAI outputs as reliable because they sound confident.
- Forgetting that monitoring must have thresholds, owners, and escalation.
Fast Final Review Checklist
Before you move into topic drills or a mock exam, make sure you can answer these quickly:
- Can you distinguish supervised, unsupervised, reinforcement, and generative AI?
- Can you identify data leakage in a scenario?
- Can you choose the right metric for imbalanced classification?
- Can you explain overfitting and how to reduce it?
- Can you separate development testing, validation, monitoring, and audit?
- Can you identify fairness risks even when protected variables are removed?
- Can you match explainability tools to the audience and decision type?
- Can you name practical controls for hallucination and prompt injection?
- Can you explain why vendor AI still requires internal oversight?
- Can you identify when revalidation is needed?
- Can you connect materiality to stronger governance?
- Can you distinguish inherent risk, controls, and residual risk?
How to Use Practice Questions After This Review
Use this Quick Review as a bridge into original practice questions. For the GARP Risk and AI Certificate (RAI), efficient practice should include:
- Topic drills for data risk, validation, fairness, explainability, GenAI, and governance.
- Scenario questions that require selecting the best control, not just defining terms.
- Mock exams to practice pacing and mixed-topic recognition.
- Detailed explanations to understand why tempting answers are incomplete or misaligned.
Next step: work through independent companion practice questions by topic, review every explanation, and keep a short error log of missed concepts, especially around data leakage, model monitoring, fairness trade-offs, vendor risk, and generative AI controls.