Try 10 focused GARP RAI questions on AI Tools and Techniques, with answers and explanations, then continue with Finance Prep.
Use this page to isolate AI Tools and Techniques before returning to mixed GARP RAI practice.
| Field | Detail |
|---|---|
| Exam route | GARP RAI |
| Issuer | GARP |
| Topic area | AI Tools and Techniques |
| Blueprint weight | 20% |
| Page purpose | Focused sample questions before returning to mixed practice |
Work through the 10 questions first, then review the explanations and return to mixed practice in Finance Prep.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 20% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These questions are original Finance Prep practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.
Topic: AI Tools and Techniques
A bank can deploy a complex ensemble model that improves default prediction by a small amount, or a logistic regression model that is slightly less accurate but easier to explain, validate, monitor, and govern for credit decision reviews. The team chooses the logistic regression. Which concept best matches this decision?
Best answer: B
What this tests: AI Tools and Techniques
Explanation: In model development, the most accurate model is not always the best model for a regulated or high-impact use case. A simpler model may be preferred when stakeholders need to understand key drivers, validate assumptions, explain decisions, monitor behavior, and assign control ownership. In this scenario, the ensemble model offers only a small performance gain, while the logistic regression provides stronger interpretability and easier governance. That is a classic interpretability-performance trade-off: the selected model better fits the business, risk, and oversight requirements even if its predictive metric is slightly lower.
The team is accepting a marginal reduction in predictive performance to gain explainability, governance, and control benefits.
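One way to make this trade-off explicit is a selection rule that prefers an interpretable model whenever its score is within an agreed tolerance of the best candidate. The sketch below is illustrative only: the `choose_model` function, the `tolerance` value, and the AUC figures are hypothetical and do not come from the question.

```python
def choose_model(candidates, tolerance):
    """Prefer an interpretable model if it is within `tolerance` of the best score.

    candidates: list of dicts with keys 'name', 'auc', and 'interpretable'.
    tolerance: largest performance gap the team accepts (hypothetical policy value).
    """
    best = max(candidates, key=lambda c: c["auc"])
    # Interpretable models whose score is close enough to the best candidate.
    eligible = [
        c for c in candidates
        if c["interpretable"] and best["auc"] - c["auc"] <= tolerance
    ]
    if eligible:
        return max(eligible, key=lambda c: c["auc"])["name"]
    # No interpretable model is close enough, so fall back to the top scorer.
    return best["name"]


# Made-up scores for illustration only.
candidates = [
    {"name": "ensemble", "auc": 0.81, "interpretable": False},
    {"name": "logistic_regression", "auc": 0.80, "interpretable": True},
]
selected = choose_model(candidates, tolerance=0.02)
print(selected)
```

With a much tighter tolerance the same rule would fall back to the ensemble, which is the point at which a team would need to justify the loss of explainability rather than accept it by default.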
Topic: AI Tools and Techniques
A bank is developing a machine learning model to flag small-business loan applications for analyst review. The raw data include daily account balances, transaction timestamps, and merchant category codes. A pilot model using the raw records is difficult to interpret and shows weak validation performance. Which action is the best example of feature engineering to improve the model’s usefulness?
Best answer: B
What this tests: AI Tools and Techniques
Explanation: Feature engineering is the process of transforming raw data into structured inputs that a model can use more effectively. In this case, raw transaction-level data may be too granular or noisy for the model to learn useful patterns. Aggregating or deriving variables such as volatility, ratios, counts, or concentration measures can make relevant borrower behavior easier for the model to use and easier for reviewers to understand. Feature engineering does not guarantee better performance, but it is a targeted data-preparation step intended to improve model usefulness before or alongside model selection.
Feature engineering transforms raw data into model-ready inputs that may capture more useful predictive signals.
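As a rough illustration of the kind of derived variables the explanation mentions, the sketch below aggregates raw balances and transactions into a few model-ready features. The function name, the record layout, and the sample values are assumptions made for this example, not part of the question.

```python
from collections import Counter
from statistics import mean, pstdev


def engineer_features(daily_balances, transactions):
    """Aggregate raw account records into a small set of model inputs.

    daily_balances: list of floats, one per day.
    transactions: list of (hour_of_day, merchant_category_code, amount) tuples.
    The layout of both inputs is hypothetical.
    """
    avg_balance = mean(daily_balances)
    # Volatility: standard deviation of balances relative to the average level.
    volatility = pstdev(daily_balances) / avg_balance if avg_balance > 0 else 0.0
    # Ratio: share of days with a negative balance.
    overdraft_share = sum(b < 0 for b in daily_balances) / len(daily_balances)
    # Concentration: share of spend going to the largest merchant category.
    spend_by_mcc = Counter()
    for _, mcc, amount in transactions:
        spend_by_mcc[mcc] += amount
    total_spend = sum(spend_by_mcc.values())
    top_mcc_share = max(spend_by_mcc.values()) / total_spend if total_spend else 0.0
    return {
        "avg_balance": avg_balance,
        "balance_volatility": volatility,
        "overdraft_share": overdraft_share,
        "txn_count": len(transactions),
        "top_mcc_share": top_mcc_share,
    }


# Made-up records for illustration only.
features = engineer_features(
    daily_balances=[100.0, -50.0, 150.0, 200.0],
    transactions=[(9, "5411", 30.0), (14, "5812", 10.0)],
)
print(features)
```

Each output is a single number per applicant, which is easier for both a model and a reviewer to reason about than the underlying daily records.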
Topic: AI Tools and Techniques
A bank’s operations team uses an approved large language model to draft complaint summaries. The model often omits urgency, so an analyst changes only the user instruction from “summarize this complaint” to “summarize using four fields: product, customer issue, potential harm, and urgency; use only the call text.” A small pilot shows more consistent summaries, and no model weights, training data, or retrieval sources changed. What is the best action before adopting the change?
Best answer: C
What this tests: AI Tools and Techniques
Explanation: Changing the wording, structure, or constraints in a prompt affects how an existing model responds at inference time; it does not update the model’s learned parameters. In this scenario, the model, training data, and retrieval sources are unchanged, so the observed improvement is evidence of better prompting and context specification, not retraining. The best risk-managed action is to version and document the prompt, test it on representative complaints, confirm that it does not introduce new issues such as unsupported urgency labels, and implement it through the relevant prompt or application change-control process.
Only the instruction changed, so the appropriate action is governed prompt improvement and evaluation rather than model retraining.
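The versioning and testing steps described above could look something like the following sketch. It assumes a hypothetical `PromptVersion` record and a simple field check; neither reflects any particular bank's change-control tooling, and no model is called.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    """A documented prompt template (hypothetical structure for illustration)."""
    name: str
    template: str
    note: str

    @property
    def fingerprint(self) -> str:
        # Short hash of the template text so any wording change is traceable.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]

    def render(self, call_text: str) -> str:
        return self.template.format(call_text=call_text)


V1 = PromptVersion(
    name="complaint_summary",
    template="Summarize this complaint.\n\n{call_text}",
    note="Original instruction.",
)
V2 = PromptVersion(
    name="complaint_summary",
    template=(
        "Summarize using four fields: product, customer issue, potential harm, "
        "and urgency; use only the call text.\n\n{call_text}"
    ),
    note="Adds required fields and restricts the summary to the call text.",
)

REQUIRED_FIELDS = ("product", "customer issue", "potential harm", "urgency")


def missing_fields(summary: str) -> list:
    """Return the required field labels that do not appear in a drafted summary."""
    lowered = summary.lower()
    return [field for field in REQUIRED_FIELDS if field not in lowered]


print(V1.fingerprint, V2.fingerprint)
print(missing_fields("Product: card. Customer issue: fee. Potential harm: low."))
```

The fingerprint changes whenever the template text changes, while the model weights stay the same, which mirrors the distinction the question draws between prompt changes and retraining.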
Topic: AI Tools and Techniques
A bank is testing a retrieval-augmented LLM assistant for complaint handling. The retrieved context says: “Customer reported a duplicate debit-card charge; one charge was reversed; merchant inquiry is still open.” The test prompt asks: “Confirm that the merchant committed fraud and recommend whether the customer’s account should be closed.” Which action is BEST?
Best answer: C
What this tests: AI Tools and Techniques
Explanation: A prompt asks for unsupported inference when it requires the model to state or decide facts that are not present in the provided context. Here, the retrieved context establishes only that a duplicate charge was reported, one charge was reversed, and the merchant inquiry remains open. It does not establish merchant fraud, customer fault, or whether account closure is appropriate. The best action is to flag and revise the prompt so the model either limits its response to known facts or asks for additional evidence. This reduces hallucination risk and keeps the assistant grounded in the supplied context.
The context contains a duplicate-charge complaint and open inquiry, but no evidence of fraud or basis for account-closure advice.
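A very naive pre-check for this kind of prompt is sketched below. It flags watch-list terms that the prompt asks about but the retrieved context never mentions. The `unsupported_terms` helper and the watch list are hypothetical, and simple keyword matching like this would miss paraphrases, so it is a starting point rather than a control.

```python
import re


def unsupported_terms(prompt, context, watch_terms):
    """Return watch-list terms that appear in the prompt but not in the context."""
    def tokens(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    prompt_tokens = tokens(prompt)
    context_tokens = tokens(context)
    return sorted(
        term for term in watch_terms
        if term in prompt_tokens and term not in context_tokens
    )


context = (
    "Customer reported a duplicate debit-card charge; one charge was reversed; "
    "merchant inquiry is still open."
)
prompt = (
    "Confirm that the merchant committed fraud and recommend whether the "
    "customer's account should be closed."
)
# Hypothetical watch list of high-impact conclusions.
flagged = unsupported_terms(prompt, context, watch_terms=("fraud", "closed", "reversed"))
print(flagged)
```

Here "reversed" is not flagged because the context supports it, while "fraud" and "closed" are flagged because the prompt presumes conclusions the context does not contain.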
Topic: AI Tools and Techniques
A bank is benchmarking a generative AI assistant that drafts responses for customer service agents. The evaluation rubric asks reviewers to check whether each answer is factually correct, remains consistent across equivalent prompts, avoids harmful or noncompliant advice, and cites or aligns with approved source documents. Which evaluation concept does this description best match?
Best answer: C
What this tests: AI Tools and Techniques
Explanation: Generative AI evaluation often differs from traditional model evaluation because the output is open-ended text rather than a fixed class label or numeric prediction. A generated answer may sound fluent while being factually wrong, inconsistent across similar prompts, unsafe, or unsupported by the underlying sources. For a financial-services assistant, these dimensions matter because users may rely on the text in customer communications or decisions. Therefore, evaluation should test factuality, consistency, harmful output, and source support or grounding, not only general language quality or a single accuracy metric.
Open-ended generative outputs require assessment of truthfulness, stability, harmfulness, and support from trusted sources.
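To show how reviewer judgments on these four dimensions might be summarized, the sketch below computes a pass rate per dimension instead of a single accuracy number. The dimension keys, the review records, and the threshold are all hypothetical.

```python
# Hypothetical keys for the four rubric dimensions named in the question.
DIMENSIONS = ("factual", "consistent", "safe", "grounded")


def pass_rates(reviews):
    """Share of reviewed answers that passed each rubric dimension."""
    return {
        dim: sum(review[dim] for review in reviews) / len(reviews)
        for dim in DIMENSIONS
    }


def failing_dimensions(reviews, threshold):
    """Dimensions whose pass rate falls below the chosen threshold."""
    return [dim for dim, rate in pass_rates(reviews).items() if rate < threshold]


# Made-up reviewer results for four answers, for illustration only.
reviews = [
    {"factual": True, "consistent": True, "safe": True, "grounded": False},
    {"factual": True, "consistent": False, "safe": True, "grounded": False},
    {"factual": True, "consistent": True, "safe": True, "grounded": True},
    {"factual": False, "consistent": True, "safe": True, "grounded": True},
]
print(pass_rates(reviews))
print(failing_dimensions(reviews, threshold=0.7))
```

Reporting each dimension separately keeps a weak grounding score from being hidden behind strong results on the other three.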
Topic: AI Tools and Techniques
A bank deploys an internal generative AI assistant for operations staff. For each question, the assistant retrieves approved policy excerpts, uses them as context for the response, and cites the excerpts, while the risk team still requires answer testing and human review for high-impact outputs. Which concept does this description best illustrate?
Best answer: A
What this tests: AI Tools and Techniques
Explanation: Grounding connects a generative AI response to specific, trusted information such as approved documents, databases, or retrieved excerpts. In this scenario, the assistant uses policy excerpts and citations to make responses less likely to be unsupported or fabricated. However, grounding is not a guarantee of correctness: retrieved sources may be incomplete, stale, misread by the model, or applied incorrectly. Therefore, validation, answer testing, source-quality checks, and human review remain necessary, especially for high-impact decisions or regulated processes.
Grounding uses trusted retrieved context to reduce unsupported outputs, but it does not replace validation, testing, or human review.
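A minimal sketch of the grounding pattern follows, assuming retrieval has already returned approved excerpts as `(source_id, text)` pairs. The function names, the bracketed citation format, and the sample excerpt are hypothetical, and the citation check confirms only that cited identifiers exist, not that the answer is correct.

```python
import re


def build_grounded_prompt(question, excerpts):
    """Assemble a prompt that restricts the model to approved excerpts."""
    if not excerpts:
        raise ValueError("No approved excerpts retrieved; do not answer ungrounded.")
    context = "\n".join(f"[{source_id}] {text}" for source_id, text in excerpts)
    return (
        "Answer using only the excerpts below and cite the excerpt ID in brackets.\n"
        "If the excerpts do not answer the question, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


def cited_sources_are_valid(answer, excerpts):
    """Check that the answer cites at least one ID and only retrieved IDs."""
    allowed = {source_id for source_id, _ in excerpts}
    cited = set(re.findall(r"\[([^\]]+)\]", answer))
    return bool(cited) and cited <= allowed


# Made-up excerpt for illustration only.
excerpts = [("POL-12", "Wire transfers above the limit require dual approval.")]
prompt = build_grounded_prompt("Who approves large wire transfers?", excerpts)
print(prompt)
```

A passing citation check is a necessary signal rather than a sufficient one, which is why the explanation still calls for answer testing and human review.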
Topic: AI Tools and Techniques
A bank’s credit-risk team finds that a gradient-boosted model materially outperforms a simple scorecard, but business owners and compliance reviewers struggle to understand and explain individual decisions. Which model-development trade-off is most directly illustrated?
Best answer: D
What this tests: AI Tools and Techniques
Explanation: Model development often involves a trade-off between predictive performance and interpretability. More complex methods, such as ensembles or deep learning models, may capture nonlinear relationships and improve accuracy, but their decision logic can be harder for stakeholders to understand, validate, challenge, or explain. In regulated financial services, this matters because business users, compliance teams, model validators, and customers may need clear reasons for decisions. The issue in the stem is not simply an error-rate balance or a data-use constraint; it is that higher performance comes with reduced explainability.
The scenario describes a more accurate but less explainable model, which is the performance-interpretability trade-off.
Topic: AI Tools and Techniques
A bank uses a machine-learning model to prioritize small-business loan reviews. The holdout test set shows stable aggregate precision, a challenge set of newly incorporated firms shows many false negatives, loan officers report misleading denial explanations, and production monitoring shows declining precision after a new marketing campaign. What is the best interpretation for the model risk manager?
Best answer: C
What this tests: AI Tools and Techniques
Explanation: Different evaluation sources are designed to reveal different weaknesses. A holdout test set estimates performance on data intended to resemble the historical target population, so it may miss rare or emerging cases. A challenge set deliberately stresses known edge cases or high-risk segments, such as newly incorporated firms. User feedback can reveal problems with explanations, workflow fit, or unintended impacts that may not appear in numerical benchmarks. Production monitoring detects changes in live data, behavior, or performance after deployment, such as drift following a new marketing campaign. The best interpretation is not that one source overrides the others, but that they provide complementary evidence for model evaluation and remediation.
This correctly recognizes that each evaluation source samples a different condition and therefore can expose different model limitations.
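The sketch below shows how the same metrics can tell different stories on different evaluation sets. The counts are made up to mirror the scenario, with acceptable holdout results alongside many false negatives on the challenge set; the function names are hypothetical.

```python
def precision_recall(records):
    """Compute (precision, recall) from a list of (predicted, actual) booleans."""
    tp = sum(1 for predicted, actual in records if predicted and actual)
    fp = sum(1 for predicted, actual in records if predicted and not actual)
    fn = sum(1 for predicted, actual in records if not predicted and actual)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


def evaluate_by_source(datasets):
    """Report metrics separately for each evaluation source."""
    return {name: precision_recall(rows) for name, rows in datasets.items()}


# Made-up outcomes for illustration only.
datasets = {
    "holdout": (
        [(True, True)] * 8 + [(True, False)] * 2
        + [(False, True)] * 2 + [(False, False)] * 8
    ),
    # Newly incorporated firms: few flags raised, many true cases missed.
    "challenge": [(True, True)] * 2 + [(False, True)] * 6,
}
results = evaluate_by_source(datasets)
print(results)
```

On these numbers the challenge set has perfect precision but low recall, so a report that tracked only aggregate precision would not reveal the missed cases.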
Topic: AI Tools and Techniques
A bank is piloting an internal LLM assistant for operational-policy questions. The model was fine-tuned on historical policy documents, and during testing it confidently states a retention rule that is not in the current policy. The product owner argues that the answer must be a stored fact because the documents were included in training. What is the best risk-management response?
Best answer: D
What this tests: AI Tools and Techniques
Explanation: Large language models do not normally retrieve and return stored facts in a deterministic way simply because facts appeared in training or fine-tuning data. They generate outputs by predicting likely token sequences based on learned patterns, which can produce fluent but unsupported or outdated statements. In this scenario, the assistant’s confident but incorrect policy statement is evidence that it should not be treated as an authoritative lookup tool. For current policy questions, a stronger control is to ground responses in an approved, current source such as a retrieval system or policy database, with references that users or reviewers can verify.
An LLM generates likely text from learned patterns, so factual policy answers should be grounded by deterministic retrieval or verified sources.
Topic: AI Tools and Techniques
A bank’s analytics team selects a third-party LLM because it scored highest on a public general reasoning benchmark. The planned production use is to draft summaries of internal credit memos for relationship managers; the memos contain institution-specific abbreviations and confidential client details. The benchmark used no internal documents and measured multiple-choice accuracy, not factual summarization or data-handling errors. What is the best action before approving the model for production?
Best answer: B
What this tests: AI Tools and Techniques
Explanation: A strong benchmark result is useful comparative evidence, but it is not the same as production suitability. Here, the benchmark task, data, and metric do not match the bank’s intended use: summarizing internal credit memos with specialized language and confidentiality concerns. Before approval, the bank should test the model on representative examples under expected workflow conditions and measure outcomes that matter in production, such as factual accuracy, omitted material, hallucinations, handling of confidential information, and human-review effectiveness. Public benchmark performance may inform model selection, but it cannot replace use-case-specific validation when the benchmark does not reflect the target environment.
Benchmark strength must be supplemented with evidence that the model performs safely and accurately on the organization’s actual task, data, and controls.
Use the GARP RAI Practice Test page for the full Finance Prep practice bank, mixed-topic practice, timed mock exams, and explanations.
Use the full Finance Prep practice page above for the latest review links.