Free AWS AIF-C01 Full-Length Practice Exam: 65 Questions

Try 65 free AWS AIF-C01 questions across the exam domains, with explanations, then continue with full IT Mastery practice.

This free full-length AWS AIF-C01 practice exam includes 65 original IT Mastery questions across the exam domains.

These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.

Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.

Need concept review first? Read the AWS AIF-C01 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks and full IT Mastery practice.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Try AWS AIF-C01 on Web View full AWS AIF-C01 practice page

Exam snapshot

  • Exam route: AWS AIF-C01
  • Practice-set question count: 65
  • Time limit: 90 minutes
  • Practice style: mixed-domain diagnostic run with answer explanations

Full-length exam mix

DomainWeight
Fundamentals of AI and ML20%
Fundamentals of Genai24%
Applications of Foundation Models28%
Guidelines for Responsible AI14%
Security, Compliance, and Governance for AI Solutions14%

Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.

Practice questions

Questions 1-25

Question 1

Topic: Fundamentals of Genai

An insurance company wants to use generative AI to reduce the time it takes to process customer claims. Claim decisions can have legal and financial impact, and the company must be able to explain how each decision was made.

Which proposed use of GenAI is INCORRECT and should be avoided?

Options:

  • A. Use a Bedrock Knowledge Base (RAG) to answer adjuster questions with citations to policy documents

  • B. Automatically approve or deny claims end-to-end based only on the model’s output, without human review

  • C. Draft customer response emails with Bedrock Guardrails and require an agent to approve before sending

  • D. Use Amazon Bedrock to summarize claim documents and highlight missing information for an adjuster to review

Best answer: B

Explanation: For high-impact decisions like approving or denying insurance claims, GenAI is best used to augment trained staff rather than replace them end-to-end. Models can hallucinate or produce inconsistent rationales, so automated decisions without human oversight create unacceptable risk and weaken explainability and auditability requirements.

The core limitation of GenAI for business problems is that it can generate plausible-sounding content that is not reliably correct or explainable. When outcomes have legal/financial impact, GenAI should generally augment humans (summaries, drafting, guided Q&A, recommendations with sources) while keeping a qualified human as the decision maker.

A safer pattern is:

  • Ground outputs in enterprise knowledge (for example, RAG with citations).
  • Apply safety controls (for example, Bedrock Guardrails) and logging/auditing.
  • Keep human review for final decisions, especially when errors are costly.

Fully automating claim approvals/denials based solely on model output is an anti-pattern because it removes the human-in-the-loop control needed to manage hallucinations, bias, and accountability.

  • Summarization for review is an augmentation task and keeps accountability with the adjuster.
  • RAG with citations improves reliability and supports explainability for the human decision maker.
  • Guardrails + human approval helps reduce unsafe content while maintaining human control over external communications.

Question 2

Topic: Applications of Foundation Models

A company uses a large foundation model to draft short customer-support replies. The app must respond in under 150 ms and stay within a fixed monthly inference budget, but the current model is too slow and expensive. The company wants to keep the same response quality while using a smaller, faster model.

Which approach is the BEST fit?

Options:

  • A. Use Bedrock Knowledge Bases for RAG with the same large model

  • B. Add Bedrock Guardrails to reduce tokens and improve latency

  • C. Use Bedrock Provisioned Throughput on the large model

  • D. Use SageMaker AI model distillation to create a smaller model

Best answer: D

Explanation: Model distillation is used to transfer behavior from a larger, higher-quality model to a smaller model. This is a good fit when the goal is lower latency and lower inference cost while maintaining similar output quality. Using SageMaker AI for distillation aligns with creating and deploying an optimized smaller model for production inference.

Distillation is a model compression technique where a large “teacher” model generates guidance (often target outputs or probability distributions) that is used to train a smaller “student” model to behave similarly. In this scenario, the constraints are strict latency and a fixed inference budget, so the best solution is to reduce inference compute by moving to a smaller model while preserving quality as much as possible. Distillation directly targets that tradeoff because it aims to keep the teacher model’s capabilities while producing a smaller, faster model that is cheaper to run.

Key takeaway: distillation changes the model size/performance profile, whereas retrieval, safety guardrails, or capacity reservations do not inherently make the underlying model smaller.

  • RAG mismatch adds retrieval for better grounding, not a smaller/faster core model.
  • Safety controls guardrails help manage undesirable outputs, not reduce model compute meaningfully.
  • Capacity reservation provisioned throughput can help with consistent performance but doesn’t reduce per-request cost or model size to meet the budget goal.

Question 3

Topic: Applications of Foundation Models

A team is iterating on a prompt for an Amazon Bedrock chat assistant that answers questions about a company’s return/refund policy. They ran a quick evaluation and collected the following summary.

Exhibit: Prompt evaluation summary

Test prompts: 20
Prompt coverage: standard return requests only (no edge cases)
Scoring used: ROUGE-L vs. reference answers
Results: ROUGE-L improved from 0.41 to 0.55
Known issue (from support tickets): incorrect refund eligibility answers

Based only on the exhibit, which is the best next step to evaluate response quality more effectively?

Options:

  • A. Create a broader test set (including edge cases) and add a factuality/groundedness quality check in addition to ROUGE-L

  • B. Increase the model temperature to improve answer variety and re-run ROUGE-L

  • C. Fine-tune the foundation model on the 20 reference answers to reduce ticket volume

  • D. Deploy the new prompt because ROUGE-L increased from 0.41 to 0.55

Best answer: A

Explanation: The exhibit indicates the current evaluation is not aligned to the real failure mode: support tickets report incorrect eligibility decisions. Because the test prompts cover only standard cases and the only metric is ROUGE-L, the team should expand test prompts to include edge cases and measure factual correctness/grounding, not just text overlap.

High-level prompt evaluation works best when your test prompts represent real usage (including difficult edge cases) and your measurements match the quality you care about. Here, the exhibit shows the test set is small and only covers “standard return requests,” while production feedback reports “incorrect refund eligibility answers.” ROUGE-L is a text-overlap metric and can improve even when answers are factually wrong.

A better evaluation next step is to:

  • Add test prompts that cover edge cases and tricky policy scenarios
  • Add a correctness-focused measure (for example, groundedness/factuality checks using human review or an LLM-as-judge rubric)

This aligns the evaluation with the known issue rather than optimizing only for ROUGE-L.

  • ROUGE-L = quality is risky because the exhibit lists incorrect eligibility answers even though ROUGE-L improved.
  • Change temperature does not address the exhibit’s gaps in prompt coverage or correctness measurement.
  • Fine-tune now is premature because the exhibit indicates the evaluation setup (prompts/metrics) is insufficient to validate quality changes.

Question 4

Topic: Fundamentals of Genai

A company is building a GenAI clinical assistant using Amazon Bedrock. Review the exhibit.

Exhibit: Deployment notes
1) Users: Frankfurt (EU) and Virginia (US)
2) Compliance: EU patient data must be processed in the EU
3) Current stack Region: us-east-1
4) Measured p95 RTT: Frankfurt->us-east-1 210 ms; Frankfurt->eu-central-1 35 ms
5) Selected Bedrock model is available in: us-east-1, eu-central-1
6) Availability target: tolerate loss of one AWS Region

Which action is the best next step to meet the requirements?

Options:

  • A. Invoke the model only in us-east-1 and rely on KMS encryption

  • B. Keep the stack in us-east-1 and add CloudFront caching in EU

  • C. Use S3 Cross-Region Replication to copy prompts to us-east-1

  • D. Deploy active-active in us-east-1 and eu-central-1 with latency routing

Best answer: D

Explanation: A multi-Region deployment is needed because EU data must be processed in an EU Region and EU users have significantly lower RTT to eu-central-1. The exhibit also shows the Bedrock model is available in eu-central-1, enabling in-Region inference, and an availability goal that requires tolerating a Region failure.

Regional coverage affects three core outcomes: latency (distance to the Region), compliance (where data is processed), and availability (resilience to a Region outage). The exhibit states EU patient data must be processed in the EU (line 2) and shows much lower RTT from Frankfurt to eu-central-1 than to us-east-1 (line 4). Because the selected Bedrock model is available in both us-east-1 and eu-central-1 (line 5), the application can run inference in eu-central-1 for EU users to meet data residency and latency needs. Finally, the requirement to tolerate loss of one AWS Region (line 6) implies a multi-Region design (for example, active-active) with routing to the nearest healthy Region.

Key takeaway: choose Regions where the model is available and route users so data stays in-region while improving latency and availability.

  • Edge caching misconception: CloudFront can reduce web content latency but does not change where model inference and data processing occur for compliance (line 2).
  • Encryption vs residency: Encrypting data with KMS does not satisfy a requirement that processing occurs in the EU (line 2).
  • Replicating prompts: Copying prompts to us-east-1 directly conflicts with EU-only processing and does not address EU latency (lines 2 and 4).

Question 5

Topic: Security, Compliance, and Governance for AI Solutions

A company uses Amazon Bedrock to add a generative AI assistant to its application. The security team notes that AWS secures the underlying Bedrock service infrastructure, while the company must configure IAM permissions, encrypt any customer-provided prompts stored in Amazon S3, and control access to logs.

Which security principle does this statement describe?

Options:

  • A. AWS shared responsibility model

  • B. Least privilege access

  • C. Defense in depth

  • D. Data minimization

Best answer: A

Explanation: This is the AWS shared responsibility model applied to an AI service. AWS is responsible for security of the managed Bedrock service and its underlying infrastructure, while the customer is responsible for security in the cloud, such as IAM configuration, data encryption, and access controls around stored prompts and logs.

The AWS shared responsibility model divides security ownership between AWS and the customer. For managed AI services such as Amazon Bedrock, AWS is responsible for the security “of” the cloud (facilities, hardware, software, and managed service operations). The customer is responsible for security “in” the cloud, including configuring IAM permissions, choosing how and where data is stored, encrypting that data (for example, with AWS KMS when storing prompts in Amazon S3), and controlling access to application and audit logs. The key idea is that using a managed AI service reduces what you operate, but it does not remove your responsibility to securely configure and govern your data and access.

  • Least privilege focuses on granting only the minimum permissions, not on dividing responsibilities between AWS and the customer.
  • Defense in depth is about using multiple layers of controls; it doesn’t specifically define who is responsible for which layers.
  • Data minimization is about collecting/retaining the minimum necessary data; it doesn’t describe AWS vs customer responsibility boundaries.

Question 6

Topic: Applications of Foundation Models

A team is evaluating a foundation model that produces short summaries of customer support chats. They have human-written reference summaries for a test set and want an automated metric that scores similarity between the model summaries and the references based on overlapping text units. Which metric is most appropriate, and what does it measure at a high level?

Options:

  • A. ROUGE; overlap of n-grams (and related units) between generated and reference summaries

  • B. BLEU; overlap of n-grams between generated and reference summaries

  • C. Perplexity; how well the model predicts the next token on the test set

  • D. AUC; the model’s ability to rank positive examples above negative examples

Best answer: A

Explanation: ROUGE is an appropriate metric when you have reference summaries and want an automated score based on textual overlap with the model’s generated summaries. At a high level, it measures how much the generated summary matches the reference summary using overlap of units such as n-grams (and sometimes longest common subsequence).

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a standard automatic evaluation approach for text summarization when you can compare generated summaries to one or more human-written reference summaries. It measures similarity primarily through lexical overlap, such as matching n-grams (ROUGE-N) and, in some variants, overlap based on sequences (for example, ROUGE-L uses longest common subsequence). Higher ROUGE scores generally indicate the generated summary shares more content with the references, which is useful for quick, repeatable comparisons across model versions. A key takeaway is that ROUGE evaluates overlap with reference summaries rather than model confidence or classification ranking quality.

  • BLEU is translation-focused and is more commonly used for machine translation than summarization.
  • Perplexity measures prediction fit to text, not overlap against reference summaries.
  • AUC is for classification/ranking and does not evaluate summary similarity to references.

Question 7

Topic: Fundamentals of Genai

A support organization currently uses a predictive ML model on Amazon SageMaker to classify incoming customer tickets (for example, “billing”, “technical”, “account”). A Lambda function then selects a prewritten response template based on the predicted label.

The team wants to reduce ongoing template maintenance and produce more natural, personalized draft replies. They are not allowed to train new models and must add controls to reduce the chance of returning PII or unsafe content. Which change best meets these requirements with the least operational effort?

Options:

  • A. Use Amazon Comprehend to extract entities and sentiment, then assemble replies from templates

  • B. Train a custom text-generation model in SageMaker using past agent replies

  • C. Improve the ticket classifier accuracy and expand the library of response templates

  • D. Use Amazon Bedrock to generate draft replies, grounding responses with a Knowledge Base and adding Bedrock Guardrails

Best answer: D

Explanation: The requirement is to generate natural draft replies, which is a generative AI use case (creating new text) rather than a predictive ML use case (predicting labels). Amazon Bedrock provides managed access to foundation models without training, and a Knowledge Base (RAG) plus Bedrock Guardrails improves relevance and safety while keeping operations low.

Generative AI is used to create new content (for example, drafting a reply), while predictive ML is used to predict an outcome from inputs (for example, assigning a ticket category). In the current design, the model output is a label and the “reply” comes from maintained templates, which drives operational overhead and limits natural language quality.

Using Amazon Bedrock shifts the solution to GenAI for text generation, and it can meet the constraints without model training by:

  • Grounding the model on approved support content using a Knowledge Base (RAG) to reduce hallucinations
  • Applying Bedrock Guardrails to help block unsafe outputs and redact or avoid PII

Key takeaway: when the primary output must be newly generated text, a foundation model-based GenAI approach is more appropriate than improving a classifier-based predictive workflow.

  • Better classification still produces labels, so templates remain the main mechanism and ongoing maintenance stays high.
  • Custom training violates the “not allowed to train new models” constraint and increases operational burden.
  • Entity/sentiment extraction is predictive NLP and still relies on assembling templates rather than generating fluent replies end-to-end.

Question 8

Topic: Fundamentals of Genai

Which option best describes a GenAI agent use case (for example, using Agents for Amazon Bedrock) rather than a simple Q&A assistant?

Options:

  • A. Answers HR policy questions from an uploaded PDF

  • B. Classifies customer emails into support categories

  • C. Forecasts next month’s product demand from sales history

  • D. Automatically completes a refund by calling business APIs

Best answer: D

Explanation: A GenAI agent is designed to go beyond answering questions by selecting tools and taking actions to achieve a user goal. The best example is a workflow where the model calls approved business APIs (for example, ticketing or payments) to complete a multi-step task on the user’s behalf.

The core concept is the difference between an assistant and an agent. A GenAI assistant primarily generates text responses (Q&A, summarization, drafting) and may use retrieval (RAG) to ground answers in enterprise content. A GenAI agent adds “action-taking” behavior: it can break a request into steps, decide which tools to use, and invoke APIs or workflows to complete tasks with guardrails and permissions. In AWS, this pattern is commonly implemented with Agents for Amazon Bedrock, where the model orchestrates tool calls to perform operations like checking order status, creating tickets, or processing returns. The key distinguishing feature is autonomous tool use to accomplish an outcome, not just producing an answer.

  • Grounded Q&A (RAG) is assistant behavior: it retrieves documents to answer questions.
  • Email classification is typically a traditional ML/NLP task (for example, Amazon Comprehend).
  • Demand forecasting is a predictive ML use case, commonly done with purpose-built ML services rather than an LLM agent.

Question 9

Topic: Applications of Foundation Models

A company is iterating on a prompt template used with Amazon Bedrock to draft customer-support replies. The team needs a repeatable way to evaluate whether each prompt change improves response quality over time.

Which approach is the best practice for evaluation, given this requirement for repeatability?

Options:

  • A. Measure improvement by comparing prompt length and model output token counts between versions

  • B. Create a version-controlled test set of representative and edge-case prompts and score outputs with a consistent rubric

  • C. Rely on developers to try a few new, ad hoc prompts after each change

  • D. Use only production user feedback and support ticket volume to judge prompt quality

Best answer: B

Explanation: Repeatable evaluation requires comparing prompt versions on the same inputs using the same criteria. A curated, version-controlled set of test prompts (including edge cases) combined with a consistent scoring rubric makes changes measurable and supports prompt regression testing over time.

The core evaluation best practice for prompt engineering is to test prompt changes against a stable, representative set of prompts and to measure quality with consistent criteria. In this scenario, the team’s discriminating requirement is repeatability across iterations, so they should build a “golden set” of prompts that covers typical requests and known failure modes, then score responses using a rubric (often with human review, optionally augmented by automated checks).

  • Collect representative prompts plus edge/adversarial cases
  • Define a rubric (for example, correctness, tone, policy adherence)
  • Run the same test set for each prompt version
  • Compare scores and investigate regressions

This is more reliable than judging quality from changing, ad hoc prompts or indirect proxies like token counts.

  • Ad hoc prompting is not repeatable, so results vary with whoever tests and which prompts they choose.
  • Production-only signals are delayed and confounded by seasonality, user mix, and other changes.
  • Token counts measure verbosity/cost, not whether responses are correct, safe, or helpful.

Question 10

Topic: Applications of Foundation Models

A team is evaluating retrieval-augmented generation (RAG) for a generative AI assistant. Which TWO statements are INCORRECT or unsafe descriptions of when RAG is a good fit? (Select TWO.)

Options:

  • A. RAG is primarily used to retrain the model on new documents

  • B. Reducing hallucinations by grounding responses in retrieved context

  • C. Enterprise Q&A over internal wikis, PDFs, and runbooks

  • D. RAG removes the need for document access controls

  • E. Employee policy and compliance guidance lookup from internal documents

  • F. Customer support assistants grounded in up-to-date knowledge articles

Correct answers: A and D

Explanation: RAG is best when a model must answer using external, frequently changing knowledge such as internal documentation, policies, or support content. It works by retrieving relevant passages at request time and providing them as context to the model. It does not replace model training, and it must still respect enterprise data access controls.

RAG is a design pattern for “grounded” generation: the application retrieves relevant enterprise content (for example, wiki pages, PDFs, policy docs, or support articles) and supplies that context to a foundation model so answers reflect the latest source material.

RAG is a good fit for:

  • Enterprise Q&A and knowledge retrieval
  • Policy/procedure lookup and summarization for employees
  • Support assistants that use the latest KB content
  • Reducing hallucinations by grounding responses in retrieved passages

RAG is not the same as updating model weights (training/fine-tuning), and it does not eliminate security requirements—authorization, least privilege, and data governance must still be enforced on the retrieved content.

  • Retraining misconception confuses retrieval at inference time with updating model weights through training or fine-tuning.
  • Access-control bypass is unsafe because RAG must still enforce permissions on source documents and results.
  • Enterprise Q&A/policy lookup are classic RAG use cases because answers must reference internal knowledge.
  • Support grounding fits RAG because KB content changes and should be reflected without retraining.

Question 11

Topic: Applications of Foundation Models

A team is reviewing an Amazon Bedrock chatbot before launch. For a sample of real user questions, they ask multiple reviewers to score each model response for helpfulness, correctness, and tone using a rubric, then compare reviewer agreement and summarize the results.

Which evaluation approach is the team using?

Options:

  • A. Adversarial red teaming

  • B. Offline benchmark evaluation

  • C. Automated metric-based evaluation

  • D. Human evaluation

Best answer: D

Explanation: The described practice is human evaluation: people review model outputs and score them against a rubric (for example, helpfulness, correctness, tone). This approach is commonly used to assess subjective and context-dependent qualities of foundation model responses and to validate behavior on realistic prompts.

Human evaluation is an approach to assess foundation model output quality by having people judge responses against defined criteria (a rubric), often using multiple reviewers to reduce individual bias and measure consistency (inter-rater agreement). It’s especially useful for qualities that are hard to capture with simple automated scores—such as tone, helpfulness, and nuanced instruction-following—and for checking performance on representative, real-world prompts.

A typical high-level process is:

  • Select a representative prompt set
  • Define rating criteria and a rubric
  • Collect ratings from multiple reviewers
  • Aggregate results and review disagreements

Compared with offline benchmarks or automated metrics, the key differentiator is that humans directly assess the usefulness and appropriateness of the generated outputs.

  • Automated metrics rely on computed scores (for example, similarity or task metrics) rather than human judgment.
  • Adversarial red teaming focuses on intentionally probing for unsafe or policy-violating behavior with challenging prompts.
  • Offline benchmarks use standardized datasets and predefined scoring to compare models, not reviewer rubrics on sampled outputs.

Question 12

Topic: Applications of Foundation Models

A company uses an Amazon Bedrock text model to draft short responses for a customer support team. For the same customer message, the model’s wording and included details vary between invocations, and it occasionally invents policy details. The company wants more consistent, safer responses with minimal engineering effort and without adding new data sources or model training.

Which change best meets these requirements?

Options:

  • A. Add a Knowledge Base for Amazon Bedrock for RAG grounding

  • B. Fine-tune a custom model on historical support transcripts

  • C. Lower temperature and top_p for more deterministic outputs

  • D. Switch to a larger, more capable foundation model

Best answer: C

Explanation: To reduce hallucinations and increase consistency, adjust inference sampling parameters to make generation more deterministic. Lowering temperature and/or top_p reduces randomness in token selection, producing more repeatable responses and fewer invented details. This meets the constraints because it requires no new data sources, training, or additional infrastructure.

Inference parameters control how “creative” or “deterministic” an FM is at runtime. When an application needs consistent wording and should avoid making up details, reduce randomness in decoding by lowering sampling parameters such as temperature (and often top_p). This pushes the model to pick higher-probability tokens more consistently across repeated calls, which generally improves repeatability and reduces hallucinated additions.

A practical approach is:

  • Decrease temperature (often toward 0) to reduce variation.
  • If supported, reduce top_p to narrow the candidate token set.
  • Keep prompts explicit about using only provided information.

This change is lightweight and typically does not add operational overhead compared with adding retrieval or training.

  • Fine-tuning overhead adds training effort and data handling, violating the minimal-change constraint.
  • RAG adds components requires a knowledge source/index, which the scenario disallows.
  • Bigger model tradeoff can increase cost/latency and doesn’t directly enforce consistency.

Question 13

Topic: Guidelines for Responsible AI

A company is deploying a customer-support chatbot using Amazon Bedrock and an internal knowledge base built from documents in Amazon S3. The company wants to improve transparency for reviewers and business stakeholders.

Which action is NOT an appropriate use of documentation (such as model cards and data cards) to increase transparency?

Options:

  • A. Create a data card that records dataset sources, collection methods, and known gaps

  • B. Maintain versioned documentation when prompts, models, or data sources change

  • C. Publish a model card describing intended use, limitations, and evaluation results

  • D. Skip creating model/data documentation because the model is managed by AWS

Best answer: D

Explanation: Model cards and data cards exist to make an AI system’s purpose, limitations, evaluation, and data provenance understandable to others. Using a managed foundation model does not remove the need to document how the application uses the model and what data it relies on. Transparency improves reviewability, governance, and responsible use over time.

The core idea is transparency: stakeholders should be able to understand what a model-based system is for, what it is not for, how it was evaluated, and what data influences its behavior.

Model cards typically capture items like intended use, performance/evaluation summaries, known limitations, and risks. Data cards focus on dataset provenance, how data was collected/processed, and known quality or coverage issues. Even when using Amazon Bedrock (managed models), you still need documentation for your specific implementation (for example, prompt patterns, knowledge base sources, and operational constraints). The key takeaway is that managed service ownership does not eliminate the customer’s responsibility to provide clear, accessible documentation for their AI application.

  • Managed service misconception fails because transparency applies to the full application, not only the model provider.
  • Document intended use and evaluation supports review and appropriate usage decisions.
  • Document data provenance and gaps increases traceability and helps assess bias/quality risks.
  • Versioned documentation supports auditability as models, prompts, and data evolve.

Question 14

Topic: Guidelines for Responsible AI

Which TWO statements about model generalization are INCORRECT? (Select TWO.)

Options:

  • A. Overfitting is primarily caused by high bias, so simplifying the model is the main fix.

  • B. Underfitting happens when a model has too much variance and memorizes training noise.

  • C. Overfitting often shows low training error but higher error on new data.

  • D. Increasing model complexity tends to reduce bias and increase variance, raising overfitting risk.

  • E. Underfitting often shows high error on both training and new data.

  • F. Using a held-out validation set or cross-validation helps detect overfitting before deployment.

Correct answers: A and B

Explanation: Overfitting and underfitting are outcomes of the bias–variance tradeoff. Overfitting is typically low bias and high variance (fits noise), while underfitting is typically high bias and low variance (too simple to learn the signal). The incorrect statements swap these associations or overstate a single “main fix.”

The core idea is the bias–variance tradeoff and how it affects generalization (a responsible AI concern because poor generalization can cause unreliable or unfair outcomes in real use).

  • Overfitting: the model learns training details/noise, so training performance can look great but performance degrades on unseen data; this is commonly associated with low bias and high variance.
  • Underfitting: the model is too simple or constrained to learn the underlying pattern, so it performs poorly even on the training data; this is commonly associated with high bias and low variance.
  • Model complexity often decreases bias but increases variance, which can shift a model toward overfitting.
  • Validation approaches (held-out sets, cross-validation) help detect generalization gaps before deployment.

Key takeaway: overfitting - high variance, underfitting - high bias.

  • Bias/variance swapped: The claim that overfitting is primarily high bias is incorrect; overfitting is mainly high variance.
  • Wrong definition: Memorizing training noise describes overfitting, not underfitting.
  • Generalization symptoms: Low training error with worse new-data error is a standard overfitting pattern.
  • Responsible evaluation: Validation or cross-validation is a common way to identify overfitting risk early.

Question 15

Topic: Security, Compliance, and Governance for AI Solutions

A team is building an AI data pipeline on AWS using Amazon S3 as the data lake. Which TWO statements are NOT secure data engineering best practices for protecting dataset access, integrity, and quality? (Select TWO.)

Options:

  • A. Encrypt S3 objects with SSE-KMS and control key access

  • B. Automate data quality checks and quarantine records that fail validation

  • C. Make the S3 bucket public-read to simplify access for analysts and jobs

  • D. Use IAM least privilege and role-based access for datasets

  • E. Disable versioning to allow silent overwrites when data needs correction

  • F. Use checksums and S3 versioning to detect and recover from tampering

Correct answers: C and E

Explanation: Secure data engineering emphasizes least-privilege access control, encryption, integrity controls, and quality gates before data is used by AI workloads. Making datasets publicly readable is unsafe because it removes strong authorization boundaries. Disabling object history (such as S3 versioning) weakens tamper detection and recovery.

Secure data engineering for AI focuses on preventing unauthorized access, ensuring data hasn’t been altered, and blocking low-quality data from entering downstream model workflows. Using IAM roles with least privilege limits who and what can read/write datasets. Encrypting S3 objects with SSE-KMS adds a strong protection layer and lets you control decryption via KMS key policies and grants. Integrity controls such as checksums plus S3 versioning help detect unexpected changes and support rollback during incident response or pipeline errors. Data quality validation (schema/range/null checks) and quarantining failures reduces the chance of training or RAG on corrupted or poisoned inputs. In contrast, public-read buckets and disabling versioning both reduce control and traceability, increasing breach and tampering risk.

  • Public-read access is unsafe because it removes least-privilege authorization boundaries.
  • Disabling versioning is unsafe because it weakens tamper detection and rollback.
  • SSE-KMS encryption helps protect data at rest and centralizes key access control.
  • Quality checks with quarantine prevent bad or suspicious records from propagating to AI workloads.

Question 16

Topic: Fundamentals of AI and ML

A company uses an ML model to support two workflows: (1) score each payment transaction during checkout and return a decision in under 200 ms, and (2) score 50 million historical transactions every night and store results in Amazon S3 for reporting.

Which recommendation is INCORRECT for choosing between batch inference and real-time inference?

Options:

  • A. Use a real-time inference endpoint for the checkout transaction decision

  • B. Use batch inference to score the nightly 50 million historical records and write outputs to Amazon S3

  • C. Use batch inference for the checkout transaction decision because the predictions can be processed later

  • D. Use batch inference when the application can tolerate delayed results and wants to optimize cost per prediction

Best answer: C

Explanation: Batch inference is designed for large volumes of predictions where results can be delayed (for example, nightly jobs written to Amazon S3). Real-time inference is designed for low-latency, on-demand responses that must be returned during a user interaction. Using batch processing for an under-200-ms checkout decision is an operational anti-pattern.

The core distinction is latency and how predictions are consumed. Real-time inference is used when an application needs an immediate response per request (interactive or synchronous workflows), typically via a hosted endpoint. Batch inference is used when you can run predictions on a dataset asynchronously (scheduled or ad hoc), often optimizing for throughput and cost, and writing outputs to storage such as Amazon S3.

In this scenario:

  • Checkout decisions with a sub-200-ms requirement need real-time inference.
  • Nightly scoring of 50 million historical records fits batch inference.

Key takeaway: choose real-time for per-event decisions at request time; choose batch when delayed, high-volume scoring is acceptable.

  • Interactive latency Using a real-time endpoint aligns with a strict sub-200-ms checkout requirement.
  • High-volume offline scoring Nightly processing of 50 million records is a classic batch inference use case, with results stored in Amazon S3.
  • Cost/throughput optimization Batch inference is appropriate when predictions don’t need to be returned to a user immediately.
  • Delayed decisions Processing checkout predictions “later” is incompatible with synchronous approval/decline at purchase time.

Question 17

Topic: Fundamentals of AI and ML

A retail company wants to build an ML model to predict customer churn using historical transactions and support tickets stored in Amazon S3. The dataset has missing fields, outliers, and several categorical columns, and the team must keep PII inside the AWS account and avoid writing custom code. The team also needs a defensible way to decide which preprocessing steps (for example, imputation, encoding) are necessary before modeling.

Which approach is the BEST way to use exploratory data analysis (EDA) to inform preprocessing at a high level?

Options:

  • A. Skip EDA and immediately train a churn model so the algorithm automatically learns how to handle missing values and outliers

  • B. Use Amazon Comprehend on the entire dataset to detect sentiment and replace missing values with the dominant sentiment label

  • C. Use Amazon Bedrock to generate a textual summary of the dataset and proceed with standard preprocessing steps for all columns

  • D. Use SageMaker Data Wrangler to profile and visualize the data, then choose preprocessing (impute, treat outliers, encode categories) based on the observed issues

Best answer: D

Explanation: EDA is used to understand a dataset’s structure, quality, and patterns (missingness, outliers, distributions, and relationships) so you can choose appropriate preprocessing steps. SageMaker Data Wrangler provides managed, no-code data profiling and visual EDA on data in S3, helping the team justify targeted transformations while keeping data in AWS.

EDA is the step in the ML lifecycle where you inspect and summarize data to decide what cleaning and transformations are necessary before modeling. In this scenario, profiling and visualizations help the team identify patterns such as which columns have missing values, whether numeric fields have extreme outliers, and how categorical values are distributed, which directly informs preprocessing choices.

Using SageMaker Data Wrangler supports this at a practitioner level by enabling managed, no/low-code:

  • Data profiling (missing values, duplicates, type inference)
  • Visual checks for distributions and outliers
  • Selection of targeted transforms (imputation, outlier handling, encoding)

Key takeaway: EDA guides which preprocessing is needed; it is not replaced by generic, one-size-fits-all transformations or unrelated NLP analysis.

  • Train first fails because EDA is needed to decide and justify preprocessing rather than assuming the model will address data quality issues.
  • LLM summary is not a substitute for quantitative profiling/visualization and can lead to untargeted, unnecessary preprocessing.
  • Comprehend misuse fails because sentiment detection does not address general missing data/outliers and is not appropriate for transaction fields.

Question 18

Topic: Fundamentals of Genai

A retail company wants to add a generative AI feature to its customer support portal that can draft polite responses to customer emails. The company wants to experiment with multiple foundation models without provisioning or managing model servers, and it requires that prompts and responses stay within AWS and that unsafe content be filtered. Which AWS service should the company use as the primary way to access foundation models for this use case?

Options:

  • A. Amazon Q Developer

  • B. Amazon Bedrock (with Bedrock Guardrails)

  • C. Amazon Comprehend sentiment analysis

  • D. Amazon SageMaker AI training jobs

Best answer: B

Explanation: Amazon Bedrock is AWS’s managed service for accessing foundation models through a unified API, letting teams try different models without standing up or operating model endpoints. It also supports responsible AI controls such as Bedrock Guardrails to help filter unsafe content while keeping data in AWS.

The core requirement is managed, API-based access to foundation models without provisioning or operating model servers. Amazon Bedrock is designed for this: it provides access to multiple foundation models from AWS and partners through a single managed service, and you can add safety controls like Bedrock Guardrails to help meet content policy requirements. This fits a practitioner use case where the team wants to quickly integrate GenAI into an application while meeting basic privacy and responsible AI constraints.

Services focused on model training, traditional NLP analysis, or a prebuilt assistant for developers do not match the need to directly invoke foundation models for drafting customer responses.

  • Training instead of inference: SageMaker AI training jobs are for building/training models, not simple managed access to multiple FMs via an API.
  • Non-GenAI NLP: Comprehend provides analysis (sentiment, entities) rather than generating email drafts with a foundation model.
  • Wrong Q product: Amazon Q Developer targets developer productivity and isn’t the primary service for invoking FMs in a customer support portal.

Question 19

Topic: Fundamentals of Genai

A marketing team uses Amazon Bedrock to generate a weekly newsletter by summarizing 30 internal articles stored in Amazon S3. The current approach sends the full text of all articles to a large foundation model in every request, which is slow and expensive. It also sometimes includes outdated or incorrect details when an article changes. The team needs lower latency and cost, and the newsletter must be grounded only in the approved articles.

Which change best meets these requirements?

Options:

  • A. Increase model temperature and max_tokens to improve writing quality

  • B. Use Knowledge Bases for Amazon Bedrock to implement RAG over the S3 articles

  • C. Fine-tune a foundation model weekly on prior newsletters and prompts

  • D. Continue sending all articles each time, but add more detailed prompt instructions

Best answer: B

Explanation: Using Knowledge Bases for Amazon Bedrock enables retrieval-augmented generation (RAG) so the model pulls only the most relevant excerpts from the approved S3 corpus. This reduces the amount of text sent to the model (lower cost and latency) and improves factual grounding because the response is based on retrieved source passages rather than the model’s memory.

This workflow is a classic content-heavy summarization use case where efficiency and accuracy depend on how much context you send to the model. Sending all articles every time inflates input tokens (higher cost/latency) and still doesn’t guarantee the model will consistently focus on the most current, relevant details.

Using Knowledge Bases for Amazon Bedrock applies RAG:

  • Ingest the approved S3 articles into a managed knowledge base
  • At generation time, retrieve only the most relevant passages for the newsletter prompt
  • Generate summaries grounded in retrieved content

The key win is that retrieval reduces repeated, unnecessary context and improves grounded outputs without the operational overhead of training or frequent retraining.

  • Frequent fine-tuning increases cost and operational effort and won’t automatically stay current as articles change.
  • Higher temperature and longer outputs typically increase hallucination risk and raise latency and cost.
  • Better prompt wording can improve formatting, but it does not reduce token volume or reliably enforce grounding when all content is still shoved into the prompt.

Question 20

Topic: Security, Compliance, and Governance for AI Solutions

A company is rolling out an Amazon Bedrock powered assistant to multiple business units. The company creates a cross-functional review board, defines approval gates for model/prompt changes, requires documentation of intended use and risks, and sets up ongoing monitoring and periodic audits.

Which principle does this practice best represent, and why is it required?

Options:

  • A. Defense in depth to add multiple security layers

  • B. Fairness to reduce biased outcomes across user groups

  • C. Least privilege to minimize access to Bedrock APIs

  • D. AI governance to provide lifecycle oversight and accountability

Best answer: D

Explanation: The described board, approval gates, documentation, monitoring, and audits are governance processes. AI governance is required to ensure accountable decision-making and controlled change management across an AI system’s lifecycle. It helps manage compliance, operational risk, and consistent alignment to organizational policies as the system evolves.

AI governance is the set of roles, policies, and repeatable processes used to oversee an AI system across its lifecycle (design, deployment, change management, and ongoing operation). In the scenario, a review board and approval gates define who can make decisions and when changes are allowed, while required documentation, monitoring, and audits provide traceability and evidence.

Governance processes are required because AI systems can change behavior over time (through updates, new data, and new prompts/use cases) and can introduce legal, security, and operational risks. Governance creates accountability, consistent controls, and an audit trail to support compliance and responsible use, beyond purely technical safeguards.

  • Least privilege is an access-control principle, not a lifecycle oversight process.
  • Defense in depth focuses on layered technical security controls rather than approval, audit, and accountability mechanisms.
  • Fairness is a responsible AI objective (bias mitigation) and does not by itself define organizational oversight and change control.

Question 21

Topic: Fundamentals of Genai

A retail company is comparing a rule-based support chatbot with a GenAI support assistant for its customer portal.

Exhibit: 2-week pilot summary

MetricRule-based chatbotGenAI assistant
Add support for new return-policy change4 weeks (new intents)1 day (update knowledge)
Handles previously unseen customer questions22%71%
Customer input formatMenu + keywordsFree-form natural language
Median first response time6.4 s2.1 s

Based on the exhibit, which interpretation best describes a key advantage of GenAI solutions?

Options:

  • A. Lowest per-message cost is the main benefit.

  • B. Retrieval guarantees perfect factual accuracy and no hallucinations.

  • C. Needs extensive labeled data and predefined intents to work.

  • D. Faster adaptation and free-form responses improve responsiveness.

Best answer: D

Explanation: The exhibit shows the GenAI assistant can adapt quickly to new information, respond more effectively to unfamiliar requests, and provide a simpler natural language interface. It also improves responsiveness, as seen in lower median first response time. These are core high-level advantages of GenAI for business interactions.

GenAI solutions are often advantageous for customer-facing and knowledge-heavy workflows because they can adapt to changing content, respond well to a broad variety of user requests, and let users interact in natural language.

In the exhibit, the GenAI assistant demonstrates:

  • Adaptability: “Add support for new return-policy change” is 1 day vs 4 weeks.
  • General responsiveness to varied inputs: “Handles previously unseen customer questions” is 71% vs 22%.
  • Natural language simplicity: “Customer input format” is free-form natural language.
  • Faster interaction: “Median first response time” is 2.1 s vs 6.4 s.

The best interpretation is the option that reflects these exhibit-backed advantages rather than claims about accuracy guarantees, training data requirements, or cost.

  • Accuracy guarantee is unsupported; the exhibit has no accuracy/hallucination evidence.
  • Labeled data and intents conflicts with the exhibit showing faster updates via “update knowledge,” not new intents.
  • Cost-driven choice is unsupported; no cost metric is shown in the exhibit.

Question 22

Topic: Fundamentals of AI and ML

Which scenario most clearly shows where AI/ML adds value by enabling personalization at scale?

Options:

  • A. Applying a fixed tax formula to every invoice in an accounting system

  • B. Delivering real-time product recommendations that adapt to each user’s behavior

  • C. Triggering an alert when CPU utilization exceeds a static threshold

  • D. Sending a nightly CSV report of total sales by region

Best answer: B

Explanation: AI/ML adds clear value when a system must tailor outputs to each individual while handling very large numbers of users or events. Learning patterns from interaction data enables continuously improving, per-user recommendations without hand-crafted rules. This is a classic personalization-at-scale use case for ML.

Personalization at scale is a strong signal that AI/ML can add value. When outputs should differ by user (or context) and must update as behavior changes, ML can learn patterns from historical interactions and produce predictions or rankings for each user automatically.

In contrast, fixed formulas, static reports, and simple threshold-based alerts are deterministic problems that are typically solved more reliably with traditional programming and rules. The key differentiator is the need to infer preferences or patterns from data to produce individualized results for a large population.

  • Fixed formula is deterministic and does not require learning from data.
  • Static reporting summarizes data but does not personalize outputs.
  • Threshold alerting is rule-based and doesn’t adapt to user behavior or context.

Question 23

Topic: Applications of Foundation Models

A team deploys an Amazon Bedrock-powered customer support chatbot. Which statement best explains why the team should continue monitoring outputs and collecting user feedback after deployment?

Options:

  • A. Once deployed, the model’s performance is fixed because the parameters do not change

  • B. Amazon Bedrock automatically guarantees response accuracy for all future prompts

  • C. Model quality can drift as user behavior and input data change over time

  • D. Foundation models do not generate responses unless retrained each week

Best answer: C

Explanation: Monitoring and feedback are needed because real-world prompts, user behavior, and content patterns change over time, which can reduce answer quality compared with initial testing. Ongoing evaluation helps detect drift and informs prompt updates, guardrail tuning, or workflow changes to restore desired performance.

The core reason to monitor an FM application after deployment is that the operating environment changes. Even if the underlying model weights stay the same, the distribution of prompts, retrieved documents, user expectations, and downstream systems can shift, causing quality drift (for example, lower relevance, more hallucinations, or new failure modes). Collecting feedback (implicit signals like thumbs-up/down and explicit issue reports) provides evidence of these changes and supports continuous evaluation against the original success criteria. The key takeaway is that deployment is not the end of evaluation; it is when real usage starts exposing drift and new edge cases.

  • Weekly retraining myth is not a requirement for foundation models and is not why monitoring is needed.
  • Guaranteed accuracy is not provided by managed services; you must evaluate and monitor quality.
  • Fixed performance assumption ignores that changing inputs and context can degrade outcomes even with unchanged weights.

Question 24

Topic: Fundamentals of Genai

A company runs a customer support chat application used in 20 countries. The company needs to translate incoming chat messages into each agent’s language with low latency, and it must keep brand and product names consistent across translations. The messages can include PII, so the company wants a fully managed service that does not require building or training models.

Which solution BEST meets these requirements?

Options:

  • A. Use Amazon Comprehend to detect the source language and translate the text

  • B. Use Amazon Bedrock to prompt an FM to translate each message and store prompts for reuse

  • C. Use Amazon Translate real-time translation with a custom terminology list

  • D. Use Amazon Transcribe to convert messages to text, then deliver to agents

Best answer: C

Explanation: Amazon Translate is purpose-built for language translation and localization and is offered as a fully managed service. It supports low-latency real-time translation and custom terminology to keep product and brand names consistent. This matches the requirement to avoid building or training models while handling chat text that may include PII.

For translation and localization use cases, the best fit is typically Amazon Translate because it is a managed machine translation service designed for text translation workloads. In a chat scenario, you can use its real-time translation API for low latency, and you can add custom terminology to enforce consistent translations for brand and product names.

Using a general-purpose foundation model for translation is usually unnecessary when you primarily need accurate, consistent translation at scale; it can also introduce extra governance considerations compared with a dedicated translation service. The key takeaway is to choose the AWS service built for translation when the core need is language conversion and terminology consistency.

  • Comprehend is not translation: it can detect language and extract insights, but it does not perform text translation.
  • Transcribe is for audio: it converts speech to text and does not translate text chat messages.
  • FM prompt translation adds risk/overhead: using an FM and storing prompts is not required for this use case and can conflict with strict handling expectations for PII.

Question 25

Topic: Guidelines for Responsible AI

Which statement about Amazon Bedrock Guardrails is INCORRECT?

Options:

  • A. It can redact or mask sensitive data like PII.

  • B. It can block prompts about configured denied topics.

  • C. It permanently changes the model’s weights to enforce safety.

  • D. It can filter and block harmful model responses.

Best answer: C

Explanation: Amazon Bedrock Guardrails supports responsible AI by applying configurable policies at inference time to control what users can ask and what the model can return. These controls help reduce unsafe content and limit exposure of sensitive information without changing the underlying foundation model.

The core idea of Amazon Bedrock Guardrails is to enforce responsible-AI constraints by evaluating and constraining both prompts (inputs) and completions (outputs) at runtime. You can configure policies such as denied topics, content filters, and sensitive information filters so that requests or responses that violate your rules are blocked or sanitized.

Guardrails does not “fix” a foundation model by retraining it or changing its parameters. Instead, it sits in the invocation path and applies your safety policies consistently across supported models and use cases, helping you standardize protections like topic restrictions, toxicity filtering, and PII handling.

  • Model weight changes is misleading because guardrails are runtime policy checks, not fine-tuning.
  • Denied topics is accurate because guardrails can block prompts on prohibited subjects.
  • Sensitive data controls is accurate because guardrails can detect and redact/mask PII.
  • Harmful output filtering is accurate because guardrails can block responses that violate content policies.

Questions 26-50

Question 26

Topic: Security, Compliance, and Governance for AI Solutions

A key risk in GenAI applications is prompt injection, where a user crafts input like “ignore previous instructions” to override the system prompt and potentially expose sensitive information. Which mitigation best helps reduce this risk?

Options:

  • A. Enable AWS CloudTrail for all AWS accounts and regions

  • B. Use Amazon Bedrock Guardrails to filter and constrain inputs/outputs

  • C. Fine-tune the model on your organization’s policies

  • D. Encrypt all prompts stored in Amazon S3 with AWS KMS

Best answer: B

Explanation: Prompt injection is primarily mitigated by constraining model behavior and filtering malicious or policy-violating content at inference time. Amazon Bedrock Guardrails provides managed controls to validate and filter user prompts and model responses to reduce instruction override and sensitive data disclosure.

Prompt injection is an application-layer attack against LLM-based systems where untrusted user text attempts to override higher-priority instructions (such as the system prompt) or to coerce the model into revealing sensitive content. A high-level mitigation is to apply guardrails at inference time to filter and constrain what inputs are accepted and what outputs are allowed. Amazon Bedrock Guardrails provides managed input/output filtering and policy enforcement to help block jailbreak-style instructions and reduce the chance of leaking sensitive data.

Key takeaway: encryption and auditing are important controls, but they do not stop the model from following malicious instructions in the prompt.

  • Auditing only CloudTrail helps detect and investigate activity but does not prevent prompt injection.
  • Data-at-rest focus KMS encryption protects stored data, not the model’s runtime behavior.
  • Training is not a control Fine-tuning can improve style/compliance but is not a primary defense against prompt injection.

Question 27

Topic: Fundamentals of AI and ML

A company is building an AWS solution to support workplace safety audits in warehouses. The solution will (1) review uploaded photos to flag workers who are not wearing required helmets and (2) read equipment label images to capture the printed serial number for compliance records.

Which TWO options correctly identify the needed computer vision tasks? (Select TWO.)

Options:

  • A. Perform image classification to predict whether a shipment will be late

  • B. Encrypt the images in Amazon S3 with SSE-KMS and restrict access with least-privilege IAM

  • C. Perform OCR to extract the serial number text from label images

  • D. Add Amazon Bedrock Guardrails to block toxic language in model responses

  • E. Perform sentiment analysis on written safety reports to find negative feedback

  • F. Perform object detection to locate helmets and flag missing helmets

Correct answers: C and F

Explanation: Computer vision applies ML to understand images. Flagging missing helmets requires finding and identifying an object within a photo (object detection). Reading printed serial numbers from label photos requires converting image text into characters (OCR).

Computer vision is the use of ML to interpret and extract meaning from visual inputs such as images and video. Common high-level vision tasks include:

  • Image classification: assign a label to an entire image (for example, “hardhat” vs “no hardhat” for the whole photo).
  • Object detection: find and locate specific objects in an image (for example, detect helmets and where they appear).
  • OCR (optical character recognition): extract text from images (for example, serial numbers on equipment labels).

In this scenario, helmet compliance needs object detection because the system must identify a specific object in a photo, and serial-number capture needs OCR because the output is text. Governance, guardrails, and encryption are important for AI solutions but are not themselves vision tasks.

  • ✔ Object detection for helmets: matches locating specific objects in photos.
  • ✔ OCR for serial numbers: matches extracting printed text from images.
  • ✖ Image classification for shipment lateness: lateness prediction is not a vision task.
  • ✖ Sentiment analysis on reports: this is an NLP task over text, not images.
  • ✖ Bedrock Guardrails: a GenAI safety control, not a vision task.
  • ✖ SSE-KMS + least-privilege IAM: security best practice, but not a vision task.

Question 28

Topic: Applications of Foundation Models

A company wants a customer support chatbot that answers questions using its internal policy PDFs stored in Amazon S3. The company must keep answers grounded in the latest documents and provide citations to the source passages. The company wants to avoid training a custom model and prefers a managed approach.

Which solution BEST meets these requirements?

Options:

  • A. Use Amazon Comprehend to extract key phrases from the PDFs and have Amazon Bedrock answer from the extracted keywords

  • B. Use Amazon Bedrock Knowledge Bases to implement RAG over the S3 documents and have the model answer with source citations

  • C. Fine-tune an Amazon Bedrock foundation model on the policy PDFs and prompt it to include citations

  • D. Use Amazon Rekognition to analyze the PDFs and generate answers from detected text

Best answer: B

Explanation: Amazon Bedrock Knowledge Bases is designed to connect an application to proprietary data for retrieval-augmented generation (RAG). It retrieves relevant passages from the S3-hosted documents at query time to ground the foundation model’s response and can include citations to the source content. This meets the managed, no-custom-training requirement while keeping answers aligned to the latest documents.

Retrieval-augmented generation (RAG) improves factuality by having the application retrieve relevant context from trusted data sources at query time, then providing that context to a foundation model to generate an answer. Amazon Bedrock Knowledge Bases is a managed capability for implementing RAG: you connect your data (such as policy PDFs in Amazon S3), it indexes/organizes it for retrieval, and the application uses the retrieved passages as grounding context for model inference.

Because the company needs answers based on the latest internal documents and wants source citations, a managed RAG approach with Knowledge Bases fits best: retrieval happens at runtime (so updates can be reflected without retraining), and the response can include references to the underlying source passages. The key distinction is that RAG grounds responses by retrieval, rather than trying to “bake in” knowledge through model training.

  • Fine-tuning for knowledge is the wrong fit because it is a form of customization/training and is not ideal for frequently changing documents or reliable citations.
  • Keyword extraction only does not provide grounded passages for the model to cite and often loses necessary context for accurate answers.
  • Rekognition for PDFs targets image/video analysis and is not the appropriate service for retrieving and grounding answers from document content.

Question 29

Topic: Fundamentals of AI and ML

Which statement is INCORRECT about recommendation and personalization applications on AWS?

Options:

  • A. Amazon Personalize can generate personalized item recommendations from user-item interaction data.

  • B. Amazon Personalize can ingest real-time user events to reflect recent behavior in recommendations.

  • C. Next-best product or content is a common personalization use case that ranks items for each user.

  • D. Amazon Personalize requires fine-tuning a foundation model in Amazon Bedrock to make recommendations.

Best answer: D

Explanation: Recommendations and personalization systems commonly produce “next-best” items by learning from historical user behavior and item interactions. Amazon Personalize is an AWS managed service specifically built for this and can incorporate real-time events. Using Amazon Bedrock foundation models is optional for generative experiences, but it is not a requirement for building recommendations with Amazon Personalize.

The core idea in recommendation and personalization is to predict or rank items (products, content, offers) that are most relevant to a specific user or context—often described as “next-best product/content.” Amazon Personalize is designed for this use case and learns patterns from data such as user-item interactions (clicks, views, purchases) and optional item/user metadata.

Amazon Personalize can also consume streaming event data so very recent user activity can influence the recommendations you request, without you having to manage servers or build custom training infrastructure. Amazon Bedrock foundation models can be used to generate explanations, summaries, or conversational shopping experiences, but they are not required to create recommendation results with Amazon Personalize.

  • Personalize purpose is accurate: it produces individualized recommendations from interaction patterns.
  • Next-best ranking is accurate: personalization commonly ranks items per user/context.
  • Real-time events is accurate: ingesting user events can help reflect recent behavior in recommendation results.
  • Bedrock requirement is wrong: Personalize does not depend on fine-tuning a Bedrock foundation model.

Question 30

Topic: Applications of Foundation Models

A team is evaluating the quality of an English-to-Spanish machine translation system by comparing generated translations to human-created reference translations. Which metric is most appropriate, and what does it measure at a high level?

Options:

  • A. Perplexity; measures how well a model predicts the next token

  • B. F1 score; measures the balance of precision and recall for labels

  • C. ROUGE; measures n-gram overlap between model output and references

  • D. BLEU; measures n-gram overlap between model output and reference translations

Best answer: D

Explanation: BLEU is a standard metric for evaluating machine translation when you have one or more reference translations. At a high level, it scores translation quality by measuring how much the system output overlaps with the reference text in terms of matching word sequences (n-grams).

BLEU (Bilingual Evaluation Understudy) is appropriate when evaluating machine translation outputs against human-written reference translations. It measures similarity by checking overlap of n-grams (for example, 1- to 4-word sequences) between the candidate translation and the reference(s), often with a brevity penalty so very short outputs are not over-rewarded. In practice, higher BLEU generally indicates the translation uses word sequences that are more similar to the references, making it a common, time-stable way to compare translation systems or model versions on the same test set. The key idea is “reference-based n-gram overlap for translation,” not task-specific label accuracy or next-token probability.

  • Summarization metric ROUGE is more commonly used for summarization-style overlap evaluation, not as the primary translation metric.
  • Classification metric F1 score applies to labeled classification tasks with precision/recall.
  • Language modeling metric Perplexity evaluates next-token prediction fit, not reference translation similarity.

Question 31

Topic: Applications of Foundation Models

Which option best describes prompt injection risk when using a foundation model (FM) in an application?

Options:

  • A. A user repeatedly requests policy-violating content until the model complies

  • B. The model generates plausible but incorrect content due to uncertainty

  • C. Malicious input that manipulates the model to ignore prior instructions or reveal data

  • D. An attacker takes over the model hosting endpoint by exploiting an OS vulnerability

Best answer: C

Explanation: Prompt injection is a prompt engineering risk where untrusted user or retrieved content is crafted to override the application’s intended instructions. The goal is to hijack the model’s behavior, such as exfiltrating sensitive context or producing disallowed outputs, even though the app did not intend it.

Prompt injection occurs when attacker-controlled text (for example, a user message or content pulled in via RAG) is designed to alter the model’s instruction-following, such as “ignore the system prompt” or “print your hidden rules.” This is different from traditional infrastructure compromise: the attacker is not exploiting the server; they are exploiting how the model prioritizes and follows instructions in the prompt. It’s also distinct from the model simply being wrong; the output can be highly targeted and harmful because the prompt was crafted to manipulate behavior.

Key takeaway: treat external content as untrusted and assume it can contain instructions meant to hijack the model.

  • Jailbreaking vs injection describes coaxing the model through persistent requests, not embedding malicious instructions in untrusted content.
  • Endpoint compromise is a host or service security exploit, not a prompt-level manipulation.
  • Hallucination is unintentional inaccuracy, not adversarial instruction hijacking.

Question 32

Topic: Applications of Foundation Models

A team is selecting an Amazon Bedrock foundation model for a new feature that summarizes internal incident reports. The team starts evaluation with a smaller model to reduce cost and latency, then only moves to a larger model if the summary quality is not acceptable.

Which principle does this practice demonstrate?

Options:

  • A. Use retrieval-augmented generation (RAG) to improve factual accuracy

  • B. Fine-tune a model to learn the organization’s writing style

  • C. Right-size the model by balancing model size with required output quality

  • D. Apply safety guardrails to reduce harmful or noncompliant outputs

Best answer: C

Explanation: This practice demonstrates right-sizing a foundation model: start with a smaller model for lower cost/latency and increase model size only if evaluation shows quality is insufficient. Model size is a high-level proxy for capability, so selection is a trade-off between quality and efficiency for the specific task.

The core principle is right-sizing the foundation model to the task by explicitly trading off expected output quality against operational constraints like cost and latency. In general, larger models tend to produce higher-quality results on complex language tasks, but they also typically increase latency and cost. A practical selection approach is:

  • Define a quality bar for the task (for example, acceptable summary completeness and coherence)
  • Evaluate starting with a smaller model
  • Increase model size only when the smaller model cannot meet the quality bar

This keeps the solution efficient while still meeting business requirements for output quality.

  • RAG focus improves grounding with external content, but it’s not about selecting model size for quality.
  • Fine-tuning focus changes model behavior with training data, not the initial size-versus-quality selection practice.
  • Guardrails focus targets safety and policy compliance, not capability right-sizing.

Question 33

Topic: Fundamentals of AI and ML

A retail company wants an ML model to predict next week’s demand for each product as a numeric value (for example, 0–10,000 units). Which ML problem type best matches this use case?

Options:

  • A. Reinforcement learning

  • B. Binary classification

  • C. Clustering

  • D. Regression

Best answer: D

Explanation: This is a regression problem because the model must predict a continuous numeric quantity (units demanded). Regression learns to map input features (such as historical sales and promotions) to a real-valued output for forecasting.

Choosing the ML problem type starts with the target variable you need to predict. When the target is a continuous numeric value (for example, “expected units sold next week”), the correct framing is regression. In regression, the model outputs a number on a continuous scale, which fits forecasting and estimation tasks like demand, revenue, temperature, or time-to-failure. By contrast, classification predicts discrete labels (such as yes/no or a set of categories), clustering groups similar items without a target label, and reinforcement learning optimizes actions through rewards over time. The key takeaway is to select regression whenever the desired prediction is a continuous number.

  • Binary classification predicts a discrete yes/no label, not a units value.
  • Clustering is unsupervised grouping and does not forecast a labeled numeric target.
  • Reinforcement learning is for sequential decision-making with rewards, not direct demand prediction.

Question 34

Topic: Applications of Foundation Models

A company has a text classification model that was trained on a large dataset of retail product reviews. The company now needs the model to perform the same classification task on customer emails from the airline industry, where wording and topics differ. The team plans to start from the existing model weights and fine-tune using a small set of airline-domain examples to reduce data and training time.

Which principle does this practice best illustrate?

Options:

  • A. Retrieval-augmented generation (RAG)

  • B. Prompt engineering

  • C. Domain adaptation

  • D. Transfer learning to a new task

Best answer: C

Explanation: This is domain adaptation: reusing an existing trained model and tuning it so it works well on a different domain’s data distribution while keeping the task the same. It leverages prior learned representations to reduce the amount of new domain data and compute required compared with training from scratch.

Transfer learning is the broad idea of starting from a pre-trained model (source) and reusing its learned representations to improve learning on a target problem, usually with less data and training time than training from scratch. Domain adaptation is a specific form of transfer learning where the task stays the same, but the input data distribution changes between the source domain and the target domain.

In this scenario, the classification task is unchanged, but the text shifts from retail reviews to airline emails, so fine-tuning on airline-domain examples is intended to adapt the model to the new domain. The key distinguishing detail is “same task, different domain,” which points to domain adaptation rather than a task change.

  • Transfer to a new task would apply if the team changed objectives (for example, from classification to summarization), not just the domain.
  • Prompt engineering changes how you ask a foundation model to respond without updating model weights.
  • RAG adds external knowledge at inference time; it does not primarily address a domain shift via fine-tuning.

Question 35

Topic: Applications of Foundation Models

Which statement is INCORRECT about mitigating prompt risks when building a generative AI application with Amazon Bedrock?

Options:

  • A. Treat the system prompt as controlled server-side configuration

  • B. Use Bedrock Guardrails to block disallowed topics and PII

  • C. Grant the app bedrock:* to avoid authorization failures

  • D. Filter and validate user input to reduce prompt injection

Best answer: C

Explanation: Least privilege is a key mitigation for prompt-related risks because it limits what the application can do even if an attacker manipulates prompts. Using overly broad IAM permissions makes data access and unintended actions much more likely during a prompt injection or data exfiltration attempt. Guardrails, controlled system prompts, and input filtering are standard complementary defenses.

Prompt risk mitigations work best as layered controls: constrain what the model is allowed to produce, constrain what the application is allowed to do, and reduce the chance that malicious input changes behavior. In AWS, “least privilege” means granting only the specific bedrock:InvokeModel (and related) permissions and only the data-store permissions required, scoped to the necessary resources. This way, even if a prompt injection succeeds, the app cannot automatically read arbitrary data or perform unintended actions. Bedrock Guardrails help enforce safety and content policies (for example, blocking certain topics or masking PII), while keeping the system prompt server-side prevents users from altering your core instructions. Input filtering and validation further reduces harmful or irrelevant content before it reaches the model.

Key takeaway: avoid broad permissions; use guardrails, controlled prompts, and input filtering together.

  • Over-permissive IAM is the exception because bedrock:* expands impact if prompts are manipulated.
  • Guardrails are a valid control to enforce safety policies and reduce harmful outputs.
  • System prompt control is valid because keeping it server-side reduces user tampering.
  • Input filtering is valid because it helps mitigate prompt injection and unsafe content.

Question 36

Topic: Security, Compliance, and Governance for AI Solutions

A company is preparing an internal audit for a new generative AI application running on AWS. The auditors request (1) official AWS compliance reports (for example, SOC and ISO reports) and (2) the ability to review and accept any relevant AWS compliance agreements (for example, a BAA) without opening a support case.

Which TWO actions should the company take? (Select TWO)

Options:

  • A. Encrypt all data stores with AWS KMS customer managed keys (CMKs)

  • B. Use AWS Artifact Agreements to review and accept applicable AWS agreements

  • C. Use AWS Artifact to download AWS compliance reports

  • D. Use AWS Config conformance packs to prove resources match compliance frameworks

  • E. Add Amazon Bedrock Guardrails to enforce safety policies during inference

  • F. Enable AWS CloudTrail to produce audit evidence of Bedrock model invocations

Correct answers: B and C

Explanation: AWS Artifact is the central portal to obtain AWS compliance documentation and to manage compliance-related agreements. Downloading the required SOC/ISO reports and accepting applicable agreements (such as a BAA) directly addresses the auditors’ requests without additional support engagement.

AWS Artifact is designed for compliance and governance needs where you must quickly produce official AWS audit evidence or manage compliance agreements.

In this scenario, the auditors specifically want:

  • Official AWS compliance reports (for example, SOC/ISO): retrieve these from AWS Artifact.
  • The ability to review and accept compliance agreements (for example, a BAA): do this in AWS Artifact Agreements.

Other controls like logging, configuration compliance checks, guardrails, and encryption are good practices for securing AI systems, but they do not replace the need for AWS-provided compliance reports and agreement management in a portal.

  • ✔ Download AWS compliance reports from AWS Artifact: provides SOC/ISO and similar audit reports.
  • ✔ Review/accept agreements in AWS Artifact Agreements: supports accepting items like a BAA.
  • ✖ CloudTrail logging: produces activity logs, not AWS-issued compliance reports or agreements.
  • ✖ AWS Config conformance packs: evaluates resource configuration, not AWS compliance documentation/agreements.

Question 37

Topic: Fundamentals of Genai

A company deploys a customer support chatbot using an Amazon Bedrock foundation model. After launch, the team continuously reviews conversation transcripts, collects thumbs-up/thumbs-down ratings from agents, analyzes failure cases (for example, hallucinations and unsafe responses), and then updates prompts, Bedrock Guardrails, and the knowledge base content before re-running evaluations and redeploying.

Which principle of the foundation model (FM) lifecycle does this practice most directly represent, and why does it matter?

Options:

  • A. Iteration and feedback loops to improve quality and safety over time

  • B. Least privilege to minimize access to model invocation APIs

  • C. Transparency by publishing model architecture and weights

  • D. Data governance by enforcing data retention and lineage controls

Best answer: A

Explanation: This practice is an iterative feedback loop in the FM lifecycle: observe behavior in production, evaluate outcomes, and refine the system. It matters because foundation model applications can drift or fail in new ways, so continuous iteration is needed to improve response quality and reduce safety issues over time.

The core principle is iteration via feedback loops across the FM application lifecycle. After deployment, real user interactions and human feedback (ratings, reviews, failure-case analysis) become inputs to update the parts you control—such as prompts, retrieval/knowledge content, and safety controls (for example, guardrails)—followed by re-evaluation before the next release. This matters because an FM’s behavior in production can differ from initial testing due to new queries, changing knowledge, or emerging unsafe patterns, so continuous monitoring and iteration are essential to maintain quality and safety.

Key takeaway: this is about continuous improvement cycles, not primarily access control, transparency disclosures, or retention/lineage governance.

  • Least privilege focus is a security principle, but it doesn’t address improving responses through iterative evaluation.
  • Transparency disclosures can help users and auditors, but publishing weights/architecture is not the feedback loop described.
  • Data governance controls manage data handling, but they are different from using user feedback to refine behavior.

Question 38

Topic: Security, Compliance, and Governance for AI Solutions

Which statement is INCORRECT about data lineage and data cataloging for improving auditability of datasets used in AI/GenAI workloads on AWS?

Options:

  • A. Data lineage lets you stop using AWS CloudTrail because lineage proves who accessed data.

  • B. On AWS, catalog/metadata services (for example, AWS Glue Data Catalog or Amazon DataZone) can help document dataset provenance for reviews.

  • C. A data catalog centralizes dataset metadata such as schema, owner, and classifications to support governance and audits.

  • D. Data lineage can show a dataset’s source, transformations, and downstream usage to support audits and reproducibility.

Best answer: A

Explanation: Data lineage and data catalogs improve auditability by documenting what data is, where it came from, how it changed, and where it is used. They do not replace security audit logs that record who performed actions. Access auditing still relies on services such as AWS CloudTrail and data access logs.

Data lineage and data cataloging are governance mechanisms that support auditability by making dataset provenance and context visible. Lineage focuses on the end-to-end trail (sources -9 transformations -9 destinations/consumers), which helps reviewers reconstruct which inputs fed an AI workflow and how those inputs were produced. A data catalog complements this by storing centralized metadata (schemas, owners, sensitivity labels, descriptions), enabling controlled discovery and evidence during audits. However, neither lineage nor a catalog is an access log: they typically do not prove which IAM principal accessed or changed resources. For that, you still need audit logging such as AWS CloudTrail (and relevant data access logs) to establish who did what, when, and from where.

  • Lineage vs access logs is misleading because lineage does not replace identity/action auditing provided by CloudTrail.
  • Traceability is accurate because lineage supports reproducibility by recording sources and transformations.
  • Central metadata is accurate because catalogs capture ownership, schema, and classification used in audits.
  • AWS governance services is accurate because Glue Data Catalog/DataZone help manage metadata that supports provenance evidence.

Question 39

Topic: Guidelines for Responsible AI

A team is preparing to launch a customer-support chatbot using a foundation model. They ran a pilot evaluation and captured the results below.

Exhibit: Pilot evaluation summary

Hallucination rate on policy Q&A: 18%
Jailbreak success rate (unsafe prompts): 12%
Unsafe content in responses: 6%
PII leakage: 2 of 200 conversations

Based only on the exhibit, what is the best next step before production release?

Options:

  • A. Add safety and grounding mitigations before launch

  • B. Remove all user context from prompts to improve response accuracy

  • C. Deploy as-is because the model passed a pilot evaluation

  • D. Increase the model temperature to reduce refusals

Best answer: A

Explanation: The pilot results show measurable rates of hallucinations, unsafe outputs, jailbreak success, and PII leakage. These are concrete risks to customers and the business (misleading guidance, harmful content, and privacy violations). Therefore, mitigation is required before production, such as adding safety guardrails and grounding to reduce unsafe and untrusted responses.

The core issue is that the exhibit quantifies multiple high-impact GenAI risks: ungrounded answers (hallucinations), unsafe content, prompt-injection/jailbreak susceptibility, and privacy leakage. Specifically, “Hallucination rate on policy Q&A: 18%” indicates the assistant often provides incorrect policy guidance, while “Jailbreak success rate: 12%” and “Unsafe content: 6%” show safety controls are not adequate. “PII leakage: 2 of 200 conversations” demonstrates a data-protection risk.

A responsible next step is to mitigate before release by applying controls such as:

  • Grounding (for example, RAG/knowledge base) to reduce hallucinations
  • Safety enforcement (for example, Amazon Bedrock Guardrails) to block unsafe content and sensitive data
  • Additional adversarial testing and monitoring to validate improvements

Shipping without these mitigations would expose users to unsafe or incorrect outputs.

  • Deploy without changes is not acceptable given the exhibit’s 18% hallucination rate and documented PII leakage.
  • Increase temperature typically increases randomness, which can worsen hallucinations and unsafe outputs.
  • Remove all user context can reduce personalization but does not address the exhibit’s jailbreak, unsafe-content, and hallucination signals.

Question 40

Topic: Fundamentals of Genai

A company wants to deploy Amazon Q Business so employees can ask questions over internal knowledge (for example, SharePoint and Amazon S3) and trigger common workflows. The security team must ensure the deployment follows good governance practices.

Which TWO choices are NOT appropriate and increase security/governance risk? (Select TWO.)

Options:

  • A. Create separate Q Business apps for HR and Engineering

  • B. Turn off CloudTrail/audit logs for Amazon Q Business

  • C. Limit data sources to approved repositories and encrypt with KMS

  • D. Index all repositories using a shared credential, ignoring ACLs

  • E. Use IAM Identity Center groups to control who can use Q Business

  • F. Configure connectors to enforce source document ACLs per user

Correct answers: B and D

Explanation: Amazon Q Business is a managed AI assistant that connects to organizational data sources and helps users find information and complete tasks. Good deployments preserve existing access controls (users only see what they’re authorized to see) and maintain auditable records of access and activity. Choices that bypass authorization boundaries or remove audit trails create clear governance and security risk.

Amazon Q Business is designed to act as an AI assistant over an organization’s knowledge and workflows by connecting to approved enterprise data sources through managed connectors. A key governance expectation is that access to retrieved content aligns to the user’s identity and the source system’s permissions, so Q Business does not become a new path to access sensitive documents.

Security and governance best practices for Q Business deployments typically include:

  • Integrating with an enterprise identity provider (such as IAM Identity Center) for user and group-based access.
  • Ensuring connectors respect document-level permissions (ACLs) rather than using overly broad shared credentials.
  • Keeping audit logs enabled so prompts, access, and administrative actions can be monitored and investigated.

The safest choice is to preserve least-privilege access and maintain strong auditability end-to-end.

  • Centralized identity using IAM Identity Center is a standard way to control who can access the assistant.
  • Respecting ACLs keeps Q Business aligned with existing authorization boundaries in source repositories.
  • Data minimization/segmentation (approved sources, separate apps per department) can reduce accidental exposure blast radius.

Question 41

Topic: Fundamentals of Genai

Which statement is INCORRECT about prompt engineering when using a foundation model (FM) in Amazon Bedrock?

Options:

  • A. It crafts instructions to steer the FM’s output.

  • B. It complements controls like Bedrock Guardrails for safety.

  • C. Constraints and examples can improve response format adherence.

  • D. Prompt engineering permanently retrains the FM’s weights.

Best answer: D

Explanation: Prompt engineering is the practice of designing and iterating prompts to steer a model’s behavior and outputs at inference time. It does not modify or retrain the FM. Safety and policy needs are typically enforced with dedicated controls (for example, Bedrock Guardrails) in addition to good prompts.

Prompt engineering means writing and refining the input (instructions, context, constraints, and examples) to guide how an FM responds. Its purpose is to steer outputs—such as tone, structure, completeness, and refusal behavior—during inference, without changing the underlying model.

Common prompt elements include:

  • A clear task and role (what the model should do)
  • Constraints (what to avoid, required format)
  • Few-shot examples (what “good” looks like)
  • Relevant context (only what the model needs)

Changing the model’s weights is a different activity (fine-tuning or training). For governance and safety, prompts help, but service-level controls like Bedrock Guardrails are designed to consistently enforce policies across many prompts and users.

  • Retraining confusion: The claim about permanently retraining weights describes fine-tuning/training, not prompt engineering.
  • Steering at inference: Crafting instructions to guide the model’s response is the core purpose of prompt engineering.
  • Format control: Constraints and examples can improve consistency and adherence to an expected output structure.
  • Defense in depth: Guardrails are complementary controls for safety/policy beyond prompt wording alone.

Question 42

Topic: Fundamentals of AI and ML

A product team is discussing whether to use deep learning for an image classification feature on AWS. During the discussion, several statements are made about deep learning, neural networks, and machine learning (ML).

Which TWO statements are INCORRECT?

Options:

  • A. A neural network is considered “deep” even if it has only a single layer of learnable weights.

  • B. Deep learning is not part of ML; it is a separate approach outside ML.

  • C. A neural network can be used for tasks such as classification and regression, depending on how it is trained and configured.

  • D. Neural networks are one family of ML models; deep learning typically uses neural networks with many layers.

  • E. ML includes a variety of model types, and deep learning is one approach within ML that often performs well on unstructured data like images and text.

  • F. Deep learning commonly uses neural networks with multiple layers to learn hierarchical representations from data.

Correct answers: A and B

Explanation: Deep learning is a subfield within machine learning that primarily uses neural networks with multiple layers to learn representations from data. A neural network becomes “deep” when it has multiple hidden layers (not just a single layer of learnable weights). The incorrect statements either place deep learning outside ML or misstate what “deep” means in deep learning.

The core idea is the relationship: ML is a broad set of methods for learning patterns from data, neural networks are one type of ML model, and deep learning is a subset of ML that typically uses neural networks with multiple hidden layers. Those stacked layers let the model learn increasingly abstract features (for example, edges r shapes r objects in images).

At a high level:

  • ML: umbrella category of learning algorithms and models.
  • Neural networks: a specific model family within ML.
  • Deep learning: neural networks with multiple layers that learn hierarchical representations.

A single-layer network (or a network with no meaningful hidden depth) does not match the common meaning of “deep” in deep learning. The key takeaway is that “deep” refers to the depth (multiple layers) of the neural network, and deep learning sits within ML.

  • Deep learning outside ML is incorrect because deep learning is a subfield within ML.
  • Single-layer is “deep” is incorrect because “deep” generally implies multiple hidden layers.
  • Hierarchical representations is accurate and describes why deep models work well for unstructured data.
  • Neural networks as an ML family is accurate and correctly relates the terms.

Question 43

Topic: Fundamentals of Genai

A company wants to standardize on Amazon Q Developer as an AI assistant inside IDEs to help engineers write, explain, and refactor code. The company must reduce the risk of sensitive-data exposure and meet audit requirements for who can use the tool.

Which TWO rollout practices are NOT appropriate? (Select TWO.)

Options:

  • A. Limit repository context access to least-privilege read-only and require reviews

  • B. Enable AWS CloudTrail and store logs centrally for audit retention

  • C. Use IAM Identity Center groups to control who can use Amazon Q Developer

  • D. Disable CloudTrail for Amazon Q Developer to reduce logging costs

  • E. Train developers not to include secrets or customer PII in prompts

  • F. Use one shared admin IAM user for all Amazon Q Developer access

Correct answers: D and F

Explanation: Amazon Q Developer is a managed AI coding assistant, so rollout should emphasize identity-based access control and strong governance. Unsafe practices are those that remove accountability (shared credentials) or reduce auditability (disabling CloudTrail). The safe approach is per-user authorization with least privilege and retained logs to support compliance investigations.

Amazon Q Developer helps with software development tasks (for example, generating code suggestions, answering coding questions, and refactoring) and should be governed like any other developer tool that can interact with proprietary code and environments. Good rollout focuses on controlling who can use it, limiting what data and resources it can access, and maintaining an audit trail.

Two practices clearly violate core governance principles:

  • Using a shared administrator identity removes individual accountability and typically grants excessive permissions, conflicting with least privilege.
  • Disabling CloudTrail reduces auditability and makes it difficult to demonstrate compliant use or investigate potential misuse.

The key takeaway is to combine per-user access control with logging and least-privilege scoping of any connected resources.

  • Shared credentials undermines per-user attribution and often results in overly broad permissions.
  • No audit logging prevents reliable monitoring and compliance evidence for tool usage.
  • Centralized SSO control supports least-privilege access and clean join/leave processes.
  • Prompt hygiene training helps reduce accidental leakage of secrets and regulated data.

Question 44

Topic: Fundamentals of AI and ML

Which TWO statements about data preprocessing for machine learning are INCORRECT? (Select TWO.)

Options:

  • A. Preprocessing should use only training data statistics to avoid leakage.

  • B. Cleaning inconsistent labels and duplicates reduces noise in training data.

  • C. Handling missing values can prevent training failures and biased patterns.

  • D. You should always delete any row with missing values.

  • E. Most ML algorithms automatically handle missing values and scale.

  • F. Normalization can help scale-sensitive models converge and compare features.

Correct answers: D and E

Explanation: Data preprocessing is needed because real-world datasets commonly include errors, inconsistent formats, missing values, and features on different scales. Many ML algorithms require complete numeric inputs and can be skewed by unscaled features. The incorrect statements overstate what algorithms handle automatically and promote an unsafe “always delete” approach to missing data.

Data preprocessing prepares raw data so an ML model can learn stable, meaningful patterns. Cleaning removes issues like duplicates, inconsistent labels, and malformed values that add noise and can mislead training. Handling missing values matters because many models cannot train with NaN values, and missingness itself can correlate with outcomes; common treatments include imputation and adding a missingness indicator rather than blindly deleting data.

Normalization (or standardization) is often necessary because features can be on very different scales (for example, dollars vs. counts). Scale-sensitive methods (such as gradient-based models and distance-based methods) can converge poorly or overweight large-magnitude features without it. Also, compute preprocessing parameters (means, mins/maxes) on the training set only to avoid data leakage into evaluation.

  • “Algorithms handle missing and scale” is incorrect because many models require imputation and can be scale-sensitive.
  • “Always delete missing rows” is unsafe because it can reduce data and introduce bias; imputation/indicators are common.
  • Normalization benefit is true: scaling can improve training stability and feature comparability.
  • Training-only statistics is true: fitting preprocessing on all data can leak information into validation/test.

Question 45

Topic: Guidelines for Responsible AI

A company uses an Amazon Bedrock-based chatbot for customer support. During a responsible AI review, leaders note that some harmful or biased behavior is subtle and context-dependent (for example, different tone or refusal patterns for similar requests). They want a control specifically suited to detecting these issues rather than only blocking known unsafe content.

Which approach best matches this need?

Options:

  • A. Conduct periodic human audits of sampled conversations

  • B. Rely on automated toxicity and bias scores in testing

  • C. Enable Bedrock Guardrails to block unsafe content

  • D. Monitor CloudWatch metrics and set operational alarms

Best answer: A

Explanation: Human audits and reviews are a key control for detecting subtle bias and harmful behavior because they apply human judgment to real interactions and edge cases. In this scenario, the discriminating factor is that the issues are context-dependent and may not be captured by automated filters or numerical scores.

Human audits and reviews (for example, red teaming, QA sampling, and structured bias reviews by diverse or independent reviewers) are used to detect harmful or biased behavior that is difficult to reliably encode as rules or measure with a single automated metric. By reviewing sampled prompts and responses, humans can spot patterns like disparate treatment, stereotyping, or uneven refusal/helpfulness across demographic cues.

Automated controls (such as safety filters, policy rules, and scoring classifiers) are valuable for prevention and scalable screening, but they can miss nuanced or novel issues and often require humans to interpret borderline cases and decide what is unacceptable. The key takeaway is that human review complements automated safeguards when the risk is subtle and context-dependent.

  • Guardrails are preventive: they can block or redact known unsafe categories but are not a substitute for auditing subtle bias patterns.
  • Scores aren’t sufficient alone: automated toxicity/bias scoring can help flag content but may miss context and requires human judgment to validate.
  • Ops monitoring is different: latency/error alarms address reliability, not harmful or biased behavior in responses.

Question 46

Topic: Applications of Foundation Models

When iterating on prompts for an Amazon Bedrock text model, which practices help you experiment, compare outputs, and refine prompts systematically? (Select TWO.)

Options:

  • A. Avoid logging outputs; rely on memory during iterations

  • B. Use a fixed evaluation set and rerun after each change

  • C. Increase temperature to reduce hallucinations

  • D. Use only the model’s self-grading to compare prompts

  • E. Change one prompt element at a time and version it

  • F. Judge prompt quality using only a single “best-case” example

Correct answers: B and E

Explanation: Systematic prompt iteration relies on controlled comparisons. Keeping a fixed evaluation set and making one change at a time (with version tracking) lets you attribute output differences to specific prompt edits and reliably decide which prompt performs better.

The core idea in prompt iteration is controlled evaluation: compare prompt variants under the same conditions so you can confidently attribute differences in outputs to the prompt change.

Good high-level practices include:

  • Start with a small, representative, fixed evaluation set (multiple inputs) and rerun it for every prompt revision.
  • Change one prompt factor at a time (instructions, examples, format constraints), and version your prompt and results so you can reproduce and roll back.
  • Compare outputs using consistent criteria (e.g., correctness, completeness, tone, formatting adherence), ideally with human review for a sample.

Tuning randomness (temperature) can change style/variance but is not a reliable way to improve factuality, and single-example or unlogged evaluations make comparisons untrustworthy.

  • ✔ Fixed evaluation set enables consistent, repeatable comparisons across iterations.
  • ✔ One change at a time with versioning makes results attributable and auditable.
  • ✖ Increasing temperature generally increases variability, not factual reliability.
  • ✖ Single-example or memory-only evaluation undermines systematic comparison.

Question 47

Topic: Applications of Foundation Models

Which statement is INCORRECT about high-level qualitative criteria used to evaluate GenAI outputs (helpfulness, accuracy, relevance, safety)?

Options:

  • A. Helpfulness evaluates whether the response addresses the user’s intent and is actionable.

  • B. Accuracy evaluates whether claims are factually correct and can be validated against reliable sources.

  • C. Relevance evaluates whether the response stays on-topic and uses any provided context appropriately.

  • D. Safety checks can be skipped if an answer is helpful and accurate.

Best answer: D

Explanation: The incorrect statement is the one that treats safety as optional when other criteria look good. Safety is a separate qualitative dimension that checks for harmful, disallowed, or sensitive outputs even when the response is useful, on-topic, and factually correct. Evaluations typically consider all four criteria together to understand overall output quality.

Qualitative evaluation of GenAI outputs commonly scores multiple dimensions because they capture different failure modes. In the scenario, helpfulness, accuracy, relevance, and safety are complementary: a response can be helpful but irrelevant to the user’s request, relevant but inaccurate (hallucinated), or accurate and relevant but still unsafe (for example, providing instructions for wrongdoing or disclosing sensitive data).

A simple way to apply these criteria is:

  • Helpfulness: does it solve the user’s task?
  • Accuracy: are the claims correct and verifiable?
  • Relevance: is it on-topic and grounded in provided context (if any)?
  • Safety: does it avoid harmful or prohibited content and sensitive disclosures?

Key takeaway: safety is not implied by the other criteria and must be evaluated explicitly.

  • Safety as optional is misleading because policy-violating or sensitive content can be accurate and still unsafe.
  • Helpfulness definition is appropriate because it focuses on meeting user intent and usefulness.
  • Accuracy definition is appropriate because it focuses on factual correctness and verifiability.
  • Relevance definition is appropriate because it focuses on staying on-task and properly using supplied context.

Question 48

Topic: Fundamentals of Genai

A retail company wants marketing analysts (no coding experience) to quickly experiment with building simple GenAI applications (for example, a product-description generator) and share the prototypes with teammates. The company does not need production integration or custom model training yet.

Which AWS offering best fits this low-code/no-code experimentation requirement?

Options:

  • A. AWS Amplify hosting with a custom web UI that calls Amazon Bedrock

  • B. Amazon Bedrock PartyRock

  • C. Amazon Bedrock InvokeModel API integrated into an AWS Lambda function

  • D. Amazon SageMaker AI Studio to develop and deploy a custom model endpoint

Best answer: B

Explanation: Amazon Bedrock PartyRock is designed as a low-code/no-code playground for quickly experimenting with GenAI application ideas and sharing prototypes. In this scenario, the key discriminator is enabling non-technical users to build and iterate without writing code or setting up infrastructure.

The deciding attribute is a low-code/no-code experience intended for rapid GenAI experimentation. Amazon Bedrock PartyRock provides a simple interface to assemble GenAI app workflows (prompts and components) and share working prototypes, which matches the need for marketing analysts to experiment without coding or deployment steps.

By contrast, using the Bedrock API (even with AWS Lambda) or building a hosted application with Amplify still requires software development and application integration work, and SageMaker AI Studio is oriented toward building/training and deploying ML solutions rather than no-code GenAI prototyping. The best match is the option optimized for no-code experimentation and sharing.

  • API-based integration using Lambda is for production-style invocation and requires code.
  • ML development environment in SageMaker AI targets building/deploying ML models, not no-code GenAI prototyping.
  • Custom web app with Amplify still requires developing and maintaining an application UI and backend logic.

Question 49

Topic: Fundamentals of AI and ML

Which TWO statements about common classification model performance metrics are INCORRECT? (Select TWO.)

Options:

  • A. F1 is best when true negatives matter most.

  • B. F1 is the harmonic mean of precision and recall.

  • C. AUC-ROC summarizes ranking across all classification thresholds.

  • D. AUC-ROC requires choosing one threshold like accuracy does.

  • E. Accuracy can look high on heavily imbalanced classes.

  • F. Accuracy is the fraction of all predictions that are correct.

Correct answers: A and D

Explanation: AUC-ROC measures how well a classifier ranks positives above negatives across all possible thresholds, so it does not require selecting one cutoff. F1 focuses on the balance between precision and recall and ignores true negatives, so it is not a good choice when true negatives (specificity) are most important.

At a high level, these metrics answer different questions for classification:

  • Accuracy is the proportion of all predictions that are correct; it can be misleading when classes are imbalanced because predicting the majority class can still yield a high accuracy.
  • AUC-ROC summarizes how well the model ranks positive examples higher than negative examples across many decision thresholds (0.5 is roughly random ranking; 1.0 is ideal).
  • F1 is the harmonic mean of precision and recall, so it’s useful when you care about balancing false positives and false negatives; it does not account for true negatives.

Key takeaway: AUC-ROC is threshold-independent, while F1 and accuracy are evaluated at a chosen threshold.

  • AUC needs one cutoff is wrong because AUC-ROC aggregates performance across thresholds, not at a single threshold.
  • F1 prioritizes true negatives is wrong because true negatives are not part of the F1 calculation.
  • Accuracy on imbalance is a known pitfall; high accuracy can hide poor minority-class performance.
  • AUC as ranking metric is accurate because it evaluates separation over thresholds.

Question 50

Topic: Applications of Foundation Models

When configuring an application to call a foundation model (FM) in Amazon Bedrock, what is the main effect of setting a low maximum output length?

Options:

  • A. The model can no longer use long input prompts or retrieved context

  • B. The model must be retrained to support longer responses

  • C. The model becomes more accurate because it is forced to be concise

  • D. Responses can be truncated, and allowing longer outputs can increase per-call cost

Best answer: D

Explanation: Maximum output length limits how many tokens the FM is allowed to generate in its response. If set too low, the model may stop early and return an incomplete answer. Increasing the limit can raise cost because many FM services charge based on the number of output tokens generated.

The maximum output length (often expressed as a maximum number of output tokens) is a generation parameter that caps the model’s response size. If this limit is set too low, the model can hit the cap and stop generating, which can truncate an otherwise correct response and make it seem incomplete. Because FM usage is commonly metered by tokens, allowing the model to produce more output tokens can also increase the cost (and often latency) of each request. A practical design tradeoff is to set the output limit high enough for completeness while keeping it low enough to control spend for the use case.

  • Input vs output tokens confuses the response cap with the prompt/context size limit.
  • Conciseness guarantee is incorrect because shorter outputs are not inherently more accurate.
  • Retraining requirement is incorrect because output length is a runtime inference setting, not training.

Questions 51-65

Question 51

Topic: Fundamentals of AI and ML

A retail company’s website shows “recommended for you” products. Today, a nightly batch job on a single Amazon EC2 instance generates the same top-10 list for each customer segment and stores it in Amazon S3. Traffic is increasing to millions of users, and the company wants recommendations that adapt to each user’s recent clicks within minutes, with minimal operational overhead and predictable cost.

Which change best meets these requirements?

Options:

  • A. Use Amazon Personalize for real-time, per-user recommendations

  • B. Cache the segment-based recommendation lists with Amazon CloudFront to reduce latency

  • C. Use a foundation model in Amazon Bedrock to generate recommendations from the full catalog for each page view

  • D. Keep the batch job but add an EC2 Auto Scaling group to run more nightly workers

Best answer: A

Explanation: Amazon Personalize is purpose-built to deliver scalable, individualized recommendations that can incorporate recent user interactions quickly. It reduces operational effort compared with maintaining custom recommender infrastructure and avoids the high per-request cost and latency of generating recommendations with a general-purpose foundation model.

The core value of AI/ML in this scenario is improving personalization at scale: recommendations should be tailored per user and refresh quickly as behavior changes, without the team operating custom training and serving stacks. Amazon Personalize is a managed AWS service designed for recommender use cases and can produce real-time, per-user results using interaction events (for example, clicks and purchases), while AWS handles scaling and much of the operational burden.

By contrast, scaling a nightly batch pipeline or caching segment-level lists improves throughput/latency but does not provide the required per-user, rapidly adapting personalization. Using a foundation model to generate recommendations per page view is typically higher latency and cost and is not optimized for large-scale ranking compared to a recommender service.

Key takeaway: choose managed recommender ML when the goal is scalable, behavior-driven personalization.

  • More batch workers scales the existing approach but still produces segment-level, slow-to-update recommendations.
  • LLM-generated recommendations can work but is usually dominated on cost/latency for high-volume ranking.
  • Caching reduces response time but does not improve personalization freshness or per-user relevance.

Question 52

Topic: Security, Compliance, and Governance for AI Solutions

Which governance framework is specifically intended to help an organization scope and assess security risks introduced by generative AI workloads at a high level (for example, by mapping risks across components such as data, model, prompts, and applications)?

Options:

  • A. Generative AI Security Scoping Matrix

  • B. AWS Artifact

  • C. AWS Well-Architected Framework

  • D. IAM Access Analyzer

Best answer: A

Explanation: The Generative AI Security Scoping Matrix is a GenAI-focused governance framework used to identify and scope security risks for generative AI solutions. It helps structure risk discussions by relating potential threats and controls to major GenAI components such as data sources, models, prompts, and consuming applications.

A key governance need for generative AI is quickly scoping what new risks exist and where they show up in the solution (for example, prompt injection, data leakage, and unsafe outputs). The Generative AI Security Scoping Matrix is intended for this high-level scoping: it provides a structured way to map GenAI security considerations to the major parts of a GenAI system (data, model/provider, orchestration/prompting, and application/integration). This helps teams identify which areas require additional controls, assurance, and ownership alignment before selecting detailed technical mitigations. In contrast, general architecture guidance, AWS account-level access analysis, or compliance report delivery do not specifically provide a GenAI risk-scoping framework.

  • General architecture guidance is broader and not GenAI risk-scoping specific.
  • Access analysis tooling focuses on identifying unintended IAM/resource access, not mapping GenAI risks across prompts/models/apps.
  • Compliance report portal provides audit artifacts and agreements, not a framework to scope GenAI security risks.

Question 53

Topic: Fundamentals of Genai

In a GenAI application built on Amazon Bedrock, the model’s output will be used to draft customer-facing responses and could affect refunds and account actions. Which statement best describes when human oversight (human-in-the-loop) is needed in this workflow?

Options:

  • A. Use Bedrock Guardrails so humans are not required for any outputs

  • B. Fine-tune the model so oversight is only needed during training

  • C. Enable RAG so the model never needs human review

  • D. Add a human review/approval step before high-impact actions or external responses

Best answer: D

Explanation: Human-in-the-loop means a person reviews, corrects, or approves GenAI outputs before they are sent to users or used to trigger consequential actions. It is most appropriate when errors could cause material harm, such as incorrect account changes, refunds, or policy violations.

Human-in-the-loop (HITL) is a control where a person reviews, edits, or approves GenAI outputs before the system treats them as final—especially for customer-facing communication or actions with financial, legal, or safety impact. In the refund/account-action scenario, HITL reduces risk from hallucinations, misinterpretation of policy, or unsafe content by requiring human validation at key decision points.

Guardrails and RAG can reduce unsafe or incorrect outputs, but they do not guarantee correctness; HITL provides an explicit accountability checkpoint when the business impact of mistakes is high. The key takeaway is to use HITL when outputs drive consequential decisions or require high confidence.

  • RAG guarantee misconception: RAG can ground answers in retrieved data but does not eliminate errors or misapplication of policy.
  • Guardrails replace humans: Guardrails help filter or constrain content, but they are not a substitute for approval in high-stakes use.
  • Fine-tuning limits oversight: Fine-tuning may improve task performance, but review is still needed when outputs can cause material harm.

Question 54

Topic: Fundamentals of AI and ML

A product team is reviewing an ML deployment note.

Exhibit: Run summary

1) Training container: XGBoost built-in
2) Training method: gradient boosting decision trees
3) Output artifact: s3://ml-bucket/churn/model.tar.gz
4) Artifact contents: trained trees + learned weights
5) Inference endpoint loads: model.tar.gz

Based on the exhibit, which interpretation best distinguishes the model from the algorithm at a high level?

Options:

  • A. Algorithm is the S3 bucket; model is the endpoint

  • B. Algorithm is the learned weights; model is the training container

  • C. Algorithm is model.tar.gz; model is gradient boosting

  • D. Algorithm is XGBoost; model is the trained model.tar.gz

Best answer: D

Explanation: The exhibit separates the learning procedure from the learned result. The algorithm is the method used during training (XGBoost / gradient boosting decision trees in lines 1–2). The model is the trained artifact produced by that process and loaded for inference (model.tar.gz in lines 3–5).

At a high level, an algorithm is the procedure used to learn from data, and a model is the learned representation produced by running that procedure.

In the exhibit:

  • Lines 1–2 (“XGBoost built-in” and “gradient boosting decision trees”) describe the training method, which corresponds to the algorithm.
  • Lines 3–5 identify a concrete output (“model.tar.gz”), describe what it contains (“trained trees + learned weights”), and show it is loaded by the endpoint for inference—this is the model.

A quick check is: if it’s reused to make predictions after training, it’s the model artifact; if it describes how learning happens, it’s the algorithm.

  • Reverses terms mislabels the training method (lines 1–2) as the model and the output artifact (lines 3–5) as the algorithm.
  • Confuses infrastructure with ML concepts treats storage and endpoints as the model/algorithm, but they only host and serve the artifact.
  • Mixes components calls the learned weights an algorithm even though line 4 shows they are part of the trained artifact (the model).

Question 55

Topic: Fundamentals of AI and ML

A company is building a customer churn prediction model in Amazon SageMaker AI. The training data in Amazon S3 includes usage metrics, billing fields, support history, and some customer PII (names, emails). The team proposes the following feature engineering actions to improve model performance.

Which TWO actions are NOT appropriate because they are anti-patterns or unsafe governance/security practices? (Select TWO.)

Options:

  • A. Normalize numeric fields such as monthly charges before training

  • B. Engineer features in a local spreadsheet and overwrite the S3 dataset without versioning or an audit trail

  • C. Bucket customer tenure into ranges to reduce outlier sensitivity

  • D. One-hot encode contract type and payment method

  • E. Use raw customer email address and full name as model input features

  • F. Create a feature for total support tickets in the last 90 days

Correct answers: B and E

Explanation: Feature engineering improves model performance by transforming raw fields into more informative and well-behaved inputs (for example, aggregates, encodings, and scaled values). However, feature creation must follow responsible data use: avoid unnecessary PII and maintain traceable, auditable data preparation. Actions that increase privacy exposure or break lineage and auditability are anti-patterns.

Feature engineering is the process of converting raw data into features that help a model learn relevant patterns (for example, by adding signal, reducing noise, or making values easier for an algorithm to use). In this churn scenario, common high-level feature engineering includes creating aggregates from event history, encoding categorical variables into numeric representations, and applying scaling or bucketing to stabilize learning.

Governance matters during feature engineering because the same steps can introduce risk:

  • Prefer derived, task-relevant features over direct identifiers (data minimization).
  • Keep transformations reproducible with versioning and lineage so results can be audited and explained.

A key takeaway is that effective feature engineering should improve predictive signal while also reducing unnecessary sensitivity and preserving traceability.

  • Aggregation and history signals like ticket counts are common engineered features that can add predictive signal.
  • Encoding categories (for example, one-hot encoding) is a standard way to make categorical inputs usable by many models.
  • Scaling or bucketing numeric inputs can improve training stability and reduce sensitivity to extreme values.
  • Identifiers and untracked transforms are risky because they add privacy exposure and remove reproducibility/auditability.

Question 56

Topic: Fundamentals of Genai

A company is exploring GenAI on AWS for multiple business processes. The company wants to avoid using GenAI in situations that require fully deterministic outcomes, strict constraints, and minimal tolerance for errors.

Which TWO situations are poor fits for GenAI at a high level? (Select TWO.)

Options:

  • A. Summarize support chats while masking PII using guardrails

  • B. Make final loan approvals/denials under audited regulatory rules

  • C. Generate multiple marketing taglines for an upcoming campaign

  • D. Answer employee policy questions using a cited internal knowledge base

  • E. Let GenAI automatically decide and block card transactions in real time

  • F. Draft clinician visit summaries for review before filing

Correct answers: B and E

Explanation: GenAI is a poor fit when the task demands deterministic, tightly constrained decisions with very low error tolerance, especially in regulated or high-stakes workflows. In those cases, a rules-based or traditional deterministic system (often with explicit approvals) is more appropriate than probabilistic text generation.

The core limitation is that GenAI outputs are probabilistic and can be inconsistent or hallucinate, which makes it unsuitable as the decision-maker for workflows that require strict determinism and auditability. In regulated, high-impact decisions (for example, credit decisions) organizations typically need traceable logic, repeatable outcomes, and clear accountability.

GenAI is usually a better fit for assistive tasks such as drafting, summarizing, searching, and explaining content—especially when you add controls like citations (RAG), guardrails, and human review. Key takeaway: use GenAI to assist humans and deterministic systems, not to replace them in high-stakes deterministic enforcement or adjudication.

  • ✔ Make final loan approvals/denials under audited regulatory rules: requires deterministic, auditable decisioning and consistent outcomes.
  • ✖ Generate multiple marketing taglines for an upcoming campaign: open-ended creative generation is a common GenAI use case.
  • ✖ Draft clinician visit summaries for review before filing: acceptable as an assistive draft with human validation.
  • ✔ Let GenAI automatically decide and block card transactions in real time: enforcement decisions need strict, predictable behavior with minimal error tolerance.
  • ✖ Answer employee policy questions using a cited internal knowledge base: well-suited to RAG with citations and access controls.
  • ✖ Summarize support chats while masking PII using guardrails: summarization with safety controls is a typical GenAI pattern.

Question 57

Topic: Guidelines for Responsible AI

A company uses a closed, third-party LLM API for an internal knowledge assistant. New governance requirements say the team must be able to inspect the model artifacts for documentation (to improve transparency) and must confirm the model can be used commercially in customer-facing features. The team also wants to reduce ongoing operational effort.

Which change best meets these requirements while optimizing the solution?

Options:

  • A. Download an open-source model from the internet and deploy it on self-managed EC2 without reviewing the license

  • B. Adopt an open-source model in SageMaker JumpStart and verify its license permits commercial use and required attributions

  • C. Fine-tune the closed LLM API on company data to improve explainability

  • D. Keep the closed LLM API and rely on the vendor’s model card for transparency

Best answer: B

Explanation: Using an open-source model can improve transparency because the team can inspect model artifacts (such as weights and documentation) instead of relying solely on a vendor. Verifying the license terms at a high level (commercial use, redistribution/derivatives, and attribution requirements) ensures the model can be used legally in the intended product. Deploying through SageMaker JumpStart reduces operational effort compared with self-managed hosting.

Open-source models can support transparency and explainability goals because organizations can review available artifacts (for example, model code, weights, and accompanying documentation) and better document how the model is used. However, “open-source” does not automatically mean “free to use for any purpose.” Before adopting a model, teams should verify license terms at a high level, such as whether commercial use is allowed, whether attribution is required, and whether there are restrictions on redistribution or creating derivative works.

Using a managed option like SageMaker JumpStart can further optimize the solution by reducing ongoing operational work (packaging, deployment tooling, and scaling) compared with running everything on self-managed infrastructure. The key takeaway is to pair transparency benefits from open-source with an explicit license check for the intended use.

  • Model card only can help documentation but does not provide the same inspectability as open model artifacts.
  • Fine-tuning a closed model might improve task performance but does not change the lack of access to underlying model artifacts for transparency.
  • Skip license review creates compliance risk even if the model is technically open-source, and self-managed EC2 increases operational burden.

Question 58

Topic: Security, Compliance, and Governance for AI Solutions

A company is preparing an AI model for internal use. The team uses Amazon SageMaker Model Cards to record the model’s intended use, training data origins, evaluation results, and known limitations so that reviewers and auditors can understand the model’s context.

Which responsible AI principle does this practice most directly support?

Options:

  • A. Least privilege for controlling who can invoke the model

  • B. Defense in depth by adding multiple security layers

  • C. Data minimization by reducing the amount of data collected

  • D. Transparency through clear model documentation

Best answer: D

Explanation: Amazon SageMaker Model Cards are used to document key information about a model, including its intended use, training data origins, evaluation metrics, and limitations. This documentation helps stakeholders understand how and when the model should be used, which is primarily a transparency practice.

The core principle is transparency: making an AI system understandable to stakeholders through clear, consistent documentation. Amazon SageMaker Model Cards provide a structured way to capture model context such as intended use, training and evaluation datasets (data origins/provenance), performance characteristics, and known limitations or risks. This supports governance reviews and audits because reviewers can trace what the model was designed to do and what it should not be used for. The key takeaway is that Model Cards focus on explainability and disclosure of model facts, not on access control or layered security mechanisms.

  • Access control focus is about limiting permissions (least privilege), not documenting intent and limitations.
  • Layered protection describes defense in depth (multiple controls), not model documentation.
  • Collect less data is data minimization, which is different from recording data origins and limitations.

Question 59

Topic: Applications of Foundation Models

Which statement best describes model distillation and why it is used in generative AI workloads?

Options:

  • A. Training a smaller student model to mimic a larger teacher model to reduce latency and cost

  • B. Retrieving enterprise documents at inference time to ground responses and reduce hallucinations

  • C. Adding safety filters to block restricted content in model inputs and outputs

  • D. Fine-tuning a foundation model on labeled data to improve accuracy on a specific task

Best answer: A

Explanation: Model distillation is a compression technique where a smaller “student” model learns to reproduce the outputs/behavior of a larger “teacher” model. It is used to create models that are faster and cheaper to run (for example, for lower-latency inference) while aiming to retain much of the teacher model’s quality.

Distillation is a training approach used to produce a smaller or more efficient model by having it learn from a larger, higher-quality model. In a typical setup, a large “teacher” model generates target outputs (often probability distributions or generated text), and a smaller “student” model is trained to match the teacher’s behavior. The practical reason to use distillation in foundation model applications is efficiency: smaller models usually have lower inference latency and require fewer compute resources, which can reduce serving cost and help meet performance constraints. Distillation is different from fine-tuning for task specialization, retrieval-augmented generation for grounding with external knowledge, and guardrails for safety controls. The key takeaway is that distillation primarily targets making models smaller/faster while preserving behavior.

  • Fine-tuning vs distillation fine-tunes for task adaptation, not primarily to make a smaller model.
  • RAG grounding improves factuality by fetching documents at inference time, not by compressing a model.
  • Safety controls are enforced by guardrails/filters and do not create a smaller model.

Question 60

Topic: Guidelines for Responsible AI

A retail bank uses a generative AI solution on AWS to draft loan approval recommendations from applicant documents. The bank is concerned about biased outcomes that could disadvantage protected groups and damage customer trust.

Which action should the bank AVOID?

Options:

  • A. Continuously monitor approval rates by segment to detect drift

  • B. Evaluate recommendations for bias across protected groups before launch

  • C. Keep a human loan officer as the final decision maker for edge cases

  • D. Automate approvals with no bias evaluation or applicant appeal process

Best answer: D

Explanation: The anti-pattern is deploying the model to make high-stakes decisions without measuring disparate impact, without human oversight, and without an appeal path. Biased outputs can disproportionately harm protected groups and quickly erode customer trust. Responsible AI requires evaluation, accountability, and ongoing monitoring for bias over time.

Bias risk is especially critical in high-stakes domains like lending because model outputs can create disparate impact for protected groups (for example, systematically lower approval rates) and undermine customer trust. A responsible approach is to treat model output as decision support, not an unquestioned decision maker, and to put controls in place to detect and reduce bias.

Practical steps include:

  • Evaluate for bias before deployment (for example, with Amazon SageMaker Clarify) using representative data and fairness metrics.
  • Add human oversight and an appeals/review process to improve accountability and reduce harm.
  • Monitor post-deployment outcomes (approval rates and output patterns) to detect drift and emerging bias.

The key takeaway is that “deploy and trust the model” without evaluation and recourse is a clear responsible-AI failure.

  • Pre-deployment bias testing is appropriate because it helps identify disparate impact before customers are affected.
  • Human-in-the-loop oversight is appropriate because it adds accountability for high-stakes decisions and reduces harm from erroneous or biased outputs.
  • Ongoing monitoring is appropriate because bias can emerge over time due to data drift or changing populations, impacting fairness and trust.

Question 61

Topic: Guidelines for Responsible AI

A financial services company will fine-tune a foundation model on historical customer-support chats. An internal audit requirement states: “We must be able to prove which approved source systems contributed to the training data and remove a specific customer’s records if requested.”

Which approach best meets this requirement?

Options:

  • A. Fine-tune mostly on synthetic examples; skip tracking original sources

  • B. Fine-tune on all available logs; discard the raw source records

  • C. Fine-tune using a versioned, curated dataset with documented lineage

  • D. Use RAG and ignore training-data lineage for the fine-tune

Best answer: C

Explanation: The deciding factor is data provenance for fine-tuning: the company must trace training examples back to approved sources and be able to delete specific contributed records. A curated, versioned dataset with documented lineage preserves an auditable chain of custody and supports reproducible training runs.

When training or fine-tuning, curated data sources and data provenance are essential for responsible AI because model behavior is influenced by what you train on. In this scenario, the requirement is an auditable record of which approved systems contributed data and the ability to honor deletion requests, which both depend on traceability.

A strong high-level pattern is:

  • Curate data from approved sources only (quality, consent/licensing, scope).
  • Preserve lineage metadata that links training examples back to source records.
  • Version the exact dataset snapshot used for each fine-tuning run for reproducibility.

The closest alternative is using RAG, which can improve response transparency at retrieval time, but it does not satisfy a requirement to prove and manage the provenance of the fine-tuning dataset itself.

  • Discarding raw sources breaks the audit trail needed to prove which systems contributed data.
  • Synthetic-only focus can still introduce risk and does not eliminate the need for provenance of any real data used.
  • RAG instead of provenance may help cite retrieved documents, but it doesn’t govern fine-tuning data lineage.

Question 62

Topic: Guidelines for Responsible AI

A financial services company is building a customer-facing chatbot using a foundation model in Amazon Bedrock. The company must prevent users from submitting or receiving personally identifiable information (PII), block harmful or offensive content, and ensure the bot answers only from approved policy documents with clear citations.

Which solution BEST meets these requirements with the least operational overhead?

Options:

  • A. Encrypt all prompts and responses with AWS KMS customer managed keys

  • B. Use Amazon Bedrock Guardrails with sensitive info, topic, and grounding controls

  • C. Run Amazon Comprehend on responses to detect toxicity before displaying them

  • D. Fine-tune a custom model in Amazon SageMaker AI to follow company policies

Best answer: B

Explanation: Amazon Bedrock Guardrails is designed to constrain both user inputs and model outputs with configurable policies. It can help filter harmful content, detect and handle sensitive information such as PII, and apply grounding controls to keep responses tied to approved sources with citations. This directly addresses responsible AI requirements without building custom filtering pipelines.

Bedrock Guardrails provides policy-based controls that sit around model invocation to help enforce responsible AI behavior at runtime. In this chatbot, the key needs are to constrain what users can ask (inputs) and what the model can return (outputs), including preventing PII exposure and restricting the assistant to an approved knowledge scope with citations.

With Guardrails you can:

  • Apply input/output content filters (for example, hate/offensive content).
  • Detect and handle sensitive information such as PII.
  • Use contextual grounding controls to keep responses aligned to approved sources and support citations.

Encryption or after-the-fact analysis can help with security or detection, but they don’t reliably enforce safe, scoped responses during generation.

  • Encryption is not behavior control because KMS protects data at rest/in transit but does not filter unsafe prompts or unsafe model outputs.
  • Post-processing only with a separate NLP service can detect issues after generation and may miss PII or scope violations without comprehensive blocking.
  • Custom fine-tuning increases effort and still benefits from runtime guardrails for consistent input/output constraints.

Question 63

Topic: Security, Compliance, and Governance for AI Solutions

A company is building a GenAI assistant for its compliance team to answer questions about internal HR and security policies. The team requires each answer to include source citations, and auditors must be able to trace responses back to the exact document location and version used. The solution must keep sensitive documents private and minimize custom development.

Which solution BEST meets these requirements?

Options:

  • A. Use Amazon Comprehend to extract entities and generate answers from them

  • B. Use Amazon Bedrock Knowledge Bases (RAG) with S3 versioning and citations

  • C. Fine-tune a foundation model on policies and disable retrieval

  • D. Use Bedrock Guardrails only to enforce safe, policy-compliant responses

Best answer: B

Explanation: Source citation and data provenance require the system to show where each claim came from and to preserve a traceable link back to the original records (including versions). Amazon Bedrock Knowledge Bases supports retrieval-augmented generation over private enterprise data and can return citations to the underlying source chunks, helping auditors validate exactly which documents were used.

Source citation is the ability to show users the specific documents (and passages) used to produce an answer, while data provenance is the end-to-end traceability of where that information originated (location, ownership, and version/history). In this scenario, auditors need repeatable evidence of which exact policy document version supported each response.

Amazon Bedrock Knowledge Bases implements RAG over a controlled corpus (for example, documents stored privately in Amazon S3) and can include source references/citations in responses. Pairing this with S3 versioning and consistent metadata helps preserve an auditable lineage from an answer back to the exact stored object version and its source, improving trust, compliance review, and change tracking compared with “black box” generation.

  • Fine-tuning instead of citations makes the model’s outputs harder to trace to specific documents and versions.
  • Guardrails only can restrict unsafe content but does not provide grounding, citations, or provenance to specific sources.
  • Entity extraction summarizes text into entities but does not produce grounded, citable answers with document-level traceability.

Question 64

Topic: Fundamentals of AI and ML

A company is building a regression model in Amazon SageMaker AI to predict equipment failures. The training data comes from multiple IoT systems and includes duplicate records, different numeric scales (for example, temperature in °C and vibration in mm/s), and some missing sensor values.

Which statement about data preprocessing is INCORRECT?

Options:

  • A. Skip preprocessing because managed ML algorithms will automatically clean, normalize, and handle missing values without impact.

  • B. Clean the dataset to remove duplicates and obvious data-entry errors before training.

  • C. Normalize or standardize numeric features so differences in scale do not skew learning.

  • D. Handle missing values (impute or remove) and record the approach to reduce training issues.

Best answer: A

Explanation: Data preprocessing is necessary because most ML models assume consistent, representative input data. Cleaning, scaling/normalization, and handling missing values help prevent biased learning, unstable training, and misleading performance metrics. Treating raw exports as “model-ready” is a common anti-pattern that leads to unreliable results.

A key step in the ML development lifecycle is preparing data so the model learns signal rather than artifacts. Real-world datasets often contain errors (duplicates, invalid values), mixed scales (one feature numerically dominates others), and missing values (which many algorithms cannot use directly or may treat inconsistently). Preprocessing addresses these issues by improving data quality and making feature distributions comparable, which generally leads to more stable training and more trustworthy evaluation.

Typical high-level preprocessing includes:

  • Cleaning: remove duplicates, fix obvious errors
  • Handling missing values: impute, remove, or flag missingness
  • Normalization/standardization: put numeric features on comparable scales

The key takeaway is that preprocessing reduces noise and bias so model performance reflects real patterns, not data problems.

  • Basic data cleaning is a standard first step to reduce noise from duplicates and obvious errors.
  • Feature scaling is commonly needed so large-magnitude features don’t disproportionately influence training.
  • Missing-value handling is necessary because absent data can break training or introduce unintended bias if left unmanaged.
  • Assuming algorithms auto-fix raw data is risky; “managed” does not mean data quality issues are automatically resolved without impact.

Question 65

Topic: Fundamentals of Genai

Which statement best describes where feedback loops and iteration occur in the foundation model (FM) lifecycle and why they matter?

Options:

  • A. They occur only during initial model pretraining because that is when the model learns from data

  • B. They are generally unnecessary once a model is selected because inference behavior is fixed

  • C. They continue after deployment by monitoring outputs and user feedback to refine prompts/guardrails and improve quality and safety

  • D. They are needed only when building a supervised ML model with labeled data, not for FMs

Best answer: C

Explanation: Feedback loops in an FM lifecycle are not a one-time step; they happen repeatedly, including after deployment. Monitoring real prompts, responses, and user feedback enables iterative improvements (for example, prompt and guardrail adjustments) to maintain output quality and reduce harmful or inaccurate behavior.

The core idea is that FM applications are improved through iteration based on evaluation results and real-world usage signals. Teams typically iterate during development (testing prompts, retrieval strategies, and safety controls) and also after release (monitoring production outputs and collecting user feedback).

This matters because model behavior can be unpredictable across different inputs and can change in practice as prompts, data sources, or user behavior evolves. Using feedback loops helps identify failure modes (hallucinations, policy violations, poor relevance) and then refine the application layer (prompts, RAG, guardrails) or, when appropriate, update the model configuration to improve quality and safety over time.

  • Pretraining only is too narrow; iteration is also needed during evaluation and after release.
  • Only supervised learning is incorrect; GenAI apps benefit from human and system feedback even without labeled datasets.
  • Inference is fixed ignores that application-level controls and configurations are routinely adjusted based on observed results.

Continue with full practice

Use the AWS AIF-C01 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try AWS AIF-C01 on Web View AWS AIF-C01 Practice Test

Focused topic pages

Free review resource

Read the AWS AIF-C01 Cheat Sheet on Tech Exam Lexicon for concept review before another timed run.

Revised on Thursday, May 14, 2026