Prepare for AWS Certified Generative AI Developer - Professional (AIP-C01) with free sample questions, a full-length diagnostic, topic drills, timed practice, and detailed explanations covering Amazon Bedrock, RAG, agents, safety, governance, optimization, and production troubleshooting in IT Mastery.
AIP-C01 is AWS Certified Generative AI Developer - Professional. It validates advanced technical expertise in building and deploying production-ready generative AI solutions on AWS, especially solutions that integrate foundation models into applications and business workflows using services such as Amazon Bedrock.
IT Mastery practice for AIP-C01 is live now. Use this page to start the web simulator, review the exam snapshot, work through 24 public sample questions, and continue into full IT Mastery practice with the same IT Mastery account on web, iOS, iPadOS, macOS, or Android.
Start a practice session for AWS Certified Generative AI Developer - Professional (AIP-C01) below, or open the full app in a new tab. For the best experience, open the full app in a new tab and navigate with swipes/gestures or the mouse wheel—just like on your phone or tablet.
Open Full App in a New Tab
A small set of questions is available for free preview. Subscribers can unlock full access by signing in with the same app-family account they use on web and mobile.
Prefer to practice on your phone or tablet? Download the IT Mastery – AWS, Azure, GCP & CompTIA exam prep app for iOS or IT Mastery app on Google Play (Android) and use the same IT Mastery account across web and mobile.
Free diagnostic: Try the 75-question AWS AIP-C01 full-length practice exam before subscribing. Use it as one professional-level GenAI baseline, then return to IT Mastery for timed mocks, domain drills, explanations, and the full AIP-C01 question bank.
AIP-C01 questions reward production-grade GenAI decisions: choosing the right foundation model integration pattern, grounding and evaluating outputs, securing data and identities, controlling cost and latency, and troubleshooting deployed AI workflows.
| Domain | Weight |
|---|---|
| Foundation Model Integration, Data Management, and Compliance | 31% |
| Implementation and Integration | 26% |
| AI Safety, Security, and Governance | 20% |
| Operational Efficiency and Optimization for GenAI Applications | 12% |
| Testing, Validation, and Troubleshooting | 11% |
Use these readiness checks before choosing between two advanced Bedrock or GenAI architecture options:
| Area | What strong readiness looks like |
|---|---|
| FM integration, data, and compliance | You can design grounded GenAI applications that respect data boundaries, compliance expectations, and model-access constraints. |
| Implementation and integration | You can connect Bedrock, agents, tools, APIs, vector stores, prompt workflows, and application code into production workflows. |
| Safety, security, and governance | You can apply guardrails, IAM, encryption, monitoring, human oversight, and responsible-AI controls to risky outputs. |
| Operational efficiency | You can optimize latency, cost, throughput, model selection, token usage, and scaling without weakening quality or safety. |
| Testing and troubleshooting | You can diagnose bad retrieval, poor prompts, unsafe outputs, model mismatch, missing permissions, and weak evaluation design. |
| Day | Practice focus |
|---|---|
| 7 | Take the free full-length diagnostic and separate misses into architecture, implementation, governance, optimization, and troubleshooting. |
| 6 | Drill foundation-model integration, RAG, vector search, knowledge bases, data handling, and compliance boundaries. |
| 5 | Drill agents, tool use, prompt workflows, API integration, orchestration, and application implementation scenarios. |
| 4 | Drill safety, guardrails, responsible AI, IAM, encryption, monitoring, and audit controls. |
| 3 | Drill cost, latency, token usage, model selection, scaling, and operational optimization decisions. |
| 2 | Complete a timed mixed set and explain the production failure mode or trade-off behind each miss. |
| 1 | Review only weak GenAI patterns and troubleshooting signals; avoid late memorization of unfamiliar feature details. |
If you can score above roughly 75% on several unseen mixed attempts and explain how each answer handles grounding, safety, operations, and cost, you are ready to treat the exam as a reasoning test rather than a memorization exercise. More practice should improve production GenAI judgment, not just recognition of repeated scenarios.
Use these child pages when you want focused IT Mastery practice before returning to mixed sets and timed mocks.
Need concept review first? Read the AWS AIP-C01 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks, topic drills, and full IT Mastery practice.
These are original IT Mastery practice questions aligned to AWS generative AI solution design, Amazon Bedrock service selection, evaluation, security, deployment, and operational decisions. They are not AWS exam questions and are not copied from any exam sponsor. Use them to check readiness here, then continue in IT Mastery with mixed sets, topic drills, and timed mocks.
Topic: Content Domain 2: Implementation and Integration
A developer is building a browser-based chat application on Amazon Bedrock. After the user submits a prompt over HTTPS, the UI must render partial model output as it is generated. The connection path supports long-lived HTTP responses but blocks WebSocket upgrades, and the application does not need bidirectional messages during generation. Which integration pattern best matches this requirement?
Best answer: A
Explanation: The requirement is real-time, one-way token delivery to a browser over an HTTP-compatible path. Amazon Bedrock streaming APIs can emit partial output, and server-sent events or chunked transfer can relay those chunks to the client without requiring WebSocket support. For browser chat responses that need progressive rendering, the key concept is streaming the foundation model output and preserving that stream through the application layer. Bedrock streaming APIs such as ConverseStream or InvokeModelWithResponseStream return incremental chunks from the model. The backend can translate those chunks into server-sent events or an HTTP chunked response so the browser can update the UI as data arrives. This fits the stated network constraint because SSE uses standard HTTP rather than a WebSocket upgrade. WebSockets are useful when bidirectional, low-latency messaging is required, but they are not the best fit when upgrades are blocked and only server-to-client updates are needed.
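To make the pattern concrete, here is a minimal sketch of relaying Bedrock streaming output as server-sent events, assuming Flask and boto3 with Bedrock runtime access; the model ID and route are placeholders, not part of the exam item.

```python
# Minimal sketch: relay Amazon Bedrock streaming output to a browser as
# server-sent events (SSE) over plain HTTP, with no WebSocket upgrade.
# Assumes Flask and boto3 are installed and credentials grant Bedrock
# runtime access; the model ID is a placeholder.
import boto3
from flask import Flask, Response, request

app = Flask(__name__)
bedrock = boto3.client("bedrock-runtime")

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # placeholder model ID

def stream_chunks(prompt: str):
    """Yield SSE-framed text deltas from a ConverseStream call."""
    response = bedrock.converse_stream(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    for event in response["stream"]:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        text = delta.get("text")
        if text:
            # SSE frame: "data: <payload>\n\n" over a long-lived HTTP response
            yield f"data: {text}\n\n"
    yield "data: [DONE]\n\n"

@app.post("/chat")
def chat():
    prompt = request.get_json()["prompt"]
    return Response(stream_chunks(prompt), mimetype="text/event-stream")
```

On the client side, the browser's standard EventSource or fetch-streaming APIs can consume this endpoint and append each data frame to the chat transcript as it arrives.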
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
A company operates a custom RAG support assistant. Manuals stored in Amazon S3 are chunked, embedded by an ingestion Lambda function, and stored in an OpenSearch Service vector index. A query Lambda function embeds user questions, runs k-NN search, and calls Amazon Bedrock. Since a recent release, answers contain hallucinations and Recall@5 has dropped from 0.84 to 0.46. Logs show the new ingestion job uses a different embedding model and normalization step, while the query Lambda function and 70% of existing vectors still use the previous embedding model. Both models produce the configured vector dimension, so searches do not error.
Which change fixes the root cause with the smallest safe change?
Best answer: C
Explanation: The retrieval system is using mixed embedding spaces. Even with matching vector dimensions, similarity scores become unreliable when documents and queries are embedded with different models or preprocessing. The smallest safe fix is to rebuild the index with one approved embedding pipeline and cut over consistently. Symptom: hallucinated answers and a sharp Recall@5 drop appeared after the ingestion release, while searches still returned results. Root cause: the vector index now contains embeddings created with different model and preprocessing configurations, so k-NN similarity is no longer comparing vectors from one semantic space. Fix: create a new index, re-embed all chunks with the same embedding model and normalization used by the query path, then cut over the application to that index. Increasing retrieval volume or changing the generator does not repair the retrieval signal.
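A minimal sketch of the rebuild-and-cutover fix follows, assuming opensearch-py and boto3; the index names, field names, embedding model ID, and auth setup are placeholders for illustration.

```python
# Minimal sketch: re-embed all chunks with one approved embedding model into
# a fresh OpenSearch index, then cut over atomically via an alias.
# Index/field names and the model ID are placeholders; auth config omitted.
import json
import boto3
from opensearchpy import OpenSearch, helpers

bedrock = boto3.client("bedrock-runtime")
client = OpenSearch(hosts=[{"host": "search-example.us-east-1.es.amazonaws.com",
                            "port": 443}], use_ssl=True)

EMBED_MODEL = "amazon.titan-embed-text-v2:0"  # the single approved model

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(modelId=EMBED_MODEL,
                                body=json.dumps({"inputText": text}))
    return json.loads(resp["body"].read())["embedding"]

def rebuild(chunks, new_index="kb-chunks-v2", alias="kb-chunks"):
    client.indices.create(index=new_index, body={
        "settings": {"index.knn": True},
        "mappings": {"properties": {
            "text": {"type": "text"},
            "vector": {"type": "knn_vector", "dimension": 1024},
        }},
    })
    # Every chunk goes through the same model and preprocessing.
    actions = ({"_index": new_index,
                "_source": {"text": c["text"], "vector": embed(c["text"])}}
               for c in chunks)
    helpers.bulk(client, actions)
    # Atomic cutover: point the query alias at the rebuilt index only.
    client.indices.update_aliases(body={"actions": [
        {"remove": {"index": "*", "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]})
```

Querying through an alias rather than a raw index name is what makes the cutover a single atomic step instead of a risky in-place migration.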
Topic: Content Domain 3: AI Safety, Security, and Governance
A development team is defining safety controls for a customer support assistant that uses Amazon Bedrock. The assistant must block prompt injection attempts before retrieval, enforce content policy during model invocation, and redact sensitive values before responses are returned. Which principle best maps to this requirement?
Best answer: A
Explanation: The requirement calls for layered safety controls across the full GenAI flow. Defense in depth applies preprocessing checks, model-invocation controls such as Bedrock Guardrails, and post-processing validation or redaction so one missed detection does not become a user-facing failure. For GenAI safety, defense in depth means placing complementary controls at multiple stages: input validation and prompt-injection screening before retrieval or tool use, policy enforcement during model invocation, and output validation, schema checks, citation checks, or PII redaction before returning a response. This pattern is stronger than relying only on a single guardrail because different controls catch different failure modes and provide better governance evidence. The key takeaway is to design safety as a lifecycle pattern, not as one isolated runtime filter.
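As an illustration of the layered pattern, here is a minimal sketch using the Amazon Bedrock ApplyGuardrail API at the input and output stages; the guardrail ID, version, and `generate` callable are placeholders.

```python
# Minimal sketch of defense in depth: screen the input before retrieval,
# invoke the model (a guardrail can also be attached at invocation), then
# validate or redact output before returning it. IDs are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime")
GUARDRAIL_ID, GUARDRAIL_VERSION = "gr-example123", "1"

def check(text: str, source: str) -> dict:
    """Standalone guardrail evaluation; source is 'INPUT' or 'OUTPUT'."""
    return bedrock.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source=source,
        content=[{"text": {"text": text}}],
    )

def answer(user_prompt: str, generate) -> str:
    # Stage 1: input screening before any retrieval or tool use.
    if check(user_prompt, "INPUT")["action"] == "GUARDRAIL_INTERVENED":
        return "Sorry, I can't help with that request."
    # Stage 2: model invocation (generate is a placeholder for the call).
    draft = generate(user_prompt)
    # Stage 3: output validation; use the redacted guardrail output if it acted.
    result = check(draft, "OUTPUT")
    if result["action"] == "GUARDRAIL_INTERVENED":
        return result["outputs"][0]["text"]
    return draft
```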
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
A company runs an employee benefits assistant on Amazon Bedrock. The team is reviewing a high-latency conversation turn and must reduce token use without degrading continuity or retaining unnecessary sensitive data.
Exhibit: Bedrock invocation context
Session: emp-4421
Prompt tokens: 15,920 of 16,000
Included context:
- Raw 12-turn transcript: 8,600 tokens
- User preference: "brief answers"
- Identity-check SSN fragment: ***-**-1234; PII detected
- HR policy chunks: 5 chunks from same parental-leave guide, 6,900 tokens
User asks: "Which forms do I need for parental leave?"
What is the best next step?
Best answer: C
Explanation: The best action classifies each context type by its value and risk. The raw transcript should be summarized to preserve continuity, the small preference can be stored, policy facts should be retrieved as needed, and the SSN fragment should be discarded after identity verification. Conversational context should be managed by separating durable user experience signals, short-term conversation state, external knowledge, and sensitive one-time data. The decisive exhibit details are the 15,920-token prompt, the 8,600-token raw transcript, the reusable preference, the 6,900-token policy chunks, and the PII detection on the SSN fragment. A compact conversation summary reduces tokens while preserving continuity. The preference is small and useful for future turns, so it can be stored as a session or profile attribute under policy. HR policy content should be retrieved from the authoritative knowledge source instead of repeatedly carrying large chunks. The SSN fragment has no ongoing prompt value and creates privacy risk, so it should be discarded or redacted after the identity workflow.
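A minimal sketch of that classification logic is shown below; `summarize`, `store_preference`, and `retrieve_chunks` are hypothetical helpers standing in for a summarization call, a session/profile store, and knowledge-base retrieval.

```python
# Minimal sketch of the context-handling decision in this scenario.
# summarize(), store_preference(), and retrieve_chunks() are hypothetical
# helpers, not real APIs.
def build_next_prompt(turn):
    parts = []
    # Long transcript: compress to a short summary to preserve continuity.
    parts.append(summarize(turn["raw_transcript"]))
    # Small durable signal: persist it instead of re-sending it every turn.
    store_preference(turn["session_id"], "answer_style", "brief")
    # Policy facts: retrieve only the chunks relevant to the current question.
    parts.extend(retrieve_chunks(turn["question"], top_k=2))
    # Sensitive one-time value: the SSN fragment is deliberately dropped here
    # after identity verification rather than appended to the prompt.
    parts.append(turn["question"])
    return "\n\n".join(parts)
```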
Topic: Content Domain 4: Operational Efficiency and Optimization for GenAI Applications
A company is tuning inference settings for an Amazon Bedrock knowledge assistant that answers HR policy questions from retrieved documents. Compliance requires the same question and retrieved context to produce nearly the same wording and no speculative alternatives. Which parameter tuning principle best maps to this requirement?
Best answer: D
Explanation: The requirement prioritizes determinism over creativity. Lowering temperature and constraining top-p and top-k reduce sampling randomness, making outputs more repeatable when the same prompt and retrieved context are used. Temperature controls how random token selection is during generation. Top-p and top-k further limit the token candidates the model may sample from. For an HR policy assistant with audit and compliance needs, the better pattern is to reduce creativity by using a low temperature and narrower sampling controls. This does not guarantee factuality by itself, but it helps make the model less variable when the grounding context is unchanged. Creative drafting or brainstorming workloads would typically use higher temperature and broader sampling instead.
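A minimal sketch of these settings with the Bedrock Converse API follows; the model ID and prompt are placeholders, and top-k is a model-specific field rather than part of the common inference configuration.

```python
# Minimal sketch: constrain sampling for repeatable, grounded answers.
# The model ID is a placeholder; top-k is model-specific, so it goes in
# additionalModelRequestFields rather than the common inferenceConfig.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "Summarize the retrieved HR policy."}]}],
    inferenceConfig={
        "temperature": 0.1,  # low randomness for near-deterministic wording
        "topP": 0.2,         # narrow nucleus sampling
        "maxTokens": 512,
    },
    additionalModelRequestFields={"top_k": 20},  # model-specific parameter
)
print(response["output"]["message"]["content"][0]["text"])
```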
Topic: Content Domain 2: Implementation and Integration
A team is building several Amazon Bedrock-based agents that need to call the same internal CRM, ticketing, and policy lookup tools. The team wants a consistent pattern for discovering and invoking tools without writing separate adapters for each agent framework. Which statement best defines the role of an MCP client library?
Best answer: B
Explanation: An MCP client library provides the application-side interface for Model Context Protocol. It lets FMs or agents access tools, resources, or prompts exposed by MCP servers through a consistent protocol rather than custom integration code for each tool. Model Context Protocol separates the agent application from tool implementation details. In this pattern, tools such as CRM lookups or ticket updates are exposed by MCP servers, and the agent uses an MCP client library to discover available capabilities and invoke them through a standard interface. This improves portability across agent frameworks and reduces custom adapter logic. It does not replace retrieval storage, evaluation, or safety controls; those are separate responsibilities in a production GenAI application.
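The sketch below illustrates the discover-then-invoke flow using the Model Context Protocol Python SDK's stdio client pattern; the server command, tool name, and arguments are placeholders.

```python
# Minimal sketch of an MCP client discovering and invoking a shared tool.
# Follows the MCP Python SDK's stdio client pattern; the server command
# and tool name are placeholders.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    server = StdioServerParameters(command="python", args=["crm_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discovery: list the capabilities the MCP server exposes.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Invocation: call a tool through the standard protocol interface.
            result = await session.call_tool(
                "lookup_ticket", {"ticket_id": "T-1234"}
            )
            print(result.content)

asyncio.run(main())
```

Because every agent talks the same protocol, adding a new agent framework does not require rewriting the CRM, ticketing, or policy adapters.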
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
An enterprise is preparing to promote version 2025.03 of an Amazon Bedrock RAG support assistant from a 5% canary to broad release. The CI/CD pipeline runs synthetic workflows, hallucination tests, and semantic drift checks against approved baselines.
Exhibit: Pre-release validation summary
| Check | Release gate | Candidate result | Status |
|---|---|---|---|
| Synthetic workflow success | >=95% | 97% | Pass |
| Unsupported claim rate | <=2% | 6.8% | Fail |
| Semantic similarity to baseline intents | >=0.90 | 0.82 | Fail |
What is the best next step?
Best answer: A
Explanation: The validation run has multiple hard gate failures. Even though synthetic workflow success passed, the unsupported claim rate and semantic similarity checks failed. Broad release should be blocked until the team remediates the deployment and reruns validation. Pre-release validation should treat hallucination and semantic drift checks as automated quality gates, not as advisory metrics after launch. In the exhibit, the canary candidate passes workflow success at 97%, but it fails unsupported claim rate at 6.8% against a <=2% gate and fails semantic similarity at 0.82 against a >=0.90 gate. That means the deployment could produce ungrounded responses and has drifted from approved intent behavior. The safe next step is to stop promotion, investigate the prompt or retrieval changes, remediate, and rerun the validation pipeline. Monitoring a larger rollout would expose more users to known failures instead of preventing them.
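As an illustration, a hard gate can be expressed as a small check that fails the pipeline on any miss; the thresholds below mirror the exhibit, and the metric values would come from the evaluation jobs in practice.

```python
# Minimal sketch of a hard release gate: any failed check blocks promotion.
# Thresholds mirror the exhibit; metric values are illustrative inputs.
GATES = {
    "workflow_success":    ("min", 0.95),
    "unsupported_claims":  ("max", 0.02),
    "semantic_similarity": ("min", 0.90),
}

def evaluate_gates(metrics: dict) -> list[str]:
    failures = []
    for name, (kind, threshold) in GATES.items():
        value = metrics[name]
        ok = value >= threshold if kind == "min" else value <= threshold
        if not ok:
            failures.append(f"{name}={value} vs {kind} {threshold}")
    return failures

candidate = {"workflow_success": 0.97,
             "unsupported_claims": 0.068,
             "semantic_similarity": 0.82}
failures = evaluate_gates(candidate)
if failures:
    raise SystemExit(f"Block promotion: {failures}")  # stop the pipeline here
```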
Topic: Content Domain 3: AI Safety, Security, and Governance
A SaaS company is deploying a customer-support assistant on Amazon Bedrock in us-east-1. The assistant uses a Bedrock Knowledge Base backed by OpenSearch Serverless and calls internal ticket APIs with OAuth bearer tokens. Security requires response-level audit evidence within 5 minutes, including request ID, tenant ID, model ID, guardrail action, retrieval source IDs, tool name/status, and redacted response text. OAuth bearer tokens, session cookies, and API keys must not be stored in logs, and audit data must stay in us-east-1 encrypted with a customer managed KMS key. Which architecture is the best fit?
Best answer: D
Explanation: The best design logs structured response evidence after redaction, not raw headers or transcripts. Redacting token patterns before log emission satisfies the no-token-storage constraint, while CloudWatch Logs or S3 with KMS and CloudTrail provide queryable, regional audit evidence. Governed GenAI logging should separate audit evidence from secrets. A regional application or GenAI gateway can capture the model response, guardrail result, retrieval source IDs, and tool-call status, then redact token-shaped fields and patterns before sending any event to CloudWatch Logs or S3. That preserves lineage and response-level evidence without persisting bearer tokens, cookies, or API keys. CloudWatch Logs supports near-real-time queries, S3 supports durable retention, KMS provides customer managed encryption, and CloudTrail correlation ties the audit record to Bedrock and related AWS activity. Encrypting full raw logs is the closest trap: encryption and IAM reduce access, but they do not meet a requirement that token values must not be stored.
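A minimal sketch of redact-before-emit logging follows; the regex patterns are illustrative, not an exhaustive secret taxonomy, and in Lambda a printed JSON line lands in CloudWatch Logs without extra API calls.

```python
# Minimal sketch: redact token-shaped secrets before emitting a structured
# audit event. Patterns are illustrative; extend them to match your secrets.
import json
import re

SECRET_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*"),  # OAuth bearer tokens
    re.compile(r"eyJ[\w\-]+\.[\w\-]+\.[\w\-]+"),    # JWT-shaped strings
    re.compile(r"api[_-]?key[\"':=\s]+\S+", re.IGNORECASE),  # API keys
]

def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def emit_audit_event(event: dict) -> None:
    safe = {k: redact(v) if isinstance(v, str) else v for k, v in event.items()}
    print(json.dumps(safe))  # stdout is captured by CloudWatch Logs in Lambda

emit_audit_event({
    "request_id": "req-123", "tenant_id": "t-42",
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "guardrail_action": "NONE", "retrieval_source_ids": ["doc-7"],
    "tool_status": "SUCCESS",
    "response_text": "Ticket updated. Bearer abc123 must not reach the log.",
})
```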
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
A healthcare software team is selecting an FM approach for a production claims-coding assistant. The service must reach an F1 score of at least 0.92, keep p95 latency under 2,500 ms, and use only governance-approved artifacts.
Exhibit: Pilot evaluation
| Approach | F1 score | p95 latency | Governance note |
|---|---|---|---|
| General-purpose zero-shot | 0.74 | 900 ms | No custom artifacts |
| Task-specialized extraction FM | 0.86 | 1,200 ms | No custom artifacts |
| General-purpose few-shot prompt | 0.91 | 3,800 ms | No custom artifacts |
| Customized FM on approved corpus | 0.94 | 1,900 ms | Version approval required |
Which next step is best?
Best answer: B
Explanation: The exhibit shows that the customized FM is the only candidate meeting both production quality and latency requirements. Its governance note does not block deployment because the corpus is approved and the remaining action is version approval. The core decision is matching the FM approach to measurable application requirements. General-purpose approaches are attractive when latency, cost, and simplicity matter, but the zero-shot result is far below the required F1 score and the few-shot prompt misses the latency target. The task-specialized extraction FM improves latency and domain fit but still falls short of the required F1 score. The customized FM reaches 0.94 F1, stays under 2,500 ms p95 latency, and uses an approved domain corpus, so the right next step is to move it through the controlled model lifecycle and approval process. The decisive exhibit detail is that only the customized FM satisfies both numeric thresholds while remaining governable.
Topic: Content Domain 4: Operational Efficiency and Optimization for GenAI Applications
A company runs a synchronous customer-support assistant on Amazon Bedrock. Output quality is acceptable, but during a predictable 9 AM peak, CloudWatch shows immediate ThrottlingException responses before token generation starts; off-peak latency is normal. The team must preserve the same model behavior and interactive user experience. Which optimization lever best fits this symptom?
Best answer: B
Explanation: The symptom is service capacity pressure during a predictable peak, not poor generation quality or nondeterminism. Because requests are throttled before generation starts and the user experience must remain synchronous, the best lever is capacity planning for peak throughput. Model parameter tuning is appropriate when output behavior needs adjustment, such as reducing randomness or limiting response length. Architecture changes such as asynchronous queues help when delayed processing is acceptable. Here, quality is already acceptable, off-peak behavior is normal, and failures occur immediately during a known traffic spike. That maps to capacity planning: forecast peak demand, request or configure sufficient throughput, and monitor utilization and throttling metrics. The key takeaway is to match the lever to the bottleneck: throughput errors require capacity action before changing prompts or model parameters.
Topic: Content Domain 2: Implementation and Integration
A financial services company is launching a production GenAI assistant that summarizes account-service notes by using a supported Amazon Bedrock foundation model. The workload has predictable weekday traffic, must meet a steady p95 latency target during business hours, and cannot tolerate inference throttling during call-center peaks. The team does not need custom model hosting or training, and all model invocations must stay on private AWS networking with centralized audit logging. Which architecture is the best fit?
Best answer: C
Explanation: The best fit is Amazon Bedrock Provisioned Throughput because the model is already supported in Bedrock and the workload needs predictable, reserved inference capacity. Private runtime access and AWS audit controls can satisfy the networking and governance requirements without operating model infrastructure. This scenario is primarily about matching the FM deployment pattern to production inference needs. On-demand Bedrock invocation through Lambda is well suited for intermittent or variable workloads, but it does not reserve capacity for predictable peak demand. Bedrock Provisioned Throughput provides dedicated throughput for a selected Bedrock model, helping meet steady latency and throttling requirements while keeping the team out of custom model hosting. Private connectivity through AWS networking controls and centralized logging can be added around the runtime path. SageMaker AI endpoints are a better fit when the team must host a custom model artifact, container, or deployment stack. Here, that added operational control is unnecessary and would overbuild the solution.
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
A financial services company operates an Amazon Bedrock customer-support assistant that uses Prompt Management, Prompt Flows, and a Bedrock Knowledge Base over regulated documents. Weekly releases may change the system prompt, selected FM, retriever settings, or tool workflow definitions. Evaluation data and artifacts must stay in the application’s AWS account and Region, and the release process must prove before production that answer quality, grounding, and tool-call behavior have not regressed without using live customer traffic. Which architecture is the best fit?
Best answer: B
Explanation: The best design treats prompt, model, retrieval, and workflow configurations as release artifacts that must pass automated regression tests before production. A CodePipeline quality gate with Bedrock evaluations, custom RAG and tool tests, and stored evidence satisfies the preproduction, locality, and audit requirements. Continuous evaluation for a production GenAI application should be integrated into CI/CD, not left to production feedback. The candidate Bedrock prompt, model choice, retriever configuration, and Prompt Flow should be deployed to an isolated staging environment in the same account and Region. Step Functions or CodeBuild can run Amazon Bedrock evaluation jobs plus custom tests for retrieval grounding, expected citations, tool-call schemas, and task completion against versioned S3 test fixtures. Metrics and artifacts should be written to S3, CloudWatch, and pipeline execution records, with thresholds that fail the release and require approval before promotion. This creates repeatable regression evidence without exposing live users to unvalidated changes.
Topic: Content Domain 3: AI Safety, Security, and Governance
A company monitors a Bedrock-powered support assistant with CloudTrail, WAF, invocation logs, and guardrail metrics. During one hour, output quality and retrieval relevance remain stable, but CloudTrail shows InvokeModel calls using a newly created IAM access key from an unapproved IP range, bypassing API Gateway and the application role. Which signal category best describes this finding?
Best answer: D
Explanation: This finding is best treated as a security incident signal. The key evidence is not degraded model behavior or blocked content; it is unauthorized Bedrock access using a new IAM access key from an unapproved source outside the normal application path. GenAI monitoring signals should be classified by the control plane and runtime evidence they provide. Model drift usually appears as degraded quality, changed output distributions, or lower retrieval/grounding metrics. Misuse usually involves abusive or unintended user behavior through the application. Policy violations are typically guardrail, compliance, or content-rule failures. Here, CloudTrail shows direct InvokeModel activity with a new access key from an unapproved IP range, bypassing expected API Gateway and application-role controls. That pattern points to unauthorized access or credential compromise, so it should trigger incident response, key revocation, IAM investigation, and CloudTrail evidence preservation. The key takeaway is to separate content and quality signals from identity, network, and access-control anomalies.
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
A company exposes a support-triage API through API Gateway and Lambda. Lambda calls an Amazon Bedrock model and returns JSON that Step Functions routes with JSONPath. After a prompt edit, 8% of executions fail because the response includes prose before the JSON, and CloudWatch shows staging and production used different hard-coded prompt text. The team needs auditable prompt approval, prompt updates without Lambda code deployments, and automated regression checks against 150 golden cases before production. Which implementation should the team use?
Best answer: D
Explanation: The issue is prompt operations failure, not model capacity. Bedrock Prompt Management provides auditable, versioned prompt templates, while explicit JSON-only instructions, Lambda schema validation, and golden-set regression tests prevent malformed outputs from being promoted. For production prompt governance, prompt text should be managed as a versioned artifact rather than copied into Lambda code. A practical implementation is to store the prompt in Bedrock Prompt Management, create approved versions, have the runtime reference the approved version, and include clear nonconflicting instructions for JSON-only output. Lambda should validate the model response against the expected JSON schema before Step Functions consumes it. A promotion workflow should run the 150 golden cases and promote only versions that pass output-shape and quality checks. This solves the root causes: prompt drift, weak schema enforcement, and missing regression tests.
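To show the schema-enforcement half of the fix, here is a minimal sketch of validating model output before Step Functions consumes it; the schema is illustrative, and the prompt text itself would live in Bedrock Prompt Management as an approved version rather than in Lambda code.

```python
# Minimal sketch: enforce JSON-only output before Step Functions routes it.
# The schema is illustrative; jsonschema is a third-party package.
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string"},
        "priority": {"enum": ["low", "medium", "high"]},
    },
    "required": ["category", "priority"],
    "additionalProperties": False,
}

def parse_model_response(raw_text: str) -> dict:
    """Reject prose-wrapped or malformed output before downstream routing."""
    try:
        payload = json.loads(raw_text)      # fails on prose before the JSON
        validate(payload, RESPONSE_SCHEMA)  # fails on shape drift
    except (json.JSONDecodeError, ValidationError) as exc:
        raise ValueError(f"Model output failed schema check: {exc}") from exc
    return payload
```

Running the 150 golden cases through this same parser in the promotion pipeline is what keeps a prompt edit like the one in the stem from reaching production.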
Topic: Content Domain 4: Operational Efficiency and Optimization for GenAI Applications
A company runs a RAG assistant with Amazon Bedrock Knowledge Bases backed by an Amazon OpenSearch Service vector index. The team must detect vector index degradation and retrieval-data quality regressions before users report issues. The solution must not log raw prompts or chunks, must not add latency to live API calls, and must publish alarms in CloudWatch. Which implementation best meets these requirements?
Best answer: D
Explanation: The best implementation is an out-of-band monitoring workflow that checks both the vector store and retrieval results. EventBridge and Lambda can run scheduled canary queries, collect OpenSearch index metrics, and publish sanitized custom metrics and alarms to CloudWatch without adding request latency or logging sensitive content. For a production RAG system, vector-store monitoring should combine infrastructure/index signals with retrieval-data quality signals. A scheduled EventBridge rule can invoke Lambda or Step Functions to inspect OpenSearch health, index size, deleted-document ratio, ingestion freshness, query latency, and error rates. The same workflow can run a small golden set of retrieval canaries through Bedrock Knowledge Bases, compare returned document IDs or scores to expected results, and publish aggregate custom metrics to CloudWatch. Logging only IDs, hashes, counts, and scores preserves the privacy constraint. This approach catches stale indexes, poor chunk retrieval, and index degradation before live users are affected.
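A minimal sketch of the scheduled canary Lambda follows; the knowledge base ID and golden set are placeholders, and only IDs, counts, and scores are emitted, never prompts or chunk text.

```python
# Minimal sketch of a scheduled retrieval canary (EventBridge -> Lambda).
# Knowledge base ID and the golden set are placeholders; no raw content
# is logged, only an aggregate metric.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
cloudwatch = boto3.client("cloudwatch")

GOLDEN = [  # query -> a document expected among the top-5 results
    {"query": "parental leave forms", "expected_doc": "policy-017"},
]

def handler(event, context):
    hits = 0
    for case in GOLDEN:
        resp = agent_runtime.retrieve(
            knowledgeBaseId="KBEXAMPLE1",
            retrievalQuery={"text": case["query"]},
            retrievalConfiguration={
                "vectorSearchConfiguration": {"numberOfResults": 5}},
        )
        uris = [r.get("location", {}).get("s3Location", {}).get("uri", "")
                for r in resp["retrievalResults"]]
        hits += any(case["expected_doc"] in uri for uri in uris)
    cloudwatch.put_metric_data(
        Namespace="RagCanary",
        MetricData=[{"MetricName": "GoldenRecallAt5",
                     "Value": hits / len(GOLDEN), "Unit": "None"}],
    )
```

A CloudWatch alarm on `GoldenRecallAt5` then fires when a refresh degrades retrieval, before users notice.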
Topic: Content Domain 2: Implementation and Integration
A team is building a production RAG assistant on AWS. Developers use Amazon Q Developer to generate Lambda code, IAM policy drafts, and Bedrock prompt templates. Management wants faster delivery but must preserve audit evidence, least-privilege access, automated regression tests, and production observability. Which principle best maps to this requirement?
Best answer: A
Explanation: Developer productivity tools are implementation accelerators, not control replacements. For production GenAI workloads, generated code, prompts, and policies must still pass the normal software delivery, security, testing, and observability gates before release. The core principle is assisted development with governed delivery. Tools such as Amazon Q Developer can help draft code, policies, prompts, and troubleshooting steps, but their output is not automatically production-ready or compliant. The team should treat generated artifacts like any other change: review them, test them, validate IAM least privilege, deploy through CI/CD gates, and monitor the application in production. This preserves delivery speed while maintaining accountability and operational readiness. The key takeaway is that productivity tooling improves developer workflow but does not replace architecture, testing, security, or operations controls.
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
A company is releasing a new version of a RAG-based claims assistant that uses Amazon Bedrock Knowledge Bases and Bedrock Agents. Before routing more than 1% of traffic, the release process must run synthetic agent workflows, test answers for unsupported claims against retrieved sources, detect semantic drift from the last approved prompt/model baseline, and block promotion automatically with auditable artifacts kept in the workload account and Region. Which architecture best meets these requirements?
Best answer: D
Explanation: The requirement is a pre-release validation gate, not only production monitoring or manual review. A CI/CD gate orchestrated by Step Functions can exercise synthetic workflows, score grounding and quality, compare semantic similarity to an approved baseline, and block promotion automatically while storing audit artifacts. A production GenAI release should be validated against the deployed staging path that users will actually hit. CodePipeline provides the promotion gate, while Step Functions can orchestrate synthetic conversations through the staging Bedrock Agent and RAG stack. Bedrock grounding and model evaluation checks can score unsupported claims against retrieved sources, and Titan Embeddings can compare outputs or retrieved contexts with approved baselines to detect semantic drift. Writing results to KMS-encrypted S3 and CloudWatch in the workload account and Region supports auditability and governance. The key pattern is automated release gating with repeatable GenAI-specific quality signals before canary or broad release.
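To make the drift check concrete, here is a minimal sketch comparing a candidate answer to the approved baseline with Titan embeddings and cosine similarity; the texts are placeholders and the 0.90 threshold mirrors the scenario's gate.

```python
# Minimal sketch of a semantic-drift gate: embed candidate and baseline
# answers, then compare cosine similarity to the release threshold.
import json
import math
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

baseline = embed("Approved baseline answer for this intent.")   # placeholder
candidate = embed("Candidate release answer for the same intent.")
if cosine(baseline, candidate) < 0.90:
    raise SystemExit("Semantic drift gate failed: block promotion.")
```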
Topic: Content Domain 3: AI Safety, Security, and Governance
A financial services company is building a customer assistant. API Gateway starts an AWS Step Functions workflow that writes the turn to DynamoDB, retrieves context from an Amazon Bedrock knowledge base, and invokes a Bedrock model. The company must reject prompt-injection attempts and SSNs in user input before any DynamoDB write or retrieval call. Approved and rejected decisions must be logged for audit with minimal added latency. Which implementation meets these requirements?
Best answer: A
Explanation: The key requirement is pre-processing input safety before any persistence or retrieval. Calling Amazon Bedrock Guardrails with ApplyGuardrail from an initial Lambda state lets the workflow synchronously block unsafe prompts and log the moderation outcome before continuing. For input safety controls, place moderation at the first trusted boundary of the workflow. A Lambda state can call the Amazon Bedrock Guardrails ApplyGuardrail API with the source set for input, using prompt-attack and sensitive-information policies. Step Functions can then branch: return a safe rejection response when the guardrail intervenes, or continue to the DynamoDB write, knowledge base retrieval, and model invocation when the input passes. The Lambda and Step Functions execution logs provide audit evidence in CloudWatch with minimal synchronous overhead. Applying controls only at generation time is too late because unsafe input could already be stored or used for retrieval.
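A minimal sketch of that first Lambda state follows; the guardrail ID and version are placeholders, and the returned flag is what a Step Functions Choice state would branch on.

```python
# Minimal sketch of the first workflow state: evaluate raw user input with
# ApplyGuardrail before any DynamoDB write or retrieval call, returning a
# decision for the Step Functions Choice state. IDs are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    result = bedrock.apply_guardrail(
        guardrailIdentifier="gr-example123",
        guardrailVersion="1",
        source="INPUT",  # evaluate the prompt as input, before persistence
        content=[{"text": {"text": event["user_input"]}}],
    )
    blocked = result["action"] == "GUARDRAIL_INTERVENED"
    # The decision flows into the Step Functions execution history and
    # CloudWatch logs, which provide the required audit evidence.
    return {
        "blocked": blocked,
        "safe_response": result["outputs"][0]["text"] if blocked else None,
        "user_input": None if blocked else event["user_input"],
    }
```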
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
A team is designing maintenance for a multi-tenant RAG application. The vector search engine is already selected. The team must track source URI, content hash, document version, ingestion status, tenant ID, ACL groups, and TTL for millions of documents. Ingestion workers need high-volume point reads and conditional updates without reading document text from the vector index. Which pattern best fits this requirement?
Best answer: C
Explanation: The requirement is about metadata and maintenance, not choosing the vector engine itself. DynamoDB is a strong fit for a metadata control plane that tracks document state, versions, ACL attributes, and TTL while embeddings remain in the selected vector store. For vector-backed GenAI applications, a common pattern is to separate retrieval data from operational metadata. The vector store handles embedding similarity search, while DynamoDB tracks document identity, source hashes, ingestion state, tenant ownership, ACL metadata, and lifecycle attributes such as TTL. This supports fast point lookups and conditional writes during ingestion or re-indexing workflows without overloading the vector index with control-plane responsibilities. Bedrock Knowledge Bases is better when the goal is managed RAG ingestion and retrieval orchestration; OpenSearch Service is better when the requirement is the vector search engine itself. Here, the key requirement is scalable metadata maintenance around an already selected vector store.
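The sketch below shows one document item in that control plane with a conditional write, so ingestion workers skip unchanged content; the table and attribute names are placeholders.

```python
# Minimal sketch of the metadata control plane: one item per document with
# a conditional update so ingestion only re-processes changed content.
# Table and attribute names are placeholders.
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("rag-doc-metadata")

def record_ingestion(tenant_id, doc_id, source_uri, content_hash, ttl_days=90):
    try:
        table.update_item(
            Key={"pk": f"TENANT#{tenant_id}", "sk": f"DOC#{doc_id}"},
            UpdateExpression=(
                "SET source_uri = :u, content_hash = :h, "
                "doc_version = if_not_exists(doc_version, :z) + :one, "
                "ingestion_status = :s, expires_at = :t"
            ),
            # Skip the write when the content hash is unchanged.
            ConditionExpression=(
                "attribute_not_exists(content_hash) OR content_hash <> :h"),
            ExpressionAttributeValues={
                ":u": source_uri, ":h": content_hash, ":z": 0, ":one": 1,
                ":s": "EMBEDDED",
                ":t": int(time.time()) + ttl_days * 86400,  # DynamoDB TTL epoch
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # unchanged document, no re-embedding needed
        raise
```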
Topic: Content Domain 4: Operational Efficiency and Optimization for GenAI Applications
An internal benefits assistant uses Amazon Bedrock Knowledge Bases with documents in Amazon S3. After a weekly content refresh, users report answers that are fluent but cite outdated or irrelevant policy sections. Latency, error rate, and throttling are normal. The team wants an early warning signal before user complaints increase. Which observability check best maps to this symptom?
Best answer: A
Explanation: The symptom is a retrieval and grounding quality problem, not a service health or cost problem. A synthetic RAG canary with expected queries and known current sources can detect stale or irrelevant retrieval before users report bad citations. For RAG applications, fluent answers with stale or irrelevant citations usually point to retrieval quality, indexing freshness, metadata filtering, or grounding issues. The best operational signal is a synthetic check that runs representative questions after refreshes and records retrieved chunk IDs, source timestamps, citation matches, and groundedness or relevance scores. This gives an early warning when the knowledge base returns the wrong evidence even though the model invocation path is healthy. Latency and error dashboards remain useful for availability, but they do not validate whether the answer is grounded in the right documents.
Topic: Content Domain 2: Implementation and Integration
A company is building a synchronous customer-support assistant behind API Gateway and Lambda. The workflow uses Bedrock Prompt Management and a Bedrock-hosted FM that supports reserved capacity; no custom model artifact or inference container is required. Load tests with on-demand calls show intermittent throttling at expected business-hour traffic, and the SLA requires predictable low latency without queueing user requests. Which implementation best meets these requirements?
Best answer: C
Explanation: The workload uses a Bedrock-hosted model, has known steady demand, and requires predictable synchronous latency. Bedrock Provisioned Throughput is the right deployment choice because it reserves model capacity without moving to a custom SageMaker hosting model. For Amazon Bedrock applications, on-demand invocation is usually best for variable or low-volume traffic because it requires no reserved capacity. This scenario has already shown throttling at expected load and has a synchronous SLA, so adding retries would increase tail latency rather than guarantee capacity. Bedrock Provisioned Throughput lets the application call a provisioned model ARN from Lambda while keeping the managed Bedrock integration, prompt management, and operational model. SageMaker AI real-time endpoints are better when the team must host a custom or open-weight model artifact, control the inference container, or use SageMaker deployment workflows. The key distinction is reserved Bedrock capacity versus custom model hosting.
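As a small illustration of the runtime change involved, once Provisioned Throughput is purchased the application invokes the provisioned model ARN instead of the on-demand model ID; the ARN below is a placeholder.

```python
# Minimal sketch: invoke reserved Bedrock capacity by passing the
# provisioned model ARN as the modelId. The ARN is a placeholder.
import boto3

bedrock = boto3.client("bedrock-runtime")

PROVISIONED_MODEL_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123example"
)

response = bedrock.converse(
    modelId=PROVISIONED_MODEL_ARN,  # reserved capacity, same Converse API
    messages=[{"role": "user",
               "content": [{"text": "Summarize this account-service note."}]}],
    inferenceConfig={"maxTokens": 512},
)
print(response["output"]["message"]["content"][0]["text"])
```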
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
A team is choosing between two Amazon Bedrock prompt configurations for a customer support summarization workflow. The release gate requires no critical factual accuracy or task-alignment defects.
Exhibit: Evaluation summary
Scale: 1=poor, 5=excellent
Config A: relevance 4.8, factual 2.0, consistency 2.3, fluency 4.9, alignment 2.1
Config B: relevance 4.3, factual 4.5, consistency 4.4, fluency 4.4, alignment 4.6
Reviewer note for A: invents 24x7 phone support and refund promises.
What is the best interpretation and next step?
Best answer: B
Explanation: Config A looks polished but fails the most important release criteria. The decisive exhibit details are the low factual accuracy and alignment scores plus the reviewer note that it invents support and refund policies. FM output evaluation should consider relevance, factual accuracy, consistency, fluency, and task alignment together, not just whether the response reads well. In the exhibit, Config A has strong relevance and fluency, but factual accuracy is 2.0, consistency is 2.3, and alignment is 2.1. The reviewer note confirms the defect: unsupported claims about 24x7 phone support and refunds. Because the release gate blocks critical factual or task-alignment defects, Config A should not be promoted. Config B has slightly lower relevance and fluency but strong factual accuracy, consistency, and alignment, making it the safer production candidate.
Topic: Content Domain 3: AI Safety, Security, and Governance
A financial services team uses Amazon Bedrock Prompt Management to version prompts for a customer-support summarization assistant. Before promoting candidate prompt v8, the team ran a Bedrock LLM-as-judge evaluation with a fixed, balanced test set. The judge used the same pass rubric for both prompt versions.
Exhibit: Evaluation results
| Evaluation slice | Baseline v7 pass | Candidate v8 pass | Difference |
|---|---|---|---|
| Overall | 88% | 91% | +3% |
| Age 18-34 | 89% | 94% | +5% |
| Age 35-54 | 88% | 91% | +3% |
| Age 55+ | 87% | 78% | -9% |
Which next step should the team take?
Best answer: C
Explanation: The candidate prompt improves the aggregate score but creates a clear fairness regression for one evaluation slice. A responsible release process should keep the prompt version unpromoted and use the versioned evaluation evidence to investigate and remediate that slice before rollout. Fairness evaluation should not rely only on aggregate quality metrics. The exhibit shows v8 improves overall pass rate from 88% to 91%, but the Age 55+ slice drops from 87% to 78%. With Bedrock Prompt Management, the team can keep v8 as a candidate version while using Prompt Flows or a controlled A/B evaluation with the same rubric to inspect failed examples, rerun slice-level tests, and add human review if needed. Slice metadata is appropriate for evaluating fairness, even if the model should not use that attribute to make user-specific decisions. The key takeaway is that aggregate improvement does not justify release when a protected or sensitive group regresses materially.
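A minimal sketch of a slice-aware promotion check follows; the numbers mirror the exhibit, and the two-point regression tolerance is an illustrative policy choice, not an AWS default.

```python
# Minimal sketch of a slice-aware gate: aggregate improvement does not
# justify release if any slice regresses beyond a tolerance.
BASELINE = {"overall": 0.88, "age_18_34": 0.89,
            "age_35_54": 0.88, "age_55_plus": 0.87}
CANDIDATE = {"overall": 0.91, "age_18_34": 0.94,
             "age_35_54": 0.91, "age_55_plus": 0.78}

def slice_regressions(baseline, candidate, tolerance=0.02):
    return {s: round(candidate[s] - baseline[s], 2)
            for s in baseline
            if s != "overall" and candidate[s] < baseline[s] - tolerance}

regressions = slice_regressions(BASELINE, CANDIDATE)
if regressions:
    # {'age_55_plus': -0.09} -> keep v8 unpromoted, investigate this slice
    raise SystemExit(f"Block promotion; slice regressions: {regressions}")
```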
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
Which FM approach is best defined as adapting a base model with organization-approved domain data or task examples, then governing the resulting model version because off-the-shelf models do not meet the required accuracy or domain specificity?
Best answer: B
Explanation: A customized FM approach modifies or adapts a base model to better match a domain, style, or task requirement. It is appropriate when prompt engineering, RAG, or an off-the-shelf task model cannot meet accuracy, domain specificity, or governance needs. The core distinction is whether the model is used as-is, selected for a built-in task specialization, or adapted for the organization. A general-purpose FM is broad and flexible but may not capture specialized terminology or output behavior. A task-specialized FM is already optimized for a known task, such as embeddings or summarization. A customized FM is created by adapting a base model with approved data or examples and then controlling the resulting model lifecycle, such as through model versioning and approval workflows. The key signal in the stem is that the model itself must be adapted and governed, not merely prompted or supplemented with retrieved context.
Use this map after the sample questions to connect each item to the AWS Generative AI Developer - Professional decisions these practice samples test.
```mermaid
flowchart LR
  S1["Business GenAI requirement"] --> S2
  S2["Choose Bedrock model and retrieval pattern"] --> S3
  S3["Design prompt guardrail and data controls"] --> S4
  S4["Integrate through APIs workflows and agents"] --> S5
  S5["Evaluate safety quality and latency"] --> S6
  S6["Operate cost monitoring and rollback"]
```
| Cue | What to remember |
|---|---|
| Bedrock fit | Use Bedrock when managed foundation models, guardrails, agents, knowledge bases, and prompt workflows fit the workload. |
| RAG vs customization | Use retrieval when the answer needs current enterprise context; customize only when model behavior or domain fit requires it. |
| Security | Protect prompts, retrieved content, model access, IAM, KMS keys, and data boundaries. |
| Evaluation | Check factuality, relevance, safety, latency, cost, and slice-level regressions before release. |
| Operations | Plan throttling, provisioned throughput, traces, prompt versions, rollbacks, and feedback loops. |