Prepare for AWS Certified Generative AI Developer - Professional (AIP-C01) with free sample questions, a full-length diagnostic, topic drills, timed practice, and detailed explanations covering Amazon Bedrock, RAG, agents, safety, governance, optimization, and production troubleshooting in IT Mastery.
AIP-C01 is AWS Certified Generative AI Developer - Professional. It validates advanced technical expertise in building and deploying production-ready generative AI solutions on AWS, especially solutions that integrate foundation models into applications and business workflows using services such as Amazon Bedrock.
IT Mastery practice for AIP-C01 is live now. Use this page to start the web simulator, review the exam snapshot, work through 24 public sample questions, and continue into full IT Mastery practice with the same IT Mastery account on web, iOS, iPadOS, macOS, or Android.
Start a practice session for AWS Certified Generative AI Developer - Professional (AIP-C01) below, or open the full app in a new tab. For the best experience, open the full app in a new tab and navigate with swipes/gestures or the mouse wheel—just like on your phone or tablet.
Open Full App in a New Tab
A small set of questions is available for free preview. Subscribers can unlock full access by signing in with the same app-family account they use on web and mobile.
Prefer to practice on your phone or tablet? Download the IT Mastery – AWS, Azure, GCP & CompTIA exam prep app for iOS or IT Mastery app on Google Play (Android) and use the same IT Mastery account across web and mobile.
Free diagnostic: Try the 75-question AWS AIP-C01 full-length practice exam before subscribing. Use it as one professional-level GenAI baseline, then return to IT Mastery for timed mocks, domain drills, explanations, and the full AIP-C01 question bank.
AIP-C01 questions reward production-grade GenAI decisions: choosing the right foundation model integration pattern, grounding and evaluating outputs, securing data and identities, controlling cost and latency, and troubleshooting deployed AI workflows.
| Domain | Weight |
|---|---|
| Foundation Model Integration, Data Management, and Compliance | 31% |
| Implementation and Integration | 26% |
| AI Safety, Security, and Governance | 20% |
| Operational Efficiency and Optimization for GenAI Applications | 12% |
| Testing, Validation, and Troubleshooting | 11% |
Use these readiness checks before choosing between two advanced Bedrock or GenAI architecture options:
| Area | What strong readiness looks like |
|---|---|
| FM integration, data, and compliance | You can design grounded GenAI applications that respect data boundaries, compliance expectations, and model-access constraints. |
| Implementation and integration | You can connect Bedrock, agents, tools, APIs, vector stores, prompt workflows, and application code into production workflows. |
| Safety, security, and governance | You can apply guardrails, IAM, encryption, monitoring, human oversight, and responsible-AI controls to risky outputs. |
| Operational efficiency | You can optimize latency, cost, throughput, model selection, token usage, and scaling without weakening quality or safety. |
| Testing and troubleshooting | You can diagnose bad retrieval, poor prompts, unsafe outputs, model mismatch, missing permissions, and weak evaluation design. |
| Day | Practice focus |
|---|---|
| 7 | Take the free full-length diagnostic and separate misses into architecture, implementation, governance, optimization, and troubleshooting. |
| 6 | Drill foundation-model integration, RAG, vector search, knowledge bases, data handling, and compliance boundaries. |
| 5 | Drill agents, tool use, prompt workflows, API integration, orchestration, and application implementation scenarios. |
| 4 | Drill safety, guardrails, responsible AI, IAM, encryption, monitoring, and audit controls. |
| 3 | Drill cost, latency, token usage, model selection, scaling, and operational optimization decisions. |
| 2 | Complete a timed mixed set and explain the production failure mode or trade-off behind each miss. |
| 1 | Review only weak GenAI patterns and troubleshooting signals; avoid late memorization of unfamiliar feature details. |
If you can score above roughly 75% on several unseen mixed attempts and explain how each answer handles grounding, safety, operations, and cost, you are ready to treat the exam as a reasoning test rather than a memorization exercise. More practice should improve production GenAI judgment, not just recognition of repeated scenarios.
Use these child pages when you want focused IT Mastery practice before returning to mixed sets and timed mocks.
Need concept review first? Read the AWS AIP-C01 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks, topic drills, and full IT Mastery practice.
These are original IT Mastery practice questions aligned to AWS generative AI solution design, Amazon Bedrock service selection, evaluation, security, deployment, and operational decisions. They are not AWS exam questions and are not copied from any exam sponsor. Use them to check readiness here, then continue in IT Mastery with mixed sets, topic drills, and timed mocks.
Topic: Content Domain 2: Implementation and Integration
A developer is building a browser-based chat application on Amazon Bedrock. After the user submits a prompt over HTTPS, the UI must render partial model output as it is generated. The connection path supports long-lived HTTP responses but blocks WebSocket upgrades, and the application does not need bidirectional messages during generation. Which integration pattern best matches this requirement?
Best answer: A
Explanation: The requirement is real-time, one-way token delivery to a browser over an HTTP-compatible path. Amazon Bedrock streaming APIs can emit partial output, and server-sent events or chunked transfer can relay those chunks to the client without requiring WebSocket support. For browser chat responses that need progressive rendering, the key concept is streaming the foundation model output and preserving that stream through the application layer. Bedrock streaming APIs such as ConverseStream or InvokeModelWithResponseStream return incremental chunks from the model. The backend can translate those chunks into server-sent events or an HTTP chunked response so the browser can update the UI as data arrives. This fits the stated network constraint because SSE uses standard HTTP rather than a WebSocket upgrade. WebSockets are useful when bidirectional, low-latency messaging is required, but they are not the best fit when upgrades are blocked and only server-to-client updates are needed.
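To make the pattern concrete, here is a minimal sketch of relaying Bedrock streaming output as server-sent events, assuming Flask and boto3 with Bedrock runtime access; the model ID and route are placeholders, not part of the exam item.

```python
# Minimal sketch: relay Amazon Bedrock streaming output to a browser as
# server-sent events (SSE) over plain HTTP, with no WebSocket upgrade.
# Assumes Flask and boto3 are installed and credentials grant Bedrock
# runtime access; the model ID is a placeholder.
import boto3
from flask import Flask, Response, request

app = Flask(__name__)
bedrock = boto3.client("bedrock-runtime")

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # placeholder model ID

def stream_chunks(prompt: str):
    """Yield SSE-framed text deltas from a ConverseStream call."""
    response = bedrock.converse_stream(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    for event in response["stream"]:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        text = delta.get("text")
        if text:
            # SSE frame: "data: <payload>\n\n" over a long-lived HTTP response
            yield f"data: {text}\n\n"
    yield "data: [DONE]\n\n"

@app.post("/chat")
def chat():
    prompt = request.get_json()["prompt"]
    return Response(stream_chunks(prompt), mimetype="text/event-stream")
```

On the client side, the browser's standard EventSource or fetch-streaming APIs can consume this endpoint and append each data frame to the chat transcript as it arrives.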
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
A company operates a custom RAG support assistant. Manuals stored in Amazon S3 are chunked, embedded by an ingestion Lambda function, and stored in an OpenSearch Service vector index. A query Lambda function embeds user questions, runs k-NN search, and calls Amazon Bedrock. Since a recent release, answers contain hallucinations and Recall@5 has dropped from 0.84 to 0.46. Logs show the new ingestion job uses a different embedding model and normalization step, while the query Lambda function and 70% of existing vectors still use the previous embedding model. Both models produce the configured vector dimension, so searches do not error.
Which change fixes the root cause with the smallest safe change?
Best answer: C
Explanation: The retrieval system is using mixed embedding spaces. Even with matching vector dimensions, similarity scores become unreliable when documents and queries are embedded with different models or preprocessing. The smallest safe fix is to rebuild the index with one approved embedding pipeline and cut over consistently. Symptom: hallucinated answers and a sharp Recall@5 drop appeared after the ingestion release, while searches still returned results. Root cause: the vector index now contains embeddings created with different model and preprocessing configurations, so k-NN similarity is no longer comparing vectors from one semantic space. Fix: create a new index, re-embed all chunks with the same embedding model and normalization used by the query path, then cut over the application to that index. Increasing retrieval volume or changing the generator does not repair the retrieval signal.
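A minimal sketch of the rebuild-and-cutover fix follows, assuming opensearch-py and boto3; the index names, field names, embedding model ID, and auth setup are placeholders for illustration.

```python
# Minimal sketch: re-embed all chunks with one approved embedding model into
# a fresh OpenSearch index, then cut over atomically via an alias.
# Index/field names and the model ID are placeholders; auth config omitted.
import json
import boto3
from opensearchpy import OpenSearch, helpers

bedrock = boto3.client("bedrock-runtime")
client = OpenSearch(hosts=[{"host": "search-example.us-east-1.es.amazonaws.com",
                            "port": 443}], use_ssl=True)

EMBED_MODEL = "amazon.titan-embed-text-v2:0"  # the single approved model

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(modelId=EMBED_MODEL,
                                body=json.dumps({"inputText": text}))
    return json.loads(resp["body"].read())["embedding"]

def rebuild(chunks, new_index="kb-chunks-v2", alias="kb-chunks"):
    client.indices.create(index=new_index, body={
        "settings": {"index.knn": True},
        "mappings": {"properties": {
            "text": {"type": "text"},
            "vector": {"type": "knn_vector", "dimension": 1024},
        }},
    })
    # Every chunk goes through the same model and preprocessing.
    actions = ({"_index": new_index,
                "_source": {"text": c["text"], "vector": embed(c["text"])}}
               for c in chunks)
    helpers.bulk(client, actions)
    # Atomic cutover: point the query alias at the rebuilt index only.
    client.indices.update_aliases(body={"actions": [
        {"remove": {"index": "*", "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]})
```

Querying through an alias rather than a raw index name is what makes the cutover a single atomic step instead of a risky in-place migration.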
Topic: Content Domain 3: AI Safety, Security, and Governance
A development team is defining safety controls for a customer support assistant that uses Amazon Bedrock. The assistant must block prompt injection attempts before retrieval, enforce content policy during model invocation, and redact sensitive values before responses are returned. Which principle best maps to this requirement?
Best answer: A
Explanation: The requirement calls for layered safety controls across the full GenAI flow. Defense in depth applies preprocessing checks, model-invocation controls such as Bedrock Guardrails, and post-processing validation or redaction so one missed detection does not become a user-facing failure. For GenAI safety, defense in depth means placing complementary controls at multiple stages: input validation and prompt-injection screening before retrieval or tool use, policy enforcement during model invocation, and output validation, schema checks, citation checks, or PII redaction before returning a response. This pattern is stronger than relying only on a single guardrail because different controls catch different failure modes and provide better governance evidence. The key takeaway is to design safety as a lifecycle pattern, not as one isolated runtime filter.
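As an illustration of the layered pattern, here is a minimal sketch using the Amazon Bedrock ApplyGuardrail API at the input and output stages; the guardrail ID, version, and `generate` callable are placeholders.

```python
# Minimal sketch of defense in depth: screen the input before retrieval,
# invoke the model (a guardrail can also be attached at invocation), then
# validate or redact output before returning it. IDs are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime")
GUARDRAIL_ID, GUARDRAIL_VERSION = "gr-example123", "1"

def check(text: str, source: str) -> dict:
    """Standalone guardrail evaluation; source is 'INPUT' or 'OUTPUT'."""
    return bedrock.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source=source,
        content=[{"text": {"text": text}}],
    )

def answer(user_prompt: str, generate) -> str:
    # Stage 1: input screening before any retrieval or tool use.
    if check(user_prompt, "INPUT")["action"] == "GUARDRAIL_INTERVENED":
        return "Sorry, I can't help with that request."
    # Stage 2: model invocation (generate is a placeholder for the call).
    draft = generate(user_prompt)
    # Stage 3: output validation; use the redacted guardrail output if it acted.
    result = check(draft, "OUTPUT")
    if result["action"] == "GUARDRAIL_INTERVENED":
        return result["outputs"][0]["text"]
    return draft
```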
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
A company runs an employee benefits assistant on Amazon Bedrock. The team is reviewing a high-latency conversation turn and must reduce token use without degrading continuity or retaining unnecessary sensitive data.
Exhibit: Bedrock invocation context
Session: emp-4421
Prompt tokens: 15,920 of 16,000
Included context:
- Raw 12-turn transcript: 8,600 tokens
- User preference: "brief answers"
- Identity-check SSN fragment: ***-**-1234; PII detected
- HR policy chunks: 5 chunks from same parental-leave guide, 6,900 tokens
User asks: "Which forms do I need for parental leave?"
What is the best next step?
Best answer: C
Explanation: The best action classifies each context type by its value and risk. The raw transcript should be summarized to preserve continuity, the small preference can be stored, policy facts should be retrieved as needed, and the SSN fragment should be discarded after identity verification. Conversational context should be managed by separating durable user experience signals, short-term conversation state, external knowledge, and sensitive one-time data. The decisive exhibit details are the 15,920-token prompt, the 8,600-token raw transcript, the reusable preference, the 6,900-token policy chunks, and the PII detection on the SSN fragment. A compact conversation summary reduces tokens while preserving continuity. The preference is small and useful for future turns, so it can be stored as a session or profile attribute under policy. HR policy content should be retrieved from the authoritative knowledge source instead of repeatedly carrying large chunks. The SSN fragment has no ongoing prompt value and creates privacy risk, so it should be discarded or redacted after the identity workflow.
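A minimal sketch of that classification logic is shown below; `summarize`, `store_preference`, and `retrieve_chunks` are hypothetical helpers standing in for a summarization call, a session/profile store, and knowledge-base retrieval.

```python
# Minimal sketch of the context-handling decision in this scenario.
# summarize(), store_preference(), and retrieve_chunks() are hypothetical
# helpers, not real APIs.
def build_next_prompt(turn):
    parts = []
    # Long transcript: compress to a short summary to preserve continuity.
    parts.append(summarize(turn["raw_transcript"]))
    # Small durable signal: persist it instead of re-sending it every turn.
    store_preference(turn["session_id"], "answer_style", "brief")
    # Policy facts: retrieve only the chunks relevant to the current question.
    parts.extend(retrieve_chunks(turn["question"], top_k=2))
    # Sensitive one-time value: the SSN fragment is deliberately dropped here
    # after identity verification rather than appended to the prompt.
    parts.append(turn["question"])
    return "\n\n".join(parts)
```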
Topic: Content Domain 4: Operational Efficiency and Optimization for GenAI Applications
A company is tuning inference settings for an Amazon Bedrock knowledge assistant that answers HR policy questions from retrieved documents. Compliance requires the same question and retrieved context to produce nearly the same wording and no speculative alternatives. Which parameter tuning principle best maps to this requirement?
Best answer: D
Explanation: The requirement prioritizes determinism over creativity. Lowering temperature and constraining top-p and top-k reduce sampling randomness, making outputs more repeatable when the same prompt and retrieved context are used. Temperature controls how random token selection is during generation. Top-p and top-k further limit the token candidates the model may sample from. For an HR policy assistant with audit and compliance needs, the better pattern is to reduce creativity by using a low temperature and narrower sampling controls. This does not guarantee factuality by itself, but it helps make the model less variable when the grounding context is unchanged. Creative drafting or brainstorming workloads would typically use higher temperature and broader sampling instead.
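A minimal sketch of these settings with the Bedrock Converse API follows; the model ID and prompt are placeholders, and top-k is a model-specific field rather than part of the common inference configuration.

```python
# Minimal sketch: constrain sampling for repeatable, grounded answers.
# The model ID is a placeholder; top-k is model-specific, so it goes in
# additionalModelRequestFields rather than the common inferenceConfig.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "Summarize the retrieved HR policy."}]}],
    inferenceConfig={
        "temperature": 0.1,  # low randomness for near-deterministic wording
        "topP": 0.2,         # narrow nucleus sampling
        "maxTokens": 512,
    },
    additionalModelRequestFields={"top_k": 20},  # model-specific parameter
)
print(response["output"]["message"]["content"][0]["text"])
```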
Topic: Content Domain 2: Implementation and Integration
A team is building several Amazon Bedrock-based agents that need to call the same internal CRM, ticketing, and policy lookup tools. The team wants a consistent pattern for discovering and invoking tools without writing separate adapters for each agent framework. Which statement best defines the role of an MCP client library?
Best answer: B
Explanation: An MCP client library provides the application-side interface for Model Context Protocol. It lets FMs or agents access tools, resources, or prompts exposed by MCP servers through a consistent protocol rather than custom integration code for each tool. Model Context Protocol separates the agent application from tool implementation details. In this pattern, tools such as CRM lookups or ticket updates are exposed by MCP servers, and the agent uses an MCP client library to discover available capabilities and invoke them through a standard interface. This improves portability across agent frameworks and reduces custom adapter logic. It does not replace retrieval storage, evaluation, or safety controls; those are separate responsibilities in a production GenAI application.
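The sketch below illustrates the discover-then-invoke flow using the Model Context Protocol Python SDK's stdio client pattern; the server command, tool name, and arguments are placeholders.

```python
# Minimal sketch of an MCP client discovering and invoking a shared tool.
# Follows the MCP Python SDK's stdio client pattern; the server command
# and tool name are placeholders.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    server = StdioServerParameters(command="python", args=["crm_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discovery: list the capabilities the MCP server exposes.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Invocation: call a tool through the standard protocol interface.
            result = await session.call_tool(
                "lookup_ticket", {"ticket_id": "T-1234"}
            )
            print(result.content)

asyncio.run(main())
```

Because every agent talks the same protocol, adding a new agent framework does not require rewriting the CRM, ticketing, or policy adapters.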
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
An enterprise is preparing to promote version 2025.03 of an Amazon Bedrock RAG support assistant from a 5% canary to broad release. The CI/CD pipeline runs synthetic workflows, hallucination tests, and semantic drift checks against approved baselines.
Exhibit: Pre-release validation summary
| Check | Release gate | Candidate result | Status |
|---|---|---|---|
| Synthetic workflow success | >=95% | 97% | Pass |
| Unsupported claim rate | <=2% | 6.8% | Fail |
| Semantic similarity to baseline intents | >=0.90 | 0.82 | Fail |
What is the best next step?
Best answer: A
Explanation: The validation run has multiple hard gate failures. Even though synthetic workflow success passed, the unsupported claim rate and semantic similarity checks failed. Broad release should be blocked until the team remediates the deployment and reruns validation. Pre-release validation should treat hallucination and semantic drift checks as automated quality gates, not as advisory metrics after launch. In the exhibit, the canary candidate passes workflow success at 97%, but it fails unsupported claim rate at 6.8% against a <=2% gate and fails semantic similarity at 0.82 against a >=0.90 gate. That means the deployment could produce ungrounded responses and has drifted from approved intent behavior. The safe next step is to stop promotion, investigate the prompt or retrieval changes, remediate, and rerun the validation pipeline. Monitoring a larger rollout would expose more users to known failures instead of preventing them.
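As an illustration, a hard gate can be expressed as a small check that fails the pipeline on any miss; the thresholds below mirror the exhibit, and the metric values would come from the evaluation jobs in practice.

```python
# Minimal sketch of a hard release gate: any failed check blocks promotion.
# Thresholds mirror the exhibit; metric values are illustrative inputs.
GATES = {
    "workflow_success":    ("min", 0.95),
    "unsupported_claims":  ("max", 0.02),
    "semantic_similarity": ("min", 0.90),
}

def evaluate_gates(metrics: dict) -> list[str]:
    failures = []
    for name, (kind, threshold) in GATES.items():
        value = metrics[name]
        ok = value >= threshold if kind == "min" else value <= threshold
        if not ok:
            failures.append(f"{name}={value} vs {kind} {threshold}")
    return failures

candidate = {"workflow_success": 0.97,
             "unsupported_claims": 0.068,
             "semantic_similarity": 0.82}
failures = evaluate_gates(candidate)
if failures:
    raise SystemExit(f"Block promotion: {failures}")  # stop the pipeline here
```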
Topic: Content Domain 3: AI Safety, Security, and Governance
A SaaS company is deploying a customer-support assistant on Amazon Bedrock in us-east-1. The assistant uses a Bedrock Knowledge Base backed by OpenSearch Serverless and calls internal ticket APIs with OAuth bearer tokens. Security requires response-level audit evidence within 5 minutes, including request ID, tenant ID, model ID, guardrail action, retrieval source IDs, tool name/status, and redacted response text. OAuth bearer tokens, session cookies, and API keys must not be stored in logs, and audit data must stay in us-east-1 encrypted with a customer managed KMS key. Which architecture is the best fit?
Best answer: D
Explanation: The best design logs structured response evidence after redaction, not raw headers or transcripts. Redacting token patterns before log emission satisfies the no-token-storage constraint, while CloudWatch Logs or S3 with KMS and CloudTrail provide queryable, regional audit evidence. Governed GenAI logging should separate audit evidence from secrets. A regional application or GenAI gateway can capture the model response, guardrail result, retrieval source IDs, and tool-call status, then redact token-shaped fields and patterns before sending any event to CloudWatch Logs or S3. That preserves lineage and response-level evidence without persisting bearer tokens, cookies, or API keys. CloudWatch Logs supports near-real-time queries, S3 supports durable retention, KMS provides customer managed encryption, and CloudTrail correlation ties the audit record to Bedrock and related AWS activity. Encrypting full raw logs is the closest trap: encryption and IAM reduce access, but they do not meet a requirement that token values must not be stored.
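A minimal sketch of redact-before-emit logging follows; the regex patterns are illustrative, not an exhaustive secret taxonomy, and in Lambda a printed JSON line lands in CloudWatch Logs without extra API calls.

```python
# Minimal sketch: redact token-shaped secrets before emitting a structured
# audit event. Patterns are illustrative; extend them to match your secrets.
import json
import re

SECRET_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*"),  # OAuth bearer tokens
    re.compile(r"eyJ[\w\-]+\.[\w\-]+\.[\w\-]+"),    # JWT-shaped strings
    re.compile(r"api[_-]?key[\"':=\s]+\S+", re.IGNORECASE),  # API keys
]

def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def emit_audit_event(event: dict) -> None:
    safe = {k: redact(v) if isinstance(v, str) else v for k, v in event.items()}
    print(json.dumps(safe))  # stdout is captured by CloudWatch Logs in Lambda

emit_audit_event({
    "request_id": "req-123", "tenant_id": "t-42",
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "guardrail_action": "NONE", "retrieval_source_ids": ["doc-7"],
    "tool_status": "SUCCESS",
    "response_text": "Ticket updated. Bearer abc123 must not reach the log.",
})
```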
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
A healthcare software team is selecting an FM approach for a production claims-coding assistant. The service must reach an F1 score of at least 0.92, keep p95 latency under 2,500 ms, and use only governance-approved artifacts.
Exhibit: Pilot evaluation
| Approach | F1 score | p95 latency | Governance note |
|---|---|---|---|
| General-purpose zero-shot | 0.74 | 900 ms | No custom artifacts |
| Task-specialized extraction FM | 0.86 | 1,200 ms | No custom artifacts |
| General-purpose few-shot prompt | 0.91 | 3,800 ms | No custom artifacts |
| Customized FM on approved corpus | 0.94 | 1,900 ms | Version approval required |
Which next step is best?
Best answer: B
Explanation: The exhibit shows that the customized FM is the only candidate meeting both production quality and latency requirements. Its governance note does not block deployment because the corpus is approved and the remaining action is version approval. The core decision is matching the FM approach to measurable application requirements. General-purpose approaches are attractive when latency, cost, and simplicity matter, but the zero-shot result is far below the required F1 score and the few-shot prompt misses the latency target. The task-specialized extraction FM improves latency and domain fit but still falls short of the required F1 score. The customized FM reaches 0.94 F1, stays under 2,500 ms p95 latency, and uses an approved domain corpus, so the right next step is to move it through the controlled model lifecycle and approval process. The decisive exhibit detail is that only the customized FM satisfies both numeric thresholds while remaining governable.
Topic: Content Domain 4: Operational Efficiency and Optimization for GenAI Applications
A company runs a synchronous customer-support assistant on Amazon Bedrock. Output quality is acceptable, but during a predictable 9 AM peak, CloudWatch shows immediate ThrottlingException responses before token generation starts; off-peak latency is normal. The team must preserve the same model behavior and interactive user experience. Which optimization lever best fits this symptom?
Best answer: B
Explanation: The symptom is service capacity pressure during a predictable peak, not poor generation quality or nondeterminism. Because requests are throttled before generation starts and the user experience must remain synchronous, the best lever is capacity planning for peak throughput. Model parameter tuning is appropriate when output behavior needs adjustment, such as reducing randomness or limiting response length. Architecture changes such as asynchronous queues help when delayed processing is acceptable. Here, quality is already acceptable, off-peak behavior is normal, and failures occur immediately during a known traffic spike. That maps to capacity planning: forecast peak demand, request or configure sufficient throughput, and monitor utilization and throttling metrics. The key takeaway is to match the lever to the bottleneck: throughput errors require capacity action before changing prompts or model parameters.
Topic: Content Domain 2: Implementation and Integration
A financial services company is launching a production GenAI assistant that summarizes account-service notes by using a supported Amazon Bedrock foundation model. The workload has predictable weekday traffic, must meet a steady p95 latency target during business hours, and cannot tolerate inference throttling during call-center peaks. The team does not need custom model hosting or training, and all model invocations must stay on private AWS networking with centralized audit logging. Which architecture is the best fit?
Best answer: C
Explanation: The best fit is Amazon Bedrock Provisioned Throughput because the model is already supported in Bedrock and the workload needs predictable, reserved inference capacity. Private runtime access and AWS audit controls can satisfy the networking and governance requirements without operating model infrastructure. This scenario is primarily about matching the FM deployment pattern to production inference needs. On-demand Bedrock invocation through Lambda is well suited for intermittent or variable workloads, but it does not reserve capacity for predictable peak demand. Bedrock Provisioned Throughput provides dedicated throughput for a selected Bedrock model, helping meet steady latency and throttling requirements while keeping the team out of custom model hosting. Private connectivity through AWS networking controls and centralized logging can be added around the runtime path. SageMaker AI endpoints are a better fit when the team must host a custom model artifact, container, or deployment stack. Here, that added operational control is unnecessary and would overbuild the solution.
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
A financial services company operates an Amazon Bedrock customer-support assistant that uses Prompt Management, Prompt Flows, and a Bedrock Knowledge Base over regulated documents. Weekly releases may change the system prompt, selected FM, retriever settings, or tool workflow definitions. Evaluation data and artifacts must stay in the application’s AWS account and Region, and the release process must prove before production that answer quality, grounding, and tool-call behavior have not regressed without using live customer traffic. Which architecture is the best fit?
Best answer: B
Explanation: The best design treats prompt, model, retrieval, and workflow configurations as release artifacts that must pass automated regression tests before production. A CodePipeline quality gate with Bedrock evaluations, custom RAG and tool tests, and stored evidence satisfies the preproduction, locality, and audit requirements. Continuous evaluation for a production GenAI application should be integrated into CI/CD, not left to production feedback. The candidate Bedrock prompt, model choice, retriever configuration, and Prompt Flow should be deployed to an isolated staging environment in the same account and Region. Step Functions or CodeBuild can run Amazon Bedrock evaluation jobs plus custom tests for retrieval grounding, expected citations, tool-call schemas, and task completion against versioned S3 test fixtures. Metrics and artifacts should be written to S3, CloudWatch, and pipeline execution records, with thresholds that fail the release and require approval before promotion. This creates repeatable regression evidence without exposing live users to unvalidated changes.
Topic: Content Domain 3: AI Safety, Security, and Governance
A company monitors a Bedrock-powered support assistant with CloudTrail, WAF, invocation logs, and guardrail metrics. During one hour, output quality and retrieval relevance remain stable, but CloudTrail shows InvokeModel calls using a newly created IAM access key from an unapproved IP range, bypassing API Gateway and the application role. Which signal category best describes this finding?
Best answer: D
Explanation: This finding is best treated as a security incident signal. The key evidence is not degraded model behavior or blocked content; it is unauthorized Bedrock access using a new IAM access key from an unapproved source outside the normal application path. GenAI monitoring signals should be classified by the control plane and runtime evidence they provide. Model drift usually appears as degraded quality, changed output distributions, or lower retrieval/grounding metrics. Misuse usually involves abusive or unintended user behavior through the application. Policy violations are typically guardrail, compliance, or content-rule failures. Here, CloudTrail shows direct InvokeModel activity with a new access key from an unapproved IP range, bypassing expected API Gateway and application-role controls. That pattern points to unauthorized access or credential compromise, so it should trigger incident response, key revocation, IAM investigation, and CloudTrail evidence preservation. The key takeaway is to separate content and quality signals from identity, network, and access-control anomalies.
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
A company exposes a support-triage API through API Gateway and Lambda. Lambda calls an Amazon Bedrock model and returns JSON that Step Functions routes with JSONPath. After a prompt edit, 8% of executions fail because the response includes prose before the JSON, and CloudWatch shows staging and production used different hard-coded prompt text. The team needs auditable prompt approval, prompt updates without Lambda code deployments, and automated regression checks against 150 golden cases before production. Which implementation should the team use?
Best answer: D
Explanation: The issue is prompt operations failure, not model capacity. Bedrock Prompt Management provides auditable, versioned prompt templates, while explicit JSON-only instructions, Lambda schema validation, and golden-set regression tests prevent malformed outputs from being promoted. For production prompt governance, prompt text should be managed as a versioned artifact rather than copied into Lambda code. A practical implementation is to store the prompt in Bedrock Prompt Management, create approved versions, have the runtime reference the approved version, and include clear nonconflicting instructions for JSON-only output. Lambda should validate the model response against the expected JSON schema before Step Functions consumes it. A promotion workflow should run the 150 golden cases and promote only versions that pass output-shape and quality checks. This solves the root causes: prompt drift, weak schema enforcement, and missing regression tests.
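To show the schema-enforcement half of the fix, here is a minimal sketch of validating model output before Step Functions consumes it; the schema is illustrative, and the prompt text itself would live in Bedrock Prompt Management as an approved version rather than in Lambda code.

```python
# Minimal sketch: enforce JSON-only output before Step Functions routes it.
# The schema is illustrative; jsonschema is a third-party package.
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string"},
        "priority": {"enum": ["low", "medium", "high"]},
    },
    "required": ["category", "priority"],
    "additionalProperties": False,
}

def parse_model_response(raw_text: str) -> dict:
    """Reject prose-wrapped or malformed output before downstream routing."""
    try:
        payload = json.loads(raw_text)      # fails on prose before the JSON
        validate(payload, RESPONSE_SCHEMA)  # fails on shape drift
    except (json.JSONDecodeError, ValidationError) as exc:
        raise ValueError(f"Model output failed schema check: {exc}") from exc
    return payload
```

Running the 150 golden cases through this same parser in the promotion pipeline is what keeps a prompt edit like the one in the stem from reaching production.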
Topic: Content Domain 4: Operational Efficiency and Optimization for GenAI Applications
A company runs a RAG assistant with Amazon Bedrock Knowledge Bases backed by an Amazon OpenSearch Service vector index. The team must detect vector index degradation and retrieval-data quality regressions before users report issues. The solution must not log raw prompts or chunks, must not add latency to live API calls, and must publish alarms in CloudWatch. Which implementation best meets these requirements?
Best answer: D
Explanation: The best implementation is an out-of-band monitoring workflow that checks both the vector store and retrieval results. EventBridge and Lambda can run scheduled canary queries, collect OpenSearch index metrics, and publish sanitized custom metrics and alarms to CloudWatch without adding request latency or logging sensitive content. For a production RAG system, vector-store monitoring should combine infrastructure/index signals with retrieval-data quality signals. A scheduled EventBridge rule can invoke Lambda or Step Functions to inspect OpenSearch health, index size, deleted-document ratio, ingestion freshness, query latency, and error rates. The same workflow can run a small golden set of retrieval canaries through Bedrock Knowledge Bases, compare returned document IDs or scores to expected results, and publish aggregate custom metrics to CloudWatch. Logging only IDs, hashes, counts, and scores preserves the privacy constraint. This approach catches stale indexes, poor chunk retrieval, and index degradation before live users are affected.
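A minimal sketch of the scheduled canary Lambda follows; the knowledge base ID and golden set are placeholders, and only IDs, counts, and scores are emitted, never prompts or chunk text.

```python
# Minimal sketch of a scheduled retrieval canary (EventBridge -> Lambda).
# Knowledge base ID and the golden set are placeholders; no raw content
# is logged, only an aggregate metric.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
cloudwatch = boto3.client("cloudwatch")

GOLDEN = [  # query -> a document expected among the top-5 results
    {"query": "parental leave forms", "expected_doc": "policy-017"},
]

def handler(event, context):
    hits = 0
    for case in GOLDEN:
        resp = agent_runtime.retrieve(
            knowledgeBaseId="KBEXAMPLE1",
            retrievalQuery={"text": case["query"]},
            retrievalConfiguration={
                "vectorSearchConfiguration": {"numberOfResults": 5}},
        )
        uris = [r.get("location", {}).get("s3Location", {}).get("uri", "")
                for r in resp["retrievalResults"]]
        hits += any(case["expected_doc"] in uri for uri in uris)
    cloudwatch.put_metric_data(
        Namespace="RagCanary",
        MetricData=[{"MetricName": "GoldenRecallAt5",
                     "Value": hits / len(GOLDEN), "Unit": "None"}],
    )
```

A CloudWatch alarm on `GoldenRecallAt5` then fires when a refresh degrades retrieval, before users notice.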
Topic: Content Domain 2: Implementation and Integration
A team is building a production RAG assistant on AWS. Developers use Amazon Q Developer to generate Lambda code, IAM policy drafts, and Bedrock prompt templates. Management wants faster delivery but must preserve audit evidence, least-privilege access, automated regression tests, and production observability. Which principle best maps to this requirement?
Best answer: A
Explanation: Developer productivity tools are implementation accelerators, not control replacements. For production GenAI workloads, generated code, prompts, and policies must still pass the normal software delivery, security, testing, and observability gates before release. The core principle is assisted development with governed delivery. Tools such as Amazon Q Developer can help draft code, policies, prompts, and troubleshooting steps, but their output is not automatically production-ready or compliant. The team should treat generated artifacts like any other change: review them, test them, validate IAM least privilege, deploy through CI/CD gates, and monitor the application in production. This preserves delivery speed while maintaining accountability and operational readiness. The key takeaway is that productivity tooling improves developer workflow but does not replace architecture, testing, security, or operations controls.
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
A company is releasing a new version of a RAG-based claims assistant that uses Amazon Bedrock Knowledge Bases and Bedrock Agents. Before routing more than 1% of traffic, the release process must run synthetic agent workflows, test answers for unsupported claims against retrieved sources, detect semantic drift from the last approved prompt/model baseline, and block promotion automatically with auditable artifacts kept in the workload account and Region. Which architecture best meets these requirements?
Best answer: D
Explanation: The requirement is a pre-release validation gate, not only production monitoring or manual review. A CI/CD gate orchestrated by Step Functions can exercise synthetic workflows, score grounding and quality, compare semantic similarity to an approved baseline, and block promotion automatically while storing audit artifacts. A production GenAI release should be validated against the deployed staging path that users will actually hit. CodePipeline provides the promotion gate, while Step Functions can orchestrate synthetic conversations through the staging Bedrock Agent and RAG stack. Bedrock grounding and model evaluation checks can score unsupported claims against retrieved sources, and Titan Embeddings can compare outputs or retrieved contexts with approved baselines to detect semantic drift. Writing results to KMS-encrypted S3 and CloudWatch in the workload account and Region supports auditability and governance. The key pattern is automated release gating with repeatable GenAI-specific quality signals before canary or broad release.
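To make the drift check concrete, here is a minimal sketch comparing a candidate answer to the approved baseline with Titan embeddings and cosine similarity; the texts are placeholders and the 0.90 threshold mirrors the scenario's gate.

```python
# Minimal sketch of a semantic-drift gate: embed candidate and baseline
# answers, then compare cosine similarity to the release threshold.
import json
import math
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

baseline = embed("Approved baseline answer for this intent.")   # placeholder
candidate = embed("Candidate release answer for the same intent.")
if cosine(baseline, candidate) < 0.90:
    raise SystemExit("Semantic drift gate failed: block promotion.")
```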
Topic: Content Domain 3: AI Safety, Security, and Governance
A financial services company is building a customer assistant. API Gateway starts an AWS Step Functions workflow that writes the turn to DynamoDB, retrieves context from an Amazon Bedrock knowledge base, and invokes a Bedrock model. The company must reject prompt-injection attempts and SSNs in user input before any DynamoDB write or retrieval call. Approved and rejected decisions must be logged for audit with minimal added latency. Which implementation meets these requirements?
Best answer: A
Explanation: The key requirement is pre-processing input safety before any persistence or retrieval. Calling Amazon Bedrock Guardrails with ApplyGuardrail from an initial Lambda state lets the workflow synchronously block unsafe prompts and log the moderation outcome before continuing. For input safety controls, place moderation at the first trusted boundary of the workflow. A Lambda state can call the Amazon Bedrock Guardrails ApplyGuardrail API with the source set for input, using prompt-attack and sensitive-information policies. Step Functions can then branch: return a safe rejection response when the guardrail intervenes, or continue to the DynamoDB write, knowledge base retrieval, and model invocation when the input passes. The Lambda and Step Functions execution logs provide audit evidence in CloudWatch with minimal synchronous overhead. Applying controls only at generation time is too late because unsafe input could already be stored or used for retrieval.
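A minimal sketch of that first Lambda state follows; the guardrail ID and version are placeholders, and the returned flag is what a Step Functions Choice state would branch on.

```python
# Minimal sketch of the first workflow state: evaluate raw user input with
# ApplyGuardrail before any DynamoDB write or retrieval call, returning a
# decision for the Step Functions Choice state. IDs are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    result = bedrock.apply_guardrail(
        guardrailIdentifier="gr-example123",
        guardrailVersion="1",
        source="INPUT",  # evaluate the prompt as input, before persistence
        content=[{"text": {"text": event["user_input"]}}],
    )
    blocked = result["action"] == "GUARDRAIL_INTERVENED"
    # The decision flows into the Step Functions execution history and
    # CloudWatch logs, which provide the required audit evidence.
    return {
        "blocked": blocked,
        "safe_response": result["outputs"][0]["text"] if blocked else None,
        "user_input": None if blocked else event["user_input"],
    }
```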
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
A team is designing maintenance for a multi-tenant RAG application. The vector search engine is already selected. The team must track source URI, content hash, document version, ingestion status, tenant ID, ACL groups, and TTL for millions of documents. Ingestion workers need high-volume point reads and conditional updates without reading document text from the vector index. Which pattern best fits this requirement?
Best answer: C
Explanation: The requirement is about metadata and maintenance, not choosing the vector engine itself. DynamoDB is a strong fit for a metadata control plane that tracks document state, versions, ACL attributes, and TTL while embeddings remain in the selected vector store. For vector-backed GenAI applications, a common pattern is to separate retrieval data from operational metadata. The vector store handles embedding similarity search, while DynamoDB tracks document identity, source hashes, ingestion state, tenant ownership, ACL metadata, and lifecycle attributes such as TTL. This supports fast point lookups and conditional writes during ingestion or re-indexing workflows without overloading the vector index with control-plane responsibilities. Bedrock Knowledge Bases is better when the goal is managed RAG ingestion and retrieval orchestration; OpenSearch Service is better when the requirement is the vector search engine itself. Here, the key requirement is scalable metadata maintenance around an already selected vector store.
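The sketch below shows one document item in that control plane with a conditional write, so ingestion workers skip unchanged content; the table and attribute names are placeholders.

```python
# Minimal sketch of the metadata control plane: one item per document with
# a conditional update so ingestion only re-processes changed content.
# Table and attribute names are placeholders.
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("rag-doc-metadata")

def record_ingestion(tenant_id, doc_id, source_uri, content_hash, ttl_days=90):
    try:
        table.update_item(
            Key={"pk": f"TENANT#{tenant_id}", "sk": f"DOC#{doc_id}"},
            UpdateExpression=(
                "SET source_uri = :u, content_hash = :h, "
                "doc_version = if_not_exists(doc_version, :z) + :one, "
                "ingestion_status = :s, expires_at = :t"
            ),
            # Skip the write when the content hash is unchanged.
            ConditionExpression=(
                "attribute_not_exists(content_hash) OR content_hash <> :h"),
            ExpressionAttributeValues={
                ":u": source_uri, ":h": content_hash, ":z": 0, ":one": 1,
                ":s": "EMBEDDED",
                ":t": int(time.time()) + ttl_days * 86400,  # DynamoDB TTL epoch
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # unchanged document, no re-embedding needed
        raise
```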
Topic: Content Domain 4: Operational Efficiency and Optimization for GenAI Applications
An internal benefits assistant uses Amazon Bedrock Knowledge Bases with documents in Amazon S3. After a weekly content refresh, users report answers that are fluent but cite outdated or irrelevant policy sections. Latency, error rate, and throttling are normal. The team wants an early warning signal before user complaints increase. Which observability check best maps to this symptom?
Best answer: A
Explanation: The symptom is a retrieval and grounding quality problem, not a service health or cost problem. A synthetic RAG canary with expected queries and known current sources can detect stale or irrelevant retrieval before users report bad citations. For RAG applications, fluent answers with stale or irrelevant citations usually point to retrieval quality, indexing freshness, metadata filtering, or grounding issues. The best operational signal is a synthetic check that runs representative questions after refreshes and records retrieved chunk IDs, source timestamps, citation matches, and groundedness or relevance scores. This gives an early warning when the knowledge base returns the wrong evidence even though the model invocation path is healthy. Latency and error dashboards remain useful for availability, but they do not validate whether the answer is grounded in the right documents.
Topic: Content Domain 2: Implementation and Integration
A company is building a synchronous customer-support assistant behind API Gateway and Lambda. The workflow uses Bedrock Prompt Management and a Bedrock-hosted FM that supports reserved capacity; no custom model artifact or inference container is required. Load tests with on-demand calls show intermittent throttling at expected business-hour traffic, and the SLA requires predictable low latency without queueing user requests. Which implementation best meets these requirements?
Best answer: C
Explanation: The workload uses a Bedrock-hosted model, has known steady demand, and requires predictable synchronous latency. Bedrock Provisioned Throughput is the right deployment choice because it reserves model capacity without moving to a custom SageMaker hosting model. For Amazon Bedrock applications, on-demand invocation is usually best for variable or low-volume traffic because it requires no reserved capacity. This scenario has already shown throttling at expected load and has a synchronous SLA, so adding retries would increase tail latency rather than guarantee capacity. Bedrock Provisioned Throughput lets the application call a provisioned model ARN from Lambda while keeping the managed Bedrock integration, prompt management, and operational model. SageMaker AI real-time endpoints are better when the team must host a custom or open-weight model artifact, control the inference container, or use SageMaker deployment workflows. The key distinction is reserved Bedrock capacity versus custom model hosting.
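As a small illustration of the runtime change involved, once Provisioned Throughput is purchased the application invokes the provisioned model ARN instead of the on-demand model ID; the ARN below is a placeholder.

```python
# Minimal sketch: invoke reserved Bedrock capacity by passing the
# provisioned model ARN as the modelId. The ARN is a placeholder.
import boto3

bedrock = boto3.client("bedrock-runtime")

PROVISIONED_MODEL_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123example"
)

response = bedrock.converse(
    modelId=PROVISIONED_MODEL_ARN,  # reserved capacity, same Converse API
    messages=[{"role": "user",
               "content": [{"text": "Summarize this account-service note."}]}],
    inferenceConfig={"maxTokens": 512},
)
print(response["output"]["message"]["content"][0]["text"])
```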
Topic: Content Domain 5: Testing, Validation, and Troubleshooting
A team is choosing between two Amazon Bedrock prompt configurations for a customer support summarization workflow. The release gate requires no critical factual accuracy or task-alignment defects.
Exhibit: Evaluation summary
Scale: 1=poor, 5=excellent
Config A: relevance 4.8, factual 2.0, consistency 2.3, fluency 4.9, alignment 2.1
Config B: relevance 4.3, factual 4.5, consistency 4.4, fluency 4.4, alignment 4.6
Reviewer note for A: invents 24x7 phone support and refund promises.
What is the best interpretation and next step?
Best answer: B
Explanation: Config A looks polished but fails the most important release criteria. The decisive exhibit details are the low factual accuracy and alignment scores plus the reviewer note that it invents support and refund policies. FM output evaluation should consider relevance, factual accuracy, consistency, fluency, and task alignment together, not just whether the response reads well. In the exhibit, Config A has strong relevance and fluency, but factual accuracy is 2.0, consistency is 2.3, and alignment is 2.1. The reviewer note confirms the defect: unsupported claims about 24x7 phone support and refunds. Because the release gate blocks critical factual or task-alignment defects, Config A should not be promoted. Config B has slightly lower relevance and fluency but strong factual accuracy, consistency, and alignment, making it the safer production candidate.
Topic: Content Domain 3: AI Safety, Security, and Governance
A financial services team uses Amazon Bedrock Prompt Management to version prompts for a customer-support summarization assistant. Before promoting candidate prompt v8, the team ran a Bedrock LLM-as-judge evaluation with a fixed, balanced test set. The judge used the same pass rubric for both prompt versions.
Exhibit: Evaluation results
| Evaluation slice | Baseline v7 pass | Candidate v8 pass | Difference |
|---|---|---|---|
| Overall | 88% | 91% | +3% |
| Age 18-34 | 89% | 94% | +5% |
| Age 35-54 | 88% | 91% | +3% |
| Age 55+ | 87% | 78% | -9% |
Which next step should the team take?
Best answer: C
Explanation: The candidate prompt improves the aggregate score but creates a clear fairness regression for one evaluation slice. A responsible release process should keep the prompt version unpromoted and use the versioned evaluation evidence to investigate and remediate that slice before rollout. Fairness evaluation should not rely only on aggregate quality metrics. The exhibit shows v8 improves overall pass rate from 88% to 91%, but the Age 55+ slice drops from 87% to 78%. With Bedrock Prompt Management, the team can keep v8 as a candidate version while using Prompt Flows or a controlled A/B evaluation with the same rubric to inspect failed examples, rerun slice-level tests, and add human review if needed. Slice metadata is appropriate for evaluating fairness, even if the model should not use that attribute to make user-specific decisions. The key takeaway is that aggregate improvement does not justify release when a protected or sensitive group regresses materially.
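A minimal sketch of a slice-aware promotion check follows; the numbers mirror the exhibit, and the two-point regression tolerance is an illustrative policy choice, not an AWS default.

```python
# Minimal sketch of a slice-aware gate: aggregate improvement does not
# justify release if any slice regresses beyond a tolerance.
BASELINE = {"overall": 0.88, "age_18_34": 0.89,
            "age_35_54": 0.88, "age_55_plus": 0.87}
CANDIDATE = {"overall": 0.91, "age_18_34": 0.94,
             "age_35_54": 0.91, "age_55_plus": 0.78}

def slice_regressions(baseline, candidate, tolerance=0.02):
    return {s: round(candidate[s] - baseline[s], 2)
            for s in baseline
            if s != "overall" and candidate[s] < baseline[s] - tolerance}

regressions = slice_regressions(BASELINE, CANDIDATE)
if regressions:
    # {'age_55_plus': -0.09} -> keep v8 unpromoted, investigate this slice
    raise SystemExit(f"Block promotion; slice regressions: {regressions}")
```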
Topic: Content Domain 1: Foundation Model Integration, Data Management, and Compliance
Which FM approach is best defined as adapting a base model with organization-approved domain data or task examples, then governing the resulting model version because off-the-shelf models do not meet the required accuracy or domain specificity?
Best answer: B
Explanation: A customized FM approach modifies or adapts a base model to better match a domain, style, or task requirement. It is appropriate when prompt engineering, RAG, or an off-the-shelf task model cannot meet accuracy, domain specificity, or governance needs. The core distinction is whether the model is used as-is, selected for a built-in task specialization, or adapted for the organization. A general-purpose FM is broad and flexible but may not capture specialized terminology or output behavior. A task-specialized FM is already optimized for a known task, such as embeddings or summarization. A customized FM is created by adapting a base model with approved data or examples and then controlling the resulting model lifecycle, such as through model versioning and approval workflows. The key signal in the stem is that the model itself must be adapted and governed, not merely prompted or supplemented with retrieved context.
Use this map after the sample questions to connect each item to the AWS Generative AI Developer - Professional decisions these practice samples test.
```mermaid
flowchart LR
  S1["Business GenAI requirement"] --> S2
  S2["Choose Bedrock model and retrieval pattern"] --> S3
  S3["Design prompt guardrail and data controls"] --> S4
  S4["Integrate through APIs workflows and agents"] --> S5
  S5["Evaluate safety quality and latency"] --> S6
  S6["Operate cost monitoring and rollback"]
```
| Cue | What to remember |
|---|---|
| Bedrock fit | Use Bedrock when managed foundation models, guardrails, agents, knowledge bases, and prompt workflows fit the workload. |
| RAG vs customization | Use retrieval when the answer needs current enterprise context; customize only when model behavior or domain fit requires it. |
| Security | Protect prompts, retrieved content, model access, IAM, KMS keys, and data boundaries. |
| Evaluation | Check factuality, relevance, safety, latency, cost, and slice-level regressions before release. |
| Operations | Plan throttling, provisioned throughput, traces, prompt versions, rollbacks, and feedback loops. |