AIP-C01 — AWS Certified Generative AI Developer – Professional Quick Review

Last revised: June 29, 2026

High-yield Quick Review for AWS Certified Generative AI Developer – Professional (AIP-C01): Bedrock, RAG, agents, security, evaluation, deployment, and cost.

Quick Review purpose

This Quick Review is for candidates preparing for the AWS Certified Generative AI Developer – Professional (AIP-C01) exam from AWS. Use it to refresh high-yield concepts before moving into IT Mastery practice, topic drills, mock exams, original practice questions, and detailed explanations.

Focus less on memorizing service names in isolation and more on choosing the best architecture for a scenario: secure generative AI application design, retrieval-augmented generation, agentic workflows, prompt engineering, evaluation, observability, cost control, and responsible AI controls.

High-yield AWS generative AI architecture map

Requirement	AWS services or patterns to recognize	Fast decision rule	Common trap
Use a foundation model without managing infrastructure	Amazon Bedrock	Prefer managed model access when the scenario emphasizes speed, low operations, or multiple model choices	Choosing custom training when prompt engineering or RAG would solve the problem
Build an API around model inference	Amazon API Gateway, AWS Lambda, Amazon ECS, Amazon EKS, AWS Step Functions	Use serverless for variable event-driven workloads; use containers for custom runtime or long-running workloads	Putting all orchestration inside one Lambda when retries, branching, or state are needed
Stream responses to users	Bedrock streaming invocation, application streaming over WebSocket or HTTP streaming patterns	Use streaming to improve perceived latency for chat and long responses	Confusing streaming with lower total model processing cost
Ground answers in private documents	RAG with Amazon Bedrock Knowledge Bases, Amazon S3, vector stores such as Amazon OpenSearch Serverless or database vector features	Use RAG for frequently changing or source-backed knowledge	Fine-tuning a model to memorize changing business documents
Execute business actions from natural language	Amazon Bedrock Agents, Lambda action groups, Step Functions, API integrations	Use agents when the model must choose tools or plan steps; use Step Functions for deterministic business workflows	Giving an agent broad permissions or unvalidated tool inputs
Customize model behavior	Prompt templates, RAG, Bedrock model customization, Amazon SageMaker	Start with prompts and retrieval; customize only when behavior or domain adaptation requires it	Treating fine-tuning as a substitute for clean data, retrieval, or guardrails
Protect sensitive data	IAM, AWS KMS, Secrets Manager, VPC endpoints, CloudTrail, S3 controls, data redaction	Apply least privilege, encryption, network controls, audit logging, and data minimization	Logging prompts/responses containing PII without redaction or retention controls
Monitor production behavior	Amazon CloudWatch, AWS CloudTrail, AWS X-Ray where applicable, application metrics, evaluation datasets	Monitor latency, errors, token usage, grounding, safety, and user feedback	Monitoring only infrastructure metrics and ignoring response quality
Deploy repeatably	AWS CDK, AWS CloudFormation, CodePipeline, CodeBuild, staged environments	Version prompts, model choices, retrieval config, and guardrails like application artifacts	Changing prompts manually in production without rollback

Core decision: prompt engineering, RAG, fine-tuning, or custom model?

Choose this	When the scenario says	Strength	Watch out for
Prompt engineering	Need better formatting, tone, role instructions, examples, or task decomposition	Fast, cheap, reversible	Prompt-only solutions do not reliably add private or current facts
RAG	Need answers from private, current, auditable, or source-cited documents	Keeps knowledge external and updateable	Poor chunking, missing metadata filters, or low retrieval quality can cause hallucinations
Fine-tuning or model customization	Need consistent style, domain-specific patterns, classification behavior, or specialized task performance	Can improve behavior on repeated task types	Not ideal for rapidly changing facts; requires training data quality and evaluation
Continued pretraining or deeper customization	Need broad domain language adaptation and have substantial domain corpus	May improve domain fluency	Higher cost, complexity, and governance burden
Custom model on SageMaker or containerized infrastructure	Need full control over model, runtime, dependencies, or specialized deployment	Maximum flexibility	More operations, scaling, security, and lifecycle responsibility

Candidate mistake: assuming “more training” is always better. On AIP-C01-style scenarios, prefer the least complex option that meets the requirement: prompt template first, RAG for knowledge, fine-tuning for behavior, custom infrastructure only when managed services do not satisfy constraints.

RAG review: retrieval-augmented generation

RAG is one of the most testable generative AI architecture patterns because it combines data engineering, search quality, prompt design, security, and evaluation.

RAG pipeline checklist

Ingest source data from controlled repositories such as S3 or application data stores.
Normalize and clean documents: remove noise, preserve titles, sections, timestamps, and access metadata.
Chunk content into retrieval-sized units.
Embed chunks using an embedding model.
Index vectors and metadata in a vector store.
Retrieve relevant chunks for a user query.
Filter by tenant, user entitlement, document type, region, sensitivity, or freshness.
Construct prompt with instructions, user question, retrieved context, and output requirements.
Generate answer using a foundation model.
Cite sources or return supporting references when required.
Evaluate retrieval quality, answer quality, safety, cost, and latency.

Chunking and retrieval decisions

Design choice	Good default thinking	If you choose poorly
Chunk size	Large enough to preserve meaning, small enough for precise retrieval	Chunks too small lose context; chunks too large dilute relevance and increase tokens
Chunk overlap	Use overlap when meaning crosses boundaries	Too much overlap increases storage, retrieval duplication, and cost
Metadata	Store source, owner, timestamp, tenant, permissions, document type, and business attributes	You cannot enforce access control or filter results effectively
Embedding model	Match the embedding model to language, domain, and retrieval quality needs	Changing embedding models may require re-indexing
Hybrid retrieval	Combine keyword and vector search when exact terms and semantic similarity both matter	Pure semantic search may miss exact identifiers, codes, or product names
Reranking	Use when top-k retrieval quality is weak or many similar chunks compete	Extra latency and cost may not be justified for simple retrieval
Citations	Include source IDs and snippets when auditability matters	Answers may be useful but not trusted or verifiable

RAG traps

Security leak trap: retrieving documents before checking user authorization.
Freshness trap: using stale embeddings after source documents change.
Context stuffing trap: sending too many retrieved chunks, increasing cost and confusing the model.
Evaluation trap: checking only final answer quality while ignoring retrieval recall and precision.
Tenant isolation trap: using one shared vector index without metadata filtering or separate tenant isolation controls.
Hallucination trap: telling the model to “answer confidently” instead of instructing it to answer only from provided context and say when context is insufficient.

Prompt engineering review

Good prompts reduce ambiguity. Production prompts should be versioned, tested, and treated as application assets.

Prompt element	Purpose	Example decision
Role or task instruction	Defines what the model should do	“Classify the support ticket into one of these categories” is stronger than “Analyze this”
Context	Provides facts the model may use	In RAG, distinguish retrieved context from user instructions
Output schema	Makes responses machine-parseable	Use JSON-like or field-based outputs when the next system consumes the result
Constraints	Defines boundaries	“Use only the provided context” or “Do not include legal advice”
Examples	Shows desired behavior	Few-shot examples help with classification, formatting, tone, and edge cases
Refusal rule	Handles missing or unsafe input	“If the context does not contain the answer, say so”
Tool instructions	Defines when and how to call tools	Keep tool inputs structured and validated

Prompt mistakes to avoid

Mixing untrusted user text with system instructions without separation.
Asking the model to reveal hidden reasoning instead of requesting concise, auditable explanations or structured intermediate outputs.
Depending on prompt wording alone for security-sensitive enforcement.
Forgetting adversarial inputs such as “ignore previous instructions.”
Using long prompts with repeated policies that increase cost and may reduce clarity.
Not regression-testing prompt changes against known examples.

Model inference and application patterns

Pattern	Best fit	AWS-oriented review point
Synchronous inference	Short request/response tasks	Keep timeout, latency, and user experience requirements in mind
Streaming inference	Chat, drafting, long generated output	Improves perceived responsiveness; still requires error handling mid-stream
Asynchronous processing	Long-running jobs, batch generation, document processing	Use queues, Step Functions, events, and durable state
Batch inference	Large offline workloads	Optimize for throughput and cost rather than interactive latency
Human-in-the-loop review	High-risk, regulated, customer-facing, or low-confidence outputs	Route uncertain or sensitive outputs for review instead of full automation
Caching	Repeated prompts, static context, deterministic queries	Cache carefully when responses depend on user permissions or freshness

Agents and tool use

Agents are useful when a model must reason about which tool to call, gather information, or perform a sequence of actions. They are not a replacement for deterministic workflow design.

Scenario cue	Prefer	Why
“The application must decide which backend API to call based on user intent”	Bedrock Agents or tool-calling pattern	The model can map natural language intent to tool selection
“The workflow has fixed approval, retry, wait, and branching steps”	Step Functions	Deterministic orchestration is easier to audit and operate
“The model must update a customer record”	Agent/tool with Lambda plus strict validation and IAM	Keep business action code outside the prompt and enforce permissions
“The task is safety-critical or financially impactful”	Human approval plus deterministic controls	Do not let a model independently perform high-risk irreversible actions
“The tool accepts free-form user input”	Input schema validation and sanitization	Prevent prompt injection and malformed API calls

Agent security checklist

Limit each tool to the minimum action needed.
Validate tool inputs before execution.
Use IAM roles with least privilege.
Separate read-only tools from write or destructive tools.
Require confirmation or human approval for irreversible actions.
Log tool calls, inputs, outputs, and request IDs with sensitive data controls.
Design idempotent actions where retries are possible.
Do not let retrieved text redefine tool permissions.

Guardrails, safety, and responsible AI

Responsible AI questions often test layered controls. A single prompt instruction is not enough.

Risk	Practical control
Toxic, hateful, or unsafe output	Bedrock Guardrails, content filters, output validation, policy prompts
Prompt injection	Instruction hierarchy, input sanitization, context separation, tool allowlists, retrieval filtering
Sensitive data exposure	PII detection/redaction, data minimization, encryption, restricted logging
Hallucination	RAG grounding, citations, refusal rules, confidence thresholds, evaluation datasets
Biased or unfair outputs	Representative test sets, human review, model evaluation, policy constraints
Unauthorized access	IAM, tenant-aware metadata filters, application authorization before retrieval
Inappropriate business action	Tool validation, approval workflows, scoped IAM, audit trails
Low-quality answers	Golden datasets, human evaluation, automated scoring, feedback loops

Defense-in-depth pattern

For sensitive generative AI applications, think in layers:

Before the model: authenticate user, authorize data access, sanitize input, classify sensitivity.
During retrieval: filter by permissions, freshness, tenant, and source.
During generation: use clear system instructions, guardrails, constrained context, and structured output.
After generation: validate output, redact sensitive data, check policy, route to human review if needed.
In operations: log safely, monitor drift, evaluate regularly, and maintain rollback options.

Security review for AIP-C01 candidates

Area	What to remember	Candidate trap
IAM	Grant only required actions such as model invocation, S3 reads, vector store access, and Lambda execution	Giving application roles broad administrator permissions
Encryption	Use KMS-managed encryption for stored data where appropriate	Encrypting S3 but forgetting vector indexes, logs, or temporary stores
Secrets	Store API keys and credentials in Secrets Manager or controlled AWS mechanisms	Hardcoding secrets in Lambda environment variables, prompts, or container images
Network path	Use private connectivity patterns where required, such as VPC endpoints supported by the service architecture	Assuming public internet access is acceptable for sensitive workloads
Audit	Use CloudTrail and application logs for model calls, data access, and tool execution	Logging everything without considering sensitive prompt/response content
Data retention	Define how long prompts, responses, embeddings, and source documents are retained	Keeping raw user input indefinitely by default
Multi-tenant isolation	Use tenant-aware authorization, metadata filtering, separate indexes, or separate accounts where needed	Relying on the model to “not reveal” unauthorized data
Least data	Send only the context needed for the task	Sending full documents or unnecessary PII to the model

Data lifecycle and privacy

Generative AI applications create and transform data in several places: source documents, chunks, embeddings, prompts, responses, logs, evaluation datasets, and feedback records.

High-yield review points:

Embeddings are derived data. Treat them according to the sensitivity of the source content and your governance requirements.
Logs can become a data leak. Redact or suppress sensitive prompts, retrieved context, responses, and tool outputs when needed.
Access control must happen before generation. Do not retrieve unauthorized context and hope the model ignores it.
Data freshness matters. Re-index or update embeddings when source documents change.
Deletion workflows matter. If source content must be removed, consider dependent chunks, embeddings, caches, and evaluation copies.
Training data quality controls model behavior. Duplicates, label noise, sensitive data, and unrepresentative examples can degrade results.

Evaluation and testing

A professional-level generative AI scenario usually requires both software testing and model-output evaluation.

Evaluation target	Useful measures	What it tells you
Retrieval	Recall, precision, top-k relevance, metadata filter correctness	Whether the right context reaches the model
Groundedness	Faithfulness to retrieved context, citation accuracy	Whether the answer is supported by sources
Task success	Classification accuracy, extraction correctness, rubric score	Whether the model solves the business task
Safety	Toxicity, policy violations, sensitive data exposure	Whether outputs meet safety requirements
Robustness	Adversarial prompts, prompt injection tests, edge cases	Whether behavior holds under attack or unusual input
Latency	End-to-end response time, retrieval time, model time	Whether user experience targets are met
Cost	Tokens per request, model choice, retrieval cost, provisioned capacity	Whether the design is economically sustainable
User experience	Human ratings, thumbs up/down, escalation rate	Whether real users find the output useful

Evaluation workflow

Build a golden dataset of representative prompts, expected answers, unacceptable answers, and edge cases.
Test retrieval separately from generation.
Test prompt templates and model choices against the same dataset.
Include safety and prompt injection cases, not just happy paths.
Track results over time so prompt, model, data, and retrieval changes do not silently regress.
Use human review for subjective tasks and high-impact decisions.

Candidate mistake: choosing a model based only on a demo response. For exam scenarios, prefer repeatable evaluation with representative data, measurable criteria, and deployment controls.

Deployment and LLMOps review

Concern	Good practice
Environment separation	Use development, test, staging, and production environments
Infrastructure	Define resources with infrastructure as code such as AWS CDK or CloudFormation
Prompt management	Version prompts, templates, examples, and output schemas
Model management	Record model ID, configuration, parameters, and deployment mode
Retrieval management	Version chunking strategy, embedding model, index schema, and metadata filters
Guardrail management	Version guardrail policies and test them before production release
Release strategy	Use canary, blue/green, or staged rollout when behavior risk is material
Rollback	Keep known-good prompt, model, retrieval, and guardrail configurations
Observability	Track quality, safety, latency, errors, throttling, and cost
Incident response	Preserve audit trails, disable risky tools, and fall back to safe responses

Performance and cost controls

Lever	Effect	Review note
Model choice	Larger or more capable models usually cost more and may add latency	Match model capability to task complexity
Prompt length	Longer prompts consume more input tokens	Remove redundant instructions and irrelevant context
Retrieved context size	More chunks increase token cost and may reduce answer focus	Tune top-k, chunk size, reranking, and filters
Output length	Long responses increase output tokens	Set concise output requirements where appropriate
Caching	Reduces repeated computation	Be careful with user-specific or permission-specific responses
Streaming	Improves perceived latency	Does not necessarily reduce total cost
Asynchronous processing	Improves reliability for long work	Good for document processing and batch generation
Provisioned capacity	Useful for predictable high-throughput or low-latency workloads	Do not choose it automatically for sporadic usage
Serverless design	Scales with demand and reduces idle cost	Watch timeouts, cold starts, and service quotas
Batch jobs	Efficient for offline workloads	Not suitable for interactive user response needs

Common scenario patterns

Enterprise document assistant

Best-fit pattern:

S3 or enterprise repository as source.
Ingestion and chunking pipeline.
Embeddings and vector index.
Metadata filters for tenant, department, classification, and document permissions.
RAG prompt that instructs the model to answer only from retrieved context.
Citations returned to the user.
Guardrails and sensitive data controls.
Evaluation for retrieval relevance, groundedness, and hallucination.

Common wrong answer: fine-tune a model on all company documents so it “knows” internal policies. RAG is usually better for current, permissioned, source-backed knowledge.

Customer support assistant

Best-fit pattern:

RAG over product docs, ticket history, and approved macros.
Output template for response draft, confidence, sources, and escalation reason.
Human review for low confidence, high-value accounts, or sensitive categories.
Guardrails for unsafe advice and tone.
Feedback loop from agent edits into evaluation data.

Common wrong answer: fully automate all responses without escalation or monitoring.

Structured extraction from documents

Best-fit pattern:

Preprocess documents.
Use prompt template with required fields and output schema.
Validate output against schema and business rules.
Send failures to retry or human review.
Evaluate field-level accuracy.

Common wrong answer: accept free-form generated text when downstream systems require structured data.

Natural-language operations assistant

Best-fit pattern:

Agent or tool-calling interface.
Read-only tools for search and diagnostics.
Write tools separated and restricted.
Step Functions or approval workflow for changes.
Full audit logging.

Common wrong answer: allow the model to execute broad infrastructure changes directly.

Exam wording cues and likely decisions

If the scenario emphasizes	Think first
“Private documents,” “latest policies,” “citations,” or “source-backed answers”	RAG with secure retrieval and metadata filtering
“Consistent output format” or “machine-readable result”	Prompt template plus schema validation
“Changing business knowledge”	Update external knowledge base, not fine-tune for facts
“Need to reduce hallucinations”	Grounding, citations, refusal rules, evaluation, and guardrails
“User should not see unauthorized documents”	Authorization before retrieval plus tenant/document filters
“Model must call APIs”	Agent/tool use with scoped permissions and input validation
“Long-running multi-step workflow”	Step Functions, queues, events, and durable orchestration
“High-risk or irreversible action”	Human approval and deterministic controls
“Need lower latency for chat”	Streaming, smaller model if adequate, optimized retrieval, caching
“Need lower cost”	Reduce tokens, right-size model, cache, tune retrieval, batch where possible
“Need repeatable deployments”	Infrastructure as code, prompt/model versioning, staged releases
“Need to detect regressions”	Golden datasets, automated evaluation, monitoring, rollback

Common candidate mistakes

Choosing model customization before considering prompt engineering or RAG.
Ignoring IAM and data authorization in RAG designs.
Treating guardrails as a complete security boundary instead of one layer.
Forgetting that embeddings, logs, cached responses, and evaluation records may contain sensitive information.
Selecting an agent for a deterministic workflow that should be implemented with Step Functions.
Failing to validate tool inputs and outputs.
Monitoring only errors and latency, not answer quality, grounding, safety, and cost.
Sending too much context to the model and increasing hallucination risk.
Not accounting for freshness when documents change.
Using one evaluation prompt instead of a representative test set.
Overlooking throttling, quotas, retries, dead-letter queues, and backpressure in production designs.
Assuming a human-like response means the answer is correct.

Quick final checklist

Before practice questions, make sure you can quickly answer:

When should you use RAG instead of fine-tuning?
How do chunk size, overlap, metadata, and top-k affect retrieval quality?
How do you prevent one tenant from retrieving another tenant’s data?
What controls reduce prompt injection risk?
When is an agent appropriate, and when is Step Functions better?
What should be logged, and what should be redacted?
How do you evaluate groundedness, retrieval quality, and safety?
Which cost levers reduce token usage?
How do you deploy prompt and model changes safely?
How do guardrails, IAM, KMS, network controls, and audit logs work together?
How would you design rollback for a bad prompt or retrieval change?
What metrics prove the application is improving rather than just running?

Practice connection

Use this Quick Review as a checklist, then move into AIP-C01 topic drills and original practice questions. For each missed question, identify the decision point: RAG vs fine-tuning, managed vs custom, agent vs workflow, prompt vs guardrail, synchronous vs asynchronous, or quality vs cost. Detailed explanations are most useful when you map them back to these architecture choices and then retest with a focused question bank.

Continue in IT Mastery

Use this Quick Review as a final concept map, then move into IT Mastery for focused topic drills, mixed practice sets, timed mock exams, and detailed explanations. The practice questions are original IT Mastery practice items; they are not official AWS questions, copied live-exam content, or exam dumps.

Study Plan

AIP-C01 — AWS Certified Generative AI Developer – Professional Quick Review

Quick Review purpose

High-yield AWS generative AI architecture map

Core decision: prompt engineering, RAG, fine-tuning, or custom model?

RAG review: retrieval-augmented generation

RAG pipeline checklist

Chunking and retrieval decisions

RAG traps

Prompt engineering review

Prompt mistakes to avoid

Model inference and application patterns

Agents and tool use

Agent security checklist

Guardrails, safety, and responsible AI

Defense-in-depth pattern

Security review for AIP-C01 candidates

Data lifecycle and privacy

Evaluation and testing

Evaluation workflow

Deployment and LLMOps review

Performance and cost controls

Common scenario patterns

Enterprise document assistant

Customer support assistant

Structured extraction from documents

Natural-language operations assistant

Exam wording cues and likely decisions

Common candidate mistakes

Quick final checklist

Practice connection

Continue in IT Mastery

Browse Certification Practice Tests by Exam Family