AIP-C01 — AWS Certified Generative AI Developer – Professional Quick Review
High-yield Quick Review for AWS Certified Generative AI Developer – Professional (AIP-C01): Bedrock, RAG, agents, security, evaluation, deployment, and cost.
Quick Review purpose
This Quick Review is for candidates preparing for the AWS Certified Generative AI Developer – Professional (AIP-C01) exam from AWS. Use it to refresh high-yield concepts before moving into IT Mastery practice, topic drills, mock exams, original practice questions, and detailed explanations.
Focus less on memorizing service names in isolation and more on choosing the best architecture for a scenario: secure generative AI application design, retrieval-augmented generation, agentic workflows, prompt engineering, evaluation, observability, cost control, and responsible AI controls.
High-yield AWS generative AI architecture map
| Requirement | AWS services or patterns to recognize | Fast decision rule | Common trap |
|---|---|---|---|
| Use a foundation model without managing infrastructure | Amazon Bedrock | Prefer managed model access when the scenario emphasizes speed, low operations, or multiple model choices | Choosing custom training when prompt engineering or RAG would solve the problem |
| Build an API around model inference | Amazon API Gateway, AWS Lambda, Amazon ECS, Amazon EKS, AWS Step Functions | Use serverless for variable event-driven workloads; use containers for custom runtime or long-running workloads | Putting all orchestration inside one Lambda when retries, branching, or state are needed |
| Stream responses to users | Bedrock streaming invocation, application streaming over WebSocket or HTTP streaming patterns | Use streaming to improve perceived latency for chat and long responses | Confusing streaming with lower total model processing cost |
| Ground answers in private documents | RAG with Amazon Bedrock Knowledge Bases, Amazon S3, vector stores such as Amazon OpenSearch Serverless or database vector features | Use RAG for frequently changing or source-backed knowledge | Fine-tuning a model to memorize changing business documents |
| Execute business actions from natural language | Amazon Bedrock Agents, Lambda action groups, Step Functions, API integrations | Use agents when the model must choose tools or plan steps; use Step Functions for deterministic business workflows | Giving an agent broad permissions or unvalidated tool inputs |
| Customize model behavior | Prompt templates, RAG, Bedrock model customization, Amazon SageMaker | Start with prompts and retrieval; customize only when behavior or domain adaptation requires it | Treating fine-tuning as a substitute for clean data, retrieval, or guardrails |
| Protect sensitive data | IAM, AWS KMS, Secrets Manager, VPC endpoints, CloudTrail, S3 controls, data redaction | Apply least privilege, encryption, network controls, audit logging, and data minimization | Logging prompts/responses containing PII without redaction or retention controls |
| Monitor production behavior | Amazon CloudWatch, AWS CloudTrail, AWS X-Ray where applicable, application metrics, evaluation datasets | Monitor latency, errors, token usage, grounding, safety, and user feedback | Monitoring only infrastructure metrics and ignoring response quality |
| Deploy repeatably | AWS CDK, AWS CloudFormation, CodePipeline, CodeBuild, staged environments | Version prompts, model choices, retrieval config, and guardrails like application artifacts | Changing prompts manually in production without rollback |
Core decision: prompt engineering, RAG, fine-tuning, or custom model?
| Choose this | When the scenario says | Strength | Watch out for |
|---|---|---|---|
| Prompt engineering | Need better formatting, tone, role instructions, examples, or task decomposition | Fast, cheap, reversible | Prompt-only solutions do not reliably add private or current facts |
| RAG | Need answers from private, current, auditable, or source-cited documents | Keeps knowledge external and updateable | Poor chunking, missing metadata filters, or low retrieval quality can cause hallucinations |
| Fine-tuning or model customization | Need consistent style, domain-specific patterns, classification behavior, or specialized task performance | Can improve behavior on repeated task types | Not ideal for rapidly changing facts; requires training data quality and evaluation |
| Continued pretraining or deeper customization | Need broad domain language adaptation and have substantial domain corpus | May improve domain fluency | Higher cost, complexity, and governance burden |
| Custom model on SageMaker or containerized infrastructure | Need full control over model, runtime, dependencies, or specialized deployment | Maximum flexibility | More operations, scaling, security, and lifecycle responsibility |
Candidate mistake: assuming “more training” is always better. On AIP-C01-style scenarios, prefer the least complex option that meets the requirement: prompt template first, RAG for knowledge, fine-tuning for behavior, custom infrastructure only when managed services do not satisfy constraints.
RAG review: retrieval-augmented generation
RAG is one of the most testable generative AI architecture patterns because it combines data engineering, search quality, prompt design, security, and evaluation.
RAG pipeline checklist
- Ingest source data from controlled repositories such as S3 or application data stores.
- Normalize and clean documents: remove noise, preserve titles, sections, timestamps, and access metadata.
- Chunk content into retrieval-sized units.
- Embed chunks using an embedding model.
- Index vectors and metadata in a vector store.
- Retrieve relevant chunks for a user query.
- Filter by tenant, user entitlement, document type, region, sensitivity, or freshness.
- Construct prompt with instructions, user question, retrieved context, and output requirements.
- Generate answer using a foundation model.
- Cite sources or return supporting references when required.
- Evaluate retrieval quality, answer quality, safety, cost, and latency.
Chunking and retrieval decisions
| Design choice | Good default thinking | If you choose poorly |
|---|---|---|
| Chunk size | Large enough to preserve meaning, small enough for precise retrieval | Chunks too small lose context; chunks too large dilute relevance and increase tokens |
| Chunk overlap | Use overlap when meaning crosses boundaries | Too much overlap increases storage, retrieval duplication, and cost |
| Metadata | Store source, owner, timestamp, tenant, permissions, document type, and business attributes | You cannot enforce access control or filter results effectively |
| Embedding model | Match the embedding model to language, domain, and retrieval quality needs | Changing embedding models may require re-indexing |
| Hybrid retrieval | Combine keyword and vector search when exact terms and semantic similarity both matter | Pure semantic search may miss exact identifiers, codes, or product names |
| Reranking | Use when top-k retrieval quality is weak or many similar chunks compete | Extra latency and cost may not be justified for simple retrieval |
| Citations | Include source IDs and snippets when auditability matters | Answers may be useful but not trusted or verifiable |
RAG traps
- Security leak trap: retrieving documents before checking user authorization.
- Freshness trap: using stale embeddings after source documents change.
- Context stuffing trap: sending too many retrieved chunks, increasing cost and confusing the model.
- Evaluation trap: checking only final answer quality while ignoring retrieval recall and precision.
- Tenant isolation trap: using one shared vector index without metadata filtering or separate tenant isolation controls.
- Hallucination trap: telling the model to “answer confidently” instead of instructing it to answer only from provided context and say when context is insufficient.
Prompt engineering review
Good prompts reduce ambiguity. Production prompts should be versioned, tested, and treated as application assets.
| Prompt element | Purpose | Example decision |
|---|---|---|
| Role or task instruction | Defines what the model should do | “Classify the support ticket into one of these categories” is stronger than “Analyze this” |
| Context | Provides facts the model may use | In RAG, distinguish retrieved context from user instructions |
| Output schema | Makes responses machine-parseable | Use JSON-like or field-based outputs when the next system consumes the result |
| Constraints | Defines boundaries | “Use only the provided context” or “Do not include legal advice” |
| Examples | Shows desired behavior | Few-shot examples help with classification, formatting, tone, and edge cases |
| Refusal rule | Handles missing or unsafe input | “If the context does not contain the answer, say so” |
| Tool instructions | Defines when and how to call tools | Keep tool inputs structured and validated |
Prompt mistakes to avoid
- Mixing untrusted user text with system instructions without separation.
- Asking the model to reveal hidden reasoning instead of requesting concise, auditable explanations or structured intermediate outputs.
- Depending on prompt wording alone for security-sensitive enforcement.
- Forgetting adversarial inputs such as “ignore previous instructions.”
- Using long prompts with repeated policies that increase cost and may reduce clarity.
- Not regression-testing prompt changes against known examples.
Model inference and application patterns
| Pattern | Best fit | AWS-oriented review point |
|---|---|---|
| Synchronous inference | Short request/response tasks | Keep timeout, latency, and user experience requirements in mind |
| Streaming inference | Chat, drafting, long generated output | Improves perceived responsiveness; still requires error handling mid-stream |
| Asynchronous processing | Long-running jobs, batch generation, document processing | Use queues, Step Functions, events, and durable state |
| Batch inference | Large offline workloads | Optimize for throughput and cost rather than interactive latency |
| Human-in-the-loop review | High-risk, regulated, customer-facing, or low-confidence outputs | Route uncertain or sensitive outputs for review instead of full automation |
| Caching | Repeated prompts, static context, deterministic queries | Cache carefully when responses depend on user permissions or freshness |
Agents and tool use
Agents are useful when a model must reason about which tool to call, gather information, or perform a sequence of actions. They are not a replacement for deterministic workflow design.
| Scenario cue | Prefer | Why |
|---|---|---|
| “The application must decide which backend API to call based on user intent” | Bedrock Agents or tool-calling pattern | The model can map natural language intent to tool selection |
| “The workflow has fixed approval, retry, wait, and branching steps” | Step Functions | Deterministic orchestration is easier to audit and operate |
| “The model must update a customer record” | Agent/tool with Lambda plus strict validation and IAM | Keep business action code outside the prompt and enforce permissions |
| “The task is safety-critical or financially impactful” | Human approval plus deterministic controls | Do not let a model independently perform high-risk irreversible actions |
| “The tool accepts free-form user input” | Input schema validation and sanitization | Prevent prompt injection and malformed API calls |
Agent security checklist
- Limit each tool to the minimum action needed.
- Validate tool inputs before execution.
- Use IAM roles with least privilege.
- Separate read-only tools from write or destructive tools.
- Require confirmation or human approval for irreversible actions.
- Log tool calls, inputs, outputs, and request IDs with sensitive data controls.
- Design idempotent actions where retries are possible.
- Do not let retrieved text redefine tool permissions.
Guardrails, safety, and responsible AI
Responsible AI questions often test layered controls. A single prompt instruction is not enough.
| Risk | Practical control |
|---|---|
| Toxic, hateful, or unsafe output | Bedrock Guardrails, content filters, output validation, policy prompts |
| Prompt injection | Instruction hierarchy, input sanitization, context separation, tool allowlists, retrieval filtering |
| Sensitive data exposure | PII detection/redaction, data minimization, encryption, restricted logging |
| Hallucination | RAG grounding, citations, refusal rules, confidence thresholds, evaluation datasets |
| Biased or unfair outputs | Representative test sets, human review, model evaluation, policy constraints |
| Unauthorized access | IAM, tenant-aware metadata filters, application authorization before retrieval |
| Inappropriate business action | Tool validation, approval workflows, scoped IAM, audit trails |
| Low-quality answers | Golden datasets, human evaluation, automated scoring, feedback loops |
Defense-in-depth pattern
For sensitive generative AI applications, think in layers:
- Before the model: authenticate user, authorize data access, sanitize input, classify sensitivity.
- During retrieval: filter by permissions, freshness, tenant, and source.
- During generation: use clear system instructions, guardrails, constrained context, and structured output.
- After generation: validate output, redact sensitive data, check policy, route to human review if needed.
- In operations: log safely, monitor drift, evaluate regularly, and maintain rollback options.
Security review for AIP-C01 candidates
| Area | What to remember | Candidate trap |
|---|---|---|
| IAM | Grant only required actions such as model invocation, S3 reads, vector store access, and Lambda execution | Giving application roles broad administrator permissions |
| Encryption | Use KMS-managed encryption for stored data where appropriate | Encrypting S3 but forgetting vector indexes, logs, or temporary stores |
| Secrets | Store API keys and credentials in Secrets Manager or controlled AWS mechanisms | Hardcoding secrets in Lambda environment variables, prompts, or container images |
| Network path | Use private connectivity patterns where required, such as VPC endpoints supported by the service architecture | Assuming public internet access is acceptable for sensitive workloads |
| Audit | Use CloudTrail and application logs for model calls, data access, and tool execution | Logging everything without considering sensitive prompt/response content |
| Data retention | Define how long prompts, responses, embeddings, and source documents are retained | Keeping raw user input indefinitely by default |
| Multi-tenant isolation | Use tenant-aware authorization, metadata filtering, separate indexes, or separate accounts where needed | Relying on the model to “not reveal” unauthorized data |
| Least data | Send only the context needed for the task | Sending full documents or unnecessary PII to the model |
Data lifecycle and privacy
Generative AI applications create and transform data in several places: source documents, chunks, embeddings, prompts, responses, logs, evaluation datasets, and feedback records.
High-yield review points:
- Embeddings are derived data. Treat them according to the sensitivity of the source content and your governance requirements.
- Logs can become a data leak. Redact or suppress sensitive prompts, retrieved context, responses, and tool outputs when needed.
- Access control must happen before generation. Do not retrieve unauthorized context and hope the model ignores it.
- Data freshness matters. Re-index or update embeddings when source documents change.
- Deletion workflows matter. If source content must be removed, consider dependent chunks, embeddings, caches, and evaluation copies.
- Training data quality controls model behavior. Duplicates, label noise, sensitive data, and unrepresentative examples can degrade results.
Evaluation and testing
A professional-level generative AI scenario usually requires both software testing and model-output evaluation.
| Evaluation target | Useful measures | What it tells you |
|---|---|---|
| Retrieval | Recall, precision, top-k relevance, metadata filter correctness | Whether the right context reaches the model |
| Groundedness | Faithfulness to retrieved context, citation accuracy | Whether the answer is supported by sources |
| Task success | Classification accuracy, extraction correctness, rubric score | Whether the model solves the business task |
| Safety | Toxicity, policy violations, sensitive data exposure | Whether outputs meet safety requirements |
| Robustness | Adversarial prompts, prompt injection tests, edge cases | Whether behavior holds under attack or unusual input |
| Latency | End-to-end response time, retrieval time, model time | Whether user experience targets are met |
| Cost | Tokens per request, model choice, retrieval cost, provisioned capacity | Whether the design is economically sustainable |
| User experience | Human ratings, thumbs up/down, escalation rate | Whether real users find the output useful |
Evaluation workflow
- Build a golden dataset of representative prompts, expected answers, unacceptable answers, and edge cases.
- Test retrieval separately from generation.
- Test prompt templates and model choices against the same dataset.
- Include safety and prompt injection cases, not just happy paths.
- Track results over time so prompt, model, data, and retrieval changes do not silently regress.
- Use human review for subjective tasks and high-impact decisions.
Candidate mistake: choosing a model based only on a demo response. For exam scenarios, prefer repeatable evaluation with representative data, measurable criteria, and deployment controls.
Deployment and LLMOps review
| Concern | Good practice |
|---|---|
| Environment separation | Use development, test, staging, and production environments |
| Infrastructure | Define resources with infrastructure as code such as AWS CDK or CloudFormation |
| Prompt management | Version prompts, templates, examples, and output schemas |
| Model management | Record model ID, configuration, parameters, and deployment mode |
| Retrieval management | Version chunking strategy, embedding model, index schema, and metadata filters |
| Guardrail management | Version guardrail policies and test them before production release |
| Release strategy | Use canary, blue/green, or staged rollout when behavior risk is material |
| Rollback | Keep known-good prompt, model, retrieval, and guardrail configurations |
| Observability | Track quality, safety, latency, errors, throttling, and cost |
| Incident response | Preserve audit trails, disable risky tools, and fall back to safe responses |
Performance and cost controls
| Lever | Effect | Review note |
|---|---|---|
| Model choice | Larger or more capable models usually cost more and may add latency | Match model capability to task complexity |
| Prompt length | Longer prompts consume more input tokens | Remove redundant instructions and irrelevant context |
| Retrieved context size | More chunks increase token cost and may reduce answer focus | Tune top-k, chunk size, reranking, and filters |
| Output length | Long responses increase output tokens | Set concise output requirements where appropriate |
| Caching | Reduces repeated computation | Be careful with user-specific or permission-specific responses |
| Streaming | Improves perceived latency | Does not necessarily reduce total cost |
| Asynchronous processing | Improves reliability for long work | Good for document processing and batch generation |
| Provisioned capacity | Useful for predictable high-throughput or low-latency workloads | Do not choose it automatically for sporadic usage |
| Serverless design | Scales with demand and reduces idle cost | Watch timeouts, cold starts, and service quotas |
| Batch jobs | Efficient for offline workloads | Not suitable for interactive user response needs |
Common scenario patterns
Enterprise document assistant
Best-fit pattern:
- S3 or enterprise repository as source.
- Ingestion and chunking pipeline.
- Embeddings and vector index.
- Metadata filters for tenant, department, classification, and document permissions.
- RAG prompt that instructs the model to answer only from retrieved context.
- Citations returned to the user.
- Guardrails and sensitive data controls.
- Evaluation for retrieval relevance, groundedness, and hallucination.
Common wrong answer: fine-tune a model on all company documents so it “knows” internal policies. RAG is usually better for current, permissioned, source-backed knowledge.
Customer support assistant
Best-fit pattern:
- RAG over product docs, ticket history, and approved macros.
- Output template for response draft, confidence, sources, and escalation reason.
- Human review for low confidence, high-value accounts, or sensitive categories.
- Guardrails for unsafe advice and tone.
- Feedback loop from agent edits into evaluation data.
Common wrong answer: fully automate all responses without escalation or monitoring.
Structured extraction from documents
Best-fit pattern:
- Preprocess documents.
- Use prompt template with required fields and output schema.
- Validate output against schema and business rules.
- Send failures to retry or human review.
- Evaluate field-level accuracy.
Common wrong answer: accept free-form generated text when downstream systems require structured data.
Natural-language operations assistant
Best-fit pattern:
- Agent or tool-calling interface.
- Read-only tools for search and diagnostics.
- Write tools separated and restricted.
- Step Functions or approval workflow for changes.
- Full audit logging.
Common wrong answer: allow the model to execute broad infrastructure changes directly.
Exam wording cues and likely decisions
| If the scenario emphasizes | Think first |
|---|---|
| “Private documents,” “latest policies,” “citations,” or “source-backed answers” | RAG with secure retrieval and metadata filtering |
| “Consistent output format” or “machine-readable result” | Prompt template plus schema validation |
| “Changing business knowledge” | Update external knowledge base, not fine-tune for facts |
| “Need to reduce hallucinations” | Grounding, citations, refusal rules, evaluation, and guardrails |
| “User should not see unauthorized documents” | Authorization before retrieval plus tenant/document filters |
| “Model must call APIs” | Agent/tool use with scoped permissions and input validation |
| “Long-running multi-step workflow” | Step Functions, queues, events, and durable orchestration |
| “High-risk or irreversible action” | Human approval and deterministic controls |
| “Need lower latency for chat” | Streaming, smaller model if adequate, optimized retrieval, caching |
| “Need lower cost” | Reduce tokens, right-size model, cache, tune retrieval, batch where possible |
| “Need repeatable deployments” | Infrastructure as code, prompt/model versioning, staged releases |
| “Need to detect regressions” | Golden datasets, automated evaluation, monitoring, rollback |
Common candidate mistakes
- Choosing model customization before considering prompt engineering or RAG.
- Ignoring IAM and data authorization in RAG designs.
- Treating guardrails as a complete security boundary instead of one layer.
- Forgetting that embeddings, logs, cached responses, and evaluation records may contain sensitive information.
- Selecting an agent for a deterministic workflow that should be implemented with Step Functions.
- Failing to validate tool inputs and outputs.
- Monitoring only errors and latency, not answer quality, grounding, safety, and cost.
- Sending too much context to the model and increasing hallucination risk.
- Not accounting for freshness when documents change.
- Using one evaluation prompt instead of a representative test set.
- Overlooking throttling, quotas, retries, dead-letter queues, and backpressure in production designs.
- Assuming a human-like response means the answer is correct.
Quick final checklist
Before practice questions, make sure you can quickly answer:
- When should you use RAG instead of fine-tuning?
- How do chunk size, overlap, metadata, and top-k affect retrieval quality?
- How do you prevent one tenant from retrieving another tenant’s data?
- What controls reduce prompt injection risk?
- When is an agent appropriate, and when is Step Functions better?
- What should be logged, and what should be redacted?
- How do you evaluate groundedness, retrieval quality, and safety?
- Which cost levers reduce token usage?
- How do you deploy prompt and model changes safely?
- How do guardrails, IAM, KMS, network controls, and audit logs work together?
- How would you design rollback for a bad prompt or retrieval change?
- What metrics prove the application is improving rather than just running?
Practice connection
Use this Quick Review as a checklist, then move into AIP-C01 topic drills and original practice questions. For each missed question, identify the decision point: RAG vs fine-tuning, managed vs custom, agent vs workflow, prompt vs guardrail, synchronous vs asynchronous, or quality vs cost. Detailed explanations are most useful when you map them back to these architecture choices and then retest with a focused question bank.
Continue in IT Mastery
Use this Quick Review as a final concept map, then move into IT Mastery for focused topic drills, mixed practice sets, timed mock exams, and detailed explanations. The practice questions are original IT Mastery practice items; they are not official AWS questions, copied live-exam content, or exam dumps.