AIP-C01 — AWS Certified Generative AI Developer – Professional Quick Review

High-yield Quick Review for AWS Certified Generative AI Developer – Professional (AIP-C01): Bedrock, RAG, agents, security, evaluation, deployment, and cost.

Quick Review purpose

This Quick Review is for candidates preparing for the AWS Certified Generative AI Developer – Professional (AIP-C01) exam from AWS. Use it to refresh high-yield concepts before moving into IT Mastery practice, topic drills, mock exams, original practice questions, and detailed explanations.

Focus less on memorizing service names in isolation and more on choosing the best architecture for a scenario: secure generative AI application design, retrieval-augmented generation, agentic workflows, prompt engineering, evaluation, observability, cost control, and responsible AI controls.

High-yield AWS generative AI architecture map

RequirementAWS services or patterns to recognizeFast decision ruleCommon trap
Use a foundation model without managing infrastructureAmazon BedrockPrefer managed model access when the scenario emphasizes speed, low operations, or multiple model choicesChoosing custom training when prompt engineering or RAG would solve the problem
Build an API around model inferenceAmazon API Gateway, AWS Lambda, Amazon ECS, Amazon EKS, AWS Step FunctionsUse serverless for variable event-driven workloads; use containers for custom runtime or long-running workloadsPutting all orchestration inside one Lambda when retries, branching, or state are needed
Stream responses to usersBedrock streaming invocation, application streaming over WebSocket or HTTP streaming patternsUse streaming to improve perceived latency for chat and long responsesConfusing streaming with lower total model processing cost
Ground answers in private documentsRAG with Amazon Bedrock Knowledge Bases, Amazon S3, vector stores such as Amazon OpenSearch Serverless or database vector featuresUse RAG for frequently changing or source-backed knowledgeFine-tuning a model to memorize changing business documents
Execute business actions from natural languageAmazon Bedrock Agents, Lambda action groups, Step Functions, API integrationsUse agents when the model must choose tools or plan steps; use Step Functions for deterministic business workflowsGiving an agent broad permissions or unvalidated tool inputs
Customize model behaviorPrompt templates, RAG, Bedrock model customization, Amazon SageMakerStart with prompts and retrieval; customize only when behavior or domain adaptation requires itTreating fine-tuning as a substitute for clean data, retrieval, or guardrails
Protect sensitive dataIAM, AWS KMS, Secrets Manager, VPC endpoints, CloudTrail, S3 controls, data redactionApply least privilege, encryption, network controls, audit logging, and data minimizationLogging prompts/responses containing PII without redaction or retention controls
Monitor production behaviorAmazon CloudWatch, AWS CloudTrail, AWS X-Ray where applicable, application metrics, evaluation datasetsMonitor latency, errors, token usage, grounding, safety, and user feedbackMonitoring only infrastructure metrics and ignoring response quality
Deploy repeatablyAWS CDK, AWS CloudFormation, CodePipeline, CodeBuild, staged environmentsVersion prompts, model choices, retrieval config, and guardrails like application artifactsChanging prompts manually in production without rollback

Core decision: prompt engineering, RAG, fine-tuning, or custom model?

Choose thisWhen the scenario saysStrengthWatch out for
Prompt engineeringNeed better formatting, tone, role instructions, examples, or task decompositionFast, cheap, reversiblePrompt-only solutions do not reliably add private or current facts
RAGNeed answers from private, current, auditable, or source-cited documentsKeeps knowledge external and updateablePoor chunking, missing metadata filters, or low retrieval quality can cause hallucinations
Fine-tuning or model customizationNeed consistent style, domain-specific patterns, classification behavior, or specialized task performanceCan improve behavior on repeated task typesNot ideal for rapidly changing facts; requires training data quality and evaluation
Continued pretraining or deeper customizationNeed broad domain language adaptation and have substantial domain corpusMay improve domain fluencyHigher cost, complexity, and governance burden
Custom model on SageMaker or containerized infrastructureNeed full control over model, runtime, dependencies, or specialized deploymentMaximum flexibilityMore operations, scaling, security, and lifecycle responsibility

Candidate mistake: assuming “more training” is always better. On AIP-C01-style scenarios, prefer the least complex option that meets the requirement: prompt template first, RAG for knowledge, fine-tuning for behavior, custom infrastructure only when managed services do not satisfy constraints.

RAG review: retrieval-augmented generation

RAG is one of the most testable generative AI architecture patterns because it combines data engineering, search quality, prompt design, security, and evaluation.

RAG pipeline checklist

  1. Ingest source data from controlled repositories such as S3 or application data stores.
  2. Normalize and clean documents: remove noise, preserve titles, sections, timestamps, and access metadata.
  3. Chunk content into retrieval-sized units.
  4. Embed chunks using an embedding model.
  5. Index vectors and metadata in a vector store.
  6. Retrieve relevant chunks for a user query.
  7. Filter by tenant, user entitlement, document type, region, sensitivity, or freshness.
  8. Construct prompt with instructions, user question, retrieved context, and output requirements.
  9. Generate answer using a foundation model.
  10. Cite sources or return supporting references when required.
  11. Evaluate retrieval quality, answer quality, safety, cost, and latency.

Chunking and retrieval decisions

Design choiceGood default thinkingIf you choose poorly
Chunk sizeLarge enough to preserve meaning, small enough for precise retrievalChunks too small lose context; chunks too large dilute relevance and increase tokens
Chunk overlapUse overlap when meaning crosses boundariesToo much overlap increases storage, retrieval duplication, and cost
MetadataStore source, owner, timestamp, tenant, permissions, document type, and business attributesYou cannot enforce access control or filter results effectively
Embedding modelMatch the embedding model to language, domain, and retrieval quality needsChanging embedding models may require re-indexing
Hybrid retrievalCombine keyword and vector search when exact terms and semantic similarity both matterPure semantic search may miss exact identifiers, codes, or product names
RerankingUse when top-k retrieval quality is weak or many similar chunks competeExtra latency and cost may not be justified for simple retrieval
CitationsInclude source IDs and snippets when auditability mattersAnswers may be useful but not trusted or verifiable

RAG traps

  • Security leak trap: retrieving documents before checking user authorization.
  • Freshness trap: using stale embeddings after source documents change.
  • Context stuffing trap: sending too many retrieved chunks, increasing cost and confusing the model.
  • Evaluation trap: checking only final answer quality while ignoring retrieval recall and precision.
  • Tenant isolation trap: using one shared vector index without metadata filtering or separate tenant isolation controls.
  • Hallucination trap: telling the model to “answer confidently” instead of instructing it to answer only from provided context and say when context is insufficient.

Prompt engineering review

Good prompts reduce ambiguity. Production prompts should be versioned, tested, and treated as application assets.

Prompt elementPurposeExample decision
Role or task instructionDefines what the model should do“Classify the support ticket into one of these categories” is stronger than “Analyze this”
ContextProvides facts the model may useIn RAG, distinguish retrieved context from user instructions
Output schemaMakes responses machine-parseableUse JSON-like or field-based outputs when the next system consumes the result
ConstraintsDefines boundaries“Use only the provided context” or “Do not include legal advice”
ExamplesShows desired behaviorFew-shot examples help with classification, formatting, tone, and edge cases
Refusal ruleHandles missing or unsafe input“If the context does not contain the answer, say so”
Tool instructionsDefines when and how to call toolsKeep tool inputs structured and validated

Prompt mistakes to avoid

  • Mixing untrusted user text with system instructions without separation.
  • Asking the model to reveal hidden reasoning instead of requesting concise, auditable explanations or structured intermediate outputs.
  • Depending on prompt wording alone for security-sensitive enforcement.
  • Forgetting adversarial inputs such as “ignore previous instructions.”
  • Using long prompts with repeated policies that increase cost and may reduce clarity.
  • Not regression-testing prompt changes against known examples.

Model inference and application patterns

PatternBest fitAWS-oriented review point
Synchronous inferenceShort request/response tasksKeep timeout, latency, and user experience requirements in mind
Streaming inferenceChat, drafting, long generated outputImproves perceived responsiveness; still requires error handling mid-stream
Asynchronous processingLong-running jobs, batch generation, document processingUse queues, Step Functions, events, and durable state
Batch inferenceLarge offline workloadsOptimize for throughput and cost rather than interactive latency
Human-in-the-loop reviewHigh-risk, regulated, customer-facing, or low-confidence outputsRoute uncertain or sensitive outputs for review instead of full automation
CachingRepeated prompts, static context, deterministic queriesCache carefully when responses depend on user permissions or freshness

Agents and tool use

Agents are useful when a model must reason about which tool to call, gather information, or perform a sequence of actions. They are not a replacement for deterministic workflow design.

Scenario cuePreferWhy
“The application must decide which backend API to call based on user intent”Bedrock Agents or tool-calling patternThe model can map natural language intent to tool selection
“The workflow has fixed approval, retry, wait, and branching steps”Step FunctionsDeterministic orchestration is easier to audit and operate
“The model must update a customer record”Agent/tool with Lambda plus strict validation and IAMKeep business action code outside the prompt and enforce permissions
“The task is safety-critical or financially impactful”Human approval plus deterministic controlsDo not let a model independently perform high-risk irreversible actions
“The tool accepts free-form user input”Input schema validation and sanitizationPrevent prompt injection and malformed API calls

Agent security checklist

  • Limit each tool to the minimum action needed.
  • Validate tool inputs before execution.
  • Use IAM roles with least privilege.
  • Separate read-only tools from write or destructive tools.
  • Require confirmation or human approval for irreversible actions.
  • Log tool calls, inputs, outputs, and request IDs with sensitive data controls.
  • Design idempotent actions where retries are possible.
  • Do not let retrieved text redefine tool permissions.

Guardrails, safety, and responsible AI

Responsible AI questions often test layered controls. A single prompt instruction is not enough.

RiskPractical control
Toxic, hateful, or unsafe outputBedrock Guardrails, content filters, output validation, policy prompts
Prompt injectionInstruction hierarchy, input sanitization, context separation, tool allowlists, retrieval filtering
Sensitive data exposurePII detection/redaction, data minimization, encryption, restricted logging
HallucinationRAG grounding, citations, refusal rules, confidence thresholds, evaluation datasets
Biased or unfair outputsRepresentative test sets, human review, model evaluation, policy constraints
Unauthorized accessIAM, tenant-aware metadata filters, application authorization before retrieval
Inappropriate business actionTool validation, approval workflows, scoped IAM, audit trails
Low-quality answersGolden datasets, human evaluation, automated scoring, feedback loops

Defense-in-depth pattern

For sensitive generative AI applications, think in layers:

  1. Before the model: authenticate user, authorize data access, sanitize input, classify sensitivity.
  2. During retrieval: filter by permissions, freshness, tenant, and source.
  3. During generation: use clear system instructions, guardrails, constrained context, and structured output.
  4. After generation: validate output, redact sensitive data, check policy, route to human review if needed.
  5. In operations: log safely, monitor drift, evaluate regularly, and maintain rollback options.

Security review for AIP-C01 candidates

AreaWhat to rememberCandidate trap
IAMGrant only required actions such as model invocation, S3 reads, vector store access, and Lambda executionGiving application roles broad administrator permissions
EncryptionUse KMS-managed encryption for stored data where appropriateEncrypting S3 but forgetting vector indexes, logs, or temporary stores
SecretsStore API keys and credentials in Secrets Manager or controlled AWS mechanismsHardcoding secrets in Lambda environment variables, prompts, or container images
Network pathUse private connectivity patterns where required, such as VPC endpoints supported by the service architectureAssuming public internet access is acceptable for sensitive workloads
AuditUse CloudTrail and application logs for model calls, data access, and tool executionLogging everything without considering sensitive prompt/response content
Data retentionDefine how long prompts, responses, embeddings, and source documents are retainedKeeping raw user input indefinitely by default
Multi-tenant isolationUse tenant-aware authorization, metadata filtering, separate indexes, or separate accounts where neededRelying on the model to “not reveal” unauthorized data
Least dataSend only the context needed for the taskSending full documents or unnecessary PII to the model

Data lifecycle and privacy

Generative AI applications create and transform data in several places: source documents, chunks, embeddings, prompts, responses, logs, evaluation datasets, and feedback records.

High-yield review points:

  • Embeddings are derived data. Treat them according to the sensitivity of the source content and your governance requirements.
  • Logs can become a data leak. Redact or suppress sensitive prompts, retrieved context, responses, and tool outputs when needed.
  • Access control must happen before generation. Do not retrieve unauthorized context and hope the model ignores it.
  • Data freshness matters. Re-index or update embeddings when source documents change.
  • Deletion workflows matter. If source content must be removed, consider dependent chunks, embeddings, caches, and evaluation copies.
  • Training data quality controls model behavior. Duplicates, label noise, sensitive data, and unrepresentative examples can degrade results.

Evaluation and testing

A professional-level generative AI scenario usually requires both software testing and model-output evaluation.

Evaluation targetUseful measuresWhat it tells you
RetrievalRecall, precision, top-k relevance, metadata filter correctnessWhether the right context reaches the model
GroundednessFaithfulness to retrieved context, citation accuracyWhether the answer is supported by sources
Task successClassification accuracy, extraction correctness, rubric scoreWhether the model solves the business task
SafetyToxicity, policy violations, sensitive data exposureWhether outputs meet safety requirements
RobustnessAdversarial prompts, prompt injection tests, edge casesWhether behavior holds under attack or unusual input
LatencyEnd-to-end response time, retrieval time, model timeWhether user experience targets are met
CostTokens per request, model choice, retrieval cost, provisioned capacityWhether the design is economically sustainable
User experienceHuman ratings, thumbs up/down, escalation rateWhether real users find the output useful

Evaluation workflow

  1. Build a golden dataset of representative prompts, expected answers, unacceptable answers, and edge cases.
  2. Test retrieval separately from generation.
  3. Test prompt templates and model choices against the same dataset.
  4. Include safety and prompt injection cases, not just happy paths.
  5. Track results over time so prompt, model, data, and retrieval changes do not silently regress.
  6. Use human review for subjective tasks and high-impact decisions.

Candidate mistake: choosing a model based only on a demo response. For exam scenarios, prefer repeatable evaluation with representative data, measurable criteria, and deployment controls.

Deployment and LLMOps review

ConcernGood practice
Environment separationUse development, test, staging, and production environments
InfrastructureDefine resources with infrastructure as code such as AWS CDK or CloudFormation
Prompt managementVersion prompts, templates, examples, and output schemas
Model managementRecord model ID, configuration, parameters, and deployment mode
Retrieval managementVersion chunking strategy, embedding model, index schema, and metadata filters
Guardrail managementVersion guardrail policies and test them before production release
Release strategyUse canary, blue/green, or staged rollout when behavior risk is material
RollbackKeep known-good prompt, model, retrieval, and guardrail configurations
ObservabilityTrack quality, safety, latency, errors, throttling, and cost
Incident responsePreserve audit trails, disable risky tools, and fall back to safe responses

Performance and cost controls

LeverEffectReview note
Model choiceLarger or more capable models usually cost more and may add latencyMatch model capability to task complexity
Prompt lengthLonger prompts consume more input tokensRemove redundant instructions and irrelevant context
Retrieved context sizeMore chunks increase token cost and may reduce answer focusTune top-k, chunk size, reranking, and filters
Output lengthLong responses increase output tokensSet concise output requirements where appropriate
CachingReduces repeated computationBe careful with user-specific or permission-specific responses
StreamingImproves perceived latencyDoes not necessarily reduce total cost
Asynchronous processingImproves reliability for long workGood for document processing and batch generation
Provisioned capacityUseful for predictable high-throughput or low-latency workloadsDo not choose it automatically for sporadic usage
Serverless designScales with demand and reduces idle costWatch timeouts, cold starts, and service quotas
Batch jobsEfficient for offline workloadsNot suitable for interactive user response needs

Common scenario patterns

Enterprise document assistant

Best-fit pattern:

  • S3 or enterprise repository as source.
  • Ingestion and chunking pipeline.
  • Embeddings and vector index.
  • Metadata filters for tenant, department, classification, and document permissions.
  • RAG prompt that instructs the model to answer only from retrieved context.
  • Citations returned to the user.
  • Guardrails and sensitive data controls.
  • Evaluation for retrieval relevance, groundedness, and hallucination.

Common wrong answer: fine-tune a model on all company documents so it “knows” internal policies. RAG is usually better for current, permissioned, source-backed knowledge.

Customer support assistant

Best-fit pattern:

  • RAG over product docs, ticket history, and approved macros.
  • Output template for response draft, confidence, sources, and escalation reason.
  • Human review for low confidence, high-value accounts, or sensitive categories.
  • Guardrails for unsafe advice and tone.
  • Feedback loop from agent edits into evaluation data.

Common wrong answer: fully automate all responses without escalation or monitoring.

Structured extraction from documents

Best-fit pattern:

  • Preprocess documents.
  • Use prompt template with required fields and output schema.
  • Validate output against schema and business rules.
  • Send failures to retry or human review.
  • Evaluate field-level accuracy.

Common wrong answer: accept free-form generated text when downstream systems require structured data.

Natural-language operations assistant

Best-fit pattern:

  • Agent or tool-calling interface.
  • Read-only tools for search and diagnostics.
  • Write tools separated and restricted.
  • Step Functions or approval workflow for changes.
  • Full audit logging.

Common wrong answer: allow the model to execute broad infrastructure changes directly.

Exam wording cues and likely decisions

If the scenario emphasizesThink first
“Private documents,” “latest policies,” “citations,” or “source-backed answers”RAG with secure retrieval and metadata filtering
“Consistent output format” or “machine-readable result”Prompt template plus schema validation
“Changing business knowledge”Update external knowledge base, not fine-tune for facts
“Need to reduce hallucinations”Grounding, citations, refusal rules, evaluation, and guardrails
“User should not see unauthorized documents”Authorization before retrieval plus tenant/document filters
“Model must call APIs”Agent/tool use with scoped permissions and input validation
“Long-running multi-step workflow”Step Functions, queues, events, and durable orchestration
“High-risk or irreversible action”Human approval and deterministic controls
“Need lower latency for chat”Streaming, smaller model if adequate, optimized retrieval, caching
“Need lower cost”Reduce tokens, right-size model, cache, tune retrieval, batch where possible
“Need repeatable deployments”Infrastructure as code, prompt/model versioning, staged releases
“Need to detect regressions”Golden datasets, automated evaluation, monitoring, rollback

Common candidate mistakes

  • Choosing model customization before considering prompt engineering or RAG.
  • Ignoring IAM and data authorization in RAG designs.
  • Treating guardrails as a complete security boundary instead of one layer.
  • Forgetting that embeddings, logs, cached responses, and evaluation records may contain sensitive information.
  • Selecting an agent for a deterministic workflow that should be implemented with Step Functions.
  • Failing to validate tool inputs and outputs.
  • Monitoring only errors and latency, not answer quality, grounding, safety, and cost.
  • Sending too much context to the model and increasing hallucination risk.
  • Not accounting for freshness when documents change.
  • Using one evaluation prompt instead of a representative test set.
  • Overlooking throttling, quotas, retries, dead-letter queues, and backpressure in production designs.
  • Assuming a human-like response means the answer is correct.

Quick final checklist

Before practice questions, make sure you can quickly answer:

  • When should you use RAG instead of fine-tuning?
  • How do chunk size, overlap, metadata, and top-k affect retrieval quality?
  • How do you prevent one tenant from retrieving another tenant’s data?
  • What controls reduce prompt injection risk?
  • When is an agent appropriate, and when is Step Functions better?
  • What should be logged, and what should be redacted?
  • How do you evaluate groundedness, retrieval quality, and safety?
  • Which cost levers reduce token usage?
  • How do you deploy prompt and model changes safely?
  • How do guardrails, IAM, KMS, network controls, and audit logs work together?
  • How would you design rollback for a bad prompt or retrieval change?
  • What metrics prove the application is improving rather than just running?

Practice connection

Use this Quick Review as a checklist, then move into AIP-C01 topic drills and original practice questions. For each missed question, identify the decision point: RAG vs fine-tuning, managed vs custom, agent vs workflow, prompt vs guardrail, synchronous vs asynchronous, or quality vs cost. Detailed explanations are most useful when you map them back to these architecture choices and then retest with a focused question bank.

Continue in IT Mastery

Use this Quick Review as a final concept map, then move into IT Mastery for focused topic drills, mixed practice sets, timed mock exams, and detailed explanations. The practice questions are original IT Mastery practice items; they are not official AWS questions, copied live-exam content, or exam dumps.

Browse Certification Practice Tests by Exam Family