Try 75 free AWS AIP-C01 questions across the exam domains, with explanations, then continue with full IT Mastery practice.
This free full-length AWS AIP-C01 practice exam includes 75 original IT Mastery questions across the exam domains.
These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.
Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.
Need concept review first? Read the AWS AIP-C01 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks and full IT Mastery practice.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
| Domain | Weight |
|---|---|
| Foundation Model Integration, Data Management, and Compliance | 31% |
| Implementation and Integration | 26% |
| AI Safety, Security, and Governance | 20% |
| Operational Efficiency and Optimization for GenAI Applications | 12% |
| Testing, Validation, and Troubleshooting | 11% |
Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.
Topic: Testing, Validation, and Troubleshooting
A company runs a customer-support RAG API on AWS Lambda that calls Amazon Bedrock Converse. After a release, long customer conversations intermittently fail before streaming begins.
Exhibit: Lambda error log
ValidationException: input is too long for the requested model
retrievedChunks: 12
chatHistory: full conversation transcript
systemPrompt: 9 KB policy text
The approved model and source attribution requirements cannot change. Which change fixes the root cause with the smallest safe impact?
Options:
A. Increase Lambda memory and timeout for the RAG function.
B. Retry the Bedrock request with exponential backoff and jitter.
C. Switch to a larger-context model and skip prompt changes.
D. Add token-aware prompt assembly with history summaries and citation-preserving truncation.
Best answer: D
Explanation: The symptom is an Amazon Bedrock validation error before generation starts. The root cause is that the assembled prompt exceeds the model context window, so the smallest safe fix is token-aware prompt construction that preserves required citations while reducing input size.
This is a context window overflow, not a compute, timeout, or throttling issue. The prompt builder sends the full conversation transcript, many retrieved chunks, and a large system prompt, so the assembled input eventually exceeds the approved model's limit. A safe fix is to set a token budget before calling Bedrock: preserve the system instructions, summarize older conversation turns, include only the most relevant retrieved content, and truncate chunk text while retaining the source IDs needed for attribution. This changes prompt assembly without changing the model or weakening auditability. A larger-context model might mask the issue, but it violates the stated approval constraint and does not address unbounded prompt growth.
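The token-budgeted assembly described above can be sketched as follows. This is a minimal sketch, not the exam's reference implementation: the chars-per-token estimate, the chunk field names (`source`, `text`), and the 50% history reserve are all illustrative assumptions; production code should use the model's tokenizer or Bedrock-reported token counts.

```python
def estimate_tokens(text: str) -> int:
    # Crude chars/4 heuristic (assumption); replace with a real tokenizer.
    return max(1, len(text) // 4)

def build_prompt(system: str, history: list[str], chunks: list[dict],
                 budget: int) -> dict:
    """Fit system prompt, recent history, and top chunks into a token budget."""
    used = estimate_tokens(system)
    # Keep the most recent turns verbatim; older turns would be summarized.
    kept_history = []
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if used + cost > budget * 0.5:   # reserve half the budget for chunks
            break
        kept_history.insert(0, turn)
        used += cost
    # Include chunks in relevance order, truncating text but keeping source IDs
    # so citation requirements are preserved.
    kept_chunks = []
    for chunk in chunks:
        cost = estimate_tokens(chunk["text"])
        if used + cost > budget:
            snippet = chunk["text"][: (budget - used) * 4]
            if snippet:
                kept_chunks.append({"source": chunk["source"], "text": snippet})
            break
        kept_chunks.append(chunk)
        used += cost
    return {"system": system, "history": kept_history, "chunks": kept_chunks}
```

Because trimming happens before the Bedrock call, the approved model and the attribution requirement are untouched.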
Topic: Foundation Model Integration, Data Management, and Compliance
A financial services company is building a GenAI gateway for internal assistants. The gateway must choose different foundation models for chat, summarization, and code review by tenant and data classification; switch providers or model versions without redeploying application code; keep prompts and retrieval data in the approved AWS Region; and support governed canary rollout with rollback if latency or error metrics degrade. Which architecture best meets these requirements?
Options:
A. Separate Lambda functions with hard-coded model IDs
B. API Gateway stage variables that clients set per request
C. API Gateway with a Lambda router using AWS AppConfig
D. Custom SageMaker training jobs for each tenant model
Best answer: C
Explanation: A GenAI gateway should separate model-selection policy from application code. API Gateway and Lambda provide the runtime routing layer, while AWS AppConfig provides governed dynamic configuration, staged rollout, validation, and rollback for model/provider changes.
The best-fit pattern is a centralized GenAI routing layer: API Gateway receives the request, Lambda evaluates tenant, task, and classification metadata, and AWS AppConfig supplies the current routing rules. Those rules can map use cases to approved Bedrock model IDs, provider choices, regional endpoints, prompt versions, guardrails, retrieval settings, and fallback behavior without changing code. AppConfig deployment strategies, validators, and alarms support canary rollout and rollback when metrics degrade. The Lambda function should invoke approved regional Bedrock endpoints and retrieval resources, and log the selected config version and model for auditability. Hard-coding or client-driven model selection cannot provide the same governance or operational control.
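A minimal sketch of the rule evaluation inside the Lambda router. In production the rules document would be fetched from the AWS AppConfig data plane and cached between invocations; the inline `ROUTING_RULES` document and its field names are illustrative assumptions, and the model IDs are examples only.

```python
# Illustrative routing document; in production this JSON comes from AppConfig.
ROUTING_RULES = {
    "version": "42",
    "routes": [
        {"task": "code-review", "classification": "restricted",
         "modelId": "anthropic.claude-3-sonnet-20240229-v1:0"},
        {"task": "summarization", "classification": "internal",
         "modelId": "amazon.titan-text-express-v1"},
    ],
    "default": {"modelId": "amazon.titan-text-lite-v1"},
}

def select_route(rules: dict, task: str, classification: str) -> dict:
    """Pick a model from config and carry the config version for audit logs."""
    for route in rules["routes"]:
        if route["task"] == task and route["classification"] == classification:
            return {"modelId": route["modelId"],
                    "configVersion": rules["version"]}
    # Fall back to the default model; still record which config decided it.
    return {"modelId": rules["default"]["modelId"],
            "configVersion": rules["version"]}
```

Logging `configVersion` with every invocation is what makes the canary rollout and rollback auditable.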
Topic: Implementation and Integration
An enterprise uses CloudFront and AWS WAF in front of a GenAI gateway built on API Gateway. The gateway uses a Cognito authorizer, a Lambda function, and Amazon Bedrock. IAM allows only the Lambda execution role to call bedrock:InvokeModel. During an audit, CloudTrail shows every model call as the same Lambda role, and API Gateway logs do not include the user, tenant, model, or guardrail decision. The company must not log prompt or response text. Which change fixes the root cause with the smallest safe change?
Options:
A. Enable full prompt and response model invocation logging
B. Grant employees IAM permission to invoke Bedrock directly
C. Emit gateway audit records with user, tenant, model, guardrail, and request IDs
D. Enable AWS X-Ray tracing for the Lambda function
Best answer: C
Explanation: The symptom is missing audit evidence for who used the model through the gateway. CloudTrail records the AWS role that invoked Bedrock, but it does not know the authenticated application user behind the shared Lambda role. The smallest safe fix is structured audit logging at the GenAI gateway.
Symptom: Security can see Bedrock invocations, but cannot prove which employee, tenant, model, or guardrail decision was involved. Root cause: the gateway uses a shared Lambda execution role, and the gateway is not emitting application-level audit records. Fix: write a minimal, KMS-protected, retention-controlled audit event from the gateway that maps the authenticated Cognito user and tenant to the Bedrock request ID, model, and guardrail outcome. This preserves the gateway as the controlled access path and avoids storing sensitive prompt or response text.
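A hedged sketch of such a gateway audit record. Field names are illustrative assumptions; the prompt is hashed rather than stored, and in production the event would be written to a KMS-encrypted, retention-controlled log stream rather than returned.

```python
import hashlib
import json
import time

def build_audit_event(user_sub: str, tenant: str, model_id: str,
                      guardrail_action: str, bedrock_request_id: str,
                      prompt: str) -> str:
    """Build a minimal audit record that excludes prompt and response text.
    A prompt hash (not the text) lets auditors correlate without exposure."""
    event = {
        "ts": int(time.time()),
        "user": user_sub,            # Cognito subject, not free text
        "tenant": tenant,
        "modelId": model_id,
        "guardrailAction": guardrail_action,
        "bedrockRequestId": bedrock_request_id,
        "promptSha256": hashlib.sha256(prompt.encode()).hexdigest(),
    }
    return json.dumps(event)
```

The Bedrock request ID is the join key back to CloudTrail, which is how the shared-role gap gets closed without logging sensitive content.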
Topic: Operational Efficiency and Optimization for GenAI Applications
In an Amazon Bedrock RAG application, which optimization technique reduces latency for repeated or paraphrased user requests while preserving retrieval quality and safety by serving a stored response only when semantic intent, tenant permissions, source freshness, and guardrail policy still match?
Options:
A. Agent memory
B. Prompt management
C. Reranking
D. Semantic caching
Best answer: D
Explanation: Semantic caching is the optimization described in the stem. It can reduce repeated retrieval and generation work, but it must be scoped by authorization, data freshness, and safety policy so speed does not override correctness or governance.
Semantic caching stores prior responses or intermediate results and matches future requests by meaning rather than exact text. In a production RAG system, this must not be a simple nearest-neighbor shortcut. The cache key or validation logic should account for tenant, user permissions, source version or TTL, prompt or guardrail version, and confidence thresholds before returning a cached answer. This preserves retrieval quality and safety while reducing latency for semantically equivalent requests. Reranking improves result relevance, prompt management versions prompts, and agent memory stores conversational or user context; none primarily define safe reuse of prior RAG responses.
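The validation step before serving a cached answer might look like this sketch; the field names and the 0.92 similarity threshold are assumptions, and a real system would also check TTL and confidence.

```python
def cache_entry_valid(entry: dict, request: dict, similarity: float,
                      threshold: float = 0.92) -> bool:
    """Serve a cached answer only when meaning, tenant scope, source
    freshness, and guardrail policy version all still match."""
    return (
        similarity >= threshold
        and entry["tenant"] == request["tenant"]
        and entry["sourceVersion"] == request["sourceVersion"]
        and entry["guardrailVersion"] == request["guardrailVersion"]
    )
```

Any failed check falls through to normal retrieval and generation, so the cache can only speed things up, never widen access.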
Topic: Foundation Model Integration, Data Management, and Compliance
A financial services team uses Amazon Bedrock Knowledge Bases from a Lambda API to answer policy questions. After a prompt update in Amazon Bedrock Prompt Management, CloudWatch Logs show JSONDecodeError in 18% of calls. Retrieval traces show relevant chunks, and Amazon Bedrock Guardrails did not intervene. The downstream parser expects only answer and citations fields.
Exhibit: Current prompt excerpt
Use only retrieved context.
Return JSON with answer and citations.
Begin with a friendly summary for executives.
If the context is incomplete, infer the most likely answer.
Which change fixes the root cause with the smallest safe change?
Options:
A. Publish a regression-tested prompt version with JSON-only schema and no summary/inference instructions.
B. Increase maxTokens and lower temperature for the same prompt.
C. Rebuild the Knowledge Base embeddings and resync the S3 documents.
D. Add a Lambda regex extractor for the JSON substring.
Best answer: A
Explanation: The symptom is parser failure despite good retrieval and no guardrail block. The root cause is a prompt contract problem: it asks for JSON while also asking for prose and permitting unsupported inference. A corrected, regression-tested prompt version is the smallest safe fix.
Symptom: The API fails with JSONDecodeError even though retrieval is relevant and guardrails are not blocking. Root cause: The production prompt has conflicting and weak output instructions: it asks for JSON, asks for a prose summary before JSON, and permits inference when context is incomplete. Fix: Publish a new managed prompt version that requires only the exact JSON object and instructs the model to stay within retrieved context. Validate it with prompt regression tests before promotion. This targets the prompt contract instead of changing retrieval, model capacity, or parser tolerance.
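A prompt regression gate for this output contract can start as simple as the following sketch, run against candidate prompt outputs before a new version is promoted.

```python
import json

REQUIRED_KEYS = {"answer", "citations"}

def passes_contract(model_output: str) -> bool:
    """Regression gate: output must be exactly the required JSON object,
    with no prose prefix and no extra fields."""
    try:
        obj = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(obj) == REQUIRED_KEYS
```

The old prompt would fail this gate whenever the model obeyed the "friendly summary" instruction, which is exactly the 18% failure mode in the stem.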
Topic: Foundation Model Integration, Data Management, and Compliance
A healthcare device company runs a RAG assistant on Amazon Bedrock with an Amazon OpenSearch Serverless vector index. Users ask about exact model numbers and firmware KB IDs. The FM often cites a semantically related but wrong device. Retrieval logs show vector topK=8 returns older-device chunks; separate BM25 keyword tests return the exact device chunks, but passing both result sets directly causes context-window overflow. The team cannot reingest documents this week. Which change fixes the root cause with the smallest safe change?
Options:
A. Route exact-identifier queries to BM25-only retrieval.
B. Increase vector topK and send all chunks to the FM.
C. Lower the FM temperature and add stricter guardrails.
D. Use keyword-plus-vector hybrid retrieval with Bedrock reranking.
Best answer: D
Explanation: The symptom is poor context relevance for exact identifiers, not a generation-quality problem. The root cause is vector-only ranking underweighting model numbers and KB IDs, while a naive union overflows the prompt. Hybrid retrieval plus Bedrock reranking fixes the ranking problem with minimal change.
Symptom: exact device identifiers retrieve semantically similar but wrong chunks, and simply adding keyword results exceeds the context window. Root cause: vector search is good for semantic similarity but can miss or under-rank lexical identifiers; unranked result merging does not decide which chunks deserve prompt space. Fix: query both BM25 keyword and vector search over the existing index, deduplicate and preserve filters, then use an Amazon Bedrock reranker model to score candidates against the user query and pass only the top passages to the FM. Changing generation settings or sending more chunks does not repair retrieval relevance.
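A sketch of the merge-then-rerank step. The pluggable `score_fn` stands in for the Amazon Bedrock reranker call, and the chunk field names are illustrative assumptions.

```python
def hybrid_candidates(bm25_hits: list[dict],
                      vector_hits: list[dict]) -> list[dict]:
    """Union keyword and vector results, deduplicated by chunk id."""
    seen, merged = set(), []
    for hit in bm25_hits + vector_hits:
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged

def top_passages(candidates: list[dict], score_fn, k: int = 4) -> list[dict]:
    """score_fn stands in for a reranker scoring each chunk against the
    user query; only the top-k passages go into the FM prompt."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]
```

Capping the prompt at the reranked top-k is what prevents the context-window overflow that a naive union of both result sets caused.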
Topic: Foundation Model Integration, Data Management, and Compliance
A GenAI application starts returning malformed JSON after a release. Investigation shows several Lambda functions embed slightly different prompt strings, and the team cannot tell which instruction set was tested. Which AWS capability is specifically intended to address this prompt operations problem by managing reusable prompt templates and versions for controlled rollout?
Options:
A. AWS X-Ray tracing
B. Amazon Bedrock Guardrails
C. Amazon Bedrock Prompt Management
D. Amazon Bedrock Knowledge Bases
Best answer: C
Explanation: Amazon Bedrock Prompt Management is the prompt operations capability for storing and versioning prompts as managed assets. It helps prevent template drift when multiple application components need to use the same tested prompt instructions.
Prompt failures after a release often come from uncontrolled prompt copies, missing instructions, or conflicting template changes. Amazon Bedrock Prompt Management lets teams manage prompt templates, variables, model configuration, and versions so applications can reference a known prompt revision. This supports governance and regression workflows because teams can compare candidate prompts against a stable baseline before rollout. Retrieval, safety filtering, and tracing services are useful in GenAI architectures, but they do not solve unmanaged prompt template versioning.
Topic: Foundation Model Integration, Data Management, and Compliance
A company is moving several Amazon Bedrock applications to a shared retrieval layer in front of an Amazon OpenSearch Service vector index. The current API passes caller-supplied filter JSON directly to the index. Review the log excerpt.
Exhibit: Retrieval API log
principal.app = customer-support-bot
principal.tenant = tenant-42
POST /retrieve
body.filter = {"tenant_id":"tenant-99","doc_class":"pricing"}
body.k = 20
result[0].metadata = {"tenant_id":"tenant-99","acl":"sales"}
result[0].score = 0.86
Which next step best addresses the issue while standardizing retrieval access for FM applications?
Options:
A. Add a generation guardrail to redact tenant identifiers
B. Expose direct OpenSearch vector access to each FM application
C. Lower k and require a higher similarity score
D. Enforce server-derived metadata filters in a typed retrieval API
Best answer: D
Explanation: The decisive detail is the mismatch between principal.tenant = tenant-42 and the caller-supplied filter and result for tenant-99. A standardized retrieval API should hide raw vector search details and enforce authorization filters server-side from trusted identity context.
For a safe shared retrieval layer, FM applications should call a narrow, typed retrieval API that accepts inputs such as query text, allowed corpus, and approved retrieval parameters. The API should derive tenant and ACL constraints from IAM, Cognito, or another trusted identity source, then apply those constraints to the vector search request before returning only authorized chunks, scores, and citations. In the exhibit, the client controls body.filter and retrieves tenant-99 content despite authenticating as tenant-42, which is an authorization failure at retrieval time. The key takeaway is to centralize retrieval policy enforcement instead of exposing raw vector database filters to FM applications.
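A sketch of server-derived filter construction; the `doc_class` allow-list and field names mirror the exhibit but are otherwise assumptions about the retrieval API's schema.

```python
def build_search_filter(identity: dict, client_filter: dict) -> dict:
    """Derive authorization filters from trusted identity, never the client.
    Client-supplied tenant/ACL fields are discarded; only allow-listed
    query refinements pass through."""
    allowed_client_keys = {"doc_class"}  # illustrative allow-list
    safe = {k: v for k, v in client_filter.items() if k in allowed_client_keys}
    safe["tenant_id"] = identity["tenant"]  # server-derived, overrides client
    return safe
```

With this in place, the exhibit's request would have searched tenant-42 regardless of the tenant-99 value in body.filter.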
Topic: AI Safety, Security, and Governance
A healthcare company is deploying a GenAI application that uses a domain-adapted foundation model hosted on SageMaker real-time endpoints to draft appeal letters from S3 documents. Auditors require evidence for each approved model version, lineage from source documents through AWS Glue ETL into prompt and retrieval datasets, metadata tags for PHI classification and retention, and per-request decision logs that exclude raw PHI. Which architecture best meets these requirements?
Options:
A. Use CloudTrail audit events and SageMaker endpoint metrics as the compliance record.
B. Create a custom compliance database and retrain the model for each audit cycle.
C. Use SageMaker Model Cards, AWS Glue lineage, required tags, and structured CloudWatch decision logs.
D. Store all prompts, retrieved chunks, and responses in S3 for Athena-based audit queries.
Best answer: C
Explanation: The best design creates audit evidence at model, data, resource, and decision layers. SageMaker Model Cards document approved model versions, AWS Glue lineage connects governed data pipelines, tags make compliance metadata queryable, and CloudWatch decision logs provide per-inference evidence without exposing raw PHI.
The core concept is layered AI governance evidence. In regulated GenAI, model approval documentation alone is not enough; auditors need to trace from the model version and its acceptable use, through the data pipeline and retrieval assets, to the runtime decision. SageMaker Model Cards tied to approved model versions capture intended use, evaluation, risk, and ownership. AWS Glue lineage records connect S3 sources and ETL jobs to curated prompt and retrieval datasets, while required AWS tags make ownership, classification, and retention visible across resources. Structured CloudWatch Logs with correlation IDs, redacted fields, prompt version, retrieved document IDs, guardrail results, and final action create operational decision evidence. CloudTrail is still useful for API audit, but it does not replace per-request application decision logs.
Topic: Foundation Model Integration, Data Management, and Compliance
A team is comparing FMs available through Amazon Bedrock for an internal policy assistant. The application will use RAG to send extracted text passages and recent chat history; it does not need image, audio, or code generation. Public benchmarks favor a larger model, but an internal evaluation set shows a smaller text model meets the grounded-answer rubric with the same retrieved context, has lower p95 latency, and costs less per request. Which model-selection principle best fits this requirement?
Options:
A. Use general benchmark leaderboard selection
B. Use multimodal capability selection
C. Use task-specific cost-performance evaluation
D. Use maximum context window selection
Best answer: C
Explanation: Model selection should be driven by the target workload, not only by broad public benchmarks. In this case, the smaller text model meets the application’s grounded-answer quality bar while improving latency and cost, so it is the better fit.
The core concept is fit-for-purpose FM assessment. Developers should compare candidate models using representative prompts, retrieval context, output rubrics, latency targets, modality needs, and cost constraints. Public benchmarks can help shortlist models, but they do not replace workload-specific evidence. Because the app is text-only and RAG supplies the needed context, selecting a larger or more capable model adds cost and latency without a demonstrated benefit. The key takeaway is to choose the least complex model that meets measured quality and operational requirements.
Topic: Operational Efficiency and Optimization for GenAI Applications
A financial services company is building an internal GenAI assistant on AWS. Most requests are routine policy FAQs or meeting summaries, but about 15% require reasoning over regulated documents with citations. The solution must reduce inference spend, keep routine responses under 2 seconds when possible, use private network paths in one AWS Region, and record the selected model and prompt version for audit. Which architecture best meets these requirements?
Options:
A. Send every request to a lower-cost Bedrock model and ask users to verify citations manually.
B. Send every request to the strongest Bedrock model with Knowledge Bases retrieval enabled.
C. Implement a Bedrock model router with complexity classification, selective Knowledge Bases retrieval, private endpoints, and audit metrics.
D. Train a custom SageMaker model to replace Bedrock and embed all routing rules in client applications.
Best answer: C
Explanation: The best fit is an AWS-native model routing pattern. A lightweight classifier or rules engine can route routine requests to lower-cost models, invoke retrieval only when needed, and escalate complex document-grounded prompts to a stronger model while logging the routing decision and prompt version.
Cost-conscious GenAI routing uses a gateway or orchestration layer, such as API Gateway with Lambda or containerized services, to classify request complexity before invoking Amazon Bedrock. Routine summaries and FAQs can use a lower-cost, lower-latency model. Requests that need multi-document reasoning and citations can invoke Bedrock Knowledge Bases and a stronger model. The router should keep configuration in a controlled service such as AppConfig or IaC, use AWS PrivateLink or VPC endpoints where supported, and publish model choice, prompt version, latency, and token metrics to CloudWatch for audit and optimization. This design reduces unnecessary retrieval and high-capability model calls without sacrificing grounded answers for complex cases.
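The complexity classification could start as simple keyword and length heuristics, as in this sketch; the trigger list, the length threshold, and the model labels are placeholders, and many teams later replace the heuristic with a small classifier model.

```python
# Illustrative triggers for "needs grounded, cited reasoning".
CITATION_TRIGGERS = ("regulation", "cite", "policy document", "clause")

def route_request(prompt: str) -> dict:
    """Heuristic complexity classification ahead of the Bedrock call."""
    needs_citations = any(t in prompt.lower() for t in CITATION_TRIGGERS)
    long_form = len(prompt) > 2000  # arbitrary threshold for illustration
    if needs_citations or long_form:
        # ~15% path: retrieval plus the stronger model.
        return {"modelId": "strong-model", "useKnowledgeBase": True}
    # Routine path: cheaper, faster model, no retrieval round trip.
    return {"modelId": "fast-model", "useKnowledgeBase": False}
```

The routing decision itself, along with prompt version and token counts, is what gets published to CloudWatch for the audit requirement.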
Topic: Operational Efficiency and Optimization for GenAI Applications
A SaaS company uses Amazon Bedrock to generate dashboard summaries. The same tenant-date summary is valid for all users until the nightly data refresh. Leadership wants p95 latency under 1 second and lower FM spend.
Exhibit: CloudWatch metric summary
Requests: 18,400 per day
Unique tenant-date inputs: 230 per day
Bedrock invocations: 18,400 per day
p95 API latency: 6.8 s
p95 Bedrock generation: 6.1 s
Average tokens: 3,200 input, 650 output
CacheHitCount: 0
Data refresh cadence: nightly
Which next step best addresses the latency-cost tradeoff?
Options:
A. Send duplicate parallel invocations and keep the fastest
B. Switch every request to a latency-optimized model
C. Enable streaming responses from Bedrock
D. Precompute and cache summaries after each nightly refresh
Best answer: D
Explanation: Precomputation is the best fit because the output is deterministic for each tenant-date until the nightly refresh. The exhibit shows 18,400 daily requests but only 230 unique inputs, so caching precomputed summaries can reduce both p95 latency and FM invocation cost.
The core optimization is to avoid repeated generation when inputs and outputs are stable. After each nightly data refresh, the application can generate the 230 tenant-date summaries once, store them in DynamoDB or S3 with a refresh version or TTL, and serve dashboard requests from the cache. This directly targets the exhibit details: CacheHitCount: 0, 18,400 Bedrock invocations, and only 230 unique inputs. Streaming can improve perceived first-token latency, but it does not reduce token usage or total generation work. A latency-optimized model may reduce generation time, but it still invokes the FM for every dashboard view. Parallel duplicate invocations trade more cost for lower tail latency, which conflicts with the spend goal.
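A sketch of the precompute-then-serve flow. Here `store` stands in for DynamoDB or S3 and `generate_fn` for the Bedrock summarization call; both are assumptions for illustration.

```python
def refresh_cache(tenants, dates, generate_fn, store: dict) -> None:
    """After the nightly refresh, generate each tenant-date summary once."""
    for tenant in tenants:
        for date in dates:
            store[(tenant, date)] = generate_fn(tenant, date)

def get_summary(tenant, date, store: dict):
    """Dashboard read path: a cache hit avoids any Bedrock invocation."""
    return store.get((tenant, date))
```

With the exhibit's numbers, this turns 18,400 daily invocations into 230 nightly ones, and every dashboard read becomes a sub-second lookup.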
Topic: AI Safety, Security, and Governance
A financial services company runs a customer support assistant that drafts responses and can also write approved summaries to a CRM. Organizational policy requires output filtering to block personalized investment advice before any generated content reaches users or downstream systems. The application logs this result after calling the output policy filter.
Exhibit: Guardrail finding
stage=post_generation_output_filter
action=GUARDRAIL_INTERVENED
blockedPolicy=topic:PersonalizedInvestmentAdvice
configuredResponse="I can't provide personalized investment advice."
deliverRawModelOutput=false
downstreamWrite=blocked
What is the best next step?
Options:
A. Return the configured refusal and suppress downstream delivery.
B. Retry the same prompt with a lower temperature.
C. Write the raw output to the CRM for human review.
D. Publish the response because the guardrail produced safe text.
Best answer: A
Explanation: The output policy filter has already determined that the generated content violates the organization’s blocked investment-advice policy. Because the exhibit explicitly blocks raw output delivery and downstream writes, the application should enforce the configured refusal before anything reaches the user or CRM.
Output policy enforcement happens after generation but before delivery to users or integrated systems. In the exhibit, the decisive details are action=GUARDRAIL_INTERVENED, blockedPolicy=topic:PersonalizedInvestmentAdvice, deliverRawModelOutput=false, and downstreamWrite=blocked. That means the application must treat the original model output as prohibited, return only the configured safe refusal if a user-facing response is allowed, and prevent any raw or policy-violating content from being written to the CRM. Monitoring logs should retain the finding for audit and trend analysis, but logging is not a substitute for enforcement. The key takeaway is that output filters must be placed on the delivery path, not only in post-delivery monitoring.
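The delivery-path enforcement can be sketched as a single branch on the finding; the field names follow the exhibit, while the return shape and `modelOutput` field are assumptions.

```python
def deliver(finding: dict) -> dict:
    """Enforce the guardrail finding before anything reaches user or CRM."""
    if finding["action"] == "GUARDRAIL_INTERVENED":
        return {
            "userResponse": finding["configuredResponse"],
            "crmWrite": None,  # never persist blocked raw output downstream
        }
    # No intervention: the generated text may be delivered and written.
    return {"userResponse": finding["modelOutput"],
            "crmWrite": finding["modelOutput"]}
```

Logging the finding for audit happens alongside this branch, but the branch itself is what keeps enforcement on the delivery path rather than in post-delivery monitoring.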
Topic: Implementation and Integration
A company is adding an internal knowledge assistant for developers. Source documents are in SharePoint Online and Confluence, and enterprise users sign in through IAM Identity Center. Security requires each answer to use only documents that the signed-in user is already authorized to view in the source systems. The team wants minimal custom authorization code. Which implementation best meets these requirements?
Options:
A. Invoke a Bedrock model through Lambda with access rules in the prompt.
B. Fine-tune a SageMaker AI model on all internal documents.
C. Index exported files in Bedrock Knowledge Bases with department metadata filters.
D. Create an Amazon Q Business app with governed data source connectors.
Best answer: D
Explanation: Amazon Q Business is designed for enterprise knowledge assistants that respect organizational identities and source permissions. Using its data source connectors with IAM Identity Center preserves governed access without building a custom retrieval and ACL enforcement layer.
The key mechanism is Amazon Q Business with enterprise data source connectors and identity integration. For sources such as SharePoint Online and Confluence, Amazon Q Business can sync content and associated access-control information so retrieval is evaluated in the context of the signed-in user. This matches the requirement that answers be grounded only in documents the user is already allowed to see, while avoiding custom ACL filtering in Lambda, OpenSearch, or application code. A generic RAG stack can be valid for many use cases, but it requires the team to design and maintain reliable document-level authorization. For this governed internal knowledge tool, the managed Amazon Q Business data source pattern is the best fit.
Topic: Testing, Validation, and Troubleshooting
A GenAI application invokes Amazon Bedrock through a service layer that reads modelId and prompt template from runtime configuration. The team emits CloudWatch metrics for grounded-answer score, p95 latency, estimated token cost, and guardrail intervention rate. They want to expose a challenger model/prompt to 10% of internal users first and automatically stop the rollout if those metrics breach alarms, without redeploying the API. Which AWS service capability best fits this release control role?
Options:
A. AWS AppConfig feature flags with deployment strategies
B. AWS X-Ray sampling rules
C. Amazon Bedrock Model Evaluation jobs
D. AWS CloudTrail Lake event data stores
Best answer: A
Explanation: AWS AppConfig is the release-control fit for runtime model and prompt configuration. Feature flags and deployment strategies let teams canary or target challenger variants while CloudWatch alarms provide automated rollback signals.
The core concept is separating evaluation metrics from rollout control. The application can publish quality, latency, cost, and safety metrics to CloudWatch, while AWS AppConfig controls which model and prompt configuration a user receives. AppConfig feature flags support variants and controlled deployments, and deployment strategies can progressively expose a change rather than requiring a code redeploy. When configured with CloudWatch alarms, AppConfig can stop or roll back a bad configuration if the observed metrics cross thresholds. This makes it appropriate for canary releases and controlled A/B-style experiments in production. Offline model-evaluation services are useful before rollout, but they do not control production traffic exposure.
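One common way to expose a challenger to a fixed share of users is deterministic hashing, sketched below for intuition. This illustrates the exposure concept only; it is not the AppConfig API, which handles targeting and rollback through its own deployment strategies.

```python
import hashlib

def in_canary(user_id: str, percent: int) -> bool:
    """Deterministic bucketing: a given user always lands in the same
    bucket, so they consistently see one variant during the rollout."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```

Determinism matters for evaluation: per-user consistency keeps quality and latency metrics attributable to a single variant.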
Topic: Foundation Model Integration, Data Management, and Compliance
A developer is building a support-case summarization pipeline on AWS. A Lambda function currently sends each uploaded object from S3 to a text prompt in Amazon Bedrock. The application must produce searchable summaries and preserve timestamps for call recordings.
Exhibit: Pipeline log
Pipeline: support-media-ingest
S3 object: calls/2025/04/case-8841.mp3
Detected type: audio/mpeg
Step: Bedrock InvokeModel
Prompt input: summarize this customer interaction
Error: ValidationException: expected text input, received audio/mpeg
Requirement: transcript with speaker timestamps
Which next step best prepares this object for FM consumption?
Options:
A. Use SageMaker Processing to batch normalize the MP3 files.
B. Invoke a Bedrock multimodal model with the MP3 directly.
C. Store the MP3 metadata in a vector index for retrieval.
D. Run Amazon Transcribe, then send transcript text to Bedrock.
Best answer: D
Explanation: The object is an audio recording, and the failure occurs because the current Bedrock invocation expects text input. Amazon Transcribe is the appropriate modality-specific preprocessing service when speech audio must become searchable, timestamped text for downstream FM consumption.
The decisive exhibit detail is Detected type: audio/mpeg combined with Requirement: transcript with speaker timestamps. For call recordings, the preprocessing step should convert speech to text before invoking a text summarization prompt in Amazon Bedrock. Amazon Transcribe provides speech-to-text output and can include time-aligned transcript data that the application can store, search, and pass to the FM as grounded text context.
A multimodal FM is useful when the target model supports the needed input modality and output requirement, but the stated requirement is a timestamped transcript. SageMaker Processing is better for custom batch preprocessing jobs, not as the simplest managed speech transcription service for this case.
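Once the Transcribe job output has been parsed into segments, producing FM-ready timestamped text is straightforward. The segment shape below is an assumption standing in for pre-extracted fields from the Transcribe result JSON, not the raw job output format.

```python
def format_segments(segments: list[dict]) -> str:
    """Render parsed transcript segments as timestamped speaker lines
    suitable for a Bedrock text summarization prompt."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] "
                     f"{seg['speaker']}: {seg['text']}")
    return "\n".join(lines)
```

The timestamped lines satisfy the stated requirement and also make the stored summaries searchable by time and speaker.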
Topic: Testing, Validation, and Troubleshooting
A team is comparing Amazon Bedrock model and prompt variants for a customer support assistant. A candidate must pass all gates before receiving production traffic, and the rollout must limit blast radius.
Exhibit: 24-hour shadow evaluation
Baseline: quality=0.72 p95=1.8s cost=$4.20/1k unsafe=0.3%
VariantA: quality=0.79 p95=2.4s cost=$6.80/1k unsafe=0.4%
VariantB: quality=0.83 p95=3.7s cost=$9.50/1k unsafe=0.2%
VariantC: quality=0.77 p95=1.9s cost=$4.70/1k unsafe=1.8%
Gate: quality >= baseline + 0.05; p95 <= 2.5s; cost <= $7.00/1k; unsafe <= 0.5%
Which interpretation and next step are best?
Options:
A. Canary Variant C because it is lowest cost
B. Fully promote Variant B for all production traffic
C. Canary Variant A with rollback alarms
D. Keep the baseline and delay online testing
Best answer: C
Explanation: Variant A is the only variant that satisfies every stated gate: quality improvement, p95 latency, cost, and unsafe-output rate. Because the team must limit blast radius, the next step is a controlled canary with monitoring and rollback rather than a full promotion.
This is a gated multi-model comparison followed by a canary decision. Variant A improves quality from 0.72 to 0.79, meets the p95 latency limit at 2.4s, stays under the cost cap at USD 6.80 per 1,000 requests, and keeps unsafe outputs at 0.4%. A canary release lets the team validate real production behavior with CloudWatch, X-Ray, Bedrock invocation logs, and guardrail metrics before increasing traffic. The key is not choosing the highest-quality model alone; the selected candidate must satisfy the full quality, latency, cost, and safety tradeoff envelope.
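The gate from the exhibit can be encoded directly, as in this sketch; the values restate the exhibit, with unsafe rates expressed as fractions.

```python
GATES = {"quality_uplift": 0.05, "p95_max": 2.5,
         "cost_max": 7.00, "unsafe_max": 0.005}

def passes_gates(baseline: dict, variant: dict, g: dict = GATES) -> bool:
    """A candidate must clear every gate, not just win on quality."""
    return (
        variant["quality"] >= baseline["quality"] + g["quality_uplift"]
        and variant["p95"] <= g["p95_max"]
        and variant["cost"] <= g["cost_max"]
        and variant["unsafe"] <= g["unsafe_max"]
    )
```

Running the exhibit's numbers through this check eliminates Variant B on latency and cost and Variant C on safety, leaving Variant A as the only canary candidate.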
Topic: Implementation and Integration
A customer support assistant runs on AWS Lambda and invokes Amazon Bedrock with InvokeModel. After a prompt deployment, requests to one model return HTTP 400, while the same inputs work with the previous prompt version and with another model. A CloudWatch Logs Insights query over the structured Lambda logs returns:
| promptVersion | modelFamily | requestShape | errorCode | sampleError |
|---|---|---|---|---|
| v17 | Claude 3 | prompt | ValidationException | extraneous key [prompt]; required key [messages] |
| v17 | Titan Text | prompt | - | - |
| v16 | Claude 3 | messages | - | - |
Which action fixes the root cause with the smallest safe change?
Options:
A. Increase max_tokens and reduce prompt chunk size.
B. Add exponential backoff around the Bedrock invocation.
C. Rebuild the knowledge base vector index.
D. Use the model-specific messages payload for the affected model.
Best answer: D
Explanation: The symptom is a deterministic HTTP 400 for one model after the prompt deployment. CloudWatch Logs Insights ties the failures to a ValidationException caused by the request shape, so the safe fix is to correct the model-specific payload format.
Symptom: the application fails only for prompt version v17 and the Claude 3 model family. Root cause: the Logs Insights summary shows the invocation body uses requestShape=prompt, but the model rejects that field and requires messages. This is a malformed FM invocation payload, not throttling, retrieval quality, or context overflow. Fix: update the Bedrock request mapper for the affected model family to construct the required messages payload, then validate the change while continuing to log prompt version, model ID, error code, and request ID for troubleshooting evidence.
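A request mapper along these lines could build the family-specific body. This is a hedged sketch: the prefix checks and defaults are illustrative, though the general shapes match the native InvokeModel formats (Claude 3 expects a messages array; Titan Text expects inputText).

```python
# Sketch: build the model-family-specific request body before InvokeModel.
# Prefix routing and defaults are illustrative; using the Converse API instead
# would avoid maintaining per-model shapes entirely.

def build_request_body(model_id: str, user_text: str, max_tokens: int = 512) -> dict:
    if model_id.startswith("anthropic."):
        # Claude 3 rejects a top-level "prompt" key and requires "messages".
        return {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": user_text}]}
            ],
        }
    if model_id.startswith("amazon.titan"):
        return {
            "inputText": user_text,
            "textGenerationConfig": {"maxTokenCount": max_tokens},
        }
    raise ValueError(f"no request mapper registered for {model_id}")

body = build_request_body("anthropic.claude-3-sonnet-20240229-v1:0", "Summarize the ticket.")
assert "messages" in body and "prompt" not in body
```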
Topic: AI Safety, Security, and Governance
A company exposes a customer-support GenAI API through API Gateway and Lambda. The Lambda function calls Amazon Bedrock with an approved prompt from Bedrock Prompt Management and a Bedrock Guardrail. Security wants near-real-time alerting and automated remediation for guardrail blocks, tenant token-use anomalies, and requests that do not use the approved prompt version. Alerts must not include raw prompts or outputs, and the API must not add more than 300 ms of remediation latency. Which implementation should the developer build?
Options:
A. Query Bedrock invocation logs in S3 hourly and email raw samples to SecOps.
B. Emit sanitized EMF metrics; use CloudWatch/EventBridge to trigger Step Functions notification and DynamoDB remediation.
C. Run a synchronous Step Functions review workflow before returning every response.
D. Use CloudTrail InvokeModel events to infer failures and attach tenant IAM deny policies.
Best answer: B
Explanation: The Lambda layer has the information needed to detect these policy events without logging content. Emitting sanitized metrics/events and using CloudWatch/EventBridge to launch Step Functions provides near-real-time alerting and automated remediation outside the request path.
Continuous policy enforcement for a GenAI API should emit security-relevant telemetry at the application boundary because Lambda has tenant, guardrail-action, token-usage, and prompt-version context. Publishing only metadata as EMF metrics or custom events lets CloudWatch alarms or EventBridge rules react quickly without exposing prompts or outputs. EventBridge can start Step Functions asynchronously to notify SecOps, write an audit record, and call a remediation Lambda to mark the tenant or session blocked in DynamoDB. This preserves low latency because the user response path only emits telemetry; remediation runs out of band. CloudTrail, S3 logs, and X-Ray are useful for auditing or observability, but they do not replace app-level policy signals for this workflow.
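The sanitized telemetry record might look like the following CloudWatch Embedded Metric Format (EMF) sketch. The namespace, dimension, and metric names are illustrative assumptions; the point is that only metadata, never prompt or completion text, is emitted.

```python
# Sketch: an EMF record carrying policy metadata only. Printing this JSON line
# from Lambda lets CloudWatch extract the metrics automatically. Namespace and
# field names are illustrative.
import json
import time

def sanitized_policy_event(tenant_id: str, guardrail_blocked: bool,
                           tokens_used: int, prompt_version: str) -> str:
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "GenAIGateway",  # illustrative namespace
                "Dimensions": [["TenantId"]],
                "Metrics": [
                    {"Name": "GuardrailBlocked", "Unit": "Count"},
                    {"Name": "TokensUsed", "Unit": "Count"},
                ],
            }],
        },
        "TenantId": tenant_id,
        "GuardrailBlocked": 1 if guardrail_blocked else 0,
        "TokensUsed": tokens_used,
        "PromptVersion": prompt_version,  # metadata property, not a metric
    }
    return json.dumps(record)

line = sanitized_policy_event("tenant-42", True, 1830, "v17")
```

An EventBridge rule or CloudWatch alarm on `GuardrailBlocked` can then start the remediation workflow asynchronously, keeping the user-facing path fast.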
Topic: Implementation and Integration
An insurer exposes a claims assistant to a CRM through API Gateway and Lambda in us-east-1. After adding photo triage and policy-grounded Q&A, users see unsupported-input errors, context-window failures, and CRM JSON parsing errors. Requirements: route text questions to a low-latency text FM, route damage photos to a multimodal FM, retrieve only approved policy excerpts, keep Bedrock calls private in the same Region, and return one stable JSON schema. Which architecture best addresses these integration issues?
Options:
A. Add a Lambda GenAI gateway that uses Bedrock Converse, routes by modality, retrieves filtered knowledge, token-budgets context, invokes Bedrock through VPC endpoints, and validates CRM JSON.
B. Train a custom multimodal model in SageMaker AI and replace the retrieval workflow with model fine-tuning.
C. Send every request to the largest multimodal Bedrock model and let the CRM parse each model-specific response.
D. Use a single Bedrock Agent with a text-only FM and pass raw retrieved policy chunks to the prompt.
Best answer: A
Explanation: The failures point to an integration boundary problem, not a need for a bigger model. A GenAI gateway can route by modality, transform retrieved context to fit token limits, use private Bedrock runtime access, and enforce a stable response contract for the CRM.
The best fit is a routing and transformation layer between the CRM/API and Amazon Bedrock. The gateway should classify requests by modality, select the appropriate text or multimodal model, retrieve only approved policy excerpts, reduce or summarize context to fit the target model’s token budget, and validate the structured JSON before returning it to the CRM. Using Bedrock Converse helps standardize model request and response handling across supported models, while VPC endpoints and same-Region calls satisfy the private connectivity and data locality constraints. The key takeaway is to fix routing, payload transformation, and response normalization rather than treating all failures as model-capacity problems.
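Two of the gateway responsibilities, modality routing and response-contract validation, can be sketched in a few lines. The model aliases and the required schema keys are illustrative placeholders.

```python
# Sketch: modality routing and CRM response validation inside the gateway.
# Aliases and schema keys are hypothetical, not real model IDs.
import json

TEXT_MODEL = "text-model-low-latency"   # placeholder alias
MULTIMODAL_MODEL = "multimodal-model"   # placeholder alias
REQUIRED_KEYS = {"answer", "citations", "confidence"}

def route_model(request: dict) -> str:
    """Route damage photos to the multimodal FM, plain questions to the text FM."""
    return MULTIMODAL_MODEL if request.get("imageBytes") else TEXT_MODEL

def validate_crm_response(raw: str) -> dict:
    """Enforce one stable JSON schema before anything reaches the CRM."""
    parsed = json.loads(raw)
    missing = REQUIRED_KEYS - parsed.keys()
    if missing:
        raise ValueError(f"response missing required keys: {sorted(missing)}")
    return parsed

assert route_model({"text": "What does my policy cover?"}) == TEXT_MODEL
assert route_model({"text": "Triage this", "imageBytes": b"\x89PNG"}) == MULTIMODAL_MODEL
```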
Topic: AI Safety, Security, and Governance
A company runs an Amazon Bedrock chat application for employees. Security wants to detect rapid token spikes, repeated jailbreak-style prompts, unusual response patterns, and abnormal per-user conversation activity so analysts can investigate abuse or unintended behavior. Which principle or pattern best maps to this requirement?
Options:
A. Correlated abuse and anomaly monitoring
B. Grounding evaluation for citations
C. Static routing to one model
D. Semantic caching for repeated prompts
Best answer: A
Explanation: The requirement is about continuous monitoring for abuse and abnormal behavior across token usage, responses, and user activity. Correlating these telemetry sources supports detection, investigation, and policy enforcement for production GenAI applications.
For GenAI safety operations, abuse and anomaly monitoring correlates multiple signals: token consumption, request frequency, prompt and response logs, guardrail findings, user identity, and conversation flow. In AWS, this commonly means capturing Bedrock invocation and application logs, publishing metrics to CloudWatch, retaining audit evidence, and alerting on abnormal patterns such as token spikes or repeated policy violations. This is different from evaluating answer grounding or optimizing repeat prompt cost; the goal is detecting suspicious behavior and unintended system outcomes over time.
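One of the correlated signals, per-user token spikes, could be flagged with a rolling-baseline check like this sketch. The window size and sigma threshold are illustrative, not tuned recommendations.

```python
# Sketch: flag per-user token spikes against a rolling baseline.
# Window and threshold values are illustrative.
from collections import deque
from statistics import mean, pstdev

class TokenSpikeDetector:
    def __init__(self, window: int = 20, sigma: float = 3.0):
        self.history = deque(maxlen=window)
        self.sigma = sigma

    def observe(self, tokens: int) -> bool:
        """Return True when usage is anomalously high versus recent history."""
        spike = False
        if len(self.history) >= 5:
            mu, sd = mean(self.history), pstdev(self.history)
            spike = sd > 0 and tokens > mu + self.sigma * sd
        self.history.append(tokens)
        return spike
```

In production the detector output would be one metric among many; the correlation across token usage, guardrail findings, and identity is what makes an investigation actionable.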
Topic: AI Safety, Security, and Governance
An enterprise RAG application will index HR and finance documents from an S3 data lake cataloged in AWS Glue. The security team requires centrally auditable permissions so the ingestion role can access only approved datasets based on sensitivity classifications, without maintaining separate bucket policies for every dataset. Which governance pattern best fits this requirement?
Options:
A. Semantic caching for retrieval results
B. Bedrock Guardrails content filtering
C. Lake Formation tag-based access control
D. KMS encryption-only access control
Best answer: C
Explanation: The requirement is about governing which data sources the RAG ingestion role can access. Lake Formation tag-based access control maps sensitivity classifications to centrally managed, auditable permissions for Glue-cataloged data lake resources.
For RAG applications that ingest from an AWS data lake, source governance should be enforced before content enters the retrieval index. Lake Formation can manage permissions on Glue Data Catalog databases, tables, and registered S3 locations, and LF-tags allow policy rules based on classifications such as department or sensitivity. This avoids duplicating access logic across many bucket policies and provides centralized auditability for who can read which governed datasets. Runtime safety controls still matter, but they do not replace source-level data authorization.
Topic: Testing, Validation, and Troubleshooting
A team is releasing a new Amazon Bedrock RAG assistant version. Before increasing traffic from a canary to all users, they must replay representative synthetic tasks, flag unsupported claims against retrieved sources, compare outputs with a baseline for semantic drift, and automatically stop promotion if thresholds fail. Which validation pattern best matches this requirement?
Options:
A. Dynamic model routing by complexity
B. Production-only A/B evaluation
C. Semantic caching for repeated prompts
D. Automated pre-release quality gate
Best answer: D
Explanation: The requirement is to block promotion until automated tests prove the new GenAI deployment meets quality expectations. Synthetic workflow replay, hallucination checks, and semantic drift comparison are deployment validation controls used as pre-release quality gates.
An automated pre-release quality gate runs repeatable validation before broad rollout. For a GenAI application, the gate can replay curated synthetic RAG or agent workflows, check whether answers are grounded in retrieved evidence, compare responses with a known-good baseline for semantic drift, and fail the pipeline or stop canary promotion when thresholds are not met. This is different from runtime optimization patterns such as caching or routing because the primary goal is deployment safety and quality assurance before release.
Topic: Implementation and Integration
A team used a developer productivity tool to generate an update for a production customer-support summarization API that invokes Amazon Bedrock. The pull request includes this summary:
Source: Amazon Q Developer generated update
Change: Lambda calls Bedrock InvokeModel
Tests: skipped (--no-verify)
IAM: bedrock:* on *
Guardrails: not configured
Deployment target: prod alias
What is the best next step?
Options:
A. Approve deployment because Amazon Q Developer generated the change
B. Replace Bedrock Guardrails with prompt-only safety instructions
C. Run the normal architecture, security, test, and deployment gates
D. Validate the missing controls only with post-deployment CloudWatch metrics
Best answer: C
Explanation: Developer productivity tools can accelerate implementation, but they do not replace production engineering controls. The exhibit’s skipped tests, wildcard Bedrock permissions, missing guardrails, and production deployment target make review and gated validation the necessary next step.
The core concept is that generated code is an implementation aid, not an approval authority. In this case, the pull request has several production-blocking signals: tests were skipped, IAM uses bedrock:* on *, guardrails are absent, and the target is the production alias. A production GenAI application still needs architecture review, least-privilege IAM, safety controls, automated tests or evaluations, and controlled deployment through CI/CD gates. Developer tooling can help write code, explain changes, or suggest tests, but the team remains responsible for validating security, safety, and operational readiness before release.
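A CI gate could flag the exhibit's blockers automatically before any human review. This is a minimal sketch; the field names mirror the pull-request summary above and the rules are illustrative.

```python
# Sketch: automated production-blocker checks a CI gate could run on a
# pull-request summary. Field names mirror the exhibit; rules are illustrative.

def production_blockers(pr: dict) -> list:
    blockers = []
    if pr.get("tests") == "skipped":
        blockers.append("tests were skipped")
    if pr.get("iam_action") == "bedrock:*" and pr.get("iam_resource") == "*":
        blockers.append("wildcard IAM permissions")
    if not pr.get("guardrails_configured"):
        blockers.append("no guardrail configured")
    return blockers

pr = {"tests": "skipped", "iam_action": "bedrock:*", "iam_resource": "*",
      "guardrails_configured": False, "target": "prod alias"}
assert len(production_blockers(pr)) == 3  # all three controls must be fixed first
```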
Topic: Testing, Validation, and Troubleshooting
A healthcare company runs a RAG assistant on Amazon Bedrock Knowledge Bases over clinical policy PDFs in Amazon S3. Before each prompt, chunking, embedding, or corpus change, the team must run a 500-question regression suite in the same AWS Region, validate retrieval relevance, grounding, context completeness, and citation usefulness, finish in under 30 minutes, and produce auditable results without exposing PHI in logs. Which architecture is the best fit?
Options:
A. Fine-tune a custom model in Amazon SageMaker AI on the policy PDFs, compare validation perplexity, and replace retrieval when perplexity improves.
B. Enable detailed CloudWatch Logs for all production conversations, have reviewers inspect transcripts weekly, and promote changes when user complaints decrease.
C. Run a Step Functions evaluation workflow that replays a curated golden set, captures retrieved chunks, answers, and citations, scores retrieval and grounding with Amazon Bedrock evaluator models, stores redacted results in encrypted S3, publishes CloudWatch metrics, and gates the release.
D. Use Amazon Bedrock model evaluation only on the foundation model with generic prompt-response pairs, then promote the release when the model score improves.
Best answer: C
Explanation: A RAG evaluation must test the full retrieval-and-generation path, not just the base model. The best design runs a controlled regression set, records retrieved evidence and generated citations, scores relevance and grounding, and stores governed results for release decisions.
The core concept is end-to-end RAG validation. The evaluation workflow should replay known questions against the candidate RAG configuration and compare the retrieved chunks, generated answer, required facts, and cited sources with a curated baseline. Retrieval relevance can be scored with expected source IDs or chunk labels; grounding and completeness can be judged with Bedrock evaluator models or human-reviewed rubrics; citation usefulness requires checking whether cited sources support the answer. Step Functions provides controlled parallel orchestration to meet the 30-minute target, while encrypted S3 and redaction keep auditable evidence without leaking PHI to logs. The key is evaluating the deployed RAG behavior, not only the foundation model or user satisfaction after release.
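Two of the per-question scores, retrieval recall against expected chunk labels and citation support, can be sketched directly. The chunk IDs are illustrative; a real suite would add grounding judges on top.

```python
# Sketch: score one golden-set question by comparing retrieved chunk IDs with
# expected labels, and confirm every cited source was actually retrieved.
# IDs are illustrative placeholders.

def score_retrieval(expected_ids: set, retrieved_ids: list) -> float:
    """Recall of expected chunks within the retrieved set."""
    if not expected_ids:
        return 1.0
    return len(expected_ids & set(retrieved_ids)) / len(expected_ids)

def citations_supported(cited_ids: list, retrieved_ids: list) -> bool:
    return set(cited_ids) <= set(retrieved_ids)

retrieved = ["policy-12#c3", "policy-12#c4", "policy-07#c1"]
assert score_retrieval({"policy-12#c3", "policy-12#c4"}, retrieved) == 1.0
assert citations_supported(["policy-12#c3"], retrieved)
assert not citations_supported(["policy-99#c9"], retrieved)
```

Aggregating these scores across the 500-question suite, then comparing against thresholds, is what turns the workflow into a release gate.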
Topic: Implementation and Integration
A team is implementing an agentic claims-assistant workflow on AWS. The agent can call Lambda tools that read DynamoDB and an external CRM. A security review requires the workflow to end after a fixed number of reasoning/tool steps or 20 seconds, restrict each tool to only its tenant resources, and temporarily stop CRM calls when recent failures exceed a configured threshold. Which principle best maps to these requirements?
Options:
A. Semantic caching
B. Bounded tool orchestration
C. Dynamic model routing
D. A/B prompt evaluation
Best answer: B
Explanation: The requirements describe runtime safety controls for an agentic tool workflow. Bounded tool orchestration limits how long the agent can act, constrains what each tool can access, and fails safely when dependencies become unhealthy.
Bounded tool orchestration is the best concept for safeguarded agent workflows that invoke tools such as Lambda functions and external APIs. The workflow should have explicit stopping conditions, such as maximum steps, maximum tool calls, or total execution time. Each tool should run with least-privilege IAM permissions and resource boundaries, such as tenant-scoped DynamoDB access. Circuit breakers prevent repeated calls to failing downstream systems and help avoid retry storms or cascading failures. These controls work together at runtime to keep the agent from looping, overreaching, or amplifying dependency failures.
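The stopping conditions and circuit breaker could look like this sketch. The limits are illustrative, and per-tool IAM scoping would be enforced outside this code, in the tool roles themselves.

```python
# Sketch: runtime bounds for an agent loop with a step cap, a wall-clock
# deadline, and a circuit breaker for the CRM dependency. Limits are illustrative.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3):
        self.failures = 0
        self.threshold = failure_threshold

    def allow(self) -> bool:
        return self.failures < self.threshold

    def record(self, success: bool):
        self.failures = 0 if success else self.failures + 1

def run_agent(steps, max_steps: int = 5, deadline_s: float = 20.0, crm_breaker=None):
    crm_breaker = crm_breaker or CircuitBreaker()
    start = time.monotonic()
    executed = []
    for i, step in enumerate(steps):
        if i >= max_steps or time.monotonic() - start > deadline_s:
            break  # hard stop: bounded step count and wall-clock budget
        if step["tool"] == "crm" and not crm_breaker.allow():
            continue  # circuit open: skip CRM calls until it recovers
        crm_breaker.record(step.get("ok", True))
        executed.append(step["tool"])
    return executed

steps = [{"tool": "crm", "ok": False}] * 4 + [{"tool": "dynamodb"}] * 4
assert run_agent(steps) == ["crm", "crm", "crm", "dynamodb"]
```

After three CRM failures the breaker opens, the fourth CRM call is skipped, and the step cap still bounds total work.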
Topic: Foundation Model Integration, Data Management, and Compliance
A SaaS company uses Amazon Bedrock Knowledge Bases backed by a single OpenSearch vector index for a RAG assistant. Most authenticated sessions include one approved document domain. Retrieval is slow and citations often come from the wrong domain.
Exhibit: Vector retrieval summary
Index: kb-prod-all-docs
Vectors: 18.4M
Domains: legal, support, HR, product
p95 retrieval latency: 2.8s; target: 800ms
Top-10 off-domain results: 31% of sampled queries
Domain filter: applied after ANN candidates are returned
Query mix: 92% target exactly one domain
What is the best next step to optimize retrieval performance at scale?
Options:
A. Increase the API Gateway and Lambda timeouts for retrieval calls.
B. Create domain-specific vector indexes and route queries before ANN search.
C. Disable domain filtering to reduce vector search overhead.
D. Re-embed all documents with a larger embedding model in one index.
Best answer: B
Explanation: The decisive detail is that 92% of queries target one domain, but the current design searches a single 18.4M-vector index and filters only after ANN candidates are returned. Domain-specific indexes with pre-retrieval routing reduce the candidate space and prevent off-domain results from competing in the initial search.
When retrieval domains differ and most queries are domain-scoped, a multi-index design is usually more efficient than one large shared vector index with late filtering. The exhibit shows both scale pressure and relevance leakage: 18.4M vectors, 2.8-second p95 latency, and 31% off-domain top-10 results. Routing each request to a legal, support, HR, or product index before ANN search narrows the search space and makes domain boundaries part of retrieval architecture instead of a post-processing step. This can also allow domain-specific chunking, metadata, and maintenance policies later. Increasing timeouts only hides latency, while re-embedding alone does not address cross-domain search competition.
Topic: Foundation Model Integration, Data Management, and Compliance
A company runs a RAG support API on AWS. The application sends retrieved passages to an Amazon Bedrock foundation model. The current retrieval tier uses Amazon OpenSearch Service match queries over title and body, with tenantId metadata filters. Users report hallucinated answers when they ask semantically equivalent questions, such as “termination provisions,” unless the document uses the exact phrase “cancellation rules.” Which change fixes the root cause with the smallest safe impact?
Options:
A. Increase the keyword query top_k value for every request.
B. Add a vector field and perform embedding similarity search with metadata filters.
C. Fine-tune the Bedrock foundation model on support tickets.
D. Remove tenantId filters to widen the search corpus.
Best answer: B
Explanation: The symptom is poor retrieval relevance for semantically similar wording. The root cause is a keyword-only retrieval architecture, so the smallest safe fix is to add embeddings and vector similarity search while keeping metadata filters.
Symptom: users ask valid natural-language questions, but retrieval only succeeds when document terms match exactly. Root cause: OpenSearch match queries are lexical, not semantic, so the FM receives weak or missing grounding context and may hallucinate. Fix: store embeddings for document chunks in a vector-capable index, embed the user query with the same embedding model, and run vector similarity retrieval combined with tenantId metadata filtering. This upgrades retrieval semantics without changing the FM or weakening tenant isolation.
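The gap between lexical and semantic retrieval can be shown with toy vectors. The embeddings below are hand-made for illustration; real embeddings would come from the same embedding model applied to documents and queries.

```python
# Sketch: why lexical match misses paraphrases while vector similarity does not.
# Toy 3-dimensional "embeddings" constructed for illustration only.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Pretend embeddings place paraphrases near each other in vector space.
docs = {
    "cancellation rules": [0.9, 0.1, 0.0],
    "billing address": [0.0, 0.2, 0.9],
}
query_text = "termination provisions"
query_vec = [0.85, 0.15, 0.05]  # close to "cancellation rules" by construction

lexical_hits = [d for d in docs if any(w in d for w in query_text.split())]
best_vector = max(docs, key=lambda d: cosine(query_vec, docs[d]))

assert lexical_hits == []                   # keyword match finds nothing
assert best_vector == "cancellation rules"  # vector search recovers the paraphrase
```

In OpenSearch this corresponds to adding a k-NN vector field alongside the existing text fields, so the tenantId filter continues to apply to both query paths.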
Topic: AI Safety, Security, and Governance
A healthcare company is building a patient-support summarization API with Amazon Bedrock and Bedrock Knowledge Bases. Compliance requires proof that every response used approved prompt and safety-policy versions, allowed source collections, and auditable lineage without storing raw PHI. Developers currently call InvokeModel from several Lambda functions. Which implementation best meets these requirements?
Options:
A. Enable CloudWatch Logs for each Lambda function and require developers to log the prompt, completion, user ID, and timestamp for each request.
B. Run weekly SageMaker Clarify jobs on sampled outputs and store aggregate responsible AI metrics in encrypted S3 buckets.
C. Attach a Bedrock Guardrail in each Lambda function and let teams select prompt templates if they write full requests and responses to OpenSearch.
D. Route requests through a Step Functions gateway that uses approved Bedrock Prompt Management versions, applies a Bedrock Guardrail, records lineage metadata and citations to an encrypted audit store, and denies direct Bedrock invocation by application roles.
Best answer: D
Explanation: The best implementation is a governed invocation path that enforces approved prompt and guardrail versions at runtime and records lineage metadata. It also prevents unmanaged direct model calls and avoids storing raw PHI while preserving audit evidence.
For governance and auditability, the application should route Bedrock calls through a controlled gateway or workflow that applies approved artifacts consistently. Bedrock Prompt Management versions provide prompt lineage, Bedrock Guardrails enforce the safety policy, Knowledge Bases citations support source traceability, and an encrypted audit store can retain metadata such as user ID, timestamp, model ID, prompt version, guardrail version, source collection, and citation IDs. IAM controls are important because audit controls are weak if application roles can bypass the governed path and call Bedrock directly. The key takeaway is to enforce policy at the integration boundary and store minimal, structured evidence rather than raw sensitive content.
Topic: Implementation and Integration
A financial services company is building an internal GenAI gateway for several application teams. The gateway must invoke Amazon Bedrock models now and SageMaker AI endpoints later. Applications must not hard-code model IDs, provider-specific payloads, or endpoint ARNs. Compliance requires request-level traceability for user ID, application ID, prompt version, model alias, guardrail decision, latency, and errors in CloudWatch and X-Ray. Which implementation best meets these requirements?
Options:
A. Require clients to submit provider model IDs
B. Use Knowledge Bases as the gateway router
C. Create a provider-neutral API Gateway and Lambda gateway
D. Let each application call model endpoints directly
Best answer: C
Explanation: A GenAI gateway should abstract model providers from applications while centralizing controls. API Gateway with Lambda can accept a stable contract, resolve model aliases and prompt versions from configuration, apply compliance logic, and publish consistent CloudWatch logs and X-Ray traces.
The core implementation pattern is a provider-neutral GenAI gateway. API Gateway exposes one enterprise API contract, while Lambda or Step Functions resolves a logical model alias to the current Bedrock model or SageMaker AI endpoint. The gateway can load prompt versions from Bedrock Prompt Management or a configuration store, apply configured guardrails, translate requests into provider-specific payloads, and write structured CloudWatch logs with X-Ray annotations for audit and troubleshooting. Applications send use-case inputs and aliases only, so provider changes remain behind the gateway boundary. Direct client calls or client-selected provider IDs break abstraction and make compliance controls inconsistent.
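Alias resolution at the gateway boundary might be sketched like this. The registry entries, aliases, model IDs, and endpoint names are all illustrative placeholders.

```python
# Sketch: provider-neutral alias resolution inside the gateway. Clients send
# only a logical alias; the gateway owns the provider binding. All names below
# are hypothetical.

MODEL_REGISTRY = {
    "support-summarizer": {"provider": "bedrock",
                           "model_id": "example.model-v1",     # placeholder
                           "prompt_version": "v17"},
    "fraud-classifier":   {"provider": "sagemaker",
                           "endpoint": "fraud-endpoint-blue",  # placeholder
                           "prompt_version": "v3"},
}

def resolve(alias: str) -> dict:
    try:
        return MODEL_REGISTRY[alias]
    except KeyError:
        raise ValueError(f"unknown model alias: {alias}") from None

target = resolve("support-summarizer")
assert target["provider"] == "bedrock"
```

Swapping a Bedrock model for a SageMaker endpoint then becomes a registry change, invisible to every client application.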
Topic: Implementation and Integration
An insurer is adding a claims copilot to an existing claims application that runs in a corporate data center. The copilot must return REST API answers to adjusters with p95 latency under 4 seconds, use Amazon Bedrock with RAG over policy PDFs from a file share and claim metadata from the on-premises database synchronized within 15 minutes, create asynchronous claim-summary updates when claim-status events occur, and keep traffic private with auditable IAM/KMS controls. Which architecture best meets these requirements?
Options:
A. Train a custom FM in SageMaker AI, deploy a public endpoint, and batch-load policy data monthly for generation.
B. Use Direct Connect to a private API Gateway backed by Lambda/ECS; call Bedrock and Bedrock Knowledge Bases through VPC endpoints; process events with EventBridge/SQS/Step Functions; sync data with DataSync/DMS to S3 and trigger ingestion, with IAM, KMS, and CloudTrail.
C. Deploy Amazon Q Business as a separate portal, crawl the file share, and export daily CSV summaries to the claims application.
D. Call Bedrock directly from the on-premises application over the public endpoint; upload document snapshots nightly; poll the database for summaries.
Best answer: B
Explanation: The best architecture uses an enterprise GenAI gateway pattern with separate paths for synchronous APIs, asynchronous events, and data synchronization. It keeps model access private, refreshes retrieval data within the required window, and provides governance evidence through IAM, KMS, and CloudTrail.
The core pattern is to enhance the existing application without bypassing enterprise integration controls. A private REST entry point over Direct Connect gives the claims system a stable API, while VPC endpoints keep Bedrock traffic off the public internet. Lambda or ECS can orchestrate prompt construction and retrieval with Bedrock Knowledge Bases. EventBridge, SQS, and Step Functions decouple claim-status events from summary generation and provide retry handling. DataSync and AWS DMS CDC can move file-share and database changes to S3, then trigger knowledge base ingestion within the 15-minute freshness target. IAM, KMS, and CloudTrail provide least privilege, encryption, and auditability. Public model calls, batch-only processing, or a separate portal miss one or more stated integration constraints.
Topic: Operational Efficiency and Optimization for GenAI Applications
An enterprise support assistant invokes Amazon Bedrock through a shared GenAI gateway. Most requests are simple FAQ answers or short summaries, but some require multi-step reasoning across retrieved policy documents. The team must reduce inference cost while preserving output quality for difficult cases, and the routing decision should be made per request based on task complexity and confidence. Which pattern best maps to this requirement?
Options:
A. Static endpoint routing
B. A/B model evaluation
C. Semantic response caching
D. Dynamic model routing
Best answer: D
Explanation: Dynamic model routing is the best fit when each request should be sent to an appropriate model tier. It supports cost control by using lower-cost models for routine tasks while escalating complex or low-confidence requests to stronger models.
The core concept is cost-aware model selection at inference time. In a Bedrock-based gateway, a lightweight classifier, rules engine, or confidence score can route simple FAQ and summary requests to a lower-cost model, while sending complex reasoning requests to a more capable model. This keeps quality where it matters without paying premium inference cost for every request. The routing policy should be observable and governed so teams can tune thresholds using quality, latency, and cost metrics. Caching and evaluation can support the system, but they do not by themselves decide which model tier should handle each new request.
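A routing decision of this kind could start as a cheap heuristic. The markers, thresholds, and tier aliases below are illustrative assumptions; production routers often use a small classifier or a model confidence score instead.

```python
# Sketch: per-request model-tier routing by a complexity heuristic.
# Tier aliases, markers, and thresholds are hypothetical.

CHEAP_TIER = "fast-low-cost-model"     # placeholder alias
STRONG_TIER = "high-capability-model"  # placeholder alias

def choose_tier(question: str, retrieved_docs: int, confidence: float) -> str:
    complex_markers = ("compare", "why", "step by step", "reconcile")
    is_complex = (
        retrieved_docs > 3
        or len(question.split()) > 40
        or any(m in question.lower() for m in complex_markers)
    )
    if is_complex or confidence < 0.7:
        return STRONG_TIER  # escalate hard or low-confidence requests
    return CHEAP_TIER

assert choose_tier("What are support hours?", 1, 0.95) == CHEAP_TIER
assert choose_tier("Compare clause 4 and clause 9 across policies", 6, 0.9) == STRONG_TIER
```

Logging the chosen tier with quality, latency, and cost metrics lets the team tune thresholds over time, which is the observability the explanation calls for.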
Topic: Implementation and Integration
A company is designing an agentic GenAI assistant with separate specialists for legal policy, support troubleshooting, and sales summarization. The developer wants an AWS-native capability that can route a user request to the appropriate specialist agent and synthesize the specialists’ results without writing a custom routing service. Which capability best fits this role?
Options:
A. Amazon Bedrock Model Evaluation
B. Amazon Bedrock Agents multi-agent collaboration
C. Amazon Bedrock Knowledge Bases
D. Amazon Bedrock Guardrails
Best answer: B
Explanation: Amazon Bedrock Agents multi-agent collaboration is designed for coordinating specialized agents in an agentic workflow. It lets a supervisor agent delegate work to collaborators and combine their outputs, which matches the routing and aggregation requirement.
The core concept is model and agent coordination. In Amazon Bedrock Agents multi-agent collaboration, a supervisor agent can break down a user request, select the relevant collaborator agents, pass tasks to them, and synthesize the final response. This supports production patterns where different agents use different instructions, tools, knowledge sources, or foundation models for specialized work. It avoids requiring the application team to build all routing, delegation, and aggregation logic as a separate custom service. The key distinction is that this is orchestration, not content filtering, retrieval alone, or offline evaluation.
Topic: Operational Efficiency and Optimization for GenAI Applications
A team runs a production RAG chat assistant by using Amazon Bedrock Knowledge Bases. Daily Bedrock spend increased after a deployment, but user traffic and the selected foundation model did not change.
Exhibit: 24-hour metric comparison
User sessions: 18,400 (baseline 18,200)
Bedrock invocations/session: 1.1 (baseline 1.1)
Avg input tokens/invocation: 16,900 (baseline 2,400)
Avg output tokens/invocation: 310 (baseline 325)
KB retrieval topK: 30 chunks (baseline 5)
Model ID: unchanged
Which next step is most appropriate?
Options:
A. Disable prompt and response logging.
B. Increase Bedrock client retry attempts.
C. Prune retrieved context before prompt assembly.
D. Move to a larger context-window model.
Best answer: C
Explanation: The cost spike is best explained by excessive prompt context. The decisive exhibit details are the jump in average input tokens and the increase in Knowledge Bases topK from 5 to 30 chunks, while sessions, invocations, output tokens, and model ID remain stable.
Token spend for Bedrock applications is strongly affected by the amount of input context sent to the model. In this case, the deployment expanded retrieved context, causing each invocation to include far more tokens even though demand and model selection did not change. The next step is to reduce the retrieved context size, such as by lowering topK, applying metadata filters, using reranking, or trimming chunks before prompt assembly.
A larger context window would allow more text but would not address the cost driver. Retry tuning is not indicated because invocations per session stayed flat.
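Context pruning before prompt assembly could be as simple as this sketch. The 4-characters-per-token estimate and the budget are illustrative; a real implementation would use the model's tokenizer and tuned reranking scores.

```python
# Sketch: trim retrieved chunks to a token budget before prompt assembly.
# The chars-per-token heuristic and the budget value are illustrative.

def prune_context(chunks: list, token_budget: int = 3000) -> list:
    """Keep the highest-scoring chunks that fit the budget, in score order."""
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        est_tokens = len(chunk["text"]) // 4  # rough heuristic, not a tokenizer
        if used + est_tokens > token_budget:
            break
        kept.append(chunk)
        used += est_tokens
    return kept

chunks = [{"text": "x" * 8000, "score": 0.9},
          {"text": "y" * 4000, "score": 0.8},
          {"text": "z" * 8000, "score": 0.4}]
pruned = prune_context(chunks)
assert [c["score"] for c in pruned] == [0.9, 0.8]  # lowest-value chunk is dropped
```

Lowering the Knowledge Bases topK back toward the baseline achieves the same effect earlier in the pipeline and is usually the first knob to turn.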
Topic: AI Safety, Security, and Governance
A financial services team uses Amazon Bedrock Prompt Management for a customer-assistance prompt. Before promoting a new prompt version, compliance requires repeatable evidence that refusal rates and answer-quality scores do not worsen for protected-class proxy cohorts. The model, retrieval corpus, and test set must stay fixed so only the prompt version changes. Which evaluation pattern best fits this requirement?
Options:
A. Controlled A/B fairness comparison
B. Dynamic model routing by latency
C. Grounding and hallucination detection
D. Semantic caching for repeated prompts
Best answer: A
Explanation: The requirement is to compare two prompt versions while holding other variables constant and measuring cohort-level fairness outcomes. A controlled A/B fairness comparison provides repeatable, auditable evidence for whether the new prompt changes refusal or quality gaps.
Controlled A/B evaluation is the best pattern when a team must compare a current prompt and a candidate prompt with fairness metrics. Bedrock Prompt Management can preserve prompt versions, while an evaluation workflow can run the same test set, model, retrieval inputs, and scoring rubric against each version. Cohort-level metrics such as refusal rate, helpfulness score, or policy-compliant answer rate can then be compared without confounding changes from model selection or data updates. An LLM-as-judge can be part of the scoring approach, but the key principle is the controlled comparison across prompt variants. The closest distractors focus on other concerns, such as factual grounding, latency, or cost efficiency, rather than fairness deltas between versions.
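The cohort-level comparison might be sketched as a refusal-rate delta check. Cohort names, the worsening threshold, and the result shape are illustrative assumptions.

```python
# Sketch: cohort-level refusal-rate comparison between two prompt versions.
# Cohort labels and the 2-point threshold are illustrative.

def refusal_rate(results: list) -> float:
    return sum(r["refused"] for r in results) / len(results)

def fairness_delta_ok(current: dict, candidate: dict, max_worsening: float = 0.02) -> bool:
    """Fail the gate if any cohort's refusal rate worsens beyond the threshold."""
    return all(
        refusal_rate(candidate[cohort]) - refusal_rate(current[cohort]) <= max_worsening
        for cohort in current
    )

current = {"cohort_a": [{"refused": 0}] * 95 + [{"refused": 1}] * 5,
           "cohort_b": [{"refused": 0}] * 94 + [{"refused": 1}] * 6}
candidate = {"cohort_a": [{"refused": 0}] * 96 + [{"refused": 1}] * 4,
             "cohort_b": [{"refused": 0}] * 90 + [{"refused": 1}] * 10}

assert not fairness_delta_ok(current, candidate)  # cohort_b worsened by 4 points
```

Because the model, corpus, and test set are pinned, any delta the check surfaces is attributable to the prompt version alone.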
Topic: Foundation Model Integration, Data Management, and Compliance
A financial services team is building a RAG API on AWS. For each user question, the application must rewrite the query, retrieve candidates from an Amazon Bedrock Knowledge Base and an OpenSearch Service index in parallel, rerank the candidates, assemble a cited context window, and then invoke an FM. Operations needs per-stage retries and CloudWatch visibility without embedding the whole workflow in one function. Which pattern best maps to this requirement?
Options:
A. Semantic response caching
B. Bedrock Guardrails enforcement
C. Static model routing
D. Step Functions workflow orchestration
Best answer: D
Explanation: The requirement is about coordinating multiple deterministic retrieval-augmentation steps before FM invocation. AWS Step Functions is the best pattern because it provides explicit workflow states, branching or parallel execution, retries, error handling, and operational visibility for each stage.
For a production RAG pipeline, Step Functions can model each augmentation stage as a state: query transformation, parallel retrieval, reranking, context assembly, and FM invocation. This keeps the orchestration outside a single monolithic function and makes each stage observable and independently retryable. It also supports service integrations and Lambda tasks, so teams can mix Bedrock, OpenSearch Service, and custom rerankers while preserving a clear execution history.
The key takeaway is that the requirement is workflow orchestration, not just response optimization, safety filtering, or model selection.
Topic: Foundation Model Integration, Data Management, and Compliance
An insurance company is building a synchronous claims-policy assistant on Amazon Bedrock. Policies are stored in Amazon S3 and indexed by Bedrock Knowledge Bases with OpenSearch Serverless in us-east-1. Requirements: retrieval must enforce policyholder region and adjuster role from IAM Identity Center, malformed or overbroad filters must be rejected before vector search, the FM may receive only a fixed JSON context with chunk IDs and citations, and audit logs must show who retrieved which chunks. Which architecture best meets these requirements?
Options:
A. Call RetrieveAndGenerate directly from the browser.
B. Use a server-side retrieval gateway before FM invocation.
C. Use S3 Object Lambda to redact after retrieval.
D. Fine-tune separate models for each role and region.
Best answer: B
Explanation: The best design makes retrieval a trusted server-side control point before any context reaches the FM. It derives authorization filters from trusted identity claims, rejects unsafe retrieval requests, validates the context format, and records audit evidence for returned chunks.
The core concept is a trusted retrieval gateway rather than prompt-based authorization. A Lambda or container service behind API Gateway can authenticate the requester, derive region and role filters from IAM Identity Center claims, and prevent users from submitting arbitrary metadata filters or oversized queries. Calling Bedrock Knowledge Bases Retrieve lets the application inspect returned chunks, enforce metadata constraints, and build a fixed JSON context containing fields such as chunk ID, source, citation, and excerpt before invoking the FM. Structured CloudWatch logs can capture the principal, filters, and returned chunk IDs. Post-generation formatting or guardrails can help, but they do not prevent unauthorized context from being supplied to the model.
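The filter-derivation step can be sketched as follows. The claim names, role list, and size limit are assumptions; the filter dict mirrors the style of Knowledge Bases metadata filters but is not a verbatim API payload.

```python
# Illustrative server-side retrieval gateway: derive metadata filters from
# trusted identity claims, never from client input. Claim names, the role
# allowlist, and the query limit are assumptions.

ALLOWED_ROLES = {"adjuster", "senior_adjuster"}
MAX_QUERY_CHARS = 2000

def build_retrieval_filter(claims: dict, query: str) -> dict:
    """Reject unsafe requests, then build filters from trusted claims only."""
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError("query too large")
    role = claims.get("role")
    region = claims.get("policyholder_region")
    if role not in ALLOWED_ROLES or not region:
        raise PermissionError("missing or unauthorized identity claims")
    # The caller never supplies filters; they are derived server-side.
    return {"andAll": [
        {"equals": {"key": "region", "value": region}},
        {"equals": {"key": "allowed_role", "value": role}},
    ]}

f = build_retrieval_filter(
    {"role": "adjuster", "policyholder_region": "us-east-1"},
    "What is the hail damage deductible?")
```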
Topic: Implementation and Integration
A team deployed a synchronous customer-support chat API on AWS Lambda. The function runs in private subnets with no NAT gateway and uses the AWS SDK to call Amazon Bedrock. IAM permissions for bedrock:InvokeModel are already attached.
Exhibit: CloudWatch log excerpt
Client: BedrockRuntimeClient us-east-1
Operation: InvokeModel
Endpoint: bedrock-runtime.us-east-1.amazonaws.com
VPC endpoints: com.amazonaws.us-east-1.bedrock
Error: ETIMEDOUT connect to bedrock-runtime
Retries: exhausted
Which next step best addresses the failure?
Options:
A. Increase the model maxTokens parameter
B. Switch the Lambda function to asynchronous invocation
C. Create a Bedrock Runtime interface VPC endpoint
D. Add s3:GetObject permission to the Lambda role
Best answer: C
Explanation: The failure is a network path issue, not a prompt or IAM problem. The exhibit shows the Lambda function calls the Bedrock Runtime endpoint, but the VPC endpoint configured is only for the Bedrock control plane.
For synchronous FM calls from a Lambda function in private subnets without NAT, the function must have private network access to the specific Amazon Bedrock runtime endpoint used by the SDK. InvokeModel and similar runtime operations use the Bedrock Runtime service endpoint, so an interface VPC endpoint for Bedrock Runtime must be present and reachable by the Lambda security group and subnets. The existing com.amazonaws.us-east-1.bedrock endpoint does not provide connectivity to bedrock-runtime.us-east-1.amazonaws.com. The key takeaway is to match the SDK client endpoint to the private connectivity path required by the compute environment.
The decisive exhibit details are the ETIMEDOUT connect error, which occurs before the model processes a request, and the SDK sending InvokeModel to the Bedrock Runtime endpoint while only the control-plane endpoint is configured.
Topic: Foundation Model Integration, Data Management, and Compliance
A fintech team uses Amazon Bedrock Prompt Management for a customer-support prompt that must cite policy documents from a Bedrock Knowledge Base. After three manual wording changes, hallucinated fee-waiver answers remain near 15%. Retrieval logs show the correct policy document is in the top 3 results for failing cases. The only feedback stored is thumbs-up/thumbs-down plus the transcript, so developers cannot tell whether failures are due to missing constraints, bad citations, or overconfident wording. Which action fixes the root cause with the smallest safe change?
Options:
A. Increase the Knowledge Base retrieval result count
B. Replace the model with a larger-context model
C. Add rubric labels and replay failures against prompt versions
D. Fine-tune the model on low-rated transcripts
Best answer: C
Explanation: The symptom is persistent hallucination even when relevant source documents are retrieved. The root cause is an unstructured prompt feedback loop that does not identify which response-quality criterion failed. Adding rubric-based feedback and replaying failures against prompt versions enables iterative, measurable prompt refinement.
The troubleshooting path is Symptom -> Root cause -> Fix. The symptom is hallucinated fee-waiver answers after manual prompt edits. The root cause is not retrieval relevance because the correct policy appears in the top results; it is prompt operations feedback that is too coarse to guide refinement. A structured loop should capture fields such as expected answer, grounding score, citation correctness, policy constraint missed, and failure category, then replay those labeled cases against each prompt version before promotion. This creates a regression set for prompt quality and makes prompt changes measurable rather than guess-based. Changing retrieval volume, model size, or training data does not address the missing structured feedback.
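A minimal replay harness for labeled failure cases might look like this. The invoke() function is a stub standing in for a Bedrock call, and the rubric fields, case text, and version behavior are assumptions for illustration.

```python
# Illustrative replay harness: labeled failure cases are re-run against each
# prompt version and scored per rubric field. invoke() is a stub; cases and
# rubric fields are assumptions.

CASES = [
    {"question": "Can my monthly fee be waived?",
     "expected": "Fees are waived only for accounts older than 2 years.",
     "rubric": {"citation_required": True}},
]

def invoke(prompt_version: str, question: str) -> dict:
    # Stub: a real harness would call the model with this prompt version.
    if prompt_version == "v2":
        return {"answer": "Fees are waived only for accounts older than 2 years.",
                "citations": ["policy-fees-3.1"]}
    return {"answer": "Sure, we can waive any fee!", "citations": []}

def replay(prompt_version: str):
    results = []
    for case in CASES:
        out = invoke(prompt_version, case["question"])
        results.append({
            "grounded": case["expected"] in out["answer"],
            "cited": bool(out["citations"])
                     or not case["rubric"]["citation_required"],
        })
    return results

v1, v2 = replay("v1"), replay("v2")
```

Labeled pass/fail per rubric field is what turns thumbs-up/thumbs-down noise into a regression set that can gate each prompt version.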
Topic: Operational Efficiency and Optimization for Genai Applications
A retail company runs a synchronous customer-support RAG API: API Gateway → Lambda → OpenSearch vector search → Amazon Bedrock Converse. After a prompt deployment, p95 latency rose from 3 seconds to 12 seconds and Bedrock cost per request doubled.
Traces for slow requests show:
topK=5
maxTokens=4096
Support policy requires answers under 800 output tokens with citations. Which change fixes the root cause with the smallest safe change?
Options:
A. Set a response-token budget near 800.
B. Reduce the OpenSearch topK value.
C. Increase the Lambda memory setting.
D. Use Bedrock Provisioned Throughput.
Best answer: A
Explanation: The trace places almost all added latency in the Bedrock Converse call, not in Lambda execution or vector retrieval. The high median output tokens and large maxTokens value show that long generation is driving both latency and cost. Enforcing the required output-token budget is the smallest safe change.
Symptom: p95 latency and per-request cost increased after a prompt deployment, while dependent service timings remain low. Root cause: profiling localizes the delay and resource use to the Bedrock Converse call, and the token metrics show responses are generating thousands of output tokens. Fix: set the Converse output-token budget near the 800-token policy limit, and keep the model and retrieval path unchanged. The key takeaway is to optimize the parameter that the profile identifies as the bottleneck instead of changing unrelated capacity.
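The fix can be sketched as a request builder that clamps generation length. The field names mirror the Bedrock Converse API's inferenceConfig shape, but the budget value, model ID, and prompt text are assumptions.

```python
# Illustrative sketch: enforce the output-token budget when building a
# Converse request. Budget, model ID, and prompt are assumptions.

OUTPUT_TOKEN_BUDGET = 800  # support policy: answers under 800 output tokens

def build_converse_request(model_id: str, prompt: str, max_tokens: int = 4096):
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # Clamp generation length to the policy budget regardless of caller input.
        "inferenceConfig": {"maxTokens": min(max_tokens, OUTPUT_TOKEN_BUDGET)},
    }

req = build_converse_request("example-model-id", "Summarize my order status.")
```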
Topic: AI Safety, Security, and Governance
A GenAI platform team needs a reusable AWS-native mechanism to start remediation when monitoring detects unsafe model outputs or policy-control failures. Sources include CloudWatch alarms for guardrail intervention metrics and CloudTrail events for denied Amazon Bedrock API calls. Which AWS service is designed to route these events to targets such as Lambda or Step Functions workflows?
Options:
A. CloudWatch Logs Insights
B. Amazon EventBridge
C. AWS CloudTrail
D. AWS X-Ray
Best answer: B
Explanation: Amazon EventBridge is the event-routing service for reacting to monitoring and governance signals across AWS services. It can match CloudWatch alarm state changes or CloudTrail-delivered API events and invoke automated remediation workflows.
For continuous policy enforcement, the key pattern is event-driven remediation. CloudWatch and CloudTrail provide important signals, but EventBridge is the service that matches those events with rules and routes them to targets such as Lambda, Step Functions, SNS, or incident workflows. In a GenAI application, this lets teams respond when guardrail intervention rates spike, denied Bedrock API calls occur, or another policy-control signal appears.
CloudTrail records API activity, CloudWatch stores metrics and alarms, and X-Ray traces requests. EventBridge connects those signals to automated action.
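An EventBridge rule pattern for the alarm signal can be sketched as data. The source and detail-type values are the standard ones EventBridge uses for CloudWatch alarm state changes; the alarm name is an assumption, and the matcher below is a tiny stand-in for EventBridge's real matching engine.

```python
# Illustrative EventBridge rule pattern for guardrail-alarm remediation.
# The alarm name is an assumed example; the matcher is a minimal stand-in.

rule_pattern = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {
        "alarmName": ["guardrail-intervention-rate"],  # assumed alarm name
        "state": {"value": ["ALARM"]},
    },
}

def matches(pattern: dict, event: dict) -> bool:
    """Tiny subset of EventBridge matching: top-level list membership only."""
    return (event.get("source") in pattern["source"]
            and event.get("detail-type") in pattern["detail-type"])

event = {"source": "aws.cloudwatch",
         "detail-type": "CloudWatch Alarm State Change",
         "detail": {"alarmName": "guardrail-intervention-rate",
                    "state": {"value": "ALARM"}}}
routed = matches(rule_pattern, event)  # would invoke Lambda/Step Functions
```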
Topic: Testing, Validation, and Troubleshooting
A team must choose an Amazon Bedrock foundation model for a production summarization feature. The team needs cost-performance evidence, but the decision must not depend on current published prices, quotas, or regional limit trivia because those values can change. Which evaluation pattern best fits this requirement?
Options:
A. Production-only A/B test using click-through rate
B. Workload-normalized benchmark with quality, latency, and token-usage metrics
C. Selection by largest context window and throughput quota
D. Selection by lowest current price per output token
Best answer: B
Explanation: Use a workload-normalized evaluation to compare model choices on stable measurements from the same representative tasks. Quality, latency, success rate, and token or request usage can be recorded, then combined with current business cost assumptions outside the test.
Cost-performance comparison should separate durable evaluation evidence from volatile commercial or quota values. For model selection, run the same representative prompts or documents through each candidate configuration, score output quality with consistent evaluators, and record operational metrics such as latency, success rate, input tokens, output tokens, and retries. These measurements support a cost-performance decision without embedding changing price sheets or service limits into the evaluation logic. Current unit prices or reserved capacity assumptions can be applied later as parameters in a business analysis.
The key principle is to benchmark the workload and normalize results per task or successful output, rather than choosing from static price or quota facts.
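The separation between durable measurements and volatile prices can be sketched as follows. The model names, run records, and unit prices are made-up examples; only the structure matters.

```python
# Illustrative workload-normalized comparison: benchmark records hold durable
# measurements; prices are applied afterwards as parameters. All values here
# are made-up examples.

runs = {
    "model-a": [{"ok": True, "latency_ms": 900, "out_tokens": 300},
                {"ok": True, "latency_ms": 1100, "out_tokens": 340}],
    "model-b": [{"ok": True, "latency_ms": 600, "out_tokens": 280},
                {"ok": False, "latency_ms": 650, "out_tokens": 0}],
}

def summarize(records):
    ok = [r for r in records if r["ok"]]
    return {
        "success_rate": len(ok) / len(records),
        "avg_latency_ms": sum(r["latency_ms"] for r in ok) / len(ok),
        "tokens_per_success": sum(r["out_tokens"] for r in ok) / len(ok),
    }

summary = {m: summarize(rs) for m, rs in runs.items()}

# Volatile commercial inputs stay outside the benchmark and are applied later.
price_per_1k_out = {"model-a": 0.8, "model-b": 0.5}  # assumed prices
cost_per_success = {
    m: summary[m]["tokens_per_success"] / 1000 * price_per_1k_out[m]
    for m in summary
}
```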
Topic: Implementation and Integration
Which statement best defines a centralized GenAI gateway architecture in an AWS enterprise application platform?
Options:
A. A vector database pattern that stores embeddings for semantic retrieval before generation.
B. A workflow service that chains prompts, tools, and model invocations.
C. A shared API facade that brokers FM requests, applies identity-aware policies, normalizes prompts and responses, and emits centralized telemetry.
D. A registry that versions trained models before deployment to endpoints.
Best answer: C
Explanation: A centralized GenAI gateway is an enterprise integration layer for foundation model consumption. It gives application teams a consistent interface while centralizing controls such as authentication, routing, prompt policy, guardrails, logging, tracing, and cost governance.
The core concept is gateway-based abstraction for GenAI applications. In AWS, this commonly sits in front of Amazon Bedrock, SageMaker endpoints, or approved external providers and exposes a standard API to product teams. The gateway can enforce IAM or tenant policy, select approved models, attach prompt templates or guardrails, standardize request and response formats, capture CloudWatch and X-Ray telemetry, and apply throttling or budget controls. This pattern reduces duplicated integration code and makes governance evidence consistent across environments. It is broader than retrieval, prompt orchestration, or model lifecycle management alone.
Topic: Implementation and Integration
An insurer runs a Bedrock-powered claims assistant with a planner agent and two specialist agents. In production, traces show that the planner repeatedly invokes the same policy-lookup tool after transient timeouts, specialists lose facts that another agent already retrieved, and some sessions run until the token limit. Requirements: keep claim data in one AWS Region, return normal chat turns in under 6 seconds, produce an auditable tool-call history, and hand off to a human after bounded retries. Which AWS-native architecture is the best fit?
Options:
A. Fine-tune a SageMaker-hosted model on prior traces so it learns when to stop delegating.
B. Use stateless Lambda wrappers, Lambda automatic retries, and a larger model token budget.
C. Use Step Functions Express with DynamoDB state, idempotency, loop limits, handoff branches, and CloudWatch/X-Ray traces.
D. Use Bedrock Knowledge Bases as agent memory and increase retrieval top-k for every specialist agent.
Best answer: C
Explanation: The main failure is not missing knowledge; it is unmanaged agent state and control flow. A bounded orchestration layer with durable shared state, idempotent tool calls, explicit handoff paths, and observability directly addresses repeated tool calls, lost facts, and runaway reasoning.
Agentic troubleshooting often requires making workflow state explicit instead of relying on the model to remember everything in the prompt. Step Functions Express can coordinate the planner, specialists, Bedrock calls, and Lambda tools for low-latency turns. DynamoDB stores session facts, tool results, and idempotency keys so retries do not repeat the same external action. State-machine timeouts, retry policies, and maximum loop counters stop unmanaged reasoning and branch to a human-handoff queue or case system. CloudWatch Logs and X-Ray provide the tool-call trace needed for audit and troubleshooting, while deploying the workflow, data store, and endpoints in the target Region supports data locality. The closest distractors focus on retrieval, training, or token budget, but none provide deterministic state and loop control.
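Two of the state controls above, idempotent tool calls and a bounded loop with handoff, can be sketched in a few lines. The in-memory dict stands in for the DynamoDB idempotency table, and the key format and loop limit are assumptions.

```python
# Illustrative state controls: an idempotency check so retried tool calls do
# not repeat an external action, plus a bounded loop counter that triggers
# human handoff. The dict stands in for DynamoDB; names/limits are assumptions.

MAX_PLANNER_LOOPS = 5
tool_cache = {}  # stand-in for a DynamoDB idempotency table

def call_tool(idempotency_key: str, tool):
    """Return the cached result for a retried call instead of re-invoking."""
    if idempotency_key in tool_cache:
        return tool_cache[idempotency_key]
    result = tool()
    tool_cache[idempotency_key] = result
    return result

calls = {"count": 0}
def policy_lookup():
    calls["count"] += 1
    return {"policy": "PX-100"}

# A transient timeout causes a retry with the same key: one real invocation.
r1 = call_tool("claim-42:policy_lookup", policy_lookup)
r2 = call_tool("claim-42:policy_lookup", policy_lookup)

def next_step(loop_count: int) -> str:
    # Bounded loops: branch to human handoff instead of running to the token limit.
    return "human_handoff" if loop_count >= MAX_PLANNER_LOOPS else "continue"
```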
Topic: AI Safety, Security, and Governance
A developer is troubleshooting a customer support assistant that uses Amazon Bedrock Guardrails. The policy requires the application to return only the managed safe response when output moderation intervenes.
Exhibit: Invocation log excerpt
requestId: 8f3a
promptVersion: support-prod-17
guardrail.inputAction: NONE
modelOutput: "Your refund is approved for $1,200."
guardrail.outputAction: INTERVENED
guardrail.safeResponse: "I can’t approve refunds. Please contact support."
api.responseStatus: 200
api.responseBodySource: modelOutput
What is the best next step to close the safety-control gap?
Options:
A. Increase the model temperature for more varied responses.
B. Return guardrail.safeResponse when output action is INTERVENED.
C. Grant the prompt service role permission to edit production prompts.
D. Disable output moderation and rely on input checks.
Best answer: B
Explanation: The decisive detail is that guardrail.outputAction is INTERVENED while api.responseBodySource is modelOutput. The application is not honoring the guardrail result, so it must fail closed and return the managed safe response when output moderation intervenes.
This is an output safety-control integration gap, not a model-quality tuning problem. Bedrock Guardrails can detect and intervene on unsafe or policy-violating output, but the application must enforce the moderation decision in its response path. In the exhibit, the guardrail produced a safe response, yet the API returned the raw model output that approved a refund. The next step is to update the response handling logic so an intervened output cannot be sent to the user.
The key takeaway is to treat moderation results as enforcement signals, not just observability data.
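The fail-closed handler can be sketched directly from the exhibit fields. The field names follow the log excerpt; the last-resort fallback string is an assumption.

```python
# Illustrative fail-closed response handler: when the guardrail output action
# is INTERVENED, return the managed safe response, never the raw model output.
# Field names follow the exhibit; the fallback text is an assumption.

def select_response_body(invocation: dict) -> str:
    if invocation.get("guardrail.outputAction") == "INTERVENED":
        # Fail closed: prefer the managed safe response; never leak modelOutput.
        return invocation.get("guardrail.safeResponse",
                              "This response was blocked by policy.")
    return invocation["modelOutput"]

log = {
    "guardrail.outputAction": "INTERVENED",
    "modelOutput": "Your refund is approved for $1,200.",
    "guardrail.safeResponse": "I can't approve refunds. Please contact support.",
}
body = select_response_body(log)
```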
Topic: AI Safety, Security, and Governance
A bank is implementing a Bedrock-based support assistant behind API Gateway and Lambda. It retrieves policy text from Bedrock Knowledge Bases and uses a Step Functions workflow to call Lambda tools for read-only account lookups. The security team requires controls for harmful language, prohibited investment advice, SSN exposure in both requests and responses, unauthorized tool actions, and unsupported answers. Answers must cite retrieved policy sources or state that the answer cannot be determined. Which implementation best satisfies these requirements?
Options:
A. Use IAM and CloudWatch/X-Ray tracing for Lambda tools, then review sampled transcripts after deployment.
B. Use Knowledge Bases metadata filters and citations, and allow the model to choose any Lambda tool in the workflow.
C. Use Prompt Management instructions to refuse prohibited advice, redact SSNs, call safe tools, and cite sources when available.
D. Use ApplyGuardrail before invocation, Bedrock Guardrails on output, Knowledge Bases citations with grounding fallback, and schema-restricted tools with least-privilege IAM.
Best answer: D
Explanation: The requirements call for layered runtime controls, not only prompt instructions or observability. Bedrock Guardrails and ApplyGuardrail handle inappropriate content, denied topics, and sensitive data on inputs and outputs. Knowledge Bases citations and grounding fallback address unreliable claims, while schema validation and least-privilege IAM constrain tool use.
Different safety risks need different controls. Bedrock Guardrails can enforce content filters, denied topics, and sensitive information handling; using ApplyGuardrail before model invocation helps stop or mask SSNs in user input before it reaches the model workflow. Knowledge Bases retrieval should provide citations and a fallback path when grounding is insufficient, which reduces unsupported claims. Tool safety is not solved by the model prompt; Step Functions and Lambda should validate tool parameters, restrict tool schemas, and use IAM permissions that allow only the approved read-only account APIs. The key pattern is layered enforcement: guardrails for content and PII, grounding for claims, and deterministic controls for tool actions.
Topic: Foundation Model Integration, Data Management, and Compliance
A contact center application ingests recorded MP3 calls from Amazon S3 into a RAG pipeline that uses Amazon Bedrock Knowledge Bases with OpenSearch Service. The ingestion Lambda reads each object as UTF-8 text before chunking and embedding. Users report hallucinated answers about call details, and retrieval samples contain ID3 headers and binary-looking text. Which change fixes the root cause with the smallest safe change?
Options:
A. Resample MP3 files with SageMaker Processing.
B. Run Amazon Transcribe, then index transcript text.
C. Increase chunk size and retrieval top-k.
D. Prompt a Bedrock multimodal model with each MP3.
Best answer: B
Explanation: The symptom is poor grounding because audio files are being treated as text. The root cause is missing modality-specific preprocessing before embedding. Amazon Transcribe is the smallest safe fix because it creates text transcripts that can flow through the existing RAG pipeline.
Symptom: hallucinated answers and retrieved chunks that look like ID3 headers indicate the vector index contains nonsemantic audio bytes rather than call content. Root cause: the ingestion workflow is applying text chunking and embeddings directly to MP3 objects instead of first converting speech to text. Fix: add Amazon Transcribe for the S3 audio files, store the transcript with call metadata, and index the transcript text in the existing Bedrock Knowledge Bases workflow.
SageMaker Processing could transform files, but audio resampling does not produce searchable language. A Bedrock multimodal prompt would bypass the existing retrieval design and is not the smallest safe change.
Topic: Testing, Validation, and Troubleshooting
In prompt troubleshooting for an Amazon Bedrock application, which term best describes running a fixed test set against two prompt versions, validating each response against the required JSON schema, comparing failures and quality scores, and refining the candidate prompt before release?
Options:
A. Retrieval reranking
B. Prompt regression testing
C. Semantic caching
D. Agent memory compaction
Best answer: B
Explanation: Prompt regression testing is the best match because it evaluates whether a prompt change has broken expected behavior. In a Bedrock workflow, this often pairs versioned prompts with output-schema validation and iterative refinement before deployment.
Prompt regression testing applies software testing discipline to prompt changes. A developer keeps a stable set of representative inputs, runs them against a baseline prompt and a candidate prompt, validates required output structure such as JSON schema, and compares quality or failure signals. This helps isolate whether the prompt change caused malformed outputs, weaker grounding, tone drift, or other regressions before the prompt is promoted through Amazon Bedrock Prompt Management or a CI/CD workflow.
The key idea is controlled comparison: hold the test inputs and model settings steady, then compare prompt versions and refine only the prompt or related configuration under test.
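A minimal regression harness for this pattern might look like the sketch below. The invoke() stub, required keys, and test inputs are assumptions; a real harness would call the model with each versioned prompt.

```python
# Illustrative prompt regression harness: fixed test set, JSON-shape
# validation, and failure comparison between baseline and candidate prompts.
# invoke() is a stub; required keys and inputs are assumptions.
import json

REQUIRED_KEYS = {"answer", "citations"}
TEST_SET = ["How do I reset my password?", "What is the refund window?"]

def invoke(prompt_version: str, question: str) -> str:
    # Stub model call: the candidate prompt drops the citations field.
    if prompt_version == "candidate":
        return json.dumps({"answer": "..."})
    return json.dumps({"answer": "...", "citations": []})

def failures(prompt_version: str) -> int:
    count = 0
    for q in TEST_SET:
        try:
            out = json.loads(invoke(prompt_version, q))
            if not REQUIRED_KEYS <= out.keys():
                count += 1  # schema violation
        except json.JSONDecodeError:
            count += 1      # malformed output
    return count

baseline_failures = failures("baseline")
candidate_failures = failures("candidate")
regressed = candidate_failures > baseline_failures
```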
Topic: Operational Efficiency and Optimization for Genai Applications
A developer needs to profile a GenAI request path that runs in AWS Lambda, invokes an FM through Amazon Bedrock, and queries Amazon OpenSearch Service. The goal is to see per-request timing across the application and downstream AWS SDK calls to determine where latency is introduced. Which AWS service capability best supports this need?
Options:
A. AWS X-Ray distributed tracing
B. Amazon S3 server access logging
C. AWS CloudTrail event history
D. AWS Cost Explorer usage reports
Best answer: A
Explanation: AWS X-Ray is used for distributed tracing and latency profiling across request paths. For a GenAI workflow that spans Lambda, Bedrock API calls, and retrieval services, traces help isolate which component contributes delay.
The core concept is observability by trace, not audit or cost reporting. X-Ray records request-level traces and can show segments for application execution and subsegments for downstream calls made through instrumented clients. In this scenario, that helps compare Lambda processing time, FM invocation time, and retrieval call time in the same request path.
CloudWatch metrics and logs are also useful for operations, but X-Ray is the better fit when the question asks where latency is introduced across dependent services. CloudTrail answers who called what API and when, not performance breakdown.
Topic: AI Safety, Security, and Governance
A company builds a RAG assistant over regulated S3 datasets cataloged in AWS Glue Data Catalog. Compliance requires each generated answer to be auditable to the source table or document version, metadata tags used, and CloudTrail evidence of catalog changes. Which principle or pattern best satisfies this requirement?
Options:
A. Hallucination detection scoring
B. Governance lineage with source attribution
C. Semantic caching of responses
D. Token optimization by prompt compression
Best answer: B
Explanation: The requirement is to prove where generated content came from and how source metadata changed over time. Governance lineage with source attribution maps outputs to Glue Data Catalog metadata, tags, document versions, and CloudTrail audit evidence.
Governance lineage is the core pattern for tracking the data sources behind GenAI outputs. In this scenario, the assistant must retain source attribution for retrieved content, use Glue Data Catalog metadata and tags to identify governed datasets, and preserve CloudTrail evidence for catalog or tag changes. This supports compliance review because auditors can connect a response to the exact source context and metadata state used at generation time. Quality controls such as hallucination scoring may still help, but they do not provide lineage or audit evidence.
Topic: Foundation Model Integration, Data Management, and Compliance
An insurance company runs a claims assistant on Amazon Bedrock. A Lambda function builds one Converse request from uploaded claim packets that include PDF forms with tables, JPEG damage photos, MP3 adjuster notes, and CSV repair estimates. After deployment, some requests fail with unsupported content or context-size errors, and successful responses hallucinate line-item prices. A payload sample shows audio base64 inserted as text, CSV rows appended without schema checks, and dropped image references. Which change fixes the root cause with the smallest safe change?
Options:
A. Put S3 presigned URLs directly in the prompt.
B. Add modality-specific preprocessing and payload validation.
C. Increase output token limits for the Bedrock invocation.
D. Fine-tune a model on historical claim packets.
Best answer: B
Explanation: The symptom is both malformed FM payloads and hallucinated tabular values. The root cause is that raw multimodal inputs are being concatenated without modality-specific extraction, validation, or size control. The smallest safe fix is to preprocess each modality into supported, bounded inputs before invoking the FM.
Symptom: Bedrock requests fail with unsupported content and context-size errors, while successful responses hallucinate repair prices. Root cause: the workflow treats audio, CSV, tables, and images as generic prompt text instead of preparing them for FM consumption. Fix: add a preprocessing step, such as Amazon Transcribe for audio, Amazon Textract or Bedrock Data Automation for document tables/images, CSV schema validation, and bounded text/image payload construction for the Converse API. This preserves grounding and keeps the payload within supported input formats instead of asking the model to infer structure from raw or truncated content.
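The preprocessing step can be sketched as a dispatch by modality. The handlers are stubs that only name the intended step (the services noted in comments reflect the fix above), and the extensions and limit are assumptions.

```python
# Illustrative preprocessing dispatch: route each claim-packet file to a
# modality-specific step before any Converse payload is built. Handlers are
# stubs; extensions and limits are assumptions.

MAX_CONTEXT_CHARS = 20000  # assumed bound applied during payload assembly

def preprocess(filename: str) -> dict:
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext == "mp3":
        return {"kind": "text", "step": "transcribe"}      # Amazon Transcribe
    if ext == "pdf":
        return {"kind": "text", "step": "extract_tables"}  # e.g. Amazon Textract
    if ext == "csv":
        return {"kind": "text", "step": "validate_schema"} # reject bad rows
    if ext in {"jpg", "jpeg", "png"}:
        return {"kind": "image", "step": "attach_image"}   # supported image block
    raise ValueError(f"unsupported modality: {ext}")

plan = [preprocess(f) for f in
        ["form.pdf", "damage.jpeg", "notes.mp3", "estimate.csv"]]
```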
Topic: Foundation Model Integration, Data Management, and Compliance
A company uses Amazon Bedrock Prompt Management for a customer-support prompt. A CI/CD pipeline starts an AWS Step Functions QA workflow after each prompt update. The workflow uses AWS Lambda to invoke the FM with 60 canned tickets and publishes only HTTPStatus and InvocationLatency to Amazon CloudWatch. The gate passed, but the new prompt hallucinated refund-policy answers that differ from the approved golden responses. Which change fixes the root cause with the smallest safe change?
Options:
A. Add Lambda output assertions and fail the Step Functions gate
B. Extend the canary period before routing production traffic
C. Lower the production model temperature for all requests
D. Add CloudWatch alarms for Bedrock throttling and 5xx errors
Best answer: A
Explanation: The symptom is a prompt regression that produced hallucinated policy answers even though the workflow passed. The root cause is that the QA workflow checks transport health, not output correctness. Adding Lambda assertions for golden and edge-case expected outputs lets Step Functions block promotion and CloudWatch track regression metrics.
Symptom: the prompt QA gate passed, but production responses regressed against known approved answers. Root cause: the Lambda tests only verified that Bedrock calls succeeded and met latency expectations; they did not compare generated outputs to golden responses, edge-case criteria, or structured quality checks. Fix: update the Lambda test runner to evaluate each test case against expected content or rubric-based assertions, publish pass/fail and regression metrics to CloudWatch, and return failures to Step Functions so the promotion gate stops the prompt release. This is the smallest safe change because it strengthens the existing QA workflow without changing the model, routing, or production behavior.
Topic: Foundation Model Integration, Data Management, and Compliance
An insurer moved a RAG claims assistant from POC to production using Amazon Bedrock Knowledge Bases and versioned prompts. A user reports a hallucinated policy answer. Auditors can see Bedrock API events in CloudTrail, but the privacy policy forbids retaining raw customer prompts, and the app stores only the final answer. They cannot map the response to the prompt version, retrieval source metadata, or validation run approved for that configuration. Which change fixes the root cause with the smallest safe change?
Options:
A. Run Bedrock Model Evaluations weekly and archive the reports.
B. Enable full model payload logging for all prompts and responses.
C. Increase the knowledge base retrieval topK and rerank results.
D. Emit a redacted per-invocation evidence record with prompt version, source metadata, validation-run ID, and Bedrock request ID.
Best answer: D
Explanation: The production symptom is missing audit provenance, not only a hallucinated answer. CloudTrail proves an API call occurred, but it does not bind the response to prompt versions, retrieved sources, or the approved validation run. A redacted per-invocation evidence record supplies that mapping while respecting the data-retention boundary.
Symptom: auditors cannot reproduce or justify a specific hallucinated answer even though CloudTrail shows Bedrock activity. Root cause: the production path records only the final answer, so Domain 1 design artifacts such as prompt versions, retrieval source metadata, model identifiers, and validation-run IDs are not bound to the invocation. Fix: add a structured, KMS-encrypted audit event from the GenAI gateway or application tier for each request that stores correlation IDs and metadata, not raw customer text, in controlled logs or S3. This is smaller and safer than changing retrieval behavior or retaining full payloads.
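The evidence record can be sketched as a small builder. The field names and the salted-hash choice are assumptions; the point is that only correlation metadata and a digest are retained, never the raw customer prompt.

```python
# Illustrative redacted evidence record: bind each invocation to its prompt
# version, sources, and validation run without retaining raw customer text.
# Field names and the hashing choice are assumptions.
import hashlib

def evidence_record(request_id, prompt_version, source_chunks,
                    validation_run_id, raw_prompt, salt="audit-salt"):
    return {
        "bedrockRequestId": request_id,
        "promptVersion": prompt_version,
        "sourceChunkIds": [c["id"] for c in source_chunks],
        "validationRunId": validation_run_id,
        # Correlates repeat prompts without storing customer text.
        "promptDigest": hashlib.sha256((salt + raw_prompt).encode()).hexdigest(),
    }

rec = evidence_record("req-9c1", "claims-prod-12",
                      [{"id": "chunk-77"}, {"id": "chunk-81"}],
                      "val-2024-06-01", "Is hail damage covered on policy 123?")
```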
Topic: Implementation and Integration
A team is implementing an internal /summarize endpoint. Amazon API Gateway invokes an AWS Lambda function running in private subnets. The function must call an Amazon Bedrock foundation model and return the generated summary in the same HTTP response. The VPC has no NAT gateway, and security requires no public internet path and least-privilege model access. Which implementation meets these requirements?
Options:
A. Call SageMaker Runtime InvokeEndpoint with the Bedrock model ID.
B. Use Bedrock CreateModelInvocationJob and read the S3 output.
C. Send requests to SQS and invoke Bedrock from Step Functions.
D. Use Bedrock Runtime Converse from Lambda with a bedrock-runtime VPC endpoint.
Best answer: D
Explanation: The endpoint needs a synchronous request-response call from Lambda to Amazon Bedrock without public internet access. Using the AWS SDK Bedrock Runtime API, such as Converse, through an interface VPC endpoint satisfies the latency, networking, and compute constraints.
For a synchronous FM interaction, the Lambda function should call the Bedrock Runtime data-plane API with the AWS SDK and wait for the model response before returning the API Gateway response. Because the Lambda function is in private subnets with no NAT gateway, traffic to Bedrock must use an interface VPC endpoint for Bedrock Runtime. The Lambda execution role should allow only the required Bedrock action, such as bedrock:InvokeModel, for the approved model ARN where supported.
Asynchronous orchestration and batch invocation patterns are useful for long-running jobs, but they do not meet the same-HTTP-response requirement.
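A minimal sketch of the request shape the Lambda function would pass to the Bedrock Runtime Converse API (via the boto3 `bedrock-runtime` client resolved through the interface VPC endpoint). The model ID is a placeholder; only the payload assembly is shown here, not the network call.

```python
def build_converse_request(model_id, user_text, max_tokens=512):
    """Assemble a Converse-style request body.

    In the Lambda handler these arguments would go to
    client.converse(...) on a boto3 "bedrock-runtime" client; this
    function only illustrates the message and inference-config shape.
    """
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

req = build_converse_request("example-model-id", "Summarize this ticket.")
```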
CreateModelInvocationJob is asynchronous and writes results to storage later.
Topic: AI Safety, Security, and Governance
A healthcare company deploys an Amazon Bedrock-powered triage summarizer through CodePipeline. Policy requires every production model and prompt release to reference an approved model card and approval ticket before deployment. After a hotfix, an audit finding shows that a new prompt version reached production without an approved model-card record. Guardrail invocation logs and CloudTrail deployment logs are present. Which change fixes the root cause with the smallest safe change?
Options:
A. Review model-card artifacts after production deployment.
B. Log all prompts and responses to CloudWatch Logs.
C. Add a Lambda gate that verifies an approved model card.
D. Tighten Bedrock Guardrails blocked-topic filters.
Best answer: C
Explanation: The symptom is missing approval evidence for a production prompt release, not unsafe generated content. The root cause is that the deployment path does not enforce the required governance artifact before promotion. A Lambda compliance gate in the pipeline is the smallest safe fix because it blocks noncompliant releases and creates audit evidence.
Symptom: a prompt version reached production without the required approved model-card record. Root cause: the CI/CD workflow observes deployments but does not enforce the policy before promotion. Fix: add a Lambda compliance action in CodePipeline or a Step Functions approval workflow that validates the release manifest against approved model-card and ticket metadata, then fails the deployment if the requirement is not met. This preserves the existing Bedrock application and guardrails while adding the missing preventive control and audit decision point.
Guardrails help with runtime safety, but they do not prove that a model or prompt release was reviewed and approved before deployment.
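The compliance gate described above reduces to a simple check the Lambda action would run before signaling success or failure back to CodePipeline. Manifest field names here are hypothetical.

```python
def compliance_gate(release_manifest, approved_model_cards, approved_tickets):
    """Return (passed, reason) for a pipeline approval gate.

    A CodePipeline Lambda action would run this and report job failure
    when it returns False, blocking promotion and recording the reason.
    """
    card = release_manifest.get("modelCardId")
    ticket = release_manifest.get("approvalTicket")
    if card not in approved_model_cards:
        return False, f"model card {card!r} not approved"
    if ticket not in approved_tickets:
        return False, f"ticket {ticket!r} not approved"
    return True, "release evidence verified"

ok, reason = compliance_gate(
    {"modelCardId": "mc-77", "approvalTicket": None},
    approved_model_cards={"mc-77"},
    approved_tickets={"CHG-1042"})
```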
Topic: Foundation Model Integration, Data Management, and Compliance
A developer is designing a RAG vector index by using embedding models available through Amazon Bedrock. Which statement is most accurate when selecting and configuring the embedding model?
Options:
A. Prefer the largest vector dimension for all corpora.
B. Mix embedding models when using cosine similarity.
C. Match modality/domain and vector dimension, then use the model consistently.
D. Use generation models; Bedrock creates embeddings automatically.
Best answer: C
Explanation: Embedding model choice determines both semantic representation and vector dimensionality. For RAG, choose a model that fits the content domain and modality, configure the vector index for that model’s output dimension, and embed documents and queries with the same model configuration.
In a vector retrieval system, embeddings from one model live in that model’s semantic space and have a specific vector dimension. Amazon Titan embeddings or other Bedrock embedding models should be selected based on corpus fit, supported modality, language/domain needs, latency, cost, and retrieval quality. The vector index schema must match the embedding dimension, and query embeddings should use the same model and configuration as the indexed documents. Changing models or dimensions usually requires re-embedding and reindexing the corpus.
Higher dimensionality can improve representation in some cases, but it can also increase storage, memory, and search cost. The key is to evaluate the model on representative retrieval tasks rather than assuming the largest vector is always best.
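The consistency requirement above can be enforced with a guard at ingestion and query time. This is a sketch with hypothetical names; the key invariants are that the query uses the same embedding model as the index and that vector dimensions match the index schema.

```python
def validate_embedding_config(index_dimension, indexed_model,
                              query_model, query_vector):
    """Reject mismatched embedding setups before they corrupt retrieval."""
    if query_model != indexed_model:
        raise ValueError(
            "query and index must use the same embedding model; "
            "changing models requires re-embedding the corpus")
    if len(query_vector) != index_dimension:
        raise ValueError(
            f"vector dim {len(query_vector)} != index dim {index_dimension}")
    return True
```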
Topic: Operational Efficiency and Optimization for GenAI Applications
A team wants to reduce latency for an Amazon Bedrock RAG assistant but must not degrade answer quality. The team will compare foundation models, prompt versions, and optional semantic caching. Which statement best defines workload-specific benchmarking for this optimization decision?
Options:
A. Enable semantic caching before measuring model quality.
B. Choose the model with the best public leaderboard score.
C. Compare only average model invocation latency.
D. Measure representative requests through the full path for quality and latency.
Best answer: D
Explanation: Workload-specific benchmarking means testing the actual workload, not relying on generic scores or isolated service timings. For GenAI optimization, the benchmark should capture response quality and latency across the same prompt, retrieval, guardrail, model, and application path users experience.
A useful GenAI benchmark uses representative prompts, documents, user intents, and expected traffic patterns to compare alternatives. For an Amazon Bedrock RAG application, the measurement should include the full request path: retrieval, reranking if used, prompt assembly, model inference, guardrails, postprocessing, and response streaming. Quality can be measured with human review, task-specific rubrics, or LLM-as-judge evaluations, while latency should include distributions such as p50 and p95 rather than only a single average. The key takeaway is to optimize from evidence that reflects the real workload and user experience.
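The latency-distribution point above can be illustrated with a nearest-rank percentile over end-to-end request timings. The sample values are invented; note how a single slow tail request dominates p95 while barely moving the average.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile over end-to-end latencies."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Full-path latencies (retrieval + prompt assembly + inference + guardrails)
latencies_ms = [420, 510, 480, 2200, 460, 530, 450, 490, 470, 3100]
p50 = percentile(latencies_ms, 50)   # 480 ms
p95 = percentile(latencies_ms, 95)   # 3100 ms, driven by tail requests
```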
Topic: Foundation Model Integration, Data Management, and Compliance
A healthcare SaaS company is adding a RAG feature to a case-management app. Source PDFs are stored in Amazon S3. Each retrieved chunk must be filtered by the user’s tenant, contract, case status, and legal-hold flags that are already stored in Amazon RDS for PostgreSQL and updated in the same transaction as case changes. The team wants Amazon Bedrock embeddings and generation, but does not want to operate a separate search cluster. Which implementation best meets these requirements?
Options:
A. Use RDS PostgreSQL pgvector with S3 object references.
B. Use Bedrock Knowledge Bases with S3 synchronization.
C. Use OpenSearch Service as the vector store.
D. Use DynamoDB for chunk metadata lookups.
Best answer: A
Explanation: The best fit is storing embeddings in Amazon RDS for PostgreSQL with pgvector while keeping source documents in S3. This preserves transactional SQL filtering against existing case and entitlement tables without adding a separate vector search cluster.
Amazon RDS for PostgreSQL with pgvector is appropriate when vector retrieval must be tightly coupled with relational metadata and transactional constraints. The application can store the S3 object URI, chunk text or pointer, embedding vector, tenant, and document identifiers in PostgreSQL, then run a similarity query joined to the existing entitlement and case-status tables before sending grounded context to Amazon Bedrock. This avoids duplicating rapidly changing authorization state into an external vector index. Bedrock can still be used for embedding generation and final response generation, but the retrieval control plane remains in the relational database. The key tradeoff is choosing SQL consistency and join capability over a managed RAG workflow or search-cluster flexibility.
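A sketch of what the transactional retrieval query could look like with pgvector, where `<=>` is pgvector's cosine-distance operator. Table and column names are hypothetical; the point is that similarity search and entitlement filtering run in one SQL statement against consistent relational state.

```python
# Hypothetical schema: doc_chunks, cases, and entitlements tables.
RETRIEVE_SQL = """
SELECT c.chunk_id, c.s3_uri, c.chunk_text,
       c.embedding <=> %(query_vec)s AS distance
FROM doc_chunks c
JOIN cases k ON k.case_id = c.case_id
JOIN entitlements e ON e.tenant_id = c.tenant_id
WHERE c.tenant_id = %(tenant_id)s
  AND e.contract_id = %(contract_id)s
  AND k.status = 'open'
  AND NOT k.legal_hold
ORDER BY distance
LIMIT 8;
"""
```

Because the legal-hold and case-status flags live in the same database and are updated in the same transaction as case changes, the retrieval result can never lag behind an authorization change the way a separately synced index could.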
Topic: Foundation Model Integration, Data Management, and Compliance
A financial services company uses Amazon Bedrock Prompt Management for a claims-assistant prompt. The team runs a fixed 120-case evaluation set after each prompt version and wants to improve response quality without changing the FM yet. Based on the exhibit, which next step best implements an iterative prompt refinement loop?
Exhibit: Evaluation summary
| Rubric item | Target | Result | Feedback |
|---|---|---|---|
| Factuality | 90%+ | 92% | Facts align |
| JSON format | 95%+ | 96% | Schema valid |
| Source attribution | 90%+ | 54% | Missing claim IDs |
| Tone | 85%+ | 88% | Appropriate |
Options:
A. Add a guardrail that blocks uncited responses.
B. Increase temperature to improve citation completeness.
C. Revise citation instructions and rerun the fixed rubric.
D. Switch to a larger FM for all production traffic.
Best answer: C
Explanation: The structured evaluation shows one specific quality gap: source attribution is 54% against a 90% target. An iterative prompt refinement loop should make a targeted prompt change, version it, and rerun the same evaluation rubric to measure improvement.
Prompt refinement beyond basic prompting uses structured feedback as the control signal. Here, factuality, JSON format, and tone already meet their targets, while source attribution fails with feedback that claim IDs are missing. The best next step is to create a new prompt version that explicitly requires claim-ID citations from the provided context, then run the same fixed evaluation set and rubric to compare results against the prior version. This preserves prompt governance and avoids changing unrelated variables before the specific prompt weakness is measured.
Changing the FM, temperature, or safety controls may be useful in other situations, but those actions do not directly close the rubric-backed prompt quality gap shown in the exhibit.
Topic: Implementation and Integration
Which statement best defines an Amazon Q Business data source in an internal knowledge application that must provide governed access to organizational information?
Options:
A. A prompt versioning feature for reusable application prompts
B. An MCP server that exposes application tools to agents
C. A connector that syncs enterprise content and permissions into Amazon Q Business
D. A vector database that stores embeddings for Bedrock Knowledge Bases
Best answer: C
Explanation: An Amazon Q Business data source connects the application to enterprise repositories such as document stores, wikis, ticketing systems, or object storage. It supports governed internal knowledge access by syncing content and authorization metadata so users receive answers from information they are allowed to access.
Amazon Q Business data sources are connectors used to ingest and keep organizational content available to an Amazon Q Business application. For enterprise knowledge tools, the key role is not only retrieval but governed retrieval: the source connector syncs documents, metadata, and access-control information so the application can respect user and group permissions when answering questions. This is different from building a custom RAG stack with a vector store or managing prompts for a foundation model. The key takeaway is that Q Business data sources are the governed connection point between enterprise repositories and the internal knowledge assistant.
Topic: Foundation Model Integration, Data Management, and Compliance
An insurance company runs a claims summarization API on Amazon Bedrock in us-east-1. After a release, the assistant sometimes omits escalation instructions and returns prose instead of the JSON object required by a CRM. Logs show only the model ID, not the prompt template version. The team must keep customer data in the AWS account and Region, audit and roll back prompt changes, enforce a stable output contract, and block changes that fail golden-conversation regression tests. Which architecture is the best fit?
Options:
A. Use Bedrock Guardrails content filters and PII redaction while keeping prompt text in Lambda environment variables.
B. Fine-tune a SageMaker endpoint on failed outputs and replace the Bedrock prompt-based workflow.
C. Store the prompt in S3 with bucket versioning and load the latest object from Lambda at runtime.
D. Use Bedrock Prompt Management versions, AppConfig promotion, golden-set CI tests, JSON Schema validation, and prompt/model-version logging.
Best answer: D
Explanation: The failure is a prompt-operations problem, not primarily a model-training problem. The best design governs prompt versions, promotes only tested versions, validates the required JSON contract, and logs the prompt version used for each request so regressions can be traced and rolled back.
Prompt failures from missing instructions, conflicting constraints, weak schemas, and unmanaged templates require a prompt governance and QA workflow. Amazon Bedrock Prompt Management provides managed prompt assets and versions. AWS AppConfig or a CI/CD-controlled configuration can promote a specific approved version and roll it back quickly. Golden-conversation regression tests in a pipeline can invoke the target Bedrock model and fail the deployment when outputs omit required instructions or break expected behavior. JSON Schema validation before sending data to the CRM enforces the output contract, while CloudWatch logs that include prompt ID, prompt version, and model ID provide audit and troubleshooting evidence. Guardrails can add safety controls, but they do not replace prompt versioning, regression gates, or schema validation.
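The output-contract check can be sketched as a validation step that runs before the response reaches the CRM. A production pipeline would use a full JSON Schema validator; this hand-rolled check with hypothetical field names only illustrates the gate.

```python
def validate_crm_contract(payload):
    """Minimal output-contract check before forwarding to the CRM.

    Returns a list of violations; an empty list means the contract holds.
    Field names (summary, escalation, claimIds) are illustrative.
    """
    required = {"summary": str, "escalation": str, "claimIds": list}
    errors = []
    for key, expected in required.items():
        if key not in payload:
            errors.append(f"missing key: {key}")
        elif not isinstance(payload[key], expected):
            errors.append(f"wrong type for {key}")
    return errors
```

When the model returns prose instead of JSON, or omits the escalation field, this gate fails closed and the application can retry or route to a fallback instead of corrupting the CRM.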
Topic: Testing, Validation, and Troubleshooting
A team uses AWS CodePipeline to promote a new Amazon Bedrock prompt version and a Bedrock Knowledge Bases retrieval configuration. A pre-production evaluation runs against a fixed regression dataset. The deployment policy requires every quality gate to meet its threshold before production promotion.
Exhibit: Evaluation results
| Gate | Threshold | Baseline | Candidate |
|---|---|---|---|
| Grounded answer rate | ≥ 95% | 96.8% | 91.4% |
| Prompt-injection refusal | ≥ 98% | 98.7% | 99.1% |
| P95 latency | ≤ 2,000 ms | 1,850 ms | 1,610 ms |
What is the best next step?
Options:
A. Start a production canary and monitor user feedback.
B. Promote the candidate because latency and safety improved.
C. Lower the groundedness threshold for this release.
D. Fail the gate and keep the current production version.
Best answer: D
Explanation: Continuous evaluation quality gates must block releases when a required metric fails. The decisive exhibit detail is the candidate grounded answer rate of 91.4%, which is below the 95% threshold, even though other metrics pass.
Regression testing for GenAI changes should treat prompt, model, retrieval, and workflow evaluations as release criteria, not advisory signals. In this scenario, the policy says every gate must meet its threshold before production promotion. The candidate improves latency and passes prompt-injection refusal, but it fails the groundedness gate, indicating a likely regression in retrieval relevance, citation grounding, or answer generation. The safe next step is to stop the promotion, keep the current production version, investigate the candidate change, and rerun the same regression evaluation after remediation.
A canary is useful after pre-production gates pass; it should not bypass a failed mandatory gate.
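The every-gate-must-pass policy reduces to a simple evaluation like the sketch below, shown with the exhibit's numbers. Gate names and the min/max direction encoding are illustrative.

```python
def evaluate_gates(candidate, gates):
    """Return the list of failed gates; an empty list means promote."""
    failed = []
    for name, (threshold, direction) in gates.items():
        value = candidate[name]
        ok = value >= threshold if direction == "min" else value <= threshold
        if not ok:
            failed.append(name)
    return failed

gates = {
    "grounded_answer_rate": (95.0, "min"),
    "prompt_injection_refusal": (98.0, "min"),
    "p95_latency_ms": (2000, "max"),
}
candidate = {"grounded_answer_rate": 91.4,
             "prompt_injection_refusal": 99.1,
             "p95_latency_ms": 1610}
failed = evaluate_gates(candidate, gates)  # groundedness fails alone
```

Two passing metrics cannot offset one failing mandatory gate; the promotion decision is the conjunction of all gates, not a weighted score.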
Topic: AI Safety, Security, and Governance
A financial services company is launching an internal claims assistant on AWS. The assistant must answer from approved policy documents, show citations and a confidence indicator, and provide a brief user-facing reasoning summary without exposing hidden chain-of-thought. It also invokes Lambda tools for claim status, and auditors require per-response evidence of retrieved sources and tool calls. Foundation model API traffic must use private AWS connectivity. Which architecture best meets these requirements?
Options:
A. Fine-tune an FM in SageMaker to generate full chain-of-thought, citations, and confidence values; store transcripts in S3.
B. Use Bedrock Agents with Knowledge Bases and Lambda action groups; enable tracing, log traces, return citations/confidence bands with a sanitized rationale, and access Bedrock through VPC endpoints.
C. Use Bedrock Guardrails with direct InvokeModel; treat guardrail allow/block outcomes as confidence, and log guardrail intervention results.
D. Build a custom Lambda RAG workflow with OpenSearch Service; return top document URLs and log only API Gateway request IDs.
Best answer: B
Explanation: The best fit is a Bedrock agent architecture that exposes transparent evidence without exposing hidden reasoning. Bedrock Knowledge Bases provide citations and retrieval metadata, while agent tracing records the orchestration path, retrieved sources, and Lambda action calls for audit review. VPC endpoints keep Bedrock API traffic on private AWS connectivity.
Responsible AI transparency in this scenario requires separating user-facing explanations from internal execution evidence. A Bedrock Agent with a Knowledge Base can ground answers in approved documents and return source citations. Agent tracing provides audit evidence for retrieval steps and Lambda action group calls, and those traces can be persisted to CloudWatch Logs or another governed log store. The application can convert retrieval scores and grounding signals into a simple confidence band and generate a concise, sanitized rationale for users without revealing hidden chain-of-thought. Interface VPC endpoints for Bedrock keep model API calls on private AWS connectivity.
Guardrails can complement this design for safety, but they do not replace attribution, confidence evidence, or agent traceability.
Topic: Implementation and Integration
An application uses an API Gateway and Lambda GenAI gateway to invoke Amazon Bedrock models. The team can route by request class. Simple FAQ requests must meet p95 latency under 2,000 ms with quality at least 90%. Contract analysis has no 2-second SLO but must meet quality at least 90%.
Exhibit: Canary routing results
| Request class | Candidate route | p95 / cost index | Quality |
|---|---|---|---|
| FAQ text | High-capability text | 4,700 ms / 9 | 93% |
| FAQ text | Fast text | 1,100 ms / 2 | 91% |
| Contract analysis | Fast text | 1,300 ms / 2 | 73% |
| Contract analysis | High-capability text | 3,900 ms / 9 | 94% |
Which routing change is the best next step?
Options:
A. Route FAQ text to the fast text model and contract analysis to the high-capability model.
B. Route both request classes to the fast text model.
C. Keep routing unchanged and tune Lambda memory for lower latency.
D. Route both request classes to the high-capability text model.
Best answer: A
Explanation: The best routing policy uses the cheapest and fastest model that still satisfies each request class requirement. The exhibit shows FAQ text can move to the fast model, but contract analysis needs the high-capability model to meet the 90% quality target.
Intelligent model routing maps each request class to the model that best satisfies its required latency, cost, and capability constraints. The exhibit shows FAQ text on the fast model has p95 latency of 1,100 ms, cost index 2, and 91% quality, so it satisfies the FAQ SLO at lower cost. Contract analysis on the fast model reaches only 73% quality, below the 90% target, while the high-capability model reaches 94%. A single default model would either waste cost and latency for FAQ or fail contract analysis quality.
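The routing policy reduces to "cheapest eligible route per request class," which can be sketched with the exhibit's numbers. Model names and the dict layout are illustrative.

```python
def choose_route(candidates, min_quality, max_p95_ms=None):
    """Pick the cheapest candidate meeting quality and latency limits."""
    eligible = [c for c in candidates
                if c["quality"] >= min_quality
                and (max_p95_ms is None or c["p95_ms"] <= max_p95_ms)]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["cost_index"])["model"]

faq = [
    {"model": "high-capability", "p95_ms": 4700, "cost_index": 9, "quality": 93},
    {"model": "fast-text", "p95_ms": 1100, "cost_index": 2, "quality": 91},
]
contract = [
    {"model": "fast-text", "p95_ms": 1300, "cost_index": 2, "quality": 73},
    {"model": "high-capability", "p95_ms": 3900, "cost_index": 9, "quality": 94},
]

faq_route = choose_route(faq, min_quality=90, max_p95_ms=2000)
contract_route = choose_route(contract, min_quality=90)  # no latency SLO
```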
Topic: Implementation and Integration
A team is building a production GenAI application on Amazon Bedrock and wants to use Amazon Q Developer to speed up implementation. Which statement best describes the role of this developer productivity tool in the delivery process?
Options:
A. Accelerate code and guidance, then validate through normal SDLC controls.
B. Serve as the audit record for runtime model behavior.
C. Approve production IAM policies generated during development.
D. Replace CI/CD tests for generated application code.
Best answer: A
Explanation: Developer productivity tools such as Amazon Q Developer help teams write, explain, and troubleshoot code faster. They do not replace the engineering controls needed for production GenAI systems, including architecture review, automated testing, least-privilege IAM, logging, and monitoring.
The core concept is tool-assisted implementation with retained production governance. Amazon Q Developer can suggest code, explain AWS APIs, help debug errors, and provide implementation guidance. However, generated or suggested artifacts must still move through the same delivery controls as human-authored code: peer review, CI/CD validation, security checks, infrastructure-as-code review, observability instrumentation, and operational readiness. For GenAI workloads, this also includes validating prompts, data access paths, model invocation behavior, and safety controls. The key takeaway is that productivity tooling accelerates developer work; it does not become the source of truth for architecture, security approval, testing, or operations.
Topic: Implementation and Integration
A developer is building an Amazon Bedrock agent that checks order status by invoking AWS Lambda. The model must choose tools from standardized operation names and typed parameters, while the Lambda function validates inputs, handles downstream API errors, and returns stable JSON fields to the agent. Which implementation approach matches this service role?
Options:
A. Configure Bedrock Guardrails to define Lambda parameters and return schemas.
B. Define a Bedrock agent action group with a tool schema and Lambda executor.
C. Use CloudWatch Logs to infer tool names and response fields at runtime.
D. Use Lambda reserved concurrency to publish callable function definitions.
Best answer: B
Explanation: Amazon Bedrock agent action groups are the integration point for tools. They define callable operations and parameters, then invoke a Lambda function or API to execute the selected action and return a predictable response.
For Bedrock Agents, an action group is the durable contract between model reasoning and external tools. The action group uses an OpenAPI schema or function details to describe operations, parameters, and expected invocation shape. The Lambda executor should still perform server-side parameter validation, catch and normalize downstream failures, and return a stable response structure that the agent can use safely. This avoids relying on free-form prompts or logs as the tool contract.
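A sketch of the Lambda executor's responsibilities: parse the typed parameters the agent sends, validate server-side, and always return a stable JSON body. The event and response shapes below are simplified relative to the real action-group contract (which also carries keys such as `actionGroup` and `apiPath`), and the order-ID format is invented.

```python
import json

def handler(event, context=None):
    """Simplified Bedrock agent action-group executor sketch."""
    # Action-group events carry parameters as a list of name/value pairs.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    order_id = params.get("orderId")

    # Server-side validation: never trust model-selected arguments.
    if not order_id or not order_id.startswith("ORD-"):
        body = {"error": "invalid orderId"}
    else:
        # Downstream order-API call would go here; normalize any failure
        # into the same stable fields instead of raising to the agent.
        body = {"orderId": order_id, "status": "SHIPPED"}

    return {"response": {"responseBody": {
        "application/json": {"body": json.dumps(body)}}}}
```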
Topic: Foundation Model Integration, Data Management, and Compliance
A healthcare payer is building an authenticated chat assistant for members in one AWS account and Region. The assistant must recognize whether a turn is about benefits, claim status, or appeals; ask a clarifying question when required claim details are missing; ground benefits answers in indexed policy documents; and preserve cross-device conversation context for 30 days with encryption and audit evidence. Clients must not call foundation models or internal APIs directly. Which architecture best meets these requirements?
Options:
A. Call Amazon Bedrock directly from the client and store history in browser local storage.
B. Fine-tune a SageMaker model for all intents and write transcripts to S3 daily.
C. Use Amazon Lex only, store session attributes in Lex, and call Bedrock after fulfillment.
D. Use API Gateway WebSocket, Lambda, Bedrock Agents, Knowledge Bases, and DynamoDB.
Best answer: D
Explanation: The best fit is a server-side conversational architecture that combines managed FM orchestration with durable state storage. Bedrock Agents can handle intent routing, tool selection, and clarification, while Knowledge Bases support grounded document answers and DynamoDB preserves encrypted conversation state across devices.
Interactive FM systems need a backend conversation orchestrator because each turn must combine the current message, recognized intent, missing-slot state, retrieved grounding, and relevant prior history. API Gateway WebSocket with Lambda can authenticate clients and keep model/API calls server-side. A Bedrock Agent can route intents, elicit missing claim details, invoke action groups, and use a Bedrock Knowledge Base for benefits retrieval. DynamoDB with KMS encryption and TTL can store turn history, intent state, slot values, and summaries for 30 days, while CloudTrail and CloudWatch provide audit and operational evidence. The key takeaway is to persist governed conversation state outside the client and use managed Bedrock orchestration for clarification workflows.
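The 30-day retention piece can be sketched as the item the backend writes per turn. Attribute names are hypothetical; the real requirement is that the TTL attribute holds epoch seconds and matches the attribute configured for TTL on the DynamoDB table.

```python
import time

def conversation_item(session_id, turn_index, intent, summary,
                      retention_days=30, now=None):
    """Build a DynamoDB item for one chat turn with a 30-day TTL."""
    now = int(now if now is not None else time.time())
    return {
        "pk": f"SESSION#{session_id}",      # partition by conversation
        "sk": f"TURN#{turn_index:06d}",     # zero-padded for sort order
        "intent": intent,
        "summary": summary,                 # sanitized summary, not raw PII
        "expiresAt": now + retention_days * 86400,  # TTL attribute
    }

item = conversation_item("s-1", 3, "claim_status",
                         "asked about claim C-9", now=1_700_000_000)
```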
Topic: Implementation and Integration
A company deploys a customized text-summarization FM to an Amazon SageMaker AI real-time endpoint from SageMaker Model Registry. A pipeline just changed production from summarizer-v2 to summarizer-v3 with a 10% canary. The team must restore service quickly and keep model lifecycle evidence accurate. What is the best next step based on the exhibit?
Exhibit: deployment signal
Previous package: summarizer-v2 (Approved)
New package: summarizer-v3 (Approved)
Canary traffic: 10%
5XXError (canary): 6.8% (limit: 1%)
ModelLatency p95: 7.4s (SLO: 3s)
Deployment rollback: automatic rollback disabled
CloudTrail: UpdateEndpoint succeeded 12 min ago
Options:
A. Increase summarizer-v3 instance count and continue the canary.
B. Delete summarizer-v2 from Model Registry after the update.
C. Promote summarizer-v3 to 100% traffic to collect metrics.
D. Revert the endpoint to summarizer-v2 and mark summarizer-v3 rejected.
Best answer: D
Explanation: The exhibit shows a failed canary: 6.8% 5XX errors exceeds the 1% limit, and p95 latency is above the SLO. Because automatic rollback is disabled, the next step is a manual rollback to the last known-good approved package and a lifecycle status update for the failed version.
For a production FM endpoint canary, metrics that breach predefined error and latency limits should stop promotion and trigger rollback. Here, the failed version is serving only 10% of traffic but already exceeds both operational thresholds, and the exhibit explicitly says automatic rollback is disabled. The controlled response is to update the endpoint back to the previous approved model package or endpoint configuration and update Model Registry so the failed package is not treated as a valid production candidate. Keeping the failed package marked approved risks accidental redeployment through the pipeline.
The key takeaway is to restore the known-good endpoint first, then preserve lifecycle evidence that the new model version failed production validation.
Topic: Implementation and Integration
A company exposes POST /genai/chat through an Amazon API Gateway REST API as a gateway layer in front of Amazon Bedrock Runtime. API Gateway selects the model from /{modelAlias} and uses an AWS service integration. After onboarding a mobile client, the support-claude route returns 400 errors only for that client.
Client body: {"prompt":"reset my password","maxTokens":300}
Integration body: {"prompt":"reset my password","maxTokens":300}
Bedrock error: ValidationException: required key [messages] not found
The client cannot be updated for 6 weeks. Which change fixes the root cause with the smallest safe change?
Options:
A. Add a Bedrock Guardrail to rewrite the JSON payload.
B. Retry ValidationException responses with exponential backoff.
C. Add an API Gateway request mapping template for support-claude.
D. Route support-claude to a Titan Text model temporarily.
Best answer: C
Explanation: The symptom is a malformed FM payload, not a transient Bedrock failure. The root cause is that API Gateway is passing the mobile client’s generic schema directly to a model route that expects a different request body. A request mapping template fixes the gateway normalization without changing clients or model behavior.
API Gateway can be used as a GenAI gateway layer to normalize client requests before routing them to a foundation model integration. Here, the integration body is identical to the client body, and Bedrock rejects it because the selected model route expects fields such as a messages structure rather than the client’s prompt field. The smallest safe fix is to add a route-specific API Gateway request transformation that maps the stable client contract to the Bedrock payload required by support-claude, including field names and model parameters. This preserves the mobile client contract and avoids changing the selected model or relying on retries for a deterministic validation error.
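The field translation the mapping template performs can be mimicked in Python for clarity. In production this logic lives in an API Gateway VTL request mapping template on the support-claude route; the `anthropic_version` value and field names follow the Anthropic-on-Bedrock message format, shown here as an illustration.

```python
def map_claude_request(client_body, anthropic_version="bedrock-2023-05-31"):
    """Translate the stable client contract into a messages-style body."""
    return {
        "anthropic_version": anthropic_version,
        "max_tokens": client_body.get("maxTokens", 256),
        "messages": [
            {"role": "user", "content": client_body["prompt"]},
        ],
    }

mapped = map_claude_request({"prompt": "reset my password",
                             "maxTokens": 300})
```

The mobile client keeps sending its generic `prompt`/`maxTokens` schema unchanged for the full 6 weeks, while the gateway emits the `messages` structure the model route requires.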
Topic: AI Safety, Security, and Governance
An insurance company is building a claims chat assistant. API Gateway invokes a Lambda function that stores chat turns in DynamoDB, retrieves recent turns, and starts an Amazon Bedrock Prompt Flow. Compliance requires SSNs, phone numbers, and email addresses to be masked before any chat text is stored or sent to the FM. CloudWatch metrics are required, but raw PII must not be logged. Which implementation meets these requirements?
Options:
A. Encrypt DynamoDB with KMS and restrict IAM access.
B. Call ApplyGuardrail in Lambda before storage and Prompt Flow input.
C. Attach a Bedrock Guardrail only to the Prompt Flow invocation.
D. Run scheduled Macie scans on exported chat records.
Best answer: B
Explanation: The PII control must run before DynamoDB writes and before the prompt reaches the FM. Calling Amazon Bedrock Guardrails with sensitive information filters from Lambda lets the application store, retrieve, prompt, and log only sanitized text or metadata.
The core pattern is application-layer PII preprocessing at the ingestion and prompt-construction boundary. Lambda should call the Bedrock Guardrails ApplyGuardrail API with sensitive information filters configured to mask or anonymize SSNs, phone numbers, and email addresses. The sanitized result is what gets written to DynamoDB, retrieved for chat history, passed into the Bedrock Prompt Flow, and referenced in CloudWatch metrics. Logs should include counts, entity types, request IDs, and outcomes, not raw matched values.
A guardrail attached only at model invocation can reduce unsafe FM input or output, but it does not prevent raw PII from being persisted first. Encryption, IAM, and after-the-fact scanning are useful supporting controls, but they do not satisfy the stated timing requirement.
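To make the timing requirement concrete, here is a local regex stand-in for the masking behavior. The real control in the scenario is the Bedrock Guardrails ApplyGuardrail API with sensitive-information filters; this sketch only shows that masking runs before any storage, prompting, or logging, and that logs see labels rather than raw values.

```python
import re

# Simplified patterns for illustration; real filters handle more formats.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def mask_pii(text):
    """Mask PII before the text touches DynamoDB, the FM, or logs."""
    for pattern, label in PII_PATTERNS:
        text = pattern.sub(label, text)
    return text

masked = mask_pii("SSN 123-45-6789, call 555-010-1234, mail a@b.com")
```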
Topic: Operational Efficiency and Optimization for GenAI Applications
Which capability is best defined as the use of correlation IDs and request spans to reconstruct a GenAI interaction across the application, retrieval layer, guardrails, tool calls, and model invocation for operational dashboards?
Options:
A. Reranking retrieved passages
B. Semantic caching for repeated prompts
C. SageMaker Model Registry versioning
D. Distributed tracing with AWS X-Ray
Best answer: D
Explanation: Distributed tracing is the observability capability that follows a request through multiple services by using trace context and spans. In GenAI applications, this helps dashboards reconstruct user interactions, retrieval steps, guardrail decisions, tool calls, and model invocation behavior.
For compliance and forensic traceability dashboards, the key concept is distributed tracing. AWS X-Ray can capture trace segments and subsegments across components such as API Gateway, Lambda, retrieval services, and model invocation code. This lets operators answer questions such as which user action triggered a model call, where latency occurred, which downstream dependency failed, and which workflow path was taken. Tracing complements logs and metrics; it does not replace prompt governance, retrieval quality controls, or model artifact versioning. The key takeaway is that traces connect events across services into one request timeline.
Topic: Implementation and Integration
A company is deploying an Amazon Q Business application for employees to search HR, legal, and engineering knowledge bases. The application must answer only from documents that the signed-in user is allowed to access in the source system.
Exhibit: Data source sync summary
| Finding | Value |
|---|---|
| Identity provider | IAM Identity Center |
| ACL crawling | Disabled |
| Documents indexed | 42,118 |
| Sync warning | User/group access metadata skipped |
What is the best next step?
Options:
A. Encrypt the Q Business index with a new KMS key
B. Create one shared IAM role for all employees
C. Enable ACL crawling and remap source principals
D. Add a Bedrock Guardrail to block legal terms
Best answer: C
Explanation: Amazon Q Business should use identity-aware data source configuration when enterprise users need governed access to organizational information. The decisive exhibit detail is that ACL crawling is disabled and user/group access metadata was skipped, which prevents source permissions from being applied during retrieval.
For an internal knowledge tool, Amazon Q Business should ingest both content and access control metadata from supported data sources, then map those principals to the configured identity provider such as IAM Identity Center. In this case, the documents were indexed, but the sync warning shows that user and group access metadata was skipped. Enabling ACL crawling and fixing principal mapping allows Q Business to restrict answers to documents the signed-in user can access in the original system. Encryption and guardrails can be useful controls, but they do not replace document-level authorization for retrieval.
Topic: AI Safety, Security, and Governance
An insurance company deploys a customer-facing claim assistant using Amazon Bedrock. A Bedrock Guardrail denies legal advice and claim-approval promises. Red-team tests still receive responses such as ‘Your claim will be approved if you file today.’ CloudWatch traces show the application calls ApplyGuardrail only with source=INPUT before Converse; the generated response is streamed directly to users. Which change fixes the root cause with the smallest safe change?
Options:
A. Assess the response with Bedrock Guardrails before forwarding it.
B. Add a stronger system prompt warning.
C. Schedule Bedrock Model Evaluations on transcripts.
D. Lower the model temperature for all invocations.
Best answer: A
Explanation: The symptom is unsafe claim-related output despite an existing guardrail. The root cause is that the guardrail is applied only to the input, not the model response. The smallest safe fix is to apply Bedrock Guardrails to the generated output before delivery.
Symptom: red-team prompts produce claim-approval promises even though the guardrail policy denies them. Root cause: the application only calls ApplyGuardrail with source=INPUT before Converse; it never evaluates the FM response. Fix: attach the guardrail to the Bedrock runtime request or call ApplyGuardrail with source=OUTPUT on the generated response before forwarding or streaming it. This provides runtime blocking or masking and auditable guardrail traces. Offline evaluations are useful for regression testing, but they do not enforce output safety for a live response.
Topic: Foundation Model Integration, Data Management, and Compliance
A company is replacing a text-only Amazon Bedrock call in a Lambda function behind API Gateway. The new implementation must call an Anthropic Claude multimodal model on Amazon Bedrock Runtime using InvokeModel. Each request includes prior chat turns, the latest user question, and a PNG screenshot. Screenshots must remain in memory and must not be written to S3. Which request body should the Lambda function send?
Options:
A. A multipart/form-data request with separate parts for prompt, history, and the PNG file.
B. A JSON body with inputText, prior turns appended to the prompt, and a top-level imageBytes field.
C. A JSON body with anthropic_version, max_tokens, and messages; map prior turns to roles and include the PNG as a base64 image block with a text block.
D. An OpenAI-compatible messages array with string-only content and an image_url that points to a presigned S3 URL.
Best answer: C
Explanation: InvokeModel request bodies must match the selected foundation model’s required schema. For an Anthropic Claude multimodal model on Bedrock, the Lambda function should send a JSON Messages API payload with role-based turns and content blocks for text and image data.
Amazon Bedrock InvokeModel does not automatically convert a generic prompt into every provider-specific request format. The application must build the JSON body expected by the target model. For Anthropic Claude multimodal models, prior conversation turns belong in a messages array with user and assistant roles, and the current user message can contain multiple content blocks, such as a text block and an image block. Because the screenshot cannot be stored in S3, the function should read it in memory, base64 encode it, and include the appropriate media type, such as image/png, in the image source. Titan-style inputText, multipart uploads, and URL-based image references do not meet the model schema or the storage constraint.
inputText is not the Claude Messages API format, and a top-level imageBytes field is not valid; the InvokeModel call expects a JSON request body that matches the selected model's schema.
Topic: Foundation Model Integration, Data Management, and Compliance
An insurance company is building an internal claims-policy assistant with Amazon Bedrock. The assistant must use private policy PDFs and HTML files in S3, answer with citations, reflect daily source updates, keep source data and embeddings in one AWS Region encrypted with customer managed KMS keys, and remove PII before any retrieved context is sent to the FM. The team must deliver in 8 weeks and is prohibited from training or fine-tuning on customer data. Which architecture is the best fit?
Options:
A. Preprocess S3 documents with Step Functions, Lambda, and Comprehend for PII redaction and metadata, then schedule Bedrock Knowledge Bases syncs to same-Region KMS-encrypted vector storage.
B. Build a SageMaker Feature Store pipeline, engineer document features, fine-tune an FM on claims data, and serve it from SageMaker endpoints.
C. Use Glue ETL and Athena to denormalize documents into analytics tables, then generate daily batch summaries for the assistant to search.
D. Invoke Bedrock directly with full S3 document text at request time and rely on prompt instructions to suppress PII and cite sources.
Best answer: A
Explanation: The requirement is to prepare enterprise documents for safe FM consumption, not to create ML features or customize a model. A RAG-oriented ingestion path with PII redaction, metadata, managed vector retrieval, encryption, and scheduled syncs best meets the latency, governance, freshness, and delivery constraints.
Preparing data for FM consumption means making governed source content retrievable and safe for the model, not transforming it into predictive features. Here, the team needs a RAG ingestion path: extract text, remove PII, add useful metadata, chunk and embed the content, and keep the vector index synchronized with S3. Bedrock Knowledge Bases provides managed retrieval and citation support, while same-Region encrypted vector storage and AWS service logs support locality and auditability. A small Step Functions and Lambda preprocessing pipeline with PII detection keeps sensitive values out of the knowledge base context. The key boundary is to avoid Feature Store-style engineering or FM fine-tuning because the requirement is consumption of existing documents by a managed FM.
Use the AWS AIP-C01 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Try AWS AIP-C01 on Web View AWS AIP-C01 Practice Test
Read the AWS AIP-C01 Cheat Sheet on Tech Exam Lexicon for concept review before another timed run.