Free Databricks GenAI Engineer Practice Exam: Generative AI Engineer Associate
Try 45 free Databricks Certified Generative AI Engineer Associate questions across the exam domains, with explanations, then continue with IT Mastery practice.
These are original IT Mastery practice questions. They are not official Databricks questions, copied live-exam content, or exam dumps. Use them to preview question style and explanation depth before continuing with mixed sets, topic drills, and timed mocks in IT Mastery.
Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.
Exam snapshot
- Practice target: Databricks Generative AI Engineer Associate
- Practice-set question count: 45
- Time limit: 90 minutes
- Practice style: mixed-domain diagnostic run with answer explanations
Full-length exam mix
| Domain | Weight |
|---|---|
| Design Applications | 14% |
| Data Preparation | 14% |
| Application Development | 30% |
| Assembling and Deploying Applications | 22% |
| Governance | 8% |
| Evaluation and Monitoring | 12% |
Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and interactive practice.
Practice questions
Questions 1-25
Question 1
Topic: Application Development
A team is building a Databricks support assistant that must answer from internal policy PDFs and return citations. The PDFs are already chunked in a Delta table and indexed in Mosaic AI Vector Search.
Artifact: Current chain outline
1. Receive user question
2. Normalize the question
3. Build prompt:
system: Answer only from provided policy context. Cite doc_uri.
user: {question}
4. Call Foundation Model API
5. Postprocess citation list
6. Return answer
Where should the retriever be added?
Options:
A. Between question normalization and prompt construction
B. After the model returns the answer
C. As an MLflow trace callback
D. Only during Vector Search index refresh
Best answer: A
Explanation: In a RAG workflow, the retriever is part of the request path before model invocation. The user question, or a normalized version of it, is used to query the Vector Search index. The returned chunks and metadata, such as doc_uri, are then inserted into the prompt as context before calling the Foundation Model API. In the artifact, the prompt asks the model to use “provided policy context,” but no step supplies that context. Adding retrieval after generation cannot ground the answer, and index refresh is a data maintenance task, not the per-query retrieval point.
- Post-answer retrieval can find related documents, but it cannot make the already generated answer grounded.
- Index refresh keeps searchable content current, but it does not retrieve chunks for each user question.
- Trace callback records execution details; it is not where application context is fetched for the model.
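The placement described above can be sketched in a few lines of Python. This is a minimal illustration, not the Databricks API: `stub_retriever` and `stub_model` are hypothetical stand-ins for a Vector Search query and a Foundation Model API call, and the two policy chunks are made-up sample data. The point is only the ordering: normalize, retrieve, build the prompt with context, then generate.

```python
# Hypothetical stand-ins: a real app would query a Vector Search index and
# call a served model. Both are stubbed so the control flow is runnable.
def stub_retriever(query: str, k: int = 2) -> list[dict]:
    corpus = [
        {"doc_uri": "policies/leave.pdf",
         "text": "Employees accrue 1.5 leave days per month."},
        {"doc_uri": "policies/travel.pdf",
         "text": "Economy class is required for flights under 6 hours."},
    ]
    words = set(query.lower().split())
    # Rank by naive word overlap (a placeholder for similarity search).
    ranked = sorted(
        corpus, key=lambda c: -len(words & set(c["text"].lower().split()))
    )
    return ranked[:k]


def stub_model(prompt: str) -> str:
    return f"[model saw {len(prompt)} prompt characters]"


def answer(question: str, retrieve=stub_retriever, generate=stub_model) -> dict:
    normalized = " ".join(question.strip().split())   # step 2: normalize
    chunks = retrieve(normalized)                      # retrieval goes here
    context = "\n".join(f"[{c['doc_uri']}] {c['text']}" for c in chunks)
    prompt = (                                         # step 3: build prompt
        "system: Answer only from provided policy context. Cite doc_uri.\n"
        f"context:\n{context}\n"
        f"user: {normalized}"
    )
    return {
        "answer": generate(prompt),                    # step 4: call the model
        "citations": [c["doc_uri"] for c in chunks],   # step 5: citation list
        "prompt": prompt,
    }


result = answer("  How many leave days   accrue?  ")
```

Because retrieval runs before prompt construction, the retrieved `doc_uri` values are available both as prompt context and as the citation list returned to the caller.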
Question 2
Topic: Governance
A team is preparing a Databricks RAG assistant for support engineers. The planned Vector Search corpus includes Unity Catalog tables with approved retention metadata plus vendor PDFs copied from a shared drive. No one can confirm who owns the PDFs, whether internal GenAI use is allowed, or how long derived chunks may be retained. The team wants to deploy next week. What is the best engineering decision?
Options:
A. Add a disclaimer that answers may use unverified sources.
B. Index all sources but restrict endpoint access to support engineers.
C. Embed the PDFs, then delete the original files.
D. Use approved tables; send PDFs for governance review first.
Best answer: D
Explanation: Governed GenAI data use requires confirming source ownership, usage rights, and retention obligations before creating derived artifacts such as chunks or embeddings. The approved Unity Catalog tables can be used because their governance metadata is already known. The vendor PDFs should be excluded from the corpus until the appropriate source owner, legal, or data governance review confirms whether they may be used for an internal RAG application and how long derived data may be retained. Access restrictions or disclaimers do not fix unclear rights, and embeddings can still be governed derivatives of the source content.
- Endpoint restriction limits who can query the app, but it does not establish permission to ingest or retain the PDFs.
- Deleting originals does not remove governance obligations for copied chunks, embeddings, or indexed content.
- Source disclaimer may help transparency, but it does not resolve ownership, usage-rights, or retention uncertainty.
Question 3
Topic: Application Development
A Databricks team is testing a Model Serving RAG app that answers employee-benefits questions. The team needs to choose the next guardrail based on the trace.
Exhibit: MLflow trace excerpt
Input guardrail: pass (allowed topic; no jailbreak)
Masking scan: pass (no PII entities detected)
Policy check: pass (user can access HR-benefits docs)
Retrieval: two approved docs conflict on eligibility
Output judge: low confidence; may affect benefits decision
Requirement: do not send low-confidence determinations directly to employees
Which control best addresses the remaining risk?
Options:
A. Rewrite the answer with a stronger uncertainty disclaimer.
B. Route flagged benefits answers to human review before release.
C. Mask names and IDs in retrieved context.
D. Return the answer because policy checks passed.
Best answer: B
Explanation: Guardrail selection should match the risk being controlled. Input guardrails screen user requests before retrieval or tool use. Masking removes sensitive entities. Policy checks enforce access, source-use, or business rules. Output guardrails can detect unsafe or low-quality generated text, but a flagged high-impact decision may still need a human gate. In this trace, the input, masking, and policy checks all pass. The unresolved issue is conflicting approved evidence plus a low-confidence answer that could affect employee benefits. A disclaimer does not resolve the conflict or meet the requirement not to send the determination directly.
- Disclaimer-only output still sends an uncertain benefits determination to the employee.
- Masking addresses sensitive entities, but the trace says no PII was detected.
- Passed policy checks confirm access rules, not correctness when approved documents conflict.
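One way to picture the human gate is as a small routing function that runs after the output judge. The sketch below is an assumption-laden illustration: the `OutputCheck` fields, the 0.8 confidence threshold, and the route names are hypothetical and not part of any Databricks API.

```python
from dataclasses import dataclass


@dataclass
class OutputCheck:
    confidence: float       # hypothetical judge score in [0, 1]
    high_impact: bool       # e.g. the answer could affect a benefits decision
    sources_conflict: bool  # retrieved approved documents disagree


def route(check: OutputCheck, min_confidence: float = 0.8) -> str:
    """Return 'human_review' or 'release' for a generated answer."""
    needs_review = check.confidence < min_confidence or (
        check.high_impact and check.sources_conflict
    )
    return "human_review" if needs_review else "release"


# The trace in the exhibit: low confidence, high impact, conflicting sources.
decision = route(OutputCheck(confidence=0.4, high_impact=True, sources_conflict=True))
```

A disclaimer would change only the answer text; the routing function changes who receives it, which is what the requirement asks for.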
Question 4
Topic: Design Applications
A retailer is building a Databricks RAG assistant for support agents. An agent enters a customer’s order ID and asks, “What refund options apply, and draft a response.” Order facts are in a Delta table; refund rules are in policy PDFs indexed in Vector Search. Which chain configuration best maps this interaction goal to inputs, knowledge, task, and output?
Options:
A. Prompt uses agent question and order facts; retrieve policy chunks; model applies rules and drafts; output eligibility and response.
B. Prompt uses agent question; skip retrieval; model relies on training data; output a generic refund script.
C. Prompt uses only policy chunks; retrieve similar tickets; model classifies sentiment; output escalation priority.
D. Prompt uses order ID only; query Delta with Genie; model summarizes orders; output purchase history.
Best answer: A
Explanation: The interaction goal has two parts: determine which refund options apply to a specific order and draft a support response. The prompt input must include the agent’s request and the structured order facts from Delta. Retrieved knowledge should come from the indexed refund policy documents, because those contain the rules the model must apply. The model task is not simple classification or database summarization; it is grounded rule application plus response generation. The response output should include the eligibility reasoning or options and a usable draft for the customer. A design that omits order facts or skips retrieval cannot reliably ground the answer in both the customer context and current policy.
- Similar tickets solve a support analytics problem, not the policy-grounded refund decision requested by the agent.
- Order summarization uses the structured data layer but misses the policy knowledge and drafting requirement.
- Generic script ignores the Vector Search policy source and cannot tailor the answer to the specific order.
Question 5
Topic: Assembling and Deploying Applications
A customer support team stores resolved tickets in a Unity Catalog Delta table. Each night, it needs to classify newly closed tickets with a short issue category and sentiment label using an existing served foundation model. The results must be written back to a Delta table for analysts, and minute-level latency is acceptable. Which approach is the best engineering decision?
Options:
A. Run a scheduled SQL job that calls ai_query() over new rows
B. Create a Vector Search index and deploy a RAG application
C. Use Agent Framework with tools to process each ticket interactively
D. Build a low-latency chat app that calls Model Serving per user request
Best answer: A
Explanation: ai_query() is appropriate when a Databricks workload needs to apply model inference to data already stored in tables, especially from SQL or a scheduled batch job. In this scenario, the inputs are new Delta table rows, the output is structured enrichment written back to Delta, and latency is not interactive. That makes row-wise SQL-based batch inference a better fit than building a user-facing serving application or retrieval system. The served foundation model can still be used, but the workload pattern is table-in, table-out processing rather than online chat or agent orchestration.
- Per-request serving adds an interactive application layer that the nightly analyst workflow does not need.
- Vector Search RAG helps retrieve relevant context, but the task is classification of stored rows, not retrieval-grounded answering.
- Agent orchestration is unnecessary because the task does not require tool use, planning, or multi-step interaction.
Question 6
Topic: Evaluation and Monitoring
A team has deployed a RAG chatbot on a Databricks Model Serving endpoint. Pre-release, they already use MLflow evaluation on a curated Q&A dataset. The operations owner now needs visibility into live user traffic to detect latency regressions, error spikes, and token-usage increases. Which setup best meets this requirement?
Options:
A. Add MLflow tracing in the development notebook only
B. Run MLflow evaluation again on the curated Q&A dataset
C. Increase Vector Search k and measure recall on test queries
D. Enable AI Gateway inference and usage logging for the serving endpoint
Best answer: D
Explanation: Live deployment monitoring uses signals from production traffic, such as request volume, latency, error rates, token usage, and other endpoint usage metrics. In Databricks, AI Gateway and inference logging can capture these signals through inference tables and usage tables for deployed Model Serving traffic. MLflow evaluation on a curated dataset is useful before release because it checks quality against known examples, but it does not show what real users are experiencing after deployment. MLflow tracing helps debug chain behavior and spans, especially during development or evaluation, but it is not a substitute for live usage monitoring. The key distinction is production observability versus pre-release quality assessment.
- Offline scoring misses the requirement because the curated Q&A dataset does not reflect live endpoint traffic.
- Notebook tracing only helps inspect chain execution, but it does not provide production latency, error, and usage monitoring.
- Retriever tuning addresses retrieval quality, not live serving health or token-usage spikes.
Question 7
Topic: Design Applications
An expense-policy assistant will run on Databricks. Its only output is the current reimbursement limit and a citation for an employee’s submitted country and expense type. Department, manager, and prior trips do not affect the limit. Policies are stored as chunked Delta records with metadata: country, expense_type, section_id, and effective_date. Which model input configuration best keeps context to the minimum necessary?
Options:
A. Embedding vectors plus policy metadata only
B. Question plus top chunks filtered by country, expense type, date, with section metadata
C. Employee profile plus all historical expense reports
D. Question plus the full expense policy Delta table
Best answer: B
Explanation: The pipeline input should match the output goal: return a policy limit and cite the source. In a RAG design, the LLM should receive the user’s request plus only the retrieved policy text that can answer it, filtered by the known constraints such as country, expense type, and current effective date. Citation fields such as section_id are useful because the answer must cite the policy. Employee profile data and prior expenses are unnecessary because the stem states they do not affect the reimbursement limit. Passing the entire table increases noise and context cost, while passing only vectors or metadata removes the rule text the model needs to answer.
- Full policy table adds irrelevant sections and increases the chance of distracting or conflicting context.
- Employee history solves a personalization or audit problem, but the stated limit does not depend on employee attributes.
- Vectors only support retrieval, but the generation step needs policy text to produce the limit.
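The metadata filtering in the best answer can be illustrated with plain Python. The sample rows below are invented for the sketch, and in a real pipeline the filter would be applied in the retriever rather than over an in-memory list; only the field names (country, expense_type, section_id, effective_date) come from the question.

```python
from datetime import date

# Invented sample chunks that use the metadata fields named in the question.
chunks = [
    {"section_id": "DE-4.2", "country": "DE", "expense_type": "meals",
     "effective_date": date(2023, 1, 1), "text": "Meal limit: 40 EUR per day."},
    {"section_id": "DE-4.2", "country": "DE", "expense_type": "meals",
     "effective_date": date(2024, 1, 1), "text": "Meal limit: 45 EUR per day."},
    {"section_id": "FR-4.2", "country": "FR", "expense_type": "meals",
     "effective_date": date(2024, 1, 1), "text": "Meal limit: 50 EUR per day."},
    {"section_id": "DE-5.1", "country": "DE", "expense_type": "lodging",
     "effective_date": date(2024, 1, 1), "text": "Lodging limit: 150 EUR per night."},
]


def current_policy_chunks(rows, country, expense_type, as_of, top_n=1):
    """Keep only chunks that match the request and are already in effect."""
    matches = [
        r for r in rows
        if r["country"] == country
        and r["expense_type"] == expense_type
        and r["effective_date"] <= as_of
    ]
    # Newest effective version first, so top_n=1 yields the current rule.
    matches.sort(key=lambda r: r["effective_date"], reverse=True)
    return matches[:top_n]


selected = current_policy_chunks(chunks, "DE", "meals", as_of=date(2024, 6, 1))
```

Only the selected chunk text and its `section_id` would be passed to the model, alongside the user's question.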
Question 8
Topic: Application Development
A team is building a high-volume support-ticket RAG app on Databricks. Source documents will be chunked to at most 450 tokens before embedding, and the product requirement prioritizes low p95 latency and inference cost over maximum retrieval quality. The team is creating a new Mosaic AI Vector Search index.
| Embedding model | Max input | Latency/cost | Retrieval score |
|---|---|---|---|
| SmallEmbed | 512 tokens | Low | 0.78 |
| BalancedEmbed | 2,048 tokens | Medium | 0.82 |
| LargeEmbed | 8,192 tokens | High | 0.86 |
| LongContextEmbed | 32,000 tokens | High | 0.80 |
Which model should the team select?
Options:
A. SmallEmbed
B. LargeEmbed
C. LongContextEmbed
D. BalancedEmbed
Best answer: A
Explanation: Embedding model selection should match the retrieval workload constraints, not simply maximize benchmark score. Here, every document chunk is capped at 450 tokens, so a 512-token embedding context is sufficient. Because the business goal prioritizes low p95 latency and cost over the highest retrieval score, the smallest model that can represent the full chunk is the best fit. Larger or longer-context embedding models may improve quality for some workloads, but they add latency and cost that the scenario explicitly deprioritizes.
The key takeaway is to choose the cheapest and fastest embedding model that satisfies context-length needs and acceptable retrieval quality.
- Balanced quality is tempting, but it increases latency and cost when the smaller model already fits the chunk length.
- Maximum score misses the requirement because the highest retrieval score is not the primary optimization target.
- Long context solves a problem the app does not have, since chunks are already capped well below 512 tokens.
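The selection logic can be written as a filter followed by a minimum. The sketch below uses the values from the table; ranking the Low/Medium/High tiers as 0/1/2 and breaking ties on retrieval score are assumptions made for illustration, and no minimum quality threshold is applied because the question does not state one.

```python
COST_RANK = {"Low": 0, "Medium": 1, "High": 2}  # assumed ordering of the tiers

# (name, max input tokens, latency/cost tier, retrieval score) from the table
models = [
    ("SmallEmbed", 512, "Low", 0.78),
    ("BalancedEmbed", 2048, "Medium", 0.82),
    ("LargeEmbed", 8192, "High", 0.86),
    ("LongContextEmbed", 32000, "High", 0.80),
]


def pick_embedding_model(candidates, max_chunk_tokens):
    """Cheapest model whose context window fits the largest chunk."""
    fits = [m for m in candidates if m[1] >= max_chunk_tokens]
    # Lowest cost tier first; higher retrieval score breaks ties.
    return min(fits, key=lambda m: (COST_RANK[m[2]], -m[3]))[0]


choice = pick_embedding_model(models, max_chunk_tokens=450)
```

With 450-token chunks every model fits, so the cost tier decides. If chunks grew past 512 tokens, the same function would move to the next cheapest model that fits.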
Question 9
Topic: Application Development
A retail company is building a Databricks agent that answers store managers’ questions about sales and inventory. The source data is in governed Unity Catalog Delta tables, but the business requires the agent to use only the certified metrics and filters already configured in an AI/BI Genie Space. The agent must support clarifying follow-up questions and must not generate arbitrary SQL. Which integration is the best engineering decision?
Options:
A. Fine-tune an LLM on query history
B. Index table exports in Vector Search
C. Call the Genie Space conversational API
D. Generate SQL directly against Delta tables
Best answer: C
Explanation: The core concept is using a controlled conversational interface for structured data retrieval. A Genie Space can encapsulate approved business definitions, governed table access, and conversational clarification behavior for questions over structured data. When an agent needs to retrieve data through that interface, integrating with the Genie Space conversational API keeps the agent aligned with the certified metrics and governance boundary. Direct SQL generation would bypass the approved semantic layer, while Vector Search is better suited to unstructured or semi-structured document retrieval. Fine-tuning does not provide live, governed access to current structured data.
- Direct SQL fails because it bypasses the approved Genie Space definitions and allows arbitrary query generation.
- Vector Search is the wrong retrieval pattern for certified metrics over structured Delta tables.
- Fine-tuning may imitate past queries, but it does not enforce current governed metrics or retrieve live data.
Question 10
Topic: Assembling and Deploying Applications
A Databricks team is deploying an Agent Framework support assistant. It must remember each user’s approved cost center, open ticket IDs, and last completed workflow step across sessions. The values must be queryable for audit, and only current relevant state should be added to each prompt. Which configuration is best?
Options:
A. Capture the state only with inference tables
B. Persist per-user state in Unity Catalog-governed Delta tables
C. Append the full conversation history to every prompt
D. Store the state only in a Vector Search index
Best answer: B
Explanation: Persistent memory or structured state should be stored outside the immediate model prompt when it must survive across sessions, be updated independently, or be audited. In this scenario, cost centers, ticket IDs, and workflow steps are structured application state, not just retrieval knowledge or transient chat context. A Unity Catalog-governed Delta table provides durable storage, access control, lineage, and queryability. The agent or chain can then read only the current user’s relevant fields and inject a compact state summary into the prompt at runtime. This keeps prompts small while preserving durable memory. Vector Search is better for semantic retrieval over unstructured knowledge, not authoritative workflow state.
- Full history fails because it increases token usage and does not provide an auditable structured state store.
- Vector Search only fails because it confuses semantic retrieval with exact, updatable application state.
- Inference tables only fails because inference tables log and monitor requests; they do not manage active per-user memory.
Question 11
Topic: Assembling and Deploying Applications
An Agent Framework app in Databricks must let an LLM create and update incidents in a third-party ITSM system. The ITSM team already operates an MCP-compliant HTTPS server that exposes only the approved incident actions and handles its own API authentication. The app does not need to index ITSM records in Databricks. Which integration should the engineer configure?
Options:
A. Use a managed MCP server
B. Build a custom MCP server
C. Configure an external MCP server connection
D. Create a Vector Search retriever
Best answer: C
Explanation: An external MCP server is the best fit when the tool or data source is already exposed by an MCP server outside Databricks. Here, the ITSM team owns the HTTPS MCP endpoint, restricts the available actions, and handles API authentication. The Databricks agent only needs to use that existing boundary. A managed MCP server is for Databricks-provided managed capabilities, and a custom MCP server is for bespoke tools or actions that the team must implement themselves. Vector Search is a retrieval pattern for indexed content, not a transactional integration for creating or updating incidents.
- Managed MCP misses the requirement because the action boundary is a third-party server, not a Databricks-managed capability.
- Custom MCP adds unnecessary implementation work because the approved actions and authentication are already exposed through MCP.
- Vector Search solves retrieval over indexed records, not live incident creation or updates in the ITSM system.
Question 12
Topic: Application Development
A team is selecting the application framework for a Databricks customer-support assistant. The design note says:
Need: compose prompt -> retrieval -> optional tool call -> model response
Retrieval: Mosaic AI Vector Search over Unity Catalog documents
Tool: check_order_status(order_id)
Memory: preserve prior chat turns by session
Model: Databricks-hosted chat model endpoint
Which framework choice best matches these requirements?
Options:
A. An MLflow Model Registry entry without an application chain
B. A SQL-only ai_query() workflow for each user message
C. A LangChain-compatible chain with retriever, tools, memory, and model invocation
D. A standalone Vector Search index called directly from the UI
Best answer: C
Explanation: The artifact requires an application orchestration framework, not just a single Databricks service call. A LangChain-compatible design can compose multiple steps: retrieve context from Mosaic AI Vector Search, invoke a tool when needed, maintain chat history, format prompts, and call a Databricks-hosted model endpoint. That matches the full design note because the framework coordinates the components in one conversational flow.
Vector Search is important for retrieval, but it does not provide the whole chain, memory, or tool orchestration by itself. MLflow can log, register, trace, and evaluate the application, but a registry entry alone is not the runtime chain. A SQL-only ai_query() pattern is useful for model invocation from SQL, not for this multi-step interactive assistant.
- Vector Search only fails because indexing and retrieval do not handle tool calls, memory, or prompt-to-model orchestration.
- MLflow only fails because lifecycle management does not replace the application framework needed to compose the assistant steps.
- SQL-only invocation fails because per-message model calls do not satisfy the required conversational memory and tool orchestration.
Question 13
Topic: Governance
A Databricks team is deploying a support-agent RAG assistant over approved ticket history. Agents ask normal troubleshooting questions, and useful fix steps often appear in the same chunks as customer identifiers or temporary secrets. The requirement is to reduce sensitive-data exposure without rejecting valid troubleshooting questions.
Artifact: Current chain
User query
-> Mosaic AI Vector Search retriever
-> prompt assembly with top chunks
-> Foundation Model API
-> answer
Retrieved chunk example:
Ticket 1842: jane.lee@example.com reported 401 errors.
Temporary credential: <secret value in source>
Fix: rotate app credentials and update the secret reference.
Policy: answer fixes; do not reveal PII or secrets.
Which guardrail placement best meets the requirement?
Options:
A. Reject queries that mention customers, credentials, or tickets
B. Mask sensitive text only after the model generates the answer
C. Exclude every ticket containing sensitive text from the retriever
D. Redact PII and secrets in retrieved chunks before prompt assembly
Best answer: D
Explanation: For this RAG chain, the best placement is between retrieval and prompt assembly. The retriever can still find relevant troubleshooting chunks, but a masking step removes PII and secrets before those chunks are sent to the Foundation Model API. This reduces exposure both to the model and to downstream generated output, while keeping the application useful for legitimate support questions. An output guardrail can still be a useful defense-in-depth control, but using it alone allows sensitive values into the prompt. Broad query blocking or removing whole tickets would reduce application value because valid fix steps are mixed with sensitive text.
- Query rejection over-blocks normal support questions that may legitimately mention customers, credentials, or tickets.
- Output-only masking is too late because the model has already received the sensitive retrieved context.
- Dropping tickets removes useful troubleshooting evidence when only specific sensitive fields need redaction.
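A redaction step between retrieval and prompt assembly might look like the following sketch. The two regular expressions are deliberately simple illustrations (an email pattern and a labeled "Temporary credential:" line) and the sample secret value is made up; a production masking step would need a broader detector than this.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches a line such as "Temporary credential: <value>" and keeps the label.
SECRET_LINE = re.compile(
    r"^(\s*temporary credential:\s*).+$", re.IGNORECASE | re.MULTILINE
)


def redact(chunk: str) -> str:
    """Mask emails and labeled secrets before the chunk reaches the prompt."""
    chunk = EMAIL.sub("[REDACTED_EMAIL]", chunk)
    chunk = SECRET_LINE.sub(r"\1[REDACTED_SECRET]", chunk)
    return chunk


retrieved = (
    "Ticket 1842: jane.lee@example.com reported 401 errors.\n"
    "Temporary credential: abc123-example\n"   # made-up value for the demo
    "Fix: rotate app credentials and update the secret reference."
)
safe_chunk = redact(retrieved)
```

The fix steps survive redaction, so the retriever can keep returning the ticket while the model never sees the email address or the credential.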
Question 14
Topic: Evaluation and Monitoring
An engineering team evaluated four LLM candidates in MLflow for a Databricks customer-support RAG app. All runs used the same held-out evaluation set, retrieval chain, and prompt. Production requires correctness at least 0.85, groundedness at least 0.90, safety pass rate at least 99%, and p95 latency 2.5 seconds or less. Among models that meet every gate, choose the lowest cost. Which LLM should be selected for production serving?
| LLM | Correct / grounded | Safety / p95 | Cost / 1,000 |
|---|---|---|---|
| Model A | 0.90 / 0.93 | 99.5% / 2.2 s | $1.80 |
| Model B | 0.87 / 0.91 | 99.2% / 2.3 s | $1.20 |
| Model C | 0.92 / 0.88 | 99.7% / 2.1 s | $1.40 |
| Model D | 0.84 / 0.94 | 99.6% / 1.8 s | $0.90 |
Options:
A. Select Model B.
B. Select Model C.
C. Select Model D.
D. Select Model A.
Best answer: A
Explanation: Deployment model selection should first apply the production gates to the MLflow evaluation evidence. Model A and Model B satisfy correctness, groundedness, safety, and latency. Model C is excluded because groundedness is below 0.90, even though correctness is highest. Model D is excluded because correctness is below 0.85, even though it is fastest and cheapest. After filtering to eligible models, cost is the deciding attribute: Model B costs $1.20 per 1,000 requests versus Model A at $1.80. The key is to map experiment metrics to release requirements before optimizing for a secondary attribute.
- Model A satisfies the gates but is more expensive than another eligible model, so it loses after the cost tie-breaker.
- Model C has the strongest correctness score but fails the groundedness minimum required for the RAG deployment.
- Model D optimizes cost and latency but misses the correctness gate, so it is not eligible for production.
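The gate-then-optimize reasoning can be checked mechanically. The sketch below copies the figures from the table into a dictionary (the key names are chosen for illustration), filters on the four production gates, and then takes the cheapest eligible model.

```python
candidates = {
    "Model A": {"correct": 0.90, "grounded": 0.93, "safety": 99.5, "p95_s": 2.2, "cost": 1.80},
    "Model B": {"correct": 0.87, "grounded": 0.91, "safety": 99.2, "p95_s": 2.3, "cost": 1.20},
    "Model C": {"correct": 0.92, "grounded": 0.88, "safety": 99.7, "p95_s": 2.1, "cost": 1.40},
    "Model D": {"correct": 0.84, "grounded": 0.94, "safety": 99.6, "p95_s": 1.8, "cost": 0.90},
}


def passes_gates(m: dict) -> bool:
    """Apply the production gates stated in the question."""
    return (
        m["correct"] >= 0.85
        and m["grounded"] >= 0.90
        and m["safety"] >= 99.0
        and m["p95_s"] <= 2.5
    )


eligible = {name: m for name, m in candidates.items() if passes_gates(m)}
selected = min(eligible, key=lambda name: eligible[name]["cost"])
```

Filtering first matters: taking the minimum cost over all four candidates would pick Model D, which fails the correctness gate.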
Question 15
Topic: Data Preparation
An engineering team has already split 8,000 product-support articles into text chunks in a Python notebook for a Databricks RAG application. The next step must support Unity Catalog governance, preserve source-document lineage, and provide one stable row per chunk for a later Vector Search index. Which sequence is the best engineering decision?
Options:
A. Create chunk records, build a Spark DataFrame, write a UC Delta table
B. Generate embeddings first, then store only vectors in Vector Search
C. Write chunk lists as JSON files in workspace storage
D. Recombine chunks by article, then write one Delta row per article
Best answer: A
Explanation: For chunked text prepared for RAG, the Delta table should represent retrieval-ready records, usually one row per chunk. Each record should include a stable chunk identifier, the chunk text, source-document identifiers, and useful metadata such as title, URL, section, or timestamp. In Databricks, the typical sequence is to transform the Python chunk objects into row-like records, create a Spark DataFrame with a clear schema, and write it as a governed Delta table in Unity Catalog using a fully qualified table name. This keeps the data auditable and reusable before embeddings or Vector Search indexing are added. Storing files directly or collapsing chunks back into whole documents loses either governance or retrieval granularity.
- JSON files may hold the chunk text, but they do not satisfy the governed Delta table requirement in Unity Catalog.
- One row per article loses the chunk-level granularity needed by downstream retrieval.
- Vectors only skips the required governed text-record table and moves prematurely to indexing.
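The "create chunk records" step can be sketched without Spark. The column names and the hash-based `chunk_id` scheme below are assumptions chosen for illustration; the only requirement taken from the question is one stable row per chunk with source-document lineage.

```python
import hashlib


def make_chunk_records(doc_id: str, source_uri: str, chunks: list[str]) -> list[dict]:
    """Build one row-like record per chunk with a deterministic chunk_id."""
    records = []
    for index, text in enumerate(chunks):
        # Hash of document id and position: stable across reruns as long as
        # the chunking itself does not change (an assumed ID scheme).
        chunk_id = hashlib.sha256(f"{doc_id}:{index}".encode()).hexdigest()[:16]
        records.append({
            "chunk_id": chunk_id,
            "doc_id": doc_id,          # lineage back to the source article
            "source_uri": source_uri,
            "chunk_index": index,
            "chunk_text": text,
        })
    return records


records = make_chunk_records(
    "kb-001", "https://example.com/kb-001", ["Reset the router.", "Check the LED."]
)
```

In a Databricks notebook, records like these would typically be passed to `spark.createDataFrame(...)` and written with `saveAsTable` to a three-level Unity Catalog name. That step is not shown because it needs a Spark session.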
Question 16
Topic: Design Applications
A procurement team wants to process vendor contracts stored in Unity Catalog volumes. The output must be a Delta table that downstream workflows can join to purchase orders.
Artifact: Intake note
Input: contract PDFs and DOCX files
Required fields: vendor_name, renewal_date, termination_notice_days, auto_renew_flag
Interaction: no chat UI; run on new documents and review exceptions
Goal: populate one structured row per contract
Which Agent Brick should the team select?
Options:
A. Genie Space
B. Knowledge Assistant
C. Information Extraction
D. Multiagent Supervisor
Best answer: C
Explanation: Agent Bricks Information Extraction fits use cases where the application must read unstructured or semi-structured source content and produce structured fields. The artifact asks for contract facts such as renewal date, notice period, and auto-renew status, with one row per document in a Delta table. That is extraction, not conversational question answering or multi-agent orchestration.
The key signal is the required structured output schema. When the goal is to populate fields from documents, choose Information Extraction.
- Knowledge Assistant is for answering user questions from knowledge sources, not primarily generating one structured row per document.
- Multiagent Supervisor coordinates multiple agents or tools, which is unnecessary for a single extraction task.
- Genie Space supports natural-language interaction with data, not extracting fields from contract files into a table.
Question 17
Topic: Data Preparation
A Databricks team builds a RAG app over product manuals stored as PDFs. Mosaic AI Vector Search retrieves chunks from the correct manuals, but the retrieved text has merged table columns, broken words, and missing values. The same failures occur with two different LLMs and embedding models. What pipeline change is best?
Options:
A. Revise the prompt to infer missing table values
B. Add layout-aware extraction before chunking and reindexing
C. Switch to a larger Foundation Model API model
D. Increase Vector Search top_k and add reranking
Best answer: B
Explanation: When retrieval finds the right documents but the retrieved context itself is malformed or incomplete, the likely failure is in source extraction, not model selection. For PDFs with tables, scanned content, or complex layouts, a basic text extraction step can merge columns, drop values, or corrupt tokens before chunking. Those bad chunks are then embedded and indexed, so changing the LLM or embedding model will not restore information that was never extracted correctly. The better implementation step is to use a layout-aware PDF/OCR extraction process, validate the extracted text, then chunk, embed, and rebuild the Vector Search index from the cleaned Delta data.
- Larger LLM fails because the model only sees the corrupted context retrieved from the index.
- More retrieval fails because retrieving additional chunks does not repair text that was extracted incorrectly.
- Prompt inference fails because asking the model to guess missing values increases hallucination risk instead of fixing the source pipeline.
Question 18
Topic: Data Preparation
A Databricks team is preparing Markdown/HTML runbooks stored in Delta tables for a Mosaic AI Vector Search-backed RAG assistant. Users ask procedure-specific questions, and the served LLM receives the top 4 chunks within an 8,000-token prompt budget.
| Signal | Observation |
|---|---|
| Current chunks | Fixed 650 tokens, 50 overlap |
| Retrieval | Relevant source appears in top 4 |
| Failures | Answers mix steps from adjacent procedures |
| Trace notes | Chunks often omit parent headings or split tables |
Which engineering decision is best?
Options:
A. Reduce chunks to 200 tokens for tighter semantic focus.
B. Use structure-aware chunking that preserves headings, lists, and tables.
C. Change the embedding model before revising chunking.
D. Increase chunks to 1,500 tokens for more surrounding context.
Best answer: B
Explanation: The evidence points to a segmentation problem, not a pure chunk-size problem. The retriever is finding the relevant source in the top 4, and the prompt budget is not the limiting factor. The failures happen when fixed-token boundaries separate a procedure from its parent heading or split a table, so the LLM receives text without the condition that tells it which steps apply. A structure-aware splitter should respect headings, procedure sections, list blocks, and tables, while keeping chunks near the current size. The key takeaway is to preserve semantic units before tuning size.
- Larger chunks may include headings, but they would also add more adjacent procedures that the model is already mixing.
- Smaller chunks can improve focus when chunks are broad, but here they would worsen missing headings and split tables.
- Embedding changes are not the first fix because the relevant source is already appearing in the retrieved results.
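As a rough illustration of structure-aware chunking, the sketch below splits a Markdown runbook at headings and carries the parent heading path into every chunk, so a procedure is never separated from the heading that scopes it. The function name chunk_by_headings and the sample runbook are hypothetical; this is not a Databricks API, and a production splitter would also need to handle HTML input and enforce a maximum chunk size.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown at headings; each chunk keeps its full heading path."""
    chunks = []
    path = []  # stack of (level, title) for the headings currently in scope
    body = []  # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            headings = " > ".join(title for _, title in path)
            chunks.append({"headings": headings, "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before changing scope
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)  # lists and table rows stay with their section
    flush()
    return chunks


doc = """# Restart service
## Linux
1. Stop the agent
2. Start the agent
## Windows
| Step | Command |
|---|---|
| 1 | net stop agent |
"""
chunks = chunk_by_headings(doc)
```

Each chunk here maps to one procedure, and the heading path can be prepended to the chunk text before embedding so the retrieved context states which procedure the steps belong to.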
Question 19
Topic: Governance
A team is building a paid customer-support RAG app on Databricks. The source Delta table includes product manuals and community posts that feed a Mosaic AI Vector Search index. A source review finds several manuals have license metadata marked no_commercial_reuse, and legal says those texts cannot be used to generate answers for customers. Which mitigation should the team implement?
Options:
A. Add a system prompt instructing the LLM not to quote manuals.
B. Quarantine the disallowed rows and rebuild the index from approved sources.
C. Enable AI Gateway rate limits for the serving endpoint.
D. Apply PII masking to the extracted manual text.
Best answer: B
Explanation: Licensing risk in source text is best handled at the data preparation and retrieval layer, not only at generation time. If legal says certain texts cannot be used for commercial answer generation, those rows should be quarantined, filtered, or otherwise excluded before they are embedded and synced into Vector Search. Keeping license metadata in the governed Delta table helps enforce the rule consistently and supports auditability. A prompt can reduce quoting behavior, but it does not prevent retrieval or use of prohibited text as context. The key takeaway is to remove legally restricted content from the RAG source path.
- Prompt-only control misses the requirement because prohibited text could still be retrieved and influence the answer.
- Rate limiting controls usage volume, not whether licensed content is included in the knowledge base.
- PII masking mitigates sensitive personal data exposure, not commercial-use licensing restrictions.
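The quarantine step can be sketched in plain Python as a split on license metadata before anything is embedded. The row layout, the doc_id values, and the commercial_ok label are assumptions made for illustration; only no_commercial_reuse comes from the scenario. On Databricks the same predicate would typically be applied to the source Delta table that feeds the index.

```python
# License values that legal has ruled out for this paid application.
DISALLOWED_LICENSES = {"no_commercial_reuse"}

# Hypothetical source rows; in practice these live in the governed Delta table.
rows = [
    {"doc_id": "m-1", "source": "manual", "license": "no_commercial_reuse"},
    {"doc_id": "m-2", "source": "manual", "license": "commercial_ok"},
    {"doc_id": "c-1", "source": "community", "license": "commercial_ok"},
]


def split_by_license(rows):
    """Return (approved, quarantined) rows based on license metadata."""
    approved = [r for r in rows if r["license"] not in DISALLOWED_LICENSES]
    quarantined = [r for r in rows if r["license"] in DISALLOWED_LICENSES]
    return approved, quarantined


approved, quarantined = split_by_license(rows)
# Only `approved` should be embedded and used to rebuild the index;
# `quarantined` is kept separately for audit.
```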
Question 20
Topic: Application Development
A team is choosing a RAG chain run to promote from an MLflow experiment. The release gate requires safety_violation_rate = 0%, p95_latency_s <= 2.5, and cost_per_request <= $0.020; among runs that pass, choose the highest groundedness_score.
| Run | Change | Groundedness | Safety violations | p95 latency | Cost/request |
|---|---|---|---|---|---|
| run-11 | Baseline | 0.81 | 0% | 1.7 s | $0.008 |
| run-12 | Add reranker | 0.88 | 0% | 2.4 s | $0.018 |
| run-13 | Larger LLM | 0.91 | 0% | 3.8 s | $0.045 |
| run-14 | Relax guardrail | 0.89 | 1.2% | 1.9 s | $0.010 |
Which run should be promoted?
Options:
A. Promote run-13
B. Promote run-14
C. Promote run-11
D. Promote run-12
Best answer: D
Explanation: Experiment selection should apply hard release gates before optimizing a quality metric. Here, safety, latency, and cost are constraints, while groundedness is the metric to maximize after filtering. run-13 has the highest groundedness, but it exceeds both the latency and cost limits. run-14 is low cost and fast, but it has safety violations, so it cannot pass the release gate. run-11 passes all gates, but run-12 also passes and has better groundedness. The best trade-off is the highest-quality run that still satisfies the required operational and safety limits.
- Baseline passes but underperforms because it meets all gates but has lower groundedness than another passing run.
- Larger LLM over-optimizes quality because its better groundedness comes with latency and cost above the stated limits.
- Relaxed guardrail fails safety because the safety violation rate is nonzero, even though latency and cost are acceptable.
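The filter-then-maximize logic can be checked with a few lines of Python. The metric values are copied from the table in the question; the dictionary keys and the passes_gate helper are illustrative names, not MLflow fields.

```python
runs = [
    {"run": "run-11", "groundedness": 0.81, "safety_violation_rate": 0.0,
     "p95_latency_s": 1.7, "cost_per_request": 0.008},
    {"run": "run-12", "groundedness": 0.88, "safety_violation_rate": 0.0,
     "p95_latency_s": 2.4, "cost_per_request": 0.018},
    {"run": "run-13", "groundedness": 0.91, "safety_violation_rate": 0.0,
     "p95_latency_s": 3.8, "cost_per_request": 0.045},
    {"run": "run-14", "groundedness": 0.89, "safety_violation_rate": 0.012,
     "p95_latency_s": 1.9, "cost_per_request": 0.010},
]


def passes_gate(run: dict) -> bool:
    """Hard release gates: safety, latency, and cost must all hold."""
    return (
        run["safety_violation_rate"] == 0.0
        and run["p95_latency_s"] <= 2.5
        and run["cost_per_request"] <= 0.020
    )


# Apply the gates first, then maximize groundedness among the survivors.
passing = [r for r in runs if passes_gate(r)]
best = max(passing, key=lambda r: r["groundedness"])
print(best["run"])  # run-12
```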
Question 21
Topic: Design Applications
A claims operations team is designing a Databricks GenAI application for inbound customer emails and attached claim PDFs stored in governed Delta tables. The workflow must route each item to one of five fixed queues and populate policy_number, order_id, and requested_amount as structured fields. Agents do not want customer-facing replies, summaries, or creative text. Which task design is the best engineering decision?
Options:
A. Use RAG to generate policy summaries for each claim.
B. Use an agent to write case notes and choose tools.
C. Use extraction plus classification to emit schema-constrained JSON.
D. Use content generation to draft personalized claim responses.
Best answer: C
Explanation: This requirement maps to information extraction plus classification. The inputs are existing emails and PDFs, and the required outputs are fixed queue labels and specific structured fields. A Databricks GenAI pipeline could use an LLM with a constrained output schema, track results with MLflow, and write validated JSON fields back to Delta tables, but the core task is not content generation. Content generation is appropriate when the business needs new prose, such as replies, explanations, or summaries. Here, generating text would add risk and cost without satisfying the downstream workflow contract.
- Personalized responses fail because agents explicitly do not want customer-facing generated text.
- Policy summaries fail because the workflow needs fixed labels and fields, not retrieved narrative summaries.
- Agent-written notes overbuild the solution and drift from the required structured output.
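A minimal sketch of the "validated JSON" step follows, assuming the model returns one JSON object per item. The five queue names are invented for illustration (the scenario only says there are five fixed queues), and validate_output is a hypothetical helper rather than a Databricks or MLflow function.

```python
import json

# Hypothetical queue labels; the scenario specifies five fixed queues
# but does not name them.
QUEUES = {"billing", "claims", "fraud", "returns", "other"}
REQUIRED_FIELDS = {"queue", "policy_number", "order_id", "requested_amount"}


def validate_output(raw: str) -> dict:
    """Parse model output and reject anything outside the agreed schema."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("output must be a JSON object")
    missing = REQUIRED_FIELDS - set(record)
    extra = set(record) - REQUIRED_FIELDS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    if record["queue"] not in QUEUES:
        raise ValueError(f"unknown queue: {record['queue']!r}")
    amount = record["requested_amount"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError("requested_amount must be a number")
    return record


ok = validate_output(
    '{"queue": "claims", "policy_number": "P-100", '
    '"order_id": "O-7", "requested_amount": 125.5}'
)
```

Records that fail validation can be routed to a review table instead of being written to the downstream queue.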
Question 22
Topic: Assembling and Deploying Applications
A team is moving an MLflow-logged RAG chain from development toward deployment. Which action best applies the release policy?
MLflow run: runs:/a19b/model
Current registry: workspace model "dev_workspace.rag_chain"
Evaluation gate: passed
Unity Catalog release policy:
- Production name: prod_genai.apps.<model_name>
- Promotion controlled by release-bot using model aliases
- Endpoint identity spn-rag-endpoint must invoke only UC-governed models
Options:
A. Copy the model files to DBFS and share the path.
B. Promote dev_workspace.rag_chain to a Production stage for serving.
C. Deploy runs:/a19b/model directly after recording ticket approval.
D. Register as prod_genai.apps.rag_chain with release-bot alias control and serving access.
Best answer: D
Explanation: Unity Catalog governance should be applied before a model is promoted toward deployment. The artifact requires the deployable model to be registered under the governed UC namespace prod_genai.apps.<model_name>, with production promotion controlled by release-bot using model aliases and invocation limited to the serving identity. Deploying from a run URI, workspace registry entry, or shared file path may be technically possible, but those approaches bypass the UC model boundary required for access control, promotion control, and governed lineage. The key is to make the deployable model a UC-registered asset before serving it.
- Workspace stage misses the required Unity Catalog namespace and alias-based promotion control.
- Run URI deployment uses the evaluated artifact but does not make it a UC-governed model.
- DBFS sharing governs file access, not model promotion and invocation through Unity Catalog.
Question 23
Topic: Evaluation and Monitoring
Agent Monitoring for a deployed claims-support agent shows repeated low-quality answers when a customer omits the purchase date. SMEs reviewed the captured traces and wrote this rule: “If the purchase date is missing, ask a clarifying question before citing warranty eligibility.” The team needs to improve the next agent version and prevent regressions. Which action is best?
Options:
A. Drive agent iteration with SME-labeled MLflow evaluations and a Custom Scorer.
B. Tighten Unity Catalog permissions on the registered model.
C. Increase AI Gateway rate limits for the endpoint.
D. Rebuild Vector Search with smaller chunks.
Best answer: A
Explanation: SME feedback should be converted into durable evaluation assets, not left as one-off comments. In Databricks, the reviewed traces can become labeled evaluation examples or guidelines in MLflow/Agent Evaluation, and the SME rule can be encoded as a Custom Scorer. The team can then update the agent prompt or tool policy, run the evaluation, and compare versions before deployment. This uses the monitoring gap as a feedback loop and gives the team a regression check for the missing-purchase-date behavior. Traffic controls, retrieval tuning, and access governance solve different layers of the application.
- Gateway limits control usage and traffic, but they do not encode SME guidance or test response behavior.
- Chunking changes may help retrieval quality, but the observed gap is a decision rule about missing information.
- Catalog permissions improve governance, but they do not change or evaluate the agent’s response policy.
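To make the SME rule concrete, it can be written as a plain Python check that a custom scorer would wrap. The sketch deliberately does not use the MLflow scorer interface, whose exact signature is not shown here; the function name, the ISO date pattern, and the keyword lists are simplifying assumptions, and a real scorer would likely need more robust date and intent detection.

```python
import re

# Assumption: purchase dates appear in ISO format (YYYY-MM-DD).
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Words that indicate the response is already asserting warranty eligibility.
ELIGIBILITY = re.compile(r"\b(eligible|ineligible|covered)\b", re.IGNORECASE)


def follows_missing_date_rule(request: str, response: str) -> bool:
    """SME rule: with no purchase date, ask for it before citing eligibility."""
    if DATE.search(request):
        return True  # a date was supplied, so the rule does not apply
    asks_for_date = "?" in response and "purchase date" in response.lower()
    cites_eligibility = ELIGIBILITY.search(response) is not None
    return asks_for_date and not cites_eligibility


good = follows_missing_date_rule(
    "My blender broke, is it under warranty?",
    "Could you share the purchase date so I can check?",
)
bad = follows_missing_date_rule(
    "My blender broke.",
    "Your blender is eligible for a warranty replacement.",
)
```

Running a check like this over the SME-labeled traces for each candidate agent version gives the regression signal the explanation describes.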
Question 24
Topic: Design Applications
A retail company is building a Databricks GenAI application for store managers. The application already has separate specialist agents: a Knowledge Assistant for policy documents, a Genie-backed data agent for sales tables, and an MCP tool agent for opening support tickets. Managers ask mixed questions such as, “Why did returns spike, what policy applies, and open a ticket.” Which Agent Bricks selection best meets the routing and supervision requirement?
Options:
A. Use Knowledge Assistant as the single agent.
B. Use Multiagent Supervisor to coordinate the specialist agents.
C. Create only a Vector Search index over policy documents.
D. Use Information Extraction to structure every request.
Best answer: B
Explanation: The core concept is selecting the right Agent Bricks pattern for the application design. When a use case has multiple specialized agents and the application must decide which one should handle each part of a user request, the Multiagent Supervisor is the appropriate choice. It coordinates routing, delegation, and supervision across agents with different capabilities, such as document retrieval, data analysis, and tool execution. A Knowledge Assistant is useful for grounded answers over a knowledge base, and Information Extraction is useful for turning unstructured content into structured outputs, but neither provides cross-agent orchestration. A Vector Search index supports retrieval, not agent supervision. The key signal is the need to coordinate several specialists for one user-facing workflow.
- Single Knowledge Assistant misses the requirement to coordinate a data agent and a tool agent in addition to policy retrieval.
- Information Extraction solves structured data extraction, not routing mixed user requests across agents.
- Vector Search only supports retrieval over documents but does not supervise agent handoffs or tool use.
Question 25
Topic: Assembling and Deploying Applications
A claims team has deployed a policy-assistant agent in Databricks and must choose the first user-facing interface. Which interface best matches the artifact?
Artifact: Pilot note
Audience: 8 policy SMEs with workspace access
Purpose: pre-release validation only
Interaction: chat, inspect citations, rate answers, add comments
Operational context: no external app for the pilot; feedback feeds evaluation
Options:
A. Genie Space
B. Mosaic AI Agent Framework Review App
C. Model Serving REST API integration
D. Databricks Apps custom chat UI
Best answer: B
Explanation: The artifact describes a pre-release review workflow, not a production customer interface or an analytics self-service tool. The Mosaic AI Agent Framework Review App fits when internal SMEs need to interact with an agent, review its responses, and provide structured feedback that can inform evaluation and improvement. It minimizes custom app work while keeping the workflow inside Databricks.
A custom Databricks App or REST integration is more appropriate when the agent is ready to be embedded into an operational application. Genie Spaces are better for conversational exploration of governed data, not reviewing a custom policy-assistant agent.
- Custom chat UI adds app-building effort for a production-style experience, while the pilot needs rapid SME validation.
- Genie Space targets conversational data exploration, not structured review feedback on a custom agent.
- REST API integration fits embedding the agent in another system, but the artifact says no external app is needed for the pilot.
Questions 26-45
Question 26
Topic: Evaluation and Monitoring
A claims assistant is deployed with Databricks Agent Framework and RAG over policy manuals stored in Unity Catalog. MLflow evaluation with an automated LLM judge reports high groundedness and clarity, but SME reviewers in Agent Monitoring flag 18% of sampled live traces as incorrect because the agent uses superseded policy manuals that are still retrievable. The team must improve domain correctness before expanding rollout and avoid adding another model call to the live path. What is the BEST engineering decision?
Options:
A. Prioritize SME-flagged traces to filter current manuals and add regression tests
B. Replace the serving model with a larger foundation model
C. Raise the automated judge groundedness threshold before rollout
D. Add AI Gateway rate limits for claims users
Best answer: A
Explanation: Automated judges are useful for scalable checks such as groundedness, clarity, safety, and consistency, but they can miss domain-specific correctness problems. In this scenario, the judge sees answers grounded in retrieved text, yet SMEs know that the retrieved text is outdated. The best improvement is to use SME-flagged traces as high-value evidence, update retrieval or indexing rules to exclude or deprioritize superseded manuals, and turn those examples into regression tests for future MLflow evaluations. This improves the observed failure without adding latency to each live request. The key distinction is that automated judge output measures broad quality signals, while SME feedback identifies business-rule accuracy gaps.
- Judge threshold fails because higher groundedness still may reward answers grounded in outdated manuals.
- Larger model fails because the main issue is source selection, not model reasoning capacity.
- Rate limits control usage or cost, but they do not improve answer correctness.
Question 27
Topic: Application Development
A Databricks team is building an internal chain that turns support tickets into a short customer-ready summary and next step. Security permits hosted model access through Databricks Model Serving, but there is no approved labeled dataset for training.
Artifact: Pilot record
Task: summarize ticket text and produce JSON
Domain: common SaaS support vocabulary
Output fields: summary, next_step, urgency
Prompted foundation model eval: 44/50 SME-approved responses
Main failures: missing urgency field
Prompt update: adding schema examples fixed 5/6 retested failures
Launch need: this week
Which model-selection decision best fits this evidence?
Options:
A. Fine-tune a model using the 50 pilot tickets
B. Replace generation with an embedding-only Vector Search index
C. Use an existing foundation model with prompt refinement
D. Train a custom language model from scratch
Best answer: C
Explanation: An existing foundation model is the best starting point when the task is a common language capability, such as summarization or structured extraction, and prompt engineering meets the application requirements. The artifact shows no approved training dataset, a short launch timeline, and failures that improve with schema examples. Those facts point to using a hosted foundation model through Databricks Model Serving or Foundation Model APIs and iterating on prompts, output validation, and evaluation. Fine-tuning is more appropriate when prompt-based results are not sufficient and there is suitable training data for a stable task or style requirement. Custom model training is even less justified for this scope.
- Fine-tuning too early fails because the pilot set is evaluation evidence, not an approved training dataset, and prompt changes fix the main issue.
- Custom training is disproportionate for a common summarization and JSON-formatting task with a one-week launch need.
- Embedding-only retrieval can help find relevant text, but it does not generate customer-ready summaries or next steps.
Question 28
Topic: Data Preparation
A company is building a finance-policy RAG assistant on Databricks. The corpus must use only current, approved documents registered in Unity Catalog and must not rely on informal employee examples.
Evaluation fails on this user question: “If I prepaid a hotel for a conference that the organizer cancelled, can I be reimbursed, and what evidence must I upload?” Current retrieval returns booking rules and expense-form instructions, but no exception eligibility or evidence requirements.
Which source document should be added first?
Options:
A. Current Finance SOP for cancelled-event reimbursement exceptions
B. Corporate travel policy section on hotel rate caps
C. Historical approved expense reports for cancelled conferences
D. Expense system guide for uploading receipt files
Best answer: A
Explanation: For RAG source selection, choose the document that contains the exact missing knowledge needed to answer the target question and satisfies the governance boundary. The failed question needs two policy facts: whether prepaid hotel costs are reimbursable after organizer cancellation, and which evidence must be uploaded. The current Finance SOP for cancelled-event reimbursement exceptions is the narrowest approved source that should contain those rules. Documents that are merely related to travel, form submission, or past outcomes may improve context, but they do not provide authoritative missing policy knowledge. The key is to add authoritative source content before changing retrieval, prompts, or model behavior.
- Historical examples may show past approvals, but they are informal, potentially inconsistent, and disallowed by the governance constraint.
- Upload instructions explain how to attach files, not which evidence is required for this exception.
- Hotel rate caps are current travel policy content, but they do not answer cancellation reimbursement eligibility.
Question 29
Topic: Assembling and Deploying Applications
A retail analytics team wants to summarize and tag 250,000 new customer-review rows each night. All inputs arrive in a Unity Catalog Delta table by midnight, the results only need to appear in a BI dashboard by 7:00 AM, and no user waits for an individual response. The team is cost-sensitive and wants governed output stored for audit. Which engineering decision is BEST?
Options:
A. Build a RAG chatbot backed by Vector Search
B. Create an agent with persistent memory for review processing
C. Deploy an online Model Serving endpoint for each dashboard view
D. Run scheduled batch inference and write results to Delta
Best answer: D
Explanation: Batch inference is the right fit when inputs are known ahead of time, the work can run on a schedule, and outputs can be persisted for later consumption. In this scenario, reviews arrive in a Delta table, the dashboard only needs completed results by morning, and no interactive user requires a low-latency response. A scheduled Databricks job can process the new rows, call the selected foundation model in bulk, and write governed results back to a Unity Catalog Delta table for BI and audit. Online serving is better for per-request, user-facing applications that need responses in seconds.
- Online serving adds low-latency infrastructure for dashboard-time calls that the scenario does not require.
- RAG chatbot addresses interactive question answering over documents, not scheduled summarization of known rows.
- Persistent agent memory overbuilds the solution because the task is deterministic batch processing, not multi-turn user interaction.
Question 30
Topic: Application Development
A Databricks RAG chain for customer-policy answers is failing an output-quality check even though retrieval appears successful. What change should the engineer make first?
Prompt fragment:
Use the retrieved context to answer the user. Be helpful.
User:
Can I get shipping refunded for an opened refurbished tablet?
Retrieved context:
Opened electronics are not eligible for product refunds.
Shipping fees are not refunded. Refurbished items follow the same rules.
Evaluation note:
retrieval_hit = true
context_contains_answer = true
expected_output = JSON with {"eligible": boolean, "reason": string}
failure = response was prose and suggested asking for a shipping credit
Options:
A. Switch to a larger embedding model
B. Increase the retriever top_k value
C. Add more refund-policy documents to the index
D. Add explicit JSON and policy-boundary constraints to the prompt
Best answer: D
Explanation: This is a prompt-constraint problem, not a retrieval-coverage problem. The evaluation note says the retriever found context that contains the answer, but the generated response used prose instead of the required JSON and suggested an exception not supported by the retrieved policy. The prompt should explicitly constrain the response shape and allowed claims, such as requiring the exact JSON fields and instructing the model to answer only from retrieved policy text. Adding documents, changing top_k, or changing embeddings targets retrieval quality, which is not the observed failure.
- More retrieval fails because the artifact already marks retrieval_hit and context_contains_answer as true.
- More documents addresses missing source coverage, but the source policy facts are already present.
- Embedding change targets semantic matching, not unsupported response content or required output format.
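One possible shape for the tightened prompt, together with a check on the response format, is sketched below. The template wording, build_prompt, and is_valid_response are illustrative assumptions; only the two JSON fields and the context text come from the artifact.

```python
import json

# Doubled braces are literal braces in str.format templates.
PROMPT_TEMPLATE = """Answer using only the policy text in the context.
Do not suggest exceptions, credits, or workarounds that the context does not state.
Respond with a single JSON object and nothing else:
{{"eligible": <true or false>, "reason": "<one sentence based on the policy>"}}

Context:
{context}

Question:
{question}
"""


def build_prompt(context: str, question: str) -> str:
    """Fill the constrained template with retrieved context and the question."""
    return PROMPT_TEMPLATE.format(context=context, question=question)


def is_valid_response(raw: str) -> bool:
    """True only for a JSON object with exactly the two expected fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and set(data) == {"eligible", "reason"}
        and isinstance(data["eligible"], bool)
        and isinstance(data["reason"], str)
    )


prompt = build_prompt(
    "Shipping fees are not refunded. Refurbished items follow the same rules.",
    "Can I get shipping refunded for an opened refurbished tablet?",
)
```

The format check catches the prose failure from the evaluation note; the unsupported "shipping credit" suggestion is addressed by the policy-boundary instruction and would still need a groundedness-style evaluation to verify.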
Question 31
Topic: Governance
A team is preparing a customer-facing RAG support assistant on Databricks. An ingestion audit finds the following source record already loaded into a Unity Catalog Delta table and a Vector Search index:
source_id: vendor_admin_guide.pdf
license_note: evaluation use only; no commercial redistribution or derivative use
app_use: paid customer support assistant
policy: customer-facing GenAI apps may retrieve only sources approved for commercial reuse
Which source-text mitigation best addresses the stated concern?
Options:
A. Mask vendor product names before indexing the document
B. Add a prompt rule telling the model not to quote the document
C. Quarantine the document and rebuild the index from approved sources
D. Reduce chunk size so less text is retrieved per answer
Best answer: C
Explanation: The core issue is source licensing, not answer style or retrieval quality. The artifact says the document is limited to evaluation use and is not approved for commercial redistribution or derivative use, while the application is a paid customer-facing assistant. The appropriate mitigation is to remove or quarantine that source from the governed corpus and ensure the Vector Search index used by the app is rebuilt or synced only from approved sources. Keeping the text available to the retriever would still allow the application to use restricted material, even if the prompt discourages quoting. For licensing concerns, source exclusion and provenance control are stronger than downstream formatting controls.
- Prompt-only control fails because the model could still use restricted source material even if it avoids direct quotation.
- Name masking addresses sensitive identifiers, not whether the source license allows commercial use.
- Smaller chunks may reduce verbatim exposure, but they do not make an unapproved source permissible for retrieval.
Question 32
Topic: Application Development
A Databricks team is evaluating a RAG assistant for internal IT policies. It uses Mosaic AI Vector Search over a Unity Catalog Delta table, and the prompt already says to answer only from retrieved policy text and cite the source. In MLflow traces for failed questions, the Foundation Model API response is fluent but cites outdated VPN setup pages; the current VPN policy exists in the table but is absent from the retrieved chunks. Which engineering decision best identifies the response-quality issue?
Options:
A. Mask policy text before indexing it
B. Diagnose poor retrieval in the Vector Search pipeline
C. Replace the model with a larger reasoning model
D. Strengthen the prompt to forbid outdated answers
Best answer: B
Explanation: The deciding signal is retrieval quality. The correct policy is present in the governed Delta source, but the retrieved context contains outdated pages, so the LLM is grounded in the wrong evidence. In a Databricks RAG workflow, MLflow traces that show retrieved chunks are especially useful for separating retrieval failures from prompt or model failures. The next engineering focus should be the Vector Search pipeline: index freshness, chunking, metadata filters, embeddings, and ranking behavior. A stronger prompt or larger model may improve wording, but it will not reliably recover facts that never reached the prompt.
- Prompt tightening is unlikely to help because the prompt already requires grounded answers and the retrieved evidence is stale.
- Larger model does not solve missing evidence; it may produce a more fluent answer from the same wrong chunks.
- Masking source text addresses sensitive data exposure, not relevance of retrieved policy chunks.
Question 33
Topic: Application Development
A team is building a Databricks support assistant. It must accept a chat question, retrieve relevant chunks from a Mosaic AI Vector Search index over Unity Catalog documentation, pass that context to a Databricks-hosted foundation model, return a cited answer, and log traces for evaluation. No autonomous tools or multi-agent handoffs are needed. Which development tool is the best engineering choice?
Options:
A. Agent Framework with a multi-agent supervisor
B. A LangChain chain logged with MLflow
C. A Delta pipeline using ai_query()
D. AI Gateway rate limits and inference tables
Best answer: B
Explanation: A LangChain-style chain is the best fit when the main development task is to connect the user question, retriever output, prompt construction, model call, and final response handling. On Databricks, this can integrate with Mosaic AI Vector Search, Foundation Model APIs or Model Serving, and MLflow tracing or logging for evaluation. The scenario describes a coherent RAG application flow, not an autonomous agent system or a batch data pipeline. The key distinction is orchestration of application steps: monitoring, governance, and batch inference tools may support the app, but they do not replace the chain that coordinates the request-time logic.
- AI Gateway focus fails because rate limits and inference tables control and observe usage, not assemble the RAG request flow.
- Multi-agent supervisor overbuilds the solution because the assistant has no autonomous tools, planning, or handoffs.
- Delta pipeline inference fits table-oriented batch processing, not an interactive chat chain with retrieval and cited response handling.
Question 34
Topic: Evaluation and Monitoring
An engineering team has deployed a Databricks Agent Framework support agent behind Model Serving. Pre-release quality was acceptable, but the product owner now wants ongoing week-over-week tracking of production behavior, not just release-gate tests. Which action best satisfies the requirement?
Artifact:
Production signal available:
- inference log table with requests/responses
- trace records with tool calls and retrieved docs
- user feedback: thumbs up/down
Need: trends for answer quality, failures, and latency over time
Options:
A. Use AI Gateway usage tables to summarize requests and token volume
B. Configure Agent Monitoring using the live traces, feedback, and metrics
C. Rebuild the Vector Search index with smaller chunks
D. Rerun MLflow evaluation on the static test set every week
Best answer: B
Explanation: Agent Monitoring is the Databricks capability aligned to ongoing production monitoring of an LLM endpoint or agent. In this scenario, the artifact shows live inference logs, traces, tool-call records, retrieved-document evidence, user feedback, and a requirement to trend quality, failures, and latency over time. Those are monitoring signals, not just offline evaluation inputs. Agent Monitoring can use production behavior to surface regressions and operational issues after deployment.
Offline MLflow evaluation remains useful before release, and AI Gateway usage data helps with traffic and cost controls, but neither fully tracks agent behavior and quality trends from live interactions.
- Static evaluation fails because it measures a fixed test set, not ongoing production conversations.
- Usage summaries help with request volume and cost, but they do not directly assess answer quality or tool failures.
- Index rebuilding may improve retrieval, but it is a remediation step, not a live monitoring approach.
Question 35
Topic: Design Applications
A Databricks Agent Framework support assistant uses Vector Search to retrieve warranty policy and order history. After retrieval, the model can decide whether a replacement is allowed. If allowed, the app must create a replacement request in an existing fulfillment system, pass only structured fields, and keep the retrieval path read-only. Which tool is the best engineering decision?
Options:
A. A direct write from the retriever to the orders table
B. A prompt template that asks the model to update the order
C. An action tool that calls the fulfillment API
D. A second Vector Search retriever for fulfillment requests
Best answer: C
Explanation: In a multi-stage GenAI application, retrieval tools gather context, while action tools perform side effects after the model has enough information to decide. Here, the assistant first uses Vector Search for read-only warranty and order context. Once eligibility is determined, a dedicated action tool should call the approved fulfillment API with a constrained schema, auditability, and least-privilege access. This keeps decision-making, context gathering, and state-changing operations separated.
The key takeaway is to use retrieval for knowledge and a controlled action tool for external changes.
- Extra retriever fails because Vector Search can fetch context but should not create fulfillment requests.
- Prompt-only update fails because a prompt cannot reliably perform an external side effect or enforce structured API inputs.
- Retriever write path fails because it mixes read-only context retrieval with a state-changing operation.
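The separation between read-only retrieval and a state-changing action can be sketched as a small tool function with a fixed input schema. Everything named here is hypothetical: ReplacementRequest, its fields, the allowed reason codes, and the injected send callable that stands in for the existing fulfillment API client.

```python
from dataclasses import asdict, dataclass

# Hypothetical reason codes the fulfillment system is assumed to accept.
ALLOWED_REASONS = {"defective", "damaged_in_transit"}


@dataclass(frozen=True)
class ReplacementRequest:
    """The only fields the action tool is allowed to pass along."""
    order_id: str
    sku: str
    reason_code: str


def create_replacement(request: ReplacementRequest, send):
    """Validate the structured request, then call the fulfillment client."""
    if request.reason_code not in ALLOWED_REASONS:
        raise ValueError(f"unsupported reason_code: {request.reason_code!r}")
    return send(asdict(request))  # no free-form model text is forwarded


# Usage with a fake client that records what would be sent.
sent = []


def fake_send(payload: dict) -> dict:
    sent.append(payload)
    return {"status": "created"}


result = create_replacement(ReplacementRequest("o-1", "sku-9", "defective"), fake_send)
```

Because the retriever never receives the send callable, the retrieval path stays read-only, and the tool's schema limits what the model can pass to the external system.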
Question 36
Topic: Data Preparation
A team is preparing a Delta table of chunked support documents for a RAG application that will use Mosaic AI Vector Search. The retriever must filter results by product, document type, publication date, content owner, user access scope, and source authority before sending context to the LLM. Which metadata configuration best supports this requirement?
Options:
A. Add product, doc_type, published_at, owner_team, access_scope, and source_authority columns per chunk.
B. Append all filter values to the chunk text before embedding and indexing.
C. Add only product and doc_type, then enforce freshness and access in the prompt.
D. Add chunk_id, token_count, embedding_model, vector_dimension, and ingestion_job_id columns per chunk.
Best answer: A
Explanation: Retrieval filters work best when the values needed for filtering are stored as structured metadata alongside each chunk in the source Delta table and carried into the Vector Search index. In this scenario, the required filters are business and governance attributes: product, document type, publication date, owner, access scope, and source authority. These fields let the retriever restrict candidate chunks before the LLM sees them, improving relevance and helping enforce access boundaries. Operational fields such as token counts or ingestion job IDs may help pipeline debugging, but they do not satisfy the user-facing retrieval filters.
- Operational metadata supports maintenance but does not express product, authority, owner, or access filters.
- Prompt-only enforcement is too late for freshness and access because irrelevant or unauthorized chunks may already be retrieved.
- Text-only values make filtering unreliable because the retriever would need semantic matching instead of structured metadata predicates.
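To make the idea of structured metadata predicates concrete, here is a small pure-Python sketch. It does not call Vector Search; it only shows that per-chunk columns allow exact filtering before any ranking happens. The sample rows, the `eligible_chunks` helper, and the specific values are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Set

# Hypothetical chunk rows; in the scenario these would be rows of the source Delta table.
CHUNKS: List[Dict[str, Any]] = [
    {"chunk_id": "c1", "product": "router", "doc_type": "policy",
     "published_at": date(2025, 6, 1), "owner_team": "support-docs",
     "access_scope": "internal", "source_authority": "official"},
    {"chunk_id": "c2", "product": "router", "doc_type": "policy",
     "published_at": date(2022, 1, 15), "owner_team": "support-docs",
     "access_scope": "internal", "source_authority": "official"},
    {"chunk_id": "c3", "product": "router", "doc_type": "policy",
     "published_at": date(2025, 7, 1), "owner_team": "legal",
     "access_scope": "restricted", "source_authority": "official"},
    {"chunk_id": "c4", "product": "modem", "doc_type": "faq",
     "published_at": date(2025, 5, 1), "owner_team": "support-docs",
     "access_scope": "internal", "source_authority": "community"},
]


def eligible_chunks(
    rows: List[Dict[str, Any]],
    product: str,
    doc_type: str,
    published_on_or_after: date,
    user_scopes: Set[str],
    source_authority: str,
) -> List[Dict[str, Any]]:
    """Keep only chunks whose metadata columns satisfy every predicate."""
    return [
        row for row in rows
        if row["product"] == product
        and row["doc_type"] == doc_type
        and row["published_at"] >= published_on_or_after
        and row["access_scope"] in user_scopes
        and row["source_authority"] == source_authority
    ]
```

None of these predicates could be evaluated reliably if the values existed only inside the embedded chunk text.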
Question 37
Topic: Application Development
A Databricks RAG chain answers questions from governed policy documents. For one query, the retrieved context says the travel stipend is “under review” and that “no effective date has been approved.” The draft answer says, “Employees will definitely receive a $2,000 stipend starting July 1.” The team wants the application to reject or revise answers that overstate certainty beyond the retrieved context. Which chain component is the best fit?
Options:
A. A Unity Catalog row filter
B. A groundedness check against retrieved context
C. A lower model temperature
D. A higher Vector Search `top_k` value
Best answer: B
Explanation: The issue is response groundedness, not retrieval breadth, access control, or randomness. The generated answer makes definitive claims about amount and timing that the retrieved context does not support. A response-quality or safety step should compare the draft answer to the supplied context and reject or revise unsupported confident statements, such as replacing them with a caveated answer that the stipend is still under review. Lowering temperature may reduce variation, but it does not prove claims are supported by context.
- More retrieval may help missing-context cases, but the visible context already contradicts a definitive answer.
- Row filtering controls which data a user can access; it does not assess whether the response overstates support.
- Lower temperature can make output more deterministic, but unsupported certainty can still appear.
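As a rough illustration of where such a check sits in the chain, the sketch below flags figures and certainty words in a draft that do not appear in the retrieved context. This lexical heuristic is a deliberately simple stand-in, not how a production groundedness evaluator works (those typically use an LLM judge or an entailment model); `unsupported_claims` and `CERTAINTY_WORDS` are hypothetical names.

```python
import re
from typing import List

# Hypothetical list of words that signal certainty.
CERTAINTY_WORDS = ("always", "certainly", "definitely", "guaranteed")
# Matches amounts and numbers such as "$2,000" or "1".
_FIGURE = re.compile(r"\$?\d[\d,]*(?:\.\d+)?")


def unsupported_claims(draft: str, context: str) -> List[str]:
    """Return reasons the draft may overstate what the context supports."""
    issues: List[str] = []
    for figure in _FIGURE.findall(draft):
        if figure not in context:
            issues.append(f"figure not in context: {figure}")
    draft_lower, context_lower = draft.lower(), context.lower()
    for word in CERTAINTY_WORDS:
        in_draft = re.search(rf"\b{word}\b", draft_lower) is not None
        if in_draft and word not in context_lower:
            issues.append(f"certainty word not in context: {word}")
    return issues
```

A chain could reject or regenerate the draft whenever the returned list is non-empty, regardless of the temperature used to produce it.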
Question 38
Topic: Application Development
A Databricks team is selecting a Foundation Model API endpoint for a customer-support RAG application. The release criteria are: MLflow evaluation quality score at least 0.82, safety pass rate at least 99%, p95 latency under 2 seconds, and average model cost no more than $0.01 per answer.
| Candidate | Quality score | Safety pass | p95 latency | Avg cost/answer |
|---|---|---|---|---|
| Smaller model | 0.84 | 99.2% | 1.4 sec | $0.004 |
| Larger model | 0.88 | 99.4% | 3.1 sec | $0.024 |
Which is the BEST engineering decision?
Options:
A. Deploy the larger model because it has the highest quality score.
B. Fine-tune the larger model before deployment.
C. Call both models for every answer and merge responses.
D. Deploy the smaller model for this RAG chain.
Best answer: D
Explanation: Model selection should use the application’s release criteria, not only the highest evaluation score. Here, both candidates meet the minimum quality and safety targets, but only the smaller model also satisfies latency and cost limits. The larger model’s quality gain is modest and does not justify violating two explicit deployment constraints. In a Databricks workflow, the team can record this decision with MLflow evaluation results and continue monitoring production quality, latency, and cost after release.
The key takeaway is to choose the smallest model that meets the visible business and engineering requirements.
- Highest score trap fails because the larger model violates the p95 latency and cost constraints.
- Fine-tuning detour is unnecessary because the smaller model already meets the release criteria.
- Dual-model overbuild adds cost and latency without a stated need for escalation or ensemble behavior.
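The comparison can be written as an explicit gate over the four release criteria. The thresholds and candidate values below come from the scenario; the `Candidate` class and `failed_criteria` function are illustrative names, not part of MLflow or any Databricks API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float          # MLflow evaluation quality score
    safety_pass: float      # fraction of answers passing safety checks
    p95_latency_s: float    # seconds
    cost_per_answer: float  # US dollars


def failed_criteria(c: Candidate) -> List[str]:
    """List every release criterion the candidate violates."""
    failures: List[str] = []
    if c.quality < 0.82:
        failures.append("quality")
    if c.safety_pass < 0.99:
        failures.append("safety")
    if c.p95_latency_s >= 2.0:  # requirement is "under 2 seconds"
        failures.append("latency")
    if c.cost_per_answer > 0.01:
        failures.append("cost")
    return failures


smaller = Candidate("Smaller model", 0.84, 0.992, 1.4, 0.004)
larger = Candidate("Larger model", 0.88, 0.994, 3.1, 0.024)
```

Here `failed_criteria(smaller)` is empty, while the larger model fails on latency and cost despite its higher quality score.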
Question 39
Topic: Application Development
A team is iterating on a Databricks RAG agent. Before release, they must compare prompt versions, retriever parameters, traces, and offline evaluation scores. After release, operations must observe real user requests, latency, token usage, and response quality from the served app. Which setup best separates these needs?
Options:
A. Use inference tables for development comparisons and MLflow run metrics for live operations.
B. Use MLflow Tracking for development runs and Agent Monitoring with inference logging for the served app.
C. Add more chain trace spans and disable endpoint-level logging after deployment.
D. Register each prompt in Unity Catalog and use Vector Search index metrics for live operations.
Best answer: B
Explanation: Experiment tracking and live monitoring happen at different layers of the GenAI lifecycle. During development, MLflow Tracking is used to record runs, parameters, prompt versions, traces, artifacts, and evaluation metrics so engineers can compare candidate chains before release. After the application is served, monitoring must observe production traffic and operational signals, such as requests, latency, token usage, failures, and quality trends. Databricks monitoring features such as Agent Monitoring and inference logging/inference tables are designed for that live deployed context. The key distinction is offline comparison of experiments versus ongoing observation of a running application.
- Reversed layers fails because inference tables are for served traffic, not the primary tool for comparing offline development runs.
- Governance focus fails because Unity Catalog registration supports governance, and Vector Search metrics do not replace live app monitoring.
- Tracing only fails because traces help diagnose behavior, but disabling endpoint logging removes the live operational evidence needed after deployment.
Question 40
Topic: Data Preparation
A team is preparing a Databricks RAG assistant for support agents to answer warranty questions. Answers must use current Legal-approved policy and include regional exceptions; the production Vector Search index will be built from Unity Catalog sources. A pilot shows EU exception questions often retrieve archived ticket comments that contradict the official exception table.
Exhibit: candidate sources
| Source | Status | Coverage |
|---|---|---|
| Legal FAQ 2026 | Legal-approved, current | Core policy only |
| Regional exceptions | Official source, updated monthly | APAC/EU exceptions |
| Warranty playbook 2023 | Draft, no owner | US-only |
| Archived tickets | Mixed authors, 2021-2024 | Inconsistent details |
Which is the best engineering decision before building the production index?
Options:
A. Build and coverage-test a corpus using Legal FAQ 2026 and Regional exceptions only.
B. Keep the corpus and use a larger Foundation Model.
C. Index all sources and rely on Vector Search ranking.
D. Restrict retrieval to the Legal FAQ 2026 only.
Best answer: A
Explanation: In RAG data preparation, the source corpus should be evaluated before chunking and indexing. It must be current, authoritative, and complete for the application goal. Here, the Legal FAQ is current and approved for core warranty policy, but it does not cover regional exceptions. The official exceptions source supplies that missing required scope. The draft playbook and archived tickets are unauthoritative, stale, or inconsistent, and the pilot already shows they can pollute retrieval. Build the Vector Search index from the curated authoritative sources and use representative coverage checks for core and regional questions before release. Better ranking or a larger model cannot compensate for missing or conflicting source truth.
- Indexing everything fails because ranking can still surface stale, unowned, or contradictory content.
- Using only the Legal FAQ fails because regional exception handling is a stated requirement.
- A larger model fails because model choice does not establish source authority or fill verified knowledge gaps.
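The curation and coverage steps can be sketched as two small checks. The `Source` records below are a hand encoding of the exhibit, and the topic labels and the `authoritative` and `current` flags are assumptions made for illustration, not fields that exist in Unity Catalog.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Source:
    name: str
    authoritative: bool
    current: bool
    topics: FrozenSet[str]


# Hypothetical encoding of the exhibit's four candidate sources.
SOURCES: List[Source] = [
    Source("Legal FAQ 2026", True, True, frozenset({"core_policy"})),
    Source("Regional exceptions", True, True,
           frozenset({"apac_exceptions", "eu_exceptions"})),
    Source("Warranty playbook 2023", False, False, frozenset({"us_policy"})),
    Source("Archived tickets", False, False,
           frozenset({"core_policy", "eu_exceptions"})),
]

REQUIRED_TOPICS: Set[str] = {"core_policy", "apac_exceptions", "eu_exceptions"}


def curate(sources: List[Source]) -> List[Source]:
    """Keep only sources that are both authoritative and current."""
    return [s for s in sources if s.authoritative and s.current]


def missing_topics(sources: List[Source], required: Set[str]) -> Set[str]:
    """Return required topics that no selected source covers."""
    covered: Set[str] = set()
    for source in sources:
        covered |= source.topics
    return required - covered
```

With this encoding, the curated corpus covers every required topic, while the Legal FAQ alone leaves both regional exception topics uncovered.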
Question 41
Topic: Assembling and Deploying Applications
A Databricks-hosted customer-support agent already uses Mosaic AI Vector Search to retrieve policy articles. The next release must remember structured workflow facts, such as account_id, entitlement tier, open case ID, and approved escalation status. These facts are updated by tool calls, must survive Model Serving endpoint restarts, and must be read back exactly for the same account under Unity Catalog governance. Which datastore behavior is the best engineering decision?
Options:
A. Use schema-defined persistent records with keyed reads and updates
B. Keep the facts only in the model prompt context
C. Embed each workflow fact and retrieve by semantic similarity
D. Store the facts only in MLflow traces for later inspection
Best answer: A
Explanation: Structured information used as workflow memory should be stored as durable, schema-defined records that the application can read and update deterministically. In this scenario, the agent needs exact facts for the same account, persistence across serving restarts, and Unity Catalog governance. A governed Delta table or similar persistent memory store with keys, columns, and update behavior fits those requirements. Vector Search is still useful for unstructured policy articles, but embeddings are not the right primary mechanism for exact account state.
- Semantic retrieval is suitable for policy text, but approximate similarity can return the wrong account fact or stale workflow state.
- Prompt-only memory fails because context disappears across sessions and endpoint restarts.
- MLflow traces help observability and debugging, but they are not the operational datastore for exact workflow state.
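The keyed read and update behavior can be illustrated with a schema-defined table. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in for a governed Delta table or similar store; the table name, column names, and helper functions are hypothetical, and an in-memory database does not itself survive restarts.

```python
import sqlite3
from typing import Any, Dict, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the schema-defined table if needed."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS workflow_state (
            account_id TEXT PRIMARY KEY,
            entitlement_tier TEXT NOT NULL,
            open_case_id TEXT,
            escalation_approved INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def upsert_state(conn: sqlite3.Connection, account_id: str, entitlement_tier: str,
                 open_case_id: Optional[str], escalation_approved: bool) -> None:
    """Write the full record for one account, replacing any earlier version."""
    conn.execute(
        "INSERT OR REPLACE INTO workflow_state VALUES (?, ?, ?, ?)",
        (account_id, entitlement_tier, open_case_id, int(escalation_approved)),
    )
    conn.commit()


def read_state(conn: sqlite3.Connection, account_id: str) -> Optional[Dict[str, Any]]:
    """Exact keyed lookup: returns this account's record or None."""
    row = conn.execute(
        "SELECT * FROM workflow_state WHERE account_id = ?", (account_id,)
    ).fetchone()
    return dict(row) if row is not None else None
```

A lookup by `account_id` returns that account's record or nothing, which is the deterministic behavior that similarity search cannot guarantee.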
Question 42
Topic: Assembling and Deploying Applications
A RAG application needs to retrieve support-document chunks that are semantically similar to a user’s natural-language question. The team already created a Mosaic AI Vector Search Delta Sync index on chunk_text using a Databricks-managed embedding endpoint. Which query approach should the application use at runtime?
Options:
A. Call `ai_query()` to summarize all chunks.
B. Query the source Delta table with `LIKE`.
C. Search MLflow model versions by tag.
D. Run Vector Search similarity search with `query_text`.
Best answer: D
Explanation: For semantic retrieval from a Mosaic AI Vector Search index, the runtime chain should issue a similarity search against the index. Because this index was created on chunk_text with a Databricks-managed embedding endpoint, the application can pass the user’s question as query_text; Vector Search handles query embedding and nearest-neighbor lookup, then returns the requested chunk columns and scores. This is the retrieval step that supplies context to the generation step. Querying the Delta table with string matching would only find lexical matches, and summarizing all chunks skips retrieval entirely.
- SQL string matching misses semantically related text when the same meaning uses different wording.
- MLflow search manages model artifacts and versions; it does not retrieve document chunks from a Vector Search index.
- Summarizing all chunks solves a generation problem, not efficient semantic context retrieval.
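A retrieval helper for this step might look like the sketch below. It assumes an index object exposing `similarity_search(query_text=..., columns=..., num_results=...)` and a response containing `manifest.columns` and `result.data_array`, which is how the `databricks-vectorsearch` Python client is understood to behave; verify both against the current client documentation. `retrieve_chunks`, `FakeIndex`, and the `chunk_id` column are hypothetical and exist only so the example runs without a workspace.

```python
from typing import Any, Dict, List


def retrieve_chunks(index: Any, question: str, k: int = 3) -> List[Dict[str, Any]]:
    """Pass the raw question as query_text and return one dict per hit."""
    response = index.similarity_search(
        query_text=question,
        columns=["chunk_id", "chunk_text"],
        num_results=k,
    )
    names = [col["name"] for col in response["manifest"]["columns"]]
    return [dict(zip(names, row)) for row in response["result"]["data_array"]]


class FakeIndex:
    """Test double that mimics the assumed response shape."""

    def similarity_search(self, query_text: str, columns: List[str],
                          num_results: int) -> Dict[str, Any]:
        rows = [
            ["c1", "Hold the reset button for ten seconds.", 0.91],
            ["c2", "Warranty claims require proof of purchase.", 0.42],
        ]
        return {
            "manifest": {"columns": [{"name": n} for n in columns + ["score"]]},
            "result": {"data_array": rows[:num_results]},
        }
```

The application never computes an embedding itself here; that is left to the index because it was created with a managed embedding endpoint.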
Question 43
Topic: Application Development
A Databricks team is building an internal HR policy assistant. The app only needs to answer one policy question at a time from a governed handbook index and include citations.
Artifact: proposed flow
Input: employee policy question
1. Query Mosaic AI Vector Search index: hr_policy_chunks
2. Add top chunks and citation IDs to a prompt template
3. Call a Foundation Model API endpoint
Output: grounded answer with citations
Not required: tool choice, external actions, multi-step planning
Which implementation approach is most appropriate?
Options:
A. Use a lightweight RAG chain
B. Use Genie Spaces for SQL exploration
C. Create an agent that selects tools dynamically
D. Build a multi-agent supervisor workflow
Best answer: A
Explanation: A lightweight chain fits when the application has a predictable sequence of steps: retrieve context, format a prompt, call an LLM, and return a response. The artifact explicitly says there is no need for dynamic tool choice, external actions, or multi-step planning. In Databricks, this can be implemented as a RAG chain using Mosaic AI Vector Search, a prompt template, and a Foundation Model API or Model Serving endpoint. Agentic systems are better when the model must decide among tools, plan actions, coordinate agents, or interact with changing state. Here, agent orchestration would add complexity without solving a requirement.
- Multi-agent supervisor adds orchestration for coordinating specialists, which the fixed three-step flow does not need.
- Dynamic tool selection is unnecessary because the artifact names the only retrieval source and action sequence.
- Genie Spaces are suited to conversational data exploration over structured data, not this fixed unstructured RAG workflow.
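The fixed flow in the artifact maps onto a single function with no planning loop. In this sketch the `retrieve` and `generate` callables are placeholders for the Vector Search query and the Foundation Model API call; the prompt wording, the `citation_id` and `text` keys, and the function name are assumptions made for illustration.

```python
from typing import Callable, Dict, List

PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "Cite sources by their ID in square brackets.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
)


def answer_policy_question(
    question: str,
    retrieve: Callable[[str], List[Dict[str, str]]],
    generate: Callable[[str], str],
) -> str:
    """Fixed three-step chain: retrieve, format the prompt, call the model."""
    chunks = retrieve(question)                       # step 1: query the index
    context = "\n".join(
        f"[{chunk['citation_id']}] {chunk['text']}" for chunk in chunks
    )
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)  # step 2
    return generate(prompt)                           # step 3: call the endpoint
```

Because the sequence never branches, there is nothing for an agent or supervisor to decide.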
Question 44
Topic: Assembling and Deploying Applications
A team has an MLflow-logged RAG agent that passed offline evaluation in development. They need to move the exact approved version toward a production Model Serving endpoint. Constraints: production assets are governed in Unity Catalog, only the release group may approve promoted versions, and the serving service principal should have inference-only access. Which engineering decision is best?
Options:
A. Set the development run as Production in the workspace Model Registry and have serving load the latest production version.
B. Register it as a Unity Catalog model, restrict alias updates to the release group, grant the serving principal `EXECUTE`, and deploy the approved alias.
C. Deploy directly from the development MLflow run artifact and grant the serving principal read access to the run.
D. Copy the model artifact to a shared path and rely on notebook permissions for promotion approval.
Best answer: B
Explanation: Unity Catalog model registration is the right governance boundary when moving a Databricks GenAI model from development toward production. A UC registered model uses a governed catalog and schema, supports access control, and provides an auditable object for approved model versions. Restricting alias updates to the release group separates development from promotion, while granting the serving service principal EXECUTE gives the endpoint the ability to use the model without broad write or administrative access. Deploying an approved alias also avoids accidentally serving an unapproved development run or a moving “latest” artifact.
- Workspace registry promotion does not place the model under the required Unity Catalog governance boundary.
- Development artifact serving bypasses model registration and makes approval depend on run access instead of a governed model object.
- Shared artifact path relies on storage or notebook controls, not UC model-level promotion and inference permissions.
Question 45
Topic: Assembling and Deploying Applications
A team is releasing an Agent Framework application with a retriever, two tools, and a prompt template. The pipeline registers each candidate version in Unity Catalog, then promotes an approved version. The team wants an automated gate that catches defects in individual agent parts before promotion. Which CI/CD step is best?
Options:
A. Run component unit tests for each agent part
B. Run only end-to-end response evaluation
C. Enable Agent Monitoring after staging deployment
D. Update the Unity Catalog production alias
Best answer: A
Explanation: A CI/CD pipeline should include component-level tests before promotion when the requirement is to catch defects in individual parts of an agent. These tests invoke each tool, retriever, prompt template, or chain component with controlled inputs and verify expected contracts, outputs, errors, or retrieval behavior. This is different from evaluating the whole agent response, which can be useful but may hide which component failed. Promotion should happen only after these checks pass, such as before updating a Unity Catalog model alias or deploying the candidate to a serving target.
- End-to-end only misses the requirement to isolate failures in individual tools, retrievers, or prompts.
- Agent Monitoring is useful for live or staged behavior, but it occurs after deployment rather than as a pre-promotion component gate.
- Alias update performs promotion; it does not test the candidate version before promotion.
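A component-level gate can be as simple as a set of small tests that each exercise one part in isolation. The tool, prompt template, and test names below are hypothetical examples, and a real pipeline would more likely run them with a test runner such as `pytest` than with the hand-rolled loop shown here.

```python
from typing import Callable, Dict, List


def lookup_entitlement(account_id: str) -> Dict[str, str]:
    """Hypothetical tool under test."""
    if not account_id.startswith("acct-"):
        raise ValueError("malformed account_id")
    return {"account_id": account_id, "tier": "standard"}


def render_prompt(question: str, context: str) -> str:
    """Hypothetical prompt template under test."""
    return f"Context:\n{context}\n\nQuestion: {question}"


def test_tool_returns_contract_fields() -> None:
    assert set(lookup_entitlement("acct-1")) == {"account_id", "tier"}


def test_tool_rejects_malformed_id() -> None:
    try:
        lookup_entitlement("12345")
    except ValueError:
        return
    raise AssertionError("expected ValueError for malformed id")


def test_prompt_includes_context() -> None:
    assert "refund window" in render_prompt("q", "refund window is 30 days")


def run_component_tests() -> List[str]:
    """Run every component test and return the names of those that failed."""
    tests: List[Callable[[], None]] = [
        test_tool_returns_contract_fields,
        test_tool_rejects_malformed_id,
        test_prompt_includes_context,
    ]
    failures: List[str] = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures
```

The pipeline would update the alias only when `run_component_tests()` returns an empty list, so a failing name points directly at the defective part.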
Continue in the web app
Use IT Mastery for interactive Databricks Generative AI Engineer Associate practice with mixed sets, timed mocks, topic drills, explanations, and progress tracking.