Free Databricks Generative AI Engineer Associate Practice Questions: Deploying GenAI Apps

Last revised: June 29, 2026

Practice 10 free Databricks Certified Generative AI Engineer Associate questions on Deploying GenAI Apps, with answers, explanations, and a next step in IT Mastery.

Try the IT Mastery web app for a richer interactive practice experience with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

Try Databricks Generative AI Engineer Associate on Web

Topic snapshot

Field	Detail
Practice target	Databricks Generative AI Engineer Associate
Topic area	Assembling and Deploying Applications
Blueprint weight	22%
Page purpose	Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Assembling and Deploying Applications for Databricks Generative AI Engineer Associate. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass	What to do	What to record
First attempt	Answer without checking the explanation first.	The fact, rule, calculation, or judgment point that controlled your answer.
Review	Read the explanation even when you were correct.	Why the best answer is stronger than the closest distractor.
Repair	Repeat only missed or uncertain items after a short break.	The pattern behind misses, not the answer letter.
Transfer	Return to mixed practice once the topic feels stable.	Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 22% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These are original IT Mastery practice questions aligned to this topic area. They are not official Databricks questions, copied live-exam content, or exam dumps. Use them to preview question style and explanation depth before continuing with topic drills, mixed sets, and timed mocks in IT Mastery.

Question 1

Topic: Assembling and Deploying Applications

A team is deploying a GenAI assistant in a customer portal. Users submit one question at a time, calls arrive unpredictably, and the portal must return the chain’s response through an API while the chat session is active. Which Databricks setup should the team use?

Options:

A. Deploy the chain behind a Model Serving endpoint
B. Run a scheduled batch inference query on a Delta table
C. Query a Vector Search index directly from the portal
D. Create an MLflow evaluation run for the chain

Best answer: A

Explanation: Use a Databricks Model Serving endpoint when the application needs online inference: individual requests arrive from an application, and each response must be returned promptly through an API. Batch inference queries are better for offline or asynchronous processing, such as scoring many rows in a Delta table on a schedule. In this scenario, the workload is interactive and user-facing, so the serving layer is the right deployment target for the chain. Vector Search may support retrieval inside the chain, and MLflow evaluation may validate quality, but neither replaces the serving interface.

Batch query processes stored records in bulk, which misses the requirement for live chat responses.
MLflow evaluation helps assess quality before release, but it does not expose the chain for production requests.
Vector Search alone can retrieve context, but it does not run or serve the full GenAI chain.

Question 2

Topic: Assembling and Deploying Applications

A team is deploying a customer-support summarization app on Databricks. The chain only needs to invoke a supported hosted LLM at request time, and the team does not want to package, register, or operate a custom model artifact. Which serving setup best fits this requirement?

Options:

A. Run batch inference with a scheduled Databricks Job
B. Create a Mosaic AI Vector Search endpoint
C. Invoke a Foundation Model API endpoint
D. Register an MLflow model and deploy it to Model Serving

Best answer: C

Explanation: Foundation Model APIs are the appropriate Databricks serving choice when an application needs managed invocation of supported foundation models without owning the model artifact lifecycle. The app can call the hosted model endpoint from its chain while Databricks handles the serving infrastructure for that model. This fits a deployed, online summarization app that needs request-time LLM responses but no custom packaging, registration, or deployment. A custom MLflow Model Serving endpoint is useful when the team owns a model, wrapper, or custom logic as the served artifact, but it adds the hosting responsibilities the stem explicitly rules out.

Custom Model Serving misses the no-custom-artifact requirement because it involves registering and serving an MLflow model.
Vector Search solves retrieval over embedded data, not direct invocation of a hosted LLM.
Batch jobs fit offline or scheduled processing, not request-time serving for a deployed application.

Question 3

Topic: Assembling and Deploying Applications

A team is choosing a Mosaic AI Vector Search index for a RAG app. The app must use chunks from main.support.chunked_articles, use the precomputed embedding field embedding_vec, and update only when the nightly pipeline explicitly triggers a sync.

Exhibit: Index summaries

Index	Source table	Embedding field	Update pattern
`support_chunks_trig`	`main.support.chunked_articles`	`embedding_vec`	triggered sync
`support_pages_trig`	`main.support.raw_articles`	`embedding_vec`	triggered sync
`support_chunks_text`	`main.support.chunked_articles`	`content_text`	triggered sync
`support_chunks_live`	`main.support.chunked_articles`	`embedding_vec`	continuous sync

Which index should the team use?

Options:

A. support_chunks_live
B. support_pages_trig
C. support_chunks_text
D. support_chunks_trig

Best answer: D

Explanation: When selecting a Vector Search index, the index summary must match every stated requirement, not just carry a plausible name. This app needs a Delta-backed index over main.support.chunked_articles, must use the existing vector field embedding_vec, and must not update continuously because the nightly pipeline controls when new data is ready. A triggered sync pattern fits that release process because the sync can be run after the pipeline completes. An index that uses the raw source table, a text column instead of the embedding vector field, or continuous syncing violates one of the stated requirements.

Wrong table fails because main.support.raw_articles is not the chunked source table required by the RAG app.
Wrong field fails because content_text is not the precomputed embedding vector field.
Wrong update pattern fails because continuous sync can update outside the nightly pipeline’s explicit sync point.
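
The elimination logic can be sketched in plain Python: an index qualifies only if it meets all three requirements at once. This is an illustrative check over the exhibit rows, not a Databricks API call; the `IndexSummary` type and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSummary:
    name: str
    source_table: str
    embedding_field: str
    update_pattern: str


# Rows transcribed from the exhibit (hypothetical in-memory representation).
EXHIBIT = [
    IndexSummary("support_chunks_trig", "main.support.chunked_articles", "embedding_vec", "triggered sync"),
    IndexSummary("support_pages_trig", "main.support.raw_articles", "embedding_vec", "triggered sync"),
    IndexSummary("support_chunks_text", "main.support.chunked_articles", "content_text", "triggered sync"),
    IndexSummary("support_chunks_live", "main.support.chunked_articles", "embedding_vec", "continuous sync"),
]

# The three requirements stated in the question.
REQUIRED = {
    "source_table": "main.support.chunked_articles",
    "embedding_field": "embedding_vec",
    "update_pattern": "triggered sync",
}


def failed_requirements(index, required):
    """Return the requirement fields this index violates; an empty list means it qualifies."""
    return [field for field, wanted in required.items() if getattr(index, field) != wanted]


def matching_indexes(indexes, required):
    """Keep only indexes that satisfy every requirement, not just some of them."""
    return [idx.name for idx in indexes if not failed_requirements(idx, required)]


if __name__ == "__main__":
    for idx in EXHIBIT:
        print(idx.name, failed_requirements(idx, REQUIRED) or "matches")
    print(matching_indexes(EXHIBIT, REQUIRED))  # ['support_chunks_trig']
```

Each distractor fails exactly one field, which mirrors the three elimination lines above.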

Question 4

Topic: Assembling and Deploying Applications

A RAG chain was registered in Unity Catalog and deployed as a Model Serving endpoint. The chain runs under the endpoint identity shown below, and its retriever reads a governed Delta table.

Artifact:

Endpoint: claims-rag-prod
Endpoint identity: spn-serving-claims-rag
Retriever source: main.claims.policy_chunks
Current grants:
- spn-serving-claims-rag: USE CATALOG on main
- spn-serving-claims-rag: USE SCHEMA on main.claims
- analysts: CAN QUERY on claims-rag-prod
Trace: PERMISSION_DENIED reading main.claims.policy_chunks

Which additional access is required?

Options:

A. Grant analysts SELECT on main.claims.policy_chunks
B. Grant SELECT on main.claims.policy_chunks to spn-serving-claims-rag
C. Grant CREATE MODEL on main.claims to spn-serving-claims-rag
D. Grant analysts CAN MANAGE on claims-rag-prod

Best answer: B

Explanation: For a served GenAI application, resource access for retrieval is evaluated for the identity running the serving workload. The artifact shows that spn-serving-claims-rag already has USE CATALOG and USE SCHEMA, but the trace fails while reading the governed table. The missing permission is SELECT on the retriever source table for the endpoint identity. User access to query or manage the endpoint controls who can call or administer the endpoint; it does not authorize the endpoint to read Unity Catalog data.

Granting analysts table access does not fix the failing read because the chain runs under the endpoint identity shown in the artifact.
Endpoint management permission controls administration, not governed table retrieval.
CREATE MODEL supports model registration workflows, not reading Delta table rows for a retriever.
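
A plain-Python sketch of the same diagnosis: list the Unity Catalog privileges needed to read one table (USE CATALOG on the catalog, USE SCHEMA on the schema, SELECT on the table), subtract what the endpoint identity already holds, and print the missing GRANT as SQL text. The helper names are hypothetical, the sketch ignores privileges inherited from a parent schema or catalog, and in a real workspace a service principal may need to be referenced by its application ID rather than the display name shown in the artifact.

```python
def required_read_privileges(table_fqn):
    """Privileges a principal needs to read one Unity Catalog table."""
    catalog, schema, _table = table_fqn.split(".")
    return {
        ("USE CATALOG", "CATALOG", catalog),
        ("USE SCHEMA", "SCHEMA", f"{catalog}.{schema}"),
        ("SELECT", "TABLE", table_fqn),
    }


# Grants from the artifact as (principal, privilege, object type, object name).
# The analysts CAN QUERY entry is an endpoint permission, not a table grant.
CURRENT_GRANTS = {
    ("spn-serving-claims-rag", "USE CATALOG", "CATALOG", "main"),
    ("spn-serving-claims-rag", "USE SCHEMA", "SCHEMA", "main.claims"),
    ("analysts", "CAN QUERY", "ENDPOINT", "claims-rag-prod"),
}


def missing_privileges(principal, table_fqn, grants):
    """Required read privileges that this specific principal does not hold directly."""
    held = {(priv, kind, name) for who, priv, kind, name in grants if who == principal}
    return sorted(required_read_privileges(table_fqn) - held)


def grant_statement(principal, privilege, kind, name):
    """Render a GRANT as SQL text; backticks quote the principal name."""
    return f"GRANT {privilege} ON {kind} {name} TO `{principal}`"


if __name__ == "__main__":
    principal = "spn-serving-claims-rag"
    for priv, kind, name in missing_privileges(principal, "main.claims.policy_chunks", CURRENT_GRANTS):
        print(grant_statement(principal, priv, kind, name))
    # GRANT SELECT ON TABLE main.claims.policy_chunks TO `spn-serving-claims-rag`
```

Filtering by principal first is the point: the analysts' endpoint permission never counts toward what the serving identity can read.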

Question 5

Topic: Assembling and Deploying Applications

A team is assembling a Databricks RAG app for Model Serving. Users will ask questions about product manuals stored in Unity Catalog. At request time, the app must query a Mosaic AI Vector Search index for the most relevant chunks before calling an already-selected Foundation Model API. The MLflow input example and model signature are handled separately. Which assembly element should implement the lookup step?

Options:

A. An MLflow model signature for request and response fields
B. A retriever that queries the Vector Search index
C. An embedding model that creates vector representations
D. A generation model that writes final answers

Best answer: B

Explanation: In a RAG application, the retriever is the component responsible for finding relevant context at request time. In this scenario, it should take the user’s question, query the Mosaic AI Vector Search index, and return matching chunks from the governed manuals. The embedding model may support retrieval by converting text into vectors, but it is not the lookup component itself. The generation model uses the retrieved context to produce the final answer. MLflow metadata such as an input example or model signature helps package and serve the application, but it does not perform retrieval.

Embedding model is tempting because Vector Search uses embeddings, but the model creates vectors rather than selecting and returning chunks.
Generation model is downstream of retrieval and produces the answer after context is supplied.
Model signature describes serving input and output schema; it does not implement runtime document lookup.
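
To make the division of roles concrete, here is a minimal sketch of where the lookup step sits in a RAG chain. It uses an in-memory stand-in instead of Mosaic AI Vector Search and a keyword-count function instead of a real embedding model; `toy_embed`, `InMemoryIndex`, `Retriever`, and `build_prompt` are all hypothetical. A real chain would query the Vector Search index and pass the prompt to the selected Foundation Model API.

```python
import math
import re
from dataclasses import dataclass

VOCAB = ["reset", "password", "battery", "charge", "warranty", "months"]


def toy_embed(text):
    """Stand-in embedding model: it only turns text into a vector of word counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Chunk:
    chunk_id: str
    text: str
    vector: list[float]


class InMemoryIndex:
    """Stand-in for a vector index holding precomputed chunk vectors."""

    def __init__(self, chunks):
        self.chunks = chunks

    def similarity_search(self, query_vector, k):
        ranked = sorted(self.chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)
        return ranked[:k]


class Retriever:
    """The lookup step: embed the question, query the index, return the top-k chunks."""

    def __init__(self, index, embed, k=2):
        self.index = index
        self.embed = embed
        self.k = k

    def retrieve(self, question):
        return self.index.similarity_search(self.embed(question), self.k)


def build_prompt(question, chunks):
    """Downstream of retrieval: this prompt is what the generation model would receive."""
    context = "\n".join(f"[{c.chunk_id}] {c.text}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


MANUAL_TEXT = {
    "m1": "To reset the password, hold the reset button.",
    "m2": "Charge the battery fully before first use.",
    "m3": "The warranty lasts 24 months.",
}
INDEX = InMemoryIndex([Chunk(cid, text, toy_embed(text)) for cid, text in MANUAL_TEXT.items()])

if __name__ == "__main__":
    retriever = Retriever(INDEX, toy_embed, k=1)
    question = "How do I reset my password?"
    print(build_prompt(question, retriever.retrieve(question)))
```

The embedding function appears twice (building the index and embedding the question), but only `Retriever.retrieve` selects and returns chunks, which is the distinction the question tests.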

Question 6

Topic: Assembling and Deploying Applications

A Databricks RAG assistant is giving an outdated answer about an internal policy. Review the artifact and identify the most likely cause.

User question: What is the current audit-log retention period?
Assistant answer: 90 days

Source Delta table `main.gov.audit_chunks`:
- policy_2026_update.pdf loaded March 14, says retention is 365 days

Vector Search index `main.gov.audit_vs`:
- Sync mode: TRIGGERED
- Last successful sync: March 13

Retriever top result:
- policy_2025.pdf, says retention is 90 days

Options:

A. The Vector Search index was not synced after the new content loaded.
B. The model selected for generation has too small a context window.
C. The prompt needs stricter output formatting instructions.
D. The retriever should increase top_k to include more chunks.

Best answer: A

Explanation: This is a Vector Search indexing freshness issue. In a RAG application, the retriever can only return chunks that are present in the Vector Search index. The artifact shows that the Delta table contains the updated 365-day policy, but the index last successfully synced before that update was loaded. Because the retriever returned the older 2025 chunk, the generation model answered from stale retrieved context. With a triggered-sync index, new or changed source rows are not available to retrieval until the index is synced. The key takeaway is to compare source-table freshness with index sync status before blaming prompt wording or the LLM.

Prompt formatting fails because the answer content is wrong due to stale retrieved evidence, not poor response structure.
Context window fails because the artifact shows the wrong chunk was retrieved, not that relevant context was truncated.
Increasing top_k may return more old chunks, but it does not add the missing 2026 content to the index.
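
The freshness comparison in the takeaway can be written as a tiny check. This sketch is illustrative only: the function names are hypothetical, the year 2026 is assumed because the artifact gives only month and day, and day-level dates cannot order a load and a sync that happen on the same day, so a real check should compare timestamps.

```python
from datetime import date


def index_is_stale(latest_source_load, last_successful_sync):
    """True when source rows were loaded after the index last synced successfully."""
    return latest_source_load > last_successful_sync


def next_step(latest_source_load, last_successful_sync, sync_mode):
    """Check index freshness before blaming the prompt or the model."""
    if index_is_stale(latest_source_load, last_successful_sync):
        if sync_mode == "TRIGGERED":
            return "trigger an index sync, then re-test retrieval"
        return "check the continuous sync pipeline for errors or lag"
    return "index is current; investigate retrieval ranking, prompt, or model next"


# Dates from the artifact; the year is an assumption.
LOADED = date(2026, 3, 14)
SYNCED = date(2026, 3, 13)

if __name__ == "__main__":
    print(next_step(LOADED, SYNCED, "TRIGGERED"))
```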

Question 7

Topic: Assembling and Deploying Applications

A RAG application works in a developer notebook but fails after being deployed to Databricks Model Serving. Review the endpoint artifact:

Endpoint: helpdesk-rag-prod
Served model: main.genai.helpdesk_rag/3
Run as: service-principal://svc-helpdesk-serving

Trace excerpt:
retriever.load_index("main.support.kb_index")
PERMISSION_DENIED: svc-helpdesk-serving lacks SELECT on main.support.kb_chunks

Which issue best explains the serving failure?

Options:

A. The endpoint is blocked by an AI Gateway rate limit.
B. The serving principal lacks access to a governed Unity Catalog table.
C. The retriever uses an incompatible embedding dimension.
D. The model was registered outside Unity Catalog.

Best answer: B

Explanation: Databricks Model Serving executes the deployed application under the endpoint’s configured identity, not necessarily the developer identity that tested the notebook. When the app reads governed resources, such as Unity Catalog tables or Vector Search data backed by those tables, that serving identity must have the required privileges. The artifact names the run-as principal and shows a PERMISSION_DENIED error for SELECT on main.support.kb_chunks, so the failure is an access-control issue on a governed resource. Registration, embedding configuration, and rate limiting would produce different evidence in the logs. The key takeaway is to verify the endpoint identity’s Unity Catalog privileges when a deployed app fails but the notebook succeeds.

Registration mismatch fails because the artifact shows a served model in main.genai, so registration location is not the indicated problem.
Embedding mismatch fails because the error is permission-based, not a vector dimension or retrieval-quality error.
Rate-limit block fails because the artifact shows Unity Catalog SELECT denial, not an AI Gateway throttling message.
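
The takeaway amounts to comparing the principal named in the denial with the endpoint's run-as identity. The sketch below does that with a regular expression written for the illustrative error line in this question's artifact; real Databricks error messages may be worded differently, and `diagnose` is a hypothetical helper.

```python
import re

# Matches the artifact's illustrative format: "PERMISSION_DENIED: <principal> lacks <PRIVILEGE> on <object>".
DENIAL = re.compile(
    r"PERMISSION_DENIED: (?P<principal>\S+) lacks (?P<privilege>[A-Z ]+?) on (?P<securable>\S+)"
)

TRACE = (
    'retriever.load_index("main.support.kb_index")\n'
    "PERMISSION_DENIED: svc-helpdesk-serving lacks SELECT on main.support.kb_chunks"
)


def diagnose(run_as, trace):
    """Point at the missing grant when the denied principal is the endpoint identity."""
    run_as_name = run_as.removeprefix("service-principal://")
    match = DENIAL.search(trace)
    if match is None:
        return "no permission denial found; check other causes"
    if match["principal"] != run_as_name:
        return f"denied principal {match['principal']} is not the endpoint identity {run_as_name}"
    return f"grant {match['privilege']} on {match['securable']} to the endpoint identity {run_as_name}"


if __name__ == "__main__":
    print(diagnose("service-principal://svc-helpdesk-serving", TRACE))
```

If the trace contained no denial at all, the helper falls through to other causes, which is where rate limits or embedding mismatches would be investigated.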

Question 8

Topic: Assembling and Deploying Applications

A support RAG chain is registered in Unity Catalog and deployed to a Databricks Model Serving endpoint. It uses Foundation Model APIs and a Mosaic AI Vector Search index over governed product documents. It works in the developer notebook, but production calls fail. The app must keep Unity Catalog least privilege and cannot use personal tokens.

Endpoint log: PERMISSION_DENIED
Principal: svc-serving-prod
Action: query Vector Search index main.rag.product_docs_index

Which engineering decision best fixes the failure?

Options:

A. Store the developer’s personal token as an endpoint secret.
B. Grant the serving identity access to query the Vector Search index.
C. Grant all app users SELECT on the source Delta table.
D. Recreate the endpoint with a smaller foundation model.

Best answer: B

Explanation: A serving endpoint failure that appears only after deployment often comes from the difference between the developer’s interactive identity and the identity used by the served workload. The log names svc-serving-prod and shows a denied query against the Vector Search index, so the fix is to grant that serving identity the required least-privilege access to the governed retrieval resource. This preserves Unity Catalog governance and avoids relying on a developer’s personal permissions or credentials. Changing the model or broadening user access does not address the principal that is actually failing at runtime.

User table grants miss the runtime principal shown in the log and may grant broader access than needed.
Personal token workaround violates the no-personal-token requirement and creates an unstable credential dependency.
Smaller model changes generation behavior, but the failure occurs at the retrieval step, before generation runs, when the serving identity is denied access to the governed index.

Question 9

Topic: Assembling and Deploying Applications

A team is deploying a support assistant that must call an approved Databricks foundation model endpoint. Which serving pattern best satisfies the security note?

Exhibit: deployment note

Foundation model endpoint: fm-approved-chat
Endpoint policy: only approved service principals get CAN QUERY
Application target: rag-support-agent on Databricks Model Serving
Security requirement: keep fm-approved-chat ACLs as the control point
No personal access tokens are allowed in application code

Options:

A. Deploy with endpoint resource access and grant the serving identity CAN QUERY.
B. Call the foundation model directly from the user’s browser.
C. Copy the model to an unmanaged external inference service.
D. Store an admin token in the application secret store.

Best answer: A

Explanation: For a Databricks-served GenAI application, the appropriate pattern is to deploy the application on Model Serving and declare the approved foundation model endpoint as a required serving resource. The application’s serving identity is then granted CAN QUERY on that endpoint. This keeps the foundation model endpoint’s access control list as the policy enforcement point and avoids embedding personal or broad administrative tokens in application code. The key idea is not just that the app can reach the model, but that the call path remains governed by Databricks endpoint permissions.

Using a shared token or moving inference outside the governed endpoint would weaken or bypass the access model.

Admin token shortcut fails because a broad shared credential bypasses the endpoint-specific service principal control.
Browser direct call fails because the served application, not the user's browser, must call the endpoint, and a browser call would need a user credential such as a personal token.
External unmanaged service fails because it removes the approved Databricks endpoint ACL from the control path.

Question 10

Topic: Assembling and Deploying Applications

A team is choosing how to operationalize model calls for several GenAI tasks. They prefer Databricks SQL when the input is already stored in Delta tables and the output can be written back as batch results.

Artifact: workload notes

Workload	Input	Expected behavior
Customer review tagging	`prod.support.reviews` Delta table, 2M rows	Nightly add topic and sentiment columns
Helpdesk copilot	Browser chat messages	Sub-second conversational turns
Claims agent	Each request triggers tools and approvals	Multi-step workflow per user
PDF Q&A	User question plus retrieved chunks	Interactive answer with citations

Which workload is the best fit for ai_query()?

Options:

A. PDF Q&A
B. Customer review tagging
C. Helpdesk copilot
D. Claims agent

Best answer: B

Explanation: ai_query() is appropriate when model calls can be expressed from SQL over stored data, such as applying a prompt or model endpoint to rows in a Delta table and writing the outputs to another table or columns. The review-tagging workload is a repeatable nightly batch inference job: each row already exists in governed storage, and the desired output is structured metadata persisted back to the lakehouse. Interactive chat, agent workflows, and retrieval-time Q&A usually need request orchestration, session context, tools, or retriever logic outside a simple SQL batch pattern. The key signal is stored tabular input plus row-wise model inference at batch time.

Chat latency makes the copilot a request-serving problem, not a stored-data batch inference workload.
Tool orchestration makes the claims agent better suited to an agent framework or application service.
Retrieval-time context makes PDF Q&A an interactive RAG application rather than a SQL batch over existing rows.
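
The row-wise batch pattern that makes review tagging a fit can be sketched in Python. In Databricks SQL, the per-row model call would be an ai_query() call over the Delta table; here a keyword rule stands in for the model endpoint. `keyword_tagger`, `batch_enrich`, and the sample rows are hypothetical, and the keyword matching is deliberately naive.

```python
def keyword_tagger(text):
    """Hypothetical stand-in for a model endpoint call that returns topic and sentiment."""
    lowered = text.lower()
    topic = "shipping" if ("delivery" in lowered or "shipping" in lowered) else "product"
    sentiment = "negative" if any(w in lowered for w in ("late", "broken", "bad")) else "positive"
    return {"topic": topic, "sentiment": sentiment}


def batch_enrich(rows, model):
    """Row-wise batch inference: one model call per stored row, outputs appended as new columns."""
    return [{**row, **model(row["review_text"])} for row in rows]


# Hypothetical sample rows standing in for the reviews table.
REVIEWS = [
    {"review_id": "r1", "review_text": "Delivery was late"},
    {"review_id": "r2", "review_text": "Great product, works well"},
]

if __name__ == "__main__":
    for row in batch_enrich(REVIEWS, keyword_tagger):
        print(row)
```

Nothing in this loop needs session state, tools, or a retriever, which is why the same work maps cleanly to a nightly SQL batch.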

Continue in the web app

Use IT Mastery for interactive Databricks Generative AI Engineer Associate practice with mixed sets, timed mocks, topic drills, explanations, and progress tracking.
