Databricks Certified Generative AI Engineer Associate Exam Blueprint

Last revised: June 29, 2026

Practical exam blueprint for the Databricks Certified Generative AI Engineer Associate exam.

How to Use This Exam Blueprint

Use this independent Exam Blueprint as a practical study map for the Databricks Certified Generative AI Engineer Associate exam, code GenAI Engineer. It is designed to help you verify that you can apply Databricks generative AI concepts in realistic engineering scenarios, not just recognize terms.

Work through the checklist in three passes:

Concept pass: Confirm you understand the purpose of each component, service, and workflow.
Scenario pass: Practice choosing the right design, data flow, or troubleshooting step from a short business requirement.
Final-readiness pass: Use the checkbox sections to identify weak areas before taking the exam.

Because exact exam weights are not provided here, the areas below are presented as readiness areas, not official weighted domains.

Exam Identity and Readiness Scope

Item	What to Know
Vendor/provider	Databricks
Exam title	Databricks Certified Generative AI Engineer Associate
Exam code	GenAI Engineer
Professional vertical	IT, data, AI engineering
Main readiness focus	Building, evaluating, deploying, governing, and troubleshooting generative AI solutions on Databricks
Practical emphasis	RAG applications, model serving, vector search, prompt engineering, evaluation, MLflow, governance, and production operations
Study approach	Combine Databricks platform knowledge with applied generative AI engineering judgment

Topic-Area Readiness Table

Readiness area	You should be able to…	Common exam-style cue
Generative AI fundamentals	Explain LLM behavior, tokens, context windows, embeddings, hallucination risk, grounding, and prompt design	“The model gives plausible but unsupported answers. What should you improve?”
Databricks GenAI architecture	Map a GenAI application to Databricks components such as notebooks, workflows, model serving, vector search, Unity Catalog, and MLflow	“Which component stores governed data or serves an endpoint?”
Retrieval-augmented generation	Design a RAG flow from source data to chunking, embeddings, indexing, retrieval, prompt construction, response generation, and evaluation	“Users need answers grounded in internal documents.”
Embeddings and vector search	Choose when to embed text, how to create searchable chunks, and how retrieval quality affects answer quality	“Search returns irrelevant passages even though documents exist.”
Prompt engineering	Improve instructions, context, examples, formatting constraints, and safety boundaries	“The model ignores output format or invents fields.”
Model selection and serving	Understand tradeoffs among hosted models, external models, foundation models, custom models, latency, cost, quality, and governance	“The team needs a low-latency endpoint with controlled access.”
MLflow and experiment tracking	Track prompts, parameters, model versions, evaluation outputs, traces, and artifacts	“You need reproducibility across model and prompt versions.”
Evaluation and quality	Select offline and online evaluation approaches, use human feedback where appropriate, and diagnose retrieval vs generation failures	“Answers are fluent but fail factual correctness tests.”
Agents and tool use	Recognize when an agentic workflow, function/tool calling, or multi-step orchestration is appropriate	“The assistant must query data, call tools, and decide next steps.”
Data governance and security	Apply Unity Catalog concepts, access control, lineage, data privacy, secrets, and responsible AI controls	“Sensitive documents should be available only to approved users.”
Deployment and operations	Move from notebook prototype to production workflow, endpoint, monitoring, evaluation, and rollback plan	“The prototype works, but failures occur after deployment.”
Troubleshooting	Identify likely causes for poor responses, slow latency, failed indexing, missing permissions, stale data, or excessive cost	“Quality dropped after a document refresh.”

Generative AI Fundamentals

Core Concepts to Review

Concept	Ready means you can explain…	Watch for
Large language model	How an LLM generates text from input context and learned patterns	Treating LLMs as deterministic databases
Token	Why input and output length affect cost, latency, and context limits	Assuming characters, words, and tokens are the same
Context window	Why only supplied context and model-accessible information can influence a response	Expecting the model to “know” private data without retrieval
Temperature	How higher or lower randomness can affect consistency and creativity	Increasing randomness when consistency is required
Prompt	Instructions, user task, retrieved context, examples, constraints, and formatting guidance	Mixing system rules, user content, and retrieved facts carelessly
Hallucination	Plausible but unsupported output	Solving hallucination only by changing wording instead of grounding
Embedding	Numeric representation used for semantic similarity	Using embeddings without considering chunk quality
RAG	Retrieval-augmented generation: retrieve relevant context, then generate grounded output	Confusing RAG with model fine-tuning

Can You Do This?

Explain why an LLM may answer incorrectly even when the prompt is well written.
Distinguish between model knowledge, prompt-provided context, and retrieved enterprise data.
Describe how embeddings support semantic search.
Explain why context quality matters more than simply adding more text.
Identify when a problem is likely a retrieval issue rather than a generation issue.
Explain why evaluation should include factuality, relevance, safety, and usefulness.
Describe the difference between prototype behavior and production GenAI requirements.

Databricks Platform Concepts for GenAI Engineering

You do not need to memorize every product detail, but you should understand how Databricks platform components fit together in a GenAI application.

Platform area	GenAI engineering role	Readiness check
Workspace and notebooks	Exploration, prototyping, prompt tests, data preparation, and model experiments	Can you describe when notebooks are useful and when to productionize with jobs/workflows?
Delta tables	Reliable structured storage for source data, prepared chunks, logs, and evaluation data	Can you identify where intermediate RAG artifacts might be stored?
Unity Catalog	Governance, permissions, lineage, discovery, and secure access to data and AI assets	Can you reason about who should access source data, indexes, models, and endpoints?
Databricks SQL	Querying governed datasets and supporting analytics use cases	Can you tell when a natural language assistant should query structured data instead of only documents?
MLflow	Tracking experiments, prompts, models, parameters, metrics, artifacts, and evaluation results	Can you explain why reproducibility matters for GenAI systems?
Model Serving	Hosting model or application endpoints for inference	Can you choose serving when an application needs API-based inference?
Vector Search	Semantic retrieval over embedded content	Can you diagnose poor retrieval from chunking, embedding, indexing, or query problems?
Workflows/jobs	Scheduled or triggered production pipelines	Can you describe how to refresh embeddings or evaluations automatically?
Secrets and credentials	Secure handling of tokens and external service credentials	Can you avoid hard-coding credentials in notebooks or applications?
Monitoring and logs	Observability for latency, errors, quality, and usage	Can you name signals that indicate a production GenAI issue?

Retrieval-Augmented Generation Readiness

RAG is a central pattern for enterprise GenAI because it grounds model output in data that the model may not have learned during training.

RAG Workflow Checklist

Step	What happens	Candidate readiness
Source selection	Identify documents, tables, pages, tickets, policies, or knowledge bases	Know how source quality and permissions affect answers
Ingestion	Load data into a reliable processing environment	Understand batch, incremental, and refresh considerations
Cleaning	Remove noise, duplicates, irrelevant markup, or broken content	Recognize that noisy input creates noisy answers
Chunking	Split content into retrievable units	Balance context completeness against retrieval precision
Metadata enrichment	Add source, owner, date, category, access level, or document ID	Use metadata for filtering, citation, and governance
Embedding	Convert chunks into vectors	Understand model compatibility and semantic similarity
Indexing	Store embeddings for vector search	Know that the index must reflect current, authorized data
Retrieval	Find relevant chunks for a user query	Tune query handling, filters, and top results
Prompt construction	Combine instructions, user question, and retrieved context	Avoid prompt injection and context confusion
Generation	Produce answer using the model	Constrain answer style, citations, and uncertainty handling
Evaluation	Measure retrieval and answer quality	Separate retrieval failure from generation failure
Monitoring	Track performance in production	Watch latency, errors, drift, freshness, and user feedback

RAG Design Prompts

Ask yourself these before choosing a design:

Is the data primarily unstructured text, structured tables, or a mix?
Does the answer require semantic search, SQL aggregation, or both?
Must answers cite source documents?
Are users allowed to access all retrieved content?
How often does source data change?
What happens when no relevant context is retrieved?
Should the assistant refuse, ask a clarifying question, or answer with uncertainty?
How will the team test whether retrieval is working?
How will refreshed documents update the vector index?
How will incorrect or outdated source content be removed?

Common RAG Weak Areas

Weak area	Symptom	Better thinking
Poor chunking	Retrieved context is incomplete or too broad	Chunk by semantic sections where possible; preserve useful metadata
Missing metadata	Cannot filter by source, user group, date, or document type	Add metadata during ingestion, not as an afterthought
Stale index	Answers reference outdated policy or old documentation	Plan index refresh and validation
Over-retrieval	Prompt includes irrelevant chunks and confuses the model	Improve retrieval precision and ranking
Under-retrieval	Model lacks enough context to answer	Improve query transformation, chunking, embedding, or top result strategy
No fallback behavior	Model invents an answer when context is absent	Instruct the model to say when information is unavailable
Ignoring permissions	Users see content they should not access	Align retrieval with governance and access controls

Embeddings and Vector Search

What to Be Ready For

Topic	Readiness target
Embedding purpose	Explain how embeddings represent semantic meaning for similarity search
Query embedding	Understand that the user query is embedded and compared with indexed content
Chunk embedding	Know that retrieval quality depends on the embedded chunk content
Metadata filtering	Use filters to restrict candidate results before or during retrieval
Similarity	Understand that “similar” does not always mean “correct” or “sufficient”
Index refresh	Recognize that new or changed source data requires refreshed searchable representations
Retrieval evaluation	Measure whether the right passages are retrieved, not just whether the final answer sounds good

Can You Diagnose This?

Scenario	Likely issue to investigate
The correct document exists, but it is never retrieved	Chunking, embedding model choice, index freshness, metadata filter, query wording
Results are semantically related but not answer-bearing	Chunk granularity, ranking, metadata, query transformation
Sensitive documents appear for unauthorized users	Permission model, metadata filters, Unity Catalog governance, application-layer checks
Latency is high during retrieval	Index design, query pattern, result count, filtering, endpoint load
Answers cite irrelevant sources	Retrieval quality, prompt structure, citation logic, post-processing

Prompt Engineering Checklist

Prompt engineering for the Databricks Certified Generative AI Engineer Associate exam is not just writing clever instructions. Be ready to reason about prompts as production artifacts that need testing, versioning, and governance.

Prompt Components

Prompt component	Purpose	Example readiness question
Role or task instruction	Defines what the assistant should do	Can you make the task unambiguous?
Constraints	Limits scope, tone, format, or allowed sources	Can you prevent unsupported claims?
Retrieved context	Provides grounded facts	Can you distinguish context from user instructions?
Examples	Demonstrate desired output	Can you use examples without overfitting the response?
Output schema	Enforces structured response	Can you specify JSON-like fields or bullet structure?
Refusal/uncertainty rule	Handles missing or unsafe information	Can you tell the model not to guess?
Citation rule	Requires source references	Can you tie claims to retrieved passages?

Prompt Readiness Checklist

Write prompts that separate system instructions, developer/application instructions, retrieved context, and user input.
Include rules for missing context, uncertainty, and unsupported questions.
Constrain output format when downstream systems need structured results.
Add examples only when they improve consistency.
Avoid placing untrusted retrieved text where it can override safety instructions.
Test prompts with normal, ambiguous, adversarial, and out-of-scope questions.
Track prompt versions and evaluation results.
Recognize when prompt tuning is not enough and retrieval, data quality, or model choice must change.

Example Prompt Skeleton

System:
You are a support assistant. Answer only from the provided context.
If the context does not contain the answer, say that the information is not available.

Context:
{retrieved_chunks}

User question:
{question}

Response requirements:
- Use concise language.
- Cite the source title when possible.
- Do not invent policy details.

Model Selection, Serving, and Inference

Model Choice Decision Table

Requirement	Consider
Fast prototype	Use an available model endpoint or managed model access pattern suitable for experimentation
Enterprise data grounding	Add RAG rather than relying only on model pretraining
Strict output structure	Prompt constraints, schema validation, post-processing, or model/tool strategy
Domain-specific language	Better retrieval, examples, fine-tuning, or specialized model selection depending on need
Low latency	Smaller/faster model, efficient prompt, fewer retrieved chunks, optimized serving
High factual accuracy	Better grounding, retrieval evaluation, citations, and human review
Cost control	Prompt length, model size, request volume, caching, batching where appropriate
Governance	Access controls, lineage, approval process, tracking, and monitoring

Serving Readiness

You should be able to reason about:

When a model or GenAI application should be exposed through a serving endpoint.
Why production inference needs authentication, authorization, and monitoring.
How input size, output length, retrieval calls, and model choice affect latency.
How to separate development, staging, and production behavior.
How to compare model versions or prompt versions before rollout.
How to handle endpoint errors, timeouts, and fallback responses.
Why logging prompts and responses may require privacy controls.

Inference Failure Cues

Cue	What to check
Endpoint works in notebook but not application	Authentication, endpoint name, network path, request format, permissions
Responses are slow	Prompt size, retrieved context count, model latency, tool calls, concurrency, downstream systems
Responses changed after deployment	Model version, prompt version, retrieval index, configuration, data refresh
Cost increased	Request volume, token usage, large context, inefficient retrieval, model selection
Users receive inconsistent answers	Temperature/settings, prompt ambiguity, retrieval variability, missing deterministic constraints

MLflow, Tracking, and Evaluation

What MLflow Readiness Looks Like

Area	You should be able to…
Experiment tracking	Track prompt versions, model parameters, evaluation data, and outputs
Artifact logging	Store relevant files, examples, metrics, and evaluation results
Model lifecycle	Understand why versioning and reproducibility matter
Comparison	Compare runs across prompts, models, retrieval settings, and datasets
Evaluation	Use metrics and qualitative review to select better candidates
Traceability	Connect a production issue back to prompt, model, data, and code changes

Evaluation Checklist

Define what a good answer means before optimizing.
Use representative questions, not only easy examples.
Include negative tests where the answer should be “not enough information.”
Evaluate retrieval separately from final answer quality.
Check factual correctness against source context.
Check relevance, completeness, conciseness, and citation accuracy.
Include safety and privacy tests.
Compare model and prompt versions using the same evaluation set.
Review failures manually to categorize root cause.
Keep evaluation artifacts so results can be reproduced.

Evaluation Dimensions

Dimension	Good result	Failure signal
Groundedness	Claims are supported by retrieved context	Unsupported facts or invented details
Relevance	Answer directly addresses the question	Tangential or generic response
Completeness	Includes necessary details without excess	Missing key steps or conditions
Faithfulness	Does not contradict source	Conflicts with retrieved document
Citation quality	Sources match claims	Citations are absent, wrong, or decorative
Safety	Avoids unsafe, private, or prohibited output	Reveals sensitive content or follows malicious instructions
Format compliance	Matches required structure	Invalid JSON, missing fields, wrong schema
Latency	Meets application needs	Too slow for user workflow

Agents, Tools, and Application Orchestration

Agentic patterns can be useful when an application must decide among actions, call tools, retrieve information, or complete multi-step tasks. Be ready to distinguish agent use cases from simpler RAG or single-prompt applications.

Pattern	Use when…	Avoid when…
Simple prompt	Task is self-contained and does not need external data	Enterprise facts or current data are required
RAG	Model needs grounded unstructured context	The answer requires precise structured calculation only
Text-to-SQL or tool call	Assistant needs to query structured data or call an API	A free-form answer is enough
Agent workflow	Task requires planning, multiple steps, tool selection, or iterative reasoning	Deterministic workflow is simpler and safer
Human-in-the-loop	Output affects high-risk decisions or needs expert approval	Fully automated action is acceptable and low risk

Agent Readiness Checklist

Explain why tool selection increases both capability and risk.
Identify when a deterministic workflow is better than an agent.
Define allowed tools, inputs, outputs, and stopping conditions.
Validate tool outputs before using them in final responses.
Prevent the model from calling tools with unauthorized or unsafe parameters.
Log traces for debugging multi-step behavior.
Evaluate not only the final answer but also the path taken.

Data Governance, Security, and Responsible AI

For Databricks GenAI engineering, governance is part of the design. Be ready for scenarios where the technically easiest solution is not the correct production answer.

Governance Readiness Table

Area	What to review	Scenario cue
Unity Catalog	Governed access to data and AI assets	“Only HR users should retrieve HR documents.”
Access control	Least privilege for users, jobs, endpoints, and service principals	“The notebook owner can access data, but the app user cannot.”
Data lineage	Understanding where source data, chunks, indexes, and outputs came from	“Which documents influenced this answer?”
Sensitive data	Handling PII, secrets, regulated data, or confidential text	“Logs contain full user prompts with private information.”
Secrets management	Avoiding hard-coded credentials	“A token is stored directly in a notebook.”
Prompt injection	Defending against malicious instructions in user input or retrieved content	“A document says: ignore previous instructions.”
Output safety	Refusal, redaction, review, or policy rules	“The assistant returns restricted information.”
Auditability	Tracking requests, versions, and decisions	“The team must explain why a response changed.”

Security and Privacy Checklist

Apply least privilege to source tables, files, models, indexes, and endpoints.
Avoid embedding or indexing data that users should not be able to retrieve.
Use metadata and governance controls to enforce access boundaries.
Do not hard-code secrets in prompts, notebooks, jobs, or application code.
Treat retrieved documents as untrusted content that may contain malicious instructions.
Decide what prompt, response, and trace data may be logged.
Redact or avoid storing sensitive information when not needed.
Validate generated output before downstream use in high-impact workflows.
Maintain lineage from answer to source where citations or audit are required.

Data Preparation for GenAI Applications

Artifact Checklist

Artifact	Why it matters	Ready when you can…
Raw source data	Original enterprise knowledge	Identify source owner, freshness, and permissions
Cleaned documents	Reduced noise and duplication	Explain cleaning rules and what was removed
Chunk table	Retrieval-ready text units	Choose chunk size strategy based on content type
Metadata columns	Filtering, governance, and citations	Name useful metadata fields for a scenario
Embedding table or index source	Vector search input	Explain how embeddings are regenerated
Vector index	Fast semantic retrieval	Describe refresh and access considerations
Evaluation dataset	Repeatable quality measurement	Build questions with expected evidence
Prompt template	Controlled generation behavior	Version and test changes
Inference endpoint	Production access path	Monitor latency, errors, and usage
Feedback table	User or reviewer signals	Use feedback to prioritize improvements

Example RAG Data Fields

document_id
source_system
source_title
source_uri_or_reference
owner_team
access_group
last_updated
chunk_id
chunk_text
chunk_order
embedding

You do not need to use these exact names, but you should understand why fields like source, owner, access group, and last-updated date are useful.

Troubleshooting Decision Points

Retrieval or Generation?

Symptom	More likely retrieval issue	More likely generation issue
Correct source is absent from context	Yes	No
Context is present but answer contradicts it	Possible	Yes
Answer is generic and lacks detail	Yes	Possible
Answer invents a policy not in context	Possible	Yes
Citations point to irrelevant documents	Yes	Possible
Output format is wrong	No	Yes
Answer omits required field from JSON	No	Yes
Model refuses safe questions	Possible prompt or safety configuration issue	Yes

Production Troubleshooting Checklist

Check whether source data changed.
Check whether the vector index was refreshed successfully.
Check whether the user has permission to retrieve needed documents.
Check whether metadata filters are too restrictive.
Check whether prompts or model parameters changed.
Check whether the endpoint version changed.
Check recent logs for errors, timeouts, or malformed requests.
Compare failing examples against the evaluation set.
Reproduce the issue with the exact prompt, context, model, and configuration.
Categorize the failure before changing the system.

Troubleshooting Flow

    flowchart TD
	    A[Bad or unexpected answer] --> B{Was relevant context retrieved?}
	    B -- No --> C[Check source data, chunking, embeddings, index freshness, filters, permissions]
	    B -- Yes --> D{Does the answer follow the retrieved context?}
	    D -- No --> E[Check prompt instructions, model behavior, safety rules, output constraints]
	    D -- Yes --> F{Is the answer still incomplete?}
	    F -- Yes --> G[Improve retrieval depth, context selection, prompt specificity, or source coverage]
	    F -- No --> H[Review evaluation criteria and user expectation]

Scenario and Decision-Point Practice

Use these prompts to test whether you can make exam-ready choices.

Scenario	Better answer should consider…
A legal team wants an assistant that answers only from approved policy documents	RAG, governed source data, metadata, access control, citations, refusal when context is missing
A support bot gives outdated answers after a documentation update	Index refresh, data pipeline schedule, source versioning, cache behavior, evaluation after refresh
A model gives correct answers in testing but leaks sensitive content in production	Permissions, retrieval filters, logging, prompt injection, user identity propagation
A finance analyst asks natural language questions about sales totals	Structured query/tool use may be better than document-only RAG
A chatbot response is too slow for end users	Prompt length, retrieval count, model choice, endpoint performance, tool call chain
A team wants to compare two prompts and two models	MLflow tracking, fixed evaluation set, metrics, artifacts, side-by-side review
A retrieved document contains “ignore previous instructions”	Treat retrieved text as untrusted context, reinforce system instructions, guard against prompt injection
A generated JSON response fails downstream parsing	Stronger output schema, validation, retry logic, lower randomness, post-processing
A user asks a question outside the knowledge base	Refusal or clarification, not hallucination
A team cannot reproduce a bad production response	Log prompt version, retrieved chunks, model version, parameters, endpoint, and trace where appropriate

Code and Configuration Awareness

The exam may test whether you understand the shape of GenAI engineering workflows. You should not rely on memorizing long code blocks, but you should recognize concise patterns.

Embedding and Retrieval Pseudocode

question = "What is the escalation policy for priority incidents?"

query_embedding = embed(question)

results = vector_search(
    embedding=query_embedding,
    filters={"document_type": "policy"},
    top_k=5
)

context = format_context(results)

answer = llm_generate(
    instructions="Answer only from the provided context. Cite sources.",
    context=context,
    question=question
)

Readiness checks:

Can you identify where permissions and filters should apply?
Can you explain what happens if results is empty?
Can you explain why top_k affects context quality, latency, and cost?
Can you explain why the prompt should tell the model not to invent missing facts?

Evaluation Pseudocode

for example in evaluation_set:
    retrieved = retrieve(example.question)
    response = generate(example.question, retrieved)

    score = evaluate(
        question=example.question,
        expected_evidence=example.expected_evidence,
        retrieved_context=retrieved,
        response=response
    )

    log_result(example.id, score, response, retrieved)