AI-103 — Microsoft Azure AI Apps and Agents Developer Associate Exam Blueprint

Last revised: June 18, 2026

Practical AI-103 exam blueprint for Microsoft Azure AI Apps and Agents Developer Associate candidates covering Azure AI apps, agents, RAG, security, monitoring, and final review.

How to Use This AI-103 Exam Blueprint

Use this checklist as a practical readiness map for the Microsoft Azure AI Apps and Agents Developer Associate (AI-103) exam from Microsoft. It is designed to help you translate the exam identity into review tasks: what to know, what to build, what to troubleshoot, and what decisions you should be able to justify.

This is not a list of exact official weights or scoring rules. Treat it as a final-review blueprint for the major readiness areas likely to matter when developing Azure-based AI apps and agents.

For each section, ask:

Can I explain the concept without notes?
Can I choose the right Azure service or pattern for a scenario?
Can I recognize the security, cost, quality, and operational tradeoffs?
Can I troubleshoot a broken or low-quality AI app workflow?

Topic-Area Readiness Table

Readiness area	What to review	You are ready when you can…
Azure AI app architecture	App flow, model access, API integration, user experience, backend services, orchestration	Design an end-to-end AI app pattern from user request to model response, including error handling and observability
Generative AI foundations	Prompting, grounding, context windows, embeddings, model selection, temperature-like behavior, structured outputs	Explain why a response changes, how to improve it, and when to use grounding instead of prompt-only answers
Azure AI Foundry and project workflow	Model deployment, app development workflow, evaluation, responsible AI features, prompt iteration	Describe how a team would build, test, evaluate, and deploy an AI solution using Azure AI development tooling
Agents and tool use	Agent instructions, tools/functions, actions, state, memory, retrieval, orchestration, human escalation	Decide when an agent is appropriate and design safe tool-calling behavior
Retrieval-augmented generation	Data ingestion, chunking, embeddings, vector search, hybrid search, ranking, citations, freshness	Build and troubleshoot a grounded chat or Q&A pattern over private data
Azure AI Search	Indexes, fields, analyzers, vector fields, filters, scoring/ranking, semantic capabilities, indexers	Choose an indexing/search design for keyword, vector, hybrid, or filtered retrieval scenarios
Azure AI services	Language, Vision, Speech, Document Intelligence, Translator, Content Safety, and related API patterns	Select the appropriate AI service for text, image, audio, document, moderation, or extraction scenarios
Responsible AI and safety	Content filtering, jailbreak and prompt injection risks, protected data, groundedness, transparency, review workflows	Add practical controls to reduce unsafe, ungrounded, or unauthorized model behavior
Security and identity	Microsoft Entra ID, managed identities, RBAC, keys, secrets, Key Vault, network controls	Choose secure authentication and authorization patterns for AI apps and services
Data governance	Data classification, access boundaries, logging, retention, data minimization, tenant/project separation	Identify where sensitive data can leak and how to reduce exposure
Deployment and integration	APIs, containers, app hosting, CI/CD, configuration, model endpoint references, environment separation	Move a prototype toward a maintainable dev/test/prod deployment
Monitoring and evaluation	Quality metrics, telemetry, traces, latency, failures, token usage, retrieval quality, user feedback	Diagnose whether a problem is caused by prompts, retrieval, tools, model behavior, or infrastructure
Troubleshooting	Authentication errors, endpoint issues, poor search recall, hallucinations, unsafe outputs, latency, throttling symptoms	Work from symptom to likely cause and choose a practical fix

Core “Can You Do This?” Checklist

Azure AI App Development Basics

Explain the difference between an AI model, a model deployment, an endpoint, an app, an agent, and a tool.
Describe a typical request flow for a chat, summarization, extraction, classification, or multimodal AI app.
Identify where configuration belongs: code, environment variables, managed configuration, Key Vault, or deployment settings.
Explain when to call an Azure AI service directly versus when to orchestrate multiple services.
Recognize when a prototype design is not production-ready because of missing security, monitoring, evaluation, or data controls.
Describe how retry behavior, timeouts, and fallback responses affect user experience.
Explain how latency, cost, response quality, and safety requirements influence model and architecture choices.

Generative AI and Prompting

Write clear system-level instructions that define role, boundaries, allowed sources, and output format.
Use prompt examples to guide style or classification behavior.
Separate user input from trusted instructions and retrieved context.
Ask for structured output when downstream code needs predictable fields.
Explain why a model may hallucinate when it lacks grounding or when retrieved context is weak.
Recognize when prompt tuning alone is insufficient and retrieval, tools, evaluation, or fine-tuning-like approaches should be considered.
Design a prompt that tells the model what to do when the answer is not present in the supplied context.

Agents and Tool-Calling

Explain when an agent is useful: multi-step tasks, tool use, planning, retrieval, or workflow orchestration.
Define a tool/function with clear inputs, outputs, validation, and failure behavior.
Distinguish between agent instructions and user instructions.
Limit what an agent can do through least-privilege tools.
Add confirmation or human review for risky actions.
Prevent the agent from treating untrusted retrieved text as system instructions.
Troubleshoot loops, repeated tool calls, missing parameters, and tool output misinterpretation.

Retrieval-Augmented Generation

Explain the purpose of chunking, embeddings, vector search, keyword search, and hybrid retrieval.
Choose appropriate metadata fields for filtering, authorization, freshness, and citations.
Recognize symptoms of poor retrieval: irrelevant chunks, missing key documents, stale content, weak citations, or inconsistent answers.
Explain the difference between improving retrieval and changing the generation prompt.
Design a grounded answer format that includes citations or source references when needed.
Evaluate retrieval quality with representative questions, not only happy-path demos.

Azure AI Services Integration

Select the right service pattern for language understanding, document extraction, image analysis, speech, translation, or moderation.
Explain synchronous versus asynchronous processing patterns for larger or longer-running tasks.
Handle API credentials or identities securely.
Parse service responses into application-ready data structures.
Recognize when a prebuilt model/service is sufficient versus when a custom workflow is needed.
Implement error handling for invalid inputs, unsupported formats, authentication failures, and service-side failures.

Service and Pattern Selection Checks

Scenario cue	Strong answer pattern	Watch for this trap
“Answer questions from company documents”	RAG with Azure AI Search, embeddings, metadata filters, citations, and grounded prompt instructions	Relying on a generic model prompt without retrieval
“Perform actions in another system”	Agent or orchestration layer with constrained tools, validation, permissions, and audit logs	Letting the model directly decide unrestricted actions
“Extract fields from invoices, forms, or PDFs”	Document Intelligence or document-processing workflow, followed by validation and business rules	Treating all document extraction as plain chat completion
“Moderate user-generated content”	Content safety checks before or after generation, depending on risk	Assuming model instructions alone are a safety control
“Search legal or policy documents with exact terms and semantic meaning”	Hybrid search with keyword, vector, filters, and ranking considerations	Using only vector search and losing exact-term behavior
“Support multiple departments with different document access”	Security trimming, metadata filters, identity-aware retrieval, and authorization checks	Indexing all content together without access boundaries
“Reduce hallucinations”	Improve grounding, prompt constraints, citations, evaluation, and refusal behavior	Only lowering randomness or adding “be accurate” to the prompt
“Improve slow responses”	Inspect retrieval time, model latency, tool calls, payload size, and streaming options	Blaming the model before measuring the full request path
“Move from demo to production”	Add identity, secrets management, logging, monitoring, evaluation, deployment automation, and rollback strategy	Shipping notebook or portal-only configuration as the production design

Agent Readiness Checklist

Agents are likely to test judgment, not just definitions. Be ready to reason about what the agent is allowed to know, decide, and do.

Agent design element	Review questions
Instructions	Are role, task, boundaries, source priority, and refusal rules clear?
Tools	Are tool names, descriptions, parameters, and return values unambiguous?
Permissions	Does each tool have only the access needed for its task?
State	What conversation or task state is retained, and for how long?
Memory	Is memory necessary, user-approved, scoped, and safe?
Retrieval	Does the agent retrieve authoritative information before answering?
Validation	Are tool inputs validated before execution?
Confirmation	Are destructive, expensive, or external actions confirmed?
Error handling	What happens if a tool fails, returns partial data, or times out?
Observability	Can you trace the plan, tool calls, retrieved context, and final answer?
Safety	Can the agent resist prompt injection embedded in documents, emails, or web content?

Agent Decision Prompts

Can you answer these without guessing?

Should this be a simple chat app, a workflow, or an agent?
Which tools should be available to the agent, and which should remain unavailable?
What tool calls require human approval?
What should the agent do if two tools return conflicting information?
How should the app record tool calls for debugging and audit?
How do you prevent user text from overriding system or developer instructions?
How do you prevent retrieved documents from becoming hidden instructions?

RAG and Azure AI Search Checklist

Retrieval Design

Design choice	What to know
Chunking	Smaller chunks can improve precision but may lose context; larger chunks preserve context but may retrieve irrelevant text
Embeddings	Used to represent meaning for vector similarity search
Keyword search	Useful for exact terms, identifiers, product names, legal phrases, and codes
Vector search	Useful for semantic similarity and natural-language queries
Hybrid search	Combines keyword and vector approaches for many enterprise search scenarios
Metadata filters	Support category filtering, security trimming, document type, freshness, geography, owner, or business unit
Ranking	Determines which retrieved passages are sent to the model
Citations	Help users verify grounded answers
Index refresh	Determines whether the app answers from current or stale content
Evaluation set	A representative set of questions and expected source documents

RAG Failure Diagnosis

Symptom	Likely area to inspect
Answer is fluent but unsupported	Prompt grounding rules, retrieved context quality, citation enforcement
Correct document is not retrieved	Index coverage, chunking, metadata filters, embedding strategy, query formulation
Retrieved chunks are too broad	Chunk size, overlap, field selection, ranking
User sees data they should not access	Authorization, security trimming, metadata filters, identity propagation
Good answer in test but bad in production	Content freshness, user query diversity, missing evaluation cases, environment drift
Slow responses	Index query latency, number of retrieved chunks, model latency, tool calls, payload size
Poor exact-match results	Keyword fields, analyzers, filters, hybrid search design

Retrieval Quality Metrics to Recognize

Precision and recall are useful concepts when reviewing search and retrieval quality.

\[ \text{Precision} = \frac{\text{relevant retrieved items}}{\text{all retrieved items}} \]\[ \text{Recall} = \frac{\text{relevant retrieved items}}{\text{all relevant available items}} \]

You do not need to turn every AI app into a formal information retrieval project, but you should understand the tradeoff: high precision reduces irrelevant context, while high recall reduces missed evidence.

Prompt, Output, and Grounding Checklist

Prompt concern	Readiness check
Instruction hierarchy	Can you distinguish trusted system instructions from user input and retrieved content?
Context boundaries	Can you clearly mark retrieved context so the model knows what to use?
Refusal behavior	Can you instruct the model to say it does not know when context is insufficient?
Output schema	Can you request JSON or another predictable structure for app integration?
Few-shot examples	Can you use examples to stabilize tone, classification, or extraction?
Grounding	Can you restrict answers to supplied sources when required?
Prompt injection	Can you identify malicious instructions inside user input or retrieved documents?
Evaluation	Can you compare outputs against expected behavior across multiple test cases?

Example pseudocode flow for a grounded chat app:

receive user question
authenticate user
rewrite or normalize query if needed
retrieve authorized context from search index
build prompt with system instructions + retrieved context + user question
call model endpoint
validate output for safety and format
return answer with citations
log telemetry for quality and troubleshooting

Security, Identity, and Governance Checklist

AI-103 preparation should include security judgment. Many exam scenarios are less about “can the model answer?” and more about “is this design safe and maintainable?”

Area	Be ready to decide…
Authentication	Whether the app should use managed identity, user delegation, service principal, or key-based access
Authorization	How users, apps, and agents are allowed to access resources and data
Secrets	Where keys, connection strings, and endpoint secrets should be stored
RBAC	Which identities need access to AI services, search, storage, monitoring, or deployment resources
Network access	Whether public endpoints, private connectivity, firewall rules, or network restrictions are needed
Data protection	How sensitive input, output, prompts, files, and logs are handled
Tenant/project separation	How to isolate dev/test/prod or different business units
Auditability	Whether prompts, retrieval results, tool calls, and actions can be reviewed
Compliance support	How the app supports organizational policy without claiming unsupported guarantees
Least privilege	How to avoid giving an agent or app broad access “just to make it work”

Security Scenario Cues

A user should only query documents they are permitted to read.
An agent can create tickets but must not delete records.
A developer needs local testing without committing secrets.
Logs are useful for debugging but may contain sensitive prompts or outputs.
Retrieved documents may contain malicious instructions.
A model response must not expose hidden system prompts or internal configuration.
Production and development deployments require separate identities and settings.

Responsible AI and Safety Checklist

Risk	Practical control to review
Harmful user input	Input moderation, safe completion strategy, refusal behavior
Harmful model output	Output filtering, review workflow, safety evaluation
Hallucination	Grounding, citations, retrieval evaluation, refusal when unsupported
Prompt injection	Instruction hierarchy, content isolation, tool restrictions, detection patterns
Data leakage	Data minimization, access control, logging controls, redaction where appropriate
Overreliance	User-facing caveats, source links, confidence handling, human review
Biased or unfair output	Representative tests, review of high-impact use cases, human oversight
Unsafe tool execution	Confirmation steps, allowlists, validation, audit logs

Be ready to explain that safety is layered. Prompt instructions help, but they are not a substitute for authorization, validation, monitoring, and controlled tool design.

Azure AI Services Readiness Map

Capability	What to review	Example readiness question
Language	Classification, extraction, summarization, sentiment, entity recognition, conversational language patterns	Which service or model pattern best fits a text-understanding task?
Speech	Speech-to-text, text-to-speech, translation or transcription workflows	How would you process audio input before sending text to another AI workflow?
Vision	Image analysis, OCR-related scenarios, multimodal prompts where applicable	When should you use a vision service versus a general multimodal model?
Document processing	Layout, tables, key-value extraction, document classification, validation	How do you handle low-confidence or missing fields?
Translation	Text translation and multilingual app considerations	Where should translation happen in a multi-step AI workflow?
Content safety	Input/output moderation and risk reduction	What should happen when content violates policy?
Search	Indexing, retrieval, vector and hybrid search	How do you ground model answers in private data?

Deployment, Operations, and Monitoring Checklist

Operational area	What “ready” means
Configuration	You can explain which settings differ by environment and how they are injected securely
Versioning	You can track prompt versions, app versions, model deployment references, and index schema changes
CI/CD	You understand how code, infrastructure, prompts, and tests move through environments
Telemetry	You can capture request IDs, latency, failures, retrieval details, model calls, and tool calls
Evaluation	You can run test sets before and after prompt, retrieval, or model changes
Rollback	You can revert a bad prompt, model deployment reference, tool change, or index update
Cost awareness	You can identify drivers such as model calls, retrieved context size, embeddings, indexing, and repeated agent tool calls
Reliability	You can design retries, timeouts, fallback responses, and graceful degradation
Incident review	You can inspect logs and traces without exposing sensitive data unnecessarily

Troubleshooting Decision Path

Use this flow when a scenario describes a poor answer, failed request, or unreliable agent.

    flowchart TD
	    A[Symptom reported] --> B{Request fails?}
	    B -->|Yes| C[Check auth, endpoint, deployment, quota symptoms, network, payload format]
	    B -->|No| D{Answer unsupported or wrong?}
	    D -->|Yes| E[Inspect retrieved context, prompt, model settings, citations, evaluation cases]
	    D -->|No| F{Tool or agent issue?}
	    F -->|Yes| G[Check tool schema, permissions, validation, state, loop behavior, confirmations]
	    F -->|No| H{Performance issue?}
	    H -->|Yes| I[Measure retrieval time, model latency, tool calls, payload size, streaming]
	    H -->|No| J[Review telemetry, user input, edge cases, environment changes]

Common Weak Areas and Exam Traps

Weak area	Why it hurts	Review action
Treating all AI tasks as chat prompts	Many scenarios require search, tools, services, or workflow orchestration	Practice service and pattern selection
Ignoring identity and authorization	AI apps often expose private data or perform actions	Review managed identities, RBAC, secret handling, and security trimming
Confusing retrieval problems with model problems	A model cannot answer accurately from poor or missing context	Diagnose RAG failures separately from prompt failures
Overusing agents	Not every workflow needs autonomous planning	Decide between deterministic workflow, simple model call, and agent
Weak tool definitions	Ambiguous tools cause incorrect actions or repeated calls	Practice writing clear tool schemas and validation rules
No evaluation set	Demos hide edge cases	Build representative tests for prompts, retrieval, and safety
Missing observability	You cannot troubleshoot what you cannot trace	Know what to log and how to protect sensitive logs
Unsafe logging	Prompts and outputs may contain sensitive information	Apply data minimization and access control to telemetry
Assuming content filters solve everything	Safety is layered and context-dependent	Combine filters, grounding, permissions, validation, and review
Forgetting freshness	RAG apps can answer from stale indexes	Review ingestion, indexing, and update strategies
Ignoring output format reliability	Downstream apps may break on free-form text	Use structured outputs, validation, and fallback handling

Final-Week AI-103 Review Checklist

Architecture and Service Selection

For each practice scenario, identify the user goal, data sources, AI capability, security boundary, and output format.
Explain why you would use a model, an Azure AI service, Azure AI Search, an agent, or a deterministic workflow.
Review at least three end-to-end patterns: grounded chat, document extraction, and agent with tools.
Practice eliminating attractive but unsafe or overcomplicated answers.

RAG and Search

Review chunking, embeddings, vector search, keyword search, hybrid retrieval, filters, and citations.
Diagnose at least five RAG failure symptoms and match each to a likely fix.
Explain how identity-aware retrieval prevents unauthorized answers.
Know how evaluation questions reveal weak retrieval or weak prompting.

Agents

Review tool design, parameter validation, permissions, state, memory, confirmations, and audit logs.
Identify when a workflow should not be an agent.
Practice prompt-injection scenarios involving retrieved documents or user instructions.
Explain how to stop an agent from taking unsafe actions.

Security and Responsible AI

Review managed identity, RBAC, Key Vault-style secret handling, and network access concepts.
Identify sensitive data in prompts, files, retrieved context, outputs, and logs.
Explain layered safety: moderation, grounding, authorization, validation, monitoring, and human review.
Practice scenarios where safety and usefulness conflict.

Operations and Troubleshooting

Review what telemetry should be captured for model calls, search calls, and agent tool calls.
Practice troubleshooting authentication failures, wrong answers, missing documents, slow responses, and unsafe outputs.
Explain how prompt, model, index, tool, and configuration changes are versioned and rolled back.
Review cost and latency drivers without relying on memorized pricing or quota numbers.

Practical Next Step

Pick one weak area from each category: RAG, agents, security, Azure AI services, and monitoring. For each, complete a short practice cycle: read the concept, answer scenario questions, explain your decision out loud, then review why the alternatives are weaker. Use this checklist again after practice to confirm that you can make the decision, not just recognize the term.

Study Plan

Scenario Guide