AI-103 — Microsoft Azure AI Apps and Agents Developer Associate Exam Blueprint
Practical AI-103 exam blueprint for Microsoft Azure AI Apps and Agents Developer Associate candidates covering Azure AI apps, agents, RAG, security, monitoring, and final review.
How to Use This AI-103 Exam Blueprint
Use this checklist as a practical readiness map for the Microsoft Azure AI Apps and Agents Developer Associate (AI-103) exam from Microsoft. It is designed to help you translate the exam identity into review tasks: what to know, what to build, what to troubleshoot, and what decisions you should be able to justify.
This is not a list of exact official weights or scoring rules. Treat it as a final-review blueprint for the major readiness areas likely to matter when developing Azure-based AI apps and agents.
For each section, ask:
- Can I explain the concept without notes?
- Can I choose the right Azure service or pattern for a scenario?
- Can I recognize the security, cost, quality, and operational tradeoffs?
- Can I troubleshoot a broken or low-quality AI app workflow?
Topic-Area Readiness Table
| Readiness area | What to review | You are ready when you can… |
|---|---|---|
| Azure AI app architecture | App flow, model access, API integration, user experience, backend services, orchestration | Design an end-to-end AI app pattern from user request to model response, including error handling and observability |
| Generative AI foundations | Prompting, grounding, context windows, embeddings, model selection, temperature-like behavior, structured outputs | Explain why a response changes, how to improve it, and when to use grounding instead of prompt-only answers |
| Azure AI Foundry and project workflow | Model deployment, app development workflow, evaluation, responsible AI features, prompt iteration | Describe how a team would build, test, evaluate, and deploy an AI solution using Azure AI development tooling |
| Agents and tool use | Agent instructions, tools/functions, actions, state, memory, retrieval, orchestration, human escalation | Decide when an agent is appropriate and design safe tool-calling behavior |
| Retrieval-augmented generation | Data ingestion, chunking, embeddings, vector search, hybrid search, ranking, citations, freshness | Build and troubleshoot a grounded chat or Q&A pattern over private data |
| Azure AI Search | Indexes, fields, analyzers, vector fields, filters, scoring/ranking, semantic capabilities, indexers | Choose an indexing/search design for keyword, vector, hybrid, or filtered retrieval scenarios |
| Azure AI services | Language, Vision, Speech, Document Intelligence, Translator, Content Safety, and related API patterns | Select the appropriate AI service for text, image, audio, document, moderation, or extraction scenarios |
| Responsible AI and safety | Content filtering, jailbreak and prompt injection risks, protected data, groundedness, transparency, review workflows | Add practical controls to reduce unsafe, ungrounded, or unauthorized model behavior |
| Security and identity | Microsoft Entra ID, managed identities, RBAC, keys, secrets, Key Vault, network controls | Choose secure authentication and authorization patterns for AI apps and services |
| Data governance | Data classification, access boundaries, logging, retention, data minimization, tenant/project separation | Identify where sensitive data can leak and how to reduce exposure |
| Deployment and integration | APIs, containers, app hosting, CI/CD, configuration, model endpoint references, environment separation | Move a prototype toward a maintainable dev/test/prod deployment |
| Monitoring and evaluation | Quality metrics, telemetry, traces, latency, failures, token usage, retrieval quality, user feedback | Diagnose whether a problem is caused by prompts, retrieval, tools, model behavior, or infrastructure |
| Troubleshooting | Authentication errors, endpoint issues, poor search recall, hallucinations, unsafe outputs, latency, throttling symptoms | Work from symptom to likely cause and choose a practical fix |
Core “Can You Do This?” Checklist
Azure AI App Development Basics
- Explain the difference between an AI model, a model deployment, an endpoint, an app, an agent, and a tool.
- Describe a typical request flow for a chat, summarization, extraction, classification, or multimodal AI app.
- Identify where configuration belongs: code, environment variables, managed configuration, Key Vault, or deployment settings.
- Explain when to call an Azure AI service directly versus when to orchestrate multiple services.
- Recognize when a prototype design is not production-ready because of missing security, monitoring, evaluation, or data controls.
- Describe how retry behavior, timeouts, and fallback responses affect user experience.
- Explain how latency, cost, response quality, and safety requirements influence model and architecture choices.
Generative AI and Prompting
- Write clear system-level instructions that define role, boundaries, allowed sources, and output format.
- Use prompt examples to guide style or classification behavior.
- Separate user input from trusted instructions and retrieved context.
- Ask for structured output when downstream code needs predictable fields.
- Explain why a model may hallucinate when it lacks grounding or when retrieved context is weak.
- Recognize when prompt tuning alone is insufficient and retrieval, tools, evaluation, or fine-tuning-like approaches should be considered.
- Design a prompt that tells the model what to do when the answer is not present in the supplied context.
Agents and Tool-Calling
- Explain when an agent is useful: multi-step tasks, tool use, planning, retrieval, or workflow orchestration.
- Define a tool/function with clear inputs, outputs, validation, and failure behavior.
- Distinguish between agent instructions and user instructions.
- Limit what an agent can do through least-privilege tools.
- Add confirmation or human review for risky actions.
- Prevent the agent from treating untrusted retrieved text as system instructions.
- Troubleshoot loops, repeated tool calls, missing parameters, and tool output misinterpretation.
Retrieval-Augmented Generation
- Explain the purpose of chunking, embeddings, vector search, keyword search, and hybrid retrieval.
- Choose appropriate metadata fields for filtering, authorization, freshness, and citations.
- Recognize symptoms of poor retrieval: irrelevant chunks, missing key documents, stale content, weak citations, or inconsistent answers.
- Explain the difference between improving retrieval and changing the generation prompt.
- Design a grounded answer format that includes citations or source references when needed.
- Evaluate retrieval quality with representative questions, not only happy-path demos.
Azure AI Services Integration
- Select the right service pattern for language understanding, document extraction, image analysis, speech, translation, or moderation.
- Explain synchronous versus asynchronous processing patterns for larger or longer-running tasks.
- Handle API credentials or identities securely.
- Parse service responses into application-ready data structures.
- Recognize when a prebuilt model/service is sufficient versus when a custom workflow is needed.
- Implement error handling for invalid inputs, unsupported formats, authentication failures, and service-side failures.
Service and Pattern Selection Checks
| Scenario cue | Strong answer pattern | Watch for this trap |
|---|---|---|
| “Answer questions from company documents” | RAG with Azure AI Search, embeddings, metadata filters, citations, and grounded prompt instructions | Relying on a generic model prompt without retrieval |
| “Perform actions in another system” | Agent or orchestration layer with constrained tools, validation, permissions, and audit logs | Letting the model directly decide unrestricted actions |
| “Extract fields from invoices, forms, or PDFs” | Document Intelligence or document-processing workflow, followed by validation and business rules | Treating all document extraction as plain chat completion |
| “Moderate user-generated content” | Content safety checks before or after generation, depending on risk | Assuming model instructions alone are a safety control |
| “Search legal or policy documents with exact terms and semantic meaning” | Hybrid search with keyword, vector, filters, and ranking considerations | Using only vector search and losing exact-term behavior |
| “Support multiple departments with different document access” | Security trimming, metadata filters, identity-aware retrieval, and authorization checks | Indexing all content together without access boundaries |
| “Reduce hallucinations” | Improve grounding, prompt constraints, citations, evaluation, and refusal behavior | Only lowering randomness or adding “be accurate” to the prompt |
| “Improve slow responses” | Inspect retrieval time, model latency, tool calls, payload size, and streaming options | Blaming the model before measuring the full request path |
| “Move from demo to production” | Add identity, secrets management, logging, monitoring, evaluation, deployment automation, and rollback strategy | Shipping notebook or portal-only configuration as the production design |
Agent Readiness Checklist
Agents are likely to test judgment, not just definitions. Be ready to reason about what the agent is allowed to know, decide, and do.
| Agent design element | Review questions |
|---|---|
| Instructions | Are role, task, boundaries, source priority, and refusal rules clear? |
| Tools | Are tool names, descriptions, parameters, and return values unambiguous? |
| Permissions | Does each tool have only the access needed for its task? |
| State | What conversation or task state is retained, and for how long? |
| Memory | Is memory necessary, user-approved, scoped, and safe? |
| Retrieval | Does the agent retrieve authoritative information before answering? |
| Validation | Are tool inputs validated before execution? |
| Confirmation | Are destructive, expensive, or external actions confirmed? |
| Error handling | What happens if a tool fails, returns partial data, or times out? |
| Observability | Can you trace the plan, tool calls, retrieved context, and final answer? |
| Safety | Can the agent resist prompt injection embedded in documents, emails, or web content? |
Agent Decision Prompts
Can you answer these without guessing?
- Should this be a simple chat app, a workflow, or an agent?
- Which tools should be available to the agent, and which should remain unavailable?
- What tool calls require human approval?
- What should the agent do if two tools return conflicting information?
- How should the app record tool calls for debugging and audit?
- How do you prevent user text from overriding system or developer instructions?
- How do you prevent retrieved documents from becoming hidden instructions?
RAG and Azure AI Search Checklist
Retrieval Design
| Design choice | What to know |
|---|---|
| Chunking | Smaller chunks can improve precision but may lose context; larger chunks preserve context but may retrieve irrelevant text |
| Embeddings | Used to represent meaning for vector similarity search |
| Keyword search | Useful for exact terms, identifiers, product names, legal phrases, and codes |
| Vector search | Useful for semantic similarity and natural-language queries |
| Hybrid search | Combines keyword and vector approaches for many enterprise search scenarios |
| Metadata filters | Support category filtering, security trimming, document type, freshness, geography, owner, or business unit |
| Ranking | Determines which retrieved passages are sent to the model |
| Citations | Help users verify grounded answers |
| Index refresh | Determines whether the app answers from current or stale content |
| Evaluation set | A representative set of questions and expected source documents |
RAG Failure Diagnosis
| Symptom | Likely area to inspect |
|---|---|
| Answer is fluent but unsupported | Prompt grounding rules, retrieved context quality, citation enforcement |
| Correct document is not retrieved | Index coverage, chunking, metadata filters, embedding strategy, query formulation |
| Retrieved chunks are too broad | Chunk size, overlap, field selection, ranking |
| User sees data they should not access | Authorization, security trimming, metadata filters, identity propagation |
| Good answer in test but bad in production | Content freshness, user query diversity, missing evaluation cases, environment drift |
| Slow responses | Index query latency, number of retrieved chunks, model latency, tool calls, payload size |
| Poor exact-match results | Keyword fields, analyzers, filters, hybrid search design |
Retrieval Quality Metrics to Recognize
Precision and recall are useful concepts when reviewing search and retrieval quality.
\[ \text{Precision} = \frac{\text{relevant retrieved items}}{\text{all retrieved items}} \]\[ \text{Recall} = \frac{\text{relevant retrieved items}}{\text{all relevant available items}} \]You do not need to turn every AI app into a formal information retrieval project, but you should understand the tradeoff: high precision reduces irrelevant context, while high recall reduces missed evidence.
Prompt, Output, and Grounding Checklist
| Prompt concern | Readiness check |
|---|---|
| Instruction hierarchy | Can you distinguish trusted system instructions from user input and retrieved content? |
| Context boundaries | Can you clearly mark retrieved context so the model knows what to use? |
| Refusal behavior | Can you instruct the model to say it does not know when context is insufficient? |
| Output schema | Can you request JSON or another predictable structure for app integration? |
| Few-shot examples | Can you use examples to stabilize tone, classification, or extraction? |
| Grounding | Can you restrict answers to supplied sources when required? |
| Prompt injection | Can you identify malicious instructions inside user input or retrieved documents? |
| Evaluation | Can you compare outputs against expected behavior across multiple test cases? |
Example pseudocode flow for a grounded chat app:
receive user question
authenticate user
rewrite or normalize query if needed
retrieve authorized context from search index
build prompt with system instructions + retrieved context + user question
call model endpoint
validate output for safety and format
return answer with citations
log telemetry for quality and troubleshooting
Security, Identity, and Governance Checklist
AI-103 preparation should include security judgment. Many exam scenarios are less about “can the model answer?” and more about “is this design safe and maintainable?”
| Area | Be ready to decide… |
|---|---|
| Authentication | Whether the app should use managed identity, user delegation, service principal, or key-based access |
| Authorization | How users, apps, and agents are allowed to access resources and data |
| Secrets | Where keys, connection strings, and endpoint secrets should be stored |
| RBAC | Which identities need access to AI services, search, storage, monitoring, or deployment resources |
| Network access | Whether public endpoints, private connectivity, firewall rules, or network restrictions are needed |
| Data protection | How sensitive input, output, prompts, files, and logs are handled |
| Tenant/project separation | How to isolate dev/test/prod or different business units |
| Auditability | Whether prompts, retrieval results, tool calls, and actions can be reviewed |
| Compliance support | How the app supports organizational policy without claiming unsupported guarantees |
| Least privilege | How to avoid giving an agent or app broad access “just to make it work” |
Security Scenario Cues
- A user should only query documents they are permitted to read.
- An agent can create tickets but must not delete records.
- A developer needs local testing without committing secrets.
- Logs are useful for debugging but may contain sensitive prompts or outputs.
- Retrieved documents may contain malicious instructions.
- A model response must not expose hidden system prompts or internal configuration.
- Production and development deployments require separate identities and settings.
Responsible AI and Safety Checklist
| Risk | Practical control to review |
|---|---|
| Harmful user input | Input moderation, safe completion strategy, refusal behavior |
| Harmful model output | Output filtering, review workflow, safety evaluation |
| Hallucination | Grounding, citations, retrieval evaluation, refusal when unsupported |
| Prompt injection | Instruction hierarchy, content isolation, tool restrictions, detection patterns |
| Data leakage | Data minimization, access control, logging controls, redaction where appropriate |
| Overreliance | User-facing caveats, source links, confidence handling, human review |
| Biased or unfair output | Representative tests, review of high-impact use cases, human oversight |
| Unsafe tool execution | Confirmation steps, allowlists, validation, audit logs |
Be ready to explain that safety is layered. Prompt instructions help, but they are not a substitute for authorization, validation, monitoring, and controlled tool design.
Azure AI Services Readiness Map
| Capability | What to review | Example readiness question |
|---|---|---|
| Language | Classification, extraction, summarization, sentiment, entity recognition, conversational language patterns | Which service or model pattern best fits a text-understanding task? |
| Speech | Speech-to-text, text-to-speech, translation or transcription workflows | How would you process audio input before sending text to another AI workflow? |
| Vision | Image analysis, OCR-related scenarios, multimodal prompts where applicable | When should you use a vision service versus a general multimodal model? |
| Document processing | Layout, tables, key-value extraction, document classification, validation | How do you handle low-confidence or missing fields? |
| Translation | Text translation and multilingual app considerations | Where should translation happen in a multi-step AI workflow? |
| Content safety | Input/output moderation and risk reduction | What should happen when content violates policy? |
| Search | Indexing, retrieval, vector and hybrid search | How do you ground model answers in private data? |
Deployment, Operations, and Monitoring Checklist
| Operational area | What “ready” means |
|---|---|
| Configuration | You can explain which settings differ by environment and how they are injected securely |
| Versioning | You can track prompt versions, app versions, model deployment references, and index schema changes |
| CI/CD | You understand how code, infrastructure, prompts, and tests move through environments |
| Telemetry | You can capture request IDs, latency, failures, retrieval details, model calls, and tool calls |
| Evaluation | You can run test sets before and after prompt, retrieval, or model changes |
| Rollback | You can revert a bad prompt, model deployment reference, tool change, or index update |
| Cost awareness | You can identify drivers such as model calls, retrieved context size, embeddings, indexing, and repeated agent tool calls |
| Reliability | You can design retries, timeouts, fallback responses, and graceful degradation |
| Incident review | You can inspect logs and traces without exposing sensitive data unnecessarily |
Troubleshooting Decision Path
Use this flow when a scenario describes a poor answer, failed request, or unreliable agent.
flowchart TD
A[Symptom reported] --> B{Request fails?}
B -->|Yes| C[Check auth, endpoint, deployment, quota symptoms, network, payload format]
B -->|No| D{Answer unsupported or wrong?}
D -->|Yes| E[Inspect retrieved context, prompt, model settings, citations, evaluation cases]
D -->|No| F{Tool or agent issue?}
F -->|Yes| G[Check tool schema, permissions, validation, state, loop behavior, confirmations]
F -->|No| H{Performance issue?}
H -->|Yes| I[Measure retrieval time, model latency, tool calls, payload size, streaming]
H -->|No| J[Review telemetry, user input, edge cases, environment changes]
Common Weak Areas and Exam Traps
| Weak area | Why it hurts | Review action |
|---|---|---|
| Treating all AI tasks as chat prompts | Many scenarios require search, tools, services, or workflow orchestration | Practice service and pattern selection |
| Ignoring identity and authorization | AI apps often expose private data or perform actions | Review managed identities, RBAC, secret handling, and security trimming |
| Confusing retrieval problems with model problems | A model cannot answer accurately from poor or missing context | Diagnose RAG failures separately from prompt failures |
| Overusing agents | Not every workflow needs autonomous planning | Decide between deterministic workflow, simple model call, and agent |
| Weak tool definitions | Ambiguous tools cause incorrect actions or repeated calls | Practice writing clear tool schemas and validation rules |
| No evaluation set | Demos hide edge cases | Build representative tests for prompts, retrieval, and safety |
| Missing observability | You cannot troubleshoot what you cannot trace | Know what to log and how to protect sensitive logs |
| Unsafe logging | Prompts and outputs may contain sensitive information | Apply data minimization and access control to telemetry |
| Assuming content filters solve everything | Safety is layered and context-dependent | Combine filters, grounding, permissions, validation, and review |
| Forgetting freshness | RAG apps can answer from stale indexes | Review ingestion, indexing, and update strategies |
| Ignoring output format reliability | Downstream apps may break on free-form text | Use structured outputs, validation, and fallback handling |
Final-Week AI-103 Review Checklist
Architecture and Service Selection
- For each practice scenario, identify the user goal, data sources, AI capability, security boundary, and output format.
- Explain why you would use a model, an Azure AI service, Azure AI Search, an agent, or a deterministic workflow.
- Review at least three end-to-end patterns: grounded chat, document extraction, and agent with tools.
- Practice eliminating attractive but unsafe or overcomplicated answers.
RAG and Search
- Review chunking, embeddings, vector search, keyword search, hybrid retrieval, filters, and citations.
- Diagnose at least five RAG failure symptoms and match each to a likely fix.
- Explain how identity-aware retrieval prevents unauthorized answers.
- Know how evaluation questions reveal weak retrieval or weak prompting.
Agents
- Review tool design, parameter validation, permissions, state, memory, confirmations, and audit logs.
- Identify when a workflow should not be an agent.
- Practice prompt-injection scenarios involving retrieved documents or user instructions.
- Explain how to stop an agent from taking unsafe actions.
Security and Responsible AI
- Review managed identity, RBAC, Key Vault-style secret handling, and network access concepts.
- Identify sensitive data in prompts, files, retrieved context, outputs, and logs.
- Explain layered safety: moderation, grounding, authorization, validation, monitoring, and human review.
- Practice scenarios where safety and usefulness conflict.
Operations and Troubleshooting
- Review what telemetry should be captured for model calls, search calls, and agent tool calls.
- Practice troubleshooting authentication failures, wrong answers, missing documents, slow responses, and unsafe outputs.
- Explain how prompt, model, index, tool, and configuration changes are versioned and rolled back.
- Review cost and latency drivers without relying on memorized pricing or quota numbers.
Practical Next Step
Pick one weak area from each category: RAG, agents, security, Azure AI services, and monitoring. For each, complete a short practice cycle: read the concept, answer scenario questions, explain your decision out loud, then review why the alternatives are weaker. Use this checklist again after practice to confirm that you can make the decision, not just recognize the term.