AI-103 — Microsoft Azure AI Apps and Agents Developer Associate Exam Blueprint

Practical AI-103 exam blueprint for Microsoft Azure AI Apps and Agents Developer Associate candidates covering Azure AI apps, agents, RAG, security, monitoring, and final review.

How to Use This AI-103 Exam Blueprint

Use this checklist as a practical readiness map for the Microsoft Azure AI Apps and Agents Developer Associate (AI-103) exam from Microsoft. It is designed to help you translate the exam identity into review tasks: what to know, what to build, what to troubleshoot, and what decisions you should be able to justify.

This is not a list of exact official weights or scoring rules. Treat it as a final-review blueprint for the major readiness areas likely to matter when developing Azure-based AI apps and agents.

For each section, ask:

  • Can I explain the concept without notes?
  • Can I choose the right Azure service or pattern for a scenario?
  • Can I recognize the security, cost, quality, and operational tradeoffs?
  • Can I troubleshoot a broken or low-quality AI app workflow?

Topic-Area Readiness Table

Readiness areaWhat to reviewYou are ready when you can…
Azure AI app architectureApp flow, model access, API integration, user experience, backend services, orchestrationDesign an end-to-end AI app pattern from user request to model response, including error handling and observability
Generative AI foundationsPrompting, grounding, context windows, embeddings, model selection, temperature-like behavior, structured outputsExplain why a response changes, how to improve it, and when to use grounding instead of prompt-only answers
Azure AI Foundry and project workflowModel deployment, app development workflow, evaluation, responsible AI features, prompt iterationDescribe how a team would build, test, evaluate, and deploy an AI solution using Azure AI development tooling
Agents and tool useAgent instructions, tools/functions, actions, state, memory, retrieval, orchestration, human escalationDecide when an agent is appropriate and design safe tool-calling behavior
Retrieval-augmented generationData ingestion, chunking, embeddings, vector search, hybrid search, ranking, citations, freshnessBuild and troubleshoot a grounded chat or Q&A pattern over private data
Azure AI SearchIndexes, fields, analyzers, vector fields, filters, scoring/ranking, semantic capabilities, indexersChoose an indexing/search design for keyword, vector, hybrid, or filtered retrieval scenarios
Azure AI servicesLanguage, Vision, Speech, Document Intelligence, Translator, Content Safety, and related API patternsSelect the appropriate AI service for text, image, audio, document, moderation, or extraction scenarios
Responsible AI and safetyContent filtering, jailbreak and prompt injection risks, protected data, groundedness, transparency, review workflowsAdd practical controls to reduce unsafe, ungrounded, or unauthorized model behavior
Security and identityMicrosoft Entra ID, managed identities, RBAC, keys, secrets, Key Vault, network controlsChoose secure authentication and authorization patterns for AI apps and services
Data governanceData classification, access boundaries, logging, retention, data minimization, tenant/project separationIdentify where sensitive data can leak and how to reduce exposure
Deployment and integrationAPIs, containers, app hosting, CI/CD, configuration, model endpoint references, environment separationMove a prototype toward a maintainable dev/test/prod deployment
Monitoring and evaluationQuality metrics, telemetry, traces, latency, failures, token usage, retrieval quality, user feedbackDiagnose whether a problem is caused by prompts, retrieval, tools, model behavior, or infrastructure
TroubleshootingAuthentication errors, endpoint issues, poor search recall, hallucinations, unsafe outputs, latency, throttling symptomsWork from symptom to likely cause and choose a practical fix

Core “Can You Do This?” Checklist

Azure AI App Development Basics

  • Explain the difference between an AI model, a model deployment, an endpoint, an app, an agent, and a tool.
  • Describe a typical request flow for a chat, summarization, extraction, classification, or multimodal AI app.
  • Identify where configuration belongs: code, environment variables, managed configuration, Key Vault, or deployment settings.
  • Explain when to call an Azure AI service directly versus when to orchestrate multiple services.
  • Recognize when a prototype design is not production-ready because of missing security, monitoring, evaluation, or data controls.
  • Describe how retry behavior, timeouts, and fallback responses affect user experience.
  • Explain how latency, cost, response quality, and safety requirements influence model and architecture choices.

Generative AI and Prompting

  • Write clear system-level instructions that define role, boundaries, allowed sources, and output format.
  • Use prompt examples to guide style or classification behavior.
  • Separate user input from trusted instructions and retrieved context.
  • Ask for structured output when downstream code needs predictable fields.
  • Explain why a model may hallucinate when it lacks grounding or when retrieved context is weak.
  • Recognize when prompt tuning alone is insufficient and retrieval, tools, evaluation, or fine-tuning-like approaches should be considered.
  • Design a prompt that tells the model what to do when the answer is not present in the supplied context.

Agents and Tool-Calling

  • Explain when an agent is useful: multi-step tasks, tool use, planning, retrieval, or workflow orchestration.
  • Define a tool/function with clear inputs, outputs, validation, and failure behavior.
  • Distinguish between agent instructions and user instructions.
  • Limit what an agent can do through least-privilege tools.
  • Add confirmation or human review for risky actions.
  • Prevent the agent from treating untrusted retrieved text as system instructions.
  • Troubleshoot loops, repeated tool calls, missing parameters, and tool output misinterpretation.

Retrieval-Augmented Generation

  • Explain the purpose of chunking, embeddings, vector search, keyword search, and hybrid retrieval.
  • Choose appropriate metadata fields for filtering, authorization, freshness, and citations.
  • Recognize symptoms of poor retrieval: irrelevant chunks, missing key documents, stale content, weak citations, or inconsistent answers.
  • Explain the difference between improving retrieval and changing the generation prompt.
  • Design a grounded answer format that includes citations or source references when needed.
  • Evaluate retrieval quality with representative questions, not only happy-path demos.

Azure AI Services Integration

  • Select the right service pattern for language understanding, document extraction, image analysis, speech, translation, or moderation.
  • Explain synchronous versus asynchronous processing patterns for larger or longer-running tasks.
  • Handle API credentials or identities securely.
  • Parse service responses into application-ready data structures.
  • Recognize when a prebuilt model/service is sufficient versus when a custom workflow is needed.
  • Implement error handling for invalid inputs, unsupported formats, authentication failures, and service-side failures.

Service and Pattern Selection Checks

Scenario cueStrong answer patternWatch for this trap
“Answer questions from company documents”RAG with Azure AI Search, embeddings, metadata filters, citations, and grounded prompt instructionsRelying on a generic model prompt without retrieval
“Perform actions in another system”Agent or orchestration layer with constrained tools, validation, permissions, and audit logsLetting the model directly decide unrestricted actions
“Extract fields from invoices, forms, or PDFs”Document Intelligence or document-processing workflow, followed by validation and business rulesTreating all document extraction as plain chat completion
“Moderate user-generated content”Content safety checks before or after generation, depending on riskAssuming model instructions alone are a safety control
“Search legal or policy documents with exact terms and semantic meaning”Hybrid search with keyword, vector, filters, and ranking considerationsUsing only vector search and losing exact-term behavior
“Support multiple departments with different document access”Security trimming, metadata filters, identity-aware retrieval, and authorization checksIndexing all content together without access boundaries
“Reduce hallucinations”Improve grounding, prompt constraints, citations, evaluation, and refusal behaviorOnly lowering randomness or adding “be accurate” to the prompt
“Improve slow responses”Inspect retrieval time, model latency, tool calls, payload size, and streaming optionsBlaming the model before measuring the full request path
“Move from demo to production”Add identity, secrets management, logging, monitoring, evaluation, deployment automation, and rollback strategyShipping notebook or portal-only configuration as the production design

Agent Readiness Checklist

Agents are likely to test judgment, not just definitions. Be ready to reason about what the agent is allowed to know, decide, and do.

Agent design elementReview questions
InstructionsAre role, task, boundaries, source priority, and refusal rules clear?
ToolsAre tool names, descriptions, parameters, and return values unambiguous?
PermissionsDoes each tool have only the access needed for its task?
StateWhat conversation or task state is retained, and for how long?
MemoryIs memory necessary, user-approved, scoped, and safe?
RetrievalDoes the agent retrieve authoritative information before answering?
ValidationAre tool inputs validated before execution?
ConfirmationAre destructive, expensive, or external actions confirmed?
Error handlingWhat happens if a tool fails, returns partial data, or times out?
ObservabilityCan you trace the plan, tool calls, retrieved context, and final answer?
SafetyCan the agent resist prompt injection embedded in documents, emails, or web content?

Agent Decision Prompts

Can you answer these without guessing?

  • Should this be a simple chat app, a workflow, or an agent?
  • Which tools should be available to the agent, and which should remain unavailable?
  • What tool calls require human approval?
  • What should the agent do if two tools return conflicting information?
  • How should the app record tool calls for debugging and audit?
  • How do you prevent user text from overriding system or developer instructions?
  • How do you prevent retrieved documents from becoming hidden instructions?

RAG and Azure AI Search Checklist

Retrieval Design

Design choiceWhat to know
ChunkingSmaller chunks can improve precision but may lose context; larger chunks preserve context but may retrieve irrelevant text
EmbeddingsUsed to represent meaning for vector similarity search
Keyword searchUseful for exact terms, identifiers, product names, legal phrases, and codes
Vector searchUseful for semantic similarity and natural-language queries
Hybrid searchCombines keyword and vector approaches for many enterprise search scenarios
Metadata filtersSupport category filtering, security trimming, document type, freshness, geography, owner, or business unit
RankingDetermines which retrieved passages are sent to the model
CitationsHelp users verify grounded answers
Index refreshDetermines whether the app answers from current or stale content
Evaluation setA representative set of questions and expected source documents

RAG Failure Diagnosis

SymptomLikely area to inspect
Answer is fluent but unsupportedPrompt grounding rules, retrieved context quality, citation enforcement
Correct document is not retrievedIndex coverage, chunking, metadata filters, embedding strategy, query formulation
Retrieved chunks are too broadChunk size, overlap, field selection, ranking
User sees data they should not accessAuthorization, security trimming, metadata filters, identity propagation
Good answer in test but bad in productionContent freshness, user query diversity, missing evaluation cases, environment drift
Slow responsesIndex query latency, number of retrieved chunks, model latency, tool calls, payload size
Poor exact-match resultsKeyword fields, analyzers, filters, hybrid search design

Retrieval Quality Metrics to Recognize

Precision and recall are useful concepts when reviewing search and retrieval quality.

\[ \text{Precision} = \frac{\text{relevant retrieved items}}{\text{all retrieved items}} \]\[ \text{Recall} = \frac{\text{relevant retrieved items}}{\text{all relevant available items}} \]

You do not need to turn every AI app into a formal information retrieval project, but you should understand the tradeoff: high precision reduces irrelevant context, while high recall reduces missed evidence.

Prompt, Output, and Grounding Checklist

Prompt concernReadiness check
Instruction hierarchyCan you distinguish trusted system instructions from user input and retrieved content?
Context boundariesCan you clearly mark retrieved context so the model knows what to use?
Refusal behaviorCan you instruct the model to say it does not know when context is insufficient?
Output schemaCan you request JSON or another predictable structure for app integration?
Few-shot examplesCan you use examples to stabilize tone, classification, or extraction?
GroundingCan you restrict answers to supplied sources when required?
Prompt injectionCan you identify malicious instructions inside user input or retrieved documents?
EvaluationCan you compare outputs against expected behavior across multiple test cases?

Example pseudocode flow for a grounded chat app:

receive user question
authenticate user
rewrite or normalize query if needed
retrieve authorized context from search index
build prompt with system instructions + retrieved context + user question
call model endpoint
validate output for safety and format
return answer with citations
log telemetry for quality and troubleshooting

Security, Identity, and Governance Checklist

AI-103 preparation should include security judgment. Many exam scenarios are less about “can the model answer?” and more about “is this design safe and maintainable?”

AreaBe ready to decide…
AuthenticationWhether the app should use managed identity, user delegation, service principal, or key-based access
AuthorizationHow users, apps, and agents are allowed to access resources and data
SecretsWhere keys, connection strings, and endpoint secrets should be stored
RBACWhich identities need access to AI services, search, storage, monitoring, or deployment resources
Network accessWhether public endpoints, private connectivity, firewall rules, or network restrictions are needed
Data protectionHow sensitive input, output, prompts, files, and logs are handled
Tenant/project separationHow to isolate dev/test/prod or different business units
AuditabilityWhether prompts, retrieval results, tool calls, and actions can be reviewed
Compliance supportHow the app supports organizational policy without claiming unsupported guarantees
Least privilegeHow to avoid giving an agent or app broad access “just to make it work”

Security Scenario Cues

  • A user should only query documents they are permitted to read.
  • An agent can create tickets but must not delete records.
  • A developer needs local testing without committing secrets.
  • Logs are useful for debugging but may contain sensitive prompts or outputs.
  • Retrieved documents may contain malicious instructions.
  • A model response must not expose hidden system prompts or internal configuration.
  • Production and development deployments require separate identities and settings.

Responsible AI and Safety Checklist

RiskPractical control to review
Harmful user inputInput moderation, safe completion strategy, refusal behavior
Harmful model outputOutput filtering, review workflow, safety evaluation
HallucinationGrounding, citations, retrieval evaluation, refusal when unsupported
Prompt injectionInstruction hierarchy, content isolation, tool restrictions, detection patterns
Data leakageData minimization, access control, logging controls, redaction where appropriate
OverrelianceUser-facing caveats, source links, confidence handling, human review
Biased or unfair outputRepresentative tests, review of high-impact use cases, human oversight
Unsafe tool executionConfirmation steps, allowlists, validation, audit logs

Be ready to explain that safety is layered. Prompt instructions help, but they are not a substitute for authorization, validation, monitoring, and controlled tool design.

Azure AI Services Readiness Map

CapabilityWhat to reviewExample readiness question
LanguageClassification, extraction, summarization, sentiment, entity recognition, conversational language patternsWhich service or model pattern best fits a text-understanding task?
SpeechSpeech-to-text, text-to-speech, translation or transcription workflowsHow would you process audio input before sending text to another AI workflow?
VisionImage analysis, OCR-related scenarios, multimodal prompts where applicableWhen should you use a vision service versus a general multimodal model?
Document processingLayout, tables, key-value extraction, document classification, validationHow do you handle low-confidence or missing fields?
TranslationText translation and multilingual app considerationsWhere should translation happen in a multi-step AI workflow?
Content safetyInput/output moderation and risk reductionWhat should happen when content violates policy?
SearchIndexing, retrieval, vector and hybrid searchHow do you ground model answers in private data?

Deployment, Operations, and Monitoring Checklist

Operational areaWhat “ready” means
ConfigurationYou can explain which settings differ by environment and how they are injected securely
VersioningYou can track prompt versions, app versions, model deployment references, and index schema changes
CI/CDYou understand how code, infrastructure, prompts, and tests move through environments
TelemetryYou can capture request IDs, latency, failures, retrieval details, model calls, and tool calls
EvaluationYou can run test sets before and after prompt, retrieval, or model changes
RollbackYou can revert a bad prompt, model deployment reference, tool change, or index update
Cost awarenessYou can identify drivers such as model calls, retrieved context size, embeddings, indexing, and repeated agent tool calls
ReliabilityYou can design retries, timeouts, fallback responses, and graceful degradation
Incident reviewYou can inspect logs and traces without exposing sensitive data unnecessarily

Troubleshooting Decision Path

Use this flow when a scenario describes a poor answer, failed request, or unreliable agent.

    flowchart TD
	    A[Symptom reported] --> B{Request fails?}
	    B -->|Yes| C[Check auth, endpoint, deployment, quota symptoms, network, payload format]
	    B -->|No| D{Answer unsupported or wrong?}
	    D -->|Yes| E[Inspect retrieved context, prompt, model settings, citations, evaluation cases]
	    D -->|No| F{Tool or agent issue?}
	    F -->|Yes| G[Check tool schema, permissions, validation, state, loop behavior, confirmations]
	    F -->|No| H{Performance issue?}
	    H -->|Yes| I[Measure retrieval time, model latency, tool calls, payload size, streaming]
	    H -->|No| J[Review telemetry, user input, edge cases, environment changes]

Common Weak Areas and Exam Traps

Weak areaWhy it hurtsReview action
Treating all AI tasks as chat promptsMany scenarios require search, tools, services, or workflow orchestrationPractice service and pattern selection
Ignoring identity and authorizationAI apps often expose private data or perform actionsReview managed identities, RBAC, secret handling, and security trimming
Confusing retrieval problems with model problemsA model cannot answer accurately from poor or missing contextDiagnose RAG failures separately from prompt failures
Overusing agentsNot every workflow needs autonomous planningDecide between deterministic workflow, simple model call, and agent
Weak tool definitionsAmbiguous tools cause incorrect actions or repeated callsPractice writing clear tool schemas and validation rules
No evaluation setDemos hide edge casesBuild representative tests for prompts, retrieval, and safety
Missing observabilityYou cannot troubleshoot what you cannot traceKnow what to log and how to protect sensitive logs
Unsafe loggingPrompts and outputs may contain sensitive informationApply data minimization and access control to telemetry
Assuming content filters solve everythingSafety is layered and context-dependentCombine filters, grounding, permissions, validation, and review
Forgetting freshnessRAG apps can answer from stale indexesReview ingestion, indexing, and update strategies
Ignoring output format reliabilityDownstream apps may break on free-form textUse structured outputs, validation, and fallback handling

Final-Week AI-103 Review Checklist

Architecture and Service Selection

  • For each practice scenario, identify the user goal, data sources, AI capability, security boundary, and output format.
  • Explain why you would use a model, an Azure AI service, Azure AI Search, an agent, or a deterministic workflow.
  • Review at least three end-to-end patterns: grounded chat, document extraction, and agent with tools.
  • Practice eliminating attractive but unsafe or overcomplicated answers.
  • Review chunking, embeddings, vector search, keyword search, hybrid retrieval, filters, and citations.
  • Diagnose at least five RAG failure symptoms and match each to a likely fix.
  • Explain how identity-aware retrieval prevents unauthorized answers.
  • Know how evaluation questions reveal weak retrieval or weak prompting.

Agents

  • Review tool design, parameter validation, permissions, state, memory, confirmations, and audit logs.
  • Identify when a workflow should not be an agent.
  • Practice prompt-injection scenarios involving retrieved documents or user instructions.
  • Explain how to stop an agent from taking unsafe actions.

Security and Responsible AI

  • Review managed identity, RBAC, Key Vault-style secret handling, and network access concepts.
  • Identify sensitive data in prompts, files, retrieved context, outputs, and logs.
  • Explain layered safety: moderation, grounding, authorization, validation, monitoring, and human review.
  • Practice scenarios where safety and usefulness conflict.

Operations and Troubleshooting

  • Review what telemetry should be captured for model calls, search calls, and agent tool calls.
  • Practice troubleshooting authentication failures, wrong answers, missing documents, slow responses, and unsafe outputs.
  • Explain how prompt, model, index, tool, and configuration changes are versioned and rolled back.
  • Review cost and latency drivers without relying on memorized pricing or quota numbers.

Practical Next Step

Pick one weak area from each category: RAG, agents, security, Azure AI services, and monitoring. For each, complete a short practice cycle: read the concept, answer scenario questions, explain your decision out loud, then review why the alternatives are weaker. Use this checklist again after practice to confirm that you can make the decision, not just recognize the term.

Browse Certification Practice Tests by Exam Family