AI-103 — Microsoft Azure AI Apps and Agents Developer Associate Quick Review

Last revised: June 18, 2026

Concise AI-103 review for Microsoft Azure AI apps and agents: Azure AI Foundry, agents, RAG, search, safety, deployment, and practice focus.

Quick Review purpose

This Quick Review is for candidates preparing for Microsoft Azure AI Apps and Agents Developer Associate (AI-103), exam code AI-103. It is an independent review aid, not a Microsoft document, and is designed to help you refresh the highest-yield concepts before using topic drills, mock exams, original practice questions, and detailed explanations.

Use it to check whether you can make the right design choice under exam-style wording: which Azure AI capability to use, how to ground a generative AI app, how to secure an agent, how to troubleshoot retrieval quality, and how to evaluate and monitor an AI solution.

Fast exam orientation

For AI-103, expect practical developer scenarios rather than isolated definitions. A strong candidate can connect Azure AI services, model deployments, search, agents, security, safety, and operations into working application patterns.

Area	What to be ready to decide	Common exam angle
Azure AI app architecture	How the app uses models, data, tools, identity, safety, and telemetry	Choose the missing component in an app design
Azure AI Foundry and model use	Select, deploy, test, evaluate, and monitor models	Distinguish model selection from prompt, RAG, or fine-tuning decisions
Agents and tool use	Build agents that call functions, use knowledge, maintain context, and respect guardrails	Know that tool calls must be validated and executed by application logic
Retrieval-augmented generation	Ingest, chunk, embed, index, retrieve, rerank, ground, and cite source content	Fix hallucination, irrelevant retrieval, stale indexes, or poor chunking
Azure AI Search	Use keyword, vector, hybrid, semantic ranking, filters, facets, and index schemas	Pick the right query and index configuration
Azure AI services	Apply language, speech, vision, document intelligence, translation, and safety services	Select the purpose-built service instead of forcing a generative model
Responsible AI and security	Protect users, data, tools, prompts, outputs, and infrastructure	Avoid treating prompts or content filters as authorization controls
Deployment and operations	Handle latency, throttling, retries, monitoring, evaluation, and cost	Diagnose runtime errors and quality regressions

Core architecture pattern for Azure AI apps

Most Azure AI apps and agents can be reviewed as a layered system.

Layer	Main responsibility	Azure-focused examples	Candidate trap
User/application layer	Collect input, authenticate users, render output	Web app, API, bot, mobile app	Letting anonymous or unauthorized users reach privileged tools
Orchestration layer	Decide prompt flow, agent steps, tool calls, memory, and validation	App code, Semantic Kernel-style orchestration, Azure AI Foundry app assets	Assuming the model executes tools automatically
Model layer	Generate, classify, summarize, reason, embed, or process multimodal inputs	Azure OpenAI or other models available through Azure AI Foundry	Using a larger model when a smaller model or purpose-built service is sufficient
Grounding layer	Provide trusted enterprise context	Azure AI Search, databases, Blob Storage, knowledge sources	Passing too much irrelevant context and increasing hallucination risk
Tool/action layer	Execute deterministic operations	APIs, functions, workflows, databases, line-of-business systems	Failing to validate arguments, permissions, and side effects
Safety layer	Detect, filter, validate, and review risky content/actions	Azure AI Content Safety, groundedness checks, custom validators, human review	Treating safety filters as complete governance
Security layer	Protect access, secrets, data, and network paths	Microsoft Entra ID, managed identities, RBAC, Key Vault, private endpoints	Storing API keys in code or client-side apps
Operations layer	Observe quality, cost, latency, errors, drift, and abuse	Application Insights, Azure Monitor, evaluation reports, logs	Monitoring only uptime but not AI quality

    flowchart LR
	    A[User request] --> B[Authenticate and authorize]
	    B --> C[Classify intent and risk]
	    C --> D{Need enterprise context?}
	    D -- Yes --> E[Retrieve from Azure AI Search or data source]
	    D -- No --> F[Build prompt or agent state]
	    E --> F
	    F --> G{Need action/tool?}
	    G -- Yes --> H[Validate tool call and permissions]
	    H --> I[Execute tool]
	    I --> J[Return tool result to model]
	    G -- No --> K[Generate response]
	    J --> K
	    K --> L[Validate safety, schema, citations]
	    L --> M[Respond and log telemetry]

High-yield decision rules

If the scenario says…	Usually think…	Why
“Answer questions using company documents”	Retrieval-augmented generation with Azure AI Search	The model needs current, private, grounded context
“Find semantically similar passages”	Embeddings and vector search	Embeddings represent meaning for similarity comparison
“Search should use both exact terms and semantic meaning”	Hybrid search	Combines keyword matching with vector similarity
“Improve ranking of natural-language search results”	Semantic ranking/reranking	Reranks likely relevant text results; it is not the same as generating embeddings
“Extract fields from invoices, forms, receipts, or documents”	Azure AI Document Intelligence	Purpose-built document extraction is usually better than raw prompting
“Detect PII, sentiment, key phrases, or entities”	Azure AI Language	Use purpose-built NLP capabilities when the task is standard
“Transcribe calls or synthesize voice”	Azure AI Speech	Do not choose text-only services for audio requirements
“Moderate harmful text or images”	Azure AI Content Safety	Safety classification is a separate control from model generation
“Need repeatable JSON output for an API”	Structured output plus schema validation	Prompt instructions alone are not enough
“Need a model to perform a business transaction”	Tool/function calling with server-side validation	The model proposes; the app authorizes and executes
“User asks complex multi-step goal with tool choices”	Agent pattern	Agents are useful when steps are dynamic, not fixed
“Need current private data without changing model weights”	RAG, not fine-tuning	Fine-tuning teaches behavior/style; RAG supplies facts
“Need reduce hallucinations”	Improve grounding, retrieval, citations, evaluation, and refusal behavior	Lowering temperature alone rarely solves poor grounding
“Need secure Azure-to-Azure access”	Managed identity and RBAC where supported	Avoid hard-coded secrets and broad API keys

Azure AI Foundry and model deployment review

Azure AI Foundry is central to building and managing modern Azure AI apps. For exam prep, focus on the development lifecycle rather than memorizing screen names.

Model lifecycle checklist

Define task: chat, summarization, classification, extraction, embedding, image/audio processing, agentic action, or multimodal reasoning.
Select model: balance capability, context length, modality, latency, throughput, region availability, and cost.
Deploy or connect: configure endpoint, deployment name, model version, and access controls.
Build prompt/app flow: instructions, context, tools, retrieval, output schema, and error handling.
Evaluate: quality, safety, groundedness, relevance, task success, latency, and token usage.
Deploy application: use environment configuration, secrets management, CI/CD, and least privilege.
Monitor and improve: collect telemetry, compare prompt/model versions, and run regression evaluations.

Common configuration settings

Setting	What it affects	Exam trap
Deployment name	What the application calls at runtime	In Azure-hosted model APIs, code often references the deployment, not just a public model name
Model/version	Capability, behavior, supported features, and lifecycle	Changing model versions can change responses; evaluate before rollout
Context window	Amount of input and history that can be considered	More context is not automatically better if it is irrelevant
Max output tokens	Upper bound on generated response length	Too low can truncate answers; too high increases cost and latency
Temperature	Randomness/creativity	Lower values improve consistency but do not guarantee factual accuracy
Top-p	Nucleus sampling behavior	Usually tune either temperature or top-p first, not both randomly
Streaming	Sends partial output as it is generated	Improves perceived latency but requires client handling
Structured output	Constrains response format	Still validate output server-side
Content filters/safety settings	Reduce unsafe content exposure	Not a replacement for authorization, validation, or human review

Prompting and structured generation

Prompting is not just wording. In production-style exam scenarios, it is part of a controlled application contract.

Strong prompt pattern

A strong prompt usually includes:

Role or task: what the model should do.
Authoritative context: retrieved content, user data, or tool results.
Boundaries: what to ignore, when to refuse, and what not to infer.
Output format: JSON schema, bullet list, table, or concise answer.
Examples: when useful for style or edge cases.
Citation rules: if answers must be grounded in retrieved sources.
Safety rules: do not expose secrets, hidden instructions, or unauthorized data.

Prompting traps

Trap	Why it matters	Better approach
Putting security policy only in the prompt	A malicious user may override or manipulate natural-language instructions	Enforce security in application code and identity controls
Passing raw untrusted documents as instructions	Retrieved content can contain prompt injection	Treat retrieved text as data, isolate it, and validate output
Asking for JSON without validation	Models can produce malformed or extra text	Use structured output features where available and validate with a parser/schema
Using long chat history blindly	Old context can conflict with current instructions and increase cost	Summarize, trim, or store only relevant state
Relying on temperature for correctness	Correctness depends on grounding and evaluation	Improve source data, retrieval, prompt constraints, and validation
Hiding business rules in examples only	Examples may not cover edge cases	State rules explicitly and test edge cases with topic drills

Agents and tool use

An AI agent is more than a chat completion. It combines a model with instructions, tools, state, knowledge, and guardrails so it can pursue a goal over one or more steps.

Agent components

Component	Purpose	What to review
Instructions	Define role, boundaries, and task strategy	Keep system/developer instructions separate from user-controlled content
Tools/functions	Allow the agent to call APIs or perform actions	Validate arguments, authorize actions, and handle failures
Knowledge/retrieval	Ground the agent in trusted data	Use RAG, filters, citations, and source constraints
State/memory	Preserve conversation or task context	Store only necessary data and respect privacy requirements
Planner/orchestrator	Decides step order or tool choice	Prefer deterministic workflows when steps are fixed
Guardrails	Control safety, privacy, schema, and allowed actions	Combine model instructions with code-level enforcement
Evaluation	Measures whether the agent completes tasks safely	Test multi-step paths, tool errors, and adversarial inputs

Agent versus workflow versus RAG

Requirement	Best fit	Reason
Answer questions from documents	RAG-based chat	Retrieval is the main need
Execute a fixed approval process	Deterministic workflow	Predictability and auditability matter more than flexible reasoning
Choose among several APIs based on user goal	Agent with tools	Dynamic tool selection is useful
Produce a structured extraction from known document types	Document Intelligence or structured extraction flow	Purpose-built extraction is easier to validate
Summarize a known text input	Direct model call	No agent is needed
Investigate, retrieve, call tools, and synthesize	Agentic orchestration	Multi-step reasoning and actions are required

Tool/function calling flow

User asks for an outcome.
Application sends instructions, available tool schemas, and context to the model.
Model proposes a tool call and arguments.
Application validates:
- Is the user authorized?
- Is this tool allowed for this user and context?
- Are arguments complete, typed, and within safe limits?
- Could the call cause a high-impact side effect?
Application executes the tool if allowed.
Tool result is returned to the model.
Model produces a final response.
Application validates output and logs the interaction.

Exam shortcut: the model may select or propose a tool, but your application is responsible for enforcement, execution, retries, auditing, and side-effect control.

Retrieval-augmented generation and Azure AI Search

RAG is one of the highest-yield AI-103 patterns. It lets a generative model answer using data that was not part of its training data.

RAG pipeline

Stage	Developer decisions	Common mistakes
Source selection	Which documents, databases, or storage containers are authoritative	Indexing unapproved or stale content
Ingestion	Push documents or use indexers/connectors where appropriate	Forgetting refresh schedules and deletion handling
Chunking	Split content by semantic sections, headings, paragraphs, or token limits	Chunks too large, too small, or split mid-thought
Enrichment	Extract metadata, OCR, entities, summaries, or normalized fields	Missing metadata needed for filters and security trimming
Embedding	Generate vector representations for chunks and queries	Using inconsistent embedding models or dimensions
Index design	Define fields, vector fields, searchable fields, filterable metadata, analyzers	Not marking fields filterable/sortable/facetable when needed
Retrieval	Choose keyword, vector, hybrid, filters, top-k, semantic ranking	Retrieving irrelevant context or too much context
Generation	Pass selected context with instructions and citation rules	Allowing model to answer beyond provided evidence
Evaluation	Measure groundedness, relevance, retrieval precision/recall, and user outcomes	Judging only by a few manual examples

Azure AI Search review

Feature	Use when	Watch for
Search index	Store searchable documents and fields	Schema design matters; field attributes affect query capabilities
Keyword search	Need exact terms, product codes, names, or legal wording	May miss semantically similar content
Vector search	Need meaning-based similarity	Requires embeddings and compatible vector field dimensions
Hybrid search	Need both lexical and semantic similarity	Often improves enterprise document retrieval
Semantic ranker	Need improved ranking/captions for natural-language results	It reranks; it does not replace indexing quality
Filters	Need scope by user, tenant, department, date, product, region, or document type	Filters require filterable fields and correct metadata
Facets	Need result navigation or counts by category	Requires facetable fields
Scoring profiles	Need boost specific fields or freshness	Bad boosts can bury relevant results
Indexers/skillsets	Need automated ingestion or enrichment from supported sources	Still validate extraction quality and refresh behavior
Synonyms/analyzers	Need domain vocabulary handling	Do not use them as a substitute for embeddings when semantic meaning matters

RAG troubleshooting table

Symptom	Likely cause	Fix
Model hallucinates unsupported facts	Missing or irrelevant retrieved context	Improve retrieval, require citations, add refusal rule, evaluate groundedness
Correct document is not retrieved	Poor chunking, missing metadata, weak query expansion, wrong embedding setup	Rechunk, enrich metadata, use hybrid search, check vector dimensions
Results ignore user permissions	No security trimming or tenant filtering	Add authorization-aware filters before retrieval
Answers cite wrong source	Retrieved chunks are ambiguous or citation mapping is weak	Store source IDs, page numbers, section titles, and stable links
Latency is high	Too many retrieved chunks, large prompts, slow tools, no caching	Reduce top-k, compress context, cache embeddings/results, stream output
Answers are outdated	Index refresh problem or stale source	Refresh index, track source version, monitor ingestion failures
Retrieval works in tests but fails in production	Different data distribution or user wording	Add query rewriting, synonyms, hybrid search, and real-user evaluation sets
Prompt injection from documents affects answer	Retrieved content contains malicious instructions	Treat documents as data, not instructions; isolate context and validate outputs

Azure AI services integration review

Not every task needs a general-purpose generative model. AI-103 scenarios may reward choosing a purpose-built Azure AI service.

Service/capability	Use for	Developer focus	Trap
Azure OpenAI / generative models in Azure AI Foundry	Chat, summarization, reasoning, embeddings, structured generation, multimodal tasks where supported	Deployments, prompts, tokens, safety, evaluations, tool calling	Using generation for deterministic extraction without validation
Azure AI Search	Enterprise search, vector search, hybrid retrieval, RAG grounding	Index schema, embeddings, filters, ranking, index refresh	Confusing retrieval with generation
Azure AI Language	Entity recognition, PII detection, sentiment, key phrases, classification, language analysis	Text input/output, confidence, batch handling	Choosing a chat model when standard NLP capability is enough
Azure AI Document Intelligence	Extract structured data from documents, forms, receipts, invoices, IDs, or custom document types	Models, fields, confidence scores, human review for low confidence	Treating OCR text alone as reliable structured extraction
Azure AI Vision	Image analysis, OCR-style image understanding, tagging, object/caption scenarios where appropriate	Image input, supported features, confidence	Using text-only models for visual tasks
Azure AI Speech	Speech-to-text, text-to-speech, translation or transcription scenarios	Audio format, latency, speaker/audio quality, language support	Forgetting audio preprocessing and streaming constraints
Azure AI Translator	Text translation	Language detection, target language, formatting	Using a generative model for simple high-volume translation
Azure AI Content Safety	Harmful content detection and moderation support	Categories, severity, thresholds, review workflows	Treating moderation as a full security model

Security, identity, and data protection

Security questions often test whether you know where enforcement belongs. A prompt can guide a model, but it cannot replace identity, authorization, networking, or validation.

Requirement	Prefer	Avoid
Azure service-to-service authentication	Managed identity with least-privilege RBAC where supported	Hard-coded keys in source code
Store secrets or API keys	Azure Key Vault and secure app configuration	Secrets in client apps, repos, logs, or prompts
Human/admin access	Microsoft Entra ID and role-based access control	Shared accounts or broad owner permissions
Restrict network exposure	Private endpoints, network rules, and secure deployment architecture where required	Public access by default without review
Authorize user data retrieval	App-level authorization plus search filters/security trimming	Asking the model to decide whether the user may see data
Protect tool calls	Server-side validation, allow lists, scoped permissions, audit logs	Letting model-generated arguments call sensitive APIs directly
Protect sensitive prompts	Keep system instructions and secrets out of user-visible content	Sending secrets as prompt text
Reduce data exposure	Data minimization and retention controls	Passing full documents or histories when only a few chunks are needed

Authorization rule to remember

If a user is not allowed to read or perform something outside the AI app, the AI app must not allow the model or agent to expose or perform it either. Enforce that before retrieval and before tool execution.

Responsible AI and safety controls

Responsible AI is practical in developer scenarios: detect risk, reduce harm, validate outputs, log decisions, and provide human oversight where needed.

Risk	Control pattern
Harmful or unsafe user input/output	Azure AI Content Safety, content filters, moderation thresholds, escalation
Prompt injection	Separate instructions from data, strip or isolate untrusted content, validate outputs
Data exfiltration	Do not expose system prompts, secrets, hidden tool outputs, or unauthorized retrieved data
PII leakage	Detect/redact PII, minimize context, avoid logging sensitive content unnecessarily
Hallucinated answer	Ground with trusted sources, cite evidence, allow “I don’t know,” evaluate groundedness
Biased or unfair output	Evaluate representative data, review sensitive use cases, provide human review
Unsafe autonomous action	Require approval for high-impact actions, restrict tools, log decisions
Overreliance by users	Show confidence, citations, limitations, and escalation paths
Model or prompt regression	Version prompts, run evaluation suites, compare before deployment
Abuse and cost attacks	Rate limit, authenticate, monitor usage, set quotas and alerts

Evaluation and monitoring

Quality evaluation is not optional for AI apps and agents. Traditional tests confirm that code runs; AI evaluations check whether outputs are useful, safe, grounded, and consistent.

What to evaluate before release

Dimension	What good looks like
Relevance	Answer addresses the user’s actual question
Groundedness	Claims are supported by retrieved or provided context
Retrieval quality	Correct sources appear in top results
Citation accuracy	Citations point to the supporting source
Task completion	Agent completes the intended workflow
Tool correctness	Tool calls use valid arguments and respect permissions
Safety	Harmful, sensitive, or disallowed content is handled correctly
Robustness	App handles ambiguous, adversarial, and out-of-scope prompts
Latency	Response time meets user experience requirements
Cost	Token, model, search, and tool usage are within budget

Runtime telemetry to monitor

Track more than HTTP success.

Request volume and rate limits
Latency by model, retrieval, and tool step
Input/output token usage
Retrieval hit rate and top document scores
No-answer or fallback frequency
Tool call success/failure rates
Content safety flags
User feedback and corrections
Prompt version, model version, and deployment version
Exceptions, retries, and throttling
Cost trends by tenant, user group, or feature

Performance and cost review

Token usage matters because retrieved context, chat history, tool results, and generated output all increase latency and cost.

\[ \text{Approximate token cost} = \left(\frac{T_{in}}{1000}\times P_{in}\right) + \left(\frac{T_{out}}{1000}\times P_{out}\right) \]

Where \(T_{in}\) includes user input, system/developer instructions, retrieved context, conversation history, and tool results; \(T_{out}\) includes generated response tokens.

Optimization lever	Helps with	Tradeoff
Use smaller model for simpler tasks	Cost and latency	May reduce reasoning quality
Reduce retrieved chunks	Cost, latency, focus	May miss needed evidence
Improve chunking and filters	Relevance and token efficiency	Requires better ingestion design
Cache embeddings	Cost and ingestion speed	Must handle source updates
Cache common answers/results	Latency and cost	Must avoid stale or unauthorized responses
Stream responses	Perceived latency	More client complexity
Summarize long history	Context size	Summary may lose detail
Batch offline processing	Throughput and cost	Not suitable for interactive responses
Use purpose-built services	Accuracy, cost, maintainability	Less flexible than general generation
Add retries with backoff	Resilience to transient failures	Can increase latency if overused

Troubleshooting patterns

Problem	Likely explanation	Review action
401 Unauthorized	Missing/invalid credential	Check identity, token, key, endpoint, and configuration
403 Forbidden	Identity lacks permission	Check RBAC, resource access, network rules, or policy
404 Not Found for model call	Wrong endpoint or deployment name	Verify Azure resource endpoint and deployment identifier
429 Too Many Requests	Rate limit or quota exceeded	Use backoff, batching, quota planning, or workload smoothing
Context length error	Prompt, history, retrieved chunks, or tool results too large	Trim, summarize, reduce top-k, or choose model with larger context
Malformed JSON output	Model not constrained or output not validated	Use structured output and schema validation
Poor answer quality	Weak prompt, bad context, wrong model, or missing evaluation	Isolate prompt, retrieval, model, and data issues
Unsafe output	Safety controls insufficient	Add content safety checks, refusal rules, and review workflows
Tool call has bad parameters	Weak tool schema or missing validation	Tighten schema, add examples, validate server-side
Search returns no results	Query mismatch, filters too strict, stale index	Test without filters, inspect index fields, refresh data
Search returns irrelevant results	Poor chunking, no hybrid search, weak metadata	Rechunk, enrich, tune query, add semantic ranking
App works locally but not in Azure	Identity, networking, environment variables, or managed identity issue	Compare configuration and permissions across environments

Common AI-103 candidate mistakes

Choosing fine-tuning when the scenario needs access to current private documents.
Choosing RAG when the task is a simple fixed classification supported by Azure AI Language.
Assuming vector search automatically enforces user permissions.
Forgetting to mark metadata fields as filterable when filters are required.
Treating content safety as the same thing as authentication or authorization.
Passing full documents to the model instead of retrieving focused chunks.
Ignoring citation requirements in grounded answers.
Letting model-generated tool arguments execute without validation.
Storing API keys in application code or exposing them to browser clients.
Monitoring only service uptime and not groundedness, safety, retrieval quality, or cost.
Using an agent for a fixed workflow that should be deterministic.
Using a general chat model for document extraction without confidence handling or validation.
Assuming lower temperature fixes wrong answers caused by bad retrieval.
Forgetting that Azure-hosted model calls may use deployment names configured in Azure.
Failing to handle throttling, transient failures, and timeout behavior.
Overloading prompts with irrelevant history and context.
Ignoring prompt injection in retrieved documents or tool outputs.
Forgetting that a search index must be refreshed as source content changes.
Allowing the model to infer missing facts instead of refusing or asking for clarification.
Not testing edge cases with original practice questions and detailed explanations.

Rapid self-check before practice

You are ready for AI-103 question-bank practice if you can answer these without looking them up:

When would you choose RAG instead of fine-tuning?
What fields should a search index include for secure, citation-based RAG?
How does hybrid search differ from pure vector search?
What does semantic ranking improve, and what does it not do?
What happens between a model proposing a tool call and the tool actually running?
Which controls belong in code instead of prompts?
How would you reduce hallucinations in an enterprise document assistant?
What telemetry proves an AI app is working well, not merely online?
How would you handle a user request that requires a high-impact action?
Which Azure AI service fits speech, document extraction, translation, moderation, or entity detection scenarios?
How do managed identities and RBAC improve security compared with embedded keys?
What should you check first when an app returns 403, 404, 429, or context-length errors?

How to use this with topic drills and mock exams

Use this Quick Review as a map, then practice by topic:

Practice area	What to drill	What detailed explanations should clarify
Azure AI Foundry and model deployment	Model selection, deployment configuration, prompt settings, evaluations	Why a model/configuration choice fits the scenario
Agents and tools	Tool schemas, validation, permissions, multi-step orchestration	Why the model does not replace application enforcement
RAG and Azure AI Search	Chunking, embeddings, hybrid search, filters, semantic ranking, citations	Why retrieval quality affects answer quality
Azure AI services	Language, Speech, Vision, Document Intelligence, Translator, Content Safety	Why a purpose-built service may be better than a generative model
Security and responsible AI	Managed identity, RBAC, Key Vault, safety filters, prompt injection, PII	Which control mitigates which risk
Operations	Monitoring, retries, throttling, latency, token cost, regression testing	How to diagnose production-style failures

For the best review sequence, do short topic drills first, read the detailed explanations carefully, then move to mixed mock exams. Your next step is to practice original AI-103 questions by weak area—especially agents, RAG with Azure AI Search, security, and responsible AI—until you can explain why each wrong answer is wrong.

Continue in IT Mastery

Use this Quick Review as a final concept map, then move into IT Mastery for focused topic drills, mixed practice sets, timed mock exams, and detailed explanations. The practice questions are original IT Mastery practice items; they are not official Microsoft questions, copied live-exam content, or exam dumps.

Study Plan