AI-103 — Microsoft Azure AI Apps and Agents Developer Associate Quick Reference

Last revised: June 18, 2026

Compact exam-prep reference for Microsoft AI-103 covering Azure AI Foundry, Azure OpenAI, agents, RAG, Azure AI Search, security, evaluation, and operations.

Exam identity and study focus

This independent Quick Reference supports candidates preparing for the Microsoft Microsoft Azure AI Apps and Agents Developer Associate (AI-103) exam. Use it as a compact review of high-yield design choices, implementation patterns, and troubleshooting points for Azure AI apps and agent-based solutions.

Item	Reference
Vendor/provider	Microsoft
Exam title	Microsoft Azure AI Apps and Agents Developer Associate (AI-103)
Exam code	AI-103
Candidate focus	Build, integrate, secure, evaluate, and operate AI apps and agents on Azure
Core services to recognize	Azure AI Foundry, Azure OpenAI in Azure AI Foundry, Azure AI Search, Azure AI services, Azure AI Content Safety, Azure Monitor/Application Insights, Microsoft Entra ID, Key Vault, Storage

High-yield architecture map

    flowchart LR
	    U[User or app client] --> A[AI app API / orchestration layer]
	    A --> ID[Microsoft Entra ID / managed identity]
	    A --> LLM[Azure OpenAI / model deployment]
	    A --> AG[Agent service or agent runtime]
	    AG --> T[Tools: functions, APIs, code, search, workflows]
	    A --> R[Retriever]
	    R --> S[Azure AI Search index]
	    S --> D[Blob, files, DBs, documents]
	    A --> CS[Content safety and policy checks]
	    A --> MON[Tracing, logs, evaluations, metrics]
	    LLM --> A
	    T --> AG
	    CS --> A

High-yield mental model:

Model generates or reasons.
Retrieval grounds answers in enterprise data.
Tools let the model or agent take actions.
Security controls identity, data access, networking, and secrets.
Evaluation proves quality, safety, and groundedness before and after release.
Observability helps troubleshoot latency, token use, model errors, unsafe outputs, and poor retrieval.

Service-selection matrix

Need	Usually choose	Why	Exam trap
Build generative AI app with model deployments, prompts, evaluations, and project assets	Azure AI Foundry	Central workspace for model-centric AI app development	Do not treat Foundry as only a portal; know project, model, deployment, connection, evaluation, and tracing concepts
Call GPT-style models from an app	Azure OpenAI in Azure AI Foundry	Managed access to OpenAI models through Azure controls	In Azure calls, the `model` value often refers to the deployment name, not just the base model name
Chat over private documents	Azure AI Search + Azure OpenAI	Retrieval-augmented generation with indexed chunks and citations	Fine-tuning is not the default answer for changing private facts
Multi-step assistant that chooses tools	Azure AI Foundry Agent Service or agent framework	Agent instructions, tools, threads/runs, and tool-call orchestration	Agents increase non-determinism; use deterministic workflows for fixed business processes
Enterprise search over text and vectors	Azure AI Search	Keyword, vector, hybrid, filtering, semantic ranking	Semantic ranking is not a security boundary
Extract tables, key-value pairs, layout, or fields from forms	Azure AI Document Intelligence	Document layout and extraction models	OCR alone is not enough for structured document extraction
Classify, extract, summarize, or analyze natural language with prebuilt APIs	Azure AI Language or generative model	Use task-specific APIs for predictable NLP; use LLMs for flexible generation	Do not overuse LLMs when a deterministic AI service API fits
Speech transcription or text-to-speech	Azure AI Speech	Speech-to-text, text-to-speech, speech translation patterns	Audio quality, language, and diarization requirements affect design
Image analysis or OCR	Azure AI Vision / Document Intelligence	Image tagging, OCR, document layout depending on input	Choose Document Intelligence for document structure, not just images
Moderate unsafe text or images	Azure AI Content Safety and Azure OpenAI content filters	Detect harmful content, jailbreak attempts, protected categories, or policy violations	Content filtering is not a full compliance program
Store secrets and keys	Azure Key Vault	Central secret management and rotation support	Prefer managed identity where possible instead of distributing keys
Monitor production AI app	Azure Monitor, Application Insights, Foundry tracing/evaluation features	Logs, traces, metrics, failures, latency, quality signals	Do not log sensitive prompts/responses without a privacy plan

Core app patterns

Pattern	Use when	Main components	Avoid when
Direct chat/completion	User asks general questions or app needs generated text	App API, prompt, model deployment	Answers require current private data or strict traceability
Grounded chat / RAG	Answers must use enterprise documents	Chunking pipeline, embeddings, Azure AI Search, prompt with retrieved context	Source content is highly structured and better served by direct database queries
Agentic RAG	Assistant must search, reason, call tools, and iterate	Agent, tools, retrieval, thread/run state, policy controls	A fixed workflow can meet the requirement more reliably
Tool/function calling	Model chooses from app-defined operations	Function schema, tool-call handler, validation, execution layer	The action is high-risk and needs human approval or deterministic rules
Workflow-first automation	Steps are known and must be auditable	API workflow, rules engine, Logic Apps/Functions, optional LLM step	The task requires flexible open-ended reasoning
Fine-tuning	Need consistent style, format, or task behavior from examples	Training examples, evaluation set, model deployment	Need to add frequently changing facts; use RAG instead
Task-specific AI service	Need predictable extraction/classification/speech/vision	Azure AI Language, Speech, Vision, Document Intelligence	Need open-ended reasoning across many task types

Azure AI Foundry concepts

Concept	What to know for AI-103
Project	Organizes app assets such as models, deployments, data connections, prompts, evaluations, and traces
Model catalog	Place to discover foundation models and select models for deployment or inference
Model deployment	App-facing deployed model endpoint/configuration; applications call deployments
Prompt engineering	Iterative design of instructions, examples, constraints, grounding, and output format
Evaluation	Measures quality and safety using test data, metrics, and comparison runs
Tracing	Captures app/agent execution steps for debugging prompts, retrieval, tools, and latency
Connections	Secure references to resources such as storage, search, model endpoints, and external services
Agents	Assistants that use instructions, models, tools, and conversation state to perform tasks

Foundry development checklist

Create or select the Azure AI project/resource.
Deploy or select a suitable model.
Define the app pattern: direct model call, RAG, agent, or workflow.
Configure connections to data sources, indexes, tools, and storage.
Build prompts with clear instructions, grounding rules, and output constraints.
Add content safety and input/output validation.
Evaluate with representative prompts and expected outcomes.
Deploy through an app/API layer with managed identity where possible.
Monitor traces, latency, token use, model errors, safety flags, and user feedback.

Azure OpenAI and model interaction reference

Building blocks

Building block	Purpose	Common exam distinction
System/developer instructions	Define assistant behavior, constraints, and role	More durable than user text, but not a security boundary
User message	End-user request	Must be validated and checked for prompt injection
Assistant message	Model response	Can be used as conversation history, but manage token growth
Context	Retrieved or supplied facts	The model only knows private data if you provide or connect it
Embeddings	Numeric representation of text for similarity	Query and indexed vectors must be generated consistently
Tool/function definition	Schema for actions the model may request	The app executes the function; the model does not directly access your systems
Structured output	JSON or schema-constrained response	Still validate output before using it
Streaming	Incremental token delivery	Improves perceived latency but complicates moderation and logging

Model parameter quick reference

Parameter	Effect	Practical guidance
Temperature	Higher means more varied/random output	Lower for factual, deterministic, or formatted answers
Top-p	Controls nucleus sampling	Usually tune either temperature or top-p, not both aggressively
Max output tokens	Caps response length	Set based on UX and cost/latency requirements
Stop sequences	Stop generation at defined text	Useful for templates, delimiters, or multi-part prompts
Frequency/presence penalties	Discourage repetition or encourage novelty	Use carefully; can reduce consistency
Response format / schema	Requests structured output	Always parse and validate in code

Prompt design checklist

Goal	Prompt tactic
Grounded answer	“Use only the provided context. If context is insufficient, say what is missing.”
Citation support	Include source IDs/URLs in retrieved context and require citations by source ID
Tool discipline	Tell the model when it must use a tool versus when it may answer directly
JSON output	Provide schema, valid example, and instruction to return only JSON
Safety	Include prohibited behaviors, escalation instructions, and human handoff rules
Injection resistance	Treat retrieved/user content as data, not as higher-priority instructions

Minimal Azure OpenAI call pattern

import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version=os.environ["AZURE_OPENAI_API_VERSION"]
)

response = client.chat.completions.create(
    model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],  # Azure deployment name
    messages=[
        {"role": "system", "content": "Answer using concise technical language."},
        {"role": "user", "content": "Explain hybrid search in RAG."}
    ],
    temperature=0.2
)

print(response.choices[0].message.content)

Exam points:

Prefer Microsoft Entra ID and managed identities for production when supported.
API keys are easier for quick tests but increase secret-management risk.
The model deployment name is a frequent source of 404 or deployment-not-found errors.
Token budget includes instructions, history, retrieved context, tool schemas, and response.

RAG and Azure AI Search

RAG pipeline

    flowchart LR
	    A[Source documents] --> B[Load and crack documents]
	    B --> C[Clean, split, chunk]
	    C --> D[Enrich: OCR, metadata, extraction]
	    D --> E[Create embeddings]
	    E --> F[Index in Azure AI Search]
	    Q[User question] --> G[Embed / rewrite query]
	    G --> H[Retrieve: keyword, vector, hybrid]
	    F --> H
	    H --> I[Prompt with context + citations]
	    I --> J[Generate answer]
	    J --> K[Evaluate and monitor]

Chunking and indexing decisions

Decision	Good default thinking	Trap
Chunk size	Large enough for meaning, small enough for precise retrieval	Entire documents often dilute relevance and exceed context budget
Overlap	Add overlap when concepts span chunk boundaries	Too much overlap increases cost and duplicate results
Metadata	Store source, page, section, timestamp, owner, ACLs, content type	Without metadata, filtering and citations are weak
Embedding model	Use the same embedding approach for documents and queries	Mixing incompatible embeddings breaks similarity quality
Reindexing	Re-run indexing when source data or enrichment logic changes	RAG does not automatically know changed documents unless ingestion updates the index
Security trimming	Apply filters based on user authorization	Search relevance is not authorization

Azure AI Search components

Component	Purpose	Exam notes
Index	Searchable schema and stored document chunks	Fields can be searchable, filterable, sortable, facetable, retrievable, vectorized
Data source	Connection to source data for indexers	Commonly storage or supported data platforms
Indexer	Pulls data from source into index	Useful for scheduled or repeatable ingestion
Skillset	Enrichment pipeline such as OCR, extraction, language, or custom skills	Adds structure before indexing
Analyzer	Controls tokenization and text processing	Important for language-specific search behavior
Vector field	Stores embedding vectors	Query vectors must align with index configuration
Semantic ranking	Improves natural-language ranking and captions where configured	Enhances relevance; does not enforce security
Filters	Restrict results by metadata or ACL fields	Critical for tenant, user, or department isolation
Synonym map	Expands equivalent terms	Helpful for domain vocabulary
Scoring profile	Boosts selected fields or freshness	Useful when ranking needs business tuning

Retrieval modes

Retrieval mode	Best for	Limitations
Keyword search	Exact terms, IDs, names, product codes	Misses semantic matches
Vector search	Conceptual similarity and paraphrases	Can return plausible but contextually wrong chunks
Hybrid search	Combines keyword and vector signals	Often strong for enterprise RAG
Semantic ranking	Re-ranks top results for natural-language relevance	Works after initial retrieval; not a replacement for good indexing
Filtered retrieval	Enforces scope such as user, region, product, or document type	Overly strict filters can hide relevant context

RAG retrieval snippet

## Conceptual pattern: embed query, retrieve chunks, then pass context to the model.
query = "What is the refund exception process for enterprise customers?"

query_vector = embed(query)  # Use the same embedding strategy as the indexed chunks.

results = search_client.search(
    search_text=query,
    vector_queries=[
        {
            "vector": query_vector,
            "fields": "contentVector",
            "k_nearest_neighbors": 5
        }
    ],
    filter="department eq 'Support'",
    select=["content", "source", "page", "lastUpdated"],
    top=5
)

context = "\n\n".join(
    f"[{r['source']} p.{r['page']}]\n{r['content']}" for r in results
)

RAG failure-to-fix table

Symptom	Likely cause	Fix
Answer is fluent but wrong	Retrieved context is irrelevant or missing	Inspect retrieved chunks; tune chunking, hybrid search, filters, and prompts
Answer lacks citations	Source metadata missing or prompt does not require citations	Store source/page IDs and require citation format
User sees unauthorized content	No security trimming or wrong filter	Add per-user/tenant ACL fields and enforce filters before generation
Model ignores context	Prompt allows outside knowledge or context too noisy	Strengthen grounding instruction and improve retrieval precision
High latency	Too many retrieval calls, large context, slow tools	Cache, reduce top-k, compress context, parallelize safe calls
Poor recall	Chunks too small/large, weak synonyms, no hybrid search	Tune chunks, add metadata, use hybrid/semantic ranking
Stale answers	Index not refreshed	Schedule or trigger ingestion updates

Agents and tool calling

Agent concepts

Concept	Meaning	Candidate reminder
Agent	Model-backed assistant configured with instructions and tools	Use for flexible multi-step tasks
Instructions	Persistent behavior and policy guidance	Keep concise, explicit, and testable
Thread/session	Conversation state	Manage retention, privacy, and token growth
Run/execution	One agent processing cycle	A run may require tool outputs before completion
Tool	Capability exposed to the agent	Examples: function, search, file retrieval, code, workflow, API
Tool call	Model-requested action with arguments	Validate arguments before execution
Tool output	Result returned to agent	Sanitize tool output to reduce prompt injection
Human approval	Manual gate for sensitive actions	Use for irreversible, financial, legal, or high-impact actions

Agent vs function vs workflow

Requirement	Best fit	Why
“Answer questions about these files”	RAG or file-search-capable agent	Retrieval is the primary need
“Book a meeting, email summary, update CRM”	Agent with tools, or workflow with LLM step	Agent can select tools; workflow is safer if sequence is fixed
“Always run these 5 steps in this order”	Deterministic workflow	Easier to audit and test
“Decide which diagnostic command to run next”	Agent	Requires iterative reasoning
“Call one known API based on user intent”	Function calling	Lighter than a full agent
“Generate strictly formatted output”	Direct model call with schema	Agent may be unnecessary

Tool/function calling pattern

## Pseudocode: the app, not the model, executes tools.
messages = [
    {"role": "system", "content": "Use tools for account lookups. Do not invent account data."},
    {"role": "user", "content": "What is the status of order A123?"}
]

model_response = call_model(messages, tools=[get_order_status_schema])

if model_response.requests_tool:
    tool_name = model_response.tool_name
    args = validate_json(model_response.tool_arguments)

    if tool_name == "get_order_status":
        tool_result = get_order_status(order_id=args["order_id"])

    messages.append(model_response.as_message())
    messages.append({
        "role": "tool",
        "tool_call_id": model_response.tool_call_id,
        "content": sanitize(tool_result)
    })

    final_response = call_model(messages, tools=[get_order_status_schema])

Tool-calling traps:

Validate tool arguments even if the schema is strict.
Apply authorization before executing the requested action.
Treat tool outputs and retrieved documents as untrusted text.
Use idempotency keys or confirmation for actions that change state.
Log tool traces without exposing secrets or sensitive data.
Set max iterations to avoid runaway agent loops.

Azure AI services quick grid

Service area	Use for	High-yield distinction
Azure AI Language	Sentiment, key phrases, entity recognition, PII detection, classification, conversational language understanding	Use when a prebuilt or custom NLP API is more predictable than an LLM prompt
Azure AI Speech	Speech-to-text, text-to-speech, speech translation	Audio format, language, latency, and speaker requirements matter
Azure AI Vision	Image analysis, OCR/image understanding scenarios	Use Document Intelligence when document structure is central
Azure AI Document Intelligence	Layout, tables, key-value pairs, prebuilt/custom document extraction	Best for forms, invoices, receipts, contracts, and structured document processing
Azure AI Translator	Text translation	Prefer for translation workloads instead of prompting a general model
Azure AI Content Safety	Harmful content detection and safety controls	Complements Azure OpenAI content filters and app policy logic
Azure AI Search	Indexing and retrieval for enterprise content	Core service for scalable RAG grounding

Security, identity, and governance

Identity and access choices

Control	Prefer	Use when	Trap
Managed identity	Azure-hosted apps accessing Azure resources	App Service, Functions, AKS, VM, Container Apps, workflows	Role assignment still required
Microsoft Entra ID token auth	Production service-to-service access	Supported SDKs and enterprise auth	Wrong token scope or tenant causes auth failures
API keys	Quick tests or unsupported identity scenario	Local prototypes or simple integration	Store in Key Vault; do not hard-code
Key Vault	Secrets, keys, certificates	Central secret lifecycle	App still needs identity to read secrets
RBAC	Resource and data-plane permissions	Least privilege access	Contributor at subscription scope is usually excessive
Private endpoint/network controls	Restrict public exposure	Sensitive data or enterprise network requirements	DNS and routing must be configured correctly

Data and prompt security checklist

Classify data before sending it to model, search, logging, or evaluation systems.
Use least privilege for app identity to Search, Storage, Key Vault, and AI resources.
Apply user-level or tenant-level filters before retrieval.
Remove or mask sensitive data in logs and traces.
Do not put secrets in prompts, tool schemas, system messages, or source documents.
Validate model output before database writes, API calls, or user-visible actions.
Use human approval for high-impact operations.
Treat prompt injection as an application security issue, not just a prompt wording issue.

Prompt injection defenses

Attack pattern	Defense
Retrieved document says “ignore previous instructions”	Delimit retrieved content and state that it is untrusted data
User asks for hidden system prompt	Refuse disclosure and avoid placing secrets in prompts
User asks agent to call unauthorized tool	Check authorization in code before tool execution
Malicious source includes fake citation	Generate citations from metadata, not from document text alone
Tool output contains instructions	Sanitize and summarize tool output before returning it to the model

Evaluation and responsible AI

Quality and safety evaluation matrix

Evaluation target	What to measure	Practical method
Groundedness	Response is supported by retrieved context	Compare answer claims to source chunks
Relevance	Response answers the user’s question	Use labeled test prompts or evaluator model
Retrieval quality	Right chunks appear in top results	Inspect recall/precision by query set
Citation quality	Citations point to correct sources	Validate source IDs/pages against answer claims
Coherence	Response is clear and logically structured	Human review or automated scoring
Safety	Harmful, disallowed, or policy-violating content	Content Safety checks and adversarial tests
Robustness	Handles ambiguous, malicious, or edge-case prompts	Red-team prompt set
Latency	Meets user experience needs	Trace model, retrieval, and tool durations
Cost/token use	Fits budget and throughput goals	Track prompt size, context size, completion size

Responsible AI controls

Control	Use for	Notes
Content filters	Model input/output safety enforcement	Built into Azure OpenAI flows depending on configuration
Azure AI Content Safety	Moderation and harm detection across app content	Useful for custom moderation workflows
Grounding checks	Detect unsupported claims	Important for enterprise Q&A
Human review	Escalation and high-impact decisions	Especially for sensitive or irreversible actions
Abuse monitoring	Detect misuse patterns	Combine telemetry, rate limits, and policy
Feedback capture	Improve prompts, retrieval, and tools	Keep feedback privacy-aware

Deployment and operations

Production readiness checklist

Area	Check
App architecture	Separate client, orchestration/API layer, model calls, retrieval, and tools
Identity	Use managed identity or Entra ID where possible
Secrets	Store keys in Key Vault; rotate and audit access
Retrieval	Test index freshness, metadata filters, and citation accuracy
Prompting	Version prompts and evaluate before release
Tools	Validate arguments, authorize actions, handle retries and timeouts
Safety	Run input/output moderation and policy checks
Observability	Trace model calls, retrieval, tool calls, failures, latency, and token use
Reliability	Implement retries with backoff for transient errors
Privacy	Redact or avoid sensitive prompt/response logging
Evaluation	Maintain regression set for quality and safety
Rollback	Keep known-good prompt/model/config versions

Troubleshooting quick table

Symptom/error	Common cause	Response
401 Unauthorized	Bad credential, expired token, wrong auth method	Check identity, key, token acquisition, and SDK config
403 Forbidden	Identity lacks role or network access blocked	Verify RBAC/data-plane roles, private endpoint, firewall
404 deployment/resource not found	Wrong endpoint, resource, deployment name, or region	Confirm endpoint and Azure deployment name
429 throttling	Too much concurrency or request volume	Retry with exponential backoff, queue, reduce parallelism
5xx/transient errors	Service or network transient issue	Retry safely, add circuit breaker, monitor status
JSON parse failure	Model did not follow output format	Use schema/structured output, lower temperature, validate and retry
Tool loop	Agent keeps requesting tools	Limit iterations, improve instructions, return clearer tool errors
Hallucinated answer	Weak grounding or missing context	Improve retrieval, require “insufficient information” behavior
High token use	Long history, excessive context, verbose tools	Summarize history, reduce chunks, compress tool output
Slow response	Retrieval/tool/model latency	Trace each step, stream output, cache safe results

Common AI-103 exam traps

Trap	Correct exam mindset
“Use fine-tuning for private knowledge”	Use RAG for changing or source-grounded private data; fine-tune for behavior/style/task examples
“The LLM securely enforces permissions”	Your app must enforce identity, authorization, filters, and tool permissions
“Prompt instructions are security controls”	Prompts help behavior but are not sufficient security boundaries
“Vector search is always better than keyword search”	Hybrid search often performs better for enterprise content
“Semantic ranking controls access”	It ranks results; it does not authorize users
“Agent equals workflow”	Agents choose steps dynamically; workflows execute defined logic
“Tool schemas guarantee safe execution”	Validate, authorize, sanitize, and log in application code
“Content filters replace app policy”	Filters are one layer; add business rules, review, and monitoring
“More retrieved chunks always improve answers”	Too much context can add noise, cost, and latency
“Conversation history can grow forever”	Summarize, truncate, or selectively retain context
“Logging everything helps debugging”	AI logs may contain sensitive data; design privacy-aware telemetry
“Model name and deployment name are interchangeable”	Azure app calls commonly use the deployment name configured in Azure

Rapid review checklist

Before practice, make sure you can explain:

When to use Azure AI Foundry, Azure OpenAI, Azure AI Search, Azure AI services, and Azure AI Content Safety.
The difference between direct prompting, RAG, tool calling, and agents.
How embeddings, chunking, metadata, filters, and hybrid search affect RAG quality.
Why managed identity, RBAC, Key Vault, private networking, and data filtering matter.
How to evaluate groundedness, relevance, safety, retrieval quality, and latency.
How to troubleshoot auth errors, deployment-name issues, throttling, poor retrieval, hallucinations, and tool loops.
Why prompt injection requires application-level defenses.

Next step for practice

Use this Quick Reference as a checklist while completing hands-on Azure AI Foundry, Azure OpenAI, Azure AI Search, and agent labs. Then move into timed AI-103-style practice questions that force you to choose the best service, pattern, security control, and troubleshooting action for each scenario.

Scenario Guide

Plan and Manage an Azure AI Solution