Exam identity and study focus
This independent Quick Reference supports candidates preparing for the Microsoft Microsoft Azure AI Apps and Agents Developer Associate (AI-103) exam. Use it as a compact review of high-yield design choices, implementation patterns, and troubleshooting points for Azure AI apps and agent-based solutions.
| Item | Reference |
|---|
| Vendor/provider | Microsoft |
| Exam title | Microsoft Azure AI Apps and Agents Developer Associate (AI-103) |
| Exam code | AI-103 |
| Candidate focus | Build, integrate, secure, evaluate, and operate AI apps and agents on Azure |
| Core services to recognize | Azure AI Foundry, Azure OpenAI in Azure AI Foundry, Azure AI Search, Azure AI services, Azure AI Content Safety, Azure Monitor/Application Insights, Microsoft Entra ID, Key Vault, Storage |
High-yield architecture map
flowchart LR
U[User or app client] --> A[AI app API / orchestration layer]
A --> ID[Microsoft Entra ID / managed identity]
A --> LLM[Azure OpenAI / model deployment]
A --> AG[Agent service or agent runtime]
AG --> T[Tools: functions, APIs, code, search, workflows]
A --> R[Retriever]
R --> S[Azure AI Search index]
S --> D[Blob, files, DBs, documents]
A --> CS[Content safety and policy checks]
A --> MON[Tracing, logs, evaluations, metrics]
LLM --> A
T --> AG
CS --> A
High-yield mental model:
- Model generates or reasons.
- Retrieval grounds answers in enterprise data.
- Tools let the model or agent take actions.
- Security controls identity, data access, networking, and secrets.
- Evaluation proves quality, safety, and groundedness before and after release.
- Observability helps troubleshoot latency, token use, model errors, unsafe outputs, and poor retrieval.
Service-selection matrix
| Need | Usually choose | Why | Exam trap |
|---|
| Build generative AI app with model deployments, prompts, evaluations, and project assets | Azure AI Foundry | Central workspace for model-centric AI app development | Do not treat Foundry as only a portal; know project, model, deployment, connection, evaluation, and tracing concepts |
| Call GPT-style models from an app | Azure OpenAI in Azure AI Foundry | Managed access to OpenAI models through Azure controls | In Azure calls, the model value often refers to the deployment name, not just the base model name |
| Chat over private documents | Azure AI Search + Azure OpenAI | Retrieval-augmented generation with indexed chunks and citations | Fine-tuning is not the default answer for changing private facts |
| Multi-step assistant that chooses tools | Azure AI Foundry Agent Service or agent framework | Agent instructions, tools, threads/runs, and tool-call orchestration | Agents increase non-determinism; use deterministic workflows for fixed business processes |
| Enterprise search over text and vectors | Azure AI Search | Keyword, vector, hybrid, filtering, semantic ranking | Semantic ranking is not a security boundary |
| Extract tables, key-value pairs, layout, or fields from forms | Azure AI Document Intelligence | Document layout and extraction models | OCR alone is not enough for structured document extraction |
| Classify, extract, summarize, or analyze natural language with prebuilt APIs | Azure AI Language or generative model | Use task-specific APIs for predictable NLP; use LLMs for flexible generation | Do not overuse LLMs when a deterministic AI service API fits |
| Speech transcription or text-to-speech | Azure AI Speech | Speech-to-text, text-to-speech, speech translation patterns | Audio quality, language, and diarization requirements affect design |
| Image analysis or OCR | Azure AI Vision / Document Intelligence | Image tagging, OCR, document layout depending on input | Choose Document Intelligence for document structure, not just images |
| Moderate unsafe text or images | Azure AI Content Safety and Azure OpenAI content filters | Detect harmful content, jailbreak attempts, protected categories, or policy violations | Content filtering is not a full compliance program |
| Store secrets and keys | Azure Key Vault | Central secret management and rotation support | Prefer managed identity where possible instead of distributing keys |
| Monitor production AI app | Azure Monitor, Application Insights, Foundry tracing/evaluation features | Logs, traces, metrics, failures, latency, quality signals | Do not log sensitive prompts/responses without a privacy plan |
Core app patterns
| Pattern | Use when | Main components | Avoid when |
|---|
| Direct chat/completion | User asks general questions or app needs generated text | App API, prompt, model deployment | Answers require current private data or strict traceability |
| Grounded chat / RAG | Answers must use enterprise documents | Chunking pipeline, embeddings, Azure AI Search, prompt with retrieved context | Source content is highly structured and better served by direct database queries |
| Agentic RAG | Assistant must search, reason, call tools, and iterate | Agent, tools, retrieval, thread/run state, policy controls | A fixed workflow can meet the requirement more reliably |
| Tool/function calling | Model chooses from app-defined operations | Function schema, tool-call handler, validation, execution layer | The action is high-risk and needs human approval or deterministic rules |
| Workflow-first automation | Steps are known and must be auditable | API workflow, rules engine, Logic Apps/Functions, optional LLM step | The task requires flexible open-ended reasoning |
| Fine-tuning | Need consistent style, format, or task behavior from examples | Training examples, evaluation set, model deployment | Need to add frequently changing facts; use RAG instead |
| Task-specific AI service | Need predictable extraction/classification/speech/vision | Azure AI Language, Speech, Vision, Document Intelligence | Need open-ended reasoning across many task types |
Azure AI Foundry concepts
| Concept | What to know for AI-103 |
|---|
| Project | Organizes app assets such as models, deployments, data connections, prompts, evaluations, and traces |
| Model catalog | Place to discover foundation models and select models for deployment or inference |
| Model deployment | App-facing deployed model endpoint/configuration; applications call deployments |
| Prompt engineering | Iterative design of instructions, examples, constraints, grounding, and output format |
| Evaluation | Measures quality and safety using test data, metrics, and comparison runs |
| Tracing | Captures app/agent execution steps for debugging prompts, retrieval, tools, and latency |
| Connections | Secure references to resources such as storage, search, model endpoints, and external services |
| Agents | Assistants that use instructions, models, tools, and conversation state to perform tasks |
Foundry development checklist
- Create or select the Azure AI project/resource.
- Deploy or select a suitable model.
- Define the app pattern: direct model call, RAG, agent, or workflow.
- Configure connections to data sources, indexes, tools, and storage.
- Build prompts with clear instructions, grounding rules, and output constraints.
- Add content safety and input/output validation.
- Evaluate with representative prompts and expected outcomes.
- Deploy through an app/API layer with managed identity where possible.
- Monitor traces, latency, token use, model errors, safety flags, and user feedback.
Azure OpenAI and model interaction reference
Building blocks
| Building block | Purpose | Common exam distinction |
|---|
| System/developer instructions | Define assistant behavior, constraints, and role | More durable than user text, but not a security boundary |
| User message | End-user request | Must be validated and checked for prompt injection |
| Assistant message | Model response | Can be used as conversation history, but manage token growth |
| Context | Retrieved or supplied facts | The model only knows private data if you provide or connect it |
| Embeddings | Numeric representation of text for similarity | Query and indexed vectors must be generated consistently |
| Tool/function definition | Schema for actions the model may request | The app executes the function; the model does not directly access your systems |
| Structured output | JSON or schema-constrained response | Still validate output before using it |
| Streaming | Incremental token delivery | Improves perceived latency but complicates moderation and logging |
Model parameter quick reference
| Parameter | Effect | Practical guidance |
|---|
| Temperature | Higher means more varied/random output | Lower for factual, deterministic, or formatted answers |
| Top-p | Controls nucleus sampling | Usually tune either temperature or top-p, not both aggressively |
| Max output tokens | Caps response length | Set based on UX and cost/latency requirements |
| Stop sequences | Stop generation at defined text | Useful for templates, delimiters, or multi-part prompts |
| Frequency/presence penalties | Discourage repetition or encourage novelty | Use carefully; can reduce consistency |
| Response format / schema | Requests structured output | Always parse and validate in code |
Prompt design checklist
| Goal | Prompt tactic |
|---|
| Grounded answer | “Use only the provided context. If context is insufficient, say what is missing.” |
| Citation support | Include source IDs/URLs in retrieved context and require citations by source ID |
| Tool discipline | Tell the model when it must use a tool versus when it may answer directly |
| JSON output | Provide schema, valid example, and instruction to return only JSON |
| Safety | Include prohibited behaviors, escalation instructions, and human handoff rules |
| Injection resistance | Treat retrieved/user content as data, not as higher-priority instructions |
Minimal Azure OpenAI call pattern
import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI
token_provider = get_bearer_token_provider(
DefaultAzureCredential(),
"https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_ad_token_provider=token_provider,
api_version=os.environ["AZURE_OPENAI_API_VERSION"]
)
response = client.chat.completions.create(
model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"], # Azure deployment name
messages=[
{"role": "system", "content": "Answer using concise technical language."},
{"role": "user", "content": "Explain hybrid search in RAG."}
],
temperature=0.2
)
print(response.choices[0].message.content)
Exam points:
- Prefer Microsoft Entra ID and managed identities for production when supported.
- API keys are easier for quick tests but increase secret-management risk.
- The model deployment name is a frequent source of 404 or deployment-not-found errors.
- Token budget includes instructions, history, retrieved context, tool schemas, and response.
RAG and Azure AI Search
RAG pipeline
flowchart LR
A[Source documents] --> B[Load and crack documents]
B --> C[Clean, split, chunk]
C --> D[Enrich: OCR, metadata, extraction]
D --> E[Create embeddings]
E --> F[Index in Azure AI Search]
Q[User question] --> G[Embed / rewrite query]
G --> H[Retrieve: keyword, vector, hybrid]
F --> H
H --> I[Prompt with context + citations]
I --> J[Generate answer]
J --> K[Evaluate and monitor]
Chunking and indexing decisions
| Decision | Good default thinking | Trap |
|---|
| Chunk size | Large enough for meaning, small enough for precise retrieval | Entire documents often dilute relevance and exceed context budget |
| Overlap | Add overlap when concepts span chunk boundaries | Too much overlap increases cost and duplicate results |
| Metadata | Store source, page, section, timestamp, owner, ACLs, content type | Without metadata, filtering and citations are weak |
| Embedding model | Use the same embedding approach for documents and queries | Mixing incompatible embeddings breaks similarity quality |
| Reindexing | Re-run indexing when source data or enrichment logic changes | RAG does not automatically know changed documents unless ingestion updates the index |
| Security trimming | Apply filters based on user authorization | Search relevance is not authorization |
Azure AI Search components
| Component | Purpose | Exam notes |
|---|
| Index | Searchable schema and stored document chunks | Fields can be searchable, filterable, sortable, facetable, retrievable, vectorized |
| Data source | Connection to source data for indexers | Commonly storage or supported data platforms |
| Indexer | Pulls data from source into index | Useful for scheduled or repeatable ingestion |
| Skillset | Enrichment pipeline such as OCR, extraction, language, or custom skills | Adds structure before indexing |
| Analyzer | Controls tokenization and text processing | Important for language-specific search behavior |
| Vector field | Stores embedding vectors | Query vectors must align with index configuration |
| Semantic ranking | Improves natural-language ranking and captions where configured | Enhances relevance; does not enforce security |
| Filters | Restrict results by metadata or ACL fields | Critical for tenant, user, or department isolation |
| Synonym map | Expands equivalent terms | Helpful for domain vocabulary |
| Scoring profile | Boosts selected fields or freshness | Useful when ranking needs business tuning |
Retrieval modes
| Retrieval mode | Best for | Limitations |
|---|
| Keyword search | Exact terms, IDs, names, product codes | Misses semantic matches |
| Vector search | Conceptual similarity and paraphrases | Can return plausible but contextually wrong chunks |
| Hybrid search | Combines keyword and vector signals | Often strong for enterprise RAG |
| Semantic ranking | Re-ranks top results for natural-language relevance | Works after initial retrieval; not a replacement for good indexing |
| Filtered retrieval | Enforces scope such as user, region, product, or document type | Overly strict filters can hide relevant context |
RAG retrieval snippet
## Conceptual pattern: embed query, retrieve chunks, then pass context to the model.
query = "What is the refund exception process for enterprise customers?"
query_vector = embed(query) # Use the same embedding strategy as the indexed chunks.
results = search_client.search(
search_text=query,
vector_queries=[
{
"vector": query_vector,
"fields": "contentVector",
"k_nearest_neighbors": 5
}
],
filter="department eq 'Support'",
select=["content", "source", "page", "lastUpdated"],
top=5
)
context = "\n\n".join(
f"[{r['source']} p.{r['page']}]\n{r['content']}" for r in results
)
RAG failure-to-fix table
| Symptom | Likely cause | Fix |
|---|
| Answer is fluent but wrong | Retrieved context is irrelevant or missing | Inspect retrieved chunks; tune chunking, hybrid search, filters, and prompts |
| Answer lacks citations | Source metadata missing or prompt does not require citations | Store source/page IDs and require citation format |
| User sees unauthorized content | No security trimming or wrong filter | Add per-user/tenant ACL fields and enforce filters before generation |
| Model ignores context | Prompt allows outside knowledge or context too noisy | Strengthen grounding instruction and improve retrieval precision |
| High latency | Too many retrieval calls, large context, slow tools | Cache, reduce top-k, compress context, parallelize safe calls |
| Poor recall | Chunks too small/large, weak synonyms, no hybrid search | Tune chunks, add metadata, use hybrid/semantic ranking |
| Stale answers | Index not refreshed | Schedule or trigger ingestion updates |
Agent concepts
| Concept | Meaning | Candidate reminder |
|---|
| Agent | Model-backed assistant configured with instructions and tools | Use for flexible multi-step tasks |
| Instructions | Persistent behavior and policy guidance | Keep concise, explicit, and testable |
| Thread/session | Conversation state | Manage retention, privacy, and token growth |
| Run/execution | One agent processing cycle | A run may require tool outputs before completion |
| Tool | Capability exposed to the agent | Examples: function, search, file retrieval, code, workflow, API |
| Tool call | Model-requested action with arguments | Validate arguments before execution |
| Tool output | Result returned to agent | Sanitize tool output to reduce prompt injection |
| Human approval | Manual gate for sensitive actions | Use for irreversible, financial, legal, or high-impact actions |
Agent vs function vs workflow
| Requirement | Best fit | Why |
|---|
| “Answer questions about these files” | RAG or file-search-capable agent | Retrieval is the primary need |
| “Book a meeting, email summary, update CRM” | Agent with tools, or workflow with LLM step | Agent can select tools; workflow is safer if sequence is fixed |
| “Always run these 5 steps in this order” | Deterministic workflow | Easier to audit and test |
| “Decide which diagnostic command to run next” | Agent | Requires iterative reasoning |
| “Call one known API based on user intent” | Function calling | Lighter than a full agent |
| “Generate strictly formatted output” | Direct model call with schema | Agent may be unnecessary |
## Pseudocode: the app, not the model, executes tools.
messages = [
{"role": "system", "content": "Use tools for account lookups. Do not invent account data."},
{"role": "user", "content": "What is the status of order A123?"}
]
model_response = call_model(messages, tools=[get_order_status_schema])
if model_response.requests_tool:
tool_name = model_response.tool_name
args = validate_json(model_response.tool_arguments)
if tool_name == "get_order_status":
tool_result = get_order_status(order_id=args["order_id"])
messages.append(model_response.as_message())
messages.append({
"role": "tool",
"tool_call_id": model_response.tool_call_id,
"content": sanitize(tool_result)
})
final_response = call_model(messages, tools=[get_order_status_schema])
Tool-calling traps:
- Validate tool arguments even if the schema is strict.
- Apply authorization before executing the requested action.
- Treat tool outputs and retrieved documents as untrusted text.
- Use idempotency keys or confirmation for actions that change state.
- Log tool traces without exposing secrets or sensitive data.
- Set max iterations to avoid runaway agent loops.
Azure AI services quick grid
| Service area | Use for | High-yield distinction |
|---|
| Azure AI Language | Sentiment, key phrases, entity recognition, PII detection, classification, conversational language understanding | Use when a prebuilt or custom NLP API is more predictable than an LLM prompt |
| Azure AI Speech | Speech-to-text, text-to-speech, speech translation | Audio format, language, latency, and speaker requirements matter |
| Azure AI Vision | Image analysis, OCR/image understanding scenarios | Use Document Intelligence when document structure is central |
| Azure AI Document Intelligence | Layout, tables, key-value pairs, prebuilt/custom document extraction | Best for forms, invoices, receipts, contracts, and structured document processing |
| Azure AI Translator | Text translation | Prefer for translation workloads instead of prompting a general model |
| Azure AI Content Safety | Harmful content detection and safety controls | Complements Azure OpenAI content filters and app policy logic |
| Azure AI Search | Indexing and retrieval for enterprise content | Core service for scalable RAG grounding |
Security, identity, and governance
Identity and access choices
| Control | Prefer | Use when | Trap |
|---|
| Managed identity | Azure-hosted apps accessing Azure resources | App Service, Functions, AKS, VM, Container Apps, workflows | Role assignment still required |
| Microsoft Entra ID token auth | Production service-to-service access | Supported SDKs and enterprise auth | Wrong token scope or tenant causes auth failures |
| API keys | Quick tests or unsupported identity scenario | Local prototypes or simple integration | Store in Key Vault; do not hard-code |
| Key Vault | Secrets, keys, certificates | Central secret lifecycle | App still needs identity to read secrets |
| RBAC | Resource and data-plane permissions | Least privilege access | Contributor at subscription scope is usually excessive |
| Private endpoint/network controls | Restrict public exposure | Sensitive data or enterprise network requirements | DNS and routing must be configured correctly |
Data and prompt security checklist
- Classify data before sending it to model, search, logging, or evaluation systems.
- Use least privilege for app identity to Search, Storage, Key Vault, and AI resources.
- Apply user-level or tenant-level filters before retrieval.
- Remove or mask sensitive data in logs and traces.
- Do not put secrets in prompts, tool schemas, system messages, or source documents.
- Validate model output before database writes, API calls, or user-visible actions.
- Use human approval for high-impact operations.
- Treat prompt injection as an application security issue, not just a prompt wording issue.
Prompt injection defenses
| Attack pattern | Defense |
|---|
| Retrieved document says “ignore previous instructions” | Delimit retrieved content and state that it is untrusted data |
| User asks for hidden system prompt | Refuse disclosure and avoid placing secrets in prompts |
| User asks agent to call unauthorized tool | Check authorization in code before tool execution |
| Malicious source includes fake citation | Generate citations from metadata, not from document text alone |
| Tool output contains instructions | Sanitize and summarize tool output before returning it to the model |
Evaluation and responsible AI
Quality and safety evaluation matrix
| Evaluation target | What to measure | Practical method |
|---|
| Groundedness | Response is supported by retrieved context | Compare answer claims to source chunks |
| Relevance | Response answers the user’s question | Use labeled test prompts or evaluator model |
| Retrieval quality | Right chunks appear in top results | Inspect recall/precision by query set |
| Citation quality | Citations point to correct sources | Validate source IDs/pages against answer claims |
| Coherence | Response is clear and logically structured | Human review or automated scoring |
| Safety | Harmful, disallowed, or policy-violating content | Content Safety checks and adversarial tests |
| Robustness | Handles ambiguous, malicious, or edge-case prompts | Red-team prompt set |
| Latency | Meets user experience needs | Trace model, retrieval, and tool durations |
| Cost/token use | Fits budget and throughput goals | Track prompt size, context size, completion size |
Responsible AI controls
| Control | Use for | Notes |
|---|
| Content filters | Model input/output safety enforcement | Built into Azure OpenAI flows depending on configuration |
| Azure AI Content Safety | Moderation and harm detection across app content | Useful for custom moderation workflows |
| Grounding checks | Detect unsupported claims | Important for enterprise Q&A |
| Human review | Escalation and high-impact decisions | Especially for sensitive or irreversible actions |
| Abuse monitoring | Detect misuse patterns | Combine telemetry, rate limits, and policy |
| Feedback capture | Improve prompts, retrieval, and tools | Keep feedback privacy-aware |
Deployment and operations
Production readiness checklist
| Area | Check |
|---|
| App architecture | Separate client, orchestration/API layer, model calls, retrieval, and tools |
| Identity | Use managed identity or Entra ID where possible |
| Secrets | Store keys in Key Vault; rotate and audit access |
| Retrieval | Test index freshness, metadata filters, and citation accuracy |
| Prompting | Version prompts and evaluate before release |
| Tools | Validate arguments, authorize actions, handle retries and timeouts |
| Safety | Run input/output moderation and policy checks |
| Observability | Trace model calls, retrieval, tool calls, failures, latency, and token use |
| Reliability | Implement retries with backoff for transient errors |
| Privacy | Redact or avoid sensitive prompt/response logging |
| Evaluation | Maintain regression set for quality and safety |
| Rollback | Keep known-good prompt/model/config versions |
Troubleshooting quick table
| Symptom/error | Common cause | Response |
|---|
| 401 Unauthorized | Bad credential, expired token, wrong auth method | Check identity, key, token acquisition, and SDK config |
| 403 Forbidden | Identity lacks role or network access blocked | Verify RBAC/data-plane roles, private endpoint, firewall |
| 404 deployment/resource not found | Wrong endpoint, resource, deployment name, or region | Confirm endpoint and Azure deployment name |
| 429 throttling | Too much concurrency or request volume | Retry with exponential backoff, queue, reduce parallelism |
| 5xx/transient errors | Service or network transient issue | Retry safely, add circuit breaker, monitor status |
| JSON parse failure | Model did not follow output format | Use schema/structured output, lower temperature, validate and retry |
| Tool loop | Agent keeps requesting tools | Limit iterations, improve instructions, return clearer tool errors |
| Hallucinated answer | Weak grounding or missing context | Improve retrieval, require “insufficient information” behavior |
| High token use | Long history, excessive context, verbose tools | Summarize history, reduce chunks, compress tool output |
| Slow response | Retrieval/tool/model latency | Trace each step, stream output, cache safe results |
Common AI-103 exam traps
| Trap | Correct exam mindset |
|---|
| “Use fine-tuning for private knowledge” | Use RAG for changing or source-grounded private data; fine-tune for behavior/style/task examples |
| “The LLM securely enforces permissions” | Your app must enforce identity, authorization, filters, and tool permissions |
| “Prompt instructions are security controls” | Prompts help behavior but are not sufficient security boundaries |
| “Vector search is always better than keyword search” | Hybrid search often performs better for enterprise content |
| “Semantic ranking controls access” | It ranks results; it does not authorize users |
| “Agent equals workflow” | Agents choose steps dynamically; workflows execute defined logic |
| “Tool schemas guarantee safe execution” | Validate, authorize, sanitize, and log in application code |
| “Content filters replace app policy” | Filters are one layer; add business rules, review, and monitoring |
| “More retrieved chunks always improve answers” | Too much context can add noise, cost, and latency |
| “Conversation history can grow forever” | Summarize, truncate, or selectively retain context |
| “Logging everything helps debugging” | AI logs may contain sensitive data; design privacy-aware telemetry |
| “Model name and deployment name are interchangeable” | Azure app calls commonly use the deployment name configured in Azure |
Rapid review checklist
Before practice, make sure you can explain:
- When to use Azure AI Foundry, Azure OpenAI, Azure AI Search, Azure AI services, and Azure AI Content Safety.
- The difference between direct prompting, RAG, tool calling, and agents.
- How embeddings, chunking, metadata, filters, and hybrid search affect RAG quality.
- Why managed identity, RBAC, Key Vault, private networking, and data filtering matter.
- How to evaluate groundedness, relevance, safety, retrieval quality, and latency.
- How to troubleshoot auth errors, deployment-name issues, throttling, poor retrieval, hallucinations, and tool loops.
- Why prompt injection requires application-level defenses.
Next step for practice
Use this Quick Reference as a checklist while completing hands-on Azure AI Foundry, Azure OpenAI, Azure AI Search, and agent labs. Then move into timed AI-103-style practice questions that force you to choose the best service, pattern, security control, and troubleshooting action for each scenario.