AI-200 — Microsoft Azure AI Cloud Developer Associate Quick Reference
Compact AI-200 quick reference for Azure AI service selection, RAG patterns, security, deployment, and troubleshooting.
How to Use This Quick Reference
This independent Quick Reference is for candidates preparing for the Microsoft Azure AI Cloud Developer Associate (AI-200) exam. Use it to review decision points, service selection, implementation patterns, and common traps for Azure AI development scenarios.
Focus less on memorizing product names and more on answering: Which Azure AI service fits the requirement, how is it secured, how is it deployed, and how do you evaluate and troubleshoot it?
High-Yield Exam Map
| Area | What to know for AI-200-style scenarios |
|---|---|
| Azure AI service selection | Choose between Azure OpenAI, Azure AI Search, Azure AI Document Intelligence, Azure AI Language, Azure AI Vision, Azure AI Speech, Translator, Content Safety, Azure Machine Learning, and app hosting services. |
| Generative AI development | Chat completions, embeddings, model deployments, prompt structure, tools/function calling, token management, response grounding, and evaluation. |
| Retrieval-augmented generation | Chunking, embeddings, vector indexes, hybrid search, semantic ranking, citations, access filtering, freshness, and hallucination reduction. |
| Knowledge mining | Azure AI Search indexes, indexers, data sources, skillsets, enrichment pipelines, custom skills, and semantic/vector search. |
| Natural language, speech, vision, documents | Select prebuilt vs custom models; distinguish OCR, form extraction, image analysis, transcription, translation, classification, and entity extraction. |
| Security and governance | Microsoft Entra ID, managed identities, keys, Key Vault, RBAC, private networking, content filters, responsible AI controls, logging, and data protection. |
| Deployment and operations | App Service, Azure Functions, Container Apps, AKS, API Management, queues, monitoring, retries, throttling, testing, and CI/CD. |
Azure AI Service Selection Matrix
| Requirement | Prefer | Why | Watch for |
|---|---|---|---|
| Build a chat, summarization, reasoning, or code-assist feature | Azure OpenAI Service or model deployments through Azure AI development tooling | Managed access to large language models with Azure security and deployment controls | The model name is not enough; apps call a deployment. Region and model availability matter. |
| Build, test, evaluate, and manage generative AI apps | Azure AI Foundry tooling | Project-based development, prompt workflows, evaluations, deployments, and model catalog workflows | Do not confuse design-time project tooling with the runtime app architecture. |
| Ground an LLM on enterprise documents | Azure AI Search + embeddings + Azure OpenAI | Supports keyword, vector, hybrid, semantic ranking, metadata filters, and citations | Retrieval does not guarantee correctness; still evaluate groundedness and safety. |
| Search structured and unstructured enterprise content | Azure AI Search | Indexes documents, supports filters, scoring, semantic ranking, vector search, and enrichment | Index schema, analyzer choice, vector dimensions, and metadata fields are exam-relevant. |
| Extract fields from invoices, receipts, IDs, tax forms, or custom forms | Azure AI Document Intelligence | Prebuilt and custom document extraction models | Use Document Intelligence for field extraction, not generic OCR-only scenarios. |
| OCR text from images or simple documents | Azure AI Vision Read/OCR or Document Intelligence | Vision handles image OCR; Document Intelligence handles document-centric extraction | If the scenario needs key-value pairs/tables/forms, choose Document Intelligence. |
| Analyze images for captions, tags, objects, or visual features | Azure AI Vision | Prebuilt image analysis capabilities | Do not choose Document Intelligence for general image tagging. |
| Classify images with custom labels | Custom Vision / custom image model workflow | Train image classification or object detection from labeled images | Use only when prebuilt Vision features are insufficient. |
| Detect language, sentiment, key phrases, entities, or PII | Azure AI Language | Prebuilt NLP APIs | Use custom Language models when domain-specific labels, intents, or entities are needed. |
| Build intent recognition for a chatbot | Conversational Language Understanding | Maps user utterances to intents and entities | CLU identifies intent; it does not automatically complete business workflows. |
| Create FAQ-style question answering over curated content | Custom question answering | Best for controlled knowledge bases and FAQ-style responses | For broad document retrieval plus generation, prefer RAG with Azure AI Search and an LLM. |
| Translate text between languages | Translator | Purpose-built machine translation | Do not use speech translation unless audio is involved. |
| Transcribe or synthesize speech | Azure AI Speech | Speech-to-text, text-to-speech, speech translation, custom speech scenarios | Batch vs real-time and custom model requirements are common decision points. |
| Detect harmful, unsafe, or policy-violating content | Azure AI Content Safety plus model content filters | Safety classification for text/images and layered protection for generative AI apps | Safety filters reduce risk; they are not a substitute for app authorization or validation. |
| Train, register, deploy, and monitor custom ML models | Azure Machine Learning | Full ML lifecycle for custom models, pipelines, endpoints, and MLOps | Do not choose Azure ML when a prebuilt Azure AI service satisfies the requirement. |
| Expose AI functionality through an API | App Service, Azure Functions, Container Apps, AKS + API Management | Hosts app logic and protects/standardizes APIs | The AI service is not usually the entire application boundary. |
| Trigger AI processing from uploaded files/events | Event Grid, Service Bus, Storage Queue, Azure Functions | Event-driven ingestion and asynchronous processing | Use queues for buffering, retries, and decoupling long-running AI tasks. |
Core Azure AI Terms
| Term | Meaning | Exam distinction |
|---|---|---|
| Azure AI services resource | Azure resource used to access one or more Azure AI services (formerly Cognitive Services) APIs | Multi-service resources simplify management but not every scenario uses one shared endpoint. |
| Azure OpenAI resource | Azure resource for deploying and calling OpenAI models through Azure | You deploy a model before an app can call it. |
| Model | The base AI capability, such as a chat model or embedding model | Model availability is not the same as deployment availability. |
| Deployment | Named runtime instance of a model in an Azure OpenAI resource | In many SDK calls, the model parameter is the deployment name. |
| Endpoint | Network address used by applications to call a service | May be public, restricted by firewall, or private through Private Link. |
| Key | Shared secret for API access | Simpler but weaker operational model than managed identity. Store in Key Vault when used. |
| Managed identity | Microsoft Entra identity assigned to an Azure workload | Preferred for Azure-hosted apps calling Azure services without secrets. |
| RBAC | Role-based access control through Microsoft Entra ID | Separate management-plane permissions from data-plane permissions. |
| Index | Searchable structure in Azure AI Search | Requires a schema, fields, analyzers, and optionally vector fields. |
| Indexer | Crawler that loads data from a supported source into an index | Runs on schedule or demand; does not run at query time. |
| Skillset | Enrichment pipeline for Azure AI Search indexing | Applies OCR, extraction, translation, custom skills, or projections during indexing. |
| Embedding | Numeric vector representation of text/images | Query and document embeddings must be generated with compatible models and dimensions. |
| Chunk | Segment of a document indexed for retrieval | Bad chunking causes weak grounding even with a strong model. |
| Semantic ranking | Language-aware ranking layer in Azure AI Search | Often combined with keyword/vector retrieval for better relevance. |
| Content filter | Safety control applied to model inputs/outputs | Not an authorization system and not a full business policy engine. |
Generative AI Development Reference
Chat Completion Anatomy
| Component | Purpose | Common trap |
|---|---|---|
| System instruction | Sets assistant behavior, constraints, tone, and task rules | It is not a security boundary. Always validate inputs, outputs, and tool calls. |
| User message | End-user request | User content may contain prompt injection attempts. |
| Assistant message | Prior model response | Long histories consume tokens and may preserve bad context. |
| Tool/function definition | Describes callable app functions | The model suggests calls; your code authorizes and executes them. |
| Retrieved context | External data inserted into the prompt | Must be relevant, access-controlled, and cited when required. |
| Response format | Controls structured output, such as JSON | Validate schema after generation; do not assume perfect formatting. |
| Temperature/top-p | Controls randomness | Lower values usually suit extraction, classification, and deterministic business tasks. |
| Max tokens | Caps response length | Too low truncates answers; too high can increase latency and cost. |
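The "Assistant message" and "Max tokens" rows both come down to token budgeting. The following is a minimal sketch of trimming chat history to a budget. It assumes a crude estimate of about four characters per token (an assumption for illustration, not a real tokenizer), and `estimate_tokens` and `trim_history` are hypothetical helper names.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate; assumes ~4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages: list, budget: int) -> list:
    """Keep system messages plus the newest turns that fit within the token budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards so the most recent turns are considered first.
    for message in reversed(turns):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    # Restore chronological order before sending to the model.
    return system + list(reversed(kept))
```

Dropping the oldest turns is only one option; summarizing older turns is another, at the cost of an extra model call.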
Azure OpenAI SDK Pattern
Use Microsoft Entra ID or managed identity where possible for production workloads. API keys are common in simple examples; if you use one, keep it in Key Vault rather than in code or configuration files.

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Obtain Microsoft Entra bearer tokens for the workload identity (no API key).
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

client = AzureOpenAI(
    azure_endpoint="https://<resource-name>.openai.azure.com/",
    azure_ad_token_provider=token_provider,
    api_version="<api-version>",
)

response = client.chat.completions.create(
    model="<deployment-name>",  # Azure deployment name, not the base model name
    messages=[
        {"role": "system", "content": "Answer using the provided policy excerpt only."},
        {"role": "user", "content": "What is the refund window?"},
    ],
    temperature=0.2,  # low randomness for a factual lookup
)
print(response.choices[0].message.content)
```
High-yield detail: in Azure OpenAI calls, model="<deployment-name>" takes the name you gave the deployment in your Azure OpenAI resource, which can differ from the base model name.
Model Choice Decision Points
| Need | Prefer | Notes |
|---|---|---|
| General chat, summarization, reasoning, extraction | Chat/completion model | Choose based on required quality, latency, cost, context length, and region availability. |
| Enterprise RAG | Chat model + embedding model + Azure AI Search | The chat model generates the answer; the embedding model produces the vectors that Azure AI Search uses for retrieval. |
| Similarity search | Embedding model | Store embeddings in a vector index; do not ask embeddings to generate answers. |
| Deterministic classification/extraction | Lower temperature, schema validation, possibly Azure AI Language or Document Intelligence | For standard NLP/document tasks, prebuilt services may be more reliable and simpler. |
| Multimodal reasoning | Model/service that supports the required input type | Verify whether the scenario needs image, text, audio, or document-native processing. |
| High-volume automation | Smaller/faster model where acceptable, caching, batching, queueing | Avoid using the largest model by default. |
| Regulated or sensitive workflow | Private networking, managed identity, logging strategy, human review, content safety | Security and governance often decide the architecture. |
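To make the "Similarity search" row concrete, here is a small sketch of how embeddings are compared. It uses toy two-dimensional vectors in place of real embeddings, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names. In a real app the vector index performs this comparison for you.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two vectors; both must have the same dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list, documents: dict) -> list:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The dimension check mirrors the exam point that query and document embeddings must come from compatible models: vectors of different lengths cannot be compared at all.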
Retrieval-Augmented Generation Reference
RAG Flow
```mermaid
flowchart LR
    A[Source documents] --> B[Extract text and metadata]
    B --> C[Chunk documents]
    C --> D[Generate embeddings]
    D --> E[Index in Azure AI Search]
    U[User question] --> V[Embed query]
    V --> W[Vector / hybrid retrieval]
    W --> X[Rank, filter, and trim]
    X --> Y[Prompt with retrieved context]
    Y --> Z[Generate grounded answer with citations]
    Z --> Q[Evaluate, log, and monitor]
```
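The "Chunk documents" step in the flow can be sketched as a sliding window with overlap. This is a simplified illustration that splits on character counts; production pipelines often split on sentences, headings, or tokens instead. `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` yields windows that each repeat the last two characters of the previous one, so a concept that straddles a boundary still appears whole in at least one chunk.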
RAG Design Matrix
| Design choice | Use when | Exam cues | Traps |
|---|---|---|---|
| Keyword search | Exact terms, IDs, names, codes, or structured phrases matter | “Find documents containing…” | Poor semantic recall for paraphrased questions. |
| Vector search | Users ask semantically similar but differently worded questions | “Natural language questions over documents” | Vector dimensions must match the embedding model. |
| Hybrid search | Need both exact matching and semantic recall | “Best relevance over enterprise content” | Requires tuning scoring, filters, and ranking. |
| Semantic ranking | Need improved natural-language relevance and captions/answers | “Improve result quality without retraining” | It ranks retrieved candidates; it does not replace indexing. |
| Metadata filtering | Need access control, departments, dates, regions, document types | “Only show documents user can access” | Filter fields must exist and be populated in the index. |
| Security trimming | Results must respect user permissions | “User-specific document access” | Do not rely on the LLM to hide unauthorized text after retrieval. |
| Chunk overlap | Concepts span boundaries between chunks | “Answers miss context at page breaks” | Too much overlap increases index size and duplicate retrieval. |
| Citations | Users need traceability | “Answer with sources” | Citations require source metadata captured during ingestion. |
| Freshness | Data changes often | “New documents must appear quickly” | Scheduled indexers may not meet near-real-time needs; consider push/event ingestion. |
| Human review | High-impact or risky outputs | “Approval required before action” | Content filters alone may be insufficient. |
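Hybrid search has to merge a keyword ranking and a vector ranking into one list. Azure AI Search does this with Reciprocal Rank Fusion (RRF). The sketch below illustrates the idea only; the service performs the fusion itself, and the function name and the constant `k = 60` are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs into one list of (doc_id, score) pairs."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents found by both lists add up.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A document that appears near the top of both lists outranks one that appears in only a single list, which is why hybrid search helps when exact terms and paraphrases both matter.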
Minimal Search Index Fields for RAG
| Field | Purpose | Search configuration |
|---|---|---|
| id | Stable unique key | Key field |
| content | Chunk text passed to the model | Searchable |
| contentVector | Embedding for vector search | Vector field with matching dimensions |
| title | Human-friendly source label | Searchable/filterable as needed |
| sourceUri | Citation link or storage reference | Retrievable |
| pageNumber / section | Citation precision | Filterable/retrievable |
| lastModified | Freshness filtering/sorting | Filterable/sortable |
| acl / groups | Security trimming | Filterable |
| documentType | Filter by policy, manual, contract, etc. | Filterable/facetable |
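Security trimming on the acl / groups field is applied as a query filter, before any text reaches the model. The sketch below builds an OData filter string using the `search.in` function. It assumes a filterable string-collection field named `groups` holding group identifiers; the field name and the `build_security_filter` helper are hypothetical.

```python
def build_security_filter(user_groups: list, field: str = "groups") -> str:
    """Build an OData filter matching documents shared with any of the user's groups."""
    if not user_groups:
        # No memberships: deny in app code instead of sending an unfiltered query.
        raise ValueError("user has no groups; deny access instead of querying")
    cleaned = []
    for group in user_groups:
        if "," in group:
            raise ValueError(f"group id must not contain the delimiter: {group!r}")
        # OData string literals escape a single quote by doubling it.
        cleaned.append(group.replace("'", "''"))
    joined = ",".join(cleaned)
    return f"{field}/any(g: search.in(g, '{joined}', ','))"
```

The resulting string can be combined with other conditions using `and`, for example with a `documentType` filter, and passed as the `filter` argument of a search call.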
Vector/Hybrid Search SDK Shape
```python
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="<index-name>",
    credential=DefaultAzureCredential(),
)

# query_embedding is assumed to be the embedding of the user's question, created
# beforehand with the same embedding model used when the documents were indexed.
vector_query = VectorizedQuery(
    vector=query_embedding,
    k_nearest_neighbors=5,
    fields="contentVector",
)

# Supplying both search_text and vector_queries makes this a hybrid query.
results = search_client.search(
    search_text="refund policy for annual subscriptions",
    vector_queries=[vector_query],
    filter="documentType eq 'policy'",
    select=["title", "content", "sourceUri", "pageNumber"],
    top=5,
)
```
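Use this pattern to remember the separation of concerns: the query embedding is created before the search call, vector retrieval and metadata filtering happen inside it, and prompt construction happens afterwards in your application code.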
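Prompt construction, the step that follows retrieval, can be sketched as follows. The function takes plain dictionaries carrying the same fields as the select list in the search call (title, content, sourceUri, pageNumber) and numbers them so the model can cite them. `build_grounded_messages` and the instruction wording are hypothetical, not a prescribed template.

```python
def build_grounded_messages(question: str, results: list, max_chunks: int = 3) -> list:
    """Turn retrieved chunks into chat messages with numbered, citable sources."""
    sources = []
    for number, doc in enumerate(results[:max_chunks], start=1):
        header = f"[{number}] {doc['title']} (page {doc['pageNumber']}, {doc['sourceUri']})"
        sources.append(f"{header}\n{doc['content']}")
    system = (
        "Answer only from the numbered sources. Cite sources as [n]. "
        "If the sources do not contain the answer, say you do not know."
    )
    # Retrieved text goes in the user message as data, separate from the instructions.
    user = "Sources:\n" + "\n\n".join(sources) + f"\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

Capping the number of chunks keeps the prompt within the token budget, and carrying `sourceUri` and `pageNumber` into the prompt is what makes citations checkable later.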
Azure AI Search and Knowledge Mining
| Component | Role | When to use | Common issue |
|---|---|---|---|
| Data source | Connection to supported content store | Pull data from Azure Storage, databases, or other supported sources | Permissions and private networking can block indexers. |
| Indexer | Moves data into an index | Scheduled or on-demand indexing | It does not continuously reflect changes unless scheduled or triggered. |
| Skillset | Enriches content during indexing | OCR, entity extraction, key phrases, translation, custom enrichment | Skills run at ingestion time, not at query time. |
| Custom Web API skill | Calls your custom enrichment logic | Domain-specific extraction, normalization, classification | Must handle scaling, failures, and expected schema. |
| Index projection | Maps enriched content into target index structures | Parent-child or chunked indexing patterns | Incorrect mapping leads to missing fields. |
| Analyzer | Tokenization and text processing | Language-specific search behavior, stemming, tokenization | Analyzer choice affects matching and cannot always be casually changed later. |
| Synonym map | Expands equivalent terms | Industry acronyms, product aliases | Synonyms help keyword search but do not replace semantic/vector search. |
| Semantic configuration | Defines prioritized fields for semantic ranking | Better captions/reranking for natural-language queries | Needs meaningful title/content fields. |
| Vector profile/configuration | Enables vector search | Embedding-based retrieval | Embedding dimensions and vector field config must align. |
Knowledge Mining vs RAG
| Scenario | Better answer |
|---|---|
| “Extract entities and key phrases from documents into a searchable index” | Azure AI Search skillset with enrichment |
| “Ask natural-language questions and generate answers from indexed documents” | RAG using Azure AI Search plus Azure OpenAI |
| “Search documents with filters, facets, and relevance scoring” | Azure AI Search |
| “Summarize search results into a conversational response” | Azure AI Search retrieval followed by generative model response |
| “Apply OCR before indexing scanned PDFs” | Azure AI Search skillset with OCR, or Document Intelligence depending on extraction needs |
Language, Speech, Vision, and Document Services
Azure AI Language
| Requirement | Choose | Notes |
|---|---|---|
| Sentiment and opinion mining | Sentiment analysis | Identifies positive/negative/neutral sentiment and opinions where supported. |
| Extract names, places, organizations, dates | Named entity recognition | Use custom NER for domain-specific entities. |
| Detect sensitive personal data | PII detection | Combine with app policy for redaction, storage, and auditing. |
| Extract important terms | Key phrase extraction | Useful for tagging and indexing. |
| Identify language | Language detection | Often used before translation or language-specific processing. |
| Classify text into custom categories | Custom text classification | Requires labeled examples and training/evaluation. |
| Extract domain-specific entities | Custom named entity recognition | Use when prebuilt NER misses business-specific labels. |
| Detect user intent and entities in conversations | Conversational Language Understanding | Good for bot commands and routing. |
| FAQ-style answers from curated sources | Custom question answering | Best for controlled knowledge base scenarios. |
Azure AI Speech and Translator
| Requirement | Choose | Key distinction |
|---|---|---|
| Convert microphone or audio files to text | Speech-to-text | Real-time vs batch transcription matters. |
| Convert text to spoken audio | Text-to-speech | Voice, language, and style requirements drive selection. |
| Translate text | Translator | Text input/output. |
| Translate spoken audio | Speech translation | Audio input with translation output. |
| Improve recognition for domain vocabulary | Custom Speech | Use when baseline transcription struggles with accents, terms, or environment. |
| Build voice-enabled app | Speech SDK + app host | The SDK handles audio interaction; your app handles business logic. |
Azure AI Vision and Document Intelligence
| Requirement | Choose | Avoid this mistake |
|---|---|---|
| Read text from an image | Azure AI Vision OCR/Read | Do not build a custom model for basic OCR. |
| Extract fields from forms | Azure AI Document Intelligence | OCR alone does not produce structured fields reliably. |
| Extract tables from documents | Document Intelligence | Tables require document-aware layout extraction. |
| Use prebuilt invoice/receipt/ID extraction | Document Intelligence prebuilt model | Do not train custom if a prebuilt model satisfies the form type. |
| Extract from a custom business form | Document Intelligence custom model | Needs representative labeled samples and evaluation. |
| Classify document types before extraction | Document classifier / routing pattern | Route to the right extraction model. |
| Generate image tags/captions | Azure AI Vision image analysis | Document Intelligence is document-centric, not image-scene analysis. |
| Detect custom objects in images | Custom Vision or custom vision model workflow | Requires labeled images and model training. |
Prompting, Tools, and Agentic Patterns
| Pattern | Use when | Implementation reminder |
|---|---|---|
| Direct prompt | Simple transformation, drafting, summarization, or classification | Keep instructions explicit and constrain output format. |
| Few-shot prompt | Need consistent style or labels | Include representative examples; avoid excessive token use. |
| RAG prompt | Need answers grounded in private/current data | Retrieve first, then generate using context and citation instructions. |
| Tool/function calling | The model needs live data or actions | Validate arguments, authorize user, execute tool in app code, then return result to model. |
| Planner/agent loop | Multi-step tasks with tool use | Add iteration limits, logging, timeout, safety checks, and human approval for risky actions. |
| Structured output | Downstream system expects JSON or schema | Validate and retry/repair; never blindly trust generated JSON. |
| Prompt template | Reusable prompt with variables | Treat retrieved/user content as data, not instructions. |
| Guardrail prompt | Reduce unsafe behavior | Helpful but insufficient without content filters, authorization, and validation. |
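The "Structured output" row says to validate and retry rather than trust generated JSON. A minimal sketch of that loop, using only the standard library: `generate` is a hypothetical callable that returns the model's raw text, and the required-field check stands in for a fuller schema validator.

```python
import json


def parse_model_json(raw: str, required: dict) -> dict:
    """Parse model output as a JSON object and check required keys and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"field {key} should be {expected_type.__name__}")
    return data


def generate_validated(generate, required: dict, attempts: int = 3) -> dict:
    """Call generate() until its output validates, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_model_json(generate(), required)
        except ValueError as exc:
            last_error = exc  # a real app might feed this error back into the prompt
    raise ValueError(f"no valid output after {attempts} attempts: {last_error}")
```

Bounding the number of attempts matters: an unbounded repair loop turns a formatting problem into a cost and latency problem.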
Tool Calling Control Points
| Step | Control |
|---|---|
| Tool definition | Expose only required functions and arguments. |
| User request | Authenticate user and check authorization before tool execution. |
| Model-proposed tool call | Validate name, arguments, types, ranges, and policy constraints. |
| Tool execution | Use least-privilege identity and handle timeouts/retries. |
| Tool result | Remove secrets and unnecessary data before sending back to the model. |
| Final response | Check safety, correctness, citation, and formatting requirements. |
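The control points in the table can be sketched as a single gate that runs before any tool executes. The tool registry, role names, and refund limit below are hypothetical; the sketch assumes tool-call arguments arrive as a JSON string, as they do in chat completion tool calls.

```python
import json

# Hypothetical registry: tool name -> expected argument types and the role allowed to call it.
ALLOWED_TOOLS = {
    "get_order_status": {"args": {"order_id": str}, "role": "customer"},
    "issue_refund": {"args": {"order_id": str, "amount": float}, "role": "agent"},
}
MAX_REFUND = 100.0  # hypothetical policy limit


def validate_tool_call(name: str, arguments_json: str, user_roles: set) -> dict:
    """Return parsed arguments if the proposed call passes every check, else raise."""
    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        raise PermissionError(f"tool not allowlisted: {name}")
    if spec["role"] not in user_roles:
        raise PermissionError(f"user may not call {name}")
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        raise ValueError("arguments are not valid JSON") from exc
    if not isinstance(args, dict) or set(args) != set(spec["args"]):
        raise ValueError("unexpected or missing arguments")
    for key, expected_type in spec["args"].items():
        if not isinstance(args[key], expected_type):
            raise ValueError(f"argument {key} has the wrong type")
    # Policy constraint enforced in code, not in the prompt.
    if name == "issue_refund" and not 0 < args["amount"] <= MAX_REFUND:
        raise ValueError("refund amount outside policy limit")
    return args
```

Authorization is checked against the authenticated user's roles, never against anything the model says about the user.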
Security, Identity, and Governance
Authentication and Authorization Choices
| Option | Best use | Exam distinction |
|---|---|---|
| API key | Simple local testing or services that require key-based access | Store in Key Vault; rotate; avoid embedding in code or client apps. |
| Microsoft Entra ID | Enterprise authentication and RBAC | Preferred for production when supported. |
| Managed identity | Azure-hosted app calling Azure services | Avoids secrets; assign least-privilege roles. |
| Service principal | CI/CD or non-Azure workload automation | Protect credentials; scope permissions tightly. |
| SAS token | Limited delegated access to storage objects | Time-bound and permission-scoped; not an identity replacement. |
| Key Vault | Secret, key, and certificate management | App identity needs permission to retrieve secrets. |
RBAC and Access Boundaries
| Boundary | What it controls | Common trap |
|---|---|---|
| Azure management plane | Create/update/delete resources | Contributor on a resource does not always grant data-plane read/write. |
| Data plane | Use the service endpoint, indexes, models, or documents | Requires service-specific roles or keys. |
| Search index access | Query or modify indexes/documents | Separate query access from index administration. |
| Storage access | Read/write source documents | Indexers and apps need appropriate storage permissions. |
| Model deployment access | Invoke deployed models | Users/apps may need data-plane permission even if they can view the resource. |
| Application authorization | Which user can perform business action | Do not delegate this decision to the model. |
Network Isolation Checklist
| Requirement | Control |
|---|---|
| Keep traffic off public internet where supported | Private Endpoint / Private Link |
| Restrict public access | Disable or limit public network access and configure firewalls |
| Allow specific Azure services or networks | Service firewall rules and network integration |
| Resolve private endpoints correctly | Private DNS zone configuration |
| Secure app-to-service calls | Managed identity plus private endpoint where supported |
| Protect inbound app APIs | API Management, authentication, authorization, WAF where appropriate |
| Log security-relevant events | Azure Monitor, diagnostics, app logs, and audit trails |
Responsible AI and Safety Controls
| Concern | Practical control |
|---|---|
| Harmful content | Azure AI Content Safety, model content filters, blocked categories, review workflow |
| Hallucination | RAG grounding, citations, retrieval evaluation, refusal behavior when context is insufficient |
| Prompt injection | Treat retrieved/user text as untrusted, separate instructions from data, validate tool calls |
| Data leakage | Access filtering before retrieval, output redaction, least privilege, private networking |
| Bias or unfairness | Representative test sets, human review, metric tracking, documented limitations |
| Overreliance | Confidence indicators, citations, escalation paths, user education |
| Unsafe actions | Human-in-the-loop approval, allowlisted tools, transaction limits, audit logs |
| Privacy | Minimize collected data, redact where appropriate, control logs and retention |
Deployment Architecture Patterns
| Pattern | Use when | Azure services commonly involved |
|---|---|---|
| Synchronous AI API | User waits for response | App Service / Container Apps / AKS, Azure OpenAI, Azure AI Search, API Management |
| Event-driven document ingestion | Files arrive asynchronously | Blob Storage, Event Grid, Azure Functions, Azure AI Search, Document Intelligence |
| Long-running batch processing | Large document sets or audio transcription | Queue/Service Bus, Functions/Container Apps, durable workflow pattern, Storage |
| Chat application with RAG | Conversational answers over enterprise data | Web app, Azure OpenAI, Azure AI Search, storage, identity provider |
| Bot interface | Teams/web chat integration | Azure Bot Service, CLU/Language, Azure OpenAI, backend APIs |
| Custom ML endpoint | Model trained outside prebuilt AI services | Azure Machine Learning endpoint, app host, monitoring |
| Enterprise API facade | Standardized access to AI backend | API Management, managed identity, rate limiting, logging, backend services |
| Private enterprise deployment | Sensitive data and restricted access | Private Endpoints, VNet integration, managed identity, Key Vault, diagnostics |
Hosting Service Selection
| Need | Prefer | Notes |
|---|---|---|
| Simple web API or web app | Azure App Service | Good default for managed hosting. |
| Lightweight event handler | Azure Functions | Good for triggers, ingestion, and glue logic. |
| Containerized microservice without Kubernetes overhead | Azure Container Apps | Good for scalable container workloads and background workers. |
| Full Kubernetes control | AKS | Use when orchestration requirements justify complexity. |
| Workflow orchestration with connectors | Logic Apps | Good for integration-heavy business workflows. |
| Durable stateful orchestration | Durable Functions pattern | Good for fan-out/fan-in and long-running workflows. |
Monitoring, Evaluation, and Optimization
What to Log and Monitor
| Signal | Why it matters |
|---|---|
| Request count, latency, failure rate | Basic reliability and user experience |
| HTTP status codes | Diagnose auth, throttling, quota, and service errors |
| Token usage | Cost, latency, and prompt optimization |
| Model deployment used | Compare quality, regressions, and routing decisions |
| Prompt/template version | Reproduce failures and evaluate changes |
| Retrieval query and document IDs | Debug grounding and citation issues |
| Safety filter outcomes | Monitor blocked content and false positives/negatives |
| Tool calls and results | Audit actions and diagnose agent behavior |
| User feedback | Build evaluation datasets and prioritize fixes |
| Indexer status | Detect ingestion failures and stale search data |
Evaluation Metrics by Scenario
| Scenario | Evaluate |
|---|---|
| RAG answer generation | Groundedness, relevance, citation correctness, answer completeness, refusal when context is insufficient |
| Search retrieval | Precision, recall, top-k relevance, filter correctness, freshness |
| Classification | Accuracy, precision/recall, confusion matrix, threshold behavior |
| Extraction | Field-level accuracy, missing fields, table accuracy, format validity |
| Chat assistant | Task success, safety, latency, escalation rate, user satisfaction |
| Speech transcription | Word error rate (WER) and error patterns, domain vocabulary recognition, speaker/audio conditions |
| Document extraction | Model confidence, field accuracy, page/layout handling, exception routing |
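For the "Search retrieval" row, precision and recall at k can be computed from a labeled evaluation set. A minimal sketch, assuming you have the ranked document IDs returned by a query and a hand-labeled set of relevant IDs; `precision_recall_at_k` is a hypothetical helper name.

```python
def precision_recall_at_k(retrieved: list, relevant: set, k: int) -> tuple:
    """Compute (precision@k, recall@k) for one query."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Precision: share of returned results that are relevant.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of all relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these values over many queries, before and after a change to chunking or ranking, shows whether the change actually improved retrieval.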
Optimization Levers
| Problem | First levers to try |
|---|---|
| High latency | Smaller/faster model, shorter prompts, fewer retrieved chunks, caching, streaming responses, async processing |
| High cost | Token reduction, response length limits, model selection, cache repeated answers, batch offline tasks |
| Poor grounding | Improve chunking, add metadata, hybrid search, semantic ranking, better prompt constraints |
| Poor extraction | Use purpose-built service, add examples, validate schema, choose custom model when prebuilt fails |
| Frequent throttling | Backoff/retry, queue requests, smooth traffic, request capacity planning |
| Inconsistent output | Lower temperature, structured output, validation/retry, clearer instructions |
| Stale answers | More frequent indexing, event-driven ingestion, freshness filters |
Troubleshooting Reference
| Symptom | Likely cause | Check |
|---|---|---|
| 401 Unauthorized | Missing/invalid credential | Endpoint, key/token, managed identity configuration |
| 403 Forbidden | Authenticated but not authorized | RBAC role, data-plane permission, storage/search permissions |
| 404 deployment not found | Wrong Azure OpenAI deployment name or endpoint | Resource endpoint, deployment name, region, API version |
| 429 throttling | Too many requests or capacity pressure | Retry-after handling, exponential backoff, queueing, traffic smoothing |
| 5xx service errors | Transient platform/backend issue | Retry with backoff, circuit breaker, monitor service health |
| Vector search returns no results | Wrong field, missing vectors, dimension mismatch, bad embeddings | Index schema, embedding model, vector field config, indexed documents |
| Answers hallucinate | Weak retrieval, prompt allows unsupported claims, no refusal rule | Retrieved chunks, citations, system instruction, evaluation set |
| Correct document not retrieved | Chunking/indexing/query mismatch | Chunk size, metadata filters, hybrid search, analyzers, synonyms |
| Citations are wrong | Source metadata not stored or chunk mapping incorrect | sourceUri, page/section fields, projection logic |
| Indexer fails | Data source permissions, unsupported file, skill error, mapping issue | Indexer execution history and skillset outputs |
| Private endpoint connection fails | DNS or network routing issue | Private DNS zone, VNet links, firewall, public access setting |
| Document fields missing | Prebuilt model mismatch or custom model undertrained | Document type, sample quality, confidence scores |
| CLU predicts wrong intent | Overlapping intents or weak utterance examples | Training data balance, labels, examples, thresholds |
| Speech transcription poor | Audio quality, noise, vocabulary, accent, wrong language | Audio preprocessing, custom speech, language config |
| Output JSON invalid | Model not constrained or schema too complex | Structured output, validation, repair/retry logic |
| Tool call unsafe or incorrect | Model over-selected tool or bad arguments | Tool allowlist, argument validation, user authorization |
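The 429 row calls for Retry-After handling and exponential backoff. The sketch below shows the control flow with a hypothetical `ThrottledError` standing in for a throttled HTTP response; the delays and attempt counts are illustrative defaults, and real SDK clients may already retry on your behalf.

```python
import random
import time
from typing import Optional


class ThrottledError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("throttled")
        self.retry_after = retry_after  # seconds from a Retry-After header, if present


def call_with_backoff(operation, max_attempts: int = 5, base_delay: float = 1.0,
                      max_delay: float = 30.0, sleep=time.sleep):
    """Run operation(), retrying on ThrottledError with Retry-After or jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as exc:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            if exc.retry_after is not None:
                delay = exc.retry_after  # the service told us how long to wait
            else:
                ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay = random.uniform(0, ceiling)  # jitter avoids synchronized retries
            sleep(delay)
```

The `sleep` parameter exists so tests can inject a fake; for sustained throttling, retries alone are not enough and queueing or capacity planning is the next lever.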
Common Exam Traps
- Choosing Azure Machine Learning when a prebuilt Azure AI service already solves the task.
- Choosing OCR when the requirement is structured form or table extraction; use Document Intelligence.
- Treating a system prompt as a security control. It is guidance, not enforcement.
- Letting the LLM decide whether a user is authorized to see retrieved content. Filter before retrieval or before prompt assembly.
- Forgetting that Azure OpenAI apps call a deployment name, not just a model family.
- Assuming embeddings generate answers. Embeddings support similarity search; a generative model writes the answer.
- Ignoring metadata fields needed for filters, citations, freshness, and access control.
- Using only vector search when exact IDs, codes, or names are important; consider hybrid search.
- Assuming indexers run continuously. Understand scheduled, on-demand, and event-driven ingestion patterns.
- Sending secrets, raw credentials, or excessive retrieved data into prompts.
- Confusing management-plane RBAC with data-plane permissions.
- Skipping retries and backoff for throttling and transient service failures.
- Choosing the largest model automatically instead of balancing quality, latency, and cost.
- Treating content filters as a full responsible AI program. They are one layer.
Scenario Answer Checklist
When you see an AI-200 scenario, identify:
- Input type: text, image, document, audio, structured data, or mixed.
- Task type: generate, retrieve, classify, extract, translate, transcribe, moderate, or train.
- Data source: static, frequently updated, private, user-specific, or public.
- Best service: prebuilt Azure AI service, Azure OpenAI, Azure AI Search, Azure ML, or a combination.
- Security model: managed identity, RBAC, Key Vault, private endpoint, access trimming.
- Runtime pattern: synchronous API, async queue, batch job, bot, web app, or containerized service.
- Quality controls: evaluation set, citations, validation, thresholds, human review.
- Operations: logging, monitoring, retries, throttling, cost/latency optimization.
- Responsible AI controls: content safety, privacy, fairness, transparency, escalation.
Practical Next Step
Use this Quick Reference to build a one-page service-selection map, then practice scenario questions where you must justify the Azure AI service, security controls, retrieval pattern, and troubleshooting step. Focus especially on RAG design, managed identity/RBAC, Azure AI Search indexing, and choosing prebuilt AI services over custom models when appropriate.