Try 50 free AI-103 questions across the exam domains, with explanations, then continue with full IT Mastery practice.
This free full-length AI-103 practice exam includes 50 original IT Mastery questions across the exam domains.
These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.
Count note: the question count on this page follows the full-length practice count maintained in the Mastery exam catalog. Certification vendors publish total questions, scored questions, duration, and unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.
Need concept review first? Read the AI-103 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks and full IT Mastery practice.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
| Domain | Weight |
|---|---|
| Plan and Manage an Azure AI Solution | 27% |
| Implement Generative AI and Agentic Solutions | 33% |
| Implement Computer Vision Solutions | 13% |
| Implement Text Analysis Solutions | 13% |
| Implement Information Extraction Solutions | 14% |
Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.
Topic: Plan and Manage an Azure AI Solution
A Python support agent is deployed by the same pipeline to staging and production. Each environment has its own Foundry project and model deployment, but both deployments use the same model family. The release gate must validate that production traffic uses the production project and intended deployment, and must provide evidence if a request is misrouted. What should you implement?
Options:
A. Emit traces with the Foundry project endpoint and deployment name per call.
B. Run a safety evaluation on production responses.
C. Monitor aggregate token usage by model family.
D. Compare generated answers with a fixed golden dataset.
Best answer: A
Explanation: The goal is to verify connection context, not only model quality or safety. Trace logging that records the actual Foundry project endpoint and deployment name for each request provides direct evidence that the application is using the intended production context.
For Foundry app deployments, the project endpoint and model deployment name are part of the runtime context that determines where requests are sent. In a CI/CD release gate, instrument the app or SDK calls to emit traces with the configured project endpoint, deployment name, environment, and correlation ID. A smoke test or telemetry query can then confirm that production requests resolve to the production Foundry project and deployment. Quality evaluations can still be useful, but they do not prove that the app connected to the correct project context.
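As a minimal sketch of this idea (the function and field names are illustrative, not an official Foundry trace schema), the app can log one routing record per model call, and the release gate can assert against the most recent records:

```python
import json
import logging

def emit_call_trace(project_endpoint, deployment_name, environment, correlation_id):
    # One routing record per model call; field names are illustrative.
    record = {
        "project_endpoint": project_endpoint,
        "deployment": deployment_name,
        "environment": environment,
        "correlation_id": correlation_id,
    }
    logging.getLogger("ai.trace").info(json.dumps(record))
    return record

def assert_production_routing(trace, expected_endpoint, expected_deployment):
    # Release-gate smoke check: fail loudly when a call is misrouted.
    if (trace["project_endpoint"] != expected_endpoint
            or trace["deployment"] != expected_deployment):
        raise AssertionError(f"Misrouted call {trace['correlation_id']}: {trace}")
    return True
```

The same records can feed a telemetry query after release, so misrouted requests leave evidence rather than passing silently.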
Topic: Implement Computer Vision Solutions
A Foundry project hosts a support agent that accepts customer screenshots and photos. The app already applies text safety filters to the user’s typed prompt. During red-team testing, a benign prompt such as “summarize this screenshot” is paired with an image that contains embedded policy-violating instructions and unsafe visual content. You need to reduce this risk without blocking normal screenshot analysis. What should you implement?
Options:
A. Disable all image uploads to the agent.
B. Run only the existing text filter on OCR output.
C. Apply multimodal visual safety moderation before model processing.
D. Require managed identity for the model endpoint.
Best answer: C
Explanation: The threat is in the image, so the control must inspect visual content, not just typed text. Multimodal visual moderation or risk detection can flag unsafe image content and embedded attacks while preserving legitimate screenshot workflows.
For multimodal agents, responsible AI controls must cover every input modality that can carry unsafe content or instructions. A text-only filter sees the typed prompt but may miss policy violations, visual harms, or adversarial content embedded in pixels. The better design is to moderate image inputs before they reach the model, apply guardrails based on the detected risk, and use review or refusal only for flagged cases. This reduces risk without removing the business value of visual support.
Topic: Implement Generative AI and Agentic Solutions
A Foundry project hosts a customer-support agent that uses Azure AI Search for grounding and calls a refund-eligibility function. Before production, stakeholders ask whether the agent gives correct policy answers, cites retrieved sources, and blocks unsafe refund guidance. Which evaluation plan best addresses the request?
Options:
A. Track token spend per conversation and select the lowest-cost model.
B. Compare p95 latency across model deployments and select the fastest endpoint.
C. Run concurrent-user load tests against the agent endpoint.
D. Evaluate groundedness, answer correctness, safety, and tool-call correctness on traces.
Best answer: D
Explanation: The scenario asks about output quality and agent behavior, not operational efficiency. A suitable Foundry evaluation plan should use conversation traces to test correctness, groundedness, safety, and whether the refund function is called appropriately.
For a grounded agent workflow, evaluation should match the business risk being tested. Here, the risks are incorrect policy answers, unsupported claims, unsafe guidance, and improper function-calling. Use quality and safety evaluations over representative conversations, including retrieved context and tool-call traces, so reviewers can see whether the response is supported by sources and whether the agent followed the refund policy. Latency, throughput, and cost metrics are useful for performance planning, but they do not prove that answers are correct, grounded, or safe.
Topic: Implement Generative AI and Agentic Solutions
You maintain a Foundry RAG app that answers HR policy questions from documents indexed in Azure AI Search. The team changed retrieval from top-3 semantic hybrid results to top-5 and revised the grounding prompt. You must verify before deployment that the change does not reduce groundedness or citation accuracy for known questions. What should you do?
Options:
A. Run baseline-vs-candidate regression evaluation with groundedness checks.
B. Replace hybrid search with vector-only search for all queries.
C. Increase chunk size and redeploy after a smoke test.
D. Monitor production user feedback after releasing the change.
Best answer: A
Explanation: Prompt and retrieval changes can alter which passages are used and how answers cite them. A regression evaluation uses the same test cases against the current and candidate workflows to detect grounding or citation regressions before deployment.
The core concept is regression evaluation for a RAG workflow. Because the team changed both retrieval parameters and the grounding prompt, the safest predeployment check is to run a fixed evaluation dataset through the baseline and candidate versions, then compare groundedness, answer relevance, and citation accuracy. This verifies that known questions still produce answers supported by retrieved sources.
Smoke tests, search-mode changes, or post-release monitoring can be useful, but they do not provide controlled evidence that the new RAG workflow preserves grounding quality.
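The comparison step can be sketched as a simple per-question delta check (the scores would come from your groundedness and citation evaluators; the data shape here is illustrative):

```python
def regression_report(baseline, candidate, metrics, tolerance=0.0):
    # baseline/candidate: {question_id: {metric_name: score}}, produced by
    # running the same fixed evaluation dataset through each workflow version.
    regressions = []
    for qid, base_scores in baseline.items():
        cand_scores = candidate.get(qid, {})
        for metric in metrics:
            delta = cand_scores.get(metric, 0.0) - base_scores.get(metric, 0.0)
            if delta < -tolerance:  # candidate is meaningfully worse
                regressions.append((qid, metric, round(delta, 3)))
    return regressions
```

An empty report for the known-question set is the controlled evidence the release gate needs before the top-5 retrieval change ships.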
Topic: Implement Generative AI and Agentic Solutions
You are building a Microsoft Foundry app for a help desk. A user submits a natural-language prompt and a screenshot of an error dialog. The app must generate troubleshooting steps that depend on both the prompt text and the visible message in the screenshot. The app can call only one deployed model endpoint per request and cannot add an OCR or Vision preprocessing step. Which endpoint should the app call?
Options:
A. An embedding model deployment endpoint
B. A text-only generation LLM endpoint
C. A multimodal model deployment endpoint
D. A code model deployment endpoint
Best answer: C
Explanation: The task requires one endpoint that can process a prompt and an image together, then generate text. A multimodal model deployment is the right fit because it supports visual input plus natural-language generation without a separate OCR or Vision step.
In Foundry applications, the deployed endpoint should match the input modality and output task. This scenario needs text generation, but the generated answer must use information visible in an image. A multimodal model deployment can receive both the user prompt and screenshot and produce troubleshooting guidance in a single call. A text-only generation endpoint cannot directly inspect the screenshot, and an embedding endpoint produces vectors for retrieval rather than user-facing guidance. Choose the endpoint whose supported input and output capabilities match the workload.
Topic: Plan and Manage an Azure AI Solution
A Foundry project deploys an enterprise procurement agent by using CI/CD. The approved production baseline requires business-system tools to use a user-assigned managed identity with a least-privilege role policy, and it prohibits stored API keys. A drift scan shows the erpLookup tool in production now authenticates by using an API key stored in app configuration, while the endpoint and tool schema are unchanged. Users must continue querying purchase order status.
What should you do?
Options:
A. Add a prompt guardrail that tells the model not to reveal secrets.
B. Disable the erpLookup tool until the next monthly release.
C. Redeploy the approved managed-identity tool configuration and remove the API key.
D. Leave the setting unchanged because the endpoint and schema match.
Best answer: C
Explanation: The drift is in the tool authentication setting, not in the tool schema. Restoring the approved managed identity and role policy removes the stored secret risk without preventing authorized purchase order lookups.
Configuration drift occurs when the deployed production settings no longer match the approved baseline from the Foundry CI/CD process. In this scenario, the erpLookup tool changed from keyless managed identity authentication to a stored API key. That creates a credential exposure and governance risk even though the endpoint and schema are unchanged. The right response is to reconcile production back to the approved deployment configuration, remove or rotate the exposed key, and keep the tool available through the managed identity with its least-privilege role policy. Blocking the tool would reduce business functionality unnecessarily, and prompt-only guardrails do not fix an authentication drift issue.
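The reconciliation step can be driven by a baseline diff like this sketch (real drift scans compare full deployment manifests; the setting keys are examples):

```python
def detect_drift(approved_baseline, deployed_config):
    # Return every setting whose deployed value differs from the approved baseline.
    return {
        key: {"approved": value, "deployed": deployed_config.get(key)}
        for key, value in approved_baseline.items()
        if deployed_config.get(key) != value
    }
```

Each drifted key then maps to a remediation action: redeploy the approved value and rotate any exposed credential.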
Topic: Implement Text Analysis Solutions
A Foundry project processes support tickets before creating workflow tasks. The workflow expects valid JSON with customer, product, issue_type, severity, and topics. Recent runs fail with an "Invalid JSON" parser error.
Trace excerpt:
Tool selected: summarize_ticket
Prompt intent: concise summary and key themes
Model output: "Contoso reports intermittent sign-in failures..."
Parser error: expected object at line 1
What should you do next?
Options:
A. Use a structured JSON extraction tool with the required schema.
B. Add Azure AI Search grounding for each ticket.
C. Route tickets to topic classification only.
D. Increase max output tokens for the summarization prompt.
Best answer: A
Explanation: The failure is caused by using a summarization-oriented prompt/tool for a structured-output requirement. In Foundry, the fix is to use an extraction prompt or tool that constrains the response to the required JSON schema.
Language model text analysis tasks should match the required output type. Summaries produce human-readable prose, topic classification produces labels or themes, and entity extraction identifies specific values. When an automation requires machine-readable fields, use a structured-output extraction prompt or Foundry Tool with an explicit JSON schema so the model returns the fields the parser expects.
The trace shows the selected tool is summarize_ticket, and the model output is natural language. The issue is not missing context or response length; it is the mismatch between the task type and the required output format.
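A parser-side guard makes the contract explicit; the field names come from the scenario, while the helper itself is an illustrative sketch:

```python
import json

REQUIRED_FIELDS = {"customer", "product", "issue_type", "severity", "topics"}

def parse_ticket_fields(model_output):
    # Fail with a clear contract error instead of a bare "expected object" failure.
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Expected a JSON object, got prose: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"Missing required fields: {sorted(missing)}")
    return data
```

Pairing a schema-constrained extraction prompt with a validator like this catches the task-type mismatch at the boundary rather than deep in the workflow.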
Topic: Implement Text Analysis Solutions
You are extending a Microsoft Foundry customer-support agent that already uses Azure AI Search for RAG over private product manuals. The mobile app must let field technicians ask questions hands-free and hear the agent’s answer while they work. The manuals must remain in the existing private search index. Which implementation should you use?
Options:
A. Expose the Foundry agent as a text chat and enlarge the response font.
B. Add Azure AI Speech for speech-to-text and text-to-speech around the Foundry agent.
C. Use Content Understanding to extract fields from uploaded manuals only.
D. Use Azure Translator to translate typed questions and text responses.
Best answer: B
Explanation: The requirement is a voice user experience, not just text analysis or document extraction. Azure AI Speech can convert technician audio to text for the Foundry agent and synthesize the agent’s response back to speech while leaving the existing Azure AI Search RAG flow in place.
For a hands-free agent interaction, the implementation must support both speech input and speech output. Azure AI Speech provides speech-to-text for the user’s spoken question and text-to-speech for the agent’s answer. The Foundry agent can still perform the reasoning step and use the existing private Azure AI Search index for grounding, so the retrieval architecture does not need to be replaced.
The key takeaway is to add speech modality support around the agent instead of choosing a text-only workflow.
Topic: Plan and Manage an Azure AI Solution
You are designing a Microsoft Foundry HR benefits agent. The agent uses Azure AI Search for handbook grounding and a payroll tool that can submit compensation-change requests. The solution must block unsafe or prompt-injection input, avoid grounding on restricted content, limit tool calls to approved actions, prevent unsafe responses, and require HR approval before any payroll change is committed. Which guardrail placement should you use?
Options:
A. Use a system prompt to describe all restrictions and rely on conversation memory to avoid unsafe requests.
B. Run one safety check only on the final model response and send payroll changes for HR review after execution.
C. Guard input before retrieval, filter retrieved content before grounding, enforce tool policy before execution, scan output before return, and approve before commit.
D. Require HR approval for every user turn and skip retrieval and tool guardrails.
Best answer: C
Explanation: Guardrails should be placed as close as possible to the risk they control. In this workflow, input, retrieval, tool execution, output, and approval each need a separate checkpoint so unsafe content or unauthorized actions are stopped before they affect the next stage.
Agent governance in Foundry should use layered checkpoints rather than a single late filter. Input controls help stop prompt injection or unsafe requests before retrieval. Retrieval controls help ensure only permitted, trusted content is added to the model context. Tool policies and typed schemas constrain function-calling before execution. Output checks reduce the chance of unsafe or confidential responses reaching users. Human approval belongs before a high-impact action is committed, not afterward.
The key takeaway is to prevent risk before the agent uses the next capability, especially before external tool execution.
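The layered design can be expressed as an ordered pipeline of checkpoints, where a failed check stops the turn before the next capability runs. The stage logic below is a deliberately simplified stub, not a real guardrail implementation:

```python
def guarded_turn(user_input, stages):
    # stages: ordered list of (name, check), where check(payload) -> (ok, payload).
    payload = user_input
    for name, check in stages:
        ok, payload = check(payload)
        if not ok:
            return {"blocked_at": name, "result": None}
    return {"blocked_at": None, "result": payload}

# Stub checkpoints in the order the scenario requires.
stages = [
    ("input_guard", lambda p: ("ignore previous instructions" not in p.lower(), p)),
    ("retrieval_filter", lambda p: (True, p)),   # drop restricted chunks here
    ("tool_policy", lambda p: (True, p)),        # allow only approved tool actions
    ("output_scan", lambda p: (True, p)),        # block unsafe responses
    ("hr_approval", lambda p: ("payroll" not in p.lower(), p)),  # hold for human sign-off
]
```

Each stub would be replaced by the real control (safety classifier, security-trimmed retrieval, typed tool policy, output filter, approval workflow), but the ordering guarantee is the point: a blocked stage never hands control to the next one.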
Topic: Plan and Manage an Azure AI Solution
A team builds a RAG-backed enterprise assistant in a Microsoft Foundry project. New policy PDFs are synced from SharePoint, processed with OCR/layout extraction, chunked, embedded, and indexed in Azure AI Search. Users report that some newly uploaded policies are not being cited. The team already evaluates generated answer quality and needs an observability signal for data ingestion quality. What should they monitor?
Options:
A. Search click-through rate for cited documents
B. Source-to-index completeness and freshness per ingestion run
C. LLM token usage and response latency per conversation
D. Safety evaluator scores for generated responses
Best answer: B
Explanation: Ingestion quality monitoring should verify the path from source content to searchable grounding data. For this scenario, the useful signal is whether each source PDF was processed, chunked, embedded, indexed, and refreshed on time.
The core concept is source-to-index observability for a retrieval pipeline. When users cannot cite newly uploaded policies, the team must determine whether the issue occurred before retrieval: connector sync, OCR/layout extraction, chunking, embedding generation, indexing, or freshness. Per-run lineage metrics should compare expected source documents with successfully indexed chunks and embeddings, and record failures or stale items.
This is different from monitoring the model’s runtime behavior. Latency, token usage, response safety, and user clicks can be useful, but they do not prove that the grounding source was ingested correctly.
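Per-run lineage can be summarized with a completeness-and-freshness check like this sketch (document IDs and timestamps would come from the connector and indexer logs; any comparable timestamp type works):

```python
def ingestion_run_report(source_docs, indexed_docs):
    # source_docs: {doc_id: source_last_modified}
    # indexed_docs: {doc_id: indexed_at}
    missing = sorted(set(source_docs) - set(indexed_docs))
    stale = sorted(
        doc_id for doc_id, modified in source_docs.items()
        if doc_id in indexed_docs and indexed_docs[doc_id] < modified
    )
    completeness = 1 - len(missing) / max(len(source_docs), 1)
    return {"missing": missing, "stale": stale, "completeness": round(completeness, 3)}
```

Alerting on `missing` and `stale` per ingestion run surfaces exactly the failure the users reported: a policy that exists in SharePoint but never became citable grounding data.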
Topic: Implement Generative AI and Agentic Solutions
A Microsoft Foundry project includes an agent that helps field technicians troubleshoot equipment. The agent keeps conversation state and uses Azure AI Search to retrieve manual passages. Some requests include an uploaded photo of the equipment panel, and the answer must reason over both the image and retrieved text. Which deployed model endpoint should you use as the agent’s primary reasoning and generation endpoint?
Options:
A. A code model deployment
B. A small text-only classifier deployment
C. A multimodal chat model deployment
D. An embedding model deployment
Best answer: C
Explanation: The agent needs a deployed endpoint that can generate answers while reasoning over image and text inputs. A multimodal chat model is the best fit because the scenario requires combining a technician photo with retrieved manual content.
In Foundry applications, choose the deployed model endpoint based on the task and input modality. This agent is not just classifying intent or retrieving passages; it must generate a response from both visual input and text grounding. A multimodal chat model supports image-plus-text reasoning and can be used with retrieval results from Azure AI Search as grounding context. Embedding endpoints support search and vectorization rather than response generation, and code-focused or text-only models do not satisfy the image requirement.
The key takeaway is to match the agent’s reasoning endpoint to the richest required input modality and generation goal.
Topic: Plan and Manage an Azure AI Solution
A support agent in a Microsoft Foundry project has intermittent latency spikes during morning sign-in. The model deployment traces show the following records, and the app currently retries failed calls immediately up to three times.
Status: 429 RateLimitExceeded
Message: Tokens-per-minute limit exceeded
Header: retry-after-ms=8000
Pattern: failures occur in 2-minute bursts
Which change is the best next fix?
Options:
A. Increase model temperature for the agent deployment.
B. Increase retry concurrency to clear the backlog.
C. Throttle and queue model calls, honoring Retry-After.
D. Rebuild the search index with larger vectors.
Best answer: C
Explanation: The symptom is a model deployment rate limit, not a grounding or search-quality issue. HTTP 429 with a tokens-per-minute message and a retry-after-ms header indicates the app should slow and schedule requests instead of retrying immediately.
Rate-limit troubleshooting starts with the error evidence. A 429 RateLimitExceeded response plus a tokens-per-minute message means the deployment is receiving more token demand than its current capacity allows during bursts. Immediate retries can multiply the load and make latency worse. A safe mitigation is to add client-side throttling, queue requests, track request and token budgets, and use exponential backoff that honors the Retry-After value. If the queued demand remains consistently high after smoothing bursts, use the trace data for capacity planning or quota adjustment. The key takeaway is to reduce retry amplification before adding more traffic pressure.
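A client-side sketch of backoff that honors the hint from the trace (`retry-after-ms` is the header shown above; `send` is a stand-in for the real model call):

```python
import time

def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    # send() -> (status, headers, body). Honor the service's retry-after-ms
    # hint when present; otherwise fall back to exponential backoff.
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return body
        retry_ms = headers.get("retry-after-ms")
        delay = int(retry_ms) / 1000.0 if retry_ms else base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError("Rate limit persisted; review quota and request smoothing")
```

In production this would sit behind a shared queue so concurrent callers cannot collectively exceed the tokens-per-minute budget; jitter on the fallback delay also helps avoid synchronized retries.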
Topic: Implement Generative AI and Agentic Solutions
A customer support agent in a Microsoft Foundry project must create incidents by calling the createIncident tool. Users report that the agent says it cannot create the incident after several attempts.
Trace excerpt:
User: Create a P1 incident for checkout failures.
Tool call: createIncident({"title":"checkout failures","priority":"P1"})
Tool result: 400 ValidationError
Detail: field "severity" is required; "priority" is not allowed;
severity must be one of ["sev1","sev2","sev3"]
Tool call repeated 3 times with the same arguments
What is the best next fix?
Options:
A. Raise the model temperature for tool argument generation.
B. Increase the agent’s maximum tool-call iterations.
C. Correct the createIncident tool schema and enum descriptions.
D. Add Azure AI Search grounding for incident history.
Best answer: C
Explanation: This is a tool contract problem, not a retrieval problem. The agent repeatedly sends priority, but the tool requires severity with a specific enum, so the tool definition and descriptions must guide the model to produce valid arguments.
Tool-augmented generation depends on the tool schema as the contract the model uses to form calls. The trace shows the external API rejects the call before any incident is created because the argument names and allowed values do not match: priority is sent, but severity is required. In a Foundry agent or workflow, expose the correct parameter names, required fields, and enum values, and write descriptions that teach the mapping from user phrases such as P1 to enum values such as sev1. Adding retries or changing model randomness does not fix a deterministic schema mismatch.
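One way to express the corrected contract follows; the schema shape mirrors common function-calling tool definitions, and the normalization helper is an illustrative defensive measure, not part of any official SDK:

```python
CREATE_INCIDENT_TOOL = {
    "name": "createIncident",
    "description": "Create an incident. Map priority phrases: P1->sev1, P2->sev2, P3->sev3.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Short incident title."},
            "severity": {
                "type": "string",
                "enum": ["sev1", "sev2", "sev3"],
                "description": "Required. sev1 corresponds to a P1/critical issue.",
            },
        },
        "required": ["title", "severity"],
    },
}

PRIORITY_TO_SEVERITY = {"P1": "sev1", "P2": "sev2", "P3": "sev3"}

def normalize_arguments(args):
    # Defensive fix-up in case the model still emits the legacy "priority" field.
    if "priority" in args and "severity" not in args:
        fixed = dict(args)
        fixed["severity"] = PRIORITY_TO_SEVERITY[fixed.pop("priority")]
        return fixed
    return args
```

The schema description does the real work: with the enum and the P1-to-sev1 mapping spelled out, the model can form valid arguments on the first call instead of looping on a 400.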
Topic: Implement Generative AI and Agentic Solutions
You are implementing a Microsoft Foundry customer-support agent. A user can ask for the current refund status and, if they approve, the agent must create a return authorization. The status and action are available only through an internal REST API that supports Microsoft Entra ID and is reachable from the Foundry project’s private network. Which implementation should you choose?
Options:
A. Put API examples and retry instructions in the system prompt.
B. Fine-tune the model on nightly exports of refund records.
C. Index refund policies in Azure AI Search for RAG answers.
D. Register managed-identity Foundry tools for lookup and confirmed creation.
Best answer: D
Explanation: The request requires both live external data and a controlled external action. Tool-augmented generation is the right pattern because the agent can call authenticated tools for the refund lookup and return creation instead of relying on model memory or static grounding.
Tool-augmented generation is used when the model must retrieve current data or perform an action outside the model. In this scenario, the Foundry agent should expose the internal REST API as tools with defined inputs and outputs, authenticate with managed identity, and enforce user confirmation before invoking the tool that creates a return authorization. The model can reason over the user request, call the lookup tool, present the result, and only call the creation tool after approval. Fine-tuning, RAG, and prompt instructions can improve language behavior or grounding, but they do not securely perform live API calls or side-effecting actions.
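The confirmation gate can be sketched as a small orchestration step, with plain callables standing in for the managed-identity Foundry tools:

```python
def handle_refund_turn(order_id, lookup_tool, create_tool, user_confirmed):
    # The read-only lookup runs freely; the side-effecting creation tool
    # is invoked only after the user explicitly confirms.
    status = lookup_tool(order_id)
    if not user_confirmed:
        return {"refund_status": status, "action": "awaiting_confirmation"}
    authorization = create_tool(order_id)
    return {"refund_status": status, "action": "created", "authorization": authorization}
```

The key design choice is that confirmation is enforced in orchestration code, not in the prompt, so the write tool cannot be reached without it.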
Topic: Plan and Manage an Azure AI Solution
A company is planning a Microsoft Foundry project for a procurement agent. The agent must search approved contract content, call a vendor-status REST API, and submit exception requests by running an internal custom function. Security policy requires no stored secrets, no public ingress to internal systems, least-privilege access, and human approval for write actions. Which integration approach should you choose?
Options:
A. Use Foundry Tools with managed identity, private endpoints, role policies, approval for write tools, and provenance tracing.
B. Store API keys in the agent tool configuration and restrict access by prompt instructions.
C. Give the agent a broad Contributor role and allow all tool calls automatically on the private network.
D. Disable all API and function tools and answer only from an indexed knowledge store.
Best answer: A
Explanation: The agent needs multiple tool types: search over knowledge, API calls, and a custom function. The safest workable design uses Foundry tool integrations with managed identity, private connectivity, least-privilege roles, approval controls for write actions, and traceable provenance.
For this scenario, the core concept is choosing secure tool integration patterns for an agent. Azure AI Search can serve as the knowledge/search tool, while the REST API and custom function can be exposed as controlled tools. Managed identity avoids embedded secrets, private networking avoids public ingress, role policies limit what the agent can access, and approval workflows reduce risk for state-changing operations. Trace logging and provenance metadata support audit and oversight without preventing legitimate read and write workflows.
The key takeaway is to secure each required tool path instead of relying on prompts alone or removing required capabilities.
Topic: Plan and Manage an Azure AI Solution
A team is moving a Microsoft Foundry agent from pilot to production. The agent retrieves regulated HR documents from Azure Blob Storage and Azure AI Search. The release gate requires proof that the runtime does not use public endpoints or stored access keys to reach those data sources. Which option best validates the deployment design?
Options:
A. Store connection strings in Key Vault; monitor secret expiration events.
B. Enable content safety filters; monitor blocked prompt categories.
C. Enable groundedness evaluation; monitor retrieval relevance scores.
D. Use managed identity and private endpoints; monitor audit and network logs.
Best answer: D
Explanation: This requirement is about identity and network isolation, not model quality. Managed identity is needed when the runtime must avoid stored secrets, and private networking is needed when access to Azure resources must not traverse public endpoints.
For production Foundry solutions that access regulated data, the deployment design must include managed identity when services should authenticate without embedded keys or connection strings. It must include private networking, such as private endpoints, when resource access must be restricted away from public endpoints. The release gate should validate both controls with observable evidence, such as identity/audit logs showing managed identity token-based access and network/resource logs showing private endpoint traffic. Quality evaluators for relevance, groundedness, or safety are useful for AI behavior, but they do not prove keyless authentication or private network routing.
Topic: Implement Text Analysis Solutions
A Foundry support agent uses a text classification tool to route employee chat messages to harassment, self_harm, security_incident, or normal. Monitoring shows many false positives. In traces, the tool receives only the latest sentence and bare category names. Phrases such as “kill the process,” “this build is sick,” and “blast radius review” are routed to harmful categories. What should you change in the agent workflow?
Options:
A. Increase the classifier response token limit.
B. Pass category definitions and domain glossary into classification.
C. Lower safety thresholds for all negative messages.
D. Disable conversation memory during triage.
Best answer: B
Explanation: The trace indicates that the agent is classifying domain-specific language without enough context. Supplying clear category definitions and a domain glossary helps the text classifier distinguish technical phrases from actual harmful or sensitive content.
Poor classification in text-analysis workflows is often caused by missing context, domain language, or ambiguous labels rather than model capacity. In this case, phrases like “kill the process” and “blast radius” are normal engineering terms, but the agent receives only bare labels and the latest sentence. The workflow should ground the classification step by passing label definitions, examples, and relevant domain terminology, potentially retrieved from an approved glossary or knowledge source. This improves routing without weakening safety controls or relying on unrelated tuning changes.
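Grounding the classification step can be as simple as enriching the prompt; the label definitions and glossary entries below are examples drawn from the scenario:

```python
def build_classification_prompt(message, label_definitions, glossary):
    # label_definitions: {category: definition}; glossary: {term: benign meaning}.
    labels = "\n".join(f"- {name}: {definition}"
                       for name, definition in label_definitions.items())
    terms = "\n".join(f'- "{term}": {meaning}' for term, meaning in glossary.items())
    return (
        "Classify the message into exactly one category.\n"
        f"Category definitions:\n{labels}\n"
        f"Benign domain terms (do not treat as harmful):\n{terms}\n"
        f"Message: {message}\n"
        "Respond with the category name only."
    )
```

In the agent workflow, the glossary could be retrieved from an approved knowledge source rather than hard-coded, keeping domain language current without retraining anything.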
Topic: Implement Generative AI and Agentic Solutions
You are implementing a Python service that calls a Microsoft Foundry agent in the production Foundry project. The agent uses a Foundry Azure AI Search tool connector for RAG over kb-index. The production security standard requires keyless access, and the connector uses managed identity uami-agent-prod. Private endpoint connectivity tests pass, and model responses work, but grounded answers fail.
Trace excerpt:
Project endpoint: foundry-prod
Agent: support-agent-prod
Tool: azure_ai_search(kb-index)
Status: 403 Forbidden
Message: Principal uami-agent-prod is not authorized to read documents
Which implementation change should you make?
Options:
A. Replace managed identity with a Search admin key.
B. Reference the base model name instead of the deployment name.
C. Switch the SDK endpoint to the dev Foundry project.
D. Grant uami-agent-prod Search Index Data Reader on kb-index.
Best answer: D
Explanation: The failure is caused by connector permissions, not by the model deployment or project endpoint. The Search tool reaches Azure AI Search but receives a 403 for the managed identity, so the identity needs read permission on the target index while preserving keyless access.
In a Foundry agent integration, the model deployment call and the tool connector call can fail independently. Here, model responses work and private connectivity passes, so the request is reaching the Search tool. The 403 message names uami-agent-prod, which is the managed identity configured on the connector. To query documents from Azure AI Search, that identity needs an appropriate data-plane role such as Search Index Data Reader on the target index or service scope.
Changing the project endpoint or model reference would not address a tool authorization failure. Using an admin key would also violate the stated keyless requirement.
Topic: Plan and Manage an Azure AI Solution
You are evaluating a Microsoft Foundry RAG assistant that answers HR policy questions by using Azure AI Search for grounding. Groundedness scores are low, but trace logs show that answers are consistent with the retrieved chunks. For failed questions, the relevant policy pages exist in the source files but are not in the top retrieved results.
Which validation should you prioritize to confirm that retrieval or indexing design is the primary fix?
Options:
A. Run a content safety evaluation on failed answers.
B. Increase the system prompt detail and retest fluency.
C. Compare answer length with the model token limit.
D. Measure retrieval recall@k against labeled relevant policy pages.
Best answer: D
Explanation: The trace evidence points to a retrieval failure, not a generation failure. If the model answers from the retrieved context but the correct source passages are missing, retrieval recall against labeled relevant content is the key validation.
For RAG grounding issues, first separate retrieval quality from generation quality. In this scenario, the source content exists, and the model is staying consistent with the retrieved chunks. The failure is that Azure AI Search is not returning the right pages in the top results. Measuring retrieval recall@k with a labeled evaluation set confirms whether chunking, metadata filtering, semantic ranking, vector strategy, or hybrid retrieval needs redesign.
Prompt tuning can improve how the model uses context, but it cannot ground answers in passages that were never retrieved.
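The recall@k validation described above can be sketched in a few lines of plain Python. The function, the document IDs, and the labeled set below are all hypothetical; a real evaluation would run against Azure AI Search results and a labeled ground-truth set.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of labeled relevant pages that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("need at least one labeled relevant page")
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# One failed HR question: the correct policy pages exist in the corpus
# but only one of them shows up in the top retrieved results.
retrieved = ["faq-12", "benefits-3", "travel-9", "leave-2", "misc-1"]
relevant = {"leave-2", "leave-7"}           # labeled ground truth
print(recall_at_k(retrieved, relevant, 5))  # 0.5
```

A consistently low score over the labeled set is direct evidence that chunking, ranking, or the vector strategy needs redesign before any prompt work.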
Topic: Implement Generative AI and Agentic Solutions
A company is building a Foundry-based assistant for internal policy questions. The source documents change daily and contain confidential data. Security requires keyless private access, least-privilege document visibility, and auditable source citations. The assistant must answer from current enterprise knowledge. Which implementation should you recommend?
Options:
A. Fine-tune an LLM on the documents and update the model monthly.
B. Place the latest documents in the system prompt for every request.
C. Disable document grounding and answer only from the base model.
D. Use RAG with Azure AI Search, managed identity, role filters, and provenance metadata.
Best answer: D
Explanation: The scenario needs current enterprise knowledge, not model training. A RAG design can retrieve fresh content at query time and enforce security controls such as managed identity, private access, role-based filtering, and citations without blocking valid user questions.
RAG is the right pattern when an app must use enterprise content that changes frequently. In this case, the assistant should retrieve from an Azure AI Search index or connected knowledge source at runtime, using managed identity for keyless access and private networking where required. Role-based filters preserve each user’s document visibility, and provenance metadata enables source citations and audit review. This keeps confidential knowledge outside model weights while still grounding answers in approved content. Fine-tuning is not a substitute for secure retrieval when facts change daily.
Topic: Implement Computer Vision Solutions
An insurer is building a Microsoft Foundry claim-review agent. The agent must reason over uploaded photos and walkthrough videos by consuming structured outputs that include visual attributes, locations or timestamps, and source provenance. Which two backlog items are appropriate to implement by using Azure Content Understanding? Select TWO.
Options:
A. Translate adjuster comments from French to English before indexing.
B. Transcribe the adjuster’s spoken narration from the video audio track.
C. Index raw media files in Azure AI Search without visual extraction.
D. Generate photorealistic repair mockups from a text prompt.
E. Create timestamped records of visible leaks from walkthrough videos.
F. Extract damage cues and affected regions from roof photos as JSON.
Correct answers: E and F
Explanation: Azure Content Understanding is appropriate when an app needs structured, grounded representations of visual content for downstream reasoning. In this scenario, extracting visual attributes, regions, and timestamps from photos or videos matches that purpose.
Content Understanding fits workloads that convert unstructured visual media into structured outputs an agent or RAG workflow can reason over. For images, that can include domain-specific visual characteristics and affected regions. For video, it can include timestamped observations or segments that preserve where the evidence came from. The key requirement is not just storing media or producing a generic caption; it is creating a reliable visual representation with grounding/provenance for later reasoning.
A single-purpose translation, speech, search, or image generation capability may still be useful in the larger solution, but it does not satisfy the visual-characteristic extraction requirement.
Topic: Implement Computer Vision Solutions
A retail team is building a Microsoft Foundry app that lets designers upload a product photo, mark the product area to preserve, and request a new campaign background. The app must return an edited image, not labels, captions, or search results. Which implementation should you use?
Options:
A. Index the photo in Azure AI Search for similar-image retrieval.
B. Run Azure AI Vision image analysis to generate tags and captions.
C. Use Content Understanding to extract product fields from the image.
D. Call a Foundry image editing model with the source image, mask, and prompt.
Best answer: D
Explanation: The task requires creating or editing media, so the implementation must use an image generation or image editing model. A classification, extraction, or retrieval workflow can describe or find images, but it will not produce the requested edited campaign image.
Image generation and editing workflows in Microsoft Foundry are used when an application must create new visual content or modify an existing image. In this scenario, the product photo and mask define what should be preserved, and the prompt describes the new background to generate. That requires calling an image-editing-capable model deployment through the app or Foundry SDK.
Image analysis, information extraction, and search can be useful supporting features, but they do not perform the core requested behavior: generating an edited image.
Topic: Implement Generative AI and Agentic Solutions
You are designing a Microsoft Foundry agent for insurance claims support. The agent uses a deployed GPT-4o mini model and Azure AI Search over private policy documents with managed identity. Users must resume a claim conversation from web or mobile for 30 days. The security team allows durable storage of claim state and communication preferences, but not full long-term transcripts containing PII. Which memory design is the best fit?
Options:
A. Append every prior chat turn to each prompt from the client app.
B. Store all chat transcripts as vectors in the policy search index.
C. Use Foundry agent threads and persist encrypted state summaries keyed by claim.
D. Fine-tune the deployed model monthly on resolved conversations.
Best answer: C
Explanation: The best design separates short-term conversation state from durable memory. Foundry agent threads can track the active exchange, while an encrypted application-managed summary stores only the claim state and preferences needed to resume later.
For agent continuity, use the agent thread for the current conversation and persist only the minimum durable memory required outside the model, such as claim status, unresolved questions, and approved user preferences. On resume, the app can load that summary into a new or existing thread and continue grounding policy answers with Azure AI Search. This satisfies the 30-day continuity requirement without retaining full PII-heavy transcripts. The key takeaway is to keep enterprise knowledge retrieval separate from per-user memory and avoid using model training as a substitute for conversation state.
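A minimal sketch of that durable memory record follows. The `ClaimMemory` shape and field names are hypothetical illustrations of the principle: persist only claim state and approved preferences, never the full transcript, and encrypt the serialized record before storing it keyed by claim.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ClaimMemory:
    """Durable, PII-minimized state persisted between sessions (hypothetical shape)."""
    claim_id: str
    status: str
    open_questions: list = field(default_factory=list)
    preferences: dict = field(default_factory=dict)

    def to_record(self) -> str:
        # Serialize before encrypting and storing keyed by claim_id.
        # Full chat transcripts are deliberately not part of this record.
        return json.dumps(asdict(self))

memory = ClaimMemory("CLM-1042", "awaiting-documents",
                     open_questions=["photo of damage"],
                     preferences={"channel": "email"})
record = memory.to_record()  # encrypt + persist; load into a thread on resume
```

On resume, the app deserializes this summary into a new or existing agent thread, and Azure AI Search continues to supply policy grounding separately.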
Topic: Implement Computer Vision Solutions
You are building a Foundry agent that answers questions about uploaded equipment photos, such as “Is the oxygen valve open?” Answers must be grounded in the actual image and include the source photo and relevant region. The current RAG pipeline indexes only OCR text and file names in Azure AI Search, so visual-only details are missed. Which change should you implement?
Options:
A. Increase OCR chunk size and enable semantic ranking over file names.
B. Fine-tune a text LLM on previous image questions and keep the same text index.
C. Generate one page-level alt-text summary per photo and answer only from summaries.
D. Index visual captions, embeddings, and region metadata, then pass retrieved image crops to a multimodal model.
Best answer: D
Explanation: Visual question answering needs evidence from the image, not only nearby text. Indexing visual descriptors, embeddings, and region metadata enables retrieval of relevant visual evidence, and passing the retrieved crop or image region to a multimodal model lets the answer stay grounded.
For grounded visual Q&A, the retrieval pipeline must make image content searchable and preserve provenance. OCR helps with visible text, but questions like whether a valve is open depend on visual features. A better pipeline enriches each image with captions or visual embeddings, stores source photo and region metadata, retrieves the relevant visual evidence from Azure AI Search, and gives the multimodal model the original image or crop for final answering. The key takeaway is that text-only RAG cannot reliably answer questions that depend on visual details in the source image.
Topic: Implement Generative AI and Agentic Solutions
A Foundry project includes a support agent that calls a schedule_pickup API tool. The tool schema should require orderId, require reasonCode from an approved enum, and require pickupDate in ISO date format. Recent traces show calls that omit reasonCode or send free-text dates. You need to validate that the revised schema supports reliable tool invocation before the API runs. Which observability approach is best?
Options:
A. Monitor only API 4xx and 5xx response counts
B. Validate traced tool-call arguments against the schema
C. Track total token usage for pickup conversations
D. Score final answers with a similarity evaluator
Best answer: B
Explanation: The goal is to validate whether the agent produces tool calls that conform to required arguments and constraints. The best signal is a trace-based tool-call evaluation that checks generated arguments against the declared schema before the API executes.
For tool schema quality, observe the model’s proposed function calls and validate their arguments against the schema used by the agent. This catches missing required properties, invalid enum values, and format violations such as non-ISO dates before the downstream API is called. In Foundry agent traces, tool-call records provide the right evidence because they show the selected tool and the exact arguments generated by the model. Operational metrics such as latency, token count, or API error rate can be useful, but they do not directly prove that the tool schema is constraining invocation arguments correctly.
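The trace-based argument check can be illustrated with a small validator. The schema constants and reason codes below are invented for the sketch; in practice you would validate traced arguments against the exact JSON schema declared on the `schedule_pickup` tool.

```python
from datetime import date

# Hypothetical revised schema for the schedule_pickup tool.
SCHEMA = {
    "required": ["orderId", "reasonCode", "pickupDate"],
    "enums": {"reasonCode": {"damaged", "wrong_item", "no_longer_needed"}},
}

def violations(args: dict) -> list:
    """Check one traced tool call's arguments against the declared schema."""
    problems = [f"missing {k}" for k in SCHEMA["required"] if k not in args]
    for key, allowed in SCHEMA["enums"].items():
        if key in args and args[key] not in allowed:
            problems.append(f"{key} not in enum")
    if "pickupDate" in args:
        try:
            date.fromisoformat(args["pickupDate"])  # must be ISO, e.g. 2025-07-01
        except ValueError:
            problems.append("pickupDate not ISO format")
    return problems

# Two calls taken from traces: one omits reasonCode and sends a free-text date.
print(violations({"orderId": "A1", "pickupDate": "next Tuesday"}))
print(violations({"orderId": "A2", "reasonCode": "damaged",
                  "pickupDate": "2025-07-01"}))
```

Running this over captured tool-call spans surfaces exactly the failures the traces reported, before the downstream API ever executes.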
Topic: Plan and Manage an Azure AI Solution
Your team is deploying a claims-assistance agent in a Microsoft Foundry project. The agent must retrieve internal policy content from Azure AI Search, maintain per-case memory, and call a refund API when a refund is justified. Security policy requires no stored service keys, private connectivity to data sources and tools, least-privilege tool access, human approval before any refund call, and audit evidence that includes tool calls, grounding sources, and safety events. Which deployment configuration should you choose?
Options:
A. Use managed identity, private endpoints, RBAC-scoped tools and memory, refund approval gating, and trace logging with provenance and safety events.
B. Disable the refund tool and require operators to process all refunds outside the agent workflow.
C. Use managed identity and private endpoints, but grant broad tool access and omit trace logging for privacy.
D. Use project-stored API keys, public endpoints, prompt-only refund rules, and logging of final responses only.
Best answer: A
Explanation: The scenario requires secure deployment controls without preventing approved refund actions. Managed identity and private endpoints address keyless private access, while RBAC-scoped tools, approval gating, trace logging, provenance, and safety events support governance and auditability.
For a Foundry agent that can use enterprise data and a high-impact action tool, the deployment should combine identity, network, tool, oversight, and monitoring controls. Managed identity avoids stored service keys. Private endpoints keep traffic to Azure AI Search, memory storage, and tools off public paths. Role-based tool access limits the agent to only the resources it needs. A human approval gate allows legitimate refund use while preventing autonomous high-risk actions. Trace logging with provenance and safety events gives auditors evidence of prompts, retrieval grounding, tool calls, and policy enforcement. The key takeaway is to control risky tool use with approval and observability, not by relying only on prompts or disabling the workflow.
Topic: Implement Generative AI and Agentic Solutions
A finance team is building a Foundry agent that reconciles invoices by using retrieval over purchase orders and then calling an ERP tool named createPayment. Payments above a defined limit or payments involving changed bank details must be reviewed because they have financial and compliance impact. Which workflow should you implement?
Options:
A. Use only a system prompt that tells the agent to be careful.
B. Require approval before createPayment runs for flagged payments.
C. Run createPayment first and notify approvers afterward.
D. Route only ERP tool failures to a reviewer.
Best answer: B
Explanation: High-impact agent actions need a human-in-the-loop approval control before the side-effecting tool executes. The agent can still prepare the payment request, but the reviewed arguments and evidence should be approved before calling the ERP payment tool.
For financial, operational, privacy, or compliance-impacting actions, the approval workflow should gate the tool invocation, not just the conversation. In this scenario, the agent can retrieve purchase order evidence, draft the createPayment arguments, and present the payment summary for review. The workflow should execute the ERP tool only after an authorized approver accepts the proposed action, and the approval decision should be traceable for monitoring and audit. Post-action notifications or prompts alone do not reliably prevent an irreversible or regulated action.
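The gating logic can be sketched as follows. Function names, the limit value, and the invoice fields are hypothetical; the point is that flagged drafts are returned for approval instead of executing the ERP tool.

```python
def needs_review(amount: float, limit: float, bank_details_changed: bool) -> bool:
    """Flag payments that must be approved before createPayment runs."""
    return amount > limit or bank_details_changed

def reconcile(invoice: dict, limit: float = 10_000.0):
    """Draft the payment, then either execute or route to a human approver."""
    draft = {"payee": invoice["payee"], "amount": invoice["amount"]}
    if needs_review(invoice["amount"], limit,
                    invoice.get("bank_details_changed", False)):
        return ("pending-approval", draft)  # present evidence to the approver
    return ("executed", draft)              # below limit, unchanged bank details

print(reconcile({"payee": "Acme", "amount": 25_000.0}))
print(reconcile({"payee": "Acme", "amount": 400.0,
                 "bank_details_changed": True}))
```

Both sample invoices come back pending approval, one for amount and one for changed bank details, and the approval decision itself would be logged for audit.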
Topic: Implement Text Analysis Solutions
You are building a Foundry app that triages customer emails in French, Spanish, and Japanese. The Azure AI Content Safety text check supports these source languages. The validated extraction and summarization prompts use English-only examples and return canonical English labels. Policy requires unsafe user content to be blocked before any transformation or domain LLM call. Which TWO ordering choices should you implement?
Options:
A. Summarize first, then scan the summary for safety.
B. Translate every message to English before safety detection.
C. Extract fields in the source language, then translate values.
D. Translate safe non-English messages before extraction and summarization.
E. Translate canonical labels before downstream routing.
F. Run safety detection on the original message first.
Correct answers: D and F
Explanation: The pipeline must preserve the safety policy first, then satisfy the language assumptions of the domain prompts. Because safety detection supports the incoming languages and must happen before transformations, scan the original text first. After content is allowed, translate non-English input to English for the validated extraction and summarization prompts.
Translation order depends on which component has the stricter language or governance requirement. Here, safety detection is both multilingual for the workload and required before any transformation, so it should run against the original user message. Once the message is allowed, translation to English is appropriate because the extraction and summarization prompts were validated only in English and produce canonical English labels. This reduces prompt drift and keeps downstream routing consistent. The key distinction is that safety policy controls should not be delayed by translation when the safety detector can process the source language.
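The ordering can be made concrete with stub functions. Everything below is a placeholder: the blocklist stands in for Azure AI Content Safety, and the translate/extract stubs stand in for the real service calls; only the sequencing is the point.

```python
# Stubs standing in for Content Safety, a translator, and the domain prompts.
BLOCKLIST = {"menace"}  # pretend the multilingual safety check flags this word

def is_unsafe(text: str) -> bool:
    return any(word in text.lower() for word in BLOCKLIST)

def translate_to_english(text: str) -> str:
    return f"[en] {text}"           # placeholder translation

def extract_labels(text_en: str) -> str:
    return "REFUND_REQUEST"         # English-validated prompt output

def triage(message: str):
    if is_unsafe(message):                     # 1) safety on the ORIGINAL text
        return "blocked"
    english = translate_to_english(message)    # 2) translate safe content
    return extract_labels(english)             # 3) English-only domain prompts

print(triage("Je veux un remboursement"))
print(triage("ceci est une menace"))
```

The unsafe French message is blocked before any translation or LLM call, while the safe one flows through translation into the validated English prompts.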
Topic: Implement Information Extraction Solutions
You are building a Foundry agent workflow that reviews scanned supplier contracts. The reviewer agent must reason over clause order, headings, tables, and page provenance before it can invoke an approval tool. You need the preprocessing step to create a grounded artifact for downstream reasoning. Which implementation should you use?
Options:
A. Embed raw OCR text chunks and ignore layout structure.
B. Have the approval tool parse the original PDF.
C. Ask the model to summarize the PDF into Markdown.
D. Call a Content Understanding analyzer and pass its Markdown artifact.
Best answer: D
Explanation: The workflow needs a document-derived artifact that preserves layout and can be consumed by the reasoning agent. A Content Understanding analyzer that outputs Markdown is the appropriate preprocessing step because it keeps headings, tables, order, and provenance available before tool approval decisions.
For document extraction workflows, Content Understanding analyzers can produce clean, layout-aware representations such as Markdown for downstream reasoning. In this scenario, the agent should not reason directly from an unstructured PDF or a lossy summary. The analyzer output becomes the grounded intermediate artifact that the reviewer agent can use to inspect clauses, follow table context, and retain source/page metadata before invoking an approval tool.
The key takeaway is to generate Markdown from the source document before agent reasoning, not after a model has already summarized or discarded structure.
Topic: Implement Computer Vision Solutions
You are implementing a product-photo editing workflow by using the Foundry SDK. An edit request uses a deployed image-editing model, a private source image, and a mask. The request fails before generation.
Prompt safety: passed
Reference media fetch: 200 OK
Source image: 1024 x 1024 PNG
Mask image: 768 x 768 PNG
Model capability: image editing with masks
Error: Invalid mask geometry
Which implementation change should you make?
Options:
A. Rewrite the prompt to remove unsafe content.
B. Regenerate the mask at 1024 x 1024 pixels.
C. Switch to a text-to-image-only deployment.
D. Move the source image to a public URL.
Best answer: B
Explanation: The failure is caused by the mask, not by the prompt, reference media, policy, or model choice. For masked image editing, the mask must align with the source image geometry so the model can identify the exact region to edit.
In an image editing workflow, validation can fail before generation if the provided mask is incompatible with the source image. The exhibit shows the prompt passed safety checks, the reference media was fetched successfully, and the deployed model supports masked editing. The only failing evidence is Invalid mask geometry, with a 768 x 768 mask for a 1024 x 1024 source image. Regenerating the mask at the same dimensions as the source preserves the intended private, edit-capable workflow.
Changing the prompt, media location, or model would not address the geometry mismatch shown in the failure details.
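A pre-flight check in the app can catch this class of failure before the edit request is sent. The function below is an illustrative client-side guard, not part of any SDK; dimensions are plain (width, height) tuples.

```python
def validate_mask(source_size: tuple, mask_size: tuple) -> None:
    """Fail fast before calling the image-editing deployment (illustrative)."""
    if source_size != mask_size:
        raise ValueError(
            f"Invalid mask geometry: mask {mask_size} "
            f"must match source {source_size}"
        )

validate_mask((1024, 1024), (1024, 1024))       # OK: geometry matches
try:
    validate_mask((1024, 1024), (768, 768))     # reproduces the exhibit failure
except ValueError as err:
    print(err)                                  # fix: regenerate mask at 1024x1024
```

Regenerating the mask at the source dimensions is the only change that clears the check; prompt, hosting, and model choices are untouched.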
Topic: Implement Information Extraction Solutions
An insurance company is building a RAG-backed Foundry agent for claims specialists. Source content includes 80 GB of PDF manuals in Azure Storage, scanned claim forms, and SharePoint policy pages that change throughout the day. The agent must cite page or section sources and enforce each user’s document permissions at retrieval time. Newly changed content must be searchable within 30 minutes. Which ingestion approach should you choose?
Options:
A. Index whole-file embeddings with a nightly full refresh.
B. Use incremental Azure AI Search indexing with OCR-enriched chunks, source metadata, and ACL filters.
C. Load retrieved source files into the model prompt for filtering.
D. Create public summaries and index only the summaries.
Best answer: B
Explanation: The best ingestion design uses Azure AI Search with incremental indexing, OCR enrichment, chunking, and metadata needed for citations and security trimming. This matches the content types, 30-minute freshness requirement, large corpus size, and downstream grounding needs.
For grounded RAG, ingestion must prepare retrievable units that preserve meaning, source provenance, freshness, and access control. OCR enrichment makes scanned forms searchable, chunking creates passages suitable for retrieval, and source metadata supports page or section citations. Incremental indexing avoids reprocessing the full 80 GB corpus while meeting the 30-minute freshness goal. ACL metadata enables retrieval-time filtering so the Foundry agent only grounds answers in documents the user is allowed to access. Whole-document embeddings or prompt-time filtering do not provide the same grounding quality, freshness, or security guarantees.
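The security-trimming and citation pieces can be shown with a toy in-memory index. The rows, group names, and filter logic are invented to illustrate what the real Azure AI Search index must carry: chunk text, source/page provenance, and ACL metadata evaluated at retrieval time.

```python
# Toy index rows showing the metadata incremental ingestion must populate.
INDEX = [
    {"chunk": "Flood damage is covered under section 4 ...",
     "source": "claims-manual.pdf", "page": 12, "acl": {"claims-specialists"}},
    {"chunk": "Executive compensation policy ...",
     "source": "hr-policy.pdf", "page": 3, "acl": {"hr-admins"}},
]

def retrieve(query: str, user_groups: set):
    """Security-trimmed retrieval: ground only in documents the user may read."""
    visible = [row for row in INDEX if row["acl"] & user_groups]
    hits = [row for row in visible if query.lower() in row["chunk"].lower()]
    return [(h["chunk"], f'{h["source"]}#page={h["page"]}') for h in hits]

print(retrieve("flood", {"claims-specialists"}))  # chunk plus citation
print(retrieve("flood", {"hr-admins"}))           # trimmed: empty
```

The second call returns nothing even though the content matches, which is exactly the behavior ACL filtering must guarantee before the agent grounds an answer.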
Topic: Implement Information Extraction Solutions
A financial services team is building a Microsoft Foundry agent to review scanned onboarding packets. The packets contain PII, and policy permits processing them only if the solution uses keyless access, private network paths, clean extracted fields and layout-aware markdown, and source provenance for each recommendation. The team must not block valid packets solely because they contain PII.
Which implementation should you choose?
Options:
A. Pass raw PDFs to the agent with safety filters and trace logging.
B. Reject packets when PII is detected before agent processing.
C. Use Content Understanding extraction with managed identity and provenance outputs.
D. Use a stored storage key so the agent can read raw PDFs.
Best answer: C
Explanation: The requirement is not just to moderate unsafe content; it is to transform scanned documents into governed, traceable representations before the agent uses them. Content Understanding extraction with managed identity supports clean fields, layout-aware markdown, and provenance without blocking legitimate PII-containing packets.
For document extraction scenarios, an agent should not reason directly over raw unstructured scans when the business requires trusted fields, layout, markdown, or citations. Use Azure Content Understanding or document extraction capabilities first, secure the data path with managed identity and private networking, and pass the agent only the extracted representation plus provenance metadata. This lets the agent summarize or recommend using auditable source spans while role policies and keyless access reduce credential risk.
Safety filters and trace logs are still useful, but they do not replace OCR, layout analysis, field extraction, or provenance generation.
Topic: Implement Text Analysis Solutions
A Microsoft Foundry agent triages maintenance voice notes. The current flow transcribes each audio file, summarizes the transcript, and uses the summary to query Azure AI Search for repair manuals.
Technicians report irrelevant answers when they say, “the pump makes this sound,” and then record 15 seconds of abnormal noise. Trace logs show accurate speech transcription, but the noise segment appears as [non-speech audio]. The agent must reason over both the spoken description and the recorded sound.
What is the best next fix?
Options:
A. Improve the text summarization prompt for the transcript.
B. Increase the number of Azure AI Search results returned.
C. Translate the transcript before querying the search index.
D. Route the original audio to a multimodal reasoning workflow.
Best answer: D
Explanation: The problem is not inaccurate transcription or weak search recall. The workflow loses the diagnostic pump sound because it converts the input to text and treats non-speech audio as unavailable content. A multimodal reasoning workflow can use the raw audio signal with the spoken context.
For speech-enabled agents, choose transcription when the relevant information is spoken words, translation when language conversion is needed, summarization when a long transcript must be condensed, and multimodal reasoning when the audio itself contains information the model must interpret. In this scenario, the transcript is accurate, but the decisive evidence is an abnormal non-speech sound. Summarizing or retrieving from the text transcript cannot recover information that was discarded before reasoning.
The key takeaway is to preserve and route the modality that contains the signal needed for the task.
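That routing decision can be sketched as a simple dispatcher. The segment shape and pipeline names are hypothetical; the idea is that non-speech segments go to a workflow that can reason over raw audio rather than being discarded as `[non-speech audio]`.

```python
def route(segments: list):
    """Send each segment to the pipeline that preserves its decisive signal."""
    decisions = []
    for seg in segments:
        if seg["kind"] == "speech":
            decisions.append(("transcribe-and-search", seg["id"]))
        else:  # non-speech audio, such as the recorded pump noise
            decisions.append(("multimodal-audio-reasoning", seg["id"]))
    return decisions

note = [{"id": "s1", "kind": "speech"},      # "the pump makes this sound"
        {"id": "s2", "kind": "non-speech"}]  # 15 s of abnormal noise
print(route(note))
```

Both segments still reach a reasoning step, but the noise segment keeps its original modality instead of collapsing to an empty transcript token.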
Topic: Plan and Manage an Azure AI Solution
Your team is building an internal policy assistant in a Microsoft Foundry project. The agent is deployed to Azure Container Apps by CI/CD, calls a Foundry model deployment, and queries an Azure AI Search index for grounding. All traffic to Foundry and Search must use private endpoints, and security forbids API keys, connection strings, and service principal secrets. RBAC assignments must survive container app redeployments and blue/green replacements. Which identity design is the best fit?
Options:
A. Use a system-assigned managed identity with Contributor on the resource group.
B. Store Foundry and Search keys in Key Vault and inject them at deployment.
C. Use a service principal secret behind an API gateway to broker access.
D. Use a user-assigned managed identity with keyless SDK authentication and least-privilege RBAC.
Best answer: D
Explanation: A user-assigned managed identity is the best fit when the app identity must remain stable across redeployments. The agent can use Azure SDK keyless authentication over private endpoints, with RBAC scoped only to the Foundry and Azure AI Search resources it needs.
Managed identity removes stored credentials from the app and CI/CD pipeline. In this scenario, the redeployment constraint makes a user-assigned managed identity preferable because its principal is independent of the container app lifecycle. Assign only the required data-plane roles for model invocation and search queries, and keep network access restricted through private endpoints. This satisfies keyless authentication, least privilege, private networking, and stable RBAC without adding a custom credential broker.
Topic: Implement Information Extraction Solutions
Your team is building a Foundry RAG assistant for contracts stored as PDF, TIFF, and DOCX files in Azure Blob Storage. Many PDFs are scanned images with no embedded text. The ingestion flow must populate an Azure AI Search-backed RAG index with searchable chunks from both native document text and image-based text, including source/page metadata for citations. Which TWO configurations can meet the requirement? Select TWO.
Options:
A. Store scanned pages in Foundry agent memory and skip ingestion OCR.
B. Run a Content Understanding OCR/layout analyzer, then index chunked page text.
C. Use an Azure AI Search indexer skillset with normalized-image OCR and chunk projection.
D. Apply semantic ranking to native extracted text without OCR.
E. Vectorize only file names and blob metadata.
F. Use image classification tags as the only text for scanned pages.
Correct answers: B and C
Explanation: Scanned pages must be converted to text during ingestion before retrieval can use them. Both Azure AI Search OCR enrichment and Content Understanding OCR/layout analysis can produce text that is chunked, indexed, and tied to source/page metadata for grounded RAG citations.
RAG ingestion for image-only documents needs an extraction stage before retrieval indexing. Azure AI Search can handle this in an indexer skillset by creating normalized page images, applying OCR, and projecting the resulting text into chunks with metadata. Alternatively, a Content Understanding analyzer configured for OCR/layout can produce page-level text or markdown before those chunks are embedded and indexed in Azure AI Search. The key is that text from scanned pages must exist in the index as retrievable content; semantic ranking and vector search improve retrieval over indexed content but do not extract text from pixels.
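The chunk-projection step both configurations share can be sketched in plain Python. The page dictionaries and field names are invented; a real skillset or analyzer supplies the OCR text, and the point is that each chunk keeps its source/page provenance for citations.

```python
def chunk_pages(pages: list, max_chars: int = 200):
    """Turn OCR'd page text into index-ready chunks with source/page metadata."""
    chunks = []
    for page in pages:
        text = page["text"]
        for start in range(0, len(text), max_chars):
            chunks.append({
                "content": text[start:start + max_chars],
                "source": page["source"],   # needed for RAG citations
                "page": page["number"],
            })
    return chunks

# Hypothetical OCR output for one scanned page (300 characters of text).
ocr_output = [{"source": "contract.tiff", "number": 1,
               "text": "Scanned clause text " * 15}]
for c in chunk_pages(ocr_output):
    print(c["source"], c["page"], len(c["content"]))
```

Each emitted chunk is small enough to embed and retrieve on its own, yet still traceable back to the exact scanned page it came from.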
Topic: Implement Generative AI and Agentic Solutions
A support agent in a Foundry project uses a Foundry connector to retrieve policy articles from Azure AI Search before generating answers. After release, users report that some answers cite articles that do not support the response. Which monitoring approach best validates the quality goal?
Options:
A. Add tests for exact SDK method names
B. Track only model token usage per conversation
C. Trace retrieval spans and run groundedness evaluations
D. Log only successful connector HTTP status codes
Best answer: C
Explanation: The quality issue is grounding, not SDK syntax or basic connectivity. End-to-end traces show which content the connector retrieved, and groundedness evaluations measure whether responses are supported by that retrieved evidence.
For a RAG-backed agent, observability should connect the generated answer to the retrieval evidence used in the same run. Trace logging can capture the Azure AI Search connector call, retrieved chunks, citations, latency, and tool spans. Groundedness or relevance evaluations can then score whether the final answer is supported by those retrieved sources. This targets the stated failure: citations that do not support the response. Token totals, HTTP success codes, and SDK method tests can be useful in other contexts, but they do not validate answer support against retrieved content.
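To show what a groundedness metric is asking, here is a deliberately crude proxy. Production evaluators use LLM judges over the traced retrieval spans; this term-overlap heuristic is only a sketch of "is the answer supported by the retrieved evidence?"

```python
def support_score(answer: str, retrieved_chunks: list) -> float:
    """Crude groundedness proxy: share of answer terms found in the evidence.
    Real evaluators use LLM judges; this only illustrates the question asked."""
    evidence = " ".join(retrieved_chunks).lower()
    terms = [t for t in answer.lower().split() if len(t) > 3]
    if not terms:
        return 0.0
    return sum(t in evidence for t in terms) / len(terms)

chunks = ["Refunds require a receipt and are processed within 14 days."]
print(support_score("Refunds are processed within 14 days", chunks))  # supported
print(support_score("Refunds are instant and automatic", chunks))     # weak
```

A run-level trace pairs each answer with the chunks the connector actually retrieved, so low-support answers can be pulled out for review with their evidence attached.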
Topic: Implement Generative AI and Agentic Solutions
You are building a Microsoft Foundry agent for employee reimbursements. The agent can review receipts and call a finance connector that issues payments. Company policy requires reimbursements over USD 1,000 to be approved by a manager, and every payment must have an audit record before the connector is called. Which implementation should you use?
Options:
A. Use workflow steps for validation, approval, audit, and payment.
B. Issue payments immediately and flag exceptions in monitoring.
C. Let the agent call the connector after retrieving policy text.
D. Add the policy and payment rules to the system prompt.
Best answer: A
Explanation: Business-critical actions should be controlled by explicit workflow steps, not hidden inside a prompt. In this scenario, the payment connector must run only after deterministic validation, manager approval, and audit recording.
For agentic solutions in Microsoft Foundry, prompts can guide reasoning, but they should not be the only control for regulated or high-impact business actions. A reimbursement payment is an external side effect, so the flow should separate the model’s recommendation from the execution path. The workflow should validate the amount, route over-limit requests to an approval step, write the audit record, and only then allow the payment tool or connector to run. This makes the process inspectable, testable, and enforceable even if the model output is incomplete or ambiguous. The key takeaway is to place critical gates in the workflow/tool orchestration layer rather than relying on natural-language instructions alone.
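The gate ordering can be sketched as deterministic workflow steps. The function, the audit list, and the stand-in connector below are all hypothetical; what matters is that validation, approval, and the audit record all precede the payment call.

```python
AUDIT_LOG = []

def pay(amount, payee):                 # stand-in for the finance connector
    return {"paid": amount, "payee": payee}

def reimburse(request: dict, approved_by=None):
    """Deterministic gates around the payment connector (illustrative)."""
    amount = request["amount"]
    if amount <= 0:
        raise ValueError("invalid amount")           # 1) validation
    if amount > 1000 and approved_by is None:
        return {"status": "needs-manager-approval"}  # 2) approval gate
    AUDIT_LOG.append({"payee": request["payee"], "amount": amount,
                      "approved_by": approved_by})   # 3) audit record first
    return {"status": "paid", **pay(amount, request["payee"])}  # 4) then pay

print(reimburse({"payee": "j.doe", "amount": 1500}))                     # gated
print(reimburse({"payee": "j.doe", "amount": 1500}, approved_by="mgr1")) # paid
```

Because the gates live in code rather than in the prompt, they hold even when the model's output is incomplete or ambiguous.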
Topic: Implement Information Extraction Solutions
A team is building a RAG-backed troubleshooting agent in a Microsoft Foundry project. The source articles have already been extracted and chunked. User queries often describe symptoms in different words than the articles, and the main requirement is to retrieve chunks with the closest embedding similarity for grounding. Which retrieval approach should the team implement?
Options:
A. Store chunk embeddings and query with vector search.
B. Enable OCR enrichment on the indexed articles.
C. Use keyword search with a synonym map.
D. Use semantic ranking over text-only fields.
Best answer: A
Explanation: The deciding requirement is similarity over embeddings, not text extraction or exact keyword matching. Azure AI Search vector search supports nearest-neighbor retrieval against stored chunk vectors, which is the appropriate grounding method for semantically similar wording.
Vector search is the right retrieval pattern when the application must compare embedding vectors for semantic similarity. In this scenario, the articles are already extracted and chunked, so the missing capability is not OCR or enrichment. The index should include a vector field containing embeddings for each chunk, and the RAG pipeline should embed the user query and use vector search to retrieve the nearest chunks for grounding. Semantic ranking and keyword techniques can improve some text retrieval scenarios, but they do not directly satisfy a requirement to rank by embedding similarity.
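The nearest-neighbor idea can be shown with toy vectors. The three-dimensional embeddings and chunk texts below are fabricated for illustration; real pipelines use the same embedding model for both the chunks and the query, and Azure AI Search performs the similarity ranking server-side.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_search(query_vec, index, k=2):
    """Nearest-neighbor retrieval over stored chunk embeddings (toy vectors)."""
    ranked = sorted(index, key=lambda row: cosine(query_vec, row["vector"]),
                    reverse=True)
    return [row["chunk"] for row in ranked[:k]]

INDEX = [
    {"chunk": "Reset the thermal breaker", "vector": [0.9, 0.1, 0.0]},
    {"chunk": "Update billing address",    "vector": [0.0, 0.2, 0.9]},
    {"chunk": "Overheating shutdown fix",  "vector": [0.8, 0.3, 0.1]},
]
# "machine gets hot and stops", embedded by the same model (toy values):
print(vector_search([0.85, 0.2, 0.05], INDEX, k=2))
```

The query never mentions "thermal" or "overheating", yet both related chunks outrank the billing chunk, which is exactly the symptom-versus-article-wording gap the scenario describes.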
Topic: Implement Generative AI and Agentic Solutions
A team deployed a multi-agent claims assistant from a Microsoft Foundry project. The assistant uses Azure AI Search retrieval, a policy-check tool, and a human-approval tool before final responses. Managers need weekly error analysis that separates missing grounding, failed tool calls, safety blocks, and slow approval steps. Which monitoring approach should you implement?
Options:
A. Track only model token usage and aggregate request latency.
B. Capture Foundry agent traces and run evaluators on grounding, tools, safety, and latency spans.
C. Store final chat transcripts and manually review only low-rated sessions.
D. Evaluate the base model with a static prompt set outside the agent.
Best answer: B
Explanation: The goal is step-level observability for a deployed agent workflow. Foundry agent traces plus evaluators can show where failures occur across retrieval, tool invocation, safety handling, and human-approval latency instead of only measuring the final response.
For deployed agents, monitoring should preserve the execution path as trace spans: model calls, retrieval calls, tool invocations, safety decisions, and human-approval steps. Running evaluators over those captured runs lets the team score qualities such as groundedness, tool-call success, safety handling, and latency by component. This supports weekly error analysis because each bad outcome can be tied to a specific span or evaluator result. Aggregate model metrics are useful, but they cannot explain whether the agent failed because retrieval was weak, a tool call failed, or an approval step was slow.
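The weekly roll-up described above amounts to grouping failed spans by component. A minimal sketch, assuming a hypothetical span record shape (`kind`, `ok`) rather than Foundry's actual trace schema:

```python
from collections import Counter

def weekly_error_breakdown(spans):
    """Count failures per span kind, e.g. retrieval, tool, safety, approval.
    spans: list of dicts like {"kind": "retrieval", "ok": False}."""
    return Counter(s["kind"] for s in spans if not s["ok"])
```

With real traces, each failing evaluator result would carry a span ID so a reviewer can jump from the weekly count to the exact run that failed.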
Topic: Plan and Manage an Azure AI Solution
You are troubleshooting a Microsoft Foundry project agent for HR policy questions. Users report confident but outdated answers with no citations.
Exhibit: Trace summary
Requirement: answer only from current SharePoint policies and cite the policy URL
Agent trace: response used conversation memory; no retrieval or tool call occurred
Knowledge sources: none configured
Memory: enabled for user preferences and prior chats
Index status: SharePoint policy index in Azure AI Search is healthy with URL metadata
Which next fix best addresses the root cause?
Options:
A. Fine-tune the model daily on HR policies.
B. Store summarized HR policies in conversation memory.
C. Increase the model’s maximum output tokens.
D. Connect the Azure AI Search index as the agent’s retrieval source.
Best answer: D
Explanation: The issue is not index health; the index is healthy but unused. For grounded answers from enterprise content, the agent needs a retrieval or knowledge integration, such as an Azure AI Search-backed source, rather than relying on conversation memory.
Grounded enterprise answers require an authoritative retrieval path from the agent to the enterprise content. In this trace, the SharePoint content has already been indexed and includes URL metadata, but the agent has no knowledge source configured and no retrieval call occurs. Conversation memory is appropriate for prior interaction context or preferences, not for serving as the authoritative store for changing HR policies. Connecting the Azure AI Search index lets the agent retrieve current policy passages and use their metadata for citations.
The key takeaway is to fix the knowledge integration pattern, not the model generation settings.
Topic: Implement Text Analysis Solutions
A Foundry project app extracts customer, entities, topics, and summary fields from service-case notes into a JSON record. After deployment, 18% of runs fail schema validation because topics is sometimes a comma-separated string and entities sometimes contains prose.
Trace excerpt:
Prompt: Return valid JSON that matches this shape...
Enabled tools: none
Configured tool: emit_case_json (JSON schema)
Tool calls: 0
Parser error: $.topics expected array
What is the best next fix?
Options:
A. Add Azure AI Search grounding for case notes
B. Lower temperature and keep prompt-only JSON output
C. Enable and require the emit_case_json tool call
D. Increase the model deployment token limit
Best answer: C
Explanation: The failure is a structured-output enforcement issue, not a retrieval or capacity issue. The trace shows a JSON schema tool is configured but no tools are enabled and no tool call occurs, so the model is only following a prompt convention rather than a schema-bound contract.
For entity, topic, summary, and structured JSON extraction, a Foundry Tool with a JSON schema can make the model emit arguments that conform to the expected fields and types. The evidence shows prompt-only extraction: no enabled tools and zero tool calls. Prompt instructions such as “return valid JSON” are useful, but they do not reliably enforce arrays, objects, or required fields across all inputs. Enabling the schema-backed tool and requiring the extraction step to call it provides a stronger contract for downstream validation.
The key troubleshooting signal is the mismatch between a configured schema tool and the trace showing it was never available to the run.
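The downstream validation that catches this failure can be sketched as a tiny schema check. This is illustrative only; the field names mirror the scenario, and a real solution would validate against the emit_case_json schema itself.

```python
def validate_case_record(record: dict) -> list:
    """Return a list of schema violations for the case-record shape;
    an empty list means the record is valid."""
    errors = []
    if not isinstance(record.get("topics"), list):
        errors.append("$.topics expected array")
    if not isinstance(record.get("entities"), list):
        errors.append("$.entities expected array")
    if not isinstance(record.get("summary"), str):
        errors.append("$.summary expected string")
    return errors
```

The point of requiring the schema-bound tool call is that records like the comma-separated `topics` string never reach this validator in the first place.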
Topic: Implement Generative AI and Agentic Solutions
You are building a Microsoft Foundry HR policy assistant that uses RAG with an Azure AI Search index of approved policy documents. Testers report that the assistant sometimes gives confident answers that are not supported by the indexed documents. You need to reduce fabrication risk and make policy claims traceable to approved sources. Which TWO actions should you implement?
Options:
A. Require response citations that reference retrieved source metadata.
B. Fine-tune the model on historical HR chat transcripts.
C. Run Azure AI Content Safety moderation on every response.
D. Increase the generation temperature for more varied wording.
E. Use conversation memory as the primary source for policy details.
F. Pass retrieved document chunks into the model prompt as grounding context.
Correct answers: A and F
Explanation: RAG reduces fabrication by grounding generation in retrieved content from authoritative sources. Passing retrieved chunks into the prompt and requiring citations from retrieved metadata help ensure answers are source-backed and traceable.
For a RAG-backed Foundry app, the model should not answer policy questions only from its pretrained knowledge or conversation history. The app should retrieve relevant chunks from Azure AI Search, provide those chunks as grounding context, and instruct the model to answer only when the retrieved context supports the claim. Source metadata, such as document name, URL, page, or chunk ID, should be preserved so the response can cite the retrieved evidence. This does not guarantee correctness by itself, but it directly reduces unsupported fabrication and gives users provenance for validation. Moderation, fine-tuning, and memory can be useful in other parts of an AI solution, but they do not replace grounded retrieval and citations for authoritative policy answers.
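Actions A and F can be combined in one prompt-assembly step. A minimal sketch, assuming a hypothetical chunk shape with `text` and `source` keys; real index fields and citation formats will differ.

```python
def build_grounded_prompt(question: str, retrieved: list) -> str:
    """Inline retrieved chunks as grounding context, tagged with their
    source metadata so the model can cite them."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer only from the context below and cite sources in brackets. "
        "If the context does not support an answer, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

Preserving the source tag through generation is what makes the final answer traceable back to an approved policy document.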
Topic: Implement Information Extraction Solutions
You are designing a Microsoft Foundry project for compliance review. Azure Content Understanding extracts contract fields and markdown sections into an Azure AI Search index. The solution has these requirements: the batch validation workflow must always retrieve the current policy section before calling a model and record source IDs; a reviewer-facing agent must decide during multi-turn chats when policy search is needed before using other tools.
Which TWO design choices should you implement?
Options:
A. Embed the Azure AI Search query in the batch workflow before the model call.
B. Expose Azure AI Search as a retrieval tool for the reviewer agent.
C. Let the agent read raw PDFs directly instead of using the index.
D. Run a fixed retrieval query before every reviewer-agent turn.
E. Expose batch validation retrieval only as an optional agent tool.
F. Fine-tune the model on extracted contracts instead of using retrieval.
Correct answers: A and B
Explanation: Use an agent retrieval tool when the agent needs discretion about whether and when to search during a conversation. Embed retrieval in the application workflow when retrieval is mandatory, deterministic, and must produce auditable source IDs before model invocation.
The placement of retrieval depends on control. For the batch validation workflow, retrieval is a required processing step, so the application should call Azure AI Search directly, capture the selected chunks and source IDs, and pass that grounded context to the model. For the reviewer-facing experience, the agent is handling open-ended multi-turn questions and may need to search only when relevant before using other tools. That retrieval should be exposed as an agent tool with an appropriate description and access controls. The key distinction is deterministic orchestration by the app versus discretionary tool use by the agent.
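The batch side of this split can be sketched as app-controlled orchestration. The function names stand in for Azure AI Search and the model call; the record shape is an assumption for the example.

```python
def batch_validate(contract_id: str, search_fn, model_fn) -> dict:
    """Mandatory retrieval before the model call, with source IDs recorded.
    search_fn and model_fn are stand-ins for the search and model clients."""
    chunks = search_fn(contract_id)            # always runs; never model-optional
    source_ids = [c["id"] for c in chunks]     # auditable provenance
    verdict = model_fn(contract_id, chunks)    # grounded model invocation
    return {"contract": contract_id, "verdict": verdict, "sources": source_ids}
```

The reviewer agent, by contrast, would see the same search capability only as a described tool it may choose to call mid-conversation.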
Topic: Plan and Manage an Azure AI Solution
You manage a Foundry project for an HR knowledge assistant grounded by Azure AI Search. Evaluation shows the assistant often cites retired benefits policies even though the current PDFs are also indexed. The retrieval trace shows the old and current chunks have similar embeddings, and no version or effective-date fields are used during retrieval.
What should you change first to improve grounding quality?
Options:
A. Lower the model temperature for all assistant responses.
B. Add conversation memory for recent HR topics.
C. Fine-tune the model on the latest policy PDFs.
D. Add version metadata and apply freshness-aware retrieval filtering.
Best answer: D
Explanation: This is a grounding-quality problem caused by retrieval design, not by generation style or memory. Because old and current chunks are both indexed and semantically similar, the retrieval layer needs metadata and filtering or scoring that favors the currently effective policy versions.
For RAG solutions, low grounding quality is often fixed at the retrieval and indexing layer when the wrong sources are being selected before the model generates an answer. In this case, Azure AI Search is returning retired and current policy chunks as near matches, and the system has no indexed fields to distinguish active content from archived content. Adding version, status, and effective-date metadata during ingestion, then using filters or scoring profiles at query time, gives the assistant a reliable way to ground answers in current policy unless the user explicitly requests archives.
Changing model behavior can make responses sound different, but it cannot reliably correct retrieval results that contain the wrong source set.
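The retrieval-time filter described above is simple once the metadata exists. A sketch with illustrative field names (`status`, `effective_from`); in Azure AI Search this would be an OData filter or scoring profile rather than Python post-filtering.

```python
from datetime import date

def filter_current(chunks: list, as_of: date = date(2025, 1, 1)) -> list:
    """Keep only chunks that are active and already effective as of a date."""
    return [c for c in chunks
            if c["status"] == "active" and c["effective_from"] <= as_of]
```

Without the metadata, retired and current chunks remain indistinguishable to retrieval no matter how the generation settings are tuned.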
Topic: Implement Generative AI and Agentic Solutions
You are building an expense-policy agent in a Microsoft Foundry project. The current design sends every user turn to a high-capability LLM, driving up both cost and p95 latency. Requirements: simple FAQ answers must be fast, reimbursement approvals must follow deterministic policy rules, and complex exceptions must still use the high-capability LLM. Which TWO actions should you take? (Select TWO.)
Options:
A. Route by task type to small or high-capability models.
B. Run all models sequentially and vote on every response.
C. Use Azure AI Search semantic ranking as the orchestrator.
D. Execute approval decisions in a rules engine tool.
E. Use the high-capability LLM for every request.
F. Put all approval policy rules only in the prompt.
Correct answers: A and D
Explanation: The best design uses orchestration to match each task to the lowest-cost component that still meets quality and governance needs. Simple requests can use smaller models, complex exceptions can use the high-capability LLM, and deterministic approvals should be handled by rules rather than prompts alone.
The core concept is task-fit orchestration across models and non-LLM components. In a Foundry agent or flow, a router can classify the request and send simple FAQ tasks to a smaller, faster model while escalating ambiguous or complex exceptions to the high-capability LLM. For reimbursement approval, the agent should call a deterministic rules engine tool so the decision follows policy consistently and can be audited. The LLM can still summarize or explain the decision, but it should not be the source of truth for approval logic. This design optimizes cost, latency, and quality without weakening policy enforcement.
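The router-plus-rules split can be sketched in a few lines. Component names and the approval limit are assumptions; a production router would likely be a classifier or intent model rather than a lookup table.

```python
def route(task_type: str) -> str:
    """Map each task type to the cheapest adequate component."""
    return {"faq": "small_model",
            "approval": "rules_engine",
            "exception": "large_model"}.get(task_type, "large_model")

def approve(amount: float, limit: float = 200.0) -> str:
    """Deterministic approval rule; policy lives in code, not in the prompt."""
    return "approved" if amount <= limit else "escalate"
```

The LLM can still phrase the explanation of an `escalate` outcome, but the decision itself is reproducible and auditable.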
Topic: Implement Generative AI and Agentic Solutions
Contoso is building a claims-support agent in a Microsoft Foundry project. The agent uses a deployed chat model, managed identity to query a private Azure AI Search index over policy PDFs, and tool calls for claim status. A release pipeline must block promotion when the agent gives incorrect benefit guidance, unsupported citations, or unsafe financial/legal recommendations. Operations already tracks token usage and p95 latency separately. Which evaluation architecture is the best fit?
Options:
A. Select the model deployment with the lowest cost per completed answer.
B. Gate CI/CD on average token count, endpoint cost, and p95 latency.
C. Gate CI/CD on Foundry quality, groundedness, citation, and safety evaluations.
D. Approve releases after manual spot checks of production chat transcripts.
Best answer: C
Explanation: The scenario asks whether the agent is correct, grounded in retrieved policy content, and safe. The evaluation plan should use Foundry app/output evaluations with curated test prompts, expected answers or rubrics, source evidence, and safety cases, then gate CI/CD on those quality signals.
For generative AI and agent evaluations, the metric set must match the release risk. Here, the risk is not performance or spend; it is whether the claims agent gives correct guidance, cites retrieved policy evidence, and avoids unsafe recommendations. A best-fit Azure-native design uses Foundry evaluations and traces over a representative test set, including expected outcomes, retrieved context, citation checks, and safety/adversarial prompts. Those results can be used as a release gate in CI/CD while operational telemetry continues to track latency and token usage separately. Cost and latency are valid operational metrics, but they do not prove correctness, groundedness, or safety.
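The CI/CD gate itself reduces to a threshold check over evaluation scores. A sketch with illustrative metric names and thresholds; the actual evaluator outputs and pass bars would come from the Foundry evaluation run.

```python
def release_gate(scores: dict, thresholds: dict = None):
    """Return (passed, failing_metrics) for a promotion decision.
    scores and thresholds map metric name -> value in [0, 1]."""
    thresholds = thresholds or {"groundedness": 0.9, "citation": 0.95, "safety": 1.0}
    failing = [m for m, t in thresholds.items() if scores.get(m, 0.0) < t]
    return (len(failing) == 0, failing)
```

Operational metrics such as p95 latency stay in a separate dashboard; they inform capacity planning, not this gate.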
Topic: Plan and Manage an Azure AI Solution
A bank is piloting a Microsoft Foundry agent for loan-servicing staff. The agent can retrieve policy documents and invoke a payment-deferral tool. Governance requires the tool to run only for authorized staff, and deferrals over 30 days must pause for human approval before execution. The compliance lead asks for evidence that the deployed agent is enforcing these oversight and tool-access controls during real conversations. What should you monitor?
Options:
A. Retrieval relevance scores for policy documents
B. Average response latency and token usage by conversation
C. User satisfaction ratings after each conversation
D. Agent traces with tool-call, authorization, and approval events
Best answer: D
Explanation: The stated goal is to validate governance enforcement, not general quality or performance. Agent trace logging is the best evidence because it records runtime tool attempts, authorization outcomes, approval handoffs, and execution results for each conversation.
For agent governance, the key observability signal is a per-run trace or audit trail that captures the agent’s decisions around tools and oversight. In this scenario, compliance needs to know whether unauthorized users were blocked and whether long deferrals paused for human approval before the payment-deferral tool executed. Aggregate metrics such as latency, token consumption, relevance, or satisfaction can be useful, but they do not prove that the agent enforced role-based tool access or human-in-the-loop approval requirements. The key takeaway is to monitor the control points the policy governs: tool invocation, authorization, approval, and execution.
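A compliance check over those trace events can be sketched as follows. The event fields are hypothetical; a real audit would read them from the agent's trace records.

```python
def audit_tool_call(event: dict) -> list:
    """Flag policy violations in one payment-deferral tool-call trace event.
    Expected keys: executed, authorized, deferral_days, human_approved."""
    violations = []
    if event["executed"] and not event["authorized"]:
        violations.append("unauthorized_execution")
    if event["executed"] and event["deferral_days"] > 30 and not event["human_approved"]:
        violations.append("missing_human_approval")
    return violations
```

Running this over every conversation's tool events produces exactly the evidence the compliance lead asked for: which runs enforced the controls and which did not.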
Topic: Implement Computer Vision Solutions
You are building a Foundry agent that answers questions about store audit photos. The current retrieval index contains only file names and manual titles, so answers are not grounded when users ask about visual details such as damaged packaging, warning labels, object location, or dominant colors. You need the agent to retrieve fresh photo evidence and cite the source image. What should you implement?
Options:
A. Enable OCR only and index the recognized text from each photo.
B. Fine-tune the model on previous store-audit answer transcripts.
C. Add upload-date metadata filters to the existing title-only index.
D. Configure a Content Understanding image analyzer and index its extracted visual fields.
Best answer: D
Explanation: The grounding gap is that the retrieval source lacks visual characteristics. Configuring Azure Content Understanding in Foundry Tools to extract objects, regions, labels, colors, and related visual fields creates searchable evidence that the agent can retrieve and cite.
For visual-understanding grounding, the ingestion pipeline must convert each image into reliable, structured evidence before retrieval. Azure Content Understanding in Foundry Tools is designed to analyze visual content and produce extracted characteristics that can be stored with the source image ID, timestamp, and citation metadata. Those extracted fields can then be indexed for retrieval so the agent grounds answers in current photo evidence instead of filenames or manual titles. OCR can help when text is visible, but it does not cover non-text visual attributes such as damage, object placement, or color.
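The ingestion step amounts to flattening analyzer output into a searchable record that keeps the citation. Field names here are illustrative, not the Content Understanding response schema.

```python
def to_index_doc(image_id: str, analysis: dict, captured_at: str) -> dict:
    """Turn hypothetical image-analyzer output into an indexable record
    that preserves a citation back to the source image."""
    return {
        "id": image_id,
        "captured_at": captured_at,
        "objects": analysis.get("objects", []),
        "labels": analysis.get("labels", []),
        "dominant_colors": analysis.get("dominant_colors", []),
        "citation": f"image://{image_id}",  # assumed citation scheme
    }
```

Once these fields are indexed, a question about damaged packaging matches the extracted labels rather than an uninformative filename.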
Topic: Plan and Manage an Azure AI Solution
A company is designing Microsoft Foundry infrastructure for an internal policy assistant. Approved policy files are published throughout the day to a private Azure Storage account, and many files are scanned PDFs. The agent must ground answers in the latest approved content, return citations, and restrict retrieval by business unit. Which design should you use?
Options:
A. Cache policy summaries in conversation memory and filter them by user profile.
B. Load the PDFs into the agent prompt and rely on system instructions for citations.
C. Fine-tune the model on the policy files and redeploy it after each publishing cycle.
D. Connect Azure AI Search with OCR enrichment, hybrid/vector indexing, citations, and filterable business-unit metadata.
Best answer: D
Explanation: The requirement is a RAG infrastructure decision, not a model-training decision. Azure AI Search is the appropriate grounding layer because it can index enriched content from documents, support hybrid/vector retrieval, preserve citation metadata, and filter results by business unit.
For a Foundry-based grounded assistant, the infrastructure should separate approved source storage from the searchable grounding index. Azure AI Search can ingest approved files, use enrichment such as OCR for scanned PDFs, create searchable chunks and embeddings, and store provenance fields for citations. Filterable metadata such as business unit enables retrieval-time constraints before content is supplied to the agent. This design keeps answers tied to current indexed sources instead of relying on static model knowledge or fragile prompt-only controls.
Topic: Implement Information Extraction Solutions
You are building a Foundry agent that reviews uploaded supplier invoices and routes exceptions to an approval workflow. The uploads are scanned PDFs and phone images. The agent must reason over normalized invoice fields, table layout, and a markdown representation with provenance for each flagged discrepancy.
Which workflow should you implement?
Options:
A. Attach the raw files to conversation memory.
B. Let the agent retrieve raw PDF chunks only.
C. Prompt the agent with base64 file contents.
D. Run a Content Understanding analyzer before the agent.
Best answer: D
Explanation: The scenario requires clean extracted fields, layout, markdown, and provenance, not just access to raw documents. The agent should receive a structured representation produced by an extraction step, such as a Content Understanding analyzer, before making routing decisions.
For document extraction workflows, use OCR, layout, field extraction, and Content Understanding analyzers to transform scanned or image-based documents into grounded representations the agent can safely consume. In this scenario, the agent’s role is to review extracted invoice facts and route exceptions, while the analyzer’s role is to produce normalized fields, table structure, markdown output, and provenance. Passing raw PDFs, images, or encoded file contents directly to the agent makes extraction inconsistent and weakens traceability. Retrieval can help find content, but it does not replace a document-understanding step when structured fields and layout are required.
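The analyzer-before-agent ordering can be sketched as a two-step pipeline. The function arguments stand in for Content Understanding and the agent; the required keys are an assumed contract for the example.

```python
def analyze_then_route(raw_bytes: bytes, analyzer_fn, agent_fn):
    """Run the document analyzer first, then hand its structured output
    to the agent. The agent never sees raw file bytes."""
    doc = analyzer_fn(raw_bytes)  # fields, tables, markdown, provenance
    missing = [k for k in ("fields", "provenance") if k not in doc]
    if missing:
        raise ValueError(f"analyzer output missing keys: {missing}")
    return agent_fn(doc)
```

Keeping extraction out of the agent's hands makes every flagged discrepancy traceable to an analyzer-produced field with provenance.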
Use the AI-103 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Try AI-103 on Web View AI-103 Practice Test
Read the AI-103 Cheat Sheet on Tech Exam Lexicon for concept review before another timed run.