
AI-103: Plan and Manage an Azure AI Solution

Try 10 focused AI-103 questions on Plan and Manage an Azure AI Solution, with explanations, then continue with IT Mastery.


Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Try AI-103 on Web · View full AI-103 practice page

Topic snapshot

Field            | Detail
Exam route       | AI-103
Topic area       | Plan and Manage an Azure AI Solution
Blueprint weight | 27%
Page purpose     | Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Plan and Manage an Azure AI Solution for AI-103. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass          | What to do                                                 | What to record
First attempt | Answer without checking the explanation first.             | The fact, rule, calculation, or judgment point that controlled your answer.
Review        | Read the explanation even when you were correct.           | Why the best answer is stronger than the closest distractor.
Repair        | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter.
Transfer      | Return to mixed practice once the topic feels stable.      | Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 27% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.

Question 1

Topic: Plan and Manage an Azure AI Solution

Your team is building a procurement assistant in a Microsoft Foundry project. Users ask in natural language to buy equipment. The assistant must retrieve current purchasing policy from Azure AI Search, decide whether to call private inventory and vendor APIs by using managed identity, keep conversation context while asking follow-up questions, and require manager approval for orders over 5,000 USD. Tool-call traces must be available. Which design is the best fit?

Options:

  • A. Call a single Foundry Tool for the inventory API.

  • B. Use a fixed workflow for every retrieve-check-create sequence.

  • C. Build a Foundry agent with Search grounding, managed-identity tools, approval gating, and tracing.

  • D. Deploy a chat model and prompt it with policy excerpts.

Best answer: C

Explanation: This scenario needs an agent-based solution, not just model inference, a standalone tool, or a fixed workflow. The assistant must reason over a user goal, maintain state, select tools dynamically, retrieve policy grounding, and enforce approval before taking action.

Use a Foundry agent when the solution must handle open-ended user goals, maintain conversational state, choose among tools, and coordinate actions under policy controls. A deployed model is appropriate for a single inference task, a Foundry Tool provides a specific capability, and a workflow is best when the sequence is predictable. Here, the procurement assistant must combine retrieval, private API tool calls, follow-up questions, managed-identity access, approval gating, and trace logging. Those requirements match an agent with scoped tools and observability rather than a simple prompt or a fixed process.

The key distinction is autonomy within controlled boundaries: the agent can decide what to do next, while the architecture still constrains tools, identity, approvals, and monitoring.

  • Direct model call fails because it does not provide controlled private API tool use, approval gating, or reliable action tracing.
  • Single tool fails because inventory access alone does not handle retrieval, conversation context, vendor decisions, or approvals.
  • Fixed workflow fails because the assistant must make dynamic decisions and ask follow-up questions, not always run the same sequence.
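
To make the approval-gating requirement concrete, here is a minimal Python sketch of a regulated tool that refuses to execute above a threshold until a manager decision is recorded, and that writes a tool-call trace. The function names, threshold constant, and trace shape are illustrative assumptions, not a specific Foundry API.

    # Minimal sketch: approval gating plus a tool-call trace for a regulated
    # agent action. All names here are hypothetical.
    APPROVAL_THRESHOLD_USD = 5_000
    tool_trace: list[dict] = []  # stands in for the agent's trace sink

    def create_order(item: str, amount_usd: float, approval_id: str | None = None) -> dict:
        """Agent-callable tool; orders over the threshold need manager approval."""
        if amount_usd > APPROVAL_THRESHOLD_USD and approval_id is None:
            result = {"status": "pending_approval", "item": item, "amount_usd": amount_usd}
        else:
            result = {"status": "created", "item": item, "amount_usd": amount_usd,
                      "approval_id": approval_id}
        # Record the call so auditors can reconstruct what the agent did.
        tool_trace.append({"tool": "create_order",
                           "args": {"item": item, "amount_usd": amount_usd},
                           "result": result})
        return result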

Question 2

Topic: Plan and Manage an Azure AI Solution

A company has separate dev, test, and prod Microsoft Foundry projects. The prod agent must use only the prod model endpoint, prod agent tools, prod Azure AI Search index, and prod storage account. You need an observability design that validates this isolation and alerts on cross-environment access. What should you implement?

Options:

  • A. Trace runs with resource IDs and alert on non-prod targets

  • B. Track average token usage by model deployment

  • C. Monitor storage capacity and search index document counts

  • D. Run groundedness evaluations on sampled prod answers

Best answer: A

Explanation: The goal is to validate environment isolation, not just model quality or service health. Trace logging should correlate each prod agent run with the actual model endpoint, tool calls, search index, storage account, and identity used, then alert when any resource falls outside the prod allowlist.

For isolated Foundry deployments, observability should prove which resources an agent actually used during each run. The strongest design is trace logging with environment labels, managed identity, and Azure resource identifiers for model calls, tool invocations, retrieval calls, and storage access. Alerts can compare those identifiers to a prod allowlist to detect accidental cross-environment routing. Quality evaluators and usage metrics are useful, but they do not validate that the deployment boundary is being enforced.

  • Token analytics helps with cost and performance trends, but it does not show whether the agent called a dev tool or index.
  • Groundedness evaluation measures answer quality against retrieved evidence, not deployment isolation.
  • Capacity monitoring can show resource usage, but it will not reliably identify cross-environment agent access.
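
The allowlist comparison at the heart of option A fits in a few lines. This sketch assumes each trace event carries an identifier for the model endpoint, tool, index, or storage account it touched; the event shape and resource IDs below are placeholders, not real Azure identifiers.

    # Sketch: flag prod agent runs that touched any resource outside the
    # prod allowlist. Resource IDs are illustrative placeholders.
    PROD_ALLOWLIST = {
        "model:openai-prod/gpt-chat",
        "search:search-prod/policies-index",
        "storage:stprodagent",
    }

    def cross_env_events(trace_events: list[dict]) -> list[dict]:
        """Return trace events whose target resource is not approved for prod."""
        return [e for e in trace_events if e["resource_id"] not in PROD_ALLOWLIST]

    # Any non-empty result feeds an alert rather than a dashboard tile.
    violations = cross_env_events([
        {"run_id": "r1", "resource_id": "search:search-dev/policies-index"},
    ])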

Question 3

Topic: Plan and Manage an Azure AI Solution

A team is reducing the cost of a Microsoft Foundry RAG-based compliance assistant. The proposed changes use a smaller model and reduce retrieval depth. The assistant must still provide grounded answers with citations, apply safety filters, and retain evidence for audit review. Which observability approach should the team use before approving the change?

Options:

  • A. Disable detailed trace logging and sample a few chat transcripts manually.

  • B. Compare baseline and optimized runs with Foundry evaluations for groundedness, relevance, safety, citations, traces, and token cost.

  • C. Monitor only Azure AI Search query volume and index storage size.

  • D. Approve the change if average token usage and latency decrease in production.

Best answer: B

Explanation: Cost optimization should be evaluated against the scenario’s non-negotiable quality and governance requirements. A baseline-versus-optimized Foundry evaluation with groundedness, relevance, safety, citation, trace, and token-cost signals shows whether the savings weaken required controls.

The core concept is cost-aware evaluation without sacrificing grounding, safety, or auditability. For a RAG assistant, reducing model size or retrieval depth can lower cost but may reduce answer relevance, citation coverage, or groundedness. The team should compare the current and optimized versions using representative test conversations and evaluators, while preserving traces and provenance metadata for review. Token and latency metrics are useful, but they are not enough unless paired with quality and safety measurements.

The key takeaway is that cost metrics should be interpreted alongside grounding, safety, and governance evidence, not used as a replacement for them.

  • Cost-only monitoring fails because lower token usage and latency do not prove the assistant remains grounded or safe.
  • Manual sampling is too weak, and disabling detailed traces removes the audit evidence the scenario requires.
  • Search resource metrics show infrastructure footprint but do not validate answer quality, citations, or safety behavior.
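
A baseline-versus-optimized comparison can be reduced to a release gate like the sketch below. The evaluator names and floor values are assumptions standing in for whatever metrics the team has agreed on; token cost is the only number allowed to fall.

    # Sketch: approve the cheaper configuration only if no quality or
    # governance metric drops below its agreed floor.
    FLOORS = {"groundedness": 0.85, "relevance": 0.80,
              "safety_pass_rate": 0.99, "citation_coverage": 0.95}

    def regressions(optimized: dict[str, float]) -> list[str]:
        """List metrics where the optimized run falls below its floor."""
        return [m for m, floor in FLOORS.items() if optimized[m] < floor]

    optimized_run = {"groundedness": 0.82, "relevance": 0.84,
                     "safety_pass_rate": 0.99, "citation_coverage": 0.91}
    blocked = regressions(optimized_run)  # ['groundedness', 'citation_coverage']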

Question 4

Topic: Plan and Manage an Azure AI Solution

A team deploys the same Python agent app to dev and prod by using CI/CD. Each environment has a separate Microsoft Foundry project and a separately approved chat model deployment. The app runs in Azure Container Apps with a managed identity; stored API keys and hard-coded project IDs are prohibited. The pipeline can provide environment variables at release time. Which implementation connects the app to the correct project and deployment?

Options:

  • A. Read the Azure AI Search endpoint; use it as the project context.

  • B. Read the subscription details; discover and use the newest deployment at startup.

  • C. Read the project endpoint and deployment name; authenticate with DefaultAzureCredential.

  • D. Read the account endpoint and model catalog ID; authenticate with an API key.

Best answer: C

Explanation: The app must be configured with the target Foundry project context and the model deployment context. Supplying the project endpoint and deployment name through CI/CD variables keeps the container environment-neutral, while DefaultAzureCredential uses the managed identity instead of stored keys.

For a Foundry-based app, the project endpoint identifies the project that contains the app’s connections, tools, agents, and model deployments. The model deployment name identifies the specific deployed model to invoke within that project. In CI/CD, these values should be injected per environment, so the same code can target dev or prod safely. DefaultAzureCredential allows the container app’s managed identity to authenticate keylessly, assuming it has the required role assignments. A base model or catalog ID is not the same as a deployed model in a project, and retrieval service endpoints do not establish Foundry project context.

  • Catalog model ID fails because it does not select an approved deployment in the target Foundry project.
  • Newest deployment discovery fails because it bypasses explicit CI/CD approval and can change runtime behavior unexpectedly.
  • Search endpoint context fails because Azure AI Search supports retrieval, not Foundry project authentication or deployment selection.
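
In code, option C stays small. The sketch below assumes the pipeline injects two hypothetical environment variables per environment and that the app uses the stable azure-identity package; the Foundry project client constructor is noted in a comment because its surface varies by preview SDK version.

    import os
    from azure.identity import DefaultAzureCredential

    # Injected at release time by CI/CD; variable names are hypothetical.
    # The same container image runs in dev and prod with no baked-in IDs.
    PROJECT_ENDPOINT = os.environ["FOUNDRY_PROJECT_ENDPOINT"]
    DEPLOYMENT_NAME = os.environ["CHAT_DEPLOYMENT_NAME"]

    # Resolves to the Container Apps managed identity in Azure (and to a
    # developer login locally), so no API keys are stored anywhere.
    credential = DefaultAzureCredential()

    # PROJECT_ENDPOINT and credential are then passed to the Foundry project
    # client, e.g. AIProjectClient(endpoint=PROJECT_ENDPOINT,
    # credential=credential) in the azure-ai-projects preview SDK, and
    # DEPLOYMENT_NAME selects the approved model deployment on each call.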

Question 5

Topic: Plan and Manage an Azure AI Solution

A team is preparing a RAG-backed agent in Microsoft Foundry. Developers need to change prompts and tools freely, evaluation runs must use test data and evaluator traces, staging must mirror production, and production monitoring must report only live user traffic. Which environment separation pattern best supports release validation and observability?

Options:

  • A. Run all evaluations in production as shadow traffic.

  • B. Share one telemetry workspace across all environments to simplify dashboards.

  • C. Use one Foundry project and label traces with environment names.

  • D. Use separate Foundry projects per environment with separate telemetry and gated artifact promotion.

Best answer: D

Explanation: Environment separation should prevent test activity from contaminating production observability. Separate Foundry projects or equivalent isolated environments, with separate telemetry and release gates, let teams evaluate safely and promote validated artifacts into staging and production.

The core pattern is isolation plus controlled promotion. Development, evaluation, staging, and production should have separate Foundry project resources or clearly isolated environments, including separate model deployments, connections, data sources where needed, and observability sinks. Evaluation traces, synthetic data, and failed experiment runs remain outside production monitoring. CI/CD can then promote versioned prompts, tools, agent definitions, and app configuration through evaluation gates before production release. Staging should mirror production enough to validate quality, latency, grounding, and safety without mixing its telemetry with live-user metrics. A shared project or shared telemetry store can be convenient, but it weakens separation and makes release quality and production health harder to trust.

  • Trace labels alone can reduce filtering effort, but they do not provide strong operational or access isolation.
  • Production shadow evaluation risks mixing test behavior with live monitoring and is not a substitute for staging validation.
  • Shared telemetry simplifies dashboards but can contaminate production metrics with development and evaluation traces.
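
One way to picture the isolation-plus-promotion pattern is a per-environment configuration object with a guard in the release pipeline. Everything in this sketch is hypothetical; real promotion gates would live in the CI/CD system rather than application code.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class EnvConfig:
        name: str
        project_endpoint: str      # separate Foundry project per environment
        telemetry_connection: str  # separate observability sink per environment

    ENVS = {
        "dev":  EnvConfig("dev",  "https://dev-project.example",  "appinsights-dev"),
        "prod": EnvConfig("prod", "https://prod-project.example", "appinsights-prod"),
    }

    def guard_eval_target(cfg: EnvConfig) -> None:
        """Promotion gate: evaluation jobs must never write into prod telemetry."""
        if cfg.name == "prod":
            raise RuntimeError("Evaluations run in dev/staging projects, not prod.")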

Question 6

Topic: Plan and Manage an Azure AI Solution

A Microsoft Foundry project contains a customer-support agent that retrieves policy documents and can invoke a createRefund tool. Refund creation is a regulated action, and auditors must later determine which sources, tool calls, and approvals led to each refund. Which implementation should you choose?

Options:

  • A. Store conversation memory only and ask the agent to summarize its actions.

  • B. Enable trace logging, store retrieval provenance, and require approval before createRefund.

  • C. Log final answers only and let the agent call createRefund when confidence is high.

  • D. Show citations to users but exclude tool arguments and results from logs.

Best answer: B

Explanation: Auditing an agent workflow requires evidence beyond the final response. Trace logs capture conversation and tool execution, provenance metadata ties retrieved content to specific sources, and an approval workflow controls regulated actions before the write tool runs.

For a regulated agent action, the audit trail should connect the user request, retrieved grounding data, tool invocation, and approval decision. In Foundry-based agent workflows, trace logging is used to record observable execution steps such as prompts, retrieval events, function calls, inputs, outputs, and errors. Provenance metadata should identify the source documents or chunks used to ground the decision. A human approval step should gate the createRefund tool so the agent cannot complete the regulated action without an auditable decision. The key takeaway is to log the workflow path and preserve source and approval evidence, not just the model’s final text.

  • Final-only logging misses the retrieval and tool-call evidence auditors need and allows the regulated action without approval.
  • User-facing citations alone can help explain an answer, but omitting tool arguments, results, and approval decisions leaves the action unauditable.
  • Memory summaries are not a reliable audit record because they are model-generated and do not preserve authoritative trace events or provenance.
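
A compact sketch of the auditable shape option B describes: the regulated tool emits a single event that ties arguments, provenance, and the approval decision together, and it blocks without an approval. The event fields and helper names are illustrative assumptions.

    import time
    import uuid

    audit_log: list[dict] = []  # stands in for the trace/audit sink

    def create_refund(order_id: str, amount: float,
                      sources: list[str], approval_id: str | None) -> dict:
        """Regulated write tool: blocked until an approval ID is supplied."""
        event = {
            "trace_id": str(uuid.uuid4()),
            "timestamp": time.time(),
            "tool": "createRefund",
            "args": {"order_id": order_id, "amount": amount},
            "sources": sources,          # provenance: documents that grounded the decision
            "approval_id": approval_id,  # the auditable human decision, if any
        }
        event["result"] = ("blocked_pending_approval" if approval_id is None
                           else "refund_created")
        audit_log.append(event)
        return event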

Question 7

Topic: Plan and Manage an Azure AI Solution

An enterprise HR assistant in a Microsoft Foundry project uses a RAG pattern with Azure AI Search and an employee-profile tool. Monitoring shows these production signals:

Exhibit: Recent evaluation signals

Groundedness evaluator: 41% fail
Citation pattern: answers cite 2023 policy pages for 2025 questions
Retrieval trace: top results contain no /HR/2025/ documents
Indexer config note: source path filter excludes /HR/2025/

Safety events: sensitive-info warnings in assistant answers
Tool trace: employeeProfile returns SSN and medicalCode fields
User intent: How many PTO days do I have?

You need to remediate the observed issues while preserving the current model deployment. Which two actions should you recommend?

Options:

  • A. Move the assistant to a larger general-purpose LLM.

  • B. Disable sensitive-info checks for authenticated HR users.

  • C. Limit employee-profile tool output to PTO fields only.

  • D. Increase topK against the existing search index.

  • E. Fix the search source filter for HR/2025 documents.

  • F. Add a prompt requiring citations for every HR answer.

Correct answers: C and E

Explanation: The signals identify two upstream causes: retrieval is missing current HR content, and the tool is exposing sensitive fields. Remediation should target the Azure AI Search ingestion/filter path and the tool schema or permissions, not model replacement.

Use monitoring signals to route remediation to the component that caused the failure. Low groundedness plus retrieval traces showing no /HR/2025/ results indicates a retrieval or ingestion problem, so the search source filter must include the current documents. Sensitive-info safety events plus a tool trace showing SSN and medical fields indicates a tool-output problem, so the tool should return only fields needed for the user’s PTO request. Prompt changes can help with behavior, but they do not add missing documents or prevent sensitive tool data from reaching the model.

  • Citation-only prompt fails because it cannot retrieve documents that the indexer excluded.
  • Increasing topK fails because searching more results from the same incomplete index still omits /HR/2025/ content.
  • Disabling checks weakens safety monitoring instead of reducing sensitive data exposure.
  • Larger model misses the stated root causes and violates the requirement to preserve the deployment.
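
The tool-side fix in option C is a simple projection: only the fields the PTO intent needs ever reach the model. The indexer fix in option E is configuration, not code, so it does not appear here. Field names in this sketch are assumed.

    # Sketch: scope the employeeProfile tool output to the PTO use case so
    # SSN and medicalCode never enter the prompt.
    ALLOWED_PTO_FIELDS = {"employee_id", "pto_days_remaining", "pto_days_used"}

    def profile_for_pto(raw_profile: dict) -> dict:
        """Drop every field the PTO question does not require."""
        return {k: v for k, v in raw_profile.items() if k in ALLOWED_PTO_FIELDS}

    safe = profile_for_pto({"employee_id": "e42", "pto_days_remaining": 11,
                            "ssn": "xxx-xx-xxxx", "medicalCode": "M12"})
    # safe == {'employee_id': 'e42', 'pto_days_remaining': 11}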

Question 8

Topic: Plan and Manage an Azure AI Solution

A company is designing a Microsoft Foundry support agent. The agent uses a warranty-check tool and a retrieval connection to product manuals. During a chat, it must remember the customer’s selected device, completed troubleshooting steps, and a pending approval request across turns. These facts must not be learned globally or stored as a permanent customer profile unless the user approves. Which memory integration should you choose?

Options:

  • A. Fine-tune the deployed model with recent support transcripts.

  • B. Write every turn to a persistent CRM customer profile.

  • C. Add the chat facts to the Azure AI Search index.

  • D. Use per-conversation agent thread memory with structured session state.

Best answer: D

Explanation: The scenario requires short-lived conversation context, not global knowledge or permanent personalization. Per-conversation agent thread memory, optionally with structured session state, keeps the selected device, steps, and approval status available across turns while respecting the storage constraint.

For an agent that must maintain context during an active interaction, use conversation memory scoped to the agent thread or session. This lets the agent reference prior turns and store small structured facts, such as the selected device and pending approval state, without updating model weights or publishing private conversation facts into a retrieval index. Retrieval should ground answers in product manuals, while thread/session memory should track what has happened in the current conversation. If durable personalization is later approved, store only the approved facts in an appropriate profile or business system. The key distinction is runtime conversational state versus reusable organizational knowledge.

  • Search index misuse fails because Azure AI Search is for retrievable knowledge, not transient per-chat state.
  • Model tuning misuse fails because fine-tuning does not maintain live conversation context and can create privacy risk.
  • Persistent profile misuse fails because the stem explicitly prohibits permanent customer-profile storage without approval.
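
The structured session state in option D can be as small as the sketch below: a per-thread record that the agent reads and updates across turns and discards when the conversation ends. The names are illustrative, not a specific Foundry memory API.

    from dataclasses import dataclass, field

    @dataclass
    class SessionState:
        device: str | None = None                  # customer's selected device
        completed_steps: list[str] = field(default_factory=list)
        pending_approval: bool = False             # open approval request

    sessions: dict[str, SessionState] = {}         # keyed by agent thread ID

    def state_for(thread_id: str) -> SessionState:
        """Fetch or create the transient state for one conversation."""
        return sessions.setdefault(thread_id, SessionState())

    s = state_for("thread-123")
    s.device = "Laptop X1"
    s.completed_steps.append("reset firmware")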

Question 9

Topic: Plan and Manage an Azure AI Solution

A team is moving a Microsoft Foundry agent from pilot to production. The agent retrieves regulated HR documents from Azure Blob Storage and Azure AI Search. The release gate requires proof that the runtime does not use public endpoints or stored access keys to reach those data sources. Which option best validates the deployment design?

Options:

  • A. Enable content safety filters; monitor blocked prompt categories.

  • B. Store connection strings in Key Vault; monitor secret expiration events.

  • C. Use managed identity and private endpoints; monitor audit and network logs.

  • D. Enable groundedness evaluation; monitor retrieval relevance scores.

Best answer: C

Explanation: This requirement is about identity and network isolation, not model quality. Managed identity is needed when the runtime must avoid stored secrets, and private networking is needed when access to Azure resources must not traverse public endpoints.

For production Foundry solutions that access regulated data, the deployment design must include managed identity when services should authenticate without embedded keys or connection strings. It must include private networking, such as private endpoints, when resource access must be restricted away from public endpoints. The release gate should validate both controls with observable evidence, such as identity/audit logs showing managed identity token-based access and network/resource logs showing private endpoint traffic. Quality evaluators for relevance, groundedness, or safety are useful for AI behavior, but they do not prove keyless authentication or private network routing.

  • Secret monitoring helps manage stored credentials but does not satisfy a no-stored-keys requirement.
  • Groundedness metrics validate answer quality and retrieval usefulness, not network path or authentication method.
  • Safety filter logs monitor harmful content controls, not private access to Blob Storage or Azure AI Search.
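
On the code side, the keyless half of option C looks like the sketch below, using the stable azure-identity, azure-storage-blob, and azure-search-documents packages; the endpoints and index name are placeholders. The private-networking half is infrastructure (private endpoints plus DNS), so it shows up in the network logs the gate inspects rather than in application code.

    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient
    from azure.search.documents import SearchClient

    # Managed identity at runtime, developer login locally; no stored keys.
    credential = DefaultAzureCredential()

    blob = BlobServiceClient(
        account_url="https://<storage-account>.blob.core.windows.net",  # placeholder
        credential=credential,
    )
    search = SearchClient(
        endpoint="https://<search-service>.search.windows.net",  # placeholder
        index_name="hr-documents",                                # hypothetical index
        credential=credential,
    )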

Question 10

Topic: Plan and Manage an Azure AI Solution

Your team is designing an agent workflow in a Foundry project for software-release approvals. The agent receives chat requests, policy questions, deployment-dashboard screenshots, and pull-request diffs. Governance requires least-capable model routing with audit traces; routine status updates need low latency; policy answers must be grounded in Azure AI Search; risky releases must call an approval tool. Which implementation should you choose?

Options:

  • A. Use one small language model with OCR, search, and approval tools.

  • B. Route by task to SLM, LLM, code, and multimodal deployments.

  • C. Use a code model as the orchestrator for all agent requests.

  • D. Use one multimodal LLM deployment for every turn and tool call.

Best answer: B

Explanation: The workflow has different task types with conflicting requirements, so a routed multi-model design is best. Use a small language model for low-latency routine status, an LLM with Azure AI Search for grounded policy reasoning, a code model for pull-request diffs, and a multimodal model for screenshots.

Model selection in a Foundry agent workflow should match each role to the smallest or most specialized deployment that can meet the task. A small language model is appropriate for predictable, low-latency status responses. A larger LLM is better for policy reasoning, especially when grounded through Azure AI Search. A code model is specialized for pull-request diffs, and a multimodal model is needed for screenshot understanding. The approval tool should be gated to risky releases and traced for governance. Using one large model for all work is simpler, but it weakens least-capable routing and may hurt latency and oversight.

  • Single multimodal model ignores the least-capable routing and low-latency constraints by sending every task to the broadest deployment.
  • Small model only is too limited for complex policy reasoning, code review, and visual dashboard interpretation.
  • Code model orchestrator overfits to pull-request analysis and is not the right primary model for policy, chat, or screenshots.
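
At runtime, least-capable routing reduces to a small dispatch table. The deployment names below are hypothetical stand-ins for the four approved deployments:

    # Sketch: route each task type to the smallest deployment that can do it.
    DEPLOYMENTS = {
        "status_update": "slm-chat-fast",    # SLM: low-latency routine status
        "policy_question": "llm-chat-rag",   # LLM grounded via Azure AI Search
        "pr_diff": "code-review-model",      # code-specialized deployment
        "screenshot": "multimodal-chat",     # vision-capable deployment
    }

    def route(task_type: str) -> str:
        """Pick the least-capable deployment that satisfies the task."""
        return DEPLOYMENTS[task_type]

    assert route("status_update") == "slm-chat-fast"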

Continue with full practice

Use the AI-103 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try AI-103 on Web · View AI-103 Practice Test

Free review resource

Read the AI-103 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.

Revised on Thursday, May 14, 2026