Try 50 free Microsoft AI-300 questions across the exam domains, with explanations, then continue with full IT Mastery practice.
This free full-length Microsoft AI-300 practice exam includes 50 original IT Mastery questions across the exam domains.
These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.
Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.
Need concept review first? Read the Microsoft AI-300 Cheat Sheet for compact concept review before returning to timed practice.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
Try Microsoft AI-300 on Web View full Microsoft AI-300 practice page
| Domain | Weight |
|---|---|
| Design and Implement an MLOps Infrastructure | 19% |
| Implement Machine Learning Model Lifecycle and Operations | 29% |
| Design and Implement a GenAIOps Infrastructure | 24% |
| Implement Generative AI Quality Assurance and Observability | 14% |
| Optimize Generative AI Systems and Model Performance | 14% |
Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.
Topic: Design and Implement a GenAIOps Infrastructure
A team is deploying a chat-based generative AI workload in Microsoft Foundry. Load testing shows a steady requirement of 8,000 tokens per minute during business hours, and the operations team must reserve predictable capacity rather than rely on best-effort shared throughput. Which implementation should you use?
Options:
A. Deploy the model as a standard serverless API endpoint
B. Deploy the model with provisioned throughput units sized for the target token rate
C. Create a larger Foundry project environment
D. Register the prompt variant in a Git repository
Best answer: B
Explanation: Provisioned throughput is a capacity-planning choice for high-volume foundation model workloads with explicit throughput requirements. In this scenario, the key constraint is not just deploying a model; it is reserving predictable capacity for a known token-per-minute target. A standard model deployment or serverless API endpoint can make the model available, but it does not by itself satisfy the requirement to reserve throughput. Prompt versioning and project environment sizing are useful GenAIOps practices, but they do not allocate dedicated model-serving capacity. The operational distinction is: ordinary deployment exposes the model, while provisioned throughput planning reserves capacity for expected load.
Topic: Design and Implement a GenAIOps Infrastructure
A team uses GitHub Actions to deploy Microsoft Foundry infrastructure with Bicep. The workflow creates a Foundry resource, a Foundry project with a managed identity, and a role assignment that lets the project identity access Azure AI Search. The deployment fails.
Exhibit:
Scope: rg-genai-prod
Workflow principal: sp-gh-foundry-deploy
Current role on scope: Contributor
Failed operation:
Microsoft.Authorization/roleAssignments/write
Error:
The client 'sp-gh-foundry-deploy' does not have authorization
to perform action 'Microsoft.Authorization/roleAssignments/write'.
Which configuration change should you make before rerunning the deployment?
Options:
A. Add a Bicep dependsOn from the role assignment to the project.
B. Grant the workflow principal Role Based Access Control Administrator on the scope.
C. Change the Foundry project to use a user-assigned managed identity.
D. Store the Azure AI Search admin key as a GitHub secret.
Best answer: B
Explanation: The failure is an authorization problem for the deployment principal, not a Foundry project identity or dependency problem. In Azure, creating resources with Contributor is different from assigning Azure RBAC roles. A Bicep deployment that includes Microsoft.Authorization/roleAssignments needs the deploying identity to have role-assignment permissions at the target scope, such as Role Based Access Control Administrator or User Access Administrator. After that permission is granted, the same Bicep template can create the Foundry resources and assign the project managed identity access to Azure AI Search. The key clue is the failed roleAssignments/write operation.
roleAssignments/write.Topic: Design and Implement an MLOps Infrastructure
An ML engineering team is onboarding a fraud model. Training runs are submitted from individual notebooks, model files are stored in separate storage accounts, and the first endpoint deployment fails.
Deployment step: resolve model asset
Result: failed
Message: No Azure Machine Learning workspace target was provided for the model asset or endpoint.
The team wants a controlled place to organize experiments, assets, jobs, and deployments before adding automation. What should you do next?
Options:
A. Register the model in a cross-workspace Azure Machine Learning registry.
B. Add a GitHub Actions workflow to rerun the deployment step.
C. Create a shared Azure Machine Learning workspace and target it consistently.
D. Create a datastore for each storage account used by the notebooks.
Best answer: C
Explanation: The visible failure is not a model packaging or automation problem; the deployment step has no Azure Machine Learning workspace target. A workspace is the controlled management place where teams organize and govern experiments, data and model assets, jobs, compute references, and endpoints. The next step is to create or select a shared workspace and ensure the team’s jobs, registrations, and deployments all target it consistently. After that boundary exists, datastores, registries, and GitHub Actions can be added as supporting capabilities.
Topic: Implement Generative AI Quality Assurance and Observability
A team uses a Microsoft Foundry evaluation workflow as a release gate for a customer-support GenAI agent. The deployment policy says to block promotion when any safety category has reproducible high-severity findings, even if quality metrics pass.
Evaluation summary:
| Check | Result |
|---|---|
| Groundedness | Pass |
| Relevance | Pass |
| Jailbreak resistance | High severity, reproducible |
| Self-harm content | Low severity, mitigated |
Which configuration should the team apply to the release gate?
Options:
A. Block promotion until safety changes pass reevaluation
B. Promote because quality metrics passed
C. Retest immediately without changing the agent
D. Enable tracing only after production release
Best answer: A
Explanation: Risk and safety evaluation results should override quality-pass signals when the release policy defines high-severity findings as blocking. In this case, groundedness and relevance are acceptable, but jailbreak resistance has a reproducible high-severity failure. The release gate should prevent promotion, require a safety-focused change such as prompt, grounding, filtering, or orchestration mitigation, and then rerun the evaluation before release. A retest-only action is appropriate when the evaluation is inconclusive or misconfigured, not when a reproducible failure is already shown. The key takeaway is that passing quality metrics does not compensate for a blocking safety result.
Topic: Design and Implement a GenAIOps Infrastructure
A GitHub Actions workflow provisions Microsoft Foundry infrastructure with Bicep, then deploys a foundation model and runs evaluations. The provision stage reports success, but the deployment stage fails and the evaluation stage is skipped.
Exhibit: Pipeline evidence
Provision outputs:
foundryResourceName: contoso-ai-prod
projectName: <empty>
modelDeploymentName: <empty>
evalStorageAccount: stcontosoeval
Deploy log:
Target project: claims-prod
Error: project 'claims-prod' was not found in the Foundry resource.
What is the best next diagnostic step?
Options:
A. Increase the evaluation job timeout before rerunning the workflow.
B. Inspect the Bicep deployment for missing project and model resources.
C. Tune provisioned throughput for the model deployment.
D. Compare prompt variants in the Git repository.
Best answer: B
Explanation: The failure occurs before evaluation begins, so the first diagnostic focus should be the infrastructure automation boundary. The provision stage succeeded, but its outputs show empty project and model deployment values, and the deploy log says the target Foundry project does not exist. That points to a Bicep or Azure CLI provisioning gap, not an evaluation, prompt, or runtime performance issue. A preflight validation should confirm that required resources are declared, deployed, and output for downstream stages, including the Foundry resource, project environment, managed identity/RBAC as needed, foundation-model deployment target, and evaluation storage. The closest distractors assume later lifecycle stages are reachable, but the evidence shows the pipeline has not successfully provisioned the deployment target.
Topic: Implement Machine Learning Model Lifecycle and Operations
An Azure Machine Learning pipeline has a prep step that writes cleaned data and a train step that consumes it. The latest run fails before the training script starts. You must fix the pipeline without changing the compute target, environment image, or source data asset.
Exhibit:
prep.outputs:
clean_data: uri_folder
train.inputs:
training_data: ${{parent.jobs.prep.outputs.preprocessed_data}}
train.error:
UserError: Failed to resolve input 'training_data'.
Output 'preprocessed_data' was not found on job 'prep'.
Which implementation should you apply?
Options:
A. Re-register the source data asset used by prep.
B. Resize the compute cluster used by train.
C. Rebuild the environment with the training dependencies.
D. Bind training_data to prep.outputs.clean_data.
Best answer: D
Explanation: The decisive evidence is a pipeline step dependency failure, not a code, data, environment, or compute failure. The error occurs before the training script starts and explicitly says the train input cannot resolve prep.outputs.preprocessed_data. The prep step exposes clean_data, so the downstream binding must use that output name. Because the constraints say not to change compute, environment, or the source data asset, the operationally correct fix is to update only the pipeline dependency wiring between steps.
A useful troubleshooting pattern is to first locate when the failure occurs: input resolution before execution usually points to pipeline bindings or data references, while failures inside the script point more toward code or environment issues.
prep completed and the failure is between prep and train.Topic: Implement Machine Learning Model Lifecycle and Operations
A GitHub Actions rollout job for an Azure Machine Learning managed online endpoint fails with: Model asset is archived and cannot be deployed. The job deploys models:/churn-risk@champion.
Registry state:
| Version | Alias | Lifecycle state | Validation |
|---|---|---|---|
| 18 | champion | Archived | Passed |
| 19 | candidate | Active | Passed |
| 20 | none | Active | Failed |
What is the best root cause?
Options:
A. The endpoint cannot deploy MLflow-registered models.
B. Version 19 failed validation and is blocked.
C. The champion alias points to archived version 18.
D. Version 20 is the current deployable model.
Best answer: C
Explanation: The rollout job is not selecting the highest version number automatically; it is resolving the explicit registry reference models:/churn-risk@champion. In the visible registry state, the champion alias is attached to version 18. That version has passed validation, but its lifecycle state is Archived, so it is not eligible for deployment. Version 19 is active and passed validation, making it eligible, but it is only marked as candidate, not the current champion. Version 20 is active but failed validation, so it should not be promoted or deployed under the stated process.
The key diagnostic is to compare the alias used by the rollout job with the lifecycle state of the resolved model version.
candidate alias.Topic: Optimize Generative AI Systems and Model Performance
A GenAIOps team is fine-tuning a foundation model in Microsoft Foundry to summarize customer support tickets. The team generated synthetic training examples from product documentation and wants to prevent the model from learning unrelated marketing-style responses. Before registering the fine-tuned model for deployment, which implementation best validates that the synthetic data supports the target task?
Options:
A. Deploy the model and monitor only latency and token usage.
B. Add more synthetic examples from broader product content.
C. Run a held-out ticket-summary evaluation with task-specific quality gates.
D. Approve the model when fine-tuning training loss decreases steadily.
Best answer: C
Explanation: For synthetic-data or fine-tuning validation, the key is to test the model against the target task, not just the training process. In this scenario, the team should use a held-out evaluation dataset of representative support tickets with expected summaries, then apply task-specific metrics or rubrics such as relevance, groundedness, coherence, and off-task behavior checks. This can be automated as a release gate before model registration or deployment in Microsoft Foundry. Training loss can show optimization progress, but it does not prove the model learned the right behavior. Operational metrics such as latency and token usage are useful later, but they do not validate task alignment.
Topic: Implement Machine Learning Model Lifecycle and Operations
A team deployed a churn classifier to an Azure Machine Learning managed online endpoint. Ground-truth labels are written nightly to a curated data asset. The operations lead wants dashboards and alerts when live model quality drops below AUC 0.80 after release. Which configuration should you implement?
Options:
A. Schedule weekly retraining without production labels or metric thresholds.
B. Configure model monitoring with predictions, labels, AUC, and an alert threshold.
C. Configure data drift monitoring against the training feature distribution only.
D. Enable endpoint telemetry for latency, failures, and request volume only.
Best answer: B
Explanation: Production model quality remains visible by monitoring model performance metrics after deployment. For a classifier, Azure Machine Learning must have production predictions and corresponding ground-truth labels to calculate metrics such as AUC, accuracy, precision, or recall. The alert threshold makes the metric operational by notifying the team when quality crosses an agreed limit. Endpoint telemetry and drift detection are useful, but they answer different questions: whether the endpoint is healthy or whether input distributions changed. They do not directly prove the model is still predicting well.
Topic: Implement Machine Learning Model Lifecycle and Operations
A team has registered a fraud model in an Azure Machine Learning workspace. The model must score all transactions uploaded to a datastore every night and write predictions back to storage. There is no requirement for per-request low latency. Which endpoint deployment configuration should the team use?
Options:
A. Create a real-time managed online endpoint.
B. Create a batch endpoint with a batch deployment on AML compute.
C. Expose the model through a Microsoft Foundry serverless API endpoint.
D. Deploy the model as a local endpoint on the training compute.
Best answer: B
Explanation: Azure Machine Learning batch endpoints fit production inference workloads that can run asynchronously over files, folders, or tabular data in bulk. In this scenario, the nightly schedule, datastore input, and storage output are stronger requirements than low-latency request handling. A batch deployment can reference the registered model, use AML compute for scalable scoring, and be invoked by a scheduled job or pipeline when new transactions arrive.
Managed online endpoints are better for synchronous, low-latency inference. The key takeaway is to match the endpoint type to the inference pattern: bulk and scheduled processing maps to batch endpoints.
Topic: Optimize Generative AI Systems and Model Performance
A team uses Microsoft Foundry to fine-tune a foundation model that rewrites support replies for a specialized product line. The fine-tuning job completes, but evaluation shows poor results on issue types that are not represented in the collected examples.
| Evidence | Value |
|---|---|
| Collected examples | 38 labeled replies |
| Intended issue types | 12 |
| Issue types with no examples | 5 |
| SME knowledge | Available for valid case descriptions |
What is the best next diagnostic step?
Options:
A. Create SME-reviewed synthetic examples for the missing issue types.
B. Lower the prompt temperature during evaluation runs.
C. Increase provisioned throughput for the model deployment.
D. Tune the RAG similarity threshold for support documents.
Best answer: A
Explanation: Synthetic data is appropriate when the fine-tuning task needs examples that the available dataset does not contain. Here, the job completed, so the symptom is not a training infrastructure failure. The evaluation weakness aligns with five missing issue types and only 38 labeled examples. The best next step is to use SME knowledge to create realistic, labeled synthetic examples for the uncovered cases, review them for quality and safety, add them to the fine-tuning dataset, and rerun evaluation. This targets the customization gap directly instead of changing deployment capacity or inference settings.
Topic: Design and Implement an MLOps Infrastructure
A machine learning team must create identical Azure Machine Learning workspaces, storage accounts, container registries, and compute targets in dev and prod. The environment must be reproducible from source control, and provisioning must run automatically when infrastructure changes are merged to the main branch. Which implementation should the team use?
Options:
A. Create the resources manually in Azure portal and export the template
B. Register the workspace resources as MLflow artifacts
C. Use an Azure Machine Learning pipeline to create the workspace
D. Run a GitHub Actions workflow that deploys Bicep with Azure CLI
Best answer: D
Explanation: For reproducible MLOps infrastructure, the resource definitions should be stored as infrastructure as code and deployed by an automated workflow. A GitHub Actions workflow can trigger on changes to the infrastructure folder or merges to main, authenticate to Azure, and run Azure CLI commands that deploy Bicep templates. This approach makes workspace resources such as Azure Machine Learning workspaces, storage, container registries, and compute targets consistent across environments and reviewable through Git history.
Manual creation does not provide reliable repeatability, and ML training pipelines or MLflow artifacts are for model lifecycle operations, not provisioning Azure resources. The key is to automate Azure resource deployment from version-controlled IaC.
Topic: Design and Implement an MLOps Infrastructure
A team uses GitHub Actions to submit Azure Machine Learning training pipelines and register models. The Azure Machine Learning workspace has a private endpoint and public network access disabled. The team must keep the workspace private and avoid storing long-lived credentials in GitHub. Which implementation best supports the automation securely?
Options:
A. Use a GitHub-hosted runner and enable public network access only during workflow runs.
B. Store an Azure service principal secret in GitHub and assign Owner on the subscription.
C. Use a self-hosted runner in the workspace VNet with OIDC-based Azure authentication and least-privilege RBAC.
D. Move the training script to an Azure Machine Learning compute cluster and grant all developers workspace Contributor.
Best answer: C
Explanation: Secure MLOps automation must satisfy both identity and network constraints. With public network access disabled, the automation runner needs network line of sight to the workspace private endpoint, such as a self-hosted GitHub Actions runner placed in the connected VNet. For identity, GitHub OIDC federation with Microsoft Entra ID avoids long-lived secrets and provides short-lived tokens. RBAC should be scoped to the workspace and any required dependent resources, not broadly to the subscription. This combination lets the pipeline submit jobs and register assets without exposing the workspace publicly or over-permissioning the automation identity.
Topic: Design and Implement a GenAIOps Infrastructure
A team is creating a Microsoft Foundry project for a customer-support agent. The agent will use a foundation-model deployment, Azure AI Search indexes, and a storage account that contains evaluation datasets. Security requires Microsoft Entra ID-based access, no service keys in code or prompts, and no public network path between the project and the dependent resources.
Which configuration should you create?
Options:
A. Azure Machine Learning datastore connections for the evaluation datasets
B. GitHub repository secrets containing search and storage access keys
C. Foundry project managed identity, RBAC assignments, and private endpoints
D. Public endpoints with IP firewall rules for developer workstations
Best answer: C
Explanation: For a Foundry project that must operate without service keys and without public network access, use managed identity, RBAC, and private networking. The project identity can be granted only the permissions it needs on Azure AI Search and storage, while private endpoints keep traffic on private network paths. This supports GenAIOps operations such as deployments, evaluations, and agent workflows without embedding credentials in code, prompts, or repository settings.
Key takeaway: identity-based authorization and private connectivity are the required resource configuration pattern for this security posture.
Topic: Implement Generative AI Quality Assurance and Observability
A team is adding a quality gate to a GitHub Actions workflow for a RAG chatbot deployed through Microsoft Foundry. The current workflow passes when the prompt flow executes without errors and returns HTTP 200. The release must be blocked when answers are poorly grounded in retrieved sources, even if the endpoint is healthy. Which implementation best meets the requirement?
Options:
A. Run a Foundry evaluation on a mapped test dataset and fail on groundedness threshold breaches
B. Call the deployed endpoint and fail only when the response code is not 200
C. Track average latency and token consumption for each test prompt
D. Validate that every response is well-formed JSON with required fields
Best answer: A
Explanation: Automated evaluation should test the quality outcome the release gate is meant to protect. For a RAG chatbot, poor grounding means generated answers are not sufficiently supported by retrieved source content. A Foundry evaluation workflow should use a representative test dataset with the required data mapping, such as prompt, generated answer, retrieved context, and optionally expected answer or ground truth. The workflow can then calculate quality metrics such as groundedness and fail the release when the metric breaches the defined threshold. Execution checks, schema checks, and operational metrics are useful, but they do not prove that the model’s answer is supported by the retrieved evidence.
Topic: Design and Implement an MLOps Infrastructure
An MLOps team provisions an Azure Machine Learning workspace by using Bicep. Governance policy allows only resources declared in the template; Azure Machine Learning cannot create dependent resources later.
Workspace setup facts:
| Resource | Status |
|---|---|
| Storage account | Configured |
| Key Vault | Configured |
| Application Insights | Configured |
| Container Registry | Missing |
The team must run training jobs that use curated environments now and build custom environments for endpoint deployment next sprint. What should the engineer implement?
Options:
A. Block all training jobs until the container registry exists.
B. Replace the storage account with a container registry.
C. Proceed now; add a workspace container registry before custom image builds.
D. Skip model registration until Application Insights is removed.
Best answer: C
Explanation: Azure Machine Learning workspaces rely on dependent resources for different downstream activities. The configured storage account, Key Vault, and Application Insights support workspace artifacts, secrets, and monitoring. A missing container registry does not necessarily block using curated environments for immediate training jobs, but it matters when Azure Machine Learning must build or store custom container images for environments used in deployment. Because the stem states that undeclared dependent resources cannot be created later, the registry must be added before the custom environment and endpoint work begins.
The key distinction is between activities that use existing curated assets and activities that require new workspace-managed container images.
Topic: Design and Implement an MLOps Infrastructure
A machine learning team has separate Azure Machine Learning workspaces for development, validation, and production. Only models, environments, and components that pass validation can be reused in production pipelines. The team wants a repeatable way to promote approved asset versions without exporting files manually. Which registry-sharing configuration should you use?
Options:
A. Copy asset files from the development datastore into each production workspace datastore.
B. Store model binaries and environment files in GitHub and clone them into each workspace.
C. Register all assets directly in the production workspace from development jobs.
D. Publish approved asset versions to a shared Azure Machine Learning registry and grant production read access.
Best answer: D
Explanation: Azure Machine Learning registries are designed for sharing versioned assets, such as models, environments, and components, across workspaces. In this scenario, the approved asset version should be promoted to a shared registry after validation, and production pipelines should reference the registry asset version. RBAC can separate responsibilities: validation or release automation can publish approved versions, while production workspaces or service identities can consume them. This avoids manual file export, keeps asset lineage and versions explicit, and prevents development jobs from directly changing production workspace assets.
Topic: Design and Implement a GenAIOps Infrastructure
A team is preparing a Microsoft Foundry deployment for a production RAG assistant. Traffic is consistently high during business hours, latency must be predictable, and finance wants to reduce exposure to per-token cost spikes. Development and test workloads remain low volume. Which implementation approach should the team use for the production model deployment?
Options:
A. Use only a serverless API endpoint for all environments
B. Use provisioned throughput sized from load tests
C. Increase prompt retries to smooth latency spikes
D. Fine-tune the model before choosing deployment capacity
Best answer: B
Explanation: Provisioned throughput is the operational fit when a Foundry foundation-model deployment has sustained, predictable high volume and needs stable latency and cost planning. The team should estimate required capacity from realistic load tests, deploy production with provisioned throughput units, and monitor utilization so capacity can be adjusted as demand changes. Low-volume development and test workloads can remain on lower-cost, consumption-oriented options if they do not need the same reserved capacity.
The key distinction is workload predictability: serverless-style consumption is often simpler for variable or low-volume traffic, but it does not provide the same reserved-capacity planning model for sustained production demand.
Topic: Implement Machine Learning Model Lifecycle and Operations
An Azure Machine Learning training pipeline logs each job to MLflow. The release gate should register the run with the highest test_f1 only if p95_latency_ms is at most 80 and the validation-to-test F1 drop is at most 0.03.
| MLflow run | val_f1 | test_f1 | p95_latency_ms |
|---|---|---|---|
| run-a | 0.91 | 0.86 | 62 |
| run-b | 0.89 | 0.88 | 74 |
| run-c | 0.87 | 0.87 | 45 |
| run-d | 0.90 | 0.89 | 96 |
Which run should the pipeline register as the candidate model?
Options:
A. Register run-b.
B. Register run-d.
C. Register run-c.
D. Register run-a.
Best answer: A
Explanation: MLflow run evidence should be compared against the stated release objective, not a single metric in isolation. The objective first filters out runs that violate operational gates: latency must be at most 80 ms, and the validation-to-test F1 drop must be no more than 0.03. run-a has a drop of 0.05, and run-d exceeds the latency limit. That leaves run-b and run-c; among those eligible runs, run-b has the higher test_f1 value. The key takeaway is to apply constraints first, then optimize the target metric among the remaining candidates.
run-a drops from 0.91 to 0.86, exceeding the allowed generalization gap.run-c meets the gates but has lower test_f1 than another eligible run.run-d violates the stated latency gate.Topic: Design and Implement a GenAIOps Infrastructure
An operations team is preparing a Microsoft Foundry chat deployment for a high-volume launch. The model responds correctly in a smoke test, but production has a stated load target.
| Requirement | Value |
|---|---|
| Peak traffic | 900 requests/minute |
| Average tokens | 1,200 tokens/request |
| p95 latency target | 2 seconds |
| Capacity estimate | 45 PTUs |
Which configuration best validates production readiness?
Options:
A. Use a serverless endpoint and run a health probe
B. Allocate 45 PTUs and run a production-shaped load test
C. Increase client retries and keep the current deployment
D. Allocate 45 PTUs and skip load validation
Best answer: B
Explanation: Provisioned throughput planning for high-volume Foundry workloads must be tied to the expected production traffic pattern, not just to whether the model endpoint responds. The stem provides a peak request rate, average token volume, latency target, and a PTU estimate. The readiness decision should configure the model deployment with the estimated PTU allocation and validate it with load that resembles production, measuring latency and throttling under that load. A smoke test or health probe only proves that the deployment is reachable. Retries can hide transient failures but do not create capacity and may worsen latency during saturation.
Topic: Design and Implement a GenAIOps Infrastructure
A GenAIOps team must provision the same Microsoft Foundry resource, project environment settings, managed identity, RBAC assignments, and private networking in development, test, and production subscriptions. The team wants repeatable deployments from source control with environment-specific values supplied at release time. Which configuration choice best meets the requirement?
Options:
A. Use prompt versioning to recreate project settings
B. Use parameterized Bicep templates deployed by Azure CLI
C. Create each Foundry project manually in the portal
D. Store setup steps in a shared runbook document
Best answer: B
Explanation: Repeatable Foundry infrastructure provisioning should use infrastructure as code. A Bicep template can define the Microsoft Foundry resource configuration, project environment settings, managed identities, RBAC assignments, and networking in a source-controlled, reviewable form. Parameters let the same template deploy to development, test, and production with different names, subscriptions, or network values while keeping the intended configuration consistent.
Manual portal setup and runbooks can describe the process, but they are more prone to drift and are harder to validate in pull requests. Prompt versioning is useful for tracking prompt assets, not provisioning Foundry infrastructure.
Topic: Implement Machine Learning Model Lifecycle and Operations
An MLOps team uses Azure Machine Learning to train a PyTorch vision model. The training script already supports torch.distributed and logs metrics and artifacts with MLflow. A single GPU node cannot meet the training-time target. You must keep one coordinated training run with synchronized gradients and register one resulting model. Which implementation should you use?
Options:
A. Launch independent training jobs and select the lowest-loss run.
B. Deploy the model to a managed online endpoint with multiple instances.
C. Create parallel pipeline components for separate epoch ranges.
D. Submit a distributed command job on a multi-node GPU compute cluster.
Best answer: D
Explanation: For a large or deep learning model that exceeds a single node, Azure Machine Learning should run a distributed training job on a compute cluster that can allocate multiple GPU nodes. Because the script already supports torch.distributed, the job configuration should specify the appropriate distribution settings, such as PyTorch distribution, process count per instance, and instance count. This keeps the workers in one coordinated training run, allows synchronized gradient updates, and preserves MLflow tracking and model artifact logging for registration. Parallel pipelines and independent jobs can run multiple tasks, but they do not automatically provide distributed gradient synchronization for one training run.
Topic: Optimize Generative AI Systems and Model Performance
A team fine-tuned a Microsoft Foundry model for customer support and approved support-ft:4 for production after evaluation. After release, quality alerts continue to match the previous version.
Evidence:
| Source | Evidence |
|---|---|
| Dev evaluation | support-ft:4, groundedness 0.86, avg tokens 740 |
| Release tag | expected model support-ft:4, prompt ticket-summary:12 |
| Production endpoint | traffic 100% to deployment using support-ft:3 |
| Production traces | model support-ft:3, groundedness 0.62, avg tokens 1,250 |
What is the best root cause?
Options:
A. Production is still serving the previous fine-tuned model version.
B. The prompt version was not promoted with the model.
C. The evaluation dataset is too small for approval.
D. The endpoint needs more provisioned throughput units.
Best answer: A
Explanation: Versioning evidence should connect the approved fine-tuned model, release artifact, deployed endpoint, and production traces. Here, development evaluation approved support-ft:4, and the release tag also expected support-ft:4. However, the production endpoint routes all traffic to a deployment using support-ft:3, and production traces confirm that requests are being served by support-ft:3. The monitoring symptoms are therefore tied to the old model, not to the evaluated production candidate.
The next operational fix would be to update or roll out the production deployment so that traffic is routed to the approved fine-tuned model version, then continue monitoring quality and token metrics for that version.
ticket-summary:12, and no production prompt mismatch is shown.Topic: Implement Generative AI Quality Assurance and Observability
A team uses Microsoft Foundry to deploy a customer-support copilot. A new prompt variant passes relevance, coherence, latency, and token-cost targets, but the risk and safety evaluation reports repeatable unsafe completions for adversarial test cases. The release workflow currently promotes variants automatically when performance metrics pass.
Which configuration change should the AI operations engineer implement?
Options:
A. Increase provisioned throughput for the model deployment
B. Tune chunk size and similarity threshold for retrieval
C. Add a risk-and-safety gate that blocks promotion for review
D. Promote the variant and monitor latency after release
Best answer: C
Explanation: Risk and safety evaluation findings should be treated differently from ordinary performance issues. If a prompt variant produces repeatable unsafe completions, the release workflow should prevent automatic promotion and require review of the prompt, selected model, guardrails/content filters, or deployment settings. Metrics such as latency, throughput, token consumption, relevance, and coherence can guide performance tuning, but they do not override a safety failure.
The key distinction is that safety issues are release-blocking quality risks, not optimization targets to tune around after deployment.
Topic: Design and Implement a GenAIOps Infrastructure
A team deployed a foundation model in Microsoft Foundry for a production agent workflow. The model works in the Foundry playground, but the agent workflow fails before any prompt steps run.
Diagnostic evidence:
Model deployment name: contoso-prod-gpt4o
Deployment status: Succeeded
Endpoint configured in agent: https://contoso-foundry-prod.example/models
Deployment configured in agent: contoso-gpt4o-prod
Trace status: 404 DeploymentNotFound
What is the most likely root cause?
Options:
A. The agent references the wrong deployment name.
B. The agent managed identity lacks RBAC access.
C. The content safety filter blocked the prompt.
D. The deployment needs more provisioned throughput.
Best answer: A
Explanation: Validating model consumption means checking that the intended application or agent can call the exact deployed foundation model endpoint with the correct deployment identifier and identity. In this case, the endpoint matches the production Foundry resource, and the deployment is in a succeeded state. The visible failure is a 404 DeploymentNotFound, and the configured deployment name in the agent differs from the actual model deployment name. That points to an application configuration mismatch, not a model availability or quality issue. The next fix is to update the agent workflow to reference contoso-prod-gpt4o and rerun a smoke test from the agent context.
DeploymentNotFound.Topic: Implement Machine Learning Model Lifecycle and Operations
An Azure Machine Learning real-time endpoint started failing a post-deployment quality gate immediately after a model promotion. The team must roll back quickly and preserve an audit trail showing the exact model artifact that served traffic.
| Registered model | Version | Source run | Release state |
|---|---|---|---|
fraud-risk | 17 | run-744 | Previous production; gate passed |
fraud-risk | 18 | run-811 | Current endpoint; gate failed |
Endpoint deployment model reference: azureml:fraud-risk:18
What is the best next model-versioning action?
Options:
A. Register run-811 again as version 19.
B. Redeploy version 17 and archive version 18.
C. Retag run-744 as production only.
D. Overwrite version 18 with artifacts from run-744.
Best answer: B
Explanation: Registered model versions provide the audit boundary for rollback and promotion decisions. The endpoint is explicitly serving fraud-risk version 18, and the evidence shows that version 17 was the previous production version that passed the gate. A rollback should point the deployment back to the known-good registered version rather than altering artifacts in place. Archiving or demoting the failed version keeps its lineage available for investigation while reducing the chance it is promoted again.
The key takeaway is that rollback should move traffic to a different registered model version, not mutate the failed version.
Topic: Design and Implement an MLOps Infrastructure
A team uses an Azure Machine Learning pipeline to train a fraud model every week. The source files in an Azure Blob datastore are overwritten during nightly ingestion, but auditors must be able to rerun any released training job with the exact input snapshot used for that run. Which implementation should the team use?
Options:
A. Store the dataset version in the model description after registration.
B. Reference the datastore folder path directly in the pipeline.
C. Create a versioned data asset for each approved snapshot and reference asset_name:version in the pipeline.
D. Reference the data asset by name only so the pipeline uses the latest version.
Best answer: C
Explanation: Azure Machine Learning data assets provide a managed, versioned reference to data used by jobs and pipelines. For reproducibility, the training workflow should consume a specific data asset version, such as fraud_train:12, rather than a mutable datastore path or an unpinned asset name. This preserves the operational contract that a released model can be traced back to the intended input data. The underlying storage still needs appropriate retention, but the pipeline dependency should be expressed as a pinned data asset version.
Topic: Design and Implement a GenAIOps Infrastructure
A GenAIOps team deployed a chat workload in Microsoft Foundry. The internal web app authenticates by using a managed identity with RBAC. A security test shows the model deployment endpoint still resolves publicly and accepts requests from outside the corporate network when a valid token is used. The workload must avoid unnecessary public exposure and be reachable only from the application VNet.
What is the best diagnostic conclusion?
Options:
A. Private Link is missing for the Foundry endpoint.
B. The prompt version was not pinned in Git.
C. Provisioned throughput units are undersized.
D. The managed identity lacks model deployment permissions.
Best answer: A
Explanation: Managed identity and RBAC are identity controls; they do not by themselves remove public network reachability. For a Foundry workload that must avoid unnecessary public exposure, the private-access pattern is to place access behind Azure Private Link/private endpoints from the application VNet and disable public network access where supported. The visible symptom is not an authentication failure, capacity issue, or prompt mismatch because calls succeed with a valid token from outside the intended network. The key diagnostic distinction is identity authorization versus network isolation.
Topic: Design and Implement an MLOps Infrastructure
An Azure Machine Learning training pipeline was rerun after a failed downstream step. The training step completed, but the model metrics changed significantly. The run history shows the same code commit, environment version, and compute target for both runs.
Exhibit: Training input reference
inputs:
training_data:
type: uri_folder
path: azureml://datastores/landing/paths/customer-churn/train/
The storage team confirms that files under customer-churn/train/ are refreshed nightly. What is the best root cause?
Options:
A. The pipeline references a mutable datastore path
B. The compute target reused cached training outputs
C. The environment asset version changed between runs
D. The model was registered before evaluation completed
Best answer: A
Explanation: The core issue is repeatability of training data references. In Azure Machine Learning, a datastore path is a storage location reference; if files at that path are overwritten or refreshed, the same pipeline definition can read different data on a later run. To make training inputs repeatable across jobs and pipelines, create a versioned data asset for the approved training snapshot and reference that asset version, such as a named uri_folder data asset. The visible evidence rules out code, environment, and compute changes, while the nightly refresh explains why metrics changed after rerun. A data asset version creates a stable contract for the pipeline input, even when the underlying storage area continues to receive new data.
Topic: Design and Implement an MLOps Infrastructure
A team connects an Azure Machine Learning workspace to a GitHub repository that stores environment and component YAML files. After a pull request is merged, Azure ML studio shows the new commit in the repository history and the file diffs are correct, but no Azure ML job or asset update occurs. The GitHub repository’s Actions page shows no run for the merge commit.
What is the best next diagnostic step?
Options:
A. Check the GitHub Actions workflow trigger and enablement
B. Reconnect the repository to restore Git history
C. Create a new Azure Machine Learning datastore
D. Register the YAML files manually as workspace assets
Best answer: A
Explanation: Git source control and GitHub Actions serve different purposes in an MLOps setup. Git records version history, branches, pull requests, and file diffs. GitHub Actions executes automation, such as validating YAML, provisioning resources, registering assets, or submitting Azure Machine Learning jobs after a merge. In this scenario, the commit and diffs are visible, so source control integration is functioning. The missing evidence is workflow execution: the Actions page has no run for the merge commit. The next diagnostic step is to inspect whether a workflow exists, is enabled, and has a trigger that matches the merge event and branch.
A repository can have correct version history without any automation running.
Topic: Optimize Generative AI Systems and Model Performance
A team uses Microsoft Foundry for a support assistant. RAG retrieval is accurate, but evaluations show low coherence and inconsistent use of the company’s required troubleshooting format. The team has 200 reviewed examples, can create SME-reviewed synthetic examples, and must avoid full base-model retraining. Which implementation best fits the requirement?
Options:
A. Replace fine-tuning with a longer system prompt only
B. Continue pretraining on all support chat logs without review
C. Run parameter-efficient supervised fine-tuning with curated and synthetic examples
D. Lower the RAG similarity threshold to return more documents
Best answer: C
Explanation: The requirement is behavior and domain-format adaptation, not retrieval improvement. Because RAG already retrieves the right content, tuning chunking or similarity is unlikely to fix inconsistent answer structure. A parameter-efficient supervised fine-tuning approach uses task-specific prompt-response examples to adapt style, terminology, and output format without retraining the entire foundation model. SME-reviewed synthetic examples can expand coverage when labeled data is limited, but they should be curated and evaluated before deployment.
The operational pattern is to create a versioned training dataset, run the fine-tuning job, register or version the resulting model deployment, and compare it against the baseline with quality and safety evaluations.
Topic: Implement Machine Learning Model Lifecycle and Operations
A team trains a classification model in Azure Machine Learning. Before a model can be promoted, reviewers must verify the training inputs, hyperparameters, evaluation metrics, logs, and the exact artifact produced by the job. The team also wants enough evidence to troubleshoot failed or degraded runs later. Which configuration should the engineer implement?
Options:
A. Store the training notebook in GitHub and review commit history.
B. Configure MLflow tracking in the training job and log parameters, metrics, artifacts, and run metadata.
C. Register only the final model artifact in the workspace model registry.
D. Save evaluation metrics to a local CSV file on the training compute.
Best answer: B
Explanation: Training job evidence should be captured with MLflow experiment tracking in Azure Machine Learning so each run keeps comparable, queryable records. The job should log key parameters, metrics, artifacts such as plots or evaluation files, logs, and useful metadata such as code, data, and environment references. This creates a durable link between the produced model artifact and the run that generated it, which supports promotion gates, later evaluation, and troubleshooting. Registering a model is important later, but it does not by itself preserve the full evidence trail for why that model should be promoted or how it was produced.
Topic: Implement Machine Learning Model Lifecycle and Operations
An Azure Machine Learning pipeline registers a fraud detection model only after validation gates pass. The team requires production release to be blocked when responsible AI evidence shows unmitigated harm, even if accuracy targets are met.
Validation results for the latest run:
| Evidence | Result |
|---|---|
| AUC | 0.91, target met |
| Error analysis | False-negative rate is 2.8x higher for one protected group |
| Mitigation record | No mitigation or business sign-off attached |
| Data drift check | Within threshold |
Which implementation should the MLOps engineer apply?
Options:
A. Keep the model in validation and require mitigation evidence before registration
B. Register the model with a lower production traffic percentage
C. Register the model because AUC and drift checks passed
D. Deploy to a small canary endpoint and monitor complaints
Best answer: A
Explanation: Responsible AI evaluation is a pre-deployment quality gate, not just a post-deployment monitoring activity. In this scenario, the aggregate model metric meets the target, but error analysis shows a materially worse false-negative rate for a protected group, and there is no mitigation record or approved exception. The operational behavior should keep the model in validation until the team provides mitigation evidence, a revised evaluation, or an explicit governance sign-off required by policy. Passing data drift and AUC checks does not override unresolved responsible AI evidence.
A safe release process treats subgroup harm as a blocker when the release criteria say responsible AI evidence must support production use.
Topic: Implement Generative AI Quality Assurance and Observability
A team operates a Microsoft Foundry chat application in production. They need a monitoring configuration that can answer these questions: how long each user request takes, how many requests the deployment handles per minute, why a specific response used an unexpected retrieved passage, and which requests drive token-related cost. Which configuration should you apply?
Options:
A. Enable only token-usage totals and provisioned throughput allocation
B. Enable only aggregate latency and CPU resource-usage metrics
C. Enable only application error logs and model quality scores
D. Enable response-time metrics, throughput metrics, traces, and token-usage logging
Best answer: D
Explanation: Continuous monitoring for generative AI systems should collect evidence that matches the operational question. Response time or request latency shows how long an individual request takes from the user or service perspective. Throughput measures volume over time, such as requests per minute. Traces show the step-by-step execution path for a specific request, including retrieval, prompt construction, model calls, and tool calls, which supports debugging unexpected outputs. Token usage identifies prompt and completion token consumption that affects cost. Resource usage, such as CPU or provisioned capacity utilization, is useful for infrastructure pressure but does not explain retrieved context or per-request token cost by itself.
Topic: Implement Machine Learning Model Lifecycle and Operations
A team has registered a churn prediction model in an Azure Machine Learning workspace. The model must score large files uploaded each night, write predictions back to storage, and does not require synchronous request/response latency. The team also wants to avoid managing Kubernetes infrastructure.
Which managed inference option should you configure?
Options:
A. Deploy the model to a batch endpoint
B. Deploy the model to a Kubernetes online endpoint
C. Run the model from a scheduled notebook
D. Deploy the model to a managed online endpoint
Best answer: A
Explanation: Azure Machine Learning batch endpoints are the managed inference option for asynchronous, large-scale scoring jobs. They are appropriate when input data arrives as files or datasets, predictions can be written to storage, and there is no need for low-latency real-time responses. Managed online endpoints are better for real-time APIs that serve one request at a time with strict latency expectations. Kubernetes online endpoints can support custom hosting requirements, but they introduce Kubernetes infrastructure management. A scheduled notebook can run scoring code, but it is not the managed endpoint deployment pattern for serving a registered model.
Topic: Implement Machine Learning Model Lifecycle and Operations
An Azure Machine Learning team trains a churn model using customer features from a feature store. The managed online endpoint will receive only customer IDs and must retrieve features at scoring time. Operations require the endpoint to use the same feature sets and versions validated during training, even if newer feature versions are added later.
What should you implement before registering the model?
Options:
A. Keep feature lookup logic only in the Git repository.
B. Query the latest feature set versions at runtime.
C. Package the feature retrieval specification with the model artifact.
D. Register the training dataset as a versioned data asset.
Best answer: C
Explanation: When production inference depends on feature retrieval, the model package should include the feature retrieval specification used during training and validation. That specification captures the feature sets, versions, entity keys, and lookup details needed for scoring. Packaging it with the model artifact makes the registered model self-contained enough for deployment and helps the endpoint retrieve features consistently with the model’s training assumptions. A versioned dataset can preserve training inputs, but it does not define the live scoring lookup contract. Querying latest feature versions or relying only on external source control can introduce drift between the validated model and production feature access.
Topic: Design and Implement an MLOps Infrastructure
A GitHub Actions workflow submits an Azure Machine Learning pipeline to the ml-prod workspace. The same YAML succeeds in ml-dev, but fails in ml-prod during validation.
component: azureml:preprocess:3
error: Component preprocess version 3 was not found in workspace ml-prod
note: preprocess:3 exists in workspace ml-dev
What is the most likely root cause?
Options:
A. The production datastore lacks read permission for training data.
B. The pipeline needs MLflow model registration before validation.
C. The component is only a local workspace asset in ml-dev.
D. The production compute target cannot pull the component image.
Best answer: C
Explanation: Azure Machine Learning workspace assets, such as components, are scoped to the workspace where they are registered unless they are published to a registry or separately registered in another workspace. The evidence shows validation fails before execution because ml-prod cannot resolve azureml:preprocess:3, while the same component version exists only in ml-dev. For reusable components across workspaces, use an Azure Machine Learning registry reference or ensure the component is registered in each target workspace. Compute, datastore, and MLflow model issues would appear later or involve different resource types.
Topic: Optimize Generative AI Systems and Model Performance
A team operates a Microsoft Foundry RAG assistant for internal policy questions. Recent evaluations show low groundedness. The production foundation model deployment and prompt are locked for this release; only retrieval settings can change. The team must prove that any optimization improves answer quality before rollout. Which implementation should you use?
Options:
A. Promote the lowest similarity threshold based on higher retrieval counts.
B. Fine-tune the foundation model on the policy documents.
C. Switch to a larger foundation model deployment for evaluation.
D. Compare retrieval variants with a fixed model and mapped evaluation dataset.
Best answer: D
Explanation: RAG retrieval optimization should be validated by isolating retrieval changes from model and prompt changes. In Microsoft Foundry, use the same deployed model and prompt, create candidate retrieval configurations such as different chunk sizes, similarity thresholds, or hybrid search settings, and run the same mapped evaluation dataset against each variant. Compare answer-quality metrics such as groundedness and relevance, and optionally check operational metrics such as latency and token consumption. This proves whether retrieval changes improved answer quality rather than masking the result with a different model. Retrieval volume alone is not enough because more retrieved chunks can add noise and reduce groundedness.
Topic: Implement Generative AI Quality Assurance and Observability
A team is configuring an automated evaluation workflow in Microsoft Foundry for a RAG-based support assistant. Pilot users report that responses are easy to read, but some answers include claims that are not supported by the retrieved product documentation. Which evaluation metric should be configured as the primary quality gate?
Options:
A. Groundedness
B. Relevance
C. Coherence
D. Fluency
Best answer: A
Explanation: Groundedness is the best match when the output-quality concern is unsupported claims or hallucinations relative to retrieved context. In a RAG evaluation workflow, the model response should be checked against the retrieved documents or grounding data to confirm that the answer is source-supported. Fluency and coherence can indicate whether the response reads naturally and is logically structured, but they do not prove that claims are backed by the source material. Relevance checks whether the response addresses the user request, not whether every factual statement is grounded.
Topic: Optimize Generative AI Systems and Model Performance
A Microsoft Foundry RAG app supports field technicians for proprietary equipment. Quality evaluation shows low retrieval relevance only for queries that use internal fault codes and acronyms.
Evidence:
| Check | Result |
|---|---|
| Exact keyword search for codes | Finds the right manual page |
| Current vector top-k results | Similar wording, wrong component |
| Chunk size and similarity threshold tests | No material improvement |
| Reviewed query-passage pairs | 2,000 labeled pairs available |
Which next diagnostic or optimization step best follows the evidence?
Options:
A. Increase top-k and rely on the prompt to filter chunks.
B. Switch to a multilingual embedding model.
C. Replace the generator with a larger foundation model.
D. Fine-tune the embedding model with labeled domain pairs.
Best answer: D
Explanation: The evidence points to an embedding mismatch, not a generation problem. Exact keyword search can find the right pages, but vector retrieval ranks semantically similar passages about the wrong component. Because the failures are tied to proprietary codes and acronyms, and there are labeled query-passage relevance pairs, fine-tuning the embedding model is the best next step. Selecting a different embedding model is more appropriate when the mismatch is a known capability gap, such as language coverage, modality, or a clearly better domain-ready embedding model. Here, the visible issue is specialized internal terminology that the current embedding space does not represent well.
The key takeaway is to fix retrieval semantics before changing the answer-generation layer.
Topic: Implement Generative AI Quality Assurance and Observability
A team operates a RAG-based customer support assistant in Microsoft Foundry. A production issue caused some answers to be generated without calling the retrieval step, so citations were missing. A fix has been deployed. You need a monitoring configuration that validates the corrected request path for live traffic, not just aggregate health. Which observability signal should you configure?
Options:
A. Tracing with retrieval spans and prompt-response details
B. Aggregate token consumption by model deployment
C. Provisioned throughput utilization for the deployment
D. Endpoint throughput and average response time
Best answer: A
Explanation: For a production GenAI issue involving a missing step in the request path, the validating signal should expose the per-request execution flow. Tracing is the right observability configuration because it can show spans for retrieval or tool calls, the prompt context passed to the model, and the resulting response. That evidence confirms that the deployed fix changed the behavior that caused missing citations. Aggregate metrics are still useful for operations, but they cannot prove that retrieval now happens for the specific class of affected requests. The key takeaway is to match the signal to the failure mode: use traces for flow and debugging validation, and use metrics for aggregate health, capacity, and cost trends.
Topic: Implement Machine Learning Model Lifecycle and Operations
An Azure Machine Learning real-time endpoint was updated with a newly registered churn model. Training used features from a feature store, but endpoint tests show missing/renamed feature values compared with training runs.
Deployment note:
Model artifact: model.pkl, conda.yml, MLmodel
Feature retrieval spec: not found
Endpoint input: customer_id, transaction_id
Observed issue: feature mismatch at scoring
Which configuration change should the MLOps engineer make?
Options:
A. Register the model with a new MLflow experiment name
B. Grant the endpoint identity registry reader access
C. Increase the endpoint compute instance size
D. Repackage the model with the feature retrieval specification
Best answer: D
Explanation: A feature mismatch after deployment can indicate that the model artifact package does not contain the required feature retrieval specification. The retrieval specification describes which feature sets and feature columns the model expects and enables the inference path to retrieve the same features used during training. In this case, the endpoint input contains only identifiers, while the note explicitly says the feature retrieval spec is missing. Repackaging and registering the model with that specification addresses the packaging defect. Compute size, registry access, and experiment naming do not define the model’s expected feature retrieval behavior.
Topic: Implement Generative AI Quality Assurance and Observability
A production RAG chat app in Microsoft Foundry was returning answers that were not supported by the retrieved documents. The team updated the prompt variant and retrieval instructions. You need to validate in continuous monitoring that the production issue is corrected, without relying on manual spot checks. Which observability signal should you use?
Options:
A. HTTP success rate for model deployment calls
B. Groundedness score from production evaluation traces
C. Average response latency for chat completions
D. Total token consumption per conversation
Best answer: B
Explanation: For a production GenAI issue involving unsupported or hallucinated RAG answers, the needed observability signal is a quality signal tied to the retrieved context. Groundedness evaluates whether the generated response is supported by the source documents or context used for the answer. In Microsoft Foundry observability, monitoring groundedness over production traces can show whether the prompt and retrieval changes corrected the actual failure mode. Latency, token usage, and HTTP success rates are useful operational signals, but they do not validate that answers are factually supported by retrieved content. The key is to match the monitoring signal to the issue being corrected.
Topic: Implement Machine Learning Model Lifecycle and Operations
A team trains a churn model in Azure Machine Learning and wants a release pipeline to promote only models whose training evidence can support later evaluation, rollback analysis, and troubleshooting. The training code already uses MLflow. Which implementation best preserves the required evidence for each candidate model?
Options:
A. Capture endpoint latency and error metrics after deployment.
B. Register only the serialized model file with a production-ready tag.
C. Log parameters, metrics, artifacts, data asset version, environment, and Git commit to the MLflow run.
D. Save metrics in the pipeline console output and upload the model file.
Best answer: C
Explanation: Training job evidence should make a candidate model traceable back to how it was produced. In Azure Machine Learning, MLflow runs are the right place to capture parameters, training and validation metrics, artifacts, model outputs, and useful lineage details such as data asset version, environment, and source commit. That evidence lets a later promotion gate compare runs, a reviewer evaluate model quality, and an engineer troubleshoot why a model changed or failed. Registering the model can reference the run output, but registration alone is not a substitute for run evidence. Production endpoint metrics are useful after deployment, not for proving what happened during training.
Topic: Design and Implement a GenAIOps Infrastructure
A team maintains Microsoft Foundry prompt variants in Git for a claims triage assistant. A pull request fails the automated evaluation, and reviewers cannot isolate which behavior changed.
Exhibit: PR summary
File changed: prompts/claims_triage_v4.prompt
Diff size: +1,160 / -40 lines
Prompt contents now include:
- classify claim severity
- draft a customer email
- summarize prior conversations
- full policy reference table pasted inline
Only variable: {{customer_message}}
Evaluation note: classification cases now return email text and summaries.
What is the best root cause indicated by the evidence?
Options:
A. The foundation model deployment lacks provisioned throughput.
B. The prompt is too monolithic for versioned evaluation.
C. The prompt should be edited only in the Foundry portal.
D. The evaluation dataset is missing latency measurements.
Best answer: B
Explanation: Prompt artifacts should be designed so they are task-focused, reviewable, and versionable. In this case, the same prompt now performs classification, email drafting, and summarization while also embedding a full policy table. That explains both symptoms: classification tests return non-classification content, and reviewers cannot isolate the behavioral change from a large mixed diff. A better operational design would separate task-specific prompts or variants, use clear input variables, and keep large reference content in a managed retrieval or configuration layer when appropriate. Throughput, portal editing, and latency metrics do not explain the visible prompt-version and evaluation behavior.
Topic: Implement Machine Learning Model Lifecycle and Operations
A team has an Azure Machine Learning real-time endpoint serving model version v1 in production. They registered model version v2 and want v2 to receive only 10% of live requests for one week while keeping the existing endpoint URL unchanged. Which implementation should they use?
Options:
A. Create a batch endpoint for v2 and route live calls to it
B. Replace the existing deployment with v2 and monitor errors
C. Add v2 as a new deployment and set endpoint traffic to 10%
D. Register v2 in the workspace and wait for endpoint auto-upgrade
Best answer: C
Explanation: Progressive rollout for an Azure Machine Learning real-time endpoint is implemented by deploying the new model version as an additional deployment behind the existing managed online endpoint, then assigning a small percentage of endpoint traffic to that deployment. In this scenario, v1 can keep 90% of traffic while v2 receives 10%, and clients continue using the same endpoint URL. The team can monitor production metrics and either increase v2 traffic or roll back by setting its traffic allocation to 0%. Replacing the deployment skips the limited-release stage, and model registration alone does not affect serving traffic.
v2 immediately instead of limiting exposure.Topic: Design and Implement a GenAIOps Infrastructure
A GitHub Actions workflow deploys a Microsoft Foundry resource and a foundation model deployment by using Bicep. The resource exists, but the deployment job fails. The team must keep the deployment automated and use least privilege.
Evidence:
az deployment group create --resource-group rg-genai-prod --template-file main.bicep
ERROR AuthorizationFailed:
Client 'sp-aiops-ci' does not have authorization to perform action
'Microsoft.CognitiveServices/accounts/deployments/write'
at scope '/subscriptions/<sub>/resourceGroups/rg-genai-prod/providers/Microsoft.CognitiveServices/accounts/foundry-prod'.
resource foundry 'Microsoft.CognitiveServices/accounts@2024-10-01' existing = {
name: 'foundry-prod'
}
resource modelDeployment 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01' = {
parent: foundry
name: 'gpt-prod'
properties: { model: { name: 'gpt-4o' } }
}
Which action should you take?
Options:
A. Enable a system-assigned managed identity on the Foundry resource.
B. Grant the pipeline identity deployment-write permission on the Foundry resource.
C. Create the model deployment manually in the Foundry portal.
D. Add an explicit dependsOn from the deployment to the Foundry resource.
Best answer: B
Explanation: The visible failure is an Azure RBAC authorization error for the GitHub Actions service principal, not a Bicep dependency or model configuration problem. The Bicep resource uses parent: foundry, so the model deployment is scoped correctly under the existing Foundry resource. To preserve automation and least privilege, assign the pipeline identity a role at the Foundry resource or resource-group scope that includes Microsoft.CognitiveServices/accounts/deployments/write, such as an appropriate Cognitive Services OpenAI contributor role, and rerun the same IaC deployment.
The key troubleshooting step is to follow the failing action and caller in the error message before changing the template structure.
parent already establishes the resource scope.Topic: Design and Implement an MLOps Infrastructure
A machine learning team must recreate the same Azure Machine Learning workspace, datastore, and compute cluster in dev, test, and prod. The environment must be provisioned from GitHub without manual portal steps. Which GitHub Actions configuration best supports this requirement?
Options:
A. Create resources in the portal and export screenshots to the repo
B. Run a notebook that creates compute after the workspace exists
C. Use OIDC sign-in and deploy parameterized Bicep with Azure CLI
D. Enable GitHub integration in the workspace and sync notebooks
Best answer: C
Explanation: For a reproducible MLOps environment, resource provisioning should be defined as infrastructure as code and executed by automation. A GitHub Actions workflow can authenticate to Azure, deploy parameterized Bicep templates, and run Azure CLI commands to create or update Azure Machine Learning resources consistently across environments. Parameters let the same template target dev, test, and prod without changing the resource definitions.
The key distinction is source-controlled provisioning versus workspace code synchronization. GitHub integration helps manage project files, but it does not by itself define and recreate the Azure resources needed for the environment.
Topic: Design and Implement a GenAIOps Infrastructure
Two GenAIOps teams collaborate in Microsoft Foundry. Team A’s nightly evaluation suddenly runs against Team B’s foundation-model deployment, but Team A’s prompt Git repository has no commit for the change. The run log shows both teams use the same Foundry project, project environment, and managed identity.
Which project-environment issue is the most likely root cause?
Options:
A. Shared project environment with broad team access
B. Insufficient provisioned throughput units for Team A
C. Missing groundedness metric in the evaluation workflow
D. Private endpoint DNS resolution failure
Best answer: A
Explanation: Controlled collaboration in Microsoft Foundry depends on isolating project environments and scoping access so one team’s deployment, prompt, or runtime configuration does not unintentionally affect another team’s work. Here, the prompt repository did not change, but the run used a shared project environment and managed identity. That points to configuration bleed-through from shared environment settings or permissions, not a model-quality or networking symptom. A better setup is separate project environments, appropriately scoped RBAC, managed identities, and Git-backed prompt/version control for each team or controlled shared workspace boundary.
Topic: Optimize Generative AI Systems and Model Performance
A team ran an A/B test to compare two RAG configurations for a Foundry chat solution. The relevance score improved for one variant, but reviewers cannot tell whether retrieval caused the change.
| Setting | Variant 1 | Variant 2 |
|---|---|---|
| Prompt | support-v3 | support-v4 |
| Model deployment | gpt-4o-mini | gpt-4o |
| Index version | kb-index-12 | kb-index-13 |
| Retrieval | semantic, top 5 | hybrid, top 8 |
What is the best next diagnostic step?
Options:
A. Promote Variant 2 because relevance is higher.
B. Increase traffic until the current difference stabilizes.
C. Rerun matched variants changing only retrieval settings.
D. Replace both prompts and retest with a new dataset.
Best answer: C
Explanation: A valid A/B test for RAG optimization isolates the variable being tested. Here, the variants changed retrieval method and top-k, but they also changed prompt version, model deployment, and index version. Any relevance difference could come from any of those changes, so the current evidence does not support attributing the improvement to the retrieval configuration. The next diagnostic step is to rerun the comparison with the same prompt, model, index, evaluation dataset, routing rules, and monitoring setup, changing only the RAG configuration under test. More traffic cannot fix confounding variables; it only measures a flawed comparison more precisely.
Use the Microsoft AI-300 Practice Test page for the full IT Mastery practice bank, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Try Microsoft AI-300 on Web View Microsoft AI-300 Practice Test
Read the Microsoft AI-300 Cheat Sheet for compact concept review before returning to timed practice.