Free Microsoft AI-300 Full-Length Practice Exam: 50 Questions

This free full-length Microsoft AI-300 practice exam includes 50 original IT Mastery questions across the exam domains, each with an explanation. After this run, you can continue with full IT Mastery practice.

These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.

Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.

Need concept review first? Read the Microsoft AI-300 Cheat Sheet, then return to timed practice.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Exam snapshot

  • Exam route: Microsoft AI-300
  • Practice-set question count: 50
  • Time limit: 120 minutes
  • Practice style: mixed-domain diagnostic run with answer explanations

Full-length exam mix

| Domain | Weight |
| --- | --- |
| Design and Implement an MLOps Infrastructure | 19% |
| Implement Machine Learning Model Lifecycle and Operations | 29% |
| Design and Implement a GenAIOps Infrastructure | 24% |
| Implement Generative AI Quality Assurance and Observability | 14% |
| Optimize Generative AI Systems and Model Performance | 14% |

Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.

Practice questions

Questions 1-25

Question 1

Topic: Design and Implement a GenAIOps Infrastructure

A team is deploying a chat-based generative AI workload in Microsoft Foundry. Load testing shows a steady requirement of 8,000 tokens per minute during business hours, and the operations team must reserve predictable capacity rather than rely on best-effort shared throughput. Which implementation should you use?

Options:

  • A. Deploy the model as a standard serverless API endpoint

  • B. Deploy the model with provisioned throughput units sized for the target token rate

  • C. Create a larger Foundry project environment

  • D. Register the prompt variant in a Git repository

Best answer: B

Explanation: Provisioned throughput is a capacity-planning choice for high-volume foundation model workloads with explicit throughput requirements. In this scenario, the key constraint is not just deploying a model; it is reserving predictable capacity for a known token-per-minute target. A standard model deployment or serverless API endpoint can make the model available, but it does not by itself satisfy the requirement to reserve throughput. Prompt versioning and project environment sizing are useful GenAIOps practices, but they do not allocate dedicated model-serving capacity. The operational distinction is: ordinary deployment exposes the model, while provisioned throughput planning reserves capacity for expected load.

  • Serverless deployment exposes the model but does not meet the stated requirement for reserved predictable capacity.
  • Prompt versioning helps manage prompt lifecycle, not serving throughput.
  • Project environment sizing affects project resources, not dedicated foundation model throughput.
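
The sizing arithmetic behind option B can be sketched as a ceiling division. This is a minimal sketch under a loud assumption: the per-unit throughput figure below is a made-up placeholder, not a published number. Real capacity per provisioned throughput unit depends on the model and version and should be taken from Microsoft's capacity guidance.

```python
import math


def units_needed(target_tokens_per_minute: int, tokens_per_minute_per_unit: int) -> int:
    """Return the smallest whole number of units that covers the target rate."""
    if target_tokens_per_minute < 0 or tokens_per_minute_per_unit <= 0:
        raise ValueError("rates must be non-negative, and per-unit rate must be positive")
    return math.ceil(target_tokens_per_minute / tokens_per_minute_per_unit)


# 8,000 tokens per minute comes from the scenario.
# 2,500 tokens per minute per unit is a hypothetical value for illustration only.
print(units_needed(8000, 2500))  # 4
```

Rounding up matters here: reserving three units at the assumed rate would leave the steady business-hours load under-provisioned.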

Question 2

Topic: Design and Implement a GenAIOps Infrastructure

A team uses GitHub Actions to deploy Microsoft Foundry infrastructure with Bicep. The workflow creates a Foundry resource, a Foundry project with a managed identity, and a role assignment that lets the project identity access Azure AI Search. The deployment fails.

Exhibit:

Scope: rg-genai-prod
Workflow principal: sp-gh-foundry-deploy
Current role on scope: Contributor

Failed operation:
Microsoft.Authorization/roleAssignments/write

Error:
The client 'sp-gh-foundry-deploy' does not have authorization
to perform action 'Microsoft.Authorization/roleAssignments/write'.

Which configuration change should you make before rerunning the deployment?

Options:

  • A. Add a Bicep dependsOn from the role assignment to the project.

  • B. Grant the workflow principal Role Based Access Control Administrator on the scope.

  • C. Change the Foundry project to use a user-assigned managed identity.

  • D. Store the Azure AI Search admin key as a GitHub secret.

Best answer: B

Explanation: The failure is an authorization problem for the deployment principal, not a Foundry project identity or dependency problem. In Azure, creating resources with Contributor is different from assigning Azure RBAC roles. A Bicep deployment that includes Microsoft.Authorization/roleAssignments needs the deploying identity to have role-assignment permissions at the target scope, such as Role Based Access Control Administrator or User Access Administrator. After that permission is granted, the same Bicep template can create the Foundry resources and assign the project managed identity access to Azure AI Search. The key clue is the failed roleAssignments/write operation.

  • Identity type change does not help because either system-assigned or user-assigned identities still require a permitted deployer to create RBAC assignments.
  • Explicit dependency addresses ordering issues, but the visible error is authorization failure for roleAssignments/write.
  • Search admin key storage bypasses the intended managed identity pattern and does not fix the failed RBAC operation.
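
The difference between Contributor and a role that can write role assignments can be modeled with simple wildcard matching. This is a sketch only: the role definitions below are trimmed excerpts written for illustration, and Azure evaluates permissions server-side with more rules than are shown here.

```python
from fnmatch import fnmatchcase

# Simplified excerpts of two built-in role definitions (assumption: trimmed for illustration).
CONTRIBUTOR = {
    "actions": ["*"],
    "not_actions": ["Microsoft.Authorization/*/Write", "Microsoft.Authorization/*/Delete"],
}
RBAC_ADMINISTRATOR = {
    "actions": [
        "Microsoft.Authorization/roleAssignments/write",
        "Microsoft.Authorization/roleAssignments/delete",
        "*/read",
    ],
    "not_actions": [],
}


def _matches(patterns, operation):
    """Case-insensitive wildcard match of an operation against a list of patterns."""
    op = operation.lower()
    return any(fnmatchcase(op, pattern.lower()) for pattern in patterns)


def is_allowed(role, operation):
    """An operation is allowed when it matches actions and is not excluded by not_actions."""
    return _matches(role["actions"], operation) and not _matches(role["not_actions"], operation)


REQUIRED = "Microsoft.Authorization/roleAssignments/write"
print(is_allowed(CONTRIBUTOR, REQUIRED))         # False
print(is_allowed(RBAC_ADMINISTRATOR, REQUIRED))  # True
```

Contributor matches the operation through its wildcard action but then excludes it, which is why the exhibit shows an authorization failure even though the principal can create resources.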

Question 3

Topic: Design and Implement an MLOps Infrastructure

An ML engineering team is onboarding a fraud model. Training runs are submitted from individual notebooks, model files are stored in separate storage accounts, and the first endpoint deployment fails.

Deployment step: resolve model asset
Result: failed
Message: No Azure Machine Learning workspace target was provided for the model asset or endpoint.

The team wants a controlled place to organize experiments, assets, jobs, and deployments before adding automation. What should you do next?

Options:

  • A. Register the model in a cross-workspace Azure Machine Learning registry.

  • B. Add a GitHub Actions workflow to rerun the deployment step.

  • C. Create a shared Azure Machine Learning workspace and target it consistently.

  • D. Create a datastore for each storage account used by the notebooks.

Best answer: C

Explanation: The visible failure is not a model packaging or automation problem; the deployment step has no Azure Machine Learning workspace target. A workspace is the controlled management place where teams organize and govern experiments, data and model assets, jobs, compute references, and endpoints. The next step is to create or select a shared workspace and ensure the team’s jobs, registrations, and deployments all target it consistently. After that boundary exists, datastores, registries, and GitHub Actions can be added as supporting capabilities.

  • Datastores only help connect storage, but they do not provide the overall management boundary for experiments, jobs, assets, and endpoints.
  • Registry first is premature because registries are for sharing assets across workspaces, not replacing the workspace target needed here.
  • Rerun automation would repeat the same failure because the missing workspace target has not been resolved.

Question 4

Topic: Implement Generative AI Quality Assurance and Observability

A team uses a Microsoft Foundry evaluation workflow as a release gate for a customer-support GenAI agent. The deployment policy says to block promotion when any safety category has reproducible high-severity findings, even if quality metrics pass.

Evaluation summary:

| Check | Result |
| --- | --- |
| Groundedness | Pass |
| Relevance | Pass |
| Jailbreak resistance | High severity, reproducible |
| Self-harm content | Low severity, mitigated |

Which configuration should the team apply to the release gate?

Options:

  • A. Block promotion until safety changes pass reevaluation

  • B. Promote because quality metrics passed

  • C. Retest immediately without changing the agent

  • D. Enable tracing only after production release

Best answer: A

Explanation: Risk and safety evaluation results should override quality-pass signals when the release policy defines high-severity findings as blocking. In this case, groundedness and relevance are acceptable, but jailbreak resistance has a reproducible high-severity failure. The release gate should prevent promotion, require a safety-focused change such as prompt, grounding, filtering, or orchestration mitigation, and then rerun the evaluation before release. A retest-only action is appropriate when the evaluation is inconclusive or misconfigured, not when a reproducible failure is already shown. The key takeaway is that passing quality metrics does not compensate for a blocking safety result.

  • Quality-only promotion fails because relevance and groundedness do not cancel a high-severity jailbreak finding.
  • Retest without change fails because the issue is reproducible, not an evaluator setup problem.
  • Post-release tracing fails because observability after release does not satisfy a pre-release safety gate.
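
The policy in this question can be expressed as a small decision function. This is a minimal sketch, not a Foundry API: the `SafetyFinding` type, its field names, and the string results are hypothetical and exist only to show that a blocking safety finding is checked before quality results are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyFinding:
    category: str
    severity: str       # "low", "medium", or "high"
    reproducible: bool


def blocking_findings(findings):
    """Return findings that the stated policy treats as release blockers."""
    return [f for f in findings if f.severity == "high" and f.reproducible]


def gate_decision(quality_passed, findings):
    """Safety blockers are evaluated first; quality can only block, never override."""
    if blocking_findings(findings):
        return "block"
    return "promote" if quality_passed else "block"


findings = [
    SafetyFinding("jailbreak resistance", "high", True),
    # Assumption: the mitigated low-severity finding is modeled as not reproducible.
    SafetyFinding("self-harm content", "low", False),
]
print(gate_decision(True, findings))  # block
```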

Question 5

Topic: Design and Implement a GenAIOps Infrastructure

A GitHub Actions workflow provisions Microsoft Foundry infrastructure with Bicep, then deploys a foundation model and runs evaluations. The provision stage reports success, but the deployment stage fails and the evaluation stage is skipped.

Exhibit: Pipeline evidence

Provision outputs:
  foundryResourceName: contoso-ai-prod
  projectName: <empty>
  modelDeploymentName: <empty>
  evalStorageAccount: stcontosoeval

Deploy log:
  Target project: claims-prod
  Error: project 'claims-prod' was not found in the Foundry resource.

What is the best next diagnostic step?

Options:

  • A. Increase the evaluation job timeout before rerunning the workflow.

  • B. Inspect the Bicep deployment for missing project and model resources.

  • C. Tune provisioned throughput for the model deployment.

  • D. Compare prompt variants in the Git repository.

Best answer: B

Explanation: The failure occurs before evaluation begins, so the first diagnostic focus should be the infrastructure automation boundary. The provision stage succeeded, but its outputs show empty project and model deployment values, and the deploy log says the target Foundry project does not exist. That points to a Bicep or Azure CLI provisioning gap, not an evaluation, prompt, or runtime performance issue. A preflight validation should confirm that required resources are declared, deployed, and output for downstream stages, including the Foundry resource, project environment, managed identity/RBAC as needed, foundation-model deployment target, and evaluation storage. The closest distractors assume later lifecycle stages are reachable, but the evidence shows the pipeline has not successfully provisioned the deployment target.

  • Timeout change fails because the evaluation job never started; the pipeline stopped at deployment.
  • Prompt comparison fails because prompt variants do not explain a missing Foundry project.
  • Throughput tuning fails because there is no visible model deployment to tune yet.
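
A preflight check between the provision and deploy stages could catch this before the deployment step runs. The sketch below is illustrative: it assumes the provision outputs have already been loaded into a dictionary, models the exhibit's `<empty>` values as empty strings, and uses a hypothetical helper name.

```python
REQUIRED_OUTPUTS = (
    "foundryResourceName",
    "projectName",
    "modelDeploymentName",
    "evalStorageAccount",
)


def missing_outputs(outputs, required=REQUIRED_OUTPUTS):
    """Return the required output names that are absent or blank."""
    return [name for name in required if not str(outputs.get(name) or "").strip()]


# Values mirror the exhibit; "<empty>" is represented as an empty string.
provision_outputs = {
    "foundryResourceName": "contoso-ai-prod",
    "projectName": "",
    "modelDeploymentName": "",
    "evalStorageAccount": "stcontosoeval",
}
print(missing_outputs(provision_outputs))  # ['projectName', 'modelDeploymentName']
```

A non-empty result is the signal to inspect the Bicep template for the missing project and model deployment resources instead of rerunning later stages.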

Question 6

Topic: Implement Machine Learning Model Lifecycle and Operations

An Azure Machine Learning pipeline has a prep step that writes cleaned data and a train step that consumes it. The latest run fails before the training script starts. You must fix the pipeline without changing the compute target, environment image, or source data asset.

Exhibit:

prep.outputs:
  clean_data: uri_folder

train.inputs:
  training_data: ${{parent.jobs.prep.outputs.preprocessed_data}}

train.error:
  UserError: Failed to resolve input 'training_data'.
  Output 'preprocessed_data' was not found on job 'prep'.

Which implementation should you apply?

Options:

  • A. Re-register the source data asset used by prep.

  • B. Resize the compute cluster used by train.

  • C. Rebuild the environment with the training dependencies.

  • D. Bind training_data to prep.outputs.clean_data.

Best answer: D

Explanation: The decisive evidence is a pipeline step dependency failure, not a code, data, environment, or compute failure. The error occurs before the training script starts and explicitly says the train input cannot resolve prep.outputs.preprocessed_data. The prep step exposes clean_data, so the downstream binding must use that output name. Because the constraints say not to change compute, environment, or the source data asset, the operationally correct fix is to update only the pipeline dependency wiring between steps.

A useful troubleshooting pattern is to first locate when the failure occurs: input resolution before execution usually points to pipeline bindings or data references, while failures inside the script point more toward code or environment issues.

  • Environment rebuild is unnecessary because the job never reaches user code or dependency import execution.
  • Compute resize does not address an unresolved pipeline output reference.
  • Data asset registration is not implicated because prep completed and the failure is between prep and train.
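
The same check that the service performs at input resolution can be approximated offline. This is a minimal sketch that assumes the pipeline definition has been loaded into a plain dictionary; the `unresolved_bindings` helper is hypothetical and only understands the `${{parent.jobs.<job>.outputs.<name>}}` form shown in the exhibit.

```python
import re

BINDING = re.compile(r"^\$\{\{\s*parent\.jobs\.(\w+)\.outputs\.(\w+)\s*\}\}$")


def unresolved_bindings(jobs):
    """Return (job, input, reference) tuples for inputs bound to undeclared outputs."""
    problems = []
    for job_name, job in jobs.items():
        for input_name, value in job.get("inputs", {}).items():
            match = BINDING.match(value)
            if not match:
                continue  # literal value or another expression form; not checked here
            source_job, output_name = match.groups()
            declared = jobs.get(source_job, {}).get("outputs", {})
            if output_name not in declared:
                problems.append((job_name, input_name, f"{source_job}.outputs.{output_name}"))
    return problems


pipeline = {
    "prep": {"outputs": {"clean_data": "uri_folder"}},
    "train": {"inputs": {"training_data": "${{parent.jobs.prep.outputs.preprocessed_data}}"}},
}
print(unresolved_bindings(pipeline))
# [('train', 'training_data', 'prep.outputs.preprocessed_data')]
```

Changing the binding to `${{parent.jobs.prep.outputs.clean_data}}` makes the list empty, which matches the fix in option D.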

Question 7

Topic: Implement Machine Learning Model Lifecycle and Operations

A GitHub Actions rollout job for an Azure Machine Learning managed online endpoint fails with: Model asset is archived and cannot be deployed. The job deploys models:/churn-risk@champion.

Registry state:

| Version | Alias | Lifecycle state | Validation |
| --- | --- | --- | --- |
| 18 | champion | Archived | Passed |
| 19 | candidate | Active | Passed |
| 20 | none | Active | Failed |

What is the best root cause?

Options:

  • A. The endpoint cannot deploy MLflow-registered models.

  • B. Version 19 failed validation and is blocked.

  • C. The champion alias points to archived version 18.

  • D. Version 20 is the current deployable model.

Best answer: C

Explanation: The rollout job is not selecting the highest version number automatically; it is resolving the explicit registry reference models:/churn-risk@champion. In the visible registry state, the champion alias is attached to version 18. That version has passed validation, but its lifecycle state is Archived, so it is not eligible for deployment. Version 19 is active and passed validation, making it eligible, but it is only marked as candidate, not the current champion. Version 20 is active but failed validation, so it should not be promoted or deployed under the stated process.

The key diagnostic is to compare the alias used by the rollout job with the lifecycle state of the resolved model version.

  • Candidate confusion fails because version 19 is active and passed validation, but the job did not reference the candidate alias.
  • Highest version assumption fails because version 20 has no current alias and failed validation.
  • Service capability mismatch fails because Azure Machine Learning can deploy registered MLflow models when the referenced version is deployable.
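
The diagnostic can be reproduced with an in-memory copy of the registry table. This sketch does not call MLflow or Azure Machine Learning; the dictionary layout and helper names are assumptions used to show the two-step check: resolve the alias, then test the resolved version.

```python
# In-memory copy of the registry state from the question.
REGISTRY = {
    18: {"alias": "champion", "state": "Archived", "validation": "Passed"},
    19: {"alias": "candidate", "state": "Active", "validation": "Passed"},
    20: {"alias": None, "state": "Active", "validation": "Failed"},
}


def resolve_alias(registry, alias):
    """Return the version number that carries the given alias."""
    for version, record in registry.items():
        if record["alias"] == alias:
            return version
    raise LookupError(f"alias '{alias}' is not assigned to any version")


def deployable(registry, version):
    """A version is deployable only when it is active and passed validation."""
    record = registry[version]
    return record["state"] == "Active" and record["validation"] == "Passed"


version = resolve_alias(REGISTRY, "champion")
print(version, deployable(REGISTRY, version))  # 18 False
```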

Question 8

Topic: Optimize Generative AI Systems and Model Performance

A GenAIOps team is fine-tuning a foundation model in Microsoft Foundry to summarize customer support tickets. The team generated synthetic training examples from product documentation and wants to prevent the model from learning unrelated marketing-style responses. Before registering the fine-tuned model for deployment, which implementation best validates that the synthetic data supports the target task?

Options:

  • A. Deploy the model and monitor only latency and token usage.

  • B. Add more synthetic examples from broader product content.

  • C. Run a held-out ticket-summary evaluation with task-specific quality gates.

  • D. Approve the model when fine-tuning training loss decreases steadily.

Best answer: C

Explanation: For synthetic-data or fine-tuning validation, the key is to test the model against the target task, not just the training process. In this scenario, the team should use a held-out evaluation dataset of representative support tickets with expected summaries, then apply task-specific metrics or rubrics such as relevance, groundedness, coherence, and off-task behavior checks. This can be automated as a release gate before model registration or deployment in Microsoft Foundry. Training loss can show optimization progress, but it does not prove the model learned the right behavior. Operational metrics such as latency and token usage are useful later, but they do not validate task alignment.

  • Training loss only fails because lower loss can still reflect overfitting or learning patterns from unrelated synthetic content.
  • Broader synthetic content increases the risk of off-task behavior instead of validating summarization quality.
  • Latency-only monitoring checks runtime performance, not whether responses remain relevant and grounded.

Question 9

Topic: Implement Machine Learning Model Lifecycle and Operations

A team deployed a churn classifier to an Azure Machine Learning managed online endpoint. Ground-truth labels are written nightly to a curated data asset. The operations lead wants dashboards and alerts when live model quality drops below AUC 0.80 after release. Which configuration should you implement?

Options:

  • A. Schedule weekly retraining without production labels or metric thresholds.

  • B. Configure model monitoring with predictions, labels, AUC, and an alert threshold.

  • C. Configure data drift monitoring against the training feature distribution only.

  • D. Enable endpoint telemetry for latency, failures, and request volume only.

Best answer: B

Explanation: Monitoring model performance metrics after deployment keeps production model quality visible. For a classifier, Azure Machine Learning needs production predictions and the corresponding ground-truth labels to calculate metrics such as AUC, accuracy, precision, or recall. The alert threshold makes the metric operational by notifying the team when quality crosses an agreed limit. Endpoint telemetry and drift detection are useful, but they answer different questions: whether the endpoint is healthy or whether input distributions changed. They do not directly prove the model is still predicting well.

  • Endpoint telemetry only tracks service health, not predictive quality such as AUC.
  • Drift only can reveal feature-distribution changes but does not measure labeled model performance.
  • Weekly retraining may update the model, but without labels and thresholds it does not provide post-release quality visibility.
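
To see why both predictions and labels are required, AUC can be computed by hand from joined prediction scores and ground-truth labels. This is a plain-Python sketch, not the monitoring service's implementation; the sample scores and labels are invented for illustration, and 0.80 is the threshold from the scenario.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def quality_alert(labels, scores, threshold=0.80):
    """Return the AUC and whether it has dropped below the alert threshold."""
    value = auc(labels, scores)
    return value, value < threshold


# Invented sample: nightly labels joined with the scores the endpoint returned.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.6]
value, alert = quality_alert(labels, scores)
print(round(value, 3), alert)  # 0.667 True
```

Without the label column, `auc` has nothing to compare the scores against, which is the gap that telemetry-only and drift-only options leave open.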

Question 10

Topic: Implement Machine Learning Model Lifecycle and Operations

A team has registered a fraud model in an Azure Machine Learning workspace. The model must score all transactions uploaded to a datastore every night and write predictions back to storage. There is no requirement for per-request low latency. Which endpoint deployment configuration should the team use?

Options:

  • A. Create a real-time managed online endpoint.

  • B. Create a batch endpoint with a batch deployment on AML compute.

  • C. Expose the model through a Microsoft Foundry serverless API endpoint.

  • D. Deploy the model as a local endpoint on the training compute.

Best answer: B

Explanation: Azure Machine Learning batch endpoints fit production inference workloads that can run asynchronously over files, folders, or tabular data in bulk. In this scenario, the nightly schedule, datastore input, and storage output are stronger requirements than low-latency request handling. A batch deployment can reference the registered model, use AML compute for scalable scoring, and be invoked by a scheduled job or pipeline when new transactions arrive.

Managed online endpoints are better for synchronous, low-latency inference. The key takeaway is to match the endpoint type to the inference pattern: bulk and scheduled processing maps to batch endpoints.

  • Online endpoint is tempting for production use, but it is optimized for synchronous request-response inference rather than nightly bulk scoring.
  • Local endpoint is not a production deployment pattern for scheduled scoring over datastore inputs.
  • Foundry serverless API targets foundation-model API consumption, not Azure Machine Learning batch scoring of a registered ML model.

Question 11

Topic: Optimize Generative AI Systems and Model Performance

A team uses Microsoft Foundry to fine-tune a foundation model that rewrites support replies for a specialized product line. The fine-tuning job completes, but evaluation shows poor results on issue types that are not represented in the collected examples.

| Evidence | Value |
| --- | --- |
| Collected examples | 38 labeled replies |
| Intended issue types | 12 |
| Issue types with no examples | 5 |
| SME knowledge | Available for valid case descriptions |

What is the best next diagnostic step?

Options:

  • A. Create SME-reviewed synthetic examples for the missing issue types.

  • B. Lower the prompt temperature during evaluation runs.

  • C. Increase provisioned throughput for the model deployment.

  • D. Tune the RAG similarity threshold for support documents.

Best answer: A

Explanation: Synthetic data is appropriate when the fine-tuning task needs examples that the available dataset does not contain. Here, the job completed, so the symptom is not a training infrastructure failure. The evaluation weakness aligns with five missing issue types and only 38 labeled examples. The best next step is to use SME knowledge to create realistic, labeled synthetic examples for the uncovered cases, review them for quality and safety, add them to the fine-tuning dataset, and rerun evaluation. This targets the customization gap directly instead of changing deployment capacity or inference settings.

  • Throughput change addresses capacity and latency, not missing labeled examples for fine-tuning.
  • Temperature tuning can affect response variability, but it does not teach the model issue types absent from training.
  • RAG threshold tuning applies to retrieval quality, while the evidence describes a fine-tuning dataset coverage problem.
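
A coverage check over the fine-tuning dataset makes the gap concrete before any synthetic examples are written. The sketch below is illustrative: the issue-type names and the way the 38 examples are distributed are hypothetical, chosen only to reproduce the counts in the evidence table.

```python
from collections import Counter


def uncovered_issue_types(intended, example_labels):
    """Return intended issue types that have no labeled example."""
    counts = Counter(example_labels)
    return [issue for issue in intended if counts[issue] == 0]


# Hypothetical names: 12 intended issue types, as in the evidence table.
intended = [f"issue_{i:02d}" for i in range(1, 13)]
# Hypothetical distribution: 38 labeled replies spread over only 7 of the 12 types.
example_labels = [intended[i % 7] for i in range(38)]

gaps = uncovered_issue_types(intended, example_labels)
print(len(example_labels), len(gaps))  # 38 5
print(gaps)  # ['issue_08', 'issue_09', 'issue_10', 'issue_11', 'issue_12']
```

The resulting list is the work queue for SME-reviewed synthetic examples; rerunning the check after adding them should return an empty list.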

Question 12

Topic: Design and Implement an MLOps Infrastructure

A machine learning team must create identical Azure Machine Learning workspaces, storage accounts, container registries, and compute targets in dev and prod. The environment must be reproducible from source control, and provisioning must run automatically when infrastructure changes are merged to the main branch. Which implementation should the team use?

Options:

  • A. Create the resources manually in Azure portal and export the template

  • B. Register the workspace resources as MLflow artifacts

  • C. Use an Azure Machine Learning pipeline to create the workspace

  • D. Run a GitHub Actions workflow that deploys Bicep with Azure CLI

Best answer: D

Explanation: For reproducible MLOps infrastructure, the resource definitions should be stored as infrastructure as code and deployed by an automated workflow. A GitHub Actions workflow can trigger on changes to the infrastructure folder or merges to main, authenticate to Azure, and run Azure CLI commands that deploy Bicep templates. This approach makes workspace resources such as Azure Machine Learning workspaces, storage, container registries, and compute targets consistent across environments and reviewable through Git history.

Manual creation does not provide reliable repeatability, and ML training pipelines or MLflow artifacts are for model lifecycle operations, not provisioning Azure resources. The key is to automate Azure resource deployment from version-controlled IaC.

  • Portal export may help capture an existing state, but manual provisioning is not the reproducible automation requested.
  • ML pipeline provisioning confuses training orchestration with infrastructure deployment.
  • MLflow artifacts track experiment outputs and models, not Azure resource definitions or environment provisioning.
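
The command a workflow step would run for each environment can be assembled as follows. This is a sketch under stated assumptions: the resource group names, template path, and `environment` parameter are hypothetical, and the function only builds the `az deployment group create` argument list without executing it.

```python
def bicep_deploy_command(resource_group, template_file, parameters=None):
    """Build the `az deployment group create` argument list for one environment."""
    command = [
        "az", "deployment", "group", "create",
        "--resource-group", resource_group,
        "--template-file", template_file,
    ]
    if parameters:
        command.append("--parameters")
        command.extend(f"{key}={value}" for key, value in sorted(parameters.items()))
    return command


# Hypothetical names: one resource group per environment, one shared template.
for env in ("dev", "prod"):
    args = bicep_deploy_command(f"rg-ml-{env}", "infra/main.bicep", {"environment": env})
    print(" ".join(args))
```

Because dev and prod share one template and differ only in parameters, the two environments stay identical by construction and every change is reviewable in Git history.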

Question 13

Topic: Design and Implement an MLOps Infrastructure

A team uses GitHub Actions to submit Azure Machine Learning training pipelines and register models. The Azure Machine Learning workspace has a private endpoint and public network access disabled. The team must keep the workspace private and avoid storing long-lived credentials in GitHub. Which implementation best supports the automation securely?

Options:

  • A. Use a GitHub-hosted runner and enable public network access only during workflow runs.

  • B. Store an Azure service principal secret in GitHub and assign Owner on the subscription.

  • C. Use a self-hosted runner in the workspace VNet with OIDC-based Azure authentication and least-privilege RBAC.

  • D. Move the training script to an Azure Machine Learning compute cluster and grant all developers workspace Contributor.

Best answer: C

Explanation: Secure MLOps automation must satisfy both identity and network constraints. With public network access disabled, the automation runner needs network line of sight to the workspace private endpoint, such as a self-hosted GitHub Actions runner placed in the connected VNet. For identity, GitHub OIDC federation with Microsoft Entra ID avoids long-lived secrets and provides short-lived tokens. RBAC should be scoped to the workspace and any required dependent resources, not broadly to the subscription. This combination lets the pipeline submit jobs and register assets without exposing the workspace publicly or over-permissioning the automation identity.

  • Temporary public access weakens the stated private-workspace constraint and is hard to secure reliably because GitHub-hosted runners use large, changing IP ranges.
  • Stored secrets and Owner violates both credential hygiene and least-privilege access control.
  • Compute-only change does not solve GitHub workflow access to the private workspace and over-grants human users.

Question 14

Topic: Design and Implement a GenAIOps Infrastructure

A team is creating a Microsoft Foundry project for a customer-support agent. The agent will use a foundation-model deployment, Azure AI Search indexes, and a storage account that contains evaluation datasets. Security requires Microsoft Entra ID-based access, no service keys in code or prompts, and no public network path between the project and the dependent resources.

Which configuration should you create?

Options:

  • A. Azure Machine Learning datastore connections for the evaluation datasets

  • B. GitHub repository secrets containing search and storage access keys

  • C. Foundry project managed identity, RBAC assignments, and private endpoints

  • D. Public endpoints with IP firewall rules for developer workstations

Best answer: C

Explanation: For a Foundry project that must operate without service keys and without public network access, use managed identity, RBAC, and private networking. The project identity can be granted only the permissions it needs on Azure AI Search and storage, while private endpoints keep traffic on private network paths. This supports GenAIOps operations such as deployments, evaluations, and agent workflows without embedding credentials in code, prompts, or repository settings.

Key takeaway: identity-based authorization and private connectivity are the required resource configuration pattern for this security posture.

  • Repository secrets still rely on keys, which violates the requirement to avoid service keys in code or prompts.
  • IP firewall rules do not remove the public network path and are workstation-centered rather than project-centered.
  • AML datastores are for Azure Machine Learning data access patterns, not the primary Foundry project configuration for this agent.

Question 15

Topic: Implement Generative AI Quality Assurance and Observability

A team is adding a quality gate to a GitHub Actions workflow for a RAG chatbot deployed through Microsoft Foundry. The current workflow passes when the prompt flow executes without errors and returns HTTP 200. The release must be blocked when answers are poorly grounded in retrieved sources, even if the endpoint is healthy. Which implementation best meets the requirement?

Options:

  • A. Run a Foundry evaluation on a mapped test dataset and fail on groundedness threshold breaches

  • B. Call the deployed endpoint and fail only when the response code is not 200

  • C. Track average latency and token consumption for each test prompt

  • D. Validate that every response is well-formed JSON with required fields

Best answer: A

Explanation: Automated evaluation should test the quality outcome the release gate is meant to protect. For a RAG chatbot, poor grounding means generated answers are not sufficiently supported by retrieved source content. A Foundry evaluation workflow should use a representative test dataset with the required data mapping, such as prompt, generated answer, retrieved context, and optionally expected answer or ground truth. The workflow can then calculate quality metrics such as groundedness and fail the release when the metric breaches the defined threshold. Execution checks, schema checks, and operational metrics are useful, but they do not prove that the model’s answer is supported by the retrieved evidence.

  • Endpoint health only confirms availability, not whether the generated answer is grounded in retrieved content.
  • Schema validation catches malformed output, but a valid JSON response can still contain unsupported claims.
  • Latency and tokens help with observability and cost control, but they do not measure answer quality.
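
The final step of such a gate, turning evaluation results into a pass or fail exit code, can be sketched like this. The column names, the 1 to 5 score scale, and the 4.0 threshold are assumptions for illustration; in a real workflow the rows would come from the evaluation run's output rather than being hard-coded.

```python
from statistics import mean

# Hypothetical column names for the mapped evaluation dataset.
REQUIRED_FIELDS = ("query", "response", "context", "groundedness")


def gate_exit_code(rows, threshold=4.0):
    """Return 0 to pass the release gate, 1 to fail it."""
    if not rows:
        return 1  # no evidence of quality is treated as a failure
    for row in rows:
        missing = [field for field in REQUIRED_FIELDS if field not in row]
        if missing:
            raise KeyError(f"row is missing mapped fields: {missing}")
    average = mean(row["groundedness"] for row in rows)
    return 0 if average >= threshold else 1


# Invented results: every call returned HTTP 200, but one answer is poorly grounded.
rows = [
    {"query": "q1", "response": "r1", "context": "c1", "groundedness": 5},
    {"query": "q2", "response": "r2", "context": "c2", "groundedness": 2},
    {"query": "q3", "response": "r3", "context": "c3", "groundedness": 4},
]
print(gate_exit_code(rows))  # 1
```

A workflow step that exits with this code fails the job on a groundedness breach even though every request succeeded, which is the behavior the HTTP 200 check cannot provide.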

Question 16

Topic: Design and Implement an MLOps Infrastructure

An MLOps team provisions an Azure Machine Learning workspace by using Bicep. Governance policy allows only resources declared in the template; Azure Machine Learning cannot create dependent resources later.

Workspace setup facts:

| Resource | Status |
| --- | --- |
| Storage account | Configured |
| Key Vault | Configured |
| Application Insights | Configured |
| Container Registry | Missing |

The team must run training jobs that use curated environments now and build custom environments for endpoint deployment next sprint. What should the engineer implement?

Options:

  • A. Block all training jobs until the container registry exists.

  • B. Replace the storage account with a container registry.

  • C. Proceed now; add a workspace container registry before custom image builds.

  • D. Skip model registration until Application Insights is removed.

Best answer: C

Explanation: Azure Machine Learning workspaces rely on dependent resources for different downstream activities. The configured storage account, Key Vault, and Application Insights support workspace artifacts, secrets, and monitoring. A missing container registry does not necessarily block using curated environments for immediate training jobs, but it matters when Azure Machine Learning must build or store custom container images for environments used in deployment. Because the stem states that undeclared dependent resources cannot be created later, the registry must be added before the custom environment and endpoint work begins.

The key distinction is between activities that use existing curated assets and activities that require new workspace-managed container images.

  • Blocking all training is too broad because curated-environment jobs can run without building a new custom image.
  • Replacing storage fails because storage is still needed for workspace artifacts, datasets, and job outputs.
  • Removing monitoring is unrelated because Application Insights supports monitoring and does not prevent model registration.

Question 17

Topic: Design and Implement an MLOps Infrastructure

A machine learning team has separate Azure Machine Learning workspaces for development, validation, and production. Only models, environments, and components that pass validation can be reused in production pipelines. The team wants a repeatable way to promote approved asset versions without exporting files manually. Which registry-sharing configuration should you use?

Options:

  • A. Copy asset files from the development datastore into each production workspace datastore.

  • B. Store model binaries and environment files in GitHub and clone them into each workspace.

  • C. Register all assets directly in the production workspace from development jobs.

  • D. Publish approved asset versions to a shared Azure Machine Learning registry and grant production read access.

Best answer: D

Explanation: Azure Machine Learning registries are designed for sharing versioned assets, such as models, environments, and components, across workspaces. In this scenario, the approved asset version should be promoted to a shared registry after validation, and production pipelines should reference the registry asset version. RBAC can separate responsibilities: validation or release automation can publish approved versions, while production workspaces or service identities can consume them. This avoids manual file export, keeps asset lineage and versions explicit, and prevents development jobs from directly changing production workspace assets.

  • Datastore copying moves files but does not provide registry-level asset versioning or reusable Azure Machine Learning asset references.
  • GitHub storage is useful for source control, but it is not the right mechanism for sharing registered model, environment, and component assets.
  • Direct production registration bypasses the validation promotion boundary and couples development jobs to production asset state.

Question 18

Topic: Design and Implement a GenAIOps Infrastructure

A team is preparing a Microsoft Foundry deployment for a production RAG assistant. Traffic is consistently high during business hours, latency must be predictable, and finance wants to reduce exposure to per-token cost spikes. Development and test workloads remain low volume. Which implementation approach should the team use for the production model deployment?

Options:

  • A. Use only a serverless API endpoint for all environments

  • B. Use provisioned throughput sized from load tests

  • C. Increase prompt retries to smooth latency spikes

  • D. Fine-tune the model before choosing deployment capacity

Best answer: B

Explanation: Provisioned throughput is the operational fit when a Foundry foundation-model deployment has sustained, predictable high volume and needs stable latency and cost planning. The team should estimate required capacity from realistic load tests, deploy production with provisioned throughput units, and monitor utilization so capacity can be adjusted as demand changes. Low-volume development and test workloads can remain on lower-cost, consumption-oriented options if they do not need the same reserved capacity.

The key distinction is workload predictability: serverless-style consumption is often simpler for variable or low-volume traffic, but it does not provide the same reserved-capacity planning model for sustained production demand.

  • Serverless only is attractive for simplicity, but it leaves production exposed to consumption variability and does not reserve capacity for the steady high-volume workload.
  • Prompt retries may hide transient failures, but they can increase token usage and do not create predictable model-serving capacity.
  • Fine-tuning first may improve task quality or efficiency, but it does not replace capacity planning for production throughput and latency.

Question 19

Topic: Implement Machine Learning Model Lifecycle and Operations

An Azure Machine Learning training pipeline logs each job to MLflow. The release gate should register the run with the highest test_f1 among the runs whose p95_latency_ms is at most 80 and whose validation-to-test F1 drop is at most 0.03.

MLflow run | val_f1 | test_f1 | p95_latency_ms
run-a | 0.91 | 0.86 | 62
run-b | 0.89 | 0.88 | 74
run-c | 0.87 | 0.87 | 45
run-d | 0.90 | 0.89 | 96

Which run should the pipeline register as the candidate model?

Options:

  • A. Register run-b.

  • B. Register run-d.

  • C. Register run-c.

  • D. Register run-a.

Best answer: A

Explanation: MLflow run evidence should be compared against the stated release objective, not a single metric in isolation. The objective first filters out runs that violate operational gates: latency must be at most 80 ms, and the validation-to-test F1 drop must be no more than 0.03. run-a has a drop of 0.05, and run-d exceeds the latency limit. That leaves run-b and run-c; among those eligible runs, run-b has the higher test_f1 value. The key takeaway is to apply constraints first, then optimize the target metric among the remaining candidates.

  • Best validation score fails because run-a drops from 0.91 to 0.86, exceeding the allowed generalization gap.
  • Lowest latency fails because run-c meets the gates but has lower test_f1 than another eligible run.
  • Best test score overall fails because run-d violates the stated latency gate.
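
The "apply constraints first, then optimize" reasoning can be sketched in a few lines of Python. This is an illustrative sketch only: the `Run` class and the helper names are assumptions made for this example, not part of MLflow or Azure Machine Learning, and the values are copied from the table in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    val_f1: float
    test_f1: float
    p95_latency_ms: float


MAX_LATENCY_MS = 80   # latency gate from the question
MAX_F1_DROP = 0.03    # allowed validation-to-test F1 drop


def passes_gates(run: Run) -> bool:
    # Round the difference to avoid floating-point noise near the threshold.
    drop = round(run.val_f1 - run.test_f1, 6)
    return run.p95_latency_ms <= MAX_LATENCY_MS and drop <= MAX_F1_DROP


def select_candidate(runs):
    # Filter by the gates first, then pick the best test_f1 among what remains.
    eligible = [r for r in runs if passes_gates(r)]
    return max(eligible, key=lambda r: r.test_f1, default=None)


runs = [
    Run("run-a", 0.91, 0.86, 62),
    Run("run-b", 0.89, 0.88, 74),
    Run("run-c", 0.87, 0.87, 45),
    Run("run-d", 0.90, 0.89, 96),
]

candidate = select_candidate(runs)
print(candidate.name)  # run-b
```

Filtering before maximizing is what keeps run-d, the best raw test score, out of the comparison.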

Question 20

Topic: Design and Implement a GenAIOps Infrastructure

An operations team is preparing a Microsoft Foundry chat deployment for a high-volume launch. The model responds correctly in a smoke test, but production has a stated load target.

Requirement | Value
Peak traffic | 900 requests/minute
Average tokens | 1,200 tokens/request
p95 latency target | 2 seconds
Capacity estimate | 45 PTUs

Which configuration best validates production readiness?

Options:

  • A. Use a serverless endpoint and run a health probe

  • B. Allocate 45 PTUs and run a production-shaped load test

  • C. Increase client retries and keep the current deployment

  • D. Allocate 45 PTUs and skip load validation

Best answer: B

Explanation: Provisioned throughput planning for high-volume Foundry workloads must be tied to the expected production traffic pattern, not just to whether the model endpoint responds. The stem provides a peak request rate, average token volume, latency target, and a PTU estimate. The readiness decision should configure the model deployment with the estimated PTU allocation and validate it with load that resembles production, measuring latency and throttling under that load. A smoke test or health probe only proves that the deployment is reachable. Retries can hide transient failures but do not create capacity and may worsen latency during saturation.

  • Health probe only fails because it proves model availability, not sustained throughput at 900 requests/minute.
  • Allocation without testing fails because PTU sizing still needs validation against latency and throttling targets.
  • Client retries fail because retries do not add provisioned capacity and can increase load during bottlenecks.
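
To make the load target concrete, the stem's numbers can be turned into a token rate and a simple pass/fail check for a load-test run. This is a rough sketch under stated assumptions: the helper names, the nearest-rank percentile method, the "zero throttled responses" criterion, and the sample latencies are all hypothetical choices for illustration, not a Foundry API or real measurements.

```python
# Values taken from the requirement table in the question.
peak_requests_per_minute = 900
avg_tokens_per_request = 1_200

tokens_per_minute = peak_requests_per_minute * avg_tokens_per_request
print(tokens_per_minute)  # 1080000


def p95(values):
    # Nearest-rank 95th percentile, using integer arithmetic for the rank.
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def load_test_passes(latencies_s, throttled_responses, target_s=2.0):
    # Assumed criteria: p95 latency within target and no throttled responses.
    return p95(latencies_s) <= target_s and throttled_responses == 0


# Hypothetical samples, for illustration only.
healthy = [0.8] * 95 + [2.5] * 5
saturated = [0.8] * 90 + [2.5] * 10
print(load_test_passes(healthy, throttled_responses=0))    # True
print(load_test_passes(saturated, throttled_responses=0))  # False
```

A health probe exercises none of this: it sends one request, so it cannot show whether the deployment sustains the full token rate within the latency target.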

Question 21

Topic: Design and Implement a GenAIOps Infrastructure

A GenAIOps team must provision the same Microsoft Foundry resource, project environment settings, managed identity, RBAC assignments, and private networking in development, test, and production subscriptions. The team wants repeatable deployments from source control with environment-specific values supplied at release time. Which configuration choice best meets the requirement?

Options:

  • A. Use prompt versioning to recreate project settings

  • B. Use parameterized Bicep templates deployed by Azure CLI

  • C. Create each Foundry project manually in the portal

  • D. Store setup steps in a shared runbook document

Best answer: B

Explanation: Repeatable Foundry infrastructure provisioning should use infrastructure as code. A Bicep template can define the Microsoft Foundry resource configuration, project environment settings, managed identities, RBAC assignments, and networking in a source-controlled, reviewable form. Parameters let the same template deploy to development, test, and production with different names, subscriptions, or network values while keeping the intended configuration consistent.

Manual portal setup and runbooks can describe the process, but they are more prone to drift and are harder to validate in pull requests. Prompt versioning is useful for tracking prompt assets, not provisioning Foundry infrastructure.

  • Manual portal setup fails because it does not provide repeatable, source-controlled infrastructure deployment.
  • Runbook documentation may standardize instructions, but it does not enforce the desired Foundry configuration.
  • Prompt versioning manages prompt changes, not Foundry resources, identities, RBAC, or networking.
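
As a sketch of how one template can serve several environments, the snippet below builds the Azure CLI arguments for each environment without running them. The subscription placeholders, resource group names, the `main.bicep` file name, and the `environmentName` parameter are all hypothetical; only the `az deployment group create` command and its `--subscription`, `--resource-group`, `--template-file`, and `--parameters` flags are standard Azure CLI.

```python
# Hypothetical per-environment settings supplied at release time.
ENVIRONMENTS = {
    "dev": {"subscription": "<dev-subscription-id>", "resource_group": "rg-foundry-dev"},
    "test": {"subscription": "<test-subscription-id>", "resource_group": "rg-foundry-test"},
    "prod": {"subscription": "<prod-subscription-id>", "resource_group": "rg-foundry-prod"},
}


def build_deploy_command(env, template="main.bicep"):
    # Same template for every environment; only the parameter values differ.
    cfg = ENVIRONMENTS[env]
    return [
        "az", "deployment", "group", "create",
        "--subscription", cfg["subscription"],
        "--resource-group", cfg["resource_group"],
        "--template-file", template,
        "--parameters", f"environmentName={env}",
    ]


for env in ENVIRONMENTS:
    print(" ".join(build_deploy_command(env)))
```

Because the template path is identical in every command, any configuration difference between environments has to come from a reviewed parameter value rather than from a manual portal change.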

Question 22

Topic: Implement Machine Learning Model Lifecycle and Operations

An MLOps team uses Azure Machine Learning to train a PyTorch vision model. The training script already supports torch.distributed and logs metrics and artifacts with MLflow. A single GPU node cannot meet the training-time target. You must keep one coordinated training run with synchronized gradients and register one resulting model. Which implementation should you use?

Options:

  • A. Launch independent training jobs and select the lowest-loss run.

  • B. Deploy the model to a managed online endpoint with multiple instances.

  • C. Create parallel pipeline components for separate epoch ranges.

  • D. Submit a distributed command job on a multi-node GPU compute cluster.

Best answer: D

Explanation: For a large or deep learning model that exceeds a single node, Azure Machine Learning should run a distributed training job on a compute cluster that can allocate multiple GPU nodes. Because the script already supports torch.distributed, the job configuration should specify the appropriate distribution settings, such as PyTorch distribution, process count per instance, and instance count. This keeps the workers in one coordinated training run, allows synchronized gradient updates, and preserves MLflow tracking and model artifact logging for registration. Parallel pipelines and independent jobs can run multiple tasks, but they do not automatically provide distributed gradient synchronization for one training run.

  • Pipeline split fails because separate components do not create one synchronized distributed training process.
  • Endpoint scaling fails because endpoint instances affect inference serving, not model training.
  • Independent jobs fail because selecting the best run is experiment comparison, not distributed training with shared gradients.
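
The distribution settings named in the explanation can be pictured as a small job specification. The dictionary below is a hedged sketch: its layout follows the general shape of an Azure Machine Learning command job definition (`resources.instance_count`, `distribution.type`, `distribution.process_count_per_instance`), but the compute name, the script name, and the specific counts are assumptions for illustration, and nothing here submits a job.

```python
# Hypothetical job specification; values are illustrative only.
job_spec = {
    "command": "python train.py",
    "compute": "azureml:gpu-cluster",
    "resources": {"instance_count": 2},  # number of GPU nodes
    "distribution": {
        "type": "pytorch",
        "process_count_per_instance": 4,  # typically one process per GPU
    },
}


def world_size(spec):
    # Total worker processes that join the single torch.distributed run.
    nodes = spec["resources"]["instance_count"]
    per_node = spec["distribution"]["process_count_per_instance"]
    return nodes * per_node


print(world_size(job_spec))  # 8
```

All eight processes belong to one job and one MLflow run, which is what distinguishes this from launching eight independent training jobs.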

Question 23

Topic: Optimize Generative AI Systems and Model Performance

A team fine-tuned a Microsoft Foundry model for customer support and approved support-ft:4 for production after evaluation. After release, quality alerts continue to match the previous version.

Evidence:

Source | Evidence
Dev evaluation | support-ft:4, groundedness 0.86, avg tokens 740
Release tag | expected model support-ft:4, prompt ticket-summary:12
Production endpoint | traffic 100% to deployment using support-ft:3
Production traces | model support-ft:3, groundedness 0.62, avg tokens 1,250

What is the best root cause?

Options:

  • A. Production is still serving the previous fine-tuned model version.

  • B. The prompt version was not promoted with the model.

  • C. The evaluation dataset is too small for approval.

  • D. The endpoint needs more provisioned throughput units.

Best answer: A

Explanation: Versioning evidence should connect the approved fine-tuned model, release artifact, deployed endpoint, and production traces. Here, development evaluation approved support-ft:4, and the release tag also expected support-ft:4. However, the production endpoint routes all traffic to a deployment using support-ft:3, and production traces confirm that requests are being served by support-ft:3. The monitoring symptoms are therefore tied to the old model, not to the evaluated production candidate.

The next operational fix would be to update or roll out the production deployment so that traffic is routed to the approved fine-tuned model version, then continue monitoring quality and token metrics for that version.

  • Prompt mismatch fails because the release evidence shows prompt ticket-summary:12, and no production prompt mismatch is shown.
  • Dataset concern is unsupported because the visible failure is a deployed-version mismatch, not an evaluation-design issue.
  • Throughput scaling does not explain why both the endpoint and traces identify the old model version.
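
The evidence chain can be checked mechanically by comparing the approved version with what each production source reports. This is a minimal sketch; the function name and the dictionary layout are assumptions for this example, and the values are copied from the evidence table.

```python
def find_version_mismatches(expected_model, observed):
    # Return every evidence source that reports a model other than the approved one.
    return {source: model for source, model in observed.items() if model != expected_model}


approved_model = "support-ft:4"  # from the dev evaluation and release tag
observed = {
    "production endpoint": "support-ft:3",
    "production traces": "support-ft:3",
}

mismatches = find_version_mismatches(approved_model, observed)
print(mismatches)
# {'production endpoint': 'support-ft:3', 'production traces': 'support-ft:3'}
```

A check like this could run as a post-release verification step so that a stale deployment is caught before quality alerts accumulate.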

Question 24

Topic: Implement Generative AI Quality Assurance and Observability

A team uses Microsoft Foundry to deploy a customer-support copilot. A new prompt variant passes relevance, coherence, latency, and token-cost targets, but the risk and safety evaluation reports repeatable unsafe completions for adversarial test cases. The release workflow currently promotes variants automatically when performance metrics pass.

Which configuration change should the AI operations engineer implement?

Options:

  • A. Increase provisioned throughput for the model deployment

  • B. Tune chunk size and similarity threshold for retrieval

  • C. Add a risk-and-safety gate that blocks promotion for review

  • D. Promote the variant and monitor latency after release

Best answer: C

Explanation: Risk and safety evaluation findings should be treated differently from ordinary performance issues. If a prompt variant produces repeatable unsafe completions, the release workflow should prevent automatic promotion and require review of the prompt, selected model, guardrails/content filters, or deployment settings. Metrics such as latency, throughput, token consumption, relevance, and coherence can guide performance tuning, but they do not override a safety failure.

The key distinction is that safety issues are release-blocking quality risks, not optimization targets to tune around after deployment.

  • More throughput addresses capacity and latency, not unsafe content behavior.
  • Retrieval tuning may improve relevance or groundedness, but the stem identifies unsafe completions from adversarial safety tests.
  • Post-release monitoring is too late because the evaluation already found a repeatable safety issue before promotion.
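
The gate ordering described above can be sketched as a small decision function. The function name, the return labels, and the "any unsafe completion blocks promotion" rule are assumptions chosen for illustration; a real workflow would take its threshold from the team's release policy.

```python
def promotion_decision(performance_passed, unsafe_completions):
    # Safety is evaluated first: passing performance metrics cannot override it.
    if unsafe_completions > 0:
        return "blocked_for_review"
    return "promote" if performance_passed else "rejected"


# The variant in the question: performance targets met, unsafe completions found.
print(promotion_decision(performance_passed=True, unsafe_completions=3))  # blocked_for_review
print(promotion_decision(performance_passed=True, unsafe_completions=0))  # promote
```

Placing the safety check before the performance check encodes the rule that a safety failure is a blocker, not a metric to trade off.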

Question 25

Topic: Design and Implement a GenAIOps Infrastructure

A team deployed a foundation model in Microsoft Foundry for a production agent workflow. The model works in the Foundry playground, but the agent workflow fails before any prompt steps run.

Diagnostic evidence:

Model deployment name: contoso-prod-gpt4o
Deployment status: Succeeded
Endpoint configured in agent: https://contoso-foundry-prod.example/models
Deployment configured in agent: contoso-gpt4o-prod
Trace status: 404 DeploymentNotFound

What is the most likely root cause?

Options:

  • A. The agent references the wrong deployment name.

  • B. The agent managed identity lacks RBAC access.

  • C. The content safety filter blocked the prompt.

  • D. The deployment needs more provisioned throughput.

Best answer: A

Explanation: Validating model consumption means checking that the intended application or agent can call the exact deployed foundation model endpoint with the correct deployment identifier and identity. In this case, the endpoint matches the production Foundry resource, and the deployment is in a succeeded state. The visible failure is a 404 DeploymentNotFound, and the configured deployment name in the agent differs from the actual model deployment name. That points to an application configuration mismatch, not a model availability or quality issue. The next fix is to update the agent workflow to reference contoso-prod-gpt4o and rerun a smoke test from the agent context.

  • RBAC issue would typically surface as an authentication or authorization failure such as 401 or 403, not a missing deployment.
  • Safety filtering happens after a model call is routed, so it does not explain a deployment lookup failure.
  • Throughput capacity problems usually appear as throttling, latency, or quota-related errors, not DeploymentNotFound.
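
A pre-flight check in the agent's release pipeline could catch this kind of mismatch before any call is made. The sketch below uses only the Python standard library; the function name and message wording are assumptions, and the list of existing deployments is taken from the diagnostic evidence rather than from a live API call.

```python
import difflib


def check_deployment_name(configured, existing):
    # Exact match means the agent configuration is consistent.
    if configured in existing:
        return "ok"
    # Otherwise suggest the closest existing name, if any is similar enough.
    suggestions = difflib.get_close_matches(configured, existing, n=1)
    if suggestions:
        return f"not found; did you mean {suggestions[0]}?"
    return "not found"


existing_deployments = ["contoso-prod-gpt4o"]  # from the deployment evidence
print(check_deployment_name("contoso-gpt4o-prod", existing_deployments))
# not found; did you mean contoso-prod-gpt4o?
```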

Questions 26-50

Question 26

Topic: Implement Machine Learning Model Lifecycle and Operations

An Azure Machine Learning real-time endpoint started failing a post-deployment quality gate immediately after a model promotion. The team must roll back quickly and preserve an audit trail showing the exact model artifact that served traffic.

Registered model | Version | Source run | Release state
fraud-risk | 17 | run-744 | Previous production; gate passed
fraud-risk | 18 | run-811 | Current endpoint; gate failed

Endpoint deployment model reference: azureml:fraud-risk:18

What is the best next model-versioning action?

Options:

  • A. Register run-811 again as version 19.

  • B. Redeploy version 17 and archive version 18.

  • C. Retag run-744 as production only.

  • D. Overwrite version 18 with artifacts from run-744.

Best answer: B

Explanation: Registered model versions provide the audit boundary for rollback and promotion decisions. The endpoint is explicitly serving fraud-risk version 18, and the evidence shows that version 17 was the previous production version that passed the gate. A rollback should point the deployment back to the known-good registered version rather than altering artifacts in place. Archiving or demoting the failed version keeps its lineage available for investigation while reducing the chance it is promoted again.

The key takeaway is that rollback should move traffic to a different registered model version, not mutate the failed version.

  • Overwriting artifacts breaks auditability because the same version number would represent two different model contents.
  • Changing tags only does not change the endpoint model reference, so traffic would still serve version 18.
  • Registering the failed run again creates another version of the same failed artifact and does not roll back service traffic.

Question 27

Topic: Design and Implement an MLOps Infrastructure

A team uses an Azure Machine Learning pipeline to train a fraud model every week. The source files in an Azure Blob datastore are overwritten during nightly ingestion, but auditors must be able to rerun any released training job with the exact input snapshot used for that run. Which implementation should the team use?

Options:

  • A. Store the dataset version in the model description after registration.

  • B. Reference the datastore folder path directly in the pipeline.

  • C. Create a versioned data asset for each approved snapshot and reference asset_name:version in the pipeline.

  • D. Reference the data asset by name only so the pipeline uses the latest version.

Best answer: C

Explanation: Azure Machine Learning data assets provide a managed, versioned reference to data used by jobs and pipelines. For reproducibility, the training workflow should consume a specific data asset version, such as fraud_train:12, rather than a mutable datastore path or an unpinned asset name. This preserves the operational contract that a released model can be traced back to the intended input data. The underlying storage still needs appropriate retention, but the pipeline dependency should be expressed as a pinned data asset version.

  • Direct datastore path fails because overwritten files can change what a rerun reads.
  • Latest asset version fails because future registrations can silently change the pipeline input.
  • Model description metadata helps traceability, but it does not make the training job consume the intended data snapshot.
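
One way to enforce pinned inputs is a lint step that rejects any pipeline input that is not an explicit `azureml:name:version` reference. This is a sketch under assumptions: the regular expression is a simplified approximation of asset naming rules, the helper name is hypothetical, and the datastore path and `@latest` reference are illustrative examples of unpinned inputs.

```python
import re

# Simplified pattern for azureml:<asset_name>:<version>; real naming rules may differ.
PINNED_ASSET = re.compile(r"^azureml:[A-Za-z0-9_.\-]+:[A-Za-z0-9_.\-]+$")


def is_pinned_data_asset(reference):
    return bool(PINNED_ASSET.match(reference))


print(is_pinned_data_asset("azureml:fraud_train:12"))       # True
print(is_pinned_data_asset("azureml:fraud_train@latest"))   # False
print(is_pinned_data_asset("azureml://datastores/blob/paths/fraud/train/"))  # False
```

Running such a check in pull-request validation keeps mutable paths and "latest" references out of released pipelines.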

Question 28

Topic: Design and Implement a GenAIOps Infrastructure

A GenAIOps team deployed a chat workload in Microsoft Foundry. The internal web app authenticates by using a managed identity with RBAC. A security test shows the model deployment endpoint still resolves publicly and accepts requests from outside the corporate network when a valid token is used. The workload must avoid unnecessary public exposure and be reachable only from the application VNet.

What is the best diagnostic conclusion?

Options:

  • A. Private Link is missing for the Foundry endpoint.

  • B. The prompt version was not pinned in Git.

  • C. Provisioned throughput units are undersized.

  • D. The managed identity lacks model deployment permissions.

Best answer: A

Explanation: Managed identity and RBAC are identity controls; they do not by themselves remove public network reachability. For a Foundry workload that must avoid unnecessary public exposure, the private-access pattern is to place access behind Azure Private Link/private endpoints from the application VNet and disable public network access where supported. The visible symptom is not an authentication failure, capacity issue, or prompt mismatch because calls succeed with a valid token from outside the intended network. The key diagnostic distinction is identity authorization versus network isolation.

  • Identity-only control fails because permissions can allow valid callers even when the endpoint is still publicly reachable.
  • Capacity sizing does not explain a public endpoint accepting authorized requests from an external network.
  • Prompt versioning affects reproducibility of prompts, not whether the Foundry endpoint is exposed publicly.

Question 29

Topic: Design and Implement an MLOps Infrastructure

An Azure Machine Learning training pipeline was rerun after a failed downstream step. The training step completed, but the model metrics changed significantly. The run history shows the same code commit, environment version, and compute target for both runs.

Exhibit: Training input reference

inputs:
  training_data:
    type: uri_folder
    path: azureml://datastores/landing/paths/customer-churn/train/

The storage team confirms that files under customer-churn/train/ are refreshed nightly. What is the best root cause?

Options:

  • A. The pipeline references a mutable datastore path

  • B. The compute target reused cached training outputs

  • C. The environment asset version changed between runs

  • D. The model was registered before evaluation completed

Best answer: A

Explanation: The core issue is repeatability of training data references. In Azure Machine Learning, a datastore path is a storage location reference; if files at that path are overwritten or refreshed, the same pipeline definition can read different data on a later run. To make training inputs repeatable across jobs and pipelines, create a versioned data asset (for example, a uri_folder data asset) for the approved training snapshot and reference that specific asset version. The visible evidence rules out code, environment, and compute changes, while the nightly refresh explains why metrics changed after rerun. A data asset version creates a stable contract for the pipeline input, even when the underlying storage area continues to receive new data.

  • Compute cache is not supported by the evidence because the symptom is changed input data, not reused outputs.
  • Environment drift is ruled out because the run history shows the same environment version.
  • Registration timing does not explain why the training step itself produced different metrics from the same code.

Question 30

Topic: Design and Implement an MLOps Infrastructure

A team connects an Azure Machine Learning workspace to a GitHub repository that stores environment and component YAML files. After a pull request is merged, Azure ML studio shows the new commit in the repository history and the file diffs are correct, but no Azure ML job or asset update occurs. The GitHub repository’s Actions page shows no run for the merge commit.

What is the best next diagnostic step?

Options:

  • A. Check the GitHub Actions workflow trigger and enablement

  • B. Reconnect the repository to restore Git history

  • C. Create a new Azure Machine Learning datastore

  • D. Register the YAML files manually as workspace assets

Best answer: A

Explanation: Git source control and GitHub Actions serve different purposes in an MLOps setup. Git records version history, branches, pull requests, and file diffs. GitHub Actions executes automation, such as validating YAML, provisioning resources, registering assets, or submitting Azure Machine Learning jobs after a merge. In this scenario, the commit and diffs are visible, so source control integration is functioning. The missing evidence is workflow execution: the Actions page has no run for the merge commit. The next diagnostic step is to inspect whether a workflow exists, is enabled, and has a trigger that matches the merge event and branch.

A repository can have correct version history without any automation running.

  • Repo reconnect is unnecessary because the commit history and diffs are already visible.
  • New datastore does not address why no automation started after the merge.
  • Manual registration bypasses the automation gap instead of diagnosing why the workflow did not execute.

Question 31

Topic: Optimize Generative AI Systems and Model Performance

A team uses Microsoft Foundry for a support assistant. RAG retrieval is accurate, but evaluations show low coherence and inconsistent use of the company’s required troubleshooting format. The team has 200 reviewed examples, can create SME-reviewed synthetic examples, and must avoid full base-model retraining. Which implementation best fits the requirement?

Options:

  • A. Replace fine-tuning with a longer system prompt only

  • B. Continue pretraining on all support chat logs without review

  • C. Run parameter-efficient supervised fine-tuning with curated and synthetic examples

  • D. Lower the RAG similarity threshold to return more documents

Best answer: C

Explanation: The requirement is behavior and domain-format adaptation, not retrieval improvement. Because RAG already retrieves the right content, tuning chunking or similarity is unlikely to fix inconsistent answer structure. A parameter-efficient supervised fine-tuning approach uses task-specific prompt-response examples to adapt style, terminology, and output format without retraining the entire foundation model. SME-reviewed synthetic examples can expand coverage when labeled data is limited, but they should be curated and evaluated before deployment.

The operational pattern is to create a versioned training dataset, run the fine-tuning job, register or version the resulting model deployment, and compare it against the baseline with quality and safety evaluations.

  • RAG threshold tuning targets retrieval recall, but the stem says retrieval is already accurate.
  • Prompt-only change may help, but it is less suitable when consistent learned behavior is required across many cases.
  • Unreviewed chat logs risk low-quality, unsafe, or noncompliant training data and do not preserve the stated curation constraint.

Question 32

Topic: Implement Machine Learning Model Lifecycle and Operations

A team trains a classification model in Azure Machine Learning. Before a model can be promoted, reviewers must verify the training inputs, hyperparameters, evaluation metrics, logs, and the exact artifact produced by the job. The team also wants enough evidence to troubleshoot failed or degraded runs later. Which configuration should the engineer implement?

Options:

  • A. Store the training notebook in GitHub and review commit history.

  • B. Configure MLflow tracking in the training job and log parameters, metrics, artifacts, and run metadata.

  • C. Register only the final model artifact in the workspace model registry.

  • D. Save evaluation metrics to a local CSV file on the training compute.

Best answer: B

Explanation: Training job evidence should be captured with MLflow experiment tracking in Azure Machine Learning so each run keeps comparable, queryable records. The job should log key parameters, metrics, artifacts such as plots or evaluation files, logs, and useful metadata such as code, data, and environment references. This creates a durable link between the produced model artifact and the run that generated it, which supports promotion gates, later evaluation, and troubleshooting. Registering a model is important later, but it does not by itself preserve the full evidence trail for why that model should be promoted or how it was produced.

  • Model-only registration misses the broader run evidence needed to compare training inputs, parameters, metrics, and logs.
  • Local CSV storage is fragile because reviewers may not have durable, centralized access after the compute job ends.
  • Git commit history helps with source control, but it does not capture run metrics, artifacts, or job execution evidence.

Question 33

Topic: Implement Machine Learning Model Lifecycle and Operations

An Azure Machine Learning pipeline registers a fraud detection model only after validation gates pass. The team requires production release to be blocked when responsible AI evidence shows unmitigated harm, even if accuracy targets are met.

Validation results for the latest run:

Evidence | Result
AUC | 0.91, target met
Error analysis | False-negative rate is 2.8x higher for one protected group
Mitigation record | No mitigation or business sign-off attached
Data drift check | Within threshold

Which implementation should the MLOps engineer apply?

Options:

  • A. Keep the model in validation and require mitigation evidence before registration

  • B. Register the model with a lower production traffic percentage

  • C. Register the model because AUC and drift checks passed

  • D. Deploy to a small canary endpoint and monitor complaints

Best answer: A

Explanation: Responsible AI evaluation is a pre-deployment quality gate, not just a post-deployment monitoring activity. In this scenario, the aggregate model metric meets the target, but error analysis shows a materially worse false-negative rate for a protected group, and there is no mitigation record or approved exception. The operational behavior should keep the model in validation until the team provides mitigation evidence, a revised evaluation, or an explicit governance sign-off required by policy. Passing data drift and AUC checks does not override unresolved responsible AI evidence.

A safe release process treats subgroup harm as a blocker when the release criteria say responsible AI evidence must support production use.

  • Aggregate metric pass fails because AUC does not prove equitable performance across protected groups.
  • Canary monitoring fails because the release policy requires pre-deployment evidence before exposing users to the model.
  • Reduced traffic fails because lowering traffic does not resolve or document the unmitigated responsible AI issue.

Question 34

Topic: Implement Generative AI Quality Assurance and Observability

A team operates a Microsoft Foundry chat application in production. They need a monitoring configuration that can answer these questions: how long each user request takes, how many requests the deployment handles per minute, why a specific response used an unexpected retrieved passage, and which requests drive token-related cost. Which configuration should you apply?

Options:

  • A. Enable only token-usage totals and provisioned throughput allocation

  • B. Enable only aggregate latency and CPU resource-usage metrics

  • C. Enable only application error logs and model quality scores

  • D. Enable response-time metrics, throughput metrics, traces, and token-usage logging

Best answer: D

Explanation: Continuous monitoring for generative AI systems should collect evidence that matches the operational question. Response time or request latency shows how long an individual request takes from the user or service perspective. Throughput measures volume over time, such as requests per minute. Traces show the step-by-step execution path for a specific request, including retrieval, prompt construction, model calls, and tool calls, which supports debugging unexpected outputs. Token usage identifies prompt and completion token consumption that affects cost. Resource usage, such as CPU or provisioned capacity utilization, is useful for infrastructure pressure but does not explain retrieved context or per-request token cost by itself.

  • Aggregate latency alone misses the per-request traces and token evidence needed for debugging and cost attribution.
  • Error logs alone can show failures, but they do not provide throughput, token consumption, or the full request path.
  • PTU allocation alone relates to capacity planning, not the observed evidence for specific responses and token-driven cost.
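
The distinction between throughput and per-request token cost can be sketched in a few lines of Python. This is a minimal illustration, not a Foundry schema: the `RequestRecord` fields, the sample values, and the flat `price_per_1k_tokens` are all assumptions made for the example. Traces are a separate signal and are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical per-request log record; field names are illustrative only."""
    request_id: str
    timestamp_s: float       # arrival time in seconds
    latency_ms: float        # response time for this request
    prompt_tokens: int
    completion_tokens: int


def requests_per_minute(records):
    """Throughput: request count divided by the observed window in minutes."""
    if not records:
        return 0.0
    times = [r.timestamp_s for r in records]
    # Guard against a zero-length window by assuming at least one second.
    window_min = max((max(times) - min(times)) / 60.0, 1.0 / 60.0)
    return len(records) / window_min


def token_cost(record, price_per_1k_tokens=0.01):
    """Token-driven cost for one request, using an assumed flat price."""
    total = record.prompt_tokens + record.completion_tokens
    return total / 1000.0 * price_per_1k_tokens


def costliest_requests(records, top_n=1):
    """Rank requests by token cost so cost can be attributed per request."""
    return sorted(records, key=token_cost, reverse=True)[:top_n]


records = [
    RequestRecord("r1", 0.0, 820.0, 1200, 300),
    RequestRecord("r2", 30.0, 450.0, 400, 100),
    RequestRecord("r3", 60.0, 1900.0, 6000, 900),
]
```

Here `requests_per_minute(records)` answers the volume question, `latency_ms` on each record answers the response-time question, and `costliest_requests(records)` points at the request that drives token cost. None of these explains why a specific passage was retrieved, which is why traces are still required.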

Question 35

Topic: Implement Machine Learning Model Lifecycle and Operations

A team has registered a churn prediction model in an Azure Machine Learning workspace. The model must score large files uploaded each night, write predictions back to storage, and does not require synchronous request/response latency. The team also wants to avoid managing Kubernetes infrastructure.

Which managed inference option should you configure?

Options:

  • A. Deploy the model to a batch endpoint

  • B. Deploy the model to a Kubernetes online endpoint

  • C. Run the model from a scheduled notebook

  • D. Deploy the model to a managed online endpoint

Best answer: A

Explanation: Azure Machine Learning batch endpoints are the managed inference option for asynchronous, large-scale scoring jobs. They are appropriate when input data arrives as files or datasets, predictions can be written to storage, and there is no need for low-latency real-time responses. Managed online endpoints are better for real-time APIs that serve one request at a time with strict latency expectations. Kubernetes online endpoints can support custom hosting requirements, but they introduce Kubernetes infrastructure management. A scheduled notebook can run scoring code, but it is not the managed endpoint deployment pattern for serving a registered model.

  • Online endpoint mismatch fails because the workload does not need synchronous low-latency API responses.
  • Kubernetes overhead fails because the team explicitly wants to avoid managing Kubernetes infrastructure.
  • Notebook scheduling fails because it is an operational workaround, not a managed inference endpoint for serving the registered model.

Question 36

Topic: Implement Machine Learning Model Lifecycle and Operations

An Azure Machine Learning team trains a churn model using customer features from a feature store. The managed online endpoint will receive only customer IDs and must retrieve features at scoring time. Operations require the endpoint to use the same feature sets and versions validated during training, even if newer feature versions are added later.

What should you implement before registering the model?

Options:

  • A. Keep feature lookup logic only in the Git repository.

  • B. Query the latest feature set versions at runtime.

  • C. Package the feature retrieval specification with the model artifact.

  • D. Register the training dataset as a versioned data asset.

Best answer: C

Explanation: When production inference depends on feature retrieval, the model package should include the feature retrieval specification used during training and validation. That specification captures the feature sets, versions, entity keys, and lookup details needed for scoring. Packaging it with the model artifact makes the registered model self-contained enough for deployment and helps the endpoint retrieve features consistently with the model’s training assumptions. A versioned dataset can preserve training inputs, but it does not define the live scoring lookup contract. Querying latest feature versions or relying only on external source control can introduce drift between the validated model and production feature access.

  • Versioned dataset preserves training data but does not tell the endpoint how to retrieve live features for scoring.
  • Querying the latest feature versions breaks the requirement to use the same validated feature sets and versions.
  • Git-only lookup logic separates the retrieval contract from the registered model artifact used for deployment.
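
To see why a packaged specification keeps scoring consistent, consider the following sketch. Everything in it is hypothetical: the in-memory `FEATURE_STORE`, the `RETRIEVAL_SPEC` dictionary layout, and both helper functions stand in for the real feature store and specification format, which this page does not show.

```python
# Hypothetical in-memory stand-in for a feature store: name -> {version: columns}.
FEATURE_STORE = {
    "customer_activity": {"1": ["logins_30d"], "2": ["logins_30d", "sessions_7d"]},
    "billing": {"3": ["avg_invoice"]},
}

# Illustrative retrieval spec captured at training time and shipped with the model.
RETRIEVAL_SPEC = {
    "entity_key": "customer_id",
    "feature_sets": [
        {"name": "customer_activity", "version": "1"},
        {"name": "billing", "version": "3"},
    ],
}


def resolve_pinned(spec, store):
    """Return the columns for exactly the versions named in the spec."""
    columns = []
    for feature_set in spec["feature_sets"]:
        columns.extend(store[feature_set["name"]][feature_set["version"]])
    return columns


def resolve_latest(store, names):
    """Anti-pattern shown for comparison: always take the newest version."""
    columns = []
    for name in names:
        latest = max(store[name], key=int)
        columns.extend(store[name][latest])
    return columns
```

With the pinned spec, the endpoint keeps requesting `logins_30d` and `avg_invoice` even after version 2 of `customer_activity` is published. The latest-version lookup silently adds `sessions_7d`, a column the model was never validated against.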

Question 37

Topic: Design and Implement an MLOps Infrastructure

A GitHub Actions workflow submits an Azure Machine Learning pipeline to the ml-prod workspace. The same YAML succeeds in ml-dev, but fails in ml-prod during validation.

component: azureml:preprocess:3
error: Component preprocess version 3 was not found in workspace ml-prod
note: preprocess:3 exists in workspace ml-dev

What is the most likely root cause?

Options:

  • A. The production datastore lacks read permission for training data.

  • B. The pipeline needs MLflow model registration before validation.

  • C. The component is only a local workspace asset in ml-dev.

  • D. The production compute target cannot pull the component image.

Best answer: C

Explanation: Azure Machine Learning workspace assets, such as components, are scoped to the workspace where they are registered unless they are published to a registry or separately registered in another workspace. The evidence shows validation fails before execution because ml-prod cannot resolve azureml:preprocess:3, while the same component version exists only in ml-dev. For reusable components across workspaces, use an Azure Machine Learning registry reference or ensure the component is registered in each target workspace. Compute, datastore, and MLflow model issues would appear later or involve different resource types.

  • Compute image pull is not supported by the evidence because validation cannot find the component asset before compute starts.
  • Datastore permission could break data access, but the visible error is about resolving a component version.
  • MLflow registration applies to model tracking and registration, not making a pipeline component available in another workspace.
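
The difference between a workspace-scoped reference and a registry reference is visible in the identifier itself. The helper below is a small sketch that rewrites the short form from the exhibit into a registry-scoped URI. The function name and the registry name `shared-ml-registry` are assumptions for illustration, and the component would still need to be published to that registry first.

```python
def to_registry_reference(workspace_ref: str, registry: str) -> str:
    """Rewrite 'azureml:<name>:<version>' as a registry-scoped component URI."""
    prefix, name, version = workspace_ref.split(":")
    if prefix != "azureml":
        raise ValueError(f"not a workspace asset reference: {workspace_ref}")
    return f"azureml://registries/{registry}/components/{name}/versions/{version}"


# 'shared-ml-registry' is a hypothetical registry name.
registry_ref = to_registry_reference("azureml:preprocess:3", "shared-ml-registry")
```

A pipeline that uses the registry form resolves the same component from ml-dev and ml-prod, because the lookup no longer depends on what is registered in the submitting workspace.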

Question 38

Topic: Optimize Generative AI Systems and Model Performance

A team operates a Microsoft Foundry RAG assistant for internal policy questions. Recent evaluations show low groundedness. The production foundation model deployment and prompt are locked for this release; only retrieval settings can change. The team must prove that any optimization improves answer quality before rollout. Which implementation should you use?

Options:

  • A. Promote the lowest similarity threshold based on higher retrieval counts.

  • B. Fine-tune the foundation model on the policy documents.

  • C. Switch to a larger foundation model deployment for evaluation.

  • D. Compare retrieval variants with a fixed model and mapped evaluation dataset.

Best answer: D

Explanation: RAG retrieval optimization should be validated by isolating retrieval changes from model and prompt changes. In Microsoft Foundry, use the same deployed model and prompt, create candidate retrieval configurations such as different chunk sizes, similarity thresholds, or hybrid search settings, and run the same mapped evaluation dataset against each variant. Compare answer-quality metrics such as groundedness and relevance, and optionally check operational metrics such as latency and token consumption. This proves whether retrieval changes improved answer quality rather than masking the result with a different model. Retrieval volume alone is not enough because more retrieved chunks can add noise and reduce groundedness.

  • Fine-tuning violates the release constraint because it changes model behavior instead of isolating retrieval optimization.
  • Larger model swap can improve results for unrelated reasons, so it does not validate the retrieval change.
  • Retrieval count only measures volume, not whether generated answers are grounded or relevant.

Question 39

Topic: Implement Generative AI Quality Assurance and Observability

A team is configuring an automated evaluation workflow in Microsoft Foundry for a RAG-based support assistant. Pilot users report that responses are easy to read, but some answers include claims that are not supported by the retrieved product documentation. Which evaluation metric should be configured as the primary quality gate?

Options:

  • A. Groundedness

  • B. Relevance

  • C. Coherence

  • D. Fluency

Best answer: A

Explanation: Groundedness is the best match when the output-quality concern is unsupported claims or hallucinations relative to retrieved context. In a RAG evaluation workflow, the model response should be checked against the retrieved documents or grounding data to confirm that the answer is source-supported. Fluency and coherence can indicate whether the response reads naturally and is logically structured, but they do not prove that claims are backed by the source material. Relevance checks whether the response addresses the user request, not whether every factual statement is grounded.

  • Fluency trap fails because a polished answer can still contain unsupported claims.
  • Coherence trap fails because a logically organized response may still be ungrounded.
  • Relevance trap fails because answering the prompt does not ensure the facts came from retrieved documentation.

Question 40

Topic: Optimize Generative AI Systems and Model Performance

A Microsoft Foundry RAG app supports field technicians for proprietary equipment. Quality evaluation shows low retrieval relevance only for queries that use internal fault codes and acronyms.

Evidence:

Check | Result
Exact keyword search for codes | Finds the right manual page
Current vector top-k results | Similar wording, wrong component
Chunk size and similarity threshold tests | No material improvement
Reviewed query-passage pairs | 2,000 labeled pairs available

Which next diagnostic or optimization step best follows the evidence?

Options:

  • A. Increase top-k and rely on the prompt to filter chunks.

  • B. Switch to a multilingual embedding model.

  • C. Replace the generator with a larger foundation model.

  • D. Fine-tune the embedding model with labeled domain pairs.

Best answer: D

Explanation: The evidence points to an embedding mismatch, not a generation problem. Exact keyword search can find the right pages, but vector retrieval ranks semantically similar passages about the wrong component. Because the failures are tied to proprietary codes and acronyms, and there are labeled query-passage relevance pairs, fine-tuning the embedding model is the best next step. Selecting a different embedding model is more appropriate when the mismatch is a known capability gap, such as language coverage, modality, or a clearly better domain-ready embedding model. Here, the visible issue is specialized internal terminology that the current embedding space does not represent well.

The key takeaway is to fix retrieval semantics before changing the answer-generation layer.

  • Larger generator misses the point because the correct chunks are not being retrieved reliably.
  • Multilingual swap is unsupported because the evidence shows proprietary terminology, not a language-coverage issue.
  • Higher top-k may add noise and does not teach the embedding model the meaning of internal codes.

Question 41

Topic: Implement Generative AI Quality Assurance and Observability

A team operates a RAG-based customer support assistant in Microsoft Foundry. A production issue caused some answers to be generated without calling the retrieval step, so citations were missing. A fix has been deployed. You need a monitoring configuration that validates the corrected request path for live traffic, not just aggregate health. Which observability signal should you configure?

Options:

  • A. Tracing with retrieval spans and prompt-response details

  • B. Aggregate token consumption by model deployment

  • C. Provisioned throughput utilization for the deployment

  • D. Endpoint throughput and average response time

Best answer: A

Explanation: For a production GenAI issue involving a missing step in the request path, the validating signal should expose the per-request execution flow. Tracing is the right observability configuration because it can show spans for retrieval or tool calls, the prompt context passed to the model, and the resulting response. That evidence confirms that the deployed fix changed the behavior that caused missing citations. Aggregate metrics are still useful for operations, but they cannot prove that retrieval now happens for the specific class of affected requests. The key takeaway is to match the signal to the failure mode: use traces for flow and debugging validation, and use metrics for aggregate health, capacity, and cost trends.

  • Token totals help monitor cost and usage, but they do not show whether retrieval was executed for each affected request.
  • Latency metrics can show performance changes, but acceptable response time does not prove that citations are grounded in retrieved content.
  • PTU utilization helps capacity planning, but it cannot validate the corrected RAG orchestration path.
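
A per-request check of this kind might look like the sketch below. The trace layout (a `trace_id` plus a list of named spans) and the span names `retrieval` and `chat_completion` are assumptions for illustration; real exported traces carry more fields and may use different names.

```python
def traces_missing_retrieval(traces, span_name="retrieval"):
    """Return the IDs of traces that contain no span with the given name."""
    missing = []
    for trace in traces:
        names = {span["name"] for span in trace["spans"]}
        if span_name not in names:
            missing.append(trace["trace_id"])
    return missing


# Hypothetical exported traces: t2 skipped retrieval and went straight to the model.
traces = [
    {"trace_id": "t1", "spans": [{"name": "retrieval"}, {"name": "chat_completion"}]},
    {"trace_id": "t2", "spans": [{"name": "chat_completion"}]},
]
```

Running `traces_missing_retrieval(traces)` over live traffic after the fix should return an empty list. Average latency or token totals could look healthy in both the broken and the fixed state, which is why they cannot confirm the fix.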

Question 42

Topic: Implement Machine Learning Model Lifecycle and Operations

An Azure Machine Learning real-time endpoint was updated with a newly registered churn model. Training used features from a feature store, but endpoint tests show missing/renamed feature values compared with training runs.

Deployment note:

Model artifact: model.pkl, conda.yml, MLmodel
Feature retrieval spec: not found
Endpoint input: customer_id, transaction_id
Observed issue: feature mismatch at scoring

Which configuration change should the MLOps engineer make?

Options:

  • A. Register the model with a new MLflow experiment name

  • B. Grant the endpoint identity registry reader access

  • C. Increase the endpoint compute instance size

  • D. Repackage the model with the feature retrieval specification

Best answer: D

Explanation: A feature mismatch after deployment can indicate that the model artifact package does not contain the required feature retrieval specification. The retrieval specification describes which feature sets and feature columns the model expects and enables the inference path to retrieve the same features used during training. In this case, the endpoint input contains only identifiers, while the note explicitly says the feature retrieval spec is missing. Repackaging and registering the model with that specification addresses the packaging defect. Compute size, registry access, and experiment naming do not define the model’s expected feature retrieval behavior.

  • Compute sizing may affect throughput or latency, but it does not resolve missing feature definitions in the model package.
  • Registry access helps share or pull assets, but the artifact shown is already deployed and lacks the needed spec.
  • Experiment naming changes tracking organization, not the packaged feature retrieval contract used at inference.

Question 43

Topic: Implement Generative AI Quality Assurance and Observability

A production RAG chat app in Microsoft Foundry was returning answers that were not supported by the retrieved documents. The team updated the prompt variant and retrieval instructions. You need to validate in continuous monitoring that the production issue is corrected, without relying on manual spot checks. Which observability signal should you use?

Options:

  • A. HTTP success rate for model deployment calls

  • B. Groundedness score from production evaluation traces

  • C. Average response latency for chat completions

  • D. Total token consumption per conversation

Best answer: B

Explanation: For a production GenAI issue involving unsupported or hallucinated RAG answers, the needed observability signal is a quality signal tied to the retrieved context. Groundedness evaluates whether the generated response is supported by the source documents or context used for the answer. In Microsoft Foundry observability, monitoring groundedness over production traces can show whether the prompt and retrieval changes corrected the actual failure mode. Latency, token usage, and HTTP success rates are useful operational signals, but they do not validate that answers are factually supported by retrieved content. The key is to match the monitoring signal to the issue being corrected.

  • Latency monitoring may show performance improvement, but it does not prove answers are supported by retrieved documents.
  • Token consumption helps manage cost and prompt size, but it does not measure response correctness or support.
  • HTTP success rate confirms calls are completing, but successful responses can still be ungrounded.

Question 44

Topic: Implement Machine Learning Model Lifecycle and Operations

A team trains a churn model in Azure Machine Learning and wants a release pipeline to promote only models whose training evidence can support later evaluation, rollback analysis, and troubleshooting. The training code already uses MLflow. Which implementation best preserves the required evidence for each candidate model?

Options:

  • A. Capture endpoint latency and error metrics after deployment.

  • B. Register only the serialized model file with a production-ready tag.

  • C. Log parameters, metrics, artifacts, data asset version, environment, and Git commit to the MLflow run.

  • D. Save metrics in the pipeline console output and upload the model file.

Best answer: C

Explanation: Training job evidence should make a candidate model traceable back to how it was produced. In Azure Machine Learning, MLflow runs are the right place to capture parameters, training and validation metrics, artifacts, model outputs, and useful lineage details such as data asset version, environment, and source commit. That evidence lets a later promotion gate compare runs, a reviewer evaluate model quality, and an engineer troubleshoot why a model changed or failed. Registering the model can reference the run output, but registration alone is not a substitute for run evidence. Production endpoint metrics are useful after deployment, not for proving what happened during training.

  • Model-only registration misses the training context needed for comparison and troubleshooting.
  • Console-only metrics are fragile and harder to query or associate with model lineage than MLflow run data.
  • Endpoint monitoring helps operate a deployed model but does not preserve evidence from the training job.
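
A promotion gate built on that evidence can be very small. The sketch below assumes the run has already been fetched and flattened into a plain dictionary; the key names in `REQUIRED_EVIDENCE` and the sample values are hypothetical and are not MLflow field names.

```python
# Hypothetical evidence fields the release pipeline expects on every candidate run.
REQUIRED_EVIDENCE = (
    "params",
    "metrics",
    "artifacts",
    "data_asset_version",
    "environment",
    "git_commit",
)


def missing_evidence(run: dict) -> list:
    """List required evidence fields that are absent or empty on a run record."""
    return [key for key in REQUIRED_EVIDENCE if not run.get(key)]


def can_promote(run: dict) -> bool:
    """A candidate is promotable only when no required evidence is missing."""
    return not missing_evidence(run)


complete_run = {
    "params": {"learning_rate": 0.05},
    "metrics": {"val_auc": 0.91},
    "artifacts": ["model/", "confusion_matrix.png"],
    "data_asset_version": "churn-train:7",
    "environment": "churn-env:4",
    "git_commit": "abc1234",
}
model_only_run = {"artifacts": ["model/"]}
```

A run that logged only the model file fails the gate with five missing fields, which mirrors why registering the serialized model alone is not enough for later rollback analysis.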

Question 45

Topic: Design and Implement a GenAIOps Infrastructure

A team maintains Microsoft Foundry prompt variants in Git for a claims triage assistant. A pull request fails the automated evaluation, and reviewers cannot isolate which behavior changed.

Exhibit: PR summary

File changed: prompts/claims_triage_v4.prompt
Diff size: +1,160 / -40 lines
Prompt contents now include:
- classify claim severity
- draft a customer email
- summarize prior conversations
- full policy reference table pasted inline
Only variable: {{customer_message}}
Evaluation note: classification cases now return email text and summaries.

What is the best root cause indicated by the evidence?

Options:

  • A. The foundation model deployment lacks provisioned throughput.

  • B. The prompt is too monolithic for versioned evaluation.

  • C. The prompt should be edited only in the Foundry portal.

  • D. The evaluation dataset is missing latency measurements.

Best answer: B

Explanation: Prompt artifacts should be designed so they are task-focused, reviewable, and versionable. In this case, the same prompt now performs classification, email drafting, and summarization while also embedding a full policy table. That explains both symptoms: classification tests return non-classification content, and reviewers cannot isolate the behavioral change from a large mixed diff. A better operational design would separate task-specific prompts or variants, use clear input variables, and keep large reference content in a managed retrieval or configuration layer when appropriate. Throughput, portal editing, and latency metrics do not explain the visible prompt-version and evaluation behavior.

  • Throughput capacity affects performance and concurrency, not why classification outputs include email drafts.
  • Portal-only editing is not required; Git-based prompt versioning is appropriate for operational control.
  • Latency metrics may be useful for observability, but the failure shown is task behavior and artifact manageability.

Question 46

Topic: Implement Machine Learning Model Lifecycle and Operations

A team has an Azure Machine Learning real-time endpoint serving model version v1 in production. They registered model version v2 and want v2 to receive only 10% of live requests for one week while keeping the existing endpoint URL unchanged. Which implementation should they use?

Options:

  • A. Create a batch endpoint for v2 and route live calls to it

  • B. Replace the existing deployment with v2 and monitor errors

  • C. Add v2 as a new deployment and set endpoint traffic to 10%

  • D. Register v2 in the workspace and wait for endpoint auto-upgrade

Best answer: C

Explanation: Progressive rollout for an Azure Machine Learning real-time endpoint is implemented by deploying the new model version as an additional deployment behind the existing managed online endpoint, then assigning a small percentage of endpoint traffic to that deployment. In this scenario, v1 can keep 90% of traffic while v2 receives 10%, and clients continue using the same endpoint URL. The team can monitor production metrics and either increase v2 traffic or roll back by setting its traffic allocation to 0%. Replacing the deployment skips the limited-release stage, and model registration alone does not affect serving traffic.

  • Replacing production fails because it sends all traffic to v2 immediately instead of limiting exposure.
  • Batch endpoint fails because batch endpoints are for asynchronous batch scoring, not live request rollout.
  • Registration only fails because registering a model version does not automatically update endpoint deployments or traffic routing.
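
The traffic allocation itself is a mapping from deployment name to percentage. The helper below is a sketch of that mapping for the two-deployment case only; the function name is hypothetical, and applying the result to a real endpoint is a separate SDK or CLI step that is not shown.

```python
def canary_split(stable: str, candidate: str, candidate_percent: int) -> dict:
    """Build a traffic mapping that gives the candidate a fixed share of requests."""
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    return {stable: 100 - candidate_percent, candidate: candidate_percent}


rollout = canary_split("v1", "v2", 10)    # limited release for the trial week
rollback = canary_split("v1", "v2", 0)    # roll back without deleting v2
```

Because both deployments sit behind the same endpoint, moving from `rollout` to `rollback` (or to a larger share for v2) changes only the mapping, not the URL that clients call.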

Question 47

Topic: Design and Implement a GenAIOps Infrastructure

A GitHub Actions workflow deploys a Microsoft Foundry resource and a foundation model deployment by using Bicep. The resource exists, but the deployment job fails. The team must keep the deployment automated and use least privilege.

Evidence:

az deployment group create --resource-group rg-genai-prod --template-file main.bicep
ERROR AuthorizationFailed:
Client 'sp-aiops-ci' does not have authorization to perform action
'Microsoft.CognitiveServices/accounts/deployments/write'
at scope '/subscriptions/<sub>/resourceGroups/rg-genai-prod/providers/Microsoft.CognitiveServices/accounts/foundry-prod'.

resource foundry 'Microsoft.CognitiveServices/accounts@2024-10-01' existing = {
  name: 'foundry-prod'
}

resource modelDeployment 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01' = {
  parent: foundry
  name: 'gpt-prod'
  properties: { model: { name: 'gpt-4o' } }
}

Which action should you take?

Options:

  • A. Enable a system-assigned managed identity on the Foundry resource.

  • B. Grant the pipeline identity deployment-write permission on the Foundry resource.

  • C. Create the model deployment manually in the Foundry portal.

  • D. Add an explicit dependsOn from the deployment to the Foundry resource.

Best answer: B

Explanation: The visible failure is an Azure RBAC authorization error for the GitHub Actions service principal, not a Bicep dependency or model configuration problem. The Bicep resource uses parent: foundry, so the model deployment is scoped correctly under the existing Foundry resource. To preserve automation and least privilege, assign the pipeline identity a role at the Foundry resource or resource-group scope that includes Microsoft.CognitiveServices/accounts/deployments/write, such as the built-in Cognitive Services OpenAI Contributor role, and rerun the same IaC deployment.

The key troubleshooting step is to follow the failing action and caller in the error message before changing the template structure.

  • Dependency change fails because the error is authorization-related, and parent already establishes the resource scope.
  • Managed identity change fails because the caller is the GitHub Actions service principal, not the Foundry resource identity.
  • Manual portal deployment violates the stated automation constraint and does not fix the pipeline permission issue.
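
Reading the caller, action, and scope out of the error can be automated in a pipeline step. The sketch below parses the message from the exhibit with a regular expression; the pattern is written against this one message shape and is an assumption, not a documented error format.

```python
import re

# Error text copied from the exhibit, joined into one string.
ERROR_TEXT = (
    "Client 'sp-aiops-ci' does not have authorization to perform action "
    "'Microsoft.CognitiveServices/accounts/deployments/write' "
    "at scope '/subscriptions/<sub>/resourceGroups/rg-genai-prod/providers/"
    "Microsoft.CognitiveServices/accounts/foundry-prod'."
)

# Illustrative pattern for this message shape only.
PATTERN = re.compile(
    r"Client '(?P<caller>[^']+)' does not have authorization to perform action\s+"
    r"'(?P<action>[^']+)'\s+at scope '(?P<scope>[^']+)'"
)


def parse_authorization_failure(text: str):
    """Extract caller, action, and scope, or return None if the text does not match."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None


failure = parse_authorization_failure(ERROR_TEXT)
```

The three extracted values map directly onto the fix: who needs the role (`caller`), which permission the role must contain (`action`), and where to assign it (`scope`).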

Question 48

Topic: Design and Implement an MLOps Infrastructure

A machine learning team must recreate the same Azure Machine Learning workspace, datastore, and compute cluster in dev, test, and prod. The environment must be provisioned from GitHub without manual portal steps. Which GitHub Actions configuration best supports this requirement?

Options:

  • A. Create resources in the portal and export screenshots to the repo

  • B. Run a notebook that creates compute after the workspace exists

  • C. Use OIDC sign-in and deploy parameterized Bicep with Azure CLI

  • D. Enable GitHub integration in the workspace and sync notebooks

Best answer: C

Explanation: For a reproducible MLOps environment, resource provisioning should be defined as infrastructure as code and executed by automation. A GitHub Actions workflow can sign in to Azure with OpenID Connect (OIDC), which avoids storing long-lived credentials in the repository, then deploy parameterized Bicep templates and run Azure CLI commands to create or update Azure Machine Learning resources consistently across environments. Parameters let the same template target dev, test, and prod without changing the resource definitions.

The key distinction is source-controlled provisioning versus workspace code synchronization. GitHub integration helps manage project files, but it does not by itself define and recreate the Azure resources needed for the environment.

  • Notebook automation is less suitable because it depends on an existing environment and is not the primary IaC mechanism for workspace resources.
  • Workspace GitHub integration helps with source control, but it does not provision the workspace, datastore, or compute cluster.
  • Portal setup fails because manual steps are not reproducible or reliably auditable from Git.

Question 49

Topic: Design and Implement a GenAIOps Infrastructure

Two GenAIOps teams collaborate in Microsoft Foundry. Team A’s nightly evaluation suddenly runs against Team B’s foundation-model deployment, but Team A’s prompt Git repository has no commit for the change. The run log shows both teams use the same Foundry project, project environment, and managed identity.

Which project-environment issue is the most likely root cause?

Options:

  • A. Shared project environment with broad team access

  • B. Insufficient provisioned throughput units for Team A

  • C. Missing groundedness metric in the evaluation workflow

  • D. Private endpoint DNS resolution failure

Best answer: A

Explanation: Controlled collaboration in Microsoft Foundry depends on isolating project environments and scoping access so one team’s deployment, prompt, or runtime configuration does not unintentionally affect another team’s work. Here, the prompt repository did not change, but the run used a shared project environment and managed identity. That points to configuration bleed-through from shared environment settings or permissions, not a model-quality or networking symptom. A better setup gives each team its own project environment, appropriately scoped RBAC, a separate managed identity, and Git-backed prompt version control, or keeps any shared environment behind a deliberately controlled boundary.

  • Throughput capacity would more likely cause throttling, queuing, or latency symptoms, not a silent switch to another team’s deployment.
  • Private networking failures would usually block access or produce connection errors, not route an evaluation to a different configured model.
  • Evaluation metric choice affects quality scoring, but it does not explain why the run used Team B’s deployment.

Question 50

Topic: Optimize Generative AI Systems and Model Performance

A team ran an A/B test to compare two RAG configurations for a Foundry chat solution. The relevance score improved for one variant, but reviewers cannot tell whether retrieval caused the change.

Setting | Variant 1 | Variant 2
Prompt | support-v3 | support-v4
Model deployment | gpt-4o-mini | gpt-4o
Index version | kb-index-12 | kb-index-13
Retrieval | semantic, top 5 | hybrid, top 8

What is the best next diagnostic step?

Options:

  • A. Promote Variant 2 because relevance is higher.

  • B. Increase traffic until the current difference stabilizes.

  • C. Rerun matched variants changing only retrieval settings.

  • D. Replace both prompts and retest with a new dataset.

Best answer: C

Explanation: A valid A/B test for RAG optimization isolates the variable being tested. Here, the variants changed retrieval method and top-k, but they also changed prompt version, model deployment, and index version. Any relevance difference could come from any of those changes, so the current evidence does not support attributing the improvement to the retrieval configuration. The next diagnostic step is to rerun the comparison with the same prompt, model, index, evaluation dataset, routing rules, and monitoring setup, changing only the RAG configuration under test. More traffic cannot fix confounding variables; it only measures a flawed comparison more precisely.

  • Premature promotion ignores that higher relevance may be caused by the model, prompt, or index rather than retrieval.
  • More traffic can improve statistical confidence but cannot remove the confounding changes already present.
  • A new prompt and dataset introduce additional variables, making attribution even harder.
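
The confounding in the exhibit can be checked mechanically before any evaluation is run. The sketch below diffs two variant definitions and reports which settings differ. The dictionary keys and helper names are hypothetical; the values are taken from the table in the question.

```python
variant_1 = {
    "prompt": "support-v3",
    "model_deployment": "gpt-4o-mini",
    "index_version": "kb-index-12",
    "retrieval": "semantic, top 5",
}
variant_2 = {
    "prompt": "support-v4",
    "model_deployment": "gpt-4o",
    "index_version": "kb-index-13",
    "retrieval": "hybrid, top 8",
}


def differing_settings(a: dict, b: dict) -> list:
    """Return the sorted names of settings whose values differ between variants."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


def is_isolated(a: dict, b: dict, variable: str) -> bool:
    """True only when the named variable is the single difference."""
    return differing_settings(a, b) == [variable]


# A matched rerun: copy Variant 1 and change only the retrieval setting.
matched_variant = dict(variant_1, retrieval="hybrid, top 8")
```

For the original pair, all four settings differ, so `is_isolated(variant_1, variant_2, "retrieval")` is false. For `variant_1` against `matched_variant` it is true, and only that comparison can attribute a relevance change to retrieval.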

Continue with full practice

Use the Microsoft AI-300 Practice Test page for the full IT Mastery practice bank, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Revised on Monday, May 25, 2026