AWS MLA-C01: ML Monitoring

Try 10 focused AWS MLA-C01 questions on ML Monitoring, with explanations, then continue with IT Mastery.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Try AWS MLA-C01 on Web
View full AWS MLA-C01 practice page

Topic snapshot

Field | Detail
Exam route | AWS MLA-C01
Topic area | ML Solution Monitoring, Maintenance, and Security
Blueprint weight | 24%
Page purpose | Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate ML Solution Monitoring, Maintenance, and Security for AWS MLA-C01. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass | What to do | What to record
First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer.
Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor.
Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter.
Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 24% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.

Question 1

Topic: ML Solution Monitoring, Maintenance, and Security

A company hosts a fraud-detection model on an Amazon SageMaker real-time endpoint. It serves ~50,000 tabular transactions/day with p95 latency <100 ms. True fraud labels arrive in a separate system about 7 days later. The team must (1) detect data drift in input features within 1 hour, (2) detect concept drift as a drop in model quality once labels arrive, (3) store monitoring artifacts encrypted with a customer managed KMS key, and (4) minimize operational overhead using AWS-managed capabilities.

Which solution best meets these requirements?

Options:

  • A. Use CloudWatch endpoint latency and invocation metrics with alarms for anomalous traffic patterns

  • B. Use SageMaker Model Monitor data quality (hourly) and model quality (with delayed ground truth) with KMS-encrypted S3 outputs

  • C. Use SageMaker Clarify bias and explainability reports on a schedule to detect drift

  • D. Skip monitoring and retrain nightly with SageMaker Pipelines and Automatic Model Tuning

Best answer: B

Explanation: Data drift is a change in the input feature distribution, so it’s best detected by comparing current inference data statistics to a baseline on a frequent schedule. Concept drift is a change in the relationship between inputs and labels, so it requires ground-truth labels to measure model-quality metrics after labels arrive. SageMaker Model Monitor supports both data quality and model quality monitoring with encrypted S3 outputs and low ops overhead.

Data drift affects what the model sees at inference time (feature distributions shift from training), which can degrade performance even if the underlying fraud process is unchanged. Concept drift affects what the model should predict (the mapping from features to fraud label changes), which is observed as worsening model-quality metrics once ground truth is available.

SageMaker Model Monitor fits this setup by:

  • Running an hourly data quality monitor to compare feature distributions against a training baseline (detects data drift within 1 hour).
  • Running a model quality monitor that joins predictions with delayed labels in S3 to compute quality metrics over time (detects concept drift via metric degradation).
  • Writing reports to KMS-encrypted S3 locations with managed scheduling and CloudWatch metrics/alarms.
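
As a rough configuration sketch rather than an official reference, the two schedules could be created with the SageMaker Python SDK roughly as follows; the role ARN, KMS key, S3 prefixes, and endpoint name are placeholders, and the data-quality baseline is assumed to exist already.

from sagemaker.model_monitor import (
    CronExpressionGenerator,
    DefaultModelMonitor,
    EndpointInput,
    ModelQualityMonitor,
)

ROLE = "arn:aws:iam::123456789012:role/MonitoringRole"            # placeholder
KMS_KEY = "arn:aws:kms:us-east-1:123456789012:key/example-cmk"    # customer managed key (placeholder)

# Hourly data quality schedule: compares captured inference features
# against baseline statistics/constraints (data drift within 1 hour).
data_monitor = DefaultModelMonitor(
    role=ROLE,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_kms_key=KMS_KEY,    # encrypt monitoring reports in S3
    volume_kms_key=KMS_KEY,    # encrypt the processing job volume
)
data_monitor.create_monitoring_schedule(
    monitor_schedule_name="fraud-data-quality-hourly",
    endpoint_input="fraud-endpoint",
    statistics="s3://ml-prod/monitoring/baselines/fraud/statistics.json",
    constraints="s3://ml-prod/monitoring/baselines/fraud/constraints.json",
    output_s3_uri="s3://ml-prod/monitoring/reports/data-quality",
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)

# Daily model quality schedule: joins captured predictions with the
# delayed ground-truth labels to surface concept drift as metric decay.
quality_monitor = ModelQualityMonitor(
    role=ROLE,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_kms_key=KMS_KEY,
    volume_kms_key=KMS_KEY,
)
quality_monitor.create_monitoring_schedule(
    monitor_schedule_name="fraud-model-quality-daily",
    endpoint_input=EndpointInput(
        endpoint_name="fraud-endpoint",
        destination="/opt/ml/processing/input_data",
        inference_attribute="0",     # position of the prediction in the captured output
    ),
    ground_truth_input="s3://ml-prod/ground-truth/fraud/",   # labels land here ~7 days later
    problem_type="BinaryClassification",
    constraints="s3://ml-prod/monitoring/baselines/fraud-quality/constraints.json",
    output_s3_uri="s3://ml-prod/monitoring/reports/model-quality",
    schedule_cron_expression=CronExpressionGenerator.daily(),
)

The data-quality schedule covers the 1-hour drift requirement; the model-quality schedule only produces metrics once the delayed labels arrive in the ground-truth prefix.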

Monitoring is what distinguishes drift types; retraining alone does not.

  • Operational metrics alone can’t identify feature distribution drift or label-relationship changes.
  • Clarify-only monitoring focuses on bias/explainability; it doesn’t directly measure data drift vs labeled model-quality degradation.
  • Retrain without monitoring doesn’t detect or attribute performance drops to data drift vs concept drift and increases cost/ops risk.

Question 2

Topic: ML Solution Monitoring, Maintenance, and Security

A team monitors an Amazon SageMaker real-time endpoint with a CloudWatch alarm that triggers when p95 ModelLatency exceeds 500 ms for 1 of 1 datapoints over a single 5-minute period. The alarm paged an on-call engineer.

Exhibit: Last 20 minutes (5-minute periods)

Period start (UTC) | p95 ModelLatency (ms) | 5XXError | Invocations
12:00 | 130 | 0 | 9,980
12:05 | 125 | 0 | 10,105
12:10 | 2,400 | 18 | 10,022
12:15 | 140 | 0 | 10,110

Which change best distinguishes transient noise from meaningful performance degradation while still alerting promptly on sustained issues?

Options:

  • A. Lower the threshold to 300 ms

  • B. Switch to 1-minute periods with 1 of 1 breaching

  • C. Set alarm to 2 of 3 datapoints breaching

  • D. Disable latency alarms and alert only on 5XXError

Best answer: C

Explanation: The exhibit shows a single 5-minute spike (12:10) with normal latency before and after, which is characteristic of transient noise. Using a windowed alarm evaluation (multiple periods with a datapoint count requirement) prevents paging on one-off spikes while still alarming when degradation persists across consecutive periods.

This pattern indicates a brief anomaly rather than sustained degradation: p95 latency is normal at 12:05 (125 ms) and 12:15 (140 ms) but spikes only at 12:10 (2,400 ms), as shown in the exhibit table. A better alert design is to require multiple breaching datapoints within a rolling window (CloudWatch EvaluationPeriods with DatapointsToAlarm) so a single outlier does not trigger an incident.

A practical configuration is:

  • Keep a 5-minute period for stable p95 aggregation
  • Set EvaluationPeriods=3
  • Set DatapointsToAlarm=2 with the same 500 ms threshold

This preserves prompt detection for sustained latency increases while filtering the single-period spike seen at 12:10.
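
A boto3 sketch of that alarm, with hypothetical endpoint, variant, and SNS topic names; note that SageMaker reports ModelLatency in microseconds, so a 500 ms threshold is expressed as 500,000.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="endpoint-p95-model-latency",
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},    # hypothetical
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    ExtendedStatistic="p95",
    Period=300,                  # keep the 5-minute aggregation period
    EvaluationPeriods=3,         # rolling window of three datapoints
    DatapointsToAlarm=2,         # alarm only when 2 of 3 breach
    Threshold=500_000,           # ModelLatency is in microseconds (500 ms)
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall"],   # hypothetical topic
)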

  • Lower threshold increases sensitivity and would page more often on normal variability.
  • Shorter period with 1/1 makes the alert even more reactive to transient spikes.
  • 5XX-only alerting can miss latency regressions that do not produce errors.

Question 3

Topic: ML Solution Monitoring, Maintenance, and Security

A team deployed a SageMaker real-time endpoint and enabled SageMaker Model Monitor with a daily monitoring schedule. The processing job fails every run, but the endpoint serves traffic and the team cannot retrain the model this week.

Exhibit: Failure message

ClientError: ValidationException
Baseline constraints and statistics are required but were not found.
Expected in s3://ml-prod/monitoring/baselines/churn/

Data capture is enabled and inference payloads are landing in S3. Which action will fix the root cause with the least change?

Options:

  • A. Increase the monitoring instance size to avoid processing failures

  • B. Increase data capture sampling to 100% for better reports

  • C. Run a baseline job to generate stats/constraints in that S3 path

  • D. Create a CloudWatch alarm on 5XX and disable Model Monitor

Best answer: C

Explanation: The monitoring schedule is failing because it has no baseline artifacts to compare captured inference data against. SageMaker Model Monitor requires baseline statistics and constraints generated from a representative dataset (often training or validation data). Creating those baseline files in the expected S3 location allows the processing job to run successfully without changing the endpoint or retraining.

Symptom: each Model Monitor processing run fails with an error indicating missing baseline constraints/statistics.

Root cause: the monitoring schedule was created (or configured) to use baseline artifacts in a specific S3 prefix, but no baseline was generated and stored there. Without baseline statistics/constraints, Model Monitor cannot compute violations/drift by comparing current captured data to an established reference.

Fix: generate the baseline from a representative reference dataset already in S3 (typically the training/validation data used for the model), and write the resulting statistics.json and constraints.json to the S3 prefix referenced by the monitoring schedule, then rerun the schedule. The key takeaway is that monitoring depends on baseline model/data quality metrics established before (or at) deployment.
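
A minimal sketch of that baseline job with the SageMaker Python SDK, assuming the role ARN and reference-dataset location are placeholders; the output prefix matches the path the failing schedule expects.

from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/MonitoringRole",   # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Writes statistics.json and constraints.json to the prefix the
# monitoring schedule is configured to read from.
monitor.suggest_baseline(
    baseline_dataset="s3://ml-prod/training/churn/train.csv",   # hypothetical reference data
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://ml-prod/monitoring/baselines/churn/",
    wait=True,
)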

  • Bigger instances don’t address missing baseline artifacts; they only help with resource exhaustion.
  • Disabling Model Monitor avoids the error but removes the required drift/data-quality comparison.
  • Higher sampling improves coverage of captured data but still can’t run comparisons without a baseline.

Question 4

Topic: ML Solution Monitoring, Maintenance, and Security

A team runs a real-time Amazon SageMaker endpoint behind an API. They want to add alerting to quickly detect inference workflow anomalies.

Exhibit: CloudWatch metrics (last 5 minutes)

Invocations: 2,000
Invocation4XXErrors: 380
Invocation5XXErrors: 0
ModelLatency p95: 48 ms
CPUUtilization: 34%

Which alerting signal is the most appropriate to configure for this situation?

Options:

  • A. Alarm on endpoint CPUUtilization exceeding a threshold

  • B. Alarm on Invocations dropping below a baseline

  • C. Alarm on Invocation4XXErrors for the SageMaker endpoint

  • D. Alarm on ModelLatency p95 exceeding a threshold

Best answer: C

Explanation: The best alerting signal is the SageMaker endpoint’s 4XX error metric because it directly indicates failed inference requests due to invalid inputs or request formatting. The exhibit shows a substantial count of Invocation4XXErrors: 380 while Invocation5XXErrors: 0, pointing to a request/payload issue rather than service failure or performance saturation.

For real-time SageMaker inference, CloudWatch endpoint metrics separate failures into 4XX (client-side/request issues) and 5XX (server-side/model container/platform issues). In the exhibit, Invocation4XXErrors: 380 is high while Invocation5XXErrors: 0, and latency/CPU look normal (ModelLatency p95: 48 ms, CPUUtilization: 34%). That pattern is most consistent with an upstream data/schema/payload problem (for example, missing required fields, wrong content-type, or schema drift) causing requests to be rejected.

The most AWS-appropriate alerting signal here is a CloudWatch alarm on the endpoint’s Invocation4XXErrors (typically using a rate or count over a short period) so operators are notified as soon as bad requests start spiking. A latency or CPU alarm would not reliably detect this failure mode.
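
As one illustrative, not prescriptive, shape for the rate-based variant, a boto3 metric-math alarm could divide 4XX errors by invocations; the endpoint, variant, SNS topic, and 5% threshold are assumptions to adapt. A plain count threshold on Invocation4XXErrors works the same way with MetricName and Statistic instead of Metrics.

import boto3

cloudwatch = boto3.client("cloudwatch")

dims = [
    {"Name": "EndpointName", "Value": "my-endpoint"},    # hypothetical
    {"Name": "VariantName", "Value": "AllTraffic"},
]

cloudwatch.put_metric_alarm(
    AlarmName="endpoint-4xx-error-rate",
    Metrics=[
        {
            "Id": "errors",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/SageMaker",
                    "MetricName": "Invocation4XXErrors",
                    "Dimensions": dims,
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        {
            "Id": "invocations",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/SageMaker",
                    "MetricName": "Invocations",
                    "Dimensions": dims,
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        {
            "Id": "error_rate",
            "Expression": "100 * errors / invocations",   # percent of requests returning 4XX
            "Label": "4XX error rate (%)",
            "ReturnData": True,
        },
    ],
    EvaluationPeriods=1,
    Threshold=5,                         # e.g., alert above 5% 4XX; tune to the workload
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall"],   # hypothetical topic
)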

  • Latency-only alert can miss cases where requests fail fast with 4XX while latency remains normal (as shown by ModelLatency p95: 48 ms).
  • CPU alert targets resource saturation, but CPUUtilization: 34% does not indicate overload.
  • Invocation volume alert detects traffic changes, but Invocations: 2,000 is not the anomalous signal in the exhibit compared to Invocation4XXErrors: 380.

Question 5

Topic: ML Solution Monitoring, Maintenance, and Security

A team hosts a PyTorch model on an Amazon SageMaker real-time endpoint (one production variant). Traffic is spiky, and operators are manually increasing instance count during peak hours to prevent timeouts. The team configured Application Auto Scaling target tracking on SageMakerVariantInvocationsPerInstance, but the endpoint never scales and during spikes clients receive HTTP 504 errors.

Exhibit: Application Auto Scaling activity history

Failed to scale sagemaker/variant/my-endpoint/AllTraffic.
AccessDenied: Unable to assume service-linked role
AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint.

The team must reduce operational toil while keeping the endpoint reliable and low-latency. Which action fixes the root cause with the least change?

Options:

  • A. Increase the endpoint minimum instance count to four

  • B. Add a Lambda scheduled scaler using EventBridge and UpdateEndpointWeightsAndCapacities

  • C. Migrate the deployment to an asynchronous inference endpoint

  • D. Create the SageMaker endpoint Application Auto Scaling service-linked role

Best answer: D

Explanation: The scaling policy is correct, but it cannot execute because Application Auto Scaling cannot assume the required service-linked role. Creating (or allowing creation of) AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint restores managed auto scaling so capacity increases automatically during spikes. This removes manual scaling toil while improving reliability and preserving low-latency real-time inference.

Symptom: the endpoint returns 504s during traffic spikes and never scales out even though a target tracking policy is configured. Root cause: Application Auto Scaling cannot assume the required service-linked role for SageMaker endpoints (as shown by the AccessDenied message in the scaling activity history), so scaling actions fail.

Fix: ensure the service-linked role exists and is usable in the account so Application Auto Scaling can call SageMaker on your behalf:

  • Allow/create AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint (and remove any IAM/SCP restrictions blocking it).
  • Keep the existing target tracking policy on SageMakerVariantInvocationsPerInstance.

This is the least-change, managed approach that eliminates manual scale operations and improves reliability during spikes.
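
A minimal boto3 sketch of that fix, assuming no SCP or permissions boundary still blocks iam:CreateServiceLinkedRole.

import boto3

iam = boto3.client("iam")

# Create the service-linked role that Application Auto Scaling assumes when it
# scales SageMaker endpoint variants. The call fails with InvalidInput if the
# role already exists in the account.
iam.create_service_linked_role(
    AWSServiceName="sagemaker.application-autoscaling.amazonaws.com"
)

The existing target tracking policy on SageMakerVariantInvocationsPerInstance stays as configured; once the role exists, the scaling activities should start succeeding.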

  • Overprovisioning capacity increases reliability but raises steady-state cost and doesn’t address failed scaling actions.
  • Custom scheduled scaling can work, but it adds bespoke automation and ongoing maintenance compared to managed auto scaling.
  • Async inference changes the serving mode and is unsuitable when low-latency real-time responses are required.

Question 6

Topic: ML Solution Monitoring, Maintenance, and Security

A machine learning team runs a production Amazon SageMaker real-time endpoint. They want an automated alert when the endpoint begins failing requests (HTTP 5XX) so an on-call engineer is notified.

Which solution uses Amazon CloudWatch to meet this requirement with the least operational effort?

Options:

  • A. Create a CloudWatch alarm on the SageMaker Invocation5XXErrors metric and notify an SNS topic

  • B. Enable AWS CloudTrail data events for SageMaker and alert on InvokeEndpoint failures

  • C. Send endpoint access logs to CloudWatch Logs and use CloudWatch Logs Insights queries to detect 5XX errors

  • D. Enable VPC Flow Logs on the endpoint subnets and alarm on rejected network traffic

Best answer: A

Explanation: SageMaker publishes endpoint performance and error metrics to CloudWatch, and CloudWatch alarms can continuously evaluate those metrics against a threshold. Creating an alarm on Invocation5XXErrors provides a direct, managed way to detect failing inferences and notify the on-call engineer through an action such as Amazon SNS.

For high-level health monitoring of SageMaker training and inference infrastructure, CloudWatch metrics are the primary signal source, and CloudWatch alarms are the mechanism that turns metric thresholds into automated notifications or actions. SageMaker endpoints publish metrics such as invocation counts, latency, and error counts (including 5XX errors) to CloudWatch. By creating an alarm on the endpoint’s 5XX error metric, you get near-real-time alerting without building custom log parsing or event processing.

CloudTrail records API activity for auditing, and Logs/Logs Insights are best for troubleshooting and deeper analysis, but alarms natively evaluate metrics; using logs typically requires additional steps (like metric filters) to turn log patterns into alarmable metrics.

  • CloudTrail for health is primarily for auditing API calls, not continuous endpoint health/error monitoring.
  • Logs Insights polling is useful for investigation, but it is not the simplest native alerting path compared to alarming directly on a published metric.
  • VPC Flow Logs can show network-level rejects, but it does not directly indicate application-level 5XX inference failures.

Question 7

Topic: ML Solution Monitoring, Maintenance, and Security

A team serves a fraud detection model from a Kubernetes deployment on Amazon EKS behind an ALB. To reduce cost, Karpenter is configured to provision only EC2 Spot Instances for the node pool running the inference pods. During peak hours, users report intermittent HTTP 503 errors and increased latency. CloudWatch shows frequent Spot interruption notices followed by node termination and pod rescheduling.

The team must restore 99.9% availability with the smallest change while still optimizing ongoing costs. Which action will fix the root cause?

Options:

  • A. Move inference to a SageMaker Batch Transform job scheduled every 5 minutes

  • B. Switch the Spot allocation strategy to capacity-optimized for the existing node pool

  • C. Add an On-Demand node group for inference and cover baseline with a Compute Savings Plan

  • D. Purchase EC2 Reserved Instances for the current Spot node pool

Best answer: C

Explanation: The 503 spikes coincide with Spot interruption notices, so the root cause is hosting an always-on, latency-sensitive inference service on interruptible capacity. The minimal fix is to place baseline inference on On-Demand instances and use a cost discount that applies to On-Demand usage (for example, a Compute Savings Plan), while keeping Spot for noncritical or burst capacity.

Symptom: intermittent 503s/latency spikes during peak traffic, aligned with Spot interruption notices.

Root cause: the inference pods run only on Spot Instances, so AWS can reclaim capacity at any time; node termination forces pod eviction and warm-up, causing brief unavailability even with autoscaling.

Fix: run the steady, availability-sensitive inference capacity on On-Demand (optionally discounted) and reserve Spot for interrupt-tolerant workloads.

A practical minimal-change approach is:

  • Create an On-Demand node group sized for baseline traffic.
  • Apply taints/affinity so inference pods land on On-Demand.
  • Purchase a Compute Savings Plan (or Reserved Instances) to reduce On-Demand cost.
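
One possible shape of the On-Demand baseline, sketched as a managed node group via boto3 (a Karpenter NodePool restricted to on-demand capacity is an equivalent route); the cluster, subnet, and role identifiers are placeholders.

import boto3

eks = boto3.client("eks")

eks.create_nodegroup(
    clusterName="fraud-inference",                         # hypothetical cluster
    nodegroupName="inference-on-demand",
    capacityType="ON_DEMAND",                              # baseline capacity that cannot be reclaimed
    scalingConfig={"minSize": 3, "maxSize": 6, "desiredSize": 3},
    instanceTypes=["m5.xlarge"],
    subnets=["subnet-aaa", "subnet-bbb"],                  # placeholders
    nodeRole="arn:aws:iam::123456789012:role/NodeRole",    # placeholder
    labels={"workload": "inference-baseline"},
    # Taint the group so only pods with a matching toleration (the inference
    # Deployment, via toleration plus node selector/affinity) schedule here.
    taints=[{"key": "workload", "value": "inference", "effect": "NO_SCHEDULE"}],
)

The inference pods then need the matching toleration and node selector; the Compute Savings Plan is purchased separately and applies automatically to the On-Demand usage.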

Changing Spot strategy can reduce interruptions but cannot eliminate them for a 99.9% availability target.

  • Capacity-optimized Spot can reduce interruption frequency but Spot can still be reclaimed at any time.
  • Reserved Instances for Spot is not a valid way to discount Spot capacity; RIs apply to On-Demand usage.
  • Batch Transform is offline inference and does not meet real-time serving requirements implied by HTTP 503 errors.

Question 8

Topic: ML Solution Monitoring, Maintenance, and Security

A team is securing ML artifacts on AWS: training data in Amazon S3, model artifacts in S3, and Amazon SageMaker endpoints. The solution must use encryption at rest with AWS KMS customer managed keys (CMKs) and enforce TLS for data in transit. Which action best reflects the core principle of separation of duties for key management?

Options:

  • A. Grant SageMaker role kms:* on the CMK

  • B. Separate KMS key admins from SageMaker key users

  • C. Use one IAM user to manage keys and data

  • D. Use SSE-S3 so AWS manages all encryption keys

Best answer: B

Explanation: Separation of duties means different identities own key administration versus key usage. With KMS CMKs, customers are responsible for defining who can administer keys and who can use them for encryption/decryption. Splitting those permissions between a security team and SageMaker execution roles supports encryption requirements while limiting blast radius.

The key principle is separation of duties: avoid giving the same principal both administrative control of encryption keys and the ability to use those keys to access encrypted ML artifacts. With AWS KMS customer managed keys, AWS operates the service, but the customer controls the CMK policies and grants; you decide who can administer the CMK (e.g., rotate/disable/change policy) and who can use it for Encrypt/Decrypt in S3 and SageMaker. Enforcing TLS addresses encryption in transit, but separation of duties is specifically achieved by splitting KMS admin permissions (security team) from KMS usage permissions (SageMaker execution role and other workload roles). The closest trap is over-privileging the workload role, which conflicts with controlled key governance.
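
A sketch of how that split can look in the CMK key policy, with hypothetical role names: the security team's role carries management actions only, while the SageMaker execution role gets only cryptographic use, and the standard account-root statement is kept so the key never becomes unmanageable.

import json
import boto3

kms = boto3.client("kms")

key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Standard default-policy statement so the account retains control.
            "Sid": "EnableRootAccountManagement",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {
            # Key administrators: manage the key, no Encrypt/Decrypt.
            "Sid": "KeyAdministration",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/SecurityKeyAdmins"},
            "Action": [
                "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*",
                "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*",
                "kms:Get*", "kms:Delete*", "kms:TagResource", "kms:UntagResource",
                "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
            ],
            "Resource": "*",
        },
        {
            # Key users: cryptographic use only, for the workload role.
            "Sid": "KeyUsageForWorkloads",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/SageMakerExecutionRole"},
            "Action": [
                "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
                "kms:GenerateDataKey*", "kms:DescribeKey",
            ],
            "Resource": "*",
        },
    ],
}

kms.put_key_policy(
    KeyId="arn:aws:kms:us-east-1:123456789012:key/example-cmk",   # hypothetical CMK
    PolicyName="default",
    Policy=json.dumps(key_policy),
)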

  • Over-privileged workload role granting kms:* to SageMaker breaks separation of duties and least privilege.
  • Wrong responsibility model using SSE-S3 removes CMK control and does not meet CMK governance intent.
  • Single identity risk using one IAM user for both key admin and data access defeats separation of duties.

Question 9

Topic: ML Solution Monitoring, Maintenance, and Security

A company hosts a churn model on an Amazon SageMaker real-time endpoint. The ML team spends hours each week manually comparing recent predictions to a sample of labeled outcomes to catch performance degradation. They want to reduce operational toil while keeping the system reliable as data changes.

Which approach best meets this goal?

Options:

  • A. Restrict endpoint invocation to one IAM role with least privilege

  • B. Store only hashed customer IDs in the inference payload and logs

  • C. Automate drift detection with SageMaker Model Monitor and CloudWatch alarms

  • D. Require separate approvers for training and production deployments

Best answer: C

Explanation: Automating data and model drift monitoring reduces repetitive manual verification work while improving reliability. SageMaker Model Monitor can run scheduled baselines and emit metrics/alerts to CloudWatch so operators are notified quickly when the model’s behavior changes in production. This directly targets silent performance degradation with less ongoing effort.

The core principle is monitoring for drift: production ML systems need continuous, automated checks because real-world data distributions and model performance can change over time. Using a managed service such as SageMaker Model Monitor reduces operational toil by scheduling monitoring jobs, capturing constraints/statistics, and publishing results and metrics that can drive alarms and runbooks.

A typical low-toil pattern is:

  • Establish a baseline from training/validation data
  • Schedule Model Monitor to analyze live inference data
  • Send drift/constraint metrics to CloudWatch
  • Trigger an automated response (ticket, rollback, or retraining pipeline)
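
As a hedged illustration of the "metrics to CloudWatch, then respond" step, a drift alarm could look like the following; it assumes the data-quality schedule publishes CloudWatch metrics, and the feature, endpoint, schedule, and topic names are hypothetical.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Data quality schedules with CloudWatch metrics enabled emit per-feature
# baseline drift metrics that can drive alarms and runbooks.
cloudwatch.put_metric_alarm(
    AlarmName="churn-feature-drift-tenure",
    Namespace="aws/sagemaker/Endpoints/data-metrics",
    MetricName="feature_baseline_drift_tenure",                      # hypothetical feature name
    Dimensions=[
        {"Name": "Endpoint", "Value": "churn-endpoint"},             # hypothetical
        {"Name": "MonitoringSchedule", "Value": "churn-data-quality"},
    ],
    Statistic="Average",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=0.1,                       # drift distance threshold; tune per feature
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-oncall"],   # ticket/rollback/retrain hook
)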

Security controls and governance are important, but they don’t replace drift monitoring as the mechanism to maintain ML reliability under changing data.

  • Least privilege improves security posture but does not detect degraded model quality.
  • Separation of duties reduces change risk, yet it doesn’t automate ongoing performance checks.
  • Data minimization limits sensitive data exposure, but it won’t identify drift-related reliability issues.

Question 10

Topic: ML Solution Monitoring, Maintenance, and Security

A healthcare company trains and deploys models with Amazon SageMaker using PII. Security requirements mandate (1) encryption at rest with customer managed AWS KMS keys for data and model artifacts, and (2) no public internet egress for training or inference; all traffic must stay inside a VPC.

Which action should the ML engineer AVOID?

Options:

  • A. Encrypt S3 artifacts and training volumes with a CMK

  • B. Use public subnets and store artifacts in unencrypted S3

  • C. Enable SageMaker network isolation for training and inference

  • D. Run training and endpoints in private subnets with VPC endpoints

Best answer: B

Explanation: The requirements call for defense-in-depth: encrypt all sensitive ML artifacts at rest with customer-managed KMS keys and prevent public internet connectivity by using VPC integration and network isolation controls. The avoided action is the one that introduces both public network exposure and unencrypted storage for PII-related artifacts.

For regulated ML workloads on SageMaker, meet high-level security requirements by combining encryption and network controls. Use SSE-KMS with a customer managed key for S3 buckets that hold training data, model artifacts, and output, and use KMS encryption for attached volumes used during processing/training. To prevent internet egress, place Studio/training jobs/endpoints in a VPC (typically private subnets) and use VPC endpoints (for example, to S3 and ECR) so traffic stays on the AWS network. Enable network isolation when you must ensure the container cannot make outbound network calls during training/inference.
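
A minimal training-job sketch with the SageMaker Python SDK showing these controls together; the image URI, subnets, security groups, and key ARN are placeholders.

import sagemaker
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/churn-train:latest",   # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",                  # placeholder
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    # Network controls: private subnets and security groups, no outbound calls
    # from the training container, and encrypted inter-node traffic.
    subnets=["subnet-private-a", "subnet-private-b"],
    security_group_ids=["sg-0123456789abcdef0"],
    enable_network_isolation=True,
    encrypt_inter_container_traffic=True,
    # Encryption at rest with the customer managed key.
    volume_kms_key="arn:aws:kms:us-east-1:123456789012:key/example-cmk",
    output_kms_key="arn:aws:kms:us-east-1:123456789012:key/example-cmk",
    output_path="s3://ml-prod/models/churn/",
    sagemaker_session=sagemaker.Session(),
)

estimator.fit({"train": "s3://ml-prod/training/churn/"})

The hosting side takes the same kinds of controls through the model and endpoint configuration (VPC config, network isolation, and a KMS key on the endpoint configuration).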

The key takeaway is to avoid architectures that introduce public subnets/internet paths or leave artifacts unencrypted.

  • Private subnets + VPC endpoints keeps service traffic private without public egress.
  • CMK encryption for S3 and volumes directly satisfies encryption-at-rest requirements for sensitive artifacts.
  • Network isolation is an additional control to block container outbound connectivity during runs.

Continue with full practice

Use the AWS MLA-C01 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try AWS MLA-C01 on Web
View AWS MLA-C01 Practice Test

Free review resource

Read the AWS MLA-C01 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.

Revised on Thursday, May 14, 2026