AWS MLA-C01 Practice Test: ML Engineer Associate

Prepare for the AWS Certified Machine Learning Engineer - Associate (MLA-C01) exam with free sample questions, a full-length diagnostic, topic drills, timed practice, and detailed explanations covering data preparation, model development, deployment, monitoring, and governance in IT Mastery.

MLA-C01 is AWS’s Machine Learning Engineer Associate certification for candidates who need applied MLOps, SageMaker, deployment, monitoring, and ML-platform judgment on AWS. If you are searching for MLA-C01 sample questions, a practice test, mock exam, or exam simulator, this is the main IT Mastery page to start on web and continue on iOS or Android with the same IT Mastery account.

Interactive Practice Center

Start a practice session for AWS Certified Machine Learning Engineer - Associate (MLA-C01) below, or open the full app in a new tab for the best experience and navigate with swipes/gestures or the mouse wheel, just like on your phone or tablet.

Open Full App in a New Tab

A small set of questions is available for free preview. Subscribers can unlock full access by signing in with the same app-family account they use on web and mobile.

Prefer to practice on your phone or tablet? Download the IT Mastery – AWS, Azure, GCP & CompTIA exam prep app for iOS, or the IT Mastery app on Google Play (Android), and use the same IT Mastery account across web and mobile.

Free diagnostic: Try the 65-question AWS MLA-C01 full-length practice exam before subscribing. Use it as one ML engineering baseline, then return to IT Mastery for timed mocks, domain drills, explanations, and the full Machine Learning Engineer Associate question bank.

What this MLA-C01 practice page gives you

  • a direct route into IT Mastery practice for MLA-C01
  • topic drills and mixed sets across data preparation, model development, deployment, and monitoring
  • detailed explanations that show why the best AWS ML engineering answer is correct
  • a clear free-preview path before you subscribe
  • the same IT Mastery account across web and mobile

MLA-C01 exam snapshot

  • Vendor: AWS
  • Official exam name: AWS Certified Machine Learning Engineer - Associate (MLA-C01)
  • Exam code: MLA-C01
  • Items: 65 total, including scored and unscored items
  • Exam time: 130 minutes
  • Question types: multiple-choice and multiple-response
  • Passing score: 720 scaled

MLA-C01 questions usually reward the option that delivers a reliable, monitorable, and secure ML workflow rather than a narrow modeling answer with weak production readiness.

Topic coverage for MLA-C01 practice

  • Data Preparation for Machine Learning: 28%
  • ML Model Development: 26%
  • Deployment and Orchestration of ML Workflows: 22%
  • ML Solution Monitoring, Maintenance, and Security: 24%

MLA-C01 ML engineering decision filters

Use these filters when a modeling answer ignores production constraints:

  • Data boundary: check data quality, labeling, leakage, feature engineering, train/validation/test splits, and sensitive-data controls before model selection.
  • Model lifecycle: separate training, tuning, evaluation, registry approval, deployment, monitoring, and retraining responsibilities.
  • SageMaker fit: identify whether the scenario calls for built-in algorithms, training jobs, pipelines, Feature Store, Model Registry, endpoints, batch transform, or monitoring.
  • Deployment pattern: choose real-time, asynchronous, batch, multi-model, shadow, canary, or blue/green deployment based on latency, volume, and risk.
  • Production monitoring: look for drift, bias, data quality, model quality, endpoint health, cost, explainability, and security signals.

MLA-C01 readiness map

  • Data preparation: You can prevent leakage, handle imbalance, select features, manage labels, and keep training data secure and reproducible.
  • Model development: You can choose tuning, evaluation, model selection, metrics, experiment tracking, and model registry workflows deliberately.
  • Deployment and orchestration: You can match SageMaker deployment and pipeline patterns to latency, release, scale, and operational requirements.
  • Monitoring and security: You can monitor drift, quality, bias, endpoint behavior, cost, IAM, encryption, and auditability in production ML systems.

How to use the MLA-C01 simulator efficiently

  1. Start with domain drills so you can separate data-prep gaps from model-development, deployment, or monitoring gaps.
  2. Review every miss until you can explain the SageMaker feature, workflow pattern, or security/monitoring trade-off behind the best answer.
  3. Move into mixed sets once you can switch between feature engineering, training, endpoints, orchestration, drift, and governance scenarios without losing the production lens.
  4. Finish with timed runs so the 130-minute pace feels routine before exam day.

Final 7-day MLA-C01 practice sequence

  • Day 7: Take the free full-length diagnostic and tag misses by lifecycle stage.
  • Day 6: Drill data preparation, leakage, feature engineering, imbalance, labeling, and data-quality scenarios.
  • Day 5: Drill training, tuning, metrics, evaluation, experiment tracking, and model registry decisions.
  • Day 4: Drill SageMaker deployment, endpoints, batch transform, pipelines, release patterns, and orchestration.
  • Day 3: Drill monitoring, drift, explainability, security, IAM, encryption, and cost-control scenarios.
  • Day 2: Complete a timed mixed set and explain whether each miss was a data, model, deployment, or monitoring issue.
  • Day 1: Review only weak lifecycle transitions; avoid late memorization of low-value service trivia.

When MLA-C01 practice is enough

If you can score above roughly 75% on several unseen mixed attempts and explain the ML lifecycle reason behind your answers, you are probably ready to take the exam. Continuing past that point should improve production judgment, not turn scenario stems into memorized patterns.

Focused sample questions

Use these child pages when you want focused IT Mastery practice before returning to mixed sets and timed mocks.

Free study resources

Need concept review first? Read the AWS MLA-C01 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks, topic drills, and full IT Mastery practice.

Free preview vs premium

  • Free preview: a smaller web set so you can validate the question style and explanation depth.
  • Premium: the full MLA-C01 practice bank, focused drills, mixed sets, timed mock exams, detailed explanations, and progress tracking across web and mobile.

24 MLA-C01 sample questions with detailed explanations

These are original IT Mastery practice questions aligned to the MLA-C01 machine-learning lifecycle, data preparation, model development, deployment, monitoring, security, and AWS service-selection decisions. They are not AWS exam questions and are not copied from any exam sponsor. Use them to check readiness here, then continue in IT Mastery with mixed sets, topic drills, and timed mocks.

Question 1

Topic: Content Domain 2: ML Model Development

A company is building a SageMaker model to detect payment fraud. Only 0.5% of transactions are fraud.

Constraints:

  • Model evaluation must reflect real-world performance (no train/validation leakage).
  • Transaction records contain sensitive data and must not leave the AWS account.

Which TWO approaches should the team AVOID when addressing the class imbalance?

  • A. Apply SMOTE to the full dataset before splitting
  • B. Oversample only the training split; keep validation unchanged
  • C. Use focal loss or weighted cross-entropy for the minority class
  • D. Use Ground Truth to label more fraud examples in-account
  • E. Use class weights (cost-sensitive loss) during training
  • F. Send records to an external SaaS to generate synthetic frauds

Best answers: A, F

Explanation: For imbalanced classification, acceptable approaches include cost-sensitive learning (class weights/focal loss), careful resampling applied only to the training split, and improving labels for the minority class. You must avoid approaches that contaminate validation data (data leakage) or move sensitive records outside the AWS account.

Imbalanced classification is commonly handled with cost-sensitive learning (penalize minority-class errors more) and/or resampling (over/under-sampling). In all cases, the validation/test split must represent the real distribution and remain independent of any transformation learned from training data.

Practical rules:

  • Do any oversampling/SMOTE after the train/validation split and only on the training split.
  • Prefer class-weighted losses or focal loss when you want to keep the original dataset distribution.
  • If you need more positives, acquire/label more minority examples using in-account workflows.

Also, constraints like “data must not leave the AWS account” rule out external data-synthesis services even if they might help class balance.
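As a concrete illustration of the split-then-resample rule, here is a minimal Python sketch. It assumes scikit-learn and imbalanced-learn are installed and that a pandas DataFrame named df with an is_fraud label already exists; the column names are illustrative.

    # df is an existing DataFrame of transactions (assumption); "is_fraud" is the label.
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE  # assumes imbalanced-learn is available

    X = df.drop(columns=["is_fraud"])
    y = df["is_fraud"]

    # Split first, stratifying so each split keeps the real ~0.5% fraud rate.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # Oversample the minority class on the training split only.
    X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    # X_val / y_val stay untouched, so evaluation reflects production prevalence.

The same ordering applies to class weights or focal loss: they change the training objective only and leave the validation distribution intact.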


Question 2

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

Which statement is INCORRECT about selecting deployment infrastructure for autoscaling and high availability (HA) for ML inference on AWS?

  • A. Shift traffic between SageMaker production variants for safer deployments.
  • B. Use two AZ subnets and multiple endpoint instances for HA.
  • C. A single-instance SageMaker endpoint is highly available by default.
  • D. Place ECS inference tasks behind multi-AZ ALB for high availability.

Best answer: C

Explanation: High availability requires redundant capacity so that a single instance or AZ failure does not interrupt inference. A single-instance real-time endpoint can be replaced if it fails, but traffic will fail during the replacement. Multi-AZ design plus multiple serving instances and controlled rollout patterns are standard ways to reduce outages while supporting autoscaling.

The core idea for HA in ML inference is eliminating single points of failure in the serving stack. A real-time endpoint with only one instance cannot be considered highly available because an instance crash, maintenance event, or container start failure will interrupt requests until a new instance is healthy.

To build HA and autoscaling:

  • Run multiple serving instances and scale them with Application Auto Scaling.
  • Use infrastructure that can spread across AZs (for example, provide subnets in at least two AZs and maintain more than one instance/task).
  • Use deployment patterns that reduce rollout risk (for example, weighted traffic shifting between SageMaker production variants).

A common failure mode is assuming “automatic replacement” equals HA; replacement is recovery, not continuous availability.
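For illustration, a minimal boto3 sketch of keeping a nonzero instance floor and scaling on invocations per instance might look like this; the endpoint name, variant name, and threshold are illustrative assumptions.

    import boto3

    autoscaling = boto3.client("application-autoscaling")
    resource_id = "endpoint/fraud-endpoint/variant/AllTraffic"  # illustrative names

    # Keep at least two instances so a single failure does not interrupt traffic.
    autoscaling.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=2,
        MaxCapacity=6,
    )

    # Track invocations per instance so capacity follows load.
    autoscaling.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 1000.0,  # illustrative invocations-per-instance target
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )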


Question 3

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

A team deployed an image segmentation model to a SageMaker real-time endpoint (single ml.g4dn.xlarge) behind Amazon API Gateway (REST). During traffic bursts, CloudWatch shows the endpoint returns 5XX and API Gateway logs show 504 Integration timeout.

A trace of the model container shows some inferences take 70-120 seconds (large images), and these requests fail even though CPU and GPU utilization are not saturated. The application can accept an asynchronous pattern where callers receive a job ID and retrieve results within 5 minutes. The team wants the smallest change while staying on SageMaker.

Which deployment target should the team use to fix the root cause?

  • A. Deploy the model to a SageMaker asynchronous inference endpoint
  • B. Change the endpoint instance type to a larger GPU instance
  • C. Enable SageMaker endpoint auto scaling on InvocationsPerInstance
  • D. Package the model as an AWS Lambda container image

Best answer: A

Explanation: The symptom is request failures caused by end-to-end synchronous timeouts (API Gateway 504) when inference takes longer than the request/response window. The root cause is not capacity but long-running inference duration for some inputs. Using SageMaker asynchronous inference is the minimal SageMaker-native deployment change that supports long processing times by decoupling submission from result retrieval.

Synchronous inference paths (API Gateway to a SageMaker real-time endpoint) have strict request time limits, so any inference that runs longer can fail even if the endpoint has enough compute. Here, traces show 70-120 second processing for some images, which triggers API Gateway integration timeouts and surfaces as 5XX/504 errors.

SageMaker asynchronous inference is designed for this pattern:

  • Client submits a request and immediately receives an acknowledgment/job identifier.
  • SageMaker processes the request in the background.
  • Output is delivered to S3 (and can be paired with notifications/polling).

This fixes the timeout-driven failure mode without changing the model code or leaving SageMaker; scaling or bigger instances might reduce runtime but does not remove the synchronous timeout constraint.
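A minimal boto3 sketch of the asynchronous pattern is shown below; the endpoint config name, model name, and S3 paths are illustrative assumptions.

    import boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        EndpointConfigName="segmentation-async-config",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": "segmentation-model",
            "InstanceType": "ml.g4dn.xlarge",
            "InitialInstanceCount": 1,
        }],
        AsyncInferenceConfig={
            "OutputConfig": {"S3OutputPath": "s3://ml-prod/async-results/"},
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 2},
        },
    )

    # Callers point the endpoint at a payload in S3 and get back an output location
    # instead of holding a synchronous connection open for 70-120 seconds.
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_async(
        EndpointName="segmentation-async",
        InputLocation="s3://ml-prod/async-requests/image-123.json",
        ContentType="application/json",
    )
    print(response["OutputLocation"])  # poll S3 or subscribe to SNS notifications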


Question 4

Topic: Content Domain 1: Data Preparation for Machine Learning (ML)

A retail company builds a churn model from ~180 GB of daily CSV files in Amazon S3. The team must (1) create and validate feature engineering steps with a report that can be reviewed before use, (2) export a repeatable workflow that runs daily without manual Studio steps, (3) keep processing inside a VPC with S3/KMS encryption, and (4) minimize custom infrastructure (no self-managed Spark/EMR).

Which solution best meets these requirements?

  • A. Build a SageMaker Data Wrangler flow, generate a Data Quality and Insights Report, and export the flow to SageMaker Pipelines as a Processing step that writes curated features to SageMaker Feature Store; trigger the pipeline daily with EventBridge
  • B. Export the Data Wrangler flow to a notebook and run it daily on an EC2 instance with cron, writing the output dataset to S3
  • C. Use AWS Glue Studio visual ETL to transform the CSVs and store the output in S3, then reimplement the same transformations in the inference code
  • D. Run an EMR Spark job with custom Deequ checks and write engineered features into DynamoDB for model training

Best answer: A

Explanation: SageMaker Data Wrangler is designed to build interactive, repeatable feature engineering flows and to validate them with built-in analysis and data quality reports. Exporting the flow to SageMaker Pipelines as a Processing step operationalizes the exact same transformation logic on a schedule with governance-friendly execution history. Writing outputs to SageMaker Feature Store makes the engineered features reusable and consistent across training and inference workflows.

The core requirement is a repeatable, reviewable feature engineering workflow created in SageMaker Data Wrangler. In Studio, you can build a Data Wrangler .flow, run built-in analyses, and generate a Data Quality and Insights Report to validate schema, missing values, distributions, and other checks before promoting the transformation.

To make the workflow repeatable and auditable, export the Data Wrangler flow to SageMaker Pipelines (as a Processing step). This lets the same transformation logic run on demand or on a schedule (for example via EventBridge), with execution history in SageMaker. Running the processing job in a VPC and using S3 with KMS encryption aligns with the security constraints, and persisting engineered features to SageMaker Feature Store supports consistent reuse across model builds and inference consumers.

Approaches that rely on manual notebook runs or separate ETL implementations break repeatability and governance of a single transformation definition.
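As a small sketch of the operational side, the exported pipeline can be started programmatically; in production a daily EventBridge rule would invoke the same pipeline. The pipeline name and parameter are illustrative assumptions.

    import boto3

    sm = boto3.client("sagemaker")
    sm.start_pipeline_execution(
        PipelineName="churn-feature-engineering",   # illustrative pipeline name
        PipelineParameters=[
            {"Name": "InputDataS3Uri", "Value": "s3://retail-raw/daily/latest/"}
        ],
    )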


Question 5

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

A team uses AWS CloudFormation to provision a SageMaker real-time endpoint that runs in private subnets (no internet gateway, no NAT gateway). The VPC already has an S3 gateway endpoint. The stack uses a CloudFormation service role named CFNDeployRole.

During stack creation, CloudFormation shows these failures:

CREATE_FAILED AWS::SageMaker::Endpoint
AccessDenied: User is not authorized to perform iam:PassRole on arn:aws:iam::123456789012:role/SageMakerExecutionRole

CREATE_FAILED AWS::SageMaker::Endpoint
ResourceInitializationError: failed to pull image... dial tcp...ecr...:443: i/o timeout

CREATE_FAILED AWS::SageMaker::Endpoint
Failed to create log stream... dial tcp...logs...:443: i/o timeout

Which actions will resolve these provisioning failures? (Select THREE.)

  • A. Create interface VPC endpoints for Amazon ECR (ecr.api and ecr.dkr)
  • B. Add a NAT gateway and update private subnet route tables
  • C. Allow iam:PassRole on SageMakerExecutionRole for CFNDeployRole
  • D. Add an S3 gateway endpoint to the VPC
  • E. Increase the endpoint instance size to reduce initialization timeouts
  • F. Create an interface VPC endpoint for CloudWatch Logs in the VPC

Best answers: A, C, F

Explanation: The failures map directly to missing IAM permission and missing private network paths. iam:PassRole is required so CloudFormation can attach the SageMaker execution role during endpoint creation. Because the endpoint runs in private subnets without NAT, you must add VPC interface endpoints (PrivateLink) for the services the container must reach: ECR to pull the image and CloudWatch Logs to write logs.

When provisioning SageMaker endpoints with infrastructure-as-code, creation can fail due to (1) IAM permissions used by the provisioning tool and (2) networking reachability from the endpoint ENIs.

Here, the explicit iam:PassRole error indicates the CloudFormation service role lacks permission to pass the execution role to SageMaker. The i/o timeout errors to ECR and CloudWatch Logs indicate the endpoint is attempting to reach public service endpoints but has no internet egress. In a private VPC design, the AWS-appropriate fix is to add the needed interface VPC endpoints (PrivateLink) so the endpoint can reach ECR to pull the container image and CloudWatch Logs to publish logs without NAT. The key takeaway is to align IAM (PassRole) and VPC endpoints with the endpoint’s required service dependencies.


Question 6

Topic: Content Domain 1: Data Preparation for Machine Learning (ML)

A company is building an ML training dataset in Amazon SageMaker. Source data is stored in three places: clickstream logs in Amazon S3, customer transactions in an Amazon RDS for PostgreSQL database, and user profile attributes in an Amazon DynamoDB table. The team wants appropriate, AWS-native mechanisms to extract each dataset for preprocessing and model training.

Which THREE actions meet these requirements? (Select THREE.)

  • A. Use RDS automated backups as the primary mechanism to export tables to S3
  • B. Attach an existing EBS volume directly to the SageMaker training job
  • C. Configure the SageMaker job to read training input directly from S3
  • D. Use DynamoDB point-in-time export to S3 for the table data
  • E. Use EFS replication to copy files from EFS into S3
  • F. Use an AWS Glue job with a JDBC connection to RDS and write results to S3

Best answers: C, D, F

Explanation: Use mechanisms that natively extract from each datastore with minimal operational overhead. SageMaker can consume objects directly from S3, AWS Glue can extract relational data from RDS using JDBC and write it to S3, and DynamoDB supports exporting table data to S3. These approaches are common building blocks for creating training datasets in AWS ML pipelines.

For ML data preparation, the most common pattern is to extract data from operational stores into an analytics-friendly landing zone (often S3), then preprocess it for training.

In this scenario:

  • S3 data can be read directly by SageMaker training/processing jobs as input data.
  • RDS is a relational store; an ETL/extract mechanism such as AWS Glue with a JDBC connection can pull the required tables/queries and write the output to S3.
  • DynamoDB provides a managed export capability that writes table data to S3, which is well suited for batch feature generation.

Options that rely on unsupported direct attachments or misuse backup/replication features do not provide a correct or intended extraction path for building ML datasets.


Question 7

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

A team runs nightly inference by using an Amazon SageMaker batch transform job. The job reads from s3://ml-prod/inference/input/customer_scores.csv (one 120-GB CSV file) and writes to s3://ml-prod/inference/output/. The job is configured with InstanceCount=4, but it consistently misses the 2-hour window and CloudWatch shows only one instance has sustained CPU usage while the other three are mostly idle.

Which change will fix the root cause with the LEAST operational change while keeping the batch workflow?

  • A. Replace batch transform with a real-time endpoint and enable autoscaling
  • B. Increase MaxConcurrentTransforms for the transform job
  • C. Increase the instance count from 4 to 8
  • D. Shard the CSV into multiple S3 objects and set the input S3Uri to the prefix

Best answer: D

Explanation: The symptom (three idle instances) indicates the job is not being parallelized across the cluster. In SageMaker batch transform, instance-level parallelism primarily comes from distributing separate S3 input objects to different instances. Writing the input as multiple files and pointing the transform job at the S3 prefix allows all instances to process data concurrently and still write results to the configured S3 output location.

Symptom: a 4-instance batch transform job runs long and only one instance is busy.

Root cause: SageMaker batch transform distributes work to instances based on the number of S3 input objects. When the input is a single large file, one instance receives that object and processes it, leaving other instances with little or nothing to do.

Fix: split the input into multiple smaller files (for example, by partitioning/sharding the CSV) in an S3 prefix such as s3://ml-prod/inference/input/sharded/, and configure the transform job’s input location to that prefix while keeping the output S3 prefix the same.

This increases parallelism without changing the model or moving to an online serving pattern.
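A minimal boto3 sketch of the corrected transform job is shown below; the job, model, and instance settings are illustrative assumptions, and the key change is the S3Prefix input pointing at the sharded prefix.

    import boto3

    sm = boto3.client("sagemaker")
    sm.create_transform_job(
        TransformJobName="nightly-customer-scores",
        ModelName="churn-scoring-model",            # illustrative model name
        TransformInput={
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    # Prefix containing many shards, e.g. part-0000.csv ... part-0199.csv
                    "S3Uri": "s3://ml-prod/inference/input/sharded/",
                }
            },
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        TransformOutput={"S3OutputPath": "s3://ml-prod/inference/output/"},
        TransformResources={"InstanceType": "ml.m5.2xlarge", "InstanceCount": 4},
    )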


Question 8

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

A team uses AWS CodePipeline to build and deploy a SageMaker real-time endpoint. CodeBuild pulls a trained model artifact from an S3 bucket that is encrypted with a customer managed KMS key (SSE-KMS), builds an inference image, and pushes it to Amazon ECR.

Exhibit: recent pipeline failures

[Container]... Running command aws s3 cp s3://ml-prod-models/run-173/model.tar.gz ./model.tar.gz
fatal error: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied

[Container]... Phase complete: POST_BUILD State: FAILED
Message: no matching artifact paths found: model.tar.gz

Security and audit requirements:

  • Artifact buckets must remain private and encrypted with SSE-KMS.
  • CI/CD roles must follow least privilege.
  • Each deployment must be traceable to a specific commit and immutable build output.

Which TWO mitigation actions should the team AVOID?

  • A. Fix the buildspec artifacts section so it publishes the correct model.tar.gz path to CodePipeline
  • B. Add a CodeBuild pre-build check that fails fast if the expected S3 object key and local artifact path do not exist
  • C. Change the deployment stage to always deploy the latest ECR image tag to simplify rollbacks
  • D. Attach the AdministratorAccess managed policy to the CodeBuild service role
  • E. Add least-privilege permissions for the CodeBuild role to s3:GetObject on the model bucket and kms:Decrypt on the specific CMK
  • F. Update the pipeline to use an immutable image tag (for example, commit SHA) and deploy that exact tag

Best answers: C, D

Explanation: Two actions violate explicit requirements even if they could “unblock” the pipeline. Broadening the build role to admin permissions breaks least privilege, and deploying a floating latest tag breaks auditability and reproducibility. The remaining actions address the actual failure modes shown: missing KMS/S3 access and a misconfigured build artifact path.

CI/CD failures like the ones shown usually come from (1) insufficient permissions to read encrypted artifacts and (2) incorrect artifact publication paths.

Here, the S3 copy fails because the build identity needs both S3 read access and the ability to use the KMS key that encrypts the object (for example, s3:GetObject plus kms:Decrypt/key policy permissions). The subsequent “no matching artifact paths” error indicates the build did not produce the file where CodeBuild/CodePipeline expects it, so the buildspec.yml artifacts paths (or the build output location) must be corrected and validated.

Mitigations should preserve least privilege and immutable, commit-traceable deployments; shortcuts like admin roles or floating tags undermine those requirements.
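For illustration, the least-privilege statements the CodeBuild role needs can be sketched as a Python dict; the bucket path, key ARN, and account ID are illustrative assumptions, and the KMS key policy must also allow the role to use the key.

    # Illustrative least-privilege policy for the CodeBuild service role.
    codebuild_artifact_access = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadModelArtifact",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": "arn:aws:s3:::ml-prod-models/run-*/model.tar.gz",
            },
            {
                "Sid": "DecryptWithProjectCmk",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:DescribeKey"],
                "Resource": "arn:aws:kms:us-east-1:123456789012:key/1111-2222-3333",
            },
        ],
    }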


Question 9

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

Which Amazon SageMaker capability provides a central place to version trained model artifacts and metadata, apply an approval workflow (for example, staging to production), and reliably promote the same model package to deployment?

  • A. Amazon SageMaker Model Monitor
  • B. SageMaker Automatic Model Tuning
  • C. Amazon SageMaker Feature Store
  • D. Amazon SageMaker Model Registry

Best answer: D

Explanation: Amazon SageMaker Model Registry is the service feature used to register model versions and govern promotion using approval statuses. It enables consistent staging-to-production release of the same model package artifact with associated metadata and lineage, supporting reliable deployment workflows.

SageMaker Model Registry is a governance and release-management capability for ML models: it stores model package versions (artifact locations, inference container/image, metrics, and metadata) in a model package group and tracks an approval status (such as pending/approved). This lets teams promote a specific, immutable model package through environments (dev/staging/prod) and drive deployments from CI/CD using the same approved artifact, improving test-to-production parity and reducing “it worked in staging” drift caused by redeploying different builds.

The closest confusion is SageMaker Feature Store, which manages feature definitions and online/offline feature values for training/inference consistency, not model version promotion.
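A minimal boto3 sketch of the register-then-approve flow is shown below; the group name, image URI, and artifact path are illustrative assumptions.

    import boto3

    sm = boto3.client("sagemaker")
    pkg = sm.create_model_package(
        ModelPackageGroupName="fraud-detector",
        ModelApprovalStatus="PendingManualApproval",
        InferenceSpecification={
            "Containers": [{
                "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/fraud-inference:1.4.0",
                "ModelDataUrl": "s3://ml-prod-models/run-173/model.tar.gz",
            }],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["application/json"],
        },
    )

    # After review, promote the same immutable package by flipping its approval status;
    # CI/CD then deploys this exact ModelPackageArn to staging and production.
    sm.update_model_package(
        ModelPackageArn=pkg["ModelPackageArn"],
        ModelApprovalStatus="Approved",
    )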


Question 10

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

A model is deployed to an Amazon SageMaker real-time endpoint, but requests often take 5-10 minutes and include large payloads. The endpoint frequently times out during traffic bursts even though low latency is not required. Which deployment option is specifically designed for long-running, bursty inference and is the most appropriate corrective action?

  • A. SageMaker asynchronous inference endpoint
  • B. SageMaker serverless inference endpoint
  • C. SageMaker batch transform job
  • D. Add Application Auto Scaling to the real-time endpoint

Best answer: A

Explanation: SageMaker asynchronous inference is intended for requests that can take a long time to complete and arrive in bursts. It uses a queue-based pattern so clients don’t hold open real-time HTTP connections that commonly time out. Results are delivered asynchronously (for example, to Amazon S3), matching the stated requirements.

The core issue is a wrong endpoint type: a real-time endpoint is optimized for low-latency, synchronous request/response patterns and can time out when inference takes minutes. SageMaker asynchronous inference is the appropriate deployment option when you have long-running inference, large payloads, or bursty traffic and you can accept delayed responses. It buffers requests using a managed queue and decouples request submission from model execution, which reduces client timeouts and helps absorb bursts without requiring you to overprovision steady capacity.

Batch transform is also asynchronous, but it is a job-based, offline pattern for processing datasets, not interactive per-request inference.

Key takeaway: choose asynchronous inference endpoints for long-running, spiky, request-driven workloads where immediate responses are not required.


Question 11

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

An ML team deploys models to Amazon SageMaker endpoints using a custom inference container in Amazon ECR and model artifacts in Amazon S3. The team wants safe rollbacks and repeatable deployments across environments.

Which action best aligns with the core principle of reproducibility?

  • A. Enable SageMaker Model Monitor to detect drift over time.
  • B. Require separate approvals for model registration and deployment.
  • C. Version models and images; deploy using immutable version identifiers.
  • D. Grant endpoints only the minimum IAM permissions required.

Best answer: C

Explanation: Reproducibility means you can recreate the same deployed system from known, immutable inputs. Using explicit versioning for model artifacts and container images and referencing those immutable identifiers in deployment definitions makes rollbacks deterministic and deployments repeatable across environments.

For rollback safety and repeatability, the key principle is reproducibility: the deployed model must be reconstructable from fixed, uniquely identified artifacts. In practice on AWS, this means treating the model artifact and inference image as versioned, immutable inputs to deployment automation.

Common patterns include:

  • Store model artifacts with versioning (for example, S3 object version IDs) and promote versions via a registry.
  • Use immutable container references (for example, ECR image digests rather than mutable tags like latest).
  • Configure deployment/IaC to reference those exact versions so you can redeploy the same combination or roll back by selecting a prior version.

Controls like IAM scoping, drift monitoring, and approval workflows are valuable, but they do not by themselves make deployments reproducible.
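As an illustration of the versioning principle, a minimal boto3 sketch might pin the image by digest and the artifact by a build-specific key; the names, digest, and paths are illustrative assumptions.

    import boto3

    sm = boto3.client("sagemaker")
    sm.create_model(
        ModelName="churn-model-3f2a1c9",  # suffix ties the model to a specific commit/build
        ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        PrimaryContainer={
            # Reference the image by digest, not a mutable tag like :latest.
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/churn-inference@sha256:9f8e...",
            # Artifact key encodes the build, so rollback means redeploying a prior key.
            "ModelDataUrl": "s3://ml-prod-models/builds/3f2a1c9/model.tar.gz",
        },
    )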


Question 12

Topic: Content Domain 1: Data Preparation for Machine Learning (ML)

A company is building an Amazon SageMaker image model to detect whether workers are wearing safety helmets. Images are stored in Amazon S3 and were collected continuously, so capture order strongly correlates with lighting (day vs. night). Only 6% of the images are from night conditions, and the current model underperforms at night.

The team wants to reduce prediction bias across lighting conditions while keeping train/validation/test metrics statistically valid for expected production use. Which actions should the team take? Select THREE.

  • A. Build the test set with a 50/50 day-night mix regardless of prevalence
  • B. Shuffle the dataset before splitting into train/validation/test
  • C. Apply augmentation to the full dataset before splitting
  • D. Augment night images only in the training set
  • E. Duplicate night images into validation and test sets to balance them
  • F. Stratify the split by label and lighting condition

Best answers: B, D, F

Explanation: Use data-splitting and augmentation techniques that improve representation of the under-sampled condition without leaking synthetic or duplicated samples into evaluation. Stratification and shuffling help ensure each split reflects the underlying data generation process rather than collection order. Augment only the training data to improve robustness while keeping validation/test unbiased.

To reduce prediction bias across subgroups (here, lighting conditions) while preserving statistical validity, the evaluation data must remain an untouched sample of the real-world distribution you expect in production. Use splitting methods that prevent accidental skews caused by collection order and ensure minority conditions appear in every split.

Practical approach:

  • Shuffle before splitting when samples are IID but ordered by collection artifacts.
  • Stratify the split so each subset contains an appropriate share of labels and lighting conditions.
  • Apply augmentation only to the training subset (especially for the underrepresented condition) to improve generalization without inflating evaluation results.

Key takeaway: improve representation and robustness in training while keeping validation/test as a clean, unbiased measurement set.
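Here is a minimal scikit-learn sketch of shuffling and stratifying on a combined label-plus-lighting key; it assumes a pandas DataFrame df with has_helmet and lighting columns, which are illustrative names.

    # df is an existing DataFrame with "has_helmet" and "lighting" columns (assumption).
    from sklearn.model_selection import train_test_split

    strata = df["has_helmet"].astype(str) + "_" + df["lighting"]  # e.g. "1_night"

    # Shuffled, stratified 70/15/15 split so night images appear in every subset.
    train_df, holdout_df = train_test_split(
        df, test_size=0.3, shuffle=True, stratify=strata, random_state=42
    )
    val_df, test_df = train_test_split(
        holdout_df, test_size=0.5, shuffle=True,
        stratify=strata.loc[holdout_df.index], random_state=42
    )

    # Augmentation (brightness shifts, noise, etc.) is applied to train_df only;
    # val_df and test_df stay untouched so metrics reflect production conditions.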


Question 13

Topic: Content Domain 4: ML Solution Monitoring, Maintenance, and Security

A team runs a real-time Amazon SageMaker endpoint behind an API. They want to add alerting to quickly detect inference workflow anomalies.

Exhibit: CloudWatch metrics (last 5 minutes)

Invocations: 2,000
Invocation4XXErrors: 380
Invocation5XXErrors: 0
ModelLatency p95: 48 ms
CPUUtilization: 34%

Which alerting signal is the most appropriate to configure for this situation?

  • A. Alarm on Invocations dropping below a baseline
  • B. Alarm on Invocation4XXErrors for the SageMaker endpoint
  • C. Alarm on endpoint CPUUtilization exceeding a threshold
  • D. Alarm on ModelLatency p95 exceeding a threshold

Best answer: B

Explanation: The best alerting signal is the SageMaker endpoint’s 4XX error metric because it directly indicates failed inference requests due to invalid inputs or request formatting. The exhibit shows a substantial count of Invocation4XXErrors: 380 while Invocation5XXErrors: 0, pointing to a request/payload issue rather than service failure or performance saturation.

For real-time SageMaker inference, CloudWatch endpoint metrics separate failures into 4XX (client-side/request issues) and 5XX (server-side/model container/platform issues). In the exhibit, Invocation4XXErrors: 380 is high while Invocation5XXErrors: 0, and latency/CPU look normal (ModelLatency p95: 48 ms, CPUUtilization: 34%). That pattern is most consistent with an upstream data/schema/payload problem (for example, missing required fields, wrong content-type, or schema drift) causing requests to be rejected.

The most AWS-appropriate alerting signal here is a CloudWatch alarm on the endpoint’s Invocation4XXErrors (typically using a rate or count over a short period) so operators are notified as soon as bad requests start spiking. A latency or CPU alarm would not reliably detect this failure mode.
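A minimal boto3 sketch of that alarm is shown below; the endpoint name, variant name, threshold, and SNS topic are illustrative assumptions.

    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        AlarmName="fraud-endpoint-4xx-spike",
        Namespace="AWS/SageMaker",
        MetricName="Invocation4XXErrors",
        Dimensions=[
            {"Name": "EndpointName", "Value": "fraud-endpoint"},
            {"Name": "VariantName", "Value": "AllTraffic"},
        ],
        Statistic="Sum",
        Period=300,                 # 5-minute windows, matching the exhibit
        EvaluationPeriods=1,
        Threshold=50,               # illustrative: alert well below the 380 seen here
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-oncall"],
    )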


Question 14

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

A company deploys an Amazon SageMaker real-time endpoint in private subnets (no route to an internet gateway and no NAT). The model image is in Amazon ECR and the model artifacts are in Amazon S3. Callers are in the same VPC and from on-premises over AWS Direct Connect (private VIF only).

After deployment, the endpoint shows errors pulling the container image, and VPC callers invoking the endpoint time out. The company requires that all traffic stays on private connectivity (no public internet).

Which TWO actions will meet these requirements? (Select TWO.)

  • A. Add a NAT gateway and route 0.0.0.0/0 from the private subnets
  • B. Place an internet-facing ALB in front of the SageMaker endpoint
  • C. Create an interface VPC endpoint for SageMaker Runtime with Private DNS
  • D. Move the endpoint to public subnets and assign public IPs
  • E. Allow inbound TCP 443 from on-premises in the endpoint security group
  • F. Create VPC endpoints for S3 and ECR (API and DKR)

Best answers: C, F

Explanation: SageMaker endpoint invocation uses the SageMaker Runtime service endpoint, so private callers need an interface VPC endpoint (AWS PrivateLink) to avoid public internet paths. Separately, when an endpoint is in private subnets without NAT, it needs private access to dependencies like S3 and ECR through VPC endpoints so it can download model artifacts and container images.

To keep both deployment-time and inference-time traffic private when using SageMaker endpoints in private subnets, you must provide private network paths to the AWS services involved.

  • For inference calls from the VPC (and from on-prem over Direct Connect private VIF), create an interface VPC endpoint for com.amazonaws.<region>.sagemaker.runtime and enable Private DNS so InvokeEndpoint resolves to private IPs.
  • For endpoint provisioning and startup without NAT/IGW, create VPC endpoints so the endpoint can fetch the model and container: an S3 gateway endpoint (for S3 access) and interface endpoints for ECR (both API and DKR) so the image can be pulled privately.

Security groups attached to the endpoint control the endpoint’s ENI traffic to VPC resources, not the client’s InvokeEndpoint path to the SageMaker Runtime service.


Question 15

Topic: Content Domain 4: ML Solution Monitoring, Maintenance, and Security

A team sees an unexpected daily cost spike in an AWS account that heavily uses Amazon SageMaker. They need the next investigation step to attribute the spike to specific ML activities (for example, individual training jobs, endpoint instance-hours, or batch transforms) at the most granular level.

Which AWS capability best fits this need?

  • A. Amazon SageMaker Model Monitor
  • B. AWS Cost and Usage Report (CUR)
  • C. Amazon SageMaker Model Registry
  • D. Amazon SageMaker Clarify

Best answer: B

Explanation: The AWS Cost and Usage Report (CUR) is designed for detailed cost attribution and investigation. It provides the most granular, line-item billing and usage data, which you can query to correlate a spend spike with specific SageMaker activities such as training job usage and endpoint instance-hours.

The core concept is cost attribution: when costs spike, you first need granular billing/usage data that can be grouped by service, resource identifiers, and (ideally) cost allocation tags. AWS Cost and Usage Report (CUR) is the AWS billing dataset that contains line-item cost and usage records, making it the best starting point to correlate a cost increase to concrete SageMaker activities (training, hosting/endpoints, batch transform, processing).

A common next step is to enable CUR delivery to Amazon S3 and query it (for example, with Amazon Athena) while filtering/grouping on SageMaker-related line items and tags to identify the specific job, endpoint, or instance-hours driving the change.
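As a rough sketch of that next step, the CUR table can be queried from Athena for SageMaker line items; the database, table, output bucket, and date filter are illustrative assumptions, and the column names follow the standard Athena-ready CUR schema.

    import boto3

    QUERY = """
    SELECT line_item_resource_id,
           line_item_usage_type,
           SUM(line_item_unblended_cost) AS cost
    FROM cur_db.cur_table
    WHERE line_item_product_code = 'AmazonSageMaker'
      AND line_item_usage_start_date >= timestamp '2026-05-14 00:00:00'
    GROUP BY line_item_resource_id, line_item_usage_type
    ORDER BY cost DESC
    LIMIT 50
    """

    athena = boto3.client("athena")
    athena.start_query_execution(
        QueryString=QUERY,
        ResultConfiguration={"OutputLocation": "s3://billing-athena-results/"},
    )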


Question 16

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

A company is deploying a SageMaker inference workload for a payment fraud model. The endpoint must meet a strict, consistent p99 latency SLO during business hours, and the team wants repeatable performance across deployments (no cold-start variability). Which action best reflects the core principle of reproducibility when choosing between on-demand and provisioned resources?

  • A. Use an asynchronous inference endpoint that can scale to zero
  • B. Use SageMaker Serverless Inference so capacity is created on demand
  • C. Run inference with Batch Transform jobs only when new files arrive in S3
  • D. Deploy a real-time endpoint with provisioned instances and a nonzero minimum capacity

Best answer: D

Explanation: Reproducibility emphasizes consistent, repeatable behavior of the system under the same conditions. For low-latency real-time inference, provisioned (always-on) instances with a nonzero minimum capacity reduce variability from cold starts and capacity spin-up. This makes endpoint performance more predictable across deployments than on-demand options.

The principle being applied is reproducibility: the system should behave consistently and predictably when deployed repeatedly. For inference workloads with strict latency SLOs, provisioned resources (an always-on real-time endpoint with a nonzero minimum capacity) provide stable compute availability and avoid cold-start effects that can introduce performance variance.

On-demand inference options (serverless, async scaling-to-zero, or batch) are optimized for intermittent traffic and cost efficiency, but they can add warm-up time or queueing that makes latency less consistent. The key takeaway is to choose provisioned capacity when you need predictable real-time performance, and on-demand capacity when traffic is spiky and latency tolerance is higher.


Question 17

Topic: Content Domain 4: ML Solution Monitoring, Maintenance, and Security

A company in a regulated industry must run Amazon SageMaker training jobs and a real-time endpoint in a VPC with no internet access (no direct or indirect egress). Training data is in Amazon S3, and custom containers are stored in Amazon ECR. Logs must go to Amazon CloudWatch.

Which approach is NOT appropriate to meet the network isolation requirement?

  • A. Place jobs in a public subnet and use a NAT gateway so the container can download packages from the internet at startup
  • B. Use VPC endpoints for S3, ECR, and CloudWatch so jobs can access required services privately
  • C. Enable SageMaker network isolation and ensure all dependencies are packaged in the container or retrieved from S3 via VPC endpoints
  • D. Run training and the endpoint in private subnets with no route to an internet gateway

Best answer: A

Explanation: When policies require no internet access, SageMaker training and inference must run in private subnets without internet routes and rely on private connectivity (VPC endpoints) for AWS service access. Allowing internet egress (even via NAT) breaks network isolation because the container can reach external networks.

The core principle is enforcing network isolation by preventing any internet egress path for training and inference while still allowing access to required AWS services over private networking. In SageMaker, this typically means placing jobs/endpoints in private subnets, removing routes to an internet gateway/NAT gateway, and using VPC endpoints to reach services such as S3 (Gateway endpoint), ECR (Interface endpoints), and CloudWatch Logs (Interface endpoint). If additional packages or artifacts are needed, they must be pre-packaged into the container image or stored in S3 and accessed through those endpoints.

Allowing a NAT gateway so containers can fetch dependencies at runtime is an anti-pattern because it reintroduces outbound internet connectivity.


Question 18

Topic: Content Domain 2: ML Model Development

A team brings a custom Docker image (built externally) into Amazon SageMaker to run a training job. The Estimator points to an S3 URI using the training input channel. The training job fails almost immediately.

Exhibit: CloudWatch log excerpt

FileNotFoundError: [Errno 2] No such file or directory: '/data/train.csv'

The container’s training code was written to read local files from /data and to write outputs to /output. The team wants the smallest change that makes the container work with SageMaker-managed training without changing the dataset in S3.

Which action will fix the root cause?

  • A. Increase the training instance size to avoid container disk pressure
  • B. Attach an IAM policy that allows s3:GetObject for the input bucket
  • C. Switch the job to a SageMaker Batch Transform job
  • D. Update the container to read from SM_CHANNEL_TRAINING and write artifacts to /opt/ml/model

Best answer: D

Explanation: The error shows the container is looking for training data in /data, but SageMaker-managed training stages input data under /opt/ml/input/data/<channel>. For the training channel, the portable way to locate the data is to read the SM_CHANNEL_TRAINING environment variable. The container should also write model artifacts to /opt/ml/model so SageMaker can upload them to S3 at job completion.

When you bring your own training container, SageMaker injects a standard filesystem layout and environment variables that your code must follow. Input data for each channel is downloaded and mounted under /opt/ml/input/data/<channel> (for example, the training channel path is available via SM_CHANNEL_TRAINING). If the container instead hardcodes paths like /data/train.csv, it will fail even though the S3 input configuration is correct.

To integrate cleanly with SageMaker training:

  • Read training/validation data from /opt/ml/input/data/<channel> (or SM_CHANNEL_*).
  • Write the trained model artifacts to /opt/ml/model so SageMaker can package and upload them.

Changing instance size, IAM, or job type does not address a path/layout mismatch inside the container.
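A minimal sketch of the container-side change is shown below; the file names and the fit_model call are placeholders for the team's existing training logic.

    import os
    import pandas as pd
    import joblib

    # SageMaker provides the channel path and model output directory; fall back to
    # the standard layout if the environment variables are not set.
    train_dir = os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")

    train_df = pd.read_csv(os.path.join(train_dir, "train.csv"))

    model = fit_model(train_df)  # placeholder for the container's existing training code
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))  # SageMaker uploads this to S3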


Question 19

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows

A company is deploying a real-time Amazon SageMaker endpoint for an NLP model. The FP16 model artifact in Amazon S3 is 3.2 GB, and when loaded it uses ~5.5 GB of host RAM (or equivalent GPU memory) due to runtime overhead. The endpoint must meet p95 latency of 150 ms at 100 requests/second steady state and up to 300 requests/second during bursts.

Which inference infrastructure choice should the ML engineer AVOID?

  • A. Use a GPU instance with enough GPU memory to keep the model resident
  • B. Choose a small CPU instance with 4 GB RAM and rely on swap
  • C. Scale out to multiple instances behind the endpoint to handle 300 RPS bursts
  • D. Use a distilled smaller model on CPU for traffic with relaxed latency needs

Best answer: B

Explanation: For real-time inference, the model must fit comfortably in memory (RAM or GPU memory) without swapping, and the compute must sustain the required throughput while meeting the latency SLO. Choosing an instance that cannot hold the loaded model forces paging or crashes, which predictably increases tail latency and reduces availability.

The core principle is to size inference compute so the model and runtime stay memory-resident and can execute fast enough to meet p95 latency at peak concurrency. If the loaded model needs ~5.5 GB, selecting a 4 GB RAM instance guarantees memory pressure; using swap turns memory misses into disk I/O, which dramatically increases tail latency and can cause OOM kills during bursts.

A sound approach is:

  • Ensure the model fits with headroom in RAM/VRAM (plus framework overhead).
  • Use GPU when needed for latency/throughput (or when the model benefits from acceleration).
  • Scale out with multiple instances to meet peak RPS, then right-size based on utilization.

The key takeaway is that swapping is not a capacity strategy for latency-sensitive inference.


Question 20

Topic: Content Domain 4: ML Solution Monitoring, Maintenance, and Security

A team is hosting a model on an Amazon SageMaker Serverless Inference endpoint to minimize cost. CloudWatch shows latency spikes only at the start of sudden traffic bursts after long idle periods, then returns to normal once the endpoint is “warmed up.”

Which SageMaker capability should the team use to reduce these burst-time latency spikes while keeping the serverless deployment model?

  • A. Enable Application Auto Scaling on a real-time endpoint
  • B. Configure provisioned concurrency on the serverless endpoint
  • C. Request a higher SageMaker endpoint concurrency service quota
  • D. Switch to an asynchronous inference endpoint

Best answer: B

Explanation: The symptom is cold-start latency on a SageMaker Serverless Inference endpoint after idle periods. Provisioned concurrency mitigates cold starts by keeping a set amount of serverless capacity pre-warmed and ready to serve requests. This preserves the serverless operational model while improving burst performance.

Serverless Inference can show higher latency for the first requests after a period of inactivity because capacity must be initialized (a cold start). When the goal is to keep a serverless endpoint but reduce those cold-start spikes, the high-level mitigation is to configure provisioned concurrency for the serverless endpoint. This maintains pre-initialized capacity so sudden bursts are served with lower, more consistent latency.

The key distinction is that this targets cold-start behavior specifically; other approaches either change the inference pattern (queued/async) or move to always-on instances (different cost model).
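For illustration, a minimal boto3 sketch of a serverless variant with provisioned concurrency might look like this; the memory size, concurrency values, and names are illustrative assumptions.

    import boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        EndpointConfigName="nlp-serverless-warm",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": "nlp-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 20,
                "ProvisionedConcurrency": 5,   # pre-initialized capacity for bursts
            },
        }],
    )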


Question 21

Topic: Content Domain 1: Data Preparation for Machine Learning (ML)

A company is building a near-real-time fraud model on AWS. Transaction labels are stored in Amazon S3 (weekly retraining, 200 million rows). Customer and merchant features change multiple times per day and can arrive up to 24 hours late. The team must prevent training data leakage by using correct point-in-time feature values and must serve online predictions with <50 ms p99 latency. Data must be encrypted with AWS KMS and the solution should minimize custom feature-joining code.

Which approach is the BEST fit?

  • A. Use Feature Store with record ID/event time; build PIT dataset.
  • B. DynamoDB online features and manual timestamped S3 training sets.
  • C. Feature Store event time set to processing ingestion time.
  • D. Glue joins latest S3 snapshots by key for training.

Best answer: A

Explanation: Amazon SageMaker Feature Store is designed to manage feature groups with a record identifier and event time so offline training datasets can be built with point-in-time correctness (avoiding leakage), while the same features can be retrieved with low latency for online inference. Enabling offline and online stores also supports governance requirements such as KMS encryption without building custom joining systems.

To prevent leakage, the training set must join each transaction to the most recent feature values that were valid at the transaction time, even when feature updates arrive late. In SageMaker Feature Store, you model this by creating feature groups that include:

  • RecordIdentifierFeatureName (for example, customer_id or merchant_id)
  • EventTimeFeatureName (the source event/effective timestamp, not ingestion time)
  • Offline store in S3 (KMS-encrypted) and online store for <50 ms reads

Then you build the training dataset using Feature Store offline retrieval with point-in-time correctness (for example, the CreateDataset workflow) so each transaction timestamp is used to retrieve the correct historical features. This avoids maintaining custom temporal join logic while keeping training and inference features consistent.
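A minimal boto3 sketch of such a feature group is shown below; the group name, feature names, KMS key, bucket, and role ARN are illustrative assumptions.

    import boto3

    sm = boto3.client("sagemaker")
    sm.create_feature_group(
        FeatureGroupName="customer-features",
        RecordIdentifierFeatureName="customer_id",
        EventTimeFeatureName="event_time",        # source event timestamp, not ingestion time
        FeatureDefinitions=[
            {"FeatureName": "customer_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "String"},
            {"FeatureName": "txn_count_24h", "FeatureType": "Integral"},
        ],
        OnlineStoreConfig={"EnableOnlineStore": True},  # low-latency reads at inference
        OfflineStoreConfig={
            "S3StorageConfig": {
                "S3Uri": "s3://fraud-feature-store/offline/",
                "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/1111-2222",
            }
        },
        RoleArn="arn:aws:iam::123456789012:role/FeatureStoreRole",
    )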


Question 22

Topic: Content Domain 1: Data Preparation for Machine Learning (ML)

A healthcare company trains models in Amazon SageMaker using CSV feature extracts generated nightly in an on-premises data center. About 500 GB must be ingested each night into an Amazon S3 bucket in us-east-1 and be available within 2 hours.

Requirements:

  • Data contains PII and must use TLS in transit and encryption at rest with a customer managed AWS KMS key
  • The transfer must include an integrity check (verification that data arrives uncorrupted)
  • The solution must be fully managed and require minimal custom code and operations

Which approach BEST meets these requirements?

  • A. Run a nightly cron job using aws s3 cp over HTTP to upload to S3 with SSE-S3 enabled
  • B. Use AWS DataSync with an on-premises agent to copy data to an S3 bucket encrypted with SSE-KMS (CMK), and restrict access using IAM and S3 bucket policy
  • C. Export the data to an AWS Snowball Edge device each night and ship it to AWS for import into S3
  • D. Stream the nightly files through Amazon Kinesis Data Firehose into S3 with SSE-KMS enabled

Best answer: B

Explanation: AWS DataSync is designed for scheduled, high-volume transfers from on premises to AWS with TLS in transit and optional task verification to confirm data integrity. It can land data directly in an S3 bucket encrypted with SSE-KMS using a customer managed key, and it avoids building and operating a custom ingestion application.

For secure ingestion from on premises to S3, the key requirements are encryption in transit, encryption at rest with a customer managed KMS key, and an integrity/verification mechanism, while keeping operations low. AWS DataSync fits bulk, scheduled transfers and provides managed connectivity from an on-premises agent to AWS using TLS. It can copy into an S3 bucket configured for SSE-KMS (CMK) and can run task verification to validate transferred data.

A typical secure setup is:

  • Configure the S3 bucket with default SSE-KMS using the CMK
  • Create a DataSync task from the on-premises location to the S3 location and enable verification
  • Use IAM and S3 bucket policies (and optionally S3 access points/VPC endpoints) to tightly control access

This directly addresses confidentiality (TLS + SSE-KMS) and integrity (verification) without custom ingestion code.
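A minimal boto3 sketch of the DataSync task is shown below; the location ARNs and schedule are illustrative assumptions, and the source/destination locations are assumed to exist already.

    import boto3

    datasync = boto3.client("datasync")
    datasync.create_task(
        SourceLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-onprem",
        DestinationLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-s3",
        Name="nightly-feature-extracts",
        Schedule={"ScheduleExpression": "cron(0 2 * * ? *)"},  # nightly run
        Options={
            "VerifyMode": "POINT_IN_TIME_CONSISTENT",  # integrity check after transfer
            "TransferMode": "CHANGED",                 # only copy new or changed files
        },
    )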


Question 23

Topic: Content Domain 1: Data Preparation for Machine Learning (ML)

A team is designing an Amazon SageMaker Ground Truth workflow to label customer-support emails for intent classification. The team will use a private workforce and needs consistently high-quality labels with a clear process to catch systematic labeling mistakes.

Which workflow design choice best applies the core principle of separation of duties to improve label quality?

  • A. Grant labelers read-only S3 access to the input bucket
  • B. Enable default SSE-KMS encryption for the S3 buckets
  • C. Use one workforce for labeling and a different workforce for audits
  • D. Version the labeling job manifests and templates in source control

Best answer: C

Explanation: Separation of duties means different people (or teams) perform execution and independent verification. In Ground Truth, assigning labeling to one workforce and quality audits/verification to a separate workforce reduces correlated errors and makes systematic issues easier to detect and correct, producing higher-quality labels.

The key principle is separation of duties: the party producing an outcome should not be the same party validating it. For Ground Truth labeling, this translates into designing quality controls where an independent group reviews outputs, creates or maintains a gold set, and/or performs verification tasks.

A practical pattern is:

  • Use a private workforce to perform the primary labeling task.
  • Route samples (or low-confidence items) to a separate auditor workforce for review/verification.
  • Use the auditor feedback to correct labels and refine instructions.

This structure improves dataset quality by reducing shared bias and preventing the same annotators from “grading their own work.”


Question 24

Topic: Content Domain 2: ML Model Development

A company is building a model to classify customer support tickets into 12 categories. The team has 8,000 labeled tickets and must deliver an endpoint in 2 weeks.

They created a SageMaker Automatic Model Tuning job that trains a PyTorch Transformer initialized with random weights (no pretrained checkpoint). After 30 tuning jobs, the best validation F1-score is 0.54, and additional jobs do not improve it. Training F1 approaches 0.95, but validation F1 drops after the first epoch.

Which change will fix the root cause with the least effort while meeting the timeline?

  • A. Increase tuning jobs and widen the learning rate range
  • B. Train longer and add early stopping to the current script
  • C. Fine-tune a SageMaker JumpStart pretrained text model
  • D. Move training to larger GPU instances and increase batch size

Best answer: C

Explanation: The tuning results indicate severe overfitting caused by training a large NLP model from scratch on a small labeled dataset. A pretrained/foundation model approach (such as SageMaker JumpStart) is designed to be fine-tuned with limited labeled data and typically reaches strong validation performance quickly. This is the lowest-effort change that also fits the 2-week delivery constraint.

Symptom: training performance becomes very high while validation performance peaks early and then degrades, and hyperparameter tuning cannot improve the validation F1.

Root cause: the team is training a Transformer from random initialization with only 8,000 labeled examples, which is typically insufficient to learn strong language representations from scratch; tuning can’t compensate for the lack of pretraining signal.

Fix: start from a pretrained/foundation model (for example, a SageMaker JumpStart text classification model) and fine-tune it on the labeled tickets. This uses transfer learning so the model begins with general language features and needs far less labeled data and experimentation to generalize well.

The key takeaway is to prefer pretrained/foundation models over scratch training when labeled data and time are limited.
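A rough SageMaker Python SDK sketch of the fine-tuning approach is shown below; the model_id, hyperparameters, instance types, and S3 paths are illustrative assumptions, so check the JumpStart catalog for the exact model identifier and its supported hyperparameters.

    from sagemaker.jumpstart.estimator import JumpStartEstimator

    estimator = JumpStartEstimator(
        model_id="huggingface-text-classification-base",  # hypothetical catalog ID
        instance_type="ml.g5.xlarge",
        hyperparameters={"epochs": "3", "learning_rate": "2e-5"},  # illustrative values
    )
    estimator.fit({"training": "s3://support-tickets/labeled/train/"})

    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")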

MLA-C01 machine learning engineer map

Use this map after the sample questions to connect individual items to the AWS ML engineering lifecycle decisions these practice samples test.

    flowchart LR
      S1["ML problem definition"] --> S2
      S2["Prepare features and training data"] --> S3
      S3["Train tune and evaluate model"] --> S4
      S4["Deploy endpoint or batch job"] --> S5
      S5["Monitor drift quality and cost"] --> S6
      S6["Retrain or retire model"]

Quick Cheat Sheet

  • Problem framing: Choose supervised, unsupervised, forecasting, NLP, vision, or GenAI pattern based on outcome and data.
  • Data prep: Check leakage, imbalance, features, labels, quality, and train-validation-test split.
  • Training: Track experiments, metrics, hyperparameters, artifacts, and reproducibility.
  • Deployment: Choose real-time endpoint, batch transform, serverless, or pipeline deployment by latency and scale.
  • Monitoring: Watch drift, bias, accuracy, latency, failures, and cost after release.

Mini Glossary

  • Feature: Input variable used by a machine learning model.
  • Hyperparameter: Training configuration value set before model training.
  • Model drift: Model performance degradation caused by changing data or behavior.
  • SageMaker Pipelines: AWS workflow service for ML pipeline orchestration.
  • Training job: Process that fits a model to training data.

Revised on Friday, May 15, 2026