Prepare for AWS Certified Machine Learning Engineer Associate (MLA-C01) with free sample questions, a full-length diagnostic, topic drills, timed practice, data preparation, model development, deployment, monitoring, governance, and detailed explanations in IT Mastery.
MLA-C01 is AWS’s Machine Learning Engineer Associate certification for candidates who need applied MLOps, SageMaker, deployment, monitoring, and ML-platform judgment on AWS. If you are searching for MLA-C01 sample questions, a practice test, mock exam, or exam simulator, this is the main IT Mastery page to start on web and continue on iOS or Android with the same IT Mastery account.
Start a practice session for AWS Certified Machine Learning Engineer - Associate (MLA-C01) below. For the best experience, open the full app in a new tab and navigate with swipes/gestures or the mouse wheel, just like on your phone or tablet.
Open Full App in a New Tab
A small set of questions is available for free preview. Subscribers can unlock full access by signing in with the same app-family account they use on web and mobile.
Prefer to practice on your phone or tablet? Download the IT Mastery – AWS, Azure, GCP & CompTIA exam prep app for iOS, or the IT Mastery app on Google Play (Android), and use the same IT Mastery account across web and mobile.
Free diagnostic: Try the 65-question AWS MLA-C01 full-length practice exam before subscribing. Use it as one ML engineering baseline, then return to IT Mastery for timed mocks, domain drills, explanations, and the full Machine Learning Engineer Associate question bank.
MLA-C01 questions usually reward the option that delivers a reliable, monitorable, and secure ML workflow rather than a narrow modeling answer with weak production readiness.
| Domain | Weight |
|---|---|
| Data Preparation for Machine Learning | 28% |
| ML Model Development | 26% |
| Deployment and Orchestration of ML Workflows | 22% |
| ML Solution Monitoring, Maintenance, and Security | 24% |
Use these filters when a modeling answer ignores production constraints:
| Area | What strong readiness looks like |
|---|---|
| Data preparation | You can prevent leakage, handle imbalance, select features, manage labels, and keep training data secure and reproducible. |
| Model development | You can choose tuning, evaluation, model selection, metrics, experiment tracking, and model registry workflows deliberately. |
| Deployment and orchestration | You can match SageMaker deployment and pipeline patterns to latency, release, scale, and operational requirements. |
| Monitoring and security | You can monitor drift, quality, bias, endpoint behavior, cost, IAM, encryption, and auditability in production ML systems. |
Use this one-week plan to structure final review:
| Day | Practice focus |
|---|---|
| 7 | Take the free full-length diagnostic and tag misses by lifecycle stage. |
| 6 | Drill data preparation, leakage, feature engineering, imbalance, labeling, and data-quality scenarios. |
| 5 | Drill training, tuning, metrics, evaluation, experiment tracking, and model registry decisions. |
| 4 | Drill SageMaker deployment, endpoints, batch transform, pipelines, release patterns, and orchestration. |
| 3 | Drill monitoring, drift, explainability, security, IAM, encryption, and cost-control scenarios. |
| 2 | Complete a timed mixed set and explain whether each miss was a data, model, deployment, or monitoring issue. |
| 1 | Review only weak lifecycle transitions; avoid late memorization of low-value service trivia. |
If you can score above roughly 75% on several unseen mixed attempts and explain the ML lifecycle reason behind your answers, you are probably ready to take the exam. Continuing past that point should improve production judgment, not turn scenario stems into memorized patterns.
Use these child pages when you want focused IT Mastery practice before returning to mixed sets and timed mocks.
Need concept review first? Read the AWS MLA-C01 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks, topic drills, and full IT Mastery practice.
These are original IT Mastery practice questions aligned to the MLA-C01 machine-learning lifecycle, data preparation, model development, deployment, monitoring, security, and AWS service-selection decisions. They are not AWS exam questions and are not copied from any exam sponsor. Use them to check readiness here, then continue in IT Mastery with mixed sets, topic drills, and timed mocks.
Topic: Content Domain 2: ML Model Development
A company is building a SageMaker model to detect payment fraud. Only 0.5% of transactions are fraud.
Constraints:
Which TWO approaches should the team AVOID when addressing the class imbalance?
Best answers: A, F
Explanation: For imbalanced classification, acceptable approaches include cost-sensitive learning (class weights/focal loss), careful resampling applied only to the training split, and improving labels for the minority class. You must avoid approaches that contaminate validation data (data leakage) or move sensitive records outside the AWS account.
Imbalanced classification is commonly handled with cost-sensitive learning (penalize minority-class errors more) and/or resampling (over/under-sampling). In all cases, the validation/test split must represent the real distribution and remain independent of any transformation learned from training data.
Practical rules:
Also, constraints like “data must not leave the AWS account” rule out external data-synthesis services even if they might help class balance.
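A minimal sketch of the two acceptable techniques named above: inverse-frequency class weights (cost-sensitive learning) and oversampling applied only to the training split, leaving validation untouched. The helper names and the 0/1 label encoding are illustrative, not from any AWS API:

```python
import random

def class_weights(labels):
    """Inverse-frequency class weights for cost-sensitive learning:
    rarer classes get proportionally larger weights."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}

def oversample_train_only(train, valid, target_ratio=0.5, seed=0):
    """Oversample the minority (label 1) class in the TRAINING split only.
    The validation split is returned untouched so evaluation still
    reflects the real class distribution (no leakage)."""
    rng = random.Random(seed)
    pos = [r for r in train if r[1] == 1]
    neg = [r for r in train if r[1] == 0]
    # Replicate minority rows until they reach target_ratio of the majority count.
    while len(pos) < target_ratio * len(neg):
        pos.append(rng.choice(pos))
    resampled = pos + neg
    rng.shuffle(resampled)
    return resampled, valid
```

With 0.5% fraud, the weight for the positive class ends up roughly 100x the negative weight, which is exactly the "penalize minority-class errors more" behavior described above.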
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
Which statement is INCORRECT about selecting deployment infrastructure for autoscaling and high availability (HA) for ML inference on AWS?
Best answer: C
Explanation: High availability requires redundant capacity so that a single instance or AZ failure does not interrupt inference. A single-instance real-time endpoint can be replaced if it fails, but traffic will fail during the replacement. Multi-AZ design plus multiple serving instances and controlled rollout patterns are standard ways to reduce outages while supporting autoscaling.
The core idea for HA in ML inference is eliminating single points of failure in the serving stack. A real-time endpoint with only one instance cannot be considered highly available because an instance crash, maintenance event, or container start failure will interrupt requests until a new instance is healthy.
To build HA and autoscaling:
A common failure mode is assuming “automatic replacement” equals HA; replacement is recovery, not continuous availability.
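The multi-instance-plus-autoscaling pattern above can be sketched as Application Auto Scaling parameters for a SageMaker endpoint variant. The endpoint name, capacities, and invocation target are illustrative; the key point is MinCapacity of at least 2, since a minimum of 1 is the single point of failure the question warns about:

```python
def endpoint_autoscaling_config(endpoint_name, variant="AllTraffic",
                                min_capacity=2, max_capacity=6,
                                target_invocations=70):
    """Parameter dicts for application-autoscaling register_scalable_target
    and put_scaling_policy, targeting a SageMaker production variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"
    register = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,  # >= 2 so one instance failure does not drop traffic
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return register, policy
```

Passing these dicts to `boto3.client("application-autoscaling")` calls would register the variant and attach a target-tracking policy; SageMaker spreads the instances across AZs when the subnets allow it.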
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
A team deployed an image segmentation model to a SageMaker real-time endpoint (single ml.g4dn.xlarge) behind Amazon API Gateway (REST). During traffic bursts, CloudWatch shows the endpoint returns 5XX and API Gateway logs show 504 Integration timeout.
A trace of the model container shows some inferences take 70-120 seconds (large images), and these requests fail even though CPU and GPU utilization are not saturated. The application can accept an asynchronous pattern where callers receive a job ID and retrieve results within 5 minutes. The team wants the smallest change while staying on SageMaker.
Which deployment target should the team use to fix the root cause?
Best answer: A
Explanation: The symptom is request failures caused by end-to-end synchronous timeouts (API Gateway 504) when inference takes longer than the request/response window. The root cause is not capacity but long-running inference duration for some inputs. Using SageMaker asynchronous inference is the minimal SageMaker-native deployment change that supports long processing times by decoupling submission from result retrieval.
Synchronous inference paths (API Gateway to a SageMaker real-time endpoint) have strict request time limits, so any inference that runs longer can fail even if the endpoint has enough compute. Here, traces show 70-120 second processing for some images, which triggers API Gateway integration timeouts and surfaces as 5XX/504 errors.
SageMaker asynchronous inference is designed for this pattern:
This fixes the timeout-driven failure mode without changing the model code or leaving SageMaker; scaling or bigger instances might reduce runtime but does not remove the synchronous timeout constraint.
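The async switch described above is mostly an endpoint-config change. A hedged sketch of the `create_endpoint_config` parameters, with illustrative names (the model, bucket, and SNS topic are assumptions); the `AsyncInferenceConfig` block with an S3 output path is what turns the endpoint asynchronous:

```python
def async_endpoint_config(config_name, model_name, s3_output,
                          instance_type="ml.g4dn.xlarge",
                          success_topic_arn=None):
    """Parameter dict for sagemaker.create_endpoint_config with
    asynchronous inference enabled. Callers then submit jobs via
    invoke_endpoint_async and fetch results from s3_output."""
    cfg = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            # Results land here instead of being returned on the HTTP call,
            # so 70-120 s inferences no longer hit synchronous timeouts.
            "OutputConfig": {"S3OutputPath": s3_output},
        },
    }
    if success_topic_arn:  # optional completion notification
        cfg["AsyncInferenceConfig"]["OutputConfig"]["NotificationConfig"] = {
            "SuccessTopic": success_topic_arn,
        }
    return cfg
```

The client-facing change is equally small: callers receive an output location (effectively a job handle) from `invoke_endpoint_async` and poll or subscribe for the result within the 5-minute window the scenario allows.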
Topic: Content Domain 1: Data Preparation for Machine Learning (ML)
A retail company builds a churn model from ~180 GB of daily CSV files in Amazon S3. The team must (1) create and validate feature engineering steps with a report that can be reviewed before use, (2) export a repeatable workflow that runs daily without manual Studio steps, (3) keep processing inside a VPC with S3/KMS encryption, and (4) minimize custom infrastructure (no self-managed Spark/EMR).
Which solution best meets these requirements?
Best answer: A
Explanation: SageMaker Data Wrangler is designed to build interactive, repeatable feature engineering flows and to validate them with built-in analysis and data quality reports. Exporting the flow to SageMaker Pipelines as a Processing step operationalizes the exact same transformation logic on a schedule with governance-friendly execution history. Writing outputs to SageMaker Feature Store makes the engineered features reusable and consistent across training and inference workflows.
The core requirement is a repeatable, reviewable feature engineering workflow created in SageMaker Data Wrangler. In Studio, you can build a Data Wrangler .flow, run built-in analyses, and generate a Data Quality and Insights Report to validate schema, missing values, distributions, and other checks before promoting the transformation.
To make the workflow repeatable and auditable, export the Data Wrangler flow to SageMaker Pipelines (as a Processing step). This lets the same transformation logic run on demand or on a schedule (for example via EventBridge), with execution history in SageMaker. Running the processing job in a VPC and using S3 with KMS encryption aligns with the security constraints, and persisting engineered features to SageMaker Feature Store supports consistent reuse across model builds and inference consumers.
Approaches that rely on manual notebook runs or separate ETL implementations break repeatability and governance of a single transformation definition.
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
A team uses AWS CloudFormation to provision a SageMaker real-time endpoint that runs in private subnets (no internet gateway, no NAT gateway). The VPC already has an S3 gateway endpoint. The stack uses a CloudFormation service role named CFNDeployRole.
During stack creation, CloudFormation shows these failures:
CREATE_FAILED AWS::SageMaker::Endpoint
AccessDenied: User is not authorized to perform iam:PassRole on arn:aws:iam::123456789012:role/SageMakerExecutionRole
CREATE_FAILED AWS::SageMaker::Endpoint
ResourceInitializationError: failed to pull image... dial tcp...ecr...:443: i/o timeout
CREATE_FAILED AWS::SageMaker::Endpoint
Failed to create log stream... dial tcp...logs...:443: i/o timeout
Which actions will resolve these provisioning failures? (Select THREE.)
Best answers: A, C, F
Explanation: The failures map directly to missing IAM permission and missing private network paths. iam:PassRole is required so CloudFormation can attach the SageMaker execution role during endpoint creation. Because the endpoint runs in private subnets without NAT, you must add VPC interface endpoints (PrivateLink) for the services the container must reach: ECR to pull the image and CloudWatch Logs to write logs.
When provisioning SageMaker endpoints with infrastructure-as-code, creation can fail due to (1) IAM permissions used by the provisioning tool and (2) networking reachability from the endpoint ENIs.
Here, the explicit iam:PassRole error indicates the CloudFormation service role lacks permission to pass the execution role to SageMaker. The i/o timeout errors to ECR and CloudWatch Logs indicate the endpoint is attempting to reach public service endpoints but has no internet egress. In a private VPC design, the AWS-appropriate fix is to add the needed interface VPC endpoints (PrivateLink) so the endpoint can reach ECR to pull the container image and CloudWatch Logs to publish logs without NAT. The key takeaway is to align IAM (PassRole) and VPC endpoints with the endpoint’s required service dependencies.
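The interface-endpoint fix above maps to three `ec2.create_vpc_endpoint` calls (ECR needs both its API and Docker registry endpoints, plus the existing S3 gateway endpoint for image layers). A sketch that builds the request parameters; the VPC, subnet, and security group IDs are placeholders:

```python
def private_link_endpoints(region, vpc_id, subnet_ids, sg_id):
    """Request dicts for ec2.create_vpc_endpoint covering the
    dependencies a private SageMaker endpoint must reach:
    ECR (image pull) and CloudWatch Logs (log delivery)."""
    services = [
        f"com.amazonaws.{region}.ecr.api",  # ECR control plane (auth, manifests)
        f"com.amazonaws.{region}.ecr.dkr",  # Docker registry (image pulls)
        f"com.amazonaws.{region}.logs",     # CloudWatch Logs
    ]
    return [{
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": svc,
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": [sg_id],
        "PrivateDnsEnabled": True,  # so default service hostnames resolve privately
    } for svc in services]
```

ECR image layers are actually served from S3, which is why the scenario's existing S3 gateway endpoint matters too; without it, pulls would still time out even with the two ECR interface endpoints in place.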
Topic: Content Domain 1: Data Preparation for Machine Learning (ML)
A company is building an ML training dataset in Amazon SageMaker. Source data is stored in three places: clickstream logs in Amazon S3, customer transactions in an Amazon RDS for PostgreSQL database, and user profile attributes in an Amazon DynamoDB table. The team wants appropriate, AWS-native mechanisms to extract each dataset for preprocessing and model training.
Which THREE actions meet these requirements? (Select THREE.)
Best answers: C, D, F
Explanation: Use mechanisms that natively extract from each datastore with minimal operational overhead. SageMaker can consume objects directly from S3, AWS Glue can extract relational data from RDS using JDBC and write it to S3, and DynamoDB supports exporting table data to S3. These approaches are common building blocks for creating training datasets in AWS ML pipelines.
For ML data preparation, the most common pattern is to extract data from operational stores into an analytics-friendly landing zone (often S3), then preprocess it for training.
In this scenario:
Options that rely on unsupported direct attachments or misuse backup/replication features do not provide a correct or intended extraction path for building ML datasets.
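For the DynamoDB leg of the extraction pattern above, the native mechanism is the table export to S3. A hedged sketch of the `dynamodb.export_table_to_point_in_time` parameters (table ARN, bucket, and prefix are placeholders; the table must have point-in-time recovery enabled):

```python
def dynamodb_export_request(table_arn, bucket, prefix):
    """Parameter dict for dynamodb.export_table_to_point_in_time:
    a fully managed export of table data to S3, with no scan traffic
    against the live table."""
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "ExportFormat": "DYNAMODB_JSON",  # or "ION"
    }
```

The S3 clickstream data needs no extraction step at all (SageMaker reads it in place), and the RDS side is typically handled by an AWS Glue JDBC connection that lands the relational data in the same S3 staging area.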
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
A team runs nightly inference by using an Amazon SageMaker batch transform job. The job reads from s3://ml-prod/inference/input/customer_scores.csv (one 120-GB CSV file) and writes to s3://ml-prod/inference/output/. The job is configured with InstanceCount=4, but it consistently misses the 2-hour window and CloudWatch shows only one instance has sustained CPU usage while the other three are mostly idle.
Which change will fix the root cause with the LEAST operational change while keeping the batch workflow?
Best answer: D
Explanation: The symptom (three idle instances) indicates the job is not being parallelized across the cluster. In SageMaker batch transform, instance-level parallelism primarily comes from distributing separate S3 input objects to different instances. Writing the input as multiple files and pointing the transform job at the S3 prefix allows all instances to process data concurrently and still write results to the configured S3 output location.
Symptom: a 4-instance batch transform job runs long and only one instance is busy.
Root cause: SageMaker batch transform distributes work to instances based on the number of S3 input objects. When the input is a single large file, one instance receives that object and processes it, leaving other instances with little or nothing to do.
Fix: split the input into multiple smaller files (for example, by partitioning/sharding the CSV) in an S3 prefix such as s3://ml-prod/inference/input/sharded/, and configure the transform job’s input location to that prefix while keeping the output S3 prefix the same.
This increases parallelism without changing the model or moving to an online serving pattern.
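The sharding fix can be done with a small preprocessing script before upload. A runnable sketch that splits one large CSV into fixed-size parts, repeating the header in each shard so every file is independently processable (file names and shard size are illustrative):

```python
import csv
import os

def shard_csv(src_path, out_dir, rows_per_shard, has_header=True):
    """Split one large CSV into multiple smaller files so SageMaker
    batch transform can distribute separate S3 objects to different
    instances. Returns the list of shard paths."""
    os.makedirs(out_dir, exist_ok=True)
    shards = []
    with open(src_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader) if has_header else None
        buf, idx = [], 0
        for row in reader:
            buf.append(row)
            if len(buf) >= rows_per_shard:
                shards.append(_write_shard(out_dir, idx, header, buf))
                buf, idx = [], idx + 1
        if buf:  # final partial shard
            shards.append(_write_shard(out_dir, idx, header, buf))
    return shards

def _write_shard(out_dir, idx, header, rows):
    path = os.path.join(out_dir, f"part-{idx:05d}.csv")
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        if header:
            w.writerow(header)
        w.writerows(rows)
    return path
```

After uploading the parts under a prefix such as the sharded/ location mentioned above and pointing the transform job's S3 input at that prefix, all four instances receive objects to process concurrently.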
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
A team uses AWS CodePipeline to build and deploy a SageMaker real-time endpoint. CodeBuild pulls a trained model artifact from an S3 bucket that is encrypted with a customer managed KMS key (SSE-KMS), builds an inference image, and pushes it to Amazon ECR.
Exhibit: recent pipeline failures
[Container]... Running command aws s3 cp s3://ml-prod-models/run-173/model.tar.gz ./model.tar.gz
fatal error: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
[Container]... Phase complete: POST_BUILD State: FAILED
Message: no matching artifact paths found: model.tar.gz
Security and audit requirements:
Which TWO mitigation actions should the team AVOID?
Best answers: C, D
Explanation: Two actions violate explicit requirements even if they could “unblock” the pipeline. Broadening the build role to admin permissions breaks least privilege, and deploying a floating latest tag breaks auditability and reproducibility. The remaining actions address the actual failure modes shown: missing KMS/S3 access and a misconfigured build artifact path.
CI/CD failures like the ones shown usually come from (1) insufficient permissions to read encrypted artifacts and (2) incorrect artifact publication paths.
Here, the S3 copy fails because the build identity needs both S3 read access and the ability to use the KMS key that encrypts the object (for example, s3:GetObject plus kms:Decrypt/key policy permissions). The subsequent “no matching artifact paths” error indicates the build did not produce the file where CodeBuild/CodePipeline expects it, so the buildspec.yml artifacts paths (or the build output location) must be corrected and validated.
Mitigations should preserve least privilege and immutable, commit-traceable deployments; shortcuts like admin roles or floating tags undermine those requirements.
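The least-privilege fix described above can be expressed as a scoped IAM policy for the CodeBuild role. A sketch with placeholder bucket, prefix, and key ARN; note both statements are needed, because SSE-KMS reads fail with AccessDenied if either the S3 or the KMS permission is missing:

```python
def build_read_model_policy(bucket, key_prefix, kms_key_arn):
    """Least-privilege IAM policy for a CodeBuild role that must read
    an SSE-KMS encrypted model artifact: s3:GetObject on the specific
    prefix plus kms:Decrypt on the specific CMK -- no admin wildcard."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{key_prefix}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": kms_key_arn,  # the specific CMK, not kms:*
            },
        ],
    }
```

The CMK's key policy must also allow the build role as a key user; IAM-side permission alone is not sufficient for a customer managed key.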
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
Which Amazon SageMaker capability provides a central place to version trained model artifacts and metadata, apply an approval workflow (for example, staging to production), and reliably promote the same model package to deployment?
Best answer: D
Explanation: Amazon SageMaker Model Registry is the service feature used to register model versions and govern promotion using approval statuses. It enables consistent staging-to-production release of the same model package artifact with associated metadata and lineage, supporting reliable deployment workflows.
SageMaker Model Registry is a governance and release-management capability for ML models: it stores model package versions (artifact locations, inference container/image, metrics, and metadata) in a model package group and tracks an approval status (such as pending/approved). This lets teams promote a specific, immutable model package through environments (dev/staging/prod) and drive deployments from CI/CD using the same approved artifact, improving test-to-production parity and reducing “it worked in staging” drift caused by redeploying different builds.
The closest confusion is SageMaker Feature Store, which manages feature definitions and online/offline feature values for training/inference consistency, not model version promotion.
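The approval-driven promotion flow above is typically two SageMaker API calls. A hedged sketch of the parameter shapes; the group name, ARN, and image URI are placeholders:

```python
def register_model_package(group_name, image_uri, model_data_url):
    """Parameters for sagemaker.create_model_package: registers a new
    version in a model package group, starting in pending approval."""
    return {
        "ModelPackageGroupName": group_name,
        "ModelApprovalStatus": "PendingManualApproval",
        "InferenceSpecification": {
            "Containers": [{"Image": image_uri, "ModelDataUrl": model_data_url}],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
    }

def approve_model_package(model_package_arn):
    """Parameters for sagemaker.update_model_package: flips the
    approval status so CI/CD can deploy this exact version."""
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": "Approved",
    }
```

A deployment pipeline can then listen for the approval-status change (for example via EventBridge) and deploy the approved package ARN, guaranteeing staging and production run the same immutable artifact.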
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
A model is deployed to an Amazon SageMaker real-time endpoint, but requests often take 5-10 minutes and include large payloads. The endpoint frequently times out during traffic bursts even though low latency is not required. Which deployment option is specifically designed for long-running, bursty inference and is the most appropriate corrective action?
Best answer: A
Explanation: SageMaker asynchronous inference is intended for requests that can take a long time to complete and arrive in bursts. It uses a queue-based pattern so clients don’t hold open real-time HTTP connections that commonly time out. Results are delivered asynchronously (for example, to Amazon S3), matching the stated requirements.
The core issue is a wrong endpoint type: a real-time endpoint is optimized for low-latency, synchronous request/response patterns and can time out when inference takes minutes. SageMaker asynchronous inference is the appropriate deployment option when you have long-running inference, large payloads, or bursty traffic and you can accept delayed responses. It buffers requests using a managed queue and decouples request submission from model execution, which reduces client timeouts and helps absorb bursts without requiring you to overprovision steady capacity.
Batch transform is also asynchronous, but it is a job-based, offline pattern for processing datasets, not interactive per-request inference.
Key takeaway: choose asynchronous inference endpoints for long-running, spiky, request-driven workloads where immediate responses are not required.

Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
An ML team deploys models to Amazon SageMaker endpoints using a custom inference container in Amazon ECR and model artifacts in Amazon S3. The team wants safe rollbacks and repeatable deployments across environments.
Which action best aligns with the core principle of reproducibility?
Best answer: C
Explanation: Reproducibility means you can recreate the same deployed system from known, immutable inputs. Using explicit versioning for model artifacts and container images and referencing those immutable identifiers in deployment definitions makes rollbacks deterministic and deployments repeatable across environments.
For rollback safety and repeatability, the key principle is reproducibility: the deployed model must be reconstructable from fixed, uniquely identified artifacts. In practice on AWS, this means treating the model artifact and inference image as versioned, immutable inputs to deployment automation.
Common patterns include:
Pin immutable identifiers in deployment definitions, such as ECR image digests and versioned S3 object keys, rather than mutable tags (for example, latest).
Controls like IAM scoping, drift monitoring, and approval workflows are valuable, but they do not by themselves make deployments reproducible.
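Pinning can be sketched as the `create_model` input for a SageMaker deployment: the container is referenced by ECR digest and the artifact by a fixed S3 key, so redeploying the same definition always yields the same system. Names and ARNs here are illustrative:

```python
def pinned_model_definition(model_name, repo_uri, image_digest,
                            artifact_s3_uri, role_arn):
    """Parameter dict for sagemaker.create_model with immutable inputs:
    an ECR image digest (never a mutable tag like :latest) and a
    specific, versioned S3 artifact key. Rollback = redeploy the
    previous definition, byte-for-byte identical."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": f"{repo_uri}@{image_digest}",  # digest pin, not a tag
            "ModelDataUrl": artifact_s3_uri,
        },
        "ExecutionRoleArn": role_arn,
    }
```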
Topic: Content Domain 1: Data Preparation for Machine Learning (ML)
A company is building an Amazon SageMaker image model to detect whether workers are wearing safety helmets. Images are stored in Amazon S3 and were collected continuously, so capture order strongly correlates with lighting (day vs. night). Only 6% of the images are from night conditions, and the current model underperforms at night.
The team wants to reduce prediction bias across lighting conditions while keeping train/validation/test metrics statistically valid for expected production use. Which actions should the team take? Select THREE.
Best answers: B, D, F
Explanation: Use data-splitting and augmentation techniques that improve representation of the under-sampled condition without leaking synthetic or duplicated samples into evaluation. Stratification and shuffling help ensure each split reflects the underlying data generation process rather than collection order. Augment only the training data to improve robustness while keeping validation/test unbiased.
To reduce prediction bias across subgroups (here, lighting conditions) while preserving statistical validity, the evaluation data must remain an untouched sample of the real-world distribution you expect in production. Use splitting methods that prevent accidental skews caused by collection order and ensure minority conditions appear in every split.
Practical approach:
Key takeaway: improve representation and robustness in training while keeping validation/test as a clean, unbiased measurement set.
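The splitting advice above can be sketched as a small stratified splitter: shuffle within each group (here, lighting condition) so collection order cannot skew a split, and keep each group's proportion identical in train and test. Pure Python, with an illustrative key function:

```python
import random

def stratified_split(samples, key, test_frac=0.2, seed=0):
    """Stratified train/test split: every group returned by key()
    (e.g. "day" vs. "night") keeps the same proportion in both splits,
    and shuffling breaks any correlation with capture order."""
    rng = random.Random(seed)
    groups = {}
    for s in samples:
        groups.setdefault(key(s), []).append(s)
    train, test = [], []
    for items in groups.values():
        items = items[:]          # copy so the caller's list is untouched
        rng.shuffle(items)        # removes collection-order bias
        cut = int(round(len(items) * test_frac))
        test.extend(items[:cut])
        train.extend(items[cut:])
    return train, test
```

With 6% night images, a plain chronological split could leave a split with almost no night data; stratifying guarantees night images appear in every split at the true 6% rate, while augmentation is then applied to the training portion only.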
Topic: Content Domain 4: ML Solution Monitoring, Maintenance, and Security
A team runs a real-time Amazon SageMaker endpoint behind an API. They want to add alerting to quickly detect inference workflow anomalies.
Exhibit: CloudWatch metrics (last 5 minutes)
Invocations: 2,000
Invocation4XXErrors: 380
Invocation5XXErrors: 0
ModelLatency p95: 48 ms
CPUUtilization: 34%
Which alerting signal is the most appropriate to configure for this situation?
Candidate signals include Invocations dropping below a baseline, Invocation4XXErrors for the SageMaker endpoint, CPUUtilization exceeding a threshold, and ModelLatency p95 exceeding a threshold.
Best answer: B
Explanation: The best alerting signal is the SageMaker endpoint’s 4XX error metric because it directly indicates failed inference requests due to invalid inputs or request formatting. The exhibit shows a substantial count of Invocation4XXErrors: 380 while Invocation5XXErrors: 0, pointing to a request/payload issue rather than service failure or performance saturation.
For real-time SageMaker inference, CloudWatch endpoint metrics separate failures into 4XX (client-side/request issues) and 5XX (server-side/model container/platform issues). In the exhibit, Invocation4XXErrors: 380 is high while Invocation5XXErrors: 0, and latency/CPU look normal (ModelLatency p95: 48 ms, CPUUtilization: 34%). That pattern is most consistent with an upstream data/schema/payload problem (for example, missing required fields, wrong content-type, or schema drift) causing requests to be rejected.
The most AWS-appropriate alerting signal here is a CloudWatch alarm on the endpoint’s Invocation4XXErrors (typically using a rate or count over a short period) so operators are notified as soon as bad requests start spiking. A latency or CPU alarm would not reliably detect this failure mode.
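The alarm described above maps to a `cloudwatch.put_metric_alarm` call on the endpoint's 4XX metric. A hedged sketch of the parameters; the endpoint name, threshold, and period are illustrative choices, not prescribed values:

```python
def four_xx_alarm(endpoint_name, variant="AllTraffic", threshold=50):
    """Parameter dict for cloudwatch.put_metric_alarm: fire when the
    endpoint's client-error count spikes within a 5-minute window."""
    return {
        "AlarmName": f"{endpoint_name}-invocation-4xx-spike",
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation4XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant},
        ],
        "Statistic": "Sum",
        "Period": 300,              # 5-minute window, matching the exhibit
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # quiet periods should not alarm
    }
```

In the exhibit, this alarm would have fired (380 errors in 5 minutes against a threshold of 50), while latency and CPU alarms would have stayed silent.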
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
A company deploys an Amazon SageMaker real-time endpoint in private subnets (no route to an internet gateway and no NAT). The model image is in Amazon ECR and the model artifacts are in Amazon S3. Callers are in the same VPC and from on-premises over AWS Direct Connect (private VIF only).
After deployment, the endpoint shows errors pulling the container image, and VPC callers invoking the endpoint time out. The company requires that all traffic stays on private connectivity (no public internet).
Which TWO actions will meet these requirements? (Select TWO.)
Best answers: C, F
Explanation: SageMaker endpoint invocation uses the SageMaker Runtime service endpoint, so private callers need an interface VPC endpoint (AWS PrivateLink) to avoid public internet paths. Separately, when an endpoint is in private subnets without NAT, it needs private access to dependencies like S3 and ECR through VPC endpoints so it can download model artifacts and container images.
To keep both deployment-time and inference-time traffic private when using SageMaker endpoints in private subnets, you must provide private network paths to the AWS services involved.
For invocation, create an interface VPC endpoint for com.amazonaws.<region>.sagemaker.runtime and enable Private DNS so InvokeEndpoint resolves to private IPs.
Security groups attached to the endpoint control the endpoint’s ENI traffic to VPC resources, not the client’s InvokeEndpoint path to the SageMaker Runtime service.
Topic: Content Domain 4: ML Solution Monitoring, Maintenance, and Security
A team sees an unexpected daily cost spike in an AWS account that heavily uses Amazon SageMaker. They need the next investigation step to attribute the spike to specific ML activities (for example, individual training jobs, endpoint instance-hours, or batch transforms) at the most granular level.
Which AWS capability best fits this need?
Best answer: B
Explanation: The AWS Cost and Usage Report (CUR) is designed for detailed cost attribution and investigation. It provides the most granular, line-item billing and usage data, which you can query to correlate a spend spike with specific SageMaker activities such as training job usage and endpoint instance-hours.
The core concept is cost attribution: when costs spike, you first need granular billing/usage data that can be grouped by service, resource identifiers, and (ideally) cost allocation tags. AWS Cost and Usage Report (CUR) is the AWS billing dataset that contains line-item cost and usage records, making it the best starting point to correlate a cost increase to concrete SageMaker activities (training, hosting/endpoints, batch transform, processing).
A common next step is to enable CUR delivery to Amazon S3 and query it (for example, with Amazon Athena) while filtering/grouping on SageMaker-related line items and tags to identify the specific job, endpoint, or instance-hours driving the change.
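The CUR-in-Athena step can be sketched as a query grouping SageMaker line items by resource. This is a hedged sketch: the table name and date range are placeholders, and the exact column names depend on your CUR version and Athena table definition:

```python
def cur_sagemaker_query(cur_table, start_date, end_date):
    """Athena SQL over a CUR table: attribute SageMaker spend to
    specific resources (training jobs, endpoints, transforms) and
    usage types. Column names follow the common Athena CUR layout."""
    return f"""
        SELECT line_item_resource_id,
               line_item_usage_type,
               SUM(line_item_unblended_cost) AS cost
        FROM {cur_table}
        WHERE line_item_product_code = 'AmazonSageMaker'
          AND DATE(line_item_usage_start_date)
              BETWEEN DATE '{start_date}' AND DATE '{end_date}'
        GROUP BY line_item_resource_id, line_item_usage_type
        ORDER BY cost DESC
    """
```

Sorting by cost descending surfaces the specific endpoint instance-hours or training jobs driving the spike; adding cost allocation tag columns to the SELECT narrows it further to a team or project.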
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
A company is deploying a SageMaker inference workload for a payment fraud model. The endpoint must meet a strict, consistent p99 latency SLO during business hours, and the team wants repeatable performance across deployments (no cold-start variability). Which action best reflects the core principle of reproducibility when choosing between on-demand and provisioned resources?
Best answer: D
Explanation: Reproducibility emphasizes consistent, repeatable behavior of the system under the same conditions. For low-latency real-time inference, provisioned (always-on) instances with a nonzero minimum capacity reduce variability from cold starts and capacity spin-up. This makes endpoint performance more predictable across deployments than on-demand options.
The principle being applied is reproducibility: the system should behave consistently and predictably when deployed repeatedly. For inference workloads with strict latency SLOs, provisioned resources (an always-on real-time endpoint with a nonzero minimum capacity) provide stable compute availability and avoid cold-start effects that can introduce performance variance.
On-demand inference options (serverless, async scaling-to-zero, or batch) are optimized for intermittent traffic and cost efficiency, but they can add warm-up time or queueing that makes latency less consistent. The key takeaway is to choose provisioned capacity when you need predictable real-time performance, and on-demand capacity when traffic is spiky and latency tolerance is higher.
Topic: Content Domain 4: ML Solution Monitoring, Maintenance, and Security
A company in a regulated industry must run Amazon SageMaker training jobs and a real-time endpoint in a VPC with no internet access (no direct or indirect egress). Training data is in Amazon S3, and custom containers are stored in Amazon ECR. Logs must go to Amazon CloudWatch.
Which approach is NOT appropriate to meet the network isolation requirement?
Best answer: A
Explanation: When policies require no internet access, SageMaker training and inference must run in private subnets without internet routes and rely on private connectivity (VPC endpoints) for AWS service access. Allowing internet egress (even via NAT) breaks network isolation because the container can reach external networks.
The core principle is enforcing network isolation by preventing any internet egress path for training and inference while still allowing access to required AWS services over private networking. In SageMaker, this typically means placing jobs/endpoints in private subnets, removing routes to an internet gateway/NAT gateway, and using VPC endpoints to reach services such as S3 (Gateway endpoint), ECR (Interface endpoints), and CloudWatch Logs (Interface endpoint). If additional packages or artifacts are needed, they must be pre-packaged into the container image or stored in S3 and accessed through those endpoints.
Allowing a NAT gateway so containers can fetch dependencies at runtime is an anti-pattern because it reintroduces outbound internet connectivity.
Topic: Content Domain 2: ML Model Development
A team brings a custom Docker image (built externally) into Amazon SageMaker to run a training job. The Estimator points to an S3 URI using the training input channel. The training job fails almost immediately.
Exhibit: CloudWatch log excerpt
FileNotFoundError: [Errno 2] No such file or directory: '/data/train.csv'
The container’s training code was written to read local files from /data and to write outputs to /output. The team wants the smallest change that makes the container work with SageMaker-managed training without changing the dataset in S3.
Which action will fix the root cause?
Best answer: D
Explanation: The error shows the container is looking for training data in /data, but SageMaker-managed training stages input data under /opt/ml/input/data/<channel>. For the training channel, the portable way to locate the data is to read the SM_CHANNEL_TRAINING environment variable. The container should also write model artifacts to /opt/ml/model so SageMaker can upload them to S3 at job completion.
When you bring your own training container, SageMaker injects a standard filesystem layout and environment variables that your code must follow. Input data for each channel is downloaded and mounted under /opt/ml/input/data/<channel> (for example, the training channel path is available via SM_CHANNEL_TRAINING). If the container instead hardcodes paths like /data/train.csv, it will fail even though the S3 input configuration is correct.
To integrate cleanly with SageMaker training:
- Read input data from /opt/ml/input/data/&lt;channel&gt; (or the SM_CHANNEL_* environment variables).
- Write model artifacts to /opt/ml/model so SageMaker can package and upload them.

Changing instance size, IAM, or job type does not address a path/layout mismatch inside the container.
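A minimal sketch of the fix inside the training script (the CSV filename comes from the scenario; SM_CHANNEL_TRAINING is set by SageMaker for the training channel, and /opt/ml/model is the documented default output location when SM_MODEL_DIR is not present):

```python
import os

def resolve_training_paths():
    """Locate SageMaker-staged input data and the model output directory.

    SageMaker mounts each input channel under /opt/ml/input/data/<channel>
    and exposes the path through SM_CHANNEL_<CHANNEL> environment variables.
    """
    # Fall back to the documented default layout if the variable is absent.
    train_dir = os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    return os.path.join(train_dir, "train.csv"), model_dir
```

Replacing the hardcoded `/data/train.csv` and `/output` paths with these resolved paths is the entire change; the S3 input configuration stays as it is.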
Topic: Content Domain 3: Deployment and Orchestration of ML Workflows
A company is deploying a real-time Amazon SageMaker endpoint for an NLP model. The FP16 model artifact in Amazon S3 is 3.2 GB, and when loaded it uses ~5.5 GB of host RAM (or equivalent GPU memory) due to runtime overhead. The endpoint must meet p95 latency of 150 ms at 100 requests/second steady state and up to 300 requests/second during bursts.
Which inference infrastructure choice should the ML engineer AVOID?
Best answer: B
Explanation: For real-time inference, the model must fit comfortably in memory (RAM or GPU memory) without swapping, and the compute must sustain the required throughput while meeting the latency SLO. Choosing an instance that cannot hold the loaded model forces paging or crashes, which predictably increases tail latency and reduces availability.
The core principle is to size inference compute so the model and runtime stay memory-resident and can execute fast enough to meet p95 latency at peak concurrency. If the loaded model needs ~5.5 GB, selecting a 4 GB RAM instance guarantees memory pressure; using swap turns memory misses into disk I/O, which dramatically increases tail latency and can cause OOM kills during bursts.
A sound approach is to choose an instance type whose memory comfortably exceeds the ~5.5 GB loaded footprint (with headroom for the runtime and concurrent requests), then load-test at the 300 requests/second burst rate to confirm the 150 ms p95 target. The key takeaway is that swapping is not a capacity strategy for latency-sensitive inference.
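The sizing rule reduces to simple arithmetic; a quick sketch follows (the 1.5 GB headroom figure is an illustrative assumption, not an AWS guideline):

```python
def fits_in_memory(loaded_model_gb: float, instance_ram_gb: float,
                   headroom_gb: float = 1.5) -> bool:
    """Check whether a loaded model stays memory-resident with headroom
    for the runtime, request buffers, and OS overhead."""
    return loaded_model_gb + headroom_gb <= instance_ram_gb

# The scenario's ~5.5 GB loaded model on a 4 GB instance:
print(fits_in_memory(5.5, 4.0))   # False: guaranteed memory pressure/swapping
print(fits_in_memory(5.5, 16.0))  # True: model stays resident with headroom
```

Any instance that fails this check is disqualified before latency or throughput are even considered.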
Topic: Content Domain 4: ML Solution Monitoring, Maintenance, and Security
A team is hosting a model on an Amazon SageMaker Serverless Inference endpoint to minimize cost. CloudWatch shows latency spikes only at the start of sudden traffic bursts after long idle periods, then returns to normal once the endpoint is “warmed up.”
Which SageMaker capability should the team use to reduce these burst-time latency spikes while keeping the serverless deployment model?
Best answer: B
Explanation: The symptom is cold-start latency on a SageMaker Serverless Inference endpoint after idle periods. Provisioned concurrency mitigates cold starts by keeping a set amount of serverless capacity pre-warmed and ready to serve requests. This preserves the serverless operational model while improving burst performance.
Serverless Inference can show higher latency for the first requests after a period of inactivity because capacity must be initialized (a cold start). When the goal is to keep a serverless endpoint but reduce those cold-start spikes, the high-level mitigation is to configure provisioned concurrency for the serverless endpoint. This maintains pre-initialized capacity so sudden bursts are served with lower, more consistent latency.
The key distinction is that this targets cold-start behavior specifically; other approaches either change the inference pattern (queued/async) or move to always-on instances (different cost model).
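A hedged sketch of the endpoint-config change (memory size and concurrency numbers are illustrative, not tuned for the scenario):

```python
def build_serverless_variant(model_name: str):
    """Build a ProductionVariant for a serverless endpoint with
    provisioned concurrency to absorb cold starts after idle periods."""
    return {
        "ModelName": model_name,
        "VariantName": "AllTraffic",
        "ServerlessConfig": {
            "MemorySizeInMB": 4096,
            "MaxConcurrency": 20,
            # Keeps this many execution environments pre-initialized,
            # so the first burst requests skip cold-start initialization.
            "ProvisionedConcurrency": 5,
        },
    }

# Applied via:
# boto3.client("sagemaker").create_endpoint_config(
#     EndpointConfigName="serverless-warm",
#     ProductionVariants=[build_serverless_variant("nlp-model")])
```

Requests beyond the provisioned amount still scale on demand up to `MaxConcurrency`, so the cost model stays pay-per-use above the warm baseline.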
Topic: Content Domain 1: Data Preparation for Machine Learning (ML)
A company is building a near-real-time fraud model on AWS. Transaction labels are stored in Amazon S3 (weekly retraining, 200 million rows). Customer and merchant features change multiple times per day and can arrive up to 24 hours late. The team must prevent training data leakage by using correct point-in-time feature values and must serve online predictions with <50 ms p99 latency. Data must be encrypted with AWS KMS and the solution should minimize custom feature-joining code.
Which approach is the BEST fit?
Best answer: A
Explanation: Amazon SageMaker Feature Store is designed to manage feature groups with a record identifier and event time so offline training datasets can be built with point-in-time correctness (avoiding leakage), while the same features can be retrieved with low latency for online inference. Enabling offline and online stores also supports governance requirements such as KMS encryption without building custom joining systems.
To prevent leakage, the training set must join each transaction to the most recent feature values that were valid at the transaction time, even when feature updates arrive late. In SageMaker Feature Store, you model this by creating feature groups that include:
- RecordIdentifierFeatureName (for example, customer_id or merchant_id)
- EventTimeFeatureName (the source event/effective timestamp, not ingestion time)

Then you build the training dataset using Feature Store offline retrieval with point-in-time correctness (for example, the CreateDataset workflow) so each transaction timestamp is used to retrieve the correct historical features. This avoids maintaining custom temporal join logic while keeping training and inference features consistent.
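A hedged sketch of the feature-group definition (the group name, feature names, role ARN, KMS key, and S3 URI are placeholders):

```python
def build_feature_group_request(kms_key_arn: str):
    """Build a create_feature_group request with online + offline stores,
    point-in-time metadata, and KMS encryption."""
    return {
        "FeatureGroupName": "merchant-features",
        "RecordIdentifierFeatureName": "merchant_id",
        # Source event time, not ingestion time, so late-arriving updates
        # are still joined at the correct historical point.
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "merchant_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "String"},
            {"FeatureName": "txn_count_24h", "FeatureType": "Integral"},
        ],
        # Online store for the <50 ms p99 lookups at inference time.
        "OnlineStoreConfig": {
            "EnableOnlineStore": True,
            "SecurityConfig": {"KmsKeyId": kms_key_arn},
        },
        # Offline store for leakage-free, point-in-time training datasets.
        "OfflineStoreConfig": {
            "S3StorageConfig": {
                "S3Uri": "s3://example-bucket/feature-store/",
                "KmsKeyId": kms_key_arn,
            }
        },
        "RoleArn": "arn:aws:iam::111122223333:role/example-feature-store-role",
    }
```

With both stores enabled on the same feature group, training-time retrieval and online serving read the same feature definitions, which is what keeps them consistent.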
Topic: Content Domain 1: Data Preparation for Machine Learning (ML)
A healthcare company trains models in Amazon SageMaker using CSV feature extracts generated nightly in an on-premises data center. About 500 GB must be ingested each night into an Amazon S3 bucket in us-east-1 and be available within 2 hours.
Requirements:
- Data must be encrypted in transit.
- Data must be encrypted at rest with a customer managed AWS KMS key (SSE-KMS).
- Transferred data must be verified for integrity.
- Operational overhead must be minimized (no custom ingestion application).
Which approach BEST meets these requirements?
Best answer: B
Explanation: AWS DataSync is designed for scheduled, high-volume transfers from on premises to AWS with TLS in transit and optional task verification to confirm data integrity. It can land data directly in an S3 bucket encrypted with SSE-KMS using a customer managed key, and it avoids building and operating a custom ingestion application.
For secure ingestion from on premises to S3, the key requirements are encryption in transit, encryption at rest with a customer managed KMS key, and an integrity/verification mechanism, while keeping operations low. AWS DataSync fits bulk, scheduled transfers and provides managed connectivity from an on-premises agent to AWS using TLS. It can copy into an S3 bucket configured for SSE-KMS (CMK) and can run task verification to validate transferred data.
A typical secure setup deploys a DataSync agent in the on-premises data center, creates an S3 destination location on the bucket configured with SSE-KMS using the customer managed key, schedules a nightly task, and enables task verification. This directly addresses confidentiality (TLS + SSE-KMS) and integrity (verification) without custom ingestion code.
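A hedged sketch of the task definition (location ARNs and the cron expression are placeholders; `VerifyMode` enables DataSync's integrity check after the transfer):

```python
def build_datasync_task(source_location_arn: str, dest_location_arn: str):
    """Build a create_task request for scheduled, verified on-prem -> S3 copies.

    The destination S3 location is configured separately to write with
    SSE-KMS using the customer managed key; agent transfers use TLS.
    """
    return {
        "Name": "nightly-feature-extracts",
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": dest_location_arn,
        "Options": {
            # Verify all transferred data against the source after the copy.
            "VerifyMode": "POINT_IN_TIME_CONSISTENT",
            "OverwriteMode": "ALWAYS",
        },
        # Nightly run, leaving margin inside the 2-hour availability window.
        "Schedule": {"ScheduleExpression": "cron(0 1 * * ? *)"},
    }
```

Verification failures surface in the task execution status, giving the team an auditable integrity signal for each nightly load.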
Topic: Content Domain 1: Data Preparation for Machine Learning (ML)
A team is designing an Amazon SageMaker Ground Truth workflow to label customer-support emails for intent classification. The team will use a private workforce and needs consistently high-quality labels with a clear process to catch systematic labeling mistakes.
Which workflow design choice best applies the core principle of separation of duties to improve label quality?
Best answer: C
Explanation: Separation of duties means different people (or teams) perform execution and independent verification. In Ground Truth, assigning labeling to one workforce and quality audits/verification to a separate workforce reduces correlated errors and makes systematic issues easier to detect and correct, producing higher-quality labels.
The key principle is separation of duties: the party producing an outcome should not be the same party validating it. For Ground Truth labeling, this translates into designing quality controls where an independent group reviews outputs, creates or maintains a gold set, and/or performs verification tasks.
A practical pattern is to assign initial labeling to one private work team while a separate team maintains a gold-standard reference set and audits or verifies a sample of the labeled output. This structure improves dataset quality by reducing shared bias and preventing the same annotators from “grading their own work.”
Topic: Content Domain 2: ML Model Development
A company is building a model to classify customer support tickets into 12 categories. The team has 8,000 labeled tickets and must deliver an endpoint in 2 weeks.
They created a SageMaker Automatic Model Tuning job that trains a PyTorch Transformer initialized with random weights (no pretrained checkpoint). After 30 tuning jobs, the best validation F1-score is 0.54, and additional jobs do not improve it. Training F1 approaches 0.95, but validation F1 drops after the first epoch.
Which change will fix the root cause with the least effort while meeting the timeline?
Best answer: C
Explanation: The tuning results indicate severe overfitting caused by training a large NLP model from scratch on a small labeled dataset. A pretrained/foundation model approach (such as SageMaker JumpStart) is designed to be fine-tuned with limited labeled data and typically reaches strong validation performance quickly. This is the lowest-effort change that also fits the 2-week delivery constraint.
Symptom: training performance becomes very high while validation performance peaks early and then degrades, and hyperparameter tuning cannot improve the validation F1.
Root cause: the team is training a Transformer from random initialization with only 8,000 labeled examples, which is typically insufficient to learn strong language representations from scratch; tuning can’t compensate for the lack of pretraining signal.
Fix: start from a pretrained/foundation model (for example, a SageMaker JumpStart text classification model) and fine-tune it on the labeled tickets. This uses transfer learning so the model begins with general language features and needs far less labeled data and experimentation to generalize well.
The key takeaway is to prefer pretrained/foundation models over scratch training when labeled data and time are limited.
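The symptom pattern above (training score climbing while validation peaks early and then degrades) can be checked programmatically before burning more tuning budget; the gap threshold here is an illustrative assumption:

```python
def shows_overfitting(train_scores, val_scores, gap_threshold=0.2):
    """Flag the classic overfitting signature: a large final train/validation
    gap combined with validation performance that peaked before the end."""
    gap = train_scores[-1] - val_scores[-1]
    val_peaked_early = max(val_scores) > val_scores[-1]
    return gap > gap_threshold and val_peaked_early

# The scenario's trajectory: train F1 -> 0.95, validation F1 peaks at 0.54 then drops.
train_f1 = [0.60, 0.80, 0.90, 0.95]
val_f1 = [0.54, 0.52, 0.50, 0.48]
print(shows_overfitting(train_f1, val_f1))  # True
```

When this check fires and tuning has plateaued, the remedy is more signal (transfer learning from a pretrained checkpoint), not more hyperparameter search.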
Use this map after the sample questions to connect individual items to the AWS ML engineering lifecycle decisions these practice samples test.
flowchart LR
S1["ML problem definition"] --> S2
S2["Prepare features and training data"] --> S3
S3["Train tune and evaluate model"] --> S4
S4["Deploy endpoint or batch job"] --> S5
S5["Monitor drift quality and cost"] --> S6
S6["Retrain or retire model"]
| Cue | What to remember |
|---|---|
| Problem framing | Choose supervised, unsupervised, forecasting, NLP, vision, or GenAI pattern based on outcome and data. |
| Data prep | Check leakage, imbalance, features, labels, quality, and train-validation-test split. |
| Training | Track experiments, metrics, hyperparameters, artifacts, and reproducibility. |
| Deployment | Choose real-time endpoint, batch transform, serverless, or pipeline deployment by latency and scale. |
| Monitoring | Watch drift, bias, accuracy, latency, failures, and cost after release. |