Try 10 focused AWS MLA-C01 questions on ML Deployment, with explanations, then continue with IT Mastery.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
| Field | Detail |
|---|---|
| Exam route | AWS MLA-C01 |
| Topic area | Deployment and Orchestration of ML Workflows |
| Blueprint weight | 22% |
| Page purpose | Focused sample questions before returning to mixed practice |
Use this page to isolate Deployment and Orchestration of ML Workflows for AWS MLA-C01. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 22% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.
Topic: Deployment and Orchestration of ML Workflows
A team has trained a PyTorch image classification model in Amazon SageMaker and needs to run inference on ARM-based CPU edge devices with tight latency constraints and intermittent connectivity. The team wants to keep the same trained model artifacts but reduce inference latency on the devices.
Which approach is most appropriate?
Options:
A. Use SageMaker Automatic Model Tuning to retrain the model for lower edge latency
B. Use SageMaker Neo to compile the trained model for the target ARM edge runtime
C. Deploy the model to a SageMaker serverless endpoint to minimize device-side latency
D. Run a SageMaker batch transform job on a schedule and ship predictions to devices
Best answer: B
Explanation: SageMaker Neo is designed to optimize inference by compiling a trained model for a specific target hardware and runtime, such as ARM-based CPUs on edge devices. This improves on-device performance without requiring changes to the device connectivity model or switching to cloud-hosted inference.
Edge optimization is appropriate when inference must run locally (for latency, bandwidth, privacy, or offline operation) and the same trained model needs better performance on constrained hardware. SageMaker Neo addresses this by compiling a framework model into an optimized executable for a specified target (for example, an ARM CPU runtime), which typically reduces inference latency and can improve throughput on the device.
In contrast, cloud deployment options (real-time/serverless endpoints) primarily optimize server-side hosting and scaling, not on-device execution. Batch transform is for offline, large-scale inference and does not meet interactive, on-device latency needs.
Key takeaway: use Neo when the requirement is faster inference on specific edge hardware.
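As a concrete sketch of what compiling for an edge target looks like, the request below shows the shape of a Neo compilation job submitted through boto3's `create_compilation_job`. The job name, role ARN, S3 paths, and input shape are hypothetical placeholders; the target device must match the hardware you actually ship.

```python
# Hedged sketch of a SageMaker Neo compilation job request. All names,
# ARNs, and paths below are hypothetical placeholders.
compilation_job = {
    "CompilationJobName": "image-classifier-arm-v1",       # hypothetical name
    "RoleArn": "arn:aws:iam::123456789012:role/NeoRole",   # hypothetical role
    "InputConfig": {
        "S3Uri": "s3://ml-artifacts/model.tar.gz",         # the existing trained artifact
        "DataInputConfig": '{"input0": [1, 3, 224, 224]}', # model input tensor shape
        "Framework": "PYTORCH",
    },
    "OutputConfig": {
        "S3OutputLocation": "s3://ml-artifacts/compiled/",
        # Example ARM CPU target (Raspberry Pi 3); choose the device you ship to.
        "TargetDevice": "rasp3b",
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 900},
}
# sagemaker_client.create_compilation_job(**compilation_job) would submit it;
# the compiled artifact written to S3OutputLocation is then packaged for devices.
```

Note that the trained artifact is reused as-is: only the output format changes, which is exactly why Neo fits the scenario better than retraining or re-hosting.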
Topic: Deployment and Orchestration of ML Workflows
A company has 40 ML repositories that build and deploy Amazon SageMaker inference containers. During peak hours, many commits happen at once and the CI/CD system occasionally fails with throttling and long queues.
Which THREE design actions will best improve scalability by accounting for AWS CodePipeline/CodeBuild/CodeDeploy capabilities and quotas? (Select THREE.)
Options:
A. Use only CodeDeploy in-place deployments to eliminate deployment concurrency concerns
B. Split a monolithic pipeline into multiple pipelines to distribute executions
C. Use a single unencrypted S3 artifact bucket to prevent CI/CD throttling
D. Configure CodeBuild concurrency and plan quota increases via Service Quotas
E. Enable CodeBuild build caching to shorten build time and reduce queueing
F. Run builds directly in CodePipeline actions to avoid CodeBuild quotas
Correct answers: B, D, and E
Explanation: Scaling CI/CD on AWS requires designing for concurrency and execution quotas rather than assuming unlimited parallelism. CodeBuild has explicit concurrency controls and account quotas that you monitor and raise when needed. You can also reduce demand by shortening builds (caching) and distribute execution load by decomposing large workflows into multiple pipelines.
The core idea is to design the CI/CD system so it can absorb bursts while staying within service quotas and using each service as intended. CodeBuild is the primary scalable compute layer for builds/tests and has both configurable per-project concurrency and account-level concurrent build quotas (managed through Service Quotas). CodePipeline orchestrates workflows; decomposing a single pipeline into multiple pipelines (for teams/services/environments) helps distribute concurrent executions and reduces bottlenecks from a single serialized pipeline. Independently, reducing build duration with CodeBuild caching (and reusing layers/artifacts) decreases queue depth and the likelihood of hitting concurrency limits.
The key takeaway is to scale by increasing/controlling concurrency where appropriate, distributing orchestration load, and reducing per-run resource time—not by trying to “bypass” the services that enforce quotas.
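The concurrency and caching levers above can be sketched as the arguments to CodeBuild's `update_project` call. The project name is hypothetical, and the specific limit of 5 is only an example; account-level concurrent-build quotas are raised separately through Service Quotas.

```python
# Hedged sketch: per-project concurrency cap plus local caching for a
# CodeBuild project. Project name and limit are hypothetical examples.
update_args = {
    "name": "sagemaker-container-build",   # hypothetical project name
    "concurrentBuildLimit": 5,             # cap this project's parallel builds
    "cache": {
        "type": "LOCAL",                   # reuse layers/source on the build host
        "modes": ["LOCAL_DOCKER_LAYER_CACHE", "LOCAL_SOURCE_CACHE"],
    },
}
# codebuild_client.update_project(**update_args) would apply this. Caching
# shortens each build, which directly reduces queue depth at peak.
```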
Topic: Deployment and Orchestration of ML Workflows
A team uses AWS CodePipeline to deploy an Amazon SageMaker inference container. They add an AWS CodeBuild project that runs unit and integration tests and then builds and pushes a versioned Docker image to Amazon ECR from a buildspec.yml stored with the source code.
Which core principle is this action primarily applying?
Options:
A. Monitoring for drift
B. Least privilege
C. Separation of duties
D. Reproducibility
Best answer: D
Explanation: This setup emphasizes reproducibility by defining the build and test steps as code and executing them in a consistent build environment. Using a versioned buildspec.yml and producing versioned container images makes it easier to recreate the same ML artifact later and reduces “works on my machine” differences.
The core principle is reproducibility: the ability to rebuild the same artifact and rerun the same tests with the same inputs and process. Using CodeBuild with a buildspec.yml checked into source control turns the build and test procedure into an auditable, repeatable definition. Running unit and integration tests in CodeBuild and producing a versioned container image in ECR standardizes the build environment and outputs, which supports consistent deployments across dev, test, and prod and enables reliable rollback or rebuild when needed. This is distinct from access-control or monitoring concerns, which would require additional actions beyond defining and executing the build steps.
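To make the reproducibility point concrete, here is the shape of such a buildspec expressed as a Python dict. The test command, repository variable `$ECR_REPO`, and runtime version are assumptions; `CODEBUILD_RESOLVED_SOURCE_VERSION` is the commit SHA CodeBuild checked out, which gives each image an immutable, traceable tag.

```python
# Hedged sketch of a reproducible buildspec: pinned runtime, tests before
# build, image tagged with the commit SHA rather than "latest".
# $ECR_REPO and the test command are hypothetical.
buildspec = {
    "version": 0.2,
    "phases": {
        "install": {"runtime-versions": {"python": "3.11"}},   # pinned toolchain
        "pre_build": {"commands": ["pytest tests/"]},          # unit/integration tests
        "build": {
            "commands": [
                # CODEBUILD_RESOLVED_SOURCE_VERSION = the exact commit built
                "docker build -t $ECR_REPO:$CODEBUILD_RESOLVED_SOURCE_VERSION .",
                "docker push $ECR_REPO:$CODEBUILD_RESOLVED_SOURCE_VERSION",
            ]
        },
    },
}
```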
Topic: Deployment and Orchestration of ML Workflows
A team runs an Amazon SageMaker real-time inference endpoint 24/7 on the same instance type, and usage is highly predictable. The team wants to reduce cost without changing the deployment architecture.
Which purchasing option is most appropriate for this workload?
Options:
A. Buy EC2 Reserved Instances for the instance type used by the endpoint
B. Run the real-time endpoint on Spot Instances to minimize cost
C. Purchase a SageMaker Savings Plan for the committed usage
D. Keep the endpoint on On-Demand pricing for maximum flexibility
Best answer: C
Explanation: For steady, predictable 24/7 inference, the best fit is a commitment-based discount that applies to SageMaker usage. SageMaker Savings Plans reduce cost while allowing the endpoint to keep running normally, trading flexibility for a time-bound spend commitment.
The core tradeoff for predictable inference is flexibility versus commitment. On-Demand pricing has no long-term commitment and is best for uncertain or spiky usage, but it is typically the most expensive option for always-on endpoints.
SageMaker Savings Plans are designed for predictable SageMaker usage (including inference) and provide discounted rates when you commit to a consistent amount of usage over a 1- or 3-year term. This reduces cost without requiring a change to how the endpoint is deployed or invoked.
In contrast, EC2 Reserved Instances do not directly apply to SageMaker-managed endpoint instance usage, and Spot capacity is interruptible and therefore a poor fit for steady real-time inference availability.
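As a rough illustration of the tradeoff, the arithmetic below uses invented rates (not real SageMaker pricing) to show why a commitment discount dominates for an always-on endpoint:

```python
# Illustrative arithmetic only: both rates below are made up to show the
# shape of the tradeoff, not actual SageMaker pricing.
on_demand_hourly = 1.00        # assumed On-Demand rate (hypothetical)
savings_plan_discount = 0.40   # assumed Savings Plan discount (hypothetical)
hours_per_year = 24 * 365      # endpoint runs 24/7

on_demand_annual = on_demand_hourly * hours_per_year
savings_plan_annual = on_demand_annual * (1 - savings_plan_discount)
annual_savings = on_demand_annual - savings_plan_annual
# Because usage is steady, the commitment is fully utilized and the
# discount translates directly into savings with zero architecture change.
```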
Topic: Deployment and Orchestration of ML Workflows
A team uses Amazon SageMaker Pipelines to retrain a model. They must automate executions with these requirements:
- Retraining must start only from the main branch.
- Retraining must start when an approved manifest lands in s3://ml-data/groundtruth/approved/ (file name manifest.json).

Which TWO trigger configurations should the team AVOID?
Options:
A. Trigger on every CodeCommit push to any branch
B. EventBridge schedule rule (01:00 UTC) starts evaluation pipeline
C. Use a public unauthenticated webhook to start the pipeline
D. EventBridge rule for S3 Object Created with prefix+suffix filter
E. EventBridge rule filtered to CodeCommit merge into main
F. EventBridge rules target a Step Functions workflow that starts pipelines
Correct answers: A and C
Explanation: Use managed, authenticated event sources (EventBridge with CodeCommit and S3 events) and a cron schedule to start SageMaker Pipelines. The key is to filter events tightly to match the required branch and S3 object patterns so executions occur only when intended. Public, unauthenticated triggers and overly broad repository triggers violate the stated security and cost/correctness requirements.
At a high level, pipeline triggers on AWS should be implemented with EventBridge rules (scheduled and event-pattern based) that target a controlled execution path (for example, StartPipelineExecution directly or via Step Functions/Lambda). In this scenario:
- Filter the CodeCommit event to merges into refs/heads/main to prevent retraining from unreviewed branches.
- Filter S3 Object Created events with a prefix filter (groundtruth/approved/) and a suffix filter (manifest.json) so only approved manifests start retraining.

The main anti-patterns are triggers that are too broad (causing extra executions) or that introduce an unauthenticated public entry point.
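The tightly scoped triggers described above can be sketched as EventBridge event patterns. Bucket and field values are hypothetical, S3 must be configured to send notifications to EventBridge, and a wildcard filter is one way to combine the prefix and suffix constraints in a single pattern:

```python
# Hedged sketch of EventBridge event patterns for the two triggers.
# Names are hypothetical placeholders.
s3_manifest_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["ml-data"]},
        # Wildcard combines the prefix and suffix constraints in one filter
        "object": {"key": [{"wildcard": "groundtruth/approved/*manifest.json"}]},
    },
}
codecommit_main_pattern = {
    "source": ["aws.codecommit"],
    "detail-type": ["CodeCommit Repository State Change"],
    "detail": {
        "event": ["referenceUpdated"],   # a branch ref moved (e.g., a merge landed)
        "referenceType": ["branch"],
        "referenceName": ["main"],       # only main starts retraining
    },
}
```

Both rules target a controlled execution path (for example, StartPipelineExecution); neither exposes a public, unauthenticated entry point.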
Topic: Deployment and Orchestration of ML Workflows
A company uses a CI/CD pipeline for an ML workflow in Amazon SageMaker: data lands in Amazon S3, a SageMaker Pipeline runs processing and training, the model is deployed to a real-time endpoint, and monitoring metrics go to Amazon CloudWatch.
The company is regulated and must prove, during audits, exactly which training dataset, training code, and container image produced the model currently running in production. The solution must add minimal operational overhead and must not prevent automated deployments.
Which change best improves end-to-end logging and auditability under these constraints?
Options:
A. Enable S3 versioning and register models with lineage metadata
B. Tag the training image in Amazon ECR as latest
C. Enable CloudWatch Logs retention for training and inference
D. Use Spot Instances for training with S3 checkpoints
Best answer: A
Explanation: Enable S3 versioning and capture lineage at model registration time so each deployed model package is tied to immutable inputs (S3 object versions), the exact code revision, and the exact container image digest. This creates a reliable audit trail from a production endpoint back through the CI/CD pipeline to the originating artifacts with low additional runtime overhead.
For compliance traceability, the key is immutable artifact identification and a single system of record that links them. Enabling S3 versioning makes training data references auditable (you can prove the exact object version used), and registering each trained model in the SageMaker Model Registry allows you to attach and retain metadata such as the S3 URI + version ID(s), the source code commit SHA used by the pipeline, the ECR image digest, and pipeline execution identifiers.
This keeps deployments automated (deploy the approved model package) while providing an end-to-end chain of custody across data, code, and model. The main tradeoff is modest added storage/metadata management (and typically an approval gate if required by policy), not increased training or inference cost.
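The lineage capture described above can be sketched as the `create_model_package` request that registers the model. All identifiers below (group name, digest, version ID, commit SHA) are hypothetical placeholders; the point is that each field is immutable and attached at registration time.

```python
# Hedged sketch of registering a model with lineage metadata.
# Every identifier below is a hypothetical placeholder.
model_package_request = {
    "ModelPackageGroupName": "churn-model",          # hypothetical group
    "ModelApprovalStatus": "PendingManualApproval",  # optional approval gate
    "InferenceSpecification": {
        "Containers": [{
            # Immutable image digest, not a movable tag like :latest
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/train@sha256:abc123",
            "ModelDataUrl": "s3://ml-artifacts/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
    "CustomerMetadataProperties": {
        "training-data-s3-version-id": "3HL4kqtJlcpXroDTDmJ",  # S3 object version used
        "source-commit-sha": "9fceb02d0ae598e95dc970b74767f19372d61af8",
        "pipeline-execution-name": "nightly-2024-06-01",       # hypothetical run id
    },
}
# sagemaker_client.create_model_package(**model_package_request) would register
# the package; auditors can walk from the deployed package back to these inputs.
```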
Tagging the image as latest reduces reproducibility because tags can be moved and are not immutable identifiers.

Topic: Deployment and Orchestration of ML Workflows
A company retrains a churn model nightly in us-east-1. A new 500 GB parquet snapshot lands in Amazon S3 each day from an ingestion job. The team must ensure that training and evaluation always use the same immutable snapshot, and they must be able to audit which dataset produced each model version. All data must stay private (no public internet) and encrypted with AWS KMS. The workflow must finish by 6:00 AM and should minimize always-on costs.
Which approach is the BEST AWS-native solution?
Options:
A. Manually start SageMaker Pipeline executions from SageMaker Studio daily
B. Schedule a SageMaker training job that reads s3://bucket/latest/
C. Stream events with Kinesis and continuously retrain on the stream
D. EventBridge triggers Step Functions, then a parameterized SageMaker Pipeline
Best answer: D
Explanation: Use an event-driven orchestration that passes an explicit S3 snapshot location into a parameterized SageMaker Pipeline execution. This ensures downstream training and evaluation consume the same immutable dataset and creates a traceable lineage from dataset snapshot to model version. Step Functions and EventBridge provide low-cost scheduling/triggering without always-on infrastructure while keeping encryption and network controls in place.
The core requirement is coupling ingestion orchestration to the ML workflow so the pipeline runs against the correct dataset snapshot and records lineage. A common AWS-native pattern is:
- An EventBridge rule detects the daily snapshot arrival and triggers a Step Functions workflow.
- Step Functions starts a parameterized SageMaker Pipeline execution, passing the snapshot location as a pipeline parameter (for example, InputDataS3Uri) so processing, training, and evaluation all reference the same snapshot, and the resulting model is registered with metadata linking back to that URI.

This meets auditability and correctness while avoiding always-on services and supporting private connectivity (VPC, VPC endpoints) and KMS encryption.
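The parameterized hand-off can be sketched as the `start_pipeline_execution` call that Step Functions (or a Lambda step) would make. Pipeline name and snapshot path are hypothetical; the pipeline must declare a matching `InputDataS3Uri` parameter for the value to bind.

```python
# Hedged sketch of starting a parameterized SageMaker Pipeline execution
# with the day's immutable snapshot URI. Names and paths are hypothetical.
start_args = {
    "PipelineName": "churn-retrain",                       # hypothetical pipeline
    "PipelineParameters": [
        {"Name": "InputDataS3Uri",
         "Value": "s3://bucket/snapshots/2024-06-01/"},    # hypothetical snapshot path
    ],
    "ClientRequestToken": "nightly-2024-06-01",            # idempotency token
}
# sagemaker_client.start_pipeline_execution(**start_args) would start the run;
# every downstream step reads the same snapshot URI from the parameter, so
# training and evaluation cannot silently diverge onto different data.
```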
Topic: Deployment and Orchestration of ML Workflows
A team deploys a new version of a model to an existing Amazon SageMaker real-time endpoint. They want to send a small percentage of production traffic to the new model first, monitor key metrics (for example, latency and errors), and then progressively increase traffic only if the metrics look good. If problems occur, they want to immediately route all traffic back to the previous model version.
Which deployment strategy does this describe?
Options:
A. Blue/green deployment
B. Canary deployment
C. Shadow deployment
D. Linear deployment
Best answer: B
Explanation: This is a canary deployment: a new model version initially receives only a small portion of live traffic while the team monitors operational or model-quality signals. If the canary behaves poorly, rollback is fast because traffic can be shifted back to the previous version (for example, by restoring production-variant weights).
A canary deployment releases a new model to production by exposing it to a small, controlled fraction of real user traffic, then increasing that fraction as confidence grows. On SageMaker real-time endpoints, this is commonly implemented with multiple production variants and weighted traffic distribution; rollback is simply setting the old variant weight back to 100% (and optionally removing the new variant) when alarms trigger.
The key idea is risk reduction through limited exposure while using production telemetry to decide whether to proceed.
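The weighted-variant mechanics can be sketched as the two calls to `update_endpoint_weights_and_capacities`: one to open the canary, one to roll back. Endpoint and variant names are hypothetical.

```python
# Hedged sketch of a canary start and rollback via production-variant
# weights. Endpoint and variant names are hypothetical.
canary_start = {
    "EndpointName": "image-classifier",   # hypothetical endpoint
    "DesiredWeightsAndCapacities": [
        {"VariantName": "model-v1", "DesiredWeight": 0.9},  # current model keeps 90%
        {"VariantName": "model-v2", "DesiredWeight": 0.1},  # canary receives 10%
    ],
}
rollback = {
    "EndpointName": "image-classifier",
    "DesiredWeightsAndCapacities": [
        {"VariantName": "model-v1", "DesiredWeight": 1.0},  # all traffic back
        {"VariantName": "model-v2", "DesiredWeight": 0.0},  # canary drained
    ],
}
# sagemaker_client.update_endpoint_weights_and_capacities(**canary_start)
# opens the canary; the same call with `rollback` restores the old version,
# typically driven by a CloudWatch alarm on latency or error metrics.
```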
Topic: Deployment and Orchestration of ML Workflows
A team is productionizing an ML workflow on AWS that includes AWS Glue feature generation, Amazon SageMaker training, offline evaluation, and deployment to a real-time endpoint. The workflow has dependencies, needs retries on transient failures, and must include a manual approval gate with an auditable record of what model version was promoted.
Which approach should you AVOID?
Options:
A. Use Step Functions to orchestrate SageMaker jobs with a human-approval wait
B. Use Amazon MWAA (Airflow) to orchestrate Glue and SageMaker jobs
C. Use EventBridge and Lambda to auto-deploy using an admin IAM role
D. Use SageMaker Pipelines with Model Registry approval before deployment
Best answer: C
Explanation: Use a purpose-built workflow orchestrator (SageMaker Pipelines, Airflow, or Step Functions) to model dependencies, retries, and promotion controls. These services support explicit pipeline state and integrate with approval and versioning patterns for auditable releases. Auto-promoting from ad-hoc triggers with an admin role is a deployment governance anti-pattern.
For production ML, orchestration selection should match workflow complexity and operational controls: you need explicit step dependencies, retry/error handling, and a controlled promotion mechanism that records exactly which artifacts were approved and deployed. SageMaker Pipelines natively fits SageMaker-centric workflows and can integrate with Model Registry for versioned approvals. MWAA (Airflow) is well-suited when coordinating broader data/ETL plus ML tasks in a DAG with rich scheduling and dependencies. Step Functions is a strong choice for serverless stateful orchestration with branching, retries, and human approval wait patterns. In contrast, stitching together EventBridge and Lambda to automatically deploy with an admin IAM role removes the approval gate and audit-friendly model/version lineage while also violating least-privilege access.
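The Step Functions option can be sketched as an abbreviated Amazon States Language definition (here as a Python dict) showing a training step with retries and a human-approval step using the `.waitForTaskToken` integration. The Lambda name is hypothetical and the training-job parameters are elided for brevity.

```python
# Hedged, abbreviated ASL sketch: retries on the training step plus a
# human-approval pause. Function name is hypothetical; training job
# configuration is elided.
definition = {
    "StartAt": "Train",
    "States": {
        "Train": {
            "Type": "Task",
            # .sync waits for the training job to finish before moving on
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            "Parameters": {},  # training job config elided in this sketch
            "Retry": [{"ErrorEquals": ["SageMaker.AmazonSageMakerException"],
                       "IntervalSeconds": 30, "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Next": "AwaitApproval",
        },
        "AwaitApproval": {
            "Type": "Task",
            # Pauses until an approver's system calls SendTaskSuccess with the token
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "Parameters": {
                "FunctionName": "notify-approver",          # hypothetical Lambda
                "Payload": {"token.$": "$$.Task.Token"},
            },
            "Next": "Deploy",
        },
        "Deploy": {"Type": "Pass", "End": True},  # deployment step elided
    },
}
```

The approval wait is first-class pipeline state, which is exactly what the ad-hoc EventBridge-plus-admin-Lambda option lacks.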
Topic: Deployment and Orchestration of ML Workflows
A team uses AWS CodeCommit and AWS CodePipeline to run a SageMaker workflow (data prep → train/tune → register → deploy → monitor). The pipeline currently triggers on every commit to the main branch and automatically deploys to a production real-time endpoint.
Several incidents were caused by unreviewed changes to training configuration files (hyperparameters and feature lists) being pushed directly to main. The team must keep automated deployments, add an auditable link from each deployed model to the exact code/config used, and reduce the chance of unsafe changes reaching production.
Which change best improves reliability and operability while meeting these constraints?
Options:
A. Allow direct commits to main and rely on CloudWatch alarms
B. Force-rebase main regularly to keep history small
C. Use a long-lived dev branch and merge to main monthly
D. Require PRs into protected main; deploy only from release tags
Best answer: D
Explanation: Using short-lived branches with pull requests into a protected main branch adds an approval and automated-check gate before production. Triggering production deployments from immutable Git release tags (and recording the tag with the model package) provides a clear, auditable mapping from a deployed model back to the exact code and configuration that produced it. The tradeoff is slightly slower releases due to PR reviews and explicit tagging.
The core improvement is adding Git-based change control and traceability to the ML CI/CD path. Short-lived feature branches plus pull requests into a protected main enable required reviews and CI status checks (lint/unit tests, pipeline dry-run, config validation) before code/config can affect production. Separately, deploying only from an immutable Git tag (for example, an annotated release/vX.Y.Z tag) makes the deployment input a fixed commit, so the team can always reproduce what code and configuration trained the registered model and what was deployed.
A practical pattern is:
- Develop on short-lived feature branches and open pull requests into a protected main.
- Merge only after required reviews and CI checks pass.
- Cut an immutable release tag from main and trigger production deployment only from that tag, recording the tag with the registered model package.

This increases reliability and auditability at the cost of adding a small, intentional release ceremony (PR + tag).
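A deployment pipeline can enforce the tag-only rule with a small gate like the one below. The ref format follows Git conventions; the vX.Y.Z policy itself is an assumption chosen for illustration.

```python
import re

# Hedged sketch: a gate a deployment job could run to ensure it was
# triggered by an immutable release tag (vX.Y.Z), never a branch ref.
RELEASE_TAG = re.compile(r"^refs/tags/v\d+\.\d+\.\d+$")

def is_release_ref(ref: str) -> bool:
    """Return True only for refs shaped like refs/tags/v1.2.3."""
    return RELEASE_TAG.match(ref) is not None

# Branch pushes and loose tags are rejected; only release tags deploy.
assert is_release_ref("refs/tags/v2.1.0")
assert not is_release_ref("refs/heads/main")
assert not is_release_ref("refs/tags/latest")
```

Because a tag pins one commit, recording the tag on the model package gives the audit trail from endpoint back to code and config.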
Use the AWS MLA-C01 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Try AWS MLA-C01 on Web · View AWS MLA-C01 Practice Test
Read the AWS MLA-C01 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.