Try 10 focused AWS MLA-C01 questions on ML Deployment, with explanations, then continue with IT Mastery.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
| Field | Detail |
|---|---|
| Exam route | AWS MLA-C01 |
| Topic area | Deployment and Orchestration of ML Workflows |
| Blueprint weight | 22% |
| Page purpose | Focused sample questions before returning to mixed practice |
Use this page to isolate Deployment and Orchestration of ML Workflows for AWS MLA-C01. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 22% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.
Topic: Deployment and Orchestration of ML Workflows
A team has trained a PyTorch image classification model in Amazon SageMaker and needs to run inference on ARM-based CPU edge devices with tight latency constraints and intermittent connectivity. The team wants to keep the same trained model artifacts but reduce inference latency on the devices.
Which approach is most appropriate?
Options:
A. Use SageMaker Automatic Model Tuning to retrain the model for lower edge latency
B. Use SageMaker Neo to compile the trained model for the target ARM edge runtime
C. Deploy the model to a SageMaker serverless endpoint to minimize device-side latency
D. Run a SageMaker batch transform job on a schedule and ship predictions to devices
Best answer: B
Explanation: SageMaker Neo is designed to optimize inference by compiling a trained model for a specific target hardware and runtime, such as ARM-based CPUs on edge devices. This improves on-device performance without requiring changes to the device connectivity model or switching to cloud-hosted inference.
Edge optimization is appropriate when inference must run locally (for latency, bandwidth, privacy, or offline operation) and the same trained model needs better performance on constrained hardware. SageMaker Neo addresses this by compiling a framework model into an optimized executable for a specified target (for example, an ARM CPU runtime), which typically reduces inference latency and can improve throughput on the device.
In contrast, cloud deployment options (real-time/serverless endpoints) primarily optimize server-side hosting and scaling, not on-device execution. Batch transform is for offline, large-scale inference and does not meet interactive, on-device latency needs.
Key takeaway: use Neo when the requirement is faster inference on specific edge hardware.
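As a concrete sketch of what compiling for an edge target looks like, the request below shows the shape of a Neo compilation job submitted through boto3's `create_compilation_job`. The job name, role ARN, S3 paths, and input shape are hypothetical placeholders; the target device must match the hardware you actually ship.

```python
# Hedged sketch of a SageMaker Neo compilation job request. All names,
# ARNs, and paths below are hypothetical placeholders.
compilation_job = {
    "CompilationJobName": "image-classifier-arm-v1",       # hypothetical name
    "RoleArn": "arn:aws:iam::123456789012:role/NeoRole",   # hypothetical role
    "InputConfig": {
        "S3Uri": "s3://ml-artifacts/model.tar.gz",         # the existing trained artifact
        "DataInputConfig": '{"input0": [1, 3, 224, 224]}', # model input tensor shape
        "Framework": "PYTORCH",
    },
    "OutputConfig": {
        "S3OutputLocation": "s3://ml-artifacts/compiled/",
        # Example ARM CPU target (Raspberry Pi 3); choose the device you ship to.
        "TargetDevice": "rasp3b",
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 900},
}
# sagemaker_client.create_compilation_job(**compilation_job) would submit it;
# the compiled artifact written to S3OutputLocation is then packaged for devices.
```

Note that the trained artifact is reused as-is: only the output format changes, which is exactly why Neo fits the scenario better than retraining or re-hosting.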
Topic: Deployment and Orchestration of ML Workflows
A company has 40 ML repositories that build and deploy Amazon SageMaker inference containers. During peak hours, many commits happen at once and the CI/CD system occasionally fails with throttling and long queues.
Which THREE design actions will best improve scalability by accounting for AWS CodePipeline/CodeBuild/CodeDeploy capabilities and quotas? (Select THREE.)
Options:
A. Use only CodeDeploy in-place deployments to eliminate deployment concurrency concerns
B. Split a monolithic pipeline into multiple pipelines to distribute executions
C. Use a single unencrypted S3 artifact bucket to prevent CI/CD throttling
D. Configure CodeBuild concurrency and plan quota increases via Service Quotas
E. Enable CodeBuild build caching to shorten build time and reduce queueing
F. Run builds directly in CodePipeline actions to avoid CodeBuild quotas
Correct answers: B, D, and E
Explanation: Scaling CI/CD on AWS requires designing for concurrency and execution quotas rather than assuming unlimited parallelism. CodeBuild has explicit concurrency controls and account quotas that you monitor and raise when needed. You can also reduce demand by shortening builds (caching) and distribute execution load by decomposing large workflows into multiple pipelines.
The core idea is to design the CI/CD system so it can absorb bursts while staying within service quotas and using each service as intended. CodeBuild is the primary scalable compute layer for builds/tests and has both configurable per-project concurrency and account-level concurrent build quotas (managed through Service Quotas). CodePipeline orchestrates workflows; decomposing a single pipeline into multiple pipelines (for teams/services/environments) helps distribute concurrent executions and reduces bottlenecks from a single serialized pipeline. Independently, reducing build duration with CodeBuild caching (and reusing layers/artifacts) decreases queue depth and the likelihood of hitting concurrency limits.
The key takeaway is to scale by increasing/controlling concurrency where appropriate, distributing orchestration load, and reducing per-run resource time—not by trying to “bypass” the services that enforce quotas.
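The concurrency and caching levers above can be sketched as the arguments to CodeBuild's `update_project` call. The project name is hypothetical, and the specific limit of 5 is only an example; account-level concurrent-build quotas are raised separately through Service Quotas.

```python
# Hedged sketch: per-project concurrency cap plus local caching for a
# CodeBuild project. Project name and limit are hypothetical examples.
update_args = {
    "name": "sagemaker-container-build",   # hypothetical project name
    "concurrentBuildLimit": 5,             # cap this project's parallel builds
    "cache": {
        "type": "LOCAL",                   # reuse layers/source on the build host
        "modes": ["LOCAL_DOCKER_LAYER_CACHE", "LOCAL_SOURCE_CACHE"],
    },
}
# codebuild_client.update_project(**update_args) would apply this. Caching
# shortens each build, which directly reduces queue depth at peak.
```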
Topic: Deployment and Orchestration of ML Workflows
A team uses AWS CodePipeline to deploy an Amazon SageMaker inference container. They add an AWS CodeBuild project that runs unit and integration tests and then builds and pushes a versioned Docker image to Amazon ECR from a buildspec.yml stored with the source code.
Which core principle is this action primarily applying?
Options:
A. Monitoring for drift
B. Least privilege
C. Separation of duties
D. Reproducibility
Best answer: D
Explanation: This setup emphasizes reproducibility by defining the build and test steps as code and executing them in a consistent build environment. Using a versioned buildspec.yml and producing versioned container images makes it easier to recreate the same ML artifact later and reduces “works on my machine” differences.
The core principle is reproducibility: the ability to rebuild the same artifact and rerun the same tests with the same inputs and process. Using CodeBuild with a buildspec.yml checked into source control turns the build and test procedure into an auditable, repeatable definition. Running unit and integration tests in CodeBuild and producing a versioned container image in ECR standardizes the build environment and outputs, which supports consistent deployments across dev, test, and prod and enables reliable rollback or rebuild when needed. This is distinct from access-control or monitoring concerns, which would require additional actions beyond defining and executing the build steps.
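To make the reproducibility point concrete, here is the shape of such a buildspec expressed as a Python dict. The test command, repository variable `$ECR_REPO`, and runtime version are assumptions; `CODEBUILD_RESOLVED_SOURCE_VERSION` is the commit SHA CodeBuild checked out, which gives each image an immutable, traceable tag.

```python
# Hedged sketch of a reproducible buildspec: pinned runtime, tests before
# build, image tagged with the commit SHA rather than "latest".
# $ECR_REPO and the test command are hypothetical.
buildspec = {
    "version": 0.2,
    "phases": {
        "install": {"runtime-versions": {"python": "3.11"}},   # pinned toolchain
        "pre_build": {"commands": ["pytest tests/"]},          # unit/integration tests
        "build": {
            "commands": [
                # CODEBUILD_RESOLVED_SOURCE_VERSION = the exact commit built
                "docker build -t $ECR_REPO:$CODEBUILD_RESOLVED_SOURCE_VERSION .",
                "docker push $ECR_REPO:$CODEBUILD_RESOLVED_SOURCE_VERSION",
            ]
        },
    },
}
```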
Topic: Deployment and Orchestration of ML Workflows
A team runs an Amazon SageMaker real-time inference endpoint 24/7 on the same instance type, and usage is highly predictable. The team wants to reduce cost without changing the deployment architecture.
Which purchasing option is most appropriate for this workload?
Options:
A. Buy EC2 Reserved Instances for the instance type used by the endpoint
B. Run the real-time endpoint on Spot Instances to minimize cost
C. Purchase a SageMaker Savings Plan for the committed usage
D. Keep the endpoint on On-Demand pricing for maximum flexibility
Best answer: C
Explanation: For steady, predictable 24/7 inference, the best fit is a commitment-based discount that applies to SageMaker usage. SageMaker Savings Plans reduce cost while allowing the endpoint to keep running normally, trading flexibility for a time-bound spend commitment.
The core tradeoff for predictable inference is flexibility versus commitment. On-Demand pricing has no long-term commitment and is best for uncertain or spiky usage, but it is typically the most expensive option for always-on endpoints.
SageMaker Savings Plans are designed for predictable SageMaker usage (including inference) and provide discounted rates when you commit to a consistent amount of usage over a 1- or 3-year term. This reduces cost without requiring a change to how the endpoint is deployed or invoked.
In contrast, EC2 Reserved Instances do not directly apply to SageMaker-managed endpoint instance usage, and Spot capacity is interruptible and therefore a poor fit for steady real-time inference availability.
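As a rough illustration of the tradeoff, the arithmetic below uses invented rates (not real SageMaker pricing) to show why a commitment discount dominates for an always-on endpoint:

```python
# Illustrative arithmetic only: both rates below are made up to show the
# shape of the tradeoff, not actual SageMaker pricing.
on_demand_hourly = 1.00        # assumed On-Demand rate (hypothetical)
savings_plan_discount = 0.40   # assumed Savings Plan discount (hypothetical)
hours_per_year = 24 * 365      # endpoint runs 24/7

on_demand_annual = on_demand_hourly * hours_per_year
savings_plan_annual = on_demand_annual * (1 - savings_plan_discount)
annual_savings = on_demand_annual - savings_plan_annual
# Because usage is steady, the commitment is fully utilized and the
# discount translates directly into savings with zero architecture change.
```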
Topic: Deployment and Orchestration of ML Workflows
A team uses Amazon SageMaker Pipelines to retrain a model. They must automate executions with these requirements:
- Retraining must start only from the main branch.
- Retraining must start when an approved manifest lands in s3://ml-data/groundtruth/approved/ (file name manifest.json).

Which TWO trigger configurations should the team AVOID?
Options:
A. Trigger on every CodeCommit push to any branch
B. EventBridge schedule rule (01:00 UTC) starts evaluation pipeline
C. Use a public unauthenticated webhook to start the pipeline
D. EventBridge rule for S3 Object Created with prefix+suffix filter
E. EventBridge rule filtered to CodeCommit merge into main
F. EventBridge rules target a Step Functions workflow that starts pipelines
Correct answers: A and C
Explanation: Use managed, authenticated event sources (EventBridge with CodeCommit and S3 events) and a cron schedule to start SageMaker Pipelines. The key is to filter events tightly to match the required branch and S3 object patterns so executions occur only when intended. Public, unauthenticated triggers and overly broad repository triggers violate the stated security and cost/correctness requirements.
At a high level, pipeline triggers on AWS should be implemented with EventBridge rules (scheduled and event-pattern based) that target a controlled execution path (for example, StartPipelineExecution directly or via Step Functions/Lambda). In this scenario:
- Filter the CodeCommit event to merges into refs/heads/main to prevent retraining from unreviewed branches.
- Filter S3 Object Created events with a prefix filter (groundtruth/approved/) and a suffix filter (manifest.json) so only approved manifests start retraining.

The main anti-patterns are triggers that are too broad (causing extra executions) or that introduce an unauthenticated public entry point.
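The tightly scoped triggers described above can be sketched as EventBridge event patterns. Bucket and field values are hypothetical, S3 must be configured to send notifications to EventBridge, and a wildcard filter is one way to combine the prefix and suffix constraints in a single pattern:

```python
# Hedged sketch of EventBridge event patterns for the two triggers.
# Names are hypothetical placeholders.
s3_manifest_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["ml-data"]},
        # Wildcard combines the prefix and suffix constraints in one filter
        "object": {"key": [{"wildcard": "groundtruth/approved/*manifest.json"}]},
    },
}
codecommit_main_pattern = {
    "source": ["aws.codecommit"],
    "detail-type": ["CodeCommit Repository State Change"],
    "detail": {
        "event": ["referenceUpdated"],   # a branch ref moved (e.g., a merge landed)
        "referenceType": ["branch"],
        "referenceName": ["main"],       # only main starts retraining
    },
}
```

Both rules target a controlled execution path (for example, StartPipelineExecution); neither exposes a public, unauthenticated entry point.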
Topic: Deployment and Orchestration of ML Workflows
A company uses a CI/CD pipeline for an ML workflow in Amazon SageMaker: data lands in Amazon S3, a SageMaker Pipeline runs processing and training, the model is deployed to a real-time endpoint, and monitoring metrics go to Amazon CloudWatch.
The company is regulated and must prove, during audits, exactly which training dataset, training code, and container image produced the model currently running in production. The solution must add minimal operational overhead and must not prevent automated deployments.
Which change best improves end-to-end logging and auditability under these constraints?
Options:
A. Enable S3 versioning and register models with lineage metadata
B. Tag the training image in Amazon ECR as latest
C. Enable CloudWatch Logs retention for training and inference
D. Use Spot Instances for training with S3 checkpoints
Best answer: A
Explanation: Enable S3 versioning and capture lineage at model registration time so each deployed model package is tied to immutable inputs (S3 object versions), the exact code revision, and the exact container image digest. This creates a reliable audit trail from a production endpoint back through the CI/CD pipeline to the originating artifacts with low additional runtime overhead.
For compliance traceability, the key is immutable artifact identification and a single system of record that links them. Enabling S3 versioning makes training data references auditable (you can prove the exact object version used), and registering each trained model in the SageMaker Model Registry allows you to attach and retain metadata such as the S3 URI + version ID(s), the source code commit SHA used by the pipeline, the ECR image digest, and pipeline execution identifiers.
This keeps deployments automated (deploy the approved model package) while providing an end-to-end chain of custody across data, code, and model. The main tradeoff is modest added storage/metadata management (and typically an approval gate if required by policy), not increased training or inference cost.
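The lineage capture described above can be sketched as the `create_model_package` request that registers the model. All identifiers below (group name, digest, version ID, commit SHA) are hypothetical placeholders; the point is that each field is immutable and attached at registration time.

```python
# Hedged sketch of registering a model with lineage metadata.
# Every identifier below is a hypothetical placeholder.
model_package_request = {
    "ModelPackageGroupName": "churn-model",          # hypothetical group
    "ModelApprovalStatus": "PendingManualApproval",  # optional approval gate
    "InferenceSpecification": {
        "Containers": [{
            # Immutable image digest, not a movable tag like :latest
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/train@sha256:abc123",
            "ModelDataUrl": "s3://ml-artifacts/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
    "CustomerMetadataProperties": {
        "training-data-s3-version-id": "3HL4kqtJlcpXroDTDmJ",  # S3 object version used
        "source-commit-sha": "9fceb02d0ae598e95dc970b74767f19372d61af8",
        "pipeline-execution-name": "nightly-2024-06-01",       # hypothetical run id
    },
}
# sagemaker_client.create_model_package(**model_package_request) would register
# the package; auditors can walk from the deployed package back to these inputs.
```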
Tagging the image as latest reduces reproducibility because tags can be moved and are not immutable identifiers.

Topic: Deployment and Orchestration of ML Workflows
A company retrains a churn model nightly in us-east-1. A new 500 GB parquet snapshot lands in Amazon S3 each day from an ingestion job. The team must ensure that training and evaluation always use the same immutable snapshot, and they must be able to audit which dataset produced each model version. All data must stay private (no public internet) and encrypted with AWS KMS. The workflow must finish by 6:00 AM and should minimize always-on costs.
Which approach is the BEST AWS-native solution?
Options:
A. Manually start SageMaker Pipeline executions from SageMaker Studio daily
B. Schedule a SageMaker training job that reads s3://bucket/latest/
C. Stream events with Kinesis and continuously retrain on the stream
D. EventBridge triggers Step Functions, then a parameterized SageMaker Pipeline
Best answer: D
Explanation: Use an event-driven orchestration that passes an explicit S3 snapshot location into a parameterized SageMaker Pipeline execution. This ensures downstream training and evaluation consume the same immutable dataset and creates a traceable lineage from dataset snapshot to model version. Step Functions and EventBridge provide low-cost scheduling/triggering without always-on infrastructure while keeping encryption and network controls in place.
The core requirement is coupling ingestion orchestration to the ML workflow so the pipeline runs against the correct dataset snapshot and records lineage. A common AWS-native pattern is:
- An EventBridge rule detects the daily snapshot arrival and triggers a Step Functions workflow.
- Step Functions starts a parameterized SageMaker Pipeline execution, passing the snapshot location as a pipeline parameter (for example, InputDataS3Uri) so processing, training, and evaluation all reference the same snapshot, and the resulting model is registered with metadata linking back to that URI.

This meets auditability and correctness while avoiding always-on services and supporting private connectivity (VPC, VPC endpoints) and KMS encryption.
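The parameterized hand-off can be sketched as the `start_pipeline_execution` call that Step Functions (or a Lambda step) would make. Pipeline name and snapshot path are hypothetical; the pipeline must declare a matching `InputDataS3Uri` parameter for the value to bind.

```python
# Hedged sketch of starting a parameterized SageMaker Pipeline execution
# with the day's immutable snapshot URI. Names and paths are hypothetical.
start_args = {
    "PipelineName": "churn-retrain",                       # hypothetical pipeline
    "PipelineParameters": [
        {"Name": "InputDataS3Uri",
         "Value": "s3://bucket/snapshots/2024-06-01/"},    # hypothetical snapshot path
    ],
    "ClientRequestToken": "nightly-2024-06-01",            # idempotency token
}
# sagemaker_client.start_pipeline_execution(**start_args) would start the run;
# every downstream step reads the same snapshot URI from the parameter, so
# training and evaluation cannot silently diverge onto different data.
```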
Topic: Deployment and Orchestration of ML Workflows
A team deploys a new version of a model to an existing Amazon SageMaker real-time endpoint. They want to send a small percentage of production traffic to the new model first, monitor key metrics (for example, latency and errors), and then progressively increase traffic only if the metrics look good. If problems occur, they want to immediately route all traffic back to the previous model version.
Which deployment strategy does this describe?
Options:
A. Blue/green deployment
B. Canary deployment
C. Shadow deployment
D. Linear deployment
Best answer: B
Explanation: This is a canary deployment: a new model version initially receives only a small portion of live traffic while the team monitors operational or model-quality signals. If the canary behaves poorly, rollback is fast because traffic can be shifted back to the previous version (for example, by restoring production-variant weights).
A canary deployment releases a new model to production by exposing it to a small, controlled fraction of real user traffic, then increasing that fraction as confidence grows. On SageMaker real-time endpoints, this is commonly implemented with multiple production variants and weighted traffic distribution; rollback is simply setting the old variant weight back to 100% (and optionally removing the new variant) when alarms trigger.
The key idea is risk reduction through limited exposure while using production telemetry to decide whether to proceed.
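The weighted-variant mechanics can be sketched as the two calls to `update_endpoint_weights_and_capacities`: one to open the canary, one to roll back. Endpoint and variant names are hypothetical.

```python
# Hedged sketch of a canary start and rollback via production-variant
# weights. Endpoint and variant names are hypothetical.
canary_start = {
    "EndpointName": "image-classifier",   # hypothetical endpoint
    "DesiredWeightsAndCapacities": [
        {"VariantName": "model-v1", "DesiredWeight": 0.9},  # current model keeps 90%
        {"VariantName": "model-v2", "DesiredWeight": 0.1},  # canary receives 10%
    ],
}
rollback = {
    "EndpointName": "image-classifier",
    "DesiredWeightsAndCapacities": [
        {"VariantName": "model-v1", "DesiredWeight": 1.0},  # all traffic back
        {"VariantName": "model-v2", "DesiredWeight": 0.0},  # canary drained
    ],
}
# sagemaker_client.update_endpoint_weights_and_capacities(**canary_start)
# opens the canary; the same call with `rollback` restores the old version,
# typically driven by a CloudWatch alarm on latency or error metrics.
```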
Topic: Deployment and Orchestration of ML Workflows
A team is productionizing an ML workflow on AWS that includes AWS Glue feature generation, Amazon SageMaker training, offline evaluation, and deployment to a real-time endpoint. The workflow has dependencies, needs retries on transient failures, and must include a manual approval gate with an auditable record of what model version was promoted.
Which approach should you AVOID?
Options:
A. Use Step Functions to orchestrate SageMaker jobs with a human-approval wait
B. Use Amazon MWAA (Airflow) to orchestrate Glue and SageMaker jobs
C. Use EventBridge and Lambda to auto-deploy using an admin IAM role
D. Use SageMaker Pipelines with Model Registry approval before deployment
Best answer: C
Explanation: Use a purpose-built workflow orchestrator (SageMaker Pipelines, Airflow, or Step Functions) to model dependencies, retries, and promotion controls. These services support explicit pipeline state and integrate with approval and versioning patterns for auditable releases. Auto-promoting from ad-hoc triggers with an admin role is a deployment governance anti-pattern.
For production ML, orchestration selection should match workflow complexity and operational controls: you need explicit step dependencies, retry/error handling, and a controlled promotion mechanism that records exactly which artifacts were approved and deployed. SageMaker Pipelines natively fits SageMaker-centric workflows and can integrate with Model Registry for versioned approvals. MWAA (Airflow) is well-suited when coordinating broader data/ETL plus ML tasks in a DAG with rich scheduling and dependencies. Step Functions is a strong choice for serverless stateful orchestration with branching, retries, and human approval wait patterns. In contrast, stitching together EventBridge and Lambda to automatically deploy with an admin IAM role removes the approval gate and audit-friendly model/version lineage while also violating least-privilege access.
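The Step Functions option can be sketched as an abbreviated Amazon States Language definition (here as a Python dict) showing a training step with retries and a human-approval step using the `.waitForTaskToken` integration. The Lambda name is hypothetical and the training-job parameters are elided for brevity.

```python
# Hedged, abbreviated ASL sketch: retries on the training step plus a
# human-approval pause. Function name is hypothetical; training job
# configuration is elided.
definition = {
    "StartAt": "Train",
    "States": {
        "Train": {
            "Type": "Task",
            # .sync waits for the training job to finish before moving on
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            "Parameters": {},  # training job config elided in this sketch
            "Retry": [{"ErrorEquals": ["SageMaker.AmazonSageMakerException"],
                       "IntervalSeconds": 30, "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Next": "AwaitApproval",
        },
        "AwaitApproval": {
            "Type": "Task",
            # Pauses until an approver's system calls SendTaskSuccess with the token
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "Parameters": {
                "FunctionName": "notify-approver",          # hypothetical Lambda
                "Payload": {"token.$": "$$.Task.Token"},
            },
            "Next": "Deploy",
        },
        "Deploy": {"Type": "Pass", "End": True},  # deployment step elided
    },
}
```

The approval wait is first-class pipeline state, which is exactly what the ad-hoc EventBridge-plus-admin-Lambda option lacks.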
Topic: Deployment and Orchestration of ML Workflows
A team uses AWS CodeCommit and AWS CodePipeline to run a SageMaker workflow (data prep → train/tune → register → deploy → monitor). The pipeline currently triggers on every commit to the main branch and automatically deploys to a production real-time endpoint.
Several incidents were caused by unreviewed changes to training configuration files (hyperparameters and feature lists) being pushed directly to main. The team must keep automated deployments, add an auditable link from each deployed model to the exact code/config used, and reduce the chance of unsafe changes reaching production.
Which change best improves reliability and operability while meeting these constraints?
Options:
A. Allow direct commits to main and rely on CloudWatch alarms
B. Force-rebase main regularly to keep history small
C. Use a long-lived dev branch and merge to main monthly
D. Require PRs into protected main; deploy only from release tags
Best answer: D
Explanation: Using short-lived branches with pull requests into a protected main branch adds an approval and automated-check gate before production. Triggering production deployments from immutable Git release tags (and recording the tag with the model package) provides a clear, auditable mapping from a deployed model back to the exact code and configuration that produced it. The tradeoff is slightly slower releases due to PR reviews and explicit tagging.
The core improvement is adding Git-based change control and traceability to the ML CI/CD path. Short-lived feature branches plus pull requests into a protected main enable required reviews and CI status checks (lint/unit tests, pipeline dry-run, config validation) before code/config can affect production. Separately, deploying only from an immutable Git tag (for example, an annotated release/vX.Y.Z tag) makes the deployment input a fixed commit, so the team can always reproduce what code and configuration trained the registered model and what was deployed.
A practical pattern is:
- Develop on short-lived feature branches and open pull requests into a protected main.
- Merge only after required reviews and CI checks pass.
- Cut an immutable release tag from main and trigger production deployment only from that tag, recording the tag with the registered model package.

This increases reliability and auditability at the cost of adding a small, intentional release ceremony (PR + tag).
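A deployment pipeline can enforce the tag-only rule with a small gate like the one below. The ref format follows Git conventions; the vX.Y.Z policy itself is an assumption chosen for illustration.

```python
import re

# Hedged sketch: a gate a deployment job could run to ensure it was
# triggered by an immutable release tag (vX.Y.Z), never a branch ref.
RELEASE_TAG = re.compile(r"^refs/tags/v\d+\.\d+\.\d+$")

def is_release_ref(ref: str) -> bool:
    """Return True only for refs shaped like refs/tags/v1.2.3."""
    return RELEASE_TAG.match(ref) is not None

# Branch pushes and loose tags are rejected; only release tags deploy.
assert is_release_ref("refs/tags/v2.1.0")
assert not is_release_ref("refs/heads/main")
assert not is_release_ref("refs/tags/latest")
```

Because a tag pins one commit, recording the tag on the model package gives the audit trail from endpoint back to code and config.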
Use the AWS MLA-C01 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Try AWS MLA-C01 on Web · View AWS MLA-C01 Practice Test
Read the AWS MLA-C01 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.