Keep this page open while drilling questions. MLA‑C01 rewards “production ML realism”: data quality gates, repeatability, safe deployments, drift monitoring, cost controls, and least-privilege security.
Quick facts (MLA-C01)
| Item | Value |
| --- | --- |
| Questions | 65 (multiple-choice + multiple-response) |
| Time | 130 minutes |
| Passing score | 720 (scaled 100–1000) |
| Cost | 150 USD |
| Domains | D1 28% • D2 26% • D3 22% • D4 24% |
Fast strategy (what the exam expects)
- If the question asks for the best-fit managed ML service, the answer is often SageMaker (Feature Store, Pipelines, Model Registry, managed endpoints).
- If the scenario is “data is messy,” think data quality checks, profiling, transformations, and feature consistency (train/serve).
- If the scenario is “accuracy dropped in prod,” think drift, monitoring baselines, A/B or shadow, and retraining triggers.
- If the scenario is “cost is spiking,” think right-sizing, endpoint type selection, auto scaling, Spot / Savings Plans, and budgets/tags.
- If there’s “security/compliance,” include least privilege IAM, encryption, VPC isolation, and audit logging.
- Read the last sentence first to capture constraints: latency, cost, ops effort, compliance, auditability.
Domain weights (how to allocate your time)
| Domain | Weight | Prep focus |
| --- | --- | --- |
| Domain 1: Data Preparation for ML | 28% | Ingest/ETL, feature engineering, data quality and bias basics |
| Domain 2: ML Model Development | 26% | Model choice, training/tuning, evaluation, Clarify/Debugger/Registry |
| Domain 3: Deployment + Orchestration | 22% | Endpoint types, scaling, IaC, CI/CD for ML pipelines |
| Domain 4: Monitoring + Security | 24% | Drift/model monitor, infra monitoring + costs, security controls |
0) SageMaker service map (high yield)
| Capability | What it’s for | MLA‑C01 “why it matters” |
| --- | --- | --- |
| SageMaker Data Wrangler | Data prep + feature engineering | Fast, repeatable transforms; reduces time-to-first-model |
| SageMaker Feature Store | Central feature storage | Avoid train/serve skew; feature reuse and governance |
| SageMaker Training | Managed training jobs | Repeatable, scalable training on AWS compute |
| SageMaker AMT | Hyperparameter tuning | Systematic search for better model configs |
| SageMaker Clarify | Bias + explainability | Responsible ML evidence + model understanding |
| SageMaker Model Debugger | Training diagnostics | Debug convergence and training instability |
| SageMaker Model Registry | Versioning + approvals | Auditability, rollback, safe promotion to prod |
| SageMaker Endpoints | Managed model serving | Real-time/serverless/async inference patterns |
| SageMaker Model Monitor | Monitoring workflows | Detect drift and quality issues in production |
| SageMaker Pipelines | ML workflow orchestration | Build-test-train-evaluate-register-deploy automation |
1) End-to-end ML on AWS (mental model)
```mermaid
flowchart LR
  S["Sources"] --> I["Ingest"]
  I --> T["Transform + Quality Checks"]
  T --> F["Feature Engineering + Feature Store"]
  F --> TR["Train + Tune"]
  TR --> E["Evaluate + Bias/Explainability"]
  E --> R["Register + Approve"]
  R --> D["Deploy Endpoint or Batch"]
  D --> M["Monitor Drift/Quality/Cost/Security"]
  M -->|Triggers| RT["Retrain"]
  RT --> TR
```
High-yield framing: MLA‑C01 is about the pipeline, not just the model.
2) Domain 1 — Data preparation (28%)
Prep/ETL service picker
| You need… | Typical best-fit | Why |
| --- | --- | --- |
| Visual data prep + fast iteration | SageMaker Data Wrangler | Interactive + repeatable workflows |
| No/low-code transforms and profiling | AWS Glue DataBrew | Good for business-friendly prep |
| Scalable ETL jobs | AWS Glue / Spark | Production batch ETL at scale |
| Big Spark workloads (custom) | Amazon EMR | More control over Spark |
| Simple streaming transforms | AWS Lambda | Event-driven, lightweight |
| Streaming analytics | Managed Apache Flink | Stateful streaming at scale |
Data formats (what to pick)
| Format | Why it shows up | Typical trade-off |
| --- | --- | --- |
| Parquet / ORC | Columnar analytics + efficient selective reads | Best for large tabular datasets |
| CSV / JSON | Interop + simplicity | Bigger + slower at scale |
| Avro | Schema evolution + streaming | Good for pipelines |
| RecordIO | ML-specific record formats | Useful with some training stacks |
Rule: choose formats based on access patterns (scan vs selective reads), schema evolution, and scale.
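As a concrete illustration of the rule above, here is a minimal sketch (pandas with pyarrow assumed installed; file and column names are arbitrary) of why columnar formats pay off for selective reads:
```python
# Minimal sketch: columnar formats let the reader scan only the columns it needs,
# which is why Parquet/ORC win for large, wide tabular datasets.
import pandas as pd

df = pd.DataFrame({"user_id": range(1_000), "age": 30, "country": "DE", "label": 0})

df.to_parquet("features.parquet")        # columnar + compressed
df.to_csv("features.csv", index=False)   # row-oriented text, larger and slower at scale

# Selective read: only the requested columns are loaded from the Parquet file.
subset = pd.read_parquet("features.parquet", columns=["age", "label"])
print(subset.shape)
```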
Data ingestion and storage (high yield)
- Amazon S3: default data lake for ML (durable, cheap, scalable).
- Amazon EFS / FSx: file-based access patterns; useful when training expects POSIX-like file semantics.
- Streaming ingestion: use Kinesis/managed streaming where low-latency data arrival matters.
Common best answers:
- Use AWS Glue / Spark on EMR for big ETL jobs.
- Use SageMaker Data Wrangler for fast interactive prep and repeatable transformations.
- Use SageMaker Feature Store to keep training/inference features consistent.
Feature Store: why it matters
- Avoid train/serve skew: the feature used in training is the same feature served to inference.
- Support feature reuse across teams and models.
- Enable governance: feature definitions and versions.
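A minimal sketch with the SageMaker Python SDK, assuming a pandas DataFrame of features; the group name, bucket, and role ARN are placeholders:
```python
# Hedged sketch: define a feature group once, then reuse the same features for
# training (offline store) and low-latency inference (online store).
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
df = pd.DataFrame({
    "customer_id": pd.Series(["c-001"], dtype="string"),
    "tenure_months": [12],
    "event_time": [1700000000.0],
})

fg = FeatureGroup(name="customer-features", sagemaker_session=session)  # placeholder name
fg.load_feature_definitions(data_frame=df)       # infer feature types from the DataFrame
fg.create(
    s3_uri="s3://my-bucket/feature-store/",      # offline store location (placeholder)
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    enable_online_store=True,                    # low-latency reads at inference time
)
fg.ingest(data_frame=df, max_workers=1, wait=True)  # same values for train and serve
```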
Data integrity + bias basics (often tested)
| Problem | What to do | Tooling you might name |
| --- | --- | --- |
| Missing/invalid data | Add data quality checks + fail fast | Glue DataBrew / Glue Data Quality |
| Class imbalance | Resampling or synthetic data | (Conceptual) + Clarify for analysis |
| Bias sources | Identify selection/measurement bias | SageMaker Clarify (bias analysis) |
| Sensitive data | Classify + mask/anonymize + encrypt | KMS + access controls |
| Compliance constraints | Data residency + least privilege + audit logs | IAM + CloudTrail + region choices |
High-yield rule: don’t “fix” model issues before you verify data quality and rule out leakage.
3) Domain 2 — Model development (26%)
Choosing an approach
| If you need… | Typical best-fit |
| --- | --- |
| A standard AI capability with minimal ML ops | AWS AI services (Translate/Transcribe/Rekognition, etc.) |
| A custom model with managed training + deployment | Amazon SageMaker |
| A foundation model / generative capability | Amazon Bedrock (when applicable) |
Rule: don’t overbuild. If an AWS managed AI service solves it, it usually wins on time-to-value and ops.
Training and tuning (high yield)
- Training loop terms: epoch, step, batch size.
- Speedups: early stopping, distributed training.
- Generalization controls: regularization (L1/L2, dropout, weight decay) + better data/features.
- Hyperparameter tuning: random search vs Bayesian optimization; in SageMaker, use Automatic Model Tuning (AMT).
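A hedged sketch of AMT with the SageMaker Python SDK; the container image, metric regex, ranges, and S3 paths are illustrative placeholders:
```python
# Hedged sketch: Bayesian hyperparameter search with early stopping via SageMaker AMT.
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

estimator = Estimator(
    image_uri="<training-image-uri>",            # placeholder training container
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    metric_definitions=[{"Name": "validation:auc", "Regex": "validation-auc: ([0-9\\.]+)"}],
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    strategy="Bayesian",          # or "Random" for a simpler baseline
    max_jobs=20,
    max_parallel_jobs=2,
    early_stopping_type="Auto",   # stop unpromising trials early to save cost
)

tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/val/"})
```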
Metrics picker (what to choose)
| Task | Common metrics | What the exam tries to trick you on |
| --- | --- | --- |
| Classification | Accuracy, precision, recall, F1, ROC-AUC | Class imbalance makes accuracy misleading |
| Regression | MAE/RMSE | Outliers and error cost (what matters more?) |
| Model selection | Metric + cost/latency | “Best” isn’t only accuracy |
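The imbalance trap from the table above, in one runnable sketch (scikit-learn assumed installed):
```python
# A classifier that never predicts the minority class still scores 95% accuracy.
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = [0] * 95 + [1] * 5   # 5% positive class
y_pred = [0] * 100            # "always negative" model

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks great
print(recall_score(y_true, y_pred))    # 0.00 -- misses every positive case
print(f1_score(y_true, y_pred))        # 0.00
```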
Overfitting vs underfitting (signals)
| Symptom | Likely issue | Typical fix |
| --- | --- | --- |
| Train ↑, validation ↓ | Overfitting | Regularization, simpler model, more data, better features |
| Both low | Underfitting | More expressive model, better features, tune hyperparameters |
Clarify vs Debugger vs Model Monitor (common confusion)
| Tool | What it helps with | When to name it |
| --- | --- | --- |
| SageMaker Clarify | Bias + explainability | Fairness questions, “why did it predict X?” |
| SageMaker Model Debugger | Training diagnostics + convergence | Training instability, loss not decreasing, debugging training |
| SageMaker Model Monitor | Production monitoring workflows | Drift, data quality degradation, monitoring baselines |
Model Registry (repeatability + governance)
- Track: model artifacts, metrics, lineage, approvals.
- Enables safe promotion/rollback and audit-ready workflows.
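A hedged boto3 sketch of the register-then-approve flow; the group name, image URI, and artifact path are placeholders:
```python
# Register a model version pending manual approval, then approve it after evaluation.
import boto3

sm = boto3.client("sagemaker")

resp = sm.create_model_package(
    ModelPackageGroupName="churn-models",              # placeholder group
    ModelApprovalStatus="PendingManualApproval",       # gate promotion behind a review
    InferenceSpecification={
        "Containers": [{
            "Image": "<inference-image-uri>",
            "ModelDataUrl": "s3://my-bucket/model/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
)

# Later, once evaluation and bias checks pass, flip the status (the change is visible in CloudTrail).
sm.update_model_package(
    ModelPackageArn=resp["ModelPackageArn"],
    ModelApprovalStatus="Approved",
)
```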
4) Domain 3 — Deployment and orchestration (22%)
Endpoint types (must-know picker)
| Endpoint type | Best for | Typical constraint |
| --- | --- | --- |
| Real-time | Steady, low-latency inference | Cost for always-on capacity |
| Serverless | Spiky traffic, scale-to-zero | Cold starts + limits |
| Asynchronous | Long inference time, bursty workloads | Event-style patterns + polling/callback |
| Batch inference | Scheduled/offline scoring | Not interactive |
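For example, a hedged boto3 sketch of the serverless option for spiky traffic; the model must already exist, and the names and sizes are placeholders:
```python
# Serverless inference: pay per request, scale to zero, accept possible cold starts.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="churn-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "churn-model",          # existing SageMaker Model (placeholder)
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,          # also determines allocated compute
            "MaxConcurrency": 5,
        },
    }],
)
sm.create_endpoint(
    EndpointName="churn-serverless",
    EndpointConfigName="churn-serverless-config",
)
```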
Scaling metrics (what to pick)
| Metric | Good when… | Watch out |
| --- | --- | --- |
| Invocations per instance | Request volume drives load | Spiky traffic can cause oscillation |
| Latency | You have a latency SLO | Noisy metrics require smoothing |
| CPU/GPU utilization | Compute-bound models | Not always correlated to request rate |
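A hedged sketch of target tracking on invocations per instance (Application Auto Scaling); the endpoint/variant names and the target value are placeholders you would derive from a load test:
```python
# Scale a real-time endpoint variant between 1 and 4 instances based on request volume.
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/churn-realtime/variant/AllTraffic"   # placeholder names

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

aas.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,   # invocations per instance (from load testing)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,   # longer cooldown damps oscillation on spiky traffic
    },
)
```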
Multi-model / multi-container (why they exist)
- Multi-model: multiple models behind one endpoint to reduce cost.
- Multi-container: pre/post-processing plus model serving, or multiple frameworks.
IaC + containers (exam patterns)
- IaC: CloudFormation or CDK for reproducible environments.
- Containers: build/publish to ECR, deploy via SageMaker, ECS, or EKS.
CI/CD for ML (what’s different)
You version and validate more than code:
- Code + data + features + model artifacts + evaluation reports
- Promotion gates: accuracy thresholds, bias checks, smoke tests, canary/shadow validation
Typical services: CodePipeline/CodeBuild/CodeDeploy, SageMaker Pipelines, EventBridge triggers.
```mermaid
flowchart LR
  G["Git push"] --> CP["CodePipeline"]
  CP --> CB["CodeBuild: tests + build"]
  CB --> P["SageMaker Pipeline: process/train/eval"]
  P --> Gate{"Meets<br/>thresholds?"}
  Gate -->|yes| MR["Model Registry approve"]
  Gate -->|no| Stop["Stop + report"]
  MR --> Dep["Deploy (canary/shadow)"]
  Dep --> Mon["Monitor + rollback triggers"]
```
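One common glue pattern from the flow above, sketched with boto3; the Lambda target and names are placeholders, and the event pattern follows the documented SageMaker model package state-change event:
```python
# Kick off deployment automatically when a Model Registry version is approved.
import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="on-model-package-approved",
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {"ModelApprovalStatus": ["Approved"]},
    }),
    State="ENABLED",
)

events.put_targets(
    Rule="on-model-package-approved",
    Targets=[{
        "Id": "deploy-approved-model",
        # Placeholder Lambda that starts the deployment (canary/shadow) workflow.
        "Arn": "arn:aws:lambda:eu-west-1:123456789012:function:deploy-approved-model",
    }],
)
```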
5) Domain 4 — Monitoring, cost, and security (24%)
Monitoring and drift (high yield)
- Data drift: input distribution changed.
- Concept drift: relationship between input and label changed.
- Use baselines + ongoing checks; monitor latency/errors too.
Common services/patterns:
- SageMaker Model Monitor for monitoring workflows.
- A/B testing or shadow deployments for safe comparison.
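A hedged sketch with the SageMaker Python SDK: baseline the training data once, then check hourly captures against it. The role, URIs, and names are placeholders, and the endpoint needs data capture enabled:
```python
# Data-quality monitoring: suggest a baseline, then schedule recurring checks.
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/baseline/train.csv",  # training data snapshot
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline/",
)

monitor.create_monitoring_schedule(
    monitor_schedule_name="churn-data-quality",
    endpoint_input="churn-realtime",                       # endpoint with data capture on
    output_s3_uri="s3://my-bucket/monitor/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```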
Monitoring checklist (what to instrument)
- Inference quality: when ground truth is available later, compare predicted vs actual.
- Data quality: nulls, ranges, schema changes, category explosion.
- Distribution shift: feature histograms/summary stats vs baseline.
- Ops signals: p50/p95 latency, error rate, throttles, timeouts.
- Safety/security: anomalous traffic spikes, abuse patterns, permission failures.
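A tool-agnostic sketch of the distribution-shift check from the list above (NumPy/SciPy assumed; the threshold and feature are illustrative):
```python
# Compare a live feature sample against the training baseline with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline = rng.normal(loc=35.0, scale=8.0, size=5_000)   # e.g. "age" at training time
live = rng.normal(loc=42.0, scale=8.0, size=5_000)       # shifted distribution in production

stat, p_value = ks_2samp(baseline, live)
if p_value < 0.01:
    print(f"Shift detected (KS statistic={stat:.3f}) -- investigate and consider retraining")
```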
Infra + cost optimization (high yield)
| Theme | What to do |
| --- | --- |
| Observability | CloudWatch metrics/logs/alarms; Logs Insights; X-Ray for traces |
| Rightsizing | Pick instance family/size based on perf; use Inference Recommender + Compute Optimizer |
| Spend control | Tags + Cost Explorer + Budgets + Trusted Advisor |
| Purchasing options | Spot / Reserved / Savings Plans where the workload fits |
Cost levers (common “best answer” patterns)
- Choose the right inference mode first: batch (cheapest) → async → serverless → real-time (always-on, usually most expensive).
- Right-size and auto scale; don’t leave endpoints overprovisioned.
- Use Spot for fault-tolerant training/batch where interruptions are acceptable.
- Use Budgets + tags early (before the bills surprise you).
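A hedged boto3 sketch of the “Budgets + tags” lever; the account ID, amount, tag, and email are placeholders, and the tag must already be activated as a cost allocation tag:
```python
# Monthly cost budget scoped to a team tag, alerting at 80% of actual spend.
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",
    Budget={
        "BudgetName": "ml-team-monthly",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
        "CostFilters": {"TagKeyValue": ["user:team$ml-platform"]},   # placeholder tag
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "ml-alerts@example.com"}],
    }],
)
```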
Security defaults (high yield)
- Least privilege IAM for training jobs, pipelines, and endpoints.
- Encrypt at rest + in transit (KMS + TLS).
- VPC isolation (subnets + security groups) for ML resources when required.
- Audit trails (CloudTrail) + controlled access to logs and artifacts.
Common IAM/security “gotchas”
- Training role can read S3 but can’t decrypt KMS key (KMS key policy vs IAM policy mismatch).
- Endpoint role has broad S3 access (“*”) instead of a tight prefix.
- Secrets leak into logs/artifacts (build logs, notebooks, environment variables).
- No audit trail for model registry approvals or endpoint updates.
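A minimal sketch of the “tight prefix + matching KMS grant” fix for the first two gotchas; the bucket, prefix, and key ARN are placeholders, and the KMS key policy must also allow the role:
```python
# Least-privilege policy for a training/endpoint role: scoped S3 read plus decrypt on one key.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::ml-data-bucket/models/churn/*",  # tight prefix, not "*"
        },
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            # Placeholder key ARN -- grant decrypt on the specific key, not kms:*.
            "Resource": "arn:aws:kms:eu-west-1:123456789012:key/11111111-2222-3333-4444-555555555555",
        },
    ],
}
print(json.dumps(policy, indent=2))
```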
Next steps
- Use the Syllabus as your checklist.
- Use Practice to drill weak tasks fast.
- Use the Study Plan if you want a 30/60/90‑day schedule.