MLA-C01 Cheatsheet — SageMaker, MLOps, Endpoint Types, Monitoring & Security (High Yield)

High-signal MLA-C01 reference: data ingestion/ETL + feature engineering, model selection/training/tuning/evaluation, SageMaker deployment endpoint choices, CI/CD and orchestration patterns, monitoring/drift/cost optimization, and security/governance essentials.

Keep this page open while drilling questions. MLA‑C01 rewards “production ML realism”: data quality gates, repeatability, safe deployments, drift monitoring, cost controls, and least-privilege security.


Quick facts (MLA-C01)

| Item | Value |
| --- | --- |
| Questions | 65 (multiple-choice + multiple-response) |
| Time | 130 minutes |
| Passing score | 720 (scaled 100–1000) |
| Cost | 150 USD |
| Domains | D1 28% • D2 26% • D3 22% • D4 24% |

Fast strategy (what the exam expects)

  • If the question says best-fit managed ML, the answer is often SageMaker (Feature Store, Pipelines, Model Registry, managed endpoints).
  • If the scenario is “data is messy,” think data quality checks, profiling, transformations, and feature consistency (train/serve).
  • If the scenario is “accuracy dropped in prod,” think drift, monitoring baselines, A/B or shadow, and retraining triggers.
  • If the scenario is “cost is spiking,” think right-sizing, endpoint type selection, auto scaling, Spot / Savings Plans, and budgets/tags.
  • If there’s “security/compliance,” include least privilege IAM, encryption, VPC isolation, and audit logging.
  • Read the last sentence first to capture constraints: latency, cost, ops effort, compliance, auditability.

Domain weights (how to allocate your time)

| Domain | Weight | Prep focus |
| --- | --- | --- |
| Domain 1: Data Preparation for ML | 28% | Ingest/ETL, feature engineering, data quality and bias basics |
| Domain 2: ML Model Development | 26% | Model choice, training/tuning, evaluation, Clarify/Debugger/Registry |
| Domain 3: Deployment + Orchestration | 22% | Endpoint types, scaling, IaC, CI/CD for ML pipelines |
| Domain 4: Monitoring + Security | 24% | Drift/model monitor, infra monitoring + costs, security controls |

0) SageMaker service map (high yield)

| Capability | What it’s for | MLA‑C01 “why it matters” |
| --- | --- | --- |
| SageMaker Data Wrangler | Data prep + feature engineering | Fast, repeatable transforms; reduces time-to-first-model |
| SageMaker Feature Store | Central feature storage | Avoid train/serve skew; feature reuse and governance |
| SageMaker Training | Managed training jobs | Repeatable, scalable training on AWS compute |
| SageMaker AMT | Hyperparameter tuning | Systematic search for better model configs |
| SageMaker Clarify | Bias + explainability | Responsible ML evidence + model understanding |
| SageMaker Model Debugger | Training diagnostics | Debug convergence and training instability |
| SageMaker Model Registry | Versioning + approvals | Auditability, rollback, safe promotion to prod |
| SageMaker Endpoints | Managed model serving | Real-time/serverless/async inference patterns |
| SageMaker Model Monitor | Monitoring workflows | Detect drift and quality issues in production |
| SageMaker Pipelines | ML workflow orchestration | Build-test-train-evaluate-register-deploy automation |

1) End-to-end ML on AWS (mental model)

    flowchart LR
	  S["Sources"] --> I["Ingest"]
	  I --> T["Transform + Quality Checks"]
	  T --> F["Feature Engineering + Feature Store"]
	  F --> TR["Train + Tune"]
	  TR --> E["Evaluate + Bias/Explainability"]
	  E --> R["Register + Approve"]
	  R --> D["Deploy Endpoint or Batch"]
	  D --> M["Monitor Drift/Quality/Cost/Security"]
	  M -->|Triggers| RT["Retrain"]
	  RT --> TR

High-yield framing: MLA‑C01 is about the pipeline, not just the model.


2) Domain 1 — Data preparation (28%)

“Which tool should I use?” (ETL and prep picker)

| You need… | Typical best-fit | Why |
| --- | --- | --- |
| Visual data prep + fast iteration | SageMaker Data Wrangler | Interactive + repeatable workflows |
| No/low-code transforms and profiling | AWS Glue DataBrew | Good for business-friendly prep |
| Scalable ETL jobs | AWS Glue / Spark | Production batch ETL at scale |
| Big Spark workloads (custom) | Amazon EMR | More control over Spark |
| Simple streaming transforms | AWS Lambda | Event-driven, lightweight |
| Streaming analytics | Managed Apache Flink | Stateful streaming at scale |

Data formats (pickers)

| Format | Why it shows up | Typical trade-off |
| --- | --- | --- |
| Parquet / ORC | Columnar analytics + efficient reads | Best for large tabular datasets |
| CSV / JSON | Interop + simplicity | Bigger + slower at scale |
| Avro | Schema evolution + streaming | Good for pipelines |
| RecordIO | ML-specific record formats | Useful with some training stacks |

Rule: choose formats based on access patterns (scan vs selective reads), schema evolution, and scale.
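
To make the access-pattern point concrete, here is a small pandas sketch (assuming pyarrow is installed; the file names and columns are placeholders) showing the selective-read advantage of a columnar format:

    import pandas as pd

    # Hypothetical input; column names are placeholders.
    df = pd.read_csv("events.csv")

    # Store once as columnar Parquet (compressed, schema preserved).
    df.to_parquet("events.parquet", index=False)

    # Selective read: only the requested columns are scanned, which is the
    # main reason Parquet/ORC beat CSV for large analytical/ML datasets.
    subset = pd.read_parquet("events.parquet", columns=["user_id", "event_time"])
    print(subset.dtypes)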

Data ingestion and storage (high yield)

  • Amazon S3: default data lake for ML (durable, cheap, scalable).
  • Amazon EFS / FSx: file-based access patterns; useful when training expects POSIX-like file semantics.
  • Streaming ingestion: use Kinesis/managed streaming where low-latency data arrival matters.

Common best answers:

  • Use AWS Glue / Spark on EMR for big ETL jobs.
  • Use SageMaker Data Wrangler for fast interactive prep and repeatable transformations.
  • Use SageMaker Feature Store to keep training/inference features consistent.

Feature Store: why it matters

  • Avoid train/serve skew: the feature used in training is the same feature served to inference.
  • Support feature reuse across teams and models.
  • Enable governance: feature definitions and versions.
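
A minimal ingestion sketch with the SageMaker Python SDK, under stated assumptions: the feature group name, bucket, and IAM role ARN are placeholders, and the tiny DataFrame stands in for real feature data.

    import time
    import pandas as pd
    import sagemaker
    from sagemaker.feature_store.feature_group import FeatureGroup

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

    features_df = pd.DataFrame({
        "customer_id": ["c1", "c2"],
        "event_time": [time.time()] * 2,        # Feature Store requires an event-time feature
        "avg_order_value": [42.0, 17.5],
    })
    features_df["customer_id"] = features_df["customer_id"].astype("string")  # maps to String type

    fg = FeatureGroup(name="customer-features", sagemaker_session=session)
    fg.load_feature_definitions(data_frame=features_df)   # infer the schema from the DataFrame

    fg.create(
        s3_uri="s3://my-bucket/feature-store/",            # offline store location (placeholder)
        record_identifier_name="customer_id",
        event_time_feature_name="event_time",
        role_arn=role,
        enable_online_store=True,                          # online store for low-latency serving
    )
    while fg.describe()["FeatureGroupStatus"] == "Creating":
        time.sleep(5)                                      # wait until the group is ACTIVE

    # The same records back both training (offline store) and inference (online store),
    # which is what prevents train/serve skew.
    fg.ingest(data_frame=features_df, max_workers=2, wait=True)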

Data integrity + bias basics (often tested)

| Problem | What to do | Tooling you might name |
| --- | --- | --- |
| Missing/invalid data | Add data quality checks + fail fast | Glue DataBrew / Glue Data Quality |
| Class imbalance | Resampling or synthetic data | (Conceptual) + Clarify for analysis |
| Bias sources | Identify selection/measurement bias | SageMaker Clarify (bias analysis) |
| Sensitive data | Classify + mask/anonymize + encrypt | KMS + access controls |
| Compliance constraints | Data residency + least privilege + audit logs | IAM + CloudTrail + region choices |

High-yield rule: don’t “fix” model issues before you have checked data quality and ruled out leakage.
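
A framework-agnostic sketch of a “fail fast” quality gate that could run as a step before training; the input path, column names, and thresholds are all illustrative, not prescriptive.

    import sys
    import pandas as pd

    df = pd.read_parquet("train_features.parquet")   # placeholder training dataset

    checks = {
        "no_missing_label": df["label"].notna().all(),
        "age_in_range": df["age"].between(0, 120).all(),
        "expected_columns": set(df.columns) >= {"label", "age", "income"},
        "not_too_imbalanced": df["label"].value_counts(normalize=True).min() >= 0.05,
    }

    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        print(f"Data quality gate failed: {failed}")
        sys.exit(1)   # fail fast so nothing downstream trains on bad data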


3) Domain 2 — Model development (26%)

Choosing an approach

| If you need… | Typical best-fit |
| --- | --- |
| A standard AI capability with minimal ML ops | AWS AI services (Translate/Transcribe/Rekognition, etc.) |
| A custom model with managed training + deployment | Amazon SageMaker |
| A foundation model / generative capability | Amazon Bedrock (when applicable) |

Rule: don’t overbuild. If an AWS managed AI service solves it, it usually wins on time-to-value and ops.

Training and tuning (high yield)

  • Training loop terms: epoch, step, batch size.
  • Speedups: early stopping, distributed training.
  • Generalization controls: regularization (L1/L2, dropout, weight decay) + better data/features.
  • Hyperparameter tuning: random search vs Bayesian optimization; in SageMaker, use Automatic Model Tuning (AMT).
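
A minimal AMT sketch with the SageMaker Python SDK and the built-in XGBoost image; the role, bucket paths, metric name, and ranges are placeholders, not a recommended configuration.

    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput
    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"   # placeholder
    image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/models/",                         # placeholder
        hyperparameters={"objective": "binary:logistic", "eval_metric": "auc", "num_round": 200},
    )

    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:auc",       # emitted by the built-in XGBoost algorithm
        objective_type="Maximize",
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.01, 0.3),
            "max_depth": IntegerParameter(3, 10),
        },
        strategy="Bayesian",                           # alternative: "Random"
        max_jobs=20,
        max_parallel_jobs=4,
        early_stopping_type="Auto",                    # stop unpromising jobs early
    )

    tuner.fit({
        "train": TrainingInput("s3://my-bucket/train/", content_type="text/csv"),
        "validation": TrainingInput("s3://my-bucket/validation/", content_type="text/csv"),
    })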

Metrics picker (what to choose)

| Task | Common metrics | What the exam tries to trick you on |
| --- | --- | --- |
| Classification | Accuracy, precision, recall, F1, ROC-AUC | Class imbalance makes accuracy misleading |
| Regression | MAE/RMSE | Outliers and error cost (what matters more?) |
| Model selection | Metric + cost/latency | “Best” isn’t only accuracy |
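
A tiny scikit-learn illustration of the imbalance trap: a “model” that always predicts the majority class looks great on accuracy and useless on recall (the labels here are synthetic).

    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

    # 95% negatives, 5% positives; the "model" always predicts the majority class.
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100

    print("accuracy :", accuracy_score(y_true, y_pred))                    # 0.95, looks great
    print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
    print("recall   :", recall_score(y_true, y_pred))                      # 0.0, misses every positive
    print("f1       :", f1_score(y_true, y_pred))                          # 0.0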

Overfitting vs underfitting (signals)

| Symptom | Likely issue | Typical fix |
| --- | --- | --- |
| Train ↑, validation ↓ | Overfitting | Regularization, simpler model, more data, better features |
| Both low | Underfitting | More expressive model, better features, tune hyperparameters |

Clarify vs Debugger vs Model Monitor (common confusion)

| Tool | What it helps with | When to name it |
| --- | --- | --- |
| SageMaker Clarify | Bias + explainability | Fairness questions, “why did it predict X?” |
| SageMaker Model Debugger | Training diagnostics + convergence | Training instability, loss not decreasing, debugging training |
| SageMaker Model Monitor | Production monitoring workflows | Drift, data quality degradation, monitoring baselines |
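
To make the Clarify row concrete, a minimal pre-training bias sketch with the SageMaker Python SDK; the dataset path, headers, label, and facet (sensitive attribute) column are placeholders.

    from sagemaker import Session, clarify

    session = Session()
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"   # placeholder

    processor = clarify.SageMakerClarifyProcessor(
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        sagemaker_session=session,
    )

    data_config = clarify.DataConfig(
        s3_data_input_path="s3://my-bucket/train.csv",     # placeholder
        s3_output_path="s3://my-bucket/clarify-output/",
        label="approved",
        headers=["approved", "age", "income", "gender"],
        dataset_type="text/csv",
    )

    bias_config = clarify.BiasConfig(
        label_values_or_threshold=[1],   # the "positive" outcome
        facet_name="gender",             # sensitive attribute to analyze
    )

    # Pre-training bias: measures imbalance in the data itself, before any model exists.
    processor.run_pre_training_bias(data_config=data_config, data_bias_config=bias_config)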

Model Registry (repeatability + governance)

  • Track: model artifacts, metrics, lineage, approvals.
  • Enables safe promotion/rollback and audit-ready workflows.
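
A minimal registration sketch, assuming `estimator` is a completed training job such as the one in the tuning sketch above; the model package group name is a placeholder.

    # Register the trained model into a Model Package Group with a manual approval gate.
    model_package = estimator.register(
        model_package_group_name="churn-classifier",     # placeholder group
        content_types=["text/csv"],
        response_types=["text/csv"],
        inference_instances=["ml.m5.xlarge"],
        transform_instances=["ml.m5.xlarge"],
        approval_status="PendingManualApproval",          # human/CI gate before deployment
    )

    # Flipping the package to "Approved" (console, CLI, or pipeline step) is the
    # typical trigger for a deployment pipeline, and it leaves an auditable record.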

4) Domain 3 — Deployment and orchestration (22%)

Endpoint types (must-know picker)

| Endpoint type | Best for | Typical constraint |
| --- | --- | --- |
| Real-time | Steady, low-latency inference | Cost for always-on capacity |
| Serverless | Spiky traffic, scale-to-zero | Cold starts + limits |
| Asynchronous | Long inference time, bursty workloads | Event-style patterns + polling/callback |
| Batch inference | Scheduled/offline scoring | Not interactive |
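
Sketches of the three endpoint flavors with the SageMaker Python SDK, assuming `model` is an existing `sagemaker.model.Model`; in practice you would pick one, and the endpoint names, instance types, and S3 path are placeholders.

    from sagemaker.async_inference import AsyncInferenceConfig
    from sagemaker.serverless import ServerlessInferenceConfig

    # Real-time: always-on instances, lowest latency, you pay while it sits idle.
    rt_predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        endpoint_name="churn-realtime",
    )

    # Serverless: scales to zero between requests; accept cold starts and concurrency limits.
    sl_predictor = model.deploy(
        serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=3072, max_concurrency=10),
        endpoint_name="churn-serverless",
    )

    # Asynchronous: requests are queued, results land in S3, callers poll or get notified.
    async_predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        async_inference_config=AsyncInferenceConfig(output_path="s3://my-bucket/async-results/"),
        endpoint_name="churn-async",
    )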

Scaling metrics (what to pick)

| Metric | Good when… | Watch out |
| --- | --- | --- |
| Invocations per instance | Request volume drives load | Spiky traffic can cause oscillation |
| Latency | You have a latency SLO | Noisy metrics require smoothing |
| CPU/GPU utilization | Compute bound models | Not always correlated to request rate |
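
A boto3 sketch of target tracking on invocations per instance (the first row above); the endpoint name, variant, capacities, and target value are placeholders.

    import boto3

    aas = boto3.client("application-autoscaling")
    resource_id = "endpoint/churn-endpoint/variant/AllTraffic"   # placeholder endpoint/variant

    aas.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )

    aas.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,   # target invocations per instance per minute
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # longer scale-in cooldown smooths spiky traffic
            "ScaleOutCooldown": 60,
        },
    )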

Multi-model / multi-container (why they exist)

  • Multi-model: multiple models behind one endpoint to reduce cost (see the sketch after this list).
  • Multi-container: pre/post-processing plus model serving, or multiple frameworks.
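
A minimal multi-model endpoint sketch (SageMaker Python SDK); it assumes `model` supplies a container that supports multi-model serving, and the S3 prefix holding the `*.tar.gz` artifacts, the payload, and the names are placeholders.

    from sagemaker.multidatamodel import MultiDataModel

    # One endpoint, many models: artifacts are loaded on demand from the shared S3 prefix.
    mme = MultiDataModel(
        name="churn-mme",                                   # placeholder
        model_data_prefix="s3://my-bucket/models/",         # all model .tar.gz files live here
        model=model,                                        # container/framework shared by every model
    )
    predictor = mme.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

    # Route each request to a specific model artifact under the prefix.
    payload = "42,1,0"                                      # placeholder CSV row
    predictor.predict(data=payload, target_model="customer-segment-a.tar.gz")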

IaC + containers (exam patterns)

  • IaC: CloudFormation or CDK for reproducible environments.
  • Containers: build/publish to ECR, deploy via SageMaker, ECS, or EKS.

CI/CD for ML (what’s different)

You version and validate more than code:

  • Code + data + features + model artifacts + evaluation reports
  • Promotion gates: accuracy thresholds, bias checks, smoke tests, canary/shadow validation

Typical services: CodePipeline/CodeBuild/CodeDeploy, SageMaker Pipelines, EventBridge triggers.

    flowchart LR
	  G["Git push"] --> CP["CodePipeline"]
	  CP --> CB["CodeBuild: tests + build"]
	  CB --> P["SageMaker Pipeline: process/train/eval"]
	  P --> Gate{"Meets<br/>thresholds?"}
	  Gate -->|yes| MR["Model Registry approve"]
	  Gate -->|no| Stop["Stop + report"]
	  MR --> Dep["Deploy (canary/shadow)"]
	  Dep --> Mon["Monitor + rollback triggers"]
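
A trimmed SageMaker Pipelines sketch of the train, gate, register shape in the diagram. It assumes `estimator` and `role` as in the earlier sketches, uses a hard-coded stand-in where a real pipeline would read the evaluation metric from a ProcessingStep via JsonGet, and all names and paths are placeholders.

    from sagemaker.inputs import TrainingInput
    from sagemaker.workflow.condition_step import ConditionStep
    from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
    from sagemaker.workflow.fail_step import FailStep
    from sagemaker.workflow.parameters import ParameterFloat
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.step_collections import RegisterModel
    from sagemaker.workflow.steps import TrainingStep

    # Quality-gate threshold; can be overridden per pipeline execution.
    min_auc = ParameterFloat(name="MinAuc", default_value=0.80)

    train_step = TrainingStep(
        name="Train",
        estimator=estimator,   # e.g. the XGBoost estimator sketched in Domain 2
        inputs={"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv"),
                "validation": TrainingInput("s3://my-bucket/validation/", content_type="text/csv")},
    )

    register_step = RegisterModel(
        name="RegisterModel",
        estimator=estimator,
        model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
        content_types=["text/csv"],
        response_types=["text/csv"],
        inference_instances=["ml.m5.xlarge"],
        transform_instances=["ml.m5.xlarge"],
        model_package_group_name="churn-classifier",
        approval_status="PendingManualApproval",
    )

    # The left side would normally come from an evaluation report (PropertyFile + JsonGet);
    # a literal keeps this sketch short.
    gate = ConditionStep(
        name="MeetsThreshold",
        conditions=[ConditionGreaterThanOrEqualTo(left=0.85, right=min_auc)],
        if_steps=[register_step],
        else_steps=[FailStep(name="Stop", error_message="Metric below threshold")],
    )

    pipeline = Pipeline(name="churn-pipeline", parameters=[min_auc], steps=[train_step, gate])
    pipeline.upsert(role_arn=role)   # create or update the pipeline definition
    pipeline.start()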

5) Domain 4 — Monitoring, cost, and security (24%)

Monitoring and drift (high yield)

  • Data drift: input distribution changed.
  • Concept drift: relationship between input and label changed.
  • Use baselines + ongoing checks; monitor latency/errors too.

Common services/patterns:

  • SageMaker Model Monitor for monitoring workflows (see the sketch after this list).
  • A/B testing or shadow deployments for safe comparison.
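
A minimal Model Monitor sketch (SageMaker Python SDK): suggest a baseline from training data, then schedule hourly checks of captured endpoint traffic against it. The role, paths, and endpoint name are placeholders, and the endpoint is assumed to have been deployed with data capture enabled.

    from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
    from sagemaker.model_monitor.dataset_format import DatasetFormat

    monitor = DefaultModelMonitor(
        role=role,                       # placeholder execution role
        instance_count=1,
        instance_type="ml.m5.xlarge",
        volume_size_in_gb=20,
        max_runtime_in_seconds=3600,
    )

    # 1) Baseline: statistics + constraints computed from the training dataset.
    monitor.suggest_baseline(
        baseline_dataset="s3://my-bucket/train/train.csv",
        dataset_format=DatasetFormat.csv(header=True),
        output_s3_uri="s3://my-bucket/monitor/baseline/",
    )

    # 2) Schedule: compare captured endpoint traffic against the baseline every hour.
    monitor.create_monitoring_schedule(
        monitor_schedule_name="churn-data-quality",
        endpoint_input="churn-endpoint",                   # endpoint with data capture enabled
        output_s3_uri="s3://my-bucket/monitor/reports/",
        statistics=monitor.baseline_statistics(),
        constraints=monitor.suggested_constraints(),
        schedule_cron_expression=CronExpressionGenerator.hourly(),
    )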

Monitoring checklist (what to instrument)

  • Inference quality: when ground truth is available later, compare predicted vs actual.
  • Data quality: nulls, ranges, schema changes, category explosion.
  • Distribution shift: feature histograms/summary stats vs baseline.
  • Ops signals: p50/p95 latency, error rate, throttles, timeouts.
  • Safety/security: anomalous traffic spikes, abuse patterns, permission failures.

Infra + cost optimization (high yield)

| Theme | What to do |
| --- | --- |
| Observability | CloudWatch metrics/logs/alarms; Logs Insights; X-Ray for traces |
| Rightsizing | Pick instance family/size based on perf; use Inference Recommender + Compute Optimizer |
| Spend control | Tags + Cost Explorer + Budgets + Trusted Advisor |
| Purchasing options | Spot / Reserved / Savings Plans where the workload fits |
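
For the observability row, a boto3 sketch of a CloudWatch alarm on p95 model latency (ModelLatency is reported in microseconds); the endpoint, variant, threshold, and SNS topic are placeholders.

    import boto3

    cw = boto3.client("cloudwatch")

    # Alarm when p95 model latency exceeds 500 ms for 5 consecutive minutes.
    cw.put_metric_alarm(
        AlarmName="churn-endpoint-p95-latency",
        Namespace="AWS/SageMaker",
        MetricName="ModelLatency",
        Dimensions=[
            {"Name": "EndpointName", "Value": "churn-endpoint"},   # placeholder
            {"Name": "VariantName", "Value": "AllTraffic"},
        ],
        ExtendedStatistic="p95",
        Period=60,
        EvaluationPeriods=5,
        Threshold=500000,                       # microseconds = 500 ms
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-alerts"],   # placeholder SNS topic
    )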

Cost levers (common “best answer” patterns)

  • Choose the right inference mode first: batch (cheapest) → async → serverless → real-time (always-on, usually the most expensive).
  • Right-size and auto scale; don’t leave endpoints overprovisioned.
  • Use Spot for fault-tolerant training/batch where interruptions are acceptable (see the sketch after this list).
  • Use Budgets + tags early (before the bills surprise you).
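
A managed Spot training sketch, reusing the `image_uri` and `role` placeholders from the tuning example; the essentials are `use_spot_instances`, a `max_wait` at least as large as `max_run`, and checkpoints so interrupted jobs can resume.

    from sagemaker.estimator import Estimator

    spot_estimator = Estimator(
        image_uri=image_uri,                               # e.g. the built-in XGBoost image from Domain 2
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/models/",              # placeholder
        use_spot_instances=True,                           # managed Spot for fault-tolerant training
        max_run=3600,                                      # max training seconds
        max_wait=7200,                                     # must be >= max_run; includes time spent waiting for Spot
        checkpoint_s3_uri="s3://my-bucket/checkpoints/",   # resume after interruptions
    )
    spot_estimator.fit({"train": "s3://my-bucket/train/"})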

Security defaults (high yield)

  • Least privilege IAM for training jobs, pipelines, and endpoints.
  • Encrypt at rest + in transit (KMS + TLS).
  • VPC isolation (subnets + security groups) for ML resources when required (see the training-job sketch after this list).
  • Audit trails (CloudTrail) + controlled access to logs and artifacts.
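
A sketch of the main security knobs on a training job (again reusing the `image_uri`/`role` placeholders; subnet, security group, and KMS ARNs are illustrative): private VPC placement, KMS encryption at rest, encrypted inter-container traffic, and network isolation.

    from sagemaker.estimator import Estimator

    secure_estimator = Estimator(
        image_uri=image_uri,
        role=role,                                          # scoped, least-privilege execution role
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/models/",
        subnets=["subnet-0abc1234"],                        # private subnets (placeholders)
        security_group_ids=["sg-0abc1234"],
        volume_kms_key="arn:aws:kms:us-east-1:123456789012:key/EXAMPLE",   # encrypt training volumes
        output_kms_key="arn:aws:kms:us-east-1:123456789012:key/EXAMPLE",   # encrypt model artifacts
        encrypt_inter_container_traffic=True,               # TLS between training containers
        enable_network_isolation=True,                      # no outbound network access from the container
    )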

Common IAM/security “gotchas”

  • Training role can read S3 but can’t decrypt KMS key (KMS key policy vs IAM policy mismatch).
  • Endpoint role has broad S3 access (“*”) instead of a tight prefix.
  • Secrets leak into logs/artifacts (build logs, notebooks, environment variables).
  • No audit trail for model registry approvals or endpoint updates.

Next steps

  • Use the Syllabus as your checklist.
  • Use Practice to drill weak tasks fast.
  • Use the Study Plan if you want a 30/60/90‑day schedule.