MLA-C01 Cheatsheet — SageMaker, MLOps, Endpoint Types, Monitoring & Security (High Yield)

High-signal MLA-C01 reference: data ingestion/ETL + feature engineering, model selection/training/tuning/evaluation, SageMaker deployment endpoint choices, CI/CD and orchestration patterns, monitoring/drift/cost optimization, and security/governance essentials.

Keep this page open while drilling questions. MLA‑C01 rewards “production ML realism”: data quality gates, repeatability, safe deployments, drift monitoring, cost controls, and least-privilege security.


Quick facts (MLA-C01)

| Item | Value |
| --- | --- |
| Questions | 65 (multiple-choice + multiple-response) |
| Time | 130 minutes |
| Passing score | 720 (scaled 100–1000) |
| Cost | 150 USD |
| Domains | D1 28% • D2 26% • D3 22% • D4 24% |

Fast strategy (what the exam expects)

  • If the question says best-fit managed ML, the answer is often SageMaker (Feature Store, Pipelines, Model Registry, managed endpoints).
  • If the scenario is “data is messy,” think data quality checks, profiling, transformations, and feature consistency (train/serve).
  • If the scenario is “accuracy dropped in prod,” think drift, monitoring baselines, A/B or shadow, and retraining triggers.
  • If the scenario is “cost is spiking,” think right-sizing, endpoint type selection, auto scaling, Spot / Savings Plans, and budgets/tags.
  • If there’s “security/compliance,” include least privilege IAM, encryption, VPC isolation, and audit logging.
  • Read the last sentence first to capture constraints: latency, cost, ops effort, compliance, auditability.

Domain weights (how to allocate your time)

| Domain | Weight | Prep focus |
| --- | --- | --- |
| Domain 1: Data Preparation for ML | 28% | Ingest/ETL, feature engineering, data quality and bias basics |
| Domain 2: ML Model Development | 26% | Model choice, training/tuning, evaluation, Clarify/Debugger/Registry |
| Domain 3: Deployment + Orchestration | 22% | Endpoint types, scaling, IaC, CI/CD for ML pipelines |
| Domain 4: Monitoring + Security | 24% | Drift/model monitor, infra monitoring + costs, security controls |

0) SageMaker service map (high yield)

| Capability | What it’s for | MLA‑C01 “why it matters” |
| --- | --- | --- |
| SageMaker Data Wrangler | Data prep + feature engineering | Fast, repeatable transforms; reduces time-to-first-model |
| SageMaker Feature Store | Central feature storage | Avoid train/serve skew; feature reuse and governance |
| SageMaker Training | Managed training jobs | Repeatable, scalable training on AWS compute |
| SageMaker AMT | Hyperparameter tuning | Systematic search for better model configs |
| SageMaker Clarify | Bias + explainability | Responsible ML evidence + model understanding |
| SageMaker Model Debugger | Training diagnostics | Debug convergence and training instability |
| SageMaker Model Registry | Versioning + approvals | Auditability, rollback, safe promotion to prod |
| SageMaker Endpoints | Managed model serving | Real-time/serverless/async inference patterns |
| SageMaker Model Monitor | Monitoring workflows | Detect drift and quality issues in production |
| SageMaker Pipelines | ML workflow orchestration | Build-test-train-evaluate-register-deploy automation |

1) End-to-end ML on AWS (mental model)

    flowchart LR
	  S["Sources"] --> I["Ingest"]
	  I --> T["Transform + Quality Checks"]
	  T --> F["Feature Engineering + Feature Store"]
	  F --> TR["Train + Tune"]
	  TR --> E["Evaluate + Bias/Explainability"]
	  E --> R["Register + Approve"]
	  R --> D["Deploy Endpoint or Batch"]
	  D --> M["Monitor Drift/Quality/Cost/Security"]
	  M -->|Triggers| RT["Retrain"]
	  RT --> TR

High-yield framing: MLA‑C01 is about the pipeline, not just the model.


2) Domain 1 — Data preparation (28%)

“Which tool should I use?” (ETL and prep picker)

| You need… | Typical best-fit | Why |
| --- | --- | --- |
| Visual data prep + fast iteration | SageMaker Data Wrangler | Interactive + repeatable workflows |
| No/low-code transforms and profiling | AWS Glue DataBrew | Good for business-friendly prep |
| Scalable ETL jobs | AWS Glue / Spark | Production batch ETL at scale |
| Big Spark workloads (custom) | Amazon EMR | More control over Spark |
| Simple streaming transforms | AWS Lambda | Event-driven, lightweight |
| Streaming analytics | Managed Apache Flink | Stateful streaming at scale |

Data formats (pickers)

| Format | Why it shows up | Typical trade-off |
| --- | --- | --- |
| Parquet / ORC | Columnar analytics + efficient reads | Best for large tabular datasets |
| CSV / JSON | Interop + simplicity | Bigger + slower at scale |
| Avro | Schema evolution + streaming | Good for pipelines |
| RecordIO | ML-specific record formats | Useful with some training stacks |

Rule: choose formats based on access patterns (scan vs selective reads), schema evolution, and scale.
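
To make the trade-off concrete, here is a minimal pandas sketch (file paths and column names are made up; Parquet I/O assumes pyarrow or fastparquet is installed):

    # Row-oriented CSV: simple and interoperable, but every read scans whole rows.
    import pandas as pd

    df = pd.read_csv("data/events.csv")

    # Columnar Parquet: smaller files, and downstream jobs read only the columns they need.
    df.to_parquet("data/events.parquet", index=False)

    # Selective read: only the features a training job actually uses.
    features = pd.read_parquet(
        "data/events.parquet",
        columns=["user_id", "event_time", "amount"],
    )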

Data ingestion and storage (high yield)

  • Amazon S3: default data lake for ML (durable, cheap, scalable).
  • Amazon EFS / FSx: file-based access patterns; useful when training expects POSIX-like file semantics.
  • Streaming ingestion: use Kinesis/managed streaming where low-latency data arrival matters.

Common best answers:

  • Use AWS Glue / Spark on EMR for big ETL jobs.
  • Use SageMaker Data Wrangler for fast interactive prep and repeatable transformations.
  • Use SageMaker Feature Store to keep training/inference features consistent.

Feature Store: why it matters

  • Avoid train/serve skew: the feature used in training is the same feature served to inference.
  • Support feature reuse across teams and models.
  • Enable governance: feature definitions and versions.
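
A minimal Feature Store sketch with the SageMaker Python SDK (the DataFrame, `role`, feature group name, and S3 URI are placeholders, not from any specific project):

    import pandas as pd
    import sagemaker
    from sagemaker.feature_store.feature_group import FeatureGroup

    session = sagemaker.Session()
    df = pd.DataFrame({
        "customer_id": ["c1", "c2"],
        "event_time": [1700000000.0, 1700000100.0],  # Fractional event-time feature
        "tenure_days": [120, 45],
    })

    fg = FeatureGroup(name="customer-features", sagemaker_session=session)
    fg.load_feature_definitions(data_frame=df)       # infer feature types from the DataFrame
    fg.create(
        s3_uri="s3://my-bucket/feature-store/",      # offline store (training datasets)
        record_identifier_name="customer_id",
        event_time_feature_name="event_time",
        role_arn=role,                               # placeholder execution role
        enable_online_store=True,                    # online store serves the same features at inference
    )
    # Wait for the feature group to become active, then write the same rows once:
    fg.ingest(data_frame=df, max_workers=2, wait=True)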

Data integrity + bias basics (often tested)

| Problem | What to do | Tooling you might name |
| --- | --- | --- |
| Missing/invalid data | Add data quality checks + fail fast | Glue DataBrew / Glue Data Quality |
| Class imbalance | Resampling or synthetic data | (Conceptual) + Clarify for analysis |
| Bias sources | Identify selection/measurement bias | SageMaker Clarify (bias analysis) |
| Sensitive data | Classify + mask/anonymize + encrypt | KMS + access controls |
| Compliance constraints | Data residency + least privilege + audit logs | IAM + CloudTrail + region choices |

High-yield rule: don’t “fix” model issues before you verify data quality and leakage.
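
A quick pre-training sanity check along these lines (pandas; columns, thresholds, and the file path are hypothetical):

    import pandas as pd

    df = pd.read_parquet("data/train.parquet")

    null_rates = df.isna().mean()                        # fraction of missing values per column
    out_of_range = (df["amount"] < 0).sum()              # simple validity/range check
    class_balance = df["label"].value_counts(normalize=True)

    # Fail fast instead of training on bad data.
    assert null_rates.max() < 0.05, f"Too many nulls:\n{null_rates[null_rates >= 0.05]}"
    assert out_of_range == 0, "Negative amounts found; investigate upstream ETL"
    print(class_balance)  # if one class dominates, plan resampling/weighting and avoid plain accuracy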


3) Domain 2 — Model development (26%)

Choosing an approach

| If you need… | Typical best-fit |
| --- | --- |
| A standard AI capability with minimal ML ops | AWS AI services (Translate/Transcribe/Rekognition, etc.) |
| A custom model with managed training + deployment | Amazon SageMaker |
| A foundation model / generative capability | Amazon Bedrock (when applicable) |

Rule: don’t overbuild. If an AWS managed AI service solves it, it usually wins on time-to-value and ops.

Training and tuning (high yield)

  • Training loop terms: epoch, step, batch size.
  • Speedups: early stopping, distributed training.
  • Generalization controls: regularization (L1/L2, dropout, weight decay) + better data/features.
  • Hyperparameter tuning: random search vs Bayesian optimization; in SageMaker, use Automatic Model Tuning (AMT).
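
A hedged AMT sketch with the SageMaker Python SDK (the `estimator`, channel inputs, metric name, and ranges are placeholders):

    from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

    tuner = HyperparameterTuner(
        estimator=estimator,                  # any configured SageMaker Estimator
        objective_metric_name="validation:auc",
        objective_type="Maximize",
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.01, 0.3),
            "max_depth": IntegerParameter(3, 10),
        },
        strategy="Bayesian",                  # alternative: "Random"
        max_jobs=20,
        max_parallel_jobs=2,
        early_stopping_type="Auto",           # stop unpromising jobs early
    )
    tuner.fit({"train": train_input, "validation": val_input})  # placeholder channel inputs
    best_job = tuner.best_training_job()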

Metrics picker (what to choose)

| Task | Common metrics | What the exam tries to trick you on |
| --- | --- | --- |
| Classification | Accuracy, precision, recall, F1, ROC-AUC | Class imbalance makes accuracy misleading |
| Regression | MAE/RMSE | Outliers and error cost (what matters more?) |
| Model selection | Metric + cost/latency | “Best” isn’t only accuracy |
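
A tiny scikit-learn example of why accuracy misleads under imbalance (labels and scores are made up for illustration):

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

    y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]    # 20% positives
    y_pred  = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]    # the model misses one positive
    y_score = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.45, 0.9]

    print("accuracy :", accuracy_score(y_true, y_pred))   # looks high (0.9)
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))     # exposes the missed positive (0.5)
    print("f1       :", f1_score(y_true, y_pred))
    print("roc_auc  :", roc_auc_score(y_true, y_score))   # uses scores, not hard labels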

Overfitting vs underfitting (signals)

| Symptom | Likely issue | Typical fix |
| --- | --- | --- |
| Train ↑, validation ↓ | Overfitting | Regularization, simpler model, more data, better features |
| Both low | Underfitting | More expressive model, better features, tune hyperparameters |

Clarify vs Debugger vs Model Monitor (common confusion)

| Tool | What it helps with | When to name it |
| --- | --- | --- |
| SageMaker Clarify | Bias + explainability | Fairness questions, “why did it predict X?” |
| SageMaker Model Debugger | Training diagnostics + convergence | Training instability, loss not decreasing, debugging training |
| SageMaker Model Monitor | Production monitoring workflows | Drift, data quality degradation, monitoring baselines |
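
A hedged Clarify sketch for the bias side (pre-training bias report via the SageMaker Python SDK; `role`, `session`, S3 paths, and column/facet names are placeholders):

    from sagemaker import clarify

    processor = clarify.SageMakerClarifyProcessor(
        role=role, instance_count=1, instance_type="ml.m5.xlarge", sagemaker_session=session
    )
    data_config = clarify.DataConfig(
        s3_data_input_path="s3://my-bucket/train/train.csv",
        s3_output_path="s3://my-bucket/clarify/bias-report/",
        label="approved",
        headers=["approved", "age", "income", "tenure"],
        dataset_type="text/csv",
    )
    bias_config = clarify.BiasConfig(
        label_values_or_threshold=[1],       # the favorable outcome
        facet_name="age",
        facet_values_or_threshold=[40],      # sensitive-group boundary
    )
    processor.run_pre_training_bias(
        data_config=data_config,
        data_bias_config=bias_config,
        methods=["CI", "DPL"],               # class imbalance + difference in positive proportions
    )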

Model Registry (repeatability + governance)

  • Track: model artifacts, metrics, lineage, approvals.
  • Enables safe promotion/rollback and audit-ready workflows.
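
A minimal registry sketch with boto3 (group name, image URI, and artifact path are placeholders; it assumes the model package group already exists):

    import boto3

    sm = boto3.client("sagemaker")

    resp = sm.create_model_package(
        ModelPackageGroupName="churn-models",
        ModelApprovalStatus="PendingManualApproval",
        InferenceSpecification={
            "Containers": [{
                "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/churn:latest",
                "ModelDataUrl": "s3://my-bucket/models/churn/model.tar.gz",
            }],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
    )

    # Promotion gate: flip the status only after metrics/bias checks pass.
    sm.update_model_package(
        ModelPackageArn=resp["ModelPackageArn"],
        ModelApprovalStatus="Approved",
    )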

4) Domain 3 — Deployment and orchestration (22%)

Endpoint types (must-know picker)

| Endpoint type | Best for | Typical constraint |
| --- | --- | --- |
| Real-time | Steady, low-latency inference | Cost for always-on capacity |
| Serverless | Spiky traffic, scale-to-zero | Cold starts + limits |
| Asynchronous | Long inference time, bursty workloads | Event-style patterns + polling/callback |
| Batch inference | Scheduled/offline scoring | Not interactive |
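
The endpoint choice is mostly a deploy-time configuration decision. A hedged SageMaker Python SDK sketch, assuming a configured `model` object (you would pick one of these per endpoint, not all three):

    from sagemaker.serverless import ServerlessInferenceConfig
    from sagemaker.async_inference import AsyncInferenceConfig

    # Real-time: always-on instances, lowest latency, pay while idle.
    predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

    # Serverless: scale-to-zero for spiky traffic; accept cold starts and size limits.
    predictor = model.deploy(
        serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=2048, max_concurrency=5)
    )

    # Asynchronous: long-running or large payloads; results land in S3 for polling/notification.
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        async_inference_config=AsyncInferenceConfig(output_path="s3://my-bucket/async-results/"),
    )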

Scaling metrics (what to pick)

| Metric | Good when… | Watch out |
| --- | --- | --- |
| Invocations per instance | Request volume drives load | Spiky traffic can cause oscillation |
| Latency | You have a latency SLO | Noisy metrics require smoothing |
| CPU/GPU utilization | Compute-bound models | Not always correlated to request rate |
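
A target-tracking auto scaling sketch with boto3 Application Auto Scaling (endpoint/variant names and targets are placeholders):

    import boto3

    aas = boto3.client("application-autoscaling")
    resource_id = "endpoint/my-endpoint/variant/AllTraffic"

    aas.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )
    aas.put_scaling_policy(
        PolicyName="invocations-per-instance",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,  # target invocations per instance
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # slow scale-in to avoid oscillation on spiky traffic
            "ScaleOutCooldown": 60,
        },
    )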

Multi-model / multi-container (why they exist)

  • Multi-model: multiple models behind one endpoint to reduce cost.
  • Multi-container: pre/post-processing plus model serving, or multiple frameworks.
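
A hedged multi-model sketch with boto3 (image URI, S3 prefix, and names are placeholders; the serving container must support multi-model mode):

    import boto3

    sm = boto3.client("sagemaker")
    runtime = boto3.client("sagemaker-runtime")

    sm.create_model(
        ModelName="mme-demo",
        ExecutionRoleArn=role_arn,   # placeholder role
        PrimaryContainer={
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/serving:latest",
            "Mode": "MultiModel",
            "ModelDataUrl": "s3://my-bucket/models/",   # prefix holding many model.tar.gz artifacts
        },
    )

    # ...create_endpoint_config / create_endpoint as usual, then pick the model per request:
    runtime.invoke_endpoint(
        EndpointName="mme-demo-endpoint",
        TargetModel="customer-a/model.tar.gz",          # loaded on demand, cached on the instance
        ContentType="text/csv",
        Body=b"1.0,2.0,3.0",
    )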

IaC + containers (exam patterns)

  • IaC: CloudFormation or CDK for reproducible environments.
  • Containers: build/publish to ECR, deploy via SageMaker, ECS, or EKS.

CI/CD for ML (what’s different)

You version and validate more than code:

  • Code + data + features + model artifacts + evaluation reports
  • Promotion gates: accuracy thresholds, bias checks, smoke tests, canary/shadow validation

Typical services: CodePipeline/CodeBuild/CodeDeploy, SageMaker Pipelines, EventBridge triggers.

    flowchart LR
	  G["Git push"] --> CP["CodePipeline"]
	  CP --> CB["CodeBuild: tests + build"]
	  CB --> P["SageMaker Pipeline: process/train/eval"]
	  P --> Gate{"Meets<br/>thresholds?"}
	  Gate -->|yes| MR["Model Registry approve"]
	  Gate -->|no| Stop["Stop + report"]
	  MR --> Dep["Deploy (canary/shadow)"]
	  Dep --> Mon["Monitor + rollback triggers"]
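
A skeleton of the SageMaker Pipelines stage in that flow (the `estimator`, role, and S3 paths are placeholders; a real pipeline adds processing/evaluation steps, a ConditionStep gate, and a RegisterModel step):

    from sagemaker.inputs import TrainingInput
    from sagemaker.workflow.parameters import ParameterString
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.steps import TrainingStep

    train_data = ParameterString(name="TrainData", default_value="s3://my-bucket/train/")

    train_step = TrainingStep(
        name="TrainModel",
        estimator=estimator,                 # any configured SageMaker Estimator
        inputs={"train": TrainingInput(s3_data=train_data, content_type="text/csv")},
    )

    pipeline = Pipeline(
        name="mla-demo-pipeline",
        parameters=[train_data],
        steps=[train_step],
    )
    pipeline.upsert(role_arn=role_arn)       # create or update the definition (idempotent, CI/CD friendly)
    execution = pipeline.start()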

5) Domain 4 — Monitoring, cost, and security (24%)

Monitoring and drift (high yield)

  • Data drift: input distribution changed.
  • Concept drift: relationship between input and label changed.
  • Use baselines + ongoing checks; monitor latency/errors too.

Common services/patterns:

  • SageMaker Model Monitor for monitoring workflows.
  • A/B testing or shadow deployments for safe comparison.

Monitoring checklist (what to instrument)

  • Inference quality: when ground truth is available later, compare predicted vs actual.
  • Data quality: nulls, ranges, schema changes, category explosion.
  • Distribution shift: feature histograms/summary stats vs baseline.
  • Ops signals: p50/p95 latency, error rate, throttles, timeouts.
  • Safety/security: anomalous traffic spikes, abuse patterns, permission failures.
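
A hedged Model Monitor sketch for the data-quality piece (it assumes data capture is already enabled on the endpoint; paths and names are placeholders):

    from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
    from sagemaker.model_monitor.dataset_format import DatasetFormat

    monitor = DefaultModelMonitor(
        role=role, instance_count=1, instance_type="ml.m5.xlarge",
        volume_size_in_gb=20, max_runtime_in_seconds=3600,
    )

    # 1) Baseline: statistics + constraints from the training data.
    monitor.suggest_baseline(
        baseline_dataset="s3://my-bucket/baseline/train.csv",
        dataset_format=DatasetFormat.csv(header=True),
        output_s3_uri="s3://my-bucket/monitor/baseline/",
    )

    # 2) Schedule: compare captured traffic against the baseline every hour.
    monitor.create_monitoring_schedule(
        monitor_schedule_name="churn-data-quality",
        endpoint_input="my-endpoint",
        output_s3_uri="s3://my-bucket/monitor/reports/",
        statistics=monitor.baseline_statistics(),
        constraints=monitor.suggested_constraints(),
        schedule_cron_expression=CronExpressionGenerator.hourly(),
    )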

Infra + cost optimization (high yield)

| Theme | What to do |
| --- | --- |
| Observability | CloudWatch metrics/logs/alarms; Logs Insights; X-Ray for traces |
| Rightsizing | Pick instance family/size based on perf; use Inference Recommender + Compute Optimizer |
| Spend control | Tags + Cost Explorer + Budgets + Trusted Advisor |
| Purchasing options | Spot / Reserved / Savings Plans where the workload fits |

Cost levers (common “best answer” patterns)

  • Choose the right inference mode first: batch (cheapest) → async → serverless → real-time (most always-on).
  • Right-size and auto scale; don’t leave endpoints overprovisioned.
  • Use Spot for fault-tolerant training/batch where interruptions are acceptable.
  • Use Budgets + tags early (before the bills surprise you).
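
A managed Spot training sketch (SageMaker Python SDK; image, role, and paths are placeholders):

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/training:latest",
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.2xlarge",
        output_path="s3://my-bucket/output/",
        use_spot_instances=True,          # pay Spot rates for interruption-tolerant training
        max_run=3600,                     # training time limit (seconds)
        max_wait=7200,                    # must be >= max_run; covers waiting for Spot capacity
        checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume after interruptions
    )
    estimator.fit({"train": "s3://my-bucket/train/"})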

Security defaults (high yield)

  • Least privilege IAM for training jobs, pipelines, and endpoints.
  • Encrypt at rest + in transit (KMS + TLS).
  • VPC isolation (subnets + security groups) for ML resources when required.
  • Audit trails (CloudTrail) + controlled access to logs and artifacts.
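
A hedged sketch of where these controls attach on a training job (SageMaker Python SDK; ARNs, IDs, and the image are placeholders):

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri=image_uri,
        role=training_role_arn,           # scoped to the exact S3 prefixes and KMS key it needs
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/output/",
        output_kms_key="arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555",
        volume_kms_key="arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555",
        subnets=["subnet-0abc1234"],      # run inside your VPC
        security_group_ids=["sg-0abc1234"],
        encrypt_inter_container_traffic=True,   # encrypt traffic between training containers
        enable_network_isolation=True,          # no outbound internet from the training container
    )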

Common IAM/security “gotchas”

  • Training role can read S3 but can’t decrypt KMS key (KMS key policy vs IAM policy mismatch).
  • Endpoint role has broad S3 access (“*”) instead of a tight prefix.
  • Secrets leak into logs/artifacts (build logs, notebooks, environment variables).
  • No audit trail for model registry approvals or endpoint updates.

Next steps

  • Use the Syllabus as your checklist.
  • Use Practice to drill weak tasks fast.
  • Use the Study Plan if you want a 30/60/90‑day schedule.