MLA-C01 — AWS Certified Machine Learning Engineer – Associate Quick Reference

Last revised: June 29, 2026

Compact AWS MLA-C01 reference for machine learning engineering: data prep, SageMaker training, deployment, MLOps, monitoring, and security decisions.

Exam identity and quick-use map

This independent Quick Reference supports preparation for AWS Certified Machine Learning Engineer – Associate (MLA-C01). It focuses on the AWS service choices, ML engineering workflows, security controls, deployment patterns, and troubleshooting distinctions that commonly drive scenario questions.

Use this page to answer: What AWS service or pattern should I choose, and why?

    flowchart LR
	    A[Data sources] --> B[Ingest and store]
	    B --> C[Clean, label, transform]
	    C --> D[Feature engineering]
	    D --> E[Train or tune model]
	    E --> F[Evaluate]
	    F --> G{Meets criteria?}
	    G -- No --> C
	    G -- Yes --> H[Register and approve]
	    H --> I[Deploy: real-time, async, batch, serverless]
	    I --> J[Monitor: data, quality, bias, latency, errors]
	    J --> K{Drift or degradation?}
	    K -- Yes --> L[Retrain pipeline]
	    L --> E
	    K -- No --> I

High-yield AWS service selection

Need in scenario	Prefer	Why it fits	Common trap
Durable landing zone for training data, artifacts, model outputs	Amazon S3	Native integration with SageMaker, Glue, Athena, EMR, Redshift Spectrum	Do not store large training datasets only on notebook instance storage
Data catalog for files in S3	AWS Glue Data Catalog	Central schema/catalog for Athena, Glue, EMR, Redshift Spectrum	Athena queries data; Glue Data Catalog stores metadata
Serverless SQL over S3	Amazon Athena	Ad hoc queries without managing clusters	Not ideal for heavy ETL pipelines that need complex transforms
Serverless ETL, crawlers, Spark jobs	AWS Glue	Managed ETL and schema discovery	Use EMR when cluster-level control/custom big data stack is required
Custom big data processing frameworks	Amazon EMR	Managed Hadoop/Spark/Hive ecosystem with more configuration control	More operational responsibility than Glue
Data warehouse analytics	Amazon Redshift	Columnar analytics, BI, warehouse workloads	S3 + Athena is often enough for ad hoc lake queries
Streaming ingestion with custom consumers	Amazon Kinesis Data Streams	Low-latency streams and multiple consuming apps	Not the same as Firehose delivery
Managed streaming delivery to S3/Redshift/OpenSearch	Amazon Data Firehose	Minimal administration for delivery and buffering	Less control than Kinesis Data Streams
Kafka-compatible streaming	Amazon MSK	Managed Apache Kafka compatibility	Choose only when Kafka ecosystem compatibility matters
Human data labeling	Amazon SageMaker Ground Truth	Managed labeling workflows and workforces	For sensitive data, prefer private workforce controls
Reusable online/offline ML features	Amazon SageMaker Feature Store	Helps reduce training-serving skew	Do not duplicate feature logic in separate train and inference code
No-code/low-code ML exploration	Amazon SageMaker Canvas	Business-user model building and predictions	Production-grade MLOps still needs controlled pipelines and deployment
Managed notebook and ML IDE	Amazon SageMaker Studio	Development, experiments, pipelines, model registry integration	Notebook success does not equal reproducible pipeline
Managed training jobs	Amazon SageMaker Training	Scalable, repeatable training with containers, S3 inputs, IAM roles	Avoid training on notebook instances for production workflows
Hyperparameter search	SageMaker automatic model tuning	Runs multiple training jobs and ranks them by an objective metric	Do not tune against final test set
ML workflow orchestration	Amazon SageMaker Pipelines	ML-native steps, lineage, parameters, model registry integration	Use Step Functions for broader cross-service workflow orchestration
General workflow orchestration across AWS services	AWS Step Functions	Serverless state machines, retries, approvals, integrations	Less ML-specific lineage than SageMaker Pipelines
Model package approval and versioning	SageMaker Model Registry	Tracks model versions, metadata, approval state	S3 artifact alone is not governed deployment
Real-time hosted inference	SageMaker real-time endpoint	Persistent low-latency API endpoint	Idle endpoints can create unnecessary cost
Offline scoring of large datasets	SageMaker Batch Transform	No persistent endpoint; reads/writes S3	Not for interactive request/response inference
Large payloads or longer inference times	SageMaker Asynchronous Inference	Queued requests, S3 outputs, scales endpoint capacity	Not true synchronous low-latency API behavior
Intermittent inference traffic	SageMaker Serverless Inference	No instance management for spiky/idle workloads	Consider cold starts and workload suitability
Many similar models behind one endpoint	SageMaker multi-model endpoint	Consolidates model hosting	Model load/cache behavior can affect latency
Foundation model API without managing model infrastructure	Amazon Bedrock	Managed access to foundation models, agents, guardrails, knowledge bases	Do not choose custom SageMaker training when managed FM API is enough
Custom ML containers	Amazon ECR + SageMaker	Bring your own algorithm or inference container	Container must satisfy SageMaker training/inference contracts
Logs, metrics, alarms	Amazon CloudWatch	Operational visibility for endpoints, training, pipelines	Model quality drift requires ML-specific monitoring too
API auditing	AWS CloudTrail	Who called what AWS API and when	CloudWatch logs are not a substitute for API audit trails
Sensitive data discovery in S3	Amazon Macie	Finds and reports sensitive data	Macie does not replace IAM, KMS, or data access design

Data engineering and preparation reference

Storage, catalog, and query decisions

Pattern	Best fit	Exam cues
Raw/bronze data lake	S3 buckets with prefixes, encryption, lifecycle policies	“Store raw source data durably and cheaply”
Curated training dataset	S3 curated prefix, Parquet/CSV/RecordIO as appropriate	“Reusable prepared dataset for training jobs”
Schema discovery	Glue crawler + Glue Data Catalog	“Infer schema from files in S3”
SQL exploration	Athena	“Run SQL directly on S3 data”
Repeatable ETL	Glue ETL job or SageMaker Processing	Glue for general ETL; SageMaker Processing when tightly coupled to ML workflow
Distributed feature engineering	Glue, EMR, or SageMaker Processing with Spark	Choose based on required control and integration
Warehouse-to-ML source	Redshift unload/query integration, Data Wrangler, or direct connector	“Training from warehouse data”
Streaming features/events	Kinesis Data Streams, MSK, Data Firehose	Distinguish custom stream processing from managed delivery

Data split and leakage traps

Situation	Split strategy	Watch for
Independent and identically distributed tabular data	Random train/validation/test split	Fit preprocessing only on training data
Imbalanced classes	Stratified split	Accuracy may be misleading
Time series forecasting	Chronological split	Random split leaks future information
Same user/device/account appears many times	Group-aware split	Avoid same entity in train and test
Small dataset	Cross-validation if feasible	Keep final holdout untouched
Hyperparameter tuning	Train/validation or cross-validation	Test set is for final estimate only
Feature engineering before split	Split first, then fit transforms on training data only	Scaling, imputation, encoding, and feature selection fitted on the full dataset leak test statistics
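The time-series and preprocessing rows above can be sketched in plain Python. This is a minimal illustration, assuming records are dicts with a hypothetical `day` timestamp and numeric `x` feature; the 70/15/15 fractions are arbitrary. It sorts before cutting so validation and test never precede training, and it learns scaling statistics from the training split only.

```python
from datetime import date
from statistics import mean, pstdev


def chronological_split(rows, time_key, train_frac=0.7, val_frac=0.15):
    """Order by time, then cut, so validation and test never precede training."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    train_end = int(len(ordered) * train_frac)
    val_end = train_end + int(len(ordered) * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


def fit_standardizer(train_values):
    """Learn mean and standard deviation from the training split only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # avoid dividing by zero for a constant feature
    return lambda x: (x - mu) / sigma


# Hypothetical daily records arriving out of order.
rows = [{"day": date(2026, 1, d), "x": float(d)} for d in range(20, 0, -1)]
train, val, test = chronological_split(rows, "day")

scale = fit_standardizer([r["x"] for r in train])  # statistics from train only
val_scaled = [scale(r["x"]) for r in val]          # reuse, never refit, on later splits
test_scaled = [scale(r["x"]) for r in test]
```

The same "fit on train, apply everywhere else" rule holds for imputers, encoders, and feature selectors.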

Data quality checklist

Check	Why it matters	AWS-oriented action
Missing values	Many algorithms cannot use nulls directly	Impute, add missing indicator, or filter
Outliers	Can dominate loss and scaling	Cap, transform, robust scaling, investigate source
Class imbalance	Optimizer may favor majority class	Resampling, class weights, threshold tuning, PR AUC/F1
Label noise	Limits achievable accuracy	Ground Truth review, consensus labeling, quality audits
Duplicate rows	Can leak across splits	Deduplicate before splitting or group split
Skewed distributions	Affects linear models and distance methods	Log transform, normalization, robust scaling
High cardinality categoricals	Sparse and overfit-prone	Hashing, target encoding with care, embeddings
PII/sensitive data	Security and governance risk	Macie, IAM least privilege, KMS, tokenization/redaction

Feature Store concepts

Concept	Meaning	Exam relevance
Feature group	Named collection of feature definitions and records	Organizes reusable features
Offline store	Historical features, typically in S3	Training, batch analytics, backfills
Online store	Low-latency feature lookup	Real-time inference
Event time	Timestamp associated with feature record	Correct point-in-time training data
Training-serving skew	Different feature logic or freshness between training and production	Feature Store and shared transformation code reduce risk
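Point-in-time correctness is the subtle part of building training data from a feature store: each label should be joined to the feature values that existed at that label's time, not the latest ones. The sketch below shows the idea in plain Python with a hypothetical in-memory history (`cust-1`, `orders_30d`); it is not the Feature Store API, whose offline store lives in S3 and is typically queried with tools such as Athena.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(feature_history, entity_id, as_of):
    """Latest feature record for entity_id with event_time <= as_of, else None.

    feature_history maps entity_id -> list of (event_time, features) sorted by time.
    Returning None when nothing existed yet prevents leaking future values.
    """
    records = feature_history.get(entity_id, [])
    times = [t for t, _ in records]
    idx = bisect_right(times, as_of)
    return records[idx - 1][1] if idx > 0 else None


# Hypothetical feature history for one customer.
history = {
    "cust-1": [
        (datetime(2026, 1, 1), {"orders_30d": 2}),
        (datetime(2026, 1, 10), {"orders_30d": 5}),
        (datetime(2026, 1, 20), {"orders_30d": 9}),
    ]
}

# Label events: (entity, label time, label). The Jan 15 label must see 5, not 9.
labels = [("cust-1", datetime(2026, 1, 15), 1), ("cust-1", datetime(2025, 12, 31), 0)]
training_rows = [(eid, ts, point_in_time_lookup(history, eid, ts), y) for eid, ts, y in labels]
```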

SageMaker development and training

Development environment choices

Need	Choose	Notes
Full ML IDE and managed notebooks	SageMaker Studio	Useful for experiments, pipelines, registry, monitoring
Notebook-only experimentation	SageMaker notebook instances or Studio notebooks	Stop idle resources; not a production pipeline by itself
Business-user model building	SageMaker Canvas	Low-code predictions and exploration
Visual data prep	SageMaker Data Wrangler where available in the workflow	Useful for profiling, transforms, export to jobs/pipelines
Scripted reproducible processing	SageMaker Processing	Run preprocessing/evaluation containers at scale
Production training	SageMaker Training job	Isolated, repeatable, containerized, logged

Training job anatomy

Recognize these knobs in scenario and configuration questions:

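TrainingJob:
  TrainingJobName: <unique job name>
  AlgorithmSpecification:
    TrainingImage: <ECR image URI for a built-in algorithm or custom container>
    TrainingInputMode: File | FastFile | Pipe
  RoleArn: <SageMaker execution role ARN>
  InputDataConfig:
    - ChannelName: train
      DataSource:
        S3DataSource:
          S3DataType: S3Prefix
          S3Uri: s3://bucket/prefix/train/
    - ChannelName: validation
      DataSource:
        S3DataSource:
          S3DataType: S3Prefix
          S3Uri: s3://bucket/prefix/validation/
  OutputDataConfig:
    S3OutputPath: s3://bucket/prefix/model-artifacts/
    KmsKeyId: <optional KMS key>
  ResourceConfig:
    InstanceType: <training instance type>
    InstanceCount: <count>
    VolumeSizeInGB: <training volume size>
    VolumeKmsKeyId: <optional KMS key>
  HyperParameters:
    objective: "binary:logistic"   # XGBoost example; the API expects string values
  VpcConfig:
    Subnets: [<private-subnet-id>]
    SecurityGroupIds: [<sg-id>]
  EnableNetworkIsolation: <true | false>
  CheckpointConfig:
    S3Uri: s3://bucket/prefix/checkpoints/
  StoppingCondition:
    MaxRuntimeInSeconds: <limit>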
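The same knobs can be assembled for boto3's `create_training_job`, which expects the nested `S3DataSource` shape for each channel. The function below is a sketch: the bucket, prefix, job name, instance type, volume size, runtime limit, and XGBoost hyperparameters are illustrative assumptions. Building the request separately from submitting it keeps the configuration testable without AWS access.

```python
def build_training_job_request(job_name, image_uri, role_arn, bucket, prefix,
                               instance_type="ml.m5.xlarge", kms_key_id=None,
                               subnets=None, security_group_ids=None):
    """Assemble a CreateTrainingJob request dict (all defaults here are illustrative)."""

    def s3_channel(name):
        # In File mode the channel appears in the container as /opt/ml/input/data/<name>.
        return {
            "ChannelName": name,
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": f"s3://{bucket}/{prefix}/{name}/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }

    request = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [s3_channel("train"), s3_channel("validation")],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/{prefix}/model-artifacts/"},
        "ResourceConfig": {"InstanceType": instance_type, "InstanceCount": 1, "VolumeSizeInGB": 30},
        # The API takes hyperparameters as a string-to-string map.
        "HyperParameters": {"objective": "binary:logistic", "num_round": "100"},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
    if kms_key_id:
        request["OutputDataConfig"]["KmsKeyId"] = kms_key_id
        request["ResourceConfig"]["VolumeKmsKeyId"] = kms_key_id
    if subnets and security_group_ids:
        request["VpcConfig"] = {"Subnets": subnets, "SecurityGroupIds": security_group_ids}
    return request


def submit_training_job(request):
    """Send the request; the caller needs sagemaker:CreateTrainingJob and iam:PassRole."""
    import boto3  # imported lazily so the builder works without AWS access

    return boto3.client("sagemaker").create_training_job(**request)
```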

Training input modes and data access

Mode/source	Best fit	Trap
File mode	Common default; data copied from S3 to training volume	Startup can be slower for very large data
FastFile mode	S3 data exposed with file-like access where supported	Confirm algorithm/framework support
Pipe mode	Streams data to algorithm where supported	Container/algorithm must support streaming
Amazon FSx for Lustre	High-performance distributed file access	More setup than simple S3 inputs
Amazon EFS	Shared file system across instances	Consider throughput and access pattern
Checkpoints to S3	Long or interruptible training jobs	Needed to resume rather than restart from scratch
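Checkpointing only helps if the training script actually resumes. A minimal sketch, assuming a hypothetical JSON-serializable `state` and one file per epoch: SageMaker syncs the local checkpoint directory (by default `/opt/ml/checkpoints`) with the `CheckpointConfig` S3 location, so a restarted or spot-interrupted job can find the newest file and continue from the next epoch. The demo uses a temporary directory so it runs outside SageMaker.

```python
import json
import os
import tempfile

CHECKPOINT_DIR = "/opt/ml/checkpoints"  # SageMaker's default local checkpoint path


def save_checkpoint(state, epoch, checkpoint_dir=CHECKPOINT_DIR):
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"epoch-{epoch:04d}.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp_path, path)  # atomic rename: never expose a half-written checkpoint


def load_latest_checkpoint(checkpoint_dir=CHECKPOINT_DIR):
    if not os.path.isdir(checkpoint_dir):
        return None
    names = sorted(n for n in os.listdir(checkpoint_dir)
                   if n.startswith("epoch-") and n.endswith(".json"))
    if not names:
        return None
    with open(os.path.join(checkpoint_dir, names[-1])) as f:
        return json.load(f)


def train(total_epochs, checkpoint_dir=CHECKPOINT_DIR):
    """Resume from the newest checkpoint if one exists, otherwise start at epoch 0."""
    ckpt = load_latest_checkpoint(checkpoint_dir)
    start_epoch = ckpt["epoch"] + 1 if ckpt else 0
    state = ckpt["state"] if ckpt else {"weight": 0.0}
    for epoch in range(start_epoch, total_epochs):
        state["weight"] += 0.1  # placeholder for one epoch of real training
        save_checkpoint(state, epoch, checkpoint_dir)
    return state, start_epoch


# Demo outside SageMaker: a job "interrupted" after 3 epochs, then restarted for 5.
demo_dir = tempfile.mkdtemp()
first_state, first_start = train(3, demo_dir)       # starts at epoch 0
resumed_state, resumed_start = train(5, demo_dir)   # resumes at epoch 3
```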

Container and algorithm choices

Choice	Use when	Notes
Built-in SageMaker algorithm	Standard algorithm fits problem	Less container work, optimized integration
Framework estimator/script mode	TensorFlow, PyTorch, scikit-learn, XGBoost scripts	Bring training script; SageMaker manages job
Custom Docker container	Need custom runtime, dependencies, algorithm, or inference stack	Must follow SageMaker container conventions
Bring your own model artifact	Model already trained elsewhere	Package with compatible inference container
Amazon ECR image	Custom training/inference image	Execution role needs pull permissions

Built-in algorithm selection cues

Problem cue	Likely algorithm family	Notes
Tabular classification/regression with nonlinear patterns	XGBoost	Strong default for structured data
Large-scale linear classification/regression	Linear Learner	Works well for sparse/high-dimensional linear problems
Recommendation or sparse feature interactions	Factorization Machines	Common for user-item sparse matrices
Clustering without labels	K-Means	Unsupervised segmentation
Anomaly detection in numeric/time-series-like data	Random Cut Forest	Detects unusual observations
Text classification or word embeddings	BlazingText	Text-focused built-in option
Forecasting multiple related time series	DeepAR	Uses historical time series patterns
Image classification/detection	Image Classification, Object Detection, or framework model	Often use transfer learning or pretrained models
Custom deep learning architecture	PyTorch/TensorFlow on SageMaker	Use framework estimator or custom container

Hyperparameter tuning

Element	What to know
Objective metric	Metric to maximize or minimize; must be emitted by training job
Search space	Ranges or categorical values for hyperparameters
Early stopping	Stops weak jobs when supported/appropriate
Validation set	Data used to score and compare the training jobs launched during tuning
Final test set	Held out until final evaluation
Overfitting risk	More tuning can overfit validation data

Model evaluation metrics

Confusion matrix terms

Term	Meaning
TP	Predicted positive and actually positive
FP	Predicted positive but actually negative
TN	Predicted negative and actually negative
FN	Predicted negative but actually positive

\[ \begin{aligned} Accuracy &= \frac{TP + TN}{TP + TN + FP + FN} \\ Precision &= \frac{TP}{TP + FP} \\ Recall &= \frac{TP}{TP + FN} \\ F1 &= 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \end{aligned} \]
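A worked example of these formulas shows the imbalanced-data trap: a model that catches 2 of 10 positives still scores high accuracy, and a model that never predicts positive scores even higher. The counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set with 10 positives and 990 negatives.
model = classification_metrics(tp=2, fp=8, tn=982, fn=8)       # accuracy 0.984, P/R/F1 about 0.2
always_negative = classification_metrics(tp=0, fp=0, tn=990, fn=10)  # accuracy 0.99, recall 0
```

The "always negative" baseline wins on accuracy while detecting nothing, which is why the metric table below steers rare-positive problems toward precision, recall, F1, and PR AUC.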

Metric selection table

Task or risk	Prefer	Avoid over-relying on
Balanced classification	Accuracy, ROC AUC, F1	Accuracy alone if costs differ
Rare positive class	Precision, recall, F1, PR AUC	Accuracy and sometimes ROC AUC
False negatives are costly	Recall/sensitivity	Precision alone
False positives are costly	Precision	Recall alone
Probabilistic classification	Log loss, calibration	Only thresholded accuracy
Regression with large-error penalty	RMSE	MAE if large errors must be emphasized
Regression with robust typical error	MAE	RMSE if outliers dominate unfairly
Forecasting	MAE, RMSE, MAPE/sMAPE where valid	MAPE when actual values can be zero
Ranking/recommendation	NDCG, MAP, precision@k/recall@k	Generic classification accuracy
Clustering	Silhouette score, within-cluster sum of squares	Supervised metrics without labels

\[ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \]

\[ MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i| \]
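RMSE and MAE diverge when errors are concentrated. In this hypothetical example both prediction sets have the same total absolute error, so MAE is identical, but RMSE doubles when the error lands on a single row because squaring weights large misses more heavily.

```python
from math import sqrt


def rmse(y_true, y_pred):
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10, 10, 10, 10]
spread_errors = [11, 9, 11, 9]    # every prediction off by 1: MAE 1.0, RMSE 1.0
one_big_miss = [10, 10, 10, 14]   # same total error in one row: MAE 1.0, RMSE 2.0
```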

Evaluation traps

Trap	Correct reasoning
“High accuracy” on imbalanced data	Check confusion matrix, recall, precision, F1, PR AUC
Tuning threshold on test set	Tune threshold on validation set; reserve test set
Random split for time series	Use chronological split
Preprocessing entire dataset before split	Fit transforms on train only, apply to validation/test
Comparing models with different test data	Use the same holdout or controlled cross-validation
Better offline metric but worse production	Investigate data drift, training-serving skew, latency/timeouts, feature freshness
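The threshold trap can be avoided by choosing the cutoff on validation data and applying it once, frozen, to the test set. This is a sketch with hypothetical scores and labels, using F1 as the selection criterion; a cost-weighted objective could replace it when false positives and false negatives carry different costs.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(val_scores, val_labels):
    """Choose the F1-maximizing threshold using validation data only."""
    candidates = sorted(set(val_scores))
    return max(candidates, key=lambda t: f1_at_threshold(val_scores, val_labels, t))


# Hypothetical validation scores and labels.
val_scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
val_labels = [0, 0, 1, 1, 0, 1]
threshold = pick_threshold(val_scores, val_labels)

# The chosen threshold is applied once to the untouched test set.
test_scores = [0.2, 0.4, 0.5, 0.8]
test_labels = [0, 1, 0, 1]
test_f1 = f1_at_threshold(test_scores, test_labels, threshold)
```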

Deployment and inference patterns

Inference mode decision matrix

Requirement	Choose	Why	Watch for
Low-latency request/response	SageMaker real-time endpoint	Persistent HTTPS endpoint	Scale and monitor latency/errors
Spiky or intermittent traffic	SageMaker Serverless Inference	No instance management	Cold start and workload suitability
Large payloads or long processing	SageMaker Asynchronous Inference	Queued async invocation, S3 output	Client does not wait synchronously
Offline batch scoring	SageMaker Batch Transform	Reads S3 input, writes S3 output	No always-on endpoint
Many tenant- or segment-specific models	Multi-model endpoint	Hosts multiple models behind one endpoint	Initial model load can add latency
Multiple containers in one endpoint	Multi-container endpoint	Direct or serial container invocation patterns	Not the same as multi-model hosting
Edge or disconnected inference	AWS IoT Greengrass or device runtime pattern	Local inference near data source	Model update and device security matter
Lightweight model behind app API	AWS Lambda plus API Gateway, if suitable	Simple serverless app integration	Not ideal for large models/heavy inference

Deployment controls

Control	Use for	Notes
Production variants	Traffic splitting across model variants	Supports A/B style testing
Shadow variant	Test new model on production traffic without serving its response	Useful before promotion
Canary/linear rollout pattern	Gradual production traffic shift	Pair with CloudWatch alarms and rollback
Auto scaling	Adjust endpoint capacity based on demand	Monitor latency, invocation volume, errors
Data capture	Store inference inputs/outputs in S3	Required for many monitoring workflows
Model Registry approval	Gate promotion to staging/prod	Supports governance and reproducibility
Inference Recommender	Evaluate hosting instance/config options	Use when unsure about performance/cost tradeoff
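Traffic splitting, shadow testing, and data capture all live in the endpoint configuration. The builder below is a sketch of a `create_endpoint_config` request, assuming hypothetical model names, one instance per variant, and 100 percent capture sampling. Production variant weights are relative: each variant receives its weight divided by the sum of all weights. For the shadow case the equal weights are an assumption; check the `CreateEndpointConfig` reference for how shadow sampling is derived from them.

```python
def variant(name, model_name, instance_type, weight):
    return {
        "VariantName": name,
        "ModelName": model_name,
        "InitialInstanceCount": 1,
        "InstanceType": instance_type,
        "InitialVariantWeight": weight,
    }


def build_endpoint_config(config_name, champion, challenger, capture_s3_uri,
                          instance_type="ml.m5.large", shadow=True, challenger_weight=0.1):
    request = {
        "EndpointConfigName": config_name,
        # Capture request and response payloads to S3 for Model Monitor and debugging.
        "DataCaptureConfig": {
            "EnableCapture": True,
            "InitialSamplingPercentage": 100,
            "DestinationS3Uri": capture_s3_uri,
            "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
        },
    }
    if shadow:
        # Callers only receive the champion's response; the challenger gets mirrored traffic.
        request["ProductionVariants"] = [variant("champion", champion, instance_type, 1.0)]
        request["ShadowProductionVariants"] = [variant("challenger", challenger, instance_type, 1.0)]
    else:
        # A/B split: relative weights 0.9 and 0.1 route about 10% of requests to the challenger.
        request["ProductionVariants"] = [
            variant("champion", champion, instance_type, 1.0 - challenger_weight),
            variant("challenger", challenger, instance_type, challenger_weight),
        ]
    return request
```

The resulting dict would be passed to `boto3.client("sagemaker").create_endpoint_config(**request)`, and the endpoint then created or updated to use that configuration.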

SageMaker inference container contract

Endpoint	Purpose
`/ping`	Health check
`/invocations`	Inference requests

Common container issues: wrong content type, missing dependencies, slow model load, model artifact path mismatch, container not listening on port 8080, memory exhaustion, or IAM denial when pulling the ECR image or reading the S3 artifact.
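A minimal sketch of the `/ping` and `/invocations` contract using only the Python standard library. The `predict` function is a hypothetical placeholder for a loaded model, and a production container would normally use a multi-worker server rather than `http.server`. It illustrates two of the failure modes listed above: a wrong content type or malformed payload returns a 4xx instead of crashing the process, and the server listens on port 8080, where SageMaker sends requests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    """Hypothetical stand-in for a real model: scores each row as the sum of its features."""
    return [sum(row) for row in rows]


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health checks call GET /ping; 200 means the model is loaded and ready.
        self._send(200 if self.path == "/ping" else 404, b"")

    def do_POST(self):
        if self.path != "/invocations":
            self._send(404, b"")
            return
        content_type = (self.headers.get("Content-Type") or "").split(";")[0].strip()
        if content_type != "application/json":
            self._send(415, b"expected application/json")  # wrong content type -> 4xx
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            rows = json.loads(self.rfile.read(length))
            body = json.dumps({"predictions": predict(rows)}).encode()
        except (ValueError, TypeError) as exc:
            self._send(400, str(exc).encode())  # malformed payload -> 4xx, not a crash
            return
        self._send(200, body, "application/json")

    def _send(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this sketch


def serve(port=8080):
    """Container entrypoint: blocks forever serving /ping and /invocations."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```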

Generative AI and foundation model choices

Scenario	Prefer	Reasoning
Use managed foundation models through API	Amazon Bedrock	Avoids managing model infrastructure
Need guardrails for FM application behavior	Guardrails for Amazon Bedrock	Central control for safety and policy behavior
Need RAG over enterprise documents	Knowledge Bases for Amazon Bedrock or custom RAG stack	Retrieves current private context instead of retraining model for facts
Need agents that call tools/APIs	Agents for Amazon Bedrock	Orchestrates tasks with FM reasoning and actions
Need deploy/tune open or pretrained model in SageMaker environment	SageMaker JumpStart or SageMaker hosting	More control over model/container/VPC/MLOps
Need custom model architecture/training loop	SageMaker custom training	Full control, more engineering responsibility
Need semantic search	Embeddings + vector store such as Amazon OpenSearch Service/OpenSearch Serverless, Aurora PostgreSQL with vector support, or managed Bedrock knowledge base	Match text by meaning, not exact keywords
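Semantic search reduces to nearest-neighbor ranking over embedding vectors. The brute-force sketch below uses cosine similarity on hypothetical three-dimensional vectors; a vector store performs the same ranking at scale, usually with approximate indexes, and real vectors would come from an embedding model rather than being written by hand.

```python
from math import sqrt


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Rank document IDs by cosine similarity to the query embedding."""
    scored = sorted(((cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()),
                    reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings for three help-center documents.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-window": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # stands in for the embedding of "how do I get my money back?"
results = top_k(query, index)
```

In a RAG flow, the retrieved documents would then be placed into the prompt as context.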

Prompt, RAG, fine-tuning, or training?

Need	Usually choose	Why
Change output format, tone, instructions	Prompt engineering	Fastest and lowest operational complexity
Use private or frequently changing facts	RAG	Keeps knowledge external and updateable
Improve behavior on repeated task pattern	Fine-tuning/customization where supported	Teaches task style or domain pattern
Add brand-new domain facts only	RAG first	Fine-tuning is not a reliable database
Build specialized model from scratch	Custom training	Highest cost/complexity; use only when necessary

MLOps and automation

Pipeline stages to recognize

Stage	SageMaker/AWS service fit	Key artifacts
Ingest	S3, Kinesis, Data Firehose, AWS DMS	Raw data
Validate/profile	Glue Data Quality, SageMaker Processing, Data Wrangler	Data reports, constraints
Transform	Glue, EMR, SageMaker Processing	Curated dataset, features
Train	SageMaker Training	Model artifact, metrics
Tune	SageMaker automatic model tuning	Best training job, hyperparameters
Evaluate	SageMaker Processing or pipeline evaluation step	Evaluation report
Conditional gate	SageMaker Pipelines condition step	Pass/fail metric rule
Register	SageMaker Model Registry	Model package/version
Approve	Manual or automated approval workflow	Approval state
Deploy	SageMaker endpoint, Batch Transform, CI/CD pipeline	Endpoint or batch job
Monitor	Model Monitor, Clarify, CloudWatch	Drift reports, alarms
Retrain	EventBridge, Pipelines, Step Functions	New model version

SageMaker Pipelines pattern

Process raw data
  -> Train model
  -> Evaluate metrics
  -> If metric passes threshold:
         Register model package
         Optionally deploy to staging
     Else:
         Stop and record failure
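The control flow above can be exercised as ordinary Python before committing it to a pipeline definition. This sketch injects hypothetical callables for each step and gates registration on an assumed AUC threshold of 0.8; in SageMaker Pipelines the same structure maps to processing, training, and evaluation steps feeding a `ConditionStep`, with a `FailStep` on the else branch.

```python
def run_pipeline(process, train, evaluate, register, threshold=0.8, deploy_to_staging=None):
    """Mirror the conditional flow above; every step is an injected callable."""
    train_set, eval_set = process()              # process raw data
    model = train(train_set)                     # train model
    metrics = evaluate(model, eval_set)          # evaluate on held-out data
    if metrics["auc"] >= threshold:              # condition (gate) step
        package = register(model, metrics)       # register model package
        if deploy_to_staging is not None:
            deploy_to_staging(package)           # optional staging deployment
        return {"status": "registered", "package": package, "metrics": metrics}
    return {"status": "failed", "metrics": metrics}  # stop and record failure


# Hypothetical stand-ins for real processing, training, and registry steps.
registry = []


def register(model, metrics):
    registry.append({"model": model, "metrics": metrics, "approval": "PendingManualApproval"})
    return f"model-package-v{len(registry)}"


passing = run_pipeline(lambda: ([1, 2, 3], [4, 5]), lambda data: "model-a",
                       lambda model, data: {"auc": 0.91}, register)
failing = run_pipeline(lambda: ([1, 2, 3], [4, 5]), lambda data: "model-b",
                       lambda model, data: {"auc": 0.62}, register)
```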

CI/CD and governance distinctions

Need	Prefer	Notes
Version infrastructure	AWS CloudFormation or AWS CDK	Reproducible environments
Build/test custom containers	AWS CodeBuild + Amazon ECR	Scan and control images
Orchestrate release stages	AWS CodePipeline or equivalent CI/CD	Separate dev/test/prod
Trigger pipeline on data or approval event	Amazon EventBridge	Event-driven retraining/deployment
Human approval	CodePipeline approval, Step Functions, or registry approval process	Useful before production changes
Track experiments	SageMaker Experiments	Parameters, metrics, artifacts, lineage
Reproduce training	Pin code, image, dependencies, data version, hyperparameters, random seeds	Not just “rerun notebook”

Security, privacy, and governance

IAM and access patterns

Control	Exam-ready meaning
SageMaker execution role	Role assumed by SageMaker jobs/endpoints to access S3, ECR, CloudWatch, KMS, VPC resources
Least privilege	Restrict actions and resource ARNs, especially S3 prefixes and KMS keys
IAM user/role separation	Human identity starts jobs; execution role is used by managed service
Resource policies	S3 bucket policies, KMS key policies, ECR repository policies may also be required
Temporary credentials	Prefer IAM roles over long-lived static keys
Secrets Manager	Store database/API credentials; do not hardcode in notebooks or containers
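Least privilege for the execution role usually means scoping S3 access to one prefix and KMS use to one key. The function below builds an illustrative identity policy for a hypothetical bucket, prefix, and key ARN. It deliberately omits the ECR pull and CloudWatch Logs permissions a real training job also needs, and the KMS key policy must separately allow the role.

```python
import json


def execution_role_policy(bucket, prefix, kms_key_arn):
    """Illustrative least-privilege policy document for a SageMaker execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyProjectPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadWriteProjectObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Sid": "UseProjectKmsKey",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            },
        ],
    }


policy_json = json.dumps(
    execution_role_policy("my-ml-bucket", "churn", "arn:aws:kms:us-east-1:111122223333:key/example"),
    indent=2,
)
```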

Network and encryption controls

Requirement	Use	Notes
Encrypt S3 training data/artifacts	S3 server-side encryption with AWS KMS where required	Execution role needs KMS permissions
Encrypt training/inference volumes	KMS key options where supported	Include key policy permissions
Private training/inference network path	VPC configuration with private subnets/security groups	Ensure access to S3/ECR/CloudWatch through endpoints or controlled egress
No internet access from training container	Network isolation where appropriate	Container cannot fetch packages from internet
Private AWS service access	VPC endpoints/AWS PrivateLink where supported	Avoid public internet routes
Audit API calls	CloudTrail	Who changed endpoint, role, pipeline, bucket, key
Monitor logs/metrics	CloudWatch	Operational visibility
Detect sensitive data in S3	Macie	Complements, not replaces, access controls
Govern data lake permissions	AWS Lake Formation	Centralized lake permissions over cataloged data

Security traps

Trap	Correct answer direction
AccessDenied from training job despite user access	Check SageMaker execution role, bucket policy, KMS key policy
Private subnet job cannot pull image or read S3	Add required VPC endpoints or controlled NAT path
KMS-encrypted S3 object unreadable	Execution role needs both S3 and KMS decrypt permissions
Secret passed as plain environment variable	Use Secrets Manager or secure parameter retrieval
Public notebook or endpoint exposure	Use IAM, VPC, security groups, private access, and least privilege
Sensitive labeling data	Use private workforce and secure data access controls

Monitoring, observability, and troubleshooting

What to monitor

Layer	Tool/service	Signals
Endpoint operations	CloudWatch metrics/logs	Invocations, latency, errors, resource utilization
Training jobs	CloudWatch logs, SageMaker job status	Script errors, metric output, resource failures
API activity	CloudTrail	Create/update/delete endpoint, IAM, S3, KMS API calls
Input/output drift	SageMaker Model Monitor data quality	Feature distribution changes
Model performance	SageMaker Model Monitor model quality	Requires ground truth labels
Bias drift	SageMaker Clarify / Model Monitor integration	Bias metric changes over time
Explainability drift	Clarify feature attribution monitoring	Feature importance changes
Data capture	SageMaker endpoint data capture to S3	Inputs/outputs for monitoring and analysis
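Data-quality monitoring compares captured data against a training-time baseline. To make the idea concrete, the sketch below computes the population stability index (PSI), a common generic drift statistic; it is not the statistic Model Monitor itself reports. Bin edges and feature values are hypothetical, and the 0.1/0.25 cutoffs often quoted for PSI are an industry rule of thumb, not an AWS threshold.

```python
from math import log


def bin_fractions(values, edges):
    """Share of values per bin; ascending edges [e1, e2, ...] define len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(1 for e in edges if v >= e)] += 1
    return [c / len(values) for c in counts]


def psi(baseline, current, edges, eps=1e-6):
    """Population stability index; eps keeps empty bins from producing log(0)."""
    b = bin_fractions(baseline, edges)
    c = bin_fractions(current, edges)
    return sum((ci - bi) * log((ci + eps) / (bi + eps)) for bi, ci in zip(b, c))


edges = [20, 40, 60]                          # hypothetical bins for a numeric feature
baseline = [15, 25, 35, 45, 55, 65, 30, 50]   # values seen at training time
captured_same = list(baseline)                # production traffic with no shift
captured_shifted = [55, 62, 70, 48, 66, 75, 58, 80]

stable = psi(baseline, captured_same, edges)       # 0.0 for identical distributions
drifted = psi(baseline, captured_shifted, edges)   # far above 0.25: investigate
```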

Troubleshooting decision table

Symptom	Likely checks
Training job cannot access data	Execution role, S3 URI, bucket policy, KMS key policy, VPC endpoint
Training job starts but algorithm fails	Input format, content type, channel names, hyperparameters, script error
Metrics not visible for tuning	Training script must emit metric matching tuning regex/definition
Endpoint creation fails	Model artifact path, container image, IAM/ECR access, model load errors
Endpoint returns 4xx	Request format, content type, authentication, payload schema
Endpoint returns 5xx	Container logs, model exception, memory/timeout, dependency error
Latency increases	Instance sizing, concurrency, autoscaling, payload size, cold starts, model size
Production accuracy drops	Data drift, label drift, feature skew, upstream schema change, stale features
Model Monitor has no quality report	Ground truth labels may be missing or delayed
Costs unexpectedly high	Idle endpoints/notebooks, overprovisioned instances, unnecessary always-on hosting
Batch job slow	Input sharding, data format, instance choice, transform strategy
Pipeline did not trigger	EventBridge rule, permissions, source event pattern, pipeline parameters
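For the "metrics not visible for tuning" symptom, the training script's log line and the metric definition's regex must agree, and the tuning objective's metric name must match the definition's `Name`. The sketch emulates the extraction locally: SageMaker applies each regex to the job's logs and reads the first capture group as the value. The metric name, regex, and log format here are hypothetical examples of the `{"Name": ..., "Regex": ...}` shape.

```python
import re

# Shape of a SageMaker metric definition; the first capture group must hold the number.
metric_definitions = [{"Name": "validation:auc", "Regex": r"validation-auc: ([0-9\.]+)"}]


def log_epoch(epoch, auc):
    """What a hypothetical training script prints to stdout after each epoch."""
    line = f"epoch={epoch} validation-auc: {auc:.4f}"
    print(line)
    return line


def scrape(log_lines, definition):
    """Emulate how the regex pulls metric values out of log lines."""
    pattern = re.compile(definition["Regex"])
    return [float(m.group(1)) for line in log_lines if (m := pattern.search(line))]


lines = [log_epoch(1, 0.8123), log_epoch(2, 0.8456)]
values = scrape(lines, metric_definitions[0])
# A print such as "val_auc=0.85" would match nothing, leaving tuning with no objective values.
mismatched = scrape(["val_auc=0.85"], metric_definitions[0])
```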

Cost-aware engineering choices

Cost pressure	Practical pattern
Idle development environments	Stop notebooks/Studio apps when unused; use lifecycle controls where appropriate
Always-on endpoint with rare traffic	Consider Serverless Inference, Asynchronous Inference, or Batch Transform
Many small models	Consider multi-model endpoints
Large recurring batch scoring	Use Batch Transform and right-size compute
Long training jobs	Use checkpoints; consider managed spot training where suitable
Overtraining	Use early stopping and sensible tuning search spaces
Duplicate feature computation	Reuse Feature Store and shared processing jobs
Unused artifacts/logs	Apply S3 lifecycle policies and retention controls
Inefficient data format	Prefer columnar/compressed formats such as Parquet for analytics workloads

Scenario shortcuts

If the stem says…	Likely answer	Why
“Run SQL on files in S3 without managing servers”	Athena + Glue Data Catalog	Serverless query over data lake
“Infer schema from new S3 data”	Glue crawler	Populates catalog metadata
“Large-scale ETL with serverless Spark”	AWS Glue	Managed ETL
“Need full Spark cluster configuration control”	EMR	More control than Glue
“Label images with human reviewers”	SageMaker Ground Truth	Managed labeling
“Avoid different feature code in training and inference”	SageMaker Feature Store	Reduces training-serving skew
“Train model reproducibly at scale”	SageMaker Training job	Managed, containerized, repeatable
“Find best hyperparameters automatically”	SageMaker automatic model tuning	Searches parameter space
“Track parameters, metrics, and artifacts”	SageMaker Experiments	Experiment lineage
“Approve model before production”	SageMaker Model Registry	Model package governance
“Deploy for millisecond-style request/response”	Real-time endpoint	Persistent inference
“Score millions of records nightly”	Batch Transform	Offline batch predictions
“Requests can take longer and response can be stored in S3”	Asynchronous Inference	Queued async processing
“Traffic is unpredictable and often idle”	Serverless Inference	No instance management
“Compare new model silently on production traffic”	Shadow variant	Does not affect user response
“Detect input feature distribution drift”	Model Monitor data quality	Baseline vs captured data
“Detect accuracy degradation after labels arrive”	Model Monitor model quality	Needs ground truth
“Who changed the endpoint configuration?”	CloudTrail	API audit
“Endpoint has high 5xx errors”	CloudWatch logs + container diagnostics	Operational troubleshooting
“Use foundation model without hosting it”	Amazon Bedrock	Managed FM API
“Add current company documents to FM answers”	RAG / Knowledge Bases for Amazon Bedrock	Retrieves external knowledge
“Sensitive S3 training data may contain PII”	Macie + IAM/KMS controls	Discovery plus protection
“Private training with no internet”	VPC config, endpoints, network isolation	Controlled network path

Final review checklist

Map every scenario to the lifecycle step: data, features, training, evaluation, deployment, monitoring, or governance.
Distinguish Athena vs Glue vs EMR, Pipelines vs Step Functions, and real-time vs async vs batch vs serverless inference.
For security questions, check execution role, S3 policy, KMS policy, VPC path, and CloudTrail.
For model quality questions, identify whether the issue is data quality, drift, bias, feature skew, evaluation metric choice, or deployment configuration.
For MLOps questions, prefer repeatable jobs, tracked artifacts, model registry approval, automated deployment, and monitoring-triggered retraining over manual notebook workflows.

Next step: use this Quick Reference as a drill sheet, then practice scenario questions that force you to choose the correct AWS service, deployment mode, monitoring control, or security fix for MLA-C01.

Scenario Guide

ML Data Prep