PMLE — Google Cloud Professional Machine Learning Engineer - 2026 Guide Exam Blueprint

Last revised: July 1, 2026

Practical exam blueprint for Google Cloud PMLE exam readiness.

How to Use This Exam Blueprint

Use this checklist as a practical readiness map for the Google Cloud Professional Machine Learning Engineer - 2026 Guide exam, code PMLE. It is organized around the skills a professional machine learning engineer is expected to apply on Google Cloud: framing ML problems, preparing data, training models, deploying solutions, operating ML systems, and making security, governance, cost, and reliability tradeoffs.

Do not treat this as an official scoring guide or a list of exact exam weights. Instead, use it to answer three questions:

Can I choose the right Google Cloud approach for a realistic ML scenario?
Can I explain the lifecycle from data to deployed, monitored model?
Can I identify weak designs, operational risks, and better alternatives?

For each section, mark topics as:

Status	Meaning
Ready	You can explain it, choose it in a scenario, and spot common traps.
Review	You understand the concept but hesitate on service selection or tradeoffs.
Practice	You need hands-on review, scenario drills, or architecture comparison.

Exam identity

Item	Checklist detail
Vendor/provider	Google Cloud
Official exam title	Google Cloud Professional Machine Learning Engineer - 2026 Guide
Exam code	PMLE
Page purpose	Independent exam blueprint and final-review support
Readiness focus	Applied ML engineering, Google Cloud service selection, MLOps, security, data workflows, deployment, monitoring, and troubleshooting

Topic-area readiness map

Readiness area	What you should be able to do	Ready?
ML problem framing	Translate business goals into supervised, unsupervised, forecasting, recommendation, ranking, anomaly detection, or generative AI approaches.	[ ]
Data sourcing and ingestion	Choose patterns for batch, streaming, structured, semi-structured, and unstructured data using Google Cloud services.	[ ]
Data quality and feature readiness	Detect leakage, skew, missing values, outliers, label quality issues, imbalance, and schema drift.	[ ]
Exploratory analysis	Use statistical summaries, distributions, correlation, segmentation, and visual checks to validate assumptions.	[ ]
Feature engineering	Select appropriate encoding, normalization, text/image/audio transformations, embeddings, feature crosses, and time-based features.	[ ]
Model selection	Match model families and Google Cloud options to problem type, data size, latency needs, interpretability, and operational constraints.	[ ]
Training architecture	Choose between AutoML, custom training, BigQuery ML, notebooks, distributed training, accelerators, and containerized jobs.	[ ]
Hyperparameter tuning	Explain tuning goals, search strategies, validation sets, early stopping, overfitting risks, and resource tradeoffs.	[ ]
Evaluation metrics	Choose metrics for classification, regression, ranking, recommendation, forecasting, anomaly detection, and generative AI evaluation.	[ ]
Vertex AI workflows	Understand how Vertex AI supports datasets, training, pipelines, model registry, endpoints, batch prediction, experiments, and monitoring.	[ ]
BigQuery ML	Recognize when in-database training and inference reduce data movement and simplify analytics-oriented ML workflows.	[ ]
Pipelines and MLOps	Design reproducible, versioned workflows with orchestration, metadata, testing, CI/CD, approvals, and rollback paths.	[ ]
Deployment patterns	Compare online prediction, batch prediction, embedded inference, streaming inference, and custom serving on managed compute.	[ ]
Monitoring and retraining	Monitor model performance, data drift, prediction skew, service health, latency, errors, and retraining triggers.	[ ]
Security and IAM	Apply least privilege, service accounts, data access controls, encryption choices, auditability, and private connectivity patterns.	[ ]
Governance and responsible AI	Address explainability, fairness, documentation, human review, privacy, safety, lineage, and model approval needs.	[ ]
Cost and performance	Balance training cost, serving latency, accelerator usage, data movement, batch windows, caching, and operational complexity.	[ ]
Troubleshooting	Diagnose failed training jobs, poor metrics, serving errors, data pipeline failures, skew, permissions, and scaling issues.	[ ]

Core PMLE readiness checklist

ML problem framing

You should be able to answer scenario questions such as: “What type of ML solution fits this business problem, and what are the risks?”

Scenario cue	Strong answer should consider
“Predict a numeric value”	Regression, baseline model, RMSE/MAE, outlier sensitivity, feature leakage.
“Classify user behavior”	Binary or multiclass classification, precision/recall tradeoff, class imbalance, threshold tuning.
“Detect rare fraud or defects”	Anomaly detection or imbalanced classification, recall importance, false-positive cost, alert review workflow.
“Forecast future demand”	Time-aware splits, seasonality, holidays/events, leakage prevention, forecast horizon, retraining cadence.
“Recommend products or content”	Candidate generation, ranking, implicit feedback, cold start, diversity, business rules, online evaluation.
“Search or match semantic meaning”	Embeddings, vector search, retrieval quality, latency, grounding data freshness.
“Generate text, code, or summaries”	Prompt design, grounding, evaluation, safety, hallucination risk, human review, data privacy.
“Explain why predictions happen”	Explainability, feature attribution, model choice, stakeholder requirements, auditability.

Checklist:

I can define the prediction target clearly.
I can identify whether labels exist and whether they are reliable.
I can distinguish correlation from causation in exam scenarios.
I can choose offline and online evaluation metrics aligned to business impact.
I can explain why a non-ML solution may be better when rules, SQL, or simple automation is enough.
I can identify when human-in-the-loop review is necessary.
I can describe failure modes before choosing a model.

Data readiness and feature engineering

Topic	What “ready” means
Data sources	You can identify whether data belongs in BigQuery, Cloud Storage, Pub/Sub, operational databases, or specialized stores based on access pattern and structure.
Batch ingestion	You can reason about scheduled loads, transformations, validation, lineage, and reproducibility.
Streaming ingestion	You can reason about event-time processing, late data, windowing, deduplication, and real-time features.
Data quality	You can detect missing values, invalid categories, inconsistent units, duplicates, bad labels, and outlier handling choices.
Schema management	You can explain why schema changes can break training, serving, and monitoring.
Train/validation/test splits	You can choose random, stratified, group-based, or time-based splits appropriately.
Feature leakage	You can spot features that would not be available at prediction time.
Feature skew	You can explain training-serving skew and how consistent preprocessing reduces it.
Feature stores	You can describe why reusable, versioned, point-in-time-correct features matter.
Embeddings	You can explain when embeddings help with text, images, recommendations, semantic search, or generative AI retrieval.

Can you do this?

Given a table of events, identify the label, entity, timestamp, features, and prediction time.
Explain why random splitting is risky for time-series or user-level grouped data.
Identify leakage from future aggregates, post-outcome fields, manually corrected labels, or target-derived columns.
Choose whether preprocessing belongs in SQL, Dataflow, pipeline components, training code, or serving code.
Explain the difference between data drift, concept drift, and prediction drift.
Describe how to validate data before starting expensive training jobs.
Identify when feature normalization, standardization, bucketization, encoding, or dimensionality reduction may help.
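
The time-order point above can be made concrete with a minimal sketch. It assumes events are plain dictionaries with a hypothetical `event_time` field; the function name `time_based_split` and the sample rows are illustrative and not part of any Google Cloud API.

```python
from datetime import datetime


def time_based_split(rows, cutoff, timestamp_key="event_time"):
    """Split rows so training only sees events strictly before the cutoff."""
    train = [r for r in rows if r[timestamp_key] < cutoff]
    test = [r for r in rows if r[timestamp_key] >= cutoff]
    return train, test


# Hypothetical event rows: entity (user_id), timestamp, and label.
events = [
    {"user_id": "u1", "event_time": datetime(2026, 1, 5), "label": 0},
    {"user_id": "u2", "event_time": datetime(2026, 2, 10), "label": 1},
    {"user_id": "u1", "event_time": datetime(2026, 3, 15), "label": 0},
    {"user_id": "u3", "event_time": datetime(2026, 4, 20), "label": 1},
]

train_rows, test_rows = time_based_split(events, cutoff=datetime(2026, 3, 1))
print(len(train_rows), len(test_rows))
```

A random split of the same rows could place the April event in training and the January event in test, which lets the model learn from the future. Note that user `u1` still appears on both sides of this split; user-level grouped data may additionally need a group-based split.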

Google Cloud service selection checklist

High-level service decision map

If the scenario emphasizes…	Consider…	Watch for…
Fast managed model development with less custom code	Vertex AI AutoML capabilities	Need for custom architecture, special preprocessing, or explainability constraints.
Custom model code, frameworks, or containers	Vertex AI custom training	Container dependencies, accelerator needs, packaging, reproducibility.
Data already lives in BigQuery and model is analytics-oriented	BigQuery ML	Data movement, SQL-based workflow, supported model type, operational serving needs.
Repeatable ML workflows	Vertex AI Pipelines	Artifact tracking, component boundaries, parameterization, metadata, CI/CD.
Managed model deployment	Vertex AI endpoints or batch prediction	Latency, scaling, traffic splitting, monitoring, model versioning.
Large-scale batch transformation	Dataflow, BigQuery, or Dataproc depending on workload	Windowing, cost, pipeline complexity, team skill set.
Containerized custom APIs	Cloud Run or Google Kubernetes Engine	Operational burden, scaling behavior, networking, model loading time.
Streaming features or inference	Pub/Sub, Dataflow, online serving patterns	Event time, backpressure, state, latency, monitoring.
Semantic retrieval or RAG	Vertex AI, embeddings, vector search patterns, managed data stores	Grounding quality, freshness, access control, prompt injection risk.
Experiment tracking and reproducibility	Vertex AI Experiments, metadata, model registry patterns	Missing lineage, untracked parameters, unversioned datasets.

Storage and data platform checks

Data/workload pattern	Readiness prompts
Analytical warehouse	Can you explain when BigQuery is a good fit for feature creation, large-scale analysis, and BigQuery ML?
Object storage	Can you explain when Cloud Storage is appropriate for raw data, training files, model artifacts, images, audio, and exports?
Streaming events	Can you reason about Pub/Sub and downstream processing for near-real-time ML features or predictions?
Operational serving data	Can you identify when low-latency application databases or caches are needed alongside ML services?
Sensitive data	Can you apply IAM, encryption choices, data minimization, masking, and audit logging concepts?
Cross-project access	Can you reason about service accounts, project boundaries, shared datasets, and least privilege?

Model development and training readiness

Model selection prompts

Problem	Candidate model families or approaches	Exam-style tradeoffs
Binary classification	Logistic regression, tree-based models, neural networks, AutoML	Interpretability, threshold tuning, imbalance, calibration.
Multiclass classification	Softmax models, boosted trees, deep learning, AutoML	Confusion among similar classes, class imbalance, label quality.
Regression	Linear models, tree-based models, neural networks	Outliers, metric choice, feature scaling, prediction intervals.
Forecasting	Time-series models, feature-based regression, managed forecasting options	Leakage, seasonality, horizon, backtesting.
Recommendations	Collaborative filtering, matrix factorization, ranking models, embeddings	Cold start, feedback loops, diversity, business rules.
Anomaly detection	Statistical thresholds, unsupervised models, supervised rare-event models	False positives, drift, alert fatigue, label scarcity.
Computer vision	AutoML vision workflows, custom deep learning, transfer learning	Data volume, labeling, augmentation, serving latency.
NLP	Text classification, embeddings, sequence models, generative AI	Tokenization, context limits, grounding, bias, privacy.
Generative AI	Prompting, retrieval-augmented generation, tuning, evaluation workflows	Hallucination, safety, data leakage, cost, latency.

Training architecture checklist

I can decide whether to use managed AutoML, BigQuery ML, or custom training.
I can describe the artifacts needed for custom training: source code, dependencies, container image, training data reference, output model location, metrics.
I can explain why containers improve reproducibility.
I can choose CPU, GPU, or TPU-style acceleration conceptually based on workload type.
I can explain distributed training tradeoffs without assuming it always improves performance.
I can identify when hyperparameter tuning is worth the additional cost.
I can design train/validation/test separation for reliable evaluation.
I can explain early stopping, regularization, dropout, pruning, and model complexity controls.
I can recognize overfitting, underfitting, high bias, and high variance from learning curves.
I can explain why reproducibility requires versioned data, code, parameters, environment, and random seeds where applicable.
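
Early stopping from the checklist above can be sketched as a patience rule over validation losses. This is a simplified illustration, not the behavior of any specific framework callback; `early_stopping_epoch`, `patience`, and `min_delta` are assumed names.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch) using a simple patience rule.

    Training stops once validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch, best_epoch)
```

The checkpoint to keep is the one from `best_epoch`, not the one from the epoch where training stopped.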

Evaluation metric checks

Know when each metric is appropriate.

Metric area	Use when…	Watch for…
Accuracy	Classes are balanced and error costs are similar.	Misleading for rare-event detection.
Precision	False positives are expensive.	High precision can miss many true positives.
Recall	False negatives are expensive.	High recall can create alert fatigue.
F1 score	Need a balance of precision and recall.	May hide business-specific error costs.
ROC-AUC	Ranking binary predictions across thresholds.	Can look optimistic with severe imbalance.
PR-AUC	Positive class is rare and precision/recall matter.	Often more informative than ROC-AUC under imbalance, but its baseline depends on positive-class prevalence.
RMSE	Large regression errors should be penalized strongly.	Sensitive to outliers.
MAE	Need average absolute error that is easier to interpret.	Does not emphasize large errors as strongly.
MAPE	Percentage error is meaningful.	Problematic near zero actual values.
Log loss	Probabilistic classification quality matters.	Penalizes overconfident wrong predictions.
Ranking metrics	Search, recommendation, ordered result quality.	Position, diversity, and business constraints matter.
Forecast backtesting	Time-series validation across historical windows.	Random splits can leak future information.
Generative AI evaluation	Output quality, groundedness, safety, relevance, factuality.	Human evaluation and domain criteria may be needed.

Key formulas to recognize:

\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \]
\[ \text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \]
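
The formulas above can be checked numerically with a short sketch. The counts and values are made up for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not something the formulas prescribe.

```python
import math


def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when there are no positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def rmse(y_true, y_pred):
    """Root mean squared error over paired actual and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n)


# Hypothetical confusion-matrix counts.
tp, fp, fn = 40, 10, 20
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
print(round(p, 3), round(r, 3), round(f1, 3))
print(round(rmse([3, 5, 7], [2, 5, 9]), 3))
```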

Can you do this?

Given a confusion matrix, compute precision and recall.
Choose a threshold based on business costs.
Explain why improving offline AUC may not improve production value.
Identify the right metric for imbalanced classification.
Explain why forecast evaluation must respect time order.
Compare two models using both quality and operational constraints.
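
Choosing a threshold from business costs, as listed above, can be sketched as a search over candidate thresholds that minimizes total error cost. The labels, scores, and cost values are hypothetical; a real evaluation would use a held-out set and costs agreed with stakeholders.

```python
def total_error_cost(y_true, scores, threshold, fp_cost, fn_cost):
    """Sum the cost of false positives and false negatives at one threshold."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 0:
            cost += fp_cost
        elif predicted == 0 and y == 1:
            cost += fn_cost
    return cost


def best_threshold(y_true, scores, candidates, fp_cost, fn_cost):
    """Pick the candidate threshold with the lowest total error cost."""
    return min(
        candidates,
        key=lambda t: total_error_cost(y_true, scores, t, fp_cost, fn_cost),
    )


# Hypothetical labels and model scores.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.6, 0.8, 0.9]
candidates = [0.3, 0.5, 0.7]

# Missed positives are expensive: a low threshold wins.
print(best_threshold(labels, scores, candidates, fp_cost=1, fn_cost=10))
# False alarms are expensive: a high threshold wins.
print(best_threshold(labels, scores, candidates, fp_cost=10, fn_cost=1))
```

The same model yields different operating points depending on which error type the business considers more costly.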

BigQuery ML readiness

BigQuery ML is often relevant when data is already in BigQuery and the team wants SQL-based model development or inference.

You should be able to recognize patterns like:

CREATE OR REPLACE MODEL `project.dataset.model_name`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['label']
) AS
SELECT
  label,
  feature_1,
  feature_2,
  feature_3
FROM `project.dataset.training_table`;

And evaluation patterns like:

SELECT *
FROM ML.EVALUATE(
  MODEL `project.dataset.model_name`,
  TABLE `project.dataset.eval_table`
);

Checklist:

I can explain when BigQuery ML reduces data movement.
I can identify when SQL-based feature engineering is sufficient.
I can distinguish training, evaluation, prediction, and explainability-style workflows conceptually.
I can recognize that model choice must match the problem type and data shape.
I can explain when a model trained in BigQuery may still need an operational serving strategy.
I can reason about access control for datasets, models, and prediction outputs.
I can identify when exporting artifacts or integrating with Vertex AI workflows may be useful.

Vertex AI readiness

Vertex AI lifecycle areas

Area	What to know
Datasets	How managed datasets can organize data for training and evaluation workflows.
Training	Difference between managed AutoML-style training and custom training jobs.
Workbench/notebooks	How notebooks support experimentation but should not be the only production workflow.
Experiments	Why parameter, metric, and artifact tracking matter.
Pipelines	How reusable components create reproducible end-to-end workflows.
Metadata	Why lineage supports debugging, governance, and reproducibility.
Model Registry	How versioned models support approval, deployment, rollback, and auditability.
Endpoints	How online prediction requires scaling, latency, traffic management, and monitoring decisions.
Batch prediction	When offline scoring is better than real-time serving.
Monitoring	How to detect drift, skew, data quality issues, prediction changes, and service health concerns.
Explainability	When feature attribution or explanation support is needed for trust, debugging, or review.

Vertex AI scenario checks

Scenario	Better reasoning
“The team wants minimal ML code and a managed workflow.”	Consider managed training options, but verify data type, customization needs, and deployment requirements.
“The model needs a custom TensorFlow/PyTorch/scikit-learn architecture.”	Consider custom training with a reproducible environment and tracked artifacts.
“Training succeeds locally but fails in the cloud.”	Check dependencies, container image, permissions, paths, data access, region/project configuration, and resource assumptions.
“A model version performs worse after deployment.”	Check data skew, traffic split, metric definitions, feature pipeline changes, and rollback options.
“A pipeline step is nondeterministic.”	Check random seeds, versioned inputs, container versions, external data dependencies, and metadata tracking.
“Business requires approval before production.”	Use model registry, evaluation gates, documentation, access controls, and deployment controls.

MLOps and pipeline readiness

End-to-end workflow

    flowchart LR
        A[Define ML objective] --> B[Ingest and validate data]
        B --> C[Engineer features]
        C --> D[Train model]
        D --> E[Evaluate and compare]
        E --> F{Meets release criteria?}
        F -- No --> C
        F -- Yes --> G[Register model]
        G --> H[Deploy or batch score]
        H --> I[Monitor data, model, and service]
        I --> J{Retrain or rollback?}
        J -- Retrain --> B
        J -- Rollback --> G

Pipeline readiness checklist

I can break an ML workflow into reusable components.
I can explain the difference between orchestration and model training.
I can identify pipeline inputs, outputs, artifacts, and parameters.
I can explain why each component should be idempotent where practical.
I can describe how metadata helps reproduce a model.
I can design evaluation gates before deployment.
I can explain CI/CD for ML as more than code deployment: it includes data, model, and pipeline validation.
I can compare manual, scheduled, event-driven, and performance-triggered retraining.
I can identify rollback requirements for online and batch prediction.
I can explain why production ML needs monitoring beyond infrastructure metrics.
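
An evaluation gate before deployment, as mentioned in the checklist above, might look like the following sketch. The metric names (`pr_auc`, `recall`, `p95_latency_ms`) and thresholds are assumptions for illustration; a real pipeline would read them from tracked evaluation artifacts.

```python
def passes_release_gate(candidate, baseline, min_recall, max_latency_ms, min_gain=0.0):
    """Return (approved, checks) for a candidate model against a baseline.

    The candidate must beat the baseline on PR-AUC by at least min_gain,
    keep recall above a floor, and stay within the latency budget.
    """
    checks = {
        "beats_baseline": candidate["pr_auc"] >= baseline["pr_auc"] + min_gain,
        "recall_floor": candidate["recall"] >= min_recall,
        "latency_budget": candidate["p95_latency_ms"] <= max_latency_ms,
    }
    return all(checks.values()), checks


# Hypothetical evaluation results.
candidate = {"pr_auc": 0.71, "recall": 0.82, "p95_latency_ms": 120}
baseline = {"pr_auc": 0.68, "recall": 0.80, "p95_latency_ms": 110}

approved, checks = passes_release_gate(
    candidate, baseline, min_recall=0.80, max_latency_ms=150, min_gain=0.01
)
print(approved, checks)
```

Returning the individual checks alongside the verdict makes a failed gate easier to debug and to record as pipeline metadata.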

Testing checklist for ML systems

Test type	What to validate
Unit tests	Feature functions, parsing logic, custom transforms, metric calculations.
Data validation	Schema, ranges, nulls, categories, duplication, label quality, freshness.
Training tests	Job starts, reads data, writes artifacts, logs metrics, handles small sample runs.
Evaluation tests	Metrics are computed consistently and compared against baselines.
Pipeline tests	Components pass artifacts correctly and fail safely.
Serving tests	Prediction format, latency, error handling, model loading, response schema.
Security tests	Service account permissions, secret handling, data access boundaries.
Monitoring tests	Alerts fire for drift, skew, errors, latency, and failed jobs.
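
The data validation row above can be illustrated with a small check that runs before training starts. The schema format, column names, and `validate_batch` function are hypothetical; managed validation tooling would normally replace hand-written checks like this.

```python
def validate_batch(rows, schema, max_null_fraction=0.05):
    """Return a list of problems; an empty list means every check passed."""
    if not rows:
        return ["batch is empty"]
    problems = []
    for column, rules in schema.items():
        values = [row.get(column) for row in rows]
        null_fraction = sum(v is None for v in values) / len(rows)
        if null_fraction > max_null_fraction:
            problems.append(f"{column}: null fraction {null_fraction:.2f} exceeds limit")
        present = [v for v in values if v is not None]
        if any(not isinstance(v, rules["type"]) for v in present):
            problems.append(f"{column}: unexpected type")
            continue  # range and category checks are meaningless on wrong types
        if "min" in rules and any(v < rules["min"] for v in present):
            problems.append(f"{column}: value below {rules['min']}")
        if "allowed" in rules and any(v not in rules["allowed"] for v in present):
            problems.append(f"{column}: unknown category")
    return problems


# Hypothetical schema and batches.
schema = {
    "amount": {"type": float, "min": 0.0},
    "country": {"type": str, "allowed": {"US", "DE", "JP"}},
}
good_batch = [{"amount": 12.5, "country": "US"}, {"amount": 3.0, "country": "DE"}]
bad_batch = [{"amount": -1.0, "country": "US"}, {"amount": 2.0, "country": "XX"}]

print(validate_batch(good_batch, schema))
print(validate_batch(bad_batch, schema))
```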

Deployment and serving readiness

Serving pattern decision table

Pattern	Use when…	Key tradeoffs
Online prediction endpoint	Applications need low-latency predictions.	Scaling, latency, request format, traffic split, monitoring.
Batch prediction	Predictions can be generated on a schedule or for large datasets.	Batch windows, storage outputs, downstream consumption.
In-database prediction	Analytics users score data already in BigQuery.	SQL workflow, model support, integration with reporting.
Streaming inference	Events need rapid scoring as they arrive.	Pipeline complexity, state, backpressure, error handling.
Custom serving container	Need custom preprocessing, model server, or runtime.	More responsibility for packaging, dependencies, health checks.
Application-embedded model	Small model close to application logic.	Versioning, rollout, monitoring, and update complexity.
Edge or client-side inference	Need offline, low-latency, or privacy-sensitive local inference.	Model size, update strategy, device variability.

Deployment checklist

I can choose online vs batch prediction based on latency and business process.
I can explain why training-time preprocessing must match serving-time preprocessing.
I can design model versioning and rollback.
I can explain canary, blue/green, and traffic-splitting concepts.
I can identify request/response schema risks.
I can monitor prediction latency, error rate, throughput, and resource utilization.
I can explain how autoscaling and cold starts may affect user experience.
I can choose when custom containers are justified.
I can identify how to secure prediction endpoints.
I can describe how batch outputs should be validated before downstream use.
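
The canary and traffic-splitting idea above can be sketched conceptually. Managed endpoints typically handle traffic splitting for you; this hypothetical `route_model_version` function only shows the underlying idea of sending a fixed share of requests to a new version, using a hash so that the same request ID always lands on the same version.

```python
import hashlib


def route_model_version(request_id, canary_percent):
    """Deterministically assign a request to 'canary' or 'stable'.

    The request ID is hashed into one of 100 buckets; buckets below
    canary_percent go to the canary version.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_percent else "stable"


# Hypothetical request IDs: roughly 10% should reach the canary.
routes = [route_model_version(f"req-{i}", canary_percent=10) for i in range(10000)]
canary_share = routes.count("canary") / len(routes)
print(round(canary_share, 2))
```

Rolling back in this scheme means setting `canary_percent` to 0, which is why a versioned, still-deployable stable model matters.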

Monitoring, troubleshooting, and retraining

What to monitor

Monitoring area	Examples of questions to answer
Data quality	Are incoming features missing, invalid, stale, or outside expected ranges?
Training-serving skew	Do production feature distributions differ from training distributions?
Data drift	Has input data changed meaningfully since training?
Concept drift	Has the relationship between features and labels changed?
Prediction drift	Are predictions shifting in unexpected ways?
Model quality	Are delayed labels showing degraded accuracy, precision, recall, or business KPIs?
Service health	Are latency, error rates, saturation, and availability acceptable?
Cost	Did training, inference, storage, or data processing costs change unexpectedly?
Fairness and safety	Are outcomes or generated outputs creating unacceptable risk for user groups or use cases?

Troubleshooting scenarios

Symptom	Likely checks
Training job cannot read data	IAM permissions, service account, project/dataset access, path, network restrictions.
Training job fails after dependency install	Container image, package versions, runtime mismatch, missing system libraries.
Model overfits	More validation, regularization, simpler model, more data, feature review, early stopping.
Offline metrics are good but production is poor	Leakage, skew, wrong preprocessing, changed data, label delay, wrong threshold, metric mismatch.
Prediction endpoint returns errors	Input schema, model signature, container health, permissions, resource limits, dependency loading.
Latency is too high	Model size, preprocessing cost, cold start, accelerator choice, batching, caching, endpoint scaling.
Batch prediction output is incomplete	Input format, failed records, permissions, output location, quota/resource constraints.
Monitoring alerts are noisy	Thresholds, seasonality, alert grouping, baseline choice, business relevance.
Retraining makes model worse	Bad new data, label delay, drift misdiagnosis, changed objective, missing evaluation gate.

Can you do this?

Identify whether a failure is data, model, pipeline, infrastructure, or permission related.
Explain why delayed ground truth affects monitoring.
Define a retraining trigger that is not just “run every day.”
Choose rollback when retraining produces a worse model.
Distinguish model drift from a temporary business event.
Explain why monitoring generative AI outputs may require human or domain-specific evaluation.
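
One common way to quantify the data drift described above is the population stability index (PSI), which compares binned feature distributions between training and serving data. This is a swapped-in illustrative technique, not something this blueprint or a specific Google Cloud service prescribes; the bin edges and sample values are hypothetical.

```python
import math


def population_stability_index(expected, actual, bin_edges, eps=1e-6):
    """Compute PSI between two samples over shared bins.

    Values outside the bin edges are ignored; empty bins are floored
    at eps so the logarithm stays defined.
    """
    def bin_fractions(values):
        counts = [0] * (len(bin_edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                in_bin = bin_edges[i] <= v < bin_edges[i + 1]
                if in_bin or (is_last and v == bin_edges[-1]):
                    counts[i] += 1
                    break
        total = max(sum(counts), 1)
        return [max(c / total, eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values at training time and at serving time.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
serving_same = list(training)
serving_shifted = [0.6, 0.7, 0.8, 0.9, 0.9, 0.95, 0.8, 0.85]

print(population_stability_index(training, serving_same, edges))
print(population_stability_index(training, serving_shifted, edges) > 0)
```

A PSI near zero suggests a stable input distribution. Any alerting threshold is a judgment call that should account for seasonality and business relevance, and a high PSI signals data drift only; confirming concept drift still requires labels.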

Security, privacy, and governance checklist

IAM and access control

Topic	Readiness expectation
Least privilege	Grant only the permissions needed for data access, training, deployment, and monitoring.
Service accounts	Use workload-specific service accounts instead of broad user credentials.
Project boundaries	Understand how projects separate environments, teams, billing, and access.
Dataset access	Control who can view raw data, transformed data, labels, predictions, and model outputs.
Model access	Treat models and embeddings as sensitive when they may reveal training data or business logic.
Secrets	Store secrets outside code and notebooks.
Audit logging	Know why audit trails matter for sensitive ML workflows.
Private connectivity	Recognize when private network paths reduce exposure.

Data protection and responsible AI

I can identify personally identifiable, confidential, regulated, or business-sensitive data in an ML pipeline.
I can explain data minimization and purpose limitation in practical terms.
I can describe encryption at rest and in transit conceptually.
I can reason about customer-managed encryption keys when stronger key control is required.
I can explain when de-identification, masking, tokenization, or aggregation may be needed.
I can describe how lineage supports auditability.
I can explain why fairness checks require both metric review and context.
I can identify when explainability is required for trust, debugging, or approval.
I can explain human review for high-impact predictions.
I can recognize prompt injection, data exfiltration, and unsafe output risks in generative AI systems.

Generative AI and retrieval-augmented workflows

For PMLE preparation, be ready to reason about modern ML engineering patterns that include generative AI, embeddings, and retrieval workflows on Google Cloud.

Generative AI decision checks

Requirement	Consider	Watch for
Summarize or draft text from trusted documents	Retrieval-augmented generation	Document freshness, access control, hallucination, citation quality.
Semantic search	Embeddings and vector search patterns	Embedding quality, indexing strategy, latency, relevance evaluation.
Domain-specific language	Prompt engineering, grounding, tuning, or custom model approach	Cost, data sensitivity, evaluation complexity.
Safer output	Safety filters, constraints, human review, evaluation sets	Overblocking, underblocking, ambiguous policy rules.
Reduce hallucination	Grounding, retrieval, constrained responses, verification	Source quality and prompt injection.
Improve consistency	Prompt templates, examples, evaluation harnesses	Overfitting to narrow examples.
Protect private data	Access controls, data minimization, logging review	Sensitive prompts, stored outputs, embeddings leakage.

RAG readiness checklist

I can describe the difference between model knowledge and retrieved context.
I can outline document ingestion, chunking, embedding, indexing, retrieval, prompt assembly, generation, and evaluation.
I can explain why chunk size and overlap affect retrieval quality.
I can identify stale or unauthorized documents as RAG risks.
I can define evaluation criteria: relevance, groundedness, factuality, completeness, safety, and latency.
I can reason about access control at retrieval time, not just at ingestion time.
I can identify prompt injection risks from retrieved documents.
I can explain why human review may be needed for high-impact generated content.
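
The effect of chunk size and overlap noted above can be seen in a minimal word-based chunker. Real pipelines often chunk by tokens, sentences, or document structure; `chunk_text` and its parameters are illustrative assumptions.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into word chunks of chunk_size, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Hypothetical document of ten single-letter "words".
document = "a b c d e f g h i j"
print(chunk_text(document, chunk_size=4, overlap=1))
print(chunk_text(document, chunk_size=4, overlap=0))
```

With overlap, a sentence that straddles a chunk boundary is more likely to appear intact in at least one chunk, at the cost of more chunks to embed, index, and store.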

Cost, reliability, and performance tradeoffs

Design choice	Cost/performance questions
AutoML vs custom training	Is the productivity gain worth less control? Does the model need special architecture or preprocessing?
CPU vs accelerator	Does the workload benefit enough from acceleration to justify complexity and cost?
Large model vs smaller model	Is quality improvement worth latency, memory, and serving cost?
Online vs batch	Does the business truly need real-time predictions?
Frequent retraining	Is retraining driven by measurable drift or business need?
Feature store adoption	Do reuse, consistency, and online/offline parity justify the added operational design?
Streaming pipeline	Is near-real-time value worth complexity versus scheduled batch?
Custom serving	Is flexibility worth added operational responsibility?
Data movement	Can training or inference happen where the data already resides?
Monitoring depth	Are alerts actionable and aligned to business risk?

Can you do this?

Choose a simpler architecture when complexity is not justified.
Identify hidden costs from large-scale experimentation.
Explain why reducing model size may improve reliability.
Compare latency, throughput, and cost in serving decisions.
Design batch workflows that meet business deadlines without online serving.
Identify when caching, batching, or asynchronous processing is appropriate.
Explain how failed pipelines can create downstream business risk.

Architecture scenario drills

Use these prompts for final review. For each one, state the ML approach, Google Cloud services, data path, evaluation metric, deployment pattern, monitoring plan, and security controls.

Scenario 1: Fraud detection

Checklist:

Define prediction target and label delay.
Address class imbalance.
Choose precision/recall tradeoff based on investigation capacity.
Prevent leakage from post-transaction fields.
Use time-aware validation.
Design online or near-real-time scoring if required.
Monitor drift, false positives, false negatives, and alert fatigue.
Secure sensitive transaction and user data.

Scenario 2: Demand forecasting

Checklist:

Use historical time windows correctly.
Avoid random split leakage.
Add seasonality, calendar, price, promotion, and inventory features when available.
Choose forecast horizon and granularity.
Evaluate with backtesting.
Decide batch prediction schedule.
Monitor forecast error by segment.
Handle new products or sparse history.
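
Backtesting from the checklist above can be sketched as rolling-origin splits, where each fold trains on all periods up to an origin and tests on the next `horizon` periods. The function name and parameters are hypothetical, and periods are represented as simple integer indices.

```python
def rolling_origin_splits(n_periods, initial_train, horizon, step=1):
    """Return (train_indices, test_indices) folds that respect time order."""
    splits = []
    train_end = initial_train
    while train_end + horizon <= n_periods:
        train_idx = list(range(0, train_end))
        test_idx = list(range(train_end, train_end + horizon))
        splits.append((train_idx, test_idx))
        train_end += step
    return splits


# Hypothetical setup: 10 weekly periods, 6 for the first fit, 2-week horizon.
folds = rolling_origin_splits(n_periods=10, initial_train=6, horizon=2, step=2)
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Every test index is later than every training index in its fold, which is the property a random split does not guarantee.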

Scenario 3: Product recommendation

Checklist:

Identify explicit and implicit feedback.
Handle cold-start users and items.
Separate candidate generation from ranking if needed.
Include business constraints such as availability or diversity.
Evaluate offline and online behavior carefully.
Watch for feedback loops and popularity bias.
Decide batch vs online updates.
Monitor click-through, conversion, user satisfaction, and fairness concerns.

Scenario 4: Image classification

Checklist:

Validate label quality and class balance.
Decide managed AutoML vs custom model.
Use train/validation/test split without duplicate leakage.
Consider augmentation and transfer learning.
Evaluate per-class metrics.
Choose deployment pattern based on latency and device needs.
Monitor image distribution changes.
Secure stored images and model outputs.

Scenario 5: Generative AI support assistant

Checklist:

Decide prompt-only, RAG, tuning, or custom model approach.
Identify source documents and access controls.
Design retrieval, grounding, and citation behavior.
Evaluate factuality, relevance, safety, and refusal behavior.
Protect sensitive user prompts and retrieved content.
Monitor hallucinations, unsafe outputs, latency, and cost.
Include human escalation for high-risk answers.
Plan document refresh and evaluation updates.

Common weak areas and traps

Trap	Why it hurts exam performance	Better habit
Choosing the most advanced model by default	PMLE scenarios often reward fit-for-purpose design.	Start with business goal, data, constraints, and baseline.
Ignoring leakage	Leakage creates unrealistic metrics and poor production performance.	Ask: “Would this feature exist at prediction time?”
Using random splits for time-series data	Future information can leak into training.	Use time-aware validation and backtesting.
Optimizing accuracy on imbalanced data	Accuracy can hide failure on rare but important cases.	Use precision, recall, PR-AUC, cost-based thresholds.
Treating notebooks as production pipelines	Notebooks alone often lack reproducibility and controls.	Move repeatable steps into versioned pipelines.
Forgetting training-serving skew	Different preprocessing breaks production behavior.	Reuse transformations and validate production inputs.
Monitoring only CPU and memory	ML systems fail through data and behavior changes too.	Monitor data, predictions, quality, and business metrics.
Retraining automatically without gates	Bad new data can make the model worse.	Use validation, approval, and rollback criteria.
Granting broad permissions	Overly broad IAM increases risk.	Use least privilege and workload-specific service accounts.
Moving large data unnecessarily	Data movement adds cost, latency, and complexity.	Train or score near the data when practical.
Ignoring delayed labels	Quality monitoring may lag behind production changes.	Use proxy metrics plus delayed ground-truth evaluation.
Assuming generative AI is always the answer	It may add cost, latency, and risk.	Use rules, search, retrieval, or smaller models when sufficient.
Skipping human review	Some outputs or predictions carry high impact.	Add review, escalation, and documentation where needed.
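The accuracy trap in the table is easy to see with numbers. The sketch below uses invented data (10 positives among 1,000 examples) and two hypothetical models: one that always predicts the negative class, and one that catches most positives at the cost of some false alarms.

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# 10 positives followed by 990 negatives (illustrative data only).
y_true = [1] * 10 + [0] * 990

# Model A never flags anything.
always_negative = [0] * 1000

# Model B finds 8 of 10 positives and raises 20 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

model_a = metrics(y_true, always_negative)
model_b = metrics(y_true, detector)
```

Model A scores higher on accuracy (0.99 against 0.978) while finding none of the positives; Model B has a recall of 0.8. Which one is preferable depends on the cost of a missed positive relative to a false alarm, which is why the threshold should be set from business cost rather than accuracy alone.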

“Can you do this?” final skill checklist

Design and architecture

I can design an end-to-end ML solution on Google Cloud from raw data to monitored predictions.
I can justify service choices without relying only on memorized product names.
I can compare AutoML, custom training, and BigQuery ML for a scenario.
I can choose batch, streaming, or online serving patterns.
I can identify the simplest architecture that satisfies requirements.
I can include security, governance, monitoring, and cost controls in the design.

Data and modeling

I can detect leakage, skew, drift, imbalance, and bad labels.
I can choose the right split strategy.
I can select metrics aligned with business risk.
I can explain overfitting, underfitting, bias, variance, and regularization.
I can reason about feature engineering for tabular, text, image, time-series, and event data.
I can explain when embeddings are useful.
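For the split-strategy item, a time-ordered dataset calls for folds in which every test row comes after every training row. The sketch below shows one possible expanding-window scheme over row indices that are assumed to be sorted by time already; the function name and fold sizes are illustrative choices, not a prescribed method.

```python
def expanding_window_splits(n_rows, n_folds, test_size):
    """Yield (train_indices, test_indices); test always follows train in time.

    Assumes rows are already sorted from oldest to newest.
    """
    for fold in range(n_folds):
        test_end = n_rows - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough rows for the requested folds")
        yield list(range(test_start)), list(range(test_start, test_end))

splits = list(expanding_window_splits(n_rows=10, n_folds=3, test_size=2))
for train_idx, test_idx in splits:
    # No training row is newer than any test row in the same fold.
    assert max(train_idx) < min(test_idx)
```

With 10 rows, 3 folds, and a test size of 2, the folds train on rows 0-3, 0-5, and 0-7 and test on rows 4-5, 6-7, and 8-9 respectively. A random split over the same rows would let later rows inform predictions about earlier ones.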

MLOps

I can describe pipeline components and artifacts.
I can explain how to version data, code, parameters, models, and environments.
I can design evaluation gates before deployment.
I can plan rollback and retraining.
I can monitor both infrastructure and model behavior.
I can troubleshoot failures across data, code, IAM, serving, and monitoring.
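An evaluation gate before deployment can be as simple as a function that compares a candidate model's metrics with the current baseline and returns a decision plus reasons. The sketch below is one possible shape; the metric names and the thresholds (`min_recall`, `max_regression`) are hypothetical and would come from the team's own requirements.

```python
def passes_gate(candidate, baseline, min_recall=0.80, max_regression=0.01):
    """Return (approved, reasons) for promoting a candidate model.

    Thresholds are illustrative defaults, not recommended values.
    """
    reasons = []
    if candidate["recall"] < min_recall:
        reasons.append(
            f"recall {candidate['recall']:.2f} is below the floor {min_recall:.2f}"
        )
    if candidate["pr_auc"] < baseline["pr_auc"] - max_regression:
        reasons.append(
            f"PR-AUC {candidate['pr_auc']:.2f} regressed from "
            f"baseline {baseline['pr_auc']:.2f}"
        )
    return (not reasons, reasons)

baseline = {"recall": 0.82, "pr_auc": 0.70}
good = passes_gate({"recall": 0.85, "pr_auc": 0.71}, baseline)
bad = passes_gate({"recall": 0.75, "pr_auc": 0.60}, baseline)
```

Returning the reasons alongside the decision gives the pipeline something to log and gives a reviewer a starting point when a retrained model is blocked.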

Security and responsible AI

I can apply least privilege to ML workflows.
I can identify sensitive data in training and prediction flows.
I can explain auditability, lineage, and model documentation.
I can reason about explainability and fairness requirements.
I can identify prompt injection, hallucination, unsafe output, and data leakage risks.
I can add human review where automation alone is risky.

Final-week review checklist

Use this section to focus review time before the exam.

7 to 5 days out

Review Google Cloud ML service selection: Vertex AI, BigQuery ML, Dataflow, Pub/Sub, Cloud Storage, BigQuery, Cloud Run, Google Kubernetes Engine, and supporting security services.
Rebuild your mental map of the ML lifecycle: data, features, training, evaluation, registry, deployment, monitoring, retraining.
Drill metric selection for classification, regression, forecasting, ranking, and generative AI.
Practice spotting leakage and skew in short scenarios.
Review IAM and service account patterns for ML workflows.
Summarize when to use online prediction, batch prediction, and streaming inference.

4 to 2 days out

Work through architecture scenarios without notes.
For each missed question, classify the miss: service selection, metric choice, data issue, MLOps, security, or troubleshooting.
Review common traps, especially overfitting, class imbalance, time leakage, and overbroad permissions.
Practice explaining tradeoffs in one or two sentences.
Review model monitoring and retraining decision points.
Review generative AI/RAG risks: grounding, evaluation, safety, prompt injection, and access control.

Final 24 hours

Review your weakest service-selection tables.
Recheck metric formulas and threshold tradeoffs.
Review the difference between data drift, concept drift, prediction drift, and training-serving skew.
Review pipeline artifact flow and rollback logic.
Review security defaults: least privilege, service accounts, sensitive data handling, auditability.
Avoid cramming obscure limits or quotas unless your own study materials require them.
Rest enough to read scenarios carefully and avoid rushing.
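As a quick refresher on what data drift looks like in numbers, the sketch below computes a Population Stability Index (PSI) between a training sample and a serving sample of one feature. PSI is one common choice among several drift measures; the bin edges, the toy values, and the small floor used to avoid division by zero are assumptions made for this example, and any alerting threshold would be set by the team.

```python
import math

def psi(expected, actual, edges):
    """Population Stability Index over shared bin edges.

    Bins are (-inf, e0), [e0, e1), ..., [e_last, +inf).
    """
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        eps = 1e-6  # floor so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

training = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
serving_same = list(training)
serving_shifted = [v + 5 for v in training]

edges = [4, 7]
no_drift = psi(training, serving_same, edges)
drift = psi(training, serving_shifted, edges)
```

Identical distributions give a PSI of zero, and the shifted sample gives a clearly positive value. Note that this measures a change in the input distribution only; concept drift (the relationship between inputs and labels changing) needs labeled outcomes to detect.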

Practical next step

Pick one weak area from the readiness map and turn it into a short drill: write the problem type, data source, Google Cloud services, model approach, metric, deployment pattern, monitoring plan, and security controls. Then compare your answer against the checklist above and repeat with a different scenario until your choices feel deliberate rather than memorized.
