PMLE — Google Cloud Professional Machine Learning Engineer - 2026 Guide Exam Blueprint

Practical exam blueprint for Google Cloud PMLE exam readiness.

How to Use This Exam Blueprint

Use this checklist as a practical readiness map for the Google Cloud Professional Machine Learning Engineer - 2026 Guide exam (code PMLE). It is organized around the skills a professional machine learning engineer is expected to apply on Google Cloud: framing ML problems, preparing data, training models, deploying solutions, operating ML systems, and making security, governance, cost, and reliability tradeoffs.

Do not treat this as an official scoring guide or a list of exact exam weights. Instead, use it to answer three questions:

  1. Can I choose the right Google Cloud approach for a realistic ML scenario?
  2. Can I explain the lifecycle from data to deployed, monitored model?
  3. Can I identify weak designs, operational risks, and better alternatives?

For each section, mark topics as:

Status | Meaning
Ready | You can explain it, choose it in a scenario, and spot common traps.
Review | You understand the concept but hesitate on service selection or tradeoffs.
Practice | You need hands-on review, scenario drills, or architecture comparison.

Exam identity

Item | Checklist detail
Vendor/provider | Google Cloud
Official exam title | Google Cloud Professional Machine Learning Engineer - 2026 Guide
Exam code | PMLE
Page purpose | Independent exam blueprint and final-review support
Readiness focus | Applied ML engineering, Google Cloud service selection, MLOps, security, data workflows, deployment, monitoring, and troubleshooting

Topic-area readiness map

Readiness area | What you should be able to do | Ready?
ML problem framing | Translate business goals into supervised, unsupervised, forecasting, recommendation, ranking, anomaly detection, or generative AI approaches. | [ ]
Data sourcing and ingestion | Choose patterns for batch, streaming, structured, semi-structured, and unstructured data using Google Cloud services. | [ ]
Data quality and feature readiness | Detect leakage, skew, missing values, outliers, label quality issues, imbalance, and schema drift. | [ ]
Exploratory analysis | Use statistical summaries, distributions, correlation, segmentation, and visual checks to validate assumptions. | [ ]
Feature engineering | Select appropriate encoding, normalization, text/image/audio transformations, embeddings, feature crosses, and time-based features. | [ ]
Model selection | Match model families and Google Cloud options to problem type, data size, latency needs, interpretability, and operational constraints. | [ ]
Training architecture | Choose between AutoML, custom training, BigQuery ML, notebooks, distributed training, accelerators, and containerized jobs. | [ ]
Hyperparameter tuning | Explain tuning goals, search strategies, validation sets, early stopping, overfitting risks, and resource tradeoffs. | [ ]
Evaluation metrics | Choose metrics for classification, regression, ranking, recommendation, forecasting, anomaly detection, and generative AI evaluation. | [ ]
Vertex AI workflows | Understand how Vertex AI supports datasets, training, pipelines, model registry, endpoints, batch prediction, experiments, and monitoring. | [ ]
BigQuery ML | Recognize when in-database training and inference reduce data movement and simplify analytics-oriented ML workflows. | [ ]
Pipelines and MLOps | Design reproducible, versioned workflows with orchestration, metadata, testing, CI/CD, approvals, and rollback paths. | [ ]
Deployment patterns | Compare online prediction, batch prediction, embedded inference, streaming inference, and custom serving on managed compute. | [ ]
Monitoring and retraining | Monitor model performance, data drift, prediction skew, service health, latency, errors, and retraining triggers. | [ ]
Security and IAM | Apply least privilege, service accounts, data access controls, encryption choices, auditability, and private connectivity patterns. | [ ]
Governance and responsible AI | Address explainability, fairness, documentation, human review, privacy, safety, lineage, and model approval needs. | [ ]
Cost and performance | Balance training cost, serving latency, accelerator usage, data movement, batch windows, caching, and operational complexity. | [ ]
Troubleshooting | Diagnose failed training jobs, poor metrics, serving errors, data pipeline failures, skew, permissions, and scaling issues. | [ ]

Core PMLE readiness checklist

ML problem framing

You should be able to answer scenario questions such as: “What type of ML solution fits this business problem, and what are the risks?”

Scenario cue | Strong answer should consider
“Predict a numeric value” | Regression, baseline model, RMSE/MAE, outlier sensitivity, feature leakage.
“Classify user behavior” | Binary or multiclass classification, precision/recall tradeoff, class imbalance, threshold tuning.
“Detect rare fraud or defects” | Anomaly detection or imbalanced classification, recall importance, false-positive cost, alert review workflow.
“Forecast future demand” | Time-aware splits, seasonality, holidays/events, leakage prevention, forecast horizon, retraining cadence.
“Recommend products or content” | Candidate generation, ranking, implicit feedback, cold start, diversity, business rules, online evaluation.
“Search or match semantic meaning” | Embeddings, vector search, retrieval quality, latency, grounding data freshness.
“Generate text, code, or summaries” | Prompt design, grounding, evaluation, safety, hallucination risk, human review, data privacy.
“Explain why predictions happen” | Explainability, feature attribution, model choice, stakeholder requirements, auditability.

Checklist:

  • I can define the prediction target clearly.
  • I can identify whether labels exist and whether they are reliable.
  • I can distinguish correlation from causation in exam scenarios.
  • I can choose offline and online evaluation metrics aligned to business impact.
  • I can explain why a non-ML solution may be better when rules, SQL, or simple automation is enough.
  • I can identify when human-in-the-loop review is necessary.
  • I can describe failure modes before choosing a model.

Data readiness and feature engineering

Topic | What “ready” means
Data sources | You can identify whether data belongs in BigQuery, Cloud Storage, Pub/Sub, operational databases, or specialized stores based on access pattern and structure.
Batch ingestion | You can reason about scheduled loads, transformations, validation, lineage, and reproducibility.
Streaming ingestion | You can reason about event-time processing, late data, windowing, deduplication, and real-time features.
Data quality | You can detect missing values, invalid categories, inconsistent units, duplicates, bad labels, and outlier handling choices.
Schema management | You can explain why schema changes can break training, serving, and monitoring.
Train/validation/test splits | You can choose random, stratified, group-based, or time-based splits appropriately.
Feature leakage | You can spot features that would not be available at prediction time.
Feature skew | You can explain training-serving skew and how consistent preprocessing reduces it.
Feature stores | You can describe why reusable, versioned, point-in-time-correct features matter.
Embeddings | You can explain when embeddings help with text, images, recommendations, semantic search, or generative AI retrieval.

Can you do this?

  • Given a table of events, identify the label, entity, timestamp, features, and prediction time.
  • Explain why random splitting is risky for time-series or user-level grouped data.
  • Identify leakage from future aggregates, post-outcome fields, manually corrected labels, or target-derived columns.
  • Choose whether preprocessing belongs in SQL, Dataflow, pipeline components, training code, or serving code.
  • Explain the difference between data drift, concept drift, and prediction drift.
  • Describe how to validate data before starting expensive training jobs.
  • Identify when feature normalization, standardization, bucketization, encoding, or dimensionality reduction may help.
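The splitting and leakage items above can be made concrete with a small sketch. It assumes hypothetical event rows with an `event_date` field and an arbitrary cutoff date; it shows one simple way to do a time-based split, not a prescribed method.

```python
from datetime import date


def time_based_split(rows, cutoff, timestamp_key="event_date"):
    """Train on rows strictly before the cutoff; hold out the rest."""
    train = [r for r in rows if r[timestamp_key] < cutoff]
    holdout = [r for r in rows if r[timestamp_key] >= cutoff]
    return train, holdout


# Hypothetical events; field names and values are illustrative only.
events = [
    {"user_id": "u1", "event_date": date(2024, 1, 5), "label": 0},
    {"user_id": "u2", "event_date": date(2024, 2, 10), "label": 1},
    {"user_id": "u1", "event_date": date(2024, 3, 15), "label": 0},
    {"user_id": "u3", "event_date": date(2024, 4, 1), "label": 1},
]

train, holdout = time_based_split(events, cutoff=date(2024, 3, 1))

# Every training row is older than every holdout row, so training
# never uses information from the evaluation period.
latest_train = max(r["event_date"] for r in train)
earliest_holdout = min(r["event_date"] for r in holdout)
print(len(train), len(holdout), latest_train < earliest_holdout)
```

Note that user `u1` appears on both sides of the cutoff. If user-level leakage matters for the scenario, a group-based split would be needed in addition to the time cutoff.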

Google Cloud service selection checklist

High-level service decision map

If the scenario emphasizes… | Consider… | Watch for…
Fast managed model development with less custom code | Vertex AI AutoML capabilities | Need for custom architecture, special preprocessing, or explainability constraints.
Custom model code, frameworks, or containers | Vertex AI custom training | Container dependencies, accelerator needs, packaging, reproducibility.
Data already lives in BigQuery and model is analytics-oriented | BigQuery ML | Data movement, SQL-based workflow, supported model type, operational serving needs.
Repeatable ML workflows | Vertex AI Pipelines | Artifact tracking, component boundaries, parameterization, metadata, CI/CD.
Managed model deployment | Vertex AI endpoints or batch prediction | Latency, scaling, traffic splitting, monitoring, model versioning.
Large-scale batch transformation | Dataflow, BigQuery, or Dataproc depending on workload | Windowing, cost, pipeline complexity, team skill set.
Containerized custom APIs | Cloud Run or Google Kubernetes Engine | Operational burden, scaling behavior, networking, model loading time.
Streaming features or inference | Pub/Sub, Dataflow, online serving patterns | Event time, backpressure, state, latency, monitoring.
Semantic retrieval or RAG | Vertex AI, embeddings, vector search patterns, managed data stores | Grounding quality, freshness, access control, prompt injection risk.
Experiment tracking and reproducibility | Vertex AI Experiments, metadata, model registry patterns | Missing lineage, untracked parameters, unversioned datasets.

Storage and data platform checks

Data/workload pattern | Readiness prompts
Analytical warehouse | Can you explain when BigQuery is a good fit for feature creation, large-scale analysis, and BigQuery ML?
Object storage | Can you explain when Cloud Storage is appropriate for raw data, training files, model artifacts, images, audio, and exports?
Streaming events | Can you reason about Pub/Sub and downstream processing for near-real-time ML features or predictions?
Operational serving data | Can you identify when low-latency application databases or caches are needed alongside ML services?
Sensitive data | Can you apply IAM, encryption choices, data minimization, masking, and audit logging concepts?
Cross-project access | Can you reason about service accounts, project boundaries, shared datasets, and least privilege?

Model development and training readiness

Model selection prompts

Problem | Candidate model families or approaches | Exam-style tradeoffs
Binary classification | Logistic regression, tree-based models, neural networks, AutoML | Interpretability, threshold tuning, imbalance, calibration.
Multiclass classification | Softmax models, boosted trees, deep learning, AutoML | Confusion among similar classes, class imbalance, label quality.
Regression | Linear models, tree-based models, neural networks | Outliers, metric choice, feature scaling, prediction intervals.
Forecasting | Time-series models, feature-based regression, managed forecasting options | Leakage, seasonality, horizon, backtesting.
Recommendations | Collaborative filtering, matrix factorization, ranking models, embeddings | Cold start, feedback loops, diversity, business rules.
Anomaly detection | Statistical thresholds, unsupervised models, supervised rare-event models | False positives, drift, alert fatigue, label scarcity.
Computer vision | AutoML vision workflows, custom deep learning, transfer learning | Data volume, labeling, augmentation, serving latency.
NLP | Text classification, embeddings, sequence models, generative AI | Tokenization, context limits, grounding, bias, privacy.
Generative AI | Prompting, retrieval-augmented generation, tuning, evaluation workflows | Hallucination, safety, data leakage, cost, latency.

Training architecture checklist

  • I can decide whether to use managed AutoML, BigQuery ML, or custom training.
  • I can describe the artifacts needed for custom training: source code, dependencies, container image, training data reference, output model location, metrics.
  • I can explain why containers improve reproducibility.
  • I can choose CPU, GPU, or TPU-style acceleration conceptually based on workload type.
  • I can explain distributed training tradeoffs without assuming it always improves performance.
  • I can identify when hyperparameter tuning is worth the additional cost.
  • I can design train/validation/test separation for reliable evaluation.
  • I can explain early stopping, regularization, dropout, pruning, and model complexity controls.
  • I can recognize overfitting, underfitting, high bias, and high variance from learning curves.
  • I can explain why reproducibility requires versioned data, code, parameters, environment, and random seeds where applicable.

Evaluation metric checks

Know when each metric is appropriate.

Metric area | Use when… | Watch for…
Accuracy | Classes are balanced and error costs are similar. | Misleading for rare-event detection.
Precision | False positives are expensive. | High precision can miss many true positives.
Recall | False negatives are expensive. | High recall can create alert fatigue.
F1 score | Need a balance of precision and recall. | May hide business-specific error costs.
ROC-AUC | Ranking binary predictions across thresholds. | Can look optimistic with severe imbalance.
PR-AUC | Positive class is rare and precision/recall matter. | More informative than ROC in many imbalanced cases.
RMSE | Large regression errors should be penalized strongly. | Sensitive to outliers.
MAE | Need average absolute error that is easier to interpret. | Does not emphasize large errors as strongly.
MAPE | Percentage error is meaningful. | Problematic near zero actual values.
Log loss | Probabilistic classification quality matters. | Penalizes overconfident wrong predictions.
Ranking metrics | Search, recommendation, ordered result quality. | Position, diversity, and business constraints matter.
Forecast backtesting | Time-series validation across historical windows. | Random splits can leak future information.
Generative AI evaluation | Output quality, groundedness, safety, relevance, factuality. | Human evaluation and domain criteria may be needed.

Key formulas to recognize:

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \]

\[ \text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \]
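To see why RMSE is described as sensitive to outliers while MAE is not, the following sketch compares the two on made-up numbers. The data values are hypothetical, and MAE is computed with its usual definition (mean of absolute errors).

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, matching the RMSE formula above."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical values chosen only to illustrate the effect.
actual = [10.0, 12.0, 11.0, 13.0]
small_errors = [11.0, 11.0, 12.0, 12.0]  # every prediction is off by 1
one_outlier = [11.0, 11.0, 12.0, 23.0]   # last prediction is off by 10

print(mae(actual, small_errors), rmse(actual, small_errors))
print(mae(actual, one_outlier), rmse(actual, one_outlier))
```

With uniform errors the two metrics agree. With a single large miss, RMSE grows faster than MAE because the error is squared before averaging.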

Can you do this?

  • Given a confusion matrix, compute precision and recall.
  • Choose a threshold based on business costs.
  • Explain why improving offline AUC may not improve production value.
  • Identify the right metric for imbalanced classification.
  • Explain why forecast evaluation must respect time order.
  • Compare two models using both quality and operational constraints.
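As a worked example of the first item, here is a minimal sketch that computes precision, recall, F1, and accuracy from confusion-matrix counts. The counts are hypothetical and chosen to resemble a rare-event problem.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts: 120 actual positives among 10,000 examples.
metrics = classification_metrics(tp=80, fp=20, fn=40, tn=9860)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example accuracy is 0.994 even though one third of the actual positives are missed (recall is about 0.667), which is the pattern the accuracy row in the metric table warns about.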

BigQuery ML readiness

BigQuery ML is often relevant when data is already in BigQuery and the team wants SQL-based model development or inference.

You should be able to recognize patterns like:

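-- Train a logistic regression classifier directly in BigQuery.
-- The project, dataset, table, and column names are placeholders.
CREATE OR REPLACE MODEL `project.dataset.model_name`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['label']  -- column to predict; other selected columns are features
) AS
SELECT
  label,
  feature_1,
  feature_2,
  feature_3
FROM `project.dataset.training_table`;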

And evaluation patterns like:

-- Compute evaluation metrics for the trained model on a separate evaluation table.
SELECT *
FROM ML.EVALUATE(
  MODEL `project.dataset.model_name`,
  TABLE `project.dataset.eval_table`
);

Checklist:

  • I can explain when BigQuery ML reduces data movement.
  • I can identify when SQL-based feature engineering is sufficient.
  • I can distinguish training, evaluation, prediction, and explainability-style workflows conceptually.
  • I can recognize that model choice must match the problem type and data shape.
  • I can explain when a model trained in BigQuery may still need an operational serving strategy.
  • I can reason about access control for datasets, models, and prediction outputs.
  • I can identify when exporting artifacts or integrating with Vertex AI workflows may be useful.

Vertex AI readiness

Vertex AI lifecycle areas

Area | What to know
Datasets | How managed datasets can organize data for training and evaluation workflows.
Training | Difference between managed AutoML-style training and custom training jobs.
Workbench/notebooks | How notebooks support experimentation, and why they should not be the only production workflow.
Experiments | Why parameter, metric, and artifact tracking matter.
Pipelines | How reusable components create reproducible end-to-end workflows.
Metadata | Why lineage supports debugging, governance, and reproducibility.
Model Registry | How versioned models support approval, deployment, rollback, and auditability.
Endpoints | How online prediction requires scaling, latency, traffic management, and monitoring decisions.
Batch prediction | When offline scoring is better than real-time serving.
Monitoring | How to detect drift, skew, data quality issues, prediction changes, and service health concerns.
Explainability | When feature attribution or explanation support is needed for trust, debugging, or review.

Vertex AI scenario checks

Scenario | Better reasoning
“The team wants minimal ML code and a managed workflow.” | Consider managed training options, but verify data type, customization needs, and deployment requirements.
“The model needs a custom TensorFlow/PyTorch/scikit-learn architecture.” | Consider custom training with a reproducible environment and tracked artifacts.
“Training succeeds locally but fails in the cloud.” | Check dependencies, container image, permissions, paths, data access, region/project configuration, and resource assumptions.
“A model version performs worse after deployment.” | Check data skew, traffic split, metric definitions, feature pipeline changes, and rollback options.
“A pipeline step is nondeterministic.” | Check random seeds, versioned inputs, container versions, external data dependencies, and metadata tracking.
“Business requires approval before production.” | Use model registry, evaluation gates, documentation, access controls, and deployment controls.

MLOps and pipeline readiness

End-to-end workflow

    flowchart LR
        A[Define ML objective] --> B[Ingest and validate data]
        B --> C[Engineer features]
        C --> D[Train model]
        D --> E[Evaluate and compare]
        E --> F{Meets release criteria?}
        F -- No --> C
        F -- Yes --> G[Register model]
        G --> H[Deploy or batch score]
        H --> I[Monitor data, model, and service]
        I --> J{Retrain or rollback?}
        J -- Retrain --> B
        J -- Rollback --> G

Pipeline readiness checklist

  • I can break an ML workflow into reusable components.
  • I can explain the difference between orchestration and model training.
  • I can identify pipeline inputs, outputs, artifacts, and parameters.
  • I can explain why each component should be idempotent where practical.
  • I can describe how metadata helps reproduce a model.
  • I can design evaluation gates before deployment.
  • I can explain CI/CD for ML as more than code deployment: it includes data, model, and pipeline validation.
  • I can compare manual, scheduled, event-driven, and performance-triggered retraining.
  • I can identify rollback requirements for online and batch prediction.
  • I can explain why production ML needs monitoring beyond infrastructure metrics.

Testing checklist for ML systems

Test type | What to validate
Unit tests | Feature functions, parsing logic, custom transforms, metric calculations.
Data validation | Schema, ranges, nulls, categories, duplication, label quality, freshness.
Training tests | Job starts, reads data, writes artifacts, logs metrics, handles small sample runs.
Evaluation tests | Metrics are computed consistently and compared against baselines.
Pipeline tests | Components pass artifacts correctly and fail safely.
Serving tests | Prediction format, latency, error handling, model loading, response schema.
Security tests | Service account permissions, secret handling, data access boundaries.
Monitoring tests | Alerts fire for drift, skew, errors, latency, and failed jobs.
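The data validation row can be illustrated with a small schema check that runs before training starts. This is a minimal sketch with a hypothetical schema format and column names; production pipelines typically use a dedicated validation library rather than hand-written checks.

```python
def validate_rows(rows, schema):
    """Return a list of (row_index, column, problem) tuples."""
    problems = []
    for i, row in enumerate(rows):
        for column, rule in schema.items():
            value = row.get(column)
            if value is None:
                problems.append((i, column, "missing"))
            elif not isinstance(value, rule["type"]):
                problems.append((i, column, "wrong type"))
            elif "allowed" in rule and value not in rule["allowed"]:
                problems.append((i, column, "unexpected category"))
            elif "min" in rule and value < rule["min"]:
                problems.append((i, column, "below range"))
            elif "max" in rule and value > rule["max"]:
                problems.append((i, column, "above range"))
    return problems


# Hypothetical schema and rows, for illustration only.
schema = {
    "age": {"type": int, "min": 0, "max": 120},
    "country": {"type": str, "allowed": {"US", "DE", "JP"}},
}
rows = [
    {"age": 34, "country": "US"},
    {"age": -2, "country": "DE"},
    {"age": None, "country": "FR"},
    {"age": "41", "country": "JP"},
]

problems = validate_rows(rows, schema)
for problem in problems:
    print(problem)
```

A pipeline could fail fast when `problems` is non-empty, which avoids spending training budget on data that would not pass review anyway.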

Deployment and serving readiness

Serving pattern decision table

Pattern | Use when… | Key tradeoffs
Online prediction endpoint | Applications need low-latency predictions. | Scaling, latency, request format, traffic split, monitoring.
Batch prediction | Predictions can be generated on a schedule or for large datasets. | Batch windows, storage outputs, downstream consumption.
In-database prediction | Analytics users score data already in BigQuery. | SQL workflow, model support, integration with reporting.
Streaming inference | Events need rapid scoring as they arrive. | Pipeline complexity, state, backpressure, error handling.
Custom serving container | Need custom preprocessing, model server, or runtime. | More responsibility for packaging, dependencies, health checks.
Application-embedded model | Small model close to application logic. | Versioning, rollout, monitoring, and update complexity.
Edge or client-side inference | Need offline, low-latency, or privacy-sensitive local inference. | Model size, update strategy, device variability.

Deployment checklist

  • I can choose online vs batch prediction based on latency and business process.
  • I can explain why training-time preprocessing must match serving-time preprocessing.
  • I can design model versioning and rollback.
  • I can explain canary, blue/green, and traffic-splitting concepts.
  • I can identify request/response schema risks.
  • I can monitor prediction latency, error rate, throughput, and resource utilization.
  • I can explain how autoscaling and cold starts may affect user experience.
  • I can choose when custom containers are justified.
  • I can identify how to secure prediction endpoints.
  • I can describe how batch outputs should be validated before downstream use.
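The canary and traffic-splitting idea from the checklist can be sketched as a deterministic router. Managed endpoints generally expose traffic splitting as configuration, so this is only a conceptual illustration for custom serving; the function name, the request ID format, and the 10% share are assumptions.

```python
import hashlib


def route_request(request_id, canary_percent):
    """Deterministically map a request ID to 'canary' or 'stable'."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "canary" if bucket < canary_percent else "stable"


# Hypothetical request IDs; send roughly 10% of traffic to the new version.
request_ids = [f"req-{i}" for i in range(10_000)]
canary_share = sum(
    route_request(r, 10) == "canary" for r in request_ids
) / len(request_ids)
print(f"canary share: {canary_share:.3f}")
```

Hashing the ID rather than drawing a random number means the same request (or user, if a user ID is hashed instead) always reaches the same model version, which makes comparisons and rollback easier to reason about.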

Monitoring, troubleshooting, and retraining

What to monitor

Monitoring area | Examples of questions to answer
Data quality | Are incoming features missing, invalid, stale, or outside expected ranges?
Training-serving skew | Do production feature distributions differ from training distributions?
Data drift | Has input data changed meaningfully since training?
Concept drift | Has the relationship between features and labels changed?
Prediction drift | Are predictions shifting in unexpected ways?
Model quality | Are delayed labels showing degraded accuracy, precision, recall, or business KPIs?
Service health | Are latency, error rates, saturation, and availability acceptable?
Cost | Did training, inference, storage, or data processing costs change unexpectedly?
Fairness and safety | Are outcomes or generated outputs creating unacceptable risk for user groups or use cases?

Troubleshooting scenarios

Symptom | Likely checks
Training job cannot read data | IAM permissions, service account, project/dataset access, path, network restrictions.
Training job fails after dependency install | Container image, package versions, runtime mismatch, missing system libraries.
Model overfits | More validation, regularization, simpler model, more data, feature review, early stopping.
Offline metrics are good but production is poor | Leakage, skew, wrong preprocessing, changed data, label delay, wrong threshold, metric mismatch.
Prediction endpoint returns errors | Input schema, model signature, container health, permissions, resource limits, dependency loading.
Latency is too high | Model size, preprocessing cost, cold start, accelerator choice, batching, caching, endpoint scaling.
Batch prediction output is incomplete | Input format, failed records, permissions, output location, quota/resource constraints.
Monitoring alerts are noisy | Thresholds, seasonality, alert grouping, baseline choice, business relevance.
Retraining makes model worse | Bad new data, label delay, drift misdiagnosis, changed objective, missing evaluation gate.

Can you do this?

  • Identify whether a failure is data, model, pipeline, infrastructure, or permission related.
  • Explain why delayed ground truth affects monitoring.
  • Define a retraining trigger that is not just “run every day.”
  • Choose rollback when retraining produces a worse model.
  • Distinguish model drift from a temporary business event.
  • Explain why monitoring generative AI outputs may require human or domain-specific evaluation.
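One way to define a retraining trigger that is not purely time-based is to combine a drift statistic with a delayed-label quality check. The sketch below uses the population stability index (PSI) over pre-binned feature proportions. PSI is only one of several possible drift measures, and the bins and thresholds shown are assumptions that would need tuning per feature and business context.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as lists of proportions."""
    if len(expected) != len(actual):
        raise ValueError("both distributions need the same bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against log(0) and division by zero
        a = max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def should_retrain(psi, quality_drop, psi_threshold=0.2, max_quality_drop=0.05):
    """Trigger on measurable drift or degraded quality, not on the calendar.

    Both thresholds are hypothetical defaults.
    """
    return psi > psi_threshold or quality_drop > max_quality_drop


# Hypothetical bin proportions for one feature.
training_bins = [0.25, 0.25, 0.25, 0.25]
serving_bins = [0.10, 0.20, 0.30, 0.40]

psi = population_stability_index(training_bins, serving_bins)
print(f"PSI: {psi:.4f}, retrain: {should_retrain(psi, quality_drop=0.0)}")
```

A trigger like this should still feed an evaluation gate: drift justifies training a candidate, not automatically promoting it.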

Security, privacy, and governance checklist

IAM and access control

Topic | Readiness expectation
Least privilege | Grant only the permissions needed for data access, training, deployment, and monitoring.
Service accounts | Use workload-specific service accounts instead of broad user credentials.
Project boundaries | Understand how projects separate environments, teams, billing, and access.
Dataset access | Control who can view raw data, transformed data, labels, predictions, and model outputs.
Model access | Treat models and embeddings as sensitive when they may reveal training data or business logic.
Secrets | Store secrets outside code and notebooks.
Audit logging | Know why audit trails matter for sensitive ML workflows.
Private connectivity | Recognize when private network paths reduce exposure.

Data protection and responsible AI

  • I can identify personally identifiable, confidential, regulated, or business-sensitive data in an ML pipeline.
  • I can explain data minimization and purpose limitation in practical terms.
  • I can describe encryption at rest and in transit conceptually.
  • I can reason about customer-managed encryption keys when stronger key control is required.
  • I can explain when de-identification, masking, tokenization, or aggregation may be needed.
  • I can describe how lineage supports auditability.
  • I can explain why fairness checks require both metric review and context.
  • I can identify when explainability is required for trust, debugging, or approval.
  • I can explain human review for high-impact predictions.
  • I can recognize prompt injection, data exfiltration, and unsafe output risks in generative AI systems.

Generative AI and retrieval-augmented workflows

For PMLE preparation, be ready to reason about modern ML engineering patterns that include generative AI, embeddings, and retrieval workflows on Google Cloud.

Generative AI decision checks

Requirement | Consider | Watch for
Summarize or draft text from trusted documents | Retrieval-augmented generation | Document freshness, access control, hallucination, citation quality.
Semantic search | Embeddings and vector search patterns | Embedding quality, indexing strategy, latency, relevance evaluation.
Domain-specific language | Prompt engineering, grounding, tuning, or custom model approach | Cost, data sensitivity, evaluation complexity.
Safer output | Safety filters, constraints, human review, evaluation sets | Overblocking, underblocking, ambiguous policy rules.
Reduce hallucination | Grounding, retrieval, constrained responses, verification | Source quality and prompt injection.
Improve consistency | Prompt templates, examples, evaluation harnesses | Overfitting to narrow examples.
Protect private data | Access controls, data minimization, logging review | Sensitive prompts, stored outputs, embeddings leakage.

RAG readiness checklist

  • I can describe the difference between model knowledge and retrieved context.
  • I can outline document ingestion, chunking, embedding, indexing, retrieval, prompt assembly, generation, and evaluation.
  • I can explain why chunk size and overlap affect retrieval quality.
  • I can identify stale or unauthorized documents as RAG risks.
  • I can define evaluation criteria: relevance, groundedness, factuality, completeness, safety, and latency.
  • I can reason about access control at retrieval time, not just at ingestion time.
  • I can identify prompt injection risks from retrieved documents.
  • I can explain why human review may be needed for high-impact generated content.

Cost, reliability, and performance tradeoffs

Design choice | Cost/performance questions
AutoML vs custom training | Is the productivity gain worth the reduced control? Does the model need special architecture or preprocessing?
CPU vs accelerator | Does the workload benefit enough from acceleration to justify complexity and cost?
Large model vs smaller model | Is quality improvement worth latency, memory, and serving cost?
Online vs batch | Does the business truly need real-time predictions?
Frequent retraining | Is retraining driven by measurable drift or business need?
Feature store adoption | Do reuse, consistency, and online/offline parity justify added operational design?
Streaming pipeline | Is near-real-time value worth complexity versus scheduled batch?
Custom serving | Is flexibility worth added operational responsibility?
Data movement | Can training or inference happen where the data already resides?
Monitoring depth | Are alerts actionable and aligned to business risk?

Can you do this?

  • Choose a simpler architecture when complexity is not justified.
  • Identify hidden costs from large-scale experimentation.
  • Explain why reducing model size may improve reliability.
  • Compare latency, throughput, and cost in serving decisions.
  • Design batch workflows that meet business deadlines without online serving.
  • Identify when caching, batching, or asynchronous processing is appropriate.
  • Explain how failed pipelines can create downstream business risk.

Architecture scenario drills

Use these prompts for final review. For each one, state the ML approach, Google Cloud services, data path, evaluation metric, deployment pattern, monitoring plan, and security controls.

Scenario 1: Fraud detection

Checklist:

  • Define prediction target and label delay.
  • Address class imbalance.
  • Choose precision/recall tradeoff based on investigation capacity.
  • Prevent leakage from post-transaction fields.
  • Use time-aware validation.
  • Design online or near-real-time scoring if required.
  • Monitor drift, false positives, false negatives, and alert fatigue.
  • Secure sensitive transaction and user data.
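For the precision/recall tradeoff item, a cost-based threshold search makes the decision explicit. The sketch below assumes hypothetical scores, labels, and per-error costs; in practice the costs would come from investigation capacity and fraud loss estimates.

```python
def pick_threshold(scores, labels, fp_cost, fn_cost, candidates):
    """Return (threshold, total_cost) with the lowest expected cost."""
    best = None
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = fp * fp_cost + fn * fn_cost
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical model scores and true labels (1 = fraud).
scores = [0.05, 0.2, 0.35, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]
candidates = [0.3, 0.5, 0.8]

# Missed fraud is 10x as costly as a false alert.
print(pick_threshold(scores, labels, fp_cost=1, fn_cost=10, candidates=candidates))
# False alerts are 10x as costly as missed fraud.
print(pick_threshold(scores, labels, fp_cost=10, fn_cost=1, candidates=candidates))
```

The same scores lead to a low threshold when missed fraud dominates the cost and a high threshold when false alerts dominate, which is why the threshold is a business decision rather than a fixed model property.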

Scenario 2: Demand forecasting

Checklist:

  • Use historical time windows correctly.
  • Avoid random split leakage.
  • Add seasonality, calendar, price, promotion, and inventory features when available.
  • Choose forecast horizon and granularity.
  • Evaluate with backtesting.
  • Decide batch prediction schedule.
  • Monitor forecast error by segment.
  • Handle new products or sparse history.
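Backtesting for a forecast can be sketched as rolling-origin splits, where the training window always ends before the test window begins. The function name and the period counts below are illustrative assumptions.

```python
def rolling_origin_splits(n_periods, initial_train, horizon, step=None):
    """Yield (train_indices, test_indices) pairs that respect time order."""
    step = step or horizon
    splits = []
    train_end = initial_train
    while train_end + horizon <= n_periods:
        train_idx = list(range(0, train_end))
        test_idx = list(range(train_end, train_end + horizon))
        splits.append((train_idx, test_idx))
        train_end += step
    return splits


# Hypothetical setup: 12 months of history, start with 6 months of
# training data, and forecast 2 months ahead in each fold.
splits = rolling_origin_splits(n_periods=12, initial_train=6, horizon=2)
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Averaging the forecast error across folds gives a more stable estimate than a single holdout, and reporting it per segment supports the "monitor forecast error by segment" item.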

Scenario 3: Product recommendation

Checklist:

  • Identify explicit and implicit feedback.
  • Handle cold-start users and items.
  • Separate candidate generation from ranking if needed.
  • Include business constraints such as availability or diversity.
  • Evaluate offline and online behavior carefully.
  • Watch for feedback loops and popularity bias.
  • Decide batch vs online updates.
  • Monitor click-through, conversion, user satisfaction, and fairness concerns.

Scenario 4: Image classification

Checklist:

  • Validate label quality and class balance.
  • Decide managed AutoML vs custom model.
  • Use train/validation/test split without duplicate leakage.
  • Consider augmentation and transfer learning.
  • Evaluate per-class metrics.
  • Choose deployment pattern based on latency and device needs.
  • Monitor image distribution changes.
  • Secure stored images and model outputs.

Scenario 5: Generative AI support assistant

Checklist:

  • Decide prompt-only, RAG, tuning, or custom model approach.
  • Identify source documents and access controls.
  • Design retrieval, grounding, and citation behavior.
  • Evaluate factuality, relevance, safety, and refusal behavior.
  • Protect sensitive user prompts and retrieved content.
  • Monitor hallucinations, unsafe outputs, latency, and cost.
  • Include human escalation for high-risk answers.
  • Plan document refresh and evaluation updates.

Common weak areas and traps

Trap | Why it hurts exam performance | Better habit
Choosing the most advanced model by default | PMLE scenarios often reward fit-for-purpose design. | Start with business goal, data, constraints, and baseline.
Ignoring leakage | Leakage creates unrealistic metrics and poor production performance. | Ask: “Would this feature exist at prediction time?”
Using random splits for time-series data | Future information can leak into training. | Use time-aware validation and backtesting.
Optimizing accuracy on imbalanced data | Accuracy can hide failure on rare but important cases. | Use precision, recall, PR-AUC, cost-based thresholds.
Treating notebooks as production pipelines | Notebooks alone often lack reproducibility and controls. | Move repeatable steps into versioned pipelines.
Forgetting training-serving skew | Different preprocessing breaks production behavior. | Reuse transformations and validate production inputs.
Monitoring only CPU and memory | ML systems fail through data and behavior changes too. | Monitor data, predictions, quality, and business metrics.
Retraining automatically without gates | Bad new data can make the model worse. | Use validation, approval, and rollback criteria.
Granting broad permissions | Overly broad IAM increases risk. | Use least privilege and workload-specific service accounts.
Moving large data unnecessarily | Data movement adds cost, latency, and complexity. | Train or score near the data when practical.
Ignoring delayed labels | Quality monitoring may lag behind production changes. | Use proxy metrics plus delayed ground-truth evaluation.
Assuming generative AI is always the answer | It may add cost, latency, and risk. | Use rules, search, retrieval, or smaller models when sufficient.
Skipping human review | Some outputs or predictions carry high impact. | Add review, escalation, and documentation where needed.
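
The imbalanced-data trap is easy to see with a small calculation. The sketch below uses made-up labels (10 positives in 1,000 examples) and a hypothetical `binary_metrics` helper to show how a model that never predicts the rare class still scores high accuracy.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Undefined ratios are reported as 0.0 in this sketch.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up data: 1% positive class, and a model that always predicts negative.
y_true = [1] * 10 + [0] * 990
always_negative = [0] * 1000
metrics = binary_metrics(y_true, always_negative)
```

Accuracy comes out at 99% while recall is zero, which is why the table recommends precision, recall, and PR-AUC for rare but important cases.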

“Can you do this?” final skill checklist

Design and architecture

  • I can design an end-to-end ML solution on Google Cloud from raw data to monitored predictions.
  • I can justify service choices from requirements rather than relying only on memorized product names.
  • I can compare AutoML, custom training, and BigQuery ML for a scenario.
  • I can choose batch, streaming, or online serving patterns.
  • I can identify the simplest architecture that satisfies requirements.
  • I can include security, governance, monitoring, and cost controls in the design.

Data and modeling

  • I can detect leakage, skew, drift, imbalance, and bad labels.
  • I can choose the right split strategy.
  • I can select metrics aligned with business risk.
  • I can explain overfitting, underfitting, bias, variance, and regularization.
  • I can reason about feature engineering for tabular, text, image, time-series, and event data.
  • I can explain when embeddings are useful.
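
For time-ordered data, one split strategy is an expanding-window backtest, sketched below. The function name and parameters are assumptions for illustration, and rows are assumed to be sorted by time already.

```python
def expanding_window_splits(n_samples, n_folds, horizon):
    """Yield (train_indices, test_indices) for time-ordered rows.

    Each fold trains on every row before a cutoff and tests on the next
    `horizon` rows, so no test row precedes any training row.
    """
    first_cutoff = n_samples - n_folds * horizon
    if first_cutoff <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        cutoff = first_cutoff + fold * horizon
        yield list(range(cutoff)), list(range(cutoff, cutoff + horizon))


splits = list(expanding_window_splits(n_samples=10, n_folds=3, horizon=2))
```

With 10 rows, 3 folds, and a horizon of 2, the folds train on rows 0-3, 0-5, and 0-7 and test on rows 4-5, 6-7, and 8-9. A random split would instead mix future rows into training.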

MLOps

  • I can describe pipeline components and artifacts.
  • I can explain how to version data, code, parameters, models, and environments.
  • I can design evaluation gates before deployment.
  • I can plan rollback and retraining.
  • I can monitor both infrastructure and model behavior.
  • I can troubleshoot failures across data, code, IAM, serving, and monitoring.
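
An evaluation gate before deployment can be as simple as a function that compares a candidate model's metrics against the deployed baseline. The sketch below is hypothetical: the metric names, the recall floor, and the allowed regression are assumptions a team would set from its own requirements.

```python
def passes_gate(candidate, baseline, min_recall=0.80, max_regression=0.01):
    """Return (approved, reasons) for a candidate model.

    candidate / baseline: dicts with "recall" and "pr_auc" measured on the
    same held-out evaluation set.
    """
    reasons = []
    if candidate["recall"] < min_recall:
        reasons.append("recall below absolute floor")
    if candidate["pr_auc"] < baseline["pr_auc"] - max_regression:
        reasons.append("PR-AUC regressed versus deployed model")
    return (not reasons, reasons)


# Made-up metrics for the deployed model and two retrained candidates.
baseline = {"recall": 0.82, "pr_auc": 0.72}
approved, reasons = passes_gate({"recall": 0.85, "pr_auc": 0.73}, baseline)
blocked, block_reasons = passes_gate({"recall": 0.85, "pr_auc": 0.70}, baseline)
```

Returning the reasons along with the decision gives a pipeline something to log, which helps with approval and rollback reviews.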

Security and responsible AI

  • I can apply least privilege to ML workflows.
  • I can identify sensitive data in training and prediction flows.
  • I can explain auditability, lineage, and model documentation.
  • I can reason about explainability and fairness requirements.
  • I can identify prompt injection, hallucination, unsafe output, and data leakage risks.
  • I can add human review where automation alone is risky.

Final-week review checklist

Use this section to focus review time before the exam.

7 to 5 days out

  • Review Google Cloud ML service selection: Vertex AI, BigQuery ML, Dataflow, Pub/Sub, Cloud Storage, BigQuery, Cloud Run, Google Kubernetes Engine, and supporting security services.
  • Rebuild your mental map of the ML lifecycle: data, features, training, evaluation, registry, deployment, monitoring, retraining.
  • Drill metric selection for classification, regression, forecasting, ranking, and generative AI.
  • Practice spotting leakage and skew in short scenarios.
  • Review IAM and service account patterns for ML workflows.
  • Summarize when to use online prediction, batch prediction, and streaming inference.

4 to 2 days out

  • Work through architecture scenarios without notes.
  • For each missed question, classify the miss: service selection, metric choice, data issue, MLOps, security, or troubleshooting.
  • Review common traps, especially overfitting, class imbalance, time leakage, and overbroad permissions.
  • Practice explaining tradeoffs in one or two sentences.
  • Review model monitoring and retraining decision points.
  • Review generative AI/RAG risks: grounding, evaluation, safety, prompt injection, and access control.

Final 24 hours

  • Review your weakest service-selection tables.
  • Recheck metric formulas and threshold tradeoffs.
  • Review the difference between data drift, concept drift, prediction drift, and training-serving skew.
  • Review pipeline artifact flow and rollback logic.
  • Review security defaults: least privilege, service accounts, sensitive data handling, auditability.
  • Avoid cramming obscure limits or quotas unless your own study materials require them.
  • Rest enough to read scenarios carefully and avoid rushing.
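
As a memory aid for the drift terms in the list above, the sketch below computes a population stability index (PSI) over pre-binned counts. PSI is one common choice rather than a required one, and the function name, bin counts, and `eps` floor are assumptions for illustration. Comparing training counts with serving counts for a feature is a training-serving skew check. Comparing serving counts from two time windows is a data drift check. Applying the same calculation to model outputs gives a prediction drift check. Concept drift, a change in the relationship between inputs and labels, generally needs ground-truth labels to confirm.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned distributions.

    Both lists must use the same bins in the same order. Fractions are
    floored at `eps` so that empty bins do not cause log(0) or division by zero.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_frac = max(e / e_total, eps)
        a_frac = max(a / a_total, eps)
        score += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return score


# Made-up histogram counts for one feature.
training = [100, 300, 600]
serving_same = [10, 30, 60]      # same proportions, smaller sample
serving_shifted = [500, 300, 200]

no_shift = psi(training, serving_same)
shift = psi(training, serving_shifted)
```

Identical proportions give a score of zero and the shifted histogram gives a clearly positive score. The alert threshold is a team decision.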

Practical next step

Pick one weak area from the readiness map and turn it into a short drill: write the problem type, data source, Google Cloud services, model approach, metric, deployment pattern, monitoring plan, and security controls. Then compare your answer against the checklist above and repeat with a different scenario until your choices feel deliberate rather than memorized.