PMLE — Google Cloud Professional Machine Learning Engineer - 2026 Guide Exam Blueprint
Practical exam blueprint for Google Cloud PMLE exam readiness.
How to Use This Exam Blueprint
Use this checklist as a practical readiness map for the Google Cloud Professional Machine Learning Engineer - 2026 Guide exam, code PMLE. It is organized around the skills a professional machine learning engineer is expected to apply on Google Cloud: framing ML problems, preparing data, training models, deploying solutions, operating ML systems, and making security, governance, cost, and reliability tradeoffs.
Do not treat this as an official scoring guide or a list of exact exam weights. Instead, use it to answer three questions:
- Can I choose the right Google Cloud approach for a realistic ML scenario?
- Can I explain the lifecycle from data to deployed, monitored model?
- Can I identify weak designs, operational risks, and better alternatives?
For each section, mark topics as:
| Status | Meaning |
|---|---|
| Ready | You can explain it, choose it in a scenario, and spot common traps. |
| Review | You understand the concept but hesitate on service selection or tradeoffs. |
| Practice | You need hands-on review, scenario drills, or architecture comparison. |
Exam identity
| Item | Checklist detail |
|---|---|
| Vendor/provider | Google Cloud |
| Official exam title | Google Cloud Professional Machine Learning Engineer - 2026 Guide |
| Exam code | PMLE |
| Page purpose | Independent exam blueprint and final-review support |
| Readiness focus | Applied ML engineering, Google Cloud service selection, MLOps, security, data workflows, deployment, monitoring, and troubleshooting |
Topic-area readiness map
| Readiness area | What you should be able to do | Ready? |
|---|---|---|
| ML problem framing | Translate business goals into supervised, unsupervised, forecasting, recommendation, ranking, anomaly detection, or generative AI approaches. | [ ] |
| Data sourcing and ingestion | Choose patterns for batch, streaming, structured, semi-structured, and unstructured data using Google Cloud services. | [ ] |
| Data quality and feature readiness | Detect leakage, skew, missing values, outliers, label quality issues, imbalance, and schema drift. | [ ] |
| Exploratory analysis | Use statistical summaries, distributions, correlation, segmentation, and visual checks to validate assumptions. | [ ] |
| Feature engineering | Select appropriate encoding, normalization, text/image/audio transformations, embeddings, feature crosses, and time-based features. | [ ] |
| Model selection | Match model families and Google Cloud options to problem type, data size, latency needs, interpretability, and operational constraints. | [ ] |
| Training architecture | Choose between AutoML, custom training, BigQuery ML, notebooks, distributed training, accelerators, and containerized jobs. | [ ] |
| Hyperparameter tuning | Explain tuning goals, search strategies, validation sets, early stopping, overfitting risks, and resource tradeoffs. | [ ] |
| Evaluation metrics | Choose metrics for classification, regression, ranking, recommendation, forecasting, anomaly detection, and generative AI evaluation. | [ ] |
| Vertex AI workflows | Understand how Vertex AI supports datasets, training, pipelines, model registry, endpoints, batch prediction, experiments, and monitoring. | [ ] |
| BigQuery ML | Recognize when in-database training and inference reduce data movement and simplify analytics-oriented ML workflows. | [ ] |
| Pipelines and MLOps | Design reproducible, versioned workflows with orchestration, metadata, testing, CI/CD, approvals, and rollback paths. | [ ] |
| Deployment patterns | Compare online prediction, batch prediction, embedded inference, streaming inference, and custom serving on managed compute. | [ ] |
| Monitoring and retraining | Monitor model performance, data drift, prediction skew, service health, latency, errors, and retraining triggers. | [ ] |
| Security and IAM | Apply least privilege, service accounts, data access controls, encryption choices, auditability, and private connectivity patterns. | [ ] |
| Governance and responsible AI | Address explainability, fairness, documentation, human review, privacy, safety, lineage, and model approval needs. | [ ] |
| Cost and performance | Balance training cost, serving latency, accelerator usage, data movement, batch windows, caching, and operational complexity. | [ ] |
| Troubleshooting | Diagnose failed training jobs, poor metrics, serving errors, data pipeline failures, skew, permissions, and scaling issues. | [ ] |
Core PMLE readiness checklist
ML problem framing
You should be able to answer scenario questions such as: “What type of ML solution fits this business problem, and what are the risks?”
| Scenario cue | Strong answer should consider |
|---|---|
| “Predict a numeric value” | Regression, baseline model, RMSE/MAE, outlier sensitivity, feature leakage. |
| “Classify user behavior” | Binary or multiclass classification, precision/recall tradeoff, class imbalance, threshold tuning. |
| “Detect rare fraud or defects” | Anomaly detection or imbalanced classification, recall importance, false-positive cost, alert review workflow. |
| “Forecast future demand” | Time-aware splits, seasonality, holidays/events, leakage prevention, forecast horizon, retraining cadence. |
| “Recommend products or content” | Candidate generation, ranking, implicit feedback, cold start, diversity, business rules, online evaluation. |
| “Search or match semantic meaning” | Embeddings, vector search, retrieval quality, latency, grounding data freshness. |
| “Generate text, code, or summaries” | Prompt design, grounding, evaluation, safety, hallucination risk, human review, data privacy. |
| “Explain why predictions happen” | Explainability, feature attribution, model choice, stakeholder requirements, auditability. |
Checklist:
- I can define the prediction target clearly.
- I can identify whether labels exist and whether they are reliable.
- I can distinguish correlation from causation in exam scenarios.
- I can choose offline and online evaluation metrics aligned to business impact.
- I can explain why a non-ML solution may be better when rules, SQL, or simple automation is enough.
- I can identify when human-in-the-loop review is necessary.
- I can describe failure modes before choosing a model.
Data readiness and feature engineering
| Topic | What “ready” means |
|---|---|
| Data sources | You can identify whether data belongs in BigQuery, Cloud Storage, Pub/Sub, operational databases, or specialized stores based on access pattern and structure. |
| Batch ingestion | You can reason about scheduled loads, transformations, validation, lineage, and reproducibility. |
| Streaming ingestion | You can reason about event-time processing, late data, windowing, deduplication, and real-time features. |
| Data quality | You can detect missing values, invalid categories, inconsistent units, duplicates, bad labels, and outlier handling choices. |
| Schema management | You can explain why schema changes can break training, serving, and monitoring. |
| Train/validation/test splits | You can choose random, stratified, group-based, or time-based splits appropriately. |
| Feature leakage | You can spot features that would not be available at prediction time. |
| Feature skew | You can explain training-serving skew and how consistent preprocessing reduces it. |
| Feature stores | You can describe why reusable, versioned, point-in-time-correct features matter. |
| Embeddings | You can explain when embeddings help with text, images, recommendations, semantic search, or generative AI retrieval. |
Can you do this?
- Given a table of events, identify the label, entity, timestamp, features, and prediction time.
- Explain why random splitting is risky for time-series or user-level grouped data.
- Identify leakage from future aggregates, post-outcome fields, manually corrected labels, or target-derived columns.
- Choose whether preprocessing belongs in SQL, Dataflow, pipeline components, training code, or serving code.
- Explain the difference between data drift, concept drift, and prediction drift.
- Describe how to validate data before starting expensive training jobs.
- Identify when feature normalization, standardization, bucketization, encoding, or dimensionality reduction may help.
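Two of the prompts above, splitting by time and asking whether a feature exists at prediction time, can be sketched in plain Python. This is a minimal illustration, not a Google Cloud API: the field names (`event_time`, `chargeback_filed`, `account_age_days`) and both helper functions are hypothetical.

```python
from datetime import datetime


def time_based_split(rows, cutoff):
    """Put rows before the cutoff in train and rows at or after it in test."""
    train = [r for r in rows if r["event_time"] < cutoff]
    test = [r for r in rows if r["event_time"] >= cutoff]
    return train, test


def leaky_features(feature_times, prediction_time):
    """Return features whose values only become known after prediction time."""
    return sorted(
        name for name, known_at in feature_times.items() if known_at > prediction_time
    )


# Hypothetical event rows; only the timestamp matters for the split.
rows = [
    {"event_time": datetime(2024, 1, 5), "label": 0},
    {"event_time": datetime(2024, 2, 10), "label": 1},
    {"event_time": datetime(2024, 3, 15), "label": 0},
]
train, test = time_based_split(rows, cutoff=datetime(2024, 3, 1))

# When each (hypothetical) feature value becomes available.
prediction_time = datetime(2024, 3, 1)
feature_times = {
    "account_age_days": datetime(2024, 2, 28),
    "chargeback_filed": datetime(2024, 3, 20),  # post-outcome field
}

print(len(train), len(test))
print(leaky_features(feature_times, prediction_time))
```

A random split over the same rows could place the March event in training and the January event in test, which is the leakage pattern the checklist warns about.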
Google Cloud service selection checklist
High-level service decision map
| If the scenario emphasizes… | Consider… | Watch for… |
|---|---|---|
| Fast managed model development with less custom code | Vertex AI AutoML capabilities | Need for custom architecture, special preprocessing, or explainability constraints. |
| Custom model code, frameworks, or containers | Vertex AI custom training | Container dependencies, accelerator needs, packaging, reproducibility. |
| Data already lives in BigQuery and model is analytics-oriented | BigQuery ML | Data movement, SQL-based workflow, supported model type, operational serving needs. |
| Repeatable ML workflows | Vertex AI Pipelines | Artifact tracking, component boundaries, parameterization, metadata, CI/CD. |
| Managed model deployment | Vertex AI endpoints or batch prediction | Latency, scaling, traffic splitting, monitoring, model versioning. |
| Large-scale batch transformation | Dataflow, BigQuery, or Dataproc depending on workload | Windowing, cost, pipeline complexity, team skill set. |
| Containerized custom APIs | Cloud Run or Google Kubernetes Engine | Operational burden, scaling behavior, networking, model loading time. |
| Streaming features or inference | Pub/Sub, Dataflow, online serving patterns | Event time, backpressure, state, latency, monitoring. |
| Semantic retrieval or RAG | Vertex AI, embeddings, vector search patterns, managed data stores | Grounding quality, freshness, access control, prompt injection risk. |
| Experiment tracking and reproducibility | Vertex AI Experiments, metadata, model registry patterns | Missing lineage, untracked parameters, unversioned datasets. |
Storage and data platform checks
| Data/workload pattern | Readiness prompts |
|---|---|
| Analytical warehouse | Can you explain when BigQuery is a good fit for feature creation, large-scale analysis, and BigQuery ML? |
| Object storage | Can you explain when Cloud Storage is appropriate for raw data, training files, model artifacts, images, audio, and exports? |
| Streaming events | Can you reason about Pub/Sub and downstream processing for near-real-time ML features or predictions? |
| Operational serving data | Can you identify when low-latency application databases or caches are needed alongside ML services? |
| Sensitive data | Can you apply IAM, encryption choices, data minimization, masking, and audit logging concepts? |
| Cross-project access | Can you reason about service accounts, project boundaries, shared datasets, and least privilege? |
Model development and training readiness
Model selection prompts
| Problem | Candidate model families or approaches | Exam-style tradeoffs |
|---|---|---|
| Binary classification | Logistic regression, tree-based models, neural networks, AutoML | Interpretability, threshold tuning, imbalance, calibration. |
| Multiclass classification | Softmax models, boosted trees, deep learning, AutoML | Confusion among similar classes, class imbalance, label quality. |
| Regression | Linear models, tree-based models, neural networks | Outliers, metric choice, feature scaling, prediction intervals. |
| Forecasting | Time-series models, feature-based regression, managed forecasting options | Leakage, seasonality, horizon, backtesting. |
| Recommendations | Collaborative filtering, matrix factorization, ranking models, embeddings | Cold start, feedback loops, diversity, business rules. |
| Anomaly detection | Statistical thresholds, unsupervised models, supervised rare-event models | False positives, drift, alert fatigue, label scarcity. |
| Computer vision | AutoML vision workflows, custom deep learning, transfer learning | Data volume, labeling, augmentation, serving latency. |
| NLP | Text classification, embeddings, sequence models, generative AI | Tokenization, context limits, grounding, bias, privacy. |
| Generative AI | Prompting, retrieval-augmented generation, tuning, evaluation workflows | Hallucination, safety, data leakage, cost, latency. |
Training architecture checklist
- I can decide whether to use managed AutoML, BigQuery ML, or custom training.
- I can describe the artifacts needed for custom training: source code, dependencies, container image, training data reference, output model location, metrics.
- I can explain why containers improve reproducibility.
- I can choose CPU, GPU, or TPU-style acceleration conceptually based on workload type.
- I can explain distributed training tradeoffs without assuming it always improves performance.
- I can identify when hyperparameter tuning is worth the additional cost.
- I can design train/validation/test separation for reliable evaluation.
- I can explain early stopping, regularization, dropout, pruning, and model complexity controls.
- I can recognize overfitting, underfitting, high bias, and high variance from learning curves.
- I can explain why reproducibility requires versioned data, code, parameters, environment, and random seeds where applicable.
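Early stopping from the checklist above is easier to reason about with the rule written out. The sketch below assumes one common variant, patience on validation loss; the function name and the `patience` and `min_delta` parameters are illustrative and do not come from any specific framework.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Patience never ran out: training used every epoch.
    return best_epoch, len(val_losses) - 1


# Validation loss falls, then rises: a typical overfitting curve.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(best_epoch, stop_epoch)
```

The model kept for evaluation is the one from `best_epoch`, not the one from the epoch where training stopped.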
Evaluation metric checks
Know when each metric is appropriate.
| Metric area | Use when… | Watch for… |
|---|---|---|
| Accuracy | Classes are balanced and error costs are similar. | Misleading for rare-event detection. |
| Precision | False positives are expensive. | High precision can miss many true positives. |
| Recall | False negatives are expensive. | High recall can create alert fatigue. |
| F1 score | Need a balance of precision and recall. | May hide business-specific error costs. |
| ROC-AUC | Ranking binary predictions across thresholds. | Can look optimistic with severe imbalance. |
| PR-AUC | Positive class is rare and precision/recall matter. | Usually more informative than ROC-AUC under imbalance, but its baseline equals the positive-class rate, so compare values only on the same data. |
| RMSE | Large regression errors should be penalized strongly. | Sensitive to outliers. |
| MAE | Need average absolute error that is easier to interpret. | Does not emphasize large errors as strongly. |
| MAPE | Percentage error is meaningful. | Problematic near zero actual values. |
| Log loss | Probabilistic classification quality matters. | Penalizes overconfident wrong predictions. |
| Ranking metrics | Search, recommendation, ordered result quality. | Position, diversity, and business constraints matter. |
| Forecast backtesting | Time-series validation across historical windows. | Random splits can leak future information. |
| Generative AI evaluation | Output quality, groundedness, safety, relevance, factuality. | Human evaluation and domain criteria may be needed. |
Key formulas to recognize:
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \]
\[ \text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \]
Can you do this?
- Given a confusion matrix, compute precision and recall.
- Choose a threshold based on business costs.
- Explain why improving offline AUC may not improve production value.
- Identify the right metric for imbalanced classification.
- Explain why forecast evaluation must respect time order.
- Compare two models using both quality and operational constraints.
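The first two drills above, computing precision and recall from confusion-matrix counts and choosing a threshold from business costs, can be worked through in a few lines. This is a sketch under stated assumptions: the counts, scores, labels, and per-error costs are made-up illustration values, and the function names are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def expected_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Total cost of the errors made when predicting 1 for score >= threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += fp_cost
        elif predicted == 0 and label == 1:
            cost += fn_cost
    return cost


def best_threshold(scores, labels, candidates, fp_cost, fn_cost):
    """Pick the candidate threshold with the lowest total error cost."""
    return min(
        candidates,
        key=lambda t: expected_cost(scores, labels, t, fp_cost, fn_cost),
    )


precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Illustrative scores; a missed positive is assumed to cost 10x a false alarm.
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]
labels = [0, 0, 1, 1, 0, 1]
chosen = best_threshold(scores, labels, [0.3, 0.5, 0.7], fp_cost=1.0, fn_cost=10.0)
print(chosen)
```

With false negatives priced this high, the lowest threshold wins even though it produces more false positives, which is the kind of tradeoff a single F1 number can hide.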
BigQuery ML readiness
BigQuery ML is often relevant when data is already in BigQuery and the team wants SQL-based model development or inference.
You should be able to recognize patterns like:
CREATE OR REPLACE MODEL `project.dataset.model_name`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['label']
) AS
SELECT
  label,
  feature_1,
  feature_2,
  feature_3
FROM `project.dataset.training_table`;
And evaluation patterns like:
SELECT *
FROM ML.EVALUATE(
  MODEL `project.dataset.model_name`,
  TABLE `project.dataset.eval_table`
);
Checklist:
- I can explain when BigQuery ML reduces data movement.
- I can identify when SQL-based feature engineering is sufficient.
- I can distinguish training, evaluation, prediction, and explainability-style workflows conceptually.
- I can recognize that model choice must match the problem type and data shape.
- I can explain when a model trained in BigQuery may still need an operational serving strategy.
- I can reason about access control for datasets, models, and prediction outputs.
- I can identify when exporting artifacts or integrating with Vertex AI workflows may be useful.
Vertex AI readiness
Vertex AI lifecycle areas
| Area | What to know |
|---|---|
| Datasets | How managed datasets can organize data for training and evaluation workflows. |
| Training | Difference between managed AutoML-style training and custom training jobs. |
| Workbench/notebooks | How notebooks support experimentation, and why they should not be the only production workflow. |
| Experiments | Why parameter, metric, and artifact tracking matter. |
| Pipelines | How reusable components create reproducible end-to-end workflows. |
| Metadata | Why lineage supports debugging, governance, and reproducibility. |
| Model Registry | How versioned models support approval, deployment, rollback, and auditability. |
| Endpoints | How online prediction requires scaling, latency, traffic management, and monitoring decisions. |
| Batch prediction | When offline scoring is better than real-time serving. |
| Monitoring | How to detect drift, skew, data quality issues, prediction changes, and service health concerns. |
| Explainability | When feature attribution or explanation support is needed for trust, debugging, or review. |
Vertex AI scenario checks
| Scenario | Better reasoning |
|---|---|
| “The team wants minimal ML code and a managed workflow.” | Consider managed training options, but verify data type, customization needs, and deployment requirements. |
| “The model needs a custom TensorFlow/PyTorch/scikit-learn architecture.” | Consider custom training with a reproducible environment and tracked artifacts. |
| “Training succeeds locally but fails in the cloud.” | Check dependencies, container image, permissions, paths, data access, region/project configuration, and resource assumptions. |
| “A model version performs worse after deployment.” | Check data skew, traffic split, metric definitions, feature pipeline changes, and rollback options. |
| “A pipeline step is nondeterministic.” | Check random seeds, versioned inputs, container versions, external data dependencies, and metadata tracking. |
| “Business requires approval before production.” | Use model registry, evaluation gates, documentation, access controls, and deployment controls. |
MLOps and pipeline readiness
End-to-end workflow
flowchart LR
    A[Define ML objective] --> B[Ingest and validate data]
    B --> C[Engineer features]
    C --> D[Train model]
    D --> E[Evaluate and compare]
    E --> F{Meets release criteria?}
    F -- No --> C
    F -- Yes --> G[Register model]
    G --> H[Deploy or batch score]
    H --> I[Monitor data, model, and service]
    I --> J{Retrain or rollback?}
    J -- Retrain --> B
    J -- Rollback --> G
Pipeline readiness checklist
- I can break an ML workflow into reusable components.
- I can explain the difference between orchestration and model training.
- I can identify pipeline inputs, outputs, artifacts, and parameters.
- I can explain why each component should be idempotent where practical.
- I can describe how metadata helps reproduce a model.
- I can design evaluation gates before deployment.
- I can explain CI/CD for ML as more than code deployment: it includes data, model, and pipeline validation.
- I can compare manual, scheduled, event-driven, and performance-triggered retraining.
- I can identify rollback requirements for online and batch prediction.
- I can explain why production ML needs monitoring beyond infrastructure metrics.
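An evaluation gate before deployment, as listed above, is at its simplest a function that compares a candidate against the current baseline and returns a decision with reasons. The sketch below is hypothetical: the metric name `pr_auc`, the `latency_ms` field, and the default limits are illustration choices, not values from any Google Cloud service.

```python
def release_gate(candidate, baseline, metric="pr_auc", min_gain=0.01, max_latency_ms=100.0):
    """Return (approved, reasons) for promoting a candidate model.

    The candidate must beat the baseline on `metric` by at least `min_gain`
    and stay within the serving latency budget.
    """
    reasons = []
    gain = candidate[metric] - baseline[metric]
    if gain < min_gain:
        reasons.append(f"{metric} gain {gain:.3f} is below required {min_gain:.3f}")
    if candidate["latency_ms"] > max_latency_ms:
        reasons.append(
            f"latency {candidate['latency_ms']:.0f} ms exceeds {max_latency_ms:.0f} ms"
        )
    return (not reasons), reasons


baseline = {"pr_auc": 0.82, "latency_ms": 55.0}
good_candidate = {"pr_auc": 0.84, "latency_ms": 60.0}
slow_candidate = {"pr_auc": 0.90, "latency_ms": 250.0}

print(release_gate(good_candidate, baseline))
print(release_gate(slow_candidate, baseline))
```

Returning the reasons alongside the decision keeps the gate auditable: a rejected model leaves a record of which criterion it failed.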
Testing checklist for ML systems
| Test type | What to validate |
|---|---|
| Unit tests | Feature functions, parsing logic, custom transforms, metric calculations. |
| Data validation | Schema, ranges, nulls, categories, duplication, label quality, freshness. |
| Training tests | Job starts, reads data, writes artifacts, logs metrics, handles small sample runs. |
| Evaluation tests | Metrics are computed consistently and compared against baselines. |
| Pipeline tests | Components pass artifacts correctly and fail safely. |
| Serving tests | Prediction format, latency, error handling, model loading, response schema. |
| Security tests | Service account permissions, secret handling, data access boundaries. |
| Monitoring tests | Alerts fire for drift, skew, errors, latency, and failed jobs. |
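The data validation row in the table above (schema, ranges, nulls, categories) can be illustrated with a small rule-based checker. This is a minimal sketch in plain Python; the rule format, the `validate_rows` helper, and the example columns are assumptions made for illustration, and a real pipeline would more likely use a dedicated validation library.

```python
def validate_rows(rows, schema):
    """Check rows against simple per-column rules and return error messages.

    Each rule may define: "type", "required" (default True), "min", "max",
    and "allowed" (a set of valid categories).
    """
    errors = []
    for i, row in enumerate(rows):
        for column, rule in schema.items():
            value = row.get(column)
            if value is None:
                if rule.get("required", True):
                    errors.append(f"row {i}: {column} is missing")
                continue
            if not isinstance(value, rule["type"]):
                errors.append(f"row {i}: {column} has type {type(value).__name__}")
                continue
            if "min" in rule and value < rule["min"]:
                errors.append(f"row {i}: {column}={value} is below {rule['min']}")
            if "max" in rule and value > rule["max"]:
                errors.append(f"row {i}: {column}={value} is above {rule['max']}")
            if "allowed" in rule and value not in rule["allowed"]:
                errors.append(f"row {i}: {column}={value!r} is not an allowed category")
    return errors


schema = {
    "amount": {"type": (int, float), "min": 0.0},
    "country": {"type": str, "allowed": {"DE", "US"}},
}
rows = [
    {"amount": 12.5, "country": "DE"},  # valid
    {"amount": -3.0, "country": "DE"},  # out of range
    {"amount": 7.0, "country": "XX"},   # unknown category
    {"country": "US"},                  # missing required value
]
errors = validate_rows(rows, schema)
for message in errors:
    print(message)
```

Running a check like this before launching training is one way to fail cheaply on bad inputs instead of discovering them after an expensive job.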
Deployment and serving readiness
Serving pattern decision table
| Pattern | Use when… | Key tradeoffs |
|---|---|---|
| Online prediction endpoint | Applications need low-latency predictions. | Scaling, latency, request format, traffic split, monitoring. |
| Batch prediction | Predictions can be generated on a schedule or for large datasets. | Batch windows, storage outputs, downstream consumption. |
| In-database prediction | Analytics users score data already in BigQuery. | SQL workflow, model support, integration with reporting. |
| Streaming inference | Events need rapid scoring as they arrive. | Pipeline complexity, state, backpressure, error handling. |
| Custom serving container | Need custom preprocessing, model server, or runtime. | More responsibility for packaging, dependencies, health checks. |
| Application-embedded model | Small model close to application logic. | Versioning, rollout, monitoring, and update complexity. |
| Edge or client-side inference | Need offline, low-latency, or privacy-sensitive local inference. | Model size, update strategy, device variability. |
Deployment checklist
- I can choose online vs batch prediction based on latency and business process.
- I can explain why training-time preprocessing must match serving-time preprocessing.
- I can design model versioning and rollback.
- I can explain canary, blue/green, and traffic-splitting concepts.
- I can identify request/response schema risks.
- I can monitor prediction latency, error rate, throughput, and resource utilization.
- I can explain how autoscaling and cold starts may affect user experience.
- I can choose when custom containers are justified.
- I can identify how to secure prediction endpoints.
- I can describe how batch outputs should be validated before downstream use.
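Traffic splitting for a canary rollout, mentioned in the checklist above, comes down to sending a fixed share of requests to the new model version. Managed endpoints generally handle this through configuration; the sketch below only illustrates the idea with a deterministic hash, and the `route_model_version` helper and request ID format are hypothetical.

```python
import hashlib


def route_model_version(request_id, canary_percent):
    """Deterministically route a request to the 'canary' or 'stable' version.

    Hashing the request ID gives a stable bucket in [0, 100), so the same
    ID always reaches the same version for a given split.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_percent else "stable"


# Simulate 10,000 requests with 10% of traffic sent to the canary.
ids = [f"request-{i}" for i in range(10_000)]
canary_share = sum(route_model_version(r, 10) == "canary" for r in ids) / len(ids)
print(f"canary share: {canary_share:.3f}")
```

Rolling back in this scheme means setting the canary percentage to zero, which is why versioned models and an explicit split make rollback cheap.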
Monitoring, troubleshooting, and retraining
What to monitor
| Monitoring area | Examples of questions to answer |
|---|---|
| Data quality | Are incoming features missing, invalid, stale, or outside expected ranges? |
| Training-serving skew | Do production feature distributions differ from training distributions? |
| Data drift | Has input data changed meaningfully since training? |
| Concept drift | Has the relationship between features and labels changed? |
| Prediction drift | Are predictions shifting in unexpected ways? |
| Model quality | Are delayed labels showing degraded accuracy, precision, recall, or business KPIs? |
| Service health | Are latency, error rates, saturation, and availability acceptable? |
| Cost | Did training, inference, storage, or data processing costs change unexpectedly? |
| Fairness and safety | Are outcomes or generated outputs creating unacceptable risk for user groups or use cases? |
Troubleshooting scenarios
| Symptom | Likely checks |
|---|---|
| Training job cannot read data | IAM permissions, service account, project/dataset access, path, network restrictions. |
| Training job fails after dependency install | Container image, package versions, runtime mismatch, missing system libraries. |
| Model overfits | More validation, regularization, simpler model, more data, feature review, early stopping. |
| Offline metrics are good but production is poor | Leakage, skew, wrong preprocessing, changed data, label delay, wrong threshold, metric mismatch. |
| Prediction endpoint returns errors | Input schema, model signature, container health, permissions, resource limits, dependency loading. |
| Latency is too high | Model size, preprocessing cost, cold start, accelerator choice, batching, caching, endpoint scaling. |
| Batch prediction output is incomplete | Input format, failed records, permissions, output location, quota/resource constraints. |
| Monitoring alerts are noisy | Thresholds, seasonality, alert grouping, baseline choice, business relevance. |
| Retraining makes model worse | Bad new data, label delay, drift misdiagnosis, changed objective, missing evaluation gate. |
Can you do this?
- Identify whether a failure is data, model, pipeline, infrastructure, or permission related.
- Explain why delayed ground truth affects monitoring.
- Define a retraining trigger that is not just “run every day.”
- Choose rollback when retraining produces a worse model.
- Distinguish model drift from a temporary business event.
- Explain why monitoring generative AI outputs may require human or domain-specific evaluation.
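A retraining trigger that is not just "run every day" needs a measurable drift signal. One common statistic for this is the Population Stability Index (PSI), used here as an example and not as something this checklist prescribes. In the sketch below the cut points, the sample data, and the 0.25 alert level are assumptions; 0.25 is a commonly cited rule of thumb, not a Google Cloud default.

```python
import bisect
import math


def population_stability_index(expected, actual, cut_points, eps=1e-6):
    """Compute PSI between a training sample and a serving sample.

    cut_points are interior bin boundaries, so values below the first or
    above the last cut point fall into open-ended outer bins.
    """
    def proportions(values):
        counts = [0] * (len(cut_points) + 1)
        for v in values:
            counts[bisect.bisect_right(cut_points, v)] += 1
        total = sum(counts) or 1
        # Floor each share at eps so empty bins do not cause log(0).
        return [max(c / total, eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


training = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 10
serving = [v + 3 for v in training]  # simulated upward shift
cut_points = [2.5, 5.0, 7.5]

psi_same = population_stability_index(training, training, cut_points)
psi_shifted = population_stability_index(training, serving, cut_points)
should_investigate = psi_shifted > 0.25  # assumed alert level
print(f"psi_same={psi_same:.4f} psi_shifted={psi_shifted:.4f} alert={should_investigate}")
```

A high PSI signals that inputs have moved, not that the model is wrong; confirming degraded quality still depends on delayed labels or business metrics.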
Security, privacy, and governance checklist
IAM and access control
| Topic | Readiness expectation |
|---|---|
| Least privilege | Grant only the permissions needed for data access, training, deployment, and monitoring. |
| Service accounts | Use workload-specific service accounts instead of broad user credentials. |
| Project boundaries | Understand how projects separate environments, teams, billing, and access. |
| Dataset access | Control who can view raw data, transformed data, labels, predictions, and model outputs. |
| Model access | Treat models and embeddings as sensitive when they may reveal training data or business logic. |
| Secrets | Store secrets outside code and notebooks. |
| Audit logging | Know why audit trails matter for sensitive ML workflows. |
| Private connectivity | Recognize when private network paths reduce exposure. |
Data protection and responsible AI
- I can identify personally identifiable, confidential, regulated, or business-sensitive data in an ML pipeline.
- I can explain data minimization and purpose limitation in practical terms.
- I can describe encryption at rest and in transit conceptually.
- I can reason about customer-managed encryption keys when stronger key control is required.
- I can explain when de-identification, masking, tokenization, or aggregation may be needed.
- I can describe how lineage supports auditability.
- I can explain why fairness checks require both metric review and context.
- I can identify when explainability is required for trust, debugging, or approval.
- I can explain human review for high-impact predictions.
- I can recognize prompt injection, data exfiltration, and unsafe output risks in generative AI systems.
Generative AI and retrieval-augmented workflows
For PMLE preparation, be ready to reason about modern ML engineering patterns that include generative AI, embeddings, and retrieval workflows on Google Cloud.
Generative AI decision checks
| Requirement | Consider | Watch for |
|---|---|---|
| Summarize or draft text from trusted documents | Retrieval-augmented generation | Document freshness, access control, hallucination, citation quality. |
| Semantic search | Embeddings and vector search patterns | Embedding quality, indexing strategy, latency, relevance evaluation. |
| Domain-specific language | Prompt engineering, grounding, tuning, or custom model approach | Cost, data sensitivity, evaluation complexity. |
| Safer output | Safety filters, constraints, human review, evaluation sets | Overblocking, underblocking, ambiguous policy rules. |
| Reduce hallucination | Grounding, retrieval, constrained responses, verification | Source quality and prompt injection. |
| Improve consistency | Prompt templates, examples, evaluation harnesses | Overfitting to narrow examples. |
| Protect private data | Access controls, data minimization, logging review | Sensitive prompts, stored outputs, embeddings leakage. |
RAG readiness checklist
- I can describe the difference between model knowledge and retrieved context.
- I can outline document ingestion, chunking, embedding, indexing, retrieval, prompt assembly, generation, and evaluation.
- I can explain why chunk size and overlap affect retrieval quality.
- I can identify stale or unauthorized documents as RAG risks.
- I can define evaluation criteria: relevance, groundedness, factuality, completeness, safety, and latency.
- I can reason about access control at retrieval time, not just at ingestion time.
- I can identify prompt injection risks from retrieved documents.
- I can explain why human review may be needed for high-impact generated content.
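Two items in the checklist above, chunk size with overlap and access control at retrieval time, are easy to see in code. The sketch below is a toy: word-overlap scoring stands in for embedding similarity, and the document fields (`id`, `text`, `allowed_groups`) and both function names are hypothetical.

```python
def chunk_words(words, chunk_size, overlap):
    """Split a word list into chunks that share `overlap` words with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def retrieve(query_terms, documents, user_groups, top_k=1):
    """Rank only the documents the user may see, by shared query terms."""
    visible = [d for d in documents if d["allowed_groups"] & user_groups]
    ranked = sorted(
        visible,
        key=lambda d: len(set(d["text"].lower().split()) & query_terms),
        reverse=True,
    )
    return ranked[:top_k]


words = [f"w{i}" for i in range(10)]
chunks = chunk_words(words, chunk_size=4, overlap=2)

documents = [
    {"id": "faq", "text": "reset your password from the account page",
     "allowed_groups": {"support", "finance"}},
    {"id": "payroll", "text": "payroll password reset procedure for finance staff",
     "allowed_groups": {"finance"}},
    {"id": "shipping", "text": "shipping times and returns",
     "allowed_groups": {"support"}},
]
hits = retrieve({"password", "reset"}, documents, user_groups={"support"})
print(len(chunks), [d["id"] for d in hits])
```

Filtering before ranking means a restricted document can never reach the prompt, even when it is the best lexical or semantic match for the query.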
Cost, reliability, and performance tradeoffs
| Design choice | Cost/performance questions |
|---|---|
| AutoML vs custom training | Is the productivity gain worth less control? Does the model need special architecture or preprocessing? |
| CPU vs accelerator | Does the workload benefit enough from acceleration to justify complexity and cost? |
| Large model vs smaller model | Is quality improvement worth latency, memory, and serving cost? |
| Online vs batch | Does the business truly need real-time predictions? |
| Frequent retraining | Is retraining driven by measurable drift or business need? |
| Feature store adoption | Do reuse, consistency, and online/offline parity justify the added operational design? |
| Streaming pipeline | Is near-real-time value worth complexity versus scheduled batch? |
| Custom serving | Is flexibility worth added operational responsibility? |
| Data movement | Can training or inference happen where the data already resides? |
| Monitoring depth | Are alerts actionable and aligned to business risk? |
Can you do this?
- Choose a simpler architecture when complexity is not justified.
- Identify hidden costs from large-scale experimentation.
- Explain why reducing model size may improve reliability.
- Compare latency, throughput, and cost in serving decisions.
- Design batch workflows that meet business deadlines without online serving.
- Identify when caching, batching, or asynchronous processing is appropriate.
- Explain how failed pipelines can create downstream business risk.
Architecture scenario drills
Use these prompts for final review. For each one, state the ML approach, Google Cloud services, data path, evaluation metric, deployment pattern, monitoring plan, and security controls.
Scenario 1: Fraud detection
Checklist:
- Define prediction target and label delay.
- Address class imbalance.
- Choose precision/recall tradeoff based on investigation capacity.
- Prevent leakage from post-transaction fields.
- Use time-aware validation.
- Design online or near-real-time scoring if required.
- Monitor drift, false positives, false negatives, and alert fatigue.
- Secure sensitive transaction and user data.
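Choosing the precision/recall tradeoff from investigation capacity can be made concrete: if analysts can review only a fixed number of alerts, the threshold is whatever score the last reviewable alert has. The sketch below assumes made-up scores and labels, and the `alerts_within_capacity` helper is hypothetical.

```python
def alerts_within_capacity(scores, labels, capacity):
    """Flag the highest-scoring cases up to capacity.

    Returns (threshold, precision, recall), where threshold is the score
    of the lowest-ranked flagged case.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    flagged = ranked[:capacity]
    true_positives = sum(label for _, label in flagged)
    total_positives = sum(labels)
    threshold = flagged[-1][0] if flagged else None
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / total_positives if total_positives else 0.0
    return threshold, precision, recall


# Illustrative fraud scores with known outcomes (1 = confirmed fraud).
scores = [0.95, 0.90, 0.60, 0.40, 0.30, 0.10]
labels = [1, 0, 1, 0, 1, 0]
threshold, precision, recall = alerts_within_capacity(scores, labels, capacity=3)
print(threshold, round(precision, 3), round(recall, 3))
```

Raising capacity lowers the effective threshold and usually trades precision for recall, so the staffing decision and the model threshold are the same decision.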
Scenario 2: Demand forecasting
Checklist:
- Use historical time windows correctly.
- Avoid random split leakage.
- Add seasonality, calendar, price, promotion, and inventory features when available.
- Choose forecast horizon and granularity.
- Evaluate with backtesting.
- Decide batch prediction schedule.
- Monitor forecast error by segment.
- Handle new products or sparse history.
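Backtesting in the checklist above means evaluating repeatedly on windows that always lie after the training data. A rolling-origin scheme is one common way to do this; the sketch below pairs it with a naive last-value forecast as a baseline. The demand series, window sizes, and function names are illustration values, not recommendations.

```python
def rolling_origin_windows(n_points, initial_train, horizon, step):
    """Yield (train_indices, test_indices) pairs where test always follows train."""
    windows = []
    train_end = initial_train
    while train_end + horizon <= n_points:
        windows.append((range(0, train_end), range(train_end, train_end + horizon)))
        train_end += step
    return windows


def backtest_naive(series, initial_train, horizon, step):
    """Return the MAE per window for a forecast that repeats the last training value."""
    maes = []
    for train_idx, test_idx in rolling_origin_windows(len(series), initial_train, horizon, step):
        forecast = series[train_idx[-1]]
        errors = [abs(series[i] - forecast) for i in test_idx]
        maes.append(sum(errors) / len(errors))
    return maes


# Hypothetical monthly demand with an upward trend and a small oscillation.
demand = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]
windows = rolling_origin_windows(len(demand), initial_train=6, horizon=2, step=2)
window_maes = backtest_naive(demand, initial_train=6, horizon=2, step=2)
print(len(windows), window_maes)
```

Any candidate forecasting model should beat this naive baseline across windows, and reporting the error per window (or per segment) shows whether performance is stable over time.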
Scenario 3: Product recommendation
Checklist:
- Identify explicit and implicit feedback.
- Handle cold-start users and items.
- Separate candidate generation from ranking if needed.
- Include business constraints such as availability or diversity.
- Evaluate offline and online behavior carefully.
- Watch for feedback loops and popularity bias.
- Decide batch vs online updates.
- Monitor click-through, conversion, user satisfaction, and fairness concerns.
Scenario 4: Image classification
Checklist:
- Validate label quality and class balance.
- Decide managed AutoML vs custom model.
- Use train/validation/test split without duplicate leakage.
- Consider augmentation and transfer learning.
- Evaluate per-class metrics.
- Choose deployment pattern based on latency and device needs.
- Monitor image distribution changes.
- Secure stored images and model outputs.
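Duplicate leakage across splits can be avoided by deriving the split from the image content rather than from a random draw per file. The sketch below assumes exact byte-level duplicates; near-duplicates (resized or re-encoded images) would need a perceptual hash or similar technique, which is not shown. `assign_split` and the 80/10/10 proportions are illustrative choices.

```python
import hashlib


def assign_split(image_bytes, train_pct=80, val_pct=10):
    """Map image content to a split so identical bytes always land together."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in [0, 99]
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + val_pct:
        return "validation"
    return "test"


# Two copies of the same image content receive the same split.
original = b"fake-image-bytes-for-illustration"
duplicate = bytes(original)
print(assign_split(original) == assign_split(duplicate))
```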
Scenario 5: Generative AI support assistant
Checklist:
- Decide among a prompt-only, RAG, tuning, or custom model approach.
- Identify source documents and access controls.
- Design retrieval, grounding, and citation behavior.
- Evaluate factuality, relevance, safety, and refusal behavior.
- Protect sensitive user prompts and retrieved content.
- Monitor hallucinations, unsafe outputs, latency, and cost.
- Include human escalation for high-risk answers.
- Plan document refresh and evaluation updates.
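Access control and grounding in a RAG design can be sketched as filtering documents by the caller's groups before retrieval, then refusing to build a prompt when nothing relevant is retrieved. This is a simplified illustration: keyword overlap stands in for embedding search, and the document schema, group names, and function names are hypothetical.

```python
def retrieve(query, docs, user_groups, k=2):
    """Return up to k permitted documents that share terms with the query."""
    terms = set(query.lower().split())
    # Enforce access control before scoring, not after generation.
    allowed = [d for d in docs if d["allowed_groups"] & user_groups]
    scored = [(len(terms & set(d["text"].lower().split())), d) for d in allowed]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]["id"]))
    return [d for _, d in scored[:k]]


def build_prompt(query, passages):
    """Build a grounded prompt with citations, or None to signal escalation."""
    if not passages:
        return None  # caller should refuse or hand off to a human
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in passages)
    return f"Answer using only the sources below and cite their IDs.\n{context}\nQuestion: {query}"


docs = [
    {"id": "kb-1", "text": "Reset your password from the account settings page",
     "allowed_groups": {"customers", "agents"}},
    {"id": "kb-2", "text": "Internal refund override procedure for agents",
     "allowed_groups": {"agents"}},
]

customer_hits = retrieve("refund override", docs, {"customers"})
print(build_prompt("refund override", customer_hits))
```

Filtering before retrieval means restricted text never reaches the model for an unauthorized caller, which is generally safer than relying on the model to withhold it.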
Common weak areas and traps
| Trap | Why it hurts exam performance | Better habit |
|---|---|---|
| Choosing the most advanced model by default | PMLE scenarios often reward fit-for-purpose design. | Start with business goal, data, constraints, and baseline. |
| Ignoring leakage | Leakage creates unrealistic metrics and poor production performance. | Ask: “Would this feature exist at prediction time?” |
| Using random splits for time-series data | Future information can leak into training. | Use time-aware validation and backtesting. |
| Optimizing accuracy on imbalanced data | Accuracy can hide failure on rare but important cases. | Use precision, recall, PR-AUC, and cost-based thresholds. |
| Treating notebooks as production pipelines | Notebooks alone often lack reproducibility and controls. | Move repeatable steps into versioned pipelines. |
| Forgetting training-serving skew | Preprocessing that differs between training and serving breaks production behavior. | Reuse transformations and validate production inputs. |
| Monitoring only CPU and memory | ML systems fail through data and behavior changes too. | Monitor data, predictions, quality, and business metrics. |
| Retraining automatically without gates | Bad new data can make the model worse. | Use validation, approval, and rollback criteria. |
| Granting broad permissions | Overly broad IAM increases risk. | Use least privilege and workload-specific service accounts. |
| Moving large data unnecessarily | Data movement adds cost, latency, and complexity. | Train or score near the data when practical. |
| Ignoring delayed labels | Quality monitoring may lag behind production changes. | Use proxy metrics plus delayed ground-truth evaluation. |
| Assuming generative AI is always the answer | It may add cost, latency, and risk. | Use rules, search, retrieval, or smaller models when sufficient. |
| Skipping human review | Some outputs or predictions carry high impact. | Add review, escalation, and documentation where needed. |
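The accuracy trap in the table is easy to verify with arithmetic. In this illustrative sketch, a model that never predicts the rare class on a dataset with 1% positives scores high accuracy while catching nothing; the dataset and helper name are invented for the example.

```python
def accuracy_and_recall(labels, predictions):
    """Return (accuracy, recall) for binary labels where 1 is the rare class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    positives = sum(labels)
    accuracy = correct / len(labels)
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall


# 10 positives among 1,000 examples; the "model" always predicts the majority class.
labels = [1] * 10 + [0] * 990
always_negative = [0] * 1000

accuracy, recall = accuracy_and_recall(labels, always_negative)
print(accuracy, recall)  # 0.99 accuracy, 0.0 recall
```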
“Can you do this?” final skill checklist
Design and architecture
- I can design an end-to-end ML solution on Google Cloud from raw data to monitored predictions.
- I can justify service choices from requirements and tradeoffs, not only from memorized product names.
- I can compare AutoML, custom training, and BigQuery ML for a scenario.
- I can choose batch, streaming, or online serving patterns.
- I can identify the simplest architecture that satisfies requirements.
- I can include security, governance, monitoring, and cost controls in the design.
Data and modeling
- I can detect leakage, skew, drift, imbalance, and bad labels.
- I can choose the right split strategy.
- I can select metrics aligned with business risk.
- I can explain overfitting, underfitting, bias, variance, and regularization.
- I can reason about feature engineering for tabular, text, image, time-series, and event data.
- I can explain when embeddings are useful.
MLOps
- I can describe pipeline components and artifacts.
- I can explain, at a conceptual level, how to version data, code, parameters, models, and environments.
- I can design evaluation gates before deployment.
- I can plan rollback and retraining.
- I can monitor both infrastructure and model behavior.
- I can troubleshoot failures across data, code, IAM, serving, and monitoring.
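An evaluation gate before deployment can be as simple as a function that compares a candidate model's metrics with the current baseline and returns a decision plus reasons. The sketch below is one possible shape; the metric names, thresholds, and `should_promote` are assumptions for illustration, not a Vertex AI API.

```python
def should_promote(candidate, baseline, min_recall=0.80, max_pr_auc_drop=0.01):
    """Return (promote, reasons) based on an absolute floor and a regression check."""
    reasons = []
    if candidate["recall"] < min_recall:
        reasons.append("recall below floor")
    if candidate["pr_auc"] < baseline["pr_auc"] - max_pr_auc_drop:
        reasons.append("pr_auc regressed against baseline")
    return (not reasons), reasons


baseline = {"recall": 0.85, "pr_auc": 0.70}
good_candidate = {"recall": 0.86, "pr_auc": 0.72}
bad_candidate = {"recall": 0.75, "pr_auc": 0.60}

print(should_promote(good_candidate, baseline))
print(should_promote(bad_candidate, baseline))
```

Returning the reasons alongside the decision makes a blocked deployment easier to audit and supports the rollback and retraining planning listed above.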
Security and responsible AI
- I can apply least privilege to ML workflows.
- I can identify sensitive data in training and prediction flows.
- I can explain auditability, lineage, and model documentation.
- I can reason about explainability and fairness requirements.
- I can identify prompt injection, hallucination, unsafe output, and data leakage risks.
- I can add human review where automation alone is risky.
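Least privilege can be checked mechanically by scanning an IAM policy for service accounts that hold basic roles, which are generally broader than a single ML workload needs. The sketch below assumes a policy already loaded as a dictionary in the standard `bindings` / `role` / `members` shape; the project and account names are hypothetical.

```python
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def broad_service_account_bindings(policy):
    """List (member, role) pairs where a service account holds a basic role."""
    findings = []
    for binding in policy.get("bindings", []):
        if binding.get("role") in BASIC_ROLES:
            for member in binding.get("members", []):
                if member.startswith("serviceAccount:"):
                    findings.append((member, binding["role"]))
    return findings


# Hypothetical policy: the training account is over-privileged.
policy = {
    "bindings": [
        {"role": "roles/editor",
         "members": ["serviceAccount:trainer@example-project.iam.gserviceaccount.com"]},
        {"role": "roles/aiplatform.user",
         "members": ["serviceAccount:predictor@example-project.iam.gserviceaccount.com"]},
    ]
}

print(broad_service_account_bindings(policy))
```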
Final-week review checklist
Use this section to focus review time before the exam.
7 to 5 days out
- Review Google Cloud ML service selection: Vertex AI, BigQuery ML, Dataflow, Pub/Sub, Cloud Storage, BigQuery, Cloud Run, Google Kubernetes Engine, and supporting security services.
- Rebuild your mental map of the ML lifecycle: data, features, training, evaluation, registry, deployment, monitoring, retraining.
- Drill metric selection for classification, regression, forecasting, ranking, and generative AI.
- Practice spotting leakage and skew in short scenarios.
- Review IAM and service account patterns for ML workflows.
- Summarize when to use online prediction, batch prediction, and streaming inference.
4 to 2 days out
- Work through architecture scenarios without notes.
- For each missed question, classify the miss: service selection, metric choice, data issue, MLOps, security, or troubleshooting.
- Review common traps, especially overfitting, class imbalance, time leakage, and overbroad permissions.
- Practice explaining tradeoffs in one or two sentences.
- Review model monitoring and retraining decision points.
- Review generative AI/RAG risks: grounding, evaluation, safety, prompt injection, and access control.
Final 24 hours
- Review your weakest service-selection tables.
- Recheck metric formulas and threshold tradeoffs.
- Review the difference between data drift, concept drift, prediction drift, and training-serving skew.
- Review pipeline artifact flow and rollback logic.
- Review security defaults: least privilege, service accounts, sensitive data handling, auditability.
- Avoid cramming obscure limits or quotas unless your own study materials require them.
- Rest enough to read scenarios carefully and avoid rushing.
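For the drift review item above, it can help to see one common way of quantifying data drift. The sketch below computes the population stability index (PSI) between a training sample and a serving sample of a single numeric feature; the bin edges, sample values, and smoothing constant are illustrative assumptions, and PSI is only one of several possible drift measures.

```python
import math


def bin_fractions(values, edges, eps=1e-6):
    """Return the fraction of values in each bin, floored at eps to avoid log(0)."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for e in edges if v >= e)  # number of edges at or below v
        counts[index] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(expected, actual, edges):
    """Population stability index: sum of (a - e) * ln(a / e) over bins."""
    e_frac = bin_fractions(expected, edges)
    a_frac = bin_fractions(actual, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))


edges = [0.25, 0.5, 0.75]
training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
serving_shifted = [0.6, 0.7, 0.8, 0.9, 0.9, 0.95, 0.8, 0.85]

print(psi(training, training, edges))         # identical samples give 0.0
print(psi(training, serving_shifted, edges))  # a shift gives a positive value
```

This measures a change in input distribution only. Concept drift, where the relationship between inputs and labels changes, needs ground-truth labels to detect and will not show up in a statistic like this on its own.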
Practical next step
Pick one weak area from the readiness map and turn it into a short drill: write the problem type, data source, Google Cloud services, model approach, metric, deployment pattern, monitoring plan, and security controls. Then compare your answer against the checklist above and repeat with a different scenario until your choices feel deliberate rather than memorized.