Free Google Cloud PMLE Practice Exam: Google Cloud Professional Machine Learning Engineer

Try 50 free Google Cloud Professional Machine Learning Engineer (Google Cloud Professional ML Engineer) questions across the exam domains, with explanations, then continue with IT Mastery practice.

This free full-length Google Cloud Professional ML Engineer practice exam includes 50 original IT Mastery questions across the exam domains.

These are original IT Mastery practice questions. They are not official Google Cloud questions, copied live-exam content, or exam dumps. Use them for self-assessment, scope review, and deciding what to drill next.

Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.

Try the IT Mastery web app for a richer interactive practice experience with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

Try Google Cloud Professional ML Engineer on Web

Exam snapshot

  • Exam route: Google Cloud Professional ML Engineer
  • Practice-set question count: 50
  • Time limit: 120 minutes
  • Practice style: mixed-domain diagnostic run with answer explanations

Full-length exam mix

DomainWeight
Architecting Low-Code AI Solutions13%
Collaborating within and Across Teams to Manage Data and Models16%
Scaling Prototypes into ML Models21%
Serving and Scaling Models20%
Automating and Orchestrating ML Pipelines18%
Monitoring AI Solutions13%

Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and interactive practice.

Practice questions

Questions 1-25

Question 1

Topic: Scaling Prototypes into ML Models

A retail team receives weekly partner exports as CSV files. The files must be type-checked, deduplicated, split into training and evaluation sets, and then used by an Agent Platform custom training job. The team wants the workflow to be repeatable and to record lineage for each run. Which setup best meets these requirements?

Options:

  • A. Use Agent Platform Pipelines with CSVs in Cloud Storage and preprocessing components before training.

  • B. Use Model Monitoring to create the training and evaluation splits.

  • C. Add CSV parsing and deduplication only to the online inference container.

  • D. Register the CSV files directly as versions in Agent Platform Model Registry.

Best answer: A

Explanation: CSV exports used for model training should be ingested at the training pipeline layer, not at serving or registry time. A common pattern is to land the CSV files in Cloud Storage, then use Agent Platform Pipelines components to validate schema, clean records, transform fields, and create training/evaluation splits before launching custom training. The pipeline can pass prepared data artifacts to later steps and record metadata for reproducibility and lineage. Model Registry manages trained model versions, and Model Monitoring evaluates production behavior after deployment; neither prepares raw CSV training data.

  • Registry misuse fails because Model Registry stores model artifacts and versions, not raw CSV preparation workflows.
  • Serving-time preprocessing fails because training data cleaning and train/evaluation splitting must happen before model training.
  • Monitoring stage fails because Model Monitoring is for deployed model behavior, not creating offline training datasets.

Question 2

Topic: Automating and Orchestrating ML Pipelines

A payments team uses Agent Platform Pipelines to retrain a transaction-risk classifier from BigQuery when Model Monitoring reports drift. The draft pipeline validates only that the extract completed and that the retrained model’s overall AUC exceeds the prior model. The model serves online with shared preprocessing code, and compliance requires no material degradation for small-merchant segments. Which engineering decision is BEST before enabling automatic deployment?

Options:

  • A. Add GPU tuning and deploy the lowest-latency model

  • B. Add data, skew, and slice-based model validation gates

  • C. Deploy automatically when overall AUC improves

  • D. Replace drift-triggered retraining with monthly manual retraining

Best answer: B

Explanation: Validation coverage is not sufficient when it checks only pipeline completion and a single aggregate metric. Before automating retraining or deployment, the pipeline should fail closed unless the new run passes data validation, preprocessing consistency checks, and model validation against the previous production model. Because the stem includes an online serving path and a compliance requirement for small merchants, the gates must include training-serving skew checks and slice-based performance thresholds, not just overall AUC. Capturing the results as pipeline metadata also supports auditability and repeatable promotion decisions. The key takeaway is that continuous training needs automated quality gates that match production risks and business constraints.

  • Aggregate AUC only can hide small-merchant regressions and does not validate input data or serving consistency.
  • Manual monthly retraining reduces automation but does not create repeatable validation coverage for safe promotion.
  • GPU and latency tuning may improve runtime, but it does not address drift, skew, or segment compliance requirements.

Question 3

Topic: Serving and Scaling Models

A team deployed a fraud model on Agent Platform Inference for a partner checkout application. The partner runs outside your Google Cloud VPC and must call online predictions over HTTPS using authenticated requests; unauthenticated access is prohibited. The partner receives connection timeouts.

SettingValue
Endpoint visibilityPrivate endpoint only
Public endpointDisabled
Caller networkPartner cloud, no VPN/interconnect

Which next action best addresses the endpoint failure?

Options:

  • A. Add Cloud NAT to the model project

  • B. Open firewall ingress to the private endpoint IP

  • C. Expose a managed public endpoint with authentication

  • D. Switch the workload to batch prediction

Best answer: C

Explanation: A private endpoint is reachable only by clients with an appropriate private network path, such as VPC connectivity through the supported private access mechanism. The evidence points to a network reachability issue: the caller is in a partner cloud, there is no VPN or interconnect, and the public endpoint is disabled. For external applications that need managed online prediction access over HTTPS, use a public Agent Platform Inference endpoint and apply the required security controls, such as authenticated requests and appropriate identity-based access. The endpoint should not be made unauthenticated; the key is public reachability with controlled access.

  • Cloud NAT supports outbound internet access for private resources; it does not publish a private prediction endpoint for inbound partner calls.
  • Firewall ingress does not make a private endpoint routable from an external cloud with no private connectivity.
  • Batch prediction changes the serving pattern and does not meet an online checkout application’s request-time prediction requirement.

Question 4

Topic: Automating and Orchestrating ML Pipelines

A team uses Agent Platform Pipelines to retrain a churn model nightly. The pipeline reads raw events and labels from BigQuery, joins and transforms them, writes a training set to Cloud Storage, runs custom training on GPUs, and automatically registers and deploys successful runs. A recent incident occurred when late-arriving labels caused 35% null labels after the join; training still completed. Where should the team add the primary validation gate?

Options:

  • A. Only in Model Monitoring after deployment

  • B. After joining and transforming data, before custom training

  • C. After model registration, before endpoint deployment

  • D. Before reading the raw BigQuery tables

Best answer: B

Explanation: Validation should be placed at the earliest pipeline point where the failure can be detected with the right context. In this scenario, the null labels appear after the BigQuery join and transformation step, not necessarily in the raw source tables. A data validation component in Agent Platform Pipelines should check label completeness, schema, required features, and basic data-quality thresholds before launching GPU training. This fails fast, saves training cost, and prevents a bad model from being registered and deployed. Later model validation and production monitoring are still useful, but they are not the primary gate for this specific failure risk.

  • Raw-source validation may miss defects introduced by joins, late-arriving labels, or transformations.
  • Post-registration validation is too late to avoid wasted training cost and may allow bad artifacts into the registry.
  • Production monitoring detects live behavior after deployment, but the requirement is to block the bad training set before training.

Question 5

Topic: Scaling Prototypes into ML Models

An ML team is moving a churn prototype into an Agent Platform custom training pipeline. The source of truth is a Cloud SQL customer_profile table. Compliance requires training to use only the approved BigQuery authorized view ml_training.customer_profile_masked, and the training set must be less than 24 hours old.

The latest pipeline run fails input validation:

Input path: gs://ml-snapshots/customer_profile/2026-05-24/*.csv
Snapshot age: 72 hours
Observed columns: customer_id, email, age, churn_label
Expected columns: customer_id, email_hash, age, churn_label
Observed type: age=STRING
Expected type: age=INTEGER

What is the most likely cause?

Options:

  • A. A Cloud SQL read replica is lagging behind the primary database.

  • B. The model registry points to an older model version.

  • C. The BigQuery authorized view has an incorrect schema.

  • D. The pipeline reads unmanaged CSV snapshots instead of the approved BigQuery view.

Best answer: D

Explanation: The failure points to a training data ingestion source mismatch. The pipeline is consuming exported CSV files from Cloud Storage, not the approved BigQuery authorized view. That bypasses governance controls such as masking email to email_hash, and it can also introduce stale data and schema drift because CSV exports often rely on snapshot timing and inferred or serialized types. For database-backed training data, the pipeline should read from the governed BigQuery table or view, or from a controlled ingestion process that enforces schema, freshness checks, and access policies before training begins. The visible input path and raw column are the strongest evidence.

  • Authorized view schema is not the likely issue because the run is reading from a Cloud Storage CSV path, not the approved view.
  • Read replica lag could affect freshness, but it does not explain the raw email column, type mismatch, or CSV input path.
  • Model registry versioning affects model deployment lineage, not training input validation.

Question 6

Topic: Automating and Orchestrating ML Pipelines

An ML team uses Cloud Build to run tests, resolve the approved model in Agent Platform Model Registry, and deploy it to an Agent Platform Inference endpoint. A commit triggers the build, but deployment fails before any traffic is changed. What is the most likely cause?

Build log excerpt:

Step 1: run unit tests ... PASS
Step 2: resolve approved model ... churn_model version 28
Step 3: deploy model to endpoint churn-prod
ERROR: PERMISSION_DENIED
Principal: serviceAccount:cloud-build-deployer@example.iam.gserviceaccount.com
Missing permission: aiplatform.endpoints.deployModel

Options:

  • A. A unit test failed before the deployment step

  • B. The Cloud Build deployer lacks endpoint deployment permission

  • C. The approved model alias points to an older artifact version

  • D. The build config omitted the model ID substitution

Best answer: B

Explanation: The failure is an IAM permission issue in the automated deployment stage. The build completed tests and resolved a specific approved model version, so the artifact and test stages succeeded. The error occurs only when Cloud Build attempts to deploy the model to the endpoint, and it names the deployer service account plus the missing deployment permission. The next action is to grant the Cloud Build deployer an appropriate Agent Platform endpoint deployment role or equivalent custom permission scope, ideally limited to the required project or endpoint.

Artifact versioning and build substitutions are less likely because the model was resolved before deployment began.

  • Artifact versioning is not supported by the evidence because the build resolved churn_model version 28 successfully.
  • Test failure does not match the log because the unit test step passed before deployment.
  • Build substitution is unlikely because the deploy step already has a concrete endpoint and model version.

Question 7

Topic: Serving and Scaling Models

A retail company has a trained recommendation model registered in Agent Platform Model Registry. The mobile app must score one customer interaction at a time while the user is browsing, and the product team requires a response in under 200 ms for each request. Which serving setup best meets the requirement?

Options:

  • A. Score interactions with Dataflow micro-batches

  • B. Deploy to an Agent Platform Inference online endpoint

  • C. Run nightly batch predictions to BigQuery

  • D. Trigger retraining for each interaction

Best answer: B

Explanation: Online inference is the right serving pattern when an application needs an immediate prediction for one request at a time. In this scenario, the mobile app is waiting while the user is browsing, so the model should be deployed to an online endpoint that can serve synchronous prediction requests and scale for production traffic. Batch prediction is better for large offline scoring jobs where results can be written later to BigQuery or Cloud Storage. Streaming or micro-batch processing can reduce delay compared with nightly jobs, but it still adds queuing and is not the normal request-response serving layer for an interactive app.

  • Nightly batch scoring fails because it produces offline results too late for in-session user interactions.
  • Micro-batch scoring adds processing delay and is better for stream processing than synchronous app requests.
  • Per-request retraining confuses model training with model serving and would not provide low-latency predictions.

Question 8

Topic: Serving and Scaling Models

A team trained a fraud model using preprocessing that imputes missing values, applies training-fitted scalers, and one-hot encodes merchant categories. Online applications will call an Agent Platform Inference endpoint with raw transaction JSON. The team must minimize training-serving skew and avoid duplicating feature logic in each client. Which serving setup is best?

Options:

  • A. Apply schema alignment only in postprocessing

  • B. Package the saved preprocessing artifacts in the inference container

  • C. Recompute scalers from each online prediction batch

  • D. Have each client preprocess requests before endpoint calls

Best answer: B

Explanation: Inference preprocessing should use the same transformation logic and fitted artifacts that were used during training. For an online Agent Platform Inference deployment, a common pattern is to package the model with the preprocessing code and artifacts, such as imputers, vocabularies, and scalers, in a custom inference container or equivalent serving layer. The endpoint can then accept raw request fields and transform them consistently before invoking the model. This reduces training-serving skew and keeps feature logic centrally versioned with the deployed model. Client-side preprocessing is harder to govern, and recomputing statistics from prediction traffic changes the feature distribution the model expects.

  • Client preprocessing can create inconsistent implementations across applications and versions.
  • Online recomputation uses serving-time statistics instead of training-fitted artifacts, causing skew.
  • Postprocessing alignment happens after prediction, so it cannot fix inputs that were transformed incorrectly.

Question 9

Topic: Scaling Prototypes into ML Models

A team is moving a prototype that fine-tunes a large open-source foundation model from Model Garden to Agent Platform custom training. Full fine-tuning is required, and profiling shows the model weights and optimizer state do not fit in memory on one accelerator, even at the smallest acceptable per-device batch size. The dataset can keep multiple workers busy. Which distribution strategy is the best engineering decision?

Options:

  • A. Increase only the global batch size

  • B. Move preprocessing to Dataflow

  • C. Partition the model across accelerators

  • D. Replicate the model and split input batches

Best answer: C

Explanation: The deciding factor is what drives the distribution. Data parallelism replicates the full model on each worker and sends different data batches to each replica. That helps when the model fits on each accelerator and the bottleneck is training throughput over a large dataset. Here, the full model plus optimizer state cannot fit on one accelerator, so each worker cannot hold a complete replica. Model parallelism partitions model layers, tensors, or parameters across accelerators so the single training step can run with the model spread over multiple devices. The large dataset supports scaling, but it is not the primary constraint.

  • Batch splitting fails because each worker would still need a full model replica in memory.
  • Larger batches can improve throughput, but they do not solve model-state memory limits.
  • Dataflow preprocessing may help input pipelines, but the visible blocker is accelerator memory for the model.

Question 10

Topic: Monitoring AI Solutions

A subscription company serves an online churn-risk classifier on Agent Platform Inference for 30,000 predictions per hour. Labels arrive after 14 days, and the business wants to restore offer targeting without increasing online latency. Model Monitoring on Gemini Enterprise Agent Platform shows:

SignalResult
Serving latency/errorsWithin SLO
Input drift vs. trainingHigh on usage features
Training-serving skewNone detected
Delayed-label AUC0.91 to 0.74
Previous model shadow AUC0.73 on recent data

Which action is the best engineering decision?

Options:

  • A. Escalate for business review of the offer policy

  • B. Roll back to the previous model version immediately

  • C. Investigate the serving feature pipeline before changing the model

  • D. Retrain with recent labeled data and validate before rollout

Best answer: D

Explanation: The monitoring evidence points to a model that no longer represents current user behavior. Input drift is high, delayed labels show a large AUC drop, and the shadow evaluation shows the previous model performs similarly poorly on recent data. Serving latency and errors are healthy, and no training-serving skew is detected, so the immediate issue is not endpoint reliability or a broken feature pipeline. Retraining can happen offline through the training pipeline, then the candidate model should be validated and rolled out safely without adding online latency. A rollback would not restore quality because the older model is also degraded on current data.

  • Rollback trap fails because the previous version has similar recent AUC, so the deployed version is not the main cause.
  • Feature-pipeline trap fails because no training-serving skew or serving reliability problem is visible.
  • Business-review trap fails because labeled evaluation shows a measurable ML performance degradation first.

Question 11

Topic: Automating and Orchestrating ML Pipelines

An ecommerce team serves a fraud classification model on Agent Platform Inference. Its retraining policy triggers continuous training only when Model Monitoring shows drift or delayed-label quality decay. During a flash sale, the auto-approval KPI drops below target.

Monitoring excerpt:

SignalObservation
Request volume3x normal for 2 hours
p95 prediction latency1.8s vs 300 ms SLO
Endpoint errors/timeoutsElevated
Feature driftWithin baseline
Delayed-label AUC0.91, unchanged

What is the most likely cause of the degraded production metric?

Options:

  • A. The model has concept drift and needs retraining

  • B. Endpoint serving capacity is insufficient for traffic

  • C. The retraining pipeline skipped feature validation

  • D. Training data no longer represents production traffic

Best answer: B

Explanation: Production metric declines can come from model quality problems or serving reliability problems. Here, the decisive signals point to serving scale: request volume tripled, p95 latency violates the SLO, and endpoint errors/timeouts are elevated. At the same time, feature drift is within baseline and delayed-label AUC is unchanged, so there is no evidence that the model’s learned relationship has degraded. The next operational focus would be endpoint capacity, autoscaling, resource saturation, or fallback behavior, not continuous training. Retraining is appropriate when monitoring shows data drift, concept drift, training-serving skew, or label-based quality decay.

  • Training-data mismatch would usually be supported by feature drift, schema changes, or skew signals, which are not present here.
  • Concept drift would appear as declining label-based quality, but delayed-label AUC is unchanged.
  • Skipped validation is not supported because the stem gives no pipeline failure or bad model promotion evidence.

Question 12

Topic: Automating and Orchestrating ML Pipelines

A financial services team uses Agent Platform Pipelines to retrain a tabular fraud detection model weekly. The model can be promoted to online inference only if it meets predefined quality thresholds, passes a fairness check on protected groups, and stays within the endpoint latency budget on a sample batch. The team also needs an auditable record of each decision. Which engineering decision is best?

Options:

  • A. Deploy each trained model, then rely on Model Monitoring alerts

  • B. Validate only the input schema before training begins

  • C. Add a model validation gate that records metrics and conditionally promotes the model

  • D. Approve promotion manually from a shared notebook after each run

Best answer: C

Explanation: Model validation in an ML pipeline should run before promotion and should evaluate the candidate model against explicit release criteria. In this scenario, the gate needs to check predictive quality, responsible AI criteria, and serving readiness such as latency. Agent Platform Pipelines can include a validation component that writes evaluation artifacts and decisions to Agent Platform ML Metadata, then uses a conditional step to register or promote only models that pass. This makes the decision repeatable and auditable. Monitoring after deployment is still important, but it does not replace pre-promotion validation when governance requires evidence before serving traffic.

  • Monitoring after deploy fails because it allows an unvalidated model to reach online inference before governance checks pass.
  • Manual notebook approval is hard to reproduce and audit consistently across weekly retraining runs.
  • Schema-only validation catches bad inputs, but it does not assess model quality, fairness, or serving latency.

Question 13

Topic: Collaborating within and Across Teams to Manage Data and Models

An ML team is preparing to hand off a customer-support prototype to another internal team. The package includes an Agent Platform Workbench notebook, BigQuery training tables with customer identifiers and support notes, embeddings generated from those notes, prompt templates, and a model version. The receiving team only needs to reproduce evaluation results and run approved batch predictions. Which configuration is best before sharing?

Options:

  • A. Copy all artifacts to a shared Cloud Storage bucket with signed URLs

  • B. Deploy the model to a private endpoint and share endpoint invoker access

  • C. Use a curated handoff project with de-identified data, redacted prompts and embeddings, and least-privilege read access

  • D. Grant project Viewer and BigQuery Data Viewer on the source project

Best answer: C

Explanation: Cross-team ML handoffs should treat datasets, notebooks, prompts, embeddings, and model artifacts as separate assets that may expose sensitive information. The safest workflow is to create a controlled handoff area containing only approved, de-identified data and reviewed artifacts, then grant least-privilege access to the receiving group. Embeddings and prompts should not be assumed safe just because they are derived from source text; they can still reveal sensitive content or business logic. Model versions should be shared through controlled artifact management, such as Agent Platform Model Registry, with read-only or invocation-specific permissions as appropriate. Broad source-project access or raw artifact copies bypass the privacy review that the handoff requires.

  • Source-project access is too broad because it can expose raw identifiers, notes, and unrelated project resources.
  • Shared bucket copying loses fine-grained governance and may distribute prompts, embeddings, or artifacts before review.
  • Endpoint-only sharing supports inference but does not let the receiving team reproduce evaluation results or review approved handoff materials.

Question 14

Topic: Scaling Prototypes into ML Models

A team moves an image classification prototype from Agent Platform Workbench to an Agent Platform custom training job. The job fails on distributed workers with FileNotFoundError for paths such as /home/jupyter/data/images/cat001.jpg. The dataset will grow to millions of images plus exported label files. What should the ML engineer check or change first?

Options:

  • A. Upload the files to Cloud Storage and use gs:// URIs

  • B. Load the image bytes into BigQuery tables

  • C. Increase the Workbench notebook boot disk size

  • D. Register the images in Agent Platform Feature Store

Best answer: A

Explanation: The failure points to local file paths from the notebook environment being used in a remote custom training job. Distributed training workers do not share the Workbench VM filesystem. For file-based training data such as images, videos, text files, and exported datasets, Cloud Storage is the standard scalable object store on Google Cloud. The training code should read data from gs:// locations, with labels or manifests also stored in accessible objects when appropriate.

BigQuery is better for structured/tabular data and SQL-based workflows, not as the primary store for millions of raw image files. Feature Store is for managed feature values used in training or serving, not raw media object storage.

  • BigQuery for raw images adds unnecessary complexity and is not the usual storage layer for file-based media training data.
  • Feature Store misuse confuses reusable feature serving with storing raw training files.
  • Bigger notebook disk does not make local files available to distributed remote workers.

Question 15

Topic: Scaling Prototypes into ML Models

A fintech team is scaling a tabular binary classification prototype for credit-limit decisions on Gemini Enterprise Agent Platform. Both candidates meet latency and cost targets, but applicants and auditors require stable per-decision reason codes.

CandidateValidation AUCInterpretability
Logistic regression in BigQuery ML0.842Direct feature coefficients
Deep ensemble custom model0.848Post hoc explanations vary by run

Which engineering decision is best?

Options:

  • A. Average both models and report aggregate feature importance

  • B. Deploy the deep ensemble because it has the highest AUC

  • C. Remove sensitive fields and deploy the deep ensemble

  • D. Deploy the logistic regression model and document its explanations

Best answer: D

Explanation: Interpretability should take priority when the decision is high impact, regulated, or requires defensible explanations for individual predictions. In this scenario, the deep ensemble improves AUC by only 0.006, while the business requirement is stable per-decision reason codes for applicants and auditors. A simpler model with clear coefficients is easier to validate, explain, govern, and monitor, especially when latency and cost are already acceptable. Marginal predictive improvement is not enough to justify a less explainable architecture when explainability is a stated production requirement.

  • Highest AUC only fails because the performance improvement is marginal and does not meet the auditability constraint.
  • Model averaging may reduce interpretability further and aggregate importance does not provide stable individual reason codes.
  • Removing sensitive fields can help privacy or fairness reviews, but it does not make a deep ensemble inherently explainable.

Question 16

Topic: Serving and Scaling Models

A team packaged a PyTorch model in a custom container and deployed it to Agent Platform Inference. The image builds successfully, the container starts, and health checks pass. Online predictions return HTTP 500. Training used GPUs, but the endpoint was deployed on CPU-only serving nodes.

Exhibit: Serving log excerpt

INFO Starting prediction server on :8080
INFO Loading model from /models/model.pt
RuntimeError: Attempting to deserialize object on a CUDA device
but torch.cuda.is_available() is False.

What is the most likely cause?

Options:

  • A. Missing Python dependency in the serving container

  • B. Runtime mismatch between GPU-trained artifact and CPU serving

  • C. Missing model artifact in the expected path

  • D. Incorrect container entrypoint or startup command

Best answer: B

Explanation: This failure pattern points to a runtime mismatch. The container starts successfully, listens for requests, and begins loading /models/model.pt, so the entrypoint and artifact path are not the main issue. The decisive evidence is the PyTorch error saying the artifact is being deserialized for a CUDA device while torch.cuda.is_available() is false. That usually happens when a model saved with GPU tensors is loaded in a CPU-only serving environment, or when the container/runtime does not match the accelerator assumptions used during training. The fix is to save/load the model in a CPU-compatible way, such as using CPU map_location, or deploy to compatible GPU serving infrastructure if the model requires it.

  • Missing dependency would more likely show an import or module error, not a CUDA availability error.
  • Bad entrypoint would usually prevent the server from starting or passing health checks.
  • Missing artifact would produce a file or path error before PyTorch tried to deserialize CUDA tensors.

Question 17

Topic: Monitoring AI Solutions

A financial services team serves a tabular binary classifier on Agent Platform Inference to approve credit-line increases. Compliance reviewers and product managers need to understand which input features most influenced individual high-risk predictions and whether the model may be relying on proxy features for sensitive attributes. The team wants a managed approach that supports online serving reviews without rebuilding the model. What is the best engineering decision?

Options:

  • A. Train a separate interpretable surrogate model for reviews

  • B. Add only Model Monitoring data drift alerts on request features

  • C. Export raw prediction logs for manual stakeholder inspection

  • D. Enable Agent Platform model explainability and review feature attributions

Best answer: D

Explanation: Model explainability on Agent Platform is the right fit when stakeholders need understandable prediction factors for a deployed model. For a tabular classifier, feature attributions can show which inputs contributed most to an individual prediction and can be aggregated to support model reviews. This helps compliance and product teams investigate possible reliance on sensitive proxies while keeping the existing online serving architecture. Drift monitoring and raw logs are useful operational signals, but they do not directly explain why a prediction was made. A surrogate model may be easier to interpret, but it explains an approximation rather than the actual served model.

  • Drift-only monitoring shows distribution changes, not the feature contributions behind individual predictions.
  • Manual log review exposes inputs and outputs but does not provide attribution or stakeholder-ready explanations.
  • Surrogate modeling can be useful for exploration, but it adds approximation risk and does not explain the deployed model directly.

Question 18

Topic: Scaling Prototypes into ML Models

A retail company prototyped a Gemini-based assistant with retrieval over product manuals. The retrieved passages are usually correct, and prompt changes have improved formatting, but the model still gives the wrong task-specific troubleshooting sequence for a high-value device family. The team has 8,000 reviewed examples of correct responses and wants to avoid training a model from scratch. Which training setup best meets the requirement?

Options:

  • A. Fine-tune Gemini with the reviewed input-output examples

  • B. Train a custom model from scratch on GPUs

  • C. Expand the retrieval corpus with more product manuals

  • D. Increase the endpoint machine type for lower latency

Best answer: A

Explanation: Fine-tuning adapts a foundation model’s behavior using representative task examples. In this scenario, retrieval is already bringing in the right information, and prompt changes have not fixed the specialized troubleshooting sequence. The failure is not mainly missing context or serving capacity; it is the model’s learned response pattern for a specific task. Using the reviewed examples to fine-tune Gemini, such as through supported Model Garden or BigQuery-based fine-tuning workflows, is the right next step before production evaluation and deployment.

Adding more retrieval data helps when the model lacks facts. Here, the facts are available, but the response behavior is still wrong.

  • More retrieval data addresses missing or stale context, not incorrect behavior when relevant context is already retrieved.
  • Larger serving hardware can improve latency or throughput, but it does not teach the model a new troubleshooting sequence.
  • Training from scratch is unnecessarily complex and costly when a foundation model can be adapted with curated examples.

Question 19

Topic: Scaling Prototypes into ML Models

A customer support team has a Gemini-based assistant that drafts case summaries and suggested resolutions. Prompt revisions improved tone, but the model still misses company-specific resolution patterns. The team has 30,000 reviewed prompt-response examples in BigQuery, wants managed training and serving, and does not need to train a new model architecture. What is the best engineering decision?

Options:

  • A. Add only a retrieval layer over support documents

  • B. Fine-tune a Gemini model using the curated BigQuery examples

  • C. Replace the assistant with Agent Platform AutoML tabular classification

  • D. Train a custom transformer from scratch on GPU or TPU nodes

Best answer: B

Explanation: When prompt engineering is not enough but the task still fits a foundation model, supervised fine-tuning is the appropriate middle ground. The existing Gemini model already provides general language capability, while the 30,000 reviewed examples provide the task-specific behavior the team wants: company-specific summaries and resolutions. Using BigQuery as the example source keeps the workflow managed and avoids building a custom model architecture or distributed training stack. RAG can help when missing external facts are the main issue, but the stem points to response-pattern adaptation, not knowledge lookup.

  • Custom transformer training adds unnecessary architecture, infrastructure, cost, and training complexity for a task that can use an existing foundation model.
  • AutoML tabular classification does not match the generative summary and resolution-drafting task.
  • Retrieval only helps ground answers in documents, but it does not directly teach the model the desired output style and resolution patterns.

Question 20

Topic: Monitoring AI Solutions

A retail company runs a Gemini-based support assistant on Agent Platform Inference. A new prompt template and retrieval configuration were released as a 10% canary. Within an hour, Model Monitoring and LLM-as-a-judge checks report policy-violating responses from the canary, while the previous deployment remains stable. What is the best immediate operations response?

Options:

  • A. Increase canary replicas and GPU capacity.

  • B. Start custom training on recent chat logs.

  • C. Disable evaluation alerts until more samples arrive.

  • D. Route all traffic to the last approved deployment.

Best answer: D

Explanation: For a production AI incident with sudden unsafe behavior, first contain the blast radius and restore the last known safe state. Because the degradation is isolated to the canary and the prior deployment is stable, the safest immediate action is to use rollout controls to send traffic back to the approved version. After containment, the team can preserve logs, review monitoring and LLM-as-a-judge evidence, fix the prompt or retrieval configuration, and add stronger release gates before redeploying. Capacity changes or retraining do not address an active safety regression caused by a rollout.

  • Capacity scaling treats the issue as throughput or latency, but the symptom is unsafe content from a specific canary.
  • Custom training is a slower lifecycle action and does not immediately protect users from the bad deployment.
  • Disabling alerts removes visibility during an incident and allows unsafe responses to continue.

Question 21

Topic: Serving and Scaling Models

A team has a scikit-learn binary classifier saved as a supported model artifact. It uses only libraries included in the supported scikit-learn serving runtime and expects the default JSON prediction format. There is no custom preprocessing or postprocessing. The team wants the least operational work for online inference on Google Cloud. Which serving setup should they choose?

Options:

  • A. Wrap the model in a Cloud Run custom REST service.

  • B. Run predictions as an Agent Platform Pipelines workflow.

  • C. Deploy with the matching prebuilt serving container on Agent Platform Inference.

  • D. Build a custom prediction container for Agent Platform Inference.

Best answer: C

Explanation: Prebuilt serving containers are the preferred choice when the model artifact, framework version, dependencies, and request/response format match supported serving defaults. In this scenario, the scikit-learn model needs no custom inference code, dependency installation, preprocessing, or postprocessing. Deploying it to Agent Platform Inference with the matching prebuilt container gives managed online serving with minimal operational work. A custom container is useful when you need unsupported libraries, a different runtime, custom handlers, or nonstandard inference logic. Cloud Run can serve models, but it shifts more serving responsibility to the team. Pipelines are for orchestration, not low-latency online inference.

  • Custom container adds build and maintenance work that the stated runtime requirements do not require.
  • Cloud Run service can host custom APIs, but it is a lower-level serving pattern than managed model inference for this case.
  • Pipelines workflow fits batch orchestration or retraining, not direct online prediction serving.

Question 22

Topic: Collaborating within and Across Teams to Manage Data and Models

An ML team is exploring a foundation model from Model Garden for an internal policy assistant. Prototype outputs have acceptable tone, but several answers cite non-existent policy clauses and one response gives prohibited HR advice. Business stakeholders will not approve a pilot until answers are grounded in approved documents and pass safety review. Which workflow setup is the best next action?

Options:

  • A. Fine-tune immediately on historical chat transcripts to improve response style

  • B. Register the model and start a canary deployment to collect production feedback

  • C. Move the prototype to GPU-backed online serving to reduce latency

  • D. Run a cross-functional prototype review using captured prompts, outputs, grounding failures, and safety findings

Best answer: D

Explanation: When a foundation-model prototype exposes hallucination, unsafe responses, or business-fit gaps, the next step is a structured prototype review, not production rollout or serving optimization. The team should examine representative prompts and outputs with domain experts, responsible AI or safety reviewers, and business owners. The review should classify failures, define acceptance criteria, and decide whether the next iteration needs grounding with approved sources, safety controls such as filters or Model Armor, prompt changes, fine-tuning, or rejection of the approach. This keeps the lifecycle decision at the prototype review stage before committing to deployment.

  • Production canary fails because known safety and grounding defects should be resolved before exposing users to the model.
  • Immediate fine-tuning is premature because the team has not confirmed the failure categories or whether grounding and controls are sufficient.
  • Serving acceleration addresses latency, not hallucinated citations, prohibited advice, or stakeholder approval criteria.

Question 23

Topic: Scaling Prototypes into ML Models

A retail company has a tabular fraud-detection prototype trained from BigQuery data. The score is needed synchronously during checkout while the user waits. The production service must use private Google Cloud connectivity, meet a high-availability target, support controlled model version rollouts, and avoid operating Kubernetes. Which deployment strategy best meets these requirements?

Options:

  • A. Serve the model on a self-managed GKE cluster.

  • B. Run hourly BigQuery ML batch predictions into BigQuery.

  • C. Deploy to a private Agent Platform Inference online endpoint.

  • D. Use Agent Platform batch prediction after checkout completes.

Best answer: C

Explanation: The key decision is online managed inference versus batch scoring or self-managed serving. Checkout fraud detection is a synchronous request-response task, so the application needs a low-latency online endpoint rather than precomputed or delayed predictions. Agent Platform Inference provides managed model serving with autoscaling, health management, private endpoint patterns, and controlled model version rollout without requiring the ML team to operate Kubernetes. Batch prediction is better for offline scoring where results can be written to BigQuery or Cloud Storage for later use. Self-managed GKE can serve models, but it adds cluster and serving-stack operations that the requirements explicitly avoid.

  • Scheduled scoring fails because hourly predictions may be stale and do not satisfy a checkout-time request.
  • Post-checkout scoring misses the interaction pattern because the decision is needed before the transaction completes.
  • Self-managed GKE conflicts with the operational constraint to avoid Kubernetes ownership.

Question 24

Topic: Architecting Low-Code AI Solutions

A finance team is adding invoice intake to a Cloud Run application. The app must extract vendor, totals, due dates, and line items from standard PDF invoices with high field accuracy, return results within a few seconds for each uploaded invoice, and minimize custom ML code. Which low-code strategy best meets these requirements?

Options:

  • A. Train an Agent Platform AutoML tabular model

  • B. Use Document AI Invoice Parser with online processing

  • C. Fine-tune a Gemini model for invoice extraction

  • D. Use Vision API OCR and custom regular expressions

Best answer: B

Explanation: Document AI is the best fit when the task is structured extraction from business documents, especially common document types such as invoices. A prebuilt Invoice Parser reduces implementation effort because it already understands invoice fields and returns structured results with confidence information through an API. Online processing is appropriate when the application needs a response within a few seconds for each uploaded document. Vision API OCR can read text, but it leaves field mapping, line-item structure, and validation logic to the application. Custom training or fine-tuning can be useful when prebuilt processors do not meet requirements, but they add lifecycle, data, and evaluation work that the scenario is trying to avoid.

  • OCR plus regex is lower-level text extraction and would require brittle custom parsing for invoice structure.
  • AutoML tabular is not the right layer because the input is unstructured PDFs, not rows of prepared features.
  • Gemini fine-tuning adds model development effort when a purpose-built document API already matches the task.

Question 25

Topic: Monitoring AI Solutions

A payments team deployed a real-time binary fraud model on Gemini Enterprise Agent Platform. Fraud is rare, false positives block legitimate customers, false negatives create chargebacks, and confirmed fraud labels arrive 7-14 days after each transaction. The team needs a continuous evaluation setup that reflects model quality, business impact, and production behavior. Which setup is best?

Options:

  • A. Track overall accuracy on the latest training split

  • B. Track training loss and validation AUC only

  • C. Track precision, recall, expected cost, and drift by segment

  • D. Track endpoint latency and CPU utilization only

Best answer: C

Explanation: Continuous evaluation should combine outcome-based model metrics, business-weighted metrics, and production monitoring. For rare fraud, overall accuracy can look high even when the model misses most fraud, so precision, recall, or PR-AUC at the operating threshold are more useful. Because false positives and false negatives have different business costs, expected cost or savings should be included. Since labels arrive later, predictions should be logged and joined with delayed confirmed labels, then evaluated by important segments such as region, merchant type, or payment method. Model Monitoring can also track drift or skew in production inputs. Latency and resource metrics matter operationally, but they do not measure prediction quality by themselves.

  • Accuracy-only evaluation hides rare-class failures and does not reflect chargeback or customer-blocking costs.
  • Infrastructure-only monitoring checks serving health but cannot show whether fraud decisions are correct.
  • Training-only metrics do not capture delayed production labels, segment behavior, or input drift after deployment.

Questions 26-50

Question 26

Topic: Automating and Orchestrating ML Pipelines

A data science team is troubleshooting why a model retraining workflow fails a production readiness review. The workflow runs from a shared notebook on a VM, with cron calling several Python scripts. Review findings show inconsistent reruns, no central lineage for datasets and model artifacts, limited run monitoring, and difficult handoffs between teams. Which orchestration approach best addresses the root cause?

Options:

  • A. Use Cloud Build as the main retraining orchestrator

  • B. Use Agent Platform Pipelines with reusable components

  • C. Keep the notebook and add more Git branches

  • D. Run the same scripts in a larger VM cron job

Best answer: B

Explanation: The root issue is not only compute size or source control. The workflow lacks a managed, reproducible orchestration layer designed for ML lifecycle needs. Agent Platform Pipelines lets the team define versioned pipeline components, run them consistently, scale steps on managed infrastructure, and capture metadata such as inputs, outputs, artifacts, metrics, and lineage. This supports monitoring and collaboration because different teams can inspect runs and reuse components rather than depending on a shared notebook and VM-local logs.

A general CI tool or a larger cron host can trigger work, but they do not provide the same ML-native pipeline tracking and reproducibility controls.

  • More Git branches may improve code collaboration, but it does not create managed pipeline runs, lineage, or scalable step execution.
  • Larger cron host treats the symptom as capacity, while the evidence points to missing orchestration and metadata.
  • Cloud Build orchestration is useful for CI/CD triggers, but it is not the best primary orchestrator for ML pipeline runs and lineage.

Question 27

Topic: Collaborating within and Across Teams to Manage Data and Models

A data science team is preparing a churn prototype in Agent Platform Workbench using a BigQuery view that will feed Agent Platform Feature Store. A required privacy gate fails before training with: Sensitive data detected in experiment dataset. The project policy says experiments may use behavioral aggregates only; direct identifiers, PII in text, and protected attributes must be removed or masked unless explicitly approved.

Exhibit: Columns in the training view

ColumnNotes
customer_idstable account identifier
emailcopied from CRM
age_bandused only for segmentation dashboards
last_90d_purchasesbehavioral aggregate
support_notesfree text from agents

What is the most likely cause of the failed privacy gate?

Options:

  • A. The privacy gate should run after the train/test split.

  • B. The behavioral aggregate must be moved out of BigQuery first.

  • C. Unminimized sensitive fields are present without masking or approval.

  • D. The prototype must use custom training instead of Workbench.

Best answer: C

Explanation: Data minimization and sensitive information handling must happen before experimentation, feature registration, or model training. The visible policy allows behavioral aggregates only, but the training view still contains direct identifiers (customer_id, email), a potentially protected attribute (age_band), and free text (support_notes) that can contain PII. The appropriate diagnostic conclusion is that the experiment dataset was not reduced to approved fields or de-identified before use. A safer workflow would create a governed training view that excludes unneeded sensitive columns, masks or tokenizes fields only when there is an approved modeling need, and controls access to the resulting features. Splitting data or changing the training environment does not remove sensitive content.

  • Moving behavioral aggregates out of BigQuery does not address the raw sensitive columns shown in the view.
  • Switching from Workbench to custom training changes the execution method, not the privacy posture of the dataset.
  • Running the gate after the split would copy the same sensitive fields into training and evaluation datasets.

Question 28

Topic: Automating and Orchestrating ML Pipelines

A fraud detection team uses Agent Platform Pipelines for continuous training. A scheduled pipeline retrains from new BigQuery data, registers a candidate model, and deploys to an Agent Platform Inference endpoint. The team wants full automation but must prevent a retrained candidate with lower validation ROC AUC than the current production model from being promoted. Which pipeline configuration best meets this requirement?

Options:

  • A. Retrain only when data drift exceeds an alert threshold

  • B. Add a champion-challenger evaluation gate before deployment

  • C. Deploy the candidate first, then compare live prediction latency

  • D. Enable Model Monitoring after the endpoint receives traffic

Best answer: B

Explanation: Continuous training pipelines should include a model validation step before deployment. For this scenario, the pipeline needs to evaluate the newly trained candidate on an agreed validation dataset, compare its ROC AUC against the current production champion’s recorded metric, and conditionally continue only if the candidate meets the promotion rule. Agent Platform Pipelines can implement this as a gate before model registration as deployable, endpoint update, or rollout. This prevents automation from blindly replacing a stronger production model. Monitoring and drift triggers are useful lifecycle controls, but they do not by themselves prove that a newly trained model is better before promotion.

  • Monitoring after deployment detects production issues, but it allows the weaker model to reach traffic first.
  • Drift-triggered retraining decides when to train, not whether the trained candidate is safe to promote.
  • Latency comparison measures serving performance, not validation quality against the production champion.

Question 29

Topic: Architecting Low-Code AI Solutions

A team must deliver a tabular lead-scoring model in 10 days. The labeled training data is already in BigQuery, and there is no requirement for a custom architecture or custom training loop. Their Agent Platform custom training job keeps failing during container dependency setup, and the team has limited ML engineering support. What is the best next step?

Options:

  • A. Rewrite the custom training container from scratch

  • B. Fine-tune a Gemini model using BigQuery

  • C. Move the data to Cloud Storage and retry custom training

  • D. Use Agent Platform AutoML with the BigQuery table

Best answer: D

Explanation: Agent Platform AutoML is preferable when the problem is a standard supervised task, the data is already organized, and the main blocker is operational complexity rather than model capability. In this scenario, custom training is failing on container setup, not because the task needs custom code. AutoML provides managed training, evaluation, and deployment workflows with less ML engineering effort, which matches the 10-day delivery constraint.

Fine-tuning Gemini is aimed at generative AI behavior, not a typical tabular lead-scoring classifier. Retrying custom training keeps the team focused on infrastructure work that the requirement does not justify.

  • Custom container work adds engineering effort and does not address the short timeline.
  • Gemini fine-tuning targets generative model adaptation, not a standard tabular prediction workflow.
  • Cloud Storage migration changes the data location but does not remove the custom training burden.

Question 30

Topic: Scaling Prototypes into ML Models

A retail company has a scikit-learn prototype that predicts customer churn from a large tabular dataset in BigQuery. The team wants managed training, built-in evaluation and hyperparameter tuning, and some control over feature transformations, but it does not want to maintain custom training containers or distributed training code. What is the best engineering decision?

Options:

  • A. Train only a BigQuery ML logistic regression model

  • B. Run custom PyTorch training on GKE

  • C. Build a Ray training job with custom preprocessing code

  • D. Use Tabular Workflows on Gemini Enterprise Agent Platform

Best answer: D

Explanation: Tabular Workflows fit tabular ML problems that need managed training but also need more control than a fully automated path. In this scenario, the data is structured, the task is a standard predictive model, and the team wants built-in evaluation and tuning without owning containers or distributed training logic. Tabular Workflows let the team configure managed training pipeline behavior, including feature transformation choices and tuning, while avoiding the operational burden of custom training infrastructure. Fully custom GKE or Ray approaches add engineering complexity that the stem explicitly wants to avoid. A simple BigQuery ML model may be fast to build, but it does not best satisfy the need for a managed tabular training workflow with configurable pipeline steps.

  • Custom GKE training adds container, cluster, and training-code ownership that the team is trying to avoid.
  • Ray training is useful for custom distributed workloads, but it increases engineering effort for a standard tabular task.
  • BigQuery ML only is convenient for SQL-based models, but it is less aligned with configurable managed tabular training and tuning.

Question 31

Topic: Architecting Low-Code AI Solutions

A support organization is building a managed low-code model on Gemini Enterprise Agent Platform. The goal is to score new tickets for likelihood of missing a 4-hour SLA so they can be escalated. The BigQuery table has 3 million historical tickets with ticket text, queue, timestamps, resolution outcome, and assigned team. The current Agent Platform AutoML plan uses assigned team as the label because it is complete; the pilot has high team-prediction accuracy but poor recall on actual SLA misses. What is the BEST engineering decision?

Options:

  • A. Fine-tune Gemini to predict assigned team.

  • B. Tune thresholds on the team classifier.

  • C. Add more assigned-team training rows.

  • D. Retrain with an SLA-miss target label.

Best answer: D

Explanation: The issue is a mismatched training objective. The business needs a binary risk score for SLA misses, but the model is trained to predict assigned team. High accuracy on the wrong label does not imply useful escalation performance. Because the BigQuery table includes timestamps and resolution outcome, the team can derive a historical SLA-miss label and train an Agent Platform AutoML classifier against the actual decision target. After retraining, evaluation should focus on metrics that reflect escalation risk, such as recall for SLA misses and precision at the chosen operating point. Adding rows or tuning thresholds cannot fix a model trained on the wrong outcome.

  • Assigned-team labels are complete, but they represent routing history rather than SLA risk.
  • More same-label data may improve team prediction while leaving SLA-miss recall poor.
  • Threshold tuning only helps when the model output is already aligned to the decision being optimized.

Question 32

Topic: Serving and Scaling Models

A team trained a PyTorch ranking model with Agent Platform custom training. The training image includes notebooks, tuning libraries, data-access clients, and evaluation tools. The model must be deployed to Agent Platform Inference with p95 latency under 100 ms, frequent retraining through a pipeline, and a reduced vulnerability surface. Training and serving must use the same feature transformations. Which serving package design is the best engineering decision?

Options:

  • A. Use a separate minimal inference image with shared preprocessing.

  • B. Embed the retraining pipeline inside the inference container.

  • C. Reuse the training image for the inference endpoint.

  • D. Install training dependencies dynamically when the container starts.

Best answer: A

Explanation: Production inference containers should contain only what is needed to load the model, transform requests consistently, and serve predictions. In this scenario, the training image has many dependencies that increase image size, startup time, attack surface, and maintenance burden. A better design is to export the trained model artifact, build a separate serving image with the inference runtime and server, and include a small versioned preprocessing package shared with training. The retraining pipeline can publish new model versions without forcing training tools into the endpoint container. The key is separation of concerns: training code belongs in the training pipeline, while serving code belongs in the inference image.

  • Training image reuse carries notebooks, tuning tools, and data clients into production, increasing latency and vulnerability exposure.
  • Dynamic installs make startup slower and less reliable, and they do not remove training dependency coupling.
  • Embedded retraining mixes pipeline orchestration with online serving and complicates endpoint reliability.

Question 33

Topic: Collaborating within and Across Teams to Manage Data and Models

A team tracks experiments for a chargeback prediction model. The business can manually review only the top 2% highest-risk transactions and wants to catch as many true chargebacks as possible. The selected model had the best overall accuracy and ROC AUC offline, but the pilot caught fewer chargebacks than the previous model. What is the most appropriate next diagnostic step?

Options:

  • A. Compare recall and precision at the top 2% cutoff

  • B. Choose the run with the highest overall accuracy

  • C. Rank runs by final training loss

  • D. Compare RMSE of predicted risk scores

Best answer: A

Explanation: This is a rare-event binary classification problem with a constrained operating point. Overall accuracy can look strong when chargebacks are uncommon, and ROC AUC summarizes ranking quality across all thresholds, not necessarily the top 2% the business can review. The diagnostic should evaluate the model where it will actually be used: the top 2% risk cutoff. Useful tracked metrics include recall at the cutoff, precision at the cutoff, lift, and a confusion matrix for that threshold. The key takeaway is to align evaluation metrics with the business decision, not only general model-quality summaries.

  • Overall accuracy can be dominated by non-chargeback transactions and may hide missed positives.
  • RMSE treats the task like regression and does not directly evaluate classification decisions.
  • Training loss helps debug optimization, but it does not show business performance at the review threshold.

Question 34

Topic: Automating and Orchestrating ML Pipelines

A credit-risk model trained by Agent Platform Pipelines was redeployed after an automated retraining run. A Model Monitoring alert later shows worse recall for a protected customer segment. During the review, the team cannot identify the exact BigQuery training snapshot, preprocessing output, container image, hyperparameters, evaluation report, or approval decision that produced the deployed model version. What is the best next diagnostic step to make the next incident diagnosable and auditable?

Options:

  • A. Store only final model artifacts in versioned Cloud Storage paths.

  • B. Increase Cloud Logging retention for training jobs and endpoints.

  • C. Retrain immediately from the latest BigQuery table.

  • D. Capture full pipeline lineage in Agent Platform ML Metadata linked to Model Registry versions.

Best answer: D

Explanation: The gap is missing pipeline lineage, not just missing logs or another retraining run. For reproducible ML pipelines, each component should emit execution metadata and artifacts to Agent Platform ML Metadata, including the data snapshot, preprocessing artifacts, code or container version, hyperparameters, metrics, evaluation reports, and approval events. Linking this lineage to the registered model version supports auditability, debugging, model comparison, rollback decisions, and responsible AI reviews after production alerts. Operational logs and artifact storage are useful, but they do not preserve the full relationships needed to explain how a specific model version was produced.

  • Longer log retention helps operational troubleshooting, but it does not link datasets, transforms, parameters, metrics, and approvals to a model version.
  • Immediate retraining may create another model, but it does not explain or preserve the provenance of the problematic deployment.
  • Artifact-only versioning preserves files, but it misses component-level lineage, evaluation metadata, and approval history.

Question 35

Topic: Serving and Scaling Models

A retail company serves a custom image classification model on Agent Platform Inference for checkout images. The endpoint must keep p95 latency under 150 ms during peak traffic. Monitoring shows replicas are not saturated, cold starts are rare, and about 80% of request time is model execution. The product team will accept up to 1% accuracy loss. What is the best engineering decision?

Options:

  • A. Increase the online batch size

  • B. Add more minimum replicas

  • C. Deploy a smaller compressed model

  • D. Move scoring to batch prediction

Best answer: C

Explanation: When response time is the primary constraint and monitoring shows model execution dominates latency, the best tuning target is the model itself. A smaller compressed model, such as one produced through distillation, pruning, or quantization, can reduce compute per request and often improve p95 latency without changing the serving pattern. The stem also states that a small accuracy loss is acceptable, which makes compression a strong fit. Scaling replicas mainly helps queueing and availability, not single-request model execution time when replicas are already not saturated. Batch prediction and larger online batches optimize throughput, not interactive latency.

  • Online batching can improve throughput, but it often increases waiting time for individual requests.
  • Batch prediction is unsuitable because the workload needs low-latency online responses.
  • Minimum replicas reduce cold starts and queueing, but the stem says those are not the main latency drivers.

Question 36

Topic: Architecting Low-Code AI Solutions

A retail company is building a Gemini-based shopping assistant for live chat. Users abandon the session if they do not see a response quickly, and evaluation shows that a faster Gemini model is accurate enough for the recommendations. Which configuration should you recommend to optimize the interactive user experience?

Options:

  • A. Fine-tune a larger Gemini model using BigQuery

  • B. Generate recommendations with batch predictions

  • C. Run the chat workflow in Agent Platform Pipelines

  • D. Use Gemini Flash with response streaming

Best answer: D

Explanation: For a Gemini-based interactive application, latency is improved at the serving and interaction layer. A faster model variant such as Gemini Flash is designed for lower-latency, cost-efficient responses, and streaming lets the user see tokens as they are generated instead of waiting for the full answer. This fits a live chat workflow where perceived response time matters and quality is already acceptable with the faster model.

Fine-tuning, pipeline orchestration, and batch prediction can be useful in other lifecycle stages, but they do not directly optimize real-time chat responsiveness.

  • Larger fine-tuning can improve task adaptation, but it often increases complexity and does not directly reduce interactive response latency.
  • Pipeline orchestration is for repeatable ML workflows, not low-latency request handling for live chat.
  • Batch prediction is designed for offline throughput, not immediate user-facing responses.

Question 37

Topic: Monitoring AI Solutions

A lender refreshed a tabular credit-risk model deployed on Agent Platform Inference. After deployment, approval rates for first-time borrowers dropped, but validation AUC, serving latency, and error rates remain within targets. Compliance asks which input factors are driving the new denials and whether behavior changed for that cohort. What is the best next diagnostic step?

Options:

  • A. Increase endpoint machine size and replica count

  • B. Rerun hyperparameter tuning to improve AUC

  • C. Compare Agent Platform feature attributions for affected predictions

  • D. Inspect only serving logs for 5xx errors

Best answer: C

Explanation: Model explainability on Agent Platform is the right diagnostic tool when stakeholders need to understand prediction factors, not just whether the model is healthy. In this scenario, standard performance and serving signals are within targets, but the business outcome changed for a specific cohort. Generating feature attributions for denied first-time borrower predictions, and comparing them with the prior model or an unaffected cohort, can show which features are contributing most to the changed decisions. This supports compliance review, stakeholder communication, and bias investigation without assuming the model is broken.

  • Scaling the endpoint addresses capacity or latency problems, but latency and error rates are already within target.
  • Tuning for AUC misses the requirement because aggregate validation quality is not the reported issue.
  • Checking 5xx logs only can find serving failures, but it will not explain why specific predictions changed.

Question 38

Topic: Collaborating within and Across Teams to Manage Data and Models

A retail ML team is prototyping a next-purchase propensity model from tabular customer features stored in BigQuery. They need an interpretable baseline within a week, must keep PII inside Google Cloud, and want teammates to review notebooks and experiment results before deciding whether to scale training. What is the best engineering decision?

Options:

  • A. Use sklearn in Agent Platform Workbench with BigQuery-connected data access

  • B. Build a production Agent Platform Pipeline before notebook exploration

  • C. Use PyTorch locally after downloading the full BigQuery table

  • D. Use JAX on TPUs for a custom distributed training loop

Best answer: A

Explanation: For early tabular prototyping, the best framework choice is usually the simplest one that answers the modeling question safely. sklearn is well suited for quick baselines such as logistic regression, random forests, or gradient boosting on engineered tabular features. Using Agent Platform Workbench keeps work in a governed Google Cloud notebook environment, supports team review, and avoids exporting PII to unmanaged local machines. The team can use representative data access from BigQuery, compare candidate features, and record experiment outcomes before investing in scalable training or orchestration. JAX, PyTorch, and full pipeline automation may be appropriate later, but they add complexity before the prototype has proved value.

  • Distributed JAX adds accelerator and custom-loop complexity that is unnecessary for a one-week interpretable tabular baseline.
  • Local PyTorch violates the privacy constraint by downloading PII and does not match the tabular baseline need.
  • Production pipeline first skips collaborative notebook exploration and commits to orchestration before the model approach is validated.

Question 39

Topic: Collaborating within and Across Teams to Manage Data and Models

A churn model is trained from BigQuery features prepared by data engineers and deployed to an Agent Platform Inference endpoint. Offline AUC is 0.86, but production AUC drops and Model Monitoring flags training-serving skew. The application team says all prediction requests are schema-valid.

Exhibit: Feature handling

FeatureTraining pipelineServing payload
countrylowercaseduser-entered case
plan_typemissing becomes unknownomitted if missing
tenure_daysclipped, then log1praw integer

Which action is the best next diagnostic step?

Options:

  • A. Compare and version the shared preprocessing contract.

  • B. Increase Agent Platform Inference endpoint replicas.

  • C. Retrain immediately using production prediction logs.

  • D. Disable monitoring until more labels arrive.

Best answer: A

Explanation: The core issue is cross-team preprocessing alignment. The data engineering pipeline applies semantic transformations that the serving application does not reproduce, even though the JSON schema is valid. Schema validation confirms field names and types, but it does not prove that casing, missing-value handling, clipping, or log transforms match training. The next diagnostic step is to compare the training transformations with serving payload generation and formalize them in a shared, versioned contract or reusable preprocessing component with validation tests.

Fixing the handoff prevents retraining or serving changes from repeating the same skew.

  • Replica scaling addresses throughput or latency, not mismatched feature values.
  • Immediate retraining can preserve the same defect if serving transformations remain inconsistent.
  • Disabling monitoring hides the skew signal instead of reconciling team assumptions.

Question 40

Topic: Architecting Low-Code AI Solutions

A media team is building a low-code campaign tool. Users enter a product description and expect a downloadable 8-second promotional video. Production requires a managed Google Cloud model as a service, not custom training. The prototype uses a Gemini text model from Model Garden; acceptance tests return fluent shot lists and captions, but no video asset. What is the best next diagnostic recommendation?

Options:

  • A. Fine-tune Gemini using BigQuery prompts.

  • B. Select a Veo model from Model Garden.

  • C. Train Agent Platform AutoML on video labels.

  • D. Use Imagen for campaign asset generation.

Best answer: B

Explanation: The failure is a model-capability mismatch, not a prompt-quality or training issue. The prototype produces text artifacts because it uses a text-oriented Gemini model, while the application’s user-facing requirement is to generate an actual video file. For a low-code solution that must use a managed Google Cloud model as a service, the model-selection check should start with Model Garden capabilities and choose a video-generation foundation model, such as Veo. Fine-tuning or AutoML would add complexity without fixing the output modality mismatch.

  • Fine-tuning Gemini may improve text responses, but it does not make a text model produce video assets.
  • Imagen is suited to image generation, so it misses the explicit video-output requirement.
  • AutoML training is unnecessary for a managed foundation-model use case and would not be the first diagnostic fix.

Question 41

Topic: Collaborating within and Across Teams to Manage Data and Models

A team is troubleshooting a tabular classification model that predicts whether an account will upgrade in the next 30 days. Offline validation AUC is 0.94, but the first batch scoring run fails because the scoring table is missing a column the model expects.

Exhibit: Schemas

FieldTraining roleAvailable at scoring?
account_idKeyYes
plan_tierFeatureYes
logins_30dFeatureYes
support_cases_30dFeatureYes
upgrade_amount_next_30dFeatureNo
will_upgrade_next_30dLabelNo

What is the most likely cause?

Options:

  • A. The account identifier was excluded from model features.

  • B. Future outcome data was included as a training feature.

  • C. The validation split was too small for tabular data.

  • D. The serving job omitted the classification label.

Best answer: B

Explanation: For tabular ML, organize columns by business objective and prediction time: entity keys identify records, point-in-time features are model inputs, and labels or future outcomes are used only for training and evaluation. Here, the model predicts a future upgrade, but upgrade_amount_next_30d is marked as a feature even though it is not available at scoring time. That creates target leakage, which can inflate validation AUC, and it also causes the serving schema to fail because production records cannot supply that column. The feature schema should contain only values available when the prediction is made.

  • Sending labels fails because labels are outcomes used after the fact, not inputs for production scoring.
  • Using account IDs is not supported by the evidence; excluding an identifier would not explain a missing future-value column.
  • Changing validation size does not fix a feature schema that includes data unavailable at serving time.

Question 42

Topic: Architecting Low-Code AI Solutions

A business team built a low-code Gemini Enterprise Agent Platform RAG app to answer employee HR policy questions. The prototype works in demos, but production will serve 20,000 employees and can retrieve sensitive benefits, leave, and disciplinary policy documents. The team wants the fastest safe handoff to IT operations. Which governance check package is the best engineering decision before production use?

Options:

  • A. Run load testing only and keep prompt tuning with the business team.

  • B. Fine-tune Gemini on HR documents and chat logs before review.

  • C. Complete privacy, access, responsible AI, safety, signoff, and monitoring checks.

  • D. Approve launch based on managed Gemini safety and post-launch feedback.

Best answer: C

Explanation: Moving a low-code AI prototype to production still requires governance checks for the specific use case. For an HR RAG app, the handoff should verify approved data sources, least-privilege access, PII handling, representative evaluation for grounded and safe answers, responsible AI risks such as bias or harmful guidance, and production controls such as Model Armor or safety filters. It should also define accountable owners, approval status, monitoring signals, and an escalation path. Managed Gemini capabilities reduce implementation effort, but they do not automatically validate enterprise data use or production accountability.

  • Managed safety only fails because platform controls do not replace app-specific data, access, evaluation, and ownership review.
  • Load testing only checks reliability but misses privacy, responsible AI, safety, and production handoff requirements.
  • Fine-tuning first changes the solution and may expose sensitive HR data before governance approval.

Question 43

Topic: Scaling Prototypes into ML Models

A team trains a custom DNN classifier on Gemini Enterprise Agent Platform. The training job completes, data validation passes, and no new labeled data is available. Release requires validation AUC of at least 0.92, but the latest run has training AUC 0.98 and validation AUC 0.86. Validation loss starts increasing after epoch 4 while training loss keeps decreasing. What should the team do next?

Options:

  • A. Tune validation AUC over dropout, L2 strength, and early stopping

  • B. Deploy the model and diagnose endpoint latency

  • C. Optimize training AUC by increasing the number of epochs

  • D. Tune only larger batch sizes for faster training runs

Best answer: A

Explanation: The job is not failing mechanically; it is failing the validation-quality target. High training AUC with lower validation AUC and rising validation loss is a typical overfitting pattern. The next step is to run hyperparameter tuning with the release metric, validation AUC, as the objective and include parameters that affect generalization, such as dropout, L2 regularization strength, and early-stopping behavior. This uses the existing successful training code while systematically searching for a configuration that improves held-out performance. Optimizing training AUC or training longer would likely amplify the same overfitting pattern.

  • More epochs focuses on training performance and can worsen the increasing validation loss.
  • Batch-size-only tuning may affect runtime or convergence but does not directly address the overfitting evidence.
  • Endpoint latency diagnostics apply to serving performance, while the failure is validation quality before deployment.

Question 44

Topic: Collaborating within and Across Teams to Manage Data and Models

A product team wants to build a Gemini-based assistant for internal support tickets. Before committing to a full application, they must learn whether prompting alone is sufficient, whether ticket-history retrieval is needed, and whether responses meet latency, cost, and safety expectations. Which workflow should the ML engineer set up first?

Options:

  • A. Deploy a custom serving container on GKE for load testing

  • B. Start custom training on the full ticket history in Agent Platform

  • C. Build the complete production RAG pipeline before evaluation

  • D. Run a Model Garden prototype with prompt, retrieval, safety, latency, and cost checks

Best answer: D

Explanation: Foundation model exploration should start with a small, representative prototype before building production infrastructure or training workflows. For this scenario, the main uncertainties are prompt behavior, retrieval need, response safety, latency, and cost. Model Garden is the right place to explore Gemini and compare prompt patterns, sample retrieval grounding, safety behavior, and rough operational fit using representative support-ticket examples. Those findings help the team decide whether prompting is enough, RAG is required, or a different approach is needed. Full custom training, production pipeline construction, and serving infrastructure decisions should come after these prototype findings.

  • Custom training first skips the cheapest validation step and assumes model adaptation is needed before prompt and retrieval behavior are known.
  • Complete RAG first overbuilds the retrieval layer before proving that grounding is required or selecting the right prompt pattern.
  • Load testing only focuses on serving infrastructure and misses the model-quality, safety, retrieval, and cost questions in the stem.

Question 45

Topic: Serving and Scaling Models

A credit operations team has an XGBoost model in Agent Platform Model Registry. When served online, the model returns only a fraud probability. The case-management system requires every response to include a 0–100 risk score, a LOW/MEDIUM/HIGH label, and an APPROVE/REVIEW decision. Thresholds must be maintained centrally and applied consistently across all clients without retraining the model. Which serving setup best meets these requirements?

Options:

  • A. Let each client transform raw probabilities

  • B. Store threshold logic in Agent Platform Feature Store

  • C. Retrain the model to emit business decisions

  • D. Serve with an Agent Platform Inference custom postprocessing container

Best answer: D

Explanation: Postprocessing belongs in the serving path when raw model outputs need to become business-ready responses. Here the model already produces the needed probability, but callers need a standardized score, label, decision, and response shape. A custom container or equivalent custom prediction routine on Agent Platform Inference can wrap the registered model, apply centrally managed thresholds, and return the agreed response contract to every caller. This keeps the model artifact focused on prediction while keeping serving-time decision rules consistent and changeable without retraining.

  • Feature Store misuse fails because feature stores provide or share input features, not serving-time conversion of model outputs into decisions.
  • Retraining for thresholds is unnecessary because the probability output is already available and business thresholds need to change without retraining.
  • Client-side conversion creates duplicated logic and inconsistent decisions across applications, violating the central-management requirement.

Question 46

Topic: Automating and Orchestrating ML Pipelines

A retail team uses Agent Platform Pipelines to retrain a churn model weekly. They plan to enable automatic deployment to Agent Platform Inference after each retraining run. The current pipeline only checks that preprocessing and training components finish and that a model artifact is written. The team must prevent deployment when new data is malformed, preprocessing differs from serving, or the candidate model underperforms the production model.

Which pipeline design best meets these requirements?

Options:

  • A. Register every model artifact and promote the newest version

  • B. Deploy every successful run and rely on endpoint monitoring

  • C. Use larger training hardware and tune hyperparameters each run

  • D. Add validation gates for data, preprocessing consistency, and model quality

Best answer: D

Explanation: Before automating retraining or deployment, pipeline validation must cover the lifecycle points that can make automation unsafe. For this scenario, component success is not enough. The pipeline needs pre-training data checks for schema, missing values, and distribution issues; checks that preprocessing used in training is consistent with serving; and model validation that compares the candidate against production baselines and required slice metrics. Only candidates that pass these gates should be registered as deployable and promoted to Agent Platform Inference. Endpoint monitoring is still important, but it detects issues after deployment rather than preventing unsafe promotion.

  • Post-deployment monitoring is too late because the requirement is to block unsafe candidates before deployment.
  • Training hardware and tuning may improve performance, but they do not validate data quality or training-serving consistency.
  • Newest artifact promotion confuses artifact creation with release readiness and skips model-quality gates.

Question 47

Topic: Monitoring AI Solutions

A retail company uses a Gemini-based support agent on Gemini Enterprise Agent Platform. Employees ask valid order-status questions, but some prompts include customer PII, and red-team tests show prompt-injection attempts to reveal system instructions. The business wants to keep the agent available for normal support work. Which configuration best meets these requirements?

Options:

  • A. Encrypt model artifacts in the Model Registry only

  • B. Add Model Armor input and output checks with PII and prompt-injection policies

  • C. Disable the agent and route all support questions to manual review

  • D. Use Model Monitoring to retrain when drift is detected

Best answer: B

Explanation: Secure AI controls should be applied at the interaction boundary when the risk is prompt content, response leakage, or malicious prompting. Model Armor is designed to inspect prompts and responses for issues such as sensitive data exposure and unsafe or adversarial content. In this scenario, targeted input and output checks can redact or block risky exchanges while allowing ordinary order-status support requests. This protects customer data and model behavior without shutting down the workflow. Encryption and registry controls are still useful, but they protect stored artifacts rather than live prompt/response traffic. Drift monitoring supports production quality, not immediate prevention of data leakage or prompt injection.

  • Manual review only protects aggressively but unnecessarily blocks the valid support workflow.
  • Registry encryption protects stored model assets, not PII in live prompts or unsafe generated responses.
  • Drift monitoring detects model or data changes over time, but it does not screen prompt-injection attempts.

Question 48

Topic: Serving and Scaling Models

An ML team is canarying a new image model version from Agent Platform Model Registry to an Agent Platform Inference endpoint. The canary receives 20% of traffic and must keep p95 latency under 250 ms. Production clients have valid IAM, and the same request succeeds during low traffic. Which cause best explains the failure?

Exhibit: Canary observations

ObservationValue
Peak error503 RESOURCE_EXHAUSTED
Replica stateAt configured maximum
Accelerator utilization94% to 98%
Container health checksPassing
Feature lookup p9528 ms, no errors
Canary reduced to 5%Errors stop

Options:

  • A. Model Registry version not deployed

  • B. Insufficient serving capacity

  • C. Endpoint access misconfiguration

  • D. Container packaging defect

Best answer: B

Explanation: This is a resource scaling failure. The endpoint accepts requests, the model is deployed, health checks pass, and feature lookup latency is healthy. The decisive evidence is RESOURCE_EXHAUSTED, replicas already at the configured maximum, very high accelerator utilization, and errors disappearing when canary traffic is reduced. That pattern indicates the canary deployment does not have enough serving capacity for the assigned traffic share. A practical next step would be to adjust autoscaling limits, replica sizing, accelerator configuration, or canary percentage before increasing rollout traffic. Endpoint access, packaging, registry state, and feature serving problems would show different symptoms, such as authorization errors, container startup or handler failures, missing deployed model state, or feature lookup errors.

  • Access issue does not fit because clients have valid IAM and the same requests succeed during low traffic.
  • Packaging issue is unlikely because health checks pass and failures are load-dependent rather than constant.
  • Registry state issue is unlikely because the canary is already receiving traffic from the deployed model version.

Question 49

Topic: Architecting Low-Code AI Solutions

A support team is evaluating a Gemini-based assistant that reads return-request notes and outputs an internal disposition code plus a short explanation. Prompt examples and RAG over policy documents reduced hallucinations, but generative AI evaluation still shows inconsistent code selection for company-specific shorthand. The retrieved policy text is correct, and BigQuery contains 200,000 historical notes labeled with the final agent disposition. What should the team do next?

Options:

  • A. Move the labels into Agent Platform Feature Store.

  • B. Fine-tune Gemini using the labeled BigQuery examples.

  • C. Increase retrieval depth for the policy corpus.

  • D. Add stricter Model Armor safety filters.

Best answer: B

Explanation: Gemini fine-tuning with BigQuery is appropriate when the model needs to learn domain-specific behavior from enterprise examples, such as mapping internal shorthand to approved labels and producing a consistent output pattern. In this case, RAG is already retrieving the right policy text, so the remaining issue is not missing knowledge. The company also has a large labeled BigQuery dataset that matches the desired input-output behavior. That evidence points to supervised fine-tuning of Gemini rather than more retrieval or safety controls. Use RAG for grounding changing facts; use fine-tuning when examples should shape task behavior.

  • More retrieval does not address the issue because the retrieved policy text is already correct.
  • Safety filters target harmful or unsafe content, not inconsistent internal label selection.
  • Feature Store helps serve reusable ML features, but it does not train Gemini to follow domain-specific labeling behavior.

Question 50

Topic: Serving and Scaling Models

A team deployed a PyTorch image inspection model to Gemini Enterprise Agent Platform Inference. The endpoint must meet p95 latency under 150 ms at 200 RPS, but load tests fail.

SignalObservation
Current backendCPU-only online endpoint
Workload900 MB CNN; CUDA image available; non-XLA custom op
Test resultp95 310 ms at 1 RPS; 440 ms at 200 RPS
BottleneckCPU >90%; queue and network time <10 ms
EnvironmentCloud endpoint; no offline device requirement

Which hardware change best addresses the likely cause?

Options:

  • A. Move the model to TPU serving hardware.

  • B. Increase CPU replica limits only.

  • C. Add GPU accelerators to the online serving backend.

  • D. Deploy the model on edge devices.

Best answer: C

Explanation: The evidence points to a compute-acceleration bottleneck during online inference. The endpoint misses p95 latency even at 1 RPS, queue and network time are low, CPU utilization is high, and the model is a CNN with CUDA support. A GPU-backed serving backend is the most appropriate next remediation because it can reduce per-request tensor computation time and improve throughput for this cloud endpoint. TPU serving is less suitable here because the stem identifies a non-XLA custom op and a GPU-ready container.

  • TPU fit fails because the workload has a non-XLA custom op while GPU acceleration is already supported.
  • More CPU replicas may reduce queuing, but the 1 RPS latency shows the bottleneck is per-inference compute.
  • Edge deployment is aimed at local or offline serving, but the stem requires a cloud endpoint and network time is low.

Continue in the web app

Use IT Mastery for interactive Google Cloud Professional ML Engineer practice with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

Try Google Cloud Professional ML Engineer on Web

Focused topic pages