Free Google Cloud PCA Practice Exam: Google Cloud Professional Cloud Architect
This free full-length Google Cloud Professional Cloud Architect practice exam includes 50 original IT Mastery questions across the exam domains, each with an answer explanation.
These are original IT Mastery practice questions. They are not official Google Cloud questions, copied live-exam content, or exam dumps. Use them for self-assessment, scope review, and deciding what to drill next.
Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.
Try the IT Mastery web app for a richer interactive practice experience with mixed sets, timed mocks, topic drills, explanations, and progress tracking.
Exam snapshot
- Exam route: Google Cloud Professional Cloud Architect
- Practice-set question count: 50
- Time limit: 120 minutes
- Practice style: mixed-domain diagnostic run with answer explanations
Full-length exam mix
| Domain | Weight |
|---|---|
| Designing and Planning a Cloud Solution Architecture | 25% |
| Managing and Provisioning a Solution Infrastructure | 18% |
| Designing for Security and Compliance | 19% |
| Analyzing and Optimizing Technical and Business Processes | 15% |
| Managing Implementation | 11% |
| Ensuring Solution and Operations Excellence | 12% |
Use this as one diagnostic run.
Practice questions
Questions 1-25
Question 1
Topic: Managing and Provisioning a Solution Infrastructure
A healthcare company wants to replace an internal help desk search portal with a conversational assistant. The architecture review produced this note:
| Area | Visible requirement |
|---|---|
| Sources | Policy docs and ticketing knowledge base |
| Access | Users see only content they are already allowed to access |
| Governance | Audit interactions, redact PHI/PII, and filter unsafe prompts |
| Delivery | Launch in 8 weeks with no ML engineering team |
Which next action best reduces implementation effort while preserving the governance and security requirements?
Options:
A. Build a custom RAG service on GKE with a self-managed vector database.
B. Fine-tune a Model Garden model on all source documents.
C. Call Gemini APIs through one shared service account over a merged document export.
D. Configure Agent Builder in Gemini Enterprise with governed connectors and AI security controls.
Best answer: D
Explanation: The requirements favor a prebuilt agent and enterprise search approach instead of a custom model workflow. Agent Builder in Gemini Enterprise is designed to assemble grounded agents over approved enterprise data sources with less custom infrastructure than building retrieval, orchestration, and chat logic from scratch. Governance can be preserved by using managed identity-aware access, audit logging, and safety layers such as Model Armor and Sensitive Data Protection for prompt/response filtering and PHI/PII handling. The key design implication is to configure a managed, governed agent first, then customize only where requirements are not met.
- Fine-tuning first adds ML lifecycle work and can weaken document-level access control if content is embedded into a model without careful design.
- Custom RAG on GKE may be valid for specialized needs, but it conflicts with the short timeline and lack of ML engineering staff.
- Shared service account access is risky because it can bypass per-user authorization and expose content from the merged export.
Question 2
Topic: Managing and Provisioning a Solution Infrastructure
A company wants to launch an internal HR assistant in six weeks. It must answer from existing policy documents in Google Workspace and Cloud Storage, respect each employee’s document permissions, log usage for audit, and minimize custom ML operations. Which recommendation best balances implementation effort with governance and security?
Options:
A. Build a custom RAG stack on GKE with a shared vector index
B. Use a prebuilt Natural Language API to classify each question
C. Fine-tune a Model Garden model on all HR documents
D. Use Gemini Enterprise or Agent Builder with governed connectors and grounding
Best answer: D
Explanation: For an internal assistant over enterprise content, the best fit is a prebuilt agent or enterprise search-style integration that can connect to existing sources, ground answers in authorized documents, and support governance controls. Gemini Enterprise or Agent Builder reduces implementation effort compared with building retrieval, orchestration, and model-serving components from scratch. It also better supports the security requirement because access-aware connectors and centralized logging can align responses with each user’s permissions. Fine-tuning or building a custom RAG platform may be useful in some cases, but either option increases the delivery and operations burden and requires the team to recreate governance controls.
- Fine-tuning first optimizes model customization, but it adds ML lifecycle work and does not inherently enforce per-user document permissions.
- Custom RAG stack maximizes control, but it conflicts with the six-week timeline and low-operations requirement.
- Question classification may route requests, but it does not provide a governed assistant grounded in enterprise HR content.
Question 3
Topic: Ensuring Solution and Operations Excellence
A retail company will launch a customer-facing order API on Google Cloud in six weeks. Architecture goals include p95 latency under 300 ms, RTO of 15 minutes, no customer-visible testing during peak hours, and compliance evidence for resilience controls. The team has budget for a temporary production-like test environment, and the product owner can approve one low-traffic production window. Which validation plan best balances these constraints?
Options:
A. Use architect-led tabletop reviews only, and avoid live tests to reduce cost.
B. Defer resilience testing until after launch and rely on monitoring alerts.
C. Temporarily mirror production for prelaunch load and failover tests, then run an approved low-traffic game day with SRE-led response and evidence capture.
D. Run all failover and chaos tests in production during peak traffic to maximize realism.
Best answer: C
Explanation: A strong resilience validation plan ties tests to the stated architecture goals and runs them early enough to fix issues before launch. Here, temporary production-like testing validates latency and failover without paying for a permanent duplicate environment. A controlled production game day during an approved low-traffic window adds realism while protecting availability. SRE ownership ensures alerts, runbooks, escalation, and rollback behavior are exercised, while product and compliance stakeholders provide approval and evidence review. The key trade-off is not maximum realism at any cost; it is enough realism to validate recovery and operations without violating customer-impact and compliance constraints.
- Peak production testing maximizes realism but violates the stated restriction on customer-visible testing during peak hours.
- Tabletop-only validation reduces cost but does not prove latency, failover behavior, alerts, or operational response.
- Post-launch testing misses the required timing because problems would be discovered after customers are already using the API.
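The evidence-capture step can be made concrete with a small check of test results against the stated goals. The following is a minimal sketch in plain Python, assuming latency samples and a measured failover time have already been collected from the prelaunch tests; the function names and the result layout are hypothetical and not part of any Google Cloud tool.

```python
from typing import Dict, List


def p95_ms(latencies_ms: List[float]) -> float:
    """Nearest-rank 95th percentile of the recorded latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples recorded")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def launch_evidence(latencies_ms: List[float], failover_minutes: float,
                    p95_goal_ms: float = 300.0,
                    rto_goal_minutes: float = 15.0) -> Dict[str, object]:
    """Summarize one test run against the p95 and RTO goals from the scenario."""
    observed = p95_ms(latencies_ms)
    return {
        "p95_ms": observed,
        "p95_ok": observed < p95_goal_ms,          # goal: p95 under 300 ms
        "failover_minutes": failover_minutes,
        "rto_ok": failover_minutes <= rto_goal_minutes,  # goal: RTO of 15 minutes
    }
```

A record like this, stored per test run, is one simple way to give compliance reviewers a pass/fail trail tied to the architecture goals.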
Question 4
Topic: Designing for Security and Compliance
A healthcare software company is launching a Gemini-powered support assistant on Cloud Run. Customers can enter free-form questions, and the app retrieves answers from an approved knowledge base. Requirements are to reduce prompt-injection and jailbreak risk, prevent sensitive data from being accepted or returned, keep the first release low-latency, and provide evidence of policy enforcement. Which architecture decision best meets these requirements?
Options:
A. Fine-tune the model with examples of prohibited responses
B. Add Model Armor checks on prompts and responses
C. Route every response to manual review before delivery
D. Put Cloud Armor in front of the Cloud Run service
Best answer: B
Explanation: Model Armor is the appropriate control when the risk is in generative AI inputs, outputs, or application behavior. In this scenario, the assistant accepts untrusted free-form prompts and returns generated responses, so the architecture needs guardrails in the request and response path. Model Armor can apply policies for risks such as prompt injection, jailbreak attempts, harmful content, and sensitive data handling, and its enforcement can support audit evidence. This fits the low-latency first release better than retraining or manual review.
Network and application perimeter controls are still useful, but they do not understand or evaluate model prompts and responses.
- Fine-tuning may improve model behavior, but it is not the right first-line control for inspecting every live prompt and response.
- Cloud Armor helps protect web endpoints from network and HTTP-layer attacks, not generative AI prompt or output risks.
- Manual review can reduce risk for selected workflows, but it violates the low-latency requirement for an online assistant.
Question 5
Topic: Managing and Provisioning a Solution Infrastructure
A retail company retrains a fraud detection model every week using data in BigQuery. Recent incidents involved a stale feature extract and a model version promoted before evaluation was completed.
Exhibit: Current AI workflow note
| Stage | Current handling |
|---|---|
| Feature extract | Analyst runs a saved query and exports a file |
| Training | ML engineer runs a notebook manually |
| Evaluation | Results copied to a spreadsheet |
| Promotion | Approved model selected in the console |
What is the best design implication from the exhibit?
Options:
A. Build a Vertex AI Pipeline with tracked artifacts and evaluation gates.
B. Keep the notebooks and add a required deployment checklist.
C. Move online prediction serving to Cloud Run.
D. Grant analysts broader IAM roles for workflow execution.
Best answer: A
Explanation: The core issue is manual risk across a multi-step ML lifecycle, not model serving or access friction. Vertex AI Pipelines is designed to orchestrate repeatable AI workflows such as data extraction, training, evaluation, registration, and deployment. Pipeline components can pass artifacts explicitly, capture metadata and lineage, parameterize runs, and add a quality gate so a model is not promoted unless evaluation completes successfully. This reduces the chance of using stale files, losing run parameters, or skipping approval steps. A checklist may help process discipline, but it does not provide the same reproducibility or automated control.
- Checklist-only control still depends on humans to follow each step and does not prevent stale artifacts or skipped evaluation.
- Serving migration addresses runtime hosting, not the upstream training and promotion workflow risks shown in the exhibit.
- Broader IAM may reduce access errors, but it increases privilege and does not orchestrate or validate the ML lifecycle.
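The idea of an evaluation gate can be illustrated without any pipeline SDK. The sketch below is plain Python, not Vertex AI Pipelines code; the `RunArtifacts` fields, the one-day freshness limit, and the 0.90 AUC threshold are all assumptions chosen for illustration. It shows how recorded run metadata lets promotion be blocked automatically when the extract is stale or evaluation never finished, the two incidents described in the question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class RunArtifacts:
    """Metadata a pipeline run would record for lineage (hypothetical fields)."""
    extract_time: datetime            # when the feature extract was produced
    model_version: str
    eval_auc: Optional[float] = None  # None means evaluation never completed


def can_promote(run: RunArtifacts, now: datetime,
                max_extract_age: timedelta = timedelta(days=1),
                min_auc: float = 0.90) -> Tuple[bool, str]:
    """Return (allowed, reason). Promotion is blocked unless every gate passes."""
    if now - run.extract_time > max_extract_age:
        return False, "stale feature extract"
    if run.eval_auc is None:
        return False, "evaluation not completed"
    if run.eval_auc < min_auc:
        return False, "evaluation below threshold"
    return True, "ok"
```

In an orchestrated pipeline, the same checks would typically run as a step between evaluation and registration, so that no console action can skip them.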
Question 6
Topic: Designing and Planning a Cloud Solution Architecture
An enterprise runs analytics on Google Cloud and order processing on another cloud provider. The platforms must exchange several TB of data per day and support latency-sensitive API calls. Compliance requires private network transport, redundant paths, and centralized route control. The network team can manage BGP but wants to avoid building transit appliances. Which multicloud connectivity recommendation best balances the constraints?
Options:
A. Use redundant Cross-Cloud Interconnect with Cloud Router/BGP.
B. Expose public APIs with TLS and IP allowlists.
C. Route all traffic through the corporate data center.
D. Use HA VPN tunnels over the internet.
Best answer: A
Explanation: For sustained, latency-sensitive multicloud exchange with private transport and redundant paths, a dedicated multicloud interconnect pattern is the best fit. Cross-Cloud Interconnect connects Google Cloud to supported cloud providers through dedicated links, while Cloud Router uses BGP for dynamic route control and failover. Redundant links improve availability and avoid internet path variability. This costs more and requires planning, but the stem’s data volume, compliance, latency, and operational-control requirements justify that trade-off. HA VPN can be useful for smaller, less latency-sensitive, or interim connectivity.
- HA VPN is encrypted and quick to deploy, but internet-based paths do not meet the private transport and predictable high-throughput needs.
- Public APIs reduce network effort, but public exposure and IP allowlists fall short of private connectivity and route control.
- Data center hairpinning may reuse existing controls, but it adds latency, capacity bottlenecks, and operational dependence on an unnecessary transit point.
Question 7
Topic: Managing Implementation
A company manages a partner-facing order API with Apigee. A new backend keeps the same external contract but changes validation and latency characteristics. The business wants a fast release, no required partner code changes, early detection of consumer-specific errors, rollback within minutes, and minimal long-term operational overhead. Which rollout recommendation best balances these constraints?
Options:
A. Run both backends permanently and let partners choose either endpoint.
B. Cut over all traffic after backend unit tests to minimize delivery time.
C. Launch a new API version and require all partners to migrate before release.
D. Test contracts and load, then use Apigee canary traffic with per-consumer monitoring and rollback.
Best answer: D
Explanation: An API rollout plan should validate compatibility before production, control exposure during production release, and measure real consumer impact. For this scenario, contract and load tests reduce the risk of breaking the unchanged API contract or missing latency regressions. A canary or weighted traffic rollout in Apigee limits blast radius while still exercising real partner traffic. Monitoring by API product, developer app, status code, and latency helps detect consumer-specific failures early. A prepared rollback to the prior revision or backend path meets the minutes-level recovery requirement. This approach is faster and lighter than a long parallel platform, but safer than a full cutover.
- Speed-only cutover fails because unit tests do not prove partner contract compatibility, latency behavior, or safe rollback under real traffic.
- Forced version migration violates the no partner code-change constraint and increases consumer coordination risk.
- Permanent dual backends reduce cutover risk but create unnecessary operational overhead and long-term complexity.
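The canary decision described above reduces to two small rules: flag any consumer whose canary error rate is too high, and either roll back or widen the canary. The sketch below is plain Python and does not call Apigee; the per-app statistics format, the 2% error threshold, the 50-request minimum, and the 10-point step are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def consumers_to_flag(canary_stats: Dict[str, Tuple[int, int]],
                      max_error_rate: float = 0.02,
                      min_requests: int = 50) -> List[str]:
    """Return developer apps whose canary error rate exceeds the threshold.

    canary_stats maps app name -> (requests, errors) on the new backend.
    Apps with too few requests are skipped to avoid noisy decisions.
    """
    flagged = []
    for app, (requests, errors) in canary_stats.items():
        if requests >= min_requests and errors / requests > max_error_rate:
            flagged.append(app)
    return sorted(flagged)


def next_canary_weight(current_pct: int, flagged: List[str], step_pct: int = 10) -> int:
    """Roll back to 0% if any consumer is flagged; otherwise widen the canary."""
    if flagged:
        return 0
    return min(100, current_pct + step_pct)
```

Evaluating per consumer rather than in aggregate matters here: a partner with low traffic can fail completely without moving the overall error rate.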
Question 8
Topic: Designing and Planning a Cloud Solution Architecture
A retail company is modernizing checkout services on Cloud Run and GKE. Most incidents are discovered from customer complaints, but the latency budget is tight, the SRE team is small, and compliance prohibits storing raw payment or customer data in logs. Leadership will fund moderate telemetry spend if it reduces production risk. Which recommendation best balances these constraints?
Options:
A. Increase autoscaling headroom before adding observability
B. Rely on staging tests and keep production telemetry minimal
C. Adopt SLO-based observability with sampled traces and redacted logs
D. Enable full debug logging and unsampled tracing for every request
Best answer: C
Explanation: The best design reduces operational risk by making production behavior measurable without creating new risk. SLO-based observability focuses telemetry on user-facing outcomes such as checkout latency, error rate, and availability. Using Google Cloud Observability with targeted metrics, alerts, sampled distributed traces, and redacted or excluded sensitive log fields gives teams enough signal to detect and diagnose incidents. Sampling and focused instrumentation help control latency and cost, while redaction supports compliance requirements. This is also realistic for a small SRE team because alerts can be tied to service-level objectives instead of every low-level metric.
- Full telemetry everywhere creates high cost and latency overhead and can violate the stated restriction on sensitive data in logs.
- Staging-only validation may reduce spend, but it does not reveal real production failures or customer-impacting behavior.
- Adding capacity first may mask some symptoms, but it does not improve diagnosis or risk visibility.
Question 9
Topic: Analyzing and Optimizing Technical and Business Processes
A retailer runs a stateless web checkout service on Compute Engine instances in a managed instance group behind an external Application Load Balancer. Instances average 15% CPU most of the week but hit 80-90% CPU during short marketing events. The business wants to reduce compute cost, preserve checkout latency during events, and avoid application code or packaging changes this quarter.
Which optimization decision best meets these requirements?
Options:
A. Migrate the service immediately to Cloud Run to use a fully managed platform.
B. Rightsize all instances to smaller machine types based on average CPU utilization.
C. Add Cloud Storage lifecycle rules for checkout logs and keep the compute fleet unchanged.
D. Configure managed instance group autoscaling with baseline minimum and event-capable maximum capacity.
Best answer: D
Explanation: Resource optimization should match the resource pattern and constraints. A stateless service with low average utilization but short, high peaks is a strong fit for autoscaling: keep enough baseline capacity for normal traffic and scale out during events to protect latency. Rightsizing is better when a workload is consistently overprovisioned, not when peak demand still needs capacity. Managed services or redesign can be valuable, but they are less suitable here because the business explicitly wants to avoid application or packaging changes this quarter.
The key takeaway is to target the resource that drives the cost, here the compute capacity sitting idle between events, without violating the no-change and performance constraints.
- Average-based rightsizing fails because smaller fixed instances can reduce headroom needed during marketing-event peaks.
- Immediate Cloud Run migration may reduce operations effort, but it conflicts with the no code or packaging change constraint.
- Storage lifecycle rules can reduce log storage cost, but they do not address the overprovisioned compute fleet.
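The cost argument can be checked with simple arithmetic. The sketch below uses a simplified target-utilization formula; the real autoscaler also applies smoothing and delays that are not modeled here. The fleet size of 10, the 60% CPU target, and the 8 event hours per week are assumptions, since the question does not state them.

```python
import math


def desired_instances(current: int, observed_cpu_pct: int, target_cpu_pct: int,
                      min_instances: int, max_instances: int) -> int:
    """Simplified target-utilization sizing, clamped to the configured bounds."""
    raw = math.ceil(current * observed_cpu_pct / target_cpu_pct)
    return max(min_instances, min(max_instances, raw))


def weekly_instance_hours(baseline: int, peak: int, event_hours: int) -> int:
    """Instance-hours per week: baseline outside events, peak during them."""
    hours_per_week = 168
    return baseline * (hours_per_week - event_hours) + peak * event_hours


# Assumed fleet: 10 instances averaging 15% CPU, with a 60% CPU target.
baseline = desired_instances(10, 15, 60, min_instances=2, max_instances=20)  # 3
# During an event the same fleet would run near 90% CPU.
peak = desired_instances(10, 90, 60, min_instances=2, max_instances=20)      # 15
```

Under these assumed numbers, a fixed fleet of 10 uses 1,680 instance-hours per week, while a baseline of 3 that scales to 15 for 8 event hours uses 600, and the peak capacity is higher than the fixed fleet provided.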
Question 10
Topic: Ensuring Solution and Operations Excellence
A company runs checkout services on Cloud Run and GKE. Operations teams currently receive pages for CPU spikes and pod restarts, but incident reviews show that many customer-impacting failures are found first through support tickets. Requirements are to alert only on user-visible impact, show which service version is consuming error budget, and preserve detailed logs for audit without increasing pager noise. Which observability improvement should the architect recommend?
Options:
A. Create SLO dashboards with burn-rate alerts using latency and error metrics labeled by version.
B. Export all logs to BigQuery and page on each error log.
C. Enable 100% tracing and alert on every slow span.
D. Page on CPU, memory, and restart alerts for every service.
Best answer: A
Explanation: Operations teams need signals that map to user experience, not only infrastructure symptoms. For services on Cloud Run and GKE, Cloud Monitoring dashboards and SLO-based burn-rate alerts built from request latency and error-rate metrics help identify when the error budget is being consumed. Adding service and version labels supports fast release-level triage. Detailed logs can still be retained for audit and investigation, but they should not be the primary paging signal unless converted into carefully scoped log-based metrics tied to user impact. The key is to reduce noise while making customer-impacting degradation visible quickly.
- Resource-only paging misses the stated user-impact requirement and commonly creates noise during harmless scaling or restarts.
- Error-log paging preserves evidence but turns routine application errors into pager noise without SLO context.
- Trace-everything alerting overbuilds the solution and pages on individual slow spans rather than service-level impact.
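Burn rate is the observed error rate divided by the error budget (1 minus the SLO target). The sketch below is plain Python rather than Cloud Monitoring configuration; the 99.9% target, the version labels, and the input format are assumptions. The 14.4 threshold is a commonly cited starting point for a one-hour window, since sustaining it for an hour spends 2% of a 30-day budget. Requiring both a long and a short window to exceed the threshold is one way to keep pages tied to ongoing user impact.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int]  # (errors, requests)


def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Error rate divided by the error budget; 1.0 means on pace to spend it exactly."""
    if requests == 0:
        return 0.0
    error_budget = 1.0 - slo_target
    return (errors / requests) / error_budget


def versions_to_page(long_window: Dict[str, Counts],
                     short_window: Dict[str, Counts],
                     slo_target: float = 0.999,
                     threshold: float = 14.4) -> List[str]:
    """Return service versions burning budget too fast in BOTH windows."""
    paging = []
    for version, (errors, requests) in long_window.items():
        short_errors, short_requests = short_window.get(version, (0, 0))
        if (burn_rate(errors, requests, slo_target) >= threshold
                and burn_rate(short_errors, short_requests, slo_target) >= threshold):
            paging.append(version)
    return sorted(paging)
```

Keying the counts by version is what answers the "which release is consuming error budget" requirement during triage.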
Question 11
Topic: Designing for Security and Compliance
A healthcare analytics team stores regulated datasets in Google Cloud. The architect reviews the security findings below and must choose the next control to reduce the highest-risk data movement path.
| Finding | Visible fact |
|---|---|
| Data services | BigQuery and Cloud Storage |
| Current access | Analysts have least-privilege IAM |
| Main concern | Valid user credentials could copy data to an unapproved Google Cloud project through Google APIs |
| Constraint | Approved batch jobs inside the analytics projects must continue using those APIs |
Which next action best addresses the concern?
Options:
A. Apply a hierarchical firewall policy that denies internet egress.
B. Create a VPC Service Controls perimeter with explicit ingress and egress rules.
C. Require context-aware access for analyst console sessions.
D. Enable additional Data Access audit logs and alerting.
Best answer: B
Explanation: The key risk is exfiltration through Google APIs from regulated projects to an unapproved Google Cloud project. IAM controls who can access data, but it does not by itself create a service boundary that prevents an authorized or compromised identity from moving data to another project. VPC Service Controls can create a perimeter around the analytics projects and use ingress and egress rules to allow approved internal jobs while restricting movement to untrusted projects or services.
Context-aware access can strengthen user access conditions, but it does not replace a data exfiltration boundary. Firewall rules mainly control network traffic, not authorized calls to managed Google APIs.
- Context-aware access helps limit the devices and conditions users sign in from, but it does not directly block API-based copying to another project.
- Network egress denial misses the main path because managed Google API access is not fully governed by VPC firewall policy.
- Audit logging improves detection and evidence, but it is not the primary preventive control for the stated exfiltration path.
Question 12
Topic: Managing Implementation
A financial analytics company is standardizing new Google Cloud projects. Engineers currently create resources with the console and gcloud. Compliance requires peer review and an audit trail for network and IAM changes, but the platform team has only two engineers and limited Terraform experience. Which recommendation best balances controlled change, repeatability, and operational effort?
Options:
A. Use Git-based Terraform modules with CI plan review and approved applies
B. Run Terraform manually from one admin workstation with local state
C. Continue using the console and document each change in tickets
D. Build a custom provisioning portal before adopting Terraform
Best answer: A
Explanation: Infrastructure as code should make infrastructure changes reproducible, reviewable, and controlled without creating unnecessary operational burden. A practical Google Cloud approach is to keep Terraform configuration and reusable modules in source control, require pull request review, run terraform plan in CI, and apply changes only through an approved pipeline with remote state. This gives auditors a change history, reduces drift from manual changes, and lets a small team start with a manageable workflow. Reusable modules also help less experienced teams standardize project, network, IAM, and service configurations over time.
The key trade-off is not maximum automation on day one; it is introducing enough process to control risk while keeping adoption realistic.
- Ticket-only tracking records intent but does not make the infrastructure definition repeatable or prevent unreviewed console drift.
- Manual local Terraform uses IaC syntax but weakens collaboration, state safety, and auditable approval workflows.
- Building a custom portal first may improve self-service later, but it adds high build and maintenance effort before the team has a proven IaC baseline.
Question 13
Topic: Ensuring Solution and Operations Excellence
A retail company runs its checkout API on Cloud Run. The operations lead wants a support improvement that directly addresses recurring production issues and can show measurable improvement next quarter.
Exhibit: 30-day incident review
| Signal | Evidence |
|---|---|
| Recurring issue | 6 latency/error incidents |
| Pattern | 5 began within 10 minutes of a new revision |
| Current response | Manual log review, then developer escalation |
| Restore time | Average 42 minutes to rollback |
| Observability | Dashboards include revision labels and deploy annotations |
| Runbook gap | No rollback criteria or escalation path |
Which next action should the architect recommend?
Options:
A. Increase Cloud Run maximum instances after every deployment
B. Create a revision rollback runbook with dashboards, criteria, escalation, and MTTR tracking
C. Migrate the service to GKE for more deployment control
D. Require postmortems before production rollback decisions
Best answer: B
Explanation: Operational excellence improvements should be tied to visible incident evidence and measurable outcomes. The exhibit shows a recurring release-related pattern, existing observability by revision, slow manual diagnosis, and a missing rollback decision path. A runbook that links the alert to revision-filtered dashboards, defines rollback and escalation criteria, and tracks MTTR and repeat incidents directly addresses the support gap. It also gives leadership evidence that production support is improving over time.
Capacity tuning, postmortem process, or platform migration might help in other situations, but they do not most directly close the documented runbook gap.
- Capacity increase assumes scaling is the cause, but the visible pattern is tied to new revisions and rollback delays.
- Delayed rollback makes restoration slower; postmortems should follow mitigation, not block urgent rollback decisions during customer impact.
- Platform migration is a large design change without evidence that Cloud Run is the source of the incidents.
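Rollback criteria and MTTR tracking are both simple enough to write down precisely. The sketch below is plain Python with hypothetical names; the 10-minute window mirrors the pattern in the exhibit, while the 5% error-rate threshold is an assumption that a real runbook would set from the service's SLO.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple


def should_roll_back(deployed_at: datetime, alert_at: datetime, error_rate: float,
                     window: timedelta = timedelta(minutes=10),
                     max_error_rate: float = 0.05) -> bool:
    """Roll back when an alert fires shortly after a new revision and errors are high."""
    since_deploy = alert_at - deployed_at
    recent_revision = timedelta(0) <= since_deploy <= window
    return recent_revision and error_rate > max_error_rate


def mttr_minutes(incidents: List[Tuple[datetime, datetime]]) -> float:
    """Mean time to restore, in minutes, from (detected, restored) pairs."""
    return mean((restored - detected).total_seconds() / 60
                for detected, restored in incidents)
```

Tracking `mttr_minutes` per quarter gives the operations lead the measurable trend the question asks for, starting from the 42-minute average in the exhibit.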
Question 14
Topic: Ensuring Solution and Operations Excellence
A retailer runs a customer-facing order platform on GKE and Cloud SQL. Weekly releases often cause customer-visible incidents; alerts are noisy, rollback decisions are ad hoc, and post-incident fixes are not tracked. Leadership wants fewer incidents before the holiday season while preserving release speed and avoiding a major replatform. Which architecture decision best improves operational excellence?
Options:
A. Enable Gemini Cloud Assist recommendations and keep current release practices
B. Adopt SLOs, progressive delivery, observability standards, and post-incident ownership
C. Increase GKE node sizes, add read replicas, and create uptime checks
D. Migrate the platform to Cloud Run and AlloyDB before the holiday season
Best answer: B
Explanation: Operational excellence is a system of practices, not a single feature. In this scenario, the failures span detection, release safety, decision-making, and learning from incidents. A strong decision combines SLOs and error budgets, actionable Cloud Monitoring and Cloud Logging telemetry, automated canary or blue-green releases with rollback criteria, and clear on-call, runbook, and postmortem ownership. This preserves release speed while reducing customer impact and avoids the risk of a rushed replatform. A tool can assist, but it cannot replace agreed reliability targets, safe delivery patterns, and accountable team practices.
- Assistant-only approach may surface recommendations, but it leaves noisy alerts, ad hoc rollback, and weak incident follow-up unchanged.
- Rushed replatforming adds migration risk before the holiday season and does not directly fix release governance or incident learning.
- Capacity tuning may help performance, but larger nodes, replicas, and uptime checks do not solve unsafe releases or operational ownership.
Question 15
Topic: Ensuring Solution and Operations Excellence
A company is deciding whether to promote a new authenticated Cloud Run API to production. The API writes regulated records to BigQuery. Compliance needs evidence of who changed service/IAM configuration and who accessed the regulated dataset during the pilot. SREs also need enough evidence to investigate 5xx spikes. The team must control logging cost and avoid code changes before launch. What should you recommend?
Options:
A. Centralize filtered Cloud Audit Logs and Cloud Run request logs with retention controls
B. Enable VPC Flow Logs and packet capture for all production subnets
C. Use Cloud Monitoring metrics and uptime checks as the approval evidence
D. Export all project logs at DEBUG severity to BigQuery indefinitely
Best answer: A
Explanation: Cloud Audit Logs are the right evidence source for administrative changes and data-access activity, while Cloud Run request logs provide operational context for failed requests. A centralized log bucket or sink with scoped filters, appropriate retention, and exclusions for noisy nonessential logs balances compliance evidence, troubleshooting needs, cost control, and operational effort. This also avoids adding latency because log collection is not part of the application request path and avoids last-minute code changes.
- Exporting everything over-optimizes for completeness and creates unnecessary cost and retention burden for evidence that can be scoped.
- Metrics alone can show symptoms such as error rates, but they do not prove which principal changed configuration or accessed data.
- Network logs help with traffic analysis, but they do not provide BigQuery data-access audit evidence or Cloud Run request failure detail.
Question 16
Topic: Designing and Planning a Cloud Solution Architecture
A financial services company runs fraud analytics in Google Cloud and transaction settlement in another cloud provider. Review the requirements and choose the best connectivity design.
| Requirement | Detail |
|---|---|
| Data exchange | Hourly sensitive transaction summaries |
| Network path | Must not use public internet paths |
| Operations | Central team controls routing and monitors link health |
| Scale | Starts at 2 Gbps, expected to grow |
| Routing | Both cloud environments support BGP |
Which design best satisfies these requirements?
Options:
A. Expose external HTTPS endpoints and allowlist peer cloud IPs
B. Use HA VPN tunnels over the public internet
C. Create direct VPC peering between the two cloud providers
D. Use redundant Cross-Cloud Interconnect with Cloud Router BGP
Best answer: D
Explanation: Cross-Cloud Interconnect is the best fit when a Google Cloud environment must exchange data privately with another cloud provider at predictable, scalable bandwidth. Cloud Router uses BGP to exchange routes dynamically, which gives the central network team operational control over advertised prefixes and link state. Redundant connections and VLAN attachments should be used for availability, with firewall rules and route controls limiting which systems can exchange data. If payload encryption is required by policy, add encryption at the application layer or an approved tunnel, but the primary connectivity pattern remains a private multicloud interconnect rather than internet-facing access.
- Internet VPN encrypts traffic, but it still uses public internet paths and is less aligned with the private-path requirement.
- External endpoints increase exposure and rely on IP allowlisting instead of private routing control.
- Direct VPC peering is not available across providers: VPC Network Peering connects Google Cloud VPC networks to each other, not to another cloud provider's network.
Question 17
Topic: Designing for Security and Compliance
A financial services company is moving a payment fraud-scoring platform to Google Cloud. The service receives cardholder data from partner systems over private connectivity, scores transactions in near real time, stores results for 7 years, and feeds masked data to BigQuery. Requirements include customer control over key use, protection on the wire, reduced exposure while data is processed, no public endpoints, low latency, and limited operational overhead. Which recommendation best balances these requirements?
Options:
A. Use CMEK-backed storage, private TLS paths, and Confidential GKE nodes
B. Use a self-managed HSM service and isolated standard VMs
C. Use app-managed encryption keys and decrypt on standard GKE nodes
D. Use default encryption, public HTTPS APIs, and IAM-only controls
Best answer: A
Explanation: The requirement calls for layered data protection across at-rest, in-transit, and in-use states. CMEK-backed managed services give security teams control and auditability for regulated stored data without putting key material in application code. Private connectivity with enforced TLS protects partner and service-to-service traffic while avoiding public exposure and preserving low latency. Confidential GKE nodes reduce exposure during processing by using Confidential Computing for the scoring runtime. This balances compliance and security with managed-service availability, rotation support, and lower operational effort.
- Default controls only misses customer key-control needs, allows public endpoints, and does not address exposure during processing.
- App-managed keys increase operational risk and place decryption responsibility in the workload without confidential processing.
- Self-managed HSM optimizes control but adds availability and operations burden while standard VMs still lack in-use protection.
Question 18
Topic: Designing for Security and Compliance
A healthcare company is deploying an internal claims assistant using Gemini models through Vertex AI. The service retrieves claim documents from Cloud Storage and BigQuery. The architecture must minimize PHI exposure, reduce prompt-injection and unsafe-response risk, provide compliance evidence, and allow production changes only through an approved release process. Which architecture decision best addresses these risks?
Options:
A. Run the assistant on private GKE nodes without prompt or response inspection
B. Use Model Armor, Sensitive Data Protection redaction, audit logs, and gated deployments
C. Rely on model safety settings and keep the current manual deployment process
D. Store full prompts and responses, then redact them after inference
Best answer: B
Explanation: AI workflows add risks beyond ordinary application security because user prompts, retrieved context, model outputs, logs, and deployment pipelines can all expose sensitive data or create unsafe behavior. For this scenario, the strongest decision is a layered control pattern: use Sensitive Data Protection to discover and redact PHI before model interaction where possible, use Model Armor to help inspect prompts and responses for unsafe or policy-violating content, retain audit logs as compliance evidence, and require approved CI/CD gates before production release. These controls map directly to the stated risks without relying on a single model setting or network boundary.
- Model safety only misses deployment control and compliance evidence, and it does not adequately address sensitive-data handling.
- Redact after inference still sends PHI into the model interaction, which violates the exposure-minimization requirement.
- Private nodes only may reduce network exposure, but it does not address unsafe prompts, unsafe responses, or AI-specific governance.
Question 19
Topic: Managing Implementation
A company is migrating applications to Google Cloud in waves. The next wave includes a checkout service with a strict availability target and a 200 ms p95 latency requirement. Discovery shows the service makes synchronous calls to an on-premises licensing appliance that has not been tested over Cloud Interconnect, and the application team has not validated failure behavior in Google Cloud. The business wants to keep the migration schedule. What should the architect recommend?
Options:
A. Migrate as scheduled and monitor latency during the production cutover.
B. Pause this wave for a POC and dependency remediation, while continuing lower-risk waves.
C. Rewrite the checkout service before migrating any other workload.
D. Move the licensing appliance last to minimize short-term operational work.
Best answer: B
Explanation: A migration wave should pause when an unresolved critical dependency could violate availability, latency, or support requirements during cutover. Here, the checkout service depends on a synchronous on-premises appliance and has not been tested in the target environment. A focused proof of concept can validate connectivity, latency, failover behavior, and remediation options without stopping the whole migration program. Continuing lower-risk waves preserves progress while preventing a high-impact outage in a customer-facing service.
The key takeaway is to pause the specific risky implementation path, not the entire migration, when dependency evidence is missing for a critical workload.
- Cutover-first monitoring finds problems too late for a service with strict latency and availability requirements.
- Deferring the appliance ignores the dependency that must work during the checkout service migration.
- Full rewrite first over-optimizes for future architecture and creates unnecessary delay beyond the identified migration risk.
Question 20
Topic: Analyzing and Optimizing Technical and Business Processes
A team’s Cloud Run service has caused three production incidents in the last month after otherwise successful CI/CD deployments. Rollbacks restored service each time, but the same class of failure returned. The business requires weekly releases, SLO-based reliability reporting, and auditable remediation. What corrective process action should the cloud architect recommend first?
Options:
A. Freeze releases until the service is replatformed to GKE
B. Require executive approval for every deployment
C. Raise alert thresholds to reduce incident noise
D. Run blameless RCA reviews with owned remediation tracking
Best answer: D
Explanation: Recurring deployment failures are a process signal, not just a platform problem. The first corrective action should be a blameless root-cause analysis process that documents contributing factors, assigns remediation owners and due dates, and verifies completion through release validation or SLO evidence. This satisfies the need for auditable remediation while preserving the business requirement for weekly releases. It also helps distinguish whether the fix belongs in tests, deployment gates, monitoring, rollback criteria, or service design. A blanket freeze or replatforming effort may be considered later only if the analysis proves it is necessary.
- Replatforming first overbuilds the response and delays releases without proving Cloud Run is the cause.
- Alert tuning may reduce noise, but it does not prevent the recurring failure or create remediation evidence.
- Executive approvals add friction and accountability theater, but they do not identify or validate the technical root cause.
Question 21
Topic: Analyzing and Optimizing Technical and Business Processes
A software company is expanding from 2 to 14 Google Cloud application teams. The platform team must reduce environment lead time while preventing repeat production incidents.
Exhibit: Process assessment
| Area | Current finding |
|---|---|
| Provisioning | Manual tickets; 10-15 business days |
| Deployments | Some teams deploy from laptops |
| Controls | Required Shared VPC, audit logging, org policies |
| Release risk | Production changes need automated tests and approval |
| Team goal | Self-service dev/test with consistent baselines |
What should the architect recommend as the next action?
Options:
A. Keep central provisioning and add weekly release windows
B. Grant application teams Project Owner in dev/test folders
C. Create approved IaC modules with self-service gated pipelines
D. Allow laptop deployments and reconcile drift quarterly
Best answer: C
Explanation: The process needs controlled self-service, not unmanaged delegation or more manual gates. Approved Infrastructure as Code modules can encode required Shared VPC, audit logging, IAM, and organization policy baselines once, then let teams provision standard environments quickly. CI/CD pipelines add repeatable plans, automated validation, tests, production approval gates, and audit trails. This improves implementation speed by reducing ticket queues and improves operational safety by making every environment and release follow the same governed path. Fast but unmanaged access would increase drift and incident risk.
- Broad owner access is faster initially, but it weakens least privilege and makes required guardrails easier to bypass.
- More manual scheduling may add coordination, but it does not remove the provisioning bottleneck or standardize controls.
- Quarterly drift cleanup accepts unsafe changes for too long and does not provide reliable release auditability.
Question 22
Topic: Ensuring Solution and Operations Excellence
A retail company runs a checkout API on Cloud Run. Two planned marketing campaigns caused customer-visible latency incidents. The platform team wants to separate true operational excellence work from incident workarounds.
Exhibit: Incident review excerpt
| Finding | Detail |
|---|---|
| Immediate fix | Senior SRE manually raised scaling limits |
| Repeat signal | Same pattern occurred in two campaigns |
| Alerting | First signal came from support tickets |
| Readiness | No campaign load test or capacity check |
| Runbook | “Page senior SRE to tune limits” |
Which next action best represents an operational excellence improvement rather than another short-term workaround?
Options:
A. Pre-scale manually before each campaign.
B. Add SLO alerts, load tests, and scaling automation.
C. Keep maximum scaling limits permanently high.
D. Assign the senior SRE to all campaigns.
Best answer: B
Explanation: Operational excellence improvements reduce recurring operational risk by improving observability, readiness, automation, and learning from incidents. In the exhibit, the team is repeatedly using senior-engineer manual tuning after customers are already affected. Adding SLO-based alerts, campaign load testing, and automated scaling or capacity checks changes the operating model: the team can detect impact earlier, validate readiness before launch, and avoid relying on a person to repeat an emergency action. The key distinction is system improvement versus temporary recovery. A workaround may restore service for one incident, but it does not make future campaigns safer or more predictable.
- Manual pre-scaling still depends on a person remembering and correctly estimating demand.
- Permanent overprovisioning may hide symptoms but does not fix alerting, testing, or repeatability.
- Senior SRE assignment preserves a single-person dependency instead of improving the production support process.
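To make "SLO-based alerts" concrete, here is a minimal sketch of an error-budget burn-rate check. The function names are hypothetical, and the 14.4 threshold is a commonly cited example value used here as an assumption, not a figure from the exhibit.

```python
# Sketch only: error-budget burn rate for SLO-based alerting.

def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means errors arrive exactly at the rate the SLO allows.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1")
    if total_events <= 0:
        return 0.0
    error_rate = bad_events / total_events
    error_budget = 1.0 - slo_target
    return error_rate / error_budget


def should_page(short_window: float, long_window: float,
                threshold: float = 14.4) -> bool:
    """Page only when both windows burn fast, which filters brief spikes."""
    return short_window >= threshold and long_window >= threshold


if __name__ == "__main__":
    rate = burn_rate(bad_events=10, total_events=1000, slo_target=0.999)
    print(round(rate, 2), should_page(rate, rate))
```

An alert like this fires while the budget is burning, so the first signal no longer has to come from support tickets.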
Question 23
Topic: Designing and Planning a Cloud Solution Architecture
A streaming media company is deciding how to move its recommendation and playback platform to Google Cloud. The product team wants to increase paid conversions and reduce churn before a holiday launch. Marketing wants lower global latency for personalization, finance has capped first-year spend, legal requires EU customer data controls, and operations has limited Kubernetes expertise. Which recommendation should drive the architecture decision?
Options:
A. Standardize on a global GKE platform to maximize long-term engineering flexibility.
B. Delay migration until all workloads and data can move together.
C. Prioritize conversion and retention journeys; set p95 latency, churn, cost-per-user, and EU data-control targets before choosing managed, phased services.
D. Rehost the full platform in the lowest-cost single region first.
Best answer: C
Explanation: Architecture decisions should start with the business use cases and measurable outcomes that matter most, not with a preferred product or migration pattern. In this scenario, the decisive goals are increasing conversions, reducing churn, meeting the launch window, lowering personalization latency, controlling spend, protecting EU customer data, and fitting the team’s operating skills. Defining outcome measures such as p95 latency, churn reduction, conversion lift, cost per user, and data-control requirements gives the architect a basis for selecting Google Cloud services and sequencing the migration. The best recommendation balances product impact with constraints instead of optimizing only for platform flexibility, lowest infrastructure cost, or migration completeness.
- Platform-first thinking misses the team-readiness constraint and does not prove that Kubernetes is needed for the product outcomes.
- Lowest-cost rehosting may reduce short-term spend but can miss global latency, compliance, and churn-reduction goals.
- All-at-once migration reduces fragmentation but risks missing the holiday launch and delays measurable business value.
Question 24
Topic: Managing and Provisioning a Solution Infrastructure
A retailer stores 40 TB of daily clickstream files in a Cloud Storage bucket in us-central1. A nightly pipeline must parse, enrich, and aggregate the files into curated objects. The pipeline must finish within 4 hours when volume doubles during promotions, keep processing in the same region, and avoid copying the full dataset to VM disks before processing.
Which architecture should you recommend?
Options:
A. Copy the bucket nightly to Persistent Disk and process it on a fixed GKE node pool.
B. Use Storage Transfer Service to move files to a multiregion bucket before processing on spot VMs.
C. Run one high-memory Compute Engine VM that mounts the bucket with Cloud Storage FUSE.
D. Use Dataflow autoscaling workers in us-central1 to process Cloud Storage objects in parallel.
Best answer: D
Explanation: Storage-backed batch workloads bottleneck when compute is serialized, far from the data, or forced through a staging disk. Here, the workload is file-based ETL and aggregation over Cloud Storage with variable volume and a strict completion window. A Dataflow batch pipeline can read objects in parallel, autoscale workers in the same region as the bucket, and handle aggregation without requiring cluster management. You would still benchmark and set appropriate worker limits and machine types, but the core architecture avoids the single-node, disk-staging, and cross-region transfer bottlenecks.
- Single VM bottleneck fails because scaling up one VM does not provide enough parallelism and Cloud Storage FUSE is not a high-throughput ETL design.
- Disk staging fails because copying 40 TB to Persistent Disk adds a large transfer step and fixed cluster capacity.
- Cross-region transfer fails because it violates data locality and adds transfer time before processing starts.
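The completion window translates into a required aggregate throughput, which is why parallelism matters. The sketch below works through that arithmetic; the per-worker throughput of 100 MB/s is a purely hypothetical figure that you would replace with a benchmarked value.

```python
# Sketch only: back-of-the-envelope sizing for the 4-hour window.
import math

BYTES_PER_TB = 10**12  # decimal terabytes
BYTES_PER_GB = 10**9


def required_throughput_gb_per_s(volume_tb: float, window_hours: float) -> float:
    """Aggregate read rate needed to finish the volume inside the window."""
    seconds = window_hours * 3600
    return volume_tb * BYTES_PER_TB / seconds / BYTES_PER_GB


def workers_needed(volume_tb: float, window_hours: float,
                   per_worker_mb_per_s: float) -> int:
    """Minimum workers, given an ASSUMED sustained per-worker throughput."""
    needed_mb_per_s = required_throughput_gb_per_s(volume_tb, window_hours) * 1000
    return math.ceil(needed_mb_per_s / per_worker_mb_per_s)


if __name__ == "__main__":
    # Promotion volume: 40 TB doubles to 80 TB.
    print(round(required_throughput_gb_per_s(80, 4), 2))  # about 5.56 GB/s
    print(workers_needed(80, 4, per_worker_mb_per_s=100))
```

Roughly 5.56 GB/s sustained is beyond what a single VM reading through a mounted bucket is designed for, which is the core reason the single-node option fails.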
Question 25
Topic: Designing and Planning a Cloud Solution Architecture
A manufacturing company wants to modernize its on-premises order-management platform on Google Cloud. Requirements: improve release speed for new customer features, keep SOX audit evidence and data-location controls, support a few years of hybrid connectivity for systems with licensing constraints, and avoid a big-bang migration. Which cloud-first design approach best meets these requirements?
Options:
A. Lift and shift everything into one project first
B. Create a governed landing zone and migrate in waves
C. Keep core systems on-premises and build only AI prototypes
D. Rewrite all workloads before any migration begins
Best answer: B
Explanation: A cloud-first approach does not mean moving everything immediately or abandoning governance. The best design starts with a Google Cloud landing zone: organization and folder structure, project standards, Shared VPC, IAM, organization policies such as location constraints, centralized logging, and IaC-based provisioning. Then workloads can move in waves based on risk, dependencies, and business value. New or refactored capabilities can use managed Google Cloud services while legacy systems continue through secure hybrid connectivity until they are ready to migrate or retire. This balances faster delivery with auditability, security, and migration realism.
- Big-bang rewrite increases delivery risk and delays business value before any migration benefit is realized.
- Single-project lift and shift may move workloads quickly, but it defers governance and creates audit, network, and access-management debt.
- AI-only prototyping does not modernize the core platform or establish the governed foundation needed for broader cloud adoption.
Questions 26-50
Question 26
Topic: Managing Implementation
An ecommerce company is migrating an order-processing service from on-premises VMs with PostgreSQL to Cloud Run and Cloud SQL for PostgreSQL. Cutover is planned in 6 weeks. The business allows no more than 15 minutes of write downtime, needs evidence that p95 latency will meet the SLO, and cannot fund a long-running duplicate full production environment. The on-premises database must remain the system of record until cutover. Which implementation step best reduces migration risk before cutover?
Options:
A. Lift and shift the VMs to Compute Engine first, then redesign after cutover.
B. Run two full production stacks for 6 weeks and split writes between them.
C. Perform a final full export/import during cutover and rely on post-cutover monitoring.
D. Use continuous replication, short-lived staging, data validation, load tests, and a cutover rollback rehearsal.
Best answer: D
Explanation: Before cutover, migration risk should be reduced through a production-like rehearsal that validates the target architecture and the cutover process. Database Migration Service continuous replication can keep Cloud SQL close to the on-premises PostgreSQL source, reducing final write downtime. A short-lived staging environment limits cost while still allowing smoke tests, data validation, load tests, and operational runbook practice for Cloud Run. Rehearsing rollback also improves team readiness. This approach balances availability, data movement, latency validation, cost, and operations effort instead of pushing discovery into the cutover window.
- Full export/import saves setup effort but leaves downtime, data validation, and latency risk unresolved until cutover.
- Lift and shift first avoids redesign risk temporarily but does not validate the intended Cloud Run and Cloud SQL target.
- Split production writes violates the system-of-record constraint and creates consistency and cost risk before cutover.
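The data-validation step in the rehearsal can be as simple as comparing per-table snapshots from source and target. The following is a minimal sketch under stated assumptions: the `TableSnapshot` type, the table names, and the idea of a precomputed checksum are hypothetical and stand in for whatever validation queries the team actually runs.

```python
# Sketch only: compare source and target snapshots before cutover.
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TableSnapshot:
    row_count: int
    checksum: str  # assumed to be computed elsewhere, e.g. a hash of ordered rows


def validate(source: Dict[str, TableSnapshot],
             target: Dict[str, TableSnapshot]) -> List[str]:
    """Return human-readable mismatches; an empty list means validation passed."""
    problems: List[str] = []
    for name, src in sorted(source.items()):
        tgt = target.get(name)
        if tgt is None:
            problems.append(f"{name}: missing on target")
        elif tgt.row_count != src.row_count:
            problems.append(
                f"{name}: row count {tgt.row_count} != {src.row_count}"
            )
        elif tgt.checksum != src.checksum:
            problems.append(f"{name}: checksum mismatch")
    return problems


if __name__ == "__main__":
    on_prem = {"orders": TableSnapshot(3, "abc"), "items": TableSnapshot(5, "def")}
    cloud_sql = {"orders": TableSnapshot(3, "abc"), "items": TableSnapshot(4, "def")}
    print(validate(on_prem, cloud_sql))
```

Running a check like this during the rehearsal, rather than during the 15-minute write-downtime window, is what moves discovery out of cutover.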
Question 27
Topic: Designing for Security and Compliance
A security team is reviewing network controls for a Google Cloud organization.
Exhibit: Architecture note
| Fact | Detail |
|---|---|
| Scope | Prod and Regulated folders only |
| Projects | Many app projects and Shared VPC host projects |
| Requirement | Deny internet ingress to TCP 22 and 3389 |
| Exception | Sandbox folder must not inherit this control |
| Operations | App teams manage application-specific firewall rules |
Which design best supports the requirement?
Options:
A. Attach folder-level hierarchical firewall policies.
B. Associate a global network firewall policy with each VPC.
C. Copy VPC firewall rules into each project.
D. Use an organization policy to restrict external IPs.
Best answer: A
Explanation: Hierarchical firewall policies are the right control when firewall rules must be inherited consistently across projects or folders. Attaching the deny rules to the Prod and Regulated folders applies them to descendant projects and VPC networks, including future projects in those folders, while leaving the Sandbox folder outside the scope. Because these policies are centrally managed and evaluated before local VPC firewall rules, project teams can still manage application-specific rules but cannot override the centralized deny for SSH and RDP from the internet. Network firewall policies and VPC firewall rules can be useful, but they do not provide the same resource-hierarchy inheritance for this folder-scoped requirement.
- Per-network policy can standardize VPC controls, but each VPC must be associated and new networks can be missed.
- Copied VPC rules create drift risk and do not provide central folder-level enforcement.
- External IP restriction changes resource configuration but does not express the required packet-level port deny.
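The evaluation order is the part that matters here, and a toy model can make it visible. The sketch below is a deliberate simplification: it ignores priorities, `goto_next`, and network firewall policies, and the `Rule` type and `"internet"` source label are hypothetical constructs for illustration only.

```python
# Sketch only: toy model of folder-level rules being evaluated before VPC rules.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    action: str  # "allow" or "deny"
    port: int
    source: str  # simplified to "internet" or "internal"


def first_match(rules: List[Rule], port: int, source: str) -> Optional[str]:
    """Return the action of the first rule matching this ingress attempt."""
    for rule in rules:
        if rule.port == port and rule.source == source:
            return rule.action
    return None


def evaluate(folder_policy: List[Rule], vpc_rules: List[Rule],
             port: int, source: str) -> str:
    """Folder policy decides first; VPC rules apply only if it is silent."""
    decision = first_match(folder_policy, port, source)
    if decision is not None:
        return decision
    decision = first_match(vpc_rules, port, source)
    return decision or "deny"  # no match falls through to the implied ingress deny


if __name__ == "__main__":
    prod_folder = [Rule("deny", 22, "internet"), Rule("deny", 3389, "internet")]
    app_rules = [Rule("allow", 22, "internet"), Rule("allow", 443, "internet")]
    print(evaluate(prod_folder, app_rules, 22, "internet"))   # folder deny wins
    print(evaluate([], app_rules, 22, "internet"))            # Sandbox: no folder policy
```

In this model the app team's allow rule for port 22 never takes effect under the Prod folder, while the Sandbox case, which has no folder policy attached, is governed only by its own VPC rules.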
Question 28
Topic: Managing Implementation
A regulated finance team must implement a new Java service that calls Google Cloud APIs. Security policy requires source code and test data to remain on managed developer laptops, and browser-based development workspaces are not allowed. Developers must run unit tests while offline by using local emulators before committing to CI/CD. Device management can install approved command-line tools and language packages, but new IDE extensions will not be approved before the deadline. Which tooling recommendation best balances these constraints?
Options:
A. Use Cloud Code in the developers’ IDEs
B. Use Cloud Shell Editor in the browser
C. Use Cloud Shell Terminal for implementation commands
D. Use local tooling with CLI, client libraries, and emulators
Best answer: D
Explanation: Local tooling is the best fit when compliance, offline testing, and custom developer environments are decisive. The team can install the Google Cloud CLI, language client libraries, and local emulators through managed-device controls, keeping source code and test data on approved laptops while supporting offline unit tests. This has more setup effort than Cloud Shell, but it satisfies the mandatory constraints. Cloud Code would be attractive for IDE-integrated development, but the stem says new IDE extensions are not approved before the deadline.
- Cloud Code optimizes IDE productivity, but it conflicts with the extension-approval constraint in the stem.
- Cloud Shell Editor reduces setup effort, but it is a browser-based workspace and does not support offline laptop-only development.
- Cloud Shell Terminal fits quick command execution, not iterative implementation with local emulators and managed source-code handling.
Question 29
Topic: Analyzing and Optimizing Technical and Business Processes
An online retailer is moving a customer order service from its data center to Google Cloud. The service can run as a container with minimal changes, but its database must remain on-premises for 9 months due to licensing. Stakeholders want a customer-facing release within 3 months, no major rewrite this fiscal year, private connectivity with centralized audit evidence, and a platform the small operations team can support. Which architecture recommendation best addresses these concerns?
Options:
A. Lift and shift to Compute Engine with public database access and manual deployments.
B. Build active-active multicloud Kubernetes with self-managed database replication.
C. Refactor into microservices on GKE Standard and migrate the database before launch.
D. Deploy on Cloud Run with private hybrid connectivity, centralized observability, and automated releases.
Best answer: D
Explanation: The core decision is stakeholder-aligned phased modernization. Because the application can run as a container, Cloud Run can provide a managed runtime without requiring the small operations team to operate Kubernetes clusters. Private hybrid connectivity keeps the required on-premises database reachable without exposing it publicly, while centralized logging, monitoring, and audit logs support security evidence. Automated build and deployment processes help the product team release faster within the 3-month target. This approach defers the database migration and deeper refactoring until licensing and schedule constraints allow it.
- Immediate refactor fails because it ignores the no-rewrite constraint, the database timing constraint, and the team’s operational capacity.
- Basic lift and shift may be fast, but public database access and manual deployments miss security and release-velocity requirements.
- Active-active multicloud overbuilds the solution, adds operational complexity, and does not align with the near-term business timeline.
Question 30
Topic: Designing and Planning a Cloud Solution Architecture
A manufacturing company is redesigning an order-processing system on Google Cloud. Traffic is highly seasonal, and the current VM estate sits idle most of the year. Requirements: tolerate a zone failure with RPO under 5 minutes and RTO under 1 hour; keep regulated customer data off public networks with audited least-privilege access; reduce operational toil, cost, and idle energy use. Which design best applies Google Cloud Well-Architected principles?
Options:
A. Deploy a single-zone GKE Standard cluster and database, then rely on backups for recovery.
B. Use Cloud Run, Pub/Sub, and a regional HA managed database with private connectivity, least-privilege IAM, SLO monitoring, and IaC.
C. Build active-active clusters in two regions with always-on peak capacity and self-managed databases.
D. Lift and shift to fixed-size Compute Engine instances across zones with public HTTPS endpoints and manual patching.
Best answer: B
Explanation: Well-Architected design balances pillars instead of optimizing only one. The best fit uses managed, autoscaling services to reduce toil, cost, and idle energy use; Pub/Sub to decouple bursts and improve reliability; a regional HA managed database to tolerate a zone failure; and private connectivity plus least-privilege IAM for regulated data. SLO monitoring and infrastructure as code support operational excellence by making reliability targets visible and deployments repeatable. The key trade-off is using enough resilience to meet the stated zone-failure requirement without building an unnecessarily complex, always-on multi-region platform.
- Lift-and-shift bias fails because fixed capacity and manual patching preserve toil and idle cost, and public exposure conflicts with private-data requirements.
- Single-zone design fails because backups alone do not meet the stated zone-failure RTO and RPO targets.
- Overbuilt resilience fails because always-on multi-region self-managed databases add cost, operational burden, and idle energy beyond the stated requirement.
Question 31
Topic: Analyzing and Optimizing Technical and Business Processes
A retailer is migrating its order-management service to Google Cloud using Cloud Run, Cloud SQL, Pub/Sub, and a Shared VPC. Requirements include 24/7 checkout support, least-privilege separation between platform and product teams, and independent product-team deployments after migration. The architecture review finds dashboards exist, but no team is named to own incident triage, schema-change approvals, or CI/CD pipeline failures after handoff. Which decision best addresses the readiness risk before go-live?
Options:
A. Move the workload to GKE so the platform team owns runtime operations.
B. Grant the product team project Owner until responsibilities stabilize.
C. Define an operating model with RACI, SLOs, runbooks, and escalation paths.
D. Proceed with cutover and review ownership after the first incident.
Best answer: C
Explanation: The main gap is not a missing Google Cloud service; it is an unclear support model. For a production migration with 24/7 support and separation of duties, the architecture decision should define who owns each operational responsibility before go-live. A RACI-style operating model, SLOs, alert routing, runbooks, escalation paths, and sign-off criteria make the handoff testable and auditable. This also preserves independent deployments because product teams can own application releases while platform teams own shared infrastructure boundaries. Changing compute platforms or broadening permissions does not solve unclear ownership and may increase risk.
- Platform takeover misses the requirement for product-team deployment independence and does not clarify database or pipeline ownership.
- Broad Owner access violates least privilege and masks process gaps with excessive permissions.
- Post-incident review accepts known readiness risk for a 24/7 checkout workload instead of gating production cutover.
Question 32
Topic: Designing and Planning a Cloud Solution Architecture
An online retailer runs a regional Cloud Run storefront with Cloud SQL. Engineering proposes moving the full platform to multi-region GKE with service mesh as a next-quarter improvement. The CIO will fund only work tied to measurable benefit.
| Metric or constraint | Current finding |
|---|---|
| Availability | SLO met for 6 months |
| Latency | EU p95 is 780 ms; target is 400 ms |
| Incidents | Most caused by manual releases |
| Team | 3 SREs; limited GKE operations experience |
| Budget | 12% increase available |
Which recommendation best balances future evolution with measurable business and operational benefit?
Options:
A. Proceed with multi-region GKE and service mesh now.
B. Prioritize Cloud CDN for cacheable content and automated progressive delivery on Cloud Run.
C. Migrate Cloud SQL to Spanner first.
D. Pause improvements until the availability SLO is missed.
Best answer: B
Explanation: Future improvements should be justified by observable outcomes, not only by architectural ambition. In this scenario, availability is already meeting the SLO, so a broad multi-region GKE migration is not supported by the current evidence and would add operational risk for a small team with limited GKE experience. The measured gaps are EU latency and deployment-related incidents. Cloud CDN for cacheable content can reduce user-facing latency, while automated progressive delivery on Cloud Run can reduce failed releases and rollback toil without a platform rewrite. Success should be tracked with p95 latency, conversion impact, deployment failure rate, rollback time, and SRE effort. Revisit larger platform changes only if measured growth or requirements show the current design cannot meet future needs.
- Future flexibility alone does not justify a costly platform migration when availability is healthy and team readiness is limited.
- Availability headroom is not the measured pain point, so moving to Spanner first misses the latency and release-quality issues.
- Waiting for failure ignores measurable customer latency and operational defects that already have targeted, lower-risk remedies.
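Automated progressive delivery boils down to a gate that either widens traffic to the new revision or rolls it back based on observed health. A minimal sketch of such a gate follows; the traffic steps, the 1% error threshold, and the function name are assumptions for illustration and do not describe any specific Google Cloud tool.

```python
# Sketch only: decide the next canary traffic share from observed error rate.
from typing import Sequence


def next_traffic_percent(current: int, error_rate: float,
                         max_error_rate: float = 0.01,
                         steps: Sequence[int] = (5, 25, 50, 100)) -> int:
    """Return the next traffic percentage, or 0 to signal a rollback."""
    if error_rate > max_error_rate:
        return 0  # unhealthy revision: send all traffic back to the old one
    for step in steps:
        if step > current:
            return step  # healthy: advance to the next larger share
    return 100  # already fully rolled out


if __name__ == "__main__":
    print(next_traffic_percent(5, error_rate=0.001))   # advance
    print(next_traffic_percent(25, error_rate=0.05))   # roll back
```

A gate like this replaces the manual release step identified as the main incident cause, and its decisions feed directly into the deployment failure rate and rollback time metrics mentioned above.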
Question 33
Topic: Managing and Provisioning a Solution Infrastructure
An analytics team is provisioning a nightly ML workflow to train a forecasting model from curated BigQuery features. The platform team must decide whether the proposed provisioning meets the requirements.
Exhibit: Provisioning review
| Requirement | Proposed choice |
|---|---|
| Read only analytics-prod.features.sales | Grant data-scientists BigQuery Data Viewer on analytics-prod |
| Avoid user credentials and long-lived keys | Run scheduled notebook scripts under each author’s identity |
| Repeatable runs with lineage and retries | Use cron jobs on Vertex AI Workbench VMs |
| Store training artifacts centrally | Write to gs://ml-artifacts-prod |
Which interpretation should the platform team make?
Options:
A. Provision Vertex AI Pipelines with a dedicated least-privilege service account
B. Approve the notebook-based workflow as-is
C. Add Storage Admin to the data-scientists group
D. Export BigQuery features to Cloud Storage before training
Best answer: A
Explanation: The proposed provisioning does not satisfy the combined data access, security, and operational requirements. Project-level BigQuery access for a user group is broader than the stated table-only need, and scheduled notebooks running under user identities create brittle ownership, audit, and credential-management issues. Cron on Workbench VMs also does not provide the managed lineage, retries, and repeatable pipeline execution expected for a production ML workflow. A better approach is to provision Vertex AI Pipelines using a dedicated pipeline service account with only the required BigQuery read and job permissions plus scoped write access to the artifact bucket. This keeps execution identity stable and operational history centralized.
- As-is approval ignores user-bound notebook execution and overly broad project-level access.
- More storage permissions broadens access and still leaves orchestration and credential issues unresolved.
- Cloud Storage exports add data movement and lineage burden without fixing the execution identity or orchestration design.
Question 34
Topic: Managing and Provisioning a Solution Infrastructure
An insurance company stores finalized claim documents in a Cloud Storage bucket. The architect is reviewing this snapshot before implementing cost controls.
| Fact | Detail |
|---|---|
| Compliance | Retain each finalized document for 7 years |
| Protection | Prevent deletion or replacement during retention |
| Deletion | Allowed after 7 years |
| Access | Frequent for 30 days, rare after 1 year |
| Current state | Standard storage, Object Versioning on, no lifecycle rules |
What should the architect do next?
Options:
A. Use IAM conditions and annual manual cleanup
B. Set locked retention and age-based lifecycle rules
C. Delete noncurrent versions older than 30 days
D. Change the bucket default storage class to Archive
Best answer: B
Explanation: Cloud Storage retention policies and Object Lifecycle Management address different parts of this requirement. A retention policy, locked after validation, prevents deletion or replacement of objects during the 7-year compliance period. Lifecycle rules can then reduce cost by moving older objects to colder storage classes as access drops and by deleting objects only after the retention period has expired. This preserves the compliance boundary while automating cost control. Changing only the default storage class does not protect existing records or automate deletion, and deletion rules must not remove protected versions before the required retention period ends.
- Archive by default affects new objects only and does not enforce immutable 7-year retention.
- Early version deletion could remove protected generations before the compliance period expires.
- Manual IAM cleanup is operationally weak and can be changed by privileged users, unlike locked retention plus lifecycle automation.
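To make the retention-plus-lifecycle idea in Question 34 concrete, here is a minimal sketch that builds a lifecycle configuration in the JSON rule shape used by Cloud Storage and checks that no Delete rule can fire before the retention period ends. The transition ages (30 and 365 days), the Nearline and Archive class choices, the 7 x 365 day approximation, and the helper names are all assumptions for illustration; the scenario does not prescribe them.

```python
RETENTION_DAYS = 7 * 365  # assumption: 7 years approximated, leap days ignored


def build_lifecycle(nearline_after=30, archive_after=365, delete_after=RETENTION_DAYS):
    """Return a lifecycle config dict in the Cloud Storage JSON rule shape."""
    return {
        "rule": [
            # Move objects to colder classes as access drops (ages are assumptions).
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
             "condition": {"age": nearline_after}},
            {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
             "condition": {"age": archive_after}},
            # Delete only once the compliance period has passed.
            {"action": {"type": "Delete"},
             "condition": {"age": delete_after}},
        ]
    }


def deletes_before_retention(config, retention_days=RETENTION_DAYS):
    """Return Delete rules whose age condition is shorter than the retention period."""
    return [
        rule for rule in config["rule"]
        if rule["action"]["type"] == "Delete"
        and rule["condition"].get("age", 0) < retention_days
    ]
```

A pre-deployment check could refuse any configuration for which `deletes_before_retention(...)` returns a non-empty list. The locked retention policy itself is set on the bucket separately and is not modeled here.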
Question 35
Topic: Analyzing and Optimizing Technical and Business Processes
A retail company is standardizing Google Cloud infrastructure delivery. Product teams are blocked for several days while a central team manually provisions VPCs, Cloud Run services, and Cloud SQL instances. Leadership must enforce budget labels, network baselines, least privilege, separation of duties, and audit evidence. Delivery teams need repeatable self-service provisioning for nonproduction environments. Which process change should the architect recommend?
Options:
A. Let each team create its own modules and complete security review after deployment.
B. Offer approved infrastructure modules through an internal service catalog with CI/CD policy checks and auditable approvals.
C. Give product teams Project Owner access and review Cloud Audit Logs monthly.
D. Keep central manual provisioning and add a weekly change board for environment requests.
Best answer: B
Explanation: The best process change is to move from manual ticket fulfillment to governed self-service. Approved infrastructure-as-code modules exposed through an internal service catalog let delivery teams provision repeatable environments quickly. CI/CD policy checks can enforce labels, network baselines, IAM patterns, and required approvals before deployment, while pipeline logs and approvals provide audit evidence. This aligns business controls with delivery-team needs instead of choosing speed or governance alone.
The key is not only using Terraform or CI/CD; it is changing the provisioning process so controls are built into reusable paths that teams can consume safely.
- Broad project access improves speed but violates least privilege and relies on after-the-fact detection.
- More change boards preserves control but does not address the stated delivery delay or self-service requirement.
- Post-deployment review allows inconsistent infrastructure and finds policy violations too late.
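The CI/CD policy checks described in Question 35 can be pictured as a small gate that runs before deployment. The sketch below is only illustrative: the label keys and the simplified resource dictionaries are hypothetical and do not match the Terraform plan format or any specific policy tool.

```python
REQUIRED_LABELS = {"cost-center", "environment"}  # hypothetical budget-label keys


def find_label_violations(planned_resources):
    """Map each resource name to the sorted list of required labels it is missing."""
    violations = {}
    for resource in planned_resources:
        missing = REQUIRED_LABELS - set(resource.get("labels", {}))
        if missing:
            violations[resource["name"]] = sorted(missing)
    return violations


def gate(planned_resources):
    """Return True when the change may proceed, False when the pipeline should stop."""
    return not find_label_violations(planned_resources)
```

In a pipeline, a failing `gate(...)` would block the apply step, and the recorded result would serve as part of the audit evidence.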
Question 36
Topic: Designing for Security and Compliance
A healthcare SaaS company needs a third-party support team to SSH to private Compute Engine admin VMs during incidents. The security team proposes the access path below. What is the best interpretation?
| Requirement | Proposed access path |
|---|---|
| Use vendor corporate IdP; no shared users | HA VPN from vendor office; shared vendor-admin Linux account |
| No public IPs; access only named admin VMs | VMs have no public IPs; SSH allowed from vendor CIDR to all prod VMs |
| Per-user audit and fast revocation | Cloud VPN logs enabled; SSH key rotated monthly |
| Operational support via SSH | SSH over the VPN tunnel |
Options:
A. Reject it and use federated IAM with IAP TCP forwarding and OS Login.
B. Reject it because private VMs cannot be accessed by SSH.
C. Approve it because VPN encryption satisfies remote access compliance.
D. Approve it if the shared SSH key is rotated after incidents.
Best answer: A
Explanation: The proposed path meets only part of the network requirement: the VMs have no public IPs. It fails the identity and compliance requirements because a shared Linux account and shared SSH key do not provide per-user identity, fast individual revocation, or useful per-user audit trails. It also grants broader network reach than required by allowing SSH from the vendor CIDR to all production VMs. A better pattern is to federate the vendor IdP into Google Cloud IAM, authorize only the required users and VMs, use IAP TCP forwarding for SSH to private VMs, and use OS Login for user-level Linux access. VPN encryption alone is not enough when identity and auditability are explicit requirements.
- VPN-only thinking misses that encrypted connectivity does not prove who performed each SSH action.
- Key rotation reduces credential exposure but still leaves shared-account audit and revocation gaps.
- Private VM misconception fails because private Compute Engine VMs can be reached through controlled paths such as IAP TCP forwarding.
Question 37
Topic: Designing for Security and Compliance
A retailer is migrating its commerce platform to Google Cloud. It classifies data as Public, Internal, or Restricted. The architect must satisfy these requirements:
- Restricted data includes PII/payment data, must remain in us-central1, use security-team-controlled keys, be retained 7 years, and have admin and data access audited.
- Internal analytics must avoid Restricted fields and remain cost-efficient.
- Public catalog content may be globally cached.
Which architecture decision best aligns the platform with the classification model?
Options:
A. Segment by classification; keep Restricted data in us-central1 buckets/databases with CMEK, IAM, retention, and audit sinks; publish de-identified analytics and cache Public content.
B. Treat all data as Restricted with CMEK, 7-year retention, and full audit logging; disable analyst access and public caching.
C. Store all classes together in multi-region buckets and databases; use labels for classification and rely on default encryption and logging.
D. Use one shared BigQuery multi-region dataset with row-level security; set dataset expiration to 7 years and use one CMEK.
Best answer: A
Explanation: Data classification should drive where data is stored, who can access it, how it is encrypted, how long it is retained, and what audit evidence is collected. Restricted PII/payment data needs stronger boundaries: regional storage and databases in us-central1, CMEK managed by the security team, least-privilege IAM, retention controls, and enabled audit logging with log sinks. Internal analytics should use de-identified or filtered datasets so analysts can work cost-effectively without accessing Restricted fields. Public catalog data can use caching because it has a lower sensitivity level. The key is to apply controls proportionally by classification rather than using labels alone or applying the most restrictive controls to everything.
- Labels only fail because labels do not enforce location, encryption-key ownership, access, retention, or Data Access audit logging.
- Restrict everything overbuilds the solution and misses the stated analytics and public-caching requirements.
- Shared dataset weakens isolation and the multi-region choice conflicts with the Restricted data residency requirement.
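One way to picture controls applied proportionally by classification, as in Question 37, is a policy table checked against each resource description. This is a sketch under stated assumptions: the Restricted row follows the scenario, while the Internal and Public rows, the resource dictionary shape, and the function name are hypothetical.

```python
# Restricted values come from the scenario; Internal and Public rows are assumptions.
POLICY = {
    "Restricted": {"allowed_locations": {"us-central1"}, "require_cmek": True,
                   "min_retention_years": 7},
    "Internal": {"allowed_locations": None, "require_cmek": False,
                 "min_retention_years": 0},
    "Public": {"allowed_locations": None, "require_cmek": False,
               "min_retention_years": 0},
}


def check_resource(resource):
    """Return a list of human-readable violations for one resource description."""
    policy = POLICY[resource["classification"]]
    problems = []
    allowed = policy["allowed_locations"]
    if allowed is not None and resource["location"] not in allowed:
        problems.append("location not allowed")
    if policy["require_cmek"] and not resource.get("cmek_key"):
        problems.append("CMEK key required")
    if resource.get("retention_years", 0) < policy["min_retention_years"]:
        problems.append("retention shorter than required")
    return problems
```

The point of the table is that a label alone enforces nothing; each classification maps to checks that can actually fail a deployment.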
Question 38
Topic: Designing and Planning a Cloud Solution Architecture
A financial services company is migrating a customer portal from on-premises VMs to Google Cloud. The cutover window is limited to 15 minutes, checkout and reporting must keep their current SLAs, PCI-scoped controls need audit evidence, and the operations team must be ready for the first week. Budget allows one temporary validation environment, but not a months-long duplicate production stack. What is the best balanced migration validation approach before cutover?
Options:
A. Build a full duplicate production stack for several months and validate every historical workflow before migration.
B. Skip a separate validation environment and rely on rollback during the 15-minute production cutover.
C. Perform only load testing against the migrated application, then approve cutover if latency meets the SLA.
D. Run a production-like pilot with representative masked data, defined go/no-go criteria, performance, security, DR, and runbook tests.
Best answer: D
Explanation: Migration validation should prove that the target solution is ready from multiple angles, not just that servers start. A production-like pilot or dress rehearsal can use masked or representative data, a realistic traffic profile, security control checks, DR or rollback testing, monitoring, alerts, and operational runbooks. Clear go/no-go criteria tie the test results to business SLAs, compliance evidence, RTO/RPO expectations, and team readiness. This approach also respects the budget constraint by avoiding a long-lived full duplicate environment while still reducing cutover risk. The key takeaway is to validate readiness across business, technical, security, and operations domains before approving production migration.
- Latency-only testing misses audit evidence, operational readiness, and recovery validation required by the scenario.
- Rollback-only cutover optimizes cost but creates unacceptable availability and readiness risk in a short cutover window.
- Full duplicate stack may reduce uncertainty, but it violates the stated budget and timeline constraint.
Question 39
Topic: Managing and Provisioning a Solution Infrastructure
A retailer is moving a product-recommendation model from experimentation to a shared production workflow on Google Cloud.
Exhibit: AI workflow note
Current: data team runs notebooks against BigQuery.
Handoff: app team receives model files in Cloud Storage.
Issues: inconsistent feature prep, unclear model lineage, slow approvals.
Requirement: repeatable retraining, auditable artifacts, shared ownership.
Which next action best addresses the design implication?
Options:
A. Expose notebook outputs through a Cloud Run API.
B. Move model files to a versioned Cloud Storage bucket.
C. Implement Vertex AI Pipelines with versioned components and tracked artifacts.
D. Schedule the notebooks on a shared Compute Engine VM.
Best answer: C
Explanation: The exhibit points to an ML lifecycle problem, not just a storage or serving problem. Vertex AI Pipelines is designed to orchestrate repeatable ML workflows such as data extraction, preprocessing, training, evaluation, registration, and deployment. Pipeline components can be versioned in source control, runs can be parameterized, and artifacts and metadata can be tracked for lineage and auditability. This gives data teams a consistent way to build and validate models while giving application teams a clear, governed handoff for deployable artifacts. A versioned bucket or API may help one stage, but it does not solve workflow repeatability, traceability, and cross-team collaboration.
- Shared VM scheduling keeps the notebook-centric process and does not provide strong lineage or reusable pipeline components.
- Versioned storage helps retain model files but does not capture preprocessing, training, evaluation, and approval context.
- Cloud Run API addresses serving integration, not the upstream workflow and auditability gaps shown in the note.
Question 40
Topic: Designing and Planning a Cloud Solution Architecture
A media company runs customer-facing APIs on Cloud Run, event processing on Pub/Sub and Dataflow, and analytics on BigQuery. The business plan now adds international launches, AI-assisted content features, and more partner integrations. Leadership wants the platform to adopt useful Google Cloud capabilities over time without disrupting quarterly releases or losing cost visibility. Which architecture-review focus area should be prioritized next?
Options:
A. Freeze service choices until expansion is complete
B. Tune autoscaling settings for the existing services
C. Replace Cloud Run services with a standard GKE platform
D. Align architecture evolution with business and technology roadmaps
Best answer: D
Explanation: The key review focus is architecture evolution: checking whether current service boundaries, integration patterns, data governance, operational practices, and decision records can support the business roadmap while taking advantage of newer cloud capabilities incrementally. This is broader than optimizing one service. For this scenario, the architect should create or refresh an evolution roadmap that links business goals, likely technology changes, risks, dependencies, and migration increments. That keeps quarterly releases moving while preserving cost and operational visibility. A forced platform migration or a change freeze would either overcommit the team or block needed adaptation.
- Platform standardization overbuilds the response because the stem does not show a Cloud Run limitation that requires GKE.
- Autoscaling tuning may improve current operations but does not address international growth, AI features, or partner integration strategy.
- Change freeze conflicts with the requirement to adopt useful Google Cloud capabilities over time.
Question 41
Topic: Designing for Security and Compliance
A company hired a partner team for 9 months to operate a regulated Google Cloud project. The partner already uses its own OIDC identity provider and rotates staff frequently. Contractors need occasional privileged deployment access to Google Cloud APIs, but must not receive long-lived credentials or broad network access. Compliance requires individual attribution in audit logs and rapid revocation without creating and deleting many local user accounts. Which access-control correction is the best balanced recommendation?
Options:
A. Connect the partner network with Cloud VPN and restrict by IP range.
B. Use Workforce Identity Federation with least-privilege IAM and service account impersonation.
C. Issue a shared service account key and rotate it weekly.
D. Create Cloud Identity accounts and grant Project Editor during the contract.
Best answer: B
Explanation: Workforce Identity Federation is the appropriate pattern for external human users who already authenticate with a trusted external IdP. It avoids creating and deleting local Google identities, supports rapid revocation through the partner’s IdP, and eliminates long-lived service account keys. Mapping IdP attributes or groups to least-privilege IAM bindings keeps access scoped, while service account impersonation can provide controlled privileged access for deployment tasks with audit trails that preserve the external user context. This balances compliance, security, and operational effort better than local accounts, shared credentials, or network-based trust.
- Local accounts add lifecycle overhead, and Project Editor is broader than the privileged deployment access required.
- Shared keys optimize setup speed but violate the no-long-lived-credential and individual audit attribution requirements.
- VPN by IP addresses connectivity, not identity, and conflicts with the requirement to avoid broad network access.
Question 42
Topic: Designing and Planning a Cloud Solution Architecture
A retail company is planning a new checkout platform on Cloud Run with Pub/Sub for asynchronous fulfillment. The architecture review includes this note. What is the best design implication?
| Area | Visible requirement |
|---|---|
| User goal | 99.9% checkout success; p95 API latency < 500 ms |
| Diagnosis | Identify slow calls across services within minutes |
| Releases | Alert only on likely customer impact |
| Data | Do not log payment or PII payloads |
Options:
A. Rely on default Cloud Run request logs until traffic stabilizes.
B. Create dashboards only after the first production incident.
C. Log full request and response payloads for every checkout call.
D. Design correlated, redacted telemetry with SLO alerts and service dashboards.
Best answer: D
Explanation: Observability requirements should shape the architecture before implementation when the business has explicit SLOs, latency targets, release safety needs, diagnostic goals, and privacy limits. In this case, default platform logs are not enough. The design should include structured application logs with redaction, metrics aligned to checkout success and latency, distributed tracing across Cloud Run and Pub/Sub boundaries where applicable, profiling for performance hotspots, customer-impact alerting, and dashboards for service and user-journey health. The privacy requirement also means telemetry must avoid sensitive payloads while still preserving useful correlation IDs and operational context. The key takeaway is that observability is part of the solution design, not only an operations task after launch.
- Default logs only misses cross-service diagnosis, SLO alerting, and application-level success metrics required by the review note.
- Full payload logging conflicts with the stated payment and PII restriction and can increase cost and risk.
- Incident-driven dashboards delays a required design capability until after customer impact has already occurred.
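The "correlated, redacted telemetry" in Question 42 can be sketched as a log helper that masks sensitive fields and always carries a correlation ID. The field names in `SENSITIVE_KEYS` and the helper names are hypothetical; a real service would derive the list from its own data classification.

```python
import json
import uuid

SENSITIVE_KEYS = {"card_number", "cvv", "email", "address"}  # hypothetical field names


def redact(payload):
    """Return a copy of payload with sensitive values masked, recursing into nested dicts.

    Lists are passed through unchanged in this sketch.
    """
    cleaned = {}
    for key, value in payload.items():
        if key in SENSITIVE_KEYS:
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, dict):
            cleaned[key] = redact(value)
        else:
            cleaned[key] = value
    return cleaned


def log_entry(message, payload, correlation_id=None):
    """Build one JSON log line with a correlation ID and redacted context."""
    return json.dumps({
        "severity": "INFO",
        "message": message,
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "context": redact(payload),
    }, sort_keys=True)
```

Passing the same `correlation_id` through the Cloud Run request and the Pub/Sub message attributes would let slow calls be followed across services without logging payment or PII values.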
Question 43
Topic: Designing and Planning a Cloud Solution Architecture
A retail company is reviewing a draft Google Cloud migration proposal for its order database. The proposal labels all of these as “requirements”:
- Use Cloud SQL for PostgreSQL
- p95 checkout read latency must be under 100 ms for EU customers
- RPO must be 5 minutes or less and the service must tolerate a zonal failure
- The first migration wave cannot require application-code changes
Which architecture decision should the cloud architect recommend?
Options:
A. Mandate Spanner because it provides global scale and strong consistency.
B. Treat latency and RPO as requirements, no-code migration as a constraint, and Cloud SQL as a design option to validate.
C. Treat all listed items as technical requirements and implement the draft.
D. Approve Cloud SQL because it is managed and reduces database operations.
Best answer: B
Explanation: Technical requirements describe what the solution must achieve, such as latency, availability, durability, RPO, or failure tolerance. Solution constraints limit how the team can meet those requirements, such as no application-code changes, license restrictions, fixed timelines, or mandated platforms. A named Google Cloud service in a proposal is usually an implementation choice unless the business explicitly makes it a constraint. In this case, latency and RPO are measurable technical requirements, while the no-code first wave is a migration constraint. Cloud SQL should be evaluated against those facts rather than accepted as a requirement by itself.
- Managed service shortcut misses the need to verify latency, RPO, and zonal-failure tolerance.
- Spanner mandate may overbuild the solution and conflicts with the no-code migration constraint.
- All technical requirements incorrectly mixes system outcomes, migration limits, and implementation choices.
Question 44
Topic: Managing and Provisioning a Solution Infrastructure
A company is migrating a Kubernetes-based fraud detection application to Google Cloud. The team wants to keep the current deployment model where possible.
Exhibit: Migration notes
| Requirement | Detail |
|---|---|
| Packaging | Helm charts and Kubernetes manifests |
| Portability | Keep Kubernetes API compatibility |
| Node control | Run an approved privileged DaemonSet using host namespaces |
| Operations | Avoid self-managing the Kubernetes control plane |
What is the best design implication?
Options:
A. Replatform the services to Cloud Run.
B. Use GKE Autopilot for all workloads.
C. Self-manage Kubernetes on Compute Engine.
D. Use GKE Standard with managed node pools.
Best answer: D
Explanation: The decisive requirement is node-level control. GKE Standard keeps the Kubernetes control plane managed by Google while allowing more control over node pools and workload patterns such as approved privileged DaemonSets that need host access. It also preserves the team’s Helm charts, manifests, and Kubernetes API portability. GKE Autopilot would reduce node operations more, but it is best suited when workloads do not need direct node management or privileged host-level behavior. Cloud Run lowers operations further but changes the orchestration model. Self-managed Kubernetes gives maximum control, but it adds control-plane operations the team explicitly wants to avoid.
- Autopilot fit is tempting because it reduces node work, but the visible node-level DaemonSet requirement points to Standard.
- Cloud Run fit misses the requirement to keep Kubernetes API compatibility and DaemonSet behavior.
- Self-managed Kubernetes provides control, but it violates the requirement to avoid managing the control plane.
Question 45
Topic: Designing and Planning a Cloud Solution Architecture
A retailer is moving a 2 TB PostgreSQL order database to Google Cloud. Traffic is predictable and fits on one primary instance. The team has strong PostgreSQL skills but only two DBAs. Requirements include low latency by keeping the app and database in one region, private connectivity, CMEK, automated backups, automated failover, and less OS/database patching work. Which recommendation best balances these constraints?
Options:
A. Rebuild the database on Spanner for global availability.
B. Run PostgreSQL on Compute Engine with a managed instance group.
C. Move order records to BigQuery and query them from the application.
D. Migrate to Cloud SQL for PostgreSQL with HA, private IP, backups, and CMEK.
Best answer: D
Explanation: The core decision is whether a managed relational service satisfies the workload while reducing operational burden. Cloud SQL for PostgreSQL fits a predictable regional PostgreSQL workload and removes much of the work for OS maintenance, backups, and HA failover configuration. It also supports private IP connectivity and CMEK, which align with the security requirements without forcing a major application or data model redesign. This is a better balance than self-managing PostgreSQL for control or adopting a more disruptive distributed database when the workload does not require it.
- Self-managed control fails because Compute Engine keeps PostgreSQL compatibility but leaves the team responsible for patching, backup design, and failover operations.
- Global availability focus fails because Spanner adds redesign effort and team readiness risk for a workload that only needs regional low latency.
- Analytics platform misuse fails because BigQuery is not the right serving database for a transactional order-processing application.
Question 46
Topic: Designing and Planning a Cloud Solution Architecture
A manufacturer is moving an SAP reporting environment to Google Cloud but will keep transactional systems in two company data centers for 18 months. Reporting jobs will transfer 12-15 Gbps of private traffic during business hours, and batch windows are sensitive to jitter. The design must tolerate a single connectivity failure and avoid the public internet for steady-state data movement. Each data center can order cross-connects in a colocation facility that supports Google Cloud Interconnect. Which connectivity pattern best meets these requirements?
Options:
A. VPC Network Peering between the VPC and data centers
B. A single Partner Interconnect connection per data center
C. Redundant Dedicated Interconnect connections with Cloud Router/BGP
D. HA VPN tunnels with dynamic routing over the internet
Best answer: C
Explanation: The key requirements are sustained multi-Gbps bandwidth, private connectivity, predictable latency/jitter, and resilience to a single connectivity failure. Dedicated Interconnect is the best fit when the organization can connect in a supported colocation facility and needs high-throughput private connectivity to Google Cloud. Using redundant physical connections and Cloud Router with BGP helps maintain routing if a link or edge path fails. HA VPN is often faster and lower cost to deploy, but it uses the public internet and is less suited to sustained 12-15 Gbps, jitter-sensitive traffic. A single interconnect path would leave a clear single point of failure.
- HA VPN trade-off fails because internet-based encrypted tunnels do not meet the private, predictable, sustained high-bandwidth requirement.
- VPC peering mismatch fails because VPC Network Peering connects VPC networks, not an on-premises data center network directly.
- Single interconnect path fails because one Partner Interconnect connection per data center does not satisfy the stated single-failure tolerance.
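The bandwidth and single-failure requirements in Question 46 can be turned into a rough sizing calculation. The sketch below assumes 10 Gbps Dedicated Interconnect circuits, even load spreading across links, and that all peak traffic crosses the set of links being sized; the scenario states none of these, and the calculation says nothing about physical path diversity.

```python
import math

LINK_GBPS = 10  # assumption: 10 Gbps Dedicated Interconnect circuits


def links_needed(peak_gbps, link_gbps=LINK_GBPS, tolerated_failures=1):
    """Links required so peak traffic still fits after `tolerated_failures` links are lost."""
    if peak_gbps <= 0 or link_gbps <= 0:
        raise ValueError("bandwidth values must be positive")
    # Links needed for capacity alone, plus spares for the tolerated failures.
    return math.ceil(peak_gbps / link_gbps) + tolerated_failures
```

Under these assumptions, a 15 Gbps peak needs two 10 Gbps links for capacity and a third to survive one link failure, which is why a single connection per site falls short.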
Question 47
Topic: Designing for Security and Compliance
A healthcare SaaS company is moving a customer-support assistant to production. The assistant uses Gemini models through Vertex AI, with Model Armor and Sensitive Data Protection controls for prompts and responses. Legal needs evidence that AI security controls are operating before launch. The platform team has two weeks, limited staff, and must not copy raw customer conversations outside Google Cloud. Which recommendation is the best balanced?
Options:
A. Store screenshots of the Model Armor configuration in the project plan.
B. Use latency and error-rate dashboards as the primary compliance evidence.
C. Export raw prompts and responses to the auditor for manual inspection.
D. Use Cloud Audit Logs, sanitized enforcement logs/metrics, and signed review evidence retained centrally.
Best answer: D
Explanation: AI compliance evidence should show both control design and operating effectiveness. Cloud Audit Logs can show who changed security-relevant configurations, sanitized application or Model Armor enforcement events can show allow/block policy decisions, and Cloud Monitoring metrics can demonstrate ongoing operation. A documented security or compliance review connects that evidence to the required controls. Keeping the evidence in Google Cloud with redaction or minimized content supports privacy requirements, reduces data movement, and is feasible for a small team on a short timeline. Raw transcript review may look thorough, but it creates unnecessary privacy and data-handling risk.
- Raw transcript export may provide detail, but it violates the constraint against copying customer conversations outside Google Cloud.
- Operational dashboards only optimize availability evidence, not AI security or compliance enforcement evidence.
- Configuration screenshots are easy to collect, but they are point-in-time artifacts and do not prove policies operated during use.
Question 48
Topic: Designing for Security and Compliance
A healthcare analytics company is piloting a Gemini-based assistant that summarizes patient support tickets for call-center agents. Tickets can contain PHI. Agents need responses in under 2 seconds, auditors require evidence of data handling and safety controls before expansion, and the ML team cannot operate a custom moderation stack. Which recommendation best balances these constraints?
Options:
A. Self-host a custom moderation service and open model on GPUs before the pilot.
B. Disable logging and safety filters to reduce PHI retention and latency.
C. Use managed Model Armor, Sensitive Data Protection redaction, least-privilege access, and audit logging to balance latency, compliance, and operations.
D. Give agents direct model access and rely on training to avoid submitting PHI.
Best answer: C
Explanation: AI workflows introduce security risks at prompt input, model output, deployment, and logging points. In this scenario, the best architecture uses managed controls close to the assistant: Sensitive Data Protection can help identify or redact PHI before prompts or logs are retained, Model Armor can help screen prompts and responses for unsafe interactions, least-privilege service access limits who and what can invoke the workflow, and audit logging provides compliance evidence. This layered approach fits the low-latency pilot and avoids asking a small team to build and operate custom moderation infrastructure. The key trade-off is using managed controls to reduce risk without blocking the business pilot or removing evidence auditors need.
- Direct model access shifts PHI prevention to users and bypasses enforceable prompt, response, and access controls.
- Self-hosting moderation conflicts with the stated team-readiness constraint and can add cost and latency before the pilot proves value.
- Disabling logs and filters may reduce overhead, but it removes safety controls and the evidence auditors require.
Question 49
Topic: Analyzing and Optimizing Technical and Business Processes
A company provisions Google Cloud resources by copying Terraform files from a shared drive and running terraform apply from an engineer’s laptop. Development, test, and production resources share one project. Auditors now require approval evidence and rollback readiness, but product teams cannot tolerate a long deployment freeze. What is the best balanced recommendation?
Options:
A. Keep the current workflow and add weekly audit screenshots.
B. Centralize all applies on one operations laptop with manual approval emails.
C. Move Terraform to Git with environment projects, PR approvals, automated plans, and release rollback.
D. Freeze all provisioning until a custom internal platform is built.
Best answer: C
Explanation: The key gap is governance in the provisioning workflow, not Terraform itself. A balanced solution keeps infrastructure as code, but makes it controlled and repeatable: store changes in Git, require peer or owner review for production, run automated plan checks, separate environments into distinct projects or folders, and use tagged releases or known-good commits for rollback. This improves compliance evidence and reduces blast radius without stopping delivery for months. Automation also reduces manual operational effort compared with a ticket-only or laptop-based process.
- Audit screenshots provide weak evidence and do not fix uncontrolled applies, shared environments, or rollback readiness.
- A long freeze over-optimizes for control and delays product teams when a phased IaC pipeline can reduce risk sooner.
- A single operations laptop preserves the manual bottleneck and still lacks strong versioning, reproducibility, and environment isolation.
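The gating and rollback logic described in the explanation can be illustrated with a small Python sketch. It assumes a hypothetical approval policy (`REQUIRED_APPROVALS`) and a hypothetical release history of `(tag, healthy)` pairs; a real pipeline would take these from the Git host and the CI system rather than from hard-coded values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    environment: str      # "dev", "test", or "prod"
    commit: str           # Git commit the plan was generated from
    approvals: int        # number of PR approvals recorded
    plan_succeeded: bool  # result of the automated plan step


# Hypothetical policy: stricter review as changes move toward production.
REQUIRED_APPROVALS = {"dev": 0, "test": 1, "prod": 2}


def can_apply(change):
    """Allow an apply only with a clean plan and enough recorded approvals."""
    needed = REQUIRED_APPROVALS[change.environment]
    return change.plan_succeeded and change.approvals >= needed


def rollback_target(release_history):
    """Return the latest healthy release before the current one, or None.

    release_history is a list of (tag, healthy) pairs, oldest first;
    the last entry is the release currently deployed.
    """
    for tag, healthy in reversed(release_history[:-1]):
        if healthy:
            return tag
    return None
```

For example, `can_apply(Change("prod", "abc123", 1, True))` is rejected under this policy because production needs two approvals, and `rollback_target([("v1", True), ("v2", False), ("v3", False)])` selects `"v1"` as the known-good release.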
Question 50
Topic: Managing and Provisioning a Solution Infrastructure
A financial services company runs three independently owned Google Cloud VPCs in separate projects: payment processing, analytics, and shared services. It is adding an AWS VPC that hosts a fraud-scoring API. Requirements: use private IP connectivity, avoid full-mesh network builds as more VPCs are added, keep payment and analytics segmented except for approved paths, and let a central network team control routing and firewall policy. Which connectivity pattern should the architect recommend?
Options:
A. One Shared VPC for all workloads with subnet-level isolation
B. Network Connectivity Center with VPC spokes and an HA VPN spoke to AWS
C. Full-mesh VPC Network Peering with separate VPN tunnels to AWS
D. External HTTPS load balancers with Cloud Armor IP allowlists
Best answer: B
Explanation: Network Connectivity Center (NCC) fits a multi-VPC and multicloud topology where connectivity needs to scale without building pairwise links between every network. The Google Cloud VPCs can be attached as VPC spokes, and AWS can connect through an HA VPN spoke (a hybrid spoke backed by HA VPN tunnels). A central network team can manage the routing pattern, while hierarchical firewall policies, VPC firewall rules, and route design limit payment and analytics communication to approved paths. The key point is to separate the connectivity fabric from the segmentation controls: NCC provides scalable private reachability, and policy controls determine which flows are allowed.
- Full-mesh peering and separate VPNs scale poorly, and VPC Network Peering does not provide transitive routing through another VPC.
- A single Shared VPC ignores the requirement for independently owned VPCs and can increase blast-radius concerns.
- External load balancers and IP allowlists do not meet the private IP connectivity requirement for internal network communication.
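The scaling argument against a full mesh is simple arithmetic, sketched here in Python. The function names are illustrative; the model counts one link per pair of networks in a full mesh and one hub attachment per network in a hub-and-spoke design.

```python
def full_mesh_links(networks):
    """Pairwise links needed so every network reaches every other directly."""
    return networks * (networks - 1) // 2


def hub_and_spoke_links(networks):
    """Attachments needed when each network connects once to a central hub."""
    return networks


# Three Google Cloud VPCs plus the AWS VPC, then a larger estate.
for n in (4, 10):
    print(n, full_mesh_links(n), hub_and_spoke_links(n))
```

With the four networks in this scenario the difference is small (6 links versus 4 attachments), but at ten networks the full mesh needs 45 links against 10 attachments, which is why the requirement to avoid full-mesh builds points toward a hub model.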