Free CompTIA Cloud+ CV0-004 Full-Length Practice Exam: 90 Questions

Try 90 free CompTIA Cloud+ CV0-004 questions across the exam domains, with explanations, then continue with full IT Mastery practice.

This free full-length CompTIA Cloud+ CV0-004 practice exam includes 90 original IT Mastery questions across the exam domains.

These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.

Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.

Need concept review first? Read the CompTIA Cloud+ CV0-004 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks and full IT Mastery practice.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

  • Try CompTIA Cloud+ CV0-004 on Web
  • View the full CompTIA Cloud+ CV0-004 practice page

Exam snapshot

  • Exam route: CompTIA Cloud+ CV0-004
  • Practice-set question count: 90
  • Time limit: 90 minutes
  • Practice style: mixed-domain diagnostic run with answer explanations

Full-length exam mix

  • Cloud Architecture: 23%
  • Deployment: 19%
  • Operations: 17%
  • Security: 19%
  • DevOps Fundamentals: 10%
  • Troubleshooting: 12%

Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.

Practice questions

Questions 1-25

Question 1

Topic: Cloud Architecture

A cloud engineer is provisioning storage for a VM that hosts an order-processing database. The workload performs many small random reads and writes during business hours, and slow storage directly affects checkout latency. Management accepts a higher storage cost for this active database volume but wants to avoid unnecessary overprovisioning. Which disk choice best meets the requirement?

Options:

  • A. Object storage with archive tiering

  • B. HDD-backed archive storage

  • C. SSD-backed block storage

  • D. HDD-backed block storage

Best answer: C

Explanation: SSD is the best fit for latency-sensitive workloads with frequent small random reads and writes. HDD is typically less expensive per capacity unit, but it has higher latency and lower random IOPS, which conflicts with the checkout performance requirement.

The core storage tradeoff is performance versus cost. SSD-backed disks are preferred for active databases, boot volumes, and other workloads that need low latency and high random I/O performance. HDD-backed disks can be cost-effective for large, sequential, or less performance-sensitive data, but they are a poor fit when storage latency directly affects a production transaction path. Since management accepts higher cost for the active database volume, SSD-backed block storage meets the requirement without shifting the workload to a storage class intended for colder data.
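
As a rough illustration, a provider SDK request for an SSD-backed block volume might look like the sketch below (boto3 and the gp3 volume type are one provider's example; the region, size, and tag values are placeholders, not part of the question):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request an SSD-backed block volume sized to the database working set,
# with provisioned random IOPS instead of overprovisioned capacity.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=200,              # GiB, sized to the active data set
    VolumeType="gp3",      # SSD-backed general-purpose block storage
    Iops=6000,             # random IOPS for small reads/writes
    Throughput=250,        # MiB/s
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "workload", "Value": "order-db"}],
    }],
)
print(volume["VolumeId"])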

  • Lower-cost HDD fails because the workload needs strong random I/O and low latency, not just cheap capacity.
  • Archive HDD fails because archive-oriented storage is intended for infrequently accessed data, not an active database.
  • Object archive tiering fails because the requirement is a VM database volume, not cold object retention.

Question 2

Topic: Cloud Architecture

A cloud team is selecting compute options for two application components:

  • Legacy reporting service: requires full OS administrative control, a custom host agent, and minimal application changes. Traffic is steady and predictable.
  • Upload processor: performs stateless image conversions when files arrive in object storage. It is idle for long periods but can receive large bursts, and the team wants to avoid server patching.

Which TWO compute choices best meet these requirements?

Options:

  • A. Run the reporting service in managed containers.

  • B. Run the upload processor on always-on large VMs.

  • C. Use fixed-replica stateful containers for uploads.

  • D. Trigger serverless functions for the upload processor.

  • E. Run the reporting service on virtual machines.

  • F. Move the reporting service to serverless functions.

Correct answers: D and E

Explanation: The legacy service needs VM-level control because it depends on OS administration and a custom host agent. The upload processor is a better fit for serverless because it is stateless, event-driven, bursty, and should not require server patching.

Compute selection depends on the workload’s control, scaling, and operational requirements. Virtual machines are appropriate when an application needs full OS access, custom host-level software, or minimal refactoring. Containers reduce packaging and deployment overhead, but they do not provide the same level of host OS control as VMs. Serverless functions are optimized for short-lived, stateless, event-driven work that scales automatically and minimizes infrastructure operations.

The key distinction is control versus operational abstraction: use VMs when the workload needs control of the operating environment, and use serverless when the workload is stateless, bursty, and event-triggered.
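
A minimal sketch of the upload processor as an event-triggered function is shown below, assuming an AWS Lambda-style handler and S3-style event fields as one provider's example; the output bucket name and the convert_image helper are placeholders. The reporting service, by contrast, would stay on a VM image the team administers.

import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "converted-uploads"          # placeholder bucket name

def convert_image(data: bytes) -> bytes:
    """Placeholder for the real thumbnail/conversion logic."""
    return data

def handler(event, context):
    """Stateless worker invoked once per object-storage upload event."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(Bucket=OUTPUT_BUCKET, Key=key, Body=convert_image(body))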

  • Managed containers do not satisfy the reporting service’s need for full OS-level control and custom host dependencies.
  • Always-on VMs can process uploads, but they miss the no-server-patching goal and are inefficient for long idle periods.
  • Serverless reporting conflicts with the legacy service’s OS control and minimal-change requirements.
  • Stateful fixed replicas add unnecessary persistence and fixed capacity for a stateless burst workload.

Question 3

Topic: Security

A company is standardizing access to its cloud management environment. Administrators must use the web portal and CLI, deployment automation must use APIs/SDKs, and the security team requires MFA, least privilege, and no long-lived shared credentials. Which implementation best meets these requirements?

Options:

  • A. Federated SSO with MFA, RBAC roles, and short-lived tokens

  • B. Local cloud users with permanent access keys for all admins

  • C. One shared administrator account stored in a password vault

  • D. Network allowlisting only for management portal access

Best answer: A

Explanation: The best implementation uses centralized identity federation, MFA, least-privilege RBAC, and temporary credentials. This supports interactive access through the portal and CLI while allowing automation to call APIs and SDKs securely without long-lived shared credentials.

Securing cloud management access requires controlling both human and automated access paths. Federated SSO ties administrator authentication to a central identity provider, MFA strengthens sign-in, and RBAC limits each identity to required actions. CLI, API, and SDK access should use short-lived tokens or role assumption rather than permanent access keys, especially for automation. This reduces credential exposure and improves accountability because each action can be traced to a user, role, or workload identity. A password vault can help store secrets, but it does not make shared administrator credentials least-privilege or accountable. The key takeaway is to secure management-plane access consistently across portal, CLI, API, and SDK workflows.
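
As one provider's example of short-lived credentials, an automation job might assume a role instead of storing permanent keys; the role ARN and session name below are placeholders:

import boto3

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/deploy-automation",
    RoleSessionName="pipeline-run-42",
    DurationSeconds=3600,      # credentials expire instead of living forever
)["Credentials"]

# Subsequent API/SDK calls use the temporary, role-scoped credentials.
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)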

  • Shared admin account fails because it weakens accountability and usually grants excessive privilege.
  • Permanent access keys fail because long-lived credentials increase exposure risk and violate the stated constraint.
  • Network allowlisting only fails because it does not provide MFA, least privilege, or secure API/SDK identity control.

Question 4

Topic: Security

A cloud administrator is investigating an IaaS web server after object storage audit logs show the instance role was used to list buckets from an unfamiliar public IP. The application logs contain repeated HTTP requests with URLs such as http://169.254.169.254/latest/meta-data/iam/security-credentials/ submitted through a user-controlled URL parameter. CPU, memory, and network utilization are otherwise normal.

What is the MOST likely root cause?

Options:

  • A. Ransomware encryption activity

  • B. Distributed denial-of-service attack

  • C. Metadata service credential exposure

  • D. Cryptojacking malware

Best answer: C

Explanation: The symptoms point to a metadata attack against the instance metadata service. The attacker likely used a server-side request path to retrieve temporary IAM role credentials, then used those credentials from another IP to access object storage.

A metadata attack targets the instance metadata service, commonly reachable from the workload at a link-local address such as 169.254.169.254. If an application can be tricked into requesting attacker-supplied URLs, the attacker may retrieve temporary role credentials and use them outside the instance. The key indicators are metadata endpoint requests in application logs and subsequent cloud API activity using the instance role from an unfamiliar public IP. Normal CPU and network utilization make resource-exhaustion or mining explanations less likely.

The main takeaway is that unusual access to metadata URLs plus misuse of instance role credentials strongly indicates metadata credential exposure.
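
A simple detection sketch is shown below; it only illustrates scanning application logs for requests that target the link-local metadata endpoint, and the log path and format are hypothetical:

import re

METADATA_PATTERN = re.compile(r"169\.254\.169\.254|/latest/meta-data/")

# Flag log entries where a user-supplied URL parameter reached the metadata service.
with open("/var/log/app/access.log", encoding="utf-8") as log:
    suspicious = [line.strip() for line in log if METADATA_PATTERN.search(line)]

for entry in suspicious:
    print("possible metadata/SSRF probe:", entry)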

  • Cryptojacking usually shows sustained high CPU or GPU usage and unauthorized mining processes, which are not present here.
  • DDoS would typically cause traffic spikes, latency, dropped requests, or service exhaustion rather than role credential use.
  • Ransomware would show file encryption, ransom notes, or backup deletion behavior, not metadata endpoint requests.

Question 5

Topic: Deployment

A team is releasing a new version of a customer portal using a blue-green strategy with a 5-minute rollback requirement. After traffic is moved to the green environment, users receive intermittent HTTP 503 errors. The rollback attempt also fails because the load balancer reports zero healthy targets in the blue environment. Deployment notes show that blue was scaled to zero before the cutover to reduce cost. What is the best next fix?

Options:

  • A. Increase the DNS TTL before the next cutover

  • B. Convert the release to an in-place rolling update

  • C. Restore blue from the latest backup after cutover

  • D. Keep blue fully running and switch traffic back to it

Best answer: D

Explanation: Blue-green deployment depends on two production-ready environments. Scaling blue to zero removed the fast rollback target, so the load balancer had no healthy endpoints when the team tried to revert traffic.

In a blue-green deployment, the current production environment and the new release environment should both be deployable, reachable, and health-checked during the cutover window. The traffic switch is the release mechanism, and rollback is usually another traffic switch back to the previous environment. Because blue was scaled to zero, it was no longer production-ready, which directly explains the failed rollback and unhealthy target status. Cost optimization can be considered after the release is validated, but not before the rollback window has passed. The key takeaway is that blue-green trades temporary duplicate capacity for fast rollback.

  • Rolling update changes instances in place and does not preserve two complete production-ready environments for instant rollback.
  • DNS TTL change may affect propagation speed, but it does not fix the missing healthy blue targets.
  • Backup restore can recover an environment, but it is too slow for the stated 5-minute rollback requirement.

Question 6

Topic: Security

A healthcare company is provisioning storage for patient documents collected from customers in Germany and France. The compliance team states that primary data, replicas, and provider-managed backups must remain in EU regions for 10 years, and the company must control the encryption keys. Which implementation best satisfies these constraints?

Options:

  • A. Low-cost object storage in a non-EU region with a CDN edge cache in Europe

  • B. EU object storage with EU-only replication, retention lock, and customer-managed keys

  • C. Global SaaS document storage with provider-managed keys and automatic worldwide replication

  • D. EU primary storage with disaster recovery backups replicated to a non-EU region

Best answer: B

Explanation: Data residency and retention requirements apply to all durable copies, not just the primary storage location. The best implementation uses EU-only storage and replication, a retention lock for the 10-year requirement, and customer-managed keys for ownership and control.

The core concept is compliance-aware cloud service and region selection. When requirements specify locality, ownership, and retention, the design must ensure primary data, replicas, snapshots, and provider-managed backups stay within the approved geography. A retention lock or equivalent immutability control helps enforce the required 10-year retention period, while customer-managed keys support organizational control over encryption. CDN caching, global SaaS replication, or off-region disaster recovery may improve availability or performance, but they can violate data locality constraints if durable copies leave the allowed regions. Always validate where each service stores primary data, replicas, logs, and backups before provisioning.
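
A hedged sketch of this configuration using one provider's object storage APIs (boto3 shown as an example) appears below; the bucket name, region, and key ARN are placeholders:

import boto3

s3 = boto3.client("s3", region_name="eu-central-1")

# Create the bucket in an EU region with object lock enabled at creation time.
s3.create_bucket(
    Bucket="patient-docs-eu",
    CreateBucketConfiguration={"LocationConstraint": "eu-central-1"},
    ObjectLockEnabledForBucket=True,
)

# Enforce a 10-year compliance-mode retention default (the retention lock).
s3.put_object_lock_configuration(
    Bucket="patient-docs-eu",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 10}},
    },
)

# Default encryption with a customer-managed key.
s3.put_bucket_encryption(
    Bucket="patient-docs-eu",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "arn:aws:kms:eu-central-1:123456789012:key/EXAMPLE",
            }
        }]
    },
)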

  • Global replication fails because worldwide copies and provider-managed keys conflict with locality and key-control requirements.
  • CDN in Europe does not fix the problem when the source storage is in a non-EU region.
  • Off-region DR violates the requirement because backups and replicas must also remain in EU regions.

Question 7

Topic: Cloud Architecture

An e-commerce site serves product images from a single regional object storage endpoint through an application load balancer. Users in distant regions report slow image loads, but origin CPU, load balancer health checks, and firewall deny logs are normal. The images are public, cacheable, and change only during nightly catalog updates. Which next fix best addresses the root cause?

Options:

  • A. Add an application gateway for path-based routing

  • B. Replace the application load balancer with a network load balancer

  • C. Place a CDN in front of the image endpoint

  • D. Open additional inbound firewall ports

Best answer: C

Explanation: The symptom is geographic latency for static, cacheable content, not a failed origin or blocked connection. A CDN is designed to cache content closer to users and reduce round-trip time to a single regional origin.

A CDN’s primary role is edge delivery for static or cacheable content. In this scenario, the origin is healthy, firewall logs do not show blocking, and the content changes on a predictable schedule, making it a good CDN candidate. Application and network load balancers distribute traffic to backend targets, but they do not solve long-distance content delivery by themselves. Firewalls control allowed or denied traffic; they do not improve global download latency when no deny events are present.

The key takeaway is to match the symptom to the component’s role: use a CDN for cacheable content latency, not a load balancer or firewall change.

  • Layer 4 balancing does not cache content near users, so a network load balancer would not address distance-related image latency.
  • Path-based routing helps direct HTTP requests to backend pools, but the issue is global content delivery latency.
  • Firewall ports are not the issue because traffic is already allowed and deny logs are normal.

Question 8

Topic: Operations

A cloud operations team is reviewing a resource inventory for lifecycle management. The team must flag findings that create operational risk specifically because resources are unsupported, unpatched, orphaned, or unmanaged. Which findings should be prioritized? Select TWO.

Options:

  • A. An internet-facing load balancer with no owner tag or backend targets

  • B. A stopped VM with a current owner tag and patch policy

  • C. A container image rebuilt weekly from a supported base image

  • D. An encrypted snapshot retained by an approved backup policy

  • E. A VM running an end-of-support OS with failed patch jobs

  • F. A database instance monitored by alerts and maintenance windows

Correct answers: A and E

Explanation: Lifecycle management should identify resources that no longer have support, patch coverage, ownership, or valid workload purpose. An end-of-support VM with failed patches and an ownerless internet-facing load balancer are clear operational risks because they are not being properly maintained or governed.

Cloud resource lifecycle management reduces risk by keeping resources inventoried, owned, patched, supported, and decommissioned when no longer needed. Unsupported operating systems cannot reliably receive security fixes, and failed patch jobs indicate the resource is outside normal maintenance control. Similarly, an internet-facing load balancer with no owner and no backend targets suggests an orphaned resource that may remain exposed, misconfigured, or billed without accountability. The key signal is not simply that a resource exists, but that it lacks supportability, maintainability, ownership, or a valid operational purpose.

  • Managed but stopped is not automatically risky when ownership and patch governance still exist.
  • Approved retention is expected for encrypted snapshots governed by a backup policy.
  • Supported image workflow reduces lifecycle risk by rebuilding from a maintained base.
  • Monitored database has operational controls through alerts and maintenance windows.

Question 9

Topic: Operations

A cloud team runs a stateless worker service that processes jobs from a managed message queue. During marketing campaigns, queue depth rises quickly while CPU utilization on existing workers stays below 40% because workers spend time waiting on external APIs. The requirement is to keep job wait time under 2 minutes and avoid adding capacity when the queue is empty. Which scaling configuration should the administrator implement?

Options:

  • A. Vertically resize the worker instances before campaigns

  • B. Scale the web tier based on request count

  • C. Scale workers only when CPU exceeds 70%

  • D. Scale workers on queue age with scale-in cooldown

Best answer: D

Explanation: Triggered scaling should use the signal that best represents the workload constraint. Here, queue age or backlog is a better trigger than CPU because the workers are I/O-bound and the business requirement is based on job wait time.

Event-driven or metric-triggered scaling should be tied to the bottleneck or service-level target. In this scenario, the worker pool consumes jobs from a queue, and CPU remains low even when demand is high. Scaling on queue age, queue depth, or oldest message age causes additional workers to start when pending work is building and allows scale-in when the queue drains. A cooldown or stabilization period prevents removing workers too quickly during short dips. The key takeaway is to scale the consumer tier using queue-related metrics when queued work, not CPU saturation, drives the service requirement.
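
The sketch below is purely illustrative arithmetic for a backlog-driven scaling rule; the job rate, queue numbers, and wait target are hypothetical, and a real deployment would express this as a provider scaling policy on queue depth or oldest-message age:

def desired_workers(queue_depth, oldest_message_age_s, jobs_per_worker_per_min,
                    current_workers, max_wait_s=120):
    """Size the worker pool so the backlog drains before the 2-minute wait target."""
    if queue_depth == 0:
        return 0                                  # scale in when idle (after a cooldown)
    minutes_left = max((max_wait_s - oldest_message_age_s) / 60.0, 0.5)
    needed = queue_depth / (jobs_per_worker_per_min * minutes_left)
    return max(current_workers, int(needed) + 1)  # grow while backlog exists

print(desired_workers(queue_depth=900, oldest_message_age_s=45,
                      jobs_per_worker_per_min=30, current_workers=4))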

  • CPU threshold fails because low CPU is already observed during backlog growth, so it would not trigger in time.
  • Manual vertical resize adds capacity on a schedule but does not react to actual queue demand or scale down when idle.
  • Web tier scaling targets the wrong component because the bottleneck is asynchronous job processing, not incoming web requests.

Question 10

Topic: Deployment

A team is deploying a new version of a customer-facing web application hosted on managed PaaS instances behind a load balancer. The release must avoid downtime, allow a full production-like validation before users are moved, and support rollback in minutes if errors increase. The database changes are backward compatible, and the team can provision a second identical environment with IaC. Which deployment strategy is the BEST fit?

Options:

  • A. Deploy directly with a rolling update

  • B. Perform a blue-green deployment

  • C. Patch the existing production instances in place

  • D. Use a big-bang deployment during off-hours

Best answer: B

Explanation: Blue-green deployment best fits requirements for minimal production risk and rapid rollback. It avoids in-place changes by deploying to a separate environment, validating it, and then shifting traffic through the load balancer.

The core concept is choosing a deployment strategy that reduces blast radius and avoids changing the live production environment directly. In a blue-green deployment, the current production environment stays active while an identical new environment is built and tested. After validation, traffic is switched to the new environment. If metrics or errors worsen, rollback is usually a traffic-routing change back to the previous environment rather than an emergency rebuild or patch reversal. This works especially well when schema changes are backward compatible and IaC can provision a matching stack. Rolling and in-place methods can reduce downtime, but they still modify active production capacity during the release.

  • In-place patching increases production risk because the live instances are changed directly.
  • Big-bang deployment concentrates risk and typically makes rollback more disruptive.
  • Rolling update can reduce downtime, but it does not provide the same full parallel validation and instant environment-level rollback.

Question 11

Topic: Cloud Architecture

A company is migrating an on-premises order management database to a public cloud. The target will be provisioned with IaC during the migration. Requirements include preserving the existing tabular schema, enforcing ACID transactions for orders and payments, avoiding database OS and patch management, and handling seasonal read growth without a major application rewrite. Which target database option should be specified?

Options:

  • A. Managed key-value NoSQL database with eventual consistency

  • B. Self-managed relational database on cloud VMs

  • C. Managed relational database with read replicas

  • D. Object storage with a query engine

Best answer: C

Explanation: The requirements point to a managed relational database. It supports structured tables and ACID transactions, removes most database server administration, and can scale read traffic through read replicas without changing the application model significantly.

Database selection should match data structure, consistency, operational ownership, and scaling needs. An order management system with orders and payments usually requires relational tables and ACID transactions so related updates commit reliably together. A managed relational database service also reduces administrative work such as database host patching, backups, and platform maintenance. Read replicas are a common way to absorb seasonal read-heavy traffic while keeping the primary database responsible for writes.

The key takeaway is to avoid trading away consistency or manageability when the workload explicitly requires both.
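
A minimal sketch of application-side read/write routing against a managed relational service is shown below; the endpoint names are placeholders, and the provider maintains the primary, replicas, backups, and failover behind them:

import itertools

PRIMARY_ENDPOINT = "orders-db-primary.example.internal"        # all writes
REPLICA_ENDPOINTS = itertools.cycle([
    "orders-db-replica-1.example.internal",                    # seasonal read traffic
    "orders-db-replica-2.example.internal",
])

def endpoint_for(statement: str) -> str:
    """Send writes to the primary and spread reads across replicas."""
    is_read = statement.lstrip().lower().startswith("select")
    return next(REPLICA_ENDPOINTS) if is_read else PRIMARY_ENDPOINT

print(endpoint_for("SELECT * FROM orders WHERE id = 42"))                   # replica
print(endpoint_for("UPDATE orders SET status = 'shipped' WHERE id = 42"))   # primary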

  • Eventual consistency fails because order and payment processing requires strong transactional consistency, not only key-value access.
  • Self-managed VMs preserve relational behavior but fail the requirement to avoid database OS and patch management.
  • Object storage querying is useful for analytics-style access, but it is not a transactional database for operational order processing.

Question 12

Topic: Operations

A company is migrating a file-sharing VM from a private cloud to a public cloud. The migration runbook shows that the final backup job completed successfully with no errors. Before the team decommissions the source VM, which action best verifies that the workload can be recovered if the migrated VM fails?

Options:

  • A. Restore the backup to an isolated test VM

  • B. Check that the backup schedule is enabled

  • C. Increase the backup retention period

  • D. Confirm the backup job completion email

Best answer: A

Explanation: Backup success does not prove recoverability. The best validation is to restore the backup in a safe test location and confirm the recovered data and workload are usable before decommissioning the original VM.

The core concept is restore validation. A backup job can report success even if the data is incomplete, corrupted, encrypted with an unavailable key, or unusable for the required recovery process. In a migration, this risk is especially important because decommissioning the source VM removes the easiest fallback path. Restoring the backup to an isolated test VM verifies both backup integrity and the operational recovery steps without impacting production. The key takeaway is that recoverability requires evidence from integrity checks or restore testing, not just a completed backup status.

  • Schedule status only proves future jobs are configured, not that the current backup can be restored.
  • Longer retention may help with recovery points, but it does not validate backup integrity.
  • Completion email repeats the same assumption that job success equals recoverability.

Question 13

Topic: DevOps Fundamentals

A cloud operations team uses infrastructure as code to create VPC-style networks, subnets, and VM instances for a hybrid application. After provisioning, the team needs a repeatable way to install required packages, apply approved configuration files, and correct configuration drift across the running servers without relying on manual commands. Which DevOps tool purpose best fits this need?

Options:

  • A. Container image registry

  • B. CI/CD orchestration

  • C. Source control management

  • D. Configuration management

Best answer: D

Explanation: The requirement is to manage system state after the infrastructure exists. Configuration management, also called configuration as code, is used to install packages, enforce files and settings, and reduce drift across servers.

DevOps tool roles are best matched by the phase and outcome they support. Infrastructure as code creates cloud resources, while configuration management maintains the desired state inside or on top of provisioned compute resources. In this scenario, the key facts are package installation, approved configuration files, and drift correction across running servers. Those are configuration management responsibilities, not syntax-specific tasks.

Source control stores and versions code or configuration. CI/CD orchestration automates build, test, and deployment workflows. A container image registry stores built container images. The takeaway is to identify the tool purpose from the operational need, not from a specific command or vendor syntax.
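
The sketch below illustrates the kind of idempotent desired-state logic a configuration management tool runs on every node; package names, file paths, and commands are assumptions for a Debian-style host, and real tools express this declaratively rather than as a script:

import subprocess
from pathlib import Path

REQUIRED_PACKAGES = ["chrony", "auditd"]
APPROVED_SSHD = Path("/opt/baseline/sshd_config")   # approved configuration file
TARGET_SSHD = Path("/etc/ssh/sshd_config")

def ensure_packages():
    for pkg in REQUIRED_PACKAGES:
        installed = subprocess.run(["dpkg", "-s", pkg], capture_output=True).returncode == 0
        if not installed:
            subprocess.run(["apt-get", "install", "-y", pkg], check=True)

def ensure_config():
    # Only rewrite the file when it drifts from the approved copy (idempotent).
    if TARGET_SSHD.read_text() != APPROVED_SSHD.read_text():
        TARGET_SSHD.write_text(APPROVED_SSHD.read_text())
        subprocess.run(["systemctl", "reload", "sshd"], check=True)

ensure_packages()
ensure_config()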

  • Source control is useful for versioning files, but it does not by itself apply packages or remediate drift on servers.
  • CI/CD orchestration can trigger workflows, but the stated need is ongoing server configuration enforcement.
  • Image registry stores container artifacts, but the workload described is configuration of running VM instances.

Question 14

Topic: DevOps Fundamentals

A cloud team’s CI/CD pipeline builds a container image and passes unit tests, but the deployment stage is blocked by this policy error: No approved artifact with vulnerability-scan attestation found for this commit SHA. The team must keep repository controls and security gates in place. Which implementation should the engineer apply?

Options:

  • A. Deploy the source code directly from the Git repository

  • B. Rebuild the image during deployment from a developer workstation

  • C. Disable the deployment gate for commits with passing unit tests

  • D. Publish a versioned image and scan attestation to the artifact repository

Best answer: D

Explanation: The failure indicates that the pipeline is missing an approved deployable artifact and its security attestation. The correct fix is to add or repair the publish step so the deployment stage pulls a versioned artifact from the controlled repository with the required scan evidence.

In a CI/CD flow, build output should be promoted through controlled stages as an immutable artifact, not recreated or bypassed at deployment time. The policy error states that deployment requires both a matching artifact for the commit SHA and vulnerability-scan attestation. Publishing the versioned image and scan result to the artifact repository preserves traceability, supports repository controls, and gives the deployment stage something approved to consume.

A passing unit test is not a substitute for artifact storage or security evidence. The key takeaway is that deployment gates should be satisfied by producing the required artifact metadata, not by weakening the gate.
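
A hedged sketch of the missing publish step is shown below; the registry name, scanner CLI, and attestation layout are hypothetical, and a real pipeline would use its artifact repository's own upload and attestation mechanism:

import json
import subprocess

commit_sha = subprocess.run(["git", "rev-parse", "HEAD"],
                            capture_output=True, text=True, check=True).stdout.strip()
image = f"registry.example.com/orders-api:{commit_sha}"

# Tag the built image with the commit SHA and push it to the controlled registry.
subprocess.run(["docker", "tag", "orders-api:build", image], check=True)
subprocess.run(["docker", "push", image], check=True)

# Run the vulnerability scan and record the result as attestation metadata.
scan = subprocess.run(["scanner", "scan", "--format", "json", image],   # hypothetical scanner CLI
                      capture_output=True, text=True, check=True)
attestation = {"image": image, "commit": commit_sha, "scan": json.loads(scan.stdout)}

with open(f"attestation-{commit_sha}.json", "w", encoding="utf-8") as f:
    json.dump(attestation, f)
# The attestation file is then uploaded to the artifact repository so the
# deployment gate can find an approved artifact for this commit SHA.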

  • Source deployment skips the approved artifact repository and does not satisfy the required scan attestation.
  • Gate bypass weakens the required control instead of fixing the missing artifact evidence.
  • Workstation rebuild breaks reproducibility and traceability because the deployable artifact is not produced by the controlled pipeline.

Question 15

Topic: Deployment

A cloud team is preparing a release for a customer-facing API that includes a required security hardening change for compliance. The current plan deploys the new container image to all production instances at once. The risk register requires continuous availability and limits customer impact if the new authentication middleware rejects valid tokens. The compliance owner also wants production validation before full rollout. Which deployment strategy should replace the current plan?

Options:

  • A. Big-bang deployment

  • B. Recreate deployment

  • C. Full blue-green cutover

  • D. Canary deployment

Best answer: D

Explanation: The release plan is inconsistent with the stated risk requirement because it exposes all users to a security-related authentication change at once. A canary deployment limits blast radius while still allowing real production validation before a full release.

The core issue is deployment strategy selection based on risk and availability requirements. For a compliance-driven security hardening change that could reject valid tokens, the team needs continuous availability and controlled exposure. Canary deployment sends a small percentage of production traffic to the new version, monitors outcomes such as authentication failures, and then expands or rolls back based on results. This directly supports production validation without affecting all customers at once.

Blue-green can provide fast rollback, but a full cutover still shifts all users at one time unless canary-style traffic splitting is added.
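
As one provider's example of canary traffic weighting (an AWS application load balancer listener via boto3), the sketch below sends 5% of traffic to the canary target group; the ARNs are placeholders:

import boto3

elbv2 = boto3.client("elbv2")

elbv2.modify_listener(
    ListenerArn="arn:aws:elasticloadbalancing:eu-west-1:123456789012:listener/app/portal/EXAMPLE",
    DefaultActions=[{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/stable/EXAMPLE", "Weight": 95},
                {"TargetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/canary/EXAMPLE", "Weight": 5},
            ]
        },
    }],
)
# If authentication error rates stay flat, raise the canary weight in steps;
# if they climb, set the canary weight back to 0 to roll back.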

  • Big-bang exposure fails because it releases the risky authentication change to all production users at once.
  • Recreate downtime fails because stopping and replacing the running version conflicts with continuous availability.
  • Blue-green cutover is safer than big-bang for rollback, but a full cutover does not meet the limited-exposure validation requirement.

Question 16

Topic: Troubleshooting

A cloud engineer is using IaC to migrate a web tier into a new public cloud region. The same template succeeded in the test region, but the production deployment fails before any instances are created.

Exhibit: Deployment output

Plan: create 12 standard compute instances
Requested size: 8 vCPU each
Region: region-b
Error: QuotaExceeded
Message: Regional standard compute vCPU quota is 64; requested total is 96

What is the most likely cause of the deployment failure?

Options:

  • A. Invalid IaC template syntax

  • B. Unsupported rolling deployment strategy

  • C. Configuration drift in the test region

  • D. Regional compute service quota exceeded

Best answer: D

Explanation: The failure is caused by a regional service quota, not by the template structure or deployment strategy. The IaC plan requests more standard compute vCPUs than the target region allows, so the provider rejects the deployment before creating instances.

Resource quotas and regional limits are common deployment failure points during migrations. A template can be valid and still fail if the target region has lower limits than another region or account. In this case, the plan requests 12 instances at 8 vCPUs each, which totals 96 vCPUs. The region allows only 64 standard compute vCPUs, so provisioning is blocked by the quota check before instance creation begins.

The practical fix would be to reduce the requested capacity, choose a different allowed size or region, or request a quota increase before retrying the deployment.
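
The arithmetic behind the error can be captured in a tiny pre-flight check like the one below, which simply restates the exhibit's numbers:

instances = 12
vcpus_per_instance = 8
regional_vcpu_quota = 64

requested = instances * vcpus_per_instance          # 96 vCPUs
if requested > regional_vcpu_quota:
    print(f"QuotaExceeded: requested {requested} vCPUs, regional quota is {regional_vcpu_quota}")
    print("Reduce instance count/size, deploy to another region, or request a quota increase")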

  • Template syntax is unlikely because the plan is generated and the error specifically names a quota limit.
  • Rolling strategy does not fit because no instances are created before the failure.
  • Configuration drift in another region does not explain the target region’s explicit vCPU quota error.

Question 17

Topic: Deployment

A company runs a regulated database and stateless batch-processing workers in a private cloud for maximum control. During a quarterly analytics run, the IaC deployment for additional workers fails with No available compute capacity and storage pool reservation exhausted. Demand spikes unpredictably, and purchasing new hosts takes 8 weeks. The database must remain private, but sanitized batch data can be processed off-site. Which next fix best addresses the root cause?

Options:

  • A. Move all workloads to a larger private cluster

  • B. Convert the database to a public SaaS platform

  • C. Increase the private cloud DHCP scope

  • D. Adopt a hybrid deployment for burst workers

Best answer: D

Explanation: The failure is caused by a private cloud capacity limit, not an addressing or application issue. A hybrid deployment model fits the stated tradeoff: retain private-cloud control for the regulated database while using public-cloud elasticity for burst batch workers.

Private cloud provides strong control and customization, but the organization must manage and fund the physical capacity. When workload demand spikes faster than hardware can be procured, scalability becomes the limiting tradeoff. Because the database must remain private but sanitized batch data can leave the environment, a hybrid model is the best deployment fit: keep sensitive systems in the private cloud and burst stateless processing into public cloud resources when needed. This reduces the need to overbuild private infrastructure for occasional peaks while preserving the required control boundary.

  • SaaS migration changes the service and control model more than required and does not follow the stated need to keep the database private.
  • DHCP expansion addresses IP assignment, not exhausted compute and storage reservations.
  • Larger private cluster can work eventually, but it keeps the same cost and procurement-delay problem for unpredictable spikes.

Question 18

Topic: Security

A company is moving a customer portal to a public cloud. A security review identifies two risks: session data could be intercepted between users and the cloud load balancer, and database backup snapshots could be exposed if underlying storage media or snapshot access is compromised. The application design cannot be changed. Which TWO controls best address these risks?

Options:

  • A. Deploy a WAF in front of the load balancer

  • B. Enforce TLS for all client-facing HTTPS connections

  • C. Enable encryption at rest for database backups and snapshots

  • D. Hash all database backups before storing them

  • E. Compress backup snapshots before archiving them

  • F. Require MFA for all cloud administrator accounts

Correct answers: B and C

Explanation: The two risks map directly to data in transit and data at rest. TLS mitigates interception of user sessions, while encryption at rest protects stored backups and snapshots if storage access is compromised.

Encryption should match where the data is exposed. For traffic moving between users and the cloud endpoint, TLS provides confidentiality and integrity for data in transit. For stored database backups and snapshots, encryption at rest protects the data if storage media, replicated copies, or snapshot access is compromised. These controls do not require redesigning the application and directly address the stated risks.

Controls like WAFs and MFA are valuable, but they address web attacks and administrative access rather than encrypting the specific data states described in the scenario.

  • Hashing backups fails because hashing is one-way and does not preserve recoverable backup data.
  • WAF protection helps filter malicious web requests but does not encrypt user sessions or stored snapshots.
  • MFA for admins improves account security but does not directly protect intercepted traffic or exposed storage data.
  • Compression reduces storage size but does not provide confidentiality.

Question 19

Topic: Operations

A payment application uses daily full backups copied to another region. During a regional outage test, the team restored the last backup into a recovery VPC. The restore took 3 hours, and 20 hours of transactions were missing. The business requirement is RTO <= 1 hour and RPO <= 15 minutes. Which next fix best addresses the failed recovery test?

Options:

  • A. Increase backup retention from 30 days to 1 year

  • B. Restore to a larger compute instance type

  • C. Use a warm site with continuous replication and PITR

  • D. Move the backup copies to an archive storage tier

Best answer: C

Explanation: The test missed both the RTO and RPO: recovery took too long, and too much data was lost. A warm site with continuous replication and point-in-time recovery is the best fit because it keeps recoverable data current and reduces startup time during failover.

RTO is the maximum acceptable time to restore service, while RPO is the maximum acceptable data loss. Daily full backups cannot meet a 15-minute RPO, and a 3-hour rebuild cannot meet a 1-hour RTO. A warm recovery site keeps core resources partially running or ready, and continuous replication with point-in-time recovery provides recent recovery points. This approach directly addresses the observed symptoms: long restore time and missing transactions. Retention, archive tiering, or larger compute alone do not solve both SLA failures.

  • Archive storage usually lowers storage cost but can increase retrieval time, which works against the 1-hour RTO.
  • Longer retention preserves older restore points but does not reduce the 20-hour data loss from daily backups.
  • Larger compute may speed some processing, but it does not create 15-minute recovery points or a ready recovery environment.

Question 20

Topic: Troubleshooting

A company uses a site-to-site IPsec VPN to connect an on-premises data center to a private IaaS VPC for a migrated order application. After replacing the on-premises edge firewall, users cannot reach the application. The cloud route table still points the on-premises CIDR to the VPN gateway, and security groups allow the required ports. The VPN logs show NO_PROPOSAL_CHOSEN during tunnel negotiation. Which root cause BEST fits these symptoms?

Options:

  • A. Private subnet DNS resolution failure

  • B. Missing route to the on-premises CIDR

  • C. IPsec proposal mismatch on the new firewall

  • D. Application load balancer health-check failure

Best answer: C

Explanation: The strongest signal is the NO_PROPOSAL_CHOSEN VPN negotiation error. Because routing and security rules are already stated as correct, the best root cause is a protocol or cryptographic proposal incompatibility introduced by the replacement firewall.

In IPsec VPN troubleshooting, negotiation errors often identify the failing layer before traffic routing is even relevant. NO_PROPOSAL_CHOSEN means the peers could not agree on acceptable VPN parameters, such as IKE version, encryption algorithm, integrity algorithm, Diffie-Hellman group, or PFS settings. A new edge firewall may default to deprecated or incompatible settings, such as older IKE or weak cipher suites, while the cloud VPN gateway requires a stronger supported proposal. Since the route table and security groups are already valid, the issue is not basic packet forwarding or access control. The key takeaway is to match the on-premises VPN device proposal to the cloud gateway’s supported IPsec parameters.
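
The sketch below only illustrates the idea of two peers with no overlapping proposal; the parameter combinations are hypothetical examples, not real device defaults:

# NO_PROPOSAL_CHOSEN means the peers share no common IKE/IPsec proposal.
new_firewall_offers = {
    ("ikev1", "3des", "sha1", "dh2"),      # deprecated defaults on the replacement device
}
cloud_gateway_accepts = {
    ("ikev2", "aes256", "sha256", "dh14"),
    ("ikev2", "aes256", "sha384", "dh20"),
}

if not (new_firewall_offers & cloud_gateway_accepts):
    print("NO_PROPOSAL_CHOSEN: align the firewall's IKE version, cipher,")
    print("integrity algorithm, and DH group with the cloud gateway's policy")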

  • Routing issue is unlikely because the stem states the route to the on-premises CIDR still targets the VPN gateway.
  • Load balancer issue would affect application health after network connectivity exists, not VPN tunnel negotiation.
  • DNS issue would affect name resolution, but the log points to IPsec negotiation failure before application access.

Question 21

Topic: Cloud Architecture

A company is migrating a legacy license-management application to the cloud. The application requires a vendor-supported Linux distribution, a custom kernel module, fixed private IP addressing, host-based firewall rules, and control over the VM patch schedule. The company wants the provider to manage the physical data center and hypervisor. Which cloud service model best fits these requirements?

Options:

  • A. SaaS

  • B. PaaS

  • C. FaaS

  • D. IaaS

Best answer: D

Explanation: IaaS is the best fit when the customer must manage the guest operating system, virtual machine settings, and network configuration. The stem requires OS-level changes and fixed private addressing, while still offloading physical infrastructure and hypervisor management to the provider.

The core concept is the cloud shared responsibility split by service model. In IaaS, the provider manages the facilities, hardware, storage platform, and virtualization layer, while the customer manages the guest OS, patches, installed agents or modules, VM sizing, and much of the virtual network configuration. Those responsibilities match the legacy application’s need for a specific Linux distribution, a custom kernel module, fixed private IPs, and host firewall rules. Higher-level models abstract away too much control for this workload.

The key takeaway is that needing OS, VM, or detailed virtual networking control is a strong signal for IaaS.

  • PaaS abstraction fails because the platform typically controls the runtime and OS details, limiting kernel-module and patch-schedule control.
  • SaaS consumption fails because the customer uses a finished application rather than managing VM-level configuration.
  • FaaS event runtime fails because functions do not provide persistent VM, guest OS, or fixed host-network control.

Question 22

Topic: Security

A team is deploying a containerized batch processor to a cloud container platform. The container must read job files from a mounted volume and write results to a separate mounted path. Security requirements state least privilege, no host-level administrative control, and protection against modification of input files if the container is compromised. Which configuration BEST satisfies these requirements?

Options:

  • A. Run privileged as root and use application logic to prevent input file changes.

  • B. Run unprivileged as a non-root UID, mount input read-only, and restrict writes to the results path.

  • C. Run unprivileged but mount both volumes read-write with 777 permissions.

  • D. Run as root without privileged mode and rely on image scanning before deployment.

Best answer: B

Explanation: Container least privilege means the workload should run without privileged mode and without unnecessary root access. File mounts should grant only the access required, so read-only input and restricted write access to the results path best match the scenario.

Container security uses multiple layers: runtime privilege, user identity, and filesystem permissions. Privileged containers can access host-level capabilities that are unnecessary for a batch processor and increase blast radius. Running as a non-root UID reduces what a compromised process can do inside the container. Mounting job input as read-only protects source data from modification, while allowing writes only to the results path supports the application requirement without granting broad filesystem access. The key takeaway is to combine unprivileged execution with path-specific file permissions.
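
A minimal sketch of these settings using the Docker SDK for Python (one runtime's API, shown as an example) appears below; the image name, UID, and host paths are placeholders:

import docker

client = docker.from_env()

client.containers.run(
    "registry.example.com/batch-processor:1.4",
    user="10001:10001",          # unprivileged, non-root UID/GID
    privileged=False,            # no host-level capabilities
    volumes={
        "/data/jobs":    {"bind": "/in",  "mode": "ro"},   # input cannot be modified
        "/data/results": {"bind": "/out", "mode": "rw"},   # only the results path is writable
    },
    detach=True,
)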

  • Privileged root fails because application logic does not replace runtime isolation or host capability restrictions.
  • Broad permissions fail because 777 read-write mounts allow unnecessary modification of both volumes.
  • Root-only hardening fails because image scanning does not enforce runtime user or file access controls.

Question 23

Topic: Cloud Architecture

A company is migrating an order-processing application to a public cloud. The application must keep using a standard relational database, and the cloud team wants to reduce responsibility for database engine patching, routine backups, and failover operations while still controlling schemas and queries. Which implementation best meets these requirements?

Options:

  • A. Run the database in self-managed containers

  • B. Use a managed relational database service

  • C. Install the database on IaaS virtual machines

  • D. Replace the application with a SaaS order platform

Best answer: B

Explanation: A managed relational database service is the best fit because it supports the existing relational workload while reducing operational responsibility. The provider handles many platform tasks, such as database patching, backup automation, and high-availability failover, while the customer still manages data, schema, access, and application queries.

Managed cloud services reduce customer operational responsibility by moving supported platform operations to the provider under the shared responsibility model. For a relational database workload, a managed database service commonly handles the underlying infrastructure, database engine maintenance, automated backups, replication options, and failover mechanisms. The customer still owns application design, data governance, IAM configuration, schema changes, query performance decisions, and business continuity requirements. This is different from running the same database on virtual machines or self-managed containers, where the customer must administer the OS, database software, backup jobs, and HA tooling. The key takeaway is to choose the managed service that matches the workload type without giving up required application control.

  • IaaS database hosting keeps the most administrative control but leaves OS, database patching, backups, and failover largely customer-managed.
  • Self-managed containers improve packaging and portability but do not remove database administration responsibility.
  • SaaS replacement may reduce operations further, but it does not preserve the requirement to keep the existing application’s schema and queries.

Question 24

Topic: Deployment

A company is preparing a lift-and-shift migration of an order-processing VM to public cloud IaaS. The application cannot be modified before cutover, must keep using an on-premises license server through a site-to-site VPN, and has a four-hour outage window. During discovery, which finding is the BEST indicator of a migration risk that must be remediated before cutover?

Options:

  • A. The database backup runs nightly after midnight.

  • B. The license server subnet is missing from VPN routes.

  • C. The source VM averages 40% CPU utilization.

  • D. Application logs are collected by an agent.

Best answer: B

Explanation: The key migration risk is the missing network dependency. Because the application must continue using an on-premises license server through the VPN, the cloud VM needs a valid network path to that subnet before cutover.

Cloud migration discovery should identify dependencies that could break after workload placement changes. In this scenario, the application cannot be modified and depends on an on-premises license server, so VPN routing to that subnet is a hard requirement. If the subnet is not in the route tables or allowed by firewall/security rules, the migrated VM may boot successfully but fail application licensing or startup checks. This is a deployment migration risk, not just an operations preference. CPU averages, logging agents, and backup timing may be reviewed, but they do not directly show a required dependency will be unreachable during cutover.

  • CPU average is not enough by itself to show a resource allocation error or migration blocker.
  • Logging agent use is common and does not indicate incompatibility unless the target platform cannot support it.
  • Backup timing may affect scheduling, but the stated nightly run does not conflict with the four-hour cutover window.

Question 25

Topic: DevOps Fundamentals

A cloud engineering team must create repeatable dev, test, and production environments for a new IaaS workload. The environments include VPC/VNet-style networks, subnets, route tables, security groups, compute instances, and object storage. The team wants declarative, version-controlled definitions and the ability to preview infrastructure changes before provisioning. Which tool is the BEST fit?

Options:

  • A. Kubernetes

  • B. Jenkins

  • C. Ansible

  • D. Terraform

Best answer: D

Explanation: Terraform best matches the scenario because the primary need is infrastructure as code provisioning. It uses declarative configuration to define cloud resources and supports repeatable deployments with change previews before applying infrastructure updates.

The core concept is choosing a DevOps tool by purpose. Terraform is primarily used for infrastructure as code, especially provisioning cloud resources such as networks, subnets, route tables, security groups, compute, and storage. Its declarative model lets teams store desired infrastructure state in source control and review planned changes before applying them. That fits the requirement to build consistent dev, test, and production environments across cloud infrastructure.

Jenkins can run pipelines that call Terraform, and Ansible can configure systems after they exist, but neither is the best fit for declarative cloud infrastructure provisioning in this scenario.

  • Pipeline orchestration is useful for automation, but Jenkins is not the primary tool for defining cloud infrastructure resources.
  • Container orchestration applies to running containerized workloads, not provisioning VPCs, subnets, and IaaS resources.
  • Configuration management can configure servers, but Ansible is less directly aligned than Terraform for declarative infrastructure provisioning with change plans.

Questions 26-50

Question 26

Topic: DevOps Fundamentals

A cloud engineering team supports several containerized microservices across two regions. The team already has basic CPU, memory, and latency dashboards, but troubleshooting incidents is slow because application logs are stored separately on each node. The team needs centralized log ingestion, indexing, full-text search, and correlation across services. Which toolset best fits this need?

Options:

  • A. Jenkins

  • B. Grafana

  • C. Terraform

  • D. ELK stack

Best answer: D

Explanation: The requirement is primarily centralized logging and searchable log analysis, not just dashboard visualization or automation. The ELK stack is commonly used to ingest, index, search, and analyze logs from distributed cloud workloads.

ELK fits the scenario because the decisive need is log observability: collecting logs from many services, indexing them, and searching them during incidents. Grafana is often used for dashboards and visualization, especially for metrics from time-series data sources, but it is not the best match when the core requirement is full-text log ingestion and indexing. Terraform and Jenkins serve different DevOps purposes: infrastructure provisioning and CI/CD automation.

The key distinction is whether the team needs log search and analysis or metrics visualization. Here, the missing capability is centralized, searchable logs.
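
Once logs are centralized and indexed, an incident query might look like the sketch below; the endpoint, index name, and field names are hypothetical, and the call shape assumes a recent Elasticsearch Python client:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://logs.example.internal:9200")

results = es.search(
    index="app-logs-*",
    query={
        "bool": {
            "must": [{"match": {"message": "connection timeout"}}],
            "filter": [
                {"term": {"service": "checkout"}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ],
        }
    },
)
for hit in results["hits"]["hits"]:
    print(hit["_source"]["service"], hit["_source"]["message"])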

  • Metrics dashboarding is tempting because Grafana is strong for visualization, but the stem emphasizes log ingestion and full-text search.
  • Infrastructure provisioning does not fit because Terraform manages infrastructure as code rather than observability data.
  • Pipeline automation does not fit because Jenkins focuses on build and deployment workflows, not log indexing.

Question 27

Topic: Cloud Architecture

A company is moving an internal billing application from its data center to a cloud VPC. The application must continue using private IP addresses, and traffic between the data center and cloud must use a controlled, dedicated path that does not traverse the public internet. Which implementation best meets these requirements?

Options:

  • A. Assign public IPs and restrict access with security groups

  • B. Deploy a NAT gateway for outbound application traffic

  • C. Create a site-to-site VPN over the internet

  • D. Provision a dedicated private connection to the VPC

Best answer: D

Explanation: The requirement is private, controlled, dedicated cloud network access. A dedicated private connection supports private IP routing between the data center and VPC without sending traffic across the public internet.

Cloud network designs should match the required connectivity model. When a workload requires private IP communication and a dedicated path that avoids the public internet, the appropriate architecture is a private dedicated connection into the cloud network, typically attached to a private gateway or transit routing construct. This supports predictable routing and keeps application traffic off public internet paths.

Encrypted VPNs can provide private tunnels, but many site-to-site VPNs still run over public internet connectivity. Public IPs, internet gateways, and NAT gateways are internet-facing patterns and do not satisfy a requirement for dedicated private access.

  • Public IP restriction still exposes the design to public connectivity, even if security groups limit sources.
  • NAT gateway is mainly for outbound internet access from private subnets, not dedicated private data center connectivity.
  • Internet VPN encrypts traffic but still uses the public internet, violating the stated path constraint.

Question 28

Topic: Cloud Architecture

A cloud engineer is deploying a containerized image-processing worker. Each container downloads source files, creates temporary thumbnails during processing, uploads the final output to object storage, and can be safely restarted at any time. The temporary thumbnail files must be removed when the container is replaced. Which storage configuration should the engineer use for the thumbnail directory?

Options:

  • A. Replicated block volume

  • B. Shared file storage mount

  • C. Persistent volume claim

  • D. Ephemeral container-local storage

Best answer: D

Explanation: Ephemeral storage is appropriate for scratch data that does not need to survive container restarts or replacement. In this scenario, the final output is saved elsewhere, and the thumbnail files should be removed with the container lifecycle.

The core concept is matching container storage to data persistence requirements. Container-local ephemeral storage is designed for temporary working files, caches, and scratch space that can be regenerated or discarded. Because the worker uploads the durable result to object storage and the thumbnail directory should be cleared when the container is replaced, no persistent volume is needed. Persistent or shared storage would add unnecessary durability and could retain files that the requirement says should be removed.

  • Persistent volume is for data that must survive container recreation, which conflicts with removing thumbnails on replacement.
  • Shared file storage is useful for multi-container access but unnecessarily preserves temporary scratch files.
  • Replicated block storage improves durability for stateful workloads, not disposable container work directories.

Question 29

Topic: Security

A cloud operations team manages a fleet of IaaS Linux instances that must follow a published CIS benchmark. After routine OS updates, several instances have service and SSH settings that no longer match the approved baseline. Which action best manages this operation while using the benchmark to guide secure configuration?

Options:

  • A. Scale the instances vertically after each patch window

  • B. Increase log retention for all system and application logs

  • C. Continuously check benchmark compliance and alert on configuration drift

  • D. Replicate the instances to another availability zone

Best answer: C

Explanation: The key issue is configuration drift from a required secure baseline. Using a CIS benchmark operationally means measuring actual settings against the benchmark and alerting when systems deviate so they can be remediated.

CIS and vendor-specific benchmarks provide hardened configuration guidance for operating systems, services, containers, and cloud resources. In this scenario, patching changed settings that were expected to remain aligned with the approved baseline. The best operational control is continuous compliance or configuration monitoring that compares current state to the benchmark and alerts on drift. This supports secure configuration management without relying on manual checks after every update.

The key takeaway is that benchmarks are most useful when they are tied to repeatable monitoring and remediation, not treated as one-time hardening documents.

  • Longer retention helps investigations but does not verify whether settings still match the benchmark.
  • Vertical scaling changes capacity, not security configuration alignment.
  • Replication improves availability but can copy the same misconfiguration to another location.
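
As a concrete illustration of the drift-check approach described above, the following Python sketch compares a few SSH daemon settings against an approved baseline and reports any deviation. The baseline values and file path are illustrative assumptions, not an actual CIS mapping; production environments usually rely on a compliance-scanning or policy-as-code tool rather than a hand-rolled script.

from pathlib import Path

# Illustrative baseline; real values come from the published benchmark.
BASELINE = {
    "PermitRootLogin": "no",
    "PasswordAuthentication": "no",
    "X11Forwarding": "no",
}

def parse_sshd_config(path="/etc/ssh/sshd_config"):
    """Return the effective key/value pairs found in sshd_config."""
    settings = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings

def find_drift(current, baseline):
    """List baseline keys whose current value differs or is missing."""
    return [
        f"{key}: expected {expected!r}, found {current.get(key)!r}"
        for key, expected in baseline.items()
        if current.get(key) != expected
    ]

drift = find_drift(parse_sshd_config(), BASELINE)
if drift:
    # In practice this would raise an alert or open a remediation ticket.
    print("Configuration drift detected:")
    print("\n".join(drift))
else:
    print("Host matches the approved baseline.")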

Question 30

Topic: Operations

A cloud operations team labeled an application platform update as a minor maintenance patch and deployed it directly to production during a short window. The release notes stated that the update removes a legacy authentication header and changes the API response schema. After deployment, dependent services show 401 Unauthorized errors and JSON parsing failures. What is the best next fix?

Options:

  • A. Increase the API rate limit for the dependent services

  • B. Continue the rollout as a minor update using smaller batches

  • C. Flush DNS caches and restart the application load balancer

  • D. Reclassify it as a major update and roll back pending staging tests

Best answer: D

Explanation: The symptoms match an update that changed service contracts, not a simple resource or network issue. Removing an authentication header and changing a response schema are high-risk changes, so the update should be handled as major and validated in staging before production rollout.

Major updates are changes with higher compatibility, security, or availability risk and usually require broader testing, rollback planning, and stakeholder coordination. In this case, the release notes identify breaking changes: an authentication behavior changed and the API response format changed. The resulting 401 Unauthorized and JSON parsing errors are consistent with dependent services no longer matching the updated interface. A minor update would usually involve low-risk fixes that do not change external behavior or integrations. The safest operational response is to roll back or halt the production rollout, reclassify the change as major, and run integration/regression testing in a staging environment.

  • Rate limit increase does not address authentication failures or response schema incompatibility.
  • DNS or load balancer restart targets name resolution or traffic handling, not changed API behavior.
  • Smaller rollout batches reduce blast radius but still treat a breaking change as low risk.

Question 31

Topic: Cloud Architecture

A cloud engineer is reviewing an incident for a customer portal. The relational database tier runs on IaaS VMs, and a minor engine patch caused one replica to fail to rejoin the cluster. Backups also required manual verification after the maintenance window. The next release must keep relational database features while reducing OS patching, backup scheduling, and HA maintenance. What is the best next fix?

Options:

  • A. Move the database into containers on the same VMs

  • B. Migrate to a provider-managed relational database service

  • C. Add more self-managed read replicas

  • D. Increase the VM size for each database node

Best answer: B

Explanation: The recurring problem is operational overhead from running a relational database on IaaS VMs. A provider-managed relational database service is the best fit because the provider handles much of the patching, backup automation, and availability management.

Provider-managed database deployments are appropriate when requirements emphasize reducing administrative effort while keeping database functionality. In this scenario, the team is spending time on VM-level and database-cluster operations: patch coordination, replica recovery, backup scheduling, and HA maintenance. A managed relational database service shifts much of that undifferentiated operational work to the cloud provider while still supporting relational data models and SQL-style application needs.

Scaling the VMs or adding replicas may improve capacity or read performance, but it does not remove the team’s responsibility for operating the database platform. Containerizing the database can add portability, but it usually increases operational complexity unless paired with a managed database platform.

  • Bigger VMs may help performance, but they do not reduce patching, backup, or HA administration.
  • Database containers do not automatically solve operational ownership and can complicate persistent data management.
  • More replicas can improve availability or read scaling, but they still require self-managed cluster operations.

Question 32

Topic: Security

A company runs a containerized customer portal behind a load balancer. The latest authenticated vulnerability scan reports a critical CVE in the container base image currently deployed in production, and the asset owner has approved a change window. Which next action is remediation rather than additional vulnerability assessment?

Options:

  • A. Run another scan to confirm the CVE appears consistently

  • B. Expand scanning to all nonproduction container images

  • C. Rebuild the image with a patched base image and redeploy it

  • D. Calculate the CVSS score and document business impact

Best answer: C

Explanation: Remediation means taking action to correct or reduce a confirmed vulnerability. In this case, the vulnerability assessment has already identified a critical CVE in the deployed container base image, so the next remediation step is to update the image and redeploy it during the approved window.

Vulnerability assessment identifies, validates, prioritizes, and reports weaknesses. Remediation changes the environment to fix or reduce the risk, such as patching software, changing configuration, replacing a vulnerable image, or applying an approved compensating control. Because the scan is already authenticated, the affected production image is known, and a change window is approved, rebuilding from a patched base image and redeploying directly addresses the vulnerable component. Additional scanning, scoring, or inventory expansion may support vulnerability management, but those activities do not fix the current production exposure.

  • Repeat scanning may help validate findings, but it does not correct the vulnerable production image.
  • Risk scoring supports prioritization and reporting, but it is still assessment activity.
  • Broader inventory may find similar issues elsewhere, but it does not remediate the confirmed production CVE.

Question 33

Topic: Deployment

A cloud team is reviewing a release plan for a revenue-critical web application running behind a load balancer in two availability zones. The business requires no user-visible downtime and the ability to roll back within minutes if error rates increase. The current plan replaces all running instances in place during a single maintenance window. Which change best aligns the release strategy with the stated requirement?

Options:

  • A. Use a cold standby environment for rollback

  • B. Use blue-green deployment with a traffic switch

  • C. Use an in-place rolling deployment without parallel capacity

  • D. Use a big-bang deployment after business hours

Best answer: B

Explanation: The current in-place replacement strategy conflicts with the requirement for no visible downtime and rollback within minutes. A blue-green deployment provides a parallel production-ready environment and allows traffic to move back quickly if the new release fails health checks or raises error rates.

Blue-green deployment is the best fit when a release must minimize downtime and support rapid rollback. The team deploys the new version into a separate but equivalent environment, validates it, and then shifts load balancer traffic from the current environment to the new one. If metrics degrade, traffic can be switched back to the known-good environment quickly. This directly addresses the risk in the release plan: replacing all instances in place creates a larger outage and rollback risk because the old version is not kept running as a ready target. Rolling or big-bang approaches can work for some workloads, but they do not match this strict availability and rollback requirement as well as blue-green.

  • Big-bang release still concentrates risk and may cause downtime even if scheduled after hours.
  • Rolling in place can reduce disruption, but without parallel capacity it does not provide the same immediate rollback path.
  • Cold standby is not suitable for rollback within minutes because it is not already serving-ready.
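
The cutover itself is often a single listener change at the load balancer. Here is a minimal boto3 sketch against an AWS-style application load balancer; the ARNs are placeholders, and other providers expose equivalent traffic-shifting calls.

import boto3

# Placeholders for the production listener and the two environments' target groups.
LISTENER_ARN = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:listener/app/portal/EXAMPLE"
BLUE_TG_ARN = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/portal-blue/EXAMPLE"
GREEN_TG_ARN = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/portal-green/EXAMPLE"

elbv2 = boto3.client("elbv2")

def switch_traffic(target_group_arn):
    """Point the production listener at the given environment."""
    elbv2.modify_listener(
        ListenerArn=LISTENER_ARN,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
    )

switch_traffic(GREEN_TG_ARN)   # cut over after the green environment passes validation
# switch_traffic(BLUE_TG_ARN)  # immediate rollback path if error rates rise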

Question 34

Topic: Deployment

A group of regional hospitals wants to deploy a shared claims-processing platform. The platform must be accessible only to the participating hospitals and approved auditors, use a common compliance baseline for healthcare data, and be governed and funded jointly by the member organizations. The hospitals do not need to integrate existing on-premises workloads during the first phase. Which cloud deployment model is the BEST fit?

Options:

  • A. Hybrid cloud

  • B. Public cloud

  • C. Community cloud

  • D. Private cloud

Best answer: C

Explanation: A community cloud best fits organizations that share similar compliance, governance, and access requirements. In this scenario, multiple hospitals need a jointly governed environment restricted to participating members and auditors, which matches the community cloud model.

A community cloud is designed for a defined group of organizations with shared concerns, such as regulatory requirements, security controls, or industry governance. It can be owned, managed, or hosted by one or more member organizations or a third party, but access is limited to the community. The key signals are shared funding, joint governance, restricted membership, and a common healthcare compliance baseline. A private cloud would usually serve one organization, while a public cloud is broadly available to many unrelated tenants. Hybrid cloud requires integration between different deployment models, such as private and public environments, which the first phase does not require.

  • Public access model does not fit because the platform is limited to participating hospitals and approved auditors.
  • Single-organization control is too narrow because governance and funding are shared across multiple hospitals.
  • Hybrid integration is not required because the scenario states no on-premises workload integration is needed in the first phase.

Question 35

Topic: Security

A cloud team receives a secret-scanning alert that an access key for a CI service account was committed to a public repository. Within 10 minutes, audit logs show successful API calls using that key from an unfamiliar external IP to list object storage and create snapshots. Which response action should be taken FIRST?

Options:

  • A. Invalidate and rotate the exposed access key

  • B. Delete the snapshots created during the alert window

  • C. Block the source IP address in the WAF

  • D. Reset the engineer’s SSO password

Best answer: A

Explanation: The alert and audit logs show that a leaked service account key is being actively used. The first containment step is to invalidate the exposed credential and rotate it so the attacker can no longer authenticate with that secret.

Suspicious activity response should prioritize containment based on the evidence. Here, the compromised asset is not a user browser session or web request path; it is an API access key tied to a CI service account. Because successful API calls are already occurring, revoking or disabling the exposed key and issuing a replacement through the approved secrets process stops additional authenticated activity. After containment, the team can review audit logs, identify affected resources, preserve evidence, and clean up unauthorized changes. The key takeaway is to respond to the proven attack path first: compromised credentials require credential invalidation, not only network blocking or account password resets.

  • SSO reset misses the evidence because the activity used a service account access key, not an interactive user login.
  • WAF blocking is insufficient because API credential use may not traverse the protected web application path and can resume from another IP.
  • Snapshot deletion may remove evidence and should wait until after containment and investigation confirm the impact.
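
Containment of a leaked access key is typically a disable, replace, then delete sequence. A minimal boto3 sketch follows, assuming an AWS-style IAM user behind the CI service account; the user name and key ID are placeholders.

import boto3

iam = boto3.client("iam")

USER = "svc-ci-deploy"               # placeholder service account user
LEAKED_KEY_ID = "AKIAEXAMPLEKEYID"   # placeholder exposed key ID

# 1. Containment: disable the exposed key so it can no longer authenticate.
iam.update_access_key(UserName=USER, AccessKeyId=LEAKED_KEY_ID, Status="Inactive")

# 2. Rotation: issue a replacement through the approved secrets process and
#    update the CI system's stored credential out of band.
new_key = iam.create_access_key(UserName=USER)["AccessKey"]

# 3. Cleanup: delete the compromised key once the pipeline works on the new one.
iam.delete_access_key(UserName=USER, AccessKeyId=LEAKED_KEY_ID)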

Question 36

Topic: Security

A team runs a containerized web application on IaaS VMs. After a new feature release, the WAF logs show user-supplied URLs causing the application to request the cloud instance metadata endpoint on a link-local address. Minutes later, temporary instance-role credentials are used from an unfamiliar external IP to list object storage buckets. CPU, disk, and inbound traffic remain near baseline. Which attack type BEST matches these indicators?

Options:

  • A. Metadata attack

  • B. Cryptojacking

  • C. Zombie instance

  • D. DDoS attack

Best answer: A

Explanation: The indicators point to a metadata attack. The key evidence is application-driven access to the instance metadata service and later use of temporary instance-role credentials from an unfamiliar location.

A metadata attack targets the cloud instance metadata service to steal temporary credentials or configuration details, often through SSRF or unsafe URL-fetching features. In this scenario, the web application is tricked into calling the link-local metadata endpoint, and the stolen role credentials are then used externally to enumerate object storage. Normal CPU, disk, and inbound traffic make resource-abuse or volumetric attack explanations less likely.

The key takeaway is to correlate metadata endpoint access with unexpected credential use, especially after changes that process user-supplied URLs.

  • Cryptojacking usually shows abnormal CPU or GPU usage from unauthorized mining, which is not present here.
  • DDoS attack would typically involve traffic spikes, saturation, or availability impact, not credential use after metadata access.
  • Zombie instance implies a compromised instance used for botnet activity or outbound attacks, but the decisive evidence is metadata credential theft.
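
A common application-side mitigation is to refuse to fetch user-supplied URLs that resolve to link-local or private addresses, which is where instance metadata services live. The Python sketch below shows the idea; it is not a complete SSRF defense (redirects, DNS rebinding, and IPv6 edge cases also need handling), and the function name is illustrative.

import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_target(url):
    """Reject URLs that resolve to link-local, loopback, or private addresses."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        resolved = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for family, _, _, _, sockaddr in resolved:
        addr = ipaddress.ip_address(sockaddr[0])
        # 169.254.0.0/16 (and fe80::/10) covers typical metadata endpoints.
        if addr.is_link_local or addr.is_loopback or addr.is_private:
            return False
    return True

print(is_safe_target("http://169.254.169.254/latest/meta-data/"))  # False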

Question 37

Topic: Troubleshooting

A CI/CD pipeline deploys a containerized application using a dedicated deployment service account. The build succeeds, but the release fails during startup with this log:

ERROR: 403 Forbidden
Action: read secret
Resource: /prod/payments/db-password
Identity: svc-deploy-payments

Which action should the cloud engineer take to resolve the failure while preserving least privilege?

Options:

  • A. Store the database password in the pipeline variables

  • B. Disable secret scanning for the release stage

  • C. Assign the service account full administrator rights

  • D. Grant secret read access to the deployment service account

Best answer: D

Explanation: The deployment is failing because the runtime identity receives a 403 Forbidden when reading a required secret. The best fix is to grant only the needed secret read permission to the deployment service account.

This is an IAM-related deployment failure. A 403 Forbidden response means the request reached the secret service, but the authenticated identity is not authorized for the requested action. Since the log names the identity, action, and resource, the targeted remediation is to update the access policy or role assignment so svc-deploy-payments can read /prod/payments/db-password. This preserves least privilege because it grants only the specific permission required for the deployment to start successfully.

Broad administrator access would likely resolve the symptom but creates excessive risk and violates least privilege. Moving secrets into pipeline variables or disabling scanning does not fix the missing authorization and can weaken secrets management.

  • Overprivileged role fails because full administrator rights exceed the specific secret-read permission needed.
  • Pipeline variable storage fails because it bypasses centralized secrets management and does not correct the IAM denial.
  • Secret scanning change fails because scanning is not blocking access; the log shows an authorization failure.
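
The targeted remediation is a narrowly scoped policy that grants only the read action on the specific secret. Below is a minimal sketch using an AWS-style identity policy document; the ARN and statement ID are placeholders, and other providers express the same idea with their own role or binding syntax.

import json

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadPaymentsDbPassword",
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": "arn:aws:secretsmanager:REGION:ACCOUNT:secret:prod/payments/db-password-*",
        }
    ],
}

# The document would then be attached to the svc-deploy-payments identity,
# granting exactly the permission the failing deployment needs and nothing more.
print(json.dumps(policy_document, indent=2))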

Question 38

Topic: Operations

A cloud operations team supports a checkout application built from several microservices behind an API gateway. Users report intermittent 6-second checkout delays, but CPU, memory, and network metrics are normal for all services. Centralized logs show successful requests, but the team cannot determine which downstream service adds the delay for each request. What is the best next troubleshooting action?

Options:

  • A. Enable distributed tracing with correlation IDs

  • B. Rotate the API gateway TLS certificate

  • C. Increase CPU on all service instances

  • D. Create a higher-severity uptime alert

Best answer: A

Explanation: This scenario requires visibility into request flow across distributed services. Distributed tracing links spans for a single transaction across the API gateway and microservices, making it possible to locate the slow dependency or service hop.

Distributed tracing is the observability tool used when a request crosses multiple services and the team needs to see the end-to-end path, timing, and dependency calls. Metrics can show whether resources are healthy, and logs can show events, but neither necessarily reconstructs one user request across many services. Traces use correlation IDs and spans to show where time is spent, such as the API gateway, payment service, inventory service, or database call.

The key takeaway is to use tracing when the troubleshooting question is about following a transaction through distributed components, especially when basic metrics and logs do not isolate the delay.

  • Resource scaling is not supported because the stem states CPU, memory, and network metrics are normal.
  • More alerting may notify the team faster, but it does not reveal which service hop is slow.
  • Certificate rotation addresses trust or expiration issues, not intermittent application latency with successful requests.
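
The mechanism is straightforward: every hop reuses the same correlation or trace ID so spans can be stitched into one transaction. A minimal Python sketch of header propagation follows; the header name and URL are illustrative, and real deployments normally use an OpenTelemetry-style SDK rather than hand-rolled headers.

import uuid
import requests

CORRELATION_HEADER = "X-Correlation-ID"   # illustrative header name

def call_inventory(incoming_headers):
    """Reuse the caller's correlation ID, or start a new one if absent."""
    correlation_id = incoming_headers.get(CORRELATION_HEADER, str(uuid.uuid4()))

    # Every downstream call carries the same ID, so the tracing backend can
    # attribute each span (gateway, payment, inventory) to a single checkout.
    return requests.get(
        "https://inventory.internal.example/api/reserve",   # illustrative URL
        headers={CORRELATION_HEADER: correlation_id},
        timeout=5,
    )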

Question 39

Topic: Security

A company is deploying a PaaS-based customer portal that must let an external billing application read invoice status through an API. The billing application must not store user passwords, access must be limited to invoice-read actions, and users must be able to revoke the integration without changing their credentials. Which authorization model BEST meets these requirements?

Options:

  • A. OAuth 2.0 with scoped access tokens

  • B. Role-based access control for portal administrators

  • C. Group-based access control for billing users

  • D. Discretionary access control by invoice owners

Best answer: A

Explanation: OAuth 2.0 is the best fit for delegated API access between applications. It allows the external billing application to receive limited, revocable authorization for invoice-read actions without handling user credentials.

OAuth 2.0 is designed for delegated authorization, especially when one application needs limited access to resources exposed by another application or API. In this scenario, the key requirements are avoiding password sharing, limiting access to a specific action, and allowing revocation of the integration. OAuth 2.0 meets those needs through access tokens and scopes, such as an invoice-read scope. RBAC and group-based access are useful for assigning permissions to users or workforce identities inside an organization, but they do not directly solve delegated third-party API access. Discretionary access control lets resource owners grant permissions, but it is not the standard cloud API model for scoped, token-based delegation.

  • RBAC mismatch fails because administrator roles do not provide delegated third-party API authorization.
  • Group membership can simplify internal user access but does not prevent credential sharing for an external integration.
  • Discretionary access depends on owners granting permissions and lacks OAuth 2.0’s scoped token delegation model.
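
In practice the billing application exchanges a grant for a token limited to an invoice-read scope and presents it as a bearer token, so no user password is ever shared and revoking the grant invalidates the token without a credential change. The sketch below uses the client credentials flow for brevity; the endpoints, client ID, and scope name are illustrative assumptions, and a user-delegated integration would use the authorization code grant instead.

import requests

TOKEN_URL = "https://auth.example.com/oauth2/token"             # illustrative endpoint
API_URL = "https://portal.example.com/api/invoices/123/status"  # illustrative endpoint

# 1. Obtain a scoped access token.
token_response = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",
        "client_id": "billing-app",
        "client_secret": "REDACTED",
        "scope": "invoices:read",   # illustrative scope name
    },
    timeout=10,
)
access_token = token_response.json()["access_token"]

# 2. Call the portal API; the authorization server and API enforce the scope,
#    so the token cannot be used for anything beyond reading invoice status.
status = requests.get(API_URL, headers={"Authorization": f"Bearer {access_token}"}, timeout=10)
print(status.json())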

Question 40

Topic: Deployment

A cloud engineer is provisioning resources for a new three-tier web application using IaC. Requirements state that web traffic must be public, application servers must not have public IP addresses, database access must be limited to the application tier, and administrative access must use a controlled entry point. Which provisioning plan BEST satisfies these requirements?

Options:

  • A. Public load balancer, private app subnet, private database subnet, bastion host, least-privilege security groups

  • B. Private load balancer, private app subnet, public database endpoint, VPN optional

  • C. Public load balancer, public app subnet, private database subnet, open database security group

  • D. Public app subnet, private database subnet, direct SSH from administrator IPs

Best answer: A

Explanation: The best fit is to provision a public load balancer as the internet-facing tier and keep application and database resources in private subnets. Least-privilege security groups and a bastion host support the stated segmentation and controlled administration requirements.

Provisioning from security requirements means translating access rules into network placement and resource controls. A public load balancer can receive internet traffic without giving public IP addresses to application servers. The application tier should be in private subnets, and the database tier should also be private with inbound access allowed only from the application tier’s security group. A bastion host or similar controlled entry point centralizes administrative access instead of exposing each server directly.

The key takeaway is to expose only the required service endpoint and use private placement plus scoped security rules for everything else.

  • Direct SSH exposure does not meet the controlled entry point requirement because administrators would connect directly to application instances.
  • Public app servers violate the requirement that application servers must not have public IP addresses.
  • Open database access fails the requirement to limit database access to the application tier.
  • Public database endpoint exposes the wrong tier and does not satisfy the application-controlled database access requirement.

Question 41

Topic: Security

A company runs several IaaS workloads in private subnets. A new compliance requirement says the security team must detect port scans, command-and-control callbacks, and attempted lateral movement between workloads, with the option to block known malicious traffic patterns automatically. Which cloud security control best meets this requirement?

Options:

  • A. DLP

  • B. Security groups

  • C. IDS/IPS

  • D. WAF

Best answer: C

Explanation: IDS/IPS is the best fit when the requirement is to detect or prevent suspicious network activity. It can identify patterns such as scans, callbacks, and lateral movement, and an IPS can actively block matching malicious traffic.

The core concept is intrusion detection and prevention. An IDS monitors traffic or workload activity and alerts on suspicious patterns, while an IPS is placed inline or otherwise configured to prevent matching activity. The stem specifically calls out port scans, command-and-control callbacks, and lateral movement between workloads, which are classic IDS/IPS use cases. This is different from access control, data loss prevention, or web request filtering because the requirement is behavioral detection and possible prevention of malicious network patterns.

The key takeaway is to choose IDS/IPS when the scenario requires suspicious activity monitoring and automated blocking based on intrusion signatures or behavior.

  • DLP scope focuses on preventing sensitive data exposure, not detecting lateral movement or command-and-control patterns.
  • WAF scope protects web applications from HTTP/HTTPS attacks, but the stem includes broader workload-to-workload network activity.
  • Security groups allow or deny traffic by rule, but they do not inspect traffic for malicious behavior or intrusion patterns.

Question 42

Topic: Troubleshooting

A cloud administrator is investigating failed and suspicious API authentication events for a deployment pipeline. The audit log shows successful API calls from an unfamiliar IP using the pipeline service account, followed by denied calls for resources outside the normal deployment scope. A developer confirms the pipeline token was accidentally committed to a public repository. What should the administrator do FIRST?

Options:

  • A. Increase the service account password complexity

  • B. Revoke and rotate the pipeline token

  • C. Patch the application deployment images

  • D. Add the unfamiliar IP to a network deny list

Best answer: B

Explanation: This is a leaked credential incident involving a service account token. The first priority is to revoke the exposed token and issue a new secret through a controlled process so the attacker cannot continue authenticating.

When a cloud API token or secret is exposed, any authentication using that credential may be valid from the provider’s perspective, even if the user is unauthorized. The immediate containment step is to revoke or disable the exposed credential and rotate it. After containment, the team should update the pipeline with the new secret, review audit logs, reduce permissions if needed, and remove the secret from the repository history where possible. Blocking one IP is incomplete because the credential can be used from another location. Password policy changes do not protect a leaked token, and patching images does not address credential misuse.

  • Password complexity does not help if the compromised authenticator is an API token rather than an interactive password.
  • IP blocking may reduce one source of activity but leaves the leaked credential usable elsewhere.
  • Image patching is unrelated to the authentication evidence and does not stop valid token-based access.

Question 43

Topic: Cloud Architecture

A company is moving a latency-sensitive payment application to a public cloud VPC while keeping its database in a private data center. The link must provide predictable latency, high availability, and private connectivity for continuous transaction traffic. Management accepts higher recurring costs if they reduce internet-path variability. Which connectivity design best meets these requirements?

Options:

  • A. Use redundant dedicated cloud connections

  • B. Use a single site-to-site IPsec VPN

  • C. Expose the database through public HTTPS endpoints

  • D. Use client VPN access for the application servers

Best answer: A

Explanation: A dedicated cloud connection is the best fit when predictable latency and reliability are more important than minimizing cost. Redundant circuits also reduce the risk of a single link failure for continuous hybrid application traffic.

The core concept is selecting hybrid cloud connectivity based on tradeoffs. Site-to-site VPNs are encrypted and relatively inexpensive, but they typically traverse the public internet, so latency and path availability can vary. A dedicated cloud connection uses a private provider or carrier path into the cloud, which better supports consistent latency, higher throughput, and stronger availability when deployed redundantly. If encryption is required by policy, it can be layered onto the dedicated connection, but the architectural choice is still driven by the need for predictable performance and reliability.

The key takeaway is that VPN is usually the cost-effective secure option, while dedicated connectivity is preferred for critical, high-volume, latency-sensitive hybrid workloads.

  • Single VPN is secure and low cost, but it depends on internet path quality and lacks circuit redundancy.
  • Client VPN is intended for remote user access, not stable site-to-cloud application connectivity.
  • Public HTTPS exposure can encrypt traffic but increases exposure and does not provide private, predictable hybrid connectivity.

Question 44

Topic: Operations

A cloud engineer is investigating intermittent 503 responses and slow checkout requests in a three-tier application. Monitoring for the last 15 minutes shows:

  • Web CPU: 32% average, memory: 58% average
  • Load balancer targets: healthy
  • App request queue: rising from 20 to 1,100
  • Database active connections: pinned at the configured maximum
  • Database connection timeout errors: increasing

Which condition is the MOST likely root cause?

Options:

  • A. Load balancer health checks are failing

  • B. Database connection capacity is exhausted

  • C. Network throughput is saturated

  • D. Web tier compute capacity is exhausted

Best answer: B

Explanation: The monitoring metrics point to a capacity bottleneck at the database connection layer. Web CPU and memory are not constrained, and the load balancer reports healthy targets, but database connections are at their maximum while timeout errors increase.

Observability metrics should be correlated across tiers to identify where requests are slowing or failing. In this case, the app request queue is growing while database active connections remain pinned at the configured maximum and connection timeout errors increase. That combination indicates the application is waiting for database connections, causing checkout requests to queue and eventually return 503 errors. The key takeaway is to identify the saturated dependency, not just the tier where users see the symptom.

  • Web scaling trap fails because CPU and memory on the web tier are well below saturation.
  • Health check trap fails because the load balancer targets are reported as healthy.
  • Network saturation trap is unsupported because no throughput, packet loss, or latency metric indicates a network bottleneck.

Question 45

Topic: Security

A company hosts a public customer portal on IaaS instances behind an internet-facing load balancer. Threat intelligence indicates an upcoming campaign may generate large TCP/UDP floods intended to exhaust bandwidth and make the portal unavailable. The company must keep the service reachable, filter attack traffic before it reaches the VPC, and avoid application code changes. Which cloud security control is the BEST fit?

Options:

  • A. Deploy host-based endpoint protection agents

  • B. Add managed WAF rules to the portal

  • C. Enable cloud DDoS protection with traffic scrubbing

  • D. Tighten instance security group ingress rules

Best answer: C

Explanation: The scenario describes a volumetric denial-of-service risk that threatens availability by exhausting network capacity. A cloud DDoS protection service with edge mitigation and traffic scrubbing best matches the requirement to keep the public portal reachable before attack traffic reaches the VPC.

DDoS protection is the appropriate cloud security control when the primary risk is large-scale traffic intended to disrupt availability. Cloud DDoS services typically use distributed edge capacity, anomaly detection, rate controls, and scrubbing centers to absorb or filter malicious traffic upstream of the workload. That directly fits a public load-balanced IaaS application facing TCP/UDP floods and avoids requiring application changes. A WAF can help with application-layer HTTP attacks, but it is not the best control for bandwidth-exhaustion or transport-layer flood risk. The key takeaway is to match volumetric availability attacks with DDoS mitigation, not only host or application controls.

  • WAF-only control may block malicious HTTP requests, but it is not the right control for large TCP/UDP bandwidth-exhaustion floods.
  • Endpoint agents protect hosts from malware or suspicious local activity, not upstream volumetric traffic saturation.
  • Security group hardening restricts allowed ports and sources, but it cannot absorb or scrub large-scale internet flood traffic.

Question 46

Topic: Operations

A cloud engineer is decommissioning an old application stack after a successful cutover to a new environment. The IaC plan will destroy the old compute instances, a block storage volume used by the database, and an object bucket containing uploaded customer documents. The application owner has not confirmed legal retention requirements, and the last restore test for the data is undocumented. What should the engineer do next?

Options:

  • A. Replace the old volume with an empty volume for rollback

  • B. Delete the block volume but keep the object bucket

  • C. Destroy the stack because the application cutover was successful

  • D. Place the data resources on hold and verify retention and recovery requirements

Best answer: D

Explanation: Persistent resources require extra lifecycle controls because they may contain regulated, business-critical, or recoverable data. A successful application cutover does not prove that retention requirements are satisfied or that data can be restored.

Cloud resource lifecycle management should separate disposable infrastructure from persistent data. Compute instances can often be recreated from IaC, but database volumes and document buckets may be subject to retention policies, backup requirements, legal holds, or rollback needs. Before destroying or replacing those resources, the engineer should pause deletion, confirm ownership and retention requirements, and verify that backups or replicas meet the required recovery point and recovery time objectives. The key distinction is that infrastructure cleanup is not the same as data disposition. Persistent storage needs documented approval and recovery validation before destructive changes.

  • Cutover success only proves the new environment is serving traffic; it does not prove old data can be deleted.
  • Partial deletion still risks losing database records before retention and restore requirements are known.
  • Empty rollback storage removes the existing persistent data and weakens recovery instead of preserving it.

Question 47

Topic: Devops Fundamentals

A company is modernizing an order-processing platform that uses several cloud-hosted microservices. New orders must trigger inventory updates, fraud checks, shipment requests, and customer notifications. Each downstream service should scale independently, temporary outages should not block order intake, and new services should be added later with minimal changes to the order service. Which integration approach is the BEST fit?

Options:

  • A. Publish order events to a message broker or event bus

  • B. Run a nightly batch job to export new orders

  • C. Call each downstream service synchronously using REST APIs

  • D. Share one relational database schema across all services

Best answer: A

Explanation: Event-driven architecture is appropriate when cloud services need asynchronous, loosely coupled integration. In this scenario, order intake should continue even if downstream services are slow or temporarily unavailable, and new consumers should be easy to add without changing the producer.

Event-driven integration uses events, queues, topics, or an event bus so a producer can publish a business fact, such as “order created,” without directly controlling every consumer. This fits cloud microservices that have variable demand, need independent scaling, and should tolerate temporary downstream failures through buffering and retry behavior. It also supports future extensibility because additional services can subscribe to the same event stream without requiring the order service to add more point-to-point logic.

Synchronous REST is useful for immediate request/response interactions, but it tightly couples order intake to each downstream service’s availability and latency. The key takeaway is that event-driven architecture is best when integration should be asynchronous, decoupled, scalable, and extensible.

  • Synchronous REST fails because each downstream dependency can delay or break order intake.
  • Nightly batch fails because it delays processing and does not support near-real-time reactions to new orders.
  • Shared database fails because it tightly couples service data models and makes independent changes harder.
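
The decoupling comes from the order service publishing one event and letting each consumer subscribe independently. A minimal boto3 sketch against an SNS-style topic follows; the topic ARN is a placeholder, and any message broker or event bus exposes an equivalent publish operation.

import json
import boto3

sns = boto3.client("sns")
ORDER_EVENTS_TOPIC = "arn:aws:sns:REGION:ACCOUNT:order-events"   # placeholder ARN

def publish_order_created(order_id, total):
    """Publish the business fact; inventory, fraud, shipping, and notification
    services each consume it through their own subscription or queue."""
    sns.publish(
        TopicArn=ORDER_EVENTS_TOPIC,
        Subject="order.created",
        Message=json.dumps({"order_id": order_id, "total": total}),
    )

# Order intake returns as soon as the broker accepts the event, even if a
# downstream consumer is slow or temporarily unavailable.
publish_order_created("ORD-1001", 84.50)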

Question 48

Topic: Cloud Architecture

A company migrated an application tier to private subnets in a cloud VPC. The application must connect to an on-premises database at 10.20.0.0/16 using the existing site-to-site VPN. The VPN tunnel is up, on-premises routes include the VPC CIDR, and security rules allow the database port. The private subnet route table contains only local and 0.0.0.0/0 to a NAT gateway. Which action BEST resolves the connectivity issue?

Options:

  • A. Replace the VPN with a dedicated private connection

  • B. Move the application instances to a public subnet

  • C. Add 10.20.0.0/16 to the private route table via the VPN gateway

  • D. Add a public IP address to the on-premises database

Best answer: C

Explanation: The VPN is already established, and security rules are not blocking the database port. The missing element is routing from the private cloud subnets to the on-premises CIDR through the VPN gateway.

Cloud subnet route tables determine where traffic is forwarded. In this scenario, the private subnets have a local route and a default route to a NAT gateway, but no route for the on-premises database network. NAT gateways are for outbound internet access, not private VPN routing. Because the VPN is up and the on-premises side already knows the VPC CIDR, adding a route for 10.20.0.0/16 to the VPN gateway completes the returnable private path. The key takeaway is to verify both sides of the route path before changing the connection type or subnet placement.

  • Public subnet move weakens the private design and does not address the missing route to the on-premises CIDR.
  • Dedicated connection is unnecessary because the existing VPN is already up and the issue is routing, not link type.
  • Public database IP bypasses the private VPN requirement and increases exposure instead of fixing cloud subnet routing.
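
The fix itself is a single route-table entry. Here is a boto3 sketch, assuming an AWS-style VPC; the route table and virtual private gateway IDs are placeholders.

import boto3

ec2 = boto3.client("ec2")

PRIVATE_ROUTE_TABLE_ID = "rtb-0example1234567890"   # placeholder private subnet route table
VPN_GATEWAY_ID = "vgw-0example1234567890"           # placeholder VPN gateway attachment

# Send traffic destined for the on-premises network through the VPN gateway.
ec2.create_route(
    RouteTableId=PRIVATE_ROUTE_TABLE_ID,
    DestinationCidrBlock="10.20.0.0/16",
    GatewayId=VPN_GATEWAY_ID,
)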

Question 49

Topic: Cloud Architecture

A company is redesigning connectivity for six cloud VPCs in the same region. Each VPC has non-overlapping CIDR blocks and private application subnets. The design must allow private routing between all VPCs, support future VPC additions with minimal route changes, and avoid exposing workloads to the internet. Which design BEST meets these requirements?

Options:

  • A. Attach all VPCs to a transit gateway and update subnet route tables with static routes to it.

  • B. Place all workloads in public subnets and restrict access with security groups.

  • C. Create VPC peering between one shared-services VPC and each application VPC.

  • D. Create a full mesh of VPC peering connections between all VPCs.

Best answer: A

Explanation: A transit gateway is the best fit for scalable VPC-to-VPC connectivity when many networks need private routing. It centralizes attachments and lets subnet route tables point traffic for remote CIDRs toward the gateway instead of requiring many individual peering routes.

The core concept is choosing the right VPC connectivity pattern. VPC peering is useful for simple point-to-point connectivity, but it does not scale well for many VPCs and does not provide transitive routing through a hub VPC. A transit gateway acts as a regional routing hub: each VPC attaches to it, and private subnet route tables use static routes for the other VPC CIDR blocks through the gateway. This supports private connectivity, future expansion, and simpler route management. Internet exposure is not required because traffic stays on private cloud network paths. The key takeaway is to use peering for small direct connections and a transit gateway for scalable hub-and-spoke VPC routing.

  • Shared-services peering fails because VPC peering is not transitive, so application VPCs would not automatically route through the shared VPC.
  • Public subnets violate the requirement to avoid internet exposure and do not solve private VPC-to-VPC routing.
  • Full-mesh peering can work technically, but it creates many connections and route updates as VPCs are added.

Question 50

Topic: Operations

A cloud operations team is onboarding a three-tier application that uses virtual machines, a managed database, and a load balancer. The team must troubleshoot incidents from one place, correlate events by resource and environment, and retain logs for 90 days. Which configuration best provides the required operational visibility?

Options:

  • A. Enable CPU and memory metrics dashboards only

  • B. Collect distributed traces without platform logs

  • C. Store logs locally on each virtual machine

  • D. Send resource logs to a centralized logging workspace

Best answer: D

Explanation: Centralized logging is the best fit because the team needs searchable, correlated operational events across multiple cloud resources. Sending platform and resource logs to a shared workspace with retention supports troubleshooting from one place.

The core concept is configuring log collection for observability. For a multi-resource cloud application, operational logs should be collected from each relevant component, enriched with identifiers such as resource name, environment, and timestamp, and sent to a centralized logging workspace or log analytics platform. This allows operators to search across the load balancer, virtual machines, and managed database during an incident and meet the 90-day retention requirement. Metrics and traces are useful observability signals, but they do not replace log collection when the requirement is event-level troubleshooting and retention. Local-only logs also create gaps if instances fail or scale in.

  • Metrics-only dashboards show resource trends but do not capture the event details needed for incident investigation.
  • Local VM logs fail to cover managed services and can be lost when instances are replaced.
  • Traces alone help follow requests but do not provide platform and resource event logs across the stack.

Questions 51-75

Question 51

Topic: Deployment

A cloud engineer is provisioning resources for a customer portal. If requirements conflict, they must be prioritized in this order:

  • Survive the loss of one availability zone.
  • Support 8,000 sustained database write IOPS.
  • Serve public static images with low latency.
  • Minimize monthly cost.

Which provisioning plan best satisfies the requirements?

Options:

  • A. Single-zone web tier, autoscaling group, general-purpose database storage

  • B. Single large VM, local database disks, CDN for all content

  • C. Two-zone web tier, provisioned-IOPS database, object storage with CDN

  • D. Two-zone web tier, archive storage for images, general-purpose database storage

Best answer: C

Explanation: The best plan honors the stated priority order instead of choosing the lowest-cost design first. Multi-zone placement addresses availability, provisioned-IOPS storage addresses the database requirement, and object storage with a CDN fits static image delivery.

Provisioning from requirements means mandatory and higher-priority constraints must be met before optimization goals such as cost reduction. In this scenario, surviving an availability zone failure requires resources across at least two zones, and the database storage must be provisioned to sustain the required write IOPS. Static public images are a strong fit for object storage fronted by a CDN because that combination is scalable, low-latency, and typically more cost-efficient than serving images from VM disks or database storage.

Cost still matters, but only after the availability, performance, and delivery requirements are satisfied.

  • Single-zone designs fail because they cannot survive the loss of one availability zone.
  • General-purpose database storage is risky because the stem requires a specific sustained write IOPS target.
  • Archive storage for images is a poor fit for low-latency public static content.
  • Local database disks reduce resilience and do not align with the prioritized availability requirement.

Question 52

Topic: Cloud Architecture

A company is migrating an order management application from a private data center to a public cloud. The application stores customers, orders, invoices, and payments in related entities. The migration requirements include enforcing relationships between records, supporting ACID transactions, and running ad hoc SQL queries with joins for finance reports. Which database target should the cloud engineer provision?

Options:

  • A. Document database

  • B. Object storage with query features

  • C. Key-value database

  • D. Managed relational database

Best answer: D

Explanation: The workload has structured, related entities and requires SQL joins and transactional integrity. Those requirements align best with a relational database rather than a non-relational store optimized for flexible schemas or simple lookups.

Relational databases organize data into tables with defined schemas and relationships, making them a strong fit for orders, invoices, payments, and customers that must stay consistent. ACID transactions help ensure multi-step business operations complete reliably, and SQL supports joins and ad hoc reporting across related tables. Non-relational databases can be excellent for flexible documents, high-scale key lookups, or semi-structured data, but they are not the best default choice when relationship enforcement and SQL join-heavy reporting are core requirements.

The key takeaway is to match the database model to the data structure and query pattern.

  • Document store mismatch fails because flexible JSON-like records are not the best fit for enforced relationships and join-heavy SQL reporting.
  • Key-value mismatch fails because simple key lookups are not designed for complex relational queries.
  • Object storage mismatch fails because object storage is not a transactional database for enforcing record relationships.
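
To see why the relational model fits, the sketch below uses Python's built-in sqlite3 module: a foreign key ties invoices to customers, a transaction groups the multi-step insert, and an ad hoc join answers a reporting question. Table and column names are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforce relationships between records

conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL,
        status TEXT NOT NULL
    );
""")

# ACID-style transaction: both rows commit together or not at all.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme Corp')")
    conn.execute("INSERT INTO invoices (customer_id, amount, status) VALUES (1, 250.0, 'open')")

# Ad hoc SQL join for a finance report.
report = conn.execute("""
    SELECT c.name, COUNT(i.id) AS open_invoices, SUM(i.amount) AS total_due
    FROM customers AS c
    JOIN invoices AS i ON i.customer_id = c.id
    WHERE i.status = 'open'
    GROUP BY c.name
""").fetchall()
print(report)   # [('Acme Corp', 1, 250.0)]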

Question 53

Topic: Troubleshooting

A cloud team hardened an external API gateway to meet a new compliance benchmark. Immediately afterward, several legacy workloads can no longer call the API, while current web clients still work. The gateway logs show:

TLS handshake failed
Reason: no shared cipher
ClientHello: TLS 1.0, RSA_WITH_3DES_EDE_CBC_SHA
Gateway policy: TLS 1.2+, AEAD cipher suites only

Which action should the administrator take to restore access while maintaining compliance?

Options:

  • A. Allow the clients through the network ACL

  • B. Replace the gateway certificate with a longer key

  • C. Upgrade the legacy clients’ TLS libraries and cipher support

  • D. Re-enable TLS 1.0 on the API gateway

Best answer: C

Explanation: The failure is a TLS negotiation problem caused by deprecated protocol and cipher support. The logs show the clients offer TLS 1.0 with 3DES, but the gateway now requires TLS 1.2 or later with AEAD cipher suites. Updating the clients maintains the benchmark instead of weakening the gateway.

Cipher suite deprecation failures often appear as TLS handshake errors such as “no shared cipher,” even when routing, DNS, IAM, and certificates are otherwise correct. In this case, the gateway policy explicitly requires TLS 1.2+ and AEAD cipher suites, while the failing clients only present TLS 1.0 and 3DES. Restoring access requires modernizing the clients or their runtime TLS libraries so they can negotiate an approved protocol and cipher suite. Rolling back the gateway policy would restore connectivity but violate the stated compliance requirement.

The key troubleshooting signal is the mismatch between ClientHello capabilities and the gateway’s allowed TLS policy.

  • Weakening TLS would restore legacy connectivity but would violate the compliance benchmark.
  • Replacing the certificate does not fix a protocol and cipher mismatch shown by the handshake log.
  • Changing the ACL addresses network reachability, not a completed connection attempt failing during TLS negotiation.
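
On the client side, remediation usually amounts to running a modern TLS library and pinning the minimum protocol version. Here is a minimal sketch of a compliant client configuration using Python's standard library; the host name is a placeholder for the hardened gateway.

import socket
import ssl

# Require TLS 1.2 or later; modern contexts at this level negotiate
# AEAD cipher suites such as AES-GCM or ChaCha20-Poly1305.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

HOST = "api.example.com"   # placeholder for the hardened API gateway

with socket.create_connection((HOST, 443), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        # A TLS 1.0/3DES-only client cannot complete this handshake;
        # an upgraded client negotiates an approved protocol and cipher.
        print(tls.version(), tls.cipher())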

Question 54

Topic: Troubleshooting

A cloud engineer is troubleshooting two issues after moving an internal web app to a cloud VPC: on-premises admins cannot reach the app over the VPN, and the app cannot call an external API.

Troubleshooting notes:

VPN tunnel: up
Route tables: on-prem 10.40.0.0/16 <-> cloud 172.20.0.0/16 present
Cloud load balancer: listener TCP/443, targets healthy
Cloud security group: allows TCP/443 from on-prem CIDR
On-prem firewall rule: allow TCP/80 to 172.20.10.0/24; deny TCP/443 to 172.20.10.0/24
App log: TLS handshake failed; peer requires TLS 1.2+; client offered TLS 1.0

Which TWO findings identify root causes of the connectivity failures?

Options:

  • A. The load balancer targets are unhealthy

  • B. The VPN route tables are missing the cloud CIDR

  • C. On-prem firewall blocks TCP/443 to the app subnet

  • D. The app uses deprecated TLS 1.0 for the API call

  • E. The cloud security group blocks on-premises access

  • F. The app requires UDP instead of TCP for HTTPS

Correct answers: C and D

Explanation: The evidence points to two separate root causes: a network device rule blocking the correct service port and a deprecated protocol version. The VPN and routing are up, but the on-prem firewall denies TCP/443, and the API handshake fails because the client offers TLS 1.0.

Network issue troubleshooting often requires separating path problems from protocol negotiation problems. Here, the VPN and routes are present, and the cloud-side listener, targets, and security group are configured for TCP/443. The explicit deny on the on-prem firewall blocks HTTPS access to the app subnet, so that is a network device misconfiguration. Separately, the API failure is not a routing issue; the log shows a TLS version mismatch where the peer requires TLS 1.2 or later but the client offers TLS 1.0. That indicates protocol deprecation or incompatibility. The key is to use the provided evidence rather than assume every connectivity issue is caused by routing.

  • Routing issue is not supported because the route tables already include the on-premises and cloud CIDRs.
  • Target health is not the cause because the load balancer reports healthy targets.
  • Cloud security group is not blocking the flow because it explicitly allows TCP/443 from the on-premises CIDR.
  • UDP for HTTPS is a mismatch because the notes show standard HTTPS reaching the load balancer over TCP/443.

Question 55

Topic: Troubleshooting

A support engineer assigned to a read-only operations group was able to change network ACLs in a cloud account. Audit logs show the engineer first assumed the BreakGlassAdmin role. The role’s trust policy allows principals from the entire account to assume it, while incident response staff must continue to use the role during outages. Which implementation best stops this privilege escalation while preserving the required access?

Options:

  • A. Add a deny rule for network ACL changes to the read-only group

  • B. Restrict the role trust policy to the incident response role

  • C. Require MFA for the support engineer’s read-only group

  • D. Rotate the support engineer’s password and access keys

Best answer: B

Explanation: The issue is unauthorized role assumption, not a direct permission in the read-only group. Tightening the BreakGlassAdmin trust policy to only the incident response role removes the escalation path while keeping break-glass access available to the intended team.

Privilege escalation through role assumption is controlled by both the caller’s permission to assume a role and the target role’s trust policy. In this case, the dangerous condition is that the admin role trusts the entire account, so any principal that gains or already has assume-role permission can use it. Restricting the trust relationship to the incident response role directly enforces who can assume BreakGlassAdmin and preserves the outage requirement. A support-group deny on network ACL changes does not reliably help after the user assumes an admin role, because the effective permissions come from the assumed role session. The key takeaway is to fix the trust boundary at the privileged role, not only the symptom action.

  • MFA only improves authentication assurance but does not remove the support path into the privileged role.
  • Action deny targets network ACL changes, but the escalation source is the assumed admin role.
  • Credential rotation helps if credentials are compromised, but the logs show authorized credentials abusing an overly broad trust policy.
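
The fix lives in the privileged role's trust policy, not in the support group's permissions. A boto3 sketch that narrows who may assume the break-glass role follows; the account ID and role names are placeholders.

import json
import boto3

iam = boto3.client("iam")

# Only the incident response role may assume BreakGlassAdmin.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/IncidentResponse"},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.update_assume_role_policy(
    RoleName="BreakGlassAdmin",
    PolicyDocument=json.dumps(trust_policy),
)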

Question 56

Topic: Troubleshooting

A cloud team discovers an unauthorized cryptocurrency miner running on several instances in a production auto scaling group. Newly launched instances also contain the miner before any startup script runs, and the application must remain available during remediation. Which implementation best removes the source of the unauthorized software while preserving availability?

Options:

  • A. Increase logging on the affected instances

  • B. Add a startup script to uninstall the miner

  • C. Update to a clean approved image and roll instances

  • D. Block the mining pool with an egress firewall rule

Best answer: C

Explanation: The evidence points to the machine image or launch source because the software exists before startup scripts run. Updating the auto scaling configuration to a known-good approved image and rolling the replacement removes the source while keeping capacity online.

Unauthorized software that appears on every newly launched instance before initialization usually indicates a compromised or unapproved base image, template, or boot volume. The remediation should remove the tainted launch source, not just clean individual running instances. In an auto scaling group, a rolling instance refresh from a clean approved image preserves availability by replacing instances gradually while maintaining desired capacity. Blocking network access or uninstalling the binary can reduce immediate impact, but those actions do not prevent the same unauthorized software from reappearing on future launches. The key troubleshooting signal is when the software appears in the lifecycle before startup automation executes.

  • Uninstall script fails because the unauthorized software is already baked into the launch source and may return on each new instance.
  • Egress blocking may reduce miner communication but leaves the unauthorized software installed and recurring.
  • More logging improves visibility but does not remediate the compromised image or stop new affected instances.

Question 57

Topic: Cloud Architecture

A company is moving its order-processing system to a public cloud but will keep the inventory database in its data center. The application requires private hybrid connectivity with predictable low latency during business hours, automated failover, and no dependency on best-effort Internet paths. The network budget was approved for higher recurring circuit costs. Which implementation best meets these requirements?

Options:

  • A. Use client VPN access for the application servers

  • B. Deploy redundant dedicated cloud connections with dynamic routing

  • C. Configure VPC peering between the cloud network and data center

  • D. Deploy a site-to-site VPN over the public Internet

Best answer: B

Explanation: A dedicated cloud connection is the best fit when predictable latency, reliability, and private hybrid connectivity are primary requirements and the budget allows higher recurring costs. Redundancy and dynamic routing support failover without relying on best-effort Internet transport.

Dedicated cloud connectivity uses private circuits from the data center or colocation facility into the cloud provider network. Compared with a standard site-to-site VPN over the public Internet, it typically offers more predictable latency, better throughput consistency, and stronger availability options when deployed redundantly. VPNs are often less expensive and encrypted, but they still depend on Internet path quality unless paired with a dedicated transport. For this scenario, the approved higher budget and strict latency/reliability requirements make redundant dedicated connectivity the best architectural choice.

  • Internet VPN is cost-effective and encrypted, but it depends on variable public Internet paths.
  • Client VPN is intended for user or admin remote access, not application-to-database hybrid connectivity.
  • VPC peering connects cloud networks to each other; it does not create a data center connection by itself.

Question 58

Topic: Operations

A cloud operations team backs up a production database every night. The backup dashboard shows successful jobs for the last 30 days, but auditors want evidence that the data can actually be recovered after a failure. Which TWO actions best validate recoverability?

Options:

  • A. Replicate backups to another region

  • B. Validate restored data with integrity checks

  • C. Enable encryption for all backup sets

  • D. Perform periodic restores in an isolated environment

  • E. Extend backup retention from 30 to 90 days

  • F. Review backup job success status daily

Correct answers: B and D

Explanation: Successful backup jobs only prove that a backup process completed; they do not prove the data is usable. Recoverability requires restore testing and integrity validation so the team can show that backup data can be restored and trusted after recovery.

The core concept is backup recoverability testing. A backup marked successful may still be incomplete, corrupted, incompatible with the restore process, or missing application-level consistency. Periodically restoring backups into an isolated environment proves the restore workflow works without risking production. Integrity checks, such as database consistency checks or checksum validation, prove the restored data is usable and trustworthy.

Retention, encryption, replication, and dashboard reviews are important backup controls, but they do not independently confirm that a restore will succeed or that recovered data is valid.

  • Longer retention improves recovery point availability but does not prove any backup can be restored.
  • Backup encryption protects confidentiality but does not validate restore success or data consistency.
  • Job status review confirms process completion only, not actual recoverability.
  • Regional replication improves resilience but can replicate unusable or corrupted backups if restores are never tested.
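
A simple way to picture the difference between a successful backup job and proven recoverability is a periodic restore drill. The sketch below simulates the restore step in memory; in a real drill the backup would be restored into an isolated environment before the checks run, and the checksum would come from the backup catalog:

import hashlib

# Minimal sketch of a restore drill. The "restore" here is an in-memory copy;
# a real drill restores into an isolated environment before validation.

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def restore_drill(backup_blob: bytes, checksum_recorded_at_backup: str) -> bool:
    restored = bytes(backup_blob)              # stand-in for a sandbox restore
    if sha256_of(restored) != checksum_recorded_at_backup:
        return False                           # corrupted or incomplete backup
    # An application-level consistency check (for example, the database
    # engine's own integrity command or row-count comparison) would run here.
    return True

backup = b"nightly database export"
print(restore_drill(backup, sha256_of(backup)))             # True: restorable and intact
print(restore_drill(b"truncated expo", sha256_of(backup)))  # False: fails validation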

Question 59

Topic: Deployment

A regulated company must deploy a patch for a critical vulnerability in its customer authentication service before an audit deadline. The service cannot have a planned outage, the team must be able to route users back quickly if login errors increase, and the patch should be validated with a small amount of real production traffic first. The budget allows only modest temporary extra capacity, not a full duplicate environment. Which deployment strategy best meets these requirements?

Options:

  • A. Canary deployment

  • B. In-place deployment

  • C. Big-bang deployment

  • D. Blue-green deployment

Best answer: A

Explanation: Canary deployment best matches the downtime, rollback, testing-risk, and resource constraints. It sends a small percentage of production traffic to the patched version first, then expands only if health and security signals remain acceptable.

A canary deployment is designed for controlled production validation with limited risk. In this scenario, the team needs no planned outage, fast rollback, and exposure to a small amount of real traffic before full rollout. Traffic routing can be shifted away from the canary if authentication errors or security telemetry worsens. Unlike blue-green, canary does not require a full parallel copy of the production environment, which fits the modest temporary capacity limit. The key tradeoff is that canary provides gradual confidence and rollback control without the resource cost of a complete duplicate stack.

  • Blue-green capacity fails because it usually requires a full duplicate production environment, which the budget does not allow.
  • In-place update fails because it increases outage and rollback risk for a critical authentication service.
  • Big-bang release fails because it exposes all users at once and does not validate the patch with a small production subset first.
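
The traffic-shifting logic behind a canary rollout can be sketched in a few lines. The weights, error-rate threshold, and promotion rule below are illustrative assumptions, not values from any specific deployment tool:

import random

# Minimal sketch of a canary rollout decision. Error rates would come from
# monitoring; the 1.5x threshold and weights are illustrative only.

def route_request(canary_weight: float) -> str:
    """Send a small share of traffic to the canary version."""
    return "canary" if random.random() < canary_weight else "stable"

def next_step(canary_error_rate: float, baseline_error_rate: float,
              current_weight: float) -> tuple[str, float]:
    """Expand gradually if healthy; route everything back if errors rise."""
    if canary_error_rate > baseline_error_rate * 1.5:
        return ("rollback", 0.0)                          # shift all traffic back to stable
    return ("expand", min(1.0, current_weight * 2))       # widen exposure stepwise

traffic = [route_request(0.05) for _ in range(1000)]
print(traffic.count("canary"), "of 1000 requests hit the canary")
print(next_step(canary_error_rate=0.2, baseline_error_rate=0.5, current_weight=0.05))
print(next_step(canary_error_rate=2.0, baseline_error_rate=0.5, current_weight=0.05))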

Question 60

Topic: Troubleshooting

A company moved an application tier to a new private subnet. Since cutover, the application cannot connect to a managed relational database; connection attempts time out. Monitoring shows normal application CPU and memory, normal database storage and I/O, and the application IAM role can retrieve the database credentials. DNS resolves the database private endpoint correctly. The database security group allows TCP 5432 only from the old application subnet. What should the cloud administrator do first?

Options:

  • A. Grant the application role database administrator permissions

  • B. Change the database volume to higher IOPS storage

  • C. Increase the application instance size

  • D. Allow the new subnet on the database security group

Best answer: D

Explanation: The evidence points to a network filtering issue, not a compute, storage, or IAM problem. The database security group still trusts only the old subnet, so the new application subnet must be allowed on the required database port.

Network troubleshooting should follow the symptoms and available evidence. A connection timeout with successful DNS resolution usually means packets are not reaching the service or return traffic is blocked. The stem also rules out common non-network causes: CPU and memory are normal, storage and I/O are normal, and the IAM role can retrieve credentials. Because the database security group still allows TCP 5432 only from the old subnet, the most direct fix is to update the network access rule for the new application subnet.

Changing compute, storage, or IAM settings would not address a source-subnet restriction in the database security group.

  • Compute sizing does not address a timeout caused by blocked database traffic.
  • Storage performance is not implicated because database storage and I/O are normal.
  • IAM permissions are not the issue because the role already retrieves credentials successfully.
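
The root cause and fix can be illustrated with a small Python sketch of the security-group evaluation. The CIDR ranges and rule structure are hypothetical placeholders chosen for this example:

import ipaddress

# Minimal sketch of the security-group check. CIDRs and the rule structure
# are illustrative, not provider-specific syntax.

db_rules = [
    {"protocol": "tcp", "port": 5432, "source_cidr": "10.0.1.0/24"},  # old app subnet
]

def subnet_allowed(rules: list[dict], source_ip: str, port: int) -> bool:
    addr = ipaddress.ip_address(source_ip)
    return any(
        r["port"] == port and addr in ipaddress.ip_network(r["source_cidr"])
        for r in rules
    )

print(subnet_allowed(db_rules, "10.0.5.17", 5432))  # False: new subnet blocked -> timeout

# The fix: allow the new application subnet on the database port.
db_rules.append({"protocol": "tcp", "port": 5432, "source_cidr": "10.0.5.0/24"})
print(subnet_allowed(db_rules, "10.0.5.17", 5432))  # True after the rule update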

Question 61

Topic: Cloud Architecture

A company runs a customer portal on virtual machines with a self-managed relational database cluster. The cloud team spends significant time applying database patches, managing failover, and maintaining backup jobs. The application must keep SQL compatibility, but the team wants the provider to handle as much database operations work as possible. Which architecture best meets these requirements?

Options:

  • A. Move the database to a managed relational database service

  • B. Rebuild the database on larger IaaS virtual machines

  • C. Run the database in containers on managed Kubernetes

  • D. Export relational data to object storage only

Best answer: A

Explanation: A managed relational database service reduces customer operational responsibility for a supported SQL workload. The provider typically manages much of the platform operation, such as database engine patching, backup features, and failover mechanisms, while the customer manages schema, access, and application use.

The core concept is using cloud-provided managed services to offload undifferentiated operational tasks. In this scenario, the workload still needs SQL compatibility, so a managed relational database service is the best fit. It preserves the relational database model while reducing responsibility for common operations such as database software maintenance, integrated backups, replication or failover configuration, and underlying infrastructure management. The customer still has responsibilities, including data governance, IAM, network access controls, schema design, and application configuration. Moving the same database to larger VMs or containers does not remove much database administration burden because the team still owns the database runtime and operational lifecycle. The key takeaway is that managed services reduce, but do not eliminate, customer responsibility.

  • Larger VMs may improve capacity, but the customer still manages database patching, backups, and failover.
  • Managed Kubernetes reduces some cluster operations, but a containerized database still requires significant database administration.
  • Object storage only is not a direct replacement for an active SQL relational database used by an application.

Question 62

Topic: Security

A cloud operations team receives alerts that several web VMs are hitting 95% CPU, causing autoscaling to add instances and increasing latency. Application request volume is normal, but endpoint monitoring shows an unauthorized process launched from /tmp and repeated outbound connections to known cryptocurrency mining pool domains. Which root cause best matches this evidence?

Options:

  • A. Cryptojacking on the web VMs

  • B. Distributed denial-of-service attack

  • C. Zombie instances in a botnet

  • D. Metadata service abuse

Best answer: A

Explanation: The evidence points to cryptojacking because the workload is consuming excessive CPU while communicating with cryptocurrency mining pools. Normal application request volume makes a traffic-based attack less likely, and the unauthorized process explains the resource spike.

Cryptojacking is unauthorized use of compute resources to mine cryptocurrency. Monitoring commonly shows sustained high CPU or GPU usage, unexpected processes, and outbound connections to mining pools. In this case, autoscaling and latency are symptoms caused by stolen compute capacity, not by legitimate workload growth. A practical next fix would be to isolate affected VMs, remove the process, rotate any exposed credentials, patch the initial access path, and add egress controls or detection for mining pool traffic.

The key takeaway is to match resource symptoms with network and process evidence, not just the performance alarm.

  • Botnet confusion fails because zombie instances are typically identified by command-and-control or attack traffic, not mining pool connections.
  • Metadata abuse fails because there is no evidence of calls to the instance metadata service or stolen temporary credentials.
  • DDoS confusion fails because application request volume is normal rather than showing a large inbound traffic spike.

Question 63

Topic: Deployment

A company is provisioning a customer portal for regulated financial data. Requirements: 99.9% availability, private database access, encryption at rest and in transit, audit logging, predictable moderate traffic with short monthly spikes, document retention for 7 years, and controlled network access from the corporate data center. The team must avoid unnecessary always-on capacity. Which provisioning design best balances these requirements?

Options:

  • A. Always-on active-active deployment in two regions with duplicated databases, premium block storage for all documents, and dedicated circuits to both regions

  • B. FaaS-only backend with public object storage, no persistent database, CDN-only access control, and archive tier for all documents

  • C. Autoscaled stateless compute across zones, managed relational database, private subnets, encrypted object storage with lifecycle policies, VPN, WAF, and least-privilege IAM

  • D. Single large virtual machine with local storage, public database endpoint, manual backups, and IP allow-listing from the office

Best answer: C

Explanation: A balanced provisioning design uses elastic capacity for variable demand while meeting security, availability, compliance, and retention needs. The best fit combines multi-zone stateless compute, managed database services, private networking, encryption, audit-friendly IAM, and storage lifecycle management.

The core concept is provisioning from competing requirements rather than maximizing one attribute. Multi-zone stateless compute with autoscaling supports the 99.9% availability target and monthly spikes without paying for excessive idle capacity. A managed relational database in private subnets fits regulated transactional data and reduces operational overhead. Encrypted object storage with lifecycle policies supports long document retention at lower cost than keeping all files on high-performance block storage. VPN connectivity, WAF protection, least-privilege IAM, and logging address controlled access and compliance evidence.

The key takeaway is to map each requirement to the least complex service pattern that satisfies it, not to choose the largest or cheapest resources by default.

  • Single VM design creates availability and durability risks and exposes the database path too broadly for regulated data.
  • Active-active overbuild may improve resilience, but it adds unnecessary always-on cost beyond the stated 99.9% requirement.
  • FaaS-only design does not satisfy the persistent relational data need and weakens storage and access-control requirements.

Question 64

Topic: Troubleshooting

A cloud engineer runs the same IaC deployment for a three-node application tier in two regions. Region East completes successfully. Region West fails during VM creation with this message:

Compute API status: Healthy
Requested size: memory-optimized M-series
Regional quota used: 18 of 40 vCPUs
Error: The requested VM size is not offered in the selected region or zone.

What is the most likely root cause?

Options:

  • A. Quota exhaustion for regional vCPUs

  • B. A NAT route failure during provisioning

  • C. Regional unavailability of the selected VM size

  • D. A compute service outage in Region West

Best answer: C

Explanation: The failure points to regional or zonal availability of a specific resource type, not a broad service outage. The compute API is healthy and the vCPU quota has available capacity, so the best fix would be to choose a supported size or deploy to a region or zone where it is available.

Deployment failures can come from different layers: the cloud service itself, account limits, quota exhaustion, or resource availability in a specific location. In this case, the compute service reports healthy, so an outage is not indicated. The quota line shows 18 of 40 vCPUs used, so the deployment is not blocked by vCPU quota. The decisive evidence is the provider message stating that the requested VM size is not offered in the selected region or zone. That means the deployment template is valid in general, but the chosen regional target cannot provide that instance family. The appropriate next action is to select an equivalent supported VM size or change the region or zone.

  • Service outage is unlikely because the compute API status is explicitly healthy.
  • Quota exhaustion does not fit because the regional vCPU quota still has unused capacity.
  • NAT routing is unrelated to the VM size availability error returned by the compute control plane.
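
One way to avoid this class of failure is a pre-deployment check of size availability per region, as in the following sketch. The offerings data and fallback list are illustrative assumptions, not output from a real provider API:

# Minimal sketch of a pre-deployment availability check. The offerings data is
# a stand-in for whatever your provider or IaC tooling reports per region.

offered_sizes = {
    "east": {"M-series", "D-series", "E-series"},
    "west": {"D-series", "E-series"},            # M-series not offered here
}

def pick_size(region: str, requested: str, fallbacks: list[str]) -> str:
    """Return the requested size if offered, otherwise the first supported fallback."""
    available = offered_sizes[region]
    if requested in available:
        return requested
    for size in fallbacks:
        if size in available:
            return size
    raise RuntimeError(f"No supported size in region {region}")

print(pick_size("east", "M-series", ["E-series"]))  # M-series
print(pick_size("west", "M-series", ["E-series"]))  # E-series (equivalent supported size)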

Question 65

Topic: Security

A company runs an internet-facing application on IaaS virtual machines in an auto-scaling group. A vulnerability scan reports a critical OS package CVE on all instances. The workload must remain available, and the team uses a hardened base image for deployments. Which action best reduces exposure while preserving operational consistency?

Options:

  • A. Disable external access until the next quarterly maintenance window

  • B. Patch and test the base image, then perform a rolling replacement

  • C. Manually patch only the instances currently receiving traffic

  • D. Add more instances to reduce load on vulnerable servers

Best answer: B

Explanation: The best practice is to remediate the vulnerable package in the hardened base image, validate it, and redeploy instances through a controlled rolling update. This reduces workload exposure without creating configuration drift or taking the service offline.

For cloud workloads built from a standard hardened image, patching should be repeatable and fleet-wide. Updating the base image, testing it, and replacing instances in batches keeps the environment consistent and avoids ad hoc changes on running systems. A rolling replacement also supports availability because only part of the fleet is updated at a time while healthy instances continue serving traffic. This approach combines hardening, patch management, and operational change control.

Manual patching can leave drift between instances and may not fix future scale-out instances if the image remains vulnerable. Scaling out does not remediate the CVE, and waiting for a long maintenance window leaves an internet-facing workload exposed.

  • Manual live patching can create drift and does not update the source image used for future instances.
  • Delayed maintenance leaves a critical internet-facing vulnerability exposed longer than necessary.
  • Scaling out improves capacity, not security posture, and adds more vulnerable instances.

Question 66

Topic: Operations

A cloud administrator is decommissioning an application stack after a migration. The change plan will delete the VMs, attached block volumes, and an object storage bucket that may contain customer export files. The retirement ticket confirms the application is no longer used but does not state any retention or recovery requirements. Which action best manages this operation?

Options:

  • A. Disable monitoring and wait for backup expiration

  • B. Apply the destroy plan because the application is retired

  • C. Pause deletion until retention and recovery requirements are confirmed

  • D. Delete the block volumes but keep the object bucket

Best answer: C

Explanation: Persistent resources require lifecycle controls before deletion. Even if an application is retired, attached volumes and object storage may contain regulated, contractual, or business-critical data that must be retained or recoverable.

Cloud resource lifecycle management includes safe decommissioning of persistent resources. Before deleting or replacing storage, the administrator should confirm data ownership, retention periods, backup status, restore requirements, and any legal or compliance holds. The stem explicitly says the ticket lacks retention and recovery details, so proceeding would risk permanent data loss or policy violation. The safest operational action is to pause the destructive change until those requirements are confirmed and any required backups or archives are validated.

The key takeaway is that retirement of compute does not automatically authorize deletion of persistent data.

  • Application retired is insufficient because unused compute does not prove stored customer exports can be deleted.
  • Partial deletion still risks losing persistent data on the block volumes without confirmed requirements.
  • Waiting for backup expiration weakens recoverability and does not establish whether the data must be retained.

Question 67

Topic: Deployment

A healthcare organization is deploying a new claims-processing platform that must keep all regulated data on infrastructure dedicated to the organization. Auditors require direct control over hypervisor configuration, network segmentation, and physical access procedures. The platform will still expose APIs to approved partner systems over private connectivity.

Which deployment approach best implements these requirements?

Options:

  • A. Deploy to a public cloud serverless architecture

  • B. Deploy to a SaaS claims-processing platform

  • C. Deploy to a public cloud shared VPC

  • D. Deploy to a private cloud in the organization’s data center

Best answer: D

Explanation: A private cloud or on-premises deployment is best when the scenario requires dedicated control, strong isolation, and regulatory oversight of infrastructure. The auditors specifically require control over hypervisors, network segmentation, and physical access, which points away from shared public cloud or SaaS models.

Cloud deployment model selection should match control and compliance requirements. A private cloud hosted in the organization’s data center provides cloud-like provisioning while keeping infrastructure dedicated to one organization. This supports direct governance over virtualization hosts, segmentation controls, access procedures, and regulated data location. Public cloud can support many compliance needs, but the provider controls much of the physical and hypervisor layer under the shared responsibility model. SaaS and serverless further reduce customer control over the underlying platform.

The key takeaway is that dedicated control and isolation requirements are strong indicators for private cloud or on-premises deployment.

  • Shared public cloud may provide logical isolation, but it does not give direct control over physical access or hypervisor configuration.
  • SaaS model reduces operational burden, but it does not preserve the required infrastructure-level control.
  • Serverless model is highly managed and scalable, but it abstracts away the exact layers auditors require the organization to control.

Question 68

Topic: Deployment

A company must move a legacy three-tier application from an on-premises virtualization cluster to a public cloud before its data center lease expires. The application uses fixed OS versions, local configuration files, and block storage. The business requires the shortest migration timeline and will postpone code changes until after the move. Which migration approach is the BEST fit?

Options:

  • A. Refactor the application into cloud-native microservices

  • B. Rebuild the application on a PaaS runtime

  • C. Rehost the application on IaaS virtual machines

  • D. Replace the application with a SaaS product

Best answer: C

Explanation: Rehosting, often called lift-and-shift, is the best fit when the priority is moving quickly with little or no application modification. The fixed OS versions, local configuration files, and block storage dependencies point to IaaS virtual machines rather than a redesign.

The core concept is selecting the migration strategy that matches the required amount of change. Rehost keeps the application architecture mostly intact while moving the existing server workload onto cloud IaaS resources. This fits a lease-driven deadline and a requirement to delay code changes. It also preserves control over OS versions and storage attachment patterns that may not fit PaaS or SaaS models.

Refactoring, rebuilding, or replacing can improve long-term cloud alignment, but they require more design, testing, and application change. For this scenario, the key takeaway is that minimal change plus a short timeline strongly indicates rehost.

  • Refactor to microservices fails because it requires application redesign and code changes before migration.
  • Replace with SaaS fails because it changes the application solution rather than moving the existing workload.
  • Rebuild on PaaS fails because PaaS would likely require runtime, configuration, and deployment changes.

Question 69

Topic: Deployment

A company must move a legacy intranet application before a data center lease expires. The migration requirement is to use minimal application changes and preserve the current OS/runtime. During a pilot, deployment to a managed application platform fails with: Unsupported runtime and OS-level library installation is not permitted. Which next fix best addresses the migration goal?

Options:

  • A. Refactor the application for a managed PaaS runtime

  • B. Rearchitect the application into microservices

  • C. Replace the application with a SaaS product

  • D. Rehost the application on IaaS virtual machines

Best answer: D

Explanation: The key requirement is minimal application change while preserving the current OS/runtime. A managed application platform blocks required OS-level dependencies, so moving the existing workload to IaaS virtual machines is the best next fix.

Rehosting, also called lift and shift, is the migration approach used when a workload should move to the cloud with little or no application modification. In this scenario, the application has runtime and OS-level dependencies that the managed platform does not allow. IaaS virtual machines provide control over the guest OS, installed libraries, and runtime configuration, allowing the migration to meet the deadline without redesigning the application. This does not optimize the application for cloud-native services, but it directly satisfies the stated migration constraint.

  • PaaS refactor fails because it requires code or dependency changes to fit the managed runtime.
  • Microservices redesign is too large a change for a minimal-change migration.
  • SaaS replacement changes the application model and is not a direct migration of the existing workload.

Question 70

Topic: Cloud Architecture

A company is redesigning an image-processing workload. After a user uploads an image, the system must run malware scanning, resize the image, extract metadata, update a database, and send a notification. Some steps can run in parallel, failed steps must be retried, and the overall status must be visible to support staff.

Which TWO design choices best apply orchestration or workflow concepts for this workload?

Options:

  • A. Add retry and timeout policies to each workflow task.

  • B. Use a state-machine workflow to coordinate task order and branching.

  • C. Scale the upload API vertically with a larger instance type.

  • D. Call each service synchronously from the web front end.

  • E. Use a CDN to cache processed images near users.

  • F. Place all processing logic in a single VM startup script.

Correct answers: A and B

Explanation: This workload needs a workflow/orchestration pattern because multiple components must execute in a controlled sequence with parallelism, retries, and visible status. A state-machine workflow plus task-level retry and timeout handling directly addresses coordinated execution across services.

Workflow orchestration coordinates multiple components as part of one business process. In this scenario, the system needs to manage dependencies, parallel steps, failures, and status tracking after an upload. A state-machine or workflow engine can represent each step, decide when tasks run, branch on outcomes, and record progress. Adding retry and timeout policies prevents one transient component failure from breaking the entire process without visibility.

Caching, scaling, or running a script may help other performance or deployment concerns, but they do not provide durable coordination across several independent processing steps.

  • Single startup script creates a brittle implementation and does not provide durable workflow state or per-step visibility.
  • CDN caching improves content delivery after processing, but it does not coordinate the processing tasks.
  • Vertical scaling may add compute capacity, but it does not manage dependencies, branching, or retries.
  • Synchronous front-end calls tightly couple the user request to back-end processing and reduce resilience for long-running workflows.
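
A minimal sketch of the workflow idea follows: each task runs through a retry-and-timeout wrapper and the engine records per-step status that support staff could inspect. The step names mirror the scenario, while the engine itself is illustrative rather than a specific workflow service:

import time

# Minimal sketch of a workflow step with retry and timeout handling, plus a
# simple ordered state machine that records per-step status.

def run_step(name, func, retries=2, timeout=5.0):
    for attempt in range(retries + 1):
        start = time.monotonic()
        try:
            result = func()
            if time.monotonic() - start > timeout:
                raise TimeoutError(f"{name} exceeded {timeout}s")
            return {"step": name, "status": "succeeded", "result": result}
        except Exception as exc:
            last_error = exc
    return {"step": name, "status": "failed", "error": str(last_error)}

def workflow(image_name):
    status = []
    status.append(run_step("malware_scan", lambda: f"{image_name}: clean"))
    # Resize and metadata extraction could run in parallel with real tooling;
    # they are shown sequentially here for brevity.
    status.append(run_step("resize", lambda: f"{image_name}: 3 sizes"))
    status.append(run_step("extract_metadata", lambda: {"width": 1920}))
    status.append(run_step("update_database", lambda: "row updated"))
    status.append(run_step("notify", lambda: "notification sent"))
    return status  # visible per-step status for support staff

for record in workflow("upload-001.jpg"):
    print(record)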

Question 71

Topic: Operations

A cloud operations team is standardizing logging for VMs, containers, and FaaS workloads. Security requires authentication and firewall logs to be searchable for 90 days and retained for 1 year. Application teams require application logs for 14 days. The solution must minimize storage cost while preserving investigation evidence. Which implementation best meets these requirements?

Options:

  • A. Centralize all logs and apply source-based retention and archive policies

  • B. Keep logs on each workload and rotate local files after 14 days

  • C. Send only security logs to storage and disable application log collection

  • D. Store all logs in searchable hot storage for 1 year

Best answer: A

Explanation: The requirement calls for log aggregation plus different retention periods by log type. Centralizing logs from all workload types supports investigation across systems, while source-based retention and archive policies meet compliance and reduce cost.

Log aggregation collects records from distributed cloud resources into a centralized platform so investigators can search and correlate events across VMs, containers, and FaaS workloads. Retention should match the requirement for each log category: security logs need 90 days of searchable access and 1 year of retained evidence, while application logs only need 14 days. Moving older security logs to an archive tier preserves compliance evidence without paying for long-term hot search storage. The key takeaway is to separate collection from retention policy: collect broadly, then retain each log type according to its purpose.

  • Local rotation fails because workload-local logs are harder to correlate and would delete security evidence too soon.
  • Security-only collection fails because application teams explicitly require application logs for debugging.
  • All hot storage meets retention but ignores the stated cost-minimization constraint.
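
The collect-broadly, retain-by-source idea can be expressed as a small policy table, as in this sketch. The retention numbers come from the scenario; the policy structure is an assumption, not a specific logging platform's configuration:

# Minimal sketch of source-based retention policies. Numbers mirror the
# scenario; the structure is illustrative, not a specific log platform.

retention_policies = {
    "security":    {"searchable_days": 90, "archive_days": 365},
    "application": {"searchable_days": 14, "archive_days": 0},
}

def disposition(log_type: str, age_days: int) -> str:
    policy = retention_policies[log_type]
    if age_days <= policy["searchable_days"]:
        return "hot (searchable)"
    if age_days <= policy["archive_days"]:
        return "archive (retained evidence)"
    return "delete"

print(disposition("security", 45))     # hot (searchable)
print(disposition("security", 200))    # archive (retained evidence)
print(disposition("application", 20))  # delete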

Question 72

Topic: Security

A company runs IaaS VMs in a public cloud and on-premises servers connected by VPN. A new security policy requires host-level protection on each cloud-connected system to detect or block malicious processes and suspicious endpoint behavior. Which TWO controls best meet this requirement?

Options:

  • A. Deploy an EDR agent to the servers

  • B. Enable network ACLs on the cloud subnets

  • C. Use a CDN with DDoS protection

  • D. Configure object storage encryption at rest

  • E. Place a WAF in front of the application

  • F. Install centrally managed anti-malware software

Correct answers: A and F

Explanation: Endpoint protection is applied directly to cloud-connected systems, such as VMs, servers, or workstations. EDR and centrally managed anti-malware provide host-level visibility and enforcement against malicious processes and suspicious endpoint activity.

The core concept is endpoint protection: security controls installed or managed at the host level. For IaaS VMs and VPN-connected servers, the cloud provider typically secures the underlying infrastructure, but the customer remains responsible for protecting the guest OS, applications, and endpoint behavior. EDR helps detect suspicious process activity, persistence, lateral movement, and other host indicators. Anti-malware helps prevent and remediate known malicious files and processes. Network and application controls are useful, but they do not replace controls running on the protected systems themselves.

The key takeaway is to match the control location to the requirement: host-level risk requires host-level endpoint controls.

  • Application edge control fails because a WAF protects web application traffic, not processes running on each host.
  • Subnet filtering fails because network ACLs control traffic at the network boundary, not endpoint behavior.
  • Storage encryption fails because encryption at rest protects stored data, not host processes.
  • Traffic absorption fails because CDN-based DDoS protection addresses availability attacks, not endpoint malware or suspicious host activity.

Question 73

Topic: Deployment

A team deploys a stateless web service on six IaaS instances behind a load balancer. During what was intended to be a rolling update, users received HTTP 503 errors for four minutes.

Deployment log excerpt:

Batch size: 6
Action: Deregister old instances from load balancer
Action: Start six new instances
Health checks: passing after 90 seconds

The requirement is to keep the service available while application instances are updated gradually. What is the best next fix?

Options:

  • A. Update one small batch and wait for health checks

  • B. Scale each instance vertically before deployment

  • C. Lower the DNS TTL before the deployment

  • D. Disable load balancer health checks during updates

Best answer: A

Explanation: The outage happened because all six serving instances were removed from the load balancer at once. A rolling deployment maintains availability by updating a small subset of instances and waiting for each batch to pass health checks before continuing.

Rolling deployment updates application instances gradually while keeping enough healthy capacity online to serve traffic. In this scenario, the configured batch size matched the entire fleet, so the process behaved like an all-at-once replacement and temporarily left the load balancer without healthy backends. The fix is to reduce the batch size or set an appropriate maximum unavailable value, then gate each batch on successful health checks before draining the next set of old instances.

The key takeaway is that “rolling” requires controlled batch size plus health validation, not just automated replacement.

  • DNS TTL does not address the immediate cause because traffic is already going through the load balancer.
  • Disabling health checks can send users to unhealthy instances and makes the rollout less safe.
  • Vertical scaling may add capacity per instance, but it does not prevent all backends from being removed at once.
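
The difference between the failed rollout and a true rolling update comes down to batch size and health gating, sketched below. Instance names and the health check are placeholders for real load balancer and deployment tooling:

# Minimal sketch of a batch-gated rolling update. Instance names and the
# health check stand in for real load balancer and deployment tooling.

def rolling_update(instances, batch_size, is_healthy):
    """Replace instances in small batches, keeping the rest in service."""
    for i in range(0, len(instances), batch_size):
        batch = instances[i:i + batch_size]
        print(f"deregister + replace: {batch}")        # only this batch leaves service
        if not all(is_healthy(inst) for inst in batch):
            print("health check failed; stop and roll back this batch")
            return False
        print(f"batch {batch} healthy; continue")
    return True

fleet = [f"web-{n}" for n in range(1, 7)]
rolling_update(fleet, batch_size=1, is_healthy=lambda inst: True)
# With batch_size=6 every backend would be removed at once, reproducing the outage.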

Question 74

Topic: Security

A company is moving an internal order-processing application to a public cloud VPC. Administrators and developers will access management endpoints from corporate laptops and remote networks. The security team requires no implicit trust based on network location, continuous verification of users and devices, and limited east-west traffic between application tiers. Which design best applies Zero Trust principles?

Options:

  • A. Place management endpoints behind a public load balancer with TLS

  • B. Use identity-aware access with MFA, device posture checks, and microsegmentation

  • C. Allow VPN access to all private subnets from corporate IP ranges

  • D. Use a shared bastion account and restrict login to office networks

Best answer: B

Explanation: Zero Trust assumes no user, device, or network path is trusted by default. The best design combines strong identity verification, device posture validation, and least-privilege segmentation between workloads.

Zero Trust for cloud access and workload protection focuses on “never trust, always verify.” In this scenario, administrators and developers connect from varied locations, so access should be based on identity, MFA, device health, authorization, and context rather than a trusted corporate network. For the application tiers, microsegmentation using security groups, network ACLs, or similar controls reduces lateral movement by allowing only required east-west flows. The key takeaway is to authenticate and authorize every request while minimizing implicit trust between users, devices, networks, and workloads.

  • Trusted VPN range fails because private network access alone still grants broad implicit trust based on source location.
  • Public TLS endpoint protects transport but does not provide continuous identity, device, or least-privilege workload controls.
  • Shared bastion access weakens accountability and still depends heavily on network location instead of per-user verification.

Question 75

Topic: Cloud Architecture

A cloud team builds container images in a CI job and deploys the same service to multiple clusters. Recent rollouts used different image versions because some deployments pulled locally exported image files while others pulled older cached images. Which operational change BEST supports consistent container build, distribution, and deployment?

Options:

  • A. Publish versioned images to a central image registry

  • B. Add more replicas behind the load balancer

  • C. Move container data to persistent volumes

  • D. Increase application log retention for all clusters

Best answer: A

Explanation: An image registry is the control point for storing and distributing container images after they are built. Publishing versioned images to a registry lets clusters pull the same artifact during deployment instead of relying on local exports or caches.

The core purpose of an image registry is to hold container images as deployable artifacts for build, distribution, and deployment workflows. A CI job builds the image, pushes it to the registry with a tag or immutable digest, and the deployment configuration references that stored image. This creates a consistent source of truth across clusters and regions and supports repeatable rollouts and rollbacks. Logs, scaling, and persistent storage are important operational concerns, but they do not solve inconsistent image sourcing.

  • Log retention helps with investigation but does not control which image version is deployed.
  • Persistent volumes protect application data but do not distribute container images.
  • More replicas improve capacity or availability but can multiply an inconsistent deployment.
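
The consistency gain comes from every cluster pulling the same published artifact, which the following sketch illustrates. The registry hostname, repository, tag, and digest are invented placeholders:

# Minimal sketch of referencing one versioned artifact everywhere. The
# registry host, repository, tag, and digest are illustrative placeholders.

IMAGE_REPO = "registry.example.com/payments/web"
RELEASE_TAG = "1.4.2"
RELEASE_DIGEST = "sha256:0f3c..."  # immutable digest recorded by the CI job

def deployment_image(pin_by_digest: bool = True) -> str:
    """Build the image reference each cluster should pull."""
    if pin_by_digest:
        return f"{IMAGE_REPO}@{RELEASE_DIGEST}"   # identical bytes on every cluster
    return f"{IMAGE_REPO}:{RELEASE_TAG}"          # tag can move; a digest cannot

print(deployment_image())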

Questions 76-90

Question 76

Topic: Cloud Architecture

A company currently runs a payment API as a single stand-alone container on one cloud VM. A new compliance review requires documented patch rollouts, centralized secrets handling, least-privilege service-to-service access, automatic restart on failure, and scaling from 2 to 20 instances during peak periods. Which approach best meets these requirements?

Options:

  • A. Run multiple manual containers with shell scripts

  • B. Move the API to a container orchestration platform

  • C. Keep one container and increase the VM size

  • D. Store secrets in the container image

Best answer: B

Explanation: The requirements exceed what a stand-alone container is designed to manage. Container orchestration is the best fit when a workload needs scale-out, resilience, controlled updates, secrets handling, and consistent policy enforcement.

Stand-alone containers are useful for simple workloads, testing, or single-instance deployments, but they do not natively provide coordinated scheduling, health-based replacement, rolling updates, or centralized policy management. A container orchestration platform can run multiple replicas, restart failed containers, apply network policies, integrate with secrets management, and support repeatable patch or image rollouts. In this scenario, the compliance and resilience requirements are tied to operating the workload at scale, so orchestration is the appropriate architecture choice. Simply making the VM larger or scripting manual container starts does not provide the same managed control plane or audit-friendly consistency.

  • Bigger VM may add capacity, but it does not provide self-healing, replica management, or controlled rollouts.
  • Manual scripts can start containers, but they are weak for consistent policy, failure recovery, and auditability.
  • Secrets in images violates secure secrets management and makes rotation difficult.

Question 77

Topic: Deployment

A company deploys a standardized application stack using IaC and CaC templates. Operators outside the deployment team handle after-hours scaling, validation, and rollbacks. The team wants the automated deployments to remain maintainable and understandable as templates change. Which TWO documentation practices should the team implement?

Options:

  • A. Store deployment notes in each engineer’s private workspace.

  • B. Add comments or descriptions for non-obvious variables, modules, and configuration choices.

  • C. Include administrator credentials in examples for faster troubleshooting.

  • D. Replace written documentation with console screenshots.

  • E. Keep a versioned README/runbook with inputs, dependencies, validation, and rollback steps.

  • F. Use commit messages as the only deployment documentation.

Correct answers: B and E

Explanation: Maintainable automated deployments need documentation that is accessible, versioned, and close to the code. A runbook explains how operators use and recover the deployment, while inline comments clarify why important IaC or CaC choices exist.

For code-based cloud deployment, documentation should reduce operational ambiguity without becoming stale. Keeping a README or runbook in the same repository as the IaC/CaC templates helps operators find current prerequisites, variables, dependencies, validation checks, and rollback steps. Adding comments or descriptions near non-obvious modules, variables, and configuration choices preserves design intent where future maintainers are most likely to look. The key is to document both procedure and intent in a controlled, reviewable location. Documentation that is private, screenshot-only, or limited to commit messages is harder to maintain and less useful during operations.

  • Private notes fail because operators need shared, accessible documentation that survives staffing changes.
  • Console screenshots become stale quickly and do not explain automated template logic.
  • Commit-only documentation lacks procedural detail and is difficult to use during an incident.
  • Credential examples weaken security and should not be used as deployment documentation.

Question 78

Topic: Operations

A cloud operations team must improve incident response for intermittent 5xx errors in a checkout API after deployments. Responders need to correlate failing requests with latency, route, status code, and application version. Company policy prohibits collecting request bodies, query strings, or full debug logs. Data should be retained for 14 days. Which observability configuration best meets these requirements?

Options:

  • A. Full packet capture for all checkout traffic with 30-day retention

  • B. Verbose debug logging for all services including payloads

  • C. CPU, memory, and uptime metrics only

  • D. Structured error logs and sampled traces with 14-day retention

Best answer: D

Explanation: The best configuration collects only the telemetry needed to investigate the 5xx incidents. Structured logs and sampled traces can include route, status, latency, correlation ID, and version without storing bodies, query strings, or excessive debug data.

Observability for incident response should be purposeful and scoped. For this checkout API, responders need request-level context to connect errors to latency, routes, status codes, and application versions. Structured error logs and sampled traces provide that correlation while avoiding prohibited data such as request bodies and query strings. A 14-day retention setting also matches the stated data minimization requirement.

The key principle is to collect actionable telemetry with controlled fields, sampling, and retention rather than capturing everything available.

  • Packet capture is excessive, may expose sensitive traffic details, and exceeds the required retention period.
  • Verbose debug logs violate the policy by collecting payloads and broad service data.
  • Metrics only are useful for alerts but lack request-level context for incident investigation.
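
A minimal sketch of policy-compliant telemetry follows: an allow list keeps only the fields responders need, and head-based sampling limits trace volume. Field names and the sample rate are illustrative assumptions:

import json
import random

# Minimal sketch of a structured error record and head-based sampling.
# No request bodies or query strings are kept, matching the stated policy.

ALLOWED_FIELDS = {"timestamp", "route", "status", "latency_ms", "version", "trace_id"}

def emit_error_log(record: dict) -> str:
    safe = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    return json.dumps(safe)

def sample_trace(rate: float = 0.1) -> bool:
    return random.random() < rate   # keep roughly 1 in 10 traces

print(emit_error_log({
    "timestamp": "2024-05-01T12:00:00Z",
    "route": "/checkout",
    "status": 503,
    "latency_ms": 2140,
    "version": "2024.18.3",
    "trace_id": "abc123",
    "request_body": "should never appear",  # dropped by the allow list, per policy
}))
print(sample_trace())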

Question 79

Topic: Security

A company is moving administrative access for several cloud services to a centralized IAM design. Requirements are to verify each administrator with the corporate identity provider and MFA, allow only job-appropriate cloud actions, and retain a searchable record of API and console activity for audits. Which design BEST maps authentication, authorization, and accounting to these requirements?

Options:

  • A. Federated SSO with MFA, RBAC policies, and centralized audit logging

  • B. RBAC policies, password complexity rules, and encrypted object storage logs

  • C. Centralized audit logging, SSO session timeouts, and network security groups

  • D. MFA prompts, immutable backups, and private subnet placement

Best answer: A

Explanation: The scenario asks for the AAA functions in IAM. Authentication verifies who the administrator is, authorization determines what the administrator can do, and accounting records what actions occurred for auditability.

In a cloud IAM scenario, authentication, authorization, and accounting are separate but related controls. Federated SSO with MFA satisfies authentication because it proves the administrator’s identity through the corporate identity provider and an additional factor. RBAC or least-privilege IAM policies satisfy authorization because they limit actions based on job role. Centralized audit logging satisfies accounting because it records console and API activity for review, investigations, and compliance evidence. The key is matching each requirement to the IAM function it actually performs, not treating all security controls as interchangeable.

  • RBAC first is incomplete because RBAC is authorization, while password rules do not provide the requested federated identity flow.
  • Audit logging first misplaces accounting as authentication and uses network security groups, which are not IAM authorization.
  • Backups and subnets are useful operational and network controls, but they do not map to IAM authorization or accounting.

Question 80

Topic: Cloud Architecture

A team runs a private IaaS cloud hosting several legacy application VMs. New test environments must be provisioned repeatedly with the same OS patches, middleware, and monitoring agent. If an application VM fails, the replacement must come online quickly without rebuilding the OS; application data is restored from separate block-volume snapshots. Which approach BEST meets these requirements?

Options:

  • A. Install each VM manually from the base OS ISO

  • B. Create clones from a hardened source VM and attach restored data volumes

  • C. Use object storage replication to recreate the VM filesystem

  • D. Live migrate the failed VM to another host before restoring data

Best answer: B

Explanation: VM cloning supports repeatable deployment by copying a known-good VM state into new instances. In this scenario, cloning provides the patched OS, middleware, and agent configuration quickly, while persistent application data is handled separately through block-volume snapshots.

Cloning is a virtualization mechanism for creating a new VM from an existing VM, template, or snapshot-based source. It is useful when many VMs need the same baseline configuration or when recovery requires quickly replacing a failed VM without rebuilding the operating system and software stack. Keeping application data on separate block volumes is a strong design choice because the clone can provide the compute and OS layer, while the data volume snapshot restores the persistent state. This keeps the golden source clean and makes deployments consistent.

Live migration helps move a running VM between hosts, but it does not create a clean replacement if the VM itself has failed.

  • Live migration is mainly for host maintenance or balancing and assumes the VM can still run.
  • Manual installation is slower and increases configuration drift across repeated deployments.
  • Object storage replication protects objects, not a bootable VM configuration or attached block-volume state.

Question 81

Topic: Operations

A company runs a production database on IaaS block storage. The recovery plan requires an RPO of 15 minutes, recovery in another region if the primary region fails, and encryption for backup data both in transit and at rest. Which approach BEST meets these requirements?

Options:

  • A. Enable encrypted cross-region replication with replicated key access

  • B. Replicate unencrypted volumes across regions every 15 minutes

  • C. Use synchronous replication between zones in the primary region

  • D. Take encrypted daily snapshots in the same availability zone

Best answer: A

Explanation: The best fit is encrypted cross-region replication with key access available in the recovery region. This aligns the 15-minute RPO with frequent replication, protects data during transfer and storage, and supports recovery from a regional failure.

For backup and recovery operations, replication addresses resilience and encryption addresses data protection. Because the stem requires recovery in another region, same-zone or same-region methods are not enough. Because the RPO is 15 minutes, replication must occur frequently enough to keep recovery data within that loss window. Encryption must cover data in transit and at rest, and the recovery region must be able to decrypt the replicated copies using authorized key access. The key takeaway is that resilience and protection requirements must be satisfied together, not as separate afterthoughts.

  • Same-zone snapshots protect against some data loss but do not meet the regional recovery requirement or the 15-minute RPO.
  • Unencrypted replication may meet resilience goals but fails the stated encryption requirement.
  • Primary-region replication can improve availability across zones but does not support recovery from a full regional outage.

Question 82

Topic: Cloud Architecture

A media company is moving an existing stateless containerized service to the cloud. The service creates thumbnails when video files are uploaded, takes 2–5 minutes per file, and does not analyze content or learn from historical data. The company wants elastic processing and durable file storage with minimal changes to the application. Which implementation best meets the requirement?

Options:

  • A. An IoT message broker with device shadows for each video

  • B. A managed machine learning training pipeline for uploaded videos

  • C. Object storage events, a queue, and autoscaled container workers

  • D. A GPU cluster for continuous model inference on uploads

Best answer: C

Explanation: The task needs ordinary cloud storage and elastic compute, not an evolving technology such as AI, machine learning, or IoT. Object storage, a queue, and autoscaled container workers match the upload-driven workflow while preserving the existing stateless container design.

Evolving technologies are valuable when the workload actually requires their capabilities. In this scenario, the service performs deterministic thumbnail creation, already runs in a container, and only needs durable storage plus elastic processing. Object storage can hold uploaded and generated files, storage events can enqueue work, and autoscaled container workers can process jobs as volume changes. This design is cloud-native without introducing unnecessary AI/ML or IoT components.

AI or machine learning services would fit content classification, prediction, or model training requirements. IoT services would fit device telemetry and command/control use cases. The key takeaway is to choose the simplest cloud capability that satisfies the requirement rather than forcing an emerging technology into a standard compute-and-storage task.

  • ML pipeline mismatch fails because the task does not require training, prediction, or content understanding.
  • IoT broker mismatch fails because uploaded media files are not device telemetry or managed IoT endpoints.
  • GPU inference overbuild fails because thumbnail generation does not require model inference or specialized AI acceleration.
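
The event-driven pipeline can be sketched with an in-process queue standing in for a managed queue service. Bucket names, object keys, and the thumbnail step are illustrative placeholders:

from queue import Queue

# Minimal sketch of the upload-driven pipeline: a storage event enqueues work
# and stateless workers drain the queue. Names are illustrative placeholders.

work_queue: Queue = Queue()

def on_object_created(bucket: str, key: str) -> None:
    """Storage event handler: enqueue one job per uploaded video."""
    work_queue.put({"bucket": bucket, "key": key})

def worker() -> None:
    """One autoscaled container worker processing jobs until the queue drains."""
    while not work_queue.empty():
        job = work_queue.get()
        print(f"generating thumbnail for {job['bucket']}/{job['key']}")
        work_queue.task_done()

on_object_created("uploads", "match-highlights.mp4")
on_object_created("uploads", "trailer.mov")
worker()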

Question 83

Topic: Devops Fundamentals

A cloud team manages production network changes through an IaC repository. The main branch is protected because production changes require peer review and passing CI tests. An engineer tries to apply an urgent route-table fix by pushing directly to main, but the deployment does not start and the Git service returns: protected branch update rejected; pull request review required. What is the best next fix?

Options:

  • A. Rebase the local branch and retry the direct push.

  • B. Temporarily disable branch protection and push the route-table fix.

  • C. Force-push the commit to main with an administrator token.

  • D. Open a hotfix pull request and complete the required approvals/checks.

Best answer: D

Explanation: The symptom points to a source control policy failure, not an IaC syntax or cloud networking issue. Because production changes require review and CI validation, the engineer should use the protected-branch workflow rather than bypassing it.

Protected branches enforce controlled cloud changes by requiring actions such as pull requests, approvals, and successful status checks before code can affect production. In this scenario, the Git service explicitly rejects a direct push because a pull request review is required. The appropriate fix is to create a hotfix branch, open a pull request, allow the required CI tests to run, obtain the required review, and merge according to policy. This preserves auditability and reduces the risk of unreviewed infrastructure changes reaching production.

Bypassing the control might appear faster, but it violates the stated governance requirement and can create untracked or unsafe production drift.

  • Disable controls is inappropriate because the scenario explicitly requires review before production changes.
  • Force push bypasses the protected workflow and weakens auditability for a production IaC change.
  • Rebase and retry does not address the branch protection rule that blocks direct pushes.

Question 84

Topic: Devops Fundamentals

A cloud team manages Terraform modules, Kubernetes manifests, and deployment scripts for a hybrid application. Multiple engineers must work on changes at the same time, every production change must be reviewed, and the team must be able to quickly roll back to the last known-good deployment asset version. Which source control workflow BEST meets these requirements?

Options:

  • A. Commit all changes directly to main and rely on pipeline logs

  • B. Store deployment files in a shared folder with dated subfolders

  • C. Let each engineer maintain a local repository copy for deployments

  • D. Use feature branches, pull requests, protected main, and release tags

Best answer: D

Explanation: A branch-based Git workflow with pull requests and protected main supports collaboration and change control for deployment assets. Release tags or versioned releases provide stable rollback points when a deployment asset version must be restored quickly.

Source control for cloud deployment assets should provide version history, concurrent work isolation, review gates, and a reliable way to identify approved release states. Feature branches let engineers make changes without disrupting the production-ready branch. Pull requests support peer review and auditability before merge. Protected main prevents unreviewed or force-pushed changes. Release tags mark known-good versions of IaC, manifests, and scripts so the team can redeploy or revert to a previous asset set when needed.

Pipeline logs can show what happened, but they are not a complete source control rollback strategy. Shared folders and local-only copies lack strong collaboration controls, merge history, and reliable versioned rollback points.

  • Direct main commits weaken review controls and make concurrent work riskier.
  • Shared folders can preserve copies, but they lack proper branching, merge tracking, and audit history.
  • Local-only repositories create inconsistent deployment sources and do not support team-wide rollback reliably.

Question 85

Topic: Devops Fundamentals

A payments team is integrating order, inventory, and notification microservices for a regulated application. Compliance requires least-privilege data access and auditable transaction events. The architecture requirement is loose coupling, near-real-time communication, and no direct reads or writes to another service’s database. Which integration pattern best meets these requirements?

Options:

  • A. Use synchronous point-to-point REST calls for every update

  • B. Publish events to a broker with service-owned datastores

  • C. Share one relational database schema across all services

  • D. Grant each service read access to peer databases

Best answer: B

Explanation: An event-driven broker pattern best matches loose coupling, near-real-time communication, and controlled data ownership. Each service can publish or subscribe to events without directly accessing another service’s database, supporting least privilege and auditability.

The core concept is choosing an integration pattern that matches coupling, communication, and data access requirements. A publish/subscribe or event-broker pattern lets services communicate asynchronously or near real time without tight point-to-point dependencies. Keeping each service’s datastore private also supports least privilege because other services consume published events instead of querying peer databases. The event stream can also provide an auditable record of business activity when retained and protected appropriately.

The key takeaway is that integration convenience should not override stated boundaries for coupling or data ownership.

  • Shared schema creates tight coupling and violates the requirement that services not directly access a common data layer.
  • Peer database reads break service-owned data boundaries and expand privileges beyond least-privilege access.
  • Point-to-point REST can be valid for some integrations, but using it for every update increases coupling and does not fit the asynchronous event requirement as well.
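
To make the pattern concrete, here is a toy in-process sketch; a production system would use a managed broker, and the service names, event type, and payload fields are illustrative only:

from collections import defaultdict

# Minimal in-process stand-in for a managed event broker.
subscribers = defaultdict(list)
audit_log = []          # retained event record that can support audit review

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    audit_log.append((event_type, payload))
    for handler in subscribers[event_type]:
        handler(payload)

# Each service owns its datastore; peers consume events instead of reading it.
inventory_db = {"WIDGET": 100}
notification_outbox = []

def inventory_service(event):
    inventory_db[event["sku"]] -= event["qty"]

def notification_service(event):
    notification_outbox.append(f"Order {event['order_id']} confirmed")

subscribe("OrderPlaced", inventory_service)
subscribe("OrderPlaced", notification_service)

# The order service publishes an event rather than writing to peer databases.
publish("OrderPlaced", {"order_id": "A-1001", "sku": "WIDGET", "qty": 2})
print(inventory_db, notification_outbox, len(audit_log))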

Question 86

Topic: Cloud Architecture

A company is building a small private cloud for several critical VMs. The requirement is to manage two hypervisor hosts as one resource pool and have VMs automatically restart on the surviving host if one host fails. Which implementation best meets these requirements?

Options:

  • A. Configure a hypervisor cluster with shared storage and HA enabled.

  • B. Create VM clones on a second host and start them manually.

  • C. Deploy each VM on a stand-alone hypervisor with local storage.

  • D. Pin each VM to a specific host using host affinity rules.

Best answer: A

Explanation: The requirement combines centralized management with automatic recovery after a host failure. That points to clustering, not stand-alone virtualization. A hypervisor cluster with shared storage and high availability lets surviving hosts restart affected VMs.

Stand-alone virtualization runs VMs on an individual hypervisor host, but each host is managed independently and does not inherently provide automated recovery if the host fails. Clustering combines multiple hypervisor hosts into a managed pool and can provide high availability features, such as restarting VMs on another node after a failure. Shared storage, or another supported shared-disk design, ensures the surviving host can access the VM disks needed to restart workloads. The key distinction is that virtualization provides VM abstraction, while clustering adds coordinated management and availability across hosts.

  • Stand-alone hosts miss the requirement for pooled management and automatic restart after host failure.
  • Manual clones may improve recoverability, but they do not provide automated high availability.
  • Host affinity controls VM placement and can reduce mobility, which conflicts with failover flexibility.
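
The high-availability behavior can be modeled in a few lines. This is a conceptual simulation, not vendor configuration; the host and VM names are made up:

# Two clustered hosts share storage, so either host can run any VM.
cluster = {"host-a": ["db-vm", "app-vm"], "host-b": ["web-vm"]}
shared_storage = {"db-vm", "app-vm", "web-vm"}   # disks reachable from both hosts

def fail_host(failed_host):
    """HA behavior: restart the failed host's VMs on a surviving host."""
    orphaned_vms = cluster.pop(failed_host)
    survivor = next(iter(cluster))
    for vm in orphaned_vms:
        if vm in shared_storage:      # restart requires access to the VM disk
            cluster[survivor].append(vm)
    return cluster

print(fail_host("host-a"))   # {'host-b': ['web-vm', 'db-vm', 'app-vm']}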

Question 87

Topic: Deployment

During a pilot lift-and-shift migration, a batch-processing VM boots in the cloud but never passes the application health check. Security groups allow outbound TCP 443, and DNS resolves other internal names correctly.

Log excerpt:

ERROR license checkout failed
Target: 10.20.5.15:443
Reason: connection timed out
Route table: local VPC CIDR, 0.0.0.0/0 -> NAT gateway
VPN routes: none for 10.20.0.0/16

What is the most likely migration risk causing the failure?

Options:

  • A. Unsupported guest OS on the cloud hypervisor

  • B. Missing network route to an on-premises dependency

  • C. Insufficient vCPU quota for the migrated VM

  • D. Expired DNS record for the license service

Best answer: B

Explanation: The VM is running, but the application cannot reach a required on-premises license service. The route table lacks a path to the 10.20.0.0/16 network, so this is a missing network dependency rather than a compute or platform issue.

A common migration risk is overlooking network dependencies that existed implicitly on premises. In this case, the application targets 10.20.5.15 over TCP 443, but the cloud route table only has local VPC routing and a default route to a NAT gateway. NAT provides outbound internet access; it does not create private connectivity to an on-premises RFC 1918 network. The migration plan should include private connectivity, route propagation or static routes, and any required firewall rules for the on-premises dependency.

The key takeaway is to validate application dependency maps, not just whether the VM boots successfully.

  • Platform incompatibility is unlikely because the VM boots and reaches the application startup phase.
  • Resource quota would usually prevent provisioning or cause capacity errors, not a timeout to a specific private IP.
  • DNS expiration does not fit because the failing target is shown as an IP address, not an unresolved hostname.
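
The routing gap can be reasoned about directly from the excerpt. A short sketch using the standard library; the 172.31.0.0/16 VPC CIDR is a placeholder because the excerpt does not state the actual value:

import ipaddress

# Routes from the excerpt: the local VPC CIDR plus a default route to a NAT gateway.
routes = {
    "172.31.0.0/16": "local",
    "0.0.0.0/0": "nat-gateway",
}

def next_hop(destination_ip):
    """Longest-prefix match, as a route table evaluates a destination."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [net for net in routes if dest in ipaddress.ip_network(net)]
    best = max(matches, key=lambda net: ipaddress.ip_network(net).prefixlen)
    return routes[best]

print(next_hop("10.20.5.15"))                           # nat-gateway
print(ipaddress.ip_address("10.20.5.15").is_private)    # True: NAT cannot deliver this

# Without an entry such as 10.20.0.0/16 -> VPN or private interconnect,
# license traffic is sent toward the internet and times out.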

Question 88

Topic: Operations

A company is migrating several IaaS instances, a managed database, and object storage to a cloud environment. Compliance requires centralized retention of administrative activity and resource logs for 1 year, and the operations team needs to correlate events across resources during incidents. Some instances are replaced during autoscaling events.

Which configuration best supports the required operational visibility?

Options:

  • A. Store instance logs on each VM’s local disk

  • B. Export logs weekly to an administrator workstation

  • C. Collect only CPU and memory metrics in monitoring

  • D. Enable resource audit logs and agent log forwarding to a central logging service

Best answer: D

Explanation: The best approach is centralized log collection from both cloud resource audit sources and workload agents. This supports compliance retention, incident correlation, and visibility even when autoscaled instances are replaced.

Operational visibility depends on collecting the right logs from the right sources into a centralized logging platform. For this scenario, provider or resource audit logs capture administrative activity for services such as databases and object storage, while agents or native log forwarding capture operating system and application logs from IaaS instances. Central retention policies help meet the 1-year compliance requirement, and centralized indexing makes correlation possible during investigations. Local-only collection is risky for autoscaled or replaced instances because the evidence can disappear with the resource.

The key takeaway is to centralize log ingestion and retention instead of relying on individual resource storage or metrics alone.

  • Local VM logs fail because autoscaled instances can be terminated, deleting or isolating operational evidence.
  • Metrics only fails because CPU and memory metrics do not provide administrative activity or detailed event records.
  • Workstation exports fail because weekly manual collection is not timely, centralized, or reliable for incident correlation.
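
A minimal sketch of the central-collection idea, using only the standard library; the in-memory list stands in for the central logging service, and the resource names and record fields are illustrative:

import json
import logging

central_index = []   # stand-in for the central logging service's searchable index

class CentralHandler(logging.Handler):
    """Forwards structured records off the resource so evidence survives scale-in."""
    def emit(self, record):
        central_index.append(json.loads(self.format(record)))

def make_logger(resource_id, source):
    logger = logging.getLogger(resource_id)
    handler = CentralHandler()
    handler.setFormatter(logging.Formatter(
        '{"resource": "%(name)s", "source": "' + source + '", "msg": "%(message)s"}'))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

# Audit-style event from a managed service and an agent event from an IaaS instance.
make_logger("orders-db", "resource-audit").info("user changed retention policy")
make_logger("web-vm-0042", "agent").info("application restarted after deploy")

# Incident correlation: one query across every resource, even terminated instances.
print([event for event in central_index if "retention" in event["msg"]])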

Question 89

Topic: Troubleshooting

A private IaaS cloud uses VLAN-backed tenant networks between hypervisor hosts and a top-of-rack switch. After a VM is migrated to a new host, it cannot reach its default gateway on VLAN 120. The VM port group tags frames with VLAN 120, and the host must carry several tenant VLANs on the same physical uplink. The switch port connected to the new host is configured as an access port in VLAN 120. Which change BEST resolves the issue?

Options:

  • A. Configure the switch port as a trunk allowing VLAN 120.

  • B. Change the VM port group to send untagged frames.

  • C. Remove VLAN 120 from the trunk allowed list.

  • D. Configure the hypervisor uplink as access VLAN 120.

Best answer: A

Explanation: The issue is a VLAN tagging mismatch between the hypervisor and the physical switch. Because the hypervisor is tagging VLAN 120 and the same uplink must carry multiple tenant VLANs, the switch-facing port should be a trunk that allows VLAN 120.

In VLAN-backed cloud networking, a hypervisor uplink that carries multiple tenant networks normally connects to a trunk port. The trunk preserves VLAN tags so the physical switch and virtual switch can separate tenant traffic. An access port is intended for a single untagged VLAN; it does not match a hypervisor uplink that sends tagged frames for several VLANs. Configuring the switch port as a trunk and allowing VLAN 120 restores the expected path for the migrated VM while still supporting other tenant VLANs on the same uplink. The key signal is that the VM port group already tags VLAN 120, so the physical port must accept tagged VLAN 120 traffic.

  • Untagged VM traffic would only fit an access-port design and would not satisfy the need to carry several tenant VLANs on the uplink.
  • Access hypervisor uplink would collapse the connection to one untagged VLAN and break the multitenant trunk requirement.
  • Removing VLAN 120 would explicitly block the tenant VLAN needed by the migrated VM.
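
The tagging mismatch can be expressed as a simple check. The port modes and VLAN lists below are illustrative, and real switch configuration syntax is vendor-specific:

# The hypervisor uplink carries several tenant VLANs as tagged frames.
uplink_tagged_vlans = {110, 120, 130}

def port_accepts_tagged_vlan(mode, allowed_vlans, vlan):
    """A trunk accepts tagged VLANs on its allowed list; an access port does not."""
    return mode == "trunk" and vlan in allowed_vlans

# Current switch port: access mode, so tagged VLAN 120 frames are dropped.
print(port_accepts_tagged_vlan("access", {120}, 120))                 # False
# After the fix: trunk mode with the tenant VLANs, including 120, allowed.
print(port_accepts_tagged_vlan("trunk", uplink_tagged_vlans, 120))    # True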

Question 90

Topic: Devops Fundamentals

A cloud engineering team is standardizing DevOps tooling. The requirement is to automatically run build, test, and deployment jobs when application code changes are pushed to a repository. Which TWO tools are designed for this purpose? (Select TWO.)

Options:

  • A. Docker

  • B. GitHub Actions

  • C. Jenkins

  • D. Terraform

  • E. Ansible

  • F. Git

Correct answers: B and C

Explanation: GitHub Actions and Jenkins are DevOps automation tools used to run CI/CD workflows. Both can execute build, test, and deployment jobs in response to source code changes, matching the stated requirement.

The core concept is tool purpose within a DevOps workflow. CI/CD tools orchestrate repeatable pipeline jobs, such as compiling code, running tests, creating artifacts, and deploying to cloud environments after a repository event. GitHub Actions provides workflow automation tied closely to repository events, while Jenkins provides a flexible automation server for pipeline execution. Other tools in the list may participate in a pipeline, but they do not primarily provide the CI/CD orchestration function requested in the scenario.

The key distinction is between running the pipeline and supporting tasks inside or around the pipeline.

  • Source control only fails because Git tracks and versions code but does not primarily orchestrate CI/CD jobs.
  • Container packaging fails because Docker builds and runs containers rather than managing pipeline workflow execution.
  • Infrastructure provisioning fails because Terraform provisions infrastructure as code, not application build and test automation.
  • Configuration automation fails because Ansible manages configuration and task automation, but it is not primarily a CI/CD pipeline orchestrator.
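
As a toy illustration of what a CI/CD orchestrator automates on a repository event, the stage functions below stand in for real build, test, and deployment jobs; this is not GitHub Actions or Jenkins syntax:

def build():
    print("compiling code and packaging the artifact")

def test():
    print("running unit and integration tests")

def deploy():
    print("deploying the artifact to the target environment")

pipeline = [build, test, deploy]

def on_push(branch):
    """Run the pipeline automatically when code is pushed, as a CI/CD tool would."""
    print(f"push received on {branch}; starting pipeline")
    for stage in pipeline:
        stage()

on_push("main")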

Continue with full practice

Use the CompTIA Cloud+ CV0-004 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try CompTIA Cloud+ CV0-004 on Web View CompTIA Cloud+ CV0-004 Practice Test

Free review resource

Read the CompTIA Cloud+ CV0-004 Cheat Sheet on Tech Exam Lexicon for concept review before another timed run.

Revised on Thursday, May 14, 2026