
AWS SOA-C03: 24 Sample Questions & Simulator

SOA-C03 sample questions, mock-exam practice, and simulator access with detailed explanations in the IT Mastery app on web, iOS, and Android.

SOA-C03 is AWS’s CloudOps Engineer Associate certification for candidates who need strong operational judgment across monitoring, remediation, reliability, automation, security, and networking. If you are searching for SOA-C03 sample questions, a practice test, mock exam, or exam simulator, this is the main IT Mastery page to start on web and continue on iOS or Android with the same account.

Interactive Practice Center

Start a practice session for AWS Certified CloudOps Engineer - Associate (SOA-C03) below. For the best experience, open the full app in a new tab and navigate with swipes/gestures or the mouse wheel, just like on your phone or tablet.


A small set of questions is available for free preview. Subscribers can unlock full access by signing in with the same account they use on web and mobile.

Prefer to practice on your phone or tablet? Download the IT Mastery – AWS, Azure, GCP & CompTIA exam prep app for iOS or the IT Mastery app on Google Play (Android), and use the same account across web and mobile.

What this SOA-C03 practice page gives you

  • a direct route into the IT Mastery simulator for SOA-C03
  • topic drills and mixed sets across monitoring, reliability, automation, security, and networking
  • detailed explanations that show why the best CloudOps answer is correct
  • a clear free-preview path before you subscribe
  • the same account across web and mobile

SOA-C03 exam snapshot

  • Vendor: AWS
  • Official exam name: AWS Certified CloudOps Engineer - Associate (SOA-C03)
  • Exam code: SOA-C03
  • Items: 65 total, including 50 scored and 15 unscored
  • Exam time: 130 minutes
  • Question types: multiple-choice and multiple-response
  • Passing score: 720 scaled

SOA-C03 questions usually reward the option that improves observability, reduces operational risk, and applies the lowest-risk remediation path rather than a broad redesign.

Topic coverage for SOA-C03 practice

  • Monitoring, Logging, Analysis, Remediation, and Performance Optimization: 22%
  • Reliability and Business Continuity: 22%
  • Deployment, Provisioning, and Automation: 22%
  • Security and Compliance: 16%
  • Networking and Content Delivery: 18%

How to use the SOA-C03 simulator efficiently

  1. Start with domain drills so you can separate monitoring/remediation gaps from reliability, automation, security, or networking gaps.
  2. Review every miss until you can explain the telemetry, automation, failover, security-control, or networking decision behind the best answer.
  3. Move into mixed sets once you can shift between incident response, DR, Systems Manager, CloudFormation, CloudWatch, and network-troubleshooting scenarios without hesitation.
  4. Finish with timed runs so the 130-minute pace feels routine before exam day.

Free preview vs premium

  • Free preview: a smaller web set so you can validate the question style and explanation depth.
  • Premium: the full SOA-C03 bank with 3,592 questions, focused drills, mixed sets, detailed explanations, and progress tracking across web and mobile.

24 SOA-C03 sample questions with detailed explanations

These sample questions include the same mix of single-answer and multiple-response items you should practice for SOA-C03. Use them to check your readiness here, then move into the full IT Mastery question bank for broader timed coverage.

Question 1

Topic: Content Domain 5: Networking and Content Delivery

An AWS Site-to-Site VPN shows both tunnels as UP. Instances in a private subnet still cannot reach an on-premises network (10.20.0.0/16). Security groups and network ACLs allow the traffic.

Which VPC configuration is required for the subnet’s traffic to be sent through the VPN?

  • A. Add inbound rule to security group for on-prem CIDR
  • B. Turn on VPC Flow Logs for the subnets
  • C. Enable VPC DNS hostnames and DNS resolution
  • D. Add route to on-prem CIDR via virtual private gateway

Best answer: D

Explanation: A VPN tunnel being UP only indicates the tunnel’s control plane is established. The VPC still needs an explicit route (static route or propagated route) that points the on-premises CIDR to the virtual private gateway so the data plane traffic is forwarded over the VPN.

For Site-to-Site VPN connectivity, VPC routing determines whether traffic ever enters the VPN. Even with both VPN tunnels UP, instances will not reach on-premises networks unless the subnet-associated route table has a matching route for the on-prem CIDR that targets the VPN attachment (for example, a virtual private gateway), either as a static route or via route propagation (BGP/propagation).

If that route is missing, traffic follows the remaining best match (often the NAT gateway or no route), so connectivity fails despite a healthy tunnel state. DNS settings and logging features can help with name resolution troubleshooting and visibility, but they do not control VPC packet forwarding to the VPN.
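The routing behavior above can be sketched in a few lines of Python. The route table below is a hypothetical example built with the standard ipaddress module, not an AWS API call:

```python
import ipaddress

def select_route(route_table, dest_ip):
    """Return the target of the most specific (longest-prefix) matching route."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: traffic is dropped
    # VPC routing picks the most specific match (longest prefix).
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical private-subnet route table.
routes = {
    "10.0.0.0/16": "local",      # the VPC itself
    "0.0.0.0/0": "nat-gateway",  # default route
}

# Without a route for the on-prem CIDR, traffic to 10.20.x.x follows the
# default route to the NAT gateway and never enters the VPN.
print(select_route(routes, "10.20.5.10"))  # nat-gateway

# Adding the on-prem route to the virtual private gateway fixes it.
routes["10.20.0.0/16"] = "virtual-private-gateway"
print(select_route(routes, "10.20.5.10"))  # virtual-private-gateway
```

The same longest-prefix logic explains why a healthy tunnel state alone changes nothing: the packet's fate is decided entirely by the subnet's route table.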


Question 2

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

An operations engineer needs to improve an Amazon EC2 instance’s EBS storage performance with minimal downtime. Which statement is INCORRECT about safely modifying EBS volume size, IOPS, and throughput?

  • A. You can decrease an EBS volume size with ModifyVolume.
  • B. Elastic Volumes can modify in-use volumes without detaching.
  • C. After resizing, expand the partition/filesystem to use new space.
  • D. gp3 lets you change IOPS/throughput independently of size.

Best answer: A

Explanation: Amazon EBS Elastic Volumes supports online increases to volume size and performance settings, and gp3 decouples IOPS/throughput from size for tuning with minimal disruption. However, EBS does not support reducing a volume’s size in place. To get a smaller volume, you must migrate data to a newly created smaller volume (for example, via snapshot/restore).

The core concept is using EBS Elastic Volumes (ModifyVolume) to tune storage performance safely while the volume remains attached and available. You can typically increase volume size and adjust performance characteristics (IOPS and, for supported types, throughput) with little or no downtime, then complete any required OS-level steps to realize the change.

Key operational steps are:

  • Modify the volume (size/type/IOPS/throughput) and monitor the volume modification state until it is optimizing/completed.
  • If you increased size, extend the partition (if used) and grow the filesystem so the instance can use the new capacity.
  • Use gp3 when appropriate to set IOPS and throughput independently of size.

A crucial limitation is that EBS volumes cannot be shrunk in place; downsizing requires creating a new smaller volume and migrating data to it.
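The shrink limitation can be expressed as a small hypothetical validator. This mirrors the rule that Elastic Volumes can grow a gp3 volume or retune IOPS/throughput in place but never reduce size; it is an illustration, not the EBS API:

```python
def validate_ebs_modification(current_size_gib, new_size_gib,
                              new_iops=None, new_throughput=None):
    """Illustrative Elastic Volumes rules for a gp3 volume."""
    if new_size_gib < current_size_gib:
        # EBS cannot shrink a volume in place; migrate to a new, smaller volume.
        raise ValueError("decreasing volume size is not supported")
    steps = []
    if new_size_gib > current_size_gib:
        steps.append("modify size, then extend the partition and grow the filesystem")
    if new_iops is not None or new_throughput is not None:
        # gp3 lets IOPS and throughput change independently of size.
        steps.append("modify performance settings; no OS-level step needed")
    return steps

# Growing from 100 GiB to 200 GiB while raising IOPS yields two follow-up steps.
print(validate_ebs_modification(100, 200, new_iops=6000))
```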


Question 3

Topic: Content Domain 5: Networking and Content Delivery

An operations team uses Amazon CloudWatch Internet Monitor to detect internet connectivity degradation to an ALB endpoint. The monitor publishes two CloudWatch metrics for a 5-minute period: PacketsSent (count) and PacketsLost (count). For the most recent 5-minute datapoint, PacketsSent is 12,000 and PacketsLost is 420.

The team wants a CloudWatch alarm based on a metric math expression that outputs packet loss as a percentage (0-100%). Which metric math expression correctly produces the packet loss percentage for this datapoint (round to 1 decimal place)?

  • A. 100 * (PacketsSent / PacketsLost)
  • B. 100 * (PacketsLost / PacketsSent)
  • C. 100 - (100 * (PacketsLost / PacketsSent))
  • D. PacketsLost / PacketsSent

Best answer: B

Explanation: To alarm on connectivity degradation, you often need packet loss in percent rather than raw counts. For this datapoint, packet loss percentage is computed as lost packets divided by sent packets, multiplied by 100. Using that formula yields a 3.5% loss rate for 420 lost out of 12,000 sent.

CloudWatch network monitoring often provides raw counters (for example, packets sent and packets lost) that you convert into a rate to detect degradation consistently. A metric math expression can transform the two count metrics into a percentage that you can alarm on.

Compute packet loss percent for the datapoint:

  • Divide lost by sent: 420 / 12,000 = 0.035
  • Convert to percent: 0.035 × 100 = 3.5%

An expression of the form 100 * (lost / sent) produces a 0-100 value that directly represents packet loss percentage; subtracting from 100 instead produces success percentage.
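You can verify the arithmetic directly. This sketch mirrors a metric math expression of the form 100 * (m2 / m1), where m1 is PacketsSent and m2 is PacketsLost:

```python
def packet_loss_percent(packets_lost, packets_sent):
    """Equivalent of the metric math expression 100 * (PacketsLost / PacketsSent)."""
    if packets_sent == 0:
        return 0.0  # avoid division by zero when no packets were sent
    return round(100 * packets_lost / packets_sent, 1)

print(packet_loss_percent(420, 12_000))  # 3.5
```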


Question 4

Topic: Content Domain 5: Networking and Content Delivery

An operations team uses AWS WAF on an Application Load Balancer (ALB) to protect a public web application. After enabling a new AWS Managed Rules rule group, users report intermittent 403 responses. The team must validate whether the rule group is effective and support an investigation by identifying exactly which WAF rule matched and what action WAF took for each request.

Which solution best meets this requirement?

  • A. Use AWS CloudTrail event history to find WAF UpdateWebACL events during the incident
  • B. Enable ALB access logging to Amazon S3 and review requests with 403 status codes
  • C. Enable VPC Flow Logs for the ALB subnets and review REJECT records
  • D. Enable AWS WAF full logging and deliver logs with Kinesis Data Firehose to Amazon S3

Best answer: D

Explanation: The deciding factor is needing per-request visibility into which AWS WAF rule matched and the resulting WAF action. AWS WAF full logs provide rule identifiers and request context for each evaluated request, which supports both effectiveness validation and incident investigation.

To validate and investigate a network protection control like AWS WAF, you often need request-by-request evidence of enforcement (which rule matched and what action was applied). CloudWatch metrics can show counts (allowed/blocked), but they do not provide the detailed request record needed to attribute a specific 403 to a specific WAF rule.

Enable AWS WAF logging and send the logs to a destination such as Amazon S3 via Kinesis Data Firehose. These logs contain fields that identify the terminating/matched rule and the action taken, allowing you to confirm whether the managed rule group is blocking the intended traffic and to troubleshoot false positives. The closest alternative, ALB access logs, shows the 403 at the load balancer but not the WAF rule decision.
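WAF full logs are JSON records, so attributing a 403 to a rule is a matter of reading two fields. The record below is a trimmed, illustrative sample; real records carry many more fields:

```python
import json

# Trimmed, illustrative WAF log record (real records include timestamps,
# the full web ACL ARN, rule group lists, headers, and more).
record = json.loads("""
{
  "action": "BLOCK",
  "terminatingRuleId": "AWS-AWSManagedRulesCommonRuleSet",
  "httpRequest": {"clientIp": "203.0.113.10", "uri": "/login"}
}
""")

def summarize(rec):
    """Attribute a request decision to the rule that made it."""
    return (rec["httpRequest"]["uri"], rec["action"], rec["terminatingRuleId"])

print(summarize(record))
```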


Question 5

Topic: Content Domain 3: Deployment, Provisioning, and Automation

A company maintains a “golden” Amazon Linux AMI in a tooling account in us-east-1. Operations must distribute each approved AMI release to two production accounts and to two Regions (us-east-1 and us-west-2).

Requirements:

  • All AMIs must remain encrypted.
  • AMIs must never be publicly accessible.
  • Rollback must be possible by reverting Auto Scaling groups to the prior AMI version, and the last two AMI versions must be retained for 30 days.

Which TWO actions should the engineer AVOID?

  • A. Temporarily make the AMI public so other accounts/Regions can copy it, then set it back to private
  • B. Tag each AMI with an immutable version label and keep launch template versions so Auto Scaling can be reverted to the earlier AMI
  • C. Deregister the prior AMI and delete its snapshots immediately after promotion
  • D. Use EC2 Image Builder distribution settings to distribute the AMI to the required accounts and Regions with encryption enabled
  • E. Share the AMI to the target accounts, grant required KMS permissions, and have each target account copy the AMI into its own account/Region
  • F. Keep current and previous AMI IDs in SSM Parameter Store and update Auto Scaling launch templates to roll back by repointing to previous

Best answers: A, C

Explanation: To distribute AMIs safely across accounts and Regions, the process must preserve encryption and control access through explicit sharing and KMS permissions. Rollback depends on keeping prior AMI versions available for a defined retention window. Any action that exposes the AMI publicly or removes the previous image artifacts breaks these stated requirements.

For cross-account and cross-Region AMI distribution, the safe operational pattern is to keep images private, share them only with required accounts, and ensure any encrypted AMI can be used/copied by granting the destination accounts the necessary KMS permissions (or by distributing encrypted copies with appropriate keys). Rollback requires that the prior AMI still exists (and its snapshots, if EBS-backed) so Auto Scaling can be reverted to a known-good image.

Actions to avoid are those that directly violate the explicit constraints:

  • Making an AMI public, even briefly, breaches the “never publicly accessible” requirement.
  • Deregistering/deleting the previous version immediately prevents meeting the 30-day retention and breaks rollback capability.

Everything else is compatible with controlled distribution and operational rollback by keeping versioned AMIs and switching launch template/parameter references as needed.
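The retention side of the rollback requirement can be expressed as a small hypothetical helper; keep_last and retention_days mirror the stated constraints, and the function is an illustration rather than an AWS API:

```python
from datetime import datetime, timedelta

def amis_safe_to_deregister(amis, now, keep_last=2, retention_days=30):
    """Never remove the newest `keep_last` versions, and never remove
    anything younger than `retention_days` (illustrative policy check)."""
    by_age = sorted(amis, key=lambda a: a["created"], reverse=True)
    protected = {a["id"] for a in by_age[:keep_last]}
    cutoff = now - timedelta(days=retention_days)
    return [a["id"] for a in by_age
            if a["id"] not in protected and a["created"] < cutoff]

now = datetime(2024, 6, 1)
amis = [
    {"id": "ami-v3", "created": datetime(2024, 5, 28)},
    {"id": "ami-v2", "created": datetime(2024, 5, 1)},
    {"id": "ami-v1", "created": datetime(2024, 3, 1)},
]
print(amis_safe_to_deregister(amis, now))  # ['ami-v1']
```

Deleting the prior version immediately after promotion (option C) is exactly what this kind of guardrail exists to prevent.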


Question 6

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

An application runs on an Auto Scaling group behind an ALB. Users report intermittent 503 errors and slow responses. You must restore performance quickly, and Security requires keeping CloudWatch Logs for at least 90 days for investigations.

Exhibit: CloudWatch (last 15 minutes)

  • ALB TargetResponseTime (p95): 0.4s -> 6.8s
  • ALB HTTPCode_Target_5XX_Count: 0 -> 1,200
  • ALB HealthyHostCount: 8 -> 3
  • EC2 CPUUtilization (avg): 20% -> 30%
  • CWAgent mem_used_percent: 94% -> 97%
  • App logs: "java.lang.OutOfMemoryError" then "Health check failed"

Which TWO actions should you AVOID during remediation? (Select TWO.)

  • A. Use CloudWatch Logs Insights to find when OOM errors began and which instances are affected
  • B. Add an alarm on HealthyHostCount and notify an SNS topic when it drops
  • C. Delete and recreate the CloudWatch Logs log group to remove noisy errors
  • D. Increase Auto Scaling group desired capacity and scale on the memory metric
  • E. Use AWS Systems Manager Run Command to restart the application process on affected instances
  • F. Disable the ALB health check to keep all instances InService

Best answers: C, F

Explanation: The metrics show a drop in healthy targets and a spike in target 5XXs, while logs indicate OutOfMemoryError events. Remediation should preserve health-based routing and retain logs for investigation. Actions that keep unhealthy instances serving traffic or delete log evidence directly violate the stated availability and retention requirements.

CloudWatch metrics and logs together indicate the ALB is sending traffic to fewer healthy targets (HealthyHostCount drops) while target 5XXs and p95 latency spike, and the instances are memory-exhausted (mem_used_percent ~97% with OutOfMemoryError). Safe remediation focuses on identifying impacted instances and restoring healthy capacity while preserving observability and required evidence.

  • Use logs/Logs Insights to correlate OOM events with instance IDs and timing.
  • Restore capacity (scale out/up) and/or restart the failed process on impacted instances.
  • Strengthen detection with alarms on availability indicators like HealthyHostCount.

Any step that bypasses health checks or removes CloudWatch Logs undermines availability or violates the explicit log-retention requirement.
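For the first safe step, a CloudWatch Logs Insights query along these lines can correlate OOM events with specific instances; the log group selection and error string come from the scenario:

```
fields @timestamp, @logStream, @message
| filter @message like /OutOfMemoryError/
| stats count(*) as oomCount by @logStream
| sort oomCount desc
```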


Question 7

Topic: Content Domain 5: Networking and Content Delivery

A company runs the same public HTTPS API in two AWS Regions (us-east-1 and eu-west-1) behind Application Load Balancers (ALBs). Users are primarily in North America and Europe.

Requirements:

  • Route users to the Region with the best performance.
  • If a Region becomes unhealthy, shift traffic automatically to the other Region.
  • Security/compliance: do not add a new unauthenticated public health-check endpoint; use existing CloudWatch alarms for alerting.
  • Ops/cost: use a managed DNS solution with minimal ongoing maintenance.

Which Route 53 configuration best meets these requirements?

  • A. Latency-based alias records and CloudWatch-alarm Route 53 health checks
  • B. Weighted routing 50/50 across both ALBs with Route 53 health checks
  • C. Failover routing with us-east-1 primary and eu-west-1 secondary
  • D. Simple routing with two alias records pointing to both ALBs

Best answer: A

Explanation: Latency-based routing is the Route 53 policy that targets the lowest-latency endpoint for users in different geographies. Pairing it with Route 53 health checks driven by CloudWatch alarms removes unhealthy Regions from DNS responses to achieve automatic failover. Using CloudWatch-alarm health checks also avoids creating a new unauthenticated internet-facing health-check path.

Use Route 53 latency-based routing when the primary goal is best client performance across Regions. Create two latency records (alias to each ALB) for the same name and associate each record with a Route 53 health check that is based on an existing CloudWatch alarm (for example, ALB HTTPCode_ELB_5XX or UnHealthyHostCount). When an alarm breaches, Route 53 treats that endpoint as unhealthy and stops returning it, effectively shifting traffic to the remaining healthy Region while still preferring the lowest-latency Region during normal operation. This approach is fully managed, integrates with existing monitoring, and avoids exposing a new public health-check endpoint just for DNS probing.
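As a sketch, one of the two latency records might look like this inside a Route 53 change batch; the record name, alias values, and health-check ID below are placeholders, and the eu-west-1 record mirrors it with its own SetIdentifier and Region:

```json
{
  "Name": "api.example.com.",
  "Type": "A",
  "SetIdentifier": "use1",
  "Region": "us-east-1",
  "AliasTarget": {
    "HostedZoneId": "ALB_HOSTED_ZONE_ID",
    "DNSName": "my-alb-use1.us-east-1.elb.amazonaws.com.",
    "EvaluateTargetHealth": true
  },
  "HealthCheckId": "CLOUDWATCH_ALARM_HEALTH_CHECK_ID"
}
```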


Question 8

Topic: Content Domain 5: Networking and Content Delivery

You are configuring Route 53 private hosted zones (PHZs) to implement split-horizon DNS for example.internal. Select THREE statements that are true.

  • A. A private hosted zone can be associated only with VPCs in the same AWS Region as the hosted zone.
  • B. If VPC-A is associated with a private hosted zone, any VPC that is peered with VPC-A can automatically resolve that private hosted zone without being explicitly associated.
  • C. A private hosted zone’s records are automatically resolvable from every VPC in the same AWS account and Region.
  • D. To associate a private hosted zone with a VPC in a different AWS account, the hosted zone owner must create a VPC association authorization and the VPC owner then completes the association.
  • E. Split-horizon DNS can be implemented by creating a public hosted zone and a private hosted zone with the same domain name; queries from associated VPCs resolve to the private zone.
  • F. A single VPC cannot be associated with more than one private hosted zone that has the same domain name.

Best answers: D, E, F

Explanation: Split-horizon DNS with Route 53 relies on how Route 53 selects between public and private hosted zones of the same name based on the source of the DNS query. Private hosted zone visibility is controlled strictly by explicit VPC associations (including cross-account associations via authorization), and Route 53 enforces uniqueness to avoid conflicting private zones per VPC.

The core mechanism is that Route 53 answers from a private hosted zone only when the DNS query originates from within a VPC that is explicitly associated with that private hosted zone. This enables split-horizon DNS by using the same domain name in both a public hosted zone and a private hosted zone, with Route 53 returning different answers depending on whether the resolver is inside an associated VPC.

Operationally, keep these association rules in mind:

  • A VPC must be explicitly associated to use a private hosted zone.
  • Cross-account association requires an authorization created by the hosted-zone owner, then association by the VPC owner.
  • To avoid ambiguity, a VPC can be associated with only one private hosted zone for a given domain name.

The key takeaway is that DNS visibility for private hosted zones does not “inherit” through networking constructs like VPC peering; it is controlled by Route 53 associations.


Question 9

Topic: Content Domain 4: Security and Compliance

An operations engineer reviews AWS Trusted Advisor security checks for an AWS account and must choose the first remediation to reduce the highest account-wide risk.

Trusted Advisor: Security

  • MFA on Root Account: Red (not enabled)
  • Security Groups - Unrestricted access: Red (some rules allow 0.0.0.0/0)
  • IAM Access Key Rotation: Yellow (some keys are old)

Which remediation should be prioritized first?

  • A. Enable MFA on the AWS account root user
  • B. Rotate all IAM access keys that are older than policy
  • C. Enable AWS CloudTrail in all AWS Regions
  • D. Remove 0.0.0.0/0 inbound rules from security groups

Best answer: A

Explanation: Trusted Advisor’s “MFA on Root Account” check is meant to prevent full account takeover because the root user cannot be constrained by IAM policies and has unrestricted access. Enabling MFA on the root user is the fastest way to reduce the highest blast-radius risk before addressing resource-level exposure and hygiene items.

Trusted Advisor security checks are best prioritized by blast radius and privilege level. The AWS account root user is a highly privileged identity that can perform any action in the account, and its credentials are not governed by IAM permission boundaries or SCPs. If root MFA is not enabled, a compromised password can lead to immediate, complete account compromise.

After root MFA is enabled, prioritize other findings based on exposure (for example, which ports are open to the internet and whether they are required) and identity hygiene (such as access key rotation). The key fact is that root MFA directly protects the highest-privilege principal in the account.


Question 10

Topic: Content Domain 4: Security and Compliance

A company stores customer documents in three Amazon S3 buckets. The operations team currently runs a nightly workflow that copies all new objects to an EC2 instance and uses custom scripts to search for PII (for example, passport numbers). This workflow is slow, requires frequent script updates, and adds compute and data transfer cost.

The security team wants an AWS-native way to continuously discover and classify sensitive data in these S3 buckets and produce auditable findings, with minimal ongoing operational effort and no application changes.

Which change is the best optimization?

  • A. Use AWS Config rules to evaluate S3 bucket policies and tag objects that contain PII
  • B. Enable Amazon GuardDuty S3 Protection to detect sensitive data in the S3 buckets
  • C. Generate daily S3 Inventory reports and query them with Amazon Athena to locate PII
  • D. Enable Amazon Macie and create a scheduled sensitive data discovery job for the S3 buckets

Best answer: D

Explanation: Amazon Macie is purpose-built to discover and classify sensitive data in Amazon S3 using managed and custom data identifiers and to produce findings for audit and response workflows. Enabling Macie and running scheduled discovery jobs removes the need for a custom EC2-based scanning pipeline. The tradeoff is ongoing Macie scanning cost, but operational effort and fragility are significantly reduced.

The core need is automated sensitive data discovery and classification for S3 objects with auditable findings and low operational overhead. Amazon Macie natively scans S3 buckets (without copying data into custom scanners), uses managed data identifiers for common PII types, and generates findings that can be routed to downstream systems (for example, Security Hub, EventBridge, and SNS) for triage and remediation.

A practical operational approach is:

  • Enable Macie in the account (or as the delegated administrator for an organization).
  • Create a scheduled discovery job targeting the specific buckets/prefixes in scope.
  • Review and route findings to the security team’s existing alerting/ticketing path.

This replaces ongoing maintenance of custom scripts and scanning infrastructure; the main tradeoff is paying for Macie’s analysis based on what you choose to scan and how often.


Question 11

Topic: Content Domain 2: Reliability and Business Continuity

An application’s EC2 Auto Scaling group (ASG) is “thrashing” (frequent scale out and scale in) after load spikes, creating operational noise. The ASG uses scaling policies.

Which TWO statements are UNSAFE guidance for tuning cooldowns and stabilization windows to reduce thrashing? (Select TWO.)

  • A. For step scaling, set a cooldown long enough for the metric to reflect the last scaling action before allowing another scaling action.
  • B. Increase the target tracking scale-in stabilization window to ignore brief metric drops after a spike.
  • C. Use different cooldown durations for scale-out and scale-in when scale-out effects take longer to show in metrics than scale-in effects.
  • D. Reduce the target tracking scale-in stabilization window to the minimum value so scale-in happens quickly after spikes.
  • E. Set the ASG default cooldown to 0 seconds so scaling reacts as fast as possible.
  • F. Configure instance warm-up (or estimated warm-up) to match instance boot time so the ASG waits before evaluating further scale-out actions.

Best answers: D, E

Explanation: Cooldowns and stabilization windows are meant to give metrics time to reflect the last scaling action and to ignore short-lived metric fluctuations. Guidance that removes that “settling time” (for example, zero cooldown or overly aggressive scale-in stabilization) commonly causes oscillation, repeated scaling events, and alert noise. The safer guidance adds controlled delay aligned to instance warm-up and metric behavior.

To prevent scaling thrash, you want scaling decisions to be based on stable signals, not on metrics that haven’t yet “caught up” to the last scaling action.

Cooldowns (commonly for step/simple scaling) and instance warm-up help by delaying additional scaling until new instances are actually contributing to the metric. Target tracking uses a scale-in stabilization window to avoid scaling in due to temporary metric dips right after a spike.

Practical tuning approach:

  • Set warm-up/cooldown roughly to the time it takes new capacity to become effective.
  • Increase the scale-in stabilization window when you see rapid in/out oscillation.
  • Consider asymmetric timing (often longer on scale-out) if scale-out impact is delayed in metrics.

The unsafe guidance is the advice that shortens or removes these delays, which makes scaling react to noise instead of steady demand.
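The effect of a zero cooldown is easy to see in a toy gate function (purely illustrative, with times in seconds):

```python
def scaling_allowed(now, last_action_time, cooldown_seconds):
    """Gate further scaling until the metric has had time to reflect
    the previous action (illustrative cooldown check)."""
    return (now - last_action_time) >= cooldown_seconds

# With a 300s cooldown, a request 120s after the last action is suppressed...
print(scaling_allowed(now=120, last_action_time=0, cooldown_seconds=300))  # False
# ...but with cooldown 0, every metric fluctuation triggers scaling (thrash).
print(scaling_allowed(now=120, last_action_time=0, cooldown_seconds=0))    # True
```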


Question 12

Topic: Content Domain 4: Security and Compliance

Select THREE statements that are true about AWS resource-based policies (for example, Amazon S3 bucket policies and AWS KMS key policies) and high-level policy evaluation.

  • A. For cross-account access to an S3 bucket, the caller’s IAM policy and the bucket policy must both allow the requested action.
  • B. An explicit Deny in any applicable policy overrides any Allow.
  • C. KMS key policies cannot specify principals; only IAM identity policies can grant access to a KMS key.
  • D. To allow a principal in another AWS account to use a KMS key, the key policy must allow that principal (or account), and the principal must also have IAM permissions in its own account.
  • E. If an S3 bucket policy allows an action, IAM identity policies are ignored for that request.
  • F. Resource-based policies cannot use conditions and must allow or deny unconditionally.

Best answers: A, B, D

Explanation: Resource-based policies (like S3 bucket policies and KMS key policies) are evaluated together with identity-based policies, and explicit denies always win. Cross-account access commonly requires permission both on the calling principal (IAM) and on the target resource (bucket/key). KMS additionally relies on the key policy as the resource control point for the key.

Resource-based policies attach to a specific resource (for example, an S3 bucket or a KMS key) and can grant permissions directly to principals, including principals from other AWS accounts. For most requests, AWS evaluates all applicable policy types together (identity-based policies, resource-based policies, session policies, permission boundaries, SCPs), and an explicit Deny anywhere overrides any Allows.

For cross-account access, you typically need:

  • Permission on the caller identity (an IAM policy in the caller’s account).
  • Permission on the target resource (for example, an S3 bucket policy or a KMS key policy) that trusts/allows that principal.

Key takeaway: resource-based policies don’t replace IAM evaluation; they participate in it, and conditions are supported to scope access safely.
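These rules can be condensed into a toy function; this is a deliberately simplified model of cross-account evaluation, not the full IAM logic:

```python
def cross_account_allowed(identity_allows, resource_allows, explicit_deny=False):
    """Cross-account evaluation in miniature: an explicit Deny anywhere wins;
    otherwise both the caller's IAM policy and the resource policy
    (bucket policy or key policy) must allow the action."""
    if explicit_deny:
        return False  # explicit Deny overrides any Allow
    return identity_allows and resource_allows

print(cross_account_allowed(True, True))                       # True
print(cross_account_allowed(True, False))                      # False: resource policy missing
print(cross_account_allowed(True, True, explicit_deny=True))   # False: Deny wins
```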


Question 13

Topic: Content Domain 2: Reliability and Business Continuity

A DynamoDB table named Orders has point-in-time recovery (PITR) enabled. During quarterly DR tests, operators restore the table to a timestamp from earlier that day and must validate the restore within 15 minutes while keeping read costs low. The current validation step runs a full table Scan on the restored table, which is slow and expensive.

Which change is the best optimization to the restore-and-validate procedure without breaking the constraints?

  • A. Keep the full Scan, but use strongly consistent reads
  • B. Enable a DynamoDB global table and validate by failing over Regions
  • C. Restore to a new table, then BatchGetItem a canary key set
  • D. Replace PITR tests with restoring an on-demand backup each quarter

Best answer: C

Explanation: PITR restores a DynamoDB table to a specific timestamp by creating a new table, so validation should focus on quickly proving data and access paths work. Using a small, predefined canary dataset and targeted reads (for example, BatchGetItem) validates the restore outcome within the time window and with minimal read cost. The tradeoff is that it’s sampling-based rather than a full-data verification.

The core PITR workflow is to restore to a new table at a chosen timestamp, wait until the restored table is ACTIVE, and then validate that the expected data is present and readable. For a large table, a full Scan is a poor fit for time- and cost-constrained DR tests because it consumes many RCUs and can exceed the 15-minute objective.

A practical validation approach is:

  • Run RestoreTableToPointInTime to a new table name.
  • Wait for DescribeTable to show ACTIVE.
  • Validate with targeted reads of a preselected canary set (for example, BatchGetItem for known order IDs) plus a basic application smoke test.

This reduces operational effort and cost while still providing strong evidence the PITR restore succeeded; full-table verification is the closest alternative but is not operationally efficient here.
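The validation step itself is just a comparison of a preselected canary set against items read back from the restored table. In practice the reads would come from BatchGetItem against the new table; the keys and values below are hypothetical:

```python
def validate_canary(fetched_items, expected_canaries):
    """Pass only if every canary key exists in the restored table
    with exactly the expected item (illustrative check)."""
    missing = [k for k, v in expected_canaries.items()
               if fetched_items.get(k) != v]
    return {"passed": not missing, "missing_or_mismatched": missing}

expected = {"order-1001": {"status": "SHIPPED"}, "order-2002": {"status": "PENDING"}}
restored = {"order-1001": {"status": "SHIPPED"}, "order-2002": {"status": "PENDING"}}

print(validate_canary(restored, expected))  # {'passed': True, 'missing_or_mismatched': []}
```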


Question 14

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

You are configuring Amazon EventBridge rules to route Amazon EC2 instance state-change events to operational targets.

Select THREE statements that are FALSE.

  • A. Use source: aws.ec2 and an EC2 state-change detail-type to match events.
  • B. A rule in one Region receives EC2 events from all Regions.
  • C. An EventBridge rule can directly stop/start an EC2 instance as a target.
  • D. To send events cross-account, target the other account’s event bus and allow events:PutEvents.
  • E. For a Lambda target, add a policy that allows events.amazonaws.com to invoke it.
  • F. EC2 state-change notifications require CloudTrail management events enabled.

Best answers: B, C, F

Explanation: EventBridge service events are scoped to the Region where they occur, and EC2 state-change notifications are published directly to EventBridge without requiring CloudTrail. Also, EventBridge rules don’t “take EC2 actions” by themselves; they route events to targets that perform actions, such as Lambda or SSM Automation, with the required permissions in place.

The core concept is how EventBridge service events are matched and routed. EC2 instance state changes are published as native EventBridge events in the same Region as the instance, and you match them with an event pattern (commonly source: aws.ec2 plus the EC2 state-change detail-type, and optionally detail.state).

The false statements are incorrect because:

  • EventBridge does not aggregate AWS service events across Regions automatically; rules evaluate events only in their Region.
  • CloudTrail is not required for EC2 state-change notifications; CloudTrail is used when you want to react to API calls (via CloudTrail events).
  • EventBridge routes events to targets; it does not directly stop/start instances without an action-performing target (for example, Lambda or SSM Automation) and appropriate IAM permissions.

Key takeaway: configure the right event pattern and choose an operational target that can perform the remediation action.
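As a sketch of the matching behavior, here is an EC2 state-change event pattern (the `source` and `detail-type` values are the real ones; the states and the toy matcher are illustrative, greatly simplified from EventBridge's full pattern semantics):

```python
# Sketch: an EventBridge event pattern for EC2 state changes, plus a toy
# matcher that mimics how a rule filters events delivered in its own Region.

PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

def matches(pattern, event):
    """Simplified match: every pattern field must list the event's value."""
    for key, allowed in pattern.items():
        if isinstance(allowed, dict):
            if not matches(allowed, event.get(key, {})):
                return False
        elif event.get(key) not in allowed:
            return False
    return True

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "region": "us-east-1",  # native events arrive in the Region they occur
    "detail": {"instance-id": "i-0abc", "state": "stopped"},
}
print(matches(PATTERN, event))  # True
```

A matched event is then routed to a target such as a Lambda function or SSM Automation runbook, which performs the actual remediation with its own IAM permissions.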


Question 15

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

Which statement best describes, at a high level, how the Amazon CloudWatch agent is configured on ECS/EKS to collect container logs and metrics?

  • A. It is deployed as a container on each worker node (for example, an EKS DaemonSet or an ECS daemon) with a CloudWatch agent configuration and IAM permissions to publish container metrics and ship logs to CloudWatch.
  • B. It is installed only on the EKS control plane and uses CloudTrail events to reconstruct application container logs and CPU/memory metrics.
  • C. It is configured by creating a CloudWatch Logs subscription filter that converts log events into container-level CPU and memory metrics.
  • D. It is enabled by turning on CloudWatch in the cluster settings, and ECS/EKS automatically collects all container logs and metrics without deploying anything.

Best answer: A

Explanation: In ECS and EKS, the CloudWatch agent is commonly run as a container on each node so it can read host/container telemetry locally and publish it to CloudWatch. You provide an agent configuration (for example, for Container Insights) and the required IAM permissions so the agent can write metrics and logs to CloudWatch.

The CloudWatch agent is software that you deploy and configure to collect metrics and logs from the compute environment and publish them to CloudWatch. In container orchestrators, the typical pattern is to run the agent on every worker node (for example, as a DaemonSet in EKS or as a per-instance daemon/agent task in ECS on EC2) so it can collect node and container telemetry locally. You supply the CloudWatch agent configuration (what metrics/log files to collect, and where to send them) and ensure the agent has IAM permissions to call CloudWatch and CloudWatch Logs APIs. This is different from CloudTrail (API auditing) and from CloudWatch Logs subscription filters (log streaming/processing), which do not replace the need to deploy the agent for metrics collection.
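As a rough illustration, a minimal agent configuration for EKS Container Insights looks like the following sketch. The cluster name is hypothetical, and the field names follow the agent's JSON configuration schema as commonly mounted into the DaemonSet via a ConfigMap:

```python
import json

# Sketch: a minimal CloudWatch agent configuration for EKS Container
# Insights, of the kind a DaemonSet mounts as a ConfigMap. The agent's
# IAM permissions (CloudWatch/CloudWatch Logs write access) come from the
# node role or IRSA, not from this file.
agent_config = {
    "logs": {
        "metrics_collected": {
            "kubernetes": {
                "cluster_name": "demo-cluster",     # hypothetical name
                "metrics_collection_interval": 60,  # seconds
            }
        },
        "force_flush_interval": 5,
    }
}
print(json.dumps(agent_config, indent=2))
```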


Question 16

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

You are tuning Amazon CloudWatch alarm notifications for a service to reduce alert fatigue while still reflecting dependent conditions across multiple metrics.

Select TWO true statements about CloudWatch composite alarms.

  • A. A composite alarm can reference alarms from multiple AWS Regions in a single rule.
  • B. A composite alarm uses an AlarmRule with AND/OR over other alarm states.
  • C. A composite alarm evaluates the underlying metrics directly and does not require other alarms.
  • D. Missing-data behavior is configured on the composite alarm with treatMissingData.
  • E. Disable actions on underlying metric alarms and attach notifications only to the composite alarm.
  • F. A composite alarm always ignores INSUFFICIENT_DATA from the alarms it references.

Best answers: B, E

Explanation: CloudWatch composite alarms don’t watch metrics directly; they combine the states of other alarms by using an AlarmRule expression. This lets you represent dependent conditions (for example, multiple alarms must be ALARM) and reduce alert fatigue by notifying only when the composite’s rule evaluates to ALARM.

The core concept is that a composite alarm is a “state aggregator”: it changes state based on an AlarmRule that references other CloudWatch alarms (typically metric alarms) and uses Boolean logic such as AND, OR, and NOT. This is useful operationally when you want one actionable page that represents a dependent condition across multiple metrics, instead of multiple redundant notifications.

A common pattern is:

  • Create individual metric alarms for each signal you care about.
  • Disable actions on those individual alarms.
  • Attach notification/remediation actions to the composite alarm so only the combined condition triggers.

Key takeaway: composite alarms evaluate other alarms’ states; metric evaluation and missing-data treatment are handled by the underlying metric alarms.
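The pattern above can be sketched as a `PutCompositeAlarm` call. The alarm names and SNS topic ARN are hypothetical; only the parameter dict is built here, so the sketch runs without AWS access (pass it to a boto3 CloudWatch client's `put_composite_alarm` to create the alarm):

```python
# Sketch: a composite alarm that pages only when BOTH underlying metric
# alarms are in ALARM. The underlying alarms keep evaluating their metrics
# but have their own actions disabled, so only this alarm notifies.
composite = {
    "AlarmName": "svc-degraded",
    "AlarmRule": "ALARM(high-latency) AND ALARM(high-5xx)",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:oncall"],  # hypothetical
    "ActionsEnabled": True,
}
print(composite["AlarmRule"])
```

AlarmRule expressions also support `OR` and `NOT`, so more nuanced dependent conditions (for example, "latency high AND deploy-in-progress is NOT firing") are expressible the same way.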


Question 17

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

When creating Amazon CloudWatch dashboards to summarize key service metrics and alarm states for an application fleet, which TWO statements are INCORRECT?

  • A. Use an Alarm widget to display the current state of key CloudWatch alarms.
  • B. A single dashboard can include widgets that show metrics from multiple AWS Regions.
  • C. A dashboard automatically creates a CloudWatch alarm for every metric that is added to the dashboard.
  • D. Use metric math or a SEARCH expression in a widget to aggregate fleet-level KPIs.
  • E. Dashboards can show alarm states without creating alarms by defining thresholds only in the widget.
  • F. Text widgets can be used to include operator notes and runbook pointers on the dashboard.

Best answers: C, E

Explanation: CloudWatch dashboards visualize data but do not generate alarm evaluations on their own. To summarize alarm states, you must create CloudWatch alarms and then add an Alarm widget to display their current state alongside key metrics (including aggregated fleet KPIs).

The core concept is that CloudWatch dashboards are for visualization and operational summarization, while CloudWatch alarms are separate resources that evaluate metrics and produce an alarm state.

In practice, fleet dashboards commonly:

  • Use metric widgets (including metric math and SEARCH) to roll up KPIs across many resources.
  • Use Alarm widgets to list the current state of existing alarms (OK/ALARM/INSUFFICIENT_DATA).
  • Combine widgets from multiple Regions on one dashboard, and add text widgets for operational context.

A dashboard widget cannot replace an alarm by “defining a threshold” in the dashboard, and adding a metric to a dashboard does not create an alarm for it.
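A fleet dashboard combining these widget types can be sketched as a `PutDashboard` body. The alarm ARN, SEARCH expression, and runbook URL are hypothetical; the dict mirrors the dashboard body JSON structure:

```python
# Sketch: a dashboard body with an alarm widget (existing alarms only),
# a metric widget using a SEARCH expression for a fleet-level KPI, and a
# text widget with operator notes. Serialize with json.dumps and pass as
# DashboardBody to PutDashboard.
dashboard_body = {
    "widgets": [
        {"type": "alarm", "x": 0, "y": 0, "width": 12, "height": 4,
         "properties": {"title": "Key alarms",
                        "alarms": ["arn:aws:cloudwatch:us-east-1:123456789012:alarm:high-5xx"]}},
        {"type": "metric", "x": 0, "y": 4, "width": 12, "height": 6,
         "properties": {"region": "us-east-1",
                        "metrics": [[{"expression":
                            "SEARCH('{AWS/EC2,InstanceId} MetricName=\"CPUUtilization\"', 'Average')",
                            "id": "fleet_cpu"}]]}},
        {"type": "text", "x": 12, "y": 0, "width": 12, "height": 4,
         "properties": {"markdown": "Runbook: https://wiki.example/runbooks/web"}},
    ]
}
print(len(dashboard_body["widgets"]))  # 3
```

Note that the alarm widget references alarms that already exist; nothing in the body creates or evaluates an alarm.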


Question 18

Topic: Content Domain 3: Deployment, Provisioning, and Automation

A company manages 300 Amazon EC2 Linux instances across multiple Auto Scaling groups. The instances are already managed by AWS Systems Manager (SSM) and are tagged with Baseline=Prod.

Operations must ensure a configuration baseline: the Amazon CloudWatch agent package must be installed, a standard agent config file must be present, and the agent service must be running. Today, the team uses an ad hoc SSM Run Command “once after launch” plus occasional manual re-runs, and drift keeps reappearing after instance replacements and human changes.

Which change is the best way to reduce operational effort and continuously remediate configuration drift without rebuilding AMIs?

  • A. Bake the CloudWatch agent and configuration into a new golden AMI and update all Auto Scaling groups to use it
  • B. Create an SSM State Manager association that targets Baseline=Prod and periodically enforces CloudWatch agent installation, config, and service state
  • C. Add a cron job on each instance to reinstall and restart the CloudWatch agent every 30 minutes
  • D. Add an Auto Scaling lifecycle hook that runs an SSM Run Command to install and start the CloudWatch agent at instance launch

Best answer: B

Explanation: SSM State Manager is purpose-built to define and enforce configuration baselines over time. By using an association that targets instances by tag and runs on a schedule, it automatically detects and remediates drift and applies the baseline to newly launched instances. The tradeoff is additional scheduled association executions (minor overhead) in exchange for consistent, auditable enforcement.

The core need is continuous baseline enforcement, not a one-time setup step. SSM State Manager uses associations to apply a desired configuration (packages, files/scripts, and service state) to a dynamic set of managed instances (for example, by tag) on a schedule, and it provides execution history and compliance visibility.

A practical approach is:

  • Target Baseline=Prod instances with an association
  • Use a managed document (for example, package installation) plus a document/script to apply the agent config and ensure the service is running
  • Run on a periodic schedule so drift is corrected automatically and new instances converge without manual intervention

Launch-time hooks, AMI baking, and cron can help initially, but they do not provide the same ongoing drift remediation and centralized auditing that State Manager provides.
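The association described above can be sketched as `create_association` parameters. The schedule and association name are illustrative; `AWS-ConfigureAWSPackage` with the `AmazonCloudWatchAgent` package is the usual managed-document route for the install step (applying the config file and service state would typically be a second association or a custom document):

```python
# Sketch: a State Manager association that targets instances tagged
# Baseline=Prod and reapplies the CloudWatch agent package on a schedule,
# so replacements and drifted instances converge automatically.
association = {
    "Name": "AWS-ConfigureAWSPackage",  # SSM managed document
    "Targets": [{"Key": "tag:Baseline", "Values": ["Prod"]}],
    "Parameters": {"action": ["Install"], "name": ["AmazonCloudWatchAgent"]},
    "ScheduleExpression": "rate(30 minutes)",   # illustrative cadence
    "AssociationName": "cloudwatch-agent-baseline",
}
print(association["ScheduleExpression"])
```

Because targeting is by tag, newly launched Auto Scaling instances pick up the baseline on their next association run with no per-instance action.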


Question 19

Topic: Content Domain 4: Security and Compliance

A security team wants to ensure that no Amazon EC2 security group allows inbound SSH (port 22) from 0.0.0.0/0. The team uses AWS Config to detect noncompliant security groups and automatically trigger remediation workflows with an audit trail.

Which action should the team NOT take?

  • A. Use EventBridge to trigger an Automation runbook on noncompliance
  • B. Auto-remediate by adding 0.0.0.0/0 ingress on port 22
  • C. Deploy the restricted-ssh AWS Config managed rule
  • D. Use a Config remediation action to run SSM Automation

Best answer: B

Explanation: AWS Config rules should detect noncompliant configurations and trigger remediation that reduces risk. A remediation workflow for unrestricted SSH must remove or restrict the offending rule, using an auditable mechanism such as Systems Manager Automation. Adding an internet-open SSH rule is the opposite of remediation and increases exposure.

The core pattern is: AWS Config evaluates resources against a rule, and a remediation workflow responds to noncompliance to bring the resource back into the desired (secure) state. For security groups, compliant remediation typically revokes overly permissive ingress or restricts it to approved CIDRs. AWS Config can trigger remediation directly (remediation actions) or indirectly by emitting compliance change events that start an SSM Automation runbook, both of which are auditable through AWS API activity logs.

A workflow that intentionally opens SSH to 0.0.0.0/0 is an operations anti-pattern because it expands the attack surface and undermines the control Config is meant to enforce.
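As a sketch of what compliant remediation actually does, the runbook (or a Lambda target) would revoke the offending rule. The security group ID is hypothetical; the dict matches the parameter shape of the EC2 `revoke_security_group_ingress` call:

```python
# Sketch: the remediation step for an unrestricted-SSH finding — revoke
# the 0.0.0.0/0 ingress on port 22 (the opposite of answer B, which would
# add it). Pass these parameters to an EC2 client's
# revoke_security_group_ingress inside the automation.
revoke_params = {
    "GroupId": "sg-0123456789abcdef0",  # hypothetical noncompliant group
    "IpPermissions": [{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
}
print(revoke_params["IpPermissions"][0]["FromPort"])  # 22
```

The revocation itself is recorded as an API event, which is what gives the workflow its audit trail.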


Question 20

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

A production web service runs on an EC2 Auto Scaling group (6 instances) behind an ALB. The service must remain available during changes and must keep at least 90% of desired capacity in service. The team has a 1-hour weekly change window.

AWS Compute Optimizer shows this for the Auto Scaling group:

Current: m5.xlarge
Avg CPU: 12%  Avg mem: 28%
Recommendation: m6i.large
Estimated monthly savings: 38%
Performance risk: Very low

Which change best reduces cost while managing operational risk and the change-window constraint?

  • A. Stop all instances, change the instance type to m6i.large, and restart them during the change window
  • B. Update the launch template to m6i.large and run an Auto Scaling instance refresh with a minimum healthy percentage of 90% during the change window
  • C. Change the Auto Scaling group to t3.large because it is cheaper than the recommendation
  • D. Purchase a 1-year Reserved Instance for m5.xlarge to reduce cost without changing the fleet

Best answer: B

Explanation: Compute Optimizer indicates the fleet is overprovisioned and that moving to m6i.large has very low performance risk with significant savings. Using an Auto Scaling instance refresh applies the recommendation gradually while enforcing the 90% healthy-capacity constraint. This approach fits the 1-hour change window and limits blast radius.

The core concept is to implement Compute Optimizer right-sizing recommendations in a way that preserves availability. For an Auto Scaling group, the safest operational pattern is to update the launch template (or launch configuration) to the recommended instance type and then use an instance refresh (or controlled rolling update) with a minimum healthy percentage that matches the business constraint.

A practical sequence is:

  • Update the launch template version to m6i.large.
  • Start an instance refresh with minimum healthy set to 90%.
  • Monitor ALB target health and key CloudWatch alarms; pause/roll back if errors increase.

This achieves the savings projected by Compute Optimizer while minimizing downtime risk compared to a stop/start approach or deviating to an unvalidated instance family.
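The rollout step can be sketched as `start_instance_refresh` parameters. The group and launch template names are hypothetical, and the warmup value is illustrative; pass the dict to a boto3 Auto Scaling client to start the refresh:

```python
# Sketch: a rolling instance refresh that applies the new launch template
# version (now specifying m6i.large) while enforcing the 90% healthy-
# capacity constraint from the scenario.
refresh_params = {
    "AutoScalingGroupName": "web-prod",          # hypothetical
    "Strategy": "Rolling",
    "DesiredConfiguration": {
        "LaunchTemplate": {
            "LaunchTemplateName": "web-prod-lt",  # hypothetical
            "Version": "$Latest",
        }
    },
    "Preferences": {
        "MinHealthyPercentage": 90,  # the business availability constraint
        "InstanceWarmup": 300,       # seconds before a replacement counts healthy
    },
}
print(refresh_params["Preferences"]["MinHealthyPercentage"])  # 90
```

An in-progress refresh can be cancelled (and, with a configured rollback, reverted) if ALB health checks or CloudWatch alarms degrade mid-window.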


Question 21

Topic: Content Domain 5: Networking and Content Delivery

A company uses an Amazon Route 53 private hosted zone api.corp.local that points to EC2 instances by private IP in a VPC. There is a warm-standby set of instances in another AWS Region. The instances are not behind a load balancer and are not reachable from the internet.

The operations team must implement DNS failover so that Route 53 returns the standby IPs only when the primary API is unhealthy.

Which solution meets this requirement?

  • A. Create an HTTPS Route 53 health check to the private IPs
  • B. Use geolocation routing and enable Evaluate Target Health
  • C. Use a CloudWatch-alarm Route 53 health check with failover records
  • D. Use latency-based routing between the two sets of IPs

Best answer: C

Explanation: Route 53 failover routing returns the primary record only while its associated health check is healthy; otherwise it answers with the secondary record. Because the API endpoints are private IPs, Route 53’s external health checkers cannot probe them directly. A health check that references a CloudWatch alarm lets an internal probe publish the health signal while still controlling Route 53 failover.

The deciding factor is that the endpoints are reachable only inside the VPC. Standard Route 53 HTTP/HTTPS/TCP health checks are performed by AWS health checkers on the public internet, so they cannot directly check private IPs in a private hosted zone.

To make DNS failover work, you can:

  • Run an internal probe (for example, a Lambda function in the VPC or an EC2 script) that checks the API and publishes a custom CloudWatch metric.
  • Create a CloudWatch alarm on that metric.
  • Create a Route 53 health check that uses the CloudWatch alarm.
  • Configure Route 53 failover records (primary and secondary) for api.corp.local, associating the health check with the primary record.

This way, the health check state controls whether Route 53 answers with the primary or secondary IPs, without requiring public reachability.
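The health-check piece of that design can be sketched as a `HealthCheckConfig` for the Route 53 `create_health_check` call. The alarm name and Region are hypothetical:

```python
# Sketch: a Route 53 health check driven by a CloudWatch alarm. The
# internal probe (Lambda/EC2 in the VPC) publishes the metric behind the
# alarm; Route 53 never needs network reachability to the private IPs.
health_check_config = {
    "Type": "CLOUDWATCH_METRIC",
    "AlarmIdentifier": {"Region": "us-east-1",        # alarm's Region
                        "Name": "api-probe-failed"},  # hypothetical alarm
    # Treat missing probe data as unhealthy so failover still occurs if
    # the probe itself stops reporting.
    "InsufficientDataHealthStatus": "Unhealthy",
}
print(health_check_config["Type"])
```

The resulting health check ID is then attached to the primary failover record in the private hosted zone.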


Question 22

Topic: Content Domain 5: Networking and Content Delivery

Select THREE statements that are INCORRECT about Amazon Route 53 routing policies.

  • A. Simple routing returns only one of a record’s values, even when multiple values are configured.
  • B. Latency routing answers with the lowest-latency Region for the resolver.
  • C. Failover routing always requires a Route 53 health check.
  • D. Weighted routing with health checks can do active-active failover.
  • E. Latency routing cannot be used with alias records.
  • F. Weighted routing shifts traffic by configuring record weights.

Best answers: A, C, E

Explanation: Simple routing does not limit the response to a single value; when a record contains multiple values, Route 53 returns all of them in random order. Failover routing does not universally require a separate Route 53 health check, because alias records can use Evaluate Target Health. Latency routing is compatible with alias records for supported AWS endpoints.

Route 53 routing policies control which record value is returned in response to a DNS query.

Simple routing can be configured with multiple values in a single record; Route 53 returns all of those values to the resolver in random order, and the client typically uses one of them. Failover routing uses primary/secondary records and can determine health either from a Route 53 health check (commonly for non-alias targets) or from the AWS target’s health when using an alias with Evaluate Target Health. Latency routing is available for both alias and non-alias records (for supported alias targets) and returns the record associated with the lowest-latency AWS Region for the DNS resolver.

The key operational takeaway is to choose the routing policy based on the control you need (distribution, lowest latency, or automated failover) and how health is evaluated for the target type.


Question 23

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

In Amazon CloudWatch metric alarms, what does the Evaluation periods setting control?

  • A. The percentage of samples that must breach the threshold within a single period for the alarm to trigger
  • B. How long CloudWatch waits after an alarm action before it can run the action again
  • C. How many consecutive metric periods are evaluated against the threshold before the alarm changes state
  • D. The duration of each metric data aggregation period that the alarm evaluates

Best answer: C

Explanation: Evaluation periods defines the number of back-to-back periods CloudWatch considers when comparing a metric to an alarm threshold. Increasing it generally reduces alert noise by requiring the unhealthy condition to persist longer before actions (such as an SNS notification or Auto Scaling action) occur.

A CloudWatch metric alarm evaluates a metric as a series of fixed-duration “periods” (for example, 1 minute each) and compares each period’s statistic (Average, Sum, etc.) to the threshold. Evaluation periods is the number of those consecutive periods CloudWatch looks at when deciding whether to change the alarm state (OK, ALARM, INSUFFICIENT_DATA).

For example, with a 1-minute period and 5 evaluation periods, CloudWatch evaluates the last 5 one-minute datapoints against the threshold. Separately, Datapoints to alarm controls how many of those evaluation periods must be breaching to trigger ALARM (useful for allowing occasional spikes).
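That "3 of 5 one-minute datapoints" example can be sketched as `PutMetricAlarm` parameters (the alarm name and threshold are illustrative):

```python
# Sketch: the relationship between Period, EvaluationPeriods, and
# DatapointsToAlarm — the alarm goes to ALARM when 3 of the last 5
# one-minute Average datapoints exceed the threshold.
alarm_params = {
    "AlarmName": "high-cpu",            # hypothetical
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Statistic": "Average",
    "Period": 60,            # each datapoint aggregates 60 seconds
    "EvaluationPeriods": 5,  # look at the last 5 datapoints
    "DatapointsToAlarm": 3,  # 3 of those 5 must breach
    "Threshold": 80.0,
    "ComparisonOperator": "GreaterThanThreshold",
}
print(alarm_params["EvaluationPeriods"])  # 5
```

Raising `EvaluationPeriods` (or the `DatapointsToAlarm` ratio) is the standard lever for tolerating brief spikes without paging.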


Question 24

Topic: Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

An AWS Systems Manager Automation execution is failing intermittently with errors such as AccessDenied during role assumption and step failures when targeting EC2 instances.

When troubleshooting Automation execution failures, which TWO statements are false or unsafe? (Select TWO.)

  • A. Add missing tags, then rerun the same execution to re-evaluate steps.
  • B. Check CloudTrail for PassRole/execution-denied events.
  • C. Grant the operator sts:AssumeRole to fix AssumeRole errors.
  • D. Validate document parameters match required names, types, and values.
  • E. Confirm target EC2 instances are SSM managed and online.
  • F. Ensure the Automation assumeRole trust allows ssm.amazonaws.com.

Best answers: A, C

Explanation: Systems Manager Automation failures commonly come from role assumption problems, bad/invalid parameters, or unmet resource preconditions (for example, instances not being managed by SSM). The two unsafe statements misidentify who must be allowed to assume the role and incorrectly imply an existing execution can be re-run to pick up environment changes.

To troubleshoot Automation execution failures, separate permission failures from parameter and resource readiness issues. With Automation, the service assumes the specified assumeRole, so the role’s trust policy must allow ssm.amazonaws.com, and the user starting the execution commonly needs iam:PassRole for that role. Next, validate that all document parameters match the document schema and reference real resources in the correct Region/account. Finally, for steps that use Run Command, the target instances must be managed instances (SSM Agent running, instance profile permissions, and network reachability to SSM endpoints).

  • Inspect the Automation execution step that failed and its error message
  • Verify assumeRole trust and iam:PassRole/CloudTrail denials
  • Validate parameters and existence/state of referenced resources
  • Confirm instance managed-state and connectivity for instance-targeted steps

The key is to fix the role trust/PassRole chain and the resource preconditions rather than changing unrelated operator permissions or expecting an old execution to re-run.
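The trust/PassRole chain above can be sketched as two policy documents. The account ID and role name are hypothetical; the service principal and action names are the real ones:

```python
# Sketch: (1) the trust policy on the Automation assumeRole — it is the
# SSM service, not the operator, that assumes the role; (2) the
# iam:PassRole statement the operator needs to hand that role to the
# execution.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ssm.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
operator_passrole = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "iam:PassRole",
        # Hypothetical role ARN — scope PassRole to the specific role.
        "Resource": "arn:aws:iam::123456789012:role/AutomationExecutionRole",
    }],
}
print(trust_policy["Statement"][0]["Principal"]["Service"])
```

Granting the operator `sts:AssumeRole` (answer C) fixes nothing here, because the operator never assumes the role; SSM does.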

Revised on Tuesday, April 14, 2026