Free AWS SAA-C03 Full-Length Practice Exam: 65 Questions

Try 65 free AWS SAA-C03 questions across the exam domains, with explanations, then continue with full IT Mastery practice.

This free full-length AWS SAA-C03 practice exam includes 65 original IT Mastery questions across the exam domains.

These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.

Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.

Need concept review first? Read the AWS SAA-C03 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks and full IT Mastery practice.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Try AWS SAA-C03 on Web
View full AWS SAA-C03 practice page

Exam snapshot

  • Exam route: AWS SAA-C03
  • Practice-set question count: 65
  • Time limit: 130 minutes
  • Practice style: mixed-domain diagnostic run with answer explanations

Full-length exam mix

| Domain | Weight |
| --- | --- |
| Design Secure Architectures | 30% |
| Design Resilient Architectures | 26% |
| Design High-Performing Architectures | 24% |
| Design Cost-Optimized Architectures | 20% |

Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.

Practice questions

Questions 1-25

Question 1

Topic: Design Cost-Optimized Architectures

A financial application stores transaction records in Amazon RDS and archives them to Amazon S3. Regulations require each record be kept for 7 years: 2 years online, 5 years archived, then deleted. The company wants to minimize storage cost. Which THREE approaches should the solutions architect AVOID? (Select THREE.)

Options:

  • A. Export records from RDS to S3 Standard but do not configure any S3 lifecycle policies, keeping all exported data in S3 Standard indefinitely.

  • B. Configure S3 lifecycle rules so that exported records transition to S3 Glacier Deep Archive after 30 days and are permanently deleted after 3 years, while deleting the corresponding RDS rows after 2 years.

  • C. Use AWS Backup with short RDS snapshot retention (for example, 30 days) for operational recovery, and rely on S3 exports with lifecycle policies that retain records for 7 years in progressively colder tiers before deletion.

  • D. Move all transaction data, including records less than 2 years old, from RDS directly into S3 Glacier Deep Archive to minimize storage costs, and query it from there when needed.

  • E. Use scheduled jobs to export RDS records older than 2 years to an S3 bucket with lifecycle policies that transition objects to a colder tier over time and delete them when they reach 7 years of age, while purging the corresponding rows from RDS.

  • F. Export records when they reach 2 years of age from RDS to S3 Standard-IA, then use an S3 lifecycle policy to transition them to S3 Glacier Deep Archive at 3 years of age and permanently delete them at 7 years of age.

Correct answers: A, B and D

Explanation: The scenario requires a clear, cost-optimized retention strategy: keep records online in Amazon RDS for 2 years, then store them in cheaper archival storage for another 5 years, and finally delete them at 7 years. The organization also wants to avoid holding historical data longer than necessary, which would increase storage costs and complicate compliance.

Correct designs therefore (1) move data off RDS once it is no longer required for online access, (2) place it into appropriate, lower-cost S3 storage classes, (3) use S3 lifecycle policies to transition it to even colder archival tiers over time, and (4) expire the data exactly when the 7-year requirement is met.

The approaches to avoid either keep data in expensive storage indefinitely, delete data too early (breaking the 7-year requirement), or put active data into archival tiers where it cannot be retrieved quickly enough to satisfy the need for 2 years of online access.
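
The compliant pattern behind options C, E, and F can be sketched as an S3 lifecycle configuration. This is a minimal illustration, assuming records are exported when they turn 2 years old (as in option F); the bucket name, prefix, and exact day counts are placeholders rather than values given in the scenario.

```python
import boto3

s3 = boto3.client("s3")

# Assumed bucket and prefix. Day counts assume objects are exported when the
# underlying records turn 2 years old, as in option F.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-transaction-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "exports/"},
                "Status": "Enabled",
                # Move exported records to Glacier Deep Archive when they
                # reach 3 years of age (about 1 year after export).
                "Transitions": [{"Days": 365, "StorageClass": "DEEP_ARCHIVE"}],
                # Permanently delete them at 7 years of age (5 years after export).
                "Expiration": {"Days": 1825},
            }
        ]
    },
)
```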


Question 2

Topic: Design Resilient Architectures

Which THREE statements about architecting event-driven solutions on AWS using SNS, SQS, EventBridge, and Kinesis are true?

(Select THREE.)

Options:

  • A. Amazon EventBridge supports content-based filtering on events, so rules can deliver only events that match specific patterns to a given target.

  • B. Publishing events to an Amazon SNS topic that fans out to multiple Amazon SQS queues lets each consumer process messages at its own rate and fail independently, improving loose coupling.

  • C. With Amazon SQS, producers must know the number of consumer applications so they can configure the queue with the correct number of subscribers before sending messages.

  • D. Amazon Kinesis Data Streams immediately deletes a record after the first successful read by any consumer, which prevents the use of the same event by multiple independent consumer applications.

  • E. Synchronous REST calls between microservices through an Application Load Balancer are the preferred pattern for loose coupling in event-driven architectures because failures are surfaced immediately to callers.

  • F. Amazon Kinesis Data Streams is well suited for event streams that may need to be replayed, because consumers can read from a chosen sequence number or timestamp within the retention window.

Correct answers: A, B and F

Explanation: Event-driven architectures on AWS aim to decouple producers and consumers using messaging and streaming services. Amazon SNS and SQS are commonly used together for fanout and buffering. Amazon EventBridge provides event buses with content-based routing and integration with many AWS services and SaaS providers. Amazon Kinesis Data Streams is designed for ordered, replayable event streams with multiple consumers.

The true statements highlight how these services support loose coupling, independent scaling, and replay. The false statements describe synchronous dependencies or incorrect behaviors that would reduce resilience or flexibility in an event-driven design.


Question 3

Topic: Design Cost-Optimized Architectures

A web application currently transfers 4TB of optional product video preview data to users each month. You will apply throttling so this non-critical traffic is reduced to 25% of its current volume. What will be the new monthly data transfer for these previews, in TB? (Round to the nearest 0.1TB, if needed.)

Options:

  • A. 3.0TB per month

  • B. 1.0TB per month

  • C. 0.3TB per month

  • D. 2.0TB per month

Best answer: B

Explanation: The scenario describes non-critical video preview traffic that currently transfers 4TB per month. To lower data transfer costs, you plan to throttle this traffic so that it operates at 25% of its current volume. In other words, only one-quarter of the existing traffic will remain.

To find the new transfer volume, multiply the original 4TB by the remaining fraction (25% = 0.25). This gives 1TB per month. Throttling non-critical traffic like optional previews is a cost-optimization technique that reduces data transfer charges without affecting core application functionality.
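
Expressed in the same notation as the other worked calculations on this page:

\[ 4\,\text{TB} \times 0.25 = 1.0\,\text{TB per month} \]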

By reducing only non-essential traffic, you preserve user experience for critical features while cutting recurring network egress costs, aligning with Well-Architected cost optimization best practices.


Question 4

Topic: Design Resilient Architectures

Which TWO of the following statements about using Application Load Balancers (ALBs) and Network Load Balancers (NLBs) for high availability are INCORRECT? (Select TWO.)

Options:

  • A. Health checks for an Application Load Balancer are configured on the load balancer listener; all target groups behind that listener must share the same health check path and port.

  • B. For high availability, an Application Load Balancer must be associated with subnets in at least two Availability Zones; targets in those AZs receive traffic only if they pass health checks.

  • C. If all targets in one Availability Zone fail health checks but healthy targets exist in other enabled AZs, the ALB will automatically stop sending requests to the unhealthy AZ.

  • D. To achieve multi-AZ resiliency with an Application Load Balancer, it is sufficient to launch instances in multiple Availability Zones; the load balancer does not need subnets in every AZ where targets run.

  • E. A Network Load Balancer operates at Layer 4 and is commonly used to load balance TCP or UDP traffic while preserving the client source IP address.

Correct answers: A and D

Explanation: This question tests understanding of how Application Load Balancers (ALBs) and Network Load Balancers (NLBs) achieve high availability, especially with respect to Availability Zones (AZs), health checks, and where key settings are configured.

ALBs distribute incoming traffic across multiple targets (such as EC2 instances or ECS tasks) in one or more AZs. For true high availability, the ALB must be deployed (that is, associated with subnets) in at least two AZs. Within each enabled AZ, it only routes requests to targets that pass health checks configured at the target group level.

NLBs operate at Layer 4 (TCP/UDP) and are commonly used when you need ultra-low latency, very high throughput, or to preserve the client source IP. They also support multi-AZ configurations and health checks on targets.

The incorrect statements in this question either misplace the health check configuration (listener vs. target group) or incorrectly imply that the ALB does not need subnets in each AZ where you want to load balance traffic. Both misunderstandings can lead to designs that are not truly highly available or are harder to operate correctly.


Question 5

Topic: Design High-Performing Architectures

A company ingests time-series sensor metrics from thousands of IoT devices into an Amazon DynamoDB table for low-latency key-value lookups. At peak, devices send 900 metrics per second. Each metric is stored as a 1 KB item. Each write capacity unit (WCU) supports 1 write/second for items up to 1 KB. What is the minimum number of WCUs the company should provision to sustain peak throughput?

Options:

  • A. 90 WCUs

  • B. 9,000 WCUs

  • C. 900 WCUs

  • D. 450 WCUs

Best answer: C

Explanation: This workload is a time-series, key-value access pattern that is well suited to Amazon DynamoDB as a managed NoSQL database. The question focuses on sizing write throughput using provisioned write capacity units (WCUs).

Each WCU in DynamoDB supports 1 write/second for items up to 1 KB in size. The workload requires 900 writes/second of 1 KB items.

Variables used:

  • \(R = 900\) writes/second (peak write rate)
  • \(C_{\text{per}} = 1\) write/second supported per WCU for 1 KB items

Calculation (one step):

\[ C_{\text{required}} = \frac{R}{C_{\text{per}}} = \frac{900}{1} = 900\,\text{WCUs} \]

Therefore, provisioning 900 WCUs is the minimum configuration that can sustain 900 writes/second of 1 KB items without throttling, while aligning with a cost-conscious, high-performing NoSQL design.


Question 6

Topic: Design High-Performing Architectures

Which TWO of the following statements about using Amazon CloudWatch metrics and business KPIs as triggers for Auto Scaling are INCORRECT and should be avoided? (Select TWO.)

Options:

  • A. For an SQS-based worker fleet, using ApproximateNumberOfMessagesVisible per instance as a scaling metric can help keep queue backlog and processing latency under control.

  • B. Scaling an Auto Scaling group directly on the number of HTTP 4XX client errors from an Application Load Balancer is a recommended primary method to handle sudden traffic spikes.

  • C. Using an Application Load Balancer TargetResponseTime metric to scale an Auto Scaling group can help keep end-user latency near a desired target.

  • D. When scaling based on CPUUtilization, a target tracking policy that maintains average CPU around 50–60% typically provides better resource utilization than attempting to keep CPU close to 0%.

  • E. An Auto Scaling group can use any internal business KPI stored only in your application database as a scaling metric without publishing it to CloudWatch.

Correct answers: B and E

Explanation: Auto Scaling policies should be driven by metrics that correlate well with capacity needs, such as CPU utilization, request latency, or queue backlog, and those metrics must be available in Amazon CloudWatch. Business KPIs can be used, but only after they are exposed as CloudWatch metrics. Using metrics that do not reflect server load (such as client-side error counts) leads to unstable or ineffective scaling behavior.

In this question, the incorrect statements either propose a poor metric for capacity (client 4XX errors) or incorrectly claim that Auto Scaling can use metrics that are not in CloudWatch. The correct statements reflect common and recommended patterns: scaling on latency from an ALB, scaling on SQS queue depth per worker, and targeting a moderate CPU utilization range for efficiency and responsiveness.
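
As a hedged sketch of the recommended patterns, the snippet below publishes a business KPI as a custom CloudWatch metric (a prerequisite before any KPI can drive scaling) and creates a target tracking policy that holds average CPU near 50%, in line with option D. The namespace, metric name, and Auto Scaling group name are assumed placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
autoscaling = boto3.client("autoscaling")

# A business KPI must exist in CloudWatch before it can drive scaling.
cloudwatch.put_metric_data(
    Namespace="ExampleApp",                      # assumed namespace
    MetricData=[{"MetricName": "OrdersPerMinute", "Value": 120.0}],
)

# Target tracking on average CPU, matching the 50-60% band from option D.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",              # assumed group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```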


Question 7

Topic: Design High-Performing Architectures

A company uses Amazon QuickSight SPICE to visualize application error rates from Amazon CloudWatch and business KPIs from Amazon Redshift in a single dashboard. During incidents, operators report that the error-rate visuals never show data from the last 2 hours, although the CloudWatch console does. The SPICE dataset for CloudWatch metrics refreshes once per day at midnight. What should a solutions architect do to fix this?

Options:

  • A. Enable Amazon CloudFront caching on the QuickSight dashboard URL to reduce page load times for operators during incidents.

  • B. Replace the QuickSight error-rate visuals with a native Amazon CloudWatch dashboard and have operators use two separate dashboards.

  • C. Reconfigure the QuickSight SPICE dataset for CloudWatch metrics to refresh multiple times per hour instead of once per day.

  • D. Increase the Amazon CloudWatch metric retention period from 15 days to 30 days so that more historical data is available to QuickSight.

Best answer: C

Explanation: Symptom: Operators see up-to-date error metrics in the Amazon CloudWatch console, but the Amazon QuickSight dashboard never shows data from the last 2 hours. This indicates that CloudWatch is receiving current metrics, but QuickSight is visualizing an outdated snapshot.

Root cause: The QuickSight dataset that imports CloudWatch metrics is stored in SPICE and configured to refresh only once per day at midnight. SPICE serves cached data between refreshes, so any metrics generated after the last refresh are invisible on the dashboard until the next daily import.

Fix: The solutions architect should increase the SPICE dataset refresh frequency for the CloudWatch metrics (for example, multiple times per hour). More frequent scheduled refreshes keep the SPICE cache close to real time, allowing operators to see near-current error rates during incidents while maintaining the single QuickSight dashboard that combines operational and business KPIs.


Question 8

Topic: Design Cost-Optimized Architectures

A company is redesigning its storage on AWS to cut costs while maintaining required performance for EC2, EFS, and S3 workloads. Operators must access the latest 30 days of logs within minutes. Which approaches should it AVOID because they over-provision performance or break access requirements? (Select THREE.)

Options:

  • A. Use gp3 EBS volumes for production EC2 web servers, initially provisioning 3,000 IOPS and adjusting later based on CloudWatch metrics.

  • B. Transition all log objects to S3 Glacier Deep Archive after 1 day to minimize storage cost, even though operations staff often need to read the latest 30 days of logs within minutes.

  • C. Store application logs and user reports with unpredictable access patterns in S3 Intelligent-Tiering to automatically move objects between frequent and infrequent access tiers.

  • D. For shared application data with occasional throughput spikes, use Amazon EFS General Purpose performance mode with bursting throughput mode.

  • E. For a lightly used development stack, use io2 EBS volumes provisioned at 40,000 IOPS to prevent any potential future bottlenecks.

  • F. Configure the same shared EFS file system with Provisioned Throughput of 1 GiB/s, even though average usage is around 5 MiB/s.

Correct answers: B, E and F

Explanation: This scenario focuses on right-sizing storage performance and throughput for EBS, EFS, and S3 to avoid unnecessary cost while still meeting access and performance requirements.

For EBS, gp3 volumes are generally the cost-optimized default. They provide a baseline of 3,000 IOPS and 125 MiB/s, and you can increase provisioned IOPS or throughput independently if metrics show a need. In contrast, using high-end io2 volumes with tens of thousands of IOPS for a lightly used environment is an obvious case of over-provisioning and overspending.

For EFS, the key trade-off is between bursting throughput mode (where throughput scales with stored data and bursts as needed) and Provisioned Throughput mode (where you pay for a fixed throughput rate regardless of data size). Provisioned Throughput is justified only when you need consistently high throughput that your storage size cannot otherwise provide. If average use is low, configuring an extremely high provisioned throughput wastes money.

For S3, cost-optimized designs must be consistent with required access times. S3 Intelligent-Tiering is ideal for data with unpredictable access patterns and still offers millisecond access in its frequent and infrequent tiers. Glacier Deep Archive is the lowest-cost archival tier but has long retrieval times, which conflicts with the need to access the latest 30 days of logs quickly.

The approaches to avoid are therefore the ones that (1) heavily over-provision performance for low-need workloads or (2) choose an archival storage tier that cannot meet the stated access latency requirements.
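
As a small illustration of the gp3 right-sizing point, here is a minimal sketch with an assumed size and Availability Zone; the later performance bump is shown only to illustrate that IOPS and throughput can be raised independently of volume size.

```python
import boto3

ec2 = boto3.client("ec2")

# gp3 provides a 3,000 IOPS / 125 MiB/s baseline regardless of volume size.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # assumed AZ
    Size=200,                        # GiB, illustrative
    VolumeType="gp3",
    Iops=3000,
    Throughput=125,                  # MiB/s
)

# Later, only if CloudWatch metrics show a sustained bottleneck, raise
# performance independently of size instead of moving to io2.
ec2.modify_volume(VolumeId=volume["VolumeId"], Iops=6000, Throughput=250)
```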


Question 9

Topic: Design Resilient Architectures

A company runs a web application in a single AWS Region using an Application Load Balancer (ALB) with Amazon EC2 instances and an Amazon S3 bucket for static assets. Users are globally distributed and report high latency, especially for dynamic API calls. The company wants to reduce latency for both static and dynamic content, offload as much traffic as possible from the ALB and EC2 instances, and protect the application from common web exploits. Personalized, authenticated responses must never be cached and shared across users, and the solution should minimize operational overhead by using managed services. Which architecture meets these requirements BEST?

Options:

  • A. Use AWS Global Accelerator in front of the ALB for all traffic, continue serving static content directly from the S3 bucket, and attach AWS WAF to the ALB.

  • B. Place an Amazon CloudFront distribution in front of the ALB, cache all responses from the ALB for several minutes without forwarding authentication headers or cookies, and attach AWS WAF to the CloudFront distribution.

  • C. Place an Amazon CloudFront distribution in front of both the S3 bucket and the ALB, cache only static paths, configure CloudFront to forward authentication headers and cookies and disable caching for dynamic API paths, and attach AWS WAF to the CloudFront distribution.

  • D. Place an Amazon CloudFront distribution only in front of the S3 bucket for static assets, leave users to access the ALB directly for APIs, and attach AWS WAF to the ALB.

Best answer: C

Explanation: Placing CloudFront in front of both the S3 bucket and the ALB reduces latency for global users and offloads the origins: static paths are cached at the edge, while dynamic API paths forward authentication headers and cookies with caching disabled, so personalized responses are never cached or shared across users. Attaching AWS WAF to the CloudFront distribution blocks common web exploits at the edge, and the entire design relies on managed services, keeping operational overhead low. This option satisfies the latency, origin offload, security, and caching-safety requirements together.


Question 10

Topic: Design Secure Architectures

Which of the following statements correctly describe how to use AWS security services for specific security requirements? (Select THREE.)

Options:

  • A. Use Amazon GuardDuty to continuously analyze AWS CloudTrail, VPC flow logs, and DNS logs for suspicious or unexpected activity in your AWS accounts.

  • B. Use Amazon Macie to automatically discover and classify sensitive data, such as PII, stored in Amazon S3 buckets.

  • C. Use AWS Shield to block SQL injection and cross-site scripting (XSS) attacks on HTTP requests sent to an Application Load Balancer.

  • D. Use AWS Secrets Manager to securely store and automatically rotate database credentials used by an AWS Lambda function.

  • E. Use Amazon Cognito to continuously scan Amazon S3 buckets for personally identifiable information (PII) to support data privacy regulations.

  • F. Use AWS WAF to centrally store and automatically rotate API keys used by multiple microservices running on Amazon ECS.

Correct answers: A, B and D

Explanation: This question checks understanding of when to use core AWS security services for common requirements: threat detection, secret management, and data classification.

Amazon GuardDuty is used for intelligent threat detection and continuously analyzes several AWS data sources for suspicious behavior.

AWS Secrets Manager is designed for securely storing secrets and automating their rotation, removing the need to hard-code secrets in application code.

Amazon Macie focuses on data privacy and protection by discovering and classifying sensitive data in Amazon S3, such as PII.

Other services listed (AWS Shield, Amazon Cognito, AWS WAF) play important but different roles: DDoS protection, identity and access for end users, and application-layer firewalling respectively. They do not perform the functions described in the incorrect statements.


Question 11

Topic: Design Resilient Architectures

A company must replicate 2TB of daily backup data from its primary AWS Region to a disaster recovery Region within 24 hours over a dedicated link. What is the minimum sustained network throughput required for this replication?

Use 1TB = 1,000GB, 1GB = 10^9 bytes, 1 byte = 8 bits, 1 day = 86,400 seconds. Round your answer up to the nearest Mbps.

Options:

  • A. 186Mbps

  • B. 100Mbps

  • C. 1,000Mbps

  • D. 50Mbps

Best answer: A

Explanation: To ensure that daily backup data is fully replicated to the disaster recovery Region within 24 hours, you must calculate the minimum sustained throughput that can transfer the entire 2TB in one day.

First, convert 2TB per day into bits per second.

  • 2TB per day = 2,000GB per day (using 1TB = 1,000GB)
  • 1GB = 10^9 bytes
  • 1 byte = 8 bits
  • 1 day = 86,400 seconds

So the number of bits per day is:

\[ \text{bits per day} = 2{,}000 \times 10^{9} \, \text{bytes} \times 8 \, \frac{\text{bits}}{\text{byte}} = 16 \times 10^{12} \, \text{bits} \]

Convert bits per day to bits per second by dividing by the number of seconds in a day:

\[ \text{bits per second} = \frac{16 \times 10^{12}}{86{,}400} \approx 1.8519 \times 10^{8} \, \text{bits/s} \approx 185.19\,\text{Mbps} \]

Rounding up to the nearest Mbps, the minimum sustained throughput required is 186Mbps.

This ensures the replication completes within 24 hours, meeting the durability and availability objective for daily off-site backups without significantly overprovisioning bandwidth.


Question 12

Topic: Design Secure Architectures

A company uses multiple AWS accounts and services such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. Security engineers want a single place to centrally view and prioritize security findings across all accounts. Which approach is NOT appropriate to meet this requirement?

Options:

  • A. Create a custom AWS Lambda function that polls each security service’s API in every account and writes raw JSON events to a central Amazon S3 bucket for analysts to manually review.

  • B. Configure Amazon Inspector to automatically send its vulnerability findings to AWS Security Hub, where they are viewed and prioritized along with findings from other AWS security services.

  • C. Use Amazon Detective together with AWS Security Hub so that Security Hub surfaces GuardDuty and Inspector findings centrally, while Detective is used for deeper investigation of high-priority issues.

  • D. Enable AWS Security Hub in a delegated administrator account, integrate GuardDuty, Inspector, and Macie, and aggregate findings from all member accounts using AWS Organizations.

Best answer: A

Explanation: The requirement is for a single place to centrally view and prioritize security findings from services like GuardDuty, Inspector, and Macie across multiple accounts. At the SAA-C03 level, the key design choice is to use AWS’s managed security services that already aggregate and normalize findings, instead of building ad-hoc collection mechanisms.

AWS Security Hub is purpose-built to aggregate, normalize, and prioritize findings from multiple AWS security services and many third-party tools. It supports multi-account setups via AWS Organizations and provides dashboards, security standards, and automated insights. Amazon Inspector and Amazon GuardDuty act as findings sources feeding into Security Hub, while Amazon Detective helps investigate issues using correlated log data.

Manually polling APIs and dumping raw events into S3 does not provide central prioritization or an operationally efficient view, and it recreates undifferentiated tooling that Security Hub already provides. That approach is therefore the anti-pattern in this scenario.


Question 13

Topic: Design Secure Architectures

Which THREE of the following statements describe recommended patterns when designing flexible IAM authorization models using users, groups, roles, and policies? (Select THREE.)

Options:

  • A. Application code running on Amazon EC2 instances should typically use an IAM user with long-term access keys stored in configuration files, to avoid the overhead of using an IAM role.

  • B. Attaching permissions to IAM groups instead of directly to individual IAM users makes it easier to adjust access as people join, leave, or change job roles.

  • C. To minimize policy sprawl, it is best practice to create a single large inline policy attached directly to each IAM user, instead of using reusable managed policies.

  • D. Because IAM roles cannot be used with federated identities, external partners must always be created as IAM users in each AWS account they need to access.

  • E. Using cross-account IAM roles from a central security account allows security engineers to access workloads in other accounts without creating separate IAM users in each account.

  • F. Defining a highly privileged break-glass administrator role that requires MFA and has no long-term access keys is a recommended pattern for emergency access.

Correct answers: B, E and F

Explanation: Flexible IAM authorization models focus on separating identity from permissions, using groups and roles for manageability, and minimizing long-term credentials. IAM users generally represent individual people or machine identities only when necessary; permissions are attached to groups and roles using reusable managed policies. Cross-account roles and break-glass roles are key patterns for centralized security and emergency access.

The correct statements highlight group-based permission management, cross-account roles from a central security or governance account, and a controlled, MFA-protected break-glass administrator role without long-term access keys. The incorrect statements promote unsafe patterns such as long-term access keys for applications, user-specific inline policies, and the misconception that roles cannot be used with federation.


Question 14

Topic: Design Secure Architectures

An AWS organization has 1 management and 40 workload accounts. A central security team currently owns all IAM permissions in workload accounts, so developers must file tickets to create roles. The team wants to reduce operational load while enforcing least privilege and organization-wide guardrails. Which change is BEST?

Options:

  • A. Replace the existing SCPs with IAM permission boundaries and give workload-account admins full IAM administrative permissions in their accounts.

  • B. Keep the current SCPs and security-owned IAM permissions model, but add an internal ticket automation portal that files IAM change requests on behalf of developers.

  • C. Delegate IAM role and policy creation to workload-account IAM admins, but require all new roles to use a security-team–managed IAM permission boundary, keeping existing SCP guardrails unchanged.

  • D. Attach an SCP to the workload OU that allows all actions on all resources, and rely on IAM Access Analyzer findings for oversight by the security team.

Best answer: C

Explanation: The scenario describes a multi-account AWS Organizations setup where a central security team currently controls all IAM changes in each workload account. Developers must file tickets for any IAM role or policy changes, creating an operational bottleneck. At the same time, the organization requires strong, centralized guardrails and least privilege.

A well-architected pattern for delegated administration combines:

  • Service control policies (SCPs) at the organization or OU level to enforce global guardrails (for example, prohibiting disabling CloudTrail or leaving the organization).
  • IAM permission boundaries within each account to define a maximum permissions envelope for roles created by delegated admins.

In this pattern, the security team manages the SCPs and the permission boundary policy document. Workload-account IAM admins are granted permissions such as iam:CreateRole, iam:PutRolePolicy, etc., but only when they attach the approved permission boundary. This allows application teams to create and adjust IAM roles quickly while ensuring they cannot exceed the limits defined by security. Operational load on the central team drops, but security posture is preserved or improved.

The other choices either remove or weaken centralized guardrails, violating least privilege and separation of duties, or they do not materially reduce the operational burden on the security team and thus are not a true optimization of the baseline design.
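
A minimal sketch of the delegated pattern in the best answer, assuming the security team has already published the boundary policy; the ARN, role name, and trust policy below are placeholders.

```python
import json
import boto3

iam = boto3.client("iam")

# A workload-account admin creates a role, attaching the security-managed
# permission boundary (placeholder ARN).
iam.create_role(
    RoleName="app-deploy-role",
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }),
    PermissionsBoundary="arn:aws:iam::111122223333:policy/SecurityBoundary",
)
```

In practice, the security team makes the boundary mandatory by granting iam:CreateRole to delegated admins only with a condition on the iam:PermissionsBoundary key, so role creation without the approved boundary is denied.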


Question 15

Topic: Design Secure Architectures

A company is designing a secure access pattern for an internet-facing web application. Users access the app through Amazon CloudFront, which forwards to an Application Load Balancer (ALB) in front of application servers that connect to a relational database. The application also calls internal microservices running on AWS.

Which of the following statements describe appropriate security responsibilities at each layer in this architecture? (Select THREE.)

Options:

  • A. Database access from the application layer should be restricted with security groups and database users/roles so that only the application servers, using least-privilege credentials, can connect to the database endpoint.

  • B. The application should always perform server-side input validation and output encoding, even when AWS WAF and client-side validation are configured.

  • C. Deploying AWS WAF on the CloudFront distribution helps block common web exploits (such as SQL injection) close to the edge, reducing malicious traffic that reaches the ALB and application servers.

  • D. Service-to-service calls between AWS microservices should rely primarily on IP allow lists in security groups instead of using IAM roles or signed tokens for authentication and authorization.

  • E. If the ALB terminates HTTPS and performs user authentication with OIDC, the application no longer needs to perform fine-grained authorization checks on user actions.

  • F. Storing database credentials in application environment variables without any encryption is sufficient as long as the application subnets are private and not directly internet-routable.

Correct answers: A, B and C

Explanation: In a CloudFront → ALB → application → database architecture with internal microservices, security should be applied in layers (defense in depth). Each layer has distinct responsibilities: edge protection, transport security, authentication, authorization, input validation, and data-layer controls. Relying on a single layer (for example, WAF only or network controls only) leaves gaps and violates Well-Architected security best practices.

Edge services such as CloudFront with AWS WAF can block many common web exploits and reduce load on downstream resources. However, they cannot replace proper application-layer controls like input validation and authorization logic. Similarly, network-level controls like security groups and private subnets limit reachability, but they do not provide identity or fine-grained permissioning; service-to-service calls should use IAM or token-based mechanisms. At the data layer, least-privilege access with both network and database-level restrictions is critical to protect stored data.

The correct statements together describe a layered security model: WAF at the edge, robust validation and authorization in the application, strong identity-based controls for microservices, and least-privilege access to the database. The incorrect statements either over-trust one layer (such as ALB authentication or private subnets) or recommend weak secret management practices.


Question 16

Topic: Design High-Performing Architectures

In an Amazon S3 data lake, a company uses AWS Lake Formation to define table- and column-level permissions in the AWS Glue Data Catalog and enforces least-privilege access for analytics users across accounts. Which AWS Well-Architected pillar does this action primarily support?

Options:

  • A. Reliability

  • B. Security

  • C. Performance Efficiency

  • D. Cost Optimization

Best answer: B

Explanation: The described action uses AWS Lake Formation to set table- and column-level permissions in the AWS Glue Data Catalog and applies least-privilege access for analytics users. This is fundamentally about who can see which data and under what conditions.

Those choices implement data governance, least-privilege access, and centralized authorization for an S3-based data lake. These are classic responsibilities of the Security pillar of the AWS Well-Architected Framework. Security in data lakes is largely about controlling access paths (for example, Athena, EMR, Redshift Spectrum, Glue ETL) using fine-grained permissions and consistent policies, which Lake Formation is designed to provide.

Other pillars such as Performance Efficiency, Reliability, and Cost Optimization can be influenced by how a data lake is designed, but the specific behavior of defining and enforcing column- and table-level permissions is clearly aligned to protecting data and managing access risk, not primarily to performance, resilience, or cost outcomes.


Question 17

Topic: Design Resilient Architectures

Partners upload hourly CSV files via AWS Transfer Family (SFTP) into S3. A Lambda function parses each file and writes all rows into Amazon RDS using credentials from AWS Secrets Manager. During uploads, Lambda often times out and RDS CPU reaches 100%. The business accepts a few minutes of delay. Which change will MOST effectively resolve the issue?

Options:

  • A. Modify the AWS Transfer Family server to limit concurrent SFTP uploads so that fewer Lambda functions run at the same time.

  • B. Have the Lambda function publish each parsed record to an Amazon SQS standard queue, and add consumers that read from the queue and batch writes to RDS.

  • C. Cache the database credentials in Lambda environment variables instead of calling AWS Secrets Manager for each file.

  • D. Increase the RDS instance to a larger size to handle the higher write throughput during file uploads.

Best answer: B

Explanation: Symptom: During hourly CSV uploads, many Lambda functions parse files and write directly into Amazon RDS. CloudWatch shows Lambda timeouts and RDS CPU reaching 100% during these spikes.

Root cause: The architecture couples file ingestion tightly to database writes. Each CSV can contain many rows, and multiple Lambdas may run concurrently, all issuing synchronous insert statements directly to RDS. This sudden, bursty write load overwhelms the database, causing high CPU utilization, longer query times, and eventually Lambda timeouts. AWS Transfer Family and AWS Secrets Manager are functioning correctly; the problem is the lack of an asynchronous buffer between ingestion and the database.

Fix: Introduce Amazon SQS as a decoupling layer. The Lambda function should parse each row and publish it (or small batches of rows) as messages to an SQS standard queue. A separate set of consumers (for example, another Lambda function or an ECS service with Auto Scaling) reads from the queue and performs batched writes to RDS at a controlled rate. This design smooths traffic spikes, reduces concurrent DB connections and CPU pressure, and still meets the requirement that processing can be delayed by a few minutes. It preserves both the existing SFTP integration through AWS Transfer Family and secure credential management via AWS Secrets Manager while improving resilience and scalability.
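
A rough sketch of the decoupled flow, assuming an existing queue URL; the parsed-row input and the write_rows_to_rds helper are hypothetical stand-ins for the application's own logic.

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/csv-rows"  # assumed


def producer_handler(rows):
    """Ingestion Lambda: publish parsed CSV rows in batches instead of writing to RDS."""
    for i in range(0, len(rows), 10):            # SQS batch limit is 10 messages
        batch = rows[i:i + 10]
        sqs.send_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[
                {"Id": str(n), "MessageBody": json.dumps(row)}
                for n, row in enumerate(batch)
            ],
        )


def consumer_loop(write_rows_to_rds):
    """Separate consumer: drain the queue at a controlled rate and batch-write to RDS."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        messages = resp.get("Messages", [])
        if not messages:
            continue
        # Hypothetical helper that performs one batched insert into RDS.
        write_rows_to_rds([json.loads(m["Body"]) for m in messages])
        sqs.delete_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[
                {"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]}
                for m in messages
            ],
        )
```

In practice the consumer is often another Lambda function attached through an SQS event source mapping, but the explicit loop above makes the batching, throttled writes, and message deletion visible.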


Question 18

Topic: Design Cost-Optimized Architectures

Which of the following statements about using NAT instances versus NAT gateways for outbound internet access from private subnets are INCORRECT from a cost-optimization and operations perspective? (Select THREE.)

Options:

  • A. NAT gateways are always the lowest-cost option at any traffic volume, because there is no hourly charge or data processing fee.

  • B. For very low, intermittent traffic, a small NAT instance can be cheaper than a NAT gateway, because you pay only EC2 instance pricing and standard data transfer charges.

  • C. Because NAT gateways run on dedicated EC2 instances that you manage, you can stop them when not in use to avoid hourly charges, whereas NAT instances continue to incur costs 24/7.

  • D. To achieve high availability with NAT instances, you typically need multiple instances in different Availability Zones and failover automation, which increases both cost and operational complexity.

  • E. NAT instances are a fully managed, highly available service across multiple Availability Zones by default, so they usually have lower operational cost than NAT gateways.

  • F. NAT gateways automatically scale up to handle bursts of traffic without you managing instance size or Auto Scaling groups, reducing operational overhead compared to NAT instances.

Correct answers: A, C and E

Explanation: NAT instances and NAT gateways both allow resources in private subnets to initiate outbound internet connections, but they differ significantly in cost model and operational burden.

A NAT instance is just an EC2 instance configured for NAT. You pay normal EC2 instance pricing plus any standard data transfer charges, with no additional per-GB NAT-specific fee. This can be cheaper at very low traffic levels, but you must handle sizing, patching, security hardening, monitoring, and high availability. To build HA, you typically deploy multiple instances in different Availability Zones and add failover logic, which raises both cost and complexity.

A NAT gateway is a managed service. It charges an hourly fee plus a per-GB data processing fee. In exchange, it provides built-in scaling within an AZ and significantly lower operational overhead. For many production workloads, especially with steady or moderate-to-high traffic and strict availability requirements, the managed, scalable nature of NAT gateways justifies the cost.

The incorrect statements either misrepresent which option is managed/HA, misstate the billing model, or confuse which resource you can start/stop to control cost.
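
Written symbolically (no specific prices assumed, and ignoring standard data transfer charges that apply to both options), the monthly cost comparison behind these statements is:

\[ C_{\text{gateway}} = h \cdot P_{\text{hour}} + D \cdot P_{\text{GB}}, \qquad C_{\text{instance}} = h \cdot P_{\text{EC2}} \]

where \(h\) is hours per month, \(D\) is the data processed in GB, \(P_{\text{hour}}\) and \(P_{\text{GB}}\) are the NAT gateway hourly and per-GB processing rates, and \(P_{\text{EC2}}\) is the NAT instance's hourly price. For very small \(D\) and a small instance type, \(C_{\text{instance}}\) can come out lower; as \(D\) grows, or once high availability requires multiple NAT instances plus failover automation, the managed NAT gateway usually wins on total cost of ownership.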


Question 19

Topic: Design Cost-Optimized Architectures

A company runs large, long-lived, in-memory databases on Amazon EC2. The workload requires very high memory capacity per vCPU while keeping costs down by not overprovisioning compute. Which Amazon EC2 instance family is the most appropriate choice for this workload?

Options:

  • A. M7g (general purpose)

  • B. R6i (memory optimized)

  • C. C6i (compute optimized)

  • D. I4i (storage optimized)

Best answer: B

Explanation: Memory-optimized instance families, such as R6i, are specifically built for memory-bound workloads that need large memory footprints relative to CPU, like in-memory databases (for example, Redis or Memcached), in-memory analytics engines, and real-time big data processing. By choosing memory-optimized instances, you can provision high memory capacity without unnecessarily paying for extra vCPUs you do not need, improving cost efficiency compared to over-sized general purpose or compute-optimized instances. In contrast, compute-optimized, general purpose, and storage-optimized families each target different primary bottlenecks (CPU, balanced resources, or storage I/O), so they are less suitable when memory is clearly the dominant requirement.


Question 20

Topic: Design Resilient Architectures

Which THREE statements about AWS disaster recovery (DR) strategies and RPO/RTO trade-offs are correct? (Select THREE.)

Options:

  • A. Backup and restore is typically the lowest-cost DR strategy but also has the highest RPO and RTO compared to other common AWS DR patterns.

  • B. In a pilot light strategy, a minimal version of the critical application components runs continuously in the DR Region and is scaled up to full capacity during a disaster.

  • C. Recovery Point Objective (RPO) defines the maximum acceptable application downtime, while Recovery Time Objective (RTO) defines how much data loss in time is acceptable.

  • D. A warm standby strategy usually has a worse RTO than backup and restore because most components in the DR Region are completely turned off until a disaster occurs.

  • E. In a pilot light strategy, the DR Region must run a full-capacity duplicate of the production environment at all times to ensure instant failover.

  • F. A multi-site active-active strategy can achieve near-zero RPO and very low RTO, but it is usually the most complex and expensive DR option.

Correct answers: A, B and F

Explanation: AWS disaster recovery strategies trade off cost against recovery speed and amount of acceptable data loss. Backup and restore is simple and inexpensive but recovers slowly with more data loss. Pilot light and warm standby improve RPO and RTO by keeping more of the environment running in the DR Region. Multi-site active-active offers the fastest recovery and smallest data loss, at the highest cost and complexity. RPO and RTO are core metrics: RPO is about data loss in time, and RTO is about downtime duration.


Question 21

Topic: Design Secure Architectures

A company runs a public web application behind an Application Load Balancer using HTTPS. Compliance requires annual TLS certificate renewal and annual rotation of a customer managed KMS key, without downtime. Which approach should the solutions architect AVOID?

Options:

  • A. About 30 days before expiration, request a new ACM certificate, update the ALB listener to use the new certificate during a low-traffic period, and enable automatic annual rotation on the KMS key while clients always reference the key alias.

  • B. Use an external public CA integrated with AWS Certificate Manager to import a certificate, configure an automated process to renew and re-import the certificate before expiration, attach it to the ALB, and enable automatic annual rotation on the customer managed KMS key.

  • C. Install the TLS certificate directly on each EC2 instance and manually replace it once per year during a maintenance window, temporarily disabling HTTPS, and turn off automatic rotation on the KMS key, planning to rotate the key only after a security incident.

  • D. Attach an ACM-issued public certificate to the ALB using DNS validation so ACM automatically renews and deploys the certificate, and enable automatic annual rotation on the customer managed KMS key while the application uses the KMS key alias.

Best answer: C

Explanation: For compliant and highly available architectures, key and certificate rotation should be regular, automated, and non-disruptive.

For TLS, using AWS Certificate Manager (ACM) with an Application Load Balancer allows certificates to be renewed and deployed automatically without touching individual instances or interrupting traffic. For KMS, enabling automatic rotation on customer managed KMS keys and using key aliases in application code allows the key to rotate on a yearly schedule without changes or downtime in the application.

Any strategy that relies on manual, infrequent rotation and intentionally accepts downtime or non-HTTPS access contradicts both security and availability requirements and should be avoided.
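
A minimal sketch of the automated pieces used by the acceptable options (A, B, and D); the domain name and key ID are placeholders.

```python
import boto3

acm = boto3.client("acm")
kms = boto3.client("kms")

# Request a public certificate with DNS validation; once the validation CNAME
# is in place, ACM renews the certificate automatically for the ALB listener.
acm.request_certificate(
    DomainName="app.example.com",          # placeholder domain
    ValidationMethod="DNS",
)

# Turn on automatic rotation for the customer managed key; callers keep using
# the key alias, so no application change or downtime is needed.
kms.enable_key_rotation(KeyId="1234abcd-12ab-34cd-56ef-1234567890ab")  # placeholder ID
```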


Question 22

Topic: Design Secure Architectures

An application runs on Amazon EC2 instances in private subnets within a VPC. It reads data from Amazon DynamoDB. Security recently removed the route to the NAT gateway and blocked all outbound internet access. The application now times out when calling DynamoDB. What should a solutions architect do to restore access while still preventing internet egress?

Options:

  • A. Create a gateway VPC endpoint for DynamoDB and update the private subnet route tables to send DynamoDB traffic to the endpoint.

  • B. Recreate the NAT gateway and restrict outbound access to only DynamoDB public IP address ranges in security groups and network ACLs.

  • C. Create an interface VPC endpoint powered by AWS PrivateLink for DynamoDB and associate it with the private subnets.

  • D. Move the EC2 instances into public subnets with public IP addresses and restrict outbound access to DynamoDB using security groups.

Best answer: A

Explanation: Symptom: The application previously reached DynamoDB over the internet via a NAT gateway. After security removed the NAT route and blocked outbound internet access, calls to DynamoDB now time out from private subnets.

Root cause: The EC2 instances are attempting to reach the public DynamoDB endpoint, which requires internet egress. With the NAT gateway removed and no outbound route, there is no path from the private subnets to DynamoDB.

Fix: For services like DynamoDB and Amazon S3, AWS provides gateway VPC endpoints. A gateway endpoint is a route-target in your VPC route tables that directs traffic to the AWS service over the AWS network without traversing the public internet. By creating a DynamoDB gateway VPC endpoint and updating the route tables for the private subnets to point DynamoDB traffic at this endpoint, the instances regain access to DynamoDB while still having no general internet egress. This both restores application functionality and satisfies the compliance requirement to prevent outbound internet access.

This design contrasts with public endpoints (which require internet egress, typically via a NAT gateway) and interface endpoints powered by AWS PrivateLink (used for many other AWS services and custom NLB-backed services). For DynamoDB specifically, the correct choice is a gateway VPC endpoint.


Question 23

Topic: Design Resilient Architectures

A company runs a fleet of Linux web servers across multiple Availability Zones behind an Application Load Balancer. The application stores user-uploaded images using standard POSIX file APIs and requires shared directory access from all servers, with automatic storage growth to very large scale. Which storage option is most appropriate?

Options:

  • A. Attach a single large Amazon EBS volume to one EC2 instance and export it via NFS to the other servers.

  • B. Use an Amazon EFS file system mounted on all web servers.

  • C. Store the images as BLOBs in a Multi-AZ Amazon RDS for MySQL database.

  • D. Store the images in an Amazon S3 bucket and access them directly via the application SDK.

Best answer: B

Explanation: The key discriminating factor in this scenario is the need for shared POSIX-compliant file system access across many Linux EC2 instances in multiple Availability Zones, with automatic storage scaling to very large sizes.

Amazon EFS is a fully managed, elastic, NFS-based file system that provides standard file system semantics (directories, permissions, POSIX APIs). It can be mounted concurrently by hundreds or thousands of Linux EC2 instances across multiple AZs in a region. EFS automatically grows and shrinks as files are added or removed, so there is no need to pre-provision capacity.

In contrast, object storage such as Amazon S3 and block storage such as Amazon EBS do not natively provide shared POSIX file system semantics across instances. While they are excellent for other patterns, they fail the specific requirement here. Building your own NFS server on top of EBS is possible but introduces operational overhead, fixed capacity, and single-instance failure risk, making it less resilient and less aligned with AWS managed-service best practices.

Therefore, using an Amazon EFS file system mounted on all web servers best matches the stated workload characteristics and resilience goals.


Question 24

Topic: Design Secure Architectures

Which of the following statements about securing the AWS account root user and IAM users are INCORRECT or unsafe? (Select TWO.)

Options:

  • A. IAM users who require only programmatic access should be created without a console password to avoid unnecessary exposure of the AWS Management Console.

  • B. Keeping one active access key for the root user for emergency use is acceptable if the key is stored securely offline.

  • C. The root user should be protected with MFA and used only for tasks that explicitly require root-level permissions.

  • D. For applications running on Amazon EC2, a secure practice is to create IAM users with long-term access keys and share those keys across all instances in the Auto Scaling group.

  • E. Access keys for IAM users should be rotated regularly and deactivated or deleted when no longer needed.

Correct answers: B and D

Explanation: The question focuses on best practices for securing the AWS account root user and IAM users, especially around MFA and credential hygiene.

AWS strongly recommends minimizing use of the root user, enabling MFA on it, and deleting any root access keys. For IAM users, good credential hygiene includes using the least privilege principle, rotating access keys, and not providing unnecessary console access. For workloads running on AWS services such as EC2, roles with temporary security credentials are preferred over long-term access keys.

The unsafe statements are the ones that keep a root access key “for emergencies” and that suggest sharing long-term IAM user access keys across EC2 instances. Both patterns increase risk and violate modern AWS security best practices.


Question 25

Topic: Design Resilient Architectures

Which TWO of the following statements about AWS messaging and publish/subscribe patterns are INCORRECT? (Select TWO.)

Options:

  • A. Using an Amazon SNS topic with multiple Amazon SQS subscriptions implements a fanout pattern that decouples a single producer from multiple independent consumers.

  • B. Amazon SQS FIFO queues are designed to guarantee that messages are processed in the order they are sent (within a message group) and support exactly-once processing semantics.

  • C. Consumers of an Amazon SNS topic normally poll the SNS topic for messages, which allows each consumer to control its own polling rate and back-pressure.

  • D. An Amazon SQS standard queue provides at-least-once delivery and best-effort ordering, so consumers should be designed to be idempotent.

  • E. You can configure an Amazon SQS dead-letter queue (DLQ) so that messages that exceed a configured receive count are moved to the DLQ for offline inspection or reprocessing.

  • F. Using an Amazon SQS queue directly between a producer and a consumer tightly couples them, because both must be online at the same time for messages to flow.

Correct answers: C and F

Explanation: AWS messaging services such as Amazon SNS and Amazon SQS are commonly combined to build scalable, loosely coupled architectures. SNS offers a publish/subscribe model with fanout to multiple subscribers. SQS offers durable, decoupled queues that allow producers and consumers to operate independently and at different rates.

Standard queues provide at-least-once delivery and best-effort ordering, so applications must be resilient to duplicate and out-of-order messages. FIFO queues add ordering guarantees and exactly-once processing semantics for workloads that require strict sequence handling. Dead-letter queues are a key error-handling pattern to prevent problematic messages from blocking the main flow.

The incorrect statements in this question confuse push vs pull semantics for SNS and misunderstand how queues affect coupling between components. SNS does not rely on consumers polling SNS, and SQS queues are used precisely to remove the requirement that producers and consumers be online at the same time.


Questions 26-50

Question 26

Topic: Design Secure Architectures

An organization centralizes shared VPC subnets using AWS Resource Access Manager and restricts which AWS accounts can modify networking resources by applying tightly scoped service control policies in AWS Organizations. Which AWS Well-Architected pillar is most directly addressed by this design choice?

Options:

  • A. Security

  • B. Reliability

  • C. Cost Optimization

  • D. Performance Efficiency

Best answer: A

Explanation: The scenario describes two key design choices: using AWS Resource Access Manager (AWS RAM) to share network resources like VPC subnets across accounts, and applying service control policies (SCPs) in AWS Organizations to tightly control which accounts can modify those resources.

Both actions are fundamentally about governing access. AWS RAM allows secure, controlled sharing of resources without needing to duplicate them, reducing the attack surface and improving visibility. SCPs provide central, organization-wide guardrails that enforce least privilege by defining the maximum available permissions for accounts.

These behaviors align directly with the Security pillar of the AWS Well-Architected Framework, which emphasizes strong identity and access management, protecting resources at multiple layers, and using mechanisms like policies to enforce security boundaries across accounts.


Question 27

Topic: Design High-Performing Architectures

An online retail company runs its application in a single AWS Region. It needs automatic database failover if one Availability Zone fails while keeping write latency low and avoiding any cross-Region replication. Which database deployment option is MOST appropriate?

Options:

  • A. Use Amazon RDS for MySQL with a cross-Region read replica in a second Region.

  • B. Use a single-AZ Amazon RDS for MySQL instance with automated backups enabled.

  • C. Use Amazon RDS for MySQL in a Multi-AZ deployment within the same Region.

  • D. Use an Amazon Aurora MySQL Global Database with a primary cluster in one Region and a secondary cluster in another Region.

Best answer: C

Explanation: The key discriminating factor in this scenario is the need for automatic failover across Availability Zones within a single Region while explicitly avoiding any cross-Region replication. The workload is regional, and the business wants high availability for AZ failures without the added latency, complexity, and cost of multi-Region designs.

Amazon RDS Multi-AZ deployments are specifically designed for this situation. They synchronously replicate data to a standby instance in a different AZ within the same Region and provide automatic failover using a single database endpoint. Because replication stays inside the Region and between AZs, additional write latency is small compared to cross-Region replication, and there is no inter-Region network traffic.

Other options either lack automatic AZ-level failover or introduce cross-Region replication, failing the clearly stated requirement to avoid multi-Region designs while still achieving automatic failover for AZ outages.
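
As a minimal illustration of the Multi-AZ choice, assuming placeholder identifiers and an illustrative instance size; the credential handling shown (letting RDS manage the master password) is one possible approach, not part of the scenario.

```python
import boto3

rds = boto3.client("rds")

# MultiAZ=True provisions a synchronous standby in another AZ in the same
# Region and fails over automatically behind a single endpoint.
rds.create_db_instance(
    DBInstanceIdentifier="retail-db",        # placeholder identifier
    Engine="mysql",
    DBInstanceClass="db.m6g.large",          # illustrative size
    AllocatedStorage=100,
    MasterUsername="admin",
    ManageMasterUserPassword=True,           # one option for credential handling
    MultiAZ=True,
)
```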


Question 28

Topic: Design Cost-Optimized Architectures

Which of the following statements about designing cost-optimized data retention and archival policies on AWS are true? (Select TWO.)

Options:

  • A. Using Amazon S3 Lifecycle policies to transition older, rarely accessed objects to archival classes such as S3 Glacier Deep Archive and then permanently delete them when no longer required helps minimize long-term storage cost.

  • B. Enabling Time to Live (TTL) on DynamoDB tables that store expiring data (such as session or event data) automatically removes old items and prevents unbounded storage growth.

  • C. To control costs, it is best practice to retain Amazon RDS automated backups indefinitely so that any past database state can be restored on demand.

  • D. Because S3 Glacier retrievals incur additional charges, it is more cost-effective to keep all historical log files in S3 Standard for at least 10 years instead of expiring or archiving them.

  • E. S3 Intelligent-Tiering automatically moves any S3 Standard bucket to the lowest-cost Glacier storage class after a year, so separate lifecycle policies and deletion schedules are usually unnecessary.

Correct answers: A and B

Explanation: Cost-optimized data retention on AWS means keeping data only as long as it has business, legal, or compliance value, and using the lowest-cost storage tier that still meets access and durability requirements. AWS provides mechanisms like S3 Lifecycle policies and DynamoDB TTL to automate data movement and deletion according to retention rules.

For object storage, S3 Lifecycle policies can transition data between storage classes (for example, from S3 Standard to S3 Glacier Deep Archive) as it ages, and can eventually expire objects entirely. This prevents large, rarely used datasets from remaining in high-cost storage indefinitely.

For NoSQL workloads, DynamoDB TTL allows you to mark items with an expiration time. DynamoDB then automatically deletes expired items in the background, which is ideal for data such as sessions, events, and temporary records that lose value after a specific time. This avoids unbounded table growth and unnecessary ongoing storage charges.
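
A minimal sketch of enabling TTL with boto3, assuming a hypothetical user-sessions table and an expires_at attribute.

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# TTL expects an epoch-seconds number in the named attribute; DynamoDB deletes
# expired items in the background at no extra cost.
dynamodb.update_time_to_live(
    TableName="user-sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Writers then stamp each item with its expiry time, for example 24 hours out:
dynamodb.put_item(
    TableName="user-sessions",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 24 * 3600)},
    },
)
```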

By contrast, retaining everything forever “just in case”—such as keeping RDS backups indefinitely or never expiring S3 data—tends to increase cost without clear benefit and can even increase operational risk and complexity. Similarly, misunderstanding how features like S3 Intelligent-Tiering work can lead to incorrect assumptions that retention and archival happen automatically without explicit configuration.


Question 29

Topic: Design High-Performing Architectures

A company is designing a new three-tier web application in a single AWS Region that will be accessed by internet users. The on-premises network uses 10.0.0.0/16, and the company plans to connect it to AWS later using a VPN.

Requirements:

  • VPC CIDR ranges must not overlap with on-premises and must support thousands of EC2 instances, with room for future subnet tiers.
  • The web tier must be internet-accessible, while the application and database tiers must be private and reachable only from within the VPC.
  • The design must support horizontally scaling the web and application tiers across at least two Availability Zones.

Which VPC design meets these requirements?

Options:

  • A. Create a VPC with CIDR 10.1.0.0/16; in two AZs, create separate public web, private application, and private database subnets using /20 blocks per tier per AZ; attach an internet gateway with an Application Load Balancer in the public subnets; route public subnets to the internet gateway, route application subnets through a NAT gateway in each AZ, keep database subnets without internet routes, and place EC2 web/application instances and RDS in their respective subnets.

  • B. Create a VPC with CIDR 10.1.0.0/24; create one public subnet for the web tier and one private subnet for both application and database tiers in a single AZ; attach an internet gateway and a NAT gateway in that AZ, and use Auto Scaling only within that AZ for all EC2 instances.

  • C. Create a VPC with CIDR 10.0.0.0/16; in two AZs, create public subnets for both web and application tiers and a single private subnet for the database; attach an internet gateway, place all EC2 web and application instances in the public subnets with security groups restricting access, and route the private subnet through a NAT gateway.

  • D. Create a VPC with CIDR 10.1.0.0/16; in two AZs, create one large public and one large private subnet per AZ; place the Application Load Balancer, web, and application instances in the private subnets and expose them to users via a VPN connection, place the database in the same private subnets, and use a single NAT gateway in one AZ for internet access.

Best answer: A

Explanation: The best design for a scalable, multi-AZ, three-tier application that must later integrate with an on-premises 10.0.0.0/16 network is to choose a non-overlapping, sufficiently large VPC CIDR (such as 10.1.0.0/16) and implement distinct public, private application, and private database subnet tiers in at least two Availability Zones. Public subnets host the internet-facing Application Load Balancer and web tier with a route to an internet gateway, while private application and database subnets have no direct inbound internet access and instead use NAT gateways (for application outbound traffic) and local routing between tiers. Using /20-sized subnets per tier per AZ leaves address space for thousands of instances and future subnet tiers, while distributing each tier across two AZs supports horizontal scaling and improved availability without compromising security boundaries.
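
A quick, purely illustrative check of the addressing math in this design using Python's ipaddress module:

```python
import ipaddress

on_prem = ipaddress.ip_network("10.0.0.0/16")
vpc = ipaddress.ip_network("10.1.0.0/16")

# The VPC CIDR must not overlap the on-premises range.
assert not vpc.overlaps(on_prem)

# Carving the /16 into /20 blocks yields 16 subnets of 4,096 addresses each:
# six cover web/app/db tiers across two AZs, leaving ten for future tiers.
subnets = list(vpc.subnets(new_prefix=20))
print(len(subnets), subnets[0].num_addresses)   # 16 4096
```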


Question 30

Topic: Design Cost-Optimized Architectures

A company needs to migrate several petabytes of sensor data from a factory that has intermittent network connectivity to AWS. Instead of provisioning a long-term high-bandwidth network link, the company chooses to use AWS Snowball Edge devices for this one-time transfer to minimize ongoing network charges and deployment effort. According to the AWS Well-Architected Framework, which pillar does this decision BEST exemplify?

Options:

  • A. Performance Efficiency

  • B. Security

  • C. Reliability

  • D. Cost Optimization

Best answer: D

Explanation: The scenario describes a company that needs a one-time migration of several petabytes of data from a location with intermittent connectivity. Rather than investing in a long-term high-bandwidth network link, the company chooses AWS Snowball Edge devices to physically ship data to AWS.

This design choice matches the Cost Optimization pillar of the AWS Well-Architected Framework. Cost Optimization emphasizes avoiding unnecessary cost, matching capacity to actual need, and selecting the most economical option that still meets requirements. For a one-time, large data migration from a site with poor connectivity, a dedicated high-speed network link would likely be underutilized and expensive. Snowball Edge offers a lower total cost for that specific usage pattern, while also fitting the connectivity constraint.

Other pillars like Reliability, Performance Efficiency, and Security are important but are not the main driver in the decision as described. The key point is choosing a hybrid data transfer approach (Snowball Edge) that avoids ongoing network charges and overprovisioned infrastructure, directly illustrating cost-aware compute and data-movement design.


Question 31

Topic: Design High-Performing Architectures

An online retailer is building an S3 data lake. Analysts need serverless, ad hoc SQL queries on the data. Executives need interactive dashboards accessible in a browser without managing servers. Which AWS services should the company use? (Select TWO.)

Options:

  • A. Amazon Athena

  • B. Amazon Redshift

  • C. Amazon QuickSight

  • D. Amazon EMR

  • E. AWS Lake Formation

Correct answers: A and C

Explanation: The scenario describes two distinct requirements on an Amazon S3 data lake:

  • Analysts need serverless, ad hoc SQL directly on S3 data.
  • Executives need interactive dashboards in a browser without managing servers.

Amazon Athena is purpose-built for serverless, interactive SQL queries on S3 data, and Amazon QuickSight is designed for fully managed BI dashboards. Together they meet both requirements while minimizing infrastructure management.

Per option:

  • Amazon Athena — Provides serverless, pay-per-query SQL over data in S3, so analysts can run ad hoc queries without provisioning or managing any servers.
  • AWS Lake Formation — Focuses on building, securing, and governing a data lake. It manages metadata and permissions but does not itself provide SQL querying or dashboards.
  • Amazon QuickSight — A managed BI service for creating and sharing interactive dashboards via a web browser, with no servers for the customer to manage.
  • Amazon EMR — Suitable for big data processing (Spark, Hadoop, etc.) but requires cluster management, which violates the requirement for serverless analytics.
  • Amazon Redshift — A managed data warehouse that typically requires provisioning and tuning clusters; it is more heavyweight than necessary for simple S3-based, serverless analytics and visualization.
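
To make the Athena side concrete, here is a minimal sketch of running a serverless query with boto3; the database, table, and results bucket names are hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Athena runs the SQL serverlessly against data in S3 and writes results to
# the output location; you pay per query based on data scanned.
response = athena.start_query_execution(
    QueryString="SELECT sku, SUM(quantity) AS units FROM orders "
                "WHERE order_date >= DATE '2024-01-01' GROUP BY sku LIMIT 50",
    QueryExecutionContext={"Database": "retail_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
# QuickSight can then use this Athena database as a data source for dashboards.
```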

Question 32

Topic: Design Secure Architectures

Which statement correctly describes how AWS Backup centralizes backup and retention configuration for services such as Amazon EC2, Amazon EBS, Amazon RDS, and Amazon DynamoDB?

Options:

  • A. You must configure backup schedules separately in each AWS service; AWS Backup only aggregates backup status and reports into a single dashboard.

  • B. AWS Backup automatically backs up every supported resource in a Region using a single fixed daily schedule and 35-day retention that cannot be changed.

  • C. You create backup plans with scheduled rules and lifecycle policies, then assign supported resources or tags so the same backup and retention policy is applied across multiple services.

  • D. AWS Backup can schedule backups for multiple services, but retention must still be configured separately in each individual service console.

Best answer: C

Explanation: AWS Backup is a centralized, policy-based service for managing backups across many AWS services, including Amazon EC2, Amazon EBS, Amazon RDS, and Amazon DynamoDB. Instead of setting up independent backup schedules and retention rules in each service, you use AWS Backup to create backup plans that define when backups run and how long they are retained.

A backup plan contains one or more backup rules. Each rule specifies a schedule (for example, daily at a certain time), the backup vault to store recovery points, and lifecycle settings such as when to transition to cold storage and when to expire backups. You then assign resources (by resource ID or by tags) from supported services to the backup plan. AWS Backup automatically applies the same schedule and lifecycle policy to all those resources, giving you consistent, centralized control.

This policy-based approach is a key data protection control: it reduces configuration drift, simplifies compliance with organizational backup standards, and makes it easier to audit and manage backups across multiple services and accounts.
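
A minimal boto3 sketch of this pattern, with hypothetical plan, vault, tag, and role values:

```python
import boto3

backup = boto3.client("backup")

# Hypothetical plan: daily backups at 05:00 UTC, move to cold storage after
# 30 days, delete after 365 days. One plan, one rule, assigned by tag below.
plan = backup.create_backup_plan(BackupPlan={
    "BackupPlanName": "daily-standard",
    "Rules": [{
        "RuleName": "daily",
        "TargetBackupVaultName": "Default",
        "ScheduleExpression": "cron(0 5 * * ? *)",
        "Lifecycle": {"MoveToColdStorageAfterDays": 30, "DeleteAfterDays": 365},
    }],
})

# Assign resources by tag so EC2, EBS, RDS, and DynamoDB resources tagged
# backup=daily all inherit the same schedule and retention.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "tagged-resources",
        "IamRoleArn": "arn:aws:iam::111122223333:role/AWSBackupDefaultServiceRole",
        "ListOfTags": [{"ConditionType": "STRINGEQUALS",
                        "ConditionKey": "backup", "ConditionValue": "daily"}],
    },
)
```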


Question 33

Topic: Design Resilient Architectures

Which of the following statements about implementing immutable infrastructure on AWS are INCORRECT? (Select THREE.)

Options:

  • A. Blue/green and canary deployments are incompatible with immutable infrastructure because they require modifying the same servers repeatedly.

  • B. Rollback in an immutable pattern typically involves redeploying a previous known-good image or version, rather than reconfiguring existing instances in place.

  • C. In immutable architectures, each EC2 instance should be treated as a unique pet and manually repaired when it fails to preserve its configuration state.

  • D. Immutable infrastructure requires administrators to SSH into servers and patch them in place to apply security updates quickly.

  • E. Using versioned Amazon Machine Images (AMIs) or container images for each release is a common way to implement immutable infrastructure.

Correct answers: A, C and D

Explanation: Immutable infrastructure means you never change servers in place; instead, you build new ones from a known, versioned image and replace the old ones. This approach reduces configuration drift, speeds recovery, and makes rollbacks predictable.

On AWS, this often involves baking Amazon Machine Images (AMIs) or building container images for each application version. Deployments launch new instances or tasks from these images and then cut over traffic using load balancers or DNS. If something goes wrong, you roll back by redeploying a previous image instead of trying to repair or reconfigure existing resources.

Patterns such as blue/green and canary deployments fit naturally with immutability, because they route traffic between separate, freshly built environments rather than modifying the same set of servers repeatedly. Likewise, instances are treated as disposable: if one is unhealthy, you terminate it and let Auto Scaling or your orchestrator replace it from the latest image, instead of logging in to fix it manually.
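
A minimal sketch of an immutable rollout with boto3, assuming a hypothetical launch template and an Auto Scaling group that tracks the $Latest template version:

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# Each release bakes a new AMI; deploying means registering a new launch
# template version and replacing instances, never patching them in place.
ec2.create_launch_template_version(
    LaunchTemplateName="web-tier",
    SourceVersion="$Latest",
    LaunchTemplateData={"ImageId": "ami-0123456789abcdef0"},  # new release AMI
)

# Replace running instances with ones built from the new version
# (assumes the ASG references the $Latest template version).
autoscaling.start_instance_refresh(AutoScalingGroupName="web-tier-asg")
# Rollback = point at a previous template version and start another refresh.
```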


Question 34

Topic: Design High-Performing Architectures

Which THREE statements about how AWS resource placement affects latency and data transfer costs are correct? (Select THREE.)

Options:

  • A. Within a Region, using public IP addresses between EC2 instances guarantees lower latency than private IP addresses because traffic stays on AWS’s global backbone.

  • B. Placing tightly coupled resources in different Regions generally increases latency and incurs inter-Region data transfer charges, even when using private connectivity options such as VPC peering.

  • C. Placing an EC2 instance and its Amazon RDS database in the same Availability Zone and using private IP addresses minimizes latency and avoids cross-AZ data transfer charges between them.

  • D. Distributing application servers across multiple Availability Zones in a Region improves availability but can introduce cross-AZ data transfer charges when traffic flows between AZs, such as through a load balancer or synchronous replication.

  • E. Placing tightly coupled compute and database tiers in different Regions is a common pattern to reduce latency for global users while keeping synchronization simple and low-cost.

  • F. Cross-AZ data transfer charges apply only when data leaves AWS through an internet gateway; traffic between AZs in the same Region is always free.

Correct answers: B, C and D

Explanation: Resource placement in AWS has a direct impact on both network latency and data transfer charges. Keeping tightly coupled components, such as application servers and databases, close together—often in the same Availability Zone and communicating over private IPs—minimizes network hops and avoids cross-AZ data transfer costs.

When you spread resources across multiple AZs in a Region, you improve availability and fault tolerance, but any traffic that crosses AZ boundaries can incur cross-AZ data transfer charges and may have slightly higher latency than same-AZ traffic. This is a trade-off between reliability and cost/performance.

Placing components in different Regions magnifies these effects: Regions are physically distant, so latency is significantly higher, and inter-Region data transfer is always chargeable, regardless of whether you use public endpoints or private options like VPC peering. For tightly coupled workloads that require low latency and frequent communication, single-Region (and often same-AZ) placement is usually best.

For high-performing architectures, it is crucial to understand these patterns so you can intentionally choose when to keep resources close together and when to distribute them for resilience or geographic reach, while also controlling data transfer spend.


Question 35

Topic: Design Resilient Architectures

A company runs a web application in two AWS Regions using Route 53 failover routing. The primary Region has an Application Load Balancer. Route 53 uses a health check directly against the ALB. The health check is configured with a 10-second interval and a failure threshold of 3. The DNS record has a TTL of 60 seconds. After the ALB in the primary Region becomes unhealthy, Route 53 should fail over traffic automatically to the secondary Region.

Assuming the worst case and ignoring any DNS resolver caching beyond the TTL, what is the maximum expected failover time in seconds? (Round to the nearest whole second.)

Options:

  • A. 180 seconds

  • B. 90 seconds

  • C. 60 seconds

  • D. 30 seconds

Best answer: B

Explanation: This scenario tests how Route 53 failover timing works when using health checks and DNS TTLs.

Route 53 failover routing depends on two main time components in the worst case:

  • Failure detection time: How long it takes the health check to declare the primary endpoint unhealthy.
  • DNS propagation time: How long existing DNS responses remain cached before clients start using the updated response pointing to the secondary Region.

Given:

  • Health check interval \(I = 10\) seconds
  • Failure threshold \(N = 3\) consecutive failed checks
  • Record TTL \(T = 60\) seconds

Worst-case failure detection occurs when the endpoint fails just after a successful health check. Route 53 then needs \(N\) failed checks spaced \(I\) seconds apart before marking it unhealthy.

The failure detection time is:

\[ \text{Detection time} = I \times N = 10 \times 3 = 30\ \text{seconds} \]

Once Route 53 updates the status and starts responding with the secondary endpoint, clients that previously resolved the DNS name can still use the old cached answer for up to the TTL. In the worst case, they just cached the old answer, so they will wait the full TTL.

The DNS propagation time is \(T = 60\) seconds.

Therefore, the maximum expected failover time is:

\[ \text{Total failover time} = 30 + 60 = 90\ \text{seconds} \]

This aligns with automated failover design: understanding how health check configuration and DNS TTL affect RTO when using Route 53 failover routing between Regions.


Question 36

Topic: Design Secure Architectures

A company has 25 AWS accounts in AWS Organizations. The security team must enforce centrally managed, deep packet inspection for all internet egress and VPC-to-VPC traffic, with organization-wide firewall rules, auditability, and minimal per-account administration. Which approaches SHOULD BE AVOIDED? (Select THREE.)

Options:

  • A. Create an AWS Firewall Manager policy that automatically deploys AWS Network Firewall endpoints and a shared rule group into member VPCs in the organization.

  • B. Configure security groups and network ACLs separately in each account, and use AWS Config to check for non-compliant rules.

  • C. In each account, deploy a third-party firewall appliance in a dedicated VPC and route local traffic through it, managing rules by signing in to each appliance separately.

  • D. Use AWS Network Firewall in each application VPC without a shared transit layer, and let each application team manage its own firewall rule groups.

  • E. Deploy a centralized inspection VPC with AWS Network Firewall, connect all VPCs via AWS Transit Gateway, and manage firewall policies using AWS Firewall Manager.

Correct answers: B, C and D

Explanation: The scenario explicitly calls for centrally managed, deep packet inspection across all accounts and all internet and VPC-to-VPC traffic, with organization-wide rules, auditability, and minimal per-account administration. At the SAA-C03 level, this strongly indicates using AWS Network Firewall for stateful, deep packet inspection and AWS Firewall Manager in combination with AWS Organizations for central policy deployment and governance.

Designs that rely only on security groups and NACLs cannot perform deep packet inspection; they only filter at the network and transport layers on basic attributes such as protocol and port. Designs that manage firewalls independently in each account or VPC violate the requirement for centralized, organization-wide policy control and create operational overhead. The best-fit architectures either centralize traffic through a shared inspection VPC with AWS Network Firewall and Transit Gateway, or use AWS Firewall Manager to automatically deploy and manage Network Firewall resources and policies across accounts.


Question 37

Topic: Design Cost-Optimized Architectures

A company runs EC2, Fargate, and Lambda workloads across several member accounts in AWS Organizations. The finance team needs to:

  • Visualize historical compute spend by service and by member account.
  • Enforce monthly cost limits per project using the Project tag, with email alerts at 80% and 100% of budget.
  • Perform ad hoc queries on hourly, resource-level cost and usage data by tag.

The solution must rely on native AWS cost management tools and require minimal custom code. Which combinations of actions meet these requirements? (Select THREE.)

Options:

  • A. Enable AWS Cost Explorer and the Cost and Usage Report in the management account. Download CUR files to a local workstation and process them with custom scripts to analyze hourly resource-level costs. Use a single CloudWatch billing alarm on the payer account for total monthly spend instead of configuring AWS Budgets with tag filters.

  • B. Create separate AWS Budgets per account with cost filters for EC2, Fargate, and Lambda and 80%/100% alerts. Use Cost Explorer in each member account to view historical compute spend. Use Amazon CloudWatch metrics to perform detailed hourly analysis of resource usage instead of enabling the Cost and Usage Report.

  • C. From the management account, enable AWS Cost Explorer and activate the Project cost allocation tag. Create Cost Categories to group member accounts, and use Cost Explorer to view compute spend by service and Cost Category. Configure tag-filtered AWS Budgets per project with 80%/100% email alerts. Turn on a Cost and Usage Report with hourly resource-level data to Amazon S3 and query it using Athena for detailed, tag-based cost analysis.

  • D. From the management account, enable AWS Cost Explorer and activate the Project cost allocation tag. Use Cost Explorer to filter compute costs by service and linked account. Configure AWS Budgets with a Project tag filter and 80%/100% alerts. Enable the Cost and Usage Report with hourly granularity and resource IDs to Amazon S3, then query it with Amazon Athena for ad hoc analysis.

  • E. Enable the Cost and Usage Report with hourly granularity and load it into an Amazon Redshift cluster using custom ETL scripts. Build dashboards that show compute costs by service and account. Create monthly AWS Budgets per account without using tags, and rely on those budgets to control project-level costs.

  • F. Enable AWS Cost Explorer in the payer account and filter on EC2, Fargate, and Lambda services and on linked accounts to visualize compute spend. Activate the Project cost allocation tag and create monthly AWS Budgets that filter on this tag with 80%/100% email alerts. Configure a Cost and Usage Report with hourly, resource-level details delivered to S3 and use the AWS Glue Data Catalog plus Athena to run ad hoc queries by tag, service, and account.

Correct answers: C, D and F

Explanation: To track and control compute spend across multiple AWS accounts, AWS provides three primary cost management tools:

  • AWS Cost Explorer for visualizing historical and forecasted costs and usage, with filters such as service, linked account, and cost allocation tags.
  • AWS Budgets for setting cost or usage thresholds (including per-tag budgets) and sending alerts when spending approaches or exceeds those thresholds.
  • The AWS Cost and Usage Report (CUR) for the most detailed cost and usage data, including hourly and resource-level granularity with tags. CUR is typically delivered to Amazon S3 and queried with Athena (often via the AWS Glue Data Catalog) for ad hoc analysis.

In this scenario, the finance team needs cross-account visibility, project-level budget enforcement using the Project tag, and detailed, hourly, tag-based querying, all with minimal custom code. The best solutions therefore must:

  • Be configured from the payer/management account so they see all member accounts.
  • Activate the Project cost allocation tag so costs can be grouped and filtered by that tag.
  • Use AWS Budgets with a filter on the Project tag and thresholds at 80% and 100% of each monthly budget.
  • Enable CUR at hourly, resource-level granularity, deliver it to S3, and use Athena (with or without Glue Data Catalog and Cost Categories) for ad hoc queries.

The correct options are the ones that explicitly combine these three services and behaviors; the distractors either omit tag-based budgets, rely on usage-only tools like CloudWatch metrics, or require substantial custom ETL and scripting, which violates the minimal-code requirement.
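
A minimal boto3 sketch of the tag-filtered budget piece; the account ID, budget name, and email address are placeholders, and the tag-filter syntax shown should be verified against the current AWS Budgets documentation.

```python
import boto3

budgets = boto3.client("budgets")

# Hypothetical project budget of 1,000 USD/month filtered on the Project tag,
# with email alerts at 80% and 100% of actual spend.
budgets.create_budget(
    AccountId="111122223333",   # management (payer) account
    Budget={
        "BudgetName": "project-alpha-monthly",
        "BudgetType": "COST",
        "TimeUnit": "MONTHLY",
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
        "CostFilters": {"TagKeyValue": ["user:Project$alpha"]},  # format assumed
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {"NotificationType": "ACTUAL",
                             "ComparisonOperator": "GREATER_THAN",
                             "Threshold": pct, "ThresholdType": "PERCENTAGE"},
            "Subscribers": [{"SubscriptionType": "EMAIL",
                             "Address": "finance@example.com"}],
        }
        for pct in (80.0, 100.0)
    ],
)
```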


Question 38

Topic: Design Resilient Architectures

A retail company has an order microservice that must publish events for several independent subscriber services (email notifications, analytics, inventory). Processing must be asynchronous and must not delay order placement. The solution should minimize coupling and provide automatic retries for failed deliveries. Which solutions meet these requirements? (Select TWO.)

Options:

  • A. Publish order events to an Amazon SNS topic and subscribe a separate Amazon SQS standard queue for each subscriber service.

  • B. Write order events into an Amazon Kinesis Data Streams stream and have all subscriber services share one consumer application name to read from the stream.

  • C. Send order events to an Amazon EventBridge event bus and configure rules to route events to each target service (for example, AWS Lambda or Step Functions) with DLQs configured where needed.

  • D. Have the order microservice synchronously invoke each subscriber’s REST API through an Application Load Balancer and wait for responses.

  • E. Write order events to a single Amazon SQS FIFO queue that all subscriber services read from in parallel.

Correct answers: A and C

Explanation: The company needs an event-driven, asynchronous pattern where the order microservice publishes events and multiple independent subscribers consume them without impacting order placement. Pub/sub messaging or event buses are ideal because they decouple producers and consumers, allow independent scaling, and support retry behavior.

Using an Amazon SNS topic with SQS queues provides classic fanout: the order service publishes a single event, SNS delivers it to all subscribed SQS queues, and each subscriber processes messages from its own queue at its own pace. This is a well-known AWS pattern for loosely coupled, resilient microservices.
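
A minimal boto3 sketch of this fanout pattern with hypothetical topic and queue names:

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

# One queue per subscriber service; each consumes at its own pace.
for service in ("email", "analytics", "inventory"):
    queue_url = sqs.create_queue(QueueName=f"order-events-{service}")["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    # Note: each queue also needs an access policy that allows this topic to
    # send messages to it (omitted here for brevity).
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# The order service publishes once; SNS fans the event out to every queue.
sns.publish(TopicArn=topic_arn, Message='{"orderId": "123", "status": "PLACED"}')
```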

Amazon EventBridge also supports event-driven architectures. The order service sends events to an event bus, and rules route those events to various targets (such as Lambda functions, Step Functions, or other buses). EventBridge handles retries and optionally uses dead-letter queues, while decoupling event producers from consumers via schemas and event patterns.

In contrast, synchronous REST invocations create direct, blocking dependencies between services, a single SQS queue shared by all subscribers fails the fanout requirement, and the proposed Kinesis consumer configuration causes competing rather than broadcast consumption and adds unnecessary complexity.

Summary of options:

  • ✔ Publish to an SNS topic with an SQS queue per subscriber.
  • ✔ Send events to an EventBridge event bus with rules and appropriate targets/DLQs.
  • ✖ Synchronously call each subscriber via REST over ALB.
  • ✖ Use a single shared SQS FIFO queue for all subscribers.
  • ✖ Use Kinesis with a shared consumer app name for all subscribers.

Question 39

Topic: Design Cost-Optimized Architectures

An organization is designing hybrid connectivity between its on-premises data center and an AWS VPC. A network assessment shows average traffic of 300 Mbps with peaks around 700 Mbps a few times per day.

Requirements:

  • The primary connectivity must comfortably handle the 700 Mbps peak without saturating.
  • If the primary path fails, at least 200 Mbps must remain available for degraded operation.
  • Security requires that no unencrypted production traffic traverses the public internet. Private AWS Direct Connect links and IPsec Site-to-Site VPNs both satisfy this requirement.
  • The network team wants to minimize monthly connectivity costs and avoid paying for more than 1 Gbps of total Direct Connect capacity.

Which of the following connectivity designs should the solutions architect AVOID? (Select TWO.)

Options:

  • A. Use two 500 Mbps Site-to-Site VPN connections over different ISPs, configured with equal-cost multi-path (ECMP) routing to share traffic across both tunnels.

  • B. Provision a single 1 Gbps AWS Direct Connect connection to the VPC and a Site-to-Site VPN over a different ISP as backup, preferring Direct Connect and failing over to the VPN during outages.

  • C. Use two 500 Mbps AWS Direct Connect connections in different locations, with one active and the other configured as a standby that takes over if the primary link fails.

  • D. Provision a single 500 Mbps AWS Direct Connect connection as primary connectivity and route any additional traffic above 500 Mbps directly over the public internet without a VPN to avoid extra tunnel management.

  • E. Provision a single 10 Gbps AWS Direct Connect connection as the only link between the data center and AWS, because it comfortably exceeds the 700 Mbps peak requirement.

Correct answers: D and E

Explanation: The scenario is about selecting a cost-optimized hybrid network design that still meets throughput, redundancy, and security requirements.

Peak traffic is 700 Mbps. The primary path therefore needs to support at least this rate without saturating. In a failure scenario, at least 200 Mbps must remain, so some form of backup path is required. Security prohibits unencrypted use of the public internet, but both private AWS Direct Connect links and IPsec Site-to-Site VPNs are acceptable. Finally, the network team wants to avoid paying for more than 1 Gbps of Direct Connect capacity.

The designs that should be avoided are those that clearly violate one or more explicit requirements: either by drastically overbuying Direct Connect capacity relative to the 1 Gbps limit or by sending production traffic unencrypted over the public internet and/or failing to provide any backup connectivity.


Question 40

Topic: Design High-Performing Architectures

Which TWO of the following statements about Amazon Athena, AWS Lake Formation, and Amazon QuickSight are INCORRECT? (Select TWO.)

Options:

  • A. AWS Lake Formation can apply fine-grained, centralized permissions on data catalog resources so that multiple analytics services, such as Athena, respect the same table and column-level access controls.

  • B. Amazon QuickSight can only visualize data that is stored in Amazon S3 and queried through Amazon Athena; it cannot connect directly to relational databases or SaaS applications.

  • C. AWS Lake Formation is primarily an ETL engine that must first copy data out of Amazon S3 into a proprietary data warehouse before services like Amazon Athena can query it.

  • D. Amazon Athena is a serverless, interactive query service that lets you query structured data in Amazon S3 using standard SQL without provisioning infrastructure.

  • E. Amazon QuickSight is a fully managed business intelligence service that enables users to create dashboards and reports without provisioning or managing servers.

Correct answers: B and C

Explanation: This question checks understanding of the distinct roles of Amazon Athena, AWS Lake Formation, and Amazon QuickSight in a modern analytics stack.

Athena is a serverless, interactive query service that lets you use SQL to analyze data directly in Amazon S3. You do not provision or manage any infrastructure; you pay per query based on the amount of data scanned.

Lake Formation builds on the AWS Glue Data Catalog to simplify creation, security, and governance of data lakes on S3. A key value is centralized, fine-grained access control (down to table and column level) that can be enforced consistently across multiple analytics services such as Athena.

QuickSight is a fully managed business intelligence (BI) and visualization service. It connects to many data sources (including but not limited to Athena) and allows users to build dashboards and interactive reports without managing servers.

The incorrect statements either mischaracterize Lake Formation as a data warehouse/ETL engine or incorrectly limit QuickSight to only S3–Athena data, which does not reflect how these services are intended to be used together in a data lake architecture.


Question 41

Topic: Design High-Performing Architectures

Which TWO statements about integrating caching layers with database-backed applications to meet latency objectives are true? (Select TWO.)

Options:

  • A. Because most managed databases already implement internal buffer caches, adding an application-layer cache never provides meaningful additional latency benefits.

  • B. Routing write operations to the cache first and asynchronously propagating changes to the database is the best way to guarantee strong consistency across cache and database.

  • C. For workloads that require strict read-after-write consistency for all clients, you should favor long cache TTLs so that data is reused as much as possible and consistency issues are minimized.

  • D. Using short time-to-live (TTL) values and explicit cache invalidation on writes allows caching even for frequently updated data while limiting how stale cached values can become.

  • E. Placing a distributed, in-memory cache such as Amazon ElastiCache in front of a relational database can significantly reduce read latency and offload frequently accessed, rarely changing data from the database.

Correct answers: D and E

Explanation: Caching layers are commonly added to database-backed applications to reduce read latency and protect the database from heavy load. An in-memory cache can return data faster than a disk-backed database, especially for hot keys or precomputed results. However, caching introduces the possibility of stale data, so consistency requirements must be considered when choosing patterns, TTLs, and invalidation strategies.

A typical AWS pattern is to place Amazon ElastiCache (Redis or Memcached) in front of Amazon RDS or Aurora for read-heavy workloads, or to use DynamoDB Accelerator (DAX) in front of DynamoDB. These services store frequently accessed items in memory, reducing round trips and query processing on the underlying database. For data that changes, short TTLs and explicit invalidation on updates are used to limit staleness while still gaining latency benefits.
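
A minimal cache-aside sketch using a Redis client; the endpoint, TTL value, and database helper functions are hypothetical.

```python
import json
import redis  # hypothetical ElastiCache for Redis endpoint below

cache = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)

def get_product(product_id, db):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    row = db.load_product(product_id)          # hypothetical DB accessor
    cache.setex(key, 60, json.dumps(row))      # short TTL limits staleness
    return row

def update_product(product_id, fields, db):
    """Write path: update the database first, then invalidate the cached entry."""
    db.save_product(product_id, fields)        # hypothetical DB accessor
    cache.delete(f"product:{product_id}")
```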

Patterns that rely on asynchronous writes from cache to database do not guarantee strong consistency; instead, they can increase the chance of divergence. Similarly, internal database caches do not remove the need for a separate caching layer in low-latency applications, and long TTLs work against strict read-after-write consistency guarantees.


Question 42

Topic: Design Secure Architectures

Which of the following statements about using IAM Identity Center and SAML federation with external identity providers to access multiple AWS accounts are INCORRECT? (Select THREE.)

Options:

  • A. When configuring SAML federation directly with IAM (without IAM Identity Center), a SAML assertion must always map to an IAM user; mapping a federated principal directly to IAM roles is not supported.

  • B. When using IAM Identity Center, you must create a separate IAM user in each AWS account for every federated user so they can sign in to the console and CLI.

  • C. Federated users can use the AWS CLI by configuring AWS access profiles for IAM Identity Center, which obtain short-lived role-based credentials based on the user’s assigned permission sets instead of static access keys.

  • D. IAM Identity Center uses permission sets to provision IAM roles in target AWS accounts, and federated users obtain temporary credentials by assuming those roles after authenticating with the external IdP.

  • E. Using SAML federation with IAM Identity Center lets you centralize MFA enforcement at the external identity provider and avoid distributing long-lived access keys to users.

  • F. IAM Identity Center issues long-lived IAM access keys for federated users so they do not need to assume roles repeatedly when accessing AWS resources.

Correct answers: A, B and F

Explanation: Federated access to AWS is designed to avoid managing long-lived IAM users and static access keys in each account. Instead, users authenticate with an external identity provider (IdP), such as an Active Directory–backed IdP or a SAML provider, and then assume IAM roles that grant them temporary, scoped permissions.

IAM Identity Center simplifies this model across multiple AWS accounts. Administrators define permission sets, which IAM Identity Center uses to provision IAM roles in target accounts. When a user signs in via the IdP and selects an account and permission set, IAM Identity Center uses role assumption behind the scenes to provide short-lived credentials, both for console and CLI access.

Direct SAML federation to IAM (without IAM Identity Center) follows a similar principle: the SAML assertion identifies which IAM roles the user can assume. The user again receives temporary security credentials via role assumption, not a persistent IAM user and not long-lived access keys.
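
A minimal sketch of that exchange with boto3; the role and provider ARNs are hypothetical, and the SAML assertion comes from the IdP's sign-in response.

```python
import boto3

# Placeholder: base64-encoded SAML response returned by the external IdP.
saml_assertion_b64 = "<base64 SAML response from the IdP>"

sts = boto3.client("sts")
creds = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::111122223333:role/DataAnalyst",
    PrincipalArn="arn:aws:iam::111122223333:saml-provider/CorpIdP",
    SAMLAssertion=saml_assertion_b64,
)["Credentials"]
# creds holds a temporary AccessKeyId, SecretAccessKey, and SessionToken;
# no long-lived IAM user or access key is ever created.
```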

Any statement that requires creating per-account IAM users for federated identities or that claims federation uses long-lived access keys contradicts the core design and best practices of federated, role-based access.


Question 43

Topic: Design High-Performing Architectures

Which THREE statements about configuring AWS data ingestion services to match workload characteristics are correct? (Select THREE.)

Options:

  • A. Increasing the number of shards in an Amazon Kinesis data stream increases the stream’s maximum aggregate throughput and parallelism for producers and consumers.

  • B. For Amazon Kinesis or DynamoDB Streams event sources, reducing the AWS Lambda batch size can decrease per-record latency but typically increases overall invocation overhead.

  • C. For an Amazon SQS queue used as a Lambda event source, configuring a larger batch size lets each Lambda invocation process more messages, improving throughput and reducing cost per message at the expense of higher per-message latency.

  • D. When some partition keys are hotter than others in a Kinesis data stream, increasing the consumer batch size is the primary way to eliminate throttling on the hot partition.

  • E. Kinesis Data Firehose buffer size and buffer interval settings affect only delivery cost; they do not significantly impact delivery latency to the destination.

Correct answers: A, B and C

Explanation: Aligning ingestion configuration with workload characteristics typically involves tuning parallelism, batch size, and buffering so that throughput, latency, and cost match requirements.

In Amazon Kinesis Data Streams, shards are the unit of capacity and parallelism. Each shard provides a fixed amount of read and write throughput. Increasing the number of shards increases the maximum sustainable throughput and allows more consumer workers to process data in parallel, as long as the partition keys distribute load evenly.

For stream-based event sources such as Kinesis and DynamoDB Streams, AWS Lambda’s batch size setting controls how many records are processed per invocation. Smaller batches are sent to Lambda more frequently, which reduces the time a single record waits in the stream (lower latency), but causes more invocations and higher overhead. Larger batches improve efficiency and throughput but increase per-record latency.

Similarly, when Lambda is triggered by Amazon SQS, the batch size determines how many messages are fetched and passed to a single invocation. Larger batch sizes generally improve throughput and reduce cost per message because invocation overhead is shared, but they increase the potential wait time for individual messages.
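
A minimal boto3 sketch of tuning batch size for an SQS-triggered function; the queue and function names are hypothetical.

```python
import boto3

lambda_client = boto3.client("lambda")

# A larger batch size amortizes invocation overhead across more messages;
# the batching window caps how long Lambda waits to fill a batch, which
# bounds the extra per-message latency.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:111122223333:order-events-analytics",
    FunctionName="process-analytics-events",
    BatchSize=25,
    MaximumBatchingWindowInSeconds=5,
)
```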

Kinesis Data Firehose uses buffer size and buffer interval to decide when to flush data to its destination. These parameters directly impact delivery latency: larger buffers or longer intervals delay delivery to accumulate more data, trading latency for efficiency.

Finally, hot partitions in Kinesis (where some partition keys receive disproportionate traffic) cannot be fixed by merely changing consumer batch size. The bottleneck is the shard’s provisioned capacity; the correct fixes are changing partition key design or performing reshard operations to add more shards and spread load.


Question 44

Topic: Design Resilient Architectures

Which of the following statements about using infrastructure as code (IaC), such as AWS CloudFormation or the AWS CDK, to support multi-AZ and multi-Region failover are NOT true? (Select THREE.)

Options:

  • A. Declaring networking, security, and data-tier resources in IaC guarantees that configuration drift across Availability Zones and Regions cannot occur, even if operators make manual changes.

  • B. A single AWS CloudFormation stack can directly create and manage resources in multiple AWS Regions, which simplifies multi-Region disaster recovery deployments.

  • C. Once infrastructure is fully defined as IaC, you no longer need to control or restrict manual changes in production, because any console changes will be overwritten and synchronized automatically by the IaC tool.

  • D. Storing IaC templates in a source control system enables you to quickly recreate infrastructure in a new Region from a known-good, tested revision after a Regional failure.

  • E. Using AWS CloudFormation StackSets or AWS CDK with a CI/CD pipeline allows you to roll out the same, versioned infrastructure definition to multiple Regions in a controlled, consistent way for disaster recovery.

Correct answers: A, B and C

Explanation: Infrastructure as code (IaC) such as AWS CloudFormation and the AWS CDK is fundamental for building resilient, multi-AZ and multi-Region architectures because it gives you repeatable, versioned definitions of your infrastructure. This supports automated failover, rapid recreation of environments, and consistent configuration across Regions.

However, it is important to understand the scope and limitations of these tools. CloudFormation stacks are Region-scoped, so a single stack cannot span multiple Regions. Multi-Region deployments are achieved by deploying the same template or synthesized stack into each Region (often with StackSets or CI/CD pipelines). Likewise, IaC greatly reduces configuration drift but cannot prevent it when manual changes are made; governance and change control are still required. IaC templates also do not automatically update themselves based on console changes.

Correct statements highlight using StackSets or pipelines to push the same definition to multiple Regions and using source control to store known-good templates for rapid recreation during failover. The incorrect statements overstate IaC’s capabilities by claiming cross-Region stacks, automatic synchronization of manual changes, or a complete guarantee against drift, which are not accurate behaviors.
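
A minimal boto3 sketch of pushing one template into two Regions with StackSets; the stack set name, template URL, account ID, and Regions are placeholders.

```python
import boto3

cfn = boto3.client("cloudformation")

# Individual stacks are Region-scoped, so the same versioned template is
# deployed as stack instances in each target Region.
cfn.create_stack_set(
    StackSetName="dr-network-baseline",
    TemplateURL="https://s3.amazonaws.com/example-bucket/network-baseline.yaml",
)
cfn.create_stack_instances(
    StackSetName="dr-network-baseline",
    Accounts=["111122223333"],
    Regions=["us-east-1", "us-west-2"],
)
```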


Question 45

Topic: Design High-Performing Architectures

A company runs 30 microservices on Amazon EKS with EC2 worker nodes. The team struggles with Kubernetes complexity and node patching. They do not use Kubernetes-specific APIs and want to eliminate both Kubernetes and EC2 management while keeping containers on AWS, ALB routing, and IAM roles per service. Which change BEST meets these goals?

Options:

  • A. Migrate the workloads to Amazon ECS services using the Fargate launch type, fronted by an Application Load Balancer with IAM roles for tasks.

  • B. Move the workloads to Amazon ECS on EC2 by running an ECS cluster on an Auto Scaling group of EC2 instances behind an Application Load Balancer.

  • C. Migrate the workloads to Amazon EKS on Fargate by creating Fargate profiles for the namespaces and keep using Kubernetes Services with an Application Load Balancer.

  • D. Run the containers directly on an Auto Scaling group of EC2 instances using a custom in-house scheduler and user data scripts for deployment and health checks.

Best answer: A

Explanation: The company’s main goal is to reduce operational burden by removing both Kubernetes complexity and EC2 instance management while still running containers on AWS with ALB routing and per-service IAM roles.

Amazon ECS with the Fargate launch type is a fully managed container orchestration option on AWS that does not require Kubernetes. With Fargate, AWS manages the underlying compute capacity, so there are no EC2 instances to patch or scale. ECS integrates directly with Application Load Balancers for HTTP routing and supports IAM roles for tasks, meeting the ALB and per-service IAM requirements.

Alternative designs either keep Kubernetes (EKS with Fargate) or continue to require EC2 node management (ECS on EC2 or a custom scheduler), so they fail explicit constraints. ECS on Fargate is therefore the only option that is strictly better on operations while still satisfying all functional requirements, making it the best optimization of the baseline design.
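
A minimal boto3 sketch of the Fargate task definition piece; the names, role ARNs, and image URI are hypothetical.

```python
import boto3

ecs = boto3.client("ecs")

# Fargate-compatible task with its own task role, so each microservice keeps a
# distinct IAM identity and there are no EC2 instances or Kubernetes to manage.
ecs.register_task_definition(
    family="orders-service",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::111122223333:role/ecsTaskExecutionRole",
    taskRoleArn="arn:aws:iam::111122223333:role/orders-service-task-role",
    containerDefinitions=[{
        "name": "orders",
        "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/orders:latest",
        "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
    }],
)
# An ECS service then runs this task definition behind an ALB target group.
```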


Question 46

Topic: Design Secure Architectures

Which statement BEST defines envelope encryption as used with AWS Key Management Service (AWS KMS)?

Options:

  • A. A technique where a data key encrypts the data, and that data key is itself encrypted (“wrapped”) under a KMS key.

  • B. Encrypting every object or record directly with a single customer managed KMS key, without generating separate data keys.

  • C. Relying on each AWS service to automatically create and manage its own encryption keys with no customer-managed KMS keys.

  • D. Encrypting data entirely on the client using locally stored keys that never leave the application environment.

Best answer: A

Explanation: Envelope encryption is a key management pattern widely used with AWS KMS. In this model, an application calls AWS KMS to generate a data key. The plaintext data key is used locally to encrypt large amounts of data, such as S3 objects or database fields. The data key is then encrypted (wrapped) with a KMS key, and only the encrypted version of the data key is stored alongside the ciphertext data. Later, the encrypted data key can be sent back to AWS KMS to be decrypted, allowing the application to recover the plaintext data key and decrypt the data.

This approach minimizes the amount of data that AWS KMS must directly encrypt or decrypt, respects KMS size limits, and simplifies key rotation because you can re-encrypt the data keys under a new KMS key without re-encrypting the bulk data itself. It also clearly separates responsibilities: AWS KMS protects the KMS keys and performs small key operations, while the application or service performs data encryption using the generated data keys.
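
A minimal envelope-encryption sketch using boto3 and the cryptography library; the key alias and payload are placeholders.

```python
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")

# 1. Ask KMS for a data key under a KMS key (alias is hypothetical).
key = kms.generate_data_key(KeyId="alias/app-data", KeySpec="AES_256")
plaintext_key, wrapped_key = key["Plaintext"], key["CiphertextBlob"]

# 2. Encrypt the bulk data locally with the plaintext data key.
nonce = os.urandom(12)
ciphertext = AESGCM(plaintext_key).encrypt(nonce, b"large payload ...", None)
# Store ciphertext + nonce + wrapped_key together; discard plaintext_key.

# 3. Later, unwrap the data key with KMS and decrypt locally.
recovered_key = kms.decrypt(CiphertextBlob=wrapped_key)["Plaintext"]
payload = AESGCM(recovered_key).decrypt(nonce, ciphertext, None)
```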


Question 47

Topic: Design Secure Architectures

Which TWO statements about AWS encryption and key management are correct? (Select TWO.)

Options:

  • A. Using service-managed encryption keys in Amazon S3 allows you to define and manage the key policy directly in AWS KMS.

  • B. Enabling server-side encryption with AWS managed keys (SSE-S3) on an S3 bucket gives you the same control over key rotation and key disabling as customer managed KMS keys.

  • C. Customer managed KMS keys are stored inside your VPC and require a dedicated VPC endpoint for each key to be used.

  • D. With client-side encryption, the application encrypts data before sending it to AWS, so AWS stores only ciphertext and cannot decrypt it without client-held keys.

  • E. In envelope encryption, a data key encrypts the data, and a KMS key is used only to encrypt and decrypt that data key.

Correct answers: D and E

Explanation: The question focuses on where encryption happens and who controls the keys in common AWS patterns: envelope encryption, client-side encryption, and the difference between AWS managed/service-managed keys and customer managed KMS keys.

In AWS, envelope encryption is widely used. The application or service asks AWS KMS for a data key. That data key (also called a DEK) encrypts the actual data. The data key itself is then encrypted under a KMS key. Later, the encrypted data key is sent back to KMS for decryption, and the decrypted data key is used to decrypt the data. The KMS key never leaves KMS; it only encrypts and decrypts data keys.

With client-side encryption, encryption and decryption happen entirely on the client, before data reaches AWS services. AWS sees and stores only ciphertext. Unless the client sends keys or decryption capability to AWS, AWS cannot decrypt the data. This maximizes customer control over keys and plaintext exposure, but also shifts more responsibility to the client.

By contrast, service-managed keys (such as S3 SSE-S3 or AWS owned keys) and AWS managed KMS keys reduce operational overhead but provide less control. With SSE-S3, you cannot see or configure the keys in your account; AWS manages policies and rotation for you. With customer managed KMS keys, you control key policies, key rotation configuration, enable/disable state, and can audit their use.

Summary of the options:

  • ✔ In envelope encryption, a data key encrypts data, and a KMS key protects that data key.
  • ✔ With client-side encryption, the application encrypts data before sending it to AWS, so AWS stores only ciphertext.
  • ✖ Service-managed S3 keys do not expose key policies in your AWS account/KMS.
  • ✖ Customer managed KMS keys are not stored in your VPC and do not require per-key VPC endpoints.
  • ✖ SSE-S3 does not give the same level of control as customer managed KMS keys over rotation and key lifecycle.

Question 48

Topic: Design Secure Architectures

Which of the following statements about access control in a multi-account AWS Organizations environment is INCORRECT?

Options:

  • A. For cross-account access, a common approach is to create an IAM role in the target account and configure the role’s trust policy to allow principals from the source account to assume it.

  • B. If an SCP explicitly denies an action in an account, you can still allow that action for a specific IAM role in that account by attaching an identity-based policy that grants the action.

  • C. An IAM permission boundary limits the maximum permissions that an IAM role or user can receive and does not, by itself, grant any permissions.

  • D. A service control policy (SCP) does not grant permissions; it defines the maximum set of permissions that principals in member accounts can have.

Best answer: B

Explanation: In a multi-account AWS Organizations environment, access control decisions are the combination of several layers: service control policies (SCPs) at the organization/OU/account level, identity-based and resource-based IAM policies within each account, and optional permission boundaries on individual principals. Understanding how these layers interact is critical for enforcing least privilege and guardrails across accounts.

SCPs apply to all principals (including the root user) in the member accounts that are affected by the SCP. They do not grant permissions; instead, they define the outer boundary of what is even allowed to be granted by IAM policies. Inside that boundary, IAM identity-based and resource-based policies determine what a principal can actually do.

IAM permission boundaries work at the principal (user/role) level. A permission boundary is an additional constraint that limits which permissions an identity-based policy can grant to that principal. Just like SCPs, permission boundaries do not grant permissions; they only filter what can be granted.

For cross-account access, the standard AWS pattern is to create a role in the target account and configure its trust policy to allow principals (users or roles) from the source account to assume it. The role’s attached permission policies then control what actions are allowed in the target account.
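
A minimal boto3 sketch of that pattern; the account IDs and role name are placeholders.

```python
import json
import boto3

# Trust policy for a role created in the TARGET account.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:root"},  # source account
        "Action": "sts:AssumeRole",
    }],
}
iam = boto3.client("iam")
iam.create_role(RoleName="CrossAccountAudit",
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# A principal in the source account then assumes the role for temporary creds.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/CrossAccountAudit",  # target account
    RoleSessionName="audit-session",
)["Credentials"]
```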

An important rule is that explicit denies from SCPs cannot be bypassed. If an SCP denies an action, no identity-based or resource-based policy in that account can re-allow it. This ensures that organization-level guardrails cannot be overridden locally in a member account.


Question 49

Topic: Design Secure Architectures

A company is deploying a three-tier web application in a single VPC. An internet-facing Application Load Balancer (ALB) is in public subnets; EC2 application servers and an Amazon RDS MySQL database are in private subnets. The company must: 1) allow users to reach the application only over HTTPS, 2) ensure the database is reachable only from the application servers on port 3306, 3) automatically allow return traffic for permitted flows, and 4) minimize operational effort when scaling instances. Which network configuration meets these requirements?

Options:

  • A. Attach a single security group to the ALB, application servers, and database that allows inbound TCP 443 and TCP 3306 from 0.0.0.0/0; keep the default network ACLs on all subnets.

  • B. Use network ACLs as the primary control: configure the public subnet NACL to allow inbound TCP 443 from 0.0.0.0/0 and the private subnet NACL to allow inbound TCP 443 from ALB IPs and TCP 3306 from application IPs; leave all security groups open to all traffic.

  • C. Use separate security groups for each tier: an ALB security group allowing TCP 443 from 0.0.0.0/0; an application security group allowing TCP 443 from the ALB security group; a database security group allowing TCP 3306 from the application security group; keep the VPC network ACLs at their default settings.

  • D. Configure restrictive network ACLs that allow only TCP 443 inbound to the public subnets and only TCP 3306 inbound to the private subnets; configure security groups on all resources to allow all inbound traffic from 0.0.0.0/0 while relying on the NACLs for enforcement.

Best answer: C

Explanation: Tiered, stateful security groups that reference one another (ALB → application → database) satisfy all four requirements: HTTPS-only access from the internet to the ALB, database access limited to the application tier on port 3306, automatic handling of return traffic because security groups are stateful, and low operational effort when instances scale, since new instances simply inherit the group rules. NACL-centric designs and security groups opened to the internet fail one or more of these requirements.
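
A minimal boto3 sketch of the security group chaining in option C; the group IDs are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder security group IDs for the three tiers.
alb_sg = "sg-0aaaaaaaaaaaaaaaa"
app_sg = "sg-0bbbbbbbbbbbbbbbb"
db_sg = "sg-0cccccccccccccccc"

# ALB: HTTPS from the internet.
ec2.authorize_security_group_ingress(GroupId=alb_sg, IpPermissions=[{
    "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
}])

# Application tier: HTTPS only from the ALB's security group, not from IP ranges.
ec2.authorize_security_group_ingress(GroupId=app_sg, IpPermissions=[{
    "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
    "UserIdGroupPairs": [{"GroupId": alb_sg}],
}])

# Database: MySQL only from the application tier's security group.
ec2.authorize_security_group_ingress(GroupId=db_sg, IpPermissions=[{
    "IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
    "UserIdGroupPairs": [{"GroupId": app_sg}],
}])
```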


Question 50

Topic: Design Resilient Architectures

A stateless web application runs on a single Amazon EC2 m5.large instance in one Availability Zone, fronted by an Application Load Balancer. Traffic doubles during monthly campaigns, causing CPU saturation and occasional downtime. The application must remain available during an Availability Zone outage and require minimal operations effort. Which change is MOST appropriate?

Options:

  • A. Add a second EC2 instance in the same Availability Zone and register both instances with the existing Application Load Balancer without using Auto Scaling.

  • B. Migrate the web tier to a single AWS Fargate task behind the existing Application Load Balancer in the same Availability Zone to reduce server management.

  • C. Increase the instance size from m5.large to m5.4xlarge to handle traffic spikes on a single EC2 instance.

  • D. Place the EC2 instance in an Auto Scaling group spanning two Availability Zones with a minimum of two instances and CPU-based scaling, attached to the existing Application Load Balancer.

Best answer: D

Explanation: The scenario describes a stateless web application currently running on a single EC2 instance in one Availability Zone (AZ) behind an Application Load Balancer (ALB). The main problems are CPU saturation during predictable traffic spikes and downtime risk due to a single instance and single-AZ deployment.

To improve resilience and scalability while keeping operations simple, the best approach at the compute layer is to move from vertical scaling (bigger single instance) to horizontal scaling (a group of instances) across multiple AZs. An Auto Scaling group (ASG) with a minimum of two instances across at least two AZs behind the existing ALB provides both fault tolerance and elasticity.

Because the application is stateless, additional instances can be added or removed without impacting user state. Horizontal scaling across AZs directly addresses the AZ outage and scaling requirements, and managed scaling policies reduce ongoing operational effort compared to manually managing instance sizes or counts.
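
A minimal boto3 sketch of that change; the names, subnet IDs, and target group ARN are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Two subnets in different AZs and a minimum of two instances keep the
# application available through a single-AZ outage.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-tier", "Version": "$Latest"},
    MinSize=2, MaxSize=6, DesiredCapacity=2,
    VPCZoneIdentifier="subnet-0aaaaaaaaaaaaaaaa,subnet-0bbbbbbbbbbbbbbbb",
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web/abc123"
    ],
)

# Target tracking adds or removes instances to hold average CPU near 60 percent.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 60.0,
    },
)
```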


Questions 51-65

Question 51

Topic: Design Cost-Optimized Architectures

A company stores daily database backups in Amazon S3 Standard with versioning enabled and keeps all objects for 7 years. CloudWatch billing alerts show rapidly increasing S3 storage costs. Backups are rarely retrieved, but compliance requires 7-year retention and 99.999999999% durability. Which change BEST addresses this issue while meeting all requirements?

Options:

  • A. Configure an S3 Lifecycle rule to transition backup objects to S3 Glacier Deep Archive 30 days after creation and permanently delete them after 7 years.

  • B. Move all backup data from S3 to encrypted gp3 Amazon EBS volumes attached to a backup EC2 instance to avoid S3 storage charges.

  • C. Enable S3 Intelligent-Tiering on the backup bucket so objects automatically move between access tiers based on access patterns.

  • D. Reduce backup frequency from daily to monthly while keeping all backup objects in S3 Standard to lower overall storage usage.

Best answer: A

Explanation: The symptom is rapidly increasing S3 storage costs for long-term backups stored in S3 Standard. The backups are rarely retrieved, must be retained for 7 years, and require very high durability.

The root cause is that the current backup strategy uses S3 Standard for all data, including multi-year, cold backup data. S3 Standard is designed for frequently accessed objects and is more expensive than archival storage classes that still meet the durability requirement.

The proper fix is to keep the existing backup pattern (daily backups, 7-year retention, 11x9s durability) but change the storage lifecycle so older backups are automatically transitioned to a lower-cost archival storage class optimized for long-term retention, such as S3 Glacier Deep Archive. A lifecycle rule can also enforce the 7-year retention requirement by expiring objects at the end of that period. This reduces monthly spend without compromising durability or compliance.
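
A minimal boto3 sketch of the lifecycle rule in option A might look like the following; the bucket name and prefix are hypothetical, and 7 years is approximated as 2,555 days.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-backups",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
            "Expiration": {"Days": 2555},  # ~7 years after creation
            # With versioning enabled, also age out noncurrent versions:
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "DEEP_ARCHIVE"}],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 2555},
        }]
    },
)
```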


Question 52

Topic: Design High-Performing Architectures

Which of the following statements about AWS Glue and related AWS data transformation services are INCORRECT? (Select THREE.)

Options:

  • A. AWS Glue can only read data from Amazon S3; it cannot connect to relational databases over JDBC.

  • B. AWS Glue ETL jobs run in a serverless Apache Spark environment where AWS automatically provisions and scales the required compute.

  • C. Every AWS Glue ETL job must be written entirely in code (Python or Scala); AWS Glue does not provide any visual interface for building ETL workflows.

  • D. AWS Glue Data Catalog provides centralized table metadata that services such as Amazon Athena and Amazon Redshift Spectrum can query.

  • E. AWS Glue requires you to provision and manage a persistent Amazon EC2 cluster to run ETL jobs.

  • F. Amazon Kinesis Data Firehose is commonly used to ingest streaming data into destinations like Amazon S3 and Amazon Redshift, optionally performing lightweight transformations before delivery.

Correct answers: A, C and E

Explanation: AWS Glue is a serverless data integration service used to discover, prepare, and transform data at scale. It runs ETL jobs on a managed Apache Spark environment and integrates with the AWS Glue Data Catalog for centralized metadata. Glue supports many sources (including S3 and JDBC databases) and offers both code-based development and a visual interface via AWS Glue Studio.

For streaming ingestion, services such as Amazon Kinesis Data Firehose can perform lightweight transformations while delivering data to storage and analytics destinations. Together, these services form common building blocks for analytics-ready data pipelines on AWS.
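
As an illustration of how the Glue Data Catalog serves as shared metadata, the boto3 sketch below runs an Athena query against a catalog table; the database name, table name, and results location are hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Athena resolves the "events" table through the Glue Data Catalog,
# so no separate schema definition is needed in the query itself.
resp = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS events "
                "FROM events GROUP BY event_type",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(resp["QueryExecutionId"])
```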


Question 53

Topic: Design High-Performing Architectures

An online media platform has a microservice that uploads videos and then synchronously calls a transcoding service. During traffic spikes, the transcoding service is overloaded and requests fail. Workers may be offline for several minutes. The company wants to decouple producers and consumers, absorb bursts, and allow multiple independent consumer applications (for example, analytics) to process the same events. Which TWO solutions meet these requirements? (Select TWO.)

Options:

  • A. Send video processing requests to an Amazon SQS standard queue and run an Auto Scaling group of worker instances that poll the queue for jobs.

  • B. Publish video processing events to an Amazon SNS topic that pushes HTTPS notifications directly to each worker service endpoint.

  • C. Publish video events to an Amazon EventBridge event bus that invokes each consumer as an HTTP API destination target in the same Region.

  • D. Increase the size of the transcoding Auto Scaling group and continue calling the service synchronously from the upload microservice.

  • E. Send video events to an Amazon Kinesis Data Stream and have multiple consumer applications read from the stream at their own rate within the retention period.

Correct answers: A and E

Explanation: The goal is to decouple the upload microservice from downstream processing so that spikes in video uploads do not directly overload the transcoding or analytics components. A decoupled design should allow producers to enqueue events quickly, buffer those events durably, and let multiple consumers process them asynchronously at their own pace.

Amazon SQS and Amazon Kinesis Data Streams are both purpose-built for buffering workloads and decoupling producers from consumers. They support pull-based consumption models where workers control their processing rate. This design naturally smooths traffic spikes and isolates consumer failures from producers.

In contrast, using SNS or EventBridge with direct HTTP push targets still leaves delivery timing tied to consumer availability and capacity unless combined with a queue. Simply scaling a synchronous backend continues to couple producer success to consumer health, which fails the decoupling requirement.
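
A minimal boto3 sketch of the SQS half of this design (option A) is shown below; the queue URL and message fields are hypothetical.

```python
import json
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # hypothetical

# Producer: the upload microservice returns immediately after enqueueing.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"video_id": "v-001", "s3_key": "uploads/v-001.mp4"}),
)

# Consumer: workers long-poll and delete messages only after success,
# so bursts are buffered and failed work becomes visible again.
while True:
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])
        # ... transcode job["s3_key"] here ...
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```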


Question 54

Topic: Design Secure Architectures

A company hosts its orders database on Amazon RDS MySQL. The business has defined the following disaster recovery requirements for this database.

Exhibit:

| Parameter | Value |
| --- | --- |
| Database | Orders (RDS MySQL, single-AZ) |
| RPO | 5 minutes (max data loss) |
| RTO | 15 minutes (max downtime) |
| Scope | Single Region; DR for AZ failure only |

Based only on the information in the exhibit, which solution should a solutions architect recommend to meet these requirements?

Options:

  • A. Create a cross-Region read replica and plan to promote it manually if the primary becomes unavailable.

  • B. Convert the database to a Multi-AZ RDS deployment and enable automated backups with point-in-time recovery.

  • C. Keep the database single-AZ and schedule manual snapshots every 5 minutes, restoring from the latest snapshot during a failure.

  • D. Enable automated daily snapshots and copy them to a second Region for disaster recovery.

Best answer: B

Explanation: The exhibit states that the orders database is currently RDS MySQL, single-AZ and must meet an RPO of 5 minutes (max data loss) and an RTO of 15 minutes (max downtime), with the Scope: Single Region; DR for AZ failure only. This means the company wants to tolerate an Availability Zone failure with minimal data loss and short downtime, without paying for full multi-Region disaster recovery.

Converting the database to a Multi-AZ RDS deployment addresses both RPO and RTO in this context. Multi-AZ uses synchronous replication to a standby in another AZ and provides automatic failover, so data loss is typically zero or a few seconds and failover usually completes within minutes. This is aligned with a 5-minute RPO and 15-minute RTO. Enabling automated backups with point-in-time recovery (PITR) adds protection against data corruption or user error without affecting the core failover behavior.

Other choices either rely on slow restore-from-snapshot processes that cannot reliably meet a 15-minute RTO, or introduce cross-Region designs that do not match the exhibit’s stated Scope: Single Region; DR for AZ failure only, adding unnecessary complexity and cost while still not clearly satisfying the RTO requirement.
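
For illustration, converting an existing instance to Multi-AZ is a single API call; the instance identifier and the 7-day backup retention below are hypothetical values.

```python
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="orders-mysql",   # hypothetical identifier
    MultiAZ=True,                          # adds a synchronous standby in another AZ
    BackupRetentionPeriod=7,               # keeps automated backups / PITR enabled
    ApplyImmediately=True,                 # or defer to the next maintenance window
)
```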


Question 55

Topic: Design Cost-Optimized Architectures

Which Amazon EBS volume type is HDD-based and optimized for frequently accessed, large, sequential workloads such as big data and log processing, offering lower cost per GB than SSD volumes?

Options:

  • A. Cold HDD (sc1)

  • B. Throughput Optimized HDD (st1)

  • C. General Purpose SSD (gp3)

  • D. Provisioned IOPS SSD (io2)

Best answer: B

Explanation: Throughput Optimized HDD (st1) is an Amazon EBS volume type that uses magnetic HDD media and is specifically tuned for large, sequential, throughput-driven workloads such as big data, data warehouses, and log processing. It delivers high sustained throughput for these streaming access patterns at a lower cost per GB than SSD-based volumes, making it a cost-effective choice when you need sequential throughput but not high random IOPS or low latency.

SSD-based options like gp3 and io2 are better suited to transactional workloads with lots of small, random I/O and where latency and IOPS are more critical than raw throughput per dollar. Cold HDD (sc1) is also HDD-based but is optimized for infrequently accessed data, trading performance for even lower cost, so it does not meet the requirement for frequently accessed data.
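
As a small illustration, choosing st1 is just a matter of selecting the volume type at creation time; the Availability Zone and size below are hypothetical.

```python
import boto3

ec2 = boto3.client("ec2")

ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=2000,            # GiB; st1 is aimed at large, sequential reads and writes
    VolumeType="st1",
    TagSpecifications=[{"ResourceType": "volume",
                        "Tags": [{"Key": "workload", "Value": "log-processing"}]}],
)
```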


Question 56

Topic: Design Resilient Architectures

A company runs a global online learning platform. It needs a highly available, fault-tolerant data store for user profiles and content metadata with the following requirements:

  • Multi-AZ high availability in the primary Region
  • Cross-Region disaster recovery with RPO <1 minute
  • Minimize operational overhead by using managed, purpose-built AWS services

Which of the following designs should the solutions architect AVOID? (Select THREE.)

Options:

  • A. Deploy self-managed MySQL on Amazon EC2 instances in each Region, configure asynchronous replication between Regions, and manage failover with custom scripts.

  • B. Run a sharded MongoDB cluster on Amazon EC2 instances across multiple AZs in the primary Region, and take daily EBS snapshots to restore into a secondary Region if needed.

  • C. Use a single-AZ Amazon RDS for MySQL instance in the primary Region and copy automated backups to a secondary Region for disaster recovery.

  • D. Use Amazon Aurora Global Database (Aurora MySQL) with a multi-AZ writer cluster in the primary Region and a read-only secondary Region for disaster recovery.

  • E. Use Amazon DynamoDB global tables across two Regions with on-demand capacity and the AWS SDK’s automatic retries for transient failures.

Correct answers: A, B and C

Explanation: The company explicitly wants multi-AZ high availability, cross-Region disaster recovery with an RPO under 1 minute, and minimal operational overhead by using managed, purpose-built AWS services. Designs that rely on self-managed databases on EC2 or snapshot-based DR generally cannot meet these goals as efficiently or reliably as services like DynamoDB global tables or Aurora Global Database.

DynamoDB global tables and Aurora Global Database are purpose-built for exactly this kind of resilient, multi-Region architecture. They provide managed replication, automatic handling of failures, and reduced operational burden compared to self-managed stacks. Solutions that use single-AZ deployments, backup-based DR, or custom replication on EC2 should be avoided in this context.
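
For example, with current-version DynamoDB global tables, adding a replica Region to an existing table is a single update call; the table name and Regions below are hypothetical.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Adds a managed, actively replicated copy of the table in eu-west-1.
dynamodb.update_table(
    TableName="user-profiles",
    ReplicaUpdates=[{"Create": {"RegionName": "eu-west-1"}}],
)
```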


Question 57

Topic: Design Resilient Architectures

A company is designing a backup solution for application logs. The requirements captured in a design workshop are shown in the following table:

| Requirement | Value |
| --- | --- |
| Workload criticality | Non-critical; data can be regenerated from source within 24 hours |
| Durability target | No permanent loss of backed-up logs |
| Target restore time | ≤12 hours |
| DR Region | Not required |

Based on this information, which backup solution is the MOST appropriate?

Options:

  • A. Store logs in Amazon S3 Standard-IA with versioning in a single Region. Do not enable cross-Region replication. Use lifecycle rules to transition logs to S3 Glacier Instant Retrieval after 30 days.

  • B. Store logs in Amazon S3 Standard with versioning in a single Region. Enable cross-Region replication to a second Region and transition objects to S3 Glacier Deep Archive after 30 days.

  • C. Store logs on Amazon EBS gp3 volumes attached to a backup EC2 instance, and take daily EBS snapshots. Replicate snapshots to a second Region for disaster recovery.

  • D. Store logs in Amazon S3 One Zone-IA in a single Availability Zone to reduce cost. Enable daily AWS Backup copies to an S3 Glacier Deep Archive vault in the same Region.

Best answer: A

Explanation: The exhibit shows that the workload is non-critical, can be regenerated within 24 hours, but still requires strong durability and a moderate restore time. The key lines are Durability target: No permanent loss of backed-up logs, Target restore time: ≤12 hours, and DR Region: Not required.

The best solution should therefore:

  • Use a highly durable storage service (such as S3) to avoid permanent loss.
  • Meet the 12-hour restore objective without using very slow archival tiers.
  • Avoid multi-Region replication, because the exhibit explicitly says a DR Region is not required, to prevent unnecessary complexity and cost.

Storing the logs in S3 Standard-IA with versioning and transitioning them to S3 Glacier Instant Retrieval after 30 days provides S3-level durability, keeps retrieval within minutes to hours, and stays within a single Region. This directly satisfies the durability and restore-time requirements while avoiding over-engineering with a second Region.

The other options either introduce unnecessary multi-Region replication, violate the durability expectations by using a single-AZ storage class, or risk exceeding the allowed restore time by using the slowest archival tier.


Question 58

Topic: Design Secure Architectures

A healthcare company stores research datasets and a smaller set of raw PII files in Amazon S3 across several AWS accounts. The security team must centrally govern which IAM principals can access objects tagged data-classification=PII, enforce encryption at rest with a customer managed KMS key, and prevent any public access. The design should minimize the use of S3 ACLs, scale easily as new accounts are added, and provide a clear audit trail of who accessed which objects. Which approach BEST meets these requirements?

Options:

  • A. Create a shared S3 bucket and grant each approved IAM user access through S3 bucket ACLs; enable SSE-S3 on the bucket, allow public read access for research collaborators, and rely on default CloudTrail configuration in each account for auditing.

  • B. Create S3 Access Points for each account with VPC restrictions; use client-side encryption for PII objects, S3 object ACLs to control business unit access, and S3 server access logs for auditing object access.

  • C. Create a central S3 bucket in a security account with S3 Block Public Access enabled and bucket owner enforced; use a bucket policy that allows access only to IAM roles in member accounts with a specific aws:PrincipalTag, conditioned on s3:ExistingObjectTag/data-classification = PII, require SSE-KMS with a customer managed KMS key in the security account, and enable CloudTrail data events on the bucket.

  • D. Keep one S3 bucket per account and manage access using IAM identity-based policies attached to users and roles in each account; enable SSE-KMS using the AWS managed key for S3 and S3 Block Public Access on all buckets, and configure organization-wide CloudTrail for auditing.

Best answer: C

Explanation: The best design for centrally governing PII access in S3 across accounts is to use a single, centrally managed bucket with S3 Block Public Access, bucket owner enforced (no ACLs), and tightly scoped bucket and KMS key policies that rely on tags and trusted roles. This approach satisfies security (least privilege, customer managed KMS key, no public access) and operational requirements (centralized control, easy onboarding of new accounts, and CloudTrail data events for detailed auditing).
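
The boto3 sketch below shows what one tag-conditioned statement of such a bucket policy might look like; the bucket name, organization ID, tag keys, and tag values are hypothetical, and a production policy would add further statements (for example, denying unencrypted uploads and non-TLS access).

```python
import json
import boto3

pii_statement = {
    "Sid": "AllowPIIReadToApprovedRoles",
    "Effect": "Allow",
    "Principal": {"AWS": "*"},
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::central-research-bucket/*",
    "Condition": {
        "StringEquals": {
            "aws:PrincipalOrgID": "o-example12345",             # only this organization
            "aws:PrincipalTag/pii-access": "approved",          # only roles carrying the tag
            "s3:ExistingObjectTag/data-classification": "PII",  # only PII-tagged objects
        }
    },
}

policy = {"Version": "2012-10-17", "Statement": [pii_statement]}

boto3.client("s3").put_bucket_policy(
    Bucket="central-research-bucket",
    Policy=json.dumps(policy),
)
```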


Question 59

Topic: Design High-Performing Architectures

Which of the following statements about designing secure and efficient database connectivity on AWS are INCORRECT? (Select THREE.)

Options:

  • A. RDS Proxy can help protect a database during failovers by buffering new connection attempts and reusing existing connections once the new primary is available, reducing connection storms.

  • B. To connect from an application in the same VPC to an RDS instance over private IP addresses, you must first create a gateway VPC endpoint for Amazon RDS.

  • C. Because RDS Proxy manages all database connections, application-side connection pooling should always be disabled for every type of compute, such as containers and servers.

  • D. Enabling RDS Proxy significantly increases the database’s maximum number of concurrent transactions because the proxy nodes add extra CPU and storage capacity for query processing.

  • E. RDS Proxy sits between the application and the RDS database, maintaining a pool of database connections that can be reused by multiple application clients to reduce connection overhead.

  • F. Using an interface VPC endpoint for AWS Secrets Manager allows applications in private subnets to retrieve database credentials without requiring an internet gateway or NAT gateway.

Correct answers: B, C and D

Explanation: The question focuses on secure and efficient database connectivity patterns, especially around RDS Proxy, VPC endpoints, and connection pooling.

RDS Proxy is a managed service that sits between your application and RDS databases (including some Aurora engines). It maintains a pool of database connections and multiplexes many client connections onto fewer database connections. This reduces connection overhead, protects the database from connection storms, and can improve resiliency during failovers. However, it does not execute SQL queries or increase the underlying database’s CPU or storage capacity.

VPC endpoints (interface endpoints using AWS PrivateLink) allow services such as AWS Secrets Manager to be accessed privately from within a VPC without using an internet gateway, NAT gateway, or public IPs. This helps keep database credentials and other sensitive traffic on the AWS network while still supporting serverless and private-subnet workloads.

Direct connectivity from an application to an RDS instance inside the same VPC does not require any VPC endpoint: the instance already has private IP addresses in the VPC, and clients connect directly to the DB endpoint. RDS does not use a gateway VPC endpoint for data-plane connectivity.

Connection pooling can exist in multiple layers: within the application runtime, and in RDS Proxy. RDS Proxy reduces the need for aggressive application-side pooling and frequent connection opens/closes, but you do not automatically disable all application pooling in every scenario. The optimal design depends on workload characteristics and compute type (for example, Lambda vs long-lived containers).
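
For illustration, an application in a private subnet could fetch its database credentials as shown below; with an interface VPC endpoint for Secrets Manager in the VPC, the call stays on the AWS network. The secret name and key names are hypothetical.

```python
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

# Resolves via the interface VPC endpoint when one exists in the VPC,
# so no internet gateway or NAT gateway is required.
secret = secrets.get_secret_value(SecretId="prod/orders-db")
creds = json.loads(secret["SecretString"])

# The application would then open connections through the RDS Proxy
# endpoint using creds["username"] and creds["password"].
```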


Question 60

Topic: Design Resilient Architectures

A company runs a critical transactional application. During a full Region outage, the company requires the database to meet all of these goals:

  • Cross-Region RPO ≤1s
  • Cross-Region RTO ≤5min
  • Use a managed AWS database service to minimize operational effort

A solutions architect evaluated four designs, shown in the following exhibit.

| Option | Service / setup | Cross-Region RPO / RTO | Management effort |
| --- | --- | --- | --- |
| 1 | Self-managed MySQL on EC2 + nightly copies | ~24h RPO, 4h+ manual RTO | High: custom backup & cross-Region restore |
| 2 | RDS for MySQL Multi-AZ (single Region) | No cross-Region; AZ RTO<2min | Low: managed Multi-AZ in one Region only |
| 3 | Aurora Global Database (Aurora MySQL) | RPO<1s, RTO<1min | Low: managed global replication & failover |
| 4 | MySQL on EC2 in 2 Regions, async replica | 5–30s RPO, 10–20min RTO | High: manual failover runbooks |

Based only on the information in the exhibit, which option should the solutions architect recommend?

Options:

  • A. Option 2: Amazon RDS for MySQL Multi-AZ in a single Region

  • B. Option 4: Self-managed MySQL on EC2 in two Regions with asynchronous replication

  • C. Option 1: Self-managed MySQL on EC2 with nightly cross-Region snapshot copies

  • D. Option 3: Aurora Global Database (Aurora MySQL)

Best answer: D

Explanation: The requirement is to survive a full Region outage with a cross-Region RPO ≤1s and RTO ≤5min, and to use a managed AWS database service to minimize operational effort.

From the exhibit, Option 3 (Aurora Global Database) explicitly lists “RPO<1s, RTO<1min” and describes the management effort as “Low: managed global replication & failover.” This is the only option that simultaneously satisfies the strict RPO and RTO targets and the managed-service requirement.

Options 1, 2, and 4 each fail at least one of the stated goals. Some may appear resilient at first glance, but careful reading of the RPO/RTO and management-effort cells in the exhibit shows they either do not provide cross-Region protection, have too high an RTO, or require significant self-management compared to the purpose-built Aurora Global Database design.


Question 61

Topic: Design Secure Architectures

A company has 50 AWS accounts in an AWS Organizations organization. The security team must centrally inspect all outbound internet traffic from VPCs and enforce mandatory AWS WAF rules and security group policies on all internet-facing resources across all current and future accounts. Which statements about an appropriate design are correct? (Select TWO.)

Options:

  • A. Use AWS Systems Manager Automation documents to push consistent iptables and AWS WAF configurations to EC2 instances in all accounts.

  • B. Deploy third-party firewall appliances into each VPC and manage rules separately in every account instead of using AWS Network Firewall.

  • C. Deploy AWS Network Firewall endpoints in a centralized inspection VPC and route all VPC internet traffic through them using AWS Transit Gateway.

  • D. Use AWS Firewall Manager to create and apply AWS WAF and security group policies across the organization’s accounts and resources.

  • E. Attach AWS WAF web ACLs manually to each internet-facing ALB and CloudFront distribution in every account and rely on tagging to track compliance.

Correct answers: C and D

Explanation: The scenario combines two distinct but related security needs in a multi-account environment:

  • Centralized, stateful inspection of outbound internet traffic from multiple VPCs.
  • Organization-wide enforcement of AWS WAF rules and security group policies on internet-facing resources across many accounts.

AWS Network Firewall is a managed, stateful network firewall service that integrates well with AWS Transit Gateway and VPC routing. It is ideal for building a centralized inspection VPC where all egress (and optionally east-west) traffic is inspected according to rules managed by the security team.

AWS Firewall Manager is a security management service that works with AWS Organizations to centrally configure and manage security policies—such as AWS WAF web ACLs and security group policies—across all accounts. It automatically applies policies to existing and newly created resources that match specified scopes (for example, all internet-facing ALBs in member accounts).

Therefore, a well-architected design for this scenario uses AWS Network Firewall for centralized traffic inspection and AWS Firewall Manager for organization-wide enforcement of WAF and security group policies. The distractors propose manual, per-account management or tools that do not provide enforced, centralized governance, which fails the stated requirements.


Question 62

Topic: Design Resilient Architectures

Which statement BEST describes AWS Fargate in the context of designing scalable, loosely coupled architectures on AWS?

Options:

  • A. A serverless compute engine for containers that runs Amazon ECS tasks and Amazon EKS pods without requiring you to provision or manage EC2 instances

  • B. A fully managed container orchestration service that schedules containers across a cluster of EC2 instances you manage

  • C. A managed platform-as-a-service (PaaS) for deploying web applications from code repositories without dealing with infrastructure details

  • D. A service that runs event-driven functions in response to triggers without requiring containers or explicit runtime management

Best answer: A

Explanation: AWS Fargate is a serverless compute engine for containers that works with Amazon ECS and Amazon EKS, allowing you to run tasks and pods without provisioning, scaling, or managing EC2 instances or container clusters. Because AWS handles the underlying infrastructure and capacity, Fargate is well suited for bursty or unpredictable container workloads where automatic scaling and reduced operational overhead are important.
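
As a brief illustration, launching a container on Fargate through Amazon ECS requires no instance management at all; the cluster, task definition, and network IDs below are hypothetical.

```python
import boto3

ecs = boto3.client("ecs")

ecs.run_task(
    cluster="media-cluster",
    launchType="FARGATE",              # AWS provisions and scales the compute
    taskDefinition="transcoder:3",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-aaa111"],
            "securityGroups": ["sg-0123456789abcdef0"],
            "assignPublicIp": "DISABLED",
        }
    },
)
```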


Question 63

Topic: Design Cost-Optimized Architectures

An IoT company stores time-series sensor data in Amazon S3 for analytics with Amazon Athena. It currently writes 2 TB of raw JSON files per month and must retain 24 months of data. After converting the data to compressed Apache Parquet, file size is reduced to 30% of the original. What is the total S3 storage required for 24 months after the conversion? Give the answer in TB, rounded to one decimal place.

Options:

  • A. 28.8 TB

  • B. 7.2 TB

  • C. 32.0 TB

  • D. 14.4 TB

Best answer: D

Explanation: The question compares storing raw JSON with using a columnar, compressed format (Apache Parquet) for time-series analytics on S3. Columnar formats typically compress better and reduce the amount of data scanned by query engines such as Athena, improving both performance and cost.

Here, we only need the total storage after conversion.

Variables used:

  • Monthly_raw = 2 TB/month of JSON
  • Months = 24
  • Size_fraction_after_conversion = 0.3 (30% of original size)

One-line calculation: Total_storage = Monthly_raw × Months × Size_fraction_after_conversion = 2 TB × 24 × 0.3 = 14.4 TB

By switching from raw JSON to compressed Parquet, the company reduces 48 TB of raw data down to 14.4 TB, cutting S3 storage and Athena scan costs by 70% while also improving query performance due to the columnar layout and better compression.
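
The same arithmetic as a quick Python check:

```python
monthly_raw_tb = 2          # TB of raw JSON written per month
months = 24                 # retention window
size_fraction = 0.3         # Parquet output is 30% of the raw size

raw_total = monthly_raw_tb * months          # 48 TB if kept as raw JSON
converted_total = raw_total * size_fraction  # storage actually needed
print(round(converted_total, 1))             # 14.4 (TB)
```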


Question 64

Topic: Design High-Performing Architectures

A company runs a nightly log ETL job on Amazon EMR that reads 4,500 GB from Amazon S3. Benchmarking shows one EMR core node can reliably process 750 GB per hour. The job must finish within 2 hours. How many EMR core nodes are required? (Round up to a whole node.)

Options:

  • A. 4 core nodes

  • B. 2 core nodes

  • C. 5 core nodes

  • D. 3 core nodes

Best answer: D

Explanation: This question tests how to size an Amazon EMR cluster for a batch ETL workload based on per-node processing capacity and a fixed completion window.

The total data to process is 4,500 GB. Each EMR core node can process 750 GB per hour. The job must complete in 2 hours.

Define the variables:

  • Total data \(D = 4{,}500\text{GB}\)
  • Per-node rate \(R = 750\text{GB/hour}\)
  • Time window \(T = 2\text{hours}\)
  • Number of nodes \(N\)

The cluster’s total processing capacity over the window is:

\[ \text{Capacity} = N \times R \times T \]

To meet the requirement, capacity must be at least equal to the total data:

\[ N \times 750 \times 2 \ge 4{,}500 \]

\[ N \ge \frac{4{,}500}{750 \times 2} = \frac{4{,}500}{1{,}500} = 3 \]

So the minimum number of EMR core nodes required is 3. Since the question says to round up to a whole node, 3 nodes is the correct answer.

Architecturally, this demonstrates how EMR scales horizontally for batch data processing: you choose the number of core nodes so the aggregate throughput over the time window matches or exceeds the workload’s data volume, balancing performance and cost.
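
The sizing rule as a quick Python check:

```python
import math

total_gb = 4500            # data to process per run
rate_gb_per_hour = 750     # measured throughput per core node
window_hours = 2           # required completion window

nodes = math.ceil(total_gb / (rate_gb_per_hour * window_hours))
print(nodes)               # 3
```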


Question 65

Topic: Design Cost-Optimized Architectures

A company is reviewing three stateless workloads to reduce idle capacity costs. The utilization data is shown in the following table.

| Workload | Current platform | Avg CPU utilization | Traffic pattern |
| --- | --- | --- | --- |
| User web app | EC2 Auto Scaling group | 25% | Steady, 24x7 |
| Image processing jobs | 4 x t3.large EC2 | 5% | Runs 10 minutes every hour |
| Reporting API | 2 x m5.large EC2 | 15% | Business hours, weekdays |

The company wants to address the largest source of idle capacity cost first without changing application behavior.

Based on the data in the table, which change is the MOST appropriate?

Options:

  • A. Migrate the image processing jobs from EC2 instances to AWS Lambda functions triggered every hour.

  • B. Move the reporting API from EC2 to an Amazon ECS cluster running on EC2 with reserved instances.

  • C. Purchase a 1-year Compute Savings Plan for all existing EC2 usage to get a lower hourly rate.

  • D. Replace the Auto Scaling group for the user web app with a single larger EC2 instance to increase utilization.

Best answer: A

Explanation: The table shows three stateless workloads with different utilization and patterns. The key line is:

  • Image processing jobs | 4 x t3.large EC2 | 5% | Runs 10 minutes every hour

This indicates that four EC2 instances are running continuously, but useful work happens only 10 minutes out of every 60, and average CPU utilization is only 5%. This is a classic case of overprovisioned, bursty compute where most of the time is spent paying for idle capacity.

AWS Lambda is a serverless compute service that charges per request and compute duration, not per provisioned instance hour. Moving the image processing jobs to Lambda means the company only pays when the code actually runs during those 10-minute windows, essentially eliminating the cost of idle EC2 time for that workload.

By contrast, the user web app and reporting API lines:

  • User web app | EC2 Auto Scaling group | 25% | Steady, 24x7
  • Reporting API | 2 x m5.large EC2 | 15% | Business hours, weekdays

show higher utilization and more continuous traffic, but they are not as extreme an idle-case as the image processing workload. The question specifically asks to address the largest source of idle capacity cost first, so the answer must target the most underutilized workload.

Choosing Savings Plans or simply rehosting to another EC2-based platform (such as ECS on EC2) may reduce unit cost or improve management, but they do not remove the underlying idle capacity. The best architectural change, based on the exhibit, is to move the highly bursty image processing jobs to a serverless model like AWS Lambda.
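
For illustration only, the bursty job could become a Lambda function invoked by an hourly Amazon EventBridge schedule, along the lines of the sketch below; the bucket, prefix, and processing step are hypothetical and not part of the question.

```python
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Process whatever arrived since the last hourly run (prefix is illustrative).
    resp = s3.list_objects_v2(Bucket="example-image-intake", Prefix="pending/")
    for obj in resp.get("Contents", []):
        # ... resize/transcode obj["Key"] here ...
        pass
    # Billing stops when the handler returns, so idle time costs nothing.
    return {"processed": len(resp.get("Contents", []))}
```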


Continue with full practice

Use the AWS SAA-C03 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try AWS SAA-C03 on Web View AWS SAA-C03 Practice Test


Free review resource

Read the AWS SAA-C03 Cheat Sheet on Tech Exam Lexicon for concept review before another timed run.

Revised on Thursday, May 14, 2026