Free AWS SAA-C03 Practice Exam: Solutions Architect - Associate

Try 65 free AWS Certified Solutions Architect - Associate (AWS SAA-C03) questions across the exam domains, with explanations, then continue with IT Mastery practice.

This free full-length AWS SAA-C03 practice exam includes 65 original IT Mastery questions across the exam domains.

These are original IT Mastery practice questions. They are not official AWS questions, copied live-exam content, or exam dumps. Use them to preview question style and explanation depth before continuing with mixed sets, topic drills, and timed mocks in IT Mastery.

Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.

Try the IT Mastery web app for a richer interactive practice experience with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

Exam snapshot

  • Practice target: AWS SAA-C03
  • Practice-set question count: 65
  • Time limit: 130 minutes
  • Practice style: mixed-domain diagnostic run with answer explanations

Full-length exam mix

Domain                                  Weight
Design Secure Architectures             30%
Design Resilient Architectures          26%
Design High-Performing Architectures    24%
Design Cost-Optimized Architectures     20%

Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and interactive practice.

Practice questions

Questions 1-25

Question 1

Topic: Design Cost-Optimized Architectures

Which TWO statements about designing Amazon S3 Lifecycle policies to minimize storage waste and unnecessary spend are true? (Select TWO.)

Options:

  • A. S3 Lifecycle rules must be run manually on a schedule; they do not execute automatically once configured.

  • B. Removing older, previous object versions is not recommended because S3 only charges for the current version of an object, not for previous versions.

  • C. Using object tags or prefixes in S3 Lifecycle filters helps apply aggressive archival or deletion rules only to stale data sets, avoiding impact on frequently accessed objects.

  • D. S3 Lifecycle rules can automatically transition objects to lower-cost storage classes and later expire them after a specified age, reducing storage cost without manual intervention.

  • E. The cheapest strategy is always to transition all new objects immediately into the lowest-cost archival storage class, regardless of access patterns or minimum storage duration charges.

Correct answers: C and D

Explanation: S3 Lifecycle policies are a primary tool for reducing long-term storage cost and waste. They allow you to automatically transition objects from more expensive, frequently accessed storage classes (such as S3 Standard) to lower-cost, infrequently accessed or archival classes (such as S3 Standard-IA, S3 Glacier Flexible Retrieval, or S3 Glacier Deep Archive) as data ages. They can also permanently delete objects when they are no longer needed.

To avoid harming performance or increasing total cost, lifecycle rules should be carefully scoped. Using object tags or prefixes, you can ensure that aggressive archival or deletion policies apply only to data that is truly stale, while keeping frequently accessed data in faster storage classes. Good lifecycle design also considers retrieval patterns and minimum storage duration charges so that transitions happen after data has naturally cooled.

Incorrect ideas include assuming lifecycle actions are manual, believing that deep archive is always best for all data, or thinking that previous versions are free. In reality, lifecycle rules are fully automated, deep-archive classes have trade-offs, and versioned objects accumulate cost unless old versions are expired.
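
As a rough sketch of the scoping idea above, the dictionary below follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The rule ID, the `stale/` prefix, and all day counts are hypothetical values chosen for illustration, and the helper `transitions_are_ordered` is not part of any AWS SDK.

```python
import json

# Hypothetical rule: only objects under the "stale/" prefix are archived,
# expired, and have their old versions cleaned up.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-stale-data",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "stale/"},      # scope: leaves hot data untouched
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            ],
            "Expiration": {"Days": 730},
            # Previous versions are billed too, so expire them as well.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
        }
    ]
}


def transitions_are_ordered(rule):
    """Sanity check: transitions happen in increasing age and before expiration."""
    days = [t["Days"] for t in rule["Transitions"]]
    return days == sorted(days) and days[-1] < rule["Expiration"]["Days"]


print(json.dumps(lifecycle_configuration, indent=2))
print(transitions_are_ordered(lifecycle_configuration["Rules"][0]))
```

Once a configuration like this is applied to a bucket, S3 evaluates it automatically; nothing needs to be run on a schedule.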


Question 2

Topic: Design Resilient Architectures

Which THREE statements about architecting event-driven solutions on AWS using SNS, SQS, EventBridge, and Kinesis are true? (Select THREE.)

Options:

  • A. Amazon EventBridge supports content-based filtering on events, so rules can deliver only events that match specific patterns to a given target.

  • B. Publishing events to an Amazon SNS topic that fans out to multiple Amazon SQS queues lets each consumer process messages at its own rate and fail independently, improving loose coupling.

  • C. With Amazon SQS, producers must know the number of consumer applications so they can configure the queue with the correct number of subscribers before sending messages.

  • D. Amazon Kinesis Data Streams immediately deletes a record after the first successful read by any consumer, which prevents the use of the same event by multiple independent consumer applications.

  • E. Synchronous REST calls between microservices through an Application Load Balancer are the preferred pattern for loose coupling in event-driven architectures because failures are surfaced immediately to callers.

  • F. Amazon Kinesis Data Streams is well suited for event streams that may need to be replayed, because consumers can read from a chosen sequence number or timestamp within the retention window.

Correct answers: A, B and F

Explanation: Event-driven architectures on AWS aim to decouple producers and consumers using messaging and streaming services. Amazon SNS and SQS are commonly used together for fanout and buffering. Amazon EventBridge provides event buses with content-based routing and integration with many AWS services and SaaS providers. Amazon Kinesis Data Streams is designed for ordered, replayable event streams with multiple consumers.

The true statements highlight how these services support loose coupling, independent scaling, and replay. The false statements describe synchronous dependencies or incorrect behaviors that would reduce resilience or flexibility in an event-driven design.
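
To make content-based filtering concrete, here is a simplified matcher written for illustration only. It is not the EventBridge implementation: it handles only exact-value lists and nested objects, whereas real event patterns support more operators. The event source name and field names are hypothetical.

```python
def matches(pattern, event):
    """Return True if every field named in the pattern is satisfied by the event.

    Simplified rules: a leaf in the pattern is a list of allowed exact values;
    a nested dict recurses into the same key of the event.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical rule: deliver only failed or cancelled orders to a target.
pattern = {
    "source": ["com.example.orders"],
    "detail": {"status": ["FAILED", "CANCELLED"]},
}

failed = {"source": "com.example.orders", "detail": {"status": "FAILED", "orderId": "123"}}
shipped = {"source": "com.example.orders", "detail": {"status": "SHIPPED", "orderId": "124"}}

print(matches(pattern, failed))   # True
print(matches(pattern, shipped))  # False
```

The producer publishes both events without knowing who consumes them; the rule decides which targets see which events.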


Question 3

Topic: Design Cost-Optimized Architectures

An application tier runs on t3.small instances in an Auto Scaling group. Daily, CPU stays near 70% for several hours, CPU credits drop to zero, and latency increases. Cost must not increase significantly. What is the MOST appropriate change?

Options:

  • A. Replace the Auto Scaling group with a single, larger t3.2xlarge instance to avoid Auto Scaling overhead and reduce throttling.

  • B. Change the Auto Scaling group to use general purpose m6i.large instances sized for the steady CPU load instead of burstable t3.small.

  • C. Increase the Auto Scaling group maximum size and add more t3.small instances during peak hours.

  • D. Enable T3 Unlimited on the existing Auto Scaling group so instances can burst beyond their CPU credits when needed.

Best answer: B

Explanation: Symptom: The application runs on t3.small instances and experiences high CPU (around 70%) for several hours daily. During this time, CPU credits fall to zero and latency increases, indicating the instances are being throttled once credits are exhausted.

Root cause: T3 instances are burstable. They are cost-effective when average CPU usage is low and bursts are short. Here, CPU is consistently high for extended periods, so credits are steadily drained and throttling occurs. For sustained CPU-bound workloads, burstable instances become both a performance and cost problem, because you either get throttled or pay extra (for example with T3 Unlimited) for continuous bursting.

Fix: Use non-burstable general purpose instances that are right-sized for the steady-state CPU and memory requirements. Moving the Auto Scaling group from t3.small to an appropriate m6i size removes CPU credit limitations while keeping the elasticity and high availability of the group. This typically provides better price/performance for sustained CPU usage compared to paying ongoing burst charges on T3 Unlimited or overprovisioning multiple small burstable instances.
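
The credit drain can be estimated with a short calculation. The t3.small figures below (2 vCPUs, 24 credits earned per hour, a maximum balance of 576) are taken from AWS documentation as best recalled at the time of writing; treat them as assumptions and verify them before relying on the result. The function name is hypothetical.

```python
# Assumed t3.small characteristics; verify against current AWS documentation.
VCPUS = 2
CREDITS_EARNED_PER_HOUR = 24
MAX_CREDIT_BALANCE = 576


def hours_until_credits_exhausted(avg_cpu_utilization, starting_balance=MAX_CREDIT_BALANCE):
    """Estimate how long a full credit balance lasts at a steady CPU level.

    One CPU credit is one vCPU running at 100% for one minute, so an instance
    spends vCPUs * utilization * 60 credits per hour.
    """
    spent_per_hour = VCPUS * avg_cpu_utilization * 60
    net_drain = spent_per_hour - CREDITS_EARNED_PER_HOUR
    if net_drain <= 0:
        return float("inf")  # usage is at or below baseline; credits never run out
    return starting_balance / net_drain


# At a steady 70% CPU, even a full balance lasts roughly 9.6 hours.
print(round(hours_until_credits_exhausted(0.70), 1))
```

Because the load recurs daily and the balance rarely has time to refill, the instances spend much of the busy period throttled to baseline, which matches the symptom in the question.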


Question 4

Topic: Design Resilient Architectures

Which TWO of the following statements about using Application Load Balancers (ALBs) and Network Load Balancers (NLBs) for high availability are INCORRECT? (Select TWO.)

Options:

  • A. Health checks for an Application Load Balancer are configured on the load balancer listener; all target groups behind that listener must share the same health check path and port.

  • B. For high availability, an Application Load Balancer must be associated with subnets in at least two Availability Zones; targets in those AZs receive traffic only if they pass health checks.

  • C. If all targets in one Availability Zone fail health checks but healthy targets exist in other enabled AZs, the ALB will automatically stop sending requests to the unhealthy AZ.

  • D. To achieve multi-AZ resiliency with an Application Load Balancer, it is sufficient to launch instances in multiple Availability Zones; the load balancer does not need subnets in every AZ where targets run.

  • E. A Network Load Balancer operates at Layer 4 and is commonly used to load balance TCP or UDP traffic while preserving the client source IP address.

Correct answers: A and D

Explanation: This question tests understanding of how Application Load Balancers (ALBs) and Network Load Balancers (NLBs) achieve high availability, especially with respect to Availability Zones (AZs), health checks, and where key settings are configured.

ALBs distribute incoming traffic across multiple targets (such as EC2 instances or ECS tasks) in one or more AZs. For true high availability, the ALB must be deployed (that is, associated with subnets) in at least two AZs. Within each enabled AZ, it only routes requests to targets that pass health checks configured at the target group level.

NLBs operate at Layer 4 (TCP/UDP) and are commonly used when you need ultra-low latency, very high throughput, or to preserve the client source IP. They also support multi-AZ configurations and health checks on targets.

The incorrect statements in this question either misplace the health check configuration (listener vs. target group) or incorrectly imply that the ALB does not need subnets in each AZ where you want to load balance traffic. Both misunderstandings can lead to designs that are not truly highly available or are harder to operate correctly.


Question 5

Topic: Design High-Performing Architectures

A company collects 3 TB per day of clickstream data in Amazon S3 as semi-structured JSON. They need to transform this data into partitioned Parquet files in the same S3 data lake for analysis with Amazon Athena while keeping table metadata centralized. The solution must be fully managed and serverless, automatically scale with data volume, and charge primarily when processing jobs are running. It should also automatically keep a data catalog in sync as new partitions arrive. Which solution BEST meets these requirements?

Options:

  • A. Configure Amazon S3 event notifications to invoke AWS Lambda functions that parse each JSON object and write transformed Parquet files to S3, and manually define tables in Athena.

  • B. Use AWS Glue ETL jobs with the AWS Glue Data Catalog, triggered when new JSON objects arrive in S3, to convert the data to partitioned Parquet in S3 and update the catalog.

  • C. Set up an Amazon Kinesis Data Firehose delivery stream to read from S3, convert records to Parquet, and load them into an Amazon Redshift cluster for analytics.

  • D. Provision an always-on Amazon EMR cluster with Apache Spark and Hive to run nightly ETL jobs that write Parquet files to S3 and manage table definitions in an external Hive metastore.

Best answer: B

Explanation: AWS Glue is a fully managed, serverless ETL and catalog service designed to transform large S3 datasets into analytics-ready formats like partitioned Parquet while automatically maintaining centralized metadata.


Question 6

Topic: Design High-Performing Architectures

Which TWO statements are correct when designing high-performing data ingestion solutions on AWS with varying ingestion frequency, volume, and traffic spikes? (Select TWO.)

Options:

  • A. For data sources that generate sudden high-volume spikes, using a durable buffer like Amazon SQS or Amazon Kinesis between producers and consumers helps decouple ingestion from processing and prevents data loss.

  • B. Batch ingestion is always higher-performing than streaming because it minimizes per-record overhead, so it is recommended even when sub-second end-to-end latency is required.

  • C. Auto Scaling for ingestion consumers should rely only on CPU utilization metrics, not on queue depth or stream shard metrics, to avoid scaling too aggressively.

  • D. Continuous event streams that require low-latency processing are usually better handled by streaming services such as Amazon Kinesis Data Streams or Amazon MSK than by large periodic batch uploads.

  • E. To maximize throughput for high-volume, highly bursty workloads, it is best to write directly from all producers into an Amazon RDS database without intermediate queues or streams.

Correct answers: A and D

Explanation: Designing high-performing data ingestion on AWS depends heavily on ingestion frequency (continuous vs periodic), volume, and spike behavior.

Continuous streams of small events with low-latency requirements are best handled by streaming services like Amazon Kinesis Data Streams or Amazon MSK. These services are built to ingest and process data in near real time, distribute load across shards/partitions, and support scaling consumers.

When workloads are bursty or exhibit sudden spikes in volume, introducing a buffer such as Amazon SQS or Kinesis between producers and consumers decouples the ingestion rate from processing capacity. The buffer absorbs surges, persists messages durably, and allows consumers to scale out based on backlog. This prevents data loss and reduces the chance of overloading downstream systems like databases.

Batch ingestion can be efficient for large files or scheduled transfers when latency is less critical, but it is not appropriate when near-real-time processing is required. Similarly, relying only on CPU metrics or pushing all traffic directly to databases ignores the real constraints: backlog, write throughput, and durability under spikes.
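
A minimal sketch of backlog-based scaling, assuming you can measure queue depth and know roughly how many messages one consumer handles per second. All names, rates, and limits here are hypothetical; in practice the same idea is usually expressed as a target-tracking policy on a backlog-per-instance metric rather than hand-written code.

```python
import math


def desired_consumers(backlog_messages, msgs_per_consumer_per_sec, target_drain_seconds,
                      min_consumers=1, max_consumers=50):
    """Return how many consumers are needed to drain the backlog in the target time."""
    capacity_per_consumer = msgs_per_consumer_per_sec * target_drain_seconds
    needed = math.ceil(backlog_messages / capacity_per_consumer)
    # Clamp to the fleet's configured bounds.
    return max(min_consumers, min(max_consumers, needed))


# 120,000 queued messages, 50 msg/s per consumer, drain within 5 minutes.
print(desired_consumers(120_000, 50, 300))  # 8
```

CPU utilization alone would miss this signal: consumers can sit at moderate CPU while the backlog, and therefore end-to-end latency, keeps growing.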


Question 7

Topic: Design High-Performing Architectures

An analytics team stores 5 TB of raw event data in Amazon S3 as uncompressed CSV files. Athena queries typically reference 5 of 200 columns. To improve query performance, the company wants to minimize data read by scanning only the referenced columns. Which change MOST directly meets this requirement?

Options:

  • A. Compress the existing CSV files with GZIP before querying.

  • B. Convert the CSV files to Apache Parquet and store them in Amazon S3.

  • C. Increase Athena query engine parallelism by running more concurrent queries.

  • D. Load the CSV data into an Amazon RDS for PostgreSQL DB instance and index the queried columns.

Best answer: B

Explanation: The key performance problem is that the data is stored as row-based CSV while most Athena queries touch only a small subset of many columns. With a row-based format, each query must read all columns for every row it scans, even if it uses only a few, leading to excessive I/O and slower queries.

Converting the data to a columnar format such as Apache Parquet allows Athena to perform column pruning: it reads only the column chunks referenced in the query. For a wide table with 200 columns where queries use only 5, this can reduce data scanned by roughly a factor of 40 if the columns are of similar size, which directly improves performance and also reduces cost, because Athena bills by the amount of data scanned.

Other changes like compression, indexing in a row-store database, or increasing parallelism may help in different ways, but they do not provide the specific optimization of reading only the needed columns from S3-based analytics data.
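
The scan reduction from column pruning can be estimated as follows. This is a back-of-the-envelope sketch that assumes all columns are about the same size and ignores Parquet's compression and encoding, which would normally shrink the scan further. The function name is hypothetical.

```python
def estimated_scan_tb(total_tb, columns_read, total_columns):
    """Estimate data scanned when only some equally sized columns are read."""
    return total_tb * columns_read / total_columns


scanned = estimated_scan_tb(5, 5, 200)
print(scanned)       # 0.125
print(5 / scanned)   # 40.0
```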


Question 8

Topic: Design Cost-Optimized Architectures

A company stores 50 TB of application audit logs in Amazon S3 and must retain them for 7 years to meet compliance requirements. Logs are heavily queried during the first 30 days, then accessed infrequently but still require millisecond access for the next 11 months. After 1 year, access is rare and can tolerate hours of restore time, but durability must remain high. The company wants to minimize storage costs using S3 lifecycle policies without changing the application. Which solution BEST meets these requirements?

Options:

  • A. Store logs in S3 Standard and use an S3 lifecycle rule to transition them directly to S3 Glacier Flexible Retrieval after 30 days, expiring them after 7 years.

  • B. Store logs in S3 Intelligent-Tiering for the entire 7-year retention period with no additional lifecycle transitions.

  • C. Store logs in S3 Standard-IA from creation and keep them there for the full 7-year retention period.

  • D. Store logs in S3 Standard, then use an S3 lifecycle rule to transition them to S3 Standard-IA after 30 days and to S3 Glacier Deep Archive after 365 days, expiring them after 7 years.

Best answer: D

Explanation: Using S3 lifecycle transitions from S3 Standard to S3 Standard-IA and then to S3 Glacier Deep Archive matches each access phase’s latency needs while minimizing 7-year storage cost and meeting compliance.
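
The schedule in option D can be checked against each access phase with a small sketch. The function name is hypothetical, and 7 years is approximated as 7 x 365 days, ignoring leap days.

```python
RETENTION_DAYS = 7 * 365  # approximation of the 7-year retention period

# Storage classes that serve objects with millisecond first-byte latency.
MILLISECOND_ACCESS = {"STANDARD", "STANDARD_IA"}


def storage_class_for_age(age_days):
    """Return the storage class an object is in at a given age, or None once expired."""
    if age_days < 30:
        return "STANDARD"        # heavily queried phase
    if age_days < 365:
        return "STANDARD_IA"     # infrequent access, still millisecond latency
    if age_days < RETENTION_DAYS:
        return "DEEP_ARCHIVE"    # rare access, restore takes hours
    return None                  # expired by the lifecycle rule


for age in (0, 29, 30, 364, 365, RETENTION_DAYS - 1, RETENTION_DAYS):
    print(age, storage_class_for_age(age))
```

Option A fails the same check: moving objects to S3 Glacier Flexible Retrieval at day 30 would break the millisecond-access requirement for months 2 through 12.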


Question 9

Topic: Design Resilient Architectures

A company runs a web application in a single AWS Region using an Application Load Balancer (ALB) with Amazon EC2 instances and an Amazon S3 bucket for static assets. Users are globally distributed and report high latency, especially for dynamic API calls. The company wants to reduce latency for both static and dynamic content, offload as much traffic as possible from the ALB and EC2 instances, and protect the application from common web exploits. Personalized, authenticated responses must never be cached and shared across users, and the solution should minimize operational overhead by using managed services. Which architecture meets these requirements BEST?

Options:

  • A. Use AWS Global Accelerator in front of the ALB for all traffic, continue serving static content directly from the S3 bucket, and attach AWS WAF to the ALB.

  • B. Place an Amazon CloudFront distribution in front of the ALB, cache all responses from the ALB for several minutes without forwarding authentication headers or cookies, and attach AWS WAF to the CloudFront distribution.

  • C. Place an Amazon CloudFront distribution in front of both the S3 bucket and the ALB, cache only static paths, configure CloudFront to forward authentication headers and cookies and disable caching for dynamic API paths, and attach AWS WAF to the CloudFront distribution.

  • D. Place an Amazon CloudFront distribution only in front of the S3 bucket for static assets, leave users to access the ALB directly for APIs, and attach AWS WAF to the ALB.

Best answer: C

Explanation: Using CloudFront in front of both S3 and the ALB, caching only static content while forwarding authentication data and disabling caching for dynamic paths, and integrating AWS WAF at the edge satisfies the latency, origin offload, security, and operational requirements together.


Question 10

Topic: Design Secure Architectures

Which of the following statements correctly describe how to use AWS security services for specific security requirements? (Select THREE.)

Options:

  • A. Use Amazon GuardDuty to continuously analyze AWS CloudTrail, VPC flow logs, and DNS logs for suspicious or unexpected activity in your AWS accounts.

  • B. Use Amazon Macie to automatically discover and classify sensitive data, such as PII, stored in Amazon S3 buckets.

  • C. Use AWS Shield to block SQL injection and cross-site scripting (XSS) attacks on HTTP requests sent to an Application Load Balancer.

  • D. Use AWS Secrets Manager to securely store and automatically rotate database credentials used by an AWS Lambda function.

  • E. Use Amazon Cognito to continuously scan Amazon S3 buckets for personally identifiable information (PII) to support data privacy regulations.

  • F. Use AWS WAF to centrally store and automatically rotate API keys used by multiple microservices running on Amazon ECS.

Correct answers: A, B and D

Explanation: This question checks understanding of when to use core AWS security services for common requirements: threat detection, secret management, and data classification.

Amazon GuardDuty is used for intelligent threat detection and continuously analyzes several AWS data sources for suspicious behavior.

AWS Secrets Manager is designed for securely storing secrets and automating their rotation, removing the need to hard-code secrets in application code.

Amazon Macie focuses on data privacy and protection by discovering and classifying sensitive data in Amazon S3, such as PII.

Other services listed (AWS Shield, Amazon Cognito, AWS WAF) play important but different roles: DDoS protection, identity and access for end users, and application-layer firewalling respectively. They do not perform the functions described in the incorrect statements.


Question 11

Topic: Design Resilient Architectures

A company must replicate 2TB of daily backup data from its primary AWS Region to a disaster recovery Region within 24 hours over a dedicated link. What is the minimum sustained network throughput required for this replication?

Use 1TB = 1,000GB, 1GB = 10^9 bytes, 1 byte = 8 bits, 1 day = 86,400 seconds. Round your answer up to the nearest Mbps.

Options:

  • A. 186Mbps

  • B. 100Mbps

  • C. 1,000Mbps

  • D. 50Mbps

Best answer: A

Explanation: To ensure that daily backup data is fully replicated to the disaster recovery Region within 24 hours, you must calculate the minimum sustained throughput that can transfer the entire 2TB in one day.

First, convert 2TB per day into bits per second.

  • 2TB per day = 2,000GB per day (using 1TB = 1,000GB)
  • 1GB = 10^9 bytes
  • 1 byte = 8 bits
  • 1 day = 86,400 seconds

So the number of bits per day is:

\[ \text{bits per day} = 2{,}000 \times 10^{9} \, \text{bytes} \times 8 \, \frac{\text{bits}}{\text{byte}} = 16 \times 10^{12} \, \text{bits} \]

Convert bits per day to bits per second by dividing by the number of seconds in a day:

\[ \text{bits per second} = \frac{16 \times 10^{12}}{86{,}400} \approx 1.8519 \times 10^{8} \, \text{bits/s} \approx 185.19\,\text{Mbps} \]

Rounding up to the nearest Mbps, the minimum sustained throughput required is 186Mbps.

This ensures the replication completes within 24 hours, meeting the durability and availability objective for daily off-site backups without significantly overprovisioning bandwidth.
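
The same arithmetic can be checked in a few lines, using the decimal unit conversions given in the question. The function name is hypothetical.

```python
import math


def min_throughput_mbps(terabytes, seconds=86_400):
    """Minimum sustained throughput in Mbps, rounded up, with 1 TB = 10^12 bytes."""
    bits = terabytes * 1_000 * 10**9 * 8   # TB -> GB -> bytes -> bits
    bits_per_second = bits / seconds
    return math.ceil(bits_per_second / 10**6)


print(min_throughput_mbps(2))  # 186
```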


Question 12

Topic: Design Secure Architectures

A company uses multiple AWS accounts and services such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. Security engineers want a single place to centrally view and prioritize security findings across all accounts. Which approach is NOT appropriate to meet this requirement?

Options:

  • A. Create a custom AWS Lambda function that polls each security service’s API in every account and writes raw JSON events to a central Amazon S3 bucket for analysts to manually review.

  • B. Configure Amazon Inspector to automatically send its vulnerability findings to AWS Security Hub, where they are viewed and prioritized along with findings from other AWS security services.

  • C. Use Amazon Detective together with AWS Security Hub so that Security Hub surfaces GuardDuty and Inspector findings centrally, while Detective is used for deeper investigation of high-priority issues.

  • D. Enable AWS Security Hub in a delegated administrator account, integrate GuardDuty, Inspector, and Macie, and aggregate findings from all member accounts using AWS Organizations.

Best answer: A

Explanation: The requirement is for a single place to centrally view and prioritize security findings from services like GuardDuty, Inspector, and Macie across multiple accounts. At the SAA-C03 level, the key design choice is to use AWS’s managed security services that already aggregate and normalize findings, instead of building ad-hoc collection mechanisms.

AWS Security Hub is purpose-built to aggregate, normalize, and prioritize findings from multiple AWS security services and many third-party tools. It supports multi-account setups via AWS Organizations and provides dashboards, security standards, and automated insights. Amazon Inspector and Amazon GuardDuty act as findings sources feeding into Security Hub, while Amazon Detective helps investigate issues using correlated log data.

Manually polling APIs and dumping raw events into S3 does not provide central prioritization or an operationally efficient view, and it recreates undifferentiated tooling that Security Hub already provides. That approach is therefore the anti-pattern in this scenario.


Question 13

Topic: Design Secure Architectures

Which THREE of the following statements describe recommended patterns when designing flexible IAM authorization models using users, groups, roles, and policies? (Select THREE.)

Options:

  • A. Application code running on Amazon EC2 instances should typically use an IAM user with long-term access keys stored in configuration files, to avoid the overhead of using an IAM role.

  • B. Attaching permissions to IAM groups instead of directly to individual IAM users makes it easier to adjust access as people join, leave, or change job roles.

  • C. To minimize policy sprawl, it is best practice to create a single large inline policy attached directly to each IAM user, instead of using reusable managed policies.

  • D. Because IAM roles cannot be used with federated identities, external partners must always be created as IAM users in each AWS account they need to access.

  • E. Using cross-account IAM roles from a central security account allows security engineers to access workloads in other accounts without creating separate IAM users in each account.

  • F. Defining a highly privileged break-glass administrator role that requires MFA and has no long-term access keys is a recommended pattern for emergency access.

Correct answers: B, E and F

Explanation: Flexible IAM authorization models focus on separating identity from permissions, using groups and roles for manageability, and minimizing long-term credentials. IAM users generally represent individual people or machine identities only when necessary; permissions are attached to groups and roles using reusable managed policies. Cross-account roles and break-glass roles are key patterns for centralized security and emergency access.

The correct statements highlight group-based permission management, cross-account roles from a central security or governance account, and a controlled, MFA-protected break-glass administrator role without long-term access keys. The incorrect statements promote unsafe patterns such as long-term access keys for applications, user-specific inline policies, and the misconception that roles cannot be used with federation.


Question 14

Topic: Design Secure Architectures

An organization in AWS Organizations has one management account and 40 workload accounts. A central security team currently owns all IAM permissions in the workload accounts, so developers must file tickets to create roles. The team wants to reduce operational load while enforcing least privilege and organization-wide guardrails. Which change is BEST?

Options:

  • A. Replace the existing SCPs with IAM permission boundaries and give workload-account admins full IAM administrative permissions in their accounts.

  • B. Keep the current SCPs and security-owned IAM permissions model, but add an internal ticket automation portal that files IAM change requests on behalf of developers.

  • C. Delegate IAM role and policy creation to workload-account IAM admins, but require all new roles to use a security-team–managed IAM permission boundary, keeping existing SCP guardrails unchanged.

  • D. Attach an SCP to the workload OU that allows all actions on all resources, and rely on IAM Access Analyzer findings for oversight by the security team.

Best answer: C

Explanation: The scenario describes a multi-account AWS Organizations setup where a central security team currently controls all IAM changes in each workload account. Developers must file tickets for any IAM role or policy changes, creating an operational bottleneck. At the same time, the organization requires strong, centralized guardrails and least privilege.

A well-architected pattern for delegated administration combines:

  • Service control policies (SCPs) at the organization or OU level to enforce global guardrails (for example, prohibiting disabling CloudTrail or leaving the organization).
  • IAM permission boundaries within each account to define a maximum permissions envelope for roles created by delegated admins.

In this pattern, the security team manages the SCPs and the permission boundary policy document. Workload-account IAM admins are granted permissions such as iam:CreateRole, iam:PutRolePolicy, etc., but only when they attach the approved permission boundary. This allows application teams to create and adjust IAM roles quickly while ensuring they cannot exceed the limits defined by security. Operational load on the central team drops, but security posture is preserved or improved.

The other choices either remove or weaken centralized guardrails, violating least privilege and separation of duties, or they do not materially reduce the operational burden on the security team and thus are not a true optimization of the baseline design.
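
One way the delegated-admin permissions might look is sketched below as a Python dictionary in IAM policy JSON form. The account ID, the policy name `WorkloadBoundary`, and the statement IDs are hypothetical; the `iam:PermissionsBoundary` condition key and the listed IAM actions are real, but this is an illustrative fragment, not a complete or vetted policy.

```python
import json

# Hypothetical ARN of the boundary policy owned by the security team.
BOUNDARY_ARN = "arn:aws:iam::111122223333:policy/WorkloadBoundary"

delegated_admin_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Role creation and policy changes succeed only when the role
            # carries the approved permission boundary.
            "Sid": "ManageRolesOnlyWithBoundary",
            "Effect": "Allow",
            "Action": ["iam:CreateRole", "iam:PutRolePolicy", "iam:AttachRolePolicy"],
            "Resource": "*",
            "Condition": {"StringEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}},
        },
        {
            # Delegated admins must not strip the boundary from a role.
            "Sid": "DenyBoundaryRemoval",
            "Effect": "Deny",
            "Action": "iam:DeleteRolePermissionsBoundary",
            "Resource": "*",
        },
        {
            # Delegated admins must not edit the boundary policy itself.
            "Sid": "DenyBoundaryPolicyEdits",
            "Effect": "Deny",
            "Action": [
                "iam:CreatePolicyVersion",
                "iam:DeletePolicyVersion",
                "iam:SetDefaultPolicyVersion",
                "iam:DeletePolicy",
            ],
            "Resource": BOUNDARY_ARN,
        },
    ],
}

print(json.dumps(delegated_admin_policy, indent=2))
```

The SCPs continue to apply on top of this: a role's effective permissions are limited by its identity policies, its permission boundary, and the SCPs together.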


Question 15

Topic: Design Secure Architectures

A company is designing a secure access pattern for an internet-facing web application. Users access the app through Amazon CloudFront, which forwards to an Application Load Balancer (ALB) in front of application servers that connect to a relational database. The application also calls internal microservices running on AWS.

Which of the following statements describe appropriate security responsibilities at each layer in this architecture? (Select THREE.)

Options:

  • A. Database access from the application layer should be restricted with security groups and database users/roles so that only the application servers, using least-privilege credentials, can connect to the database endpoint.

  • B. The application should always perform server-side input validation and output encoding, even when AWS WAF and client-side validation are configured.

  • C. Deploying AWS WAF on the CloudFront distribution helps block common web exploits (such as SQL injection) close to the edge, reducing malicious traffic that reaches the ALB and application servers.

  • D. Service-to-service calls between AWS microservices should rely primarily on IP allow lists in security groups instead of using IAM roles or signed tokens for authentication and authorization.

  • E. If the ALB terminates HTTPS and performs user authentication with OIDC, the application no longer needs to perform fine-grained authorization checks on user actions.

  • F. Storing database credentials in application environment variables without any encryption is sufficient as long as the application subnets are private and not directly internet-routable.

Correct answers: A, B and C

Explanation: In a CloudFront → ALB → application → database architecture with internal microservices, security should be applied in layers (defense in depth). Each layer has distinct responsibilities: edge protection, transport security, authentication, authorization, input validation, and data-layer controls. Relying on a single layer (for example, WAF only or network controls only) leaves gaps and violates Well-Architected security best practices.

Edge services such as CloudFront with AWS WAF can block many common web exploits and reduce load on downstream resources. However, they cannot replace proper application-layer controls like input validation and authorization logic. Similarly, network-level controls like security groups and private subnets limit reachability, but they do not provide identity or fine-grained permissioning; service-to-service calls should use IAM or token-based mechanisms. At the data layer, least-privilege access with both network and database-level restrictions is critical to protect stored data.

The correct statements together describe a layered security model: WAF at the edge, robust validation and authorization in the application, strong identity-based controls for microservices, and least-privilege access to the database. The incorrect statements either over-trust one layer (such as ALB authentication or private subnets) or recommend weak secret management practices.


Question 16

Topic: Design High-Performing Architectures

Which THREE statements about configuring AWS data ingestion services to match workload characteristics are correct? (Select THREE.)

Options:

  • A. For an Amazon SQS queue used as a Lambda event source, configuring a larger batch size lets each Lambda invocation process more messages, improving throughput and reducing cost per message at the expense of higher per-message latency.

  • B. For Amazon Kinesis or DynamoDB Streams event sources, reducing the AWS Lambda batch size can decrease per-record latency but typically increases overall invocation overhead.

  • C. Increasing the number of shards in an Amazon Kinesis data stream increases the stream’s maximum aggregate throughput and parallelism for producers and consumers.

  • D. Kinesis Data Firehose buffer size and buffer interval settings affect only delivery cost; they do not significantly impact delivery latency to the destination.

  • E. When some partition keys are hotter than others in a Kinesis data stream, increasing the consumer batch size is the primary way to eliminate throttling on the hot partition.

Correct answers: A, B and C

Explanation: Aligning ingestion configuration with workload characteristics typically involves tuning parallelism, batch size, and buffering so that throughput, latency, and cost match requirements.

In Amazon Kinesis Data Streams, shards are the unit of capacity and parallelism. Each shard provides a fixed amount of read and write throughput. Increasing the number of shards increases the maximum sustainable throughput and allows more consumer workers to process data in parallel, as long as the partition keys distribute load evenly.

For stream-based event sources such as Kinesis and DynamoDB Streams, AWS Lambda’s batch size setting controls how many records are processed per invocation. Smaller batches are sent to Lambda more frequently, which reduces the time a single record waits in the stream (lower latency), but causes more invocations and higher overhead. Larger batches improve efficiency and throughput but increase per-record latency.

Similarly, when Lambda is triggered by Amazon SQS, the batch size determines how many messages are fetched and passed to a single invocation. Larger batch sizes generally improve throughput and reduce cost per message because invocation overhead is shared, but they increase the potential wait time for individual messages.

Kinesis Data Firehose uses buffer size and buffer interval to decide when to flush data to its destination. These parameters directly impact delivery latency: larger buffers or longer intervals delay delivery to accumulate more data, trading latency for efficiency.

Finally, hot partitions in Kinesis (where some partition keys receive disproportionate traffic) cannot be fixed by changing the consumer batch size. The bottleneck is the write capacity of the shard that receives the hot keys; the fixes are to redesign the partition key so that load spreads more evenly, or to reshard the stream (for example, by splitting the hot shard) so that the load is spread across more shards.
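The shard sizing idea above can be sketched in a few lines. This is a rough estimate only: it assumes the provisioned-mode write limits of 1 MB/s and 1,000 records/s per shard, treats 1 MB as 10^6 bytes for simplicity, and assumes evenly distributed partition keys. The function name and the sample workloads are hypothetical.

```python
import math

# Per-shard write limits for a provisioned Kinesis data stream (assumed here).
SHARD_WRITE_BYTES_PER_SEC = 1_000_000   # 1 MB/s, simplified to 10^6 bytes
SHARD_WRITE_RECORDS_PER_SEC = 1_000


def estimate_shards(records_per_sec: int, avg_record_bytes: int) -> int:
    """Return the minimum shard count that covers both write limits."""
    by_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_WRITE_BYTES_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_records, by_bytes)


if __name__ == "__main__":
    # Hypothetical workload: 5,000 records/s of 500 bytes each.
    print(estimate_shards(5000, 500))
```

Whichever limit is hit first (record count or bytes) determines the shard count. A skewed partition key can still throttle a single shard even when the total looks sufficient.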


Question 17

Topic: Design Resilient Architectures

Partners upload hourly CSV files via AWS Transfer Family (SFTP) into S3. A Lambda function parses each file and writes all rows into Amazon RDS using credentials from AWS Secrets Manager. During uploads, Lambda often times out and RDS CPU reaches 100%. The business accepts a few minutes of delay. Which change will MOST effectively resolve the issue?

Options:

  • A. Modify the AWS Transfer Family server to limit concurrent SFTP uploads so that fewer Lambda functions run at the same time.

  • B. Have the Lambda function publish each parsed record to an Amazon SQS standard queue, and add consumers that read from the queue and batch writes to RDS.

  • C. Cache the database credentials in Lambda environment variables instead of calling AWS Secrets Manager for each file.

  • D. Increase the RDS instance to a larger size to handle the higher write throughput during file uploads.

Best answer: B

Explanation: Symptom: During hourly CSV uploads, many Lambda functions parse files and write directly into Amazon RDS. CloudWatch shows Lambda timeouts and RDS CPU reaching 100% during these spikes.

Root cause: The architecture couples file ingestion tightly to database writes. Each CSV can contain many rows, and multiple Lambdas may run concurrently, all issuing synchronous insert statements directly to RDS. This sudden, bursty write load overwhelms the database, causing high CPU utilization, longer query times, and eventually Lambda timeouts. AWS Transfer Family and AWS Secrets Manager are functioning correctly; the problem is the lack of an asynchronous buffer between ingestion and the database.

Fix: Introduce Amazon SQS as a decoupling layer. The Lambda function should parse each row and publish it (or small batches of rows) as messages to an SQS standard queue. A separate set of consumers (for example, another Lambda function or an ECS service with Auto Scaling) reads from the queue and performs batched writes to RDS at a controlled rate. This design smooths traffic spikes, reduces concurrent DB connections and CPU pressure, and still meets the requirement that processing can be delayed by a few minutes. It preserves both the existing SFTP integration through AWS Transfer Family and secure credential management via AWS Secrets Manager while improving resilience and scalability.
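A minimal sketch of the producer side of this fix, assuming the parsing Lambda sends rows with the SQS SendMessageBatch API, which accepts at most 10 entries per call. The helper name `to_sqs_batches` and the row format are hypothetical; only the grouping logic is shown, so no AWS call is made.

```python
import json
from typing import Iterable, Iterator

SQS_MAX_BATCH_ENTRIES = 10  # SendMessageBatch accepts at most 10 entries per call


def to_sqs_batches(rows: Iterable[dict]) -> Iterator[list[dict]]:
    """Group parsed CSV rows into SendMessageBatch-style entry lists."""
    batch: list[dict] = []
    for index, row in enumerate(rows):
        # "Id" only needs to be unique within a single batch request.
        batch.append({"Id": str(index), "MessageBody": json.dumps(row)})
        if len(batch) == SQS_MAX_BATCH_ENTRIES:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


if __name__ == "__main__":
    rows = [{"n": i} for i in range(23)]  # made-up rows for illustration
    print([len(b) for b in to_sqs_batches(rows)])
```

Each yielded list could then be passed as `Entries` to the boto3 SQS client's `send_message_batch` call, while a separate consumer drains the queue and writes to RDS at a controlled rate.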


Question 18

Topic: Design Cost-Optimized Architectures

Which of the following statements about using AWS tools to monitor and analyze network-related (data transfer) charges are correct? (Select TWO.)

Options:

  • A. Creating an AWS Budget on “total EC2 instance hours” will automatically notify you when your inter-Region data transfer charges exceed a chosen threshold.

  • B. The AWS Cost and Usage Report (CUR) can be delivered to Amazon S3 and queried (for example, with Amazon Athena) to get per-usage-type line items for data transfer costs.

  • C. You can use AWS Cost Explorer and filter by usage type to identify which services and Regions generate the highest data transfer charges.

  • D. The AWS Pricing Calculator is the primary tool to obtain an hourly, historical breakdown of your past data transfer charges across all accounts.

  • E. VPC Flow Logs include per-GB transfer pricing details for each connection, allowing you to calculate exact data transfer cost from the log entries alone.

Correct answers: B and C

Explanation: Monitoring network-related spend on AWS focuses on analyzing data transfer usage and cost across services and Regions. At the Associate level, the two primary billing tools for this are AWS Cost Explorer and the AWS Cost and Usage Report (CUR).

Cost Explorer lets you interactively explore historical spend by filtering and grouping on dimensions such as service, Region, and usage type. Data transfer is represented by specific usage types (for example, DataTransfer-Regional-Bytes, USE2-DataTransfer-Out-Bytes), so you can surface which workloads and locations are driving network charges.

For deeper analysis, the AWS Cost and Usage Report provides line-item data at the most granular level available. You can configure CUR to deliver CSV or Parquet files to an S3 bucket and then query them with Amazon Athena, Amazon Redshift, or other tools. This allows detailed attribution of data transfer cost by account, tag, usage type, and more.
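To illustrate the kind of per-usage-type attribution described above, here is a small sketch that totals data transfer cost from line items. The column names follow the Athena-style CUR naming (`line_item_usage_type`, `line_item_unblended_cost`); the sample rows are made up purely for illustration and the function name is hypothetical. In practice the same grouping would usually be an Athena SQL query over the delivered files.

```python
from collections import defaultdict

# Made-up line items for illustration only; real CUR files have many more columns.
SAMPLE_LINE_ITEMS = [
    {"line_item_usage_type": "USE2-DataTransfer-Out-Bytes", "line_item_unblended_cost": 12.50},
    {"line_item_usage_type": "DataTransfer-Regional-Bytes", "line_item_unblended_cost": 3.25},
    {"line_item_usage_type": "USE2-DataTransfer-Out-Bytes", "line_item_unblended_cost": 7.50},
    {"line_item_usage_type": "BoxUsage:t3.micro", "line_item_unblended_cost": 9.00},
]


def data_transfer_cost_by_usage_type(line_items):
    """Sum unblended cost per usage type, keeping only data transfer usage types."""
    totals = defaultdict(float)
    for item in line_items:
        usage_type = item["line_item_usage_type"]
        if "DataTransfer" in usage_type:
            totals[usage_type] += item["line_item_unblended_cost"]
    return dict(totals)


if __name__ == "__main__":
    print(data_transfer_cost_by_usage_type(SAMPLE_LINE_ITEMS))
```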

Other services like VPC Flow Logs, AWS Budgets, and the AWS Pricing Calculator are useful but do not, by themselves, provide detailed historical breakdowns of network charges in the ways described by the incorrect statements.


Question 19

Topic: Design Cost-Optimized Architectures

A startup is designing a new web application that requires a relational database. They want to minimize licensing costs, avoid proprietary database vendors, and use an engine with a large open-source ecosystem and no per-core licensing fees. Which database engines on Amazon RDS are the MOST appropriate? (Select TWO.)

Options:

  • A. Amazon RDS for MySQL

  • B. Amazon Aurora MySQL-Compatible Edition

  • C. Amazon RDS for PostgreSQL

  • D. Amazon RDS for Oracle Database

  • E. Amazon RDS for SQL Server

Correct answers: A and C

Explanation: The scenario emphasizes three key requirements: minimizing licensing costs, avoiding proprietary database vendors, and using an engine with a large open-source ecosystem. Open-source engines such as MySQL and PostgreSQL, when run on Amazon RDS, satisfy these goals because there are no separate per-core commercial database licenses and they offer mature, widely adopted ecosystems.

Commercial engines like Oracle Database and SQL Server add significant licensing cost and are tightly controlled by proprietary vendors, making them poor fits for this cost-optimization requirement. Amazon Aurora, while MySQL-compatible and without separate license fees, is an AWS-proprietary managed database engine rather than a community-governed open-source project, so it does not fully align with the desire to avoid proprietary vendors and maximize portability.

Therefore, the most cost-optimized and portable choices that align with the stated constraints are Amazon RDS for MySQL and Amazon RDS for PostgreSQL.


Question 20

Topic: Design Resilient Architectures

Which THREE statements about AWS disaster recovery (DR) strategies and RPO/RTO trade-offs are correct? (Select THREE.)

Options:

  • A. Backup and restore is typically the lowest-cost DR strategy but also has the highest RPO and RTO compared to other common AWS DR patterns.

  • B. In a pilot light strategy, a minimal version of the critical application components runs continuously in the DR Region and is scaled up to full capacity during a disaster.

  • C. Recovery Point Objective (RPO) defines the maximum acceptable application downtime, while Recovery Time Objective (RTO) defines how much data loss in time is acceptable.

  • D. A warm standby strategy usually has a worse RTO than backup and restore because most components in the DR Region are completely turned off until a disaster occurs.

  • E. In a pilot light strategy, the DR Region must run a full-capacity duplicate of the production environment at all times to ensure instant failover.

  • F. A multi-site active-active strategy can achieve near-zero RPO and very low RTO, but it is usually the most complex and expensive DR option.

Correct answers: A, B and F

Explanation: AWS disaster recovery strategies trade off cost against recovery speed and the amount of acceptable data loss. Backup and restore is simple and inexpensive but recovers slowly with more data loss. Pilot light and warm standby improve RPO and RTO by keeping more of the environment running in the DR Region. Multi-site active-active offers the fastest recovery and smallest data loss, at the highest cost and complexity. RPO and RTO are the core metrics: RPO is the maximum acceptable data loss, measured in time, and RTO is the maximum acceptable downtime.


Question 21

Topic: Design Secure Architectures

A company runs a public web application behind an Application Load Balancer using HTTPS. Compliance requires annual TLS certificate renewal and annual rotation of a customer managed KMS key, without downtime. Which approach should the solutions architect AVOID?

Options:

  • A. About 30 days before expiration, request a new ACM certificate, update the ALB listener to use the new certificate during a low-traffic period, and enable automatic annual rotation on the KMS key while clients always reference the key alias.

  • B. Use an external public CA integrated with AWS Certificate Manager to import a certificate, configure an automated process to renew and re-import the certificate before expiration, attach it to the ALB, and enable automatic annual rotation on the customer managed KMS key.

  • C. Install the TLS certificate directly on each EC2 instance and manually replace it once per year during a maintenance window, temporarily disabling HTTPS, and turn off automatic rotation on the KMS key, planning to rotate the key only after a security incident.

  • D. Attach an ACM-issued public certificate to the ALB using DNS validation so ACM automatically renews and deploys the certificate, and enable automatic annual rotation on the customer managed KMS key while the application uses the KMS key alias.

Best answer: C

Explanation: For compliant and highly available architectures, key and certificate rotation should be regular, automated, and non-disruptive.

For TLS, using AWS Certificate Manager (ACM) with an Application Load Balancer allows certificates to be renewed and deployed automatically without touching individual instances or interrupting traffic. For KMS, enabling automatic rotation on customer managed KMS keys and using key aliases in application code allows the key to rotate on a yearly schedule without changes or downtime in the application.

Any strategy that relies on manual, infrequent rotation and intentionally accepts downtime or non-HTTPS access contradicts both security and availability requirements and should be avoided.


Question 22

Topic: Design Secure Architectures

An application runs on Amazon EC2 instances in private subnets within a VPC. It reads data from Amazon DynamoDB. Security recently removed the route to the NAT gateway and blocked all outbound internet access. The application now times out when calling DynamoDB. What should a solutions architect do to restore access while still preventing internet egress?

Options:

  • A. Create a gateway VPC endpoint for DynamoDB and update the private subnet route tables to send DynamoDB traffic to the endpoint.

  • B. Recreate the NAT gateway and restrict outbound access to only DynamoDB public IP address ranges in security groups and network ACLs.

  • C. Create an interface VPC endpoint powered by AWS PrivateLink for DynamoDB and associate it with the private subnets.

  • D. Move the EC2 instances into public subnets with public IP addresses and restrict outbound access to DynamoDB using security groups.

Best answer: A

Explanation: Symptom: The application previously reached DynamoDB over the internet via a NAT gateway. After security removed the NAT route and blocked outbound internet access, calls to DynamoDB now time out from private subnets.

Root cause: The EC2 instances are attempting to reach the public DynamoDB endpoint, which requires internet egress. With the NAT gateway removed and no outbound route, there is no path from the private subnets to DynamoDB.

Fix: For DynamoDB and Amazon S3, AWS provides gateway VPC endpoints. A gateway endpoint is a route target in your VPC route tables that directs traffic to the AWS service over the AWS network without traversing the public internet. By creating a DynamoDB gateway VPC endpoint and associating it with the route tables for the private subnets, the instances regain access to DynamoDB while still having no general internet egress. This both restores application functionality and satisfies the compliance requirement to prevent outbound internet access.

This design contrasts with public endpoints (which require internet egress, typically via a NAT gateway) and interface endpoints powered by AWS PrivateLink (used for most other AWS services and custom NLB-backed services, and billed per hour and per GB processed). For DynamoDB in this scenario, the best choice is a gateway VPC endpoint: it adds no extra charge and needs only a route table association.
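The fix can be sketched with the EC2 `CreateVpcEndpoint` API, exposed in boto3 as `create_vpc_endpoint`. The helper names, VPC ID, and route table ID below are hypothetical placeholders. The function that actually calls AWS needs boto3 and credentials, so it is defined but not invoked here.

```python
def gateway_endpoint_params(vpc_id: str, region: str, route_table_ids: list[str]) -> dict:
    """Build keyword arguments for EC2 CreateVpcEndpoint (gateway type, DynamoDB)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.dynamodb",
        # Gateway endpoints are attached to route tables, not to subnets or ENIs.
        "RouteTableIds": list(route_table_ids),
    }


def create_dynamodb_gateway_endpoint(vpc_id: str, region: str, route_table_ids: list[str]) -> dict:
    """Create the endpoint; needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the helper above works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(**gateway_endpoint_params(vpc_id, region, route_table_ids))


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(gateway_endpoint_params("vpc-0example", "us-east-1", ["rtb-0example"]))
```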


Question 23

Topic: Design Resilient Architectures

A company runs a fleet of Linux web servers across multiple Availability Zones behind an Application Load Balancer. The application stores user-uploaded images using standard POSIX file APIs and requires shared directory access from all servers, with automatic storage growth to very large scale. Which storage option is most appropriate?

Options:

  • A. Attach a single large Amazon EBS volume to one EC2 instance and export it via NFS to the other servers.

  • B. Use an Amazon EFS file system mounted on all web servers.

  • C. Store the images as BLOBs in a Multi-AZ Amazon RDS for MySQL database.

  • D. Store the images in an Amazon S3 bucket and access them directly via the application SDK.

Best answer: B

Explanation: The key discriminating factor in this scenario is the need for shared POSIX-compliant file system access across many Linux EC2 instances in multiple Availability Zones, with automatic storage scaling to very large sizes.

Amazon EFS is a fully managed, elastic, NFS-based file system that provides standard file system semantics (directories, permissions, POSIX APIs). It can be mounted concurrently by hundreds or thousands of Linux EC2 instances across multiple AZs in a Region. EFS automatically grows and shrinks as files are added or removed, so there is no need to pre-provision capacity.

In contrast, object storage such as Amazon S3 and block storage such as Amazon EBS do not natively provide shared POSIX file system semantics across instances. While they are excellent for other patterns, they fail the specific requirement here. Building your own NFS server on top of EBS is possible but introduces operational overhead, fixed capacity, and single-instance failure risk, making it less resilient and less aligned with AWS managed-service best practices.

Therefore, using an Amazon EFS file system mounted on all web servers best matches the stated workload characteristics and resilience goals.


Question 24

Topic: Design Secure Architectures

Which of the following statements about securing the AWS account root user and IAM users are INCORRECT or unsafe? (Select TWO.)

Options:

  • A. IAM users who require only programmatic access should be created without a console password to avoid unnecessary exposure of the AWS Management Console.

  • B. Keeping one active access key for the root user for emergency use is acceptable if the key is stored securely offline.

  • C. The root user should be protected with MFA and used only for tasks that explicitly require root-level permissions.

  • D. For applications running on Amazon EC2, a secure practice is to create IAM users with long-term access keys and share those keys across all instances in the Auto Scaling group.

  • E. Access keys for IAM users should be rotated regularly and deactivated or deleted when no longer needed.

Correct answers: B and D

Explanation: The question focuses on best practices for securing the AWS account root user and IAM users, especially around MFA and credential hygiene.

AWS strongly recommends minimizing use of the root user, enabling MFA on it, and deleting any root access keys. For IAM users, good credential hygiene includes using the least privilege principle, rotating access keys, and not providing unnecessary console access. For workloads running on AWS services such as EC2, roles with temporary security credentials are preferred over long-term access keys.

The unsafe statements are the ones that keep a root access key “for emergencies” and that suggest sharing long-term IAM user access keys across EC2 instances. Both patterns increase risk and violate modern AWS security best practices.


Question 25

Topic: Design Resilient Architectures

Which TWO of the following statements about AWS messaging and publish/subscribe patterns are INCORRECT? (Select TWO.)

Options:

  • A. Using an Amazon SNS topic with multiple Amazon SQS subscriptions implements a fanout pattern that decouples a single producer from multiple independent consumers.

  • B. Amazon SQS FIFO queues are designed to guarantee that messages are processed in the order they are sent (within a message group) and support exactly-once processing semantics.

  • C. Consumers of an Amazon SNS topic normally poll the SNS topic for messages, which allows each consumer to control its own polling rate and back-pressure.

  • D. An Amazon SQS standard queue provides at-least-once delivery and best-effort ordering, so consumers should be designed to be idempotent.

  • E. You can configure an Amazon SQS dead-letter queue (DLQ) so that messages that exceed a configured receive count are moved to the DLQ for offline inspection or reprocessing.

  • F. Using an Amazon SQS queue directly between a producer and a consumer tightly couples them, because both must be online at the same time for messages to flow.

Correct answers: C and F

Explanation: AWS messaging services such as Amazon SNS and Amazon SQS are commonly combined to build scalable, loosely coupled architectures. SNS offers a publish/subscribe model with fanout to multiple subscribers. SQS offers durable, decoupled queues that allow producers and consumers to operate independently and at different rates.

Standard queues provide at-least-once delivery and best-effort ordering, so applications must be resilient to duplicate and out-of-order messages. FIFO queues add ordering guarantees and exactly-once processing semantics for workloads that require strict sequence handling. Dead-letter queues are a key error-handling pattern to prevent problematic messages from blocking the main flow.

The incorrect statements in this question confuse push vs pull semantics for SNS and misunderstand how queues affect coupling between components. SNS does not rely on consumers polling SNS, and SQS queues are used precisely to remove the requirement that producers and consumers be online at the same time.
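Because standard queues deliver at least once, a consumer may see the same message twice. The following is a minimal sketch of an idempotent consumer, assuming messages shaped like SQS ReceiveMessage results (`MessageId` and `Body` keys). The class name is hypothetical, and the in-memory set stands in for what would need to be a durable store (or a business-level deduplication key) in a real system.

```python
class IdempotentConsumer:
    """Process each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen_ids = set()   # stand-in for a durable deduplication store
        self.processed = []     # stand-in for the real side effect

    def handle(self, message: dict) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        message_id = message["MessageId"]
        if message_id in self.seen_ids:
            return False
        self.processed.append(message["Body"])
        self.seen_ids.add(message_id)
        return True


if __name__ == "__main__":
    consumer = IdempotentConsumer()
    msg = {"MessageId": "m1", "Body": "order-created"}
    print(consumer.handle(msg), consumer.handle(msg))
```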


Questions 26-50

Question 26

Topic: Design Secure Architectures

An organization centralizes shared VPC subnets using AWS Resource Access Manager and restricts which AWS accounts can modify networking resources by applying tightly scoped service control policies in AWS Organizations. Which AWS Well-Architected pillar is most directly addressed by this design choice?

Options:

  • A. Security

  • B. Reliability

  • C. Cost Optimization

  • D. Performance Efficiency

Best answer: A

Explanation: The scenario describes two key design choices: using AWS Resource Access Manager (AWS RAM) to share network resources like VPC subnets across accounts, and applying service control policies (SCPs) in AWS Organizations to tightly control which accounts can modify those resources.

Both actions are fundamentally about governing access. AWS RAM allows secure, controlled sharing of resources without needing to duplicate them, reducing the attack surface and improving visibility. SCPs provide central, organization-wide guardrails that enforce least privilege by defining the maximum available permissions for accounts.

These behaviors align directly with the Security pillar of the AWS Well-Architected Framework, which emphasizes strong identity and access management, protecting resources at multiple layers, and using mechanisms like policies to enforce security boundaries across accounts.


Question 27

Topic: Design High-Performing Architectures

A company ingests time-series sensor metrics from thousands of IoT devices into an Amazon DynamoDB table for low-latency key-value lookups. At peak, devices send 900 metrics per second. Each metric is stored as a 1 KB item. Each write capacity unit (WCU) supports 1 write/second for items up to 1 KB. What is the minimum number of WCUs the company should provision to sustain peak throughput?

Options:

  • A. 9,000 WCUs

  • B. 900 WCUs

  • C. 450 WCUs

  • D. 90 WCUs

Best answer: B

Explanation: This workload is a time-series, key-value access pattern that is well suited to Amazon DynamoDB as a managed NoSQL database. The question focuses on sizing write throughput using provisioned write capacity units (WCUs).

Each WCU in DynamoDB supports 1 write/second for items up to 1 KB in size. The workload requires 900 writes/second of 1 KB items.

Variables used:

  • \(R = 900\) writes/second (peak write rate)
  • \(C_{\text{per}} = 1\) write/second supported per WCU for 1 KB items

Calculation (one step):

\[ C_{\text{required}} = \frac{R}{C_{\text{per}}} = \frac{900}{1} = 900\,\text{WCUs} \]

Therefore, provisioning 900 WCUs is the minimum configuration that can sustain 900 writes/second of 1 KB items without throttling, while aligning with a cost-conscious, high-performing NoSQL design.
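The same calculation can be written as a short function. This sketch assumes standard (non-transactional) writes in provisioned mode, where each write consumes one WCU per 1 KB of item size, rounded up; the function name is hypothetical.

```python
import math


def required_wcus(writes_per_sec: int, item_size_bytes: int) -> int:
    """Standard (non-transactional) writes: 1 WCU per 1 KB of item size, rounded up."""
    wcus_per_write = max(1, math.ceil(item_size_bytes / 1024))
    return writes_per_sec * wcus_per_write


if __name__ == "__main__":
    # The scenario from the question: 900 writes/second of 1 KB items.
    print(required_wcus(900, 1024))
```

Note the rounding: an item even one byte over 1 KB consumes 2 WCUs per write, which would double the requirement for the same write rate.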


Question 28

Topic: Design Cost-Optimized Architectures

Which AWS service provides static anycast IP addresses that route user traffic over the AWS global network to the nearest healthy application endpoint, improving performance and potentially reducing data transfer costs for internet-facing applications?

Options:

  • A. AWS Global Accelerator

  • B. AWS Direct Connect

  • C. Amazon CloudFront

  • D. Amazon Route 53

Best answer: A

Explanation: AWS Global Accelerator is a networking service that provides a pair of static anycast IP addresses and uses the AWS global network to route user traffic to the nearest healthy application endpoint across multiple AWS Regions. This improves performance and availability and can reduce data transfer costs compared with standard internet routing paths.


Question 29

Topic: Design High-Performing Architectures

Which AWS service is specifically designed to decouple application components by durably storing messages across multiple Availability Zones and requiring consumers to poll for messages when they are ready to process them?

Options:

  • A. Amazon Kinesis Data Streams

  • B. Amazon EventBridge

  • C. Amazon Simple Queue Service (Amazon SQS)

  • D. Amazon Simple Notification Service (Amazon SNS)

Best answer: C

Explanation: Amazon SQS is AWS’s primary managed message queuing service. It is explicitly designed to decouple producers and consumers by buffering messages in a durable, multi-AZ queue. Producers send messages to the queue without needing to know when or how they will be processed. Consumers then poll the queue when they are ready to work, retrieving and processing messages at their own pace.

This pull-based pattern allows you to smooth traffic spikes: when request volume increases, more messages accumulate in the queue instead of overloading downstream systems. Auto Scaling groups of workers can scale based on queue depth, and failures in consumers do not affect producers because the messages remain stored in SQS until they are processed or expire.

Other services like SNS, EventBridge, and Kinesis also decouple components, but they use different models (push notifications, event buses, streaming) rather than a classic polling message queue designed to buffer discrete work items for asynchronous processing.
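Scaling workers on queue depth, as described above, is often expressed as a "backlog per worker" calculation. The sketch below is one simple way to do that under stated assumptions: the function name and parameters are hypothetical, and the inputs (queue depth, average processing time, latency target) would come from your own metrics.

```python
import math


def desired_workers(queue_depth: int, avg_seconds_per_message: float,
                    target_latency_seconds: float, min_workers: int = 1) -> int:
    """Estimate how many workers are needed to drain the queue within the target."""
    # Messages one worker can clear within the latency target.
    acceptable_backlog_per_worker = target_latency_seconds / avg_seconds_per_message
    return max(min_workers, math.ceil(queue_depth / acceptable_backlog_per_worker))


if __name__ == "__main__":
    # Hypothetical numbers: 1,500 queued messages, 0.5 s each, 10 s latency target.
    print(desired_workers(1500, 0.5, 10))
```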


Question 30

Topic: Design Cost-Optimized Architectures

Which of the following statements about using AWS Lambda to improve compute cost efficiency is INCORRECT?

Options:

  • A. AWS Lambda pricing is based on the number of requests and the compute duration used, so there is no charge for idle server capacity.

  • B. Using event-driven Lambda functions for spiky traffic can be more cost-effective than running EC2 instances sized for peak load, because Lambda scales automatically and charges only when invoked.

  • C. Choosing an appropriate memory size for a Lambda function can lower cost because pricing scales with the configured memory and execution time.

  • D. Configuring provisioned concurrency for all Lambda functions always reduces cost compared to standard Lambda invocations because it eliminates cold starts.

Best answer: D

Explanation: AWS Lambda is a serverless compute service where you pay for the number of requests and the execution time, rather than for continuously running servers. This model can significantly reduce costs for intermittent, spiky, or unpredictable workloads, because idle capacity is not billed. Cost optimization with Lambda focuses on aligning usage to actual demand and right-sizing function resources.

Provisioned concurrency is a Lambda feature designed to improve performance by reducing cold starts, especially for latency-sensitive applications. However, it keeps a configured number of execution environments initialized and ready, and you are billed for that configured capacity for as long as it is enabled, regardless of actual invocation volume. Therefore, provisioned concurrency must be used selectively and tuned carefully when cost optimization is a priority.

The incorrect statement is the one claiming that configuring provisioned concurrency for all functions always reduces cost. Provisioned concurrency can improve responsiveness but does not inherently lower costs; in many cases it increases them relative to standard on-demand Lambda execution, especially when utilization is low.
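To make the on-demand pricing model concrete, here is a rough cost sketch based on request count and GB-seconds (configured memory multiplied by execution time). The function name is hypothetical, the prices are passed in as parameters, and the values used in the example are placeholders rather than current AWS prices. The free tier and provisioned concurrency charges are ignored.

```python
def lambda_on_demand_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                          price_per_gb_second: float,
                          price_per_million_requests: float) -> float:
    """Estimate on-demand cost from request count and GB-seconds (free tier ignored)."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return gb_seconds * price_per_gb_second + request_cost


if __name__ == "__main__":
    # Placeholder prices for illustration; check the AWS pricing page for real values.
    print(lambda_on_demand_cost(1_000_000, 1000, 1024, 0.00002, 0.20))
```

Halving the configured memory halves the compute part of the bill only if execution time stays the same. In practice less memory also means less CPU, so duration may rise, which is why right-sizing is usually done by measurement.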


Question 31

Topic: Design High-Performing Architectures

Which AWS service is specifically designed to provide a durable message queue that buffers requests between producers and consumers, enabling asynchronous processing and smoothing traffic spikes?

Options:

  • A. Amazon Kinesis Data Streams

  • B. Amazon SNS

  • C. Amazon EventBridge

  • D. Amazon SQS

Best answer: D

Explanation: Amazon SQS (Simple Queue Service) is AWS’s fully managed message queuing service. It is explicitly designed to decouple application components by letting producers send messages to a durable queue, where they are stored until consumers are ready to process them. Consumers poll the queue at their own rate, which smooths out traffic spikes and prevents sudden load bursts on back-end compute.

This pattern improves performance and elasticity: front-end systems can respond quickly by offloading work to the queue, while back-end workers scale horizontally (for example, with Auto Scaling or AWS Lambda) based on queue depth. SQS provides at-least-once delivery, message retention, and visibility timeouts, making it well suited for asynchronous workloads such as order processing, background jobs, and batch tasks.

Other messaging and event services like SNS, EventBridge, and Kinesis also decouple producers and consumers but are optimized for different patterns: pub/sub notifications, event routing, and real-time streaming analytics, rather than simple work queues for asynchronous processing between compute tiers.


Question 32

Topic: Design Secure Architectures

Which statement correctly describes how AWS Backup centralizes backup and retention configuration for services such as Amazon EC2, Amazon EBS, Amazon RDS, and Amazon DynamoDB?

Options:

  • A. You must configure backup schedules separately in each AWS service; AWS Backup only aggregates backup status and reports into a single dashboard.

  • B. AWS Backup automatically backs up every supported resource in a Region using a single fixed daily schedule and 35-day retention that cannot be changed.

  • C. You create backup plans with scheduled rules and lifecycle policies, then assign supported resources or tags so the same backup and retention policy is applied across multiple services.

  • D. AWS Backup can schedule backups for multiple services, but retention must still be configured separately in each individual service console.

Best answer: C

Explanation: AWS Backup is a centralized, policy-based service for managing backups across many AWS services, including Amazon EC2, Amazon EBS, Amazon RDS, and Amazon DynamoDB. Instead of setting up independent backup schedules and retention rules in each service, you use AWS Backup to create backup plans that define when backups run and how long they are retained.

A backup plan contains one or more backup rules. Each rule specifies a schedule (for example, daily at a certain time), the backup vault to store recovery points, and lifecycle settings such as when to transition to cold storage and when to expire backups. You then assign resources (by resource ID or by tags) from supported services to the backup plan. AWS Backup automatically applies the same schedule and lifecycle policy to all those resources, giving you consistent, centralized control.

This policy-based approach is a key data protection control: it reduces configuration drift, simplifies compliance with organizational backup standards, and makes it easier to audit and manage backups across multiple services and accounts.
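
The plan, rule, and tag-based assignment described above can be sketched as plain Python dictionaries. This is an illustrative sketch only: the field names are modeled on the AWS Backup CreateBackupPlan and CreateBackupSelection request shapes, while every plan name, vault name, role ARN, and tag value is a hypothetical placeholder. The helper checks one documented constraint: backups moved to cold storage must stay there at least 90 days, so the delete setting must be at least 90 days later than the transition setting.

```python
# Request shapes modeled on the AWS Backup CreateBackupPlan and
# CreateBackupSelection APIs. All names, ARNs, and tag values are hypothetical.
backup_plan = {
    "BackupPlanName": "org-daily-backups",
    "Rules": [
        {
            "RuleName": "daily-0500-utc",
            "TargetBackupVaultName": "central-vault",
            "ScheduleExpression": "cron(0 5 ? * * *)",  # every day at 05:00 UTC
            "Lifecycle": {
                "MoveToColdStorageAfterDays": 30,
                "DeleteAfterDays": 365,
            },
        }
    ],
}

# Assign resources by tag so EC2, EBS, RDS, and DynamoDB resources that carry
# the tag all pick up the same schedule and retention.
backup_selection = {
    "SelectionName": "tagged-production-resources",
    "IamRoleArn": "arn:aws:iam::111122223333:role/ExampleBackupRole",
    "ListOfTags": [
        {
            "ConditionType": "STRINGEQUALS",
            "ConditionKey": "backup",
            "ConditionValue": "daily",
        }
    ],
}


def cold_storage_rule_ok(lifecycle):
    """Return True if the delete setting leaves at least 90 days in cold storage."""
    move = lifecycle.get("MoveToColdStorageAfterDays")
    delete = lifecycle.get("DeleteAfterDays")
    if move is None or delete is None:
        return True  # no cold-storage transition or no expiry: nothing to check
    return delete >= move + 90


print(cold_storage_rule_ok(backup_plan["Rules"][0]["Lifecycle"]))  # True
```

With boto3, dictionaries shaped like these would typically be passed to the `backup` client's `create_backup_plan` and `create_backup_selection` calls. Cold storage transitions apply only to resource types that support them, so treat the lifecycle values as an example rather than a recommendation.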


Question 33

Topic: Design Resilient Architectures

Which of the following statements about implementing immutable infrastructure on AWS are INCORRECT? (Select THREE.)

Options:

  • A. Blue/green and canary deployments are incompatible with immutable infrastructure because they require modifying the same servers repeatedly.

  • B. Rollback in an immutable pattern typically involves redeploying a previous known-good image or version, rather than reconfiguring existing instances in place.

  • C. In immutable architectures, each EC2 instance should be treated as a unique pet and manually repaired when it fails to preserve its configuration state.

  • D. Immutable infrastructure requires administrators to SSH into servers and patch them in place to apply security updates quickly.

  • E. Using versioned Amazon Machine Images (AMIs) or container images for each release is a common way to implement immutable infrastructure.

Correct answers: A, C and D

Explanation: Immutable infrastructure means you never change servers in place; instead, you build new ones from a known, versioned image and replace the old ones. This approach reduces configuration drift, speeds recovery, and makes rollbacks predictable.

On AWS, this often involves baking Amazon Machine Images (AMIs) or building container images for each application version. Deployments launch new instances or tasks from these images and then cut over traffic using load balancers or DNS. If something goes wrong, you roll back by redeploying a previous image instead of trying to repair or reconfigure existing resources.

Patterns such as blue/green and canary deployments fit naturally with immutability, because they route traffic between separate, freshly built environments rather than modifying the same set of servers repeatedly. Likewise, instances are treated as disposable: if one is unhealthy, you terminate it and let Auto Scaling or your orchestrator replace it from the latest image, instead of logging in to fix it manually.


Question 34

Topic: Design High-Performing Architectures

A company must migrate 50 TB of archived log files from its on-premises data center to Amazon S3 within 2 weeks. The internet link is 200 Mbps shared, and the company cannot use more than 50% of this bandwidth. All data must be encrypted during transfer and at rest. The solution should minimize both operational effort and overall cost while meeting the deadline. Which solution is the MOST appropriate?

Options:

  • A. Provision a new 1 Gbps AWS Direct Connect connection and use AWS DataSync over Direct Connect to transfer the 50 TB.

  • B. Enable S3 Transfer Acceleration on the target bucket and upload the 50 TB directly from the data center over HTTPS.

  • C. Use AWS DataSync over the existing internet connection, throttling DataSync to 100 Mbps to stay within the bandwidth limit.

  • D. Order AWS Snowball Edge devices, copy the archives to the devices on-premises, and have AWS import the data into Amazon S3.

Best answer: D

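Explanation: At the permitted 100 Mbps (50% of the 200 Mbps link), moving 50 TB takes roughly 46 days even before protocol overhead, so any option that relies on the existing internet link, including DataSync and S3 Transfer Acceleration, misses the 2-week deadline. A new Direct Connect connection typically takes weeks to provision and adds ongoing cost for a one-time transfer. Snowball Edge is designed for large, one-time bulk data migrations under bandwidth and time constraints: data is encrypted on the device, shipped offline without saturating the network, and imported into Amazon S3 by AWS, which keeps both cost and operational effort low.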
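
The bandwidth constraint in this scenario can be checked with a short calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes) and ignores protocol overhead, so the real transfer would take somewhat longer; the function name is illustrative.

```python
def transfer_days(data_tb, link_mbps, usable_fraction=1.0):
    """Days needed to move data_tb terabytes over a link, ignoring protocol overhead."""
    bits = data_tb * 1e12 * 8                         # decimal terabytes to bits
    usable_bps = link_mbps * 1e6 * usable_fraction    # share of the link we may use
    return bits / usable_bps / 86_400                 # seconds to days


days = transfer_days(data_tb=50, link_mbps=200, usable_fraction=0.5)
print(f"{days:.1f} days")  # 46.3 days, well past the 14-day deadline
```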


Question 35

Topic: Design Resilient Architectures

A company runs a web application in two AWS Regions using Route 53 failover routing. The primary Region has an Application Load Balancer. Route 53 uses a health check directly against the ALB. The health check is configured with a 10-second interval and a failure threshold of 3. The DNS record has a TTL of 60 seconds. After the ALB in the primary Region becomes unhealthy, Route 53 should fail over traffic automatically to the secondary Region.

Assuming the worst case and ignoring any DNS resolver caching beyond the TTL, what is the maximum expected failover time in seconds? (Round to the nearest whole second.)

Options:

  • A. 180 seconds

  • B. 90 seconds

  • C. 60 seconds

  • D. 30 seconds

Best answer: B

Explanation: This scenario tests how Route 53 failover timing works when using health checks and DNS TTLs.

Route 53 failover routing depends on two main time components in the worst case:

  • Failure detection time: How long it takes the health check to declare the primary endpoint unhealthy.
  • DNS propagation time: How long existing DNS responses remain cached before clients start using the updated response pointing to the secondary Region.

Given:

  • Health check interval \(I = 10\) seconds
  • Failure threshold \(N = 3\) consecutive failed checks
  • Record TTL \(T = 60\) seconds

Worst-case failure detection occurs when the endpoint fails just after a successful health check. Route 53 then needs \(N\) failed checks spaced \(I\) seconds apart before marking it unhealthy.

The failure detection time is:

\[ \text{Detection time} = I \times N = 10 \times 3 = 30\ \text{seconds} \]

Once Route 53 updates the status and starts responding with the secondary endpoint, clients that previously resolved the DNS name can still use the old cached answer for up to the TTL. In the worst case, they just cached the old answer, so they will wait the full TTL.

The DNS propagation time is \(T = 60\) seconds.

Therefore, the maximum expected failover time is:

\[ \text{Total failover time} = 30 + 60 = 90\ \text{seconds} \]

This is why both the health check configuration (interval and failure threshold) and the DNS TTL must be factored into the RTO when designing automated failover with Route 53 failover routing between Regions.
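
The same arithmetic can be expressed as a small helper for comparing settings. This is a sketch of the simplified model used in this explanation (detection time plus one full TTL); the function name is illustrative.

```python
def worst_case_failover_seconds(interval_s, failure_threshold, ttl_s):
    """Simplified worst case: N failed checks spaced I seconds apart, plus one full TTL."""
    detection = interval_s * failure_threshold
    return detection + ttl_s


print(worst_case_failover_seconds(10, 3, 60))  # 90
print(worst_case_failover_seconds(30, 3, 60))  # 150, with the standard 30-second interval
```

Shortening the TTL or the health check interval lowers the result, at the cost of more DNS queries or a higher health check charge.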


Question 36

Topic: Design Secure Architectures

A company has 25 AWS accounts in AWS Organizations. The security team must enforce centrally managed, deep packet inspection for all internet egress and VPC-to-VPC traffic, with organization-wide firewall rules, auditability, and minimal per-account administration. Which approaches SHOULD BE AVOIDED? (Select THREE.)

Options:

  • A. Create an AWS Firewall Manager policy that automatically deploys AWS Network Firewall endpoints and a shared rule group into member VPCs in the organization.

  • B. Configure security groups and network ACLs separately in each account, and use AWS Config to check for non-compliant rules.

  • C. In each account, deploy a third-party firewall appliance in a dedicated VPC and route local traffic through it, managing rules by signing in to each appliance separately.

  • D. Use AWS Network Firewall in each application VPC without a shared transit layer, and let each application team manage its own firewall rule groups.

  • E. Deploy a centralized inspection VPC with AWS Network Firewall, connect all VPCs via AWS Transit Gateway, and manage firewall policies using AWS Firewall Manager.

Correct answers: B, C and D

Explanation: The scenario explicitly calls for centrally managed, deep packet inspection across all accounts and all internet egress and VPC-to-VPC traffic, with organization-wide rules, auditability, and minimal per-account administration. At the SAA-C03 level, this strongly indicates using AWS Network Firewall for stateful, deep packet inspection and AWS Firewall Manager in combination with AWS Organizations for central policy deployment and governance.

Designs that rely only on security groups and NACLs cannot perform deep packet inspection; they only filter at the network and transport layers on basic attributes such as protocol and port. Designs that manage firewalls independently in each account or VPC violate the requirement for centralized, organization-wide policy control and create operational overhead. The best-fit architectures either centralize traffic through a shared inspection VPC with AWS Network Firewall and Transit Gateway, or use AWS Firewall Manager to automatically deploy and manage Network Firewall resources and policies across accounts.


Question 37

Topic: Design Cost-Optimized Architectures

A company runs a stateless REST API on 10 t3.medium instances in an Auto Scaling group behind an ALB. Average CPU utilization is 10%, traffic is highly spiky, and the app cannot tolerate instance interruptions. Operations overhead must be minimized. Which change will MOST reduce idle capacity cost while meeting these requirements?

Options:

  • A. Replace the current On-Demand instances with Spot Instances in the Auto Scaling group to significantly reduce hourly instance cost.

  • B. Refactor the API into AWS Lambda functions integrated with Amazon API Gateway and decommission the Auto Scaling group.

  • C. Keep the current Auto Scaling group but purchase 3-year Compute Savings Plans for the 10 t3.medium instances and disable scale-in to simplify capacity management.

  • D. Containerize the API and run it on Amazon ECS using AWS Fargate behind the existing ALB, with a fixed number of tasks sized for peak traffic.

Best answer: B

Explanation: The baseline design runs 10 EC2 instances at only about 10% CPU utilization, which means roughly 90% of the provisioned capacity is idle most of the time. The workload is stateless and highly spiky, so it is a strong candidate for an event-driven, pay-per-use compute model.

Moving the API to AWS Lambda behind Amazon API Gateway removes the need to run instances 24/7. Lambda charges for requests and execution duration rather than for provisioned capacity. When traffic is low, there is effectively no idle capacity cost; when traffic spikes, Lambda scales automatically without the need to overprovision. This also minimizes operations overhead because AWS manages the compute fleet and scaling behavior.

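The other options either keep always-on capacity or violate a key requirement. Spot Instances can be interrupted, which the application cannot tolerate. Savings Plans with scale-in disabled lower the hourly rate but keep all 10 instances running at roughly 10% utilization. Fargate with a fixed number of tasks sized for peak traffic still pays for idle capacity between spikes. None of these delivers the same combination of idle cost removal, elasticity, and operational simplicity as the serverless, per-request model.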


Question 38

Topic: Design Resilient Architectures

A retail company has an order microservice that must publish events for several independent subscriber services (email notifications, analytics, inventory). Processing must be asynchronous and must not delay order placement. The solution should minimize coupling and provide automatic retries for failed deliveries. Which solutions meet these requirements? (Select TWO.)

Options:

  • A. Publish order events to an Amazon SNS topic and subscribe a separate Amazon SQS standard queue for each subscriber service.

  • B. Write order events into an Amazon Kinesis Data Streams stream and have all subscriber services share one consumer application name to read from the stream.

  • C. Send order events to an Amazon EventBridge event bus and configure rules to route events to each target service (for example, AWS Lambda or Step Functions) with DLQs configured where needed.

  • D. Have the order microservice synchronously invoke each subscriber’s REST API through an Application Load Balancer and wait for responses.

  • E. Write order events to a single Amazon SQS FIFO queue that all subscriber services read from in parallel.

Correct answers: A and C

Explanation: The company needs an event-driven, asynchronous pattern where the order microservice publishes events and multiple independent subscribers consume them without impacting order placement. Pub/sub messaging or event buses are ideal because they decouple producers and consumers, allow independent scaling, and support retry behavior.

Using an Amazon SNS topic with SQS queues provides classic fanout: the order service publishes a single event, SNS delivers it to all subscribed SQS queues, and each subscriber processes messages from its own queue at its own pace. This is a well-known AWS pattern for loosely coupled, resilient microservices.

Amazon EventBridge also supports event-driven architectures. The order service sends events to an event bus, and rules route those events to various targets (such as Lambda functions, Step Functions, or other buses). EventBridge handles retries and optionally uses dead-letter queues, while decoupling event producers from consumers via schemas and event patterns.

In contrast, synchronous REST invocations create direct, blocking dependencies between services, a single SQS queue shared by all subscribers fails the fanout requirement, and the proposed Kinesis consumer configuration causes competing rather than broadcast consumption and adds unnecessary complexity.

Summary of options:

  • ✔ Publish to an SNS topic with an SQS queue per subscriber.
  • ✔ Send events to an EventBridge event bus with rules and appropriate targets/DLQs.
  • ✖ Synchronously call each subscriber via REST over ALB.
  • ✖ Use a single shared SQS FIFO queue for all subscribers.
  • ✖ Use Kinesis with a shared consumer app name for all subscribers.
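
The difference between fanout and competing consumers can be illustrated with a small in-memory simulation. This is a toy sketch, not AWS code: the subscriber names come from the scenario, the queues are plain Python deques, and the round-robin order in the shared-queue case is a simplifying assumption.

```python
from collections import deque

subscribers = ["email", "analytics", "inventory"]


def fanout(events):
    """SNS-style fanout: every subscriber queue receives its own copy of each event."""
    queues = {name: deque() for name in subscribers}
    for event in events:
        for queue in queues.values():
            queue.append(event)
    return {name: list(queue) for name, queue in queues.items()}


def shared_queue(events):
    """Single shared queue: consumers compete, so each event reaches only one of them."""
    queue = deque(events)
    received = {name: [] for name in subscribers}
    turn = 0
    while queue:
        consumer = subscribers[turn % len(subscribers)]  # simplified round-robin polling
        received[consumer].append(queue.popleft())
        turn += 1
    return received


events = ["order-1", "order-2", "order-3"]
print(fanout(events))        # each subscriber sees all three orders
print(shared_queue(events))  # each subscriber sees only one order
```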

Question 39

Topic: Design Cost-Optimized Architectures

A company needs to perform detailed, custom analytics on its Amazon S3 storage costs, down to individual resource usage and hourly granularity, using Amazon Athena. Which AWS cost management tool should the company use to obtain this level of data?

Options:

  • A. AWS Cost and Usage Report (CUR)

  • B. AWS Trusted Advisor

  • C. AWS Budgets

  • D. AWS Cost Explorer

Best answer: A

Explanation: The AWS Cost and Usage Report (CUR) is the authoritative source for the most granular cost and usage information in AWS. It delivers comprehensive, line-item records of AWS usage and associated charges, including hourly and resource-level detail. CUR files are stored in Amazon S3 and are specifically designed to be consumed by downstream tools such as Amazon Athena, Amazon Redshift, or external BI systems.

In contrast, AWS Cost Explorer and AWS Budgets work at a higher level of aggregation. Cost Explorer provides dashboards, visualizations, and some forecasting, while AWS Budgets focuses on thresholds and alerts. AWS Trusted Advisor offers best-practice checks, including cost optimization recommendations, rather than billing data. None of these provides the same raw, queryable, line-item dataset that CUR offers.

For a company that wants to run custom SQL queries in Athena to analyze S3 storage costs at fine granularity, the correct approach is to enable the AWS Cost and Usage Report and configure it to deliver reports to an S3 bucket, then query that data using Athena.


Question 40

Topic: Design High-Performing Architectures

A company runs a synchronous API using an AWS Lambda function with 512MB memory. Average duration is 900ms and p95 latency is 980ms, just under the SLA of less than 1,000ms. Profiling shows the function is CPU-bound with no cold-start issues. Load testing with 1,024MB memory reduces duration to 400ms and lowers cost per request. Without changing code, which modification best improves cost-performance while meeting the latency SLA?

Options:

  • A. Keep memory at 512MB and enable provisioned concurrency of 200 to eliminate cold starts and improve latency.

  • B. Reduce the Lambda function’s memory size to 256MB to cut GB-second charges, accepting a longer execution time.

  • C. Increase the Lambda function’s memory size to 1,024MB in production to match the load-test configuration.

  • D. Split the logic into two 256MB Lambda functions orchestrated by AWS Step Functions to reduce per-invocation cost.

Best answer: C

Explanation: AWS Lambda pricing for execution is proportional to memory size and execution duration in GB-seconds. Increasing memory also scales up CPU and network resources, which can reduce execution time. For CPU-bound workloads, a higher memory setting can reduce duration by more than the proportional increase in memory, lowering both latency and cost per request.

In this scenario, the 512MB configuration has an average duration of 900ms and p95 latency of 980ms, just under the 1,000ms SLA. Profiling shows the function is CPU-bound and has no cold-start issues. Load testing demonstrates that at 1,024MB memory, execution time drops to 400ms and cost per request decreases. This means the higher memory tier provides more CPU, significantly reducing runtime and GB-seconds.

Because the constraints are to keep the code and architecture the same while improving cost-performance and respecting the latency SLA, the best optimization is to adopt the tested 1,024MB configuration in production.
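
The cost claim can be verified without knowing any prices, because the duration charge scales with GB-seconds. The sketch below uses only the figures from the scenario; the function name is illustrative, and the per-request charge is left out because it is the same for both configurations.

```python
def gb_seconds(memory_mb, duration_ms):
    """Billed compute for one invocation, in GB-seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


before = gb_seconds(512, 900)    # 0.45 GB-seconds
after = gb_seconds(1024, 400)    # 0.40 GB-seconds
saving = 1 - after / before

print(f"before={before:.2f} after={after:.2f} saving={saving:.1%}")
# before=0.45 after=0.40 saving=11.1%
```

Doubling the memory doubles the rate per millisecond, so the change pays off only if the duration drops below half of the original 900ms, that is, below 450ms. The measured 400ms clears that bar.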


Question 41

Topic: Design High-Performing Architectures

A ride-sharing company is designing databases for trip booking/payments, high-volume driver location updates, and real-time rider/driver matching. The solution must use managed services, ensure strong durability for bookings, and optimize performance for each pattern. Which design choices SHOULD BE AVOIDED? (Select THREE.)

Options:

  • A. Store all driver location updates in a single Amazon RDS MySQL instance, appending rows to one ever-growing table without time-based partitioning or tiering.

  • B. Use a single Amazon DynamoDB table for bookings, locations, and in-memory matching data, relying on eventually consistent reads to maximize throughput for all access patterns.

  • C. Store driver location updates in Amazon Timestream, partitioned by driver ID and time, and query recent trips using time-range filters.

  • D. Use Amazon Aurora MySQL in a multi-AZ configuration for trip booking and payment data, leveraging ACID transactions.

  • E. Maintain real-time rider/driver matching data in Amazon ElastiCache for Redis with keys expiring after a few minutes and no reliance on it as the system of record.

  • F. Persist trip booking and payment data only in Amazon ElastiCache for Redis, relying on occasional snapshots as the primary durability mechanism.

Correct answers: A, B and F

Explanation: The scenario calls for matching different workload patterns to appropriate managed database services while preserving durability for bookings and high performance for time-series and cache-like access patterns. A well-architected design separates transactional, time-series, and ephemeral cache workloads and uses services that natively support those patterns.

Aurora MySQL in multi-AZ is a strong fit for trip booking and payment data because it provides ACID transactions, relational modeling, and high availability. Amazon Timestream is built specifically for time-series workloads like frequent driver location updates, where data is written with timestamps and queried over time ranges. Amazon ElastiCache for Redis is ideal for real-time matching data where ultra-low latency is needed and occasional data loss is acceptable since another persistent store exists.

The designs that should be avoided either misuse an in-memory cache as a system of record for critical data, overload a single RDS instance with high-volume time-series writes in an unoptimized schema, or over-consolidate disparate workloads into a single DynamoDB table with eventual consistency, undermining transactional integrity and maintainability.


Question 42

Topic: Design Secure Architectures

Which of the following statements about using IAM Identity Center and SAML federation with external identity providers to access multiple AWS accounts are INCORRECT? (Select THREE.)

Options:

  • A. When configuring SAML federation directly with IAM (without IAM Identity Center), a SAML assertion must always map to an IAM user; mapping a federated principal directly to IAM roles is not supported.

  • B. When using IAM Identity Center, you must create a separate IAM user in each AWS account for every federated user so they can sign in to the console and CLI.

  • C. Federated users can use the AWS CLI by configuring AWS access profiles for IAM Identity Center, which obtain short-lived role-based credentials based on the user’s assigned permission sets instead of static access keys.

  • D. IAM Identity Center uses permission sets to provision IAM roles in target AWS accounts, and federated users obtain temporary credentials by assuming those roles after authenticating with the external IdP.

  • E. Using SAML federation with IAM Identity Center lets you centralize MFA enforcement at the external identity provider and avoid distributing long-lived access keys to users.

  • F. IAM Identity Center issues long-lived IAM access keys for federated users so they do not need to assume roles repeatedly when accessing AWS resources.

Correct answers: A, B and F

Explanation: Federated access to AWS is designed to avoid managing long-lived IAM users and static access keys in each account. Instead, users authenticate with an external identity provider (IdP), such as an Active Directory–backed IdP or a SAML provider, and then assume IAM roles that grant them temporary, scoped permissions.

IAM Identity Center simplifies this model across multiple AWS accounts. Administrators define permission sets, which IAM Identity Center uses to provision IAM roles in target accounts. When a user signs in via the IdP and selects an account and permission set, IAM Identity Center uses role assumption behind the scenes to provide short-lived credentials, both for console and CLI access.

Direct SAML federation to IAM (without IAM Identity Center) follows a similar principle: the SAML assertion identifies which IAM roles the user can assume. The user again receives temporary security credentials via role assumption, not a persistent IAM user and not long-lived access keys.

Any statement that requires creating per-account IAM users for federated identities or that claims federation uses long-lived access keys contradicts the core design and best practices of federated, role-based access.


Question 43

Topic: Design High-Performing Architectures

An architect is planning shard capacity for a new Amazon Kinesis Data Streams setup that ingests application logs. To size the stream correctly, the architect needs to know the documented per-shard write limit. What is the maximum incoming data rate supported by a single shard before additional shards are required?

Options:

  • A. Up to 1MB/second or 1,000 records/second for write operations

  • B. Up to 10MB/second for writes and 2,000 records/second for reads

  • C. Up to 5MB/second or 5,000 records/second for write operations

  • D. Up to 2MB/second for writes and 2MB/second for reads combined

Best answer: A

Explanation: Amazon Kinesis Data Streams uses shards as the base throughput unit. Each shard has a well-defined, documented capacity for reads and writes. For ingestion (writes), a single shard supports up to 1MB/second or 1,000 records/second of incoming data. For consumption (reads), each shard supports up to 2MB/second.

When designing a high-performing ingestion architecture, you estimate the total incoming throughput (in MB/second and records/second), then divide by the per-shard limits to determine how many shards are needed. Underestimating the per-shard capacity leads to unnecessary cost, while overestimating it leads to throttling and failed PutRecord/PutRecords calls.

The key fact here is the documented per-shard write limit: 1MB/second or 1,000 records/second. This is the authoritative value used in AWS documentation and capacity planning examples.
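
The sizing step described above can be sketched as follows. The per-shard limits are the documented write limits quoted in this explanation; the workload figures (4.5 MB/second and 12,000 records/second) are hypothetical.

```python
import math

WRITE_MB_PER_SHARD = 1.0          # documented per-shard write limit, MB/second
WRITE_RECORDS_PER_SHARD = 1000    # documented per-shard write limit, records/second


def shards_needed(ingest_mb_per_s, records_per_s):
    """Shard count that satisfies both the byte limit and the record limit."""
    by_bytes = math.ceil(ingest_mb_per_s / WRITE_MB_PER_SHARD)
    by_records = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    return max(by_bytes, by_records, 1)


# Hypothetical workload: many small log records, so the record limit dominates.
print(shards_needed(4.5, 12_000))  # 12
```

The stream must satisfy whichever limit is hit first, which is why the function takes the larger of the two counts.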


Question 44

Topic: Design Resilient Architectures

Which of the following statements about using infrastructure as code (IaC), such as AWS CloudFormation or the AWS CDK, to support multi-AZ and multi-Region failover are NOT true? (Select THREE.)

Options:

  • A. Declaring networking, security, and data-tier resources in IaC guarantees that configuration drift across Availability Zones and Regions cannot occur, even if operators make manual changes.

  • B. A single AWS CloudFormation stack can directly create and manage resources in multiple AWS Regions, which simplifies multi-Region disaster recovery deployments.

  • C. Once infrastructure is fully defined as IaC, you no longer need to control or restrict manual changes in production, because any console changes will be overwritten and synchronized automatically by the IaC tool.

  • D. Storing IaC templates in a source control system enables you to quickly recreate infrastructure in a new Region from a known-good, tested revision after a Regional failure.

  • E. Using AWS CloudFormation StackSets or AWS CDK with a CI/CD pipeline allows you to roll out the same, versioned infrastructure definition to multiple Regions in a controlled, consistent way for disaster recovery.

Correct answers: A, B and C

Explanation: Infrastructure as code (IaC) such as AWS CloudFormation and the AWS CDK is fundamental for building resilient, multi-AZ and multi-Region architectures because it gives you repeatable, versioned definitions of your infrastructure. This supports automated failover, rapid recreation of environments, and consistent configuration across Regions.

However, it is important to understand the scope and limitations of these tools. CloudFormation stacks are Region-scoped, so a single stack cannot span multiple Regions. Multi-Region deployments are achieved by deploying the same template or synthesized stack into each Region (often with StackSets or CI/CD pipelines). Likewise, IaC greatly reduces configuration drift but cannot prevent it when manual changes are made; governance and change control are still required. CloudFormation also does not automatically overwrite or reconcile changes made in the console; drift detection only reports differences when it is run.

Correct statements highlight using StackSets or pipelines to push the same definition to multiple Regions and using source control to store known-good templates for rapid recreation during failover. The incorrect statements overstate IaC’s capabilities by claiming cross-Region stacks, automatic synchronization of manual changes, or a complete guarantee against drift, which are not accurate behaviors.


Question 45

Topic: Design High-Performing Architectures

An application runs in a single VPC across three private subnets in different Availability Zones. During a load test, EC2 Auto Scaling fails to launch new instances. CloudWatch shows EC2CapacityError: Insufficient Free Addresses In Subnet. The company expects traffic to triple next year. What is the MOST appropriate fix?

Options:

  • A. Increase the desired and maximum capacity limits of the Auto Scaling group so it can launch more instances during peak load.

  • B. Replace the current instances with larger instance types so that fewer instances are needed to handle the same load.

  • C. Add a larger secondary IPv4 CIDR block to the VPC, create new private subnets in each Availability Zone, and update the Auto Scaling group to use those subnets.

  • D. Move some instances to the public subnets so that private subnet IP addresses are freed for Auto Scaling.

Best answer: C

Explanation: The symptom EC2CapacityError: Insufficient Free Addresses In Subnet clearly indicates that Auto Scaling is unable to launch additional instances because the subnets have run out of available private IPv4 addresses.

Symptom → Root cause: The private subnets were created with CIDR ranges that are too small for the number of instances required during scaling. As the application scales out, the available IP pool in those subnets is exhausted, blocking new instance launches. This is a network design issue, not an Auto Scaling configuration problem.

Symptom → Root cause → Fix: You cannot resize an existing VPC primary CIDR block or shrink/grow an existing subnet. However, you can add one or more secondary IPv4 CIDR blocks to a VPC and then create new, larger private subnets from that additional space. Updating the Auto Scaling group to use these new subnets gives it a larger IP pool for current and future scaling while preserving the existing architecture (same VPC, same AZs, private tier design).

This approach both resolves the immediate IP exhaustion and plans for future demand by expanding the address space in a controlled, scalable way.
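
Subnet capacity is easy to check with Python's standard `ipaddress` module. AWS reserves five addresses in every subnet (the first four and the last one). The CIDR ranges below are hypothetical, since the scenario does not state them.

```python
import ipaddress
from itertools import islice

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one in every subnet


def usable_ips(cidr):
    """Addresses left for instances and other ENIs after AWS reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


# Hypothetical existing private subnets, one per Availability Zone.
existing = ["10.0.1.0/26", "10.0.2.0/26", "10.0.3.0/26"]
print(sum(usable_ips(c) for c in existing))  # 177

# Hypothetical secondary CIDR block, carved into one /22 subnet per AZ.
secondary = ipaddress.ip_network("10.1.0.0/16")
new_subnets = [str(s) for s in islice(secondary.subnets(new_prefix=22), 3)]
print(new_subnets)  # ['10.1.0.0/22', '10.1.4.0/22', '10.1.8.0/22']
print(sum(usable_ips(c) for c in new_subnets))  # 3057
```

Sizing the new subnets against the expected tripling of traffic, rather than current peak, avoids repeating the exercise next year.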


Question 46

Topic: Design Secure Architectures

Which statement BEST defines envelope encryption as used with AWS Key Management Service (AWS KMS)?

Options:

  • A. A technique where a data key encrypts the data, and that data key is itself encrypted (“wrapped”) under a KMS key.

  • B. Encrypting every object or record directly with a single customer managed KMS key, without generating separate data keys.

  • C. Relying on each AWS service to automatically create and manage its own encryption keys with no customer-managed KMS keys.

  • D. Encrypting data entirely on the client using locally stored keys that never leave the application environment.

Best answer: A

Explanation: Envelope encryption is a key management pattern widely used with AWS KMS. In this model, an application calls AWS KMS to generate a data key. The plaintext data key is used locally to encrypt large amounts of data, such as S3 objects or database fields. The data key is then encrypted (wrapped) with a KMS key, and only the encrypted version of the data key is stored alongside the ciphertext data. Later, the encrypted data key can be sent back to AWS KMS to be decrypted, allowing the application to recover the plaintext data key and decrypt the data.

This approach minimizes the amount of data that AWS KMS must directly encrypt or decrypt, respects KMS size limits, and simplifies key rotation because you can re-encrypt the data keys under a new KMS key without re-encrypting the bulk data itself. It also clearly separates responsibilities: AWS KMS protects the KMS keys and performs small key operations, while the application or service performs data encryption using the generated data keys.
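
The flow above can be traced with a toy model that uses only the Python standard library. Everything here is a stand-in: `ToyKms` is a hypothetical class that imitates the shape of a generate-data-key and decrypt exchange, and `toy_cipher` is a simple XOR keystream used in place of AES. It is not secure and is not how AWS KMS is implemented; it only shows which key protects what.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy stand-in for AES; NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))  # XOR is its own inverse


class ToyKms:
    """Hypothetical stand-in for AWS KMS: the wrapping key never leaves this object."""

    def __init__(self):
        self._kms_key = secrets.token_bytes(32)

    def generate_data_key(self):
        """Return (plaintext data key, data key wrapped under the KMS key)."""
        plaintext_key = secrets.token_bytes(32)
        return plaintext_key, toy_cipher(self._kms_key, plaintext_key)

    def decrypt(self, wrapped_key):
        """Unwrap a data key that was wrapped under the KMS key."""
        return toy_cipher(self._kms_key, wrapped_key)


kms = ToyKms()
data_key, wrapped_key = kms.generate_data_key()
ciphertext = toy_cipher(data_key, b"order #1001: 2 widgets")  # bulk data encrypted locally
del data_key  # discard the plaintext key; store only wrapped_key next to the ciphertext

recovered_key = kms.decrypt(wrapped_key)        # small key operation on the KMS side
plaintext = toy_cipher(recovered_key, ciphertext)
print(plaintext)  # b'order #1001: 2 widgets'
```

Only the 32-byte data key ever passes through the KMS stand-in; the bulk data is encrypted and decrypted locally, which is the point of the pattern.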


Question 47

Topic: Design Secure Architectures

Which TWO statements about AWS encryption and key management are correct? (Select TWO.)

Options:

  • A. Using service-managed encryption keys in Amazon S3 allows you to define and manage the key policy directly in AWS KMS.

  • B. Enabling server-side encryption with Amazon S3 managed keys (SSE-S3) on an S3 bucket gives you the same control over key rotation and key disabling as customer managed KMS keys.

  • C. Customer managed KMS keys are stored inside your VPC and require a dedicated VPC endpoint for each key to be used.

  • D. With client-side encryption, the application encrypts data before sending it to AWS, so AWS stores only ciphertext and cannot decrypt it without client-held keys.

  • E. In envelope encryption, a data key encrypts the data, and a KMS key is used only to encrypt and decrypt that data key.

Correct answers: D and E

Explanation: The question focuses on where encryption happens and who controls the keys in common AWS patterns: envelope encryption, client-side encryption, and the difference between AWS managed/service-managed keys and customer managed KMS keys.

In AWS, envelope encryption is widely used. The application or service asks AWS KMS for a data key. That data key (also called a DEK) encrypts the actual data. The data key itself is then encrypted under a KMS key. Later, the encrypted data key is sent back to KMS for decryption, and the decrypted data key is used to decrypt the data. The KMS key never leaves KMS; it only encrypts and decrypts data keys.

With client-side encryption, encryption and decryption happen entirely on the client, before data reaches AWS services. AWS sees and stores only ciphertext. Unless the client sends keys or decryption capability to AWS, AWS cannot decrypt the data. This maximizes customer control over keys and plaintext exposure, but also shifts more responsibility to the client.

By contrast, service-managed keys (such as SSE-S3 keys or AWS owned keys) and AWS managed KMS keys reduce operational overhead but provide less control. With SSE-S3, you cannot see or configure the keys in your account; AWS manages policies and rotation for you. With customer managed KMS keys, you control key policies, rotation configuration, and enable/disable state, and you can audit key use.

Summary of the options:

  • ✔ In envelope encryption, a data key encrypts data, and a KMS key protects that data key.
  • ✔ With client-side encryption, the application encrypts data before sending it to AWS, so AWS stores only ciphertext.
  • ✖ Service-managed S3 keys do not expose key policies in your AWS account/KMS.
  • ✖ Customer managed KMS keys are not stored in your VPC and do not require per-key VPC endpoints.
  • ✖ SSE-S3 does not give the same level of control as customer managed KMS keys over rotation and key lifecycle.

Question 48

Topic: Design Secure Architectures

Which of the following statements about access control in a multi-account AWS Organizations environment is INCORRECT?

Options:

  • A. For cross-account access, a common approach is to create an IAM role in the target account and configure the role’s trust policy to allow principals from the source account to assume it.

  • B. If an SCP explicitly denies an action in an account, you can still allow that action for a specific IAM role in that account by attaching an identity-based policy that grants the action.

  • C. An IAM permission boundary limits the maximum permissions that an IAM role or user can receive and does not, by itself, grant any permissions.

  • D. A service control policy (SCP) does not grant permissions; it defines the maximum set of permissions that principals in member accounts can have.

Best answer: B

Explanation: In a multi-account AWS Organizations environment, access control decisions result from the combination of several layers: service control policies (SCPs) at the organization/OU/account level, identity-based and resource-based IAM policies within each account, and optional permission boundaries on individual principals. Understanding how these layers interact is critical for enforcing least privilege and guardrails across accounts.

SCPs apply to all principals (including the root user) in the member accounts that are affected by the SCP. They do not grant permissions; instead, they define the outer boundary of what is even allowed to be granted by IAM policies. Inside that boundary, IAM identity-based and resource-based policies determine what a principal can actually do.

IAM permission boundaries work at the principal (user/role) level. A permission boundary is an additional constraint that limits which permissions an identity-based policy can grant to that principal. Just like SCPs, permission boundaries do not grant permissions; they only filter what can be granted.

For cross-account access, the standard AWS pattern is to create a role in the target account and configure its trust policy to allow principals (users or roles) from the source account to assume it. The role’s attached permission policies then control what actions are allowed in the target account.

An important rule is that explicit denies from SCPs cannot be bypassed. If an SCP denies an action, no identity-based or resource-based policy in that account can re-allow it. This ensures that organization-level guardrails cannot be overridden locally in a member account.
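The layering described above can be modeled with plain sets. This is a simplified sketch under stated assumptions: the function name and parameters are hypothetical, action names are matched exactly (no wildcards or conditions), and resource-based policies, session policies, and explicit denies in identity policies are left out.

```python
def is_allowed(action, scp_allow, scp_deny, identity_allow, boundary=None):
    """Simplified evaluation of one action for one principal in a member account.

    scp_allow / scp_deny: actions allowed or explicitly denied by SCPs.
    identity_allow: actions granted by the principal's identity-based policies.
    boundary: optional permission boundary (set of actions), or None if unset.
    """
    if action in scp_deny:
        return False  # an explicit SCP deny cannot be overridden locally
    if action not in scp_allow:
        return False  # outside the maximum permissions the SCPs permit
    if boundary is not None and action not in boundary:
        return False  # filtered out by the permission boundary
    # SCPs and boundaries grant nothing; only the identity policy grants access.
    return action in identity_allow
```

For example, with `scp_deny={"s3:DeleteBucket"}`, the call returns `False` for `"s3:DeleteBucket"` even when `identity_allow` contains it, which is why option B is the incorrect statement.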


Question 49

Topic: Design Secure Architectures

A company is deploying a three-tier web application in a single VPC. An internet-facing Application Load Balancer (ALB) is in public subnets; EC2 application servers and an Amazon RDS MySQL database are in private subnets. The company must: 1) allow users to reach the application only over HTTPS, 2) ensure the database is reachable only from the application servers on port 3306, 3) automatically allow return traffic for permitted flows, and 4) minimize operational effort when scaling instances. Which network configuration meets these requirements?

Options:

  • A. Attach a single security group to the ALB, application servers, and database that allows inbound TCP 443 and TCP 3306 from 0.0.0.0/0; keep the default network ACLs on all subnets.

  • B. Use network ACLs as the primary control: configure the public subnet NACL to allow inbound TCP 443 from 0.0.0.0/0 and the private subnet NACL to allow inbound TCP 443 from ALB IPs and TCP 3306 from application IPs; leave all security groups open to all traffic.

  • C. Use separate security groups for each tier: an ALB security group allowing TCP 443 from 0.0.0.0/0; an application security group allowing TCP 443 from the ALB security group; a database security group allowing TCP 3306 from the application security group; keep the VPC network ACLs at their default settings.

  • D. Configure restrictive network ACLs that allow only TCP 443 inbound to the public subnets and only TCP 3306 inbound to the private subnets; configure security groups on all resources to allow all inbound traffic from 0.0.0.0/0 while relying on the NACLs for enforcement.

Best answer: C

Explanation: Using tiered, stateful security groups referencing each other for ALB, application, and database traffic satisfies HTTPS-only access, strict database isolation, automatic handling of return traffic, and low operational overhead compared to NACL-centric or internet-open designs.
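The security group chaining in option C can be illustrated with a small model. This is a sketch, not an AWS API: the group IDs (`sg-alb`, `sg-app`, `sg-db`) and the `inbound_allowed` helper are hypothetical names. Only inbound rules are modeled, because security groups are stateful and return traffic for a permitted flow is allowed automatically.

```python
import ipaddress

# Each security group maps to a list of inbound rules: (port, source).
# A source is either a CIDR block or the ID of another security group.
INBOUND_RULES = {
    "sg-alb": [(443, "0.0.0.0/0")],  # HTTPS from the internet
    "sg-app": [(443, "sg-alb")],     # only the ALB may reach the app tier
    "sg-db": [(3306, "sg-app")],     # only the app tier may reach MySQL
}


def inbound_allowed(dest_sg, port, source_ip=None, source_sg=None):
    """Return True if a rule on dest_sg admits this port from this source."""
    for rule_port, source in INBOUND_RULES[dest_sg]:
        if rule_port != port:
            continue
        if source.startswith("sg-"):
            # Group reference: matches any resource attached to that group.
            if source == source_sg:
                return True
        elif source_ip is not None:
            if ipaddress.ip_address(source_ip) in ipaddress.ip_network(source):
                return True
    return False
```

Because the application and database rules reference groups rather than IP addresses, instances added by scaling are covered as soon as they are attached to the right group, with no rule changes.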


Question 50

Topic: Design Resilient Architectures

A stateless web application runs on a single Amazon EC2 m5.large instance in one Availability Zone, fronted by an Application Load Balancer. Traffic doubles during monthly campaigns, causing CPU saturation and occasional downtime. The application must remain available during an Availability Zone outage and require minimal operations effort. Which change is MOST appropriate?

Options:

  • A. Add a second EC2 instance in the same Availability Zone and register both instances with the existing Application Load Balancer without using Auto Scaling.

  • B. Migrate the web tier to a single AWS Fargate task behind the existing Application Load Balancer in the same Availability Zone to reduce server management.

  • C. Increase the instance size from m5.large to m5.4xlarge to handle traffic spikes on a single EC2 instance.

  • D. Place the EC2 instance in an Auto Scaling group spanning two Availability Zones with a minimum of two instances and CPU-based scaling, attached to the existing Application Load Balancer.

Best answer: D

Explanation: The scenario describes a stateless web application currently running on a single EC2 instance in one Availability Zone (AZ) behind an Application Load Balancer (ALB). The main problems are CPU saturation during predictable traffic spikes and downtime risk due to a single instance and single-AZ deployment.

To improve resilience and scalability while keeping operations simple, the best approach at the compute layer is to move from vertical scaling (bigger single instance) to horizontal scaling (a group of instances) across multiple AZs. An Auto Scaling group (ASG) with a minimum of two instances across at least two AZs behind the existing ALB provides both fault tolerance and elasticity.

Because the application is stateless, additional instances can be added or removed without impacting user state. Horizontal scaling across AZs directly addresses the AZ outage and scaling requirements, and managed scaling policies reduce ongoing operational effort compared to manually managing instance sizes or counts.


Questions 51-65

Question 51

Topic: Design Cost-Optimized Architectures

A company is rightsizing a web application and determines that three general purpose instances of the same type will provide sufficient CPU and memory. Each instance costs $0.15 per hour On-Demand in the chosen AWS Region. The instances will run 24/7.

What is the approximate monthly compute cost for this option? Assume a 30-day month (720 hours) and round to the nearest dollar.

(All prices are hypothetical and provided only for this question.)

Options:

  • A. $486 per month

  • B. $216 per month

  • C. $648 per month

  • D. $324 per month

Best answer: D

Explanation: To estimate monthly compute cost from an hourly rate, multiply the hourly price by the number of instances and by the number of hours the instances will run in the month.

In this scenario:

  • Each instance costs $0.15 per hour.
  • There are 3 instances.
  • The instances run 24/7 for a 30-day month (720 hours).

So the monthly compute cost is:

\[ \text{monthly cost} = 0.15\;\text{(\$/hr/instance)} \times 3\;\text{instances} \times 720\;\text{hours} \]

First, multiply the hourly price by the number of instances:

\[ 0.15 \times 3 = 0.45\;\text{\$/hr (all instances)} \]

Then multiply by the hours per month:

\[ 0.45 \times 720 = 324\;\text{\$} \]

Rounded to the nearest dollar, the monthly compute cost is $324. Being able to quickly estimate this helps compare different instance sizes, counts, and pricing models for cost-optimized architectures.
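The same arithmetic as a small helper, in case you want to compare other instance counts or rates. The function name is an assumption made for this sketch; `Decimal` is used so the hourly rate is not subject to binary floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP


def monthly_compute_cost(hourly_rate: str, instances: int, hours: int = 720) -> Decimal:
    """Hourly rate x instance count x hours, rounded to the nearest dollar."""
    total = Decimal(hourly_rate) * instances * hours
    return total.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


print(monthly_compute_cost("0.15", 3))  # 324
```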


Question 52

Topic: Design High-Performing Architectures

Which of the following statements about securing AWS data ingestion endpoints and pipelines are NOT correct? (Select THREE false statements.)

Options:

  • A. Using IAM roles and resource-based policies to limit which producers can write to an Amazon Kinesis Data Stream is a recommended way to secure ingestion.

  • B. AWS KMS is responsible for encrypting data in transit to ingestion endpoints such as Kinesis and Amazon SQS; therefore enabling a KMS key is required for TLS.

  • C. Because AWS automatically encrypts all network traffic at the infrastructure layer, using HTTPS/TLS from clients to public service endpoints is optional for secure ingestion.

  • D. Configuring an interface VPC endpoint (AWS PrivateLink) for Amazon Kinesis Data Streams allows producers in a VPC to send records without traversing the public internet.

  • E. For secure data ingestion, it is usually sufficient to rely on security groups and network ACLs; IAM permissions for the managed service APIs are optional.

  • F. Enabling server-side encryption with a customer managed KMS key on services such as Kinesis Data Streams or Amazon S3 helps protect ingestion data at rest and lets you control key rotation.

Correct answers: B, C and E

Explanation: Securing data ingestion on AWS requires a combination of identity and access control, encryption in transit, encryption at rest, and private connectivity where appropriate. IAM roles and resource-based policies control who can call ingestion APIs. TLS/HTTPS protects data in transit. AWS KMS integrates with services such as Kinesis, SQS, and S3 to encrypt data at rest using AWS managed or customer managed KMS keys. VPC endpoints (AWS PrivateLink) help keep traffic on the AWS private network instead of traversing the public internet.

Network-layer controls like security groups and network ACLs are important, but they are not a substitute for IAM-based authorization to managed services. Similarly, AWS KMS does not provide TLS; KMS manages keys for server-side encryption at rest. Clients must still use HTTPS/TLS when calling AWS service endpoints.

In this question, the false statements are the ones that over-rely on network controls, misattribute TLS to KMS, or incorrectly suggest that HTTPS/TLS is optional for secure ingestion.


Question 53

Topic: Design High-Performing Architectures

A company runs an e-commerce website on Amazon EC2 using an Amazon RDS for MySQL database. The product catalog is read-heavy, and the database is nearing CPU limits during peak traffic. The company wants to add Amazon ElastiCache to offload hot reads.

Requirements:

  • The application must continue to function if the cache becomes unavailable by falling back to RDS.
  • Cache nodes must be in private subnets with data encrypted in transit and at rest.
  • The solution should remain cost-effective by caching only frequently accessed items, using sensible TTLs.

Which of the following approaches should the solutions architect AVOID when designing the cache layer? (Select THREE.)

Options:

  • A. Populate Redis only with the most frequently accessed catalog items and apply TTLs so that less-used items eventually expire. On a cache miss, query RDS and update the cache.

  • B. Deploy an ElastiCache for Redis cluster in private subnets with in-transit and at-rest encryption enabled. Configure the application to read from Redis first and, on a cache miss, query RDS and then populate Redis with the result using a TTL.

  • C. Configure the application to treat Redis as a mandatory dependency: if the cache endpoint is unavailable, return an error rather than querying RDS to prevent additional load on the database.

  • D. Deploy ElastiCache for Memcached in private subnets. Use it as a write-through cache without encryption to minimize latency, accepting that traffic between the application and cache is unencrypted.

  • E. Place ElastiCache for Redis nodes in public subnets with security groups that allow connections from the internet, so external partners can query cached data directly while the application uses the same endpoint.

Correct answers: C, D and E

Explanation: The goal is to use Amazon ElastiCache to offload hot reads from Amazon RDS, improving latency and reducing load while still meeting security, reliability, and cost constraints.

A well-designed caching layer for a read-heavy pattern typically uses ElastiCache for Redis or Memcached in private subnets. The application should check the cache first and, on a miss, query the database and write the result back to the cache with a time-to-live (TTL). Because entries are loaded only when requested and expire after their TTL, only frequently accessed data stays cached. If the application also treats a cache error like a miss and reads from the database, it continues to function when the cache is unavailable.

In this scenario, there are strict security requirements: cache nodes must be in private subnets and data in transit and at rest must be encrypted. There is also a reliability requirement that the application must continue to function (by falling back to RDS) if the cache fails, and a cost requirement to avoid caching everything indefinitely.

The designs that violate these constraints should be avoided, even if they might improve performance in some respects, because they introduce unacceptable security risks, reliability issues, or cost inefficiencies.
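The read path that options A and B describe (cache first, database on a miss, TTL on write-back, database fallback when the cache is down) can be sketched as follows. `InMemoryCache` is a hypothetical stand-in for a Redis client, and `get_product`, `load_from_db`, and `CacheUnavailable` are names invented for this example.

```python
import time


class CacheUnavailable(Exception):
    """Raised by the cache client when the cache cannot be reached."""


class InMemoryCache:
    """Hypothetical stand-in for a Redis client with per-key TTLs."""

    def __init__(self):
        self._items = {}
        self.available = True  # flip to False to simulate an outage

    def get(self, key):
        if not self.available:
            raise CacheUnavailable()
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        if not self.available:
            raise CacheUnavailable()
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def get_product(product_id, cache, load_from_db, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the database."""
    try:
        cached = cache.get(product_id)
        if cached is not None:
            return cached
    except CacheUnavailable:
        # The cache is optional: serve the request from the database.
        return load_from_db(product_id)
    value = load_from_db(product_id)
    try:
        cache.set(product_id, value, ttl_seconds)
    except CacheUnavailable:
        pass  # failing to populate the cache must not fail the request
    return value
```

Option C is the opposite of the `except CacheUnavailable` branch: it turns a cache outage into an application outage.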


Question 54

Topic: Design Secure Architectures

A company hosts its orders database on Amazon RDS MySQL. The business has defined the following disaster recovery requirements for this database.

Exhibit:

Parameter | Value
Database | Orders (RDS MySQL, single-AZ)
RPO | 5 minutes (max data loss)
RTO | 15 minutes (max downtime)
Scope | Single Region; DR for AZ failure only

Based only on the information in the exhibit, which solution should a solutions architect recommend to meet these requirements?

Options:

  • A. Create a cross-Region read replica and plan to promote it manually if the primary becomes unavailable.

  • B. Convert the database to a Multi-AZ RDS deployment and enable automated backups with point-in-time recovery.

  • C. Keep the database single-AZ and schedule manual snapshots every 5 minutes, restoring from the latest snapshot during a failure.

  • D. Enable automated daily snapshots and copy them to a second Region for disaster recovery.

Best answer: B

Explanation: The exhibit states that the orders database is currently RDS MySQL, single-AZ and must meet an RPO of 5 minutes (max data loss) and an RTO of 15 minutes (max downtime), with the Scope: Single Region; DR for AZ failure only. This means the company wants to tolerate an Availability Zone failure with minimal data loss and short downtime, without paying for full multi-Region disaster recovery.

Converting the database to a Multi-AZ RDS deployment addresses both RPO and RTO in this context. Multi-AZ uses synchronous replication to a standby in another AZ and provides automatic failover, so data loss is typically zero or a few seconds and failover usually completes within minutes. This is aligned with a 5-minute RPO and 15-minute RTO. Enabling automated backups with point-in-time recovery (PITR) adds protection against data corruption or user error without affecting the core failover behavior.

Other choices either rely on slow restore-from-snapshot processes that cannot reliably meet a 15-minute RTO, or introduce cross-Region designs that do not match the exhibit’s stated Scope: Single Region; DR for AZ failure only, adding unnecessary complexity and cost while still not clearly satisfying the RTO requirement.


Question 55

Topic: Design Cost-Optimized Architectures

Which AWS compute option is generally the most cost-effective for a workload that runs very infrequently, executes for only a few seconds per request, is triggered by individual events, and does not require managing or accessing underlying servers?

Options:

  • A. AWS Lambda

  • B. Amazon EC2 Spot Instances with an Auto Scaling group

  • C. Amazon EC2 On-Demand Instances

  • D. AWS Fargate tasks

Best answer: A

Explanation: For very infrequent, short-duration, event-driven workloads that do not require server management, AWS Lambda is usually the most cost-effective compute option. Lambda uses a serverless execution model where you are billed only for the number of requests and the execution duration, measured in 1 ms increments. When the function is not being invoked, there are no compute charges because Lambda scales down to zero.

By contrast, EC2 On-Demand and Spot Instances charge for the time instances are running, regardless of how busy they are. Even if the workload runs only occasionally, you would still pay for instance uptime while the instance is provisioned. AWS Fargate similarly charges based on provisioned vCPU and memory resources for the duration that tasks are running. These models are well suited to longer-running or steady workloads, but they are typically less cost-efficient than Lambda for spiky, low-duty-cycle event processing.

Therefore, the compute model that best aligns with the described pattern—short bursts, low frequency, event-driven, and no need to manage servers—is AWS Lambda.


Question 56

Topic: Design Resilient Architectures

A company runs a global online learning platform. It needs a highly available, fault-tolerant data store for user profiles and content metadata with the following requirements:

  • Multi-AZ high availability in the primary Region
  • Cross-Region disaster recovery with RPO <1 minute
  • Minimize operational overhead by using managed, purpose-built AWS services

Which of the following designs should the solutions architect AVOID? (Select THREE.)

Options:

  • A. Deploy self-managed MySQL on Amazon EC2 instances in each Region, configure asynchronous replication between Regions, and manage failover with custom scripts.

  • B. Run a sharded MongoDB cluster on Amazon EC2 instances across multiple AZs in the primary Region, and take daily EBS snapshots to restore into a secondary Region if needed.

  • C. Use a single-AZ Amazon RDS for MySQL instance in the primary Region and copy automated backups to a secondary Region for disaster recovery.

  • D. Use Amazon Aurora Global Database (Aurora MySQL) with a multi-AZ writer cluster in the primary Region and a read-only secondary Region for disaster recovery.

  • E. Use Amazon DynamoDB global tables across two Regions with on-demand capacity and the AWS SDK’s automatic retries for transient failures.

Correct answers: A, B and C

Explanation: The company explicitly wants multi-AZ high availability, cross-Region disaster recovery with an RPO under 1 minute, and minimal operational overhead by using managed, purpose-built AWS services. Designs that rely on self-managed databases on EC2 or snapshot-based DR generally cannot meet these goals as efficiently or reliably as services like DynamoDB global tables or Aurora Global Database.

DynamoDB global tables and Aurora Global Database are purpose-built for exactly this kind of resilient, multi-Region architecture. They provide managed replication, automatic handling of failures, and reduced operational burden compared to self-managed stacks. Solutions that use single-AZ deployments, backup-based DR, or custom replication on EC2 should be avoided in this context.


Question 57

Topic: Design Resilient Architectures

A company is designing a backup solution for application logs. The requirements captured in a design workshop are shown in the following table:

Requirement | Value
Workload criticality | Non-critical; data can be regenerated from source within 24 hours
Durability target | No permanent loss of backed-up logs
Target restore time | ≤12 hours
DR Region | Not required

Based on this information, which backup solution is the MOST appropriate?

Options:

  • A. Store logs in Amazon S3 Standard-IA with versioning in a single Region. Do not enable cross-Region replication. Use lifecycle rules to transition logs to S3 Glacier Instant Retrieval after 30 days.

  • B. Store logs in Amazon S3 Standard with versioning in a single Region. Enable cross-Region replication to a second Region and transition objects to S3 Glacier Deep Archive after 30 days.

  • C. Store logs on Amazon EBS gp3 volumes attached to a backup EC2 instance, and take daily EBS snapshots. Replicate snapshots to a second Region for disaster recovery.

  • D. Store logs in Amazon S3 One Zone-IA in a single Availability Zone to reduce cost. Enable daily AWS Backup copies to an S3 Glacier Deep Archive vault in the same Region.

Best answer: A

Explanation: The exhibit shows that the workload is non-critical and can be regenerated within 24 hours, but still requires strong durability and a moderate restore time. The key lines are Durability target: No permanent loss of backed-up logs, Target restore time: ≤12 hours, and DR Region: Not required.

The best solution should therefore:

  • Use a highly durable storage service (such as S3) to avoid permanent loss.
  • Meet the 12-hour restore objective without using very slow archival tiers.
  • Avoid multi-Region replication, because the exhibit explicitly says a DR Region is not required, to prevent unnecessary complexity and cost.

Storing the logs in S3 Standard-IA with versioning and transitioning them to S3 Glacier Instant Retrieval after 30 days provides S3-level durability, keeps retrieval latency in milliseconds (both storage classes offer immediate access, far inside the 12-hour target), and stays within a single Region. This directly satisfies the durability and restore-time requirements while avoiding over-engineering with a second Region.

The other options either introduce unnecessary multi-Region replication, violate the durability expectations by using a single-AZ storage class, or risk exceeding the allowed restore time by using the slowest archival tier.
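The lifecycle rule in option A could be expressed roughly like this. The helper name, rule ID, and `logs/` prefix are assumptions for illustration; the dictionary follows the shape that boto3's S3 client accepts as `LifecycleConfiguration` in `put_bucket_lifecycle_configuration`, but treat it as a sketch to check against the current API reference rather than a finished configuration.

```python
def glacier_ir_lifecycle(prefix: str, days: int = 30) -> dict:
    """Build a lifecycle configuration that moves objects to Glacier Instant Retrieval."""
    return {
        "Rules": [
            {
                "ID": "logs-to-glacier-ir",      # hypothetical rule name
                "Filter": {"Prefix": prefix},    # apply only to the log objects
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER_IR"}
                ],
            }
        ]
    }


config = glacier_ir_lifecycle("logs/")
```

No replication rule appears here, which matches the exhibit's "DR Region: Not required".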


Question 58

Topic: Design Secure Architectures

A healthcare company stores research datasets and a smaller set of raw PII files in Amazon S3 across several AWS accounts. The security team must centrally govern which IAM principals can access objects tagged data-classification=PII, enforce encryption at rest with a customer managed KMS key, and prevent any public access. The design should minimize the use of S3 ACLs, scale easily as new accounts are added, and provide a clear audit trail of who accessed which objects. Which approach BEST meets these requirements?

Options:

  • A. Create a shared S3 bucket and grant each approved IAM user access through S3 bucket ACLs; enable SSE-S3 on the bucket, allow public read access for research collaborators, and rely on default CloudTrail configuration in each account for auditing.

  • B. Create S3 Access Points for each account with VPC restrictions; use client-side encryption for PII objects, S3 object ACLs to control business unit access, and S3 server access logs for auditing object access.

  • C. Create a central S3 bucket in a security account with S3 Block Public Access enabled and bucket owner enforced; use a bucket policy that allows access only to IAM roles in member accounts with a specific aws:PrincipalTag, conditioned on s3:ExistingObjectTag/data-classification = PII, require SSE-KMS with a customer managed KMS key in the security account, and enable CloudTrail data events on the bucket.

  • D. Keep one S3 bucket per account and manage access using IAM identity-based policies attached to users and roles in each account; enable SSE-KMS using the AWS managed key for S3 and S3 Block Public Access on all buckets, and configure organization-wide CloudTrail for auditing.

Best answer: C

Explanation: The best design for centrally governing PII access in S3 across accounts is to use a single, centrally managed bucket with S3 Block Public Access, bucket owner enforced (no ACLs), and tightly scoped bucket and KMS key policies that rely on tags and trusted roles. This approach satisfies security (least privilege, customer managed KMS key, no public access) and operational requirements (centralized control, easy onboarding of new accounts, and CloudTrail data events for detailed auditing).
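A bucket policy along the lines of option C might look like the sketch below, built as a Python dictionary so it can be serialized to JSON. The principal tag key `pii-access`, its value `approved`, and the function name are hypothetical, and the caller supplies the bucket name, organization ID, and KMS key ARN. The use of `aws:PrincipalOrgID` to limit access to the organization is an assumption of this sketch, not part of the question. Roles in member accounts would still need matching identity-based permissions, and the KMS key policy must allow them to use the key.

```python
import json


def pii_bucket_policy(bucket: str, org_id: str, kms_key_arn: str) -> dict:
    """Sketch of a central bucket policy for tag-based PII access."""
    objects = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access only for tagged roles in the organization,
                # and only to objects tagged as PII.
                "Sid": "AllowTaggedRolesToReadPII",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "s3:GetObject",
                "Resource": objects,
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalOrgID": org_id,
                        "aws:PrincipalTag/pii-access": "approved",
                        "s3:ExistingObjectTag/data-classification": "PII",
                    }
                },
            },
            {
                # Reject uploads that do not name the central customer managed key.
                "Sid": "DenyUploadsWithoutCentralKmsKey",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption-aws-kms-key-id": kms_key_arn
                    }
                },
            },
        ],
    }


def to_json(policy: dict) -> str:
    """Serialize the policy for use with the S3 console, CLI, or SDK."""
    return json.dumps(policy, indent=2)
```

Onboarding a new account then means tagging its roles, not editing per-user ACLs, which is the scaling property the question asks for.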


Question 59

Topic: Design High-Performing Architectures

A company is designing a new global e-commerce platform on AWS. The system has three data patterns: (1) ACID payment transactions, (2) high-throughput time-series clickstream analytics, and (3) low-latency user session state. Each option proposes a database choice for one pattern. Which TWO options describe designs the solutions architect should AVOID for this workload? (Select TWO.)

Options:

  • A. Use Amazon DynamoDB for user session state with TTL enabled to expire inactive sessions and add DynamoDB Accelerator (DAX) for microsecond read latency.

  • B. Store all clickstream events in a single Amazon RDS for MySQL instance that also handles OLTP payment transactions and run analytics queries directly on that database.

  • C. Use Amazon Aurora MySQL Serverless v2 in a multi-AZ cluster for the payment transactions to provide ACID properties and automatic scaling.

  • D. Store all user session state only in Amazon ElastiCache for Redis clusters with no backing persistent data store to avoid write latency.

  • E. Ingest clickstream events into Amazon Timestream and power near-real-time dashboards directly from Timestream queries.

Correct answers: B and D

Explanation: This scenario tests how to map different workload patterns to appropriate AWS database services while avoiding common anti-patterns that hurt performance, reliability, or operational efficiency.

Payment transactions require strong ACID guarantees, durability, and high availability. A managed relational database such as Amazon Aurora MySQL Serverless v2 is a strong fit because it offers transactional semantics, multi-AZ resilience, and automatic capacity scaling with minimal operational overhead.

High-throughput, append-only clickstream analytics are a classic time-series workload. Purpose-built services like Amazon Timestream or a scalable NoSQL store are better suited than a shared OLTP relational database because they handle high ingest rates and analytical queries without overloading transactional systems.

User session state is typically small, read-heavy, and latency-sensitive. A key-value or in-memory pattern (DynamoDB, DAX, ElastiCache used as a cache) works well. However, caches should not be used as the only durable store because they are optimized for speed, not long-term durability, and can lose data on failures.

The two designs to avoid are those that misuse an in-memory cache as the system of record and that overload a single RDS instance with both OLTP payments and high-volume analytics, which conflicts with Well-Architected guidance on reliability and performance efficiency.


Question 60

Topic: Design Resilient Architectures

A company runs a critical transactional application. During a full Region outage, the company requires the database to meet all of these goals:

  • Cross-Region RPO ≤1s
  • Cross-Region RTO ≤5min
  • Use a managed AWS database service to minimize operational effort

A solutions architect evaluated four designs, shown in the following exhibit.

Option | Service / setup | Cross-Region RPO / RTO | Management effort
1 | Self-managed MySQL on EC2 + nightly copies | ~24h RPO, 4h+ manual RTO | High: custom backup & cross-Region restore
2 | RDS for MySQL Multi-AZ (single Region) | No cross-Region; AZ RTO<2min | Low: managed Multi-AZ in one Region only
3 | Aurora Global Database (Aurora MySQL) | RPO<1s, RTO<1min | Low: managed global replication & failover
4 | MySQL on EC2 in 2 Regions, async replica | 5–30s RPO, 10–20min RTO | High: manual failover runbooks

Based only on the information in the exhibit, which option should the solutions architect recommend?

Options:

  • A. Option 2: Amazon RDS for MySQL Multi-AZ in a single Region

  • B. Option 4: Self-managed MySQL on EC2 in two Regions with asynchronous replication

  • C. Option 1: Self-managed MySQL on EC2 with nightly cross-Region snapshot copies

  • D. Option 3: Aurora Global Database (Aurora MySQL)

Best answer: D

Explanation: The requirement is to survive a full Region outage with cross-Region RPO≤1s, RTO≤5min, and to use a managed AWS database service to minimize operational effort.

From the exhibit, Option 3 (Aurora Global Database) explicitly lists “RPO<1s, RTO<1min” and describes the management effort as “Low: managed global replication & failover.” This is the only option that simultaneously satisfies the strict RPO and RTO targets and the managed-service requirement.

Options 1, 2, and 4 each fail at least one of the stated goals. Some may appear resilient at first glance, but careful reading of the RPO/RTO and management-effort cells in the exhibit shows they either do not provide cross-Region protection, have too high an RTO, or require significant self-management compared to the purpose-built Aurora Global Database design.
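The elimination can be made mechanical by encoding the exhibit and filtering on the three goals. The numbers below are taken from the exhibit, with two assumptions made for this sketch: ranges and "less than" values are represented by their upper bound (worst case), and "no cross-Region" is encoded as `None`. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Design:
    name: str
    rpo_seconds: Optional[float]  # None means no cross-Region protection
    rto_seconds: Optional[float]
    managed: bool                 # True if a managed AWS database service


DESIGNS = [
    Design("Option 1", 24 * 3600, 4 * 3600, False),  # ~24h RPO, 4h+ RTO
    Design("Option 2", None, None, True),            # single Region only
    Design("Option 3", 1, 60, True),                 # RPO<1s, RTO<1min
    Design("Option 4", 30, 20 * 60, False),          # 5-30s RPO, 10-20min RTO
]


def meets_goals(design, max_rpo=1, max_rto=5 * 60):
    """Check cross-Region RPO <= 1s, RTO <= 5min, and managed service."""
    if design.rpo_seconds is None or design.rto_seconds is None:
        return False
    return (
        design.managed
        and design.rpo_seconds <= max_rpo
        and design.rto_seconds <= max_rto
    )


matching = [d.name for d in DESIGNS if meets_goals(d)]
```

With these inputs, `matching` contains only Option 3.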


Question 61

Topic: Design Secure Architectures

A company has 50 AWS accounts in an AWS Organizations organization. The security team must centrally inspect all outbound internet traffic from VPCs and enforce mandatory AWS WAF rules and security group policies on all internet-facing resources across all current and future accounts. Which statements about an appropriate design are correct? (Select TWO.)

Options:

  • A. Use AWS Systems Manager Automation documents to push consistent iptables and AWS WAF configurations to EC2 instances in all accounts.

  • B. Deploy third-party firewall appliances into each VPC and manage rules separately in every account instead of using AWS Network Firewall.

  • C. Deploy AWS Network Firewall endpoints in a centralized inspection VPC and route all VPC internet traffic through them using AWS Transit Gateway.

  • D. Use AWS Firewall Manager to create and apply AWS WAF and security group policies across the organization’s accounts and resources.

  • E. Attach AWS WAF web ACLs manually to each internet-facing ALB and CloudFront distribution in every account and rely on tagging to track compliance.

Correct answers: C and D

Explanation: The scenario combines two distinct but related security needs in a multi-account environment:

  • Centralized, stateful inspection of outbound internet traffic from multiple VPCs.
  • Organization-wide enforcement of AWS WAF rules and security group policies on internet-facing resources across many accounts.

AWS Network Firewall is a managed, stateful network firewall service that integrates well with AWS Transit Gateway and VPC routing. It is ideal for building a centralized inspection VPC where all egress (and optionally east-west) traffic is inspected according to rules managed by the security team.

AWS Firewall Manager is a security management service that works with AWS Organizations to centrally configure and manage security policies—such as AWS WAF web ACLs and security group policies—across all accounts. It automatically applies policies to existing and newly created resources that match specified scopes (for example, all internet-facing ALBs in member accounts).

Therefore, a well-architected design for this scenario uses AWS Network Firewall for centralized traffic inspection and AWS Firewall Manager for organization-wide enforcement of WAF and security group policies. The distractors propose manual, per-account management or tools that do not provide enforced, centralized governance, which fails the stated requirements.


Question 62

Topic: Design Resilient Architectures

Which statement BEST describes AWS Fargate in the context of designing scalable, loosely coupled architectures on AWS?

Options:

  • A. A serverless compute engine for containers that runs Amazon ECS tasks and Amazon EKS pods without requiring you to provision or manage EC2 instances

  • B. A fully managed container orchestration service that schedules containers across a cluster of EC2 instances you manage

  • C. A managed platform-as-a-service (PaaS) for deploying web applications from code repositories without dealing with infrastructure details

  • D. A service that runs event-driven functions in response to triggers without requiring containers or explicit runtime management

Best answer: A

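Explanation: AWS Fargate is a serverless compute engine for containers that works with Amazon ECS and Amazon EKS, allowing you to run tasks and pods without provisioning, scaling, or patching the EC2 instances that would otherwise host them. You still define the ECS cluster or EKS cluster as a logical boundary, but AWS supplies and manages the underlying compute capacity for each task or pod. This makes Fargate well suited to bursty or unpredictable container workloads where reduced operational overhead matters. The other options describe a container orchestrator running on EC2 instances you manage (B), a platform-as-a-service offering (C), and an event-driven function service (D).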
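
As an illustration of what "no EC2 instances to manage" looks like in practice, the sketch below builds an ECS task definition that targets Fargate. The top-level field names follow the ECS RegisterTaskDefinition API; the family name, container name, image, and sizes are placeholders chosen for this example, and the dictionary is only built locally, not sent to AWS.

```python
import json


def fargate_task_definition(family: str, image: str) -> dict:
    """Build a minimal task definition dictionary that targets Fargate.

    The family, container name, and sizes are illustrative placeholders.
    """
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks use the awsvpc network mode
        "cpu": "256",             # task-level CPU units (0.25 vCPU)
        "memory": "512",          # task-level memory in MiB
        "containerDefinitions": [
            {
                "name": "web",
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            }
        ],
    }


# Placeholder image reference, used only for illustration.
task_def = fargate_task_definition("demo-web", "public.ecr.aws/nginx/nginx:latest")
print(json.dumps(task_def, indent=2))
```

Note that nothing in the definition names an instance type, AMI, or Auto Scaling group: CPU and memory are declared per task. With boto3, a dictionary of this shape could be passed as keyword arguments to the ECS client's register_task_definition call.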


Question 63

Topic: Design Cost-Optimized Architectures

Which of the following statements about cost-optimizing Amazon RDS backup retention is INCORRECT?

Options:

  • A. Deleting unneeded manual snapshots after the compliance retention period helps reduce ongoing RDS backup storage charges.

  • B. Because RDS manual snapshots are incremental, keeping additional snapshots after the first one does not incur any extra storage cost.

  • C. Extending the automated backup retention period allows point-in-time recovery over a longer window but can increase backup storage costs for a frequently updated database.

  • D. Retaining a large number of manual DB snapshots for long periods increases storage charges, because each snapshot is billed for the data blocks it preserves.

Best answer: B

Explanation: Amazon RDS provides automated backups and manual DB snapshots that are both stored in Amazon S3 and billed based on the storage they consume. Automated backups support point-in-time recovery (PITR) within a configurable retention window, while manual snapshots are kept until explicitly deleted.

Automated backups and snapshots are incremental at the block level, meaning that only changed data blocks are stored after the initial full copy. However, charges are based on the total amount of snapshot storage used across all retained backups. Keeping more snapshots or a longer automated retention window retains more blocks and therefore typically increases storage cost.

To balance resilience with spend, you choose a backup retention window that matches your PITR requirements and delete manual snapshots when they are no longer required for recovery or compliance. The claim that additional manual snapshots do not add any extra storage cost is incorrect and can lead to unexpectedly high backup storage bills.
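
The effect of incremental snapshots on storage can be illustrated with a toy model. This is a simplified sketch, not how RDS actually meters backup storage: it assumes each snapshot records which version of each block it references, and that storage is proportional to the number of distinct block versions referenced by any retained snapshot. The function name, block size, and data are hypothetical.

```python
from typing import Dict, List

# Hypothetical model: a snapshot maps block ID -> version of that block
# at the time the snapshot was taken.
Snapshot = Dict[int, int]

BLOCK_SIZE_GIB = 1  # assumed block size, for illustration only


def retained_storage_gib(snapshots: List[Snapshot]) -> int:
    """Storage needed to keep every distinct block version that any
    retained snapshot still references."""
    distinct_block_versions = set()
    for snapshot in snapshots:
        distinct_block_versions.update(snapshot.items())
    return len(distinct_block_versions) * BLOCK_SIZE_GIB


# Day 1: a 100-block database, every block at version 0 (initial full copy).
day1 = {block: 0 for block in range(100)}
# Day 2: blocks 0-9 were rewritten since day 1.
day2 = {**day1, **{block: 1 for block in range(10)}}
# Day 3: blocks 10-19 were rewritten since day 2.
day3 = {**day2, **{block: 1 for block in range(10, 20)}}

first_only = retained_storage_gib([day1])
all_three = retained_storage_gib([day1, day2, day3])
latest_only = retained_storage_gib([day3])

print(first_only, all_three, latest_only)
```

In this model the second and third snapshots each add only the 10 changed blocks rather than a full 100-block copy, which is the benefit of incremental storage. They are still not free: retaining all three needs more storage than retaining one, which is why option B is the incorrect statement.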


Question 64

Topic: Design High-Performing Architectures

A company runs a Linux web application on an Auto Scaling group of EC2 instances in multiple Availability Zones. The application needs a shared, POSIX-compliant file system that all instances can read and write concurrently without managing file server software. Which storage solution is the MOST appropriate?

Options:

  • A. Use instance store volumes on each EC2 instance and replicate the data between instances using a custom script.

  • B. Attach a single Amazon EBS gp3 volume to one EC2 instance and export it via NFS to the other instances.

  • C. Create an Amazon EFS file system and mount it on all EC2 instances in the Auto Scaling group.

  • D. Store the data in an Amazon S3 bucket and use the AWS SDK to read and write objects instead of using a file system.

Best answer: C

Explanation: The deciding requirement is a shared, POSIX-compliant file system that many EC2 instances in an Auto Scaling group, spread across multiple Availability Zones, can read and write concurrently, without the team running and maintaining its own file server.

Amazon EFS is purpose-built for this requirement. It is a managed NFS file system that supports POSIX semantics and can be mounted from many EC2 instances across multiple AZs within a Region. It automatically scales capacity and throughput as files are added or removed, and it eliminates the operational overhead of patching and scaling file server instances.

The other options either do not provide POSIX file system semantics (object storage), cannot be safely or simply shared across many instances and AZs, or require the company to manage its own file server infrastructure, which the scenario explicitly wants to avoid.


Question 65

Topic: Design Cost-Optimized Architectures

Which statement about AWS data transfer pricing is MOST accurate for designing cost-optimized compute architectures?

Options:

  • A. Data transfer between EC2 instances in different Regions over private IPs is free if the instances are in the same AWS account.

  • B. Data transfer between EC2 instances in different Availability Zones within the same Region over private IPs is free, the same as traffic within a single Availability Zone.

  • C. Data transfer between EC2 instances in the same Availability Zone but different subnets is charged at the same rate as inter-Region data transfer.

  • D. Data transfer between EC2 instances in different Availability Zones within the same Region over private IPs is billed as regional data transfer in and out of each instance.

Best answer: D

Explanation: AWS data transfer pricing distinguishes between traffic within an Availability Zone, between Availability Zones in the same Region, and between Regions. Traffic between EC2 instances in the same Availability Zone using private IP addresses is free, regardless of subnet. However, when EC2 instances communicate across Availability Zones within the same Region, that traffic is billed as regional data transfer in each direction: the same bytes are charged as data out on the sending side and as data in on the receiving side.

For cost-optimized designs, this means highly chatty tiers (for example, application and cache layers) should be carefully placed. Keeping heavy east–west traffic within a single AZ avoids inter-AZ charges, while still using multi-AZ for components that truly require higher availability. Cross-Region communication is even more expensive and should be minimized or reserved for DR, replication, and global workloads where the business value justifies the cost.
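
The placement trade-off can be made concrete with a rough estimate. The sketch below is illustrative only: the per-GB rate is an assumed figure for this example, actual rates vary by Region and change over time, and the function name is hypothetical. It models inter-AZ traffic as charged once on the sending side and once on the receiving side.

```python
# Assumed illustrative rates in USD per GB; confirm current pricing before relying on them.
INTER_AZ_RATE_PER_GB_EACH_DIRECTION = 0.01
SAME_AZ_PRIVATE_IP_RATE_PER_GB = 0.0


def monthly_transfer_cost(gb_per_month: float, same_az: bool) -> float:
    """Estimate the monthly charge for traffic between two EC2 instances
    in one Region that communicate over private IP addresses."""
    if same_az:
        return gb_per_month * SAME_AZ_PRIVATE_IP_RATE_PER_GB
    # Cross-AZ: the same bytes are charged as "out" for the sender
    # and as "in" for the receiver.
    return gb_per_month * INTER_AZ_RATE_PER_GB_EACH_DIRECTION * 2


# Hypothetical workload: an app tier sending 10,000 GB per month to a cache tier.
cross_az_cost = monthly_transfer_cost(10_000, same_az=False)
same_az_cost = monthly_transfer_cost(10_000, same_az=True)

print(f"cross-AZ: ${cross_az_cost:.2f}, same-AZ: ${same_az_cost:.2f}")
```

Under these assumed rates, the difference scales linearly with traffic volume, so the estimate matters most for the chattiest tiers and is negligible for low-volume control traffic.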


