Keep this page open while drilling questions. SAA‑C03 is about trade-offs: availability, performance, cost, operations, and security.
Quick facts (SAA-C03)
| Item | Value |
|---|---|
| Questions | 65 (multiple-choice + multiple-response) |
| Time | 130 minutes |
| Passing score | 720 (scaled 100–1000) |
| Cost | 150 USD |
| Domains | D1 Secure 30% • D2 Resilient 26% • D3 High-Performing 24% • D4 Cost-Optimized 20% |
How SAA-C03 questions work (fast strategy)
- Read the last sentence first: it usually contains the deciding constraint (lowest cost, least ops, highest availability, most secure).
- If multiple answers “work”, the best answer is typically the one with managed services + least operational overhead.
- If the question mentions private subnets, assume no direct internet access unless NAT/endpoints are included.
- If you see AccessDenied, separate concerns: identity policy vs resource policy vs KMS key policy.
1) VPC and networking — patterns that win
Subnets, routing, and egress (defaults you get tested on)
- Public subnet: route table has 0.0.0.0/0 → IGW.
- Private subnet: no IGW route; outbound internet needs NAT Gateway (typically one per AZ for HA).
- Avoid cross-AZ hairpinning (SPOF + cross-AZ charges): private subnets in AZ‑A should use NAT in AZ‑A.
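
The routing rules above can be checked mechanically. The following is a minimal Python sketch, assuming a simplified model in which each subnet records only the target of its 0.0.0.0/0 route. The `Subnet` class, the ID strings, and both helper names are hypothetical and are not part of any AWS API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Subnet:
    name: str
    az: str
    default_route: Optional[str]  # target of 0.0.0.0/0, e.g. "igw-1", "nat-a", or None


def classify(subnet: Subnet) -> str:
    """A subnet is public only if its default route points at an internet gateway."""
    if subnet.default_route and subnet.default_route.startswith("igw-"):
        return "public"
    return "private"


def cross_az_nat_subnets(subnets: List[Subnet], nat_az: Dict[str, str]) -> List[str]:
    """Names of subnets whose default route uses a NAT gateway in a different AZ."""
    flagged = []
    for s in subnets:
        if s.default_route in nat_az and nat_az[s.default_route] != s.az:
            flagged.append(s.name)
    return flagged


# Hypothetical sample layout: one NAT gateway, living in AZ-A only.
subnets = [
    Subnet("public-a", "az-a", "igw-1"),
    Subnet("app-a", "az-a", "nat-a"),
    Subnet("app-b", "az-b", "nat-a"),  # hairpins through AZ-A
    Subnet("db-b", "az-b", None),      # isolated: no default route at all
]
nat_az = {"nat-a": "az-a"}

print([classify(s) for s in subnets])
print(cross_az_nat_subnets(subnets, nat_az))
```

In this sample only `app-b` is flagged, because it sits in AZ-B but sends its egress through the NAT gateway in AZ-A.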
Gateway vs Interface endpoints (huge cost + security lever)
| Endpoint | Services | What it does | Cost note |
|---|---|---|---|
| Gateway endpoint | S3, DynamoDB | Adds route-table entries; stays on AWS network | No hourly charge |
| Interface endpoint (PrivateLink) | Most AWS services | ENI in subnets; private service access | Hourly + data processing |
Exam cue: “Private subnets must access S3/DynamoDB” → Gateway endpoint (not NAT).
Security layers: SG vs NACL (keep it simple)
| Control | Level | Stateful? | Typical best practice |
|---|---|---|---|
| Security group | ENI/resource | Yes | Primary control; reference SG‑to‑SG |
| NACL | Subnet | No | Keep simple unless compliance requires |
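
The stateful/stateless difference in the table is easiest to see on a full round trip. Below is a small Python sketch under heavy simplifying assumptions: rules are plain allowed port ranges, and CIDRs, rule ordering, and deny rules are ignored. The function names are illustrative only.

```python
from typing import List, Tuple

PortRange = Tuple[int, int]


def in_ranges(port: int, ranges: List[PortRange]) -> bool:
    return any(lo <= port <= hi for lo, hi in ranges)


def sg_round_trip(inbound: List[PortRange], server_port: int) -> bool:
    """Stateful: if the request is allowed in, the reply is allowed out automatically."""
    return in_ranges(server_port, inbound)


def nacl_round_trip(
    inbound: List[PortRange],
    outbound: List[PortRange],
    server_port: int,
    client_port: int,
) -> bool:
    """Stateless: the request and the reply are evaluated separately."""
    return in_ranges(server_port, inbound) and in_ranges(client_port, outbound)


https_only = [(443, 443)]
ephemeral = [(1024, 65535)]

print(sg_round_trip(https_only, 443))
print(nacl_round_trip(https_only, [], 443, 50000))
print(nacl_round_trip(https_only, ephemeral, 443, 50000))
```

The second call fails because the NACL has no outbound rule covering the client's ephemeral port, which is the usual way an overly strict NACL breaks traffic that a security group would have passed.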
Multi-VPC connectivity (choose the right primitive)
| Need | Best-fit | Why |
|---|---|---|
| Few VPCs, simple links | VPC peering | Low latency; simple |
| Many VPCs (sprawl), transitive routing | Transit Gateway | Hub-and-spoke; scalable |
| Expose a service privately without routing everything | PrivateLink | Strong isolation; no CIDR coordination |
Hybrid connectivity (associate-level)
| Option | Best for | Notes |
|---|---|---|
| Site-to-Site VPN | Quick setup, encrypted tunnel | Uses internet; can be used as DX backup |
| Direct Connect (DX) | Predictable bandwidth/latency | Often paired with VPN for HA |
Exam cue: If the scenario says “quickly establish connectivity,” VPN is common. If it says “consistent performance,” Direct Connect is common.
Edge + global routing (Route 53 vs CloudFront vs Global Accelerator)
| Service | Best for | Key idea |
|---|---|---|
| Route 53 | DNS routing and failover | DNS‑based; health checks; TTL matters |
| CloudFront | CDN caching + origin failover for HTTP(S) | Faster global content delivery |
| Global Accelerator | Static anycast IPs + fast failover | Great for TCP/UDP and fast regional failover |
Reference diagram: common “best answer” VPC layout
flowchart LR
U[Users] --> ALB["ALB (Public Subnets)"]
ALB --> App1["App (Private AZ1)"]
ALB --> App2["App (Private AZ2)"]
App1 & App2 --> RDS["RDS/Aurora (Multi-AZ)"]
App1 & App2 --> S3["S3 via Gateway Endpoint"]
App1 & App2 --> DDB["DynamoDB via Gateway Endpoint"]
App1 --> NAT1["NAT GW (AZ1)"]
App2 --> NAT2["NAT GW (AZ2)"]
2) Security and identity — IAM, KMS, logging
IAM: policy types and “what breaks access”
| Policy/control | Answers “who?” | Answers “what can they do?” | Common exam gotcha |
|---|---|---|---|
| Trust policy | Who can assume a role | No | Too-broad principals |
| Identity policy | No | Principal permissions | Missing conditions/least privilege |
| Resource policy | Who can access the resource | Resource permissions | Forgetting to add cross-account principal |
| SCP (Organizations) | Org guardrail (accounts/OUs) | Caps maximum permissions; never grants | A deny overrides any IAM allow in member accounts |
| KMS key policy | Key admin/use | Key permissions | IAM allow isn’t enough if key policy blocks |
High-yield rule: Explicit Deny wins. If you see Organizations in a scenario, consider SCP effects.
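
To make "Explicit Deny wins" concrete, here is a rough Python sketch of the decision order. It assumes a simplified same-account request and ignores permissions boundaries, session policies, and cross-account rules (where both sides must allow). The `evaluate` function and its input shape are illustrative, not an AWS API.

```python
from typing import Dict, List


def evaluate(effects: Dict[str, List[str]]) -> str:
    """Decide a request from the effects of the statements that match it.

    `effects` maps a policy layer ("scp", "identity", "resource") to the
    list of effects ("Allow"/"Deny") of its matching statements.
    """
    all_effects = [e for layer in effects.values() for e in layer]
    if "Deny" in all_effects:
        return "Deny (explicit)"
    # If SCPs apply, they must allow the action, but they never grant it.
    if "scp" in effects and "Allow" not in effects["scp"]:
        return "Deny (implicit: SCP does not allow)"
    if "Allow" in effects.get("identity", []) or "Allow" in effects.get("resource", []):
        return "Allow"
    return "Deny (implicit)"


print(evaluate({"identity": ["Allow"], "resource": ["Deny"]}))
print(evaluate({"scp": [], "identity": ["Allow"]}))
print(evaluate({"scp": ["Allow"], "identity": ["Allow"]}))
```

The ordering is the point: an explicit deny in any layer ends the evaluation before any allow is considered.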
KMS + encryption (common patterns)
| Need | Best-fit | Notes |
|---|---|---|
| Simple default encryption | SSE-S3 | S3-managed keys |
| Audit/control requirements | SSE-KMS | Watch key policy principals |
| Strict key custody | CloudHSM / client-side | More complexity |
Common failure mode: service can’t decrypt because the KMS key policy doesn’t allow the calling principal/service.
S3 security: “best answer” checklist
- Turn on S3 Block Public Access (account + bucket).
- Prefer private buckets + controlled access (roles, bucket policies, access points).
- Require TLS (aws:SecureTransport) and require encryption where needed.
Require TLS for all S3 requests (copy-ready):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DenyInsecureTransport",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:*",
          "Resource": [
            "arn:aws:s3:::my-bucket",
            "arn:aws:s3:::my-bucket/*"
          ],
          "Condition": { "Bool": { "aws:SecureTransport": "false" } }
        }
      ]
    }

Require access via a specific VPC endpoint (copy-ready):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DenyNotFromVPCE",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:*",
          "Resource": [
            "arn:aws:s3:::my-bucket",
            "arn:aws:s3:::my-bucket/*"
          ],
          "Condition": { "StringNotEquals": { "aws:sourceVpce": "vpce-1234567890abcdef0" } }
        }
      ]
    }

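
If the two policies above need to be stamped out for several buckets, a small generator avoids copy/paste mistakes in the ARNs. This is a sketch only: `bucket_policy` is a hypothetical helper, and the bucket name and endpoint ID are the same placeholders used in the JSON above. It only builds the document and does not call AWS.

```python
import json
from typing import Optional


def bucket_policy(bucket: str, vpce_id: Optional[str] = None) -> dict:
    """Build a deny-based bucket policy: TLS always, VPC endpoint restriction optionally."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    statements = [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": resources,
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ]
    if vpce_id:
        statements.append(
            {
                "Sid": "DenyNotFromVPCE",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


policy = bucket_policy("my-bucket", "vpce-1234567890abcdef0")
print(json.dumps(policy, indent=2))
```

Note that the endpoint restriction is a blanket deny: it also blocks console and administrative access that does not arrive through that endpoint, so test it on a non-critical bucket first.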
CloudFront note: for private S3 origins, use Origin Access Control (OAC) (don’t make the bucket public).
Security services (recognize the role)
| Service | What it’s “for” on SAA |
|---|---|
| WAF | L7 protections (IP blocks, rate limiting, OWASP-style rules) |
| Shield | DDoS protection (Standard by default) |
| GuardDuty | Threat detection using logs (findings) |
| Security Hub | Aggregate/standardize findings and posture |
| Inspector | Vulnerability findings for compute/container images |
| Macie | Find sensitive data in S3 |
3) Compute and scaling — ELB/ASG patterns
Load balancer choice (frequent)
| Need | Pick | Notes |
|---|---|---|
| HTTP/HTTPS, L7 routing | ALB | Path/host routing, WAF integration |
| TCP/UDP, very high throughput | NLB | Static IPs per AZ, low latency |
| Inline appliances | GWLB | Insert firewalls/IDS appliances |
Auto Scaling “best answers”
- ASG across multiple AZs.
- Health checks: enable ELB health checks on the ASG so instances that fail the load balancer check are replaced (by default the ASG relies on EC2 status checks only).
- Prefer stateless app tiers; store session state externally (ElastiCache/DynamoDB) when needed.
Compute chooser (EC2 vs containers vs serverless)
| Option | Best for | Why it wins |
|---|---|---|
| EC2 | Full OS control and flexibility | Broadest compatibility |
| ECS/Fargate | Containers with low ops | No server management with Fargate |
| EKS | Kubernetes standardization | Ecosystem + portability (more ops) |
| Lambda | Event-driven + bursty workloads | Minimal ops; scales fast |
Exam cue: If the scenario says “least operational overhead,” managed options (Fargate/Lambda/managed data stores) usually beat self-managed EC2.
4) Storage — object vs block vs file, and S3 class selection
Storage type picker (very common)
| Requirement | Best-fit | Why |
|---|---|---|
| Durable object storage | S3 | Highly durable, cheap, scalable |
| Boot / per-instance block volume | EBS | Low-latency block storage for EC2 |
| Shared POSIX file system | EFS | Multi-AZ shared file storage |
| Windows file shares / high-perf file | FSx | Managed Windows/Lustre/etc |
EBS volume types (quick selection)
| Type | Best for | Notes |
|---|---|---|
| gp3 | General purpose | Default choice; tune IOPS/throughput |
| io2 | High IOPS + critical workloads | Higher cost; consistent performance |
| st1 | Throughput-heavy HDD | Good for large sequential workloads |
| sc1 | Lowest-cost HDD | Cold throughput workloads |
| Instance store (not EBS) | Temporary scratch | Fast but ephemeral (data lost on stop/terminate) |
Exam cue: EBS is AZ-scoped. For multi-AZ shared storage, EFS/FSx is the usual answer.
S3 storage classes (fast selection)
| Requirement | Service/Class | Notes |
|---|---|---|
| Hot object storage | S3 Standard | Default |
| Unknown access patterns | S3 Intelligent‑Tiering | Monitoring fee |
| Infrequent access, multi-AZ durability | S3 Standard‑IA | Retrieval fee |
| Infrequent access, single AZ | S3 One Zone‑IA | Cheaper; less resilient |
| Archive | S3 Glacier Instant Retrieval / Flexible Retrieval / Deep Archive | Cheaper storage for slower retrieval: milliseconds, minutes to hours, or hours |
Exam cue: “Long-term retention / compliance” → Glacier + lifecycle + (sometimes) Object Lock.
S3: durability + resilience patterns
- Versioning + lifecycle rules (protect against deletes and ransomware scenarios).
- CRR/SRR for replication needs (cross-region or same-region).
- Object Lock (WORM) for immutability requirements.
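
A lifecycle rule is the usual way to turn the storage-class table into configuration. The sketch below uses a Python dict shaped like one rule of an S3 lifecycle configuration and a hypothetical helper, `storage_class_at`, that reports where an object would sit at a given age. The prefix and day counts are illustrative assumptions, and real lifecycle actions run asynchronously, so actual timing can lag.

```python
from typing import Dict

# One rule, shaped like an S3 lifecycle configuration rule (values are illustrative).
LOG_RULE: Dict = {
    "ID": "archive-logs",
    "Filter": {"Prefix": "logs/"},
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 365},
}


def storage_class_at(age_days: int, rule: Dict) -> str:
    """Return the storage class an object would be in at a given age, or "EXPIRED"."""
    if age_days >= rule["Expiration"]["Days"]:
        return "EXPIRED"
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (10, 45, 120, 400):
    print(age, storage_class_at(age, LOG_RULE))
```

Pair a rule like this with versioning or Object Lock when the requirement is retention rather than cost, since an expiration action deletes data.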
Backups (fast picks)
- EBS snapshots are incremental; copy snapshots cross-region for DR.
- AWS Backup helps centralize backup policies across common services.
- For databases, prefer managed backup features (RDS automated backups/snapshots).
5) Databases and caching — RDS/Aurora vs DynamoDB
RDS: Multi-AZ vs read replicas (classic SAA)
| Feature | Multi-AZ | Read replica |
|---|---|---|
| Primary purpose | HA/failover | Read scaling |
| Writes | One primary | Still one primary |
| Failover | Automatic | Manual promotion (generally) |
Rule: Multi-AZ is about availability; read replicas are about scale.
Aurora (why it’s often a “best answer”)
- Higher throughput than standard RDS engines (common exam framing).
- Up to 15 Aurora Replicas for read scaling; they share the cluster storage volume and double as failover targets.
- Aurora Global Database for low-latency global reads and faster cross-region DR.
When to choose what (fast)
| Need | Best-fit |
|---|---|
| Relational + joins + transactions | RDS/Aurora |
| Massive key-value scale | DynamoDB |
| Sub-millisecond cache | ElastiCache |
| DynamoDB read cache | DAX |
ElastiCache: Redis vs Memcached (SAA-level)
| Service | Best for | Notes |
|---|---|---|
| Redis | Rich features + durability options | Replication, multi-AZ patterns, data structures |
| Memcached | Simple cache | Multithreaded; no persistence or replication |
DynamoDB: what wins questions
- Prefer Query over Scan.
- Choose partition keys to avoid hot partitions.
- Use GSIs for new access patterns.
- Use On‑Demand for spiky traffic; Provisioned + Auto Scaling for steady predictable workloads.
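
Why Query beats Scan comes down to how many items are read before filtering. The following toy model in Python illustrates this; `ToyTable` is an in-memory stand-in invented for this example and is not the DynamoDB API, and the key format is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class ToyTable:
    """In-memory stand-in for a table keyed by partition key (not the DynamoDB API)."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[dict]] = defaultdict(list)

    def put(self, item: dict) -> None:
        self._partitions[item["pk"]].append(item)

    def query(self, pk: str) -> Tuple[List[dict], int]:
        """Read one partition only; returns (items, items_examined)."""
        items = list(self._partitions.get(pk, []))
        return items, len(items)

    def scan(self, predicate: Callable[[dict], bool]) -> Tuple[List[dict], int]:
        """Read every item, then filter; returns (matches, items_examined)."""
        examined = [i for part in self._partitions.values() for i in part]
        return [i for i in examined if predicate(i)], len(examined)


table = ToyTable()
for customer in range(100):
    for order in range(10):
        table.put({"pk": f"CUST#{customer}", "order": order})

q_items, q_examined = table.query("CUST#7")
s_items, s_examined = table.scan(lambda i: i["pk"] == "CUST#7")
print(len(q_items), q_examined)
print(len(s_items), s_examined)
```

Both calls return the same ten items, but the scan examines the whole table to find them. In DynamoDB that difference shows up as consumed read capacity, since a filter is applied after the items are read.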
6) Resilience and DR — RTO/RPO patterns
HA patterns (default architecture)
- Multi-AZ for app tiers (ALB + ASG across AZs).
- Databases: Multi-AZ where required (RDS/Aurora).
- Use queues/caching to absorb spikes and failures.
DR strategies (know the table)
| Strategy | Typical RTO | Typical RPO | Cost | Notes |
|---|---|---|---|---|
| Backup/Restore | High | Hours | Low | Cheapest; slowest recovery |
| Pilot Light | Medium | Minutes–hours | Med | Minimal core in DR |
| Warm Standby | Low | Minutes | Med+ | Scaled-down prod running |
| Multi-site active-active | Very low | Seconds | High | Complex; highest cost |
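
One way to practice with this table is to treat it as a lookup: pick the cheapest strategy whose recovery figures still meet the stated targets. The Python sketch below does that. The minute values are illustrative placeholders chosen for this example, not AWS-published numbers, and `cheapest_strategy` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (strategy, assumed worst-case RTO in minutes, assumed worst-case RPO in minutes),
# ordered cheapest first. The minute values are illustrative assumptions only.
STRATEGIES: List[Tuple[str, int, int]] = [
    ("Backup/Restore", 24 * 60, 24 * 60),
    ("Pilot Light", 4 * 60, 60),
    ("Warm Standby", 30, 10),
    ("Multi-site active-active", 1, 1),
]


def cheapest_strategy(rto_minutes: int, rpo_minutes: int) -> Optional[str]:
    """Return the first (cheapest) strategy that meets both targets, or None."""
    for name, rto, rpo in STRATEGIES:
        if rto <= rto_minutes and rpo <= rpo_minutes:
            return name
    return None


print(cheapest_strategy(48 * 60, 24 * 60))
print(cheapest_strategy(60, 15))
print(cheapest_strategy(5, 5))
```

The same habit works on exam questions: find the RTO/RPO constraint first, discard every option that misses it, then choose the cheapest of what remains.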
Multi-Region data options (high yield)
| Data layer | Multi-Region option |
|---|---|
| S3 | CRR |
| DynamoDB | Global tables |
| Aurora | Aurora Global Database |
DR routing: Route 53 policy selection
| Routing policy | Best for |
|---|---|
| Failover | Active-passive DR |
| Weighted | Canary / migrations |
| Latency | Lowest latency routing per user |
| Geolocation | Compliance/content by country |
Exam cue: If you need faster failover with static anycast IPs, consider Global Accelerator. If you need caching + origin failover for HTTP(S), consider CloudFront.
Active-passive vs active-active (quick framing)
| Pattern | Pros | Cons |
|---|---|---|
| Active-passive | Cheaper; simpler operations | Higher RTO; failover/failback steps |
| Active-active | Lowest downtime; global performance | Most complex; highest cost |
DR sketch (active-passive)
flowchart LR
Users --> R53[Route 53]
R53 -->|Primary| A[Region A]
R53 -->|Failover| B[Region B]
A --> AppA[App + DB]
B --> AppB[Warm standby]
7) Monitoring and auditing (CloudWatch, CloudTrail, Config, X-Ray)
| Service | Think “this answers…” |
|---|---|
| CloudWatch | “How is it performing?” (metrics/logs/alarms) |
| CloudTrail | “Who did what?” (API audit trail) |
| Config | “What changed?” (config history + compliance) |
| X-Ray | “Where is latency?” (distributed traces) |
Exam cue: if the requirement is auditing and investigations, CloudTrail is usually the anchor.
High-yield alarms to know
- ALB: HTTPCode_Target_5XX_Count, TargetResponseTime
- ASG/EC2: CPU (and memory via agent), status checks
- SQS: queue depth/age (backpressure signals)
- DynamoDB: throttles, consumed capacity (hot partitions/underprovisioned)
8) Cost optimization — fast wins (and common traps)
Compute purchase options (memorize the “when”)
| Pattern | Option | Notes |
|---|---|---|
| Steady baseline | Savings Plans | Flexible discount for compute |
| Predictable, fixed needs | Reserved Instances | Strong discount for known shape |
| Fault-tolerant | Spot | Biggest savings; design for interruption |
| Tool | Best for |
|---|---|
| Cost Explorer | Trend analysis and spend breakdown |
| Budgets | Alerts on spend/usage thresholds |
| Cost Anomaly Detection | Unexpected spend spikes |
| Compute Optimizer | Rightsizing recommendations |
| Trusted Advisor | Best-practice checks (cost/security/perf) |
| Cost and Usage Report (CUR) | Most detailed cost dataset export |
Cost levers by area (fast table)
| Area | Lever | Example |
|---|---|---|
| Compute | Right-size + Graviton | Smaller instance families, Graviton where supported |
| Compute | Spot for fault-tolerant | Batch/worker tier with Spot + scaling |
| Storage | S3 lifecycle tiering | Move logs to IA/Glacier |
| Storage | gp3 tuning | Tune IOPS/throughput instead of overprovisioning |
| Network | Endpoints over NAT | S3/DynamoDB via Gateway endpoints |
| Edge | Cache at CloudFront | Reduce origin load and egress |
Cost “gotchas”
- Single NAT for all private subnets: SPOF + cross-AZ charges.
- Heavy NAT usage to reach S3/DynamoDB: often replace with Gateway endpoints.
- Cross-AZ data transfer and inter-region replication costs add up fast.
- Logging retention and high-volume logs can become a surprise bill.
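
The NAT gotchas are easier to remember with rough arithmetic. The Python sketch below compares the traffic-dependent monthly cost of sending S3/DynamoDB traffic through a NAT gateway, an interface endpoint, or a gateway endpoint. Every price in it is a placeholder assumption to be replaced with current regional pricing, and the NAT hourly charge is left out on the assumption that the NAT gateway stays in place for other traffic. The `Prices` class and `monthly_costs` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

HOURS_PER_MONTH = 730  # common billing approximation


@dataclass
class Prices:
    nat_per_gb: float          # NAT gateway data processing, USD per GB
    interface_per_hour: float  # interface endpoint, USD per hour per AZ
    interface_per_gb: float    # interface endpoint data processing, USD per GB


def monthly_costs(gb_per_month: float, azs: int, prices: Prices) -> Dict[str, float]:
    """Monthly cost of carrying the same traffic over each path."""
    return {
        "nat_processing": gb_per_month * prices.nat_per_gb,
        "interface_endpoint": azs * HOURS_PER_MONTH * prices.interface_per_hour
        + gb_per_month * prices.interface_per_gb,
        "gateway_endpoint": 0.0,  # no hourly or per-GB charge
    }


# Placeholder prices for illustration only; check the pricing pages before relying on them.
example = monthly_costs(5000, azs=2, prices=Prices(0.045, 0.01, 0.01))
for path, cost in example.items():
    print(f"{path}: {cost:.2f} USD")
```

Whatever the exact prices, the gateway endpoint line stays at zero, which is why it is the expected answer for in-VPC access to S3 and DynamoDB.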
9) Common pitfalls (exam bait)
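- Reaching S3/DynamoDB from private subnets through NAT or paid interface endpoints when a free Gateway endpoint would do (interface endpoints for these services exist, but mainly matter for access from on-premises over VPN/Direct Connect).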
- ALB/ASG only in one AZ (not HA).
- Confusing RDS Multi-AZ (HA) with read replicas (scale).
- KMS key policy missing explicit principals (service can’t use key).
- Overly strict NACLs blocking return traffic: NACLs are stateless, so ephemeral ports must be allowed explicitly (SG-first is usually best).
- Choosing Scan over Query on DynamoDB.
10) Mini runbooks (copy/paste patterns)
Cross-account role assumption (trust policy):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "AWS": "arn:aws:iam::222233334444:role/TeamRole" },
          "Action": "sts:AssumeRole"
        }
      ]
    }

Route 53 failover (pseudo steps):
1) Create primary A/AAAA alias to ALB in Region A with Health Check.
2) Create secondary A/AAAA alias to ALB in Region B with Health Check.
3) Set routing policy to Failover: Primary / Secondary.
4) Verify health checks and simulate failover.
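
The failover steps above map onto a pair of record changes. As a hedged sketch, the Python below builds a change batch in the shape used by the Route 53 ChangeResourceRecordSets API, without calling AWS. The domain name, ALB DNS names, and hosted zone IDs are hypothetical placeholders, and the sketch assumes alias records that rely on `EvaluateTargetHealth` rather than separate health checks.

```python
from typing import Dict


def failover_record(name: str, role: str, alb_dns: str, alb_zone_id: str) -> Dict:
    """One alias A record of a failover pair; role is "PRIMARY" or "SECONDARY"."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": role.lower(),  # must be unique per record in the set
            "Failover": role,
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,  # the load balancer's zone, not your own
                "DNSName": alb_dns,
                "EvaluateTargetHealth": True,
            },
        },
    }


# All names and IDs below are placeholders for illustration.
change_batch = {
    "Comment": "Active-passive failover between two Regions",
    "Changes": [
        failover_record("app.example.com.", "PRIMARY", "alb-a.example.net.", "ZONEIDREGIONA"),
        failover_record("app.example.com.", "SECONDARY", "alb-b.example.net.", "ZONEIDREGIONB"),
    ],
}

for change in change_batch["Changes"]:
    rrs = change["ResourceRecordSet"]
    print(rrs["Failover"], "->", rrs["AliasTarget"]["DNSName"])
```

Building the batch as data first makes it easy to review both records side by side before anything is applied, which helps with step 4.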
Final tip
If multiple answers work, pick the one that best matches the explicit constraint (for example: lowest cost or least operational effort) while still meeting availability and security requirements.