SOA-C03 — AWS Certified CloudOps Engineer – Associate Quick Reference
Compact AWS SOA-C03 quick reference for service selection, monitoring, automation, security, networking, reliability, and troubleshooting.
Exam-use orientation
This independent Quick Reference supports preparation for the AWS Certified CloudOps Engineer – Associate (SOA-C03) exam from AWS. Use it as a scenario decision guide: the exam often tests which AWS service, operational control, or troubleshooting step best fits a production operations problem.
CloudOps thinking pattern
| Question asks about… | First decide… | Then choose based on… |
|---|---|---|
| Monitoring | Metric, log, trace, event, or audit record? | CloudWatch, X-Ray, EventBridge, CloudTrail, AWS Config |
| Automation | One-time command, recurring desired state, patching, or workflow? | AWS Systems Manager capability or AWS CloudFormation |
| Change management | Infrastructure template, app deployment, or instance replacement? | CloudFormation, CodeDeploy, Auto Scaling instance refresh |
| Reliability | HA in one Region or DR across Regions? | Multi-AZ, backups, replication, Route 53 failover |
| Security | Identity, encryption, detection, or compliance evidence? | IAM, KMS, CloudTrail, Config, GuardDuty, Security Hub |
| Networking | Routing, DNS, firewalling, private access, or edge delivery? | VPC route tables, Route 53, security groups/NACLs, VPC endpoints, CloudFront |
| Cost/performance | Rightsizing, purchasing, data transfer, or storage tiering? | Compute Optimizer, Cost Explorer, Budgets, lifecycle policies |
Exam habit: eliminate answers that rely on manual operation, lack high availability, violate least privilege, or do not produce auditable operational evidence.
Core AWS operations service-selection matrix
| Operational need | Prefer | Why | Common trap |
|---|---|---|---|
| Audit who called AWS APIs | AWS CloudTrail | Records management events and optional data events | CloudWatch Logs show app/system logs, not complete API audit history |
| Detect resource configuration drift/compliance | AWS Config | Tracks resource configuration history and evaluates rules | CloudTrail tells who changed something, not whether current state is compliant |
| Alarm on metric threshold | Amazon CloudWatch alarm | Native metric evaluation and actions | EventBridge is for event patterns, not continuous metric evaluation |
| Route AWS service events to targets | Amazon EventBridge | Event bus, rules, schedules, SaaS/custom events | CloudWatch alarm actions are limited to alarm state transitions |
| Centralize application logs | CloudWatch Logs | Log groups, retention, metric filters, Logs Insights | CloudTrail is not an application log platform |
| Run commands on managed instances | Systems Manager Run Command | Remote command execution without inbound SSH/RDP | Requires SSM Agent, IAM role, and network path to Systems Manager endpoints |
| Enforce recurring instance configuration | Systems Manager State Manager | Maintains desired state associations | Run Command is better for ad hoc execution |
| Patch EC2 or hybrid nodes | Systems Manager Patch Manager | Baselines, maintenance windows, patch compliance | User data is not patch management |
| Secure shell access without opening ports | Systems Manager Session Manager | Auditable sessions through SSM | Still requires IAM permissions and managed instance connectivity |
| Automate operational runbook | Systems Manager Automation | Step-based remediation workflows | Lambda is useful for code, but Automation has runbook-native actions |
| Provision infrastructure as code | AWS CloudFormation | Declarative stacks, change sets, drift detection | CLI-created resources are harder to audit and reproduce |
| Deploy application revisions | AWS CodeDeploy | In-place/blue-green deployment strategies | CloudFormation manages infrastructure; CodeDeploy manages app rollout |
| Replace Auto Scaling instances safely | EC2 Auto Scaling instance refresh | Gradual replacement using launch template/config changes | Updating the launch template alone does not replace existing instances |
| Central backup policy | AWS Backup | Cross-service backup plans and vaults | Snapshots alone do not provide centralized policy/compliance views |
| Private access to AWS services | VPC endpoints | Avoid public internet paths for supported services | NAT gateway provides outbound internet, not private service access |
| Edge caching and TLS termination near users | Amazon CloudFront | Global CDN, caching, origin protection options | Route 53 does DNS routing; it does not cache content |
| Detect suspicious account or workload activity | Amazon GuardDuty | Threat detection from logs and signals | Security Hub aggregates findings; it is not the primary detector |
| Aggregate security posture | AWS Security Hub | Consolidates findings and standards checks | Config rules are resource compliance checks, not a findings hub |
| Discover sensitive data in S3 | Amazon Macie | S3 data discovery and classification | S3 Inventory lists objects; it does not classify sensitive content |
| Analyze IAM external access | IAM Access Analyzer | Identifies resource policies allowing external access | IAM credential report is about users/passwords/keys |
| Store database credentials with rotation | AWS Secrets Manager | Managed secret lifecycle and rotation integration | Parameter Store SecureString can hold secrets, but it has no built-in managed rotation |
| Store config parameters | Systems Manager Parameter Store | Hierarchical config values, optional encryption | Do not hard-code config in AMIs or user data |
| Manage encryption keys | AWS KMS | Key policies, grants, envelope encryption integration | IAM permission alone may not be enough; key policy must allow use |
| View AWS account health events | AWS Health | Service and account-specific operational events | Route 53 health checks and CloudWatch alarms watch your own endpoints and metrics, not AWS account advisories |
| Govern accounts at scale | AWS Organizations | Consolidated management, SCP guardrails | SCPs limit permissions; they do not grant permissions |
Monitoring, logging, and event response
Observability services: high-yield distinctions
| Service/feature | Best for | Exam cues | Not best for |
|---|---|---|---|
| CloudWatch metrics | Numeric time-series performance and health | CPU, latency, errors, queue depth, custom app metric | Full audit trail or config history |
| CloudWatch alarms | Threshold, anomaly, or metric math alarm actions | Notify, scale, recover, stop, route incident | Complex event enrichment |
| CloudWatch Logs | Central log storage and search | Application logs, OS logs, Lambda logs, VPC flow logs destination | Long-term object archive unless exported/archived |
| CloudWatch Logs Insights | Ad hoc log query and troubleshooting | Filter errors, aggregate by field, recent incident analysis | Permanent business analytics warehouse |
| CloudWatch metric filters | Turn log patterns into metrics | Count “ERROR” strings, unauthorized attempts | Free-form historical log analytics |
| CloudWatch Agent | OS/process/custom metrics and logs from EC2/on-prem | Memory, disk, swap, app logs | Native AWS service metrics that already exist |
| EventBridge | Match events and route to targets | EC2 state change, scheduled automation, SaaS/custom bus | Continuous metric thresholding |
| CloudTrail | API activity audit | Who changed security group? Who deleted object? | Instance CPU/memory monitoring |
| AWS Config | Resource inventory, config history, compliance | Is S3 public access blocked? Has SG changed? | User login/session troubleshooting |
| X-Ray | Distributed tracing | Service map, trace latency, segment errors | Infrastructure patch compliance |
CloudWatch alarm decision points
| Decision | Choose this when… | Notes |
|---|---|---|
| Standard metric alarm | One metric or metric math expression is enough | Most common alarm scenario |
| Composite alarm | Need to reduce noise by combining alarm states | Use when multiple symptoms must be true before paging |
| Anomaly detection | Normal baseline varies over time | Good for cyclical traffic patterns |
| Treat missing data as breaching | Missing metric is itself a failure | Useful for heartbeat/custom metrics |
| Treat missing data as not breaching | Silence can be normal | Avoid false alarms for sparse metrics |
| Metric math | Need derived signal | Example: error percentage from errors and requests |
| Detailed monitoring/custom metrics | Need more granular or non-default data | Memory and disk require agent/custom metrics on EC2 |
| Alarm action to Auto Scaling | Need scaling response | Scaling policy should align with application behavior |
| Alarm action to SNS/EventBridge/Incident Manager | Need notification or workflow | Use structured incident routing for operations teams |
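The metric math, missing-data and "M out of N" choices above are easy to confuse, so here is a simplified local model of them. It is not CloudWatch's actual evaluation engine.

- `error_rate` mimics a metric math expression such as `100 * errors / requests`.
- `alarm_state` counts breaching datapoints in the most recent evaluation periods under a missing-data policy.
- Both function names are hypothetical, and the `ignore` policy is deliberately not modeled.

```python
from typing import List, Optional, Sequence

Datapoint = Optional[float]  # None represents a missing datapoint


def error_rate(errors: Sequence[Datapoint], requests: Sequence[Datapoint]) -> List[Datapoint]:
    """Metric-math style derived signal: 100 * errors / requests for each period."""
    rates: List[Datapoint] = []
    for e, r in zip(errors, requests):
        if e is None or r is None or r == 0:
            rates.append(None)  # an undefined ratio is treated as missing data
        else:
            rates.append(100.0 * e / r)
    return rates


def alarm_state(values: Sequence[Datapoint], threshold: float, evaluation_periods: int,
                datapoints_to_alarm: int, treat_missing: str = "missing") -> str:
    """Simplified 'M out of N' evaluation with a missing-data policy.

    treat_missing: 'breaching', 'notBreaching', or 'missing' ('ignore' is not modeled).
    """
    if treat_missing not in ("breaching", "notBreaching", "missing"):
        raise ValueError(f"unsupported missing-data policy: {treat_missing}")
    window = list(values)[-evaluation_periods:]
    breaching = 0
    for value in window:
        if value is None:
            if treat_missing == "breaching":
                breaching += 1  # silence itself counts as a failure (heartbeat style)
        elif value > threshold:
            breaching += 1
    if breaching >= datapoints_to_alarm:
        return "ALARM"
    if treat_missing == "missing" and all(v is None for v in window):
        return "INSUFFICIENT_DATA"
    return "OK"
```

For example, `alarm_state(error_rate([1, 6, 8, 7], [100] * 4), 5, 3, 2)` returns `"ALARM"`. The last three error rates (6, 8 and 7 percent) all exceed the 5 percent threshold, which is more than the 2 breaching datapoints required.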
CloudWatch Logs Insights patterns
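```
# Most recent error-like messages
fields @timestamp, @message
| filter @message like /ERROR|Exception|Timeout/
| sort @timestamp desc
| limit 50
```

```
# 5xx count and average latency per 5-minute bin (assumes JSON fields status and latency)
fields @timestamp, @logStream, status, latency
| filter status >= 500
| stats count(*) as errors, avg(latency) as avgLatency by bin(5m)
| sort bin(5m) desc
```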
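To see what the second query computes, here is a local Python equivalent that runs over JSON log lines. It models only the filter, `bin(5m)` and `stats` steps.

The sketch assumes each line is a JSON object with hypothetical `timestamp` (epoch seconds), `status` and `latency` fields. Logs Insights itself discovers JSON fields automatically and bins on `@timestamp`.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def errors_per_bin(lines: Iterable[str], bin_seconds: int = 300) -> List[Dict]:
    """Mimic: filter status >= 500 | stats count(*), avg(latency) by bin(5m) | sort desc."""
    buckets = defaultdict(lambda: [0, 0.0])  # bin start -> [error count, latency sum]
    for line in lines:
        record = json.loads(line)
        if record.get("status", 0) >= 500:
            start = int(record["timestamp"]) // bin_seconds * bin_seconds
            buckets[start][0] += 1
            buckets[start][1] += float(record["latency"])
    rows = [
        {"bin": start, "errors": count, "avgLatency": total / count}
        for start, (count, total) in buckets.items()
    ]
    return sorted(rows, key=lambda row: row["bin"], reverse=True)
```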
Event-driven remediation pattern
| Event source | Match with | Target examples | Use case |
|---|---|---|---|
| EC2 instance state change | EventBridge rule | Lambda, Systems Manager Automation, SNS | React to stopped/terminated instances |
| AWS Health event | EventBridge rule | SNS, Incident Manager, ticket workflow | Notify on account-specific AWS events |
| CloudTrail API event | EventBridge rule | Lambda, Step Functions, SNS | Detect high-risk API calls quickly |
| Scheduled event | EventBridge schedule/rule | Systems Manager Automation, Lambda | Run maintenance tasks |
| Config compliance change | Config rule/EventBridge | Automation, SNS | Remediate noncompliant resources |
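The EventBridge rules in this table match events by pattern. The sketch below models only exact-value matching:

- Each pattern field lists its allowed values, with OR within a field and AND across fields.
- Nested objects are matched recursively.
- If the event value is an array, it matches when any of its elements is allowed.

Content-filter operators such as `prefix`, `numeric` or `anything-but` are not modeled. The sample event follows the shape of the EC2 instance state-change notification, and the instance ID is a placeholder.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified EventBridge matching: exact values only, recursing into nested objects."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # every field named in the pattern must be present
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in values):
                return False
    return True
```

A rule with this pattern would trigger a Lambda function or an Automation runbook only for `stopped` or `terminated` transitions. A `running` transition does not match.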
Automation, provisioning, and change management
CloudFormation operations reference
| Need | CloudFormation feature | Exam note |
|---|---|---|
| Preview stack changes | Change set | Safer than direct update for production |
| Detect manual changes | Drift detection | Identifies resources that no longer match template where supported |
| Protect critical resource from replacement/deletion | DeletionPolicy, UpdateReplacePolicy, stack policy | Use Retain or Snapshot where appropriate |
| Reuse common templates | Nested stacks/modules | Good for standardized patterns |
| Deploy to multiple accounts/Regions | StackSets | Fits organization-scale rollout |
| Pass values between stacks | Outputs and exports | Avoid hard-coded IDs |
| Create resources conditionally | Conditions | Useful for environment-specific resources |
| Bootstrap EC2 on create | User data, cfn-init, cfn-signal | Signals help CloudFormation wait for successful configuration |
| Roll back failed update | Automatic rollback or continue update rollback | Know how to recover stacks stuck during failed updates |
| Manage IAM resources | Capabilities acknowledgment | Templates that create IAM resources require CAPABILITY_IAM, or CAPABILITY_NAMED_IAM when those resources have custom names |
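```bash
# Check template syntax before deploying
aws cloudformation validate-template \
  --template-body file://template.yaml

# Create/update the stack through a change set; acknowledge named IAM resources
aws cloudformation deploy \
  --template-file template.yaml \
  --stack-name app-prod \
  --capabilities CAPABILITY_NAMED_IAM

# Start drift detection (returns a StackDriftDetectionId to poll for results)
aws cloudformation detect-stack-drift \
  --stack-name app-prod
```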
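Drift detection with `detect-stack-drift` compares each resource's actual configuration against the properties declared in the template. The sketch below mimics that comparison on two plain dictionaries.

- The resource-level statuses `IN_SYNC`, `MODIFIED` and `DELETED` match CloudFormation's names.
- The difference labels `ADD`, `REMOVE` and `NOT_EQUAL` are borrowed loosely. Applying them to dictionary keys is a simplification of how CloudFormation reports differences.
- The bucket properties in the example are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def property_differences(expected: Dict[str, Any], actual: Dict[str, Any],
                         prefix: str = "") -> List[Tuple[str, str]]:
    """Compare template-declared properties with observed properties, recursively."""
    diffs: List[Tuple[str, str]] = []
    for key in sorted(set(expected) | set(actual)):
        path = f"{prefix}/{key}"
        if key not in actual:
            diffs.append((path, "REMOVE"))
        elif key not in expected:
            diffs.append((path, "ADD"))
        elif isinstance(expected[key], dict) and isinstance(actual[key], dict):
            diffs.extend(property_differences(expected[key], actual[key], path))
        elif expected[key] != actual[key]:
            diffs.append((path, "NOT_EQUAL"))
    return diffs


def drift_status(expected: Dict[str, Any], actual: Optional[Dict[str, Any]]) -> str:
    """DELETED if the resource is gone, MODIFIED if any property differs, else IN_SYNC."""
    if actual is None:
        return "DELETED"
    return "MODIFIED" if property_differences(expected, actual) else "IN_SYNC"
```

In the exam framing, a `MODIFIED` result caused by a console change is remediated by updating the template (or reverting the manual change), not by hand-editing the resource again.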
Systems Manager capability map
| Capability | Best for | Requires/depends on |
|---|---|---|
| Fleet Manager | Inventory and manage nodes | Managed instances |
| Session Manager | Shell access without inbound ports | SSM Agent, IAM, endpoint/internet connectivity |
| Run Command | Execute commands at scale | Managed instance role and target selection |
| State Manager | Keep configuration in desired state | Associations and documents |
| Patch Manager | Patch baselines and compliance | Maintenance windows optional but common |
| Automation | Multi-step runbooks | IAM service role/permissions |
| Distributor | Install software packages | Package definitions |
| Parameter Store | App configuration and secure strings | KMS for encrypted secure strings |
| Inventory | Collect software/config metadata | SSM Agent and association |
| Maintenance Windows | Scheduled operational tasks | Registered targets and tasks |
| OpsCenter | Track operational issues | Integrates with alarms/events |
| Change Manager | Controlled change workflows | Approval and change templates |
```bash
# Run shell commands on every managed node tagged Role=web
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:Role,Values=web" \
  --parameters commands='["uptime","df -h"]'
```
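The `--targets` filter in `send-command` selects managed nodes by tag. This sketch models that selection against a hypothetical local inventory.

- The dictionary layout is an assumption.
- `PingStatus` values such as `Online` and `ConnectionLost` mirror what Systems Manager reports.
- Keeping only `Online` nodes reflects that a tagged node without a working SSM Agent connection cannot actually run the command.

```python
from typing import Dict, List, Set, Tuple


def parse_target(target: str) -> Tuple[str, Set[str]]:
    """Parse 'Key=tag:Role,Values=web,api' into ('Role', {'web', 'api'})."""
    key_part, sep, values_part = target.partition(",Values=")
    if not sep or not key_part.startswith("Key=tag:"):
        raise ValueError("this sketch only handles 'Key=tag:<name>,Values=<v1,v2>' targets")
    return key_part[len("Key=tag:"):], set(values_part.split(","))


def select_instances(inventory: List[Dict], target: str) -> List[str]:
    """Return IDs of online managed nodes whose tag matches one of the target values."""
    tag_key, allowed = parse_target(target)
    return [
        node["InstanceId"]
        for node in inventory
        if node.get("PingStatus") == "Online" and node.get("Tags", {}).get(tag_key) in allowed
    ]
```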
Deployment choices
| Scenario | Prefer | Why |
|---|---|---|
| Deploy new Lambda version gradually | CodeDeploy with Lambda deployment config | Supports traffic shifting and rollback |
| Deploy app to EC2 fleet | CodeDeploy | Lifecycle hooks and deployment groups |
| Replace EC2 instances using new launch template | Auto Scaling instance refresh | Operationally simple fleet replacement |
| Manage immutable infrastructure | CloudFormation + AMI/launch template + Auto Scaling | Reproducible state |
| Blue/green container deployment | ECS deployment controller or CodeDeploy, depending on setup | Safer traffic shifting |
| Manual emergency config change | Systems Manager Automation/Run Command | Auditable and repeatable |
| Infrastructure resource update | CloudFormation change set | Avoid console drift |
| Complex orchestration across services | Step Functions or Systems Manager Automation | Choose based on app workflow vs ops runbook |
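CodeDeploy's predefined Lambda configurations differ in how traffic moves to the new version. Two examples are `CodeDeployDefault.LambdaCanary10Percent5Minutes` and `CodeDeployDefault.LambdaLinear10PercentEvery1Minute`.

The sketch below computes each resulting schedule as (minute, percent on new version) checkpoints. It is a model for reasoning about rollout timing, not a CodeDeploy API, and it ignores alarms and automatic rollback.

```python
from typing import List, Tuple


def traffic_schedule(strategy: str, percent: int = 100,
                     interval_minutes: int = 0) -> List[Tuple[int, int]]:
    """Return (minute, percent of traffic on the new version) checkpoints."""
    if strategy == "all_at_once":
        return [(0, 100)]
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if strategy == "canary":
        # One initial slice, then all remaining traffic after a single wait interval.
        return [(0, percent), (interval_minutes, 100)]
    if strategy == "linear":
        # Equal increments every interval until all traffic has shifted.
        steps, minute, shifted = [], 0, 0
        while shifted < 100:
            shifted = min(100, shifted + percent)
            steps.append((minute, shifted))
            minute += interval_minutes
        return steps
    raise ValueError(f"unknown strategy: {strategy}")
```

Read the canary result `[(0, 10), (5, 100)]` as follows:

- 10 percent of invocations reach the new version immediately.
- The remaining 90 percent shift after 5 minutes, unless an alarm triggers a rollback first.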
Compute, scaling, and load balancing
EC2 operational troubleshooting
| Symptom | Check first | Likely direction |
|---|---|---|
| Instance unreachable | Security group, NACL, route table, public/private IP, SSM status | Separate network path problem from OS problem |
| System status check failed | AWS host/network issue | Stop/start, recover, or allow AWS remediation, depending on the scenario |
| Instance status check failed | Guest OS/app issue | Check boot logs, CPU, disk, networking config |
| User data did not work | Cloud-init logs, script syntax, IAM role, network access | User data normally runs at first boot unless configured otherwise |
| Cannot access S3 from private subnet | Route/NAT or S3 VPC endpoint policy | Prefer gateway endpoint for private S3 access where appropriate |
| App lost AWS permissions | Instance profile, role policy, SCP/permission boundary, STS credentials | Temporary role credentials are delivered through the instance metadata service (IMDS) |
| Memory/disk alarm missing | CloudWatch Agent/custom metrics | Default EC2 metrics do not include all OS-level metrics |
| Replacement instance not configured | AMI, launch template, user data, SSM State Manager | Avoid snowflake instances |
Auto Scaling decisions
| Need | Feature | Notes |
|---|---|---|
| Maintain fixed capacity | Desired/min/max capacity | Health checks replace failed instances |
| Scale around target metric | Target tracking policy | Common for CPU, request count, custom utilization metric |
| Scale by thresholds/steps | Step scaling | Useful when response should vary by severity |
| Scale on schedule | Scheduled scaling | Good for predictable business hours |
| Prepare for future demand | Predictive scaling | Use when historical patterns are reliable |
| Let instances finish work before termination | Lifecycle hooks | Pair with Lambda/SNS/SQS/Systems Manager |
| Use load balancer health | ELB health checks in Auto Scaling | Replaces instances failing app-level checks |
| Safely roll new launch template | Instance refresh | Combine with health checks and warmup |
| Keep scale-in from killing special node | Instance protection | Useful for stateful/critical instances, but avoid permanent snowflakes |
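Target tracking behaves roughly proportionally. Suppose four instances average 80 percent CPU against a 50 percent target: the group needs about 4 * 80 / 50 = 6.4, so 7 instances.

The sketch below captures only that intuition plus min/max clamping. The real policy also applies instance warmup, scales in more gradually than it scales out, and acts through CloudWatch alarms that it manages for you. The function name is hypothetical.

```python
import math


def target_tracking_capacity(current: int, metric_value: float, target_value: float,
                             min_size: int, max_size: int) -> int:
    """Proportional estimate of the capacity that brings the per-instance metric near target."""
    if current <= 0 or target_value <= 0:
        raise ValueError("current capacity and target value must be positive")
    proposed = math.ceil(current * metric_value / target_value)
    # The group never goes outside its configured min/max bounds.
    return max(min_size, min(max_size, proposed))
```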
Load balancer selection
| Load balancer | Choose for | Key features | Avoid when… |
|---|---|---|---|
| Application Load Balancer | HTTP/HTTPS apps | Host/path routing, redirects, header rules, WebSocket, target groups | Need static IP at L4 |
| Network Load Balancer | TCP/UDP/TLS, high performance, static IP needs | Low latency, source IP preservation patterns, TLS passthrough/termination | Need advanced HTTP routing |
| Gateway Load Balancer | Third-party virtual appliances | Transparent inspection with appliances | Normal web app load balancing |
| Classic Load Balancer | Legacy workloads | Older EC2-era option | New architectures should usually choose ALB/NLB |
ALB/NLB troubleshooting
| Problem | Check |
|---|---|
| Targets unhealthy | Target security group, health check path/port/protocol, app listener, NACL, target response code |
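| 502/503/504 errors | 502: target closed the connection or returned a malformed response; 503: no registered targets; 504: target did not respond in time. Also check listener rules, target group health, and backend timeouts |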
| Client IP handling | ALB uses headers; NLB can preserve source IP in supported patterns |
| TLS issue | Certificate in ACM/IAM, listener protocol, SNI, security policy |
| Sticky sessions required | ALB target group stickiness or app-level session design |
| Connections dropped during scale-in or deregistration | Deregistration delay (connection draining) and app graceful shutdown |
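For the "Targets unhealthy" row, two details matter:

- A target changes state only after consecutive results cross a threshold.
- A redirect fails a health check whose success code is `200`. A common case is a 301 from an app-level HTTP-to-HTTPS rule.

This sketch models that logic. The defaults (5 healthy checks, 2 unhealthy checks, matcher `200`) are meant to mirror common ALB settings, but confirm them against your own target group.

```python
from typing import Iterable, Optional


def code_matches(matcher: str, code: int) -> bool:
    """Match an HTTP code against a matcher such as '200', '200-299', or '200,302'."""
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low <= code <= high:
                return True
        elif int(part) == code:
            return True
    return False


def target_health(results: Iterable[Optional[int]], matcher: str = "200",
                  healthy_threshold: int = 5, unhealthy_threshold: int = 2,
                  state: str = "initial") -> str:
    """Walk consecutive health check results; None means a timeout or refused connection."""
    successes = failures = 0
    for code in results:
        if code is not None and code_matches(matcher, code):
            successes, failures = successes + 1, 0
            if successes >= healthy_threshold:
                state = "healthy"
        else:
            successes, failures = 0, failures + 1
            if failures >= unhealthy_threshold:
                state = "unhealthy"
    return state
```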
Storage and database operations
Amazon S3 operations
| Need | Feature | Exam note |
|---|---|---|
| Block public exposure | S3 Block Public Access + bucket policy review | Account-level and bucket-level controls matter |
| Audit object-level API access | CloudTrail data events | Management events alone do not show every object operation |
| Monitor bucket compliance | AWS Config rules | Good for encryption, public access, versioning checks |
| Recover deleted/overwritten objects | Versioning | Lifecycle can manage old versions |
| Replicate objects | Same-Region or Cross-Region Replication | Versioning is required for replication |
| Enforce encryption | Default encryption and bucket policy | KMS permissions must allow use when SSE-KMS is selected |
| Archive or tier objects | Lifecycle policies | Align transitions with access pattern |
| Prevent deletion/tampering | S3 Object Lock where configured | Understand governance/compliance retention behavior at concept level |
| Query object metadata/inventory | S3 Inventory/Athena | Useful for large-scale reporting |
| Protect origin content | CloudFront origin access control (OAC); origin access identity (OAI) is the legacy pattern | Avoid public bucket origins when private delivery is required |
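To sanity-check a lifecycle rule against an access pattern, it can help to compute which storage class an object of a given age should be in. This is a hypothetical helper, not an S3 API.

- Storage class names follow the S3 API values (`STANDARD_IA`, `GLACIER`).
- Real lifecycle actions run asynchronously, so objects move some time after they become eligible.

```python
from typing import Iterable, Optional, Tuple


def storage_class_for_age(age_days: int, transitions: Iterable[Tuple[int, str]],
                          expiration_days: Optional[int] = None,
                          initial_class: str = "STANDARD") -> str:
    """Return the storage class a lifecycle rule would target for an object of this age."""
    if expiration_days is not None and age_days >= expiration_days:
        return "EXPIRED"
    current = initial_class
    for days, storage_class in sorted(transitions):
        if age_days >= days:
            current = storage_class  # the latest transition whose age threshold has passed
    return current
```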
Block, file, and shared storage
| Service | Choose when… | Operations focus |
|---|---|---|
| EBS | Block storage for one EC2 instance or supported clustered use case | Snapshots, encryption, volume type/performance, attachment, resizing |
| EFS | Shared Linux NFS file system | Mount targets, security groups, access points, lifecycle policies |
| FSx for Windows File Server | Managed Windows SMB file shares | AD integration, backups, Windows workloads |
| FSx for Lustre | High-performance file system for compute workloads | S3 integration patterns, throughput-heavy jobs |
| Instance store | Temporary high-performance local storage | Data is ephemeral; do not use for durable state |
| S3 | Object storage | Event notifications, lifecycle, replication, access policies |
Database operations
| Need | RDS/Aurora feature | Key distinction |
|---|---|---|
| High availability in a Region | Multi-AZ deployment | HA/failover, not read scaling by itself |
| Read scaling | Read replicas/Aurora replicas | Can also support some DR patterns |
| Point-in-time restore | Automated backups | Restore creates a new DB resource |
| Manual long-term recovery point | DB snapshot | Operationally controlled backup point |
| Reduce connection storms | RDS Proxy | Especially useful with spiky/serverless app connections |
| Diagnose DB load | Performance Insights, Enhanced Monitoring, CloudWatch | Choose based on query/database vs OS-level view |
| Change engine settings | Parameter group | Static parameters require a reboot; dynamic parameters can apply immediately |
| Upgrade safely | Snapshot, test, maintenance window, blue/green where available | Avoid untested production upgrades |
| Encrypt database | KMS-backed encryption at creation/restore as supported | Plan key permissions and snapshot sharing behavior |
| Need | DynamoDB feature | Key distinction |
|---|---|---|
| Automatic capacity adjustment | Auto scaling or on-demand capacity mode | Choose based on predictability |
| Recover table to prior time | Point-in-time recovery | Operational recovery, not analytics |
| Global low-latency writes/reads | Global tables | Multi-Region active-active pattern |
| React to item changes | DynamoDB Streams | Feed Lambda/consumers |
| Expire old items | TTL | Deletion is asynchronous |
| Protect accidental deletion | Backups, PITR, IAM controls | CloudFormation deletion policy may also matter |
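TTL deletion is asynchronous, so expired items can still appear in reads and scans until DynamoDB removes them. Applications therefore often filter them out themselves.

A minimal sketch, assuming a hypothetical TTL attribute named `expiresAt` that holds Unix epoch seconds. boto3 returns DynamoDB numbers as `Decimal`, and the check accepts that type.

```python
import time
from decimal import Decimal
from typing import Any, Dict, List, Optional


def is_ttl_expired(item: Dict[str, Any], ttl_attribute: str = "expiresAt",
                   now: Optional[float] = None) -> bool:
    """True if the item's TTL (epoch seconds) is in the past; non-numeric TTLs are ignored."""
    now = time.time() if now is None else now
    value = item.get(ttl_attribute)
    if isinstance(value, bool) or not isinstance(value, (int, float, Decimal)):
        return False  # TTL only acts on Number attributes holding epoch seconds
    return value < now


def live_items(items: List[Dict[str, Any]], ttl_attribute: str = "expiresAt",
               now: Optional[float] = None) -> List[Dict[str, Any]]:
    """Drop items that are already expired but may not have been deleted yet."""
    return [item for item in items if not is_ttl_expired(item, ttl_attribute, now)]
```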
Networking and content delivery
VPC connectivity decision table
| Need | Choose | High-yield notes |
|---|---|---|
| Public IPv4 internet access for instance | Public subnet route to internet gateway + public IP | Security group/NACL must allow traffic |
| Private subnet outbound IPv4 internet | NAT gateway or NAT instance | NAT does not allow unsolicited inbound from internet |
| Outbound-only IPv6 internet | Egress-only internet gateway | IPv6 addresses are globally unique, so an egress-only gateway (not NAT) blocks unsolicited inbound traffic |
| Private access to S3/DynamoDB | Gateway VPC endpoint | Route table association and endpoint policy matter |
| Private access to many AWS services | Interface VPC endpoint | ENI-based, security groups, private DNS option |
| Connect VPCs at scale | Transit Gateway | Hub-and-spoke routing; route tables still matter |
| Simple direct VPC-to-VPC connectivity | VPC peering | Non-transitive; CIDR overlap is a blocker |
| Hybrid encrypted connection | Site-to-Site VPN | Faster to establish than physical private connectivity |
| Dedicated private network | AWS Direct Connect | Often paired with VPN for encryption/backup design |
| DNS routing and failover | Route 53 | Health checks and routing policies are central |
| Global static entry and acceleration | AWS Global Accelerator | Routes to healthy regional endpoints over AWS network |
| Cache static/dynamic content at edge | CloudFront | Cache behavior, origin, TTL, invalidation, TLS |
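Two connectivity checks are easy to reproduce with Python's standard `ipaddress` module:

- Route tables pick the most specific matching route (longest prefix match).
- VPC peering fails when the two VPC CIDRs overlap.

The route targets below are placeholder IDs. The model ignores managed prefix lists and propagated routes.

```python
import ipaddress
from typing import List, Optional, Tuple


def route_target(route_table: List[Tuple[str, str]], destination: str) -> Optional[str]:
    """Longest prefix match: the most specific route containing the destination wins."""
    dest = ipaddress.ip_address(destination)
    best: Optional[Tuple[int, str]] = None
    for cidr, target in route_table:
        network = ipaddress.ip_network(cidr)
        if dest in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, target)
    return best[1] if best else None  # None means no route, so traffic is dropped


def cidrs_overlap(a: str, b: str) -> bool:
    """Overlapping CIDRs block VPC peering; check before requesting a connection."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))
```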
Security group vs NACL
| Control | Security group | Network ACL |
|---|---|---|
| Scope | Elastic network interface/resource | Subnet |
| State | Stateful | Stateless |
| Rules | Allow rules only | Allow and deny rules |
| Evaluation | All rules evaluated together; any matching allow permits traffic | Rules evaluated in ascending number order; first match wins |
| Common use | Instance/app firewall | Subnet guardrail or explicit deny |
| Exam trap | Return traffic automatically allowed | Return traffic must be explicitly allowed |
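The "Exam trap" row matters most when troubleshooting. A reply to a client's ephemeral port needs no extra security group rule, but it does need an outbound NACL rule.

This sketch models both controls with hypothetical tuple-based rules so the difference can be checked case by case. Protocol is omitted for brevity.

```python
import ipaddress
from typing import List, Tuple

PortRange = Tuple[int, int]


def _match(cidr: str, ports: PortRange, ip: str, port: int) -> bool:
    low, high = ports
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr) and low <= port <= high


def nacl_allows(rules: List[Tuple[int, str, PortRange, str]], ip: str, port: int) -> bool:
    """NACL: rules (number, cidr, ports, 'allow'|'deny') checked in number order; first match wins."""
    for _number, cidr, ports, action in sorted(rules):
        if _match(cidr, ports, ip, port):
            return action == "allow"
    return False  # the final '*' rule denies anything not matched


def sg_allows(rules: List[Tuple[str, PortRange]], ip: str, port: int) -> bool:
    """Security group: allow-only rules; any match permits the traffic."""
    return any(_match(cidr, ports, ip, port) for cidr, ports in rules)


def reply_allowed_by_sg(inbound_rules, client_ip: str, server_port: int) -> bool:
    # Stateful: if the inbound request was allowed, the reply is allowed automatically.
    return sg_allows(inbound_rules, client_ip, server_port)


def reply_allowed_by_nacl(outbound_rules, client_ip: str, client_port: int) -> bool:
    # Stateless: the reply to the client's ephemeral port needs its own outbound rule.
    return nacl_allows(outbound_rules, client_ip, client_port)
```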
Route 53 routing policies
| Policy | Use when… |
|---|---|
| Simple | Single basic answer |
| Weighted | Split traffic by assigned proportions |
| Latency-based | Send users to lowest-latency Region |
| Failover | Active/passive with health checks |
| Geolocation | Route by user geographic location |
| Geoproximity | Route by location with optional bias |
| Multivalue answer | Return multiple healthy records |
| Alias | Record type (combinable with the policies above) that points DNS to supported AWS resources without hard-coding IPs |
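Weighted routing gives each record a share of queries equal to its weight divided by the sum of weights in the group. The sketch below computes those shares, and it also models Route 53's behavior of treating an all-zero group as equally weighted.

Health-check filtering is not modeled, and the record names are placeholders.

```python
from typing import Dict


def weighted_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Expected fraction of DNS responses per record in one weighted record group."""
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # All-zero weights: every record is returned with equal probability.
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}
```

A 70/30 blue/green split, for example, yields shares of 0.7 and 0.3. Moving gradually toward 0/100 is a simple DNS-level cutover, although client DNS caching means the observed split lags the configured weights.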
VPC troubleshooting quick path
```mermaid
flowchart TD
A[Connectivity failure] --> B{DNS resolves?}
B -- No --> C[Check Route 53/private hosted zone/resolver/DHCP options]
B -- Yes --> D{Route exists?}
D -- No --> E[Check route table, TGW, peering, IGW, NAT, endpoint]
D -- Yes --> F{Firewall allows?}
F -- No --> G[Check security groups and NACLs both directions]
F -- Yes --> H{Target healthy/listening?}
H -- No --> I[Check OS firewall, app port, ELB health check, instance status]
H -- Yes --> J[Check asymmetric routing, TLS, proxy, endpoint policy, IAM]
```
Security, identity, and compliance operations
IAM policy evaluation reference
| Concept | Exam meaning |
|---|---|
| Default deny | No permission unless allowed |
| Explicit deny | Overrides any allow |
| Identity-based policy | Attached to users, groups, or roles |
| Resource-based policy | Attached to resource, such as S3 bucket, KMS key, Lambda function |
| Permissions boundary | Maximum permissions an identity can receive |
| SCP | Maximum permissions for accounts/OUs in AWS Organizations; does not grant access |
| Session policy | Further restricts temporary session permissions |
| Role | Assumed for temporary credentials; preferred for AWS services and cross-account access |
| Instance profile | Delivers IAM role credentials to EC2 |
| Trust policy | Defines who can assume a role |
| Access Analyzer | Detects unintended external access and validates policies |
```bash
# Simulate whether AppRole's attached policies allow s3:GetObject on one object
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::111122223333:role/AppRole \
  --action-names s3:GetObject \
  --resource-arns arn:aws:s3:::example-bucket/example-key
```
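The evaluation rules in the table reduce to three steps:

1. An explicit deny anywhere wins.
2. SCPs and permissions boundaries must also allow the action.
3. Only a granting policy actually permits it.

The sketch below is a simplified single-account model of those steps. It covers identity policies, an optional SCP and an optional boundary, and it skips resource-based and session policies, conditions, and `NotAction`.

Action matching is case-insensitive and resource matching case-sensitive, as in IAM. `fnmatch` stands in for IAM's `*` and `?` wildcards.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional

Statement = Dict[str, Any]  # e.g. {"Effect": "Allow", "Action": [...], "Resource": [...]}


def _matches(statements: List[Statement], effect: str, action: str, resource: str) -> bool:
    """True if any statement with this effect covers the action and resource."""
    return any(
        stmt["Effect"] == effect
        and any(fnmatchcase(action.lower(), pattern.lower()) for pattern in stmt["Action"])
        and any(fnmatchcase(resource, pattern) for pattern in stmt["Resource"])
        for stmt in statements
    )


def evaluate(action: str, resource: str, identity: List[Statement],
             scp: Optional[List[Statement]] = None,
             boundary: Optional[List[Statement]] = None) -> str:
    guardrails = [p for p in (scp, boundary) if p is not None]
    # 1. An explicit deny in any policy overrides every allow.
    if any(_matches(p, "Deny", action, resource) for p in [identity, *guardrails]):
        return "explicitDeny"
    # 2. Guardrails cap permissions: each one present must also allow the request.
    if not all(_matches(p, "Allow", action, resource) for p in guardrails):
        return "implicitDeny"
    # 3. Only the identity-based policy grants access here; otherwise default deny applies.
    return "allowed" if _matches(identity, "Allow", action, resource) else "implicitDeny"
```

Note the case where an SCP allows `*` but the identity has no policies: the result is still `implicitDeny`, which is the "SCPs do not grant permissions" point in executable form.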
Security and governance service selection
| Need | Service | Notes |
|---|---|---|
| API audit logs | CloudTrail | Enable organization trail for multi-account visibility where appropriate |
| Resource compliance | AWS Config | Managed/custom rules, aggregators, conformance packs |
| Threat detection | GuardDuty | Findings from account/workload signals |
| Security findings aggregation | Security Hub | Consolidates findings and standards checks |
| Vulnerability scanning | Amazon Inspector | EC2, container image, and Lambda vulnerability coverage, depending on configuration |
| S3 sensitive data discovery | Macie | Data classification focus |
| DDoS protection | AWS Shield | Standard is automatic at no extra cost; Advanced adds enhanced detection, Shield Response Team access, and cost protection |
| Web app filtering | AWS WAF | Rules for HTTP/S traffic at ALB, CloudFront, API Gateway, etc. |
| Certificate management | AWS Certificate Manager | Public/private cert lifecycle for integrated services |
| Secrets rotation | Secrets Manager | Rotation workflows and database integrations |
| Central account guardrails | AWS Organizations SCPs | Guardrails only; IAM still grants actual permissions |
| Key management | KMS | Key policy, IAM, grants, rotation settings, auditing |
KMS operational distinctions
| Topic | Remember |
|---|---|
| Key policy | Primary control for KMS key access |
| IAM policy | Can allow KMS actions only if key policy permits/delegates |
| Grants | Common for AWS services needing temporary/delegated key use |
| AWS managed key | Managed by AWS for a service/account |
| Customer managed key | More control over policy, rotation settings, auditing, deletion scheduling |
| Multi-Region key | Useful for client-side or service patterns needing related keys across Regions |
| Encryption context | Additional authenticated data used by some integrations/policies |
| S3 SSE-KMS failure | Check both S3 permission and KMS key permission |
Reliability, backup, and disaster recovery
Reliability design choices
| Requirement | Prefer | Why |
|---|---|---|
| Survive instance failure | Auto Scaling across Availability Zones | Replaces unhealthy capacity |
| Survive AZ failure for app tier | Multi-AZ subnets + load balancer + Auto Scaling | Distributes traffic and capacity |
| Survive DB instance/AZ failure | RDS Multi-AZ or Aurora HA design | Managed failover capability |
| Recover accidental delete | Backups, snapshots, versioning, PITR | HA is not backup |
| Regional disaster recovery | Cross-Region backups/replication + Route 53/Global Accelerator patterns | DR requires runbooks and testing |
| Reduce noisy alerts | Composite alarms and dependency-aware runbooks | Avoid paging on downstream symptoms only |
| Validate resilience | Game days/failure testing where appropriate | Know rollback and blast radius |
| Standardize recovery | Systems Manager Automation runbooks | Repeatable operations beat manual steps |
DR pattern reference
| Pattern | Cost/complexity | Operational idea |
|---|---|---|
| Backup and restore | Lowest | Restore from backups when needed |
| Pilot light | Low/medium | Core components replicated; scale out during event |
| Warm standby | Medium/high | Scaled-down full environment already running |
| Active-active | Highest | Multiple Regions actively serve traffic |
Backup decision points
| Need | Use |
|---|---|
| Centralized policy across supported services | AWS Backup plans |
| EC2 volume recovery | EBS snapshots or AWS Backup |
| RDS database recovery | Automated backups, snapshots, AWS Backup |
| S3 object recovery | Versioning, replication, Object Lock where configured |
| Cross-account backup isolation | AWS Backup cross-account strategy |
| Cross-Region recovery | Cross-Region backup/copy/replication |
| Accidental stack deletion protection | CloudFormation termination protection and deletion policies |
Cost, performance, and operational hygiene
| Goal | Tools/actions | Exam note |
|---|---|---|
| Detect budget overrun | AWS Budgets, Cost Explorer, cost anomaly detection | Budgets notify/control; Cost Explorer analyzes |
| Rightsize compute | AWS Compute Optimizer, CloudWatch metrics | Needs enough metric history to make useful recommendations |
| Reduce idle resources | Find unattached EBS volumes, idle load balancers, old snapshots, unused Elastic IPs | Tag ownership and lifecycle |
| Optimize S3 cost | Lifecycle policies, storage class analysis, inventory | Match storage class to access and retrieval needs |
| Reduce NAT dependency | VPC endpoints for supported AWS services | Often improves private connectivity posture too |
| Control log cost | Log retention, filters, export/archive strategy | The default "Never expire" log group retention can become expensive |
| Standardize tags | Tag policies, Config rules, cost allocation tags | Tags support cost, automation, and ownership |
| Improve app latency | ALB/NLB choice, CloudFront, caching, database tuning | Do not solve all latency with larger instances |
| Improve database performance | Performance Insights, read replicas, indexes/query tuning, caching | Multi-AZ is HA, not a read-scaling feature |
| Scale queues/workers | SQS metrics + Auto Scaling/custom metrics | Scale on backlog per worker or latency-oriented metric |
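For queue-driven workers, AWS recommends scaling on backlog per instance rather than raw queue depth:

- Backlog per instance is visible messages divided by running instances.
- The acceptable backlog is the longest acceptable latency divided by the average processing time per message.

A sketch of that arithmetic, with hypothetical function names and numbers:

```python
import math


def acceptable_backlog_per_instance(max_latency_s: float, avg_processing_s: float) -> float:
    """How many queued messages one worker can hold and still meet the latency goal."""
    return max_latency_s / avg_processing_s


def backlog_per_instance(visible_messages: int, running_instances: int) -> float:
    """Custom metric value: ApproximateNumberOfMessagesVisible divided by in-service workers."""
    return visible_messages / max(running_instances, 1)


def desired_workers(visible_messages: int, max_latency_s: float, avg_processing_s: float,
                    min_size: int, max_size: int) -> int:
    """Workers needed so that backlog per instance is at or below the acceptable value."""
    target = acceptable_backlog_per_instance(max_latency_s, avg_processing_s)
    return max(min_size, min(max_size, math.ceil(visible_messages / target)))
```

Take 60 visible messages, 4 workers, a 20-second latency goal and 2 seconds per message. Backlog per instance is 15 against an acceptable 10, so the group should grow to 6 workers.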
High-yield traps and distinctions
| Trap | Correct exam reasoning |
|---|---|
| “Need to know who changed it” -> choose CloudWatch | Choose CloudTrail for API audit; Config for configuration timeline |
| “Need to keep resource compliant” -> use CloudTrail only | Use AWS Config rules or Systems Manager State Manager, depending on the resource and configuration |
| “Private subnet needs S3 access” -> NAT is always best | Gateway VPC endpoint is usually the private AWS-native path for S3 |
| “Multi-AZ means backup” | Multi-AZ is availability; backups/versioning/PITR handle recovery from bad changes |
| “Read replica means automatic HA failover for primary” | Read replicas are primarily read scaling/DR; Multi-AZ is the HA answer for RDS primary failover |
| “SCP grants admin access” | SCPs set maximum permissions; IAM/resource policies still grant |
| “Security group blocks with deny rule” | Security groups allow only; NACLs can explicitly deny |
| “User data is configuration management” | User data bootstraps; Systems Manager/CloudFormation maintain repeatable operations |
| “Changing launch template updates running instances” | Existing instances remain until replaced, refreshed, or relaunched |
| “CloudFront replaces Route 53” | CloudFront caches/distributes content; Route 53 resolves DNS and routes queries |
| “CloudWatch default EC2 metrics include memory” | Memory/disk typically require CloudWatch Agent/custom metrics |
| “CloudTrail data events are always automatically logged” | Know the distinction between management events and optional data-event logging |
| “KMS IAM allow is enough” | Key policy, IAM policy, grants, and service integration all matter |
| “Public subnet equals internet reachable” | Needs route, public address, firewall rules, and listening service |
| “NACL statefulness works like security groups” | NACLs are stateless; return path rules matter |
Rapid scenario drill table
| If the scenario says… | Fast answer direction |
|---|---|
| “No SSH allowed, but admins need shell access” | Systems Manager Session Manager |
| “Run a command on all instances tagged Environment=Prod” | Systems Manager Run Command |
| “Ensure a package remains installed” | Systems Manager State Manager |
| “Patch instances during a defined window” | Patch Manager + Maintenance Windows |
| “Preview infrastructure changes before update” | CloudFormation change set |
| “Manual console changes caused drift” | CloudFormation drift detection; remediate via template |
| “Notify on EC2 state changes” | EventBridge rule |
| “Alarm when error rate exceeds threshold” | CloudWatch metric/math alarm |
| “Search last hour of app errors” | CloudWatch Logs Insights |
| “Count log pattern as metric” | CloudWatch Logs metric filter |
| “Who opened port 22?” | CloudTrail, then Config for current/history |
| “Block public S3 buckets across accounts” | S3 Block Public Access, Config/SCP guardrails as appropriate |
| “Analyze whether bucket policy allows outside access” | IAM Access Analyzer |
| “Database failover within Region” | RDS Multi-AZ/Aurora HA |
| “Scale reads from database” | Read replicas or cache layer |
| “Static website/global caching” | CloudFront in front of S3 or origin |
| “Private service access from VPC” | VPC endpoint |
| “Hybrid connection over internet” | Site-to-Site VPN |
| “Central hub for many VPCs” | Transit Gateway |
| “Filter malicious HTTP requests” | AWS WAF |
| “Aggregate security findings” | Security Hub |
| “Detect suspicious AWS account activity” | GuardDuty |
| “Find sensitive data in S3” | Macie |
| “Central backup policy and reporting” | AWS Backup |
| “Cost forecast and historical spend” | Cost Explorer |
| “Alert before budget is exceeded” | AWS Budgets |
Final review checklist
Before sitting for SOA-C03, be able to:
- Pick between CloudWatch, CloudTrail, AWS Config, EventBridge, and Systems Manager without hesitation.
- Troubleshoot EC2, Auto Scaling, ELB, Route 53, and VPC connectivity from symptoms.
- Explain security group vs NACL, NAT vs VPC endpoint, Multi-AZ vs backup, and SCP vs IAM policy.
- Choose operational automation: Run Command, State Manager, Patch Manager, Automation, CloudFormation, CodeDeploy, or instance refresh.
- Map storage/database recovery needs to S3 versioning, EBS snapshots, RDS backups/PITR, DynamoDB PITR, and AWS Backup.
- Recognize least-privilege, encryption, logging, tagging, and repeatable infrastructure patterns.
Next step: work through timed SOA-C03-style scenarios and force yourself to name the AWS service, the operational reason, and the first troubleshooting check before reading the explanation.