SOA-C03 Syllabus - Objectives by Domain

Blueprint-aligned learning objectives for AWS Certified CloudOps Engineer - Associate (SOA-C03), organized by domain with quick links to targeted practice.

Use this syllabus as your source of truth for SOA-C03. Work through each domain in order and drill focused sets after every task.

What’s covered

Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization (22%)
Content Domain 2: Reliability and Business Continuity (22%)
Content Domain 3: Deployment, Provisioning, and Automation (22%)
Content Domain 4: Security and Compliance (16%)
Content Domain 5: Networking and Content Delivery (18%)

Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization (22%)

Task 1.1 - Implement metrics, alarms, and filters by using AWS monitoring and logging services

Differentiate CloudWatch metrics, logs, and alarms and choose the right signal type for a given operational requirement.
Configure CloudWatch metric collection and CloudWatch Logs for common AWS services and workloads.
Configure CloudTrail to record management events and deliver logs to Amazon S3 and/or CloudWatch Logs for auditing and investigations.
Use CloudWatch Logs Insights to filter, aggregate, and analyze operational logs to answer specific troubleshooting questions.
Integrate CloudWatch with Amazon Managed Service for Prometheus and Amazon Managed Grafana at a high level to monitor containerized workloads.
Configure and manage the CloudWatch agent on EC2 to collect OS-level metrics (CPU, memory, disk) and application log files.
Configure the CloudWatch agent for ECS/EKS environments at a high level to collect container logs and metrics.
Troubleshoot missing CloudWatch agent metrics or logs by validating IAM permissions, network reachability, and agent configuration.
Create CloudWatch metric alarms with appropriate thresholds, evaluation periods, and actions to detect unhealthy conditions.
Configure composite alarms to reduce alert fatigue and represent dependent conditions across multiple metrics.
Route alarm events through EventBridge to trigger automation targets (for example, Lambda or Systems Manager) at a high level.
Troubleshoot alarm behavior (flapping, INSUFFICIENT_DATA, missing datapoints) by adjusting metric selection and alarm settings.
Create CloudWatch dashboards that summarize key service metrics and alarm states for a workload or fleet.
Configure cross-account and cross-Region observability dashboards for centralized operations monitoring.
Configure Amazon SNS topics and subscriptions for alert notifications and connect CloudWatch alarms to SNS.
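
As a study aid for the alarm objectives above, the following is a minimal sketch of "M out of N" datapoint evaluation. The function name, arguments, and the reduced set of missing-data modes are illustrative assumptions; the real CloudWatch evaluation of missing datapoints is more nuanced and also supports an "ignore" mode that this model omits.

```python
from typing import Optional, Sequence


def evaluate_alarm(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    datapoints_to_alarm: int,
    treat_missing_data: str = "missing",
) -> str:
    """Return 'ALARM', 'OK', or 'INSUFFICIENT_DATA' for one evaluation window.

    datapoints holds the most recent N periods; None marks a missing datapoint.
    Hypothetical, simplified model: supports only the 'missing', 'breaching',
    and 'notBreaching' treatments with a GreaterThanThreshold comparison.
    """
    breaching = 0
    present = 0
    for value in datapoints:
        if value is None:
            if treat_missing_data == "breaching":
                breaching += 1
            # 'notBreaching' and 'missing' add nothing to the breaching count.
            continue
        present += 1
        if value > threshold:
            breaching += 1

    if breaching >= datapoints_to_alarm:
        return "ALARM"
    if present == 0 and treat_missing_data == "missing":
        return "INSUFFICIENT_DATA"
    return "OK"


# 2 of the last 3 CPU datapoints exceed 80, so a "2 out of 3" alarm fires.
print(evaluate_alarm([90, 50, 95], threshold=80, datapoints_to_alarm=2))
```

Requiring more than one breaching datapoint is one way to reduce flapping, and choosing how missing data is treated is one way to avoid lingering INSUFFICIENT_DATA states for sparse metrics.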

Task 1.2 - Identify and remediate issues by using monitoring and availability metrics

Analyze CloudWatch metrics and logs to identify performance degradation and availability issues in AWS workloads.
Correlate operational symptoms with recent configuration or deployment changes by using CloudTrail and resource history.
Select an automated remediation approach (Auto Scaling, Lambda, Systems Manager Automation) based on the incident type and blast radius.
Implement CloudWatch alarm-driven remediation by invoking Systems Manager Automation runbooks.
Implement CloudWatch alarm-driven remediation by invoking Lambda functions at a high level.
Configure scaling policies (target tracking or step scaling) to remediate sustained load and maintain service performance.
Configure notification workflows for incidents using AWS User Notifications and/or SNS at a high level.
Configure EventBridge rules to route service events (for example, EC2 state changes) to operational targets.
Use EventBridge input transformers or enrichment patterns at a high level to add context before delivering events to targets.
Configure EventBridge targets with retry and dead-letter queue settings to improve operational reliability.
Troubleshoot EventBridge rules that do not trigger by validating event patterns, permissions, and bus/Region selection.
Run prebuilt Systems Manager Automation runbooks to perform common operational tasks (restart, recover, remediate) safely.
Create custom Systems Manager Automation documents that call AWS APIs or scripts to automate repeatable operations at a high level.
Troubleshoot Automation execution failures by validating assume-role permissions, parameters, and resource preconditions.
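
When an EventBridge rule does not trigger, the event pattern is often the first thing to check. The sketch below is a deliberately simplified matcher, assuming exact-value matching only; the `matches` helper is hypothetical and does not cover prefix, numeric, anything-but, or array-valued event fields that the real service supports.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified event pattern check (illustrative, not the service's code).

    Every key in the pattern must exist in the event. A list in the pattern
    matches when the event value equals any entry; a dict is matched
    recursively against the nested event object.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

# Sample event with a made-up instance ID.
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "region": "us-east-1",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

print(matches(pattern, event))
```

Fields that appear in the event but not in the pattern are ignored, so a rule usually fails to match because of a misspelled value or a key at the wrong nesting level rather than because of extra event fields.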

Task 1.3 - Implement performance optimization strategies for compute, storage, and database resources

Use performance metrics and resource tags to identify hotspots and bottlenecks across compute resources.
Interpret AWS Compute Optimizer recommendations and choose right-sizing actions while managing risk and change windows.
Tune Auto Scaling behavior and instance health checks to meet latency and throughput targets during scaling events.
Diagnose whether a workload is CPU-bound, memory-bound, or IO-bound using CloudWatch and OS metrics.
Interpret key EBS performance metrics (queue length, throughput, IOPS, burst balance) to diagnose volume constraints.
Select an appropriate EBS volume type (for example, gp3, io2, st1) based on performance and cost requirements.
Modify EBS volume size, IOPS, and throughput safely to improve performance without unnecessary downtime.
Troubleshoot EBS performance issues caused by instance limits, mis-sized volumes, or suboptimal configuration.
Optimize S3 upload and download performance using multipart uploads, concurrency, and request patterns.
Choose between S3 Transfer Acceleration, DataSync, and standard transfers based on data location and throughput requirements.
Apply S3 lifecycle policies and storage class choices to align performance, retrieval patterns, and cost objectives.
Select shared storage solutions (EFS vs FSx variants) based on protocol needs (NFS/SMB) and performance characteristics.
Configure EFS performance and throughput modes and lifecycle policies to optimize cost and performance for a workload.
Monitor RDS performance using Performance Insights and CloudWatch alarms and identify likely bottleneck categories.
Select remediation actions for RDS performance issues (instance class, storage, parameter group, connection management).
Use RDS Proxy at a high level to improve database connection efficiency for bursty application workloads.
Use EC2 placement groups appropriately (cluster, spread, partition) to improve performance or resilience for a given workload.
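
For the S3 multipart objective above, a small planning calculation can help. This sketch assumes the documented S3 limits of at most 10,000 parts per upload and a 5 MiB minimum part size (except for the last part); the `plan_multipart` helper and its 8 MiB default are illustrative choices, and the upper bound on part size is not modeled.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per multipart upload


def plan_multipart(object_size: int, preferred_part_size: int = 8 * MIB):
    """Return (part_size, part_count) that stays within the limits above.

    Hypothetical helper for study purposes; sizes are in bytes.
    """
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Smallest part size that keeps the upload within MAX_PARTS.
    floor_for_limit = -(-object_size // MAX_PARTS)  # integer ceiling division
    if floor_for_limit > part_size:
        # Round up to a whole MiB for tidy part boundaries.
        part_size = -(-floor_for_limit // MIB) * MIB
    part_count = max(1, -(-object_size // part_size))
    return part_size, part_count


size, count = plan_multipart(100 * MIB)
print(size // MIB, "MiB parts,", count, "parts")
```

Larger parts mean fewer requests, while more parts allow more concurrency; the right balance depends on the network path and the client.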

Content Domain 2: Reliability and Business Continuity (22%)

Task 2.1 - Implement scalability and elasticity

Differentiate scalability from elasticity and apply the concepts to real operations scenarios on AWS.
Configure Auto Scaling groups with target tracking policies to maintain performance under changing load.
Configure Auto Scaling group health checks, instance replacement behavior, and capacity settings to maintain availability.
Use scheduled scaling to prepare for predictable traffic patterns and operational events.
Configure ECS service auto scaling or EKS scaling at a high level to maintain application performance.
Tune scaling cooldowns and stabilization windows to prevent thrashing and reduce operational noise.
Use CloudFront caching to reduce origin load and improve global scalability for web applications.
Configure CloudFront cache behaviors (TTL, cache key) appropriate for dynamic and static content patterns.
Use ElastiCache to offload frequent reads and reduce database load for scalable applications.
Choose between Redis and Memcached for caching based on persistence, features, and operational requirements.
Configure DynamoDB capacity mode and auto scaling to support variable traffic patterns reliably.
Apply caching strategies such as DynamoDB Accelerator (DAX) or application caching to improve read scalability.
Configure RDS scaling strategies at a high level (read replicas, instance sizing) to meet workload demand.
Interpret scalability metrics (request rate, latency, CPU, connections) to validate that scaling meets SLOs.
Troubleshoot scaling failures by validating scaling policies, metrics, permissions, and capacity constraints.
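
To build intuition for target tracking, the sketch below uses a rough proportional model: capacity is scaled by the ratio of the observed metric to the target, then clamped to the group's limits. This is an assumption for illustration only; the service's actual scaling logic (including cooldowns, warm-up, and more conservative scale-in) is not modeled, and the function name is hypothetical.

```python
import math


def estimate_desired_capacity(
    current_capacity: int,
    metric_value: float,
    target_value: float,
    min_size: int,
    max_size: int,
) -> int:
    """Proportional estimate of desired capacity, clamped to [min_size, max_size]."""
    raw = math.ceil(current_capacity * metric_value / target_value)
    return max(min_size, min(max_size, raw))


# 4 instances at 75% average CPU with a 50% target suggests about 6 instances.
print(estimate_desired_capacity(4, 75, 50, min_size=2, max_size=10))
```

If the estimate is pinned at the maximum size, the scaling policy may be working as configured while the capacity settings are the actual constraint, which is a common finding when troubleshooting scaling that appears stuck.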

Task 2.2 - Implement highly available and resilient environments

Choose the appropriate Elastic Load Balancing option (ALB, NLB, GWLB) for a highly available workload.
Configure target groups and health checks to detect unhealthy instances accurately and trigger failover.
Configure cross-zone load balancing and understand its impact on distribution behavior and cost.
Configure listeners and routing at a high level to support resilient application traffic patterns.
Troubleshoot unhealthy targets and 5xx errors using ELB metrics and access logs.
Configure Route 53 health checks and failover routing to route traffic away from unhealthy endpoints.
Apply Route 53 routing policies (failover, weighted, latency) to improve availability and resilience.
Implement multi-AZ compute patterns using Auto Scaling groups across multiple subnets and Availability Zones.
Configure RDS Multi-AZ deployments and understand failover behavior and operational considerations.
Configure Aurora replicas and failover behavior at a high level to support resilience requirements.
Identify AZ-scoped single points of failure (for example, NAT gateways, instance-local state) and remediate them.
Use regional services (for example, S3, DynamoDB) where appropriate to reduce the impact of AZ failures.
Validate high availability by simulating failures and verifying health check and failover behavior.
Reduce blast radius by using isolation boundaries (AZs, subnets, partitions) and controlled rollout practices.
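
For Route 53 weighted routing with health checks, the expected share of traffic for a record is its weight divided by the sum of the weights of the records still in rotation. The sketch below assumes every record has a health check attached; the `weighted_shares` helper is hypothetical, and edge cases such as all records being unhealthy are not modeled.

```python
from typing import Dict, Tuple


def weighted_shares(records: Dict[str, Tuple[int, bool]]) -> Dict[str, float]:
    """Map record name -> expected fraction of DNS responses.

    records maps a name to (weight, healthy). Unhealthy and zero-weight
    records are dropped from rotation in this simplified model.
    """
    eligible = {
        name: weight
        for name, (weight, healthy) in records.items()
        if healthy and weight > 0
    }
    total = sum(eligible.values())
    if total == 0:
        return {}  # Not modeled: real behavior when nothing is eligible.
    return {name: weight / total for name, weight in eligible.items()}


print(weighted_shares({"blue": (90, True), "green": (10, True)}))
print(weighted_shares({"blue": (90, False), "green": (10, True)}))
```

The second call shows why the standby side of a weighted pair needs enough capacity for the full load: once the larger record fails its health check, all traffic shifts to the remaining one.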

Task 2.3 - Implement backup and restore strategies

Create AWS Backup plans with schedules, lifecycle policies, and vault configuration to meet retention requirements.
Select and assign backup resources using tags and resource assignments to standardize coverage.
Configure cross-account and cross-Region backup copy policies to meet business continuity requirements.
Restore common resources (EBS volumes, EC2 instances, RDS databases, DynamoDB tables, EFS) using AWS Backup.
Explain RPO and RTO and map them to backup frequency, restore approach, and operational runbooks.
Perform RDS snapshot restore and point-in-time recovery to meet stated RTO/RPO and cost constraints.
Perform DynamoDB point-in-time recovery at a high level and validate restore outcomes.
Validate backup integrity by conducting restore drills and documenting operational results.
Enable and manage S3 versioning to protect against accidental deletion and overwrite scenarios.
Recover data by managing S3 delete markers and restoring previous object versions operationally.
Use FSx backup or snapshot capabilities at a high level to support recovery objectives.
Create and follow a disaster recovery runbook that includes failover and failback steps for a workload.
Choose an appropriate DR strategy (backup and restore, pilot light, warm standby) based on requirements and cost.
Automate snapshots and backups for EC2/EBS/RDS resources using AWS Backup or native mechanisms.
Troubleshoot failed backup jobs by validating permissions, vault policies, configuration, and service prerequisites.
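
The S3 versioning objectives above can be pictured with a tiny in-memory model. In a versioning-enabled bucket, a delete without a version ID adds a delete marker instead of removing data, and removing that marker makes the previous version current again. The `VersionedObject` class is a hypothetical illustration, not an S3 client.

```python
from typing import List, Optional, Tuple


class VersionedObject:
    """Toy model of one key in a versioning-enabled bucket."""

    def __init__(self) -> None:
        # Oldest first; each entry is ("data", body) or ("delete-marker", None).
        self.versions: List[Tuple[str, Optional[bytes]]] = []

    def put(self, body: bytes) -> None:
        self.versions.append(("data", body))

    def delete(self) -> None:
        """Delete without a version ID: add a delete marker, keep old versions."""
        self.versions.append(("delete-marker", None))

    def get(self) -> Optional[bytes]:
        """Return the current body, or None when a delete marker is current
        (the real service would answer such a request with a 404)."""
        if not self.versions or self.versions[-1][0] == "delete-marker":
            return None
        return self.versions[-1][1]

    def remove_latest_delete_marker(self) -> None:
        """Undo an accidental delete by removing the current delete marker."""
        if self.versions and self.versions[-1][0] == "delete-marker":
            self.versions.pop()


obj = VersionedObject()
obj.put(b"v1")
obj.put(b"v2")
obj.delete()
print(obj.get())            # None: the delete marker hides the object
obj.remove_latest_delete_marker()
print(obj.get())            # b'v2': the previous version is current again
```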

Content Domain 3: Deployment, Provisioning, and Automation (22%)

Task 3.1 - Provision and maintain cloud resources

Create and manage AMIs by using EC2 Image Builder pipelines and manage versioning across releases.
Apply image hardening and patching practices during AMI builds to reduce operational risk.
Distribute AMIs across Regions and accounts and manage rollback to a prior image version when needed.
Build, tag, and manage container images and store them in Amazon ECR for operational use.
Create and update CloudFormation stacks using safe change practices such as change sets and drift detection.
Troubleshoot CloudFormation stack failures by using stack events, error messages, and rollback states.
Use AWS CDK at a high level to synthesize and deploy CloudFormation stacks as infrastructure as code.
Diagnose subnet sizing and IP exhaustion issues that prevent deployments and remediate with CIDR planning.
Diagnose IAM permission issues that prevent resource provisioning and remediate with least-privilege policies.
Deploy standardized resources across multiple accounts and Regions by using CloudFormation StackSets.
Share resources across accounts by using AWS RAM (for example, subnets or Transit Gateway attachments) at a high level.
Implement deployment strategies (rolling, blue/green, canary) to minimize downtime and reduce risk during changes.
Configure deployment services at a high level (for example, CodeDeploy or ECS deployment options) to support automatic rollback.
Apply consistent tagging standards to provisioned resources to support operations, cost allocation, and governance.
Use Terraform at a high level to provision AWS resources while managing state safely and predictably.
Use Git workflows at a high level (branching, pull requests, reviews) to manage infrastructure as code changes.
Remediate common deployment issues such as parameter misconfiguration, dependency ordering, and Region constraints.
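
Subnet sizing problems are easy to check with arithmetic. The sketch below relies on two facts about IPv4 subnets in a VPC: AWS reserves five addresses in every subnet, and subnet sizes range from /16 to /28. The helper names are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def usable_ips(cidr: str) -> int:
    """Number of addresses available for resources in an IPv4 subnet."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(0, total - AWS_RESERVED_PER_SUBNET)


def smallest_prefix_for(hosts_needed: int) -> int:
    """Longest prefix (smallest subnet) between /28 and /16 that fits."""
    for prefix in range(28, 15, -1):
        if 2 ** (32 - prefix) - AWS_RESERVED_PER_SUBNET >= hosts_needed:
            return prefix
    raise ValueError("requirement does not fit in a single /16 subnet")


print(usable_ips("10.0.1.0/24"))   # 251
print(smallest_prefix_for(300))    # 23
```

When planning CIDR ranges, it helps to count every consumer of addresses in the subnet (for example, load balancer nodes and interface endpoints), not only instances.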

Task 3.2 - Automate the management of existing resources

Use Systems Manager Run Command to execute operational actions on managed instances at scale.
Configure Patch Manager to apply OS and application patches on a schedule with controlled blast radius.
Use State Manager to enforce configuration baselines and remediate configuration drift.
Store and retrieve configuration values securely using Parameter Store and related Systems Manager capabilities.
Use Session Manager for secure administrative access without opening inbound ports or managing SSH keys.
Create automation workflows in Systems Manager Automation to restart services, remediate issues, and standardize operations.
Configure maintenance windows and associations to control timing and scope of automated operational tasks.
Automate operational tasks based on events by using Lambda and EventBridge at a high level.
Configure S3 event notifications to trigger automation when objects are created, updated, or deleted.
Implement guardrails for automation (approvals, rate limits, scoped IAM roles) to reduce operational risk.
Monitor automation executions and troubleshoot failures by using execution history, logs, and IAM diagnostics.
Combine CloudWatch alarms with automation targets to close the loop on detection and remediation.

Content Domain 4: Security and Compliance (16%)

Task 4.1 - Implement and manage security and compliance tools and policies

Configure IAM password policies and multi-factor authentication (MFA) requirements for human users.
Design and implement IAM roles and trust policies for secure service-to-service and cross-account access.
Apply IAM policy conditions (for example, tags, source IP, MFA present) to enforce least privilege.
Configure and use federated identity and IAM Identity Center at a high level for centralized access management.
Implement resource-based policies for services such as S3 or KMS and reason about policy evaluation at a high level.
Troubleshoot AccessDenied errors by using CloudTrail and identifying the calling principal and API action.
Use the IAM policy simulator to validate effective permissions before deploying access changes.
Use IAM Access Analyzer findings to detect unintended external access and remediate exposure.
Implement secure multi-account strategies using AWS Organizations and Control Tower concepts at a high level.
Use service control policies (SCPs) to enforce guardrails across accounts, recognizing that SCPs limit the permissions available in an account but never grant them.
Interpret AWS Trusted Advisor security checks and prioritize remediation actions.
Operationalize remediation for common Trusted Advisor findings (for example, public S3 access, open security groups).
Enforce compliance constraints on Region and service usage by using SCPs and governance controls.
Use AWS Config at a high level to assess compliance posture and record configuration changes for auditing.
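
The interaction between identity policies and SCPs can be reasoned about with a simplified evaluation model: an explicit deny anywhere wins, and otherwise the action must be allowed by both the identity policy and the SCP. The sketch below is an assumption-laden illustration with a hypothetical `is_allowed` helper; it ignores resources, conditions, resource-based policies, permissions boundaries, and session policies.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Statement = Dict[str, object]


def _matches(statements: List[Statement], action: str, effect: str) -> bool:
    """True if any statement with the given effect covers the action."""
    return any(
        s["Effect"] == effect
        and any(fnmatchcase(action.lower(), p.lower()) for p in s["Action"])
        for s in statements
    )


def is_allowed(action: str, identity: List[Statement], scp: List[Statement]) -> bool:
    """Simplified decision: explicit deny wins, then both layers must allow."""
    if _matches(identity, action, "Deny") or _matches(scp, action, "Deny"):
        return False
    return _matches(identity, action, "Allow") and _matches(scp, action, "Allow")


identity_policy = [
    {"Effect": "Allow", "Action": ["s3:*"]},
    {"Effect": "Deny", "Action": ["s3:DeleteBucket"]},
]
org_scp = [
    {"Effect": "Allow", "Action": ["*"]},
    {"Effect": "Deny", "Action": ["ec2:*"]},
]

print(is_allowed("s3:GetObject", identity_policy, org_scp))     # True
print(is_allowed("s3:DeleteBucket", identity_policy, org_scp))  # False
```

In this model an SCP that allows everything still grants nothing on its own, which matches the guardrail role described in the objectives above.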

Task 4.2 - Implement strategies to protect data and infrastructure

Define and implement a data classification scheme and apply classification through tagging and access controls.
Use Amazon Macie at a high level to discover and classify sensitive data stored in Amazon S3.
Enforce encryption at rest for common AWS services (EBS, S3, RDS, DynamoDB) by using AWS KMS.
Manage KMS key policies and grants to enable least-privilege access to encrypted data.
Troubleshoot KMS-related access failures by validating key policy, IAM policy, key state, and Region.
Configure encryption in transit by using ACM certificates for endpoints such as ALB, CloudFront, or API endpoints.
Troubleshoot TLS and certificate issues (expired certificates, wrong domain/SNI, chain problems) at a high level.
Store application secrets in AWS Secrets Manager and retrieve them securely from workloads using IAM roles.
Configure secret rotation and monitor rotation outcomes to ensure credentials remain valid.
Use AWS Config rules to detect noncompliant resource configurations and trigger remediation workflows.
Interpret GuardDuty findings and select appropriate incident response and remediation actions at a high level.
Interpret Inspector findings for vulnerabilities and prioritize remediation actions for affected resources.
Aggregate and triage security findings in Security Hub and route them into operational workflows.
Implement incident response actions to protect infrastructure (isolation, credential rotation, evidence preservation).

Content Domain 5: Networking and Content Delivery (18%)

Task 5.1 - Implement and optimize networking features and connectivity

Create and configure VPCs, subnets, and route tables to support public and private tiers for workloads.
Configure security groups and network ACLs and explain their differences in statefulness and evaluation order.
Configure NAT gateways to give private subnets outbound-only internet access and internet gateways to give public subnets inbound and outbound connectivity.
Use egress-only internet gateways to enable IPv6 outbound-only internet access for private subnets.
Configure VPC endpoints (gateway and interface) to access AWS services privately without traversing the public internet.
Implement VPC peering connectivity and understand limitations such as non-transitive routing.
Implement Transit Gateway connectivity at a high level for hub-and-spoke routing across multiple VPCs.
Implement AWS PrivateLink at a high level to access services privately across VPCs and accounts.
Configure AWS Client VPN at a high level to provide secure user connectivity to VPC resources.
Configure Site-to-Site VPN at a high level to provide hybrid connectivity to AWS networks.
Audit network protection services (DNS Firewall, WAF, Shield, Network Firewall) and validate that controls are applied correctly.
Enable and review logs and metrics for network protection services to validate effectiveness and support investigations.
Optimize network architecture cost by reducing NAT gateway data processing and minimizing cross-AZ data transfer.
Choose private connectivity options (VPC endpoints, PrivateLink) versus public endpoints based on security and cost requirements.
Identify common connectivity-breaking misconfigurations (routes, DNS, SG/NACL) and select appropriate remediations.
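
The difference in evaluation order between network ACLs and security groups can be sketched in a few lines. Network ACL rules are processed in ascending rule-number order and the first match decides, with an implicit final deny; security groups contain only allow rules, and any matching rule permits the traffic. The rule dictionaries and helper names below are hypothetical, and statefulness (automatic return traffic for security groups) is not modeled.

```python
import ipaddress
from typing import Dict, List


def _covers(rule: Dict, src_ip: str, port: int) -> bool:
    """True if the rule's CIDR and port range include this source and port."""
    low, high = rule["port_range"]
    in_cidr = ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule["cidr"])
    return in_cidr and low <= port <= high


def nacl_decision(rules: List[Dict], src_ip: str, port: int) -> str:
    """First matching rule in ascending rule-number order wins."""
    for rule in sorted(rules, key=lambda r: r["rule_number"]):
        if _covers(rule, src_ip, port):
            return rule["action"]
    return "deny"  # the implicit final rule


def security_group_allows(rules: List[Dict], src_ip: str, port: int) -> bool:
    """Allow-only rules: any match permits the traffic."""
    return any(_covers(rule, src_ip, port) for rule in rules)


nacl_rules = [
    {"rule_number": 100, "cidr": "0.0.0.0/0", "port_range": (443, 443), "action": "allow"},
    {"rule_number": 90, "cidr": "203.0.113.0/24", "port_range": (443, 443), "action": "deny"},
]
sg_rules = [{"cidr": "0.0.0.0/0", "port_range": (443, 443)}]

print(nacl_decision(nacl_rules, "203.0.113.7", 443))        # deny (rule 90 first)
print(security_group_allows(sg_rules, "203.0.113.7", 443))  # True
```

The example shows a case where the security group permits a source that the network ACL blocks, because the lower-numbered deny rule is evaluated before the broader allow.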

Task 5.2 - Configure domains, DNS services, and content delivery

Configure Route 53 hosted zones and record sets for public and private DNS use cases.
Configure Route 53 Resolver inbound and outbound endpoints at a high level for hybrid DNS resolution.
Configure private hosted zones and associate them with VPCs correctly to support split-horizon DNS.
Implement Route 53 routing policies (simple, weighted, latency, failover) to meet availability and performance goals.
Implement Route 53 health checks and understand how they influence failover decisions and routing.
Enable Route 53 query logging and interpret logs to troubleshoot DNS resolution problems.
Troubleshoot DNS issues such as resolver misconfiguration, split-horizon conflicts, and record set mistakes.
Configure CloudFront distributions with origins, behaviors, and cache policies for content delivery.
Configure CloudFront origin access control (OAC) at a high level and restrict direct access to the origin.
Tune CloudFront caching behavior (TTL, cache key) to balance performance and correctness for a workload.
Use CloudFront and AWS WAF together at a high level to protect edge-delivered applications.
Configure Global Accelerator endpoints and health checks at a high level for improved performance and availability.
Choose between CloudFront and Global Accelerator based on protocol, caching needs, and performance requirements.
Troubleshoot content delivery issues by using CloudFront logs, metrics, and cache invalidations.

Task 5.3 - Troubleshoot network connectivity issues

Use VPC Reachability Analyzer to determine why traffic between two endpoints fails.
Troubleshoot routing issues caused by route tables, subnet associations, and missing or incorrect routes.
Troubleshoot blocked inbound and outbound traffic caused by security group and network ACL rules.
Troubleshoot NAT gateway and internet gateway connectivity issues for workloads in private subnets.
Troubleshoot Transit Gateway attachments and route propagation at a high level when connectivity is broken.
Collect and interpret VPC Flow Logs to identify accepted versus rejected traffic and the likely rejecting layer.
Enable and interpret ELB access logs to diagnose client errors, backend errors, and routing issues.
Enable and interpret AWS WAF web ACL logs to diagnose blocked requests and reduce false positives.
Use CloudFront logs and metrics to diagnose edge errors, origin timeouts, and caching behavior.
Identify and remediate CloudFront caching issues by adjusting cache policies and using invalidations appropriately.
Troubleshoot hybrid connectivity issues for VPN-based connections at a high level (tunnel state, routes, DNS).
Troubleshoot private connectivity issues involving VPC endpoints, PrivateLink, and DNS resolution at a high level.
Configure and analyze CloudWatch network monitoring metrics and features to detect and investigate connectivity degradation.
Apply a systematic troubleshooting workflow: symptom, scope, signals (logs/metrics), root cause, and remediation.
Validate remediation by re-running reachability checks and monitoring post-change metrics and logs.
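
For the VPC Flow Logs objective above, the sketch below parses records in the default (version 2) format and lists rejected flows. The field order follows the default format; the helper names and the sample records (with made-up account and interface IDs) are illustrative, and custom log formats are not handled.

```python
from typing import Dict, List, Tuple

# Field order of the default flow log format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line: str) -> Dict[str, str]:
    """Split one default-format record into named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def rejected_flows(lines: List[str]) -> List[Tuple[str, str, int]]:
    """Return (source, destination, destination port) for each REJECT record."""
    result = []
    for line in lines:
        record = parse_record(line)
        if record["action"] == "REJECT":
            result.append((record["srcaddr"], record["dstaddr"], int(record["dstport"])))
    return result


sample = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]

print(rejected_flows(sample))
```

A pattern worth remembering: when the request is recorded as ACCEPT but the response is recorded as REJECT, a network ACL is the likely cause, because security groups are stateful and allow return traffic automatically.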

Tip: for SOA-C03, convert each missed practice question into a short runbook rule (signal -> root cause -> first safe remediation).
