SOA-C03 Syllabus - Objectives by Domain

Blueprint-aligned learning objectives for AWS Certified CloudOps Engineer - Associate (SOA-C03), organized by domain with quick links to targeted practice.

Use this syllabus as your source of truth for SOA-C03. Work through each domain in order and drill focused sets after every task.

What’s covered

Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization (22%)

Task 1.1 - Implement metrics, alarms, and filters by using AWS monitoring and logging services

  • Differentiate CloudWatch metrics, logs, and alarms and choose the right signal type for a given operational requirement.
  • Configure CloudWatch metric collection and CloudWatch Logs for common AWS services and workloads.
  • Configure CloudTrail to record management events and deliver logs to Amazon S3 and/or CloudWatch Logs for auditing and investigations.
  • Use CloudWatch Logs Insights to filter, aggregate, and analyze operational logs to answer specific troubleshooting questions.
  • Integrate CloudWatch with Amazon Managed Service for Prometheus and Amazon Managed Grafana at a high level to monitor containerized workloads.
  • Configure and manage the CloudWatch agent on EC2 to collect OS-level metrics (CPU, memory, disk) and application log files.
  • Configure the CloudWatch agent for ECS/EKS environments at a high level to collect container logs and metrics.
  • Troubleshoot missing CloudWatch agent metrics or logs by validating IAM permissions, network reachability, and agent configuration.
  • Create CloudWatch metric alarms with appropriate thresholds, evaluation periods, and actions to detect unhealthy conditions.
  • Configure composite alarms to reduce alert fatigue and represent dependent conditions across multiple metrics.
  • Route alarm events through EventBridge to trigger automation targets (for example, Lambda or Systems Manager) at a high level.
  • Troubleshoot alarm behavior (flapping, INSUFFICIENT_DATA, missing datapoints) by adjusting metric selection and alarm settings.
  • Create CloudWatch dashboards that summarize key service metrics and alarm states for a workload or fleet.
  • Configure cross-account and cross-Region observability dashboards for centralized operations monitoring.
  • Configure Amazon SNS topics and subscriptions for alert notifications and connect CloudWatch alarms to SNS.
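
The alarm objectives above (thresholds, evaluation periods, missing datapoints, INSUFFICIENT_DATA) can be pictured with a small Python sketch. It is a simplified model of "M out of N datapoints" evaluation, not CloudWatch's actual algorithm; the function name `evaluate_alarm` and its parameters are hypothetical, and the "ignore" missing-data option is deliberately left out.

```python
from typing import Optional, Sequence


def evaluate_alarm(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    datapoints_to_alarm: int,
    treat_missing: str = "missing",
) -> str:
    """Return 'ALARM', 'OK', or 'INSUFFICIENT_DATA' for one evaluation window.

    datapoints holds the last N period values (None marks a missing datapoint).
    A datapoint breaches when it is strictly greater than the threshold.
    This is an illustrative model only, not the service's real logic.
    """
    if treat_missing not in ("missing", "breaching", "notBreaching"):
        raise ValueError(f"unsupported treat_missing value: {treat_missing}")

    # With the "missing" setting, a window with no data at all cannot be judged.
    if treat_missing == "missing" and all(d is None for d in datapoints):
        return "INSUFFICIENT_DATA"

    breaching = 0
    for d in datapoints:
        if d is None:
            # Missing datapoints count as breaching only when configured so.
            if treat_missing == "breaching":
                breaching += 1
        elif d > threshold:
            breaching += 1
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"
```

Raising `datapoints_to_alarm` relative to the window length is one way to think about damping a flapping alarm: a single spike no longer changes the state.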

Task 1.2 - Identify and remediate issues by using monitoring and availability metrics

  • Analyze CloudWatch metrics and logs to identify performance degradation and availability issues in AWS workloads.
  • Correlate operational symptoms with recent configuration or deployment changes by using CloudTrail and resource history.
  • Select an automated remediation approach (Auto Scaling, Lambda, Systems Manager Automation) based on the incident type and blast radius.
  • Implement CloudWatch alarm-driven remediation by invoking Systems Manager Automation runbooks.
  • Implement CloudWatch alarm-driven remediation by invoking Lambda functions at a high level.
  • Configure scaling policies (target tracking or step scaling) to remediate sustained load and maintain service performance.
  • Configure notification workflows for incidents using AWS User Notifications and/or SNS at a high level.
  • Configure EventBridge rules to route service events (for example, EC2 state changes) to operational targets.
  • Use EventBridge input transformers or enrichment patterns at a high level to add context before delivering events to targets.
  • Configure EventBridge targets with retry and dead-letter queue settings to improve operational reliability.
  • Troubleshoot EventBridge rules that do not trigger by validating event patterns, permissions, and bus/Region selection.
  • Run prebuilt Systems Manager Automation runbooks to perform common operational tasks (restart, recover, remediate) safely.
  • Create custom Systems Manager Automation documents that call AWS APIs or scripts to automate repeatable operations at a high level.
  • Troubleshoot Automation execution failures by validating assume-role permissions, parameters, and resource preconditions.
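
When an EventBridge rule does not trigger, the event pattern is often the first thing to check. The Python sketch below models only the most basic matching rule (every pattern key must exist in the event, and leaf values are lists of allowed values); prefix, numeric, and anything-but matching are not covered, and the helper name `matches` is hypothetical.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Return True if the event satisfies a basic EventBridge-style pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern objects are matched against nested event objects.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # Leaf patterns are lists of allowed values; any one may match.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


# Pattern for EC2 instances entering the "stopped" state.
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}

# Illustrative event; the instance ID is a made-up placeholder.
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "region": "us-east-1",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
```

Testing a captured sample event against the pattern in this way separates pattern mistakes from permission or bus/Region problems.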

Task 1.3 - Implement performance optimization strategies for compute, storage, and database resources

  • Use performance metrics and resource tags to identify hotspots and bottlenecks across compute resources.
  • Interpret AWS Compute Optimizer recommendations and choose right-sizing actions while managing risk and change windows.
  • Tune Auto Scaling behavior and instance health checks to meet latency and throughput targets during scaling events.
  • Diagnose whether a workload is CPU-bound, memory-bound, or IO-bound using CloudWatch and OS metrics.
  • Interpret key EBS performance metrics (queue length, throughput, IOPS, burst balance) to diagnose volume constraints.
  • Select an appropriate EBS volume type (for example, gp3, io2, st1) based on performance and cost requirements.
  • Modify EBS volume size, IOPS, and throughput safely to improve performance without unnecessary downtime.
  • Troubleshoot EBS performance issues caused by instance limits, mis-sized volumes, or suboptimal configuration.
  • Optimize S3 upload and download performance using multipart uploads, concurrency, and request patterns.
  • Choose between S3 Transfer Acceleration, DataSync, and standard transfers based on data location and throughput requirements.
  • Apply S3 lifecycle policies and storage class choices to align performance, retrieval patterns, and cost objectives.
  • Select shared storage solutions (EFS vs FSx variants) based on protocol needs (NFS/SMB) and performance characteristics.
  • Configure EFS performance and throughput modes and lifecycle policies to optimize cost and performance for a workload.
  • Monitor RDS performance using Performance Insights and CloudWatch alarms and identify likely bottleneck categories.
  • Select remediation actions for RDS performance issues (instance class, storage, parameter group, connection management).
  • Use RDS Proxy at a high level to improve database connection efficiency for bursty application workloads.
  • Use EC2 placement groups appropriately (cluster, spread, partition) to improve performance or resilience for a given workload.
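
For the S3 multipart objective, a short Python sketch shows how part size and part count interact. It assumes the commonly documented multipart limits (at most 10,000 parts, parts between 5 MiB and 5 GiB except the last); the helper name `plan_parts` and the 8 MiB default are assumptions for illustration only.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART_SIZE = 5 * MIB   # minimum size for every part except the last
MAX_PART_SIZE = 5 * GIB   # maximum size of a single part
MAX_PARTS = 10_000        # maximum number of parts per upload


def plan_parts(object_size: int, preferred_part_size: int = 8 * MIB) -> tuple:
    """Return (part_size, part_count) that stays within the multipart limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size when the preferred size would exceed 10,000 parts.
    part_size = max(
        preferred_part_size,
        MIN_PART_SIZE,
        math.ceil(object_size / MAX_PARTS),
    )
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for these multipart limits")
    return part_size, math.ceil(object_size / part_size)
```

Larger parts mean fewer requests, while smaller parts allow more parallelism and cheaper retries; the right balance depends on the network path and the client.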

Content Domain 2: Reliability and Business Continuity (22%)

Task 2.1 - Implement scalability and elasticity

  • Differentiate scalability from elasticity and apply the concepts to real operations scenarios on AWS.
  • Configure Auto Scaling groups with target tracking policies to maintain performance under changing load.
  • Configure Auto Scaling group health checks, instance replacement behavior, and capacity settings to maintain availability.
  • Use scheduled scaling to prepare for predictable traffic patterns and operational events.
  • Configure ECS service auto scaling or EKS scaling at a high level to maintain application performance.
  • Tune scaling cooldowns and stabilization windows to prevent thrashing and reduce operational noise.
  • Use CloudFront caching to reduce origin load and improve global scalability for web applications.
  • Configure CloudFront cache behaviors (TTL, cache key) appropriate for dynamic and static content patterns.
  • Use ElastiCache to offload frequent reads and reduce database load for scalable applications.
  • Choose between Redis and Memcached for caching based on persistence, features, and operational requirements.
  • Configure DynamoDB capacity mode and auto scaling to support variable traffic patterns reliably.
  • Apply caching strategies such as DynamoDB Accelerator (DAX) or application caching to improve read scalability.
  • Configure RDS scaling strategies at a high level (read replicas, instance sizing) to meet workload demand.
  • Interpret scalability metrics (request rate, latency, CPU, connections) to validate that scaling meets SLOs.
  • Troubleshoot scaling failures by validating scaling policies, metrics, permissions, and capacity constraints.
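
To reason about target tracking, it can help to estimate the capacity a policy is likely to aim for. The Python sketch below uses a simple proportional estimate clamped to the group's minimum and maximum size. This is an assumption for study purposes; the real service logic is not reproduced here, and the function name `estimate_desired_capacity` is hypothetical.

```python
import math


def estimate_desired_capacity(
    current_capacity: int,
    current_metric: float,
    target_metric: float,
    min_size: int,
    max_size: int,
) -> int:
    """Proportional estimate of desired capacity, clamped to group limits."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # If the metric is 1.5x the target, roughly 1.5x the capacity is needed.
    proposed = math.ceil(current_capacity * current_metric / target_metric)
    return max(min_size, min(max_size, proposed))
```

The clamp illustrates a common scaling failure: if the estimate exceeds the maximum size, the policy cannot reach its target no matter how the metric behaves.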

Task 2.2 - Implement highly available and resilient environments

  • Choose the appropriate Elastic Load Balancing option (ALB, NLB, GWLB) for a highly available workload.
  • Configure target groups and health checks to detect unhealthy instances accurately and trigger failover.
  • Configure cross-zone load balancing and understand its impact on distribution behavior and cost.
  • Configure listeners and routing at a high level to support resilient application traffic patterns.
  • Troubleshoot unhealthy targets and 5xx errors using ELB metrics and access logs.
  • Configure Route 53 health checks and failover routing to route traffic away from unhealthy endpoints.
  • Apply Route 53 routing policies (failover, weighted, latency) to improve availability and resilience.
  • Implement multi-AZ compute patterns using Auto Scaling groups across multiple subnets and Availability Zones.
  • Configure RDS Multi-AZ deployments and understand failover behavior and operational considerations.
  • Configure Aurora replicas and failover behavior at a high level to support resilience requirements.
  • Identify AZ-scoped single points of failure (for example, NAT gateways, instance-local state) and remediate them.
  • Use regional services (for example, S3, DynamoDB) where appropriate to reduce the impact of AZ failures.
  • Validate high availability by simulating failures and verifying health check and failover behavior.
  • Reduce blast radius by using isolation boundaries (AZs, subnets, partitions) and controlled rollout practices.
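
One way to check a multi-AZ design is to ask how many instances each Availability Zone needs so that losing a single AZ still leaves enough capacity. The Python sketch below does that arithmetic; it assumes instances are spread evenly and that only one AZ fails at a time, and the function name `instances_per_az` is hypothetical.

```python
import math


def instances_per_az(required_instances: int, az_count: int) -> int:
    """Instances to run in each AZ so that one AZ can fail without a shortfall."""
    if az_count < 2:
        raise ValueError("at least two AZs are needed to survive an AZ failure")
    # After losing one AZ, the remaining (az_count - 1) AZs must carry the load.
    return math.ceil(required_instances / (az_count - 1))
```

For example, a workload that needs 6 instances would run 3 per AZ across three AZs (9 in total) under these assumptions, or 6 per AZ across two AZs (12 in total).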

Task 2.3 - Implement backup and restore strategies

  • Create AWS Backup plans with schedules, lifecycle policies, and vault configuration to meet retention requirements.
  • Select and assign backup resources using tags and resource assignments to standardize coverage.
  • Configure cross-account and cross-Region backup copy policies to meet business continuity requirements.
  • Restore common resources (EBS volumes, EC2 instances, RDS databases, DynamoDB tables, EFS) using AWS Backup.
  • Explain RPO and RTO and map them to backup frequency, restore approach, and operational runbooks.
  • Perform RDS snapshot restore and point-in-time recovery to meet stated RTO/RPO and cost constraints.
  • Perform DynamoDB point-in-time recovery at a high level and validate restore outcomes.
  • Validate backup integrity by conducting restore drills and documenting operational results.
  • Enable and manage S3 versioning to protect against accidental deletion and overwrite scenarios.
  • Recover data by managing S3 delete markers and restoring previous object versions operationally.
  • Use FSx backup or snapshot capabilities at a high level to support recovery objectives.
  • Create and follow a disaster recovery runbook that includes failover and failback steps for a workload.
  • Choose an appropriate DR strategy (backup and restore, pilot light, warm standby) based on requirements and cost.
  • Automate snapshots and backups for EC2/EBS/RDS resources using AWS Backup or native mechanisms.
  • Troubleshoot failed backup jobs by validating permissions, vault policies, configuration, and service prerequisites.
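
The mapping from RPO and RTO to backup frequency and restore steps can be sketched as simple arithmetic in Python. The model below is a simplification: it treats the backup interval as the worst-case data loss and sums the restore steps to estimate recovery time. The function and parameter names are hypothetical.

```python
from datetime import timedelta


def check_objectives(
    backup_interval: timedelta,
    restore_steps: list,
    rpo: timedelta,
    rto: timedelta,
) -> dict:
    """Compare a backup schedule and restore plan against RPO and RTO targets."""
    # Worst case: the failure happens just before the next backup would run.
    worst_case_data_loss = backup_interval
    # Recovery time is estimated as the sum of the sequential restore steps.
    estimated_recovery = sum(restore_steps, timedelta())
    return {
        "meets_rpo": worst_case_data_loss <= rpo,
        "meets_rto": estimated_recovery <= rto,
        "estimated_recovery": estimated_recovery,
    }
```

Restore drills are what turn the step durations from guesses into measured values.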

Content Domain 3: Deployment, Provisioning, and Automation (22%)

Task 3.1 - Provision and maintain cloud resources

  • Create and manage AMIs by using EC2 Image Builder pipelines and manage versioning across releases.
  • Apply image hardening and patching practices during AMI builds to reduce operational risk.
  • Distribute AMIs across Regions and accounts and manage rollback to a prior image version when needed.
  • Build, tag, and manage container images and store them in Amazon ECR for operational use.
  • Create and update CloudFormation stacks using safe change practices such as change sets and drift detection.
  • Troubleshoot CloudFormation stack failures by using stack events, error messages, and rollback states.
  • Use AWS CDK at a high level to synthesize and deploy CloudFormation stacks as infrastructure as code.
  • Diagnose subnet sizing and IP exhaustion issues that prevent deployments and remediate with CIDR planning.
  • Diagnose IAM permission issues that prevent resource provisioning and remediate with least-privilege policies.
  • Deploy standardized resources across multiple accounts and Regions by using CloudFormation StackSets.
  • Share resources across accounts by using AWS RAM (for example, subnets or Transit Gateway attachments) at a high level.
  • Implement deployment strategies (rolling, blue/green, canary) to minimize downtime and reduce risk during changes.
  • Configure deployment services at a high level (for example, CodeDeploy or ECS deployment options) to support automatic rollback.
  • Apply consistent tagging standards to provisioned resources to support operations, cost allocation, and governance.
  • Use Terraform at a high level to provision AWS resources while managing state safely and predictably.
  • Use Git workflows at a high level (branching, pull requests, reviews) to manage infrastructure as code changes.
  • Remediate common deployment issues such as parameter misconfiguration, dependency ordering, and Region constraints.
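
Subnet IP exhaustion is easier to diagnose with the address math in hand. The Python sketch below uses the standard `ipaddress` module and accounts for the five addresses that AWS reserves in every subnet; the helper names `usable_ips` and `can_fit` are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def usable_ips(cidr: str) -> int:
    """Number of addresses in an IPv4 subnet that resources can actually use."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - AWS_RESERVED_PER_SUBNET, 0)


def can_fit(cidr: str, in_use: int, needed: int) -> bool:
    """True if the subnet has room for `needed` more addresses."""
    return usable_ips(cidr) - in_use >= needed
```

A /28 subnet leaves only 11 usable addresses, so a deployment that needs several new network interfaces at once can fail even when the subnet looks mostly empty.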

Task 3.2 - Automate the management of existing resources

  • Use Systems Manager Run Command to execute operational actions on managed instances at scale.
  • Configure Patch Manager to apply OS and application patches on a schedule with controlled blast radius.
  • Use State Manager to enforce configuration baselines and remediate configuration drift.
  • Store and retrieve configuration values securely using Parameter Store and related Systems Manager capabilities.
  • Use Session Manager for secure administrative access without opening inbound ports or managing SSH keys.
  • Create automation workflows in Systems Manager Automation to restart services, remediate issues, and standardize operations.
  • Configure maintenance windows and associations to control timing and scope of automated operational tasks.
  • Automate operational tasks based on events by using Lambda and EventBridge at a high level.
  • Configure S3 event notifications to trigger automation when objects are created, overwritten, or deleted.
  • Implement guardrails for automation (approvals, rate limits, scoped IAM roles) to reduce operational risk.
  • Monitor automation executions and troubleshoot failures by using execution history, logs, and IAM diagnostics.
  • Combine CloudWatch alarms with automation targets to close the loop on detection and remediation.
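
For event-driven automation from S3, a handler usually starts by pulling the bucket, key, and event name out of each notification record. The Python sketch below assumes the standard S3 notification record layout and decodes the URL-encoded object key; the function name `summarize_s3_event` and the sample values are hypothetical.

```python
from urllib.parse import unquote_plus


def summarize_s3_event(event: dict) -> list:
    """Return (event_name, bucket, key) tuples for each record in the event."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (for example, spaces become '+').
        key = unquote_plus(s3["object"]["key"])
        results.append((record["eventName"], s3["bucket"]["name"], key))
    return results


# Made-up sample notification for illustration.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/2024+q1/summary%281%29.csv"},
            },
        }
    ]
}
```

Forgetting to decode the key is a frequent cause of "object not found" errors in downstream automation.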

Content Domain 4: Security and Compliance (16%)

Task 4.1 - Implement and manage security and compliance tools and policies

  • Configure IAM password policies and multi-factor authentication (MFA) requirements for human users.
  • Design and implement IAM roles and trust policies for secure service-to-service and cross-account access.
  • Apply IAM policy conditions (for example, tags, source IP, MFA present) to enforce least privilege.
  • Configure and use federated identity and IAM Identity Center at a high level for centralized access management.
  • Implement resource-based policies for services such as S3 or KMS and reason about policy evaluation at a high level.
  • Troubleshoot AccessDenied errors by using CloudTrail and identifying the calling principal and API action.
  • Use the IAM policy simulator to validate effective permissions before deploying access changes.
  • Use IAM Access Analyzer findings to detect unintended external access and remediate exposure.
  • Implement secure multi-account strategies using AWS Organizations and Control Tower concepts at a high level.
  • Use service control policies (SCPs) to enforce guardrails across accounts, recognizing that SCPs limit permissions but never grant them.
  • Interpret AWS Trusted Advisor security checks and prioritize remediation actions.
  • Operationalize remediation for common Trusted Advisor findings (for example, public S3 access, open security groups).
  • Enforce compliance constraints on Region and service usage by using SCPs and governance controls.
  • Use AWS Config at a high level to assess compliance posture and record configuration changes for auditing.
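
The interaction between identity policies and SCPs can be sketched in Python at a high level. The model below covers only three ideas: an explicit deny always wins, an action must be allowed by the identity policy, and the SCP must also allow it. Resources, conditions, resource-based policies, and permissions boundaries are ignored, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _decision(statements: list, action: str) -> str:
    """Return 'explicit_deny', 'allow', or 'implicit_deny' for one policy set."""
    allowed = False
    for statement in statements:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        # Compare case-insensitively and honor '*' wildcards in action names.
        if any(fnmatchcase(action.lower(), a.lower()) for a in actions):
            if statement["Effect"] == "Deny":
                return "explicit_deny"
            allowed = True
    return "allow" if allowed else "implicit_deny"


def is_allowed(action: str, identity_statements: list, scp_statements: list) -> bool:
    """An action needs an allow from both layers and no explicit deny in either."""
    return (
        _decision(identity_statements, action) == "allow"
        and _decision(scp_statements, action) == "allow"
    )


identity = [{"Effect": "Allow", "Action": "s3:*"}]
scp = [
    {"Effect": "Allow", "Action": "*"},
    {"Effect": "Deny", "Action": ["s3:DeleteBucket"]},
]
```

An SCP that allows everything still grants nothing on its own: with an empty identity policy, every action stays implicitly denied.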

Task 4.2 - Implement strategies to protect data and infrastructure

  • Define and implement a data classification scheme and apply classification through tagging and access controls.
  • Use Amazon Macie at a high level to discover and classify sensitive data stored in Amazon S3.
  • Enforce encryption at rest for common AWS services (EBS, S3, RDS, DynamoDB) by using AWS KMS.
  • Manage KMS key policies and grants to enable least-privilege access to encrypted data.
  • Troubleshoot KMS-related access failures by validating key policy, IAM policy, key state, and Region.
  • Configure encryption in transit by using ACM certificates for endpoints such as ALB, CloudFront, or API endpoints.
  • Troubleshoot TLS and certificate issues (expired certificates, wrong domain/SNI, chain problems) at a high level.
  • Store application secrets in AWS Secrets Manager and retrieve them securely from workloads using IAM roles.
  • Configure secret rotation and monitor rotation outcomes to ensure credentials remain valid.
  • Use AWS Config rules to detect noncompliant resource configurations and trigger remediation workflows.
  • Interpret GuardDuty findings and select appropriate incident response and remediation actions at a high level.
  • Interpret Inspector findings for vulnerabilities and prioritize remediation actions for affected resources.
  • Aggregate and triage security findings in Security Hub and route them into operational workflows.
  • Implement incident response actions to protect infrastructure (isolation, credential rotation, evidence preservation).

Content Domain 5: Networking and Content Delivery (18%)

Task 5.1 - Implement and optimize networking features and connectivity

  • Create and configure VPCs, subnets, and route tables to support public and private tiers for workloads.
  • Configure security groups and network ACLs and explain their differences in statefulness and evaluation order.
  • Configure internet gateways and NAT gateways to enable inbound and outbound connectivity appropriately for public and private subnets.
  • Use egress-only internet gateways to enable IPv6 outbound-only internet access for private subnets.
  • Configure VPC endpoints (gateway and interface) to access AWS services privately without traversing the public internet.
  • Implement VPC peering connectivity and understand limitations such as non-transitive routing.
  • Implement Transit Gateway connectivity at a high level for hub-and-spoke routing across multiple VPCs.
  • Implement AWS PrivateLink at a high level to access services privately across VPCs and accounts.
  • Configure AWS Client VPN at a high level to provide secure user connectivity to VPC resources.
  • Configure Site-to-Site VPN at a high level to provide hybrid connectivity to AWS networks.
  • Audit network protection services (DNS Firewall, WAF, Shield, Network Firewall) and validate that controls are applied correctly.
  • Enable and review logs and metrics for network protection services to validate effectiveness and support investigations.
  • Optimize network architecture cost by reducing NAT gateway data processing and minimizing cross-AZ data transfer.
  • Choose private connectivity options (VPC endpoints, PrivateLink) versus public endpoints based on security and cost requirements.
  • Identify common connectivity-breaking misconfigurations (routes, DNS, SG/NACL) and select appropriate remediations.
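
The evaluation-order difference between network ACLs and security groups can be shown with a short Python sketch. It models only network ACL behavior: rules are checked in ascending rule-number order, the first match decides, and traffic that matches no rule is denied. The rule tuple layout and the function name `nacl_decision` are assumptions for illustration.

```python
import ipaddress


def nacl_decision(rules: list, src_ip: str, port: int) -> str:
    """Return 'allow' or 'deny' for inbound traffic under NACL-style rules.

    Each rule is (rule_number, cidr, port_from, port_to, action).
    """
    address = ipaddress.ip_address(src_ip)
    # Lowest rule number is evaluated first; the first match wins.
    for _number, cidr, port_from, port_to, action in sorted(rules, key=lambda r: r[0]):
        if address in ipaddress.ip_network(cidr) and port_from <= port <= port_to:
            return action
    # No rule matched: behave like the implicit final deny.
    return "deny"


rules = [
    (100, "10.0.0.0/16", 443, 443, "allow"),
    (90, "10.0.5.0/24", 443, 443, "deny"),
]
```

Security groups behave differently: they contain only allow rules, all rules are considered together, and return traffic is permitted automatically because they are stateful.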

Task 5.2 - Configure domains, DNS services, and content delivery

  • Configure Route 53 hosted zones and record sets for public and private DNS use cases.
  • Configure Route 53 Resolver inbound and outbound endpoints at a high level for hybrid DNS resolution.
  • Configure private hosted zones and associate them with VPCs correctly to support split-horizon DNS.
  • Implement Route 53 routing policies (simple, weighted, latency, failover) to meet availability and performance goals.
  • Implement Route 53 health checks and understand how they influence failover decisions and routing.
  • Enable Route 53 query logging and interpret logs to troubleshoot DNS resolution problems.
  • Troubleshoot DNS issues such as resolver misconfiguration, split-horizon conflicts, and record set mistakes.
  • Configure CloudFront distributions with origins, behaviors, and cache policies for content delivery.
  • Configure CloudFront origin access control (OAC) at a high level and restrict direct access to the origin.
  • Tune CloudFront caching behavior (TTL, cache key) to balance performance and correctness for a workload.
  • Use CloudFront and AWS WAF together at a high level to protect edge-delivered applications.
  • Configure Global Accelerator endpoints and health checks at a high level for improved performance and availability.
  • Choose between CloudFront and Global Accelerator based on protocol, caching needs, and performance requirements.
  • Troubleshoot content delivery issues by using CloudFront logs, metrics, and cache invalidations.
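
For weighted routing, the expected share of traffic per record is its weight divided by the sum of all weights in the group. The Python sketch below computes that share; it ignores health checks and DNS caching effects, and the function name `traffic_share` is hypothetical.

```python
def traffic_share(weights: dict) -> dict:
    """Expected fraction of DNS responses per record under weighted routing."""
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # When every weight is 0, records are returned with equal probability.
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}
```

A 90/10 split between two records is a typical starting point for a canary-style shift, keeping in mind that resolver caching makes the observed split approximate.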

Task 5.3 - Troubleshoot network connectivity issues

  • Use VPC Reachability Analyzer to determine why traffic between two endpoints fails.
  • Troubleshoot routing issues caused by route tables, subnet associations, and missing or incorrect routes.
  • Troubleshoot security issues caused by security groups and network ACL rules for inbound and outbound traffic.
  • Troubleshoot NAT gateway and internet gateway connectivity issues for workloads in private subnets.
  • Troubleshoot Transit Gateway attachments and route propagation at a high level when connectivity is broken.
  • Collect and interpret VPC Flow Logs to identify accepted versus rejected traffic and the likely rejecting layer.
  • Enable and interpret ELB access logs to diagnose client errors, backend errors, and routing issues.
  • Enable and interpret AWS WAF web ACL logs to diagnose blocked requests and reduce false positives.
  • Use CloudFront logs and metrics to diagnose edge errors, origin timeouts, and caching behavior.
  • Identify and remediate CloudFront caching issues by adjusting cache policies and using invalidations appropriately.
  • Troubleshoot hybrid connectivity issues for VPN-based connections at a high level (tunnel state, routes, DNS).
  • Troubleshoot private connectivity issues involving VPC endpoints, PrivateLink, and DNS resolution at a high level.
  • Configure and analyze CloudWatch network monitoring metrics and features to detect and investigate connectivity degradation.
  • Apply a systematic troubleshooting workflow: symptom, scope, signals (logs/metrics), root cause, and remediation.
  • Validate remediation by re-running reachability checks and monitoring post-change metrics and logs.
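
Interpreting VPC Flow Logs usually begins with splitting each record into named fields and filtering on the action. The Python sketch below assumes the default (version 2) record format with its 14 space-separated fields; the helper names and sample records are illustrative.

```python
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_log(line: str) -> dict:
    """Split one default-format flow log record into a field dictionary."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def rejected(lines: list) -> list:
    """Return parsed records whose action is REJECT."""
    return [r for r in map(parse_flow_log, lines) if r["action"] == "REJECT"]


sample_lines = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 "
    "49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]
```

Flow logs show that traffic was rejected but not which rule rejected it, so the next step is to compare the destination port and direction against the relevant security group and network ACL rules.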

Tip: for SOA-C03, convert misses into short runbook rules (signal -> root cause -> first safe remediation).