DEA-C01 — AWS Certified Data Engineer – Associate Exam Blueprint
Last revised: June 29, 2026
Independent exam blueprint for AWS Certified Data Engineer – Associate (DEA-C01) readiness, covering ingestion, transformation, storage, operations, security, and governance.
How to Use This Exam Blueprint
Use this checklist as a practical readiness map for the AWS Certified Data Engineer – Associate (DEA-C01) exam from AWS. It is not a replacement for the official exam guide, and it does not claim exact exam weighting. Instead, it turns likely exam topic areas into concrete review tasks.
For each area, ask:
Can I choose the right AWS service for the scenario?
Can I explain why the wrong options are wrong?
Can I identify security, cost, reliability, and operational tradeoffs?
Can I troubleshoot a broken pipeline from symptoms, logs, permissions, schema changes, or data quality signals?
Can I connect ingestion, storage, transformation, cataloging, orchestration, monitoring, and governance into an end-to-end data architecture?
What happens if updated_at is missing or duplicated?
Should the table be partitioned by a date column to serve common queries?
Does this logic belong in a curated table, a view, or a downstream report?
Data quality and validation checklist
| Quality dimension | What to test | Example failure |
| --- | --- | --- |
| Completeness | Required fields exist and are populated | Missing customer ID |
| Validity | Values match expected type, range, or pattern | Negative quantity where not allowed |
| Uniqueness | Business keys are not duplicated unexpectedly | Duplicate order ID |
| Consistency | Related datasets agree | Order references unknown customer |
| Timeliness | Data arrives within expected freshness window | Daily file missing |
| Accuracy | Values match source of truth | Aggregates do not reconcile |
| Schema conformity | Fields and types match contract | String date replaces timestamp |
| Volume anomaly | Record counts are within expected bounds | Sudden 90% drop in rows |
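Several of the dimensions in the table can be checked with very little code. The sketch below is one possible illustration in plain Python, not a reference implementation: it covers completeness, validity, uniqueness, and volume anomaly, and it quarantines bad records instead of failing the batch. The field names (`order_id`, `customer_id`, `quantity`), the `validate_batch` helper, and the 50% drop threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical data contract: these field names are assumptions for illustration.
REQUIRED_FIELDS = ("order_id", "customer_id", "quantity")


def validate_batch(records, expected_rows, max_drop=0.5):
    """Split a batch into accepted and quarantined records, plus batch-level issues."""
    # Uniqueness: count business keys across the whole batch first.
    key_counts = Counter(r.get("order_id") for r in records)
    accepted, quarantined = [], []
    for record in records:
        reasons = []
        # Completeness: every required field must exist and be populated.
        for field in REQUIRED_FIELDS:
            if record.get(field) in (None, ""):
                reasons.append(f"missing {field}")
        # Validity: quantity must be a non-negative integer.
        quantity = record.get("quantity")
        if quantity is not None and (not isinstance(quantity, int) or quantity < 0):
            reasons.append("invalid quantity")
        # Uniqueness: every copy of a repeated business key is quarantined.
        order_id = record.get("order_id")
        if order_id is not None and key_counts[order_id] > 1:
            reasons.append("duplicate order_id")
        if reasons:
            quarantined.append({"record": record, "reasons": reasons})
        else:
            accepted.append(record)
    batch_issues = []
    # Volume anomaly: flag a batch that is far smaller than expected.
    if expected_rows and len(records) < expected_rows * (1 - max_drop):
        batch_issues.append("volume drop")
    return accepted, quarantined, batch_issues
```

Keeping the reasons next to each quarantined record makes it easier to decide later whether a rule should fail the whole pipeline or only divert the offending rows.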
Readiness prompts:
Can you decide whether bad records should fail the pipeline or be quarantined?
Can you design a retry that does not reload already accepted records?
Can you explain how to alert on missing files or stale partitions?
Can you identify whether validation belongs at ingestion, transformation, or consumption?
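Two of these prompts, the retry that does not reload accepted records and the alert on stale partitions, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `order_id` is taken to be the business key, the set of accepted keys stands in for whatever durable store a real pipeline would use, and partitions are assumed to be one per calendar day. The helper names are hypothetical.

```python
from datetime import date, timedelta


def load_idempotently(batch, accepted_keys, sink):
    """Append only records whose business key has not been accepted before."""
    loaded = 0
    for record in batch:
        key = record["order_id"]  # assumed business key
        if key in accepted_keys:
            continue  # already accepted by an earlier attempt, so skip it
        sink.append(record)
        accepted_keys.add(key)
        loaded += 1
    return loaded


def missing_partitions(present, start, end):
    """Return the dates in [start, end] that have no daily partition."""
    days = (end - start).days + 1
    expected = [start + timedelta(days=i) for i in range(days)]
    return [d for d in expected if d not in present]
```

Rerunning `load_idempotently` with the same batch loads nothing new, which is the property a safe retry needs. A non-empty result from `missing_partitions` is a natural trigger for an alert.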
Orchestration and workflow checklist
| Need | Pattern to review | Readiness prompt |
| --- | --- | --- |
| Scheduled daily ETL | EventBridge schedule plus Glue job or workflow | Can you handle missed or failed runs? |
| Multi-step dependency chain | Step Functions, Glue workflows, or MWAA concepts | Can you model success, failure, retry, and branching? |
| Event-driven object processing | S3 event pattern or EventBridge | Can you avoid duplicate processing? |
| Human-readable DAGs | MWAA / Apache Airflow concepts | Can you explain task dependencies and retries? |
| Conditional routing | Step Functions branching | Can you route validation failures separately? |
| Long-running distributed transform | Glue or EMR job orchestration | Can the orchestrator monitor completion and failure? |
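To make retry, failure handling, and conditional routing concrete, here is a hedged sketch of a Step Functions state machine definition in Amazon States Language, built as a Python dictionary. The job name, function name, and state names are hypothetical, and the validator function is assumed to return a payload containing a boolean `passed` field. The small `undefined_targets` helper is only a local sanity check written for this example, not an AWS API.

```python
import json

# Hypothetical resource names, used only for illustration.
GLUE_JOB_NAME = "daily-orders-etl"
VALIDATOR_FUNCTION = "validate-orders-input"

definition = {
    "Comment": "Validate input, branch on the result, then run a Glue job with retries",
    "StartAt": "ValidateInput",
    "States": {
        "ValidateInput": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": VALIDATOR_FUNCTION, "Payload.$": "$"},
            # Assumes the function returns {"passed": true or false}.
            "ResultSelector": {"passed.$": "$.Payload.passed"},
            "ResultPath": "$.validation",
            "Next": "RouteOnValidation",
        },
        "RouteOnValidation": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.validation.passed", "BooleanEquals": True, "Next": "RunTransform"}
            ],
            "Default": "QuarantineInput",  # validation failures take a separate path
        },
        "RunTransform": {
            "Type": "Task",
            # The .sync suffix makes the workflow wait for the job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": GLUE_JOB_NAME},
            "Retry": [
                {"ErrorEquals": ["States.ALL"], "IntervalSeconds": 60,
                 "MaxAttempts": 2, "BackoffRate": 2.0}
            ],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "QuarantineInput": {"Type": "Pass", "End": True},  # placeholder for a real quarantine step
        "NotifyFailure": {"Type": "Fail", "Error": "TransformFailed",
                          "Cause": "Glue job failed after retries"},
    },
}


def undefined_targets(defn):
    """Return state names that are referenced by a transition but never defined."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    for state in states.values():
        for field in ("Next", "Default"):
            if field in state:
                targets.add(state[field])
        for rule in state.get("Choices", []) + state.get("Catch", []):
            targets.add(rule["Next"])
    return sorted(targets - set(states))


print(json.dumps(definition, indent=2))
```

Reading the definition top to bottom is good practice for scenario questions: identify where a retry applies, what happens once retries are exhausted, and which branch handles bad input.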
Workflow decision path
```mermaid
flowchart TD
    A[New data or schedule] --> B{Single simple task?}
    B -->|Yes| C[Trigger job or function directly]
    B -->|No| D{Multiple dependencies or branches?}
    D -->|Yes| E[Use workflow orchestration]
    D -->|No| F[Use scheduled managed job]
    E --> G{Failure handling needed?}
    G -->|Yes| H[Add retries, alerts, quarantine, rollback or replay]
    G -->|No| I[Still log status and outputs]
```
Security and governance checklist
IAM and permissions
| Control | What to know | Ready signal |
| --- | --- | --- |
| IAM role | Service assumes a role to access resources | You can identify the execution role for Glue, Lambda, or Step Functions |
| Trust policy | Defines who can assume a role | You can troubleshoot role assumption failures |
| Identity policy | Grants actions to principals | You can scope actions and resources |
| Resource policy | Grants access at resource level | You can reason about S3 bucket policies and cross-account access |
| KMS key policy | Controls use of encryption keys | You know S3 access alone may not be enough for encrypted data |
| Lake Formation permissions | Governs data lake access | You can separate table permissions from raw S3 access concepts |
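The difference between a trust policy and an identity policy is easier to remember with both documents side by side. The sketch below builds each as a Python dictionary for a Glue execution role. The bucket name, prefix, account ID, and key identifier are hypothetical placeholders, and `allowed_actions` is a small helper written for this example.

```python
import json

# Hypothetical names, used only for illustration.
BUCKET = "example-curated-bucket"
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

# Trust policy: who may assume the role (here, the Glue service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Identity policy: what the role may do once assumed.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/orders/*",
        },
        {
            # Reading SSE-KMS encrypted objects also needs permission on the key.
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": KEY_ARN,
        },
    ],
}


def allowed_actions(policy):
    """Collect every action named in the Allow statements of a policy."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        listed = statement["Action"]
        actions.update([listed] if isinstance(listed, str) else listed)
    return actions


print(json.dumps(trust_policy, indent=2))
print(sorted(allowed_actions(identity_policy)))
```

If the `kms:Decrypt` statement were missing, a job could hold `s3:GetObject` and still fail to read objects encrypted with that key, which is the situation the KMS row in the table describes.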
Encryption and network controls
Know where encryption at rest applies: S3, Redshift, databases, streams, logs, and intermediate outputs.
Know why encryption in transit matters for service-to-service and client-to-service traffic.
Recognize when KMS permissions are needed in addition to service permissions.
Understand why private connectivity and VPC endpoints may reduce exposure to public network paths.
Recognize that security groups and subnet routing can affect connectivity for jobs accessing data stores.
Understand audit needs using CloudTrail, service logs, and access logs where applicable.
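One common way to enforce encryption in transit for S3 is a bucket policy that denies any request made without TLS, using the `aws:SecureTransport` condition key. The sketch below generates such a policy. The function name and the bucket name in the usage line are hypothetical, and a real bucket policy would usually contain further statements.

```python
import json


def deny_insecure_transport(bucket):
    """Build a bucket policy statement that rejects requests made without TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Hypothetical bucket name for illustration.
print(json.dumps(deny_insecure_transport("example-raw-bucket"), indent=2))
```

Because an explicit Deny overrides any Allow, this statement applies even to principals whose identity policies grant S3 access.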
Governance prompts
Which users can discover the dataset?
Which users can query the dataset?
Which users can access the underlying S3 objects?
Are sensitive columns masked, tokenized, excluded, or restricted?
Is access controlled consistently across Athena, Redshift, Glue, and other consumers?
Are data retention and deletion expectations reflected in lifecycle or pipeline design?
Can you prove who accessed or changed data-related resources?
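The prompt about sensitive columns names four treatments: mask, tokenize, exclude, or restrict. The first three can be illustrated in a transformation step, as in the sketch below. It is a simplified example rather than a recommended production design: the column names, the `COLUMN_POLICY` mapping, and the `protect` helper are assumptions, and restriction would normally be handled by access controls such as Lake Formation permissions instead of code.

```python
import hashlib
import hmac

# Hypothetical per-column policy; any column not listed is kept unchanged.
COLUMN_POLICY = {"email": "tokenize", "card_number": "mask", "ssn": "exclude"}


def protect(record, secret):
    """Apply the column policy to one record and return a safer copy."""
    out = {}
    for column, value in record.items():
        action = COLUMN_POLICY.get(column, "keep")
        if action == "exclude":
            continue  # drop the column entirely
        if action == "mask":
            text = str(value)
            # Keep the last four characters; values of four or fewer stay visible.
            out[column] = "*" * max(len(text) - 4, 0) + text[-4:]
        elif action == "tokenize":
            # A keyed hash gives a stable token that still supports joins.
            out[column] = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
        else:
            out[column] = value
    return out
```

Tokenizing with a keyed hash means the same input always maps to the same token, so downstream joins and distinct counts still work without exposing the original value.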
Monitoring, logging, and operations checklist
| Operational question | AWS area to review |
| --- | --- |
| Did the job run? | Glue job run history, Step Functions execution history, MWAA task status |
| Did it process the expected data? | Row counts, file counts, partition checks, data quality metrics |
| Did it fail because of permissions? | CloudWatch logs, IAM simulation concepts, CloudTrail events |
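When working from log output, a first triage pass often amounts to sorting error lines into a few likely causes. The sketch below shows that idea with simple substring matching. `AccessDenied`, `NoSuchKey`, and the phrase "is not authorized to perform" appear in real AWS error messages, but the schema-related patterns and the `triage` helper itself are assumptions made for this example.

```python
# Category patterns; the schema and "no files found" strings are hypothetical
# application messages, not guaranteed AWS output.
PATTERNS = [
    ("permissions", ("AccessDenied", "is not authorized to perform")),
    ("schema", ("schema mismatch", "cannot cast")),
    ("missing data", ("NoSuchKey", "no files found")),
]


def triage(log_lines):
    """Count log lines per likely failure category using the first matching pattern."""
    counts = {}
    for line in log_lines:
        lowered = line.lower()
        for category, needles in PATTERNS:
            if any(needle.lower() in lowered for needle in needles):
                counts[category] = counts.get(category, 0) + 1
                break  # one category per line
    return counts
```

A high count under "permissions" points toward IAM, KMS, or Lake Formation checks, while "missing data" suggests looking upstream at ingestion and partition freshness.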
Review monitoring tools and what evidence each one provides.
Final 24 hours
Skim this Exam Blueprint and mark any remaining weak areas.
Review your own missed-question notes.
Do not memorize limits, dates, prices, or quotas that you cannot verify against official sources.
Focus on service selection logic and tradeoffs.
Sleep instead of attempting to learn a new service from scratch.
Quick self-assessment table
Rate each area before you finish review.
| Area | Not ready | Almost ready | Ready |
| --- | --- | --- | --- |
| Ingestion service selection | ☐ | ☐ | ☐ |
| S3 data lake layout | ☐ | ☐ | ☐ |
| File formats and partitioning | ☐ | ☐ | ☐ |
| Glue jobs and Data Catalog | ☐ | ☐ | ☐ |
| Athena and Redshift use cases | ☐ | ☐ | ☐ |
| Streaming and CDC concepts | ☐ | ☐ | ☐ |
| IAM, KMS, and Lake Formation concepts | ☐ | ☐ | ☐ |
| Orchestration and retries | ☐ | ☐ | ☐ |
| Monitoring and troubleshooting | ☐ | ☐ | ☐ |
| Cost and performance tradeoffs | ☐ | ☐ | ☐ |
If any row is still “Not ready,” spend your next study session on scenario questions for that area. If most rows are “Almost ready,” shift from reading to timed mixed practice and post-question review.