Review a compact AWS Certified Data Engineer Associate (DEA-C01) cheat sheet for ingestion, transformation, storage, operations, data security, governance, monitoring, and pipeline decision-making before using IT Mastery practice.
Use this cheat sheet to keep the DEA-C01 decision areas distinct before you practice. The exam rewards choosing the ingestion, storage, transformation, governance, and operations pattern that fits the stated data workload.
| Item | Review cue |
|---|---|
| Exam route | AWS Certified Data Engineer Associate |
| Exam code | DEA-C01 |
| Items | 65 total |
| Time | 130 minutes |
| Practice option | Live IT Mastery practice available |
| Best use | Practice data pipeline design, data store selection, operations, governance, and troubleshooting |
| Domain | Weight | What to know | Common trap |
|---|---|---|---|
| Data ingestion and transformation | 34% | batch vs streaming, ETL vs ELT, Glue, Kinesis, orchestration, schema handling | picking streaming when batch meets the requirement |
| Data store management | 26% | lake, warehouse, object storage, databases, partitioning, cataloging, lifecycle | using one store for every access pattern |
| Data operations and support | 22% | monitoring, retries, data quality, job failures, scaling, cost, automation | troubleshooting symptoms without identifying the failed stage |
| Data security and governance | 18% | IAM, encryption, Lake Formation, catalog controls, masking, retention, audit | securing compute while leaving data access broad |
Use the pipeline map to classify DEA-C01 scenarios before choosing a service. Most misses happen when candidates solve the wrong stage: storage when the failure is ingestion, governance when the failure is cataloging, or streaming when batch is enough.
```mermaid
flowchart LR
    Ingest["Ingest events or files"] --> Transform["Transform and validate"]
    Transform --> Store["Store with partitioning"]
    Store --> Govern["Catalog and govern access"]
    Govern --> Consume["Query, BI, or ML use"]
```
| Distinction | Exam reflex |
|---|---|
| Batch vs streaming | Choose streaming for low-latency continuous events; choose batch for scheduled bulk processing. |
| ETL vs ELT | ETL transforms before loading. ELT loads first, then transforms in the target platform. |
| Data lake vs warehouse | Lakes hold raw, varied data (structured, semi-structured, unstructured). Warehouses hold curated, structured data modeled for analytics. |
| Partitioning vs indexing | Partitioning improves scan pruning and storage layout. Indexing improves lookup patterns in databases. |
| Glue Data Catalog vs data store | The catalog describes data. The store holds data. |
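| IAM vs Lake Formation | IAM controls access to AWS services and resources broadly. Lake Formation adds fine-grained permissions (database, table, column, row) on data lake resources registered in the Glue Data Catalog. |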
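
The ETL vs ELT row can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for the target platform, which is an assumption for illustration only. On AWS the target would more likely be a warehouse or a lake query engine. The table names, column names, and sample records are hypothetical.

```python
import sqlite3

# Hypothetical raw records; names and values are illustrative only.
RAW_EVENTS = [
    {"customer": " Alice ", "amount_cents": "1250"},
    {"customer": "bob", "amount_cents": "300"},
]


def run_etl(conn):
    """ETL: clean the data in the pipeline, then load only finished rows."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    cleaned = [
        (r["customer"].strip().lower(), int(r["amount_cents"]) / 100)
        for r in RAW_EVENTS
    ]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn):
    """ELT: load raw rows unchanged, then transform with SQL in the target."""
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount_cents TEXT)")
    conn.executemany(
        "INSERT INTO raw_sales VALUES (?, ?)",
        [(r["customer"], r["amount_cents"]) for r in RAW_EVENTS],
    )
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT lower(trim(customer)) AS customer,
               CAST(amount_cents AS INTEGER) / 100.0 AS amount
        FROM raw_sales
        """
    )


def fetch(conn, table):
    """Return the cleaned rows from one of the two result tables."""
    query = f"SELECT customer, amount FROM {table} ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = fetch(conn, "sales_etl")
elt_rows = fetch(conn, "sales_elt")
raw_count = conn.execute("SELECT count(*) FROM raw_sales").fetchone()[0]
```

Both paths end with the same cleaned rows. The difference is where the transformation runs and whether the raw data remains in the target afterwards: only the ELT path keeps `raw_sales` available for later reprocessing.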
DEA-C01 snippets usually test pipeline fit, partitioning, schema handling, or governance boundaries rather than syntax memorization.
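```sql
-- Scan-cost trap: filtering only on a non-partition column such as event_time
-- cannot prune partitions, so the query can still read far more data than it needs.
SELECT count(*)
FROM events
WHERE event_time >= TIMESTAMP '2026-05-01 00:00:00';

-- Better pattern when the table is partitioned by event_date: filter on the
-- partition key so only matching partitions are read.
-- Assumes event_date holds the calendar date of event_time.
SELECT count(*)
FROM events
WHERE event_date >= DATE '2026-05-01';
```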
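
To see why the partition filter matters, here is a toy model of partition pruning in plain Python. It is a sketch, not how any particular query engine is implemented. The `PARTITIONS` layout and both helper functions are hypothetical, and "rows read" stands in for bytes scanned.

```python
from datetime import date, datetime

# Hypothetical layout: partition key (event_date) -> event_time values stored in it.
PARTITIONS = {
    date(2026, 4, 29): [datetime(2026, 4, 29, 8, 0), datetime(2026, 4, 29, 21, 30)],
    date(2026, 4, 30): [datetime(2026, 4, 30, 9, 15)],
    date(2026, 5, 1): [datetime(2026, 5, 1, 0, 5), datetime(2026, 5, 1, 13, 45)],
    date(2026, 5, 2): [datetime(2026, 5, 2, 7, 0)],
}


def count_full_scan(cutoff):
    """Filter on event_time only: every partition has to be read."""
    rows_read = 0
    matches = 0
    for rows in PARTITIONS.values():
        rows_read += len(rows)
        matches += sum(1 for t in rows if t >= cutoff)
    return matches, rows_read


def count_pruned(cutoff_date):
    """Filter on the partition key: older partitions are skipped unread."""
    rows_read = 0
    matches = 0
    for event_date, rows in PARTITIONS.items():
        if event_date < cutoff_date:
            continue  # pruned: this partition is never opened
        rows_read += len(rows)
        matches += len(rows)
    return matches, rows_read


full = count_full_scan(datetime(2026, 5, 1))  # (matches, rows_read)
pruned = count_pruned(date(2026, 5, 1))       # (matches, rows_read)
```

In this toy data both filters find the same three events, but the full scan reads all six rows while the pruned scan reads only three. On the exam, that gap is the cost and latency difference the scenario is usually pointing at.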
For every missed DEA-C01 item, mark the pipeline stage: ingestion, transformation, storage, operations, or governance. If one stage dominates your misses, drill that topic before returning to mixed data-engineering sets.
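
One way to apply this review habit is a small tally script. The following is a minimal sketch: the function name `dominant_stage` and the 50% threshold for "dominates" are assumptions, not part of any official study method.

```python
from collections import Counter

# The five pipeline stages used for tagging missed items.
STAGES = ("ingestion", "transformation", "storage", "operations", "governance")


def dominant_stage(missed_stages, threshold=0.5):
    """Return the stage behind more than `threshold` of misses, else None.

    `threshold` is an assumed cutoff for "dominates"; adjust it to taste.
    """
    counts = Counter(missed_stages)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage labels: {sorted(unknown)}")
    if not counts:
        return None
    stage, n = counts.most_common(1)[0]
    return stage if n / len(missed_stages) > threshold else None


# Example: five missed items, each tagged with the stage that was misjudged.
misses = ["storage", "ingestion", "storage", "storage", "governance"]
focus = dominant_stage(misses)
balanced = dominant_stage(["ingestion", "storage", "operations", "governance"])
```

Here `focus` is `"storage"` because three of the five misses fall in that stage, while `balanced` is `None` because no single stage exceeds the threshold, which suggests staying with mixed sets.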