DEA-C01 — AWS Certified Data Engineer – Associate Study Plan
A practical 7-, 14-, 30-, and 60/90-day preparation schedule for the AWS Certified Data Engineer – Associate (DEA-C01) exam.
Study plan orientation
This study plan is for candidates preparing for the AWS Certified Data Engineer – Associate (DEA-C01) exam. It is designed for practical scheduling: diagnostic practice, focused service review, hands-on concept checks, missed-question review, timed mocks, and final-week consolidation.
Use it whether you have one week left or are starting earlier. The shorter plans prioritize weak-area repair and exam execution. The longer plans give you time to build stronger AWS data engineering judgment across ingestion, storage, transformation, orchestration, security, governance, monitoring, and troubleshooting.
Which plan should you use?
| Time available | Best fit | Time target | Main goal | Mock exam approach |
|---|---|---|---|---|
| 7 days | You already studied and need final review | 1.5-3 hours/day | Repair weak areas, review mistakes, sharpen timing | 1 full timed mock if you can review it fully |
| 14 days | You know AWS basics but need focused DEA-C01 coverage | 1.5-2.5 hours/day | Cover core data services and build exam rhythm | 1 diagnostic set + 1-2 timed mocks |
| 30 days | Balanced preparation | 1-2 hours/day | Learn, practice, review, and simulate | Diagnostic early, mocks in weeks 3 and 4 |
| 60 days | Full preparation with steady practice | 5-8 hours/week | Build service-selection depth and hands-on confidence | Sectional practice first, timed mocks late |
| 90 days | Newer to AWS data engineering or limited weekly time | 3-5 hours/week | Build foundations without cramming | More hands-on labs before full mocks |
If your time is very limited, do not try to “read everything.” Start with a diagnostic, identify the highest-value gaps, and spend most of your time reviewing missed questions and practicing realistic AWS data engineering scenarios.
Core DEA-C01 study map
Use the official AWS exam guide for the current objective list, then organize your study into these working areas.
| Study area | AWS topics to review | What you should be able to do |
|---|---|---|
| Data ingestion and movement | Amazon S3 ingestion patterns, AWS DMS, Amazon Kinesis Data Streams, Amazon Data Firehose, Amazon MSK, event-driven ingestion | Choose batch, streaming, replication, or event-based ingestion based on latency, volume, source type, and operational needs |
| Data storage and lake foundations | Amazon S3, AWS Glue Data Catalog, crawlers, partitioning, file formats, schema evolution, Amazon Redshift, Amazon DynamoDB, relational sources | Design a storage layout, catalog data, choose formats, and reason about query performance and cost |
| Transformation and processing | AWS Glue jobs, Spark concepts, AWS Glue Studio, Amazon EMR, AWS Lambda for lightweight transforms, SQL-based transforms | Pick the right processing service, troubleshoot failed jobs, and understand ETL vs ELT decisions |
| Orchestration and automation | AWS Step Functions, Amazon EventBridge, AWS Glue workflows and triggers, Amazon MWAA concepts | Coordinate multi-step pipelines, retries, dependencies, and scheduled or event-driven workflows |
| Analytics and query access | Amazon Athena, Amazon Redshift, Redshift Spectrum concepts, data warehouse vs data lake query patterns | Choose query engines and storage patterns for reporting, ad hoc analysis, and large-scale analytics |
| Security and governance | IAM roles and policies, AWS KMS, encryption, Lake Formation, S3 bucket policies, VPC endpoints, CloudTrail | Apply least privilege, protect data, manage access, and identify audit or governance controls |
| Monitoring and troubleshooting | Amazon CloudWatch logs and metrics, AWS CloudTrail, job run history, pipeline failures, schema and partition issues | Diagnose failures, identify bottlenecks, and select appropriate observability tools |
| Performance, reliability, and cost | Partitioning, compression, columnar formats, retries, idempotency, lifecycle management, workload sizing concepts | Improve pipeline efficiency without overbuilding or choosing unnecessarily complex services |
Start with a diagnostic
Before following any plan, complete a diagnostic session.
| Step | Action | Output |
|---|---|---|
| 1 | Take a mixed DEA-C01 practice set under light timing | Baseline weak areas |
| 2 | Mark every missed, guessed, or slow question | Missed-question log |
| 3 | Group misses by topic | Top 3-5 repair areas |
| 4 | Review the related AWS service behavior | Short notes in your own words |
| 5 | Retest only those weak areas | Evidence of improvement |
Do not treat the diagnostic as a pass/fail judgment. Its purpose is to decide where your limited study time should go.
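Steps 2 and 3 of the diagnostic (marking misses, then grouping them by topic) can be done on paper, but a small script makes the ranking explicit. The following is a minimal sketch, not a prescribed tool: the topic labels, the outcome values, and the `top_repair_areas` helper are all hypothetical and stand in for whatever you record by hand.

```python
from collections import Counter

# Hypothetical diagnostic results: (topic, outcome) pairs recorded by hand.
# Any outcome other than "correct" counts as needing repair.
results = [
    ("Glue crawler", "missed"),
    ("Kinesis vs Firehose", "guessed"),
    ("Lake Formation access", "missed"),
    ("Glue crawler", "slow"),
    ("Athena partitioning", "correct"),
    ("Kinesis vs Firehose", "missed"),
    ("Glue crawler", "missed"),
]


def top_repair_areas(results, limit=5):
    """Count missed, guessed, or slow questions per topic, most frequent first."""
    counts = Counter(topic for topic, outcome in results if outcome != "correct")
    return counts.most_common(limit)


repair_areas = top_repair_areas(results)
for topic, count in repair_areas:
    print(f"{topic}: {count}")
```

Counting guessed and slow questions alongside outright misses matches step 2: a lucky guess is still a gap worth repairing.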
Daily practice rhythm
Use the rhythm below for most study days. Adjust the minutes, but keep the order: recall, learn, practice, review.
| Available time | Recommended session |
|---|---|
| 45 minutes | 5 min recall, 20 min focused review, 15 min practice questions, 5 min missed-question notes |
| 60 minutes | 10 min recall, 25 min topic review, 20 min practice, 5 min summary |
| 90 minutes | 10 min recall, 35 min service review or hands-on concept check, 30 min practice, 15 min missed-question review |
| 2.5+ hours | 15 min recall, 45 min focused study, 45 min scenario practice, 30 min review, 15 min flashcards or notes |
Daily checklist
- Review yesterday’s missed questions before adding new content.
- Study one focused topic at a time, such as Glue job troubleshooting or Kinesis ingestion choices.
- Practice with scenario questions, not only definition questions.
- Write down the service-selection rule you learned.
- End with a short note: “If I see this scenario, I will choose X because Y.”
Missed-question review method
Most DEA-C01 improvement comes from reviewing why an answer was wrong, not from taking more questions.
| Log field | What to record |
|---|---|
| Topic | Example: Glue crawler, Lake Formation access, Firehose delivery, Athena partitioning |
| Mistake type | Knowledge gap, misread scenario, wrong service, missed security requirement, performance/cost tradeoff |
| Why the correct answer works | One or two plain-language sentences |
| Why your answer failed | The constraint you ignored |
| Rule to remember | A short decision rule |
| Retest date | 24-72 hours later |
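If you prefer to keep the log in a file rather than a notebook, the fields above map naturally onto a small record type. This is one possible sketch, assuming a Python dataclass; the class name, field names, and the `due_for_retest` helper are illustrative choices, and the sample entry is invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MissedQuestion:
    topic: str
    mistake_type: str
    why_correct_works: str
    why_mine_failed: str
    rule: str
    logged_on: date
    retest_after_days: int = 2  # the 24-72 hour window, expressed as 1 to 3 days

    def __post_init__(self):
        if not 1 <= self.retest_after_days <= 3:
            raise ValueError("retest_after_days must be between 1 and 3")

    @property
    def retest_date(self) -> date:
        return self.logged_on + timedelta(days=self.retest_after_days)


def due_for_retest(log, today):
    """Return the entries whose retest date is today or earlier."""
    return [entry for entry in log if entry.retest_date <= today]


# Invented example entry.
entry = MissedQuestion(
    topic="Athena partitioning",
    mistake_type="performance/cost tradeoff",
    why_correct_works="Partition filters limit how much data the query scans.",
    why_mine_failed="I ignored the requirement to reduce scanned data.",
    rule="Frequent filter on date: partition by date.",
    logged_on=date(2025, 1, 6),
)
print(entry.retest_date)
```

Running `due_for_retest(log, date.today())` at the start of a session gives you the recall list for that day.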
Common mistake categories
| Mistake category | Example repair action |
|---|---|
| Wrong ingestion service | Build a comparison table for Kinesis Data Streams, Data Firehose, MSK, DMS, and S3 batch ingestion |
| Weak IAM/security reasoning | Review IAM roles, KMS keys, S3 policies, Lake Formation permissions, and audit trails together |
| Confused catalog behavior | Practice how Glue crawlers, the Data Catalog, schemas, and partitions relate |
| Weak ETL troubleshooting | Review job logs, permissions, source connectivity, schema changes, and data format issues |
| Overlooking cost/performance | Revisit partitioning, columnar formats, compression, lifecycle policies, and query engine choice |
| Memorizing instead of reasoning | Rewrite the question as a real pipeline design decision |
Hands-on concept review
If you have access to a safe AWS practice environment, use small, controlled exercises. Clean up resources when finished and avoid deploying anything unnecessary.
| Hands-on theme | Practice task | What to learn |
|---|---|---|
| S3 data lake basics | Place sample files in S3 using different prefixes and formats | How layout affects cataloging and query patterns |
| Glue Data Catalog | Create or inspect tables, crawler behavior, schemas, and partitions | How metadata supports Athena, Glue, and other analytics tools |
| Athena query practice | Query sample data and compare layout or format choices | How partitioning and file format affect query behavior |
| Glue ETL concepts | Review job parameters, source/target settings, logs, and retries | How to diagnose transformation and permission failures |
| Streaming ingestion | Compare stream-based patterns (Kinesis Data Streams, MSK) with managed delivery-stream patterns (Data Firehose) conceptually | When to use a stream with your own consumers vs managed delivery straight to a destination |
| Orchestration | Map a pipeline with triggers, dependencies, retries, and notifications | How to coordinate multi-step data workflows |
| Security controls | Review IAM role assumptions, KMS use, S3 policies, and Lake Formation concepts | How least privilege and data governance apply to pipelines |
| Monitoring | Inspect where logs, metrics, and audit records would appear | How to troubleshoot a failing pipeline |
If you cannot use hands-on labs, replace each lab with a diagram exercise: draw the pipeline, list the AWS services, list permissions, identify failure points, and explain how you would monitor it.
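For the "S3 data lake basics" exercise, it helps to see what a partition-friendly layout looks like before uploading anything. The sketch below builds object keys with Hive-style `key=value` prefixes, a common convention that Glue crawlers and Athena can interpret as partition columns. The dataset name, file name, and the choice of `year`/`month`/`day` partitions are assumptions for illustration only.

```python
from datetime import date


def partitioned_key(dataset: str, day: date, filename: str) -> str:
    """Build an S3 object key with Hive-style key=value partition prefixes."""
    return (
        f"{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


# Hypothetical dataset and file name.
key = partitioned_key("sales", date(2025, 3, 9), "part-0000.parquet")
print(key)  # sales/year=2025/month=03/day=09/part-0000.parquet
```

Comparing this layout with a flat prefix such as `sales/part-0000.parquet` is a quick way to reason about why a query filtered by date can skip most of the data in one case and not the other.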
7-day final review plan
Use this plan when the exam is close and you already have some preparation. Do not try to learn every AWS data service from scratch in one week.
| Day | Focus | Study actions |
|---|---|---|
| Day 1 | Diagnostic and triage | Take a mixed practice set. Build a missed-question log. Pick your top 4 weak areas. |
| Day 2 | Ingestion and storage | Review S3, DMS, Kinesis options, Data Firehose, source-to-lake patterns, file formats, partitions, and schema handling. |
| Day 3 | Processing and orchestration | Review Glue jobs, Spark concepts, EMR use cases, Lambda for lightweight transforms and where its limits rule it out, Step Functions, EventBridge, Glue workflows, and pipeline dependencies. |
| Day 4 | Security and governance | Review IAM, KMS, encryption, S3 policies, Lake Formation, CloudTrail, VPC access patterns, and least-privilege pipeline roles. |
| Day 5 | Troubleshooting and performance | Drill Glue job failures, crawler/catalog issues, Athena query issues, permission failures, monitoring signals, and cost/performance tradeoffs. |
| Day 6 | Timed mock and deep review | Take one timed mock or the closest equivalent. Spend at least as much time reviewing as testing. |
| Day 7 | Light final review | Review notes, missed questions, service-selection tables, and exam logistics. Avoid heavy new content. |
7-day rule
Stop adding unfamiliar services after Day 5 unless they directly explain a repeated missed question. The final 48 hours should be for consolidation, not expansion.
14-day focused plan
Use this plan if you have two weeks and can study most days.
| Day | Focus | Output |
|---|---|---|
| 1 | Diagnostic set and exam guide review | Topic ranking and schedule adjustments |
| 2 | S3, file formats, partitioning, lifecycle concepts | Storage decision notes |
| 3 | Glue Data Catalog, crawlers, schemas, Athena basics | Catalog and query notes |
| 4 | Batch ingestion: S3, DMS, scheduled loads | Batch ingestion comparison |
| 5 | Streaming ingestion: Kinesis Data Streams, Data Firehose, MSK concepts | Streaming decision table |
| 6 | Glue ETL, Spark concepts, job configuration, retries | ETL troubleshooting notes |
| 7 | Orchestration: Step Functions, EventBridge, Glue workflows, MWAA concepts | Pipeline dependency diagram |
| 8 | Timed sectional practice | Missed-question log update |
| 9 | Security: IAM, KMS, S3 policies, Lake Formation | Security access-control map |
| 10 | Redshift, Athena, EMR, analytics service selection | Query and processing comparison |
| 11 | Monitoring and troubleshooting | Failure-mode checklist |
| 12 | Weak-area sprint | Retest of top weak topics |
| 13 | Full timed mock or near-full simulation | Timing and readiness evidence |
| 14 | Final review | Light notes, no major new topics |
14-day priorities
Spend the most time on the areas that affect many question types:
- Service selection for ingestion, processing, storage, and analytics.
- IAM, KMS, S3, and Lake Formation access patterns.
- Glue, Data Catalog, crawlers, partitions, and ETL troubleshooting.
- Monitoring, logs, retries, and pipeline reliability.
- Performance and cost tradeoffs in data lake and analytics designs.
30-day balanced plan
The 30-day path is best if you want enough time to learn, practice, review, and simulate without stretching preparation too long.
| Days | Focus | Primary tasks | Practice target |
|---|---|---|---|
| 1-2 | Diagnostic and planning | Take diagnostic practice, review the official exam guide, rank weak areas | Mixed baseline set |
| 3-5 | Data lake foundations | S3 layout, prefixes, file formats, compression, partitions, Glue Data Catalog | Storage and catalog questions |
| 6-8 | Batch ingestion | DMS, S3 ingestion, scheduled loads, source/target decisions, error handling | Batch ingestion scenarios |
| 9-11 | Streaming ingestion | Kinesis Data Streams, Data Firehose, MSK concepts, streaming-to-lake patterns | Streaming service selection |
| 12-15 | Processing and transformation | Glue jobs, Spark concepts, EMR, Lambda use cases, ETL vs ELT | ETL and processing drills |
| 16-17 | Orchestration | Step Functions, EventBridge, Glue workflows, scheduling, dependencies, retries | Pipeline workflow questions |
| 18-20 | Analytics and warehouse patterns | Athena, Redshift, Redshift Spectrum concepts, query access, data modeling considerations | Query engine selection |
| 21 | Timed mock 1 | Simulate exam conditions as closely as your practice tool allows | Full review afterward |
| 22-23 | Mock review and repair | Re-study every missed or guessed question | Retest weak areas |
| 24-25 | Security and governance | IAM, KMS, encryption, Lake Formation, S3 policies, auditability | Security scenario drills |
| 26-27 | Monitoring, troubleshooting, cost/performance | CloudWatch, CloudTrail, job logs, crawler issues, query performance, lifecycle choices | Troubleshooting drills |
| 28 | Timed mock 2 | Take a second full or near-full simulation | Timing and weak-area evidence |
| 29 | Final weak-area sprint | Review only recurring misses and service-selection rules | Short targeted sets |
| 30 | Light final review | Notes, flashcards, logistics, rest | No heavy new content |
Weekly rhythm for the 30-day plan
| Day type | What to do |
|---|---|
| New topic day | Learn the service patterns, then answer targeted questions |
| Review day | Revisit missed questions and draw architecture flows |
| Mock day | Test under timing, then review deeply |
| Repair day | Re-study only the topics that caused misses |
| Final day | Consolidate notes and protect energy |
60/90-day full preparation path
Use the 60-day path if you can consistently study about 5-8 hours per week. Use the 90-day path if you are newer to AWS data engineering, have only 3-5 hours per week, or want more hands-on reinforcement.
| Phase | 60-day timing | 90-day timing | Focus | Deliverable |
|---|---|---|---|---|
| 1 | Week 1 | Weeks 1-2 | Diagnostic, AWS data pipeline foundations, exam guide review | Baseline scorecard and study map |
| 2 | Week 2 | Weeks 3-4 | S3, Glue Data Catalog, crawlers, schemas, partitions, Athena | Data lake notes and catalog diagram |
| 3 | Week 3 | Weeks 5-6 | Batch and streaming ingestion: DMS, Kinesis, Data Firehose, MSK, S3 patterns | Ingestion service-selection table |
| 4 | Week 4 | Weeks 7-8 | Glue ETL, Spark concepts, EMR, Lambda transforms, ELT patterns | Processing comparison notes |
| 5 | Week 5 | Week 9 | Orchestration and automation: Step Functions, EventBridge, Glue workflows, MWAA concepts | Pipeline workflow diagram |
| 6 | Week 6 | Week 10 | Security and governance: IAM, KMS, S3 policies, Lake Formation, auditability | Access-control checklist |
| 7 | Week 7 | Week 11 | Monitoring, troubleshooting, performance, reliability, and cost | Failure-mode playbook |
| 8 | Week 8 | Week 12 | Timed mocks, weak-area sprint, final review | Exam-readiness decision |
60/90-day weekly structure
| Weekly activity | Recommended amount |
|---|---|
| Focused reading or video review | 2 sessions |
| Hands-on or diagram-based concept practice | 1 session |
| Targeted practice questions | 2 sessions |
| Missed-question review | 2-3 short sessions |
| Architecture/service-selection drill | 1 session |
| Timed mock | Late phase only |
Long-path checkpoint schedule
| Checkpoint | When | What to decide |
|---|---|---|
| Baseline checkpoint | End of Phase 1 | Which topics are unfamiliar? |
| First repair checkpoint | End of Phase 3 | Can you choose ingestion services correctly? |
| Processing checkpoint | End of Phase 4 | Can you explain Glue, EMR, Lambda, and SQL transform tradeoffs? |
| Security checkpoint | End of Phase 6 | Can you reason through IAM, KMS, S3, and Lake Formation access? |
| Readiness checkpoint | Final 1-2 weeks | Are mistakes isolated and reviewable, or broad and repeated? |
Timed mock exam strategy
Timed mocks are useful only if you review them thoroughly. A mock without review is mostly a stamina exercise.
| Plan length | When to use timed mocks | How to review |
|---|---|---|
| 7 days | Once, around Day 6, if you have time to review | Review every missed, guessed, and slow question |
| 14 days | Around Day 8 (timed sectional practice) and Day 13 (full mock) | Use the first to repair, the second to confirm readiness |
| 30 days | Around Days 21 and 28 | Compare mistake patterns across both mocks |
| 60/90 days | Mostly in the final quarter of the plan | Use earlier practice as sectional drills, not full simulations |
Mock review rules
- Recreate the reasoning path for every miss.
- Mark questions you answered correctly but guessed.
- Identify whether the issue was AWS knowledge, scenario reading, or service selection.
- Review related services together. For example, do not review Athena without also reviewing S3 layout, partitions, Glue Data Catalog, and permissions.
- Avoid taking multiple full mocks back-to-back if you cannot review them the same day or next day.
Service-selection drills for DEA-C01
Many AWS data engineering questions test the ability to choose the right service or design pattern under constraints. Practice comparisons directly.
| Decision area | Compare | Practice question to answer |
|---|---|---|
| Batch vs streaming | S3 batch loads, DMS, Kinesis, Data Firehose, MSK | How fresh does the data need to be, and who manages the streaming complexity? |
| Storage choice | S3, Redshift, DynamoDB, relational databases | Is this a data lake, warehouse, operational store, or source system? |
| Processing choice | Glue, EMR, Lambda, SQL in Athena or Redshift | Is the workload serverless ETL, large Spark processing, lightweight event handling, or SQL transformation? |
| Catalog and query | Glue Data Catalog, crawlers, Athena, Redshift Spectrum concepts | How will data be discovered, partitioned, and queried? |
| Orchestration choice | Step Functions, EventBridge, Glue workflows, MWAA concepts | Is the pipeline event-driven, scheduled, dependency-heavy, or workflow-managed? |
| Access control | IAM, S3 policies, KMS, Lake Formation | Which layer controls identity, encryption, object access, and governed table access? |
| Observability | CloudWatch, CloudTrail, job logs, service metrics | Are you debugging performance, failures, permissions, or audit history? |
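One way to drill the "batch vs streaming" row is to write your decision rules down as code and check them against practice scenarios. The sketch below is a deliberately simplified study aid, not an official decision tree: the scenario flags, the rule order, and the one-line reasons are assumptions, and real exam questions usually combine more constraints (cost, latency targets, operational overhead) than four booleans can capture.

```python
from dataclasses import dataclass


@dataclass
class IngestionScenario:
    """Hypothetical scenario flags; real questions carry more constraints."""
    source_is_database: bool = False      # ongoing replication from a database
    needs_kafka_api: bool = False         # existing Apache Kafka producers/consumers
    streaming: bool = False               # records arrive continuously
    needs_custom_consumers: bool = False  # several apps read or replay the stream


def suggest_ingestion(s: IngestionScenario) -> tuple[str, str]:
    """Return (service, reason). The first matching rule wins."""
    if s.source_is_database:
        return ("AWS DMS", "database migration or ongoing replication")
    if s.needs_kafka_api:
        return ("Amazon MSK", "workload already depends on Apache Kafka")
    if s.streaming and s.needs_custom_consumers:
        return ("Amazon Kinesis Data Streams", "stream with your own consumers")
    if s.streaming:
        return ("Amazon Data Firehose", "managed delivery to a destination")
    return ("S3 batch load", "no streaming or replication requirement")


service, reason = suggest_ingestion(IngestionScenario(streaming=True))
print(f"If I see this scenario, I will choose {service} because: {reason}.")
```

When a practice question contradicts one of your rules, update the rule and note the constraint that changed the answer; that correction is the entry for your missed-question log.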
Final-week rules
Use these rules regardless of which plan you followed.
Stop adding new material
Stop adding broad new material about 2-3 days before the exam. Continue reviewing only:
- Repeated missed-question topics.
- Service comparisons you still confuse.
- Security and permission patterns.
- Troubleshooting workflows.
- Your own summary notes.
Protect review quality
| Do | Avoid |
|---|---|
| Review missed questions in detail | Skimming answer keys |
| Redraw common data pipelines | Memorizing isolated service names |
| Practice timing on mixed sets | Spending all day on one obscure topic |
| Sleep and keep a normal routine | Taking a full mock late the night before |
| Confirm exam logistics | Changing your entire strategy at the end |
Exam-readiness checks
You are closer to ready when you can do the following without heavy notes.
| Readiness signal | What it looks like |
|---|---|
| Explain an end-to-end AWS data pipeline | Source, ingestion, storage, catalog, transform, query, monitoring, and security are all included |
| Choose ingestion patterns | You can distinguish batch, streaming, replication, and event-driven designs |
| Reason about Glue and the Data Catalog | You understand crawlers, schemas, partitions, jobs, and common failure points |
| Apply security controls | You can reason through IAM, KMS, S3 policies, Lake Formation, and audit needs |
| Troubleshoot scenarios | You know where to look for logs, permissions, schema issues, and data layout problems |
| Manage timing | You can finish timed practice without rushing the final questions |
| Review effectively | Your missed-question log is shrinking and mistakes are less repetitive |
If your misses are still broad across ingestion, storage, processing, security, and troubleshooting, use more targeted review before relying on another full mock.
Practical next step
Start with a diagnostic DEA-C01 practice set, create a missed-question log, and choose the plan that matches your remaining time. Then follow the daily rhythm: review yesterday’s misses, study one AWS data engineering topic, practice scenario questions, and write down the service-selection rule you learned.