DP-700 — Microsoft Fabric Data Engineer Associate Exam Blueprint

Last revised: June 18, 2026

Practical DP-700 exam blueprint for Microsoft Fabric Data Engineer Associate candidates: Fabric architecture, ingestion, transformation, governance, monitoring, and optimization readiness.

How to Use This Exam Blueprint

Use this independent checklist to prepare for the Microsoft Fabric Data Engineer Associate (DP-700) exam. It is a practical study map for DP-700, not an official Microsoft skills outline and not a claim about exact exam weights.

Mark a topic as “ready” only when you can do three things:

Choose the right Fabric service or artifact for a scenario.
Explain the tradeoff: security, performance, governance, cost, maintainability, or operational impact.
Troubleshoot a realistic failure without relying only on memorized UI steps.

For final review, focus less on definitions and more on scenario judgment: lakehouse vs warehouse, notebook vs Dataflow Gen2, copy vs shortcut, full load vs incremental load, and successful run vs trustworthy data.

DP-700 readiness map

Readiness area	What to review	What “ready” looks like
Microsoft Fabric platform foundation	Workspaces, capacities, items, OneLake, tenant and workspace concepts, item relationships	You can explain how Fabric organizes data engineering work and how a data pipeline, lakehouse, warehouse, notebook, semantic model, and OneLake path relate to each other.
Lakehouse and OneLake architecture	Lakehouse tables, files, Delta Lake concepts, shortcuts, medallion layers, schemas, SQL analytics endpoint	You can design a lakehouse layout for raw, cleansed, and curated data and explain when to avoid copying data by using a shortcut or other integration pattern.
Warehouse and SQL engineering	Warehouse use cases, relational modeling, T-SQL transformations, dimensional models, views, stored logic, serving layers	You can decide when a SQL-first warehouse is better than a Spark/lakehouse-first design and can model fact and dimension tables for analytics.
Data ingestion	Pipelines, Copy activity, Dataflow Gen2, notebooks, source connections, gateways, parameters, incremental patterns	You can design a repeatable ingestion flow with authentication, schema handling, error handling, and full or incremental load logic.
Data transformation	Spark notebooks, Spark SQL, PySpark, Dataflow Gen2 transformations, warehouse SQL, Delta table writes	You can transform raw data into validated serving tables and justify the tool choice for code-first, low-code, or SQL-first work.
Orchestration and scheduling	Pipeline activities, dependencies, parameters, variables, retries, triggers, run history	You can chain ingestion, validation, transformation, and notification steps into an operational workflow.
Data quality and reliability	Idempotent loads, deduplication, validation rules, watermarks, late-arriving data, schema drift, error tables	You can make a pipeline safe to rerun and can detect when a “successful” run produced incomplete or invalid data.
Security and governance	Workspace roles, item permissions, data permissions, sensitivity labels, lineage, endorsements, gateway credentials	You can apply least privilege and explain the difference between access to a Fabric workspace, access to an item, and access to the data inside that item.
Monitoring and troubleshooting	Monitoring hub, pipeline run details, Spark logs, refresh/run failures, capacity signals, lineage, query diagnostics	You can identify whether a failure is caused by credentials, schema mismatch, source limits, Spark performance, SQL logic, or capacity pressure.
Performance optimization	Partitioning, file size management, predicate pushdown, shuffle reduction, query design, table maintenance, workload scheduling	You can improve a slow load or query using evidence rather than guessing.
Deployment and lifecycle	Git integration concepts, deployment pipelines, workspace promotion, parameterization, environment separation	You can explain how to move data engineering artifacts from development to test or production with minimal manual rework.
Analytics handoff	Semantic models, Direct Lake-style serving concepts where appropriate, Power BI consumption patterns, curated gold layer	You can prepare data so downstream analysts and reports consume governed, documented, reliable tables.

Core service and artifact selection

Scenario cue	Strong candidate choice	Why it fits	Common trap
Need a file-based analytical store with Spark transformations	Lakehouse	Supports files, tables, Delta patterns, notebooks, and flexible data engineering	Treating every lakehouse table like a fully modeled warehouse table from day one
Need SQL-first relational analytics and curated dimensional structures	Warehouse	Better fit for SQL developers, relational modeling, and serving structured analytical data	Using Spark notebooks for transformations that are simpler and clearer in SQL
Need to orchestrate multiple steps with dependencies	Data pipeline	Coordinates copy, notebook, dataflow, validation, branching, and retry logic	Hiding orchestration inside a long notebook with no clear operational visibility
Need low-code data shaping from common sources	Dataflow Gen2	Useful for visual transformation, mapping, and repeatable data preparation	Choosing it for highly custom code-heavy logic better suited to notebooks
Need Python, PySpark, custom libraries, or complex data engineering logic	Notebook or Spark job pattern	Supports code-first transformation and advanced processing	Using notebooks without parameterization, logging, or rerun safety
Need to expose curated data to reporting	Semantic model or curated serving tables	Separates engineering from consumption and supports governed analytics	Pointing reports directly at raw or unstable tables
Need to reference data without physically copying it	Shortcut or equivalent integration pattern	Reduces duplication and can simplify lake architecture	Forgetting that permissions, source availability, and governance still matter
Need near-real-time or event-oriented processing	Real-Time Intelligence item such as an eventstream or eventhouse, when in scope for the solution	Fits telemetry, streams, and time-sensitive ingestion patterns	Forcing a batch pipeline onto a streaming requirement without latency analysis
Need environment promotion	Deployment pipeline, Git integration, parameters	Supports repeatable movement across workspaces or stages	Hard-coding source paths, workspace names, or credentials

Can you do this? High-value DP-700 skills

Fabric architecture and platform judgment

Explain the role of OneLake in a Fabric data estate.
Distinguish a workspace, capacity, item, lakehouse, warehouse, pipeline, notebook, and semantic model.
Choose between lakehouse, warehouse, Dataflow Gen2, pipeline, notebook, and shortcut based on a scenario.
Design a workspace layout for development, test, and production without hard-coding environment-specific values.
Explain how lineage helps troubleshoot dependencies and downstream impact.
Identify which artifact owns storage, which artifact transforms data, and which artifact serves data.
Recognize when a successful pipeline run still requires data validation before publishing results.

Lakehouse, OneLake, and Delta readiness

Organize data into raw, cleaned, and curated zones or medallion-style layers.
Explain the difference between files and managed tables in a lakehouse-style design.
Describe why Delta Lake concepts matter for reliability, schema management, and analytical reads.
Use partitioning deliberately rather than by habit.
Recognize small-file, skew, and over-partitioning symptoms.
Explain when to preserve raw source data unchanged.
Implement deduplication and upsert logic using business keys and timestamps.
Handle source schema changes without silently breaking downstream tables.
Explain how shortcuts can reduce duplication and where they add dependency risk.
Validate row counts, null rates, duplicate counts, and referential assumptions across layers.
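
The validation item above can be rehearsed without a Fabric workspace. The following is a minimal sketch in plain Python, assuming rows are available as dictionaries; the function name `layer_checks`, the column names, and the sample data are all hypothetical and only illustrate the four checks (row count, null rate, duplicate count, referential assumption).

```python
from collections import Counter


def layer_checks(rows, key, required, fk=None, parent_keys=None):
    """Return basic quality metrics for a list of dict rows (illustrative only)."""
    total = len(rows)
    key_counts = Counter(r.get(key) for r in rows)
    # Every row beyond the first occurrence of a key counts as a duplicate.
    duplicate_rows = sum(c - 1 for c in key_counts.values() if c > 1)
    null_rates = {
        col: (sum(1 for r in rows if r.get(col) is None) / total if total else 0.0)
        for col in required
    }
    orphan_rows = 0
    if fk is not None and parent_keys is not None:
        # Rows whose foreign key has no match in the parent table.
        orphan_rows = sum(1 for r in rows if r.get(fk) not in parent_keys)
    return {
        "row_count": total,
        "duplicate_rows": duplicate_rows,
        "null_rates": null_rates,
        "orphan_rows": orphan_rows,
    }


# Hypothetical silver-layer sample: one duplicate key, one null, one orphan.
silver_orders = [
    {"OrderId": 1, "CustomerId": "C1", "Amount": 10.0},
    {"OrderId": 2, "CustomerId": "C2", "Amount": None},
    {"OrderId": 2, "CustomerId": "C2", "Amount": 5.0},
    {"OrderId": 3, "CustomerId": "C9", "Amount": 7.5},
]
known_customers = {"C1", "C2"}

report = layer_checks(
    silver_orders, key="OrderId", required=["Amount"],
    fk="CustomerId", parent_keys=known_customers,
)
print(report)
```

In a real notebook the same four numbers would more likely come from Spark or SQL aggregations; the point is that each layer boundary has explicit, comparable metrics.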

Warehouse and SQL engineering readiness

Define the grain of a fact table before building it.
Choose star schema patterns for reporting-friendly curated data.
Distinguish business keys, surrogate keys, natural keys, and composite keys.
Handle slowly changing dimension requirements at a conceptual level.
Choose between a view and a materialized/physical table based on performance, freshness, and maintainability.
Write SQL transformations that are clear, testable, and rerunnable.
Avoid building reports directly on staging tables unless the scenario explicitly supports it.
Explain how SQL serving layers relate to lakehouse and warehouse choices.

Ingestion and orchestration readiness

Choose full load, append load, incremental load, or change-based load based on source capability and business requirements.
Design a watermark strategy for incremental ingestion.
Parameterize source path, destination path, date range, environment, and table name where appropriate.
Configure connection and credential patterns securely.
Recognize when an on-premises or private source may require a gateway or equivalent connectivity pattern.
Add retry, timeout, failure branch, and notification logic to operational pipelines.
Make loads idempotent so a retry does not duplicate data.
Capture rejected rows or invalid records for review.
Separate ingestion, transformation, validation, and publishing steps when operational clarity matters.
Use run history and activity outputs to determine where a pipeline failed.
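
Retry, failure-branch, and notification logic is normally configured on pipeline activities rather than written by hand, but the control flow is easier to reason about once you have seen it spelled out. The sketch below is a plain-Python illustration under stated assumptions: `run_with_retry`, `flaky_copy`, and the backoff values are hypothetical and do not represent a Fabric API.

```python
import time


def run_with_retry(activity, max_attempts=3, base_delay_s=1.0,
                   sleep=time.sleep, on_failure=None):
    """Run `activity` up to `max_attempts` times with exponential backoff."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except Exception as exc:  # a real job would retry only transient errors
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay_s * 2 ** (attempt - 1))
    # Failure branch: notify first, then re-raise so the run is marked as failed.
    if on_failure is not None:
        on_failure(last_error)
    raise last_error


# Hypothetical activity that fails twice and then succeeds.
calls = {"n": 0}


def flaky_copy():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source temporarily unavailable")
    return "copied"


delays = []  # record the requested waits instead of actually sleeping
result = run_with_retry(flaky_copy, sleep=delays.append)
print(result, delays)
```

Retries only help if the activity itself is idempotent; otherwise each retry risks writing the same rows again.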

Transformation readiness

Choose Spark/PySpark when transformations need distributed processing or custom code.
Choose Dataflow Gen2 when a visual, low-code transformation is more maintainable.
Choose SQL when the logic is relational, set-based, and close to the serving model.
Convert semi-structured data into structured tables.
Normalize date, time, currency, and identifier fields.
Enforce data types before data reaches curated tables.
Implement deduplication by key and precedence rule.
Detect late-arriving records and decide whether to restate downstream tables.
Validate transformation outputs against source totals and expected business rules.
Document assumptions that downstream report authors depend on.
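
Deduplication by key and precedence rule, from the list above, can be sketched in a few lines. This is a plain-Python illustration, assuming the precedence rule is "latest modification timestamp wins, and on a tie the later-arriving row wins"; the function name `dedupe_latest` and the column names are hypothetical.

```python
def dedupe_latest(rows, key, order_by):
    """Keep one row per key: the row with the greatest `order_by` value."""
    winners = {}
    for row in rows:
        current = winners.get(row[key])
        # >= means that on a tie the later-arriving row replaces the earlier one.
        if current is None or row[order_by] >= current[order_by]:
            winners[row[key]] = row
    return list(winners.values())


# ISO-formatted date strings compare correctly as plain strings.
customers = [
    {"CustomerId": "C1", "Email": "old@example.com", "ModifiedAt": "2024-01-01"},
    {"CustomerId": "C2", "Email": "b@example.com", "ModifiedAt": "2024-01-05"},
    {"CustomerId": "C1", "Email": "new@example.com", "ModifiedAt": "2024-02-01"},
]

deduped = dedupe_latest(customers, key="CustomerId", order_by="ModifiedAt")
print(deduped)
```

Stating the precedence rule explicitly matters because a key-only deduplication keeps an arbitrary row, which can differ between runs.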

Security, governance, and access control readiness

Explain least privilege for Fabric workspaces and data artifacts.
Distinguish workspace roles from item-level permissions and data-level permissions.
Identify when SQL object-level security or data access controls are needed in addition to workspace access.
Protect credentials used by pipelines, dataflows, notebooks, and gateways.
Apply sensitivity and endorsement concepts appropriately.
Use lineage to understand downstream effects before changing or deleting data assets.
Recognize governance risks from unmanaged shortcuts, copied data, and duplicated curated tables.
Explain why production data engineering work should avoid personal credentials where possible.
Review sharing decisions from both convenience and data exposure perspectives.

Monitoring, troubleshooting, and optimization readiness

Use pipeline run details to locate the failed activity and inspect error messages.
Use notebook and Spark logs to identify failed cells, package issues, executor errors, skew, or memory pressure.
Check source authentication and destination permissions before rewriting transformation logic.
Diagnose schema mismatch, missing columns, changed data types, and malformed files.
Explain why a query may be slow due to file layout, partitioning, joins, filters, or workload concurrency.
Identify small-file issues and when table maintenance or compaction-style actions may help.
Use filters and column pruning to reduce unnecessary reads.
Avoid expensive transformations in the serving path when they can be precomputed.
Schedule heavy jobs to reduce contention where the business allows.
Use monitoring signals to distinguish data failure, compute failure, and capacity pressure.

Medallion and serving-layer checklist

Layer	Purpose	Candidate checks
Bronze/raw	Preserve source data with minimal transformation	Can you reload from source? Did you capture load time, source file/table name, and ingestion batch metadata where useful?
Silver/cleansed	Standardize, type, deduplicate, validate	Did you enforce schemas, remove duplicates, handle invalid records, and apply consistent business keys?
Gold/curated	Serve analytics-ready facts, dimensions, aggregates, or domain tables	Is the grain clear? Are measures and dimensions report-friendly? Are joins predictable?
Semantic/reporting layer	Provide governed consumption for analysts and business users	Are table names, relationships, permissions, and refresh/serving choices appropriate?

Ingestion decision checks

Question	If yes, consider	If no, consider
Does the source support reliable change detection?	Incremental or change-based load with watermark/checkpoint logic	Full load, snapshot comparison, or source-side export pattern
Is the source large or slow to extract?	Incremental copy, partitioned extraction, staged loads	Simpler full load may be acceptable for small reference data
Is transformation mostly visual and repeatable?	Dataflow Gen2	Notebook or SQL if logic is complex or code-heavy
Do you need multiple dependent steps?	Pipeline orchestration	Single dataflow or notebook schedule may be enough for simple jobs
Is data already available in a compatible cloud location?	Shortcut or direct integration pattern	Copy into OneLake/lakehouse when isolation or performance requires it
Are credentials or network access complex?	Connection, gateway, managed access pattern, or service identity approach	Standard cloud connector may be sufficient
Must the process be safe to rerun?	Idempotent design, staging, merge/upsert, batch IDs	Append-only may be acceptable only for immutable event data

Transformation patterns to recognize

Incremental watermark pattern

A strong DP-700 candidate can explain the purpose of a watermark even if syntax differs by tool.

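-- Conceptual incremental filter pattern; table and parameter names are illustrative
SELECT *
FROM dbo.SourceOrders
WHERE SourceModifiedDate >  @LastSuccessfulWatermark  -- exclusive: already loaded
  AND SourceModifiedDate <= @CurrentWatermark;        -- inclusive: fixed at batch start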

Readiness checks:

You know where the previous successful watermark is stored.
You know when the new watermark is committed.
You avoid advancing the watermark before validation succeeds.
You have a plan for late-arriving records.
You can rerun a failed batch without duplicating rows.
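
The readiness checks above describe an ordering: select the window, validate, write, and only then commit the watermark. A minimal sketch of that ordering in plain Python follows. It assumes the watermark lives in a small state dictionary and the target is keyed by a business key; `run_incremental_batch`, the field names, and the sample rows are hypothetical, and late-arriving records are not handled here.

```python
def run_incremental_batch(source_rows, target, state, current_watermark, validate):
    """Load rows in (last watermark, current watermark]; commit only on success."""
    last = state["watermark"]
    batch = [r for r in source_rows if last < r["modified"] <= current_watermark]
    if not validate(batch):
        # Watermark stays untouched, so the same window is retried on the next run.
        return False
    for row in batch:
        # Keyed write: reloading the same window overwrites instead of duplicating.
        target[row["id"]] = dict(row)
    state["watermark"] = current_watermark  # committed last, after the write
    return True


def no_missing_amounts(batch):
    return all(r["amount"] is not None for r in batch)


source = [
    {"id": 1, "modified": "2024-01-01", "amount": 10},
    {"id": 2, "modified": "2024-01-02", "amount": 20},
    {"id": 3, "modified": "2024-01-03", "amount": None},
]
target, state = {}, {"watermark": "2024-01-01"}

first = run_incremental_batch(source, target, state, "2024-01-03", no_missing_amounts)
source[2]["amount"] = 30  # the source corrects the bad record
second = run_incremental_batch(source, target, state, "2024-01-03", no_missing_amounts)
third = run_incremental_batch(source, target, state, "2024-01-03", no_missing_amounts)
print(first, second, third, sorted(target), state["watermark"])
```

The first run fails validation and leaves both the target and the watermark unchanged, which is what makes the second run safe. One common way to cover late-arriving records, not shown here, is to reprocess a lookback window behind the stored watermark.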

PySpark table transformation pattern

You do not need to memorize every API call, but you should recognize the intent of common Spark/Delta operations.

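from pyspark.sql import functions as F

# Read the raw (bronze) table; `spark` is the session a notebook provides.
orders = spark.read.table("bronze_orders")

clean_orders = (
    orders
    # Keeps one arbitrary row per OrderId; add an ordering rule if a specific version must win.
    .dropDuplicates(["OrderId"])
    # Parse the text column into a real date (assumes ISO yyyy-MM-dd text) rather than only renaming it.
    .withColumn("OrderDate", F.to_date("OrderDateText"))
    .drop("OrderDateText")
)

# Overwrite replaces the entire silver table on every run.
clean_orders.write.format("delta").mode("overwrite").saveAsTable("silver_orders")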

Readiness checks:

Can you explain what table is read and what table is written?
Can you identify whether the write pattern is append, overwrite, or merge/upsert?
Can you explain why overwrite may be risky for large or production tables?
Can you add validation before publishing the result?
Can you parameterize table names for different environments?
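
One way to think about "validation before publishing" is as a gate: the candidate output replaces the published table only when every check passes. The sketch below models that idea in plain Python with a dictionary standing in for the table catalog; `publish_if_valid`, the check names, and the sample rows are hypothetical and are not a Spark or Fabric API.

```python
def publish_if_valid(catalog, table, candidate_rows, checks):
    """Replace `table` in `catalog` only when every named check passes."""
    failures = [name for name, check in checks.items() if not check(candidate_rows)]
    if failures:
        # The previously published table is left exactly as it was.
        return failures
    catalog[table] = list(candidate_rows)
    return []


checks = {
    "not_empty": lambda rows: len(rows) > 0,
    "unique_order_id": lambda rows: len({r["OrderId"] for r in rows}) == len(rows),
}

catalog = {"silver_orders": [{"OrderId": 1}]}

bad_candidate = [{"OrderId": 1}, {"OrderId": 1}]
bad_result = publish_if_valid(catalog, "silver_orders", bad_candidate, checks)

good_candidate = [{"OrderId": 1}, {"OrderId": 2}]
good_result = publish_if_valid(catalog, "silver_orders", good_candidate, checks)
print(bad_result, good_result, catalog["silver_orders"])
```

With an overwrite-style write, skipping this gate means a bad batch replaces good data, which is one reason overwrite is risky for production tables.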

Merge/upsert reasoning

You should be able to explain when an upsert is safer than a blind append.

Situation	Better pattern	Why
New immutable event rows	Append	Events are not expected to change after arrival
Source sends corrections for existing records	Merge/upsert	Existing rows may need updates
Source sends full daily snapshot	Replace snapshot or compare-and-merge	Avoid duplicate active records
Dimension attributes change over time	Type 1 or Type 2 dimension approach	Business requirement determines whether to preserve history
Deletions must be reflected	Change detection with delete handling	Append-only loads will leave stale rows
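
The table above can be made concrete with a small simulation. This plain-Python sketch assumes each change row carries an `op` marker and a business key; `merge_changes` and the sample data are hypothetical, and a real implementation would typically use a Delta merge rather than a dictionary.

```python
def merge_changes(target, changes, key="id"):
    """Apply upsert and delete change rows to a dict keyed by business key."""
    for change in changes:
        k = change[key]
        if change.get("op") == "delete":
            target.pop(k, None)  # deleting a key that is already gone is a no-op
        else:
            # Insert a new key or overwrite the existing row with the correction.
            target[k] = {c: v for c, v in change.items() if c != "op"}
    return target


orders = {
    1: {"id": 1, "status": "open"},
    2: {"id": 2, "status": "open"},
}
changes = [
    {"op": "upsert", "id": 1, "status": "shipped"},  # correction to an existing row
    {"op": "upsert", "id": 3, "status": "open"},     # new row
    {"op": "delete", "id": 2},                       # source-side deletion
]

merge_changes(orders, changes)
merge_changes(orders, changes)  # replaying the same batch changes nothing
print(orders)

# A blind append of the same batch twice would instead grow by one row per change.
appended = [{"id": 1}, {"id": 2}] + changes + changes
print(len(appended))
```

The replay is the important part: a keyed merge converges to the same state, while an append accumulates duplicates and never removes the deleted row.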

Security and governance decision checks

Scenario	What to think through
A developer needs to edit a notebook but not administer the workspace	Workspace role selection, item permissions, and separation of duties
A reporting user needs to view curated data only	Semantic model permissions, warehouse/lakehouse data permissions, and avoiding raw data exposure
A pipeline connects to a production source	Credential storage, identity choice, gateway requirements, and auditability
A shortcut references data owned by another team	Source permissions, lineage, ownership, availability, and change coordination
Sensitive columns appear in raw data	Classification, access controls, masking or exclusion in curated layers, and downstream exposure
A dataset is certified or endorsed	Ownership, quality expectations, lineage, and controlled change process
Production and development share data assets	Risk of accidental modification, credential leakage, and environment contamination

Monitoring and troubleshooting checklist

Symptom	Likely areas to inspect	Candidate response
Pipeline activity fails immediately	Credentials, connection, gateway, source path, permissions	Check authentication and connectivity before changing transformation code.
Copy succeeds but destination table is empty	Filters, parameters, source query, date range, destination mapping	Inspect activity inputs/outputs and validate source row counts.
Notebook fails after running for a long time	Spark logs, data skew, memory, shuffle, package dependency, bad record	Identify the failing stage or transformation and reduce data movement.
Schema mismatch error appears	Source column changes, data type changes, destination schema enforcement	Decide whether to update schema, handle drift, or reject the batch.
Query against curated table is slow	File layout, partitions, joins, filters, table size, unnecessary columns	Use pruning, precomputation, and table maintenance where appropriate.
Refresh/report output is wrong but pipeline succeeded	Data quality checks, business logic, duplicates, joins, late records	Treat operational success and data correctness as separate checks.
Users lost access after workspace change	Workspace role, item sharing, SQL/data permissions, semantic model permissions	Trace access from workspace to item to data.
Job performance varies by time of day	Capacity pressure, concurrency, scheduled workloads, source throttling	Review monitoring signals and schedule/resource tradeoffs.

Performance optimization checks

Spark and lakehouse performance

Filter early and select only needed columns.
Avoid unnecessary shuffles and wide transformations.
Understand when joins create skew or memory pressure.
Use partitioning only when it supports common filters and does not create excessive small files.
Recognize symptoms of too many small files.
Use table maintenance or optimization features where appropriate for the Fabric item and table format.
Cache only when reused data justifies it.
Avoid collecting large datasets to the driver.
Precompute curated tables rather than repeatedly transforming raw data for every report.
Validate that performance improvements preserve correct results.
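
To make the small-file symptom concrete, the sketch below summarizes a list of file sizes. It is plain Python under stated assumptions: `small_file_report` is a hypothetical helper, and the 32 MB threshold is an arbitrary illustration rather than a Fabric or Delta recommendation.

```python
MB = 1024 * 1024


def small_file_report(file_sizes_bytes, threshold_bytes=32 * MB):
    """Summarize how many files fall under an assumed 'small' threshold."""
    total = len(file_sizes_bytes)
    small = sum(1 for size in file_sizes_bytes if size < threshold_bytes)
    return {
        "files": total,
        "small_files": small,
        "small_ratio": small / total if total else 0.0,
    }


# Hypothetical table layout: many tiny files from frequent small appends.
sizes = [1 * MB] * 8 + [256 * MB] * 2
print(small_file_report(sizes))
```

A high ratio of small files is evidence that compaction-style maintenance or a different partitioning choice may help, which fits the habit of improving performance from evidence rather than guessing.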

SQL and warehouse performance

Write set-based transformations instead of row-by-row logic where possible.
Push filters close to the source or staging layer.
Avoid selecting unused columns in large transformations.
Use clear join keys and validate join cardinality.
Materialize expensive repeated logic when the scenario justifies it.
Separate staging, transformation, and serving objects for maintainability.
Check whether slow performance is caused by data design, query design, or resource contention.
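
Validating join cardinality, from the list above, means checking two things before trusting a join: that the dimension side is unique on the key, and that every fact row finds a match. The following plain-Python sketch illustrates both checks; `join_cardinality` and the sample rows are hypothetical.

```python
from collections import Counter


def join_cardinality(fact_rows, dim_rows, key):
    """Report duplicate dimension keys, unmatched fact rows, and inner-join size."""
    dim_counts = Counter(d[key] for d in dim_rows)
    duplicated_dim_keys = sorted(k for k, c in dim_counts.items() if c > 1)
    unmatched_fact_rows = sum(1 for f in fact_rows if f[key] not in dim_counts)
    # In an inner join, each fact row repeats once per matching dimension row.
    joined_rows = sum(dim_counts.get(f[key], 0) for f in fact_rows)
    return {
        "fact_rows": len(fact_rows),
        "joined_rows": joined_rows,
        "duplicated_dim_keys": duplicated_dim_keys,
        "unmatched_fact_rows": unmatched_fact_rows,
    }


facts = [{"CustomerId": "C1"}, {"CustomerId": "C1"}, {"CustomerId": "C3"}]
dims = [{"CustomerId": "C1"}, {"CustomerId": "C1"}, {"CustomerId": "C2"}]

print(join_cardinality(facts, dims, "CustomerId"))
```

Here three fact rows become four joined rows while one fact row is dropped, so the row count alone would hide two separate problems.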

Lifecycle, deployment, and maintainability

Area	Checklist
Environment separation	Development, test, and production should not depend on manually edited paths or personal connections.
Parameterization	Pipelines and notebooks should accept environment-specific values instead of hard-coded constants.
Source control	Know why Git integration or versioning helps with collaboration, rollback, and review.
Deployment	Understand the purpose of deployment pipelines or promotion patterns across workspaces.
Secrets and credentials	Do not place secrets directly in notebooks or scripts. Use secure connection and credential patterns.
Documentation	Document table purpose, grain, ownership, refresh/load pattern, and known data quality rules.
Impact analysis	Use lineage and dependency review before renaming, deleting, or changing tables.
Operational ownership	Know who responds to failed runs, bad data, source changes, and access requests.
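
The parameterization row above can be illustrated with a small lookup that resolves environment-specific values from one place. This is a sketch only: the `SETTINGS` dictionary, the lakehouse names, the paths, and the `lakehouse.table` naming scheme are all hypothetical, and in Fabric these values would more likely arrive as pipeline or notebook parameters.

```python
# Hypothetical per-environment settings; nothing here is a real workspace value.
SETTINGS = {
    "dev": {"lakehouse": "lh_sales_dev", "source_path": "Files/landing/dev/sales"},
    "test": {"lakehouse": "lh_sales_test", "source_path": "Files/landing/test/sales"},
    "prod": {"lakehouse": "lh_sales_prod", "source_path": "Files/landing/prod/sales"},
}


def resolve(env, layer, table, settings=SETTINGS):
    """Build environment-specific names instead of hard-coding them in the logic."""
    if env not in settings:
        # Fail fast rather than silently falling back to another environment.
        raise ValueError(f"unknown environment: {env!r}")
    cfg = settings[env]
    return {
        "table": f"{cfg['lakehouse']}.{layer}_{table}",
        "source_path": cfg["source_path"],
    }


print(resolve("dev", "silver", "orders"))
print(resolve("prod", "silver", "orders"))
```

The transformation code then receives `env` as its only environment-dependent input, so promoting it to another workspace does not require editing the logic.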

Common weak areas and traps

Trap	Why it hurts exam readiness	Better habit
Memorizing UI clicks only	DP-700 scenarios test judgment, not just navigation	Learn the purpose of each Fabric artifact and when to use it.
Confusing workspace access with data access	A user may be able to open an item yet lack permission to query some of its data, and data-level access can also be granted independently of a workspace role	Trace access at workspace, item, and data levels.
Treating lakehouse and warehouse as interchangeable	They support different engineering and consumption patterns	Choose based on workload, skill set, modeling, and transformation needs.
Using append for every ingestion job	Corrections and updates create duplicates or stale data	Use keys, watermarks, merge/upsert logic, or snapshot handling.
Advancing a watermark before validation	Failed or partial loads can cause permanent gaps	Commit watermarks only after the batch is verified.
Ignoring schema drift	Source changes can silently break curated outputs	Add schema checks and controlled evolution.
Over-partitioning	Too many partitions can create small-file and management problems	Partition for common filters and data volume, not every column.
Copying data unnecessarily	Duplication increases storage, governance, and freshness problems	Consider shortcuts or direct integration patterns when appropriate.
Hiding orchestration inside notebooks	Operations teams lose visibility into dependencies and failures	Use pipelines for multi-step control flow.
Equating pipeline success with data quality	A job can complete while producing wrong numbers	Add row counts, duplicate checks, null checks, and business validations.
Hard-coding environment values	Deployment becomes fragile	Parameterize workspace, lakehouse, table, path, and connection values.
Skipping lineage review	Changes can break downstream reports and semantic models	Review dependencies before changing shared assets.

Scenario practice prompts

Use these prompts to test whether you can reason like a DP-700 candidate.

Scenario 1: Daily sales ingestion

A sales system exports daily files. Business users need updated reports each morning. Files may be resent with corrections.

Can you answer?

Would you treat the files as append-only, full snapshots, or correction-capable inputs?
Where would you land raw files?
How would you prevent duplicate sales records?
What validation checks would you run before publishing curated tables?
How would you alert the team if the export is missing or malformed?
Would the serving layer be a lakehouse table, warehouse table, semantic model, or combination?

Scenario 2: Slow notebook transformation

A notebook that joins large order and customer datasets has become slow and unreliable.

Can you answer?

Which logs or monitoring views would you inspect first?
Could the issue be skew, shuffle, unnecessary columns, poor partitioning, or small files?
Can filters be applied earlier?
Should a reusable intermediate table be materialized?
Would SQL be clearer for part of the transformation?
How would you prove the optimized output is still correct?

Scenario 3: Secure curated reporting

Analysts should see only curated sales metrics, not raw customer data.

Can you answer?

Which workspace and item permissions are needed?
Where should raw data live relative to curated data?
Should analysts query a warehouse, lakehouse SQL endpoint, or semantic model?
How are sensitive columns excluded, masked, or controlled?
How would lineage show the relationship between raw and curated assets?
What happens when a new analyst joins the team?

Scenario 4: Source schema change

A source system adds a nullable column and changes the type of an existing field.

Can you answer?

Which ingestion or transformation step detects the change?
Should the pipeline fail fast or tolerate the change?
How does the bronze layer preserve the source state?
What changes are needed in silver and gold tables?
Which downstream reports or semantic models are affected?
How would you prevent silent incorrect results?
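
For Scenario 4, it can help to see schema drift detection as a comparison between an expected and an observed column-to-type mapping. The sketch below is plain Python under stated assumptions: `diff_schema`, `decide`, the type strings, and the policy (tolerate additive columns, stop on removed or retyped columns) are hypothetical choices, not a Fabric feature.

```python
def diff_schema(expected, actual):
    """Compare {column: type} mappings and classify the differences."""
    added = sorted(set(actual) - set(expected))
    removed = sorted(set(expected) - set(actual))
    retyped = sorted(
        col for col in set(expected) & set(actual) if expected[col] != actual[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


def decide(diff):
    """Assumed policy: additive changes warn, breaking changes fail the batch."""
    if diff["removed"] or diff["retyped"]:
        return "fail"
    if diff["added"]:
        return "warn"
    return "ok"


expected = {"OrderId": "int", "Amount": "decimal(10,2)"}
# The source added a column and changed the type of an existing field.
actual = {"OrderId": "string", "Amount": "decimal(10,2)", "Channel": "string"}

drift = diff_schema(expected, actual)
print(drift, decide(drift))
```

Whether a type change should fail the batch or be handled by controlled evolution is a business decision; the sketch only shows that the decision should be explicit rather than silent.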

Final-week DP-700 checklist

Final-review task	Done
Compare this checklist with the current Microsoft DP-700 skills outline and mark any missing official topics for review.	[ ]
Build or rehearse one end-to-end Fabric data engineering flow: ingest, transform, validate, publish, and monitor.	[ ]
Practice choosing between lakehouse, warehouse, pipeline, Dataflow Gen2, notebook, shortcut, and semantic model.	[ ]
Review workspace roles, item permissions, data permissions, credentials, and lineage.	[ ]
Rehearse incremental load, watermark, deduplication, and merge/upsert scenarios.	[ ]
Review Spark troubleshooting: logs, skew, shuffle, small files, partitioning, and failed notebook runs.	[ ]
Review SQL modeling: fact grain, dimensions, keys, views vs tables, and curated serving layers.	[ ]
Practice reading error messages from pipeline, notebook, dataflow, and query scenarios.	[ ]
Create a one-page artifact selection sheet in your own words.	[ ]
Rework missed practice questions by explaining why each wrong option is wrong.	[ ]
Do a mixed timed practice set rather than studying one topic at a time.	[ ]
Stop memorizing exact UI paths unless they reinforce an architectural concept.	[ ]

Final readiness self-check

If asked to…	You are ready when you can…
Design a Fabric data engineering solution	Select artifacts, data layers, ingestion pattern, transformation tool, security model, and monitoring approach.
Fix a failed pipeline	Isolate the failed activity, inspect credentials and parameters, read run outputs, and propose a safe retry strategy.
Improve a slow workload	Identify whether the cause is data layout, query design, Spark behavior, source bottleneck, or capacity pressure.
Secure shared data	Apply least privilege across workspace, item, and data layers without blocking legitimate analytics use.
Build reliable incremental ingestion	Use watermarks, validation, idempotency, and error handling to avoid gaps and duplicates.
Prepare curated analytics data	Model facts and dimensions, validate business rules, and expose stable tables or semantic models for reporting.

Practical next step

After you mark weak areas, do targeted hands-on review before taking more practice questions. Build a small Fabric solution that includes a lakehouse, an ingestion pipeline, a transformation step, validation checks, and a curated serving table. Then use DP-700 practice questions to test whether you can choose the right Microsoft Fabric artifact and explain the operational tradeoffs under exam-style time pressure.
