DP-700 — Microsoft Fabric Data Engineer Associate Quick Reference

Last revised: June 18, 2026

Compact DP-700 reference for Microsoft Fabric data engineering decisions, services, security, performance, and troubleshooting.

This Quick Reference is independent exam-prep support for candidates preparing for Microsoft Fabric Data Engineer Associate (DP-700). Use it to review service choices, implementation patterns, security boundaries, and common traps before practicing full scenarios.

DP-700 scope snapshot

DP-700 expects practical Microsoft Fabric data engineering judgment: choosing the right Fabric item, building ingestion and transformation flows, managing data in OneLake, securing analytics assets, and monitoring or optimizing workloads.

Area	What to be ready to do	High-yield exam angle
Analytics solution implementation	Workspaces, lakehouses, warehouses, semantic models, OneLake, domains, deployment patterns	Know which Fabric item owns data, compute, security, and serving behavior
Data ingestion and transformation	Pipelines, Dataflows Gen2, notebooks, Spark jobs, SQL transformations, incremental loads	Choose orchestration vs transformation tools correctly
Data management	Delta tables, shortcuts, medallion architecture, schemas, files, tables, refresh patterns	Know when data is copied, virtualized, mirrored, or transformed
Security and governance	Microsoft Entra ID, workspace roles, item permissions, SQL permissions, RLS, sensitivity labels, lineage	Distinguish coarse access, data-level access, and report-level restrictions
Monitoring and optimization	Pipeline runs, Spark jobs, SQL queries, Capacity Metrics, refresh failures, small files, partitioning	Diagnose symptoms before scaling capacity

Microsoft Fabric mental model

Core hierarchy and terms

Concept	Exam-ready meaning	Common trap
Tenant	Organization-level Microsoft Fabric environment	Tenant settings can enable, disable, or constrain features independently of workspace permissions
Capacity	Compute resource backing Fabric workloads	Slow workloads may be code, data layout, concurrency, or capacity pressure; do not assume one cause
Workspace	Collaboration and security boundary for Fabric items	Workspace roles are broad; use item/data permissions for finer control
Item	Fabric artifact such as lakehouse, warehouse, pipeline, notebook, dataflow, semantic model	Deploying an item usually does not deploy the underlying data
OneLake	Unified SaaS data lake for Fabric	OneLake is storage; Fabric items provide experiences and compute over it
Lakehouse	Delta-based data lake item for files, tables, Spark, and SQL analytics endpoint	SQL analytics endpoint is primarily for querying lakehouse tables, not full SQL-first data warehousing
Warehouse	SQL-first relational warehouse item in Fabric	Choose for T-SQL engineering and relational serving, not arbitrary raw file processing
Data pipeline	Orchestration item for movement and control flow	Pipelines coordinate work; they are not the best place for complex row-by-row transformations
Dataflow Gen2	Low-code Power Query-based ingestion and transformation	Good for connector-rich shaping; less ideal for complex code-first engineering
Notebook	Code-first Spark development item	Interactive state can hide missing setup; scheduled runs must be self-contained
Spark job definition	Production-style Spark job execution	Use when repeatable Spark execution matters more than notebook interactivity
Shortcut	OneLake reference to supported external or internal data	Shortcuts virtualize data; they do not automatically transform, cleanse, or copy it
Mirroring	Replication from supported operational sources into Fabric	Different from shortcuts: mirrored data is replicated for analytics scenarios
Semantic model	BI model used by Power BI/Fabric reports	RLS in a semantic model does not automatically secure raw lakehouse or warehouse access

Fabric item selection matrix

Requirement	Prefer	Why	Avoid / watch for
Store raw files and Delta tables, transform with Spark	Lakehouse	File + table experience, notebooks, Delta, medallion-friendly	Do not expect full SQL warehouse behavior through the lakehouse SQL endpoint
SQL-first dimensional warehouse	Warehouse	T-SQL DDL/DML, relational schemas, SQL serving	Do not use it as a generic file landing zone
Low-code ingestion and transformations	Dataflow Gen2	Power Query, many connectors, data destinations	Complex orchestration belongs in pipelines
Copy data, run activities in order, branch, loop, schedule	Data pipeline	Control flow, copy activity, parameters, monitoring	Avoid embedding complex business logic directly in pipeline expressions
Code-first large-scale transformation	Notebook or Spark job definition	PySpark/SQL/Scala-style transformations and custom libraries	Do not rely on interactive notebook state in production
Near-real-time replication from supported databases	Mirroring	Reduces custom ingestion for supported sources	Not a substitute for arbitrary transformations or unsupported sources
Query operational telemetry or event streams	Eventstream/Eventhouse, when scenario requires Real-Time Intelligence	KQL/event-first analytics	Not the default choice for batch lakehouse/warehouse engineering
Avoid copying data from supported external storage	OneLake shortcut	Virtualized access from OneLake namespace	Source permissions, latency, and write support still matter
Serve Power BI with minimal import refresh over Fabric Delta tables	Direct Lake semantic model	Reads data from OneLake in supported scenarios	Model-level security is not raw data security

Lakehouse, warehouse, and SQL endpoint distinctions

Feature / decision	Lakehouse	Lakehouse SQL analytics endpoint	Warehouse
Primary persona	Data engineer / Spark engineer	SQL consumer over lakehouse tables	SQL data engineer / analyst
Storage	Files and Delta tables in OneLake	Queries Delta tables exposed by lakehouse	Relational warehouse data in OneLake-backed storage
Write path	Spark, Dataflows Gen2, pipelines, lakehouse UI	Generally read-oriented for lakehouse data	T-SQL DDL/DML and ELT
Best for	Bronze/silver/gold Delta, raw files, Spark transformations	BI/query access to curated lakehouse tables	Dimensional models, SQL transformations, SQL serving
Table organization	Tables area plus Files area	Exposes registered lakehouse tables	Schemas, tables, views, procedures as supported
Common exam cue	“Need raw files, notebooks, Delta, medallion”	“Need SQL access to lakehouse tables”	“Need T-SQL warehouse and relational modeling”
Common trap	Files are not automatically queryable as tables	Do not use it as the write engine for lakehouse tables	Do not treat it like Spark for semi-structured raw files

Files vs tables in a lakehouse

Location	Use for	Exam note
Files	Raw or semi-structured files, landing zones, archives, unregistered data	Good for bronze landing, but not automatically a managed query table
Tables	Delta tables registered for Spark and SQL analytics	Use for curated data that downstream SQL/BI tools should query
Shortcuts	Referenced data from another OneLake location or supported external storage	Useful for data sharing and avoiding copies; still plan security and performance

Data architecture patterns

Medallion architecture reference

    flowchart LR
        A[Sources] --> B[Bronze<br/>Raw landing]
        B --> C[Silver<br/>Cleaned and conformed]
        C --> D[Gold<br/>Business-ready model]
        D --> E[Warehouse / Semantic model / Reports]

Layer	Purpose	Typical Fabric implementation	Quality expectations
Bronze	Preserve source data with minimal changes	Lakehouse Files or Delta tables; pipeline copy; shortcuts; mirroring output	Traceability, ingestion metadata, no heavy business logic
Silver	Clean, standardize, deduplicate, conform	Spark notebooks/jobs, Dataflows Gen2, Delta MERGE	Data types, keys, deduplication, valid records
Gold	Business-ready facts/dimensions or aggregates	Lakehouse Delta tables or Warehouse tables	Star schema, semantic names, performance-ready
Serving	SQL/BI/ML consumption	Warehouse, SQL analytics endpoint, semantic model, Direct Lake	Security, relationships, measures, query performance

Pattern selection

Scenario	Recommended pattern
Multiple raw source systems with different formats	Land to bronze first, then standardize in silver
Re-runnable daily loads	Use pipeline parameters, watermarks, idempotent writes, and MERGE
BI model over curated Fabric tables	Gold Delta or warehouse tables plus semantic model
SQL team owns transformations	Warehouse with T-SQL ELT
Spark team owns transformations	Lakehouse with notebooks or Spark job definitions
Data must remain in external supported storage	Shortcut if virtualization is acceptable; copy if isolation/performance/history is needed

Ingestion and orchestration quick reference

Ingestion method decision table

Need	Choose	Why	Watch for
Scheduled copy from source to OneLake/warehouse	Data pipeline Copy activity	Operational control, monitoring, retries, parameters	Schema drift, credentials, gateway, incremental logic
Low-code source shaping	Dataflow Gen2	Power Query transformations and destinations	Refresh duration, folding behavior, complex logic
Complex file parsing or custom transformation	Notebook/Spark job	Code-level control and distributed processing	Dependency management and reproducibility
Avoid moving supported external data	Shortcut	Reduces duplication	Source availability, security, performance, unsupported write patterns
Replicate supported operational DB data	Mirroring	Simplifies near-real-time analytics ingestion	Confirm source support and downstream modeling approach
On-premises or private network source	Pipeline/dataflow with gateway or supported private connectivity	Secure access to non-public data	Credential scope and gateway health
Event telemetry	Eventstream/Eventhouse if real-time scenario	Stream-first processing	Do not force event tooling for simple batch ingestion

Pipeline design checklist

Design concern	DP-700-ready approach
Parameters	Parameterize source path, target path, dates, environment, and load mode
Re-runs	Make activities idempotent; avoid duplicate inserts
Incremental loads	Use watermark columns, change tracking/CDC where available, or source-specific modified timestamps
Dependencies	Use activities for sequence, conditions, loops, and failure paths
Secrets	Store credentials in Fabric connections or approved secret mechanisms; do not hard-code
Observability	Capture run IDs, row counts, source extract time, and failure messages
Recovery	Use retry policies where appropriate, but fix non-transient data issues explicitly
Environment movement	Use deployment rules, parameters, or separate connections for dev/test/prod
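
The Observability and Recovery rows above can be sketched in plain Python. This is a minimal illustration, not a Fabric API: `TransientError`, `run_with_retry`, and the audit fields are hypothetical names, and a real pipeline would rely on the activity retry settings and run history instead.

```python
import time
import uuid


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def run_with_retry(activity, max_attempts=3, delay_seconds=0.0):
    """Run activity(); retry only TransientError and return an audit record."""
    audit = {"run_id": str(uuid.uuid4()), "attempts": 0,
             "status": "Failed", "rows": None, "error": None}
    for attempt in range(1, max_attempts + 1):
        audit["attempts"] = attempt
        try:
            audit["rows"] = activity()
            audit["status"] = "Succeeded"
            audit["error"] = None
            return audit
        except TransientError as exc:
            # Transient problem: record it and try again.
            audit["error"] = str(exc)
            time.sleep(delay_seconds)
        except ValueError as exc:
            # Non-transient data problem: retrying will not help, so stop.
            audit["error"] = str(exc)
            return audit
    return audit


calls = {"n": 0}


def flaky_copy():
    """Assumed copy step that times out twice, then copies 120 rows."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("source timeout")
    return 120


def bad_data():
    """Assumed copy step that always fails on a data issue."""
    raise ValueError("schema mismatch")


result = run_with_retry(flaky_copy)
bad = run_with_retry(bad_data)
```

Here `result` records three attempts and a row count, while `bad` stops after one attempt because a data issue needs an explicit fix rather than another retry.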

Example pipeline expression pattern:

@concat('raw/orders/load_date=', formatDateTime(pipeline().parameters.LoadDate, 'yyyy-MM-dd'))
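
To check what the expression above should produce, the same path logic can be mirrored in plain Python. This is only a sketch for reasoning about the output; `bronze_partition_path` is a hypothetical helper, and the input is assumed to be an ISO-8601 date or timestamp string.

```python
from datetime import datetime


def bronze_partition_path(load_date: str) -> str:
    """Mirror of the expression: fixed prefix plus the date as yyyy-MM-dd."""
    parsed = datetime.fromisoformat(load_date)
    return "raw/orders/load_date=" + parsed.strftime("%Y-%m-%d")


path = bronze_partition_path("2026-06-18T02:30:00")
print(path)  # raw/orders/load_date=2026-06-18
```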

Incremental load reference

Technique	Use when	Implementation idea	Trap
Watermark	Source has reliable modified timestamp or increasing key	Store last successful watermark; extract rows greater than it	Late-arriving updates can be missed if watermark is advanced too early
Full reload	Small dimension or unstable source	Replace target or rebuild curated table	Expensive and risky for large facts
Append-only	Source only inserts immutable events	Append new records and partition by ingestion/event date	Duplicates require deduplication keys
Upsert/MERGE	Records can change	Match on business key or hash; update/insert target	Missing deletes unless source provides delete indicators
Snapshot comparison	Need to detect changes without CDC	Compare current snapshot to previous snapshot	More compute and storage
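
The watermark row and its trap can be illustrated with a small in-memory sketch. It assumes ISO-formatted date strings (which sort correctly as text) and uses hypothetical names (`extract_increment`, `run_load`, `state`); a real load would persist the watermark in a control table or pipeline variable.

```python
def extract_increment(source_rows, last_watermark):
    """Return rows modified after last_watermark and the candidate new watermark."""
    new_rows = [r for r in source_rows if r["modified"] > last_watermark]
    candidate = max((r["modified"] for r in new_rows), default=last_watermark)
    return new_rows, candidate


def run_load(source_rows, state, write_target):
    """Advance state['watermark'] only after write_target succeeds."""
    rows, candidate = extract_increment(source_rows, state["watermark"])
    write_target(rows)  # may raise; the watermark then stays unchanged
    state["watermark"] = candidate
    return len(rows)


source = [
    {"id": 1, "modified": "2026-06-01"},
    {"id": 2, "modified": "2026-06-02"},
    {"id": 3, "modified": "2026-06-03"},
]
state = {"watermark": "2026-06-01"}
target = []

loaded = run_load(source, state, target.extend)    # picks up ids 2 and 3
reloaded = run_load(source, state, target.extend)  # nothing new on re-run
```

Because the watermark moves only after the write returns, a failed write leaves the state untouched and the next run extracts the same rows again.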

Transformation reference

Transformation tool selection

Requirement	Best fit	Reason
Complex Python/PySpark logic	Notebook or Spark job definition	Full code control and scalable processing
SQL ELT and dimensional modeling	Warehouse	T-SQL-first development
Low-code shaping and connector transforms	Dataflow Gen2	Power Query experience
Orchestrate several transformations	Data pipeline	Sequence notebooks, dataflows, stored procedures, copy steps
Reusable production Spark execution	Spark job definition	Repeatable, less interactive than notebooks
Ad hoc exploration	Notebook	Interactive development, visualization, quick testing

PySpark Delta patterns

Write a bronze table from raw files:

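# Read every CSV under the landing folder; without an explicit schema, all columns load as strings.
df = (
    spark.read
    .option("header", "true")
    .csv("Files/raw/orders/")
)

# Append to the bronze Delta table; re-running on the same files appends the same rows again.
(
    df.write
    .format("delta")
    .mode("append")
    .saveAsTable("bronze_orders")
)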

Deduplicate and write a silver table:

from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

# Rank rows within each OrderId so the most recently modified row gets rn = 1.
w = Window.partitionBy("OrderId").orderBy(col("ModifiedDate").desc())

silver = (
    spark.table("bronze_orders")
    .withColumn("rn", row_number().over(w))
    .filter(col("rn") == 1)
    .drop("rn")
)

(
    silver.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("silver_orders")
)
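
The keep-latest rule used above can be modeled in plain Python to reason about expected results without a Spark session. This is a sketch under assumptions: `latest_per_key` is a hypothetical helper, dates are ISO strings, and on ties it keeps the first row seen, whereas `row_number` makes no ordering guarantee among tied rows.

```python
def latest_per_key(rows, key="OrderId", order_by="ModifiedDate"):
    """Keep the row with the greatest order_by value for each key."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


bronze = [
    {"OrderId": 1, "ModifiedDate": "2026-06-01", "Status": "New"},
    {"OrderId": 1, "ModifiedDate": "2026-06-03", "Status": "Shipped"},
    {"OrderId": 2, "ModifiedDate": "2026-06-02", "Status": "New"},
]
silver_rows = latest_per_key(bronze)
```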

Upsert with Delta MERGE:

from delta.tables import DeltaTable

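# Source rows to apply; the merge can fail if several source rows match the same target row.
updates = spark.table("staging_customer_updates")
# Existing Delta table that receives the updates and inserts.
target = DeltaTable.forName(spark, "silver_customer")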

(
    target.alias("t")
    .merge(updates.alias("s"), "t.CustomerId = s.CustomerId")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
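
As a rough mental model, the update-all / insert-all merge above behaves like a keyed dictionary update. The sketch below is plain Python, not the Delta API: `merge_upsert` is a hypothetical helper, and it assumes the source holds at most one row per key and has the same columns as the target.

```python
def merge_upsert(target, updates, key="CustomerId"):
    """Update matched keys, insert unmatched keys; never deletes target rows."""
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        merged[row[key]] = dict(row)  # matched: overwrite; not matched: insert
    return sorted(merged.values(), key=lambda r: r[key])


target_rows = [
    {"CustomerId": "C1", "Name": "Ann"},
    {"CustomerId": "C2", "Name": "Bob"},
]
update_rows = [
    {"CustomerId": "C2", "Name": "Robert"},
    {"CustomerId": "C3", "Name": "Cy"},
]

once = merge_upsert(target_rows, update_rows)
twice = merge_upsert(once, update_rows)  # replaying the same batch changes nothing
```

Two properties are worth noticing: replaying the same batch is idempotent, and `C1` survives even if it was deleted at the source, which is the missing-deletes trap noted in the incremental load table.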

T-SQL warehouse patterns

Use T-SQL when the scenario is SQL-first and the target is a Fabric Warehouse.

CREATE TABLE dbo.DimCustomer
(
    CustomerKey INT NOT NULL,
    CustomerId VARCHAR(50) NOT NULL,
    CustomerName VARCHAR(200) NULL,
    IsCurrent BIT NOT NULL
);

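-- Type 1 upsert keyed on CustomerId: overwrite attributes on match, insert new customers otherwise.
MERGE dbo.DimCustomer AS target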
USING dbo.StageCustomer AS source
    ON target.CustomerId = source.CustomerId
WHEN MATCHED THEN
    UPDATE SET
        CustomerName = source.CustomerName,
        IsCurrent = 1
WHEN NOT MATCHED THEN
    INSERT (CustomerKey, CustomerId, CustomerName, IsCurrent)
    VALUES (source.CustomerKey, source.CustomerId, source.CustomerName, 1);

Exam trap: use Warehouse T-SQL for warehouse tables. Use Spark or supported lakehouse operations for lakehouse Delta table writes; do not assume every T-SQL DML pattern applies to the lakehouse SQL analytics endpoint.

Modeling and serving data

Dimensional modeling quick reference

Object	Purpose	Fabric implementation	Exam tip
Fact table	Numeric events or transactions	Gold lakehouse table or warehouse table	Keep grain explicit
Dimension table	Descriptive context	Gold lakehouse table or warehouse table	Use stable keys and business-friendly attributes
Degenerate dimension	Identifier stored in fact, such as order number	Fact column	Avoid unnecessary dimension table if no attributes
Slowly changing dimension Type 1	Overwrite old attributes	MERGE update	Use for corrections where history is not needed
Slowly changing dimension Type 2	Preserve history	Add effective dates/current flag/surrogate keys	Requires careful joins and current-row filters
Aggregate table	Precomputed summary	Gold table or warehouse table	Use for performance when detail is too large for repeated queries
Semantic model measure	Business calculation	Power BI/Fabric semantic model	Prefer measures for reusable business logic
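
The Type 2 row in the table above can be made concrete with a small in-memory sketch. It is illustrative only: `apply_scd2`, `ValidFrom`, and `ValidTo` are hypothetical names, the surrogate key is a simple running integer, and a production version would typically be a MERGE or notebook job against the dimension table.

```python
def apply_scd2(dimension, change, effective_date):
    """Expire the current row for the key if attributes changed, then add a new current row."""
    current = next(
        (r for r in dimension
         if r["CustomerId"] == change["CustomerId"] and r["IsCurrent"]),
        None,
    )
    if current is not None and current["CustomerName"] == change["CustomerName"]:
        return dimension  # no attribute change, nothing to version
    if current is not None:
        current["IsCurrent"] = False
        current["ValidTo"] = effective_date
    dimension.append({
        "CustomerKey": max((r["CustomerKey"] for r in dimension), default=0) + 1,
        "CustomerId": change["CustomerId"],
        "CustomerName": change["CustomerName"],
        "ValidFrom": effective_date,
        "ValidTo": None,
        "IsCurrent": True,
    })
    return dimension


dim = []
apply_scd2(dim, {"CustomerId": "C1", "CustomerName": "Ann"}, "2026-01-01")
apply_scd2(dim, {"CustomerId": "C1", "CustomerName": "Ann Lee"}, "2026-06-01")
apply_scd2(dim, {"CustomerId": "C1", "CustomerName": "Ann Lee"}, "2026-06-02")  # no change
```

After these calls the dimension holds two rows for `C1`: an expired row and one current row, which is why fact joins need a current-row or date-range filter.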

Serving option decision table

Consumer need	Choose	Why
BI over Fabric Delta tables with minimal refresh movement	Direct Lake semantic model	Uses OneLake-backed tables in supported scenarios
SQL analysts querying curated lakehouse data	Lakehouse SQL analytics endpoint	Familiar SQL query surface over lakehouse tables
SQL analysts building warehouse-style reports	Warehouse	Full SQL-first serving pattern
Reports need curated relationships, measures, RLS	Semantic model	Central BI model and security layer
Data scientists need feature data	Lakehouse tables/files	Spark-friendly access
External tools need SQL endpoint	Warehouse or lakehouse SQL analytics endpoint	Choose based on write/modeling needs

Direct Lake, Import, and DirectQuery

Mode	Choose when	Watch for
Direct Lake	Data is in supported Fabric/OneLake tables and you want low-latency BI without import refresh copies	Model design, permissions, and fallback behavior matter
Import	Need cached model performance, transformations, or sources not suitable for Direct Lake	Requires refresh planning
DirectQuery	Need live query passthrough to supported source	Source performance and query folding are critical

Security, permissions, and governance

Security layers

Layer	Controls	Use for	Common trap
Microsoft Entra ID	Users, groups, service principals	Identity foundation	Prefer groups over individual assignments
Tenant settings	Fabric feature availability and governance	Organization-wide controls	Workspace admins cannot override disabled tenant features
Capacity permissions	Who can use/administer capacity	Resource governance	Capacity access is not the same as data access
Workspace roles	Admin, Member, Contributor, and Viewer roles	Broad item access and authoring	Too coarse for sensitive data segmentation
Item permissions	Sharing and access to specific Fabric items	Least-privilege item sharing	Item access may still require underlying data permissions
OneLake/data access controls	Folder/table/data access where supported	Granular lakehouse data control	Do not rely only on semantic model RLS for raw data protection
SQL permissions	GRANT/DENY-style database access where supported	Warehouse and SQL endpoint access	SQL access path can bypass report-only restrictions
Semantic model security	RLS/OLS-style BI restrictions	Report and model consumers	Does not automatically secure lakehouse files or warehouse tables
Sensitivity labels	Classification and protection metadata	Governance and compliance workflows	Labels identify/protect; they do not replace authorization design
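
The point that SQL access can bypass report-only restrictions can be shown with a toy model of two access paths to the same rows. Everything here is a simplification for study purposes: `rows_via_report` and `rows_via_sql` are hypothetical functions, not Fabric behavior in code form.

```python
ROWS = [
    {"Region": "East", "Sales": 10},
    {"Region": "West", "Sales": 20},
]


def rows_via_report(user_region):
    """Report path: the semantic model RLS filter is applied to every query."""
    return [r for r in ROWS if r["Region"] == user_region]


def rows_via_sql(has_sql_select):
    """SQL path: semantic model RLS is not involved; only the SQL grant matters."""
    return list(ROWS) if has_sql_select else []


report_rows = rows_via_report("East")
sql_rows = rows_via_sql(has_sql_select=True)
```

The same user sees one row through the report and both rows through SQL, so row restrictions must also be designed at the data layer when users can reach it directly.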

Workspace role exam cues

Cue	Likely answer
User must manage workspace settings and permissions	Workspace Admin role or delegated admin pattern
User must create and edit Fabric items	Contributor/Member-style access, depending on governance needs
User must only consume reports/data	Viewer or item-level sharing plus data permissions
External or app identity runs ingestion	Use supported service principal/workspace identity/connection pattern; avoid personal credentials
Need least privilege for one dataset/table	Use item/data/SQL permissions rather than broad workspace admin

Governance checklist

Use Microsoft Entra groups for repeatable access assignments.
Separate development, test, and production workspaces.
Use sensitivity labels and endorsement for discoverability and trust.
Review lineage to understand downstream impact before schema changes.
Use domains or workspace organization patterns when many teams share Fabric.
Store credentials in Fabric connections or approved secret stores.
Validate that shortcuts and mirrored data inherit or enforce the intended access path.
Remember that report security and raw data security are separate design concerns.

DevOps and lifecycle management

Capability	Use for	Exam note
Git integration	Versioning supported Fabric item definitions	Data is not versioned by Git integration
Deployment pipelines	Promote items across dev/test/prod	Configure environment-specific connections, parameters, and rules
Workspace separation	Isolate lifecycle stages	Avoid developing directly in production
Parameters	Change paths, dates, connection names, schemas	Critical for reusable pipelines and notebooks
Fabric environments	Manage Spark libraries/settings where supported	Helps avoid “works in my notebook” dependency issues
Lineage view	Impact analysis	Use before modifying shared tables or semantic models
Monitor hub	Central run status visibility	Useful for operational troubleshooting

Common lifecycle traps:

Deployment moves supported item metadata, not all data.
Hard-coded lakehouse IDs, paths, or connection names break promotion.
Personal credentials can fail when the owner leaves or permissions change.
Notebook cell execution order can hide missing initialization.
Schema changes must be coordinated with SQL endpoints, semantic models, and reports.
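
One way to avoid the hard-coded values listed above is to resolve every environment-specific setting from a single parameter. The sketch below is a generic pattern with invented names (`ENVIRONMENTS`, `resolve_settings`, `lh_sales_dev`, `conn_sql_prod`); in Fabric the equivalent is usually pipeline or notebook parameters plus deployment rules.

```python
# Hypothetical per-environment settings; the names are placeholders.
ENVIRONMENTS = {
    "dev": {"lakehouse": "lh_sales_dev", "connection": "conn_sql_dev"},
    "prod": {"lakehouse": "lh_sales_prod", "connection": "conn_sql_prod"},
}


def resolve_settings(environment, table):
    """Build run settings from the environment name instead of hard-coded values."""
    if environment not in ENVIRONMENTS:
        raise KeyError(f"Unknown environment: {environment}")
    settings = dict(ENVIRONMENTS[environment])
    settings["target_table"] = f"{settings['lakehouse']}.{table}"
    return settings


prod = resolve_settings("prod", "silver_orders")
print(prod["target_table"])  # lh_sales_prod.silver_orders
```

Failing fast on an unknown environment name is deliberate: a silent fallback to dev values is how promoted items end up writing to the wrong place.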

Performance and optimization

Delta and lakehouse optimization

Symptom	Likely issue	Fix pattern
Slow scans over many tiny files	Small-file problem	Compact/optimize Delta tables; batch writes appropriately
Queries scan too much data	Poor partition/filter design	Partition selectively; filter early; avoid over-partitioning
BI slow on raw tables	Raw layout not serving-friendly	Build gold tables or warehouse model
Duplicate rows after retry	Non-idempotent append	Use load IDs, deduplication, and MERGE
Schema mismatch failures	Source drift or incorrect inference	Define schemas explicitly for critical pipelines
Spark job slow shuffle	Large joins/grouping	Repartition carefully, reduce columns, filter early, consider broadcast for small dimensions
Lakehouse table not visible to SQL	Data written only as files or unregistered Delta	Save/register as a table in the lakehouse Tables area
High latency from shortcut source	Remote read/source bottleneck	Copy or mirror data when performance/isolation matters
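
The small-file symptom in the table above can be turned into a simple check once file sizes are known. This sketch assumes the sizes (in MB) have already been collected by some other means, and the 32 MB and 50% thresholds are arbitrary illustrative values, not Fabric recommendations.

```python
def needs_compaction(file_sizes_mb, small_file_mb=32, max_small_ratio=0.5):
    """Flag a table when more than max_small_ratio of its files are below small_file_mb."""
    if not file_sizes_mb:
        return False
    small = sum(1 for size in file_sizes_mb if size < small_file_mb)
    return small / len(file_sizes_mb) > max_small_ratio


frequent_appends = [1, 2, 1, 3, 2, 128]  # many tiny files from small, frequent writes
batched_writes = [120, 130, 110, 8]      # mostly large files

flag_frequent = needs_compaction(frequent_appends)
flag_batched = needs_compaction(batched_writes)
```

With these inputs only the first table is flagged, which matches the fix pattern: compact existing files and batch future writes.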

Warehouse and SQL optimization

Area	Practical guidance
Data model	Prefer star schema for BI; avoid wide, ambiguous, highly normalized serving layers
Query shape	Select only needed columns, filter early, avoid unnecessary cross joins
ELT	Stage data, validate row counts, then merge/insert into curated tables
Statistics/metadata	Keep metadata current where supported by the engine
Concurrency	Monitor workload patterns before changing architecture
Capacity	Use Capacity Metrics to distinguish inefficient query design from resource pressure

Spark optimization quick checks

Check	Why it matters
Avoid reading entire bronze for small incremental updates	Reduces scan and shuffle
Persist/cache only when reused	Caching everything wastes memory
Control partition count after large shuffles	Too many or too few partitions hurts performance
Use explicit schemas for recurring files	Avoid expensive inference and inconsistent types
Use column pruning	Reading fewer columns reduces I/O
Use predicate pushdown-friendly filters	Helps Delta/Parquet skip data
Clean up old files carefully	Vacuum/retention choices affect rollback and time travel expectations
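
For the partition and shuffle checks above, a quick way to reason about skew is to compare the largest key group to the average group size. The helper below is a plain-Python sketch (`skew_ratio` is a hypothetical name); in Spark the same idea would be a group-by count on the join or partition key.

```python
from collections import Counter


def skew_ratio(keys):
    """Ratio of the largest key group to the mean group size (1.0 means perfectly even)."""
    counts = Counter(keys)
    if not counts:
        return 0.0
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


even_keys = ["a", "b", "c", "d"] * 25
skewed_keys = ["a"] * 97 + ["b", "c", "d"]

even_ratio = skew_ratio(even_keys)      # 1.0
skewed_ratio = skew_ratio(skewed_keys)  # largest group is 97 rows against a mean of 25
```

A ratio far above 1.0 suggests that one task will process most of the data, so adding capacity alone is unlikely to help.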

Monitoring and troubleshooting

Where to look first

Workload	Primary places to check	What to inspect
Data pipeline	Run history, activity output, Monitor hub	Failed activity, error text, rows copied, duration, retry behavior
Dataflow Gen2	Refresh/run details	Connector errors, transformation step, destination write failure
Notebook/Spark job	Spark application details, driver/executor logs, notebook output	Failed cell, dependency error, skew, shuffle, memory pressure
Warehouse SQL	Query history/insights where available	Long-running query, blocking, inefficient joins, data volume
Semantic model	Refresh history, model settings, lineage	Source permission, Direct Lake behavior, schema changes
Capacity-wide issue	Capacity Metrics	Throttling, overload, high concurrency, noisy workloads
Security issue	Workspace/item/data permissions, SQL grants, Entra groups	Missing group membership or mismatched access path

Troubleshooting decision table

Symptom	First question	Likely resolution
Pipeline succeeds but target has duplicates	Is the load idempotent?	Add keys, watermark, deduplication, or MERGE logic
Pipeline cannot reach source	Is source cloud, on-prem, private, or credential-restricted?	Configure supported gateway/private connectivity and credentials
Notebook runs manually but fails on schedule	Does it initialize everything?	Attach correct lakehouse, set parameters, install dependencies, avoid hidden state
SQL endpoint does not show new lakehouse data	Was data saved as a registered Delta table?	Write with saveAsTable or register table correctly
Report user sees denied data	Which layer denies access?	Check semantic model, item, SQL, and OneLake permissions separately
Direct Lake model behaves unexpectedly	Is the table/model mode supported and permissions valid?	Validate source tables, model design, and fallback/refresh settings
Spark job is slow only on large days	Is data skewed or partitioned poorly?	Inspect key distribution, filter early, repartition selectively
Warehouse query slows after schema/load change	Did data volume or query plan change?	Review query shape, table design, statistics/metadata, and capacity pressure
Shortcut data is unavailable	Is the external source accessible and authorized?	Check source credentials, network, and shortcut target
Dev deployment works but prod fails	Are environment-specific values hard-coded?	Use parameters, deployment rules, and prod connections

High-yield DP-700 distinctions

Distinction	Remember
Pipeline vs notebook	Pipeline orchestrates; notebook transforms with code
Dataflow Gen2 vs pipeline	Dataflow transforms low-code; pipeline controls workflow and movement
Shortcut vs copy	Shortcut references data; copy creates a new physical copy
Shortcut vs mirroring	Shortcut virtualizes supported data; mirroring replicates supported operational data
Lakehouse vs warehouse	Lakehouse is Spark/Delta/file-friendly; warehouse is SQL-first
Lakehouse SQL endpoint vs warehouse	SQL endpoint queries lakehouse tables; warehouse is the SQL engineering store
Semantic model RLS vs data security	RLS restricts model/report queries, not necessarily direct raw data access
Git/deployment vs backup	Git/deployment handles item definitions; it is not a data backup strategy
Scaling capacity vs optimizing workload	Optimize data layout and queries before assuming more capacity is the right answer
Bronze vs gold	Bronze preserves raw history; gold is business-ready and serving-oriented

Exam scenario playbook

If the scenario says…	Think…
“Business analysts need SQL access to curated tables”	Warehouse or lakehouse SQL analytics endpoint depending on write/model ownership
“Data engineers need to process JSON/CSV at scale”	Lakehouse + Spark notebook/job
“Need a scheduled daily copy with parameters”	Data pipeline
“Need low-code transformations using Power Query”	Dataflow Gen2
“Need to avoid duplicate rows during retry”	Idempotent design, keys, MERGE, load audit
“Need to avoid copying external data”	Shortcut, if supported and performance/security are acceptable
“Need near-real-time replicated operational data”	Mirroring, if source is supported
“Need to promote solution from dev to prod”	Deployment pipelines/Git + parameters/connections
“Need to restrict report rows by user”	Semantic model RLS, plus underlying data permissions if users can access raw data
“Need to diagnose slow workloads across many Fabric items”	Capacity Metrics first, then item-specific logs

Last-minute checklist

Can you explain when to choose lakehouse, warehouse, pipeline, dataflow, notebook, shortcut, and mirroring?
Can you describe bronze, silver, and gold responsibilities without mixing raw and serving layers?
Can you design an incremental load that survives retries?
Can you identify which permission layer controls a failed access scenario?
Can you distinguish semantic model security from OneLake/SQL data security?
Can you troubleshoot a failed pipeline, notebook, SQL query, or refresh from logs?
Can you name practical fixes for small files, poor partitioning, schema drift, and duplicate loads?
Can you promote Fabric items across environments without hard-coded dev values?

Practical next step

Use this Quick Reference as a checklist while you work through DP-700 practice scenarios. For each scenario, force yourself to identify the Fabric item, data movement pattern, security boundary, monitoring point, and likely optimization before checking the answer.
