DP-700 — Microsoft Fabric Data Engineer Associate Quick Reference

Compact DP-700 reference for Microsoft Fabric data engineering decisions, services, security, performance, and troubleshooting.

This Quick Reference is independent exam-prep support for candidates preparing for Microsoft Fabric Data Engineer Associate (DP-700). Use it to review service choices, implementation patterns, security boundaries, and common traps before practicing full scenarios.

DP-700 scope snapshot

DP-700 expects practical Microsoft Fabric data engineering judgment: choosing the right Fabric item, building ingestion and transformation flows, managing data in OneLake, securing analytics assets, and monitoring or optimizing workloads.

AreaWhat to be ready to doHigh-yield exam angle
Analytics solution implementationWorkspaces, lakehouses, warehouses, semantic models, OneLake, domains, deployment patternsKnow which Fabric item owns data, compute, security, and serving behavior
Data ingestion and transformationPipelines, Dataflows Gen2, notebooks, Spark jobs, SQL transformations, incremental loadsChoose orchestration vs transformation tools correctly
Data managementDelta tables, shortcuts, medallion architecture, schemas, files, tables, refresh patternsKnow when data is copied, virtualized, mirrored, or transformed
Security and governanceMicrosoft Entra ID, workspace roles, item permissions, SQL permissions, RLS, sensitivity labels, lineageDistinguish coarse access, data-level access, and report-level restrictions
Monitoring and optimizationPipeline runs, Spark jobs, SQL queries, Capacity Metrics, refresh failures, small files, partitioningDiagnose symptoms before scaling capacity

Microsoft Fabric mental model

Core hierarchy and terms

ConceptExam-ready meaningCommon trap
TenantOrganization-level Microsoft Fabric environmentTenant settings can enable, disable, or constrain features independently of workspace permissions
CapacityCompute resource backing Fabric workloadsSlow workloads may be code, data layout, concurrency, or capacity pressure; do not assume one cause
WorkspaceCollaboration and security boundary for Fabric itemsWorkspace roles are broad; use item/data permissions for finer control
ItemFabric artifact such as lakehouse, warehouse, pipeline, notebook, dataflow, semantic modelDeploying an item usually does not deploy the underlying data
OneLakeUnified SaaS data lake for FabricOneLake is storage; Fabric items provide experiences and compute over it
LakehouseDelta-based data lake item for files, tables, Spark, and SQL analytics endpointSQL analytics endpoint is primarily for querying lakehouse tables, not full SQL-first data warehousing
WarehouseSQL-first relational warehouse item in FabricChoose for T-SQL engineering and relational serving, not arbitrary raw file processing
Data pipelineOrchestration item for movement and control flowPipelines coordinate work; they are not the best place for complex row-by-row transformations
Dataflow Gen2Low-code Power Query-based ingestion and transformationGood for connector-rich shaping; less ideal for complex code-first engineering
NotebookCode-first Spark development itemInteractive state can hide missing setup; scheduled runs must be self-contained
Spark job definitionProduction-style Spark job executionUse when repeatable Spark execution matters more than notebook interactivity
ShortcutOneLake reference to supported external or internal dataShortcuts virtualize data; they do not automatically transform, cleanse, or copy it
MirroringReplication from supported operational sources into FabricDifferent from shortcuts: mirrored data is replicated for analytics scenarios
Semantic modelBI model used by Power BI/Fabric reportsRLS in a semantic model does not automatically secure raw lakehouse or warehouse access

Fabric item selection matrix

RequirementPreferWhyAvoid / watch for
Store raw files and Delta tables, transform with SparkLakehouseFile + table experience, notebooks, Delta, medallion-friendlyDo not expect full SQL warehouse behavior through the lakehouse SQL endpoint
SQL-first dimensional warehouseWarehouseT-SQL DDL/DML, relational schemas, SQL servingDo not use it as a generic file landing zone
Low-code ingestion and transformationsDataflow Gen2Power Query, many connectors, data destinationsComplex orchestration belongs in pipelines
Copy data, run activities in order, branch, loop, scheduleData pipelineControl flow, copy activity, parameters, monitoringAvoid embedding complex business logic directly in pipeline expressions
Code-first large-scale transformationNotebook or Spark job definitionPySpark/SQL/Scala-style transformations and custom librariesDo not rely on interactive notebook state in production
Near-real-time replication from supported databasesMirroringReduces custom ingestion for supported sourcesNot a substitute for arbitrary transformations or unsupported sources
Query operational telemetry or event streamsEventstream/Eventhouse, when scenario requires Real-Time IntelligenceKQL/event-first analyticsNot the default choice for batch lakehouse/warehouse engineering
Avoid copying data from supported external storageOneLake shortcutVirtualized access from OneLake namespaceSource permissions, latency, and write support still matter
Serve Power BI with minimal import refresh over Fabric Delta tablesDirect Lake semantic modelReads data from OneLake in supported scenariosModel-level security is not raw data security

Lakehouse, warehouse, and SQL endpoint distinctions

Feature / decisionLakehouseLakehouse SQL analytics endpointWarehouse
Primary personaData engineer / Spark engineerSQL consumer over lakehouse tablesSQL data engineer / analyst
StorageFiles and Delta tables in OneLakeQueries Delta tables exposed by lakehouseRelational warehouse data in OneLake-backed storage
Write pathSpark, Dataflows, pipelines, lakehouse UIGenerally read-oriented for lakehouse dataT-SQL DDL/DML and ELT
Best forBronze/silver/gold Delta, raw files, Spark transformationsBI/query access to curated lakehouse tablesDimensional models, SQL transformations, SQL serving
Table organizationTables area plus Files areaExposes registered lakehouse tablesSchemas, tables, views, procedures as supported
Common exam cue“Need raw files, notebooks, Delta, medallion”“Need SQL access to lakehouse tables”“Need T-SQL warehouse and relational modeling”
Common trapFiles are not automatically queryable as tablesDo not use it as the write engine for lakehouse tablesDo not treat it like Spark for semi-structured raw files

Files vs tables in a lakehouse

LocationUse forExam note
FilesRaw or semi-structured files, landing zones, archives, unregistered dataGood for bronze landing, but not automatically a managed query table
TablesDelta tables registered for Spark and SQL analyticsUse for curated data that downstream SQL/BI tools should query
ShortcutsReferenced data from another OneLake location or supported external storageUseful for data sharing and avoiding copies; still plan security and performance

Data architecture patterns

Medallion architecture reference

    flowchart LR
	    A[Sources] --> B[Bronze<br/>Raw landing]
	    B --> C[Silver<br/>Cleaned and conformed]
	    C --> D[Gold<br/>Business-ready model]
	    D --> E[Warehouse / Semantic model / Reports]
LayerPurposeTypical Fabric implementationQuality expectations
BronzePreserve source data with minimal changesLakehouse Files or Delta tables; pipeline copy; shortcuts; mirroring outputTraceability, ingestion metadata, no heavy business logic
SilverClean, standardize, deduplicate, conformSpark notebooks/jobs, Dataflows Gen2, Delta MERGEData types, keys, deduplication, valid records
GoldBusiness-ready facts/dimensions or aggregatesLakehouse Delta tables or Warehouse tablesStar schema, semantic names, performance-ready
ServingSQL/BI/ML consumptionWarehouse, SQL analytics endpoint, semantic model, Direct LakeSecurity, relationships, measures, query performance

Pattern selection

ScenarioRecommended pattern
Multiple raw source systems with different formatsLand to bronze first, then standardize in silver
Re-runnable daily loadsUse pipeline parameters, watermarks, idempotent writes, and MERGE
BI model over curated Fabric tablesGold Delta or warehouse tables plus semantic model
SQL team owns transformationsWarehouse with T-SQL ELT
Spark team owns transformationsLakehouse with notebooks or Spark job definitions
Data must remain in external supported storageShortcut if virtualization is acceptable; copy if isolation/performance/history is needed

Ingestion and orchestration quick reference

Ingestion method decision table

NeedChooseWhyWatch for
Scheduled copy from source to OneLake/warehouseData pipeline Copy activityOperational control, monitoring, retries, parametersSchema drift, credentials, gateway, incremental logic
Low-code source shapingDataflow Gen2Power Query transformations and destinationsRefresh duration, folding behavior, complex logic
Complex file parsing or custom transformationNotebook/Spark jobCode-level control and distributed processingDependency management and reproducibility
Avoid moving supported external dataShortcutReduces duplicationSource availability, security, performance, unsupported write patterns
Replicate supported operational DB dataMirroringSimplifies near-real-time analytics ingestionConfirm source support and downstream modeling approach
On-premises or private network sourcePipeline/dataflow with gateway or supported private connectivitySecure access to non-public dataCredential scope and gateway health
Event telemetryEventstream/Eventhouse if real-time scenarioStream-first processingDo not force event tooling for simple batch ingestion

Pipeline design checklist

Design concernDP-700-ready approach
ParametersParameterize source path, target path, dates, environment, and load mode
Re-runsMake activities idempotent; avoid duplicate inserts
Incremental loadsUse watermark columns, change tracking/CDC where available, or source-specific modified timestamps
DependenciesUse activities for sequence, conditions, loops, and failure paths
SecretsStore credentials in Fabric connections or approved secret mechanisms; do not hard-code
ObservabilityCapture run IDs, row counts, source extract time, and failure messages
RecoveryUse retry policies where appropriate, but fix non-transient data issues explicitly
Environment movementUse deployment rules, parameters, or separate connections for dev/test/prod

Example pipeline expression pattern:

@concat('raw/orders/load_date=', formatDateTime(pipeline().parameters.LoadDate, 'yyyy-MM-dd'))

Incremental load reference

TechniqueUse whenImplementation ideaTrap
WatermarkSource has reliable modified timestamp or increasing keyStore last successful watermark; extract rows greater than itLate-arriving updates can be missed if watermark is advanced too early
Full reloadSmall dimension or unstable sourceReplace target or rebuild curated tableExpensive and risky for large facts
Append-onlySource only inserts immutable eventsAppend new records and partition by ingestion/event dateDuplicates require deduplication keys
Upsert/MERGERecords can changeMatch on business key or hash; update/insert targetMissing deletes unless source provides delete indicators
Snapshot comparisonNeed detect changes without CDCCompare current snapshot to previous snapshotMore compute and storage

Transformation reference

Transformation tool selection

RequirementBest fitReason
Complex Python/PySpark logicNotebook or Spark job definitionFull code control and scalable processing
SQL ELT and dimensional modelingWarehouseT-SQL-first development
Low-code shaping and connector transformsDataflow Gen2Power Query experience
Orchestrate several transformationsData pipelineSequence notebooks, dataflows, stored procedures, copy steps
Reusable production Spark executionSpark job definitionRepeatable, less interactive than notebooks
Ad hoc explorationNotebookInteractive development, visualization, quick testing

PySpark Delta patterns

Write a bronze table from raw files:

df = (
    spark.read
    .option("header", "true")
    .csv("Files/raw/orders/")
)

(
    df.write
    .format("delta")
    .mode("append")
    .saveAsTable("bronze_orders")
)

Deduplicate and write a silver table:

from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

w = Window.partitionBy("OrderId").orderBy(col("ModifiedDate").desc())

silver = (
    spark.table("bronze_orders")
    .withColumn("rn", row_number().over(w))
    .filter(col("rn") == 1)
    .drop("rn")
)

(
    silver.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("silver_orders")
)

Upsert with Delta MERGE:

from delta.tables import DeltaTable

updates = spark.table("staging_customer_updates")
target = DeltaTable.forName(spark, "silver_customer")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.CustomerId = s.CustomerId")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

T-SQL warehouse patterns

Use T-SQL when the scenario is SQL-first and the target is a Fabric Warehouse.

CREATE TABLE dbo.DimCustomer
(
    CustomerKey INT NOT NULL,
    CustomerId VARCHAR(50) NOT NULL,
    CustomerName VARCHAR(200) NULL,
    IsCurrent BIT NOT NULL
);
MERGE dbo.DimCustomer AS target
USING dbo.StageCustomer AS source
    ON target.CustomerId = source.CustomerId
WHEN MATCHED THEN
    UPDATE SET
        CustomerName = source.CustomerName,
        IsCurrent = 1
WHEN NOT MATCHED THEN
    INSERT (CustomerKey, CustomerId, CustomerName, IsCurrent)
    VALUES (source.CustomerKey, source.CustomerId, source.CustomerName, 1);

Exam trap: use Warehouse T-SQL for warehouse tables. Use Spark or supported lakehouse operations for lakehouse Delta table writes; do not assume every T-SQL DML pattern applies to the lakehouse SQL analytics endpoint.

Modeling and serving data

Dimensional modeling quick reference

ObjectPurposeFabric implementationExam tip
Fact tableNumeric events or transactionsGold lakehouse table or warehouse tableKeep grain explicit
Dimension tableDescriptive contextGold lakehouse table or warehouse tableUse stable keys and business-friendly attributes
Degenerate dimensionIdentifier stored in fact, such as order numberFact columnAvoid unnecessary dimension table if no attributes
Slowly changing dimension Type 1Overwrite old attributesMERGE updateUse for corrections where history is not needed
Slowly changing dimension Type 2Preserve historyAdd effective dates/current flag/surrogate keysRequires careful joins and current-row filters
Aggregate tablePrecomputed summaryGold table or warehouse tableUse for performance when detail is too large for repeated queries
Semantic model measureBusiness calculationPower BI/Fabric semantic modelPrefer measures for reusable business logic

Serving option decision table

Consumer needChooseWhy
BI over Fabric Delta tables with minimal refresh movementDirect Lake semantic modelUses OneLake-backed tables in supported scenarios
SQL analysts querying curated lakehouse dataLakehouse SQL analytics endpointFamiliar SQL query surface over lakehouse tables
SQL analysts building warehouse-style reportsWarehouseFull SQL-first serving pattern
Reports need curated relationships, measures, RLSSemantic modelCentral BI model and security layer
Data scientists need feature dataLakehouse tables/filesSpark-friendly access
External tools need SQL endpointWarehouse or lakehouse SQL analytics endpointChoose based on write/modeling needs

Direct Lake, Import, and DirectQuery

ModeChoose whenWatch for
Direct LakeData is in supported Fabric/OneLake tables and you want low-latency BI without import refresh copiesModel design, permissions, and fallback behavior matter
ImportNeed cached model performance, transformations, or sources not suitable for Direct LakeRequires refresh planning
DirectQueryNeed live query passthrough to supported sourceSource performance and query folding are critical

Security, permissions, and governance

Security layers

LayerControlsUse forCommon trap
Microsoft Entra IDUsers, groups, service principalsIdentity foundationPrefer groups over individual assignments
Tenant settingsFabric feature availability and governanceOrganization-wide controlsWorkspace admins cannot override disabled tenant features
Capacity permissionsWho can use/administer capacityResource governanceCapacity access is not the same as data access
Workspace rolesAdmin, Member, Contributor, Viewer-style collaborationBroad item access and authoringToo coarse for sensitive data segmentation
Item permissionsSharing and access to specific Fabric itemsLeast-privilege item sharingItem access may still require underlying data permissions
OneLake/data access controlsFolder/table/data access where supportedGranular lakehouse data controlDo not rely only on semantic model RLS for raw data protection
SQL permissionsGRANT/DENY-style database access where supportedWarehouse and SQL endpoint accessSQL access path can bypass report-only restrictions
Semantic model securityRLS/OLS-style BI restrictionsReport and model consumersDoes not automatically secure lakehouse files or warehouse tables
Sensitivity labelsClassification and protection metadataGovernance and compliance workflowsLabels identify/protect; they do not replace authorization design

Workspace role exam cues

CueLikely answer
User must manage workspace settings and permissionsWorkspace Admin role or delegated admin pattern
User must create and edit Fabric itemsContributor/Member-style access, depending on governance needs
User must only consume reports/dataViewer or item-level sharing plus data permissions
External or app identity runs ingestionUse supported service principal/workspace identity/connection pattern; avoid personal credentials
Need least privilege for one dataset/tableUse item/data/SQL permissions rather than broad workspace admin

Governance checklist

  • Use Microsoft Entra groups for repeatable access assignments.
  • Separate development, test, and production workspaces.
  • Use sensitivity labels and endorsement for discoverability and trust.
  • Review lineage to understand downstream impact before schema changes.
  • Use domains or workspace organization patterns when many teams share Fabric.
  • Store credentials in Fabric connections or approved secret stores.
  • Validate that shortcuts and mirrored data inherit or enforce the intended access path.
  • Remember that report security and raw data security are separate design concerns.

DevOps and lifecycle management

CapabilityUse forExam note
Git integrationVersioning supported Fabric item definitionsData is not versioned by Git integration
Deployment pipelinesPromote items across dev/test/prodConfigure environment-specific connections, parameters, and rules
Workspace separationIsolate lifecycle stagesAvoid developing directly in production
ParametersChange paths, dates, connection names, schemasCritical for reusable pipelines and notebooks
Fabric environmentsManage Spark libraries/settings where supportedHelps avoid “works in my notebook” dependency issues
Lineage viewImpact analysisUse before modifying shared tables or semantic models
Monitoring hubCentral run status visibilityUseful for operational troubleshooting

Common lifecycle traps:

  • Deployment moves supported item metadata, not all data.
  • Hard-coded lakehouse IDs, paths, or connection names break promotion.
  • Personal credentials can fail when the owner leaves or permissions change.
  • Notebook cell execution order can hide missing initialization.
  • Schema changes must be coordinated with SQL endpoints, semantic models, and reports.

Performance and optimization

Delta and lakehouse optimization

SymptomLikely issueFix pattern
Slow scans over many tiny filesSmall-file problemCompact/optimize Delta tables; batch writes appropriately
Queries scan too much dataPoor partition/filter designPartition selectively; filter early; avoid over-partitioning
BI slow on raw tablesRaw layout not serving-friendlyBuild gold tables or warehouse model
Duplicate rows after retryNon-idempotent appendUse load IDs, deduplication, and MERGE
Schema mismatch failuresSource drift or incorrect inferenceDefine schemas explicitly for critical pipelines
Spark job slow shuffleLarge joins/groupingRepartition carefully, reduce columns, filter early, consider broadcast for small dimensions
Lakehouse table not visible to SQLData written only as files or unregistered DeltaSave/register as a table in the lakehouse Tables area
High latency from shortcut sourceRemote read/source bottleneckCopy or mirror data when performance/isolation matters

Warehouse and SQL optimization

AreaPractical guidance
Data modelPrefer star schema for BI; avoid wide, ambiguous, highly normalized serving layers
Query shapeSelect only needed columns, filter early, avoid unnecessary cross joins
ELTStage data, validate row counts, then merge/insert into curated tables
Statistics/metadataKeep metadata current where supported by the engine
ConcurrencyMonitor workload patterns before changing architecture
CapacityUse Capacity Metrics to distinguish inefficient query design from resource pressure

Spark optimization quick checks

CheckWhy it matters
Avoid reading entire bronze for small incremental updatesReduces scan and shuffle
Persist/cache only when reusedCaching everything wastes memory
Control partition count after large shufflesToo many or too few partitions hurts performance
Use explicit schemas for recurring filesAvoid expensive inference and inconsistent types
Use column pruningReading fewer columns reduces I/O
Use predicate pushdown-friendly filtersHelps Delta/Parquet skip data
Clean up old files carefullyVacuum/retention choices affect rollback and time travel expectations

Monitoring and troubleshooting

Where to look first

WorkloadPrimary places to checkWhat to inspect
Data pipelineRun history, activity output, Monitor hubFailed activity, error text, rows copied, duration, retry behavior
Dataflow Gen2Refresh/run detailsConnector errors, transformation step, destination write failure
Notebook/Spark jobSpark application details, driver/executor logs, notebook outputFailed cell, dependency error, skew, shuffle, memory pressure
Warehouse SQLQuery history/insights where availableLong-running query, blocking, inefficient joins, data volume
Semantic modelRefresh history, model settings, lineageSource permission, Direct Lake behavior, schema changes
Capacity-wide issueCapacity MetricsThrottling, overload, high concurrency, noisy workloads
Security issueWorkspace/item/data permissions, SQL grants, Entra groupsMissing group membership or mismatched access path

Troubleshooting decision table

SymptomFirst questionLikely resolution
Pipeline succeeds but target has duplicatesIs the load idempotent?Add keys, watermark, deduplication, or MERGE logic
Pipeline cannot reach sourceIs source cloud, on-prem, private, or credential-restricted?Configure supported gateway/private connectivity and credentials
Notebook runs manually but fails on scheduleDoes it initialize everything?Attach correct lakehouse, set parameters, install dependencies, avoid hidden state
SQL endpoint does not show new lakehouse dataWas data saved as a registered Delta table?Write with saveAsTable or register table correctly
Report user sees denied dataWhich layer denies access?Check semantic model, item, SQL, and OneLake permissions separately
Direct Lake model behaves unexpectedlyIs the table/model mode supported and permissions valid?Validate source tables, model design, and fallback/refresh settings
Spark job is slow only on large daysIs data skewed or partitioned poorly?Inspect key distribution, filter early, repartition selectively
Warehouse query slows after schema/load changeDid data volume or query plan change?Review query shape, table design, statistics/metadata, and capacity pressure
Shortcut data is unavailableIs the external source accessible and authorized?Check source credentials, network, and shortcut target
Dev deployment works but prod failsAre environment-specific values hard-coded?Use parameters, deployment rules, and prod connections

High-yield DP-700 distinctions

DistinctionRemember
Pipeline vs notebookPipeline orchestrates; notebook transforms with code
Dataflow Gen2 vs pipelineDataflow transforms low-code; pipeline controls workflow and movement
Shortcut vs copyShortcut references data; copy creates a new physical copy
Shortcut vs mirroringShortcut virtualizes supported data; mirroring replicates supported operational data
Lakehouse vs warehouseLakehouse is Spark/Delta/file-friendly; warehouse is SQL-first
Lakehouse SQL endpoint vs warehouseSQL endpoint queries lakehouse tables; warehouse is the SQL engineering store
Semantic model RLS vs data securityRLS restricts model/report queries, not necessarily direct raw data access
Git/deployment vs backupGit/deployment handles item definitions; it is not a data backup strategy
Scaling capacity vs optimizing workloadOptimize data layout and queries before assuming more capacity is the right answer
Bronze vs goldBronze preserves raw history; gold is business-ready and serving-oriented

Exam scenario playbook

If the scenario says…Think…
“Business analysts need SQL access to curated tables”Warehouse or lakehouse SQL analytics endpoint depending on write/model ownership
“Data engineers need to process JSON/CSV at scale”Lakehouse + Spark notebook/job
“Need a scheduled daily copy with parameters”Data pipeline
“Need low-code transformations using Power Query”Dataflow Gen2
“Need avoid duplicate rows during retry”Idempotent design, keys, MERGE, load audit
“Need avoid copying external data”Shortcut, if supported and performance/security are acceptable
“Need near-real-time replicated operational data”Mirroring, if source is supported
“Need promote solution from dev to prod”Deployment pipelines/Git + parameters/connections
“Need restrict report rows by user”Semantic model RLS, plus underlying data permissions if users can access raw data
“Need diagnose slow workloads across many Fabric items”Capacity Metrics first, then item-specific logs

Last-minute checklist

  • Can you explain when to choose lakehouse, warehouse, pipeline, dataflow, notebook, shortcut, and mirroring?
  • Can you describe bronze, silver, and gold responsibilities without mixing raw and serving layers?
  • Can you design an incremental load that survives retries?
  • Can you identify which permission layer controls a failed access scenario?
  • Can you distinguish semantic model security from OneLake/SQL data security?
  • Can you troubleshoot a failed pipeline, notebook, SQL query, or refresh from logs?
  • Can you name practical fixes for small files, poor partitioning, schema drift, and duplicate loads?
  • Can you promote Fabric items across environments without hard-coded dev values?

Practical next step

Use this Quick Reference as a checklist while you work through DP-700 practice scenarios. For each scenario, force yourself to identify the Fabric item, data movement pattern, security boundary, monitoring point, and likely optimization before checking the answer.

Browse Certification Practice Tests by Exam Family