DP-700 — Microsoft Fabric Data Engineer Associate Quick Review

Quick Review for Microsoft DP-700 Fabric Data Engineer Associate candidates: key concepts, traps, decisions, and practice focus.

Quick Review purpose

This Quick Review is for candidates preparing for the Microsoft Fabric Data Engineer Associate exam (exam code DP-700). Use it to refresh high-yield concepts before moving into topic drills, mock exams, and detailed explanations.

The DP-700 exam is not just a syntax test. Expect scenario-style decisions about how to design, ingest, transform, secure, monitor, and optimize data engineering solutions in Microsoft Fabric. The strongest candidates can explain why a Fabric item or pattern is the right fit, not only what it is called.

This page supports IT Mastery practice with original practice questions. It is not affiliated with Microsoft.

DP-700 mental model

Think in layers. Most exam scenarios can be solved by identifying the layer being tested.

Layer | What to recognize quickly | Common exam angle
Workspace and capacity | Workspaces contain Fabric items; capacity affects performance and throttling | Choose workspace roles, deployment approach, monitoring location
OneLake storage | Unified storage layer; lakehouse tables use open Delta/Parquet patterns | Avoid unnecessary copies, use shortcuts, manage files and tables
Ingestion | Copy activity, Dataflows Gen2, pipelines, mirroring, shortcuts | Choose between low-code, orchestration, bulk copy, or no-copy access
Transformation | Spark notebooks, Spark SQL, Dataflows Gen2, Warehouse T-SQL | Match transformation complexity to the right engine
Serving | Lakehouse SQL analytics endpoint, Warehouse, semantic model, Direct Lake patterns | Decide how data should be queried or consumed
Security and governance | Workspace roles, item permissions, SQL permissions, labels, lineage, Git/deployment | Separate access control, collaboration, and deployment concerns
Operations | Monitoring hub, run history, Spark UI/logs, capacity metrics, query diagnostics | Troubleshoot failures, optimize performance, control cost/capacity pressure

High-yield Fabric item decisions

Requirement in the scenario | Usually points to | Why
Store raw, curated, and analytics-ready data in open formats | Lakehouse | Best fit for Delta tables, Spark processing, medallion architecture
Build a relational data warehouse with T-SQL transformations and SQL serving | Warehouse | Strong fit for SQL-centric data engineering and BI serving
Orchestrate multiple steps with dependencies, parameters, retries, and schedules | Data pipeline | Pipelines coordinate work; they are not usually the heavy transformation engine
Perform low-code shaping, cleansing, and Power Query transformations | Dataflow Gen2 | Good for analysts/data engineers who need repeatable low-code ETL
Perform complex code-based transforms, custom logic, ML-adjacent preparation, or Spark-scale processing | Notebook | Gives PySpark, Spark SQL, and code-level control
Access existing data without copying it | Shortcut | Logical access to data; useful when duplication is not required
Incrementally ingest changed records from a source | Pipeline plus watermark/CDC logic, often followed by MERGE | Avoid repeated full loads when only changes are needed
Replicate supported operational data into Fabric for analytics with minimal ETL | Mirroring | Useful when the source and latency requirements match the feature
Serve curated SQL tables to reporting users | Warehouse or Lakehouse SQL endpoint, depending on write/query needs | SQL endpoint is useful for querying lakehouse tables; Warehouse is better for SQL DML/warehouse design

Fast elimination rule

If the question says:

Wording | Think first
“Schedule,” “retry,” “dependency,” “parameterize activities” | Pipeline
“Low-code,” “Power Query,” “combine and clean data visually” | Dataflow Gen2
“PySpark,” “custom library,” “large-scale transformation” | Notebook
“T-SQL warehouse,” “stored procedure,” “SQL DML” | Warehouse
“No data duplication,” “use existing data in place” | Shortcut
“Upsert,” “changed rows,” “incremental load” | Watermark/CDC plus MERGE
“Files are visible but not queryable as tables” | Register/create Delta tables or place data correctly as tables
“Users can open workspace but should not see all rows” | Workspace role alone is not enough; use item/SQL/model-level security as appropriate

OneLake, lakehouses, and Delta tables

What to know

OneLake is the storage foundation for Fabric. Lakehouses organize data for data engineering workloads and expose data through both file/table structures and SQL query surfaces.

Concept | Review point | Trap
Files area | Good for raw or unstructured files | Files are not automatically the same as managed queryable tables
Tables area | Delta tables used for structured analytics | Table metadata and format matter; random files do not equal a governed table
Delta Lake | Transaction log, ACID-style table operations, schema handling, time travel concepts | Treating Delta as “just Parquet files” misses transaction and metadata behavior
Shortcuts | Logical references to data stored elsewhere | Shortcuts reduce copying but do not remove the need to understand permissions and source behavior
Medallion pattern | Bronze raw, silver cleaned, gold curated | It is an architecture pattern, not a substitute for clear security, quality, and lifecycle rules
Schema evolution | Controlled handling of changing columns/types | Blind schema drift can break downstream tables, reports, or queries
Small files | Too many tiny files hurt query performance | Compact/optimize instead of only adding more partitions
Partitioning | Helps when queries filter by partition columns | Over-partitioning high-cardinality columns can make performance worse
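
The partitioning trap can be made concrete with a small, plain-Python sketch. It does not call Spark or Fabric; it only counts how many partitions a column would create and how many rows each would hold on average. The function name `partition_stats` and the sample columns `order_id` and `order_date` are hypothetical, chosen for illustration.

```python
from collections import Counter


def partition_stats(rows, column):
    """Count distinct partition values and the average rows per partition."""
    counts = Counter(row[column] for row in rows)
    return {
        "partitions": len(counts),
        "avg_rows_per_partition": len(rows) / len(counts),
    }


# Hypothetical sample: 1,000 orders spread evenly over 10 dates.
rows = [
    {"order_id": i, "order_date": f"2024-01-{(i % 10) + 1:02d}"}
    for i in range(1000)
]

by_id = partition_stats(rows, "order_id")      # one partition per row
by_date = partition_stats(rows, "order_date")  # a handful of larger partitions
```

Partitioning by the unique `order_id` yields one partition per row, which is the many-tiny-files pattern the table warns about, while the date column gives a few partitions that queries filtering on date can prune.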

Bronze, silver, gold review

Layer | Purpose | Typical operations
Bronze | Preserve source-like data | Copy/load, append, basic metadata capture, source audit columns
Silver | Clean and standardize | Type conversion, deduplication, null handling, conforming names, CDC application
Gold | Serve business-ready analytics | Aggregation, dimensional modeling, star-schema-style tables, reporting-ready facts/dimensions

Common mistake: candidates choose a gold-layer serving pattern for raw ingestion requirements. Read whether the scenario asks for landing, cleansing, conforming, or serving.

Lakehouse versus Warehouse

Decision point | Lakehouse | Warehouse
Main strength | Open data engineering with Delta and Spark | SQL-centric relational data warehousing
Transformation style | Spark notebooks, Spark SQL, Dataflows Gen2, pipelines writing to lakehouse | T-SQL, SQL objects, stored procedures, warehouse modeling
Best for | Medallion architecture, open data lake patterns, mixed file/table workloads | Curated relational warehouse, SQL users, dimensional reporting
Query surface | SQL analytics endpoint for lakehouse tables | Warehouse SQL endpoint with stronger SQL DML orientation
Write expectation | Often write through Spark, pipelines, or dataflows | Write and transform with T-SQL patterns
Exam trap | Assuming the lakehouse SQL endpoint is the same as a full warehouse write engine | Using a warehouse when the requirement is open lake storage and Spark processing

A practical rule: if the requirement emphasizes Delta tables, notebooks, open files, and Spark, think Lakehouse. If it emphasizes T-SQL transformations, relational warehouse objects, and SQL-first serving, think Warehouse.

Ingestion patterns

Choose the right ingestion method

Scenario | Strong option | Why
Move data from a source into Fabric on a schedule | Pipeline with Copy activity | Built for orchestrated movement
Clean and reshape data with low-code transformations | Dataflow Gen2 | Power Query-style data preparation
Ingest only new or changed rows | Pipeline with parameters/watermarks, CDC if available, then merge/upsert | Reduces load volume and avoids full reloads
Access data already stored in a supported external lake | Shortcut | Avoids duplicate storage and repeated copy jobs
Need complex parsing, enrichment, or custom libraries | Notebook | Code control and Spark scale
Need multiple activities with failure handling | Pipeline | Dependencies, conditions, retries, parameters
Need SQL-based transformation after load | Warehouse SQL or Spark SQL depending on target | Keep transformations close to the serving/storage design

Incremental load essentials

For incremental ingestion, look for:

  1. A reliable change indicator, such as modified timestamp, increasing key, version, or CDC feed.
  2. A stored watermark from the last successful run.
  3. A cutoff value for the current run.
  4. A load step that brings only the eligible changes.
  5. An upsert/merge step into the target.
  6. Audit handling for failed runs so the watermark is not advanced incorrectly.

Common trap: updating the watermark before the target write succeeds. If the run fails after extraction but before merge, advancing the watermark can skip data.
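
The six steps and the watermark trap above can be sketched in plain Python. This is a minimal simulation under stated assumptions, not a Fabric pipeline: the source and target are in-memory structures, and the names `run_incremental_load`, `state`, and `fail_before_merge` are hypothetical. The point it illustrates is ordering: the watermark moves only after the target write has finished.

```python
from datetime import datetime


def run_incremental_load(source_rows, target, state, cutoff, fail_before_merge=False):
    """Run one incremental load and return the number of rows applied.

    Rows are eligible when watermark < modified <= cutoff. The watermark is
    advanced only after the target write has finished.
    """
    watermark = state["watermark"]
    changes = [r for r in source_rows if watermark < r["modified"] <= cutoff]

    if fail_before_merge:
        # Simulates a crash after extraction but before the merge step.
        raise RuntimeError("simulated failure before merge")

    for row in changes:
        target[row["id"]] = row  # upsert by key, so a rerun is idempotent

    state["watermark"] = cutoff  # advance last, only on success
    return len(changes)


source = [
    {"id": 1, "modified": datetime(2024, 1, 1), "value": "a"},
    {"id": 2, "modified": datetime(2024, 1, 2), "value": "b"},
]
state = {"watermark": datetime(2023, 12, 31)}
target = {}
cutoff = datetime(2024, 1, 2)

try:
    run_incremental_load(source, target, state, cutoff, fail_before_merge=True)
except RuntimeError:
    pass  # the failed run leaves the watermark untouched

applied = run_incremental_load(source, target, state, cutoff)  # retry picks up both rows
```

Because the failed run never reached the watermark update, the retry still sees both changed rows. Had the watermark been advanced before the merge, the retry would have found nothing to load.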

Full load versus incremental load

Use full load when | Use incremental load when
Dataset is small | Dataset is large
Source lacks reliable change tracking | Source provides modified date, CDC, or versioning
Reload is simple and cheap | Reload would exceed time, capacity, or cost expectations
Target can be safely overwritten | Target must preserve history or avoid disruption
Data freshness requirements are loose | Frequent refresh is required

Transformation review

Transformation tool selection

Need | Better fit | Watch for
Simple column selection, filtering, type changes | Dataflow Gen2 | Query folding and source limitations
Reusable low-code data preparation | Dataflow Gen2 | Destination settings and refresh behavior
Complex business rules at scale | Notebook | Spark performance, partitioning, shuffle, code quality
SQL warehouse transformations | Warehouse T-SQL | Do not apply SQL Server tuning assumptions blindly
Delta upsert into lakehouse table | Spark SQL/PySpark MERGE pattern | Correct keys and deduplication before merge
Orchestrate several transformations | Pipeline | Pipeline coordinates; heavy work should run in the right engine
Data quality checks | Notebook, SQL, or dataflow depending on design | Fail fast and log rejected records when required

Common transformation mistakes

Mistake | Better approach
Doing every transformation in a pipeline | Use pipelines for orchestration and call notebooks, dataflows, or SQL as needed
Using full overwrite when only a few rows changed | Use incremental load and merge/upsert
Partitioning by a unique ID | Partition by columns commonly used for pruning, often date or region-like columns
Ignoring duplicate keys before MERGE | Deduplicate and define deterministic conflict rules
Letting schema drift silently break downstream models | Validate schema and handle expected changes explicitly
Optimizing only compute but ignoring file layout | Optimize table layout, file sizes, and filters
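
The "deduplicate before MERGE" advice can be illustrated with a small plain-Python sketch. It is not Spark or Delta code; the helpers `deduplicate_latest` and `merge_upsert` are hypothetical and work on in-memory rows. The conflict rule assumed here is "keep the row with the greatest `modified` value per key, and on a tie keep the later row in input order"; a real pipeline would pick whatever deterministic rule the business requires.

```python
def deduplicate_latest(rows, key="id", order_by="modified"):
    """Keep one row per key: the row with the greatest order_by value.

    Ties are resolved by keeping the later row in input order (assumed rule).
    """
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] >= current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


def merge_upsert(target, changes, key="id"):
    """Update rows whose key already exists and insert the rest."""
    for row in changes:
        target[row[key]] = row
    return target


staged = [
    {"id": 1, "modified": 1, "value": "stale"},
    {"id": 1, "modified": 3, "value": "current"},
    {"id": 2, "modified": 2, "value": "new"},
]
target = {1: {"id": 1, "modified": 0, "value": "original"}}

clean = deduplicate_latest(staged)
merge_upsert(target, clean)
```

Without the deduplication step, the result for key 1 would depend on the order in which the duplicate source rows happened to arrive, which is the nondeterminism the table warns about.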

SQL, Spark, and Delta quick reminders

Spark/notebook patterns

Know when notebooks are appropriate:

  • Custom PySpark transformations.
  • Large-scale joins and aggregations.
  • Data cleansing that requires code.
  • Delta table maintenance.
  • Reusable engineering notebooks triggered by a pipeline.
  • Exploratory validation before productionizing a pipeline.

Performance traps:

Symptom | Likely cause | Review response
Slow join | Large shuffle, skewed key, unnecessary columns | Filter early, select only needed columns, consider join strategy
Slow reads | Poor partitioning, many small files, no predicate pruning | Optimize layout and query filters
Slow writes | Too many output files or poor partition choice | Control repartitioning and table maintenance
Repeated expensive computation | Recomputing same intermediate data | Cache only when reused and beneficial
Job fails after schema change | Schema mismatch | Add explicit schema management and validation
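
For the schema-mismatch row, explicit validation can be as simple as comparing an expected column-to-type mapping with what actually arrived before any write happens. The sketch below is plain Python with hypothetical names (`validate_schema`, the sample columns, and the type strings); in a notebook the two mappings would typically come from the target table and the incoming DataFrame.

```python
def validate_schema(expected, actual):
    """Compare two {column: type} mappings and report every difference."""
    missing = sorted(set(expected) - set(actual))
    unexpected = sorted(set(actual) - set(expected))
    type_changes = {
        column: (expected[column], actual[column])
        for column in expected
        if column in actual and expected[column] != actual[column]
    }
    return {"missing": missing, "unexpected": unexpected, "type_changes": type_changes}


expected = {"id": "bigint", "amount": "decimal(18,2)", "order_date": "date"}
actual = {"id": "bigint", "amount": "string", "channel": "string"}

report = validate_schema(expected, actual)
is_compatible = not any(report.values())  # False here: three kinds of drift
```

A job can fail fast on a non-empty report, or route only the expected changes (for example, a new nullable column) into a planned schema evolution step.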

SQL patterns

For Warehouse-oriented questions, expect SQL design and operations:

Pattern | Use when
CTAS-style creation | Building transformed tables from query results
Views | Abstracting query logic or serving controlled projections
Stored procedures | Encapsulating repeatable SQL transformations
MERGE/upsert | Applying changes from staging to target
Staging tables | Loading and validating before applying to curated tables
Star schema | Serving facts and dimensions for analytics

Common trap: assuming every SQL Server feature or index-tuning habit maps directly to Fabric Warehouse. Focus on Fabric-appropriate table design, query shape, data volume reduction, and monitoring.

Security and governance review

Access control layers

Layer | What it controls | Candidate trap
Workspace roles | Collaboration and broad access within a workspace | Workspace access is not the same as row-level data security
Item permissions | Access to specific Fabric items | Sharing an item may not grant every downstream data permission
SQL permissions | Database/warehouse object access | SQL permissions can differ from workspace collaboration roles
Semantic model security | RLS/OLS-style report consumption controls | Model security does not automatically secure raw lake files
Source permissions | Access to shortcut or external data source | A shortcut does not magically bypass source governance
Credentials/connections | How Fabric authenticates to sources | Do not embed secrets in notebooks or hard-code credentials

Governance concepts to review

Concept | Why it matters
Lineage | Understand upstream/downstream impact before changing tables, pipelines, or models
Sensitivity labels | Communicate and enforce data classification expectations
Endorsement/certification of assets | Helps users identify trusted assets
Git integration | Version control for supported Fabric items
Deployment pipelines | Promote content across dev/test/prod-style stages
Parameters and environment-specific settings | Avoid hard-coding workspace IDs, connection details, or paths
Least privilege | Grant only the access required for the user, service, or process

Security decision traps

  • Do not solve row-level restrictions by only assigning a Viewer workspace role.
  • Do not use broad workspace Admin access for routine pipeline execution.
  • Do not assume a user who can see a report should also access the raw lakehouse.
  • Do not hard-code credentials in notebooks or scripts.
  • Do not forget downstream access when sharing a report, SQL endpoint, or semantic model.

Deployment and lifecycle

DP-700 scenarios may test whether you can move a Fabric solution safely from development to production.

Requirement | Review response
Track changes to notebooks, pipelines, or other supported items | Use Git integration where supported
Promote content between environments | Use deployment pipelines
Use different connections in dev/test/prod | Parameterize and remap settings during deployment
Avoid breaking production | Test in lower environment and validate dependencies
Understand impact of table changes | Use lineage and dependency review
Repeat infrastructure/configuration consistently | Use documented deployment patterns and avoid manual-only changes

Common mistake: treating deployment as only copying an item. Real deployment also includes connections, permissions, parameters, schedules, and downstream dependencies.
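
One way to picture "parameterize instead of hard-code" is a single lookup that resolves environment-specific settings from a stage name. The sketch below is plain Python and every value in it is hypothetical (workspace names, connection names, table names); it is not a Fabric API, only an illustration of keeping per-stage settings out of the transformation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSettings:
    workspace_name: str
    source_connection: str
    lakehouse_table: str


# Hypothetical values for illustration only.
SETTINGS = {
    "dev": EnvSettings("sales-dev", "conn-sales-dev", "bronze_orders"),
    "test": EnvSettings("sales-test", "conn-sales-test", "bronze_orders"),
    "prod": EnvSettings("sales-prod", "conn-sales-prod", "bronze_orders"),
}


def resolve_settings(environment):
    """Return the settings for one stage, failing loudly on an unknown stage."""
    try:
        return SETTINGS[environment]
    except KeyError:
        raise ValueError(f"Unknown environment: {environment!r}") from None


prod = resolve_settings("prod")
```

The same notebook or pipeline logic can then run in each stage with only the stage name changing, which makes a leaked dev connection in production easier to spot.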

Monitoring and troubleshooting

Where to look first

Problem | First checks
Pipeline failed | Run history, failed activity output, linked connection, parameters, source schema, permissions
Copy activity slow | Source throughput, network/gateway constraints, partitioning, file count, parallelism settings
Dataflow refresh failed | Step error, credentials, schema changes, query folding, destination configuration
Notebook failed | Spark logs, cell output, package/library issues, permissions, table path, schema conflict
Warehouse query slow | Query shape, filters, joins, data volume, table design, monitoring/query diagnostics
Capacity throttling or delays | Capacity metrics, concurrency, background jobs, refresh schedules
Users cannot access data | Workspace role, item permission, SQL permission, source/shortcut permission, semantic model permissions
Report/semantic model stale | Upstream pipeline status, refresh history, Direct Lake/semantic model configuration, table update timing

Optimization levers

Goal | Practical levers
Read less data | Select only required columns, filter early, use partition pruning
Move less data | Use incremental loads, shortcuts, and staging only when needed
Write better data | Use Delta tables, appropriate file sizes, compaction/optimization patterns
Reduce Spark cost | Avoid unnecessary shuffles, handle skew, cache selectively
Improve SQL serving | Model for common queries, avoid SELECT *, reduce joins where practical
Reduce failures | Add validation, retries where appropriate, idempotent loads, clear audit logs
Control capacity pressure | Stagger schedules, manage concurrency, monitor capacity usage
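
The "retries plus idempotent loads" lever is easiest to see when a write succeeds but the step still fails afterwards. The sketch below is plain Python with hypothetical helpers (`run_with_retries`, `load_batch`, `flaky_load`); pipelines offer their own retry settings, so this only illustrates why the load itself should be safe to rerun.

```python
import time


def run_with_retries(step, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call step() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:  # broad on purpose for the sketch
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


def load_batch(target, batch_id, rows):
    """Idempotent write: replace the batch by its id, so reruns never duplicate rows."""
    target[batch_id] = list(rows)
    return len(target[batch_id])


attempts = {"count": 0}
target = {}


def flaky_load():
    attempts["count"] += 1
    written = load_batch(target, "2024-01-02", [{"id": 1}, {"id": 2}])
    if attempts["count"] == 1:
        raise ConnectionError("simulated transient failure after write")
    return written


# The demo passes a no-op sleep so it does not actually wait.
written = run_with_retries(flaky_load, sleep=lambda seconds: None)
```

The first attempt writes the batch and then fails; the retry writes the same batch again. Because the write replaces by batch id instead of appending, the target ends with one copy of the two rows.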

Delta table maintenance concepts

Concept | Purpose | Trap
Optimize/compaction | Reduce small-file overhead | Not a substitute for good ingestion design
V-Order-style optimization | Improve read performance for analytics workloads | Helps reads but does not fix incorrect logic
Vacuum | Remove old files no longer needed by retention rules | Can affect time travel/history expectations
Schema enforcement | Prevent unexpected incompatible writes | May require planned schema evolution
Time travel/history | Useful for audit and recovery scenarios | Retention and cleanup policies matter
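
The interaction between compaction, vacuum, and time travel can be pictured with a toy model. The class below is a deliberately simplified sketch in plain Python, not how Delta Lake is implemented: the name `TinyDeltaLikeTable`, its methods, and the integer "clock" are all assumptions made for illustration. Each version lists the files it needs; vacuum physically deletes files that left the current version longer ago than the retention period.

```python
class TinyDeltaLikeTable:
    """Toy model: versions reference files; vacuum deletes old unreferenced files."""

    def __init__(self):
        self.versions = []    # one frozenset of file names per committed version
        self.physical = set() # files that still exist on storage
        self.removed_at = {}  # file -> time it stopped being part of the current version

    def commit(self, files, now):
        files = frozenset(files)
        if self.versions:
            for name in self.versions[-1] - files:
                self.removed_at[name] = now
        for name in files:
            self.removed_at.pop(name, None)  # a current file is never stale
        self.physical |= files
        self.versions.append(files)

    def vacuum(self, now, retention):
        stale = {
            name
            for name, removed in self.removed_at.items()
            if name in self.physical and now - removed >= retention
        }
        self.physical -= stale
        return stale

    def can_read(self, version):
        return self.versions[version] <= self.physical


table = TinyDeltaLikeTable()
table.commit({"part-a", "part-b", "part-c"}, now=0)  # version 0: three small files
table.commit({"part-compacted"}, now=1)              # version 1: after compaction

readable_before = table.can_read(0)
deleted = table.vacuum(now=10, retention=7)
readable_after = table.can_read(0)
```

In this model, version 0 is readable until vacuum removes the three small files that compaction replaced; the current version stays readable throughout. That is the sense in which cleanup policies shape what history remains available.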

Common DP-700 scenario traps

Trap | Why it is wrong | Better thinking
“Use a notebook for everything” | Not every task needs custom code | Use pipelines for orchestration, dataflows for low-code, warehouse for SQL
“Use a pipeline for all transformations” | Pipelines coordinate work; heavy transforms belong elsewhere | Pipeline calls the right engine
“Copy data even when a shortcut would work” | Duplicates storage and introduces sync complexity | Use shortcuts when no-copy access meets requirements
“Use full refresh for a large changing source” | Wastes time and capacity | Use incremental ingestion and merge
“Grant workspace Admin to fix access” | Over-permissive and risky | Diagnose the correct permission layer
“Partition by high-cardinality column” | Creates too many partitions and small files | Partition by useful pruning columns
“Ignore failed-run watermark behavior” | Can skip records | Advance watermark only after successful target update
“Assume SQL endpoint equals Warehouse” | Lakehouse and Warehouse have different write/serving patterns | Match engine to requirement
“Optimize compute before data layout” | Bad layout can dominate performance | Fix file sizes, filters, partitions, and table design
“Promote items without remapping connections” | Dev settings can leak into prod | Parameterize and validate deployment settings

Quick decision checklist for exam questions

Before selecting an answer, identify:

  1. Target storage: lakehouse, warehouse, external source through shortcut, or semantic model.
  2. Transformation style: low-code, Spark/code, SQL, or orchestration-only.
  3. Load pattern: full, incremental, CDC, streaming/near-real-time, or no-copy.
  4. Security boundary: workspace, item, SQL object, semantic model, or source system.
  5. Operational requirement: schedule, retry, monitoring, deployment, lineage, or capacity optimization.
  6. Performance issue: compute, query shape, file layout, partitioning, source throughput, or concurrency.
  7. Failure behavior: idempotency, watermark handling, duplicate handling, and auditability.

If two answers seem plausible, prefer the one that satisfies the requirement with the least unnecessary complexity.

Mini review prompts

Use these as a quick readiness check before starting a DP-700 question bank.

Prompt | Best answer direction
You need to orchestrate a copy, run a notebook, and then execute a SQL step on a schedule | Pipeline
You need low-code cleansing using Power Query-style steps | Dataflow Gen2
You need no-copy access to data already stored in a supported location | Shortcut
You need complex PySpark transformations and Delta table maintenance | Notebook
You need a SQL-first curated dimensional store | Warehouse
You need to apply only changed records from a source table | Incremental load with watermark/CDC and merge/upsert
A lakehouse has raw files but SQL users cannot query them as tables | Create/register proper tables or write to the Tables area in the correct format
A user can access a workspace but should only see certain rows | Apply the appropriate data/model-level security, not only a workspace role
A pipeline skipped records after a failed load | Review watermark update timing and idempotency
A Spark job is slow after thousands of tiny files were created | Compact/optimize table layout and review write pattern

How to use IT Mastery practice effectively

After this Quick Review, move into original practice questions in focused sets rather than immediately taking a full mock exam.

Suggested topic drill order

Drill | Goal
Fabric item selection | Build fast recognition of lakehouse, warehouse, pipeline, dataflow, notebook, shortcut
OneLake and Delta | Review tables/files, shortcuts, schema, optimization, medallion patterns
Ingestion | Practice full versus incremental loads, CDC/watermarks, copy activity, source constraints
Transformation | Compare Spark, SQL, and Dataflows Gen2 decisions
Security and governance | Separate workspace, item, SQL, model, and source permissions
Monitoring and troubleshooting | Diagnose failed runs, slow jobs, capacity issues, and stale data
End-to-end scenarios | Combine design, implementation, security, and operations in one case

Review method

For each missed question:

  1. Write down the requirement you overlooked.
  2. Identify the Fabric item or feature the question was really testing.
  3. Note the wrong answer pattern that tempted you.
  4. Re-answer a similar topic drill before moving on.
  5. Read the detailed explanations, including why the distractors are wrong.

The goal is not memorizing answer letters. The goal is building a repeatable decision process for DP-700 scenarios.

Final readiness checklist

You are ready for heavier mock exam practice when you can quickly explain:

  • When to use a Lakehouse instead of a Warehouse.
  • When a pipeline should orchestrate rather than transform.
  • When Dataflow Gen2 is preferable to a notebook.
  • How shortcuts differ from copying data.
  • How to design incremental loads without skipping records.
  • How Delta tables, partitioning, compaction, and schema handling affect performance and reliability.
  • How workspace roles differ from item, SQL, source, and model-level permissions.
  • How Git integration and deployment pipelines support lifecycle management.
  • Where to look when a pipeline, dataflow, notebook, SQL query, or capacity is failing.
  • How to choose the simplest Fabric pattern that satisfies the scenario.

Practical next step

Use this Quick Review as your final scan, then start DP-700 topic drills in an IT Mastery question bank. Focus first on item-selection and ingestion scenarios, then move into mixed mock exams with detailed explanations so you can practice the same decision process under exam-like timing.

Continue in IT Mastery

Use this Quick Review as a final concept map, then move into IT Mastery for focused topic drills, mixed practice sets, timed mock exams, and detailed explanations. The practice questions are original IT Mastery practice items; they are not official Microsoft questions, copied live-exam content, or exam dumps.
