DP-700 — Microsoft Fabric Data Engineer Associate Quick Review
Quick Review for Microsoft DP-700 Fabric Data Engineer Associate candidates: key concepts, traps, decisions, and practice focus.
Quick Review purpose
This Quick Review is for candidates preparing for the Microsoft Fabric Data Engineer Associate exam (DP-700). Use it to refresh high-yield concepts before moving into topic drills, mock exams, and detailed explanations.
The DP-700 exam is not just a syntax test. Expect scenario-style decisions about how to design, ingest, transform, secure, monitor, and optimize data engineering solutions in Microsoft Fabric. The strongest candidates can explain why a Fabric item or pattern is the right fit, not only what it is called.
This page supports IT Mastery practice with original practice questions. It is not affiliated with Microsoft.
DP-700 mental model
Think in layers. Most exam scenarios can be solved by identifying the layer being tested.
| Layer | What to recognize quickly | Common exam angle |
|---|---|---|
| Workspace and capacity | Workspaces contain Fabric items; capacity affects performance and throttling | Choose workspace roles, deployment approach, monitoring location |
| OneLake storage | Unified storage layer; lakehouse tables are stored in the open Delta format (Parquet data files plus a transaction log) | Avoid unnecessary copies, use shortcuts, manage files and tables |
| Ingestion | Copy activity, Dataflows Gen2, pipelines, mirroring, shortcuts | Choose between low-code, orchestration, bulk copy, or no-copy access |
| Transformation | Spark notebooks, Spark SQL, Dataflows Gen2, Warehouse T-SQL | Match transformation complexity to the right engine |
| Serving | Lakehouse SQL analytics endpoint, Warehouse, semantic model, Direct Lake patterns | Decide how data should be queried or consumed |
| Security and governance | Workspace roles, item permissions, SQL permissions, labels, lineage, Git/deployment | Separate access control, collaboration, and deployment concerns |
| Operations | Monitoring hub, run history, Spark UI/logs, capacity metrics, query diagnostics | Troubleshoot failures, optimize performance, control cost/capacity pressure |
High-yield Fabric item decisions
| Requirement in the scenario | Usually points to | Why |
|---|---|---|
| Store raw, curated, and analytics-ready data in open formats | Lakehouse | Best fit for Delta tables, Spark processing, medallion architecture |
| Build a relational data warehouse with T-SQL transformations and SQL serving | Warehouse | Strong fit for SQL-centric data engineering and BI serving |
| Orchestrate multiple steps with dependencies, parameters, retries, and schedules | Data pipeline | Pipelines coordinate work; they are not usually the heavy transformation engine |
| Perform low-code shaping, cleansing, and Power Query transformations | Dataflow Gen2 | Good for analysts/data engineers who need repeatable low-code ETL |
| Perform complex code-based transforms, custom logic, ML-adjacent preparation, or Spark-scale processing | Notebook | Gives PySpark, Spark SQL, and code-level control |
| Access existing data without copying it | Shortcut | Logical access to data; useful when duplication is not required |
| Incrementally ingest changed records from a source | Pipeline plus watermark/CDC logic, often followed by MERGE | Avoid repeated full loads when only changes are needed |
| Replicate supported operational data into Fabric for analytics with minimal ETL | Mirroring | Useful when the source and latency requirements match the feature |
| Serve curated SQL tables to reporting users | Warehouse or Lakehouse SQL endpoint, depending on write/query needs | SQL endpoint is useful for querying lakehouse tables; Warehouse is better for SQL DML/warehouse design |
Fast elimination rule
If the question says:
| Wording | Think first |
|---|---|
| “Schedule,” “retry,” “dependency,” “parameterize activities” | Pipeline |
| “Low-code,” “Power Query,” “combine and clean data visually” | Dataflow Gen2 |
| “PySpark,” “custom library,” “large-scale transformation” | Notebook |
| “T-SQL warehouse,” “stored procedure,” “SQL DML” | Warehouse |
| “No data duplication,” “use existing data in place” | Shortcut |
| “Upsert,” “changed rows,” “incremental load” | Watermark/CDC plus MERGE |
| “Files are visible but not queryable as tables” | Load the files into Delta tables in the Tables area so they can be queried as tables |
| “Users can open workspace but should not see all rows” | Workspace role alone is not enough; use item/SQL/model-level security as appropriate |
OneLake, lakehouses, and Delta tables
What to know
OneLake is the storage foundation for Fabric. A lakehouse organizes data for data engineering workloads into a Files area and a Tables area, and exposes its tables through a SQL analytics endpoint.
| Concept | Review point | Trap |
|---|---|---|
| Files area | Good for raw or unstructured files | Files are not automatically the same as managed queryable tables |
| Tables area | Delta tables used for structured analytics | Table metadata and format matter; random files do not equal a governed table |
| Delta Lake | Transaction log, ACID-style table operations, schema handling, time travel concepts | Treating Delta as “just Parquet files” misses transaction and metadata behavior |
| Shortcuts | Logical references to data stored elsewhere | Shortcuts reduce copying but do not remove the need to understand permissions and source behavior |
| Medallion pattern | Bronze raw, silver cleaned, gold curated | It is an architecture pattern, not a substitute for clear security, quality, and lifecycle rules |
| Schema evolution | Controlled handling of changing columns/types | Blind schema drift can break downstream tables, reports, or queries |
| Small files | Too many tiny files hurt query performance | Compact/optimize instead of only adding more partitions |
| Partitioning | Helps when queries filter by partition columns | Over-partitioning high-cardinality columns can make performance worse |
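
The partitioning and small-file traps can be made concrete with simple arithmetic. The sketch below is plain Python rather than Spark: it groups hypothetical rows by a candidate partition column and reports how many partitions result and how many rows each would hold on average. The column names (`order_id`, `order_date`) and the sample data are assumptions made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def partition_profile(rows, column):
    """Return (partition_count, average_rows_per_partition) for a candidate column."""
    counts = Counter(row[column] for row in rows)
    return len(counts), len(rows) / len(counts)


# Hypothetical sample: 10,000 orders spread evenly over 10 days, each with a unique ID.
start = date(2024, 1, 1)
rows = [
    {"order_id": i, "order_date": start + timedelta(days=i % 10)}
    for i in range(10_000)
]

by_date = partition_profile(rows, "order_date")
by_id = partition_profile(rows, "order_id")

print("order_date:", by_date)  # few partitions, many rows in each
print("order_id:", by_id)      # one partition per row: the small-file trap
```

A low-cardinality column that queries actually filter on gives a handful of well-filled partitions; a unique key gives one tiny partition per row.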
Bronze, silver, gold review
| Layer | Purpose | Typical operations |
|---|---|---|
| Bronze | Preserve source-like data | Copy/load, append, basic metadata capture, source audit columns |
| Silver | Clean and standardize | Type conversion, deduplication, null handling, conforming names, CDC application |
| Gold | Serve business-ready analytics | Aggregation, dimensional modeling, star-schema-style tables, reporting-ready facts/dimensions |
Common mistake: candidates choose a gold-layer serving pattern for raw ingestion requirements. Read whether the scenario asks for landing, cleansing, conforming, or serving.
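
As a rough illustration of what each layer does, the following sketch walks a few made-up records from bronze to gold in plain Python. In Fabric this work would typically run in a notebook, dataflow, or SQL; the record shape and the function names `to_silver` and `to_gold` are hypothetical.

```python
from collections import defaultdict

# Bronze: source-like records kept as received (strings, duplicates, gaps).
bronze = [
    {"id": "1", "region": " east ", "amount": "10.50"},
    {"id": "1", "region": " east ", "amount": "10.50"},  # duplicate delivery
    {"id": "2", "region": "West", "amount": None},       # missing amount
    {"id": "3", "region": "west", "amount": "4.25"},
]


def to_silver(records):
    """Clean and standardize: cast types, conform values, drop nulls, deduplicate."""
    seen, silver = set(), []
    for rec in records:
        if rec["amount"] is None:
            continue  # a real design would route this row to a reject log
        key = int(rec["id"])
        if key in seen:
            continue  # keep the first occurrence of each id
        seen.add(key)
        silver.append({
            "id": key,
            "region": rec["region"].strip().lower(),
            "amount": float(rec["amount"]),
        })
    return silver


def to_gold(silver_rows):
    """Serve business-ready output: total amount per region."""
    totals = defaultdict(float)
    for rec in silver_rows:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Note that bronze is never modified: the raw records remain available if the cleansing rules change later.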
Lakehouse versus Warehouse
| Decision point | Lakehouse | Warehouse |
|---|---|---|
| Main strength | Open data engineering with Delta and Spark | SQL-centric relational data warehousing |
| Transformation style | Spark notebooks, Spark SQL, Dataflows Gen2, pipelines writing to lakehouse | T-SQL, SQL objects, stored procedures, warehouse modeling |
| Best for | Medallion architecture, open data lake patterns, mixed file/table workloads | Curated relational warehouse, SQL users, dimensional reporting |
| Query surface | SQL analytics endpoint: read-only T-SQL queries over lakehouse Delta tables | Warehouse T-SQL endpoint that supports both queries and DML writes |
| Write expectation | Often write through Spark, pipelines, or dataflows | Write and transform with T-SQL patterns |
| Exam trap | Assuming the lakehouse SQL endpoint is the same as a full warehouse write engine | Using a warehouse when the requirement is open lake storage and Spark processing |
A practical rule: if the requirement emphasizes Delta tables, notebooks, open files, and Spark, think Lakehouse. If it emphasizes T-SQL transformations, relational warehouse objects, and SQL-first serving, think Warehouse.
Ingestion patterns
Choose the right ingestion method
| Scenario | Strong option | Why |
|---|---|---|
| Move data from a source into Fabric on a schedule | Pipeline with Copy activity | Built for orchestrated movement |
| Clean and reshape data with low-code transformations | Dataflow Gen2 | Power Query-style data preparation |
| Ingest only new or changed rows | Pipeline with parameters/watermarks, CDC if available, then merge/upsert | Reduces load volume and avoids full reloads |
| Access data already stored in a supported external lake | Shortcut | Avoids duplicate storage and repeated copy jobs |
| Need complex parsing, enrichment, or custom libraries | Notebook | Code control and Spark scale |
| Need multiple activities with failure handling | Pipeline | Dependencies, conditions, retries, parameters |
| Need SQL-based transformation after load | Warehouse SQL or Spark SQL depending on target | Keep transformations close to the serving/storage design |
Incremental load essentials
For incremental ingestion, look for:
- A reliable change indicator, such as modified timestamp, increasing key, version, or CDC feed.
- A stored watermark from the last successful run.
- A cutoff value for the current run.
- A load step that brings only the eligible changes.
- An upsert/merge step into the target.
- Audit handling for failed runs so the watermark is not advanced incorrectly.
Common trap: updating the watermark before the target write succeeds. If the run fails after extraction but before merge, advancing the watermark can skip data.
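
The checklist and the trap above can be sketched in a few lines of plain Python. This is a minimal model under stated assumptions, not a Fabric pipeline: `target` is a dict standing in for a table, `state` holds the stored watermark, and `fail_on_merge` is a hypothetical switch that simulates a failure after extraction but before the target write.

```python
from datetime import datetime


def run_incremental_load(source_rows, target, state, cutoff, fail_on_merge=False):
    """Load rows modified in (watermark, cutoff] and upsert them into target."""
    last = state["watermark"]
    # Extract only the eligible changes for this run.
    changes = [r for r in source_rows if last < r["modified"] <= cutoff]

    if fail_on_merge:
        # Simulated failure after extraction but before the merge.
        raise RuntimeError("simulated failure before the target write")

    for row in changes:
        target[row["id"]] = row  # upsert keyed by id

    # Advance the watermark only after the target write has succeeded.
    state["watermark"] = cutoff
    return len(changes)


source = [
    {"id": 1, "value": "a", "modified": datetime(2024, 1, 1)},
    {"id": 2, "value": "b", "modified": datetime(2024, 1, 3)},
]
target = {}
state = {"watermark": datetime(2023, 12, 31)}

try:
    run_incremental_load(source, target, state, datetime(2024, 1, 2), fail_on_merge=True)
except RuntimeError:
    pass
watermark_after_failure = state["watermark"]  # unchanged, so nothing is skipped

retry_count = run_incremental_load(source, target, state, datetime(2024, 1, 2))
next_count = run_incremental_load(source, target, state, datetime(2024, 1, 4))
print(watermark_after_failure, retry_count, next_count, sorted(target))
```

Because the upsert is keyed by `id`, rerunning the same window after a failure is safe: the retry reloads the same rows instead of duplicating them.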
Full load versus incremental load
| Use full load when | Use incremental load when |
|---|---|
| Dataset is small | Dataset is large |
| Source lacks reliable change tracking | Source provides modified date, CDC, or versioning |
| Reload is simple and cheap | Reload would exceed time, capacity, or cost expectations |
| Target can be safely overwritten | Target must preserve history or avoid disruption |
| Data freshness requirements are loose | Frequent refresh is required |
Transformation review
Transformation tool selection
| Need | Better fit | Watch for |
|---|---|---|
| Simple column selection, filtering, type changes | Dataflow Gen2 | Query folding and source limitations |
| Reusable low-code data preparation | Dataflow Gen2 | Destination settings and refresh behavior |
| Complex business rules at scale | Notebook | Spark performance, partitioning, shuffle, code quality |
| SQL warehouse transformations | Warehouse T-SQL | Do not apply SQL Server tuning assumptions blindly |
| Delta upsert into lakehouse table | Spark SQL/PySpark MERGE pattern | Correct keys and deduplication before merge |
| Orchestrate several transformations | Pipeline | Pipeline coordinates; heavy work should run in the right engine |
| Data quality checks | Notebook, SQL, or dataflow depending on design | Fail fast and log rejected records when required |
Common transformation mistakes
| Mistake | Better approach |
|---|---|
| Doing every transformation in a pipeline | Use pipelines for orchestration and call notebooks, dataflows, or SQL as needed |
| Using full overwrite when only a few rows changed | Use incremental load and merge/upsert |
| Partitioning by a unique ID | Partition by columns commonly used for pruning, often date or region-like columns |
| Ignoring duplicate keys before MERGE | Deduplicate and define deterministic conflict rules |
| Letting schema drift silently break downstream models | Validate schema and handle expected changes explicitly |
| Optimizing only compute but ignoring file layout | Optimize table layout, file sizes, and filters |
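
To illustrate the duplicate-key row above: a merge whose source holds more than one row per key is ambiguous, so the source is usually reduced to one row per key first. The sketch below shows one deterministic rule (highest version wins) in plain Python. The `customer_id` and `version` columns and both function names are hypothetical, and the dict-based `merge` only mimics upsert semantics.

```python
def deduplicate_latest(rows, key, order_by):
    """Keep one row per key: the row with the greatest order_by value.

    On a tie the first row seen is kept, so the result is deterministic
    for a given input order.
    """
    best = {}
    for row in rows:
        current = best.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            best[row[key]] = row
    return list(best.values())


def merge(target, changes, key):
    """Upsert: overwrite rows with matching keys, insert the rest."""
    for row in changes:
        target[row[key]] = row
    return target


staged = [
    {"customer_id": 7, "email": "old@example.com", "version": 1},
    {"customer_id": 7, "email": "new@example.com", "version": 2},
    {"customer_id": 9, "email": "nine@example.com", "version": 1},
]
target = {7: {"customer_id": 7, "email": "stale@example.com", "version": 0}}

clean = deduplicate_latest(staged, key="customer_id", order_by="version")
merge(target, clean, key="customer_id")
print(target)
```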
SQL, Spark, and Delta quick reminders
Spark/notebook patterns
Know when notebooks are appropriate:
- Custom PySpark transformations.
- Large-scale joins and aggregations.
- Data cleansing that requires code.
- Delta table maintenance.
- Reusable engineering notebooks triggered by a pipeline.
- Exploratory validation before productionizing a pipeline.
Performance traps:
| Symptom | Likely cause | Review response |
|---|---|---|
| Slow join | Large shuffle, skewed key, unnecessary columns | Filter early, select only needed columns, consider join strategy |
| Slow reads | Poor partitioning, many small files, no predicate pruning | Optimize layout and query filters |
| Slow writes | Too many output files or poor partition choice | Control repartitioning and table maintenance |
| Repeated expensive computation | Recomputing same intermediate data | Cache only when reused and beneficial |
| Job fails after schema change | Schema mismatch | Add explicit schema management and validation |
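
For the schema-change row, explicit validation can be as simple as comparing incoming records against an expected schema before writing. The following is a minimal plain-Python sketch; `EXPECTED_SCHEMA` and its columns are assumptions for illustration, and a Spark job would more likely compare DataFrame schemas than inspect individual records.

```python
EXPECTED_SCHEMA = {"order_id": int, "order_date": str, "amount": float}


def validate_schema(record, expected=EXPECTED_SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, expected_type in expected.items():
        if name not in record:
            problems.append(f"missing column: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    for name in record:
        if name not in expected:
            problems.append(f"unexpected column: {name}")
    return problems


good = {"order_id": 1, "order_date": "2024-01-01", "amount": 9.99}
drifted = {"order_id": "1", "order_date": "2024-01-01", "discount": 0.1}

good_problems = validate_schema(good)
drifted_problems = validate_schema(drifted)
print(good_problems)
print(drifted_problems)
```

Failing fast with a readable list of differences is usually easier to troubleshoot than a write error deep inside a job.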
SQL patterns
For Warehouse-oriented questions, expect SQL design and operations:
| Pattern | Use when |
|---|---|
| CTAS-style creation | Building transformed tables from query results |
| Views | Abstracting query logic or serving controlled projections |
| Stored procedures | Encapsulating repeatable SQL transformations |
| MERGE/upsert | Applying changes from staging to target |
| Staging tables | Loading and validating before applying to curated tables |
| Star schema | Serving facts and dimensions for analytics |
Common trap: assuming every SQL Server feature or index-tuning habit maps directly to Fabric Warehouse. Focus on Fabric-appropriate table design, query shape, data volume reduction, and monitoring.
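
The staging and upsert patterns in the table can be tried locally. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in: the syntax is SQLite's `INSERT ... ON CONFLICT`, not Fabric Warehouse T-SQL, and the table names `stg_product` and `dim_product` are hypothetical. It only shows the shape of the flow: load to staging, validate, then apply valid rows to the curated table in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        price REAL NOT NULL
    );
    CREATE TABLE stg_product (product_id INTEGER, name TEXT, price REAL);
    INSERT INTO dim_product VALUES (1, 'Widget', 5.0);
    INSERT INTO stg_product VALUES (1, 'Widget', 6.0), (2, 'Gadget', 12.0), (3, NULL, 1.0);
""")

# Validate in staging before touching the curated table.
rejected = conn.execute(
    "SELECT COUNT(*) FROM stg_product WHERE name IS NULL OR price IS NULL"
).fetchone()[0]

# Apply only valid rows; the with-block commits on success and rolls back on error.
with conn:
    conn.execute("""
        INSERT INTO dim_product (product_id, name, price)
        SELECT product_id, name, price
        FROM stg_product
        WHERE name IS NOT NULL AND price IS NOT NULL
        ON CONFLICT(product_id) DO UPDATE
            SET name = excluded.name, price = excluded.price
    """)

rows = conn.execute(
    "SELECT product_id, name, price FROM dim_product ORDER BY product_id"
).fetchall()
print(rejected, rows)
```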
Security and governance review
Access control layers
| Layer | What it controls | Candidate trap |
|---|---|---|
| Workspace roles | Collaboration and broad access within a workspace | Workspace access is not the same as row-level data security |
| Item permissions | Access to specific Fabric items | Sharing an item may not grant every downstream data permission |
| SQL permissions | Database/warehouse object access | SQL permissions can differ from workspace collaboration roles |
| Semantic model security | RLS/OLS-style report consumption controls | Model security does not automatically secure raw lake files |
| Source permissions | Access to shortcut or external data source | A shortcut does not magically bypass source governance |
| Credentials/connections | How Fabric authenticates to sources | Do not embed secrets in notebooks or hard-code credentials |
Governance concepts to review
| Concept | Why it matters |
|---|---|
| Lineage | Understand upstream/downstream impact before changing tables, pipelines, or models |
| Sensitivity labels | Communicate and enforce data classification expectations |
| Endorsement/certification of assets | Helps users identify trusted assets |
| Git integration | Version control for supported Fabric items |
| Deployment pipelines | Promote content across dev/test/prod-style stages |
| Parameters and environment-specific settings | Avoid hard-coding workspace IDs, connection details, or paths |
| Least privilege | Grant only the access required for the user, service, or process |
Security decision traps
- Do not solve row-level restrictions by only assigning a Viewer workspace role.
- Do not use broad workspace Admin access for routine pipeline execution.
- Do not assume a user who can see a report should also access the raw lakehouse.
- Do not hard-code credentials in notebooks or scripts.
- Do not forget downstream access when sharing a report, SQL endpoint, or semantic model.
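
The hard-coded credential trap is easiest to see in code. The sketch below reads a secret from the process environment instead of embedding it in the script. The environment variable is only a stand-in: in Fabric, a managed connection or a secret store would be the more likely home for credentials. The variable name `SOURCE_DB_PASSWORD` and both helper functions are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been configured."""


def get_secret(name, env=os.environ):
    """Look up a secret by name; fail loudly instead of falling back to a default."""
    value = env.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not configured")
    return value


def build_connection_string(server, database, env=os.environ):
    """Assemble a connection string without any literal credential in the code."""
    password = get_secret("SOURCE_DB_PASSWORD", env)
    return f"Server={server};Database={database};Password={password}"


# Usage with an injected mapping, so the example runs without real secrets.
fake_env = {"SOURCE_DB_PASSWORD": "example-only"}
connection_string = build_connection_string("srv", "sales", env=fake_env)

try:
    build_connection_string("srv", "sales", env={})
    missing_raised = False
except MissingSecretError:
    missing_raised = True
```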
Deployment and lifecycle
DP-700 scenarios may test whether you can move a Fabric solution safely from development to production.
| Requirement | Review response |
|---|---|
| Track changes to notebooks, pipelines, or other supported items | Use Git integration where supported |
| Promote content between environments | Use deployment pipelines |
| Use different connections in dev/test/prod | Parameterize and remap settings during deployment |
| Avoid breaking production | Test in lower environment and validate dependencies |
| Understand impact of table changes | Use lineage and dependency review |
| Repeat infrastructure/configuration consistently | Use documented deployment patterns and avoid manual-only changes |
Common mistake: treating deployment as only copying an item. Real deployment also includes connections, permissions, parameters, schedules, and downstream dependencies.
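
One way to picture parameterized, stage-specific settings is a lookup keyed by stage, plus a check that no value from another stage has leaked in. This is a plain-Python sketch under stated assumptions: the stage names, setting keys, and values are all invented, and Fabric deployment pipelines have their own mechanisms for remapping settings.

```python
STAGE_SETTINGS = {
    "dev":  {"workspace": "sales-dev",  "connection": "sql-dev",  "landing_path": "Files/dev/landing"},
    "test": {"workspace": "sales-test", "connection": "sql-test", "landing_path": "Files/test/landing"},
    "prod": {"workspace": "sales-prod", "connection": "sql-prod", "landing_path": "Files/prod/landing"},
}
REQUIRED_KEYS = {"workspace", "connection", "landing_path"}


def resolve_settings(stage, settings=STAGE_SETTINGS):
    """Return the settings for a stage, failing if the stage or a key is missing."""
    if stage not in settings:
        raise KeyError(f"unknown stage: {stage}")
    missing = REQUIRED_KEYS - settings[stage].keys()
    if missing:
        raise ValueError(f"{stage} is missing settings: {sorted(missing)}")
    return settings[stage]


def find_leaks(stage, settings=STAGE_SETTINGS):
    """List (key, other_stage) pairs where this stage reuses another stage's value."""
    own = settings[stage]
    return [
        (key, other)
        for other, values in settings.items()
        if other != stage
        for key in own
        if values.get(key) == own[key]
    ]


# A deliberately broken configuration: prod still points at the dev connection.
broken = {**STAGE_SETTINGS, "prod": {**STAGE_SETTINGS["prod"], "connection": "sql-dev"}}

prod = resolve_settings("prod")
clean_leaks = find_leaks("prod")
broken_leaks = find_leaks("prod", broken)
print(prod["connection"], clean_leaks, broken_leaks)
```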
Monitoring and troubleshooting
Where to look first
| Problem | First checks |
|---|---|
| Pipeline failed | Run history, failed activity output, linked connection, parameters, source schema, permissions |
| Copy activity slow | Source throughput, network/gateway constraints, partitioning, file count, parallelism settings |
| Dataflow refresh failed | Step error, credentials, schema changes, query folding, destination configuration |
| Notebook failed | Spark logs, cell output, package/library issues, permissions, table path, schema conflict |
| Warehouse query slow | Query shape, filters, joins, data volume, table design, monitoring/query diagnostics |
| Capacity throttling or delays | Capacity metrics, concurrency, background jobs, refresh schedules |
| Users cannot access data | Workspace role, item permission, SQL permission, source/shortcut permission, semantic model permissions |
| Report/semantic model stale | Upstream pipeline status, refresh history, Direct Lake/semantic model configuration, table update timing |
Optimization levers
| Goal | Practical levers |
|---|---|
| Read less data | Select only required columns, filter early, use partition pruning |
| Move less data | Use incremental loads, shortcuts, and staging only when needed |
| Write better data | Use Delta tables, appropriate file sizes, compaction/optimization patterns |
| Reduce Spark cost | Avoid unnecessary shuffles, handle skew, cache selectively |
| Improve SQL serving | Model for common queries, avoid SELECT *, reduce joins where practical |
| Reduce failures | Add validation, retries where appropriate, idempotent loads, clear audit logs |
| Control capacity pressure | Stagger schedules, manage concurrency, monitor capacity usage |
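
The "Reduce failures" row pairs retries with idempotent loads, and the two only work together: a retry is safe when rerunning the step cannot duplicate data. The sketch below is a generic plain-Python illustration. `run_with_retries`, `load_batch`, and the batch key are hypothetical, and pipeline activities expose retry settings of their own.

```python
import time


def run_with_retries(task, attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call task() up to `attempts` times with exponential backoff between failures."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as error:
            last_error = error
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error


loaded_batches = {}
calls = {"count": 0}


def load_batch():
    """Idempotent load: each run overwrites the batch by key instead of appending."""
    calls["count"] += 1
    loaded_batches["2024-01-01"] = ["row-a", "row-b"]
    if calls["count"] < 3:
        raise ConnectionError("transient failure after the write")
    return "ok"


result = run_with_retries(load_batch, attempts=3, base_delay=0.0)
print(result, calls["count"], loaded_batches)
```

The task ran three times, yet the batch holds two rows rather than six, because each attempt replaced the batch instead of appending to it.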
Delta table maintenance concepts
| Concept | Purpose | Trap |
|---|---|---|
| Optimize/compaction | Reduce small-file overhead | Not a substitute for good ingestion design |
| V-Order | Write-time optimization of Parquet files that improves read performance for analytics workloads | Helps reads but does not fix incorrect logic |
| Vacuum | Remove old files no longer needed by retention rules | Can affect time travel/history expectations |
| Schema enforcement | Prevent unexpected incompatible writes | May require planned schema evolution |
| Time travel/history | Useful for audit and recovery scenarios | Retention and cleanup policies matter |
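
To give a feel for what compaction achieves, the sketch below greedily groups file sizes into bins up to a target size and compares file counts before and after. It is a toy model in plain Python: table maintenance commands decide their own grouping, and the 128 MB target and the sample file sizes are arbitrary assumptions.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group files, largest first, into bins of at most target_mb."""
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins


# Hypothetical table layout: 64 tiny 4 MB files plus two larger files.
small_files = [4] * 64 + [100, 90]
plan = plan_compaction(small_files)

print(len(small_files), "files ->", len(plan), "files")
print([sum(group) for group in plan])
```

The total data volume is unchanged; only the number of files a reader must open drops. The better fix is still an ingestion design that avoids creating thousands of tiny files in the first place.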
Common DP-700 scenario traps
| Trap | Why it is wrong | Better thinking |
|---|---|---|
| “Use a notebook for everything” | Not every task needs custom code | Use pipelines for orchestration, dataflows for low-code, warehouse for SQL |
| “Use a pipeline for all transformations” | Pipelines coordinate work; heavy transforms belong elsewhere | Pipeline calls the right engine |
| “Copy data even when a shortcut would work” | Duplicates storage and introduces sync complexity | Use shortcuts when no-copy access meets requirements |
| “Use full refresh for a large changing source” | Wastes time and capacity | Use incremental ingestion and merge |
| “Grant workspace Admin to fix access” | Over-permissive and risky | Diagnose the correct permission layer |
| “Partition by high-cardinality column” | Creates too many partitions and small files | Partition by useful pruning columns |
| “Ignore failed-run watermark behavior” | Can skip records | Advance watermark only after successful target update |
| “Assume SQL endpoint equals Warehouse” | Lakehouse and Warehouse have different write/serving patterns | Match engine to requirement |
| “Optimize compute before data layout” | Bad layout can dominate performance | Fix file sizes, filters, partitions, and table design |
| “Promote items without remapping connections” | Dev settings can leak into prod | Parameterize and validate deployment settings |
Quick decision checklist for exam questions
Before selecting an answer, identify:
- Target storage: lakehouse, warehouse, external source through shortcut, or semantic model.
- Transformation style: low-code, Spark/code, SQL, or orchestration-only.
- Load pattern: full, incremental, CDC, streaming/near-real-time, or no-copy.
- Security boundary: workspace, item, SQL object, semantic model, or source system.
- Operational requirement: schedule, retry, monitoring, deployment, lineage, or capacity optimization.
- Performance issue: compute, query shape, file layout, partitioning, source throughput, or concurrency.
- Failure behavior: idempotency, watermark handling, duplicate handling, and auditability.
If two answers seem plausible, prefer the one that satisfies the requirement with the least unnecessary complexity.
Mini review prompts
Use these as a quick readiness check before starting a DP-700 question bank.
| Prompt | Best answer direction |
|---|---|
| You need to orchestrate a copy, run a notebook, and then execute a SQL step on a schedule | Pipeline |
| You need low-code cleansing using Power Query-style steps | Dataflow Gen2 |
| You need no-copy access to data already stored in a supported location | Shortcut |
| You need complex PySpark transformations and Delta table maintenance | Notebook |
| You need a SQL-first curated dimensional store | Warehouse |
| You need to apply only changed records from a source table | Incremental load with watermark/CDC and merge/upsert |
| A lakehouse has raw files but SQL users cannot query them as tables | Load the files into Delta tables in the Tables area |
| A user can access a workspace but should only see certain rows | Apply the appropriate data/model-level security, not only a workspace role |
| A pipeline skipped records after a failed load | Review watermark update timing and idempotency |
| A Spark job is slow after thousands of tiny files were created | Compact/optimize table layout and review write pattern |
How to use IT Mastery practice effectively
After this Quick Review, move into original practice questions in focused sets rather than immediately taking a full mock exam.
Suggested topic drill order
| Drill | Goal |
|---|---|
| Fabric item selection | Build fast recognition of lakehouse, warehouse, pipeline, dataflow, notebook, shortcut |
| OneLake and Delta | Review tables/files, shortcuts, schema, optimization, medallion patterns |
| Ingestion | Practice full versus incremental loads, CDC/watermarks, copy activity, source constraints |
| Transformation | Compare Spark, SQL, and Dataflows Gen2 decisions |
| Security and governance | Separate workspace, item, SQL, model, and source permissions |
| Monitoring and troubleshooting | Diagnose failed runs, slow jobs, capacity issues, and stale data |
| End-to-end scenarios | Combine design, implementation, security, and operations in one case |
Review method
For each missed question:
- Write down the requirement you overlooked.
- Identify the Fabric item or feature the question was really testing.
- Note the wrong answer pattern that tempted you.
- Re-answer a similar topic drill before moving on.
- Read the detailed explanations, including why the distractors are wrong.
The goal is not memorizing answer letters. The goal is building a repeatable decision process for DP-700 scenarios.
Final readiness checklist
You are ready for heavier mock exam practice when you can quickly explain:
- When to use a Lakehouse instead of a Warehouse.
- When a pipeline should orchestrate rather than transform.
- When Dataflow Gen2 is preferable to a notebook.
- How shortcuts differ from copying data.
- How to design incremental loads without skipping records.
- How Delta tables, partitioning, compaction, and schema handling affect performance and reliability.
- How workspace roles differ from item, SQL, source, and model-level permissions.
- How Git integration and deployment pipelines support lifecycle management.
- Where to look when a pipeline, dataflow, notebook, SQL query, or capacity is failing.
- How to choose the simplest Fabric pattern that satisfies the scenario.
Practical next step
Use this Quick Review as your final scan, then start DP-700 topic drills in an IT Mastery question bank. Focus first on item-selection and ingestion scenarios, then move into mixed mock exams with detailed explanations so you can practice the same decision process under exam-like timing.
Continue in IT Mastery
Use this Quick Review as a final concept map, then move into IT Mastery for focused topic drills, mixed practice sets, timed mock exams, and detailed explanations. The practice questions are original IT Mastery practice items; they are not official Microsoft questions, copied live-exam content, or exam dumps.