Microsoft DP-700 Practice Test: Fabric Data Engineer

Practice Microsoft DP-700 Fabric Data Engineer Associate with free sample questions, timed mock exams, topic drills, and detailed explanations in IT Mastery.

DP-700 is the exam for the Microsoft Certified: Fabric Data Engineer Associate certification, aimed at candidates who design and deploy Fabric data engineering solutions across ingestion, transformation, analytics management, security, monitoring, and optimization, using SQL, PySpark, and KQL.

IT Mastery practice for DP-700 is live now. Use this page to start the web simulator, review the exam snapshot, work through 24 public sample questions, and continue into the full question bank with the same account on web, iOS, iPadOS, macOS, or Android.

Interactive Practice Center

Start a practice session for Microsoft Fabric Data Engineer Associate (DP-700) below, or open the full app in a new tab for the best experience, and navigate with swipes/gestures or the mouse wheel, just like on your phone or tablet.

Open Full App in a New Tab

A small set of questions is available for free preview. Subscribers can unlock full access by signing in with the same account they use on web and mobile.

Prefer to practice on your phone or tablet? Download the IT Mastery – AWS, Azure, GCP & CompTIA exam prep app for iOS, or the IT Mastery app on Google Play for Android, and use the same account across web and mobile.

What this DP-700 practice page gives you

  • a direct route into the live IT Mastery simulator for DP-700
  • 24 on-page sample questions with detailed explanations
  • topic drills and mixed sets across Fabric analytics solutions, ingestion, transformation, monitoring, and optimization
  • a clear free-preview path before you subscribe
  • the same account across web and mobile

Who DP-700 is for

  • Fabric data engineers responsible for loading, transforming, securing, monitoring, and optimizing analytics solutions
  • candidates who work with SQL, PySpark, KQL, lakehouses, warehouses, pipelines, and analytics architecture
  • teams comparing Microsoft Fabric with Databricks, Snowflake, Azure data, or broader data-engineering routes

DP-700 exam snapshot

  • Issuer: Microsoft
  • Official certification name: Microsoft Certified: Fabric Data Engineer Associate
  • Exam code: DP-700
  • Product: Microsoft Fabric
  • Exam time shown by Microsoft Learn: 100 minutes
  • Current IT Mastery status: live practice available

Topic coverage for DP-700

Each area assessed by Microsoft is paired with its practical focus:

  • Implement and manage an analytics solution: build, secure, and manage Fabric analytics assets.
  • Ingest and transform data: use pipelines, transformations, SQL, PySpark, KQL, and data-loading patterns.
  • Monitor and optimize an analytics solution: improve reliability, performance, and operational visibility.

How to use the DP-700 simulator efficiently

  1. Start with Fabric architecture and asset-management drills so lakehouse, warehouse, pipeline, semantic model, and workspace boundaries are clear.
  2. Review every miss until you can explain why the best answer fits the ingestion pattern, transformation method, security boundary, or performance constraint.
  3. Move into mixed sets once you can switch between SQL, PySpark, KQL, Data Factory-style orchestration, monitoring, and optimization without losing the prompt’s priority.
  4. Finish with timed runs so Fabric data-engineering choices stay precise under exam pressure.

Free preview vs premium

  • Free preview: a smaller web set so you can validate the question style and explanation depth.
  • Premium: the full DP-700 practice bank, focused drills, mixed sets, timed mock exams, detailed explanations, and progress tracking across web and mobile.


24 DP-700 sample questions with detailed explanations

These sample questions are drawn from the current IT Mastery question bank for this exact exam code. Use them to check your readiness here, then continue into the full IT Mastery question bank for broader timed coverage.

Question 1

Topic: General

A sales analytics workspace has a Fabric pipeline that runs nightly. The pipeline triggers Dataflows Gen2 to stage source data, runs a Warehouse stored procedure to update fact tables, and then starts downstream refresh processing. Monitoring shows Dataflows Gen2 completes in 8 minutes and pipeline wait time is negligible. The stored procedure spends 70 minutes executing one T-SQL aggregation query in the Warehouse. You must reduce the nightly duration. What should you do next?

  • A. Keep orchestration and optimize the Warehouse T-SQL query
  • B. Replace Dataflows Gen2 with a notebook ingestion step
  • C. Add event-based triggers for each source file
  • D. Split ingestion into parallel pipeline branches

Best answer: A

Explanation: The evidence isolates the delay to Warehouse query execution, not ingestion or orchestration. The best next step is to keep the pipeline pattern and optimize the long-running Warehouse T-SQL query or its supporting table design. When monitoring shows that Dataflows Gen2 and pipeline waits are already fast, changing the orchestration layer is unlikely to reduce the critical path. In this scenario, the Warehouse stored procedure dominates the run time, and the expensive work is a T-SQL aggregation query. Use Warehouse performance evidence, such as query insights and execution characteristics, to tune the query, reduce unnecessary scans or joins, and adjust the warehouse design if needed. Orchestration changes help when the schedule, dependency chain, or trigger strategy is the bottleneck; they do not make a slow Warehouse query execute faster.


Question 2

Topic: General

Your team is building a nightly finance publish process in Microsoft Fabric. Raw payroll files are stored in a Lakehouse item in a restricted workspace. Analysts in a shared workspace must receive only approved curated tables and must not gain access to the raw item or the restricted workspace. Which orchestration pattern should you use?

  • A. Create a Dataflows Gen2 item in the shared workspace using an admin-owned connection to the raw Lakehouse.
  • B. Share the raw OneLake path with analysts and filter allowed folders by notebook parameters.
  • C. Schedule a pipeline to run a restricted-workspace notebook that writes only curated output to a secured Lakehouse item.
  • D. Create a shortcut in the shared workspace directly to the raw payroll folder.

Best answer: C

Explanation: Use an orchestrated publish pattern that processes raw data under restricted controls and exposes only curated output. OneLake security should refine access to the approved target item, not provide a way around source workspace or item permissions. For restricted OneLake data, orchestration should preserve the security boundary around the raw Lakehouse. A scheduled pipeline can run a notebook in the restricted workspace by using a least-privileged execution identity, transform or filter the raw payroll data, and write only approved results to a curated Lakehouse item. Analysts should then receive access to the curated item and any applicable OneLake data access roles for the approved folders or tables. This avoids giving analysts raw workspace access, raw item access, or a reusable credential that can read the source files.


Question 3

Topic: General

A Fabric pipeline named pl_LoadTelemetry is run by service principal spn-orchestrator. The service principal has Read and Execute permissions on the pipeline item, but it is not a member of the workspace. The first activity fails.

Activity: Notebook - nb_NormalizeTelemetry
Status: Failed
Message: 403 Forbidden. Principal does not have access to execute item nb_NormalizeTelemetry.
Pipeline item authorization: Succeeded

Which permission change should you make with least privilege?

  • A. Grant Read and Execute on nb_NormalizeTelemetry.
  • B. Grant ReadData on the destination lakehouse.
  • C. Add spn-orchestrator as a workspace Viewer.
  • D. Enable high concurrency for the notebook.

Best answer: A

Explanation: The evidence shows pipeline authorization succeeded, but the notebook activity failed with a 403 on the notebook item. The least-privilege fix is to grant the service principal the required item-level permissions on the notebook it executes. Fabric access can be enforced at both the workspace and item levels. In this case, the service principal already has permissions to run the pipeline item, so the root cause is not pipeline access. The failure names the notebook item and says the principal cannot execute it. Granting the required Read and Execute permissions on nb_NormalizeTelemetry addresses the specific missing item permission without broad workspace access. The key troubleshooting step is to match the failed operation and item in the error message to the permission being checked.


Question 4

Topic: General

A retail workspace loads daily order files into a Lakehouse by using a Dataflow Gen2 refresh and then a notebook transformation. They are currently scheduled independently, and support tickets show that late files sometimes cause stale downstream data. Requirements are: alert support if no successful publish occurs by 06:00, block publishing when duplicate OrderId values exist, and provide one run record with the business date and failed step. Which orchestration change best satisfies the monitoring coverage requirements?

  • A. Keep the schedules and monitor only Dataflow Gen2 refresh history.
  • B. Schedule the notebook more frequently and log duplicate counts.
  • C. Use a parameterized Fabric pipeline with event and deadline checks.
  • D. Route the files through Eventstreams and monitor Eventhouse metrics.

Best answer: C

Explanation: The requirements span freshness, correctness, and operational support across multiple Fabric items. A parameterized Fabric pipeline is the best orchestration boundary because it can coordinate the Dataflow Gen2 refresh, notebook transformation, validation gates, deadline checks, and run-level support evidence. For end-to-end monitoring coverage, use a Fabric pipeline as the orchestrator. The pipeline can be triggered when the source file arrives and can also include a scheduled deadline check for the 06:00 freshness requirement. It can pass a business date parameter to the Dataflow Gen2 and notebook activities, evaluate duplicate-key validation output, fail the run before publishing, and record the failed step in one pipeline run history. This gives support a single operational view instead of separate item histories. The key takeaway is that monitoring coverage should match the business requirement, not just whether an individual item refreshed successfully.
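The deadline-check part of this orchestration can be sketched in plain Python. This is an illustrative sketch only: the run-record fields (business_date, status, failed_step) and the needs_alert helper are hypothetical stand-ins for what a pipeline activity would evaluate against its run history.

```python
from datetime import time

# Hypothetical sketch of the 06:00 freshness gate: alert support when no
# successful publish exists for the business date by the deadline.
DEADLINE = time(6, 0)

def needs_alert(run_records, business_date, now):
    # One run record per pipeline execution, carrying the business date
    # and the failed step, as the scenario requires.
    published = any(
        r["business_date"] == business_date and r["status"] == "Succeeded"
        for r in run_records
    )
    return now >= DEADLINE and not published

runs = [{"business_date": "2024-01-01", "status": "Failed",
         "failed_step": "Dataflow refresh"}]
```

At 06:05 with only a failed run recorded, the check fires; before the deadline it stays quiet.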


Question 5

Topic: General

A Dataflow Gen2 loads daily customer CSV files into a Fabric Lakehouse table. The dataflow usually succeeds, but it fails when the source owner omits nullable MiddleName or adds extra optional columns. The query includes an auto-generated Changed Type step for all detected source columns. You must keep loading the same curated table without manual edits for optional schema changes. What is the best improvement?

  • A. Add a pipeline retry policy around the dataflow refresh.
  • B. Replace the Lakehouse table on every refresh.
  • C. Increase the workspace capacity for the dataflow run.
  • D. Project target columns with MissingField.UseNull before type steps.

Best answer: D

Explanation: The failure is caused by a fixed schema assumption in the Dataflow Gen2 Power Query steps. Projecting the expected curated columns and handling missing nullable fields makes the refresh resilient to optional source schema drift while preserving the same Lakehouse target. Dataflow Gen2 uses Power Query steps, and auto-generated steps such as Changed Type can fail when they reference a column that is no longer present. For a curated Lakehouse table, shape the dataflow output deliberately: select the target columns, use missing-field handling such as MissingField.UseNull for nullable fields, then apply type conversions to that stable output schema. Extra source columns are ignored because they are not projected, and omitted nullable columns are supplied as nulls. Retries or capacity changes do not resolve a deterministic schema mismatch in the query logic.
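The "project target columns before type steps" pattern can be illustrated outside Power Query with a plain-Python sketch. The column names are hypothetical; the point is the behavior the explanation describes: extra source columns are dropped, and missing nullable columns become nulls, before any type conversion runs.

```python
# Plain-Python analogue of Table.SelectColumns(..., MissingField.UseNull):
# every target column is always present (missing ones filled with None),
# and columns outside the target list are ignored.
TARGET_COLUMNS = ["CustomerId", "FirstName", "MiddleName", "LastName"]

def project_row(source_row: dict) -> dict:
    return {col: source_row.get(col) for col in TARGET_COLUMNS}

rows = [
    {"CustomerId": "1", "FirstName": "Ana", "LastName": "Diaz"},             # MiddleName omitted
    {"CustomerId": "2", "FirstName": "Bo", "LastName": "Lee", "Nick": "B"},  # extra column
]
projected = [project_row(r) for r in rows]
```

Because the projection produces a stable schema, a subsequent typed conversion step can no longer fail on a column that drifted in or out of the source.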


Question 6

Topic: General

A retail company streams point-of-sale device telemetry into a Fabric Eventhouse. You need to detect failed card reader events per store in five-minute intervals and let analysts drill into the results with near real-time latency. The solution must keep the data in Real-Time Intelligence and avoid batch-copying the events to another Fabric item.

Which implementation should you use?

  • A. Create a T-SQL view in a Warehouse over imported event data.
  • B. Create a KQL query in the Eventhouse using summarize and bin().
  • C. Create a Dataflow Gen2 refresh that groups events every 30 minutes.
  • D. Run a scheduled PySpark notebook against a Lakehouse copy of the events.

Best answer: B

Explanation: KQL is the best fit for analyzing data that already resides in an Eventhouse. It supports high-volume, near real-time event exploration and time-window aggregation while keeping the workload in Real-Time Intelligence. For Eventhouse data, KQL is the primary implementation choice for real-time analytical queries. The requirement is to aggregate telemetry failures by store over five-minute intervals and support near real-time drilldown without copying data to a Warehouse or Lakehouse. A KQL query can filter failed events, group by store, and use bin() on the event timestamp to create the five-minute windows directly in the Eventhouse. This preserves the Real-Time Intelligence architecture and avoids introducing batch latency. T-SQL, Dataflows Gen2, and PySpark can be valid Fabric transformation tools, but they fit different storage and processing patterns than direct Eventhouse analysis.
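What bin() does to the event timestamps can be sketched in plain Python: floor each time to the start of its five-minute window, then count failed events per store and window. The event fields here are hypothetical; in the Eventhouse this would be a single KQL summarize over the table.

```python
from datetime import datetime, timedelta

# Floor a timestamp to its five-minute window boundary, the same windowing
# that KQL's bin(Timestamp, 5m) performs.
def bin_5m(ts: datetime) -> datetime:
    return ts - timedelta(minutes=ts.minute % 5,
                          seconds=ts.second,
                          microseconds=ts.microsecond)

events = [
    {"store": "S1", "status": "failed", "ts": datetime(2024, 1, 1, 9, 3)},
    {"store": "S1", "status": "failed", "ts": datetime(2024, 1, 1, 9, 4)},
    {"store": "S1", "status": "ok",     "ts": datetime(2024, 1, 1, 9, 6)},
]

# Count failed events per (store, five-minute window).
counts = {}
for e in events:
    if e["status"] == "failed":
        key = (e["store"], bin_5m(e["ts"]))
        counts[key] = counts.get(key, 0) + 1
```

The two failures at 09:03 and 09:04 land in the same 09:00 window, while the 09:06 success is excluded by the filter.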


Question 7

Topic: General

You are designing a Fabric pipeline that loads a 4-TB Orders table from an operational database into a Lakehouse Delta table. The source receives frequent inserts and updates, exposes a reliable ModifiedAtUtc column, and does not need hard-delete propagation. Analysts need data within 30 minutes, and failed reruns must not duplicate rows or require reloading all history. Which loading pattern should you use?

  • A. Use a watermark-based incremental pipeline with MERGE and post-success watermark updates.
  • B. Schedule a full truncate-and-reload pipeline every 30 minutes.
  • C. Append each 30-minute extract directly to the Lakehouse table.
  • D. Create a OneLake shortcut and enable Query acceleration.

Best answer: A

Explanation: A watermark-based incremental load fits high volume, frequent changes, and a 30-minute freshness target. Using MERGE handles updates without creating duplicate rows, and updating the watermark only after a successful load supports safe recovery after failures. For a large table with frequent inserts and updates, a full reload is inefficient and increases recovery risk. A Fabric pipeline can store the last successful ModifiedAtUtc watermark, extract rows changed since that value, stage them, and apply them to the Lakehouse Delta table with an upsert such as MERGE. The watermark should be advanced only after the target commit succeeds, so a failed run can be retried without skipping changes. This is the key distinction from a simple append pattern, which can duplicate updated or retried rows.


Question 8

Topic: General

A Fabric workspace named Finance-Prod contains a lakehouse with raw payroll files, curated sales tables, and internal notebooks. Data engineers must administer all workspace items. The SalesReaders security group must read only the curated sales table and a public reference folder through OneLake shortcuts from another workspace, and must not browse other Finance-Prod items or raw folders. Which design should you implement?

  • A. Add SalesReaders as workspace Viewers.
  • B. Share the lakehouse with ReadAll permissions.
  • C. Grant Contributor in the shortcut workspace.
  • D. Use minimal lakehouse item access plus OneLake data access roles.

Best answer: D

Explanation: The best design uses layered Fabric security. Workspace roles are too broad for this consumer group, while OneLake data access roles can restrict read access to specific lakehouse tables or folders after the item is exposed appropriately. OneLake security decisions should separate workspace, item, and data-level access. Data engineers can remain workspace members so they can administer all items. SalesReaders should not be added to the Finance-Prod workspace because that would expose workspace contents beyond the stated need. Instead, grant only the lakehouse item access required for consumption, then use OneLake data access roles to allow the group to read only the curated sales table and public reference folder paths. This supports the shortcut-based read pattern without granting access to raw payroll folders or unrelated notebooks. The key takeaway is to avoid using broad workspace or all-data item permissions when the requirement is path-level OneLake access.


Question 9

Topic: General

You are building a Microsoft Fabric pipeline for batch ingestion. An Azure SQL manifest table contains one row for each file that is ready to load, with FileUrl and TargetFolder columns. The number of rows varies on each run. Each listed CSV file must be copied to Files/raw/{TargetFolder} in a Lakehouse. You must avoid notebook code and hard-coded file names. Which implementation should you use?

  • A. Use a Dataflow Gen2 activity with fixed mappings.
  • B. Use Lookup, ForEach, and parameterized Copy data activities.
  • C. Use one Copy data activity from a static source folder.
  • D. Use a Script activity to run T-SQL COPY INTO.

Best answer: B

Explanation: The requirement is a variable, manifest-driven file copy into Lakehouse Files without code. A Lookup activity can return the ready rows, ForEach can iterate them, and Copy data can use dynamic expressions for each source and destination path. Fabric data pipelines are appropriate for low-code batch movement into Fabric destinations. For manifest-driven ingestion, use Lookup to read the ready file list, ForEach to process each manifest row, and Copy data inside the loop to move the file. The Copy data source and Lakehouse Files destination can be parameterized with dynamic content from the current ForEach item, so each file lands in Files/raw/{TargetFolder} without hard-coded file names or notebook code. A single static copy does not honor the manifest-driven routing requirement.
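The same control flow can be sketched in plain Python: the manifest rows play the role of the Lookup output, the loop is the ForEach, and the f-string is the dynamic-content expression on the Copy destination. The URLs and folder names are hypothetical.

```python
# Hypothetical manifest rows, as a Lookup activity would return them.
manifest = [
    {"FileUrl": "https://example.com/a.csv", "TargetFolder": "sales"},
    {"FileUrl": "https://example.com/b.csv", "TargetFolder": "returns"},
]

copies = []
for row in manifest:  # ForEach over the Lookup output
    # Dynamic destination path, like a parameterized Copy data sink.
    destination = f"Files/raw/{row['TargetFolder']}"
    copies.append((row["FileUrl"], destination))  # one copy per manifest row
```

Because the destination is computed per row, a run with three manifest rows performs three copies and a run with thirty performs thirty, with no hard-coded file names.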


Question 10

Topic: General

You are designing transformations in a Fabric workspace. Two new workloads must be implemented:

  • Business users must clean vendor CSV files and load standardized rows into a Lakehouse by using a low-code, Power Query-based experience.
  • An operations dashboard must aggregate device telemetry that is already stored in an Eventhouse by five-minute windows.

Which TWO transformation choices should you use? Select TWO.

  • A. Use T-SQL for the device telemetry workload.
  • B. Use Dataflows Gen2 for the vendor CSV workload.
  • C. Use notebooks for the vendor CSV workload.
  • D. Use KQL for the device telemetry workload.
  • E. Use KQL for the vendor CSV workload.
  • F. Use Dataflows Gen2 for the device telemetry workload.

Best answers: B and D

Explanation: The correct choices match each transformation engine to the workload style. Dataflows Gen2 fits low-code, Power Query-based batch cleansing into a Lakehouse. KQL fits Eventhouse telemetry analysis and time-window aggregations in Real-Time Intelligence. The core concept is selecting the transformation tool by workload. Dataflows Gen2 is appropriate when users need a low-code Power Query experience for cleansing, shaping, and loading batch data to Fabric destinations such as a Lakehouse. KQL is appropriate for Eventhouse data because Eventhouses are optimized for real-time and time-series analytics, including filtering, aggregation, and windowing over telemetry. Notebooks are better for custom PySpark or Python transformations, and T-SQL is better for Warehouse or SQL-based transformations rather than native Eventhouse telemetry work. The key takeaway is to match the transformation language and experience to both the data store and the maintenance requirement.


Question 11

Topic: General

You are designing ingestion for a Fabric Lakehouse. The source is a self-hosted SQL Server ERP database that cannot be mirrored to Fabric. The largest table is 4 TB, daily change volume is about 0.8%, CDC exposes LSNs for inserts, updates, and deletes, and analysts need data within 30 minutes. If a run fails, the design must resume without losing or duplicating changes. Which loading pattern should you use?

  • A. Schedule full truncate-and-reload copies every 30 minutes
  • B. Overwrite only current-date partitions every 30 minutes
  • C. Copy CDC LSN ranges, persist checkpoints, and MERGE into Delta tables
  • D. Append CDC records directly without applying deletes

Best answer: C

Explanation: The best fit is an incremental CDC loading pattern. The source provides ordered change identifiers, the change volume is small compared with the table size, and the solution must support replay after failure. Persisted checkpoints plus idempotent merges meet the freshness and recoverability requirements. This scenario calls for an incremental load based on SQL Server CDC LSN ranges. Each run should extract a bounded LSN range, land the changes, and apply inserts, updates, and deletes to Delta Lakehouse tables by using an idempotent merge pattern. The checkpoint or high-water mark should be advanced only after the target commit succeeds, so a failed run can replay the same change range without data loss. Full reloads are inefficient for a 4 TB table, and date-based overwrites do not reliably capture updates or deletes outside the current partition. The key takeaway is to align the load pattern to the source change feed and recovery boundary.
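The replay-safe CDC apply step can be sketched in plain Python. The record shape (lsn, op, key, row) is a hypothetical simplification of a CDC change feed; the point is that applying the same LSN range twice yields the same end state, and deletes are honored.

```python
# Hypothetical sketch of applying one CDC LSN range idempotently.
table = {}      # key -> row, stands in for the Delta table
checkpoint = 0  # last successfully applied LSN

def apply_cdc(batch):
    global checkpoint
    # Apply changes in LSN order so updates and deletes land correctly.
    for change in sorted(batch, key=lambda c: c["lsn"]):
        if change["op"] in ("insert", "update"):
            table[change["key"]] = change["row"]
        elif change["op"] == "delete":
            table.pop(change["key"], None)
    # Advance the checkpoint only after the batch is fully applied.
    checkpoint = max(c["lsn"] for c in batch)

batch = [
    {"lsn": 101, "op": "insert", "key": 1, "row": {"qty": 5}},
    {"lsn": 102, "op": "update", "key": 1, "row": {"qty": 7}},
    {"lsn": 103, "op": "delete", "key": 1, "row": None},
]
apply_cdc(batch)
apply_cdc(batch)  # replaying the same LSN range leaves the same end state
```

An append-only variant (option D) would keep the deleted row forever, which is why deletes must be applied, not just recorded.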


Question 12

Topic: General

Your pipeline loads partner order events as nested JSON files into a Lakehouse Files folder. A Dataflow Gen2 activity expands orderLines, removes duplicates by orderId and lineId, and writes a curated Lakehouse table. The activity fails on every refresh.

Exhibit: Refresh message

Step: Expand orderLines
Rows read: 48 million
Error: Evaluation ran out of memory while expanding a nested list column.
Target: Curated Delta table in the same Lakehouse

You need the most appropriate fix while keeping the transformation in Fabric. What should you do?

  • A. Rewrite the transformation as Warehouse T-SQL.
  • B. Replace the Dataflow Gen2 transform with a PySpark notebook.
  • C. Move the data to an Eventhouse and use KQL.
  • D. Add a retry policy to the Dataflow Gen2 activity.

Best answer: B

Explanation: The failure occurs during a high-volume nested JSON expansion in Dataflows Gen2. A Fabric notebook using PySpark is better suited for scalable, code-based parsing, flattening, deduplication, and writing curated Delta tables in a Lakehouse. Choose the transformation engine based on the workload shape. Dataflows Gen2 is strong for low-code data preparation, but the evidence shows a repeatable memory failure while expanding a large nested list column. A notebook with PySpark can process large Lakehouse files in a distributed Spark engine, apply custom flattening and deduplication logic, and write the curated output as a Delta table. T-SQL is a better fit for relational transformations in a Warehouse, and KQL is a better fit for Eventhouse analytics over real-time or log-style data. Retrying does not address a deterministic capacity or engine-fit problem.
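The notebook logic being proposed, flattening nested orderLines (what PySpark's explode does) and deduplicating on (orderId, lineId) (what dropDuplicates does), can be sketched in plain Python on a tiny sample. Field names follow the scenario; the data is hypothetical.

```python
# Hypothetical nested order events, mirroring the JSON shape in the stem.
orders = [
    {"orderId": 1, "orderLines": [{"lineId": 1, "qty": 2},
                                  {"lineId": 2, "qty": 1}]},
    {"orderId": 1, "orderLines": [{"lineId": 1, "qty": 2}]},  # duplicate line
]

seen = set()
flat = []
for order in orders:
    for line in order["orderLines"]:       # flatten, like explode()
        key = (order["orderId"], line["lineId"])
        if key not in seen:                # dedupe, like dropDuplicates()
            seen.add(key)
            flat.append({"orderId": order["orderId"], **line})
```

In the real notebook the same two steps run distributed across Spark executors, which is what removes the single-evaluator memory ceiling that the Dataflow hit.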


Question 13

Topic: General

A Microsoft Fabric pipeline incrementally loads daily sales from a Lakehouse bronze table into a Warehouse fact table. The CustomerId column is required to look up DimCustomer, and finance policy says sales with a missing CustomerId must not be posted to the fact table until the source is corrected. Valid sales must still load on schedule, and operations needs the rejected rows. Which loading pattern should you implement?

  • A. Discard rows with missing CustomerId.
  • B. Impute CustomerId with the most common customer.
  • C. Flag missing CustomerId and load all rows.
  • D. Route missing rows to remediation and load valid rows.

Best answer: D

Explanation: Missing data that violates a required dimension lookup should be routed for remediation when it cannot be safely loaded or inferred. This pattern keeps the incremental load moving for valid rows while preserving invalid source records for operational correction. The core decision is how to handle missing data during an incremental dimensional load. Because CustomerId is required for the DimCustomer lookup and policy prevents those sales from being posted, the load should split the data: valid rows continue into the Warehouse fact table, and rows with missing CustomerId go to a remediation or quarantine location with enough detail for operations to fix the source. This avoids silently losing data, inventing incorrect keys, or contaminating the fact table. Flagging is useful when rows may remain in the analytical dataset, but here the stem explicitly prohibits posting them until corrected.
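The split-path load can be sketched in plain Python: rows with a CustomerId continue to the fact table, rows without one are quarantined with enough context for operations. Column names follow the scenario; the storage targets are hypothetical.

```python
# Hypothetical incremental batch from the bronze table.
rows = [
    {"SaleId": 1, "CustomerId": "C9", "Amount": 100},
    {"SaleId": 2, "CustomerId": None, "Amount": 40},
]

fact_rows, remediation = [], []
for r in rows:
    if r["CustomerId"]:
        fact_rows.append(r)  # valid rows load on schedule
    else:
        # Quarantine with a reason so operations can correct the source.
        remediation.append({**r, "Reason": "Missing CustomerId"})
```

Nothing is silently dropped and nothing invalid is posted, which is the distinction between this pattern and options A through C.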


Question 14

Topic: General

A company uses Microsoft Fabric domains to govern data by business area. For the Finance domain, only approved workspace admins should be able to associate their workspaces with the domain. The finance governance group must enforce this without receiving access to lakehouse files, warehouse tables, or workspace items.

What should you configure?

  • A. Add the governance group as Workspace Admins
  • B. Grant Read permissions on finance Lakehouses
  • C. Apply a Finance sensitivity label to all items
  • D. Require domain contributors for assignment

Best answer: D

Explanation: Domain workspace settings are the right control for governing which workspaces can be associated with a Fabric domain. Requiring domain contributors for assignment lets governance restrict domain membership without granting access to data or items inside the workspaces. Fabric domains provide governance grouping for workspaces and their analytics content. For this requirement, the key is to control who can associate workspaces with the Finance domain, not who can read or administer the data. Configuring the domain workspace assignment setting to require domain contributors lets the governance team approve or limit the eligible workspace admins while preserving least privilege. Workspace roles and item permissions control access to content, such as Lakehouse files or Warehouse objects. They are not the correct mechanism for domain membership governance and would over-grant access in this scenario.


Question 15

Topic: General

A Fabric workspace uses a Lakehouse bronze layer with OneLake shortcuts to mirrored sales tables. A nightly pipeline loads a Warehouse that feeds a semantic model. Analysts need only daily revenue and units by product category and sales region, and semantic model refresh should scan the fewest rows without losing that detail. Which design should you implement?

  • A. Load order-line rows with category and region columns.
  • B. Load a monthly table grouped by category and region.
  • C. Load a daily table grouped by customer and order.
  • D. Load a gold table grouped by day, category, and region.

Best answer: D

Explanation: The correct grain is the lowest level needed by downstream analytics: day, product category, and sales region. Aggregating to that grain preserves the requested detail while reducing scan volume for the Warehouse and semantic model. The core concept is selecting the aggregation grain before loading curated data. Because analysts need daily revenue and units by category and region, the Fabric transformation should join the needed reference data, then group order-line facts into one row per day, category, and region. This creates a gold aggregate table that is still detailed enough for the required analysis but avoids carrying unnecessary order-line detail into the Warehouse. Grouping above that level, such as by month, loses required detail; grouping below it, such as by order or customer, keeps avoidable volume.
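The grain choice can be sketched in plain Python: group order-line rows to one row per (day, category, region), summing revenue and units. The column names are from the scenario; the sample data is hypothetical.

```python
from collections import defaultdict

# Hypothetical order-line rows from the bronze layer.
lines = [
    {"day": "2024-01-01", "category": "Bikes", "region": "EU", "revenue": 100.0, "units": 1},
    {"day": "2024-01-01", "category": "Bikes", "region": "EU", "revenue": 250.0, "units": 2},
    {"day": "2024-01-01", "category": "Bikes", "region": "US", "revenue": 80.0,  "units": 1},
]

# Gold table: one row per (day, category, region).
gold = defaultdict(lambda: {"revenue": 0.0, "units": 0})
for l in lines:
    key = (l["day"], l["category"], l["region"])
    gold[key]["revenue"] += l["revenue"]
    gold[key]["units"] += l["units"]
```

Three order lines collapse to two gold rows here; at real volumes the same collapse is what shrinks the rows the semantic model refresh must scan.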


Question 16

Topic: General

A Fabric workspace contains a Lakehouse and pipelines that perform incremental loads into curated Delta tables. A data engineering group must build and modify the loading pipelines, notebooks, and Lakehouse items in that workspace. The group must not be able to manage workspace membership or workspace settings. Which workspace-level role should you assign to the group?

  • A. Contributor
  • B. Viewer
  • C. Member
  • D. Admin

Best answer: A

Explanation: The requirement is write access to workspace items, not workspace administration. The Contributor role fits a data engineering team that must create and edit pipelines, notebooks, and Lakehouse items for loading workloads without managing workspace access. Fabric workspace-level access controls are implemented by assigning users or groups to workspace roles. For an ingestion team that needs to build and maintain loading assets, the key permission is the ability to create and modify items in the workspace. Contributor provides that capability while keeping user-management and workspace-administration permissions out of scope. This follows least privilege because the team can operate the data loading solution without controlling who else has access to the workspace. Use higher roles only when the user must manage workspace membership or administer the workspace itself.


Question 17

Topic: General

A Fabric notebook loads CSV order files into a Lakehouse table. The current PySpark transformation completes, but QA finds that the order_total >= '250' filter includes some smaller amounts and excludes some larger amounts because order_total is read as a string. The workload must keep currency precision and write only valid orders with order_total of at least 250. Which improvement should you make?

  • A. Increase Spark executor memory before filtering.
  • B. Sort by order_total before filtering.
  • C. Cast order_total to double after filtering.
  • D. Cast order_total to DecimalType(18,2) before filtering.

Best answer: D

Explanation: This is an incorrect transformation output caused by type handling in the filter. Casting to an exact numeric type before applying the predicate makes the comparison numeric and reliable while keeping the Lakehouse output at the required precision. The core issue is type handling, not Spark capacity. CSV ingestion reads order_total as a string, so a predicate written against a string literal can be evaluated lexicographically instead of numerically. Create a typed intermediate column, cast it to DecimalType(18,2), and run the threshold filter and any revenue calculations against that typed column before writing the Lakehouse table. Decimal is preferred for currency-style values because it avoids binary floating-point rounding that can appear with double. Changing resource size or row order cannot make a string comparison semantically correct.
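The string-versus-numeric behavior is easy to demonstrate. This plain-Python sketch uses Decimal to stand in for Spark's DecimalType cast; the sample values are hypothetical.

```python
from decimal import Decimal

raw = ["9.00", "250.00", "1300.50"]

# Wrong: lexicographic comparison, "9.00" sorts after "250" because
# the character "9" sorts after "2", and "1300.50" sorts before it.
string_pass = [v for v in raw if v >= "250"]

# Correct: cast to an exact decimal type before filtering.
decimal_pass = [v for v in raw if Decimal(v) >= Decimal("250")]
```

The string filter admits 9.00 (too small) and rejects 1300.50 (large enough), the exact symptom QA reported; the decimal filter keeps precisely the orders of at least 250.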


Question 18

Topic: General

Your team is creating a Fabric workspace for finance data. New Fabric item data must reside in the organization’s EU Fabric capacity. Direct OneLake/ADLS Gen2 API readers must be limited to the Curated folder. External ADLS shortcut tables must remain shortcuts, but be efficient for SQL analytics endpoint and Direct Lake consumption. Which workspace configuration should you approve?

  • A. Use the EU capacity, OneLake data access roles, and Query acceleration.
  • B. Use the tenant home region, item permissions, and Query acceleration.
  • C. Use the EU capacity, Viewer access, and no Query acceleration.
  • D. Use the EU capacity, workspace Admin access, and nightly table copies.

Best answer: A

Explanation: The workspace should be assigned to the required Fabric capacity so OneLake storage aligns with the regional constraint. OneLake data access roles support folder-scoped direct access, and Query acceleration improves supported shortcut consumption for SQL analytics endpoint and Direct Lake scenarios. OneLake workspace configuration must satisfy storage placement, access scope, and consumption behavior together. Assigning the workspace to the EU Fabric capacity addresses where new Fabric item data is stored. OneLake data access roles are the appropriate control for limiting direct OneLake/ADLS Gen2 API access to a specific folder such as Curated. Query acceleration for OneLake shortcuts helps supported external shortcut data perform better for downstream SQL analytics endpoint and Direct Lake consumption without requiring the data to be copied into managed tables. Broad workspace roles or item permissions do not provide the same folder-scoped direct OneLake access control.


Question 19

Topic: General

An analytics team uses Dataflows Gen2 in a Fabric workspace to prepare finance data and load curated results to a Warehouse. Workspace policy now requires the managed data preparation refresh engine to use the workspace-level staging/storage configuration for intermediate results. You must apply the configuration once and keep the existing output destinations unchanged. Which action should you take?

  • A. Set the default Spark pool for the workspace.
  • B. Change each dataflow destination to a Lakehouse.
  • C. Configure the Dataflows Gen2 workspace staging/storage setting.
  • D. Create deployment pipeline rules for the Warehouse.

Best answer: C

Explanation: Dataflows Gen2 workspace settings are the correct place to configure managed data preparation behavior for dataflows in a workspace. This meets the requirement to apply the setting once while preserving the existing Warehouse destinations. For Dataflows Gen2, workspace-level settings control the managed data preparation configuration, such as the staging/storage behavior used during refresh. Because the requirement is about intermediate preparation storage and not the final data destination, the workspace Dataflows Gen2 setting is the right scope. Spark workspace settings apply to Spark jobs and notebooks, not Power Query-based Dataflows Gen2 refresh behavior. Changing dataflow destinations would alter where curated outputs are written, which violates the constraint. The key distinction is staging for managed refresh execution versus destinations for persisted outputs.


Question 20

Topic: General

A finance team uses a Fabric Warehouse. A pipeline currently copies dbo.Customer to a redacted table for analysts because permissions on one table produced unreliable access behavior. You need one physical table: Analysts can query only non-PII columns, and Stewards can query all columns. Some stewards are also in the Analysts group through Entra ID nesting.

Current permissions:

GRANT SELECT ON OBJECT::dbo.Customer TO Analysts;
DENY SELECT ON OBJECT::dbo.Customer (Email, Phone) TO Analysts;
GRANT SELECT ON OBJECT::dbo.Customer TO Stewards;

Assume no other warehouse permissions are assigned. A steward in both groups cannot query Email or Phone. What is the best improvement?

  • A. Replace Analysts access with GRANTs only on approved columns.
  • B. Add another PII-column GRANT for Stewards.
  • C. Mask PII columns for the Analysts group.
  • D. Grant SELECT on schema dbo to Stewards.

Best answer: A

Explanation: The unreliable behavior is caused by overlapping group membership combined with a column-level DENY. In T-SQL effective permissions, DENY overrides GRANT, so a steward who is also an analyst is blocked from the denied columns. Replacing the analyst DENY pattern with column-level allow-list GRANTs preserves the single-table design and restores reliable access. Fabric Warehouse uses T-SQL-style effective permissions for granular access control. A DENY on Email and Phone for the Analysts group still applies when the same user also belongs to Stewards, so the Stewards table-level GRANT cannot restore access. For overlapping groups, use an allow-list pattern for the lower-privilege group: remove the analyst table-level GRANT and the explicit column DENY, then grant Analysts SELECT only on approved non-PII columns. Stewards retain full table access. This keeps one physical table and avoids maintaining a duplicated redacted table.
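The DENY-overrides-GRANT rule can be sketched as a small permission-resolution model. This is plain Python simulating T-SQL effective-permission evaluation, not a real API; the group and column names beyond those in the question (CustomerId, Name) are illustrative assumptions.

```python
# Minimal model of T-SQL effective column-SELECT evaluation:
# a DENY from ANY group the user belongs to overrides every GRANT.
def can_select(user_groups, grants, denies, column):
    granted = any((g, column) in grants or (g, "*") in grants for g in user_groups)
    denied = any((g, column) in denies for g in user_groups)
    return granted and not denied

# Current pattern: table-level GRANTs ("*") plus a column DENY for Analysts.
grants = {("Analysts", "*"), ("Stewards", "*")}
denies = {("Analysts", "Email"), ("Analysts", "Phone")}

steward_in_both = ["Analysts", "Stewards"]
# The DENY wins even though Stewards has a table-level GRANT.
assert not can_select(steward_in_both, grants, denies, "Email")

# Allow-list pattern: Analysts get GRANTs only on approved non-PII
# columns (hypothetical names), and the DENY is removed entirely.
grants = {("Analysts", "CustomerId"), ("Analysts", "Name"), ("Stewards", "*")}
denies = set()
assert can_select(steward_in_both, grants, denies, "Email")   # via Stewards
assert not can_select(["Analysts"], grants, denies, "Email")  # never granted
```

The same logic explains why option B cannot work: adding another GRANT for Stewards changes nothing, because the existing DENY still outranks it.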


Question 21

Topic: General

You are building a PySpark notebook in Microsoft Fabric to transform raw order events from a Lakehouse bronze table into a silver Delta table. The analytics team requires one current row per OrderLineId, late-arriving corrections to update existing rows, and rejected rows to be traceable to the source file, load run, and failed rule. Rows missing CustomerId or ProductId must not appear in analytics until corrected.

Which implementation should you use?

  • A. Append all rows and filter duplicates in the semantic model
  • B. Drop null-key rows and overwrite the silver table daily
  • C. Deduplicate valid rows, quarantine invalid rows, and MERGE corrections
  • D. Load missing keys as zero values and aggregate by order date

Best answer: C

Explanation: The notebook must both protect analytics output and preserve data quality evidence. Deduplicating by OrderLineId, quarantining invalid rows with audit metadata, and using Delta MERGE for corrections satisfies correctness, traceability, and late-arriving data requirements. For this data quality pattern, the silver table should contain only analytically valid records, while invalid records are retained separately for remediation. A PySpark transformation can split rows into valid and invalid sets, write invalid rows to a data quality exceptions table with fields such as source file, load run, and failed rule, then deduplicate valid rows by OrderLineId. Delta MERGE updates existing silver rows when late-arriving corrections arrive and inserts new valid rows when no match exists. This prevents double counting without losing rejected-row lineage. The key takeaway is to handle quality failures explicitly, not hide them in reporting logic or destructive loads.
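The split/quarantine/merge flow can be sketched in a few lines. This is a plain-Python stand-in for the PySpark pattern, a dict upsert approximating Delta MERGE semantics; the field names, source file, load-run id, and rule label are illustrative assumptions, not values from the question.

```python
# Bronze rows as they might arrive: one late-arriving correction for
# OrderLineId 1 and one row with a missing key.
rows = [
    {"OrderLineId": 1, "CustomerId": "C1", "ProductId": "P1", "Amount": 10.0},
    {"OrderLineId": 1, "CustomerId": "C1", "ProductId": "P1", "Amount": 12.0},  # correction
    {"OrderLineId": 2, "CustomerId": None, "ProductId": "P2", "Amount": 5.0},   # invalid
]

def is_valid(r):
    return bool(r["CustomerId"]) and bool(r["ProductId"])

valid = [r for r in rows if is_valid(r)]

# Quarantine invalid rows with audit metadata so each rejection stays
# traceable to its source file, load run, and failed rule.
quarantine = [
    {**r, "source_file": "orders.csv", "load_run": "run-001",
     "failed_rule": "missing_key"}
    for r in rows if not is_valid(r)
]

# Deduplicate by OrderLineId keeping the latest arrival, then upsert into
# the silver table -- update on key match, insert when no match exists,
# which is what Delta MERGE does for late-arriving corrections.
silver = {}
for r in valid:
    silver[r["OrderLineId"]] = r
```

After the run, silver holds one current row per OrderLineId with the corrected amount, and the null-key row sits in the exceptions table instead of silently disappearing.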


Question 22

Topic: General

Your team owns a Fabric semantic model named SalesExec in a production workspace and has a nonproduction copy for testing. Stakeholders require the scheduled 6:00 AM refresh to complete by 6:45 AM, and the data engineering mailbox must receive an alert when a refresh fails. You must validate the monitoring setup by using semantic model refresh signals, not report usage. Which TWO actions should you take?

  • A. Use report usage metrics to verify morning page views.
  • B. Review SalesExec refresh history against the SLA.
  • C. Alert on capacity CU spikes during refresh windows.
  • D. Run a controlled failed refresh and confirm notification delivery.
  • E. Use Lakehouse table timestamps as completion evidence.
  • F. Check only the 6:00 AM pipeline trigger status.

Best answers: B and D

Explanation: The SLA is about semantic model refresh completion and refresh failure alerting. Refresh history provides the authoritative status and timing for the model refresh, while a controlled failed refresh validates that the configured alert reaches the required recipients. For semantic model refresh monitoring, validate the signals that directly represent the semantic model refresh run. The refresh history or Monitoring hub run details can confirm whether the scheduled refresh completed successfully and whether its end time met the 6:45 AM expectation. Alerting should be validated by testing the failure path in a safe nonproduction copy and confirming that the configured notification reaches the data engineering mailbox. Upstream data readiness, report usage, and capacity behavior can support troubleshooting, but they do not prove that the semantic model refresh met the stakeholder SLA or that failure notifications work.


Question 23

Topic: General

You support a Fabric workspace with a Lakehouse and the SalesModel semantic model. Stakeholders define an SLE: the daily semantic model refresh must complete within 45 minutes, and the data engineering team must be alerted when the refresh fails. You need to implement monitoring with minimal custom code and produce run-level evidence during daily operations. Which implementation should you use?

  • A. Subscribe stakeholders to the report at the expected completion time.
  • B. Monitor only the upstream pipeline run history.
  • C. Enable refresh failure notifications and use Monitoring hub refresh history.
  • D. Add a fixed Wait activity before sending a success notification.

Best answer: C

Explanation: Semantic model refresh monitoring should use the model’s refresh status and duration, not only upstream workload status. Built-in refresh failure notifications provide the required alerting, while Monitoring hub provides run-level evidence such as status, start time, end time, and duration. For a semantic model SLE, monitor the semantic model refresh job itself. Fabric Monitoring hub can be used to review refresh activity and confirm whether each run completed within the expected duration. Refresh failure notifications on the semantic model provide a low-code way to alert the data engineering team when the refresh fails. This validates both parts of the stakeholder expectation: alerting for failures and operational evidence for the 45-minute refresh SLE. Monitoring only the pipeline proves the upstream load ran, but it does not prove the semantic model refresh succeeded or met its duration target.


Question 24

Topic: General

An Eventstream es-telemetry reads device messages from Azure Event Hubs, applies a Manage fields operation, and writes the output to an Eventhouse table. You review the monitoring details:

Source: Azure Event Hubs
Status: Running
Input events/sec: 1,200

Operator: Manage fields
Status: Error
Last error: Cannot cast field temperature value 'N/A' to Real

Destination: Eventhouse table DeviceTelemetry
Status: Error
Last error: BadRequest_InvalidMapping: column temperature expects real

Which TWO error types do these symptoms identify?

  • A. A source connectivity error
  • B. A processing error in the operator
  • C. A destination mapping error
  • D. A Lakehouse write permission error
  • E. A semantic model refresh error
  • F. A OneLake shortcut resolution error

Best answers: B and C

Explanation: The monitoring details show errors in the Eventstream processing path and at the Eventhouse destination. The source is running and receiving events, so the listed symptoms do not indicate a source connectivity problem. Eventstream troubleshooting starts by locating where the failure is reported in the flow: source, operator, or destination. Here, the Azure Event Hubs source is running and has a positive input rate, so events are entering the stream. The Manage fields operator reports a failed conversion of temperature to Real, which identifies a processing error. The Eventhouse destination separately reports BadRequest_InvalidMapping, which identifies a destination ingestion or schema mapping error. The key takeaway is to match the symptom to the Eventstream component that reports it, rather than treating all failed delivery as a source issue.
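The operator's failure mode is an ordinary type-conversion error, which a minimal reproduction makes concrete. This plain-Python sketch mirrors the cast of temperature to Real; `cast_to_real` is a hypothetical helper, not an Eventstream API.

```python
# Casting the literal 'N/A' to a real number fails at the processing step,
# and any row that reaches the destination untyped then trips the
# real-typed column mapping -- neither symptom is a connectivity problem.
def cast_to_real(value):
    try:
        return float(value)
    except ValueError:
        return None  # processing error: value does not convert to Real

good = cast_to_real("21.5")
bad = cast_to_real("N/A")
```

A practical fix along these lines is to normalize or drop the 'N/A' sentinel in the Manage fields operation before the cast, so both the operator and the Eventhouse mapping see a clean real value.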

Revised on Sunday, April 26, 2026