
Free Microsoft DP-700 Full-Length Practice Exam: 50 Questions

Try 50 free Microsoft DP-700 questions across the exam domains, with explanations, then continue with full IT Mastery practice.

This free full-length Microsoft DP-700 practice exam includes 50 original IT Mastery questions across the exam domains.

These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.

Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.

Need concept review first? Read the Microsoft DP-700 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks and full IT Mastery practice.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

  • Try Microsoft DP-700 on Web
  • View full Microsoft DP-700 practice page

Exam snapshot

  • Exam route: Microsoft DP-700
  • Practice-set question count: 50
  • Time limit: 120 minutes
  • Practice style: mixed-domain diagnostic run with answer explanations

Full-length exam mix

Domain weights:

  • Implement and Manage an Analytics Solution: 34%
  • Ingest and Transform Data: 33%
  • Monitor and Optimize an Analytics Solution: 33%

Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.

Practice questions

Questions 1-25

Question 1

Topic: Ingest and Transform Data

A Fabric pipeline loads daily point-of-sale files to a Lakehouse and then merges rows into a Warehouse fact table. The merge failed with the error "Cannot insert NULL into column ProductKey". Profiling shows 42 of 80,000 rows have a blank ProductCode because new items are not yet in the product master. Finance says the sales amounts are valid, but product assignment must not be guessed.

What should you do?

Options:

  • A. Impute ProductKey by using the most frequent product

  • B. Route the rows for remediation and load valid rows

  • C. Filter out the rows with blank ProductCode permanently

  • D. Flag the rows and load them with NULL ProductKey

Best answer: B

Explanation: The missing ProductCode prevents a required ProductKey from being assigned, and the business explicitly prohibits guessing. The best fix is to separate those rows into a remediation path while continuing to load rows that pass validation.

Missing data handling depends on business meaning and downstream constraints. In this case, the sales rows are not invalid transactions, but they cannot be loaded into the fact table because the required product relationship is unresolved. Since finance does not allow estimation, the appropriate pattern is to route the affected rows to a quarantine or remediation table, notify the data owner, and reprocess them after the product master is corrected. This preserves valid sales data without violating the Warehouse constraint or corrupting product-level reporting. Filtering would lose real transactions, and imputation would create inaccurate product attribution.

  • Permanent filtering fails because the rows represent valid sales that must be preserved after correction.
  • Product imputation fails because guessing a product key would distort finance reporting.
  • Loading NULL keys fails because the Warehouse constraint already shows ProductKey is required.
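The routing pattern from option B can be sketched in plain Python (illustrative only, not Fabric pipeline code), assuming hypothetical row dictionaries with a product_code field:

```python
# Hypothetical sketch of a validate-and-route step: rows with a blank
# product code go to a quarantine set for remediation; the rest load.
rows = [
    {"transaction_id": 1, "product_code": "P-100", "amount": 19.99},
    {"transaction_id": 2, "product_code": "", "amount": 5.00},
    {"transaction_id": 3, "product_code": "P-200", "amount": 7.50},
]

def route(rows):
    valid, quarantine = [], []
    for row in rows:
        if row["product_code"].strip():
            valid.append(row)       # eligible for the fact-table merge
        else:
            quarantine.append(row)  # held for remediation, never guessed
    return valid, quarantine

valid, quarantine = route(rows)
```

In a real pipeline, the quarantine set would land in a remediation table with run metadata so the rows can be reprocessed after the product master is corrected.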

Question 2

Topic: Ingest and Transform Data

A Fabric workspace contains a raw Lakehouse table with customer attributes, including Email, Phone, and NationalId. Analysts need a downstream semantic model that uses CustomerKey, Segment, and Region, but governance requires that analysts cannot access raw identifiers or raw OneLake files. Access must be limited to the curated object used by the model.

Which transformation and access choice should you implement?

Options:

  • A. Expose a curated Warehouse view with SELECT-only permissions.

  • B. Grant workspace Viewer access and apply dynamic data masking.

  • C. Share the raw Lakehouse table with a sensitivity label.

  • D. Use region-based RLS on the raw semantic model.

Best answer: A

Explanation: A curated Warehouse view can project only the approved columns needed by the semantic model. Granting SELECT only on that object avoids exposing the raw Lakehouse item, files, or sensitive identifiers.

The core choice is to combine a maintainable transformation boundary with least-privilege access. A curated Warehouse view gives downstream consumers a stable schema such as CustomerKey, Segment, and Region without carrying raw identifiers forward. Granting analysts SELECT only on that view applies object-level access and avoids workspace-level access to the raw Lakehouse or OneLake files.

Dynamic data masking, labels, and RLS can be useful controls, but they do not replace removing sensitive columns from the curated consumption path when the requirement is no access to raw identifiers.

  • Workspace Viewer access over-grants access to workspace items and does not prevent exposure of raw OneLake content.
  • Sensitivity labels classify and help protect data but do not by themselves remove identifiers or enforce least-privilege query access.
  • Region-based RLS filters rows, not sensitive columns or raw file access, so it does not meet the identifier protection requirement.
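The column-projection idea behind the curated view can be sketched in plain Python (the real boundary would be a T-SQL view with SELECT-only grants; the field names here mirror the scenario):

```python
# Hypothetical sketch: the curated layer exposes only approved columns,
# so raw identifiers never reach downstream consumers.
APPROVED_COLUMNS = ("CustomerKey", "Segment", "Region")

def to_curated(raw_row):
    # Project only the approved columns; sensitive fields are dropped.
    return {col: raw_row[col] for col in APPROVED_COLUMNS}

raw = {
    "CustomerKey": 42, "Segment": "Retail", "Region": "West",
    "Email": "a@example.com", "Phone": "555-0100", "NationalId": "X1",
}
curated = to_curated(raw)
```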

Question 3

Topic: Monitor and Optimize an Analytics Solution

A Microsoft Fabric pipeline runs a PySpark notebook that incrementally loads sales orders into a Lakehouse table. The pipeline passes a valid watermark parameter, and the notebook activity starts successfully. The activity fails with this notebook error:

AnalysisException: cannot resolve 'LastModifiedDate'
Input columns: order_id, customer_id, updated_at, amount
Failing step: df.filter(col("LastModifiedDate") > watermark)

You must restore the incremental load without changing the pipeline schedule or switching to a full reload. What should you do?

Options:

  • A. Change the pipeline to a full truncate-and-load.

  • B. Change the pipeline watermark expression to LastModifiedDate.

  • C. Update the notebook to filter on updated_at.

  • D. Add a retry policy to the notebook activity.

Best answer: C

Explanation: This is a notebook code error, not an orchestration problem. The pipeline successfully starts the notebook and passes the parameter, but the PySpark code references LastModifiedDate, which is not present in the input schema.

Notebook error resolution depends on where the failure occurs. In this case, orchestration is working: the pipeline invokes the notebook and provides the watermark. The failure occurs inside the PySpark transformation because the incremental filter uses a missing column name. The fix is to correct the notebook logic so the filter uses the available timestamp column, updated_at, for the incremental load.

Changing scheduling, retry behavior, or the loading pattern would not address the unresolved column reference. The key takeaway is to fix code defects in the notebook when the pipeline successfully executes the notebook but the notebook fails during transformation logic.

  • Retry policy fails because retrying the same invalid PySpark code will produce the same unresolved column error.
  • Full reload fails because the requirement is to preserve the incremental loading pattern.
  • Pipeline expression change fails because the missing column is referenced inside the notebook DataFrame filter, not caused by a bad parameter value.
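In the notebook, the option C fix changes the filter column; in PySpark that would be `df.filter(col("updated_at") > watermark)`. A plain-Python stand-in of the same incremental filter, using hypothetical data:

```python
from datetime import datetime

orders = [
    {"order_id": 1, "updated_at": datetime(2026, 4, 1)},
    {"order_id": 2, "updated_at": datetime(2026, 4, 3)},
]
watermark = datetime(2026, 4, 2)

# Filter on the column that actually exists in the schema (updated_at),
# not the missing LastModifiedDate.
incremental = [o for o in orders if o["updated_at"] > watermark]
```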

Question 4

Topic: Ingest and Transform Data

A retail company stores minute-by-minute point-of-sale events as Delta files in cloud storage. The data is exposed in Fabric by using a OneLake shortcut, and analysts query it from Real-Time Intelligence with KQL. Queries over the shortcut are too slow, but the team must continue using the shortcut-backed data instead of building a separate ingestion pipeline.

Which loading pattern should you use?

Options:

  • A. Stream events into native Eventhouse tables

  • B. Enable Query acceleration for the shortcut

  • C. Replace the shortcut with Fabric mirroring

  • D. Create a scheduled full load pipeline

Best answer: B

Explanation: Query acceleration for OneLake shortcuts is the best fit when shortcut-backed data must remain the source for analytics but query performance needs to improve. It targets faster analytical access without changing the pattern to a separate batch or streaming ingestion path.

The core concept is choosing the loading pattern that matches shortcut-backed analytics. OneLake shortcuts let Fabric reference data without physically loading it into a Fabric table. When KQL or Real-Time Intelligence queries need better performance over that shortcut-backed data, Query acceleration is the Fabric capability intended to speed those queries while preserving the shortcut-based access pattern. A pipeline full load or Eventstream ingestion can create separate managed copies, but that changes the architecture and does not meet the requirement to keep querying the shortcut-backed data.

  • Full load pipeline fails because it creates a scheduled copied dataset instead of accelerating queries over the existing shortcut.
  • Fabric mirroring is intended for continuously replicated operational sources, not replacing a shortcut to Delta files.
  • Native Eventhouse streaming is useful for new streaming ingestion, but it does not satisfy the requirement to keep using shortcut-backed data.

Question 5

Topic: Monitor and Optimize an Analytics Solution

A Dataflow Gen2 named df_load_orders reads CSV files from a Lakehouse folder in a governed workspace. The connection uses the identity spn-fabric-df. Refresh fails with this output:

DataSource.Error: Access to the OneLake path was denied
Step: Source
Path: /Files/landing/orders/2026/04/orders.csv
Identity: spn-fabric-df

You must restore the refresh. The identity must access only the orders landing folder, not other folders in the Lakehouse. Which action should you take?

Options:

  • A. Grant OneLake read access only to /Files/landing/orders/.

  • B. Grant read access to all Lakehouse files.

  • C. Remove the sensitivity label from the source Lakehouse.

  • D. Add the identity as source workspace Admin.

Best answer: A

Explanation: The refresh output identifies a Dataflow Gen2 source access error, not a transformation logic error. The failing identity needs permission to the specific OneLake folder that contains the source files. Folder-scoped access restores the refresh without exposing unrelated Lakehouse data.

Read Dataflow Gen2 refresh output from the failing step and error text. Here, the Source step fails with Access to the OneLake path was denied for spn-fabric-df, which means the connection identity cannot read the source files. Because the requirement limits access to the orders landing folder, the governance-safe fix is to grant read access only at that folder scope by using OneLake security. Broader workspace or Lakehouse permissions might also remove the error, but they would violate least privilege. Removing labels does not address file access. The key takeaway is to remediate the permission shown in the refresh output at the narrowest required scope.

  • Workspace Admin overgrants management and data access beyond the single folder needed by the dataflow.
  • Sensitivity label removal weakens governance and does not grant the identity permission to read the files.
  • Read all files could bypass the access error but violates the stated orders-only access requirement.

Question 6

Topic: Implement and Manage an Analytics Solution

A finance workspace uses a Fabric pipeline to create and load tables in a Warehouse each night. A Microsoft Entra group has only the Warehouse Read item permission needed to connect, and must be able to query only the published rpt schema, not any stg objects. Because objects can be recreated during deployment, the control must be reapplied automatically. What should you add to the orchestration?

Options:

  • A. Use a notebook to move staging files to a private OneLake folder.

  • B. Trigger a semantic model refresh that uses row-level security.

  • C. Add a post-deployment pipeline Script activity for T-SQL grants.

  • D. Use Dataflows Gen2 to filter staging rows from the load.

Best answer: C

Explanation: The requirement is to control which Warehouse objects a group can query. A pipeline can orchestrate a post-deployment SQL permission step, while T-SQL object permissions implement schema or table access without changing the data.

In a Fabric Warehouse, object-level access is implemented by SQL permissions such as GRANT, DENY, and REVOKE on schemas, tables, or views. Because the pipeline creates or recreates objects during deployment, adding a post-deployment Script activity keeps the security step in the same batch orchestration and reapplies the intended permissions consistently. For this scenario, the script would grant access to the published reporting schema and avoid granting access to staging objects. Data transformation tools can prepare data, but they do not replace Warehouse object permissions.

  • Dataflow filtering changes rows during ingestion, but it does not enforce query permissions on Warehouse objects.
  • OneLake folders are a file or folder access pattern, not the best control for Warehouse schemas and tables.
  • Semantic model RLS can filter model queries, but it does not govern direct access to Warehouse objects.

Question 7

Topic: Implement and Manage an Analytics Solution

You are securing a Microsoft Fabric Warehouse that contains a Customer table with EmailAddress and TaxIdentifier columns. Business analysts must continue to query the same table for joins and aggregations, but they should see only masked values for those two columns. Members of the ComplianceAuditors group must see the original values. What should you implement?

Options:

  • A. Endorse the Warehouse as certified

  • B. Dynamic data masking with UNMASK for auditors

  • C. Deny analysts SELECT on the sensitive columns

  • D. A sensitivity label on the Warehouse item

Best answer: B

Explanation: Dynamic data masking is the Fabric Warehouse feature for controlled exposure of sensitive fields. It lets analysts query the same table while seeing masked column values, and users with permission can view the original values.

Dynamic data masking applies masking rules to sensitive columns and evaluates them at query time. In this scenario, analysts still need access to the same Customer table for joins and aggregations, so blocking column access would break the requirement. Applying masks to EmailAddress and TaxIdentifier, then granting unmasked access only to ComplianceAuditors, preserves usability while limiting exposure of sensitive data.

Sensitivity labels and endorsement help with governance, classification, and trust signals, but they do not change query results. The key takeaway is to use dynamic data masking when users can access data but should see protected column values in a masked form.

  • Sensitivity label classifies and protects content metadata, but it does not mask SQL query results.
  • Certified endorsement signals that an item is trusted, but it is not a data protection control.
  • Column denial prevents access to the fields instead of allowing masked exposure in the same table.
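As a rough behavioral analogue in plain Python (not the Warehouse feature itself), dynamic data masking returns transformed values at query time for non-privileged users, while users with UNMASK-equivalent rights see the originals:

```python
# Hypothetical sketch of query-time masking behavior.
def mask_email(value):
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain

def select_email(value, user_groups):
    # Auditors see the original value; everyone else sees the mask.
    if "ComplianceAuditors" in user_groups:
        return value
    return mask_email(value)

analyst_view = select_email("dana@contoso.com", {"Analysts"})
auditor_view = select_email("dana@contoso.com", {"ComplianceAuditors"})
```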

Question 8

Topic: Ingest and Transform Data

A retail company ingests POS transaction files into a Fabric Lakehouse every 15 minutes and publishes a sales fact table in a Warehouse. Duplicate transaction_id values must not double-count sales. Rows missing store_id or amount must be traceable to the source file and pipeline run. Corrections can arrive up to three days late and must update analytics without losing the original raw evidence.

Which design should you implement?

Options:

  • A. Drop null and duplicate rows in Dataflows Gen2 before loading

  • B. Append files directly to the Warehouse and filter duplicates in reports

  • C. Bronze Lakehouse landing, validation notebook, reject table, and Warehouse MERGE

  • D. Mirror the POS source and rely on semantic model refresh

Best answer: C

Explanation: A Fabric-native quality design should separate raw evidence from curated analytics data. Landing raw data first, validating it into accepted and rejected outputs, and using MERGE for the fact table supports correctness, traceability, and late-arriving corrections.

The core pattern is a Lakehouse medallion-style flow with explicit data quality handling. Raw POS files should be retained in a bronze table with metadata such as source file, load timestamp, and pipeline run ID. A validation step can write valid records to a curated table and invalid records to a reject or quarantine table with rule IDs and error reasons. The Warehouse fact table can then be loaded with MERGE keyed by transaction_id, optionally using update timestamps or sequence columns to apply late corrections safely.

This keeps analytics users on validated data while preserving an audit trail for remediation and reconciliation.

  • Report filtering fails because duplicate handling belongs in the curated load path, not as an inconsistent reporting workaround.
  • Dropping bad rows fails because it removes the traceability required for missing-field exceptions.
  • Mirroring alone fails because it replicates source changes but does not define validation, quarantine, or analytics-safe upsert logic.
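The upsert-with-late-corrections behavior of the Warehouse MERGE step can be sketched in plain Python, keyed by transaction_id and ordered by a hypothetical updated_at sequence value:

```python
# Hypothetical sketch: MERGE-like upsert keyed by transaction_id.
# A later update timestamp wins, so late corrections replace earlier
# rows and duplicates never double-count sales.
def merge_batch(fact, batch):
    for row in batch:
        key = row["transaction_id"]
        existing = fact.get(key)
        if existing is None or row["updated_at"] > existing["updated_at"]:
            fact[key] = row
    return fact

fact = {}
merge_batch(fact, [
    {"transaction_id": "T1", "amount": 10.0, "updated_at": 1},
    {"transaction_id": "T1", "amount": 10.0, "updated_at": 1},  # duplicate
])
merge_batch(fact, [
    {"transaction_id": "T1", "amount": 12.0, "updated_at": 2},  # late fix
])
```

The raw bronze copy is untouched by this step, which is what preserves the audit trail.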

Question 9

Topic: Implement and Manage an Analytics Solution

A Fabric pipeline performs incremental loads by landing source JSON files in a Lakehouse under /Files/landing/{source}/, then a notebook merges the data into Delta tables. A support contractor must inspect files only in /Files/landing/orders/ and must not access other landing folders. Which granular access-control layer should you use?

Options:

  • A. Folder/file-level access control on the OneLake path

  • B. Object-level security on the curated table

  • C. Column-level security on sensitive fields

  • D. Row-level security on the Delta table

Best answer: A

Explanation: The requirement is to control access to raw files in a specific OneLake folder during the loading process. Folder/file-level access control is the correct granular layer because it secures the physical path before the data is transformed into tables.

Granular access controls should match the protected resource. In this scenario, the contractor needs access to only one raw landing folder in OneLake, not to rows, columns, or whole curated tables. Folder/file-level access control lets you grant permission to /Files/landing/orders/ while preventing access to sibling folders such as other source landing paths. Row, column, and object controls are appropriate after data is exposed as tabular objects, but they do not satisfy a folder-scoped raw file requirement.

  • Row filtering fails because the requirement is about file paths, not table rows.
  • Column filtering fails because no specific table columns need to be hidden.
  • Object restriction fails because blocking or allowing an entire table does not isolate one landing folder.

Question 10

Topic: Ingest and Transform Data

A Fabric solution ingests point-of-sale events through Eventstreams into a Lakehouse. You need to publish 5-minute sales totals grouped by the event timestamp, not by ingestion time. Store devices can send events up to 15 minutes late. Totals must include those late events and then be finalized after the allowed lateness. Which process design should you use?

Options:

  • A. Run a Spark Structured Streaming notebook with event-time windows and a 15-minute watermark.

  • B. Run Dataflows Gen2 every 5 minutes and group by refresh start time.

  • C. Trigger a pipeline for each microbatch and aggregate by arrival time.

  • D. Publish Eventstreams aggregates immediately when each 5-minute window ends.

Best answer: A

Explanation: Late-arriving streaming events should be handled with event-time processing and a watermark. A Spark Structured Streaming notebook can keep 5-minute windows open for the 15-minute lateness allowance, update totals for qualifying late events, and then finalize the results.

The core concept is event-time windowing with watermarking. The event timestamp determines which 5-minute sales window an event belongs to, while the 15-minute watermark defines how long the engine should retain state for late events. In Fabric, a notebook using Spark Structured Streaming is a good fit for this stateful streaming logic because it can apply tumbling windows and a lateness policy before writing final aggregates.

Schedules and triggers can start work, but they do not by themselves solve late-event correctness. The key takeaway is to model lateness in the streaming computation, not by grouping on refresh or arrival time.

  • Refresh-time grouping fails because scheduled Dataflows Gen2 would group by processing time instead of the event timestamp.
  • Arrival-time aggregation fails because delayed store events would be counted in the wrong 5-minute window.
  • Immediate finalization fails because it provides no 15-minute allowance for late events before closing the window.
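The event-time windowing idea can be sketched in plain Python (the real implementation would use Spark Structured Streaming's watermark and tumbling-window functions). Timestamps here are hypothetical epoch seconds:

```python
WINDOW = 5 * 60      # 5-minute tumbling windows
LATENESS = 15 * 60   # allowed lateness before a window is finalized

def window_start(event_ts):
    # Assign the event to its 5-minute window by event time.
    return event_ts - (event_ts % WINDOW)

def aggregate(events):
    """Sum amounts per event-time window; drop events past the watermark."""
    totals, max_event_ts = {}, 0
    for ts, amount in events:          # events arrive in ingestion order
        max_event_ts = max(max_event_ts, ts)
        watermark = max_event_ts - LATENESS
        if ts >= watermark:            # late, but within the allowance
            w = window_start(ts)
            totals[w] = totals.get(w, 0.0) + amount
    return totals

# An on-time event, a later event, then a late event for the first window.
totals = aggregate([(600, 10.0), (1500, 5.0), (610, 2.0)])
```

Note that the late event at ts=610 still lands in the 600-second window because it arrives inside the 15-minute lateness allowance.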

Question 11

Topic: Implement and Manage an Analytics Solution

A production Fabric workspace contains a Lakehouse with finance tables. Pipelines and SQL queries work when users open the Lakehouse directly, and the Lakehouse already has the correct endorsement and sensitivity label. However, finance analysts cannot find it when browsing assets by the Finance domain, and the workspace currently shows no assigned domain.

Which improvement should you make?

Options:

  • A. Create a OneLake shortcut in a Finance workspace.

  • B. Rebuild the Lakehouse tables in a new Lakehouse.

  • C. Grant Viewer access to all finance analysts.

  • D. Assign the workspace to the Finance domain.

Best answer: D

Explanation: The issue is not data access or pipeline reliability; it is domain-based discoverability. Assigning the workspace to the correct Fabric domain fixes governance placement while preserving the existing Lakehouse, tables, labels, endorsements, and loads.

Fabric domains are used to organize workspaces and make data assets easier to govern and discover by business area. If a workspace is unassigned or assigned to the wrong domain, its Lakehouse and other items can still work technically, but they may not appear where users expect in domain-based browsing and governance workflows. Updating the workspace domain setting is the targeted fix because it changes the governance association of the workspace without moving data, changing item permissions, or rebuilding pipelines.

The key takeaway is to troubleshoot domain visibility at the workspace domain setting before changing the data architecture.

  • Shortcut workaround exposes another path to data but does not correct the source workspace’s domain governance setting.
  • Broader access changes permissions and does not address why the asset is missing from the Finance domain view.
  • Rebuilding tables is unnecessary because the Lakehouse and pipelines already work.

Question 12

Topic: Implement and Manage an Analytics Solution

Your team maintains a Fabric Warehouse in a finance workspace. Developers must collaborate on schema changes, require pull-request review before release, and must not be granted permissions that allow them to view or modify production data. Release managers can approve production changes. Which approach best satisfies the collaboration and release-management requirements without over-granting access?

Options:

  • A. Use a Warehouse database project in Git with release-manager-only production publishing.

  • B. Apply sensitivity labels and let developers deploy production DDL directly.

  • C. Connect the production workspace to Git and grant developers Contributor in production.

  • D. Use OneLake folder permissions and run production DDL from notebooks.

Best answer: A

Explanation: The key is to separate source-controlled collaboration from production deployment rights. A Warehouse database project in Git supports pull-request review for schema changes, and limiting production publishing to release managers avoids granting developers production data access.

For Fabric Warehouse schema lifecycle management, a database project can represent the schema as source-controlled artifacts that developers review through Git pull requests. Production release should be performed only by an authorized release manager or controlled release identity, so developers do not need production workspace roles or direct data access. Governance controls such as labels or file permissions do not replace a reviewed, repeatable schema release process. The important pattern is Git-based collaboration plus restricted production deployment authority.

  • Production Contributor access over-grants developers because Contributor can modify production items and may expose production data.
  • Sensitivity labels classify and help protect data, but they do not provide pull-request review or safe schema release control.
  • OneLake folder permissions are not the right control plane for managing Warehouse DDL releases and would bypass the database-project workflow.

Question 13

Topic: Ingest and Transform Data

A finance team lands validated ERP extracts into staging tables in a Microsoft Fabric Warehouse every night. The target tables are also in the same Warehouse. The transformation includes multi-step dimensional loads, surrogate key lookups, and conditional inserts and updates. The engineers who maintain the logic are SQL developers, and the batch must run once after the nightly load completes.

Which orchestration pattern should you recommend?

Options:

  • A. Schedule a PySpark notebook to transform staging data

  • B. Schedule a pipeline to run Warehouse T-SQL scripts

  • C. Build a Dataflow Gen2 with Power Query transformations

  • D. Create KQL functions in an Eventhouse

Best answer: B

Explanation: The best fit is to keep the transformations in the Warehouse and orchestrate them with a scheduled pipeline. T-SQL matches the target data store and the SQL-focused team, while the pipeline handles the nightly dependency after staging completes.

Transformation-tool choice in Fabric should align to where the data resides, the transformation style, latency needs, and the team’s skills. Here, both staging and target tables are in a Warehouse, the logic is relational and dimensional, and the maintainers are SQL developers. A Fabric pipeline can coordinate the batch sequence after the nightly load, while T-SQL performs the transformations in place without moving data to Spark or a real-time store.

Dataflows Gen2 is better for low-code Power Query-style transformations. Notebooks are better for Spark-oriented or code-heavy lakehouse processing. KQL is for Eventhouse and Real-Time Intelligence scenarios.

  • Low-code mismatch: Dataflows Gen2 is not the best fit for complex Warehouse-centered dimensional update logic maintained by SQL developers.
  • Spark overfit: A PySpark notebook adds Spark processing even though the data and target are already in the Warehouse.
  • Wrong store: KQL functions in an Eventhouse target real-time analytical data, not nightly Warehouse batch transformations.

Question 14

Topic: Implement and Manage an Analytics Solution

You are troubleshooting a Microsoft Fabric Warehouse named FinanceWH. The requirement is that members of PayrollReaders can query dbo.EmployeeCompensation but must not read the NationalId or BankAccount columns.

Users report that even this query fails with SELECT permission denied on object 'EmployeeCompensation':

SELECT EmployeeId, Department
FROM dbo.EmployeeCompensation;

The deployed permissions include:

GRANT SELECT ON SCHEMA::dbo TO PayrollReaders;
DENY SELECT ON OBJECT::dbo.EmployeeCompensation TO PayrollReaders;

What should you change?

Options:

  • A. Use column-level DENY for the sensitive columns.

  • B. Move the table files to a restricted OneLake folder.

  • C. Add row-level security for payroll rows.

  • D. Apply a sensitivity label to the two columns.

Best answer: A

Explanation: The symptom shows an object-level denial on the table, so all column selections are blocked. The requirement is column-specific protection, so the fix is to remove the table-level DENY and apply column-level permissions to NationalId and BankAccount.

Granular SQL permissions should match the data-protection requirement. Here, DENY SELECT ON OBJECT::dbo.EmployeeCompensation overrides the schema-level grant and prevents any query against the table, including queries that select only non-sensitive columns. To allow table access while blocking NationalId and BankAccount, use column-level access control, such as denying SELECT only on those columns or granting only approved columns.

Row-level controls filter records, not columns. Folder or file controls protect OneLake storage paths, but they are not the right layer for this Warehouse T-SQL column requirement.

  • Row filtering fails because the requirement is to hide specific columns, not restrict which employee rows are returned.
  • Folder security is the wrong layer for a Warehouse table query permission issue.
  • Sensitivity labeling helps classify and govern data but does not by itself replace column-level query access control.

Question 15

Topic: Monitor and Optimize an Analytics Solution

You are implementing a Microsoft Fabric pipeline that loads daily sales CSV files into a Lakehouse staging table. The Copy data activity appends rows with load_id = '20260424-01'. A later notebook merges staging rows into the curated table and updates the watermark only after the copy succeeds. The Copy data activity failed with a transient timeout after writing some staging rows. You must complete the load without duplicate rows. Which action should you take?

Options:

  • A. Clear the failed load_id, then rerun the Copy activity.

  • B. Rerun from the failed Copy activity immediately.

  • C. Advance the watermark and wait for the next run.

  • D. Rerun the entire pipeline from the beginning.

Best answer: A

Explanation: The failure occurred after a non-idempotent append wrote partial data. The safest rerun action is to remove the partial output for the failed load and then rerun the failed Copy activity so the source files are loaded once.

Pipeline reruns are safest when the failed step is made idempotent before retrying. In this case, the Copy activity appends to a staging table and already wrote some rows for the same load_id. Rerunning the Copy activity without cleanup can append the same rows again. Because the merge and watermark update have not run yet, clearing only the partial staging rows for the failed load_id preserves the successful pipeline state and lets the failed ingestion step run cleanly.

The key takeaway is to rerun at the narrowest safe point, but only after removing partial output from non-idempotent writes.

  • Full rerun risk can repeat earlier append work and create duplicates in staging.
  • Immediate failed-step rerun ignores the partial rows already written by the failed Copy activity.
  • Advancing the watermark can skip unloaded source rows and leave the curated table incomplete.
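The cleanup-then-rerun idea from option A can be sketched in plain Python, with staging as a hypothetical list of rows tagged by load_id:

```python
# Hypothetical sketch: make a non-idempotent append safe to rerun by
# clearing the partial rows for the failed load_id first.
def rerun_copy(staging, source_rows, load_id):
    cleaned = [r for r in staging if r["load_id"] != load_id]
    return cleaned + [dict(r, load_id=load_id) for r in source_rows]

staging = [
    {"order": "A", "load_id": "20260423-01"},  # earlier successful load
    {"order": "B", "load_id": "20260424-01"},  # partial row from failure
]
source = [{"order": "B"}, {"order": "C"}]
staging = rerun_copy(staging, source, "20260424-01")
```

After the rerun, order B appears exactly once even though it was partially written before the timeout.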

Question 16

Topic: Ingest and Transform Data

A Fabric pipeline loads dbo.DimProduct in a Warehouse from normalized ERP tables staged in a Lakehouse. The dimension grain is one row per ProductId, and the target must include brand, subcategory, and current primary category attributes for a star schema. The Load DimProduct T-SQL activity fails with this excerpt:

MERGE dbo.DimProduct AS t
USING vw_ProductSource AS s
ON t.ProductId = s.ProductId

Error: The MERGE statement attempted to UPDATE or DELETE the same row more than once.
Diagnostics: ProductId 1007 has 3 source rows after joining Product -> ProductCategoryBridge -> Category.

Which fix should you implement?

Options:

  • A. Load brand, subcategory, and category as separate dimensions.

  • B. Retry the activity after increasing Warehouse capacity.

  • C. Change the MERGE match to use ProductId and CategoryId.

  • D. Create a denormalized staging view with one current category row per product.

Best answer: D

Explanation: The failure shows multiple source rows for the same ProductId, but the dimension grain is one row per product. For a star schema, denormalizing the related brand, subcategory, and current primary category attributes into a single staging row supports a clean dimension load.

Denormalization supports dimensional model loading when normalized source attributes must become columns in a single dimension at a defined grain. Here, DimProduct is keyed by ProductId, but the source join through ProductCategoryBridge returns multiple rows per product. A denormalized staging view should join the related descriptive tables and filter or resolve the bridge so only the current primary category remains for each product. That gives the MERGE one source row per target business key and preserves the star-schema dimension design. Changing the match condition would change the grain instead of fixing the source shape.

  • Loading separate hierarchy dimensions snowflakes the product attributes and does not satisfy the stated star-schema target.
  • Matching on product and category hides the duplicate source issue by changing the dimension grain.
  • Increasing capacity can help resource pressure, but it cannot fix duplicate source keys in a MERGE.
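The guarantee the denormalized staging view must provide can be shown with a small sketch. This is a hypothetical model, not the actual view definition: the bridge field is_primary and the sample values are assumptions standing in for however the source marks the current primary category.

```python
# Resolve Product -> ProductCategoryBridge -> Category down to ONE row per
# ProductId by keeping only the current primary category, so a MERGE keyed
# on ProductId never sees duplicate source rows.

products = [{"ProductId": 1007, "Brand": "Contoso", "Subcategory": "Mugs"}]
bridge = [
    {"ProductId": 1007, "CategoryId": 10, "is_primary": False},
    {"ProductId": 1007, "CategoryId": 20, "is_primary": True},
    {"ProductId": 1007, "CategoryId": 30, "is_primary": False},
]
categories = {10: "Clearance", 20: "Kitchen", 30: "Seasonal"}

def product_source(products, bridge, categories):
    """Denormalize descriptive attributes into one row per product."""
    rows = []
    for p in products:
        primary = next(b for b in bridge
                       if b["ProductId"] == p["ProductId"] and b["is_primary"])
        rows.append({**p, "Category": categories[primary["CategoryId"]]})
    return rows

source = product_source(products, bridge, categories)
assert len(source) == 1                    # one source row per ProductId
assert source[0]["Category"] == "Kitchen"  # only the primary category survives
```

In T-SQL terms, the staging view would apply the same filter-or-resolve logic over the bridge before the MERGE reads it; the unique-per-key output is what makes the MERGE valid again.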

Question 17

Topic: Implement and Manage an Analytics Solution

A Fabric workspace contains several production PySpark notebooks that transform data in a Lakehouse each night. The solution works, but runs are inconsistent because all notebooks use the default Spark compute settings, and memory-intensive joins sometimes fail. You must improve reliability for all notebooks in the workspace without editing notebook code, changing pipeline parameters, or modifying item permissions.

What should you configure?

Options:

  • A. Workspace Spark settings

  • B. Pipeline activity parameters

  • C. Lakehouse item permissions

  • D. Notebook-level spark.conf statements

Best answer: A

Explanation: The requirement is to improve Spark reliability consistently across workspace notebooks without changing code or orchestration. Workspace Spark settings are the correct place to configure shared Spark compute behavior for notebooks in a Fabric workspace.

Fabric workspace Spark settings control shared Spark configuration such as the default Spark pool and related runtime settings for items in the workspace. When multiple notebooks have the same compute-related weakness, configuring the workspace-level Spark settings avoids duplicating settings in each notebook and preserves existing pipeline definitions and item permissions.

Notebook code is appropriate for job-specific logic or temporary session settings. Pipeline parameters are for orchestration values passed into activities. Item permissions control access, not Spark executor sizing or workspace compute defaults. The key distinction is that this is a workspace-level Spark configuration problem, not a code, parameterization, or security problem.

  • Notebook code fails the constraint because it requires editing each notebook and does not centrally govern workspace defaults.
  • Pipeline parameters are useful for orchestration inputs, but they do not replace workspace Spark compute configuration.
  • Item permissions affect who can access Lakehouse items, not Spark performance or executor reliability.

Question 18

Topic: Implement and Manage an Analytics Solution

You use a Fabric database project in Git to manage schema changes for a Fabric Warehouse named wh_sales. A pull request adds a new view, but the CI pipeline fails during the database project build before deployment.

SQL71501: View [dbo].[vwCustomerRevenue] has an unresolved reference to [dbo].[DimCustomer].
Build failed.

dbo.DimCustomer was created directly in the development Warehouse last week, but no file for it exists in the repository. What should you do?

Options:

  • A. Create DimCustomer manually in each target Warehouse.

  • B. Add DimCustomer to the database project and commit it.

  • C. Grant the CI identity Workspace Contributor access.

  • D. Create a OneLake shortcut named DimCustomer.

Best answer: B

Explanation: A Fabric database project treats the source-controlled project files as the schema model. The view references a table that exists only as a manual Warehouse change, so the build cannot resolve it. Add or extract the table definition into the project, commit it, and rerun the build.

Database project builds compile the schema model from the files in the project, not from objects that happen to exist in a Fabric Warehouse. Because dbo.DimCustomer was created directly in development and was not committed, the project model contains a view that points to an object outside the source-managed schema. The fix is to bring the Warehouse schema back under source control by adding or extracting the DimCustomer table definition into the database project, then committing and deploying the project.

Manual target changes can hide drift temporarily, but they do not make the database project build valid or maintain source control as the system of record.

  • Manual table creation fails because the build validates the project model before deployment, not the target Warehouse state.
  • Workspace permissions are not indicated because the error is an unresolved schema reference, not an authorization failure.
  • OneLake shortcut is unrelated because a shortcut does not add a Warehouse table definition to a database project.

Question 19

Topic: Monitor and Optimize an Analytics Solution

A Fabric data pipeline runs a PySpark notebook that is attached to the Lakehouse lh_curated as the default Lakehouse. The notebook must read source files by using /lakehouse/default/Files/source/orders/, and you cannot change the notebook code or copy the source data.

The run fails with this excerpt:

Path does not exist: /lakehouse/default/Files/source/orders/2026/04/*.parquet
Lakehouse Files view: Files/source/customers is a shortcut
Lakehouse Files view: Files/source/orders is not listed

Which implementation should you use?

Options:

  • A. Create a Warehouse table named orders.

  • B. Create a OneLake shortcut at Files/source/orders.

  • C. Grant workspace Admin to the pipeline owner.

  • D. Attach a second Lakehouse to the notebook.

Best answer: B

Explanation: The error indicates that the expected folder path does not exist in the default Lakehouse. Because the path should be provided by a OneLake shortcut and the shortcut is missing, creating the shortcut at the expected location satisfies the constraints without changing the notebook or duplicating data.

Fabric notebooks resolve /lakehouse/default/... paths against the Lakehouse attached as the default for the notebook. In this scenario, the code expects a folder under Files/source/orders, but the Lakehouse Files view shows no orders shortcut at that location. A OneLake shortcut behaves like a folder in the Lakehouse, so creating the missing shortcut with the expected name and location restores the path used by the notebook. This preserves the no-code-change and no-copy constraints. Creating a different item, attaching another Lakehouse, or changing broad workspace permissions does not make the specific /lakehouse/default/Files/source/orders/ path exist.

  • Warehouse table fails because the notebook is reading a Lakehouse file path, not querying a Warehouse table.
  • Second Lakehouse fails because /lakehouse/default still points to the existing default Lakehouse path.
  • Workspace Admin is excessive and does not create the missing Files/source/orders shortcut.
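Although the scenario forbids editing the notebook itself, the path-resolution behavior is easy to reason about with a small diagnostic you could run in a separate scratch notebook. Everything here is illustrative: the temporary directory stands in for the default Lakehouse root, and the helper name is hypothetical.

```python
# Verify that the folder a notebook expects under the default Lakehouse
# actually resolves, so a missing shortcut produces an actionable message
# instead of a generic "Path does not exist" failure.
import os
import tempfile

EXPECTED = "Files/source/orders"

def check_source_path(root, relative=EXPECTED):
    path = os.path.join(root, relative)
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"{path} is missing; create the OneLake shortcut at {relative} "
            "in the default Lakehouse before the pipeline runs.")
    return path

# Demo against a temporary directory standing in for /lakehouse/default.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, EXPECTED))  # the shortcut exists: check passes
    assert check_source_path(root).endswith("orders")
```

Once the shortcut is created at Files/source/orders, the unchanged notebook path /lakehouse/default/Files/source/orders/ resolves again, because the shortcut behaves like a folder at that location.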

Question 20

Topic: Implement and Manage an Analytics Solution

A team is configuring a Microsoft Fabric deployment pipeline for a lakehouse and warehouse solution. Promotion from Development to Test works, but configuring the Production stage fails with this activity output:

Deployment pipeline: RetailAnalytics
Development stage workspace: Retail-Dev
Test stage workspace: Retail-Test
Production stage workspace selected: Retail-Test
Result: Failed
Message: The workspace is already assigned to a deployment pipeline stage.

What is the best fix?

Options:

  • A. Assign a separate Retail-Prod workspace to Production.

  • B. Copy the deployed items manually to Retail-Test.

  • C. Leave Production unassigned and release from Test.

  • D. Reuse Retail-Test and add deployment rules.

Best answer: A

Explanation: The failure is caused by trying to assign the same workspace to more than one deployment pipeline stage. For a Development, Test, and Production promotion path, each stage should map to the appropriate separate workspace.

Fabric deployment pipelines promote items between stages that are backed by workspaces. A workspace cannot be reused as both the Test and Production stage workspace in the same promotion path. The evidence shows Retail-Test is already assigned to the Test stage, so selecting it again for Production causes the stage assignment failure. The appropriate fix is to assign an eligible, separate production workspace, such as Retail-Prod, to the Production stage and then promote from Test to Production.

Deployment rules can adjust stage-specific settings, but they do not allow one workspace to serve as multiple stages.

  • Deployment rules can help with environment-specific values, but they do not resolve a duplicate stage workspace assignment.
  • Skipping Production avoids the failed assignment but does not configure the required development, test, and production promotion path.
  • Manual copying bypasses lifecycle management and still leaves the deployment pipeline stages misconfigured.

Question 21

Topic: Ingest and Transform Data

You are designing a Fabric pipeline that loads a 4-TB Orders table from an operational database into a Lakehouse Delta table. The source receives frequent inserts and updates, exposes a reliable ModifiedAtUtc column, and does not need hard-delete propagation. Analysts need data within 30 minutes, and failed reruns must not duplicate rows or require reloading all history. Which loading pattern should you use?

Options:

  • A. Use a watermark-based incremental pipeline with MERGE and post-success watermark updates.

  • B. Append each 30-minute extract directly to the Lakehouse table.

  • C. Schedule a full truncate-and-reload pipeline every 30 minutes.

  • D. Create a OneLake shortcut and enable Query acceleration.

Best answer: A

Explanation: A watermark-based incremental load fits high volume, frequent changes, and a 30-minute freshness target. Using MERGE handles updates without creating duplicate rows, and updating the watermark only after a successful load supports safe recovery after failures.

For a large table with frequent inserts and updates, a full reload is inefficient and increases recovery risk. A Fabric pipeline can store the last successful ModifiedAtUtc watermark, extract rows changed since that value, stage them, and apply them to the Lakehouse Delta table with an upsert such as MERGE. The watermark should be advanced only after the target commit succeeds, so a failed run can be retried without skipping changes. This is the key distinction from a simple append pattern, which can duplicate updated or retried rows.

  • Full reload wastes resources on a 4-TB table and does not meet the recoverability goal efficiently.
  • Append-only extract fails because updates and retries can create duplicate or stale rows.
  • Shortcut acceleration improves querying shortcut data, but it is not a change-capture loading pattern for this requirement.
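The watermark pattern described above can be condensed into a minimal simulation. This is a sketch of the control flow, not PySpark or pipeline code: the dict keyed by OrderId stands in for the MERGE upsert, and the column names are taken from the scenario.

```python
# Watermark-based incremental load: extract rows changed since the last
# watermark, upsert them by business key, and advance the watermark only
# after the apply step has committed.

def run_incremental(source, target, watermark):
    changed = [r for r in source if r["ModifiedAtUtc"] > watermark]
    for row in changed:                 # upsert (MERGE stand-in): update or insert
        target[row["OrderId"]] = row
    if changed:                         # advance only after a successful commit
        watermark = max(r["ModifiedAtUtc"] for r in changed)
    return target, watermark

source = [
    {"OrderId": 1, "ModifiedAtUtc": 5, "Amount": 10},
    {"OrderId": 2, "ModifiedAtUtc": 7, "Amount": 20},
]
target, wm = run_incremental(source, {}, watermark=0)
assert wm == 7 and len(target) == 2

# A retry of the same extract is safe: the upsert overwrites rather than appends.
target, wm = run_incremental(source, target, watermark=0)
assert len(target) == 2
```

The retry at the end is the key distinction from option B: replaying the same extract through an upsert leaves two rows, whereas a plain append would have produced four.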

Question 22

Topic: Implement and Manage an Analytics Solution

A finance team runs multiple PySpark notebooks in a Fabric workspace. Security review found that developers sometimes pass passwords and tokens as Spark options during testing, and these values can appear in Spark UI or driver logs. You need to apply a workspace-wide control for new Spark sessions without changing data access permissions or masking analytical results.

What should you configure?

Options:

  • A. Dynamic data masking on Warehouse columns

  • B. A workspace-level Spark redaction property

  • C. A sensitivity label on the Lakehouse item

  • D. Viewer access for all notebook developers

Best answer: B

Explanation: The requirement is to protect secrets that may be exposed by Spark execution metadata, not to change table query results or item classification. A workspace-level Spark property applies consistently to new Spark sessions in the workspace and targets the logging exposure directly.

Fabric Spark workspace settings can define Spark properties that apply across Spark sessions in the workspace. For a governance requirement focused on accidental disclosure of passwords, tokens, or connection strings in Spark logs or the Spark UI, a redaction-related Spark configuration is the direct control. It reduces exposure in execution metadata while leaving lakehouse permissions and query results unchanged.

Data masking, sensitivity labels, and workspace roles solve different governance problems. The key takeaway is to use Spark workspace settings when the required control is about Spark session behavior across the workspace.

  • Data masking mismatch fails because dynamic data masking protects query output, not Spark driver logs or Spark UI metadata.
  • Classification only fails because a sensitivity label identifies or protects an item but does not redact Spark option values in logs.
  • Access overcorrection fails because changing developers to Viewer would block authoring and does not address secret redaction for Spark sessions.
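The behavior such a redaction property enables can be mimicked in a few lines. The regex below mirrors the default of Apache Spark's spark.redaction.regex configuration; treat the exact property name and default exposed in Fabric workspace Spark settings as something to confirm against current documentation.

```python
# Redact Spark-style option values whose KEY matches a sensitive pattern
# before they are logged or displayed, leaving ordinary settings untouched.
import re

SENSITIVE = re.compile(r"(?i)secret|password|token|access[.]key")

def redact(options):
    return {k: ("*********(redacted)" if SENSITIVE.search(k) else v)
            for k, v in options.items()}

opts = {"spark.sql.shuffle.partitions": "200",   # harmless tuning value
        "db.password": "hunter2",                 # matches "password"
        "myapp.apiToken": "tok-999"}              # matches "token"
safe = redact(opts)

assert safe["db.password"] != "hunter2"
assert safe["myapp.apiToken"] != "tok-999"
assert safe["spark.sql.shuffle.partitions"] == "200"
```

Note that redaction keys off the option name, not the value, which is why it protects logs and the Spark UI without masking analytical results or changing data access.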

Question 23

Topic: Ingest and Transform Data

A manufacturer sends machine telemetry to Microsoft Fabric. Operations needs dashboards and alerts updated within seconds. The raw payload includes operator IDs, and only the operations data engineering team may access raw events. Analysts may query only curated telemetry and must not receive workspace-wide access. Which design should you implement?

Options:

  • A. Use Eventstream to Eventhouse and item-level access to curated telemetry.

  • B. Use Eventstream to Lakehouse and workspace Contributor access for analysts.

  • C. Use a scheduled pipeline and workspace Viewer access to raw telemetry.

  • D. Use hourly Dataflow Gen2 refresh and shared raw Lakehouse files.

Best answer: A

Explanation: Second-level freshness requires a streaming ingestion pattern, not a scheduled batch load. Using Eventstream with an Eventhouse supports continuously arriving telemetry, while item-level access to curated data avoids giving analysts broad workspace or raw-event access.

Streaming ingestion is appropriate when events must be processed as they arrive, such as telemetry that drives near-real-time dashboards and alerts. Scheduled pipelines or Dataflow Gen2 refreshes are batch patterns, even if they run frequently, because they process data on a timer. The governance requirement also matters: analysts should receive access only to the curated Fabric item they need, not a workspace role or direct access to raw OneLake files containing operator IDs. The key distinction is continuous event processing plus least-privilege access to curated data.

  • Scheduled pipeline fails because it is timer-based batch ingestion and workspace Viewer access is broader than the analyst requirement.
  • Hourly Dataflow Gen2 fails because it cannot meet second-level freshness and exposes raw Lakehouse files.
  • Workspace Contributor fails because it grants analysts more permissions than needed, even though Eventstream is a streaming mechanism.

Question 24

Topic: Monitor and Optimize an Analytics Solution

A refresh-failure alert is raised for a Fabric semantic model used by Finance analysts. The alert details show that the refresh uses the connection identity svc-finance-refresh and failed with SELECT permission denied on object 'dbo.vw_FactPayrollCurated' in a Fabric Warehouse. Finance policy requires analysts to consume only the semantic model and not receive access to raw payroll tables.

Which operational response should you take?

Options:

  • A. Disable dynamic data masking on payroll columns.

  • B. Grant OneLake folder access to raw payroll files.

  • C. Add Finance analysts to the workspace Contributor role.

  • D. Grant SELECT on the curated view to the refresh identity.

Best answer: D

Explanation: The alert evidence points to an object-level permission failure for the semantic model refresh identity, not an analyst access problem. The least-privilege response is to grant only the required SELECT permission on the curated Warehouse view used by refresh.

For semantic model refresh incidents, use the alert details to identify which identity failed and at what layer. Here, the failing principal is the configured refresh identity, and the Warehouse error names a specific view. Granting object-level SELECT on that curated view restores the refresh path while preserving the governance boundary that analysts consume data through the semantic model only.

Workspace role changes, raw file access, or weakening masking controls would expand access beyond what the alert requires. The key takeaway is to remediate the failing operational identity at the narrowest security scope shown by the evidence.

  • Workspace overgrant fails because Contributor access is broader than the refresh identity needs and changes analyst permissions.
  • Weakened masking fails because the error is a SELECT permission denial, not a masking issue.
  • Raw file access fails because the source error is in the Warehouse view and raw payroll access violates the policy.

Question 25

Topic: Monitor and Optimize an Analytics Solution

A Fabric Lakehouse contains a Delta table named SalesFact. A pipeline appends data every 10 minutes, and each run creates many small files. SQL endpoint queries and PySpark transformations that scan SalesFact have become slow. You must improve read and transformation performance while keeping the table in the Lakehouse and preserving query semantics.

Which TWO actions should you take?

Options:

  • A. Create a OneLake shortcut to the same table.

  • B. Write or rewrite the table with V-Order enabled.

  • C. Run OPTIMIZE to compact the table files.

  • D. Run VACUUM with zero-hour retention after each load.

  • E. Enable dynamic data masking on filter columns.

  • F. Move the table to an Eventhouse and query it with KQL.

Correct answers: B and C

Explanation: Small-file compaction and V-Order are Lakehouse table optimization techniques. They improve how Fabric engines read Delta Parquet data without changing the logical table or moving the workload to another Fabric item.

For a Lakehouse Delta table with frequent appends, many small files can cause excessive file listing and scan overhead. Running table maintenance with OPTIMIZE compacts small files into larger files, which helps Spark transformations and SQL endpoint queries read the table more efficiently. V-Order improves the physical layout and encoding of Delta Parquet files for Fabric read engines, so rewriting or writing the table with V-Order enabled can further improve scan performance.

Storage cleanup, security masking, shortcuts, and changing to a real-time engine do not directly optimize the existing Lakehouse table for these workloads.

  • Vacuum cleanup removes obsolete files for storage management; it does not compact active small files and zero-hour retention is unsafe.
  • Shortcut indirection changes how data is referenced, not the physical layout of the native Delta table.
  • Masking controls protect sensitive values but do not improve file scan or transformation performance.
  • Eventhouse migration targets real-time KQL workloads and violates the requirement to keep the table in the Lakehouse.
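A toy model makes the compaction effect concrete. The bin-packing below is only an illustration of what OPTIMIZE-style maintenance achieves; the 4 MB input files and 128 MB target are assumed round numbers, not Fabric's actual file-size policy.

```python
# Compact many small files into a few near-target-size files: the same data
# volume survives, but a scan now opens far fewer files.

def compact(file_sizes_mb, target_mb=128):
    """Greedily pack small files into compacted files near the target size."""
    compacted, current = [], 0
    for size in sorted(file_sizes_mb):
        if current + size > target_mb and current > 0:
            compacted.append(current)
            current = 0
        current += size
    if current:
        compacted.append(current)
    return compacted

small_files = [4] * 96                     # 96 files of ~4 MB from frequent appends
big_files = compact(small_files)

assert sum(big_files) == sum(small_files)  # same data volume after compaction
assert len(big_files) == 3                 # 96 file opens shrink to 3 per scan
```

V-Order is complementary: compaction reduces how many files a scan opens, while V-Order improves how efficiently Fabric engines read the bytes inside each file.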

Questions 26-50

Question 26

Topic: Monitor and Optimize an Analytics Solution

A Fabric workspace has an hourly pipeline that loads three independent operational sources into Lakehouse staging tables and then runs a Warehouse stored procedure. Pipeline monitoring shows each copy activity takes about 12 minutes, the downstream procedure meets its SLA after staging is complete, and the current pipeline runs the three copy activities sequentially. You need to reduce total pipeline elapsed time without changing Lakehouse, Spark, or Warehouse logic. What should you do?

Options:

  • A. Run the independent copy activities in parallel in the pipeline.

  • B. Rewrite the Warehouse stored procedure as notebook code.

  • C. Increase the Spark pool size for the workspace.

  • D. Optimize the Lakehouse tables after each copy activity.

Best answer: A

Explanation: The bottleneck is the pipeline dependency design, not the Lakehouse, Spark, or Warehouse processing. Running independent ingestion activities concurrently reduces orchestration wait time while keeping the stored procedure dependent on all staging loads finishing successfully.

Pipeline optimization focuses on how activities are scheduled, parameterized, triggered, and coordinated. In this scenario, the measured activity durations are acceptable, but the pipeline forces three independent copy operations to run one after another. Removing unnecessary sequential dependencies and allowing parallel execution shortens the critical path without changing table layout, Spark configuration, or Warehouse SQL. The downstream stored procedure should still wait until all required staging loads succeed. The key takeaway is to optimize the orchestration layer when monitoring shows unnecessary pipeline waiting or serialization.

  • Lakehouse optimization misses the evidence that table processing is not the bottleneck.
  • Spark sizing does not address copy activities that are delayed by pipeline sequencing.
  • Notebook rewrite changes transformation implementation instead of fixing the orchestration dependency pattern.
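The elapsed-time gain is simple critical-path arithmetic. The sketch uses the scenario's approximate 12-minute copies; the 10-minute stored-procedure duration is an assumed illustrative value, since the scenario only says the procedure meets its SLA.

```python
# Sequential copies cost the SUM of their durations; parallel copies cost the
# MAX, because the stored procedure waits for all three either way.

copy_minutes = [12, 12, 12]
proc_minutes = 10   # assumption for illustration only

sequential_total = sum(copy_minutes) + proc_minutes   # 36 + 10
parallel_total = max(copy_minutes) + proc_minutes     # 12 + 10

assert sequential_total == 46
assert parallel_total == 22
```

The stored procedure's dependency on all three staging loads is preserved; only the unnecessary serialization among the independent copies is removed.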

Question 27

Topic: Monitor and Optimize an Analytics Solution

A Fabric pipeline runs a PySpark notebook that transforms raw sales files in a Lakehouse into a curated Delta table used by a Warehouse load. After fixing a notebook error caused by changed source columns, the team must validate both transformation correctness and operational reliability. The fix must prevent bad curated data from being published and must leave run-level evidence for support.

Which design is the best fit?

Options:

  • A. Replace the notebook with Dataflows Gen2 and sample rows manually.

  • B. Validate totals only after the Warehouse load completes.

  • C. Overwrite the curated table, then review Spark logs manually.

  • D. Write to staging, run notebook assertions, then promote only on pass.

Best answer: D

Explanation: The safest validation pattern is to separate transformation output from publication. A notebook can compute data quality checks, write metrics, and raise errors so the pipeline stops before curated data is promoted.

For a repaired Fabric notebook, validation should confirm the transformation result and the operational behavior of the job. Writing to a staging Delta table lets the notebook run row-count, schema, duplicate-key, null, and aggregate checks against the fixed output. If a check fails, the notebook should raise an exception and write validation metrics or status to a Lakehouse table so the pipeline run shows a failed state with support evidence. Only after the checks pass should the pipeline promote or merge the staged data into the curated table used by downstream Warehouse loads.

The key takeaway is to validate before publication and make failure observable through the pipeline, not by manual review after downstream consumers are affected.

  • Manual log review does not prove transformation correctness and is not a reliable control before publication.
  • Tool replacement is unnecessary; the issue is validating a fixed notebook, not choosing a different transformation engine.
  • Late validation allows bad data to reach the Warehouse before checks detect the issue.
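The stage-validate-promote flow can be condensed into a short sketch. The check names, the metrics record, and the in-memory tables are all illustrative stand-ins for Lakehouse staging and curated Delta tables.

```python
# Run data quality checks against staged output, record run-level evidence,
# and promote to the curated table only when every check passes.

def validate(staged):
    return {
        "row_count_positive": len(staged) > 0,
        "no_null_keys": all(r.get("OrderId") is not None for r in staged),
        "no_duplicate_keys": len({r["OrderId"] for r in staged}) == len(staged),
    }

def run(staged, curated, metrics_log):
    checks = validate(staged)
    metrics_log.append(checks)          # run-level evidence for support
    if not all(checks.values()):
        raise ValueError(f"validation failed: {checks}")  # pipeline run fails
    curated.extend(staged)              # promote only on pass
    return curated

metrics = []
curated = run([{"OrderId": 1}, {"OrderId": 2}], [], metrics)
assert len(curated) == 2

try:
    run([{"OrderId": 1}, {"OrderId": 1}], curated, metrics)  # duplicate key
except ValueError:
    pass
assert len(curated) == 2   # the bad batch was never promoted
```

In a real notebook the raised exception fails the pipeline run, and the persisted metrics table gives support the evidence trail the scenario requires.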

Question 28

Topic: Monitor and Optimize an Analytics Solution

A Fabric solution streams telemetry through an Eventstream into a native Eventhouse table named Telemetry. Monitoring shows balanced Eventstream input and output rates, normal ingestion latency, and no backpressure. A KQL dashboard tile is slow because its query scans 90 days of data, although the tile only needs the last 24 hours. You need to reduce tile latency while keeping all raw events available for historical queries. What should you implement?

Options:

  • A. Enable Query acceleration for OneLake shortcuts.

  • B. Add an early KQL time filter before aggregation.

  • C. Filter out older events in the Eventstream destination.

  • D. Increase Spark executor memory for the workspace.

Best answer: B

Explanation: The evidence points to an Eventhouse query scan problem, not a streaming or Spark problem. Adding a selective KQL where filter on the event time before aggregation limits the data scanned for the dashboard while preserving all raw events in the Eventhouse.

Eventhouse performance issues should be addressed in the engine or item where the evidence occurs. Here, the Eventstream is healthy and the slow workload is a KQL dashboard query that scans far more history than required. The targeted implementation is to filter the native Eventhouse table early, such as applying where EventTime >= ago(24h) before summarize. This lets the KQL engine prune data and aggregate only the required time window. Changing Spark settings or OneLake shortcut behavior does not address a native Eventhouse query, and filtering in the Eventstream would remove data needed for historical analysis.

  • Spark tuning fails because the slow operation is a KQL query in Eventhouse, not a Spark job.
  • Shortcut acceleration fails because the source is a native Eventhouse table, not a OneLake shortcut query path.
  • Eventstream filtering fails because it would stop retaining older raw events required for historical queries.
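The effect of filtering before aggregating can be shown with a tiny model. Row counts here are illustrative; in the KQL engine the early where clause also enables extent-level pruning, which this sketch does not capture.

```python
# Pruning to the 24-hour window BEFORE aggregating touches 24 rows instead
# of the full 90-day history of 2,160 hourly rows.

events = [{"hour_age": h, "value": 1} for h in range(90 * 24)]  # 90 days, hourly

window = [e for e in events if e["hour_age"] < 24]   # where EventTime >= ago(24h)
tile_total = sum(e["value"] for e in window)         # then summarize

assert len(window) == 24      # scanned rows: 24 instead of 2160
assert tile_total == 24
```

Because the filter lives in the dashboard query rather than the Eventstream, all 90 days of raw events remain in the Eventhouse for historical analysis.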

Question 29

Topic: Implement and Manage an Analytics Solution

You support a Fabric workspace that contains several Dataflows Gen2. After a cleanup, every Dataflow Gen2 refresh fails before reading the source. Pipelines and notebooks in the workspace still run.

Workspace settings show:

  • Dataflows Gen2 staging lakehouse: dfg2_staging_old
  • dfg2_staging_old: deleted during cleanup
  • Available item: lh_finance_staging in the same workspace

You need to restore Dataflows Gen2 refresh behavior without changing individual dataflows. What should you do?

Options:

  • A. Regenerate source connection credentials for each dataflow.

  • B. Create a OneLake shortcut named dfg2_staging_old.

  • C. Increase the workspace Spark executor size.

  • D. Set the Dataflows Gen2 staging lakehouse to lh_finance_staging.

Best answer: D

Explanation: Dataflows Gen2 refreshes depend on the workspace-level staging configuration. Because the configured staging lakehouse was deleted, the fix is to update the Dataflows Gen2 workspace setting to an existing lakehouse in the same workspace.

The core issue is not source connectivity, Spark capacity, or OneLake file access. The exhibit shows that all Dataflows Gen2 fail before source reads and that the workspace-level staging lakehouse setting points to a deleted Fabric item. Updating the Dataflows Gen2 workspace setting to use an available staging lakehouse restores the shared staging dependency without editing each dataflow.

The key takeaway is to troubleshoot shared Dataflows Gen2 settings first when multiple dataflows fail with the same pre-source refresh behavior.

  • Spark sizing does not address a deleted Dataflows Gen2 staging item.
  • Source credentials are unlikely because refresh fails before reading sources and affects all dataflows.
  • OneLake shortcut does not replace the required Fabric staging lakehouse item used by Dataflows Gen2.

Question 30

Topic: Implement and Manage an Analytics Solution

A Fabric workspace contains PySpark notebooks that are run by pipelines. The notebooks inherit the workspace default Spark pool and cannot be edited before the next run. After an admin changed the workspace default from SalesETLPool (custom, medium nodes, autoscale enabled) to the starter pool, a large join fails with ExecutorLostFailure and memory pressure.

You need to fix the processing issue by using Spark workspace settings and without changing notebook code. Which two actions could meet the requirement? Select TWO.

Options:

  • A. Add spark.executor.memory in each notebook.

  • B. Increase the pipeline activity timeout.

  • C. Enable Query acceleration for OneLake shortcuts.

  • D. Set a Warehouse as the default SQL endpoint.

  • E. Create a right-sized custom pool and set it as default.

  • F. Set the default Spark pool back to SalesETLPool.

Correct answers: E and F

Explanation: The failure started after the workspace default Spark pool was changed to a starter pool. Because the notebooks inherit the workspace default and cannot be edited, the fix should change the workspace Spark pool setting to an appropriately sized custom pool.

Fabric Spark workspace settings control the default compute used by notebooks that do not specify their own Spark pool. In this scenario, the processing issue is tied to an inappropriate default pool: the starter pool is not sized for the large join, causing executor loss and memory pressure. Restoring the previous custom pool or making another right-sized custom pool the workspace default fixes the inherited compute configuration without changing notebook code.

Changing pipeline, Warehouse, or OneLake shortcut settings does not change the Spark executors used by these PySpark notebooks. Notebook-level Spark configuration also violates the stated no-code-change constraint.

  • Pipeline timeout does not address executor loss caused by insufficient Spark resources.
  • Warehouse endpoint affects SQL workloads, not PySpark notebook Spark sessions.
  • Query acceleration targets OneLake shortcut query performance, not Spark pool memory.
  • Notebook configuration could tune Spark, but it requires editing notebooks and is not a workspace setting.

Question 31

Topic: Monitor and Optimize an Analytics Solution

A Fabric pipeline SQL activity runs a T-SQL statement against the SQL analytics endpoint of a lakehouse. The query succeeds for native lakehouse tables but fails for dbo.SalesOrders, which is a OneLake shortcut to an ADLS Gen2 Delta table.

Statement: SELECT COUNT(*) FROM dbo.SalesOrders;
Error: T-SQL query failed during data scan.
Detail: OneLake shortcut target returned 403 Forbidden.
Shortcut status: Authentication failed for connection adls-erp-prod.

What is the best fix?

Options:

  • A. Repair the shortcut connection permissions

  • B. Rewrite the query by using fully qualified names

  • C. Grant the pipeline identity Warehouse Contributor

  • D. Recreate the pipeline SQL activity

Best answer: A

Explanation: The failure occurs during the scan of a OneLake shortcut, not during T-SQL parsing or object resolution. The 403 Forbidden and shortcut authentication status point to the shortcut connection or external storage permissions as the root cause.

A T-SQL activity can surface an error even when the underlying problem is a OneLake shortcut access failure. Here, the same endpoint can query native lakehouse tables, and the statement is a simple valid SELECT. The decisive evidence is the 403 response from the shortcut target and the failed authentication status for the shortcut connection. The fix is to reauthorize or update the shortcut connection and ensure the connection identity has the required read access to the ADLS Gen2 Delta location.

Changing the SQL text would not repair an external storage authorization failure.

  • Fully qualified names do not address a 403 response from the shortcut target.
  • Pipeline recreation does not change the shortcut connection credentials or target storage ACLs.
  • Warehouse Contributor is not the relevant fix because the failure is at the OneLake shortcut target, not warehouse administration.

Question 32

Topic: Monitor and Optimize an Analytics Solution

You manage a Microsoft Fabric Warehouse for sales analytics. The main workload is T-SQL: star-schema queries join FactSales to DimCustomer, DimProduct, and DimDate, and nightly stored procedures run similar joins and filters. Query insights show poor cardinality estimates after each large load. A staging validation job already verifies that dimension keys are unique and fact foreign keys are valid. You must optimize the Warehouse without moving data to another Fabric workload. Which TWO actions should you recommend?

Options:

  • A. Define NOT ENFORCED key constraints on validated relationships.

  • B. Configure hash distribution for facts and replicated distribution for dimensions.

  • C. Replace Warehouse native tables with OneLake shortcuts to the source system.

  • D. Move the tables to an Eventhouse and rewrite queries in KQL.

  • E. Update statistics on join and filter columns after large loads.

  • F. Create clustered columnstore and nonclustered indexes on the tables.

Correct answers: A and E

Explanation: Fabric Warehouse optimization for relational workloads depends on useful metadata and accurate statistics. For validated star-schema data, key constraints describe relationships, while updated statistics improve row-count estimates after large loads. These actions keep the workload in the Warehouse and support T-SQL query and transformation patterns.

For a join-heavy Fabric Warehouse workload, the optimizer benefits from accurate column statistics and trustworthy relational metadata. Updating statistics after large data changes helps the optimizer estimate selectivity for joins, filters, and aggregations. Because the staging process already validates uniqueness and referential integrity, NOT ENFORCED primary and foreign key constraints can safely describe the star-schema relationships without relying on Fabric to enforce them. Dedicated SQL pool physical-design choices, such as table distributions and traditional index strategies, are not the right tuning mechanism for this Fabric Warehouse scenario.

  • Index tuning fails because clustered columnstore and nonclustered index design is not the supported Fabric Warehouse tuning path here.
  • Distribution design fails because hash and replicated distribution settings are dedicated SQL pool-style controls, not Fabric Warehouse table design options.
  • Eventhouse rewrite fails because it changes the workload to KQL instead of optimizing the existing relational T-SQL Warehouse.
  • Source shortcuts fail because bypassing native Warehouse tables does not address the stated Warehouse query and transformation optimization need.

Question 33

Topic: Monitor and Optimize an Analytics Solution

A Fabric workspace uses a Dataflow Gen2 item to load daily sales files into a Lakehouse table. Refreshes succeed, but monthly totals are too low. Monitoring shows the source recently changed Amount from numeric to text, with values such as 1,250.00 and N/A. The dataflow currently filters Amount > 0 and then groups by month. You must keep the low-code Dataflow Gen2 process. What should you change?

Options:

  • A. Convert Amount with the correct locale before filtering and grouping

  • B. Create a OneLake shortcut to the source files

  • C. Add a pipeline retry policy before the dataflow refresh

  • D. Aggregate the Lakehouse table in a notebook after loading

Best answer: A

Explanation: The issue is not a refresh failure; it is incorrect transformation logic caused by type conversion order. In Dataflow Gen2, the Amount column should be converted using the correct locale and error-handling rules before filters and aggregations use it.

Dataflow Gen2 transformations are applied in step order. If a text column is filtered or grouped before it is converted to a numeric type, values can be excluded, treated as errors, or aggregated incorrectly. The Fabric-native fix is to edit the applied steps so Amount is converted with the correct locale, handle N/A values intentionally, and then apply the Amount > 0 filter and monthly grouping. This keeps the low-code process and fixes the root cause in the dataflow instead of masking it downstream.

  • Retrying refreshes does not help because the refresh succeeds and the incorrect output is deterministic.
  • Using a shortcut changes access to files but does not correct transformation step order or data types.
  • Post-load notebook aggregation adds another processing layer and leaves the Dataflow Gen2 output incorrect.
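
The conversion-order fix can be sketched in plain Python. This is a minimal illustration, assuming en-US formatting ("," as the thousands separator); the column names and rows are invented for the sketch, and the real fix is made in the Dataflow Gen2 applied steps, not in code.

```python
# Minimal sketch of "convert first, then filter and group", assuming
# en-US formatting. Rows and names are illustrative, not real data.

def to_amount(text):
    """Convert a text Amount to a float; treat 'N/A' as missing (None)."""
    if text.strip().upper() == "N/A":
        return None
    return float(text.replace(",", ""))  # drop en-US thousands separators

rows = [
    {"month": "2024-05", "amount_text": "1,250.00"},
    {"month": "2024-05", "amount_text": "N/A"},
    {"month": "2024-05", "amount_text": "310.50"},
]

# Correct order: convert first, then apply Amount > 0, then group by month.
totals = {}
for r in rows:
    amount = to_amount(r["amount_text"])
    if amount is not None and amount > 0:
        totals[r["month"]] = totals.get(r["month"], 0.0) + amount

print(totals)  # {'2024-05': 1560.5}
```

If the text column were filtered before conversion, values such as "1,250.00" could be excluded or turned into errors, which is exactly how the monthly totals end up too low.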

Question 34

Topic: Monitor and Optimize an Analytics Solution

A Microsoft Fabric pipeline ingests CRM files into a Lakehouse and then runs a notebook to merge the data into curated Delta tables. The operations team treats a run as unhealthy if any ingestion or transformation activity fails, row counts differ during copy, or the run ends after the 03:30 UTC SLA.

Review the run history output:

Pipeline: DailyCRMIngest
Trigger time: 02:00 UTC
Run start: 02:01 UTC
Run end: 03:47 UTC
Pipeline status: Completed

Activities:
Get_New_Files       Succeeded   filesFound=8
Copy_to_Lakehouse   Succeeded   rowsRead=487,920 rowsWritten=487,920
Validate_Bronze     Succeeded   invalidRows=0
Merge_Silver        Failed      error=ConcurrentAppendException

Which TWO observations identify failed or delayed processing? Select TWO.

Options:

  • A. Merge_Silver has status Failed.

  • B. Get_New_Files found 8 files.

  • C. The pipeline status is Completed.

  • D. Validate_Bronze reported invalidRows=0.

  • E. The run ended after the 03:30 UTC SLA.

  • F. Copy_to_Lakehouse read and wrote 487,920 rows.

Correct answers: A and E

Explanation: The activity-level output shows a failed transformation, and the run timestamps show that the pipeline missed its SLA. In Fabric monitoring, both activity status and run duration are important because an overall Completed status might not prove that every processing step met operational requirements.

Use Fabric run history and activity output to compare the actual run against the operational success criteria. Here, Merge_Silver is a transformation activity and its status is Failed, so it directly identifies failed processing. The run ended at 03:47 UTC, which is later than the 03:30 UTC SLA, so it also identifies delayed processing. Matching copy row counts and zero invalid rows are healthy signals for those steps, not evidence of failure. The key takeaway is to inspect both timestamps and activity-level output, not only the overall pipeline status.

  • Copy row counts do not indicate a problem because rows read and rows written are equal.
  • Validation output is healthy because invalidRows=0 shows no bronze validation failures.
  • File discovery count is not enough to prove incompleteness because no expected file count failure is shown.
  • Overall completion does not override the failed activity shown in the detailed run output.
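
The two health checks applied to this run can be sketched as follows. The field names mirror the run-history output above but are illustrative, not a Fabric monitoring API.

```python
# Sketch of the two checks: any failed activity, and run end past the SLA.
# Dictionary shape is invented for the sketch, not a real Fabric API.
from datetime import time

run = {
    "run_end": time(3, 47),  # 03:47 UTC
    "activities": [
        {"name": "Get_New_Files", "status": "Succeeded"},
        {"name": "Copy_to_Lakehouse", "status": "Succeeded"},
        {"name": "Validate_Bronze", "status": "Succeeded"},
        {"name": "Merge_Silver", "status": "Failed"},
    ],
}
SLA = time(3, 30)  # 03:30 UTC

failed = [a["name"] for a in run["activities"] if a["status"] == "Failed"]
missed_sla = run["run_end"] > SLA

print(failed, missed_sla)  # ['Merge_Silver'] True
```

Note that neither check looks at the overall pipeline status, which is Completed here despite the failed activity and the missed SLA.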

Question 35

Topic: Ingest and Transform Data

A retail team loads data nightly from an ERP system into Microsoft Fabric. The source is normalized into Orders, OrderLines, Customers, and Products. Analysts need a single queryable table at order-line grain that includes customer segment and product category, so their queries do not require repeated joins. The source tables include ModifiedDate columns. Which loading pattern should you implement?

Options:

  • A. Incrementally MERGE into a denormalized gold Lakehouse table

  • B. Mirror the ERP database into Fabric without transformation

  • C. Create OneLake shortcuts to the four source tables

  • D. Load each source table fully into separate Warehouse tables

Best answer: A

Explanation: The requirement is not only to ingest the ERP data, but to simplify analytics queries. An incremental load into a denormalized gold table keeps the table current and removes the need for analysts to join the same normalized entities repeatedly.

Denormalization for analytics commonly materializes a curated table at the query grain, such as order-line grain, with frequently used descriptive attributes included. Because the source has ModifiedDate columns, the load can detect changed rows and use an incremental upsert, such as a MERGE, into a Lakehouse Delta table. This supports efficient refresh while producing a simplified table for downstream SQL, Spark, or semantic model use. Shortcuts, mirroring, or raw full loads can expose source data in Fabric, but they preserve the normalized query pattern unless an additional transformation creates the denormalized table.

  • Shortcuts only expose existing tables through OneLake but do not reshape normalized data into a single analytics table.
  • Mirroring only keeps the operational schema synchronized but does not create the required order-line wide table.
  • Separate full loads duplicate the normalized structure and increase refresh work without simplifying analyst queries.
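
The incremental MERGE pattern can be sketched with a small upsert. The key name, watermark value, and rows are assumptions for the sketch; in Fabric this would be a Delta `MERGE` in a notebook or a T-SQL `MERGE`, driven by the source ModifiedDate columns.

```python
# Illustrative upsert matching the incremental MERGE pattern: select source
# rows changed since the last load, then update-or-insert by key.
# Names (order_line_id, watermark) are invented for the sketch.

target = {  # curated gold table keyed by order_line_id
    1: {"qty": 2, "modified": "2024-05-01"},
    2: {"qty": 1, "modified": "2024-05-01"},
}
source = [
    {"order_line_id": 2, "qty": 3, "modified": "2024-05-02"},  # changed row
    {"order_line_id": 3, "qty": 5, "modified": "2024-05-02"},  # new row
]
watermark = "2024-05-01"  # high-water mark from the previous run

for row in source:
    if row["modified"] > watermark:          # incremental filter on ModifiedDate
        target[row["order_line_id"]] = {     # MERGE semantics: update or insert
            "qty": row["qty"], "modified": row["modified"]}

print(sorted(target))  # [1, 2, 3]
```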

Question 36

Topic: Monitor and Optimize an Analytics Solution

A Fabric pipeline loads daily sales data. A Dataflow Gen2 activity reads a Lakehouse table that is created by a notebook, and a Warehouse stored procedure runs after the dataflow.

The latest run shows:

08:00:00 Dataflow_Load started
08:00:02 Notebook_CreateStage started
08:00:04 Dataflow_Load failed: table stg_sales not found
08:04:10 Notebook_CreateStage succeeded

On the pipeline canvas, both activities are connected directly from the start. What should you do next?

Options:

  • A. Pass stg_sales as a pipeline parameter to the dataflow.

  • B. Configure an on-success dependency from Notebook_CreateStage to Dataflow_Load.

  • C. Add a schedule offset so Dataflow_Load starts five minutes later.

  • D. Replace the Dataflow Gen2 activity with an Eventstream.

Best answer: B

Explanation: The failure is caused by an incorrect activity order, not by the dataflow logic itself. The Dataflow Gen2 activity starts before the notebook creates the required Lakehouse table, so the pipeline needs an explicit success dependency.

Fabric pipeline activities that consume outputs from earlier steps should be connected with dependency conditions, such as an on-success path. In this case, the Dataflow Gen2 activity reads stg_sales, but the notebook that creates stg_sales is running in parallel. The correct orchestration fix is to make the dataflow wait for the notebook to complete successfully. The downstream Warehouse stored procedure should remain after the dataflow so it only merges data that has been loaded successfully. A fixed delay or retry might hide the symptom, but it does not model the real dependency.

  • Schedule delay is brittle because the notebook duration can vary and the dataflow may still start too early.
  • Eventstream replacement targets streaming ingestion and does not fix this batch pipeline dependency.
  • Parameterizing the name can make pipelines reusable, but it does not create the missing table before the dataflow runs.
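
The effect of the on-success dependency can be sketched as a simple dependency-ordered run. The activity names come from the scenario; the tiny scheduler itself is illustrative, since the real fix is drawn on the pipeline canvas.

```python
# Sketch: with an on-success edge, Dataflow_Load cannot start until
# Notebook_CreateStage (which creates stg_sales) has completed.
deps = {
    "Notebook_CreateStage": [],
    "Dataflow_Load": ["Notebook_CreateStage"],
}

def run_order(deps):
    """Return an execution order in which every activity follows its dependencies."""
    order, done = [], set()
    while len(order) < len(deps):
        for act, needs in deps.items():
            if act not in done and all(n in done for n in needs):
                order.append(act)
                done.add(act)
    return order

print(run_order(deps))  # ['Notebook_CreateStage', 'Dataflow_Load']
```

In the original pipeline both activities hang directly off the start, which is equivalent to an empty dependency list for each, so the dataflow can legally start first and fail on the missing table.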

Question 37

Topic: Implement and Manage an Analytics Solution

You administer a Fabric workspace with 15 Dataflow Gen2 items that load curated operational data into Lakehouse tables. The solution works, but refresh performance is inconsistent because Dataflows Gen2 staging is not standardized across the workspace. The team must improve reliability for current and future dataflows while preserving the existing Power Query logic and Lakehouse destinations. Which change should you make?

Options:

  • A. Configure a shared Dataflows Gen2 staging lakehouse in workspace settings.

  • B. Change each dataflow destination to a Warehouse.

  • C. Enable Fast Copy inside each Dataflow Gen2.

  • D. Increase the workspace Spark pool size.

Best answer: A

Explanation: Dataflows Gen2 workspace settings are used for shared workspace-level behavior such as staging configuration. Because the requirement is to standardize staging without changing individual dataflows, the improvement should be made in the workspace settings, not inside each dataflow.

Dataflows Gen2 workspace settings apply across the workspace and are the right place to configure shared staging behavior. This preserves existing Power Query transformations and Lakehouse destinations while improving consistency for multiple dataflows. Authoring choices such as Fast Copy, query steps, and destination mappings are configured inside a specific dataflow and do not satisfy the workspace-wide requirement. Spark pool settings are separate from Dataflows Gen2 execution and are not the control for Power Query-based dataflows.

The key distinction is whether the setting governs the workspace or only one authored dataflow item.

  • Fast Copy can improve some individual dataflows, but it is an authoring choice and does not standardize workspace staging.
  • Spark pool size affects Spark workloads such as notebooks, not Dataflows Gen2 staging behavior.
  • Warehouse destination changes the target workload and violates the requirement to keep Lakehouse destinations.

Question 38

Topic: Monitor and Optimize an Analytics Solution

An Eventstream lands IoT events into a Fabric Lakehouse Delta table named RawEvents. A scheduled PySpark notebook builds hourly aggregates in Curated.HourlyAggregates, but it scans many small files across 30 days and misses its SLA. Analysts must access only the aggregate results; raw events contain PII and must remain restricted to data engineers. Which implementation should you choose?

Options:

  • A. Apply dynamic data masking to RawEvents and grant analysts table access.

  • B. Maintain partitioned, compacted RawEvents; grant OneLake access only to Curated.HourlyAggregates.

  • C. Cache RawEvents in the notebook and grant Lakehouse item read access.

  • D. Grant analysts Workspace Viewer and apply sensitivity labels to RawEvents.

Best answer: B

Explanation: The best approach combines Spark table-layout optimization with least-privilege access. A partitioned, compacted Delta table helps the notebook avoid excessive small-file scanning, and OneLake security scoped to the curated aggregate table prevents analyst access to raw PII.

For Spark notebook performance, optimize the Delta data layout that the notebook reads. Partitioning by the filtering column, such as event date, enables partition pruning, and compaction reduces the overhead of reading many small files. For governance, analysts should receive access only to the curated aggregate table or folder through OneLake security, not to the workspace, Lakehouse item, or raw event data. This meets the SLA goal without weakening data protection. Sensitivity labels and dynamic data masking can support governance in other ways, but they do not replace least-privilege access to OneLake data or fix Spark small-file scan overhead.

  • Workspace role overgrant fails because Workspace Viewer can expose more items than analysts need, and sensitivity labels classify data rather than enforce access.
  • Notebook caching is session-scoped and does not address durable small-file layout, while Lakehouse item read access can expose raw PII.
  • Dynamic masking mismatch fails because masking SQL output does not optimize Spark scans and still grants access to the raw table.
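
The benefit of date partitioning can be sketched as partition pruning over file paths. The paths are illustrative OneLake-style paths, not real locations; in practice Spark does this pruning automatically when the table is partitioned on the filter column.

```python
# Sketch of partition pruning: an hourly job that filters on event date
# reads one partition folder instead of 30 days of small files.
files = [
    "RawEvents/event_date=2024-05-01/part-0001.parquet",
    "RawEvents/event_date=2024-05-01/part-0002.parquet",
    "RawEvents/event_date=2024-05-30/part-0001.parquet",
]

def prune(files, event_date):
    """Keep only files inside the partition folder for the requested date."""
    prefix = f"RawEvents/event_date={event_date}/"
    return [f for f in files if f.startswith(prefix)]

print(prune(files, "2024-05-30"))  # one file instead of three
```

Compaction addresses the other half of the problem: fewer, larger files within each partition reduce per-file open and footer-read overhead for the Spark scan.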

Question 39

Topic: Ingest and Transform Data

An operations team uses Microsoft Fabric for two continuous event feeds: vehicle GPS telemetry through Eventstreams and toll-gate alarm events from a streaming source. Analysts must store the raw events in Real-Time Intelligence and query them directly with KQL for investigations and alert validation. The data must not land in a Lakehouse or Warehouse before it is queried.

Which TWO actions should you take? Select TWO.

Options:

  • A. Load alarm events into a Warehouse fact table.

  • B. Write GPS events to a Lakehouse Delta table.

  • C. Refresh events with Dataflows Gen2 on a schedule.

  • D. Create OneLake shortcuts to the raw event files.

  • E. Ingest alarm events into a native Eventhouse KQL database table.

  • F. Land the GPS Eventstream in a native Eventhouse table.

Correct answers: E and F

Explanation: Native Eventhouse tables are the appropriate choice when event data must be stored and queried directly in the Real-Time Intelligence workload. Routing an Eventstream to an Eventhouse table or ingesting directly into a native KQL database table keeps the data available for KQL without first landing it in a Lakehouse or Warehouse.

In Microsoft Fabric Real-Time Intelligence, Eventhouse is optimized for high-volume event and time-series data. A native table in an Eventhouse KQL database is the right storage target when raw event data must be persisted in the real-time workload and queried directly with KQL. Eventstreams can route continuous events to Eventhouse tables, and other supported ingestion paths can also load native Eventhouse tables.

Lakehouse Delta tables, Warehouse tables, shortcuts, and scheduled dataflows can be useful in other Fabric ingestion patterns, but they do not satisfy the requirement to store and query the events directly in Real-Time Intelligence as the primary workload.

  • Lakehouse storage fails because it lands the events in a Lakehouse instead of an Eventhouse native table.
  • Warehouse loading fails because it targets relational T-SQL storage, not Real-Time Intelligence KQL storage.
  • OneLake shortcuts fail because they virtualize existing files rather than creating native real-time tables.
  • Scheduled dataflows fail because they are refresh-oriented and do not provide direct native Eventhouse storage for streaming events.

Question 40

Topic: Implement and Manage an Analytics Solution

Your team uses the Fabric workspace SalesOps. A pipeline named pl_LoadSales contains a Notebook activity that calls nb_StandardizeSales. You shared only pl_LoadSales with the SalesLoadOperators security group, and the group is not assigned to any workspace role. Their run fails at the Notebook activity.

Error excerpt:

Activity: StandardizeSales
Status: Failed
Message: User is not authorized to access item nb_StandardizeSales.

The group must not receive access to unrelated workspace items. What should you do?

Options:

  • A. Grant access only to pipeline run history.

  • B. Add the group as Workspace Contributor.

  • C. Apply a sensitivity label to the notebook.

  • D. Share nb_StandardizeSales with required item-level permissions.

Best answer: D

Explanation: The pipeline was shared, but the referenced notebook remains a separate Fabric item. Because the error identifies missing access to nb_StandardizeSales, the fix is to grant item-level access to that notebook without broadening workspace permissions.

Fabric item-level access controls let you grant permissions to a specific artifact, such as a notebook, pipeline, lakehouse, or warehouse, without making the user a workspace member. In this scenario, the group can access the pipeline but the activity fails when it tries to use a notebook that was not shared with them. Sharing the notebook item with the required permissions addresses the failed activity and satisfies the requirement to avoid access to unrelated workspace items. A workspace role might also expose many other items, so it is not the least-privilege fix.

  • Workspace role is too broad because Contributor access would grant permissions across the workspace, contrary to the stated requirement.
  • Sensitivity labeling is not access because labels classify or protect content but do not grant notebook execution access by themselves.
  • Run history is insufficient because viewing pipeline results does not authorize access to the notebook artifact used by the activity.

Question 41

Topic: Ingest and Transform Data

A team has a working Fabric solution that loads daily sales into staging tables in a Fabric Warehouse. A pipeline then starts a notebook that reads the Warehouse tables with PySpark, performs joins and aggregations, and writes the curated fact table back to the same Warehouse. The notebook is the slowest step because it moves data out of the Warehouse engine. The result must remain a Warehouse table and continue to run from the pipeline.

What is the best improvement?

Options:

  • A. Rewrite the logic as a KQL function in an Eventhouse.

  • B. Move the joins to Spark Structured Streaming.

  • C. Create KQL update policies on the Warehouse tables.

  • D. Run a Warehouse T-SQL transformation from the pipeline.

Best answer: D

Explanation: For Fabric Warehouse transformations, T-SQL is the native query language and execution engine. Moving SQL-style joins and aggregations from PySpark into Warehouse T-SQL reduces data movement while preserving the required Warehouse target and pipeline orchestration.

The core issue is a language and engine mismatch. The data already lands in a Fabric Warehouse, and the required output must remain a Warehouse table. Running joins and aggregations in PySpark forces data to move between the Warehouse and Spark runtime, which adds overhead and failure points. A pipeline can orchestrate a Warehouse-based T-SQL transformation so the work stays close to the data and uses the Warehouse engine.

KQL is for Eventhouse and Real-Time Intelligence workloads, while Spark Structured Streaming is for streaming scenarios, not a nightly relational Warehouse load.

  • Eventhouse KQL fails because it changes the target engine and does not preserve the Warehouse table requirement.
  • Structured Streaming fails because the workload is a scheduled batch Warehouse transformation, not a streaming ingestion path.
  • KQL update policies fail because they apply to Eventhouse scenarios, not Fabric Warehouse tables.

Question 42

Topic: Monitor and Optimize an Analytics Solution

A Fabric pipeline runs a Warehouse stored procedure and then validates a Lakehouse table that uses a OneLake shortcut. You are reviewing the activity output and must identify the failures that should be assigned to the T-SQL developer for query correction. Which TWO messages are T-SQL errors?

Options:

  • A. Pipeline activity failed before execution because the connection was invalid.

  • B. Shortcut target returned PathNotFound while resolving /sales/raw/.

  • C. Semantic model refresh failed after the Warehouse load completed.

  • D. Msg 207, Level 16: Invalid column name 'TotalAmount'.

  • E. OneLake shortcut access denied for the external storage location.

  • F. Msg 245, Level 16: Conversion failed converting 'N/A' to int.

Correct answers: D and F

Explanation: T-SQL errors are raised by the SQL engine during query compilation or execution. Messages such as invalid column names and failed data type conversions indicate problems in the SQL statement or the data operation being performed by that statement.

Fabric Warehouse and SQL query activity failures that include SQL-style messages such as Msg, Level, and a statement-specific cause are typically T-SQL errors. An invalid column name means the submitted query references metadata that does not exist or is not visible in that context. A conversion failure means the query attempted an incompatible cast or implicit conversion during execution. By contrast, OneLake shortcut resolution, shortcut authorization, invalid pipeline connections, and downstream semantic model refresh failures are operational or integration issues. They may appear near the SQL activity in monitoring output, but they are not T-SQL query errors that a SQL developer fixes by changing the statement.

  • Shortcut resolution fails because a missing shortcut target is a OneLake path issue, not a SQL syntax or runtime error.
  • Shortcut authorization fails because access to external storage must be fixed through permissions or credentials, not T-SQL logic.
  • Connection failure fails before the SQL statement runs, so it is an orchestration or connection configuration issue.
  • Semantic model refresh is downstream of the Warehouse load and does not identify a T-SQL error in the query output.
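
The triage rule described above can be sketched as a simple message classifier: messages carrying the SQL engine's `Msg N, Level N` prefix go to the T-SQL developer, while the rest are routed as operational or integration issues. The function name is an assumption for the sketch.

```python
# Sketch of the triage rule: a SQL engine error number and severity prefix
# marks a T-SQL error; other messages are operational/integration failures.
import re

def is_tsql_error(message):
    """True when the message starts with a SQL engine 'Msg N, Level N' prefix."""
    return re.match(r"Msg \d+, Level \d+", message) is not None

messages = [
    "Msg 207, Level 16: Invalid column name 'TotalAmount'.",
    "Msg 245, Level 16: Conversion failed converting 'N/A' to int.",
    "Shortcut target returned PathNotFound while resolving /sales/raw/.",
    "OneLake shortcut access denied for the external storage location.",
]
print([is_tsql_error(m) for m in messages])  # [True, True, False, False]
```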

Question 43

Topic: Ingest and Transform Data

A Fabric Spark structured streaming notebook reads equipment telemetry from an Eventstream and writes 5-minute fault counts to a Lakehouse table. Counts for outage periods are lower than a nightly batch recomputation. Faults must be counted in the 5-minute window when they occurred.

Business timestamp column: eventTime
Window currently used by query: arrivedAt
Sample record: eventTime=09:01:20Z, arrivedAt=09:07:45Z
Duplicate keys detected: 0

What is the best fix?

Options:

  • A. Window on eventTime with a lateness watermark.

  • B. Deduplicate by deviceId before aggregation.

  • C. Increase processing parallelism for the Eventstream.

  • D. Partition the Lakehouse table by arrivedAt.

Best answer: A

Explanation: The stream is grouping by arrival time even though the business rule depends on when the fault occurred. Late records from the outage need event-time windowing, with a watermark set to tolerate expected lateness.

Event-time handling is required when streaming results depend on the time embedded in each event rather than the time the event reaches Fabric. In this scenario, a fault that occurred at 09:01 arrives at 09:07, so an arrival-time window places it in the wrong 5-minute bucket. A Spark structured streaming aggregation should use the eventTime column for the window and configure a watermark that is long enough for expected late arrivals. The key takeaway is that late and out-of-order events require event-time semantics, not just faster processing.

  • Throughput focus fails because faster processing does not reassign late records to the window when the fault occurred.
  • Deduplication focus fails because the evidence says duplicates were not detected.
  • Storage partitioning fails because partitioning can help reads but does not change streaming window semantics.
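
The bucketing difference can be shown with the sample record. In the actual fix this is Spark's `window` over the eventTime column combined with `withWatermark`; this pure-Python sketch only demonstrates why the arrival-time window puts the fault in the wrong bucket.

```python
# Sketch: floor a timestamp to its fixed 5-minute window. The sample fault
# lands at 09:00 by event time but at 09:05 by arrival time.
from datetime import datetime, timedelta

def window_start(ts, minutes=5):
    """Floor a timestamp to the start of its fixed window."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second, microseconds=ts.microsecond)

event_time = datetime(2024, 5, 1, 9, 1, 20)   # eventTime=09:01:20Z
arrived_at = datetime(2024, 5, 1, 9, 7, 45)   # arrivedAt=09:07:45Z

print(window_start(event_time).time(), window_start(arrived_at).time())
# 09:00:00 09:05:00
```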

Question 44

Topic: Ingest and Transform Data

A shipping company streams package scan events into Microsoft Fabric by using Eventstreams. You must continuously load a Lakehouse table that contains scan counts per facility in 5-minute windows based on the event timestamp. Events can arrive up to 10 minutes late and must still be counted in the correct window; events arriving later must be dropped.

Which loading pattern should you use?

Options:

  • A. Pipeline copy activity scheduled every 5 minutes

  • B. Spark Structured Streaming with a 10-minute watermark

  • C. OneLake shortcut with Query acceleration enabled

  • D. Eventstream direct Lakehouse output grouped by ingestion time

Best answer: B

Explanation: Use Spark Structured Streaming with event-time windowing and a 10-minute watermark. This pattern matches the requirement to aggregate by the event timestamp while accepting late arrivals only within a defined lateness interval.

Late-arriving streaming events are handled by event-time processing plus watermarking. In Fabric, a notebook using Spark Structured Streaming can define a 5-minute window over the event timestamp and set a 10-minute watermark. The watermark tells the engine how long to keep state for possible late records, then drop records that arrive after the allowed delay. Scheduling a batch pipeline or grouping by ingestion time would use arrival time rather than the true event time, which can place delayed scans in the wrong window. Query acceleration for shortcuts improves query access to shortcut data; it does not implement streaming lateness semantics.

  • Scheduled batch copy misses the event-time lateness requirement and can assign delayed events to the wrong load interval.
  • Ingestion-time grouping fails because the business window must be based on the event timestamp, not arrival time.
  • Shortcut acceleration is a query optimization feature, not a streaming load pattern for watermarks or late-event dropping.
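
The watermark's accept-or-drop rule can be sketched as follows. In a Fabric notebook this is expressed as `df.withWatermark("eventTime", "10 minutes")` in PySpark; the timestamps below are invented for the sketch, which only illustrates the rule, not Spark's state management.

```python
# Sketch of watermark semantics: the watermark trails the latest event time
# seen by the allowed lateness; records older than the watermark are dropped.
from datetime import datetime, timedelta

LATENESS = timedelta(minutes=10)
max_event_time = datetime(2024, 5, 1, 10, 0)  # latest event time seen so far
watermark = max_event_time - LATENESS         # 09:50

def accept(event_time):
    """Count the event only if it is not older than the watermark."""
    return event_time >= watermark

print(accept(datetime(2024, 5, 1, 9, 55)),   # 5 minutes late -> counted
      accept(datetime(2024, 5, 1, 9, 45)))   # 15 minutes late -> dropped
# True False
```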

Question 45

Topic: Implement and Manage an Analytics Solution

A Fabric workspace contains a raw Lakehouse protected by OneLake security. Only the data engineering group can read /raw/hr; analysts can read only curated Warehouse tables. A new process must run when each HR file arrives, pass the file path at run time, hash national ID values before loading curated tables, and avoid granting analysts access to raw files or transformation code. Which implementation should you choose?

Options:

  • A. Use Dataflows Gen2 shared with analysts for self-service refresh.

  • B. Use a standalone notebook in the analysts’ workspace.

  • C. Use a pipeline Copy activity and rely on sensitivity labels.

  • D. Use a pipeline to trigger a secured masking notebook with parameters.

Best answer: D

Explanation: A Fabric pipeline is the best orchestration choice because the process is event-driven and must pass the arriving file path at run time. Invoking a secured notebook keeps the custom hashing logic and raw Lakehouse access restricted to the data engineering boundary.

The core decision is orchestration plus governed transformation. Pipelines are designed to coordinate Fabric activities, use triggers, and pass parameters such as a file path. A notebook is appropriate for custom PySpark logic such as hashing sensitive identifiers, but the pipeline should orchestrate it so the process can start from the file-arrival event and supply runtime values. Keeping the notebook and raw Lakehouse access restricted preserves OneLake and item-level security, while analysts receive only the curated Warehouse output. Sensitivity labels and sharing convenience do not replace masking or least-privilege access controls.

  • Shared Dataflow fails because sharing with analysts grants unnecessary access and does not fit an event-driven, parameterized custom-hashing process.
  • Analyst notebook fails because placing the logic in the analysts’ workspace can expose raw data access or transformation code.
  • Sensitivity labels alone fail because labels classify or protect content but do not hash sensitive values before loading curated tables.
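
The hashing step inside the secured notebook can be sketched with a keyed hash. The secret value, column, and function name are assumptions for illustration; in Fabric the key would come from secured configuration rather than being embedded in the notebook.

```python
# Hedged sketch of the masking step: replace national IDs with a
# deterministic keyed hash before writing curated output.
import hashlib
import hmac

SECRET = b"example-only-secret"  # placeholder; never hard-code in production

def hash_national_id(value):
    """Return a deterministic, non-reversible token for a national ID."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()

token = hash_national_id("123-45-6789")
print(len(token), token == hash_national_id("123-45-6789"))  # 64 True
```

A keyed hash (HMAC) rather than a bare hash makes dictionary attacks against the small national-ID keyspace harder, while still letting curated tables join on the token.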

Question 46

Topic: Implement and Manage an Analytics Solution

Analysts report that lh_sales_gold is accessible from a shared link but does not appear when they filter Fabric items to Certified trusted assets. The lakehouse owner can set the item to Promoted, but the Certify option is disabled. The tenant certification setting is enabled only for the Data Governance Certifiers security group. What is the best fix?

Options:

  • A. Have an authorized certifier certify the lakehouse

  • B. Move the lakehouse to a Production deployment stage

  • C. Grant analysts Contributor access to the workspace

  • D. Apply a sensitivity label to the lakehouse

Best answer: A

Explanation: Fabric endorsement uses Promoted and Certified states to help users identify trusted assets. The evidence shows the item is accessible, but certification is unavailable to the owner because only a configured security group can certify items.

Fabric endorsement is governance metadata that helps users find trusted data assets. Promotion can be used to highlight useful content, but certification is a stronger organizational trust signal and is controlled by the tenant certification setting. In this scenario, the lakehouse is reachable and the owner can promote it, so permissions to view the item are not the problem. The disabled Certify option, together with the configured certifier group, indicates that certification must be performed by an approved member of that group, or that an appropriate steward must be added to the group before certifying.

  • A sensitivity label classifies or protects data, but it does not make an item appear as Certified.
  • A deployment stage supports lifecycle promotion between environments, not Fabric endorsement status.
  • Contributor access changes workspace permissions and is unnecessary because analysts can already access the lakehouse.

Question 47

Topic: Ingest and Transform Data

An HR workspace contains payroll source tables in a Microsoft Fabric Warehouse. Engineers must create a curated monthly summary table from permitted columns only. The security team requires the transformation to stay inside the Warehouse security boundary: no workspace Admin or Member assignment, no OneLake folder/file access to source data, and existing object permissions, column protections, and dynamic data masking must continue to govern access.

Which transformation approach should you choose?

Options:

  • A. Copy the data to an Eventhouse and use KQL.

  • B. Write Warehouse T-SQL with least-privileged SQL grants.

  • C. Run a notebook with OneLake source folder access.

  • D. Create Dataflows Gen2 with an admin-owned connection.

Best answer: B

Explanation: Warehouse T-SQL is the best fit because the data already resides in a Fabric Warehouse and the governance requirement depends on SQL security controls. Least-privileged SQL permissions keep object, column, and masking protections in the enforcement path without granting broader workspace or OneLake access.

Choosing between transformation engines should weigh both workload fit and governance boundaries. In this scenario, the transformation is relational, the source and target are in a Warehouse, and the required controls are Warehouse SQL controls such as object permissions, column protections, and dynamic data masking. Implementing the logic in T-SQL allows engineers to work through SQL permissions rather than receiving workspace roles or direct OneLake file access. Dataflows Gen2 and notebooks are valid transformation tools in other scenarios, but the described access model would broaden or shift the security boundary. KQL is intended for Eventhouse and real-time analytics workloads, not for preserving Warehouse governance over existing relational tables.

  • Notebook access fails because granting OneLake folder access would exceed the stated Warehouse-only security boundary.
  • Admin-owned Dataflow fails because a full-access shared connection can expose more data than the engineers are allowed to use.
  • Eventhouse copy fails because moving sensitive Warehouse data creates another governed store and does not preserve the existing Warehouse controls.
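The least-privileged T-SQL pattern the answer describes can be sketched as follows. All object and column names here (dbo.Payroll, dbo.MonthlyPayrollSummary, the payroll_engineers role, and the permitted columns) are hypothetical illustrations, not names from the scenario:

```sql
-- Hypothetical setup run by a Warehouse administrator:
-- a database role scoped to only the permitted source columns.
CREATE ROLE payroll_engineers;
GRANT SELECT (EmployeeId, PayMonth, GrossPay)
    ON dbo.Payroll TO payroll_engineers;

-- Engineers then build the curated monthly summary from those columns only.
-- Column protections and dynamic data masking on dbo.Payroll stay in the
-- enforcement path because access flows through SQL permissions.
CREATE TABLE dbo.MonthlyPayrollSummary AS
SELECT PayMonth,
       COUNT(DISTINCT EmployeeId) AS EmployeeCount,
       SUM(GrossPay)              AS TotalGrossPay
FROM dbo.Payroll
GROUP BY PayMonth;
```

Because the engineers never receive a workspace role or OneLake file access, any column not granted to the role simply remains unreadable to them, which is the boundary the security team asked for.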

Question 48

Topic: Ingest and Transform Data

A data engineering team is choosing a Microsoft Fabric data store for monthly customer extracts. The solution must keep the data in OneLake in open Delta/Parquet files, support Spark notebook transformations and SQL analytics, and let analysts access only curated data while blocking raw PII files.

Which option meets the requirements without over-granting access?

Options:

  • A. Create a Warehouse and apply dynamic data masking.

  • B. Create an Eventhouse and use KQL table permissions.

  • C. Create a Lakehouse and make analysts workspace Contributors.

  • D. Create a Lakehouse and use OneLake data access roles.

Best answer: D

Explanation: A Lakehouse is the best fit when the data must remain in open data lake storage and be processed by Fabric analytics engines. OneLake data access roles can help restrict access to curated paths or tables without granting broad workspace privileges.

The core decision is the data store plus governance boundary. A Fabric Lakehouse stores data in OneLake using open formats such as Delta and Parquet, and it supports Spark notebook processing and SQL analytics over curated tables. To meet the governance requirement, use OneLake security, such as data access roles, so analysts can access curated data without being granted access to raw PII files. This preserves the open lake storage requirement while enforcing least privilege.

A Warehouse can support SQL and masking, but it does not satisfy the requirement to manage open lake files for Spark-based processing. An Eventhouse is optimized for Real-Time Intelligence and KQL workloads, not monthly open-format lake storage.

  • Warehouse masking protects displayed column values, but it does not meet the open Delta/Parquet lake storage requirement.
  • Eventhouse permissions fit KQL event analytics, not general open lake storage with Spark transformations.
  • Workspace Contributor would over-grant authoring and item access beyond curated analyst access.

Question 49

Topic: Ingest and Transform Data

A manufacturing company is designing a Fabric solution for equipment telemetry. Events arrive continuously from factory gateways, must be filtered during ingestion, and must be queryable within seconds for time-window anomaly investigations. Engineers will write KQL for operational analysis, while aggregated daily results will later be copied to a Lakehouse for batch reporting.

Which data store and loading pattern should you choose for the raw telemetry?

Options:

  • A. Eventstream into native Eventhouse tables

  • B. OneLake shortcut to the gateway files

  • C. Dataflows Gen2 into Lakehouse tables

  • D. Pipeline full load into a Warehouse

Best answer: A

Explanation: The raw telemetry needs streaming ingestion, near-real-time availability, and KQL-based time-window analysis. Native Eventhouse tables loaded by an Eventstream match those ingestion and consumption requirements, while the Lakehouse can remain a downstream batch target.

For continuously arriving telemetry that must be filtered as it is ingested and queried within seconds by using KQL, the appropriate Fabric target is an Eventhouse with native Real-Time Intelligence tables. Eventstreams provide the streaming ingestion and lightweight event processing path, and Eventhouse is optimized for high-volume, append-oriented time-series and log analytics workloads. The later daily batch copy to a Lakehouse is a downstream consumption pattern, not the best landing store for the raw operational stream.

The key distinction is that the first store must match the streaming and KQL investigation requirements, not the later batch reporting requirement.

  • Warehouse full load fits relational batch loading and T-SQL analytics, not second-level streaming telemetry investigation with KQL.
  • Dataflows Gen2 is better for scheduled data preparation, not continuous low-latency event ingestion.
  • OneLake shortcut exposes existing files without copying them, but it does not provide streaming filtering or native KQL event analytics.
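A time-window anomaly investigation over the Eventhouse tables might look like the sketch below. The table and column names (Telemetry, Timestamp, DeviceId, Temperature) and the threshold are hypothetical assumptions for illustration:

```kql
// Hypothetical Eventhouse table: Telemetry(Timestamp, DeviceId, Temperature).
// Count out-of-range readings per device in one-minute windows
// over the last 15 minutes of ingested events.
Telemetry
| where Timestamp > ago(15m)
| where Temperature > 90
| summarize AnomalousReadings = count() by DeviceId, bin(Timestamp, 1m)
| order by AnomalousReadings desc
```

Queries like this run within seconds of ingestion because the Eventstream lands events directly in native Eventhouse tables, which is exactly the consumption pattern the question's requirements describe.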

Question 50

Topic: Implement and Manage an Analytics Solution

A finance workspace in Microsoft Fabric contains several Lakehouse notebooks that are started by different pipelines. Governance requires all Spark sessions in the workspace to use the same centrally approved Spark configuration for PII processing. The configuration must not depend on notebook authors adding code, changing pipeline logic, or granting extra item permissions. What should you do?

Options:

  • A. Grant notebook item permissions to developers.

  • B. Configure the property in workspace Spark settings.

  • C. Add spark.conf.set to each notebook.

  • D. Pass the value as a pipeline parameter.

Best answer: B

Explanation: Workspace-level Spark settings are the right control when the requirement is a centrally managed Spark configuration for the workspace. Notebook code and pipeline parameters are implementation-level choices, and item permissions address access, not Spark defaults.

Fabric Spark workspace settings are used to manage Spark configuration at the workspace scope, such as centrally approved properties or compute-related defaults. This fits a governance requirement that must apply consistently across notebooks started in different ways. Putting the setting in notebook code depends on every author using the same code, while a pipeline parameter only affects orchestrated runs that consume that parameter. Granting item permissions would increase access and still would not define the Spark configuration. The key distinction is that Spark configuration governance belongs in workspace Spark settings, not in per-notebook code, orchestration parameters, or item access control.

  • Notebook code fails because it relies on every notebook author to add and maintain the same configuration.
  • Pipeline parameters fail because they control orchestration inputs, not workspace-wide Spark defaults.
  • Item permissions fail because they grant access to items and do not configure Spark behavior.

Continue with full practice

Use the Microsoft DP-700 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try Microsoft DP-700 on Web View Microsoft DP-700 Practice Test

Free review resource

Read the Microsoft DP-700 Cheat Sheet on Tech Exam Lexicon for concept review before another timed run.

Revised on Thursday, May 14, 2026