Try 10 focused Microsoft DP-700 questions on Analytics Optimization, with explanations, then continue with IT Mastery.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
Try Microsoft DP-700 on Web · View the full Microsoft DP-700 practice page
| Field | Detail |
|---|---|
| Exam route | Microsoft DP-700 |
| Topic area | Monitor and Optimize an Analytics Solution |
| Blueprint weight | 33% |
| Page purpose | Focused sample questions before returning to mixed practice |
Use this page to isolate Monitor and Optimize an Analytics Solution for Microsoft DP-700. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 33% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.
Topic: Monitor and Optimize an Analytics Solution
A Fabric workspace processes IoT telemetry. A pipeline scheduled every 15 minutes runs a PySpark notebook that currently scans the full raw Lakehouse table and then filters by EventTime. The team needs batch aggregates available within 20 minutes.
Recent metrics:
Which next step should you take?
Options:
A. Trigger the pipeline from each Eventstream batch.
B. Replace the notebook with a Dataflows Gen2 full refresh.
C. Pass dynamic window parameters to the notebook.
D. Optimize the Eventhouse KQL queries.
Best answer: C
Explanation: The metrics identify the PySpark notebook as the optimization target. Streaming latency, Eventhouse query response, and semantic model refresh are all within target, while the notebook exceeds both its duration target and the pipeline interval.
Choose the component whose metric violates the requirement and is on the critical path. Here, the scheduled pipeline is blocked by a notebook that takes 42 minutes for a 15-minute cycle. Use pipeline dynamic expressions to calculate values such as windowStart and windowEnd, pass them as notebook parameters, and have the PySpark job read only the required event-time range instead of scanning the full raw table. This targets the actual Spark processing bottleneck without changing the healthy streaming or query path.
Changing triggers or refreshing more often would create more overlapping work rather than reduce notebook runtime.
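The windowing idea above can be sketched in plain Python. This is a minimal illustration, not the pipeline expression syntax itself: the function mirrors what a pipeline dynamic expression would compute and hand to the notebook as windowStart and windowEnd parameters, and the table name in the comment is hypothetical.

```python
from datetime import datetime, timedelta

def compute_window(trigger_time: datetime, interval_minutes: int = 15):
    """Derive the event-time window for one pipeline run.

    Mirrors what a pipeline dynamic expression would compute and pass
    to the notebook as windowStart / windowEnd parameters.
    """
    window_end = trigger_time.replace(second=0, microsecond=0)
    window_start = window_end - timedelta(minutes=interval_minutes)
    return window_start, window_end

# Inside the notebook, the parameters would bound the scan (sketch;
# "raw_telemetry" is a placeholder table name):
#   df = (spark.read.table("raw_telemetry")
#           .filter((col("EventTime") >= windowStart) &
#                   (col("EventTime") <  windowEnd)))
start, end = compute_window(datetime(2024, 5, 1, 8, 15))
print(start, end)  # 2024-05-01 08:00:00 2024-05-01 08:15:00
```

Reading only the 15-minute slice keeps notebook runtime proportional to the window instead of the full table.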
Topic: Monitor and Optimize an Analytics Solution
Your team curates sales data in Delta tables in a Microsoft Fabric Lakehouse. A nightly notebook appends many small Parquet files, and analysts report slow scan-heavy queries from Spark and the SQL analytics endpoint. You must optimize the Lakehouse tables in place and avoid copying the data to a Fabric Warehouse. Which TWO actions should you perform?
Options:
A. Run OPTIMIZE on the Delta tables.
B. Create statistics on a copied Warehouse table.
C. Rewrite the tables with V-Order enabled.
D. Add a materialized view in a Warehouse.
E. Use Warehouse query insights to tune T-SQL queries.
F. Create clustered columnstore indexes on the SQL analytics endpoint.
Correct answers: A and C
Explanation: Lakehouse table optimization focuses on Delta and Parquet file layout in OneLake. For many small files and slow scans, compaction with OPTIMIZE and V-Order are the Fabric Lakehouse actions that improve the table in place.
The core distinction is the optimization target. A Fabric Lakehouse table is a Delta table stored as Parquet files in OneLake, so table optimization changes file size, layout, and encoding. OPTIMIZE addresses the small-file problem by compacting files, while V-Order improves scan efficiency for Fabric engines. These actions keep the data in the Lakehouse and benefit Spark and the SQL analytics endpoint over the same Delta data.
Warehouse optimizations apply to Warehouse objects, T-SQL workloads, or copied data. They do not optimize the original Lakehouse Delta files in place.
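As a sketch of what the two correct actions look like in a notebook, the helper below builds the maintenance statements you would pass to spark.sql(...). The OPTIMIZE ... VORDER syntax and the delta.parquet.vorder.enabled table property follow Fabric's published Spark documentation; the table name is a placeholder, and you should verify the exact syntax against current docs.

```python
def maintenance_statements(table: str):
    """Build Spark SQL statements for in-place Delta table maintenance
    in a Fabric Lakehouse (run each via spark.sql(stmt) in a notebook)."""
    return [
        # Compact many small Parquet files and apply V-Order encoding.
        f"OPTIMIZE {table} VORDER",
        # Keep future writes to this table V-Ordered as well.
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.parquet.vorder.enabled' = 'true')",
    ]

for stmt in maintenance_statements("sales_lakehouse.fact_sales"):
    print(stmt)
```

Both statements operate on the existing Delta files in OneLake, so Spark and the SQL analytics endpoint benefit without copying data anywhere.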
Topic: Monitor and Optimize an Analytics Solution
A Fabric warehouse contains fact.Sales used by an hourly aggregation query. The query is slow only during 08:00–08:20. Outside that window, the same SQL finishes quickly. Row counts and product distribution are normal, and the query text has not changed. A pipeline starts 60 parallel append loads at 08:00 and triggers the aggregation at 08:05 on a fixed schedule, even when loads are still running. Which orchestration change should you make first?
Options:
A. Rewrite the aggregation in a notebook.
B. Move the aggregation into Dataflows Gen2.
C. Gate the aggregation on successful load completion.
D. Split fact.Sales into regional tables.
Best answer: C
Explanation: The performance issue aligns with concurrent upstream ingestion, not a stable warehouse design, query, or data-shape problem. Because the aggregation starts while many append loads are still running, the best first change is to orchestrate the dependency in the pipeline.
When the same warehouse query is fast outside the load window and slow only while parallel append loads are running, the likely cause is upstream ingestion behavior. The evidence rules out common alternatives: the query text is unchanged, the row distribution is normal, and the slowdown is time-bound to active loads. A pipeline dependency should make the aggregation wait until all relevant load activities succeed, which coordinates batch ingestion and downstream warehouse processing without unnecessary redesign.
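The gating condition itself is trivial, which is the point: in a Fabric pipeline this is the on-success dependency between the 60 load activities and the aggregation activity, not custom code. The toy function below just makes the condition explicit; the status strings are illustrative.

```python
def should_run_aggregation(load_statuses):
    """Gate: run the aggregation only after every parallel load succeeded.

    In a real pipeline this is the 'on success' activity dependency,
    not code -- the sketch only illustrates the condition being enforced.
    """
    return len(load_statuses) > 0 and all(s == "Succeeded" for s in load_statuses)

# 08:05 on the old fixed schedule: loads still running, so do not start.
assert not should_run_aggregation(["Succeeded"] * 59 + ["InProgress"])
# After all 60 loads report success, the aggregation may run.
assert should_run_aggregation(["Succeeded"] * 60)
```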
Topic: Monitor and Optimize an Analytics Solution
A Fabric pipeline runs a PySpark notebook that transforms bronze order events into the silver_orders Delta table in a Lakehouse. The target is expected to contain one current row per order_id. After a fix for late-arriving status changes, the pipeline succeeds, but QA finds duplicates.
Run evidence:
source rows selected: 100,000
order_id values already in silver_orders: 2,400
rows written by notebook: 100,000
duplicate order_id values after the run: 2,400
write step: mode("append")
pipeline retry policy: enabled
Which fix should you validate before promoting the notebook?
Options:
A. MERGE latest rows by order_id and repeat the same run.
B. Disable pipeline retries and keep the append write.
C. Use dropDuplicates(["order_id"]) before append.
D. Cache the source DataFrame and increase shuffle partitions.
Best answer: A
Explanation: The evidence points to a non-idempotent append, not a Spark capacity problem. Because existing order_id values are being written again, the notebook should upsert the latest row into the Delta target and be validated by rerunning the same input to prove duplicates are not reintroduced.
Notebook fixes for data transformations should be validated for both business correctness and repeatability. In this case, silver_orders should hold one current row per order_id, but the notebook appends late-arriving updates for order_id values that already exist in the target. A Delta MERGE keyed by order_id, after selecting the latest event for each order, updates existing rows and inserts only new orders. Rerunning the same batch is an operational reliability test: a retry or rerun should not change counts or create duplicate keys. Disabling retries would hide the failure mode rather than fixing it.
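The MERGE-then-rerun validation can be simulated without Spark. This pure-Python sketch stands in for a Delta MERGE keyed by order_id (the real notebook would use DeltaTable.merge); the row shapes and event_time field are illustrative.

```python
def upsert_latest(target: dict, batch: list) -> dict:
    """Simulate a Delta MERGE keyed by order_id: pick the latest event
    per order, then update existing rows and insert new ones, so
    rerunning the same batch leaves the target unchanged (idempotent)."""
    latest = {}
    for row in batch:
        key = row["order_id"]
        if key not in latest or row["event_time"] > latest[key]["event_time"]:
            latest[key] = row
    merged = dict(target)
    merged.update(latest)
    return merged

target = {1: {"order_id": 1, "status": "new", "event_time": 1}}
batch = [{"order_id": 1, "status": "shipped", "event_time": 2},
         {"order_id": 2, "status": "new", "event_time": 2}]
once = upsert_latest(target, batch)
twice = upsert_latest(once, batch)
assert once == twice   # the rerun changes no counts and adds no keys
assert len(once) == 2  # one current row per order_id
```

The final two assertions are exactly the validation the explanation calls for: repeat the same run and prove duplicates are not reintroduced.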
Topic: Monitor and Optimize an Analytics Solution
A team uses a Microsoft Fabric Warehouse in a production workspace for sales analytics. A nightly pipeline loads dbo.SalesFact; the load completes on time. Analysts report that a T-SQL report query against the Warehouse is slow.
Monitoring evidence:
The query reads 1.2 billion rows from dbo.SalesFact and returns 45,000 rows.
The query filters with WHERE CONVERT(date, OrderTimestamp) = @OrderDate and selects many unused columns before aggregating.
You must reduce query latency without changing the business result. What should you do?
Options:
A. Repartition the notebook load for dbo.SalesFact.
B. Increase the Fabric capacity for the workspace.
C. Move the report query to an Eventhouse.
D. Filter OrderTimestamp by range and select only needed columns.
Best answer: D
Explanation: The best optimization target is the T-SQL query shape in the Warehouse. The performance evidence shows the query reads far more rows than it returns while capacity and ingestion are not bottlenecks, so making the predicate sargable and reducing projection should reduce scan work.
When performance evidence points to excessive row scanning in a Warehouse query, tune the query before scaling capacity or optimizing upstream jobs. Wrapping the filtered column in a function, as in CONVERT(date, OrderTimestamp) = @OrderDate, makes the predicate non-sargable: the engine must evaluate the function for every row before it can compare dates. A range predicate, such as OrderTimestamp >= @StartTime AND OrderTimestamp < @EndTime, preserves the same daily result while letting the engine eliminate data before scanning. Selecting only the columns the aggregate needs also reduces data movement and memory pressure. Because capacity metrics show no throttling and the pipeline load completes on time, capacity scaling and notebook tuning target the wrong layer.
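The half-open range that replaces the CONVERT predicate is easy to get wrong at the boundaries, so here is a small sketch of how the @StartTime/@EndTime values relate to the business date. The function is illustrative helper code, not part of any Fabric API.

```python
from datetime import date, datetime, timedelta

def day_bounds(business_date: date):
    """Half-open [start, end) range equivalent to
    CONVERT(date, OrderTimestamp) = @OrderDate.

    Use in T-SQL as:
      WHERE OrderTimestamp >= @StartTime AND OrderTimestamp < @EndTime
    so the column stays bare and the engine can eliminate data.
    """
    start = datetime.combine(business_date, datetime.min.time())
    return start, start + timedelta(days=1)

start, end = day_bounds(date(2024, 5, 1))
print(start, end)  # 2024-05-01 00:00:00 2024-05-02 00:00:00
```

The strict `<` on the upper bound matters: it keeps midnight of the next day out of the result while including every timestamp on the business date, exactly matching the original daily filter.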
Topic: Monitor and Optimize an Analytics Solution
A Fabric workspace contains a Warehouse used by the Sales semantic model. Reports for the current business day show no sales after the scheduled refresh.
Exhibit: Monitoring evidence
02:00 Pipeline LoadSales started
02:44 Notebook BuildGoldTables succeeded
02:52 T-SQL activity LoadFactSales started
03:00 Sales semantic model refresh started
03:06 Sales semantic model refresh succeeded
03:18 T-SQL activity LoadFactSales succeeded
What should you do to prevent this issue from recurring?
Options:
A. Move the notebook to a separate workspace.
B. Enable incremental refresh on FactSales.
C. Increase the semantic model refresh timeout.
D. Orchestrate the refresh after LoadFactSales succeeds.
Best answer: D
Explanation: The semantic model refresh is succeeding, but it runs before the upstream Warehouse load completes. The fix is to make the refresh dependent on the successful completion of the Fabric data engineering process.
When a semantic model depends on Fabric pipeline, notebook, Dataflow Gen2, or Warehouse activities, the refresh schedule must align with upstream data readiness. In this case, monitoring shows the semantic model refresh completed at 03:06, but the LoadFactSales activity did not finish until 03:18. The model therefore cached data before the new fact rows were available. Orchestrating the semantic model refresh as a downstream step after the load succeeds prevents stale or missing data from being published.
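The failure pattern in the timeline can be expressed as a one-line check: a refresh that finishes before its upstream load finishes has cached stale data. The sketch below uses the HH:MM strings from the monitoring evidence; zero-padded times compare correctly as strings, which keeps the illustration minimal.

```python
def refresh_is_stale(load_finish: str, refresh_finish: str) -> bool:
    """Flag a semantic model refresh that completed before its
    upstream load finished (timestamps as zero-padded HH:MM strings,
    matching the monitoring timeline above)."""
    return refresh_finish < load_finish  # lexicographic works for HH:MM

# The scenario: LoadFactSales finished 03:18, refresh finished 03:06.
assert refresh_is_stale("03:18", "03:06")
# With the refresh orchestrated after the load, it cannot be stale.
assert not refresh_is_stale("03:18", "03:25")
```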
Topic: Monitor and Optimize an Analytics Solution
A Fabric pipeline triggers a Dataflow Gen2 refresh that reads CSV files from a OneLake shortcut, transforms the data, and writes to a Lakehouse table. The refresh fails before any rows are written.
Refresh output excerpt:
Entity: SalesStaging
Step: Changed Type
Column: NetAmount
Error: DataFormat.Error: We couldn't convert to Number.
Details: N/A
Which issue should you identify?
Options:
A. Missing pipeline permission on the Lakehouse item
B. Unavailable OneLake shortcut target
C. Invalid numeric conversion in a transformation step
D. Schema mismatch during Lakehouse table write
Best answer: C
Explanation: The failure occurs in the Dataflow Gen2 transformation output, not during pipeline orchestration or Lakehouse writing. The Changed Type step attempted to convert NetAmount to a number, but a value such as N/A could not be converted.
Dataflow Gen2 refresh errors often identify the entity, applied step, column, and Power Query error message. In this case, the named step is Changed Type, the affected column is NetAmount, and the message says the value could not be converted to a number. Because the refresh fails before any rows are written, the root cause is in the transformation logic or source data quality, not the destination write path.
The appropriate resolution would be to adjust the type conversion, replace or filter invalid values, or add error-handling logic before the numeric conversion step.
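As an analogue of that resolution, the sketch below maps known bad values to null before the numeric conversion, which is what a replace-values or error-handling step placed before Changed Type would do in Power Query. The sentinel list and thousands-separator handling are assumptions about the source data, not facts from the error message.

```python
def to_number(raw: str, sentinels=("N/A", "", "NULL")):
    """Convert a text amount to float, mapping known bad values to None.

    Mirrors the Power Query fix: replace or filter invalid values
    before the Changed Type step converts NetAmount to a number.
    """
    value = raw.strip()
    if value in sentinels:
        return None
    try:
        return float(value.replace(",", ""))  # tolerate "1,234.50"
    except ValueError:
        return None  # route genuinely unparseable rows to review

assert to_number("1,234.50") == 1234.5
assert to_number("N/A") is None
```

Whether bad rows become nulls or are filtered out entirely is a data-quality decision for the team; the key is that the decision happens before the type conversion, so the refresh no longer fails.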
Topic: Monitor and Optimize an Analytics Solution
A Fabric pipeline loads transaction rows every hour into a Warehouse table named dbo.FactSales. The load meets its SLA, and Query insights shows the bottleneck is a T-SQL query that scans the warehouse table. Analysts must keep hourly freshness and transaction-level drill-through.
SELECT CustomerId, SUM(Amount) AS SalesAmount
FROM dbo.FactSales
WHERE CAST(SaleDateTime AS date) = @BusinessDate
GROUP BY CustomerId;
Which improvement should you make?
Options:
A. Rewrite the filter as a sargable date-time range.
B. Add a notebook to pre-aggregate sales before loading.
C. Move the source ingestion to an Eventstream.
D. Reduce the pipeline schedule from hourly to daily.
Best answer: A
Explanation: The stated bottleneck is the warehouse query, not ingestion or orchestration. Rewriting the predicate to avoid applying a function to the filtered column preserves the workload requirements while improving SQL query performance.
For a Warehouse query performance issue, optimize the SQL statement before changing a pipeline or ingestion pattern. Applying CAST to SaleDateTime in the WHERE clause can prevent efficient predicate evaluation. A sargable range, such as SaleDateTime >= @StartDateTime AND SaleDateTime < @EndDateTime, expresses the same business-date filter while giving the optimizer a better predicate to work with.
Changing the pipeline cadence, adding a preprocessing notebook, or switching ingestion technology targets a different layer and can violate the hourly freshness or transaction-level detail requirements.
Topic: Monitor and Optimize an Analytics Solution
You support a Fabric production workspace that contains a Dataflow Gen2 item used to ingest finance data from an on-premises SQL Server through a gateway and load a Warehouse table. After deployment from Dev to Prod, the scheduled refresh fails.
DataSource.Error: The connection 'PersonalGateway-FinanceSql' is not available to this user.
Action: Reconfigure credentials or select another connection.
The production refresh must run unattended, must not depend on an analyst’s personal credentials, and the transformations must remain in Dataflow Gen2. Which design should you implement?
Options:
A. Bind the dataflow to a shared gateway connection with production credentials
B. Pass the SQL password from a pipeline parameter
C. Grant the analyst Workspace Admin and keep the personal connection
D. Increase Fabric capacity and add refresh retries
Best answer: A
Explanation: Dataflow Gen2 source access is controlled by the configured connection and its stored credentials. The failure indicates that the deployed dataflow cannot use the personal gateway connection, so production should use a shared gateway connection with valid production credentials.
Dataflow Gen2 refreshes use the credentials and gateway binding stored in the selected connection. An error stating that the connection is unavailable or credentials must be reconfigured points to the connection object, not capacity or transformation logic. For production, create or update an on-premises gateway connection with a production database credential, share that connection with the dataflow owner or operators as needed, and update the dataflow source to use it. This keeps Power Query transformations in Dataflow Gen2 while removing dependence on a personal connection. Workspace Admin rights alone do not make a personal gateway connection suitable for unattended production refresh.
Topic: Monitor and Optimize an Analytics Solution
A Fabric workspace uses an Eventstream to load device telemetry into an Eventhouse table named Telemetry. A dashboard tile that should show only the last 2 hours is slow. Eventstream metrics show no backlog or dropped events, and capacity metrics show no throttling.
KQL query:
Telemetry
| where DeviceId == "D-431"
| summarize AvgTemp = avg(Temperature) by bin(EventTime, 1m)
Diagnostics:
Extents scanned: 100%
Highest CPU: Eventhouse query execution
Which change is the best fix?
Options:
A. Increase the Fabric capacity size.
B. Increase the Eventstream partition count.
C. Add an EventTime >= ago(2h) filter.
D. Run V-Order optimization on a Lakehouse table.
Best answer: C
Explanation: The evidence points to the Eventhouse query, not ingestion or capacity. Because the tile only needs the last 2 hours but the KQL query has no time predicate, Eventhouse scans all extents unnecessarily. Adding a time filter targets the actual bottleneck.
Eventhouse query performance should be optimized from the query diagnostics and the KQL pattern. Here, the Eventstream is healthy and capacity is not throttled, but the query scans 100% of extents while the business requirement is only a 2-hour window. Adding a selective EventTime >= ago(2h) predicate early in the KQL query lets the engine reduce the amount of data scanned before aggregation.
A capacity increase is a generic response and does not address the missing filter. The key takeaway is to tune the specific engine and item identified by the evidence.
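The corrected query and its effect can be sketched as follows. The KQL in the comment is the fix described above, with the time predicate placed first so the engine can prune extents before any other work; the Python function is only an analogue showing that old rows are discarded before aggregation.

```python
from datetime import datetime, timedelta

# Corrected KQL (sketch) -- filter on time first so Eventhouse can
# prune extents before filtering and summarizing:
#   Telemetry
#   | where EventTime >= ago(2h)
#   | where DeviceId == "D-431"
#   | summarize AvgTemp = avg(Temperature) by bin(EventTime, 1m)

def recent(events, now, window=timedelta(hours=2)):
    """Python analogue: drop rows outside the 2-hour window before
    any per-device filtering or aggregation happens."""
    return [e for e in events if e["EventTime"] >= now - window]

now = datetime(2024, 5, 1, 12, 0)
events = [{"EventTime": datetime(2024, 5, 1, 11, 30)},   # in window
          {"EventTime": datetime(2024, 5, 1, 8, 0)}]     # pruned
assert len(recent(events, now)) == 1
```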
Use the Microsoft DP-700 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Try Microsoft DP-700 on Web · View the Microsoft DP-700 Practice Test
Read the Microsoft DP-700 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.