Free Microsoft DP-750 Practice Questions: Deploy and Maintain Data Pipelines and Workloads

Practice 10 free Microsoft Certified: Azure Databricks Data Engineer Associate (Microsoft DP-750) questions on Deploy and Maintain Data Pipelines and Workloads, with answers, explanations, and the IT Mastery next step.

Try the IT Mastery web app for a richer interactive practice experience with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

Try Microsoft DP-750 on Web

Topic snapshot

Field | Detail
Exam route | Microsoft DP-750
Topic area | Deploy and Maintain Data Pipelines and Workloads
Blueprint weight | 32%
Page purpose | Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Deploy and Maintain Data Pipelines and Workloads for Microsoft DP-750. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass | What to do | What to record
First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer.
Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor.
Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter.
Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 32% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These are original IT Mastery practice questions aligned to this topic area. They are not official Microsoft questions, copied live-exam content, or exam dumps. Use them for self-assessment, scope review, and deciding what to drill next.

Question 1

Topic: Deploy and Maintain Data Pipelines and Workloads

A CI service principal deploys a Databricks Asset Bundle for a Lakeflow Jobs workflow. The workflow writes to ${var.catalog}.silver.orders. The service principal has the required privileges only on the production Unity Catalog objects. You must preserve the production workspace target and least-privilege access.

Evidence:

targets:
  dev:
    workspace:
      host: https://adb-dev.example.azuredatabricks.net
    variables:
      catalog: sales_dev
  prod:
    workspace:
      host: https://adb-prod.example.azuredatabricks.net
    variables:
      catalog: sales_dev
Command: databricks bundle deploy -t prod
Target: prod
Workspace: https://adb-prod.example.azuredatabricks.net
Run failed: PERMISSION_DENIED: Missing USE CATALOG on catalog 'sales_dev'

Which change should you implement?

Options:

  • A. Recreate the workflow by using the Databricks REST API.

  • B. Set the prod target catalog variable to sales_prod and redeploy with -t prod.

  • C. Grant the CI service principal USE CATALOG on sales_dev.

  • D. Redeploy the bundle with databricks bundle deploy -t dev.

Best answer: B

Explanation: Databricks Asset Bundle targets define both where resources deploy and which target-specific variables are used. The CLI evidence confirms the deployment used the prod target and the production workspace host, so the workspace target is not the failure point. The failing catalog is sales_dev, which comes from the prod target variable in the bundle file. Because the CI service principal already has production Unity Catalog privileges, the least-privilege fix is to correct the production target’s catalog variable to the production catalog and redeploy to prod. Granting access to the dev catalog would hide the misconfiguration and allow a production workflow to write to the wrong governance boundary.

  • Granting dev access fails because it expands privileges and keeps the production workflow pointed at sales_dev.
  • Using the dev target fails because it deploys to the dev workspace instead of preserving production.
  • Switching to REST fails because the deployment mechanism is not the cause; the bundle target variable is wrong.
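
The misconfiguration in the evidence could be caught before deployment. The following is a minimal sketch, assuming the bundle's targets have already been parsed into a plain Python dict. The `find_shared_catalogs` helper is hypothetical and is not part of the Databricks CLI; some teams may share a catalog across targets on purpose, so treat a hit as a prompt to review rather than a hard failure.

```python
# Hypothetical pre-deploy lint (not a Databricks CLI feature): flag targets
# whose catalog variable duplicates another target's value.
from collections import defaultdict

# Targets copied from the bundle excerpt in the question.
targets = {
    "dev": {"variables": {"catalog": "sales_dev"}},
    "prod": {"variables": {"catalog": "sales_dev"}},
}

def find_shared_catalogs(targets):
    """Return {catalog: [target names]} for catalogs used by more than one target."""
    by_catalog = defaultdict(list)
    for name, cfg in targets.items():
        catalog = cfg.get("variables", {}).get("catalog")
        if catalog is not None:
            by_catalog[catalog].append(name)
    return {c: sorted(names) for c, names in by_catalog.items() if len(names) > 1}

shared = find_shared_catalogs(targets)
print(shared)  # {'sales_dev': ['dev', 'prod']}
```

After the prod variable is corrected to sales_prod, the same check returns an empty dict.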

Question 2

Topic: Deploy and Maintain Data Pipelines and Workloads

An engineering team moved a multi-hop ingestion flow into a Lakeflow Job. The first task runs a notebook that previously served as a Lakeflow Spark Declarative Pipeline source. The run fails before downstream tasks start.

Run excerpt:

Task: build_bronze
Task type: Notebook
Notebook includes: streaming table declarations and expectations
Error: Declarative pipeline dataset definitions are not valid in this task context

What is the best root cause?

Options:

  • A. The downstream task dependency is configured in the wrong order.

  • B. The job schedule is disabled for the ingestion workflow.

  • C. Unity Catalog permissions are missing on the target schema.

  • D. Pipeline definitions are being run as a notebook job task.

Best answer: D

Explanation: The evidence points to a tool-boundary issue, not an orchestration dependency or schedule problem. Lakeflow Spark Declarative Pipelines are used to define pipeline datasets, dependencies inferred from those definitions, and data-quality expectations. Lakeflow Jobs are used to orchestrate tasks, dependencies, triggers, parameters, and retries. If declarative pipeline source code is executed directly as a standard notebook task, pipeline-specific dataset declarations are outside their intended execution context. The flow should remain defined as a Lakeflow Spark Declarative Pipeline and, if needed, be invoked or coordinated by a Lakeflow Job.

  • Dependency order does not explain why the first task fails before downstream tasks start.
  • Disabled schedule would prevent or delay triggering, not cause a running task to reject declarative pipeline definitions.
  • Missing schema permissions would usually surface as an access-denied error, not as an invalid task-context error.

Question 3

Topic: Deploy and Maintain Data Pipelines and Workloads

An Azure Databricks Lakeflow Job serves analyst queries against the Delta table prod.sales.events. The table is already partitioned by event_date. Most queries filter a 1- to 7-day date range and a small set of customer_id values.

Evidence | Observation
Partition pruning | Reads only matching event_date partitions
File pruning | Scans 80-90% of files inside those partitions
Compute profile | No spills or skew; Photon is enabled
Scaling test | Doubling workers improved runtime by 3%

Which improvement should you configure next?

Options:

  • A. Double the job compute worker count.

  • B. Repartition the table by customer_id.

  • C. Schedule OPTIMIZE ZORDER BY (customer_id) on recent partitions.

  • D. Rewrite queries to remove event_date filters.

Best answer: C

Explanation: The workload already benefits from partition pruning on event_date, so changing the partitioning strategy is not the next best improvement. The remaining bottleneck is that queries scan most files inside the selected date partitions when filtering by customer_id. Clustering the data within those partitions, such as with OPTIMIZE ZORDER BY (customer_id), improves file-level data skipping for the selective customer filter. The compute evidence also matters: no spills, no skew, Photon enabled, and almost no gain from doubling workers indicate the issue is table layout rather than compute capacity.

  • Customer partitions are risky because high-cardinality partitioning can create many small partitions and metadata overhead.
  • More workers is weak because the scaling test showed only a small runtime improvement.
  • Removing date filters would defeat effective partition pruning and increase the amount of data scanned.
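
As an illustration of limiting Z-ordering to recent partitions, here is a minimal sketch that builds the SQL statement a scheduled task might run. The `build_zorder_statement` helper, the 7-day lookback, and the example date are assumptions made for illustration. The `WHERE` clause restricts `OPTIMIZE` to a partition-column predicate, which is why it filters on event_date.

```python
# Sketch: build an OPTIMIZE ... ZORDER BY statement scoped to recent partitions.
# The helper name, lookback window, and example date are illustrative assumptions.
from datetime import date, timedelta

def build_zorder_statement(table, zorder_col, partition_col, as_of, lookback_days):
    """Return SQL that Z-orders only partitions on or after the lookback start."""
    start = as_of - timedelta(days=lookback_days)
    return (
        f"OPTIMIZE {table} "
        f"WHERE {partition_col} >= '{start.isoformat()}' "
        f"ZORDER BY ({zorder_col})"
    )

stmt = build_zorder_statement(
    "prod.sales.events", "customer_id", "event_date", date(2025, 1, 8), 7
)
print(stmt)
# OPTIMIZE prod.sales.events WHERE event_date >= '2025-01-01' ZORDER BY (customer_id)
```

In a Databricks notebook or job task, the resulting string could be passed to `spark.sql(stmt)`.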

Question 4

Topic: Deploy and Maintain Data Pipelines and Workloads

A team deploys an Azure Databricks Lakeflow Jobs workflow from a CI pipeline. Developers changed the Databricks Asset Bundle to add a new task, but the release still runs an older workflow. The CI log shows:

Step: deploy_prod
Action: POST /api/2.1/jobs/update
Payload: jobs/prod-job.json
Bundle target: prod
Result: succeeded

The team requires command-line deployment automation. What is the best next diagnostic step?

Options:

  • A. Add a Unity Catalog grant on the target schema

  • B. Edit the production job manually in the workspace UI

  • C. Run the bundle deployment with Databricks CLI

  • D. Increase the job compute autoscaling range

Best answer: C

Explanation: For production deployment automation with Databricks Asset Bundles, the Databricks CLI is the appropriate command-line tool to validate and deploy the bundle target, such as prod. The log shows the pipeline is patching a standalone REST JSON payload, so changes in the bundle may not be reflected in the deployed workflow. The diagnostic focus should be whether CI is invoking the bundle through the CLI with the correct target and authentication, not whether the job can be edited manually.

The key takeaway is to keep command-line deployment automation tied to the bundle source of truth.

  • Manual UI edit fails because it bypasses the required automated deployment path.
  • Compute scaling does not explain why the newly added task is missing after deployment.
  • Catalog grants would affect data access at runtime, not whether the workflow definition was deployed from the bundle.
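
A CI step that keeps deployment tied to the bundle might look like the sketch below. It assumes the Databricks CLI is installed and authenticated on the CI runner. The `bundle_commands` and `deploy_bundle` helpers are hypothetical wrappers; only the `databricks bundle validate` and `databricks bundle deploy` commands with the `-t` target flag come from the CLI itself.

```python
# Sketch of a CI deploy step that validates and deploys the bundle target
# through the Databricks CLI instead of patching a standalone job JSON file.
import subprocess

def bundle_commands(target):
    """Return the CLI invocations for one bundle target, in execution order."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
    ]

def deploy_bundle(target, runner=subprocess.run):
    """Run validate then deploy; check=True stops at the first failing command."""
    for cmd in bundle_commands(target):
        runner(cmd, check=True)

# In CI this would be called as deploy_bundle("prod") from the bundle root.
```

Injecting `runner` keeps the function testable without a workspace: a stub can record the commands instead of executing them.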

Question 5

Topic: Deploy and Maintain Data Pipelines and Workloads

A team stores an Azure Databricks Lakeflow Jobs workflow and notebooks in Git and packages them with Databricks Asset Bundles. A change to a transformation notebook must be validated before it can be deployed to the production catalog. Tests must run against a nonproduction Unity Catalog schema, and production deployment must occur only after the pull request is approved and checks pass. Which implementation should the team use?

Options:

  • A. Merge first, then run manual tests against the production catalog

  • B. Run bundle tests in a PR target, then deploy production from main

  • C. Run local sample-data tests, then deploy production from the feature branch

  • D. Deploy to production, then run a scheduled validation job afterward

Best answer: B

Explanation: For Azure Databricks code and pipeline changes, tests should be part of the pull request validation path before production deployment. With Databricks Asset Bundles, define separate targets such as test and production, point the test target to a nonproduction Unity Catalog schema, and run automated test tasks or test notebooks during CI. The pull request should require successful checks and approval before merge. Production deployment should then run only from the protected main branch using the production bundle target. This preserves environment isolation and prevents unvalidated code from reaching production.

  • Testing after merge fails because production can receive unvalidated code before the manual test result is known.
  • Local-only testing fails because it does not validate the deployed Databricks workflow against the required nonproduction Unity Catalog schema.
  • Post-deployment validation fails because it detects problems only after the production deployment has already occurred.

Question 6

Topic: Deploy and Maintain Data Pipelines and Workloads

An Azure Databricks team packages Lakeflow Jobs and SQL objects in a Databricks Asset Bundle. The bundle resources already reference ${var.catalog} and ${var.schema}. A production release pipeline must deploy the same bundle to the production workspace and use the prod Unity Catalog catalog and schema. Which bundle configuration should be used?

Options:

  • A. Set only the CLI profile to the production workspace.

  • B. Define a prod target with production workspace and variable overrides.

  • C. Hard-code prod catalog and schema in all resource definitions.

  • D. Create a separate bundle file for each production release.

Best answer: B

Explanation: Databricks Asset Bundles are designed to use targets for environment-specific deployment settings. A prod target can specify the production workspace, deployment mode, and variable overrides such as the Unity Catalog catalog and schema used by resources. The release pipeline can then deploy with the production target, for example by selecting prod, while keeping the bundle source reusable across environments. This avoids hard-coding production values into shared resources and keeps deployment behavior explicit and repeatable. A CLI profile may authenticate to a workspace, but it does not by itself change bundle variables or model the target environment.

  • Hard-coded values make the bundle less reusable and risk pushing production names into development deployments.
  • CLI profile only can select credentials or a host, but it does not set ${var.catalog} or ${var.schema}.
  • Separate bundle files increase drift risk and are unnecessary when targets provide environment-specific configuration.
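
To make the variable-override idea concrete, the following toy model shows how one resource definition can resolve differently per target. It is a simplified sketch and not the CLI's actual resolver. The `resolve` helper, the default values, and the schema names are hypothetical; only the `${var.catalog}` and `${var.schema}` placeholders come from the question.

```python
# Toy model of target-specific variable overrides (not the Databricks CLI resolver).
import re

# Hypothetical bundle: defaults at the top level, overrides under the prod target.
bundle = {
    "variables": {
        "catalog": {"default": "sales_dev"},
        "schema": {"default": "staging"},
    },
    "targets": {
        "dev": {},
        "prod": {"variables": {"catalog": "sales_prod", "schema": "curated"}},
    },
}

def resolve(text, bundle, target):
    """Substitute ${var.name} using defaults, then the selected target's overrides."""
    values = {name: spec.get("default") for name, spec in bundle["variables"].items()}
    values.update(bundle["targets"][target].get("variables", {}))
    return re.sub(r"\$\{var\.(\w+)\}", lambda m: values[m.group(1)], text)

table_ref = "${var.catalog}.${var.schema}.orders"
print(resolve(table_ref, bundle, "dev"))   # sales_dev.staging.orders
print(resolve(table_ref, bundle, "prod"))  # sales_prod.curated.orders
```

The resource text stays identical across environments; only the selected target changes the resolved names.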

Question 7

Topic: Deploy and Maintain Data Pipelines and Workloads

A production Lakeflow Job in Azure Databricks loads curated Delta tables each hour. The workspace diagnostic setting already streams job run logs to a Log Analytics workspace, and an Azure Monitor action group for the on-call team already exists. You must notify the team when any production job run fails, without changing notebook or pipeline code. What should you implement?

Options:

  • A. Add a Lakeflow task that writes failures to a Delta audit table.

  • B. Create a cluster policy that prevents nonproduction compute from running.

  • C. Enable email notifications in the Lakeflow Job settings for failed runs.

  • D. Create an Azure Monitor scheduled query alert on the Log Analytics job logs and attach the action group.

Best answer: D

Explanation: Azure Monitor alerts are appropriate when Azure Databricks operational events are already streamed to Log Analytics and the required response is notification through an Azure Monitor action group. A scheduled query alert can filter the Databricks job run logs for production job failures, evaluate the result on a defined frequency, and trigger the existing on-call action group. This preserves the constraint because it does not require changes to notebooks, Lakeflow pipeline code, or job task logic. Databricks-native job notifications can be useful, but they would not route through the Azure Monitor action group that the on-call team already uses.

  • Job email notification may alert on failures, but it bypasses the existing Azure Monitor action group and must be configured per job.
  • Audit table task changes workload logic and only records failures unless another alerting mechanism is added.
  • Cluster policy controls compute configuration and does not detect or notify on failed job runs.

Question 8

Topic: Deploy and Maintain Data Pipelines and Workloads

A data engineering team runs a nightly Lakeflow Jobs workload on Azure Databricks job compute. They need ongoing monitoring that can flag clusters that are consistently idle, CPU-saturated, or memory-constrained so they can resize or tune the workload. Which configuration best supports this requirement?

Options:

  • A. Stream cluster metrics to Log Analytics and create Azure Monitor alerts

  • B. Enable task alerts only for failed Lakeflow Jobs tasks

  • C. Set an automatic restart policy on the job compute

  • D. Grant users access to the Spark UI for the cluster

Best answer: A

Explanation: For ongoing cluster consumption monitoring, configure Azure Databricks telemetry to flow into Log Analytics in Azure Monitor, then define Azure Monitor alerts against resource metrics such as CPU utilization, memory pressure, and sustained idle time. This supports operational detection of underusing, overusing, or misusing compute resources without relying on manual inspection after each run.

Job task alerts and automatic restarts help with failure handling, but they do not show whether the cluster size matches the workload. Spark UI access is useful for interactive troubleshooting, but it is not an alerting configuration for continuous resource-consumption monitoring.

  • Failure alerts miss healthy but inefficient runs, such as clusters that are oversized or consistently CPU-bound.
  • Automatic restart improves recovery behavior but does not measure or classify compute utilization.
  • Spark UI access helps investigate a single run manually, but it does not provide ongoing alerting across runs.

Question 9

Topic: Deploy and Maintain Data Pipelines and Workloads

An Azure Databricks team streams Spark execution logs to Log Analytics in Azure Monitor for a slow Lakeflow Jobs task. They want an Azure Monitor alert that specifically identifies data skew when one partition causes imbalanced task execution. Which alert condition should they configure?

Options:

  • A. One stage has a much higher maximum task duration than its median task duration

  • B. The job run duration exceeds its expected service-level objective

  • C. The query profile shows a high total shuffle read volume

  • D. The cluster CPU utilization stays below the target utilization range

Best answer: A

Explanation: Data skew is identified by uneven work distribution across tasks in the same stage. In Spark UI, query profiles, or streamed execution telemetry, the key symptom is imbalance: most tasks complete normally while one or a few tasks take much longer, often with much larger input or shuffle data. An alert based on the gap between maximum task duration and median task duration targets that imbalance directly. A slow overall run can have many causes, and high shuffle volume can be normal for large joins or aggregations. The skew-specific signal is the outlier task pattern within a stage.

  • Overall duration can indicate a missed SLA, but it does not distinguish skew from slow compute, queuing, or inefficient logic.
  • Low CPU utilization may suggest underused compute or waiting, but it is not a direct uneven-partition signal.
  • High shuffle volume indicates data movement, but skew requires imbalance across tasks, not just a large shuffle.
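
The max-versus-median condition can be sketched in a few lines. This is a minimal illustration that assumes per-task durations have already been extracted from the streamed telemetry into a dict. The `skewed_stages` helper, the threshold of 5, and the sample durations are all hypothetical.

```python
# Sketch: flag stages where the slowest task far exceeds the median task.
# The threshold and sample durations are illustrative assumptions.
from statistics import median

def skewed_stages(task_durations_by_stage, ratio_threshold=5.0):
    """Return {stage: max/median ratio} for stages at or above the threshold."""
    flagged = {}
    for stage, durations in task_durations_by_stage.items():
        if not durations:
            continue
        med = median(durations)
        if med > 0 and max(durations) / med >= ratio_threshold:
            flagged[stage] = max(durations) / med
    return flagged

# Task durations in seconds: stage_3 has one outlier task, stage_4 is balanced.
sample = {
    "stage_3": [12, 11, 13, 12, 240],
    "stage_4": [30, 28, 33, 31, 29],
}
print(skewed_stages(sample))  # {'stage_3': 20.0}
```

Using the median rather than the mean keeps the baseline from being pulled upward by the outlier task itself.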

Question 10

Topic: Deploy and Maintain Data Pipelines and Workloads

An Azure Databricks job loads a 9 TB Delta table, but analyst queries against main.sales.events are slow. The table has continuous appends and no current clustering. Most queries filter by customer_id and recent event_date.

Signal | Observation
Files scanned | 83,000 of 91,000
Selectivity | 0.3% rows match
Filter columns | customer_id, event_date
Distinct customer_id | about 1.8 million

Which optimization approach best follows the evidence while avoiding unnecessary maintenance?

Options:

  • A. Enable liquid clustering on customer_id and event_date.

  • B. Partition the table by customer_id.

  • C. Run VACUUM with a shorter retention period.

  • D. Schedule daily Z-ordering on customer_id.

Best answer: A

Explanation: The symptom is poor data skipping: the query is highly selective, but it scans nearly all files. Because customer_id has very high cardinality, partitioning by that column would create many small partition directories and ongoing layout management. Liquid clustering is a better fit for large Delta tables when query filters target high-cardinality columns and the workload needs improved pruning without rigid partition choices. It can cluster data around customer_id and event_date so fewer files must be read for common analyst queries.

Z-ordering can help in some cases, but it typically adds recurring manual optimization decisions. VACUUM manages old files for storage cleanup; it does not reorganize active data for query pruning.

  • High-cardinality partitions add many directories and small-file pressure when used with millions of customer values.
  • Daily Z-ordering may improve locality, but it creates more recurring maintenance than the better-fit clustering approach.
  • Shorter VACUUM retention removes obsolete files, but it does not address why selective queries scan most active files.
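
The gap between files scanned and rows matched can be quantified directly from the table, and the clustering change expressed as SQL. The sketch below assumes the table is not partitioned, since liquid clustering cannot be combined with partitioning. The `liquid_clustering_statements` helper is hypothetical.

```python
# Sketch: quantify the data-skipping gap, then build the statements that would
# enable liquid clustering. Assumes main.sales.events is not partitioned.
files_scanned, files_total = 83_000, 91_000
row_selectivity = 0.003  # 0.3% of rows match

scan_fraction = files_scanned / files_total
print(f"{scan_fraction:.1%} of files scanned for {row_selectivity:.1%} of rows")
# 91.2% of files scanned for 0.3% of rows

def liquid_clustering_statements(table, columns):
    """Return SQL to set clustering keys and then recluster existing data."""
    return [
        f"ALTER TABLE {table} CLUSTER BY ({', '.join(columns)})",
        f"OPTIMIZE {table}",
    ]

for stmt in liquid_clustering_statements("main.sales.events", ["customer_id", "event_date"]):
    print(stmt)
```

Setting the clustering keys affects newly written data; the `OPTIMIZE` run is what reorganizes data already in the table.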

Continue in the web app

Use IT Mastery for interactive Microsoft DP-750 practice with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

