Databricks Data Engineer Associate: Governance and Quality

Try 10 focused Databricks Data Engineer Associate questions on Governance and Quality, with explanations, then continue with IT Mastery.


Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Try Databricks Data Engineer Associate on Web, or view the full Databricks Data Engineer Associate practice page.

Topic snapshot

| Field | Detail |
| --- | --- |
| Exam route | Databricks Data Engineer Associate |
| Topic area | Data Governance & Quality |
| Blueprint weight | 35% |
| Page purpose | Focused sample questions before returning to mixed practice |

How to use this topic drill

Use this page to isolate Data Governance & Quality for Databricks Data Engineer Associate. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

| Pass | What to do | What to record |
| --- | --- | --- |
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |

Blueprint context: 35% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.

Question 1

Topic: Data Governance & Quality

A data engineer needs to give an external business partner read-only access to the latest curated orders data while keeping governance in Unity Catalog. They run:

CREATE SHARE partner_share;
ALTER SHARE partner_share ADD TABLE main.sales.gold_orders;
GRANT SELECT ON SHARE partner_share TO RECIPIENT retail_partner;

What is the best interpretation of this setup?

Options:

  • A. Direct query access to the provider’s external database through federation

  • B. Creation of an external table that the partner must attach to its storage

  • C. Write access to the shared table through Unity Catalog permissions

  • D. Governed read-only sharing of current table data without publishing a separate copy

Best answer: D

Explanation: The exhibit shows Delta Sharing SQL: a share is created, a table is added, and a recipient is granted SELECT on that share. This provides governed, read-only access to shared data without requiring the provider to first publish a separate exported copy for the partner.

Delta Sharing is Databricks’ governed data-sharing capability for giving internal or external recipients access to data such as Unity Catalog tables. In the exhibit, the provider creates a share, adds the gold_orders table to that share, and grants a recipient permission to read it. That means the partner consumes shared data through the share itself, while the provider keeps control over what is exposed. The important idea is controlled sharing, not creating and managing a full duplicate dataset for every consumer.

  • CREATE SHARE defines the share object.
  • ALTER SHARE ... ADD TABLE exposes a specific table.
  • GRANT SELECT ON SHARE ... TO RECIPIENT gives the recipient read-only access.
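The full provider-side flow can be sketched as follows. This is a minimal sketch that assumes the recipient `retail_partner` does not yet exist and uses open-sharing recipient creation; exact syntax varies by sharing mode and Databricks release:

```sql
-- Create the recipient first (open sharing shown; a Databricks-to-Databricks
-- recipient is created with USING ID '<metastore-id>' instead).
CREATE RECIPIENT retail_partner;

-- Define the share and expose one Unity Catalog table through it.
CREATE SHARE partner_share;
ALTER SHARE partner_share ADD TABLE main.sales.gold_orders;

-- Read-only by design: recipients can only SELECT through the share.
GRANT SELECT ON SHARE partner_share TO RECIPIENT retail_partner;
```

The provider retains control of the share contents; dropping the table from the share revokes the partner's visibility without touching the source data.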

The closest distractor is federation, which is for querying external systems in place rather than publishing Databricks-managed data to recipients.

  • Federation mix-up confuses querying an external source in place with sharing a Databricks table to a recipient.
  • Write access fails because recipients of a share get read-only access, not permission to modify the provider’s table.
  • External table fails because the SQL shown defines a share and recipient access, not storage attachment or external table creation.

Question 2

Topic: Data Governance & Quality

A team can query an external operational database through Lakehouse Federation in Unity Catalog. Which requirement most strongly suggests they should ingest the data into Databricks-managed Delta tables instead of relying only on federated access?

Options:

  • A. Running a few ad hoc read-only analyses on operational data

  • B. Applying Unity Catalog governance to external source objects

  • C. Querying current external data without copying it into Databricks

  • D. Building curated silver and gold datasets with repeated joins and quality checks

Best answer: D

Explanation: Lakehouse Federation is designed for governed access to external data in place. When the workload requires substantial transformation and curation into downstream datasets, ingesting the data into Delta tables is usually the better fit for Databricks processing and optimization.

Lakehouse Federation is best when data should remain in the external system and Databricks mainly needs to query it in place. Typical fits include lightweight exploration, governed read access, or occasional analysis without building a full ingestion pipeline. By contrast, if a team needs repeated heavy transformations, quality enforcement, and curated silver or gold datasets, that usually points to bringing the data into Databricks-managed Delta tables.

Materializing the data in the lakehouse supports a more complete engineering workflow: transformation, curation, and downstream consumption on data stored and managed by Databricks. Federation provides useful access, but it is not the best default for every performance- and curation-focused pipeline. The key distinction is simple in-place access versus a fully managed transformation workflow.

  • In-place querying matches a federation use case because the goal is to access current external data without moving it first.
  • Governed external access also fits federation because Unity Catalog can control access to federated objects.
  • Ad hoc read-only analysis usually does not justify ingesting and curating the data inside Databricks.
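When a team does decide to materialize a federated source, a common pattern is a one-time or scheduled CTAS from the foreign catalog into a Databricks-managed Delta table. This sketch uses illustrative names (`erp_live` foreign catalog, `main.bronze` target schema):

```sql
-- Copy external data into a managed Delta table so silver/gold
-- curation and quality checks run on Databricks-managed storage.
CREATE OR REPLACE TABLE main.bronze.orders_raw AS
SELECT *
FROM erp_live.sales.orders;
```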

Question 3

Topic: Data Governance & Quality

A platform admin is troubleshooting this error in an internal analytics workspace:

SELECT * FROM main.sales.orders
--> PERMISSION_DENIED

The producer and analytics workspaces are both attached to the same Unity Catalog metastore. The producer team assumed Delta Sharing was required because the teams use different workspaces, but the requirement is only read-only access for an internal Databricks group. What is the best next step?

Options:

  • A. Recreate the consumer as an open sharing recipient.

  • B. Grant Unity Catalog privileges directly to the analytics group.

  • C. Configure Lakehouse Federation between the workspaces.

  • D. Recreate the consumer as a Databricks recipient.

Best answer: B

Explanation: When both teams are already in the same Unity Catalog metastore, access should usually be managed with Unity Catalog permissions. The issue is a permission problem inside one governance boundary, not a case that requires Delta Sharing.

The core concept is choosing the simplest governed access pattern that matches the boundary. If both workspaces are attached to the same Unity Catalog metastore, the tables are already under the same governance layer, so the right fix is to grant the internal group the needed Unity Catalog privileges, typically USE CATALOG, USE SCHEMA, and SELECT.
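A minimal sketch of those grants, assuming the internal group is named `analytics_readers` (the group name is illustrative):

```sql
-- Read-only access for an internal group within the same metastore:
GRANT USE CATALOG ON CATALOG main TO `analytics_readers`;
GRANT USE SCHEMA  ON SCHEMA  main.sales TO `analytics_readers`;
GRANT SELECT      ON TABLE   main.sales.orders TO `analytics_readers`;
```

All three privileges are needed: SELECT alone fails if the group cannot traverse the catalog and schema.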

Delta Sharing is more appropriate when data must be shared across governance or platform boundaries, such as to another Databricks account/metastore or to a non-Databricks consumer. Lakehouse Federation is for querying external systems from Databricks, not for granting access to tables that are already governed in Unity Catalog. The key signal in the stem is that the producer and consumer are internal and share the same metastore.

  • Databricks recipient is unnecessary because the consumer is not outside the current Unity Catalog governance boundary.
  • Open sharing is meant for open-protocol or non-Databricks consumers, not an internal group already using the same metastore.
  • Lakehouse Federation connects Databricks to external data sources; it does not replace normal Unity Catalog permissions for internal tables.

Question 4

Topic: Data Governance & Quality

A daily Databricks workflow refreshes a staging table and now fails.

Exhibit:

Task: refresh_staging
Status: Failed
Error: [PATH_NOT_FOUND] Table location is missing
Catalog object: main.ops.orders_stg
Table type: EXTERNAL

The storage folder was deleted outside Databricks during cleanup. No external system needs direct file access, and the team wants Databricks to manage storage location and data cleanup automatically. What is the best next step?

Options:

  • A. Replace it with a Lakehouse Federation connection.

  • B. Recreate it as a Unity Catalog managed table.

  • C. Point the external table to a new storage path.

  • D. Run MSCK REPAIR TABLE on the table.

Best answer: B

Explanation: This failure happened because the external table depended on a storage location that was deleted outside Databricks. When no outside system needs file-level access and the goal is simpler lifecycle management, a Unity Catalog managed table is the best fit.

A managed table is the right choice when Databricks should handle both metadata and underlying data lifecycle. In this case, the workflow failed because the external table pointed to a storage path that someone removed outside Databricks. Since the team does not need other systems to access the files directly, keeping an external table only preserves the same operational risk.

With a Unity Catalog managed table, Databricks manages the table’s storage location and data lifecycle. That reduces dependence on separately maintained object-storage paths and matches the requirement for automatic cleanup when the table is dropped.

The key takeaway is that a missing-path failure plus a desire for Databricks-managed lifecycle points to a managed table.

  • Pointing the external table to a new path may restore access, but it keeps the same externally managed lifecycle that caused the failure.
  • MSCK REPAIR TABLE helps with partition metadata discovery, not a deleted table location or lifecycle ownership.
  • Lakehouse Federation is for querying external systems in place, not for storing and managing this Delta table’s files.
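One way to recreate `orders_stg` as a managed table, assuming the staging data can be re-landed or recomputed by the workflow (column names are illustrative):

```sql
-- Drop the broken external table definition.
DROP TABLE IF EXISTS main.ops.orders_stg;

-- Recreate it as a managed table: omitting the LOCATION clause means
-- Unity Catalog chooses and owns the storage path, and DROP TABLE
-- later cleans up the underlying data automatically.
CREATE TABLE main.ops.orders_stg (
  order_id BIGINT,
  status   STRING,
  load_ts  TIMESTAMP
);
```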

Question 5

Topic: Data Governance & Quality

An analytics team needs governed, read-only exploration of ERP data from Databricks. They want to avoid duplicating the source data unless it later becomes part of a pipeline. Based on the exhibit, what is the best interpretation?

CREATE FOREIGN CATALOG erp_live
USING CONNECTION sqlserver_ops
OPTIONS (database 'erp');

SELECT customer_id, status
FROM erp_live.sales.orders
LIMIT 10;

Options:

  • A. It creates managed Delta tables stored in Databricks-managed storage.

  • B. It uses Lakehouse Federation for governed queries without copying ERP data.

  • C. It starts Auto Loader to ingest ERP data into bronze.

  • D. It uses Delta Sharing to publish ERP data to recipients.

Best answer: B

Explanation: The exhibit uses CREATE FOREIGN CATALOG with a Unity Catalog connection, which is the Lakehouse Federation pattern for querying external data sources. This is appropriate when teams want governed access in Databricks while avoiding an extra copy of the source data.

The key signal is CREATE FOREIGN CATALOG ... USING CONNECTION. In Databricks, that indicates Lakehouse Federation: an external database is exposed through Unity Catalog so users can query it from Databricks with centralized governance, without first ingesting the data into Delta tables. That is a strong fit for exploratory analysis or controlled read access when reducing duplication matters.
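The foreign catalog in the exhibit presupposes that a Unity Catalog connection already exists. A sketch of how `sqlserver_ops` might have been defined; the host, port, credentials, and secret scope below are placeholders:

```sql
-- A connection object stores the endpoint and credentials once,
-- so foreign catalogs can reference it by name.
CREATE CONNECTION sqlserver_ops TYPE sqlserver
OPTIONS (
  host     'erp-db.example.com',
  port     '1433',
  user     'readonly_svc',
  password secret('erp_scope', 'erp_password')
);
```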

If the team later needs lakehouse-native transformations, optimized storage, or long-term retention, they can still ingest the data into Delta tables afterward. But the exhibit shows direct governed access to external data, not ingestion or outbound sharing. This differs from features that copy data into Databricks or publish Databricks data outward.

  • Sharing direction: the Delta Sharing option fails because the exhibit queries an external database from Databricks rather than publishing Databricks data to recipients.
  • Storage location: the managed-table option fails because no step in the exhibit copies data into Databricks-managed storage.
  • Ingestion pattern: the Auto Loader option fails because the exhibit defines a foreign catalog, not a file-based streaming ingestion workflow.

Question 6

Topic: Data Governance & Quality

A data team needs analysts to run ad hoc SQL from Databricks against a PostgreSQL operational database. The data must stay in the source system, access must be governed through Unity Catalog, and analysts may join the results to managed lakehouse tables. What is the best solution?

Options:

  • A. Create a Lakehouse Federation connection and foreign catalog.

  • B. Register the database as Unity Catalog external tables.

  • C. Ingest database exports with Auto Loader into Bronze Delta tables.

  • D. Use Delta Sharing to access the operational database tables.

Best answer: A

Explanation: Lakehouse Federation is the best fit when Databricks needs to query external database tables without copying them first. It keeps data in the source system while still using Unity Catalog governance, which matches the scenario’s core requirements.

Lakehouse Federation is used for in-place querying of external systems, such as relational databases, from Databricks. You create a connection and a foreign catalog in Unity Catalog, then users query the external objects with familiar SQL while permissions are governed centrally. That matches this scenario because the team explicitly does not want full ingestion, yet still needs governed access and the ability to join source data to managed lakehouse tables.

Ingestion is the better pattern when Databricks must store, transform, optimize, or retain the data inside a pipeline. Here, the requirement is governed access to live source data, so federation is the simpler and more appropriate choice.
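Once the connection and foreign catalog exist, analysts can join live PostgreSQL tables to managed lakehouse tables in a single query. In this sketch, `pg_ops` and both table names are illustrative:

```sql
-- Join a federated PostgreSQL table to a managed Delta table.
SELECT c.customer_id,
       c.region,
       o.order_total
FROM pg_ops.public.orders      AS o   -- foreign catalog: data stays in PostgreSQL
JOIN main.sales.dim_customers  AS c   -- managed Delta table in the lakehouse
  ON o.customer_id = c.customer_id
WHERE o.order_date >= current_date() - INTERVAL 7 DAYS;
```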

  • Ingesting database exports with Auto Loader copies data into Delta tables, which breaks the requirement to keep data in the source system.
  • Using Delta Sharing fits secure sharing of Delta tables and views, not live querying of an operational database from Databricks.
  • Registering Unity Catalog external tables applies to files in external storage, not relational database tables queried in place.

Question 7

Topic: Data Governance & Quality

A data team consumes a Delta Sharing table and can run its scheduled transformation job in either an AWS or Azure Databricks workspace.

Exhibit:

Job: daily_customer_rollup
Current workspace: Azure East US
Source type: Delta Sharing table
Source: `sales_provider.share.raw_orders`
Source cloud/region: AWS us-east-1
Data read per run: 3.2 TB
Output per run: 25 GB summary table
Largest extra charge: cross-cloud data transfer

Which next step best reduces unnecessary cross-cloud movement cost without creating a full second copy of the source data?

Options:

  • A. Replace the share with Lakehouse Federation

  • B. Materialize a full Azure managed copy of the source

  • C. Run the job in AWS and share only the summary

  • D. Keep the job in Azure and use serverless SQL

Best answer: C

Explanation: The exhibit shows that the repeated 3.2 TB read from AWS into Azure is driving the extra cost. Co-locating the transformation with the AWS-hosted shared data reduces cross-cloud movement by sending only the much smaller summary result.

Cross-cloud sharing lets a team access data without first copying it, but it does not eliminate network transfer charges when compute runs in another cloud. In this exhibit, the expensive pattern is a large 3.2 TB read from an AWS-hosted shared table into Azure for every run, while the output is only 25 GB. That is a strong signal to place the heavy transformation near the source data.

  • Read the shared data in AWS.
  • Perform the aggregation there.
  • Share or move only the smaller summary if Azure users still need it.

Changing the compute type or governance method does not change the cross-cloud data path, and creating a full duplicate violates the stated constraint. The key takeaway is to co-locate high-volume access with the data whenever possible.
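A sketch of the co-located pattern: the AWS workspace aggregates next to the shared source and publishes only the summary. The share and table names here are illustrative:

```sql
-- Runs in the AWS workspace, in the same cloud/region as the shared source.
CREATE OR REPLACE TABLE main.reporting.daily_customer_rollup AS
SELECT customer_id,
       DATE(order_ts)   AS order_date,
       SUM(order_total) AS total_spend
FROM sales_provider.share.raw_orders
GROUP BY customer_id, DATE(order_ts);

-- Only the ~25 GB summary crosses clouds, not the 3.2 TB source.
ALTER SHARE summary_share ADD TABLE main.reporting.daily_customer_rollup;
```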

  • Full copy would avoid repeated reads later, but it creates a second full dataset and the stem explicitly rules that out.
  • Serverless in Azure changes the compute service, not the fact that 3.2 TB is still read across clouds.
  • Lakehouse Federation is for querying external systems, not for optimizing an already shared Delta dataset in this scenario.

Question 8

Topic: Data Governance & Quality

A team uses Auto Loader to ingest JSON files into a Unity Catalog bronze table, then a Lakeflow Spark Declarative Pipeline builds silver and gold tables. The source system owner plans to change the upstream schema next week. Before approving the change, the team must identify which downstream tables and notebooks depend on that source by using a built-in Databricks governance capability. What is the best next action?

Options:

  • A. Redeploy the pipeline with Databricks Asset Bundles.

  • B. Use Unity Catalog lineage to inspect downstream dependencies.

  • C. Publish the bronze table through Delta Sharing.

  • D. Review workspace audit logs for recent reads and writes.

Best answer: B

Explanation: Use Unity Catalog lineage when the goal is impact analysis for an upstream change. It traces upstream and downstream relationships across governed data assets so the team can see which tables and notebooks may be affected before the schema change.

Unity Catalog lineage is the Databricks feature used for impact analysis when an upstream table or data source is going to change. In this scenario, data lands in a Unity Catalog bronze table and then flows through a Lakeflow Spark Declarative Pipeline into silver and gold layers, so lineage can trace how that source feeds downstream assets. That makes it the right built-in governance capability for estimating the blast radius of a schema change before deployment.

Lineage answers dependency questions, not activity, sharing, or deployment questions. Audit logs help with investigation and compliance, while Delta Sharing and Databricks Asset Bundles solve different problems. For dependency-aware change planning, lineage is the best fit.
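Beyond the Catalog Explorer UI, lineage is also queryable through system tables. A sketch that lists downstream consumers of a bronze table, assuming lineage system tables are enabled in the workspace and using an illustrative table name:

```sql
-- Find what reads from the bronze table before approving the schema change.
SELECT DISTINCT target_table_full_name,
       entity_type
FROM system.access.table_lineage
WHERE source_table_full_name = 'main.bronze.orders'
  AND event_time >= current_date() - INTERVAL 90 DAYS;
```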

  • Audit focus: Audit logs show who accessed or changed assets, not the downstream dependency graph needed for impact analysis.
  • Sharing mismatch: Delta Sharing is for securely sharing data with recipients, not for mapping internal dependencies.
  • Deployment mismatch: Databricks Asset Bundles package and deploy resources, but they do not identify which downstream assets depend on an upstream source.

Question 9

Topic: Data Governance & Quality

A provider shares a Unity Catalog table by using Delta Sharing with an external recipient on a different cloud. One week later, finance reports higher costs.

Exhibit: Monitoring note

Sharing method: Delta Sharing
Recipient location: different cloud

Last 7 days:
- SQL warehouse DBUs: 420 -> 419
- Job compute DBUs: 188 -> 191
- Outbound data transfer: 0.3 TB -> 2.8 TB

Which is the best interpretation of this cost pattern?

Options:

  • A. Delta table maintenance operations are the main new cost concern.

  • B. Unity Catalog permission evaluation on the shared table is the main new cost concern.

  • C. Cross-cloud data transfer from shared reads is the main new cost concern.

  • D. Provider-side Databricks compute for the recipient’s queries is the main new cost concern.

Best answer: C

Explanation: The exhibit shows compute staying essentially flat while outbound data transfer rises sharply after cross-cloud Delta Sharing begins. That pattern indicates a data-transfer or sharing-related cost concern, not a Databricks compute-cost increase.

This scenario is testing the difference between compute cost signals and cross-cloud sharing cost signals. SQL warehouse DBUs and job DBUs are common indicators of Databricks compute usage, and both remain nearly unchanged here. The only metric that changes materially is outbound data transfer, and the share is being consumed from a different cloud. That makes cross-cloud data movement the most likely new cost driver.

With Delta Sharing, a recipient can read shared data without the provider needing a matching increase in warehouse or job compute for each recipient query. If compute were the main issue, the provider would typically see a noticeable rise in DBUs. The key takeaway is to investigate transferred data volume and read patterns before tuning compute.

  • Provider compute is not the best fit because the warehouse and job DBU metrics stay almost flat.
  • Permission checks are governance metadata activity and do not explain the large jump in outbound transfer.
  • Table maintenance would usually show up as added compute activity rather than isolated transfer growth.

Question 10

Topic: Data Governance & Quality

A data engineering team publishes a Delta Sharing share from Unity Catalog to an outside supplier.

Recipient type configured: Databricks-to-Databricks
Supplier environment: non-Databricks analytics platform
Result: onboarding failed

The provider wants the simplest governed sharing model whenever possible, but this supplier must consume the data outside Databricks. Which statement best describes the best next step?

Options:

  • A. Keep Databricks-to-Databricks sharing and grant the supplier direct SELECT on the shared tables.

  • B. Replace Delta Sharing with Lakehouse Federation so the supplier can read the provider’s data externally.

  • C. Switch to open sharing; Databricks-to-Databricks is simpler to govern only when both parties use Databricks.

  • D. Move the data to external tables and share cloud storage access for equivalent governance.

Best answer: C

Explanation: The failure comes from a recipient-type mismatch, not from a table permission problem. Databricks-to-Databricks sharing is the easier governed choice when both provider and recipient use Databricks, but open sharing is the right fit when collaboration must extend to non-Databricks consumers.

At this level, Delta Sharing offers two common recipient patterns. Databricks-to-Databricks sharing is usually the simplest governed option because both sides stay within Databricks-native sharing workflows. But that convenience depends on the recipient also being a Databricks user.

In this scenario, onboarding fails because the supplier is outside Databricks. That means the best fix is to use open sharing for that recipient instead of trying to force a Databricks-to-Databricks setup.

  • Use Databricks-to-Databricks sharing for Databricks recipients.
  • Use open sharing for broader cross-platform collaboration.

The core tradeoff is simpler Databricks-native governance versus wider recipient compatibility.
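A sketch of the fix: drop the mismatched recipient and recreate it for open sharing, which issues a token-based activation link instead of binding to a Databricks metastore. Both recipient names below are illustrative:

```sql
-- Databricks-to-Databricks recipients are created with USING ID '<metastore-id>';
-- omitting it creates an open-sharing recipient with an activation link
-- that the supplier redeems from its non-Databricks platform.
DROP RECIPIENT IF EXISTS supplier_d2d;
CREATE RECIPIENT supplier_open
COMMENT 'Open sharing for non-Databricks analytics platform';

GRANT SELECT ON SHARE supplier_share TO RECIPIENT supplier_open;
```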

  • Granting direct SELECT access does not solve sharing to an outside organization that is not consuming data through the provider’s Databricks environment.
  • Lakehouse Federation is for querying external systems from Databricks, not for publishing Databricks data to external consumers.
  • Sharing cloud storage for external tables bypasses the managed Delta Sharing model and does not provide the same governed sharing experience.

Continue with full practice

Use the Databricks Data Engineer Associate Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try Databricks Data Engineer Associate on Web, or view the Databricks Data Engineer Associate Practice Test.

Free review resource

Read the Databricks Data Engineer Associate Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.

Revised on Thursday, May 14, 2026