Try 10 focused Databricks Data Engineer Associate questions on Governance and Quality, with explanations, then continue with IT Mastery.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
| Field | Detail |
|---|---|
| Exam route | Databricks Data Engineer Associate |
| Topic area | Data Governance & Quality |
| Blueprint weight | 35% |
| Page purpose | Focused sample questions before returning to mixed practice |
Use this page to isolate Data Governance & Quality for Databricks Data Engineer Associate. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 35% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.
Topic: Data Governance & Quality
A data engineer needs to give an external business partner read-only access to the latest curated orders data while keeping governance in Unity Catalog. They run:
CREATE SHARE partner_share;
ALTER SHARE partner_share ADD TABLE main.sales.gold_orders;
GRANT SELECT ON SHARE partner_share TO RECIPIENT retail_partner;
What is the best interpretation of this setup?
Options:
A. Direct query access to the provider’s external database through federation
B. Creation of an external table that the partner must attach to its storage
C. Write access to the shared table through Unity Catalog permissions
D. Governed read-only sharing of current table data without publishing a separate copy
Best answer: D
Explanation: The exhibit shows Delta Sharing SQL: a share is created, a table is added, and a recipient is granted SELECT on that share. This provides governed, read-only access to shared data without requiring the provider to first publish a separate exported copy for the partner.
Delta Sharing is Databricks’ governed data-sharing capability for giving internal or external recipients access to data such as Unity Catalog tables. In the exhibit, the provider creates a share, adds the gold_orders table to that share, and grants a recipient permission to read it. That means the partner consumes shared data through the share itself, while the provider keeps control over what is exposed. The important idea is controlled sharing, not creating and managing a full duplicate dataset for every consumer.
CREATE SHARE defines the share object. ALTER SHARE ... ADD TABLE exposes a specific table. GRANT SELECT ON SHARE ... TO RECIPIENT gives the recipient read-only access. The closest distractor is federation, which is for querying external systems in place rather than publishing Databricks-managed data to recipients.
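After the grant, the provider can confirm what the share exposes and who can read it. A minimal sketch using standard Delta Sharing inspection commands:
-- List the objects the share exposes.
SHOW ALL IN SHARE partner_share;
-- List the recipients granted access to the share.
SHOW GRANTS ON SHARE partner_share;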
Topic: Data Governance & Quality
A team can query an external operational database through Lakehouse Federation in Unity Catalog. Which requirement most strongly suggests they should ingest the data into Databricks-managed Delta tables instead of relying only on federated access?
Options:
A. Running a few ad hoc read-only analyses on operational data
B. Applying Unity Catalog governance to external source objects
C. Querying current external data without copying it into Databricks
D. Building curated silver and gold datasets with repeated joins and quality checks
Best answer: D
Explanation: Lakehouse Federation is designed for governed access to external data in place. When the workload requires substantial transformation and curation into downstream datasets, ingesting the data into Delta tables is usually the better fit for Databricks processing and optimization.
Lakehouse Federation is best when data should remain in the external system and Databricks mainly needs to query it in place. Typical fits include lightweight exploration, governed read access, or occasional analysis without building a full ingestion pipeline. By contrast, if a team needs repeated heavy transformations, quality enforcement, and curated silver or gold datasets, that usually points to bringing the data into Databricks-managed Delta tables.
Materializing the data in the lakehouse supports a more complete engineering workflow: transformation, curation, and downstream consumption on data stored and managed by Databricks. Federation is useful for access, but it is not the best default for every performance- and curation-focused pipeline. The key distinction is simple in-place access versus a fully managed transformation workflow.
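As a hedged illustration of that ingestion step, assuming a federated catalog named erp_live and a hypothetical bronze target:
-- Materialize federated data as a Databricks-managed Delta table so
-- downstream silver/gold logic runs on lakehouse-managed storage.
CREATE OR REPLACE TABLE main.sales.orders_bronze
AS SELECT * FROM erp_live.sales.orders;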
Topic: Data Governance & Quality
A platform admin is troubleshooting this error in an internal analytics workspace:
SELECT * FROM main.sales.orders
--> PERMISSION_DENIED
The producer and analytics workspaces are both attached to the same Unity Catalog metastore. The producer team assumed Delta Sharing was required because the teams use different workspaces, but the requirement is only read-only access for an internal Databricks group. What is the best next step?
Options:
A. Recreate the consumer as an open sharing recipient.
B. Grant Unity Catalog privileges directly to the analytics group.
C. Configure Lakehouse Federation between the workspaces.
D. Recreate the consumer as a Databricks recipient.
Best answer: B
Explanation: When both teams are already in the same Unity Catalog metastore, access should usually be managed with Unity Catalog permissions. The issue is a permission problem inside one governance boundary, not a case that requires Delta Sharing.
The core concept is choosing the simplest governed access pattern that matches the boundary. If both workspaces are attached to the same Unity Catalog metastore, the tables are already under the same governance layer, so the right fix is to grant the internal group the needed Unity Catalog privileges, typically USE CATALOG, USE SCHEMA, and SELECT.
Delta Sharing is more appropriate when data must be shared across governance or platform boundaries, such as to another Databricks account/metastore or to a non-Databricks consumer. Lakehouse Federation is for querying external systems from Databricks, not for granting access to tables that are already governed in Unity Catalog. The key signal in the stem is that the producer and consumer are internal and share the same metastore.
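A minimal sketch of the fix, assuming the analysts belong to a group named analytics_team (the group name is hypothetical):
-- Grant the three privileges an internal reader typically needs.
GRANT USE CATALOG ON CATALOG main TO `analytics_team`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `analytics_team`;
GRANT SELECT ON TABLE main.sales.orders TO `analytics_team`;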
Topic: Data Governance & Quality
A daily Databricks workflow refreshes a staging table and now fails.
Exhibit:
Task: refresh_staging
Status: Failed
Error: [PATH_NOT_FOUND] Table location is missing
Catalog object: main.ops.orders_stg
Table type: EXTERNAL
The storage folder was deleted outside Databricks during cleanup. No external system needs direct file access, and the team wants Databricks to manage storage location and data cleanup automatically. What is the best next step?
Options:
A. Replace it with a Lakehouse Federation connection.
B. Recreate it as a Unity Catalog managed table.
C. Point the external table to a new storage path.
D. Run MSCK REPAIR TABLE on the table.
Best answer: B
Explanation: This failure happened because the external table depended on a storage location that was deleted outside Databricks. When no outside system needs file-level access and the goal is simpler lifecycle management, a Unity Catalog managed table is the best fit.
A managed table is the right choice when Databricks should handle both metadata and underlying data lifecycle. In this case, the workflow failed because the external table pointed to a storage path that someone removed outside Databricks. Since the team does not need other systems to access the files directly, keeping an external table only preserves the same operational risk.
With a Unity Catalog managed table, Databricks manages the storage location and lifecycle. That reduces dependence on separately maintained object-storage paths and matches the requirement for automatic cleanup when the table is dropped.
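As a sketch, recreating the table as managed simply omits the LOCATION clause (the column definitions here are hypothetical):
-- Managed table: no LOCATION clause, so Unity Catalog owns the storage
-- path and removes the underlying data when the table is dropped.
CREATE OR REPLACE TABLE main.ops.orders_stg (
  order_id BIGINT,
  order_status STRING,
  loaded_at TIMESTAMP
);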
The key takeaway is that a missing-path failure plus a desire for Databricks-managed lifecycle points to a managed table.
MSCK REPAIR TABLE helps with partition metadata discovery, not with a deleted table location or lifecycle ownership.
Topic: Data Governance & Quality
An analytics team needs governed, read-only exploration of ERP data from Databricks. They want to avoid duplicating the source data unless it later becomes part of a pipeline. Based on the exhibit, what is the best interpretation?
CREATE FOREIGN CATALOG erp_live
USING CONNECTION sqlserver_ops
OPTIONS (database 'erp');
SELECT customer_id, status
FROM erp_live.sales.orders
LIMIT 10;
Options:
A. It creates managed Delta tables stored in Databricks-managed storage.
B. It uses Lakehouse Federation for governed queries without copying ERP data.
C. It starts Auto Loader to ingest ERP data into bronze.
D. It uses Delta Sharing to publish ERP data to recipients.
Best answer: B
Explanation: The exhibit uses CREATE FOREIGN CATALOG with a Unity Catalog connection, which is the Lakehouse Federation pattern for querying external data sources. This is appropriate when teams want governed access in Databricks while avoiding an extra copy of the source data.
The key signal is CREATE FOREIGN CATALOG ... USING CONNECTION. In Databricks, that indicates Lakehouse Federation: an external database is exposed through Unity Catalog so users can query it from Databricks with centralized governance, without first ingesting the data into Delta tables. That is a strong fit for exploratory analysis or controlled read access when reducing duplication matters.
If the team later needs lakehouse-native transformations, optimized storage, or long-term retention, they can still ingest the data into Delta tables afterward. But the exhibit shows direct governed access to external data, not ingestion or outbound sharing. This differs from features that copy data into Databricks or publish Databricks data outward.
Topic: Data Governance & Quality
A data team needs analysts to run ad hoc SQL from Databricks against a PostgreSQL operational database. The data must stay in the source system, access must be governed through Unity Catalog, and analysts may join the results to managed lakehouse tables. What is the best solution?
Options:
A. Create a Lakehouse Federation connection and foreign catalog.
B. Register the database as Unity Catalog external tables.
C. Ingest database exports with Auto Loader into Bronze Delta tables.
D. Use Delta Sharing to access the operational database tables.
Best answer: A
Explanation: Lakehouse Federation is the best fit when Databricks needs to query external database tables without copying them first. It keeps data in the source system while still using Unity Catalog governance, which matches the scenario’s core requirements.
Lakehouse Federation is used for in-place querying of external systems, such as relational databases, from Databricks. You create a connection and a foreign catalog in Unity Catalog, then users query the external objects with familiar SQL while permissions are governed centrally. That matches this scenario because the team explicitly does not want full ingestion, yet still needs governed access and the ability to join source data to managed lakehouse tables.
Ingestion is the better pattern when Databricks must store, transform, optimize, or retain the data inside a pipeline. Here, the requirement is governed access to live source data, so federation is the simpler and more appropriate choice.
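A hedged end-to-end sketch of that setup, with connection options and object names as placeholders:
-- Register the PostgreSQL source (in practice, reference a secret
-- rather than a literal password).
CREATE CONNECTION pg_ops TYPE postgresql
OPTIONS (host 'ops-db.example.com', port '5432',
         user 'readonly_user', password 'REDACTED');
-- Expose the database as a foreign catalog governed by Unity Catalog.
CREATE FOREIGN CATALOG pg_cat
USING CONNECTION pg_ops
OPTIONS (database 'appdb');
-- Analysts can then join source data to managed lakehouse tables.
SELECT o.order_id, c.segment
FROM pg_cat.public.orders AS o
JOIN main.sales.dim_customer AS c
  ON o.customer_id = c.customer_id;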
Topic: Data Governance & Quality
A data team consumes a Delta Sharing table and can run its scheduled transformation job in either an AWS or Azure Databricks workspace.
Exhibit:
Job: daily_customer_rollup
Current workspace: Azure East US
Source type: Delta Sharing table
Source: `sales_provider.share.raw_orders`
Source cloud/region: AWS us-east-1
Data read per run: 3.2 TB
Output per run: 25 GB summary table
Largest extra charge: cross-cloud data transfer
Which next step best reduces unnecessary cross-cloud movement cost without creating a full second copy of the source data?
Options:
A. Replace the share with Lakehouse Federation
B. Materialize a full Azure managed copy of the source
C. Run the job in AWS and share only the summary
D. Keep the job in Azure and use serverless SQL
Best answer: C
Explanation: The exhibit shows that the repeated 3.2 TB read from AWS into Azure is driving the extra cost. Co-locating the transformation with the AWS-hosted shared data reduces cross-cloud movement by sending only the much smaller summary result.
Cross-cloud sharing lets a team access data without first copying it, but it does not eliminate network transfer charges when compute runs in another cloud. In this exhibit, the expensive pattern is a large 3.2 TB read from an AWS-hosted shared table into Azure for every run, while the output is only 25 GB. That is a strong signal to place the heavy transformation near the source data.
Changing the compute type or governance method does not change the cross-cloud data path, and creating a full duplicate violates the stated constraint. Running the job in AWS cuts the cross-cloud movement from 3.2 TB of input per run to the 25 GB summary, a reduction of more than 99%. The key takeaway is to co-locate high-volume access with the data whenever possible.
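A minimal sketch of the sharing step in option C, with the share, table, and recipient names hypothetical:
-- After running the rollup in AWS, share only the small summary output.
CREATE SHARE IF NOT EXISTS rollup_share;
ALTER SHARE rollup_share ADD TABLE main.reporting.daily_customer_rollup;
GRANT SELECT ON SHARE rollup_share TO RECIPIENT azure_consumer;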
Topic: Data Governance & Quality
A team uses Auto Loader to ingest JSON files into a Unity Catalog bronze table, then a Lakeflow Spark Declarative Pipeline builds silver and gold tables. The source system owner plans to change the upstream schema next week. Before approving the change, the team must identify which downstream tables and notebooks depend on that source by using a built-in Databricks governance capability. What is the best next action?
Options:
A. Redeploy the pipeline with Databricks Asset Bundles.
B. Use Unity Catalog lineage to inspect downstream dependencies.
C. Publish the bronze table through Delta Sharing.
D. Review workspace audit logs for recent reads and writes.
Best answer: B
Explanation: Use Unity Catalog lineage when the goal is impact analysis for an upstream change. It traces upstream and downstream relationships across governed data assets so the team can see which tables and notebooks may be affected before the schema change.
Unity Catalog lineage is the Databricks feature used for impact analysis when an upstream table or data source is going to change. In this scenario, data lands in a Unity Catalog bronze table and then flows through a Lakeflow Spark Declarative Pipeline into silver and gold layers, so lineage can trace how that source feeds downstream assets. That makes it the right built-in governance capability for estimating the blast radius of a schema change before deployment.
Lineage answers dependency questions, not activity, sharing, or deployment questions. Audit logs help with investigation and compliance, while Delta Sharing and Databricks Asset Bundles solve different problems. For dependency-aware change planning, lineage is the best fit.
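Lineage is visible in Catalog Explorer, and accounts with lineage system tables enabled can also query it directly. A hedged sketch, with the source table name hypothetical:
-- Find downstream entities that read from the bronze table.
SELECT DISTINCT target_table_full_name, entity_type
FROM system.access.table_lineage
WHERE source_table_full_name = 'main.ops.bronze_orders';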
Topic: Data Governance & Quality
A provider shares a Unity Catalog table by using Delta Sharing with an external recipient on a different cloud. One week later, finance reports higher costs.
Exhibit: Monitoring note
Sharing method: Delta Sharing
Recipient location: different cloud
Last 7 days:
- SQL warehouse DBUs: 420 -> 419
- Job compute DBUs: 188 -> 191
- Outbound data transfer: 0.3 TB -> 2.8 TB
Which is the best interpretation of this cost pattern?
Options:
A. Delta table maintenance operations are the main new cost concern.
B. Unity Catalog permission evaluation on the shared table is the main new cost concern.
C. Cross-cloud data transfer from shared reads is the main new cost concern.
D. Provider-side Databricks compute for the recipient’s queries is the main new cost concern.
Best answer: C
Explanation: The exhibit shows compute staying essentially flat while outbound data transfer rises sharply after cross-cloud Delta Sharing begins. That pattern indicates a data-transfer or sharing-related cost concern, not a Databricks compute-cost increase.
This scenario is testing the difference between compute cost signals and cross-cloud sharing cost signals. SQL warehouse DBUs and job DBUs are common indicators of Databricks compute usage, and both remain nearly unchanged here. The only metric that changes materially is outbound data transfer, and the share is being consumed from a different cloud. That makes cross-cloud data movement the most likely new cost driver.
With Delta Sharing, a recipient can read shared data without the provider needing a matching increase in warehouse or job compute for each recipient query. If compute were the main issue, the provider would typically see a noticeable rise in DBUs. The key takeaway is to investigate transferred data volume and read patterns before tuning compute.
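To confirm that compute stayed flat, a hedged sketch against the billable usage system table (availability depends on account setup):
-- Compare DBU usage by SKU over the last seven days.
SELECT sku_name, SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), 7)
GROUP BY sku_name
ORDER BY dbus DESC;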
Topic: Data Governance & Quality
A data engineering team publishes a Delta Sharing share from Unity Catalog to an outside supplier.
Recipient type configured: Databricks-to-Databricks
Supplier environment: non-Databricks analytics platform
Result: onboarding failed
The provider wants the simplest governed sharing model whenever possible, but this supplier must consume the data outside Databricks. Which statement describes the best next step?
Options:
A. Keep Databricks-to-Databricks sharing and grant the supplier direct SELECT on the shared tables.
B. Replace Delta Sharing with Lakehouse Federation so the supplier can read the provider’s data externally.
C. Switch to open sharing; Databricks-to-Databricks is simpler to govern only when both parties use Databricks.
D. Move the data to external tables and share cloud storage access for equivalent governance.
Best answer: C
Explanation: The failure comes from a recipient-type mismatch, not from a table permission problem. Databricks-to-Databricks sharing is the easier governed choice when both provider and recipient use Databricks, but open sharing is the right fit when collaboration must extend to non-Databricks consumers.
Delta Sharing gives you two common patterns at this level. Databricks-to-Databricks sharing is usually the simplest governed option because both sides stay within Databricks-native sharing workflows. But that convenience depends on the recipient also being a Databricks user.
In this scenario, onboarding fails because the supplier is outside Databricks. That means the best fix is to use open sharing for that recipient instead of trying to force a Databricks-to-Databricks setup.
The core tradeoff is simpler Databricks-native governance versus wider recipient compatibility.
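A minimal sketch of the switch, assuming a share named supplier_share (names hypothetical):
-- Open sharing: omitting USING ID creates a token-based recipient whose
-- activation link can be used outside Databricks.
CREATE RECIPIENT supplier_x;
GRANT SELECT ON SHARE supplier_share TO RECIPIENT supplier_x;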
SELECT access does not solve sharing to an outside organization that is not consuming data through the provider’s Databricks environment.
Use the Databricks Data Engineer Associate Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Read the Databricks Data Engineer Associate Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.