SnowPro Data Engineer DEA-C02: Compute and Governance

Try 10 focused SnowPro Data Engineer DEA-C02 questions on Compute and Governance, with explanations, then continue with IT Mastery.


Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Try SnowPro Data Engineer DEA-C02 on Web, or view the full SnowPro Data Engineer DEA-C02 practice page.

Topic snapshot

Exam route: SnowPro Data Engineer DEA-C02
Topic area: Compute, Governance, Observability, and Performance
Blueprint weight: 16%
Page purpose: Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Compute, Governance, Observability, and Performance for SnowPro Data Engineer DEA-C02. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

  • First attempt. What to do: answer without checking the explanation first. What to record: the fact, rule, calculation, or judgment point that controlled your answer.
  • Review. What to do: read the explanation even when you were correct. What to record: why the best answer is stronger than the closest distractor.
  • Repair. What to do: repeat only missed or uncertain items after a short break. What to record: the pattern behind misses, not the answer letter.
  • Transfer. What to do: return to mixed practice once the topic feels stable. What to record: whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 16% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.

Question 1

Topic: Compute, Governance, Observability, and Performance

A data engineering team needs to reduce latency for a frequent point-lookup workload and wants the most targeted optimization.

Query:
SELECT *
FROM sales_fact
WHERE order_id = 'O-918274';

Query profile summary:
- Rows returned: 1
- Partitions scanned: 7,942 / 7,948
- Scan time: 93%
- Bytes spilled to local/remote storage: 0
- Warehouse: LARGE
- Frequency: ~40,000 executions/day

Which optimization approach best matches this bottleneck?

Options:

  • A. Define a cluster key on order_id

  • B. Resize the warehouse to 2XLARGE

  • C. Add search optimization on equality for order_id

  • D. Enable Query Acceleration Service

Best answer: C

Explanation: The bottleneck is micro-partition pruning, not warehouse memory or spill. A highly selective equality predicate that scans nearly every partition is the classic fit for Search Optimization Service on the filtered column.

This workload is dominated by scan time, and the profile shows almost every partition is being scanned to return a single row. That means the main problem is poor pruning for an exact-match lookup, not insufficient compute: there is no local or remote spill, and the warehouse is already fairly large.

Search Optimization Service is the most targeted fix for frequent, selective lookups like WHERE order_id = ... because it helps Snowflake find matching rows without scanning nearly all micro-partitions. That aligns directly with the observed bottleneck and avoids using broader compute-based acceleration for a pruning problem.

A larger warehouse addresses compute capacity rather than pruning, and clustering on a high-cardinality lookup column usually adds more maintenance overhead than this pattern needs.

  • Clustering mismatch: a cluster key is usually a less targeted choice for exact-match point lookups and can add ongoing reclustering cost.
  • More compute: resizing the warehouse does not address scanning almost every partition when the query is not spilling.
  • Wrong accelerator: Query Acceleration Service can help some scan-heavy workloads, but it is not the best first match for repeated equality lookups with poor pruning.
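
For reference, the change described above is a single DDL statement. A minimal sketch, reusing the table and column from the stem and assuming the account edition supports the Search Optimization Service:

ALTER TABLE sales_fact ADD SEARCH OPTIMIZATION ON EQUALITY(order_id);  -- target only equality lookups on order_id

DESCRIBE SEARCH OPTIMIZATION ON sales_fact;  -- confirm what is covered and watch the build progress

Scoping the optimization to EQUALITY(order_id) keeps the serverless maintenance cost focused on the one access pattern the profile shows.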

Question 2

Topic: Compute, Governance, Observability, and Performance

A team runs BI dashboards and hourly MERGE jobs on the same warehouse. Dashboards slow down only while the MERGE jobs are running, query history shows almost no queued_overload_time, and warehouse spend is within budget. Which term best describes the primary concern?

Options:

  • A. Warehouse cost

  • B. Query acceleration

  • C. Workload isolation

  • D. Warehouse queuing

Best answer: C

Explanation: This scenario indicates workload isolation. The stem rules out the other main concerns by stating that queue time is minimal and spending is acceptable, so the key issue is mixed workloads competing on the same warehouse.

This pattern points to workload isolation. In Snowflake, if dashboard queries become slow only when batch MERGE jobs are active on the same warehouse, the workloads are contending for shared compute resources. Because queued_overload_time is almost zero, the main issue is not that queries are waiting in line to start. Because spend is already within budget, cost is not the dominant concern either. The clearest signal is interference between interactive and batch activity on shared compute. A common design response is to separate those workloads onto different warehouses so each gets predictable performance and independent scaling behavior. Query acceleration is meant for eligible long-running queries, not for isolating overlapping workload types.

  • The warehouse queuing choice fails because the stem says queued_overload_time is almost zero.
  • The warehouse cost choice fails because spend is already within budget, so overspend is not the main symptom.
  • The query acceleration choice fails because the slowdown appears only during overlapping workloads, not from a consistently scan-heavy query pattern.
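
To confirm where the time goes, query history can separate waiting from running. A hedged sketch, assuming the shared warehouse is named SHARED_WH (the stem does not name it); if dashboard execution time spikes while queued_overload_time stays near zero during MERGE windows, contention on shared compute is the likely cause:

SELECT query_type,
       total_elapsed_time   / 1000 AS elapsed_s,
       execution_time       / 1000 AS execution_s,
       queued_overload_time / 1000 AS queued_overload_s
FROM snowflake.account_usage.query_history
WHERE warehouse_name = 'SHARED_WH'  -- hypothetical warehouse name
  AND start_time >= DATEADD('hour', -2, CURRENT_TIMESTAMP())
ORDER BY start_time;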

Question 3

Topic: Compute, Governance, Observability, and Performance

A data engineering team loads an 18 TB FACT_EVENTS table from many unordered micro-batches. Dashboard queries that filter 3- to 14-day windows on event_date have slowed from seconds to minutes.

  • Table scan is the dominant operator
  • 93% of micro-partitions scanned
  • Queued overload time: 0 ms
  • Warehouse load: low

What is the best next step?

Options:

  • A. Define a clustering key on event_date

  • B. Enable search optimization service

  • C. Scale up the warehouse size

  • D. Create materialized views for dashboards

Best answer: A

Explanation: This symptom points to a storage-layout problem, not a compute-capacity problem. When large tables are repeatedly filtered by the same range column and query profiles show most micro-partitions are scanned, clustering is the Snowflake optimization that most directly improves pruning.

Clustering matters when query performance is poor because Snowflake cannot prune enough micro-partitions. In this scenario, the warehouse is not queued, load is low, and the scan operator reads 93% of partitions even for selective date windows. That means the main issue is not insufficient compute; it is weak data locality for the dominant filter pattern.

Defining a clustering key on event_date helps Snowflake organize similar date values together across micro-partitions, so range predicates can skip more data. This is especially useful on very large tables with repeated time-window filters and unordered incremental loads, which tend to degrade natural clustering over time. The key takeaway is that clustering becomes material when poor pruning, not warehouse saturation, is the clear bottleneck.

  • More compute misses the root cause because the profile shows no queuing and the scan is still reading almost all partitions.
  • Search optimization is a better fit for highly selective lookups than recurring date-range scans across a huge fact table.
  • Materialized views can help fixed query patterns, but they are a narrower and heavier solution than fixing poor pruning on the base table.
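
As a sketch of the recommended step, reusing the table and column from the stem; the second statement simply reports how well the data is already organized on that column:

ALTER TABLE fact_events CLUSTER BY (event_date);  -- align micro-partitions with the dominant range filter

SELECT SYSTEM$CLUSTERING_INFORMATION('fact_events', '(event_date)');  -- inspect clustering depth and overlap

Automatic Clustering then maintains the layout in the background, which carries its own credit cost, so the pruning benefit should be confirmed against that spend.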

Question 4

Topic: Compute, Governance, Observability, and Performance

An engineer investigates a nightly ELT batch that slows down only when many transform queries start at the same time.

Exhibit:

Warehouse: ETL_WH
Size: MEDIUM
Min clusters: 1
Max clusters: 1

Query profile summary
Elapsed time: 128 s
Queued (overload): 81 s
Execution time: 45 s
Bytes scanned: 2.1 GB
Partitions scanned: 412/420
Spilled to local/remote: 0 B / 0 B

Which change is the best next step to reduce runtime?

Options:

  • A. Enable Query Acceleration Service on ETL_WH

  • B. Increase MAX_CLUSTER_COUNT for ETL_WH

  • C. Enable Search Optimization Service on the largest table

  • D. Add a clustering key to the largest table

Best answer: B

Explanation: The query is slow mainly because it is waiting for warehouse resources, not because execution itself is slow. The large Queued (overload) time on a warehouse capped at one cluster points to a warehouse concurrency fix rather than data-layout tuning or an execution-time service.

The key signal is Queued (overload): 81 s versus only 45 s of actual execution on a warehouse with Max clusters: 1. That means the query spends most of its elapsed time waiting for compute capacity because concurrent work is saturating the warehouse. In this situation, the best performance action is to change warehouse behavior by allowing more clusters so concurrent queries can run sooner.

  • High overload queue time indicates a concurrency bottleneck.
  • Zero spill suggests memory pressure is not the main issue.
  • Data-layout features improve scanning or lookup efficiency during execution, not time spent waiting for compute.

So the problem is best addressed by warehouse scaling behavior, not by clustering, search optimization, or query acceleration.

  • Clustering key is tempting because pruning is not perfect, but most of the delay happens before execution starts.
  • Search optimization helps highly selective lookups, not a workload whose main symptom is overload queueing.
  • Query acceleration can reduce eligible execution work, but it does not remove wait time caused by a saturated single-cluster warehouse.
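
A minimal sketch of the warehouse change, reusing the warehouse from the exhibit; multi-cluster warehouses require Enterprise Edition, and the cluster cap shown is illustrative:

ALTER WAREHOUSE etl_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4               -- illustrative cap; size it to the observed concurrency
  SCALING_POLICY = 'STANDARD';        -- add clusters as soon as queries start queuing

With STANDARD scaling, extra clusters start when queries queue and shut down as the burst drains, which targets the overload time without changing per-query execution.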

Question 5

Topic: Compute, Governance, Observability, and Performance

A data engineer reviews a slow query on a large fact table.

SELECT SUM(net_amount)
FROM sales_fact
WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31'
  AND region_id = 7;

Query profile summary

  • TABLE SCAN: 89%
  • Micro-partitions scanned: 78,420 / 79,100
  • Bytes scanned: 3.9 TB
  • Bytes spilled to local or remote storage: 0 B

Which action is the most relevant optimization direction?

Options:

  • A. Increase the warehouse size

  • B. Cluster by order_date and region_id

  • C. Create a materialized view for this query

  • D. Enable search optimization on all columns

Best answer: B

Explanation: The bottleneck is scanning almost every micro-partition, not spilling or underpowered compute. That points to poor pruning, so improving data clustering on the filter columns is the best next step.

This profile indicates a scan-efficiency problem. The query spends 89% of its time in TABLE SCAN, scans nearly all micro-partitions, and reads 3.9 TB, while spilling is 0 B. That means the warehouse is not primarily constrained by memory pressure; instead, Snowflake cannot effectively prune data for the order_date and region_id filters.

Clustering the table on the common filtering columns can improve micro-partition pruning so fewer partitions are scanned for similar range-and-filter queries. A larger warehouse may make the same bad scan run faster, but it does not solve the underlying pruning issue. Search optimization is typically better for highly selective lookup-style predicates, not broad range scanning on a large fact table. A materialized view could help only for a known repeated pattern, but the profile here most directly signals pruning inefficiency.

  • More compute only masks it because the profile shows a full-scan pattern, not spill-driven resource pressure.
  • Search optimization is mismatched because this looks like a large range-filter scan, not a highly selective point lookup.
  • A materialized view is not the first signal because the exhibit highlights poor pruning on the base table rather than a repeated precomputation need.
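
The clustering change itself is one statement. A sketch, reusing the table and filter columns from the stem:

ALTER TABLE sales_fact CLUSTER BY (order_date, region_id);  -- cluster on the recurring range-and-equality filter columns

Whether it pays off can be verified afterward with SYSTEM$CLUSTERING_INFORMATION and by re-checking the partitions-scanned ratio in the query profile.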

Question 6

Topic: Compute, Governance, Observability, and Performance

A task graph ingests CDC data and then calls a Python stored procedure that runs several SQL transformations. One task intermittently exceeds its 5-minute SLA, but TASK_HISTORY shows only the overall task duration. The team needs visibility into the timing of nested operations inside the procedure to isolate the bottleneck. What is the best next step?

Options:

  • A. Enable tracing for the procedure workflow and analyze spans in an event table.

  • B. Add custom log messages before and after each transformation step.

  • C. Create an alert when the task runtime exceeds 5 minutes.

  • D. Increase the warehouse size assigned to the task.

Best answer: A

Explanation: Tracing is the best fit when the requirement is end-to-end timing across nested operations. In this case, the team already knows the task is slow; the missing visibility is where time is spent inside the stored procedure and its child work.

Tracing is used when engineers need to follow a single execution path and measure latency across parent and child operations. Here, TASK_HISTORY already confirms the symptom at the task level, but it does not show which nested step inside the Python stored procedure is responsible. Enabling tracing records span data in an event table, so the team can correlate the procedure execution with its internal work and identify the slow segment.

Logging is better for emitting human-readable checkpoints or business-state messages, and alerts are better for notifying operators when a threshold is crossed. Those tools help with awareness or breadcrumbs, but they do not provide the same execution-path timing detail. When the visibility requirement is root-cause analysis across nested calls, tracing is the right response.

  • Logging is coarser because log messages can mark checkpoints, but they do not natively provide hierarchical span timing across nested execution.
  • Alerting is reactive because it can notify the team about an SLA breach, but it does not explain where the delay occurred.
  • More compute is not observability because resizing the warehouse may change runtime, but it does not reveal the internal bottleneck.
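
As a sketch of what the tracing setup can look like, assuming a hypothetical no-argument procedure LOAD_TRANSFORM() and an event table that is already created and set as the account's active event table; all object names are illustrative:

ALTER PROCEDURE load_transform() SET TRACE_LEVEL = ALWAYS;  -- hypothetical procedure; emit spans for its nested work

SELECT record['name']::string                              AS span_name,
       TIMEDIFF('millisecond', start_timestamp, timestamp) AS span_ms
FROM my_db.telemetry.my_event_table                         -- hypothetical event table name
WHERE record_type = 'SPAN'
ORDER BY span_ms DESC;                                      -- longest spans first to isolate the slow step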

Question 7

Topic: Compute, Governance, Observability, and Performance

A provider will deliver a curated orders dataset to partners through Snowflake. In the same project, only the PII_ADMIN role may see full customer_email values, while all other roles must see masked values. Which Snowflake feature addresses the least-privilege governance requirement?

Options:

  • A. A secure share for partner consumption

  • B. Access History on the orders tables

  • C. An object tag on customer_email

  • D. A masking policy on customer_email

Best answer: D

Explanation: Masking policies are Snowflake’s enforcement mechanism for role-based protection of sensitive column values. The partner-delivery requirement is separate: sharing exposes data to consumers, but it does not itself enforce different column visibility by role.

The key distinction is enforcement versus delivery or observability. When a requirement says one role can see clear-text customer_email and other roles must see masked results, Snowflake uses a masking policy attached to that column. The policy evaluates context such as the active role and returns either full or masked data, making it a least-privilege governance control.

A secure share is for delivering data to consumers. An object tag helps classify or organize governance metadata, but it does not mask data by itself. Access History records who accessed data for auditing and investigation after the fact. Only the masking policy directly controls what each role sees at query time.

  • Secure sharing handles consumer delivery, but it does not by itself implement role-based masking of a column.
  • Access History is an audit capability, so it helps observe usage rather than prevent overexposure.
  • Object tagging supports classification and policy workflows, but a tag alone does not redact query results.
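
A minimal sketch of the control, assuming the protected column lives on a table named prod.orders; the masked output format is illustrative:

CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'PII_ADMIN' THEN val   -- only the privileged role sees clear text
    ELSE '***MASKED***'                          -- every other role gets a redacted value
  END;

ALTER TABLE prod.orders MODIFY COLUMN customer_email SET MASKING POLICY email_mask;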

Question 8

Topic: Compute, Governance, Observability, and Performance

Which situation is the clearest sign that clustering-related choices could materially improve query performance in Snowflake, assuming the table is very large and warehouse sizing is already appropriate?

Options:

  • A. Reused aggregate summaries for dashboards

  • B. Point lookups on a few order_id values

  • C. Broad scans followed by expensive aggregation

  • D. Selective range filters on event_date and region

Best answer: D

Explanation: Clustering matters when table layout affects micro-partition pruning. On very large tables, repeated selective filters on consistent columns such as date and region are the clearest case where a clustering key can materially reduce scanned data and improve performance.

The core concept is micro-partition pruning. Clustering is most useful when many queries against a large table repeatedly filter on the same columns, especially selective range or equality predicates such as date, region, or similar dimensions. A good clustering key helps keep related values together across micro-partitions, letting Snowflake skip more partitions and scan less data.

A strong fit usually has these traits:

  • Very large table
  • Repeated selective predicates
  • Same filter columns across many queries
  • Performance limited by unnecessary data scanning

Clustering is not the default answer for every slow query. Point lookups, reused summaries, and broad scans with heavy computation usually point to other features or design choices instead.

  • Point lookups on a few keys are more closely associated with search optimization than with clustering.
  • Reused aggregate summaries are a better fit for materialized views than for changing table clustering.
  • Broad scans followed by expensive aggregation are less about pruning and more about compute behavior, query acceleration, or warehouse design.
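
One hedged way to test whether a table shows those traits before committing to a key is to measure its current clustering on the candidate columns; the table and column names below are hypothetical:

SELECT SYSTEM$CLUSTERING_DEPTH('events_fact', '(event_date, region)');        -- depth near 1 means pruning is already good

SELECT SYSTEM$CLUSTERING_INFORMATION('events_fact', '(event_date, region)');  -- fuller detail, including the overlap histogram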

Question 9

Topic: Compute, Governance, Observability, and Performance

A Snowflake team runs interactive dashboards and hourly ELT tasks on the same warehouse. Dashboard latency spikes only when the hourly tasks start, and the company wants BI and ELT compute tracked separately.

Exhibit:

Warehouse: PROD_WH
Size: MEDIUM
Max clusters: 1

Overlap window query history
- dashboard SELECT ...   execution 6s   queued_overload_time 84s
- task MERGE ...         execution 19s  queued_overload_time 79s
- dashboard SELECT ...   execution 5s   queued_overload_time 87s

Based on the exhibit and requirements, what is the best next step?

Options:

  • A. Use separate warehouses for BI and ELT workloads.

  • B. Rewrite the hourly MERGE statement to reduce execution time.

  • C. Convert PROD_WH to multi-cluster and keep both workloads together.

  • D. Add clustering keys to the dashboard tables.

Best answer: A

Explanation: The long queued_overload_time on multiple unrelated statements shows a shared warehouse concurrency problem, not a single bad query. Because BI and ELT also need separate chargeback and predictable behavior, dedicated warehouses are the best fit.

This is a compute-design issue. The exhibit shows different query types—dashboard SELECT statements and a task MERGE—waiting much longer for compute than they spend executing. That pattern points to warehouse contention on shared compute, not statement-specific inefficiency. When unrelated queries all show the same queueing symptom, the first fix is to redesign compute usage.

Separate BI and ELT warehouses provide workload isolation, more predictable dashboard performance during ETL windows, and clean cost tracking for each workload. A multi-cluster warehouse can help concurrency, but it still keeps both workloads inside one compute boundary. Tuning a single SQL statement or adding clustering only affects work after execution begins; it does not remove queue time caused by shared warehouse saturation. The key signal here is queueing across many statements, not slow execution in one statement.

  • Keeping both workloads on one multi-cluster warehouse can reduce waits, but it does not isolate SLAs and chargeback as cleanly.
  • Rewriting the MERGE targets one statement, while both dashboard and task queries show the same queueing pattern.
  • Clustering keys can improve scan efficiency for some queries, but they do not address time spent waiting for compute.
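
A sketch of the isolation step with hypothetical warehouse names; sizes, auto-suspend values, and the task name are illustrative:

CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  COMMENT = 'Interactive BI dashboards';

CREATE WAREHOUSE IF NOT EXISTS elt_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  COMMENT = 'Hourly ELT tasks';

ALTER TASK load_orders_hourly SET WAREHOUSE = elt_wh;  -- hypothetical task name; point BI traffic at bi_wh

Separate warehouses also make chargeback straightforward, because warehouse metering history can then be attributed to each workload.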

Question 10

Topic: Compute, Governance, Observability, and Performance

A team loads orders every 5 minutes, and the source system normally sends about 12,000 changed rows each cycle. Users report occasional incomplete dashboards even though no pipeline runs failed.

Exhibit:

CREATE OR REPLACE TASK load_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
AS
  MERGE INTO prod.orders t
  USING stage.orders_stream s
  ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET amount = s.amount
  WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);

-- Last 12 runs: TASK_HISTORY state = SUCCEEDED for all
-- Rows merged: 11,980, 12,104, 0, 12,087, 12,031 ...

What is the best next Snowflake-native step?

Options:

  • A. Increase the task warehouse size to reduce runtime.

  • B. Replace the task with Snowpipe Streaming ingestion.

  • C. Move the MERGE logic into a Snowpark stored procedure.

  • D. Add data metric functions and an alert for freshness or volume anomalies.

Best answer: D

Explanation: The exhibit shows a task that completes successfully but still produces an unexpected zero-row merge during a normally steady feed. That means the immediate problem is visibility into silent data anomalies, so Snowflake monitoring with data metric functions and alerts is the best next step.

This scenario is about observability, not redesigning ingestion or transformation logic. TASK_HISTORY showing SUCCEEDED only confirms that the task SQL completed; it does not prove the pipeline delivered the expected data volume. When a feed usually merges about 12,000 rows every 5 minutes and one successful run merges 0 rows while dashboards are incomplete, the first Snowflake-native action is to add monitoring for data freshness or volume anomalies and notify operators.

  • Use data metric functions to measure row-count or freshness expectations.
  • Use an alert to notify the team when the metric falls outside normal bounds.
  • Keep the current task and MERGE pattern unless monitoring later proves an ingestion or logic defect.

Changing ingestion method, rewriting the logic, or adding compute may alter implementation, but none of those directly solves the lack of visibility shown here.

  • Streaming mismatch: replacing the task with Snowpipe Streaming targets ingestion latency, but the exhibit shows a silent anomaly after successful runs rather than a proven latency issue.
  • Logic rewrite: moving the MERGE into Snowpark changes how the transformation is coded, not how missing or abnormal data is detected.
  • Compute overreaction: increasing warehouse size is a performance response, but the exhibit does not show queueing, slow execution, or task failure.
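
A hedged sketch of the monitoring step, reusing prod.orders and etl_wh from the exhibit; the load_ts column, the results-view column names, the freshness threshold, and the email integration are all assumptions to verify before use:

ALTER TABLE prod.orders SET DATA_METRIC_SCHEDULE = '5 MINUTE';  -- measure on the load cadence
ALTER TABLE prod.orders ADD DATA METRIC FUNCTION snowflake.core.freshness ON (load_ts);  -- load_ts is a hypothetical load timestamp column
ALTER TABLE prod.orders ADD DATA METRIC FUNCTION snowflake.core.row_count ON ();

CREATE OR REPLACE ALERT orders_freshness_alert
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  IF (EXISTS (
        SELECT 1
        FROM snowflake.local.data_quality_monitoring_results
        WHERE table_name = 'ORDERS'
          AND metric_name = 'FRESHNESS'
          AND value > 900  -- seconds; roughly three missed 5-minute cycles
          AND measurement_time >= DATEADD('minute', -10, CURRENT_TIMESTAMP())))
  THEN CALL SYSTEM$SEND_EMAIL('ops_email_int',  -- hypothetical notification integration
                              'data-ops@example.com',
                              'Orders pipeline anomaly',
                              'prod.orders has not received new rows within the expected window.');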

Continue with full practice

Use the SnowPro Data Engineer DEA-C02 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try SnowPro Data Engineer DEA-C02 on Web, or view the SnowPro Data Engineer DEA-C02 Practice Test.

Free review resource

Read the SnowPro Data Engineer DEA-C02 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.

Revised on Thursday, May 14, 2026