Try 10 focused SnowPro Data Engineer DEA-C02 questions on Compute and Governance, with explanations, then continue with IT Mastery.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
| Field | Detail |
|---|---|
| Exam route | SnowPro Data Engineer DEA-C02 |
| Topic area | Compute, Governance, Observability, and Performance |
| Blueprint weight | 16% |
| Page purpose | Focused sample questions before returning to mixed practice |
Use this page to isolate Compute, Governance, Observability, and Performance for SnowPro Data Engineer DEA-C02. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 16% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.
Topic: Compute, Governance, Observability, and Performance
A data engineering team needs to reduce latency for a frequent point-lookup workload and wants the most targeted optimization.
Query:
SELECT *
FROM sales_fact
WHERE order_id = 'O-918274';
Query profile summary:
- Rows returned: 1
- Partitions scanned: 7,942 / 7,948
- Scan time: 93%
- Bytes spilled to local/remote storage: 0
- Warehouse: LARGE
- Frequency: ~40,000 executions/day
Which optimization approach best matches this bottleneck?
Options:
A. Define a cluster key on order_id
B. Resize the warehouse to 2XLARGE
C. Add search optimization on equality for order_id
D. Enable Query Acceleration Service
Best answer: C
Explanation: The bottleneck is micro-partition pruning, not warehouse memory or spill. A highly selective equality predicate that scans nearly every partition is the classic fit for Search Optimization Service on the filtered column.
This workload is dominated by scan time, and the profile shows almost every partition is being scanned to return a single row. That means the main problem is poor pruning for an exact-match lookup, not insufficient compute: there is no local or remote spill, and the warehouse is already fairly large.
Search Optimization Service is the most targeted fix for frequent, selective lookups like WHERE order_id = ... because it helps Snowflake find matching rows without scanning nearly all micro-partitions. That aligns directly with the observed bottleneck and avoids using broader compute-based acceleration for a pruning problem.
A larger warehouse addresses compute capacity rather than pruning, and clustering on a high-cardinality lookup column usually adds more maintenance overhead than this access pattern warrants.
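As a sketch of the targeted fix, search optimization can be scoped to just the equality predicate on the lookup column (table and column names come from the question; this is standard Snowflake DDL):

```sql
-- Enable search optimization only for equality lookups on order_id,
-- rather than for the whole table, to limit maintenance cost.
ALTER TABLE sales_fact ADD SEARCH OPTIMIZATION ON EQUALITY(order_id);

-- Inspect the configured search methods and build progress.
DESCRIBE SEARCH OPTIMIZATION ON sales_fact;
```

Scoping the feature to one column keeps the background maintenance and storage overhead proportional to the workload that actually benefits.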
Topic: Compute, Governance, Observability, and Performance
A team runs BI dashboards and hourly MERGE jobs on the same warehouse. Dashboards slow down only while the MERGE jobs are running, query history shows almost no queued_overload_time, and warehouse spend is within budget. Which term best describes the primary concern?
Options:
A. Warehouse cost
B. Query acceleration
C. Workload isolation
D. Warehouse queuing
Best answer: C
Explanation: This scenario indicates workload isolation. The stem rules out the other main concerns by stating that queue time is minimal and spending is acceptable, so the key issue is mixed workloads competing on the same warehouse.
This pattern points to workload isolation. In Snowflake, if dashboard queries become slow only when batch MERGE jobs are active on the same warehouse, the workloads are contending for shared compute resources. Because queued_overload_time is almost zero, the main issue is not that queries are waiting in line to start. Because spend is already within budget, cost is not the dominant concern either. The clearest signal is interference between interactive and batch activity on shared compute. A common design response is to separate those workloads onto different warehouses so each gets predictable performance and independent scaling behavior. Query acceleration is meant for eligible long-running queries, not for isolating overlapping workload types.
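A minimal sketch of the separation, assuming hypothetical warehouse and task names (`BI_WH`, `ELT_WH`, `hourly_merge_task` are illustrative, not from the question):

```sql
-- Dedicated compute per workload type; sizes are assumptions.
CREATE WAREHOUSE IF NOT EXISTS BI_WH  WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;
CREATE WAREHOUSE IF NOT EXISTS ELT_WH WAREHOUSE_SIZE = 'LARGE'  AUTO_SUSPEND = 120 AUTO_RESUME = TRUE;

-- Move the batch MERGE onto its own warehouse.
-- A task must be suspended before its properties can be altered.
ALTER TASK hourly_merge_task SUSPEND;
ALTER TASK hourly_merge_task SET WAREHOUSE = ELT_WH;
ALTER TASK hourly_merge_task RESUME;
```

With dashboards pointed at `BI_WH`, interactive latency no longer depends on when the batch jobs fire, and each warehouse can be sized and suspended independently.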
Topic: Compute, Governance, Observability, and Performance
A data engineering team loads an 18 TB FACT_EVENTS table from many unordered micro-batches. Dashboard queries that filter 3- to 14-day windows on event_date have slowed from seconds to minutes.
What is the best next step?
Options:
A. Define a clustering key on event_date
B. Enable search optimization service
C. Scale up the warehouse size
D. Create materialized views for dashboards
Best answer: A
Explanation: This symptom points to a storage-layout problem, not a compute-capacity problem. When large tables are repeatedly filtered by the same range column and query profiles show most micro-partitions are scanned, clustering is the Snowflake optimization that most directly improves pruning.
Clustering matters when query performance is poor because Snowflake cannot prune enough micro-partitions. In this scenario, the slowdown is not driven by warehouse saturation: repeated unordered micro-batch loads degrade natural clustering on event_date, so the scan ends up reading nearly every micro-partition even for selective date windows. That means the main issue is not insufficient compute; it is weak data locality for the dominant filter pattern.
Defining a clustering key on event_date helps Snowflake organize similar date values together across micro-partitions, so range predicates can skip more data. This is especially useful on very large tables with repeated time-window filters and unordered incremental loads, which tend to degrade natural clustering over time. The key takeaway is that clustering becomes material when poor pruning, not warehouse saturation, is the clear bottleneck.
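The fix itself is a single DDL statement (table name from the question); after it runs, Snowflake's Automatic Clustering service reorganizes micro-partitions in the background, so the benefit appears gradually rather than instantly:

```sql
-- Declare the clustering key so rows with similar event_date values
-- are co-located across micro-partitions, improving range pruning.
ALTER TABLE fact_events CLUSTER BY (event_date);
```

Automatic Clustering consumes credits as it maintains the layout, which is why clustering is reserved for large tables with a dominant, stable filter pattern like this one.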
Topic: Compute, Governance, Observability, and Performance
An engineer investigates a nightly ELT batch that slows down only when many transform queries start at the same time.
Exhibit:
Warehouse: ETL_WH
Size: MEDIUM
Min clusters: 1
Max clusters: 1
Query profile summary
Elapsed time: 128 s
Queued (overload): 81 s
Execution time: 45 s
Bytes scanned: 2.1 GB
Partitions scanned: 412/420
Spilled to local/remote: 0 B / 0 B
Which change is the best next step to reduce runtime?
Options:
A. Enable Query Acceleration Service on ETL_WH
B. Increase MAX_CLUSTER_COUNT for ETL_WH
C. Enable Search Optimization Service on the largest table
D. Add a clustering key to the largest table
Best answer: B
Explanation: The query is slow mainly because it is waiting for warehouse resources, not because execution itself is slow. The large Queued (overload) time on a warehouse capped at one cluster points to a warehouse concurrency fix rather than data-layout tuning or an execution-time service.
The key signal is Queued (overload): 81 s versus only 45 s of actual execution on a warehouse with Max clusters: 1. That means the query spends most of its elapsed time waiting for compute capacity because concurrent work is saturating the warehouse. In this situation, the best performance action is to change warehouse behavior by allowing more clusters so concurrent queries can run sooner.
So the problem is best addressed by warehouse scaling behavior, not by clustering, search optimization, or query acceleration.
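A minimal sketch of that change, assuming a maximum of 3 clusters is appropriate for the concurrency peak (the cap is an assumption; the warehouse name comes from the exhibit):

```sql
-- Convert ETL_WH to multi-cluster so additional clusters spin up when
-- concurrent queries start queuing, and shut down when load drops.
ALTER WAREHOUSE ETL_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';
```

The STANDARD scaling policy favors starting clusters quickly to minimize queuing, which matches the symptom here; ECONOMY would trade some queue time for fewer credits.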
Topic: Compute, Governance, Observability, and Performance
A data engineer reviews a slow query on a large fact table.
SELECT SUM(net_amount)
FROM sales_fact
WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31'
AND region_id = 7;
Query profile summary
TABLE SCAN time: 89%
Partitions scanned: 78,420 / 79,100
Bytes scanned: 3.9 TB
Bytes spilled: 0 B
Which action is the most relevant optimization direction?
Options:
A. Increase the warehouse size
B. Cluster by order_date and region_id
C. Create a materialized view for this query
D. Enable search optimization on all columns
Best answer: B
Explanation: The bottleneck is scanning almost every micro-partition, not spilling or underpowered compute. That points to poor pruning, so improving data clustering on the filter columns is the best next step.
This profile indicates a scan-efficiency problem. The query spends 89% of its time in TABLE SCAN, scans nearly all micro-partitions, and reads 3.9 TB, while spilling is 0 B. That means the warehouse is not primarily constrained by memory pressure; instead, Snowflake cannot effectively prune data for the order_date and region_id filters.
Clustering the table on the common filtering columns can improve micro-partition pruning so fewer partitions are scanned for similar range-and-filter queries. A larger warehouse may make the same bad scan run faster, but it does not solve the underlying pruning issue. Search optimization is typically better for highly selective lookup-style predicates, not broad range scanning on a large fact table. A materialized view could help only for a known repeated pattern, but the profile here most directly signals pruning inefficiency.
Topic: Compute, Governance, Observability, and Performance
A task graph ingests CDC data and then calls a Python stored procedure that runs several SQL transformations. One task intermittently exceeds its 5-minute SLA, but TASK_HISTORY shows only the overall task duration. The team needs visibility into the timing of nested operations inside the procedure to isolate the bottleneck. What is the best next step?
Options:
A. Enable tracing for the procedure workflow and analyze spans in an event table.
B. Add custom log messages before and after each transformation step.
C. Create an alert when the task runtime exceeds 5 minutes.
D. Increase the warehouse size assigned to the task.
Best answer: A
Explanation: Tracing is the best fit when the requirement is end-to-end timing across nested operations. In this case, the team already knows the task is slow; the missing visibility is where time is spent inside the stored procedure and its child work.
Tracing is used when engineers need to follow a single execution path and measure latency across parent and child operations. Here, TASK_HISTORY already confirms the symptom at the task level, but it does not show which nested step inside the Python stored procedure is responsible. Enabling tracing records span data in an event table, so the team can correlate the procedure execution with its internal work and identify the slow segment.
Logging is better for emitting human-readable checkpoints or business-state messages, and alerts are better for notifying operators when a threshold is crossed. Those tools help with awareness or breadcrumbs, but they do not provide the same execution-path timing detail. When the visibility requirement is root-cause analysis across nested calls, tracing is the right response.
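A sketch of the workflow, assuming an event table `telemetry.events` already exists and is active for the account, and that the procedure name `etl.transform_cdc` is illustrative:

```sql
-- Emit trace spans for the procedure and its child statements.
-- TRACE_LEVEL can also be set at session, schema, or database scope.
ALTER PROCEDURE etl.transform_cdc() SET TRACE_LEVEL = ALWAYS;

-- After the next task run, rank spans by duration to find the slow step.
SELECT record:name::string AS span_name,
       DATEDIFF('millisecond', start_timestamp, timestamp) AS duration_ms
FROM telemetry.events
WHERE record_type = 'SPAN'
ORDER BY duration_ms DESC;
```

Each span carries start and end timestamps plus a trace ID, so nested SQL statements inside the procedure can be correlated back to the parent task execution.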
Topic: Compute, Governance, Observability, and Performance
A provider will deliver a curated orders dataset to partners through Snowflake. In the same project, only the PII_ADMIN role may see full customer_email values, while all other roles must see masked values. Which Snowflake feature addresses the least-privilege governance requirement?
Options:
A. A secure share for partner consumption
B. Access History on the orders tables
C. An object tag on customer_email
D. A masking policy on customer_email
Best answer: D
Explanation: Masking policies are Snowflake’s enforcement mechanism for role-based protection of sensitive column values. The partner-delivery requirement is separate: sharing exposes data to consumers, but it does not itself enforce different column visibility by role.
The key distinction is enforcement versus delivery or observability. When a requirement says one role can see clear-text customer_email and other roles must see masked results, Snowflake uses a masking policy attached to that column. The policy evaluates context such as the active role and returns either full or masked data, making it a least-privilege governance control.
A secure share is for delivering data to consumers. An object tag helps classify or organize governance metadata, but it does not mask data by itself. Access History records who accessed data for auditing and investigation after the fact. Only the masking policy directly controls what each role sees at query time.
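The pattern is a small amount of DDL (policy and table names are illustrative; the role check and column come from the question):

```sql
-- Return the real value only to PII_ADMIN; everyone else sees a fixed mask.
CREATE OR REPLACE MASKING POLICY mask_email AS (val STRING)
RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'PII_ADMIN' THEN val  -- clear text for the privileged role
    ELSE '*** MASKED ***'                       -- masked for all other roles
  END;

-- Attach the policy to the sensitive column; it is enforced at query time,
-- including for queries issued through a share.
ALTER TABLE orders MODIFY COLUMN customer_email
  SET MASKING POLICY mask_email;
```

Because the policy travels with the column, every consumer of the table, local role or share consumer, is evaluated by the same least-privilege rule.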
Topic: Compute, Governance, Observability, and Performance
Which situation is the clearest sign that clustering-related choices could materially improve query performance in Snowflake, assuming the table is very large and warehouse sizing is already appropriate?
Options:
A. Reused aggregate summaries for dashboards
B. Point lookups on a few order_id values
C. Broad scans followed by expensive aggregation
D. Selective range filters on event_date and region
Best answer: D
Explanation: Clustering matters when table layout affects micro-partition pruning. On very large tables, repeated selective filters on consistent columns such as date and region are the clearest case where a clustering key can materially reduce scanned data and improve performance.
The core concept is micro-partition pruning. Clustering is most useful when many queries against a large table repeatedly filter on the same columns, especially selective range or equality predicates such as date, region, or similar dimensions. A good clustering key helps keep related values together across micro-partitions, letting Snowflake skip more partitions and scan less data.
A strong fit usually combines a very large table, repeated selective filters on the same few columns, and a filter pattern that stays stable over time. Clustering is not the default answer for every slow query: point lookups, reused summaries, and broad scans with heavy computation usually point to other features or design choices instead.
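Before committing to a clustering key, the candidate layout can be checked with a system function (table and column names here are illustrative):

```sql
-- Returns JSON clustering statistics for the proposed key columns.
-- A high average_depth means many overlapping micro-partitions, i.e.
-- weak locality on these columns and therefore room for improvement.
SELECT SYSTEM$CLUSTERING_INFORMATION('FACT_EVENTS', '(event_date, region)');
```

Comparing this output before and after defining the key shows whether Automatic Clustering is actually tightening the layout for the dominant filters.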
Topic: Compute, Governance, Observability, and Performance
A Snowflake team runs interactive dashboards and hourly ELT tasks on the same warehouse. Dashboard latency spikes only when the hourly tasks start, and the company wants BI and ELT compute tracked separately.
Exhibit:
Warehouse: PROD_WH
Size: MEDIUM
Max clusters: 1
Overlap window query history
- dashboard SELECT ... execution 6s queued_overload_time 84s
- task MERGE ... execution 19s queued_overload_time 79s
- dashboard SELECT ... execution 5s queued_overload_time 87s
Based on the exhibit and requirements, what is the best next step?
Options:
A. Use separate warehouses for BI and ELT workloads.
B. Rewrite the hourly MERGE statement to reduce execution time.
C. Convert PROD_WH to multi-cluster and keep both workloads together.
D. Add clustering keys to the dashboard tables.
Best answer: A
Explanation: The long queued_overload_time on multiple unrelated statements shows a shared warehouse concurrency problem, not a single bad query. Because BI and ELT also need separate chargeback and predictable behavior, dedicated warehouses are the best fit.
This is a compute-design issue. The exhibit shows different query types—dashboard SELECT statements and a task MERGE—waiting much longer for compute than they spend executing. That pattern points to warehouse contention on shared compute, not statement-specific inefficiency. When unrelated queries all show the same queueing symptom, the first fix is to redesign compute usage.
Separate BI and ELT warehouses provide workload isolation, more predictable dashboard performance during ETL windows, and clean cost tracking for each workload. A multi-cluster warehouse can help concurrency, but it still keeps both workloads inside one compute boundary. Tuning a single SQL statement or adding clustering only affects work after execution begins; it does not remove queue time caused by shared warehouse saturation. The key signal here is queueing across many statements, not slow execution in one statement.
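Once the workloads run on dedicated warehouses, the chargeback requirement falls out of a simple rollup over account usage data (warehouse names are assumptions; the ACCOUNT_USAGE view typically lags by up to a few hours):

```sql
-- Daily credit consumption per workload, one warehouse per workload type.
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name IN ('BI_WH', 'ELT_WH')
GROUP BY warehouse_name, usage_day
ORDER BY usage_day, warehouse_name;
```

This is the "tracked separately" benefit that a shared multi-cluster warehouse cannot provide: credits metered on one warehouse cannot be cleanly attributed to one workload.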
Rewriting the MERGE targets one statement, while both dashboard and task queries show the same queueing pattern.
Topic: Compute, Governance, Observability, and Performance
A team loads orders every 5 minutes, and the source system normally sends about 12,000 changed rows each cycle. Users report occasional incomplete dashboards even though no pipeline runs failed.
Exhibit:
CREATE OR REPLACE TASK load_orders
WAREHOUSE = etl_wh
SCHEDULE = '5 MINUTE'
AS
MERGE INTO prod.orders t
USING stage.orders_stream s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET amount = s.amount
WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);
-- Last 12 runs: TASK_HISTORY state = SUCCEEDED for all
-- Rows merged: 11,980, 12,104, 0, 12,087, 12,031 ...
What is the best next Snowflake-native step?
Options:
A. Increase the task warehouse size to reduce runtime.
B. Replace the task with Snowpipe Streaming ingestion.
C. Move the MERGE logic into a Snowpark stored procedure.
D. Add data metric functions and an alert for freshness or volume anomalies.
Best answer: D
Explanation: The exhibit shows a task that completes successfully but still produces an unexpected zero-row merge during a normally steady feed. That means the immediate problem is visibility into silent data anomalies, so Snowflake monitoring with data metric functions and alerts is the best next step.
This scenario is about observability, not redesigning ingestion or transformation logic. TASK_HISTORY showing SUCCEEDED only confirms that the task SQL completed; it does not prove the pipeline delivered the expected data volume. When a feed usually merges about 12,000 rows every 5 minutes and one successful run merges 0 rows while dashboards are incomplete, the first Snowflake-native action is to add monitoring for data freshness or volume anomalies and notify operators.
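A sketch of that monitoring, using Snowflake's system data metric functions and an alert (the threshold, integration name, and recipient are assumptions; the table and warehouse come from the exhibit):

```sql
-- Evaluate attached metrics on the table every 5 minutes.
ALTER TABLE prod.orders SET DATA_METRIC_SCHEDULE = '5 MINUTE';

-- Track table volume with a built-in data metric function.
ALTER TABLE prod.orders
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.ROW_COUNT ON ();

-- Notify operators when a recent measurement looks anomalous.
-- Alerts are created suspended; resume after creation.
CREATE OR REPLACE ALERT orders_volume_alert
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
    WHERE table_name = 'ORDERS'
      AND metric_name = 'ROW_COUNT'
      AND measurement_time > DATEADD('minute', -10, CURRENT_TIMESTAMP())
      AND value = 0                -- threshold is an assumption
  ))
  THEN CALL SYSTEM$SEND_EMAIL('ops_email_int', 'oncall@example.com',
                              'Orders volume anomaly',
                              'A recent measurement showed zero rows in prod.orders.');

ALTER ALERT orders_volume_alert RESUME;
```

The point is that the detection layer sits beside the unchanged MERGE task: the pipeline keeps succeeding, and the monitoring catches the silent zero-row cycle.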
There is no need to abandon the MERGE pattern unless monitoring later proves an ingestion or logic defect. Changing the ingestion method, rewriting the logic, or adding compute may alter the implementation, but none of those directly solves the lack of visibility shown here.
Moving the MERGE into a Snowpark stored procedure changes how the transformation is coded, not how missing or abnormal data is detected.

Use the SnowPro Data Engineer DEA-C02 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Read the SnowPro Data Engineer DEA-C02 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.