Try 10 focused SnowPro Data Engineer DEA-C02 questions on Streaming and Orchestration, with explanations, then continue with IT Mastery.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
Try SnowPro Data Engineer DEA-C02 on the web: view the full SnowPro Data Engineer DEA-C02 practice page.
| Field | Detail |
|---|---|
| Exam route | SnowPro Data Engineer DEA-C02 |
| Topic area | Streaming, Orchestration, and Near Real-Time Pipeline Design |
| Blueprint weight | 20% |
| Page purpose | Focused sample questions before returning to mixed practice |
Use this page to isolate Streaming, Orchestration, and Near Real-Time Pipeline Design for SnowPro Data Engineer DEA-C02. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 20% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
An analytics team loads order-change files into RAW_ORDERS with Snowpipe every 1-2 minutes. They must apply MERGE-based CDC to CURATED_ORDERS within 5 minutes, then run a dependent aggregation step, all inside Snowflake. They also want to avoid keeping compute running when no new data arrives. Which design is the BEST fit?
Options:
A. Use a dynamic table with 5-minute target lag for CDC and sequencing.
B. Use a stream and a triggered task graph for MERGE and aggregation.
C. Use Snowpipe Streaming so ingestion automatically performs CDC and aggregation.
D. Use an external scheduler to call a stored procedure every minute.
Best answer: B
Explanation: Tasks are the natural Snowflake-native choice when pipeline logic must run on a schedule or only when new data appears. Here, a stream captures the changed rows, and a triggered task graph can run the MERGE and dependent aggregation step without external orchestration or misuse of ingestion features.
Tasks are Snowflake’s native orchestration mechanism for executing SQL or stored procedures on a schedule or in response to stream-backed changes. In this scenario, Snowpipe already handles file ingestion, so the remaining problem is downstream execution. A stream on RAW_ORDERS tracks CDC records, and a triggered root task can start only when the stream has data. Downstream tasks in the same task graph can then run the aggregation step after the MERGE succeeds.
This design keeps orchestration inside Snowflake, avoids constant external polling, and avoids running compute just to check for work. Dynamic tables are better for declarative table maintenance, while Snowpipe and Snowpipe Streaming are ingestion mechanisms rather than downstream orchestration tools.
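A minimal sketch of this pattern follows. The stream name, the warehouse WH_TRANSFORM, the AGG_ORDERS target, and the ORDER_ID/STATUS columns are illustrative assumptions, not part of the scenario, and a production MERGE would also inspect METADATA$ACTION to handle deletes.

```sql
-- Stream captures CDC rows on the raw table.
CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

-- Root task is triggered (no SCHEDULE): it runs only when the stream has data,
-- so no compute is consumed while the pipeline is idle.
CREATE OR REPLACE TASK merge_orders
  WAREHOUSE = wh_transform
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  MERGE INTO curated_orders c
  USING raw_orders_stream s
    ON c.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET c.status = s.status
  WHEN NOT MATCHED THEN INSERT (order_id, status) VALUES (s.order_id, s.status);

-- Dependent aggregation runs only after the MERGE task completes successfully.
CREATE OR REPLACE TASK aggregate_orders
  WAREHOUSE = wh_transform
  AFTER merge_orders
AS
  INSERT OVERWRITE INTO agg_orders
  SELECT status, COUNT(*) AS order_count
  FROM curated_orders
  GROUP BY status;

-- Enable the graph: resume children before the root.
ALTER TASK aggregate_orders RESUME;
ALTER TASK merge_orders RESUME;
```

Omitting the WAREHOUSE clause would make the tasks serverless, which is another way to avoid paying for idle compute.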
The key takeaway is that a stream plus a triggered task graph is the fit for MERGE-based CDC orchestration with explicit step sequencing of the MERGE and aggregation logic.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team uses Snowpipe Streaming to land events into a raw table within seconds. A reporting table in the same account still lags because two downstream SQL transformation layers must refresh in the correct order. The team wants Snowflake to manage those dependencies to a stated target lag instead of hand-built schedules. Which Snowflake concept best fits this bottleneck?
Options:
A. Materialized views
B. Tasks
C. Streams
D. Dynamic tables
Best answer: D
Explanation: Dynamic tables are designed for derived-table pipelines where freshness and dependency sequencing are the main problem. Here, ingestion is already near real time and consumers read in the same account, so the bottleneck is keeping downstream transformations current to a target lag.
Dynamic tables are the Snowflake feature for declarative, pipeline-style SQL transformations that must stay fresh. You define a target lag, and Snowflake manages refresh timing and dependency order across upstream dynamic tables. That matches this scenario because the raw data is already arriving quickly through Snowpipe Streaming, and there is no separate consumer-delivery issue to solve.
The closest alternative is tasks, but tasks require you to build and maintain the orchestration logic yourself rather than expressing freshness declaratively.
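A minimal sketch of the two-layer case, with illustrative table, column, and warehouse names:

```sql
-- Intermediate layer: TARGET_LAG = DOWNSTREAM lets Snowflake schedule its
-- refreshes from the needs of the dynamic tables that read it.
CREATE OR REPLACE DYNAMIC TABLE orders_enriched
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE  = wh_transform
AS
  SELECT o.order_id, o.order_ts, o.amount, c.customer_segment
  FROM raw_orders o
  JOIN customers c ON o.customer_id = c.customer_id;

-- Reporting layer: the stated target lag is the freshness goal consumers see;
-- Snowflake refreshes both layers in dependency order to meet it.
CREATE OR REPLACE DYNAMIC TABLE orders_reporting
  TARGET_LAG = '1 minute'
  WAREHOUSE  = wh_transform
AS
  SELECT customer_segment,
         DATE_TRUNC('hour', order_ts) AS order_hour,
         SUM(amount) AS total_amount
  FROM orders_enriched
  GROUP BY customer_segment, DATE_TRUNC('hour', order_ts);
```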
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team ingests CDC rows into RAW_ORDERS. A dynamic table incrementally builds CURATED_ORDERS with TARGET_LAG = '1 minute'. Operations now require each change set to be applied only after a reconciliation table load finishes and a validation procedure succeeds, and they sometimes need to delay a cycle without losing captured changes. The current design produces partial downstream totals when reconciliation is late. What is the best next step?
Options:
A. Replace the dynamic table with a materialized view on RAW_ORDERS
B. Lower TARGET_LAG and run the dynamic table on a larger warehouse
C. Use a stream on RAW_ORDERS and a task graph to gate and apply changes
D. Add another dynamic table that filters CURATED_ORDERS after reconciliation
Best answer: C
Explanation: This issue is about execution control, not refresh speed. When CDC must wait for validation or reconciliation and sometimes be intentionally delayed, streams plus tasks are the Snowflake-native pattern.
Dynamic tables are best when Snowflake can manage incremental refresh timing automatically to meet a freshness target. In this scenario, the real requirement has changed: the team must control exactly when captured changes are applied, only after reconciliation and validation complete, and sometimes defer processing intentionally.
A stream on RAW_ORDERS retains the captured changes until a consuming statement reads them, and a task graph can run the MERGE only after the reconciliation load and validation procedure succeed. That gives explicit orchestration and dependency control, which a dynamic table’s automatic refresh model does not provide.
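A hedged sketch of the gating pattern; the schedule and procedure name are illustrative, and the reconciliation and validation checks are assumed to live inside that procedure.

```sql
-- The stream retains captured changes until a consuming DML reads them,
-- so an intentionally deferred cycle loses nothing.
CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

-- Root task: skipped automatically when the stream is empty; the procedure
-- checks the reconciliation load and validation result before running the MERGE.
CREATE OR REPLACE TASK apply_orders_cdc
  WAREHOUSE = wh_transform
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  CALL apply_orders_if_reconciled();   -- hypothetical procedure

-- Deferring a cycle: suspend the task; the stream keeps the pending changes.
ALTER TASK apply_orders_cdc SUSPEND;
-- ... later, when reconciliation has caught up ...
ALTER TASK apply_orders_cdc RESUME;
```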
Lowering TARGET_LAG or adding more compute improves freshness, not conditional sequencing.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team uses Snowpipe Streaming to land order events in RAW_ORDERS. New rows appear in RAW_ORDERS within 5 seconds of the source event, and the transform warehouse shows no queuing. However, the dashboard table MART_ORDER_METRICS is intermittently one minute behind even though both tasks usually finish in under 10 seconds.
Exhibit:
CREATE TASK merge_curated
SCHEDULE = '1 MINUTE'
AS MERGE INTO CURATED_ORDERS ... FROM RAW_ORDERS_STREAM ...;
CREATE TASK refresh_mart
SCHEDULE = '1 MINUTE'
AS INSERT OVERWRITE INTO MART_ORDER_METRICS
SELECT ... FROM CURATED_ORDERS;
What is the best next step to address the true bottleneck?
Options:
A. Lower the Snowpipe Streaming flush interval.
B. Make refresh_mart a child task AFTER merge_curated.
C. Run both tasks every 30 seconds instead of every minute.
D. Increase the transform warehouse size for both tasks.
Best answer: B
Explanation: This is a dependency-sequencing problem, not an ingestion-freshness or warehouse-capacity problem. Because both tasks are independently scheduled at the same interval, the mart refresh can run before the upstream merge commits; using a task graph with AFTER enforces the correct order.
The stem rules out freshness and compute as the primary bottlenecks: data is landing in RAW_ORDERS within seconds, task runtimes are short, and the warehouse is not queued. The intermittent one-minute staleness points to sequencing. Two separate tasks with the same schedule do not guarantee that the downstream task will start only after the upstream task has committed its changes.
In Snowflake, this is best handled with a task graph. Make the merge step the upstream task and define the mart refresh as a child task with AFTER merge_curated so it runs only after successful completion of the merge. You could also gate the root task with SYSTEM$STREAM_HAS_DATA, but the key fix is enforcing dependency order. Faster ingestion settings or more frequent schedules improve speed, not execution correctness.
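Under the exhibit's object names, the fix could look like the sketch below; the warehouse name is an assumption, and the SELECT body stays elided as in the exhibit.

```sql
-- Pause both tasks before changing the graph.
ALTER TASK merge_curated SUSPEND;
ALTER TASK refresh_mart  SUSPEND;

-- Recreate the mart refresh as a child task: no SCHEDULE of its own;
-- it starts only AFTER merge_curated completes successfully.
CREATE OR REPLACE TASK refresh_mart
  WAREHOUSE = transform_wh            -- assumed warehouse name
  AFTER merge_curated
AS
  INSERT OVERWRITE INTO mart_order_metrics
  SELECT ... FROM curated_orders;     -- same SELECT as the original task

-- Resume the child first, then the root.
ALTER TASK refresh_mart  RESUME;
ALTER TASK merge_curated RESUME;
```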
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
Which use case is the best fit for a Snowflake dynamic table?
Options:
A. Capture row-level changes for custom MERGE logic
B. Query cloud-storage files without loading them into Snowflake
C. Accelerate repeated base-table queries without a pipeline table
D. Maintain a transformed dataset with automatic refresh to a target lag
Best answer: D
Explanation: Dynamic tables are designed to keep a derived table current from a query definition. They are the right fit when Snowflake should maintain transformation results automatically and refresh them to a freshness target, rather than requiring separate change-capture and scheduling objects.
A dynamic table is a declarative pipeline object: you define the SQL for the derived dataset, and Snowflake keeps that table refreshed to the configured target lag. It is an appropriate choice when you want transformation maintenance handled by Snowflake and the workload can benefit from incremental refresh when the query pattern supports it.
This differs from lower-level orchestration patterns. Streams expose change data capture records, and tasks schedule SQL or procedural work, but neither one automatically maintains a transformed table result by itself. Materialized views focus on query acceleration for supported patterns, while external tables expose files in object storage. The key signal for dynamic tables is automated maintenance of a transformed table with freshness goals.
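As a quick way to confirm the refresh-to-target-lag behavior, you can query recent refreshes; this sketch assumes the INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY table function and uses a hypothetical table name.

```sql
-- Recent refresh attempts for one dynamic table, including state and timing.
SELECT *
FROM TABLE(
  INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
    NAME => 'MY_DB.MY_SCHEMA.ORDERS_REPORTING'   -- hypothetical fully qualified name
  )
)
LIMIT 20;
```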
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A retail platform receives order events continuously from microservices, not as files. The pipeline must make new records available to downstream tables in under 10 seconds, keep a replayable raw copy if transformations fail, and isolate ingestion from transformation compute. Which Snowflake architecture is the BEST fit?
Options:
A. Stored procedures that MERGE producer events directly into curated tables
B. Snowpipe Streaming to a raw table, then a stream and triggered tasks on a separate warehouse
C. External tables on cloud storage, then a dynamic table with one-minute target lag
D. Snowpipe auto-ingest files to curated tables, then a one-minute scheduled task
Best answer: B
Explanation: Snowpipe Streaming is designed for continuous low-latency ingestion without a file-staging layer. Landing events in a raw table creates a replay point, while streams and triggered tasks let transformations run on separate compute and recover independently.
The best pattern here is to decouple ingestion from downstream transformation. Snowpipe Streaming is Snowflake’s native choice for continuous event ingestion with near real-time latency, especially when producers emit rows rather than files. Writing first to an immutable raw table gives the pipeline a durable landing zone and replay source if validation or transformation logic fails.
A stream on the raw table captures only new changes, and triggered tasks can process those changes on separate compute from ingestion. That combination preserves reliability and limits blast radius: ingestion can continue even if a downstream task fails, and the raw data remains available for reprocessing.
File-based ingestion or direct upserts into curated tables either add unnecessary staging latency or remove the clean recovery boundary provided by the raw landing layer.
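A rough sketch of the SQL side of this architecture; the ingestion itself runs through the Snowpipe Streaming SDK or Kafka connector outside SQL, and the table, column, and warehouse names are illustrative.

```sql
-- Raw landing table written by Snowpipe Streaming: the durable, replayable copy.
CREATE TABLE IF NOT EXISTS raw_order_events (
  event_ts TIMESTAMP_NTZ,
  payload  VARIANT
);

-- Append-only stream: Snowpipe Streaming only inserts rows, so tracking inserts is enough.
CREATE OR REPLACE STREAM raw_order_events_stream
  ON TABLE raw_order_events
  APPEND_ONLY = TRUE;

-- Transformation runs as a triggered task on its own warehouse, isolated from ingestion.
CREATE OR REPLACE TASK transform_order_events
  WAREHOUSE = wh_transform
  WHEN SYSTEM$STREAM_HAS_DATA('raw_order_events_stream')
AS
  INSERT INTO curated_order_events
  SELECT event_ts,
         payload:order_id::NUMBER AS order_id,
         payload:status::STRING   AS status
  FROM raw_order_events_stream;

ALTER TASK transform_order_events RESUME;
```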
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A table stream on RAW_ORDERS feeds a task that runs every minute and MERGEs changes into CURATED_ORDERS. The task succeeds, and the new rows are visible in CURATED_ORDERS. Immediately after the task run, an engineer queries the stream and gets no rows.
Which explanation best fits this behavior?
Options:
A. Freshness lag is delaying new changes from appearing in the stream
B. The stream requires a manual refresh before it can be queried again
C. Downstream consumption advanced the stream offset
D. Object scope shifted from the source table to the target table
Best answer: C
Explanation: This is a downstream consumption scenario, not a freshness or scope problem. Because the task successfully used the stream in a MERGE and the target table already has the new rows, the stream offset was advanced by that consuming DML operation.
Snowflake streams track a position, or offset, over changes on their source object. When a downstream DML operation such as MERGE, INSERT, UPDATE, or DELETE reads from the stream in a transaction, those change records are considered consumed and the stream advances to the new offset.
In this scenario, the decisive fact is that the task succeeded and CURATED_ORDERS already contains the new rows. That means the stream changes were available and were used downstream. This rules out a freshness delay. It also rules out an object-scope issue, because the stream does not switch to tracking the target table after being used in a task.
The key takeaway is that an empty stream after a successful downstream load often indicates normal consumption behavior, not a problem.
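A small illustration of that behavior, with hypothetical column names:

```sql
-- Before the task runs: pending change records are visible in the stream.
SELECT COUNT(*) FROM raw_orders_stream;   -- returns the pending change count

-- The task's MERGE consumes the stream; when its transaction commits,
-- the stream offset advances past those records.
MERGE INTO curated_orders c
USING raw_orders_stream s
  ON c.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET c.status = s.status
WHEN NOT MATCHED THEN INSERT (order_id, status) VALUES (s.order_id, s.status);

-- Immediately afterwards: the same query returns 0. That is normal consumption,
-- not data loss; the next changes on RAW_ORDERS will repopulate the stream.
SELECT COUNT(*) FROM raw_orders_stream;
```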
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team ingests CDC events with Snowpipe Streaming into RAW_ORDERS. A stream on that table feeds three SQL transformations. An external scheduler calls a stored procedure every minute through the SQL API. When one run exceeds a minute, the next invocation overlaps, and curated tables fall behind. There are no non-Snowflake dependencies. What is the best next step?
Options:
A. Move orchestration to Airflow for stronger dependency handling
B. Increase the external scheduler frequency to reduce latency
C. Run the procedure on a larger warehouse every minute
D. Use a Snowflake task graph with SYSTEM$STREAM_HAS_DATA and AFTER dependencies
Best answer: D
Explanation: Because every step is already inside Snowflake, native tasks are the simplest and most appropriate orchestrator. A task graph using stream-aware execution avoids unnecessary external scheduling and reduces overlap-driven lag.
This is an orchestration design problem, not primarily a compute problem. When ingestion, change capture, and downstream SQL processing all stay inside Snowflake, a Snowflake task graph is usually the right control plane. Using a root task that checks SYSTEM$STREAM_HAS_DATA and chaining downstream work with AFTER keeps dependencies in-platform, avoids wasteful polling logic outside Snowflake, and simplifies operational ownership.
External schedulers are useful when a pipeline must coordinate jobs across multiple systems or satisfy outside control requirements. The stem explicitly says there are no such dependencies, so adding or keeping an external scheduler is unnecessary complexity. A larger warehouse might shorten runtime, but it does not fix the core issue that orchestration is being handled in a more complicated way than required.
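One way this could look, assuming the three transformations are wrapped in stored procedures; the procedure, stream, and warehouse names are hypothetical.

```sql
-- Root task replaces the external one-minute SQL API poll. It is skipped when
-- the stream is empty, and a new graph run does not start while the previous
-- run is still executing, which removes the overlap problem.
CREATE OR REPLACE TASK orders_transform_1
  WAREHOUSE = wh_transform
  SCHEDULE  = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  CALL transform_orders_step_1();

CREATE OR REPLACE TASK orders_transform_2
  WAREHOUSE = wh_transform
  AFTER orders_transform_1
AS
  CALL transform_orders_step_2();

CREATE OR REPLACE TASK orders_transform_3
  WAREHOUSE = wh_transform
  AFTER orders_transform_2
AS
  CALL transform_orders_step_3();

-- Resume children before the root to enable the graph.
ALTER TASK orders_transform_3 RESUME;
ALTER TASK orders_transform_2 RESUME;
ALTER TASK orders_transform_1 RESUME;
```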
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team ingests order updates with Snowpipe Streaming. New rows reach RAW_ORDERS within 5 seconds. A stream on RAW_ORDERS feeds a task that MERGEs into CURATED_ORDERS every 15 minutes. The task usually finishes in 20 seconds. Same-region consumers read CURATED_ORDERS through a secure share and require data within 1 minute of arrival. Which change best addresses the problem?
Options:
A. Replicate the curated database to the consumer account
B. Change the stream on RAW_ORDERS to append-only
C. Add more Snowpipe Streaming channels to RAW_ORDERS
D. Use a triggered task on the stream for the MERGE
Best answer: D
Explanation: This is a downstream orchestration problem, not an ingestion problem. Snowpipe Streaming already lands data within seconds, but the 15-minute task schedule delays updates to the curated table that consumers actually read.
The core issue is end-to-end freshness. RAW_ORDERS is current within 5 seconds, and the MERGE itself only takes 20 seconds, so the missed SLA is caused by batching changes behind a 15-minute task schedule. Using a triggered task tied to the stream lets Snowflake run the transformation when new CDC records are available, so CURATED_ORDERS updates much closer to arrival time.
The ingestion side needs no change: Snowpipe Streaming already lands rows in RAW_ORDERS within seconds, and the triggered task runs the MERGE whenever the stream has data. The key takeaway is that improving ingest speed or changing delivery mechanics will not fix latency introduced by scheduled downstream processing.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team needs order updates visible in orders_curated within 10 seconds. The source application emits one JSON message at a time, but the current pipeline writes staged files every 5 minutes, uses Snowpipe to load raw_orders, captures changes with a stream, and runs a scheduled task every minute to MERGE into orders_curated. Warehouses are not queued or saturated, yet end-to-end latency stays around 6 minutes. What is the best fix?
Options:
A. Use a larger warehouse and schedule the task every 10 seconds.
B. Query the staged files through an external table with AUTO_REFRESH.
C. Use Snowpipe Streaming, then process the stream with triggered tasks.
D. Replace the stream with a dynamic table refreshed every minute.
Best answer: C
Explanation: Near real-time pipelines are limited by the slowest stage. Here, the fixed 5-minute file batch is the dominant delay, so the best fix is continuous row ingestion with Snowpipe Streaming and downstream change processing driven from a stream with triggered tasks.
Near real-time design requires compatible choices across ingestion, change capture, and orchestration. In this scenario, the main bottleneck is not compute; it is the source pattern that only creates files every 5 minutes. File-based Snowpipe can load only after those files exist, so even faster tasks or bigger warehouses cannot meet a 10-second target. For a producer sending one message at a time, Snowpipe Streaming is the better ingestion fit because rows are written continuously to a Snowflake table without waiting for file creation. A stream on that landing table captures new changes, and triggered tasks can run the downstream MERGE when the stream has data.
More compute may reduce processing time, but it cannot remove source-side batching.
Snowpipe Streaming removes the file wait, and triggered tasks handle the downstream MERGE processing.
Use the SnowPro Data Engineer DEA-C02 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Try SnowPro Data Engineer DEA-C02 on the web: view the SnowPro Data Engineer DEA-C02 Practice Test.
Read the SnowPro Data Engineer DEA-C02 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.