SnowPro Data Engineer DEA-C02: Streaming and Orchestration

Try 10 focused SnowPro Data Engineer DEA-C02 questions on Streaming and Orchestration, with explanations, then continue with IT Mastery.


Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Try SnowPro Data Engineer DEA-C02 on Web | View the full SnowPro Data Engineer DEA-C02 practice page

Topic snapshot

Exam route: SnowPro Data Engineer DEA-C02
Topic area: Streaming, Orchestration, and Near Real-Time Pipeline Design
Blueprint weight: 20%
Page purpose: Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Streaming, Orchestration, and Near Real-Time Pipeline Design for SnowPro Data Engineer DEA-C02. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

  • First attempt: Answer without checking the explanation first. Record the fact, rule, calculation, or judgment point that controlled your answer.

  • Review: Read the explanation even when you were correct. Record why the best answer is stronger than the closest distractor.

  • Repair: Repeat only missed or uncertain items after a short break. Record the pattern behind misses, not the answer letter.

  • Transfer: Return to mixed practice once the topic feels stable. Record whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 20% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.

Question 1

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

An analytics team loads order-change files into RAW_ORDERS with Snowpipe every 1-2 minutes. They must apply MERGE-based CDC to CURATED_ORDERS within 5 minutes, then run a dependent aggregation step, all inside Snowflake. They also want to avoid keeping compute running when no new data arrives. Which design is the BEST fit?

Options:

  • A. Use a dynamic table with 5-minute target lag for CDC and sequencing.

  • B. Use a stream and a triggered task graph for MERGE and aggregation.

  • C. Use Snowpipe Streaming so ingestion automatically performs CDC and aggregation.

  • D. Use an external scheduler to call a stored procedure every minute.

Best answer: B

Explanation: Tasks are the natural Snowflake-native choice when pipeline logic must run on a schedule or only when new data appears. Here, a stream captures the changed rows, and a triggered task graph can run the MERGE and dependent aggregation step without external orchestration or misuse of ingestion features.

Tasks are Snowflake’s native orchestration mechanism for executing SQL or stored procedures on a schedule or in response to stream-backed changes. In this scenario, Snowpipe already handles file ingestion, so the remaining problem is downstream execution. A stream on RAW_ORDERS tracks CDC records, and a triggered root task can start only when the stream has data. Downstream tasks in the same task graph can then run the aggregation step after the MERGE succeeds.

This design keeps orchestration inside Snowflake, avoids constant external polling, and avoids running compute just to check for work. Dynamic tables are better for declarative table maintenance, while Snowpipe and Snowpipe Streaming are ingestion mechanisms rather than downstream orchestration tools.

  • Dynamic tables are for declarative refresh, not imperative MERGE-based CDC orchestration with explicit step sequencing.
  • Snowpipe Streaming reduces ingestion latency, but it does not automatically coordinate downstream MERGE and aggregation logic.
  • External scheduling can work, but it adds polling and operational complexity when Snowflake tasks already meet the requirement natively.
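The correct pattern can be sketched in SQL. This is a minimal illustration, not the exam's reference solution; the table, stream, column, and warehouse names (raw_orders_stream, transform_wh, order_daily_totals, and the MERGE columns) are all assumptions:

```sql
-- Capture CDC rows that Snowpipe lands in RAW_ORDERS.
CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

-- Triggered root task: no SCHEDULE; with a stream-based WHEN clause,
-- the task runs only when the stream actually has data.
CREATE OR REPLACE TASK merge_orders
  WAREHOUSE = transform_wh
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  MERGE INTO curated_orders t
  USING raw_orders_stream s
    ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET t.status = s.status, t.amount = s.amount
  WHEN NOT MATCHED THEN INSERT (order_id, status, amount)
    VALUES (s.order_id, s.status, s.amount);

-- Child task in the same graph: runs only after the MERGE commits.
CREATE OR REPLACE TASK aggregate_orders
  WAREHOUSE = transform_wh
  AFTER merge_orders
AS
  INSERT OVERWRITE INTO order_daily_totals
  SELECT order_date, COUNT(*), SUM(amount)
  FROM curated_orders
  GROUP BY order_date;

-- Resume child tasks before the root.
ALTER TASK aggregate_orders RESUME;
ALTER TASK merge_orders RESUME;
```

Because nothing runs until the stream reports data, no compute is spent when no files arrive, which is exactly the idle-cost requirement in the stem.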

Question 2

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

A team uses Snowpipe Streaming to land events into a raw table within seconds. A reporting table in the same account still lags because two downstream SQL transformation layers must refresh in the correct order. The team wants Snowflake to manage those dependencies to a stated target lag instead of hand-built schedules. Which Snowflake concept best fits this bottleneck?

Options:

  • A. Materialized views

  • B. Tasks

  • C. Streams

  • D. Dynamic tables

Best answer: D

Explanation: Dynamic tables are designed for derived-table pipelines where freshness and dependency sequencing are the main problem. Here, ingestion is already near real time and consumers read in the same account, so the bottleneck is keeping downstream transformations current to a target lag.

Dynamic tables are the Snowflake feature for declarative, pipeline-style SQL transformations that must stay fresh. You define a target lag, and Snowflake manages refresh timing and dependency order across upstream dynamic tables. That matches this scenario because the raw data is already arriving quickly through Snowpipe Streaming, and there is no separate consumer-delivery issue to solve.

  • Ingestion freshness is already addressed at the raw layer.
  • The lag appears in the dependent transformation layers.
  • Snowflake-managed refresh ordering is needed.

The closest alternative is tasks, but tasks require you to build and maintain the orchestration logic yourself rather than expressing freshness declaratively.

  • The stream option is for capturing table changes, not for automatically refreshing dependent derived tables to a target lag.
  • The task option can enforce execution order, but it is manual orchestration rather than Snowflake-managed freshness across dependencies.
  • The materialized view option helps with automatically maintained query results, not a multi-stage transformation pipeline with explicit dependency lag goals.
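The declarative version of this pipeline might look like the sketch below. All object and warehouse names are hypothetical; only the final table carries a concrete lag, while the intermediate layer uses the DOWNSTREAM keyword so Snowflake refreshes it just often enough to satisfy its dependents:

```sql
-- Intermediate transformation layer: refreshed only when dependents need it.
CREATE OR REPLACE DYNAMIC TABLE orders_enriched
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = transform_wh
AS
  SELECT r.order_id, r.amount, c.region
  FROM raw_orders r
  JOIN dim_customer c ON r.customer_id = c.customer_id;

-- Reporting layer: the freshness goal consumers actually see.
CREATE OR REPLACE DYNAMIC TABLE order_metrics
  TARGET_LAG = '1 minute'
  WAREHOUSE = transform_wh
AS
  SELECT region, COUNT(*) AS order_count, SUM(amount) AS revenue
  FROM orders_enriched
  GROUP BY region;
```

Snowflake works out the refresh order across both tables from the query definitions; there is no schedule or AFTER clause to maintain.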

Question 3

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

A team ingests CDC rows into RAW_ORDERS. A dynamic table incrementally builds CURATED_ORDERS with TARGET_LAG = '1 minute'. Operations now require each change set to be applied only after a reconciliation table load finishes and a validation procedure succeeds, and they sometimes need to delay a cycle without losing captured changes. The current design produces partial downstream totals when reconciliation is late. What is the best next step?

Options:

  • A. Replace the dynamic table with a materialized view on RAW_ORDERS

  • B. Lower TARGET_LAG and run the dynamic table on a larger warehouse

  • C. Use a stream on RAW_ORDERS and a task graph to gate and apply changes

  • D. Add another dynamic table that filters CURATED_ORDERS after reconciliation

Best answer: C

Explanation: This issue is about execution control, not refresh speed. When CDC must wait for validation or reconciliation and sometimes be intentionally delayed, streams plus tasks are the Snowflake-native pattern.

Dynamic tables are best when Snowflake can manage incremental refresh timing automatically to meet a freshness target. In this scenario, the real requirement has changed: the team must control exactly when captured changes are applied, only after reconciliation and validation complete, and sometimes defer processing intentionally.

  • Use a stream to capture row-level changes from the source table.
  • Use tasks or a task graph to run validation and then apply the changes in the required order.
  • If processing is deferred, the stream retains unconsumed changes (within its retention period) until a later run consumes them.


That gives explicit orchestration and dependency control, which a dynamic table’s automatic refresh model does not provide.

  • Faster refresh misses the point because lower TARGET_LAG or more compute improves freshness, not conditional sequencing.
  • Another dynamic table stays automatic and still does not provide run-by-run gating based on operational checks.
  • Materialized views are the wrong tool because they optimize query maintenance, not controlled CDC execution.
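One way to sketch the gating pattern is shown below. The procedure name, stream name, and schedule are assumptions for illustration; the key points are that the stream holds changes until a MERGE consumes them, and that suspending the task defers processing without losing anything:

```sql
CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

CREATE OR REPLACE TASK apply_orders
  WAREHOUSE = transform_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  -- Hypothetical procedure: checks that the reconciliation load finished,
  -- runs the validation procedure, and only then MERGEs the stream into
  -- CURATED_ORDERS. If it exits without reading the stream in DML, the
  -- captured changes remain unconsumed for the next run.
  CALL apply_orders_if_validated();

-- Intentional deferral: suspend the task; the stream keeps accumulating
-- unconsumed changes within its retention period.
ALTER TASK apply_orders SUSPEND;
```

This is the run-by-run control that a dynamic table's automatic refresh cannot express.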

Question 4

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

A team uses Snowpipe Streaming to land order events in RAW_ORDERS. New rows appear in RAW_ORDERS within 5 seconds of the source event, and the transform warehouse shows no queuing. However, the dashboard table MART_ORDER_METRICS is intermittently one minute behind even though both tasks usually finish in under 10 seconds.

Exhibit:

CREATE TASK merge_curated
  SCHEDULE = '1 MINUTE'
  AS MERGE INTO CURATED_ORDERS ... FROM RAW_ORDERS_STREAM ...;

CREATE TASK refresh_mart
  SCHEDULE = '1 MINUTE'
  AS INSERT OVERWRITE INTO MART_ORDER_METRICS
     SELECT ... FROM CURATED_ORDERS;

What is the best next step to address the true bottleneck?

Options:

  • A. Lower the Snowpipe Streaming flush interval.

  • B. Make refresh_mart a child task AFTER merge_curated.

  • C. Run both tasks every 30 seconds instead of every minute.

  • D. Increase the transform warehouse size for both tasks.

Best answer: B

Explanation: This is a dependency-sequencing problem, not an ingestion-freshness or warehouse-capacity problem. Because both tasks are independently scheduled at the same interval, the mart refresh can run before the upstream merge commits; using a task graph with AFTER enforces the correct order.

The stem rules out freshness and compute as the primary bottlenecks: data is landing in RAW_ORDERS within seconds, task runtimes are short, and the warehouse is not queued. The intermittent one-minute staleness points to sequencing. Two separate tasks with the same schedule do not guarantee that the downstream task will start only after the upstream task has committed its changes.

In Snowflake, this is best handled with a task graph. Make the merge step the upstream task and define the mart refresh as a child task with AFTER merge_curated so it runs only after successful completion of the merge. You could also gate the root task with SYSTEM$STREAM_HAS_DATA, but the key fix is enforcing dependency order. Faster ingestion settings or more frequent schedules improve speed, not execution correctness.

  • More warehouse fails because the warehouse is not queued and both tasks already finish quickly.
  • Tighter streaming flush fails because rows are already arriving in the raw table within 5 seconds.
  • Shorter schedules fail because running both tasks more often still does not guarantee that the mart refresh waits for the merge.
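The fix can be written directly against the exhibit. This sketch keeps the exhibit's elided SQL as-is and only changes the orchestration: the child task loses its independent schedule and gains an AFTER dependency (the WHEN gate on the root is an optional extra, not part of the required fix):

```sql
CREATE OR REPLACE TASK merge_curated
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('RAW_ORDERS_STREAM')  -- optional: skip empty runs
  AS MERGE INTO CURATED_ORDERS ... FROM RAW_ORDERS_STREAM ...;

CREATE OR REPLACE TASK refresh_mart
  AFTER merge_curated           -- child task: no SCHEDULE of its own
  AS INSERT OVERWRITE INTO MART_ORDER_METRICS
     SELECT ... FROM CURATED_ORDERS;

-- Resume the child before the root.
ALTER TASK refresh_mart RESUME;
ALTER TASK merge_curated RESUME;
```

With the dependency in place, refresh_mart can never observe CURATED_ORDERS before the current cycle's MERGE has committed.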

Question 5

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

Which use case is the best fit for a Snowflake dynamic table?

Options:

  • A. Capture row-level changes for custom MERGE logic

  • B. Query cloud-storage files without loading them into Snowflake

  • C. Accelerate repeated base-table queries without a pipeline table

  • D. Maintain a transformed dataset with automatic refresh to a target lag

Best answer: D

Explanation: Dynamic tables are designed to keep a derived table current from a query definition. They are the right fit when Snowflake should maintain transformation results automatically and refresh them to a freshness target, rather than requiring separate change-capture and scheduling objects.

A dynamic table is a declarative pipeline object: you define the SQL for the derived dataset, and Snowflake keeps that table refreshed to the configured target lag. It is an appropriate choice when you want transformation maintenance handled by Snowflake and the workload can benefit from incremental refresh when the query pattern supports it.

This differs from lower-level orchestration patterns. Streams expose change data capture records, and tasks schedule SQL or procedural work, but neither one automatically maintains a transformed table result by itself. Materialized views focus on query acceleration for supported patterns, while external tables expose files in object storage. The key signal for dynamic tables is automated maintenance of a transformed table with freshness goals.

  • CDC only fits streams, because streams track row-level changes but do not maintain a transformed target table.
  • Query acceleration points to materialized views or search optimization, which are performance features rather than transformation pipeline objects.
  • Files in object storage describes external tables, which query staged file metadata and contents without creating a managed transformed table.

Question 6

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

A retail platform receives order events continuously from microservices, not as files. The pipeline must make new records available to downstream tables in under 10 seconds, keep a replayable raw copy if transformations fail, and isolate ingestion from transformation compute. Which Snowflake architecture is the BEST fit?

Options:

  • A. Stored procedures that MERGE producer events directly into curated tables

  • B. Snowpipe Streaming to a raw table, then a stream and triggered tasks on a separate warehouse

  • C. External tables on cloud storage, then a dynamic table with one-minute target lag

  • D. Snowpipe auto-ingest files to curated tables, then a one-minute scheduled task

Best answer: B

Explanation: Snowpipe Streaming is designed for continuous low-latency ingestion without a file-staging layer. Landing events in a raw table creates a replay point, while streams and triggered tasks let transformations run on separate compute and recover independently.

The best pattern here is to decouple ingestion from downstream transformation. Snowpipe Streaming is Snowflake’s native choice for continuous event ingestion with near real-time latency, especially when producers emit rows rather than files. Writing first to an immutable raw table gives the pipeline a durable landing zone and replay source if validation or transformation logic fails.

A stream on the raw table captures only new changes, and triggered tasks can process those changes on separate compute from ingestion. That combination preserves reliability and limits blast radius: ingestion can continue even if a downstream task fails, and the raw data remains available for reprocessing.

File-based ingestion or direct upserts into curated tables either add unnecessary staging latency or remove the clean recovery boundary provided by the raw landing layer.

  • File-based Snowpipe is reliable for staged files, but it adds an unnecessary file tier here and direct loading to curated tables weakens replayability.
  • External tables plus a dynamic table are better for querying files in place than for continuous row-based event ingestion with second-level responsiveness.
  • Direct stored procedure upserts couple producers to transformation logic and remove the durable raw layer needed for safer recovery.

Question 7

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

A table stream on RAW_ORDERS feeds a task that runs every minute and MERGEs changes into CURATED_ORDERS. The task succeeds, and the new rows are visible in CURATED_ORDERS. Immediately after the task run, an engineer queries the stream and gets no rows.

Which explanation best fits this behavior?

Options:

  • A. Freshness lag is delaying new changes from appearing in the stream

  • B. The stream requires a manual refresh before it can be queried again

  • C. Downstream consumption advanced the stream offset

  • D. Object scope shifted from the source table to the target table

Best answer: C

Explanation: This is a downstream consumption scenario, not a freshness or scope problem. Because the task successfully used the stream in a MERGE and the target table already has the new rows, the stream offset was advanced by that consuming DML operation.

Snowflake streams track a position, or offset, over changes on their source object. When a downstream DML operation such as MERGE, INSERT, UPDATE, or DELETE reads from the stream in a transaction, those change records are considered consumed and the stream advances to the new offset.

In this scenario, the decisive fact is that the task succeeded and CURATED_ORDERS already contains the new rows. That means the stream changes were available and were used downstream. This rules out a freshness delay. It also rules out an object-scope issue, because the stream does not switch to tracking the target table after being used in a task.

The key takeaway is that an empty stream after a successful downstream load often indicates normal consumption behavior, not a problem.

  • Freshness confusion fails because the target table already contains the new rows, so the changes were available in time for the task.
  • Scope confusion fails because a stream continues to track its defined source object; using it in a task does not retarget it.
  • Manual refresh confusion fails because standard stream querying does not depend on a separate manual refresh step after consumption.
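The behavior in the stem can be reproduced with a small sketch (hypothetical object and column names). The decisive step is the committed DML that reads from the stream:

```sql
-- Before the task runs: the stream shows pending change records.
SELECT COUNT(*) FROM raw_orders_stream;   -- greater than zero

BEGIN;
MERGE INTO curated_orders t
USING raw_orders_stream s
  ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN INSERT (order_id, status)
  VALUES (s.order_id, s.status);
-- The commit advances the stream offset past everything the MERGE read.
COMMIT;

-- After the commit: empty result is normal consumption, not an error.
SELECT COUNT(*) FROM raw_orders_stream;
```

If the transaction had rolled back instead, the offset would not advance and the change records would still be visible in the stream.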

Question 8

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

A team ingests CDC events with Snowpipe Streaming into RAW_ORDERS. A stream on that table feeds three SQL transformations. An external scheduler calls a stored procedure every minute through the SQL API. When one run exceeds a minute, the next invocation overlaps, and curated tables fall behind. There are no non-Snowflake dependencies. What is the best next step?

Options:

  • A. Move orchestration to Airflow for stronger dependency handling

  • B. Increase the external scheduler frequency to reduce latency

  • C. Run the procedure on a larger warehouse every minute

  • D. Use a Snowflake task graph with SYSTEM$STREAM_HAS_DATA and AFTER dependencies

Best answer: D

Explanation: Because every step is already inside Snowflake, native tasks are the simplest and most appropriate orchestrator. A task graph using stream-aware execution avoids unnecessary external scheduling and reduces overlap-driven lag.

This is an orchestration design problem, not primarily a compute problem. When ingestion, change capture, and downstream SQL processing all stay inside Snowflake, a Snowflake task graph is usually the right control plane. Using a root task that checks SYSTEM$STREAM_HAS_DATA and chaining downstream work with AFTER keeps dependencies in-platform, avoids wasteful polling logic outside Snowflake, and simplifies operational ownership.

External schedulers are useful when a pipeline must coordinate jobs across multiple systems or satisfy outside control requirements. The stem explicitly says there are no such dependencies, so adding or keeping an external scheduler is unnecessary complexity. A larger warehouse might shorten runtime, but it does not fix the core issue that orchestration is being handled in a more complicated way than required.

  • More polling increases orchestration churn and does not solve overlapping invocations.
  • Airflow migration adds another scheduler even though the workflow has no stated cross-platform dependency.
  • Bigger warehouse may reduce execution time, but it does not address the unnecessary external orchestration layer.
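A sketch of the in-platform replacement is below. The procedure names stand in for the three SQL transformations and are assumptions; omitting WAREHOUSE makes the tasks serverless, which is one reasonable choice here:

```sql
-- Root task: runs at most once a minute, and only when CDC rows exist.
-- A Snowflake task never overlaps itself; if a run is still executing,
-- the next scheduled run is skipped rather than stacked.
CREATE OR REPLACE TASK orders_root
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  CALL transform_step_1();

-- Chain the remaining transformations with AFTER dependencies.
CREATE OR REPLACE TASK orders_step_2
  AFTER orders_root
AS
  CALL transform_step_2();

CREATE OR REPLACE TASK orders_step_3
  AFTER orders_step_2
AS
  CALL transform_step_3();

ALTER TASK orders_step_3 RESUME;
ALTER TASK orders_step_2 RESUME;
ALTER TASK orders_root RESUME;
```

The no-overlap guarantee removes the exact failure mode in the stem, and the WHEN gate removes the external per-minute polling.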

Question 9

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

A team ingests order updates with Snowpipe Streaming. New rows reach RAW_ORDERS within 5 seconds. A stream on RAW_ORDERS feeds a task that MERGEs into CURATED_ORDERS every 15 minutes. The task usually finishes in 20 seconds. Same-region consumers read CURATED_ORDERS through a secure share and require data within 1 minute of arrival. Which change best addresses the problem?

Options:

  • A. Replicate the curated database to the consumer account

  • B. Change the stream on RAW_ORDERS to append-only

  • C. Add more Snowpipe Streaming channels to RAW_ORDERS

  • D. Use a triggered task on the stream for the MERGE

Best answer: D

Explanation: This is a downstream orchestration problem, not an ingestion problem. Snowpipe Streaming already lands data within seconds, but the 15-minute task schedule delays updates to the curated table that consumers actually read.

The core issue is end-to-end freshness. RAW_ORDERS is current within 5 seconds, and the MERGE itself only takes 20 seconds, so the missed SLA is caused by batching changes behind a 15-minute task schedule. Using a triggered task tied to the stream lets Snowflake run the transformation when new CDC records are available, so CURATED_ORDERS updates much closer to arrival time.

  • Ingest into RAW_ORDERS
  • Capture changes with a stream
  • Run the MERGE when the stream has data
  • Let the secure share expose the updated curated table

The key takeaway is that improving ingest speed or changing delivery mechanics will not fix latency introduced by scheduled downstream processing.

  • More ingest throughput misses the issue because rows already land in RAW_ORDERS within seconds.
  • Replication instead of sharing addresses cross-region or failover patterns, not a transformation schedule that delays the curated table.
  • Append-only stream changes CDC behavior and can be inappropriate for updates, while still leaving the 15-minute delay in place.
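The change itself is small: drop the 15-minute schedule and let the stream trigger the task. Column names in this sketch are assumptions:

```sql
-- Triggered task: no SCHEDULE, so the stream-based WHEN clause alone
-- decides when it runs, typically shortly after new CDC rows arrive.
CREATE OR REPLACE TASK merge_curated_orders
  WAREHOUSE = transform_wh
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  MERGE INTO curated_orders t
  USING raw_orders_stream s
    ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET t.status = s.status
  WHEN NOT MATCHED THEN INSERT (order_id, status)
    VALUES (s.order_id, s.status);

ALTER TASK merge_curated_orders RESUME;
```

Nothing about the secure share needs to change; consumers simply see CURATED_ORDERS updated within seconds of arrival instead of on a 15-minute cadence.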

Question 10

Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design

A team needs order updates visible in orders_curated within 10 seconds. The source application emits one JSON message at a time, but the current pipeline writes staged files every 5 minutes, uses Snowpipe to load raw_orders, captures changes with a stream, and runs a scheduled task every minute to MERGE into orders_curated. Warehouses are not queued or saturated, yet end-to-end latency stays around 6 minutes. What is the best fix?

Options:

  • A. Use a larger warehouse and schedule the task every 10 seconds.

  • B. Query the staged files through an external table with AUTO_REFRESH.

  • C. Use Snowpipe Streaming, then process the stream with triggered tasks.

  • D. Replace the stream with a dynamic table refreshed every minute.

Best answer: C

Explanation: Near real-time pipelines are limited by the slowest stage. Here, the fixed 5-minute file batch is the dominant delay, so the best fix is continuous row ingestion with Snowpipe Streaming and downstream change processing driven from a stream with triggered tasks.

Near real-time design requires compatible choices across ingestion, change capture, and orchestration. In this scenario, the main bottleneck is not compute; it is the source pattern that only creates files every 5 minutes. File-based Snowpipe can load only after those files exist, so even faster tasks or bigger warehouses cannot meet a 10-second target. For a producer sending one message at a time, Snowpipe Streaming is the better ingestion fit because rows are written continuously to a Snowflake table without waiting for file creation. A stream on that landing table captures new changes, and triggered tasks can run the downstream MERGE when the stream has data.

  • Ingest continuously with Snowpipe Streaming.
  • Capture table changes with a stream.
  • Orchestrate downstream transforms with triggered tasks or a task graph.

More compute may reduce processing time, but it cannot remove source-side batching.

  • More compute fails because warehouses are already healthy and the biggest delay happens before rows reach the table.
  • Dynamic table refresh fails because it still leaves the 5-minute file batching unchanged and is not the best fit for explicit CDC-style MERGE processing.
  • External table auto-refresh fails because it tracks files in object storage rather than providing the lowest-latency continuous ingestion into Snowflake tables.

Continue with full practice

Use the SnowPro Data Engineer DEA-C02 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try SnowPro Data Engineer DEA-C02 on Web | View the SnowPro Data Engineer DEA-C02 Practice Test

Free review resource

Read the SnowPro Data Engineer DEA-C02 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.

Revised on Thursday, May 14, 2026