Try 60 free SnowPro Data Engineer DEA-C02 questions across the exam domains, with explanations, then continue with full IT Mastery practice.
This free full-length SnowPro Data Engineer DEA-C02 practice exam includes 60 original IT Mastery questions across the exam domains.
These questions are for self-assessment. They are not official exam questions and do not imply affiliation with the exam sponsor.
Count note: this page uses the full-length practice count maintained in the Mastery exam catalog. Some certification vendors publish total questions, scored questions, duration, or unscored/pretest-item rules differently; always confirm exam-day rules with the sponsor.
Need concept review first? Read the SnowPro Data Engineer DEA-C02 Cheat Sheet on Tech Exam Lexicon, then return here for timed mocks and full IT Mastery practice.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
Try SnowPro Data Engineer DEA-C02 on Web
View full SnowPro Data Engineer DEA-C02 practice page
| Domain | Weight |
|---|---|
| Data Sourcing, Storage, and Ingestion | 22% |
| Transformations, Programmability, and Developer Workflows | 24% |
| Streaming, Orchestration, and Near Real-Time Pipeline Design | 20% |
| Sharing, Replication, and Cross-Platform Delivery | 18% |
| Compute, Governance, Observability, and Performance | 16% |
Use this as one diagnostic run. IT Mastery gives you timed mocks, topic drills, analytics, code-reading practice where relevant, and full practice.
Topic: Compute, Governance, Observability, and Performance
A data engineering team already ingests hourly order files with Snowpipe and merges them into a curated table with a task. Analysts only notice bad loads hours later, even though the load and merge complete successfully. They want a Snowflake-native way to detect missing ORDER_ID values or abnormally low row counts earlier, without redesigning the pipeline.
create or replace pipe raw.orders_pipe auto_ingest=true as
copy into raw.orders_stage
from @raw.orders_ext_stage
file_format = (type = json);
create or replace task etl.merge_orders
warehouse = etl_wh
schedule = 'USING CRON 5 * * * * UTC'
as
merge into prod.orders t
using raw.orders_stage s
on t.order_id = s.order_id
when matched then update set order_total = s.order_total
when not matched then insert (order_id, order_total)
values (s.order_id, s.order_total);
Which Snowflake feature is the best next step?
Options:
A. Create an alert that only checks task status
B. Replace the task with a dynamic table
C. Create a stream on RAW.ORDERS_STAGE
D. Associate data metric functions with PROD.ORDERS
Best answer: D
Explanation: Data metric functions are designed for table-level data quality observability. Here, the pipeline already runs, but the team needs earlier detection of bad data such as missing keys or unexpectedly low volumes without replacing Snowpipe or the task flow.
The core need is data quality observability on an existing pipeline, not a new ingestion or transformation design. Snowpipe and the merge task can both succeed even when upstream files contain missing business keys or unusually small batches. Associating data metric functions to PROD.ORDERS lets the team measure conditions like row counts or nulls on the table they already load, so issues can be surfaced earlier without rebuilding the pipeline.
This fits because it adds monitoring rather than changing processing:
- Data metric functions attach to PROD.ORDERS without modifying the pipe or the merge task.
- They can evaluate conditions such as NULL ORDER_ID counts and row counts on a schedule, so bad loads surface earlier.
The closest alternative only checks whether the task ran, but execution success does not prove the data is healthy. A minimal association is sketched below.
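A minimal sketch of the association, assuming a 15-minute evaluation schedule (the schedule value and the exact metric picks are illustrative):
alter table prod.orders set data_metric_schedule = '15 MINUTE';
-- built-in completeness and volume metrics from SNOWFLAKE.CORE
alter table prod.orders add data metric function snowflake.core.null_count on (order_id);
alter table prod.orders add data metric function snowflake.core.row_count on ();
-- metric results can then drive threshold-based alerts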
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A data engineering team has validated the SQL in each task. However, publish_task sometimes reads curated.events before the latest refresh is complete.
Exhibit:
create or replace task ingest_task
warehouse = etl_wh
schedule = '5 MINUTES'
as
copy into raw.events from @landing/events;
create or replace task enrich_task
warehouse = etl_wh
after ingest_task
as
merge into curated.events c
using raw.events r on c.id = r.id
when matched then update set payload = r.payload
when not matched then
insert (id, payload) values (r.id, r.payload);
create or replace task publish_task
warehouse = etl_wh
after ingest_task
as
insert into mart.event_counts
select current_timestamp(), count(*) from curated.events;
What is the best next step?
Options:
A. Rewrite enrich_task as a stored procedure
B. Replace COPY INTO with Snowpipe Streaming
C. Make publish_task depend on enrich_task
D. Increase the warehouse size for enrich_task
Best answer: C
Explanation: This is a task-orchestration issue, not a transformation-logic issue. Because enrich_task and publish_task both use AFTER ingest_task, they are sibling tasks that can run independently after ingestion completes. To guarantee fresh curated data before publishing, publish_task must depend on enrich_task.
In Snowflake, the SQL inside a task defines what the task does, but the task graph defines when each step can run. Here, ingest_task is the scheduled root task. Both enrich_task and publish_task are configured with AFTER ingest_task, which makes them parallel child branches rather than a required sequence.
- ingest_task starts the graph every 5 minutes.
- enrich_task runs after ingestion finishes.
- publish_task also runs after ingestion finishes.
- Nothing forces publish_task to wait for enrich_task.
To enforce dependency ordering, change publish_task to AFTER enrich_task, as sketched below. The key takeaway is that stale downstream reads here are caused by task-graph design, not by the MERGE logic.
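A minimal sketch of the rewiring (the graph must be suspended before its dependencies are modified):
alter task ingest_task suspend;
alter task publish_task suspend;
alter task publish_task remove after ingest_task;
alter task publish_task add after enrich_task;
alter task publish_task resume;
alter task ingest_task resume;  -- resume the root last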
Topic: Transformations, Programmability, and Developer Workflows
A data engineering team loads CDC files into a staging table every 5 minutes. After each load, one task must iterate through a list of target tables, run MERGE statements, write batch results to an audit table, and stop the downstream share-refresh task if any step fails. Which Snowflake design is the best fit?
Options:
A. Use a scalar UDF inside each MERGE to write audit rows.
B. Use a Python UDF to loop through tables and execute SQL.
C. Use a Snowflake Scripting stored procedure called by the task.
D. Use an external function to send the batch to an API.
Best answer: C
Explanation: This workload needs statement orchestration, not reusable inline expression logic. A stored procedure is the right Snowflake abstraction for loops, MERGE and audit DML, and exception handling so the task can fail the batch and prevent downstream execution.
Choose a stored procedure when the logic must orchestrate SQL statements rather than compute a value inside a query. Here, the workflow must loop through multiple target tables, execute MERGE operations, insert audit records, and raise an error so the task graph stops before the share-refresh step runs.
If the requirement were reusable inline transformation logic inside SELECT, MERGE, or a dynamic table definition, a UDF would be the better abstraction.
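A hedged sketch of that shape, assuming hypothetical target tables, a simple id/payload MERGE template, and an etl.batch_audit table (all names illustrative):
create or replace procedure etl.run_cdc_batch()
returns string
language sql
as
$$
declare
  targets array default array_construct('ORDERS', 'CUSTOMERS');
  last_idx integer;
  tbl string;
  stmt string;
begin
  last_idx := array_size(targets) - 1;
  for i in 0 to last_idx do
    tbl := get(targets, i)::string;
    -- hypothetical per-table MERGE template
    stmt := 'merge into prod.' || tbl || ' t using stage.' || tbl || ' s'
         || ' on t.id = s.id'
         || ' when matched then update set t.payload = s.payload'
         || ' when not matched then insert (id, payload) values (s.id, s.payload)';
    execute immediate stmt;
    insert into etl.batch_audit(table_name, status, run_at)
      values (:tbl, 'OK', current_timestamp());
  end for;
  return 'BATCH_OK';
exception
  when other then
    insert into etl.batch_audit(table_name, status, run_at)
      values (:tbl, 'FAILED', current_timestamp());
    raise;  -- fail the calling task so the downstream share-refresh does not run
end;
$$;
The task body then reduces to call etl.run_cdc_batch();.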
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
Source-system change rows already land in RAW.CUSTOMER_CHANGES every 30 seconds. A task refreshes CORE.CUSTOMERS once per minute by scanning the raw table and filtering on an EVENT_TS watermark. Ingestion into RAW.CUSTOMER_CHANGES is current, but the task is lagging and some late-arriving updates with older EVENT_TS are skipped. What is the best next step?
Options:
A. Replace the table with an external table and process newly discovered files.
B. Increase the task warehouse size and keep using the EVENT_TS watermark.
C. Load smaller micro-batches into the raw table more frequently with Snowpipe.
D. Create a stream on RAW.CUSTOMER_CHANGES and MERGE from the stream.
Best answer: D
Explanation: This is a CDC design problem, not an ingestion problem. A stream on the landed table lets Snowflake track table changes directly, so the task can consume only new rows without repeated full-table scans or missed late arrivals. That is the right boundary when the data is already in Snowflake.
When source data has already landed in a Snowflake table, the main design choice is how to capture table changes reliably, not how to ingest files faster. A raw-table pattern that filters by EVENT_TS is only an approximation of CDC: it requires rescanning the table and can miss late-arriving records whose event time is older than the stored watermark.
A stream is Snowflake’s native change-tracking object for this situation. The downstream task should read from the stream and MERGE those rows into the target table. Snowflake then tracks which changes have been consumed by stream offset, instead of relying on a custom timestamp rule.
Scaling compute may reduce runtime, but it does not fix the incorrect CDC boundary or the skipped late-arriving updates.
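A minimal sketch of the corrected boundary (key and payload column names, and the warehouse, are assumptions):
create or replace stream raw.customer_changes_strm on table raw.customer_changes;

create or replace task core.apply_customer_changes
  warehouse = etl_wh
  schedule = '1 MINUTE'
  when system$stream_has_data('RAW.CUSTOMER_CHANGES_STRM')
as
merge into core.customers c
using raw.customer_changes_strm s
  on c.customer_id = s.customer_id
when matched then update set c.payload = s.payload
when not matched then insert (customer_id, payload)
values (s.customer_id, s.payload);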
Topic: Data Sourcing, Storage, and Ingestion
A retail application emits individual order events continuously from its services. The team needs data loaded into a Snowflake table with latency measured in seconds and does not want to create or manage staged files. Which Snowflake-native ingestion pattern is the best fit?
Options:
A. An external table on the source storage
B. A scheduled COPY INTO load from staged files
C. Snowpipe Streaming into the target table
D. Snowpipe auto-ingest from a stage
Best answer: C
Explanation: Snowpipe Streaming fits continuous, low-latency ingestion when records are pushed directly from an application or connector into Snowflake. It is the Snowflake-native choice when the delivery style is event-by-event rather than staged files.
The deciding concept is file-based versus row-based ingestion. Snowpipe Streaming is built for near-real-time ingestion of records that arrive continuously from applications, services, or connectors, and it avoids the extra step of writing micro-files to a stage first.
In this scenario, the requirements are:
- individual order events emitted continuously by application services,
- load latency measured in seconds, and
- no staged files to create or manage.
That combination maps directly to Snowpipe Streaming. By contrast, standard Snowpipe and COPY INTO patterns are file-oriented: data must first land in a stage and then be loaded. External tables are also not a load mechanism into native Snowflake tables; they let Snowflake query files where they already live. The closest distractor is standard Snowpipe, but it still assumes staged files.
COPY INTO is a batch or micro-batch pattern and adds orchestration plus file handling.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
Orders arrive continuously into RAW_ORDERS through Snowpipe Streaming. The team needs a transformed CURATED_ORDERS dataset that joins reference tables and is usually no more than 5 minutes behind source data. The logic is SQL-only, no custom procedural branching is required, and refresh compute must run on a dedicated warehouse. Which design is the BEST fit?
Options:
A. Create a dynamic table with TARGET_LAG = '5 minutes' on a dedicated warehouse.
B. Create a stream and a 1-minute task graph to MERGE changes into a table.
C. Create a standard view and let analyst queries recompute the transformed results.
D. Add search optimization to RAW_ORDERS and query the raw table directly.
Best answer: A
Explanation: A dynamic table is the best balance here because the requirement is near-real-time freshness, SQL-only transformation, and low operational overhead. Setting a 5-minute TARGET_LAG lets Snowflake manage incremental refresh while a dedicated warehouse isolates refresh compute from analyst workloads.
Dynamic tables are designed for declarative incremental pipelines when you want transformed table outputs without hand-building change-processing orchestration. In this scenario, the team needs a persisted curated table, about 5-minute freshness, and isolated compute. A dynamic table fits because you define the SQL once, assign a warehouse, and use TARGET_LAG to express acceptable staleness. Snowflake then manages refresh planning and incremental maintenance for the dependency graph.
A stream-and-task alternative would require hand-built MERGE behavior. The key distinction is declarative freshness versus hand-managed orchestration.
The closest distractor, a stream with a 1-minute task graph, adds MERGE orchestration that the requirements do not need.
Topic: Compute, Governance, Observability, and Performance
An engineering team wants to monitor sales.orders with one built-in completeness metric and one company-specific metric. Review the exhibit and choose the best interpretation.
ALTER TABLE sales.orders
ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT
ON (customer_id);
CREATE OR REPLACE DATA METRIC FUNCTION governance.high_value_order_pct(
t TABLE(amount NUMBER)
)
RETURNS NUMBER
AS 'SELECT 100 * COUNT_IF(amount > 1000) / COUNT(*) FROM t';
Options:
A. Both metrics are custom because each must be declared for monitoring.
B. SNOWFLAKE.CORE.NULL_COUNT is system-provided; high_value_order_pct is custom business logic.
C. high_value_order_pct is system-provided; NULL_COUNT is a custom table check.
D. Both metrics are system-provided because Snowflake evaluates both.
Best answer: B
Explanation: System-provided data metric functions are supplied by Snowflake, typically from SNOWFLAKE.CORE. A metric created with CREATE DATA METRIC FUNCTION is custom, and the amount > 1000 rule shows business-specific logic rather than a built-in platform metric.
Snowflake distinguishes system-provided metrics from custom measurements by where the metric logic originates. SNOWFLAKE.CORE.NULL_COUNT is a built-in data metric function that Snowflake provides for standard data-quality checks such as completeness. governance.high_value_order_pct is different: the team defines it with CREATE DATA METRIC FUNCTION, and its logic depends on a business rule, amount > 1000, so it is a custom measurement.
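As a sketch, the custom metric can be attached alongside the built-in one with an evaluation schedule (the schedule value is an assumption):
ALTER TABLE sales.orders SET DATA_METRIC_SCHEDULE = '15 MINUTE';
ALTER TABLE sales.orders
  ADD DATA METRIC FUNCTION governance.high_value_order_pct ON (amount);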
Attaching a metric to a table does not make it custom, and executing a custom metric inside Snowflake does not make it system-provided.
A SNOWFLAKE.CORE metric is built in; CREATE DATA METRIC FUNCTION defines account-specific logic, while NULL_COUNT already exists as a Snowflake-supplied metric.
Topic: Sharing, Replication, and Cross-Platform Delivery
An e-commerce platform emits order events continuously from application services, not as files. Data engineers need curated order tables refreshed within 5 minutes, and they want to avoid managing change-processing logic or complex orchestration where possible. A finance team in a separate Snowflake account in the same region must query a masked subset using its own warehouse. Which architecture is the best fit?
Options:
A. Use external tables on landed files, scheduled transformations, and daily file unloads.
B. Use Snowpipe Streaming, streams with task graphs, secure views with masking, and Secure Data Sharing.
C. Use Snowpipe auto-ingest from staged files, materialized views, and database replication.
D. Use Snowpipe Streaming, dynamic tables, secure views with masking, and Secure Data Sharing.
Best answer: D
Explanation: Snowpipe Streaming is the right ingestion fit for continuous application events without file staging. Dynamic tables meet the 5-minute freshness goal with less orchestration than streams and task graphs, and Secure Data Sharing lets the finance account query governed data using its own warehouse.
The core trade-off here is choosing the simplest Snowflake-native architecture that still satisfies low-latency ingestion, managed transformation freshness, and governed delivery. Snowpipe Streaming is built for continuous row-based ingestion from applications, so it fits better than file-based loading. Dynamic tables are the best match for keeping curated tables current within a target lag while reducing custom orchestration compared with building and operating streams plus task graphs. For delivery, Secure Data Sharing is the right mechanism because the finance team already has its own Snowflake account in the same region and needs separate compute. Exposing a secure view with masking preserves governance while sharing live data.
A streams-and-tasks design could also work technically, but it adds operational complexity the scenario explicitly wants to avoid.
Topic: Compute, Governance, Observability, and Performance
A team uses one LARGE warehouse for a CDC pipeline and BI dashboards. The pipeline reads from a stream and runs MERGE tasks every minute. Query History shows dashboard queries spend 20-30 seconds in queued_overload_time during task runs, but only 4-6 seconds executing. Requirements: keep dashboards under 10 seconds, keep the pipeline schedule unchanged, and track credits separately for BI and ingestion. What is the best next action?
Options:
A. Create separate warehouses for CDC tasks and BI dashboards
B. Increase the warehouse size to XLARGE
C. Convert the current warehouse to multi-cluster
D. Enable Query Acceleration Service for dashboard queries
Best answer: A
Explanation: The dashboards are not slow because of execution time; they are waiting for compute. Since the delay happens when CDC tasks overlap and the team also needs separate credit tracking, the main concern is workload isolation, so separate warehouses are the best fit.
queued_overload_time indicates queries are waiting for warehouse resources, not spending that time executing. In this scenario, dashboard queries finish in 4-6 seconds once they start, so the real issue is contention from mixing minute-by-minute CDC MERGE tasks with BI traffic on the same warehouse. Because the pipeline schedule must stay unchanged and finance wants BI and ingestion credits tracked separately, the strongest Snowflake design is to isolate the workloads on different warehouses.
A multi-cluster warehouse can help with concurrency on a shared warehouse, but it still keeps both workloads on the same compute boundary. When predictable dashboard latency and per-workload cost visibility are explicit requirements, separate warehouses are the better design.
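A minimal sketch of the split (warehouse names and sizes assumed):
create warehouse cdc_wh warehouse_size = 'MEDIUM' auto_suspend = 60 auto_resume = true;
create warehouse bi_wh  warehouse_size = 'MEDIUM' auto_suspend = 60 auto_resume = true;
-- point the CDC tasks at cdc_wh (alter task ... set warehouse = cdc_wh)
-- and route dashboard connections to bi_wh; credits then separate
-- per warehouse in ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY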
Topic: Data Sourcing, Storage, and Ingestion
A task that runs every 5 minutes began failing after a partner changed the JSON payload shape. The current load extracts nested paths and casts them into a typed table:
COPY INTO curated.events(order_id, customer_id, amount, device_os)
FROM (
SELECT
$1:order.id::NUMBER,
$1:customer.id::NUMBER,
$1:amount::NUMBER(10,2),
$1:device.os::STRING
FROM @partner_stage
)
FILE_FORMAT = (TYPE = JSON);
New attributes can appear at any nested level, and the team must keep every record plus the full original payload for later reprocessing. What is the best next step?
Options:
A. Set ON_ERROR = CONTINUE on the COPY INTO
B. Load raw JSON into VARIANT, then transform downstream
C. Switch the pipeline to Snowpipe Streaming
D. Enable schema evolution with MATCH_BY_COLUMN_NAME
Best answer: B
Explanation: Because the source is semi-structured JSON with changing nested shape, the fragile part is the path-based typed COPY, not the file transport. Landing each record in VARIANT preserves all attributes and lets downstream transformations handle evolving schema safely.
When nested JSON structure changes, a COPY INTO that extracts specific paths and casts them into relational columns becomes brittle. Snowflake can ingest the full document into a VARIANT column without forcing those nested attributes to match a fixed schema at load time. That is the safest ingestion pattern here because the requirement is to keep every record and preserve the original payload for replay or backfill.
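As a sketch, the load can keep each full document in one VARIANT column and move typed extraction downstream (table and column names assumed):
COPY INTO raw.events_json(payload)
FROM (SELECT $1 FROM @partner_stage)
FILE_FORMAT = (TYPE = JSON);

-- downstream, extract paths tolerantly; missing paths yield NULL instead of load failures
SELECT payload:order.id::NUMBER     AS order_id,
       payload:amount::NUMBER(10,2) AS amount
FROM raw.events_json;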
Schema evolution and name-based matching help only in narrower direct-to-column mapping scenarios; they do not make path-based transformed JSON loads resilient to arbitrary nested drift.
ON_ERROR = CONTINUE would skip problematic rows, which breaks the requirement to keep every record and full payload.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
An events pipeline uses Snowpipe Streaming to ingest into raw.events. Two independent tasks run every minute and both read the same stream, raw.events_strm: one writes error events to ops.alerts, and the other writes minute aggregates to mart.usage_minute. Ingestion latency is under 30 seconds, but the alerts task often sees no rows while the aggregate task succeeds. Both consumers must receive all changes within 2 minutes with the smallest Snowflake-native change. What should the team do?
Options:
A. Create a second stream on raw.events so each task consumes its own stream.
B. Recreate the stream on a view used by both tasks.
C. Replace Snowpipe Streaming with Snowpipe and keep one shared stream.
D. Increase task frequency and warehouse size for both tasks.
Best answer: A
Explanation: This is a downstream consumption problem, not a freshness problem. The data is arriving on time, but two independent tasks are competing to consume the same stream, so each consumer needs its own stream offset.
The issue is downstream consumption. The stem says ingestion is already meeting the latency target, so freshness is not the bottleneck. It also does not describe missing changes caused by tracking the wrong object; both tasks need the same changes from the same base table.
A Snowflake stream is not a fan-out queue for multiple independent consumers. When a task consumes a stream in a committed DML transaction, that stream advances its offset. If two tasks share one stream, one task can consume the available changes before the other task processes them.
For this pattern, create separate streams on the same source table so each downstream task has its own independent offset. Creating a view-based stream would only help if the problem were object scope, such as needing to track changes in a different derived dataset.
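A minimal sketch of the fix (stream names assumed):
create or replace stream raw.events_alerts_strm on table raw.events;
create or replace stream raw.events_usage_strm  on table raw.events;
-- each task reads its own stream, so each offset advances independently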
Topic: Sharing, Replication, and Cross-Platform Delivery
A provider account currently runs hourly COPY INTO unloads to cloud storage so a partner can ingest a curated sales dataset into its own Snowflake account. The partner reports stale data between unloads, and the provider’s security team notes that revoking access does not remove files the partner already copied. Both Snowflake accounts are in the same region. What is the best next step?
Options:
A. Create a secure share of the curated objects for the partner account
B. Increase the COPY INTO unload frequency and tighten storage access
C. Have the partner create an external table on the exported files
D. Replicate the database to the partner account each hour
Best answer: A
Explanation: This issue comes from using copied files as a sharing mechanism. Secure data sharing is designed for controlled consumer access: the provider keeps the source of truth, the consumer sees current data, and access can be revoked without managing exported file copies.
Use secure data sharing when the goal is controlled, read-only access to Snowflake data rather than distributing copies. In this scenario, hourly unloads create two built-in problems: the partner only sees data as fresh as the last export, and any files already copied remain outside the provider’s control after access is revoked. A secure share exposes selected Snowflake objects directly from the provider account to the consumer account without copying the underlying data. The consumer queries the live shared data with its own compute, while the provider retains governance over access. Database replication is for maintaining another copy, often for availability or distribution needs, and more frequent unloads or exported-file external tables still rely on copies.
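A minimal sketch of the share (object and account identifiers assumed; only secure views and tables can be exposed this way):
create share sales_share;
grant usage on database curated to share sales_share;
grant usage on schema curated.publish to share sales_share;
grant select on view curated.publish.v_sales to share sales_share;  -- a secure view
alter share sales_share add accounts = partner_org.partner_account;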
Topic: Transformations, Programmability, and Developer Workflows
Order files land every minute and load into RAW_ORDERS on time. Query history shows no warehouse queuing for the downstream task, but the curated table is still wrong because the next step must apply several business rules in one run: keep the latest event per order_id, MERGE valid rows into ORDERS_CURATED, write invalid rows to ORDERS_REJECTED, and roll back if any step fails. What is the best next step?
Options:
A. Upsize the task warehouse and enable more clusters.
B. Call a Snowflake Scripting stored procedure from the task.
C. Replace the file load with Snowpipe Streaming.
D. Use a dynamic table for all downstream processing.
Best answer: B
Explanation: This is primarily a transformation-design problem, not a compute or ingestion problem. The files already load on time and the warehouse is not queued; the missing capability is multi-step business logic with conditional DML, reject handling, and transactional rollback, which fits a Snowflake Scripting stored procedure invoked by a task.
The key signal is that ingestion and warehouse capacity are healthy, yet the pipeline still cannot enforce the required business rules. When a downstream step must execute ordered logic such as deduplication, MERGE, reject-table writes, and rollback on failure, the right design is to encapsulate that logic in a Snowflake Scripting stored procedure and invoke it from the task. This gives the pipeline one governed unit that can use variables, multiple SQL statements, control flow, exception handling, and transaction control.
Inside the procedure, a single transactional unit can deduplicate, MERGE valid rows, and insert rejected rows. More warehouse capacity might reduce runtime, but it does not add the missing business-logic structure.
Changing ingestion does not help either, because RAW_ORDERS is already current.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
Exhibit:
create or replace dynamic table order_enriched_dt
target_lag = '5 minutes'
warehouse = etl_wh
as
select order_id, region, amount
from raw_orders;
create or replace dynamic table sales_kpi_dt
target_lag = '15 minutes'
warehouse = etl_wh
as
select region, sum(amount) as total_amount
from order_enriched_dt
group by region;
A dashboard reads only sales_kpi_dt, and up to 15 minutes of staleness is acceptable. order_enriched_dt is not queried directly. Which change best aligns incremental maintenance with the downstream freshness expectation?
Options:
A. Keep the lags and use a larger warehouse.
B. Set order_enriched_dt to TARGET_LAG = DOWNSTREAM.
C. Lower both dynamic tables to TARGET_LAG = '1 minute'.
D. Set sales_kpi_dt to TARGET_LAG = DOWNSTREAM.
Best answer: B
Explanation: The freshness SLA exists at sales_kpi_dt, not at the intermediate table. Using TARGET_LAG = DOWNSTREAM on order_enriched_dt lets Snowflake refresh that upstream table only when needed for the 15-minute downstream requirement, reducing unnecessary refresh work.
In a dynamic table pipeline, the explicit TARGET_LAG should usually sit where the business freshness expectation actually exists. Here, the dashboard consumes only sales_kpi_dt and allows data to be up to 15 minutes old, so that table should remain the freshness anchor. Since order_enriched_dt is only an intermediate step, TARGET_LAG = DOWNSTREAM is the better pattern: Snowflake refreshes it as needed so dependent dynamic tables can meet their own lag.
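A minimal sketch of the change:
alter dynamic table order_enriched_dt set target_lag = DOWNSTREAM;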
Reserve DOWNSTREAM for upstream-only dynamic tables in the chain. Making the final table DOWNSTREAM removes the clear SLA anchor, while lowering lags or adding more compute increases work without improving the required business outcome. Setting DOWNSTREAM on the final table is flawed because the consumer-facing object should carry the explicit freshness target.
Topic: Sharing, Replication, and Cross-Platform Delivery
Which statement correctly differentiates Snowflake replication from secure data sharing when the requirement is resilience, data locality, or promoted failover behavior?
Options:
A. Secure data sharing creates a failover-ready secondary copy; replication is mainly for consumer read access.
B. Replication creates a secondary copy that can be promoted for failover; secure data sharing exposes provider-managed data for read access.
C. Replication is for live no-copy access; secure data sharing is for cross-region resilience and locality.
D. Both features support promoted failover because both create the same secondary-database behavior.
Best answer: B
Explanation: For resilience, locality, and explicit failover, Snowflake replication is the relevant mechanism because it maintains a secondary copy that can be promoted. Secure data sharing is designed for governed read access to provider-managed data, not for creating a failover target.
The core distinction is copied secondary versus shared access. Replication creates and synchronizes a secondary database or account in another region or cloud, which supports disaster recovery, locality, and promoted failover behavior when configured with Snowflake replication and failover features. Secure data sharing lets another account query shared objects without taking ownership of a failover-ready secondary copy. That makes sharing excellent for collaboration and data distribution, but it does not by itself satisfy requirements such as regional resilience or promoting a secondary after an outage. When the requirement names recovery, locality, or failover promotion, think replication; when it names governed consumer access to the same data, think secure data sharing.
Topic: Sharing, Replication, and Cross-Platform Delivery
A provider account ingests partner order files as JSON into a raw table with Snowpipe every 5 minutes. Loads finish within 3 minutes, so the 15-minute SLA is already met. Two consumer accounts need the data for BI with consistent customer, product, and calendar definitions, and the shared dataset cannot expose raw email addresses. Which design is the BEST next step?
Options:
A. Add search optimization to the raw table and share it directly.
B. Replace Snowpipe with Snowpipe Streaming for lower-latency ingestion.
C. Replicate the raw database so each consumer can model independently.
D. Build a curated dimensional model and share secure views with masked emails.
Best answer: D
Explanation: This is primarily a modeling and delivery problem, not an ingestion problem. The current Snowpipe pipeline already meets the latency target, so the best next step is to create a curated analytics-ready model and share governed secure views from that layer.
When ingestion already satisfies the SLA, the remaining gap is how data is organized for downstream use. Snowpipe gets files into Snowflake, but it does not create conformed customer, product, and calendar definitions for BI consumers. The better design is to keep the raw landing layer, transform it into curated fact and dimension tables, and publish secure views that expose only the governed, analytics-ready shape.
Changing ingestion speed, tuning raw-table access, or pushing modeling to each consumer does not solve the consistency and governance requirements in the stem.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A table stream on RAW_ORDERS feeds a task that runs every minute and MERGEs changes into CURATED_ORDERS. The task succeeds, and the new rows are visible in CURATED_ORDERS. Immediately after the task run, an engineer queries the stream and gets no rows.
Which explanation best fits this behavior?
Options:
A. Freshness lag is delaying new changes from appearing in the stream
B. Downstream consumption advanced the stream offset
C. Object scope shifted from the source table to the target table
D. The stream requires a manual refresh before it can be queried again
Best answer: B
Explanation: This is a downstream consumption scenario, not a freshness or scope problem. Because the task successfully used the stream in a MERGE and the target table already has the new rows, the stream offset was advanced by that consuming DML operation.
Snowflake streams track a position, or offset, over changes on their source object. When a downstream DML operation such as MERGE, INSERT, UPDATE, or DELETE reads from the stream in a transaction, those change records are considered consumed and the stream advances to the new offset.
In this scenario, the decisive fact is that the task succeeded and CURATED_ORDERS already contains the new rows. That means the stream changes were available and were used downstream. This rules out a freshness delay. It also rules out an object-scope issue, because the stream does not switch to tracking the target table after being used in a task.
The key takeaway is that an empty stream after a successful downstream load often indicates normal consumption behavior, not a problem.
Topic: Sharing, Replication, and Cross-Platform Delivery
A data engineering team lands source data as Parquet in cloud object storage. Snowflake must transform the data into curated datasets, and Spark and Trino users must query those same curated tables without a separate export pipeline. Which Snowflake-oriented architecture best fits?
Options:
A. Use native tables with dynamic tables
B. Use Iceberg tables for the curated layer
C. Use external tables on the landing files
D. Use native tables with secure data sharing
Best answer: B
Explanation: Iceberg tables are the best fit when curated data must be usable both in Snowflake and in other query engines. They balance storage, transformation, and delivery by keeping the curated layer in an open table format instead of creating a separate export path.
The key concept is choosing an architecture that supports cross-platform delivery without duplicating the curated dataset. Iceberg tables are designed for this pattern: data stays in object storage in an open table format, Snowflake can transform and manage the curated layer, and engines such as Spark or Trino can read the same tables.
In this scenario, the deciding factor is not just transformation automation but the delivery model. External tables expose files in place, but they are not a Snowflake-managed curated table architecture. Native tables work well for Snowflake-only consumption, and dynamic tables help maintain derived data, but both keep the curated layer inside Snowflake. The best trade-off here is an open table architecture that serves both Snowflake and non-Snowflake consumers.
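A hedged sketch of a Snowflake-managed Iceberg curated table (the external volume name and base location are assumptions):
create or replace iceberg table curated.orders (
  order_id number,
  order_ts timestamp_ntz,
  amount   number(10,2)
)
catalog = 'SNOWFLAKE'
external_volume = 'lake_vol'
base_location = 'curated/orders';
-- Spark or Trino can then read the same Iceberg data and metadata from object storage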
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team continuously loads RAW_ORDERS into Snowflake. Downstream transformations are implemented as a chain of streams and tasks, but when one task fails or is suspended, curated tables can stay stale until engineers manually resume and replay the chain. The business only needs data within 10 minutes, transformations are SQL-based, and the team wants less orchestration code rather than exact cron control. What is the best Snowflake-native fix?
Options:
A. Use chained dynamic tables with an appropriate TARGET_LAG
B. Point BI queries at the stream instead of curated tables
C. Build a materialized view directly on RAW_ORDERS
D. Keep streams and tasks, but run them more frequently
Best answer: A
Explanation: Dynamic tables fit SQL-based incremental pipelines that need a freshness SLA more than exact scheduling control. In this scenario, setting TARGET_LAG addresses stale downstream data while removing much of the fragile stream-and-task orchestration.
Dynamic tables are the Snowflake-native choice when a pipeline is primarily SQL, needs incremental results, and can be governed by a freshness target instead of explicit task schedules. Here, the core problem is orchestration fragility: a failed task leaves downstream tables stale until someone intervenes. By defining each transformation as a dynamic table and setting TARGET_LAG to the 10-minute requirement, Snowflake manages dependency-aware refreshes and incremental maintenance automatically. That gives a simpler operating model than streams plus tasks while still keeping near-real-time data current. The closest alternative is keeping streams and tasks, but that preserves the monitoring, scheduling, and replay burden the team wants to reduce.
Topic: Sharing, Replication, and Cross-Platform Delivery
A data provider shares sales data with dozens of external consumers. Replication to the required regions is already in place. Most consumer queries are dashboards filtered by month, region, and category, and the provider wants a stable long-term contract.
Exhibit: current shared object
create or replace secure view publish.v_customer_orders as
select
o.order_ts,
c.region,
p.category,
oi.quantity,
oi.unit_price
from oltp.orders o
join oltp.order_items oi on o.order_id = oi.order_id
join oltp.customers c on o.customer_id = c.customer_id
join oltp.products p on oi.product_id = p.product_id;
What is the best interpretation?
Options:
A. Replace sharing with a failover group
B. Add search optimization to the OLTP tables
C. Expose a curated analytics model instead of OLTP joins
D. Increase the provider warehouse for shared queries
Best answer: C
Explanation: The main issue is architectural fit, not a missing tuning switch. The shared object exposes normalized operational joins to many analytic consumers, so a curated reporting model is the better pattern for stable contracts and predictable consumption.
This scenario points to a data-modeling decision. The secure view is built directly on OLTP-style tables (orders, order_items, customers, products), while the consumers are running repeated analytical queries by business dimensions such as month, region, and category. For shared analytics, the stronger pattern is to publish a consumer-facing model at the intended reporting grain, such as fact/dimension tables or another curated reporting layer, and share that layer.
Feature-level tuning does not solve the core mismatch:
- Search optimization can speed selective lookups, but it does not fix a normalized, join-heavy shape served to analytic dashboards.
- A larger provider warehouse spends more compute on the same repeated joins.
- A failover group addresses resilience, and replication is already in place.
The key takeaway is to choose the right published data product shape first, then tune it if needed.
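A sketch of a consumer-facing shape built from the same sources (the grain and object names are assumptions):
create or replace table publish.fct_sales_monthly as
select date_trunc('month', o.order_ts) as order_month,
       c.region,
       p.category,
       sum(oi.quantity * oi.unit_price) as revenue
from oltp.orders o
join oltp.order_items oi on o.order_id = oi.order_id
join oltp.customers c on o.customer_id = c.customer_id
join oltp.products p on oi.product_id = p.product_id
group by 1, 2, 3;

create or replace secure view publish.v_sales_monthly as
select order_month, region, category, revenue
from publish.fct_sales_monthly;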
Topic: Compute, Governance, Observability, and Performance
A BI team runs dashboard queries in large bursts for about 5 minutes at the top of each hour. They want low latency during those bursts without keeping oversized compute running the rest of the hour.
Exhibit:
Warehouse: BI_WH
Size: MEDIUM
Min clusters: 1
Max clusters: 1
Auto-suspend: 60
09:00-09:05 queries: 1,180
Avg execution time: 2.1 sec
Avg queued overload time: 28.4 sec
Avg bytes scanned/query: 35 MB
Based on the exhibit, which warehouse design best matches this workload?
Options:
A. Resize BI_WH to a single X-Large warehouse.
B. Run the dashboard workload with serverless tasks.
C. Use a multi-cluster warehouse with auto-scale from 1 to 4 clusters.
D. Enable query acceleration on BI_WH.
Best answer: C
Explanation: The warehouse is spending far more time queued than executing, and each query scans relatively little data. That points to concurrency pressure from bursty dashboard traffic, so a multi-cluster warehouse is the best fit.
The key design choice is to match compute to the bottleneck. In the exhibit, average execution time is only 2.1 seconds, but average queued overload time is 28.4 seconds during a short burst window. That means the main problem is many concurrent queries competing for slots, not slow individual query processing.
For dashboard spikes that last only a few minutes, auto-scaling clusters up and back down is the most appropriate compute design.
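A minimal sketch of the change (the scaling policy is shown for completeness):
alter warehouse bi_wh set
  min_cluster_count = 1
  max_cluster_count = 4
  scaling_policy = 'STANDARD';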
Topic: Data Sourcing, Storage, and Ingestion
A data lake team stores event data as Apache Iceberg tables in cloud object storage. Spark and Snowflake must query the same tables, the lake platform must continue to own Iceberg metadata maintenance, and the team does not want to copy data into Snowflake-managed storage. What is the best Snowflake design choice?
Options:
A. Load the files with Snowpipe into a native table
B. Create a Snowflake-managed Iceberg table and migrate the data
C. Create an externally managed Iceberg table in Snowflake
D. Create an external table over the data files
Best answer: C
Explanation: This is a storage-pattern decision, not an ingestion decision. An externally managed Iceberg table lets Snowflake query the same Iceberg data in object storage while another platform continues managing the table metadata and maintenance.
The deciding requirement is metadata ownership. When data already exists as Iceberg tables in object storage and another platform must keep managing snapshots, manifests, and table maintenance, Snowflake should attach to that table as an externally managed Iceberg table. That preserves a shared table definition across engines and avoids copying data into Snowflake-managed storage.
An external table can read files in place, but it does not use the existing Iceberg table metadata as the authoritative shared table definition. Snowflake-managed Iceberg tables keep the Iceberg table under Snowflake’s management, which conflicts with the stated ownership requirement. Snowpipe is an ingestion mechanism for loading data into Snowflake tables, so it solves the wrong problem here.
The key takeaway is to choose the storage pattern first when the real issue is where data lives and who maintains metadata.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team loads staged files into RAW.ORDERS and then merges into CURATED.ORDERS with these tasks:
CREATE TASK load_raw
WAREHOUSE = etl_wh
SCHEDULE = '1 MINUTE'
AS COPY INTO RAW.ORDERS FROM @landing/orders;
CREATE TASK merge_curated
WAREHOUSE = etl_wh
SCHEDULE = '1 MINUTE'
AS MERGE INTO CURATED.ORDERS ...
load_raw finishes in 20 to 30 seconds, merge_curated in about 10 seconds, and query history shows no warehouse queuing. CURATED.ORDERS must refresh within 1 minute of a successful load, and the merge must not run if the load fails. What is the best next action?
Options:
A. Make load_raw the scheduled root and run merge_curated AFTER load_raw.
B. Keep both schedules and offset merge_curated by 45 seconds.
C. Increase etl_wh size for both tasks.
D. Run merge_curated on a separate warehouse.
Best answer: A
Explanation: This is a dependency-management problem, not a warehouse-capacity problem. A task graph with load_raw as the scheduled root and merge_curated as a child using AFTER ensures the merge starts only after a successful load and still fits the latency target.
The key issue is that the two tasks are independently scheduled, so Snowflake is free to start them separately even when one step logically depends on the other. Because runtimes are already short and query history shows no warehouse queuing, warehouse selection is not the bottleneck.
In this scenario, the right design is to use a task graph:
- load_raw as the scheduled root task.
- merge_curated as a child task with AFTER load_raw.
That directly satisfies both requirements: near-real-time sequencing and failure-aware dependency control, as sketched below. A larger or separate warehouse may help with queuing or long runtimes, but it does not enforce execution order or success-based orchestration.
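A hedged sketch of the conversion, assuming the schedule is removed before the dependency is added (tasks must be suspended while the graph is modified):
ALTER TASK merge_curated SUSPEND;
ALTER TASK merge_curated UNSET SCHEDULE;
ALTER TASK merge_curated ADD AFTER load_raw;
ALTER TASK merge_curated RESUME;
ALTER TASK load_raw RESUME;  -- resume the root last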
Topic: Data Sourcing, Storage, and Ingestion
A retail team currently ingests order files with the following Snowflake object. They are redesigning the source so the application emits one order event at a time, and the business wants the data queryable within seconds without first landing files in cloud storage.
Exhibit:
create or replace pipe orders_pipe
auto_ingest = true
as
copy into raw.orders
from @landing/orders/
file_format = (type = json);
Which is the best interpretation of the new requirement?
Options:
A. Keep the pipe because AUTO_INGEST already streams events
B. Run COPY INTO every minute with a task
C. Use an external table with auto-refresh
D. Switch to Snowpipe Streaming for direct event ingestion
Best answer: D
Explanation: AUTO_INGEST on a pipe means Snowpipe continuously loads new files that arrive in a stage. The new requirement removes staged files and asks for per-event data available within seconds, which points to Snowpipe Streaming.
The key distinction is the ingestion unit and the latency target. A PIPE with AUTO_INGEST and COPY INTO represents continuous ingestion of staged files: Snowflake reacts when new files land and loads them into a table. That is different from a streaming design where an application sends individual records directly and expects very low latency without writing files first.
For that new pattern, Snowpipe Streaming is the native Snowflake fit. A task that runs COPY INTO every minute is still batch ingestion on a schedule, and it still depends on files being available to load. An external table can expose staged files for querying, but it does not replace direct row or event ingestion into a Snowflake table. The closest distractor is keeping the existing pipe, but AUTO_INGEST still depends on files arriving in the stage.
AUTO_INGEST watches for new files in a stage; it does not ingest application events directly. COPY INTO is still batch-oriented and still assumes files exist to be copied.
Topic: Data Sourcing, Storage, and Ingestion
A vendor delivers small CSV files to an external stage at irregular times. Snowflake currently loads them with a scheduled task:
CREATE TASK load_vendor_files
SCHEDULE = '15 MINUTE'
AS
COPY INTO raw.vendor_events
FROM @vendor_stage;
Dashboards are often 10-15 minutes behind. The SLA is data available in under 1 minute after each file arrives, and the team wants the most Snowflake-native, low-operations approach while keeping file-based delivery. What is the best next step?
Options:
A. Configure Snowpipe auto-ingest on the stage for event-driven file loading.
B. Create an external table with AUTO_REFRESH and query the staged files.
C. Increase task frequency and warehouse size for scheduled COPY INTO loads.
D. Rebuild the feed to use Snowpipe Streaming for direct row ingestion.
Best answer: A
Explanation: Snowpipe is designed for continuous, file-based ingestion when files land in a stage and low operational overhead is required. It fits the under-1-minute target better than a polling task because it loads on file arrival instead of waiting for the next schedule.
The deciding concept is matching the ingestion mechanism to the delivery model and latency target. The source is still delivering files, so the best Snowflake-native fix is to move from scheduled COPY INTO polling to Snowpipe auto-ingest, which uses event-driven loading when new files arrive in the stage.
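A minimal sketch of the event-driven replacement (the cloud notification integration setup is environment-specific and omitted):
create or replace pipe raw.vendor_pipe
  auto_ingest = true
as
copy into raw.vendor_events
from @vendor_stage;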
This fits the stated requirements:
- file-based delivery from the vendor is preserved,
- loading starts on file arrival, which supports the under-1-minute SLA, and
- operations stay low because Snowflake manages the continuous load service.
Snowpipe Streaming is for direct row ingestion from applications or streaming clients, not for a vendor that is already sending files. The key takeaway is that continuous file arrival plus a near-real-time SLA usually points to Snowpipe, not scheduled tasks. Running the task more often would still poll on a schedule, paying for repeated COPY INTO execution instead of event-driven file ingestion.
Topic: Data Sourcing, Storage, and Ingestion
A team uses Snowpipe to auto-ingest partner CSV files into a raw table. After a vendor release, ingestion lag grows even though file sizes and arrival rate are unchanged. COPY_HISTORY shows repeated parse errors:
unexpected end of record
found '\n' while expecting record delimiter
The partner confirms a comments field now contains embedded line breaks inside double quotes. What is the best next step?
Options:
A. Switch to Snowpipe Streaming for lower-latency ingestion.
B. Update the CSV file format for quoted multi-line fields, then reload failed files.
C. Enable schema evolution on the target table for new columns.
D. Increase the warehouse size used by downstream transformations.
Best answer: B
Explanation: This is a parsing problem that looks like a latency problem. The files are arriving, but Snowflake cannot correctly identify CSV record boundaries after the source started sending quoted fields with embedded line breaks.
The key clue is the parse error text, not the lag symptom. Snowpipe is receiving files, but the load is failing because the CSV shape changed: a field now contains embedded newlines inside quotes. That means the file format must correctly handle quoted multi-line fields before ingestion can resume normally.
In this situation, the right action is to update the CSV file format settings to match the source layout, then replay the failed files. Schema evolution is for column changes, not broken row parsing. More compute also does not help, because the problem occurs while Snowflake is interpreting the file contents. The main takeaway is to check source-data shape and file-format assumptions before treating backlog as a throughput issue.
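A hedged sketch of the repair (the format and pipe names are assumptions; the enclosure option is the key change):
create or replace file format raw.vendor_csv
  type = CSV
  skip_header = 1
  field_optionally_enclosed_by = '"';
-- the pipe's COPY statement is assumed to reference this named format;
-- after the fix, replay the failed files (for example, ALTER PIPE ... REFRESH
-- within its limited lookback window, or a targeted COPY of the failed files)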
Topic: Transformations, Programmability, and Developer Workflows
An analytics team ingests JSON order events into raw Snowflake tables with Snowpipe every minute. The next step deduplicates records, flattens VARIANT, joins to dimension tables already in Snowflake, and upserts a curated fact table by using window functions and MERGE. The curated table must refresh within 5 minutes, keep existing masking policies and access history, and use isolated compute so BI queries are not affected. Which design is the BEST fit?
Options:
A. Replace the pipeline with external tables over files and expose secure views to consumers.
B. Unload data each minute to cloud storage, transform it in Spark, then reload the curated table.
C. Send rows to a remote service through external functions for cleansing before loading the curated table.
D. Use streams and a task graph to run SQL/Snowflake Scripting MERGE logic in Snowflake on a dedicated warehouse.
Best answer: D
Explanation: Because the source data, reference tables, and transformation logic are already well served by Snowflake SQL, the pipeline should stay in Snowflake. Streams and tasks with Snowflake Scripting meet the 5-minute SLA, isolate compute with a dedicated warehouse, and avoid needless unload/reload steps.
This scenario fits Snowflake-native transformation processing: data already lands in Snowflake, the required work is relational and semi-structured SQL, and governance must stay close to the curated table. Exporting data to Spark or a remote service would add data movement, orchestration, failure points, and reload delay without addressing a capability gap.
Streams capture the changed rows, a task graph on a dedicated warehouse orders the steps, and SQL or Snowflake Scripting performs the dedupe, flatten, join, and MERGE into the curated table. That design keeps latency low while preserving Snowflake-native governance and observability. External engines are better only when there is a proven requirement Snowflake SQL or Snowflake Scripting cannot reasonably meet.
Masking policies and access history also remain intact because the data never leaves Snowflake during MERGE processing.
Topic: Transformations, Programmability, and Developer Workflows
A data engineering team receives newline-delimited JSON order events and related warranty PDFs in cloud object storage. Requirements: JSON attributes must be queryable within 5 minutes, PDF files must remain governed and accessible from Snowflake, and the team wants to avoid a separate Spark pipeline. Which design is the BEST way to preserve Snowflake-native processing for both content types?
Options:
A. Flatten JSON into relational columns at ingest and store PDF binary content in tables.
B. Use object-store events to trigger Spark parsing, then load curated rows into Snowflake.
C. Create external tables over JSON and PDF files, then query both directly with SQL.
D. Snowpipe JSON into VARIANT; stage PDFs with a directory table; process with SQL/Snowpark.
Best answer: D
Explanation: The best fit is to use Snowpipe for continuous JSON ingestion into VARIANT and keep PDFs as unstructured files on a stage with a directory table. That preserves Snowflake-native processing, meets the low-latency query requirement, and avoids introducing an external compute platform.
For this scenario, Snowflake-native processing means using the platform’s built-in patterns for each data type instead of pushing parsing and transformation outside Snowflake. Semi-structured JSON is best ingested continuously with Snowpipe into a table that includes a VARIANT column, so attributes can be queried quickly and flexibly. Unstructured files such as PDFs should remain on a Snowflake stage, where a directory table can expose file metadata and Snowpark can be used when file-aware processing is needed.
This approach fits the stated requirements:
- JSON attributes become queryable within the 5-minute window,
- the PDFs stay governed and accessible from a Snowflake stage, and
- no separate Spark pipeline is introduced.
The closest distractor is the external-table approach, which is appropriate for structured or semi-structured lake data, but not for treating PDFs as directly queryable tabular content.
Topic: Data Sourcing, Storage, and Ingestion
A data engineer uses the following design for files in an external cloud location:
CREATE OR REPLACE STAGE doc_stage
URL='s3://acme-legal/contracts/'
DIRECTORY=(ENABLE=TRUE);
CREATE OR REPLACE EXTERNAL TABLE ext_contracts (...)
WITH LOCATION=@doc_stage
AUTO_REFRESH=TRUE
FILE_FORMAT=(TYPE=PARQUET);
The location will stop receiving Parquet files and will instead receive scanned PDF contracts. Analysts only need file inventory in Snowflake before downstream document processing. What is the best next design choice?
Options:
A. Load the PDFs with COPY INTO and column matching.
B. Switch the external table to TYPE=JSON for the PDFs.
C. Keep the external table and add search optimization.
D. Use the stage directory table for metadata; process PDFs separately.
Best answer: D
Explanation: The key change is that Parquet is a queryable file format, while scanned PDFs are unstructured files. For an inventory-only requirement, keep the files in a stage and use the directory table for metadata instead of an external table.
External tables are designed for externally stored structured or semi-structured files whose contents can be projected into columns. When the source changes from Parquet to scanned PDFs, the design choice changes because PDFs are unstructured data. Changing the file format or trying to load them with tabular load options does not make the document contents queryable as rows.
Because the requirement is only to inventory files inside Snowflake, the existing stage with DIRECTORY=(ENABLE=TRUE) is the correct anchor. Query the stage’s directory table for metadata such as file path, size, and last modified time, and use a separate unstructured-data processing step later if document content must be extracted.
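A minimal sketch of the inventory query (the listed columns are standard directory-table metadata):
SELECT relative_path, size, last_modified
FROM DIRECTORY(@doc_stage);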
The closest distractor is switching file format, but file formats do not convert binary documents into VARIANT records.
COPY INTO with column matching applies to tabular file loads, not binary document ingestion.
Topic: Data Sourcing, Storage, and Ingestion
An engineering team started with the following definition for lake data already used by Spark:
CREATE EXTERNAL TABLE lake_orders (
order_id NUMBER,
order_ts TIMESTAMP_NTZ
)
LOCATION=@orders_stage
FILE_FORMAT=(TYPE=PARQUET);
The same object storage path is already maintained as an Apache Iceberg table by another platform, and both Snowflake and that platform must read the same table metadata without copying the data. What is the best next step?
Options:
A. Create a dynamic table on top of the external table
B. Create an externally managed Iceberg table
C. Load the files into a Snowflake native table
D. Keep the external table and enable auto-refresh
Best answer: B
Explanation: This scenario is driven by open table format interoperability, not just reading Parquet files. Because another platform already maintains Apache Iceberg metadata and Snowflake must use that same metadata without copying data, an externally managed Iceberg table is the best fit.
Iceberg-related table patterns fit when the requirement is to preserve an open table format and share table semantics across engines. Here, the lake path is already managed as an Apache Iceberg table by another platform, and Snowflake must read that same table metadata in place. That points to an externally managed Iceberg table.
An external table can expose files in object storage, but it is still a file-oriented pattern and does not make Snowflake participate in the shared Iceberg metadata layer. Loading the data into a native table breaks the no-copy requirement, and a dynamic table solves transformation refresh, not cross-engine table interoperability.
When the deciding constraint is shared open-format metadata, choose an Iceberg table pattern instead of a basic external table.
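A hedged sketch of the replacement (the external volume and catalog integration names are assumptions; the integration points at the platform that owns the Iceberg metadata):
CREATE OR REPLACE ICEBERG TABLE lake_orders
  EXTERNAL_VOLUME = 'lake_vol'
  CATALOG = 'lake_catalog_int'
  CATALOG_TABLE_NAME = 'orders';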
Topic: Data Sourcing, Storage, and Ingestion
A retail team lands small CSV files in an external stage at unpredictable times throughout the day. Operators currently run this command whenever files arrive:
COPY INTO raw.orders
FROM @landing/orders/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
They want each new file loaded automatically, without manual reruns or scheduled polling. What is the best next step?
Options:
A. Schedule a task to rerun COPY INTO every 5 minutes.
B. Create a Snowpipe with auto-ingest for the stage.
C. Use Snowpipe Streaming to load the staged CSV files.
D. Create a stream on raw.orders to detect new files.
Best answer: B
Explanation: Snowpipe is the Snowflake-native choice when files arrive incrementally in a stage and should load with minimal operational effort. The team already has a working COPY INTO pattern, so turning that pattern into a pipe is the best fit.
Snowpipe is Snowflake’s continuous file ingestion service for data that lands in an internal or external stage. It is the right choice when files arrive one by one or in small batches over time and the goal is to avoid manually rerunning COPY INTO.
In this scenario, the files are already staged, the load statement already exists, and the team explicitly wants automatic loading without a polling schedule. A Snowpipe can use the same loading logic and ingest only newly arrived files as they appear.
A scheduled task can automate repeated execution, but Snowpipe is the purpose-built feature for incremental staged-file ingestion.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
raw.customer_landing already receives inserts, updates, and deletes from an upstream application every minute. A downstream curated.customer_current table must reflect those changes within 2 minutes, and the pipeline should process only rows changed since the last successful run. The team does not want to manage custom watermark logic or rescan the full landing table. Which design is the BEST fit?
Options:
A. Create a Snowpipe on the source stage and load the same files again into curated.customer_current.
B. Replace the target with a dynamic table and rely on refreshes as the CDC consumption mechanism.
C. Track the highest processed ingest_ts in a control table and run a scheduled MERGE from raw.customer_landing.
D. Create a standard stream on raw.customer_landing and a triggered task to MERGE its changes into curated.customer_current.
Best answer: D
Explanation: Because the data has already landed in Snowflake, this is a downstream CDC consumption problem, not a raw ingestion problem. A standard stream plus triggered task processes only new row changes, supports inserts/updates/deletes, and avoids custom watermark tracking or full rescans.
When source data is already in a Snowflake table, the best CDC pattern is a stream plus task rather than another ingestion mechanism. A standard stream on the landing table records row-level changes since the last consumption point, including the information needed to propagate updates and deletes. A triggered task can then MERGE those changes into the curated table quickly, which matches the low-latency requirement.
This design fits the stated needs:
- The stream exposes only rows changed since the last consumption point, including updates and deletes.
- A triggered task runs as soon as change data is available, which supports the 2-minute latency target.
- Snowflake manages the stream offset, so no custom watermark logic or full-table rescans are needed.
Re-ingesting files with Snowpipe solves file loading, not post-landing CDC, and timestamp-based scans push state management and edge cases back onto the engineering team.
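A sketch of the pattern with illustrative object and column names; the source filter keeps insert images and true deletes from the standard stream so each key arrives once per batch:
CREATE OR REPLACE STREAM raw.customer_landing_stm
  ON TABLE raw.customer_landing;

-- No SCHEDULE makes this a triggered task: it runs when the stream has data.
CREATE OR REPLACE TASK etl.apply_customer_changes
  WAREHOUSE = etl_wh
  WHEN SYSTEM$STREAM_HAS_DATA('RAW.CUSTOMER_LANDING_STM')
AS
MERGE INTO curated.customer_current t
USING (
    SELECT customer_id, email, METADATA$ACTION AS action
    FROM raw.customer_landing_stm
    WHERE METADATA$ACTION = 'INSERT'
       OR (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE = FALSE)
) s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.action = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.email = s.email
WHEN NOT MATCHED AND s.action = 'INSERT' THEN
    INSERT (customer_id, email) VALUES (s.customer_id, s.email);

ALTER TASK etl.apply_customer_changes RESUME;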
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team needs new application events queryable in Snowflake within 2 minutes. Reliability is the higher priority: raw events must stay replayable if downstream transforms fail. Which architecture best fits?
Options:
A. Query the landing files through an external table only.
B. Stage micro-batch files, load with Snowpipe, then use streams and tasks.
C. Ingest rows with Snowpipe Streaming directly into target tables.
D. Build a dynamic table over an external table on the files.
Best answer: B
Explanation: Snowpipe is the best match when the latency target is a minute or two and reliability is the priority. Landing files in a stage keeps a durable replay point, and Snowflake can still load them continuously enough to meet the stated near real-time goal.
The core concept is choosing file-based continuous ingestion when you need both near real-time delivery and strong recoverability. Snowpipe loads new staged files automatically into a raw table, so the original files remain available for replay, audit, or backfill if a later transformation fails.
A strong pattern is:
- Stage micro-batch event files so there is a durable, replayable landing point.
- Let Snowpipe load a raw table continuously as files arrive.
- Use streams and tasks to transform the raw rows downstream.
Snowpipe Streaming is designed for lower-latency row ingestion, but the stem prioritizes preserving a durable raw replay point over minimizing latency to seconds. External tables and dynamic tables can be useful, but they do not replace this simple, reliable raw-ingestion pattern.
Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A team needs to process only changed rows from a staging table, but its main requirement is to control exactly when each step runs and in what order validation, enrichment, and load logic execute. Which Snowflake approach best fits this need?
Options:
A. Materialized views
B. Snowpipe Streaming
C. Streams with tasks
D. Dynamic tables
Best answer: C
Explanation: This requirement is about explicit orchestration, not just incremental refresh. Streams capture change data, and tasks let engineers decide the schedule, dependencies, and execution order of downstream processing.
Use streams and tasks when the pipeline must react to changed rows and engineers need direct control over execution timing and sequencing. A stream records inserts, updates, and deletes since the last consumption point, which makes it suitable for CDC-style processing. Tasks then run SQL or procedure logic on a defined schedule or in a task graph, so validation, enrichment, and load steps can be orchestrated in an exact order.
Dynamic tables are better when the goal is declarative, automatically maintained transformations with Snowflake managing refresh behavior. That is different from a requirement to explicitly control when each step runs. The key distinction is automatic refresh versus engineer-controlled execution.
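A sketch of engineer-controlled ordering with a task graph; the stored procedures are hypothetical placeholders for the validation, enrichment, and load steps:
-- Root task controls when the pipeline starts.
CREATE OR REPLACE TASK t_validate
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
AS
  CALL validate_staging();

-- AFTER gives an exact execution order: validate -> enrich -> load.
CREATE OR REPLACE TASK t_enrich
  WAREHOUSE = etl_wh
  AFTER t_validate
AS
  CALL enrich_staging();

CREATE OR REPLACE TASK t_load
  WAREHOUSE = etl_wh
  AFTER t_enrich
AS
  CALL load_curated();

-- Resume children before the root so the whole graph is runnable.
ALTER TASK t_load RESUME;
ALTER TASK t_enrich RESUME;
ALTER TASK t_validate RESUME;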
Topic: Sharing, Replication, and Cross-Platform Delivery
A data engineering team must keep a production pipeline available if its primary Snowflake account becomes unavailable. The secondary account is in a different cloud region, and the requirement explicitly includes controlled failover and failback.
Exhibit:
ALTER DATABASE prod_etl
ENABLE REPLICATION TO ACCOUNTS orgname.dr_account;
Based on the exhibit, what is the best next step?
Options:
A. Continue database replication and promote the secondary database if needed
B. Use Secure Data Sharing from the primary account to the DR account
C. Create a failover group for prod_etl and supported account objects
D. Publish prod_etl through a listing with auto-fulfillment
Best answer: C
Explanation: The exhibit shows database replication for a single database. Because the requirement is business continuity across regions or clouds with controlled failover and failback, the better Snowflake mechanism is a failover group, which is built for that outcome.
ALTER DATABASE ... ENABLE REPLICATION sets up replication for that database, but the stem asks for a continuity solution, not just a replicated copy. In Snowflake, failover groups are the mechanism intended for cross-region or cross-cloud business continuity because they support controlled failover and failback and can include the replicated database plus supported account objects needed by the workload.
A good way to separate the concepts is:
- Database replication keeps a current copy of one database in the secondary account.
- A failover group replicates the database plus supported account objects and adds controlled failover and failback.
The closest alternative is continuing with database replication alone, but that does not provide the full continuity model requested in the scenario.
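A hedged sketch of the failover-group setup; the object types and schedule vary by requirement, and the account names mirror the exhibit:
-- On the primary account:
CREATE FAILOVER GROUP prod_fg
  OBJECT_TYPES = DATABASES, ROLES, USERS, WAREHOUSES
  ALLOWED_DATABASES = prod_etl
  ALLOWED_ACCOUNTS = orgname.dr_account
  REPLICATION_SCHEDULE = '10 MINUTE';

-- On the DR account (the primary account name is illustrative):
CREATE FAILOVER GROUP prod_fg
  AS REPLICA OF orgname.primary_account.prod_fg;

-- During an outage, promote the secondary; failback later reverses the roles:
ALTER FAILOVER GROUP prod_fg PRIMARY;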
Topic: Compute, Governance, Observability, and Performance
A data engineering team runs a 5-minute CDC task graph and interactive dashboards on the same warehouse. From 8:00 to 10:00, CDC tasks miss SLA and many dashboard queries show high queued_overload_time. Once queries start, execution time is normal and there is little spill. The dashboard pattern is hundreds of short concurrent queries; the CDC pattern is a small number of MERGE statements that must run predictably every 5 minutes, even if dashboard traffic doubles. Which compute design is the best next step?
Options:
A. Enable query acceleration service on the shared warehouse
B. Replace the CDC task graph with dynamic tables on the same warehouse
C. Split workloads across warehouses and make the dashboard warehouse multi-cluster
D. Increase the shared warehouse size and keep all workloads together
Best answer: C
Explanation: The symptom points to compute contention, not slow query execution. A dedicated warehouse for CDC protects predictable task runtimes, while a multi-cluster warehouse is the better fit for many short concurrent dashboard queries.
queued_overload_time with normal execution after start usually means the main problem is warehouse concurrency pressure. Here, two very different workloads are sharing compute: bursty BI queries and time-sensitive CDC MERGE statements. The best design is to isolate them on separate warehouses so the CDC pipeline keeps a stable SLA, then configure the dashboard warehouse as multi-cluster because that pattern benefits from scaling out for many simultaneous short queries.
A larger single warehouse may help somewhat, but it still mixes workloads and does not give the CDC process predictable isolation if dashboard demand grows. The key design choice is matching scale-out concurrency handling to BI while reserving dedicated compute for pipeline reliability.
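A minimal sketch of the split, with illustrative sizes and cluster bounds:
-- Dedicated, predictable compute for the CDC task graph.
CREATE OR REPLACE WAREHOUSE cdc_wh
  WAREHOUSE_SIZE = 'SMALL';

-- Multi-cluster warehouse that scales out for bursty short BI queries.
CREATE OR REPLACE WAREHOUSE dashboard_wh
  WAREHOUSE_SIZE = 'SMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';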
After the split, falling queued_overload_time on the dashboard warehouse is the signal that the contention is resolved.
Topic: Sharing, Replication, and Cross-Platform Delivery
A data engineering team wants data to remain in cloud object storage in an open table format, while Snowflake and non-Snowflake engines query the same table metadata. Which Snowflake architectural pattern best fits this requirement?
Options:
A. Iceberg tables
B. Materialized views
C. Dynamic tables
D. External tables
Best answer: A
Explanation: Iceberg tables are Snowflake’s architectural pattern for open-format tables stored in object storage and used across engines. The key requirement is shared table metadata and interoperability, not simply querying files or enabling a refresh feature.
The core concept is choosing an interoperable table architecture. Iceberg tables are designed for data that remains in object storage while being represented as a table through open metadata and snapshots that compatible engines can understand. That makes them the right fit when the requirement is cross-platform table access, not just reading external files or optimizing Snowflake-only queries.
The closest distractor is external tables, but they query file-based data rather than provide the same open table architecture.
Topic: Transformations, Programmability, and Developer Workflows
A task scheduled every 5 minutes now misses its SLA. It calls a Python stored procedure that reads rows from raw_events, loops through them, and performs only LATERAL FLATTEN, type casts, joins, ROW_NUMBER() deduplication, and a final MERGE into a curated table. What is the best next step?
Options:
A. Move the transformation into external functions that call a REST service
B. Rewrite the transformation as set-based Snowflake SQL run directly by the task
C. Increase the task warehouse size and keep the Python procedure unchanged
D. Split the procedure into per-row UDF calls for each shaping step
Best answer: B
Explanation: The transformation described is a natural fit for Snowflake SQL: flatten semi-structured data, join, deduplicate with window functions, and upsert with MERGE. Rewriting it as set-based SQL is the most direct fix because it removes Python handler and row-by-row overhead instead of adding more layers.
When the required shaping is already expressible with native Snowflake SQL constructs, SQL is usually the best transformation choice. In this scenario, LATERAL FLATTEN, casts, joins, ROW_NUMBER() with QUALIFY, and MERGE are all set-based operations that Snowflake can optimize directly. A Python stored procedure adds runtime startup and often encourages row-wise logic, which is a poor fit for this workload and can cause missed task SLAs.
A better design is to keep orchestration in the task and move the transformation into SQL, for example:
- LATERAL FLATTEN to shred the semi-structured rows.
- ROW_NUMBER() with QUALIFY to deduplicate.
- A single MERGE to upsert the curated table.
Adding more compute may hide the symptom temporarily, but it does not fix the implementation mismatch.
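A sketch of the set-based statement the task could run directly; the table and column names are illustrative assumptions:
MERGE INTO curated_events t
USING (
    SELECT
        e.payload:event_id::NUMBER AS event_id,
        f.value:sku::STRING        AS sku,
        f.value:qty::NUMBER        AS qty
    FROM raw_events e,
         LATERAL FLATTEN(INPUT => e.payload:line_items) f
    -- Keep the latest row per key instead of looping in Python.
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY e.payload:event_id, f.value:sku
        ORDER BY e.load_ts DESC
    ) = 1
) s
ON t.event_id = s.event_id AND t.sku = s.sku
WHEN MATCHED THEN UPDATE SET t.qty = s.qty
WHEN NOT MATCHED THEN
    INSERT (event_id, sku, qty) VALUES (s.event_id, s.sku, s.qty);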
Topic: Transformations, Programmability, and Developer Workflows
A retailer already loads JSON order documents into a raw Snowflake table every minute. Each document contains nested line_items and discounts arrays. Analysts need item-level relational tables within 5 minutes, and the original document must remain available for replay if transformation logic changes. Which Snowflake design is the best fit?
Options:
A. Create an external table over the staged JSON files
B. Store raw rows in VARIANT and use LATERAL FLATTEN downstream
C. Load each document into VARCHAR columns for string parsing
D. Treat the payloads as unstructured files with a directory table
Best answer: B
Explanation: The main concern is nested semi-structured transformation, not file discovery or unstructured file processing. Keeping the raw JSON in VARIANT and flattening arrays downstream preserves replayability and efficiently produces relational rows from nested elements.
When the source is JSON documents with nested arrays, the primary Snowflake concern is semi-structured transformation. VARIANT keeps the original payload intact for replay or reprocessing, and LATERAL FLATTEN turns arrays such as line_items into one row per element for relational analytics. That directly matches the need for item-level tables while retaining the original document.
The resulting pattern is:
- Keep the raw document in VARIANT.
- Read top-level fields with paths such as payload:order_id.
- Use LATERAL FLATTEN on nested arrays to create item-level rows.
File-oriented features are better for staged-file access or unstructured file inventories, not for the core task of transforming nested JSON already loaded into Snowflake.
Topic: Compute, Governance, Observability, and Performance
A data engineering team uses Snowpipe Streaming to load customer interactions into a curated table with sub-minute latency. Support engineers need full email and phone values, but marketing analysts must query the same table and see those fields only in masked form. The team wants centralized enforcement without copying data or redesigning downstream tasks. Which Snowflake feature is the BEST fit?
Options:
A. Use a secure view that removes the sensitive columns
B. Apply a row access policy to the table
C. Apply masking policies to the sensitive columns
D. Create a dynamic table with pre-masked data
Best answer: C
Explanation: Masking policies are Snowflake’s column-level governance control for exposing sensitive data differently by role at query time. They let authorized users see clear values while other users see masked values from the same base table, with no extra copy of the data.
The core concept is column-level controlled exposure. A masking policy is designed for sensitive fields such as email and phone, where some roles need full values and others should see masked results from the same table. Because the policy is enforced at query time on the existing columns, the Snowpipe Streaming ingestion pattern, table design, and downstream task flow do not need to change. This gives centralized governance with one governed copy of the data.
A row access policy governs which rows are visible, not how individual column values are obscured. A secure view or a separate dynamic table could work around the requirement, but both add extra objects and access patterns when Snowflake already provides a native policy feature for this exact use case.
The key takeaway is to use masking policies when the requirement is selective exposure of sensitive column values.
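A minimal sketch for one column; the role name, mask format, and table name are illustrative:
CREATE OR REPLACE MASKING POLICY mask_email AS (val STRING)
RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() = 'SUPPORT_ENG' THEN val  -- support sees clear values
        ELSE '***MASKED***'                           -- everyone else sees a mask
    END;

ALTER TABLE curated_interactions
  MODIFY COLUMN email
  SET MASKING POLICY mask_email;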
With the policy attached, support roles see full values while analyst queries return email or phone values only in masked form.
Topic: Data Sourcing, Storage, and Ingestion
Which Snowflake feature is built for low-latency row ingestion from a client application directly into a table, rather than detecting or processing changes only after the data is already stored in Snowflake?
Options:
A. Task
B. Dynamic table
C. Stream on a table
D. Snowpipe Streaming
Best answer: D
Explanation: Snowpipe Streaming is the ingestion mechanism in this set. It lets applications push rows into Snowflake tables with very low latency, while the other choices act only after data has already landed and then track, transform, or orchestrate work on it.
The key distinction is ingestion versus post-ingestion change processing. Snowpipe Streaming is designed to move rows from an application or connector into a Snowflake table with low latency, without depending on staged files as the primary pattern. By contrast, a stream records change data for rows that already exist in a Snowflake object, a task runs scheduled or triggered SQL after data is present, and a dynamic table maintains a derived result from source objects already in Snowflake. If the requirement is to get events into Snowflake first, choose the ingestion feature. If the requirement begins only after rows have landed, then streams, tasks, or dynamic tables are the relevant tools.
Topic: Sharing, Replication, and Cross-Platform Delivery
Which Snowflake feature is intended for cross-region disaster recovery by replicating eligible databases together with supported account objects and allowing promotion of the secondary environment during an outage?
Options:
A. Failover group
B. Reader account
C. Secure data sharing
D. Replication group
Best answer: A
Explanation: A failover group is the Snowflake construct designed for recovery-oriented replication when you need both replicated objects and the ability to fail over. It covers eligible databases plus supported account objects and lets the secondary environment be promoted during an outage.
The key concept is the difference between replication-only designs and failover-capable designs. A failover group is used for business continuity and disaster recovery across regions or clouds when the secondary environment must be promotable. It replicates eligible databases together with supported account objects, then allows failover when the primary environment is unavailable.
A replication group is similar in scope for replicated objects, but it does not provide the failover behavior the requirement explicitly calls for. Secure data sharing is for live data access, not for creating a recovery copy you can promote. A reader account helps consumers access shared data without their own Snowflake account, but it is not a disaster recovery mechanism.
If the requirement includes both replication and promotion during an outage, use a failover group.
Topic: Data Sourcing, Storage, and Ingestion
A team loads order files from an external stage. New JSON files land every 1–2 minutes, analysts need them queryable within 5 minutes, and the team wants to stop rerunning loads manually when urgent files arrive.
CREATE OR REPLACE TASK load_orders_task
WAREHOUSE = ingest_wh
SCHEDULE = '60 MINUTE'
AS
COPY INTO raw.orders
FROM @orders_ext_stage
FILE_FORMAT = (TYPE = JSON);
Which next step best fits the requirement?
Options:
A. Use Snowpipe Streaming for the files
B. Increase the size of ingest_wh
C. Replace the task with Snowpipe auto-ingest
D. Create a stream on raw.orders
Best answer: C
Explanation: The shown design is still an hourly batch pattern, so it cannot reliably meet a 5-minute availability target. For files arriving continuously in an external stage, Snowpipe auto-ingest is the Snowflake-native fit because it loads new files as they land.
This is a file-based continuous-ingestion requirement, not just a faster batch job. The task runs COPY INTO only once every 60 minutes, so even if the warehouse is powerful, data can still wait nearly an hour before loading. That makes the pattern both too slow for the SLA and likely to trigger manual reruns when urgent files arrive.
For files landing in an external stage every 1–2 minutes, Snowpipe auto-ingest is the appropriate design because it uses event-driven loading for newly arrived files. Streams help only after data is already in a table, and Snowpipe Streaming is intended for pushing records directly from applications rather than reading staged files.
The key takeaway is to use event-driven file ingestion when latency requirements are tighter than the batch schedule.
A larger warehouse may shorten COPY INTO runtime, but it does not remove the 60-minute schedule delay.
Topic: Transformations, Programmability, and Developer Workflows
Which requirement is the clearest fit for Snowflake AI text-processing functions in a data engineering pipeline, rather than a generic analytics or data-science workload?
Options:
A. Enable search optimization to speed selective filters on case IDs.
B. Add sentiment and short-summary columns to each incoming case note for downstream routing.
C. Build a dynamic table of daily case counts by product and region.
D. Train a model to forecast weekly case volume from historical patterns.
Best answer: B
Explanation: Pipeline enrichment uses AI text functions to turn unstructured text into structured attributes during ingestion or transformation. Adding sentiment and summary columns to incoming case notes is row-level text enrichment, while the other choices are reporting, forecasting, or performance tuning tasks.
The key distinction is whether the workload extracts new structured information from text as part of the pipeline. Adding sentiment and a summary to each case note is row-level enrichment: the pipeline writes derived text attributes back into Snowflake so downstream SQL, tasks, or dynamic tables can use them immediately. That differs from summarizing already-structured metrics, training a predictive model from historical data, or tuning query performance.
The closest distractor is the dynamic-table aggregate, which is still analytics over structured results rather than text enrichment.
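A sketch of that enrichment using Snowflake Cortex AISQL functions; the table and column names are illustrative:
SELECT
    case_id,
    SNOWFLAKE.CORTEX.SENTIMENT(note_text)  AS sentiment_score,  -- numeric sentiment signal
    SNOWFLAKE.CORTEX.SUMMARIZE(note_text)  AS note_summary      -- short text summary
FROM case_notes;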
Topic: Sharing, Replication, and Cross-Platform Delivery
A company curates sales data in Snowflake and must deliver it to two external consumers: a distributor that already uses Snowflake in another cloud, and a reseller that does not have a Snowflake account. Requirements: publish one governed business model, enforce consumer-specific row filtering and PII masking, and avoid building separate ETL copies for each recipient. Which design is the BEST fit?
Options:
A. Publish a curated share layer with secure views and policies, using a private listing with auto-fulfillment for the distributor and a reader account for the reseller.
B. Replicate raw databases to each consumer environment and let each consumer model and secure its own data.
C. Unload curated tables to object storage and require both consumers to ingest them with Snowpipe.
D. Share base tables directly and depend on tags and separate warehouses for consumer governance.
Best answer: A
Explanation: This is a data-product architecture decision, not a single-feature setting. The best pattern is to model a curated share layer once, enforce security in Snowflake with secure views and policies, and then choose the sharing mechanism that matches each consumer type.
When requirements combine reusable business modeling, per-consumer governance, and external delivery, the right answer is usually an architectural pattern: create a provider-owned curated layer and expose only governed projections of it. In Snowflake, secure views are a common boundary for publishing a stable business model, while row access and masking policies enforce consumer-specific restrictions.
For a partner that already has Snowflake in another cloud, a private listing with auto-fulfillment fits cross-cloud delivery. For a recipient without Snowflake, a reader account avoids building a separate export pipeline. Replicating raw data or unloading files pushes modeling and governance downstream, creating duplication and reducing control. The key takeaway is to separate internal storage from the governed share layer.
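A sketch of the provider-side share layer; the database, schema, and view names are illustrative, and the private listing or reader account is configured on top of this governed layer:
-- Publish a stable business model through a secure view.
CREATE OR REPLACE SECURE VIEW share_db.model.sales_v AS
SELECT order_id, region, order_total
FROM prod.curated.sales;

-- Expose only the governed projection through a share.
CREATE OR REPLACE SHARE sales_share;
GRANT USAGE ON DATABASE share_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA share_db.model TO SHARE sales_share;
GRANT SELECT ON VIEW share_db.model.sales_v TO SHARE sales_share;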
Topic: Transformations, Programmability, and Developer Workflows
A data engineering team ingests customer chat transcripts into RAW_CHAT every minute. They need a summary for each transcript within 5 minutes, raw text must stay inside Snowflake-governed processing, and they want to extend an existing stream/task pipeline without adding new infrastructure. Which approach is BEST?
Options:
A. Use a stream and task with a Snowflake Cortex AISQL summarization function.
B. Use an external function to send transcript text to a third-party LLM.
C. Run a custom summarizer in Snowpark Container Services for new rows.
D. Enable search optimization on RAW_CHAT and summarize in dashboards.
Best answer: A
Explanation: A stream plus task with Snowflake Cortex keeps summarization where the data already lives and stays within Snowflake governance. It also matches the team’s existing pipeline pattern and meets the near-real-time requirement without adding external services or heavier custom infrastructure.
The key design choice is to use Snowflake-native AI processing directly in the data pipeline. A stream tracks newly ingested transcript rows, and a task can call a Snowflake Cortex AISQL summarization function to write results into a downstream table on a frequent schedule. That keeps raw text inside Snowflake’s governance boundary, supports incremental processing, and avoids operational overhead from managing separate services or exporting data.
This is a better fit than sending text to an external model endpoint, which breaks the boundary requirement, or using search optimization, which improves lookup performance but does not create summaries. A custom containerized summarizer could work, but it is unnecessarily complex when built-in summarization already satisfies the stated needs.
Topic: Transformations, Programmability, and Developer Workflows
A team stores customer case notes in Snowflake and needs row-level summaries generated as part of a SQL pipeline. The design goal is to keep text processing near the data and avoid sending the text to an external service. Which Snowflake-native approach best fits?
Options:
A. An external function that calls a third-party LLM
B. A Snowflake Cortex AI summarization function in SQL
C. Snowpipe Streaming into a summary table
D. A dynamic table that refreshes from the notes table
Best answer: B
Explanation: The best fit is a Snowflake Cortex AI summarization function used directly in SQL. It keeps the text and processing inside Snowflake instead of exporting data to an outside service.
The core concept is using Snowflake-native AI functions for text work directly where the data already lives. A Cortex AI summarization function can be called from SQL, and it can be incorporated into pipeline logic such as tasks or downstream transformations without moving the text outside Snowflake. That satisfies the requirement to keep summarization near the data and inside the pipeline boundary.
A dynamic table can persist results from a query, but it is not itself the summarization capability. Snowpipe Streaming is for low-latency ingestion, not text generation. An external function can reach an outside model, but that sends processing outside Snowflake, which the stem explicitly wants to avoid.
Topic: Transformations, Programmability, and Developer Workflows
A data engineering team runs CDC every minute with a task:
SELECT apply_cdc();
apply_cdc() is a Python UDF intended to read a stream, run two MERGE statements into target tables, and insert audit rows. Task history shows failures when the function tries to execute SQL. What is the best next step?
Options:
A. Replace the task with a dynamic table on the stream
B. Convert apply_cdc to a UDTF and keep the MERGE logic there
C. Rewrite apply_cdc as a stored procedure and CALL it from the task
D. Increase the warehouse size for the task
Best answer: C
Explanation: This is a programmability-boundary problem, not a compute problem. UDFs are for reusable computation that returns values inside SQL, while stored procedures are for procedural workflows that execute multi-step DML and can be invoked by tasks.
In Snowflake, use a UDF when you need reusable logic that computes and returns a value to a SQL statement, such as normalizing text or deriving a column from inputs. This scenario is different: the routine must coordinate CDC processing by reading change data, running multiple MERGE statements, and writing audit records. That is procedural orchestration with database side effects, which fits a stored procedure.
A task can CALL a stored procedure so the procedure can manage multi-statement SQL, control flow, and status reporting for the CDC step. A UDF or UDTF may help transform data returned to a query, but they are not the right abstraction for orchestrating multi-target updates and audit writes.
The key takeaway is to use UDFs for expression-level computation and stored procedures for operational pipeline steps.
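A hedged sketch of the restructured step; the object names are illustrative, and some clients require the procedure body wrapped in $$ delimiters:
CREATE OR REPLACE PROCEDURE apply_cdc()
RETURNS STRING
LANGUAGE SQL
AS
BEGIN
    -- One of the CDC upserts; the real routine runs two MERGE statements.
    MERGE INTO prod.orders t
    USING etl.orders_stm s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET t.order_total = s.order_total
    WHEN NOT MATCHED THEN
        INSERT (order_id, order_total) VALUES (s.order_id, s.order_total);

    INSERT INTO etl.audit_log (step_name, run_ts)
        VALUES ('apply_cdc', CURRENT_TIMESTAMP());

    RETURN 'ok';
END;

CREATE OR REPLACE TASK etl.run_cdc
  WAREHOUSE = etl_wh
  SCHEDULE = '1 MINUTE'
AS
  CALL apply_cdc();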
Topic: Transformations, Programmability, and Developer Workflows
A Snowflake task populates ORDER_LINES from an API landing table. Each row contains one JSON document in a VARIANT column, and line_items is an array of objects. The task inserts NULL for sku and qty, even though order_id loads correctly.
SELECT
payload:order_id::NUMBER,
payload:line_items:sku::STRING AS sku,
payload:line_items:qty::NUMBER AS qty
FROM raw_orders;
Which action best addresses the issue?
Options:
A. Reload the stage with a different JSON file format
B. Use LATERAL FLATTEN on payload:line_items before selecting child fields
C. Apply document extraction to the source files
D. Run the task on a larger warehouse
Best answer: B
Explanation: This is a nested semi-structured transformation problem, not a file-loading or document-extraction problem. The payload is already in VARIANT, and only the array fields return NULL, which points to missing array flattening in the SQL.
When JSON is loaded into a VARIANT column, top-level object paths such as payload:order_id can be selected directly. Arrays are different: payload:line_items contains multiple elements, so child fields like sku and qty are not projected correctly by simple path notation. The symptom here—top-level extraction works but array attributes become NULL—shows the main concern is nested data handling. The fix is to use LATERAL FLATTEN on payload:line_items, then read each element through the flattened value column. Changing file-loading settings or treating the source as unstructured documents would not address the transformation logic, and adding compute would only make the wrong query run faster. The key distinction is nested JSON traversal versus file-based or document-like processing.
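The corrected projection, flattening the array so each line item becomes its own row:
SELECT
    o.payload:order_id::NUMBER AS order_id,
    li.value:sku::STRING       AS sku,
    li.value:qty::NUMBER       AS qty
FROM raw_orders o,
     LATERAL FLATTEN(INPUT => o.payload:line_items) li;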
The payload is already usable in VARIANT; only the query logic needs to change.
Topic: Compute, Governance, Observability, and Performance
A team ingests orders with Snowpipe Streaming into raw_orders, then uses a stream and task to populate curated_orders, which is shared to analysts. Requirements: curated data must be available within 5 minutes, and the pipeline uses a dedicated X-Small warehouse.
Monitoring:
- raw_orders: 1 minute old
- curated_orders: 18 minutes old
- Latest task run: SUCCEEDED
How should the on-call engineer classify this incident?
Options:
A. A data quality incident
B. A cost control incident
C. A freshness incident
D. An execution failure incident
Best answer: C
Explanation: This is a freshness problem because the curated table is older than the 5-minute SLA while upstream ingestion is still current. The monitoring signals show a timeliness gap between raw arrival and curated availability.
Freshness monitoring is about whether data is available when consumers expect it. In this scenario, new rows are reaching raw_orders within 1 minute, but curated_orders is 18 minutes behind even though the SLA is 5 minutes. That means the pipeline is producing data too late for downstream use.
In Snowflake terms, this usually points to delay between ingestion and publication, such as stream consumption, task logic, or downstream transformation latency. The first step is to treat the alert as a freshness breach and trace where the delay is introduced between raw_orders and curated_orders.
A succeeded task run does not by itself prove that the data is current.
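A hedged freshness probe, assuming the curated table carries a load timestamp column; the column name is illustrative:
SELECT DATEDIFF('minute', MAX(loaded_at), CURRENT_TIMESTAMP()) AS minutes_stale
FROM curated_orders;  -- alert when minutes_stale exceeds the 5-minute SLA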
Topic: Sharing, Replication, and Cross-Platform Delivery
An organization replicates a production database from one Snowflake account to a secondary account in another cloud for disaster recovery. During failover tests, the tables are current in the secondary account, but the application cannot resume quickly because users, roles, and warehouse settings must be recreated manually. The continuity requirement is cross-region or cross-cloud failover with minimal reconfiguration. What is the best next step?
Options:
A. Convert the database to a secure share for the DR account.
B. Unload data to external storage and reload it after an outage.
C. Use a failover group to replicate databases and account objects.
D. Keep database replication only and rebuild account objects manually.
Best answer: C
Explanation: This scenario shows that data replication alone is not enough for continuity. A failover group is the Snowflake mechanism built to replicate supported databases plus supported account objects, which reduces manual work during cross-region or cross-cloud recovery.
The key issue is that the secondary environment has current table data but lacks the surrounding Snowflake objects required to resume service quickly. Database replication addresses database contents, but continuity plans often also need supported account objects such as users, roles, and warehouses. Failover groups are intended for disaster recovery and business continuity because they can replicate both supported databases and supported account objects between Snowflake accounts, including across regions or clouds where supported.
Using a secure share would expose data, not create a recoverable standby environment. Keeping database replication alone leaves manual rebuild steps in the recovery path, which violates the minimal-reconfiguration requirement. Exporting and reloading data is slower and more operationally complex than using Snowflake’s native continuity features.
When recovery must preserve more than just data, choose failover groups over database replication by itself.
Topic: Data Sourcing, Storage, and Ingestion
A retailer’s custom checkout service emits one JSON order event at a time, and the application team can modify the producer code. The data engineering team needs each event queryable in Snowflake within about 5 seconds, wants to avoid managing staged files, and will handle downstream merges only after the raw events land. Which Snowflake design BEST meets these requirements?
Options:
A. Use a table stream and tasks to ingest newly arriving events.
B. Use scheduled COPY INTO jobs from a stage every minute.
C. Use Snowpipe Streaming to ingest rows directly into a raw table.
D. Use Snowpipe auto-ingest on staged micro-batch event files.
Best answer: C
Explanation: This is primarily a streaming-ingestion requirement, not a post-load CDC requirement. Snowpipe Streaming lets a custom producer send rows directly into Snowflake with very low latency and no staged files, while downstream change processing can be added later if needed.
The key decision is whether the problem is getting data into Snowflake or reacting to changes after it is already there. In this scenario, the source is a custom application sending row-level events, the latency target is only a few seconds, and the team does not want to manage files in a stage. That makes Snowpipe Streaming the best fit because it ingests rows directly into a Snowflake table with low latency.
A stream is not an ingestion mechanism from an external producer. It records row changes for a Snowflake table after inserts, updates, or deletes have already occurred, and tasks orchestrate follow-on SQL or procedures. Standard Snowpipe and scheduled COPY INTO are file-based or batch-oriented approaches, so they add staging overhead or more delay than required. The stream-and-task idea is the closest distractor, but it starts after ingestion, not before.
COPY INTO from a stage is a micro-batch pattern and is weaker for a 5-second direct-ingestion target.
Topic: Transformations, Programmability, and Developer Workflows
A stream-driven task is failing after a team wrapped CDC logic in a SQL UDF:
CREATE OR REPLACE TASK apply_orders
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
SELECT APPLY_CDC();
Inside APPLY_CDC, the team tries to run a MERGE into fact_orders, delete bad rows from order_errors, and return a status message. What is the most appropriate Snowflake fix?
Options:
A. Use a stored procedure and let the task CALL it.
B. Keep the UDF and upsize the warehouse.
C. Use a UDTF and select from it in the task.
D. Use an external function for the CDC logic.
Best answer: A
Explanation: This CDC step needs to execute Snowflake DML, not just return a computed value. A stored procedure is the right extensibility mechanism for multi-statement pipeline logic, and the task can invoke it directly with CALL.
The key distinction is between computing query results and performing actions. UDFs and UDTFs are meant to be used inside SQL queries to return a value or a row set. They are not the right mechanism when the logic itself must perform side effects such as MERGE, INSERT, UPDATE, or DELETE.
In this scenario, the task is orchestrating CDC application from a stream. That usually requires multi-statement execution, conditional logic, and status handling. A stored procedure fits that pattern because it can run DML and encapsulate procedural workflow, while the task still runs a single SQL statement:
CALL APPLY_CDC();
An external function is for calling an external service from SQL, not for internal table-maintenance logic. The practical rule is simple: use functions for reusable query-time computation, and use procedures for pipeline actions.
A stored procedure can own the MERGE and DELETE side effects that a UDF cannot perform.
Topic: Transformations, Programmability, and Developer Workflows
A Snowflake pipeline ingests customer comments every minute. A downstream task runs a Python stored procedure that sends each comment to an external NLP API for issue-type classification. The task is now missing its SLA because the API rate-limits requests, and the team wants the enrichment to stay inside Snowflake with minimal new components. What is the best next step?
Options:
A. Increase the task warehouse size and keep the external API calls.
B. Add search optimization on the comment column before classification.
C. Replace the stored procedure with an external function that calls the same NLP API.
D. Use a Cortex AISQL text-classification function in the task SQL.
Best answer: D
Explanation: The problem is caused by an external dependency, not by Snowflake compute. Using a Snowflake Cortex AISQL text-classification function directly in the pipeline keeps data in-platform and removes the rate-limited API call that is delaying the task.
When text enrichment or classification must happen inside an existing Snowflake pipeline, the best fit is an in-platform text-processing function that runs directly where the data already resides. In this scenario, the task misses its SLA because every batch depends on a rate-limited external NLP service. Replacing that step with a Snowflake Cortex AISQL classification function lets the team perform the enrichment inside task SQL, reducing operational complexity and removing the external bottleneck.
The closest distractor is adding warehouse compute, but more Snowflake compute cannot overcome throttling imposed by an outside service.
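A sketch of the in-platform replacement, assuming the account has access to Cortex text classification; the labels, table, and column names are illustrative:
SELECT
    comment_id,
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
        comment_text,
        ['billing', 'shipping', 'product defect', 'other']
    ):label::STRING AS issue_type
FROM customer_comments;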
Topic: Compute, Governance, Observability, and Performance
A data engineering team ingests clickstream events into a 30 TB EVENTS table with Snowpipe Streaming. Product dashboards run ad hoc queries like the one below every few minutes. The queries must finish in under 15 seconds, filters change often across several dimensions, and each run still scans tens of millions of rows before aggregating to fewer than 20 output rows. Query history shows most elapsed time in scan/filter/aggregate steps, not queuing.
SELECT campaign_id, SUM(revenue)
FROM events
WHERE event_ts >= DATEADD(day,-7,CURRENT_TIMESTAMP())
AND region IN ('US','CA')
AND device_type = 'mobile'
GROUP BY campaign_id
ORDER BY SUM(revenue) DESC
LIMIT 20;
What is the best Snowflake performance lever?
Options:
A. Create a minute-refresh dynamic table mirroring EVENTS
B. Define a clustering key across all filter columns
C. Add Search Optimization Service to the filtered columns
D. Enable Query Acceleration Service on the dashboard warehouse
Best answer: D
Explanation: Query Acceleration Service is the best fit because these queries are scan-heavy and spend most of their time filtering and aggregating large volumes of data before returning a tiny result set. It improves that warehouse-level execution pattern without requiring table-specific search structures for many changing predicates.
Search Optimization Service and Query Acceleration Service address different bottlenecks. Search optimization is strongest when Snowflake needs to find a very small set of rows quickly, such as point lookups, highly selective joins, or targeted searches on specific columns or semi-structured paths. Here, the workload is not a row-location problem: each query still reads tens of millions of rows from a recent time window and spends most of its time in scan, filter, and aggregate work before producing a short ranked output. That pattern aligns better with Query Acceleration Service, which adds shared compute to speed eligible selective analytical queries without redesigning the table.
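A minimal sketch of enabling the service on the dashboard warehouse; the scale factor is an illustrative cap:
ALTER WAREHOUSE dashboard_wh SET
  ENABLE_QUERY_ACCELERATION = TRUE
  QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;

-- Optionally estimate the benefit for a recent query first:
SELECT SYSTEM$ESTIMATE_QUERY_ACCELERATION(LAST_QUERY_ID());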
The closest distractor is search optimization, but it is usually stronger for pinpoint row retrieval than for broad analytical scans like this one.
EVENTS.Topic: Streaming, Orchestration, and Near Real-Time Pipeline Design
A data engineering team already lands source records into RAW.ORDERS_LANDING in Snowflake. A downstream process must consume only the rows that changed since its last run, including inserts, updates, and deletes, without rescanning the full table. Which Snowflake object is designed for this CDC pattern?
Options:
A. A stream on RAW.ORDERS_LANDING
B. A Snowpipe on the source stage
C. A task that queries RAW.ORDERS_LANDING
D. A dynamic table built from RAW.ORDERS_LANDING
Best answer: A
Explanation: A stream is Snowflake’s CDC object for data that is already stored in a table. It exposes row-level changes since the stream’s current offset, so downstream processing can read only new inserts, updates, and deletes.
Use a stream when the data has already landed in Snowflake and downstream logic must process only the changed rows. A stream uses change tracking on the underlying table to expose a CDC delta from its current offset, which avoids rereading the entire landing table each time.
A dynamic table can maintain a derived target table, but it is not the object that provides a consumable CDC feed from the source table.
Topic: Data Sourcing, Storage, and Ingestion
A team lands raw order JSON in RAW_ORDERS and currently uses an external ETL job to build a curated table every few minutes. They are evaluating this Snowflake design:
CREATE OR REPLACE DYNAMIC TABLE curated_orders
TARGET_LAG = '5 minutes'
WAREHOUSE = etl_wh
AS
SELECT
payload:order_id::NUMBER AS order_id,
payload:customer_id::NUMBER AS customer_id,
payload:order_total::NUMBER(10,2) AS order_total
FROM raw_orders;
What is the best interpretation of this design?
Options:
A. It uses Snowpipe to continuously load staged files without warehouse compute.
B. It uses a dynamic table to keep transformed results fresh in Snowflake.
C. It uses an external table to query object-store files in place.
D. It uses zero-copy cloning to keep a copy synchronized automatically.
Best answer: B
Explanation: The exhibit defines a dynamic table with a freshness target and a transformation query over RAW_ORDERS. That means Snowflake manages the refresh of the curated result inside the platform, reducing the need to export data to a separate ETL or Spark system for this common engineering task.
This is a dynamic table pattern. A dynamic table stores the result of a query and Snowflake refreshes it to stay within the specified TARGET_LAG, using the named warehouse for the refresh work. In this case, the query parses JSON from a raw landing table into typed relational columns, which is a common transformation that teams often push to external ETL tools.
Because the transformation and refresh both happen inside Snowflake, data does not need to be moved out to a separate processing platform just to keep the curated table current. That is the key Snowflake-native service being demonstrated here. The closest distractor is Snowpipe, but Snowpipe is for ingesting files, not for maintaining transformed query results.
Topic: Transformations, Programmability, and Developer Workflows
A data engineering team needs a SQL-native transformation step that runs every 5 minutes, skips work when no CDC rows are available, and returns a simple status to an orchestrating task. They review this procedure body:
DECLARE has_rows BOOLEAN;
BEGIN
SELECT SYSTEM$STREAM_HAS_DATA('ORDERS_STM')
INTO :has_rows;
IF (has_rows) THEN
MERGE INTO CURATED_ORDERS t
USING ORDERS_STM s
ON t.ORDER_ID = s.ORDER_ID
WHEN MATCHED THEN UPDATE SET STATUS = s.STATUS
WHEN NOT MATCHED THEN
INSERT (ORDER_ID, STATUS)
VALUES (s.ORDER_ID, s.STATUS);
END IF;
RETURN 'ok';
END;
Which Snowflake design choice best matches this fragment and the requirements?
Options:
A. A dynamic table for automatic incremental refresh
B. Snowpipe Streaming into the curated target table
C. An external table over staged CDC files
D. A SQL stored procedure using Snowflake Scripting
Best answer: D
Explanation: This fragment demonstrates Snowflake Scripting used for a SQL stored procedure pattern. The code assigns a value to a variable, conditionally runs a MERGE, and returns a status, which fits a SQL-native orchestration step that can be called by a task.
Snowflake Scripting is Snowflake’s SQL procedural language for SQL stored procedures and anonymous blocks. In this fragment, SYSTEM$STREAM_HAS_DATA checks whether the stream has CDC rows, INTO stores that result in a variable, IF decides whether to run the MERGE, and RETURN sends a status back to the caller. That is a procedural wrapper around SQL transformations, which is useful when a team wants branching logic while staying entirely SQL-native.
The MERGE performs the CDC upsert only when the stream has rows, so empty cycles skip work.
A declarative refresh feature could maintain derived data, but it would not be the main concept shown by this scripting fragment.
Topic: Compute, Governance, Observability, and Performance
A governance team plans to drop a column from a shared table. They need to identify which users queried that column recently and which downstream views depend on it. They do not need to monitor null rates, freshness, or row-count drift. Which Snowflake capability best matches this observability requirement?
Options:
A. Access history and object dependency metadata
B. Object tags
C. Data metric functions
D. Masking policies
Best answer: A
Explanation: The requirement is to audit usage and understand downstream impact before a schema change. In Snowflake, access history covers who queried data, and object dependency metadata covers which objects rely on it; data metric functions are for data-quality signals instead.
This is an access observability and impact analysis problem, not a data-quality problem. Access history is used to investigate who accessed a table or column and when, while object dependency metadata is used to find downstream objects such as views that reference the source object. Together, they answer the two questions in the stem: “who used it” and “what depends on it.”
Data metric functions are the wrong fit because they evaluate data characteristics such as freshness, null counts, or row-level quality expectations. Those metrics help validate data quality, but they do not provide an audit trail of access or a dependency graph for schema-change planning. The key takeaway is to choose access and dependency metadata when the goal is governance observability, not quality measurement.
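A hedged sketch of both lookups against ACCOUNT_USAGE; the column name is a placeholder, and these views carry some ingestion latency:
-- Who queried the column recently:
SELECT DISTINCT ah.user_name, ah.query_start_time
FROM snowflake.account_usage.access_history ah,
     LATERAL FLATTEN(INPUT => ah.base_objects_accessed) obj,
     LATERAL FLATTEN(INPUT => obj.value:"columns") col
WHERE obj.value:"objectName"::STRING = 'PROD.SALES.ORDERS'
  AND col.value:"columnName"::STRING = 'ORDER_NOTES'
  AND ah.query_start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP());

-- Which downstream objects depend on the table:
SELECT referencing_database, referencing_schema, referencing_object_name
FROM snowflake.account_usage.object_dependencies
WHERE referenced_database = 'PROD'
  AND referenced_schema = 'SALES'
  AND referenced_object_name = 'ORDERS';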
Topic: Compute, Governance, Observability, and Performance
A team wants Snowflake-native data-quality monitoring on PROD.SALES.ORDERS without writing custom validation pipelines. Checks must run every 15 minutes, track null and duplicate counts, and support rules such as ORDER_ID null count = 0.
ALTER TABLE prod.sales.orders
SET DATA_METRIC_SCHEDULE = '15 MINUTE';
Based on the exhibit, what is the best next step?
Options:
A. Associate Data Metric Functions and expectations
B. Enable search optimization on ORDER_ID
C. Create an alert with validation SQL
D. Create a stream and task graph
Best answer: A
Explanation: The exhibit only schedules when Snowflake evaluates data-quality checks; it does not define any checks by itself. To monitor nulls, duplicates, and threshold-based rules natively, the engineer should associate Data Metric Functions and add expectations.
Snowflake’s native feature for metric-level and rule-level data-quality monitoring is Data Metric Functions (DMFs). DATA_METRIC_SCHEDULE sets the cadence for evaluation, but the table still needs metric associations such as SNOWFLAKE.CORE.NULL_COUNT or SNOWFLAKE.CORE.DUPLICATE_COUNT. If the team also wants pass/fail logic like “null count must be 0,” expectations can be attached so Snowflake evaluates that rule directly. This keeps the monitoring definition inside Snowflake and avoids building custom audit pipelines for common quality checks.
Streams, tasks, or alerts can automate custom logic, but they are not the purpose-built feature the exhibit is setting up.
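A minimal sketch of the association step that completes the exhibit; expectation rules such as "null count = 0" can be layered on where the feature is available:
ALTER TABLE prod.sales.orders
  ADD DATA METRIC FUNCTION snowflake.core.null_count ON (order_id);

ALTER TABLE prod.sales.orders
  ADD DATA METRIC FUNCTION snowflake.core.duplicate_count ON (order_id);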
Use the SnowPro Data Engineer DEA-C02 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Try SnowPro Data Engineer DEA-C02 on Web View SnowPro Data Engineer DEA-C02 Practice Test
Read the SnowPro Data Engineer DEA-C02 Cheat Sheet on Tech Exam Lexicon for concept review before another timed run.