How to Use This Quick Reference
This independent Quick Reference supports preparation for the Snowflake SnowPro Advanced: Data Engineer (DEA-C02) exam from Snowflake. It focuses on practical decisions a data engineer must make in Snowflake: ingestion pattern, transformation design, stream/task orchestration, table optimization, governance, and troubleshooting.
Use it to review:
- Which Snowflake feature fits a scenario.
- How ingestion, CDC, and orchestration components interact.
- High-yield SQL patterns and common traps.
- Performance and reliability signals to check during troubleshooting.
Core Snowflake Architecture Map
| Layer / Concept | What It Does | Data Engineer Exam Relevance |
|---|
| Cloud services layer | Authentication, metadata, optimizer, access control, transactions | Explains metadata pruning, query compilation, RBAC, Time Travel metadata, and object management |
| Storage layer | Compressed columnar data in immutable micro-partitions | Drives pruning, clustering, cloning, Time Travel, and storage design |
| Virtual warehouses | User-managed compute for SQL, loading, transformations, Snowpark | Tune size for query complexity; scale out for concurrency |
| Serverless compute | Snowflake-managed compute for selected services | Used by features such as Snowpipe, automatic services, and some task/service patterns |
| Micro-partitions | Internal storage units with metadata such as ranges and distinct values | Enables pruning; clustering can improve pruning for large tables |
| Metadata | File load history, table statistics, object definitions, access metadata | Used for idempotent loads, query optimization, auditing, and troubleshooting |
| Result cache | Reuses prior query result when eligible | Useful but should not be treated as a pipeline correctness mechanism |
| Warehouse cache | Local data cache on a running warehouse | Can improve repeated scans; lost when warehouse suspends or changes depending on execution context |
Feature Selection Matrix
| Need | Prefer | Avoid / Watch For |
|---|
| Batch load staged files | COPY INTO <table> | Do not build custom loaders unless you need special processing outside Snowflake |
| Continuous file ingestion | Snowpipe | Snowpipe is not the same as transactional streaming; design for notification and ingestion latency |
| Low-latency row ingestion | Snowpipe Streaming or connector-based streaming pattern | Do not force tiny files through batch COPY if continuous row ingestion is required |
| Declarative refresh of derived tables | Dynamic tables | Not ideal for procedural branching, complex side effects, or custom retry logic |
| Incremental CDC from Snowflake table changes | Streams + tasks | Streams are offsets over change data, not physical event queues |
| Scheduled SQL orchestration | Tasks | Remember task graphs need resumed tasks and dependency design |
| Procedural orchestration in Snowflake | Stored procedures called by tasks | Avoid putting simple set-based SQL into procedural loops |
| Python/Java/Scala transformation pushed to Snowflake | Snowpark | Watch for code that pulls large data to the client instead of pushing down |
| Point lookup acceleration | Search Optimization Service | Do not use as a default substitute for good table design |
| Repeated expensive aggregate/query pattern | Materialized view, dynamic table, or precomputed table | Confirm freshness and maintenance tradeoffs |
| Scan-heavy outlier queries | Query Acceleration Service where eligible | Not a fix for bad joins, poor filters, or under-sized architecture |
| Cross-region/cloud DR | Replication / failover groups where supported | Do not confuse Time Travel with disaster recovery |
| Share data without copying | Secure Data Sharing | Secure views/policies may be needed to restrict exposed rows/columns |
Data Loading and Ingestion
Loading Pattern Decision Table
| Pattern | Best For | Key Objects | Exam Traps |
|---|
| Manual or scheduled batch load | Periodic files, controlled windows | Stage, file format, COPY INTO | COPY load metadata prevents duplicate file loads unless forced; understand ON_ERROR behavior |
| Snowpipe auto-ingest | Event-driven files from cloud storage | External stage, pipe, cloud notification integration | It is continuous ingestion, not instant transformation orchestration |
| Snowpipe REST API | Application-triggered file ingest | Pipe, REST call | Application still stages files; not row-by-row streaming |
| Snowpipe Streaming | Low-latency streaming rows | SDK/connector/channel pattern | Different mental model from staged-file COPY |
| External tables | Query data in external storage | External stage, external table, metadata refresh | Querying external data is not the same as loading into Snowflake-managed storage |
| Iceberg tables | Open table format / lakehouse interoperability | Catalog integration, external volume, Iceberg table | Understand ownership of metadata and storage before choosing |
| Connector ingestion | SaaS/app/system ingestion | Snowflake connectors or partner tools | Focus on governance, idempotency, and operational monitoring |
| Object / Option | Purpose | Practical Notes |
|---|
| Internal stage | Snowflake-managed file staging | Good for controlled uploads and transient staging |
| External stage | Cloud object storage location | Prefer storage integrations over embedded cloud credentials |
| Named file format | Reusable parsing definition | Keeps COPY statements consistent and auditable |
PATTERN | Regex filter for staged files | Useful but easy to overcomplicate; test carefully |
FILES | Explicit list of files | Good for deterministic loads |
VALIDATION_MODE | Validate load without committing rows | Useful before production loads |
ON_ERROR | Controls behavior for bad rows/files | CONTINUE can hide data quality issues if not monitored |
MATCH_BY_COLUMN_NAME | Map columns by name where supported | Useful when file column order changes |
| Metadata columns | Capture filename, row number, timestamps | Use for lineage, replay, and quarantine workflows |
High-Yield COPY INTO Pattern
CREATE OR REPLACE FILE FORMAT ff_json
TYPE = JSON
STRIP_OUTER_ARRAY = TRUE;
COPY INTO raw.events
(payload, source_file, loaded_at)
FROM (
SELECT
$1,
METADATA$FILENAME,
CURRENT_TIMESTAMP()
FROM @raw_event_stage
)
FILE_FORMAT = (FORMAT_NAME = ff_json)
ON_ERROR = 'CONTINUE';
Key points:
- Use named stages and file formats for repeatability.
- Persist source file metadata for lineage and replay.
- Monitor rejected rows/files;
ON_ERROR = 'CONTINUE' is not a data quality solution by itself. COPY maintains load metadata, so reloading the same file requires intentional design.
Snowpipe Reference
CREATE OR REPLACE PIPE ingest.event_pipe
AUTO_INGEST = TRUE
AS
COPY INTO raw.events
FROM @raw_event_stage
FILE_FORMAT = (FORMAT_NAME = ff_json);
| Topic | Remember |
|---|
| Auto-ingest | Uses cloud notifications to trigger ingestion from a stage |
| Pipe definition | Contains a COPY INTO statement |
| Monitoring | Use pipe history and copy history views/functions |
| Transformations | Keep pipe transformations simple; use downstream tasks/dynamic tables for complex logic |
| Idempotency | File naming, load metadata, and replay procedure matter |
| Errors | Check pipe status and load history; failed file handling must be operationalized |
Streams, Tasks, and Dynamic Tables
| Feature | Use When | Core Concept | Common Trap |
|---|
| Stream | Need change data from table/view changes | Tracks offset of changes since last consumption | A stream is not a standalone queue; it depends on source data retention and consumption |
| Standard stream | Need inserts, deletes, updates | Updates appear with change metadata | Must handle delete/update semantics correctly |
| Append-only stream | Only new inserts matter | More efficient insert-only change tracking | Wrong choice if updates/deletes must be captured |
| Task | Need scheduled or event-like SQL execution | Runs SQL, stored procedure, or graph step | Tasks must be resumed; child tasks depend on graph state |
| Task graph | Need ordered multi-step pipeline | Root task plus child dependencies | Design for retries, failure isolation, and idempotency |
| Dynamic table | Need declarative refreshed result table | Snowflake refreshes toward target lag | Not a replacement for every task/procedure workflow |
| Stored procedure | Need procedural control flow | JavaScript, Snowpark Python/Java/Scala, SQL procedures | Avoid row-by-row procedural processing for set operations |
| Metadata Column | Meaning | Use |
|---|
METADATA$ACTION | Insert or delete action | Drive MERGE, deletes, audit logic |
METADATA$ISUPDATE | Whether row is part of an update operation | Distinguish update pairs from simple insert/delete behavior |
METADATA$ROW_ID | Stable row identifier for change tracking | Useful for dedup/change handling patterns |
Task and Stream Pattern
CREATE OR REPLACE STREAM stg.orders_stream
ON TABLE stg.orders;
CREATE OR REPLACE TASK mart.merge_orders_task
WAREHOUSE = etl_wh
SCHEDULE = 'USING CRON 0 * * * * UTC'
WHEN SYSTEM$STREAM_HAS_DATA('stg.orders_stream')
AS
MERGE INTO mart.orders AS t
USING (
SELECT *
FROM stg.orders_stream
WHERE METADATA$ACTION = 'INSERT'
) AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET
t.status = s.status,
t.updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT
(order_id, status, updated_at)
VALUES
(s.order_id, s.status, s.updated_at);
ALTER TASK mart.merge_orders_task RESUME;
Exam reminders:
- Consuming a stream occurs when the DML statement using it commits.
- Wrap multi-step stream consumption in a transaction when consistency matters.
SYSTEM$STREAM_HAS_DATA is useful as a task WHEN condition, but the downstream SQL still needs to be idempotent.- Separate ingestion, validation, merge, and serving layers when troubleshooting clarity matters.
Dynamic Table Pattern
CREATE OR REPLACE DYNAMIC TABLE mart.daily_sales
TARGET_LAG = '1 hour'
WAREHOUSE = etl_wh
AS
SELECT
order_date,
SUM(net_amount) AS net_sales,
COUNT(*) AS order_count
FROM clean.orders
GROUP BY order_date;
| Dynamic Table Choice | Choose It When |
|---|
TARGET_LAG | You need Snowflake-managed refresh toward a freshness objective |
TARGET_LAG = DOWNSTREAM | Refresh should be driven by downstream dependencies |
| Incremental refresh | Query is eligible and incremental maintenance is efficient |
| Full refresh | Logic is not incrementally maintainable or full recompute is acceptable |
| Tasks instead | You need procedural steps, conditional branches, external calls, or custom retry logic |
Table Design and Storage
Table Type Reference
| Table Type | Best For | Key Behavior |
|---|
| Permanent table | Durable production data | Supports Snowflake data protection features such as Time Travel and Fail-safe behavior |
| Transient table | Rebuildable data, staging, derived data | No Fail-safe; useful when recovery requirements are lower |
| Temporary table | Session-scoped intermediate work | Exists only for the session; can shadow permanent object names in the session |
| External table | Query files in external storage | Metadata must reflect external file changes |
| Dynamic table | Managed refreshed table from a query | Stores derived results maintained by Snowflake |
| Iceberg table | Open table format interoperability | Requires understanding catalog/storage ownership |
Micro-Partition and Clustering Reference
| Concept | Practical Meaning |
|---|
| Micro-partition | Immutable internal storage unit; Snowflake stores metadata for pruning |
| Pruning | Skips micro-partitions that cannot satisfy query predicates |
| Natural clustering | Data order created by load patterns may already support pruning |
| Clustering key | Expression list used to improve organization for pruning |
| Clustering depth | Indicator of how well data is clustered for selected keys |
| Automatic Clustering | Snowflake-managed maintenance for defined clustering keys |
| Re-clustering cost | More clustering is not always better; use for large tables with selective filters |
Choose clustering when:
- Table is large enough that pruning matters.
- Queries repeatedly filter on predictable columns.
- Natural load order does not already cluster data well.
- Maintenance cost is justified by query improvement.
Avoid clustering when:
- Table is small.
- Filters are unpredictable.
- High-churn DML causes excessive maintenance.
- Query bottleneck is joins, spilling, or concurrency rather than scan pruning.
Optimization Feature Comparison
| Feature | Helps With | Not For |
|---|
| Clustering key | Range/filter pruning on large tables | Small tables or random access workloads alone |
| Search Optimization Service | Highly selective point lookups, some semi-structured search patterns | Broad scans and general warehouse sizing problems |
| Materialized view | Repeated expensive query patterns with maintained results | Arbitrary complex transformations or every dashboard query |
| Dynamic table | Fresh, declarative derived table pipeline | Procedural orchestration |
| Query Acceleration Service | Eligible scan-heavy queries with selective processing | Bad SQL logic, missing joins, or universal acceleration |
| Warehouse scale-up | Complex single queries, memory pressure, large joins | High concurrency alone |
| Multi-cluster warehouse | Concurrent user/query demand | Making one individual query faster |
| Result cache | Repeated identical eligible queries | ETL correctness, freshness guarantees, or parameterized workload design |
Semi-Structured Data
Data Types and Functions
| Item | Use |
|---|
VARIANT | Store semi-structured values such as JSON |
OBJECT | Key-value structure |
ARRAY | Ordered list |
PARSE_JSON | Convert JSON text to VARIANT |
TO_VARIANT | Convert SQL value to VARIANT |
FLATTEN | Explode arrays/objects into rows |
| Dot / bracket notation | Navigate nested values |
Casts such as ::STRING, ::NUMBER | Convert variant values for typed processing |
SELECT
payload:customer.id::STRING AS customer_id,
item.value:sku::STRING AS sku,
item.value:quantity::NUMBER AS quantity
FROM raw.events,
LATERAL FLATTEN(input => payload:items) AS item;
High-yield distinctions:
- JSON
null and SQL NULL are not always equivalent in processing. - Cast extracted values before joining, grouping, or applying numeric logic.
- Flatten only the level required; excessive flattening can explode row counts.
- For frequently queried attributes, consider projecting into typed columns or derived tables.
Snowpark, UDFs, and Stored Procedures
| Tool | Best Use | Exam Distinction |
|---|
| Snowpark DataFrame API | Pushdown transformations in Python, Java, or Scala | Lazy execution; operations are planned for Snowflake execution |
| Scalar UDF | Reusable row-level function | Good for deterministic expressions, not orchestration |
| UDTF | Function returning rows/table output | Useful for expanding or parsing custom structures |
| Stored procedure | Control flow, orchestration, multi-step logic | Can be called from tasks |
| External function | Call external service through configured integration | Requires security/network design |
| External access integration | Allows governed outbound access from supported code | Do not hard-code secrets or network assumptions |
| Packages/imports | Bring dependencies into Snowpark code | Versioning and allowed packages matter operationally |
Practical design rules:
- Prefer SQL set operations for relational transformations.
- Use Snowpark when code libraries, complex logic, or developer language preference justify it.
- Keep large datasets inside Snowflake; avoid collecting data to the client.
- Use stored procedures for orchestration, not as a substitute for set-based SQL performance.
Security, Governance, and Data Protection
RBAC and Grants
| Concept | Meaning | Exam Focus |
|---|
| Role-based access control | Privileges granted to roles, roles granted to users/roles | Design least-privilege access |
USAGE privilege | Allows use of database, schema, warehouse, integration | Required but not sufficient for object access |
| Object privileges | SELECT, INSERT, UPDATE, DELETE, etc. | Grant only what the pipeline needs |
OWNERSHIP | Full control over object and grant management | Transferring ownership can affect grants |
| Future grants | Apply grants to future objects in a schema/database | Useful for repeatable pipeline object creation |
| Managed access schema | Centralizes grant management through schema owner | Good for controlled environments |
| Secondary roles | Additional active roles for a session | Know how effective privileges are determined |
Example least-privilege pattern:
GRANT USAGE ON WAREHOUSE etl_wh TO ROLE data_engineer_role;
GRANT USAGE ON DATABASE analytics TO ROLE data_engineer_role;
GRANT USAGE ON SCHEMA analytics.stg TO ROLE data_engineer_role;
GRANT SELECT, INSERT, UPDATE, DELETE
ON ALL TABLES IN SCHEMA analytics.stg
TO ROLE data_engineer_role;
Governance Feature Reference
| Feature | Purpose | Use When |
|---|
| Masking policy | Dynamically mask column values | Sensitive fields require role/context-based display |
| Row access policy | Filter rows based on policy logic | Multi-tenant, regional, or entitlement-based access |
| Tag | Classify objects/columns | Governance, lineage, policy automation |
| Tag-based masking | Apply masking based on classification | Many sensitive columns need consistent control |
| Secure view | Hide query definition and restrict optimization exposure | Controlled data sharing or sensitive logic |
| Secure UDF | Protect function definition | Sensitive business logic |
| Access History | Audit who accessed what | Compliance, investigations, least-privilege review |
| Object dependencies | Understand upstream/downstream impact | Change management and pipeline safety |
Data Protection and Recovery
| Feature | Use | Do Not Confuse With |
|---|
| Time Travel | Query, clone, or restore prior object state within retention | Long-term backup strategy by itself |
UNDROP | Restore recently dropped objects when eligible | Fixing corrupted logic after retention expires |
| Zero-copy clone | Create metadata-based copy for dev/test/recovery | Full physical copy at creation time |
| Fail-safe | Snowflake-managed last-resort recovery for permanent data | User-queryable recovery workflow |
| Replication | Copy supported databases/account objects across regions/clouds | Query cache or Time Travel |
| Failover group | Coordinated failover for supported objects | Simple table clone |
CREATE TABLE dev.orders_clone
CLONE prod.orders;
CREATE TABLE recovery.orders_before_change
CLONE prod.orders
AT (TIMESTAMP => '2026-06-18 10:00:00'::TIMESTAMP);
Data Sharing and Collaboration
| Feature | Best For | Notes |
|---|
| Secure Data Sharing | Share live data without copying | Provider controls shared objects and grants |
| Reader account | Share with consumers without their own Snowflake account | Provider administers reader account |
| Listing / Marketplace | Discoverable data product distribution | Can be private or public depending on setup |
| Secure view in share | Restrict columns/rows before sharing | Combine with masking/row access where needed |
| Replicated share | Cross-region/cloud consumer access patterns | Requires replication design and operational awareness |
Exam traps:
- Sharing exposes objects through grants; it does not copy table data into the consumer account.
- Secure views are common when the provider must restrict shared data.
- Consumer performance depends on consumer-side compute for querying shared data.
- Governance policies must be tested from the consumer’s effective role/context.
Symptom-to-Action Table
| Symptom | Check First | Likely Actions |
|---|
| Query scans too much data | Query Profile, partitions scanned, filter predicates | Add selective predicates, improve clustering, project typed columns |
| Query spills to disk | Query Profile spill indicators, join size, warehouse size | Scale up warehouse, reduce data earlier, optimize joins |
| Many queued queries | Warehouse load, concurrency, multi-cluster settings | Scale out, separate workloads, adjust task/user warehouses |
| One complex query is slow | Execution plan, joins, aggregates, memory | Scale up, rewrite SQL, precompute, use materialized view/dynamic table |
| Dashboard repeats same expensive query | Query history, result reuse, freshness needs | Consider materialized view, dynamic table, aggregate table |
| Point lookups slow on huge table | Filter selectivity, search predicates | Consider Search Optimization Service |
| Pipeline reprocesses files | Copy history, file names, FORCE use | Fix idempotency and replay design |
| Task did not run | Task state, schedule, dependencies, WHEN condition | Resume task, inspect task history, check stream has data |
| Stream stale or empty unexpectedly | Source retention, consumption transaction, task failures | Consume regularly; monitor stream/task health |
| Snowpipe not loading | Pipe status, notification setup, copy history | Validate stage path, file format, cloud events, pipe errors |
Query Profile Review Checklist
| Query Profile Area | What It Tells You |
|---|
| Compilation time | Excessive optimization/metadata overhead or complex SQL generation |
| Bytes scanned | Whether pruning/projection is effective |
| Partitions scanned | Micro-partition pruning quality |
| Join nodes | Join type, build/probe imbalance, missing filters |
| Aggregate nodes | High-cardinality grouping or late reduction |
| Sort nodes | Large ordering/window operations |
| Spill indicators | Memory pressure; warehouse sizing or SQL rewrite needed |
| Remote vs local I/O | Cache effectiveness and scan cost |
| Queuing | Warehouse concurrency or resource availability issue |
Warehouse Sizing Decisions
| Scenario | Better Lever |
|---|
| Single long-running complex query | Larger warehouse, SQL rewrite, precompute |
| Many simultaneous queries | Multi-cluster warehouse or workload separation |
| ETL and BI interfere with each other | Separate warehouses by workload |
| Intermittent workloads | Auto-suspend/auto-resume with appropriate settings |
| Repeated transformations | Tasks/dynamic tables with dedicated warehouse |
| Cost governance | Resource monitors, query review, workload isolation |
Operational Monitoring Reference
| Need | Snowflake Area to Check |
|---|
| Query execution details | Query History, Query Profile |
| Warehouse credit and load patterns | Warehouse metering/load history |
| File loads | Copy history, load history |
| Snowpipe health | Pipe status, pipe usage/history |
| Task runs | Task history and task graph state |
| Stream consumption | Stream metadata, task history, source table changes |
| Access auditing | Access History, Login History |
| Object changes | Account Usage views, object dependency metadata |
| Data quality issues | Rejected files, quarantine tables, validation queries |
| Replication/failover | Replication/failover group status and history |
Useful monitoring query patterns:
SELECT
query_id,
warehouse_name,
execution_status,
start_time,
bytes_scanned
FROM snowflake.account_usage.query_history
WHERE warehouse_name = 'ETL_WH'
ORDER BY start_time DESC;
SELECT *
FROM TABLE(information_schema.task_history(
TASK_NAME => 'MERGE_ORDERS_TASK'
))
ORDER BY scheduled_time DESC;
Data Engineering Design Patterns
Bronze / Silver / Gold Mapping
| Layer | Snowflake Objects | Design Goal |
|---|
| Raw / Bronze | Raw tables, VARIANT, source metadata | Preserve source fidelity and lineage |
| Clean / Silver | Typed tables, validation results, deduped records | Standardize schema and enforce quality |
| Curated / Gold | Marts, aggregates, dynamic tables, materialized views | Serve analytics, ML, and downstream apps |
| Governance | Policies, tags, secure views | Enforce access and classification |
| Operations | Tasks, streams, monitoring tables | Make pipeline health visible |
Idempotent Pipeline Checklist
- Use deterministic file naming and load tracking.
- Capture
METADATA$FILENAME and load timestamp. - Separate raw ingestion from business transformation.
- Use
MERGE for upserts where duplicate events are possible. - Process streams in transactions when multiple target updates must stay consistent.
- Design replay procedures before production failure.
- Quarantine bad records instead of silently discarding them.
- Monitor task, pipe, and copy history.
- Keep transformations set-based wherever possible.
Data Quality Controls
| Control | Example |
|---|
| Type validation | TRY_TO_NUMBER, TRY_TO_DATE, explicit casts |
| Required fields | Reject or quarantine rows with missing keys |
| Range checks | Validate amounts, timestamps, status codes |
| Deduplication | Window functions over business keys and event timestamps |
| Referential checks | Compare staged keys to dimension tables |
| Schema drift detection | Compare inferred file schema to expected contract |
| Audit counts | Source count, loaded count, rejected count, merged count |
| Lineage fields | Source file, row number, batch ID, load time |
High-Yield Exam Traps
| Trap | Correct Mental Model |
|---|
| “Bigger warehouse always fixes performance.” | Scale up helps some single-query bottlenecks; concurrency may need scale-out or workload isolation |
| “A stream stores all change rows forever.” | A stream tracks change offsets over retained source data; monitor staleness and consumption |
| “Snowpipe replaces transformations.” | Snowpipe loads files; downstream tasks/dynamic tables usually transform |
| “Temporary tables are harmless with duplicate names.” | Temporary objects can shadow permanent names in a session |
| “Fail-safe is for normal user recovery.” | Use Time Travel, cloning, and UNDROP for routine recovery |
| “Clustering should be added to every large table.” | Add clustering only when query pruning benefit justifies maintenance |
| “Semi-structured data requires no modeling.” | Frequently queried fields often belong in typed columns or curated tables |
| “Secure Data Sharing copies provider data.” | Sharing allows live access without copying into the consumer account |
| “Stored procedures are faster than SQL.” | Set-based SQL is usually preferred for relational transformations |
| “Result cache proves the query is optimized.” | Cache can hide inefficient scans; inspect cold-query behavior and profile |
Final DEA-C02 Review Checklist
Before taking DEA-C02, make sure you can quickly answer:
- Which ingestion option fits batch files, continuous files, and low-latency streaming rows?
- How do
COPY INTO, Snowpipe, stages, file formats, and load history interact? - When should you choose streams/tasks versus dynamic tables?
- How do stream metadata columns affect
MERGE logic? - What table type fits production, staging, session-only, and externally stored data?
- How do micro-partitions, pruning, clustering, and search optimization differ?
- When should you scale a warehouse up versus scale out?
- How do masking policies, row access policies, secure views, and tags work together?
- How do Time Travel, cloning,
UNDROP, replication, and Fail-safe differ? - What monitoring view or history area would you check for a failed load, task, pipe, or slow query?
Next step: turn each table above into scenario drills, then practice timed Snowflake data engineering questions that force you to choose the best ingestion, transformation, security, and performance pattern for the stated constraints.