Keep this page open while drilling questions. Prioritize “default picks” and trade-offs: performance, cost, security, and operational simplicity.
1) Core architecture (Domain 1)
- Separation of storage and compute: storage is shared; compute happens in virtual warehouses.
- Virtual warehouses: independent compute clusters; resize for performance, auto-suspend to control cost.
- Micro-partitions: automatic partitioning; good filters enable pruning (less data scanned).
- Caching (high-level): repeated queries often benefit from caching; changing underlying data can affect cached results.
Mental model: You pay for warehouse time (compute) and stored data (storage). Most optimization is about using the right warehouse behavior for the workload.
2) Access control & security (Domain 2)
- RBAC: privileges flow through roles; assign roles to users (and roles to roles) for least privilege.
- Ownership matters: the owning role controls grants; resolve “permission denied” by checking the active role and object ownership.
- Common controls:
- Network policies to restrict allowed client IPs.
- Key-pair auth / SSO for stronger auth patterns.
- Secure views + masking/row access policies (when available) to limit exposure.
- Practice tip: in scenario questions, prefer solutions that are least-privilege and easy to audit.
3) Performance & cost (Domain 3)
- Start small, scale intentionally: pick the smallest warehouse that meets SLAs; scale up/out only when needed.
- Concurrency pressure: use multi-cluster behavior (when appropriate) rather than permanently oversizing.
- Query tuning basics: filter early, avoid scanning wide tables unnecessarily, leverage pruning-friendly predicates.
- Operational defaults: enable auto-suspend and size warehouses per workload (ETL vs BI vs ad-hoc).
4) Data loading & unloading (Domain 4)
- Stages: internal or external locations used with
COPY INTO. - File formats: define formats explicitly (CSV/JSON/Parquet/etc.) to avoid brittle defaults.
- Snowpipe (concept): managed, event-driven ingestion for near-real-time loads (vs batch
COPY INTO). - Unloading:
COPY INTO <location> exports query results to a stage/external location.
5) Transformations (Domain 5)
- DDL/DML: be fluent with tables, views, and core DML patterns (
INSERT, MERGE, UPDATE, DELETE). - Semi-structured:
VARIANT + FLATTEN are common for JSON-like structures. - Automation primitives: tasks/streams (where used) support incremental pipelines and scheduled transforms.
6) Data protection & sharing (Domain 6)
- Time Travel: recover or query historical data within the retention window (good for accidental deletes/updates).
- Fail-safe: last-resort recovery option after Time Travel (operationally controlled by Snowflake).
- Zero-copy cloning: fast clones for dev/test and experimentation (be clear on what’s copied vs referenced).
- Data sharing: share data securely without copying it; understand provider/consumer flows and permissions.
Next: drill by objective
- Follow the Syllabus domain-by-domain.
- Launch drills in Practice and keep this cheatsheet open as a reinforcement tool.