DS0-002 — CompTIA DataSys+ V2 Quick Reference

Last revised: June 18, 2026

Compact DS0-002 Quick Reference for CompTIA DataSys+ V2 candidates covering database design, SQL, security, operations, performance, and recovery.

Exam-use focus

This independent Quick Reference is for candidates preparing for CompTIA DataSys+ V2 (DS0-002). Use it to review database administration decisions quickly: data modeling, SQL behavior, deployment patterns, access control, backup and recovery, monitoring, performance, and operational change management.

High-yield DBA decision map

Task	Choose / remember	Common exam trap
Design transactional system	Normalize, enforce constraints, index frequent predicates	Adding indexes for every column without considering write cost
Design analytics system	Dimensional model, columnar storage, aggregates, ETL/ELT pipelines	Treating OLAP like OLTP and over-normalizing reporting tables
Protect data	Least privilege, encryption, auditing, masking/tokenization where needed	Confusing encryption with authorization
Improve slow query	Check plan, indexes, statistics, joins, predicates, locks	Scaling hardware before reading the execution plan
Recover from data loss	Restore last good full backup, apply differential/incremental/logs as appropriate, validate	Having backups but never testing restores
Meet low downtime requirement	Replication, clustering, failover, load balancing, tested runbooks	Assuming backup alone provides high availability
Move data between systems	Migration plan, validation, rollback, CDC or replication if low downtime	Big-bang migration without reconciliation
Troubleshoot contention	Identify locks, long transactions, isolation level, deadlock victims	Killing sessions without preserving evidence
Improve data quality	Constraints, validation rules, profiling, deduplication, lineage	Fixing reports instead of correcting source data
Manage changes	Versioned scripts, pre-prod testing, approvals, rollback plan	Manual schema changes directly in production

Data types, structures, and workload fit

Data category reference

Data category	Examples	Typical handling
Structured	Rows and columns, well-defined schema	Relational DBMS, SQL, constraints
Semi-structured	JSON, XML, logs with variable fields	Document stores, data lakes, schema-on-read
Unstructured	Images, audio, PDFs, free text	Object storage, search indexes, metadata catalog
Master data	Customer, product, supplier records	Governance, deduplication, stewardship
Transactional data	Orders, payments, inventory movements	ACID controls, constraints, auditability
Reference data	Country codes, status codes, taxonomies	Controlled updates, versioning
Metadata	Schema, lineage, ownership, sensitivity labels	Cataloging, governance, impact analysis

Workload selection matrix

Workload	Primary goal	Common design	Storage/query pattern
OLTP	Fast, consistent transactions	Normalized relational schema	Short reads/writes, indexed lookups
OLAP	Analysis across large history	Star/snowflake schema, aggregates	Large scans, joins, grouping
HTAP / mixed	Transactional plus near-real-time analytics	Separate serving paths or specialized platform	Avoid analytics queries degrading OLTP
Streaming	Continuous event processing	Event logs, stream processors, time windows	Append-first, low-latency processing
Data lake	Flexible raw and curated data	Object storage zones, catalog, governance	Batch/ELT, schema-on-read
Search	Text relevance and filtering	Inverted indexes	Tokenization, ranking, faceting
Time-series	Metrics, telemetry, sensor data	Timestamp-based partitions/retention	Range scans, downsampling

Database model selection

Model	Best for	Strengths	Watch for
Relational	Structured business transactions	ACID, SQL, constraints, joins	Rigid schema changes if poorly managed
Key-value	Session state, cache, simple lookups	Very fast access by key	Limited querying and relationships
Document	JSON-like entities, flexible attributes	Schema flexibility, nested data	Duplicated data and inconsistent shapes
Column-family / wide-column	Massive sparse datasets, high write scale	Horizontal scale, high throughput	Query design must follow access patterns
Graph	Highly connected relationships	Traversal, path analysis	Not ideal for simple tabular reporting
Time-series	Metrics over time	Retention, compression, time windows	Cardinality management
Object storage + catalog	Raw files, lake architectures	Low-cost durable storage, many formats	Governance and query performance depend on design

Relational design quick reference

Core relational terms

Term	Meaning	Exam note
Entity	Thing represented by a table	Example: Customer, Order, Product
Attribute	Column describing an entity	Choose data type and constraints carefully
Tuple / row	One record	Should represent one instance of the entity
Primary key	Unique row identifier	Should be stable, unique, and non-null
Foreign key	Reference to another table’s key	Enforces referential integrity
Candidate key	Column set that could uniquely identify rows	One is selected as primary key
Surrogate key	Artificial key, such as generated ID	Useful when natural keys are unstable
Natural key	Real-world identifier	Can change or contain business meaning
Cardinality	Relationship count	One-to-one, one-to-many, many-to-many
Optionality	Whether relationship is required	Implemented through nullability and constraints
Junction table	Resolves many-to-many relationship	Contains foreign keys to both parent tables

Relationship patterns

Pattern	Implementation	Example
One-to-one	FK with unique constraint, or shared PK	User and user profile
One-to-many	FK on child table	Customer to orders
Many-to-many	Junction/bridge table	Students to courses
Hierarchical	Self-referencing FK	Employee to manager
Recursive graph-like	Edge table with source/target IDs	Network links, dependencies

Constraint reference

Constraint	Protects against	Example
NOT NULL	Missing required data	Order date must exist
UNIQUE	Duplicate values	Email address must be unique
PRIMARY KEY	Missing or duplicate row identity	CustomerID
FOREIGN KEY	Orphan records	Order must reference valid customer
CHECK	Invalid domain values	Quantity greater than 0
DEFAULT	Missing routine value	CreatedDate defaults to current timestamp
EXCLUDE / specialized constraint	Overlapping or conflicting ranges	Room bookings cannot overlap, if supported

Normalization decision table

Form	Main rule	Fixes	Practical exam cue
1NF	Atomic values; no repeating groups	Multi-value columns	Split comma-separated phone numbers into child table
2NF	1NF plus no partial dependency on composite key	Redundant data in composite-key tables	Non-key attribute depends on only part of key
3NF	2NF plus no transitive dependency	Non-key data depending on other non-key data	Move ZIP-to-city mapping to reference table
BCNF	Every determinant is a candidate key	Edge cases with overlapping candidate keys	Stricter than 3NF
Denormalization	Intentionally duplicate or precompute data	Read/report performance	Requires consistency strategy

High-yield distinction: normalization reduces update anomalies; denormalization can improve read performance but increases maintenance and consistency risk.

Dimensional and analytics modeling

Concept	Meaning	Use when
Fact table	Numeric events or measurements	Sales amount, clicks, shipments
Dimension table	Descriptive context	Date, customer, product, region
Star schema	Fact table directly linked to dimensions	Simpler, faster BI queries
Snowflake schema	Dimensions further normalized	Lower redundancy, more joins
Grain	Level of detail in fact table	“One row per order line”
Slowly changing dimension Type 1	Overwrite old value	Current-state reporting only
Slowly changing dimension Type 2	Add new versioned row	Historical reporting needed
Aggregate table	Pre-summarized data	Improve frequent reports
Data mart	Subject-specific analytics store	Department or domain reporting

SQL compact reference

SQL statement classes

Class	Purpose	Examples
DDL	Define structures	CREATE, ALTER, DROP, TRUNCATE
DML	Manipulate data	SELECT, INSERT, UPDATE, DELETE, MERGE
DCL	Control permissions	GRANT, REVOKE
TCL	Control transactions	COMMIT, ROLLBACK, SAVEPOINT
DQL	Query data	SELECT; often treated as part of DML

Logical SELECT processing order

Order	Clause	Purpose
1	FROM / JOIN	Identify source rows
2	WHERE	Filter individual rows
3	GROUP BY	Form groups
4	HAVING	Filter groups
5	SELECT	Return expressions
6	DISTINCT	Remove duplicates
7	ORDER BY	Sort result
8	LIMIT / OFFSET / FETCH	Return subset, syntax varies

Trap: WHERE filters rows before grouping. HAVING filters groups after aggregation.

SELECT c.customer_id, COUNT(*) AS order_count
FROM customers c
JOIN orders o
  ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2026-01-01'
GROUP BY c.customer_id
HAVING COUNT(*) > 5
ORDER BY order_count DESC;

Join behavior

Join type	Returns	Common use
INNER JOIN	Matching rows only	Required relationship
LEFT OUTER JOIN	All left rows plus matching right rows	Find optional related data
RIGHT OUTER JOIN	All right rows plus matching left rows	Less common; can rewrite as left join
FULL OUTER JOIN	All rows from both sides	Reconciliation, data comparison
CROSS JOIN	Cartesian product	Generate combinations; dangerous if accidental
SELF JOIN	Table joined to itself	Hierarchies, comparisons

-- Find customers with no orders
SELECT c.customer_id
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;

NULL behavior

Expression	Result / note
`column = NULL`	Incorrect; comparisons to NULL are unknown
`column IS NULL`	Correct NULL test
`column IS NOT NULL`	Correct non-NULL test
`COUNT(*)`	Counts rows
`COUNT(column)`	Counts non-NULL values
`COALESCE(a, b)`	Returns first non-NULL value
NULL in arithmetic	Usually produces NULL

Aggregation and window functions

Technique	Purpose	Example use
GROUP BY	Collapse rows into groups	Sales by region
HAVING	Filter aggregate groups	Regions with sales above threshold
Window function	Calculate across related rows without collapsing	Rank orders by customer
PARTITION BY	Defines window groups	Per customer, per department
ORDER BY inside window	Defines calculation order	Running total, row number

SELECT
  customer_id,
  order_id,
  order_total,
  ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY order_date DESC
  ) AS order_rank
FROM orders;

DDL and constraints example

CREATE TABLE orders (
  order_id      INTEGER PRIMARY KEY,
  customer_id   INTEGER NOT NULL,
  order_date    DATE NOT NULL,
  order_total   DECIMAL(12,2) NOT NULL CHECK (order_total >= 0),
  status        VARCHAR(20) NOT NULL DEFAULT 'NEW',
  CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Transaction example

BEGIN;

UPDATE accounts
SET balance = balance - 100
WHERE account_id = 10;

UPDATE accounts
SET balance = balance + 100
WHERE account_id = 20;

COMMIT;
-- Use ROLLBACK instead of COMMIT if validation fails.

Transactions, concurrency, and consistency

ACID reference

Property	Meaning	DBA relevance
Atomicity	All-or-nothing transaction	Prevent partial transfers or half-written changes
Consistency	Database moves between valid states	Constraints and business rules remain valid
Isolation	Concurrent transactions do not improperly interfere	Controlled by isolation levels and locks
Durability	Committed data survives failure	Logs, storage, checkpoints, replication

Isolation and anomalies

Isolation concept	Prevents / allows	Exam cue
Read uncommitted	May allow dirty reads	Fast but unsafe for correctness
Read committed	Prevents dirty reads	Common baseline behavior
Repeatable read	Prevents non-repeatable reads	Same row reread remains stable
Serializable	Strongest isolation; behaves like serial execution	More blocking/overhead possible
Snapshot / MVCC	Readers see consistent version	Reduces read/write blocking, may create version storage pressure

Anomaly	Meaning
Dirty read	Read data from uncommitted transaction
Non-repeatable read	Same row read twice returns different committed values
Phantom read	Re-running query returns new/deleted matching rows
Lost update	One update overwrites another without detection
Deadlock	Transactions wait on each other in a cycle

Locking and contention

Symptom	Likely cause	First checks	Possible action
Queries hang or time out	Blocking locks	Active sessions, wait events, lock tables	Commit/rollback long transaction; tune query
Deadlock errors	Conflicting access order	Deadlock logs/graphs	Access objects in consistent order; shorten transactions
High write latency	Index overhead, log pressure, contention	Write waits, index count, log I/O	Reduce unnecessary indexes, batch carefully
Readers block writers	Lock-based isolation	Isolation settings, query duration	Use appropriate isolation/MVCC if supported
Version store grows	Long snapshot readers	Long-running queries	End stale sessions; tune reporting workload

Indexing and query performance

Index selection

Index type / pattern	Use when	Avoid / watch for
B-tree / balanced tree	Equality and range predicates, sorting	Low-selectivity columns may not help
Composite index	Queries filter/sort by multiple columns	Column order matters
Covering index	Query can be satisfied from index	Extra storage and write overhead
Unique index	Enforce uniqueness and speed lookup	Duplicates will fail
Filtered / partial index	Only subset is frequently queried	Predicate must match supported syntax
Full-text index	Natural language search	Not same as `LIKE '%term%'`
Hash index	Equality lookup, if supported	Usually not for range queries
Bitmap index	Low-cardinality analytics, if supported	Often poor for high-concurrency OLTP

SARGability checklist

A predicate is more index-friendly when the database can search the index directly.

Less index-friendly	More index-friendly
`WHERE YEAR(order_date) = 2026`	`WHERE order_date >= DATE '2026-01-01' AND order_date < DATE '2027-01-01'`
`WHERE LOWER(email) = 'a@x.com'`	Store normalized email or use supported function-based index
`WHERE amount + 10 > 100`	`WHERE amount > 90`
`WHERE name LIKE '%son'`	Use full-text/search index if suffix search is required

Query tuning sequence

Confirm the performance symptom and baseline.
Review execution plan: scan vs seek, join method, sort, spill, estimated vs actual rows if available.
Check predicates for SARGability.
Validate relevant indexes and index column order.
Check statistics freshness and cardinality estimates.
Inspect joins, missing filters, accidental cross joins, and implicit conversions.
Check locks, waits, I/O, CPU, memory, and temp space.
Test changes in non-production and compare measured results.

Performance tools and signals

Signal	Indicates	Typical response
Full table scan	No useful index, low selectivity, or optimizer choice	Add/tune index only if beneficial
High CPU	Complex calculations, poor plan, too many executions	Tune query, cache results, reduce frequency
High disk I/O	Large scans, missing indexes, poor caching	Index, partition, archive, tune queries
Memory pressure	Sort/hash spills, insufficient cache	Tune query, review memory config/capacity
Temp space growth	Large sorts, hash joins, temp tables	Add indexes, reduce intermediate rows
Stale statistics	Bad cardinality estimates	Update statistics/maintenance
Parameter sensitivity	Plan good for one value, bad for another	Recompile/plan strategy, query rewrite where supported

Storage, partitioning, and capacity

Storage concepts

Concept	DBA meaning	Exam note
Data file	Stores table/index data	Placement affects I/O
Log / journal	Records changes for durability and recovery	Critical for point-in-time restore
Tablespace / filegroup	Logical storage grouping, vendor term varies	Can separate data, indexes, partitions
Temp space	Used for sorts, joins, temp objects	Running out can break queries
Checkpoint	Flushes dirty pages to durable storage	Interacts with recovery time
Compression	Reduces storage and I/O	May increase CPU
Tiering	Move data to different storage classes	Balance performance, cost, retention

Partitioning patterns

Pattern	Use when	Benefit	Trap
Range partition	Date/time or ordered key	Fast pruning and archival	Bad partition key creates imbalance
List partition	Known categories	Region/status separation	Too many categories can be hard to manage
Hash partition	Spread data evenly	Reduces hotspot risk	Less intuitive for pruning
Composite partition	Multiple strategies	Large complex workloads	More operational complexity

Partitioning is not automatically a performance fix. It helps most when queries filter on the partition key, maintenance can operate per partition, or data lifecycle actions need efficient archive/drop.

Deployment and architecture patterns

Database deployment decision table

Pattern	Best for	Strengths	Risks / controls
Single instance	Small/simple workload, dev/test	Simple administration	Single point of failure unless backed by platform features
Primary-replica	Read scaling, reporting, HA support	Offload reads, failover option	Replication lag, consistency expectations
Active-active / multi-writer	Global or high-write availability	Continuous service potential	Conflict resolution and complexity
Clustered database	High availability within environment	Automated failover, shared governance	Quorum/split-brain design matters
Sharding	Very large scale-out workload	Horizontal write/read scale	Cross-shard queries and rebalancing complexity
Managed database service	Reduce operational burden	Automated maintenance options, platform integration	Shared responsibility still applies
Containerized database	Dev/test portability, some specialized deployments	Repeatable deployment	Persistent storage, backup, and performance need care
Serverless database	Variable workload, simplified scaling	Operational simplicity	Cold starts/latency/cost behavior may vary by platform

Replication choices

Replication type	Description	Choose when
Synchronous	Commit waits for replica acknowledgment	Very low data-loss tolerance
Asynchronous	Primary commits before replica catches up	Distance/performance more important than zero-lag
Physical/block-level	Replicates storage/log changes	Disaster recovery or exact copy
Logical	Replicates rows/statements/changes	Selective replication, migration, integration
Snapshot	Periodic copy	Reporting or initial synchronization
CDC	Captures data changes over time	Near-real-time pipelines and migrations

CAP and distributed data systems

Concept	Meaning
Consistency	Reads return latest correct data according to model
Availability	Requests receive non-error responses
Partition tolerance	System continues despite network partitions
Eventual consistency	Replicas converge over time
Strong consistency	Reads reflect committed writes according to defined rules

Exam distinction: CAP applies under network partition conditions. It does not mean every system permanently chooses only two characteristics in all situations.

Security quick reference

Security control matrix

Control	Protects	Examples / notes
Authentication	Who are you?	Passwords, MFA, certificates, federation
Authorization	What can you do?	Roles, grants, row-level access
Accounting / auditing	What happened?	Login logs, query audit, DDL audit
Encryption in transit	Network confidentiality	TLS/secure client connections
Encryption at rest	Stored data protection	Database/file/storage encryption
Key management	Control encryption keys	Rotation, separation of duties, access policy
Masking	Hide sensitive values in displays	Useful for non-prod or limited users
Tokenization	Replace sensitive data with token	Reduces exposure of original value
Hashing	One-way transformation	Password verification, integrity checks
Salting	Random value added before hashing	Defends against precomputed hash attacks
Data classification	Label sensitivity	Drives access, retention, and monitoring
Secrets management	Protect credentials	Avoid hardcoded passwords
Network segmentation	Limit reachable paths	Private subnets, firewalls, allowlists
Patching	Remove known vulnerabilities	Test, schedule, rollback plan

Least privilege SQL example

-- Example pattern; exact syntax varies by platform.
CREATE ROLE reporting_reader;

GRANT SELECT ON sales_summary TO reporting_reader;

GRANT reporting_reader TO analyst_user;

REVOKE INSERT, UPDATE, DELETE ON sales_summary FROM reporting_reader;

Role design

Pattern	Good practice	Trap
User-specific grants	Use sparingly	Hard to audit and revoke
Role-based access control	Grant privileges to roles, users to roles	Overly broad shared roles
Separation of duties	Split DBA, security admin, developer duties	One account can alter, approve, and audit itself
Break-glass access	Emergency elevated access	Must be logged, time-limited, reviewed
Service accounts	Application/database connectivity	Use rotation and scoped permissions

Sensitive data handling

Technique	Reversible?	Primary use
Encryption	Yes, with key	Protect stored/transmitted data
Hashing	No	Verify password or integrity
Masking	Usually display-level	Reduce exposure to users
Tokenization	Yes, through token vault/mapping	Replace sensitive values in workflows
Redaction	Usually no in output	Remove sensitive text from logs/documents
Anonymization	Intended no	Analytics without identifying individuals
Pseudonymization	Possible with mapping	Reduce direct identifiability

Audit and logging focus

Event type	Why it matters
Failed logins	Brute force or credential misuse
Privilege changes	Escalation or misconfiguration
DDL changes	Schema drift, unauthorized alteration
Access to sensitive tables	Data exposure investigation
Bulk export	Potential exfiltration
Backup/restore events	Data movement and recovery assurance
Configuration changes	Security or availability impact

Trap: audit logs must be protected from tampering. Logging sensitive values can create a second data exposure location.

Backup, restore, and disaster recovery

RPO, RTO, and availability

Term	Meaning	Practical implication
RPO	Maximum acceptable data loss	Determines backup/log replication frequency
RTO	Maximum acceptable recovery time	Determines restore automation and HA design
MTTD	Mean time to detect	Monitoring and alerting quality
MTTR	Mean time to repair/recover	Runbooks, automation, staff readiness
HA	Reduce downtime during component failure	Clustering, failover, redundancy
DR	Recover from site/region/system disaster	Backups, replication, alternate environment

\[ \text{Availability} = \frac{\text{Total time} - \text{Downtime}}{\text{Total time}} \times 100 \]

Backup type comparison

Backup type	Captures	Strengths	Watch for
Full	Entire database/data set	Simplest restore base	Largest time/storage
Differential	Changes since last full	Faster restore than many incrementals	Grows until next full
Incremental	Changes since last backup	Efficient storage	Restore may require chain
Transaction log / redo log	Ordered changes	Point-in-time recovery	Log chain must be intact
Snapshot	Point-in-time storage image	Fast creation	Must understand consistency and dependency on storage
Logical export	Schema/data as statements/files	Migration, selective restore	May be slower; consistency must be managed
Physical backup	Data files/logs	Fast full restore	Platform/version compatibility matters

Restore sequence reference

Scenario	Typical restore approach
Full backup only	Restore full backup
Full + differential	Restore full, then latest differential
Full + incrementals	Restore full, then each incremental in order
Full + logs	Restore full, then logs to target point
Full + differential + logs	Restore full, latest differential, then logs
Corrupt object only	Consider object-level restore/export if supported
Accidental DELETE	Point-in-time restore to separate environment, extract/replay valid data

Backup validation checklist

Confirm backups complete successfully.
Test restore in a non-production environment.
Verify application can connect after restore.
Validate row counts, checksums, or reconciliation totals.
Document restore steps and owners.
Protect backups with encryption and access controls.
Store backups separately from the primary failure domain.
Monitor backup age, duration, failure, and capacity.
Review retention and deletion settings against business requirements.

HA/DR decision path

    flowchart TD
	    A[Requirement: protect database service] --> B{Main concern?}
	    B -->|Short outage from server failure| C[High availability: cluster or failover replica]
	    B -->|Data loss tolerance is very low| D[Synchronous replication or frequent log shipping]
	    B -->|Regional/site disaster| E[Disaster recovery environment plus offsite backups]
	    B -->|Accidental data change| F[Point-in-time restore and audit trail]
	    C --> G[Test failover runbook]
	    D --> G
	    E --> G
	    F --> H[Test restore and data reconciliation]

Data lifecycle, integration, and governance

ETL, ELT, CDC, and streaming

Pattern	Description	Choose when
ETL	Transform before loading target	Target requires curated data before storage
ELT	Load raw first, transform in target	Scalable analytics platform handles transforms
CDC	Capture inserts/updates/deletes from source	Low-latency sync, migrations, audit feeds
Batch	Periodic bulk movement	Large scheduled processing
Streaming	Continuous event processing	Real-time alerts, telemetry, near-real-time analytics
API integration	Application-level data exchange	Controlled business operations and validation
Message queue	Decouple producers and consumers	Resilience and asynchronous processing

Data quality checks

Check	Detects
Completeness	Missing required values
Validity	Values outside allowed domain
Uniqueness	Duplicate keys/entities
Consistency	Conflicting values across systems
Accuracy	Values not matching real-world source
Timeliness	Stale or late-arriving data
Integrity	Broken relationships or corrupted values
Conformity	Wrong format, unit, code set, or standard

-- Basic reconciliation checks after migration/load
SELECT COUNT(*) AS source_count FROM source_orders;
SELECT COUNT(*) AS target_count FROM target_orders;

SELECT SUM(order_total) AS source_total FROM source_orders;
SELECT SUM(order_total) AS target_total FROM target_orders;

Governance and metadata

Area	DBA relevance
Data catalog	Find datasets, owners, schemas, descriptions
Lineage	Understand where data came from and downstream impact
Classification	Identify sensitive, regulated, or business-critical data
Retention	Keep or dispose data according to policy
Data ownership	Assign accountability for definitions and quality
Stewardship	Day-to-day data quality and definition management
Change impact	Know which reports, apps, or pipelines depend on a table
Master data management	Create trusted shared entities across systems

Monitoring and troubleshooting

Baseline metrics

Category	Metrics / signals
Availability	Uptime, connection success, failover events
Workload	Transactions per second, query rate, batch duration
Latency	Query response time, commit latency, replication lag
Resource	CPU, memory, disk I/O, network throughput
Storage	Free space, growth rate, temp usage, log usage
Locks/waits	Blocking sessions, deadlocks, wait classes
Cache	Buffer/cache hit ratio, plan cache behavior
Backup	Success/failure, duration, backup age
Security	Failed logins, privilege changes, unusual access
Data pipeline	Job failures, late data, rejected records

Symptom-to-action table

Symptom	Likely causes	First action
Database unavailable	Service down, network issue, storage failure, auth issue	Check service health, connectivity, logs
Sudden slow queries	Bad plan, stale stats, blocking, workload spike	Compare to baseline; review plan and waits
Disk full	Data growth, logs not truncating, temp spill, failed cleanup	Identify consuming files; protect data before cleanup
Replication lag	Network latency, slow replica, large transaction	Check replica health and apply queue
Backup failure	Permission, capacity, I/O, schedule conflict	Review job logs and destination capacity
Login failures	Credential changes, lockout, expired secret, attack	Validate auth path and security logs
Data mismatch	ETL bug, partial load, constraint disabled, duplicate source	Reconcile counts/totals and review load logs
Deadlocks	Conflicting transaction order, missing indexes	Review deadlock details and access pattern
Corruption alert	Storage fault, software issue, abrupt failure	Stop risky writes if needed; follow vendor recovery guidance

Incident response for database issues

Detect and classify severity.
Preserve logs, alerts, and current state.
Stabilize service; avoid destructive “quick fixes.”
Identify blast radius: users, applications, tables, replicas, backups.
Communicate status through the agreed channel.
Apply tested recovery or remediation steps.
Validate service and data correctness.
Document root cause, timeline, and preventive actions.

Maintenance, change, and release management

Routine DBA operations

Operation	Purpose	Key caution
Patch database engine	Security and stability	Test compatibility and rollback
Update statistics	Better optimizer estimates	Schedule to avoid peak impact
Rebuild/reorganize indexes	Address fragmentation where relevant	Do not perform blindly; measure benefit
Archive/purge data	Control growth and performance	Validate retention and dependencies
Rotate credentials/keys	Reduce credential exposure	Coordinate application updates
Review permissions	Enforce least privilege	Remove stale users and excessive grants
Test restores	Confirm recoverability	Test regularly, not only after incidents
Capacity review	Avoid outages and performance issues	Trend storage, CPU, memory, I/O

Schema change checklist

Use version-controlled migration scripts.
Test in development and staging with realistic data volume.
Identify dependent applications, views, reports, jobs, and permissions.
Back up or snapshot before high-risk changes.
Plan locking and downtime impact.
Use backward-compatible changes where possible.
Include rollback or roll-forward strategy.
Validate data after migration.
Document change, owner, approval, and implementation time.

Safe migration sequence

Phase	Activities
Assess	Inventory objects, dependencies, data volume, compatibility
Plan	Choose migration method, window, rollback, validation
Prepare	Provision target, security, connectivity, schema
Load	Initial copy or replication setup
Validate	Counts, checksums, sample queries, application tests
Cut over	Freeze or sync final changes, switch traffic
Monitor	Watch errors, latency, locks, data drift
Decommission	Retire old system only after acceptance and backup

Cloud and managed database responsibilities

Responsibility area	Customer usually still manages	Provider/platform may manage
Data model	Schema, indexes, constraints	Not usually automatic
Access	Users, roles, secrets, app permissions	IAM integration features
Data protection	Classification, encryption choices, backup policy validation	Infrastructure encryption options, backup tooling
Patching	Application compatibility testing, scheduling decisions	Engine or host patch automation depending on service
Availability	Architecture choice, failover testing	Platform redundancy mechanisms
Monitoring	Alerts, query performance, business KPIs	Built-in metrics/log delivery
Compliance support	Policies, evidence, data handling	Platform controls and reports
Cost/capacity	Workload sizing, scaling choices	Metering and scaling mechanisms

Trap: managed database does not mean unmanaged data. The organization still owns data quality, access design, query behavior, and recovery requirements.

Common exam distinctions

Distinction	Remember
Backup vs replication	Backup protects against deletion/corruption history; replication can copy bad changes quickly
HA vs DR	HA handles local/component failure; DR handles broader disaster recovery
Authentication vs authorization	Authentication proves identity; authorization grants actions
Encryption vs hashing	Encryption is reversible with key; hashing is one-way
Masking vs encryption	Masking changes what users see; encryption protects stored/transmitted data
OLTP vs OLAP	OLTP optimizes transactions; OLAP optimizes analysis
Primary key vs unique key	Both enforce uniqueness; primary key is main row identifier and non-null
Foreign key vs join	FK enforces relationship; join retrieves related data
WHERE vs HAVING	WHERE filters rows; HAVING filters groups
DELETE vs TRUNCATE	DELETE is row operation and can filter; TRUNCATE removes all rows more directly, behavior varies by platform
Logical vs physical backup	Logical exports objects/data; physical backs up files/pages/logs
Incremental vs differential	Incremental since last backup; differential since last full
Scale up vs scale out	Scale up adds resources to one node; scale out adds nodes/partitions
Normalization vs denormalization	Normalize for integrity; denormalize intentionally for read performance
Data lake vs data warehouse	Lake stores flexible raw/curated files; warehouse stores structured optimized analytics
CDC vs full reload	CDC captures changes; full reload replaces or reloads entire set

Compact command/query patterns to recognize

-- Add an index for a frequent lookup pattern
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);

-- Enforce valid status values
ALTER TABLE orders
ADD CONSTRAINT chk_orders_status
CHECK (status IN ('NEW', 'PAID', 'SHIPPED', 'CANCELLED'));

-- Identify duplicate business keys
SELECT email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- Find orphaned child rows
SELECT o.order_id
FROM orders o
LEFT JOIN customers c
  ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL;

Final DS0-002 review checklist

Can you select relational, NoSQL, warehouse, lake, or streaming architecture based on workload?
Can you explain primary keys, foreign keys, constraints, joins, NULLs, and transaction isolation?
Can you read SQL and predict result-set behavior?
Can you choose an index strategy and a query-tuning sequence?
Can you distinguish authentication, authorization, auditing, encryption, hashing, masking, and tokenization?
Can you map RPO/RTO requirements to backup, restore, replication, HA, and DR designs?
Can you troubleshoot slow queries, blocking, failed backups, replication lag, and storage growth?
Can you describe safe schema changes, migrations, validation, and rollback planning?
Can you connect data governance, classification, lineage, and quality checks to DBA operations?

Practical next step

Use this Quick Reference as a checklist, then move into timed DS0-002 practice questions and hands-on SQL/database administration scenarios. Focus review on any item where you cannot explain both the correct choice and why the tempting alternatives are wrong.

Scenario Guide

Database Fundamentals