Exam-use focus
This independent Quick Reference is for candidates preparing for CompTIA DataSys+ V2 (DS0-002). Use it to review database administration decisions quickly: data modeling, SQL behavior, deployment patterns, access control, backup and recovery, monitoring, performance, and operational change management.
High-yield DBA decision map
| Task | Choose / remember | Common exam trap |
|---|
| Design transactional system | Normalize, enforce constraints, index frequent predicates | Adding indexes for every column without considering write cost |
| Design analytics system | Dimensional model, columnar storage, aggregates, ETL/ELT pipelines | Treating OLAP like OLTP and over-normalizing reporting tables |
| Protect data | Least privilege, encryption, auditing, masking/tokenization where needed | Confusing encryption with authorization |
| Improve slow query | Check plan, indexes, statistics, joins, predicates, locks | Scaling hardware before reading the execution plan |
| Recover from data loss | Restore last good full backup, apply differential/incremental/logs as appropriate, validate | Having backups but never testing restores |
| Meet low downtime requirement | Replication, clustering, failover, load balancing, tested runbooks | Assuming backup alone provides high availability |
| Move data between systems | Migration plan, validation, rollback, CDC or replication if low downtime | Big-bang migration without reconciliation |
| Troubleshoot contention | Identify locks, long transactions, isolation level, deadlock victims | Killing sessions without preserving evidence |
| Improve data quality | Constraints, validation rules, profiling, deduplication, lineage | Fixing reports instead of correcting source data |
| Manage changes | Versioned scripts, pre-prod testing, approvals, rollback plan | Manual schema changes directly in production |
Data types, structures, and workload fit
Data category reference
| Data category | Examples | Typical handling |
|---|
| Structured | Rows and columns, well-defined schema | Relational DBMS, SQL, constraints |
| Semi-structured | JSON, XML, logs with variable fields | Document stores, data lakes, schema-on-read |
| Unstructured | Images, audio, PDFs, free text | Object storage, search indexes, metadata catalog |
| Master data | Customer, product, supplier records | Governance, deduplication, stewardship |
| Transactional data | Orders, payments, inventory movements | ACID controls, constraints, auditability |
| Reference data | Country codes, status codes, taxonomies | Controlled updates, versioning |
| Metadata | Schema, lineage, ownership, sensitivity labels | Cataloging, governance, impact analysis |
Workload selection matrix
| Workload | Primary goal | Common design | Storage/query pattern |
|---|
| OLTP | Fast, consistent transactions | Normalized relational schema | Short reads/writes, indexed lookups |
| OLAP | Analysis across large history | Star/snowflake schema, aggregates | Large scans, joins, grouping |
| HTAP / mixed | Transactional plus near-real-time analytics | Separate serving paths or specialized platform | Avoid analytics queries degrading OLTP |
| Streaming | Continuous event processing | Event logs, stream processors, time windows | Append-first, low-latency processing |
| Data lake | Flexible raw and curated data | Object storage zones, catalog, governance | Batch/ELT, schema-on-read |
| Search | Text relevance and filtering | Inverted indexes | Tokenization, ranking, faceting |
| Time-series | Metrics, telemetry, sensor data | Timestamp-based partitions/retention | Range scans, downsampling |
Database model selection
| Model | Best for | Strengths | Watch for |
|---|
| Relational | Structured business transactions | ACID, SQL, constraints, joins | Rigid schema changes if poorly managed |
| Key-value | Session state, cache, simple lookups | Very fast access by key | Limited querying and relationships |
| Document | JSON-like entities, flexible attributes | Schema flexibility, nested data | Duplicated data and inconsistent shapes |
| Column-family / wide-column | Massive sparse datasets, high write scale | Horizontal scale, high throughput | Query design must follow access patterns |
| Graph | Highly connected relationships | Traversal, path analysis | Not ideal for simple tabular reporting |
| Time-series | Metrics over time | Retention, compression, time windows | Cardinality management |
| Object storage + catalog | Raw files, lake architectures | Low-cost durable storage, many formats | Governance and query performance depend on design |
Relational design quick reference
Core relational terms
| Term | Meaning | Exam note |
|---|
| Entity | Thing represented by a table | Example: Customer, Order, Product |
| Attribute | Column describing an entity | Choose data type and constraints carefully |
| Tuple / row | One record | Should represent one instance of the entity |
| Primary key | Unique row identifier | Should be stable, unique, and non-null |
| Foreign key | Reference to another table’s key | Enforces referential integrity |
| Candidate key | Column set that could uniquely identify rows | One is selected as primary key |
| Surrogate key | Artificial key, such as generated ID | Useful when natural keys are unstable |
| Natural key | Real-world identifier | Can change or contain business meaning |
| Cardinality | Relationship count | One-to-one, one-to-many, many-to-many |
| Optionality | Whether relationship is required | Implemented through nullability and constraints |
| Junction table | Resolves many-to-many relationship | Contains foreign keys to both parent tables |
Relationship patterns
| Pattern | Implementation | Example |
|---|
| One-to-one | FK with unique constraint, or shared PK | User and user profile |
| One-to-many | FK on child table | Customer to orders |
| Many-to-many | Junction/bridge table | Students to courses |
| Hierarchical | Self-referencing FK | Employee to manager |
| Recursive graph-like | Edge table with source/target IDs | Network links, dependencies |
Constraint reference
| Constraint | Protects against | Example |
|---|
| NOT NULL | Missing required data | Order date must exist |
| UNIQUE | Duplicate values | Email address must be unique |
| PRIMARY KEY | Missing or duplicate row identity | CustomerID |
| FOREIGN KEY | Orphan records | Order must reference valid customer |
| CHECK | Invalid domain values | Quantity greater than 0 |
| DEFAULT | Missing routine value | CreatedDate defaults to current timestamp |
| EXCLUDE / specialized constraint | Overlapping or conflicting ranges | Room bookings cannot overlap, if supported |
Normalization decision table
| Form | Main rule | Fixes | Practical exam cue |
|---|
| 1NF | Atomic values; no repeating groups | Multi-value columns | Split comma-separated phone numbers into child table |
| 2NF | 1NF plus no partial dependency on composite key | Redundant data in composite-key tables | Non-key attribute depends on only part of key |
| 3NF | 2NF plus no transitive dependency | Non-key data depending on other non-key data | Move ZIP-to-city mapping to reference table |
| BCNF | Every determinant is a candidate key | Edge cases with overlapping candidate keys | Stricter than 3NF |
| Denormalization | Intentionally duplicate or precompute data | Read/report performance | Requires consistency strategy |
High-yield distinction: normalization reduces update anomalies; denormalization can improve read performance but increases maintenance and consistency risk.
Dimensional and analytics modeling
| Concept | Meaning | Use when |
|---|
| Fact table | Numeric events or measurements | Sales amount, clicks, shipments |
| Dimension table | Descriptive context | Date, customer, product, region |
| Star schema | Fact table directly linked to dimensions | Simpler, faster BI queries |
| Snowflake schema | Dimensions further normalized | Lower redundancy, more joins |
| Grain | Level of detail in fact table | “One row per order line” |
| Slowly changing dimension Type 1 | Overwrite old value | Current-state reporting only |
| Slowly changing dimension Type 2 | Add new versioned row | Historical reporting needed |
| Aggregate table | Pre-summarized data | Improve frequent reports |
| Data mart | Subject-specific analytics store | Department or domain reporting |
SQL compact reference
SQL statement classes
| Class | Purpose | Examples |
|---|
| DDL | Define structures | CREATE, ALTER, DROP, TRUNCATE |
| DML | Manipulate data | SELECT, INSERT, UPDATE, DELETE, MERGE |
| DCL | Control permissions | GRANT, REVOKE |
| TCL | Control transactions | COMMIT, ROLLBACK, SAVEPOINT |
| DQL | Query data | SELECT; often treated as part of DML |
Logical SELECT processing order
| Order | Clause | Purpose |
|---|
| 1 | FROM / JOIN | Identify source rows |
| 2 | WHERE | Filter individual rows |
| 3 | GROUP BY | Form groups |
| 4 | HAVING | Filter groups |
| 5 | SELECT | Return expressions |
| 6 | DISTINCT | Remove duplicates |
| 7 | ORDER BY | Sort result |
| 8 | LIMIT / OFFSET / FETCH | Return subset, syntax varies |
Trap: WHERE filters rows before grouping. HAVING filters groups after aggregation.
SELECT c.customer_id, COUNT(*) AS order_count
FROM customers c
JOIN orders o
ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2026-01-01'
GROUP BY c.customer_id
HAVING COUNT(*) > 5
ORDER BY order_count DESC;
Join behavior
| Join type | Returns | Common use |
|---|
| INNER JOIN | Matching rows only | Required relationship |
| LEFT OUTER JOIN | All left rows plus matching right rows | Find optional related data |
| RIGHT OUTER JOIN | All right rows plus matching left rows | Less common; can rewrite as left join |
| FULL OUTER JOIN | All rows from both sides | Reconciliation, data comparison |
| CROSS JOIN | Cartesian product | Generate combinations; dangerous if accidental |
| SELF JOIN | Table joined to itself | Hierarchies, comparisons |
-- Find customers with no orders
SELECT c.customer_id
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;
NULL behavior
| Expression | Result / note |
|---|
column = NULL | Incorrect; comparisons to NULL are unknown |
column IS NULL | Correct NULL test |
column IS NOT NULL | Correct non-NULL test |
COUNT(*) | Counts rows |
COUNT(column) | Counts non-NULL values |
COALESCE(a, b) | Returns first non-NULL value |
| NULL in arithmetic | Usually produces NULL |
Aggregation and window functions
| Technique | Purpose | Example use |
|---|
| GROUP BY | Collapse rows into groups | Sales by region |
| HAVING | Filter aggregate groups | Regions with sales above threshold |
| Window function | Calculate across related rows without collapsing | Rank orders by customer |
| PARTITION BY | Defines window groups | Per customer, per department |
| ORDER BY inside window | Defines calculation order | Running total, row number |
SELECT
customer_id,
order_id,
order_total,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) AS order_rank
FROM orders;
DDL and constraints example
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
order_date DATE NOT NULL,
order_total DECIMAL(12,2) NOT NULL CHECK (order_total >= 0),
status VARCHAR(20) NOT NULL DEFAULT 'NEW',
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
Transaction example
BEGIN;
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 10;
UPDATE accounts
SET balance = balance + 100
WHERE account_id = 20;
COMMIT;
-- Use ROLLBACK instead of COMMIT if validation fails.
Transactions, concurrency, and consistency
ACID reference
| Property | Meaning | DBA relevance |
|---|
| Atomicity | All-or-nothing transaction | Prevent partial transfers or half-written changes |
| Consistency | Database moves between valid states | Constraints and business rules remain valid |
| Isolation | Concurrent transactions do not improperly interfere | Controlled by isolation levels and locks |
| Durability | Committed data survives failure | Logs, storage, checkpoints, replication |
Isolation and anomalies
| Isolation concept | Prevents / allows | Exam cue |
|---|
| Read uncommitted | May allow dirty reads | Fast but unsafe for correctness |
| Read committed | Prevents dirty reads | Common baseline behavior |
| Repeatable read | Prevents non-repeatable reads | Same row reread remains stable |
| Serializable | Strongest isolation; behaves like serial execution | More blocking/overhead possible |
| Snapshot / MVCC | Readers see consistent version | Reduces read/write blocking, may create version storage pressure |
| Anomaly | Meaning |
|---|
| Dirty read | Read data from uncommitted transaction |
| Non-repeatable read | Same row read twice returns different committed values |
| Phantom read | Re-running query returns new/deleted matching rows |
| Lost update | One update overwrites another without detection |
| Deadlock | Transactions wait on each other in a cycle |
Locking and contention
| Symptom | Likely cause | First checks | Possible action |
|---|
| Queries hang or time out | Blocking locks | Active sessions, wait events, lock tables | Commit/rollback long transaction; tune query |
| Deadlock errors | Conflicting access order | Deadlock logs/graphs | Access objects in consistent order; shorten transactions |
| High write latency | Index overhead, log pressure, contention | Write waits, index count, log I/O | Reduce unnecessary indexes, batch carefully |
| Readers block writers | Lock-based isolation | Isolation settings, query duration | Use appropriate isolation/MVCC if supported |
| Version store grows | Long snapshot readers | Long-running queries | End stale sessions; tune reporting workload |
Index selection
| Index type / pattern | Use when | Avoid / watch for |
|---|
| B-tree / balanced tree | Equality and range predicates, sorting | Low-selectivity columns may not help |
| Composite index | Queries filter/sort by multiple columns | Column order matters |
| Covering index | Query can be satisfied from index | Extra storage and write overhead |
| Unique index | Enforce uniqueness and speed lookup | Duplicates will fail |
| Filtered / partial index | Only subset is frequently queried | Predicate must match supported syntax |
| Full-text index | Natural language search | Not same as LIKE '%term%' |
| Hash index | Equality lookup, if supported | Usually not for range queries |
| Bitmap index | Low-cardinality analytics, if supported | Often poor for high-concurrency OLTP |
SARGability checklist
A predicate is more index-friendly when the database can search the index directly.
| Less index-friendly | More index-friendly |
|---|
WHERE YEAR(order_date) = 2026 | WHERE order_date >= DATE '2026-01-01' AND order_date < DATE '2027-01-01' |
WHERE LOWER(email) = 'a@x.com' | Store normalized email or use supported function-based index |
WHERE amount + 10 > 100 | WHERE amount > 90 |
WHERE name LIKE '%son' | Use full-text/search index if suffix search is required |
Query tuning sequence
- Confirm the performance symptom and baseline.
- Review execution plan: scan vs seek, join method, sort, spill, estimated vs actual rows if available.
- Check predicates for SARGability.
- Validate relevant indexes and index column order.
- Check statistics freshness and cardinality estimates.
- Inspect joins, missing filters, accidental cross joins, and implicit conversions.
- Check locks, waits, I/O, CPU, memory, and temp space.
- Test changes in non-production and compare measured results.
| Signal | Indicates | Typical response |
|---|
| Full table scan | No useful index, low selectivity, or optimizer choice | Add/tune index only if beneficial |
| High CPU | Complex calculations, poor plan, too many executions | Tune query, cache results, reduce frequency |
| High disk I/O | Large scans, missing indexes, poor caching | Index, partition, archive, tune queries |
| Memory pressure | Sort/hash spills, insufficient cache | Tune query, review memory config/capacity |
| Temp space growth | Large sorts, hash joins, temp tables | Add indexes, reduce intermediate rows |
| Stale statistics | Bad cardinality estimates | Update statistics/maintenance |
| Parameter sensitivity | Plan good for one value, bad for another | Recompile/plan strategy, query rewrite where supported |
Storage, partitioning, and capacity
Storage concepts
| Concept | DBA meaning | Exam note |
|---|
| Data file | Stores table/index data | Placement affects I/O |
| Log / journal | Records changes for durability and recovery | Critical for point-in-time restore |
| Tablespace / filegroup | Logical storage grouping, vendor term varies | Can separate data, indexes, partitions |
| Temp space | Used for sorts, joins, temp objects | Running out can break queries |
| Checkpoint | Flushes dirty pages to durable storage | Interacts with recovery time |
| Compression | Reduces storage and I/O | May increase CPU |
| Tiering | Move data to different storage classes | Balance performance, cost, retention |
Partitioning patterns
| Pattern | Use when | Benefit | Trap |
|---|
| Range partition | Date/time or ordered key | Fast pruning and archival | Bad partition key creates imbalance |
| List partition | Known categories | Region/status separation | Too many categories can be hard to manage |
| Hash partition | Spread data evenly | Reduces hotspot risk | Less intuitive for pruning |
| Composite partition | Multiple strategies | Large complex workloads | More operational complexity |
Partitioning is not automatically a performance fix. It helps most when queries filter on the partition key, maintenance can operate per partition, or data lifecycle actions need efficient archive/drop.
Deployment and architecture patterns
Database deployment decision table
| Pattern | Best for | Strengths | Risks / controls |
|---|
| Single instance | Small/simple workload, dev/test | Simple administration | Single point of failure unless backed by platform features |
| Primary-replica | Read scaling, reporting, HA support | Offload reads, failover option | Replication lag, consistency expectations |
| Active-active / multi-writer | Global or high-write availability | Continuous service potential | Conflict resolution and complexity |
| Clustered database | High availability within environment | Automated failover, shared governance | Quorum/split-brain design matters |
| Sharding | Very large scale-out workload | Horizontal write/read scale | Cross-shard queries and rebalancing complexity |
| Managed database service | Reduce operational burden | Automated maintenance options, platform integration | Shared responsibility still applies |
| Containerized database | Dev/test portability, some specialized deployments | Repeatable deployment | Persistent storage, backup, and performance need care |
| Serverless database | Variable workload, simplified scaling | Operational simplicity | Cold starts/latency/cost behavior may vary by platform |
Replication choices
| Replication type | Description | Choose when |
|---|
| Synchronous | Commit waits for replica acknowledgment | Very low data-loss tolerance |
| Asynchronous | Primary commits before replica catches up | Distance/performance more important than zero-lag |
| Physical/block-level | Replicates storage/log changes | Disaster recovery or exact copy |
| Logical | Replicates rows/statements/changes | Selective replication, migration, integration |
| Snapshot | Periodic copy | Reporting or initial synchronization |
| CDC | Captures data changes over time | Near-real-time pipelines and migrations |
CAP and distributed data systems
| Concept | Meaning |
|---|
| Consistency | Reads return latest correct data according to model |
| Availability | Requests receive non-error responses |
| Partition tolerance | System continues despite network partitions |
| Eventual consistency | Replicas converge over time |
| Strong consistency | Reads reflect committed writes according to defined rules |
Exam distinction: CAP applies under network partition conditions. It does not mean every system permanently chooses only two characteristics in all situations.
Security quick reference
Security control matrix
| Control | Protects | Examples / notes |
|---|
| Authentication | Who are you? | Passwords, MFA, certificates, federation |
| Authorization | What can you do? | Roles, grants, row-level access |
| Accounting / auditing | What happened? | Login logs, query audit, DDL audit |
| Encryption in transit | Network confidentiality | TLS/secure client connections |
| Encryption at rest | Stored data protection | Database/file/storage encryption |
| Key management | Control encryption keys | Rotation, separation of duties, access policy |
| Masking | Hide sensitive values in displays | Useful for non-prod or limited users |
| Tokenization | Replace sensitive data with token | Reduces exposure of original value |
| Hashing | One-way transformation | Password verification, integrity checks |
| Salting | Random value added before hashing | Defends against precomputed hash attacks |
| Data classification | Label sensitivity | Drives access, retention, and monitoring |
| Secrets management | Protect credentials | Avoid hardcoded passwords |
| Network segmentation | Limit reachable paths | Private subnets, firewalls, allowlists |
| Patching | Remove known vulnerabilities | Test, schedule, rollback plan |
Least privilege SQL example
-- Example pattern; exact syntax varies by platform.
CREATE ROLE reporting_reader;
GRANT SELECT ON sales_summary TO reporting_reader;
GRANT reporting_reader TO analyst_user;
REVOKE INSERT, UPDATE, DELETE ON sales_summary FROM reporting_reader;
Role design
| Pattern | Good practice | Trap |
|---|
| User-specific grants | Use sparingly | Hard to audit and revoke |
| Role-based access control | Grant privileges to roles, users to roles | Overly broad shared roles |
| Separation of duties | Split DBA, security admin, developer duties | One account can alter, approve, and audit itself |
| Break-glass access | Emergency elevated access | Must be logged, time-limited, reviewed |
| Service accounts | Application/database connectivity | Use rotation and scoped permissions |
Sensitive data handling
| Technique | Reversible? | Primary use |
|---|
| Encryption | Yes, with key | Protect stored/transmitted data |
| Hashing | No | Verify password or integrity |
| Masking | Usually display-level | Reduce exposure to users |
| Tokenization | Yes, through token vault/mapping | Replace sensitive values in workflows |
| Redaction | Usually no in output | Remove sensitive text from logs/documents |
| Anonymization | Intended no | Analytics without identifying individuals |
| Pseudonymization | Possible with mapping | Reduce direct identifiability |
Audit and logging focus
| Event type | Why it matters |
|---|
| Failed logins | Brute force or credential misuse |
| Privilege changes | Escalation or misconfiguration |
| DDL changes | Schema drift, unauthorized alteration |
| Access to sensitive tables | Data exposure investigation |
| Bulk export | Potential exfiltration |
| Backup/restore events | Data movement and recovery assurance |
| Configuration changes | Security or availability impact |
Trap: audit logs must be protected from tampering. Logging sensitive values can create a second data exposure location.
Backup, restore, and disaster recovery
RPO, RTO, and availability
| Term | Meaning | Practical implication |
|---|
| RPO | Maximum acceptable data loss | Determines backup/log replication frequency |
| RTO | Maximum acceptable recovery time | Determines restore automation and HA design |
| MTTD | Mean time to detect | Monitoring and alerting quality |
| MTTR | Mean time to repair/recover | Runbooks, automation, staff readiness |
| HA | Reduce downtime during component failure | Clustering, failover, redundancy |
| DR | Recover from site/region/system disaster | Backups, replication, alternate environment |
\[
\text{Availability} = \frac{\text{Total time} - \text{Downtime}}{\text{Total time}} \times 100
\]
Backup type comparison
| Backup type | Captures | Strengths | Watch for |
|---|
| Full | Entire database/data set | Simplest restore base | Largest time/storage |
| Differential | Changes since last full | Faster restore than many incrementals | Grows until next full |
| Incremental | Changes since last backup | Efficient storage | Restore may require chain |
| Transaction log / redo log | Ordered changes | Point-in-time recovery | Log chain must be intact |
| Snapshot | Point-in-time storage image | Fast creation | Must understand consistency and dependency on storage |
| Logical export | Schema/data as statements/files | Migration, selective restore | May be slower; consistency must be managed |
| Physical backup | Data files/logs | Fast full restore | Platform/version compatibility matters |
Restore sequence reference
| Scenario | Typical restore approach |
|---|
| Full backup only | Restore full backup |
| Full + differential | Restore full, then latest differential |
| Full + incrementals | Restore full, then each incremental in order |
| Full + logs | Restore full, then logs to target point |
| Full + differential + logs | Restore full, latest differential, then logs |
| Corrupt object only | Consider object-level restore/export if supported |
| Accidental DELETE | Point-in-time restore to separate environment, extract/replay valid data |
Backup validation checklist
- Confirm backups complete successfully.
- Test restore in a non-production environment.
- Verify application can connect after restore.
- Validate row counts, checksums, or reconciliation totals.
- Document restore steps and owners.
- Protect backups with encryption and access controls.
- Store backups separately from the primary failure domain.
- Monitor backup age, duration, failure, and capacity.
- Review retention and deletion settings against business requirements.
HA/DR decision path
flowchart TD
A[Requirement: protect database service] --> B{Main concern?}
B -->|Short outage from server failure| C[High availability: cluster or failover replica]
B -->|Data loss tolerance is very low| D[Synchronous replication or frequent log shipping]
B -->|Regional/site disaster| E[Disaster recovery environment plus offsite backups]
B -->|Accidental data change| F[Point-in-time restore and audit trail]
C --> G[Test failover runbook]
D --> G
E --> G
F --> H[Test restore and data reconciliation]
Data lifecycle, integration, and governance
ETL, ELT, CDC, and streaming
| Pattern | Description | Choose when |
|---|
| ETL | Transform before loading target | Target requires curated data before storage |
| ELT | Load raw first, transform in target | Scalable analytics platform handles transforms |
| CDC | Capture inserts/updates/deletes from source | Low-latency sync, migrations, audit feeds |
| Batch | Periodic bulk movement | Large scheduled processing |
| Streaming | Continuous event processing | Real-time alerts, telemetry, near-real-time analytics |
| API integration | Application-level data exchange | Controlled business operations and validation |
| Message queue | Decouple producers and consumers | Resilience and asynchronous processing |
Data quality checks
| Check | Detects |
|---|
| Completeness | Missing required values |
| Validity | Values outside allowed domain |
| Uniqueness | Duplicate keys/entities |
| Consistency | Conflicting values across systems |
| Accuracy | Values not matching real-world source |
| Timeliness | Stale or late-arriving data |
| Integrity | Broken relationships or corrupted values |
| Conformity | Wrong format, unit, code set, or standard |
-- Basic reconciliation checks after migration/load
SELECT COUNT(*) AS source_count FROM source_orders;
SELECT COUNT(*) AS target_count FROM target_orders;
SELECT SUM(order_total) AS source_total FROM source_orders;
SELECT SUM(order_total) AS target_total FROM target_orders;
| Area | DBA relevance |
|---|
| Data catalog | Find datasets, owners, schemas, descriptions |
| Lineage | Understand where data came from and downstream impact |
| Classification | Identify sensitive, regulated, or business-critical data |
| Retention | Keep or dispose data according to policy |
| Data ownership | Assign accountability for definitions and quality |
| Stewardship | Day-to-day data quality and definition management |
| Change impact | Know which reports, apps, or pipelines depend on a table |
| Master data management | Create trusted shared entities across systems |
Monitoring and troubleshooting
Baseline metrics
| Category | Metrics / signals |
|---|
| Availability | Uptime, connection success, failover events |
| Workload | Transactions per second, query rate, batch duration |
| Latency | Query response time, commit latency, replication lag |
| Resource | CPU, memory, disk I/O, network throughput |
| Storage | Free space, growth rate, temp usage, log usage |
| Locks/waits | Blocking sessions, deadlocks, wait classes |
| Cache | Buffer/cache hit ratio, plan cache behavior |
| Backup | Success/failure, duration, backup age |
| Security | Failed logins, privilege changes, unusual access |
| Data pipeline | Job failures, late data, rejected records |
Symptom-to-action table
| Symptom | Likely causes | First action |
|---|
| Database unavailable | Service down, network issue, storage failure, auth issue | Check service health, connectivity, logs |
| Sudden slow queries | Bad plan, stale stats, blocking, workload spike | Compare to baseline; review plan and waits |
| Disk full | Data growth, logs not truncating, temp spill, failed cleanup | Identify consuming files; protect data before cleanup |
| Replication lag | Network latency, slow replica, large transaction | Check replica health and apply queue |
| Backup failure | Permission, capacity, I/O, schedule conflict | Review job logs and destination capacity |
| Login failures | Credential changes, lockout, expired secret, attack | Validate auth path and security logs |
| Data mismatch | ETL bug, partial load, constraint disabled, duplicate source | Reconcile counts/totals and review load logs |
| Deadlocks | Conflicting transaction order, missing indexes | Review deadlock details and access pattern |
| Corruption alert | Storage fault, software issue, abrupt failure | Stop risky writes if needed; follow vendor recovery guidance |
Incident response for database issues
- Detect and classify severity.
- Preserve logs, alerts, and current state.
- Stabilize service; avoid destructive “quick fixes.”
- Identify blast radius: users, applications, tables, replicas, backups.
- Communicate status through the agreed channel.
- Apply tested recovery or remediation steps.
- Validate service and data correctness.
- Document root cause, timeline, and preventive actions.
Maintenance, change, and release management
Routine DBA operations
| Operation | Purpose | Key caution |
|---|
| Patch database engine | Security and stability | Test compatibility and rollback |
| Update statistics | Better optimizer estimates | Schedule to avoid peak impact |
| Rebuild/reorganize indexes | Address fragmentation where relevant | Do not perform blindly; measure benefit |
| Archive/purge data | Control growth and performance | Validate retention and dependencies |
| Rotate credentials/keys | Reduce credential exposure | Coordinate application updates |
| Review permissions | Enforce least privilege | Remove stale users and excessive grants |
| Test restores | Confirm recoverability | Test regularly, not only after incidents |
| Capacity review | Avoid outages and performance issues | Trend storage, CPU, memory, I/O |
Schema change checklist
- Use version-controlled migration scripts.
- Test in development and staging with realistic data volume.
- Identify dependent applications, views, reports, jobs, and permissions.
- Back up or snapshot before high-risk changes.
- Plan locking and downtime impact.
- Use backward-compatible changes where possible.
- Include rollback or roll-forward strategy.
- Validate data after migration.
- Document change, owner, approval, and implementation time.
Safe migration sequence
| Phase | Activities |
|---|
| Assess | Inventory objects, dependencies, data volume, compatibility |
| Plan | Choose migration method, window, rollback, validation |
| Prepare | Provision target, security, connectivity, schema |
| Load | Initial copy or replication setup |
| Validate | Counts, checksums, sample queries, application tests |
| Cut over | Freeze or sync final changes, switch traffic |
| Monitor | Watch errors, latency, locks, data drift |
| Decommission | Retire old system only after acceptance and backup |
Cloud and managed database responsibilities
| Responsibility area | Customer usually still manages | Provider/platform may manage |
|---|
| Data model | Schema, indexes, constraints | Not usually automatic |
| Access | Users, roles, secrets, app permissions | IAM integration features |
| Data protection | Classification, encryption choices, backup policy validation | Infrastructure encryption options, backup tooling |
| Patching | Application compatibility testing, scheduling decisions | Engine or host patch automation depending on service |
| Availability | Architecture choice, failover testing | Platform redundancy mechanisms |
| Monitoring | Alerts, query performance, business KPIs | Built-in metrics/log delivery |
| Compliance support | Policies, evidence, data handling | Platform controls and reports |
| Cost/capacity | Workload sizing, scaling choices | Metering and scaling mechanisms |
Trap: managed database does not mean unmanaged data. The organization still owns data quality, access design, query behavior, and recovery requirements.
Common exam distinctions
| Distinction | Remember |
|---|
| Backup vs replication | Backup protects against deletion/corruption history; replication can copy bad changes quickly |
| HA vs DR | HA handles local/component failure; DR handles broader disaster recovery |
| Authentication vs authorization | Authentication proves identity; authorization grants actions |
| Encryption vs hashing | Encryption is reversible with key; hashing is one-way |
| Masking vs encryption | Masking changes what users see; encryption protects stored/transmitted data |
| OLTP vs OLAP | OLTP optimizes transactions; OLAP optimizes analysis |
| Primary key vs unique key | Both enforce uniqueness; primary key is main row identifier and non-null |
| Foreign key vs join | FK enforces relationship; join retrieves related data |
| WHERE vs HAVING | WHERE filters rows; HAVING filters groups |
| DELETE vs TRUNCATE | DELETE is row operation and can filter; TRUNCATE removes all rows more directly, behavior varies by platform |
| Logical vs physical backup | Logical exports objects/data; physical backs up files/pages/logs |
| Incremental vs differential | Incremental since last backup; differential since last full |
| Scale up vs scale out | Scale up adds resources to one node; scale out adds nodes/partitions |
| Normalization vs denormalization | Normalize for integrity; denormalize intentionally for read performance |
| Data lake vs data warehouse | Lake stores flexible raw/curated files; warehouse stores structured optimized analytics |
| CDC vs full reload | CDC captures changes; full reload replaces or reloads entire set |
Compact command/query patterns to recognize
-- Add an index for a frequent lookup pattern
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);
-- Enforce valid status values
ALTER TABLE orders
ADD CONSTRAINT chk_orders_status
CHECK (status IN ('NEW', 'PAID', 'SHIPPED', 'CANCELLED'));
-- Identify duplicate business keys
SELECT email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
-- Find orphaned child rows
SELECT o.order_id
FROM orders o
LEFT JOIN customers c
ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL;
Final DS0-002 review checklist
- Can you select relational, NoSQL, warehouse, lake, or streaming architecture based on workload?
- Can you explain primary keys, foreign keys, constraints, joins, NULLs, and transaction isolation?
- Can you read SQL and predict result-set behavior?
- Can you choose an index strategy and a query-tuning sequence?
- Can you distinguish authentication, authorization, auditing, encryption, hashing, masking, and tokenization?
- Can you map RPO/RTO requirements to backup, restore, replication, HA, and DR designs?
- Can you troubleshoot slow queries, blocking, failed backups, replication lag, and storage growth?
- Can you describe safe schema changes, migrations, validation, and rollback planning?
- Can you connect data governance, classification, lineage, and quality checks to DBA operations?
Practical next step
Use this Quick Reference as a checklist, then move into timed DS0-002 practice questions and hands-on SQL/database administration scenarios. Focus review on any item where you cannot explain both the correct choice and why the tempting alternatives are wrong.