DAMA CDMP Data Quality Specialist Quick Reference

Last revised: June 25, 2026

Compact independent reference for DAMA International CDMP Quality exam prep: data quality dimensions, rules, profiling, remediation, governance, and metrics.

Exam Identity and Study Focus

Item	Reference
Vendor/provider	DAMA International
Official exam title	DAMA CDMP Data Quality Specialist
Official exam code	CDMP Quality
Page purpose	Independent Quick Reference for candidates reviewing data quality concepts, processes, roles, controls, and practical decision points

Data quality management is not just defect cleanup. For exam purposes, treat it as a governed management discipline that defines quality expectations, measures conformance, analyzes causes, remediates issues, and prevents recurrence.

High-yield framing:

Data quality = fitness for use by a defined business purpose, not abstract perfection.
Quality rules must trace to business rules, critical data, regulatory/reporting needs, operational risks, or customer outcomes.
Prevention is usually better than detection, but mature programs use both.
Root-cause remediation is stronger than downstream cleansing when the source or process can be changed.
Quality is contextual: the same data can be acceptable for trend analysis but unacceptable for billing, identity proofing, or regulatory reporting.

Core Data Quality Management Lifecycle

Stage	What Happens	Exam-Relevant Outputs	Common Trap
Define expectations	Identify business needs, data consumers, critical data elements, quality dimensions, tolerances	Data quality requirements, business rules, acceptance criteria	Starting with tool scans before defining what “good” means
Profile and assess	Examine actual data values, patterns, relationships, duplicates, anomalies	Baseline quality report, defect categories, issue inventory	Treating profiling results as business rules without validation
Define rules and metrics	Convert requirements into measurable checks and thresholds	Data quality rules, scorecards, KPIs/KRIs, exception criteria	Measuring what is easy instead of what matters
Analyze root causes	Determine why defects occur	Root-cause findings, impact analysis, remediation options	Fixing symptoms in reports while source processes remain broken
Remediate	Correct data, process, application, integration, or governance gaps	Cleansed records, process changes, transformation fixes, steward actions	Assuming all remediation means overwriting data
Monitor and control	Continuously measure and escalate exceptions	Dashboards, alerts, SLA/OLA measures, issue workflow	One-time cleanup with no ongoing control
Improve	Refine standards, rules, ownership, training, and architecture	Prevention controls, updated policies, lessons learned	No feedback loop into governance or systems development

Data Quality Dimensions

Use dimensions as lenses for requirements and measurement. A good exam answer usually ties the dimension to a business outcome, testable rule, and acceptable threshold.

Dimension	Meaning	Example Check	Watch For
Accuracy	Data correctly represents the real-world object or event	Customer date of birth matches authoritative source	Accuracy often requires comparison to a trusted source, not just internal format validation
Completeness	Required data is present to the needed level	Mandatory tax identifier is populated for reportable customers	“Complete enough” depends on purpose; optional fields are not automatically defects
Validity	Data conforms to allowed format, type, range, or domain	Order status is one of approved status codes	Valid data can still be inaccurate
Consistency	Data values agree across systems, records, or business rules	Customer status in CRM matches billing eligibility	Consistency does not prove correctness if all systems copied the same wrong value
Timeliness	Data is available within the required time window	Inventory position refreshed before order promising	Timeliness covers latency and availability at point of use; currency is a related but separately assessed dimension
Currency	Data reflects the most recent accepted state	Address updated after verified change of residence	Current data is not always the same as historically correct data
Uniqueness	Real-world entity or event is represented once where required	No duplicate active customer master records	Duplicates may be legitimate in transaction data but not in master data
Integrity	Relationships and dependencies are preserved	Invoice has a valid customer ID and valid order reference	Includes referential integrity and cross-field logic
Conformity	Data follows required standards and representations	Phone numbers stored in standard international format	Standardization supports matching, integration, and reporting
Precision	Level of detail is appropriate	Coordinates captured to required decimal precision	Excess precision can imply false confidence; insufficient precision may break use cases
Reasonableness	Value is plausible in business context	Employee age is within realistic employment range	Reasonableness checks detect anomalies but may require human review
Accessibility	Data can be obtained by authorized users/processes when needed	Analysts can access approved data product	Do not confuse accessibility with lack of security controls

Business Rules vs Data Quality Rules

Concept	Definition	Example	Exam Distinction
Business rule	Policy or constraint about how the business operates	A policy must have one active policyholder	Expressed in business language; may exist without implementation
Data rule	Implemented rule about acceptable data representation	`policyholder_id` must not be null for active policies	Converts business expectation into measurable data condition
Data quality rule	Test used to assess data against a dimension and threshold	Active policies with null policyholder ID must be below approved tolerance	Includes metric, scope, owner, severity, and action
Validation rule	Control that prevents or flags bad input	UI rejects invalid product code	Usually preventive and embedded in system or workflow
Transformation rule	Logic used to derive or move data	Map legacy customer type `R` to retail	Can create quality issues if undocumented or inconsistent
Reconciliation rule	Check that data agrees across processes or systems	Sum of source transactions equals ledger load total	Often used in ETL, finance, and regulatory reporting

Anatomy of a Strong Data Quality Rule

Component	What to Specify	Example
Business purpose	Why the rule matters	Required for regulatory customer identification
Data scope	Systems, tables, entities, records, period	Active customers in onboarding platform
Data element or relationship	Field, composite field, reference, hierarchy	`customer_id`, `country_code`, parent account
Dimension	Quality aspect being tested	Completeness, validity, uniqueness
Rule logic	Exact condition to evaluate	`country_code` must exist in approved reference list
Threshold/tolerance	Acceptable level or boundary	Zero tolerance for blocked onboarding; limited tolerance for legacy archive
Severity	Business risk level	Critical, high, medium, low
Owner/steward	Accountable party	Customer data owner, data steward, system owner
Exception handling	Review, correction, waiver, escalation	Send exceptions to steward queue within agreed workflow
Measurement frequency	Batch, real time, daily, monthly, event-driven	Daily load check; real-time transaction validation
Evidence	Report, log, control result, audit trail	Scorecard and issue record
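
The components above can be captured as a structured rule definition, so that each check carries its own purpose, threshold, severity, and owner. The following is a minimal Python sketch under stated assumptions, not a DAMA-prescribed format: the `DataQualityRule` class, the `evaluate` function, the field names, the approved country list, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical reference list; in practice this would be governed reference data.
APPROVED_COUNTRY_CODES = {"US", "DE", "JP"}


@dataclass(frozen=True)
class DataQualityRule:
    name: str
    business_purpose: str
    scope: str
    dimension: str
    check: Callable[[dict], bool]  # rule logic: returns True when a record passes
    max_defect_rate_pct: float     # threshold/tolerance
    severity: str
    owner: str


def evaluate(rule: DataQualityRule, records: list) -> dict:
    """Apply one rule to a set of records and summarize the result as evidence."""
    failures = [r for r in records if not rule.check(r)]
    defect_rate = 100 * len(failures) / len(records) if records else 0.0
    return {
        "rule": rule.name,
        "evaluated": len(records),
        "failed": len(failures),
        "defect_rate_pct": defect_rate,
        "breached": defect_rate > rule.max_defect_rate_pct,
        "exceptions": failures,  # candidates for the steward queue
    }


country_rule = DataQualityRule(
    name="country_code in approved reference list",
    business_purpose="Required for regulatory customer identification",
    scope="Active customers in onboarding platform",
    dimension="Validity",
    check=lambda r: r.get("country_code") in APPROVED_COUNTRY_CODES,
    max_defect_rate_pct=0.0,  # zero tolerance in this illustration
    severity="Critical",
    owner="Customer data owner",
)

# Illustrative records only.
records = [
    {"customer_id": 1, "country_code": "US"},
    {"customer_id": 2, "country_code": "DE"},
    {"customer_id": 3, "country_code": "XX"},
    {"customer_id": 4, "country_code": "JP"},
]
result = evaluate(country_rule, records)
```

Keeping the threshold and owner on the rule itself means a breach can be routed to an accountable party without a separate lookup.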

Profiling and Assessment Techniques

Technique	Purpose	Typical Findings	Best Used When
Column profiling	Examine nulls, min/max, patterns, lengths, data types	Unexpected null rates, invalid lengths, outliers	First-pass understanding of unfamiliar data
Domain/value frequency	Count distinct values and distributions	Invalid codes, rare values, skewed values	Validity and reference data checks
Pattern analysis	Identify structural patterns	Mixed date formats, inconsistent identifiers	Standardization and parsing work
Cross-field analysis	Compare related fields in same record	End date before start date	Integrity and reasonableness checks
Cross-system comparison	Compare values between systems	CRM and billing customer address mismatch	Consistency assessment
Referential integrity check	Verify valid parent/child relationships	Orphan invoice without valid customer	Relational and integration quality
Duplicate detection	Find likely duplicate entities or events	Same person under multiple customer IDs	Master data and identity resolution
Time-series monitoring	Track metrics over time	Sudden spike in missing values after release	Operational monitoring and regression detection
Reconciliation	Compare totals/counts across processing steps	Source count differs from warehouse load count	ETL, financial, regulatory, and audit-sensitive flows
Sampling and review	Human review of selected records	False positives, ambiguous cases	Accuracy checks where no fully automated source exists
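
Column profiling and pattern analysis can be illustrated with a few lines of Python. This is a rough sketch, not a substitute for a profiling tool: the `profile_column` function and the sample values are hypothetical, and blank strings are treated as missing by assumption.

```python
import re
from collections import Counter


def profile_column(values: list) -> dict:
    """Summarize missing values, distinct values, lengths, and structural patterns."""
    # Assumption: None and blank strings both count as missing.
    present = [str(v) for v in values if v is not None and str(v).strip() != ""]
    # Replace every digit with 9 and every ASCII letter with A to expose structure.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in present
    )
    missing = len(values) - len(present)
    return {
        "count": len(values),
        "missing": missing,
        "missing_pct": 100 * missing / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "min_length": min((len(v) for v in present), default=0),
        "max_length": max((len(v) for v in present), default=0),
        "patterns": patterns.most_common(),
    }


# Illustrative values: two date formats, one None, one blank string.
sample = ["2024-01-15", "15/01/2024", None, "2024-02-01", ""]
profile = profile_column(sample)
```

Here the pattern counts reveal mixed date formats. Whether the minority format is a defect is still a business decision, which is why profiling results need validation before they become rules.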

Key Metrics and Formulas

Use metrics to make quality visible, comparable, and actionable. Avoid presenting a single score without showing what it measures.

\[ \text{Completeness \%} = \frac{\text{Required values populated}}{\text{Required values expected}} \times 100 \]

\[ \text{Defect rate} = \frac{\text{Records failing rule}}{\text{Records evaluated}} \times 100 \]

\[ \text{Validity \%} = \frac{\text{Values conforming to rule}}{\text{Values tested}} \times 100 \]

\[ \text{Weighted data quality score} = \sum_{i=1}^{n}(\text{Dimension score}_i \times \text{Weight}_i) \]
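
A short worked example may help fix the arithmetic. The sketch below applies the formulas above to made-up counts. The function names, the figures, and the 0.6/0.4 weights are hypothetical, and the weights are assumed to sum to 1 so that the weighted score stays on the same 0 to 100 scale.

```python
def percentage(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    if denominator == 0:
        raise ValueError("no values were evaluated, so the rate is undefined")
    return 100 * numerator / denominator


def weighted_score(scores: dict, weights: dict) -> float:
    """Sum of dimension score x weight; weights are assumed to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weights[name] for name in weights)


# Hypothetical figures for illustration only.
completeness = percentage(940, 1000)  # required values populated / expected
validity = percentage(985, 1000)      # values conforming to rule / values tested
defect_rate = percentage(15, 1000)    # records failing rule / records evaluated

overall = weighted_score(
    {"completeness": completeness, "validity": validity},
    {"completeness": 0.6, "validity": 0.4},
)
```

With these inputs, completeness is 94%, validity is 98.5%, the defect rate is 1.5%, and the weighted score is about 95.8. The single weighted number hides the fact that completeness is the weaker dimension.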

Metric	What It Shows	Good Use	Caution
Rule pass rate	Share of records passing a specific check	Operational control monitoring	High pass rate can hide severe defects in critical records
Defect count	Number of failing records	Work queue sizing	Counts alone ignore population size
Defect rate	Defects relative to tested population	Comparing systems or periods	Requires stable denominator and rule definition
Completeness rate	Presence of required values	Mandatory attribute checks	Null is not the only form of missing data
Duplicate rate	Likely duplicate records per population	Master data improvement	Match logic affects results significantly
Timeliness lag	Delay between event and data availability	Data pipeline and reporting SLAs	Some latency may be acceptable by use case
Reconciliation variance	Difference between source and target totals	ETL and financial controls	Must account for legitimate filters and transformations
Issue aging	Time unresolved quality issues remain open	Stewardship and remediation performance	Aging without severity can mislead
Recurrence rate	Reappearance of previously fixed issue	Root-cause effectiveness	Requires issue classification discipline

Critical Data Elements and Prioritization

Not all data deserves the same level of control. Prioritize quality work by business impact.

Priority Factor	Questions to Ask	Higher Priority When
Business criticality	Does the data drive revenue, operations, customer service, reporting, risk, or compliance?	It affects key decisions, obligations, or customer outcomes
Usage frequency	How often and by whom is it used?	Many processes or high-value consumers depend on it
Risk exposure	What happens if it is wrong, late, missing, or duplicated?	Financial loss, regulatory exposure, safety risk, fraud, reputational impact
Propagation	How many downstream systems consume it?	Defects spread broadly through integration and analytics
Correction cost	How hard is it to fix after capture?	Late correction is expensive or impossible
Authoritativeness	Is there a trusted source of truth?	Multiple conflicting sources exist
Change volatility	How often does it change?	High volatility requires stronger monitoring
Data lifecycle stage	Is it created, transformed, archived, or reported?	Quality needs differ across lifecycle stages

Remediation Decision Table

Situation	Prefer This Response	Why
Bad data originates at manual entry	Add input validation, training, workflow controls, or required fields	Prevents recurrence at capture
Source system allows invalid combinations	Update application rules or reference controls	Stronger than downstream correction
Integration mapping is wrong	Fix transformation logic and reload if appropriate	Corrects systemic propagation
Data is valid but inconsistent across systems	Define authoritative source, synchronization rules, and stewardship workflow	Resolves ownership and lineage conflict
Duplicate master records exist	Standardize, match, merge/link, apply survivorship, prevent future duplicates	Treats entity resolution as process and governance issue
Legacy data has known defects but low operational value	Document limitations, isolate, apply risk-based cleanup	Avoids wasteful perfectionism
Data must be corrected but source cannot change immediately	Apply controlled remediation with audit trail and exception process	Balances business need with traceability
Defect is caused by unclear business definition	Clarify glossary, policy, ownership, and rule semantics	Prevents teams from measuring different things
External data is poor	Validate provider quality, contract expectations, monitoring, alternative sources	Quality responsibility must be managed even if data is acquired
False positives overwhelm stewards	Tune rules, thresholds, matching weights, and severity logic	Improves trust and operational usability

Prevention, Detection, and Correction Controls

Control Type	Examples	Strength	Limitation
Preventive	Required fields, domain validation, referential constraints, workflow approvals, controlled reference data	Stops defects before creation	Can slow processes or reject unusual valid cases
Detective	Profiling, monitoring dashboards, reconciliation, anomaly detection, audit reports	Finds defects after creation	Requires remediation workflow
Corrective	Cleansing, standardization, deduplication, enrichment, manual correction	Improves existing data	Can mask source problems if used alone
Compensating	Downstream reasonableness checks, exception reporting, disclosure of limitations	Reduces risk when primary control is unavailable	Should not become permanent substitute for root-cause fix
Governance control	Ownership, standards, issue escalation, policy, stewardship	Creates accountability	Ineffective without measurement and enforcement
Technical control	Constraints, validation services, metadata-driven checks, pipeline tests	Automates repeatability	Needs business-approved rules

Root-Cause Analysis Reference

Root-Cause Category	Symptoms	Example Corrective Action
Process design	Missing steps, unclear handoffs, rekeying	Redesign workflow, remove duplicate capture, assign approval point
People/training	Inconsistent entry, misunderstanding definitions	Training, job aids, clearer business glossary
Application design	No validation, poorly designed screens, optional critical fields	UI/API validation, required fields, controlled values
Integration/transformation	Mapping errors, truncation, code conversion defects	Correct mappings, lineage review, pipeline tests
Metadata/definition	Teams use different meanings for same field	Business glossary, semantic standards, data contracts
Reference data	Outdated or inconsistent code sets	Reference data governance, controlled updates
Master data	Duplicate entities, conflicting golden records	MDM process, matching rules, survivorship policy
Policy/governance	No owner, no escalation, unclear accountability	Data ownership model, stewardship process
External provider	Late, incomplete, or inconsistent third-party feeds	Provider quality monitoring, acceptance criteria
Architecture	Multiple uncontrolled copies, batch latency, no lineage	Authoritative sources, integration standards, metadata management

High-yield distinction: root cause is why the defect is produced; impact is what the defect causes; symptom is what the measurement detected.

Matching, Deduplication, and Survivorship

Term	Meaning	Exam Tip
Parsing	Breaking a value into components	Needed before standardizing names, addresses, identifiers
Standardization	Converting values to common formats	Improves matching and conformity
Normalization	Reducing representational variation	Example: casing, punctuation, abbreviations
Exact match	Records match only when values are identical	High precision, low tolerance for variation
Deterministic match	Rule-based matching using defined conditions	Transparent and explainable
Probabilistic match	Uses likelihood/weights across attributes	Handles variation but requires tuning and review
Fuzzy match	Finds similar but not identical values	Useful for names/addresses; can create false positives
Blocking	Reduces match comparisons by grouping candidates	Improves performance but can miss cross-block matches
Survivorship	Rules for choosing retained values after merge	Must align with trust, recency, source priority, or business policy
Golden record	Consolidated trusted representation of an entity	Requires governance, not only tooling
Link vs merge	Link keeps records separate but associated; merge consolidates	Merge only when identity confidence is high; link when it is uncertain
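
Several of the terms above can be seen together in a small Python sketch: normalization, blocking on a key, fuzzy comparison within each block, and a recency-based survivorship rule. Everything here is an illustrative assumption: the record layout, the postcode blocking key, the 0.85 similarity threshold, and the function names. The comparison uses a plain string-similarity ratio from the standard library's `difflib`, which is fuzzy matching with a fixed cutoff, not a probabilistic match model.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Reduce representational variation: casing, punctuation, extra spaces."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def candidate_pairs(records: list, threshold: float = 0.85) -> list:
    """Compare records only within the same block and keep similar pairs."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["postcode"]].append(rec)  # blocking key (assumption)
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(
                None, normalize(a["name"]), normalize(b["name"])
            ).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


def survive(records: list, attributes: list) -> dict:
    """Recency-based survivorship: the newest non-empty value wins per attribute."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    return {
        attr: next((r[attr] for r in newest_first if r.get(attr)), None)
        for attr in attributes
    }


# Illustrative records only; dates are ISO strings so they sort correctly.
records = [
    {"id": 1, "name": "Jon Smith", "postcode": "10001", "phone": "", "updated": "2023-05-01"},
    {"id": 2, "name": "John Smith.", "postcode": "10001", "phone": "555-0100", "updated": "2022-11-20"},
    {"id": 3, "name": "Jane Doe", "postcode": "10001", "phone": "555-0199", "updated": "2024-01-10"},
    {"id": 4, "name": "John Smith", "postcode": "90210", "phone": "555-0100", "updated": "2024-02-02"},
]
pairs = candidate_pairs(records)
golden = survive([records[0], records[1]], ["name", "phone"])
```

Records 1 and 2 are flagged as a candidate pair, while record 4 is never compared with them because it sits in a different block. That is the cross-block miss noted in the table. The survivorship result also shows why the rule needs business approval: recency keeps the name "Jon Smith", which may or may not be the correct spelling.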

Data Cleansing and Enrichment

Technique	Purpose	Good Candidate	Risk
Standardization	Make formats consistent	Addresses, phone numbers, product codes	May alter meaning if standards are wrong
Correction	Replace wrong values with known correct values	Verified spelling, code correction	Requires trusted basis and auditability
Imputation	Fill missing values using inference	Analytical datasets with known assumptions	Can introduce bias; should be flagged
Enrichment	Add data from internal/external source	Geocoding, industry codes, demographics	External source quality and rights must be managed
Deduplication	Remove or consolidate redundant records	Customer, supplier, product masters	Incorrect merges are costly
Reference validation	Compare to controlled list	Country, currency, product category	Reference list must be governed
Exception handling	Route unresolved defects for review	Ambiguous duplicates, unusual transactions	Backlogs reduce effectiveness

Exam trap: cleansing is not automatically improvement if it changes data without lineage, approval, audit trail, or business justification.
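
One way to picture controlled cleansing is a correction step that records what changed, under which rule, and when, and that routes unresolved values to review instead of guessing. The Python sketch below is illustrative only: the mapping table, the rule identifier `CTRY-STD-01`, the record layout, and the function name are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical approved mapping; a real one would come from governed reference data.
COUNTRY_STANDARD = {
    "united states": "US",
    "u.s.a.": "US",
    "usa": "US",
    "deutschland": "DE",
    "germany": "DE",
}
APPROVED_CODES = set(COUNTRY_STANDARD.values())


def standardize_country(record, audit_log, exceptions, rule_id="CTRY-STD-01"):
    """Return a corrected copy of the record; the input is never changed in place."""
    original = record["country"]
    if original in APPROVED_CODES:
        return dict(record)  # already conforms, nothing to log
    replacement = COUNTRY_STANDARD.get(original.strip().lower())
    if replacement is None:
        exceptions.append(record["id"])  # route to steward review instead of guessing
        return dict(record)
    audit_log.append({
        "record_id": record["id"],
        "field": "country",
        "old": original,
        "new": replacement,
        "rule_id": rule_id,
        "changed_at": datetime.now(timezone.utc).isoformat(),
    })
    return {**record, "country": replacement}


audit_log, exceptions = [], []
source = [
    {"id": 1, "country": "U.S.A."},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": "Atlantis"},
]
cleaned = [standardize_country(r, audit_log, exceptions) for r in source]
```

After this run, one change is logged with its old and new value, one record passes through untouched, and one unresolved value lands in the exception list for a steward.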

Metadata, Lineage, and Data Quality

Metadata Type	Data Quality Use
Business metadata	Defines meaning, ownership, criticality, approved business terms
Technical metadata	Identifies schemas, fields, data types, constraints, transformations
Operational metadata	Captures job runs, load times, failures, volumes, latency
Lineage metadata	Shows where data came from, how it changed, and where it goes
Quality metadata	Stores rules, scores, defects, thresholds, exceptions, issue status
Reference metadata	Describes allowed code sets and valid value domains

Why it matters:

Lineage supports impact analysis and root-cause tracing.
Business definitions reduce inconsistent interpretation.
Technical metadata helps automate profiling and controls.
Operational metadata helps detect pipeline and timeliness issues.
Quality metadata provides evidence of monitoring and improvement.

Data Quality and Governance Roles

Role	Primary Responsibilities	Not the Same As
Data owner	Accountable for data within a business domain; approves rules, priorities, and risk decisions	Usually not the person doing every correction
Data steward	Manages definitions, rules, issues, quality monitoring, and coordination	Not merely an IT support role
Data custodian	Operates technical environment, storage, access, backups, platforms	Does not define business meaning alone
Data consumer	Uses data and identifies fitness-for-use needs	Not passive; should report quality issues
Data producer	Creates or captures data	Must understand downstream quality impacts
Data quality analyst	Profiles data, defines measurements, analyzes defects, supports remediation	Does not own all business decisions
Data governance council	Resolves cross-domain standards, priorities, and escalations	Should not become a bottleneck for every minor issue
System owner/product owner	Ensures application/process changes support data quality requirements	Needs alignment with data ownership
Data architect	Designs structures, integration, lineage, and standards support	Architecture alone cannot create quality without process controls

Data Quality Issue Management

Step	Key Questions	Output
Log issue	What rule failed? Where? How many records? Who detected it?	Issue record with evidence
Classify	Which domain, dimension, severity, source, and impact?	Prioritized category
Assign owner	Who can decide and who can fix?	Accountable owner and responsible resolver
Analyze	What is root cause? Is it isolated or systemic?	Root-cause assessment
Decide treatment	Correct, accept, defer, monitor, redesign, or escalate?	Remediation plan
Implement	What data/process/system change is required?	Controlled fix
Validate	Did the fix resolve the defect without side effects?	Test and quality result
Close or monitor	Has recurrence risk been addressed?	Closure evidence and monitoring rule

Common severity criteria:

Critical reporting or legal exposure
Financial statement or billing impact
Customer harm or operational stoppage
Security, privacy, or access-control implications
Number and importance of affected records
Time sensitivity and downstream propagation
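
Severity and aging are more useful together than apart. As a hedged illustration, the Python sketch below orders open issues by severity first and age second. The `Issue` fields, the severity ranking, the status values, and the sample issues are assumptions made for this example, not a prescribed workflow.

```python
from dataclasses import dataclass
from datetime import date

# Assumed ranking: a lower number is handled first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Issue:
    issue_id: str
    rule: str
    severity: str
    owner: str
    opened: date
    status: str = "logged"


def age_in_days(issue: Issue, today: date) -> int:
    """Issue aging: days the issue has been open as of the given date."""
    return (today - issue.opened).days


def work_queue(issues: list, today: date) -> list:
    """Open issues ordered by severity, then oldest first within a severity."""
    open_issues = [i for i in issues if i.status != "closed"]
    return sorted(
        open_issues,
        key=lambda i: (SEVERITY_RANK[i.severity], -age_in_days(i, today)),
    )


# Illustrative issues and dates only.
today = date(2026, 3, 10)
issues = [
    Issue("DQ-1", "null tax identifier", "medium", "Customer data owner", date(2026, 1, 5)),
    Issue("DQ-2", "orphan invoices", "critical", "Finance data owner", date(2026, 3, 1)),
    Issue("DQ-3", "duplicate customers", "critical", "Customer data owner", date(2026, 2, 1)),
    Issue("DQ-4", "invalid status code", "low", "Order data owner", date(2025, 12, 1), "closed"),
]
queue_ids = [i.issue_id for i in work_queue(issues, today)]
```

The oldest open issue (DQ-1) is not first in the queue, because two critical issues outrank it. Sorting by age alone would have put it at the top.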

Data Quality in Data Warehousing, BI, and Analytics

Area	Quality Concern	Practical Control
Source extraction	Missing records, late files, changed schemas	Source counts, schema checks, arrival monitoring
Staging	Type conversion, truncation, invalid encodings	Profiling and reject/error tables
Transformation	Incorrect mapping, business logic drift	Mapping review, test cases, lineage documentation
Loading	Duplicate loads, partial loads, referential failures	Reconciliation and restart controls
Reporting	Misleading metrics, inconsistent definitions	Certified metrics and semantic layer governance
Analytics/AI	Biased, stale, incomplete, mislabeled training data	Data suitability checks, drift monitoring, documentation
Historical data	Slowly changing meaning, late arriving facts	Effective dating, versioned reference data
Self-service BI	Uncontrolled copies and inconsistent calculations	Governed data products, catalogs, quality indicators

High-yield distinction: analytics data can be technically valid but analytically unsuitable because of bias, missing populations, stale features, or unclear definitions.

Data Quality in Master and Reference Data

Data Type	What It Covers	Quality Focus and Typical Controls
Master data	Core entities such as customer, product, supplier, employee	Identity resolution, uniqueness, survivorship, stewardship
Reference data	Controlled values such as codes, statuses, country lists	Change governance, valid value lists, versioning, synchronization
Transaction data	Business events such as orders, payments, claims	Completeness, timeliness, reconciliation, auditability
Metadata	Definitions and descriptions of data	Glossary governance, lineage, ownership
Analytical data	Aggregated, derived, modeled, or feature-engineered data	Definition consistency, reproducibility, lineage, suitability

Exam trap: master data quality often requires organizational agreement on identity and ownership, not only duplicate detection.

Practical SQL Patterns for Data Quality Checks

Use SQL-like checks to understand measurement logic. Syntax varies by platform.

Null or Missing Required Values

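-- Count all rows and the rows missing the required customer_id.
-- NULL is only one form of missing data; blanks and placeholder values need their own checks.
SELECT
  COUNT(*) AS total_rows,
  SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS missing_customer_id
FROM orders;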

Invalid Domain Values

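-- List values outside the approved domain, with counts.
-- Rows with a NULL order_status are not returned by NOT IN; cover them with a completeness check.
SELECT order_status, COUNT(*) AS row_count
FROM orders
WHERE order_status NOT IN ('NEW', 'APPROVED', 'SHIPPED', 'CANCELLED')
GROUP BY order_status;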

Duplicate Candidate Keys

SELECT email_address, COUNT(*) AS record_count
FROM customer
WHERE email_address IS NOT NULL
GROUP BY email_address
HAVING COUNT(*) > 1;

Referential Integrity Exceptions

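-- Orders whose customer_id has no matching customer row.
-- Orders with a NULL customer_id are returned as well, because they match nothing.
SELECT o.order_id, o.customer_id
FROM orders o
LEFT JOIN customer c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;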

Cross-Field Logic

SELECT contract_id, start_date, end_date
FROM contract
WHERE end_date < start_date;

Source-to-Target Reconciliation

SELECT 'source' AS system_name, COUNT(*) AS row_count FROM source_orders
UNION ALL
SELECT 'target' AS system_name, COUNT(*) AS row_count FROM warehouse_orders;
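
To see what these patterns return, they can be run against a tiny in-memory database. The sketch below uses Python's standard `sqlite3` module with made-up rows. The table layouts and sample data are assumptions for illustration, and other platforms may need small syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER, email_address TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_status TEXT);
INSERT INTO customer VALUES (1, 'a@example.com'), (2, 'a@example.com'), (3, NULL);
INSERT INTO orders VALUES (10, 1, 'NEW'), (11, NULL, 'SHIPPED'), (12, 99, 'PENDING');
""")

# Completeness: total rows and rows missing the required customer_id.
null_check = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END)
    FROM orders
""").fetchone()

# Validity: status values outside the approved domain.
invalid_status = conn.execute("""
    SELECT order_status, COUNT(*)
    FROM orders
    WHERE order_status NOT IN ('NEW', 'APPROVED', 'SHIPPED', 'CANCELLED')
    GROUP BY order_status
""").fetchall()

# Uniqueness: candidate keys that appear more than once.
duplicates = conn.execute("""
    SELECT email_address, COUNT(*)
    FROM customer
    WHERE email_address IS NOT NULL
    GROUP BY email_address
    HAVING COUNT(*) > 1
""").fetchall()

# Integrity: orders with no matching customer row.
orphans = conn.execute("""
    SELECT o.order_id, o.customer_id
    FROM orders o
    LEFT JOIN customer c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
    ORDER BY o.order_id
""").fetchall()

conn.close()
```

In this sample, the orphan check returns both order 12 (unknown customer 99) and order 11 (NULL customer), so one defective record can surface under more than one dimension. Scorecards should say whether such overlaps are counted once or twice.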

Decision Matrix: Where Should a Quality Rule Run?

Rule Location	Use When	Advantages	Risks
User interface	Human entry can be corrected immediately	Prevents defects early	May not cover APIs or batch loads
API/service layer	Multiple channels create/update data	Centralized validation	Requires service adoption
Database constraint	Rule is stable and structural	Strong enforcement	Less flexible for contextual rules
ETL/ELT pipeline	Data moves between systems	Detects integration and transformation defects	Can become downstream patching
Data quality platform	Cross-system monitoring and scorecards needed	Reusable profiling, dashboards, stewardship workflow	Tool outputs still require governance
Reporting/semantic layer	Rule is presentation-specific	Protects metric interpretation	Too late for operational correction
Steward workflow	Judgment or business approval is needed	Handles ambiguous cases	Manual backlog risk

Data Quality Scorecards and Dashboards

Element	Include	Avoid
Business context	Domain, data product, consumer, purpose	Generic technical score with no owner
Dimensions	Completeness, validity, timeliness, etc. selected by use case	Assuming every dimension has equal value
Rule-level results	Pass/fail counts, defect rate, trend	Only aggregate score with no drill-down
Thresholds	Target, tolerance, breach level	Hidden or arbitrary thresholds
Severity	Business impact classification	Treating all defects equally
Trends	Change over time, release-related spikes	One-time snapshots only
Issue workflow	Open defects, aging, owner, status	Dashboard with no action path
Lineage	Source and downstream impact	No way to trace affected reports/processes
Notes/limitations	Known exclusions, sampling assumptions	False precision
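
A scorecard row with explicit thresholds and a trend might be derived as follows. This is a sketch under stated assumptions: the three status labels, the idea of a target plus a lower tolerance floor, the function names, and the sample figures are hypothetical and would be set by the data owner in practice.

```python
def rule_status(pass_rate_pct: float, target_pct: float, tolerance_pct: float) -> str:
    """Classify a result against an explicit target and a minimum tolerance."""
    if pass_rate_pct >= target_pct:
        return "on target"
    if pass_rate_pct >= tolerance_pct:
        return "within tolerance"
    return "breach"


def scorecard_row(rule, passed, evaluated, previous_pass_rate_pct,
                  target_pct, tolerance_pct):
    """Build one rule-level scorecard row with counts, trend, and status."""
    if evaluated == 0:
        raise ValueError("no records evaluated")
    pass_rate = 100 * passed / evaluated
    return {
        "rule": rule,
        "passed": passed,
        "failed": evaluated - passed,
        "pass_rate_pct": round(pass_rate, 2),
        # Change since the previous measurement, in percentage points.
        "trend_pct_points": round(pass_rate - previous_pass_rate_pct, 2),
        "status": rule_status(pass_rate, target_pct, tolerance_pct),
    }


# Hypothetical measurement for illustration only.
row = scorecard_row(
    rule="Mandatory tax identifier populated",
    passed=9820,
    evaluated=10000,
    previous_pass_rate_pct=99.1,
    target_pct=99.0,
    tolerance_pct=98.0,
)
```

This row shows 180 failing records, a pass rate of 98.2%, a drop of 0.9 percentage points since the previous measurement, and a status of "within tolerance". The downward trend is visible even though no breach has occurred yet.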

High-Yield Distinctions

Distinction	Know This
Accuracy vs validity	Valid means conforms to rules; accurate means correctly represents reality
Completeness vs optionality	Missing required data is a defect; missing optional data may be acceptable
Timeliness vs currency	Timeliness is availability within needed time; currency is whether value reflects current state
Consistency vs correctness	Consistent values can all be wrong; inconsistency requires authoritative resolution
Detection vs prevention	Detection finds defects; prevention reduces creation of defects
Cleansing vs remediation	Cleansing fixes data values; remediation may fix process, system, governance, or architecture
Data owner vs data steward	Owner is accountable for decisions; steward manages and coordinates quality activities
Business rule vs technical constraint	Business rule expresses policy; technical constraint implements or tests it
Profiling vs monitoring	Profiling explores and baselines; monitoring checks defined rules over time
DQ metric vs KPI	DQ metric measures data conformance; KPI measures business performance
Root cause vs symptom	Failed rule is symptom; underlying process/system/design issue is root cause
Golden record vs source of record	Golden record is consolidated trusted view; source of record is authoritative for specified data creation/maintenance
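
The accuracy versus validity distinction is easy to demonstrate. In the Python sketch below, a date of birth passes the format and range check yet disagrees with a trusted source. The function names, the 1900 lower bound, and the sample values are assumptions for illustration.

```python
from datetime import date


def is_valid_dob(value: str) -> bool:
    """Validity: ISO format (YYYY-MM-DD), a real calendar date, plausible range."""
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return False
    return date(1900, 1, 1) <= parsed <= date.today()


def is_accurate_dob(value: str, authoritative_value: str) -> bool:
    """Accuracy: agreement with a trusted source, not just a well-formed value."""
    return value == authoritative_value


# Illustrative values: day and month were swapped at capture.
captured = "1985-03-12"
authoritative = "1985-12-03"

valid = is_valid_dob(captured)
accurate = is_accurate_dob(captured, authoritative)
```

Here `valid` is true and `accurate` is false: the captured value is a well-formed, plausible date that still misrepresents the person. Only the comparison with an authoritative source exposes the error.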

Common Exam Traps

Assuming data quality is owned only by IT.
Treating profiling tools as a substitute for business definitions.
Equating format validity with accuracy.
Choosing cleanup when source prevention is feasible.
Ignoring downstream consumers when defining quality requirements.
Applying one universal quality threshold to all data.
Measuring too many low-value rules while ignoring critical data elements.
Forgetting that data quality requirements can conflict across use cases.
Assuming duplicates are always defects without considering business context.
Confusing data governance, data management, and data quality management.
Ignoring metadata and lineage in impact analysis.
Failing to distinguish accepted risk/waiver from unresolved defect.
Closing issues after data correction without monitoring recurrence.
Overlooking reference data as a major cause of validity and consistency problems.
Assuming a dashboard improves quality without ownership, workflow, and remediation.

Quick Review Checklist

Before exam day, be able to answer these quickly:

Can you define data quality as fitness for use and explain why context matters?
Can you distinguish accuracy, validity, completeness, consistency, timeliness, uniqueness, and integrity?
Can you convert a business rule into a measurable data quality rule?
Can you identify critical data elements and prioritize quality work by business impact?
Can you choose between profiling, monitoring, reconciliation, cleansing, and root-cause remediation?
Can you explain why prevention controls are preferred when defects can be stopped at source?
Can you map issues to owners, stewards, custodians, and governance escalation paths?
Can you explain how metadata and lineage support quality assessment and remediation?
Can you identify appropriate metrics, thresholds, and scorecard content?
Can you recognize when data is valid but inaccurate, consistent but wrong, or complete but not fit for use?

Practical Next Step

Use this Quick Reference as a checklist while practicing scenario questions for the DAMA CDMP Data Quality Specialist (CDMP Quality) exam. For each missed question, classify the miss by dimension, lifecycle stage, role, control type, or remediation decision, then drill that category with additional practice questions.

Scenario Guide

Data Quality Foundations and Business Fitness