DAMA CDMP Data Quality Specialist Quick Reference

Compact independent reference for DAMA International CDMP Quality exam prep: data quality dimensions, rules, profiling, remediation, governance, and metrics.

Exam Identity and Study Focus

Item | Reference
Vendor/provider | DAMA International
Official exam title | DAMA CDMP Data Quality Specialist
Official exam code | CDMP Quality
Page purpose | Independent Quick Reference for candidates reviewing data quality concepts, processes, roles, controls, and practical decision points

Data quality management is not just defect cleanup. For exam purposes, treat it as a governed management discipline that defines quality expectations, measures conformance, analyzes causes, remediates issues, and prevents recurrence.

High-yield framing:

  • Data quality = fitness for use by a defined business purpose, not abstract perfection.
  • Quality rules must trace to business rules, critical data, regulatory/reporting needs, operational risks, or customer outcomes.
  • Prevention is usually better than detection, but mature programs use both.
  • Root-cause remediation is stronger than downstream cleansing when the source or process can be changed.
  • Quality is contextual: the same data can be acceptable for trend analysis but unacceptable for billing, identity proofing, or regulatory reporting.

Core Data Quality Management Lifecycle

Stage | What Happens | Exam-Relevant Outputs | Common Trap
Define expectations | Identify business needs, data consumers, critical data elements, quality dimensions, tolerances | Data quality requirements, business rules, acceptance criteria | Starting with tool scans before defining what “good” means
Profile and assess | Examine actual data values, patterns, relationships, duplicates, anomalies | Baseline quality report, defect categories, issue inventory | Treating profiling results as business rules without validation
Define rules and metrics | Convert requirements into measurable checks and thresholds | Data quality rules, scorecards, KPIs/KRIs, exception criteria | Measuring what is easy instead of what matters
Analyze root causes | Determine why defects occur | Root-cause findings, impact analysis, remediation options | Fixing symptoms in reports while source processes remain broken
Remediate | Correct data, process, application, integration, or governance gaps | Cleansed records, process changes, transformation fixes, steward actions | Assuming all remediation means overwriting data
Monitor and control | Continuously measure and escalate exceptions | Dashboards, alerts, SLA/OLA measures, issue workflow | One-time cleanup with no ongoing control
Improve | Refine standards, rules, ownership, training, and architecture | Prevention controls, updated policies, lessons learned | No feedback loop into governance or systems development

Data Quality Dimensions

Use dimensions as lenses for requirements and measurement. A good exam answer usually ties the dimension to a business outcome, testable rule, and acceptable threshold.

Dimension | Meaning | Example Check | Watch For
Accuracy | Data correctly represents the real-world object or event | Customer date of birth matches authoritative source | Accuracy often requires comparison to a trusted source, not just internal format validation
Completeness | Required data is present to the needed level | Mandatory tax identifier is populated for reportable customers | “Complete enough” depends on purpose; optional fields are not automatically defects
Validity | Data conforms to allowed format, type, range, or domain | Order status is one of approved status codes | Valid data can still be inaccurate
Consistency | Data values agree across systems, records, or business rules | Customer status in CRM matches billing eligibility | Consistency does not prove correctness if all systems copied the same wrong value
Timeliness | Data is available within the required time window | Inventory position refreshed before order promising | Timeliness includes latency, currency, and availability at point of use
Currency | Data reflects the most recent accepted state | Address updated after verified change of residence | Current data is not always the same as historically correct data
Uniqueness | Real-world entity or event is represented once where required | No duplicate active customer master records | Duplicates may be legitimate in transaction data but not in master data
Integrity | Relationships and dependencies are preserved | Invoice has a valid customer ID and valid order reference | Includes referential integrity and cross-field logic
Conformity | Data follows required standards and representations | Phone numbers stored in standard international format | Standardization supports matching, integration, and reporting
Precision | Level of detail is appropriate | Coordinates captured to required decimal precision | Excess precision can imply false confidence; insufficient precision may break use cases
Reasonableness | Value is plausible in business context | Employee age is within realistic employment range | Reasonableness checks detect anomalies but may require human review
Accessibility | Data can be obtained by authorized users/processes when needed | Analysts can access approved data product | Do not confuse accessibility with lack of security controls
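
The accuracy and validity rows can be contrasted with a small sketch. It assumes a hypothetical validity rule (date of birth between 1900-01-01 and today) and a hypothetical authoritative value to compare against; neither comes from the exam material.

```python
from datetime import date

# Hypothetical validity rule: a date of birth must be a date in a plausible range.
EARLIEST_DOB = date(1900, 1, 1)


def is_valid_dob(value, today):
    """Validity: the value conforms to the allowed type and range."""
    return isinstance(value, date) and EARLIEST_DOB <= value <= today


def is_accurate_dob(value, authoritative_value):
    """Accuracy: the value matches a trusted source for the same person."""
    return value == authoritative_value


today = date(2024, 1, 1)
recorded = date(1985, 3, 12)        # value held in the customer system
authoritative = date(1958, 3, 12)   # value from a trusted source (illustrative)

valid = is_valid_dob(recorded, today)
accurate = is_accurate_dob(recorded, authoritative)
print(f"valid={valid}, accurate={accurate}")
```

The recorded value passes the format and range check yet disagrees with the trusted source, which is the "valid but inaccurate" case the table warns about.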

Business Rules vs Data Quality Rules

Concept | Definition | Example | Exam Distinction
Business rule | Policy or constraint about how the business operates | A policy must have one active policyholder | Expressed in business language; may exist without implementation
Data rule | Implemented rule about acceptable data representation | policyholder_id must not be null for active policies | Converts business expectation into measurable data condition
Data quality rule | Test used to assess data against a dimension and threshold | Active policies with null policyholder ID must be below approved tolerance | Includes metric, scope, owner, severity, and action
Validation rule | Control that prevents or flags bad input | UI rejects invalid product code | Usually preventive and embedded in system or workflow
Transformation rule | Logic used to derive or move data | Map legacy customer type R to retail | Can create quality issues if undocumented or inconsistent
Reconciliation rule | Check that data agrees across processes or systems | Sum of source transactions equals ledger load total | Often used in ETL, finance, and regulatory reporting

Anatomy of a Strong Data Quality Rule

Component | What to Specify | Example
Business purpose | Why the rule matters | Required for regulatory customer identification
Data scope | Systems, tables, entities, records, period | Active customers in onboarding platform
Data element or relationship | Field, composite field, reference, hierarchy | customer_id, country_code, parent account
Dimension | Quality aspect being tested | Completeness, validity, uniqueness
Rule logic | Exact condition to evaluate | country_code must exist in approved reference list
Threshold/tolerance | Acceptable level or boundary | Zero tolerance for blocked onboarding; limited tolerance for legacy archive
Severity | Business risk level | Critical, high, medium, low
Owner/steward | Accountable party | Customer data owner, data steward, system owner
Exception handling | Review, correction, waiver, escalation | Send exceptions to steward queue within agreed workflow
Measurement frequency | Batch, real time, daily, monthly, event-driven | Daily load check; real-time transaction validation
Evidence | Report, log, control result, audit trail | Scorecard and issue record
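
One way to see how these components fit together is to hold them in a single structure and evaluate it against sample records. The sketch below is illustrative only: the `DataQualityRule` class, the `evaluate` function, the reference list, and the sample records are all hypothetical names and values, not part of any DAMA artifact or tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping


@dataclass(frozen=True)
class DataQualityRule:
    """Hypothetical container for the rule components listed above."""
    name: str
    dimension: str
    severity: str
    owner: str
    max_defect_rate_pct: float                      # threshold/tolerance
    check: Callable[[Mapping[str, object]], bool]   # rule logic; True means pass


def evaluate(rule: DataQualityRule, records: Iterable[Mapping[str, object]]) -> dict:
    """Apply the rule to every record and compare the defect rate to tolerance."""
    records = list(records)
    failed = sum(1 for record in records if not rule.check(record))
    evaluated = len(records)
    defect_rate = 100.0 * failed / evaluated if evaluated else 0.0
    return {
        "rule": rule.name,
        "evaluated": evaluated,
        "failed": failed,
        "defect_rate_pct": defect_rate,
        "breach": defect_rate > rule.max_defect_rate_pct,
    }


APPROVED_COUNTRIES = {"US", "CA", "GB"}  # illustrative reference list

rule = DataQualityRule(
    name="country_code must exist in approved reference list",
    dimension="validity",
    severity="high",
    owner="customer data steward",
    max_defect_rate_pct=0.0,  # zero tolerance, as for blocked onboarding
    check=lambda record: record.get("country_code") in APPROVED_COUNTRIES,
)

sample = [
    {"customer_id": 1, "country_code": "US"},
    {"customer_id": 2, "country_code": "XX"},
    {"customer_id": 3, "country_code": "GB"},
    {"customer_id": 4, "country_code": None},
]

result = evaluate(rule, sample)
print(result)
```

Keeping the threshold, severity, and owner next to the rule logic makes a breach actionable: the result says what failed and who has to decide what happens next.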

Profiling and Assessment Techniques

Technique | Purpose | Typical Findings | Best Used When
Column profiling | Examine nulls, min/max, patterns, lengths, data types | Unexpected null rates, invalid lengths, outliers | First-pass understanding of unfamiliar data
Domain/value frequency | Count distinct values and distributions | Invalid codes, rare values, skewed values | Validity and reference data checks
Pattern analysis | Identify structural patterns | Mixed date formats, inconsistent identifiers | Standardization and parsing work
Cross-field analysis | Compare related fields in same record | End date before start date | Integrity and reasonableness checks
Cross-system comparison | Compare values between systems | CRM and billing customer address mismatch | Consistency assessment
Referential integrity check | Verify valid parent/child relationships | Orphan invoice without valid customer | Relational and integration quality
Duplicate detection | Find likely duplicate entities or events | Same person under multiple customer IDs | Master data and identity resolution
Time-series monitoring | Track metrics over time | Sudden spike in missing values after release | Operational monitoring and regression detection
Reconciliation | Compare totals/counts across processing steps | Source count differs from warehouse load count | ETL, financial, regulatory, and audit-sensitive flows
Sampling and review | Human review of selected records | False positives, ambiguous cases | Accuracy checks where no fully automated source exists
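
Column profiling and pattern analysis can be sketched in a few lines. The function name `profile_column`, the pattern notation (9 for a digit, A for a letter), and the sample values are assumptions made for illustration; profiling tools differ in what they report.

```python
from collections import Counter


def to_pattern(value):
    """Reduce a value to its structure: digits become 9, letters become A."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Return basic profile facts for one column of raw values."""
    # Treat None and blank strings as missing; other conventions are possible.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    lengths = [len(str(v)) for v in present]
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "patterns": Counter(to_pattern(v) for v in present),
    }


order_dates = ["2024-01-15", "15/01/2024", "2024-02-01", None, ""]
profile = profile_column(order_dates)
print(profile)
```

Here the pattern counts expose two date formats in one column. That is a profiling finding, not yet a business rule: someone still has to confirm which format is the standard.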

Key Metrics and Formulas

Use metrics to make quality visible, comparable, and actionable. Avoid presenting a single score without showing what it measures.

\[ \text{Completeness \%} = \frac{\text{Required values populated}}{\text{Required values expected}} \times 100 \]

\[ \text{Defect rate} = \frac{\text{Records failing rule}}{\text{Records evaluated}} \times 100 \]

\[ \text{Validity \%} = \frac{\text{Values conforming to rule}}{\text{Values tested}} \times 100 \]

\[ \text{Weighted data quality score} = \sum_{i=1}^{n}(\text{Dimension score}_i \times \text{Weight}_i) \]

Metric | What It Shows | Good Use | Caution
Rule pass rate | Share of records passing a specific check | Operational control monitoring | High pass rate can hide severe defects in critical records
Defect count | Number of failing records | Work queue sizing | Counts alone ignore population size
Defect rate | Defects relative to tested population | Comparing systems or periods | Requires stable denominator and rule definition
Completeness rate | Presence of required values | Mandatory attribute checks | Null is not the only form of missing data
Duplicate rate | Likely duplicate records per population | Master data improvement | Match logic affects results significantly
Timeliness lag | Delay between event and data availability | Data pipeline and reporting SLAs | Some latency may be acceptable by use case
Reconciliation variance | Difference between source and target totals | ETL and financial controls | Must account for legitimate filters and transformations
Issue aging | Time unresolved quality issues remain open | Stewardship and remediation performance | Aging without severity can mislead
Recurrence rate | Reappearance of previously fixed issue | Root-cause effectiveness | Requires issue classification discipline
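
The formulas above can be worked through with made-up counts. This sketch assumes that dimension weights sum to 1, which the formula itself does not state; the counts and the 0.6/0.4 weights are invented for the example.

```python
def pct(numerator, denominator):
    """Shared shape of the completeness, validity, and defect-rate formulas."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def weighted_score(scores, weights):
    """Sum of dimension score x weight; assumes weights sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return sum(scores[name] * weights[name] for name in scores)


# Illustrative counts for 1,000 evaluated records.
completeness = pct(950, 1000)   # 950 of 1,000 required values populated
validity = pct(980, 1000)       # 980 of 1,000 tested values conform
defect_rate = pct(20, 1000)     # 20 of 1,000 records fail the rule

score = weighted_score(
    {"completeness": completeness, "validity": validity},
    {"completeness": 0.6, "validity": 0.4},
)
print(completeness, validity, defect_rate, round(score, 1))
```

With these inputs, completeness is 95%, validity 98%, and the weighted score about 96.2. The single score hides which dimension is weaker, which is why the component results should be shown next to it.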

Critical Data Elements and Prioritization

Not all data deserves the same level of control. Prioritize quality work by business impact.

Priority Factor | Questions to Ask | Higher Priority When
Business criticality | Does the data drive revenue, operations, customer service, reporting, risk, or compliance? | It affects key decisions, obligations, or customer outcomes
Usage frequency | How often and by whom is it used? | Many processes or high-value consumers depend on it
Risk exposure | What happens if it is wrong, late, missing, or duplicated? | Financial loss, regulatory exposure, safety risk, fraud, reputational impact
Propagation | How many downstream systems consume it? | Defects spread broadly through integration and analytics
Correction cost | How hard is it to fix after capture? | Late correction is expensive or impossible
Authoritativeness | Is there a trusted source of truth? | Multiple conflicting sources exist
Change volatility | How often does it change? | High volatility requires stronger monitoring
Data lifecycle stage | Is it created, transformed, archived, or reported? | Quality needs differ across lifecycle stages

Remediation Decision Table

Situation | Prefer This Response | Why
Bad data originates at manual entry | Add input validation, training, workflow controls, or required fields | Prevents recurrence at capture
Source system allows invalid combinations | Update application rules or reference controls | Stronger than downstream correction
Integration mapping is wrong | Fix transformation logic and reload if appropriate | Corrects systemic propagation
Data is valid but inconsistent across systems | Define authoritative source, synchronization rules, and stewardship workflow | Resolves ownership and lineage conflict
Duplicate master records exist | Standardize, match, merge/link, apply survivorship, prevent future duplicates | Treats entity resolution as process and governance issue
Legacy data has known defects but low operational value | Document limitations, isolate, apply risk-based cleanup | Avoids wasteful perfectionism
Data must be corrected but source cannot change immediately | Apply controlled remediation with audit trail and exception process | Balances business need with traceability
Defect is caused by unclear business definition | Clarify glossary, policy, ownership, and rule semantics | Prevents teams from measuring different things
External data is poor | Validate provider quality, contract expectations, monitoring, alternative sources | Quality responsibility must be managed even if data is acquired
False positives overwhelm stewards | Tune rules, thresholds, matching weights, and severity logic | Improves trust and operational usability

Prevention, Detection, and Correction Controls

Control Type | Examples | Strength | Limitation
Preventive | Required fields, domain validation, referential constraints, workflow approvals, controlled reference data | Stops defects before creation | Can slow processes or reject unusual valid cases
Detective | Profiling, monitoring dashboards, reconciliation, anomaly detection, audit reports | Finds defects after creation | Requires remediation workflow
Corrective | Cleansing, standardization, deduplication, enrichment, manual correction | Improves existing data | Can mask source problems if used alone
Compensating | Downstream reasonableness checks, exception reporting, disclosure of limitations | Reduces risk when primary control is unavailable | Should not become permanent substitute for root-cause fix
Governance control | Ownership, standards, issue escalation, policy, stewardship | Creates accountability | Ineffective without measurement and enforcement
Technical control | Constraints, validation services, metadata-driven checks, pipeline tests | Automates repeatability | Needs business-approved rules

Root-Cause Analysis Reference

Root-Cause Category | Symptoms | Example Corrective Action
Process design | Missing steps, unclear handoffs, rekeying | Redesign workflow, remove duplicate capture, assign approval point
People/training | Inconsistent entry, misunderstanding definitions | Training, job aids, clearer business glossary
Application design | No validation, poorly designed screens, optional critical fields | UI/API validation, required fields, controlled values
Integration/transformation | Mapping errors, truncation, code conversion defects | Correct mappings, lineage review, pipeline tests
Metadata/definition | Teams use different meanings for same field | Business glossary, semantic standards, data contracts
Reference data | Outdated or inconsistent code sets | Reference data governance, controlled updates
Master data | Duplicate entities, conflicting golden records | MDM process, matching rules, survivorship policy
Policy/governance | No owner, no escalation, unclear accountability | Data ownership model, stewardship process
External provider | Late, incomplete, or inconsistent third-party feeds | Provider quality monitoring, acceptance criteria
Architecture | Multiple uncontrolled copies, batch latency, no lineage | Authoritative sources, integration standards, metadata management

High-yield distinction: root cause is why the defect is produced; impact is what the defect causes; symptom is what the measurement detected.

Matching, Deduplication, and Survivorship

Term | Meaning | Exam Tip
Parsing | Breaking a value into components | Needed before standardizing names, addresses, identifiers
Standardization | Converting values to common formats | Improves matching and conformity
Normalization | Reducing representational variation | Example: casing, punctuation, abbreviations
Exact match | Records match only when values are identical | High precision, low tolerance for variation
Deterministic match | Rule-based matching using defined conditions | Transparent and explainable
Probabilistic match | Uses likelihood/weights across attributes | Handles variation but requires tuning and review
Fuzzy match | Finds similar but not identical values | Useful for names/addresses; can create false positives
Blocking | Reduces match comparisons by grouping candidates | Improves performance but can miss cross-block matches
Survivorship | Rules for choosing retained values after merge | Must align with trust, recency, source priority, or business policy
Golden record | Consolidated trusted representation of an entity | Requires governance, not only tooling
Link vs merge | Link keeps records separate but associated; merge consolidates | Merge only when identity confidence is high; otherwise link
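
Normalization, blocking, and fuzzy matching can be combined in a short sketch. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the blocking key (postal code), the 0.85 threshold, and the sample customers are assumptions for illustration, not recommended settings.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Normalization: lower-case, strip punctuation, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def candidate_pairs(records, threshold=0.85):
    """Fuzzy-match names, comparing only records that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record["postal_code"]].append(record)  # blocking key (assumed)

    pairs = []
    for block in blocks.values():
        for left, right in combinations(block, 2):
            score = SequenceMatcher(
                None, normalize(left["name"]), normalize(right["name"])
            ).ratio()
            if score >= threshold:
                pairs.append((left["id"], right["id"], score))
    return pairs


customers = [
    {"id": 1, "name": "Jon A. Smith", "postal_code": "10001"},
    {"id": 2, "name": "John A Smith", "postal_code": "10001"},
    {"id": 3, "name": "Maria Lopez", "postal_code": "10001"},
    {"id": 4, "name": "jon a smith", "postal_code": "94105"},
]

pairs = candidate_pairs(customers)
for left_id, right_id, score in pairs:
    print(left_id, right_id, round(score, 2))
```

Records 1 and 2 surface as a candidate pair for steward review. Record 4 normalizes to the same name as record 1 but sits in a different block, so it is never compared: the cross-block miss that the blocking row warns about.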

Data Cleansing and Enrichment

Technique | Purpose | Good Candidate | Risk
Standardization | Make formats consistent | Addresses, phone numbers, product codes | May alter meaning if standards are wrong
Correction | Replace wrong values with known correct values | Verified spelling, code correction | Requires trusted basis and auditability
Imputation | Fill missing values using inference | Analytical datasets with known assumptions | Can introduce bias; should be flagged
Enrichment | Add data from internal/external source | Geocoding, industry codes, demographics | External source quality and rights must be managed
Deduplication | Remove or consolidate redundant records | Customer, supplier, product masters | Incorrect merges are costly
Reference validation | Compare to controlled list | Country, currency, product category | Reference list must be governed
Exception handling | Route unresolved defects for review | Ambiguous duplicates, unusual transactions | Backlogs reduce effectiveness

Exam trap: cleansing is not automatically improvement if it changes data without lineage, approval, audit trail, or business justification.

Metadata, Lineage, and Data Quality

Metadata Type | Data Quality Use
Business metadata | Defines meaning, ownership, criticality, approved business terms
Technical metadata | Identifies schemas, fields, data types, constraints, transformations
Operational metadata | Captures job runs, load times, failures, volumes, latency
Lineage metadata | Shows where data came from, how it changed, and where it goes
Quality metadata | Stores rules, scores, defects, thresholds, exceptions, issue status
Reference metadata | Describes allowed code sets and valid value domains

Why it matters:

  • Lineage supports impact analysis and root-cause tracing.
  • Business definitions reduce inconsistent interpretation.
  • Technical metadata helps automate profiling and controls.
  • Operational metadata helps detect pipeline and timeliness issues.
  • Quality metadata provides evidence of monitoring and improvement.

Data Quality and Governance Roles

Role | Primary Responsibilities | Not the Same As
Data owner | Accountable for data within a business domain; approves rules, priorities, and risk decisions | Usually not the person doing every correction
Data steward | Manages definitions, rules, issues, quality monitoring, and coordination | Not merely an IT support role
Data custodian | Operates technical environment, storage, access, backups, platforms | Does not define business meaning alone
Data consumer | Uses data and identifies fitness-for-use needs | Not passive; should report quality issues
Data producer | Creates or captures data | Must understand downstream quality impacts
Data quality analyst | Profiles data, defines measurements, analyzes defects, supports remediation | Does not own all business decisions
Data governance council | Resolves cross-domain standards, priorities, and escalations | Should not become a bottleneck for every minor issue
System owner/product owner | Ensures application/process changes support data quality requirements | Needs alignment with data ownership
Data architect | Designs structures, integration, lineage, and standards support | Architecture alone cannot create quality without process controls

Data Quality Issue Management

Step | Key Questions | Output
Log issue | What rule failed? Where? How many records? Who detected it? | Issue record with evidence
Classify | Which domain, dimension, severity, source, and impact? | Prioritized category
Assign owner | Who can decide and who can fix? | Accountable owner and responsible resolver
Analyze | What is root cause? Is it isolated or systemic? | Root-cause assessment
Decide treatment | Correct, accept, defer, monitor, redesign, or escalate? | Remediation plan
Implement | What data/process/system change is required? | Controlled fix
Validate | Did the fix resolve the defect without side effects? | Test and quality result
Close or monitor | Has recurrence risk been addressed? | Closure evidence and monitoring rule

Common severity criteria:

  • Critical reporting or legal exposure
  • Financial statement or billing impact
  • Customer harm or operational stoppage
  • Security, privacy, or access-control implications
  • Number and importance of affected records
  • Time sensitivity and downstream propagation

Data Quality in Data Warehousing, BI, and Analytics

Area | Quality Concern | Practical Control
Source extraction | Missing records, late files, changed schemas | Source counts, schema checks, arrival monitoring
Staging | Type conversion, truncation, invalid encodings | Profiling and reject/error tables
Transformation | Incorrect mapping, business logic drift | Mapping review, test cases, lineage documentation
Loading | Duplicate loads, partial loads, referential failures | Reconciliation and restart controls
Reporting | Misleading metrics, inconsistent definitions | Certified metrics and semantic layer governance
Analytics/AI | Biased, stale, incomplete, mislabeled training data | Data suitability checks, drift monitoring, documentation
Historical data | Slowly changing meaning, late arriving facts | Effective dating, versioned reference data
Self-service BI | Uncontrolled copies and inconsistent calculations | Governed data products, catalogs, quality indicators

High-yield distinction: analytics data can be technically valid but analytically unsuitable because of bias, missing populations, stale features, or unclear definitions.

Data Quality in Master and Reference Data

Data Type | Quality Focus | Typical Controls
Master data | Core entities such as customer, product, supplier, employee | Identity resolution, uniqueness, survivorship, stewardship
Reference data | Controlled values such as codes, statuses, country lists | Change governance, valid value lists, versioning, synchronization
Transaction data | Business events such as orders, payments, claims | Completeness, timeliness, reconciliation, auditability
Metadata | Definitions and descriptions of data | Glossary governance, lineage, ownership
Analytical data | Aggregated, derived, modeled, or feature-engineered data | Definition consistency, reproducibility, lineage, suitability

Exam trap: master data quality often requires organizational agreement on identity and ownership, not only duplicate detection.

Practical SQL Patterns for Data Quality Checks

Use SQL-like checks to understand measurement logic. Syntax varies by platform.

Null or Missing Required Values

SELECT
  COUNT(*) AS total_rows,
  SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS missing_customer_id
FROM orders;
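
The query above counts only NULL, yet null is not the only form of missing data. The sketch below runs a broader variant against an in-memory SQLite database using Python's standard `sqlite3` module. The placeholder list ('N/A', 'UNKNOWN', 'TBD') and the sample rows are assumptions; a real list should come from profiling and steward agreement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "C001"), (2, None), (3, ""), (4, "  "), (5, "N/A"), (6, "C002")],
)

# Count NULLs separately from values that are present but carry no information.
total, null_only, effectively_missing = conn.execute(
    """
    SELECT
      COUNT(*),
      SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END),
      SUM(CASE WHEN customer_id IS NULL
                 OR TRIM(customer_id) = ''
                 OR UPPER(TRIM(customer_id)) IN ('N/A', 'UNKNOWN', 'TBD')
               THEN 1 ELSE 0 END)
    FROM orders
    """
).fetchone()

completeness_pct = 100.0 * (total - effectively_missing) / total
print(total, null_only, effectively_missing, round(completeness_pct, 1))
```

On this sample the NULL-only count reports one missing value while the broader check reports four, so the two definitions give very different completeness rates for the same column.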

Invalid Domain Values

SELECT order_status, COUNT(*) AS row_count
FROM orders
WHERE order_status NOT IN ('NEW', 'APPROVED', 'SHIPPED', 'CANCELLED')
GROUP BY order_status;

Duplicate Candidate Keys

SELECT email_address, COUNT(*) AS record_count
FROM customer
WHERE email_address IS NOT NULL
GROUP BY email_address
HAVING COUNT(*) > 1;

Referential Integrity Exceptions

SELECT o.order_id, o.customer_id
FROM orders o
LEFT JOIN customer c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;

Cross-Field Logic

SELECT contract_id, start_date, end_date
FROM contract
WHERE end_date < start_date;

Source-to-Target Reconciliation

SELECT 'source' AS system_name, COUNT(*) AS row_count FROM source_orders
UNION ALL
SELECT 'target' AS system_name, COUNT(*) AS row_count FROM warehouse_orders;
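
Row counts alone show that a variance exists, not where it is. The following sketch extends the reconciliation idea with amount totals and a list of missing keys, again using an in-memory SQLite database. The `amount_cents` column and the sample rows are invented for the example, and amounts are stored as integer cents to avoid floating-point comparison issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_orders (order_id INTEGER PRIMARY KEY, amount_cents INTEGER);
    CREATE TABLE warehouse_orders (order_id INTEGER PRIMARY KEY, amount_cents INTEGER);
    INSERT INTO source_orders VALUES (1, 1000), (2, 2000), (3, 3000);
    INSERT INTO warehouse_orders VALUES (1, 1000), (2, 2000);
    """
)

KNOWN_TABLES = {"source_orders", "warehouse_orders"}


def count_and_total(table):
    """Return (row count, summed amount) for one of the known tables."""
    # Table names cannot be bound as SQL parameters, so restrict them explicitly.
    if table not in KNOWN_TABLES:
        raise ValueError(f"unexpected table: {table}")
    return conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount_cents), 0) FROM {table}"
    ).fetchone()


src_count, src_total = count_and_total("source_orders")
tgt_count, tgt_total = count_and_total("warehouse_orders")
count_variance = src_count - tgt_count
amount_variance = src_total - tgt_total

# Identify which source rows never arrived in the target.
missing_in_target = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.order_id
        FROM source_orders s
        LEFT JOIN warehouse_orders w ON s.order_id = w.order_id
        WHERE w.order_id IS NULL
        ORDER BY s.order_id
        """
    )
]
print(count_variance, amount_variance, missing_in_target)
```

A non-zero variance is a symptom rather than a verdict. Legitimate filters or transformations between source and target have to be accounted for before the difference is logged as a defect.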

Decision Matrix: Where Should a Quality Rule Run?

Rule Location | Use When | Advantages | Risks
User interface | Human entry can be corrected immediately | Prevents defects early | May not cover APIs or batch loads
API/service layer | Multiple channels create/update data | Centralized validation | Requires service adoption
Database constraint | Rule is stable and structural | Strong enforcement | Less flexible for contextual rules
ETL/ELT pipeline | Data moves between systems | Detects integration and transformation defects | Can become downstream patching
Data quality platform | Cross-system monitoring and scorecards needed | Reusable profiling, dashboards, stewardship workflow | Tool outputs still require governance
Reporting/semantic layer | Rule is presentation-specific | Protects metric interpretation | Too late for operational correction
Steward workflow | Judgment or business approval is needed | Handles ambiguous cases | Manual backlog risk

Data Quality Scorecards and Dashboards

Element | Include | Avoid
Business context | Domain, data product, consumer, purpose | Generic technical score with no owner
Dimensions | Completeness, validity, timeliness, etc. selected by use case | Assuming every dimension has equal value
Rule-level results | Pass/fail counts, defect rate, trend | Only aggregate score with no drill-down
Thresholds | Target, tolerance, breach level | Hidden or arbitrary thresholds
Severity | Business impact classification | Treating all defects equally
Trends | Change over time, release-related spikes | One-time snapshots only
Issue workflow | Open defects, aging, owner, status | Dashboard with no action path
Lineage | Source and downstream impact | No way to trace affected reports/processes
Notes/limitations | Known exclusions, sampling assumptions | False precision

High-Yield Distinctions

Distinction | Know This
Accuracy vs validity | Valid means conforms to rules; accurate means correctly represents reality
Completeness vs optionality | Missing required data is a defect; missing optional data may be acceptable
Timeliness vs currency | Timeliness is availability within needed time; currency is whether value reflects current state
Consistency vs correctness | Consistent values can all be wrong; inconsistency requires authoritative resolution
Detection vs prevention | Detection finds defects; prevention reduces creation of defects
Cleansing vs remediation | Cleansing fixes data values; remediation may fix process, system, governance, or architecture
Data owner vs data steward | Owner is accountable for decisions; steward manages and coordinates quality activities
Business rule vs technical constraint | Business rule expresses policy; technical constraint implements or tests it
Profiling vs monitoring | Profiling explores and baselines; monitoring checks defined rules over time
DQ metric vs KPI | DQ metric measures data conformance; KPI measures business performance
Root cause vs symptom | Failed rule is symptom; underlying process/system/design issue is root cause
Golden record vs source of record | Golden record is consolidated trusted view; source of record is authoritative for specified data creation/maintenance

Common Exam Traps

  • Assuming data quality is owned only by IT.
  • Treating profiling tools as a substitute for business definitions.
  • Equating format validity with accuracy.
  • Choosing cleanup when source prevention is feasible.
  • Ignoring downstream consumers when defining quality requirements.
  • Applying one universal quality threshold to all data.
  • Measuring too many low-value rules while ignoring critical data elements.
  • Forgetting that data quality requirements can conflict across use cases.
  • Assuming duplicates are always defects without considering business context.
  • Confusing data governance, data management, and data quality management.
  • Ignoring metadata and lineage in impact analysis.
  • Failing to distinguish accepted risk/waiver from unresolved defect.
  • Closing issues after data correction without monitoring recurrence.
  • Overlooking reference data as a major cause of validity and consistency problems.
  • Assuming a dashboard improves quality without ownership, workflow, and remediation.

Quick Review Checklist

Before exam day, be able to answer these quickly:

  • Can you define data quality as fitness for use and explain why context matters?
  • Can you distinguish accuracy, validity, completeness, consistency, timeliness, uniqueness, and integrity?
  • Can you convert a business rule into a measurable data quality rule?
  • Can you identify critical data elements and prioritize quality work by business impact?
  • Can you choose between profiling, monitoring, reconciliation, cleansing, and root-cause remediation?
  • Can you explain why prevention controls are preferred when defects can be stopped at source?
  • Can you map issues to owners, stewards, custodians, and governance escalation paths?
  • Can you explain how metadata and lineage support quality assessment and remediation?
  • Can you identify appropriate metrics, thresholds, and scorecard content?
  • Can you recognize when data is valid but inaccurate, consistent but wrong, or complete but not fit for use?

Practical Next Step

Use this Quick Reference as a checklist while practicing scenario questions for the DAMA CDMP Data Quality Specialist (CDMP Quality) exam from DAMA International. For each missed question, classify the miss by dimension, lifecycle stage, role, control type, or remediation decision, then drill that category with additional practice questions.
