Free CompTIA Data+ DA0-002 Practice Questions: Data Acquisition and Preparation

Last revised: July 14, 2026

Practice 10 free CompTIA Data+ V2 (DA0-002) questions on Data Acquisition and Preparation, each with a best answer and an explanation, then continue with the next step in IT Mastery.

Try the IT Mastery web app for a richer interactive practice experience with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

Try CompTIA Data+ DA0-002 on Web

Topic snapshot

Field	Detail
Practice target	CompTIA Data+ DA0-002
Topic area	Data Acquisition and Preparation
Blueprint weight	22%
Page purpose	Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Data Acquisition and Preparation for CompTIA Data+ DA0-002. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass	What to do	What to record
First attempt	Answer without checking the explanation first.	The fact, rule, calculation, or judgment point that controlled your answer.
Review	Read the explanation even when you were correct.	Why the best answer is stronger than the closest distractor.
Repair	Repeat only missed or uncertain items after a short break.	The pattern behind misses, not the answer letter.
Transfer	Return to mixed practice once the topic feels stable.	Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 22% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These are original IT Mastery practice questions aligned to this topic area. They are not official CompTIA questions, copied live-exam content, or exam dumps. Use them to preview question style and explanation depth before continuing with topic drills, mixed sets, and timed mocks in IT Mastery.

Question 1

Topic: Data Acquisition and Preparation

A retail analyst is preparing customer purchase data for a loyalty dashboard. The business wants to compare spending behavior by easy-to-read customer segments rather than individual dollar amounts.

Exhibit: Data profile

Field	Sample values	Requirement
`annual_spend`	124.50, 985.00, 2,430.75	Show as Low, Medium, High, VIP
`customer_id`	C1021, C1022, C1023	Preserve as identifier

Which preparation method best supports the requirement?

Options:

A. Parse customer_id into separate character fields
B. Impute missing annual_spend values with the mean
C. Standardize annual_spend to z-scores
D. Bin annual_spend into defined spending ranges

Best answer: D

Explanation: Binning is the appropriate transformation when detailed or continuous values need to be grouped into meaningful ranges. Here, annual_spend contains precise numeric amounts, but the dashboard requirement is to display spending segments such as Low, Medium, High, and VIP. The analyst should define business-approved range boundaries and assign each amount to the matching category. This preserves the ability to compare groups without showing every unique dollar value. Standardization changes scale for analysis, and imputation handles missing values; neither creates the requested business categories.

Parsing identifiers fails because customer_id should be preserved, not split for spending analysis.
Mean imputation only addresses missing values and does not create spending segments.
Z-score standardization supports scale comparison, not business-friendly range labels.
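As a rough illustration of binning, the sketch below assigns each annual_spend value to a labeled range. The boundary values are hypothetical placeholders (real cutoffs would come from the business), and the pairing of each customer_id with a spend amount is assumed for the example.

```python
from bisect import bisect_right

# Hypothetical cutoffs; real boundaries must be approved by the business.
BOUNDARIES = [500.0, 1500.0, 2000.0]       # lower edges of Medium, High, VIP
LABELS = ["Low", "Medium", "High", "VIP"]


def spend_segment(annual_spend: float) -> str:
    """Return the label of the range that contains annual_spend."""
    # bisect_right counts how many boundaries are <= the value,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(BOUNDARIES, annual_spend)]


# customer_id stays untouched as the identifier; only the spend is binned.
customers = {"C1021": 124.50, "C1022": 985.00, "C1023": 2430.75}
segments = {cid: spend_segment(amount) for cid, amount in customers.items()}
print(segments)
```

A value that sits exactly on a boundary falls into the higher segment here; whether that is the right convention is a business decision, not a technical one.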

Question 2

Topic: Data Acquisition and Preparation

A data analyst is profiling orders_extract before publishing a monthly revenue dashboard. The dashboard filters status = "Completed" and sums order_amount. Finance reports the dashboard total appears overstated.

Profile summary:

Field	Profile finding	Business rule
`order_id`	128 duplicate values	One row per completed order
`order_date`	2 null values	Required for trend charts
`order_amount`	47 negative values	Valid approved adjustments
`customer_segment`	22 nulls; 3 label variants	Optional grouping field

Which inconsistency should be investigated first?

Options:

A. Inconsistent customer_segment labels
B. Negative order_amount values
C. Duplicated order_id values
D. Null order_date values

Best answer: C

Explanation: The first investigation should target the inconsistency most likely to affect the stated business problem and the primary metric. The dashboard sums order_amount for completed orders, and finance says the total is overstated. Because the business rule says each order_id should appear once, duplicate order_id values could cause completed orders to be counted more than once. That directly threatens the revenue total. Null dates may affect trend placement, segment label variants may affect grouping, and negative amounts are explicitly valid approved adjustments. Prioritize the issue that both violates a key rule and matches the reported KPI symptom.

Null dates may affect time-based visuals, but only two records are affected and the reported issue is total revenue overstatement.
Negative amounts look unusual, but the business rule states they are valid approved adjustments.
Segment variants can fragment grouped views, but customer_segment is optional and does not explain an overstated total.
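The effect of duplicate keys on a summed KPI can be sketched with a few made-up rows. Everything below is illustrative: the order values are invented, and keeping the first row per order_id is only a stand-in for whatever rule the investigation eventually supports.

```python
from collections import Counter

# Hypothetical rows: (order_id, status, order_amount)
orders = [
    ("O-1", "Completed", 100.0),
    ("O-2", "Completed", 250.0),
    ("O-2", "Completed", 250.0),   # repeats O-2
    ("O-3", "Completed", -40.0),   # negative amount: a valid approved adjustment
    ("O-4", "Pending", 75.0),
]

completed = [row for row in orders if row[1] == "Completed"]

# Profile step: which order_id values break the one-row-per-order rule?
id_counts = Counter(order_id for order_id, _, _ in completed)
duplicate_ids = sorted(oid for oid, n in id_counts.items() if n > 1)

# Compare the dashboard-style total with a total that counts each order once.
raw_total = sum(amount for _, _, amount in completed)
first_row_per_order = {}
for order_id, _, amount in completed:
    first_row_per_order.setdefault(order_id, amount)   # keep first row (assumption)
deduped_total = sum(first_row_per_order.values())

print(duplicate_ids, raw_total, deduped_total)
```

The gap between the two totals is the overstatement attributable to duplicates, which is why this check comes before the date, segment, or sign findings.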

Question 3

Topic: Data Acquisition and Preparation

A marketing analyst is preparing customer records for an exploratory campaign review. The manager does not have predefined customer categories but wants customers grouped based on similar purchase frequency, average order value, and product mix. Which preparation approach best meets this requirement?

Options:

A. Apply clustering to group similar records
B. Bin average order value into fixed ranges
C. Parse product names into separate fields
D. Standardize currency values to one format

Best answer: A

Explanation: Clustering is used during preparation or exploratory segmentation when the goal is to discover natural groups in records based on similarity across selected variables. In this scenario, the manager has no predefined customer categories and wants groups based on multiple behavioral measures, so clustering fits the requirement better than rule-based transformations. Binning would create fixed ranges for one variable, while parsing and standardization improve field usability but do not discover similar customer segments.

Fixed ranges fail because binning groups values by predefined cutoffs, not by similarity across several customer attributes.
Field parsing helps separate text components but does not create customer segments.
Format cleanup improves consistency but does not identify natural groups in the records.
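To make the contrast with binning concrete, here is a minimal k-means sketch in plain Python. It is one possible clustering method, not necessarily the one a given tool would use. The customer values are invented, only two of the three measures are included (product mix would need numeric encoding first), and seeding the centroids with the first k points is a simplification.

```python
import math

# Hypothetical customers: (purchases_per_year, average_order_value)
customers = {
    "C1": (2, 30.0), "C2": (3, 35.0), "C3": (2, 28.0),
    "C4": (20, 150.0), "C5": (22, 160.0), "C6": (19, 145.0),
}


def scale(points):
    """Min-max scale each column to 0..1 so no single measure dominates distance."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans)) for p in points]


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


labels = kmeans(scale(list(customers.values())), k=2)
groups = dict(zip(customers, labels))
print(groups)
```

No cutoffs are supplied anywhere: the groups emerge from similarity across both measures, which is the difference from fixed-range binning.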

Question 4

Topic: Data Acquisition and Preparation

A data analyst receives a new extract for a customer retention analysis and is about to start cleansing and joining it to CRM data. Based on the intake note, what should the analyst do before preparing the data?

Exhibit: Intake note

Field	Status
Source system	Support ticket export
Source owner	Not listed
Collection scope	All ticket text, attachments, and emails
Permitted use	Not documented
Sensitive fields	Customer email, free-text complaints

Options:

A. Remove duplicate tickets and standardize email formats
B. Join the tickets to CRM using customer email
C. Prepare the data and mask emails before publishing
D. Document ownership, scope, and permitted use first

Best answer: D

Explanation: Before data is cleaned, transformed, or joined, the analyst must confirm that the collection scope, source ownership, and permitted use are documented. This is especially important when the extract includes sensitive fields and broad content such as free text, attachments, and emails. Preparation activities can create new risk by combining sources, exposing personal data, or using data outside its approved purpose.

Data quality work can happen after the governance basics are clear. The key takeaway is to establish permission and accountability before changing or integrating the dataset.

Cleansing first overlooks that duplicates and formatting are secondary to confirming whether the data may be used at all.
Joining by email increases sensitivity by linking sources before permitted use is documented.
Masking later may reduce exposure, but it does not replace documenting source ownership, scope, and allowed use.

Question 5

Topic: Data Acquisition and Preparation

A data analyst supports a weekly product dashboard. The source system stores several years of event-level clickstream data, but the dashboard only needs recent summarized measures. Which acquisition pattern best meets the need?

Exhibit: Source and dashboard profile

Item	Detail
Source table	12 TB, append-only events
New data	About 80 GB per day
Dashboard need	90-day totals by date, product, region
Current process	Full raw export to CSV each week
Issue	Long transfer time and duplicate reprocessing

Options:

A. Use incremental source-side filtering and aggregation
B. Take a random sample of events before extraction
C. Export the full raw table to a staging folder weekly
D. Replicate the entire source database into the BI tool

Best answer: A

Explanation: The best acquisition pattern is to reduce data as close to the source as possible while preserving the analytical grain required by the report. Because the dashboard needs only 90-day totals by date, product, and region, the extract should use source-side filters, incremental logic for new rows, and aggregation before movement. This avoids repeatedly transferring terabytes of raw events and prevents duplicate reprocessing, while still retaining the dimensions and measures needed for analysis.

A full replica or full weekly export moves unnecessary data. A random sample reduces volume, but it can distort dashboard totals and does not satisfy the stated reporting requirement.

A full raw export repeats the current inefficient pattern and reprocesses data that has already been acquired.
Random sampling reduces volume but does not preserve accurate totals for the dashboard.
Full replication may support many future uses, but it moves far more data than this dashboard requires.
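The pattern can be sketched with an in-memory SQLite database standing in for the source system. The table name, columns, and the date-based watermark are all assumptions made for the illustration; the sketch also assumes each day is complete before it is extracted.

```python
import sqlite3

src = sqlite3.connect(":memory:")   # stands in for the real source system
src.execute(
    "CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_date TEXT, "
    "product TEXT, region TEXT, amount REAL)"
)
src.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", [
    (1, "2026-07-01", "A", "East", 10.0),
    (2, "2026-07-01", "A", "East", 5.0),
    (3, "2026-07-02", "B", "West", 7.0),
    (4, "2026-07-03", "A", "East", 4.0),
])


def extract_increment(conn, last_loaded_date):
    """Aggregate only days after the watermark, at the dashboard grain."""
    return conn.execute(
        """
        SELECT event_date, product, region, SUM(amount) AS total, COUNT(*) AS events
        FROM events
        WHERE event_date > ?                     -- filter at the source
        GROUP BY event_date, product, region     -- aggregate before moving data
        ORDER BY event_date, product, region
        """,
        (last_loaded_date,),
    ).fetchall()


# Days up to 2026-07-01 were loaded by an earlier run, so only new days move.
increment = extract_increment(src, "2026-07-01")
print(increment)
```

Only summarized rows for new days leave the source; trimming the target to the most recent 90 days would be a separate step on the much smaller reporting table.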

Question 6

Topic: Data Acquisition and Preparation

A data analyst maintains a weekly sales dashboard. The refresh query used to finish in under 1 minute, but after adding a customer dimension table and expanding history from 3 months to 3 years, it now runs for 25 minutes. The KPI definitions have not changed, the source system is online, and executives need the same metrics by region today. What is the best professional decision?

Options:

A. Replace the dashboard with a static spreadsheet export
B. Inspect filters, indexes, joins, and row counts first
C. Redefine the KPIs to use a sampled dataset
D. Remove the customer dimension from the analysis

Best answer: B

Explanation: A sudden refresh slowdown after adding a dimension table and much more history is a query performance symptom, not a reason to change the business analysis immediately. The analyst should first check whether the query is scanning unnecessary rows, missing useful filters, joining on inefficient or incorrect keys, multiplying rows, or failing to use available indexes. This preserves the approved KPI definitions while targeting the most likely causes of the performance problem. If the diagnosis shows excess history or a problematic join, the analyst can then narrow the extraction or adjust the join logic with evidence.

Sampling the KPI fails because it changes the approved metric and may reduce accuracy without first diagnosing the performance issue.
Static export fails because it avoids the refresh problem instead of addressing the repeatable extraction requirement.
Dropping the dimension fails because it may remove required regional reporting detail before the analyst has confirmed that the join is actually the cause.
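One of the checks named above, whether a join multiplies rows, can be sketched by comparing row counts before and after the join. The tables and values are invented, with SQLite standing in for the real database; index and filter checks would depend on the actual platform.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id TEXT,
                        sale_date TEXT, amount REAL);
    CREATE TABLE customer_dim (customer_id TEXT, region TEXT);
    INSERT INTO sales VALUES (1, 'C1', '2026-06-01', 100.0),
                             (2, 'C2', '2026-06-02', 50.0);
    -- C1 appears twice in the dimension, so the join will multiply its sales rows.
    INSERT INTO customer_dim VALUES ('C1', 'East'), ('C1', 'East'), ('C2', 'West');
""")

rows_before = db.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
rows_after = db.execute(
    "SELECT COUNT(*) FROM sales s JOIN customer_dim d ON d.customer_id = s.customer_id"
).fetchone()[0]

# If the counts differ, look for join keys that are not unique in the dimension.
duplicate_keys = db.execute(
    "SELECT customer_id, COUNT(*) FROM customer_dim "
    "GROUP BY customer_id HAVING COUNT(*) > 1"
).fetchall()

print(rows_before, rows_after, duplicate_keys)
```

A row count that grows after joining to a dimension is evidence the analyst can act on without touching the KPI definitions.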

Question 7

Topic: Data Acquisition and Preparation

A data analyst is preparing a monthly churn report. The extraction query is becoming hard to validate because the same filtered customer set is rebuilt in several downstream steps.

Exhibit: Query review note

Finding	Detail
Base rows	8,200,000 customer activity records
Reused subset	Active customers in the last 12 months
Downstream use	Revenue, tickets, plan changes
Need	Check row counts between steps and rerun during one report build

Which next action best supports the analyst’s need?

Options:

A. Repeat the filter in each nested subquery
B. Add a dashboard filter for active customers
C. Export the subset to a local spreadsheet
D. Stage the reused subset in a temporary table

Best answer: D

Explanation: Temporary tables are useful intermediate structures when a query has multiple steps that reuse the same derived dataset. In this case, the active-customer subset is needed by several downstream aggregations and must be checked during the report build. Staging that subset in a temporary table can make the workflow easier to validate, reduce repeated logic, and keep the intermediate data scoped to the session or job rather than creating a permanent managed table. This supports repeatable extraction while preserving a clear point for row-count checks. The key distinction is that the analyst needs an intermediate query structure, not a presentation filter or manual file export.

A dashboard filter changes report presentation but does not simplify or validate the extraction query steps.
A local spreadsheet export can break repeatability and lineage for a monthly extraction workflow.
Repeated nested filters preserve the current complexity and increase the chance of inconsistent logic.
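A minimal sketch of the staging idea, using SQLite as a stand-in database. The table layout, sample rows, and cutoff date are hypothetical; temporary-table syntax and scoping rules vary by platform.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE activity (customer_id TEXT, activity_date TEXT,
                           revenue REAL, tickets INTEGER);
    INSERT INTO activity VALUES
        ('C1', '2026-05-10', 120.0, 1),
        ('C2', '2024-01-15',  80.0, 0),
        ('C3', '2026-02-01',  60.0, 2);
    -- TEMP tables live only for this connection; nothing permanent is created.
    CREATE TEMP TABLE active_customers (customer_id TEXT PRIMARY KEY);
""")

cutoff = "2025-07-14"   # hypothetical start of the 12-month window
db.execute(
    "INSERT INTO active_customers "
    "SELECT DISTINCT customer_id FROM activity WHERE activity_date >= ?",
    (cutoff,),
)

# Checkpoint: the subset is built once and can be counted before anything uses it.
active_count = db.execute("SELECT COUNT(*) FROM active_customers").fetchone()[0]

# Downstream steps reuse the staged subset instead of repeating the filter.
revenue = db.execute(
    "SELECT SUM(a.revenue) FROM activity a "
    "JOIN active_customers ac ON ac.customer_id = a.customer_id"
).fetchone()[0]
tickets = db.execute(
    "SELECT SUM(a.tickets) FROM activity a "
    "JOIN active_customers ac ON ac.customer_id = a.customer_id"
).fetchone()[0]

print(active_count, revenue, tickets)
```

Because the filter lives in one place, a change to the definition of an active customer cannot drift between the revenue, ticket, and plan-change steps.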

Question 8

Topic: Data Acquisition and Preparation

A data analyst is building a repeatable monthly query that combines sales, returns, and customer tables. The logic requires several joins, filters out test accounts, and calculates return rates by region before loading a small reporting table. The source tables are large, and the analyst needs a clear way to validate the intermediate regional totals before the final load. What is the best professional decision?

Options:

A. Stage the cleaned regional results in a temporary table
B. Put all joins and calculations into one nested query
C. Export each source table to separate spreadsheets
D. Create a permanent duplicate of each source table

Best answer: A

Explanation: Temporary tables are useful intermediate structures when query work has multiple steps, large source tables, or validation checkpoints. In this scenario, the analyst can filter test accounts, join the needed data, aggregate return rates by region, and store those staged results temporarily. That makes the final load simpler and gives the analyst a concrete intermediate dataset to inspect for data quality issues before publishing the reporting table. Temporary tables are especially appropriate when the staged data is needed only during the current workflow or session, not as a long-term governed copy.

A single nested query may work technically, but it is harder to validate and maintain when the logic has several steps.
Spreadsheet exports add manual handling and reduce repeatability for a monthly extraction process.
Permanent duplicates create unnecessary storage and governance concerns when only intermediate staging is needed.

Question 9

Topic: Data Acquisition and Preparation

A data analyst prepares a monthly sales variance report for regional managers. The report uses the same database query each month, but the region, product category, and date range change by request. The analyst must reduce manual edits and avoid hard-coded filter values while keeping the extraction repeatable. Which approach is the BEST professional decision?

Options:

A. Export all sales data to a spreadsheet
B. Create a parameterized extraction query
C. Save separate queries for each region
D. Edit the WHERE clause before each run

Best answer: B

Explanation: Parameterized extraction is best when a recurring analysis uses the same query pattern but needs different input values each run. In this scenario, the filters are predictable inputs: region, product category, and date range. Using parameters avoids repeatedly editing SQL text, reduces the risk of accidental hard-coded values, and makes the extraction easier to document and rerun. It also supports repeatability without building unnecessary separate pipelines or exporting more data than needed.

The key takeaway is to keep the query logic stable and make the changing filter values explicit inputs.

Separate saved queries increase maintenance because each version can drift from the standard logic.
A full spreadsheet export ignores the need for targeted, repeatable extraction and may create unnecessary data handling risk.
Manual WHERE edits are error-prone and keep the process dependent on hard-coded filter changes.
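The idea can be sketched with Python's sqlite3 module and named placeholders. The sales table, its columns, and the sample rows are assumptions for the illustration; placeholder syntax differs between database drivers.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sales (region TEXT, category TEXT, sale_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', 'Toys',  '2026-06-05', 100.0),
        ('East', 'Toys',  '2026-06-20',  40.0),
        ('East', 'Games', '2026-06-10',  70.0),
        ('West', 'Toys',  '2026-06-12',  55.0),
        ('East', 'Toys',  '2026-07-02',  30.0);
""")

# The query text never changes; only the bound values do.
QUERY = """
    SELECT region, category, SUM(amount) AS total
    FROM sales
    WHERE region = :region
      AND category = :category
      AND sale_date BETWEEN :start_date AND :end_date
    GROUP BY region, category
"""


def extract_sales(conn, region, category, start_date, end_date):
    """Run the standard query with the requested filter values bound as parameters."""
    params = {
        "region": region,
        "category": category,
        "start_date": start_date,
        "end_date": end_date,
    }
    return conn.execute(QUERY, params).fetchall()


east_toys = extract_sales(db, "East", "Toys", "2026-06-01", "2026-06-30")
west_toys = extract_sales(db, "West", "Toys", "2026-06-01", "2026-06-30")
print(east_toys, west_toys)
```

Binding values through the driver, rather than pasting them into the SQL string, also keeps request inputs from being interpreted as SQL.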

Question 10

Topic: Data Acquisition and Preparation

A marketing analyst is preparing a monthly campaign dataset for a revenue dashboard. The source file contains order_amount_raw as text; most values are valid amounts, but some contain entries such as TBD, blank values, or currency symbols. The dashboard needs a numeric amount for aggregation, and the data steward wants the original quality issue to remain auditable. Which preparation approach best meets these requirements?

Options:

A. Replace all invalid amounts with 0 before loading
B. Overwrite order_amount_raw with cleaned numeric values
C. Remove every row with a nonnumeric amount
D. Create a numeric derived field and retain the raw field with an error flag

Best answer: D

Explanation: Analysis readiness improves when a transformation creates a usable field without destroying evidence of the source issue. In this case, the dashboard needs a numeric value for aggregation, but the steward also needs auditability. A derived numeric amount field supports calculations, while retaining order_amount_raw preserves lineage. Adding an error or validity flag makes invalid values visible for quality review instead of silently hiding them. This approach separates reporting usability from data-quality remediation.

Overwriting raw data removes the original values, making it harder to audit how quality issues were handled.
Using zero imputation can distort revenue totals and hide whether values were truly zero or invalid.
Dropping rows may bias the dashboard and removes evidence needed for follow-up with the source owner.
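A small sketch of the derived-field approach follows. The output field names (order_amount, amount_flag) and the flag values are hypothetical, and treating currency symbols and thousands separators as cleanable rather than invalid is an assumption the data steward would need to confirm.

```python
from decimal import Decimal, InvalidOperation


def prepare_amount(raw):
    """Return (numeric amount or None, quality flag); the raw value is never modified."""
    text = (raw or "").strip()
    if not text:
        return None, "missing"
    cleaned = text.replace("$", "").replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None, "invalid"
    if not value.is_finite():          # rejects text such as "NaN" or "Infinity"
        return None, "invalid"
    return value, ("ok" if cleaned == text else "cleaned")


raw_values = ["125.00", "$1,200.50", "TBD", ""]

prepared = []
for raw in raw_values:
    amount, flag = prepare_amount(raw)
    prepared.append({
        "order_amount_raw": raw,       # retained for audit and lineage
        "order_amount": amount,        # derived numeric field for aggregation
        "amount_flag": flag,           # makes quality issues visible
    })

total = sum(row["order_amount"] for row in prepared if row["order_amount"] is not None)
flagged = [row["order_amount_raw"] for row in prepared if row["amount_flag"] != "ok"]
print(total, flagged)
```

Invalid and missing rows stay in the dataset with no numeric amount, so they neither inflate the total as zeros nor vanish from the quality review.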

Continue in the web app

Use IT Mastery for interactive CompTIA Data+ DA0-002 practice with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

