CompTIA Data+ DA0-002: Data Acquisition and Preparation

Try 10 focused CompTIA Data+ DA0-002 questions on Data Acquisition and Preparation, with explanations, then continue with IT Mastery.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Topic snapshot

Field | Detail
Exam route | CompTIA Data+ DA0-002
Topic area | Data Acquisition and Preparation
Blueprint weight | 22%
Page purpose | Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Data Acquisition and Preparation for CompTIA Data+ DA0-002. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass | What to do | What to record
First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer.
Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor.
Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter.
Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 22% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These original IT Mastery practice questions are aligned to this topic area. Use them for self-assessment, scope review, and deciding what to drill next.

Question 1

Topic: Data Acquisition and Preparation

A retail analyst is preparing customer purchase data for a loyalty dashboard. The business wants to compare spending behavior by easy-to-read customer segments rather than individual dollar amounts.

Exhibit: Data profile

Field | Sample values | Requirement
annual_spend | 124.50; 985.00; 2,430.75 | Show as Low, Medium, High, VIP
customer_id | C1021; C1022; C1023 | Preserve as identifier

Which preparation method best supports the requirement?

Options:

  • A. Parse customer_id into separate character fields

  • B. Impute missing annual_spend values with the mean

  • C. Standardize annual_spend to z-scores

  • D. Bin annual_spend into defined spending ranges

Best answer: D

Explanation: Binning is the appropriate transformation when detailed or continuous values need to be grouped into meaningful ranges. Here, annual_spend contains precise numeric amounts, but the dashboard requirement is to display spending segments such as Low, Medium, High, and VIP. The analyst should define business-approved range boundaries and assign each amount to the matching category. This preserves the ability to compare groups without showing every unique dollar value. Standardization changes scale for analysis, and imputation handles missing values; neither creates the requested business categories.

  • Parsing identifiers fails because customer_id should be preserved, not split for spending analysis.
  • Mean imputation only addresses missing values and does not create spending segments.
  • Z-score standardization supports scale comparison, not business-friendly range labels.
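
As a rough illustration of the binning step, the sketch below assigns each annual_spend amount to a segment. The cutoffs (500, 1,000, and 2,000) and the names BOUNDARIES, LABELS, and spend_segment are assumptions made for this example; the question does not define the actual ranges.

```python
from bisect import bisect_right

# Hypothetical upper boundaries; real cutoffs must be approved by the business.
BOUNDARIES = [500.0, 1000.0, 2000.0]
LABELS = ["Low", "Medium", "High", "VIP"]


def spend_segment(annual_spend: float) -> str:
    """Return the segment label for one annual_spend amount.

    A value exactly equal to a boundary falls into the higher segment.
    """
    return LABELS[bisect_right(BOUNDARIES, annual_spend)]


# customer_id is kept unchanged as the key; only the spend value is binned.
customers = {"C1021": 124.50, "C1022": 985.00, "C1023": 2430.75}
segments = {cid: spend_segment(amount) for cid, amount in customers.items()}
print(segments)
```

With these assumed cutoffs, the three sample customers fall into Low, Medium, and VIP.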

Question 2

Topic: Data Acquisition and Preparation

A data analyst is profiling orders_extract before publishing a monthly revenue dashboard. The dashboard filters status = "Completed" and sums order_amount. Finance reports the dashboard total appears overstated.

Profile summary:

Field | Profile finding | Business rule
order_id | 128 duplicate values | One row per completed order
order_date | 2 null values | Required for trend charts
order_amount | 47 negative values | Valid approved adjustments
customer_segment | 22 nulls; 3 label variants | Optional grouping field

Which inconsistency should be investigated first?

Options:

  • A. Inconsistent customer_segment labels

  • B. Negative order_amount values

  • C. Duplicated order_id values

  • D. Null order_date values

Best answer: C

Explanation: The first investigation should target the inconsistency most likely to affect the stated business problem and the primary metric. The dashboard sums order_amount for completed orders, and finance says the total is overstated. Because the business rule says each order_id should appear once, duplicate order_id values could cause completed orders to be counted more than once. That directly threatens the revenue total. Null dates may affect trend placement, segment label variants may affect grouping, and negative amounts are explicitly valid approved adjustments. Prioritize the issue that both violates a key rule and matches the reported KPI symptom.

  • Null dates may affect time-based visuals, but only two records are affected and the reported issue is total revenue overstatement.
  • Negative amounts look unusual, but the business rule states they are valid approved adjustments.
  • Segment variants can fragment grouped views, but customer_segment is optional and does not explain an overstated total.
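
To see why duplicated order_id values inflate the total, the sketch below compares the dashboard-style sum with a sum over one row per order. The sample rows are invented for illustration, and the deduplication assumes duplicates are exact copies; a real investigation would first confirm that with the source owner.

```python
from collections import Counter

# Hypothetical rows from orders_extract: (order_id, status, order_amount).
rows = [
    ("O-1", "Completed", 100.0),
    ("O-2", "Completed", 250.0),
    ("O-2", "Completed", 250.0),   # duplicate of O-2
    ("O-3", "Completed", -40.0),   # negative, but a valid approved adjustment
    ("O-4", "Cancelled", 75.0),    # excluded by the status filter
]

completed = [r for r in rows if r[1] == "Completed"]

# Order IDs that appear more than once among completed orders.
id_counts = Counter(order_id for order_id, _, _ in completed)
duplicate_ids = sorted(oid for oid, n in id_counts.items() if n > 1)

# Total as the dashboard computes it versus one row per order_id.
raw_total = sum(amount for _, _, amount in completed)
deduped = {order_id: amount for order_id, _, amount in completed}
deduped_total = sum(deduped.values())

print(duplicate_ids, raw_total, deduped_total)
```

In this toy data the single duplicate overstates the total by exactly the duplicated order's amount, while the negative adjustment is left in place because the business rule says it is valid.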

Question 3

Topic: Data Acquisition and Preparation

A marketing analyst is preparing customer records for an exploratory campaign review. The manager does not have predefined customer categories but wants customers grouped based on similar purchase frequency, average order value, and product mix. Which preparation approach best meets this requirement?

Options:

  • A. Apply clustering to group similar records

  • B. Bin average order value into fixed ranges

  • C. Parse product names into separate fields

  • D. Standardize currency values to one format

Best answer: A

Explanation: Clustering is used during preparation or exploratory segmentation when the goal is to discover natural groups in records based on similarity across selected variables. In this scenario, the manager has no predefined customer categories and wants groups based on multiple behavioral measures, so clustering fits the requirement better than rule-based transformations. Binning would create fixed ranges for one variable, while parsing and standardization improve field usability but do not discover similar customer segments.

  • Fixed ranges fail because binning groups values by predefined cutoffs, not by similarity across several customer attributes.
  • Field parsing helps separate text components but does not create customer segments.
  • Format cleanup improves consistency but does not identify natural groups in the records.
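
The idea of grouping by similarity can be sketched with a very small k-means routine. This is a teaching sketch only: the feature values are invented and assumed to be pre-scaled to the 0 to 1 range, the choice of two clusters is arbitrary, and the starting centroids are simply the first k points. Real work would normally use an established library and evaluate the number of clusters.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Return a cluster index for each point using a naive k-means.

    Starting centroids are the first k points, which keeps the
    result deterministic but is not a robust initialization.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical scaled features per customer:
# (purchase frequency, average order value, share of orders in one category)
customers = [
    (0.90, 0.80, 0.70),
    (0.10, 0.20, 0.10),
    (0.85, 0.90, 0.80),
    (0.15, 0.10, 0.20),
    (0.95, 0.85, 0.75),
    (0.05, 0.15, 0.15),
]
labels = kmeans(customers, k=2)
print(labels)
```

No category boundaries are supplied here; the groups come from distances across all three measures, which is the difference from binning a single field.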

Question 4

Topic: Data Acquisition and Preparation

A data analyst receives a new extract for a customer retention analysis and is about to start cleansing and joining it to CRM data. Based on the intake note, what should the analyst do before preparing the data?

Exhibit: Intake note

Field | Status
Source system | Support ticket export
Source owner | Not listed
Collection scope | All ticket text, attachments, and emails
Permitted use | Not documented
Sensitive fields | Customer email, free-text complaints

Options:

  • A. Remove duplicate tickets and standardize email formats

  • B. Join the tickets to CRM using customer email

  • C. Prepare the data and mask emails before publishing

  • D. Document ownership, scope, and permitted use first

Best answer: D

Explanation: Before data is cleaned, transformed, or joined, the analyst must confirm that the collection scope, source ownership, and permitted use are documented. This is especially important when the extract includes sensitive fields and broad content such as free text, attachments, and emails. Preparation activities can create new risk by combining sources, exposing personal data, or using data outside its approved purpose.

Data quality work can happen after the governance basics are clear. The key takeaway is to establish permission and accountability before changing or integrating the dataset.

  • Cleansing first misses that duplicates and formatting are secondary to confirming whether the data may be used.
  • Joining by email increases sensitivity by linking sources before permitted use is documented.
  • Masking later may reduce exposure, but it does not replace documenting source ownership, scope, and allowed use.

Question 5

Topic: Data Acquisition and Preparation

A data analyst supports a weekly product dashboard. The source system stores several years of event-level clickstream data, but the dashboard only needs recent summarized measures. Which acquisition pattern best meets the need?

Exhibit: Source and dashboard profile

Item | Detail
Source table | 12 TB, append-only events
New data | About 80 GB per day
Dashboard need | 90-day totals by date, product, region
Current process | Full raw export to CSV each week
Issue | Long transfer time and duplicate reprocessing

Options:

  • A. Use incremental source-side filtering and aggregation

  • B. Take a random sample of events before extraction

  • C. Export the full raw table to a staging folder weekly

  • D. Replicate the entire source database into the BI tool

Best answer: A

Explanation: The best acquisition pattern is to reduce data as close to the source as possible while preserving the analytical grain required by the report. Because the dashboard needs only 90-day totals by date, product, and region, the extract should use source-side filters, incremental logic for new rows, and aggregation before movement. This avoids repeatedly transferring terabytes of raw events and prevents duplicate reprocessing, but it still retains the dimensions and measures needed for analysis.

A full replica or full weekly export moves unnecessary data. A random sample reduces volume, but it can distort dashboard totals and does not satisfy the stated reporting requirement.

  • A full raw export repeats the current inefficient pattern and reprocesses data that has already been acquired.
  • Random sampling reduces volume but does not preserve accurate totals for the dashboard.
  • Full replication may support many future uses, but it moves far more data than this dashboard requires.
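
A minimal sketch of this pattern, using an in-memory SQLite database as a stand-in for the source system. The events table layout, the extract_increment function, and the last_loaded_date watermark are assumptions for illustration; the question does not name the actual schema or tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_date TEXT, product TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2026-05-01", "A", "East", 10.0),  # already loaded in a previous run
        ("2026-05-02", "A", "East", 5.0),
        ("2026-05-02", "A", "East", 7.0),
        ("2026-05-02", "B", "West", 3.0),
    ],
)


def extract_increment(conn, last_loaded_date):
    """Filter and aggregate at the source; return only new daily totals.

    ISO-formatted date strings compare correctly as text.
    """
    return conn.execute(
        """
        SELECT event_date, product, region, SUM(amount) AS total
        FROM events
        WHERE event_date > ?
        GROUP BY event_date, product, region
        ORDER BY event_date, product, region
        """,
        (last_loaded_date,),
    ).fetchall()


new_rows = extract_increment(conn, "2026-05-01")
print(new_rows)
```

Only the aggregated rows for dates after the watermark leave the source. The 90-day totals can then be computed from the much smaller daily summary rather than from raw events.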

Question 6

Topic: Data Acquisition and Preparation

A data analyst maintains a weekly sales dashboard. The refresh query used to finish in under 1 minute, but after adding a customer dimension table and expanding history from 3 months to 3 years, it now runs for 25 minutes. The KPI definitions have not changed, the source system is online, and executives need the same metrics by region today. What is the best professional decision?

Options:

  • A. Replace the dashboard with a static spreadsheet export

  • B. Inspect filters, indexes, joins, and row counts first

  • C. Redefine the KPIs to use a sampled dataset

  • D. Remove the customer dimension from the analysis

Best answer: B

Explanation: A sudden refresh slowdown after adding a dimension table and much more history is a query performance symptom, not a reason to change the business analysis immediately. The analyst should first check whether the query is scanning unnecessary rows, missing useful filters, joining on inefficient or incorrect keys, multiplying rows, or failing to use available indexes. This preserves the approved KPI definitions while targeting the most likely causes of the performance problem. If the diagnosis shows excess history or a problematic join, the analyst can then narrow the extraction or adjust the join logic with evidence.

  • Sampling the KPI fails because it changes the approved metric and may reduce accuracy without first diagnosing the performance issue.
  • A static export fails because it avoids the refresh problem instead of addressing the repeatable extraction requirement.
  • Dropping the dimension fails because it may remove required regional reporting detail before confirming the join is actually the cause.
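
One of the checks named above, whether a join multiplies rows, can be sketched by comparing row counts before and after the join. The table names, columns, and sample rows below are hypothetical, and SQLite stands in for whatever database the dashboard actually queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER, customer_id TEXT, region TEXT, amount REAL);
CREATE TABLE customer_dim (customer_id TEXT, segment TEXT);
INSERT INTO sales VALUES (1, 'C1', 'East', 100), (2, 'C2', 'West', 50);
-- C1 appears twice in the dimension, so the join key is not unique.
INSERT INTO customer_dim VALUES ('C1', 'Retail'), ('C1', 'retail'), ('C2', 'Wholesale');
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


sales_rows = scalar("SELECT COUNT(*) FROM sales")
joined_rows = scalar(
    "SELECT COUNT(*) FROM sales s "
    "JOIN customer_dim d ON s.customer_id = d.customer_id"
)
duplicate_keys = scalar(
    "SELECT COUNT(*) FROM (SELECT customer_id FROM customer_dim "
    "GROUP BY customer_id HAVING COUNT(*) > 1)"
)

print(sales_rows, joined_rows, duplicate_keys)
```

A joined count higher than the fact-table count points to a non-unique key in the dimension, which both slows the query and can inflate the metrics. Filters, indexes, and scanned history would be checked in the same evidence-first way.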

Question 7

Topic: Data Acquisition and Preparation

A data analyst is preparing a monthly churn report. The extraction query is becoming hard to validate because the same filtered customer set is rebuilt in several downstream steps.

Exhibit: Query review note

Finding | Detail
Base rows | 8,200,000 customer activity records
Reused subset | Active customers in the last 12 months
Downstream use | Revenue, tickets, plan changes
Need | Check row counts between steps and rerun during one report build

Which next action best supports the analyst’s need?

Options:

  • A. Repeat the filter in each nested subquery

  • B. Add a dashboard filter for active customers

  • C. Export the subset to a local spreadsheet

  • D. Stage the reused subset in a temporary table

Best answer: D

Explanation: Temporary tables are useful intermediate structures when a query has multiple steps that reuse the same derived dataset. In this case, the active-customer subset is needed by several downstream aggregations and must be checked during the report build. Staging that subset in a temporary table can make the workflow easier to validate, reduce repeated logic, and keep the intermediate data scoped to the session or job rather than creating a permanent managed table. This supports repeatable extraction while preserving a clear point for row-count checks. The key distinction is that the analyst needs an intermediate query structure, not a presentation filter or manual file export.

  • A dashboard filter changes report presentation but does not simplify or validate the extraction query steps.
  • A local spreadsheet export can break repeatability and lineage for a monthly extraction workflow.
  • Repeated nested filters preserve the current complexity and increase the chance of inconsistent logic.
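
A small sketch of staging the reused subset, with SQLite standing in for the real database. The activity table, its columns, and the fixed cutoff date are assumptions for illustration; a real query would derive the 12-month window from the report date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (customer_id TEXT, last_active TEXT, revenue REAL, tickets INTEGER);
INSERT INTO activity VALUES
  ('C1', '2026-04-10', 120.0, 1),
  ('C2', '2024-01-05', 80.0, 0),
  ('C3', '2026-02-20', 60.0, 3);

-- Build the reused subset once; the temp table lasts only for this connection.
CREATE TEMP TABLE active_customers AS
SELECT * FROM activity WHERE last_active >= '2025-06-01';
""")

# Checkpoint: confirm the staged row count before any downstream step.
staged_rows = conn.execute("SELECT COUNT(*) FROM active_customers").fetchone()[0]

# Downstream steps read the same staged subset instead of repeating the filter.
total_revenue = conn.execute("SELECT SUM(revenue) FROM active_customers").fetchone()[0]
total_tickets = conn.execute("SELECT SUM(tickets) FROM active_customers").fetchone()[0]

print(staged_rows, total_revenue, total_tickets)
```

Because the filter is written once, every downstream aggregation works from the same rows, and the row count gives a single place to validate the subset.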

Question 8

Topic: Data Acquisition and Preparation

A data analyst is building a repeatable monthly query that combines sales, returns, and customer tables. The logic requires several joins, filters out test accounts, and calculates return rates by region before loading a small reporting table. The source tables are large, and the analyst needs a clear way to validate the intermediate regional totals before the final load. What is the best professional decision?

Options:

  • A. Stage the cleaned regional results in a temporary table

  • B. Put all joins and calculations into one nested query

  • C. Export each source table to separate spreadsheets

  • D. Create a permanent duplicate of each source table

Best answer: A

Explanation: Temporary tables are useful intermediate structures when query work has multiple steps, large source tables, or validation checkpoints. In this scenario, the analyst can filter test accounts, join the needed data, aggregate return rates by region, and store those staged results temporarily. That makes the final load simpler and gives the analyst a concrete intermediate dataset to inspect for data quality issues before publishing the reporting table. Temporary tables are especially appropriate when the staged data is needed only during the current workflow or session, not as a long-term governed copy.

  • A single nested query may work technically, but it is harder to validate and maintain when the logic has several steps.
  • Spreadsheet exports add manual handling and reduce repeatability for a monthly extraction process.
  • Permanent duplicates create unnecessary storage and governance concerns when only intermediate staging is needed.
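
The validation checkpoint described above might look like the following sketch, which stages regional return figures and reconciles them against the source before any final load. All table names, columns, and rows are hypothetical, and sales and returns are aggregated separately before being combined so that the join cannot multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT, region TEXT, is_test INTEGER);
CREATE TABLE sales (customer_id TEXT, units INTEGER);
CREATE TABLE returns (customer_id TEXT, units INTEGER);
INSERT INTO customers VALUES ('C1', 'East', 0), ('C2', 'West', 0), ('T1', 'East', 1);
INSERT INTO sales VALUES ('C1', 10), ('C2', 20), ('T1', 99);
INSERT INTO returns VALUES ('C1', 1), ('C2', 5), ('T1', 50);

-- Stage cleaned regional results; test accounts are filtered out here.
CREATE TEMP TABLE regional_returns AS
SELECT s.region, s.units_sold, COALESCE(r.units_returned, 0) AS units_returned
FROM (SELECT c.region, SUM(sa.units) AS units_sold
      FROM sales sa JOIN customers c ON c.customer_id = sa.customer_id
      WHERE c.is_test = 0 GROUP BY c.region) s
LEFT JOIN (SELECT c.region, SUM(re.units) AS units_returned
      FROM returns re JOIN customers c ON c.customer_id = re.customer_id
      WHERE c.is_test = 0 GROUP BY c.region) r
  ON r.region = s.region;
""")

# Inspect the intermediate regional totals and return rates.
staged = conn.execute(
    "SELECT region, units_sold, units_returned, 1.0 * units_returned / units_sold "
    "FROM regional_returns ORDER BY region"
).fetchall()

# Reconcile: staged units sold should equal non-test units sold at the source.
staged_units = conn.execute("SELECT SUM(units_sold) FROM regional_returns").fetchone()[0]
source_units = conn.execute(
    "SELECT SUM(sa.units) FROM sales sa "
    "JOIN customers c ON c.customer_id = sa.customer_id WHERE c.is_test = 0"
).fetchone()[0]

print(staged, staged_units == source_units)
```

If the reconciliation fails, the analyst has a concrete intermediate dataset to examine before anything reaches the reporting table.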

Question 9

Topic: Data Acquisition and Preparation

A data analyst prepares a monthly sales variance report for regional managers. The report uses the same database query each month, but the region, product category, and date range change by request. The analyst must reduce manual edits and avoid hard-coded filter values while keeping the extraction repeatable. Which approach is the BEST professional decision?

Options:

  • A. Export all sales data to a spreadsheet

  • B. Create a parameterized extraction query

  • C. Save separate queries for each region

  • D. Edit the WHERE clause before each run

Best answer: B

Explanation: Parameterized extraction is best when a recurring analysis uses the same query pattern but needs different input values each run. In this scenario, the filters are predictable inputs: region, product category, and date range. Using parameters avoids repeatedly editing SQL text, reduces the risk of accidental hard-coded values, and makes the extraction easier to document and rerun. It also supports repeatability without building unnecessary separate pipelines or exporting more data than needed.

The key takeaway is to keep the query logic stable and make the changing filter values explicit inputs.

  • Separate saved queries increase maintenance because each version can drift from the standard logic.
  • A full spreadsheet export ignores the need for targeted, repeatable extraction and may create unnecessary data handling risk.
  • Manual WHERE edits are error-prone and keep the process dependent on hard-coded filter changes.
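
A minimal sketch of a parameterized extraction using Python's sqlite3 module with named placeholders. The sales table, the QUERY text, and the sales_total function are illustrative assumptions; the same idea applies to whatever database and client the analyst actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, region TEXT, category TEXT, amount REAL);
INSERT INTO sales VALUES
  ('2026-04-03', 'East', 'Hardware', 200.0),
  ('2026-04-15', 'East', 'Software', 50.0),
  ('2026-04-20', 'West', 'Hardware', 75.0),
  ('2026-05-02', 'East', 'Hardware', 30.0);
""")

# The query text never changes; only the bound values do.
QUERY = """
SELECT SUM(amount) FROM sales
WHERE region = :region
  AND category = :category
  AND sale_date BETWEEN :start_date AND :end_date
"""


def sales_total(region, category, start_date, end_date):
    """Run the fixed query with the requested filter values.

    Returns None when no rows match, because SUM over zero rows is NULL.
    """
    params = {
        "region": region,
        "category": category,
        "start_date": start_date,
        "end_date": end_date,
    }
    return conn.execute(QUERY, params).fetchone()[0]


print(sales_total("East", "Hardware", "2026-04-01", "2026-04-30"))
print(sales_total("West", "Hardware", "2026-04-01", "2026-04-30"))
```

Each request becomes a different set of arguments rather than a different copy of the SQL, so the logic cannot drift between regions or months.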

Question 10

Topic: Data Acquisition and Preparation

A marketing analyst is preparing a monthly campaign dataset for a revenue dashboard. The source file contains order_amount_raw as text; most values are valid amounts, but some contain entries such as TBD, blank values, or currency symbols. The dashboard needs a numeric amount for aggregation, and the data steward wants the original quality issue to remain auditable. Which preparation approach best meets these requirements?

Options:

  • A. Replace all invalid amounts with 0 before loading

  • B. Overwrite order_amount_raw with cleaned numeric values

  • C. Remove every row with a nonnumeric amount

  • D. Create a numeric derived field and retain the raw field with an error flag

Best answer: D

Explanation: Analysis readiness improves when a transformation creates a usable field without destroying evidence of the source issue. In this case, the dashboard needs a numeric value for aggregation, but the steward also needs auditability. A derived numeric amount field supports calculations, while retaining order_amount_raw preserves lineage. Adding an error or validity flag makes invalid values visible for quality review instead of silently hiding them. This approach separates reporting usability from data-quality remediation.

  • Overwriting raw data removes the original values, making it harder to audit how quality issues were handled.
  • Using zero imputation can distort revenue totals and hide whether values were truly zero or invalid.
  • Dropping rows may bias the dashboard and removes evidence needed for follow-up with the source owner.
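
The derived-field approach can be sketched as follows. The field names order_amount and amount_valid, the parse_amount function, and the cleaning rule are assumptions for this example: it treats a dollar sign and thousands separators as cleanable, while values such as TBD or blanks are flagged as invalid. The data steward would need to confirm the actual rule.

```python
import re


def parse_amount(raw):
    """Return (amount, is_valid) for one raw text value.

    The raw text itself is never modified by this function.
    """
    text = (raw or "").strip().replace("$", "").replace(",", "")
    if re.fullmatch(r"-?\d+(\.\d+)?", text):
        return float(text), True
    return None, False


rows = [{"order_amount_raw": v} for v in ["125.50", "$1,200.00", "TBD", "", None]]

for row in rows:
    # Add the derived field and the flag next to the untouched raw field.
    row["order_amount"], row["amount_valid"] = parse_amount(row["order_amount_raw"])

# The dashboard aggregates only valid amounts; invalid rows stay visible for review.
valid_total = sum(r["order_amount"] for r in rows if r["amount_valid"])
invalid_count = sum(1 for r in rows if not r["amount_valid"])

print(valid_total, invalid_count)
```

Invalid amounts are stored as missing rather than as 0, so they cannot be mistaken for true zero-value orders, and order_amount_raw remains available for audit.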

Continue with full practice

Use the CompTIA Data+ DA0-002 Practice Test page for the full IT Mastery practice bank, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Free review resource

Read the CompTIA Data+ DA0-002 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.

Revised on Thursday, May 28, 2026