Try 10 focused CompTIA Data+ DA0-002 questions on Data Acquisition and Preparation, with explanations, then continue with IT Mastery.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
| Field | Detail |
|---|---|
| Exam route | CompTIA Data+ DA0-002 |
| Topic area | Data Acquisition and Preparation |
| Blueprint weight | 22% |
| Page purpose | Focused sample questions before returning to mixed practice |
Use this page to isolate Data Acquisition and Preparation for CompTIA Data+ DA0-002. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 22% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These original IT Mastery practice questions are aligned to this topic area. Use them for self-assessment, scope review, and deciding what to drill next.
Topic: Data Acquisition and Preparation
A retail analyst is preparing customer purchase data for a loyalty dashboard. The business wants to compare spending behavior by easy-to-read customer segments rather than individual dollar amounts.
Exhibit: Data profile
| Field | Sample values | Requirement |
|---|---|---|
| annual_spend | 124.50, 985.00, 2,430.75 | Show as Low, Medium, High, VIP |
| customer_id | C1021, C1022, C1023 | Preserve as identifier |
Which preparation method best supports the requirement?
Options:
A. Parse customer_id into separate character fields
B. Impute missing annual_spend values with the mean
C. Standardize annual_spend to z-scores
D. Bin annual_spend into defined spending ranges
Best answer: D
Explanation: Binning is the appropriate transformation when detailed or continuous values need to be grouped into meaningful ranges. Here, annual_spend contains precise numeric amounts, but the dashboard requirement is to display spending segments such as Low, Medium, High, and VIP. The analyst should define business-approved range boundaries and assign each amount to the matching category. This preserves the ability to compare groups without showing every unique dollar value. Standardization changes scale for analysis, and imputation handles missing values; neither creates the requested business categories.
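As a rough illustration of the binning described above, the sketch below assigns each amount to a labeled range. The cut points (500, 1,500, 5,000) and the pairing of each customer_id with a spend value are assumptions made for the example; real boundaries should come from the business.

```python
from bisect import bisect_right

# Hypothetical boundaries: each value is the exclusive upper edge of a segment.
UPPER_BOUNDS = [500.00, 1500.00, 5000.00]   # Low, Medium, High; anything above is VIP
LABELS = ["Low", "Medium", "High", "VIP"]

def spend_segment(annual_spend: float) -> str:
    """Return the label of the range that contains annual_spend."""
    return LABELS[bisect_right(UPPER_BOUNDS, annual_spend)]

# Illustrative pairing of the exhibit's sample identifiers and amounts.
customers = {"C1021": 124.50, "C1022": 985.00, "C1023": 2430.75}

# customer_id stays intact as the key; only annual_spend is transformed.
segments = {cid: spend_segment(amount) for cid, amount in customers.items()}
print(segments)
```

With these assumed boundaries, an amount exactly on an edge (for example 500.00) falls into the higher segment, which is the kind of rule worth confirming with the business before publishing.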
customer_id should be preserved, not split for spending analysis.
Topic: Data Acquisition and Preparation
A data analyst is profiling orders_extract before publishing a monthly revenue dashboard. The dashboard filters status = "Completed" and sums order_amount. Finance reports the dashboard total appears overstated.
Profile summary:
| Field | Profile finding | Business rule |
|---|---|---|
| order_id | 128 duplicate values | One row per completed order |
| order_date | 2 null values | Required for trend charts |
| order_amount | 47 negative values | Valid approved adjustments |
| customer_segment | 22 nulls; 3 label variants | Optional grouping field |
Which inconsistency should be investigated first?
Options:
A. Inconsistent customer_segment labels
B. Negative order_amount values
C. Duplicated order_id values
D. Null order_date values
Best answer: C
Explanation: The first investigation should target the inconsistency most likely to affect the stated business problem and the primary metric. The dashboard sums order_amount for completed orders, and finance says the total is overstated. Because the business rule says each order_id should appear once, duplicate order_id values could cause completed orders to be counted more than once. That directly threatens the revenue total. Null dates may affect trend placement, segment label variants may affect grouping, and negative amounts are explicitly valid approved adjustments. Prioritize the issue that both violates a key rule and matches the reported KPI symptom.
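One way to test the duplicate hypothesis is to compare the completed-order total before and after collapsing repeated order_id values. The rows below are invented for illustration, and keeping the first row per order_id is only an assumption; the right fix depends on why the duplicates exist.

```python
from collections import Counter

# Hypothetical extract rows: (order_id, status, order_amount)
orders = [
    ("O-1", "Completed", 100.0),
    ("O-2", "Completed", 250.0),
    ("O-2", "Completed", 250.0),   # repeated row for the same order
    ("O-3", "Completed", -40.0),   # negative amount: a valid approved adjustment
    ("O-4", "Cancelled", 75.0),
]

completed = [row for row in orders if row[1] == "Completed"]

# Profile: which order_id values appear more than once?
id_counts = Counter(order_id for order_id, _, _ in completed)
duplicate_ids = sorted(oid for oid, n in id_counts.items() if n > 1)

# Total as the dashboard computes it today.
raw_total = sum(amount for _, _, amount in completed)

# Total when each order_id is counted once (first occurrence kept).
seen = set()
deduped_total = 0.0
for order_id, _, amount in completed:
    if order_id not in seen:
        seen.add(order_id)
        deduped_total += amount

print(duplicate_ids, raw_total, deduped_total)
```

A gap between the two totals is evidence that duplicates explain the overstatement; the negative adjustment is left in both totals because the business rule treats it as valid.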
customer_segment is optional and does not explain an overstated total.
Topic: Data Acquisition and Preparation
A marketing analyst is preparing customer records for an exploratory campaign review. The manager does not have predefined customer categories but wants customers grouped based on similar purchase frequency, average order value, and product mix. Which preparation approach best meets this requirement?
Options:
A. Apply clustering to group similar records
B. Bin average order value into fixed ranges
C. Parse product names into separate fields
D. Standardize currency values to one format
Best answer: A
Explanation: Clustering is used during preparation or exploratory segmentation when the goal is to discover natural groups in records based on similarity across selected variables. In this scenario, the manager has no predefined customer categories and wants groups based on multiple behavioral measures, so clustering fits the requirement better than rule-based transformations. Binning would create fixed ranges for one variable, while parsing and standardization improve field usability but do not discover similar customer segments.
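To make the contrast with binning concrete, here is a minimal k-means sketch in plain Python. It is a teaching toy, not a recommended implementation. The customer values are invented, product mix is left out to keep the example short, and the starting centroids are fixed so the result is repeatable. Features are standardized first so that average order value does not dominate purchase frequency.

```python
import math

# Hypothetical customers: (purchases per month, average order value)
customers = {
    "C1": (1.0, 20.0), "C2": (1.5, 25.0), "C3": (2.0, 22.0),
    "C4": (8.0, 90.0), "C5": (9.0, 110.0), "C6": (7.5, 100.0),
}

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

points = standardize(list(customers.values()))
labels = kmeans(points, centroids=[points[0], points[3]])
clusters = dict(zip(customers, labels))
print(clusters)
```

No segment boundaries are supplied anywhere in the code: the groups come from similarity across both variables at once, which is what separates this from a fixed-range bin on a single field.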
Topic: Data Acquisition and Preparation
A data analyst receives a new extract for a customer retention analysis and is about to start cleansing and joining it to CRM data. Based on the intake note, what should the analyst do before preparing the data?
Exhibit: Intake note
| Field | Status |
|---|---|
| Source system | Support ticket export |
| Source owner | Not listed |
| Collection scope | All ticket text, attachments, and emails |
| Permitted use | Not documented |
| Sensitive fields | Customer email, free-text complaints |
Options:
A. Remove duplicate tickets and standardize email formats
B. Join the tickets to CRM using customer email
C. Prepare the data and mask emails before publishing
D. Document ownership, scope, and permitted use first
Best answer: D
Explanation: Before data is cleaned, transformed, or joined, the analyst must confirm that the collection scope, source ownership, and permitted use are documented. This is especially important when the extract includes sensitive fields and broad content such as free text, attachments, and emails. Preparation activities can create new risk by combining sources, exposing personal data, or using data outside its approved purpose.
Data quality work can happen after the governance basics are clear. The key takeaway is to establish permission and accountability before changing or integrating the dataset.
Topic: Data Acquisition and Preparation
A data analyst supports a weekly product dashboard. The source system stores several years of event-level clickstream data, but the dashboard only needs recent summarized measures. Which acquisition pattern best meets the need?
Exhibit: Source and dashboard profile
| Item | Detail |
|---|---|
| Source table | 12 TB, append-only events |
| New data | About 80 GB per day |
| Dashboard need | 90-day totals by date, product, region |
| Current process | Full raw export to CSV each week |
| Issue | Long transfer time and duplicate reprocessing |
Options:
A. Use incremental source-side filtering and aggregation
B. Take a random sample of events before extraction
C. Export the full raw table to a staging folder weekly
D. Replicate the entire source database into the BI tool
Best answer: A
Explanation: The best acquisition pattern is to reduce data as close to the source as possible while preserving the analytical grain required by the report. Because the dashboard needs only 90-day totals by date, product, and region, the extract should use source-side filters, incremental logic for new rows, and aggregation before movement. This avoids repeatedly transferring terabytes of raw events and prevents duplicate reprocessing, but it still retains the dimensions and measures needed for analysis.
A full replica or full weekly export moves unnecessary data. A random sample reduces volume, but it can distort dashboard totals and does not satisfy the stated reporting requirement.
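The pattern can be sketched with an in-memory SQLite database standing in for the source system. The table name, columns, and the "last loaded date" watermark are all assumptions for illustration; the point is that filtering and grouping happen in the query, so only new, summarized rows leave the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_date TEXT, product TEXT, region TEXT, amount REAL);
    INSERT INTO events VALUES
        ('2024-01-01', 'A', 'East', 10.0),
        ('2024-03-01', 'A', 'East', 5.0),
        ('2024-03-01', 'A', 'East', 7.0),
        ('2024-03-02', 'B', 'West', 3.0);
""")

def extract_new_totals(conn, last_loaded_date):
    """Aggregate at the source and return only dates after the last load."""
    return conn.execute(
        """
        SELECT event_date, product, region,
               SUM(amount) AS total, COUNT(*) AS event_count
        FROM events
        WHERE event_date > ?
        GROUP BY event_date, product, region
        ORDER BY event_date, product, region
        """,
        (last_loaded_date,),
    ).fetchall()

# Only rows newer than the hypothetical watermark are summarized and moved.
rows = extract_new_totals(conn, "2024-02-29")
print(rows)
```

The result keeps the date, product, and region grain the dashboard needs. A downstream summary table could then drop dates older than 90 days without touching the raw events again.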
Topic: Data Acquisition and Preparation
A data analyst maintains a weekly sales dashboard. The refresh query used to finish in under 1 minute, but after adding a customer dimension table and expanding history from 3 months to 3 years, it now runs for 25 minutes. The KPI definitions have not changed, the source system is online, and executives need the same metrics by region today. What is the best professional decision?
Options:
A. Replace the dashboard with a static spreadsheet export
B. Inspect filters, indexes, joins, and row counts first
C. Redefine the KPIs to use a sampled dataset
D. Remove the customer dimension from the analysis
Best answer: B
Explanation: A sudden refresh slowdown after adding a dimension table and much more history is a query performance symptom, not a reason to change the business analysis immediately. The analyst should first check whether the query is scanning unnecessary rows, missing useful filters, joining on inefficient or incorrect keys, multiplying rows, or failing to use available indexes. This preserves the approved KPI definitions while targeting the most likely causes of the performance problem. If the diagnosis shows excess history or a problematic join, the analyst can then narrow the extraction or adjust the join logic with evidence.
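One of the checks named above, comparing row counts before and after a join, can be sketched as follows. The tables and values are hypothetical, and SQLite stands in for the real source. A join that returns more rows than the fact table suggests the dimension key is not unique, which both slows the query and inflates totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER, customer_id TEXT, region TEXT, amount REAL);
    CREATE TABLE customer_dim (customer_id TEXT, segment TEXT);
    INSERT INTO sales VALUES (1, 'C1', 'East', 100.0), (2, 'C2', 'West', 50.0);
    -- C1 appears twice in the dimension, so the join key is not unique.
    INSERT INTO customer_dim VALUES ('C1', 'Retail'), ('C1', 'retail'), ('C2', 'Wholesale');
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

rows_before = scalar("SELECT COUNT(*) FROM sales")
rows_after = scalar(
    "SELECT COUNT(*) FROM sales s JOIN customer_dim d ON d.customer_id = s.customer_id"
)

# List the dimension keys responsible for the extra rows.
non_unique_keys = conn.execute(
    "SELECT customer_id, COUNT(*) FROM customer_dim "
    "GROUP BY customer_id HAVING COUNT(*) > 1"
).fetchall()

print(rows_before, rows_after, non_unique_keys)
```

The same before-and-after comparison works for the history expansion: counting rows with and without the date filter shows how much of the scan the dashboard actually needs.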
Topic: Data Acquisition and Preparation
A data analyst is preparing a monthly churn report. The extraction query is becoming hard to validate because the same filtered customer set is rebuilt in several downstream steps.
Exhibit: Query review note
| Finding | Detail |
|---|---|
| Base rows | 8,200,000 customer activity records |
| Reused subset | Active customers in the last 12 months |
| Downstream use | Revenue, tickets, plan changes |
| Need | Check row counts between steps and rerun during one report build |
Which next action best supports the analyst’s need?
Options:
A. Repeat the filter in each nested subquery
B. Add a dashboard filter for active customers
C. Export the subset to a local spreadsheet
D. Stage the reused subset in a temporary table
Best answer: D
Explanation: Temporary tables are useful intermediate structures when a query has multiple steps that reuse the same derived dataset. In this case, the active-customer subset is needed by several downstream aggregations and must be checked during the report build. Staging that subset in a temporary table can make the workflow easier to validate, reduce repeated logic, and keep the intermediate data scoped to the session or job rather than creating a permanent managed table. This supports repeatable extraction while preserving a clear point for row-count checks. The key distinction is that the analyst needs an intermediate query structure, not a presentation filter or manual file export.
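A minimal sketch of the staging idea, using SQLite in place of the real database: the active-customer subset is built once in a temporary table, its row count is checked, and two downstream aggregations reuse it. Table names, columns, and the cutoff date are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (customer_id TEXT, activity_date TEXT, revenue REAL, tickets INTEGER);
    INSERT INTO activity VALUES
        ('C1', '2024-05-01', 40.0, 1),
        ('C1', '2024-06-01', 60.0, 0),
        ('C2', '2022-01-15', 30.0, 2),
        ('C3', '2024-04-10', 25.0, 3);
""")

# Build the reused subset once. A TEMP table is dropped when the connection closes.
conn.execute("""
    CREATE TEMP TABLE active_customers AS
    SELECT DISTINCT customer_id FROM activity WHERE activity_date >= '2023-07-01'
""")

# Checkpoint: confirm the staged row count before any downstream step runs.
staged_rows = conn.execute("SELECT COUNT(*) FROM active_customers").fetchone()[0]

# Downstream steps join to the staged subset instead of repeating the filter.
revenue = conn.execute("""
    SELECT SUM(a.revenue) FROM activity a
    JOIN active_customers c ON c.customer_id = a.customer_id
""").fetchone()[0]
tickets = conn.execute("""
    SELECT SUM(a.tickets) FROM activity a
    JOIN active_customers c ON c.customer_id = a.customer_id
""").fetchone()[0]

print(staged_rows, revenue, tickets)
```

Because the filter lives in one place, a change to the definition of "active" is made once and every downstream measure picks it up.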
Topic: Data Acquisition and Preparation
A data analyst is building a repeatable monthly query that combines sales, returns, and customer tables. The logic requires several joins, filters out test accounts, and calculates return rates by region before loading a small reporting table. The source tables are large, and the analyst needs a clear way to validate the intermediate regional totals before the final load. What is the best professional decision?
Options:
A. Stage the cleaned regional results in a temporary table
B. Put all joins and calculations into one nested query
C. Export each source table to separate spreadsheets
D. Create a permanent duplicate of each source table
Best answer: A
Explanation: Temporary tables are useful intermediate structures when query work has multiple steps, large source tables, or validation checkpoints. In this scenario, the analyst can filter test accounts, join the needed data, aggregate return rates by region, and store those staged results temporarily. That makes the final load simpler and gives the analyst a concrete intermediate dataset to inspect for data quality issues before publishing the reporting table. Temporary tables are especially appropriate when the staged data is needed only during the current workflow or session, not as a long-term governed copy.
Topic: Data Acquisition and Preparation
A data analyst prepares a monthly sales variance report for regional managers. The report uses the same database query each month, but the region, product category, and date range change by request. The analyst must reduce manual edits and avoid hard-coded filter values while keeping the extraction repeatable. Which approach is the BEST professional decision?
Options:
A. Export all sales data to a spreadsheet
B. Create a parameterized extraction query
C. Save separate queries for each region
D. Edit the WHERE clause before each run
Best answer: B
Explanation: Parameterized extraction is best when a recurring analysis uses the same query pattern but needs different input values each run. In this scenario, the filters are predictable inputs: region, product category, and date range. Using parameters avoids repeatedly editing SQL text, reduces the risk of accidental hard-coded values, and makes the extraction easier to document and rerun. It also supports repeatability without building unnecessary separate pipelines or exporting more data than needed.
The key takeaway is to keep the query logic stable and make the changing filter values explicit inputs.
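A parameterized extraction might look like the sketch below, which uses Python's built-in sqlite3 module with named placeholders. The table, columns, and sample values are hypothetical; other databases use different placeholder syntax, but the idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, region TEXT, category TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2024-03-05', 'East', 'Audio', 120.0),
        ('2024-03-20', 'East', 'Audio', 80.0),
        ('2024-03-11', 'West', 'Audio', 150.0),
        ('2024-04-02', 'East', 'Audio', 55.0);
""")

# The query text never changes; only the bound values do.
SALES_QUERY = """
    SELECT SUM(amount) FROM sales
    WHERE region = :region
      AND category = :category
      AND sale_date BETWEEN :start_date AND :end_date
"""

def total_sales(region, category, start_date, end_date):
    """Run the fixed query with the requested filter values."""
    params = {"region": region, "category": category,
              "start_date": start_date, "end_date": end_date}
    return conn.execute(SALES_QUERY, params).fetchone()[0]

east_march = total_sales("East", "Audio", "2024-03-01", "2024-03-31")
west_march = total_sales("West", "Audio", "2024-03-01", "2024-03-31")
print(east_march, west_march)
```

Binding values as parameters, rather than pasting them into the SQL string, also keeps request-supplied text from being interpreted as SQL.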
Topic: Data Acquisition and Preparation
A marketing analyst is preparing a monthly campaign dataset for a revenue dashboard. The source file contains order_amount_raw as text; most values are valid amounts, but some contain entries such as TBD, blank values, or currency symbols. The dashboard needs a numeric amount for aggregation, and the data steward wants the original quality issue to remain auditable. Which preparation approach best meets these requirements?
Options:
A. Replace all invalid amounts with 0 before loading
B. Overwrite order_amount_raw with cleaned numeric values
C. Remove every row with a nonnumeric amount
D. Create a numeric derived field and retain the raw field with an error flag
Best answer: D
Explanation: Analysis readiness improves when a transformation creates a usable field without destroying evidence of the source issue. In this case, the dashboard needs a numeric value for aggregation, but the steward also needs auditability. A derived numeric amount field supports calculations, while retaining order_amount_raw preserves lineage. Adding an error or validity flag makes invalid values visible for quality review instead of silently hiding them. This approach separates reporting usability from data-quality remediation.
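The approach could be sketched as a small cleaning function that returns a numeric value plus a flag, while the raw text is carried through unchanged. The flag names, the derived field names, and the assumption that amounts use a dollar sign and comma thousands separators are all illustrative choices.

```python
from decimal import Decimal, InvalidOperation

def clean_amount(raw):
    """Return (numeric_amount, error_flag); the raw text itself is never altered."""
    text = (raw or "").strip()
    if not text:
        return None, "MISSING"
    # Assumes US-style formatting: '$' prefix and ',' thousands separators.
    candidate = text.replace("$", "").replace(",", "")
    try:
        amount = Decimal(candidate)
    except InvalidOperation:
        return None, "NOT_NUMERIC"
    return amount, ("REFORMATTED" if candidate != text else None)

raw_values = ["125.50", "$1,200.00", "TBD", ""]

prepared = []
for raw in raw_values:
    amount, flag = clean_amount(raw)
    prepared.append({
        "order_amount_raw": raw,    # original value kept for audit
        "order_amount": amount,     # derived numeric field for aggregation
        "amount_error": flag,       # makes the quality issue visible
    })

for row in prepared:
    print(row)
```

Invalid rows stay in the dataset with a null amount and a flag, so the dashboard can aggregate the valid values while the steward can count and review the flagged ones.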
Use the CompTIA Data+ DA0-002 Practice Test page for the full IT Mastery practice bank, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.
Read the CompTIA Data+ DA0-002 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.