Try 10 focused CompTIA DataAI DY0-001 questions on Machine Learning, with explanations, then continue with IT Mastery.
Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.
| Field | Detail |
|---|---|
| Exam route | CompTIA DataAI DY0-001 |
| Topic area | Machine Learning |
| Blueprint weight | 24% |
| Page purpose | Focused sample questions before returning to mixed practice |
Use this page to isolate Machine Learning for CompTIA DataAI DY0-001. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.
| Pass | What to do | What to record |
|---|---|---|
| First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer. |
| Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor. |
| Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter. |
| Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious. |
Blueprint context: 24% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.
These original IT Mastery practice questions are aligned to this topic area. Use them for self-assessment, scope review, and deciding what to drill next.
Topic: Machine Learning
A streaming service analytics team wants to identify viewing behaviors that tend to occur together during the same user session so it can design bundle recommendations. Which method is best supported by the exhibit?
Exhibit: Session event data profile
| Field | Example | Notes |
|---|---|---|
| session_id | S-10492 | Groups events in one visit |
| event_type | trailer_view, add_watchlist | Multiple events per session |
| content_tag | sci-fi, documentary | Multiple tags per session |
| conversion_label | not collected | No target outcome available |
Options:
A. Association rule mining
B. Logistic regression
C. Linear regression
D. K-nearest neighbors classification
Best answer: A
Explanation: Association rule mining is used to discover items, events, or behaviors that frequently occur together within a shared context, such as a shopping basket or user session. The exhibit shows multiple event types and content tags grouped by session_id, and it explicitly lacks a target label. That supports finding rules such as “sessions with trailer views and sci-fi tags often also include watchlist adds,” typically evaluated with measures such as support, confidence, and lift.
Supervised methods require a labeled outcome to predict, while this use case is about uncovering co-occurrence patterns.
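As a rough illustration of the support, confidence, and lift measures named above, the sketch below scores one candidate rule on a handful of made-up sessions. The session contents and the helper names `support` and `rule_metrics` are assumptions for illustration only; they are not taken from the exhibit.

```python
# Hypothetical toy sessions: each set holds the events and tags seen in one session.
sessions = [
    {"trailer_view", "sci-fi", "add_watchlist"},
    {"trailer_view", "sci-fi", "add_watchlist"},
    {"trailer_view", "documentary"},
    {"sci-fi", "add_watchlist"},
    {"documentary"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    ante = support(antecedent, transactions)
    cons = support(consequent, transactions)
    confidence = both / ante   # how often the consequent follows the antecedent
    lift = confidence / cons   # above 1 means more co-occurrence than chance
    return both, confidence, lift

rule_support, rule_confidence, rule_lift = rule_metrics(
    {"trailer_view", "sci-fi"}, {"add_watchlist"}, sessions
)
print(rule_support, rule_confidence, round(rule_lift, 3))
```

No label column is used anywhere in the calculation, which is why the method fits data where `conversion_label` is not collected.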
Topic: Machine Learning
A fraud analytics team needs to choose a first production classifier for manual investigator triage. A small performance gap is acceptable if each decision can be explained as a readable sequence of feature thresholds.
Exhibit: Validation summary
| Candidate | AUC | Diagnostic finding |
|---|---|---|
| Regularized logistic regression | 0.77 | Misses threshold interactions |
| Shallow decision tree | 0.83 | Uses readable split paths |
| Random forest | 0.85 | Opaque aggregate voting |
| Gradient-boosted trees | 0.86 | Opaque additive ensemble |
Which candidate best fits the requirement?
Options:
A. Use gradient-boosted trees
B. Use regularized logistic regression
C. Use a shallow decision tree
D. Use a random forest
Best answer: C
Explanation: A decision tree is the best fit when the scenario values both interpretability and non-linear split behavior. The exhibit shows that logistic regression is more transparent than the ensembles, but it misses the threshold interactions that matter for the fraud patterns. The random forest and gradient-boosted models have slightly higher AUC, but their ensemble mechanisms make individual decisions harder to explain as a simple readable path. A shallow tree gives investigators a sequence of feature thresholds that can be reviewed and challenged while still modeling non-linear decision boundaries. The key trade-off is accepting a small metric gap to satisfy the explanation requirement.
Topic: Machine Learning
A regional lender needs a default-risk model for a regulated approval workflow. The training set has 2,800 historical loans, 32 mostly clean tabular predictors, and a 12% default rate. Business requirements include audit-friendly explanations, stable quarterly retraining on CPU-only infrastructure, and clear factor directionality for adverse-action review. A pilot multilayer neural network improved cross-validated AUC from 0.781 to 0.786 but showed higher fold-to-fold variance. Which method best maps to these requirements?
Options:
A. A convolutional network over reshaped feature vectors
B. Regularized logistic regression with calibrated probabilities
C. A deeper neural network with dropout and early stopping
D. An autoencoder to learn latent tabular features
Best answer: B
Explanation: For a small, structured tabular problem with strict interpretability and stable retraining requirements, a simpler statistical model is usually preferable unless a complex model delivers a meaningful, validated gain. Regularized logistic regression supports coefficient-based directionality, odds-ratio style explanations, calibration, and reproducible CPU-based retraining. The neural network’s tiny AUC lift is not compelling because it comes with higher validation variance and weaker auditability. Deep learning is most defensible when the data modality or scale justifies representation learning, such as images, text, speech, or very large heterogeneous datasets. Here, the business constraints make unnecessary complexity a liability rather than an advantage.
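To make the "coefficient-based directionality" and "odds-ratio style explanations" concrete, here is a minimal sketch. The feature names, coefficient values, and intercept are hypothetical and stand in for whatever a fitted regularized logistic regression would produce.

```python
import math

# Hypothetical fitted coefficients (change in log-odds per one-unit increase).
coefficients = {"debt_to_income": 0.40, "years_at_employer": -0.25}
intercept = -2.0

def odds_ratios(coefs):
    """exp(beta) is the multiplicative change in odds for a one-unit increase."""
    return {name: math.exp(beta) for name, beta in coefs.items()}

def predict_probability(features, coefs, bias):
    """Logistic model: sigmoid of the linear combination of features."""
    z = bias + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

ratios = odds_ratios(coefficients)
baseline = predict_probability(
    {"debt_to_income": 0.0, "years_at_employer": 0.0}, coefficients, intercept
)
for name, ratio in ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.2f} ({direction} default odds)")
```

An odds ratio above 1 means the factor pushes default risk up and a ratio below 1 pushes it down, which is the kind of directional statement an adverse-action review typically needs.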
Topic: Machine Learning
A hospital analytics team needs to prioritize follow-up calls for patients at discharge. The target is whether a patient is readmitted within 30 days, the positive class is 8%, predictors are mixed tabular EHR variables available at discharge, and stakeholders need calibrated risk scores that can be thresholded as staffing changes. Nonlinear interactions are expected, and nightly batch scoring is acceptable. Which supervised method is the best professional choice?
Options:
A. K-means clustering with risk labels assigned after training
B. Ordinary least squares regression
C. Calibrated gradient-boosted tree classifier
D. Cox proportional hazards survival model
Best answer: C
Explanation: This is a supervised binary classification problem: the label is readmission within 30 days. Because the business needs ranked, thresholdable risk scores rather than only class labels, probability calibration matters. A gradient-boosted tree classifier is a strong fit for mixed tabular healthcare data when nonlinear interactions are expected, and nightly batch scoring reduces latency concerns. Class imbalance should be handled during training and evaluation, but the method still needs to optimize for calibrated binary risk rather than a continuous outcome or unsupervised grouping. The key distinction is matching the binary target and calibrated decision workflow, not simply choosing the most interpretable or most statistically familiar model.
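The point about thresholding "as staffing changes" can be sketched as follows. The risk scores and the helper `threshold_for_capacity` are hypothetical, and the expected-count line is only meaningful under the assumption that the scores are well calibrated.

```python
def threshold_for_capacity(scores, capacity):
    """Lowest score that still falls within the top `capacity` patients.

    Assumes capacity is at least 1 and that scores is non-empty.
    """
    ranked = sorted(scores, reverse=True)
    return ranked[min(capacity, len(ranked)) - 1]

# Hypothetical calibrated 30-day readmission probabilities for eight discharges.
risk = [0.02, 0.31, 0.08, 0.55, 0.12, 0.04, 0.27, 0.09]

# Staff can make three follow-up calls today.
cutoff = threshold_for_capacity(risk, 3)
flagged = [i for i, r in enumerate(risk) if r >= cutoff]

# With calibrated scores, the sum of probabilities estimates the expected
# number of readmissions among the flagged patients.
expected_readmissions = sum(risk[i] for i in flagged)
print(cutoff, flagged, round(expected_readmissions, 2))
```

If staffing doubles, only the `capacity` argument changes; the model does not need retraining, which is the practical benefit of scoring with probabilities rather than hard class labels.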
Topic: Machine Learning
A data science team is selecting a model for a regulated loan-default early-warning system. The business goal is stable recall at a fixed false-positive budget, and the dataset has repeated observations per borrower over time. The current comparison tunes 200 gradient-boosting configurations on the final holdout set, tunes 10 logistic-regression configurations with random row-level cross-validation, and reports the best observed holdout score for each model.
Which action is the BEST professional decision before recommending a model?
Options:
A. Average all tuned models into an ensemble before evaluation
B. Increase logistic-regression trials until its holdout score improves
C. Select gradient boosting because it had the highest holdout recall
D. Use borrower-grouped nested cross-validation with a comparable tuning budget
Best answer: D
Explanation: A fair model selection process must separate hyperparameter tuning from final performance estimation and apply a comparable validation protocol across candidates. Here, the final holdout has been reused for tuning, so the reported best score is optimistically biased. Random row-level cross-validation can also leak borrower-specific information because repeated observations from the same borrower may appear in both train and validation folds. A grouped nested cross-validation design addresses both issues: inner folds tune hyperparameters, outer folds estimate generalization, and borrower grouping prevents identity leakage. Comparable search budgets do not require identical grids, but they should give each candidate a defensible opportunity without overfitting the selection process.
The key takeaway is that the model recommendation should be based on an unbiased, like-for-like evaluation, not the best score from uneven holdout probing.
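The grouping idea can be sketched in a few lines. This is a simplified stand-in written for illustration, not a library API: the function `group_k_fold` and the borrower IDs are hypothetical, and a full nested design would apply the same splitter again inside each training portion to tune hyperparameters.

```python
from collections import defaultdict

def group_k_fold(groups, n_splits):
    """Yield (train_idx, val_idx) pairs in which no group appears on both sides."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    # Greedy balancing: place the largest groups first into the smallest fold.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[g])
    for i in range(n_splits):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val

# Hypothetical borrower ID for each row; repeated IDs are repeated observations.
borrowers = ["b1", "b1", "b2", "b3", "b3", "b3", "b4", "b5", "b5"]
splits = list(group_k_fold(borrowers, n_splits=3))

for train, val in splits:
    train_ids = {borrowers[i] for i in train}
    val_ids = {borrowers[i] for i in val}
    print(sorted(val_ids), "shared borrowers:", train_ids & val_ids)
```

A random row-level split would scatter rows for `b3` across both sides, which is the leakage path described above.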
Topic: Machine Learning
A claims analytics team must choose a production model for fraud triage. The business requires ROC-AUC of at least 0.84, per-claim reason codes for analyst review, weekly retraining by a small MLOps team, and batch scoring overnight.
| Candidate | Validation ROC-AUC | Notes |
|---|---|---|
| Pruned decision tree | 0.78 | Easy to explain |
| Random forest | 0.84 | Many trees; slower explanations |
| Gradient-boosted trees | 0.87 | Supports monotonic constraints and SHAP values |
| Stacked ensemble | 0.89 | Combines trees and neural network |
Which option best maps to these requirements?
Options:
A. Pruned decision tree with analyst-readable rules
B. Random forest without local explanation artifacts
C. Gradient-boosted trees with constraints and SHAP reason codes
D. Stacked ensemble optimized only for ROC-AUC
Best answer: C
Explanation: The key trade-off is ensemble performance versus interpretability and operational complexity. The pruned tree is easiest to explain, but its 0.78 ROC-AUC fails the required 0.84 threshold. The random forest meets the threshold exactly, but the option as written provides no local explanation artifacts, so it cannot supply per-claim reason codes. The stacked ensemble has the best validation score, but its mixed architecture increases deployment, monitoring, and explanation burden for a small team. Gradient-boosted trees exceed the required ROC-AUC (0.87 versus 0.84) and can support governance needs through monotonic constraints and local explanation methods such as SHAP. Because scoring is overnight batch rather than real-time, the extra explanation computation is more acceptable. The best fit is not the most accurate model in isolation; it is the model that satisfies accuracy, explainability, and operational requirements together.
Topic: Machine Learning
A retailer is training a model to forecast same-day order volume per fulfillment center for staffing. Operations wants a forecast that covers demand on most days, because understaffing causes missed delivery windows and is about four times more costly than overstaffing. The target is continuous, right-skewed, and the validation metric will include empirical coverage of the forecast. Which loss-function consideration is the BEST professional decision?
Options:
A. Train with symmetric mean absolute error
B. Train with mean squared error
C. Train with high-quantile pinball loss
D. Train with binary cross-entropy
Best answer: C
Explanation: The core issue is aligning the loss function with the prediction task and business cost. This is not a request for the conditional mean; operations needs an upper-demand forecast that reduces costly understaffing. A high-quantile pinball loss, such as a 0.8 or 0.9 quantile chosen from the stated service objective, directly trains the model to estimate a conditional quantile rather than an average. Validation should then check empirical coverage and operational cost, not only generic error metrics. Symmetric losses can be useful for typical-value forecasting, but they do not encode the stated asymmetry.
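A small sketch of pinball loss shows why it targets a quantile rather than the mean. The demand values and the helper `best_constant` are hypothetical. The quantile level 0.8 is one reasonable reading of the 4:1 cost ratio under an assumed linear cost model (under-forecast cost divided by the sum of both costs), not a value fixed by the scenario.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-forecasts are weighted by q, over-forecasts by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

def best_constant(y, q):
    """Observed value that minimizes pinball loss when used as a constant forecast."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, [c] * len(y), q))

# Hypothetical right-skewed daily order volumes for one fulfillment center.
demand = [80, 90, 95, 100, 105, 110, 120, 130, 140, 180, 260]

q = 4 / (4 + 1)  # assumed linear costs: understaffing 4x as costly as overstaffing
forecast_q = best_constant(demand, q)
forecast_median = best_constant(demand, 0.5)
coverage = sum(d <= forecast_q for d in demand) / len(demand)

print(forecast_q, forecast_median, round(coverage, 3))
```

The high-quantile forecast sits well above the median forecast and covers roughly the intended share of days, which is the behavior that the empirical coverage metric in validation is meant to confirm.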
Topic: Machine Learning
A financial services team is selecting a fraud triage model. Compliance requires investigator-facing reason codes, and the scoring service must keep p95 latency under 75 ms. The current baseline ROC-AUC is 0.78; the target is at least 0.82.
Exhibit: Validation and deployment summary
| Model | ROC-AUC | p95 latency | Explanation support | Ops complexity |
|---|---|---|---|---|
| Single decision tree | 0.79 | 9 ms | native path rules | low |
| Random forest | 0.83 | 42 ms | local reason codes in batch | medium |
| Gradient boosting | 0.85 | 118 ms | SHAP job required | high |
| Stacked ensemble | 0.86 | 210 ms | inconsistent across layers | very high |
Which conclusion is best supported by the exhibit?
Options:
A. Select the stacked ensemble because it performs best.
B. Select the single decision tree for maximum interpretability.
C. Select gradient boosting because it has higher ROC-AUC.
D. Select the random forest as the best trade-off.
Best answer: D
Explanation: Ensemble selection should consider performance, interpretability, and operational fit together. The random forest is the only model shown that clears the target ROC-AUC of 0.82, stays under the 75 ms p95 latency limit, and still supports investigator-facing reason codes. It is less transparent than a single decision tree, but the tree's 0.79 ROC-AUC misses the performance target. Gradient boosting (118 ms) and the stacked ensemble (210 ms) both exceed the latency limit, and their explanation support is weaker: gradient boosting requires a separate SHAP job, and the stacked ensemble's explanations are inconsistent across layers. In regulated or investigator-assisted workflows, a small AUC gain is not automatically worth added complexity when deployment constraints are missed.
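The selection logic amounts to filtering on hard constraints before comparing scores. The sketch below encodes the exhibit rows; the boolean `reason_codes` field is an assumed simplification of the "Explanation support" column, with the stacked ensemble's "inconsistent across layers" treated as not meeting the requirement.

```python
# Exhibit rows; reason_codes is an assumed yes/no reading of "Explanation support".
candidates = [
    {"model": "Single decision tree", "auc": 0.79, "p95_ms": 9, "reason_codes": True},
    {"model": "Random forest", "auc": 0.83, "p95_ms": 42, "reason_codes": True},
    {"model": "Gradient boosting", "auc": 0.85, "p95_ms": 118, "reason_codes": True},
    {"model": "Stacked ensemble", "auc": 0.86, "p95_ms": 210, "reason_codes": False},
]

TARGET_AUC = 0.82     # "at least 0.82"
MAX_P95_MS = 75       # "under 75 ms"

def meets_requirements(c):
    """Hard constraints first; accuracy is only a tiebreaker among survivors."""
    return c["auc"] >= TARGET_AUC and c["p95_ms"] < MAX_P95_MS and c["reason_codes"]

feasible = [c["model"] for c in candidates if meets_requirements(c)]
best_auc_overall = max(candidates, key=lambda c: c["auc"])["model"]

print("Feasible:", feasible)
print("Highest AUC, ignoring constraints:", best_auc_overall)
```

Treating constraints as a filter rather than as terms in a weighted score keeps a high-AUC model from buying its way past a latency or compliance failure.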
Topic: Machine Learning
A healthcare network is building a triage model for routing patients to outreach programs. Clinicians require a small set of human-readable rules they can review, and EDA shows that risk changes sharply at different thresholds for age, recent visits, and lab values. Which model family best maps to these requirements?
Options:
A. k-nearest neighbors
B. Decision tree
C. Gradient-boosted trees
D. Logistic regression
Best answer: B
Explanation: A decision tree is the best fit when the business requirement emphasizes transparent, reviewable decision rules and the data suggests non-linear split points. Trees recursively partition the feature space, so they can capture threshold effects such as different risk above a lab-value cutoff or within an age band without requiring manual interaction terms. The resulting path from root to leaf is also easier for clinicians or compliance reviewers to inspect than distance-based or ensemble behavior. A boosted tree model may improve predictive performance, but it sacrifices the simple single-tree rule structure requested in the scenario.
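How a tree finds a threshold can be sketched with a single split chosen by Gini impurity. The ages, labels, and the helper `best_split` are hypothetical, and a real tree would repeat this search recursively over several features.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between neighboring distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical patients: risk jumps sharply above a certain age.
ages = [34, 41, 48, 52, 63, 67, 71, 78]
high_risk = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(ages, high_risk)
print(f"Rule: age <= {threshold} -> low risk, otherwise high risk (impurity {impurity})")
```

The output is already a human-readable rule, which is the property clinicians asked for; a logistic regression would need a manually engineered indicator to express the same jump.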
Topic: Machine Learning
A facilities analytics team needs a model to predict monthly maintenance cost for each production line and explain how cost changes with operating conditions. Which supervised learning method is best supported by the exhibit?
Exhibit: EDA and requirements summary
| Item | Finding |
|---|---|
| Target | Monthly cost in USD |
| Predictors | Run hours, machine age, load, temperature |
| EDA pattern | Mostly linear trends with cost |
| Stakeholder need | Explain marginal effect of each predictor |
Options:
A. Multiple linear regression
B. Logistic regression
C. K-means clustering
D. Random forest classifier
Best answer: A
Explanation: Multiple linear regression fits the scenario because the target is a continuous numeric outcome: monthly maintenance cost. The exhibit also states that predictor relationships are mostly linear and stakeholders need to explain the marginal effect of each predictor. A linear regression coefficient can be interpreted as the expected change in cost for a one-unit change in a predictor, holding other predictors constant. That directly supports both prediction and explainability. A more complex model could be considered later if residual analysis shows nonlinearity or interactions, but the provided evidence supports starting with an interpretable linear model.
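The "holding other predictors constant" interpretation can be checked on a tiny example. The sketch below fits ordinary least squares with plain Python on synthetic, noise-free data generated from assumed coefficients (500 base cost, 2 USD per run hour, 30 USD per year of machine age); the data, the two-predictor setup, and the helper `fit_ols` are illustrative only.

```python
def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic rows: (run_hours, machine_age_years).
X = [(100, 1), (150, 3), (200, 2), (250, 5), (300, 4)]
y = [500 + 2 * hours + 30 * age for hours, age in X]

intercept, per_run_hour, per_year_of_age = fit_ols(X, y)

def predict(hours, age):
    return intercept + per_run_hour * hours + per_year_of_age * age

# One extra run hour, same machine age: prediction moves by the run-hour coefficient.
marginal = predict(201, 2) - predict(200, 2)
print(round(per_run_hour, 6), round(per_year_of_age, 6), round(marginal, 6))
```

On real data the coefficients carry estimation error, so the marginal effects would normally be reported with confidence intervals and checked against residual plots.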
Use the CompTIA DataAI DY0-001 Practice Test page for the full IT Mastery practice bank, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.