Free Python Institute PCEI Practice Questions: ML Fundamentals

Last revised: July 25, 2026

Practice 10 free questions on ML Fundamentals for the Python Institute PCEI - Certified Entry-Level AI Specialist with Python (PCEI-30-01) exam, with answers and explanations, then continue with mixed practice in IT Mastery.

Try the IT Mastery web app for a richer interactive practice experience with mixed sets, timed mocks, topic drills, explanations, and progress tracking.

Try Python Institute PCEI on Web

Topic snapshot

Field	Detail
Practice target	Python Institute PCEI
Topic area	Block 2: Machine Learning Fundamentals
Blueprint weight	16.5%
Page purpose	Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Block 2: Machine Learning Fundamentals for Python Institute PCEI. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass	What to do	What to record
First attempt	Answer without checking the explanation first.	The fact, rule, calculation, or judgment point that controlled your answer.
Review	Read the explanation even when you were correct.	Why the best answer is stronger than the closest distractor.
Repair	Repeat only missed or uncertain items after a short break.	The pattern behind misses, not the answer letter.
Transfer	Return to mixed practice once the topic feels stable.	Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 16.5% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These are original IT Mastery practice questions aligned to this topic area. They are not official Python Institute questions, copied live-exam content, or exam dumps. Use them to preview question style and explanation depth before continuing with topic drills, mixed sets, and timed mocks in IT Mastery.

Question 1

Topic: Block 2: Machine Learning Fundamentals

A beginner classifier uses nearest-neighbor classification with k = 1. A smaller distance means the training example is more similar to the new message.

Exhibit: Distances to the new message

Training message	Known label	Distance
Message A	not spam	3.2
Message B	spam	1.1
Message C	not spam	2.4
Message D	spam	4.0

Which label should the classifier assign to the new message?

Options:

A. Assign not spam
B. Assign spam
C. Average the labels first
D. Wait for more training rows

Best answer: B

Explanation: Nearest-neighbor classification assigns a label by comparing a new item to labeled training examples. With k = 1, only the single closest training example matters. In the exhibit, the smallest distance is 1.1 for Message B, so the new message receives Message B’s known label. The other distances are larger and do not affect the decision when k = 1.

The key takeaway is that smaller distance means greater similarity, and k = 1 uses only the nearest labeled example.

Not spam fails because the closest not-spam example has distance 2.4, which is farther than Message B.
More rows fails because the provided labeled examples and distances are enough for a k = 1 decision.
Averaging labels fails because nearest-neighbor classification selects labels from nearby examples, not by averaging text labels.
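
As a minimal sketch of the k = 1 decision, the exhibit rows can be placed in a list and the closest one picked with min(). The variable names are illustrative and are not part of the question.

```python
# Exhibit rows as (training message, known label, distance to the new message).
neighbors = [
    ("Message A", "not spam", 3.2),
    ("Message B", "spam", 1.1),
    ("Message C", "not spam", 2.4),
    ("Message D", "spam", 4.0),
]

# With k = 1, only the single closest training example decides the label.
nearest = min(neighbors, key=lambda row: row[2])
predicted_label = nearest[1]

print(nearest[0], predicted_label)  # Message B spam
```

Changing any of the larger distances leaves the result unchanged, as long as 1.1 stays the smallest value.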

Question 2

Topic: Block 2: Machine Learning Fundamentals

A small clinic has 800 past appointment records. Each record includes features such as day of week, appointment type, and patient age group, plus a known label: missed or attended. The team wants a Python model that predicts missed or attended for new appointments. A coworker proposes using k-means clustering because it can group similar records. What is the best judgment about this proposal?

Options:

A. Use linear regression because the model predicts the future
B. Use a supervised classification algorithm instead
C. Use k-means because the records have similar features
D. Remove the labels so k-means can learn without bias

Best answer: B

Explanation: This is an algorithm-fit question. The clinic already has labeled examples (missed or attended) and wants the same kind of label for new records. That is a supervised classification task: the model learns from input features paired with known class labels. K-means clustering is unsupervised and is used to find groups when labels are not provided or when the goal is exploration, not direct prediction of a known class. Regression is also mismatched because the desired output is not a continuous number. The key takeaway is to match the algorithm family to the available labels and the desired output type.

Similarity alone is not enough for k-means when the goal is to predict an existing label.
Regression wording fails because “predict” can mean class prediction, not only numeric forecasting.
Removing labels wastes useful training information and changes a supervised task into an unsupervised one.

Question 3

Topic: Block 2: Machine Learning Fundamentals

A beginner ML script compares a new point with one known point using Euclidean distance. The formula is \(d = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}\). Which value does the script print? Select ONE.

import math

new_point = (2, 3)
known_point = (5, 7)

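# Use the stored points instead of repeating their values by hand.
distance = math.sqrt((known_point[0] - new_point[0])**2 + (known_point[1] - new_point[1])**2)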
print(distance)

Options:

A. 25.0
B. 7.0
C. 5.0
D. 1.0

Best answer: C

Explanation: Euclidean distance is the straight-line distance between two numeric points. For the points (2, 3) and (5, 7), the x-values differ by 3 and the y-values differ by 4. Squaring those differences gives 9 and 16, and their sum is 25. The square root of 25 is 5. Because math.sqrt() returns a floating-point number in Python, the printed output is shown with a decimal as 5.0. The key idea is to square the coordinate differences before adding them, then take the square root.

Choosing 7.0 adds the coordinate differences (3 + 4) instead of squaring them first, so it does not follow the Euclidean distance formula.
Stopping at 25.0 uses the squared distance but misses the final square root step.
Choosing 1.0 compares the two coordinate differences (4 - 3) rather than computing the straight-line distance.
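
The same arithmetic can be broken into named steps to show where the squared distance and the final answer come from. This is a sketch using the points from the question; the intermediate variable names are illustrative, and math.dist is only available in Python 3.8 or later.

```python
import math

new_point = (2, 3)
known_point = (5, 7)

dx = known_point[0] - new_point[0]  # 5 - 2 = 3
dy = known_point[1] - new_point[1]  # 7 - 3 = 4

squared = dx**2 + dy**2         # 25: squared distance, square root not yet taken
euclidean = math.sqrt(squared)  # 5.0: the straight-line distance

# math.dist (Python 3.8+) computes the same Euclidean distance in one call.
shortcut = math.dist(new_point, known_point)

print(squared, euclidean, shortcut)  # 25 5.0 5.0
```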

Question 4

Topic: Block 2: Machine Learning Fundamentals

A beginner AI project has this note:

System: warehouse robot simulator
Behavior: tries different paths to a pickup point
Feedback: +5 for reaching the item, -2 for hitting a wall
Result: after many trials, it chooses shorter paths more often

Select ONE: Which type of machine learning does this example best illustrate?

Options:

A. Reinforcement learning
B. Rule-based programming
C. Supervised learning
D. Unsupervised learning

Best answer: A

Explanation: Reinforcement learning is used when an agent learns by taking actions in an environment and receiving feedback as rewards or penalties. In the note, the robot simulator tries paths, receives positive feedback for reaching the item, and receives negative feedback for hitting a wall. Over repeated trials, it changes its behavior toward better paths.

Supervised learning would require labeled examples, such as paths already marked as “good” or “bad.” Unsupervised learning would look for patterns in unlabeled data, not learn from action-based rewards.

Supervised learning is tempting because the system improves, but no labeled training examples are provided.
Unsupervised learning does not fit because the system is not grouping or discovering structure in unlabeled data.
Rule-based programming would follow fixed instructions rather than improving through trial-and-feedback.
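
The trial-and-feedback loop can be sketched with a simple epsilon-greedy learner. This is a deliberately reduced, hypothetical version of the note: the two action names, the fixed reward per action, the exploration rate, and the trial count are all assumptions made for illustration, not details of the simulator.

```python
import random

# Hypothetical two-action simplification: each "path" is one action with a
# fixed reward (+5 for reaching the item, -2 for hitting a wall).
REWARDS = {"short_path": 5, "wall_path": -2}
EPSILON = 0.1  # assumed exploration rate

rng = random.Random(0)  # fixed seed so runs are repeatable
values = {action: 0.0 for action in REWARDS}  # estimated reward per action
counts = {action: 0 for action in REWARDS}    # times each action was tried


def try_action(action):
    """Take an action, observe its reward, and update the running average."""
    reward = REWARDS[action]
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]


# Try every action once so each one has an initial estimate.
for action in REWARDS:
    try_action(action)

for _ in range(200):
    if rng.random() < EPSILON:
        action = rng.choice(list(REWARDS))    # explore a random action
    else:
        action = max(values, key=values.get)  # exploit the best estimate so far
    try_action(action)

best = max(values, key=values.get)
print(best, values, counts)
```

No labeled examples appear anywhere in the loop. The only learning signal is the reward received after each action, which is what separates this from supervised learning.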

Question 5

Topic: Block 2: Machine Learning Fundamentals

A beginner ML exercise uses fixed thresholds to classify machine readings before any model is trained. Interpret the Python logic shown.

readings = [
    {"id": "A", "temp": 72, "vibration": 4},
    {"id": "B", "temp": 66, "vibration": 6},
    {"id": "C", "temp": 81, "vibration": 3},
]

def classify(r):
    if r["temp"] >= 80 or r["vibration"] >= 7:
        return "urgent"
    elif r["temp"] >= 70 or r["vibration"] >= 5:
        return "watch"
    else:
        return "normal"

labels = {r["id"]: classify(r) for r in readings}
print(labels)

Which output is produced?

Options:

A. {'A': 'urgent', 'B': 'watch', 'C': 'urgent'}
B. {'A': 'watch', 'B': 'normal', 'C': 'urgent'}
C. {'A': 'watch', 'B': 'watch', 'C': 'urgent'}
D. {'A': 'normal', 'B': 'watch', 'C': 'urgent'}

Best answer: C

Explanation: An if/elif/else chain returns the label from the first branch whose condition is true. The urgent rule is checked first and requires temp >= 80 or vibration >= 7. Reading C has temp 81, so it is urgent. Readings A and B do not meet either urgent threshold, so Python checks the elif: A has temp 72 (>= 70), and B has vibration 6 (>= 5), so both are watch. The else branch runs only when neither the urgent nor the watch condition is met.

A as normal fails because temp 72 satisfies the watch threshold.
B as normal fails because vibration 6 satisfies the watch threshold.
A as urgent fails because neither temp >= 80 nor vibration >= 7 is true for A.

Question 6

Topic: Block 2: Machine Learning Fundamentals

A beginner AI team records this project note:

Goal: Group customers with similar buying patterns.
Data: Past purchases and visit counts for each customer.
Provided answers: No category names or target labels are included.
Expected result: Customer groups for later marketing review.

Which type of machine learning does this task describe?

Options:

A. Supervised learning
B. Rule-based classification
C. Unsupervised learning
D. Reinforcement learning

Best answer: C

Explanation: Unsupervised learning is used when the data has inputs but no provided correct answers, labels, or target values. In this note, the team wants the model to discover customer groups from purchase and visit patterns. Because the expected result is a set of groups rather than predictions against known labels, this is a clustering-style unsupervised task. Supervised learning would require examples such as “customer type = budget buyer,” and reinforcement learning would involve an agent learning from rewards after actions.

Supervised label trap fails because the note explicitly says no category names or target labels are provided.
Reward-system trap fails because there is no agent taking actions and receiving rewards.
Rules trap fails because the note describes discovering groups from data, not applying manually written if/then rules.
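
A clustering-style task like this can be sketched with a small hand-written k-means loop. The customer values and the hand-picked starting centroids below are hypothetical and chosen only to make the two groups obvious; a real project would more likely use a library implementation.

```python
import math

# Hypothetical unlabeled customers as (purchases, visits); no target column exists.
customers = [(1, 2), (2, 1), (1, 1), (9, 10), (10, 9), (10, 10)]


def kmeans(points, centroids, rounds=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    assignments = []
    for _ in range(rounds):
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids


# Starting centroids are picked by hand here; real code would usually randomize.
groups, centers = kmeans(customers, [customers[0], customers[-1]])
print(groups)  # [0, 0, 0, 1, 1, 1]
```

The output is a list of group numbers, not category names. Deciding what each group means is left to the later marketing review.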

Question 7

Topic: Block 2: Machine Learning Fundamentals

A team is preparing a small supervised ML dataset to predict support-ticket priority. The priority label must be one of Low, Medium, or High, and days_open must be numeric. Before splitting the data into training and test sets, you inspect this sample:

ticket_id,days_open,customer_tier,priority
T101,2,Gold,High
T102,,Silver,Low
T103,three,Bronze,Medium
T104,5,Gold,High
T104,5,Gold,Low
T105,999,Silver,Urgent

Which is the best next action?

Options:

A. Convert every column to text so all values have the same type
B. Investigate and clean the missing value, type error, outlier, duplicate, and invalid label
C. Remove only the repeated ticket_id row and keep the rest unchanged
D. Split the data now because the model can learn around noisy rows

Best answer: B

Explanation: Data-quality checks should happen before training and usually before the final train/test split, so the team understands what data the model will learn from. This sample has several common issues: a missing days_open value, a nonnumeric value (three) in a numeric field, a likely outlier (999 days), a duplicate ticket_id with conflicting labels, and an invalid label (Urgent) outside the allowed label set. The best action is to investigate and clean or document these issues using consistent rules, rather than letting them silently affect training or evaluation. Cleaning only one issue would leave other problems that can distort model behavior and metrics.

Training immediately ignores that noisy labels and invalid feature values can harm both learning and evaluation.
Text conversion hides the numeric-type problem instead of fixing whether days_open is valid numeric data.
Duplicate-only cleanup addresses one visible issue but leaves the missing value, the type error, the outlier, and the invalid label unresolved.
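
The checks described above can be sketched with the standard csv module. The allowed label set comes from the question; the 365-day outlier threshold is a hypothetical cutoff chosen for illustration, and the script only reports issues rather than deciding how to fix them.

```python
import csv
import io
from collections import Counter

SAMPLE = """ticket_id,days_open,customer_tier,priority
T101,2,Gold,High
T102,,Silver,Low
T103,three,Bronze,Medium
T104,5,Gold,High
T104,5,Gold,Low
T105,999,Silver,Urgent
"""

ALLOWED_LABELS = {"Low", "Medium", "High"}
MAX_DAYS_OPEN = 365  # hypothetical outlier threshold; choose one that fits the data

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
id_counts = Counter(row["ticket_id"] for row in rows)
issues = []

for row in rows:
    ticket, days = row["ticket_id"], row["days_open"]
    if days == "":
        issues.append((ticket, "missing days_open"))
    elif not days.isdigit():
        issues.append((ticket, "non-numeric days_open"))
    elif int(days) > MAX_DAYS_OPEN:
        issues.append((ticket, "possible outlier"))
    if row["priority"] not in ALLOWED_LABELS:
        issues.append((ticket, "invalid label"))

# Report each repeated ticket_id once, after the per-row checks.
issues += [(t, "duplicate ticket_id") for t, n in id_counts.items() if n > 1]

for issue in issues:
    print(issue)
```

Running it flags T102, T103, T105 (twice), and T104, which matches the five issues listed in the explanation.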

Question 8

Topic: Block 2: Machine Learning Fundamentals

A beginner ML team is building a model to classify support tickets as billing, technical, or account.

Workflow note:

1. Exported 2,000 past support tickets from the help desk system
2. Removed duplicate tickets and fixed missing category labels
3. Used the cleaned labeled tickets to build a classification model

According to the basic machine learning workflow, what should the team do next?

Options:

A. Clean the same labels again
B. Use the model for live inference immediately
C. Evaluate the model on test data
D. Collect the original tickets again

Best answer: C

Explanation: The basic machine learning workflow usually follows this order: data collection, data cleaning, training, evaluation, and inference. In the note, the team has already collected past tickets, cleaned the dataset, and trained a classification model. The next step is evaluation: checking the trained model on data not used for training to estimate how well it performs. Only after evaluation shows acceptable results should the team use the model for inference, such as classifying new live tickets. The key distinction is that training builds the model, while evaluation checks whether the trained model is reliable enough to use.

Collecting again repeats an earlier step and is not the normal next step after training.
Cleaning again may happen if problems are found, but the workflow note says cleaning has already been completed.
Immediate inference skips evaluation, so the team would not know whether the model performs acceptably.
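
The train, evaluate, then infer order can be sketched with a toy keyword lookup standing in for the classifier. The ticket rows, the keyword feature, the fixed split, and the default category are all invented for illustration and are not part of the workflow note.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned tickets as (main keyword, category); invented for illustration.
tickets = [
    ("invoice", "billing"), ("login", "account"), ("crash", "technical"),
    ("invoice", "billing"), ("crash", "technical"), ("login", "account"),
    ("refund", "billing"), ("crash", "technical"), ("password", "account"),
    ("invoice", "billing"),
]

# Hold out the last 3 rows; a real project would shuffle before splitting.
train, test = tickets[:7], tickets[7:]


def fit(rows):
    """Training: remember the most common category seen for each keyword."""
    seen = defaultdict(Counter)
    for keyword, category in rows:
        seen[keyword][category] += 1
    return {keyword: counts.most_common(1)[0][0] for keyword, counts in seen.items()}


def predict(model, keyword, default="technical"):
    """Inference: look up the keyword, falling back to a default category."""
    return model.get(keyword, default)


model = fit(train)

# Evaluation: score only on rows the model never saw during training.
correct = sum(predict(model, keyword) == category for keyword, category in test)
accuracy = correct / len(test)
print(f"accuracy on held-out tickets: {accuracy:.2f}")  # 0.67
```

The held-out row with the unseen keyword "password" is misclassified, which is the kind of weakness evaluation is meant to expose before the model handles live tickets.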

Question 9

Topic: Block 2: Machine Learning Fundamentals

A beginner model checks products for defects. The positive class is defect.

Item	Actual class	Model prediction
A	defect	defect
B	no defect	defect
C	defect	no defect
D	no defect	no defect

Which statement correctly interprets the model results? Select ONE.

Options:

A. A is a false positive; D is a false negative.
B. B is a false negative; C is a false positive.
C. B is a false positive; C is a false negative.
D. A is a true negative; D is a true positive.

Best answer: C

Explanation: In classification metrics, “positive” means the class being detected: here, defect. A false positive happens when the model predicts the positive class but the actual class is negative. Item B is actually no defect but was predicted as defect, so it is a false positive. A false negative happens when the model predicts the negative class but the actual class is positive. Item C is actually defect but was predicted as no defect, so it is a false negative.

The key is to compare each prediction with the actual class and keep track of which class is defined as positive.

Swapped errors fails because B is predicted positive, not negative, and C is predicted negative, not positive.
Correct predictions as errors fails because A and D match their actual classes.
Reversed true labels fails because A is true positive and D is true negative under the positive class defect.
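
The comparison of actual class and prediction can be sketched as a small helper that names the outcome for each exhibit row. The function and variable names are illustrative.

```python
POSITIVE = "defect"

# Exhibit rows as (item, actual class, model prediction).
results = [
    ("A", "defect", "defect"),
    ("B", "no defect", "defect"),
    ("C", "defect", "no defect"),
    ("D", "no defect", "no defect"),
]


def outcome(actual, predicted):
    """Name the confusion-matrix cell for one prediction."""
    if predicted == POSITIVE:
        return "true positive" if actual == POSITIVE else "false positive"
    return "false negative" if actual == POSITIVE else "true negative"


outcomes = {item: outcome(actual, predicted) for item, actual, predicted in results}
print(outcomes)
# {'A': 'true positive', 'B': 'false positive', 'C': 'false negative', 'D': 'true negative'}
```

The second word of each outcome comes from the prediction, and the first word says whether that prediction matched the actual class.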

Question 10

Topic: Block 2: Machine Learning Fundamentals

A beginner team wants to predict whether a new support ticket should be assigned to billing or technical. They propose using k-means on old tickets.

Project note:

Old data fields: word_count, has_invoice_number, has_error_code, assigned_team
Example assigned_team values: billing, technical
Proposed output needed: billing or technical
Proposed algorithm: k-means clustering

What does this evidence show? Select ONE.

Options:

A. The algorithm is well matched because k-means predicts labels directly.
B. The label column should be removed because it prevents learning.
C. The task should use regression because there are two possible outputs.
D. The algorithm is mismatched; use supervised classification.

Best answer: D

Explanation: This is an algorithm-fit issue. The project already has labeled examples in assigned_team, and the desired result for a new ticket is one of the known categories: billing or technical. That makes the task supervised classification. K-means is an unsupervised clustering algorithm; it groups similar records but does not learn from the provided class labels or directly predict meaningful category names. A classifier such as a decision tree, k-nearest neighbors classifier, or similar supervised method would match the task better.

The key takeaway is to match the algorithm family to the data and output: labeled category prediction calls for classification, not clustering.

K-means label prediction is misleading because cluster numbers are not the same as trained category labels.
Removing the label would discard the field that makes supervised learning possible.
Regression is for predicting numeric values, not choosing between named teams.
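
A supervised alternative can be sketched with a small k-nearest neighbors classifier that learns from assigned_team. The training rows are hypothetical, and word_count is left out so the sketch does not need feature scaling; only the two yes/no fields from the project note are used.

```python
import math
from collections import Counter

# Hypothetical labeled tickets: ((has_invoice_number, has_error_code), assigned_team).
training = [
    ((1, 0), "billing"),
    ((1, 0), "billing"),
    ((1, 1), "billing"),
    ((0, 1), "technical"),
    ((0, 1), "technical"),
    ((0, 0), "technical"),
]


def knn_predict(features, rows, k=3):
    """Vote among the k labeled rows closest to the new ticket."""
    nearest = sorted(rows, key=lambda row: math.dist(features, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((1, 0), training))  # billing
print(knn_predict((0, 1), training))  # technical
```

Unlike a cluster number from k-means, the returned value is one of the known team names because the labels were used during training.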

Continue in the web app

Use IT Mastery for interactive Python Institute PCEI practice with mixed sets, timed mocks, topic drills, explanations, and progress tracking.
