AWS AIF-C01: Fundamentals of AI and ML

Try 10 focused AWS AIF-C01 questions on Fundamentals of AI and ML, with explanations, then continue with IT Mastery.


Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Try AWS AIF-C01 on Web
View full AWS AIF-C01 practice page

Topic snapshot

Field | Detail
Exam route | AWS AIF-C01
Topic area | Fundamentals of AI and ML
Blueprint weight | 20%
Page purpose | Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate Fundamentals of AI and ML for AWS AIF-C01. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass | What to do | What to record
First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer.
Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor.
Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter.
Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 20% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.

Question 1

Topic: Fundamentals of AI and ML

A team is building an NLP feature to process customer emails.

Exhibit: Feature requirements (excerpt)

1 Input: "EmailBody" (free-form text)
2 Output A: "Sentiment" = {positive, neutral, negative}
3 Output B: "Entities" = {OrderId, ProductName, DeliveryDate}
4 Output C: "Summary" = 3-bullet recap of the email

Which set of NLP tasks best matches Outputs A, B, and C?

Options:

  • A. Classification, extraction, summarization

  • B. Summarization, regression, speech recognition

  • C. Extraction, image segmentation, anomaly detection

  • D. Translation, clustering, topic modeling

Best answer: A

Explanation: NLP is a branch of AI focused on understanding and generating human language text. In the exhibit, a sentiment label (line 2) indicates classification, extracting specific fields like OrderId and DeliveryDate (line 3) indicates extraction, and producing a short recap (line 4) indicates summarization.

Natural language processing (NLP) refers to AI techniques that work with human language, such as interpreting text and generating text. Using only the exhibit:

  • Line 2 asks for a single category from a fixed set (positive/neutral/negative), which is a text classification task.
  • Line 3 asks to pull specific real-world fields (OrderId, ProductName, DeliveryDate) out of unstructured text, which is information extraction (often entity extraction).
  • Line 4 asks to create a shorter representation of the email, which is summarization.

A useful rule of thumb is: labels imply classification, pulled fields imply extraction, and condensed text implies summarization.

  • Translation/unsupervised tasks don’t match because no language conversion or grouping/discovery is requested in lines 2–4.
  • Regression doesn’t fit because the exhibit asks for discrete categories and fields, not a continuous numeric value.
  • Non-NLP modalities like speech recognition or image segmentation are not relevant because the input is text-only (line 1).
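
The rule of thumb above can be sketched as a small Python helper. The `nlp_task_for` function and its output-spec keys are hypothetical, purely a study aid, not an AWS API:

```python
def nlp_task_for(output_spec: dict) -> str:
    """Map a desired output shape to the matching NLP task.

    output_spec keys (hypothetical schema for this sketch):
      - "labels": a fixed set of allowed categories, if any
      - "fields": named values to pull out of the text, if any
      - "condensed": True when the output is a shorter version of the input
    """
    if output_spec.get("labels"):      # fixed label set -> classification
        return "classification"
    if output_spec.get("fields"):      # named fields -> extraction
        return "extraction"
    if output_spec.get("condensed"):   # shorter text -> summarization
        return "summarization"
    return "unknown"

# The three outputs from the exhibit:
task_a = nlp_task_for({"labels": {"positive", "neutral", "negative"}})
task_b = nlp_task_for({"fields": ["OrderId", "ProductName", "DeliveryDate"]})
task_c = nlp_task_for({"condensed": True})
```

Applying the helper to Outputs A, B, and C reproduces the best answer: classification, extraction, summarization.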

Question 2

Topic: Fundamentals of AI and ML

A team is deciding which ML paradigm to use for several new features in an AWS environment (for example, using Amazon SageMaker AI). Which TWO proposed matches of learning paradigm to scenario are INCORRECT? (Select TWO.)

Options:

  • A. Reinforcement learning to optimize an online offer based on conversion reward

  • B. Supervised learning to discover themes in unlabeled support tickets

  • C. Supervised learning to predict fraud using labeled past transactions

  • D. Unsupervised learning to segment customers by similar behavior patterns

  • E. Reinforcement learning to learn navigation by maximizing a reward score

  • F. Unsupervised learning to choose game actions to maximize long-term points

Correct answers: B and F

Explanation: Supervised learning uses labeled input-output examples, unsupervised learning finds structure in unlabeled data, and reinforcement learning learns actions through rewards. The incorrect proposals either assume labels exist when they do not, or use a paradigm that cannot represent sequential decision-making and reward maximization. Correctly matching the paradigm helps ensure the model can be validated and governed appropriately.

The core distinction is the learning signal. Supervised learning trains on labeled examples (a known target) to predict outcomes like fraud/not-fraud. Unsupervised learning has no target labels and instead discovers patterns such as clusters or topics in text. Reinforcement learning (RL) learns a policy for taking actions in an environment by maximizing a reward over time.

When a use case involves unlabeled “theme discovery,” supervised learning is a mismatch because there is no ground-truth label to train and evaluate against. When a use case involves choosing actions to maximize long-term points, unsupervised learning is a mismatch because it does not model rewards, actions, or sequential feedback. A close distractor is using RL for online optimization (for example, conversions), which fits because a reward signal is available.

  • Fraud prediction is a classic supervised problem because historical labels provide a target.
  • Customer segmentation fits unsupervised learning because the goal is grouping without predefined labels.
  • Navigation with rewards fits reinforcement learning because behavior is learned from reward feedback.
  • Online offer optimization can fit reinforcement learning (or bandits) when conversions define the reward.
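
To make the reward signal concrete, here is a toy epsilon-greedy bandit for the online-offer case. The conversion rates are exaggerated so the example converges quickly; this illustrates reward-driven learning, not a production RL design:

```python
import random

def epsilon_greedy(conversion_rates, steps=3000, epsilon=0.1, seed=42):
    """Learn which offer converts best purely from reward feedback.

    conversion_rates: the true (unknown to the learner) conversion
    probability per offer; the learner only sees per-step 0/1 rewards.
    """
    rng = random.Random(seed)
    counts = [0] * len(conversion_rates)
    values = [0.0] * len(conversion_rates)   # running mean reward per offer
    for _ in range(steps):
        if rng.random() < epsilon:           # explore a random offer
            arm = rng.randrange(len(conversion_rates))
        else:                                # exploit the current best estimate
            arm = values.index(max(values))
        reward = 1 if rng.random() < conversion_rates[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return values.index(max(values))

# Offer 1 truly converts best; the learner discovers this from rewards alone.
best_offer = epsilon_greedy([0.1, 0.9])
```

No labels exist here, and no clustering is performed: the only training signal is the reward, which is exactly why this is an RL-style problem rather than supervised or unsupervised learning.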

Question 3

Topic: Fundamentals of AI and ML

A healthcare support center wants to modernize its contact center workflow on AWS. The center needs to convert recorded phone calls to text, translate transcripts for bilingual agents, and analyze transcripts to detect sentiment and key topics. The transcripts can contain PHI and must be protected.

Which recommendation is INCORRECT?

Options:

  • A. Use Amazon Comprehend for sentiment and key phrase detection.

  • B. Use Amazon Polly for sentiment analysis; store transcripts in public S3.

  • C. Use Amazon Translate to translate transcripts for agents.

  • D. Use Amazon Transcribe for call audio speech-to-text.

Best answer: B

Explanation: Amazon Polly is a text-to-speech service and does not perform sentiment analysis. Storing PHI-containing transcripts in a public Amazon S3 bucket is a security and governance anti-pattern because it can expose sensitive data. The other recommendations correctly match AWS AI services to the stated use cases.

The core skill is matching AWS AI services to the right task while following basic data-protection principles for sensitive data. Use Amazon Transcribe to convert call audio into text. Use Amazon Translate to translate the resulting text into other languages. Use Amazon Comprehend to analyze text for sentiment and key phrases.

For PHI, a key governance principle is data confidentiality: keep transcripts private, encrypted, and access-controlled (for example, least-privilege IAM and S3 bucket policies, and encryption with AWS KMS). Using Amazon Polly for sentiment is also a functional mismatch, because Polly is designed to synthesize speech from text.

A wrong service choice combined with public data exposure makes the incorrect recommendation clearly unacceptable.

  • Speech-to-text mapping is appropriate because Transcribe is designed to convert audio to text.
  • Translation mapping is appropriate because Translate provides machine translation for text.
  • Text analytics mapping is appropriate because Comprehend performs sentiment and key phrase extraction.
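
The service-to-task mappings above can be captured as a simple lookup. The `recommend` helper is hypothetical study-aid code, not an AWS API:

```python
# Illustrative task-to-service map for this scenario.
SERVICE_FOR_TASK = {
    "speech_to_text": "Amazon Transcribe",
    "translation": "Amazon Translate",
    "sentiment_and_key_phrases": "Amazon Comprehend",
    "text_to_speech": "Amazon Polly",   # synthesizes speech; NOT sentiment analysis
}

def recommend(task: str) -> str:
    """Return the managed service matching a task, for drill purposes."""
    return SERVICE_FOR_TASK.get(task, "no managed match in this scenario")
```

The incorrect option pairs Polly with sentiment, which the map shows is a functional mismatch before the public-S3 problem is even considered.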

Question 4

Topic: Fundamentals of AI and ML

A bank uses an ML model to recommend credit line increases. During an internal review, the compliance team finds that approval rates differ noticeably across customer demographic groups, and regulators require evidence that decisions are fair and explainable. The bank must keep sensitive customer data in its AWS account and wants an AWS-managed way to assess and reduce potential bias without building a custom evaluation pipeline.

Which solution BEST addresses these requirements?

Options:

  • A. Encrypt training data with AWS KMS to ensure the model is fair

  • B. Use an unsupervised clustering model to group customers by similarity

  • C. Use Amazon Comprehend to detect sentiment in application notes

  • D. Use Amazon SageMaker Clarify to measure bias and explain predictions

Best answer: D

Explanation: Bias is a systematic skew in data or model behavior that can lead to unequal outcomes for different groups, while fairness focuses on reducing unjustified differences in those outcomes. Amazon SageMaker Clarify provides managed bias detection and explainability to help the bank produce evidence for auditors and regulators, while keeping sensitive data in the bank’s AWS account.

Bias in AI systems occurs when data, features, or modeling choices cause systematic differences in outcomes across groups (for example, different approval rates by demographic segment). Fairness is the goal of identifying, measuring, and reducing those unjustified disparities so outcomes are more equitable and defensible.

In this scenario, the bank needs to (1) quantify whether the model’s recommendations differ across groups, and (2) explain why the model made particular recommendations to support compliance reviews. Amazon SageMaker Clarify is built for these tasks: it can compute bias metrics (for pre-training and post-training checks) and produce model explainability artifacts, helping teams detect issues and take corrective actions (such as revisiting features or data) while keeping sensitive data within AWS.

The key takeaway is that fairness requires measurement and transparency, not just general security controls or unrelated analytics.

  • Sentiment analysis mismatch: Comprehend analyzes text (sentiment/entities) and does not measure decision bias or fairness in credit recommendations.
  • Wrong technique: Clustering groups similar records but does not evaluate disparate outcomes or provide fairness evidence for a supervised decision.
  • Security ≠ fairness: KMS encryption protects data confidentiality but does not detect or reduce biased model behavior.
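
The approval-rate comparison described above can be illustrated with a hand-rolled metric. This is a simplified sketch of a disparate-impact ratio, not SageMaker Clarify's exact formulas:

```python
def approval_rate(decisions):
    """decisions: list of 1 (approved) / 0 (denied) outcomes."""
    return sum(decisions) / len(decisions)

def disparate_impact(group_a, group_b):
    """Ratio of group approval rates; values far from 1.0 suggest
    possible bias. A common screening heuristic flags ratios
    below ~0.8 for further review."""
    return approval_rate(group_b) / approval_rate(group_a)

ratio = disparate_impact(
    group_a=[1, 1, 1, 0, 1, 1, 0, 1, 1, 1],  # 80% approved
    group_b=[1, 0, 0, 1, 0, 0, 1, 0, 0, 1],  # 40% approved
)
```

A ratio of 0.5 here would warrant investigation; the point is that fairness work starts with measuring the disparity, which encryption or clustering never does.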

Question 5

Topic: Fundamentals of AI and ML

A retail company wants to use historical transaction data to automatically flag each new transaction as either fraud or not fraud. The company has past examples labeled with these two outcomes.

Which machine learning approach best fits this goal?

Options:

  • A. Classification

  • B. Anomaly detection

  • C. Clustering

  • D. Regression

Best answer: A

Explanation: This problem’s discriminating factor is the desired output type: a categorical label for each transaction. Predicting one of two labeled outcomes (fraud/not fraud) is a classification use case. Classification models learn from labeled examples to assign a class to new records.

Choose classification when you need to assign each input to one of a fixed set of categories (including binary yes/no outcomes). In this scenario, the business objective is to label each new transaction as fraud or not fraud, and the training data already contains those labels, which matches supervised classification.

Anomaly detection can be useful when you don’t have reliable labels and want to surface unusual activity, but it does not directly optimize for producing the specific fraud/not-fraud class labels the business requested.

  • Regression predicts numbers and is used for continuous values like amount or risk score.
  • Clustering is unsupervised and groups similar transactions without using fraud/not-fraud labels.
  • Anomaly detection finds outliers and is commonly used when labels are missing or sparse rather than for direct labeled class assignment.

Question 6

Topic: Fundamentals of AI and ML

A retail company wants to create customer segments from clickstream and purchase data stored in Amazon S3. The company does not have existing labels for customer groups and wants to discover natural groupings to personalize marketing. The dataset includes some PII, and the solution must follow basic AWS security best practices.

Which TWO actions should the company take?

Options:

  • A. Add Amazon Bedrock Guardrails to prevent harmful text outputs from the segmentation model

  • B. Use an LLM in Amazon Bedrock to generate a persona label for each customer from raw records

  • C. Use PCA and treat the top principal component value as the customer segment

  • D. Use an unsupervised clustering algorithm (for example, SageMaker k-means) to group customers by similarity

  • E. Encrypt the S3 data with SSE-KMS and restrict access using least-privilege IAM permissions

  • F. Train a supervised classification model to predict a segment label for each customer

Correct answers: D and E

Explanation: Because there are no labels and the goal is to discover natural groupings, an unsupervised clustering approach is the appropriate ML technique. Since the data includes PII, the solution should also apply standard AWS security controls such as encryption with AWS KMS and least-privilege IAM access.

Customer segmentation without pre-existing group labels is a classic unsupervised learning use case. Clustering algorithms (such as k-means) group items based on similarity in their features (for example, purchase frequency, categories, and browsing patterns), which directly matches the requirement to discover “natural groupings” in the data.

Because the dataset contains PII, the solution must also follow AWS security best practices to reduce risk:

  • Encrypt data at rest in Amazon S3 using SSE-KMS.
  • Use IAM least privilege so only approved roles and services can access the data.

Techniques like supervised classification require labeled targets, and dimensionality reduction alone (PCA) does not produce discrete segments.

  • ✔ Use an unsupervised clustering algorithm; ✖ a supervised classifier requires labeled segment targets.
  • ✔ SSE-KMS encryption plus least-privilege IAM protects PII; ✖ using an LLM to assign personas is not a clustering approach and is harder to make consistent and auditable.
  • ✖ PCA reduces dimensions but does not create clusters; ✖ Bedrock Guardrails are for controlling generative model outputs, not numeric segmentation results.
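
The grouping idea can be sketched with a tiny one-dimensional k-means on toy spend data. Real segmentation would use many features and a managed algorithm such as SageMaker k-means; this only shows the mechanic:

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest centroid,
    then move each centroid to the mean of its assigned values."""
    for _ in range(iterations):
        clusters = {c: [] for c in range(len(centroids))}
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        centroids = [sum(vs) / len(vs) if vs else centroids[c]
                     for c, vs in clusters.items()]
    return centroids

# Monthly spend for ten customers: two natural groups, around 20 and 200.
spend = [18, 22, 19, 25, 21, 190, 210, 205, 198, 202]
centers = sorted(kmeans_1d(spend, centroids=[0.0, 100.0]))
```

No labels were supplied; the two segments emerge from similarity alone, which is the defining trait of unsupervised learning in this question.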

Question 7

Topic: Fundamentals of AI and ML

Which statement best defines artificial intelligence (AI) and its goal in business terms?

Options:

  • A. Rule-based automation with fixed if-then logic

  • B. Tools that generate new text or images from prompts

  • C. Systems that perform human-like tasks to improve outcomes

  • D. Models that learn only from labeled training data

Best answer: C

Explanation: AI is the broad concept of building systems that can perform tasks associated with human intelligence, such as perception, reasoning, learning, and decision-making. In business terms, the goal is to augment or automate work to improve outcomes like productivity, accuracy, speed, and customer experience.

Artificial intelligence (AI) is an umbrella term for systems that can sense/understand information, reason about it, learn from data or feedback, and take actions toward a goal. Framed for business, AI’s purpose is not a specific algorithm—it is improving measurable outcomes (for example, better decisions, reduced operational effort, higher quality, or more personalized experiences).

Machine learning is a subset of AI focused on learning patterns from data. Generative AI is a further subset that creates new content (text, images, code) from prompts. Traditional if-then automation can be useful but is typically not considered AI because it does not adapt or infer beyond predefined rules.

  • ML-only definition is too narrow because AI includes non-ML reasoning and decision systems.
  • GenAI-only definition describes one AI capability (content generation), not AI overall.
  • Fixed rules automation lacks learning/inference and is usually not categorized as AI.

Question 8

Topic: Fundamentals of AI and ML

Which sequence correctly orders the high-level stages of an ML pipeline from data collection through ongoing monitoring?

Options:

  • A. Collect data → prepare/label → train → evaluate → deploy → monitor

  • B. Collect data → train → prepare/label → evaluate → deploy → monitor

  • C. Collect data → prepare/label → deploy → train → evaluate → monitor

  • D. Collect data → prepare/label → train → deploy → evaluate → monitor

Best answer: A

Explanation: A standard ML lifecycle flows from collecting data to preparing it (including labeling when needed), then training a model and evaluating its performance before deployment. Once deployed, monitoring is continuous to detect drift and performance issues. The key fact is that evaluation/validation happens before production deployment, and monitoring occurs after deployment.

The core idea is that you should validate a model before exposing it to production traffic, and then continuously observe it after release. At a high level, an ML pipeline typically proceeds in this order:

  • Collect data
  • Prepare the data (clean/transform; label for supervised learning)
  • Train the model
  • Evaluate/validate the model against metrics
  • Deploy for inference
  • Monitor in production for performance and drift

A common mistake is placing deployment before evaluation or treating monitoring as a pre-deployment activity rather than an ongoing post-deployment stage.

  • Training before preparation fails because training requires prepared (and often labeled) data.
  • Deploying before training is not possible because deployment packages a trained model artifact.
  • Evaluating after deployment is backwards; you evaluate/validate before production release.
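
The stage order above can be sketched as a stub pipeline whose only job is to record execution order; the stage bodies are placeholders:

```python
# Minimal pipeline skeleton: each stage appends its name so the
# order can be checked afterwards.
executed = []

def stage(name):
    executed.append(name)

def run_pipeline():
    stage("collect")
    stage("prepare")   # clean/transform; label for supervised learning
    stage("train")
    stage("evaluate")  # validate against metrics BEFORE deployment
    stage("deploy")
    stage("monitor")   # continuous, post-deployment

run_pipeline()
```

The two ordering facts the distractors test are visible directly in the recorded sequence: evaluation comes before deployment, and monitoring comes last.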

Question 9

Topic: Fundamentals of AI and ML

An online retailer wants to reduce credit-card chargebacks and account takeovers. The company has 2 years of historical transaction and login data that includes a confirmed label for each event (fraud or not fraud). The company wants to assign a fraud risk score to each new transaction or login in near real time.

Which approach best matches this use case?

Options:

  • A. Use Amazon Lookout for Metrics to detect metric anomalies

  • B. Use Amazon Forecast to predict future sales totals

  • C. Use Amazon Fraud Detector to score events using labeled history

  • D. Use Amazon Comprehend to classify customer messages

Best answer: C

Explanation: The discriminating factor is the presence of historical labels (fraud vs not fraud) and the need to score each incoming event. That aligns with fraud detection, which learns from known fraud outcomes to produce a risk score for new transactions or logins. Amazon Fraud Detector is designed for this event-level fraud scoring workflow.

Fraud detection focuses on identifying known bad behavior (for example, fraudulent transactions or account takeovers) by learning patterns from historical events that are labeled as fraud or not fraud, and then producing a risk score or prediction for each new event. In this scenario, the company explicitly has confirmed fraud labels and wants near-real-time scoring per transaction/login, which matches a managed fraud detection approach.

Anomaly detection is the closer-but-different pattern: it primarily looks for unusual or rare behavior without needing confirmed fraud labels, often at an aggregate/metric level (for example, sudden spikes in failures or traffic). The key takeaway is to choose fraud detection when you have labeled outcomes and want event-level fraud risk scoring.

  • Metric anomaly detection fits when you lack fraud labels and want unusual-pattern alerts, not per-event fraud scoring.
  • NLP classification applies to text (sentiment/topics/entities), not transaction/login fraud risk.
  • Time-series forecasting predicts future numeric values (like sales), not whether an event is fraudulent.
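
The labeled-history scoring idea can be illustrated with a hand-rolled, Laplace-smoothed fraud rate per feature value. This is a toy stand-in for what a managed service learns, not how Amazon Fraud Detector works internally:

```python
def fraud_rate_by_key(history, key):
    """Learn a per-key fraud rate from labeled events.

    history: list of (event_dict, is_fraud) pairs with confirmed labels.
    Returns a scoring function for new events (Laplace-smoothed so
    unseen keys get a neutral 0.5 rather than a divide-by-zero).
    """
    totals, frauds = {}, {}
    for event, is_fraud in history:
        k = event[key]
        totals[k] = totals.get(k, 0) + 1
        frauds[k] = frauds.get(k, 0) + (1 if is_fraud else 0)

    def score(event):
        k = event.get(key)
        return (frauds.get(k, 0) + 1) / (totals.get(k, 0) + 2)

    return score

history = [
    ({"ip_country": "A"}, False), ({"ip_country": "A"}, False),
    ({"ip_country": "A"}, False), ({"ip_country": "B"}, True),
    ({"ip_country": "B"}, True), ({"ip_country": "B"}, False),
]
score = fraud_rate_by_key(history, "ip_country")
```

Note that the confirmed labels are what make the score possible; an anomaly detector would have to infer "unusual" without them, which is the discriminating factor in this question.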

Question 10

Topic: Fundamentals of AI and ML

A company has a customer-churn model built in Amazon SageMaker AI. The current design starts a new SageMaker training job every time a web app needs a churn score, so users wait several minutes for a response.

Constraints:

  • The web app needs a churn score in under 200 ms.
  • The model should be refreshed weekly using newly labeled outcomes.
  • The team wants to reduce cost and operational effort.

Which change best meets these requirements?

Options:

  • A. Keep training per request but switch the training job to Amazon EC2 Spot Instances

  • B. Train weekly, deploy the latest model to a SageMaker real-time endpoint for predictions

  • C. Keep training per request but use larger GPU instances to reduce wait time

  • D. Replace the churn model with an LLM in Amazon Bedrock and use prompt caching for faster responses

Best answer: B

Explanation: This workload needs a clear separation between training and inference. Run training on a schedule using labeled data to output updated model artifacts, and use inference to take feature inputs and return predictions with low latency. Hosting the trained model behind a real-time endpoint meets the 200 ms requirement while reducing unnecessary training cost and operational overhead.

Training and inference serve different purposes and have different inputs/outputs. Training is the periodic, compute-heavy process that takes labeled historical data (features + known outcomes) as input and outputs a trained model artifact. Inference is the lightweight process that takes a new unlabeled feature vector as input and outputs a prediction (for example, churn probability) quickly.

In this scenario, retraining on every web request wastes cost and cannot meet sub-200 ms latency. The optimized design is to retrain weekly (as required) and deploy the latest model artifact to a SageMaker real-time endpoint so the web app only performs inference during user requests. The key tradeoff is paying for an always-on endpoint, which is still far cheaper than per-request training and meets the latency constraint.

  • Scale training hardware: still performs training during user requests, so it remains high latency and costly.
  • Use Spot for training: may reduce training cost, but it does not address the need for low-latency per-request responses.
  • Switch to an LLM + caching: unrelated to churn classification needs, and it does not address labeled-training vs prediction-inference separation.
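
The training/inference split can be sketched with a toy endpoint whose "model" is a single learned threshold. This is illustrative only; a real deployment would host the artifact on a SageMaker real-time endpoint:

```python
import time

class ChurnEndpoint:
    """Toy separation of training (periodic, heavy) from inference
    (per-request, light). The 'model' here is just a threshold."""

    def __init__(self):
        self.model = None

    def retrain(self, labeled_rows):
        """Weekly job: fit on (monthly_logins, churned) labeled outcomes."""
        churned = [x for x, y in labeled_rows if y]
        stayed = [x for x, y in labeled_rows if not y]
        # Threshold halfway between the two class means.
        self.model = (sum(churned) / len(churned)
                      + sum(stayed) / len(stayed)) / 2

    def predict(self, monthly_logins):
        """Per-request inference: no training work, so it stays fast."""
        return 1.0 if monthly_logins < self.model else 0.0

endpoint = ChurnEndpoint()
# Weekly retraining on newly labeled outcomes...
endpoint.retrain([(1, True), (2, True), (3, True), (10, False), (12, False)])

# ...then per-request inference is just a lookup-and-compare.
start = time.perf_counter()
risk = endpoint.predict(monthly_logins=2)
latency_ms = (time.perf_counter() - start) * 1000
```

Because `predict` touches no training code, its latency is microseconds, comfortably inside the 200 ms budget, while the expensive `retrain` runs only once per week.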

Continue with full practice

Use the AWS AIF-C01 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Try AWS AIF-C01 on Web
View AWS AIF-C01 Practice Test

Free review resource

Read the AWS AIF-C01 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.

Revised on Thursday, May 14, 2026