AIF-C01 — AWS Certified AI Practitioner Quick Review

Quick Review for AWS Certified AI Practitioner (AIF-C01): high-yield AI, ML, generative AI, AWS service selection, security, evaluation, and practice focus.

Quick Review purpose

This Quick Review is for candidates preparing for the AWS Certified AI Practitioner exam (exam code AIF-C01). Use it to refresh the concepts that are easiest to confuse before moving into IT Mastery practice, original practice questions, topic drills, mock exams, and detailed explanations.

The AIF-C01 exam is foundational. Expect questions that test whether you can:

  • Explain core AI, ML, and generative AI concepts.
  • Match business use cases to appropriate AWS services.
  • Recognize responsible AI, security, privacy, and governance concerns.
  • Understand high-level model lifecycle, data preparation, evaluation, and monitoring.
  • Choose between managed AI services, Amazon Bedrock, and Amazon SageMaker based on scenario clues.

This page is IT Mastery review support and is not affiliated with AWS.

Exam mindset: how to choose the best answer

Many AIF-C01 questions are scenario-based. Do not answer only from memorized service names. First identify the task, then the level of customization, then the operational responsibility.

If the scenario says… | Think first… | Common trap
“Extract text, forms, or tables from scanned documents” | Amazon Textract | Choosing Amazon Rekognition just because an image is involved
“Analyze sentiment, entities, key phrases, or language in text” | Amazon Comprehend | Choosing Amazon Bedrock when a managed NLP service is enough
“Convert speech to text” | Amazon Transcribe | Confusing with Amazon Polly
“Convert text to lifelike speech” | Amazon Polly | Confusing with Amazon Transcribe
“Translate text between languages” | Amazon Translate | Confusing translation with summarization
“Build a conversational bot with intents and slots” | Amazon Lex | Choosing a general LLM when the exam emphasizes intent-based bot design
“Use foundation models through an API without managing infrastructure” | Amazon Bedrock | Choosing Amazon SageMaker by default
“Train, tune, build, deploy, or monitor custom ML models” | Amazon SageMaker | Choosing Bedrock when the task is custom ML lifecycle work
“No-code or low-code ML predictions for business users” | Amazon SageMaker Canvas | Choosing full SageMaker Studio-style development
“Enterprise generative AI assistant over business data” | Amazon Q Business or Bedrock with retrieval | Treating all chatbots as Amazon Lex
“Developer coding assistant” | Amazon Q Developer | Confusing with Amazon Q Business
“Search internal enterprise content” | Amazon Kendra or retrieval architecture | Confusing keyword search, semantic search, and generative answering

High-yield concepts to know cold

AI, ML, deep learning, and generative AI

Concept | Quick definition | What AIF-C01 may test
Artificial intelligence | Broad field of systems performing tasks associated with human intelligence | AI is the umbrella term
Machine learning | Systems learn patterns from data rather than being explicitly programmed for every rule | Training data, features, labels, evaluation
Deep learning | ML using neural networks with many layers | Often used for images, speech, NLP, and foundation models
Generative AI | AI that creates new content such as text, images, code, or summaries | Prompts, tokens, foundation models, hallucinations, responsible use
Foundation model | Large model trained on broad data and adaptable to many tasks | Often accessed through Amazon Bedrock
Large language model | Foundation model focused on language tasks | Summarization, Q&A, generation, reasoning-like responses
Embedding | Numeric vector representation of meaning | Search, recommendations, similarity, RAG
Token | Unit of text processed by a model | Cost, latency, context window, output length
Inference | Using a trained model to make predictions or generate output | Production use, latency, throughput, cost
Training | Learning model parameters from data | Requires data, compute, evaluation, iteration

Supervised, unsupervised, and reinforcement learning

Learning type | Uses | Examples | Watch for
Supervised learning | Learn from labeled examples | Classification, regression | Needs labeled training data
Unsupervised learning | Find structure without labels | Clustering, anomaly patterns, dimensionality reduction | No “correct label” in training data
Reinforcement learning | Learn actions using rewards or penalties | Optimization, game-like decision environments, agent policies | Not the default answer for ordinary prediction
Semi-supervised learning | Mix of small labeled data plus larger unlabeled data | Reducing labeling effort | Useful when labels are expensive
Self-supervised learning | Model learns from data structure itself | Many foundation model pretraining approaches | Often foundational for generative AI

Classification, regression, clustering, and anomaly detection

Task | Output | Example | Best metric clue
Classification | Category/class | Fraud vs not fraud, image label, sentiment class | Accuracy, precision, recall, F1, ROC-AUC
Regression | Numeric value | Price, demand, wait time | MAE, RMSE, R-squared
Clustering | Groups | Customer segments | Silhouette score, business usefulness
Anomaly detection | Unusual events | Unusual transactions, abnormal sensor readings | False positives vs missed anomalies
Recommendation | Ranked items | Products, media, content | Click-through, conversion, ranking metrics

Core AWS service selection review

Managed AI services vs Amazon Bedrock vs Amazon SageMaker

Choice | Use when… | Candidate mistake to avoid
Managed AWS AI service | The task is standard and specific: OCR, speech, translation, sentiment, image labels, chatbot intents | Over-engineering with custom ML
Amazon Bedrock | You need foundation models, generative AI, embeddings, RAG, agents, or guardrails through managed APIs | Treating Bedrock as a traditional custom model training platform
Amazon SageMaker | You need to prepare data, train, tune, deploy, monitor, or manage custom ML models | Choosing SageMaker when a simpler managed AI API satisfies the use case
Amazon SageMaker Canvas | Business users need no-code or low-code predictions | Assuming every SageMaker scenario requires data scientists writing code
Amazon Q Business | Organization wants a generative AI assistant connected to company data and business apps | Confusing enterprise assistant use cases with Amazon Lex
Amazon Q Developer | Developers want coding, AWS guidance, or software development assistance | Confusing with business-user knowledge assistant use cases

AWS AI and ML services at a glance

Service | Primary use | Fast exam cue
Amazon Bedrock | Build generative AI applications with foundation models | LLMs, embeddings, RAG, agents, guardrails
Amazon SageMaker | Build, train, tune, deploy, and monitor ML models | Full ML lifecycle
Amazon SageMaker Canvas | No-code ML for business analysts | Predictions without writing code
Amazon SageMaker Ground Truth | Data labeling workflows | Human labeling, annotation
Amazon SageMaker Clarify | Bias detection and model explainability | Fairness, explainability
Amazon SageMaker Model Monitor | Monitor deployed model quality and drift | Production ML monitoring
Amazon Textract | Extract printed/handwritten text, forms, and tables from documents | OCR plus document structure
Amazon Comprehend | NLP for text insights | Sentiment, entities, key phrases, language
Amazon Transcribe | Speech to text | Audio becomes text
Amazon Polly | Text to speech | Text becomes audio
Amazon Translate | Language translation | Translate between languages
Amazon Lex | Conversational interfaces using voice/text | Intents, slots, chatbot flow
Amazon Rekognition | Image and video analysis | Objects, scenes, faces, moderation labels
Amazon Personalize | Personalized recommendations | User-item recommendations
Amazon Kendra | Intelligent enterprise search | Search across internal documents
Amazon OpenSearch Service | Search, analytics, and vector search patterns | Semantic search, vector retrieval
Amazon Q Business | Generative AI assistant for enterprise knowledge | Business assistant over company data
Amazon Q Developer | Generative AI assistant for developers | Code, AWS development help
AWS Glue | ETL and Data Catalog | Prepare/catalog data
Amazon S3 | Object storage for data lakes, datasets, artifacts | Durable storage foundation
AWS Lake Formation | Data lake governance | Permissions and governance for data lakes
Amazon Athena | Query S3 data using SQL | Serverless interactive query
Amazon Redshift | Data warehouse analytics | Large-scale structured analytics
Amazon QuickSight | Business intelligence dashboards | Visualize and share insights
Amazon Macie | Discover sensitive data in S3 | PII/sensitive data detection
AWS IAM | Identity and access control | Least privilege
AWS KMS | Encryption key management | Protect data at rest
AWS CloudTrail | API activity audit logs | Who did what, when
Amazon CloudWatch | Metrics, logs, alarms | Operational monitoring

ML lifecycle quick review

AIF-C01 usually tests lifecycle understanding at a conceptual level: what happens before, during, and after model development.

    flowchart LR
      A[Define business problem] --> B[Collect and govern data]
      B --> C[Prepare, clean, label, and split data]
      C --> D[Train or select model]
      D --> E[Evaluate against metrics]
      E --> F[Deploy for inference]
      F --> G[Monitor quality, drift, latency, and cost]
      G --> H[Retrain, tune, or improve]
      H --> C

Stage | Know this | Common trap
Define problem | Convert business goal into ML task and success metric | Starting with a model before defining success
Collect data | Data must be relevant, permitted, representative, and high quality | Assuming more data always fixes poor data quality
Label data | Supervised learning needs correct labels | Ignoring label noise and inconsistent annotation
Prepare data | Clean, normalize, transform, handle missing values, remove duplicates | Accidentally introducing data leakage
Split data | Use training, validation, and test data appropriately | Evaluating on the same data used to train
Train/select model | Choose model based on task, data, cost, latency, and explainability | Picking the largest model by default
Evaluate | Use metrics aligned with business risk | Relying on accuracy for imbalanced data
Deploy | Make model available for inference | Ignoring latency, scale, and security
Monitor | Watch for drift, degraded quality, bias, errors, and cost | Treating deployment as the finish line
Improve | Tune, retrain, add data, change prompts, or redesign | Changing the model without measuring impact

Data concepts that commonly appear

Data types and storage patterns

Data concept | Meaning | AWS-related clue
Structured data | Rows and columns with schema | Databases, warehouses, SQL analytics
Semi-structured data | Flexible structure such as JSON, logs, XML | Data lakes, Glue, Athena
Unstructured data | Text, images, audio, video, documents | S3, Textract, Comprehend, Rekognition, Transcribe
Data lake | Central storage for raw and processed data | Amazon S3 plus governance/catalog tools
Data warehouse | Optimized analytics on structured data | Amazon Redshift
Data catalog | Metadata about data assets | AWS Glue Data Catalog
Feature | Input variable used by a model | Customer age, text embedding, transaction amount
Label | Correct answer used in supervised learning | Fraud/not fraud, category, price
Feature engineering | Transforming data into useful model inputs | Scaling, encoding, extracting features
Data leakage | Training uses information unavailable at prediction time | Inflated test results, poor real-world performance
Data drift | Input data distribution changes over time | Monitoring and retraining needed
Concept drift | Relationship between inputs and target changes | Model may become stale even if pipeline works
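
Data drift can be made concrete with a small check. The sketch below is only an illustrative heuristic, not how Amazon SageMaker Model Monitor works: it measures how far the mean of one feature has moved from its training baseline, in units of the baseline standard deviation. The sample values and the threshold of 2.0 are hypothetical.

```python
import statistics

def mean_shift_in_std(baseline, current):
    """Distance between the two means, measured in baseline standard deviations."""
    base_std = statistics.stdev(baseline)
    return abs(statistics.mean(current) - statistics.mean(baseline)) / base_std

# Hypothetical values of one numeric feature (for example, transaction amount).
baseline = [50, 52, 48, 51, 49, 50, 53, 47]   # seen during training
current_ok = [51, 49, 50, 52, 48, 50]         # production data, similar distribution
current_shifted = [70, 72, 68, 71, 69, 73]    # production data after a shift

THRESHOLD = 2.0  # hypothetical alert threshold

for name, sample in [("current_ok", current_ok), ("current_shifted", current_shifted)]:
    shift = mean_shift_in_std(baseline, sample)
    status = "possible drift" if shift > THRESHOLD else "looks stable"
    print(f"{name}: shift={shift:.2f} -> {status}")
```

A check like this only looks at inputs, so it can flag data drift but not concept drift; detecting the latter needs ground-truth outcomes to compare against predictions.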

Data quality and bias checks

High-yield review points:

  • Representative data matters. If training data excludes important populations, conditions, products, geographies, or use cases, predictions may be biased or unreliable.
  • Labels must be accurate. Bad labels create bad supervised models.
  • Missing values need deliberate handling. Dropping records may bias the dataset; imputing values may introduce assumptions.
  • Outliers are not always errors. In fraud or anomaly detection, unusual points may be the signal.
  • PII and sensitive data require controls. Use data minimization, access control, encryption, masking/redaction where appropriate, and auditability.
  • Training and test sets must remain separate. If the model “sees” test data during training or tuning, evaluation is not trustworthy.
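
The last point, keeping training and test data separate, can be sketched as a single shuffle followed by disjoint cuts. This is a minimal illustration with hypothetical split fractions (70/15/15) and integer record IDs standing in for real rows; for time-ordered data, a chronological split is usually safer than a random one because it avoids leaking future information into training.

```python
import random

def split_dataset(records, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle once, then cut into disjoint train/validation/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out for final evaluation
    return train, val, test

records = list(range(100))  # hypothetical record IDs
train, val, test = split_dataset(records)

# No record may appear in more than one set.
overlap = (set(train) & set(val)) | (set(train) & set(test)) | (set(val) & set(test))
print(len(train), len(val), len(test), "overlap:", len(overlap))
```

Tune on the validation set and touch the test set only once, at the end; otherwise the test score stops being an honest estimate.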

Generative AI quick review

Foundation model concepts

Concept | What to remember
Prompt | Input instructions and context given to a generative model
System instruction | Higher-level behavior or constraints for the model
Context window | Maximum amount of text, measured in tokens, that the model can consider at once across input and output
Temperature | Controls randomness; lower is more predictable, higher is more varied
Top-p | Limits sampling to the smallest set of most probable tokens whose cumulative probability reaches p
Max tokens | Limits output length and affects cost/latency
Stop sequence | Text pattern that tells generation to stop
Embeddings | Vector representations used for semantic similarity and retrieval
Hallucination | Plausible but incorrect or unsupported output
Grounding | Tying model output to trusted context or source data
RAG | Retrieval-Augmented Generation: retrieve relevant content, then generate an answer using it
Fine-tuning | Adapting a model’s behavior using task-specific examples
Agent | System that uses a model to reason over tasks and call tools/APIs
Guardrail | Control to reduce unsafe, unwanted, or noncompliant outputs
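
To see why lower temperature is more predictable, it can help to look at the arithmetic. The sketch below assumes the common formulation in which token scores (logits) are divided by the temperature before a softmax, and in which top-p keeps the smallest set of most probable tokens whose cumulative probability reaches p. The logits are hypothetical, and real model providers may implement sampling differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Return indices of the most probable tokens until their cumulative probability reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four candidate tokens

cold = softmax_with_temperature(logits, 0.5)  # low temperature
hot = softmax_with_temperature(logits, 2.0)   # high temperature
print("top token probability, T=0.5:", round(cold[0], 3))
print("top token probability, T=2.0:", round(hot[0], 3))

default = softmax_with_temperature(logits, 1.0)
print("tokens kept by top-p 0.9:", top_p_filter(default, 0.9))
```

At low temperature the top token takes most of the probability mass, so output is more repeatable; at high temperature the mass spreads out, so output is more varied.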

RAG vs fine-tuning vs prompt engineering

Need | Best first approach | Why
Improve instructions, format, tone, or constraints | Prompt engineering | Fastest and lowest operational change
Answer using current or private company knowledge | RAG | Adds external context without retraining the model
Reduce hallucinations by grounding in approved documents | RAG plus evaluation and guardrails | The model can cite or use retrieved sources
Teach a repeated task style or domain-specific output pattern | Fine-tuning | Changes behavior based on examples
Add new factual knowledge that changes often | RAG | Easier to update documents than retrain
Enforce safety boundaries | Guardrails plus prompt controls | Do not rely on prompt wording alone
Connect model to actions or APIs | Agent architecture | Model can plan and invoke tools under controls

Typical RAG flow

  1. Store trusted documents in a searchable knowledge source.
  2. Convert document chunks into embeddings.
  3. Store embeddings in a vector-capable store.
  4. Convert the user query into an embedding.
  5. Retrieve the most relevant chunks.
  6. Add retrieved context to the prompt.
  7. Generate a grounded response.
  8. Apply guardrails, logging, evaluation, and human review where needed.

Common RAG traps:

  • Poor chunking can retrieve irrelevant or incomplete context.
  • Stale source documents produce stale answers.
  • Retrieval does not guarantee correctness; evaluate generated answers.
  • RAG helps with knowledge grounding but does not automatically solve authorization. Users should only retrieve data they are allowed to access.
  • Prompt injection can occur when retrieved content contains malicious instructions. Guardrails and input/output controls matter.
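
The retrieval half of the flow, together with the authorization trap, can be sketched in a few lines. This is a toy illustration only: the `embed` function is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector store, and the chunks, group names, and prompt wording are all hypothetical. The generation step (sending the prompt to a foundation model) is deliberately left out.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical document chunks, each tagged with the groups allowed to read it.
CHUNKS = [
    {"text": "vacation policy employees accrue 20 days per year", "groups": {"all"}},
    {"text": "expense policy meals are reimbursed up to a daily limit", "groups": {"all"}},
    {"text": "executive salary bands for the next fiscal year", "groups": {"hr"}},
]
INDEX = [(embed(c["text"]), c) for c in CHUNKS]  # steps 2-3: embed and store

def retrieve(query, user_groups, k=2):
    """Steps 4-5, with permissions enforced before ranking."""
    q = embed(query)
    allowed = [(vec, c) for vec, c in INDEX if c["groups"] & user_groups]
    ranked = sorted(allowed, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [c["text"] for _, c in ranked[:k]]

def build_prompt(query, user_groups):
    """Step 6: place retrieved context ahead of the question."""
    context = "\n".join(f"- {t}" for t in retrieve(query, user_groups))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("how many vacation days per year", {"all"}))
```

Filtering by permission before ranking means a restricted chunk can never reach the prompt for a user outside the `hr` group, no matter how well it matches the query.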

Model evaluation and metrics

Classification metrics

Know what each metric favors. You usually do not need heavy math, but you should understand the tradeoff.

Metric | Plain meaning | Use when… | Trap
Accuracy | Overall percent correct | Classes are balanced and errors have similar cost | Misleading for imbalanced data
Precision | Of predicted positives, how many were correct | False positives are costly | High precision may miss true positives
Recall | Of actual positives, how many were found | False negatives are costly | High recall may create many false positives
F1 score | Balance of precision and recall | Need one combined metric | Hides which error type matters more
ROC-AUC | Ranking/separation quality across thresholds | Comparing binary classifiers | Does not directly pick the operating threshold
Confusion matrix | Counts true/false positives/negatives | Understanding error types | Must interpret positive class correctly

Useful formulas:

\[ \text{Accuracy} = \frac{\text{correct predictions}}{\text{all predictions}} \]

\[ \text{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \]

\[ \text{Recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \]

\[ \text{F1} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \]
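
A worked example makes the accuracy trap easier to remember. The sketch below applies the four formulas to confusion-matrix counts for a hypothetical fraud dataset of 1,000 transactions with 10 fraud cases. Returning 0.0 when a denominator is zero is a convention chosen for this illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 when nothing was predicted positive
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical dataset: 1,000 transactions, 10 of them fraudulent.
# Model A predicts "not fraud" for every transaction.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)
# Model B catches 8 of the 10 fraud cases but raises 12 false alarms.
useful_model = classification_metrics(tp=8, fp=12, fn=2, tn=978)

for name, m in [("always_negative", always_negative), ("useful_model", useful_model)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Model A scores 99% accuracy while finding no fraud at all (recall 0), whereas Model B has slightly lower accuracy but 80% recall. That is why accuracy alone is a poor guide on imbalanced data.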

Regression and generative AI evaluation

Evaluation area | Metric or method | What it tells you
Regression error | MAE | Average absolute error; easier to explain
Regression error | RMSE | Penalizes large errors more strongly
Regression fit | R-squared | Amount of variance explained
Generative AI quality | Human evaluation | Whether output is useful, accurate, and appropriate
Generative AI grounding | Factuality/groundedness checks | Whether response is supported by source context
Generative AI safety | Toxicity, harmful content, policy checks | Whether output violates safety requirements
Generative AI relevance | Relevance scoring | Whether output answers the user’s question
Operations | Latency, throughput, error rate | Whether the solution performs in production
Cost | Cost per request, token usage, infrastructure cost | Whether the solution is economically viable
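
The difference between MAE and RMSE is easiest to see with numbers. The sketch below uses the standard definitions (R-squared as one minus the ratio of residual to total sum of squares) on two hypothetical sets of predictions, and assumes the actual values are not all identical so that R-squared is defined.

```python
import math

def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R-squared) for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

actual = [10.0, 12.0, 14.0, 16.0]
small_misses = [11.0, 11.0, 15.0, 15.0]  # every prediction off by 1
one_big_miss = [10.0, 12.0, 14.0, 12.0]  # three exact, one off by 4

for name, preds in [("small_misses", small_misses), ("one_big_miss", one_big_miss)]:
    mae, rmse, r2 = regression_metrics(actual, preds)
    print(f"{name}: MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.2f}")
```

Both prediction sets have the same MAE, but the set with one large miss has double the RMSE and a much lower R-squared, which is what "penalizes large errors more strongly" means in practice.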

Responsible AI review

AIF-C01 candidates should recognize responsible AI as a lifecycle concern, not a single feature.

Theme | Meaning | Practical controls
Fairness | Avoid unjustified performance gaps or harmful bias | Representative data, bias checks, SageMaker Clarify, human review
Explainability | Understand why a model produced an output | Feature attribution, interpretable models, documentation
Transparency | Communicate AI use, limitations, and confidence appropriately | User notices, model documentation, clear escalation paths
Privacy | Protect personal and sensitive data | Data minimization, masking, encryption, access control
Security | Protect systems, models, data, and prompts | IAM, KMS, network controls, logging, secure APIs
Safety | Reduce harmful, toxic, or inappropriate outputs | Guardrails, content filters, testing, human oversight
Robustness | Maintain quality under realistic input variation | Evaluation, adversarial testing, monitoring
Governance | Manage approvals, accountability, and auditability | Policies, versioning, logs, risk review, ownership
Controllability | Keep humans and systems in control of AI behavior | Constraints, approval workflows, rollback options

Common responsible AI mistakes:

  • Treating fairness as only a data science issue. It also involves product design, monitoring, and governance.
  • Assuming a model is objective because it is mathematical.
  • Using sensitive data without a clear purpose or access controls.
  • Deploying generative AI without testing for hallucinations, unsafe output, and prompt injection.
  • Failing to document known limitations.
  • Ignoring human review for high-impact or ambiguous decisions.

Security, privacy, and governance decision rules

Core AWS controls

Requirement | AWS control to consider | Exam cue
Restrict who can call a service or access data | AWS IAM | Least privilege, roles, policies
Encrypt data at rest | AWS KMS with service encryption features | Key management, encryption
Protect data in transit | TLS/HTTPS | Secure communication
Audit API activity | AWS CloudTrail | Who called which API
Monitor metrics and logs | Amazon CloudWatch | Alarms, logs, dashboards
Detect sensitive data in S3 | Amazon Macie | PII discovery
Govern data lake access | AWS Lake Formation | Data lake permissions
Avoid hardcoded secrets | AWS Secrets Manager | Secure secret storage
Private connectivity to supported services | VPC endpoints / AWS PrivateLink patterns | Avoid public internet paths where required
Control S3 access | Bucket policies, IAM, encryption, block public access | Protect datasets and artifacts
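
As an illustration of least privilege, the sketch below builds an IAM policy document that allows invoking a single Amazon Bedrock foundation model and nothing else. The Region and model ID are placeholders, not real values to copy, and a production policy would likely need additional conditions; treat this as a shape to recognize rather than a policy to deploy.

```python
import json

# Placeholder values: the Region and model ID below are hypothetical.
REGION = "us-east-1"
MODEL_ID = "example-model-id"

least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InvokeOneFoundationModel",
            "Effect": "Allow",
            # One specific action instead of a wildcard such as "bedrock:*".
            "Action": ["bedrock:InvokeModel"],
            # One specific model instead of "*".
            "Resource": f"arn:aws:bedrock:{REGION}::foundation-model/{MODEL_ID}",
        }
    ],
}

print(json.dumps(least_privilege_policy, indent=2))
```

The exam-level takeaway is the pattern: name the exact action and the exact resource, and avoid wildcards unless the scenario requires them.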

AI-specific security concerns

Concern | Why it matters | Mitigation direction
Prompt injection | Malicious input tries to override instructions | Input validation, guardrails, isolation, retrieval controls
Data leakage | Sensitive data appears in prompts, logs, or outputs | Data minimization, redaction, access control
Unauthorized retrieval | RAG returns documents a user should not see | Enforce permissions before retrieval and generation
Hallucinated authority | Model fabricates policies, citations, or facts | Grounding, citations, human review, evaluation
Model drift | Production behavior degrades over time | Monitoring, retraining, rollback
Over-permissioned agents | Agent can perform actions beyond user intent | Least privilege, scoped tools, approvals
Unsafe output | Harmful, biased, or noncompliant content | Guardrails, filters, testing, escalation

Cost, performance, and operational tradeoffs

AIF-C01 questions may include practical constraints such as budget, latency, scale, and maintainability.

Decision factor | What to remember
Model size | Larger models may improve quality but often increase cost and latency
Token volume | More input/output tokens usually increase cost and response time
Context length | Longer context can help but may add cost and noise
Prompt quality | Better prompts can improve results without changing models
RAG retrieval quality | Good retrieval can reduce hallucinations and improve relevance
Batch vs real time | Batch processing can be cheaper or simpler when immediate response is not needed
Managed services | Reduce operational burden for common AI tasks
Monitoring | Needed for errors, latency, drift, quality, and cost
Human review | Adds cost but may be necessary for high-risk or low-confidence outputs
Right-sizing | Match solution complexity to business value and risk
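
The link between token volume and cost is simple arithmetic. The sketch below assumes a pricing model with separate per-1,000-token rates for input and output; the rates and token counts are placeholders for illustration and are not actual AWS prices.

```python
def estimate_request_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate the cost of one request from token counts and per-1,000-token prices."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# Placeholder prices, not real AWS pricing.
PRICE_IN, PRICE_OUT = 0.003, 0.015

short_prompt = estimate_request_cost(500, 200, PRICE_IN, PRICE_OUT)
long_context = estimate_request_cost(8000, 200, PRICE_IN, PRICE_OUT)  # e.g. many retrieved chunks

print(f"short prompt: {short_prompt:.4f} per request")
print(f"long context: {long_context:.4f} per request")
print(f"long context at 100,000 requests: {long_context * 100_000:.2f}")
```

With the same output length, the long-context request costs several times more per call, which is why trimming retrieved context and capping max tokens are common cost levers.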

High-yield scenario patterns

Scenario clue | Likely answer direction
“Business users want predictions without coding” | Amazon SageMaker Canvas
“Data scientists need to build, train, and deploy a custom model” | Amazon SageMaker
“Use multiple foundation models through a managed service” | Amazon Bedrock
“Add enterprise documents to a generative AI Q&A workflow” | RAG, Knowledge Bases-style architecture, or Amazon Q Business depending on wording
“Prevent harmful generative AI responses” | Guardrails, content filtering, evaluation, human review
“Find sensitive data in S3 before using it for ML” | Amazon Macie
“Catalog and prepare data for analytics or ML” | AWS Glue and AWS Glue Data Catalog
“Query data directly in S3 with SQL” | Amazon Athena
“Central data lake governance” | AWS Lake Formation
“Analyze call recordings by converting audio to text” | Amazon Transcribe, then text analysis if needed
“Extract fields from invoices or forms” | Amazon Textract
“Detect objects or moderation labels in images” | Amazon Rekognition
“Identify sentiment and entities in customer reviews” | Amazon Comprehend
“Create natural-sounding audio from text” | Amazon Polly
“Build a bot that collects required fields from users” | Amazon Lex
“Translate support content into another language” | Amazon Translate
“Personalized product recommendations” | Amazon Personalize
“Audit who accessed AI resources” | AWS CloudTrail
“Encrypt data used by AI workloads” | AWS KMS and service-level encryption settings

Common candidate mistakes

  1. Choosing the most advanced service instead of the most appropriate service. If a managed AI service directly solves the use case, it is often the best foundational answer.

  2. Confusing Amazon Bedrock and Amazon SageMaker. Bedrock is the first thought for managed foundation model and generative AI application patterns. SageMaker is the first thought for custom ML lifecycle work.

  3. Using fine-tuning when RAG is the better fit. If the problem is “answer from current company documents,” think retrieval and grounding before fine-tuning.

  4. Using accuracy for imbalanced classification. A fraud model that predicts “not fraud” almost every time may have high accuracy and still be useless. Think precision, recall, F1, and business cost of errors.

  5. Ignoring data leakage. If future information appears in training data, evaluation results may look excellent but fail in production.

  6. Treating deployment as the end. Real systems require monitoring for drift, quality, latency, errors, security, and cost.

  7. Assuming generative AI output is always correct. LLMs can hallucinate. Use grounding, evaluation, guardrails, citations, and human review where appropriate.

  8. Forgetting authorization in RAG. Retrieval must respect user permissions. A model should not expose documents just because they exist in the vector store.

  9. Confusing speech, text, and language services. Transcribe is speech-to-text. Polly is text-to-speech. Translate changes language. Comprehend analyzes text.

  10. Overlooking responsible AI. Fairness, privacy, security, explainability, safety, robustness, transparency, and governance are all testable themes.

Fast final review checklist

Before starting topic drills or a mock exam, make sure you can answer these without hesitation:

  • Can you explain AI vs ML vs deep learning vs generative AI?
  • Can you distinguish supervised, unsupervised, and reinforcement learning?
  • Can you identify classification, regression, clustering, recommendation, and anomaly detection scenarios?
  • Can you explain features, labels, training, validation, testing, and inference?
  • Can you identify overfitting, underfitting, data leakage, drift, and bias?
  • Can you choose between managed AI services, Amazon Bedrock, and Amazon SageMaker?
  • Can you explain embeddings, vector search, semantic similarity, and RAG?
  • Can you decide when prompt engineering, RAG, fine-tuning, agents, or guardrails are appropriate?
  • Can you choose the right evaluation metric for common scenarios?
  • Can you recognize responsible AI risks and mitigation controls?
  • Can you map IAM, KMS, CloudTrail, CloudWatch, Macie, Glue, S3, and Lake Formation to security and governance needs?

Practice plan after this Quick Review

Use this Quick Review as a checkpoint, then move into IT Mastery practice:

  1. Start with topic drills on AI/ML fundamentals, generative AI, AWS service selection, responsible AI, and security.
  2. Use original practice questions to force scenario recognition rather than memorization.
  3. Read detailed explanations for every missed question and every guessed question.
  4. Create a miss log with three columns: concept missed, why the wrong answer was tempting, and the decision rule to remember.
  5. Take a mixed mock exam only after your topic drills show consistent performance across service selection, generative AI, evaluation, and governance.

Next step: choose a focused AIF-C01 question bank topic drill, answer without notes, then review the detailed explanations until you can explain why each wrong option is wrong.

Continue in IT Mastery

Use this Quick Review as a final concept map, then move into IT Mastery for focused topic drills, mixed practice sets, timed mock exams, and detailed explanations. The practice questions are original IT Mastery practice items; they are not official AWS questions, copied live-exam content, or exam dumps.
