AIF-C01 — AWS Certified AI Practitioner Quick Review
Quick Review for AWS Certified AI Practitioner (AIF-C01): high-yield AI, ML, generative AI, AWS service selection, security, evaluation, and practice focus.
Quick Review purpose
This Quick Review is for candidates preparing for the AWS Certified AI Practitioner exam (exam code AIF-C01). Use it to refresh the concepts that are easiest to confuse before moving into IT Mastery practice, original practice questions, topic drills, mock exams, and detailed explanations.
The AIF-C01 exam is foundational. Expect questions that test whether you can:
- Explain core AI, ML, and generative AI concepts.
- Match business use cases to appropriate AWS services.
- Recognize responsible AI, security, privacy, and governance concerns.
- Understand high-level model lifecycle, data preparation, evaluation, and monitoring.
- Choose between managed AI services, Amazon Bedrock, and Amazon SageMaker based on scenario clues.
This page is IT Mastery review support and is not affiliated with AWS.
Exam mindset: how to choose the best answer
Many AIF-C01 questions are scenario-based. Do not answer only from memorized service names. First identify the task, then the level of customization, then the operational responsibility.
| If the scenario says… | Think first… | Common trap |
|---|---|---|
| “Extract text, forms, or tables from scanned documents” | Amazon Textract | Choosing Amazon Rekognition just because an image is involved |
| “Analyze sentiment, entities, key phrases, or language in text” | Amazon Comprehend | Choosing Amazon Bedrock when a managed NLP service is enough |
| “Convert speech to text” | Amazon Transcribe | Confusing with Amazon Polly |
| “Convert text to lifelike speech” | Amazon Polly | Confusing with Amazon Transcribe |
| “Translate text between languages” | Amazon Translate | Confusing translation with summarization |
| “Build a conversational bot with intents and slots” | Amazon Lex | Choosing a general LLM when the exam emphasizes intent-based bot design |
| “Use foundation models through an API without managing infrastructure” | Amazon Bedrock | Choosing Amazon SageMaker by default |
| “Train, tune, build, deploy, or monitor custom ML models” | Amazon SageMaker | Choosing Bedrock when the task is custom ML lifecycle work |
| “No-code or low-code ML predictions for business users” | Amazon SageMaker Canvas | Choosing full SageMaker Studio-style development |
| “Enterprise generative AI assistant over business data” | Amazon Q Business or Bedrock with retrieval | Treating all chatbots as Amazon Lex |
| “Developer coding assistant” | Amazon Q Developer | Confusing with Amazon Q Business |
| “Search internal enterprise content” | Amazon Kendra or retrieval architecture | Confusing keyword search, semantic search, and generative answering |
High-yield concepts to know cold
AI, ML, deep learning, and generative AI
| Concept | Quick definition | What AIF-C01 may test |
|---|---|---|
| Artificial intelligence | Broad field of systems performing tasks associated with human intelligence | AI is the umbrella term |
| Machine learning | Systems learn patterns from data rather than being explicitly programmed for every rule | Training data, features, labels, evaluation |
| Deep learning | ML using neural networks with many layers | Often used for images, speech, NLP, and foundation models |
| Generative AI | AI that creates new content such as text, images, code, or summaries | Prompts, tokens, foundation models, hallucinations, responsible use |
| Foundation model | Large model trained on broad data and adaptable to many tasks | Often accessed through Amazon Bedrock |
| Large language model | Foundation model focused on language tasks | Summarization, Q&A, generation, reasoning-like responses |
| Embedding | Numeric vector representation of meaning | Search, recommendations, similarity, RAG |
| Token | Unit of text processed by a model | Cost, latency, context window, output length |
| Inference | Using a trained model to make predictions or generate output | Production use, latency, throughput, cost |
| Training | Learning model parameters from data | Requires data, compute, evaluation, iteration |
Supervised, unsupervised, and reinforcement learning
| Learning type | Uses | Examples | Watch for |
|---|---|---|---|
| Supervised learning | Learn from labeled examples | Classification, regression | Needs labeled training data |
| Unsupervised learning | Find structure without labels | Clustering, anomaly patterns, dimensionality reduction | No “correct label” in training data |
| Reinforcement learning | Learn actions using rewards or penalties | Optimization, game-like decision environments, agent policies | Not the default answer for ordinary prediction |
| Semi-supervised learning | Mix of small labeled data plus larger unlabeled data | Reducing labeling effort | Useful when labels are expensive |
| Self-supervised learning | Model learns from data structure itself | Many foundation model pretraining approaches | Often foundational for generative AI |
Classification, regression, clustering, and anomaly detection
| Task | Output | Example | Best metric clue |
|---|---|---|---|
| Classification | Category/class | Fraud vs not fraud, image label, sentiment class | Accuracy, precision, recall, F1, ROC-AUC |
| Regression | Numeric value | Price, demand, wait time | MAE, RMSE, R-squared |
| Clustering | Groups | Customer segments | Silhouette score, business usefulness |
| Anomaly detection | Unusual events | Unusual transactions, abnormal sensor readings | False positives vs missed anomalies |
| Recommendation | Ranked items | Products, media, content | Click-through, conversion, ranking metrics |
Core AWS service selection review
Managed AI services vs Amazon Bedrock vs Amazon SageMaker
| Choice | Use when… | Candidate mistake to avoid |
|---|---|---|
| Managed AWS AI service | The task is standard and specific: OCR, speech, translation, sentiment, image labels, chatbot intents | Over-engineering with custom ML |
| Amazon Bedrock | You need foundation models, generative AI, embeddings, RAG, agents, or guardrails through managed APIs | Treating Bedrock as a traditional custom model training platform |
| Amazon SageMaker | You need to prepare data, train, tune, deploy, monitor, or manage custom ML models | Choosing SageMaker when a simpler managed AI API satisfies the use case |
| Amazon SageMaker Canvas | Business users need no-code or low-code predictions | Assuming every SageMaker scenario requires data scientists writing code |
| Amazon Q Business | Organization wants a generative AI assistant connected to company data and business apps | Confusing enterprise assistant use cases with Amazon Lex |
| Amazon Q Developer | Developers want coding, AWS guidance, or software development assistance | Confusing with business-user knowledge assistant use cases |
AWS AI and ML services at a glance
| Service | Primary use | Fast exam cue |
|---|---|---|
| Amazon Bedrock | Build generative AI applications with foundation models | LLMs, embeddings, RAG, agents, guardrails |
| Amazon SageMaker | Build, train, tune, deploy, and monitor ML models | Full ML lifecycle |
| Amazon SageMaker Canvas | No-code ML for business analysts | Predictions without writing code |
| Amazon SageMaker Ground Truth | Data labeling workflows | Human labeling, annotation |
| Amazon SageMaker Clarify | Bias detection and model explainability | Fairness, explainability |
| Amazon SageMaker Model Monitor | Monitor deployed model quality and drift | Production ML monitoring |
| Amazon Textract | Extract printed/handwritten text, forms, and tables from documents | OCR plus document structure |
| Amazon Comprehend | NLP for text insights | Sentiment, entities, key phrases, language |
| Amazon Transcribe | Speech to text | Audio becomes text |
| Amazon Polly | Text to speech | Text becomes audio |
| Amazon Translate | Language translation | Translate between languages |
| Amazon Lex | Conversational interfaces using voice/text | Intents, slots, chatbot flow |
| Amazon Rekognition | Image and video analysis | Objects, scenes, faces, moderation labels |
| Amazon Personalize | Personalized recommendations | User-item recommendations |
| Amazon Kendra | Intelligent enterprise search | Search across internal documents |
| Amazon OpenSearch Service | Search, analytics, and vector search patterns | Semantic search, vector retrieval |
| Amazon Q Business | Generative AI assistant for enterprise knowledge | Business assistant over company data |
| Amazon Q Developer | Generative AI assistant for developers | Code, AWS development help |
| AWS Glue | ETL and Data Catalog | Prepare/catalog data |
| Amazon S3 | Object storage for data lakes, datasets, artifacts | Durable storage foundation |
| AWS Lake Formation | Data lake governance | Permissions and governance for data lakes |
| Amazon Athena | Query S3 data using SQL | Serverless interactive query |
| Amazon Redshift | Data warehouse analytics | Large-scale structured analytics |
| Amazon QuickSight | Business intelligence dashboards | Visualize and share insights |
| Amazon Macie | Discover sensitive data in S3 | PII/sensitive data detection |
| AWS IAM | Identity and access control | Least privilege |
| AWS KMS | Encryption key management | Protect data at rest |
| AWS CloudTrail | API activity audit logs | Who did what, when |
| Amazon CloudWatch | Metrics, logs, alarms | Operational monitoring |
ML lifecycle quick review
AIF-C01 usually tests lifecycle understanding at a conceptual level: what happens before, during, and after model development.
flowchart LR
A[Define business problem] --> B[Collect and govern data]
B --> C[Prepare, clean, label, and split data]
C --> D[Train or select model]
D --> E[Evaluate against metrics]
E --> F[Deploy for inference]
F --> G[Monitor quality, drift, latency, and cost]
G --> H[Retrain, tune, or improve]
H --> C
| Stage | Know this | Common trap |
|---|---|---|
| Define problem | Convert business goal into ML task and success metric | Starting with a model before defining success |
| Collect data | Data must be relevant, permitted, representative, and high quality | Assuming more data always fixes poor data quality |
| Label data | Supervised learning needs correct labels | Ignoring label noise and inconsistent annotation |
| Prepare data | Clean, normalize, transform, handle missing values, remove duplicates | Accidentally introducing data leakage |
| Split data | Use training, validation, and test data appropriately | Evaluating on the same data used to train |
| Train/select model | Choose model based on task, data, cost, latency, and explainability | Picking the largest model by default |
| Evaluate | Use metrics aligned with business risk | Relying on accuracy for imbalanced data |
| Deploy | Make model available for inference | Ignoring latency, scale, and security |
| Monitor | Watch for drift, degraded quality, bias, errors, and cost | Treating deployment as the finish line |
| Improve | Tune, retrain, add data, change prompts, or redesign | Changing the model without measuring impact |
Data concepts that commonly appear
Data types and storage patterns
| Data concept | Meaning | AWS-related clue |
|---|---|---|
| Structured data | Rows and columns with schema | Databases, warehouses, SQL analytics |
| Semi-structured data | Flexible structure such as JSON, logs, XML | Data lakes, Glue, Athena |
| Unstructured data | Text, images, audio, video, documents | S3, Textract, Comprehend, Rekognition, Transcribe |
| Data lake | Central storage for raw and processed data | Amazon S3 plus governance/catalog tools |
| Data warehouse | Optimized analytics on structured data | Amazon Redshift |
| Data catalog | Metadata about data assets | AWS Glue Data Catalog |
| Feature | Input variable used by a model | Customer age, text embedding, transaction amount |
| Label | Correct answer used in supervised learning | Fraud/not fraud, category, price |
| Feature engineering | Transforming data into useful model inputs | Scaling, encoding, extracting features |
| Data leakage | Training uses information unavailable at prediction time | Inflated test results, poor real-world performance |
| Data drift | Input data distribution changes over time | Monitoring and retraining needed |
| Concept drift | Relationship between inputs and target changes | Model may become stale even if pipeline works |
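To make data drift concrete, here is a minimal sketch that compares the mean of one feature in production against its training baseline. The transaction amounts, the function name, and the threshold of 2.0 are illustrative assumptions, not AWS defaults. Managed tools such as SageMaker Model Monitor use richer statistics than this single check.

```python
from statistics import mean, stdev

def mean_shift_in_std_units(training_values, production_values):
    """Return how far the production mean moved, in training standard deviations."""
    baseline_mean = mean(training_values)
    baseline_std = stdev(training_values)
    return abs(mean(production_values) - baseline_mean) / baseline_std

# Hypothetical transaction amounts: training data vs. recent production traffic.
training_amounts = [20.0, 25.0, 30.0, 35.0, 40.0]
production_amounts = [60.0, 65.0, 70.0, 75.0, 80.0]

shift = mean_shift_in_std_units(training_amounts, production_amounts)
DRIFT_THRESHOLD = 2.0  # illustrative cutoff chosen for this sketch
drift_suspected = shift > DRIFT_THRESHOLD
print(f"shift={shift:.2f} std units, drift suspected: {drift_suspected}")
```

A check like this only looks at inputs, so it can flag data drift but not concept drift. Detecting concept drift generally requires comparing predictions against actual outcomes once they become available.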
Data quality and bias checks
High-yield review points:
- Representative data matters. If training data excludes important populations, conditions, products, geographies, or use cases, predictions may be biased or unreliable.
- Labels must be accurate. Bad labels create bad supervised models.
- Missing values need deliberate handling. Dropping records may bias the dataset; imputing values may introduce assumptions.
- Outliers are not always errors. In fraud or anomaly detection, unusual points may be the signal.
- PII and sensitive data require controls. Use data minimization, access control, encryption, masking/redaction where appropriate, and auditability.
- Training and test sets must remain separate. If the model “sees” test data during training or tuning, evaluation is not trustworthy.
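The last point, keeping evaluation data separate, can be sketched with a simple shuffled split. The 70/15/15 proportions, the seed, and the function name are assumptions chosen for illustration. Real projects often use library helpers and may need stratified or time-based splits instead.

```python
import random

def split_dataset(records, train_fraction=0.7, validation_fraction=0.15, seed=42):
    """Shuffle once, then cut into non-overlapping train/validation/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    train_end = int(len(shuffled) * train_fraction)
    validation_end = train_end + int(len(shuffled) * validation_fraction)
    return (
        shuffled[:train_end],
        shuffled[train_end:validation_end],
        shuffled[validation_end:],
    )

records = list(range(100))  # stand-in for 100 labeled examples
train, validation, test = split_dataset(records)

# No record appears in more than one split.
assert not set(train) & set(test)
assert not set(validation) & set(test)
print(len(train), len(validation), len(test))
```

The validation split is used for tuning decisions, and the test split is touched only for the final evaluation.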
Generative AI quick review
Foundation model concepts
| Concept | What to remember |
|---|---|
| Prompt | Input instructions and context given to a generative model |
| System instruction | Higher-level behavior or constraints for the model |
| Context window | Maximum number of tokens (input plus output) the model can consider at once |
| Temperature | Controls randomness; lower is more predictable, higher is more varied |
| Top-p | Nucleus sampling: limits sampling to the smallest set of most probable tokens whose cumulative probability reaches p |
| Max tokens | Limits output length and affects cost/latency |
| Stop sequence | Text pattern that tells generation to stop |
| Embeddings | Vector representations used for semantic similarity and retrieval |
| Hallucination | Plausible but incorrect or unsupported output |
| Grounding | Tying model output to trusted context or source data |
| RAG | Retrieval-Augmented Generation: retrieve relevant content, then generate an answer using it |
| Fine-tuning | Adapting a model’s behavior using task-specific examples |
| Agent | System that uses a model to reason over tasks and call tools/APIs |
| Guardrail | Control to reduce unsafe, unwanted, or noncompliant outputs |
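The effect of temperature can be illustrated with the common textbook formulation: divide the model's raw scores by the temperature before applying softmax. This is a sketch under that assumption. The scores are invented, and individual model providers may implement sampling differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; temperature must be greater than 0."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)
high = softmax_with_temperature(logits, 2.0)

print("temperature 0.5:", [round(p, 3) for p in low])
print("temperature 2.0:", [round(p, 3) for p in high])
```

At the lower temperature the top token takes most of the probability mass, so output is more predictable. At the higher temperature the distribution flattens and less likely tokens are sampled more often.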
RAG vs fine-tuning vs prompt engineering
| Need | Best first approach | Why |
|---|---|---|
| Improve instructions, format, tone, or constraints | Prompt engineering | Fastest and lowest operational change |
| Answer using current or private company knowledge | RAG | Adds external context without retraining the model |
| Reduce hallucinations by grounding in approved documents | RAG plus evaluation and guardrails | The model can cite or use retrieved sources |
| Teach a repeated task style or domain-specific output pattern | Fine-tuning | Changes behavior based on examples |
| Add new factual knowledge that changes often | RAG | Easier to update documents than retrain |
| Enforce safety boundaries | Guardrails plus prompt controls | Do not rely on prompt wording alone |
| Connect model to actions or APIs | Agent architecture | Model can plan and invoke tools under controls |
Typical RAG flow
- Store trusted documents in a searchable knowledge source.
- Convert document chunks into embeddings.
- Store embeddings in a vector-capable store.
- Convert the user query into an embedding.
- Retrieve the most relevant chunks.
- Add retrieved context to the prompt.
- Generate a grounded response.
- Apply guardrails, logging, evaluation, and human review where needed.
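The retrieval steps above can be sketched in a few lines. This is a toy illustration: the `embed` function uses word counts as a stand-in for a real embedding model, the document chunks are invented, and the final model call is omitted. A production system would typically use an embedding model and a vector store instead.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine_similarity(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical document chunks, embedded once and kept in an in-memory index.
chunks = [
    "Employees accrue vacation days every month.",
    "Expense reports are due by the fifth business day.",
    "The office is closed on public holidays.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, top_k=1):
    """Embed the query and return the most similar chunks."""
    query_vector = embed(query)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(query):
    """Add retrieved context to the prompt that would be sent to the model."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("When are expense reports due?")
print(prompt)
```

The model never sees the whole document set, only the retrieved chunk, which is why retrieval quality and chunking directly affect answer quality.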
Common RAG traps:
- Poor chunking can retrieve irrelevant or incomplete context.
- Stale source documents produce stale answers.
- Retrieval does not guarantee correctness; evaluate generated answers.
- RAG helps with knowledge grounding but does not automatically solve authorization. Users should only retrieve data they are allowed to access.
- Prompt injection can occur when retrieved content contains malicious instructions. Guardrails and input/output controls matter.
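The authorization trap is worth a small sketch: filter retrieved chunks by the user's permissions before any text reaches the prompt. The group names, the chunk structure, and the function name here are hypothetical. Real systems usually enforce this through the identity and access features of the search or knowledge service.

```python
def filter_by_permission(candidates, user_groups):
    """Keep only chunks that share at least one group with the user."""
    return [
        chunk for chunk in candidates
        if chunk["allowed_groups"] & user_groups
    ]

# Hypothetical retrieval results, each tagged with the groups allowed to read it.
candidates = [
    {"text": "Public holiday calendar", "allowed_groups": {"all-staff"}},
    {"text": "Executive compensation plan", "allowed_groups": {"hr-admins"}},
]

visible = filter_by_permission(candidates, user_groups={"all-staff"})
print([chunk["text"] for chunk in visible])
```

Because the restricted chunk is dropped before prompt construction, the model cannot leak it, regardless of how the user phrases the question.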
Model evaluation and metrics
Classification metrics
Know what each metric favors. You usually do not need heavy math, but you should understand the tradeoff.
| Metric | Plain meaning | Use when… | Trap |
|---|---|---|---|
| Accuracy | Overall percent correct | Classes are balanced and errors have similar cost | Misleading for imbalanced data |
| Precision | Of predicted positives, how many were correct | False positives are costly | High precision may miss true positives |
| Recall | Of actual positives, how many were found | False negatives are costly | High recall may create many false positives |
| F1 score | Balance of precision and recall | Need one combined metric | Hides which error type matters more |
| ROC-AUC | Ranking/separation quality across thresholds | Comparing binary classifiers | Does not directly pick the operating threshold |
| Confusion matrix | Counts true/false positives/negatives | Understanding error types | Must interpret positive class correctly |
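A short worked example shows why accuracy misleads on imbalanced data. The counts below are invented for illustration: 1,000 transactions with 10 fraud cases. Returning 0.0 when a ratio is undefined is a convention chosen for this sketch.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision and recall are undefined when their denominator is zero;
    # this sketch reports 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# A model that always predicts "not fraud" misses all 10 fraud cases.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)

# A model that flags 20 transactions and catches 8 of the 10 fraud cases.
useful_model = classification_metrics(tp=8, fp=12, fn=2, tn=978)

print("always negative:", always_negative)
print("useful model:   ", useful_model)
```

The always-negative model scores 99% accuracy with zero recall, while the useful model has slightly lower accuracy (98.6%) but finds 80% of the fraud. That gap is the reason to check precision, recall, and F1 when classes are imbalanced.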
Useful formulas:
\[ \text{Accuracy} = \frac{\text{correct predictions}}{\text{all predictions}} \]
\[ \text{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \]
\[ \text{Recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \]
\[ \text{F1} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \]
Regression and generative AI evaluation
| Evaluation area | Metric or method | What it tells you |
|---|---|---|
| Regression error | MAE | Average absolute error; easier to explain |
| Regression error | RMSE | Penalizes large errors more strongly |
| Regression fit | R-squared | Amount of variance explained |
| Generative AI quality | Human evaluation | Whether output is useful, accurate, and appropriate |
| Generative AI grounding | Factuality/groundedness checks | Whether response is supported by source context |
| Generative AI safety | Toxicity, harmful content, policy checks | Whether output violates safety requirements |
| Generative AI relevance | Relevance scoring | Whether output answers the user’s question |
| Operations | Latency, throughput, error rate | Whether the solution performs in production |
| Cost | Cost per request, token usage, infrastructure cost | Whether the solution is economically viable |
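For the regression rows in the table above, a minimal sketch using the standard definitions of MAE, RMSE, and R-squared shows how one large error affects each metric. The actual and predicted values are made up for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R-squared) using the standard definitions."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    squared_error_sum = sum(e * e for e in errors)
    rmse = math.sqrt(squared_error_sum / len(errors))
    mean_actual = sum(actual) / len(actual)
    total_variation = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1 - squared_error_sum / total_variation
    return mae, rmse, r_squared

# Hypothetical demand forecast: three small misses and one large miss (400 vs 360).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 360.0]

mae, rmse, r_squared = regression_metrics(actual, predicted)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R-squared={r_squared:.3f}")
```

Here MAE is 17.5 while RMSE is about 21.8. The single 40-unit miss pulls RMSE above MAE because squaring weights large errors more heavily.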
Responsible AI review
AIF-C01 candidates should recognize responsible AI as a lifecycle concern, not a single feature.
| Theme | Meaning | Practical controls |
|---|---|---|
| Fairness | Avoid unjustified performance gaps or harmful bias | Representative data, bias checks, SageMaker Clarify, human review |
| Explainability | Understand why a model produced an output | Feature attribution, interpretable models, documentation |
| Transparency | Communicate AI use, limitations, and confidence appropriately | User notices, model documentation, clear escalation paths |
| Privacy | Protect personal and sensitive data | Data minimization, masking, encryption, access control |
| Security | Protect systems, models, data, and prompts | IAM, KMS, network controls, logging, secure APIs |
| Safety | Reduce harmful, toxic, or inappropriate outputs | Guardrails, content filters, testing, human oversight |
| Robustness | Maintain quality under realistic input variation | Evaluation, adversarial testing, monitoring |
| Governance | Manage approvals, accountability, and auditability | Policies, versioning, logs, risk review, ownership |
| Controllability | Keep humans and systems in control of AI behavior | Constraints, approval workflows, rollback options |
Common responsible AI mistakes:
- Treating fairness as only a data science issue. It also involves product design, monitoring, and governance.
- Assuming a model is objective because it is mathematical.
- Using sensitive data without a clear purpose or access controls.
- Deploying generative AI without testing for hallucinations, unsafe output, and prompt injection.
- Failing to document known limitations.
- Ignoring human review for high-impact or ambiguous decisions.
Security, privacy, and governance decision rules
Core AWS controls
| Requirement | AWS control to consider | Exam cue |
|---|---|---|
| Restrict who can call a service or access data | AWS IAM | Least privilege, roles, policies |
| Encrypt data at rest | AWS KMS with service encryption features | Key management, encryption |
| Protect data in transit | TLS/HTTPS | Secure communication |
| Audit API activity | AWS CloudTrail | Who called which API |
| Monitor metrics and logs | Amazon CloudWatch | Alarms, logs, dashboards |
| Detect sensitive data in S3 | Amazon Macie | PII discovery |
| Govern data lake access | AWS Lake Formation | Data lake permissions |
| Avoid hardcoded secrets | AWS Secrets Manager | Secure secret storage |
| Private connectivity to supported services | VPC endpoints / AWS PrivateLink patterns | Avoid public internet paths where required |
| Control S3 access | Bucket policies, IAM, encryption, block public access | Protect datasets and artifacts |
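As an illustration of least privilege, the sketch below builds an IAM policy document that allows invoking a single foundation model and nothing else. The Region and model ID in the ARN are placeholders, and the statement ID is an arbitrary label. Treat the whole policy as an example shape rather than a recommended production policy.

```python
import json

# Placeholder Region and model ID; substitute real values for an actual deployment.
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/example-model-id"

least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InvokeOneModelOnly",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": MODEL_ARN,
        }
    ],
}

policy_json = json.dumps(least_privilege_policy, indent=2)
print(policy_json)
```

The policy names one action and one resource instead of using wildcards, which is the pattern exam questions usually mean by least privilege.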
AI-specific security concerns
| Concern | Why it matters | Mitigation direction |
|---|---|---|
| Prompt injection | Malicious input tries to override instructions | Input validation, guardrails, isolation, retrieval controls |
| Data leakage | Sensitive data appears in prompts, logs, or outputs | Data minimization, redaction, access control |
| Unauthorized retrieval | RAG returns documents a user should not see | Enforce permissions before retrieval and generation |
| Hallucinated authority | Model fabricates policies, citations, or facts | Grounding, citations, human review, evaluation |
| Model drift | Production behavior degrades over time | Monitoring, retraining, rollback |
| Over-permissioned agents | Agent can perform actions beyond user intent | Least privilege, scoped tools, approvals |
| Unsafe output | Harmful, biased, or noncompliant content | Guardrails, filters, testing, escalation |
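One mitigation for data leakage, redacting sensitive values before a prompt is sent or logged, can be sketched with a regular expression. This example handles only email addresses with a simplified pattern, and the placeholder token is an arbitrary choice. Real deployments generally rely on managed PII detection or guardrail features that cover many more data types.

```python
import re

# Simplified email pattern for illustration; it does not cover every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def redact_emails(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)

prompt = "Summarize the complaint from jane.doe@example.com about late delivery."
safe_prompt = redact_emails(prompt)
print(safe_prompt)
```

Redacting before the model call means the sensitive value never appears in the prompt, the model output, or downstream logs.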
Cost, performance, and operational tradeoffs
AIF-C01 questions may include practical constraints such as budget, latency, scale, and maintainability.
| Decision factor | What to remember |
|---|---|
| Model size | Larger models may improve quality but often increase cost and latency |
| Token volume | More input/output tokens usually increase cost and response time |
| Context length | Longer context can help but may add cost and noise |
| Prompt quality | Better prompts can improve results without changing models |
| RAG retrieval quality | Good retrieval can reduce hallucinations and improve relevance |
| Batch vs real time | Batch processing can be cheaper or simpler when immediate response is not needed |
| Managed services | Reduce operational burden for common AI tasks |
| Monitoring | Needed for errors, latency, drift, quality, and cost |
| Human review | Adds cost but may be necessary for high-risk or low-confidence outputs |
| Right-sizing | Match solution complexity to business value and risk |
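The token volume and context length rows can be made concrete with simple arithmetic. Every price and volume below is a made-up placeholder, not real AWS pricing. The point is only how cost scales with input and output tokens.

```python
def estimate_monthly_cost(requests_per_month, input_tokens, output_tokens,
                          input_price_per_1k, output_price_per_1k):
    """Estimate monthly spend from per-request token counts and per-1,000-token prices."""
    per_request = (
        (input_tokens / 1000) * input_price_per_1k
        + (output_tokens / 1000) * output_price_per_1k
    )
    return requests_per_month * per_request

# Placeholder prices per 1,000 tokens; check current pricing for any real model.
INPUT_PRICE = 0.001
OUTPUT_PRICE = 0.002

short_context = estimate_monthly_cost(100_000, 500, 200, INPUT_PRICE, OUTPUT_PRICE)
long_context = estimate_monthly_cost(100_000, 4_000, 200, INPUT_PRICE, OUTPUT_PRICE)

print(f"short prompts: {short_context:.2f} per month")
print(f"long prompts:  {long_context:.2f} per month")
```

With these placeholder numbers, growing the prompt from 500 to 4,000 input tokens raises the estimate from about 90 to about 440 per month, even though the output length is unchanged.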
High-yield scenario patterns
| Scenario clue | Likely answer direction |
|---|---|
| “Business users want predictions without coding” | Amazon SageMaker Canvas |
| “Data scientists need to build, train, and deploy a custom model” | Amazon SageMaker |
| “Use multiple foundation models through a managed service” | Amazon Bedrock |
| “Add enterprise documents to a generative AI Q&A workflow” | RAG, Knowledge Bases-style architecture, or Amazon Q Business depending on wording |
| “Prevent harmful generative AI responses” | Guardrails, content filtering, evaluation, human review |
| “Find sensitive data in S3 before using it for ML” | Amazon Macie |
| “Catalog and prepare data for analytics or ML” | AWS Glue and AWS Glue Data Catalog |
| “Query data directly in S3 with SQL” | Amazon Athena |
| “Central data lake governance” | AWS Lake Formation |
| “Analyze call recordings by converting audio to text” | Amazon Transcribe, then text analysis if needed |
| “Extract fields from invoices or forms” | Amazon Textract |
| “Detect objects or moderation labels in images” | Amazon Rekognition |
| “Identify sentiment and entities in customer reviews” | Amazon Comprehend |
| “Create natural-sounding audio from text” | Amazon Polly |
| “Build a bot that collects required fields from users” | Amazon Lex |
| “Translate support content into another language” | Amazon Translate |
| “Personalized product recommendations” | Amazon Personalize |
| “Audit who accessed AI resources” | AWS CloudTrail |
| “Encrypt data used by AI workloads” | AWS KMS and service-level encryption settings |
Common candidate mistakes
Choosing the most advanced service instead of the most appropriate service. If a managed AI service directly solves the use case, it is often the best answer on a foundational exam.
Confusing Amazon Bedrock and Amazon SageMaker. Think of Bedrock first for managed foundation model and generative AI application patterns. Think of SageMaker first for custom ML lifecycle work.
Using fine-tuning when RAG is the better fit. If the problem is “answer from current company documents,” think retrieval and grounding before fine-tuning.
Using accuracy for imbalanced classification. A fraud model that predicts “not fraud” almost every time may have high accuracy and still be useless. Think precision, recall, F1, and business cost of errors.
Ignoring data leakage. If future information appears in training data, evaluation results may look excellent while the model still fails in production.
Treating deployment as the end. Real systems require monitoring for drift, quality, latency, errors, security, and cost.
Assuming generative AI output is always correct. LLMs can hallucinate. Use grounding, evaluation, guardrails, citations, and human review where appropriate.
Forgetting authorization in RAG. Retrieval must respect user permissions. A model should not expose documents just because they exist in the vector store.
Confusing speech, text, and language services. Transcribe is speech-to-text. Polly is text-to-speech. Translate changes language. Comprehend analyzes text.
Overlooking responsible AI. Fairness, privacy, security, explainability, safety, robustness, transparency, and governance are all testable themes.
Fast final review checklist
Before starting topic drills or a mock exam, make sure you can answer these without hesitation:
- Can you explain AI vs ML vs deep learning vs generative AI?
- Can you distinguish supervised, unsupervised, and reinforcement learning?
- Can you identify classification, regression, clustering, recommendation, and anomaly detection scenarios?
- Can you explain features, labels, training, validation, testing, and inference?
- Can you identify overfitting, underfitting, data leakage, drift, and bias?
- Can you choose between managed AI services, Amazon Bedrock, and Amazon SageMaker?
- Can you explain embeddings, vector search, semantic similarity, and RAG?
- Can you decide when prompt engineering, RAG, fine-tuning, agents, or guardrails are appropriate?
- Can you choose the right evaluation metric for common scenarios?
- Can you recognize responsible AI risks and mitigation controls?
- Can you map IAM, KMS, CloudTrail, CloudWatch, Macie, Glue, S3, and Lake Formation to security and governance needs?
Practice plan after this Quick Review
Use this Quick Review as a checkpoint, then move into IT Mastery practice:
- Start with topic drills on AI/ML fundamentals, generative AI, AWS service selection, responsible AI, and security.
- Use original practice questions to force scenario recognition rather than memorization.
- Read detailed explanations for every missed question and every guessed question.
- Create a miss log with three columns: concept missed, why the wrong answer was tempting, and the decision rule to remember.
- Take a mixed mock exam only after your topic drills show consistent performance across service selection, generative AI, evaluation, and governance.
Next step: choose a focused AIF-C01 question bank topic drill, answer without notes, then review the detailed explanations until you can explain why each wrong option is wrong.
Continue in IT Mastery
Use this Quick Review as a final concept map, then move into IT Mastery for focused topic drills, mixed practice sets, timed mock exams, and detailed explanations. The practice questions are original IT Mastery practice items; they are not official AWS questions, copied live-exam content, or exam dumps.