AI-900: Computer Vision

May 1, 2026

Try 10 focused AI-900 questions on Computer Vision, with explanations, then continue with IT Mastery.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Topic snapshot

Field	Detail
Exam route	AI-900
Topic area	Describe Features of Computer Vision Workloads on Azure
Blueprint weight	19%
Page purpose	Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate the AI-900 topic area "Describe Features of Computer Vision Workloads on Azure." Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass	What to do	What to record
First attempt	Answer without checking the explanation first.	The fact, rule, calculation, or judgment point that controlled your answer.
Review	Read the explanation even when you were correct.	Why the best answer is stronger than the closest distractor.
Repair	Repeat only missed or uncertain items after a short break.	The pattern behind misses, not the answer letter.
Transfer	Return to mixed practice once the topic feels stable.	Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 19% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These questions are original IT Mastery practice items aligned to this topic area. They are designed for self-assessment and are not official exam questions.

Question 1

Topic: Describe Features of Computer Vision Workloads on Azure

A retailer plans to analyze product photos with an Azure computer vision solution. For each returned item, the system must assign exactly one label: damaged or undamaged. It does not need to read text in the image or mark the location of scratches and dents. Which computer vision task is most appropriate?

Options:

A. Object detection
B. Optical character recognition (OCR)
C. Image tagging
D. Image classification

Best answer: D

Explanation: This scenario needs one decision for each whole photo, so image classification is the best fit. The requirement explicitly rules out reading text and locating damage within the image.

Image classification is used when an AI system must assign one category to an entire image based on its visual content. Here, each product photo must be labeled as either damaged or undamaged, so the task is a whole-image prediction problem.

A quick way to separate similar vision tasks is:

Use image classification for one label for the full image.
Use object detection when you must find and locate items or defects.
Use OCR when you need to extract printed or handwritten text.
Use image tagging when you want multiple descriptive labels instead of one main category.

The key clue is the need for exactly one label per product photo, not text extraction or damage location.
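
The rule of thumb above can be sketched as a small decision helper. This is only an illustration of the reasoning, not an Azure API: the function name `pick_vision_task`, its parameters, and the "one"/"many" encoding are assumptions made for this example.

```python
from enum import Enum


class VisionTask(Enum):
    IMAGE_CLASSIFICATION = "image classification"
    OBJECT_DETECTION = "object detection"
    OCR = "optical character recognition (OCR)"
    IMAGE_TAGGING = "image tagging"


def pick_vision_task(needs_text: bool, needs_locations: bool, labels_per_image: str) -> VisionTask:
    """Hypothetical helper: map scenario clues to a computer vision task.

    labels_per_image is "one" or "many" (an assumed encoding for this sketch).
    """
    if needs_text:
        # Reading printed or handwritten text points to OCR.
        return VisionTask.OCR
    if needs_locations:
        # Finding and locating items or defects points to object detection.
        return VisionTask.OBJECT_DETECTION
    if labels_per_image == "one":
        # One mutually exclusive label for the whole image.
        return VisionTask.IMAGE_CLASSIFICATION
    # Several descriptive labels for the same image.
    return VisionTask.IMAGE_TAGGING


# Question 1: one label per photo, no text to read, no damage location.
print(pick_vision_task(needs_text=False, needs_locations=False, labels_per_image="one").value)
```

Real scenarios can combine clues, so treat the order of the checks as a study aid rather than a fixed rule.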

Image tagging is better for adding several descriptive labels, not choosing one mutually exclusive category.
OCR is for extracting text from images, which the scenario does not require.
Object detection would be useful if the system had to locate the damaged area with boxes or coordinates.

Question 2

Topic: Describe Features of Computer Vision Workloads on Azure

A shipping company wants a mobile app that lets workers photograph package labels. The app must extract printed text from each image and should use a prebuilt Azure AI service instead of training a custom model. Which Azure service category is the best choice?

Options:

A. Azure AI Vision
B. Azure AI Language
C. Azure OpenAI Service
D. Azure Machine Learning

Best answer: A

Explanation: Azure AI Vision is the best fit because extracting printed text from photos is an OCR task within computer vision. The scenario also asks for a prebuilt service, so a purpose-built vision service is more appropriate than language analysis, custom machine learning, or generative AI.

Extracting text from an image is an optical character recognition (OCR) problem, which is part of computer vision. At the fundamentals level, Azure AI Vision is the service category that matches this need: it can analyze images and read text from photos, labels, signs, and scanned content without requiring you to build and train your own model.

Choose Azure AI Vision when the input is an image and the goal is to detect visual information or read text.
Choose Azure AI Language when the input is already text and you want sentiment, entities, or key phrases.
Choose Azure Machine Learning when you need to build custom models.
Choose Azure OpenAI Service for generative tasks such as chat, summarization, or content generation.

The simplest way to separate these choices is to match the service to the input type and workload.
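
As a study aid, the matching rule above can be written as a lookup on input type and workload. The helper `pick_service` and its string values are hypothetical; they only encode the four rules listed in this explanation and do not call any Azure service.

```python
def pick_service(input_type: str, goal: str, prebuilt: bool = True) -> str:
    """Hypothetical helper mirroring the four selection rules above.

    input_type: "image" or "text"; goal: "analyze" or "generate".
    These string values are assumptions made for this sketch.
    """
    if not prebuilt:
        # Building and training a custom model.
        return "Azure Machine Learning"
    if goal == "generate":
        # Chat, summarization, or content generation.
        return "Azure OpenAI Service"
    if input_type == "image":
        # Detect visual information or read text from an image.
        return "Azure AI Vision"
    if input_type == "text":
        # Sentiment, entities, or key phrases from existing text.
        return "Azure AI Language"
    raise ValueError(f"unsupported input type: {input_type!r}")


# Question 2: photographed package labels, prebuilt service, text extraction.
print(pick_service(input_type="image", goal="analyze"))
```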

Azure AI Language is for analyzing existing text, not for reading text directly from images.
Azure Machine Learning can build custom models, but the scenario asks for a prebuilt service.
Azure OpenAI Service is for generative AI experiences, not OCR as the primary requirement.

Question 3

Topic: Describe Features of Computer Vision Workloads on Azure

A grocery retailer wants to analyze photos of store shelves uploaded from mobile devices. The solution must identify each cereal box in a photo and return a bounding box for every detected box so the app can highlight them onscreen. Which computer vision approach best matches this requirement?

Options:

A. Object detection
B. Optical character recognition (OCR)
C. Image classification
D. Face detection

Best answer: A

Explanation: The key requirement is to find each cereal box and return a rectangle showing where it appears in the image. That output matches object detection, which identifies object instances and their locations.

Object detection is the computer vision approach used when a solution must find one or more objects in an image and report where each object appears. In this scenario, the retailer needs both the object type (cereal box) and the position of each one as a bounding box so the app can draw highlights.

Image classification assigns a label to the whole image and does not return a bounding box for each detected item.
OCR is used to extract printed or handwritten text from images.
Face detection is specialized for locating human faces, not general retail products.

For AI-900, a requirement to return a bounding box is a strong clue that object detection is the right choice.
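
To make the output shape concrete, here is a minimal sketch of how an app might consume object detection results. The `BoundingBox` and `DetectedObject` types, the pixel-based fields, and the sample detections are all made up for illustration; a real service defines its own response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


@dataclass(frozen=True)
class DetectedObject:
    label: str
    confidence: float
    box: BoundingBox


def boxes_to_highlight(detections, label, min_confidence=0.5):
    """Return the rectangles an app would draw for one object type."""
    return [
        d.box
        for d in detections
        if d.label == label and d.confidence >= min_confidence
    ]


# Made-up detections for a single shelf photo.
shelf = [
    DetectedObject("cereal box", 0.92, BoundingBox(10, 40, 80, 120)),
    DetectedObject("cereal box", 0.81, BoundingBox(100, 42, 78, 118)),
    DetectedObject("price tag", 0.77, BoundingBox(15, 170, 30, 12)),
]

for box in boxes_to_highlight(shelf, "cereal box"):
    print(box)
```

An image classification result, by contrast, would carry only a label and a confidence for the whole photo, so there would be nothing to draw.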

Whole-image label: The classification option can categorize an image but does not locate each cereal box with rectangles.
Text extraction: The OCR option is for reading characters, not for detecting product instances on a shelf.
Face-specific output: The face detection option is limited to human faces rather than general objects such as cereal boxes.

Question 4

Topic: Describe Features of Computer Vision Workloads on Azure

A warehouse safety team wants to review photos from loading docks. The solution must classify each image and detect objects such as forklifts, pallets, and hard hats. The team wants a prebuilt Azure service and does not want to train its own model. Which Azure service category should it choose?

Options:

A. Azure Machine Learning
B. Azure AI Language
C. Azure AI Vision
D. Azure OpenAI Service

Best answer: C

Explanation: This is a computer vision scenario because the input is images and the goal is to classify visual content and detect objects. Since the team wants a prebuilt Azure AI service instead of building its own model, Azure AI Vision is the best fit.

The core concept is matching the AI workload to the correct Azure service family. When a business needs to analyze images to identify content or detect objects in pictures, that is a computer vision task. Azure AI Vision is designed for these prebuilt image-analysis capabilities, so it fits a fundamentals scenario where the team wants fast setup without custom model training.

Azure Machine Learning is more appropriate when you need to build, train, and manage custom models. Azure AI Language is for text-based workloads such as sentiment analysis or entity extraction, and Azure OpenAI Service is for generative AI tasks such as creating or summarizing content. The key takeaway is that image classification and object detection map directly to Azure AI Vision.

Text workload: The language option is for analyzing written text, not identifying objects inside images.
Generative AI mix-up: The Azure OpenAI option focuses on generating or transforming content rather than serving as the primary prebuilt image-analysis service.
Custom model path: The machine learning option is better when you must train your own model, which the scenario explicitly says is unnecessary.

Question 5

Topic: Describe Features of Computer Vision Workloads on Azure

In Azure AI workloads, which description best matches optical character recognition (OCR)?

Options:

A. Converting spoken words in audio files to text, a speech capability
B. Extracting printed or handwritten text from images, a computer vision capability often used in document processing
C. Identifying sentiment and key phrases in document text, an NLP capability
D. Generating a new summary from document content, a generative AI capability

Best answer: B

Explanation: OCR reads text that appears in visual inputs such as photos, screenshots, and scanned pages. Because the source is an image, OCR is a computer vision capability, even when it is used inside larger document-processing workflows. Later analysis of the extracted text can involve NLP, but OCR itself is not NLP.

OCR stands for optical character recognition. Its job is to detect text inside an image or scanned document and convert that visual text into machine-readable text. Because it starts with visual input, OCR is classified as a computer vision capability.

In Azure AI scenarios, OCR is often one step in a broader document-processing workflow. For example, a system might first use OCR to read a scanned invoice, then use other AI capabilities to classify the document or extract meaning from the text. Those later steps can involve language or document analysis, but the OCR step is still about reading text from an image.

The key distinction is that OCR reads visual text; it does not analyze sentiment, transcribe audio, or generate new text.

Sentiment and key phrase extraction analyze text that is already available as text, so they are NLP tasks.
Speech-to-text starts from audio input, not from an image or scanned page.
Summary generation creates new text from source content, which is generative AI rather than text extraction.

Question 6

Topic: Describe Features of Computer Vision Workloads on Azure

A finance team uploads photos of receipts to Azure and needs to extract the merchant name, date, and total from each image, not just identify that the image is a receipt. Which computer vision workload should they use?

Options:

A. Object detection
B. Optical character recognition (OCR)
C. Face detection
D. General image classification

Best answer: B

Explanation: OCR is the right choice when the goal is to read text from an image and return it as usable data. In this scenario, the team needs specific receipt values, which is different from classifying the whole image into a category.

OCR extracts printed or handwritten text from images and converts it into machine-readable output. That fits receipt processing because the business needs actual text values such as merchant name, date, and total. General image classification would only label the entire image, such as “receipt” or “invoice,” and would not return the text itself.

OCR reads text in images.
Image classification assigns labels to the whole image.
Object detection finds items or regions in an image, but does not transcribe text by itself.

When the requirement is to pull words and numbers from an image, OCR is the best fit.
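
OCR returns text; picking out specific fields is a later step. The sketch below assumes OCR has already produced one string per line and uses deliberately naive rules (first line is the merchant, an ISO-style date, a line starting with TOTAL). The sample lines, patterns, and the `extract_receipt_fields` helper are illustrative assumptions, not the behavior of any Azure service.

```python
import re

# Made-up OCR output: one string per recognized line of text.
ocr_lines = [
    "CONTOSO MARKET",
    "Date: 2026-04-18",
    "Coffee beans      12.50",
    "Oat milk           3.25",
    "TOTAL             15.75",
]

DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
TOTAL_PATTERN = re.compile(r"^TOTAL\s+(\d+\.\d{2})$", re.IGNORECASE)


def extract_receipt_fields(lines):
    """Pull merchant, date, and total from OCR text lines using naive rules."""
    fields = {
        "merchant": lines[0].strip() if lines else None,  # assumes merchant is line 1
        "date": None,
        "total": None,
    }
    for line in lines:
        date_match = DATE_PATTERN.search(line)
        if date_match and fields["date"] is None:
            fields["date"] = date_match.group(1)
        total_match = TOTAL_PATTERN.match(line.strip())
        if total_match:
            fields["total"] = total_match.group(1)
    return fields


print(extract_receipt_fields(ocr_lines))
```

Image classification could only say that the photo is a receipt; none of these values would be available without first reading the text.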

Whole-image label: General image classification is useful for categorizing an image, not for reading receipt text.
Locate, not read: Object detection can identify and position items in an image, but it does not extract the written content.
Wrong target: Face detection is for detecting and locating human faces, which does not address receipt data extraction.

Question 7

Topic: Describe Features of Computer Vision Workloads on Azure

A company is building a mobile app that accepts profile photos. The app must detect whether each photo contains a human face and return the face location in the image. The company wants to use a prebuilt Azure AI service and does not want to train a custom model. Which service should it choose?

Options:

A. Azure AI Document Intelligence
B. Azure AI Face service
C. Azure AI Vision
D. Azure Machine Learning

Best answer: B

Explanation: The requirement is specifically to detect faces in photos and return face positions. Azure AI Face service is the prebuilt Azure service intended for face detection scenarios, so it fits better than a general vision, document, or custom ML service.

The deciding factor is that the app needs a face-specific capability, not general image analysis or custom model development. Azure AI Face service is designed to detect faces in images and provide face location data, which matches the profile-photo validation scenario directly. Because the company wants a prebuilt service, there is no need to use Azure Machine Learning to build and train its own model. Azure AI Vision is broader and used for tasks such as image analysis, OCR, and other vision workloads, while Azure AI Document Intelligence focuses on extracting information from documents and forms. When the requirement explicitly centers on detecting faces, the face service is the most appropriate choice.

General image analysis: The Azure AI Vision option is broader computer vision, but the stem asks for a face-specific task.
Document extraction: Azure AI Document Intelligence is for forms and document data, not profile-photo face detection.
Custom model path: Azure Machine Learning is used to build or manage custom models, which the scenario says is unnecessary.

Question 8

Topic: Describe Features of Computer Vision Workloads on Azure

A retailer wants to analyze shelf images and receive a rectangle around each product it finds, along with the product type. Which computer vision approach best fits this requirement?

Options:

A. Image classification
B. Object detection
C. Optical character recognition (OCR)
D. Image captioning

Best answer: B

Explanation: Object detection is the right choice when you need both what an item is and where it appears in the image. The requirement for rectangles around each product means the solution must return bounding boxes, not just a single label or a text description.

This scenario is about matching the required output to the correct computer vision approach. When a solution must find multiple items in an image and show each item’s position, the key clue is the need for rectangles around objects, which are bounding boxes.

Image classification returns a category for the whole image.
Object detection returns object labels plus bounding boxes.
OCR extracts printed or handwritten text.
Image captioning generates a natural-language description.

Because the retailer needs both product identification and object locations, object detection is the best fit; OCR and captioning produce different kinds of output, and image classification does not locate individual items.

Whole-image label: Image classification can label an image but does not return coordinates for each detected product.
Text extraction: OCR is for reading text, such as signs or labels, rather than locating products as objects.
Natural-language summary: Image captioning describes what is in an image but does not provide structured bounding boxes.

Question 9

Topic: Describe Features of Computer Vision Workloads on Azure

A company scans expense receipts and wants to automatically read the merchant name, purchase date, and total amount from each image. Which computer vision task is most appropriate?

Options:

A. Face detection
B. Image classification
C. Object detection
D. Optical character recognition (OCR)

Best answer: D

Explanation: This scenario is about extracting text from receipt images. OCR is the computer vision task that converts visible characters into machine-readable text, which can then be used to capture values such as merchant name, date, and total.

When the goal is to read characters from an image, the core computer vision task is optical character recognition. Receipts, scanned forms, and license plates are common OCR scenarios because the required output is text or numbers taken from the image. In Azure AI fundamentals, OCR is used when the system must convert printed or handwritten content into usable digital text.

Image classification labels an entire image, such as identifying that an image contains a receipt. Object detection finds and locates objects within an image. Face detection locates human faces. Those tasks may be useful in other vision solutions, but they do not by themselves return the receipt text you need. The key takeaway is that text extraction from images points to OCR.

Image classification can label the image as a receipt, but it does not read the receipt text.
Object detection can locate items or regions, but it does not convert characters into usable text.
Face detection is for finding faces in images and is unrelated to receipt data extraction.

Question 10

Topic: Describe Features of Computer Vision Workloads on Azure

A retail company analyzes entrance-camera images in Azure. It needs to count how many shoppers appear in each image and mark where each shopper is located. It does not need to identify individuals or read any text. Which computer vision task is most appropriate?

Options:

A. Optical character recognition (OCR)
B. Image classification
C. Object detection
D. Face detection

Best answer: C

Explanation: This scenario requires finding each person in the image, not just labeling the image as a whole. Object detection is designed to detect multiple instances and provide their locations, which makes people counting possible.

The core concept is that people counting usually needs detection of individual objects. Image classification gives one overall label for an entire image, such as whether a scene is crowded, but it does not return one result per person. Object detection identifies each person separately and provides location information such as bounding boxes, so the system can count shoppers and mark where they appear.
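
A minimal sketch of why detection makes counting possible: once each person is a separate result with its own box, the count is just the number of matching results. The dictionary shape, the "person" label, the 0.5 threshold, and the sample data are assumptions for this example, not a real service response.

```python
def count_people(detections, min_confidence=0.5):
    """Count detections labeled "person" and return their boxes.

    Each detection is a dict with "label", "confidence", and "box"
    (x, y, width, height); this shape is an assumption for the sketch.
    """
    people = [
        d for d in detections
        if d["label"] == "person" and d["confidence"] >= min_confidence
    ]
    return len(people), [d["box"] for d in people]


# Made-up detections for one entrance-camera image.
entrance_frame = [
    {"label": "person", "confidence": 0.94, "box": (32, 50, 60, 180)},
    {"label": "person", "confidence": 0.88, "box": (140, 48, 58, 176)},
    {"label": "shopping cart", "confidence": 0.90, "box": (210, 120, 90, 80)},
    {"label": "person", "confidence": 0.31, "box": (300, 60, 20, 60)},  # below threshold
]

count, locations = count_people(entrance_frame)
print(count, locations)
```

Nothing here identifies who the shoppers are; the results carry only a generic label and a location, which matches the scenario's requirement.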

Use image classification for a single label for the full image.
Use OCR to read printed or handwritten text.
Use face detection when the requirement is specifically to find faces or facial attributes.

If the requirement changed from counting people to checking whether a selfie contains a face, face detection would be the better fit.

Image classification fails because it labels the whole image rather than locating each shopper.
OCR fails because the scenario is about people in camera images, not extracting text.
Face detection is for finding faces or facial attributes, not counting all people in a scene.

Continue with full practice

Use the AI-900 Practice Test page for the full IT Mastery route, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Free review resource

Read the AI-900 Cheat Sheet on Tech Exam Lexicon, then return to IT Mastery for timed practice.

Revised on Thursday, May 14, 2026
