Cisco 300-640 DCAI: AI Infrastructure Components and Architecture

Try 10 focused Cisco 300-640 DCAI questions on AI Infrastructure Components and Architecture, with explanations, then continue with IT Mastery.

Open the matching IT Mastery practice page for timed mocks, topic drills, progress tracking, explanations, and full practice.

Topic snapshot

Field | Detail
Exam route | Cisco 300-640 DCAI
Topic area | AI Infrastructure Components and Architecture
Blueprint weight | 30%
Page purpose | Focused sample questions before returning to mixed practice

How to use this topic drill

Use this page to isolate AI Infrastructure Components and Architecture for Cisco 300-640 DCAI. Work through the 10 questions first, then review the explanations and return to mixed practice in IT Mastery.

Pass | What to do | What to record
First attempt | Answer without checking the explanation first. | The fact, rule, calculation, or judgment point that controlled your answer.
Review | Read the explanation even when you were correct. | Why the best answer is stronger than the closest distractor.
Repair | Repeat only missed or uncertain items after a short break. | The pattern behind misses, not the answer letter.
Transfer | Return to mixed practice once the topic feels stable. | Whether the same skill holds up when the topic is no longer obvious.

Blueprint context: 30% of the practice outline. A focused topic score can overstate readiness if you recognize the pattern too quickly, so use it as repair work before timed mixed sets.

Sample questions

These original IT Mastery practice questions are aligned to this topic area. Use them for self-assessment, scope review, and deciding what to drill next.

Question 1

Topic: AI Infrastructure Components and Architecture

A distributed training job uses RoCE for GPU all-reduce and reads checkpoints from an NVMe storage cluster. After a new rack was added, Nexus Dashboard shows that only jobs scheduled on the new rack have unstable step times and higher storage read latency.

Observation | Existing racks | New rack
GPU all-reduce p99 latency | 180 µs | 650 µs
Storage read p99 latency | 2 ms | 11 ms
Path to storage | Same fabric | Inter-fabric link
Inter-fabric link utilization | 35% | 92% during jobs

What is the most likely cause?

Options:

  • A. The GPU nodes require a newer container image.

  • B. The NVMe storage cluster lacks enough usable capacity.

  • C. The training framework is using an inefficient batch size.

  • D. The new rack is placed outside the storage and GPU latency domain.

Best answer: D

Explanation: Fabric placement is the deciding issue when AI traffic must have predictable latency for both GPU-to-GPU communication and storage access. The existing racks keep GPU all-reduce and storage reads inside the same fabric, while the new rack crosses an inter-fabric link that becomes highly utilized during jobs. That placement adds contention and variable latency on the path shared by collective GPU communication and checkpoint reads. A suitable remediation would be to place the new GPU rack and storage access in the same AI fabric or latency domain, or provide non-oversubscribed dedicated connectivity for that path. The key is not just raw bandwidth; it is whether the workload’s critical traffic stays on a predictable fabric path.

  • Batch-size tuning may affect compute efficiency, but it does not explain why only the rack crossing the inter-fabric link is affected.
  • Storage capacity is not indicated by the latency and path evidence; the issue appears during cross-fabric access under load.
  • Container image version would not normally create a rack-specific network path and storage latency pattern.
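
The comparison behind this answer can be sketched in a few lines of Python. The latency and utilization values are copied from the observation table; the dictionary keys, the `slowdown` helper, and the 80% congestion threshold are illustrative assumptions, not part of the question.

```python
# Values copied from the observation table in Question 1.
# Key names and the 80% threshold are illustrative assumptions.
existing = {"allreduce_p99_us": 180, "storage_p99_ms": 2, "link_util_pct": 35}
new_rack = {"allreduce_p99_us": 650, "storage_p99_ms": 11, "link_util_pct": 92}


def slowdown(metric: str) -> float:
    """Ratio of the new rack's value to the existing racks' value."""
    return new_rack[metric] / existing[metric]


allreduce_ratio = slowdown("allreduce_p99_us")
storage_ratio = slowdown("storage_p99_ms")
# Assumed rule of thumb: treat the link as congested at or above 80% utilization.
link_congested = new_rack["link_util_pct"] >= 80

print(f"all-reduce p99 ratio: {allreduce_ratio:.1f}x")
print(f"storage read p99 ratio: {storage_ratio:.1f}x")
print(f"inter-fabric link congested: {link_congested}")
```

Both latency metrics degrade only on the rack whose path crosses the heavily utilized inter-fabric link, which is why placement, rather than image version, capacity, or batch size, is the best-supported cause.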

Question 2

Topic: AI Infrastructure Components and Architecture

A data center team is designing an on-premises AI training and RAG environment. The first phase uses 8 GPU servers, but the approved roadmap expands to 32 servers within 18 months. The facility has limited spare power and cooling, a sustainability target to avoid a major PUE increase, and a requirement to expand without disruptive recabling. Which architecture decision is best?

Options:

  • A. Place 8 servers in existing racks and reassess capacity later

  • B. Deploy a modular AI pod sized for 32 servers, installing 8 initially

  • C. Purchase and power all 32 servers during the first phase

  • D. Use larger GPUs in 8 servers and keep the facility design unchanged

Best answer: B

Explanation: Dense AI capacity planning must account for both the day-1 footprint and the expected expansion path. A modular pod approach can reserve rack space, power feeds, cooling strategy, leaf-spine ports, and storage bandwidth for the 32-server target while installing only the first 8 servers now. This reduces stranded design choices, avoids disruptive recabling, and supports sustainability goals by planning cooling efficiency instead of reacting after heat density increases. The key is not simply buying more compute; it is aligning facility, network, storage, and orchestration scale with the AI workload roadmap.

  • Reassessing later risks rack, power, cooling, and cabling constraints becoming blockers during expansion.
  • Buying everything now overbuilds the first phase and unnecessarily consumes power and cooling capacity.
  • Using bigger GPUs increases density pressure and does not solve rack, fabric, storage, or cooling scalability.

Question 3

Topic: AI Infrastructure Components and Architecture

A team moved a multi-node LLM fine-tuning job into containers on Cisco UCS GPU servers. Kubernetes shows all pods Running, each pod has its requested GPUs, and there are no restarts. Epoch time increased sharply.

Telemetry: GPU utilization cycles between high and low, storage read latency spikes during data loading and checkpoints, and the storage leaf uplinks show ECN marks.

What is the best next step?

Options:

  • A. Increase pod CPU limits for every training container

  • B. Remove GPU requests to improve pod placement flexibility

  • C. Rebuild the image with a newer CUDA base layer

  • D. Validate storage and fabric bandwidth for the AI job

Best answer: D

Explanation: Containerization packages the workload and helps orchestration, but it does not virtualize away the physical GPU, storage, and network path. In this case, the pods are healthy and GPUs are assigned, but the GPUs are intermittently underutilized while storage latency and ECN marks appear during data movement. That pattern points to the infrastructure feeding the GPUs, not to the container state itself. The next step is to validate storage throughput/latency, fabric congestion, QoS behavior, and placement relative to the storage path for this AI workload.

A containerized AI job can still fail performance goals if the fabric or storage cannot keep GPUs continuously supplied with data.

  • CPU limits may matter for preprocessing, but the observed storage latency and ECN marks point to the data path.
  • CUDA image changes are not supported by the telemetry because the job runs and GPUs are allocated.
  • Removing GPU requests would make scheduling less reliable and would not address storage or fabric congestion.

Question 4

Topic: AI Infrastructure Components and Architecture

A team is designing an on-premises AI training pod for large model fine-tuning. The pod must keep jobs running through a single leaf switch or storage path failure, and it must support doubling GPU and dataset capacity next year without replacing the initial infrastructure. Which design best maps to these requirements?

Options:

  • A. Active/standby management nodes with fixed compute capacity

  • B. Scale-out pod with redundant fabrics and storage paths

  • C. Extra GPU nodes behind one leaf and one storage path

  • D. Larger GPU servers with a single high-capacity fabric

Best answer: B

Explanation: Redundancy and scalability solve different AI infrastructure requirements. Redundancy improves availability by removing single points of failure, such as using dual fabrics, dual-homed hosts, multipath storage, or redundant controllers. Scalability supports capacity growth by allowing additional GPU nodes, storage nodes, links, or pods to be added without redesigning the architecture. In this scenario, the design must satisfy both: jobs should survive a single network or storage-path failure, and the pod must expand to handle more GPUs and larger datasets later. A scale-out pod with redundant network and storage paths maps to both requirements. A larger single fabric may increase initial capacity but does not provide availability through a single fabric failure.

  • Scale-up only fails because bigger initial servers do not remove the single high-capacity fabric as a failure point.
  • Management redundancy helps control-plane availability but does not expand GPU or storage capacity for workloads.
  • Single-path expansion adds capacity but keeps a single leaf and storage path as availability risks.

Question 5

Topic: AI Infrastructure Components and Architecture

A team runs RAG inference on an on-premises Cisco UCS GPU cluster and queries a cloud-hosted vector database. Requirements are P95 retrieval under 40 ms and no data-plane access through public cloud endpoints.

Telemetry during latency spikes:

Observation | Value
GPU utilization | Drops from 82% to 35%
Private cloud circuit | 18% utilized, no drops
Internet IPsec tunnel | 94% utilized, retransmits increasing
Firewall logs | Connections to vector DB public FQDN

Which cause or remediation is best supported by the facts?

Options:

  • A. Enable PFC on the RoCE fabric

  • B. Fix private endpoint DNS and routing

  • C. Disable TLS inspection for vector queries

  • D. Add more GPUs to the UCS cluster

Best answer: B

Explanation: The evidence points to a hybrid connectivity design problem, not a local GPU or data center fabric problem. The private circuit has capacity and no drops, while the Internet IPsec tunnel is saturated and retransmitting. Firewall logs also show connections to the vector database public FQDN, which violates the security requirement and explains the performance issue. In a secure hybrid AI design, cloud service access should resolve to private endpoints and route over the intended private connectivity path. Correcting private DNS, endpoint binding, and route preferences preserves both latency and security requirements.

  • RoCE fabric tuning does not address traffic leaving the site through the wrong cloud path.
  • More GPUs would not help when GPU utilization drops because the workload is waiting on vector retrieval.
  • TLS inspection changes do not explain why traffic is using a public FQDN and congested Internet tunnel.
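
One way to spot this kind of misrouting is to check whether the vector database name resolves to a private address. The sketch below uses Python's standard `ipaddress` module; the function name and the sample addresses are hypothetical, and `is_private` is only a rough test because it also returns true for loopback, link-local, and other reserved ranges.

```python
import ipaddress


def resolves_to_private_endpoint(resolved_ip: str) -> bool:
    """Return True if a resolved address falls in a private or reserved range.

    Rough check only: is_private also covers loopback, link-local, and
    other special-purpose ranges, not just RFC 1918 space.
    """
    return ipaddress.ip_address(resolved_ip).is_private


# Hypothetical resolver answers for the vector database name.
private_answer = resolves_to_private_endpoint("10.20.30.40")
public_answer = resolves_to_private_endpoint("8.8.8.8")

print(private_answer)
print(public_answer)
```

A public answer for the service name would be consistent with the firewall logs in the scenario and would point to DNS or endpoint binding as the first thing to correct, followed by route preference toward the private circuit.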

Question 6

Topic: AI Infrastructure Components and Architecture

A team runs distributed AI training on 12 GPU servers. The dataset and checkpoints are stored on an NVMe-oF block LUN mounted to one server and re-exported to the training pods over NFS. During maintenance on that server, resumed jobs cannot find the latest checkpoint, and GPU utilization drops during checkpoint writes. Fabric telemetry shows no congestion or packet loss. What is the most appropriate remediation?

Options:

  • A. Move checkpoints to an HA shared file service

  • B. Enable ECN on the storage fabric

  • C. Increase GPU memory on each server

  • D. Present the same block LUN to every server

Best answer: A

Explanation: The issue is a storage architecture mismatch, not a fabric or GPU capacity problem. Distributed AI training needs shared access to datasets and checkpoints across multiple nodes, predictable I/O during checkpoint operations, and operational recovery when a node is maintained or fails. Re-exporting a block LUN through one server creates a single dependency and can cause stale or unavailable checkpoint access when that server is disrupted. An HA file service, scale-out NAS, or parallel file system is better suited because it is designed for multi-client access and can provide failover, snapshots, and performance scaling. Block storage can be fast, but ordinary block LUNs do not provide safe shared file semantics without an appropriate clustered file system.

  • ECN tuning may help with congestion, but telemetry shows no congestion or loss, so a network cause is not supported.
  • Shared block access can risk corruption unless a clustered file system or application-aware coordination is used.
  • More GPU memory does not address missing checkpoints or I/O stalls caused by the storage access model.

Question 7

Topic: AI Infrastructure Components and Architecture

An enterprise is expanding an on-premises generative AI training environment. The new GPU nodes will triple accelerator count, but the existing data hall has limited floor space, marginal chilled-air capacity, and a sustainability target to reduce energy per completed training job. The design must keep the workload on premises for data residency and avoid a new data hall build. Which design best maps to these requirements?

Options:

  • A. Spread lower-density air-cooled GPU servers across legacy rows

  • B. Use denser GPU racks with liquid cooling and facility telemetry

  • C. Add CPU-only compute nodes and limit GPU job concurrency

  • D. Move training jobs to public cloud GPU instances

Best answer: B

Explanation: Dense AI growth creates a facility tradeoff: more GPUs improve throughput, but they also increase rack power density, heat rejection, and space pressure. When floor space is constrained and chilled-air cooling is already marginal, simply adding more air-cooled racks can increase PUE and may still fail cooling requirements. A better sustainability design uses higher-density GPU racks with cooling appropriate for the heat load, such as liquid cooling, and adds telemetry for power, thermal behavior, and energy per completed training job. This supports capacity growth without building a new data hall and keeps data-resident workloads on premises. The key is improving useful work per watt, not just adding more equipment.

  • Legacy row spreading uses available space inefficiently and can worsen airflow and energy efficiency under marginal chilled-air capacity.
  • CPU-only expansion does not satisfy the GPU-driven training growth requirement and reduces AI throughput.
  • Public cloud migration may help capacity, but it violates the stated on-premises data residency requirement.
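
The two efficiency measures mentioned in this explanation can be computed from facility telemetry. PUE is total facility energy divided by IT equipment energy; energy per completed training job is taken here as facility energy over a reporting window divided by jobs completed in that window. The sample values and function names below are placeholders for illustration, not measurements from the scenario.

```python
def pue(facility_kwh: float, it_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    return facility_kwh / it_kwh


def energy_per_job_kwh(facility_kwh: float, completed_jobs: int) -> float:
    """Facility energy spent per completed training job in the same window."""
    return facility_kwh / completed_jobs


# Placeholder telemetry for one reporting window (hypothetical values).
window = {"facility_kwh": 1400.0, "it_kwh": 1000.0, "completed_jobs": 20}

window_pue = pue(window["facility_kwh"], window["it_kwh"])
window_energy_per_job = energy_per_job_kwh(
    window["facility_kwh"], window["completed_jobs"]
)

print(f"PUE: {window_pue:.2f}")
print(f"Energy per completed job: {window_energy_per_job:.1f} kWh")
```

Tracking energy per completed job alongside PUE matters because a design can hold PUE steady while still wasting energy on throttled or stalled jobs.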

Question 8

Topic: AI Infrastructure Components and Architecture

A data center team is validating a planned expansion for an on-premises AI training pod. Four GPU servers rated at 12 kW each will be added. The site standard requires N+1 power and cooling capacity during maintenance. During a trial with one cooling unit out of service, Intersight reports rising server inlet temperatures and GPU throttling, while RoCE and storage telemetry remain normal.

Metric | Value
Current critical IT load | 432 kW
Planned added GPU load | 48 kW
UPS usable capacity with N+1 | 500 kW
Cooling usable capacity with N+1 | 450 kW

Which finding best explains why the expansion should not proceed as planned?

Options:

  • A. UPS N+1 capacity is exceeded by the planned load.

  • B. RoCE congestion is causing the GPU throttling.

  • C. Storage bandwidth is limiting the benchmark.

  • D. Cooling N+1 capacity is exceeded by the planned load.

Best answer: D

Explanation: For AI infrastructure expansion, reliability must be checked against the capacity available while preserving N+1, not just normal operating capacity. The planned load is 432 kW + 48 kW = 480 kW. That is below the UPS N+1 usable capacity of 500 kW, so power can support the expansion under the stated reliability requirement. However, the same IT load becomes heat that cooling must remove, and 480 kW exceeds the 450 kW N+1 cooling capacity. The observed inlet temperature rise and GPU throttling during a one-unit-maintenance condition support a thermal capacity issue rather than a network or storage bottleneck.

  • UPS overload fails because 480 kW remains within the 500 kW usable UPS capacity for N+1.
  • RoCE congestion is not supported because network telemetry is reported as normal during the trial.
  • Storage bottleneck is not supported because storage telemetry is normal and the symptoms are thermal.
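
The capacity arithmetic in this explanation can be checked with a short script. All figures come from the question; the variable and function names are illustrative.

```python
# Figures from Question 8.
current_load_kw = 432
added_load_kw = 4 * 12      # four GPU servers rated at 12 kW each
ups_n1_kw = 500             # UPS usable capacity while preserving N+1
cooling_n1_kw = 450         # cooling usable capacity while preserving N+1

planned_load_kw = current_load_kw + added_load_kw


def headroom_kw(capacity_kw: int, load_kw: int) -> int:
    """Positive means spare capacity; negative means the limit is exceeded."""
    return capacity_kw - load_kw


ups_headroom = headroom_kw(ups_n1_kw, planned_load_kw)
cooling_headroom = headroom_kw(cooling_n1_kw, planned_load_kw)

print(f"Planned load: {planned_load_kw} kW")
print(f"UPS N+1 headroom: {ups_headroom} kW")
print(f"Cooling N+1 headroom: {cooling_headroom} kW")
```

The UPS keeps 20 kW of headroom at the planned load, while cooling falls 30 kW short, which matches the thermal symptoms seen with one cooling unit out of service.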

Question 9

Topic: AI Infrastructure Components and Architecture

An AI platform team is deploying a shared fabric for two workloads: distributed GPU training that uses RoCEv2 for node-to-node gradient exchange, and RAG inference that receives API requests and retrieves data from a vector database in another rack. Operations must distinguish fabric congestion, storage latency, and application errors without inspecting payloads. Which monitoring design best maps to these communication patterns?

Options:

  • A. Use only GPU utilization, NVLink status, and application logs from the servers.

  • B. Collect fabric flow telemetry and QoS counters from GPU-facing leaves and spines, then correlate them with storage and service health.

  • C. Mirror all GPU node links for full-packet capture and centralized payload inspection.

  • D. Monitor only the internet edge, load balancers, and API error rates.

Best answer: B

Explanation: Distributed training over RoCEv2 creates heavy east-west traffic between GPU nodes, so troubleshooting needs visibility inside the data center fabric, especially on GPU-facing leaf switches and spines. Useful signals include flow telemetry, queue depth, ECN marking, PFC pause activity, drops, and latency by traffic class. RAG inference also has north-south API traffic and storage or vector database dependencies, so fabric telemetry should be correlated with storage and service health. This approach avoids payload inspection while still showing whether the symptom is caused by congestion, storage delay, or application behavior. Edge-only or server-only monitoring misses key fabric paths.

  • Edge-only monitoring can see API symptoms but misses east-west RoCEv2 congestion between GPU nodes.
  • Full-packet capture adds overhead and conflicts with the requirement to avoid payload inspection.
  • Server-only signals may show GPU or application impact but cannot prove fabric queueing, drops, or congestion.
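
The correlation idea can be sketched as a simple rule-based classifier over counters that require no payload inspection. Everything here is a hypothetical illustration: the field names, the thresholds, and the `classify` function are assumptions, and a real deployment would take these signals from the fabric, storage, and service monitoring already in place.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One correlated observation window (hypothetical field names)."""
    ecn_marks: int
    pfc_pauses: int
    drops: int
    storage_p99_ms: float
    api_error_rate: float


def classify(s: Sample, storage_slo_ms: float = 5.0, error_slo: float = 0.01) -> list[str]:
    """Return the symptom classes supported by the counters in this window."""
    causes = []
    if s.ecn_marks or s.pfc_pauses or s.drops:
        causes.append("fabric congestion")
    if s.storage_p99_ms > storage_slo_ms:  # assumed latency objective
        causes.append("storage latency")
    if s.api_error_rate > error_slo:       # assumed error-rate objective
        causes.append("application errors")
    return causes or ["no dominant signal"]


congested = classify(Sample(120, 4, 0, 2.0, 0.0))
slow_storage = classify(Sample(0, 0, 0, 12.0, 0.0))
app_errors = classify(Sample(0, 0, 0, 2.0, 0.05))
quiet = classify(Sample(0, 0, 0, 2.0, 0.0))

print(congested, slow_storage, app_errors, quiet)
```

The point of the sketch is that each class needs its own data source: fabric counters from leaves and spines, latency from storage, and error rates from the service, which is what server-only or edge-only monitoring cannot provide.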

Question 10

Topic: AI Infrastructure Components and Architecture

A team is reviewing an on-premises AI training design for Cisco UCS GPU servers. The workload performs distributed fine-tuning with frequent shared dataset reads and checkpoint writes. The design must scale from 8 to 16 servers without reducing GPU utilization.

Architecture fact | Current design
GPU nodes | 8 servers, 8 GPUs per server, NVLink within each server
AI fabric | Redundant 100 GbE RoCEv2 leaf-spine with QoS planned
Shared storage | Single NAS pair, 2 × 25 GbE NFS uplinks
Observed pilot | GPUs idle during data load and checkpoint phases

Which design review conclusion is best supported?

Options:

  • A. Replace GPU servers to increase NVLink bandwidth.

  • B. Tune only PFC and ECN on the AI fabric.

  • C. Move the workload to edge inference nodes.

  • D. Redesign shared storage for scalable parallel throughput.

Best answer: D

Explanation: The design facts point to a storage-layer limitation in an AI training architecture. The GPU servers already have local NVLink for intra-server GPU communication, and the AI fabric has redundant 100 GbE RoCEv2 with QoS planned for east-west training traffic. The weak point is the single NAS pair with only 2 × 25 GbE uplinks serving shared reads and checkpoint writes for a cluster that must double in size. GPU idle time during data load and checkpoint phases further supports storage throughput and scalability as the design concern. A better conclusion is to use a storage architecture that can scale bandwidth and availability with the training cluster, such as a parallel file system or other high-throughput shared storage with redundant paths.

  • GPU replacement misses that the idle periods occur during shared data and checkpoint activity, not proven GPU-to-GPU limits.
  • Fabric-only tuning may help RoCE traffic, but it does not increase NAS throughput or remove the storage head bottleneck.
  • Edge inference changes the workload model and ignores the stated distributed fine-tuning requirement.
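
A rough calculation shows why the NAS uplinks become the limiting factor as the cluster grows. The uplink count and speed come from the design table; the sketch assumes both uplinks are usable at line rate, that load is spread evenly across servers, and that protocol overhead is ignored, so real throughput would be lower.

```python
# From the design table: single NAS pair with 2 x 25 GbE NFS uplinks.
NAS_UPLINKS = 2
UPLINK_GBPS = 25
GPUS_PER_SERVER = 8


def per_server_share_gbps(servers: int) -> float:
    """Best-case NAS bandwidth per server, assuming an even split at line rate."""
    return NAS_UPLINKS * UPLINK_GBPS / servers


share_at_8 = per_server_share_gbps(8)
share_at_16 = per_server_share_gbps(16)
per_gpu_at_16 = share_at_16 / GPUS_PER_SERVER

print(f"8 servers:  {share_at_8} Gb/s per server")
print(f"16 servers: {share_at_16} Gb/s per server")
print(f"16 servers: {per_gpu_at_16:.2f} Gb/s per GPU")
```

Doubling the cluster halves the best-case storage share per server, while the pilot already shows idle GPUs at 8 servers, so fabric tuning alone cannot close the gap.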

Continue with full practice

Use the Cisco 300-640 DCAI Practice Test page for the full IT Mastery practice bank, mixed-topic practice, timed mock exams, explanations, and web/mobile app access.

Revised on Thursday, May 28, 2026