300-640 DCAI — Cisco Implementing Data Center AI Infrastructure Exam Blueprint
A practical readiness blueprint for the Cisco Implementing Data Center AI Infrastructure (300-640 DCAI) exam.
How to Use This Exam Blueprint
Use this independent Exam Blueprint to organize your preparation for the Cisco Implementing Data Center AI Infrastructure (300-640 DCAI) exam. It translates likely readiness areas into practical tasks: what you should be able to explain, configure, validate, and troubleshoot.
No exact official weights are assumed here. Treat Cisco’s published exam information as the source of truth for current scope, then use this checklist to test whether you are operationally ready.
A strong candidate should be able to:
- Explain why AI infrastructure stresses the data center differently from traditional enterprise workloads.
- Choose appropriate Cisco data center fabric, compute, storage, automation, and observability approaches for AI workloads.
- Understand lossless Ethernet, RDMA, RoCEv2, congestion control, and QoS design tradeoffs.
- Validate configuration artifacts and troubleshoot degraded training or inference performance.
- Reason through implementation scenarios, not just recall product names.
Topic-area readiness map
| Readiness area | What to review | Ready means you can… |
|---|---|---|
| AI workload fundamentals | Training, inference, GPU clusters, east-west traffic, data pipelines, job behavior | Explain how AI workloads drive bandwidth, latency, storage, and resiliency requirements |
| Data center fabric design | Leaf-spine, ECMP, underlay/overlay, scale-out design, oversubscription, failure domains | Select a fabric approach for GPU clusters and justify capacity, redundancy, and operational tradeoffs |
| RDMA and lossless Ethernet | RoCEv2, PFC, ECN, DCB concepts, jumbo MTU, queue mapping | Explain end-to-end requirements for low-loss transport and identify misconfiguration risks |
| QoS and congestion management | Classification, marking, queuing, buffer behavior, priority classes, congestion signals | Map AI traffic to queues and troubleshoot drops, pauses, latency, or throughput collapse |
| Cisco data center switching | Nexus switching concepts, NX-OS validation, port channels, routing, VXLAN/EVPN where applicable | Interpret common Cisco show outputs and connect configuration choices to AI fabric behavior |
| Compute and GPU platform integration | GPU servers, NICs, DPUs, PCIe, NUMA awareness, firmware, drivers, power and cooling | Identify platform dependencies that affect AI job performance and reliability |
| Storage and data movement | File, object, block, distributed storage, caching, ingest, dataset locality | Match storage patterns to AI workload needs and detect storage bottlenecks |
| Automation and orchestration | Templates, APIs, infrastructure as code, fabric controllers, validation pipelines | Describe how repeatable provisioning reduces risk in large GPU-cluster deployments |
| Observability and telemetry | Streaming telemetry, interface counters, queue metrics, logs, flow data, GPU/node metrics | Build a troubleshooting view that connects application symptoms to infrastructure signals |
| Security and segmentation | AAA, RBAC, VRFs, tenant isolation, management-plane protection, secure automation | Apply least privilege and segmentation without breaking high-performance AI workflows |
| Operations and lifecycle | Firmware alignment, change windows, rollback plans, capacity planning, documentation | Plan safe changes in an environment where small mismatches can cause major performance loss |
AI infrastructure fundamentals
Core concepts to know
- Difference between AI training, fine-tuning, inference, batch inference, and interactive inference.
- Why distributed training creates heavy east-west traffic between GPU nodes.
- Why storage throughput and data preprocessing can bottleneck expensive GPU capacity.
- How low latency, high bandwidth, and predictable packet delivery affect training completion time.
- Why “link is up” does not mean “fabric is healthy” for AI workloads.
- How application behavior, framework communication patterns, NIC behavior, and network design interact.
- Why tail latency and microbursts matter for synchronized distributed jobs.
- How failure of one node, link, queue, or path can reduce the efficiency of an entire training job.
Can you explain these distinctions?
| Prompt | You should be able to answer |
|---|---|
| Training vs inference | Which one is more likely to require large-scale GPU-to-GPU communication? Which one is more latency-sensitive to users? |
| North-south vs east-west | Why do AI clusters often emphasize east-west bandwidth inside the fabric? |
| Bandwidth vs latency | When is raw throughput the main concern, and when does jitter or tail latency become critical? |
| Oversubscription | Why might a traditional oversubscription ratio be unacceptable for a large training cluster? |
| Storage vs network bottleneck | How would symptoms differ if GPUs are waiting for data rather than waiting on inter-node communication? |
| Resiliency vs performance | Why can a redundant design still perform poorly if traffic hashing or queue policy is wrong? |
Data center fabric design for AI workloads
Fabric architecture checklist
- Understand the role of leaf, spine, border, management, and out-of-band components.
- Explain why scale-out fabrics are commonly used for GPU clusters.
- Identify where ECMP helps and where it does not solve congestion by itself.
- Know how link speed, port count, cabling, optics, and spine capacity affect cluster scale.
- Understand failure-domain design: rack, leaf pair, spine, power domain, management domain.
- Review when VXLAN/EVPN, VRFs, VLANs, and routed underlays may appear in a data center design.
- Understand why consistent MTU and QoS treatment must be end-to-end for RDMA-style workloads.
- Know the difference between production data traffic, storage traffic, management traffic, telemetry traffic, and control-plane traffic.
- Be able to reason about adding a rack of GPU servers without creating hidden bottlenecks.
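To see why ECMP does not solve congestion by itself, it can help to sketch how per-flow hashing places a small number of long-lived flows onto equal-cost paths. The example below is a rough illustration only: `pick_path` and `path_load` are hypothetical helpers, SHA-256 stands in for whatever hash a real switch uses, and the addresses and source ports are made up. UDP destination port 4791 is the RoCEv2 port.

```python
import hashlib
from collections import Counter


def pick_path(flow: tuple, num_paths: int) -> int:
    """Map a flow 5-tuple to a path index with a stable hash (stand-in for a switch hash)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def path_load(flows: list, num_paths: int) -> Counter:
    """Count how many flows land on each equal-cost path."""
    load = Counter({path: 0 for path in range(num_paths)})
    for flow in flows:
        load[pick_path(flow, num_paths)] += 1
    return load


# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port).
flows = [(f"10.0.0.{i}", f"10.0.1.{i}", "udp", 49152 + i, 4791) for i in range(8)]

load = path_load(flows, num_paths=4)
for path, count in sorted(load.items()):
    print(f"path {path}: {count} flows")
print("busiest path carries", max(load.values()), "of", len(flows), "flows")
```

Because each flow stays on one path, a few large flows can share a link while another link sits idle. With many small flows the distribution tends to even out; with a handful of large GPU-to-GPU flows it may not.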
Capacity and oversubscription checks
Be ready to calculate or reason through basic bandwidth relationships. You do not need invented exam-specific numbers; focus on the method.
\[ \text{Oversubscription ratio} = \frac{\text{Total server-facing bandwidth}}{\text{Total fabric-facing bandwidth}} \]
Use this to answer scenario questions such as:
- If each rack adds more GPU nodes, do spine uplinks still provide enough aggregate capacity?
- Does a proposed design meet the expected traffic pattern, or only the link-speed requirement?
- Which is the bottleneck: server NIC, leaf uplink, spine capacity, storage path, or application pipeline?
- What happens to effective bandwidth during a link or spine failure?
- Are traffic patterns balanced enough for ECMP to use available paths efficiently?
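The oversubscription ratio defined earlier in this section can be worked through with a short calculation. This is a minimal sketch: the function name `oversubscription_ratio` is hypothetical, and the port counts and speeds are illustrative assumptions, not exam values.

```python
def oversubscription_ratio(server_ports: int, server_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Return server-facing bandwidth divided by fabric-facing bandwidth for one leaf."""
    if uplinks <= 0:
        raise ValueError("a leaf needs at least one working uplink")
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)


# Assumed leaf: 32 x 400G server-facing ports and 32 x 400G uplinks.
healthy = oversubscription_ratio(32, 400, 32, 400)

# Same leaf after losing one of four spines, assuming 8 uplinks per spine.
degraded = oversubscription_ratio(32, 400, 24, 400)

print(f"healthy:  {healthy:.2f}:1")
print(f"degraded: {degraded:.2f}:1")
```

In this made-up case the leaf is nonblocking (1.00:1) when healthy and about 1.33:1 after the spine failure, which is the kind of before-and-after reasoning the scenario questions call for.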
Design decision prompts
| Scenario | Decision points |
|---|---|
| New GPU training pod | Leaf-spine capacity, nonblocking requirements, cabling plan, power/cooling, management access, telemetry baseline |
| Expanding an existing fabric | Available spine ports, uplink utilization, QoS consistency, route scale, automation template changes |
| Mixed AI and general workloads | Traffic isolation, QoS classes, tenant segmentation, storage access, noisy-neighbor controls |
| Multi-tenant AI platform | VRF or segmentation model, RBAC, quotas or policy boundaries, observability per tenant |
| High-throughput storage access | Network path to storage, storage protocol behavior, cache strategy, congestion domain |
| Latency-sensitive inference | Placement close to consumers, load-balancing behavior, failure handling, monitoring of tail latency |
RDMA, RoCEv2, and lossless Ethernet readiness
Concepts to master
- What RDMA is intended to provide and why AI workloads may benefit from it.
- How RoCEv2 relies on IP networking while still requiring careful loss and congestion handling.
- Why packet drops can hurt RDMA traffic far more than they hurt ordinary TCP applications.
- The purpose and risk of Priority Flow Control.
- The purpose of Explicit Congestion Notification.
- How traffic classification and marking must remain consistent across hosts, switches, and paths.
- Why jumbo MTU mismatch can cause hard-to-diagnose performance issues.
- Why PFC should be scoped carefully and not treated as a universal “make everything lossless” setting.
- How congestion spreading, pause storms, or head-of-line blocking can harm an AI fabric.
- How host NIC settings and switch QoS policies must align.
PFC, ECN, and QoS comparison
| Control | Purpose | What to watch for |
|---|---|---|
| QoS classification | Places traffic into the intended class or queue | Wrong DSCP/CoS marking, remarking at boundaries, mixed traffic in lossless queues |
| Priority Flow Control | Pauses traffic for selected priorities to avoid loss | Pause propagation, head-of-line blocking, enabling too broadly |
| ECN | Signals congestion before drops occur | Threshold mismatch, host response behavior, inconsistent configuration |
| Queuing policy | Allocates bandwidth and scheduling behavior | Starvation, wrong queue mapping, insufficient buffer allocation |
| MTU | Supports larger frames where required | End-to-end mismatch, silent fragmentation or drops, host-switch inconsistency |
| Congestion monitoring | Detects early signs of fabric stress | Ignoring queue depth, pause counters, microbursts, and retransmission symptoms |
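The idea that ECN signals congestion before drops occur is easier to reason about with a marking curve in hand. The sketch below shows a generic WRED-style curve, not any specific Cisco or NIC implementation: `ecn_mark_probability` is a hypothetical function, and the thresholds and maximum probability are assumed values chosen only for illustration.

```python
def ecn_mark_probability(queue_kb: float, min_kb: float,
                         max_kb: float, max_prob: float) -> float:
    """Generic WRED-style ECN curve.

    Below min_kb nothing is marked, between min_kb and max_kb the marking
    probability ramps linearly up to max_prob, and above max_kb every
    ECN-capable packet is marked.
    """
    if min_kb >= max_kb:
        raise ValueError("min_kb must be smaller than max_kb")
    if queue_kb < min_kb:
        return 0.0
    if queue_kb > max_kb:
        return 1.0
    return max_prob * (queue_kb - min_kb) / (max_kb - min_kb)


# Assumed thresholds: start marking at 150 KB, mark everything above 3000 KB.
for depth in (100, 1575, 3000, 4000):
    prob = ecn_mark_probability(depth, min_kb=150, max_kb=3000, max_prob=0.1)
    print(f"queue {depth:>4} KB -> mark probability {prob:.3f}")
```

A threshold mismatch between switches shifts this curve per hop, so the same queue depth produces different congestion signals along one path.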
End-to-end RoCE readiness checklist
- Confirm server NIC, OS, driver, and firmware expectations.
- Confirm switch interface speed, optics, cabling, and error counters.
- Confirm MTU consistency from host to host across all paths.
- Confirm DSCP/CoS marking at the host and preservation across the fabric.
- Confirm queue mapping and lossless class configuration.
- Confirm PFC is applied only where intended.
- Confirm ECN behavior and congestion thresholds are consistent with the design.
- Confirm routing and ECMP paths match the design, with the expected equal-cost next hops present in both directions between GPU nodes.
- Confirm storage or management traffic is not competing inside the same lossless class.
- Confirm telemetry exists for drops, pause frames, queue depth, and utilization.
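Several items in the checklist above reduce to comparing each hop on a path against one design standard. The sketch below assumes the per-hop values have already been collected into simple records. The `Hop` class, the `check_path` function, the device names, and the design values (9000-byte required MTU, DSCP 26, PFC on priority 3) are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    mtu: int                    # configured MTU at this hop
    dscp: int                   # DSCP seen for RoCE traffic at this hop
    pfc_priorities: frozenset   # priorities with PFC enabled


def check_path(hops, required_mtu, expected_dscp, lossless_priority):
    """Return human-readable findings; an empty list means no mismatch was found."""
    findings = []
    for hop in hops:
        if hop.mtu < required_mtu:
            findings.append(f"{hop.name}: MTU {hop.mtu}, need at least {required_mtu}")
        if hop.dscp != expected_dscp:
            findings.append(f"{hop.name}: DSCP {hop.dscp}, expected {expected_dscp}")
        if hop.pfc_priorities != frozenset({lossless_priority}):
            findings.append(f"{hop.name}: PFC on {sorted(hop.pfc_priorities)}, "
                            f"expected only [{lossless_priority}]")
    return findings


# Hypothetical host-to-host path with three deliberate mistakes.
hops = [
    Hop("gpu-node-a", 9000, 26, frozenset({3})),
    Hop("leaf-1", 9216, 26, frozenset({3})),
    Hop("spine-1", 1500, 26, frozenset({3})),       # MTU too small
    Hop("leaf-2", 9216, 0, frozenset({3, 4})),      # remarked, PFC too broad
    Hop("gpu-node-b", 9000, 26, frozenset({3})),
]

findings = check_path(hops, required_mtu=9000, expected_dscp=26, lossless_priority=3)
for line in findings:
    print(line)
```

The point is the method: one standard, applied to every hop including the hosts, so that a single inconsistent device stands out.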
Cisco data center switching and fabric operations
For the Cisco Implementing Data Center AI Infrastructure (300-640 DCAI) exam, be comfortable connecting Cisco data center implementation concepts to AI infrastructure outcomes. Do not study commands in isolation; study what each command proves.
Cisco-oriented readiness tasks
- Identify the role of Cisco Nexus switching in an AI data center fabric.
- Interpret common NX-OS interface, routing, port-channel, QoS, and overlay validation outputs.
- Understand how underlay routing supports equal-cost path selection and resilient forwarding.
- Understand VXLAN/EVPN terminology where it appears in data center fabric designs.
- Explain how VLANs, VNIs, VRFs, and route targets relate in overlay scenarios.
- Validate that physical links, transceivers, port channels, and neighbors match the intended design.
- Check that configuration is consistent across redundant fabric devices.
- Recognize when a problem is physical, Layer 2, Layer 3, overlay, QoS, host, or application related.
- Know why management-plane access, AAA, logging, and change control matter during fabric implementation.
Command and validation readiness
Be able to explain what you would look for in outputs similar to these. Exact syntax can vary by platform, software release, and configuration style.
show interface ethernet x/y
show interface ethernet x/y counters errors
show interface ethernet x/y transceiver details
show lldp neighbors
show port-channel summary
show running-config interface ethernet x/y
show ip route
show ip bgp summary
show bgp l2vpn evpn summary
show nve peers
show nve vni
show policy-map interface ethernet x/y
show queuing interface ethernet x/y
show interface ethernet x/y priority-flow-control
show logging logfile
What each validation should prove
| Validation target | Evidence to look for |
|---|---|
| Physical health | Link speed, duplex where relevant, optics status, CRC/errors, flaps, FEC-related symptoms |
| Cabling correctness | LLDP neighbor matches design, expected leaf/server/spine adjacency |
| Port-channel health | Members bundled, no suspended links, hashing suitable for expected flows |
| Underlay routing | Expected adjacencies, route reachability, ECMP paths present |
| Overlay status | Peers established, VNIs present, endpoints or routes learned as expected |
| QoS and queues | Traffic in expected class, drops or pauses understood, policy applied at correct interface |
| PFC/ECN behavior | Pause counters and congestion signals consistent with design, not unexpectedly increasing |
| Management readiness | AAA, logging, time synchronization, telemetry, backup, and rollback access available |
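The cabling-correctness row can be practiced as a comparison between the intended topology and observed LLDP neighbors. The sketch below assumes both have already been parsed into dictionaries. It does not parse real NX-OS output, and the function name `cabling_mismatches`, the device names, and the port names are hypothetical.

```python
def cabling_mismatches(intended: dict, observed: dict) -> list:
    """Compare intended and observed neighbors, keyed by local port.

    Each value is a (neighbor name, neighbor port) tuple.
    """
    problems = []
    for port, expected in sorted(intended.items()):
        seen = observed.get(port)
        if seen is None:
            problems.append(f"{port}: no LLDP neighbor, expected {expected[0]} {expected[1]}")
        elif seen != expected:
            problems.append(f"{port}: saw {seen[0]} {seen[1]}, "
                            f"expected {expected[0]} {expected[1]}")
    for port in sorted(set(observed) - set(intended)):
        problems.append(f"{port}: unexpected neighbor {observed[port][0]} {observed[port][1]}")
    return problems


# Hypothetical design data and observed state for one leaf.
intended = {
    "Ethernet1/1": ("spine-1", "Ethernet1/5"),
    "Ethernet1/2": ("spine-2", "Ethernet1/5"),
    "Ethernet1/3": ("gpu-node-a", "eth0"),
}
observed = {
    "Ethernet1/1": ("spine-1", "Ethernet1/5"),
    "Ethernet1/2": ("spine-1", "Ethernet1/6"),   # cabled to the wrong spine
    "Ethernet1/4": ("gpu-node-b", "eth0"),       # not in the design
}

problems = cabling_mismatches(intended, observed)
for line in problems:
    print(line)
```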
Compute, GPU, and platform integration
Compute readiness checklist
- Understand the relationship between GPU, CPU, memory, NIC, PCIe, storage, and operating system.
- Explain why a GPU server can be network-bound, storage-bound, CPU-bound, or thermally constrained.
- Know why NIC placement, PCIe topology, and NUMA locality can affect performance.
- Review the role of firmware, BIOS settings, drivers, CUDA or accelerator software stacks where applicable.
- Know how DPUs or smart NICs may affect networking, security, telemetry, or offload behavior.
- Identify why consistent firmware and driver baselines matter across a training cluster.
- Understand high-level Kubernetes or scheduler interactions if AI workloads are containerized.
- Recognize that node health includes GPU, NIC, disk, thermal, power, and OS signals.
Platform dependency table
| Component | AI infrastructure concern | Exam-style readiness cue |
|---|---|---|
| GPU | Utilization, memory, interconnect, thermal limits | Can you explain why GPUs are idle even when the job is running? |
| CPU | Data preprocessing, orchestration, interrupts, storage stack | Can you identify when the CPU is starving the GPU? |
| NIC | RDMA support, speed, queueing, offloads, firmware | Can you align NIC settings with fabric QoS? |
| PCIe / local interconnect | Bandwidth path between GPU, CPU, NIC, and storage | Can you spot a placement or topology bottleneck? |
| Memory | Dataset staging, host memory pressure, page behavior | Can you explain memory pressure symptoms? |
| Local storage | Caching, temporary files, checkpointing | Can you separate storage delay from network delay? |
| Power and cooling | Sustained GPU performance, throttling, rack density | Can you consider facilities constraints in an implementation plan? |
| Firmware and drivers | Compatibility and stability | Can you plan a safe baseline and rollback strategy? |
Storage and data pipeline readiness
AI infrastructure is not only a network exam topic. If data cannot reach GPUs fast enough, the fabric may look idle while the job still performs poorly.
Storage topics to review
- File, block, and object storage characteristics at a practical level.
- Distributed file systems and parallel read behavior.
- Dataset staging, caching, preprocessing, and checkpointing patterns.
- High-throughput reads for training versus low-latency access for inference.
- Storage network segmentation and QoS interaction.
- Impact of many small files versus fewer large objects.
- Backup, replication, and recovery expectations for datasets and model artifacts.
- Metadata bottlenecks and control-plane pressure in large data pipelines.
- Data locality: when moving compute to data is better than moving data to compute.
Storage decision checks
| If the scenario says… | Think about… |
|---|---|
| GPUs are underutilized | Data loader, storage throughput, preprocessing, CPU saturation, network path to storage |
| Training slows during checkpoints | Storage write path, burst handling, queue depth, shared fabric congestion |
| Many jobs read the same dataset | Caching strategy, object/file layout, storage fan-out, metadata scaling |
| Inference response time is inconsistent | Model loading, cache misses, storage latency, network path, autoscaling behavior |
| Storage traffic competes with RDMA traffic | Segmentation, QoS class separation, queue policy, congestion isolation |
| Dataset transfer affects training | Scheduling bulk transfers, rate limiting, path isolation, telemetry alerts |
Automation, orchestration, and repeatability
Automation readiness checklist
- Explain why manual per-switch configuration is risky in large AI fabrics.
- Understand the purpose of templates, golden configurations, and configuration drift detection.
- Recognize where APIs, CLI automation, infrastructure as code, and controller-based workflows fit.
- Know how Cisco data center management tools may be used for fabric operations where applicable.
- Understand pre-change validation and post-change verification.
- Be able to describe a safe deployment pipeline for adding racks, VLANs, VRFs, QoS policies, or telemetry.
- Know how to automate without exposing credentials or bypassing change control.
- Understand idempotency at a practical level: rerunning automation should not create unintended changes.
- Know the difference between intended state, running state, and observed state.
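Drift detection and idempotency from the checklist above can be illustrated with plain dictionaries standing in for device state. This is a minimal sketch, not a real automation tool: `config_drift` and `apply_changes` are hypothetical helpers, the setting names are made up, and `None` is used here to mean that a key is absent.

```python
def config_drift(intended: dict, running: dict) -> dict:
    """Return differing keys as key -> (intended value, running value)."""
    drift = {}
    for key in sorted(set(intended) | set(running)):
        want = intended.get(key)   # None means the key should not exist
        have = running.get(key)    # None means the key does not exist
        if want != have:
            drift[key] = (want, have)
    return drift


def apply_changes(intended: dict, running: dict) -> list:
    """Bring running in line with intended and return the keys actually changed."""
    changed = []
    for key, (want, _have) in config_drift(intended, running).items():
        if want is None:
            running.pop(key, None)
        else:
            running[key] = want
        changed.append(key)
    return changed


# Hypothetical intended state and a drifted running state.
intended = {"mtu": 9216, "pfc_priority": 3, "ecn": True}
running = {"mtu": 1500, "pfc_priority": 3, "debug_acl": "test"}

first_run = apply_changes(intended, running)
second_run = apply_changes(intended, running)

print("first run changed: ", first_run)
print("second run changed:", second_run)
```

The second run changes nothing, which is the practical meaning of idempotency: rerunning the automation against a compliant device is a no-op.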
Implementation artifact checklist
| Artifact | What it should contain |
|---|---|
| Physical topology | Rack layout, server-to-leaf mapping, spine links, cabling standards |
| IP address plan | Loopbacks, routed links, management, host networks, storage networks |
| VLAN/VRF/VNI map | Segmentation model, tenant boundaries, routing relationships |
| QoS policy map | Traffic classes, markings, queue mapping, PFC/ECN scope |
| Host configuration standard | NIC settings, MTU, driver/firmware baseline, OS prerequisites |
| Telemetry plan | Interface, queue, routing, overlay, host, GPU, storage, and application metrics |
| Change plan | Scope, dependencies, validation steps, rollback plan, maintenance window |
| Troubleshooting runbook | Symptom-to-signal mapping and escalation path |
| Security plan | AAA, RBAC, secrets handling, management access, audit logging |
| Capacity model | Port use, bandwidth, power, cooling, growth assumptions |
Observability and troubleshooting readiness
Signals to collect and correlate
- Interface utilization, errors, discards, and link flaps.
- Queue occupancy, tail drops, WRED/ECN-related counters where available.
- PFC pause counters by priority.
- Port-channel member state and load distribution.
- Routing adjacency status and route changes.
- Overlay peer and endpoint state where overlays are used.
- Host NIC counters, RDMA counters, driver logs, and OS network statistics.
- GPU utilization, memory use, temperature, and job-level metrics.
- Storage throughput, latency, queue depth, metadata load, and cache hit rate.
- Application logs from training frameworks, inference platforms, or job schedulers.
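Most of the counters listed above are cumulative, so the useful signal is usually the change between two snapshots rather than the absolute value. The sketch below assumes counters have already been collected into dictionaries. The counter names, interface names, and values are hypothetical, and counter resets are not handled.

```python
def counter_deltas(before: dict, after: dict) -> dict:
    """Per-interface, per-counter increase between two snapshots."""
    deltas = {}
    for intf, counters in after.items():
        prev = before.get(intf, {})
        deltas[intf] = {name: value - prev.get(name, 0)
                        for name, value in counters.items()}
    return deltas


def flag_increases(deltas: dict,
                   watched=("pfc_pause_rx", "pfc_pause_tx", "discards", "crc_errors")) -> dict:
    """Keep only watched counters that increased during the interval."""
    flagged = {}
    for intf, counters in deltas.items():
        rising = {name: d for name, d in counters.items() if name in watched and d > 0}
        if rising:
            flagged[intf] = rising
    return flagged


# Hypothetical snapshots taken before and during a slow training step.
before = {
    "Ethernet1/1": {"pfc_pause_rx": 100, "discards": 0, "crc_errors": 0},
    "Ethernet1/2": {"pfc_pause_rx": 5, "discards": 0, "crc_errors": 2},
}
after = {
    "Ethernet1/1": {"pfc_pause_rx": 5100, "discards": 12, "crc_errors": 0},
    "Ethernet1/2": {"pfc_pause_rx": 5, "discards": 0, "crc_errors": 2},
}

flagged = flag_increases(counter_deltas(before, after))
print(flagged)
```

A rising pause counter is not automatically a fault. The question is whether the increase matches the design and correlates with the job-level symptom.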
Troubleshooting workflow
| Step | Question | Examples of evidence |
|---|---|---|
| 1. Define the symptom | Is it slow training, failed job, packet loss, link flap, or inconsistent inference latency? | Job logs, user report, monitoring alert |
| 2. Scope the blast radius | One host, one rack, one fabric, one tenant, or all jobs? | Affected nodes, interfaces, VRFs, queues |
| 3. Check recent change | Was there a config, firmware, cabling, routing, or policy change? | Change records, config diffs, automation logs |
| 4. Validate physical layer | Are links clean and stable? | Errors, optics, LLDP, FEC symptoms, flaps |
| 5. Validate routing/fabric | Are expected paths available? | Routing tables, adjacencies, overlay peers |
| 6. Validate QoS/lossless behavior | Are traffic classes, PFC, ECN, and queues behaving as intended? | Queue counters, pause frames, drops, markings |
| 7. Validate host and storage | Are GPUs waiting on network, CPU, storage, or data pipeline? | GPU metrics, NIC counters, storage metrics |
| 8. Confirm remediation | Did the fix restore performance without creating a new risk? | Before/after metrics, job completion time, alerts |
Security and governance readiness
Security topics to review
- Management-plane isolation and secure administrative access.
- AAA, RBAC, logging, and auditability.
- Least privilege for operators, automation accounts, and service accounts.
- Secure handling of API tokens, SSH keys, certificates, and secrets.
- Segmentation using VRFs, VLANs, policy constructs, or other data center mechanisms.
- Separation of management, storage, tenant, and training traffic where appropriate.
- Secure telemetry export and log retention.
- Firmware and software image integrity.
- Secure baseline configuration for switches, servers, and orchestration systems.
- Change governance for high-impact QoS, routing, and fabric-wide settings.
Security decision prompts
| Scenario | What a ready candidate considers |
|---|---|
| Shared GPU cluster | Tenant isolation, access control, data separation, audit logs |
| Automation account needs device access | Least privilege, credential storage, rotation, command authorization |
| Telemetry platform receives fabric data | Secure transport, RBAC, retention, sensitive metadata exposure |
| Developer needs troubleshooting access | Role-based visibility, temporary access, logging, approval path |
| New storage network added | Segmentation, firewall or policy path, QoS interaction, data protection |
| Emergency change requested | Risk, rollback, approvals, validation, post-change review |
Scenario and decision-point practice
Use these prompts to test whether you can reason through implementation choices.
Scenario 1: Distributed training is slower than expected
Checklist:
- Compare expected vs actual GPU utilization.
- Check whether all nodes are affected or only a rack/subset.
- Review fabric utilization and ECMP path balance.
- Check interface errors, discards, and link flaps.
- Inspect PFC pause counters and queue drops.
- Validate MTU consistency.
- Confirm DSCP/CoS marking and queue mapping.
- Check storage read throughput and data preprocessing load.
- Review recent changes to fabric, host drivers, firmware, or job configuration.
- Confirm whether the problem follows a host, link, rack, or workload.
Scenario 2: RoCE traffic has intermittent performance collapse
Checklist:
- Verify the intended lossless priority is marked correctly at the host.
- Confirm the marking is preserved through the fabric.
- Confirm PFC is enabled only for the intended class.
- Look for pause storms or increasing pause counters.
- Check ECN thresholds and host response behavior.
- Confirm no bulk storage or backup traffic is sharing the lossless queue.
- Validate MTU end-to-end.
- Check for physical errors that could trigger retransmission or degraded performance.
- Correlate congestion events with job timing.
Scenario 3: A new GPU rack is being added
Checklist:
- Confirm available leaf ports, uplink capacity, and spine capacity.
- Confirm cabling, optics, link speeds, and power/cooling readiness.
- Update IP addressing, VLANs, VRFs, VNIs, and routing as needed.
- Apply consistent QoS, PFC, ECN, and MTU policy.
- Validate automation templates before deployment.
- Confirm telemetry onboarding.
- Run post-install validation before accepting production jobs.
- Check whether the new rack changes oversubscription or failure-domain assumptions.
Scenario 4: Inference latency is inconsistent
Checklist:
- Determine whether latency is network, application, storage, or model-loading related.
- Check load balancer or service routing behavior if applicable.
- Review CPU, GPU, memory, and network utilization.
- Inspect tail latency, not just average latency.
- Confirm that autoscaling or scheduling behavior is not moving workloads unpredictably.
- Check whether storage cache misses or model fetches correlate with latency spikes.
- Validate security controls are not introducing unexpected path changes or bottlenecks.
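The advice to inspect tail latency, not just average latency, is easy to demonstrate with a few numbers. The sketch below uses made-up latency samples and a hypothetical nearest-rank `percentile` helper. It only illustrates how an average can hide a slow tail.

```python
import statistics


def percentile(samples, pct: int):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


# Made-up inference latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20] * 98 + [900] * 2

mean = statistics.mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(f"mean {mean:.1f} ms, p50 {p50} ms, p99 {p99} ms")
```

Here the mean is 37.6 ms and the median is 20 ms, yet the 99th percentile is 900 ms. Users hitting the slow requests see a very different service than the average suggests.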
Common weak areas and traps
| Weak area | Why it hurts candidates | How to fix it |
|---|---|---|
| Memorizing terms without traffic reasoning | AI infrastructure questions often require cause-and-effect thinking | Practice mapping workload behavior to fabric, host, and storage signals |
| Treating PFC as universally good | PFC can prevent drops but can also spread congestion | Know where it should apply and what counters reveal trouble |
| Ignoring host configuration | The switch may be correct while NIC, driver, MTU, or marking is wrong | Include host-side validation in every RoCE checklist |
| Looking only at average utilization | AI issues often involve microbursts, queues, pauses, or tail latency | Review queue, pause, and burst-related telemetry |
| Assuming ECMP guarantees balance | Flow hashing and traffic patterns can still create hot spots | Understand path distribution and flow characteristics |
| Forgetting storage | GPUs can be idle because data is not arriving fast enough | Include storage throughput and data pipeline checks |
| Confusing underlay and overlay symptoms | Routing reachability, VXLAN/EVPN state, and endpoint learning are different checks | Build a layered troubleshooting sequence |
| Skipping physical-layer evidence | Bad optics, cabling, or flaps can look like application instability | Start with interface and transceiver health |
| Overlooking change control | Fabric-wide QoS or MTU changes can have broad impact | Pair every change with validation and rollback |
| Studying product names only | The exam title is implementation-oriented | Practice “what would you configure, verify, or troubleshoot?” prompts |
Consolidated “Can you do this?” checklist
Before exam day, you should be able to answer “yes” to most of these.
Architecture and design
- Can you describe a leaf-spine AI fabric and its failure domains?
- Can you explain why AI training traffic is often east-west intensive?
- Can you reason through oversubscription and capacity after a link or spine failure?
- Can you choose when segmentation is needed and what it protects?
- Can you identify when an overlay design changes troubleshooting steps?
- Can you connect cabling, optics, port speed, and topology to cluster scale?
RDMA, QoS, and congestion
- Can you explain RoCEv2 at a practical level?
- Can you distinguish QoS classification, PFC, and ECN?
- Can you identify symptoms of pause-related congestion?
- Can you validate that traffic markings are preserved end-to-end?
- Can you explain why MTU consistency matters?
- Can you troubleshoot queue drops without assuming the application is at fault?
Cisco implementation and validation
- Can you identify the Cisco data center devices and tools relevant to the scenario?
- Can you interpret common NX-OS show commands for interface, port-channel, routing, overlay, and QoS state?
- Can you explain what evidence proves a fabric is ready for AI workload testing?
- Can you separate physical, routing, overlay, QoS, host, and storage problems?
- Can you describe safe implementation steps for adding or changing fabric policy?
Compute, storage, and operations
- Can you explain why GPUs may be idle even when the network is healthy?
- Can you identify host-side dependencies for high-performance networking?
- Can you compare file, block, and object storage concerns for AI workflows?
- Can you plan telemetry that includes switches, hosts, GPUs, storage, and applications?
- Can you describe a rollback plan for a fabric-wide configuration change?
- Can you apply security controls without breaking performance-critical paths?
Final-week checklist
Seven to five days out
- Reconfirm the exam name and code, and re-read Cisco’s current public exam information for Cisco Implementing Data Center AI Infrastructure (300-640 DCAI).
- Build a one-page map of the major readiness areas: AI workloads, fabric, RDMA/QoS, compute, storage, automation, telemetry, security.
- Review your weakest infrastructure layer first, not the topic you already know best.
- Practice explaining PFC, ECN, QoS, MTU, and RoCEv2 out loud in plain language.
- Review topology diagrams and trace traffic from GPU node to GPU node and GPU node to storage.
Four to two days out
- Work through troubleshooting scenarios without looking at notes.
- Review Cisco data center validation commands and what each output proves.
- Practice identifying whether a symptom is physical, routing, overlay, QoS, host, storage, or application related.
- Review automation and change-control workflows.
- Revisit security, management-plane, and telemetry topics.
Final day
- Do a quick pass through your personal weak-area notes.
- Review definitions only after scenario practice, not instead of it.
- Do not memorize unsupported numbers, quotas, or weights.
- Focus on decision logic: what would you check first, what evidence confirms it, and what risk does the fix introduce?
- Rest enough to read scenario wording carefully.
Practical next step
Use this Exam Blueprint as a gap-finding tool: mark each item as confident, needs review, or needs hands-on practice. Then spend your next study session on scenario questions and lab-style validation tasks for the weakest marked areas, especially RDMA/QoS behavior, Cisco fabric verification, and AI workload troubleshooting.