CCAAK Syllabus — Learning Objectives by Topic

Blueprint-aligned learning objectives for CCAAK (Confluent Certified Administrator for Apache Kafka), organized by topic with quick links to targeted practice.

Use this syllabus as your source of truth for CCAAK. Work topic-by-topic, and drill questions after each section.

What’s covered

Topic 1: Kafka Architecture & Core Concepts (Admin View)

1.1 Brokers, controllers, and cluster metadata

  • Describe the role of a broker and how brokers host partition replicas.
  • Explain the controller’s responsibility for partition leadership and cluster metadata at a high level.
  • Differentiate between data plane traffic (produce/consume) and control plane operations (metadata, elections).
  • Explain how clients use bootstrap servers and why multiple bootstrap brokers are recommended.
  • Describe how controller instability can surface as client errors, leader changes, and operational alerts.
  • Recognize ZooKeeper-mode vs KRaft-mode clusters conceptually (external ZooKeeper ensemble vs built-in metadata quorum).
  • Given a scenario, identify which component is most likely responsible for a leadership/metadata symptom.
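
For the KRaft case, the metadata quorum can make progress only while a majority of controller voters is reachable. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of any Kafka API.

```python
def quorum_majority(voters: int) -> int:
    """Smallest number of controller voters that forms a majority."""
    if voters < 1:
        raise ValueError("a quorum needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority remains."""
    return voters - quorum_majority(voters)
```

Under this model, three voters tolerate one failure and five tolerate two, while going from three to four voters adds no extra tolerance.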

1.2 Topics, partitions, offsets, and ordering

  • Explain how a topic is partitioned and why partitions are the unit of parallelism and ordering.
  • Define an offset and describe how offsets represent position within a partition log.
  • Describe ordering guarantees (per partition) and why ordering is not preserved across partitions.
  • Explain how record keys affect partition selection and why key choice impacts load and ordering.
  • Describe consumer groups and the rule that each partition is assigned to at most one consumer within a group.
  • Identify how adding partitions affects key→partition mapping and consumer parallelism.
  • Given a scenario, choose a partitioning approach that balances parallelism with ordering requirements.
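
The effect of adding partitions on key→partition mapping can be sketched with a hash-modulo partitioner. This is an illustration only: CRC32 is used here as a stand-in hash (Kafka's default partitioner for keyed records uses murmur2), and the function names are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition by hashing (CRC32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions


def moved_keys(keys, old_count, new_count):
    """Return the keys whose partition changes when the partition count changes."""
    return [k for k in keys
            if partition_for(k, old_count) != partition_for(k, new_count)]
```

Any key returned by `moved_keys` would have its new records land on a different partition than its older records, which is why per-key ordering can break across a partition increase.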

1.3 Replication, ISR, and durability trade-offs

  • Define replication factor and describe leader/follower replication at a high level.
  • Explain the in-sync replica (ISR) concept and why it matters for durability and acknowledgements.
  • Describe under-replicated partitions (URP) and what it implies about cluster health.
  • Explain how producer acknowledgements (acks) and the topic's min.insync.replicas setting interact conceptually (durability vs availability).
  • Describe what unclean leader election means and why it can lead to data loss.
  • Identify the operational symptoms of ISR shrink (URP, increased risk, durability constraints).
  • Given a scenario, choose safer durability defaults for production topics.
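
The interaction between producer acks and min.insync.replicas can be modeled in a few lines. This is a simplified sketch, not broker code: it assumes the leader is part of the ISR and ignores timeouts and retries.

```python
def write_accepted(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Return True if the partition leader would accept the produce request.

    Only acks="all" (also written "-1") enforces min.insync.replicas;
    acks="0" and acks="1" are accepted as long as a leader exists.
    """
    if isr_size < 1:
        return False  # no in-sync leader: the partition is offline
    if acks in ("all", "-1"):
        return isr_size >= min_insync_replicas
    return True
```

With replication factor 3 and min.insync.replicas=2, losing one replica still allows acks=all writes, while losing two rejects them. That is the durability-over-availability trade-off the objectives describe.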

Topic 2: Broker Configuration & Cluster Setup

2.1 Listener configuration and client connectivity

  • Explain the purpose of listeners and advertised listeners in broker networking.
  • Identify common connectivity failures caused by incorrect advertised hostnames or ports.
  • Describe listener security protocol mapping at a high level (PLAINTEXT, SSL, SASL_SSL).
  • Explain the role of the inter-broker listener and why it must be consistent across brokers.
  • Describe how DNS, load balancers, and NAT can affect advertised listener design.
  • Given a scenario, diagnose why clients can connect to one broker but not others.
  • Choose a listener design that supports internal vs external client access safely.
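
Several of the connectivity failures above come from mismatches between the `listeners`, `advertised.listeners`, `listener.security.protocol.map`, and `inter.broker.listener.name` broker settings. The sketch below is a hypothetical pre-flight check over those four values; the helper names are illustrative, and the parser assumes simple `NAME://host:port` entries (no IPv6).

```python
def parse_listeners(value: str) -> dict:
    """Parse 'NAME://host:port,NAME2://host:port' into {name: (host, port)}."""
    result = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        result[name] = (host, int(port))
    return result


def check_listeners(listeners: str, advertised: str,
                    protocol_map: str, inter_broker: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    bound = parse_listeners(listeners)
    adv = parse_listeners(advertised)
    protocols = dict(item.strip().split(":") for item in protocol_map.split(","))
    problems = []
    for name, (host, _port) in adv.items():
        if name not in bound:
            problems.append(f"{name}: advertised but not in listeners")
        if host in ("", "0.0.0.0"):
            problems.append(f"{name}: advertised host must be routable for clients")
    for name in bound:
        if name not in protocols:
            problems.append(f"{name}: no entry in listener.security.protocol.map")
    if inter_broker not in bound:
        problems.append(f"{inter_broker}: inter.broker.listener.name is not a defined listener")
    return problems
```

A check like this only catches static mistakes. Whether the advertised hostname actually resolves and routes from the client's network still has to be tested from that network.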

2.2 Storage, log directories, and retention basics

  • Explain what log directories store and how partitions map to files on disk.
  • Identify the operational risks of disk pressure and why free space must be protected.
  • Describe retention and compaction at a high level and how they affect disk usage.
  • Explain segment roll settings conceptually and why segment size/time impacts compaction and retention behavior.
  • Describe the difference between broker defaults and topic-level overrides for retention/compaction.
  • Given a scenario, choose the fastest safe mitigation for a disk pressure incident.
  • Explain why extremely large records can create broker instability and how maximum message size settings (for example, message.max.bytes) help.

2.3 Performance and capacity planning (admin level)

  • Explain how partitions affect throughput and why too many partitions can add overhead.
  • Describe how replication factor affects write amplification and storage needs.
  • Identify key bottlenecks for Kafka clusters (disk I/O, network, CPU) and the signals they produce.
  • Explain why controller and metadata operations can become bottlenecks in very large clusters.
  • Describe the purpose of quotas conceptually and when they are used to protect shared clusters.
  • Given a scenario, choose a scale strategy: add brokers vs increase partitions vs tune producer/consumer behavior.
  • Recognize common anti-patterns: oversized partitions, uncontrolled topic sprawl, and unbounded retention.
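
The effect of replication factor and retention on storage can be estimated with back-of-the-envelope arithmetic. The sketch below assumes a steady ingest rate, time-based retention, and a 30% headroom figure chosen purely for illustration; it ignores compression, index files, and compaction.

```python
def required_disk_gb(ingest_mb_per_sec: float, retention_hours: float,
                     replication_factor: int, headroom: float = 0.3) -> float:
    """Estimate cluster-wide disk needed for time-based retention, in GB (decimal)."""
    retained_mb = ingest_mb_per_sec * retention_hours * 3600
    replicated_mb = retained_mb * replication_factor  # every replica stores a full copy
    return replicated_mb * (1 + headroom) / 1000
```

For example, 10 MB/s retained for 24 hours at replication factor 3 is about 2,592 GB before headroom. Divide by the broker count for a rough per-broker figure, assuming replicas are evenly balanced.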

Topic 3: Topic Lifecycle & Data Management

3.1 Creating topics: partitions, replication factor, and placement

  • Choose an appropriate partition count based on target consumer parallelism and expected throughput.
  • Choose an appropriate replication factor based on fault tolerance requirements and broker count.
  • Explain why replication factor cannot exceed available brokers and what happens when brokers are unavailable.
  • Describe rack awareness conceptually and when it matters for failure domain separation.
  • Explain the operational trade-offs of increasing partitions after a topic is in use.
  • Given a scenario, choose a topic configuration that balances ordering, throughput, and resilience.
  • Explain why many small topics can be operationally expensive and how consolidation decisions are made.
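
Why replication factor cannot exceed the broker count becomes clear from a placement sketch: each replica of a partition must live on a different broker. The round-robin scheme below is a deliberate simplification (Kafka's real assignment also randomizes the starting broker and can take racks into account), and the function name is hypothetical.

```python
def assign_replicas(brokers: list, num_partitions: int, replication_factor: int) -> dict:
    """Round-robin replica placement; the first replica listed is the preferred leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of available brokers")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + i) % len(brokers)]
                         for i in range(replication_factor)]
    return assignment
```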

3.2 Topic configs: retention, compaction, and durability controls

  • Differentiate cleanup policies (delete vs compact) and map them to event stream intent.
  • Configure and reason about time/size retention settings and their impact on storage.
  • Explain how compaction interacts with keys and why compaction depends on every record having a key.
  • Describe min ISR as a durability control and the availability consequences of setting it too high.
  • Explain the trade-offs of unclean leader election and when it should be avoided.
  • Given a scenario, decide whether a topic should be a changelog (compact) or an event log (delete).
  • Describe how topic-level overrides interact with broker defaults.
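
The core idea of the compact cleanup policy is "keep the latest value per key". The sketch below models only that idea over an in-memory list. It is not how the log cleaner is implemented: real compaction leaves the active segment alone and keeps tombstones for a configurable period before removing them.

```python
def compact(records):
    """Keep only the latest record per key; a None value is a tombstone that removes the key.

    `records` is an iterable of (offset, key, value) tuples in offset order.
    """
    latest = {}
    for offset, key, value in records:
        if key is None:
            raise ValueError("compaction needs a key on every record")
        latest[key] = (offset, key, value)  # later offsets overwrite earlier ones
    survivors = [r for r in latest.values() if r[2] is not None]
    return sorted(survivors)  # offsets are unique, so this restores offset order
```

Note that surviving records keep their original offsets, so a compacted log has gaps in its offset sequence.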

3.3 Partition reassignment and leader balancing (operational intent)

  • Explain why partition reassignment is performed (add brokers, balance load, decommission brokers).
  • Describe the risk of moving large partitions and how throttling/stepwise moves reduce blast radius.
  • Explain preferred leader election at a high level and why leader imbalance can occur.
  • Identify the difference between balancing replicas vs balancing leaders and why both matter.
  • Recognize when reassignment can temporarily increase network and disk load.
  • Given a scenario, pick a safe sequence for broker decommissioning and partition movement.
  • Describe the post-change verification steps: replication health, ISR, and client impact checks.
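
The difference between replica balance and leader balance can be made concrete with a small check. In the sketch below, `assignment` maps each partition to its replica list (first entry is the preferred leader) and `leaders` maps each partition to its current leader; both structures and the function names are assumptions for illustration.

```python
from collections import Counter


def leader_counts(leaders: dict) -> Counter:
    """Count how many partitions each broker currently leads."""
    return Counter(leaders.values())


def non_preferred(assignment: dict, leaders: dict) -> list:
    """Partitions whose current leader is not the first (preferred) replica."""
    return sorted(p for p, replicas in assignment.items()
                  if leaders[p] != replicas[0])
```

Replicas can be spread evenly while leadership is skewed, for instance after a broker restart. A non-empty `non_preferred` result is the kind of condition a preferred leader election is meant to correct.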

Topic 4: Security (TLS, SASL, ACLs) & Access Control

4.1 TLS and encryption in transit (admin level)

  • Explain why TLS is used between clients and brokers and between brokers.
  • Differentiate server authentication from mutual TLS (mTLS) at a conceptual level.
  • Identify common TLS misconfigurations (truststore issues, hostname mismatch, wrong listener protocol).
  • Describe certificate rotation risk and the importance of staged rollout and validation.
  • Explain how TLS settings affect client connection troubleshooting (handshake failures).
  • Given a scenario, choose a safe rollout approach for enabling TLS on a running cluster.
  • Recognize which endpoints typically require TLS (brokers, Connect, Schema Registry) in secure deployments.

4.2 Authentication with SASL (high level) and common patterns

  • Differentiate authentication (who) from authorization (what) in Kafka access control.
  • Explain SASL’s role at a high level and recognize that mechanisms vary by environment.
  • Identify how SASL configuration interacts with listeners and security protocols.
  • Recognize common auth failures from the client perspective (invalid credentials, mechanism mismatch).
  • Explain why separating admin, producer, and consumer identities reduces blast radius.
  • Given a scenario, determine whether a failure is due to TLS, SASL authentication, or ACL authorization.
  • Describe safe secret handling for credentials (no hardcoding, rotate, use secret managers).

4.3 Authorization with ACLs and least privilege

  • Describe what ACLs control conceptually (topic read/write, group access, cluster actions).
  • Identify the minimum permissions for a producer-only application vs consumer-only application.
  • Explain why consumers often need both topic READ and group READ permissions to operate correctly.
  • Recognize how wildcard ACLs increase risk and how to scope ACLs safely.
  • Describe super user concepts at a high level and why they should be tightly controlled.
  • Given a scenario, select the ACL changes required to fix an authorization error without over-granting.
  • Explain why auditing and logging access changes matters for compliance.
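
Least privilege for a consumer can be sketched as a default-deny lookup. The model below is intentionally narrow: it matches literal resource names only and ignores DENY rules, prefixed or wildcard patterns, and host restrictions. The `Acl` type, function names, and principal names are hypothetical.

```python
from typing import NamedTuple


class Acl(NamedTuple):
    principal: str
    operation: str      # e.g. "READ", "WRITE"
    resource_type: str  # e.g. "TOPIC", "GROUP"
    resource_name: str


def is_allowed(acls, principal, operation, resource_type, resource_name) -> bool:
    """Default-deny check: allow only when an exactly matching ACL exists."""
    return Acl(principal, operation, resource_type, resource_name) in set(acls)


def can_consume(acls, principal, topic, group) -> bool:
    """A group-based consumer needs READ on the topic and READ on its group."""
    return (is_allowed(acls, principal, "READ", "TOPIC", topic)
            and is_allowed(acls, principal, "READ", "GROUP", group))
```

A consumer that has topic READ but no group ACL fails this check, which mirrors the common case where the fix is one narrowly scoped group ACL rather than a wildcard grant.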

Topic 5: Monitoring, Metrics & Observability

5.1 Health indicators: URP, offline partitions, and controller behavior

  • Differentiate under-replicated partitions from offline partitions and explain severity differences.
  • Identify the common causes of URP (broker down, disk/network issues, load).
  • Explain why offline partitions indicate missing leadership and require immediate attention.
  • Recognize symptoms of controller churn and why it destabilizes leadership and metadata operations.
  • Describe the relationship between ISR shrink and durability constraints (min ISR vs availability).
  • Given a scenario, choose the first diagnostic step for URP/offline partitions (logs, broker status, disk).
  • Describe safe remediation sequencing: stabilize brokers, restore replication, then optimize.
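
The severity ordering between offline, under-min-ISR, and under-replicated partitions can be expressed as a small classifier. This is a sketch over plain values; the state labels and function name are illustrative rather than official metric names.

```python
def classify_partition(leader, replicas, isr, min_insync_replicas=1) -> str:
    """Return the most severe state that applies to one partition.

    `leader` is a broker id or None; `replicas` and `isr` are lists of broker ids.
    """
    if leader is None:
        return "OFFLINE"            # no leader: produce and consume both fail
    if len(isr) < min_insync_replicas:
        return "UNDER_MIN_ISR"      # acks=all producers are rejected
    if len(isr) < len(replicas):
        return "UNDER_REPLICATED"   # still serving, but with reduced redundancy
    return "HEALTHY"
```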

5.2 Throughput and latency monitoring (broker and client signals)

  • Identify key broker-side signals for throughput/latency (request rates, request times, network, disk).
  • Describe how disk I/O bottlenecks surface in lag, replication delays, and increased request latency.
  • Explain why network saturation can cause replication lag and producer timeouts.
  • Describe how client configs can amplify load (too many requests, tiny batches, aggressive fetches).
  • Given a scenario, determine whether a bottleneck is broker CPU, disk, or network based on symptoms.
  • Identify safe tuning levers: adjust quotas, reduce retention pressure, scale brokers, or shift workload.
  • Explain why monitoring should include both broker metrics and client behavior (producer/consumer metrics).

5.3 Consumer group monitoring and lag diagnosis

  • Define consumer lag and explain why it is a symptom, not a root cause.
  • Identify common lag causes: insufficient partitions, slow processing, downstream dependencies, rebalances.
  • Explain how frequent rebalances can look like lag spikes and unstable processing.
  • Describe how the max poll interval (max.poll.interval.ms) and session timeout (session.timeout.ms) relate to consumer stability (high level).
  • Given a scenario, choose a fix for lag: scale consumers, increase partitions, or optimize processing.
  • Identify why consumer group metrics and offsets are essential for incident triage.
  • Explain why backlogs can be ‘normal’ during planned maintenance and how to plan for catch-up.
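
Consumer lag itself is simple arithmetic per partition: the log end offset minus the group's committed offset. A minimal sketch, assuming both are available as plain dictionaries keyed by partition number:

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag: log end offset minus the group's committed offset.

    A partition with no committed offset is treated as lagging from offset 0
    (a simplification: real consumers follow auto.offset.reset).
    """
    return {p: end - committed_offsets.get(p, 0)
            for p, end in log_end_offsets.items()}
```

Looking at the per-partition values, not just the total, helps separate a single hot or stuck partition from a group that is uniformly too slow.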

Topic 6: Troubleshooting & Incident Response

6.1 Client connectivity and authentication failures

  • Diagnose common connectivity failures caused by wrong advertised listeners or DNS/route issues.
  • Differentiate TLS handshake failures from SASL authentication failures based on symptoms.
  • Identify authorization failures (ACLs) vs authentication failures and choose the right remediation.
  • Explain how misaligned security protocol configuration across listeners can break inter-broker replication.
  • Given a scenario, choose the fastest safe way to validate connectivity (test client, logs, listener checks).
  • Describe safe incident practice: don’t disable security controls broadly to ‘make it work’.
  • Explain why configuration changes should be staged and validated with a controlled client.
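
One way to practice the TLS vs SASL vs ACL distinction is to map the exception a Java client reports to the layer worth checking first. The mapping below is a rough triage heuristic, not a complete rule set: the class names come from `org.apache.kafka.common.errors`, but the fallback category is an assumption, and some TLS problems surface only as disconnects or timeouts.

```python
def failure_layer(error_name: str) -> str:
    """Map a client-side exception class name to the layer to investigate first."""
    if error_name == "SslAuthenticationException":
        return "TLS"
    if error_name == "SaslAuthenticationException":
        return "SASL authentication"
    if error_name.endswith("AuthorizationException"):
        return "ACL authorization"  # e.g. TopicAuthorizationException
    return "network or listener configuration"
```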

6.2 Replication issues, ISR shrink, and leader election problems

  • Identify the most common causes of ISR shrink and replication lag (disk, network, broker instability).
  • Explain why increasing timeouts can mask symptoms but not fix root causes.
  • Describe safe steps to recover replication health after a broker failure (bring broker back, reassign, verify).
  • Explain why unclean leader election is risky and how it trades durability for availability.
  • Given a scenario, choose the safest remediation path for URP and leadership churn.
  • Describe verification steps after recovery: ISR size, leader distribution, client error rates, lag trends.
  • Explain why incident response should prioritize stabilizing the cluster before tuning performance.

6.3 Disk pressure, retention incidents, and topic sprawl

  • Identify early warning signals for disk pressure and why it quickly cascades into replication issues.
  • Explain safe immediate mitigations (free space, reduce ingestion, temporarily adjust retention) and risks.
  • Describe how retention and compaction affect disk usage differently and how to pick the right lever.
  • Explain why deleting log segments is not a safe manual fix and what policies should be used instead.
  • Given a scenario, choose a remediation plan that restores safety first, then optimizes long-term settings.
  • Describe how to prevent recurrence: capacity planning, guardrails, quotas, topic lifecycle governance.
  • Explain why topic sprawl increases operational load and how naming/ownership policies help.
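
For the early-warning objective, a time-to-full projection turns "disk is filling" into a deadline. The sketch below assumes constant ingest and deletion rates, which real clusters rarely have, so treat the result as a rough bound; the function name is hypothetical.

```python
def hours_until_full(free_gb: float, ingest_gb_per_hour: float,
                     deleted_gb_per_hour: float):
    """Return hours until the disk fills, or None if usage is not growing."""
    growth = ingest_gb_per_hour - deleted_gb_per_hour
    if growth <= 0:
        return None
    return free_gb / growth
```

Shortening retention raises the deletion rate and reducing ingestion lowers the growth rate; both levers show up directly in this calculation.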

Topic 7: Maintenance, Upgrades & Ecosystem Operations

7.1 Safe maintenance: rolling restarts and configuration changes

  • Explain why rolling restarts reduce downtime and how they preserve availability.
  • Describe a safe rolling restart sequence and what to verify between broker restarts.
  • Recognize when maintenance should be postponed due to cluster health (URP, disk pressure, controller churn).
  • Explain why config changes should be small, reversible, and paired with validation steps.
  • Given a scenario, choose the safest next action during maintenance when errors appear (pause, verify, rollback).
  • Describe how to communicate and plan for lag/backlog during maintenance windows.
  • Explain why automation should include guardrails and health checks before proceeding.
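
The "guardrails before proceeding" idea can be sketched as a loop that restarts one broker at a time and refuses to continue when a health check fails. Both `restart` and `is_healthy` are caller-supplied placeholders here (for example, a check that under-replicated partitions are back to zero); nothing in this sketch calls a real Kafka tool.

```python
def rolling_restart(brokers, restart, is_healthy):
    """Restart brokers one at a time, stopping at the first failed health check.

    `restart(broker)` and `is_healthy()` are caller-supplied callables.
    Returns the list of brokers that were restarted.
    """
    done = []
    for broker in brokers:
        if not is_healthy():
            break  # e.g. URP > 0: pause and investigate before continuing
        restart(broker)
        done.append(broker)
    return done
```

In practice the health check would usually wait with a timeout for the restarted broker to rejoin the ISR, and a final check after the last broker is still needed; both are left out to keep the sketch short.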

7.2 Upgrades and compatibility (high level)

  • Explain why upgrades are typically performed in a rolling fashion to reduce impact.
  • Describe the importance of version compatibility between brokers and clients at a high level.
  • Identify why testing in lower environments is necessary before production upgrades.
  • Explain how to evaluate upgrade risk: feature flags, protocol compatibility, and operational changes.
  • Given a scenario, choose a safer upgrade approach (staged rollout, verification, backout plan).
  • Describe post-upgrade verification: controller stability, replication health, client error rates, throughput.
  • Recognize when ecosystem components (Connect, Schema Registry) require coordinated upgrades.

7.3 Confluent ecosystem services (admin awareness)

  • Describe the purpose of Schema Registry and why schema governance reduces breaking changes.
  • Explain Kafka Connect’s role and operational considerations (distributed mode, tasks, error handling) at a high level.
  • Describe how Control Center (or monitoring tools) support visibility into cluster health and consumer lag.
  • Recognize common operational failure modes for Connect and Schema Registry (connectivity, auth, schema incompatibility).
  • Given a scenario, decide whether an issue is likely broker-side or ecosystem-service-side.
  • Describe why multi-component security must be consistent (TLS/SASL) across Kafka and supporting services.
  • Explain how incident response should include verifying dependencies, not just broker health.

Tip: After finishing a topic, take a 15–25 question drill focused on that area, then revisit weak objectives before moving on.