Comprehensive CCAAK quick reference: Kafka cluster architecture, broker/topic configs, replication/ISR durability rules, security (TLS/SASL/ACLs), monitoring signals, and safe operational playbooks.
Use this for last‑mile review. Pair it with the Syllabus for coverage and Practice to validate instincts.
flowchart LR
P["Partition leader"] --> F1["Follower replica"]
P --> F2["Follower replica"]
F1 --> ISR["ISR set"]
F2 --> ISR
High-yield rule: durability choices are mostly about acks (producer) + min.insync.replicas (topic/broker).
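The rule can be checked with simple arithmetic. The sketch below assumes a hypothetical topic with replication factor 3 and min.insync.replicas 2 (both values are illustrative, not taken from any real cluster) and a producer using acks=all. The helper name `accepts_acks_all` is made up for this example.

```shell
# Hypothetical topic settings, chosen only for illustration.
replication_factor=3
min_insync_replicas=2

# With acks=all, the leader accepts writes while ISR size >= min.insync.replicas.
tolerated_failures=$((replication_factor - min_insync_replicas))
echo "Replicas that can leave the ISR before acks=all writes are rejected: $tolerated_failures"

# Hypothetical helper: succeeds when an acks=all write would be accepted.
# $1 = current ISR size, $2 = min.insync.replicas
accepts_acks_all() {
  [ "$1" -ge "$2" ]
}

if accepts_acks_all 1 "$min_insync_replicas"; then
  isr1_result="accepted"
else
  isr1_result="rejected"
fi
echo "ISR size 1: acks=all write is $isr1_result"
```

Raising min.insync.replicas to equal the replication factor would drop the tolerated count to zero, which is the availability trade-off to keep in mind.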
| You want… | Do this | Why |
|---|---|---|
| More consumer parallelism | Increase partitions | Within a group, each partition is read by at most one consumer |
| Higher durability | Use a higher replication factor | More copies; better fault tolerance |
| “Changelog” style topic | Enable compaction | Keeps latest value per key |
| Audit/event log | Use retention | Keep full history for N days/size |
Ordering reminder: ordering is per partition, not across partitions.
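To see why partition count caps consumer parallelism, here is a small sketch with hypothetical numbers (6 partitions, 8 consumers in one group). It only does the arithmetic; it does not talk to a cluster.

```shell
# Hypothetical numbers for illustration.
partitions=6
consumers=8

# Within one group, each partition is read by at most one consumer,
# so the number of consumers doing work is capped by the partition count.
if [ "$consumers" -lt "$partitions" ]; then
  active=$consumers
else
  active=$partitions
fi
idle=$((consumers - active))
echo "active consumers: $active, idle consumers: $idle"
```

With these values, two consumers sit idle until partitions are added. Note that adding partitions changes which partition a key maps to under the default partitioner, so per-key ordering only holds for records produced after the change.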
| Config | What it controls | Notes |
|---|---|---|
| cleanup.policy | delete vs compact | Use compact for latest-by-key streams |
| retention.ms / retention.bytes | Time and size bounds for deletion | Apply when cleanup.policy includes delete |
| min.insync.replicas | Minimum ISR size required for acks=all writes | Setting it too high reduces write availability when replicas fall out of the ISR |
| unclean.leader.election.enable | Allows an out-of-sync replica to become leader (possible data loss) | Usually false for durability |
| segment.ms / segment.bytes | When a log segment rolls | Only closed segments are eligible for deletion or compaction |
| max.message.bytes | Largest record batch the broker accepts (after compression) | Protects brokers from huge messages |
| compression.type | Compression codec for the topic's stored data | Default value producer keeps the producer's codec; other values make the broker recompress |
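retention.ms is expressed in milliseconds, which is easy to get wrong by a factor of 1000. The sketch below converts a retention period in days and prints (but does not run) a `kafka-configs` command that would apply it. The topic name `orders` and the broker address `localhost:9092` are hypothetical placeholders.

```shell
# Hypothetical inputs.
retention_days=7
topic="orders"               # hypothetical topic name
bootstrap="localhost:9092"   # hypothetical broker address

# days -> hours -> minutes -> seconds -> milliseconds
retention_ms=$((retention_days * 24 * 60 * 60 * 1000))

# Dry run: build the command as a string and print it for review; nothing is executed.
alter_cmd="kafka-configs --bootstrap-server $bootstrap --entity-type topics --entity-name $topic --alter --add-config retention.ms=$retention_ms"
echo "$alter_cmd"
```

Printing the command first is a deliberate choice here: config changes take effect immediately, so reviewing the exact value before running it is the lower-risk path.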
| Setting | Why it matters | Typical pitfall |
|---|---|---|
| listeners | Interfaces and ports the broker binds to | Wrong interface/port |
| advertised.listeners | Addresses the broker publishes for clients to connect to | Wrong hostname → clients can’t connect |
| listener.security.protocol.map | Maps each listener name to a security protocol | Mismatch between listeners and protocols |
| inter.broker.listener.name | Listener used for broker-to-broker traffic | Incorrect security settings break replication |
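A quick consistency check can catch the mismatch pitfalls in this table before a broker restart. The sketch below assumes hypothetical listener names `INTERNAL` and `EXTERNAL` and hypothetical values for the three settings; it verifies that every listener name has an entry in the protocol map and that the inter-broker listener is actually defined.

```shell
# Hypothetical broker settings (values are illustrative only).
listeners="INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093"
protocol_map="INTERNAL:SASL_SSL,EXTERNAL:SASL_SSL"
inter_broker_listener="INTERNAL"

# Collect listener names that have no entry in listener.security.protocol.map.
missing=""
for entry in $(echo "$listeners" | tr ',' ' '); do
  name="${entry%%://*}"          # text before "://" is the listener name
  case ",$protocol_map" in
    *",$name:"*) ;;              # mapped: nothing to do
    *) missing="$missing $name" ;;
  esac
done

# Check that inter.broker.listener.name refers to a defined listener.
case ",$listeners" in
  *",$inter_broker_listener://"*) inter_ok="yes" ;;
  *) inter_ok="no" ;;
esac

echo "unmapped listeners:${missing:- none}"
echo "inter-broker listener defined: $inter_ok"
```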
| Setting | Why it matters | Typical pitfall |
|---|---|---|
| log.dirs | Where partition logs live | Disk fills → ISR shrink/URP |
| num.network.threads / num.io.threads | Threads for network requests and for request processing (disk I/O) | Too low for high traffic |
| socket.*.bytes | Network buffer and request size limits | Can throttle throughput if too small |
| Control | What it provides | Examples |
|---|---|---|
| TLS | Encryption in transit | SSL listeners, certs, truststores |
| SASL | Authentication | SASL_PLAINTEXT or SASL_SSL listeners with a mechanism such as PLAIN, SCRAM, GSSAPI, or OAUTHBEARER |
| ACLs | Authorization | topic read/write, group access |
Remember: consumers typically need READ on the topic plus READ on their consumer group to operate.
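As a sketch of what granting those consumer permissions might look like, the block below builds two `kafka-acls` commands and prints them for review without executing them. The principal `User:orders-app`, the topic `orders`, the group `orders-consumers`, and the broker address are all hypothetical; on a secured cluster you would normally also pass admin client settings via `--command-config`.

```shell
# Hypothetical names, for illustration only.
principal="User:orders-app"
topic="orders"
group="orders-consumers"
bootstrap="localhost:9092"

# Shared prefix: add an ALLOW rule for the Read operation.
base="kafka-acls --bootstrap-server $bootstrap --add --allow-principal $principal --operation Read"

# One rule for the topic, one for the consumer group.
topic_acl_cmd="$base --topic $topic"
group_acl_cmd="$base --group $group"

# Dry run: print both commands; nothing is executed.
echo "$topic_acl_cmd"
echo "$group_acl_cmd"
```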
    # List topics
    kafka-topics --bootstrap-server <broker:9092> --list

    # Describe a topic (partitions, ISR, leaders)
    kafka-topics --bootstrap-server <broker:9092> --describe --topic <topic>

    # Describe consumer group lag
    kafka-consumer-groups --bootstrap-server <broker:9092> --describe --group <group>

    # View topic configs (to change one, replace --describe with --alter --add-config <key>=<value>)
    kafka-configs --bootstrap-server <broker:9092> --entity-type topics --entity-name <topic> --describe
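For a first look at replication health, `kafka-topics --describe` accepts filters that restrict output to partitions in a problem state. The sketch below prints the three filtered commands for review rather than running them; the broker address is a hypothetical placeholder.

```shell
bootstrap="localhost:9092"   # hypothetical broker address

# Each flag limits --describe output to partitions in that state.
health_cmds=$(
  for filter in --under-replicated-partitions --under-min-isr-partitions --unavailable-partitions; do
    echo "kafka-topics --bootstrap-server $bootstrap --describe $filter"
  done
)

# Dry run: print the commands; nothing is executed.
echo "$health_cmds"
```

Empty output from all three on a live cluster is the healthy case; any partition listed by the last filter has no available leader.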
Mental model: almost every operational question reduces to three steps: establish the cluster state, identify what is unsafe, then take the least risky next step.
Most common causes:
This is more severe:
Common causes:
Controller (cluster metadata leader) • ISR (in-sync replicas) • URP (under-replicated partitions) • Leader election (choosing a partition leader) • Compaction (keep latest value per key) • ACL (authorization rule).