Comprehensive CCDAK quick reference: Kafka architecture mental models, producer/consumer configs, consumer groups and rebalancing, offset management, delivery semantics (idempotence/transactions), and schema evolution basics.
Use this for last‑mile review. Pair it with the Syllabus for coverage and Practice to validate speed/accuracy.
Kafka preserves order only within a single partition.
If you need ordering for a key/entity, ensure all its records route to the same partition (usually by key).
| Field | Meaning | Why it matters |
|---|---|---|
| Topic | Logical stream name | Partition count drives parallelism |
| Partition | Sub-stream shard | Ordering + parallelism boundary |
| Offset | Position in a partition log | Consumer progress marker |
| Key | Used for partitioning (default) | Controls ordering affinity |
| Value | Payload | Serialization choice matters |
| Headers | Metadata | Tracing/versioning/hints |
| Timestamp | Create/append time | Useful for time-based processing |
```mermaid
flowchart LR
  P["Producer"] --> T["Topic"]
  T --> P0["Partition 0: offsets 0..n"]
  T --> P1["Partition 1: offsets 0..n"]
  C0["Consumer"] --> P0
  C1["Consumer"] --> P1
```
| You want… | Do this | Why |
|---|---|---|
| Ordering per customer/order/user | Use a stable key (e.g., customer_id) | Same key → same partition |
| Higher parallelism | Increase partitions | More partitions → more consumers can work |
| Avoid hot partitions | Use a better key / custom partitioner | Skewed keys create bottlenecks |
| Predictable consumer scaling | Partitions ≥ max consumer count | One partition can be consumed by only one consumer in a group |
Rule: A consumer group can have at most one consumer per partition (extra consumers idle).
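Kafka's default partitioner hashes the key bytes (murmur2) and takes the result modulo the partition count, which is why a stable key gives ordering affinity. A simplified hash-mod sketch of that idea (not Kafka's actual murmur2 implementation):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyRoutingSketch {
    // Stand-in for the default partitioner: a deterministic hash of the
    // key bytes, mod the partition count. Kafka itself uses
    // toPositive(murmur2(keyBytes)) % numPartitions.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.abs(hash % numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6;
        // The same key always routes to the same partition, so per-key
        // ordering holds. Caveat: changing the partition count changes
        // the mapping for existing keys.
        System.out.println(partitionFor("customer-123", partitions)
                == partitionFor("customer-123", partitions));
    }
}
```

Note the caveat in the comment: adding partitions later remaps keys, which breaks per-key ordering across the resize.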
| Goal | Key settings | Notes |
|---|---|---|
| Lowest latency | low linger.ms, smaller batches | More requests, higher overhead |
| Highest throughput | linger.ms + larger batch.size | More batching, higher latency |
| Strong durability | acks=all + sufficient replication | Wait for ISR to ack |
| Safe retries | enable.idempotence=true | Prevent duplicates due to retries |
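As a sketch, a throughput-leaning configuration from the table above might combine batching, lingering, and compression. The values here are illustrative assumptions, not universal defaults; tune them against your own latency budget:

```java
import java.util.Properties;

public class ThroughputProducerConfig {
    public static Properties throughputTuned() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        // Batching: wait up to 20 ms to fill batches of up to 64 KB.
        props.put("linger.ms", "20");
        props.put("batch.size", "65536");
        // Compression trades a little CPU for smaller requests.
        props.put("compression.type", "lz4");
        // Durability and safe retries still apply alongside throughput tuning.
        props.put("acks", "all");
        props.put("enable.idempotence", "true");
        return props;
    }
}
```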
| Setting | What it controls | High-yield notes |
|---|---|---|
| acks | Durability acknowledgement | 0 fire-and-forget; 1 leader only; all waits for ISR |
| retries | Retry attempts | Works with backoff; avoid tight loops |
| delivery.timeout.ms | Total time for a send | Upper bound across retries |
| linger.ms | Batch wait time | Higher → better throughput |
| batch.size | Batch capacity (bytes) | Bigger batches can improve throughput |
| compression.type | Compression | snappy/lz4 are often good defaults |
| max.in.flight.requests.per.connection | In-flight requests | With idempotence, must be ≤ 5 to preserve ordering |
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "broker:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("acks", "all");
props.put("enable.idempotence", "true");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("orders", "customer-123", "{\"id\":42}"));
producer.flush();
producer.close();
```
Common trap: not calling poll() frequently enough to satisfy liveness requirements. Heartbeats run on a background thread, but exceeding max.poll.interval.ms still gets the consumer evicted from the group and triggers a rebalance.
| Setting | What it controls | Exam cues |
|---|---|---|
| group.id | Consumer group identity | Enables load sharing + offsets per group |
| enable.auto.commit | Auto offset commits | Convenience, less control |
| auto.offset.reset | Start point if no committed offset | earliest vs latest |
| max.poll.records | Records per poll | Batch size for processing |
| session.timeout.ms | Liveness detection | Too low → false rebalances |
| heartbeat.interval.ms | Heartbeat frequency | Must be < session timeout |
| max.poll.interval.ms | Max time between polls | Long processing → rebalance risk |
```java
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// assumes props already sets bootstrap.servers, group.id, and deserializers
props.put("enable.auto.commit", "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> r : records) {
        process(r);
    }
    consumer.commitSync(); // commit only after processing succeeds
}
```
Trap: committing before processing → at-most-once behavior.
| Semantics | How you get it | Risk | Typical pattern |
|---|---|---|---|
| At-most-once | Commit before processing | Lost messages | Rare; only when duplicates are worse than loss |
| At-least-once | Process → commit after | Duplicates possible | Most common; make handlers idempotent |
| Exactly-once (Kafka) | Idempotent producer + transactions + read_committed | More complexity | Used for stream processing / pipelines |
Practical rule: In distributed systems, you often implement “exactly-once” end-to-end by combining at-least-once delivery with idempotent processing.
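A minimal sketch of that pattern: at-least-once delivery may replay a record, so the handler deduplicates on a stable record ID before applying the side effect. Here the seen-set is in memory for illustration; a real system would keep it in a durable store.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdempotentHandler {
    private final Set<String> seen = new HashSet<>();       // processed record IDs
    private final List<String> applied = new ArrayList<>(); // side effects performed

    // Returns true only the first time a given record ID is processed;
    // redelivered duplicates are silently skipped.
    public boolean handle(String recordId, String payload) {
        if (!seen.add(recordId)) {
            return false; // duplicate delivery: skip the side effect
        }
        applied.add(payload);
        return true;
    }

    public List<String> applied() { return applied; }
}
```

With this in place, a redelivered record is harmless: the second call returns false and the side effect runs exactly once.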
```mermaid
sequenceDiagram
  participant B as "Broker (Group Coordinator)"
  participant C1 as "Consumer 1"
  participant C2 as "Consumer 2"
  C1->>B: JoinGroup
  C2->>B: JoinGroup
  B-->>C1: Assign partitions
  B-->>C2: Assign partitions
  Note over C1,C2: Rebalance happens again when membership changes
```
Idempotence is about preventing duplicates caused by retries. It helps when the network is unreliable or broker responses are delayed.
Transactions let you write to multiple partitions/topics atomically and (with consumer settings) avoid reading uncommitted data.
| Concept | What it means | Remember |
|---|---|---|
| Transactional producer | Writes in a transaction | Requires transactional.id |
| read_committed consumer | Reads only committed data | Hides aborted tx records |
| read_uncommitted | Reads everything | May see aborted records |
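To make the read_committed / read_uncommitted distinction concrete, here is a toy in-memory model (not the Kafka client API): transactional writes are buffered and become visible to a committed reader only after commit, while an uncommitted reader sees in-flight writes immediately.

```java
import java.util.ArrayList;
import java.util.List;

public class TxVisibilityModel {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    public void beginTransaction()  { pending.clear(); }
    public void send(String record) { pending.add(record); }
    public void commitTransaction() { committed.addAll(pending); pending.clear(); }
    public void abortTransaction()  { pending.clear(); } // aborted writes vanish

    // read_committed: only records from committed transactions.
    public List<String> readCommitted() { return new ArrayList<>(committed); }

    // read_uncommitted: committed records plus in-flight writes.
    public List<String> readUncommitted() {
        List<String> all = new ArrayList<>(committed);
        all.addAll(pending);
        return all;
    }
}
```

The real transactional producer follows the same shape (initTransactions, beginTransaction, send, commitTransaction/abortTransaction), with the broker and consumer isolation.level doing the filtering.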
| Format | Pros | Cons | Typical use |
|---|---|---|---|
| JSON | Human-readable, easy | No strict schema; larger payloads | Prototyping, logs |
| Avro | Compact + schema | Requires schema management | Strong default for evolving event contracts |
| Protobuf | Compact + strong types | Tooling complexity | Typed contracts across languages |
Compatibility answers: “Can old consumers read new data?” and “Can new consumers read old data?”
| Mode | Safe for… | Mental model |
|---|---|---|
| BACKWARD | New schema reads old data | New consumers ok |
| FORWARD | Old schema reads new data | Old consumers ok |
| FULL | Both directions | Safest, strictest |
| NONE | No guarantees | Fast iteration, high risk |
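The classic BACKWARD-compatible change is adding a field with a default: a consumer on the new schema can still read old records that lack the field. A toy sketch of that reader-side behavior (the field name and default are invented for illustration, not a real Avro/Schema Registry call):

```java
import java.util.Map;

public class BackwardCompatSketch {
    // New-schema reader: suppose "currency" was added with a default of
    // "USD", so records written before the change still decode cleanly.
    public static String currencyOf(Map<String, String> record) {
        return record.getOrDefault("currency", "USD"); // default fills the gap
    }
}
```

Removing a required field, by contrast, breaks old readers and needs FORWARD (or FULL) thinking before you ship it.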
Common rebalance triggers: session.timeout.ms set too low, processing that blocks longer than max.poll.interval.ms, and unstable group membership.
Glossary: Broker (Kafka server) • Topic (stream name) • Partition (ordered shard) • Offset (position) • Consumer group (parallel readers) • ISR (in-sync replicas) • Rebalance (partition reassignment) • Idempotence (safe retries) • Transaction (atomic writes).