CCDAK Cheatsheet — Kafka Producers/Consumers, Offsets, Semantics & Schema Basics

Comprehensive CCDAK quick reference: Kafka architecture mental models, producer/consumer configs, consumer groups and rebalancing, offset management, delivery semantics (idempotence/transactions), and schema evolution basics.

Use this for last‑mile review. Pair it with the Syllabus for coverage and Practice to validate speed/accuracy.


1) Kafka mental models (the exam is testing these)

The unit-of-ordering rule

Kafka preserves order only within a single partition.
If you need ordering for a key/entity, ensure all its records route to the same partition (usually by key).
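
The routing rule above can be sketched in a few lines. This is a minimal illustration, assuming a simple byte-array hash as a stand-in for the murmur2 hash that the Java client's default partitioner applies to the serialized key; the class and method names are hypothetical and not part of the Kafka API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: mimics "hash(key) mod partition count" routing. */
final class KeyRoutingSketch {
    /** Returns a partition index in [0, numPartitions) for the given key. */
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Stand-in hash (assumption); the real Java client uses murmur2 here.
        int hash = Arrays.hashCode(keyBytes);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }
}
```

The same key always yields the same index for a fixed partition count, which is what keeps one entity's records in order. Because the result depends on `numPartitions`, adding partitions later can move a key to a different partition.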

Record anatomy

| Field | Meaning | Why it matters |
|---|---|---|
| Topic | Logical stream name | Partition count drives parallelism |
| Partition | Sub-stream shard | Ordering + parallelism boundary |
| Offset | Position in a partition log | Consumer progress marker |
| Key | Used for partitioning (default) | Controls ordering affinity |
| Value | Payload | Serialization choice matters |
| Headers | Metadata | Tracing/versioning/hints |
| Timestamp | Create/append time | Useful for time-based processing |

    flowchart LR
      P["Producer"] --> T["Topic"]
      T --> P0["Partition 0: offsets 0..n"]
      T --> P1["Partition 1: offsets 0..n"]
      C0["Consumer"] --> P0
      C1["Consumer"] --> P1

2) Partitions, keys, and throughput (high-yield table)

| You want… | Do this | Why |
|---|---|---|
| Ordering per customer/order/user | Use a stable key (e.g., customer_id) | Same key → same partition |
| Higher parallelism | Increase partitions | More partitions → more consumers can work |
| Avoid hot partitions | Use a better key / custom partitioner | Skewed keys create bottlenecks |
| Predictable consumer scaling | Partitions ≥ max consumer count | One partition can be consumed by only one consumer in a group |

Rule: Within a consumer group, each partition is assigned to at most one consumer; consumers beyond the partition count sit idle.
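
To make the rule concrete, here is a minimal sketch that deals partitions out to group members one at a time. It is a simplified model, not one of Kafka's real assignors (range, round-robin, sticky, cooperative-sticky), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy assignor showing one owner per partition. */
final class GroupAssignmentSketch {
    /** Maps each consumer name to the partitions it owns. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int p = 0; p < numPartitions; p++) {
            // Each partition gets exactly one owner within the group.
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }
}
```

With three consumers and two partitions, the third consumer ends up with an empty list: it stays in the group but has nothing to read until membership or partition count changes.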


3) Producer essentials (configs you must recognize)

Producer reliability and performance pickers

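| Goal | Key settings | Notes |
|---|---|---|
| Lowest latency | Low linger.ms, smaller batches | More requests, higher overhead |
| Highest throughput | Higher linger.ms + larger batch.size | More batching, higher latency |
| Strong durability | acks=all + sufficient replication (e.g., min.insync.replicas=2 with replication factor 3) | Leader waits for all in-sync replicas to acknowledge |
| Safe retries | enable.idempotence=true | Prevents duplicates caused by retries |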

Producer configs (what they mean)

| Setting | What it controls | High-yield notes |
|---|---|---|
| acks | Durability acknowledgement | 0 = fire-and-forget; 1 = leader only; all (-1) = all in-sync replicas |
| retries | Retry attempts | Spaced by retry.backoff.ms; bounded overall by delivery.timeout.ms |
| delivery.timeout.ms | Total time for a send | Upper bound on reporting success or failure, across all retries |
| linger.ms | Batch wait time | Higher → better throughput, more latency |
| batch.size | Batch capacity (bytes) | Bigger batches can improve throughput |
| compression.type | Compression | snappy/lz4 often good defaults |
| max.in.flight.requests.per.connection | In-flight requests | With idempotence, must be ≤ 5; ordering is preserved within that bound |
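
The settings in the two tables above can be grouped into profiles. The sketch below builds plain `java.util.Properties` objects that could be passed to a producer; the class name, the placeholder broker address, and every numeric value are illustrative assumptions, not recommendations.

```java
import java.util.Properties;

/** Illustrative producer settings; values are examples only. */
final class ProducerProfiles {
    /** Durability and safe retries shared by both profiles. */
    static Properties base() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker:9092"); // placeholder address
        p.put("acks", "all");                      // wait for all in-sync replicas
        p.put("enable.idempotence", "true");       // retries cannot create duplicates
        return p;
    }

    /** Trades latency for throughput: wait longer, batch more, compress. */
    static Properties throughput() {
        Properties p = base();
        p.put("linger.ms", "20");
        p.put("batch.size", "65536");
        p.put("compression.type", "lz4");
        return p;
    }

    /** Trades throughput for latency: send as soon as a record is ready. */
    static Properties lowLatency() {
        Properties p = base();
        p.put("linger.ms", "0");
        return p;
    }
}
```

Both profiles keep `acks=all` and idempotence on, so the tuning affects batching behavior without weakening delivery guarantees.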

Java: minimal producer pattern (safe by default)

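    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "broker:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("acks", "all");                // wait for all in-sync replicas
    props.put("enable.idempotence", "true"); // retries cannot create duplicates

    // try-with-resources closes the producer even if send() throws
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // the key "customer-123" keeps this customer's records on one partition
      producer.send(new ProducerRecord<>("orders", "customer-123", "{\"id\":42}"));
      producer.flush(); // block until buffered records are sent
    }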

Common traps

  • Retries without idempotence can create duplicates.
  • Keys control partition choice (and therefore ordering). With no key, the Java client spreads records across partitions: sticky per-batch assignment since Kafka 2.4, round-robin in older versions. Other clients may differ.

4) Consumer essentials (poll loop + offsets)

Poll loop invariants

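  • Call poll() regularly: the gap between calls must stay under max.poll.interval.ms, or the consumer is treated as failed and its partitions are reassigned.
  • In the Java client, heartbeats are sent from a background thread, so session.timeout.ms detects a dead process while max.poll.interval.ms detects a stuck processing loop.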

Consumer configs (what they mean)

| Setting | What it controls | Exam cues |
|---|---|---|
| group.id | Consumer group identity | Enables load sharing + offsets per group |
| enable.auto.commit | Auto offset commits | Convenience, less control; commits every auto.commit.interval.ms |
| auto.offset.reset | Start point if no committed offset | earliest vs latest (default: latest) |
| max.poll.records | Max records returned per poll() | Batch size for processing |
| session.timeout.ms | Liveness detection | Too low → false rebalances |
| heartbeat.interval.ms | Heartbeat frequency | Must be < session timeout |
| max.poll.interval.ms | Max time between polls | Long processing → rebalance risk |
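
As a rough illustration of how these settings fit together, the sketch below assembles a consumer configuration for manual commits and checks the heartbeat rule from the table. The class name, broker address, and numeric values are assumptions chosen for the example.

```java
import java.util.Properties;

/** Illustrative consumer settings; values are examples only. */
final class ConsumerConfigSketch {
    static Properties manualCommitConfig(String groupId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker:9092"); // placeholder address
        p.put("group.id", groupId);
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("enable.auto.commit", "false");   // the application commits after processing
        p.put("auto.offset.reset", "earliest"); // used only when no committed offset exists
        p.put("max.poll.records", "200");
        p.put("session.timeout.ms", "45000");
        p.put("heartbeat.interval.ms", "3000");
        p.put("max.poll.interval.ms", "300000");
        return p;
    }

    /** Checks the table's rule: heartbeat interval must be below the session timeout. */
    static boolean heartbeatBelowSession(Properties p) {
        long heartbeat = Long.parseLong(p.getProperty("heartbeat.interval.ms"));
        long session = Long.parseLong(p.getProperty("session.timeout.ms"));
        return heartbeat < session;
    }
}
```

A common rule of thumb is to keep the heartbeat interval at no more than about one third of the session timeout, so that a few missed heartbeats do not evict a healthy consumer.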

Java: “manual commit after processing” pattern

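    import java.time.Duration;
    import java.util.List;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // props must already hold bootstrap.servers, group.id, and key/value deserializers
    props.put("enable.auto.commit", "false");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(List.of("orders"));

    while (true) {
      ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
      for (ConsumerRecord<String, String> r : records) {
        process(r); // application-defined handler
      }
      consumer.commitSync(); // commit only after processing succeeds
    }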

Trap: committing before processing → at-most-once behavior.
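
The effect of commit ordering is easiest to see with a crash in the middle of a batch. The following is a toy in-memory model (no Kafka involved): one partition, one committed offset, and a single simulated crash followed by a restart from the committed position. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: models one partition, one committed offset, one crash. */
final class CommitOrderSketch {
    /**
     * Processes log in batches of batchSize, crashing once while handling the
     * record at index crashAt, then resuming from the committed offset.
     * Returns every record the handler completed, in order.
     */
    static List<String> run(List<String> log, int batchSize, int crashAt, boolean commitFirst) {
        List<String> handled = new ArrayList<>();
        int committed = 0;
        boolean crashedOnce = false;
        while (committed < log.size()) {
            int start = committed;
            int end = Math.min(start + batchSize, log.size());
            if (commitFirst) {
                committed = end; // commit before processing (at-most-once)
            }
            boolean failed = false;
            for (int i = start; i < end; i++) {
                if (!crashedOnce && i == crashAt) {
                    crashedOnce = true; // simulated crash mid-batch
                    failed = true;
                    break;
                }
                handled.add(log.get(i));
            }
            if (failed) {
                continue; // "restart": resume from the committed offset
            }
            if (!commitFirst) {
                committed = end; // commit after processing (at-least-once)
            }
        }
        return handled;
    }
}
```

For the log `[a, b, c, d]` with batches of two and a crash at index 1, committing first yields `[a, c, d]` (record `b` is lost), while committing after processing yields `[a, a, b, c, d]` (record `a` is handled twice).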


5) Offset semantics (at-most-once vs at-least-once vs EOS)

| Semantics | How you get it | Risk | Typical pattern |
|---|---|---|---|
| At-most-once | Commit before processing | Lost messages | Rare; only when duplicates are worse than loss |
| At-least-once | Process → commit after | Duplicates possible | Most common; make handlers idempotent |
| Exactly-once (Kafka) | Idempotent producer + transactions + isolation.level=read_committed | More complexity | Kafka-to-Kafka read-process-write pipelines / stream processing |

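Practical rule: Kafka's exactly-once guarantees cover reads and writes inside Kafka. When a handler has external side effects (database write, HTTP call), you usually get effectively-once results end-to-end by combining at-least-once delivery with idempotent processing.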
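
A minimal sketch of idempotent processing, assuming each event carries a unique business ID. The class is hypothetical, and the in-memory set stands in for whatever durable store (for example, a database unique constraint) a real handler would use.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: applies each event at most once, keyed by its ID. */
final class IdempotentHandlerSketch {
    // In-memory for the example; a real handler would need durable storage.
    private final Set<String> seenIds = new HashSet<>();
    private long total = 0;

    /** Applies the amount once per eventId; returns false for a duplicate. */
    boolean apply(String eventId, long amount) {
        if (!seenIds.add(eventId)) {
            return false; // already applied, so redelivery is a no-op
        }
        total += amount;
        return true;
    }

    long total() {
        return total;
    }
}
```

With a handler like this, a record that is redelivered after a crash or rebalance changes nothing the second time, so at-least-once delivery produces the same end state as exactly-once.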


6) Consumer groups and rebalancing (what to do safely)

Rebalance triggers (high level)

  • Consumer joins/leaves the group
  • Partition count changes
  • Session timeout / poll interval violations

Safe rebalance handling

  • On partition revoke: commit processed offsets and clean up local state.
  • On partition assign: initialize any needed state and resume processing.

    sequenceDiagram
      participant B as "Broker (Group Coordinator)"
      participant C1 as "Consumer 1"
      participant C2 as "Consumer 2"

      C1->>B: JoinGroup
      C2->>B: JoinGroup
      B-->>C1: Assign partitions
      B-->>C2: Assign partitions
      Note over C1,C2: Rebalance happens again when membership changes
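
The revoke/assign steps above can be sketched as per-partition bookkeeping. This is a plain-Java model with hypothetical names; in a real application the two methods would be called from `ConsumerRebalanceListener.onPartitionsAssigned` and `onPartitionsRevoked`, and the returned offsets would be passed to a synchronous commit.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: tracks what to commit when partitions are revoked. */
final class RebalanceStateSketch {
    private final Set<Integer> owned = new HashSet<>();
    // partition -> next offset to commit (last processed offset + 1)
    private final Map<Integer, Long> nextOffsets = new HashMap<>();

    /** On assign: start tracking the newly owned partitions. */
    void onAssigned(Collection<Integer> partitions) {
        owned.addAll(partitions);
    }

    /** Records progress; ignores partitions this consumer does not own. */
    boolean markProcessed(int partition, long offset) {
        if (!owned.contains(partition)) {
            return false;
        }
        nextOffsets.put(partition, offset + 1);
        return true;
    }

    /** On revoke: returns the offsets to commit and drops local state. */
    Map<Integer, Long> onRevoked(Collection<Integer> partitions) {
        Map<Integer, Long> toCommit = new HashMap<>();
        for (Integer p : partitions) {
            owned.remove(p);
            Long next = nextOffsets.remove(p);
            if (next != null) {
                toCommit.put(p, next);
            }
        }
        return toCommit;
    }
}
```

Note the `+ 1`: a committed offset names the next record to read, not the last one processed. Committing the last processed offset itself would cause that record to be reprocessed after the rebalance.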

7) Idempotence and transactions (exam-level understanding)

Idempotent producer

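Idempotence prevents duplicates caused by producer retries, for example when the network is unreliable or a broker response is delayed. The broker deduplicates using a producer ID and per-partition sequence numbers, so the guarantee holds within a single producer session.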

Transactions (EOS building block)

Transactions let you write to multiple partitions/topics atomically and (with consumer settings) avoid reading uncommitted data.

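| Concept | What it means | Remember |
|---|---|---|
| Transactional producer | Writes in a transaction | Requires transactional.id |
| read_committed consumer | Reads only committed data | Set isolation.level=read_committed; hides aborted tx records |
| read_uncommitted | Reads everything | Default isolation level; may see aborted records |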
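
The table's settings can be written out as a configuration sketch. The class name and the example IDs are assumptions; the comments list the `KafkaProducer` calls a read-process-write loop would typically make, without showing a full client program.

```java
import java.util.Properties;

/** Illustrative only: the config side of a transactional pipeline. */
final class TransactionConfigSketch {
    static Properties producerProps(String transactionalId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker:9092");   // placeholder address
        p.put("transactional.id", transactionalId);  // stable per producer instance
        p.put("enable.idempotence", "true");         // required for transactions
        p.put("acks", "all");
        return p;
    }

    static Properties consumerProps(String groupId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker:9092");   // placeholder address
        p.put("group.id", groupId);
        p.put("isolation.level", "read_committed");  // skip aborted/open transactions
        p.put("enable.auto.commit", "false");        // offsets are committed in the transaction
        return p;
    }

    // Typical call order on the producer:
    //   initTransactions()            once, at startup
    //   beginTransaction()
    //   send(...)                     one or more records, any topics/partitions
    //   sendOffsetsToTransaction(...) consumed offsets join the same transaction
    //   commitTransaction()           or abortTransaction() on failure
}
```

Committing consumed offsets inside the transaction is what ties "read", "process", and "write" together: either the outputs and the offset advance both become visible, or neither does.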

8) Serialization + Schema Registry (high-yield basics)

Serialization chooser

| Format | Pros | Cons | Typical use |
|---|---|---|---|
| JSON | Human-readable, easy | No strict schema; larger payloads | Prototyping, logs |
| Avro | Compact + schema | Requires schema management | Strong default for evolving event contracts |
| Protobuf | Compact + strong types | Tooling complexity | Typed contracts across languages |

Schema evolution mindset (compatibility)

Compatibility answers: “Can old consumers read new data?” and “Can new consumers read old data?”

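| Mode | Safe for… | Mental model |
|---|---|---|
| BACKWARD | New schema reads old data | New consumers ok; upgrade consumers first |
| FORWARD | Old schema reads new data | Old consumers ok; upgrade producers first |
| FULL | Both directions | Safest, strictest |
| NONE | No guarantees | Fast iteration, high risk |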
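
To see why adding a field with a default is backward compatible, consider this simplified model of Avro-style schema resolution. It is not the Avro API: records are plain maps, a reader schema is a map from field name to an optional default, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: resolves written data against a reader's expected fields. */
final class SchemaResolutionSketch {
    /**
     * readerFields maps each field the reader expects to an optional default.
     * Returns the record as the reader sees it; throws if a field is missing
     * from the data and has no default.
     */
    static Map<String, Object> read(Map<String, Object> written,
                                    Map<String, Optional<Object>> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Optional<Object>> field : readerFields.entrySet()) {
            String name = field.getKey();
            if (written.containsKey(name)) {
                result.put(name, written.get(name));
            } else if (field.getValue().isPresent()) {
                result.put(name, field.getValue().get()); // fall back to the default
            } else {
                throw new IllegalStateException("missing field without default: " + name);
            }
        }
        return result; // fields only the writer knows about are ignored
    }
}
```

A new reader that adds `currency` with a default can still read old records (BACKWARD), and an old reader simply ignores `currency` in new records (FORWARD). Adding a required field without a default breaks the first case, which is the kind of change a compatibility check is meant to reject.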

9) Troubleshooting quick pickers

  • Consumer lag rising: too few partitions, slow processing, fetch/poll tuning, or downstream bottleneck.
  • Frequent rebalances: session.timeout.ms too low, processing blocks too long (max.poll.interval.ms), unstable membership.
  • Duplicates observed: producer retries (e.g., after request timeouts) without idempotence; at-least-once consumers reprocessing records after a crash or rebalance without idempotent handlers.
  • Out-of-order events: multiple partitions for same key/entity; key missing/changed; multiple topics merged without ordering guarantees.
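
When diagnosing rising lag, it helps to be precise about what is measured: per partition, lag is the log end offset minus the group's committed offset. The sketch below sums that across partitions; the class name is hypothetical, and treating a missing committed offset as 0 is an assumption made for the example.

```java
import java.util.Map;

/** Illustrative only: total consumer-group lag across partitions. */
final class LagSketch {
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            // Assumption: a partition with no committed offset counts from 0.
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            total += Math.max(0, entry.getValue() - committed);
        }
        return total;
    }
}
```

A total that keeps growing means records arrive faster than the group commits them. Lag concentrated in one partition points to a hot key or a slow consumer rather than an overall capacity shortfall.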

10) Mini-glossary

Broker (Kafka server) • Topic (stream name) • Partition (ordered shard) • Offset (position) • Consumer group (parallel readers) • ISR (in-sync replicas) • Rebalance (partition reassignment) • Idempotence (safe retries) • Transaction (atomic writes).