Few technology decisions generate more heated debate in architecture reviews than the choice between Apache Kafka and RabbitMQ. Both are mature, battle-tested, and widely deployed. Both handle messages between services. And both communities will tell you their tool is obviously the right choice.
In reality, they solve different problems. The confusion arises because those problems overlap enough that either can technically work — but using the wrong one means fighting your infrastructure for years. I've seen both mistakes in production, and they're not subtle when they happen.
How Each System Thinks About Messages
RabbitMQ: Messages as Tasks
RabbitMQ was built around AMQP and the metaphor of a post office. A producer sends a message to an exchange. The exchange routes that message to one or more queues based on its type (direct, fanout, topic, headers) and its bindings. The broker delivers the message to a consumer, which processes it and acknowledges it. The message is then removed from the queue.
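To make the routing step concrete, here is a minimal sketch of how a topic exchange decides which queues receive a message. It is a toy model, not RabbitMQ's implementation: the function names `topic_matches` and `route`, and the queue names and patterns in the example, are my own assumptions. The matching rules are the standard ones: `*` stands for exactly one dot-separated word, `#` for zero or more.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style topic pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words of the routing key
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j < len(k) and (p[i] == "*" or p[i] == k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def route(bindings, routing_key):
    """bindings maps queue name -> binding pattern (hypothetical names).

    Returns the sorted list of queues that would receive the message.
    """
    return sorted(q for q, pat in bindings.items() if topic_matches(pat, routing_key))


bindings = {"email": "order.*.created", "audit": "order.#", "billing": "payment.#"}
print(route(bindings, "order.eu.created"))        # ['audit', 'email']
print(route(bindings, "order.eu.refund.issued"))  # ['audit']
```

The point is that the producer only names a routing key. Which queues get the message is decided by bindings on the broker, so consumers can be added or rerouted without touching the producer.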
This is the key behavior: once processed and acknowledged, the message is gone. RabbitMQ is designed for transient work: tasks that have a clear destination and a clear completion state. Order processing, email dispatch, notification delivery, background jobs. If something goes wrong, a message can be requeued for another attempt or routed to a dead-letter exchange, but the base model is consume-and-discard.
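The consume-and-discard model can be sketched in a few lines. This is an in-memory toy, not a RabbitMQ client: the class `TaskQueue`, its method names, and the `max_attempts` retry limit are all assumptions made for illustration. In a real broker, dead-lettering is configured on the queue rather than written by hand.

```python
from collections import deque


class TaskQueue:
    """Toy in-memory queue: a message is removed only once it is acknowledged."""

    def __init__(self, max_attempts=3):
        self.ready = deque()      # messages waiting for a consumer
        self.unacked = {}         # delivery tag -> (body, attempts so far)
        self.dead_letters = []    # messages that exhausted their attempts
        self.max_attempts = max_attempts
        self._next_tag = 0

    def publish(self, body):
        self.ready.append((body, 0))

    def get(self):
        """Deliver the next message; it stays tracked until ack or nack."""
        if not self.ready:
            return None
        body, attempts = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = (body, attempts + 1)
        return self._next_tag, body

    def ack(self, tag):
        # Processed successfully: the message is gone for good.
        del self.unacked[tag]

    def nack(self, tag):
        # Processing failed: retry, or dead-letter once attempts run out.
        body, attempts = self.unacked.pop(tag)
        if attempts >= self.max_attempts:
            self.dead_letters.append(body)
        else:
            self.ready.append((body, attempts))


q = TaskQueue(max_attempts=2)
q.publish("send-email:42")
tag, body = q.get()
q.nack(tag)              # first failure: back on the queue
tag, body = q.get()
q.nack(tag)              # second failure: dead-lettered
print(q.dead_letters)    # ['send-email:42']
```

Notice that there is no way to ask this queue for a message that was already acknowledged. That is the property the rest of this article keeps coming back to.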
Kafka: Events as a Log
Kafka thinks about messages entirely differently. Instead of a queue, Kafka has a log — an append-only, ordered, persistent sequence of events. Producers write events to a topic (a named log). Consumers read from that log at their own pace, tracking their position (offset) independently.
The events stay in the log for a configurable retention period (days, weeks, or indefinitely). Multiple consumer groups can read the same events completely independently. A new service can be added and replay the entire history. An existing service can be restarted and continue from where it left off.
This is a fundamentally different model: the log is the source of truth, not just a transit lane.
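A small sketch shows how different this is from a queue. Again, this is an in-memory toy and not the Kafka client API: `EventLog`, `poll`, and `seek` are names I chose for illustration, and a real topic is split into partitions, which this sketch ignores.

```python
class EventLog:
    """Toy append-only log: reading never removes events."""

    def __init__(self):
        self.events = []     # the log itself
        self.offsets = {}    # consumer group -> next offset to read

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1    # offset of the new event

    def poll(self, group, max_events=10):
        """Return the next batch for a group and advance that group's offset."""
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's position, for example back to 0 to replay history."""
        self.offsets[group] = offset


log = EventLog()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

print(log.poll("billing"))                  # ['order-1', 'order-2', 'order-3']
print(log.poll("analytics", max_events=2))  # ['order-1', 'order-2']
log.seek("billing", 0)                      # replay from the beginning
print(log.poll("billing"))                  # ['order-1', 'order-2', 'order-3']
```

Each group's progress is just a number stored next to the log. Adding a consumer or replaying history changes that number and nothing else.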
Where Each Excels
Apache Kafka
- Event streaming & real-time pipelines
- High-throughput data ingestion (millions of events/sec)
- Event sourcing & CQRS architectures
- Audit logs & compliance requirements
- Fan-out to many independent consumers
- Replay capability for reprocessing
- Stream processing (Kafka Streams, Flink)
RabbitMQ
- Task queues & background job processing
- Complex routing logic (topic exchanges)
- Priority queues
- Request-reply patterns (RPC over messaging)
- Per-message TTL and expiry
- Small teams, simpler operational model
- Low-latency delivery (< 1ms typical)
The Numbers That Actually Matter
| Dimension | RabbitMQ | Kafka |
|---|---|---|
| Throughput | ~50K msg/sec per node | Millions/sec across cluster |
| Latency | <1ms typical | 5–15ms typical |
| Message retention | Until acknowledged | Configurable (days/weeks/forever) |
| Consumer model | Push (broker delivers) | Pull (consumer polls) |
| Ordering guarantee | Per-queue (single consumer) | Per-partition (strict) |
| Operational complexity | Low to medium | Medium to high |
| Horizontal scaling | Clustering; federation and shovel across clusters | Native partition-based |
Common Mistakes I See in Practice
Using Kafka as a task queue
Kafka has no native concept of "acknowledge and delete." A consumer only commits an offset, which says "everything before this point is done." If you put tasks into Kafka and want each task processed once with clear completion semantics, you have to build that logic yourself: committing offsets only after the work succeeds, making handlers idempotent so redelivered tasks are harmless, and handling partition reassignment when consumers scale up or down. Teams underestimate this work consistently. For task queues, RabbitMQ with proper dead-lettering usually takes far less code and leaves fewer failure modes to handle.
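Here is roughly the smallest version of the logic you end up owning, as an in-memory sketch. `IdempotentWorker`, `flaky_send`, and the task ids are hypothetical, and the list index stands in for a partition offset. In production, both the committed offset and the set of finished task ids would need durable storage, which is where much of the real work hides.

```python
class IdempotentWorker:
    """Toy at-least-once consumer that tolerates redelivery and duplicates."""

    def __init__(self, handler):
        self.handler = handler
        self.committed = -1      # last offset known to be fully processed
        self.done_ids = set()    # ids of tasks whose side effects already happened

    def run(self, log):
        """log is a list of (task_id, payload); the list index is the offset."""
        for offset in range(self.committed + 1, len(log)):
            task_id, payload = log[offset]
            if task_id not in self.done_ids:
                self.handler(payload)        # may raise; offset stays uncommitted
                self.done_ids.add(task_id)
            self.committed = offset          # commit only after the work succeeded


sent = []
failures = {"welcome-bob": 1}    # fail once for this payload, then succeed


def flaky_send(payload):
    if failures.get(payload, 0) > 0:
        failures[payload] -= 1
        raise RuntimeError("transient failure")
    sent.append(payload)


# The third entry is a duplicate of the first, as a producer retry might create.
tasks = [("t1", "welcome-alice"), ("t2", "welcome-bob"), ("t1", "welcome-alice")]
worker = IdempotentWorker(flaky_send)
try:
    worker.run(tasks)            # stops at offset 1 when the handler raises
except RuntimeError:
    pass
worker.run(tasks)                # resumes at offset 1, skips the duplicate at 2
print(sent)                      # ['welcome-alice', 'welcome-bob']
```

None of this is hard in isolation. The cost is that every team using the log as a task queue has to write, test, and operate its own version.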
Using RabbitMQ for fan-out at scale
Fanout exchanges in RabbitMQ work well for a handful of consumers. But each bound queue gets its own copy of every message, so at high throughput with many consumers, broker storage and memory grow with the number of consumers. Kafka stores each event once in the log. Within a consumer group, the topic's partitions are divided among the members, so each partition is read by exactly one member of that group; different groups each read the full stream and track their own offsets. That scales to dozens of consumer groups without multiplying the stored data.
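Two small functions illustrate the difference. They are sketches under simplifying assumptions: `assign_partitions` uses plain round-robin, whereas Kafka's real assignors are configurable and more involved, and `stored_copies` ignores replication and assumes a backlog that no consumer has acknowledged yet. Both function names are my own.

```python
def assign_partitions(partitions, members):
    """Spread partitions over the members of ONE consumer group, round-robin.

    Every partition ends up with exactly one owner inside the group.
    """
    if not members:
        raise ValueError("a consumer group needs at least one member")
    assignment = {m: [] for m in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


def stored_copies(messages, consumers, model):
    """Rough count of message copies the broker holds, ignoring replication."""
    if model == "fanout-queues":
        return messages * consumers    # one queue, so one copy, per consumer
    if model == "shared-log":
        return messages                # one log; each group only stores an offset
    raise ValueError(model)


print(assign_partitions([0, 1, 2, 3, 4, 5], ["worker-a", "worker-b"]))
# {'worker-a': [0, 2, 4], 'worker-b': [1, 3, 5]}
print(assign_partitions([0, 1], ["worker-a", "worker-b", "worker-c"]))
# {'worker-a': [0], 'worker-b': [1], 'worker-c': []}
print(stored_copies(1_000_000, 20, "fanout-queues"))  # 20000000
print(stored_copies(1_000_000, 20, "shared-log"))     # 1000000
```

The second assignment shows a practical consequence: a group with more members than partitions leaves some members idle, so the partition count caps parallelism within a group.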
Choosing Kafka for greenfield projects "to be safe"
This is the most common mistake I see from teams who've read the right blog posts but haven't operated production Kafka. A Kafka cluster needs a metadata quorum (ZooKeeper on older versions, KRaft on current ones), careful partition sizing, replication factor decisions, usually a schema registry for Avro/Protobuf, monitoring for consumer lag, and operators who understand all of it deeply. For a team of three building a new product, this is often a multi-month distraction from the actual product.
A Practical Decision Guide
Ask these questions about your specific use case:
- Do you need to replay events? If you need to reprocess historical events, audit past state, or add new consumers that need full history — Kafka.
- Do you have >5 independent consumers of the same event stream? Kafka's consumer group model handles this naturally. RabbitMQ fanout gets expensive.
- Is your throughput measured in tens of thousands of messages per second or more? Kafka. Below that, RabbitMQ's throughput won't be a bottleneck, and its simpler operations are the better trade.
- Do you need complex routing (topic patterns, header matching, priority)? RabbitMQ's exchange types handle this natively.
- Is this a task with a clear completion? ("Send this email", "Process this order") RabbitMQ's acknowledge-and-delete model fits perfectly.
- Is your team smaller than 5 engineers? Strongly consider RabbitMQ first. Kafka's operational burden is real.
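If it helps to bring the checklist into a design review, the questions above can be written down as a small function. This is only a sketch: the `Workload` fields and the `recommend` function are hypothetical names, and the 10,000 msg/sec cutoff is my reading of "tens of thousands per second." It collects the reasons on each side and deliberately does not pick a winner, because the questions are not equally weighted.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    needs_replay: bool = False
    independent_consumers: int = 1
    peak_msgs_per_sec: int = 0
    needs_complex_routing: bool = False
    is_task_with_completion: bool = False
    team_size: int = 5


def recommend(w):
    """Return the reasons pointing toward each broker for this workload."""
    kafka, rabbitmq = [], []
    if w.needs_replay:
        kafka.append("needs replay of historical events")
    if w.independent_consumers > 5:
        kafka.append("many independent consumers of one stream")
    if w.peak_msgs_per_sec >= 10_000:    # assumed cutoff, see the note above
        kafka.append("high sustained throughput")
    if w.needs_complex_routing:
        rabbitmq.append("routing by topic pattern, headers, or priority")
    if w.is_task_with_completion:
        rabbitmq.append("tasks with a clear completion state")
    if w.team_size < 5:
        rabbitmq.append("small team, lower operational burden")
    return {"kafka": kafka, "rabbitmq": rabbitmq}


telemetry = Workload(needs_replay=True, independent_consumers=8,
                     peak_msgs_per_sec=50_000, team_size=12)
email_jobs = Workload(is_task_with_completion=True, team_size=3)
print(recommend(telemetry))
print(recommend(email_jobs))
```

When both lists come back non-empty, that is usually a sign the workload is really two workloads.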
When to Use Both
In larger systems, using both is common and sensible — they're complementary, not competing. A typical pattern: Kafka handles the high-volume event stream (user activity, telemetry, order events), while RabbitMQ handles the downstream task dispatch (sending emails, triggering enrichment jobs, scheduling reports). Each tool does what it's best at.
The integration point is usually a Kafka consumer that translates events into RabbitMQ tasks. Keep that boundary thin and well-tested.
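One way to keep that boundary thin is to make the translation a pure function with no broker code in it, so it can be unit-tested on its own. The sketch below assumes a hypothetical `order.placed` event shape and a hypothetical `email.order_confirmation` routing key; none of these names come from a real system.

```python
import json


def event_to_tasks(event):
    """Translate one stream event into zero or more task messages.

    Pure function: no Kafka or RabbitMQ calls, so it is trivial to test.
    """
    if event.get("type") == "order.placed":
        body = {"order_id": event["order_id"], "to": event["customer_email"]}
        return [{
            "routing_key": "email.order_confirmation",
            "body": json.dumps(body),
            # Stable id derived from the event, so a redelivered event
            # produces a task the consumer can recognize as a duplicate.
            "message_id": f"{event['event_id']}:email",
        }]
    return []    # events with no downstream task are simply skipped


event = {"type": "order.placed", "event_id": "evt-1001",
         "order_id": 77, "customer_email": "a@example.com"}
for task in event_to_tasks(event):
    print(task["routing_key"], task["message_id"])
# email.order_confirmation evt-1001:email
```

The surrounding consumer loop then only has to read an event, publish whatever this function returns, and commit the offset once the publish is confirmed.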
Final Thought
Kafka and RabbitMQ are both excellent systems built by smart people to solve specific problems. The teams that struggle with them are the ones that adopted one without clearly understanding what problem they were solving. Before the next architecture discussion, settle this question first: are you building a task pipeline or an event log? The answer should drive the rest of the conversation.