
Apache Kafka is a popular distributed streaming platform for building real-time data pipelines and streaming applications. It's known for its high throughput, scalability, and fault tolerance, and it's often used for crucial tasks like tracking website activity, processing orders, and analyzing financial transactions. Kafka has gained remarkable traction across industries, with organizations from retailers to hospitals leveraging its capabilities. You can make it the backbone of your data platform, seamlessly connect it with a plethora of other services and technologies, and tune it to support different delivery guarantees like at-least-once or exactly-once semantics. You can host the platform on-premises or in the cloud using self-managed or fully managed distributions. In fact, you could go as far as to say the possibilities are endless.
As Kafka's adoption grows, it's applied to increasingly diverse use cases, requiring continuous evolution of the platform. One area of ongoing development is scalability, which is crucial for handling the massive data volumes seen in modern applications like IoT. Here, I’ll outline how new developments in the platform are addressing these evolving challenges.
A Quick Refresher on Partitions
While Apache Kafka is incredibly flexible and widely used, it does have certain opinions and prescribed ways of doing things through its own APIs and protocols. One area where customization has traditionally been limited is scaling Kafka consumers. It's important to note, however, that Kafka's default scaling capabilities are sufficient for the vast majority of use cases (around 95%). To understand this better, let's recap how Kafka scales on the consumer side without delving too deeply into its architecture. The core concept behind Kafka's scalability is the partition. Each topic is divided into one or more partitions, and every message within a topic is assigned to a specific partition, typically determined by the message's key. The partition acts as the unit of scalability in Kafka. However, this design creates an upper limit on processing scalability, and it tightly links the topic's partition structure with the redundancy and performance of message consumption.
In simpler terms, the number of processing units in a consumer group is restricted by the total number of partitions within the consumed topics. Therefore, consumption of a topic with ten partitions can be scaled to a maximum of ten instances in a consumer group.
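To see this limit in practice, here is a minimal consumer sketch (the broker address, group id, and topic name are illustrative). Starting more than ten copies of this program against a ten-partition topic leaves the surplus instances idle, because each partition is assigned to at most one instance per group.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class GroupConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // illustrative address
        props.put("group.id", "orders-processors");         // all instances share this id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // Each instance receives records only from the partitions assigned to it.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d value=%s%n", record.partition(), record.value());
                }
            }
        }
    }
}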

Figure 1 Kafka's scalability with an ordinary Consumer Group. Producers send messages to the Kafka topic's partitions. A partitioner, usually a key-based one, distributes messages among the available partitions. Consumer instances in the Consumer Group are assigned to these partitions, and messages in a single partition are consumed by at most one instance.
Apart from being the unit of scalability, the partition is also the only logical container in which Kafka guarantees message ordering. This internal Kafka architecture, based on partitions, assignors, and consumer groups, is flexible enough to support most load distributions. As you can see, though, this tight coupling of message storage, message ordering, and processing scale-out can be limiting for some specific use cases.
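To make the ordering point concrete, here is a minimal producer sketch (again with illustrative names). Because the default partitioner hashes the key, all events for the same key land in the same partition and are therefore consumed in the order they were produced.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Both events share the key "customer-42", so the default
            // partitioner routes them to the same partition, preserving order.
            producer.send(new ProducerRecord<>("orders", "customer-42", "order-created"));
            producer.send(new ProducerRecord<>("orders", "customer-42", "order-paid"));
        }
    }
}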
Enter Confluent’s Parallel Consumers
Exactly this concern led to the creation of Confluent's Parallel Consumer library. It introduces an additional abstraction layer that wraps the ordinary Kafka Consumer API and lets each instance in the Consumer Group scale further by processing messages concurrently in a non-blocking, parallel execution model (with optional integration for frameworks such as Vert.x).

Figure 2 Kafka Consumer scalability with Confluent's Parallel Consumer. Every instance in the Consumer Group, even if there is only one, is scaled further into processing units (the green components), which consume messages in parallel.
Confluent's Parallel Consumer breaks Kafka's ordinary scalability limit without tinkering with the Kafka protocol or API. All you need to do is wrap the ordinary Kafka Consumer API with Confluent's library and tune it through a handful of properties, turning the consumer into a massively parallel processing solution. The details of the implementation are beyond the scope of this article, but it is worthwhile learning about this smart piece of engineering and the way it balances message ordering guarantees, consumption scalability, and delivery guarantee semantics.
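As a rough illustration, wrapping an ordinary consumer might look like the sketch below. It follows the builder API shown in the library's documentation, but treat it as a sketch: exact names can vary between versions, and the KEY ordering mode and concurrency figure are illustrative choices.

import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.List;
import java.util.Properties;

public class ParallelConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("group.id", "orders-processors");
        props.put("enable.auto.commit", "false");         // the library manages offsets itself
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // An ordinary Kafka consumer, wrapped by the library rather than polled directly.
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props);

        ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
                .consumer(kafkaConsumer)
                .ordering(ProcessingOrder.KEY) // keep per-key ordering while parallelizing
                .maxConcurrency(1000)          // far beyond any realistic partition count
                .build();

        ParallelStreamProcessor<String, String> processor =
                ParallelStreamProcessor.createEosStreamProcessor(options);
        processor.subscribe(List.of("orders"));
        processor.poll(context -> System.out.println("Processing: " + context.value()));
    }
}

Note how the KEY ordering mode keeps per-key ordering even while many messages are processed at once, which is exactly the balance between ordering guarantees and scalability described above.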
More Exciting Things to Come
In the soon-to-be-released Apache Kafka version 4.0, yet another way to scale your consumers will arrive. With KIP-932, we get what's called a “durable shared partition” in the stream-processing area. This opens up some exciting new possibilities, but it's perhaps not the paradigm shift it might seem.
In layman's terms, this new functionality could be described as “Queues for Kafka.” It might seem to suggest the addition of new entities alongside traditional Kafka topics, but that's not the case: Kafka brokers remain structurally the same, with the added complexity shifted to the clients. KIP-932 extends the consumer grouping functionality, introducing Consumer Shared Groups alongside the existing Consumer Groups.
As with the Confluent Parallel Consumer, the engineering behind this upgrade is certainly advanced. In this case, however, changes to the Kafka protocol and API were necessary. Although brokers still contain only topics, the functionality requires an upgrade on both the broker and client sides.

Figure 3 With the new Consumer Shared Group protocol, the number of instances in the consuming group is completely decoupled from the number of partitions in a topic. All consumer instances process a common stream of messages in parallel, as if consuming from a single partition.
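To give a feel for the programming model, a consumer in a shared group might look like the sketch below. It is based on the interfaces proposed in KIP-932; class and method names such as KafkaShareConsumer and AcknowledgeType come from the proposal and may still change before the final release.

import org.apache.kafka.clients.consumer.AcknowledgeType;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaShareConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ShareConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("group.id", "orders-share-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // Records are handed to whichever instance is available,
                // regardless of how many partitions the topic has.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("Processing: " + record.value());
                    consumer.acknowledge(record, AcknowledgeType.ACCEPT); // or RELEASE / REJECT
                }
                consumer.commitSync(); // commits the acknowledgements
            }
        }
    }
}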
Extending Core Concepts for Enhanced Scalability
Despite being opinionated, Kafka is remarkably extensible, even in core architectural concepts like scalability. While its original approach to scaling can be limiting in certain situations, such as IoT processing platforms, solutions like Confluent's Parallel Consumer already provide ways to overcome those limitations, and the upcoming Apache Kafka 4.0.0 introduces a new approach for addressing these use cases. In all, Kafka just keeps getting better. Its evolution is a clear demonstration of the community's commitment to addressing the evolving needs of modern data streaming applications. By continuously refining its scalability mechanisms, the platform ensures it can handle the ever-growing volume and velocity of data in today's data-driven world, and with the advancements coming in 4.0.0, it is well-equipped to tackle future scalability challenges and maintain its position as a leading choice for real-time data streaming solutions.
Pawel Wasowicz
Located in Bern, Switzerland, Pawel is our Lead Data Engineer within Digital Foundation. At Mimacom, he helps our customers get the most out of their data by leveraging the latest trends, proven technologies, and years of experience in the field.