Introduction to Kafka
Date: October 19, 2020 Categories: Messaging

Messaging as an integration paradigm

Messaging has been one of the most important aspects of integration between software components from the beginning. My first-hand experience with messaging was with RabbitMQ and WebLogic JMS, to name a few.

Let’s look at a scenario where you’d need a messaging integration. Let us talk about autonomous/smart cars, as I have been working on those for a long time 🙂 You want your car to tell you when it is time for maintenance. Remember, it is a new-generation car with all the smart features, so it can record the commands you give it. So, you configure the car to report back when the oil life falls to 20%, or when the brakes have 25% left, and so on. The vehicle knows that it has to inform the backend based on the thresholds configured for the various data points. The moment a configured data point reaches its threshold, the vehicle calls the backend. Now the service responsible for sending notifications has to be informed so that it can dispatch a message to the customer’s phone, email, etc. The service responsible for receiving vehicle events takes the event from the vehicle and posts a message to the notification service, which then sends the maintenance-due notification. In this simplified use case, the vehicle communication service informs the notification service through a message event.
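The flow in this scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the channel is an in-memory `queue.Queue` rather than a real broker, and the names `MaintenanceEvent`, `THRESHOLDS`, `vehicle_check`, `vehicle_communication_service` and `notification_service` are hypothetical, made up for this example.

```python
import queue
from dataclasses import dataclass


@dataclass
class MaintenanceEvent:
    vehicle_id: str
    data_point: str
    value: float  # percentage remaining


# Hypothetical thresholds configured by the owner (percent remaining).
THRESHOLDS = {"oil_life": 20.0, "brakes": 25.0}

# In-memory stand-in for the message channel between the two services.
notification_channel = queue.Queue()


def vehicle_check(vehicle_id, readings):
    """Vehicle side: build an event for every reading at or below its threshold."""
    return [
        MaintenanceEvent(vehicle_id, name, value)
        for name, value in readings.items()
        if name in THRESHOLDS and value <= THRESHOLDS[name]
    ]


def vehicle_communication_service(event):
    """Backend side: receive the vehicle event and post it as a message."""
    notification_channel.put(event)


def notification_service():
    """Consume pending messages and build one notification text per event."""
    sent = []
    while not notification_channel.empty():
        event = notification_channel.get()
        sent.append(
            f"Vehicle {event.vehicle_id}: {event.data_point} is at "
            f"{event.value}%, maintenance due"
        )
    return sent


# The oil life has crossed its threshold, the brakes have not.
for event in vehicle_check("car-1", {"oil_life": 18.0, "brakes": 60.0}):
    vehicle_communication_service(event)

notifications = notification_service()
print(notifications)
```

The point of the sketch is that the vehicle communication service never calls the notification service directly. It only posts a message, and the notification service picks it up whenever it is ready.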

For those who are still new to messaging, there are essentially two models of messaging: point-to-point (Queue) and publish-subscribe, or pub-sub (Topic). One very important difference between a topic and a queue is that a topic can have many subscribers for a message, whereas in a queue each message is delivered to exactly one consumer.
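The difference between the two models is easy to see in a toy, in-memory model. The classes `PointToPointQueue` and `Topic` below are hypothetical and do not correspond to the API of any real broker; they only mimic the delivery semantics described above.

```python
from collections import deque


class PointToPointQueue:
    """Each message is handed to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Taking a message removes it, so no other consumer can get it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Each message is delivered to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


# Queue: two consumers compete for one message, only the first gets it.
q = PointToPointQueue()
q.send("oil life at 20%")
first_consumer = q.receive()
second_consumer = q.receive()

# Topic: both subscribers receive their own copy of the message.
topic = Topic()
email_inbox, sms_inbox = [], []
topic.subscribe(email_inbox.append)
topic.subscribe(sms_inbox.append)
topic.publish("oil life at 20%")

print(first_consumer, second_consumer)
print(email_inbox, sms_inbox)
```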

What is Kafka

Apache Kafka is a high-throughput distributed messaging system. So, what exactly does this mean? Apache Kafka is a messaging system used to move large amounts of data from one place to another. When we talk about distributing data at scale through a messaging system, that’s when we talk about Kafka. Traditional messaging systems generally do not scale as well as Kafka, nor are they designed to handle the volume of data that Kafka can. Kafka is a high-throughput data transfer ecosystem, and its ability to scale horizontally helps make it reliable and fault-tolerant.

You can scale out a traditional messaging system (MQ) by adding more consumers, but you are doing this only to balance the load, not to kick off different actions: each consumer instance simply executes the same business logic. Kafka splits a topic into multiple partitions, and each partition maintains its own commit log (a key-value log file where the key is the offset and the value is the streamed data). These logs are persisted even after a message has been read by a consumer, so a consumer can replay the messages of a topic. In an MQ, messages are removed once they are processed. We will start exploring the Kafka ecosystem in the next post.
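To make the partitioned commit log idea a little more concrete, here is a minimal in-memory sketch. It is not Kafka's implementation or client API: the class `PartitionedTopic` is hypothetical, `zlib.crc32` merely stands in for whatever hash a real partitioner uses, and a real cluster keeps data according to its retention settings rather than forever.

```python
from zlib import crc32


class PartitionedTopic:
    """A topic split into partitions, each with its own append-only log."""

    def __init__(self, num_partitions):
        # One list per partition; the index within a list is the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append value to the partition chosen by hashing key.

        Returns the (partition, offset) where the record was stored.
        """
        partition = crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Return all records from offset onward. Reading removes nothing."""
        return self.partitions[partition][offset:]


topic = PartitionedTopic(num_partitions=3)

# The same key always hashes to the same partition, so both records
# land in one log with consecutive offsets.
p, first_offset = topic.append("car-1", "oil life 20%")
_, second_offset = topic.append("car-1", "brakes 25%")

# A consumer reads from offset 0, then replays from offset 0 again.
first_read = topic.read(p, 0)
replayed = topic.read(p, 0)

print(first_offset, second_offset)
print(first_read == replayed)
```

Because reading is just a lookup by offset, a consumer that wants to replay a topic only has to go back to an earlier offset. This is the key contrast with the queue model, where a message is gone once it has been processed.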