RabbitMQ – A walk through
Date: May 15, 2021 Categories: Messaging


Let’s understand a little about RabbitMQ and get familiar with its messaging architecture. When you use RabbitMQ as your messaging middleware, this is what happens -> producers send messages to an exchange (often compared to a post office or mailbox), which is connected to one or more queues, and based on the binding policy set -> queues receive messages. Delivering a message from a queue to a consumer is the broker’s job, and the broker here is commonly the instance/node that hosts the queue, that is, the node where the queue was initially created. Once a message is consumed and acknowledged, it is removed from the queue for good. By default, RabbitMQ neither maintains the state of a message nor persists it; the broker just pushes messages to consumers. One important optimisation Rabbit makes here is that instead of putting the actual message in the queue it puts a reference to the message, and when the time comes to push the message to a consumer, it uses that reference to compose the marshalled message and push it.
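To make the producer -> exchange -> queue -> consumer flow concrete, here is a minimal in-memory sketch. It is a toy model, not RabbitMQ itself and not a client library: the `ToyBroker` class, the queue names `orders` and `audit`, and the routing key `order.created` are all made up for illustration, and the routing shown is the simple "exact key match" style.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for a broker with one direct-style exchange (hypothetical)."""

    def __init__(self):
        self.queues = {}    # queue name -> deque of ready messages
        self.bindings = {}  # routing key -> list of bound queue names
        self.unacked = {}   # delivery tag -> (queue name, message)
        self._next_tag = 0

    def declare_queue(self, name):
        self.queues.setdefault(name, deque())

    def bind(self, queue, routing_key):
        self.bindings.setdefault(routing_key, []).append(queue)

    def publish(self, routing_key, message):
        """Copy the message into every queue bound with this routing key."""
        targets = self.bindings.get(routing_key, [])
        for name in targets:
            self.queues[name].append(message)
        return len(targets)

    def deliver(self, queue):
        """Hand the head of the queue to a consumer; it stays unacked for now."""
        if not self.queues[queue]:
            return None
        message = self.queues[queue].popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = (queue, message)
        return self._next_tag, message

    def ack(self, tag):
        """An acknowledged message is forgotten for good."""
        del self.unacked[tag]


broker = ToyBroker()
broker.declare_queue("orders")
broker.declare_queue("audit")
broker.bind("orders", "order.created")
broker.bind("audit", "order.created")

routed_to = broker.publish("order.created", "order #1")  # lands in both queues
tag, body = broker.deliver("orders")
broker.ack(tag)  # after the ack, nothing about the message remains in "orders"
```

Note that the consumer of `orders` acknowledging its copy has no effect on the copy sitting in `audit`; each bound queue gets its own entry.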

Queues can be declared with priority support, so that higher-priority messages are delivered ahead of lower-priority ones. A RabbitMQ queue is single threaded and bounded by a single CPU core. When it reaches its memory threshold, it starts to spill messages to disk on the node where it was created. Parallelism in Rabbit is achieved through queues: adding nodes that can run more queues gives higher parallelism, and each newly added node needs consumers of its own as well.
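As a rough illustration of priority handling, the sketch below models a single queue that hands out higher-priority messages first and keeps arrival order within the same priority. It is a toy built on Python's `heapq`; the class name `ToyPriorityQueue` and the sample messages are invented. On a real broker the comparable knobs are, to my understanding, the `x-max-priority` queue argument and the per-message `priority` property.

```python
import heapq
import itertools


class ToyPriorityQueue:
    """Toy model of a priority-enabled queue (hypothetical, in-memory only)."""

    def __init__(self, max_priority=10):
        self.max_priority = max_priority
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority

    def publish(self, message, priority=0):
        # Assumption of this sketch: priorities above the maximum are capped.
        priority = min(priority, self.max_priority)
        heapq.heappush(self._heap, (-priority, next(self._seq), message))

    def deliver(self):
        return heapq.heappop(self._heap)[2]


q = ToyPriorityQueue(max_priority=5)
q.publish("routine report", priority=1)
q.publish("payment failed", priority=5)
q.publish("another report", priority=1)
q.publish("fraud alert", priority=9)  # capped to 5, so it queues behind "payment failed"

delivery_order = [q.deliver() for _ in range(4)]
```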

Here is a bottleneck: RabbitMQ queues are fastest when they are empty. What this means is that as long as consumers keep up with producers, everything is good. If consumers become slow, the queue starts to fill up and so does the in-memory cache, which takes up more RAM. To conserve resources at that point, Rabbit starts to flush messages to disk, which can take significant processing and blocks the queue from further processing.

Message Overflow Behaviour – When a queue reaches its maximum length (which can be defined at creation), it can either drop messages from its head to accommodate newly published ones, or be set to reject newly published messages until the backlog is cleared. This can be a significant bottleneck when you need both high speed and high throughput.
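The two overflow behaviours are easy to compare side by side in a small simulation. The `BoundedQueue` class below is a hypothetical in-memory model, not broker code; the mode names mirror the `drop-head` and `reject-publish` values that the `x-overflow` queue argument accepts, and `x-max-length` is the argument that sets the limit.

```python
from collections import deque


class BoundedQueue:
    """Toy model of a length-limited queue (hypothetical, in-memory only)."""

    def __init__(self, max_length, overflow="drop-head"):
        if overflow not in ("drop-head", "reject-publish"):
            raise ValueError("unknown overflow mode: " + overflow)
        self.max_length = max_length
        self.overflow = overflow
        self.messages = deque()

    def publish(self, message):
        """Return True if the message was accepted, False if it was rejected."""
        if len(self.messages) >= self.max_length:
            if self.overflow == "reject-publish":
                return False
            self.messages.popleft()  # drop the oldest message to make room
        self.messages.append(message)
        return True


drop_head = BoundedQueue(max_length=3, overflow="drop-head")
reject = BoundedQueue(max_length=3, overflow="reject-publish")

for n in range(1, 6):  # publish messages 1..5 into queues that hold 3
    drop_head.publish(n)
    reject.publish(n)

# Arguments one would pass when declaring a real queue with the same limits.
queue_arguments = {"x-max-length": 3, "x-overflow": "reject-publish"}
```

With `drop-head` the queue ends up holding the three newest messages and silently loses the oldest; with `reject-publish` it keeps the three oldest and the publisher is the one that finds out.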

Issue with slow consumers -> Messages that are prefetched but not acknowledged (due to slow consumers) have to stay in memory, and this takes up more and more resources as unacknowledged messages keep piling up. Lazy queues can improve this situation, because in a lazy queue messages are written directly to disk as they arrive. By design, a slow consumer does not block the other consumers in the group, and messages from the queue continue to be processed; it does, however, affect overall performance.
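One way to keep a slow consumer from hoarding messages is to cap how many unacknowledged messages it may hold at once (the prefetch count). The sketch below simulates that window with plain Python containers; `push_to_consumer` and the numbers used are assumptions made for illustration, not part of any RabbitMQ client.

```python
from collections import deque


def push_to_consumer(ready, unacked, prefetch):
    """Move messages from the ready queue to the consumer until the
    prefetch window is full. Returns how many messages were pushed."""
    pushed = 0
    while ready and len(unacked) < prefetch:
        unacked.append(ready.popleft())
        pushed += 1
    return pushed


ready = deque(range(100))  # backlog waiting in the queue
unacked = deque()          # delivered to the consumer, not yet acknowledged

first_push = push_to_consumer(ready, unacked, prefetch=10)

# The slow consumer finally acknowledges three messages...
for _ in range(3):
    unacked.popleft()

# ...which frees exactly three slots in its window.
second_push = push_to_consumer(ready, unacked, prefetch=10)
```

The number of messages held for this consumer never exceeds the prefetch value, no matter how large the backlog grows; the rest of the backlog stays in the queue, where another consumer could pick it up.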

Ordering -> Rabbit does not guarantee the order of messages in general; order is maintained only if your queue has at most one consumer. When a group of consumers is connected to a Rabbit queue, the order of messages is not maintained.
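A small deterministic simulation shows why competing consumers break ordering. It assumes messages are handed out round-robin and all dispatched up front, and that each consumer needs a fixed time per message; the function name and the timings are invented for the example.

```python
def completion_order(messages, processing_times):
    """Dispatch messages round-robin to consumers and return the order in
    which processing finishes. processing_times[i] is the time consumer i
    needs per message (a simplifying assumption of this sketch)."""
    busy_until = [0.0] * len(processing_times)
    finished = []
    for index, message in enumerate(messages):
        consumer = index % len(processing_times)
        busy_until[consumer] += processing_times[consumer]
        finished.append((busy_until[consumer], index, message))
    finished.sort()  # earliest completion first
    return [message for _, _, message in finished]


messages = ["m1", "m2", "m3", "m4"]

# One consumer: messages finish in the order they were published.
single = completion_order(messages, [1.0])

# Two consumers: consumer 0 is slow (3 time units), consumer 1 is fast (1 unit).
competing = completion_order(messages, [3.0, 1.0])
```

With one consumer the result is `m1, m2, m3, m4`. With the two consumers above, the fast one finishes `m2` and `m4` before the slow one is done with `m1`, so the processing order becomes `m2, m4, m1, m3`.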

Database integration – Rabbit provides database integration plugins that can send transaction data when certain db triggers are executed from inside the database. This is both good and bad: the good thing is that you don’t have to run your producers and consumers outside of the database, and the not-so-good thing is that if the Rabbit server is unavailable for some reason, this could have an adverse impact on your database.

Monitoring – RabbitMQ comes with a management console that gives insight into everything happening in the Rabbit cluster. All the nodes, exchanges, queues etc. that are part of your cluster can be administered from this UI. This can come at a cost in large clusters, which we will see next.

Cluster – Let’s get into some complex stuff now, which you will have to face in the real world. As long as Rabbit is used as a standalone messaging broker, all is good and it’s a perfect world. But when you need a highly available central messaging hub with delivery guarantees, that’s when we need to think about clustering. A cluster consists of two or more servers. Once you have defined and configured the first node, every new node that joins the cluster automatically gets a copy of all the runtime state -> exchanges, bindings, queues, policies etc. Hence, every node can participate equally in publishing and consuming messages in the cluster.

Issues with Queue Locality in a Cluster – When a message is sent to an exchange on a node that doesn’t own the queue -> the message is routed to the owner node, enqueued there and consumed from there. This adds considerable load, as the message was published on a different node and has to travel across the cluster to reach the node where the queue lives. This could degrade the performance of message processing. See the picture: node 2 is where the queue was created and node 1 is where the message was published.

This can be improved just by adding a new consumer that consumes from node 1 itself. Messages will still be persisted on node 2, but we can gain considerable performance by adding a parallel consumer.

HA Queues in a Cluster – A single node storing all the messages of a queue is a risky design for mission-critical applications, and to deal with it we can take refuge in HA queues in our cluster. A queue can be configured to maintain its state across the cluster, so in the event that the primary node crashes, other nodes still have the state/data intact; hence consumers and producers are both unaffected, at least the ones that don’t connect to the crashed node. This cross-cluster state synchronisation comes at a price, as the state management cost is proportional to the number of nodes in your cluster. State management and synchronisation across the cluster demand low latency, so a cluster must always be created within a LAN, and if you are hosting the cluster in the cloud all the nodes should be in the same region. Newer versions of RabbitMQ offer a more optimised approach by introducing replicated queues, which are implemented using the leader-follower design pattern, where you can choose how many replicas of a queue’s state to maintain and which nodes participate in this process.
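The leader-follower idea can be sketched in a few lines. This is a deliberately naive model: every publish is copied to every member, and when the leader disappears the next member takes over. The `ReplicatedQueue` class, the node names and the promotion rule are all assumptions made for illustration; a real broker uses a proper consensus or mirroring protocol rather than this.

```python
class ReplicatedQueue:
    """Toy leader-follower queue (hypothetical, in-memory only)."""

    def __init__(self, nodes, replicas):
        # Only the first `replicas` nodes hold a copy; the first one leads.
        self.members = list(nodes[:replicas])
        self.state = {node: [] for node in self.members}

    @property
    def leader(self):
        return self.members[0]

    def publish(self, message):
        """Copy the message to every member; the return value is the number
        of copies written, which is the synchronisation cost per publish."""
        for node in self.members:
            self.state[node].append(message)
        return len(self.members)

    def crash(self, node):
        """Remove a node; if it was the leader, the next member leads."""
        self.members.remove(node)
        del self.state[node]

    def messages(self):
        return list(self.state[self.leader])


queue = ReplicatedQueue(["node1", "node2", "node3"], replicas=2)
copies_per_publish = queue.publish("a")
queue.publish("b")

leader_before = queue.leader
queue.crash("node1")          # the primary node goes down
leader_after = queue.leader   # a follower takes over
surviving = queue.messages()  # and still has every message
```

The sketch also makes the trade-off visible: raising `replicas` improves the chances of surviving a crash, but every publish then has to be written that many times.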

Though clustering can help you achieve a more fault-tolerant and robust message delivery hub -> the larger the cluster, the more complex it becomes. Each node added to the cluster adds overhead to the state management. As mentioned in the monitoring section, the management UI gathers stats and details from across your cluster, and the time it takes to update itself is proportional to the slowest node in the cluster.

The goal of this article was to understand RabbitMQ at a high level and gain some insight into how it processes messages. To get familiar with the concepts we have discussed, here is a git link to a docker setup, which will spin up a three-node cluster for you. Good luck!!!