New Kuzzle Cluster: What, Why, How

With Kuzzle 2.11, we shipped a brand new clustering architecture.
It took months of effort to conceive, code, and test it. And we're proud of the results.

But perhaps you are asking yourself: what is a cluster? What is so important about it? What changes for me and for my users?

And while we're at it: why did we spend so much effort coding a new cluster, especially since Kuzzle already had clustering capabilities?

I will answer those questions in an easy-to-understand way: no degree in computer science needed!

And for those hungry for technical details, I will then answer another question: how does our new cluster work?
This last chapter, I'm afraid, is TG-rated (Technical Guidance Suggested).

 

What is a Cluster, and What are the Changes

Let's start with the obvious: computers have finite resources.

When you run a program on a machine, it can only accomplish a finite number of tasks in a given time, depending on how well it is coded and on the machine's resources.

When that program is a backend, such as Kuzzle, this means there is a hard limit on how many users it can serve and how many requests it can process before it hits the ceiling of the machine it runs on.
Enter "scalability": the means of increasing the resources available to a program.
There are two ways of increasing available resources: vertical and horizontal scalability. The former means that you increase the available resources by installing the program on a more powerful machine. The latter is performed by running the program on as many machines as necessary.

Vertical scalability can easily be achieved, but is rarely a solution: even the most advanced and powerful supercomputers out there have finite resources. Plus, the more powerful a machine is, the more expensive it is to buy or rent.

Horizontal scalability, on the other hand, is cost-effective: you can run many cheap machines in parallel to absorb the workload, and you can even adjust the number of machines on-the-fly, to respond to activity peaks in a cost-efficient manner.
But for that to work, the program itself needs to be able to work in collaboration with other instances of itself. It must be conceived to be horizontally scalable.

And that is what Kuzzle Cluster is: Kuzzle's ability to horizontally scale to meet the demands.

Kuzzle has featured clustering capabilities since its early stages, in the form of a cluster plugin. That plugin allows all Kuzzle instances to seamlessly work in parallel, while keeping all of them synchronized.
That last point is very important: an application running on Kuzzle needs all its users to have the exact same experience, the same data available to them, without them knowing that, in fact, they are connected to different physical machines.

That is why clustering is so important, and why we spent a lot of effort… on remaking it.
Most importantly: upgrading your Kuzzle to its newest version, and hence using the new cluster, does not have any impact on you or on your users. It is all transparent.

But in that case… why did we bother creating a new architecture?

 

Why a new cluster?

When designing a product, you usually answer problems with the information available to you at the time.
In Kuzzle's early stages of life, we wanted it to be scalable.
We wanted to make it able to support hundreds… no, thousands of simultaneously connected users.

And we succeeded: our cluster plugin makes Kuzzle able to support A LOT of users, without a limit on how many Kuzzle instances you are running.
If your application makes only limited use of Kuzzle's realtime features, I don't think there is a hard limit on how many users our old cluster can handle.

We were pretty happy with our cluster plugin.

But things change: Kuzzle is being used by more and more clients, with different use cases… and sometimes, with pretty heavy loads to handle.

We slowly started to encounter two limits:

  • when realtime is heavily used, we measured that Kuzzle starts to huff and puff with more than 30,000 simultaneous users. That's a pretty large number, but it is no longer sufficient for some of our clients' needs;
  • our cluster plugin proved brittle against external events, for instance when the network slows down, or in case of a network outage between instances. Sometimes, bits of information get lost between instances, leaving them desynchronized without the Kuzzle instances even realizing it.

We first tried to improve and enhance our cluster plugin. But, after a while, we realized that this was no longer sufficient: we had other, more ambitious challenges to tackle, and it was time for us to start from scratch and design a new, more performant and more stable clustering architecture.

We now want Kuzzle to be able to handle millions of simultaneously connected users, with that number not being dependent on how Kuzzle is used.
And we want our cluster to be stable, and to be able to recover, even in the face of external events such as machine failures, or network failures.

And that is exactly what our new cluster architecture is: our response to today's challenges and, hopefully, to tomorrow's too.

 

How does our new Cluster work?

In this chapter, I will dive into technical details. But not too much: detailing the entire architecture would take a far longer blog post, and this one is already long enough as it is.

 

When everything goes well...

Our cluster is still masterless.
That kind of architecture is simple to use, and easy to scale on the fly: you only need to spawn or destroy instances of Kuzzle to fit your needs.
This makes things harder for us, but that's what Kuzzle is about after all.

Our cluster is no longer a plugin.
It is now a native feature, and it is always active.
Running only one Kuzzle instance means that you have, in fact, a "cluster" of one Kuzzle.
This has no impact on performance. It only means that getting a Kuzzle cluster is even easier than before: just spawn another instance with the same parameters, and the size of your cluster increases. No configuration needed.
Of course, our cluster can still be fine-tuned, using our trusty Kuzzle configuration file.
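
To give an idea of what that fine-tuning looks like, here is a small sketch of a cluster section such as you might put in Kuzzle's configuration file. It is written as a TypeScript object so it can be commented; the key names (joinTimeout, minimumNodes, ports) are illustrative assumptions, not an authoritative list, so refer to the Kuzzle documentation for the exact options supported by your version.

```ts
// A sketch of a cluster configuration section, expressed as a TypeScript
// object for the sake of commenting. Key names below are illustrative
// assumptions; check the Kuzzle documentation for the real options.
const clusterConfig = {
  cluster: {
    // hypothetical: how long a node waits for peers before starting alone (ms)
    joinTimeout: 60000,
    // hypothetical: minimum number of nodes required before accepting requests
    minimumNodes: 1,
    // hypothetical: ports used for inter-instance communication
    ports: {
      command: 7510,
      sync: 7511,
    },
  },
};

console.log(JSON.stringify(clusterConfig, null, 2));
```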

Our cluster uses 0mq (ZeroMQ) for its inter-instance communication.
This library is awesome: it's fast, reliable, and it provides enough tools to let you design any clustering architecture you want.
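
For readers curious about what that inter-instance communication can look like, here is a minimal sketch of ZeroMQ publish/subscribe using the zeromq Node.js package (v6 API). The port, host name, topic and payload are assumptions made for the example; Kuzzle's actual sockets and message formats are more elaborate.

```ts
// A minimal sketch of ZeroMQ publish/subscribe between two nodes, using the
// "zeromq" npm package (v6 API). Port, topic and payload are illustrative;
// the two functions are meant to run on different machines.
import { Publisher, Subscriber } from "zeromq";

async function runPublisher(): Promise<void> {
  const pub = new Publisher();
  await pub.bind("tcp://*:7511");

  // broadcast a [topic, payload] frame pair to every subscribed peer
  await pub.send(["sync", JSON.stringify({ event: "room:created", roomId: "abc" })]);
}

async function runSubscriber(): Promise<void> {
  const sub = new Subscriber();
  sub.connect("tcp://node-1:7511"); // hypothetical peer host name
  sub.subscribe("sync");

  // frames arrive as [topic, payload] pairs of Buffers
  for await (const [topic, payload] of sub) {
    console.log(`${topic.toString()}: ${payload.toString()}`);
  }
}
```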

Our cluster handles synchronization differently.
The legacy cluster used Redis as a state repository. In other words: Redis was our single source of truth.
This caused Kuzzle to put a lot of pressure on Redis. Even Redis, which is really fast and powerful, struggled to keep up with all of Kuzzle's demands, to the point of slowing down the entire backend in some scenarios. Synchronizing our realtime engine is THAT complex.
So… we removed the notion of a single source of truth. Instead, we went all the way to a fully distributed architecture, meaning that any Kuzzle instance can act as a source of truth at any given time.
Behind that single sentence lies an entire architecture.
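
To illustrate the idea, without pretending to describe Kuzzle's actual implementation, here is a simplified sketch of a fully distributed state: every node keeps its own complete copy of the shared realtime state and applies the changes broadcast by its peers, so no external store needs to be the single source of truth. All names and message shapes are invented for the example.

```ts
// A simplified sketch of a fully distributed state: each node holds a full
// local copy and applies sync messages from its peers. Names are illustrative.
type SyncMessage =
  | { type: "subscription:add"; roomId: string; nodeId: string }
  | { type: "subscription:remove"; roomId: string; nodeId: string };

class DistributedState {
  // roomId -> nodeId -> number of subscribers hosted on that node
  private rooms = new Map<string, Map<string, number>>();

  // called both for local changes (which are then broadcast) and for
  // messages received from peers
  apply(message: SyncMessage): void {
    const nodes = this.rooms.get(message.roomId) ?? new Map<string, number>();
    const count = nodes.get(message.nodeId) ?? 0;

    if (message.type === "subscription:add") {
      nodes.set(message.nodeId, count + 1);
    } else if (count > 1) {
      nodes.set(message.nodeId, count - 1);
    } else {
      nodes.delete(message.nodeId);
    }

    if (nodes.size > 0) {
      this.rooms.set(message.roomId, nodes);
    } else {
      this.rooms.delete(message.roomId);
    }
  }

  // every node can answer this locally, without asking an external store
  subscriberCount(roomId: string): number {
    const nodes = this.rooms.get(roomId);
    return nodes ? [...nodes.values()].reduce((sum, n) => sum + n, 0) : 0;
  }
}
```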

 

...and when it doesn't

Synchronization, for a cluster, is half the work.
The other half is detecting and handling external, unpredictable events preventing synchronization.

 

To make sure Kuzzle instances stay synchronized at all times, we designed a predictive algorithm: each Kuzzle instance knows, at any given time, the identifier of the next message it will receive from any other node. This means that if an infrastructure problem makes sync messages disappear, slow down, arrive garbled or whatnot… Kuzzle will know. And the culprit instance will be killed.

Yes, killed. We decided that this was the best way to deal with desynchronization issues. Because if the network, for instance, is already giving up, trying to resync will only put more pressure on it, accelerating its demise.
And the same thing may happen if a machine is so overloaded that it is unable to keep pace.
So, when desync issues are detected, Kuzzle evicts faulty instances from the cluster, and kills them.
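
As a rough illustration of that detection-and-eviction logic, and not of Kuzzle's actual code, the sketch below tracks the sequence number of the next sync message expected from each peer; any gap, replay, or reordering is treated as a desynchronization and triggers the eviction callback.

```ts
// A hedged sketch of the predictive check described above: each node knows
// which message identifier it expects next from every peer, and evicts any
// peer that breaks the sequence. Illustrative only.
class PeerMonitor {
  private expected = new Map<string, number>(); // peerId -> next expected seq

  constructor(private evict: (peerId: string, reason: string) => void) {}

  onSyncMessage(peerId: string, seq: number): void {
    const next = this.expected.get(peerId) ?? seq;

    if (seq !== next) {
      // a message was lost, duplicated, or reordered: the peer's view of the
      // cluster state can no longer be trusted, so it is evicted and killed
      this.evict(peerId, `expected message ${next}, received ${seq}`);
      this.expected.delete(peerId);
      return;
    }

    this.expected.set(peerId, next + 1);
  }
}
```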

This strategy, coupled with a load balancer and the fact that Kuzzle instances spawn in a matter of seconds, ensures that the cluster as a whole stays stable.

One last thing about resiliency: Kuzzle's cluster is now able to detect and resolve network partitioning issues.
Network partitions happen when some instances are unable to communicate with others: the cluster is split into multiple parts. And this can be disastrous, because each part will try to maintain its own state, independently from the others.

This is a well-known clustering issue (also known as Split-Brain).

Our new cluster knows how to detect split-brains.
Each Kuzzle instance maintains and shares its own knowledge of the entire cluster topology. And upon certain events, all instances independently reconcile their knowledge against that of the other instances, to determine whether the cluster has been split. If that happens, isolated instances are evicted and killed, ensuring that the cluster stays homogeneous.
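
Here is a deliberately simplified sketch of that reconciliation step, under the assumption that a node resolves a split by checking whether it can still reach a strict majority of the topology it previously agreed on, and removes itself if it cannot. Kuzzle's actual, event-driven reconciliation is more involved.

```ts
// A simplified sketch of split-brain handling: a node compares the peers it
// can still reach with the full topology it knows about (which includes
// itself), and leaves the cluster if it finds itself in a minority partition.
function isInMinorityPartition(knownNodes: Set<string>, reachablePeers: Set<string>): boolean {
  // count ourselves as reachable, then check for a strict majority of the
  // previously agreed-on topology; exact ties are treated as a loss of quorum
  const reachable = reachablePeers.size + 1;
  return reachable * 2 <= knownNodes.size;
}

function reconcileTopology(knownNodes: Set<string>, reachablePeers: Set<string>, shutdown: () => void): void {
  if (isInMinorityPartition(knownNodes, reachablePeers)) {
    // isolated side of the split: leave the cluster so the majority side
    // remains the only one mutating the shared state
    shutdown();
  }
}
```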

Last Words


In the last chapter, I tried to be as concise and to the point as possible. As you can imagine, there is so much more to be said and explained: how the fully distributed state works, how we detect and resolve network partitioning, how instances automatically discover and connect to existing ones, and so on.

But for now, I'll keep it at that.

I hope that this post, albeit lengthy, has brought you some insight into why we consider our new cluster architecture to be so important, and why we are so excited about it, even if it seems transparent and minor to our users.


But that last part is, also, something we are proud of: after all, we just brought a major improvement and a major change to Kuzzle, with as little impact as possible for our users.

Sébastien Cottinet
