Scalable Real-Time Applications

Build Real-Time applications is hard, really hard, but it could improve the user experience by offering immediate content updates.

There are multiple ways to implement them, either using WebSocket, Server-Sent Events or Long Polling. In this article, I'm going to talk about how to use WebSockets to build a high demand application without killing our servers.

Note: Everything that comes next is based on my experience working with WebSockets (WS) in projects like Platzi Live.

Tech Stack

It's important to understand this kind of application could be done practically with any technology, not only Node.js or Go, but Ruby, PHP, Python, etc. had ways to use WS, at the end of the day it's only a protocol, like HTTP.

That says the async nature of Node.js and its simplicity make it a good option to create a WS server, combined with libraries like socket.io it's possible to have a working server in a very short time. It's probably it will be eventually necessary to migrate to another technology like Go or Rust to support more users per instance of our server, but this usually happens when there is an extremely high demand.

Anyway, it's important to use a technology that we know and feels comfortable programming, if you use Ruby you could use ActionCable to create it without issues and eventually see if it's worth migrating, maybe to Elixir due the similarity.

Another important part of our WS is a queue or Pub/Sub system which let us scale it horizontally without losing messages, this way the instances would receive a message from those services and send them to the connected users. We could use Redis as a simple Pub/Sub or a specialized system like RabbitMQ or Apache Kafka or even user databases which let us subscribe to changes like RethinkDB.

Avoid Mutations from the WebSocket

Once we have our WS server working it's not unusual to start sending everything using this channel, this way if we need to query some data we use WS, if we need to create, update or delete a resource we send the required information using WS. Even though this works at a small scale, when we start to have more and more connected users doing this kind of mutations we will start to saturate our WS channel requiring to sping up more instances for fewer people.

The best way to avoid this is to have an HTTP API and use it to query and mutate data, being a stateless protocol it's easier to scale to support more request compared to WS.

After a mutation happens, the HTTP API should send a notification to our queue system with the change, our WS then will process the queue and send the notification to all the subscribed users, said notification could be either the updated resource or just the ID so the client could query the data from the HTTP API.

Information Deltas

A common error is to send too much information using our WS channel, this causes the same problems as doing queries and mutations over WS, we will saturate our WS channel with extra information. My recommendation here is to send information deltas. What does that mean? Send only what changed and a way to identify the resource, instead of sending the whole resource.

If we have a comment system we may have an object similar to this one in our client state:

{
  "id": 123,
  "author": 456,
  "message": "Hello, world!",
  "likes": [6546, 123213, 4678234, 12, 567, 98] // list of users ID
}

Let's say a new user, the 678, likes the comment, we could send the whole comment with the new array of likes, or we could just send a little message with the delta.

{
  "action": "like",
  "comment": 123,
  "user": 678
}

This way, our client app will update the application state to add the like to the comment already in the state and update the UI. This will reduce the information sent over WS to only the essential.

It's important to consider this could have a problem, if the client doesn't have the comment 123 in its state then the delta could not be applied and the UI could stay stalled if we get it later.

Similarly, it's possible the user will lose information during a disconnection, this could be mitigated doing queries every so often or sending the ID of the last delta if the user is outdated the HTTP API could then send the newer deltas until the client is up to date.

Another option if the user receives an update for the comment 123 and is not already in the state it could fetch it over HTTP and save it after applying the delta. It all depends on the level of data consistency we want in our system

Data flow

Let's illustrate the data flow as previously stated.

Client query data to the HTTP API
HTTP API reply with the data to the client
Client subscribe to the WS API
The Client sends a mutation to the HTTP API to create a new resource
HTTP API updates the database to insert the new resource
HTTP API enqueue the notification of the new resource
WS API process the queue to get the notification
WS API send the notification to the subscribed clients
Client update its state after receiving the notification

This way our WS API will be as simple as possible to reduce the workload, letting us handle more user with lesser resources.

Handling Disconnections

Internet connections are not perfect and could fail, actually the fail all the time, it's common to lose packages and many protocols manage that. If we go the mobile world it's even worse, whether the user moves away from its internet router and lose connection while changing to its data plan or it was on a vehicle and lose signal.

Most likely the user will lose connections, and when this happens it will be disconnected from the WS API. This way it's important to handle disconnections from the client side detecting them and trying to reconnect. Commonly this will be done in incremental intervals, this way the first one is immediate and the second one takes a few seconds and so it grows and can end up waiting minutes before trying again.

Information Recovery

After disconnection, the user will lose information delts, whether they are new or updated resources, depending on the level of consistency we want we could handle it in different ways.

If consistency is not a priority (Eventual Consistency), let's say we have a massive chat where is not an issue to lose some message, we could as mentioned above query the missing data when we receive a delta to update a resource which is not in our state. If we want to be sure we don't lose deltas we could even fetch each resource after is updated, even if it's already on our state, this will let us ensure we always be as up-to-date as possible.

If it's required to have consistency (Strong Consistency) we could let make the client after recovering from a disconnection send a timestamp of the last delta to the HTTP API to indicate at what moment he stayed, then the API could send the missing delts to the client and let it process it and update its state and UI.

In the case we want to have strong consistency we will need to store every notification with its date in a database to be able to send it later to the user, we could do this using a document based database like MongoDB which will let us create a collection for our deltas and store them all together, without worrying about the schema of the data.

Horizontal Scaling

Until now we have talked about how to optimize the usage of a single instance, which although the ideal is not realistic at all if we plan to have many users. And while we can try to scale vertically adding more and more resources to the server hosting our WS API it will eventually be more expensive and it doesn't benefit so much, it's then when we will need to scale horizontally.

What does that mean? Instead of improving our server we will add more servers running our WS API, to make it works we need to distribute the workload between our servers, in an HTTP API this is relatively easy, in WS since we have a state to keep (the connected clients) we will need to use a load balancer configured with sticky sessions, this means after the first request (the handshake) every connection between client and WS will use the same server.

If the user is disconnected we could then reconnect it to any server. If it's the server which goes down, either because we did it manually or crashed due to an error, each client connected to that server will lose connection and will try to reconnect and we will need to distribute the load.

Which bring us to that our load balancer must know the number of clients connected to each server to ensure the distribution of new users without overburden any server with more users than it could handle, it's possible we will need to reject connection attempts from users until a new server is started or a user is disconnected, note this should be the last resource.

Once configured, our load balance could even automate the spin-up of new server if the current ones are reaching their limits of users, something common in rush hours or if our application has a popularity burst, doing this will let the new server receive all the new users until it reaches the other servers, when yet another server could be started automatically.

When the traffic goes down again it could start to shutdown server, even server with low usage and force users reconnect to the rest of the server. This way we could save some money shutting down extra servers and only paying for what we need.

Final Words

Last, my biggest recommendation will be to try to avoid the usage of WebSockets as much as possible, long-polling and SSE are good alternatives, easier to scale and cheaper which in most cases will work as good as WS.